Permanents and Ryser's algorithm
As I discussed in my last blog post, the permanent of an \(n\times n\) matrix \(M=m_{ij}\) is defined as \[ \text{per}(M)=\sum_{\sigma\in S_n}\prod_{i=1}^nm_{i,\sigma(i)} \] where the sum is taken over all permutations of the \(n\) numbers \(1,2,\ldots,n\). It differs from the better known determinant in having no sign changes. For example: \[ \text{per} \begin{bmatrix}a&b&c\\\d&e&f\\\g&h&i\end{bmatrix} =aei+afh+bfg+bdi+cdi+ceg. \] By comparison, here is the determinant: \[ \text{det} \begin{bmatrix}a&b&c\\\d&e&f\\\g&h&i\end{bmatrix} =aei - afh + bfg - bdi + cdi - ceg. \] The apparent simplicity of the permanent definition hides the fact that there is no known sub-exponential algorithm to compute it, nor does it satisfy most of the nice properties of determinants. For example, we have
\[ \text{det}(AB)=\text{det}(A)\text{det}(B) \]
but in general \(\text{per}(AB)\ne\text{per}(A)\text{per}(B)\). Nor is the permanent zero if two rows are equal, or if any subset of rows is linearly dependent.
Applying the definition and summing over all the permutations is prohibitively slow; of \(O(n!n)\) complexity, and unusable except for very small matrices.
In the small but excellent textbook "Combinatorial Mathematics" by Herbert J. Ryser and published in 1963, one chapter is devoted to the inclusion-exclusion principle, of which the computation of permanents is given as an example. The permanent may be considered as a sum of products, where in each product we choose one value from each row and one value from each column.
Suppose we start by adding the rows together and multiplying them: \[ P = (a+d+g)(b+e+h)(c+f+i). \] This will certainly contain all elements of the permanent, but it also includes products we don't want, such as for example \(aef\) where the elements are chosen from only two rows, and \(ghi\) where the elements are all in one row.
To eliminate all possible products from only two rows we subtract them:
\begin{align*} P & -(a+d)(b+e)(c+f)\text{ rows $1$ and $2$}\\ & - (a+g)(b+h)(c+i)\text{ rows $1$ and $3$}\\ & - (d+g)(e+h)(f+i)\text{ rows $2$ and $3$} \end{align*}
The trouble with that subtraction is that we are subtracting products of each individual row twice; for example \(abc\) is in the first and second products. But we only want to subtract those products once from \(P\). So we have to add them again:
\begin{align*} P & -(a+d)(b+e)(c+f)\qquad\text{ rows $1$ and $2$}\\ & - (a+g)(b+h)(c+i)\qquad\text{ rows $1$ and $3$}\\ & - (d+g)(e+h)(f+i)\qquad\text{ rows $2$ and $3$}\\ & + abc\\ & + def\\ & + ghi. \end{align*}
Computing the permanent of a \(4\times 4\) matrix would start by adding all the rows and multiplying the sums. Then we would subtract all products of rows taken three at a time. But this would subtract all products of rows taken two at a time twice for each pair, so we add those products back in again. Finally we find that the number of times we've subtracted products of a single row have cancelled out, so we need to subtract them again:
\begin{array}{ccccc} &1&2&3&4\\ -&1&2&3&\\ -&1&2&&4\\ -&1&&3&4\\ -&&2&3&4\\ +&1&2&&\\ +&1&&3&\\ +&1&&&4\\ +&&2&3&\\ +&&2&3&\\ +&&&3&4\\ -&1\\ -&&2\\ -&&&3\\ -&&&&4 \end{array}
After all of this we are left with only those products which include all fours rows and columns. And as you see, this is a standard inclusion-exclusion approach.
For an \(n\times n\) matrix, let \(S=\{1,2,\ldots,n\}\), and let \(X\subseteq S\). Then define \(R(X)\) to be the product of the sums of elements of rows indexed by \(X\). For example, with \(S=\{1,2,3\}\) and \(X=\{1,3\}\), then
\[ R(X)=(a+g)(b+h)(c+i). \] We can thus write the above method for obtaining the permanent as:
\begin{align*} \text{per}(M)&=\sum_{\emptyset\ne X\subseteq S}(-1)^{n-|X|}R(X)\\ &= (-1)^n\sum_{\emptyset\ne X\subseteq S}(-1)^{|X|}R(X) \end{align*}
This is Ryser's algorithm.
Naive Implementation
A naive implementation would be to simply iterate through all the non-empty subsets \(X\) of \(S=\{1,2,\ldots,n\}\), and for each subset add those rows, and multiply the resulting sums.
Here's one such Julia function:
1using Combinatorics
2
3function perm1(M)
4 n, nc = size(M)
5 S = 1:n
6 P = 0
7 for X in setdiff(powerset(S),powerset([]))
8 P += (-1)^length(X)*prod(sum(M[i,:] for i in X))
9 end
10 return((-1)^n * P)
11end
Alternatively, we can manage without any extra packages, and use the fact that the subsets of \(S\) correspond to the binary digits of integers between 1 and \(2^n-1\):
1function perm2(M)
2 n,nc = size(m)
3 P = 0
4 for i in (1:2^n-1)
5 indx = digits(i,base=2,pad=n)
6 P += (-1)^sum(indx)*prod(sum(M .* indx,dims=1))
7 end
8 return((-1)^n * P)
9end
Now for some tests. There are very few matrices for which the permanent has a known value; however there are some circulant matrices of zeros and ones whose permanent is known. One such is the \(n\times n\) matrix \(M=m_{ij}\) whose first, second, and third circulant superdiagonals are ones; that is, for which
\[ m_{ij}=1 \Leftrightarrow\bmod(j-1,n)\in\{1,2,3\}. \]
(Note that since the permanent is trivially unchanged by any permutation of the rows, \(M\) can be also defined as being a circulant matrix each row of which has three consecutive ones.)
Then
\[ \text{per}(M)=F_{n-1}+F_{n+1}+2 \]
where \(F_k\) is the \(k\) -th Fibonacci number indexed from 1, so that \(F_1=F_2=1\). Note that the sums of Fibonacci numbers whose indices differ by two form the Lucas numbers \(L_n\).
Alternatively,
\[ \text{per}(M)=\text{trace}(C_2^n)+2 \]
where
\[ C_2=\begin{bmatrix}0&1\\\1&1\end{bmatrix} \]
This result, and some others, can be found in the article "Permanents" by Marvin Marcus and Henryk Minc, in The American Mathematical Monthly, Vol. 72, No. 6 (Jun. - Jul., 1965), pp. 577-591.
1julia> n = 10;
2
3julia> M = [1*(mod(j-i,n) in [1,2,3]) for i=1:n, j=1:n]
410×10 Array{Int64,2}:
5 0 1 1 1 0 0 0 0 0 0
6 0 0 1 1 1 0 0 0 0 0
7 0 0 0 1 1 1 0 0 0 0
8 0 0 0 0 1 1 1 0 0 0
9 0 0 0 0 0 1 1 1 0 0
10 0 0 0 0 0 0 1 1 1 0
11 0 0 0 0 0 0 0 1 1 1
12 1 0 0 0 0 0 0 0 1 1
13 1 1 0 0 0 0 0 0 0 1
14 1 1 1 0 0 0 0 0 0 0
15
16julia> perm1(M)
17125
18
19julia> perm2(M)
20125
21
22julia> fibonaccinum(n-1)+fibonaccinum(n+1)+2
23125
24
25julia> lucasnum(n)+2
26125
27
28julia> C2=[0 1; 1 1];
29
30julia> tr(C2^n)+2
31125
However, it seems that using the machinery of subsets adds to the time, which becomes noticeable if \(n\) is large:
1julia> using Benchmarktools
2
3julia> n = 20; M = [1*(mod(j-i,n) in [1,2,3]) for i=1:n, j=1:n];
4
5julia> @time perm1(M)
6 6.703187 seconds (31.46 M allocations: 5.097 GiB, 17.93% gc time)
715129
8
9julia> @time perm2(M)
10 1.677242 seconds (3.21 M allocations: 3.721 GiB, 8.69% gc time)
1115129
That is, the perm2
implementation is about four times faster than perm1
.
Implementation with Gray codes
Here is where we can show some cleverness. Recall that a Gray code is a listing of all numbers 0 through \(2^n-1\) whose binary expansion changes in only one bit between consecutive values. It is also known that for \(1\le k\le 2^n-1\) then the Gray code corresponding to \(k\) is given by \(k\oplus (k\gg 1)\). For example, for \(n=4\):
1julia> for i in 1:15
2 j = xor(i,i>>1)
3 println(lpad(i,2),":\t\b\b\b",bin(i,4),'\t',lpad(j,2),":\t\b\b\b",bin(j,4))
4 end
5 1: [1, 0, 0, 0] 1: [1, 0, 0, 0]
6 2: [0, 1, 0, 0] 3: [1, 1, 0, 0]
7 3: [1, 1, 0, 0] 2: [0, 1, 0, 0]
8 4: [0, 0, 1, 0] 6: [0, 1, 1, 0]
9 5: [1, 0, 1, 0] 7: [1, 1, 1, 0]
10 6: [0, 1, 1, 0] 5: [1, 0, 1, 0]
11 7: [1, 1, 1, 0] 4: [0, 0, 1, 0]
12 8: [0, 0, 0, 1] 12: [0, 0, 1, 1]
13 9: [1, 0, 0, 1] 13: [1, 0, 1, 1]
1410: [0, 1, 0, 1] 15: [1, 1, 1, 1]
1511: [1, 1, 0, 1] 14: [0, 1, 1, 1]
1612: [0, 0, 1, 1] 10: [0, 1, 0, 1]
1713: [1, 0, 1, 1] 11: [1, 1, 0, 1]
1814: [0, 1, 1, 1] 9: [1, 0, 0, 1]
1915: [1, 1, 1, 1] 8: [0, 0, 0, 1]
Note that in the rightmost column of binary expansions, there is only one bit shift between consecutive values: either a single 1 is added, or removed.
For Ryser's algorithm, we can consider this in terms of our sum of rows: if the subsets are given in Gray code order, then moving from one subset to the next is a matter of just adding or subtracting one row from the current sum.
We are not so much interested in the codes themselves, as in their successive differences. For example here are the differences for \(n=4\):
1julia> [xor(k,k>>1)-xor(k-1,(k-1)>>1) for k in 1:15]'
21×14 Adjoint{Int64,Array{Int64,1}}:
3 1 2 -1 4 1 -2 -1 8 1 2 -1 -4 1 -2 -1
This sequence is A055975 in the Online Encyclopaedia of Integer Sequences, and can be computed as
\[ a[n] = \begin{cases} n,&\text{if $n\le 2$}\\ 2a[n/2],&\text{if $n$ is even}\\ (-1)^{(n-1)/2},&\text{if $n$ is odd} \end{cases} \]
We can interpret each term in this sequence as the row that should be added or subtracted from the current sum. A value of \(2^m\) means adding row \(m+1\); a value of \(-2^m\) means subtracting row \(m+1\). From any difference, we can obtain the bit position by taking the logarithm to base 2, and whether to add or subtract by its sign. But in fact the number of differences is very small: only \(2n\), so it will be much easier to create a lookup table, using a dictionary:
1v = vcat([(2^i,i+1,1) for i in 0:n-1],[(-2^i,i+1,-1) for i in 0:n-1])
2lut = Dict(x[1] => (x[2],x[3]) for x in v)
For example, for \(n=3\) the lookup dictionary is
14 => (3, 1)
2-4 => (3, -1)
32 => (2, 1)
4-2 => (2, -1)
5-1 => (1, -1)
61 => (1, 1)
Now we can implement the algorithm by first pre-computing the sequences of differences, and for each element of the sequence, use the lookup dictionary to determine what row is to be added or subtracted. We start with the first row.
Putting all this together gives another implementation of the permanent:
1function perm3(M) # Gray code version with lookup tables
2 n,nc = size(M)
3 if n != nc
4 error("Matrix must be square")
5 end
6 gd = zeros(Int64,1,2^n-1)
7 gd[1] = 1
8
9 v = vcat([(2^i,i+1,1) for i in 0:n-1],[(-2^i,i+1,-1) for i in 0:n-1])
10 lut = Dict(x[1] => (x[2],x[3]) for x in v)
11
12 r = M[1,:] # r will contain the sum of rows
13 s = -1
14 pm = s*prod(r)
15 for i in (2:2^n-1)
16 if iseven(i)
17 gd[i] = 2*gd[div(i,2)]
18 else
19 gd[i] = (-1)^((i-1)/2)
20 end
21 r += M[lut[gd[i]][1],:]*lut[gd[i]][2]
22 s *= -1
23 pm += s*prod(r)
24 end
25 return(pm * (-1)^n)
26end
This can be timed as before:
1julia> @time perm3(M)
2 0.943328 seconds (3.15 M allocations: 728.004 MiB, 15.91% gc time)
315129
This is our best time yet.