# Permanents and Ryser's algorithm

As I discussed in my last blog post, the *permanent* of an \(n\times n\) matrix
\(M=m_{ij}\) is defined as
\[
\text{per}(M)=\sum_{\sigma\in S_n}\prod_{i=1}^nm_{i,\sigma(i)}
\]
where the sum is taken over all permutations of the \(n\) numbers \(1,2,\ldots,n\).
It differs from the better known determinant in having no sign changes. For
example:
\[
\text{per} \begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix}
=aei+afh+bfg+bdi+cdi+ceg.
\]
By comparison, here is the determinant:
\[
\text{det} \begin{bmatrix}a&b&c\\d&e&f\\g&h&i\end{bmatrix}
=aei - afh + bfg - bdi + cdi - ceg.
\]
The apparent simplicity of the permanent definition hides the fact that there is
no known sub-exponential algorithm to compute it, nor does it satisfy most of
the nice properties of determinants. For example, we have

\[ \text{det}(AB)=\text{det}(A)\text{det}(B) \]

but in general \(\text{per}(AB)\ne\text{per}(A)\text{per}(B)\). Nor is the permanent zero if two rows are equal, or if any subset of rows is linearly dependent.

Applying the definition and summing over all the permutations is prohibitively slow; of \(O(n!n)\) complexity, and unusable except for very small matrices.

In the small but excellent textbook "Combinatorial Mathematics" by Herbert J. Ryser and published in 1963, one chapter is devoted to the inclusion-exclusion principle, of which the computation of permanents is given as an example. The permanent may be considered as a sum of products, where in each product we choose one value from each row and one value from each column.

Suppose we start by adding the rows together and multiplying them: \[ P = (a+d+g)(b+e+h)(c+f+i). \] This will certainly contain all elements of the permanent, but it also includes products we don't want, such as for example \(aef\) where the elements are chosen from only two rows, and \(ghi\) where the elements are all in one row.

To eliminate all possible products from only two rows we subtract them:

\begin{align*}
P & -(a+d)(b+e)(c+f)\text{ rows $1$ and $2$}\\\

& - (a+g)(b+h)(c+i)\text{ rows $1$ and $3$}\\\

& - (d+g)(e+h)(f+i)\text{ rows $2$ and $3$}
\end{align*}

The trouble with that subtraction is that we are subtracting products of each individual row twice; for example \(abc\) is in the first and second products. But we only want to subtract those products once from \(P\). So we have to add them again:

\begin{align*}
P & -(a+d)(b+e)(c+f)\qquad\text{ rows $1$ and $2$}\\\

& - (a+g)(b+h)(c+i)\qquad\text{ rows $1$ and $3$}\\\

& - (d+g)(e+h)(f+i)\qquad\text{ rows $2$ and $3$}\\\

& + abc\\\

& + def\\\

& + ghi.
\end{align*}

Computing the permanent of a \(4\times 4\) matrix would start by adding all the rows and multiplying the sums. Then we would subtract all products of rows taken three at a time. But this would subtract all products of rows taken two at a time twice for each pair, so we add those products back in again. Finally we find that the number of times we've subtracted products of a single row have cancelled out, so we need to subtract them again:

\begin{array}{ccccc}
&1&2&3&4\\\

-&1&2&3&\\\

-&1&2&&4\\\

-&1&&3&4\\\

-&&2&3&4\\\

+&1&2&&\\\

+&1&&3&\\\

+&1&&&4\\\

+&&2&3&\\\

+&&2&3&\\\

+&&&3&4\\\

-&1\\\

-&&2\\\

-&&&3\\\

-&&&&4
\end{array}

After all of this we are left with only those products which include all fours rows and columns. And as you see, this is a standard inclusion-exclusion approach.

For an \(n\times n\) matrix, let \(S=\{1,2,\ldots,n\}\), and let \(X\subseteq S\). Then define \(R(X)\) to be the product of the sums of elements of rows indexed by \(X\). For example, with \(S=\{1,2,3\}\) and \(X=\{1,3\}\), then

\[ R(X)=(a+g)(b+h)(c+i). \] We can thus write the above method for obtaining the permanent as:

\begin{align*}
\text{per}(M)&=\sum_{\emptyset\ne X\subseteq S}(-1)^{n-|X|}R(X)\\\

&= (-1)^n\sum_{\emptyset\ne X\subseteq S}(-1)^{|X|}R(X)
\end{align*}

This is *Ryser's algorithm*.

## Naive Implementation

A naive implementation would be to simply iterate through all the non-empty subsets \(X\) of \(S=\{1,2,\ldots,n\}\), and for each subset add those rows, and multiply the resulting sums.

Here's one such Julia function:

```
using Combinatorics
function perm1(M)
n, nc = size(M)
S = 1:n
P = 0
for X in setdiff(powerset(S),powerset([]))
P += (-1)^length(X)*prod(sum(M[i,:] for i in X))
end
return((-1)^n * P)
end
```

Alternatively, we can manage without any extra packages, and use the fact that the subsets of \(S\) correspond to the binary digits of integers between 1 and \(2^n-1\):

```
function perm2(M)
n,nc = size(m)
P = 0
for i in (1:2^n-1)
indx = digits(i,base=2,pad=n)
P += (-1)^sum(indx)*prod(sum(M .* indx,dims=1))
end
return((-1)^n * P)
end
```

Now for some tests. There are very few matrices for which the permanent has a known value; however there are some circulant matrices of zeros and ones whose permanent is known. One such is the \(n\times n\) matrix \(M=m_{ij}\) whose first, second, and third circulant superdiagonals are ones; that is, for which

\[ m_{ij}=1 \Leftrightarrow\bmod(j-1,n)\in\{1,2,3\}. \]

(Note that since the permanent is trivially unchanged by any permutation of the rows, \(M\) can be also defined as being a circulant matrix each row of which has three consecutive ones.)

Then

\[ \text{per}(M)=F_{n-1}+F_{n+1}+2 \]

where \(F_k\) is the \(k\) -th Fibonacci number indexed from 1, so that
\(F_1=F_2=1\). Note that the sums of Fibonacci numbers whose indices differ by
two form the *Lucas numbers* \(L_n\).

Alternatively,

\[ \text{per}(M)=\text{trace}(C_2^n)+2 \]

where

\[ C_2=\begin{bmatrix}0&1\\1&1\end{bmatrix} \]

This result, and some others, can be found in the article "Permanents" by Marvin
Marcus and Henryk Minc, in *The American Mathematical Monthly*, Vol. 72, No. 6
(Jun. - Jul., 1965), pp. 577-591.

```
julia> n = 10;
julia> M = [1*(mod(j-i,n) in [1,2,3]) for i=1:n, j=1:n]
10×10 Array{Int64,2}:
0 1 1 1 0 0 0 0 0 0
0 0 1 1 1 0 0 0 0 0
0 0 0 1 1 1 0 0 0 0
0 0 0 0 1 1 1 0 0 0
0 0 0 0 0 1 1 1 0 0
0 0 0 0 0 0 1 1 1 0
0 0 0 0 0 0 0 1 1 1
1 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 1
1 1 1 0 0 0 0 0 0 0
julia> perm1(M)
125
julia> perm2(M)
125
julia> fibonaccinum(n-1)+fibonaccinum(n+1)+2
125
julia> lucasnum(n)+2
125
julia> C2=[0 1; 1 1];
julia> tr(C2^n)+2
125
```

However, it seems that using the machinery of subsets adds to the time, which becomes noticeable if \(n\) is large:

```
julia> using Benchmarktools
julia> n = 20; M = [1*(mod(j-i,n) in [1,2,3]) for i=1:n, j=1:n];
julia> @time perm1(M)
6.703187 seconds (31.46 M allocations: 5.097 GiB, 17.93% gc time)
15129
julia> @time perm2(M)
1.677242 seconds (3.21 M allocations: 3.721 GiB, 8.69% gc time)
15129
```

That is, the `perm2`

implementation is about four times faster than `perm1`

.

## Implementation with Gray codes

Here is where we can show some cleverness. Recall that a *Gray code* is a
listing of all numbers 0 through \(2^n-1\) whose binary expansion changes in only
one bit between consecutive values. It is also known that for \(1\le k\le 2^n-1\)
then the Gray code corresponding to \(k\) is given by \(k\oplus (k\gg 1)\). For
example, for \(n=4\):

```
julia> for i in 1:15
j = xor(i,i>>1)
println(lpad(i,2),":\t\b\b\b",bin(i,4),'\t',lpad(j,2),":\t\b\b\b",bin(j,4))
end
1: [1, 0, 0, 0] 1: [1, 0, 0, 0]
2: [0, 1, 0, 0] 3: [1, 1, 0, 0]
3: [1, 1, 0, 0] 2: [0, 1, 0, 0]
4: [0, 0, 1, 0] 6: [0, 1, 1, 0]
5: [1, 0, 1, 0] 7: [1, 1, 1, 0]
6: [0, 1, 1, 0] 5: [1, 0, 1, 0]
7: [1, 1, 1, 0] 4: [0, 0, 1, 0]
8: [0, 0, 0, 1] 12: [0, 0, 1, 1]
9: [1, 0, 0, 1] 13: [1, 0, 1, 1]
10: [0, 1, 0, 1] 15: [1, 1, 1, 1]
11: [1, 1, 0, 1] 14: [0, 1, 1, 1]
12: [0, 0, 1, 1] 10: [0, 1, 0, 1]
13: [1, 0, 1, 1] 11: [1, 1, 0, 1]
14: [0, 1, 1, 1] 9: [1, 0, 0, 1]
15: [1, 1, 1, 1] 8: [0, 0, 0, 1]
```

Note that in the rightmost column of binary expansions, there is only one bit shift between consecutive values: either a single 1 is added, or removed.

For Ryser's algorithm, we can consider this in terms of our sum of rows: if the subsets are given in Gray code order, then moving from one subset to the next is a matter of just adding or subtracting one row from the current sum.

We are not so much interested in the codes themselves, as in their successive differences. For example here are the differences for \(n=4\):

```
julia> [xor(k,k>>1)-xor(k-1,(k-1)>>1) for k in 1:15]'
1×14 Adjoint{Int64,Array{Int64,1}}:
1 2 -1 4 1 -2 -1 8 1 2 -1 -4 1 -2 -1
```

This sequence is A055975 in the Online Encyclopaedia of Integer Sequences, and can be computed as

\[
a[n] = \begin{cases}
n,&\text{if $n\le 2$}\\\

2a[n/2],&\text{if $n$ is even}\\\

(-1)^{(n-1)/2},&\text{if $n$ is odd}
\end{cases}
\]

We can interpret each term in this sequence as the row that should be added or subtracted from the current sum. A value of \(2^m\) means adding row \(m+1\); a value of \(-2^m\) means subtracting row \(m+1\). From any difference, we can obtain the bit position by taking the logarithm to base 2, and whether to add or subtract by its sign. But in fact the number of differences is very small: only \(2n\), so it will be much easier to create a lookup table, using a dictionary:

```
v = vcat([(2^i,i+1,1) for i in 0:n-1],[(-2^i,i+1,-1) for i in 0:n-1])
lut = Dict(x[1] => (x[2],x[3]) for x in v)
```

For example, for \(n=3\) the lookup dictionary is

```
4 => (3, 1)
-4 => (3, -1)
2 => (2, 1)
-2 => (2, -1)
-1 => (1, -1)
1 => (1, 1)
```

Now we can implement the algorithm by first pre-computing the sequences of differences, and for each element of the sequence, use the lookup dictionary to determine what row is to be added or subtracted. We start with the first row.

Putting all this together gives another implementation of the permanent:

```
function perm3(M) # Gray code version with lookup tables
n,nc = size(M)
if n != nc
error("Matrix must be square")
end
gd = zeros(Int64,1,2^n-1)
gd[1] = 1
v = vcat([(2^i,i+1,1) for i in 0:n-1],[(-2^i,i+1,-1) for i in 0:n-1])
lut = Dict(x[1] => (x[2],x[3]) for x in v)
r = M[1,:] # r will contain the sum of rows
s = -1
pm = s*prod(r)
for i in (2:2^n-1)
if iseven(i)
gd[i] = 2*gd[div(i,2)]
else
gd[i] = (-1)^((i-1)/2)
end
r += M[lut[gd[i]][1],:]*lut[gd[i]][2]
s *= -1
pm += s*prod(r)
end
return(pm * (-1)^n)
end
```

This can be timed as before:

```
julia> @time perm3(M)
0.943328 seconds (3.15 M allocations: 728.004 MiB, 15.91% gc time)
15129
```

This is our best time yet.