The Butera-Pernici algorithm (2)
The purpose of this post will be to see if we can implement the algorithm in Julia, and thus leverage Julia's very fast execution time.
We are working with polynomials defined on nilpotent variables, which means that
the degree of any generator in a polynomial term will be 0 or 1.  Assume that
our generators are indexed from zero: \(x_0,x_1,\ldots,x_{n-1}\), then any term in a
polynomial will have the form
\[
cx_{i_1}x_{i_2}\cdots x_{i_k}
\]
where \(\{x_{i_1}, x_{i_2},\ldots, x_{i_k}\}\subseteq\{0,1,2,\ldots,n-1\}\).  We can
then express this term as an element of a dictionary {k => v} where
\[
k = 2^{i_1}+2^{i_2}+\cdots+2^{i_k}.
\]
So, for example, the polynomial term \(7x_2x_3x_5\) would correspond to the dictionary term
44 => 7
since \(44 = 2^2+2^3+2^5\).  Two polynomial terms {k1 => v1} and {k2 => v2}
with no variables in common can then be multiplied simply by adding the k terms,
and multiplying the v values, to obtain {k1+k2 => v1*v2} .  And we can check if
k1 and k2 have a common variable easily by evaluating k1 & k2; a non-zero
value indicates a common variable.  This leads to the following Julia function
for multiplying two such dictionaries:
function poly_dict_mul(p1, p2)
  p3 = Dict{BigInt,BigInt}()
  for (k1, v1) in p1
    for (k2, v2) in p2
          if k1 & k2 > 0
            continue
          else
            if k1 + k2 in keys(p3)
                  p3[k1+k2] += v1 * v2
            else
                  p3[k1+k2] = v1 * v2
            end
          end
    end
    end
  return (p3)
end
As you see, this is a simple double loop over the terms in each polynomial
dictionary.  If two terms have a non-zero conjunction, we simply move on.  If
two terms when added already exist in the new dictionary, we add to that term.
If the sum of terms is new, we create a new dictionary element.  The use of
BigInt is to ensure that no matter how big the terms and coefficients become,
we don't suffer from arithmetic overflow.
For example, suppose we consider the product \[ (x_0+x_1+x_2)(x_1+x_2+x_3). \] A straightforward expansion produces \[ x_0x_3 + x_1x_3 + x_1^2 + x_0x_1 + x_0x_2 + 2x_1x_2 + x_2^2 + x_2x_3. \] which by nilpotency becomes \[ x_0x_3 + x_1x_3 + x_0x_1 + x_0x_2 + 2x_1x_2 + x_2x_3. \] The dictionaries corresponding to the two polynomials are
{1 => 1, 2 => 1, 4 => 1}
and
{2 => 1, 4 => 1, 8 => 1}
Then:
julia> poly_dict_mul(Dict(1=>1,2=>1,4=>1),Dict(2=>1,4=>1,8=>1))
Dict{BigInt,BigInt} with 6 entries:
  9  => 1
  10 => 1
  3  => 1
  5  => 1
  6  => 2
  12 => 1
If we were to rewrite the keys as binary numbers, we would have
{1001 => 1, 1010 => 1, 11 => 1, 101 => 1, 110 => 2, 1100 => 1}
in which you can see that each term corresponds with the term of the product above.
Having conquered multiplication, finding the permanent should then require two steps:
- Turning each row of the matrix into a polynomial dictionary.
- Starting with \(p=1\), multiply all rows together, one at a time.
For step 1, suppose we have a row \(i\) of a matrix \(M=m_{ij}\).  Then starting
with an empty dictionary p, we move along the row, and for each non-zero element
\(m_{ij}\) we add the term p[BigInt(1)<<j] = M[i,j].  For speed we use bit
operations instead of arithmetic operations.  This means we can create a list of
all polynomial dictionaries:
function mat_polys(M)
  (n,ncols) = size(M)
  ps = []
  for i in 1:n
    p = Dict{BigInt,BigInt}()
    for j in 1:n
      if M[i,j] == 0
            continue
      else
            p[BigInt(1)<<(j-1)] = M[i,j]
      end
    end
    push!(ps,p)
  end
  return(ps)
end
Step 2 is a simple loop; the permanent will be given as the value in the final step:
function poly_perm(M)
  (n,ncols) = size(M)
  mp = mat_polys(M)
  p = Dict{BigInt,BigInt}(0=>1)
  for i in 1:n
    p = poly_dict_mul(p,mp[i])
  end
  return(collect(values(p))[1])
end
We don't in fact need two separate functions here; since the polynomial dictionary for each row is only used once, we could simply create each one as we needed. However, given that none of our matrices will be too large, the saving of time and space would be minimal.
Now for a few tests:
julia> n = 10; M = [BigInt(1)*mod(j-i,n) in [1,2,3] for i = 1:n, j = 1:n);
julia> poly_perm(M)
125
julia> n = 20; M = [BigInt(1)*mod(j-i,n) in [1,2,3] for i = 1:n, j = 1:n);
julia> @time poly_perm(M)
  0.003214 seconds (30.65 k allocations: 690.875 KiB)
15129
julia> n = 40; M = [BigInt(1)*mod(j-i,n) in [1,2,3] for i = 1:n, j = 1:n);
julia> @time poly_perm(M)
  0.014794 seconds (234.01 k allocations: 5.046 MiB)
228826129
julia> n = 100; M = [BigInt(1)*mod(j-i,n) in [1,2,3] for i = 1:n, j = 1:n);
julia> @time poly_perm(M)
  0.454841 seconds (3.84 M allocations: 83.730 MiB, 27.98% gc time)
792070839848372253129
julia> lucasnum(n)+2
792070839848372253129
This is extraordinarily fast, especially compared with our previous attempts: naive attempts using all permutations, and using Ryser's algorithm.
A few comparisons
Over the previous blog posts, we have explored various different methods of computing the permanent:
- 
permanent, which is the most naive method, using the formal definition, and summing over all the permutations \(S_n\).
- 
perm1, Ryser's algorithm, using the Combinatorics package and iterating over all non-empty subsets of \(\{1,2,\ldots,n\}\).
- 
perm2, Same asperm1but instead of using subsets, we use all non-zero binary vectors of length n.
- 
perm3, Ryser's algorithm using Gray codes to speed the transition between subsets, and using a lookup table.
All these are completely general, and aside from the first function, which is the most inefficient, can be used for any matrix up to size about \(25\times 25\).
So consider the \(n\times n\) circulant matrix with three ones in each row, whose permanent is \(L(n)+2\). The following table shows times in seconds (except where minutes is used) for each calculation:
| 10 | 12 | 15 | 20 | 30 | 40 | 60 | 100 | |
|---|---|---|---|---|---|---|---|---|
| permanent | 9.3 | - | - | - | - | - | - | - | 
| perm1 | 0.014 | 0.18 | 0.72 | 47 | - | - | - | - | 
| perm2 | 0.03 | 0.105 | 2.63 | 166 | - | - | - | - | 
| perm3 | 0.004 | 0.016 | 0.15 | 12.4 | - | - | - | - | 
| poly_perm | 0.0008 | 0.004 | 0.001 | 0.009 | 0.008 | 0.02 | 0.05 | 0.18 | 
Assuming that the time taken for permanent is roughly proportional to \(n!n\),
then we would expect that the time for matrices of sizes 23 and 24 would be
about \(1.5\times 10^{17}\) and \(3.8\times 10^{18}\) seconds respectively.  Note
that the age of the universe is approximately \(4.32\times 10^{17}\) seconds, so
my laptop would need to run for about the third of the universe's age to compute
the permanent of a \(23\times 23\) matrix.  That's about the time since the solar
system and the Earth were formed.
Note also that poly_perm will slow down if the number of non-zero values in
each row increases.  For example, with four consecutive ones in each row, it
takes over 10 seconds for a \(100\times 100\) matrix.  With five ones in each row,
it takes about 2.7 and 21.6 seconds respectively for matrices of size 40 and 60.
Extrapolating indicates that it would take about 250 seconds for the \(100\times
100\) matrix.  In general, an \(n\times n\) matrix with \(k\) non-zero elements in
each row will have a time complexity approximately of order \(n^k\).  However,
including the extra optimization (which we haven't done) that allows for
elements to be set to one before the multiplication, produces an algorithm whose
complexity is \(O(2^{\text{min}(2w,n)}(w+1)n^2)\) where \(n\) is the size of the
matrix, and \(w\) its band-width.  See the original
paper for details.