The Butera-Pernici algorithm (2)

Jan 6, 2020 mathematics computation

The purpose of this post will be to see if we can implement the algorithm in Julia, and thus leverage Julia's very fast execution time.

We are working with polynomials defined on nilpotent variables, which means that the degree of any generator in a polynomial term will be 0 or 1. Assume that our generators are indexed from zero: \(x_0,x_1,\ldots,x_{n-1}\), then any term in a polynomial will have the form \[ cx_{i_1}x_{i_2}\cdots x_{i_k} \] where \(\{x_{i_1}, x_{i_2},\ldots, x_{i_k}\}\subseteq\{0,1,2,\ldots,n-1\}\). We can then express this term as an element of a dictionary {k => v} where \[ k = 2^{i_1}+2^{i_2}+\cdots+2^{i_k}. \] So, for example, the polynomial term \(7x_2x_3x_5\) would correspond to the dictionary term

44 => 7

since \(44 = 2^2+2^3+2^5\). Two polynomial terms {k1 => v1} and {k2 => v2} with no variables in common can then be multiplied simply by adding the k terms, and multiplying the v values, to obtain {k1+k2 => v1*v2} . And we can check if k1 and k2 have a common variable easily by evaluating k1 & k2; a non-zero value indicates a common variable. This leads to the following Julia function for multiplying two such dictionaries:

 1function poly_dict_mul(p1, p2)
 2  p3 = Dict{BigInt,BigInt}()
 3  for (k1, v1) in p1
 4    for (k2, v2) in p2
 5          if k1 & k2 > 0
 6            continue
 7          else
 8            if k1 + k2 in keys(p3)
 9                  p3[k1+k2] += v1 * v2
10            else
11                  p3[k1+k2] = v1 * v2
12            end
13          end
14    end
15    end
16  return (p3)
17end

As you see, this is a simple double loop over the terms in each polynomial dictionary. If two terms have a non-zero conjunction, we simply move on. If two terms when added already exist in the new dictionary, we add to that term. If the sum of terms is new, we create a new dictionary element. The use of BigInt is to ensure that no matter how big the terms and coefficients become, we don't suffer from arithmetic overflow.

For example, suppose we consider the product \[ (x_0+x_1+x_2)(x_1+x_2+x_3). \] A straightforward expansion produces \[ x_0x_3 + x_1x_3 + x_1^2 + x_0x_1 + x_0x_2 + 2x_1x_2 + x_2^2 + x_2x_3. \] which by nilpotency becomes \[ x_0x_3 + x_1x_3 + x_0x_1 + x_0x_2 + 2x_1x_2 + x_2x_3. \] The dictionaries corresponding to the two polynomials are

{1 => 1, 2 => 1, 4 => 1}

and

{2 => 1, 4 => 1, 8 => 1}

Then:

1julia> poly_dict_mul(Dict(1=>1,2=>1,4=>1),Dict(2=>1,4=>1,8=>1))
2Dict{BigInt,BigInt} with 6 entries:
3  9  => 1
4  10 => 1
5  3  => 1
6  5  => 1
7  6  => 2
8  12 => 1

If we were to rewrite the keys as binary numbers, we would have

{1001 => 1, 1010 => 1, 11 => 1, 101 => 1, 110 => 2, 1100 => 1}

in which you can see that each term corresponds with the term of the product above.

Having conquered multiplication, finding the permanent should then require two steps:

Turning each row of the matrix into a polynomial dictionary.
Starting with \(p=1\), multiply all rows together, one at a time.

For step 1, suppose we have a row \(i\) of a matrix \(M=m_{ij}\). Then starting with an empty dictionary p, we move along the row, and for each non-zero element \(m_{ij}\) we add the term p[BigInt(1)<<j] = M[i,j]. For speed we use bit operations instead of arithmetic operations. This means we can create a list of all polynomial dictionaries:

 1function mat_polys(M)
 2  (n,ncols) = size(M)
 3  ps = []
 4  for i in 1:n
 5    p = Dict{BigInt,BigInt}()
 6    for j in 1:n
 7      if M[i,j] == 0
 8            continue
 9      else
10            p[BigInt(1)<<(j-1)] = M[i,j]
11      end
12    end
13    push!(ps,p)
14  end
15  return(ps)
16end

Step 2 is a simple loop; the permanent will be given as the value in the final step:

1function poly_perm(M)
2  (n,ncols) = size(M)
3  mp = mat_polys(M)
4  p = Dict{BigInt,BigInt}(0=>1)
5  for i in 1:n
6    p = poly_dict_mul(p,mp[i])
7  end
8  return(collect(values(p))[1])
9end

We don't in fact need two separate functions here; since the polynomial dictionary for each row is only used once, we could simply create each one as we needed. However, given that none of our matrices will be too large, the saving of time and space would be minimal.

Now for a few tests:

 1julia> n = 10; M = [BigInt(1)*mod(j-i,n) in [1,2,3] for i = 1:n, j = 1:n);
 2julia> poly_perm(M)
 3125
 4
 5julia> n = 20; M = [BigInt(1)*mod(j-i,n) in [1,2,3] for i = 1:n, j = 1:n);
 6julia> @time poly_perm(M)
 7  0.003214 seconds (30.65 k allocations: 690.875 KiB)
 815129
 9
10julia> n = 40; M = [BigInt(1)*mod(j-i,n) in [1,2,3] for i = 1:n, j = 1:n);
11julia> @time poly_perm(M)
12  0.014794 seconds (234.01 k allocations: 5.046 MiB)
13228826129
14
15julia> n = 100; M = [BigInt(1)*mod(j-i,n) in [1,2,3] for i = 1:n, j = 1:n);
16julia> @time poly_perm(M)
17  0.454841 seconds (3.84 M allocations: 83.730 MiB, 27.98% gc time)
18792070839848372253129
19
20julia> lucasnum(n)+2
21792070839848372253129

This is extraordinarily fast, especially compared with our previous attempts: naive attempts using all permutations, and using Ryser's algorithm.

A few comparisons

Over the previous blog posts, we have explored various different methods of computing the permanent:

permanent, which is the most naive method, using the formal definition, and summing over all the permutations \(S_n\).
perm1, Ryser's algorithm, using the Combinatorics package and iterating over all non-empty subsets of \(\{1,2,\ldots,n\}\).
perm2, Same as perm1 but instead of using subsets, we use all non-zero binary vectors of length n.
perm3, Ryser's algorithm using Gray codes to speed the transition between subsets, and using a lookup table.

All these are completely general, and aside from the first function, which is the most inefficient, can be used for any matrix up to size about \(25\times 25\).

So consider the \(n\times n\) circulant matrix with three ones in each row, whose permanent is \(L(n)+2\). The following table shows times in seconds (except where minutes is used) for each calculation:

	10	12	15	20	30	40	60	100
`permanent`	9.3	-	-	-	-	-	-	-
`perm1`	0.014	0.18	0.72	47	-	-	-	-
`perm2`	0.03	0.105	2.63	166	-	-	-	-
`perm3`	0.004	0.016	0.15	12.4	-	-	-	-
`poly_perm`	0.0008	0.004	0.001	0.009	0.008	0.02	0.05	0.18

Assuming that the time taken for permanent is roughly proportional to \(n!n\), then we would expect that the time for matrices of sizes 23 and 24 would be about \(1.5\times 10^{17}\) and \(3.8\times 10^{18}\) seconds respectively. Note that the age of the universe is approximately \(4.32\times 10^{17}\) seconds, so my laptop would need to run for about the third of the universe's age to compute the permanent of a \(23\times 23\) matrix. That's about the time since the solar system and the Earth were formed.

Note also that poly_perm will slow down if the number of non-zero values in each row increases. For example, with four consecutive ones in each row, it takes over 10 seconds for a \(100\times 100\) matrix. With five ones in each row, it takes about 2.7 and 21.6 seconds respectively for matrices of size 40 and 60. Extrapolating indicates that it would take about 250 seconds for the \(100\times 100\) matrix. In general, an \(n\times n\) matrix with \(k\) non-zero elements in each row will have a time complexity approximately of order \(n^k\). However, including the extra optimization (which we haven't done) that allows for elements to be set to one before the multiplication, produces an algorithm whose complexity is \(O(2^{\text{min}(2w,n)}(w+1)n^2)\) where \(n\) is the size of the matrix, and \(w\) its band-width. See the original paper for details.