Carroll's "improved" Doublets: allowing permutations
Carroll originally invented his Doublets in 1877, they were published in "Vanity Fair" (the magazine, not the Thackeray novel) in 1879. Some years later, in an 1892 letter, Carroll added another rule: that permutations were allowed. This allows very neat chains such as:
Because the words stay the same length here, but more connectivity is allowed, we would expect that not only would the largest connected component of the graph be bigger than before, but that chains would be shorter. And this time we can connect "chair" with "table":
To test if two words are permutations, we can create two small function:
1Julia> ssort(x) = join(sort(collect(x)))
2Julia> scmp(x,y) = cmp(ssort(x),ssort(y))
The first function ssort
simply alphabetises a string; the second function
scmp
compares two sorted strings. The function returns zero if the sorted
strings are identical.
We can then create the graph. As before, we'll start with the "medium" word list, and its sublist of five-letter words.
1Julia> nw = length(words5)
2Julia> G5 = Graph(nw)
3
4Julia> for i in 1:nw
5 for j in i+1:nw
6 wi = words5[i]
7 wj = words5[j]
8 if (Hamming()(wi,wj) == 1) | (scmp(wi,wj) == 0)
9 add_edge!(G5,i,j)
10 end
11 end
12 end
This graph G5 has 4388 vertices, 11107 edges. As before, find the largest connected component:
1Julia> CC = connected_components(G5)
2Julia> CL = map(length, CC)
3Julia> mx, indx = findmax(CL)
4Julia> C1, vmap1 = induced_subgraph(G5,CC[indx])
This new graph has 3665 vertices and 10946 edges. This is larger than the graph using only the Hamming distance, which had 4072 vertices.
Rather than just find aloof words, we'll find the number of connected components of all sizes; it turns out that there are only a small number of different such sizes:
1Julia> u = sort(unique(CL), rev = true)
2Julia> show(u)
3
4 [3665, 12, 6, 5, 4, 3, 2, 1]
5
6Julia> freqs = zeros(Int16,2,length(u))
7Julia> for i in 1:length(u)
8 freqs[1,i] = u[i]
9 freqs[2,i] = count(x -> x == u[i], CL))
10 end
11Julia> display(freqs)
12
13 2×8 Matrix{Int16}:
14 3665 12 6 5 4 3 2 1
15 1 1 2 4 4 18 62 485
We see that there are 485 aloof words (less than before), and various small components. The components between 4 and 12 words are:
allow, local, loyal, royal, vocal, aglow, allay, alley, allot, alloy, focal, atoll
group, tutor, trout, croup, grout
radio, ratio, patio, radii
amber, embed, ebbed, ember, umbel, umber
chief, thief, fiche, niche
fizzy, fuzzy, dizzy, tizzy
moron, bacon, baron, baton, boron
ocean, canoe, canon, capon
pupil, papal, papas, pupae, pupal
buddy, giddy, muddy, ruddy, biddy, middy
And now for the longest ladder:
1Julia> eccs = eccentricity(CL);
2Julia> mx = maximum(eccs)
3Julia> inds = findall(x -> x==mx, eccs)
4Julia> show(inds)
5
6 [83, 2984, 3024]
These correspond to the words "court", "cabby", "gabby". The last two words are adjacent in the graph, but each of the other pairs produces a ladder of maximum length of 27. Here's one of them:
1Julia> ld = ladder(subwords[inds[1]], subwords[inds[2]])
2Julia> println(join(ld,", "),"\nlength is ",length(ld))
3
4 court, count, mount, mound, wound, would, could, cloud, clout, flout,
5 flour, floor, flood, blood, brood, broad, board, boars, boors, boobs,
6 booby, bobby, hobby, hubby, tubby, tabby, cabby
7 length is 27