Carroll's "improved" Doublets: allowing permutations

Share on:

Carroll originally invented his Doublets in 1877, they were published in "Vanity Fair" (the magazine, not the Thackeray novel) in 1879. Some years later, in an 1892 letter, Carroll added another rule: that permutations were allowed. This allows very neat chains such as:

roses, noses, notes, steno, stent, scent

Because the words stay the same length here, but more connectivity is allowed, we would expect that not only would the largest connected component of the graph be bigger than before, but that chains would be shorter. And this time we can connect "chair" with "table":

chair, chain, china, chink, clink, blink, blind, blend, blent, bleat, table

To test if two words are permutations, we can create two small function:

1Julia> ssort(x) = join(sort(collect(x)))
2Julia> scmp(x,y) = cmp(ssort(x),ssort(y))

The first function ssort simply alphabetises a string; the second function scmp compares two sorted strings. The function returns zero if the sorted strings are identical.

We can then create the graph. As before, we'll start with the "medium" word list, and its sublist of five-letter words.

 1Julia> nw = length(words5)
 2Julia> G5 = Graph(nw)
 3
 4Julia> for i in 1:nw
 5           for j in i+1:nw
 6               wi = words5[i]
 7               wj = words5[j]
 8               if (Hamming()(wi,wj) == 1) | (scmp(wi,wj) == 0)
 9                   add_edge!(G5,i,j)
10               end
11            end
12        end

This graph G5 has 4388 vertices, 11107 edges. As before, find the largest connected component:

1Julia> CC = connected_components(G5)
2Julia> CL = map(length, CC)
3Julia> mx, indx = findmax(CL)
4Julia> C1, vmap1 = induced_subgraph(G5,CC[indx])

This new graph has 3665 vertices and 10946 edges. This is larger than the graph using only the Hamming distance, which had 4072 vertices.

Rather than just find aloof words, we'll find the number of connected components of all sizes; it turns out that there are only a small number of different such sizes:

 1Julia> u = sort(unique(CL), rev = true)
 2Julia> show(u)
 3
 4       [3665, 12, 6, 5, 4, 3, 2, 1]
 5
 6Julia> freqs = zeros(Int16,2,length(u))
 7Julia> for i in 1:length(u)
 8           freqs[1,i] = u[i]
 9           freqs[2,i] = count(x -> x == u[i], CL))
10       end
11Julia> display(freqs)
12
13       2×8 Matrix{Int16}:
14       3665  12  6  5  4   3   2    1
15          1   1  2  4  4  18  62  485

We see that there are 485 aloof words (less than before), and various small components. The components between 4 and 12 words are:

alive, voice, alike, olive, voile

allow, local, loyal, royal, vocal, aglow, allay, alley, allot, alloy, focal, atoll

group, tutor, trout, croup, grout

radio, ratio, patio, radii

amber, embed, ebbed, ember, umbel, umber

chief, thief, fiche, niche

fizzy, fuzzy, dizzy, tizzy

moron, bacon, baron, baton, boron

ocean, canoe, canon, capon

pupil, papal, papas, pupae, pupal

buddy, giddy, muddy, ruddy, biddy, middy

And now for the longest ladder:

1Julia> eccs = eccentricity(CL);
2Julia> mx = maximum(eccs)
3Julia> inds = findall(x -> x==mx, eccs)
4Julia> show(inds)
5
6       [83, 2984, 3024]

These correspond to the words "court", "cabby", "gabby". The last two words are adjacent in the graph, but each of the other pairs produces a ladder of maximum length of 27. Here's one of them:

1Julia> ld = ladder(subwords[inds[1]], subwords[inds[2]])
2Julia> println(join(ld,",  "),"\nlength is ",length(ld))
3
4       court,  count,  mount,  mound,  wound,  would,  could,  cloud,  clout,  flout,
5       flour,  floor,  flood,  blood,  brood,  broad,  board,  boars,  boors,  boobs,
6       booby,  bobby,  hobby,  hubby,  tubby,  tabby,  cabby
7       length is 27