This is trivially solved by counting letters in a given id using collections.Counter()
, then noting if there are any 2s or 3s in the counts.
import aocd
data = aocd.get_data(day=2, year=2018)
boxids = [id_.strip() for id_ in data.splitlines()]
from collections import Counter
def checksum(boxids):
twos, threes = 0, 0
for id_ in boxids:
counts = set(Counter(id_).values())
twos += int(2 in counts)
threes += int(3 in counts)
return twos * threes
print("Part 1:", checksum(boxids))
Part 1: 6225
The most efficient path to a solution is to use a trie, a tree structure where nodes are letters of a word. This is often used to do efficient prefix testing (does a given prefix exist in a set of words?), but here you can use it to quickly prune the number of possible ids to search given a prefix.
For the given example, the full trie would be:
├── a
│ ├── b
│ │ └── c
│ │ └── d
│ │ └── e
│ └── x
│ └── c
│ └── y
│ └── e
├── f
│ └── g
│ ├── h
│ │ └── i
│ │ └── j
│ └── u
│ └── i
│ └── j
├── k
│ └── l
│ └── m
│ └── n
│ └── o
├── p
│ └── q
│ └── r
│ └── s
│ └── t
└── w
└── v
└── x
└── y
└── z
but you never have to go that far. You start with abcde
and the trie is quickly updated to
└── a
└── b
└── c
└── d
└── e
after finding that there are no letters in the top level of the trie and a
not being present in the trie (so you insert the whole id, continue).
You then process fghij
; you find the a
at the top level, but there is no aghij
in the trie (traverse to a
, the map has no g
, end of test), so you insert fghij
into the trie:
├── a
│ └── b
│ └── c
│ └── d
│ └── e
└── f
└── g
└── h
└── i
└── j
klmno
is treated the same, there is no almno
and no flmno
in the trie (the two letters at the top of the trie that could replace k
, neither of which have an l
entry in their subtree), so you insert the word at the top. pqrst
is treated the same way; 3 tests for aq
, fq
and kq
all fail. You now have:
├── a
│ └── b
│ └── c
│ └── d
│ └── e
├── f
│ └── g
│ └── h
│ └── i
│ └── j
├── k
│ └── l
│ └── m
│ └── n
│ └── o
└── p
└── q
└── r
└── s
└── t
Now comes fguij
. Testing with f
in that id, you find are no ag
, kg
, and pg
prefixes, but f
does exist, so it is worth progressing to the guij
substring of the id and the f
subtree (g
-> h
-> i
-> j
) of the trie only. There are no alternative letters, but g
does exist in the subtree, so we continue on, with the subtree (h
-> i
-> j
), and substring uij
.
We test the only key in the trie here, h
, and find that hij
exists in this subtree. We have a match! fg
is the prefix so far, ij
the postfix we tested, so the answer is fgij
.
So for each id we test, we only need to check a very small subset of letters seen soo far.
def intrie(s, trie):
for c in s:
trie = trie.get(c)
if trie is None:
return False
return True
def matching_ids(boxids):
trie = {}
for id_ in boxids:
current = trie
for i, char in enumerate(id_):
for tchar in current:
if tchar != char:
# check if there is a full postfix match
if intrie(id_[i + 1 :], current[tchar]):
return f"{id_[:i]}{id_[i + 1 :]}"
if char not in current:
# insert remainder as a new trie entry
for c in id_[i:]:
current = current.setdefault(c, {})
break
else:
current = current[char]
testids = """\
abcde
fghij
klmno
pqrst
fguij
axcye
wvxyz""".splitlines()
assert matching_ids(testids) == "fgij"
print("Part 2:", matching_ids(boxids))
Part 2: revtaubfniyhsgxdoajwkqilp