Anonymous message from magazine
1. Problem statement
Given two input texts message and magazine, determine if the message can be reconstructed by consuming the letters from the magazine (Aziz et al., 2018, p. 175).
Input: message = "hello", magazine = "old" Output: False Input: message = "hello", magazine = "old elf hat" Output: True
2. Insights
2.1. Set theory
The message can only be constructed from the magazine if it is a subset of the magazine text.
If we simplify the problem to just 1 letter (let's say that the message can only be constructed out of 1 letter), then it's just a matter of whether the magazine has enough copies of that letter. For example, if the message requirse 5 "e" letters but the magazine (in total) has only 4 of them, then we cannot construct the message from the magazine.
The original problem is just checking whether all letters in the message satisfy the above constraint.
3. Solution
3.1. Brute force
Search to see if each letter in the message is found in the magazine. Each time we find the letter in the magazine, shrink the magazine by slicing out that letter.
The above algorithm mimics how we as humans would solve the problem naively.
The time complexity is at least \(O(m)\) where \(m\) is the length of the message.
But it is much worse than that because magazine.find(c)
itself has \(O(t)\) time
complexity where \(t\) is the length of the magazine text, as the magazine is not
sorted. It may be that Python's find()
method sorts the string internally
first to cut this cost down, but that is beyond the scope of this exercise (one
could argue that the Python implementation has a custom find()
method that is
not the one found in CPython, for example).
The magazine reconstruction is also costly, and could very well involve
wholesale copying of arrays, which also have \(O(L)\) cost where \(L\) is the length
of the array backing the magazine text. So in total the time complexity would be
\(O(m * (t + L))\). Because of the way \(t\) and \(L\) follow each other closely (the
time to run find()
shrinks as the new magazine allocation shrinks by 1 on each
iteration), we could simplify to \(O(m * n)\) where \(n\) is the length of the
magazine text. If \(m\) and \(n\) are of similar size, this degenerates to quadratic
time complexity!
3.2. Optimal
We can instead use hash tables. Just run the input texts through a corresponding hash table, and then check that the letter frequencies in the message text are less than or equal to the letter frequencies in the magazine text.
Even better, only use one hash table for the message text. Then iterate through the magazine text and tick off the character frequency values until we hit 0 for all letters.
The time complexity is \(O(m + n)\), where \(m\) and \(n\) are the lengths of the message and magazine, respectively. The \(O(m)\) comes from the cost of constructing the hash table, which must iterate through every character in the message (insertion into a hash table is \(O(1)\)).
The \(O(n)\) comes from the worst-case scenario where we have to check every
letter in the magazine (the above for
loop has to run its course).
The space complexity is \(O(L)\) where \(L\) is the size of the hash table.
If the input alphabets of the message is known to be within some small range, such as ASCII, then we could avoid using a hash table and instead just use a fixed size array and record the counts of each letter into the corresponding index in this array. This optimization doesn't improve the time or space complexity, though.
3.3. Pythonic
The Pythonic way is to just use subtraction on the Counter
hash table type
directly. This may not be the fastest because we build up two hash tables
instead of one.