Find first occurrence from sorted array
1. Problem statement
Given a sorted array, find the first occurrence of an item k in the array
(Aziz et al., 2018, p. 155). If the item does not exist, return None
.
Input: [1, 2, 4, 7, 8], k=7 Output: 3 (index 3) Input: [1], k=2 Output: None
2. Insights
2.1. Duplicate keys
If there are no duplicate keys (best-case scenario), then we can do binary search to get to the first occurrence.
If there are duplicates though (what if all items are equal to k?), then a single binary search iteration will not be enough.
Hint: if the middle element of the input is equal to k, we can eliminate all candidates to the right.
3. Solution
3.1. Brute force
The most obvious solution is to just iterate through the list, checking each item to see if it equals k.
3.2. Optimal
We can do a binary search once to figure out if the item exists in the array. Then we can do a linear search to the left in case the one we binary-searched-to is a duplicate, to find the first occurrence.
Obviously, the worst case complexity is quite bad. If there are 1 million items all with the same item k, then binary search will terminate after a single pass at the middle (500,000th) element. Then we'd still need to iterate 500,000 times to the left to find the first occurrence at index 0. As the number of items \(n\) in the array grow, the worst-case time complexity also grows linearly in proportion to \(n\). So our worst-case time complexity is \(O(n)\).
We can instead continually do a binary search until we find the first occurrence of the element. Whereas normal binary search stops when it finds the item, in our modified version we can eliminate everything to the right, because at best they will contain non-first duplicates of k or non-k items.
3.3. Optimal (Pythonic)
The Pythonic solution is to use the built-in index()
method for arrays, or the
bisect
module. Both of these solutions delegate the implementation details, so
there's not much work for us to do.