Intersection of two sorted arrays

1. Problem statement

Compute the intersection of two sorted arrays (Aziz et al., 2018, p. 194). Deduplicate the output.

Input: [1, 2] [3, 4]
Output: []

Input: [1, 1, 1, 2] [1, 2, 2, 3]
Output: [1, 2]

2. Insights

2.1. Sorted inputs

This problem is begging us to exploit the fact that the inputs are already sorted.

3. Solution

3.1. Hash tables

We could use hash tables. We simply create two set() objects (essentially hash tables that only have keys, without associated values), and then get the intersection between them.

def brute1(xs: List[int], ys: List[int]) -> List[int]:
    result = list(set(xs) & set(ys))
    result.sort()
    return result

The downside is the space complexity, which is \(O(mn)\) to store the additional sets for each of the arrays, not to mention the sort() we have to run because the set() function does not preserve order.

Also, using a hash table here is wasteful, because we are not taking advatage of the fact that the inputs are sorted.

We should try to find a solution that tries to avoid prohibitively expensive space complexity.

3.2. Brute force

We can compare all letters with overy other letter, by looping over twice.

def brute2(xs: List[int], ys: List[int]) -> List[int]:
    result = []
    for x in xs:
        # Go to next x if it's a dupe.
        if x in result:
            continue
        for y in ys:
            # Similarly, skip over dupes in y.
            if y in result:
                continue
            if x == y:
                result.append(x)
                break
    return result

This has time complexity \(O(xy)\) where \(x\) and \(y\) are lengths of the xs and ys arrays. It's actually worse than this because we're assuming that the expressions x in result and y in result run in constant time, which is simply not true. But as the time complexity is already bad, we leave it at that.

3.4. Optimal

In merged linked lists together, we saw that we only needed to traverse through the two linked lists once because they were already sorted. We can apply the same principle here and only traverse through the xs and ys lists once. The trick is to have a generic loop and within this loop, have two separate indices for each array. Then we advance the index for xs or ys depending on how we process the current item pointed to by each of these indices.

def optimal(xs: List[int], ys: List[int]) -> List[int]:
    i = 0
    j = 0
    result = []

    while i < len(xs) and j < len(ys):
        # Skip over non-matching items (including duplicates).
        if xs[i] < ys[j]:
            i += 1
        elif xs[i] > ys[j]:
            j += 1
        # We have a match.
        else:
            # Again, avoid adding the same item twice into the result.
            if i == 0 or xs[i] != xs[i - 1]:
                result.append(xs[i])

            # We've consumed the information pointed to by both pointers, so
            # they are useless now. Advance the pointers to fetch new content
            # for the next iteration.
            i += 1
            j += 1

    return result

Time complexity is \(O(m+n)\), because we spend \(O(1)\) time per element across both inputs.

4. Tests

🔗
import bisect
from hypothesis import given, strategies as st
from typing import List
import unittest

solution

class Test(unittest.TestCase):
    test_cases

if __name__ == "__main__":
    unittest.main(exit=False)

4.1. Basic tests

def test_basic(self):
    # Empty inputs.
    xs = []
    ys = []
    result = brute1(xs, ys)
    self.assertEqual(result, [])

    # Basic examples, as described in the problem statement.
    xs = [1, 2]
    ys = [3, 4]
    result = brute1(xs, ys)
    self.assertEqual(result, [])
    xs = [1, 1, 1, 2]
    ys = [1, 2, 2, 3]
    result = brute1(xs, ys)
    self.assertEqual(result, [1, 2])

4.2. Property-based tests

@given(st.lists(st.integers(min_value=1, max_value=50),
                min_size=1,
                max_size=50),
       st.lists(st.integers(min_value=1, max_value=50),
                min_size=1,
                max_size=50))
def test_random(self, xs: List[int], ys: List[int]):
    xs.sort()
    ys.sort()

    result_brute1 = brute1(xs, ys)
    result_brute2 = brute2(xs, ys)
    result_better = better(xs, ys)
    result_optimal = optimal(xs, ys)

    # Do the solutions agree with each other?
    self.assertEqual(result_brute1, result_brute2)
    self.assertEqual(result_brute2, result_better)
    self.assertEqual(result_better, result_optimal)

5. References

Aziz, A., Lee, T.-H., & Prakash, A. (2018). Elements of Programming Interviews in Python: The Insiders’ Guide. CreateSpace Independent Publishing Platform (25 July. 2018).