We will want to query `keep_cache` from parallel threads. If we compute
the results on-demand, that means we need synchronization for cache
access in those queries, which adds complexity and overhead. Instead, prefill
the cache with the status of all relevant modules. Note that this doesn't
actually do more work --- we always consult `keep_cache` for all cells of
all selected modules, so scanning all those cells and determining the kept
status of all dependency modules is always required.
Later in this PR we're going to parallelize `scan_module` itself, and that's also
much easier to do when no other parallel threads are running.