In the end, I decided to cache all the values that I collected, but to only do it once, at the end of each call to the subroutine. This meant that I had to collect the results of each value. It gained me a bit in that I was able to make large contributions to the cache early on.
I also made the optimization of eliminating the first 500,000 possible choices right of the bat, because I knew they could not possibly be one of the answers through some basic analysis of the problem.
Eventually, I was lead to this solution.
⍝ The starting number of the longest Collatz Sequence
⍝ that is less than 1,000,000.
N←M ⋄ S←⍬
⍝ This appears to eat up the most amount of time,
⍝ comment it out for much better performance.
→(10000|M)/LOOP ⋄ M÷10000
⍝ Use cached value if there is one.
→(N≥⍴COLLATZ∆CACHE)/NOCACHE ⋄ →(0≠COLLATZ∆CACHE[N])/DONE
NOCACHE:S←N,S ⋄ →(1=N)/DONE
→(2|N)/ODD ⋄ N←N÷2 ⋄ →LOOP
ODD:N←1+3×N ⋄ →LOOP
Notice my comment in the COLLATZCOUNT function. I used the profiler on my code after I noticed that it took almost three minutes to run this code. To my utter amazement (though, less so after I thought about it), I saw that almost all the time was taken up by my book keeping and progress reporting code. I disabled this code and it ran much more quickly, roughly within my minute time-limit.
I couldn't leave it at that, of course, I re-enabled things and decided to try out some more of the statistics and charting features in APLX. This gave me the following information about how the cache population changes over iterations. Smaller means a fuller cache.