2 min read

APL Hacking: Project Euler Daily (#14)

This problem was interesting in all different ways. It asks for Collatz Sequence counts. First, I tried to do the naive way, and discovered how that won't work. I then tried to find a more sophisticated algorithm, and realized that I wouldn't find one that suited me. Instead, I decided to go with a caching algorithm, or, basically, to memoize the results. This still left me with some things to play with. There were issues of just when and where I should do the caching. I wanted the code to be as simple as possible, but I didn't know whether caching should be done step-wise, or if it should be done on each main call to the subroutine.

In the end, I decided to cache all the values that I collected, but to only do it once, at the end of each call to the subroutine. This meant that I had to collect the results of each value. It gained me a bit in that I was able to make large contributions to the cache early on.

I also made the optimization of eliminating the first 500,000 possible choices right of the bat, because I knew they could not possibly be one of the answers through some basic analysis of the problem.

Eventually, I was lead to this solution.

Problem #14:

∇R←PEFOURTEEN;⎕IO
⎕IO←1

⍝ The starting number of the longest Collatz Sequence
⍝ that is less than 1,000,000.

COLLATZ∆CACHE←(2*20)⍴0
COLLATZ∆STATS←⍬
R←500000+⌈/1↑⍒COLLATZCOUNT¨500000↓⍳999999

∇R←COLLATZCOUNT M;C;F;N;S;⎕IO
⎕IO←1

N←M ⋄ S←⍬

⍝ This appears to eat up the most amount of time,
⍝ comment it out for much better performance.
→(10000|M)/LOOP ⋄ M÷10000
COLLATZ∆STATS←COLLATZ∆STATS,+/0=COLLATZ∆CACHE

LOOP:
⍝ Use cached value if there is one.
→(N≥⍴COLLATZ∆CACHE)/NOCACHE ⋄ →(0≠COLLATZ∆CACHE[N])/DONE
NOCACHE:S←N,S ⋄ →(1=N)/DONE
→(2|N)/ODD ⋄ N←N÷2 ⋄ →LOOP
ODD:N←1+3×N ⋄ →LOOP

DONE:
F←S≤⍴COLLATZ∆CACHE
COLLATZ∆CACHE[F/S]←COLLATZ∆CACHE[N]+F/⍳⍴S
R←COLLATZ∆CACHE[M]



Notice my comment in the COLLATZCOUNT function. I used the profiler on my code after I noticed that it took almost three minutes to run this code. To my utter amazement (though, less so after I thought about it), I saw that almost all the time was taken up by my book keeping and progress reporting code. I disabled this code and it ran much more quickly, roughly within my minute time-limit.

I couldn't leave it at that, of course, I re-enabled things and decided to try out some more of the statistics and charting features in APLX. This gave me the following information about how the cache population changes over iterations. Smaller means a fuller cache.

collatz_stats.svg