Fork-Join parallelism using Signals in APLX

In preparation for some research work I'm doing on parallelism, I finally got a working understanding of tasks and multi-programming in APLX.

I came up with the following code, and I would appreciate any improvements or suggestions. It's fast when you have a significant amount of stenciling to do (I used a multi-pass mean filter/blur stencil on an RGB image). I managed nearly 100% CPU utilization throughout the computation.

Aaron W. Hsu

```
∇Z←PBLUR IMG;F;R;G;B;T;TOV;INIT
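0 0⍴1 ⎕WE 0
⍝ Create three child APL tasks, one per color channel.
(R G B)←T←'⎕' ⎕NEW¨3⍴⊂'APL'
T.background←1 ⋄ T.wssize←400000000
⍝ Split the packed pixels into R, G and B planes, one per task. F holds the done flags.
T.∆M←⊂[2 3](3⍴256)⊤IMG ⋄ Z←0 ⋄ F←3⍴0
⍝ Package the worker functions so that each child can define them.
T.∆TOV←⊂0 ⎕OV ⎕BOX 'CONVOLUTE3 CONVOLVEMANY CHILD'
⍝ First signal from a child (ready): re-arm the handler and signal the child to start.
⍝ Second signal (done): set that child's flag and branch to XT once all three are set.
R.onSignal←'R.onSignal←"F[1]←1 ⋄ →(^/F)/XT" ⋄ 0 0⍴R.Signal'
G.onSignal←'G.onSignal←"F[2]←1 ⋄ →(^/F)/XT" ⋄ 0 0⍴G.Signal'
B.onSignal←'B.onSignal←"F[3]←1 ⋄ →(^/F)/XT" ⋄ 0 0⍴B.Signal'
⍝ Each child creates its System object, defines the packaged functions, and runs CHILD.
INIT←'Sys←''⎕'' ⎕NEW ''System'' ⋄ 2 ⎕OV Sys.∆TOV ⋄ CHILD'
0 0⍴T.Open
0 0⍴R.Execute INIT
0 0⍴G.Execute INIT
0 0⍴B.Execute INIT
⍝ Fork is complete. Wait for events until a handler branches to XT.
0 0⍴⎕WE ¯1
⍝ Join: recombine the three planes into packed pixels.
XT:Z←B.∆M+256×G.∆M+256×R.∆M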
∇

∇Z←S CONVOLUTE3 IMG;M;W;H;PW;PH;I;SR;SS;R;X
⍝
⍝ Morten contributed optimizations to the original form.
⍝
⍝ Determine size and row size of the stencil.
SR←↑⍴S ⋄ SS←×/⍴S
⍝
⍝ Pad the incoming matrix with zeros around the edge.
(PH PW)←(2×R←⌊SR÷2)+(H W)←⍴IMG
M←(-R)⌽(-R)⊖PH PW↑IMG
⍝
⍝ Generate the template column for a region of the matrix.
I←(+PW×(SR⍴0),(SS-SR)⍴1,(¯1+SR)⍴0)+SS⍴⍳SR
⍝
⍝ Generate all the other region columns.
I←I∘.+(X⍴(W⍴1),(2×R)⍴0)/0,⍳¯1+X←(PH×PW)-(2×R×PW)+2×R
⍝
⍝ Compute the stencil
Z←⌊0.5+H W⍴(,S)+.×(,M)[I]
∇

∇Z←S CONVOLVEMANY IMG;I
⍝ Apply the stencil S to IMG ten times in succession.
I←0 ⋄ Z←IMG
LP:→(10<I←I+1)/0 ⋄ Z←S CONVOLUTE3 Z ⋄ →LP
∇

∇CHILD;S
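⍝ 3×3 mean-filter stencil.
S←3 3⍴÷9
⍝ On the parent's start signal: blur this task's plane, signal completion, then exit via XT.
Sys.onSignal←'Sys.∆M←S CONVOLVEMANY Sys.∆M ⋄ Sys.Signal ⋄ →XT'
⍝ Tell the parent that this task is ready, then wait for events.
Sys.Signal
0 0⍴⎕WE ¯1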
XT:⍎')OFF'
∇
```
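
For checking the parallel version, a serial baseline is handy. The following is only a sketch: SBLUR is a name I made up for illustration, and it assumes (as PBLUR does) that IMG is a matrix of pixels packed as B+256×G+256×R. It reuses CONVOLVEMANY from above with the same 3×3 mean stencil, but runs the three channels one after another in the current task instead of forking them.

```
∇Z←SBLUR IMG;S;R;G;B
⍝ Hypothetical serial reference for PBLUR: same stencil, same passes, no child tasks.
S←3 3⍴÷9
⍝ Split the packed pixels into one plane per channel.
(R G B)←⊂[2 3](3⍴256)⊤IMG
⍝ Blur each plane in turn.
R←S CONVOLVEMANY R
G←S CONVOLVEMANY G
B←S CONVOLVEMANY B
⍝ Recombine the planes the same way PBLUR does at XT.
Z←B+256×G+256×R
∇
```

Since each pass rounds to integers, comparing the two with `(PBLUR IMG)≡SBLUR IMG` should be a reasonable sanity check, and timing both on the same image gives a rough idea of the speedup from the three tasks.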