Fork-Join parallelism using Signals in APLX

In preparation for some research work I'm doing on parallelism, I finally got a working understanding of tasks and multiprogramming in APLX.

I came up with the following code, and I would appreciate any improvements or suggestions. It forks one child task per color channel and joins on their completion signals, so it pays off when there is a significant amount of stenciling to do (I used a multi-pass mean filter/blur stencil on an RGB image). I saw nearly 100% CPU utilization throughout the computation.

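∇Z←PBLUR IMG;F;R;G;B;T;TOV;INIT
 0 0⍴1 ⎕WE 0
 ⍝ Fork: create one background APL child task per color channel.
 (R G B)←T←'⎕' ⎕NEW¨3⍴⊂'APL'
 T.background←1 ⋄ T.wssize←400000000
 ⍝ Split the packed pixels into R, G and B planes, one per child. F holds the completion flags.
 T.∆M←⊂[2 3](3⍴256)⊤IMG ⋄ Z←0 ⋄ F←3⍴0
 ⍝ Package the worker functions as an overlay for each child.
 T.∆TOV←⊂0 ⎕OV ⎕BOX 'CONVOLUTE3 CONVOLVEMANY CHILD'
 ⍝ A child's first signal swaps in the completion handler and signals the child to start.
 ⍝ Its second signal sets that child's flag and branches to XT once all three flags are set.
 R.onSignal←'R.onSignal←"F[1]←1 ⋄ →(∧/F)/XT" ⋄ 0 0⍴R.Signal'
 G.onSignal←'G.onSignal←"F[2]←1 ⋄ →(∧/F)/XT" ⋄ 0 0⍴G.Signal'
 B.onSignal←'B.onSignal←"F[3]←1 ⋄ →(∧/F)/XT" ⋄ 0 0⍴B.Signal'
 ⍝ Run in each child: unpack the overlay and start CHILD.
 INIT←'Sys←''⎕'' ⎕NEW ''System'' ⋄ 2 ⎕OV Sys.∆TOV ⋄ CHILD'
 0 0⍴T.Open
 0 0⍴R.Execute INIT
 0 0⍴G.Execute INIT
 0 0⍴B.Execute INIT
 ⍝ Wait for events until every child has reported completion.
 0 0⍴⎕WE ¯1
 ⍝ Join: recombine the three planes into packed pixels.
 XT:Z←B.∆M+256×G.∆M+256×R.∆M
∇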
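
The channel split and the final recombination in PBLUR can be tried on their own in a session. This is a minimal sketch, assuming ⎕IO is 1 and that each pixel is packed as B+256×G+256×R with every channel below 256. PIX is a made-up 2-by-2 image used only for illustration.

```apl
⍝ PIX is a hypothetical test image: red, green, blue, mid-grey
PIX←2 2⍴16711680 65280 255 8421504
⍝ Same split PBLUR uses: one matrix per channel
(R G B)←⊂[2 3](3⍴256)⊤PIX
⍝ Red plane, ravelled: 255 0 0 128
,R
⍝ Same recombination as the XT line; 1 means the round trip is lossless
PIX≡B+256×G+256×R
```

Any bits above the low 24 (an alpha byte, for example) would be dropped by the three-digit encode, so the round trip only holds for plain RGB values.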

∇Z←S CONVOLUTE3 IMG;M;W;H;PW;PH;I;SR;SS;R;X

⍝ Morten contributed optimizations to the original form.

⍝ Determine size and row size of the stencil.
SR←↑⍴S ⋄ SS←×/⍴S

⍝ Pad the incoming matrix with zeros around the edge.
(PH PW)←(2×R←⌊SR÷2)+(H W)←⍴IMG
M←(-R)⌽(-R)⊖PH PW↑IMG

⍝ Generate the template column for a region of the matrix.
I←(+\PW×(SR⍴0),(SS-SR)⍴1,(¯1+SR)⍴0)+SS⍴⍳SR

⍝ Generate all the other region columns.
I←I∘.+(X⍴(W⍴1),(2×R)⍴0)/0,⍳¯1+X←(PH×PW)-(2×R×PW)+2×R

⍝ Compute the stencil
Z←⌊0.5+H W⍴(,S)+.×(,M)[I]
∇

∇Z←S CONVOLVEMANY IMG;I
I←0 ⋄ Z←IMG
LP:→(10<I←I+1)/0 ⋄ Z←S CONVOLUTE3 Z ⋄ →LP
∇

∇CHILD;S
S←3 3⍴÷9
Sys.onSignal←'Sys.∆M←S CONVOLVEMANY Sys.∆M ⋄ Sys.Signal ⋄ →XT'
Sys.Signal
0 0⍴⎕WE ¯1
XT:⍎')OFF'
∇
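
When experimenting with the signal handling it helps to have something to compare against. Here is a sketch of a serial baseline that runs the same ten-pass blur on each plane in turn, without any child tasks. SBLUR is a hypothetical name (it is not part of the code above), and it assumes CONVOLVEMANY and CONVOLUTE3 are defined in the workspace and that ⎕IO is 1.

```apl
∇Z←SBLUR IMG;R;G;B;S
 ⍝ SBLUR is a hypothetical serial baseline for checking PBLUR
 ⍝ Same 3-by-3 mean stencil that CHILD uses
 S←3 3⍴÷9
 ⍝ Same channel split as PBLUR
 (R G B)←⊂[2 3](3⍴256)⊤IMG
 ⍝ Blur each plane serially, then recombine as on PBLUR's XT line
 Z←(S CONVOLVEMANY B)+256×(S CONVOLVEMANY G)+256×S CONVOLVEMANY R
∇
```

Evaluating (PBLUR IMG)≡SBLUR IMG on a small image is then a quick way to see whether the fork-join version still agrees with the serial one, and timing the two gives a rough idea of the speedup.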