piping for fun and profit

May 29, 2014 - 3 minutes read - 500 words

I recently discovered something pretty cool: Groovy, and in particular groovysh, its interactive shell. It lets you do cool stuff like call Java classes straight from a prompt:

➜  ~  groovysh
Groovy Shell (2.3.3, JVM: 1.8.0)
Type ':help' or ':h' for help.
-------------------------------------------------------------------------------
groovy:000> new Random().nextInt()
===> 909782845

But the sad part is that it is pretty slow to start up on my machine:

➜  ~  time (echo :q | groovysh)
Groovy Shell (2.3.3, JVM: 1.8.0)
Type ':help' or ':h' for help.
-------------------------------------------------------------------------------
groovy:000> :q
( echo :q | groovysh; )  16.56s user 0.31s system 201% cpu 8.384 total

That’s more than 8 seconds just to start up and shut down a prompt that I might just run one command in!

I had a similar problem during my Ph.D. research: part of my research was on a numerical algorithm that had an extremely long startup time and then had to do a bunch of stuff blazing fast. It was basically a FORTRAN implementation of my iQP code. I was having trouble interfacing it properly with MATLAB and was dreading a future C++ integration. It was particularly problematic because ideally we would have used the output from each call of that code incrementally, but that just wasn’t really possible with how the FORTRAN code was set up.

Instead, we solved all of the quadratic programs using the slow initial QP solves and wrote the problem data into files. Then we called the FORTRAN code with command-line arguments specifying which files to load; it read those files, performed the actual QP solve, and wrote the answer back out to a file. Afterward, once we had all of the quadratic programs, we ran a very similar code that looped over them to measure the practical performance of the proposed method. This worked, but the initial QP solve can easily be three orders of magnitude slower than subsequent solves. Furthermore, even if it weren’t, solving the same QP twice is completely unnecessary and a waste of time in an already onerously long process.

So I did something a bit different: I created a named pipe with mkfifo and had both my FORTRAN code and the MATLAB code open it. When the MATLAB code needed a QP solved, it just printed a line into the FIFO. As soon as that happened, the FORTRAN code loaded the corresponding files and cranked through the computation. The rest of the logic was identical: it printed the runtime statistics to standard output and wrote the solution files out as before. A similar setup in the reverse direction told MATLAB when the solution files were ready: the MATLAB code simply waited until it received a new line, then loaded the solution vectors from those files back into the algorithm and proceeded with the next step of the proposed method.
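The two-way handshake described above can be sketched in shell. This is a minimal sketch, not the original MATLAB/FORTRAN code: `solver` is a hypothetical stand-in for the FORTRAN side, `client` for the MATLAB side, `sort -n` for the QP solve, and the FIFO names `requests` and `done` and the `jobN.in`/`jobN.out` files are all made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins: "solver" ~ FORTRAN side, "client" ~ MATLAB side,
# `sort -n` ~ the QP solve. Requests go out on one FIFO, acks come back on another.
set -eu

solver() {                     # $1 = working directory
  local dir=$1 name
  sleep 1                      # stand-in for the slow one-time startup
  # Hold both FIFOs open for the whole session: a read end and a write end.
  exec 3<"$dir/requests" 4>"$dir/done"
  while IFS= read -r name <&3; do
    if [ "$name" = quit ]; then break; fi
    sort -n "$dir/$name.in" > "$dir/$name.out"   # "solve" and write the result file
    echo "$name" >&4                             # tell the client it is ready
  done
}

client() {                     # $1 = working directory
  local dir=$1 i done_name
  # Open in the same order as the solver (requests, then done) to avoid deadlock.
  exec 5>"$dir/requests" 6<"$dir/done"
  for i in 1 2 3; do
    printf '%s\n' 3 1 2 > "$dir/job$i.in"   # write the problem data file
    echo "job$i" >&5                        # ask for a solve
    IFS= read -r done_name <&6              # block until it is finished
    cat "$dir/$done_name.out"               # load the answer
  done
  echo quit >&5
  exec 5>&- 6<&-
}

run_demo() {
  local dir
  dir=$(mktemp -d)
  mkfifo "$dir/requests" "$dir/done"
  solver "$dir" &              # pays its startup cost exactly once
  client "$dir"
  wait
  rm -rf "$dir"
}

run_demo
```

Opening a FIFO blocks until the other end is opened too, which is why both functions open `requests` first and `done` second; opening them in opposite orders would leave each side waiting on the other. Keeping the descriptors open with `exec` for the whole session means neither side sees end-of-file between requests, so the expensive startup happens only once.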

Anyway, that’s my story of the dirtiest workaround for a temperamental development environment. You can recreate it with something like:

term1$ mkfifo smurf
term1$ tail -f smurf | groovysh

and then, in a second terminal:

term2$ echo 'new Random().nextInt()' >> smurf
term2$ echo :quit >> smurf
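Something like `tail -f` is needed here because a process reading a FIFO gets end-of-file once the last writer closes it, so a plain `groovysh < smurf` would see end-of-file right after the first `echo` finishes. Another way to get the same effect is to hold a write end open on a spare file descriptor. The sketch below assumes `sh` as a stand-in for groovysh (so it runs without Groovy installed) and writes the interpreter's output to a hypothetical `out.log`:

```shell
#!/usr/bin/env bash
# Sketch: feed one long-lived interpreter through a FIFO by holding a writer
# open on fd 3. `sh` stands in for groovysh; "smurf" matches the example above.
set -eu
cd "$(mktemp -d)"
mkfifo smurf

sh < smurf > out.log &        # the reader blocks in open() until a writer appears
exec 3>smurf                  # hold a writer open: no EOF between commands

echo 'echo $((6 * 7))' >&3    # each line is run as soon as it arrives
echo 'echo second' >&3
echo 'exit' >&3               # plays the role of :quit for groovysh

exec 3>&-                     # close our end of the FIFO
wait                          # let the interpreter finish
cat out.log
```

Compared with `tail -f`, this keeps everything in one shell session, at the cost of having to remember to close fd 3 when you are done.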