Make a lot of plots with Gnuplot::Builder, ImageMagick and Daiku

Preparing a lot of plots for academic papers has always been a pain for me, but recently I've managed to reduce the pain with Gnuplot::Builder and some other goodies. Here's how.

Use Gnuplot::Builder to isolate plots

Until recently I used bash and/or perl to generate gnuplot scripts, and ran those scripts with gnuplot. This solved most of the problems gnuplot has, but one problem remains when you plot a lot of images: the plot settings interfere with each other.

When you plot two images with a single gnuplot script, for example, the settings for the first image also apply to the second image. This is because gnuplot's settings are all global, and this drove me nuts whenever I reorganized the script. If I removed the part for the first image, it affected the second image!

To solve this problem (and a lot of others), I developed a Perl module called Gnuplot::Builder. Gnuplot::Builder holds gnuplot settings in an object, and it plots each image in a separate gnuplot process. Therefore different plots never interfere with each other.

This is an example of using Gnuplot::Builder.

use Gnuplot::Builder qw(gscript gfile);

my $base = gscript(
    term => "postscript eps size 6.0,4.0",
    mxtics => 5,
    mytics => 5,
);

foreach my $type (qw(a b c d)) {
    my $script = $base->new_child;
    $script->setq(
        title => "Data type = $type",
        grid => "xy",
        xlabel => "Time [sec]",
        ylabel => "Through-put [MB/sec]",
    );
    $script->plot_with(
        dataset => gfile("data_$type.dat", using => "1:2", with => "lp"),
        output => "plot_data_$type.eps"
    );
}

{
    my $script = $base->new_child;
    $script->setq(
        xlabel => "Number of nodes",
        ylabel => "Through-put [MB/sec]"
    )->set(
        mxtics => undef,
        mytics => undef,
        logscale => "xy"
    );
    $script->plot_with(
        dataset => gfile("scalability.dat", using => "1:2", with => "lp"),
        output => "plot_scalability.eps"
    );
}

Some of the advantages of Gnuplot::Builder are demonstrated in the above example.

  • All $script objects share the same $base object as their parent. $base keeps the common settings for all plots.
  • The first $script (created in the loop) has "title" and "grid" options, but the second $script does not. Because they are isolated, the second plot (plot_scalability.eps) has neither a title nor a grid.
  • The second $script overrides the $base's "mxtics" and "mytics" settings.

With Gnuplot::Builder, it is totally OK to remove the first $script if you change your mind. The second $script remains the same.
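To see the isolation without running gnuplot at all, you can print the script text that each object would send. The following is a minimal sketch, assuming the to_string() method of the script object returns the generated gnuplot commands; the variable names $first and $second are only for illustration.

```perl
use strict;
use warnings;
use Gnuplot::Builder qw(gscript);

# Common settings shared by every child script.
my $base = gscript(mxtics => 5, mytics => 5);

# The first child adds a title on top of the inherited settings.
my $first = $base->new_child;
$first->setq(title => "Data type = a");

# The second child overrides one inherited setting and adds its own.
my $second = $base->new_child;
$second->set(mxtics => undef, logscale => "xy");

# Print the gnuplot commands each object would generate.
# (Assumption: to_string() returns the script as a string.)
print "--- first\n",  $first->to_string;
print "--- second\n", $second->to_string;
```

The title should appear only in the first script, while both scripts should still carry the "mytics" setting inherited from $base.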

Plot in parallel for increased through-put

Gnuplot::Builder is synchronous by default. That is, each time it plots an image, it waits for the gnuplot process to finish. This can take a very long time when you plot a lot of images with a lot of data points.

Gnuplot::Builder comes with a simple process manager and it can run gnuplot processes in parallel. This greatly boosts the plotting through-put.

To parallelize your script (like the above example), add the following line just below the "use Gnuplot::Builder" line,

$Gnuplot::Builder::Process::ASYNC = 1;

and add the following line at the end of the script.

Gnuplot::Builder::Process->wait_all();

By default, Gnuplot::Builder runs up to 2 processes in parallel. You can customize this number via an environment variable. For example, if your CPU has 8 threads, I recommend adding the following line to your .profile.

export PERL_GNUPLOT_BUILDER_PROCESS_MAX_PROCESSES=8
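Put together, a parallelized script might look like the sketch below. It uses only the calls shown above. The sample_*.dat files and their contents are placeholders that I generate here so the sketch runs on its own; they are not part of the original example.

```perl
use strict;
use warnings;
use Gnuplot::Builder qw(gscript gfile);

# Turn on asynchronous plotting before the first plot is issued.
$Gnuplot::Builder::Process::ASYNC = 1;

# Write tiny placeholder data files (hypothetical names and values).
my @types = qw(a b c d);
foreach my $type (@types) {
    open my $fh, ">", "sample_$type.dat"
        or die "Cannot write sample_$type.dat: $!";
    print {$fh} "$_ ", $_ * $_, "\n" for 1 .. 10;
    close $fh;
}

my $base = gscript(term => "postscript eps size 6.0,4.0");

foreach my $type (@types) {
    # In asynchronous mode this call does not wait for
    # its own gnuplot process to finish.
    $base->new_child->plot_with(
        dataset => gfile("sample_$type.dat", using => "1:2", with => "lp"),
        output  => "sample_plot_$type.eps",
    );
}

# Block here until every gnuplot process has finished.
Gnuplot::Builder::Process->wait_all();
```

Without the final wait_all() call, the script could exit while some gnuplot processes are still running.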

Let gnuplot do extra work (in parallel)

Sometimes gnuplot alone is not enough to get your job done.

For example, if you plot an image with a lot of data points in EPS format, the size of the EPS file becomes too large to be embedded in your paper. In that case, I rasterize the EPS file with ImageMagick.

$ convert -density 300x300 input_vector.eps output_raster.eps

It seems that "convert" calls the "gs" command under the hood to do the actual job.

The problem with rasterizing is that, again, it takes a very long time. It takes even longer than gnuplot needs to generate the vector image! So I want to parallelize the rasterizing process, too.

Fortunately we can exploit Gnuplot::Builder's process manager to parallelize it. To do that, rewrite every invocation of the plot_with() method like the following.

gscript()->run(sub {
    my ($writer) = @_;
    my $output = "plot_data_$type.eps";
    $script->plot_with(
        dataset => gfile("data_$type.dat", using => "1:2", with => "lp"),
        output => $output,
    );
    my $temp = "$output.temp";
    $writer->("!convert -density 300x300 $output $temp\n");
    $writer->("!mv -f $temp $output\n");
});

The run() method allows you to inject arbitrary data into the gnuplot process, even after calling the plot_with() method. Using this feature, we inject shell commands to "convert" the vector image and "mv" the resulting raster image. gnuplot supports executing shell commands: a line that starts with "!" is passed to the shell.

Now the gnuplot process not only plots the vector image but also converts it into the raster one. This whole process is run in parallel by Gnuplot::Builder.
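The injected lines are plain strings, so they can be built by a small helper. The sketch below is one possible way to do it; shell_quote() and rasterize_commands() are hypothetical helpers of mine, not part of Gnuplot::Builder. Quoting the file names is an extra precaution for names that contain spaces, which the example above does not need.

```perl
use strict;
use warnings;

# Hypothetical helper: quote a string for a POSIX shell
# by wrapping it in single quotes.
sub shell_quote {
    my ($str) = @_;
    $str =~ s/'/'\\''/g;    # close the quote, add \', reopen the quote
    return "'$str'";
}

# Hypothetical helper: build the two "!" lines to pass to $writer.
sub rasterize_commands {
    my ($output, $density) = @_;
    $density ||= "300x300";
    my $temp = "$output.temp";
    my ($q_out, $q_temp) = map { shell_quote($_) } ($output, $temp);
    return (
        "!convert -density $density $q_out $q_temp\n",
        "!mv -f $q_temp $q_out\n",
    );
}

print rasterize_commands("plot_data_a.eps");
# prints:
# !convert -density 300x300 'plot_data_a.eps' 'plot_data_a.eps.temp'
# !mv -f 'plot_data_a.eps.temp' 'plot_data_a.eps'
```

Inside the subroutine given to run(), you would then write `$writer->($_) foreach rasterize_commands($output);` instead of the two hand-written lines.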


Use Daiku to select which to plot

For me, fine-tuning images for my paper involves a lot of trial and error. I plot some data, see the result, change gnuplot settings a little, and plot them again. This continues until I'm satisfied with the quality of the images.

However, plotting all images often takes a long time, even if we parallelize it to the max. When you are in that "trial-and-error" phase, you need a way to select the particular image to plot and skip everything else. In Perl, you can do that with Daiku.

Daiku is a Perl version of "make" (or "rake"). With Daiku you describe files and how to generate them. The important thing is that you can specify a Perl subroutine as the "how-to" part.
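Before bringing gnuplot in, here is a minimal sketch of that idea with plain text files. It uses the same "file", "task" and "build" calls as the plotting script in this article; the temporary directory, the file names and the @built list are only for illustration.

```perl
use strict;
use warnings;
use Daiku;
use File::Temp qw(tempdir);

# Work in a fresh temporary directory so that no target exists yet.
my $dir = tempdir(CLEANUP => 1);
my @built;    # records which subroutines were actually executed

foreach my $name (qw(a b)) {
    # Register how to generate the file. Nothing is executed here.
    file "$dir/$name.txt", sub {
        my ($task) = @_;
        open my $fh, ">", $task->dst
            or die "Cannot write " . $task->dst . ": $!";
        print {$fh} "content of $name\n";
        close $fh;
        push @built, $name;
    };
}

# A named task that depends on both files.
task "all" => ["$dir/a.txt", "$dir/b.txt"];

# Build only one target; the subroutine for b.txt should not run.
build "$dir/a.txt";
print "built: @built\n";
```

Only the subroutine registered for the requested target is executed, which is the property we need for selecting a single plot.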

Using Daiku, the whole script will be:

use Daiku;
use Gnuplot::Builder qw(gscript gfile);
$Gnuplot::Builder::Process::ASYNC = 1;

sub plot_and_rasterize {
    my ($script, %plot_params) = @_;
    gscript()->run(sub {
        my ($writer) = @_;
        $script->plot_with(%plot_params);
        my $output = $plot_params{output};
        my $temp = "$output.temp";
        $writer->("!convert -density 300x300 $output $temp\n");
        $writer->("!mv -f $temp $output\n");
    });
}

my $base = gscript(
    term => "postscript eps size 6.0,4.0",
    mxtics => 5,
    mytics => 5,
);

foreach my $type (qw(a b c d)) {
    my $script = $base->new_child;
    $script->setq(
        title => "Data type = $type",
        xlabel => "Time [sec]",
        ylabel => "Through-put [MB/sec]",
    );
    file "plot_data_$type.eps", sub {
        my ($task) = @_;
        plot_and_rasterize(
            $script,
            dataset => gfile("data_$type.dat", using => "1:2", with => "lp"),
            output => $task->dst,
        );
    };
}

{
    my $script = $base->new_child;
    $script->setq(
        xlabel => "Number of nodes",
        ylabel => "Through-put [MB/sec]"
    )->set(
        logscale => "xy"
    );
    file "plot_scalability.eps", sub {
        my ($task) = @_;
        plot_and_rasterize(
            $script,
            dataset => gfile("scalability.dat", using => "1:2", with => "lp"),
            output => $task->dst
        );
    };
}

task "all" => ["plot_scalability.eps", map { "plot_data_$_.eps" } qw(a b c d)];
@ARGV = ("all") if !@ARGV;

build $_ foreach @ARGV;
Gnuplot::Builder::Process->wait_all();

Note that I refactored the rasterizing part into the "plot_and_rasterize()" function.

Daiku exports the "file", "task" and "build" functions. We use the "file" and "task" functions to register subroutines that plot the images. The registered subroutines are not executed until the "build" function is called at the end.

So, when you fine-tune "plot_data_a.eps", for example, run this script as follows,

$ perl plot.pl plot_data_a.eps

then it plots only "plot_data_a.eps".

Note that Daiku doesn't support asynchronous build jobs. So if you run gnuplot processes in parallel (like the example above), Daiku may fail to resolve task dependencies correctly. As long as each task is simple, I think it works well.