ClusterIt 2.4 review

ClusterIt is a collection of clustering tools, to turn your ordinary everyday pile of UNIX workstations into a speedy parallel beast.

License: BSD License
Developer: Tim Rightnour

Initially this work was based on IBM's PSSP, and copied heavily from the ideas there. It's also lightly based on the work pioneered in GLUnix. I've decided to both simplify and complexify it, however:

GLUnix is a monstrosity. It allows better control over the individual nodes, and much better load sharing. However, I'm convinced a lot of the speed advantages of having a parallel cluster are lost to the incredible overhead of running the GLUnix master and daemon services on a host. GLUnix does, however, offer a real parallel programming environment, something which is totally beyond the scope of this package.

PSSP is also a very powerful set of tools. Not much more than a bunch of staples written in Perl, they provide an incredible tool for tying an unwieldy number of UNIX machines into one fast demon of an MPP.

The advantage of both systems is central control of a large number of machines. Unfortunately, they both have drawbacks, as does my solution.

Here are some key features of "ClusterIt":
*Fast* parallel execution of remote commands.

C vs. Perl. You do the math.

Heterogeneous cluster makeup.

This makes it very easy to administer a large number of machines of varying architectures and operating systems. The fact that my tools are completely architecture independent makes it possible to dsh commands out to machines that aren't even running the same OS! This can be useful for a variety of mass administration tasks an admin may have to undertake.

Choice of authentication.

IBM forces you to use Kerberos 4 for authentication on the SPs. This is actually fine for a closed environment like an SP, but for something to be run on just a stack of otherwise useful boxes, you need more freedom. This suite allows you to do whatever you like: ssh, Kerberos, .rhosts. Whatever suits your security needs best.
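
To make the idea concrete, here is a minimal Python sketch of fanning one command out to several nodes through whatever remote shell you choose. It is an illustration only, not how dsh is implemented: the function name fan_out, its parameters, and the argument order (remote shell, extra arguments, node, command) are all assumptions made for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def fan_out(nodes, command, rcmd=("ssh",), rcmd_args=()):
    """Run `command` on every node at once; return {node: (exit_code, stdout)}."""

    def run_on(node):
        # Hypothetical argv layout: <remote shell> <extra args> <node> <command>
        argv = [*rcmd, *rcmd_args, node, command]
        done = subprocess.run(argv, capture_output=True, text=True)
        return node, (done.returncode, done.stdout)

    if not nodes:
        return {}
    # One worker per node, so a slow host does not hold up the others.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return dict(pool.map(run_on, nodes))
```

Swapping rcmd=("ssh",) for ("rsh",) changes the transport without touching anything else, which is the point of leaving the choice of authentication to you.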

Sequential node and random node execution

The idea here is that these dsh-like programs allow you to do something akin to load balanced scripting. For example, one could set up an NFS shared build directory, and issue the command:

make -j4 CC="seq 'cd /usr/src/foo ; gcc'"

This would execute a build in parallel on 4 nodes in your cluster, assigning processes to each node in sequence. The run command, by contrast, picks a node at random. It is equivalent to saying: "I don't care where you run, just run and tell me how things turned out." Generally speaking, the run command will achieve better results as the size of the cluster increases. If you have only three nodes, the odds of getting the same node over and over are fairly good.
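
The difference between the two selection strategies can be sketched in a few lines of Python. This is an illustration, not ClusterIt's code: the names seq_picker, run_picker, and repeat_chance are made up here, the random choice is assumed to be uniform, and the sketch keeps its position in memory rather than across separate invocations.

```python
import itertools
import random


def seq_picker(nodes):
    """Hand out nodes in order, wrapping around at the end (the seq idea)."""
    ring = itertools.cycle(nodes)
    return lambda: next(ring)


def run_picker(nodes, rng=random):
    """Hand out a uniformly random node on every call (the run idea)."""
    return lambda: rng.choice(nodes)


def repeat_chance(node_count):
    """Chance that a uniform random pick repeats the previous node."""
    return 1 / node_count


if __name__ == "__main__":
    pick = seq_picker(["n1", "n2", "n3", "n4"])
    print([pick() for _ in range(6)])  # ['n1', 'n2', 'n3', 'n4', 'n1', 'n2']
```

Under the uniform assumption, repeat_chance(3) is one in three, while repeat_chance(30) is one in thirty, which is why random placement behaves better on larger clusters.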

Job sequencing

Using this package, it is possible to schedule processes on the remote machines so that no more than one process per machine is active at any one time. This was designed to combat problems with using seq for parallel builds.

When building in parallel with seq, it is possible that a node receives a task that will take it much longer than the other nodes to complete. It is also possible that, as other nodes finish their jobs faster, the node which has been bogged down is handed another job. When performing large parallel builds, very slow machines will eventually stall the entire build, as they are attempting to compile many objects at once, and are usually at this point near-death from swapping.

The Job Scheduling in ClusterIt can prevent this in two ways. First, the job scheduling will not allow a node to process any more than one command at a time. If more commands than nodes are requested, the excess commands will block until a node has freed up. Second, the scheduler has the ability to register a benchmark number of some sort for each node. This allows the scheduler to always give out the fastest of the remaining nodes whenever one is requested, which lets a parallel build utilize a heterogeneous cluster more efficiently.
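
Those two rules, one job per node and fastest free node first, can be sketched as a small in-process scheduler. This is a toy model under stated assumptions, not the ClusterIt scheduler: the class NodeScheduler and its methods are hypothetical, a higher benchmark number is assumed to mean a faster node, and everything runs inside one Python process instead of across a network.

```python
import heapq
import threading


class NodeScheduler:
    """Hands out the fastest free node; each node runs one job at a time."""

    def __init__(self, benchmarks):
        # benchmarks: {node_name: score}; higher score = faster (an assumption).
        self._scores = dict(benchmarks)
        # heapq is a min-heap, so negate scores to pop the fastest node first.
        self._free = [(-score, node) for node, score in self._scores.items()]
        heapq.heapify(self._free)
        self._changed = threading.Condition()

    def acquire(self, timeout=None):
        """Block until a node is free, then return it (None on timeout)."""
        with self._changed:
            if not self._changed.wait_for(lambda: self._free, timeout=timeout):
                return None
            return heapq.heappop(self._free)[1]

    def release(self, node):
        """Mark a held node as idle again and wake one waiting caller."""
        with self._changed:
            heapq.heappush(self._free, (-self._scores[node], node))
            self._changed.notify()
```

A caller would acquire a node, run its command there, and release the node when the command exits. With more callers than nodes, the extra callers simply wait in acquire, which mirrors the blocking behaviour described above. The sketch does not guard against releasing a node that was never acquired.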

What's New in This Release:
Fixes jsd to work properly on Linux.
Adds a -v option to all programs to show what version of ClusterIt they are running.
Replaces most instances of sprintf with snprintf.
Replaces most instances of malloc with calloc.
Adds RCMD_CMD_ARGS to most of the programs. This makes it easier to do things like run "ssh -4" as your RCMD_CMD.
Fixes a bug where trailing whitespace on a GROUP or LUMP entry would cause dsh -g to not match it.
Fixes a bug where Linux machines often received truncated output from the child ssh/rsh process.
Fixes a bug where all the programs would mangle argv[0] for ps.
