Using the Elektra/Minotaur cluster
Setup
The Elektra cluster consists of 16 dual Intel Xeon and 12 dual AMD Athlon Linux
boxes on a private network that is separated from the outside world through a
switch. The master node (elektra) acts as a firewall:
it is the only node that can be accessed from outside. In principle, there is
no need to log in to the slave nodes, although you can (machine names:
eXX and mXX, where XX is the node number). The private
network has the following advantages: (1) added security; (2) no additional IP
addresses are needed for the slaves; (3) the fast ethernet connection is
completely available to the cluster, as all external traffic is masked by the
switch. A disadvantage of the private network is that partitions residing on
other machines (like the /home partition on ariadne) can only be mounted on the
master node, and the NFS daemon currently being used (knfsd) cannot re-export
them to the private network. In practice, this is not a big problem, as (for
efficiency reasons) we want to minimize the usage of such partitions
anyway. There is a separate /home on the cluster machines, which prevents you
from writing output to /home on ariadne (which could easily become a
bottleneck). In addition, the separate /home partition makes the cluster
independent of the external (building-wide) network. Note, however, that you
can still log in to elektra with your standard password, as the master node is
a NIS slave server to ariadne.
Usage
In practice, you use the cluster as follows:
- Copy your executable and input file(s) to your home directory on elektra,
OR compile your program on elektra.
- Submit your job from elektra using a PBS script (a sample script is provided). There are several ways to request a node:
  - Request any node: just specify the queue name (cluster) in the PBS script
    via the -q option.
  - Request a Xeon node: in addition to the queue, also use the option
    -l nodes=xeon.
  - Request an Athlon MP 1900+ node: in addition to the queue, also use the
    option -l nodes=athlon.
  - Request an Athlon MP 2800+ node: in addition to the queue, also use the
    option -l nodes=athlon2.
  - Request a specific node by name: use -l nodes=YYY, where YYY is the node
    name.
  - Request more than one CPU (for parallel jobs): this should be done on a
    per-node basis (i.e., there is always an even number of CPUs). For a
    single node (two CPUs), use -l nodes=1:ppn=2, which stands for "one node,
    two processors per node." If you want more than one node, make sure that
    all nodes are of the same type by adding one of the qualifiers mentioned
    above (xeon, athlon, athlon2), e.g. -l nodes=2:ppn=2:xeon. The qualifier
    cannot go in a separate statement, since the script may contain only one
    -l nodes line. There is also a sample script that shows how to
    automatically use all the requested CPUs.
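As an illustration of the options above, here is a minimal sketch of a PBS job script that requests two Xeon nodes and works out how many CPUs were granted. It is not the cluster's own sample script: the e-mail address, the program name (my_program) and the mpirun flags are assumptions, and the launch line is only printed rather than executed. The sketch relies on PBS writing one line per allocated CPU to the file named in $PBS_NODEFILE.

```shell
#!/bin/bash
#PBS -q cluster
#PBS -l nodes=2:ppn=2:xeon
#PBS -M you@example.org

# NOTE: the e-mail address, the program name and the mpirun flags are
# placeholders (assumptions), not taken from the cluster's sample script.

# Count the CPUs granted to the job: PBS writes one line per allocated
# CPU to the node file, so the line count equals the CPU count.
count_cpus() {
    wc -l < "$1" | tr -d ' '
}

# Only act when running under PBS (PBS_NODEFILE is set by the batch system).
if [ -n "$PBS_NODEFILE" ]; then
    NPROCS=$(count_cpus "$PBS_NODEFILE")
    # Start in the directory from which the job was submitted.
    cd "$PBS_O_WORKDIR" || exit 1
    # Print the launch line instead of running it; replace echo with the
    # real launcher once the flags are confirmed for the local MPI setup.
    echo mpirun -np "$NPROCS" -machinefile "$PBS_NODEFILE" ./my_program
fi
```

Assuming the script is saved as job.sh (a hypothetical name), it would be submitted from elektra with qsub job.sh. With the request above, the node file should contain four lines, so NPROCS would be 4.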
Prerequisites
- Check whether you can log in to all slave nodes via 'ssh eXX' (or 'ssh mXX'),
where XX is the node number, without giving your password. This is essential
for the delivery of your output files. Separate instructions explain how to
set this up.
You should also be able to log in (from the slave node) back to the master
node via 'ssh elektra.mse.uiuc.edu' (use the full name, as that is what
PBS uses).
- Use the PBS '-M' option to specify a valid e-mail address where PBS can
send any error messages. If you receive such e-mail messages, it is quite
likely that you skipped the first step.
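The first check can be automated. The following is a rough sketch, not an official tool: it tries a passwordless login to each node given on the command line, using the standard OpenSSH options BatchMode (fail instead of asking for a password) and ConnectTimeout. The function name check_node and the SSH_CMD variable are hypothetical, and node names are passed as arguments because the exact naming (for instance, zero padding) is not spelled out here.

```shell
#!/bin/bash
# Sketch: report which nodes accept a login without a password.
# SSH_CMD and check_node are illustrative names (assumptions).
SSH_CMD="${SSH_CMD:-ssh}"

check_node() {
    # BatchMode=yes makes ssh fail rather than prompt for a password;
    # ConnectTimeout keeps an unreachable node from stalling the loop.
    if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null; then
        echo "$1: ok"
    else
        echo "$1: FAILED (password prompt or unreachable)"
    fi
}

# Check every node name given as an argument.
for node in "$@"; do
    check_node "$node"
done
```

Saved as check_nodes.sh (a hypothetical name), it could be run on elektra as ./check_nodes.sh followed by the node names to test. The login back to the master can be tested in the same spirit by running ssh elektra.mse.uiuc.edu true from a slave node.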
Heavy I/O
If your program needs large amounts of temporary disk space, or produces data
at a high rate, the private network can still throttle your program. In this
case, it is best to perform all I/O on the /scratch-local partition of the
slave node, and to copy your data to /home at the end of the run.
Contact Erik to learn more about this.
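One possible way to arrange this inside a job script is sketched below. It is an assumption about how such a workflow could look, not the recommended local procedure: the helper run_in_scratch is hypothetical, and whether you may create directories directly under /scratch-local should be checked first.

```shell
#!/bin/bash
# Sketch: run a command inside a private scratch directory, then copy
# the results to a destination directory and free the scratch space.
# Usage: run_in_scratch SCRATCH_BASE DEST_DIR COMMAND [ARGS...]
run_in_scratch() {
    local scratch_base="$1" dest="$2"
    shift 2
    local workdir
    # Create a unique working directory under the scratch partition.
    workdir=$(mktemp -d "$scratch_base/job.XXXXXX") || return 1
    # Run the command with the scratch directory as working directory.
    ( cd "$workdir" && "$@" )
    local status=$?
    # Copy everything back in one go; remove the scratch copy only if
    # the copy succeeded.
    mkdir -p "$dest" && cp -r "$workdir"/. "$dest"/ && rm -rf "$workdir"
    return $status
}

# Example call inside a PBS script (paths are placeholders):
#   run_in_scratch /scratch-local "$HOME/results" "$HOME/my_program" "$HOME/input.dat"
```

Because the command runs from the scratch directory, the program and its input files are best given as absolute paths. Copying once at the end keeps the traffic over the private network to a single transfer.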
Tips and tools
- From the desktop machines (ariadne, etc.) you can simply check the queue
status on the cluster via ssh elektra qstat -n
(you may wish to define this as an alias). In addition, the command queues gives you a compact overview of all available queues
(local and cluster).
- On elektra, you can view the CPU load of all cluster nodes via the command
load.
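The alias mentioned in the first tip could look like this; the alias name cqstat is only a suggestion.

```shell
# Hypothetical alias name; add this line to your shell startup file
# (e.g. ~/.bashrc) on the desktop machine.
alias cqstat='ssh elektra qstat -n'
```

After that, typing cqstat on a desktop machine shows the queue status on the cluster.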
This page was created on April 14, 2002 and last updated on
May 20, 2004.