| Nodes | Memory |
|---|---|
| node01-node12 | 4 GB |
| node13-node24 | 8 GB |
| node25-node27 | 16 GB |
| node28-node30 | 32 GB |
The head node is called "dalai.med.harvard.edu"; the compute nodes are 'node01' through 'node30'. The compute nodes sit behind the head node on the network, in the sense that they are reachable only from dalai. However, dalai acts as a router for the compute nodes, so it is possible to connect directly from the compute nodes to hosts on the wider internet.
| Purpose | Program |
|---|---|
| Telnet equivalent for PC users | PuTTY |
| FTP equivalent for PC users | WinSCP |
| Telnet equivalent for Mac users | MacSSH |
| FTP equivalent for Mac users | F-Secure SSH (not free, but a demo is available) |
| Telnet equivalent for Linux users | ssh (from the command line) |
| FTP equivalent for Linux users | scp (or sftp) (from the command line) |
These are only suggestions; other applications that support SSH2 may also work.
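For Linux or Mac command-line users, the login and file-transfer steps from the table might look like the sketch below. The account name `alice` and the function names are placeholders invented for illustration; only the host name comes from this guide.

```shell
# Placeholder account name (an assumption); replace with your own dalai login.
DALAI_USER="alice"
# Head node name as given in this guide.
DALAI_HOST="dalai.med.harvard.edu"

# Open a remote shell on the head node (the telnet replacement).
dalai_login() {
    ssh "${DALAI_USER}@${DALAI_HOST}"
}

# Copy a local file into your home directory on dalai (the FTP replacement).
# $1: path to a local file
dalai_upload() {
    scp "$1" "${DALAI_USER}@${DALAI_HOST}:"
}
```

With these defined, `dalai_upload data.tar.gz` followed by `dalai_login` would copy a file over and then open a session on the head node.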
A scratch directory exists on each of the compute nodes. When your application requires large files or frequent file access, performance will be improved by placing these files in the scratch space. You may not know on which compute node your application will be launched, so you may need to copy required files to all nodes simultaneously using the "rdist" command.
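One way to push a file to the scratch space on every node is to loop over the node names from the head node. This is only a sketch: the scratch path `/scratch` is an assumption (this guide does not name the directory), the function names are hypothetical, and it calls `rdist -c` once per node rather than using a distfile.

```shell
# Print the compute node names node01 .. node30, one per line.
list_nodes() {
    i=1
    while [ "$i" -le 30 ]; do
        printf 'node%02d\n' "$i"
        i=$((i + 1))
    done
}

# Assumed scratch location; check the real path on a node before use.
SCRATCH_DIR="${SCRATCH_DIR:-/scratch}"

# Copy one file to the scratch directory on every compute node.
# $1: path to a file on dalai
copy_to_all_scratch() {
    for node in $(list_nodes); do
        rdist -c "$1" "${node}:${SCRATCH_DIR}/$(basename "$1")" \
            || echo "copy to ${node} failed" >&2
    done
}
```

Run on dalai, `copy_to_all_scratch big.dat` would attempt the copy on each node in turn and report any node where it fails.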
NOTE: The 'qsub' command is part of SGE, but it is fairly primitive, which causes a number of problems. The 'qqsub' command was written at the Roth Lab to overcome those problems. It is highly recommended that you use 'qqsub' in preference to 'qsub'. Throughout this guide, we will refer only to qqsub, since it is hard to conceive of circumstances where you would choose to use the more basic 'qsub'. The syntax is almost identical: qqsub accepts all the same arguments as qsub. For details (including a list of all the problems that qqsub helps you avoid), type 'man qqsub' at the shell prompt. For an example of how to submit a job to the queue, try submitting the job "sleeptest.sh".
qqsub will assume the options are for it, and you will likely get errors.
"qqsub" and "qstat" are pretty straightforward features of SGE. For more information, see the SGE manuals as well as the man pages for individual programs.
If you do not like the defaults supplied by "qqsub", you can override them by specifying new values using embedded #$ commands just as you would with 'qsub'. See 'man qsub' for details on how to do this.
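As a sketch of embedded '#$' commands, a small job script might look like the one below. It is a hypothetical stand-in, not the actual "sleeptest.sh" on dalai, and the job name and log file name are made up for illustration. The `-N`, `-cwd`, `-j` and `-o` options are standard qsub options described in 'man qsub'.

```shell
#!/bin/sh
# Embedded SGE options: each "#$" line is read as if given on the command line.
#$ -N sleepdemo
#$ -cwd
#$ -j y
#$ -o sleepdemo.log

# How long to sleep; defaults to 3 seconds (an arbitrary illustrative value).
SLEEP_SECONDS="${SLEEP_SECONDS:-3}"

echo "job started on $(uname -n)"
sleep "$SLEEP_SECONDS"
echo "job finished after ${SLEEP_SECONDS}s"
```

Saved as, say, `sleepdemo.sh`, it would be submitted with `qqsub sleepdemo.sh`. The options here name the job, run it in the submission directory, and merge standard error into a single log file.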
One last thing: there is a FAQ ('frequently asked questions') for dalai. You may want to check it out if you run into problems getting your program to run on dalai (most people do, that's where the FAQ came from!).
To start an interactive session, run 'qqsub' with the '-I' flag. (Interactive
mode doesn't allow any program as an argument, since it starts a shell
on the remote node.) You'll see some informational messages, and
after a few seconds you'll be connected to one of the compute nodes. Here's
an example:
dmorgan@dalai:~/MyProjects$ qqsub -I
Establishing an interactive session on SGE.
This may take several seconds, depending on system load...
local configuration dalai not defined - using global configuration
Your job 748511 ("Interactive") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 748511 has been successfully scheduled.
Establishing /d0/sge_test/ql.sh session to host node15 ...
Linux node15 2.6.20-17-server #2 SMP Mon Jun 9 19:26:46 UTC 2008 x86_64
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
You have mail.
Last login: Thu Apr 3 09:39:02 2008
dmorgan@node15:~$
Any commands I type at this point are executed on node15, not on dalai.
This allows me to run interactive jobs (debugging springs to
mind; there are probably other examples) on a compute node. This way I don't
have to worry about hogging resources on the head node. And it's better than
logging in directly to the node, because SGE keeps track of what's
free and sends you there automatically.
When you're done, you can quit the interactive session by typing
'exit', which will terminate the job, with this message:
dmorgan@node15:~$ exit
logout
Connection to node15 closed.
/d0/sge_test/ql.sh exited with exit code 0
You are encouraged to use this feature whenever you have programs to debug, or have some kind of interactive work to do (e.g., run a two-minute job, check results, change parameters, run again; check results, change params, run again; etc.). That way we can save the head node for its intended purpose: to control the other nodes. There is one caveat: leaving an interactive job like this running will tie up the node, since SGE allows only two (or four, depending on the node) concurrent jobs per node. Consideration for your fellow dalai-users is probably the best motivator here: "do unto others..." Lastly, you can see which jobs are interactive (see Viewing results) by looking for the word 'Interactiv' in the output of qstat.
| Queue | Member Nodes |
|---|---|
| guest | node01-node30 |
| guest-4G | node01-node12 |
| guest-8G | node13-node24 |
| guest-16G | node25-node27 |
| guest-32G | node28-node30 |
| rothlab | node01-node30 |
| rothlab-4G | node01-node12 |
| rothlab-8G | node13-node24 |
| rothlab-16G | node25-node27 |
| rothlab-32G | node28-node30 |
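If the queue suffixes track the per-node memory sizes listed in the Nodes and Memory table at the top of this guide (an assumption; the guide does not say so explicitly), picking the smallest queue that fits a job could be sketched as follows. The function name `queue_for_memory` is hypothetical and is not part of SGE or qqsub.

```shell
# Print the smallest memory-specific queue whose nodes have at least
# the requested amount of memory.
# $1: queue family ("guest" or "rothlab")
# $2: memory needed, in whole GB
queue_for_memory() {
    family="$1"
    need_gb="$2"
    for size in 4 8 16 32; do
        if [ "$need_gb" -le "$size" ]; then
            echo "${family}-${size}G"
            return 0
        fi
    done
    echo "no queue has ${need_gb} GB per node" >&2
    return 1
}
```

For example, `qqsub myProg -q "$(queue_for_memory guest 10)"` would target the 'guest-16G' queue, leaving the 32 GB nodes free for jobs that need them.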
Type 'groups' at the UNIX prompt to see which UNIX groups you're a member of. By default, qqsub will submit the jobs of users who are in the 'rothlab' group to the 'rothlab' queue. It will submit the jobs of users who are NOT in that group to the 'guest' queue. (There is no 'guest' group.)
If you're in the 'rothlab' group and wish to submit your job to the
'guest' queue, you need to add '-q guest' to qqsub on the command
line. Non-members of the 'rothlab' group may submit only to the 'guest'
queues. Specifying the name of your default queue will not cause any
harm. An example should illustrate (attempting to submit program
'myProg' to SGE):
For users in the 'rothlab' group:
| Command | Result |
|---|---|
| qqsub myProg | no queue specified, submitted to 'rothlab' (default) queue |
| qqsub myProg -q rothlab | default queue specified, submitted to 'rothlab' queue, same as above |
| qqsub myProg -q guest | non-default queue, submitted to 'guest' queue |
For users not in the 'rothlab' group:
| Command | Result |
|---|---|
| qqsub myProg | no queue specified, submitted to 'guest' (default) queue |
| qqsub myProg -q guest | default queue specified, submitted to 'guest' queue, same as above |
| qqsub myProg -q rothlab | non-default queue; the job is queued but never scheduled, due to lack of permissions |
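The default-queue rule described above can be mirrored in a few lines of shell, which may help when checking where a job is likely to land before submitting it. This is a sketch of the rule as stated in this guide, not qqsub's actual code; the function name `default_queue` is hypothetical.

```shell
# Print the queue that the default rule would select.
# $1 (optional): space-separated group names; defaults to the output of `id -Gn`.
default_queue() {
    user_groups="${1:-$(id -Gn)}"
    case " ${user_groups} " in
        *" rothlab "*) echo "rothlab" ;;
        *)             echo "guest" ;;
    esac
}
```

Calling `default_queue` with no argument checks the groups of the current user, the same information that the 'groups' command prints.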
You can monitor the queues in a number of ways:
| Command | What it shows |
|---|---|
| qstat -f | Lists each queue, showing number of jobs allowed ('Max'), queued ('Que'), running ('Run') |
| qstat | Shows status of each job running; the rightmost column indicates the queue. |
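Two small wrappers around qstat can narrow its output to what you usually want to see. They are a sketch with hypothetical function names: `-u` is a standard SGE qstat option, and the second function relies on the job name 'Interactiv' that this guide says qstat shows for interactive sessions.

```shell
# List only your own jobs.
my_jobs() {
    qstat -u "${USER:-$(id -un)}"
}

# List interactive sessions by matching the job name qstat shows for them.
# Returns non-zero when no interactive job is found (grep's exit status).
interactive_jobs() {
    qstat | grep 'Interactiv'
}
```

Running `interactive_jobs` before logging out is a quick way to check that you have not left an interactive session tying up a node.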
If you have any trouble using qqsub, please submit a support request.
Enjoy!
This page was developed by Frank Gibbons and last modified by the West Quad Computing Group on 28 October 2008.