This chapter describes how to use some of the basic features of LSF. After working through the examples in this chapter, you should be able to use LSF for most everyday tasks.
Configuration options shown in the following examples, such as host types and model names, host CPU factors (representing relative processor speed), and resource names, are examples only; your system likely has different values for these settings.
Cluster information includes the cluster master host, cluster name, cluster resource definitions, cluster administrator, etc.
LSF provides tools for users to get information about the system. The first command you want to use when you learn LSF is lsid. This command tells you the version of LSF, the name of your LSF cluster, and the current master host.
% lsid
LSF 3.0, Dec 10, 1996
Copyright 1992-1996 Platform Computing Corporation
My cluster name is test_cluster
My master name is hostA
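If you write scripts around LSF, you may want to capture the cluster name or master host reported by lsid. The following is a minimal sketch that assumes the two "My ... name is" lines shown above; lsid_cluster and lsid_master are hypothetical helper functions, not LSF commands, and other LSF versions may word these lines differently.

```sh
#!/bin/sh
# Sketch: extract fields from lsid output read on stdin.
# ASSUMPTION: the "My cluster name is X" / "My master name is Y" wording
# shown above; the last word of each matching line is the value.

lsid_cluster() { awk '/^My cluster name is / { print $NF }'; }
lsid_master()  { awk '/^My master name is / { print $NF }'; }

# Usage on a live cluster (requires LSF on your PATH):
#   master=$(lsid | lsid_master)
#   echo "Current master host: $master"
```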
To find out who your cluster administrator is and to see a summary of your cluster, run the lsclusters command:
% lsclusters
CLUSTER_NAME   STATUS   MASTER_HOST   ADMIN   HOSTS   SERVERS
test_cluster   ok       hostb         lsf     6       6
If you are using the LSF MultiCluster product, the output of lsclusters also contains one line for each cluster that your local cluster is connected to.
The lsinfo command lists all the resources available in the cluster.
% lsinfo
RESOURCE_NAME  TYPE     ORDER  DESCRIPTION
r15s           Numeric  Inc    15-second CPU run queue length
r1m            Numeric  Inc    1-minute CPU run queue length (alias: cpu)
r15m           Numeric  Inc    15-minute CPU run queue length
ut             Numeric  Inc    1-minute CPU utilization (0.0 to 1.0)
pg             Numeric  Inc    Paging rate (pages/second)
io             Numeric  Inc    Disk IO rate (Kbytes/second)
ls             Numeric  Inc    Number of login sessions (alias: login)
it             Numeric  Dec    Idle time (minutes) (alias: idle)
tmp            Numeric  Dec    Disk space in /tmp (Mbytes)
swp            Numeric  Dec    Available swap space (Mbytes) (alias: swap)
mem            Numeric  Dec    Available memory (Mbytes)
ncpus          Numeric  Dec    Number of CPUs
ndisks         Numeric  Dec    Number of local disks
maxmem         Numeric  Dec    Maximum memory (Mbytes)
maxswp         Numeric  Dec    Maximum swap space (Mbytes)
maxtmp         Numeric  Dec    Maximum /tmp space (Mbytes)
cpuf           Numeric  Dec    CPU factor
rexpri         Numeric  N/A    Remote execution priority
server         Boolean  N/A    LSF server host
irix           Boolean  N/A    IRIX UNIX
hpux           Boolean  N/A    HP_UX
solaris        Boolean  N/A    SunSolaris
cserver        Boolean  N/A    Compute Server
fserver        Boolean  N/A    File server
aix            Boolean  N/A    AIX UNIX
type           String   N/A    Host type
model          String   N/A    Host model
status         String   N/A    Host status
hname          String   N/A    Host name

TYPE_NAME
HPPA
SGI6
ALPHA
SUNSOL
RS6K
NTX86

MODEL_NAME     CPU_FACTOR
DEC3000        10.00
R10K           14.00
PENT200         6.00
IBM350          7.00
SunSparc        6.00
HP735           9.00
HP715           6.00
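The lsinfo command displays three lists of information: the resources defined in the cluster (with each resource's type, order, and description), the host types, and the host models with their CPU factors.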
The resources listed by lsinfo include built-in resources maintained by the LIM and site-specific resources configured by the LSF administrator. For a complete description of how LSF manages resources, see 'Resources'.
The host types and host models are defined by the LSF administrator. Host types represent binary-compatible hosts; all hosts of the same type can run the same executables. Host models give the relative CPU performance of different processors. In this example, your LSF cluster treats an R10K processor as being twice as fast as an IBM350 processor, because their CPU factors are 14.00 and 7.00 respectively.
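The relative speed of two host models is simply the ratio of their CPU factors. This is a minimal sketch that computes that ratio from "MODEL_NAME CPU_FACTOR" lines like the model table above; cpu_ratio is a hypothetical helper, not part of LSF.

```sh
#!/bin/sh
# Sketch: relative speed of model $1 compared with model $2.
# Input on stdin: "MODEL_NAME CPU_FACTOR" lines, as in the lsinfo model table.
# Exits with status 1 if either model is missing or $2 has a zero factor.

cpu_ratio() {
    awk -v a="$1" -v b="$2" '
        { cpuf[$1] = $2 }                         # remember each model factor
        END {
            if (!(a in cpuf) || !(b in cpuf) || cpuf[b] == 0) exit 1
            printf "%g\n", cpuf[a] / cpuf[b]      # e.g. 14.00 / 7.00
        }'
}
```

On a live cluster you could pipe lsinfo output straight into it (lsinfo | cpu_ratio R10K IBM350), provided no resource or type name in the other lists happens to match a model name.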
LSF keeps information about all hosts in the cluster. Some information is static and some is dynamic. Static information is either configured by the LSF administrator, or is a fixed property of the system. An example of static host information is the amount of RAM available to users on a host.
Dynamic host information, or load indices, is determined by the LSF system, and updated regularly. Dynamic information represents the changing resources available on the host. Examples of dynamic host information are the current CPU load and the currently available temporary file space.
A load sharing cluster may consist of hosts of differing architecture and speed. The lshosts command displays configuration information about hosts. All these parameters are defined by the LSF administrator in the LSF configuration files, or determined by the LIM directly from the system.
% lshosts
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostD      SUNSOL  SunSparc   6.0      1     64M    112M  Yes     (solaris cserver)
hostB      ALPHA   DEC3000   10.0      1     94M    168M  Yes     (alpha cserver)
hostM      RS6K    IBM350     7.0      1     64M    124M  Yes     (cserver aix)
hostC      SGI6    R10K      14.0     16   1024M   1896M  Yes     (irix cserver)
hostA      HPPA    HP715      6.0      1     98M    200M  Yes     (hpux fserver)
In this example, the host type SUNSOL represents Sun SPARC systems running Solaris, and ALPHA represents a Digital Alpha server running Digital Unix.
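To see which hosts carry a particular Boolean resource, you can read the RESOURCES column of lshosts. A minimal sketch, assuming the parenthesized resource list appears once per line as in the output above; hosts_with_resource is a hypothetical name. It matches whole words only, so asking for "serv" does not match "cserver".

```sh
#!/bin/sh
# Sketch: print hosts from saved lshosts output (stdin) whose
# parenthesized RESOURCES list contains exactly the resource $1.

hosts_with_resource() {
    awk -v res="$1" '
        NR == 1 { next }                          # skip the header line
        index($0, "(") == 0 { next }              # no resource list
        {
            r = $0
            sub(/^[^(]*\(/, "", r)                # drop text up to "("
            sub(/\).*$/, "", r)                   # drop ")" and anything after
            n = split(r, list, " ")
            for (i = 1; i <= n; i++)
                if (list[i] == res) { print $1; break }
        }'
}

# Example (requires LSF):  lshosts | hosts_with_resource cserver
```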
See 'Listing Hosts' for a complete description of the lshosts command.
The lsload command prints out current load information.
% lsload
HOST_NAME  status  r15s  r1m  r15m   ut     pg  ls   it   tmp   swp   mem
hostD      ok       0.1  0.0   0.1   2%    0.0   5    3   81M   82M   45M
hostC      ok       0.7  1.2   0.5  50%    1.1  11    0  322M  337M  252M
hostM      ok       0.8  2.2   1.4  60%   15.4   0  136   62M   57M   45M
hostA      busy    *5.2  3.6   2.6  99%  *34.4   4    0   70M   34M   18M
hostB      lockU    1.0  1.0   1.5  99%    0.8   5   33   12M   24M   23M
The first line lists the load index names, and each following line gives the load levels for one host. The r15s, r1m, and r15m fields give the CPU run queue length averaged over 15 seconds, 1 minute, and 15 minutes, respectively. The ut field gives the percentage of time the CPU is in use. pg is the paging rate in pages per second, ls is the number of login sessions, it is the idle time in minutes (the time since the last interactive user activity), tmp is the available temporary disk space in megabytes, swp is the available swap space in megabytes, and mem is the available RAM in megabytes.
The status column gives the load status of the host. A host is busy if any load index is beyond its configured threshold. When a load index is beyond its threshold, it is printed with an asterisk '*'. In the above example, hostA is busy because load indices r15s and pg are too high. The lshosts -l command shows the load thresholds.
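The asterisks make it easy to find which load indices pushed a host into busy status. This sketch assumes the single header line shown above, with every value a separate whitespace-separated field under its index name; over_threshold is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: for saved lsload output on stdin, print "HOST INDEX" for every
# value marked with a leading "*" (that is, beyond its threshold).

over_threshold() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }  # header
        {
            for (i = 3; i <= NF; i++)             # skip HOST_NAME and status
                if ($i ~ /^\*/) print $1, name[i]
        }'
}

# Example (requires LSF):  lsload | over_threshold
```

Against the sample output above this prints "hostA r15s" and "hostA pg", matching the explanation in the text.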
Hosts with ok status are listed first. The ok hosts are sorted based on CPU and memory load, with the best host listed first.
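This section does not spell out exactly how LSF weighs CPU load against memory load. The sketch below assumes one plausible ordering, lowest r1m first and then most free mem, which happens to reproduce the order of the ok hosts in the sample output; treat it as an illustration, not LSF's actual ranking. rank_ok_hosts is a hypothetical name.

```sh
#!/bin/sh
# Sketch: rank hosts whose status is "ok" in saved lsload output (stdin).
# ASSUMPTION: lowest r1m first, ties broken by most free mem. This is an
# illustrative ordering, not necessarily the one LSF itself uses.

rank_ok_hosts() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header
        $(col["status"]) == "ok" {
            mem = $(col["mem"]); sub(/M$/, "", mem)              # "45M" -> 45
            print $(col["r1m"]), mem, $1
        }' | LC_ALL=C sort -k1,1n -k2,2nr | awk '{ print $3 }'
}
```

Setting LC_ALL=C for sort keeps "0.0" and "1.2" parsed as decimal numbers regardless of the user's locale.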
The lsload command reports more load indices if the -l option is given.
The lsmon command provides an updating display of load information. The xlsmon command is an X-windows graphical display of host status and load levels in your LSF cluster.
See the lsload(1), lsmon(1), and xlsmon(1) manual pages for more information. Also see 'Displaying the Load'.
LSF supports transparent execution of jobs on all server hosts in the cluster. You can run your program on the best available host and interact with it just as if it were running directly on your workstation. Keyboard signals such as CTRL-Z and CTRL-C work as expected.
There are different ways to run jobs on a remote host. To run myjob on the best available host, enter:
% lsrun myjob
LSF automatically selects the best host that is of the same type as the local host.
If you want to run myjob on a host with specific resources, you must specify the resource requirements. For example,
% lsrun -R 'cserver && swp>100' myjob
runs myjob on a host that has the resource 'cserver' (see 'Displaying Available Resources') and has more than 100 megabytes of swap space available.
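The requirement 'cserver && swp>100' combines a static Boolean resource (reported by lshosts) with a dynamic load index (reported by lsload). The following sketch evaluates that particular shape of requirement offline, from saved output of both commands, purely to illustrate the semantics; it is not how LSF evaluates requirements internally, and candidates is a hypothetical helper that handles only "RESOURCE && swp>N". Check the lsload(1) manual page for whether your version can preview matching hosts directly.

```sh
#!/bin/sh
# Sketch: evaluate a requirement of the form "RESOURCE && swp>N" offline,
# using saved lshosts output (static Boolean resources) and saved lsload
# output (dynamic swp index, in Mbytes). candidates is hypothetical and
# handles only this one shape of requirement.

candidates() {   # usage: candidates RESOURCE MIN_SWP LSHOSTS_FILE LSLOAD_FILE
    awk -v res="$1" -v min="$2" '
        FNR == 1 {                        # header line of either file
            if (NR != FNR) for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        NR == FNR {                       # first file: lshosts output
            if (index($0, "(") == 0) next
            r = $0; sub(/^[^(]*\(/, "", r); sub(/\).*$/, "", r)
            n = split(r, list, " ")
            for (i = 1; i <= n; i++) if (list[i] == res) has[$1] = 1
            next
        }
        ($1 in has) {                     # second file: lsload output
            swp = $(col["swp"]); gsub(/[*M]/, "", swp)
            if (swp + 0 > min + 0) print $1
        }' "$3" "$4"
}
```

For example, after saving lshosts > hosts.txt and lsload > load.txt, running candidates cserver 100 hosts.txt load.txt against the sample output in this chapter prints only hostC.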
If you want to run your job on a particular host, use the -m option:
% lsrun -m hostD myjob
When you run an interactive job on a remote host, you can use most job controls as if the job were running locally. If your shell supports job control, you can suspend and resume the job and move it between the background and the foreground just as you would a local job. For a complete description, see the lsrun(1) manual page.
You can also write one-line shell scripts or csh aliases to hide the remote execution. For example:
#! /bin/sh
# Script to remote execute myjob
exec lsrun -m hostD myjob
% alias myjob "lsrun -m hostD myjob"
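The one-line script above ignores any arguments you pass to it. A minimal sketch of a wrapper that forwards its arguments unchanged with "$@" follows; run_on_host and the RUN_HOST and LSRUN environment variables are hypothetical, introduced only for this example (LSRUN lets you substitute a stand-in command when trying it without a cluster).

```sh
#!/bin/sh
# Sketch: forward all arguments to the remote job, preserving quoting.
# RUN_HOST (default hostD) and LSRUN (default lsrun) are hypothetical
# overrides added for this example; they are not LSF settings.

run_on_host() {
    ${LSRUN:-lsrun} -m "${RUN_HOST:-hostD}" "$@"
}

# Example (requires LSF): run "myjob -v input.dat" on hostD.
#   run_on_host myjob -v input.dat
```

Quoting "$@" keeps an argument such as 'two words' as a single argument to myjob, which an unquoted $* would split.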
The lstcsh shell is a load-sharing version of the tcsh command interpreter. It is compatible with csh and supports many useful extensions. csh and tcsh users can use lstcsh to send jobs to other hosts in the cluster without needing to learn any new commands. You can run lstcsh from the command line, or use the chsh command to set it as your login shell. Refer to 'Using lstcsh' for a more detailed description.
lsmake is a load-sharing, parallel version of GNU make. It is compatible with makefiles for most versions of make. lsmake uses the LSF load information to choose the best group of hosts for your make job. Targets in the makefile are processed in parallel on the chosen hosts using the LSF remote execution facilities. You do not need to modify your makefile to use lsmake. By default, lsmake chooses hosts that are all of the same type.
The following example uses the lsmake -V and -j 3 options to run on three hosts and produce verbose output:
% lsmake -V -j 3
[hostA] [hostD] [hostK]
<< Execute on local host >>
cc -O -c arg.c -o arg.o
<< Execute on remote host hostA >>
cc -O -c dev.c -o dev.o
<< Execute on remote host hostK >>
cc -O -c main.c -o main.o
<< Execute on remote host hostD >>
cc -O arg.o dev.o main.o
lsmake includes control over parallelism for recursive makes, which are often used for source code trees that are organized into subdirectories. Parallelism can also be controlled by the load on the NFS file server, so that parallel makes do not overload the server and slow everyone else down. See 'Using lsmake' for details.
LSF Batch uses some (or all) of the hosts in an LSF cluster as batch server hosts. The host list is configured by the LSF administrator. The bhosts command displays information about these hosts.
% bhosts
HOST_NAME  STATUS   JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA      ok       -       2      1    1      0      0    0
hostB      ok       -       3      2    1      0      0    1
hostC      ok       -      32     10    9      0      1    0
hostD      ok       -      32     10    9      0      1    0
hostM      unavail  -       3      3    1      1      1    0
The STATUS column gives the status of sbatchd on each host. If a host is down or its sbatchd is not running, its STATUS is 'unavail'. The JL/U column shows the maximum number of job slots a single user can use on each host at one time. MAX gives the maximum number of job slots configured for each host. The RUN, SSUSP, and USUSP columns display the number of job slots in use by running jobs, by jobs suspended by the system, and by jobs suspended by the user, respectively. The RSV column shows job slots that LSF Batch has reserved for specific jobs. The NJOBS column is the sum of the RUN, SSUSP, USUSP, and RSV columns.
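Because NJOBS is defined as the sum of RUN, SSUSP, USUSP, and RSV, you can check saved bhosts output for that relationship. A sketch assuming the column headings shown above; njobs_mismatch is a hypothetical helper that prints any host whose NJOBS differs from the sum.

```sh
#!/bin/sh
# Sketch: report hosts in saved bhosts output (stdin) whose NJOBS column
# is not RUN + SSUSP + USUSP + RSV. Prints "HOST NJOBS SUM" per mismatch.

njobs_mismatch() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header
        {
            sum = $(col["RUN"]) + $(col["SSUSP"]) + $(col["USUSP"]) + $(col["RSV"])
            if ($(col["NJOBS"]) + 0 != sum) print $1, $(col["NJOBS"]), sum
        }'
}

# Example (requires LSF):  bhosts | njobs_mismatch
```

For the sample output above every row satisfies the sum (for example, hostB: 1 + 0 + 0 + 1 = 2), so nothing is printed.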
For a more detailed description of the bhosts command see 'Batch Hosts'.
To submit a job to the LSF Batch system, use the bsub command.
For example, submit the job sleep 30, which does nothing for 30 seconds and then exits. The LSF administrator configures one queue to be the default job queue; if you submit a job without specifying a queue, the job goes to the default queue.
% bsub sleep 30
Job <1234> is submitted to default queue <normal>
In the above example, 1234 is the job ID assigned by LSF Batch to this job, and normal is the name of the default job queue.
Your batch job remains pending until all conditions for its execution are met. Each batch queue has execution conditions that apply to all jobs in the queue, and you can specify additional conditions when you submit the job.
The -m "host1 host2 ..." option specifies that the job must run on one of the specified hosts. By specifying a single host, you can force your job to wait until that host is available and then run on that host.
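Scripts that submit batch jobs usually need the job ID for later bjobs commands. This sketch extracts it from the bsub message, assuming the exact "Job <ID> is submitted ..." wording shown above; submit_job and the BSUB override variable are hypothetical, and the override exists only so the parsing can be tried without a cluster.

```sh
#!/bin/sh
# Sketch: submit a batch job and print only the job ID that bsub reports.
# BSUB is a hypothetical override (default: the real bsub); submit_job is
# not an LSF command. Returns 1 if the submission command fails.

submit_job() {
    out=$(${BSUB:-bsub} "$@") || return 1
    # bsub prints e.g. "Job <1234> is submitted to default queue <normal>"
    printf '%s\n' "$out" | sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Example (requires LSF): restrict the job to two hosts, then check on it.
#   id=$(submit_job -m "hostA hostD" sleep 30)
#   bjobs "$id"
```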
For a detailed description of the bsub command see 'Submitting Batch Jobs'.
Job queues represent different job scheduling and control policies. All jobs submitted to the same queue share the same scheduling and control policy. Each job queue can use a configured subset of the server hosts in the LSF cluster; the default is to use all server hosts.
System administrators can configure job queues to control resource access by different users and types of applications. Users select the job queue that best fits each job.
The bqueues command lists the available LSF Batch queues:
% bqueues
QUEUE_NAME  PRIO  NICE  STATUS         MAX  JL/U  JL/P  NJOBS  PEND  RUN  SUSP
owners        49    10  Open:Active      -     -     -      1     0    1     0
priority      43    10  Open:Active     10     -     -      8     5    3     0
night         40    10  Open:Inactive    -     -     -     44    44    0     0
short         35    20  Open:Active     20     -     2      4     0    4     0
license       33    10  Open:Active     40     -     -      1     1    0     0
normal        30    20  Open:Active      -     2     -      0     0    0     0
idle          20    20  Open:Active      -     2     1      2     0    0     2
A dash '-' in any entry means that the column does not apply to the row. In this example some queues have no per-queue, per-user or per-processor job limits configured, so the MAX, JL/U and JL/P entries are '-'.
You can submit jobs to a queue as long as its STATUS is Open. However, jobs are not dispatched unless the queue is Active.
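The STATUS column combines two independent states: Open (as opposed to Closed) controls whether the queue accepts new submissions, and Active (as opposed to Inactive) controls whether its jobs are dispatched. A sketch that lists queues by either state, assuming the column headings shown above; queues_by_status is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: list queues from saved bqueues output (stdin) by state.
#   queues_by_status open    -> queues that accept new submissions
#   queues_by_status active  -> queues whose jobs can be dispatched

queues_by_status() {
    awk -v want="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header
        {
            split($(col["STATUS"]), st, ":")      # e.g. "Open:Inactive"
            if (want == "open"   && st[1] == "Open")   print $1
            if (want == "active" && st[2] == "Active") print $1
        }'
}
```

With the sample output above, all seven queues are open, but night is not active, so a job submitted to night stays pending until the queue becomes active.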
The bjobs command reports the status of LSF Batch jobs. The -u all option specifies that jobs for all users should be listed; the default is to list only jobs you submitted. Running jobs are listed first. Pending jobs are listed in the order in which they will be scheduled. Jobs in high priority queues are listed before those in lower priority queues.
% bjobs -u all
JOBID  USER   STAT   QUEUE     FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
1004   user7  RUN    short     hostA      hostA      myjob0    Dec 16 09:23
1235   user2  PEND   priority  hostM                 sleep 30  Dec 11 13:55
1234   user2  SSUSP  normal    hostD      hostM      sleep 30  Dec 11 10:09
1250   user1  PEND   short     hostA                 myjob2    Dec 11 13:59
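For a quick summary, you can count jobs by state. Because JOBID, USER, and STAT always come first, STAT is the third field even on pending rows, where EXEC_HOST is empty. count_by_state is a hypothetical helper sketched under that assumption.

```sh
#!/bin/sh
# Sketch: count jobs per STAT value in saved bjobs output (stdin).
# Prints "STATE COUNT" lines sorted by state name.

count_by_state() {
    awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | LC_ALL=C sort
}

# Example (requires LSF):  bjobs -u all | count_by_state
```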
If you also want to see jobs that finished recently, enter:
% bjobs -a
This displays all of your jobs that are still in the LSF Batch system, along with jobs that finished recently.
The bjobs command has many other options. See 'Batch Jobs'. Also refer to the bjobs(1) manual page for a complete description.
You can submit your job to the LSF Batch system using the X-windows graphical user interface application xbsub as shown in Figure 3.
The xlsbatch command is another X-windows application for LSF Batch (Figure 4). You can use it to monitor host, job, and queue status, and control your jobs.
Both xbsub and xlsbatch have extensive on-line help available through the Help menu of each application.
xbsub can be started either directly from the command line or from xlsbatch using the 'Submit' button.
Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.