Within a company or organization, each division, department, or site may have a separately managed LSF cluster. Many organizations want these clusters to cooperate so that they can reap the benefits of global load sharing.
LSF MultiCluster enables a large organization to form multiple cooperating clusters of computers, so that load sharing happens not only within each cluster but also among them. It enables load sharing across large numbers of hosts, allows resource ownership and autonomy to be enforced, supports non-shared user accounts and file systems, and takes communication limitations among the clusters into account in job scheduling.
The commands lshosts, lsload, and lsmon accept a cluster name as an argument, allowing you to view a remote cluster. A list of clusters and associated information can be viewed with the lsclusters command.
% lsclusters
CLUSTER_NAME  STATUS  MASTER_HOST  ADMIN  HOSTS  SERVERS
clus1         ok      hostC        user1      3        3
clus2         ok      hostA        user1      3        3

% lshosts
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostA      NTX86   PENT200   10.0      -       -       -  Yes     (NT)
hostF      HPPA    HP735     14.0      1     58M     94M  Yes     (hpux cserver)
hostB      SUN41   SPARCSLC   3.0      1     15M     29M  Yes     (sparc bsd)
hostD      HPPA    HP735     14.0      1    463M    812M  Yes     (hpux cserver)
hostE      SGI     R10K      16.0     16    896M   1692M  Yes     (irix cserver)
hostC      SUNSOL  SunSparc  12.0      1     56M     75M  Yes     (solaris cserver)

% lshosts clus1
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostD      HPPA    HP735     14.0      1    463M    812M  Yes     (hpux cserver)
hostE      SGI     R10K      16.0     16    896M   1692M  Yes     (irix cserver)
hostC      SUNSOL  SunSparc  12.0      1     56M     75M  Yes     (solaris cserver)

% lshosts clus2
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostA      NTX86   PENT200   10.0      -       -       -  Yes     (NT)
hostF      HPPA    HP735     14.0      1     58M     94M  Yes     (hpux cserver)
hostB      SUN41   SPARCSLC   3.0      1     15M     29M  Yes     (sparc bsd)

% lsload clus1 clus2
HOST_NAME  status   r15s  r1m  r15m  ut    pg     ls  it  tmp   swp   mem
hostD      ok        0.2  0.3   0.4  19%    6.0    6   3  146M  319M  252M
hostC      ok        0.1  0.0   0.1   1%    0.0    3  43   63M   44M   27M
hostA      ok        0.3  0.3   0.4  35%    0.0    3   1   40M   42M   13M
hostB      busy     *1.3  1.1   0.7  68%  *57.5    2   4   18M   20M    8M
hostE      lockU     1.2  2.2   2.6  30%    5.2   35   0   10M  693M  399M
hostF      unavail
A queue may be configured to send LSF Batch jobs to a queue in a remote cluster (see 'LSF Batch Configuration' in the LSF Administrator's Guide). When you submit a job to that local queue, it is automatically forwarded to the remote cluster.
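Queue forwarding itself is set up by the cluster administrator, not by end users. As a rough sketch only (the exact parameter names and syntax are documented in the LSF Administrator's Guide), a queue section in the lsb.queues file that both sends jobs to and receives jobs from a remote cluster named clus2 might look like:

```
Begin Queue
QUEUE_NAME   = testmc
PRIORITY     = 55
SNDJOBS_TO   = testmc@clus2   # forward jobs to queue testmc in cluster clus2
RCVJOBS_FROM = clus2          # accept jobs forwarded from cluster clus2
DESCRIPTION  = MultiCluster test queue
End Queue
```

Both clusters must agree on the pairing: the sending queue names the remote queue and cluster, and the receiving cluster's queue must accept jobs from the sender.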
The bclusters command displays a list of local queues together with their relationship with queues in remote clusters.
% bclusters
LOCAL_QUEUE  JOB_FLOW  REMOTE  CLUSTER  STATUS
testmc       send      testmc  clus2    ok
testmc       recv      -       clus2    ok
The meanings of the displayed fields are:

   LOCAL_QUEUE   the name of the local queue
   JOB_FLOW      'send' if the local queue forwards jobs to the remote
                 cluster; 'recv' if it accepts jobs from the remote cluster
   REMOTE        the name of the receiving queue in the remote cluster;
                 '-' when JOB_FLOW is 'recv'
   CLUSTER       the name of the remote cluster
   STATUS        the status of the connection between the local queue and
                 the remote cluster
In the above example, the local queue testmc can forward jobs to the testmc queue of the remote cluster clus2, and vice versa.
If there is no queue in your cluster that is configured for remote clusters, you will see the following:
% bclusters
No local queue sending/receiving jobs from remote clusters
Use the bqueues command with the -m option and a cluster name to display the queues in the remote cluster.
% bqueues -m clus2
QUEUE_NAME   PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
fair         3300  Open:Active    5     -     -     -      1     1    0     0
interactive  1055  Open:Active    -     -     -     -      1     0    1     0
testmc         55  Open:Active    -     -     -     -      5     2    2     1
priority       43  Open:Active    -     -     -     -      0     0    0     0
Use the bsub command to submit your job to the queue that sends jobs to the remote cluster.
% bsub -q testmc -J mcjob myjob
Job <101> is submitted to queue <testmc>.
The bjobs command displays the cluster name in the FROM_HOST and EXEC_HOST fields. The format of these fields is 'host@cluster', indicating the cluster from which the job originated or to which it was forwarded. To query the jobs running in another cluster, use the -m option with a cluster name.
% bjobs
JOBID  USER   STAT  QUEUE   FROM_HOST  EXEC_HOST    JOB_NAME  SUBMIT_TIME
101    user7  RUN   testmc  hostC      hostA@clus2  mcjob     Oct 19 19:41

% bjobs -m clus2
JOBID  USER   STAT  QUEUE   FROM_HOST    EXEC_HOST  JOB_NAME  SUBMIT_TIME
522    user7  RUN   testmc  hostC@clus2  hostA      mcjob     Oct 19 23:09
Note that the submission time shown by the remote cluster is the time at which the job was forwarded to that cluster.
To view the hosts of another cluster, use a cluster name in place of a host name as the argument to the bhosts command.
% bhosts clus2
HOST_NAME  STATUS   JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA      ok          -   10      1    1      0      0    0
hostB      ok          -   10      2    1      0      0    1
hostF      unavail     -    3      1    1      0      0    0
Run the bhist command to see the history of your job, including information about forwarding to another cluster.
% bhist -l 101
Job Id <101>, Job Name <mcjob>, User <user7>, Project <default>,
              Command <myjob>
Sat Oct 19 19:41:14: Submitted from host <hostC> to Queue <testmc>,
                     CWD <$HOME>;
Sat Oct 19 21:18:40: Parameters are modified to: Project <test>,
                     Queue <testmc>, Job Name <mcjob>;
Sat Oct 19 23:09:26: Forwarded job to cluster clus2;
Sat Oct 19 23:09:26: Dispatched to <hostA>;
Sat Oct 19 23:09:40: Running with execution home </home/user7>, Execution
                     CWD </home/user7>, Execution Pid <4873>;
Sun Oct 20 07:02:53: Done successfully. The CPU time used is 12981.4
                     seconds;

Summary of time in seconds spent in various states by Sun Oct 20 07:02:53 1996
 PEND   PSUSP  RUN    USUSP  SSUSP  UNKWN  TOTAL
 5846   0      28399  0      0      0      34245
The lsrun command allows you to specify a cluster name instead of a host name. When a cluster name is specified, a host is selected from the cluster. For example:
% lsrun -m clus2 -R type==any hostname
hostA
The -m option to the lslogin command can be used to specify a cluster name. This allows you to log in to the best host in a remote cluster.
% lslogin -v -m clus2
<<Remote login to hostF>>
The multicluster environment can be configured so that one cluster accepts interactive jobs from the other cluster, but not vice versa. See 'Running Interactive Jobs on Remote Clusters' in the LSF Administrator's Guide. If the remote cluster will not accept jobs from your cluster, you will get an error:
% lsrun -m clus2 -R type==any hostname
ls_placeofhosts: Not enough host(s) currently eligible
By default, LSF assumes a uniform user name space within a cluster and between clusters. Many organizations cannot satisfy this assumption. For the execution of batch jobs, LSF supports non-uniform user name spaces between clusters: the .lsfhosts file used for account mapping can contain cluster names in place of host names.
For example, suppose you have accounts on two clusters, clus1 and clus2. In clus1 your user name is 'user1', and in clus2 your user name is 'ruser_1'. To run your jobs in either cluster under the appropriate user name, set up your .lsfhosts files as follows:
% cat ~user1/.lsfhosts
clus2 ruser_1

% cat ~ruser_1/.lsfhosts
clus1 user1
As another example, suppose you have the account 'user1' on cluster clus1 and want to use the 'lsfguest' account when sending jobs to run on cluster clus2. Set up the .lsfhosts files as follows:
% cat ~user1/.lsfhosts
clus2 lsfguest send

% cat ~lsfguest/.lsfhosts
clus1 user1 recv
The other features of the .lsfhosts file also work in the multicluster environment. See 'User Controlled Account Mapping' for further details. Also see 'Account Mapping Between Clusters' in the LSF Administrator's Guide.
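Host-level and cluster-level entries can appear together in the same .lsfhosts file. As an illustrative sketch only (hostX and ruser_x are hypothetical names, not taken from the examples above), a user might map one account for a particular host and another for all jobs sent to a remote cluster:

```
% cat ~user1/.lsfhosts
hostX  ruser_x
clus2  lsfguest  send
```

A cluster-level entry applies to every host in that cluster, so it saves listing each remote host individually.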
Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.