Within a single organization, divisions, departments, or sites may have separate LSF clusters managed independently. Many organizations have found it desirable to allow these clusters to cooperate to reap the benefits of global load sharing:
LSF MultiCluster enables a large organization to form multiple cooperating clusters of computers so that load sharing happens not only within the clusters but also among them. It enables load sharing across large numbers of hosts, allows resource ownership and autonomy to be enforced, supports non-shared user accounts and file systems, and takes communication limitations among the clusters into consideration in job scheduling.
LSF MultiCluster is a separate product in the LSF V3.0 suite. You must obtain a specific license for LSF MultiCluster before you can use it.
This chapter describes the configuration and operational details of LSF MultiCluster.
If you are going to create a new cluster that is completely independent of the current cluster, follow the procedures described in 'Installation'.
If you are going to create a cluster that will possibly share resources with an existing cluster, follow the procedures below. To create a new LSF cluster, use the lsfsetup command from the distribution directory.
You now must configure the cluster and set up all the hosts, as described beginning with 'Initial Configuration'.
The following steps should be followed to enable the sharing of load information, interactive tasks and batch jobs between clusters:
The LIM configuration files lsf.shared and lsf.cluster.cluster (stored in LSF_CONFDIR) are affected by multicluster operation. For sharing to take place between clusters, they must share common definitions in terms of host types, models, and resources. For this reason, the lsf.shared file must be the same on each cluster. This can be accomplished by putting it into a shared file system or replicating it across all clusters.
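If the clusters do not mount a shared file system, one simple approach is to copy lsf.shared to the other clusters whenever it changes. A minimal sketch, assuming the destination host and path shown here (both are illustrative; adjust them for your installation):

% rcp $LSF_CONFDIR/lsf.shared hostA:/usr/local/lsf/conf/lsf.shared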
Each LIM reads the lsf.shared file and its own lsf.cluster.cluster file. All information about a remote cluster is retrieved dynamically by the master LIMs of each cluster communicating with each other. However, before this can occur, a master LIM must know the names of at least some of the LSF server hosts in each remote cluster with which it will interact. The names of the servers in a remote cluster are used to locate the current master LIM on that cluster as well as to ensure that any remote master is a valid host for that cluster. The latter is necessary to ensure security and prevent a bogus LIM from interacting with your cluster.
The lsf.shared file in LSF_CONFDIR should list the names of all clusters. For example:
Begin Cluster
ClusterName
clus1
clus2
End Cluster
The LIM will read the lsf.cluster.cluster file in LSF_CONFDIR for each remote cluster and save the first 10 host names listed in the Host section. These will be considered as valid servers for that cluster, that is, one of these servers must be up and running as the master.
If LSF_CONFDIR is not shared or replicated then it is necessary to specify a list of valid servers in each cluster using the option Servers in the Cluster section. For example:
Begin Cluster
ClusterName   Servers
clus1         (hostC hostD hostE)
clus2         (hostA hostB hostF)
End Cluster
The hosts listed in the Servers column are the contact points through which LIMs in remote clusters reach the local cluster. One of the hosts listed in the Servers column must be up and running as the master for other clusters to be able to contact the local cluster.
To enable the multicluster feature, insert the following section into the lsf.cluster.cluster file:
Begin Parameters
FEATURES=lsf_base lsf_mc lsf_batch
End Parameters
Note
The license file must support the LSF MultiCluster feature. If you have configured the cluster to run LSF MultiCluster on all hosts, and the license file does not contain the LSF MultiCluster feature, then the hosts will be unlicensed, even if you have valid licenses for other LSF components. See 'Setting Up the License Key' for more details.
By default the local cluster can obtain information about all other clusters specified in lsf.shared. However, if the local cluster is only interested in certain remote clusters, you can add a RemoteClusters section to lsf.cluster.cluster to restrict which remote clusters it interacts with. For example:
Begin RemoteClusters
CLUSTERNAME
clus3
clus4
End RemoteClusters
This means local applications will not know anything about clusters other than clusters clus3 and clus4. Note that this also affects the way RES behaves when it is authenticating a remote user. Remote execution requests originating from users outside of these clusters are rejected. The default behaviour is to accept any request from all the clusters in lsf.shared.
The RemoteClusters section may be used to specify the following parameters associated with each cluster in addition to the CLUSTERNAME parameter.
Load and host information is requested periodically from the remote cluster and cached by the local master LIM. Clients in the local cluster receive the cached copy of the remote cluster information. The CACHE_INTERVAL parameter controls how long, in seconds, load information from the remote cluster is cached. The default is 60 seconds. When a command makes a request, the cached information is used if it is less than CACHE_INTERVAL seconds old; otherwise, fresh information is retrieved from the relevant remote cluster by the local master LIM and returned to the user. Host information is cached twice as long as load information.
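For example, the following RemoteClusters excerpt (the cluster name and interval here are only illustrative) would cache load information from clus3 for 90 seconds, and host information for 180 seconds:

Begin RemoteClusters
CLUSTERNAME   CACHE_INTERVAL
clus3         90
End RemoteClusters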
The LSF utilities such as lsload, lshosts, lsplace, and lsrun normally only return information about the local cluster. To get information about, or run tasks on, hosts in remote clusters, you must explicitly specify a cluster name (see the sections below). To make resources in remote clusters as transparent as possible to the user, you can use the EQUIV parameter to specify a remote cluster as being equivalent to the local cluster. The master LIM will then consider all equivalent clusters when servicing requests from clients for load, host, or placement information, so you do not have to explicitly specify remote cluster names. For example, lsload will list hosts of the local cluster as well as of the equivalent remote clusters.
By default, if two clusters are configured to access each other's load information, they also accept interactive jobs from each other. If you want your cluster to access load information of another cluster but not to accept interactive jobs from the other cluster, you set RECV_FROM to 'N'. Otherwise, set RECV_FROM to 'Y'.
For cluster clus1, clus2 is equivalent to the local cluster. Load information is refreshed every 30 seconds. However, clus1 rejects interactive jobs from clus2.
# Excerpt of lsf.cluster.clus1
Begin RemoteClusters
...
CLUSTERNAME  EQUIV  CACHE_INTERVAL  RECV_FROM
clus2        Y      30              N
End RemoteClusters
Cluster clus2 does not treat clus1 as equivalent to the local cluster. Load information is refreshed every 45 seconds. Interactive jobs from clus1 are accepted.
# Excerpt of lsf.cluster.clus2
Begin RemoteClusters
...
CLUSTERNAME  EQUIV  CACHE_INTERVAL  RECV_FROM
clus1        N      45              Y
End RemoteClusters
By default, root access across clusters is not allowed. To allow root access from a remote cluster, specify LSF_ROOT_REX=all in lsf.conf. This implies that root jobs from both the local and remote clusters are accepted. This applies to both interactive and batch jobs.
If you want clusters clus1 and clus2 to allow root execution for local jobs only, insert the line LSF_ROOT_REX=local into the lsf.conf of both clusters. However, if you want clus2 to also allow root execution from any cluster, change the line in the lsf.conf of cluster clus2 to LSF_ROOT_REX=all.
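The lsf.conf settings for this example would be:

# lsf.conf of cluster clus1: accept root jobs from the local cluster only
LSF_ROOT_REX=local

# lsf.conf of cluster clus2: accept root jobs from any cluster
LSF_ROOT_REX=all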
Note
The lsf.conf file is host-type specific and is not shared across different platforms. You must make sure that the lsf.conf files for all your host types are changed consistently.
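A quick way to verify consistency is to compare the setting in each platform's copy of the file; the mount-point paths below are only illustrative and will differ at your site:

% grep LSF_ROOT_REX /usr/local/lsf/mnt/sparc-sol2/etc/lsf.conf \
      /usr/local/lsf/mnt/hppa-hpux10/etc/lsf.conf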
To enable batch jobs to flow across clusters, use the SNDJOBS_TO and RCVJOBS_FROM keywords in the queue definitions of the lsb.queues file.
Begin Queue
QUEUE_NAME=normal
SNDJOBS_TO=Queue1@Cluster1 Queue2@Cluster2 ... QueueN@ClusterN
RCVJOBS_FROM=Cluster1 Cluster2 ... ClusterN
PRIORITY=30
NICE=20
End Queue
Note
You do not specify a remote queue in the RCVJOBS_FROM parameter. The administrator of the remote cluster determines which queues will forward jobs to the normal queue in this cluster.
It is up to you and the administrators of the remote clusters to ensure that the policies of the local and remote queues are equivalent in terms of the scheduling behaviour seen by users' jobs.
If SNDJOBS_TO is defined in a queue, the LSF Batch daemon will first try to match jobs submitted to the queue with hosts in the local cluster. If not enough job slots can be found to run the jobs, mbatchd in the local cluster negotiates with the mbatchd daemons in the remote clusters defined by the SNDJOBS_TO parameter for possible remote execution. If suitable hosts in a remote cluster are identified by a remote mbatchd, jobs are forwarded to that cluster for execution. The status of remotely executed jobs is automatically forwarded to the submission cluster so that users can still view job status as if the jobs were running in the local cluster.
If you want to set up a queue that will forward jobs to remote clusters but will not run any jobs in the local cluster, you can use the scheduling thresholds to prevent local execution. For example, you can set the loadSched threshold for the MEM index to 10000 (assuming no local host has more than 10G of available memory). Suppose you have a queue remote_only in cluster clus1:
Begin Queue
QUEUE_NAME=remote_only
SNDJOBS_TO=testmc@clus2
MEM=10000/10000
PRIORITY=30
NICE=20
End Queue
Any jobs submitted to the queue remote_only will be forwarded to the queue testmc in cluster clus2.
For clus2, specify the queue testmc as follows:
Begin Queue
RCVJOBS_FROM = clus1
QUEUE_NAME   = testmc
PRIORITY     = 55
NICE         = 10
DESCRIPTION  = Multicluster Queue
End Queue
When accepting a job with a pre-execution command from a remote cluster, the local cluster can limit the number of times it attempts the pre-execution command before returning the job to the submission cluster. The submission cluster forwards the job to one cluster at a time. The maximum number of times a remote job's pre-execution command is retried is controlled by setting MAX_PREEXEC_RETRY in lsb.params.
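For example, a hypothetical lsb.params excerpt that gives up after five attempts and returns the job to the submission cluster (the value 5 is only illustrative):

Begin Parameters
MAX_PREEXEC_RETRY = 5
End Parameters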
The information collected by LIMs on remote clusters can be viewed locally. The list of clusters and associated resources can be viewed with the lsclusters command.
% lsclusters
CLUSTER_NAME  STATUS  MASTER_HOST  ADMIN  HOSTS  SERVERS
clus2         ok      hostA        user1      3        3
clus1         ok      hostC        user1      3        3
If you have defined EQUIV to be 'Y' for cluster clus2 in your lsf.cluster.clus1 file, you will see all hosts in cluster clus2 if you run lsload or lshosts from cluster clus1. For example:
% lshosts
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostA      NTX86   PENT200   10.0      1     64M    100M     Yes  (pc nt)
hostF      HPPA    HP735     14.0      1     58M     94M     Yes  (hpux cs)
hostB      SUN41   SPARCSLC   8.0      1     15M     29M     Yes  (sparc bsd)
hostD      HPPA    A900      30.0      4    264M    512M     Yes  (hpux cs bigmem)
hostE      SGI     ORIGIN2K  36.0     32    596M   1024M     Yes  (irix cs bigmem)
hostC      SUNSOL  SunSparc  12.0      1     56M     75M     Yes  (solaris cs)
You can use a cluster name in place of a host name to get information specific to a cluster. For example:
% lshosts clus1
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostD      HPPA    A900      30.0      4    264M    512M     Yes  (hpux cs bigmem)
hostE      SGI     ORIGIN2K  36.0     32    596M   1024M     Yes  (irix cs bigmem)
hostC      SUNSOL  SunSparc  12.0      1     56M     75M     Yes  (solaris cs)
% lshosts clus2
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostA      NTX86   PENT200   10.0      1     64M    100M     Yes  (pc nt)
hostF      HPPA    HP735     14.0      1     58M     94M     Yes  (hpux cs)
hostB      SUN41   SPARCSLC   8.0      1     15M     29M     Yes  (sparc bsd)
% lsload clus1 clus2
HOST_NAME  status   r15s  r1m  r15m  ut     pg   ls  it  tmp   swp   mem
hostD      ok        0.2  0.3   0.4  19%   6.0    6   3  146M  319M   52M
hostC      ok        0.1  0.0   0.1   1%   0.0    3  43   63M   44M    7M
hostA      ok        0.3  0.3   0.4  35%   0.0    3   1   40M   42M   10M
hostB      busy     *1.3  1.1   0.7  68%  *57.5   2   4   18M   25M    8M
hostE      lockU     1.2  2.2   2.6  30%   5.2   35   0   10M  293M  399M
hostF      unavail
LSF commands lshosts, lsload, lsmon, lsrun, lsgrun, and lsplace can accept a cluster name in addition to host names.
The lsrun and lslogin commands can be used to run interactive jobs both within and across clusters. See 'Running Batch Jobs across Clusters' in the LSF User's Guide for examples.
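For example, assuming clus2 is configured to accept interactive jobs from the local cluster, a user in clus1 can run a task on the best available host of clus2 by giving the cluster name (the host chosen in the output is illustrative):

% lsrun -m clus2 hostname
hostA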
You can configure the multicluster environment so that one cluster accepts interactive jobs from the other cluster, but not vice versa. For example, to make clus1 reject interactive jobs from clus2, you need to specify the RECV_FROM field in the file lsf.cluster.clus1:
Begin RemoteClusters
CLUSTERNAME  EQUIV  CACHE_INTERVAL  RECV_FROM
clus2        Y      30              N
End RemoteClusters
When a user in clus2 attempts to use the cluster clus1, an error will result. For example:
% lsrun -m clus1 -R - hostname
ls_placeofhosts: Not enough host(s) currently eligible
Cluster clus2 will not place any jobs on clus1, so lsrun returns an error indicating that it could not find enough eligible hosts.
% lsrun -m hostC -R - hostname
ls_rsetenv: Request from a non-LSF host rejected
In this case, the job request is sent directly to the host hostC, and the RES on hostC rejects it because the submitting host is not considered a valid LSF host.
Note
RECV_FROM only controls accessibility of interactive jobs. It does not affect jobs submitted to LSF Batch.
You can configure a queue to send jobs to a queue in a remote cluster. Jobs submitted to the local queue can automatically get sent to remote clusters. The following commands can be used to get information about multiple clusters:
The bclusters command displays a list of queues together with their relationship with queues in remote clusters.
% bclusters
LOCAL_QUEUE  JOB_FLOW  REMOTE  CLUSTER  STATUS
testmc       send      testmc  clus2    ok
testmc       recv      -       clus2    ok
The JOB_FLOW field describes whether the local queue is to send jobs to, or receive jobs from, a remote cluster.
If the value of JOB_FLOW is send (that is, SNDJOBS_TO is defined in the local queue), the REMOTE field indicates a queue name in the remote cluster. If the remote queue does not have RCVJOBS_FROM defined to accept jobs from this cluster, the status field will never be ok; it will be either disc or reject. disc means that communication between the two clusters has not yet been established. This can occur if there are no jobs waiting to be dispatched or the remote master cannot be located. If the remote cluster agrees to accept jobs from the local queue and communication has been successfully established, the status will be ok.
If the value of JOB_FLOW is recv (that is, RCVJOBS_FROM is defined in the local queue), the REMOTE field is always '-'. The CLUSTER field then indicates the cluster name from which jobs will be accepted. The status field will be ok if a connection with the remote cluster has been established.
% bclusters
LOCAL_QUEUE  JOB_FLOW  REMOTE  CLUSTER  STATUS
testmc       send      testmc  clus2    disc
testmc       recv      *       clus2    disc
The -m host_name option of bqueues can also take a cluster name to display the queues in a remote cluster.
% bqueues -m clus2
QUEUE_NAME   PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
fair         3300  Open:Active    5     -     -     -      0     0    0     0
interactive  1055  Open:Active    -     -     -     -      0     0    0     0
testmc         55  Open:Active    -     -     -     -      5     2    2     1
priority       43  Open:Active    -     -     -     -      0     0    0     0
The bjobs command can display the cluster name in the FROM_HOST and EXEC_HOST fields. The format of these fields can be 'host@cluster' to indicate which cluster the job originated from or was forwarded to. Use the -w option to get the full cluster name. To query the jobs in a specific cluster, use the -m option and specify the cluster name.
% bjobs
JOBID  USER   STAT   QUEUE   FROM_HOST    EXEC_HOST    JOB_NAME  SUBMIT_TIME
101    user7  RUN    testmc  hostC        hostA@clus2  simulate  Oct  8 18:32
102    user7  USUSP  testmc  hostC        hostB@clus2  simulate  Oct  8 18:56
104    user7  RUN    testmc  hostA@clus2  hostC        verify    Oct  8 19:20
% bjobs -m clus2
JOBID  USER   STAT   QUEUE   FROM_HOST    EXEC_HOST    JOB_NAME  SUBMIT_TIME
521    user7  RUN    testmc  hostC@clus1  hostA        simulate  Oct  8 18:35
522    user7  USUSP  testmc  hostC@clus1  hostA        simulate  Oct  8 19:23
520    user7  RUN    testmc  hostA        hostC@clus1  verify    Oct  8 19:26
Note that jobs forwarded to a remote cluster are assigned new job IDs there. You only need to use the local job ID when manipulating jobs. The SUBMIT_TIME field displays the real job submission time for local jobs, and the job forwarding time for jobs from remote clusters.
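For example, to kill the forwarded job shown above you would use its local job ID; the response line here is representative of what bkill prints:

% bkill 101
Job <101> is being terminated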
To view the hosts of a specific cluster you can use a cluster name in place of a host name.
% bhosts clus2
HOST_NAME  STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA      ok         -   10      1    1      0      0    0
hostB      ok         -   10      1    1      0      0    0
hostF      closed     -    3      3    3      0      0    0
The bhist command displays the history of a job, including events recording when it was forwarded to, or accepted from, another cluster.
% bhist -l 101

Job Id <101>, User <user7>, Project <default>, Command <simulate>
Tue Oct 08 18:32:11: Submitted from host <hostC> to Queue <testmc>, CWD
                     <$HOME/homes/user7>, Requested Resources <type!=ALPHA>;
Tue Oct 08 18:35:07: Forwarded job to cluster clus2;
Tue Oct 08 18:35:25: Dispatched to <hostA>;
Tue Oct 08 18:35:35: Running with execution home </homes/user7>, Execution CWD
                     <//homes/user7>, Execution Pid <25212>;
Tue Oct 08 20:30:50: USER suspend action initiated (actpid 25672);
Tue Oct 08 20:30:50: Suspended by the user or administrator.

Summary of time in seconds spent in various states by Tue Oct 08 20:35:24 1996
  PEND   PSUSP  RUN    USUSP  SSUSP  UNKWN  TOTAL
  176    0      6943   274    0      0      7393
By default, LSF assumes a uniform user name space within a cluster and between clusters. It is not uncommon for an organization to fail to satisfy this assumption. Support for non-uniform user name spaces between clusters is provided for the execution of batch jobs.
The .lsfhosts file used to support account mapping can be used to specify cluster names in place of host names.
For example, a user has accounts on two clusters, clus1 and clus2. On cluster clus1 the user name is 'userA', and on clus2 the user name is 'user_A'. To run jobs in either cluster under the appropriate user name, the .lsfhosts files should be set up as follows:
% cat ~userA/.lsfhosts
clus2 user_A
% cat ~user_A/.lsfhosts
clus1 userA
For another example, a user has the account 'userA' on cluster clus1 and wants to use the 'lsfguest' account when running jobs on cluster clus2. The .lsfhosts files should be set up as follows:
% cat ~userA/.lsfhosts
clus2 lsfguest send
% cat ~lsfguest/.lsfhosts
clus1 userA recv
In the third example, a site has two clusters, clus1 and clus2. A user has the uniform account name userB on all hosts in clus2. In clus1, this user has the uniform account name userA, except on hostX, where the account name is userA1. This user would like to use both clusters transparently.
To implement this mapping, the user should set .lsfhosts files in his home directories on different machines as follows:
On hostX of clus1:

% cat ~userA1/.lsfhosts
clus1 userA
hostX userA1
clus2 userB
On any other machine in clus1:
% cat ~userA/.lsfhosts
clus2 userB
hostX userA1
On the hosts of clus2:

% cat ~userB/.lsfhosts
clus1 userA
hostX userA1
Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.