Within a single organization, divisions, departments, or sites may have separate LSF clusters managed independently. Many organizations have found it desirable to allow these clusters to cooperate to reap the benefits of global load sharing:
LSF MultiCluster enables a large organization to form multiple cooperating clusters of computers so that load sharing happens not only within the clusters but also among them. It enables load sharing across large numbers of hosts, allows resource ownership and autonomy to be enforced, supports non-shared user accounts and file systems, and takes communication limitations among the clusters into consideration in job scheduling.
LSF MultiCluster is a separate product in the LSF V3.0 suite. You must obtain a specific license for LSF MultiCluster before you can use it.
This chapter describes the configuration and operational details of LSF MultiCluster.
If you are going to create a new cluster that is completely independent of the current cluster, follow the procedures described in 'Installation'.
If you are going to create a cluster that will possibly share resources with an existing cluster, follow the procedures below. To create a new LSF cluster, use the lsfsetup command from the distribution directory.
You now must configure the cluster and set up all the hosts, as described beginning with 'Initial Configuration'.
The following steps should be followed to enable the sharing of load information, interactive tasks and batch jobs between clusters:
The LIM configuration files lsf.shared and lsf.cluster.cluster (stored in LSF_CONFDIR) are affected by multicluster operation. For sharing to take place between clusters, they must share common definitions in terms of host types, models, and resources. For this reason, the lsf.shared file must be the same on each cluster. This can be accomplished by putting it into a shared file system or replicating it across all clusters.
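If the clusters do not mount a shared file system, one simple approach is to copy lsf.shared to the other clusters whenever it changes. A minimal sketch, assuming the destination host and path shown here (both are illustrative; adjust them for your installation):

% rcp $LSF_CONFDIR/lsf.shared hostA:/usr/local/lsf/conf/lsf.shared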
Each LIM reads the lsf.shared file and its own lsf.cluster.cluster file. All information about a remote cluster is retrieved dynamically by the master LIMs of each cluster communicating with each other. However, before this can occur, a master LIM must know the names of at least some of the LSF server hosts in each remote cluster with which it will interact. The names of the servers in a remote cluster are used to locate the current master LIM on that cluster as well as to ensure that any remote master is a valid host for that cluster. The latter is necessary to ensure security and prevent a bogus LIM from interacting with your cluster.
The lsf.shared file in LSF_CONFDIR should list the names of all clusters. For example:
Begin Cluster
ClusterName
clus1
clus2
End Cluster
The LIM will read the lsf.cluster.cluster file in LSF_CONFDIR for each remote cluster and save the first 10 host names listed in the Host section. These will be considered as valid servers for that cluster, that is, one of these servers must be up and running as the master.
If LSF_CONFDIR is not shared or replicated then it is necessary to specify a list of valid servers in each cluster using the option Servers in the Cluster section. For example:
Begin Cluster
ClusterName   Servers
clus1         (hostC hostD hostE)
clus2         (hostA hostB hostF)
End Cluster
The hosts listed in the Servers column are the contact points through which LIMs in remote clusters reach the local cluster. One of the hosts listed in the Servers column must be up and running as the master for other clusters to be able to contact the local cluster.
To enable the multicluster feature, insert the following section into the lsf.cluster.cluster file:
Begin Parameters
FEATURES=lsf_base lsf_mc lsf_batch
End Parameters
Note
The license file must support the LSF MultiCluster feature. If you have configured the cluster to run LSF MultiCluster on all hosts, and the license file does not contain the LSF MultiCluster feature, then the hosts will be unlicensed, even if you have valid licenses for other LSF components. See 'Setting Up the License Key' for more details.
By default the local cluster can obtain information about all other clusters specified in lsf.shared. However, if the local cluster is only interested in certain remote clusters, you can add a RemoteClusters section to lsf.cluster.cluster to restrict which remote clusters it interacts with. For example:
Begin RemoteClusters
CLUSTERNAME
clus3
clus4
End RemoteClusters
This means local applications will not know anything about clusters other than clusters clus3 and clus4. Note that this also affects the way RES behaves when it is authenticating a remote user. Remote execution requests originating from users outside of these clusters are rejected. The default behaviour is to accept any request from all the clusters in lsf.shared.
The RemoteClusters section may be used to specify the following parameters associated with each cluster in addition to the CLUSTERNAME parameter.
Load and host information is requested periodically from the remote cluster and cached by the local master LIM. Clients in the local cluster receive the cached copy of the remote cluster information. The CACHE_INTERVAL parameter controls how long, in seconds, load information from the remote cluster is cached. The default is 60 seconds. When a command makes a request, the cached information is used if it is less than CACHE_INTERVAL seconds old; otherwise, fresh information is retrieved from the relevant remote cluster by the local master LIM and returned to the user. Host information is cached twice as long as load information.
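For example, the following RemoteClusters excerpt (the cluster name and interval here are only illustrative) would cache load information from clus3 for 90 seconds, and host information for 180 seconds:

Begin RemoteClusters
CLUSTERNAME   CACHE_INTERVAL
clus3         90
End RemoteClusters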
The LSF utilities such as lsload, lshosts, lsplace, and lsrun normally only return information about the local cluster. To get information about, or run tasks on, hosts in remote clusters, you must explicitly specify a cluster name (see the sections below). To make resources in remote clusters as transparent as possible to the user, you can use the EQUIV parameter to specify a remote cluster as being equivalent to the local cluster. The master LIM will then consider all equivalent clusters when servicing requests from clients for load, host, or placement information, so you do not have to explicitly specify remote cluster names. For example, lsload will list hosts of the local cluster as well as of the equivalent remote clusters.
By default, if two clusters are configured to access each other's load information, they also accept interactive jobs from each other. If you want your cluster to access load information of another cluster but not to accept interactive jobs from the other cluster, you set RECV_FROM to 'N'. Otherwise, set RECV_FROM to 'Y'.
For cluster clus1, clus2 is equivalent to the local cluster. Load information is refreshed every 30 seconds. However, clus1 rejects interactive jobs from clus2.
# Excerpt of lsf.cluster.clus1
Begin RemoteClusters
...
CLUSTERNAME  EQUIV  CACHE_INTERVAL  RECV_FROM
clus2        Y      30              N
End RemoteClusters
Cluster clus2 does not treat clus1 as equivalent to the local cluster. Load information is refreshed every 45 seconds. Interactive jobs from clus1 are accepted.
# Excerpt of lsf.cluster.clus2
Begin RemoteClusters
...
CLUSTERNAME  EQUIV  CACHE_INTERVAL  RECV_FROM
clus1        N      45              Y
End RemoteClusters
By default, root access across clusters is not allowed. To allow root access from a remote cluster, specify LSF_ROOT_REX=all in lsf.conf. This implies that root jobs from both the local and remote clusters are accepted. This applies to both interactive and batch jobs.
If you want clusters clus1 and clus2 to allow root execution for local jobs only, insert the line LSF_ROOT_REX=local into the lsf.conf of both clusters. However, if you want clus2 to also allow root execution from any cluster, change the line in the lsf.conf of cluster clus2 to LSF_ROOT_REX=all.
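The lsf.conf settings for this example would be:

# lsf.conf of cluster clus1: accept root jobs from the local cluster only
LSF_ROOT_REX=local

# lsf.conf of cluster clus2: accept root jobs from any cluster
LSF_ROOT_REX=all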
Note
The lsf.conf file is host-type specific and is not shared across different platforms. You must make sure that the lsf.conf files for all your host types are changed consistently.
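A quick way to verify consistency is to compare the setting in each platform's copy of the file; the mount-point paths below are only illustrative and will differ at your site:

% grep LSF_ROOT_REX /usr/local/lsf/mnt/sparc-sol2/etc/lsf.conf \
      /usr/local/lsf/mnt/hppa-hpux10/etc/lsf.conf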
To enable batch jobs to flow across clusters, use the SNDJOBS_TO and RCVJOBS_FROM keywords in the queue definitions of the lsb.queues file.
Begin Queue
QUEUE_NAME=normal
SNDJOBS_TO=Queue1@Cluster1 Queue2@Cluster2 ... QueueN@ClusterN
RCVJOBS_FROM=Cluster1 Cluster2 ... ClusterN
PRIORITY=30
NICE=20
End Queue
Note
You do not specify a remote queue in the RCVJOBS_FROM parameter. The administrator of the remote cluster determines which queues will forward jobs to the normal queue in this cluster.
It is up to you and the administrators of the remote clusters to ensure that the policies of the local and remote queues are equivalent in terms of the scheduling behaviour seen by users' jobs.
If SNDJOBS_TO is defined in a queue, the LSF Batch daemon will first try to match jobs submitted to the queue with hosts in the local cluster. If not enough job slots can be found to run the jobs, mbatchd in the local cluster negotiates with the mbatchd daemons in the remote clusters defined by the SNDJOBS_TO parameter for possible remote execution. If suitable hosts in a remote cluster are identified by a remote mbatchd, jobs are forwarded to that cluster for execution. The status of remotely executed jobs is automatically forwarded to the submission cluster so that users can still view job status as if the jobs were running in the local cluster.
If you want to set up a queue that will forward jobs to remote clusters but will not run any jobs in the local cluster, you can use the scheduling thresholds to prevent local execution. For example, you can set the loadSched threshold for the MEM index to 10000 (assuming no local host has more than 10G of available memory). Suppose you have a queue remote_only in cluster clus1:
Begin Queue
QUEUE_NAME=remote_only
SNDJOBS_TO=testmc@clus2
MEM=10000/10000
PRIORITY=30
NICE=20
End Queue
Any jobs submitted to the queue remote_only will be forwarded to the queue testmc in cluster clus2.
For clus2, specify the queue testmc as follows:
Begin Queue
RCVJOBS_FROM = clus1
QUEUE_NAME   = testmc
PRIORITY     = 55
NICE         = 10
DESCRIPTION  = Multicluster Queue
End Queue
When accepting a job with a pre-execution command from a remote cluster, the local cluster can limit the number of times it attempts the pre-execution command before returning the job to the submission cluster. The submission cluster forwards the job to one cluster at a time. The maximum number of times a remote job's pre-execution command is retried is controlled by setting MAX_PREEXEC_RETRY in lsb.params.
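For example, a hypothetical lsb.params excerpt that gives up after five attempts and returns the job to the submission cluster (the value 5 is only illustrative):

Begin Parameters
MAX_PREEXEC_RETRY = 5
End Parameters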
The information collected by LIMs on remote clusters can be viewed locally. The list of clusters and associated resources can be viewed with the lsclusters command.
% lsclusters
CLUSTER_NAME  STATUS  MASTER_HOST  ADMIN  HOSTS  SERVERS
clus2         ok      hostA        user1      3        3
clus1         ok      hostC        user1      3        3
If you have defined EQUIV to be 'Y' for cluster clus2 in your lsf.cluster.clus1 file, you will see all hosts in cluster clus2 if you run lsload or lshosts from cluster clus1. For example:
% lshosts
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostA      NTX86   PENT200   10.0      1     64M    100M     Yes  (pc nt)
hostF      HPPA    HP735     14.0      1     58M     94M     Yes  (hpux cs)
hostB      SUN41   SPARCSLC   8.0      1     15M     29M     Yes  (sparc bsd)
hostD      HPPA    A900      30.0      4    264M    512M     Yes  (hpux cs bigmem)
hostE      SGI     ORIGIN2K  36.0     32    596M   1024M     Yes  (irix cs bigmem)
hostC      SUNSOL  SunSparc  12.0      1     56M     75M     Yes  (solaris cs)
You can use a cluster name in place of a host name to get information specific to a cluster. For example:
% lshosts clus1
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostD      HPPA    A900      30.0      4    264M    512M     Yes  (hpux cs bigmem)
hostE      SGI     ORIGIN2K  36.0     32    596M   1024M     Yes  (irix cs bigmem)
hostC      SUNSOL  SunSparc  12.0      1     56M     75M     Yes  (solaris cs)
% lshosts clus2
HOST_NAME  type    model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostA      NTX86   PENT200   10.0      1     64M    100M     Yes  (pc nt)
hostF      HPPA    HP735     14.0      1     58M     94M     Yes  (hpux cs)
hostB      SUN41   SPARCSLC   8.0      1     15M     29M     Yes  (sparc bsd)
% lsload clus1 clus2
HOST_NAME  status   r15s  r1m  r15m  ut     pg   ls  it  tmp   swp   mem
hostD      ok        0.2  0.3   0.4  19%   6.0    6   3  146M  319M   52M
hostC      ok        0.1  0.0   0.1   1%   0.0    3  43   63M   44M    7M
hostA      ok        0.3  0.3   0.4  35%   0.0    3   1   40M   42M   10M
hostB      busy     *1.3  1.1   0.7  68%  *57.5   2   4   18M   25M    8M
hostE      lockU     1.2  2.2   2.6  30%   5.2   35   0   10M  293M  399M
hostF      unavail
LSF commands lshosts, lsload, lsmon, lsrun, lsgrun, and lsplace can accept a cluster name in addition to host names.
The lsrun and lslogin commands can be used to run interactive jobs both within and across clusters. See 'Running Batch Jobs across Clusters' in the LSF User's Guide for examples.
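For example, assuming clus2 is configured to accept interactive jobs from the local cluster, a user in clus1 can run a task on the best available host of clus2 by giving the cluster name (the host chosen in the output is illustrative):

% lsrun -m clus2 hostname
hostA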
You can configure the multicluster environment so that one cluster accepts interactive jobs from the other cluster, but not vice versa. For example, to make clus1 reject interactive jobs from clus2, you need to specify the RECV_FROM field in the file lsf.cluster.clus1:
Begin RemoteClusters
CLUSTERNAME  EQUIV  CACHE_INTERVAL  RECV_FROM
clus2        Y      30              N
End RemoteClusters
When a user in clus2 attempts to use the cluster clus1, an error will result. For example:
% lsrun -m clus1 -R - hostname
ls_placeofhosts: Not enough host(s) currently eligible
Cluster clus2 will not place any jobs on clus1, so lsrun returns an error indicating that it could not find enough eligible hosts.
% lsrun -m hostC -R - hostname
ls_rsetenv: Request from a non-LSF host rejected
In this case, the job request is sent directly to the host hostC, and the RES on hostC rejects it because the submitting host is not considered a valid LSF host.
Note
RECV_FROM only controls accessibility of interactive jobs. It does not affect jobs submitted to LSF Batch.
You can configure a queue to send jobs to a queue in a remote cluster. Jobs submitted to the local queue can automatically get sent to remote clusters. The following commands can be used to get information about multiple clusters:
The bclusters command displays a list of queues together with their relationship with queues in remote clusters.
% bclusters
LOCAL_QUEUE  JOB_FLOW  REMOTE  CLUSTER  STATUS
testmc       send      testmc  clus2    ok
testmc       recv      -       clus2    ok
The JOB_FLOW field describes whether the local queue is to send jobs to, or receive jobs from, a remote cluster.
If the value of JOB_FLOW is send (that is, SNDJOBS_TO is defined in the local queue), the REMOTE field indicates a queue name in the remote cluster. If the remote queue does not have RCVJOBS_FROM defined to accept jobs from this cluster, the status field will never be ok; it will be either disc or reject. disc means that communication between the two clusters has not yet been established. This can occur if there are no jobs waiting to be dispatched or the remote master cannot be located. If the remote cluster agrees to accept jobs from the local queue and communication has been successfully established, the status will be ok.
If the value of JOB_FLOW is recv (that is, RCVJOBS_FROM is defined in the local queue), the REMOTE field is always '-'. The CLUSTER field then indicates the cluster name from which jobs will be accepted. The status field will be ok if a connection with the remote cluster has been established.
% bclusters
LOCAL_QUEUE  JOB_FLOW  REMOTE  CLUSTER  STATUS
testmc       send      testmc  clus2    disc
testmc       recv      *       clus2    disc
The -m host_name option of bqueues can also take a cluster name to display the queues in a remote cluster.
% bqueues -m clus2
QUEUE_NAME   PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
fair         3300  Open:Active    5     -     -     -      0     0    0     0
interactive  1055  Open:Active    -     -     -     -      0     0    0     0
testmc         55  Open:Active    -     -     -     -      5     2    2     1
priority       43  Open:Active    -     -     -     -      0     0    0     0
The bjobs command can display the cluster name in the FROM_HOST and EXEC_HOST fields. The format of these fields can be 'host@cluster' to indicate which cluster the job originated from or was forwarded to. Use the -w option to get the full cluster name. To query the jobs in a specific cluster, use the -m option and specify the cluster name.
% bjobs
JOBID  USER   STAT   QUEUE   FROM_HOST    EXEC_HOST    JOB_NAME  SUBMIT_TIME
101    user7  RUN    testmc  hostC        hostA@clus2  simulate  Oct  8 18:32
102    user7  USUSP  testmc  hostC        hostB@clus2  simulate  Oct  8 18:56
104    user7  RUN    testmc  hostA@clus2  hostC        verify    Oct  8 19:20
% bjobs -m clus2
JOBID  USER   STAT   QUEUE   FROM_HOST    EXEC_HOST    JOB_NAME  SUBMIT_TIME
521    user7  RUN    testmc  hostC@clus1  hostA        simulate  Oct  8 18:35
522    user7  USUSP  testmc  hostC@clus1  hostA        simulate  Oct  8 19:23
520    user7  RUN    testmc  hostA        hostC@clus1  verify    Oct  8 19:26
Note that jobs forwarded to a remote cluster are assigned new job IDs there. You only need to use the local job ID when manipulating jobs. The SUBMIT_TIME field displays the real job submission time for local jobs, and the job forwarding time for jobs from remote clusters.
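For example, to kill the forwarded job shown above you would use its local job ID; the response line here is representative of what bkill prints:

% bkill 101
Job <101> is being terminated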
To view the hosts of a specific cluster you can use a cluster name in place of a host name.
% bhosts clus2
HOST_NAME  STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA      ok         -   10      1    1      0      0    0
hostB      ok         -   10      1    1      0      0    0
hostF      closed     -    3      3    3      0      0    0
The bhist command displays the history of a job, including events recording when it was forwarded to, or accepted from, another cluster.
% bhist -l 101

Job Id <101>, User <user7>, Project <default>, Command <simulate>
Tue Oct 08 18:32:11: Submitted from host <hostC> to Queue <testmc>, CWD
                     <$HOME/homes/user7>, Requested Resources <type!=ALPHA>;
Tue Oct 08 18:35:07: Forwarded job to cluster clus2;
Tue Oct 08 18:35:25: Dispatched to <hostA>;
Tue Oct 08 18:35:35: Running with execution home </homes/user7>, Execution CWD
                     <//homes/user7>, Execution Pid <25212>;
Tue Oct 08 20:30:50: USER suspend action initiated (actpid 25672);
Tue Oct 08 20:30:50: Suspended by the user or administrator.

Summary of time in seconds spent in various states by Tue Oct 08 20:35:24 1996
  PEND   PSUSP  RUN    USUSP  SSUSP  UNKWN  TOTAL
  176    0      6943   274    0      0      7393
By default, LSF assumes a uniform user name space within a cluster and between clusters. It is not uncommon for an organization to fail to satisfy this assumption. Support for non-uniform user name spaces between clusters is provided for the execution of batch jobs.
The .lsfhosts file used to support account mapping can be used to specify cluster names in place of host names.
For example, a user has accounts on two clusters, clus1 and clus2. On cluster clus1 the user name is 'userA', and on clus2 the user name is 'user_A'. To run jobs in either cluster under the appropriate user name, the .lsfhosts files should be set up as follows:
% cat ~userA/.lsfhosts
clus2 user_A
% cat ~user_A/.lsfhosts
clus1 userA
For another example, a user has the account 'userA' on cluster clus1 and wants to use the 'lsfguest' account when running jobs on cluster clus2. The .lsfhosts files should be set up as follows:
% cat ~userA/.lsfhosts
clus2 lsfguest send
% cat ~lsfguest/.lsfhosts
clus1 userA recv
In the third example, a site has two clusters, clus1 and clus2. A user has the uniform account name userB on all hosts in clus2. In clus1, this user has the uniform account name userA, except on hostX, where the account name is userA1. This user would like to use both clusters transparently.
To implement this mapping, the user should set .lsfhosts files in his home directories on different machines as follows:
On hostX of clus1:

% cat ~userA1/.lsfhosts
clus1 userA
hostX userA1
clus2 userB
On any other machine in clus1:
% cat ~userA/.lsfhosts
clus2 userB
hostX userA1
On the hosts of clus2:

% cat ~userB/.lsfhosts
clus1 userA
hostX userA1
Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.