This chapter describes the operating concepts and maintenance tasks of the batch queuing system, LSF Batch. This chapter requires concepts from 'Managing LSF Base'. The topics covered in this chapter are:
Managing error log files for LSF Batch daemons is described in 'Managing Error Logs'. This section discusses the other important log files the LSF Batch daemons produce. The LSF Batch log files are found in the directory LSB_SHAREDIR/cluster/logdir.
Each time a batch job completes or exits, an entry is appended to the lsb.acct file. This file can be used to create accounting summaries of LSF Batch system use. The bacct(1) command produces one form of summary. The lsb.acct file is a text file suitable for processing with awk, perl, or similar tools. See the lsb.acct(5) manual page for details of the contents of this file. Additionally, the LSF Batch API supports calls to process the lsb.acct records. See the LSF Programmer's Guide for details of the LSF Batch API.
You should move the lsb.acct file to a backup location, and then run your accounting on the backup copy. The daemon automatically creates a new lsb.acct file to replace the moved file. This prevents problems that might occur if the daemon writes new log entries while the accounting programs are running. When the accounting is complete, you can remove or archive the backup copy.
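For example, assuming LSB_SHAREDIR is /usr/local/lsf/work and the cluster is named cluster1 (both illustrative), and assuming bacct accepts a -f option to read a specific log file, the backup and accounting run might look like this:

% cd /usr/local/lsf/work/cluster1/logdir
% mv lsb.acct lsb.acct.backup
% bacct -u all -f lsb.acct.backup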
The LSF Batch daemons keep an event log in the lsb.events file. The mbatchd daemon uses this information to recover from server failures, host reboots, and LSF Batch reconfiguration. The lsb.events file is also used by the bhist command to display detailed information about the execution history of batch jobs, and by the badmin command to display the operational history of hosts, queues and LSF Batch daemons.
For performance reasons, the mbatchd automatically backs up and rewrites the lsb.events file after every 1000 batch job completions (this is the default; the value is controlled by the MAX_JOB_NUM parameter in the lsb.params file). The old lsb.events file is moved to lsb.events.1, and each old lsb.events.n file is moved to lsb.events.n+1. The mbatchd never deletes these files. If disk storage is a concern, the LSF administrator should arrange to archive or remove old lsb.events.n files occasionally.
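For example, to switch the event log after every 2000 completed jobs instead of 1000, lsb.params might contain a sketch like the following (other parameters omitted):

Begin Parameters
MAX_JOB_NUM = 2000
End Parameters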
CAUTION!
Do not remove or modify the lsb.events file. Removing or modifying the lsb.events file could cause batch jobs to be lost.
The lsadmin command is used to control LSF Base daemons, LIM and RES. LSF Batch has the badmin command to perform similar operations on LSF Batch daemons.
To check the status of LSF Batch server hosts and queues, use the bhosts and bqueues commands:
% bhosts
HOST_NAME   STATUS   JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA       ok       2     1    0      0    0      0      0
hostB       closed   2     2    2      2    0      0      0
hostD       ok       -     8    1      1    0      0      0
% bqueues
QUEUE_NAME   PRIO  STATUS         MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
night        30    Open:Inactive  -    -     -     -     4      4     0    0
short        10    Open:Active    50   5     -     -     1      0     1    0
simulation   10    Open:Active    -    2     -     -     0      0     0    0
default      1     Open:Active    -    -     -     -     6      4     2    0
If the status of a batch server is 'closed', then it will not accept more jobs. A server host can become closed if one of the following conditions is true:
An inactive queue will accept new job submissions, but will not dispatch any new jobs. A queue can become inactive if the LSF cluster administrator explicitly inactivates it with the badmin command, or if the queue has a dispatch or run window defined and the current time is outside that window.
mbatchd automatically logs the history of the LSF Batch daemons in the LSF Batch event log. You can display the administrative history of the batch system using the badmin command.
The badmin hhist command displays the times when LSF Batch server hosts are opened and closed by the LSF administrator.
The badmin qhist command displays the times when queues are opened, closed, activated, and inactivated.
The badmin mbdhist command displays the history of the mbatchd daemon, including the times when the master starts, exits, reconfigures, or changes to a different host.
The badmin hist command displays all LSF Batch history information, including all the events listed above.
You can use the badmin hstartup command to start sbatchd on some or all remote hosts from one host:
% badmin hstartup all
Start up slave batch daemon on <hostA> ...... done
Start up slave batch daemon on <hostB> ...... done
Start up slave batch daemon on <hostD> ...... done
Note that you do not have to be root to use the badmin command to start LSF Batch daemons.
For remote startup to work, the file /etc/lsf.sudoers must be set up properly, and you must be able to run rsh across all LSF hosts without entering a password. See 'The lsf.sudoers File' for configuration details of lsf.sudoers.
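For example, /etc/lsf.sudoers might contain entries similar to the following sketch; the administrator account name and installation path are illustrative:

LSF_STARTUP_USERS="lsfadmin"
LSF_STARTUP_PATH=/usr/local/lsf/etc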
mbatchd is restarted by the badmin reconfig command. sbatchd can be restarted using the badmin hrestart command:
% badmin hrestart hostD
Restart slave batch daemon on <hostD> ...... done
You can specify more than one host name to restart sbatchd on multiple hosts, or use 'all' to refer to all LSF Batch server hosts. Restarting sbatchd on a host does not affect batch jobs that are running on that host.
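For example, to restart sbatchd on every LSF Batch server host:

% badmin hrestart all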
The badmin hshutdown command shuts down the sbatchd.
% badmin hshutdown hostD
Shut down slave batch daemon on <hostD> .... done
If sbatchd is shut down, that host is no longer available to run new jobs. Existing jobs running on that host continue to completion, but the results are not sent to the user until sbatchd is restarted.
To shut down mbatchd you must first use the badmin hshutdown command to shut down the sbatchd on the master host, and then run the badmin reconfig command. The mbatchd is normally restarted by sbatchd; if there is no sbatchd running on the master host, badmin reconfig causes mbatchd to exit.
If mbatchd is shut down, all LSF Batch service will be temporarily unavailable. However, existing jobs are not affected. When mbatchd is later restarted, the previous status is restored from the event log file and job scheduling continues.
Occasionally you may want to drain a batch server host for purposes of rebooting, maintenance, or host removal. This can be achieved by running the badmin hclose command:
% badmin hclose hostB
Close <hostB> ...... done
When a host is open, LSF Batch can dispatch jobs to it. When a host is closed, no new batch jobs are dispatched, but jobs already dispatched to the host continue to execute. To reopen a batch server host, run the badmin hopen command:
% badmin hopen hostB
Open <hostB> ...... done
To view the history of a batch server host, run the badmin hhist command:
% badmin hhist hostB
Wed Nov 20 14:41:58: Host <hostB> closed by administrator <lsf>.
Wed Nov 20 15:23:39: Host <hostB> opened by administrator <lsf>.
Each batch queue can be open or closed, active or inactive. Users can submit jobs to open queues, but not to closed queues. Active queues start jobs on available server hosts, and inactive queues hold all jobs. The LSF administrator can change the state of any queue. Queues may also become active or inactive because of queue run or dispatch windows.
The current status of a particular queue or all queues is displayed by the bqueues(1) command. The bqueues -l option also gives current statistics about the jobs in a particular queue such as the total number of jobs in this queue, the number of jobs running, suspended, etc.
% bqueues normal
QUEUE_NAME   PRIO  STATUS       MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
normal       30    Open:Active  -    -     -     2     6      4     2    0
When a batch queue is open, users can submit jobs to the queue. When a queue is closed, users cannot submit jobs to the queue. If a user tries to submit a job to a closed queue, an error message is printed and the job is rejected. If a queue is closed but still active, previously submitted jobs continue to be processed. This allows the LSF administrator to drain a queue.
% badmin qclose normal
Queue <normal> is closed
% bqueues normal
QUEUE_NAME   PRIO  STATUS         MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
normal       30    Closed:Active  -    -     -     2     6      4     2    0
% bsub -q normal hostname
normal: Queue has been closed
% badmin qopen normal
Queue <normal> is opened
When a queue is active, jobs in the queue are started if appropriate hosts are available. When a queue is inactive, jobs in the queue are not started. Queues can be activated and inactivated by the LSF administrator using the badmin qact and badmin qinact commands, or by configured queue run or dispatch windows.
If a queue is open and inactive, users can submit jobs to this queue but no new jobs are dispatched to hosts. Currently running jobs continue to execute. This allows the LSF administrator to let running jobs complete before removing queues or making other major changes.
% badmin qinact normal
Queue <normal> is inactivated
% bqueues normal
QUEUE_NAME   PRIO  STATUS         MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
normal       30    Open:Inactive  -    -     -     -     0      0     0    0
% badmin qact normal
Queue <normal> is activated
The LSF Batch cluster is a subset of the LSF Base cluster. All servers used by LSF Batch must belong to the base cluster, however not all servers in the base cluster must provide LSF Batch services.
LSF Batch configuration consists of four files: lsb.params, lsb.hosts, lsb.users, and lsb.queues. These files are stored in LSB_CONFDIR/cluster/configdir, where cluster is the name of your cluster.
All these files are optional. If any of these files does not exist, LSF Batch will assume a default configuration.
The lsb.params file defines general parameters about LSF Batch system operation such as the name of the default queue when the user does not specify one, scheduling intervals for mbatchd and sbatchd, etc. Detailed parameters are described in 'The lsb.params File'.
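A minimal lsb.params file might look like the following sketch; the values shown are illustrative only:

Begin Parameters
DEFAULT_QUEUE  = normal     # queue used when bsub has no -q option
MBD_SLEEP_TIME = 60         # mbatchd scheduling interval in seconds
SBD_SLEEP_TIME = 30         # sbatchd scheduling interval in seconds
End Parameters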
The lsb.hosts file defines LSF Batch server hosts together with their attributes. Not all LSF hosts defined by LIM configuration have to be configured to run batch jobs. Batch server host attributes include scheduling load thresholds, dispatch windows, job slot limits, etc. This file is also used to define host groups and host partitions. See 'The lsb.hosts File' for details of this file.
The lsb.users file contains user-related parameters such as user groups, user job slot limits, and account mapping. See 'The lsb.users File' for details.
The lsb.queues file defines job queues. Numerous controls are available at the queue level to allow cluster administrators to customize site resource allocation policies. See 'The lsb.queues File' for more details.
When you first install LSF on your cluster, some example queues are already configured for you. You should customize these queues or define new queues to meet your site's needs.
Note
After changing any of the LSF Batch configuration files, you need to run badmin reconfig to tell mbatchd to pick up the new configuration. You must also run this command every time you change the LIM configuration.
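For example, after editing lsb.queues you can check the configuration files for errors and then reconfigure:

% badmin ckconfig
% badmin reconfig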
You can add a batch server host to LSF Batch configuration following the steps below:
To remove a host as a batch server host, follow the steps below:
CAUTION!
You should never remove the master host from LSF Batch. Change LIM configuration to assign a different default master host if you want to remove your current default master from the LSF Batch server pool.
Adding a batch queue does not affect pending or running LSF Batch jobs. To add a batch queue to a cluster:
Before removing a queue, you should make sure there are no jobs in that queue. If you remove a queue that has jobs in it, the jobs are temporarily moved to a lost and found queue. Jobs in the lost and found queue remain pending until the user or the LSF administrator uses the bswitch command to switch the jobs into regular queues. Jobs in other queues are not affected.
In this example, move all pending and running jobs in the night queue to the idle queue, and then delete the night queue.
% badmin qclose night
Queue <night> is closed
% bjobs -u all -q night
JOBID  USER   STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME   SUBMIT_TIME
5308   user5  RUN   night  hostA      hostD      sleep 500  Nov 21 18:16
5310   user5  PEND  night  hostA                 sleep 500  Nov 21 18:17
% bswitch -q night idle 0
Job <5308> is switched to queue <idle>
Job <5310> is switched to queue <idle>
The LSF administrator can control batch jobs belonging to any user. Other users may control only their own jobs. Jobs can be suspended, resumed, killed, and moved within and between queues.
The bswitch command moves pending and running jobs from queue to queue. The btop and bbot commands change the dispatching order of pending jobs within a queue. The LSF administrator can move any job. Other users can move only their own jobs.
The btop and bbot commands do not allow users to move their own jobs ahead of those submitted by other users. Only the execution order of the user's own jobs is changed. The LSF administrator can move one user's job ahead of another user's. The btop, bbot, and bswitch commands are described in the LSF User's Guide and in the btop(1) and bswitch(1) manual pages.
The bstop, bresume and bkill commands send UNIX signals to batch jobs. See the kill(1) manual page for a discussion of the UNIX signals.
bstop sends SIGSTOP to sequential jobs and SIGTSTP to parallel jobs.
bkill sends the specified signal to the process group of the specified jobs. If the -s option is not present, the default operation of bkill is to send a SIGKILL signal to the specified jobs to kill these jobs. Twenty seconds before SIGKILL is sent, SIGTERM and SIGINT are sent to give the job a chance to catch the signals and clean up.
Users are only allowed to send signals to their own jobs. The LSF administrator can send signals to any job. See the LSF User's Guide and the manual pages for more information about these commands.
This example shows the use of the bstop and bkill commands:
% bstop 5310
Job <5310> is being stopped
% bjobs 5310
JOBID  USER   STAT   QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME   SUBMIT_TIME
5310   user5  PSUSP  night  hostA                 sleep 500  Nov 21 18:17
% bkill 5310
Job <5310> is being terminated
% bjobs 5310
JOBID  USER   STAT   QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME   SUBMIT_TIME
5310   user5  EXIT   night  hostA                 sleep 500  Nov 21 18:17
Each batch job has its resource requirements. Batch server hosts that match the resource requirements are the candidate hosts. When the batch daemon wants to schedule a job, it first asks the LIM for the load index values of all the candidate hosts. The load values for each host are compared to the scheduling conditions. Jobs are only dispatched to a host if all load values are within the scheduling thresholds.
When a job is running on a host, the batch daemon periodically gets the load information for that host from the LIM. If the load values cause the suspending conditions to become true for that particular job, the batch daemon performs the SUSPEND action to the process group of that job. The batch daemon allows some time for changes to the system load to register before it considers suspending another job.
When a job is suspended, the batch daemon periodically checks the load on that host. If the load values cause the scheduling conditions to become true, the daemon performs the RESUME action to the process group of the suspended batch job.
The SUSPEND and RESUME actions are configurable as described in 'Configurable Job Control Actions'.
LSF Batch has a wide variety of configuration options. This section describes only a few of the options to demonstrate the process. For complete details, see 'LSF Batch Configuration Reference'. The algorithms used to schedule jobs and concepts involved are described in 'How LSF Batch Schedules Jobs'.
LSF is often used on systems that support both interactive and batch users. On one hand, users are often concerned that load sharing will overload their workstations and slow down their interactive tasks. On the other hand, some users want to dedicate some machines to critical batch jobs so that they have guaranteed resources. Even if all your workload is batch jobs, you still want to reduce resource contention and operating system overhead to maximize the use of your resources.
Numerous parameters in LIM and LSF Batch configurations can be used to control your resource allocation and to avoid undesirable contention.
Since interference is often reflected in the load indices, LSF Batch responds to load changes to avoid or reduce contention. LSF Batch can take actions on jobs to reduce interference before or after jobs are started. These actions are triggered by different load conditions. Most of the conditions can be configured at both the queue level and the host level. Conditions defined at the queue level apply to all hosts used by the queue, while conditions defined at the host level apply to all queues using the host.
At the queue level, scheduling conditions are configured as either resource requirements or scheduling load thresholds, as described in 'The lsb.queues File'. At the host level, the scheduling conditions are defined as scheduling load thresholds, as described in 'The lsb.hosts File'.
At the queue level, suspending conditions are defined as STOP_COND as described in 'The lsb.queues File', or as suspending load threshold as described in 'Load Thresholds'. At the host level, suspending conditions are defined as stop load threshold as described in 'The lsb.hosts File'.
At the queue level, resume conditions are defined as either RESUME_COND, or the scheduling load conditions if RESUME_COND is not defined.
To effectively reduce interference between jobs, the appropriate load indices must be used correctly. Below are examples of a few frequently used parameters.
The paging rate (pg) load index relates strongly to the perceived interactive performance. If a host is paging applications to disk, the user interface feels very slow.
The paging rate is also a reflection of a shortage of physical memory. When an application is paged in and out frequently, the system spends a lot of time on paging overhead, resulting in reduced performance.
The paging rate load index can be used as a threshold to either stop sending more jobs to the host, or to suspend an already running batch job so that interactive users are not disturbed.
This parameter can be used in different configuration files to achieve different purposes. By defining a paging rate threshold in lsf.cluster.cluster, the host becomes busy from LIM's point of view when the threshold is exceeded, so LIM will not advise running any more jobs on this host.
By including paging rate in LSF Batch queue or host scheduling conditions, batch jobs can be prevented from starting on machines with heavy paging rate, or be suspended or even killed if they are interfering with the interactive user on the console.
A batch job suspended due to pg threshold will not be resumed even if the resume conditions are met unless the machine is interactively idle for more than PG_SUSP_IT minutes, as described in 'Parameters'.
Stricter control can be achieved using the idle time (it) index. This index measures the number of minutes since any interactive terminal activity. Interactive terminals include hard wired ttys, rlogin and lslogin sessions, and X shell windows such as xterm. On some hosts, LIM also detects mouse and keyboard activity.
This index is typically used to prevent batch jobs from interfering with interactive activities. By defining the suspending condition in an LSF Batch queue as 'it==0 && pg >50', a batch job from this queue is suspended if the machine is not interactively idle and the paging rate is higher than 50 pages per second. Furthermore, by defining the resuming condition as 'it>5 && pg <10' in the queue, a suspended job from the queue does not resume unless the machine has been idle for at least 5 minutes and the paging rate is less than 10 pages per second.
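In lsb.queues, these conditions might be written as follows; the queue name and other omitted lines are illustrative:

Begin Queue
QUEUE_NAME  = background
STOP_COND   = it==0 && pg > 50
RESUME_COND = it>5 && pg < 10
...
End Queue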
The it index is only non-zero if no interactive users are active. Setting the it threshold to 5 minutes allows a reasonable amount of think time for interactive users, while making the machine available for load sharing, if the users are logged in but absent.
For lower priority batch queues, it is appropriate to set an it scheduling threshold of 10 minutes and suspending threshold of 2 minutes in the lsb.queues file. Jobs in these queues are suspended while the execution host is in use, and resume after the host has been idle for a longer period. For hosts where all batch jobs, no matter how important, should be suspended, set a per-host suspending threshold in the lsb.hosts file.
Running more than one CPU-bound process on a machine (or more than one process per CPU for multiprocessors) can reduce the total throughput because of operating system overhead, as well as interfering with interactive users. Some tasks such as compiling can create more than one CPU intensive task.
Batch queues should normally set CPU run queue scheduling thresholds below 1.0, so that hosts already running compute-bound jobs are left alone. LSF Batch scales the run queue thresholds for multiprocessor hosts by using the effective run queue lengths, so multiprocessors automatically run one job per processor in this case. For the concept of effective run queue lengths, see lsfintro(1).
For short to medium-length jobs, the r1m index should be used. For longer jobs, you may wish to add an r15m threshold. An exception to this are high priority queues, where turnaround time is more important than total throughput. For high priority queues, an r1m scheduling threshold of 2.0 is appropriate.
The ut parameter measures the amount of CPU time being used. When all the CPU time on a host is in use, there is little to gain from sending another job to that host unless the host is much more powerful than others on the network. The lsload command reports ut in percent, but the configuration parameter in the lsf.cluster.cluster file and the LSF Batch configuration files is set as a fraction in the range from 0 to 1. A ut threshold of 0.9 prevents jobs from going to a host where the CPU does not have spare processing cycles.
If a host has very high pg but low ut, then it may be desirable to suspend some jobs to reduce the contention.
The commands bhist and bjobs are useful for tuning batch queues. bhist shows the execution history of batch jobs, including the time spent waiting in queues or suspended because of system load. bjobs -p shows why a job is pending.
A batch job is suspended when the load level of the execution host causes the suspending condition to become true. The bjobs -lp command shows the reason why the job was suspended together with the scheduling parameters. Use bhosts -l to check the load levels on the host, and adjust the suspending conditions of the host or queue if necessary.
The bhosts -l command gives the most recent load values used for the scheduling of jobs.
% bhosts -l hostB
HOST: hostB
STATUS   CPUF   JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV  DISPATCH_WINDOWS
ok       20.00  2     2    0      0    0      0      0    -

CURRENT LOAD USED FOR SCHEDULING:
          r15s  r1m  r15m  ut   pg   io  ls  it  tmp  swp   mem
Total     0.3   0.8  0.9   61%  3.8  72  26  0   6M   253M  297M
Reserved  0.0   0.0  0.0   0%   0.0  0   0   0   0M   0M    0M

LOAD THRESHOLD USED FOR SCHEDULING:
           r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
loadSched  -     -    -     -   -   -   -   -   -    -    -
loadStop   -     -    -     -   -   -   -   -   -    -    -
A '-' in the output indicates that the particular threshold is not defined. If no suspending threshold is configured for a load index, LSF Batch does not check the value of that load index when deciding whether to suspend jobs. Normally, the swp and tmp indices are not considered for suspending jobs, because suspending a job does not free up the space being used. However, if swp and tmp are specified by the STOP_COND parameter in your queue, these indices are considered for suspending jobs.
The load indices most commonly used for suspending conditions are the CPU run queue lengths, paging rate and idle time. To give priority to interactive users, set the suspending threshold on the it load index to a non-zero value. Batch jobs are stopped (within about 1.5 minutes) when any user is active, and resumed when the host has been idle for the time given in the it scheduling condition.
To tune the suspending threshold for paging rate, it is desirable to know the behaviour of your application. On an otherwise idle machine, check the paging rate using lsload, then start your application and watch the paging rate as the application runs. Subtracting the idle paging rate from the rate observed while the application is running gives the paging rate of the application itself. The suspending threshold should allow at least 1.5 times that amount. A job may be scheduled at any paging rate up to the scheduling threshold, so the suspending threshold should be at least the scheduling threshold plus 1.5 times the application paging rate. This prevents the system from scheduling a job and then immediately suspending it because of its own paging. For example, with a scheduling threshold of 10 pages per second and an application that adds 20 pages per second, the suspending threshold should be at least 10 + (1.5 x 20) = 40 pages per second.
The effective CPU run queue length condition should be configured like the paging rate. For CPU-intensive sequential jobs, the effective run queue length indices increase by approximately one for each job. For jobs that use more than one process, you should make some test runs to determine your job's effect on the run queue length indices. Again, the suspending threshold should be equal to at least the scheduling threshold plus 1.5 times the load for one job.
Suspending thresholds can also be used to enforce inter-queue priorities. For example, if you configure a low-priority queue with an r1m (1 minute CPU run queue length) scheduling threshold of 0.25 and an r1m suspending threshold of 1.75, this queue starts one job when the machine is idle. If the job is CPU intensive, it increases the run queue length from 0.25 to roughly 1.25. A high-priority queue configured with a scheduling threshold of 1.5 and an unlimited suspending threshold will send a second job to the same host, increasing the run queue to 2.25. This exceeds the suspending threshold for the low priority job, so it is stopped. The run queue length stays above 0.25 until the high priority job exits. After the high priority job exits the run queue index drops back to the idle level, so the low priority job is resumed.
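As a sketch of this arrangement (priority values and other omitted lines are illustrative), the two queues might be configured in lsb.queues as:

Begin Queue
QUEUE_NAME = low
PRIORITY   = 20
r1m        = 0.25/1.75
...
End Queue

Begin Queue
QUEUE_NAME = high
PRIORITY   = 70
r1m        = 1.5/
...
End Queue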
By default LSF Batch schedules user jobs according to the First-Come-First-Served (FCFS) principle. If your site has many users contending for limited resources, the FCFS policy is not enough. For example, a user could submit 1000 long jobs in one morning and occupy all the resources for a whole week, while other users' urgent jobs wait in queues.
LSF Batch provides fairshare scheduling to give you control over how resources are shared by competing users. Fairshare can be configured so that LSF Batch schedules jobs according to each user or user group's configured shares. When fairshare is configured, each user or user group is assigned a priority based on the following factors:
If a user or group has used less than their share of the processing resources, their pending jobs (if any) are scheduled first, jumping ahead of other jobs in the batch queues. The CPU times used for fairshare scheduling are not normalised for the host CPU speed factors.
The special user names others and default can also be assigned shares. The name others refers to all users not explicitly listed in the USER_SHARES parameter. The name default refers to each user not explicitly named in the USER_SHARES parameter. Note that default represents a single user name while others represents a user group name. The special host name all can be used to refer to all batch server hosts in the cluster.
Fairshare affects job scheduling only if there is resource contention among users: users with more shares are able to run more jobs than users with fewer shares. If only one user has jobs to run, fairshare has no effect on job scheduling.
Fairshare in LSF Batch can be configured at either queue level or host level. At queue level, the shares apply to all users who submit jobs to the queue and all hosts that are configured as hosts for the queue. It is possible that several queues share some hosts as servers, but each queue can have its own fairshare policy.
Queue level fairshare is defined using the keyword FAIRSHARE.
If you want strict resource allocation control on some hosts for all workload, configure fairshare at the host level. Host level fairshare is configured as a host partition. Host partition is a configuration option that allows a group of server hosts to be shared by users according to configured shares. In a host partition each user or group of users is assigned a share. The bhpart command displays the current cumulative CPU usage and scheduling priority for each user or group in a host partition.
Below are some examples of configuring fairshare at both queue level and host level. Details of the configuration syntax are described in 'Host Partitions' and 'Scheduling Policy'.
Note
Do not define fairshare at both the host and the queue level if the queue uses some or all hosts belonging to the host partition because this results in policy conflicts. Doing so will result in undefined scheduling behaviour.
If you have a queue that is shared by critical users and non-critical users, you can configure fairshare so that as long as there are jobs from key users waiting for resource, non-critical users' jobs will not be dispatched.
First, define a user group key_users in the lsb.users file.
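For example (the member names are hypothetical):

Begin UserGroup
Group_Name   Group_Member
key_users    (user1 user2 user3)
End UserGroup

Then define your queue so that FAIRSHARE is set: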
Begin Queue
QUEUE_NAME = production
FAIRSHARE  = USER_SHARES[[key_users@, 2000] [others, 1]]
...
End Queue
With this configuration, each user in key_users has 2000 shares, while all other users together have only 1 share. This makes it virtually impossible for other users' jobs to be dispatched unless no user in the key_users group has jobs waiting to run.
Note that a user group name followed by '@' refers to each user in that group, as you could otherwise configure by listing every user separately, each with 2000 shares. This also defines equal shares among the key_users. If the '@' is not present, all users in the group collectively hold the group's shares, and there is no fairshare among them.
You can also use a host partition to achieve a similar result if you want the same fairshare policy to apply to jobs from all queues.
Suppose two departments contributed to the purchase of a large system. The engineering department contributed 70 percent of the cost, and the accounting department 30 percent. Each department wants to get (roughly) their money's worth from the system.
Configure two user groups in the lsb.users file, one listing all the users in the engineering group, and one listing all the members in the accounting group:
Begin UserGroup
Group_Name    Group_Member
eng_users     (user6 user4)
acct_users    (user2 user5)
End UserGroup
Then configure a host partition for the host, listing the appropriate shares:
Begin HostPartition
HPART_NAME  = big_servers
HOSTS       = hostH
USER_SHARES = [eng_users, 7] [acct_users, 3]
End HostPartition
Note the difference in defining USER_SHARES in a queue and in a host partition. Alternatively, the shares can be configured for each member of a user group by appending an '@' to the group name:
USER_SHARES = [eng_users@, 7] [acct_users@, 3]
If a user is configured to belong to two user groups, the user can specify which group the job belongs to with the -P option to the bsub command.
Similarly you can define the same policy at the queue level if you want to enforce this policy only within a queue.
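For example, a queue-level equivalent of the host partition above might be sketched as follows; the queue name is illustrative:

Begin Queue
QUEUE_NAME = big_servers
FAIRSHARE  = USER_SHARES[[eng_users, 7] [acct_users, 3]]
HOSTS      = hostH
...
End Queue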
Round-robin scheduling balances the resource usage between users by running one job from each user in turn, independent of what order the jobs arrived in. This can be configured by defining equal share for everybody. For example:
Begin HostPartition
HPART_NAME  = even_share
HOSTS       = all
USER_SHARES = [default, 1]
End HostPartition
The concept of dispatch and run windows for LSF Batch are described in 'How LSF Batch Schedules Jobs'.
This can be achieved by configuring dispatch windows for hosts in the lsb.hosts file, and run windows and dispatch windows for queues in the lsb.queues file.
Dispatch windows in the lsb.hosts file cause batch server hosts to be closed unless the current time is inside the time windows. When a host is closed by a time window, no new jobs are sent to it, but the existing jobs running on it remain running. Details about this parameter are described in 'Host Section'.
Dispatch and run windows defined in lsb.queues limit when a queue can dispatch new jobs and when jobs from a queue are allowed to run. A run window differs from a dispatch window in that when a run window is closed, jobs that are already running are suspended rather than left running. Details of these two parameters are described in 'The lsb.queues File'.
By defining different job slot limits for hosts, queues, and users, you can control batch job processing capacity for your cluster, hosts, and users. For example, by limiting the maximum number of job slots on each of your hosts, you can make sure that your system operates at optimal performance. By defining a job slot limit for some users, you can prevent them from using up all the job slots in the system at one time. There are a variety of job slot limits that can be used for very different purposes; see 'Job Slot Limits' for more concepts and descriptions of job slot limits. Configuration parameters for job slot limits are described in 'LSF Batch Configuration Reference'.
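As an illustration only, a queue might combine several of these limits; the queue name and values are arbitrary:

Begin Queue
QUEUE_NAME = medium
QJOB_LIMIT = 100     # job slot limit for the whole queue
UJOB_LIMIT = 10      # job slot limit per user
PJOB_LIMIT = 1       # job slot limit per processor
HJOB_LIMIT = 2       # job slot limit per host
...
End Queue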
The concept of resource reservation was discussed in 'Resource Reservation'.
The resource reservation feature at the queue level allows the cluster administrator to specify the amount of resources the system should reserve for jobs in the queue. It also serves as the upper limit on resource reservation if a user also specifies a reservation when submitting a job.
The resource reservation requirement can be configured at the queue level as part of the queue level resource requirements. For example:
Begin Queue
.
RES_REQ = select[type==any] rusage[swap=100:mem=40:duration=60]
.
End Queue
will allow a job to be scheduled on any host that the queue is configured to use and will reserve 100 megabytes of swap and 40 megabytes of memory for a duration of 60 minutes. See 'Queue-Level Resource Reservation' for detailed configuration syntax for this parameter.
The concept of processor reservation was described in 'Processor Reservation'. You may want to configure this feature if your cluster has a lot of sequential jobs that compete for resources with parallel jobs.
See 'Processor Reservation for Parallel Jobs' for configuration options for this feature.
When LSF Batch runs your jobs, it tries to make the execution as transparent to the user as possible. By default, the execution environment is maintained to be as close to the submission environment as possible. LSF Batch copies the environment from the submission host to the execution host, and also sets the umask and the current working directory.
Since a network can be heterogeneous, it is often impossible or undesirable to reproduce the submission host's execution environment on the execution host. For example, if the home directory is not shared between the submission and execution hosts, LSF Batch runs the job in /tmp on the execution host. If the DISPLAY environment variable is something like 'Unix:0.0' or ':0.0', it must be processed before it can be used on the execution host. These cases are handled automatically by LSF Batch.
Users can change the default behaviour by using a job starter, or using the '-L' option of the bsub command to change the default execution environment. See 'Using A Job Starter' for details of a job starter.
For resource control purposes, LSF Batch also changes some aspects of the job execution environment. These include nice values, resource limits, and any other environment changes made by a configured job starter.
In addition to environment variables inherited from the user, LSF Batch also sets a few more environment variables for batch jobs. These are:
Many LSF tools, such as lsrun, lsmake, lstcsh, and lsgrun, use the LSF Remote Execution Server (RES) to run jobs. You can control the execution priority of jobs started via RES by modifying your LIM configuration file lsf.cluster.cluster. This is done by defining the REXPRI parameter for individual hosts. See 'Descriptive Fields' for details of this parameter.
LSF Batch jobs can be run with a nice value as defined in your lsb.queues file. Each queue can have a different nice value. See 'NICE' for details of this parameter.
Resource limits control how much resource can be consumed by jobs. By defining such limits, the cluster administrator has better control of resource usage. For example, by defining a high priority short queue, you can allow short jobs to be scheduled earlier than long jobs. To prevent users from submitting long jobs to this short queue, you can set a CPU limit for the queue so that no job submitted to the queue can run for longer than that limit.
Details of resource limit configuration are described in 'Resource Limits'.
Your batch jobs can be accompanied by a pre-execution and a post-execution command. This can be used for many purposes, for example, creating and deleting scratch directories, or checking for necessary conditions before running the real job. Details of these concepts are described in 'Pre- and Post-execution Commands'.
The pre-execution and post-execution commands can be configured at the queue level as described in 'Queue-Level Pre-/Post-Execution Commands'.
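For instance, a queue that creates a scratch directory before each job and removes it afterwards might be sketched as follows; the queue name and script paths are hypothetical:

Begin Queue
QUEUE_NAME = scratch
PRE_EXEC   = /usr/local/lsf/scripts/mkscratch
POST_EXEC  = /usr/local/lsf/scripts/rmscratch
...
End Queue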
Some jobs have to be started under particular shells or require certain setup steps to be performed before the actual job is executed. This is often handled by writing wrapper scripts around the job. The LSF job starter feature allows you to specify an executable which performs the actual execution of the job, doing any necessary setup beforehand. One typical use of this feature is to customize LSF for use with the Atria ClearCase environment. See 'Support for Atria ClearCase'.
The job starter can be specified at the queue level using the JOB_STARTER parameter in the lsb.queues file. This allows the LSF Batch queue to control the job startup. For example, the following might be defined in a queue:
Begin Queue
.
JOB_STARTER = xterm -e
.
End Queue
This way, all jobs submitted to this queue are run under an xterm.
The following are other possible uses of a job starter:
A job starter is configured at the queue level. See 'Job Starter' for details.
Many applications have restricted access based on the number of software licenses purchased. LSF can help manage licensed software by automatically forwarding jobs to licensed hosts, or by holding jobs in batch queues until licenses are available.
There are three main types of software license: host locked, host locked counted, and network floating.
Host locked software licenses allow users to run an unlimited number of copies of the product on each of the hosts that has a license. You can configure a boolean resource to represent the software license, and configure your application to require the license resource. When users run the application, LSF chooses the best host from the set of licensed hosts.
See 'Customizing Host Resources' for instructions on configuring boolean resources, and 'The lsf.task and lsf.task.cluster Files' for instructions on configuring resource requirements for an application.
Host locked counted licenses are only available on specific licensed hosts, but also place a limit on the maximum number of copies available on the host. If an External LIM can get the number of licenses currently available, you can configure an external load index licenses giving the number of free licenses on each host. By specifying licenses>=1 in the resource requirements for the application, you can restrict the application to run only on hosts with available licenses.
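For example, assuming the external load index is named licenses, a job could be submitted as:

% bsub -R "licenses>=1" licensed_job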
See 'Changing LIM Configuration' for instructions on writing and using an ELIM, and 'The lsf.task and lsf.task.cluster Files' for instructions on configuring resource requirements for an application.
If a shell script check_license can check license availability and acquires a license if one is available, another solution is to use this script as a pre-execution command when submitting the licensed job.
% bsub -m licensed_hosts -E check_license licensed_job
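A minimal sketch of such a check_license script is shown below; it assumes a site-specific helper command that prints the number of free licenses:

#!/bin/sh
# check_license - hypothetical pre-execution command sketch.
# query_licenses is an assumed site-specific helper that prints the
# number of currently free licenses on its standard output.
FREE=`/usr/local/bin/query_licenses`
if [ "$FREE" -ge 1 ]
then
    exit 0    # a license appears to be available; let the job run
else
    exit 1    # no license; the job remains pending and is retried later
fi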
An alternative is to configure the check_license script as a queue level pre-execution command (see 'Queue-Level Pre-/Post-Execution Commands' for more details).
It is possible that the license becomes unavailable between the time the check_license script is run, and when the job is actually run. To handle this case, the LSF administrator can configure a queue so that jobs in this queue will be requeued if they exit with value(s) indicating that the license was not successfully obtained (see 'Automatic Job Requeue').
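For example, if the licensed application (or its wrapper script) exits with status 99 when it cannot obtain a license, the queue might be configured as follows; the queue name and exit value are illustrative:

Begin Queue
QUEUE_NAME = licenseq
REQUEUE_EXIT_VALUES = 99
...
End Queue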
A floating license allows up to a fixed number of machines or users to run the product at the same time, without restricting which host the software can run on. Floating licenses can be thought of as 'cluster resources'; rather than belonging to a specific host, they belong to all hosts in the cluster.
You can also use the resource reservation feature to control floating licenses. To do this, configure an external load index and write an ELIM that always reports a static number N, where N is the total number of licenses. Configure queue level resource requirements such that the rusage section specifies the reservation requirement of one license for the duration of the job execution. This way, LSF Batch keeps track of the counter and will not over-commit licenses by always running no more than N jobs at the same time. Details for configuring a queue level resource requirement are described in 'Queue-Level Resource Requirement'.
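As a sketch, suppose the total number of floating licenses is 10 and the external index is called licenses (both illustrative). The ELIM simply reports the constant total, and the queue reserves one license per job for the life of the job:

#!/bin/sh
# Hypothetical ELIM sketch: report a single external index named
# "licenses" whose value is the static license total (10 here).
while true
do
    echo "1 licenses 10"
    sleep 30
done

Begin Queue
QUEUE_NAME = floating
RES_REQ    = rusage[licenses=1]
...
End Queue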
Alternatively, a pre-execution command can be configured so that LSF Batch periodically checks for the availability of a license, and keeps the job pending in the queue until a license becomes available (and a suitable execution host can be found). Pre-execution conditions are described in 'Queue-Level Pre-/Post-Execution Commands'.
As another alternative, a site can configure requeue exit values so that a job will be requeued if it fails to get a license (see 'Automatic Job Requeue').
Using LSF Batch to run licensed software can improve the utilization of the licenses - the licenses can be kept in use 24 hours a day, 7 days a week. For expensive licenses, this increases their value to the users. Also, productivity can be increased, as users do not have to wait around for a license to become available.
There are numerous ways to build queues. This section gives some examples.
You want to dispatch large batch jobs only to those hosts that are idle. These jobs should be suspended as soon as an interactive user begins to use the machine. You can (arbitrarily) define a host to be idle if there has been no terminal activity for at least 5 minutes and the 1 minute average run queue is no more than 0.3. The idle queue does not start more than one job per processor.
Begin Queue
QUEUE_NAME  = idle
NICE        = 20
RES_REQ     = it>5 && r1m<0.3
STOP_COND   = it==0
RESUME_COND = it>10
PJOB_LIMIT  = 1
End Queue
If a department buys some fast servers with its own budget, it may want to restrict the use of these machines to users in its group. The owners queue includes a USERS section defining the list of users and user groups that are allowed to use these machines. This queue also defines a fairshare policy so that its users share the resources equally.
Begin Queue
QUEUE_NAME = owners
PRIORITY   = 40
r1m        = 1.0/3.0
FAIRSHARE  = USER_SHARES[[default, 1]]
USERS      = server_owners
HOSTS      = server1 server2 server3
End Queue
On the other hand, the department may want to allow other people to use its machines during off hours so that the machine cycles are not wasted. The night queue only schedules jobs after 7 p.m. and kills jobs around 8 a.m. every day. Jobs are also allowed to run over the weekend.
To ensure jobs in the night queue do not hold up resources after the run window is closed, TERMINATE_WHEN is defined as WINDOW so that when the run window is closed, jobs that have been started but have not finished will be killed.
Because no USERS section is given, all users can submit jobs to this queue. The HOSTS section still contains the server host names. By setting MEMLIMIT for this queue, jobs that use a lot of real memory automatically have their time sharing priority reduced on hosts that support the RLIMIT_RSS resource limit.
This queue also reserves 40MB of swap memory for each job; the reservation decreases to zero over the 20 minutes after the job starts.
Begin Queue
QUEUE_NAME     = night
RUN_WINDOW     = 5:19:00-1:08:00 19:00-08:00
PRIORITY       = 5
RES_REQ        = ut<0.5 && swp>50 rusage[swp=40:duration=20:decay=1]
r1m            = 0.5/3.0
MEMLIMIT       = 5000
TERMINATE_WHEN = WINDOW
HOSTS          = server1 server2 server3
DESCRIPTION    = Low priority queue for overnight jobs
End Queue
Some software packages have fixed licenses and must be run on certain hosts. Suppose a package is licensed to run only on a few hosts, which are tagged with the product resource. Also suppose that on each of these hosts only one license is available.
To ensure the correct hosts are chosen to run jobs, a queue-level resource requirement 'type==any && product' is defined. To ensure that a job gets a license when it starts, HJOB_LIMIT is set to limit the queue to one job per host. Since software licenses are expensive resources that should not be under-utilized, the priority of this queue is defined to be higher than that of any other queue, so jobs in this queue are considered for scheduling first. The queue also has a small nice value so that more CPU time is allocated to its jobs.
Begin Queue
QUEUE_NAME  = license
NICE        = 0
PRIORITY    = 80
HJOB_LIMIT  = 1
RES_REQ     = type==any && product
r1m         = 2.0/4.0
DESCRIPTION = Licensed software queue
End Queue
The short queue can be used to give faster turnaround time for short jobs by running them before longer jobs.
Jobs from this queue should always be dispatched first, so this queue has the highest PRIORITY value. The r1m scheduling threshold of 2 and no suspending threshold mean that jobs are dispatched even when the host is being used and are never suspended. The CPULIMIT value of 15 minutes prevents users from abusing this queue; jobs running more than 15 minutes are killed.
Because the short queue runs at a high priority, each user is only allowed to run one job at a time.
Begin Queue
QUEUE_NAME  = short
PRIORITY    = 50
r1m         = 2/
CPULIMIT    = 15
UJOB_LIMIT  = 1
DESCRIPTION = For jobs running less than 15 minutes
End Queue
Because the short queue starts jobs even when the load on a host is high, it can preempt jobs from other queues that are already running on a host. The extra load created by the short job can make some load indices exceed the suspending threshold for other queues, so that jobs from those other queues are suspended. When the short queue job completes, the load goes down and the preempted job is resumed.
Some special-purpose computers are accessed through front end hosts. You can configure the front end host in lsb.hosts so that it accepts only one job at a time, and then define a queue that dispatches jobs to the front end host with no scheduling constraints.
Suppose hostD is a front end host:
Begin Queue
QUEUE_NAME  = front
PRIORITY    = 50
HOSTS       = hostD
JOB_STARTER = pload
DESCRIPTION = Jobs are queued at hostD and started with pload command
End Queue
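To make hostD accept only one job at a time, the corresponding entry in the lsb.hosts file might look like this sketch:

Begin Host
HOST_NAME   MXJ   JL/U   swp    # This line is keyword(s)
hostD       1     ()     ()
End Host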
To interoperate with NQS, you must configure one or more LSF Batch queues to forward jobs to remote NQS hosts. An NQS forward queue is an LSF Batch queue with the parameter NQS_QUEUES defined. The following queue forwards jobs to the NQS queue named pipe on host cray001:
Begin Queue
QUEUE_NAME  = nqsUse
PRIORITY    = 30
NICE        = 15
QJOB_LIMIT  = 5
CPULIMIT    = 15
NQS_QUEUES  = pipe@cray001
DESCRIPTION = Jobs submitted to this queue are forwarded to NQS_QUEUES
USERS       = all
End Queue
The lsb.hosts file defines host attributes. Host attributes also affect the scheduling decisions of LSF Batch. By default LSF Batch uses all server hosts as configured by LIM configuration files. In this case you do not have to list all hosts in the Host section. For example:
Begin Host
HOST_NAME   MXJ   JL/U   swp    # This line is keyword(s)
default     2     1      20
End Host
The virtual host name default refers to each host that is configured by LIM but not explicitly mentioned in the Host section of the lsb.hosts file. This file defines a total allowed job slot limit of 2 and a per-user job slot limit of 1 for every batch server host. It also defines a scheduling load threshold of 20MB of swap memory.
In most cases your cluster is heterogeneous in some way, so you may have different controls for different machines. For example:
Begin Host
HOST_NAME   MXJ   JL/U   swp    # This line is keyword(s)
hostA       8     2      ()
hppa        2     ()     ()
default     2     1      20
End Host
In this file you add the host type hppa in the HOST_NAME column. This entry applies to all server hosts from the LIM configuration that have host type hppa and are not explicitly listed in the Host section of this file. You can also use a host model name for this purpose. Note the '()' in some of the columns; it leaves that parameter undefined and serves as a place-holder for the column.
The lsb.hosts file can also be used to define host groups and host partitions, as exemplified in 'Sharing Hosts Between Two Groups'.
xlsadmin is a GUI tool for managing your LSF cluster. This tool allows you to do the LSF management work described so far in this chapter as well as the tasks in 'Managing LSF Base'.
xlsadmin has two operation modes: managing and configuration. In managing mode, xlsadmin allows you to:
In configuration mode, xlsadmin allows you to:
Figure 5 is the xlsadmin management main window. The upper area displays all cluster hosts defined by the LIM configuration. The middle of the window contains two areas listing batch queues and batch server hosts. The bottom area is a message window that displays responses to the operations performed.
To view the status of a host or queue, double click on the queue or host and you will get a popup window. Figure 6 is a batch server host popup window when you double click on hostB in the Batch Server Hosts area. Figure 7 is a batch queue popup window when you double click on night in the Batch Queues area.
To perform a control action on a host or queue, select the host or queue in the main management window and choose an operation from the Manage pull-down menu.
By clicking on the Config tab in the management main window, you switch to configuration mode, and the main window changes to the configuration main window, as shown in Figure 8.
The configuration main window contains all areas in a management main window, with the addition of definition areas shown as icons. The definition areas are for defining global names that are used by host or queue configurations or global parameters.
The icons in the upper area are used for defining host types, host models, resource names, task resource lists, and external load indices, as you could otherwise do by editing the lsf.shared file. The icons in the middle area allow you to define host groups and host partitions (otherwise done by editing the lsb.hosts file), user parameters and user groups (otherwise done by editing the lsb.users file), and the parameters defined in the lsb.params file.
By clicking on an icon you will get a popup window for editing that parameter. Figure 9 shows the resource name editing window when you click on the Resource icon.
By double clicking on a host or queue, you will get a popup window that allows you to modify the configuration parameters of that host or queue. You can also add or delete hosts or queues by using the Configure pull-down menu and choosing the appropriate configuration options.
Figure 10 shows the host editing window for LIM configuration, displayed by double clicking on hostD in the Cluster Hosts area. This window modifies the host attributes of hostD, as you could otherwise do by editing the lsf.cluster.cluster file.
Figure 11 shows the queue editing window for creating a new queue.
After you have made all the configuration changes, you can save the changes to files by using the File pull-down menu and choosing Save To Files. You can also use xlsadmin to verify the correctness of your configuration: choose Check from the File menu before choosing Commit. Committing is equivalent to running lsadmin reconfig and badmin reconfig.
Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.