LSF Administrator's Quick Reference

Platform Computing Corporation


Common Options

All commands take the following options. They will not be shown unless they differ for a specific command:

-h   Print command usage to standard error and exit.
-V   Print LSF version to standard error and exit.

When used by the LSF system administrator, the bchkpnt, bmig, bkill, bstop, bresume, and bswitch commands take an extra option:

-u username
Operate on jobs submitted by the named user, or by all users if the reserved user name all is given.

Administration

lsfsetup

Menu-driven LSF installation, upgrade, and configuration utility.

lsfrestart and lsfshutdown

Restart or shut down the LSF daemons on all hosts in the local cluster.

lsfrestart [-f]
lsfshutdown [-f]
-f   Continue without seeking confirmation if an error is encountered.

lsadmin

LSF administrative tool to control the operation of LIM and RES daemons in an LSF cluster. Without arguments, lsadmin prompts for commands.

lsadmin [-h] [-V] [command [command_options] [command_args]]
ckconfig [-v]
Check LSF LIM configuration files. If -v is specified, display detailed messages about configuration file status.
reconfig [-v] [-f]
Restart LIM daemons on all hosts in the local cluster. If -v is specified, display detailed messages about configuration file status. If -f is specified, the operation will proceed without confirmation unless the configuration files contain fatal errors.
limstartup [hostname ... | [-f] all]
limrestart [-v] [-f] [hostname ... | all]
limshutdown [hostname ... | [-f] all]
Start up, restart, or shut down LIM daemons on the hosts specified, or on all hosts in the local cluster if the reserved hostname all is the only argument provided. Default: local host. If -v is specified, display detailed messages about configuration file status. If -f is specified, no confirmation will be requested.
limlock [-l duration]
limunlock
Lock or unlock the LIM daemon on the local host. If -l is specified, the host is locked for duration seconds; otherwise, it stays locked until explicitly unlocked. When a host is locked, its LIM load status becomes lockU.
resstartup [hostname ... | [-f] all]
resrestart [hostname ... | [-f] all]
resshutdown [hostname ... | [-f] all]
Start up, restart, or shut down RES daemons on the hosts specified, or on all hosts in the local cluster if the reserved hostname all is the only argument provided. Default: local host. If -f is specified, no confirmation will be requested. For resstartup, the LSF administrator should be able to use rsh on all LSF hosts.
reslogon [-c cpuTime] [hostname ... | all]
reslogoff [hostname ... | all]
Turn on or turn off RES daemon task logging on the hosts specified or on all hosts in the local cluster if the reserved hostname all is the only argument provided. Default: local host. RES will write resource usage information into the log file lsf.acct.hostname. If -c is specified, log only the tasks which used more than cpuTime; otherwise all tasks will be logged.
help [command ...]
Display the syntax and functionality of the specified command(s).
quit
Exit the lsadmin session.

lsreconfig is an alias for lsadmin reconfig, lslockhost is an alias for lsadmin limlock, and lsunlockhost is an alias for lsadmin limunlock. These commands are for backward compatibility.
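
As a rough illustration of the limlock syntax described above, the following Python sketch assembles the argument list for lsadmin limlock. The helper name limlock_argv and the check that the duration is positive are assumptions made for this example; the helper only builds the command and does not run it.

```python
from typing import List, Optional


def limlock_argv(duration: Optional[int] = None) -> List[str]:
    """Build the argv for `lsadmin limlock [-l duration]` (hypothetical helper)."""
    argv = ["lsadmin", "limlock"]
    if duration is not None:
        # Assumption: a lock duration only makes sense as a positive number.
        if duration <= 0:
            raise ValueError("duration must be a positive number of seconds")
        # With -l, the host is locked for `duration` seconds.
        argv += ["-l", str(duration)]
    # Without -l, the host stays locked until `lsadmin limunlock` is run.
    return argv


print(limlock_argv(600))
print(limlock_argv())
```

The resulting list could be handed to subprocess.run on a host where lsadmin is installed.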

badmin

Administration tool to control and monitor LSF Batch with a set of privileged and non-privileged commands. Privileged commands can only be invoked by root or LSF administrators; all other commands can be invoked by any user. Without arguments, badmin prompts for commands.

badmin [-h] [-V] [command [command_options] [command_args]]
ckconfig [-v]
Check LSF Batch configuration files. If -v is specified, display detailed messages about configuration file status.
reconfig [-v] [-f]
Dynamically reconfigure the LSF Batch system. If -v is specified, display detailed messages about configuration file status. If -f is specified, the operation will proceed without confirmation unless the configuration files contain fatal errors.
qopen [queue_name ... | all]
qclose [queue_name ... | all]
qact [queue_name ... | all]
qinact [queue_name ... | all]
Open, close, activate, or inactivate the LSF Batch queues specified by queue_name, or all queues if the reserved word all is given. If no queue is specified, the system default queue is assumed.
qhist [-t time0, time1] [-f logfile_name] [queue_name ...]
hhist [-t time0, time1] [-f logfile_name] [host_name ...]
mbdhist [-t time0, time1] [-f logfile_name]
hist [-t time0, time1] [-f logfile_name]
Display the event history of LSF Batch: qhist displays the named queues (default: all queues), hhist displays the named hosts (default: all hosts), mbdhist displays the master batch daemon (mbatchd), and hist displays all three. If -t is specified, display only those events that happened during the period from time0 to time1 (see bhist(1) for the time format). If -f is specified, use logfile_name as the event log file.
hopen [host_name ... | all]
hclose [host_name ... | all]
Open or close the server hosts specified or all hosts in the LSF Batch system if the reserved hostname all is given. Default: local host.
hstartup [host_name ... | [-f] all]
hrestart [host_name ... | [-f] all]
hshutdown [host_name ... | [-f] all]
Start up, restart, or shut down slave batch daemons (sbatchd) on the server hosts specified, or on all hosts in the LSF Batch system if the reserved hostname all is given. Default: local host. If -f is specified, no confirmation will be requested. For hstartup, the LSF administrator should be able to use rsh on all LSF hosts.
help [command ...]
? [command ...]
Display the syntax and functionality of the specified command(s).
quit
Exit the badmin session.

xlsadmin

Motif-based Graphical User Interface application for LSF administration.

Accounting

bacct

Report accounting statistics on completed batch jobs in the LSF Batch system.

bacct [-h] [-V] [-b] [-l] [-w] [-d] [-e] [-f logfile] [-N host_spec]
[-C time0, time1] [-S time0, time1] [-D time0, time1] [-q queuelist]
[-m hostlist] [-u userlist|all] [-P projectlist] [jobId ...]
-b
Display brief information on each job and a summary. Default: display only the summary.
-l
Display all the information on each job and a summary. Default: display only the summary.
-w
Display in wide format. No truncation is performed on user name, queue name, from host, execution host or job name.
-d
Consider only successfully completed jobs (DONE status). Default: all finished jobs (DONE or EXIT status).
-e
Consider only exited jobs (EXIT status). Default: all finished jobs (DONE or EXIT status).
-f logfile
Use logfile as the job log file to be analysed. Default: the current job log file (lsb.acct).
-N host_spec
Display normalized CPU time relative to the host type, host model, or CPU factor of the execution host.
-C time0, time1
Consider only those jobs whose completion or exit times were within the time interval time0 to time1. Default: all logged jobs.
-S time0, time1
Consider only those jobs whose submission times were within the time interval time0 to time1. Default: all logged jobs.
-D time0, time1
Consider only those jobs whose dispatch times were within the time interval time0 to time1. Default: all logged jobs.
-q queuelist
Consider only jobs submitted to the named queues. Default: all queues.
-m hostlist
Consider only jobs executed on the named hosts. Default: all hosts.
-u userlist|all
Consider only jobs submitted by the named users, or all users if the reserved name all is given. A mixture of user names and user IDs can be listed. Default: the invoker.
-P projectlist
Consider only those jobs submitted to the projects in projectlist. If two or more project names are given, they must be enclosed in double (") or single (') quotation marks. Default: all projects.
jobId
Consider only the specified jobs. This option overrides all other options except -h, -V, -b, -l, and -f. Default: all jobs that satisfy the other options.
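
The override rule for jobId can be modelled in a few lines. The Python sketch below is only an illustration of the rule as worded above, not of how bacct is implemented; the function name, the queue name night, and the log file name lsb.acct.old are hypothetical.

```python
from typing import Dict, List

# Options that still apply when job IDs are given, per the jobId entry above.
KEPT_WITH_JOB_IDS = {"-h", "-V", "-b", "-l", "-f"}


def effective_options(options: Dict[str, str], job_ids: List[int]) -> Dict[str, str]:
    """Return the options that remain in effect (illustrative model only)."""
    if not job_ids:
        # No job IDs given: every option applies.
        return dict(options)
    # Job IDs given: everything except the kept options is overridden.
    return {flag: value for flag, value in options.items()
            if flag in KEPT_WITH_JOB_IDS}


opts = {"-l": "", "-q": "night", "-u": "all", "-f": "lsb.acct.old"}
print(effective_options(opts, [101, 102]))
```

In this example the -q and -u filters drop out once job IDs are supplied, while -l and -f survive.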

LSF Components

LSF has four parts to its architecture: a Load Information Manager (LIM), a Remote Execution Server (RES), a slave batch daemon (sbatchd) and a master batch daemon (mbatchd).

They are root-owned daemons. LIM, RES, and sbatchd run on each host in a load sharing cluster. These daemons are invoked at boot time. The sbatchd daemon on the master host invokes mbatchd.

LIM

The LIM collects load and resource information about all hosts in the cluster and provides host selection services to applications through LSLIB. The LIM maintains information on static system resources and dynamic load.

RES

The RES provides the mechanisms for transparent remote execution of tasks. The RES accepts remote execution requests from all load sharing applications and handles input/output on the remote host for load shared processes.

mbatchd

User jobs are held by mbatchd when submitted. mbatchd periodically checks the load information on all candidate hosts by contacting the master LIM. When a host with the necessary resource becomes available, mbatchd will send a job to the sbatchd on that host for execution. When more than one candidate host becomes available, mbatchd chooses the best host.

sbatchd

An sbatchd daemon accepts job execution requests from the mbatchd, and monitors the progress of its jobs. sbatchd controls the execution of the jobs and reports job status to mbatchd.

Troubleshooting

What Should Be Running?

The process status command, ps, run on an LSF server host should show the LIM (lim), RES (res), and sbatchd daemons. The master host should also show the mbatchd daemon.
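
A minimal sketch of that check in Python follows. It scans text captured from ps and reports which of the expected daemons are absent. It assumes the command is the last field on each line and carries no arguments; the function name and the sample paths are hypothetical.

```python
from typing import List, Set


def missing_daemons(ps_output: str, is_master: bool = False) -> List[str]:
    """List expected LSF daemons that do not appear in `ps` output text."""
    expected = ["lim", "res", "sbatchd"]
    if is_master:
        # Only the master host is expected to run mbatchd.
        expected.append("mbatchd")
    running: Set[str] = set()
    for line in ps_output.splitlines():
        fields = line.split()
        if fields:
            # Assumption: the command is the last field; keep its base name.
            running.add(fields[-1].rsplit("/", 1)[-1])
    return [name for name in expected if name not in running]


sample = """\
  PID TTY      TIME CMD
  201 ?        0:03 /usr/local/lsf/etc/lim
  215 ?        0:01 /usr/local/lsf/etc/res
"""
print(missing_daemons(sample, is_master=True))
```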

LSF Files                  Directory
lsf.conf                   $LSF_ENVDIR/etc
lsfsetup                   $LSF_SERVERDIR
All LSF daemons            $LSF_SERVERDIR
Administration tools       $LSF_BIN
Configuration files        $LSF_CONFDIR
Batch configuration files  $LSF_CONFDIR/lsbatch/cluster/configdir
lsf.acct.host              $LSF_RES_ACCTDIR or /tmp
lsb.acct                   $LSB_SHAREDIR/cluster/logdir
lsb.events                 $LSB_SHAREDIR/cluster/logdir
daemon.log.host            $LSF_LOGDIR

LSF Error Log Files

LSF error messages are logged to syslog or, if LSF_LOGDIR is defined in lsf.conf, to log files in that directory. Each server host has three error log files: lim.log.host, res.log.host, and sbatchd.log.host. The master host also has mbatchd.log.host.
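
The naming pattern for these log files can be expressed as a short Python sketch. It assumes the files sit directly under the LSF_LOGDIR directory; the function name, the directory /var/log/lsf, and the host name hostA are hypothetical.

```python
import posixpath
from typing import List


def error_log_paths(logdir: str, host: str, is_master: bool = False) -> List[str]:
    """Return the error log files expected for one host."""
    daemons = ["lim", "res", "sbatchd"]
    if is_master:
        # The master host has an additional mbatchd log.
        daemons.append("mbatchd")
    # Each file is named <daemon>.log.<host>.
    return [posixpath.join(logdir, f"{daemon}.log.{host}") for daemon in daemons]


for path in error_log_paths("/var/log/lsf", "hostA", is_master=True):
    print(path)
```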

LSF Configuration Files

lsf.conf

Generic environment configuration file describing the configuration and operation of the LSF installation.

LSF_CONFDIR
The directory where all the rest of the LSF configuration files are installed.
LSF_SERVERDIR
The directory where all LSF server binaries are installed.
LSF_ROOT_REX
Allow root to run jobs through LSF.
LSF_LOG_MASK
Set the level of daemon error message logging.
LSF_LOGDIR
Directory under which error messages from all daemons are logged.
LSF_SERVER_HOSTS
Defines one or more LSF server hosts that the application must contact in order to get in touch with a LIM. Typically used by client-only hosts that do not run a LIM.
LSF_AFS_CELLNAME
AFS cellname must be specified here if AFS is installed.
LSF_AUTH
Defines the type of authentication to use.
LSF_STRIP_DOMAIN
If all hosts in the cluster can be reached using short host names, this parameter can be used to specify the portion of the domain name to remove.
LSF_LICENSE_FILE
The full pathname of the FLEXlm license file used by LSF.
LSF_LIM_PORT
Defines the UDP port number LIM uses to serve all applications.
LSF_RES_PORT
Defines the TCP port number RES uses to serve all applications.
LSB_CONFDIR
The directory where the LSF Batch configuration files are installed.
LSB_DEBUG
If defined, LSF Batch will run in single-user mode.
LSB_MAILPROG
Defines the name of a sendmail-compatible transport program.
LSB_MAILTO
Defines the user to whom LSF Batch sends electronic mail when jobs complete or have errors, and in the case of critical system errors.
LSB_MBD_PORT
Defines the TCP port number mbatchd uses to serve all applications.
LSB_SBD_PORT
Defines the TCP port number sbatchd uses to serve all applications.
LSB_SHAREDIR
Defines where LSF Batch keeps job history and accounting log files for each cluster.
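
To make the effect of LSF_STRIP_DOMAIN concrete, here is a small Python sketch of removing a domain portion from a host name. This reference does not give the exact value format of the parameter, so the sketch assumes a single suffix beginning with a dot; the function name and the example domains are hypothetical.

```python
def strip_domain(hostname: str, domain: str) -> str:
    """Remove a trailing domain such as '.example.com' from a host name."""
    if domain and hostname.endswith(domain):
        # Drop the matching suffix and keep the short host name.
        return hostname[: -len(domain)]
    # Host names outside the domain are left unchanged.
    return hostname


print(strip_domain("hostA.example.com", ".example.com"))
print(strip_domain("hostB.other.org", ".example.com"))
```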

lsf.shared

This is the system configuration file that is shared by all load sharing clusters of an LSF installation. This file contains the following sections:

Cluster
Contains a list of the names of the clusters in this LSF installation. Keywords: ClusterName, Servers.
HostType
Defines the list of valid host types. Keywords: TypeName.
HostModel
Defines the host models and their associated CPU scaling factors. Keywords: ModelName, CPUFactor.
Resource
Defines static resource names. Keywords: ResourceName, Description.
NewIndex
Defines external load indices (site defined load indices).

lsf.cluster.cluster

The configuration file for the named cluster. The cluster name must be defined in lsf.shared.

Parameters
Specifies miscellaneous parameters.
ELIMARGS
Specifies the arguments to be passed to the external LIM on startup.
FEATURES
Specifies the names of those LSF features that are to be enabled for all the hosts in the cluster. Valid names: lsf_base (LSF Base), lsf_batch (LSF Batch), lsf_mc (LSF MultiCluster), and lsf_js (LSF JobScheduler). Default: lsf_base and lsf_batch.
ClusterAdmins
Defines LSF administrators for this cluster. Keywords: ADMINISTRATORS.
RemoteClusters
Defines the remote clusters that the local cluster is interested in. Only used in an LSF MultiCluster environment.
CACHE_INTERVAL
Controls how long load information from the remote cluster is cached locally.
EQUIV
Specifies that the remote cluster may be "equivalent" to the local cluster.
RECV_FROM
Controls whether remote cluster users can run interactive jobs on the local cluster.
Host
Lists the hosts that form this cluster together with their attributes.
HOSTNAME
The official host name (as returned by hostname(1)). Mandatory.
model
Determines the CPU scaling factor for the host. Mandatory.
type
Defines the host type. Mandatory.
server
Defines the host as a server. Optional. Default: 1 (server).
ND
Defines the number of local disks on this host. Optional. Used when LIM does not report disks correctly.
r15s, r1m, r15m, ut, pg, it, io, ls, swp, mem, tmp, and external index names
Load threshold indices. The host is marked as busy when any value is exceeded.
RESOURCES
The static resources associated with this host.
RUNWINDOW
Defines when the host accepts remote jobs.
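
The busy rule for load thresholds can be sketched in Python as follows. The sketch treats "exceeded" as "greater than", which suits indices that grow with load, such as r1m, ut, and pg. Indices for which a lower value signals a more heavily loaded host would need the opposite comparison, which this example does not model. The function names and sample values are hypothetical.

```python
from typing import Dict, List


def exceeded_indices(load: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the load indices whose current value is above its threshold."""
    return [name for name, limit in thresholds.items()
            if name in load and load[name] > limit]


def is_busy(load: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """A host counts as busy when any configured threshold is exceeded."""
    return bool(exceeded_indices(load, thresholds))


thresholds = {"r1m": 3.5, "pg": 15.0}
load = {"r1m": 4.2, "pg": 2.0, "ut": 0.9}
print(exceeded_indices(load, thresholds), is_busy(load, thresholds))
```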

lsf.task, lsf.task.cluster, and .lsftask

Task resource requirement lists. lsf.task applies to all clusters and all users while lsf.task.cluster applies to the named cluster. Individual users can define a .lsftask in their home directory.

RemoteTasks
Defines the tasks that can be run remotely. A resource requirement string may be appended to a task.

hosts

Defines LSF hosts in order to resolve inconsistent host naming practices in some environments. The format is the same as /etc/hosts.

lsb.params

This file defines the operating parameters of LSF Batch.

DEFAULT_QUEUE
The system default queues.
DEFAULT_HOST_SPEC
A host name or host model name used as the system default for adjusting CPU time limit.
MBD_SLEEP_TIME
The job dispatching interval in seconds.
SBD_SLEEP_TIME
The job checking interval in seconds.
JOB_ACCEPT_INTERVAL
The minimum interval between dispatching jobs to the same host. Measured in numbers of MBD_SLEEP_TIME periods.
MAX_SBD_FAIL
The maximum number of retries for reaching a non-responding sbatchd daemon.
CLEAN_PERIOD
The amount of time that records are kept by the mbatchd daemon for jobs that have finished or been killed.
MAX_JOB_NUM
The maximum number of finished jobs that the lsb.events file can store before mbatchd switches to a new file.
HIST_HOURS
The number of recent hours during which the CPU time used by a user is considered when calculating the priorities of a fairshare queue.
PG_SUSP_IT
The number of seconds during which a host should be interactively idle before a pg suspended job can be resumed.
DEFAULT_PROJECT
The system default project name.
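
Because JOB_ACCEPT_INTERVAL is counted in MBD_SLEEP_TIME periods rather than in seconds, the effective interval is the product of the two values. A minimal Python sketch of that arithmetic follows; the function name and the sample values are hypothetical and are not LSF defaults.

```python
def job_accept_interval_seconds(job_accept_interval: int, mbd_sleep_time: int) -> int:
    """Convert JOB_ACCEPT_INTERVAL (in MBD_SLEEP_TIME periods) to seconds."""
    if job_accept_interval < 0 or mbd_sleep_time < 0:
        raise ValueError("both values must be non-negative")
    return job_accept_interval * mbd_sleep_time


# Example: 2 periods of 30 seconds each.
print(job_accept_interval_seconds(2, 30))
```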

lsb.queues

This file defines the job queues configured for an LSF cluster.

QUEUE_NAME
Name of the queue. 'default' is reserved and cannot be used as a queue name.
DESCRIPTION
A brief description of the queue.
PRIORITY
Priority of the queue.
NICE
The nice value for running jobs.
QJOB_LIMIT
The maximum number of job slots.
UJOB_LIMIT
Per-user maximum number of job slots.
HJOB_LIMIT
Per-host maximum number of job slots.
PJOB_LIMIT
Per-processor maximum number of job slots.
FAIRSHARE
Jobs in this queue are scheduled based on a fair share policy.
PREEMPTION
Defines preemption relationship between this queue and other queues:
PREEMPTIVE
Jobs in this queue may preempt jobs (running or suspended) from lower priority queues.
PREEMPTABLE
Running jobs from this queue may be preempted by jobs in higher priority queues even if those higher priority queues have not specified PREEMPTIVE.
EXCLUSIVE
Jobs dispatched from this queue can run exclusively on a host if the user so specifies at job submission time.
INTERACTIVE
Specifies the queue's policy in accepting interactive jobs.
JOB_ACCEPT_INTERVAL
The minimum interval between dispatching jobs to the same host. Overrides the same parameter in lsb.params. Measured in numbers of MBD_SLEEP_TIME periods.
JOB_CONTROLS
Control actions for suspending, resuming, and terminating jobs dispatched to this queue.
TERMINATE_WHEN
Specifies that the TERMINATE action be invoked (instead of the SUSPEND action) when the run window closes, the load exceeds the suspending thresholds, or the job is being preempted to allow another job to run.
DISPATCH_WINDOW
Defines the times during which jobs in this queue can be dispatched.
RUN_WINDOW
Defines the times during which jobs in this queue may execute.
NEW_JOB_SCHED_DELAY
The delay time after a new job has been submitted to this queue before mbatchd starts a new schedule session.
SLOT_RESERVE
Job slots reservation time threshold for scheduling parallel jobs. Measured in numbers of MBD_SLEEP_TIME periods. Default: 0.
USERS
The names of users and user groups that are authorized to use this queue.
ADMINISTRATORS
Administrators of the queue.
HOSTS
Names of hosts, host groups and host partitions that are used to run jobs from this queue.
r15s, r1m, r15m, ut, pg, io, ls, it, swp, mem, tmp, and external index names
The threshold values for the individual load indices used in scheduling and suspending jobs.
RES_REQ
The resource requirements for selecting and sorting candidate hosts to run jobs in this queue.
STOP_COND
Resource requirement string specifying the condition for suspending a running job in this queue.
RESUME_COND
Resource requirement string specifying the condition for resuming a suspended job in this queue.
MIG
The automatic job migration threshold in minutes.
DEFAULT_HOST_SPEC
A host name or host model name for adjusting CPU time limits.
SNDJOBS_TO
Specifies the list of remote queues to send jobs to. For LSF MultiCluster only.
RCVJOBS_FROM
Specifies the list of remote clusters that are allowed to send jobs to the queue. For LSF MultiCluster only.
CPULIMIT
The total amount of normalized CPU time that a job from this queue is allowed to consume.
RUNLIMIT
The wall-clock run time limit for a job from this queue.
PROCLIMIT
The processor limit (parallelism limit) for a parallel job which can be accepted by this queue.
FILELIMIT
The per-process file size limit for all jobs from this queue.
DATALIMIT
The per-process data segment size limit for all jobs from this queue.
STACKLIMIT
The per-process stack segment size limit for all jobs from this queue.
CORELIMIT
The per-process core file size limit for all jobs from this queue.
MEMLIMIT
The total resident set size limit for a job from this queue.
SWAPLIMIT
The total virtual memory limit for a job from this queue.
PROCESSLIMIT
The number of concurrent processes for a job from this queue.
PRE_EXEC
The pre-execution command for the jobs in the queue.
POST_EXEC
The post-execution command for the jobs in the queue.
REQUEUE_EXIT_VALUES
The exit values used by LSF Batch to requeue jobs dispatched from a queue.
JOB_STARTER
Job starter command for jobs in the queue.
NQS_QUEUES
The NQS destination queues.
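
The interaction between PREEMPTIVE and PREEMPTABLE described above can be modelled as a simple predicate. The Python sketch below only illustrates the wording of those two entries and is not the scheduler's actual logic. It assumes that a larger PRIORITY value means a higher priority, which this reference does not state; the class, function, and queue names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    priority: int              # Assumption: larger value = higher priority.
    preemptive: bool = False   # PREEMPTION includes PREEMPTIVE
    preemptable: bool = False  # PREEMPTION includes PREEMPTABLE


def may_preempt(candidate: Queue, victim: Queue) -> bool:
    """Can jobs in `candidate` preempt jobs from `victim`?"""
    if candidate.priority <= victim.priority:
        # Preemption only flows from higher to lower priority queues.
        return False
    # Either side can enable it: a PREEMPTIVE higher queue, or a
    # PREEMPTABLE lower queue even if the higher one is not PREEMPTIVE.
    return candidate.preemptive or victim.preemptable


urgent = Queue("urgent", priority=70, preemptive=True)
normal = Queue("normal", priority=40)
idle = Queue("idle", priority=10, preemptable=True)
print(may_preempt(urgent, normal), may_preempt(normal, idle), may_preempt(idle, normal))
```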

lsb.hosts

This file contains information about the batch server hosts in an LSF cluster.

Host
Defines the hosts that are used by an LSF cluster as batch job servers.
HOST_NAME
The host name. It can be the official host name, the host type name, the host model name, or the reserved word default.
MXJ
The maximum number of job slots that this host can process concurrently.
JL/U
The maximum number of job slots per user that can be processed concurrently on this host.
DISPATCH_WINDOW
Defines the times during which batch jobs may be dispatched to this host to run.
r15s, r1m, r15m, ut, pg, io, ls, it, swp, mem, tmp, and external index names
The threshold values for individual load indices used in scheduling and suspending jobs.
CHKPNT
Specifies special types of checkpoint support available on the host.
MIG
The automatic job migration threshold.
HostGroup
Defines host groups, which are aliases for groups of hosts.
GROUP_NAME
The name of a host group.
GROUP_MEMBER
A list of names of hosts and host groups that are members of the group.
HostPartition
Defines subsets of hosts that must be accessed by users in a controlled manner, giving each user or user group a fair share of the resources.
HPART_NAME
The name of the host partition.
HOSTS
A list of names of hosts and host groups that are members of the host partition.
USER_SHARES
A number of [username, share] pairs.
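
As an illustration of USER_SHARES, the Python sketch below parses a list of [username, share] pairs and converts the shares to fractions of the total. It assumes the pairs are written in square brackets separated by white space and that shares act as proportional weights; neither detail is spelled out in this reference. The function names and user names are hypothetical.

```python
import re
from typing import Dict


def parse_user_shares(text: str) -> Dict[str, int]:
    """Parse '[user, share] [user, share] ...' into a dictionary."""
    pairs = re.findall(r"\[\s*([^,\]\s]+)\s*,\s*(\d+)\s*\]", text)
    return {user: int(share) for user, share in pairs}


def share_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Express each share as a fraction of the total (assumed semantics)."""
    total = sum(shares.values())
    if total == 0:
        return {}
    return {user: share / total for user, share in shares.items()}


shares = parse_user_shares("[alice, 3] [bob, 1]")
print(share_fractions(shares))
```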

lsb.users

This file contains information about the batch users in an LSF cluster.

UserGroup
Defines user groups, which are aliases for groups of users.
GROUP_NAME
The name of a user group.
GROUP_MEMBER
A list of names of users and user groups that are members of the group.
User
Defines the maximum number of jobs that can be run concurrently by the LSF Batch system for specific users or user groups.
USER_NAME
The user name, the user group name or the reserved word default.
MAX_JOBS
The maximum number of job slots for this user or user group that can be used concurrently in the cluster.
JL/P
The maximum number of job slots for this user or user group that can be used concurrently on each processor.

lsb.calendars

This file contains the definitions of system calendars. It is only applicable to LSF JobScheduler systems.

NAME
The name of the calendar.
TIME_EVENTS
A list of time expressions separated by spaces defining when this calendar is active. The format of a time expression is "year:month:day:hour:min[%duration]".
DESCRIPTION
A string description of the calendar.
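
The TIME_EVENTS format can be taken apart with a few lines of Python. The sketch below splits each time expression into its named fields and the optional duration. It keeps every field as a string because this reference does not say which values each field accepts; the function names and the sample expressions are hypothetical.

```python
from typing import Dict, List, Optional

FIELDS = ("year", "month", "day", "hour", "min")


def parse_time_event(expr: str) -> Dict[str, Optional[str]]:
    """Split 'year:month:day:hour:min[%duration]' into named fields."""
    body, sep, duration = expr.partition("%")
    parts = body.split(":")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} colon-separated fields: {expr!r}")
    result: Dict[str, Optional[str]] = dict(zip(FIELDS, parts))
    # The duration part is optional; None means it was not given.
    result["duration"] = duration if sep else None
    return result


def parse_time_events(value: str) -> List[Dict[str, Optional[str]]]:
    """Parse a space-separated list of time expressions."""
    return [parse_time_event(expr) for expr in value.split()]


for event in parse_time_events("1997:6:15:8:30%60 1997:6:16:9:00"):
    print(event)
```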

LSF Architecture

[Figure: LSF architecture diagram]

Related Documents

Documentation for LSF consists of this reference and the LSF User's Quick Reference, the LSF Installation Quick Reference, the LSF User's Guide, the LSF Administrator's Guide, the LSF Programmer's Guide, the LSF JobScheduler User's Guide, the LSF man pages, and the xlsadmin, xlsbatch, and xlsmon on-line help.


doc@platform.com

Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.