

Chapter 5. JobScheduler Environment


JobScheduler provides several commands for monitoring the system and for obtaining information about available resources.

bhosts
Displays information about JobScheduler servers.
bparams
Displays information about the configurable system parameters of JobScheduler.
bqueues
Displays status and other information about the JobScheduler queues.
lshosts
Displays static resource information about the hosts available in the cluster.
lsinfo
Displays all valid resource information about the cluster.
lsload
Displays load information of the hosts in the JobScheduler cluster.
lsmon
Displays and updates the load information of the hosts in a cluster.

Viewing Cluster Information

The lsclusters command lists JobScheduler cluster and resource summary information. The -l option displays all information available about the cluster and its configuration:

% lsclusters -l
CLUSTER_NAME   STATUS   MASTER_HOST               ADMIN    HOSTS  SERVERS
production        ok       hostG                  JSadm      19       17
LSF administrators: JSadm user8 user2
Available resources:hpux solaris aix nt cserver fserver
Available host types:hppa sgi sparc rs6k intel
Available host models:hp735 hp715 PENTIUM120 IBM350 SunSparc r10000
Accept jobs from this cluster: yes
Send jobs to this cluster: yes

The first name on the LSF administrators line is the primary cluster administrator, who can change the cluster configuration. SERVERS is the total number of server hosts in the cluster, while HOSTS is the total number of hosts, including both servers and client-only hosts. The last two lines of the output are relevant only in a multi-cluster environment.

The commands described below display more detailed resource and host information.

Cluster Host Resource Information

A cluster may consist of hosts of different architectures and capabilities. Your jobs may depend on specific features being available. Use the lshosts command to display static information about hosts.

% lshosts
HOST_NAME      type    model cpuf ncpus maxmem  maxswp server    RESOURCES
hostA           hppa   HP715   4    1     64M    128M    Yes     (cs hpux)
hostB          sunsol  sparc   3    1     96M    128M    Yes     (sun solaris)
hostD           SGI    R10000  10   8     512M   1024M   Yes     (sgi irix6)

The output displays the CPU architecture (type), the model (model), the relative performance factor (cpuf), the number of processors (ncpus), the maximum amount of physical memory available for user processes (maxmem), the total swap space available (maxswp), whether the host is a JobScheduler server or a client (server), and the resources defined for each host (RESOURCES).

To view a particular machine, you can specify a host name on the command line.

% lshosts hostD
HOST_NAME      type    model cpuf ncpus maxmem  maxswp server    RESOURCES
hostD           SGI    R10000  10   8     512M   1024M   Yes     (sgi irix6)

The parameters displayed by lshosts are either defined by the JobScheduler administrator in the configuration files, or determined directly from the system.
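
When the same information is needed in a script, the tabular output can be split into fields. The following is a minimal sketch, not part of JobScheduler: it parses the sample lshosts output shown above, which is embedded as a string so that the example runs without a cluster. The function name parse_lshosts is hypothetical, and the sketch assumes that every host line has all columns and that RESOURCES is the only parenthesised field.

```python
# Sample text copied from the lshosts example above. On a real cluster this
# would be the captured output of the lshosts command.
SAMPLE = """\
HOST_NAME      type    model cpuf ncpus maxmem  maxswp server    RESOURCES
hostA           hppa   HP715   4    1     64M    128M    Yes     (cs hpux)
hostB          sunsol  sparc   3    1     96M    128M    Yes     (sun solaris)
hostD           SGI    R10000  10   8     512M   1024M   Yes     (sgi irix6)
"""


def parse_lshosts(text):
    """Return one dict per host line, keyed by the column names in the header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    hosts = []
    for line in lines[1:]:
        # RESOURCES is parenthesised and may contain spaces, so split it off first.
        fixed, _, resources = line.partition("(")
        record = dict(zip(header[:-1], fixed.split()))
        record["RESOURCES"] = resources.rstrip(") ").split()
        hosts.append(record)
    return hosts


hosts = parse_lshosts(SAMPLE)
# Select the hosts that have more than one processor.
multi_cpu = [h["HOST_NAME"] for h in hosts if int(h["ncpus"]) > 1]
print(multi_cpu)
```

With the sample data, only hostD has more than one processor.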

Cluster Host Dynamic Load Information

The Load Information Manager (LIM) in the JobScheduler system monitors the dynamic load of every server host in the cluster. JobScheduler uses this information to make job scheduling decisions that balance the load. Use the lsload command to display the current load information.

% lsload
HOST_NAME   status  r15s   r1m  r15m  ut   pg   ls  it  tmp  swp  mem
hostB           ok   0.1   0.0  0.1   2%   0.0   5   9  24M  52M  45M
hostD           ok   2.2   2.6  1.6  89%   14    4   0  45M  189M 33M
hostA         busy   1.4  *2.1  1.5  99%   0.8   5  33  12M  24M  23M

The first line lists all the load index names. The load indices are host status (status), 15-second run queue length (r15s), 1-minute run queue length (r1m), 15-minute run queue length (r15m), CPU utilization (ut), paging activity (pg), logins (ls), idle time (it), available space in temporary file system (tmp), available swap space (swp), and available memory (mem).

A host is busy if any load index is beyond its configured threshold. When a load index is beyond its threshold, it is printed with an asterisk. In the example, host hostA is busy because load index r1m is too high.
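
The asterisk convention makes it straightforward to find out which index caused a busy status. The sketch below is illustrative only: it scans the sample lsload output from above (embedded as a string) and reports, for each host, the indices printed with a leading asterisk. The function name overloaded_indices is hypothetical, and the sketch assumes that every host line carries a value for every column.

```python
# Sample text copied from the lsload example above.
SAMPLE = """\
HOST_NAME   status  r15s   r1m  r15m  ut   pg   ls  it  tmp  swp  mem
hostB           ok   0.1   0.0  0.1   2%   0.0   5   9  24M  52M  45M
hostD           ok   2.2   2.6  1.6  89%   14    4   0  45M  189M 33M
hostA         busy   1.4  *2.1  1.5  99%   0.8   5  33  12M  24M  23M
"""


def overloaded_indices(text):
    """Map each host name to the load indices printed with a leading asterisk."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    result = {}
    for line in lines[1:]:
        fields = line.split()
        # Pair each value with its column name and keep the flagged ones.
        flagged = [name for name, value in zip(header, fields)
                   if value.startswith("*")]
        result[fields[0]] = flagged
    return result


flags = overloaded_indices(SAMPLE)
print(flags)
```

For the sample data, hostA is reported with r1m and the other two hosts with empty lists.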

The JobScheduler system collects 11 built-in load indices and an arbitrary number of external load indices. External load indices are configured by the site and can reflect any site-specific load information.

The -l option displays the values of all load indices, including external load indices. You can also specify host names on the command line to display the load of specific hosts.

% lsload -l hostA
HOST_NAME status  r15s  r1m  r15m   ut   pg  io  ls  it  tmp  swp  mem  nio
hostA      lockW   0.2  0.1   0.1  11%  0.0 228   7   0  11M  52M  25M  3.5

In this example, nio is an external load index defined by the JobScheduler administrator.

The lsmon command provides a continuously updating display of load information. An example display from lsmon is shown below. You can specify the resource requirements, refresh interval, and other parameters interactively or on the command line. See the lsmon(1) manual page for more information.


Hostname: hostE                                      Refresh rate:  10 secs

HOST_NAME      status  r15s   r1m  r15m   ut    pg  ls    it   swp   mem   tmp
hostA          ok       0.4   0.5   0.7   5%   0.1  28     0  538M  421M   12M
hostE          ok       0.0   0.1   0.1   8%   9.9   8     0   27M   51M  253M
hostB          ok       0.0   0.0   0.0   5%   0.0   2     0   47M    0M   45M
hostF          busy     3.2  *3.1   2.4  96%   0.9  20     0  129M    2M   29M

Cluster Resource Information

The lsinfo command lists all valid resource names in the cluster. The resource names are either built-in resource names maintained by the Load Information Manager (LIM), or defined by cluster administrators through configuration. The output from the lsinfo command is usually quite long; the following example has been edited.

% lsinfo
RESOURCE_NAME   TYPE   ORDER  DESCRIPTION
r15s          Numeric   Inc   15-second CPU run queue length
r1m           Numeric   Inc   1-minute CPU run queue length (alias: cpu)
r15m          Numeric   Inc   15-minute CPU run queue length
ut            Numeric   Inc   1-minute CPU utilization (0.0 to 1.0)
pg            Numeric   Inc   Paging rate (pages/second)
ls            Numeric   Inc   Number of login sessions (alias: login)
it            Numeric   Dec   Idle time (minutes) (alias: idle)
swp           Numeric   Dec   Available swap space (Mbytes) (alias: swap)
mem           Numeric   Dec   Available memory (Mbytes)
ncpus         Numeric   Dec   Number of CPUs
ndisks        Numeric   Dec   Number of local disks
maxmem        Numeric   Dec   Maximum memory (Mbytes)
maxswp        Numeric   Dec   Maximum swap space (Mbytes)
cpuf          Numeric   Dec   CPU factor
hppa          Boolean   N/A   HPPA architecture
solaris       Boolean   N/A   SunSolaris
cs            Boolean   N/A   Compute Server
fs            Boolean   N/A   File server
type           String   N/A   Host type
model          String   N/A   Host model
hname          String   N/A   Host name

TYPE_NAME
SGI
hppa
sunsol
intel

MODEL_NAME       CPU_FACTOR
R10000             10
HP715               3
sparc               5
pentium120          2

You will frequently find it convenient to use resource requirements to restrict the selection of hosts that run your jobs. The output of lsinfo lists every valid resource name that you can use in your resource requirements. For details of using resource requirements, see 'Resource Requirements'.

JobScheduler Server Host Information

The bhosts command displays information about the JobScheduler server hosts that have been configured to run production jobs. It is possible to configure your cluster such that only some server hosts in the JobScheduler cluster run jobs.

% bhosts
HOST_NAME     STATUS   JL/U  MAX   NJOBS   RUN  SSUSP USUSP RSV
hostA         closed     2     1     1      1     0     0     0
hostB         ok         2     4     1      1     0     0     0
hostD         ok         -     8     4      3     0     1     0

The command displays the status of the host (STATUS), the job limit per user (JL/U), the maximum number of job slots on the host for running jobs concurrently (MAX), the number of jobs currently dispatched (NJOBS), the number running (RUN), the number suspended by the system (SSUSP), and the number suspended by users (USUSP). The RSV field shows the job slots that are currently reserved for future jobs.

A server host is closed if it cannot accept more jobs. In the above example, hostA is only allowed to run one job at a time (MAX=1) and it already has one job running, so its status is closed.
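
The relationship between MAX and NJOBS can be checked mechanically. The following sketch works on the sample bhosts output above (embedded as a string) and computes the remaining job slots per host as MAX minus NJOBS. The function name free_slots is hypothetical. Treating a '-' in the MAX column as "no limit" is an assumption made for this example, and a host may also be closed for reasons other than a full slot count.

```python
# Sample text copied from the bhosts example above.
SAMPLE = """\
HOST_NAME     STATUS   JL/U  MAX   NJOBS   RUN  SSUSP USUSP RSV
hostA         closed     2     1     1      1     0     0     0
hostB         ok         2     4     1      1     0     0     0
hostD         ok         -     8     4      3     0     1     0
"""


def free_slots(text):
    """Return {host: MAX - NJOBS}, or None where MAX is '-' (assumed: no limit)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    slots = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        if row["MAX"] == "-":
            slots[row["HOST_NAME"]] = None
        else:
            slots[row["HOST_NAME"]] = int(row["MAX"]) - int(row["NJOBS"])
    return slots


slots = free_slots(SAMPLE)
print(slots)
```

hostA has no slots left, which matches its closed status in the sample output.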

The -l option gives all information about each JobScheduler server. You can also specify host names on the command line to list the information for specific hosts.

% bhosts -l hostB

HOST: hostB
 STATUS     CPUF  JL/U  MAX  NJOBS  RUN SSUSP USUSP  RSV  DISPATCH_WINDOWS
  ok         3     1     4    1      1     0     0    0    2:00-20:30

CURRENT LOAD USED FOR SCHEDULING:
           r15s   r1m  r15m   ut    pg    io   ls    it    tmp    swp    mem
 Total     0.8   0.4   0.6   18%   0.3   61    3     0     19M    36M    19M
 Reserved  0.0   0.0   0.0    0%   0.0    0    0     0      0M    0M     0M

LOAD THRESHOLD USED FOR SCHEDULING
           r15s   r1m  r15m   ut    pg    io   ls    it    tmp    swp    mem
 loadSched   -    1.2    -     -    34     -    -     -     -      20      -
 loadStop    -     -     -     -     -     -    -     -     -      -       -

The DISPATCH_WINDOWS field is a configuration parameter that your cluster administrator can set so that jobs are dispatched to the host only during the specified time windows.
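
As an illustration of what a dispatch window means, the sketch below tests whether a time of day falls inside a window written in the form shown in the example (2:00-20:30). It is not JobScheduler code. It assumes a single window in hours:minutes-hours:minutes form; any richer syntax that the product may accept is not handled, and the treatment of a window that wraps past midnight is also an assumption. The function names are hypothetical.

```python
from datetime import time


def parse_window(window):
    """Turn a string such as '2:00-20:30' into a (start, end) pair of times."""
    start_text, end_text = window.split("-")

    def to_time(text):
        hours, minutes = text.split(":")
        return time(int(hours), int(minutes))

    return to_time(start_text), to_time(end_text)


def in_window(window, now):
    """Return True if the time of day `now` falls inside the window."""
    start, end = parse_window(window)
    if start <= end:
        return start <= now <= end
    # Assumption: a window such as 20:00-6:00 wraps past midnight.
    return now >= start or now <= end


print(in_window("2:00-20:30", time(9, 15)))   # inside the window
print(in_window("2:00-20:30", time(23, 0)))   # outside the window
```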

The CURRENT LOAD USED FOR SCHEDULING section of the output shows you the load values used by JobScheduler in determining whether additional jobs should be sent to this host. The Total load includes the real load information adjusted by Reserved load. The Reserved load will be non-zero if some jobs are submitted with the resource reservation option.

The LOAD THRESHOLD USED FOR SCHEDULING section of the output shows the load thresholds configured for this host. JobScheduler will not schedule a new job to this host if one or more of its load indices are beyond the loadSched threshold. If the host load is beyond the loadStop threshold, some or all of the jobs running on this host are suspended until the load falls back within the loadStop threshold. A '-' in the threshold values means that no threshold is defined for that load index.
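
The threshold rules can be sketched in a few lines. The example below is a simplified model, not the scheduler's actual logic. It assumes that "beyond" means above the threshold for indices whose lsinfo ORDER is Inc and below the threshold for indices whose ORDER is Dec; the lsinfo example lists it, swp, and mem as Dec, and tmp is assumed here to behave the same way. The function names and the three state labels are hypothetical, and an undefined threshold ('-') is represented by a missing dictionary key.

```python
# Indices that get worse as they fall. it, swp and mem are listed as "Dec" in
# the lsinfo example; tmp is an assumption. All other indices are treated as
# "Inc" (worse as they rise).
DECREASING = {"it", "tmp", "swp", "mem"}


def beyond(index, value, threshold):
    """Return True if `value` is beyond `threshold`; None means no threshold."""
    if threshold is None:
        return False
    if index in DECREASING:
        return value < threshold
    return value > threshold


def host_state(load, load_sched, load_stop):
    """Classify a host as 'suspend', 'no new jobs' or 'ok' from its load."""
    if any(beyond(i, load[i], load_stop.get(i)) for i in load):
        return "suspend"
    if any(beyond(i, load[i], load_sched.get(i)) for i in load):
        return "no new jobs"
    return "ok"


# Values taken from the bhosts -l hostB example (Total row and loadSched row).
load = {"r1m": 0.4, "pg": 0.3, "swp": 36}
load_sched = {"r1m": 1.2, "pg": 34, "swp": 20}
load_stop = {}  # every loadStop entry is '-' in the example

print(host_state(load, load_sched, load_stop))
```

With the values from the example, hostB is within all of its thresholds, so the sketch reports ok.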

Listing User Information

The busers command displays the maximum number of jobs a user or group may execute on a single processor, the maximum number of jobs a user or group may execute in the cluster, the total number of jobs submitted by the user, and the number of jobs in the PEND, RUN, SSUSP and USUSP states. The default is to display information about the user who invokes the command.

% busers all
USER/GROUP       JL/P  MAX  NJOBS  PEND  RUN  SSUSP USUSP RSV
default            1    12     -     -     -     -     -   -
userD              1    12    34    22    10     2     0   0
groupA             -   100    19     7    11     1     1   0
user1              2     -     1     0     0     1     0   0

Note
If the reserved user name all is specified, busers reports all users who currently have jobs in the system, as well as default, which represents a typical user. The purpose of listing default in the output is to show the job limits (JL/P and MAX) of a typical user. No other parameters make sense for default.

Viewing the JobScheduler Queues

Jobs are kept in job queues. JobScheduler runs jobs from a queue when resources are available and user specified conditions are met. An arbitrary number of queues can be configured by your cluster administrators to implement different scheduling policies and job execution constraints. A job queue can be configured to run jobs on all server hosts in the JobScheduler cluster, or to run jobs only on some designated hosts. Queues can be configured for different purposes. For example, a 'DBM' queue could be configured only to run jobs that do database maintenance and an 'orderCenter' queue can be configured to run data-driven order processing jobs.

The bqueues command lists the available JobScheduler queues.

% bqueues
QUEUE_NAME     PRIO NICE    STATUS     MAX  JL/U JL/P NJOBS  PEND  RUN  SUSP
orderCenter     43    0   Open:Active    -    -    -     3     0     3     0
DBM             43   10   Open:Active    -    -    1     5     4     1     0
nightly         30   20   Open:Inactive  -    -    2    23    23     0     0

For each queue defined in the system, the output displays the priority (PRIO), the operating system scheduling priority to be set when the job is started (NICE), the queue status (STATUS), the limit on the number of jobs dispatched at one time (MAX), the limit on the number of jobs dispatched at one time for each user (JL/U), the limit on the number of jobs dispatched to each processor (JL/P), the total number of jobs in the queue (NJOBS; see footnote 1), the number of pending jobs (PEND), the number of running jobs (RUN), and the number of suspended jobs (SUSP).
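
The same column layout lends itself to simple reporting. The sketch below parses the sample bhosts-style table printed by bqueues above (embedded as a string) and lists the queues that have pending jobs, largest backlog first. The function name parse_bqueues is hypothetical, and the sketch assumes that no field contains embedded spaces.

```python
# Sample text copied from the bqueues example above.
SAMPLE = """\
QUEUE_NAME     PRIO NICE    STATUS     MAX  JL/U JL/P NJOBS  PEND  RUN  SUSP
orderCenter     43    0   Open:Active    -    -    -     3     0     3     0
DBM             43   10   Open:Active    -    -    1     5     4     1     0
nightly         30   20   Open:Inactive  -    -    2    23    23     0     0
"""


def parse_bqueues(text):
    """Return one dict per queue line, keyed by the column names in the header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


queues = parse_bqueues(SAMPLE)
# (pending count, queue name) pairs for queues with a backlog, largest first.
backlog = sorted(
    ((int(q["PEND"]), q["QUEUE_NAME"]) for q in queues if int(q["PEND"]) > 0),
    reverse=True,
)
print(backlog)
```

For the sample data this lists nightly (23 pending) ahead of DBM (4 pending), while orderCenter is omitted because it has no pending jobs.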

Detailed Queue Information

The -l option displays the complete status and configuration for each queue. You can specify a queue name on the command line to select specific queues:

% bqueues -l DBM

QUEUE: DBM
  -- For Database maintenance jobs. Only run on hosts that have access to DB.

PARAMETERS/STATISTICS
 PRIO NICE     STATUS      MAX  JL/U  JL/P  NJOBS  PEND  RUN  SSUSP USUSP
  40   10    Open:Active    -    -     1      5      4    1     0     0

SCHEDULING PARAMETERS
           r15s   r1m  r15m   ut    pg    io   ls    it    tmp    swp    mem
 loadSched   -    0.8    -     -    -      -    -     -     -      5M     -
 loadStop    -     -     -    -     -      -    -     -     -      -      -

USERS: all users
HOSTS:all hosts used by lsbatch system
RES_REQ: type==rs6000
ADMINISTRATORS:  user4
PRE_EXEC: su $DBADMIN -c /usr/local/bin/dbinit
POST_EXEC:su $DBADMIN -c /usr/local/bin/dbclose
REQUEUE_EXIT_VALUES:  45

The SCHEDULING PARAMETERS define the load thresholds for scheduling and suspending jobs. JobScheduler only runs jobs on hosts that are within the loadSched threshold. A job that is already running will be suspended if the load of the execution host goes beyond the loadStop threshold. In the above example, a job in the DBM queue will be dispatched to a host only if the host's 1-minute run queue length is less than 0.8 and its free swap space is greater than 5 MB. A job that is already running will never be stopped, because no loadStop threshold has been defined for this queue.

Note
Queue-level SCHEDULING PARAMETERS apply to all hosts in the queue as defined by the HOSTS parameter. If a particular host also has LOAD THRESHOLD USED FOR SCHEDULING defined (see 'JobScheduler Server Host Information'), JobScheduler uses whichever value is more restrictive when scheduling.
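
To make "more restrictive" concrete, the sketch below combines a queue-level and a host-level threshold table. It is an illustration under stated assumptions, not the scheduler's implementation: for indices that get worse as they rise, the lower threshold is taken as the stricter one; for indices that get worse as they fall (it, swp, and mem per the lsinfo example, plus tmp by assumption), the higher threshold is taken. The function names are hypothetical, and both swap thresholds are assumed to be in megabytes.

```python
# Indices that get worse as they fall; tmp is an assumption.
DECREASING = {"it", "tmp", "swp", "mem"}


def stricter(index, queue_value, host_value):
    """Pick the more restrictive of two thresholds; None means undefined."""
    if queue_value is None:
        return host_value
    if host_value is None:
        return queue_value
    if index in DECREASING:
        return max(queue_value, host_value)
    return min(queue_value, host_value)


def effective_thresholds(queue, host):
    """Combine queue-level and host-level thresholds index by index."""
    return {i: stricter(i, queue.get(i), host.get(i))
            for i in set(queue) | set(host)}


# loadSched values from the bqueues -l DBM and bhosts -l hostB examples.
queue_sched = {"r1m": 0.8, "swp": 5}
host_sched = {"r1m": 1.2, "pg": 34, "swp": 20}

combined = effective_thresholds(queue_sched, host_sched)
print(sorted(combined.items()))
```

Under these assumptions the combined table uses the queue's r1m threshold (0.8), the host's swp threshold (20), and the host's pg threshold (34), which the queue does not define.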

The RES_REQ parameter is a queue-level resource requirement. This allows your cluster administrator to specify a common resource requirement for all jobs in this queue. In the previous example, the resource requirement tells JobScheduler only to run jobs on hosts of type rs6000. See 'Resource Requirements' for details about resource requirements.

A queue can also have one or more administrators, who have administrative control over the queue and the jobs in it.

The PRE_EXEC (pre-execution command) and POST_EXEC (post-execution command) parameters allow your cluster administrators to define operations that must be done before or after the execution of a job. If the pre-execution command fails, the job is requeued and retried later. In this example, the pre-execution command initializes the database and the post-execution command cleans up afterwards. The LSF Administrator's Guide gives more details about setting up pre-execution and post-execution commands at the queue level.

The REQUEUE_EXIT_VALUES parameter defines one or more job exit values such that if a job from this queue exits with one of the values, the job will be requeued and automatically retried later. This can be used for robust job processing so that temporary error conditions will not abort the job execution.
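
The requeue rule amounts to a membership test on the job's exit value. The following sketch shows the idea only; the function names are hypothetical, and reading several values as a space-separated list is an assumption, since the example above shows a single value.

```python
def parse_requeue_values(setting):
    """Parse a REQUEUE_EXIT_VALUES string into a set of integer exit values.

    Assumption: multiple values are separated by white space.
    """
    return {int(value) for value in setting.split()}


def should_requeue(exit_value, requeue_values):
    """Return True if a job that exited with `exit_value` should be requeued."""
    return exit_value in requeue_values


values = parse_requeue_values("45")   # the setting from the DBM queue example
print(should_requeue(45, values))     # the job exited with a requeue value
print(should_requeue(1, values))      # any other exit value is not retried
```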

Note that the bqueues output only displays fields that apply to the queue. Any status or configuration field that is not displayed has a default value that does not affect job scheduling or execution. See the LSF Administrator's Guide for the complete features available for queues.

Cluster Monitoring GUI

JobScheduler comes with two graphical interface applications for displaying the system and cluster information previously described.

xlsmon

xlsmon displays cluster host configuration information and real-time load information. It displays host status, load levels, load history, and cluster configuration information.

The xlsmon main window shows an icon for each host in the cluster. Each host is labelled with its status. Hosts change colour as their status changes. Figure 16 shows the xlsmon main window.

Figure 16. xlsmon Main Window

You can choose other displays from the View menu. The Detailed Host Load window displays load levels as bar graphs. You can select which load indices and which hosts are displayed by choosing options from the View menu in the Detailed Host Load window. Figure 17 shows the Detailed Host Load window.

Figure 17. xlsmon Detailed Host Load Window

The History of Host Load window displays the load levels as strip charts, so you can see the load history starting from when the History of Host Load window is first displayed. As with the Detailed Host Load window, you can select hosts and load indices by choosing options from the View menu. Figure 18 shows the History of Host Load window.

Figure 18. xlsmon History of Host Load Window

The Cluster Configuration window, shown in Figure 19, displays the same host information as the lshosts command displays.

Figure 19. xlsmon Cluster Configuration Window

Each xlsmon window has a Help menu item that calls up on-line help. For more information about using xlsmon, see the on-line help.

xlsbatch

xlsbatch displays information about various JobScheduler entities such as queues, hosts, jobs, etc.

The main window of xlsbatch contains three optional areas: job area, host area, and queue area. You can choose to display some areas but not others. Figure 20 shows the main window of xlsbatch.

Figure 20. xlsbatch Main Window

Double-clicking the icon of a host, queue, or job opens a pop-up window that shows more detailed information. Figure 21 shows the pop-up window for queue information.

Figure 21. Queue Information Pop-up Window


1. Includes jobs that have not been dispatched and jobs that have been dispatched but have not finished.



doc@platform.com

Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.