The JobScheduler system provides several commands for monitoring the system and for obtaining information about available resources.
The lsclusters command lists JobScheduler cluster and resource summary information. The -l option displays all information available about the cluster and its configuration:
% lsclusters -l
CLUSTER_NAME   STATUS   MASTER_HOST   ADMIN   HOSTS   SERVERS
production     ok       hostG         JSadm   19      17

LSF administrators: JSadm user8 user2
Available resources:hpux solaris aix nt cserver fserver
Available host types:hppa sgi sparc rs6k intel
Available host models:hp735 hp715 PENTIUM120 IBM350 SunSparc r10000
Accept jobs from this cluster: yes
Send jobs to this cluster: yes
The first name listed in the LSF administrators line is the primary cluster administrator, who can change the cluster configuration. SERVERS is the total number of server hosts in the cluster, while HOSTS is the total number of hosts, including both servers and client-only hosts. The last two lines of the output are relevant only in a multi-cluster environment.
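Because HOSTS counts servers and client-only hosts together, the number of client-only hosts can be derived from the summary line. The following is a minimal sketch, assuming the two summary lines have been captured as text; the function name parse_summary is hypothetical and not part of JobScheduler.

```python
# Sample text copied from the lsclusters example above.
SUMMARY = """\
CLUSTER_NAME STATUS MASTER_HOST ADMIN HOSTS SERVERS
production ok hostG JSadm 19 17
"""


def parse_summary(text):
    """Pair each column name in the header line with its value (hypothetical helper)."""
    header, row = [line.split() for line in text.strip().splitlines()]
    return dict(zip(header, row))


info = parse_summary(SUMMARY)
# HOSTS includes servers, so the difference is the client-only hosts.
client_only = int(info["HOSTS"]) - int(info["SERVERS"])
print(info["CLUSTER_NAME"], "client-only hosts:", client_only)
```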
The commands described below provide more detailed resource and host information.
A cluster may consist of hosts of different architectures and capabilities. Your jobs may depend on specific features being available. Use the lshosts command to display static information about hosts.
% lshosts
HOST_NAME  type    model   cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostA      hppa    HP715   4     1      64M     128M    Yes     (cs hpux)
hostB      sunsol  sparc   3     1      96M     128M    Yes     (sun solaris)
hostD      SGI     R10000  10    8      512M    1024M   Yes     (sgi irix6)
The output displays the CPU architecture (type), the model (model), the relative performance factor (cpuf), the number of processors (ncpus), the maximum amount of physical memory available for user processes (maxmem), the total swap space available (maxswp), whether the host is a JobScheduler server or a client (server), and the resources defined for each host (RESOURCES).
To view a particular machine, you can specify a host name on the command line.
% lshosts hostD
HOST_NAME  type  model   cpuf  ncpus  maxmem  maxswp  server  RESOURCES
hostD      SGI   R10000  10    8      512M    1024M   Yes     (sgi irix6)
The parameters displayed by lshosts are either defined by the JobScheduler administrator in the configuration files, or determined directly from the system.
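If you want to select hosts by their static attributes in a script, the lshosts output can be read as a whitespace-separated table. The following is a minimal sketch, assuming the output has been captured as text and has the column layout of the example above; parse_lshosts is a hypothetical helper, not a JobScheduler interface.

```python
# Sample text copied from the lshosts example above.
LSHOSTS_OUTPUT = """\
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
hostA hppa HP715 4 1 64M 128M Yes (cs hpux)
hostB sunsol sparc 3 1 96M 128M Yes (sun solaris)
hostD SGI R10000 10 8 512M 1024M Yes (sgi irix6)
"""


def parse_lshosts(text):
    """Return one dict per host (hypothetical helper)."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    hosts = []
    for line in lines[1:]:
        # RESOURCES contains spaces, so limit the split to keep it in one field.
        fields = line.split(None, len(header) - 1)
        record = dict(zip(header, fields))
        record["RESOURCES"] = record["RESOURCES"].strip("()").split()
        hosts.append(record)
    return hosts


hosts = parse_lshosts(LSHOSTS_OUTPUT)
multi_cpu = [h["HOST_NAME"] for h in hosts if int(h["ncpus"]) > 1]
solaris_hosts = [h["HOST_NAME"] for h in hosts if "solaris" in h["RESOURCES"]]
print(multi_cpu, solaris_hosts)
```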
The Load Information Manager (LIM) in the JobScheduler system monitors the dynamic load of every server host in the cluster. JobScheduler uses this information to make job scheduling decisions that balance the load. Use the lsload command to display the current load information.
% lsload
HOST_NAME  status  r15s  r1m   r15m  ut   pg   ls  it  tmp  swp   mem
hostB      ok      0.1   0.0   0.1   2%   0.0  5   9   24M  52M   45M
hostD      ok      2.2   2.6   1.6   89%  14   4   0   45M  189M  33M
hostA      busy    1.4   *2.1  1.5   99%  0.8  5   33  12M  24M   23M
The first line lists all the load index names. The load indices are host status (status), 15-second run queue length (r15s), 1-minute run queue length (r1m), 15-minute run queue length (r15m), CPU utilization (ut), paging activity (pg), logins (ls), idle time (it), available space in temporary file system (tmp), available swap space (swp), and available memory (mem).
A host is busy if any load index is beyond its configured threshold. When a load index is beyond its threshold, it is printed with an asterisk. In the example, host hostA is busy because load index r1m is too high.
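The asterisk convention makes it straightforward to find out which load index caused a host to be busy. The following is a minimal sketch, assuming the lsload output has been captured as text; the function name exceeded_indices is hypothetical.

```python
# Sample text copied from the lsload example above.
LSLOAD_OUTPUT = """\
HOST_NAME status r15s r1m r15m ut pg ls it tmp swp mem
hostB ok 0.1 0.0 0.1 2% 0.0 5 9 24M 52M 45M
hostD ok 2.2 2.6 1.6 89% 14 4 0 45M 189M 33M
hostA busy 1.4 *2.1 1.5 99% 0.8 5 33 12M 24M 23M
"""


def exceeded_indices(text):
    """Map each host to the load indices printed with a leading asterisk."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    result = {}
    for line in lines[1:]:
        record = dict(zip(header, line.split()))
        flagged = [name for name, value in record.items() if value.startswith("*")]
        if flagged:
            result[record["HOST_NAME"]] = flagged
    return result


over_threshold = exceeded_indices(LSLOAD_OUTPUT)
print(over_threshold)
```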
The JobScheduler system collects 11 built-in load indices and an arbitrary number of external load indices. External load indices are configured by the site and can reflect any site-specific load information.
The -l option displays the values of all load indices, including external load indices. You can also specify host names on the command line to display the load of specific hosts.
% lsload -l hostA
HOST_NAME  status  r15s  r1m  r15m  ut   pg   io   ls  it  tmp  swp  mem  nio
hostA      lockW   0.2   0.1  0.1   11%  0.0  228  7   0   11M  52M  25M  3.5
In this example, nio is an external load index defined by the JobScheduler administrator.
The lsmon command provides a continuously updating display of load information. An example display from lsmon is shown below. You can specify the resource requirements, refresh interval, and other parameters interactively or on the command line. See the lsmon(1) manual page for more information.
Hostname: hostE                  Refresh rate: 10 secs
HOST_NAME  status  r15s  r1m   r15m  ut   pg   ls  it  swp   mem   tmp
hostA      ok      0.4   0.5   0.7   5%   0.1  28  0   538M  421M  12M
hostE      ok      0.0   0.1   0.1   8%   9.9  8   0   27M   51M   253M
hostB      ok      0.0   0.0   0.0   5%   0.0  2   0   47M   0M    45M
hostF      busy    3.2   *3.1  2.4   96%  0.9  20  0   129M  2M    29M
The lsinfo command lists all valid resource names in the cluster. The resource names are either built-in resource names maintained by the Load Information Manager (LIM), or defined by cluster administrators through configuration. The output from the lsinfo command is usually quite long; the following example has been edited.
% lsinfo
RESOURCE_NAME  TYPE     ORDER  DESCRIPTION
r15s           Numeric  Inc    15-second CPU run queue length
r1m            Numeric  Inc    1-minute CPU run queue length (alias: cpu)
r15m           Numeric  Inc    15-minute CPU run queue length
ut             Numeric  Inc    1-minute CPU utilization (0.0 to 1.0)
pg             Numeric  Inc    Paging rate (pages/second)
ls             Numeric  Inc    Number of login sessions (alias: login)
it             Numeric  Dec    Idle time (minutes) (alias: idle)
swp            Numeric  Dec    Available swap space (Mbytes) (alias: swap)
mem            Numeric  Dec    Available memory (Mbytes)
ncpus          Numeric  Dec    Number of CPUs
ndisks         Numeric  Dec    Number of local disks
maxmem         Numeric  Dec    Maximum memory (Mbytes)
maxswp         Numeric  Dec    Maximum swap space (Mbytes)
cpuf           Numeric  Dec    CPU factor
hppa           Boolean  N/A    HPPA architecture
solaris        Boolean  N/A    SunSolaris
cs             Boolean  N/A    Compute Server
fs             Boolean  N/A    File server
type           String   N/A    Host type
model          String   N/A    Host model
hname          String   N/A    Host name

TYPE_NAME
SGI
hppa
sunsol
intel

MODEL_NAME  CPU_FACTOR
R10000      10
HP715       3
sparc       5
pentium120  2
You will frequently find it convenient to use resource requirements to restrict the selection of hosts to run your jobs. The output of lsinfo tells you the whole dictionary of valid resource names that you can use in your resource requirements. For details of using resource requirements, see 'Resource Requirements'.
The bhosts command displays information about the JobScheduler server hosts that have been configured to run production jobs. It is possible to configure your cluster such that only some server hosts in the JobScheduler cluster run jobs.
% bhosts
HOST_NAME  STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
hostA      closed  2     1    1      1    0      0      0
hostB      ok      2     4    1      1    0      0      0
hostD      ok      -     8    4      3    0      1      0
The command displays the status of the host (STATUS), the job limit per user (JL/U), the maximum number of job slots on the host for running jobs concurrently (MAX), the number of jobs currently dispatched (NJOBS), the number running (RUN), the number suspended by the system (SSUSP), and the number suspended by users (USUSP). The RSV field shows the job slots that are currently reserved for future jobs.
A server host is closed if it cannot accept more jobs. In the above example, hostA is only allowed to run one job at a time (MAX=1) and it already has one job running, so its status is closed.
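The slot check in this example can be sketched as a comparison of NJOBS against MAX. The following is a minimal sketch, assuming the bhosts output has been captured as text and that a "-" in the MAX column means no limit is configured (an assumption; this section defines "-" only for load thresholds). The helper names are hypothetical, and the sketch covers only the full-slots case; a host may also refuse jobs for other reasons, such as being outside its dispatch windows.

```python
# Sample text copied from the bhosts example above.
BHOSTS_OUTPUT = """\
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
hostA closed 2 1 1 1 0 0 0
hostB ok 2 4 1 1 0 0 0
hostD ok - 8 4 3 0 1 0
"""


def parse_table(text):
    """Turn a whitespace-separated table with a header line into dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def slots_full(host):
    """True if the host has as many dispatched jobs as job slots."""
    if host["MAX"] == "-":  # assumed to mean "no limit configured"
        return False
    return int(host["NJOBS"]) >= int(host["MAX"])


full_hosts = [h["HOST_NAME"] for h in parse_table(BHOSTS_OUTPUT) if slots_full(h)]
print(full_hosts)
```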
The -l option gives all information about each JobScheduler server. You can also specify host names on the command line to list the information for specific hosts.
% bhosts -l hostB
HOST: hostB
STATUS  CPUF  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV  DISPATCH_WINDOWS
ok      3     1     4    1      1    0      0      0    2:00-20:30

CURRENT LOAD USED FOR SCHEDULING:
           r15s  r1m  r15m  ut   pg   io  ls  it  tmp  swp  mem
Total      0.8   0.4  0.6   18%  0.3  61  3   0   19M  36M  19M
Reserved   0.0   0.0  0.0   0%   0.0  0   0   0   0M   0M   0M

LOAD THRESHOLD USED FOR SCHEDULING
           r15s  r1m  r15m  ut   pg   io  ls  it  tmp  swp  mem
loadSched  -     1.2  -     -    34   -   -   -   -    20   -
loadStop   -     -    -     -    -    -   -   -   -    -    -
The DISPATCH_WINDOWS field is a configuration parameter that your cluster administrator can set so that jobs are sent to the host only during the specified time windows.
The CURRENT LOAD USED FOR SCHEDULING section of the output shows you the load values used by JobScheduler in determining whether additional jobs should be sent to this host. The Total load includes the real load information adjusted by Reserved load. The Reserved load will be non-zero if some jobs are submitted with the resource reservation option.
The LOAD THRESHOLD USED FOR SCHEDULING section of the output shows the configured load thresholds for this host. JobScheduler will not schedule a new job to this host if one or more of its load indices are beyond the loadSched threshold. If the host load is beyond the loadStop threshold, some or all of the jobs running on this host are suspended until the load falls back within the loadStop threshold. A '-' in the threshold values means that no threshold is defined for that load index.
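The two thresholds can be read as a small decision rule. The following is a minimal sketch of that rule, not the scheduler's actual implementation. It assumes that "beyond" means higher for indices whose ORDER is Inc in the lsinfo output and lower for indices whose ORDER is Dec, that swap values and thresholds are both in megabytes, and that a value exactly equal to a threshold is still acceptable. All function and variable names are hypothetical.

```python
# Indices whose value falls as load rises (ORDER "Dec" in the lsinfo example).
# Every other index is treated here as rising with load (ORDER "Inc").
DECREASING = {"it", "swp", "mem"}


def beyond(index, value, threshold):
    """True if value is on the overloaded side of threshold."""
    if index in DECREASING:
        return value < threshold
    return value > threshold


def scheduling_decision(load, load_sched, load_stop):
    """Classify a host; undefined ('-') thresholds are simply left out of the dicts."""
    if any(beyond(i, load[i], t) for i, t in load_stop.items()):
        return "suspend"   # some or all running jobs would be suspended
    if any(beyond(i, load[i], t) for i, t in load_sched.items()):
        return "hold"      # no new jobs are sent to the host
    return "dispatch"      # the host is eligible for new jobs


# Values taken from the bhosts -l hostB example above.
host_b_load = {"r1m": 0.4, "pg": 0.3, "swp": 36}
host_b_sched = {"r1m": 1.2, "pg": 34, "swp": 20}
host_b_stop = {}
decision = scheduling_decision(host_b_load, host_b_sched, host_b_stop)
print(decision)
```

With the values from the example, hostB is within every loadSched threshold, so the sketch classifies it as eligible for new jobs.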
The busers command displays the maximum number of jobs a user or group may execute on a single processor, the maximum number of jobs a user or group may execute in the cluster, the total number of jobs submitted by the user, and the number of jobs in the PEND, RUN, SSUSP, and USUSP states. By default, busers displays information about the user who runs the command.
% busers all
USER/GROUP  JL/P  MAX  NJOBS  PEND  RUN  SSUSP  USUSP  RSV
default     1     12   -      -     -    -      -      -
userD       1     12   34     22    10   2      0      0
groupA      -     100  19     7     11   1      1      0
user1       2     -    1      0     0    1      0      0
Note
If the reserved user name all is specified, busers reports all users who currently have jobs in the system, as well as default, which represents a typical user. The purpose of listing default in the output is to show the job limits (JL/P and MAX) of a typical user. No other parameters make sense for default.
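When processing busers output in a script, the default row should be skipped because only its limits are meaningful. The following is a minimal sketch, assuming the output of busers all has been captured as text; the function name pending_by_user is hypothetical.

```python
# Sample text copied from the busers example above.
BUSERS_OUTPUT = """\
USER/GROUP JL/P MAX NJOBS PEND RUN SSUSP USUSP RSV
default 1 12 - - - - - -
userD 1 12 34 22 10 2 0 0
groupA - 100 19 7 11 1 1 0
user1 2 - 1 0 0 1 0 0
"""


def pending_by_user(text):
    """Return the PEND count per user or group, ignoring the 'default' row."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    pending = {}
    for line in lines[1:]:
        record = dict(zip(header, line.split()))
        if record["USER/GROUP"] == "default":
            continue  # only JL/P and MAX make sense for default
        pending[record["USER/GROUP"]] = int(record["PEND"])
    return pending


pending = pending_by_user(BUSERS_OUTPUT)
print(pending)
```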
Jobs are kept in job queues. JobScheduler runs jobs from a queue when resources are available and user specified conditions are met. An arbitrary number of queues can be configured by your cluster administrators to implement different scheduling policies and job execution constraints. A job queue can be configured to run jobs on all server hosts in the JobScheduler cluster, or to run jobs only on some designated hosts. Queues can be configured for different purposes. For example, a 'DBM' queue could be configured only to run jobs that do database maintenance and an 'orderCenter' queue can be configured to run data-driven order processing jobs.
The bqueues command lists the available JobScheduler queues.
% bqueues
QUEUE_NAME   PRIO  NICE  STATUS         MAX  JL/U  JL/P  NJOBS  PEND  RUN  SUSP
orderCenter  43    0     Open:Active    -    -     -     3      0     3    0
DBM          43    10    Open:Active    -    -     1     5      4     1    0
nightly      30    20    Open:Inactive  -    -     2     23     23    0    0
For each queue defined in the system, the output displays the priority (PRIO), the operating system scheduling priority to be set when the job is started (NICE), the queue status (STATUS), the limit on the number of jobs dispatched at one time (MAX), the limit on the number of jobs dispatched at one time for each user (JL/U), the limit on the number of jobs dispatched to each processor (JL/P), the total number of jobs in the queue (NJOBS), the number of pending jobs (PEND), the number of running jobs (RUN), and the number of suspended jobs (SUSP).
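The job counters in the bqueues output can be cross-checked and filtered in a script. The following is a minimal sketch, assuming the output has been captured as text. Reading the STATUS value as two parts separated by a colon, where the second part says whether the queue is currently dispatching jobs, is an assumption that this section does not state; the helper name parse_table is hypothetical.

```python
# Sample text copied from the bqueues example above.
BQUEUES_OUTPUT = """\
QUEUE_NAME PRIO NICE STATUS MAX JL/U JL/P NJOBS PEND RUN SUSP
orderCenter 43 0 Open:Active - - - 3 0 3 0
DBM 43 10 Open:Active - - 1 5 4 1 0
nightly 30 20 Open:Inactive - - 2 23 23 0 0
"""


def parse_table(text):
    """Turn a whitespace-separated table with a header line into dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


queues = parse_table(BQUEUES_OUTPUT)

# In this sample, NJOBS equals PEND + RUN + SUSP for every queue.
counts_consistent = all(
    int(q["NJOBS"]) == int(q["PEND"]) + int(q["RUN"]) + int(q["SUSP"])
    for q in queues
)

# Queues that hold pending jobs but are not currently active (assumed meaning).
inactive_with_pending = [
    q["QUEUE_NAME"]
    for q in queues
    if q["STATUS"].split(":")[1] == "Inactive" and int(q["PEND"]) > 0
]
print(counts_consistent, inactive_with_pending)
```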
The -l option displays the complete status and configuration for each queue. You can specify a queue name on the command line to select specific queues:
% bqueues -l DBM
QUEUE: DBM
  -- For Database maintenance jobs. Only run on hosts that have access to DB.

PARAMETERS/STATISTICS
PRIO  NICE  STATUS       MAX  JL/U  JL/P  NJOBS  PEND  RUN  SSUSP  USUSP
40    10    Open:Active  -    -     1     5      4     1    0      0

SCHEDULING PARAMETERS
           r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
loadSched  -     0.8  -     -   -   -   -   -   -    5M   -
loadStop   -     -    -     -   -   -   -   -   -    -    -

USERS: all users
HOSTS:all hosts used by lsbatch system
RES_REQ: type==rs6000
ADMINISTRATORS: user4
PRE_EXEC: su $DBADMIN -c /usr/local/bin/dbinit
POST_EXEC:su $DBADMIN -c /usr/local/bin/dbclose
REQUEUE_EXIT_VALUES: 45
The SCHEDULING PARAMETERS define the job scheduling and/or suspending load thresholds. JobScheduler only runs jobs on hosts that are within the loadSched threshold. A job that is already running will be suspended if the load of the execution host has gone beyond the loadStop threshold. In the above example, a job in DBM queue will be dispatched to a host only if the host's 1-minute run queue length is less than 0.8, and the free swap space is greater than 5MB. An already running job will never be stopped because no loadStop threshold has been defined for this queue.
Note
Queue-level SCHEDULING PARAMETERS apply to all hosts in the queue as defined by the HOSTS parameter. If a particular host also has LOAD THRESHOLD USED FOR SCHEDULING defined (see 'JobScheduler Server Host Information'), JobScheduler uses whichever value is more restrictive when scheduling.
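Choosing the more restrictive value depends on the direction of each load index. The following is a minimal sketch, assuming that the more restrictive threshold is the lower one for indices that rise with load and the higher one for indices that fall with load (ORDER Dec in the lsinfo output), and that all swap values are in megabytes. The function name effective_thresholds is hypothetical, and the sample values combine the DBM queue thresholds with the hostB thresholds shown earlier.

```python
# Indices whose value falls as load rises (ORDER "Dec" in the lsinfo example).
DECREASING = {"it", "swp", "mem"}


def effective_thresholds(queue, host):
    """Combine queue-level and host-level thresholds, keeping the stricter one."""
    combined = {}
    for index in set(queue) | set(host):
        values = [t[index] for t in (queue, host) if index in t]
        # For "Dec" indices a higher threshold is stricter; otherwise a lower one is.
        combined[index] = max(values) if index in DECREASING else min(values)
    return combined


dbm_load_sched = {"r1m": 0.8, "swp": 5}                # from bqueues -l DBM
host_b_load_sched = {"r1m": 1.2, "pg": 34, "swp": 20}  # from bhosts -l hostB
effective = effective_thresholds(dbm_load_sched, host_b_load_sched)
print(sorted(effective.items()))
```

Under these assumptions, a DBM job would be sent to hostB only if the 1-minute run queue length is within 0.8 (the queue value) and at least 20 MB of swap space is free (the host value).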
The RES_REQ parameter is a queue-level resource requirement. This allows your cluster administrator to specify a common resource requirement for all jobs in this queue. In the previous example, the resource requirement tells JobScheduler only to run jobs on hosts of type rs6000. See 'Resource Requirements' for details about resource requirements.
A queue can also have one or more administrators, who have administrative control over the queue and the jobs in it.
The PRE_EXEC (pre-execution command) and POST_EXEC (post-execution command) parameters allow your cluster administrators to define operations that must be performed before or after the execution of a job. If the pre-execution command fails, the job is requeued and retried later. In this example, the pre-execution command initializes the database and the post-execution command cleans up afterwards. The LSF Administrator's Guide describes in more detail how to set up pre-execution and post-execution commands at the queue level.
The REQUEUE_EXIT_VALUES parameter defines one or more job exit values such that if a job from this queue exits with one of the values, the job will be requeued and automatically retried later. This can be used for robust job processing so that temporary error conditions will not abort the job execution.
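The requeue rule amounts to a membership test on the job's exit value. The following is a minimal sketch, assuming the KEY: value lines from bqueues -l have been captured as text and that multiple exit values are separated by spaces (an assumption; the example shows only a single value). The helper names are hypothetical.

```python
# A few settings lines copied from the bqueues -l DBM example above.
QUEUE_SETTINGS = """\
RES_REQ: type==rs6000
ADMINISTRATORS: user4
REQUEUE_EXIT_VALUES: 45
"""


def parse_settings(text):
    """Split each 'KEY: value' line at the first colon."""
    settings = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def should_requeue(exit_value, settings):
    """True if the job's exit value is one of the queue's requeue exit values."""
    raw = settings.get("REQUEUE_EXIT_VALUES", "")
    return exit_value in {int(v) for v in raw.split()}


dbm = parse_settings(QUEUE_SETTINGS)
print(should_requeue(45, dbm), should_requeue(1, dbm))
```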
Note that the bqueues output only displays fields that apply to the queue. Any status or configuration field that is not displayed has a default value that does not affect job scheduling or execution. See the LSF Administrator's Guide for the complete features available for queues.
JobScheduler comes with two graphical interface applications for displaying the system and cluster information previously described.
xlsmon displays cluster host configuration information and real-time load information. It displays host status, load levels, load history, and cluster configuration information.
The xlsmon main window shows an icon for each host in the cluster. Each host is labelled with its status. Hosts change colour as their status changes. Figure 16 shows the xlsmon main window.
You can choose other displays from the View menu. The Detailed Host Load window displays load levels as bar graphs. You can select which load indices and which hosts are displayed by choosing options from the View menu in the Detailed Host Load window. Figure 17 shows the Detailed Host Load window.
The History of Host Load window displays the load levels as strip charts, so you can see the load history starting from when the History of Host Load window is first displayed. As with the Detailed Host Load window, you can select hosts and load indices by choosing options from the View menu. Figure 18 shows the History of Host Load window.
The Cluster Configuration window, shown in Figure 19, displays the same host information as the lshosts command displays.
Each xlsmon window has a Help menu item that calls up on-line help. For more information about using xlsmon, see the on-line help.
xlsbatch displays information about JobScheduler entities such as queues, hosts, and jobs.
The main window of xlsbatch contains three optional areas: job area, host area, and queue area. You can choose to display some areas but not others. Figure 20 shows the main window of xlsbatch.
Double-clicking the icon of a host, queue, or job opens a pop-up window that shows more detailed information. Figure 21 shows the pop-up window for queue information.
Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.