LSF JobScheduler provides a single system image for your cluster, so you can use the whole cluster as if it were a single computer. After you have submitted jobs, you can view their status and manipulate them from any host in the cluster. This chapter demonstrates the job tracking and manipulation tools in JobScheduler.
A submitted job is always in one of a defined set of states, such as PEND, PSUSP, RUN, USUSP, SSUSP, or UNKWN. See 'Job Status' for further information about job states.
Use the bjobs command to view the submitted jobs.
% bjobs
JOBID USER  STAT QUEUE  FROM_HOST EXEC_HOST JOB_NAME  SUBMIT_TIME
6848  user1 PEND sysadm hostA               diskcheck Dec 17 11:52
7142  user1 PEND sysadm hostA               backup    Dec 21 15:45
By default, bjobs displays only the jobs you submitted. Use the -u user_name option to view another user's jobs, or the reserved user name all to view the jobs of all users.
% bjobs -u all
JOBID USER  STAT QUEUE    FROM_HOST EXEC_HOST JOB_NAME  SUBMIT_TIME
6745  user2 RUN  business hostD     hostB     report    Dec 19 09:04
6916  user3 RUN  business hostA     hostD     analyse   Dec 19 09:05
6848  user1 PEND sysadm   hostA               diskcheck Dec 17 11:52
7142  user1 PEND sysadm   hostA               backup    Dec 21 15:45
7157  user4 PEND night    hostA               forecast  Dec 18 10:56
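If you only want a tally of jobs by state rather than the full listing, the bjobs output can be post-processed with standard tools. The following is a minimal sketch, assuming the column layout shown above, where STAT is always the third field (a blank EXEC_HOST for pending jobs only shifts the fields after it). `count_by_status` is a hypothetical helper, not a JobScheduler command. It is meant for plain bjobs output; the -p and -s listings add reason lines that it would miscount.

```sh
#!/bin/sh
# count_by_status: hypothetical helper, not part of JobScheduler.
# Reads bjobs output on stdin (header on line 1) and prints
# "STATUS count" pairs, sorted by status name.
count_by_status() {
  awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}
```

For the listing above, `bjobs -u all | count_by_status` would print `PEND 3` and `RUN 2`.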
If you need to check the detailed attributes of your jobs, use the -l option to display everything about them. You can also specify a job ID to view a single job.
% bjobs -l 7142
Job Id <7142>, Job Name <backup>, User <user1>, Project <default>, Status <PEND>, Queue <sysadm>, Command </var/adm/backup/bin/dumpit>
Sat Dec 21 15:04:34: Submitted from host <hostA>, CWD </var/adm>, Specified Hosts <hostD>, Dependency Condition(calendar(weekly));
PENDING REASONS:
Job dependency condition not satisfied;
SCHEDULING PARAMETERS:
           r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
loadSched   -     -    -     -   -   -   -   -   -    -    -
loadStop    -     -    -     -   -   -   -   -   -    -    -
A job's SCHEDULING PARAMETERS come from the SCHEDULING PARAMETERS of its queue, as described in 'Detailed Queue Information'.
Use the -s option to display only suspended jobs, together with the reason each job was suspended.
% bjobs -s
JOBID USER  STAT  QUEUE   FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1999  user1 PSUSP default hostA               jobA     Dec 10 15:33
 The job was suspended by user or system admin while pending;
Use the -p option to display only pending jobs, including jobs suspended while pending. For each job, bjobs also shows why the job was not dispatched during the last dispatch turn.
% bjobs -p
JOBID USER  STAT  QUEUE   FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
1999  user1 PSUSP default hostA               jobA     Dec 10 15:33
 The job was suspended by user or system admin while pending;
5518  user1 PEND  default hostA               jobB     Dec 14 10:27
 Job dependency condition not satisfied;
8056  user1 PEND  default hostA               jobB     Dec 20 11:41
 Job dependency condition not satisfied;
To see more detail about why your jobs are pending, combine the -p and -l options. In general, you can combine several bjobs options to get more detail from the system.
You may need to know what has happened to your job since it was submitted. The bhist command displays a summary of the pending, suspended and running time of jobs. Use the -l option to print the time information and a complete history of the scheduling events for each run of your job.
% bhist -l
Job Id <7848>, Job Name <diskcheck>, User <user1>, Project <default>, Command <find / -name core -atime +7 -exec rm {} \;>
Tue Dec 17 11:52:13: Submitted from host <hostA> to Queue <default>, CWD <$HOME>, Dependency Condition <calendar(daily)>;
Sat Dec 21 07:00:12: Started on <hostA>, Pid <29027>;
Sat Dec 21 07:00:12: Running with execution home </home/user1>, Execution CWD </home/user1>;
Sat Dec 21 07:00:55: Done successfully. The CPU time used is 12.2 seconds;
Sun Dec 22 07:00:05: Started on <hostA>, Pid <986>;
Sun Dec 22 07:00:05: Running with execution home </home/user1>, Execution CWD </home/user1>;
Sun Dec 22 07:01:18: Done successfully. The CPU time used is 11.9 seconds;
Mon Dec 23 07:00:02: Started on <hostA>, Pid <2892>;
Mon Dec 23 07:00:02: Running with execution home </home/user1>, Execution CWD </home/user1>;
Mon Dec 23 07:01:13: Done successfully. The CPU time used is 10.5 seconds;
Tue Dec 24 07:00:10: Started on <hostA>, Pid <4905>;
Tue Dec 24 07:00:10: Running with execution home </home/user1>, Execution CWD </home/user1>;
Tue Dec 24 07:03:31: Done successfully. The CPU time used is 19.7 seconds;
Tue Dec 24 15:17:14: Delete requested by user or administrator <user1>;
Tue Dec 24 15:17:14: Exited. The CPU time used is 0.0 seconds.
Summary of time in seconds spent in various states by Tue Dec 24 15:17:14 1996
PEND    PSUSP  RUN  USUSP  SSUSP  UNKWN  TOTAL
617057  0      44   0      0      0      617101
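In this history, the TOTAL column equals the time between submission and the summary timestamp, and the per-state columns add up to it. The worked check below uses POSIX shell arithmetic. `secs` is a hypothetical helper that only works here because both timestamps fall in the same month (December 1996).

```sh
#!/bin/sh
# secs DAY HOUR MIN SEC: seconds since the start of the month.
# Hypothetical helper; pass values without leading zeros so that
# values like 08 or 09 are not rejected as invalid octal numbers.
secs() { echo $(( $1 * 86400 + $2 * 3600 + $3 * 60 + $4 )); }

submitted=$(secs 17 11 52 13)   # Tue Dec 17 11:52:13: Submitted
summary=$(secs 24 15 17 14)     # Tue Dec 24 15:17:14: summary time
echo $(( summary - submitted )) # prints 617101, the TOTAL column
echo $(( 617057 + 0 + 44 + 0 + 0 + 0 ))  # PEND+PSUSP+RUN+USUSP+SSUSP+UNKWN: 617101
```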
JobScheduler keeps a job's history after each run completes, so you can review the history of jobs that ran in the past. How far back this history goes depends on how often JobScheduler prunes its event log files; the system backs up and prunes the job history log automatically when necessary.
By default, bhist displays job history only from the current event log file. The -n num_logfiles option tells bhist to search the specified number of log files instead. Files are searched newest first: the current event log file, then the backup files.
% bhist -n 3
This command reads the current event log file and then the two most recent backup files.
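The search order for -n can be illustrated by listing, newest first, the files bhist would read. This is a sketch only: the file names lsb.events, lsb.events.1, lsb.events.2, ... are an assumption borrowed from the usual LSF naming convention (with .1 as the most recent backup), and `event_files_for` is a hypothetical helper. Check your installation for the actual names.

```sh
#!/bin/sh
# event_files_for N: hypothetical helper. Prints, newest first, the files
# "bhist -n N" would search. ASSUMPTION: the current log is lsb.events
# and the backups are lsb.events.1, lsb.events.2, ... (.1 is newest).
event_files_for() {
  echo lsb.events
  i=1
  while [ "$i" -lt "$1" ]; do
    echo "lsb.events.$i"
    i=$((i + 1))
  done
}
```

Under that assumption, `event_files_for 3` prints lsb.events, lsb.events.1 and lsb.events.2, matching the current file plus the two most recent backups.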
The bmodify command changes the options of a submitted job. Specify each option you want to change with its new value, using the same option syntax as the bsub command; the new value overrides the old one.
% bmodify -w "calendar(complex)" 7848
To reset an option to its default value, append n to the option name (for example, -wn resets -w) and do not specify a value.
% bmodify -wn 7848
Modifying option values affects only future scheduling of the job. If a job that does not depend on a calendar has already started, bmodify does not change any of its option values.
For a calendar-dependent job, the -O option makes your changes apply only to the next run of the job.
% bmodify -O -w "calendar(complex)" 7848
After the job has run once with the new values, the previous values are restored for subsequent runs. If the job has already been dispatched, the changes take effect the next time the job is scheduled.
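The one-run behaviour of -O can be modelled as a persistent value plus a one-shot override that is consumed by the next run. The sketch below is a toy model of the rules stated above, not JobScheduler code; `run_once`, `persistent`, `one_shot` and `run_value` are hypothetical names, and the initial calendar(daily) value is taken from the diskcheck job's history.

```sh
#!/bin/sh
# Toy model (not JobScheduler code) of bmodify -O for a calendar job.
# persistent: value set at submission or by a plain bmodify
# one_shot:   value set with bmodify -O, used for the next run only
persistent='calendar(daily)'
one_shot=''

# run_once: sets run_value to the value the next run uses, then drops
# any one-shot override so later runs fall back to the persistent value.
run_once() {
  if [ -n "$one_shot" ]; then
    run_value=$one_shot
    one_shot=''
  else
    run_value=$persistent
  fi
}

one_shot='calendar(complex)'        # like: bmodify -O -w "calendar(complex)" 7848
run_once; echo "run 1: $run_value"  # run 1: calendar(complex)
run_once; echo "run 2: $run_value"  # run 2: calendar(daily)
```

A plain bmodify without -O would correspond to changing `persistent` instead, so every later run would see the new value.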
Note
All options specified at submission time may be changed except for the job command line and the environment variables.
You can also use bmodify with the -q option to move a job to a different queue.
% bmodify -q resubmit 7848
Alternatively, the bswitch command switches one or more unfinished jobs from one queue to another.
% bswitch -J diskcheck resubmit
Job <7848> is switched to queue <resubmit>
Use the bdel command to remove a calendar-driven job from the system. If the job is currently running, bdel kills the process before removing the job.
% bdel 3456
Job <3456> is being deleted
You can specify a job by name using the -J option.
% bdel -J jobA
Job <3457> is being deleted
To terminate a job that depends on another job, on a file, or on an external event, use the bkill command.
% bkill 3467
Job <3467> is being terminated
You can use bkill to send an arbitrary signal to your job using the -s option. You can specify either the signal name or the signal number. On most versions of UNIX, signal names and numbers are listed in the kill(1) or signal(2) manual page.
% bkill -s SIGTSTP 3488
Job <3488> is being signalled
This example sends the SIGTSTP signal (terminal stop) to the job.
Note
Different operating systems number signals differently, so JobScheduler translates signal numbers across platforms. A signal number is interpreted according to the host from which the bkill command is issued. For example, signal 24 sent from a Sun Solaris host means SIGTSTP. If the job is running on an HP-UX server, where SIGTSTP is signal number 25, signal 25 is sent to the job.
Using bkill on a calendar-driven job that has started kills the current run and requeues the job. If the job is not currently running, bkill has no effect. To remove the job permanently, use bdel.
You can only delete or kill your own jobs. Only the JobScheduler administrators can operate on jobs submitted by other users.
You can limit the number of times a job runs. Submit the job, then run bdel with the -n num_runs option. After the job has run the specified number of times, it is deleted from the system.
% bsub -w "calendar(daily)" -J jobA command
Job <8087> is submitted to default queue <default>.
% bdel -n 5 -J jobA
Job <8087> will be deleted after running next 5 times
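The two steps can be combined in a script by capturing the job ID from the bsub confirmation line. This is a minimal sketch: `parse_jobid` and `submit_limited` are hypothetical helpers, and the wrapper assumes that bsub prints its confirmation on standard output in the format shown above and that bdel accepts a job ID together with -n. Verify both against your installation before relying on it.

```sh
#!/bin/sh
# parse_jobid: hypothetical helper. Extracts the job ID from a line like
#   Job <8087> is submitted to default queue <default>.
parse_jobid() {
  sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# submit_limited RUNS BSUB_ARGS...: hypothetical wrapper. Submits a job,
# then asks bdel to delete it after RUNS more runs.
# ASSUMPTION: bdel accepts "-n RUNS JOBID" as well as "-n RUNS -J NAME".
submit_limited() {
  runs=$1
  shift
  jobid=$(bsub "$@" | parse_jobid)
  [ -n "$jobid" ] || return 1
  bdel -n "$runs" "$jobid"
}
```

Usage, mirroring the example above: `submit_limited 5 -w "calendar(daily)" -J jobA command`.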
The bstop and bresume commands are convenient aliases for bkill -s, sending the SIGSTOP/SIGTSTP and SIGCONT signals respectively.
You cannot send arbitrary signals to a pending job; most signals are only valid for running jobs. However, you can send kill, suspend and resume signals to pending jobs.
% bstop -J diskcheck
Job <7848> is being stopped
bstop sends the SIGSTOP signal to sequential jobs and SIGTSTP to parallel jobs. SIGTSTP is sent to a parallel job so the master process can trap the signal and pass it to all the slave processes running on other hosts. Suspending causes your job to go into USUSP state if it has already started, or to go into PSUSP state if it is pending.
% bresume -J diskcheck
Job <7848> is being resumed
Resuming a user-suspended job does not put it directly into the RUN state. bresume first moves the job into the SSUSP state; the job is then scheduled normally and runs only after it satisfies its dependency conditions.
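The suspend and resume transitions described for bstop and bresume can be summarized as a small state table. The sketch below is a toy model, not JobScheduler code; `next_state` is a hypothetical helper, and it encodes only the transitions stated in this section, reporting anything else as unknown rather than guessing.

```sh
#!/bin/sh
# next_state STATE COMMAND: toy model of the bstop/bresume transitions
# described in the text. Unlisted combinations print "unknown".
next_state() {
  case "$1:$2" in
    PEND:bstop)    echo PSUSP ;;   # suspended while still pending
    RUN:bstop)     echo USUSP ;;   # suspended after the job started
    USUSP:bresume) echo SSUSP ;;   # resume goes via SSUSP, not straight to RUN
    *)             echo unknown ;;
  esac
}
```

For example, `next_state USUSP bresume` prints SSUSP; from there the scheduler, not bresume, decides when the job returns to RUN.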
Note
Sending arbitrary signals to a job running on a Windows NT machine is not supported. You can only use the bstop and bresume commands on a job running on Windows NT.
After you have submitted a number of jobs all assigned the same job_name (see 'Grouping Related Jobs'), you can use that job_name to refer to the jobs as a group.
% bjobs -J job_group
JOBID USER  STAT QUEUE   FROM_HOST EXEC_HOST JOB_NAME  SUBMIT_TIME
8315  user1 PEND default hostA               job_group Dec 24 15:17
8316  user1 PEND default hostA               job_group Dec 24 15:18
8318  user1 RUN  default hostA               job_group Dec 24 15:22
To switch all jobs in a group to another queue:
% bswitch -J job_group normal 0
Job <8315> is switched to queue <normal>
Job <8316> is switched to queue <normal>
Job <8318> is switched to queue <normal>
To suspend all jobs in a group:
% bstop -J job_group 0
Job <8315> is being stopped
Job <8316> is being stopped
Job <8318> is being stopped
To remove all jobs in a group:
% bdel -J job_group 0
Job <8315> is being deleted
Job <8316> is being deleted
Job <8318> is being deleted
To show the history of all jobs in a group:
% bhist -J job_group 0
Summary of time in seconds spent in various states:
JOBID USER  JOB_NAME PEND PSUSP RUN USUSP SSUSP UNKWN TOTAL
8315  user1 *b_group 17   49    39  81    0     0     186
8316  user1 *b_group 84   50    0   0     0     0     134
8318  user1 *b_group 18   50    43  19    0     0     130
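In each row of this summary, the six state columns add up to TOTAL (for job 8315, 17 + 49 + 39 + 81 = 186). A minimal sketch that checks this for every row, assuming the multi-job layout shown above (JOBID, USER, JOB_NAME, six state columns, TOTAL); `check_totals` is a hypothetical helper. Rows are recognized by a numeric first field, which skips the title and header lines.

```sh
#!/bin/sh
# check_totals: hypothetical helper. Reads a bhist multi-job summary on
# stdin and prints "JOBID sum TOTAL ok|MISMATCH" for each job row.
check_totals() {
  awk '$1 ~ /^[0-9]+$/ {
         sum = $4 + $5 + $6 + $7 + $8 + $9
         status = (sum == $10) ? "ok" : "MISMATCH"
         print $1, sum, $10, status
       }'
}
```

For the output above, every row reports ok.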
Note
For the bjobs command, all jobs with the given name are considered. For the other commands, only the most recently submitted job with that name is considered by default, unless you specify the special job ID 0. The bmodify command operates on only a single job ID.
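Because bmodify accepts only a single job ID, changing every job in a group takes a loop over the IDs that bjobs -J reports. A minimal sketch, assuming the bjobs output layout shown earlier in this section; `job_ids` and `modify_group` are hypothetical helpers, not JobScheduler commands. Lines whose first field is not numeric (the header, or reason lines) are skipped.

```sh
#!/bin/sh
# job_ids: hypothetical helper. Prints the JOBID of each job line in
# bjobs output read from stdin; header and reason lines are skipped.
job_ids() {
  awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# modify_group NAME BMODIFY_OPTIONS...: hypothetical wrapper that applies
# the same bmodify change to every job named NAME, one job ID at a time.
modify_group() {
  name=$1
  shift
  for id in $(bjobs -J "$name" | job_ids); do
    bmodify "$@" "$id"
  done
}
```

For example, `modify_group job_group -q normal` would run `bmodify -q normal` once for each of jobs 8315, 8316 and 8318.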
You may prefer to use the JobScheduler GUI to manage your jobs. xlsbatch displays all JobScheduler entities, such as jobs, queues, and hosts, and lets you manipulate jobs directly from the GUI. Figure 24 shows the xlsbatch job information window.
By selecting jobs in the job window, you can directly perform the job manipulations discussed in this chapter. For example, if you select a job and click the 'Detail' button, a pop-up window opens showing all details of the selected job, as shown in Figure 25.
Figure 26 shows the detailed job history window that opens when you click the 'History' button.
Detailed usage of xlsbatch is described in the online help.
To modify a job, either click the 'Modify' button or run the xbmodify GUI directly from the command line. A job modification window pops up, as shown in Figure 27.
Clicking the 'Job ID' button opens another pop-up window listing all job IDs for you to select from, as shown in Figure 28.
Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.