Appendix E. LSF on Windows NT
This appendix describes how to run LSF on Windows
NT. It is assumed that you are already familiar with LSF concepts.
Requirements
The following are requirements for running LSF
on Windows NT:
- Windows NT 3.51/4.0 Workstation or Server running
on systems with Intel Pentium 100MHz or higher speed processors
- All the machines in one cluster belong to a single
NT domain
- Users must use domain accounts (as opposed to local
accounts) when interacting with LSF
- All directories that LSF uses, as well as users'
home directories, must be on NTFS partitions
- Users must enter their passwords into an encrypted
database maintained by LSF and any changes to NT passwords must be reflected
in the password database used by LSF
- The machines running LSF are expected to have
fixed IP addresses, which precludes the use of DHCP for dynamically assigning
IP addresses to hosts
- Approximately 30 Megabytes of disk space is available
Recommended
- The NT Resource Kit, which contains many useful utilities
(for example, pview) for monitoring processes.
- A telnet daemon to enable remote login
sessions or some other form of remote access software to allow for easier
management.
Features and Limitations
LSF 3.0 for Windows NT has the following features
and limitations:
- An NT machine can be both a client and a server
in an LSF cluster
- Heterogeneous NT and UNIX clusters are supported
- Support for LSF Base with the following restrictions:
- No support for remote execution through RES
- The ls and it indices in LIM
are not supported
- No support for lslogin
- Support for LSF Batch with the following restrictions:
- No NQS compatibility support or interoperability
with NQS daemons
- No support for checkpointing
- No support for resource limits other than CPU
time
- No support for account mapping
- No environment variable reinitialization using
the -L option of bsub
- Job control limited to stop/resume/kill a job
- No support for job control actions
- No support for NT user groups
- No support for interactive batch jobs using the
-I option of bsub
- Support for LSF JobScheduler with the following
restrictions:
- No support for file or external events
- No support for LSF MultiCluster. Multicluster
functionality can be obtained by running a UNIX LSF master host.
- LSF GUIs are not available in the current release.
All interaction with LSF is through command-line tools.
- UNIX man pages are provided in HTML format
Installation
The following steps are required to install LSF
on Windows NT:
- Step 1.
- Create a domain account lsfadmin with the User Manager on
an NT server. This is the account which will run the LSF daemons. Give
the lsfadmin account the following privileges:
Act as part of the operating system
Debug programs
Increase quotas
Log on as a service
Replace a process level token
- The privileges can be enabled by selecting the
'Policies / User Rights' menu item. Use the 'Show Advanced User Rights' check
box to display these privileges.
- Once the account is created, the privileges need
to be added locally on each host that will run LSF. You can use:
usrmgr \\hostname
- to start up the User Manager on the remote machine
instead of physically logging on to the machine. This requires the local
Administrators group on each machine to include the Domain Administrators
group, and the logged-on user to be a member of the Domain Administrators group.
Note
Any valid user name other than lsfadmin can be specified.
The remainder of this appendix assumes that lsfadmin is
the account name.
- Step 2.
- Select a Windows NT Server to install the LSF binaries and configuration.
On the server create a share or select an existing share in which to install
LSF. This share will be used by other NT machines to access the LSF binaries
and configuration files.
- Ensure that the lsfadmin account has
'Full Control' access to that share. For example, if you created a share
\\serverA\lsf, go into the File Manager and select the 'Share
As' option of the 'Disk' menu. Use the 'Permissions' button to
give lsfadmin 'Full Control' access.
- Step 3.
- Login as lsfadmin and unzip the distribution into a directory.
Note
You must use a version of unzip that supports long
file names.
- Step 4.
- From the distribution directory run lsfsetup.cmd. The lsfsetup.cmd
batch script performs the basic installation of all LSF components. It
does not perform any configuration. To invoke the command, type lsfsetup.cmd
on the command line or double-click on the lsfsetup.cmd icon in
File Manager.
- Use option (1) to install for the first
time.
- The lsfsetup.cmd script prompts for
LSF_INDEP, LSF_MACHDEP and the cluster name, and installs
the appropriate files in those directories. An lsf.conf file is
created in LSF_MACHDEP/etc.
- You must take the following additional steps after
using lsfsetup.cmd:
- Edit the Cluster section in the lsf.shared
file and enter the cluster name specified in the set up.
- Edit the lsf.cluster.cluster file
and add lsfadmin to the ClusterAdmins section.
- Add specific hosts into the Host section
of the lsf.cluster.cluster file.
- The file permissions for the LSF configuration
files and working directories should be set manually after installation
using the File Manager. The following are recommended settings for the
permissions:
LSB_SHAREDIR, LSF_ENVDIR, LSF_LOGDIR
lsfadmin - Full Control (All) (All)
Everyone - Special Access (R) (R)
LSF_BINDIR, LSF_LIBDIR, LSF_SERVERDIR
lsfadmin - Full Control (All) (All)
Everyone - Special Access (RX) (RX)
LSF_CONFDIR, LSB_CONFDIR
lsfadmin - Full Control (All) (All)
- The 'Everyone' group may be replaced by any other
group that you want to be able to interact with LSF.
Note
Whenever you are specifying a path in any LSF configuration file,
use forward slash (/) as the path separator, instead of a backslash
(\). LSF treats the backslash as a continuation character. Backslashes
can be used in specifying paths outside of the configuration files.
- Step 5.
- Add LSF_ENVDIR as a system environment variable using the
System dialog under the Control Panel applet. This must be done on each
host that will run LSF. The value for LSF_ENVDIR should be set
using a UNC path. For example, if LSF was installed under the directory
'lsf' on the share \\serverA\tools, then LSF_ENVDIR
should be set as:
LSF_ENVDIR = \\serverA\tools\lsf\etc
- Using the Control Panel requires physically logging
on to each machine. You can avoid this by using the Registry Editor (regedt32.exe)
to add a value to the following key on each host:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment
- To add the value LSF_ENVDIR on a host,
use the 'Select Computer...' option under the 'Registry' menu. Activate
the above key and select 'Add Value...' under 'Edit' menu, type in the
value name LSF_ENVDIR, push 'OK', then type in the path for LSF_ENVDIR,
and push 'OK'. To modify the value LSF_ENVDIR, double click its
existing value, then change the string.
- Step 6.
- On each machine that will act as an LSF server, the lssrvman.exe
binary must be installed to run as an NT service. lssrvman.exe
runs under the lsfadmin account created in step
1 and is used to start up LSF daemons.
- To install lssrvman.exe, login under
an account that is in the domain administrator group. Open a DOS window
and type the full UNC path name of lssrvman.exe followed by install
as in the following example:
\\serverA\lsf\etc\lssrvman install
- You will be prompted for a list of host names
on which to install the service. Use the host names entered into the
lsf.cluster.cluster file as
shown above. You will also be prompted for the username and password
of the lsfadmin account created in step
1. Note that the username must be specified as domain\lsfadmin
where domain is your domain name.
- This creates a service named LSF Service that
can be viewed from the Services dialog of the Control Panel applet. To
uninstall the LSF Service, type:
\\serverA\lsf\etc\lssrvman remove
- If you decide to move the location of the lssrvman.exe
binary after installing, you must first remove the service before installing
again. If you are only replacing the lssrvman.exe binary, you
can just stop the service, replace the binary and restart it.
- Step 7.
- After lssrvman.exe has been installed on each machine, it
is possible to manage the LSF daemons from a single NT machine. You can
start up the LSF Service from either the command-line or through
a GUI. See 'Starting LSF Service'.
- Step 8.
- Each user wishing to run jobs under LSF must supply the password for
their domain account to LSF. This is done through the lspasswd.exe
command. It prompts for a password which is stored in the LSF password
database. For example:
D:\>lspasswd
Enter your old password:
Enter your new password:
Repeat your new password:
-- Your password has been changed
- If specifying your password for the first time,
press [ENTER] when asked for your old
password.
- The lspasswd.exe command communicates with lssrvman.exe
on the local machine. Therefore this command must be executed on an LSF
server host. The passwords are encrypted before being transferred over
the network and are stored in a hidden file passwd.lsfuser in
LSF_CONFDIR (as defined in lsf.conf). LSF daemons read
the file and decrypt the password before using it to start a job on behalf
of a user.
Note
This means that LSF_CONFDIR must be shared by all
NT hosts.
- Additionally, the privilege 'Log on as a batch
job' must be enabled for each user wishing to run jobs under LSF. The privilege
must be granted on each host where the user will run jobs. To simplify
administration, you can create a global NT user group (for example, LSF
Users) and give the group the 'Log on as a batch job' privilege on each
machine. Then you can simply add users who wish to run batch
jobs through LSF to this group.
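The path-separator rule from the note in Step 4 is easy to get wrong when copying paths from a DOS window into a configuration file. The following is a minimal Python sketch, not part of LSF, that rewrites a Windows-style path into the forward-slash form used in LSF configuration files. The function name to_lsf_conf_path is hypothetical.

```python
def to_lsf_conf_path(windows_path: str) -> str:
    """Return the path with every backslash replaced by a forward slash.

    LSF configuration files treat a backslash as a continuation character,
    so paths written there must use forward slashes instead.
    """
    return windows_path.replace("\\", "/")


if __name__ == "__main__":
    # A UNC path as it would be typed outside the configuration files.
    print(to_lsf_conf_path(r"\\serverA\lsf\conf\license.dat"))
```

The conversion applies only to values written into LSF configuration files. Paths used elsewhere, such as the LSF_ENVDIR system environment variable, can keep their backslashes.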
Starting LSF Service
Starting LSF Service from Command Line
To start the executable lssrvman.exe,
type the following:
lssrvcntrl start [-m hostname] lssrvman
The -m parameter allows you to start up
the LSF Service on a remote host. If you specify the hostname 'all',
the operation will occur on all hosts listed in the lsf.cluster.cluster
file.
To stop lssrvman.exe, type:
lssrvcntrl stop [-m hostname] lssrvman
Note
This only stops the LSF Service. It will leave the LSF daemons
(lim, res, sbatchd, and
mbatchd) running.
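When the service is started or stopped from a script, the command line can be assembled from the syntax shown above. The following Python sketch only builds the argument list; it does not run anything. The function name lssrvcntrl_command is hypothetical, and the only arguments it emits are those documented in this section.

```python
from typing import List, Optional


def lssrvcntrl_command(action: str, hostname: Optional[str] = None) -> List[str]:
    """Build the argument list for 'lssrvcntrl start|stop [-m hostname] lssrvman'."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    argv = ["lssrvcntrl", action]
    if hostname is not None:
        # The host name 'all' selects every host listed in lsf.cluster.cluster.
        argv += ["-m", hostname]
    argv.append("lssrvman")
    return argv


if __name__ == "__main__":
    print(" ".join(lssrvcntrl_command("start", "all")))
```

The resulting list could be passed to subprocess.run on an NT host where lssrvcntrl is on the PATH.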
Starting LSF Service through GUI
The LSF Service can also be started from
the Services dialog box available in the Control Panel applet.
By default, when the LSF Service is started
it will start up the LSF daemons lim, res, and sbatchd.
The sbatchd daemon will start mbatchd on the master host.
If a particular daemon is stopped, you can start
it up again, without having to restart the LSF Service. To do this
use the start up option of lsadmin or badmin. For example:
lsadmin limstartup [hostname]
lsadmin resstartup [hostname]
badmin hstartup [hostname]
will start up lim, res, and sbatchd
respectively on the specified host.
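The correspondence between daemons and startup commands can be written as a small lookup table. The following Python sketch is illustrative only; the names STARTUP_COMMANDS and daemon_startup_command are hypothetical, and the commands are exactly the three listed above.

```python
from typing import Dict, List, Optional

# Daemon name -> command and subcommand that starts it, as listed above.
STARTUP_COMMANDS: Dict[str, List[str]] = {
    "lim": ["lsadmin", "limstartup"],
    "res": ["lsadmin", "resstartup"],
    "sbatchd": ["badmin", "hstartup"],
}


def daemon_startup_command(daemon: str, hostname: Optional[str] = None) -> List[str]:
    """Return the argument list that restarts one daemon on an optional host."""
    if daemon not in STARTUP_COMMANDS:
        # mbatchd has no entry: it is started by sbatchd on the master host.
        raise ValueError("no startup command for daemon: " + daemon)
    argv = list(STARTUP_COMMANDS[daemon])
    if hostname is not None:
        argv.append(hostname)
    return argv


if __name__ == "__main__":
    print(" ".join(daemon_startup_command("sbatchd", "hostA")))
```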
License Management
LSF uses the FLEXlm license management software
from Globetrotter Software. The following steps should be taken to install
a demo or permanent license.
Demo License
To install a demo license copy the license information
into the file license.dat and save it in the LSF_CONFDIR.
Specify the variable LSF_LICENSE_FILE in lsf.conf to
point to the license.dat file. For example:
LSF_LICENSE_FILE=//serverA/lsf/conf/license.dat
Permanent License
To obtain the host IDs necessary to generate a
permanent license, run the following command on each of the license server
hosts:
lmutil lmhostid
The lmutil command is installed in LSF_SERVERDIR
defined in lsf.conf.
The host ID information will be used to generate
the license key information. Send the output of the command to your LSF
vendor.
To install a permanent license, you need to install
and run the FLEXlm License daemon as an NT service on your license server.
The following steps should be followed to accomplish this:
- Step 1.
- Create a directory C:\flexlm on the license server host. Under
the flexlm directory create a bin directory. Copy the
following files from LSF_SERVERDIR into C:\flexlm\bin:
lmgrd.exe, lmgr325a.dll, lsf_ld.exe, install.exe,
and lmutil.exe
- Step 2.
- On the license server, copy the permanent license into the license.dat
file under C:\flexlm . Change the DAEMON line in the
license.dat file to specify the full drive and path of the lsf_ld.exe.
For example:
DAEMON lsf_ld c:/flexlm/bin/lsf_ld.exe
- Step 3.
- Copy C:\flexlm\license.dat into LSF_CONFDIR and define
the variable LSF_LICENSE_FILE in lsf.conf to point to
it. For example:
LSF_LICENSE_FILE=//serverA/lsf/conf/license.dat
- Step 4.
- From a DOS window go into the C:\flexlm\bin directory and
type the following:
c:\flexlm\bin>install c:\flexlm\bin\lmgrd.exe
- This will install the FLEXlm License daemon as
an NT service.
- Step 5.
- Start the License daemon using the Service dialog box available in
the Control Panel applet.
- The lmutil command can be used to interact
with the License daemon. For example,
lmutil lmstat -a -c c:\flexlm\license.dat
- will display the status of the license server.
The log file for the License daemon can be found under %SYSTEMROOT%\system32.
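The DAEMON line edit described in Step 2 can also be done with a short script rather than by hand. The following Python sketch rewrites the DAEMON line for a named vendor daemon in the text of a license file. The function name set_daemon_path is hypothetical, and the sample license text is a made-up placeholder, not a real license.

```python
def set_daemon_path(license_text: str, daemon: str, daemon_path: str) -> str:
    """Point the DAEMON line for 'daemon' at 'daemon_path'; keep other lines as-is.

    Assumption: the DAEMON line carries only the daemon name and its path.
    Any further fields on that line are dropped by this sketch.
    """
    lines = []
    for line in license_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "DAEMON" and fields[1] == daemon:
            line = "DAEMON {} {}".format(daemon, daemon_path)
        lines.append(line)
    trailing_newline = "\n" if license_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline


if __name__ == "__main__":
    # Placeholder content for illustration only.
    sample = "SERVER hostA 00000000 1700\nDAEMON lsf_ld lsf_ld\n"
    print(set_daemon_path(sample, "lsf_ld", "c:/flexlm/bin/lsf_ld.exe"), end="")
```

The same edit must be present in both copies of license.dat, so it is simplest to apply it before copying the file into LSF_CONFDIR in Step 3.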
Mail
When LSF needs to send mail to users, it invokes
the program defined by LSB_MAILPROG in the lsf.conf file.
If LSB_MAILPROG is not defined, no mail is sent. The default installation
sets LSB_MAILPROG to a sample mail client called lsmail.exe
which supports sending mail to a UNIX host by using rsh to invoke
sendmail on the UNIX machine. To support this, lsmail.exe
should be copied to a file corresponding to the name of the UNIX host,
for example:
copy lsmail.exe unixhost.exe
where unixhost is a UNIX machine which supports
sendmail. LSB_MAILPROG should then point to the unixhost.exe
file. For example:
LSB_MAILPROG=//serverA/tools/lsf/bin/unixhost.exe
Sites may choose to write different mail clients.
See 'LSB_MAILPROG' for
details on how LSB_MAILPROG is invoked.
Environment Variable Handling
LSF transfers most environment variables between
submission and execution hosts. The following environment variables are
overridden based on the values on the execution host:
COMSPEC
COMPUTERNAME
NTRESKIT
OS2LIBPATH
PROCESSOR_ARCHITECTURE
PROCESSOR_LEVEL
SYSTEMDRIVE
SYSTEMROOT
WINDIR
These must be defined as system environment variables
on the execution host.
If the value of WINDIR differs between the submission and execution
hosts, then the system PATH variable on the execution
host is used instead of that from the submission host.
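The override rules in this section can be summarized in a few lines of code. The following Python sketch is an illustration of the rules as described here, not LSF's implementation. It assumes that variable names are stored in upper case, that every variable from the submission host is transferred (the text says only "most"), and that WINDIR values are compared exactly. The names EXECUTION_HOST_VARIABLES and job_environment are hypothetical.

```python
from typing import Dict

# Variables whose values are taken from the execution host.
# COMSPEC is the standard NT name of the command interpreter variable.
EXECUTION_HOST_VARIABLES = (
    "COMSPEC", "COMPUTERNAME", "NTRESKIT", "OS2LIBPATH",
    "PROCESSOR_ARCHITECTURE", "PROCESSOR_LEVEL",
    "SYSTEMDRIVE", "SYSTEMROOT", "WINDIR",
)


def job_environment(submission_env: Dict[str, str],
                    execution_env: Dict[str, str]) -> Dict[str, str]:
    """Combine the submission environment with execution host system variables."""
    env = dict(submission_env)
    for name in EXECUTION_HOST_VARIABLES:
        if name in execution_env:
            env[name] = execution_env[name]
    # If WINDIR differs between the hosts, use the execution host's system PATH.
    if (submission_env.get("WINDIR") != execution_env.get("WINDIR")
            and "PATH" in execution_env):
        env["PATH"] = execution_env["PATH"]
    return env


if __name__ == "__main__":
    submit = {"WINDIR": r"C:\WINNT", "PATH": r"C:\WINNT\system32", "MYVAR": "1"}
    execute = {"WINDIR": r"D:\WINNT", "PATH": r"D:\WINNT\system32"}
    print(job_environment(submit, execute))
```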
Avoid using drive names in environment variables
(especially the PATH variable) for drives which are connected
over the network. It is preferable to use the UNC form of the path. This
is because drive maps are shared between all users logged on to a particular
machine. For example, if an interactive user has drive F: mapped
to \\serverX\share, then any batch job will also see drive F:
mapped to \\serverX\share. However, drive F: may have
been mapped to a different share on the submission host of the job.
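One way to follow this advice is to rewrite drive-letter entries into UNC form before submitting jobs. The following Python sketch does this for a PATH-style value, given a table of network drive mappings that the caller supplies. The function name path_entries_to_unc and the mapping format are hypothetical.

```python
from typing import Dict


def path_entries_to_unc(path_value: str, drive_map: Dict[str, str]) -> str:
    """Replace mapped drive letters in a ';'-separated path list with UNC shares.

    drive_map maps an upper-case drive letter to the share it is connected to,
    for example {"F": r"\\serverX\share"}. Entries on unmapped (local) drives
    and drive-relative entries such as 'F:tools' are left unchanged.
    """
    converted = []
    for entry in path_value.split(";"):
        drive = entry[:1].upper()
        rest = entry[2:]
        is_drive_path = len(entry) >= 2 and entry[1] == ":"
        if is_drive_path and drive in drive_map and (not rest or rest.startswith("\\")):
            entry = drive_map[drive].rstrip("\\") + rest
        converted.append(entry)
    return ";".join(converted)


if __name__ == "__main__":
    print(path_entries_to_unc(r"F:\tools\bin;C:\WINNT\system32",
                              {"F": r"\\serverX\share"}))
```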
The Job Starter feature can be used to perform
more site-specific handling of environment variables. See 'Job
Starter' and 'Using A Job Starter'
for more details.
Windows NT 4.0
The command shell (cmd.exe) under Windows
NT 4.0 does not support being started from a directory which is specified
as a UNC name. For example, if you type the command:
start /d\\serverA\share\username cmd.exe
cmd.exe will end up starting in the directory
specified by %WINDIR%. As a result jobs submitted from a shared
directory may not start in the correct directory on the execution host.
The command shell from Windows NT 3.51, however, does support this feature.
For LSF to work correctly on NT 4.0 machines, you
should replace cmd.exe with the cmd.exe from NT 3.51.
cmd.exe typically resides in the directory %WINDIR%\system32.
Alternatively, you can copy the NT 3.51 cmd.exe into %WINDIR%\system32\cmd351.exe
and set the LSF_CMD_SHELL variable in lsf.conf to tell
LSF to use this shell instead of cmd.exe. For example, put the following
into lsf.conf:
LSF_CMD_SHELL=cmd351.exe
Security Issues
The default authentication method of LSF is
to use privileged ports. On UNIX, this requires binaries which need to
be authenticated (for example bsub) to be made setuid root.
NT does not have the concept of setuid binaries and does not restrict
which binaries can use privileged ports. A security risk can occur if a
user discovers the format of LSF protocol messages and writes a program
which tries to communicate with an LSF server. It is recommended that external
authentication (via eauth) be used where such a security risk
is a concern.
The system environment variable LSF_ENVDIR
is used by LSF to obtain the location of lsf.conf which points
to important configuration files. Any user who can modify system environment
variables can modify LSF_ENVDIR to point to their own configuration
and start up programs under the lsfadmin account.
Once the LSF Service is started, it will
only accept requests from the lsfadmin account. To allow other
users to interact with the LSF Service, you must set up the lsf.sudoers
file under the directory specified by the SYSTEMROOT environment
variable. See 'The lsf.sudoers
File' for the format of the lsf.sudoers file.
Note
Only the LSF_STARTUP_USERS and LSF_STARTUP_PATH
are used on NT. You should ensure that only authorized users modify the
files under the SYSTEMROOT directory.
All external binaries invoked by the LSF daemons
(such as esub, eexec, elim, eauth,
and queue level pre- and post-execution commands) are run under the lsfadmin
account.
Heterogeneous NT/UNIX Environments
If you are running in a mixed UNIX and NT environment,
then you must have a UNIX-style passwd file in LSF_CONFDIR on
the NT side. This passwd file is used to assign user and group
IDs for users on NT systems so that they are consistent with those defined
on UNIX.
The passwd file can be created by copying
/etc/passwd from the UNIX system, or by issuing the command:
ypcat passwd > passwd
and copying the resulting file onto the NT system.
This assumes that users have accounts with the same names on both UNIX and
NT systems.
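Before copying the file to the NT side, it can be useful to check which user names and IDs it actually defines. The following Python sketch reads text in the standard UNIX passwd format (colon-separated fields, with the user ID and group ID in the third and fourth fields) and returns a name-to-ID table. It is an illustration, not the parser LSF uses, and the function name parse_passwd is hypothetical.

```python
from typing import Dict, Tuple


def parse_passwd(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each user name to its (user ID, group ID) pair."""
    ids: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue
        name, uid, gid = fields[0], fields[2], fields[3]
        # Skip NIS compatibility entries such as '+::::::'.
        if name.startswith(("+", "-")):
            continue
        try:
            ids[name] = (int(uid), int(gid))
        except ValueError:
            continue
    return ids


if __name__ == "__main__":
    # Made-up sample entry for illustration.
    print(parse_passwd("user1:x:1001:100:User One:/home/user1:/bin/sh\n"))
```

Comparing the resulting names with the NT domain account names is a quick way to confirm the assumption that accounts are named identically on both sides.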
The following points should also be considered
in a heterogeneous UNIX/NT environment:
- In a mixed NT/UNIX cluster, you should ensure
that the LSF master host runs on a UNIX platform. Transfer of the master
after failure from UNIX to NT or NT to UNIX hosts is not supported.
- By default, LSF transfers environment variables
from the submission to the execution host. However, some environment variables
do not make sense when transferred. When submitting a job from NT to a
UNIX machine, the -L option of bsub can be used to reinitialize
the environment variables. If submitting a job from a UNIX machine to an
NT machine, you can set the environment variables explicitly in your job
script. Alternatively the Job Starter feature can be used to reset the
environment variables before starting the job.
LSF automatically resets the PATH on the
execution host if the submission host is of a different type. If the submission
host is NT and the execution host is UNIX, the PATH variable is
set to /bin:/usr/bin:/sbin:/usr/sbin and LSF_BINDIR (if
defined in lsf.conf) is appended to it. If the submission host
is UNIX and the execution host is NT, the PATH variable is set
to the system PATH variable with LSF_BINDIR appended
to it. LSF looks for the presence of the WINDIR variable in the
job's environment to determine whether the job was submitted from an NT
or UNIX host. If WINDIR is present, it is assumed that the submission
host was NT, otherwise the submission host is assumed to be a UNIX machine.
- The lssrvcntrl.exe binary only works
when invoked from an NT machine. You will not be able to start up LSF daemons
on an NT machine from a UNIX machine. The converse is also true: you cannot
start the LSF daemons on a UNIX machine from an NT machine.
- The LSF configuration files have to be accessible
from both the NT and UNIX machines. You need to set up a shared file system
between the UNIX and NT machines via an NFS client on NT or an SMB server on
UNIX.
Alternatively you can replicate the configuration
files.
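The PATH reset rule for jobs that cross between NT and UNIX can be expressed compactly. The following Python sketch follows the rule as described in the list above and is not LSF's implementation. It assumes ":" as the separator on UNIX and ";" on NT, and when both hosts are of the same type it returns the job's PATH unchanged without applying the WINDIR comparison described under Environment Variable Handling. The names UNIX_DEFAULT_PATH and execution_path are hypothetical.

```python
from typing import Dict, Optional

UNIX_DEFAULT_PATH = "/bin:/usr/bin:/sbin:/usr/sbin"


def execution_path(job_env: Dict[str, str],
                   execution_host_is_nt: bool,
                   system_path: str = "",
                   lsf_bindir: Optional[str] = None) -> str:
    """Return the PATH a job would see on the execution host."""
    # The presence of WINDIR in the job's environment marks an NT submission host.
    submitted_from_nt = "WINDIR" in job_env
    if submitted_from_nt == execution_host_is_nt:
        return job_env.get("PATH", "")
    if execution_host_is_nt:
        base, separator = system_path, ";"
    else:
        base, separator = UNIX_DEFAULT_PATH, ":"
    # LSF_BINDIR is appended only if it is defined.
    return base + separator + lsf_bindir if lsf_bindir else base


if __name__ == "__main__":
    nt_job = {"WINDIR": r"C:\WINNT", "PATH": r"C:\WINNT\system32"}
    print(execution_path(nt_job, execution_host_is_nt=False,
                         lsf_bindir="/usr/local/lsf/bin"))
```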
Differences between LSF for UNIX and NT
The following are some of the differences between
the UNIX and NT versions of LSF:
- The shell used to invoke commands is cmd.exe
instead of /bin/sh as on UNIX. For example, the queue-level pre
and post-exec commands are invoked as:
cmd.exe /C pre-exec command
- The NULL device on NT is NUL
rather than /dev/null as on UNIX. LSF translates /dev/null
to NUL for NT.
- The /etc directory on UNIX corresponds
to the %SYSTEMROOT% directory on NT.
- LSF uses the directory C:\temp on NT
and /tmp on UNIX. The temporary directory used by LSF can be configured
by setting LSF_TMPDIR as a system environment variable.
- There is no native support in NT for UNIX-style
signals. Therefore sending an arbitrary signal to a job via the -s
option of bkill has no meaning on NT. LSF, however, supports the
job control functionality by providing the equivalent of SIGSTOP,
SIGCONT, and SIGTERM to suspend, resume, and terminate
a job. These can be accessed through the commands bstop, bresume,
and bkill.
- The UNIX umask parameter is ignored on
NT
- When inputting commands to bsub, remember
the syntax of the commands must be specified in the form understood by
NT batch files. For example to specify multiple commands in a single line,
use the '&&' as the command separator instead of ';'
as in UNIX. For example, use:
bsub 'cmd1 && cmd2'
- instead of:
bsub 'cmd1; cmd2'
- Also, when specifying commands from standard input,
use CTRL-Z to indicate EOF. On UNIX, CTRL-D
is used. For example:
c:\temp> bsub -q simulation
bsub> myjob arg1 arg2
bsub> ^Z
- The tmp index returned by lim
measures the space on the drive specified by the TEMP system environment
variable.
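A script that submits the same sequence of commands to both NT and UNIX hosts has to choose the separator according to the execution platform, as described in the bsub item above. The following Python sketch builds the command string for either case. The function name join_commands is hypothetical.

```python
from typing import Sequence


def join_commands(commands: Sequence[str], execution_host_is_nt: bool) -> str:
    """Join several commands into the single string passed to bsub.

    Follows the guidance above: '&&' on NT, ';' on UNIX. Note that in both
    cmd.exe and /bin/sh, '&&' runs the next command only if the previous
    one succeeded, which ';' in /bin/sh does not require.
    """
    separator = " && " if execution_host_is_nt else "; "
    return separator.join(commands)


if __name__ == "__main__":
    print(join_commands(["cmd1", "cmd2"], execution_host_is_nt=True))
```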
Miscellaneous
The following are miscellaneous notes on running
LSF on Windows NT:
- Ports used by LSF services should be configured
via the LSF_LIM_PORT, LSF_RES_PORT, LSB_SBD_PORT,
and LSB_MBD_PORT variables in the lsf.conf file, as NT
does not maintain an /etc/services file or currently support NIS.
- When writing an external command that is invoked
by LSF (for example, elim, esub, or eexec),
the command must be a binary executable, that is, elim.exe or
esub.exe. It cannot be a batch file such as elim.bat.
- Always specify LSF_LOGDIR in lsf.conf
so that error messages are logged to a file. The current version does not
support logging to NT's event log.
- The LSF_USE_HOSTEQUIV parameter in lsf.conf
is ignored on NT.
- Nice values specified at the queue-level through
the NICE parameter are mapped to NT process priority classes as
follows:
- nice>=0 corresponds to an NT priority
class of IDLE
- nice<0 corresponds to an NT priority class of NORMAL
- LSF does not support HIGH or REAL-TIME
priority classes.
- The io index shows 0, unless the disk
performance counters are turned on. To turn on disk performance counters,
use the DISKPERF command.
Note
Turning on the performance counters incurs extra overhead in disk
I/O.
- A job which runs under a CPU time limit may exceed
that limit by up to SBD_SLEEP_TIME. This is because sbatchd
periodically checks if the limit has been exceeded. On UNIX systems, the
CPU limit can be enforced by the OS at the process level.
- The UNIX man pages converted to HTML format are
stored in LSF_MANDIR as defined in lsf.conf.
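The mapping from queue-level NICE values to NT priority classes given in the list above is a two-way split, which the following Python sketch makes explicit. The function name nt_priority_class is hypothetical, and the returned strings are simply the class names used in this appendix.

```python
def nt_priority_class(nice: int) -> str:
    """Return the NT priority class used for a queue-level NICE value.

    Only IDLE and NORMAL are produced; HIGH and REAL-TIME are not supported.
    """
    return "IDLE" if nice >= 0 else "NORMAL"


if __name__ == "__main__":
    for value in (-5, 0, 10):
        print(value, nt_priority_class(value))
```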
doc@platform.com
Copyright © 1994-1997 Platform Computing Corporation.
All rights reserved.