NAME

dsh - run a command on a cluster of machines

SYNOPSIS

dsh [-eiqtv] [-f fanout] [-g rungroup1,...,rungroupN] [-l username] [-o porttimeout] [-p portnum] [-w node1,...,nodeN] [-x node1,...,nodeN] [command ...]

dsh [-eiqtv] [-f fanout] [-g rungroup1,...,rungroupN] [-l username] [-o porttimeout] [-p portnum] [-w node1,...,nodeN] [-x node1,...,nodeN] -s scriptname [arguments ...]

DESCRIPTION

The dsh utility runs a command, or a group of commands, on a cluster of machines. All commands are run in parallel across the cluster. Interrupt signals are sent to the remote host that is currently being displayed to the user. The following options are available:

-e
Unless the -e option is specified, stderr from remote commands will not be reported to the user.

-f
If the -f option is specified, followed by a number, it sets the fanout size of the cluster. The fanout size is the number of nodes a command will run on in parallel at one time. Thus an 80-node cluster with a fanout size of 64 would run the command on 64 nodes in parallel and then, when all have finished, execute it on the last 16 nodes. The fanout size defaults to 64. This option overrides the FANOUT environment variable.
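
The batching described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not dsh's actual implementation; the function name fanout_batches and the node names are hypothetical.

```python
def fanout_batches(nodes, fanout=64):
    """Split the node list into consecutive batches of at most `fanout` nodes."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    return [nodes[i:i + fanout] for i in range(0, len(nodes), fanout)]


# An 80-node cluster with the default fanout of 64 (node names are made up).
nodes = [f"node{n:02d}" for n in range(80)]
batches = fanout_batches(nodes, fanout=64)
print([len(batch) for batch in batches])  # prints [64, 16]
```

Each batch would be run to completion before the next one starts, which is why the last 16 nodes wait for the first 64.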

-g
If the -g option is specified, followed by a comma separated list of group names, the command will only be run on those groups of nodes. A node may be a part of more than one group if desired. However, running without the -g option will run the command on the same node as many times as it appears in the file specified by the CLUSTER environment variable. This option is silently ignored if used with the -w option.

-i
The -i option will list information about the current cluster, and command groupings. It will print out the current value of the fanout, and how many groups of machines there are within the cluster. It will also show you which command you are about to run, and your username if specified with the -l option.

-l
If the -l option is specified, followed by a username, the commands will be run under that userid on the remote machines. Proper authentication must be set up for that user for this to work.

-o
The -o option is used to set the timeout in seconds to be used when testing remote connections. The default is five seconds.

-p
The -p option can be used to set the port number that testing should occur on when testing remote connections. The default behavior is to guess based on the remote command name.

-q
The -q option does not issue any commands, but displays information about the cluster, and the fanout groupings.

-s
The -s option causes dsh to copy a script to the remote machine, execute it once, and delete it, all in a single operation. The -s option requires a script name, which will be copied to all remote machines and executed. You may also optionally specify any number of additional arguments to the script on the command line. The script will be placed in a temporary directory under /tmp on the remote node, executed, and then the directory will be recursively deleted. Any executable can be used as the script, regardless of programming language. The script is copied with the tar command, preserving permissions of the original. The -s option cannot be used with the standard mode of dsh to run other commands, nor can it be used in interactive mode.

-t
The -t option causes dsh to attempt a connection test to each node prior to attempting to run the remote command. If the test fails for any reason, the remote command will not be attempted. This can be useful when clusterfiles have suffered bitrot and some nodes no longer exist, or might be down for maintenance. The default timeout is 5 seconds. The timeout can be changed with the -o option. dsh will attempt to guess the port number of the remote service based on your RCMD_CMD setting. It knows about ssh and rsh. If dsh fails to guess your port correctly, you may use the -p argument to set the remote port number. If the RCMD_TEST environment variable exists, the testing will automatically take place.
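
The kind of connection test described above can be sketched as follows. This is a minimal Python illustration, not the code dsh uses. The port table assumes the conventional ports (22 for ssh, 514 for the rsh shell service), and the names guess_port and node_is_reachable are hypothetical.

```python
import socket

# Assumed conventional ports: 22 for ssh, 514 for the rsh "shell" service.
DEFAULT_PORTS = {"ssh": 22, "rsh": 514}


def guess_port(rcmd_cmd):
    """Guess the test port from the remote command name; None if unknown."""
    name = rcmd_cmd.rsplit("/", 1)[-1]  # tolerate a full path such as /usr/bin/ssh
    return DEFAULT_PORTS.get(name)


def node_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A node for which node_is_reachable returns False would simply be skipped, which mirrors the documented behavior of not attempting the remote command when the test fails.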

-v
Prints the version of ClusterIt to stdout and exits.

-w
If the -w option is specified, followed by a comma delimited list of machine names, the command will be run on each node in the list. Without this option, dsh runs on the nodes listed in the file pointed to by the CLUSTER environment variable.

-x
The -x option can be used to exclude specific nodes from the cluster. The format is the same as the -w option, a comma delimited list of machine names. This option is silently ignored if used with the -w option.
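
How the -w, -g and -x options interact can be sketched as below. This is a hedged Python illustration of the precedence rules stated above, not dsh's own logic. The function name select_nodes is hypothetical, and the handling of a node that appears in several selected groups is an assumption, since the text does not specify it.

```python
def select_nodes(cluster_nodes, groups, w=None, g=None, x=None):
    """Pick target nodes following the documented precedence of -w, -g and -x."""
    if w:
        # -w names the targets directly; -g and -x are silently ignored.
        return list(w)
    if g:
        # Assumption: nodes are taken group by group, in the order given.
        targets = [node for name in g for node in groups.get(name, [])]
    else:
        # No -g: every entry in the cluster file, duplicates included.
        targets = list(cluster_nodes)
    excluded = set(x or [])
    return [node for node in targets if node not in excluded]


cluster_nodes = ["pollux", "castor", "rigel", "kent", "alshain", "altair"]
groups = {"alpha": ["rigel", "kent"], "sparc": ["alshain", "altair"]}

print(select_nodes(cluster_nodes, groups, g=["alpha"], x=["kent"]))   # ['rigel']
print(select_nodes(cluster_nodes, groups, w=["hadar"], g=["alpha"]))  # ['hadar']
```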

ENVIRONMENT

dsh utilizes the following environment variables.

CLUSTER
Contains the name of a file that holds a newline separated list of the nodes in the cluster.

RCMD_CMD
Command to use to connect to remote machines. The command chosen must be able to connect to the remote host without a password. Defaults to rsh.

RCMD_CMD_ARGS
Arguments to pass to the remote shell command. Defaults to none.

RCMD_PORT
The port number used to test remote connections. See the -p flag.

RCMD_TEST
When set, dsh will automatically test all hosts before launching the remote command. See the -t option for more information.

RCMD_TEST_TIMEOUT
The timeout in seconds to use when testing for remote connections.

RCMD_USER
The username to connect to remote machines as by default.

FANOUT
When set, limits the number of commands run concurrently. This can be used to keep from overloading a small host when sending out commands in parallel. Defaults to 64. This environment setting can be overridden by the -f option.
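
The way these variables combine with command-line options can be sketched as follows. This is a minimal Python illustration rather than dsh's implementation. Only the rule that -f overrides FANOUT is stated explicitly above; treating -l as an override for RCMD_USER is an assumption, and the name resolve_settings is hypothetical.

```python
import os


def resolve_settings(env=None, fanout_opt=None, user_opt=None):
    """Combine command-line options with environment variables and defaults."""
    env = os.environ if env is None else env
    return {
        "cluster_file": env.get("CLUSTER"),
        "rcmd_cmd": env.get("RCMD_CMD", "rsh"),           # documented default
        "rcmd_args": env.get("RCMD_CMD_ARGS", ""),        # defaults to none
        # -f takes precedence over FANOUT, which falls back to 64.
        "fanout": fanout_opt if fanout_opt is not None else int(env.get("FANOUT", "64")),
        # Assumption: -l takes precedence over RCMD_USER.
        "user": user_opt if user_opt is not None else env.get("RCMD_USER"),
        # Testing is enabled whenever RCMD_TEST exists in the environment.
        "test_first": "RCMD_TEST" in env,
    }


print(resolve_settings(env={"FANOUT": "16"}, fanout_opt=8)["fanout"])  # prints 8
```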

FILES

The file pointed to by the CLUSTER environment variable has the following format:
pollux
castor
GROUP:alpha
rigel
kent
GROUP:sparc
alshain
altair
LUMP:alphasparc
alpha
sparc

This example would have pollux and castor as members of no group, rigel and kent as members of group `alpha', and alshain and altair as members of group `sparc'. Note the format of the GROUP command: it is in all capital letters, followed by a colon and the group name. There can be no spaces following the GROUP command, or in the name of the group.

There is also a LUMP command, which is identical in syntax to the GROUP command. This command allows you to create a named group of groups. Each member of the lump is the name of a group. The LUMP command is terminated by another LUMP or GROUP command, or the EOF marker.

Any line beginning with a `#' symbol denotes a comment, and the entire line will be ignored. Note that a hash mark placed anywhere other than the first character of a line will be considered part of a valid hostname or command.
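
The file format described in this section can be read with a short parser such as the one below. This is a sketch in Python under stated assumptions, not the parser dsh uses: the name parse_cluster_file is hypothetical, and skipping blank lines is an assumption the text does not address.

```python
def parse_cluster_file(text):
    """Parse cluster-file text into (nodes, groups, lumps)."""
    nodes = []       # every node, in file order
    groups = {}      # group name -> list of nodes
    lumps = {}       # lump name -> list of group names
    members = None   # list receiving plain lines; None before the first GROUP/LUMP
    in_lump = False

    for raw in text.splitlines():
        if raw.startswith("#"):      # comment only when '#' is the first character
            continue
        line = raw.strip()
        if not line:                 # assumption: blank lines are skipped
            continue
        if line.startswith("GROUP:"):
            members = groups.setdefault(line[len("GROUP:"):], [])
            in_lump = False
        elif line.startswith("LUMP:"):
            members = lumps.setdefault(line[len("LUMP:"):], [])
            in_lump = True
        else:
            if members is not None:
                members.append(line)
            if not in_lump:          # lump members are group names, not nodes
                nodes.append(line)
    return nodes, groups, lumps


sample = """pollux
castor
GROUP:alpha
rigel
kent
GROUP:sparc
alshain
altair
LUMP:alphasparc
alpha
sparc
"""
nodes, groups, lumps = parse_cluster_file(sample)
print(groups)  # {'alpha': ['rigel', 'kent'], 'sparc': ['alshain', 'altair']}
print(lumps)   # {'alphasparc': ['alpha', 'sparc']}
```

Keeping lumps separate from groups reflects the description of a lump as a named group of groups; expanding a lump into nodes is then a lookup of each member in the groups table.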

EXAMPLES

The command:
dsh hostname

will display:

pollux: pollux
castor: castor

if the file pointed to by CLUSTER contains:

pollux
castor

The command:

dsh -w hadar,rigel hostname

will display:

hadar:  hadar
rigel:  rigel

The command:

dsh -w hadar,rigel -s /bin/date

will copy /bin/date to /tmp/dsh.$$ on hadar and rigel and execute it on each node, displaying the date and time on each remote machine. This assumes that the /bin/date you copied is a valid binary for the remote end.

DIAGNOSTICS

Exit status is 0 on success, 1 if an error occurs.

SEE ALSO

dshbak(1), pcp(1), pdf(1), prm(1), rsh(1), tar(1), kerberos(3), hosts.equiv(5), rhosts(5)

HISTORY

The dsh command appeared in clusterit 1.0. It is based on the dsh command in IBM PSSP.

AUTHOR

Dsh was written by Tim Rightnour.

BUGS

Solaris 2.5.1 has a maximum of 256 open file descriptors. This means that dsh will fail on a fanout size greater than about 32-40 nodes.