
Generating the Hostsup File

The first thing to do after performing a base install of Linux on the cluster is to generate a list of hostnames representing those machines which:

  1. Successfully completed base installation.
  2. Are currently online and responding.

This hostname list is consulted every time we want to execute a command across all nodes of the cluster, usually by putting

if [ -z "$HOSTNAMES" ]; then HOSTNAMES=`cut -d' ' -f2 /root/hostsup`; fi

at the beginning of our shell scripts. This lets us use the same script to address all of the machines at once or only a subset. For example,

HOSTNAMES=slave006 bin/rsync.passwords

will rsync the password lists to machine slave006 only.
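
As a rough illustration of how a maintenance script might build on that stanza, here is a minimal sketch. The function names target_hosts and on_each_host, and the HOSTSUP_FILE and REMOTE_SHELL variables, are hypothetical and not part of the original scripts; they are only there so the sketch can be tried without touching /root/hostsup or a real cluster.

```shell
#!/bin/sh
# Assumed overrides (not in the original scripts), useful for dry runs:
HOSTSUP_FILE=${HOSTSUP_FILE:-/root/hostsup}   # where the host list lives
REMOTE_SHELL=${REMOTE_SHELL:-ssh}             # program used to reach a host

# Print the hosts to act on: $HOSTNAMES if it is set,
# otherwise field 2 of every line in the hostsup file.
target_hosts() {
    if [ -z "$HOSTNAMES" ]; then
        cut -d' ' -f2 "$HOSTSUP_FILE"
    else
        echo "$HOSTNAMES"
    fi
}

# Run the given command on each target host, one host at a time.
on_each_host() {
    for host in $(target_hosts); do
        echo "== $host"
        $REMOTE_SHELL "$host" "$@"
    done
}
```

With these definitions, "on_each_host uptime" would visit every host in the list, while "HOSTNAMES=slave006" set beforehand restricts it to that one machine. Setting REMOTE_SHELL=echo prints what would be run instead of running it.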

Generation:

My preferred way of generating the hostsup file is to use the nmap command from master.

	nmap -sP 192.168.2.0/24 | sed -n -e '/be up/p' | grep -v master > hostsup
or
	nmap -sP 192.168.2.0/24 | grep 'be up' | grep -v master > hostsup

Both of those command lines give identical output. Choose your favourite.

Nmap pings all hosts on the 192.168.2.0/24 subnet, and the output is piped through sed or grep to list only those machines which are "up". "master" is then removed from the list, which is finally written to a file named "hostsup". You should copy this file to wherever your shell scripts expect to find it. I use /root/hostsup, which is where all the cluster maintenance scripts I wrote expect to find it. I am sure there is a better way to do this.

If we don't remove the $master-server, then we will end up treating it like just another slave and we can expect random things to start breaking.
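
The generation step can be wrapped in a small script, sketched below under a few assumptions. The function names filter_up_hosts and generate_hostsup are hypothetical, and writing to a temporary file before moving it into place is my own addition rather than something the original scripts are known to do. The sample line format in the comments is inferred from the grep and cut commands used here and may differ between nmap versions.

```shell
#!/bin/sh
# Read ping-scan output on stdin and keep only lines for hosts that are up,
# dropping the master. Assumed line shape (inferred, not verified):
#   Host slave001 (192.168.2.11) appears to be up.
filter_up_hosts() {
    grep 'be up' | grep -v master
}

# Regenerate the host list. The result is written to a temporary file first,
# so other scripts never read a half-written list. If the filter produces no
# lines (grep exits non-zero), the existing file is left untouched.
generate_hostsup() {
    hostsup_out=${1:-/root/hostsup}
    hostsup_tmp="$hostsup_out.tmp.$$"
    if nmap -sP 192.168.2.0/24 | filter_up_hosts > "$hostsup_tmp"; then
        mv "$hostsup_tmp" "$hostsup_out"
    else
        rm -f "$hostsup_tmp"
        return 1
    fi
}
```

Calling "generate_hostsup" with no argument would refresh /root/hostsup directly; passing a path writes the list somewhere else for inspection first.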

Cluster Handling Scripts

if [ -z "$HOSTNAMES" ]; then HOSTNAMES=`cut -d' ' -f2 /root/hostsup`; fi

Putting that stanza at the beginning of a shell script lets us refer to a subset of machines on the command line; if no subset is given, the full list from the /root/hostsup file is used.

TODO: The hostsup list should also take into account the link status in $TFTPROOT/$IP, since a machine may have hung during base install yet still respond to ping.

Next: Preventing ssh from Prompting