Xsan 2.1.1: Script to reboot Intel-based Xserve MDCs after failover
View the script to reboot Intel-based Xserve MDCs after failover.
The script in this article immediately reboots an Xsan Metadata Controller (MDC) that has lost control of an Xsan volume after a failover. If a failover occurs for any reason, the script is run by the MDC that takes control of the volume. The script uses Lights Out Management (LOM) to send a reboot signal to the previously active MDC. This ensures that the previously active MDC cannot modify the Xsan volume’s metadata after a failover.
Xsan includes multiple safeguards to avoid a situation where an Xsan volume is active on more than one MDC. This script is not required. It is provided as an example for sites where an additional hardware-based method is desired to prevent this unlikely scenario.
Use of this script is optional and at your own risk. Verify system requirements before installing. Apple does not provide support for modifications of this script.
Notes
This script is designed for Xsans with two MDCs running Xsan 2.1.1. Do not deploy this script if your Xsan volume has more than two MDCs.
Both MDCs must be Intel-based Xserves because of the LOM commands this script uses to send the reboot signal.
This script can be used in Xsans with more than one volume. Each volume should be configured to run on both MDCs with the same failover priority.
It is recommended that only Xsan and Open Directory services be hosted on Xsan MDCs. If Open Directory services are running on the MDCs, it is important that an Open Directory Replica be available at all times. This will ensure continual availability of Open Directory services should an MDC be rebooted by this script.
Important: The MDC’s internal hard drives should be formatted as Mac OS Extended with journaling enabled.
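Because the script runs each time a volume fails over, a SAN with several volumes can invoke it several times in quick succession. The script guards against repeated resets by touching a marker file (/private/tmp/.cvfail) after a successful reset and skipping any further reset within reset_interval (15) seconds. The sketch below restates that guard as a standalone function for illustration only. The function names are hypothetical, and where the article's script uses the macOS-specific stat -s, this version assumes that either stat -c %Z (GNU) or stat -f %c (BSD/macOS) is available.

```shell
#!/bin/sh
# Sketch of cvfail's "already reset" guard as a standalone function.
# Assumption: instead of macOS "stat -s", try GNU "stat -c %Z" first and fall
# back to BSD/macOS "stat -f %c"; both print the file's status-change time
# (ctime) in seconds since the epoch.
file_ctime() {
    stat -c %Z "$1" 2>/dev/null || stat -f %c "$1"
}

# reset_recently MARKER INTERVAL
# Succeeds (returns 0) if MARKER exists and was touched no more than
# INTERVAL seconds ago; in that case cvfail exits without sending a reset.
reset_recently() {
    marker="$1"
    interval="$2"
    [ -f "$marker" ] || return 1
    age=$(( $(date +%s) - $(file_ctime "$marker") ))
    [ "$age" -le "$interval" ]
}

# Example with the values used in the article's script
last_reset='/private/tmp/.cvfail'
reset_interval=15
if reset_recently "$last_reset" "$reset_interval"; then
    echo "MDC already reset. Will not reset again."
fi
```

The marker's ctime is used rather than its contents, so a plain touch after each successful reset is all the bookkeeping the guard needs.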
Installing the script
Perform these steps on both MDCs:
Use the steps described in this article to configure the LOM Addresses.
Create a LOM password file by executing this command:
sudo sh -c "echo PASSWORD > /private/var/root/Other_MDCs_LOM_Password"
Replace ‘PASSWORD’ with the LOM administrator’s password on the other MDC.
Restrict access to the LOM password file by executing the following commands in Terminal:
sudo chmod 400 /private/var/root/Other_MDCs_LOM_Password
sudo chown root:wheel /private/var/root/Other_MDCs_LOM_Password
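The echo command above places the password on a command line, where it is recorded in your shell history, and the file is created with the default umask until the chmod command runs. If that matters at your site, a sketch like the following reads the password from standard input instead and creates the file with owner-only permissions from the start. It is a hypothetical helper, not part of Xsan.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xsan): create the LOM password file so it
# starts out owner-only and the password never appears on a command line.
# Run it as root for the real file.
write_lom_password_file() {
    target="$1"
    rm -f "$target"   # start fresh so the umask below applies to a new file
    (
        umask 077     # files created in this subshell start out as 0600
        # Read one line from standard input; accept a missing final newline
        IFS= read -r pw || [ -n "$pw" ] || exit 1
        printf '%s\n' "$pw" > "$target"
    ) || return 1
    chmod 400 "$target"   # the permissions this article specifies
}
```

For example, in a root shell (sudo -s), define the function, run write_lom_password_file /private/var/root/Other_MDCs_LOM_Password, type the other MDC's LOM password, and press Return. Characters are echoed as you type unless you run stty -echo first. The chown command above still applies.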
Back up the original script by executing this Terminal command:
sudo mv /Library/Filesystems/Xsan/bin/cvfail /Library/Filesystems/Xsan/bin/cvfail.bak
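Running that mv command a second time, after the new script is in place, would move the edited script over cvfail.bak and lose the original. A hypothetical guard such as the following refuses to overwrite an existing backup.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xsan): back up the original cvfail once.
# A repeated plain "mv" would overwrite cvfail.bak with the edited script,
# so refuse whenever a backup already exists.
backup_original_cvfail() {
    script="${1:-/Library/Filesystems/Xsan/bin/cvfail}"
    backup="$script.bak"
    if [ -e "$backup" ]; then
        echo "backup already exists: $backup (not overwritten)" >&2
        return 1
    fi
    if [ ! -e "$script" ]; then
        echo "nothing to back up: $script not found" >&2
        return 1
    fi
    mv "$script" "$backup"
}
```

Run it as root. To restore the original script later, reverse the move: sudo mv /Library/Filesystems/Xsan/bin/cvfail.bak /Library/Filesystems/Xsan/bin/cvfail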
To create the new script, copy the text below, beginning with the line “#!/bin/sh” and ending with the “# end script” line. Paste the text into a new plain text document in TextEdit, using these guidelines.
#!/bin/sh
# cvfail
# This script is intended for use in Xsan 2.1.1.
# This script may be replaced by a future software update.
# -------------- edit the variables below this line ----------------
# For more information on these settings: http://support.apple.com/kb/HT3620
# Set reset_enabled to 'yes' to reset the other MDC upon failover.
# Set to 'no' to disable reset on both MDCs before performing maintenance.
reset_enabled='no'
# IP address of other MDC's Lights Out Management (LOM) interface
lom_ip='Other_MDCs_LOM_IP_address'
# LOM admin user on other MDC
lom_username='Other_MDCs_LOM_admin_username'
# LOM password for other MDC is stored in this file
lom_password_file='/private/var/root/Other_MDCs_LOM_Password'
# name of SAN
san_name='My Xsan'
# ------------------ do not edit below this line ------------------
hostname="$1"
fsm_port="$2"
fs_name="$3"
last_reset='/private/tmp/.cvfail'
reset_interval=15
sendNotification() {
# Note: notification method is subject to change in future versions
if [ ! "$subject" ]; then
command="xsan:command = sendFailover
xsan:hostname = $hostname
xsan:volume = $fs_name"
else
command="xsan:command = sendNotification
xsan:messageSubject = $subject
xsan:messageBody = $body
xsan:messageType = failover"
fi
echo "$command" | /usr/sbin/serveradmin command &
}
messageBody() {
# strip line breaks and quotation marks from the message text
input=$(echo $1 | /usr/bin/tr -d '\n\r' | /usr/bin/sed "s/[\"']//g")
if [ "$body" ]; then
body="$body $input"
else
body="$input"
fi
}
# do not reset the other MDC when in maintenance mode
if [ "$(echo "$reset_enabled" | /usr/bin/awk '{print tolower($0)}')" != 'yes' ]; then
echo "cvfail $fs_name: Maintenance mode. MDC will not be reset."
sendNotification
exit 0
fi
# do not reset the other MDC if it has already been reset within reset interval
if [ -f "$last_reset" ]; then
eval $(/usr/bin/stat -s "$last_reset")
time_since_last_reset=$(($(/bin/date +%s) - $st_ctime))
if [ $time_since_last_reset -le $reset_interval ]; then
echo "cvfail $fs_name: MDC already reset. Will not reset again."
sendNotification
exit 0
fi
fi
# check the password file
if [ ! -r "$lom_password_file" ]; then
echo "cvfail $fs_name: $lom_password_file: Cannot read file or file does not exist."
subject="$san_name: Volume $fs_name did not fail over"
messageBody "The failover script for the volume $fs_name in $san_name"
messageBody "did not complete successfully on $hostname because"
messageBody "the password file ($lom_password_file) could not be read or does not exist."
sendNotification
exit 1
fi
# reset the other MDC
echo "cvfail $fs_name: Sending reset command as '$lom_username' to LOM IP: $lom_ip"
ipmitool_output=`/usr/bin/ipmitool -I lan -U "$lom_username" \
-f "$lom_password_file" -H "$lom_ip" chassis power reset 2>&1`
ipmitool_status=$?
# send the appropriate notification
if [ $ipmitool_status -eq 0 ]; then
echo "cvfail $fs_name: MDC reset succeeded."
/usr/bin/touch "$last_reset"
sendNotification
exit 0
else
echo "cvfail $fs_name: MDC reset failed. ipmitool: $ipmitool_output"
subject="$san_name: Volume $fs_name did not fail over"
messageBody "The failover script for the volume $fs_name in $san_name"
messageBody "did not complete successfully on $hostname because"
messageBody "an ipmitool error occurred: $ipmitool_output"
sendNotification
exit 1
fi
# end script
Edit the variables in the script between the lines containing “edit the variables below this line” and “do not edit below this line”. Make sure the values are plain, unformatted text. Remember that the LOM values (lom_ip, lom_username, and lom_password_file) refer to the other MDC, not the one you are editing. Keep the single quotation marks around each value.
reset_enabled='yes'
Leave this value set to yes on both MDCs during normal operation.
Important: To prevent the script from rebooting the other MDC during planned maintenance, change this value to no on both MDCs before performing maintenance. Planned maintenance tasks include running Software Update on either MDC, starting a volume, changing volume settings, or forcing failover.
san_name='My Xsan'
Replace My Xsan with the name of your SAN. To find it, select Overview in the left column of Xsan Admin and note the Name value.
Save the new script in the following location:
/Library/Filesystems/Xsan/bin/cvfail
Execute the following commands in Terminal:
sudo chmod 544 /Library/Filesystems/Xsan/bin/cvfail
sudo chown root:wheel /Library/Filesystems/Xsan/bin/cvfail
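Because reset_enabled must be changed on both MDCs before and after every maintenance window, you may prefer not to edit the file by hand each time. The following sketch is a hypothetical helper, not an Apple tool. It rewrites only the uncommented reset_enabled= line, keeps the single quotes, and writes the result back in place so that the 544 mode and root:wheel ownership set above are preserved.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xsan): set reset_enabled to 'yes' or 'no'
# in an installed cvfail script, e.g. before and after planned maintenance.
# Run it as root on BOTH MDCs; root can write the file despite its 544 mode.
set_reset_enabled() {
    value="$1"
    script="${2:-/Library/Filesystems/Xsan/bin/cvfail}"
    case "$value" in
        yes|no) ;;
        *) echo "usage: set_reset_enabled yes|no [script]" >&2; return 2 ;;
    esac
    # Edit only if there is exactly one uncommented assignment line
    count=$(grep -c '^reset_enabled=' "$script" 2>/dev/null)
    if [ "$count" != 1 ]; then
        echo "expected one reset_enabled line in $script, found ${count:-none}" >&2
        return 1
    fi
    tmp=$(mktemp "${TMPDIR:-/tmp}/cvfail.XXXXXX") || return 1
    # Replace the whole line, keeping the single quotes around the value
    sed "s/^reset_enabled=.*/reset_enabled='$value'/" "$script" > "$tmp" &&
        cat "$tmp" > "$script"   # overwrite in place: mode and owner are kept
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, in a root shell on each MDC, run set_reset_enabled no before maintenance and set_reset_enabled yes afterward, then confirm the result with grep '^reset_enabled=' /Library/Filesystems/Xsan/bin/cvfail.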
Learn more
Testing the scripts
After the script has been installed on both MDCs, you can test it by following these steps.
Open Xsan Admin and select Volumes under SAN Assets. Observe the Hosted By value for the Xsan volume you want to test.
Click the action button and select Force Failover.
Note: If you are running Xsan Admin on the MDC that was running the volume, the system will reboot immediately. If you are running Xsan Admin on another Mac, Xsan Admin may become unresponsive after the Force Failover command. If Xsan Admin’s “Failing-over volume” progress dialog does not close after one minute, force Xsan Admin to quit.
Reconnect to your SAN in Xsan Admin, if needed. Check the Hosted By value for the volume that was failed over. If the failover succeeded, the value should now be the name of the other MDC.
Note: Testing failover may result in a momentary disruption of Xsan volume availability. Avoid possible disruptions to production by testing failover during non-production hours.
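Before forcing a failover, you may want to confirm from each MDC that the other MDC's LOM accepts the configured address, user name, and password file. The ipmitool chassis power status command takes the same connection options as the script's reset command but only reports the power state, so nothing is rebooted. The sketch below wraps that check. The function name and the IPMITOOL override are assumptions added for illustration, and the three values must be set to match your cvfail script.

```shell
#!/bin/sh
# Hypothetical preflight check (not part of Xsan): query the other MDC's LOM.
# "chassis power status" only reports the power state; it does not reset.
# Set these to the same values as in your cvfail script.
lom_ip='Other_MDCs_LOM_IP_address'
lom_username='Other_MDCs_LOM_admin_username'
lom_password_file='/private/var/root/Other_MDCs_LOM_Password'
IPMITOOL="${IPMITOOL:-/usr/bin/ipmitool}"   # overridable, e.g. for testing

check_lom() {
    if [ ! -r "$lom_password_file" ]; then
        echo "cannot read $lom_password_file (run as root?)" >&2
        return 1
    fi
    # Same interface and credential options the script uses for its reset
    if output=$("$IPMITOOL" -I lan -U "$lom_username" -f "$lom_password_file" \
            -H "$lom_ip" chassis power status 2>&1); then
        echo "LOM reachable: $output"
    else
        echo "LOM check failed: $output" >&2
        return 1
    fi
}
```

Run check_lom as root on each MDC, since the password file is readable only by root.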