Troubleshooting Xsan volume failover issues
Learn about troubleshooting Xsan volume failover issues.
What is "volume failover"?
If two or more Metadata Controllers (MDCs) are configured to host an Xsan volume and the active MDC becomes inactive, another MDC should take control of the volume. The act of a standby MDC assuming active hosting from another MDC is called "failover."
Understanding failover-related communications
When you configure Xsan, you designate the subnet over which Xsan metadata communication occurs. MDCs exchange failover-related heartbeats over this subnet. If this heartbeat communication cannot occur because of network conditions or misconfiguration, failover will not occur as expected.
To determine which interface is being used for metadata communications, follow these steps:
Xsan 2.0 and later:
Open Xsan Admin and select the Overview pane. The Metadata Subnet is displayed in the Details section.
Xsan 1.4.2 and earlier:
Open Xsan Admin and select Setup > Computers. Inspect the settings for each MDC. Observe the "Access the SAN via" setting, which designates the subnet over which metadata communications will occur.
When may failover occur?
The following events may result in failover of all Xsan volumes:
The active MDC is unreachable by the standby MDC via the Ethernet network used for metadata communications.
The active MDC is rebooted or shut down.
The following events will result in failover of a specific Xsan volume:
The active MDC cannot see one or more LUNs associated with the volume.
Xsan Admin or cvadmin is used to failover a volume.
Note: Following a failover event, a volume does not automatically fail back to the MDC that was hosting it prior to the failover.
Use these suggestions to troubleshoot failover-related issues.
Verify network paths
Use the ping command to verify that there is a network path between each MDC on the metadata Ethernet network. Ensure that no firewall rules prevent communication between the MDCs.
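The reachability check above can be sketched as a short shell helper. The function name (check_mdcs) is introduced here for illustration, and the addresses you pass it should be your MDCs' metadata-network IPs:

```shell
# Sketch: verify that each MDC on the metadata network answers a ping.
# check_mdcs is a hypothetical helper; pass it your MDCs' metadata IPs.
check_mdcs() {
  for mdc in "$@"; do
    # one ping per host; output suppressed, only the exit status matters
    if ping -c 1 "$mdc" > /dev/null 2>&1; then
      echo "$mdc: reachable"
    else
      echo "$mdc: NOT reachable - check cabling and firewall rules"
    fi
  done
}
```

For example, `check_mdcs 10.0.0.1 10.0.0.2` would report the reachability of both MDCs in the sample configuration described later in this article.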
Verify metadata controller fsnameservers configuration
Failover will not occur as expected if the MDCs are configured to communicate metadata over different subnets. To verify that the MDCs are communicating via the proper network interface, inspect the fsnameservers file, located in /Library/Filesystems/Xsan/config, on each MDC. The fsnameservers file should contain the IP address of the metadata interface of each MDC.
For example, if two MDCs were intended to communicate metadata information over the 10.0.0.0/24 subnet, then each MDC's fsnameservers file would correctly contain the following example entries:
10.0.0.1
10.0.0.2
An incorrect example would be:
10.0.0.1
192.168.2.2
If the fsnameservers file is incorrectly configured, correct the issue as follows:
Xsan 2.0 and later: Open Xsan Admin and select the Computers pane. From the list of computers, choose the computer whose IP address is incorrectly listed in the fsnameservers file. Click the Action button and select "Remove Computer from SAN". Once removed, click the + button to add the computer back to the SAN. Once the computer has been added back to the SAN, inspect each MDC's fsnameservers file to verify that the contents have been updated to contain the appropriate information.
Xsan 1.4.2 and earlier: Open Xsan Admin and choose Setup > Computers. Inspect the settings for each MDC. Configure the "Access the SAN via" setting so that each MDC communicates metadata via the subnet reserved for metadata communications. Once the configuration has been corrected, inspect each MDC's fsnameservers file to verify that the contents have been updated to contain the appropriate information.
Note: If the IP address of an MDC's metadata network interface is changed, the MDC's fsnameservers file is not automatically updated to reflect the new address. For more information about changing the IP address of an MDC, see "Changing a Controller's IP Address" in the Xsan Administrator Guide.
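As a quick sanity check, the expectation that every fsnameservers entry lies on a single metadata subnet can be sketched as follows. This is an illustrative helper only; it assumes IPv4 entries and a 24-bit subnet prefix, which may not match your network:

```shell
# Sketch: confirm every IPv4 entry in an fsnameservers file shares one
# /24 subnet. The /24 assumption is illustrative; adjust as needed.
check_fsnameservers() {
  file="$1"
  # Strip the last octet from each IPv4 line, then keep distinct prefixes.
  prefixes=$(sed -n 's/^\([0-9]*\.[0-9]*\.[0-9]*\)\.[0-9]*$/\1/p' "$file" | sort -u)
  if [ -n "$prefixes" ] && [ "$(echo "$prefixes" | wc -l)" -eq 1 ]; then
    echo "OK: all entries on subnet ${prefixes}.0/24"
  else
    echo "WARNING: entries span multiple subnets:"
    echo "$prefixes"
  fi
}
```

Running `check_fsnameservers /Library/Filesystems/Xsan/config/fsnameservers` on each MDC would flag the incorrect example above (10.0.0.1 together with 192.168.2.2) as spanning multiple subnets.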
Ensure that ‘fsm’ is running on each MDC
In order for an Xsan volume to fail over, standby MDCs must be running an instance of the fsm process for each Xsan volume. Verify that each Xsan volume's fsm process is running on each MDC.
Xsan Admin 2.1 and later
Open Xsan Admin.
Click the Volumes pane.
Inspect each volume for a yellow exclamation mark.
If a volume has an exclamation mark on it, at least one MDC is not running the fsm process associated with that volume. Follow the suggestions in this article.
Xsan Admin 2.0 and earlier
Execute the following command in Terminal:
If running Mac OS X 10.4.x or earlier:
ps -auxwww | grep -i YourVolumeName
If running Mac OS X 10.5 or later:
ps auxwww | grep -i YourVolumeName
Replace YourVolumeName with the name of your Xsan volume.
This example output reflects that the fsm associated with MyXsanVolume is running:
root 67967 0.0 2.6 289876 54248 ?? Ss 9:00AM 0:32.14 /Library/Filesystems/Xsan/bin/fsm MyXsanVolume MyHost.domain.com 0
If the fsm process associated with an Xsan volume is not running, reboot the MDC. Afterwards, verify that the process is running.
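The manual ps check above can be wrapped in a small helper. This is a sketch only; the check_fsm name is hypothetical, and the pattern it matches assumes fsm appears in the ps listing in the form shown in the example output:

```shell
# Sketch: report whether an fsm process for a given volume appears in
# a ps listing supplied on stdin.
# Usage: ps auxwww | check_fsm YourVolumeName
check_fsm() {
  vol="$1"
  # The [f] bracket trick keeps grep from matching its own command line.
  if grep -q "[f]sm ${vol}" ; then
    echo "fsm for ${vol} is running"
  else
    echo "fsm for ${vol} is NOT running - consider rebooting this MDC"
  fi
}
```

For example, `ps auxwww | check_fsm MyXsanVolume` would report "running" given the example listing above.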
Testing failover
Note: Testing failover may result in a momentary disruption in volume availability. Avoid possible disruptions to production by testing failover during off-production hours.
Xsan 2.0 and later:
Open Xsan Admin and select the Volumes pane. Observe the Hosted By value for the Xsan Volume you wish to test.
Click the action button and select Force Failover.
Observe the Hosted By value for the Xsan volume that was failed over. If the failover was successful, the value should change to the name of another MDC.
Xsan 1.4.2 and earlier:
Observe which MDC is controlling the Xsan volume by selecting the volume from the SAN Components sidebar in Xsan Admin and observing the Hosted by value.
To initiate a volume failover, issue the following command in the Terminal:
sudo cvadmin -e 'fail YourVolumeName'
Replace YourVolumeName with the name of the Xsan volume you wish to fail over.
Observe the Hosted By value for the Xsan volume that was failed over. If the failover was successful, the value should change to the name of another MDC.
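The cvadmin-based test above can be sketched as a helper that records the FSM listing before and after the forced failover. This assumes cvadmin's 'select' (list file systems) and 'fail' commands; the CVADMIN and FAILOVER_WAIT variables are introduced here for illustration so the sequence can be dry-run against a stub:

```shell
# Sketch: force a failover with cvadmin and show the FSM listing
# before and after. CVADMIN is overridable for dry runs; the wait
# time before re-checking is an assumption, not a documented value.
CVADMIN="${CVADMIN:-sudo cvadmin}"

force_failover() {
  vol="$1"
  echo "--- before failover ---"
  $CVADMIN -e 'select'
  $CVADMIN -e "fail ${vol}"
  # give the standby FSM time to activate before re-checking
  sleep "${FAILOVER_WAIT:-10}"
  echo "--- after failover ---"
  $CVADMIN -e 'select'
}
```

For example, `force_failover MyXsanVolume` would print the listings so you can confirm that a different MDC is active afterward. Remember the earlier note: testing failover may momentarily disrupt volume availability.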