Xsan: Troubleshooting "Disk Stripe Group DOWN for this client" errors

Products Affected

Xsan 1.4, Xsan 2, Xsan 2.1, Xsan 2.2

Symptoms

  1. An Xsan metadata controller's system log may show repeating messages similar to the following:
    Aug 25 08:37:12 mdc1 fsm[123]: Xsan FSS 'MyVolume[0]': [Node 42] Disk Stripe Group 1 is DOWN for this client. # disks 2 unitmap[1] 0xfffff partaccess 0x1
  2. Xsan Admin may report that there are no visible LUNs for an Xsan client.
  3. Attempting to mount a volume in Xsan Admin may fail with this alert: "Not all data LUNs of the volume are visible to this computer. Check the Fibre Channel cables and try again."

Resolution

Resolving LUN visibility issues

If you know which Xsan system is affected, follow these steps:

  1. Restart the affected computer.
  2. Check the configuration of the Fibre Channel switch to be sure the SAN components are in the same Fibre Channel zone.
  3. See this article.

Identifying affected systems

If you are not sure which Xsan system is affected, check for computers with errors in Xsan Admin's computers pane. Alternatively, you can view the volume log in Xsan Admin or read the file at this path:

/Library/Filesystems/Xsan/data/Volume_Name/log/cvlog

In the volume log, look for the most recent occurrence of the "Disk Stripe Group DOWN" alert message. Then, look for a log entry above this line that has the same timestamp and node number and contains the words "Client Login". That entry shows the IP address or hostname of the affected system. You should see output such as this:

[0825 08:37:12] 0xabcdef01 (Info) Node [42] [xsanclient1.example.com:55045] Client Login (active 3).
[0825 08:37:12] 0xabcdef01 (Warning) [Node 42] Disk Stripe Group 1 is DOWN for this client. # disks 2 unitmap[1] 0xfffff partaccess 0x1


From these log messages, you can see that the affected system has a hostname of "xsanclient1.example.com".
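
The lookup described above can also be scripted. The following Python sketch is illustrative only and is not part of Xsan: the function name find_affected_client and both regular expressions are assumptions modeled on the two sample lines shown above, and real log lines may be formatted differently. It finds the most recent "Disk Stripe Group DOWN" message, then searches upward for a "Client Login" entry with the same timestamp and node number.

```python
import re

# Patterns modeled on the two sample log lines above (assumption: real
# volume logs follow the same layout).
LOGIN_RE = re.compile(
    r"^\[(?P<ts>\d{4} \d{2}:\d{2}:\d{2})\].*?"
    r"Node \[(?P<node>\d+)\] \[(?P<host>[^\]]+)\] Client Login"
)
DOWN_RE = re.compile(
    r"^\[(?P<ts>\d{4} \d{2}:\d{2}:\d{2})\].*?"
    r"\[Node (?P<node>\d+)\] Disk Stripe Group (?P<group>\d+) is DOWN"
)


def find_affected_client(lines):
    """Return (host, stripe_group) for the most recent DOWN message, or None."""
    lines = list(lines)
    # Scan from the end of the log to find the most recent DOWN message.
    for i in range(len(lines) - 1, -1, -1):
        down = DOWN_RE.search(lines[i])
        if not down:
            continue
        # Walk upward looking for the matching Client Login entry.
        for j in range(i - 1, -1, -1):
            login = LOGIN_RE.search(lines[j])
            if (login and login["ts"] == down["ts"]
                    and login["node"] == down["node"]):
                # Drop the trailing ":port" from "host:port".
                host = login["host"].rsplit(":", 1)[0]
                return host, int(down["group"])
        return None  # the most recent DOWN message has no matching login
    return None


sample = [
    "[0825 08:37:12] 0xabcdef01 (Info) Node [42] "
    "[xsanclient1.example.com:55045] Client Login (active 3).",
    "[0825 08:37:12] 0xabcdef01 (Warning) [Node 42] Disk Stripe Group 1 is "
    "DOWN for this client. # disks 2 unitmap[1] 0xfffff partaccess 0x1",
]
print(find_affected_client(sample))
```

To run the sketch against a real volume, pass it the lines of the cvlog file at the path given above instead of the sample list.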

Once you know which system is affected, follow the steps in the "Resolving LUN visibility issues" section above.

Additional Information

Fibre Channel issues can prevent an Xsan client from seeing LUNs. If the client can't see all of an Xsan volume's data LUNs, the client will be unable to mount the Xsan volume. Each time the client tries to mount the volume, the Xsan volume log will show a "Client Login" line with a new node number followed by one or more "Disk Stripe Group n is DOWN" messages. The number "n" indicates which stripe group has missing LUN(s).
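
As a rough illustration of the pattern described above, the following Python sketch groups each "Client Login" entry with the "Disk Stripe Group n is DOWN" messages that share its node number. The function name failed_mount_attempts, the regular expressions, and the second pair of sample lines (node 43) are assumptions made for this example, not Xsan output; the sketch also assumes a node number is not reused within the portion of the log being examined.

```python
import re

# Patterns modeled on the sample log lines in this article (assumption).
LOGIN_RE = re.compile(r"Node \[(\d+)\] \[([^\]]+)\] Client Login")
DOWN_RE = re.compile(r"\[Node (\d+)\] Disk Stripe Group (\d+) is DOWN")


def failed_mount_attempts(lines):
    """Map each node number to its client and the stripe groups reported DOWN."""
    attempts = {}  # node number -> {"client": str, "down_groups": [int, ...]}
    for line in lines:
        login = LOGIN_RE.search(line)
        if login:
            attempts[login[1]] = {"client": login[2], "down_groups": []}
            continue
        down = DOWN_RE.search(line)
        if down and down[1] in attempts:
            attempts[down[1]]["down_groups"].append(int(down[2]))
    # Keep only logins that were followed by at least one DOWN message.
    return {n: a for n, a in attempts.items() if a["down_groups"]}


# The node 42 lines come from this article; the node 43 lines are made up
# to show a second mount attempt by the same client.
sample_log = [
    "[0825 08:37:12] 0xabcdef01 (Info) Node [42] "
    "[xsanclient1.example.com:55045] Client Login (active 3).",
    "[0825 08:37:12] 0xabcdef01 (Warning) [Node 42] Disk Stripe Group 1 is "
    "DOWN for this client. # disks 2 unitmap[1] 0xfffff partaccess 0x1",
    "[0825 08:39:40] 0xabcdef01 (Info) Node [43] "
    "[xsanclient1.example.com:55102] Client Login (active 3).",
    "[0825 08:39:40] 0xabcdef01 (Warning) [Node 43] Disk Stripe Group 1 is "
    "DOWN for this client. # disks 2 unitmap[1] 0xfffff partaccess 0x1",
]
for node, info in failed_mount_attempts(sample_log).items():
    print(node, info["client"], info["down_groups"])
```

Several entries for the same client with different node numbers and the same stripe group suggest repeated mount attempts that fail because the LUNs of that stripe group are not visible to the client.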
