StorNext labels gone? DON’T panic!

Topic

In SAN environments labels are often used to address a specific LUN, rather than using the device identifiers or numbers, to simplify the addressing of devices. Plus, a device does not have to be tied to a unique device number/id.

Losing a label due to a crash, a defective hardware part, or simply because a client overwrote the label leaves the administrator in a quite scary and definitely uncomfortable situation. Data can’t be accessed or could even be lost.

 

Situation

If you haven’t experienced a lost label on a specific LUN in a StorNext environment yourself, you have at least heard stories where the label magically disappeared. The result is that no client can mount the volume with the affected LUN, and therefore no client has access to it. No access to the data…? Where is the data…??? NO, don’t do this to me! PANIC…?!! Or not…?

Troubleshoot

For starters, you have to realize that it is not the LUN that is missing but the label. THAT is the root cause. Missing labels cause Windows and OS X clients to drop the connection to the affected file system, although the LUN is actually still present. Linux caches the labels and continues to work until the next reboot, which may lead to confusion. To be sure your problem is actually caused by a missing label, you can either reboot the Linux client or rescan the SCSI bus (e.g. scsi-rescan --forcerescan will go out and rescan the fabric and drop cached LUNs, and therefore labels).

With this cache purged, the next best command is “cvlabel”. Cvlabel has many options, but in my experience the most valuable option is “-c”, which provides information about the labels, device numbers and serial numbers per LUN. Yes, the -L (long) option will give you serial numbers as well, but the output of -c keeps the label name in the first column, which simplifies the maintenance.

If the count of LUNs matches the number you expected, you can confirm that the issue is caused by a missing label rather than a hardware issue (e.g. dead SFP, broken wire, etc). The number of LUNs for the affected file system is defined in the StorNext file system configuration file. While it was simple to count the LUNs in the ASCII configuration, the XML-based configuration is a bit trickier. To identify the missing label, look for the keyword CvfsDisk_UNKNOWN in the output.

[root #12] cvlabel -c
data01 /dev/sdc 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE95C
CvfsDisk_UNKNOWN /dev/sde 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE38E
CvfsDisk_UNKNOWN /dev/sdf 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE32C
CvfsDisk_UNKNOWN /dev/sdg 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE3D1
meta /dev/sdb 217823199 EFI   # host 0 lun 0 sectors 217823199 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2004DA2001938E48F0D6DCD5D
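On a system with many LUNs, picking the unlabeled devices out of this listing by eye gets tedious. The following is a minimal Python sketch, not part of StorNext, that filters saved `cvlabel -c` output for CvfsDisk_UNKNOWN entries. It assumes the line layout shown above (label first, device path second, serial number after the word "serial"); the function names `parse_cvlabel_c` and `unlabeled` are made up for this illustration.

```python
import re

UNKNOWN_LABEL = "CvfsDisk_UNKNOWN"

def parse_cvlabel_c(text):
    """Return one dict per device line of `cvlabel -c`-style output.

    Assumes the layout seen in the listing above:
    <label> /dev/<name> <sectors> <type> # ... serial <serial>
    """
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Skip prompts, blank lines and anything that is not a device line.
        if len(fields) < 2 or not fields[1].startswith("/dev/"):
            continue
        serial = re.search(r"\bserial\s+(\S+)", line)
        devices.append({
            "label": fields[0],
            "device": fields[1],
            "serial": serial.group(1) if serial else None,
        })
    return devices

def unlabeled(devices):
    """Keep only the devices whose label slot reads CvfsDisk_UNKNOWN."""
    return [d for d in devices if d["label"] == UNKNOWN_LABEL]

# The listing from above, as it might be saved to a file.
SAMPLE = """\
[root #12] cvlabel -c
data01 /dev/sdc 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE95C
CvfsDisk_UNKNOWN /dev/sde 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE38E
CvfsDisk_UNKNOWN /dev/sdf 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE32C
CvfsDisk_UNKNOWN /dev/sdg 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE3D1
meta /dev/sdb 217823199 EFI   # host 0 lun 0 sectors 217823199 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2004DA2001938E48F0D6DCD5D
"""

if __name__ == "__main__":
    devices = parse_cvlabel_c(SAMPLE)
    print("LUNs seen:", len(devices))
    for d in unlabeled(devices):
        print("missing label:", d["device"], d["serial"])
```

The total device count is printed as well, so it can be compared against the number of LUNs expected from the file system configuration.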

Solution

There are 3 options to reapply the label to one or more LUNs:

Option 1: Lucky

If you’re lucky, there is a fairly recent output of cvlabel available that includes a record of the lost label. Simply compare the serial number from the current cvlabel -c output with the archived list from before and reapply the label. How to easily label a LUN will be explained further down.
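The serial number comparison can be sketched in a few lines of Python. This is only an illustration under assumptions: both listings are saved `cvlabel -c` outputs in the layout shown earlier, the archived label names (data02, data03) and the archived device paths are invented for the example, and `propose_labels` is a hypothetical helper name. The sketch only reports which label each unlabeled device used to carry; it does not write anything to disk.

```python
import re

UNKNOWN_LABEL = "CvfsDisk_UNKNOWN"
# <label> /dev/<name> ... serial <serial>
LINE_RE = re.compile(r"^(\S+)\s+(/dev/\S+)\s.*\bserial\s+(\S+)")

def parse(text):
    """Yield (label, device, serial) for each device line."""
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            yield m.groups()

def propose_labels(archived_text, current_text):
    """Match currently unlabeled devices to archived labels by serial number.

    Returns (device, serial, old_label) tuples; old_label is None when the
    serial does not appear in the archived listing.
    """
    known = {
        serial: label
        for label, _device, serial in parse(archived_text)
        if label != UNKNOWN_LABEL
    }
    return [
        (device, serial, known.get(serial))
        for label, device, serial in parse(current_text)
        if label == UNKNOWN_LABEL
    ]

# Hypothetical archived listing: label names and device paths are made up.
ARCHIVED = """\
data01 /dev/sdc 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE95C
data02 /dev/sdd 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE38E
data03 /dev/sde 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE32C
"""

# Current listing: two labels lost, and the device paths have shifted.
CURRENT = """\
data01 /dev/sdc 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE95C
CvfsDisk_UNKNOWN /dev/sde 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE38E
CvfsDisk_UNKNOWN /dev/sdf 4678727647 EFI   # host 0 lun 0 sectors 4678727647 sector_size 512 inquiry [LSI     MR9260-16i     2.13] serial 600062B2005322C018CCC98B128FE32C
"""

if __name__ == "__main__":
    for device, serial, old_label in propose_labels(ARCHIVED, CURRENT):
        print(device, serial, "->", old_label or "no archived label found")
```

Matching on the serial number rather than the device path is deliberate: paths such as /dev/sde can change between reboots or rescans, while the serial stays with the LUN.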

3 COMMENTS

  1. Ran Pergamin September 13, 2015 at 11:39 pm

    Great article Roger.

    A couple of important comments I learned while recovering 2 x 300TB file systems last week:

    1. When using the nssdbg.log file, ensure that you take the latest (date/time) list of labels, because there may be old relabeled devices. I grep-ed the relevant lines, imported to xls and took the latest.

    2. Labels are loaded on boot. If you have a client/MDCs mounted you can run cvadmin->disk and see the labels and paths. While this doesn’t have serials, if your paths are not changed, it can be another method to recover or at least verify.

    3. Another means is to use the file system configuration file that has the labels in it. Again, no serials, but if you did name your SAN LUNs to match your snfs labels (always good practice) this can really help you recover.

    • rbeck September 14, 2015 at 8:01 pm

      Thanks for your reply Ran, all good points.

      1. That’s a very good point to look for the newest entries in the log. I haven’t mentioned that in the article.

      2. Cvadmin shows you the labels indeed, but I think the issue is not that you don’t know the label name but rather that you don’t know which device had one. I see your point referring to the device and the label name here.

      3. The configuration file should be consulted every time, especially if you use label names which do not really carry the file system name. E.g. I prefer to use label names like “san1_data00”; label names like “lun0, lun1 etc” aren’t helpful at all.

  2. Lance Gropper November 19, 2015 at 5:53 pm

    Hello Roger:

    What about a situation where the MDC sees the labels, but a Linux client does not?

    Lance
