Storage alert bug running vSphere on HP BL465c G7 blades

I have recently been configuring some BL465c G7 blades at work. They run vSphere 5.0 Update 1 installed on an internal SD card and use FC storage for the datastores. After I applied HP's October firmware bundle, which contains BIOS version A19 15/8/12, I ran into a strange issue: the vSphere client showed my blade hosts as having alerts.

Hosts showing alert status

After some investigation, I found that they had storage faults, specifically the following alarm:

Alert Detail

Drilling into the Hardware Status tab (with the HP offline bundle installed) then showed that the storage controller had drive faults or missing disks, as seen below.

Storage status showing failed drives

This was really strange, because these particular blades were ordered without the integrated P410 controller; we weren't planning on using local disks. Weird…

Anyway, I spent a while trying different things to clear the alerts, without any luck. In the iLO, the storage tab of the System Information summary shows the same error:

iLO storage summary – BAD

But after a few minutes and a page refresh, the status clears and looks as expected:

iLO storage summary - OK

This continues to flip-flop between good and bad every minute or so. WHAT???

I decided to roll back the BIOS firmware (I also rolled back the iLO firmware, which made no difference). Interestingly, after going back to BIOS version A19 19/3/12, the iLO status still shows the same problem as above, but the CIM provider information within vSphere no longer shows the storage controller…because it DOESN'T HAVE ONE! :-) See the screenshot below:

Older BIOS doesn't present storage controller CIM information

I logged a job with HP, and after working through several diagnostics with them, they concluded that this was definitely a bug and that it would be addressed in a future BIOS update.

So, for those of you out there who have been pulling your hair out like I have: there is a bug, and there is no immediate fix other than rolling back to BIOS rev. A19 19/3/12 or earlier. Either that or you have to live with your hosts continually showing alerts :-)

NOTE: Rolling back the BIOS only stops the alert from coming through into the vSphere client; it still shows up in the iLO status summary. However, the alert doesn't appear to generate any IML event log entries, and it doesn't show up in HP SIM either.

This bug only appears to affect blades that don't have the optional P410 controller installed, although I only have BL465c G7 blades to test this on. It may also affect BL465c/BL460c G7 and earlier blades where the controller is optional.

UPDATE: HP has told me that the bug is caused by the disk drive backplane being active even when the controller isn't present, and they also suggest that it can be observed with any BIOS/iLO combination. I have also found that some blades seem fine with just the BIOS rollback, while others still bring the storage controller status back into vSphere. For these odd few, a rollback to iLO 3 firmware 1.28 seems to fix the problem, so I am making this my baseline for now.

UPDATE – October 2 2013

After stumbling across a few updates to the iLO 3 firmware, I noticed that v1.57 specifically mentions the following fix:

  • Incorrect Drive status shown in the iLO 3 GUI when the HP Smart Array P410 controller is removed from the ProLiant c-Class Blade BL465c G7.

However, after testing this new firmware, I found that the same problem still exists, and it is also present in the latest v1.61 firmware. Interestingly, the error is slightly different: while the drives still flip-flop between "not installed" and "Fault", the number of drive bays no longer changes. The number of drives is now always correctly shown as two…I guess progress is progress, right??? :-P

I’ll open a new case with HP and hopefully find a fix for this hugely annoying bug!!!

About Ben Loveday
My name is Ben Loveday and I am working as a Systems Architect in New Zealand. I have a keen interest in VMware products and am VCAP5-DCD, VCAP5-DCA and Microsoft MCITP certified. I am studying towards VCDX5 certification…I hope! My areas of focus are the virtualisation of manufacturing automation systems, with the aim of improving traditional automation/SCADA system design as well as their availability and reliability. I am married with three kids, and my hobbies include playing the guitar (less often than I'd like) and listening to music, mostly 80's Metal and Rock/Blues :-) Oh…and I'm a PC gamer!

16 Responses to Storage alert bug running vSphere on HP BL465c G7 blades

  1. stacycarter says:

    Hey Ben, do you know if they ever came out with a fix for this? Seeing the same thing on my BL465c blades. It seems that HP ESXi Offline Bundle 1.4.5 does not fix this….

  2. Pingback: That Day An Entire HP C7000 Enclosure Got Hit With a Nasty Bug | virtualstace.com

  3. stacycarter says:

    Thanks, Ben. Just opened a new case for this issue as well! Let me know if you get an answer for this one :-)

  4. Pingback: Advisory: HP Proliant BL465c G7 – Storage Fault with P410 removed | Ben's Jibber Jabber

  5. Matt says:

    Same here, v1.61 did nothing to fix the issue. Something extra strange to note about my issue is that I have a large batch of these servers where some of them have the issue and some don't, no matter what BIOS or iLO versions they are at. The servers should be identical, but they obviously are not. It's quite frustrating, and I can't find out what is different between two "identical" BL465 G7 servers.

  6. Matt says:

    Ahh, I just saw the Advisory you posted. I’m going to give that a try. Maybe that is the single difference between all of these servers.

  7. Kevin says:

    I am experiencing this issue as well. Have spent hours trying various BIOS / ILO / bundle versions without success. Just one blade in my 16 blade enclosure has the issue.

    • Ben Loveday says:

      Hi Kevin,
      it’s a strange issue alright! I’ve been testing a blade with the SAS cable removed as per the advisory but so far I’m still getting the alerts!
      I’ll post an update in the coming weeks :-)
      Cheers,
      Ben

      • stacycarter says:

        FYI – We had to remove the SAS cable and the backplane power cable from each BL465 blade to stop the alerts. If we only removed the SAS cable, we still received alerts. I've also pushed HP to include a fix for this in the next iLO firmware version.

  8. Ben Loveday says:

    Cool, thanks Stacy, that’s good to know. I’ll try that! Yes, it would be much easier if they just fixed the problem :-P

  9. Kevin says:

    I have removed the backplane power cable as Stacy described, and it has produced the desired result. iLO System Information for the blade now shows "Drive information not available", and alerts are not generated in vCenter, as there is no hard drive listed under the Hardware Status tab. Thanks very much, Stacy.
