Hi all:
Thanks in advance for taking the time to read my thread.
Here is the background:
- We have a Dell server running Server 2008 R2 SP1 connected to a Winchester Systems SAN via a dual-port HBA to Channel 0 on Storage Processor A and Channel 0 on Storage Processor B.
- We have the vendor's drivers installed and the Windows MPIO feature installed.
- The drives properly appear as Multi-Path Disk Devices within Device Manager | Disk Drives
- MPIO is configured for Fail Over Only (a sketch of the mpclaim commands follows this list)
- We can successfully expose the LUNs to the server over either *individual* channel and access the storage with no issues (effectively confirming that both paths work properly when used independently)
- This server has failover clustering installed as part of Exchange 2010 DAGs
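For reference, the claim and policy configuration can be reproduced from an elevated command prompt roughly as follows; the quoted hardware ID is a placeholder (it should be the 8-character vendor plus 16-character product string the array actually reports), and policy 1 is Fail Over Only:

    REM Claim the array's LUNs for MPIO (placeholder hardware ID -- substitute your array's string)
    mpclaim -r -i -d "VENDOR  PRODUCT         "

    REM Set the load-balance policy for all MPIO disks to Fail Over Only (policy 1)
    mpclaim -l -m 1

    REM List the MPIO disks and their policies -- this is the command that crashes for us
    mpclaim -s -d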
Reproducible Issue:
- With CH:0 on SP:A already exposed, we also expose CH:0 on SP:B
- We reboot so that Device Manager picks up the change
- We go into Device Manager | Multi-Path Disk Device Properties and click the MPIO tab, or run "mpclaim -s -d" from the CLI
- The server blue screens and crashes -- every single time
- I've tried working with our storage vendor, and they believe it is a Microsoft issue.
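For context, the dump below was opened with the standard WinDbg workflow, roughly as follows (the symbol cache path is from our setup, and the dump is in its default location):

    windbg -y srv*d:\symbols*http://msdl.microsoft.com/download/symbols -z C:\Windows\MEMORY.DMP
    0: kd> !analyze -v
    0: kd> lmvm msdsm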
Here is the WinDbg analysis of MEMORY.DMP:
*********************************************
*             Bugcheck Analysis             *
*********************************************
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000014, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff880010771c2, address which referenced memory
FOLLOWUP_IP:
msdsm!DsmpQueryLoadBalancePolicy+232
fffff880`010771c2 8b4814 mov ecx,dword ptr [rax+14h]
SYMBOL_STACK_INDEX: 3
SYMBOL_NAME: msdsm!DsmpQueryLoadBalancePolicy+232
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: msdsm
IMAGE_NAME: msdsm.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 4ce7a476
FAILURE_BUCKET_ID: X64_0xD1_msdsm!DsmpQueryLoadBalancePolicy+232
BUCKET_ID: X64_0xD1_msdsm!DsmpQueryLoadBalancePolicy+232
0: kd> lmvm msdsm
start end module name
fffff880`01060000 fffff880`01086000 msdsm (pdb symbols) d:\symbols\msdsm.pdb\E4D203DABED04CC8A14C0F3894E777D11\msdsm.pdb
Loaded symbol image file: msdsm.sys
Image path: \SystemRoot\system32\DRIVERS\msdsm.sys
Image name: msdsm.sys
Timestamp: Sat Nov 20 05:35:34 2010 (4CE7A476)
CheckSum: 000251CF
ImageSize: 00026000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
--------------------------------------------------------------------------------
It should be noted that we have tested whether failover occurs at all when both storage processors are exposed, by physically pulling the fibre connected to the HBA. I can confirm that failover occurs instantly and successfully; however, we can never access the MPIO tab without triggering the crash.
I have searched around for this issue and come across a few KB articles that appear to address it, but they have not helped:
Article ID: 2277440 - not applicable as we are running R2 SP1
Article ID: 981379 - did not help
Other articles we found reference hotfixes that did not directly address our issue, so we did not install them.
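For anyone cross-checking, installed hotfixes can be listed with something like the following (the KB numbers are just the two cited above):

    wmic qfe get HotFixID | findstr /i "981379 2277440"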
Thanks again for any assistance offered.