We spent a grueling couple of days running the MS Exchange 2010 Jetstress tool and watching it fail the read latency tests. The setup was 4 mailbox servers and 12 databases, each with one passive copy in the DAG. The active copy disks live on an EMC CX4 SAN and the passive copies live on an EMC CX3 SAN. We were seeing failures on both sides, but only on the first 3 of the six databases on each server. Very odd.
DB SAN disk: RAID 5 (both active and passive)
Log SAN disk: RAID 10 (both active and passive)
We first thought we needed to split the 12 databases into separate LUNs inside ESXi, rather than having one giant disk in ESXi and then carving it up inside Windows 2008 R2.
This was NOT the case.
Solution:
When creating disks for your database and log volumes inside ESX, make sure that you use additional virtual SCSI adapters to split the data I/O. That is, when you create a new disk and hook it to the LUNs that you already have, simply select a SCSI node on a new controller number. I alternated between controllers 1 and 2, keeping my OS on controller 0.
Example:
MBDB01 was put on SCSI controller: 1:0
MBDB01Log was put on SCSI controller: 1:1
MBDB02 was put on SCSI controller: 2:0
MBDB02Log was put on SCSI controller: 2:1
MBDB03 was put on SCSI controller: 1:2
MBDB03Log was put on SCSI controller: 1:3
Even though physically everything is still traveling over the same Fibre Channel path, the guest OS doesn’t know that and is actually presented with new SCSI controller hardware for each new controller you set up.
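The alternating layout from the example can be sketched in a few lines of Python. This is only an illustration of the numbering scheme, not a VMware API call: the function name `assign_slots` and the "Log" suffix convention are assumptions made for the sketch. It also skips unit 7, which each virtual SCSI controller reserves for itself.

```python
from itertools import cycle

# Unit 7 on a virtual SCSI controller is used by the controller itself.
RESERVED_UNIT = 7


def assign_slots(databases, controllers=(1, 2)):
    """Alternate databases across controllers (hypothetical helper).

    Each database and its log volume land on the same controller,
    on consecutive unit numbers. Controller 0 is left for the OS disk.
    """
    next_unit = {ctrl: 0 for ctrl in controllers}
    layout = {}
    ctrl_cycle = cycle(controllers)
    for db in databases:
        ctrl = next(ctrl_cycle)
        for disk in (db, db + "Log"):
            if next_unit[ctrl] == RESERVED_UNIT:
                next_unit[ctrl] += 1  # skip the reserved unit
            layout[disk] = (ctrl, next_unit[ctrl])
            next_unit[ctrl] += 1
    return layout


layout = assign_slots(["MBDB01", "MBDB02", "MBDB03"])
for disk, (ctrl, unit) in layout.items():
    print(f"{disk} -> SCSI controller {ctrl}:{unit}")
```

For the three databases above this reproduces the same controller:unit pairs as the example list.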
Jetstress now passes with flying colors on all fronts.
We did the same Jetstress test on an ESX 4.1 host connected to a CX4 via FC, and we used the default LSI Logic SAS adapter inside the VM.
The results are like yours, very, very odd.
Two out of four DBs (each on a separate RDM LUN, with one passive copy) were OK; the other two failed due to latencies higher than 20 ms, the last one much higher than 20 ms!
Using PVSCSI the test was OK, but we could only use a maximum of 5 LUNs; more than 5 causes Jetstress to fail with an I/O device error.
Yesterday I ran one test with 3 LSI SAS adapters inside the Jetstress VM, and this test was a "PASS".
Like you said, this is very strange, since there is still only one HBA on the ESX host talking SCSI to the CX4!