You’re about to enter a world where creating a Virtual Machine hot-clone is done faster than powering it off. My former Capgemini colleagues, Ernst Cozijnsen and John van der Sluis, recently implemented EMC PowerPath/VE. Here's their story.
It took the guys in storage land a long time to deliver, but it's finally here: a really great kick-ass plug-in to boost your vSphere 4 storage performance through the roof.
In prior versions of ESX, the Native Multi Pathing “NMP” plug-in was available for balancing the storage load over different Fibre Channel HBAs and storage paths to your storage array(s). Besides not really being “Multi Pathing”, it had another major disadvantage: it could stress your storage array in such a way that it crashed. (Yes, we know how it works, and yes, we succeeded in this.) The crashing didn’t have much to do with ESX but more with how the storage arrays handle the different requests coming in from the FA ports and distribute the load across the storage processors inside the box.
If commands for e.g. LUN-A come in via 2 different FA ports on the array, each with its own storage processor, there needs to be a lot of “inter communication” between the storage CPUs inside the box. For a normal environment this is no issue, but when you start to push the limits it can and will cause major problems. Therefore I have written this script.
This script runs at boot time from rc.local and makes sure that all the ESX hosts in your environment send their storage I/O for a given LUN via the same path to your storage box. The storage CPU “inter communication” is therefore kept to a minimum.
Disk vmhba2:1:4 /dev/sdh (512000MB) has 4 paths and policy of Fixed
FC 16:0.1 50060b0000646c8a<->50060e8004f2e812 vmhba2:1:4 On active preferred
FC 16:0.1 50060b0000646c8a<->50060e8004f2e873 vmhba2:2:4 On
FC 19:0.1 50060b0000646062<->50060e8004f2e802 vmhba4:1:4 On
FC 19:0.1 50060b0000646062<->50060e8004f2e863 vmhba4:2:4 On
Disk vmhba2:1:5 /dev/sdi (512000MB) has 4 paths and policy of Fixed
FC 16:0.1 50060b0000646c8a<->50060e8004f2e812 vmhba2:1:5 On
FC 16:0.1 50060b0000646c8a<->50060e8004f2e873 vmhba2:2:5 On active preferred
FC 19:0.1 50060b0000646062<->50060e8004f2e802 vmhba4:1:5 On
FC 19:0.1 50060b0000646062<->50060e8004f2e863 vmhba4:2:5 On
Disk vmhba2:1:6 /dev/sdj (307200MB) has 4 paths and policy of Fixed
FC 16:0.1 50060b0000646c8a<->50060e8004f2e812 vmhba2:1:6 On
FC 16:0.1 50060b0000646c8a<->50060e8004f2e873 vmhba2:2:6 On
FC 19:0.1 50060b0000646062<->50060e8004f2e802 vmhba4:1:6 On active preferred
FC 19:0.1 50060b0000646062<->50060e8004f2e863 vmhba4:2:6 On
Disk vmhba2:1:7 /dev/sdk (307200MB) has 4 paths and policy of Fixed
FC 16:0.1 50060b0000646c8a<->50060e8004f2e812 vmhba2:1:7 On
FC 16:0.1 50060b0000646c8a<->50060e8004f2e873 vmhba2:2:7 On
FC 19:0.1 50060b0000646062<->50060e8004f2e802 vmhba4:1:7 On
FC 19:0.1 50060b0000646062<->50060e8004f2e863 vmhba4:2:7 On active preferred
Disk vmhba2:1:8 /dev/sdl (512000MB) has 4 paths and policy of Fixed
FC 16:0.1 50060b0000646c8a<->50060e8004f2e812 vmhba2:1:8 On active preferred
FC 16:0.1 50060b0000646c8a<->50060e8004f2e873 vmhba2:2:8 On
FC 19:0.1 50060b0000646062<->50060e8004f2e802 vmhba4:1:8 On
FC 19:0.1 50060b0000646062<->50060e8004f2e863 vmhba4:2:8 On
Disk vmhba2:1:9 /dev/sdm (512000MB) has 4 paths and policy of Fixed
FC 16:0.1 50060b0000646c8a<->50060e8004f2e812 vmhba2:1:9 On
FC 16:0.1 50060b0000646c8a<->50060e8004f2e873 vmhba2:2:9 On active preferred
FC 19:0.1 50060b0000646062<->50060e8004f2e802 vmhba4:1:9 On
FC 19:0.1 50060b0000646062<->50060e8004f2e863 vmhba4:2:9 On
The output above is displayed by running the command “esxcfg-mpath -l” on ESX 3.x.
With only 1 out of 4 paths per LUN available for transport, you can imagine that a lot of resources are wasted doing... well... nothing really.
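To put a number on that, here is a minimal Python sketch that tallies the listing. It assumes the ESX 3.x line format shown above; `SAMPLE` is a two-LUN excerpt copied from that listing, and the names `PATH_RE` and `summarize` are made up for this illustration, not part of any VMware tool.

```python
import re
from collections import Counter

# Two-LUN excerpt of the "esxcfg-mpath -l" listing above (ESX 3.x format).
SAMPLE = """\
Disk vmhba2:1:4 /dev/sdh (512000MB) has 4 paths and policy of Fixed
FC 16:0.1 50060b0000646c8a<->50060e8004f2e812 vmhba2:1:4 On active preferred
FC 16:0.1 50060b0000646c8a<->50060e8004f2e873 vmhba2:2:4 On
FC 19:0.1 50060b0000646062<->50060e8004f2e802 vmhba4:1:4 On
FC 19:0.1 50060b0000646062<->50060e8004f2e863 vmhba4:2:4 On
Disk vmhba2:1:6 /dev/sdj (307200MB) has 4 paths and policy of Fixed
FC 16:0.1 50060b0000646c8a<->50060e8004f2e812 vmhba2:1:6 On
FC 16:0.1 50060b0000646c8a<->50060e8004f2e873 vmhba2:2:6 On
FC 19:0.1 50060b0000646062<->50060e8004f2e802 vmhba4:1:6 On active preferred
FC 19:0.1 50060b0000646062<->50060e8004f2e863 vmhba4:2:6 On
"""

# Hypothetical pattern for one path line: "FC <pci> <hba wwn><-><array wwn> vmhbaA:T:L <flags>"
PATH_RE = re.compile(r"^FC\s+\S+\s+\S+<->\S+\s+(vmhba\d+):(\d+):(\d+)\s+(.*)$")


def summarize(listing):
    """Return (paths per LUN, active paths per LUN, active paths per HBA)."""
    total, active, per_hba = Counter(), Counter(), Counter()
    for line in listing.splitlines():
        match = PATH_RE.match(line.strip())
        if not match:
            continue  # "Disk ..." header lines and blanks are skipped
        hba, _target, lun, flags = match.groups()
        lun = int(lun)
        total[lun] += 1
        if "active" in flags.split():
            active[lun] += 1
            per_hba[hba] += 1
    return total, active, per_hba


total, active, per_hba = summarize(SAMPLE)
for lun in sorted(total):
    print(f"LUN {lun}: {active[lun]} of {total[lun]} paths carrying I/O")
print("active paths per HBA:", dict(per_hba))
```

Feeding it the complete listing instead of the excerpt gives the same picture for every LUN: four paths configured, one in use.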
The story changed when VMware announced their vSphere 4.0 PSA (Pluggable Storage Architecture). This enabled storage manufacturers to start developing storage related plug-ins for vSphere.
EMC was one of the first to modify their existing “PowerPath” software to fit into ESX, calling it “EMC PowerPath/VE for VMware”. This plug-in enables you to really spread your storage load over all your HBAs and storage paths. Finally a real “Multipath” plug-in with full load balancing capabilities.
Being all hyped up about this nice thing, there was only one way to find out if it did the trick. Let’s put this test into motion! PowerPath/VE comes with a huge set of whitepapers and best practices. Too many pages and too little time, so this is “The Nutshell version”. The topology ->
Because vSphere is the last ESX version with a Service Console, installing this plug-in requires the “VMware Remote CLI” for remote pushing. As a test we installed this on the “Virtual Center” server itself.
For managing the plug-in after installation, the EMC RTOOLS CLI also needs to be installed. It can be found together with the license server at http://powerlink.emc.com
Did I hear you say... License server??? Yes, isn’t it great? VMware finally stops using the ELM-tools license manager because of its product friendliness, and EMC re-imports it... Thanks EMC.
Anyway, after installing all the nice tools on the VC server and pushing the plug-in onto ESX, we finally made some progress. This is how your path policy looks before installing the plug-in:

After installing the plug-in it looks like this:

All paths are enabled and ready to have loads of I/O hurled at them. To make sure, the GUI says the same as the Service Console:

After doing some intensive speed testing we got the following result without any additional configuration changes.

A massive, whopping 120MB/sec per HBA, for a total of 250MB/s read and 250MB/s write simultaneously. Doing some quick math, 500MB/sec throughput means:
In one minute, this ->
...is going in here -> 
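Spelled out, the quick math is just the measured read and write rates added up and multiplied by 60 seconds. A small Python sketch using only the figures quoted above (the variable names are ours):

```python
# Figures quoted in the test result above.
read_mb_per_s = 250
write_mb_per_s = 250

total_mb_per_s = read_mb_per_s + write_mb_per_s  # combined throughput
per_minute_mb = total_mb_per_s * 60              # data moved in one minute
per_minute_gb = per_minute_mb / 1000             # decimal gigabytes

print(f"{total_mb_per_s} MB/s for 60 s = {per_minute_mb} MB (about {per_minute_gb:.0f} GB)")
```

So one minute at this rate moves roughly 30 GB in total, half of it reads and half of it writes.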
Like Montell Jordan says: http://www.youtube.com/watch?v=qZwcNu1xg_A
Kudos to Ernst Cozijnsen and John van der Sluis.
More info at EMC:
http://www.emc.com/collateral/software/white-papers/h6533-performance-optimization-vmware-powerpath-ve-wp.pdf
Unfortunately, EMC is very late to the party. Compellent storage arrays on VMware 4 allow this natively, no plugin, with far better throughput.
I use QLogic 4Gbps HBAs myself, which are rated to 150k IOPS on a single HBA. I set the queue depth to 128. They are connected to switches, not directly, and the array's HBAs have queue depths of ~1500 on each 4Gbps port.
Granted, the spindles on my array won't come close to 150k IOPS. Given the number of hosts connected (34 currently), I don't think I have too much to worry about with regards to needing more throughput than one HBA from any given host is capable of doing.
My ~250 VM environment averages roughly 600 IOPS on the VMFS volumes. Those IOPS are controller IOPS, so I'd put money down that a good chunk of them are cache hits. There are spikes that go higher of course, I'm just talking average.
The VMs do a lot of activity over NFS which while hosted on the same storage array does not of course run itself in a VM, but rather has a dedicated cluster. I can't break out the IOPS activity generated from VM systems to NFS vs physical since they all utilize the same shared file system on the NAS.
I am happy that vSphere finally supports round robin MPIO though. I do like to be "testing" all of the paths at all times during the day, knowing that everything is wired up/configured correctly. I would hate to be in an active/standby state where you may not discover a configuration error until a failure occurs. I recall such a situation back in 2004 on a CX600 for example, can you say 36 hour conference call? (Note: not the fault of the CX600, it was the fault of the operator who misconfigured it; the problem only cascaded when one of the controllers failed.)
@ Eric - glad you liked PowerPath/VE!
1) We're working to integrate the licensing with the new vSphere model - agreed it's a much better model, the license server has got to go.
2) Updates going forward will be able to use VUM (did a survey on my blog, and it steered the product direction; we're listening!)
@ Aubrey - the vStorage APIs have 4 groupings, 3 that are GA and 1 which isn't:
- vStorage APIs for Multipathing (aka pluggable storage architecture).
- vStorage APIs for SRM
- vStorage APIs for Data Protection (aka "son of VCB")
- vStorage APIs for Array Integration (aka VAAI - not in current vSphere generation).
Currently, only EMC is shipping something that uses the vStorage APIs for Multipathing. These come in the form of a Multipathing Plug-in (MPP), or a third party Storage Array Type Plugin or SATP (the SATP stack governs path detection/management behavior and selects the default PSP), or a third party Path Selection Plugin or PSP (the PSP stack selects the path for any given IO based on Fixed/MRU/RR logic - and a 3rd party PSP can change this to be whatever is best).
PP/VE changes BOTH the SATP behavior (path detection is automated, paths are automatically tested, flaky paths are killed, new paths don't require rescans) and the PSP behavior (uses adaptive queue depth with 3rd party arrays, and predictive - combining initiator and target queue depths on EMC arrays).
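For readers who haven't met the three built-in policies, here is a toy Python model of Fixed, MRU and RR path selection. It is a sketch of the generally described behaviour only, not VMware's or EMC's code; the class names, the `select` method and the path labels are all invented for the illustration.

```python
def _first_alive(paths, alive):
    """Return the first configured path that is currently alive."""
    for path in paths:
        if path in alive:
            return path
    raise RuntimeError("no path alive")


class Fixed:
    """Always use the preferred path when it is up; fail back when it returns."""
    def __init__(self, paths, preferred):
        self.paths, self.preferred = list(paths), preferred

    def select(self, alive):
        if self.preferred in alive:
            return self.preferred
        return _first_alive(self.paths, alive)


class MostRecentlyUsed:
    """Stay on the current path until it fails; do not fail back."""
    def __init__(self, paths):
        self.paths = list(paths)
        self.current = self.paths[0]

    def select(self, alive):
        if self.current not in alive:
            self.current = _first_alive(self.paths, alive)
        return self.current


class RoundRobin:
    """Rotate over all configured paths, skipping any that are down."""
    def __init__(self, paths):
        self.paths = list(paths)
        self._next = 0

    def select(self, alive):
        for _ in range(len(self.paths)):
            path = self.paths[self._next % len(self.paths)]
            self._next += 1
            if path in alive:
                return path
        raise RuntimeError("no path alive")


# Hypothetical path labels, one per HBA/target combination.
PATHS = ["vmhba2:1", "vmhba2:2", "vmhba4:1", "vmhba4:2"]
rr = RoundRobin(PATHS)
print([rr.select(set(PATHS)) for _ in range(5)])
```

Only the round robin policy touches every path in normal operation; Fixed and MRU leave three of the four idle until something breaks. An adaptive policy like the one described above would replace the simple rotation with a choice based on queue depths, which this sketch does not attempt.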
One nice tidbit is that customers can "trade in" their physical PowerPath licenses for PowerPath/VE licenses in most cases, so it can be very low cost (in some cases free).
Customers are voting with their feet - PP/VE is one of the most successful PowerPath products ever.
That said, no product is perfect, and we're working always to make it better.
FYI, Dell EqualLogic has a 3rd party PSP in beta. I'm sure others will come too.
Oh - and EMC of course supports the free NMP RR (everyone does).
We also have broad SRM (including SRM 4 support and failback extensions for Celerra and now CLARiiON), a vCenter plugin for ease of use, VMware-integrated snapshots (Replication Manager), VM-Aware Navisphere. We've also publicly demonstrated and committed to VAAI readiness across V-Max, CX4 (and newer) and Celerra at launch. vSphere 4 also automatically registers with CLARiiON (both CX3 and CX4). EMC also supports the vStorage APIs for Data Protection in our backup products.
What exactly does Compellent have re integration with VMware? Not disputing that I'm sure they have a fine platform, but it's certainly not correct to position that EMC is "catching up", is it?
Now, re: VM-Aware Navisphere, it is a bummer that this is CX4 only - I pushed engineering on this one hard (as it is an awesome function) - but the reason is that while Navi 26 and Navi 28 "look" very similar, we went through a 32-bit to 64-bit kernel change - which BTW is not XP embedded (you can't believe everything you read on Wikipedia).
Seems a bit harsh to compare an array that is 3.5 years old to something new - don't you think?
Open to corrections, but that's the state of the union, IMO.
Thank you for this! What is the best way to get in on the program?
Thanks,
James
I'm about to press the button on the purchase of an AX4-5i, and given what I've read this weekend in your and other blogs, I'm sold on PP/VE.
You mentioned the license 'trade-in' option from PP to PP/VE, so my million $$ question is: What is the approx cost to trade-in the PP licenses that come with the AX4 (I'm assuming 10 for the entry-level solution I'm looking at) to PP/VE for this solution?
Please, please give me some good news on this!
Thanks, Rob
I realize the CX3 is a bit older but c'mon it's software and should not be obsolete at that point. Port the code and don't force people to upgrade to a CX4 or DMX if they want "performance".
I guess I am just one of the "few" customers that EMC threw hardware at (free CX3-80 from a CX700) because they were incapable of fixing various issues and the countless hours I wasted is enough to jump ship and never look back.
No longer is it the times where customers are locked into vendors and need PS to get off of them, Thank You SVMotion!
I look forward to working with your former colleagues who moved on and ventured to the newer companies that don't have all the legacy BS to deal with.
To all of you left with the 3.5yr old arrays with dog slow navisphere management performance, no thin-provisioning, no global hot spares, no native ethernet replication and crappy follow the sun support, I feel for you...
So am I missing your point, or is the MP plugin not that spectacular?
naa.600508b4001059cf0000500000450000 : HP Fibre Channel Disk (naa.600508b4001059cf0000500000450000)
vmhba1:C0:T1:L10 LUN:10 state:active fc Adapter: WWNN: 20:00:00:1b:32:90:a4:b4 WWPN: 21:00:00:1b:32:90:a4:b4 Target: WWNN: 50:00:1f:e1:50:05:f4:90 WWPN: 50:00:1f:e1:50:05:f4:99
vmhba1:C0:T0:L10 LUN:10 state:active fc Adapter: WWNN: 20:00:00:1b:32:90:a4:b4 WWPN: 21:00:00:1b:32:90:a4:b4 Target: WWNN: 50:00:1f:e1:50:05:f4:90 WWPN: 50:00:1f:e1:50:05:f4:9d
vmhba2:C0:T1:L10 LUN:10 state:active fc Adapter: WWNN: 20:00:00:1b:32:90:4f:ad WWPN: 21:00:00:1b:32:90:4f:ad Target: WWNN: 50:00:1f:e1:50:05:f4:90 WWPN: 50:00:1f:e1:50:05:f4:98
vmhba2:C0:T0:L10 LUN:10 state:active fc Adapter: WWNN: 20:00:00:1b:32:90:4f:ad WWPN: 21:00:00:1b:32:90:4f:ad Target: WWNN: 50:00:1f:e1:50:05:f4:90 WWPN: 50:00:1f:e1:50:05:f4:9c
naa.600508b4001059cf00005000006d0000 : HP Fibre Channel Disk (naa.600508b4001059cf00005000006d0000)
vmhba1:C0:T1:L12 LUN:12 state:active fc Adapter: WWNN: 20:00:00:1b:32:90:a4:b4 WWPN: 21:00:00:1b:32:90:a4:b4 Target: WWNN: 50:00:1f:e1:50:05:f4:90 WWPN: 50:00:1f:e1:50:05:f4:99
vmhba1:C0:T0:L12 LUN:12 state:active fc Adapter: WWNN: 20:00:00:1b:32:90:a4:b4 WWPN: 21:00:00:1b:32:90:a4:b4 Target: WWNN: 50:00:1f:e1:50:05:f4:90 WWPN: 50:00:1f:e1:50:05:f4:9d
vmhba2:C0:T1:L12 LUN:12 state:active fc Adapter: WWNN: 20:00:00:1b:32:90:4f:ad WWPN: 21:00:00:1b:32:90:4f:ad Target: WWNN: 50:00:1f:e1:50:05:f4:90 WWPN: 50:00:1f:e1:50:05:f4:98
vmhba2:C0:T0:L12 LUN:12 state:active fc Adapter: WWNN: 20:00:00:1b:32:90:4f:ad WWPN: 21:00:00:1b:32:90:4f:ad Target: WWNN: 50:00:1f:e1:50:05:f4:90 WWPN: 50:00:1f:e1:50:05:f4:9c
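A listing in this format can be tallied with a few lines of Python. This is a minimal sketch that assumes the exact line layout pasted above; `SAMPLE` is one device copied from it with the WWN details trimmed, and `DEVICE_RE`, `PATH_RE` and `tally` are hypothetical names. Note that it only counts the reported `state:` value and the HBAs involved; it cannot tell which path actually carries the I/O at a given moment.

```python
import re
from collections import Counter, defaultdict

# One device from the listing above; adapter/target WWN details trimmed for brevity.
SAMPLE = """\
naa.600508b4001059cf0000500000450000 : HP Fibre Channel Disk (naa.600508b4001059cf0000500000450000)
vmhba1:C0:T1:L10 LUN:10 state:active fc Adapter: WWNN: 20:00:00:1b:32:90:a4:b4
vmhba1:C0:T0:L10 LUN:10 state:active fc Adapter: WWNN: 20:00:00:1b:32:90:a4:b4
vmhba2:C0:T1:L10 LUN:10 state:active fc Adapter: WWNN: 20:00:00:1b:32:90:4f:ad
vmhba2:C0:T0:L10 LUN:10 state:active fc Adapter: WWNN: 20:00:00:1b:32:90:4f:ad
"""

DEVICE_RE = re.compile(r"^(naa\.[0-9a-f]+)\s*:")
PATH_RE = re.compile(r"^(vmhba\d+):C\d+:T\d+:L\d+\s+LUN:\d+\s+state:(\w+)")


def tally(listing):
    """Map each device to (Counter of path states, set of HBAs seen)."""
    states = defaultdict(Counter)
    hbas = defaultdict(set)
    device = None
    for line in listing.splitlines():
        line = line.strip()
        device_match = DEVICE_RE.match(line)
        if device_match:
            device = device_match.group(1)
            continue
        path_match = PATH_RE.match(line)
        if path_match and device is not None:
            hba, state = path_match.groups()
            states[device][state] += 1
            hbas[device].add(hba)
    return states, hbas


states, hbas = tally(SAMPLE)
for device in states:
    print(device, dict(states[device]), sorted(hbas[device]))
```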
WOW! We bought plenty of those physical PP licenses for servers that actually run VMware ESX 3.5. What a waste of money, but... if we can trade them in, this is a big chunk out of the cost of an upgrade to vSphere.
Chad, you made my day, thx