08-14-2013 09:32 AM
I'm new to 3PAR and new to this board, but I need help setting my expectations about what performance a 3PAR 7400 4-node with 2 disk shelves can achieve. The array has 96 900GB 10K SAS drives. These drives support approximately 140 IOPS each, so 96 of them should give me about 13440 back-end IOPS, or roughly 6720 write IOPS allowing for a write penalty of 2 because the CPG is RAID1.
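The arithmetic behind that estimate, as a quick sanity check (140 IOPS/drive and the penalty of 2 are the rule-of-thumb figures above, not measured values):

```shell
# Back-of-envelope IOPS estimate for 96 x 10K SAS drives in a RAID1 CPG
drives=96
iops_per_drive=140      # rule-of-thumb figure for 10K SAS
write_penalty=2         # RAID1: every host write lands on two disks
raw=$(( drives * iops_per_drive ))
writes=$(( raw / write_penalty ))
echo "raw back-end IOPS: $raw, front-end write IOPS: $writes"
```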
I set up a RAID1 CPG and created a 16TB TPVV.
I'm using Centos 6.4 on the host (BL 465c G8: 32 core AMD with 128GB RAM)
MTU on NICs and 3PAR set to 9000
BL 465c G8 -> 10Gb pass through module -> HP 5800 switch -> HP 3PAR 7400.
A quick test using dd gives the following:
[root@c7k1hv3 ~]# dd bs=1M count=512 if=/dev/zero of=/mnt/tmp/test2 conv=fdatasync
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 2.35333 s, 228 MB/s
Is this considered a good result? I think not, given that the array is otherwise idle and I have a "shed load" of IOPS available. However, I'm new to 3PAR and SANs in general (my background is networks and systems).
I limited the server to only seeing a single target on the 7400 4-node to avoid any multipathing complications.
Thanks in advance for any feedback, comments or guidance.
08-14-2013 10:53 AM
I would be interested to see the test results:
1. With multipathing and load balancing across at least 2 paths.
2. With varying block sizes. In your test you used a 1M bs; have you tried smaller block sizes?
08-15-2013 02:29 AM
Hi Jason, Members
Here's a summary of throughput using different block sizes.
BS Count Throughput
1M 1024 227 MB/s
512K 2048 222 MB/s
256K 4096 231 MB/s
128K 8192 226 MB/s
64K 16384 229 MB/s
8K 128000 215 MB/s
4K 256000 209 MB/s
Command: dd bs=BS count=Count if=/dev/zero of=/mnt/tmp/test conv=fdatasync
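A sweep like that can be scripted; a minimal sketch (TARGET and TOTAL_MB are placeholders; point TARGET at the 3PAR-backed mount and raise TOTAL_MB for a real run):

```shell
# Run dd across a range of block sizes over the same total volume of data
# (TARGET and TOTAL_MB are placeholders; use the array-backed mount for real tests)
TARGET=${TARGET:-/tmp/ddtest}
TOTAL_MB=${TOTAL_MB:-64}
for bs_kb in 1024 512 256 128 64 8 4; do
    count=$(( TOTAL_MB * 1024 / bs_kb ))
    printf '%6sK x %-7s ' "$bs_kb" "$count"
    dd bs=${bs_kb}k count=$count if=/dev/zero of="$TARGET" conv=fdatasync 2>&1 | tail -n 1
    rm -f "$TARGET"
done
```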
I'll add in a second path and repeat the test.
Thanks for replying.
08-15-2013 04:48 AM - edited 08-15-2013 04:52 AM
I added in a second path and repeated the test.
Recap on setup from server to switch:
HP BL465c in Bay 3
Eth0 mapping to PTM internal port 3, which is connected to switch-a port 3 (vlan 83)
Eth1 mapping to PTM internal port 3, which is connected to switch-b port 3 (vlan 84)
Switch a and b are not connected.
3PAR is connected to the switches as follows:
0:2:1 Switch A Port 41 192.168.83.201
0:2:2 Switch B Port 41 192.168.84.202
1:2:1 Switch A Port 43 192.168.83.211
1:2:2 Switch B Port 43 192.168.84.212
2:2:1 Switch A Port 45 192.168.83.221
2:2:2 Switch B Port 45 192.168.84.222
3:2:1 Switch A Port 47 192.168.83.231
3:2:2 Switch B Port 47 192.168.84.232
In the original post, I only had an iSCSI mapping to Node 0 port 1. For the multipath test, I added a mapping to Node 1 port 2.
View from the server.
[root@c7k1hv3 ~]# multipath -ll
mpathb (360002ac0000000000000000200004983) dm-4 3PARdata,VV
size=16T features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 7:0:0:0 sdb 8:16 active ready running
`- 8:0:0:0 sdc 8:32 active ready running
[root@c7k1hv3 ~]# arp -a
? (192.168.83.201) at 2c:27:d7:53:e9:3e [ether] on eth0
? (192.168.84.212) at 2c:27:d7:53:e9:ea [ether] on eth1
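For reference, the 3PAR device stanza in /etc/multipath.conf usually looks along these lines (a sketch from memory; treat the values in the HP implementation guide as authoritative):

```
device {
    vendor                  "3PARdata"
    product                 "VV"
    path_grouping_policy    multibus
    path_selector           "round-robin 0"
    path_checker            tur
    failback                immediate
    rr_weight               uniform
    no_path_retry           18
}
```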
The second path didn't make any noticeable difference.
BS Count Throughput
1M 1024 233 MB/s
512K 2048 225 MB/s
256K 4096 234 MB/s
128K 8192 224 MB/s
64K 16384 234 MB/s
8K 128000 215 MB/s
4K 256000 191 MB/s
WRT the server iSCSI setup, I followed the directions in the HP 3PAR RHEL and Oracle implementation guide.
Let me know if you want me to upload any config files.
08-15-2013 05:55 AM
A common misconception. A single IO stream is the worst possible test. The 3PAR is designed for multiple IO streams and multiple workloads. Get and set up IOMeter.
Get a couple of servers (2 beefy ones should easily be able to do it). Create 8 VVs per server (full or thin) on FC drives. Make sure the VVs are big enough that you are hitting all the disks. Halfway through creating them, make a really large VV so the remaining test VVs land some distance away from the first ones; this prevents short-stroking.
Under Access Specifications, create a new test:
* Name: 3PAR
* Transfer Request Size: 8KB
* Percent of Access Specification: 100%
* Percent Read/Write Distribution: default
* Percent Random/Sequential Dist: 100% Random
* Align I/Os on 16 KB boundaries
* Align block size = 16k.
You want 4 – 8 workers per CPU core.
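If a Windows box for IOMeter isn't to hand, fio can approximate the same access spec on Linux. A sketch: the 67/33 read/write split is my guess at IOMeter's default distribution, and the target path and size are placeholders sized for a quick smoke test, not a real benchmark.

```shell
# fio profile approximating the IOMeter spec above: 8KB transfers, 100% random,
# I/O aligned on 16KB boundaries (FIO_TARGET is a placeholder path)
FIO_TARGET=${FIO_TARGET:-/tmp/fio.test}
if command -v fio >/dev/null 2>&1; then
    fio --name=3par-8k-rand --filename="$FIO_TARGET" --size=64m \
        --rw=randrw --rwmixread=67 --bs=8k --blockalign=16k \
        --ioengine=psync --numjobs=4 --runtime=10 --time_based \
        --group_reporting
else
    echo "fio not installed; install it or use IOMeter as above"
fi
rm -f "$FIO_TARGET"
```

For a real run, point FIO_TARGET at a file on the 3PAR-backed mount, raise size and runtime, and scale numjobs toward the 4-8 workers per core suggested above.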
Note: While I work for HP, all of my comments (whether noted or not), are my own and are not any official representation of the company.
08-15-2013 06:11 AM
Agree with Sheldon's comment about how dd inherently works. With dd, my understanding is you are basically doing a sequential, single-threaded operation, which although not completely without merit doesn't necessarily reflect the "real world" of multiple data streams with varying R/W percentages and varying block sizes.
Generally speaking, IOPS "ratings" are based on certain R/W percentages at a certain average block size. But even then, IOPS are only part of the story; throughput (KB/s), with block size in mind, is also an important measurement.
Various tools are available to generate and test this "real world" I/O, as Sheldon mentions.
08-15-2013 06:18 AM
BTW, if you write only zeros to the array, it is not even writing much to the disks; the 3PAR ASICs detect zero blocks inline.
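To take zero-detection out of the picture, stage a file of random data first and write that instead. A sketch (paths are placeholders; point DST at the array-backed mount for a real run):

```shell
# Pre-generate random data, then time writing it out with fdatasync,
# so inline zero-detection can't shortcut the writes (paths are placeholders)
SRC=${SRC:-/tmp/random.src}
DST=${DST:-/tmp/random.dst}
dd if=/dev/urandom of="$SRC" bs=1M count=16 2>/dev/null
dd if="$SRC" of="$DST" bs=1M conv=fdatasync
rm -f "$SRC" "$DST"
```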
Hope this helps!
There are only 10 types of people in the world -
those who understand binary, and those who don't.
08-15-2013 08:58 AM
Thanks to everyone for their comments and advice. I'll get IOMeter installed next week; the reason I was using dd is twofold:
1) a lot of people still use it for a simple benchmarking exercise and my expectations were set by (2)
2) when the array was configured as RAID5 (magazine-level redundancy) I got better results:
root@ftest:~# dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 1.38168 s, 389 MB/s
I expected a move to RAID1 to increase performance, not decrease it.
For the record, I created 16 VMs on the server and ran the dd command in all of them simultaneously. I got approx 70MB/s per VM, which is very respectable: in aggregate that's over 1.1 GB/s, roughly 9 Gb/s, close to line rate on a 10Gb path.
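The aggregate works out like this (decimal MB, so the Gb/s figure is approximate):

```shell
# 16 VMs at ~70 MB/s each: aggregate throughput in MB/s and Mb/s
vms=16
per_vm_mb=70
agg_mb=$(( vms * per_vm_mb ))        # MB/s across all VMs
agg_mbit=$(( agg_mb * 8 ))          # megabits/s (decimal), i.e. ~9 Gb/s
echo "aggregate: ${agg_mb} MB/s = ${agg_mbit} Mb/s"
```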
Thank you all again for your time and comments. I will definitely explore IOMeter when resources permit (no Windows machine to hand) and update this post.
08-27-2013 09:15 PM
For HP 3PAR storage system queries you can also visit the HP Guided Troubleshooting tree.
Below is the link for 3PAR HPGT:
09-10-2014 09:58 AM
I know this is an older post, but it was very close to my situation, so here we go:
We're running a 3PAR 7450 loaded mostly with flash with 8Gb FC heads.
A particular application we are running needs to run single-threaded and sustain a throughput rate of 200MB/s+. We are coming off a DL380 G8 (we have tried Emulex HBAs as well as QLogic), but we are getting numbers in the 40-50MB/s range (we have some flexibility in block size, but basically between 16k and 64k is our limit). It is an application feeding into a SQL database. The application's behavior cannot be changed, however, meaning we can't add more threads.
Is there any way to optimize for this? This unit outperforms the requirements of our other apps, so even if a modification will slow down our other systems, we're ok with that. This application is the one we really need to get running.
Funny enough, on local SAS disk, we can get 600MB/sec+, but for a lot of reasons, we need this on the SAN.
Any help will be greatly appreciated! (And if it REALLY works, name your favorite single malt...heh)
11-15-2014 07:29 AM
A 3PAR 7450 loaded mostly with flash:
You will want it running 3.2.1 MU1. Not sure if the new Adaptive Flash Cache will help, but it could be set for "simulator mode" and then you could see if actually turning it on would help.
If you have not done so lately, double-check your switches and HBAs to verify they really are running at 8 Gb. You may want to force the switch ports for the 3PAR and the server(s) to 8Gb rather than auto-negotiate.
Double-check the server's multipathing. I am assuming redundant HBAs and redundant SAN fabrics. At least two paths per HBA; you could go four per HBA.
And verify round-robin is enabled at the host.
Check the HP 3PAR Implementation Guide for the OS to see if there's anything in particular that should be set, such as HBA queue depth and other HBA parameters.
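On Linux, the per-LUN queue depth the SCSI layer is using is visible in sysfs; a quick way to check it (a sketch; device names will differ per host, and tuning the value itself is HBA-driver-specific):

```shell
# Print the queue depth in effect for each sd device on this host
for qd in /sys/block/sd*/device/queue_depth; do
    if [ -r "$qd" ]; then
        echo "$qd: $(cat "$qd")"
    fi
done
```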
More than that, get Sales involved. They can help you find a resource to do in-depth tuning.