[ofa-general] OFED SRP Client / StorageGear Target / Performance with Modified Write Protocol
Ken Jeffries
kenjeffries at storagegear.com
Mon Jul 30 08:30:18 PDT 2007
We have been doing a fair amount of performance testing on our SRP target.
One thing we found early on was that client writes were considerably slower
than client reads. We addressed this by patching the SRP client code so
that it could include the client write data in the SRP CMD IU if it would
fit. This notion is in iSER but is not in standard SRP. Architecturally,
the capability is signaled using an additional data buffer format bit.
We find that client write performance is considerably improved by using
this capability. We are calling SRP-spec-compliant writes "standard
writes" and our modified writes "IU data writes".
We also implemented a similar capability for client reads but on our system
we did not see a performance improvement.
We would like to know whether other SRP users would be interested in our
making the patch available, either for inclusion or for discussion. Since we
did this without input from anyone else, we are not claiming that the way we
did it is necessarily the best way to do it.
Below are some of our performance numbers, preceded by a description of
our test setup.
The StorageGear SRP Solid State Disk System is an asymmetrical embedded system
based on proprietary firmware and a Supermicro X7DBi+ motherboard with two
2.00GHz Woodcrest processors (four CPUs altogether). The system used in this
test includes two Mellanox SDR PCIe HCAs in 8x slots. Four independent SSDs
(SRP0, SRP1, ...) are configured. SRP0 is made visible on the first HCA port,
SRP1 on the second HCA port, and so on. Each HCA is statically associated
with a CPU at boot time. The native block size of each SSD is 4KB; it can be
configured anywhere from 512B to 64KB. We suspect that 4KB is best for Linux
applications.
"testy" is a small client program that uses Linux asynchronous I/O and O_DIRECT
to drive read and write requests as quickly as possible. It tries to keep a
specified number of reads or writes of specified size outstanding for a
specified time. testy was written because available tools were not able to
load the StorageGear target sufficiently. All testy I/O is random. For an
SSD, random I/O performance should be the same as sequential, so we don't look
at sequential performance at all.
The SRP clients, Tesla and Newton, used in the tests have Asus A8N32-SLI Deluxe
motherboards, each with an AMD 1.8GHz dual-core Opteron 165 processor, 1GB RAM,
and two Mellanox SDR PCIe 8x HCAs in 16x slots, running OFED-1.2 with SRP on
SUSE Linux Enterprise Server 10 (x86_64). Tesla runs kernel 2.6.16.27-0.9-smp
and Newton runs kernel 2.6.16.21-0.8-smp.
Two Mellanox MTEK 43132 8-port 4x switches are used to implement two subnets.
SMs for each subnet are provided by separate systems.
For these tests, four testys are run, two per client, one per SRP target. The
paths are arranged through visibility and allow/deny configuration to use all
four client ports and all four SRP target ports. We monitor our target CPU
utilization and we know that the maximum number of "small" IOPS for a
particular HCA is reached when the CPU associated with the HCA reaches 100%
utilization. All numbers are 90-second testy run averages.
4KB Random Standard Reads
testy     target   target iops   target hca   hca iops
--------  ------   -----------   ----------   --------
newton.0  srp0           30636
newton.1  srp1           30682   hca0            61318
tesla.0   srp2           30680
tesla.1   srp3           30710   hca1            61390
4KB Random Standard Writes
testy     target   target iops   target hca   hca iops
--------  ------   -----------   ----------   --------
newton.0  srp0           25201
newton.1  srp1           25291   hca0            50492
tesla.0   srp2           25412
tesla.1   srp3           25441   hca1            50853
4KB Random IU Data Writes
testy     target   target iops   target hca   hca iops
--------  ------   -----------   ----------   --------
newton.0  srp0           31993
newton.1  srp1           32526   hca0            64519
tesla.0   srp2           32172
tesla.1   srp3           32594   hca1            64766
-
64KB Random Standard Reads
testy     target   target mbps   target hca   hca mbps
--------  ------   -----------   ----------   --------
newton.0  srp0           681.2
newton.1  srp1           681.2   hca0           1362.4
tesla.0   srp2           680.1
tesla.1   srp3           680.2   hca1           1360.3
128KB Random Standard Writes
testy     target   target mbps   target hca   hca mbps
--------  ------   -----------   ----------   --------
newton.0  srp0           747.8
newton.1  srp1           739.5   hca0           1487.3
tesla.0   srp2           747.2
tesla.1   srp3           738.7   hca1           1485.9
-
The following tests are one testy to one srp target.
4KB Random Standard Reads
testy   target   target iops   target hca   hca iops
-----   ------   -----------   ----------   --------
tesla   srp3           59289   hca1            59289
4KB Random Standard Writes
testy   target   target iops   target hca   hca iops
-----   ------   -----------   ----------   --------
tesla   srp3           43054   hca1            43054
4KB Random IU Data Writes
testy   target   target iops   target hca   hca iops
-----   ------   -----------   ----------   --------
tesla   srp3           53839   hca1            53839
128KB Random Standard Reads
testy   target   target mbps   target hca   hca mbps
-----   ------   -----------   ----------   --------
tesla   srp3           971.9   hca1            971.9
128KB Random Standard Writes
testy   target   target mbps   target hca   hca mbps
-----   ------   -----------   ----------   --------
tesla   srp3           881.5   hca1            881.5
We have also done some testing with directly connected DDR HCAs; they
provide an IOPS boost in the range of 10%.
Ken Jeffries
StorageGear