[ofa-general] OFED SRP Client / StorageGear Target / Performance with Modified Write Protocol

Ken Jeffries kenjeffries at storagegear.com
Mon Jul 30 08:30:18 PDT 2007


We have been doing a fair amount of performance testing on our SRP target.
One thing we found early on was that client writes were considerably slower
than client reads. We addressed this by patching the SRP client code so
that it could include the client write data in the SRP CMD IU if it would
fit. This notion is in iSER but is not in standard SRP. Architecturally,
the capability is signaled using an additional data buffer format bit.
We find that client write performance is considerably improved by using
this capability. We are calling SRP-spec-compliant writes "standard
writes" and our modified writes "iu data writes".
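In standard SRP, the SRP_CMD IU for a write carries only a descriptor of the initiator's data buffer, and the target fetches the data with an RDMA Read. An iu data write puts the payload itself after the command header when it fits. The per-write decision a patched client has to make might look like the sketch below. The 48-byte fixed header and the 0/1/2 descriptor format codes follow the SRP definitions (struct srp_cmd in Linux's include/scsi/srp.h). The format code 3, the function names and the parameters are hypothetical, because the post does not spell out the patch.

```python
# Fixed part of an SRP_CMD IU including its 16-byte CDB: 48 bytes,
# matching struct srp_cmd in Linux's include/scsi/srp.h.
SRP_CMD_FIXED_LEN = 48

# Data buffer descriptor formats defined by SRP.
SRP_NO_DATA_DESC = 0
SRP_DATA_DESC_DIRECT = 1
SRP_DATA_DESC_INDIRECT = 2
# HYPOTHETICAL code, not part of SRP: "write data follows the CDB in this IU".
SRP_DATA_DESC_IU_DATA = 3


def iu_data_write_fits(max_it_iu_len, data_len, add_cdb_bytes=0):
    """True if the header, any additional CDB bytes and the write payload
    all fit in one initiator-to-target IU of at most max_it_iu_len bytes."""
    return SRP_CMD_FIXED_LEN + add_cdb_bytes + data_len <= max_it_iu_len


def write_buf_fmt(target_accepts_iu_data, max_it_iu_len, data_len, add_cdb_bytes=0):
    """Return the SRP_CMD buf_fmt byte for a write.

    The data-out format occupies the upper nibble. If the (hypothetical)
    iu data format was negotiated at login and the payload fits, use it;
    otherwise fall back to a standard write, simplified here to a single
    direct descriptor.
    """
    if target_accepts_iu_data and iu_data_write_fits(max_it_iu_len, data_len, add_cdb_bytes):
        fmt = SRP_DATA_DESC_IU_DATA
    else:
        fmt = SRP_DATA_DESC_DIRECT
    return fmt << 4
```

For example, with a hypothetical 8192-byte maximum IU length a 4KB write fits (48 + 4096 bytes), while with a 4096-byte limit it does not and the client falls back to a standard write.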

We also implemented a similar capability for client reads, but on our system
we did not see a performance improvement.

We would like to know if other SRP'rs would be interested in our making
the patch available, either for inclusion or for discussion. Since we did
this without input from anyone else, we are not going to claim that the
way we did it is necessarily the best way to do it.

Below are some of our performance numbers, preceded by a description of
our test setup.

The StorageGear SRP Solid State Disk System is an asymmetrical embedded system
based on proprietary firmware and a Supermicro X7DBi+ motherboard with two
2.00GHz Woodcrest processors (four cpus altogether). The system used in this
test includes two Mellanox sdr pci-e hcas in 8x slots. Four independent SSDs
(SRP0, SRP1, ...) are configured. SRP0 is made visible on the first hca port,
SRP1 is made visible on the second hca port and so on. Each hca is statically
associated with a cpu at boot time. The native block size of each ssd is 4KB.
The native block size can be configured to be from 512B to 64KB. We suspect
that 4KB is best for Linux applications.

"testy" is a small client program that uses Linux asynchronous i/o and O_DIRECT
to drive read and write requests as quickly as possible. It tries to keep a
specified number of reads or writes of specified size outstanding for a
specified time. testy was written because available tools were not able to
load the StorageGear target sufficiently. All testy io is random. For an
SSD, random io performance should be the same as sequential, so we don't look
at sequential performance at all.
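testy itself is not included in the post. Its core idea, keeping a fixed number of random, block-aligned reads in flight for a fixed time and reporting the completion rate, can be sketched as follows. This is a hypothetical illustration, not StorageGear's code: it swaps a thread pool with os.pread in for Linux native asynchronous i/o and leaves out O_DIRECT so it runs against an ordinary file. The function name and parameters are made up.

```python
import os
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def random_read_iops(path, io_size=4096, queue_depth=32, duration_s=1.0, seed=None):
    """Keep `queue_depth` random reads of `io_size` bytes outstanding against
    `path` for roughly `duration_s` seconds; return completed reads/second.

    Sketch only: threads + os.pread stand in for Linux AIO, and O_DIRECT
    is omitted so the example works on any file system.
    """
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        nslots = os.fstat(fd).st_size // io_size
        if nslots == 0:
            raise ValueError("target is smaller than one I/O")

        def submit(pool):
            # Random but io_size-aligned offset, as O_DIRECT would require.
            offset = rng.randrange(nslots) * io_size
            return pool.submit(os.pread, fd, io_size, offset)

        completed = 0
        start = time.monotonic()
        deadline = start + duration_s
        with ThreadPoolExecutor(max_workers=queue_depth) as pool:
            pending = {submit(pool) for _ in range(queue_depth)}
            while pending:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for fut in done:
                    fut.result()  # re-raise any I/O error
                    completed += 1
                    # Refill the queue until the run time is used up.
                    if time.monotonic() < deadline:
                        pending.add(submit(pool))
        return completed / (time.monotonic() - start)
    finally:
        os.close(fd)
```

Refilling one request per completion is what keeps the queue depth constant; with native AIO the same loop would be built on submit and reap calls instead of futures.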

The SRP clients, Tesla and Newton, used in the tests have Asus A8N32-SLI Deluxe
motherboards, each with an AMD 1.8GHz Dual Core Opteron 165 processor, 1GB ram,
2 Mellanox sdr pci-e 8x hcas in 16x slots running OFED-1.2 with SRP on SUSE
Linux Enterprise Server 10 (x86_64).  Tesla runs kernel 2.6.16.27-0.9-smp and
Newton runs kernel 2.6.16.21-0.8-smp.

Two Mellanox MTEK 43132 8-port 4x switches are used to implement two subnets.
SMs for each subnet are provided by separate systems.

For these tests, four testys are run: two per client, one per srp target. The
paths are arranged through visibility and allow/deny configuration to use all
four client ports and all four srp target ports. We monitor target cpu
utilization, and we know that the maximum number of "small" iops for a
particular hca is reached when the cpu associated with that hca reaches 100%
utilization. All numbers are 90-second testy run averages.

4KB Random Standard Reads
testy       target  target iops     target hca  hca iops
-----       ------  -----------     ----------  --------
newton.0    srp0    30636
newton.1    srp1    30682
                                    hca0        61318
tesla.0     srp2    30680
tesla.1     srp3    30710
                                    hca1        61390

4KB Random Standard Writes
testy       target  target iops     target hca  hca iops
-----       ------  -----------     ----------  --------
newton.0    srp0    25201
newton.1    srp1    25291
                                    hca0        50492
tesla.0     srp2    25412
tesla.1     srp3    25441
                                    hca1        50853

4KB Random IU Data Writes
testy       target  target iops     target hca  hca iops
-----       ------  -----------     ----------  --------
newton.0    srp0    31993
newton.1    srp1    32526
                                    hca0        64519
tesla.0     srp2    32172
tesla.1     srp3    32594
                                    hca1        64766
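The per-hca totals and the relative gain of iu data writes can be recomputed directly from the 4KB four-testy tables. The sketch below uses only the figures reported above; the helper names are illustrative.

```python
# Per-target IOPS from the 4KB four-testy tables (90-second averages).
STD_READS = {"hca0": [30636, 30682], "hca1": [30680, 30710]}
STD_WRITES = {"hca0": [25201, 25291], "hca1": [25412, 25441]}
IU_WRITES = {"hca0": [31993, 32526], "hca1": [32172, 32594]}


def per_hca(table):
    """Sum the two targets served by each hca (the "hca iops" column)."""
    return {hca: sum(iops) for hca, iops in table.items()}


def pct_gain(new, old):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


reads, std, iu = per_hca(STD_READS), per_hca(STD_WRITES), per_hca(IU_WRITES)
for hca in sorted(reads):
    print(f"{hca}: IU data writes {pct_gain(iu[hca], std[hca]):+.1f}% vs standard writes, "
          f"{pct_gain(iu[hca], reads[hca]):+.1f}% vs standard reads")
```

On these numbers, iu data writes run roughly 27 to 28% above standard writes on each hca, and about 5% above standard reads.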
-

64KB Random Standard Reads
testy       target  target MB/s     target hca  hca MB/s
-----       ------  -----------     ----------  --------
newton.0    srp0    681.2
newton.1    srp1    681.2
                                    hca0        1362.4
tesla.0     srp2    680.1
tesla.1     srp3    680.2
                                    hca1        1360.3

128KB Random Standard Writes
testy       target  target MB/s     target hca  hca MB/s
-----       ------  -----------     ----------  --------
newton.0    srp0    747.8
newton.1    srp1    739.5
                                    hca0        1487.3
tesla.0     srp2    747.2
tesla.1     srp3    738.7
                                    hca1        1485.9
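To set the large-block throughput numbers beside the 4KB iops numbers, the two can be converted with iops = throughput / io size. This is a minimal sketch, assuming the throughput figures are megabytes per second with 1 MB = 1024 KB; the post does not say which megabyte is meant.

```python
# ASSUMPTION: throughput figures are megabytes per second with 1 MB = 1024 KB.
KB_PER_MB = 1024


def mbps_to_iops(mb_per_s, io_kb):
    """Requests per second needed to move mb_per_s with io_kb-sized I/Os."""
    return mb_per_s * KB_PER_MB / io_kb


def iops_to_mbps(iops, io_kb):
    """Throughput produced by `iops` requests per second of io_kb each."""
    return iops * io_kb / KB_PER_MB


print(f"64KB reads, hca0:   {mbps_to_iops(1362.4, 64):.0f} iops")
print(f"128KB writes, hca0: {mbps_to_iops(1487.3, 128):.0f} iops")
print(f"4KB reads, hca0:    {iops_to_mbps(61318, 4):.1f} MB/s")
```

Under that assumption, the 64KB reads on hca0 correspond to about 21,800 iops and the 128KB writes to about 11,900 iops, while the 4KB reads on hca0 move about 240 MB/s.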
-

The following tests run one testy against one srp target.

4KB Random Standard Reads
testy       target  target iops     target hca  hca iops
-----       ------  -----------     ----------  --------
tesla       srp3    59289
                                    hca1        59289

4KB Random Standard Writes
testy       target  target iops     target hca  hca iops
-----       ------  -----------     ----------  --------
tesla       srp3    43054
                                    hca1        43054

4KB Random IU Data Writes
testy       target  target iops     target hca  hca iops
-----       ------  -----------     ----------  --------
tesla       srp3    53839
                                    hca1        53839

128KB Random Standard Reads
testy       target  target MB/s     target hca  hca MB/s
-----       ------  -----------     ----------  --------
tesla       srp3    971.9
                                    hca1        971.9

128KB Random Standard Writes
testy       target  target MB/s     target hca  hca MB/s
-----       ------  -----------     ----------  --------
tesla       srp3    881.5
                                    hca1        881.5
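The single-testy results can also be expressed as ratios, which makes the effect of iu data writes easier to see. The sketch below uses only the figures from the single-target tables; the constant and function names are illustrative.

```python
# Single testy against srp3 through hca1 (90-second averages).
READS_4K = 59289          # iops
STD_WRITES_4K = 43054     # iops
IU_WRITES_4K = 53839      # iops
READS_128K = 971.9        # throughput, as reported in the tables
WRITES_128K = 881.5


def pct_of(value, reference):
    """`value` expressed as a percentage of `reference`."""
    return 100.0 * value / reference


print(f"4KB IU data writes:    {pct_of(IU_WRITES_4K, STD_WRITES_4K):.1f}% of standard writes")
print(f"4KB IU data writes:    {pct_of(IU_WRITES_4K, READS_4K):.1f}% of reads")
print(f"4KB standard writes:   {pct_of(STD_WRITES_4K, READS_4K):.1f}% of reads")
print(f"128KB standard writes: {pct_of(WRITES_128K, READS_128K):.1f}% of reads")
```

With a single testy, standard 4KB writes reach about 73% of the read rate, while iu data writes reach about 91%, roughly 25% more than standard writes.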


We have done some testing with directly connected DDR
hcas. The DDR hcas provide an iops boost of roughly 10%.

Ken Jeffries
StorageGear



