[ofa-general] SRP on RHEL 5.3/OFED 1.3 vs RHEL 5.1/OFED 1.2?

Ifan W ifan at novaglobal.com.sg
Wed Sep 9 08:12:46 PDT 2009


Hi John,

I saw your post and was curious whether you have found an answer
to your problem.
I am currently facing the same problem on RHEL 5.3 + OFED 1.4
connecting to a DDN 9900.

I would appreciate it if you could share your findings so far.

Thanks.

Regards,
Ifan


On May 23, 2009, at 2:40 AM, John Valdes wrote:

> Hi all,
>
> We have a storage array (a DDN 9550) attached to 8 servers via IB.
> This setup has been running fine for the last 1.5 years or so, with
> the servers running RHEL 5.1 and the OFED (OpenIB) 1.2 stack that's
> included with RHEL 5.1.
>
> Recently, we tried to upgrade to new servers running RHEL 5.3 with
> its bundled OFED 1.3 stack, but now we're seeing frequent timeouts
> resulting in LUN resets and SCSI command aborts between the servers
> and the DDN.  As far as we can tell, our IB setup on the servers under
> 5.3 is identical to the setup under 5.1, so we don't know why we're
> seeing the timeouts and resets.
>
> Is anyone aware of any changes when using IB SRP w/ RHEL 5.3 and OFED
> 1.3 vs RHEL 5.1/OFED 1.2 which might be causing this?
>
> For reference, here are some of the details of our setup:
>
> OLD CONFIGURATION
> -----------------
> * SuperMicro P4DP6 motherboard, w/ dual Xeon CPUs (x86, single core
> "Prestonia"), all circa 2002 hardware
> * Cisco SFS-HCA-X2T7-A1 IB HCA (aka Mellanox Cougar Cub), 133 MHz PCI-X,
> 128 MB memory, Firmware v3.5.917, dual port (port 1 attached to DDN)
> * RHEL 5.1 w/ bundled OFED/OpenIB 1.2
> * ib_mthca module loaded w/o any extra options
> * ib_srp module loaded w/ option "srp_sg_tablesize=255"
> * Connection to DDN established using "srp_daemon" invoked as:
> "srp_daemon -coe" with options "max_sect=8192,max_cmd_per_lun=5"
> given in /etc/srp_daemon.conf (Note that due to a bug in the OFED
> 1.2 srp_daemon, the "max_sect=8192" option is ignored, which is OK
> since we weren't taking advantage of that option).
> * 7 DDN LUNs are accessed by all 8 servers as clustered logical
> volumes (under RedHat's CLVM) holding GFS filesystems.
> * 8 unique (not-shared) DDN LUNs are accessed by the servers (one LUN
> per server) as a plain disk holding an ext3 filesystem.
>
> NEW CONFIGURATION
> -----------------
> * SuperMicro H8DME-2 motherboard, w/ dual quad-core AMD Opteron 2342, x86_64
> * Cisco SFS-HCA-X2T7-A1 IB HCA (aka Mellanox Cougar Cub), 133 MHz PCI-X,
> 128 MB memory, Firmware v3.5.917, dual port (port 1 attached to DDN)
> --same card as in old configuration, physically moved to new servers
> * RHEL 5.3 w/ bundled OFED/OpenIB 1.3
> * ib_mthca module loaded w/o any extra options
> * ib_srp module loaded w/ option "srp_sg_tablesize=255"
> * Connection to DDN established using "srp_daemon" invoked as:
> "srp_daemon -coe -f /etc/ofed/srp_daemon.conf" with options
> "max_sect=8192,max_cmd_per_lun=5" given in srp_daemon.conf
> * 7 DDN LUNs are accessed by all 8 servers as clustered logical
> volumes (under RedHat's CLVM) holding GFS filesystems.
> * 8 unique (not-shared) DDN LUNs are accessed by the servers (one LUN
> per server) as a plain disk holding an ext3 filesystem.
>
>
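
To make the differences between the two setups easier to see, here is a
minimal Python sketch that diffs the OLD and NEW configurations as listed
above. The dictionary keys are hypothetical labels chosen for this sketch
(they are not OFED or kernel setting names); the values are copied from
the two lists.

```python
# Settings copied from the OLD/NEW CONFIGURATION lists in this thread.
# The key names are hypothetical labels chosen for this sketch.
OLD = {
    "motherboard": "SuperMicro P4DP6",
    "arch": "x86",
    "hca": "Cisco SFS-HCA-X2T7-A1",
    "hca_firmware": "3.5.917",
    "os": "RHEL 5.1",
    "ofed": "1.2",
    "ib_srp_options": "srp_sg_tablesize=255",
    "srp_daemon_cmd": "srp_daemon -coe",
    "srp_daemon_conf_options": "max_sect=8192,max_cmd_per_lun=5",
}

# Start from the old settings and override only what the post lists as changed.
NEW = dict(
    OLD,
    motherboard="SuperMicro H8DME-2",
    arch="x86_64",
    os="RHEL 5.3",
    ofed="1.3",
    srp_daemon_cmd="srp_daemon -coe -f /etc/ofed/srp_daemon.conf",
)


def diff_config(old, new):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


if __name__ == "__main__":
    for key, (before, after) in diff_config(OLD, NEW).items():
        print("%-16s %s -> %s" % (key, before, after))
```

Apart from the hardware and the RHEL/OFED versions, the only listed change
is the explicit "-f" config path. Since the OFED 1.2 srp_daemon was said to
ignore "max_sect=8192", it may also be worth checking whether that option
now takes effect under OFED 1.3.
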
> With the new configuration, timeouts/resets have frequently occurred
> when starting up CLVM on the servers (eg, when the servers scan the
> LUNs looking for the Linux (clustered) LVM data) as well as when doing
> I/O to the mounted filesystems.  Just to make sure the CLVM/GFS setup
> wasn't causing problems, we tested the plain ext3 filesystem on the
> non-shared LUN from one of the new servers, and when doing a simple
> "dd" to the LUN, we were still seeing timeouts and LUN resets.
>
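
One way to check that the SRP settings really are the same on an old and a
new server is to dump what the kernel reports and compare the output from
both machines. Below is a minimal Python sketch, assuming the ib_srp
parameter is exposed at /sys/module/ib_srp/parameters/srp_sg_tablesize and
the per-disk request size limit at /sys/block/sd*/queue/max_sectors_kb.
Both paths are assumptions on my side and may differ by kernel version;
the function names are made up for this sketch.

```python
import glob
import os


def read_first_line(path):
    """Return the stripped first line of a file, or None if it is unreadable."""
    try:
        with open(path) as handle:
            return handle.readline().strip()
    except (IOError, OSError):
        return None


def collect_srp_settings(sysfs_root="/sys"):
    """Gather the ib_srp module parameter and the per-disk request size limits.

    The sysfs locations used here are assumptions; adjust them if your
    kernel exposes these values elsewhere.
    """
    settings = {
        "srp_sg_tablesize": read_first_line(
            os.path.join(sysfs_root, "module", "ib_srp", "parameters",
                         "srp_sg_tablesize")
        ),
        "max_sectors_kb": {},
    }
    pattern = os.path.join(sysfs_root, "block", "sd*", "queue", "max_sectors_kb")
    for path in sorted(glob.glob(pattern)):
        disk = path.split(os.sep)[-3]  # .../block/<disk>/queue/max_sectors_kb
        settings["max_sectors_kb"][disk] = read_first_line(path)
    return settings


if __name__ == "__main__":
    print(collect_srp_settings())
```

Running this on one server from each configuration and comparing the two
results would show whether a value such as the maximum request size
differs, even though the module options and srp_daemon.conf look the same.
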
> Does any of this sound familiar to anyone?  Do you have a recommended
> IB/SRP setup for RHEL 5.3?
>
> John
>
> ----------------------------------------------------------------------
> John Valdes                  Mathematics and Computer Science Division
> valdes at anl.gov                             Argonne National Laboratory



