[ofa-general] SRP on RHEL 5.3/OFED 1.3 vs RHEL 5.1/OFED 1.2?

John Valdes valdes at anl.gov
Tue Sep 15 15:46:43 PDT 2009


Hi Ifan,

Sorry for the delay in replying.

> Seeing your post and i was curious whether you have found the answer to 
> your problem.
> I am currently facing the same problem on  RHEL 5.3 + OFED 1.4  
> connecting to  DDN 9900.
>
> Appreciate if you could share your finding so far.

After various tests, we concluded that the problem was with our PCI-X
HCAs:

>> * Cisco SFS-HCA-X2T7-A1 IB HCA (aka Mellanox Cougar Cub), 133 MHz  
>> PCI-X, 128 MB memory, Firmware v3.5.917, dual port (port 1 attached to DDN)

It was either some problem with using PCI-X on our new servers, or
some problem with the firmware on the HCAs themselves.  Whatever it
was, it was causing the HCA to reset with a "catastrophic" error, as
reported by the kernel module.  We tried downgrading to OFED 1.2 (the
version that had been working on our old servers under RHEL 5.1) but
saw the same problems.  We also tried replacing the PCI-X HCAs with
PCI-e HCAs, and the latter worked fine with both OFED 1.2 and 1.3.  We
never tried OFED 1.4, but I would guess that the PCI-X HCAs would show
the same issue there.  So the "fix" was to replace the HCAs.
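
To check whether a machine shows the same symptom, one could scan the
kernel log for the driver's error report.  The following is only a
minimal Python sketch: it assumes the message contains the word
"catastrophic" (the exact wording depends on the kernel module and its
version), and the function name find_catastrophic is made up for
illustration.

```python
import re

# Assumption: the kernel module's message includes the word
# "catastrophic"; the exact text varies by driver and version.
PATTERN = re.compile(r"catastrophic", re.IGNORECASE)


def find_catastrophic(lines):
    """Return the log lines that mention a catastrophic error.

    `lines` is any iterable of strings, e.g. an open log file.
    Trailing newlines are stripped from the returned lines.
    """
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


# Hypothetical usage (the log path is an assumption; adjust it to
# wherever your system writes kernel messages):
#
#     with open("/var/log/messages") as log:
#         for hit in find_catastrophic(log):
#             print(hit)
```

An empty result only means no matching line was found in that log; it
does not rule out a PCI-X or firmware problem.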

What HCA are you using?

John


