[openib-general][patch review] srp: fmr implementation

Roland Dreier rdreier at cisco.com
Tue Apr 11 21:23:16 PDT 2006


    Vu> Hi Roland, Sorry to take this long to respond. Thanks for all
    Vu> the enhancements.  I cc'ed some of Engenio's engineers, who can
    Vu> help send you the latest FW.

Thanks... I haven't been good about following up with Engenio about
this issue (IOs with a single direct region of > 58 * 4096 bytes
failing to complete).
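For scale, the threshold mentioned above is 58 * 4096 = 237568 bytes (232 KiB). The C sketch below shows a check a tester might use to flag IOs that would hit it. The macro and function names are hypothetical, and the sketch assumes 4096-byte pages and that the limit applies to the length of a single direct data descriptor, as described above; it is not code from the SRP driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants mirroring the reported limit:
 * 58 pages of 4096 bytes each, i.e. 237568 bytes. */
#define DIRECT_LIMIT_PAGES 58u
#define DIRECT_PAGE_SIZE   4096u
#define DIRECT_LIMIT_BYTES (DIRECT_LIMIT_PAGES * DIRECT_PAGE_SIZE)

/* Hypothetical helper: returns true if an IO described by a single
 * direct region of `len` bytes is larger than the reported limit. */
static bool exceeds_direct_limit(uint32_t len)
{
    return len > DIRECT_LIMIT_BYTES;
}
```

An IO of exactly 237568 bytes is still within the limit; only larger single-region IOs were reported to fail.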

    Vu> Here is the status of my testing of this patch.  On an x86-64
    Vu> system I got a data corruption problem reported after ~4 hrs of
    Vu> running Engenio's Smash test tool against Engenio storage.  On
    Vu> an ia64 system I got multiple async events 3
    Vu> (IB_EVENT_QP_ACCESS_ERR) and even event 1 (IB_EVENT_QP_FATAL);
    Vu> finally the error handling path kicked in and the system
    Vu> panicked. Please see the log below. (I tested with Mellanox's
    Vu> SRP target reference implementation; I don't see this error
    Vu> without the patch.)
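The numeric codes in this report are values of the kernel's enum ib_event_type (defined in ib_verbs.h), where 3 is IB_EVENT_QP_ACCESS_ERR and 1 is IB_EVENT_QP_FATAL. A minimal C sketch for translating such codes when reading logs follows. It lists only the first five enumerators (values 0 through 4), and the helper names are hypothetical, not kernel APIs.

```c
#include <stddef.h>
#include <string.h>

/* Names of the first five enumerators of enum ib_event_type, indexed
 * by value (0..4). Later enumerators are deliberately not listed. */
static const char *const ib_event_names[] = {
    "IB_EVENT_CQ_ERR",        /* 0 */
    "IB_EVENT_QP_FATAL",      /* 1 */
    "IB_EVENT_QP_REQ_ERR",    /* 2 */
    "IB_EVENT_QP_ACCESS_ERR", /* 3 */
    "IB_EVENT_COMM_EST",      /* 4 */
};

#define IB_EVENT_NAMES_LEN (sizeof(ib_event_names) / sizeof(ib_event_names[0]))

/* Hypothetical helper: map a numeric async event code, as printed in a
 * log, to its enumerator name; returns "unknown" for unlisted values. */
static const char *ib_event_name(int event)
{
    if (event < 0 || (size_t)event >= IB_EVENT_NAMES_LEN)
        return "unknown";
    return ib_event_names[event];
}

/* Hypothetical helper: reverse lookup from name to numeric code;
 * returns -1 for names not in the table. */
static int ib_event_value(const char *name)
{
    for (size_t i = 0; i < IB_EVENT_NAMES_LEN; i++)
        if (strcmp(ib_event_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

Keeping the table indexed by value makes each lookup a bounds check plus an array access; anything past IB_EVENT_COMM_EST falls back to "unknown" rather than guessing at later enumerators.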

Hmm, that's interesting.  Did you see this type of problem with the
original FMR patch you wrote (and did you do this level of stress
testing)?  I'm wondering whether the issue is in the SRP driver, or
whether there is a bug in the FMR stuff at a lower level.

What kind of HCAs were you using?  I assume on ia64 you're using
PCI-X; what about on x86-64?  PCIe or not?  Memfree or not?

If it's convenient for you, another thing that might be useful would
be to put an IB analyzer on the link and trigger on a NAK to see what
happens on the wire around the IB_EVENT_QP_ACCESS_ERR.

Thanks,
  Roland


