[openib-general][patch review] srp: fmr implementation
Roland Dreier
rdreier at cisco.com
Tue Apr 11 21:23:16 PDT 2006
Vu> Hi Roland, Sorry to take this long to respond. Thanks for all
Vu> the enhancements. I cced an Engenio engineer who can help
Vu> send the latest FW to you.
Thanks... I haven't been good about following up with Engenio about
this issue (IOs with a single direct region of > 58 * 4096 bytes
failing to complete).
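For scale, the size threshold in that description can be worked out with a small sketch. The names below are made up for illustration and are not taken from the SRP driver; only the two numbers come from the description above.

```python
# Illustrative only: names are hypothetical, not from the SRP driver.
UNIT_BYTES = 4096      # the 4096 in "58 * 4096 bytes"
MAX_OK_UNITS = 58      # the 58 in "58 * 4096 bytes"

# Largest single direct region that is not described as failing.
THRESHOLD_BYTES = MAX_OK_UNITS * UNIT_BYTES  # 237568 bytes, i.e. 232 KiB


def exceeds_threshold(length_bytes: int) -> bool:
    """Return True if a single direct region is larger than 58 * 4096 bytes."""
    return length_bytes > THRESHOLD_BYTES
```

So a single direct region of 59 * 4096 = 241664 bytes would fall in the failing range, while one of exactly 237568 bytes would not.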
Vu> Here is my status of testing this patch. On an x86-64 system I
Vu> got a data corruption problem reported after ~4 hrs of running
Vu> Engenio's Smash test tool when I tested with Engenio storage.
Vu> On an ia64 system I got multiple async event 3
Vu> (IB_EVENT_QP_ACCESS_ERR) and event 1 (IB_EVENT_QP_FATAL);
Vu> finally the error handling path kicked in and the system
Vu> panicked. Please see log below (I tested with Mellanox's srp
Vu> target reference implementation - I don't see this error
Vu> without the patch)
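For anyone reading the log, the two async event numbers quoted above can be decoded with a small lookup like the one below. This is only a sketch: the table holds just the two values named in the report, and the helper name is hypothetical.

```python
# Illustrative only: covers just the two event numbers named in the report.
ASYNC_EVENT_NAMES = {
    1: "IB_EVENT_QP_FATAL",
    3: "IB_EVENT_QP_ACCESS_ERR",
}


def describe_async_event(code: int) -> str:
    """Map an async event number from the log to its symbolic name."""
    return ASYNC_EVENT_NAMES.get(code, f"unknown async event {code}")
```

Any other event number is reported as unknown rather than guessed at.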
Hmm, that's interesting. Did you see this type of problem with the
original FMR patch you wrote (and did you do this level of stress
testing)? I'm wondering whether the issue is in the SRP driver, or
whether there is a bug in the FMR stuff at a lower level.
What kind of HCAs were you using? I assume on ia64 you're using
PCI-X, what about on x86-64? PCIe or not? Memfree or not?
Another thing that might be useful if it's convenient for you would be
to use an IB analyzer and trigger on a NAK to see what happens on the
wire around the IB_EVENT_QP_ACCESS_ERR.
Thanks,
Roland