[ofa-general] RE: the SDP module prints allot of error messages to the /var/log/messages
Jim Mott
jim at mellanox.com
Wed Nov 21 08:22:21 PST 2007
Hi,
These messages indicate real errors in zero copy bcopy operations.
They mean that when sdp_sendmsg() returned to user space, the SDP driver
thought that there were in-flight IB send operations pointing to user
pages. Bad things can happen in this case.
I have never seen these in simple testing by hand. The patch I sent
you yesterday for review adds this test to the error recovery path. If
you hit ^C (for example) during a transfer, you may see these messages
but no crash. That patch is not in the current code yet because I need
to understand why cleanup is not coordinated with completions and is
thus generating these messages.
Were the regressions running with the test patch or did you see these in
normal operation? If normal operation, please let me know how to
reproduce.
Thanks,
Jim
Jim Mott
Mellanox Technologies Ltd.
mail: jim at mellanox.com
Phone: 512-294-5481
-----Original Message-----
From: Dotan Barak [mailto:dotanb at dev.mellanox.co.il]
Sent: Wednesday, November 21, 2007 10:01 AM
To: Jim Mott; openib-general
Subject: the SDP module prints allot of error messages to the
/var/log/messages
In our nightly regression I noticed that /var/log/messages is filled
with the following error messages:
Nov 21 17:28:30 sw186 kernel: sdp_sock(42203:19000): Could not reap -32 in-flight sends
Nov 21 17:28:30 sw186 kernel: sdp_sock(42202:19010): Could not reap -2 in-flight sends
Nov 21 17:28:30 sw186 kernel: sdp_sock(42203:19000): Could not reap -29 in-flight sends
Nov 21 17:28:30 sw186 kernel: sdp_sock(42204:19005): Could not reap -14 in-flight sends
Nov 21 17:28:30 sw186 kernel: sdp_sock(42202:19010): Could not reap -2 in-flight sends
Nov 21 17:28:30 sw186 kernel: sdp_sock(42202:19010): Could not reap -7 in-flight sends
Nov 21 17:28:30 sw186 kernel: sdp_sock(42203:19000): Could not reap -28 in-flight sends
Nov 21 17:28:30 sw186 kernel: sdp_sock(42203:19000): Could not reap -4 in-flight sends
Nov 21 17:28:30 sw186 kernel: sdp_sock(42203:19000): Could not reap -32 in-flight sends
Nov 21 17:28:31 sw186 kernel: sdp_sock(42204:19005): Could not reap -14 in-flight sends
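To see which sockets are affected and how often, messages like the ones
above can be tallied per socket. What follows is only a minimal sketch in
Python, assuming the message format is exactly as in this excerpt. The
function name summarize_reap_errors is hypothetical, and the two numbers
in parentheses after sdp_sock are treated as an opaque label because
their meaning is not stated here.

```python
import re
from collections import defaultdict

# Pattern assumed from the log excerpt in this thread. The label inside
# sdp_sock(...) is kept as an opaque string; the reported count may be
# negative, as it is in every message shown.
PATTERN = re.compile(
    r"sdp_sock\((\d+:\d+)\): Could not reap (-?\d+)\s+in-flight sends"
)


def summarize_reap_errors(text):
    """Return {socket_label: [reported counts, in order of appearance]}."""
    # Mail clients may wrap long log lines, so fold every line break
    # (and surrounding whitespace) into a single space before matching.
    joined = re.sub(r"\s*\n\s*", " ", text)
    summary = defaultdict(list)
    for label, count in PATTERN.findall(joined):
        summary[label].append(int(count))
    return dict(summary)
```

Passing the contents of /var/log/messages to summarize_reap_errors()
gives one list of reported counts per socket label, which makes it easier
to tell whether a few connections produce most of the messages.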
Are those error messages really necessary?
thanks
Dotan