[ofw] [IPoIB_NDIS6_CM] [Patch 0/3] Improper IPoIB behavior during simultaneous sends on both sides

Alex Naslednikov xalex at mellanox.co.il
Tue Nov 9 08:19:36 PST 2010


We ran into several problems when sending large files over IPoIB (each side copied a large file shared over NFS):
Problem #1 - ib_post_send returned IB_INVALID_PARAMETER
Problem #2 - IPoIB disappeared on one of the machines during the copy

Problem #2 was a consequence of #1, because:
1. ib_post_send returned IB_INVALID_PARAMETER, and the "hung" flag was set as a result.
2. NDIS detected that IPoIB was stuck and sent a restart command, but there was a race in this flow.

There are 3 patches that fix these problems:
Patch #1:
__ipoib_reset_adapter runs in a separate thread started from ipoib_adapter_reset, and it changes the value of p_adapter->ipoib_state. Meanwhile, ipoib_adapter_reset calls shutter_shut and also checks and changes ipoib_state.
Thus the race (which we actually hit) is that __ipoib_reset_adapter starts running before the call to shutter_shut.
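
To make the ordering problem concrete, here is a small single-threaded toy model. It is only a sketch: every type and function name in it is hypothetical, none of it is taken from the driver, and the idea that the worker reopens the shutter and marks the adapter as running when it finishes is an assumption made for illustration. It replays the worst-case schedule for each ordering, i.e. the worker runs to completion as soon as it is started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the driver's real types. */
typedef enum { ADAPTER_RUNNING, ADAPTER_IN_RESET } toy_state_t;

typedef struct {
    toy_state_t state;
    bool shutter_shut;     /* true while new sends are blocked */
    bool worker_saw_open;  /* worker ran while sends could still enter */
} toy_adapter_t;

/* Models the reset worker. It expects the shutter to be shut already;
 * when done it reopens the shutter and marks the adapter as running
 * (assumed behavior, for illustration only). */
void toy_reset_worker(toy_adapter_t *a)
{
    assert(a != NULL);
    if (!a->shutter_shut)
        a->worker_saw_open = true;
    a->shutter_shut = false;
    a->state = ADAPTER_RUNNING;
}

/* Models the handler side: block new sends, mark the reset in progress. */
void toy_handler_prepare(toy_adapter_t *a)
{
    assert(a != NULL);
    a->shutter_shut = true;
    a->state = ADAPTER_IN_RESET;
}

/* Racy order: the worker is started first and finishes before the
 * handler has shut the shutter and updated the state. */
toy_adapter_t toy_reset_worker_first(void)
{
    toy_adapter_t a = { ADAPTER_RUNNING, false, false };
    toy_reset_worker(&a);
    toy_handler_prepare(&a);
    return a;
}

/* Safe order: the handler finishes its own state changes, and only
 * then is the worker allowed to run. */
toy_adapter_t toy_reset_prepare_first(void)
{
    toy_adapter_t a = { ADAPTER_RUNNING, false, false };
    toy_handler_prepare(&a);
    toy_reset_worker(&a);
    return a;
}
```

In this model toy_reset_worker_first() leaves the adapter in ADAPTER_IN_RESET with the shutter shut and nothing left to reopen it, which resembles the "IPoIB disappeared" symptom. toy_reset_prepare_first() ends in ADAPTER_RUNNING with the shutter open.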

Patch #2:
When IPoIB has to send an NBL with more SG elements than the HW can handle in a single send, it switches to the 'send_copy' flow. But send_gen always returned a non-success status in this case (caused by the CM flow commit).
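
A minimal sketch of the intended fallback behavior, assuming the copy path simply flattens all fragments into one contiguous bounce buffer. All names here (toy_send, toy_send_copy, toy_sge_t and so on) are hypothetical and do not come from the IPoIB sources. The point it illustrates is that the general send routine should pass on the status of the copy path rather than report failure just because the fallback was taken.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not the driver's definitions. */
typedef enum { TOY_SUCCESS = 0, TOY_FAILURE = 1 } toy_status_t;
typedef enum { TOY_PATH_NONE, TOY_PATH_GATHER, TOY_PATH_COPY } toy_path_t;
typedef struct { const void *addr; size_t len; } toy_sge_t;

/* Copy path: flatten every fragment into one contiguous buffer so the
 * send needs a single SG element. Fails only if the data does not fit. */
toy_status_t toy_send_copy(const toy_sge_t *sge, size_t n,
                           unsigned char *bounce, size_t cap,
                           size_t *out_len)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        if (sge[i].len > cap - off)   /* off <= cap always holds here */
            return TOY_FAILURE;
        memcpy(bounce + off, sge[i].addr, sge[i].len);
        off += sge[i].len;
    }
    *out_len = off;
    return TOY_SUCCESS;
}

/* General send: use the fragments directly when the HW limit allows,
 * otherwise fall back to the copy path and return ITS status. */
toy_status_t toy_send(const toy_sge_t *sge, size_t n, size_t max_sge,
                      unsigned char *bounce, size_t cap,
                      size_t *out_len, toy_path_t *path)
{
    assert(sge != NULL && out_len != NULL && path != NULL);
    if (n <= max_sge) {
        *path = TOY_PATH_GATHER;  /* nothing copied in this case */
        *out_len = 0;
        return TOY_SUCCESS;
    }
    *path = TOY_PATH_COPY;
    return toy_send_copy(sge, n, bounce, cap, out_len);
}
```

With three 2-byte fragments and max_sge set to 2, toy_send takes the copy path and returns TOY_SUCCESS with 6 bytes in the bounce buffer. It returns TOY_FAILURE only when the bounce buffer is too small.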

Patch #3:
An invalid max SGE calculation caused ib_post_send to fail.
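
This message does not give the actual formula, so the following is only a sketch of the general idea, under two assumptions: that the post fails when a request carries more SG entries than the queue pair accepts, and that the driver may need to keep some entries in reserve (the "reserved" parameter is purely illustrative). All names are hypothetical and are not the IBAL or IPoIB API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status codes for illustration only. */
typedef enum { TOY_OK, TOY_INVALID_PARAMETER } toy_post_status_t;

/* Largest SGE count the driver may use per send: never more than the
 * device limit minus the entries the driver keeps in reserve, and never
 * more than the driver asked for. */
uint32_t toy_usable_sge(uint32_t device_max_sge, uint32_t driver_wanted,
                        uint32_t reserved)
{
    if (device_max_sge <= reserved)
        return 0;
    uint32_t avail = device_max_sge - reserved;
    return driver_wanted < avail ? driver_wanted : avail;
}

/* Stand-in for the post-send check: reject a request whose SGE count
 * exceeds what the queue pair was created with. */
toy_post_status_t toy_post_send(uint32_t num_sge, uint32_t qp_max_sge)
{
    if (num_sge > qp_max_sge)
        return TOY_INVALID_PARAMETER;
    return TOY_OK;
}
```

If the limit used when building send requests comes from a clamp like toy_usable_sge(), a request can never exceed what the queue pair accepts. If the limit is computed too high, toy_post_send() rejects the request in the same way as described in Problem #1.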

Alexander (XaleX) Naslednikov
SW Networking Team
Mellanox Technologies



