[ofw] Possible bug in multi segment MAD reassembly

Jan Bottorff jbottorff at xsigo.com
Tue May 22 02:20:09 PDT 2007


Hi,

I've been debugging an assert, CL_ASSERT( cur_seg <= p_rmpp->seg_limit),
that we see in the function __process_segment in core\al\al_mad.c. It
seems to happen infrequently, perhaps when processing IOC polling
responses (the GetTable query is for IB_MAD_ATTR_NODE_RECORD).

What seems to happen is that __process_segment fails to expand the MAD
buffer in its call to al_resize_mad, perhaps because of a memory
allocation failure. The failure shows up (more often?) under heavy
server load. The result seems to be the assert firing when the next
segment of the query reply comes in. The code appears to assume that
only an initial window of 1 segment is allowed until it is acked, but
I'm not sure the IB stack on the other end (Linux, where the subnet
manager runs) also uses an initial window of only 1 segment. In fact,
I'm not sure the Windows subnet manager assumes an initial window of 1
either. My guess is that when the expansion of the MAD buffer succeeds,
the pipelined segments (those past the window of 1) don't break anything.
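
To make the suspected sequence concrete, here is a toy model in C. It is not the code from al_mad.c: the struct, the sketch_ function names, the result codes, and the alloc_ok flag are all made up for illustration, and it assumes that a failed resize simply leaves seg_limit unchanged. Under that assumption, a sender that pipelines segment 2 without waiting for an ack lands past the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy reassembly state. Field names echo the assert in the post, but this
 * struct is hypothetical and is not the one defined in al_mad.c. */
typedef struct {
    uint32_t expected_seg; /* next segment number the receiver wants */
    uint32_t seg_limit;    /* highest segment number the buffer has room for */
} rmpp_sketch_t;

typedef enum {
    SEG_ACCEPTED,      /* stored, buffer grown for the next segment */
    SEG_RESIZE_FAILED, /* buffer could not grow, seg_limit left unchanged */
    SEG_OUT_OF_ORDER,  /* not the segment we expected */
    SEG_BEYOND_LIMIT   /* where CL_ASSERT( cur_seg <= seg_limit ) would fire */
} seg_result_t;

/* Start with room for exactly one segment (assumed initial window of 1). */
rmpp_sketch_t sketch_init(void)
{
    rmpp_sketch_t r = { 1, 1 };
    return r;
}

/* alloc_ok simulates whether the buffer expansion succeeds. */
seg_result_t sketch_process_segment(rmpp_sketch_t *r, uint32_t cur_seg,
                                    bool alloc_ok)
{
    if (cur_seg > r->seg_limit)
        return SEG_BEYOND_LIMIT;
    if (cur_seg != r->expected_seg)
        return SEG_OUT_OF_ORDER;
    if (!alloc_ok)
        return SEG_RESIZE_FAILED; /* assumed: limit stays where it was */
    r->seg_limit++;    /* room for one more segment */
    r->expected_seg++;
    return SEG_ACCEPTED;
}
```

Starting from sketch_init(), feeding segment 1 with alloc_ok false and then segment 2 gives SEG_RESIZE_FAILED followed by SEG_BEYOND_LIMIT. With alloc_ok true, both segments return SEG_ACCEPTED, which matches the guess that pipelined segments are harmless as long as the buffer expansion succeeds.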

We suspect that setting the ibbus parameter iocPollingInterval to 0 may
make this problem go away. This might also mean the problem gets much
worse with much faster polling.

Does anybody else see this assert, or have any comments on whether this
is a potential bug? This is with svn revision 615.

Thanks,

Jan
