[openib-general] Re: [Fwd: Re: [Fwd: Re: OpenSM Bugs]]

Hal Rosenstock halr at voltaire.com
Thu Jan 20 11:41:08 PST 2005


On Thu, 2005-01-20 at 13:45, Tom Duffy wrote:
> On Wed, 2005-01-19 at 19:15 -0500, Hal Rosenstock wrote:
> > The OpenIB side wouldn't send an ACK so this message must indicate
> > something else. Any chance you can get an IB trace of what is going on ?
> > Alternatively can you dump out the MAD where the unsupported base
> > version is printed out ? It will take me a little bit before I am set up
> > to try to recreate this.
> 
> Here is the analysis of what is going on (from Solaris's perspective).
> 
> Sandip Barua wrote:
> > Based on the trace output I obtained this morning,
> > I am seeing a number of the following transactions:
> > 
> > 1) IBMF makes a request (non-RMPP send)

This is the SA GetTable request.

> > 2) openIB responds with one RMPP MAD which is marked as both
> >    the first and the last packet of the RMPP transaction

This is the SA GetTableResp.

> > 3) IBMF sends an ACK

Not 100% sure what the OpenIB side would do with this. The ACK is an SA 
packet which should have the MAD header properly filled in.

Can you insert some code to dump the first 32 bytes of the MAD when the
base version is unsupported (in the if clause at line 1463)
and send me the output ?
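
Something along these lines would do. This is only a sketch in standalone
userspace C, not the actual code around that if clause; the helper name
mad_dump_prefix and the 16-bytes-per-line layout are made up for
illustration. The first 32 bytes cover the 24-byte common MAD header plus
the first 8 bytes that follow it.

```c
#include <stddef.h>
#include <stdio.h>

#define MAD_DUMP_LEN 32  /* 24-byte common MAD header + the next 8 bytes */

/*
 * Hypothetical helper: format up to the first 32 bytes of a MAD as hex,
 * 16 bytes per line. Returns the number of characters written (not
 * counting the terminating NUL), or -1 if the output buffer is too small.
 */
int mad_dump_prefix(const unsigned char *mad, size_t mad_len,
                    char *out, size_t out_len)
{
    size_t n = mad_len < MAD_DUMP_LEN ? mad_len : MAD_DUMP_LEN;
    size_t pos = 0;
    size_t i;

    if (out_len == 0)
        return -1;

    for (i = 0; i < n; i++) {
        /* Newline after every 16th byte and after the last byte. */
        char sep = (i % 16 == 15 || i == n - 1) ? '\n' : ' ';

        /* Two hex digits, one separator, one NUL. */
        if (pos + 4 > out_len)
            return -1;
        pos += (size_t)snprintf(out + pos, out_len - pos,
                                "%02x%c", mad[i], sep);
    }
    out[pos] = '\0';
    return (int)pos;
}
```

In the real code path the formatted buffer would go to whatever logging
call is already used for the unsupported base version message.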

Also, is there anything in the OSM logs ?

> > At this point, the transaction should be complete,

Agreed.

>  but MADs continue
> > to arrive. In one case an ACK arrives which is an
> > illegal packet to receive while in receiver terminate loop,

OpenIB doesn't generate ACKs so this is a mystery to me.

>  so, ibmf
> > returns an ABORT. 

Yes, if the receiver is terminating, receiving an ACK with IsDS 0 should
cause an ABORT to be sent.

> In the remaining cases, DATA packets are being
> > received after the transaction is complete,

This is also a mystery to me. I don't know what these additional DATA
packets from OpenIB would be.

>  so, ibmf drops them and
> > returns ACKs if in the receiver termination loop.

Looks to me like these should also be ABORTed with BadT in the receiver
termination loop.
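
To make that reading concrete, here is a rough sketch of the decision as I
describe it above, for the IsDS 0 case only. It is not IBMF or OpenIB code.
The function and type names are invented, the RMPP type and BadT status
values are the ones I recall from the IB spec, and using BadT for the ACK
case as well as for DATA is an assumption.

```c
#include <stdint.h>

/* RMPP packet types and the BadT abort status (values assumed from the IB spec). */
enum { RMPP_TYPE_DATA = 1, RMPP_TYPE_ACK = 2, RMPP_TYPE_STOP = 3, RMPP_TYPE_ABORT = 4 };
enum { RMPP_STATUS_BADT = 121 };

enum term_action { TERM_DROP, TERM_SEND_ABORT };

struct term_reply {
    enum term_action action;
    uint8_t status;  /* RMPP status to put in the ABORT, 0 if none is sent */
};

/*
 * Hypothetical: what a receiver sitting in its termination loop sends
 * back for an incoming RMPP packet of the given type.
 */
struct term_reply rmpp_term_loop_reply(uint8_t rmpp_type)
{
    struct term_reply r = { TERM_DROP, 0 };

    switch (rmpp_type) {
    case RMPP_TYPE_ACK:   /* illegal while terminating */
    case RMPP_TYPE_DATA:  /* transaction is already complete */
        r.action = TERM_SEND_ABORT;
        r.status = RMPP_STATUS_BADT;
        break;
    default:
        /* STOP/ABORT and unknown types: drop silently (an assumption). */
        break;
    }
    return r;
}
```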
 
> > From the IBMF side, the following packets are being sent,
> > o request data packet (non-rmpp, will not have base version set)

Do you mean RMPP version (since RMPP is not active) ? Doesn't the base
version always need to be set ?

> > o ACK packet on completion of the transaction
> > o ABORT packet on receiving an ACK
> > 
> > Based on my review of the code, and the occurrence of certain
> > messages, it looks to me like the base version should be set on all
> > outgoing ACK and ABORT packets.
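
For illustration, a sketch of what filling in those fields on an outgoing
SA ACK or ABORT could look like, together with the check the receiver
applies. This is not the IBMF code. The function names are hypothetical,
and the byte offsets and constants are the ones I recall from the IB spec
(base version 1 at byte 0, SA class 0x03 with class version 2, RMPP header
starting at byte 24). Method, TID and attribute fields are left out; a
real sender would copy them from the transaction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAD_SIZE            256
#define MAD_BASE_VERSION    1
#define MAD_CLASS_SUBN_ADM  0x03
#define SA_CLASS_VERSION    2
#define RMPP_VERSION        1
#define RMPP_FLAG_ACTIVE    0x01

/* Byte offsets within the MAD (assumed layout, see the lead-in). */
#define OFF_BASE_VERSION    0
#define OFF_MGMT_CLASS      1
#define OFF_CLASS_VERSION   2
#define OFF_RMPP_VERSION    24
#define OFF_RMPP_TYPE       25
#define OFF_RMPP_FLAGS      26
#define OFF_RMPP_STATUS     27

/* Hypothetical: initialise the version and RMPP fields of an ACK or ABORT. */
void rmpp_fill_control_hdr(uint8_t *mad, uint8_t rmpp_type, uint8_t rmpp_status)
{
    memset(mad, 0, MAD_SIZE);
    mad[OFF_BASE_VERSION]  = MAD_BASE_VERSION;  /* the field in question */
    mad[OFF_MGMT_CLASS]    = MAD_CLASS_SUBN_ADM;
    mad[OFF_CLASS_VERSION] = SA_CLASS_VERSION;
    mad[OFF_RMPP_VERSION]  = RMPP_VERSION;
    mad[OFF_RMPP_TYPE]     = rmpp_type;
    mad[OFF_RMPP_FLAGS]    = RMPP_FLAG_ACTIVE;
    mad[OFF_RMPP_STATUS]   = rmpp_status;
}

/* Hypothetical receive-side check: a zero base version fails here. */
int mad_base_version_ok(const uint8_t *mad, size_t len)
{
    return len > OFF_BASE_VERSION && mad[OFF_BASE_VERSION] == MAD_BASE_VERSION;
}
```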

Thanks for the detailed analysis.

-- Hal





