[openib-general] Re: [Fwd: Re: [Fwd: Re: OpenSM Bugs]]

sandip barua Sandip.Barua at Sun.COM
Thu Jan 20 13:07:28 PST 2005


I found the bug that caused the incorrect base version in outgoing
MADs sent from the receiver termination loop, and I fixed the software
on Tom's Solaris system. From a brief look at the traces, I still see
the incoming ACK, but not many other unexpected packets.
A couple of responses are inline below as well.

- Sandip

Hal Rosenstock wrote:
> On Thu, 2005-01-20 at 13:45, Tom Duffy wrote:
> 
>>On Wed, 2005-01-19 at 19:15 -0500, Hal Rosenstock wrote:
>>
>>>The OpenIB side wouldn't send an ACK so this message must indicate
>>>something else. Any chance you can get an IB trace of what is going on ?
>>>Alternatively can you dump out the MAD where the unsupported base
>>>version is printed out ? It will take me a little bit before I am setup
>>>to try to recreate this.
>>
>>Here is the analysis of what is going on (from Solaris's perspective).
>>
>>Sandip Barua wrote:
>>
>>>Based on the trace output I obtained this morning,
>>>I am seeing a number of the following transactions:
>>>
>>>1) IBMF makes a request (non-RMPP send)
> 
> 
> This is the SA GetTable request.
> 
> 
>>>2) openIB responds with one RMPP MAD which is marked as both
>>>   the first and the last packet of the RMPP transaction
> 
> 
> This is the SA GetTableResp.
> 
> 
>>>3) IBMF sends an ACK
> 
> 
> Not 100% sure what the OpenIB side would do with this. The ACK is an SA 
> packet which should have the MAD header properly filled in.
> 
> Can you insert some code to dump the first 32 bytes of the MAD when the
> base version is unsupported (in the if clause at line 1463)
> and send me the output ?
> 
> Also, is there anything in the OSM logs ?
> 
> 
>>>At this point, the transaction should be complete,
> 
> 
> Agreed.
> 
> 
>> but MADs continue
>>
>>>to arrive. In one case an ACK arrives which is an
>>>illegal packet to receive while in receiver terminate loop,
> 
> 
> OpenIB doesn't generate ACKs so this is a mystery to me.
> 
> 
>> so, ibmf
>>
>>>returns an ABORT. 
> 
> 
> Yes, if the receiver is terminating, receiving an ACK with IsDS 0 should
> cause an ABORT to be sent.
> 
> 
>>In the remaining cases, DATA packets are being
>>
>>>received after the transaction is complete,
> 
> 
> This is also a mystery to me. I don't know what these additional DATA
> packets from OpenIB would be.
> 
> 
>> so, ibmf drops them and
>>
>>>returns ACKs if in the receiver termination loop.
> 
> 
> Looks to me like these should also be ABORTed with BadT in the receiver
> termination loop.

My interpretation is that if a DATA packet arrives in the receiver
termination loop, an ACK should be sent. If an ACK arrives, and the
transaction is not double-sided, an ABORT should be sent. If any other
packet arrives, an ABORT should be sent.
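
As a rough sketch of that interpretation only (this is not the IBMF
code; the function and enum names are hypothetical, and the RMPP type
values are assumed from the IBA spec's RMPP header definition and
should be checked against it):

```c
#include <stdbool.h>

/* RMPP packet types (assumed values; verify against the IBA spec). */
enum rmpp_type {
    RMPP_TYPE_DATA  = 1,
    RMPP_TYPE_ACK   = 2,
    RMPP_TYPE_STOP  = 3,
    RMPP_TYPE_ABORT = 4
};

/* What the receiver does with a packet seen in the termination loop. */
enum term_action {
    TERM_SEND_ACK,     /* late DATA packet: answer with an ACK */
    TERM_SEND_ABORT,   /* illegal packet: answer with an ABORT */
    TERM_NOT_COVERED   /* case the interpretation above does not address */
};

/* Hypothetical helper encoding the three rules stated above. */
static enum term_action term_loop_action(int rmpp_type, bool is_double_sided)
{
    switch (rmpp_type) {
    case RMPP_TYPE_DATA:
        return TERM_SEND_ACK;
    case RMPP_TYPE_ACK:
        /* Only the non-double-sided case is spelled out above. */
        return is_double_sided ? TERM_NOT_COVERED : TERM_SEND_ABORT;
    default:
        /* "Any other packet" is taken literally here. */
        return TERM_SEND_ABORT;
    }
}
```

The double-sided ACK case is deliberately left as "not covered" rather
than guessed at.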

>  
> 
>>>From the IBMF side, the following packets are being sent,
>>>o request data packet (non-rmpp, will not have base version set)
> 
> 
> Do you mean RMPP version (since RMPP is not active) ? Doesn't base
> version always need setting ?

Yes. I was looking at the wrong version field. Sorry about that.
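
For reference, a minimal sketch of where the two version fields sit,
together with the kind of 32-byte dump Hal asked for. The helper names
are hypothetical, and the byte offsets are assumed from the IBA spec
(common MAD header in bytes 0-23, RMPP header starting at byte 24), so
please verify them before relying on this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Byte offsets into a MAD (assumed layout; check against the IBA spec). */
#define MAD_OFF_BASE_VERSION     0u   /* common MAD header, always set */
#define MAD_OFF_RMPP_VERSION     24u  /* RMPP header, first byte */
#define MAD_OFF_RMPP_RTIME_FLAGS 26u  /* RRespTime (5 bits) + flags (3 bits) */
#define MAD_DUMP_LEN             32u  /* covers both headers' version fields */

static uint8_t mad_base_version(const uint8_t *mad)
{
    return mad[MAD_OFF_BASE_VERSION];
}

static uint8_t mad_rmpp_version(const uint8_t *mad)
{
    return mad[MAD_OFF_RMPP_VERSION];
}

/* Returns 1 if the Active flag (lowest flag bit) is set, else 0. */
static int mad_rmpp_active(const uint8_t *mad)
{
    return mad[MAD_OFF_RMPP_RTIME_FLAGS] & 0x01;
}

/* Print up to the first 32 bytes of a MAD as hex, 16 bytes per line. */
static void mad_dump_header(FILE *out, const uint8_t *mad, size_t len)
{
    size_t n = len < MAD_DUMP_LEN ? len : MAD_DUMP_LEN;

    assert(out != NULL && mad != NULL);
    for (size_t i = 0; i < n; i++)
        fprintf(out, "%02x%c", (unsigned)mad[i],
                (i % 16 == 15 || i + 1 == n) ? '\n' : ' ');
}
```

On a non-RMPP request the RMPP bytes would be expected to be zero while
byte 0 still carries the base version, which is the distinction I
missed above.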

> 
> 
>>>o ACK packet on completion of the transaction
>>>o ABORT packet on receiving an ACK
>>>
>>>Based on my review of the code, and the occurrence of certain
>>>messages, it looks to me like the base version should be set on all
>>>outgoing ACK and ABORT packets.
> 
> 
> Thanks for the detailed analysis.
> 
> -- Hal
> 
> 
> 

- Sandip



