[openib-general] Re: [Fwd: Re: [Fwd: Re: OpenSM Bugs]]
sandip barua
Sandip.Barua at Sun.COM
Thu Jan 20 13:07:28 PST 2005
I found the bug that causes the incorrect base version in outgoing
MADs sent while in the receiver termination loop, and I fixed the
software on Tom's Solaris system. From a brief look at the traces, I
still see the incoming ACK, but not many other unexpected packets.
A couple of responses below as well.
- Sandip
Hal Rosenstock wrote:
> On Thu, 2005-01-20 at 13:45, Tom Duffy wrote:
>
>>On Wed, 2005-01-19 at 19:15 -0500, Hal Rosenstock wrote:
>>
>>>The OpenIB side wouldn't send an ACK so this message must indicate
>>>something else. Any chance you can get an IB trace of what is going on ?
>>>Alternatively can you dump out the MAD where the unsupported base
>>>version is printed out ? It will take me a little bit before I am setup
>>>to try to recreate this.
>>
>>Here is the analysis of what is going on (from Solaris's perspective).
>>
>>Sandip Barua wrote:
>>
>>>Based on the trace output I obtained this morning,
>>>I am seeing a number of the following transactions:
>>>
>>>1) IBMF makes a request (non-RMPP send)
>
>
> This is the SA GetTable request.
>
>
>>>2) openIB responds with one RMPP MAD which is marked as both
>>> the first and the last packet of the RMPP transaction
>
>
> This is the SA GetTableResp.
>
>
>>>3) IBMF sends an ACK
>
>
> Not 100% sure what the OpenIB side would do with this. The ACK is an SA
> packet which should have the MAD header properly filled in.
>
> Can you insert some code to dump the first 32 bytes of the MAD when the
> base version is unsupported (in the if clause at line 1463)
> and send me the output ?
>
> Also, is there anything in the OSM logs ?
>
>
>>>At this point, the transaction should be complete,
>
>
> Agreed.
>
>
>> but MADs continue
>>
>>>to arrive. In one case an ACK arrives which is an
>>>illegal packet to receive while in receiver terminate loop,
>
>
> OpenIB doesn't generate ACKs so this is a mystery to me.
>
>
>> so, ibmf
>>
>>>returns an ABORT.
>
>
> Yes, if the receiver is terminating, receiving an ACK with IsDS 0 should
> cause an ABORT to be sent.
>
>
>>In the remaining cases, DATA packets are being
>>
>>>received after the transaction is complete,
>
>
> This is also a mystery to me. I don't know what these additional DATA
> packets from OpenIB would be.
>
>
>> so, ibmf drops them and
>>
>>>returns ACKs if in the receiver termination loop.
>
>
> Looks to me like these should also be ABORTed with BadT in the receiver
> termination loop.
My interpretation is that if a DATA packet arrives in the receiver
termination loop, an ACK should be sent. If an ACK arrives, and the
transaction is not double-sided, an ABORT should be sent. If any other
packet arrives, an ABORT should be sent.
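To make that interpretation concrete, here is a minimal sketch of the
decision as a single function. It is not the IBMF code: the names
rx_term_classify, rx_term_action and the RMPP_TYPE_* constants are
hypothetical, and the numeric type values are my understanding of the
RMPP header encoding, so check them against the spec. The case of an
ACK arriving on a double-sided transaction is not covered by the rule
above, so the sketch reports it as unspecified instead of guessing.

```c
#include <stdbool.h>

/* RMPP type values as I understand the RMPP header encoding (assumption). */
enum rmpp_type {
    RMPP_TYPE_DATA  = 1,
    RMPP_TYPE_ACK   = 2,
    RMPP_TYPE_STOP  = 3,
    RMPP_TYPE_ABORT = 4
};

/* Hypothetical result type: what the receiver does with a packet that
 * arrives while it is in the receiver termination loop. */
enum rx_term_action {
    RX_TERM_SEND_ACK,     /* DATA packet: answer with an ACK */
    RX_TERM_SEND_ABORT,   /* illegal packet: answer with an ABORT */
    RX_TERM_UNSPECIFIED   /* not covered by the rule stated above */
};

/* Classify a packet received in the receiver termination loop.
 * double_sided is true when the transaction is double-sided. */
enum rx_term_action rx_term_classify(int rmpp_type, bool double_sided)
{
    switch (rmpp_type) {
    case RMPP_TYPE_DATA:
        return RX_TERM_SEND_ACK;
    case RMPP_TYPE_ACK:
        /* Only the non-double-sided case is spelled out above. */
        return double_sided ? RX_TERM_UNSPECIFIED : RX_TERM_SEND_ABORT;
    default:
        /* Any other packet gets an ABORT. */
        return RX_TERM_SEND_ABORT;
    }
}
```

With this shape, the stray ACK seen in the traces maps to
rx_term_classify(RMPP_TYPE_ACK, false), which yields an ABORT, and the
late DATA packets map to an ACK.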
>
>
>>>From the IBMF side, the following packets are being sent,
>>>o request data packet (non-rmpp, will not have base version set)
>
>
> Do you mean RMPP version (since RMPP is not active) ? Doesn't base
> version always need setting ?
Yes. I was looking at the wrong version field. Sorry about that.
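For reference, the two version fields sit at different offsets, which
is how I mixed them up. The sketch below shows where I believe each
one lives and how the first 32 bytes of a MAD could be dumped as
requested above. The helper names (mad_base_version_ok,
mad_dump_prefix) and the MAD_* constants are hypothetical, and the
offsets assume the usual layout of a 24-byte common MAD header
followed directly by the RMPP header, so treat them as assumptions to
verify against the spec and the actual structures in the code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Byte offsets as I understand the MAD layout (assumption). */
enum {
    MAD_OFF_BASE_VERSION = 0,   /* common MAD header, first byte */
    MAD_OFF_RMPP_VERSION = 24,  /* first byte after the common header */
    MAD_OFF_RMPP_TYPE    = 25,
    MAD_BASE_VERSION     = 1,   /* base version expected in every MAD */
    MAD_DUMP_LEN         = 32
};

/* Return 1 if the buffer is non-empty and carries the expected base
 * version, 0 otherwise. */
int mad_base_version_ok(const uint8_t *mad, size_t len)
{
    return len > MAD_OFF_BASE_VERSION &&
           mad[MAD_OFF_BASE_VERSION] == MAD_BASE_VERSION;
}

/* Format up to the first 32 bytes of a MAD as hex text into out,
 * 16 bytes per line. Returns the number of bytes formatted, which is
 * smaller than requested if out is too small. */
size_t mad_dump_prefix(const uint8_t *mad, size_t len,
                       char *out, size_t outsz)
{
    size_t n = len < MAD_DUMP_LEN ? len : MAD_DUMP_LEN;
    size_t used = 0;
    size_t i;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        /* Each byte needs 3 characters plus room for the final NUL. */
        if (used + 3 >= outsz)
            break;
        used += (size_t)snprintf(out + used, outsz - used, "%02x%c",
                                 (unsigned)mad[i],
                                 (i % 16 == 15) ? '\n' : ' ');
    }
    return i;
}
```

Calling mad_dump_prefix() from the branch that reports the unsupported
base version, and printing the resulting string, would show both
version bytes and the RMPP type in one dump.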
>
>
>>>o ACK packet on completion of the transaction
>>>o ABORT packet on receiving an ACK
>>>
>>>Based on my review of the code, and the occurrence of certain
>>>messages, it looks to me like the base version should be set on all
>>>outgoing ACK and ABORT packets.
>
>
> Thanks for the detailed analysis.
>
> -- Hal
>
>
>
- Sandip