[openib-general] Problem with directed route SMPs with beginning or ending LID routed parts
halr at voltaire.com
Thu Jan 12 17:43:18 PST 2006
On Thu, 2006-01-12 at 19:22, Ralph Campbell wrote:
> I'm trying to resolve a bug in the directed route handling code.
> I thought I would alert the general list in case someone had
> a solution. I'm still trying to find a clean fix but it's
> The basic problem is that smi_handle_dr_smp_send() and
> smi_handle_dr_smp_recv() are modifying the directed
> route part (inc/dec hop_ptr) when the packet is in the LID
> routed part of the path.
> Here is an example:
> Receive SubnGet(NodeInfo) LRH:DLID=0x0009, LRH:SLID=0x000A,
> hop_ptr=2, hop_cnt=1, DrSLID=0xFFFF, DrDLID=0x0009.
> It is processed OK through ib_mad_recv_done_handler() ...
> port_priv->device->process_mad() generates OK response ...
> agent_send_response() calls ib_create_ah_from_wc()
> which creates the correct AH (to 0x000A) ...
> ib_post_send_mad() calls handle_outgoing_dr_smp() which
> calls smi_handle_dr_smp_send() which INCORRECTLY decrements
> hop_ptr since this is a reply to 0x0009 not 0xFFFF.
> The difficulty is that at this point, the AH is opaque so
> you can't easily tell that the DLID isn't the permissive LID.
> You can see that DrDLID isn't 0xFFFF but you can't just
> return 1 in smi_handle_dr_smp_send() because if OpenIB
> received this same reply (i.e., on node with LID=0x000A), it
> would still think it's at the beginning of the destination LID
> routed part and not decrement hop_ptr.
> I think there is a similar problem when sending requests
> where the initial part of the path is LID routed.
Yes, I've been aware of this test case for some time now but haven't had
the chance to figure out a good solution either. It currently is a
compliance test which is not used by any SM that I'm aware of. Maybe
I'll have some cycles early next week to work on this. Thanks for the
analysis of it.
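
For anyone following along, here is a minimal sketch of the case Ralph describes. It is not the code in smi.c and not a proposed patch. The struct and both function names are hypothetical, and the second function assumes the caller can supply the LRH DLID of the outgoing packet, which is exactly the information that is hard to get at today because the AH is opaque.

```c
#include <stdint.h>

#define LID_PERMISSIVE 0xFFFF	/* permissive LID, as in the example above */

/* Hypothetical, stripped-down view of a directed route SMP.
 * This is NOT the kernel's struct ib_smp. */
struct dr_smp_sketch {
	uint8_t  hop_ptr;
	uint8_t  hop_cnt;
	uint16_t dr_slid;
	uint16_t dr_dlid;
};

/* Behaviour described in the report: a returning SMP with
 * hop_ptr == hop_cnt + 1 gets hop_ptr decremented on send,
 * regardless of how the packet itself is addressed. */
void send_response_unconditional(struct dr_smp_sketch *smp)
{
	if (smp->hop_cnt && smp->hop_ptr == smp->hop_cnt + 1)
		smp->hop_ptr--;
}

/* Hypothetical alternative: the caller passes the LRH DLID that the
 * response will be sent to (an assumption; the send path does not
 * have this value readily available). If the packet is still LID
 * routed, the directed route fields are left untouched. */
void send_response_lid_aware(struct dr_smp_sketch *smp, uint16_t lrh_dlid)
{
	if (lrh_dlid != LID_PERMISSIVE)
		return;		/* still in the LID routed part of the path */
	if (smp->hop_cnt && smp->hop_ptr == smp->hop_cnt + 1)
		smp->hop_ptr--;	/* entering the directed route part */
}
```

With the values from the example (hop_ptr=2, hop_cnt=1, DrSLID=0xFFFF, DrDLID=0x0009), the first function drops hop_ptr to 1 on the responder at LID 0x0009. The second leaves it at 2 when the reply is addressed to 0x000A, and only decrements when the outgoing DLID is the permissive LID, which is what the node with LID 0x000A would use when it forwards the reply along the directed route part.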