[openib-general] Problem with directed route SMPs with beginning or ending LID routed parts
Ralph Campbell
ralphc at pathscale.com
Thu Jan 12 16:22:20 PST 2006
I'm trying to resolve a bug in the directed route handling code.
I thought I would alert the general list in case someone had
a solution. I'm still looking for a clean fix, but it's
difficult.
The basic problem is that smi_handle_dr_smp_send() and
smi_handle_dr_smp_recv() are modifying the directed
route part (inc/dec hop_ptr) when the packet is in the LID
routed part of the path.
Here is an example:
Receive SubnGet(NodeInfo) LRH:DLID=0x0009, LRH:SLID=0x000A,
hop_ptr=2, hop_cnt=1, DrSLID=0xFFFF, DrDLID=0x0009.
It is processed OK through ib_mad_recv_done_handler() ...
port_priv->device->process_mad() generates OK response ...
agent_send_response() calls ib_create_ah_from_wc()
which creates the correct AH (to 0x000A) ...
ib_post_send_mad() calls handle_outgoing_dr_smp() which
calls smi_handle_dr_smp_send() which INCORRECTLY decrements
hop_ptr since this is a reply to 0x0009 not 0xFFFF.
The difficulty is that at this point, the AH is opaque so
you can't easily tell that the DLID isn't the permissive LID.
You can see that DrDLID isn't 0xFFFF but you can't just
return 1 in smi_handle_dr_smp_send() because if OpenIB
received this same reply (i.e., on the node with LID=0x000A), it
would still think it's at the beginning of the destination LID
routed part and not decrement hop_ptr.
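To make the ambiguity concrete, here is a minimal standalone sketch in C.
It is not the kernel code: struct dr_fields and both helper functions are
hypothetical names made up for illustration, and comparing DrDLID against
the local port LID is only one assumed way to tell the two nodes apart,
not a tested fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PERMISSIVE_LID 0xFFFF

/* Hypothetical, simplified view of the directed route fields of a
 * response SMP. This is NOT the kernel's struct ib_smp. */
struct dr_fields {
	uint8_t  hop_ptr;
	uint8_t  hop_cnt;
	uint16_t dr_slid;
	uint16_t dr_dlid;
};

/* Decision based on the SMP fields alone: a response with
 * hop_ptr == hop_cnt + 1 looks like the start of the returning
 * directed route part, no matter which node is asking. */
static bool looks_like_return_dr_start(const struct dr_fields *f)
{
	assert(f);
	return f->hop_cnt != 0 && f->hop_ptr == f->hop_cnt + 1;
}

/* Assumed disambiguation: also consult the local port LID. If DrDLID
 * is a real (non-permissive) LID and it is our own, we are the LID
 * routed responder and should leave hop_ptr alone. Any other node
 * seeing the same fields is treated as the start of the returning
 * directed route part. */
static bool should_decrement_hop_ptr(const struct dr_fields *f,
				     uint16_t local_lid)
{
	if (!looks_like_return_dr_start(f))
		return false;
	if (f->dr_dlid != PERMISSIVE_LID && f->dr_dlid == local_lid)
		return false;
	return true;
}
```

With the values from the example (hop_ptr=2, hop_cnt=1, DrSLID=0xFFFF,
DrDLID=0x0009), looks_like_return_dr_start() is true on both nodes, which
is the problem. should_decrement_hop_ptr() is false with local LID 0x0009
and true with local LID 0x000A, but it needs the local LID, which the
send path doesn't have readily at hand.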
I think there is a similar problem when sending requests
where the initial part of the path is LID routed.
--
Ralph Campbell <ralphc at pathscale.com>