[openib-general] [PATCH] agent: Fix agent_mad_send PCI mapping and gather address and length

Hal Rosenstock halr at voltaire.com
Wed Nov 10 13:19:29 PST 2004


I haven't cleared the other issues before getting back to this but
wanted to respond to some of the points below:

On Tue, 2004-11-09 at 23:55, Roland Dreier wrote:
>     Roland> OK, I think I understand the problem, but I'm not sure
>     Roland> what the correct solution is.  When a DR SMP arrives at a
>     Roland> CA from the SM, hop_cnt == hop_ptr == number of hops in
>     Roland> the directed route,
> 
>     Hal> What was the number ?
> 
> For one port it was 4 and for another it was 6.  It could really be
> anything (it's just how many hops away the SM is).

I think I understand how DR is supposed to work :-) I was just looking
for the actual values in the failed case to try to understand what the
code was doing, as I don't have a configuration to recreate this (at
least not yet).

From what you indicated, it looks like it would be the following case
so no response would be sent:

	/* C14-13:2 */
	if (2 <= hop_ptr && hop_ptr <= hop_cnt) {
		if (node_type != IB_NODE_SWITCH)
			return 0;

but I'm not sure whether those were the values on entry to the
smi_handle_dr_smp_recv routine that was excised from the code.
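
To make that case concrete, here is a minimal stand-alone sketch of just
the C14-13:2 test quoted above. The function name, its signature, and the
placeholder return value of 1 for everything it does not model are
assumptions for illustration only; this is not the smi.c code.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the C14-13:2 test only (not the smi.c routine).
 * Returns 0 when the SMP would be dropped, so no response is sent.
 * Returns 1 for everything this sketch does not model (the switch path
 * and all other hop_ptr/hop_cnt cases); that value is an assumption.
 */
static int dr_c14_13_2_sketch(unsigned hop_ptr, unsigned hop_cnt,
			      bool is_switch)
{
	/* C14-13:2 */
	if (2 <= hop_ptr && hop_ptr <= hop_cnt) {
		if (!is_switch)
			return 0;	/* a CA drops the packet here */
	}
	return 1;	/* not modeled in this sketch */
}
```

With hop_ptr == hop_cnt == 4 (or 6) on a CA, as reported, the sketch
returns 0, which would match the "no response sent" behavior if those
really were the values on entry.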

>     Hal> I integrated this patch and checked it back in. I don't think
>     Hal> this is the solution for all cases (and something else is
>     Hal> broken).
> 
> Could be.  I had a hard time checking the code in smi.c (which is
> split between smi_handle_dr_smp_recv() and smi_handle_dr_smp_send() as
> well as smi_check_forward_dr_smp(), but which has outgoing and
> returning DR handling mixed together) against the IB spec (which
> splits outgoing and returning DR handling).

I had to squint hard the first time I went through this too (and
probably will again). I will explain how this works in sufficient detail
if this is of interest.

>     Hal> The second call to smi_handle_dr_smp_recv was to validate the
>     Hal> DR in the response packet before sending it. The response
>     Hal> would be a returning DR packet (D bit 1). If hop_cnt ==
>     Hal> hop_ptr,
> 
> I guess the problem with calling smi_handle_dr_smp_recv() twice on the
> same packet is that the function may alter the packet.

No, the second call to smi_handle_dr_smp_recv() was on the outgoing
response and not the incoming request. The thought was that a packet
coming from process_mad is much like an incoming received packet and
hence the call to smi_handle_dr_smp_recv. The routine validates the
packet but also can do some fixups depending on which case it falls
into. Guess it's only dangerous to validate this and wrong to fix it up.

The key to me is the following:
The split of responsibility on the DR header formation is a little
unclear to me. In the case of the SM, are the DR headers fully formed
before handing it to the MAD layer or is some DR fixup needed ?

-- Hal



