[openib-general] [RFC] [PATCH 2/7] ib_multicast 2.6.20: add ib_multicast module to track join requests from the same port

Sun Oct 15 01:28:26 PDT 2006

Sean Hefty wrote:
> Eitan Zahavi wrote:
>   
>> I disagree. If you sniff at the MAD level you can simply react to the 
>> lower level messages.
>>     
>
> First, when designing this, I did consider using the MAD snooping ability, and 
> changing what could be done with snooping.  However, the multicast handling is 
> not simply sniffing MADs going out on the wire and incrementing / decrementing 
> some count.  It can change or prevent a MAD from being sent.  This is a 
> fundamental change to the behavior of the ib_mad APIs.
>   
I am sorry I was not involved at that early stage. My bad.
I need to look deeper into the code. As long as a response is generated 
even when the MAD is not sent, this is not an API change but a bug fix.
At this stage it seems that only a patch would convince you otherwise. I 
will try to work on it this week.
What I had in mind was to return a MAD response locally in the case of a 
delete when the client is not the last one in the group.
All other MADs, including duplicate "join" requests, go on the wire.
> MADs are sent and tracked by their respective registered ib_mad clients.  
Exactly, and the agent ID is part of the MAD trans_id, so we know which 
agent is sending which MAD.
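
As a rough illustration of the point above (a sketch, not code from the ib_mad source): assuming the 64-bit trans_id carries the agent ID in its upper 32 bits and a client-chosen value in the lower 32 bits, and that the value is already in host byte order, the sending agent can be recovered as shown. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: upper 32 bits = agent ID, lower 32 bits = client-chosen value.
 * The TID is taken in host byte order; on the wire it would be big-endian. */
static uint64_t make_tid(uint32_t agent_id, uint32_t client_part)
{
    return ((uint64_t)agent_id << 32) | client_part;
}

/* Recover the ID of the agent that sent the MAD. */
static uint32_t tid_agent_id(uint64_t tid)
{
    return (uint32_t)(tid >> 32);
}

/* Recover the part of the TID chosen by the client itself. */
static uint32_t tid_client_part(uint64_t tid)
{
    return (uint32_t)(tid & UINT32_C(0xffffffff));
}
```
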
> Trying 
> to push this down into the MAD layer means that the send request from one client 
> may now occur on some other client's registration.  
I am not sure I follow you here.
If you are referring to the race where one client sends a "join" while another 
sends a "leave", you should make sure to:
1. Mark a client as "joined" only after receiving the SA response.
2. Count a "leave" as soon as the client's MAD is sent out.
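
The two rules above, combined with answering a non-final "leave" locally, can be sketched roughly as follows. This is only an illustration under stated assumptions: the structure, the enum, the function names and the 32-client limit are all hypothetical, and a "leave" from a client without a confirmed join is assumed to follow the same path as any other "leave".

```c
#include <stdint.h>

#define MAX_CLIENTS 32  /* hypothetical limit so a 32-bit mask suffices */

/* Per-group state: bit i is set once client i has a confirmed join. */
struct mcast_group {
    uint32_t joined_mask;
};

enum leave_action {
    LEAVE_ANSWER_LOCALLY,  /* other clients remain: generate the response here */
    LEAVE_SEND_TO_SA       /* last client: let the MAD go on the wire */
};

/* Rule 1: mark the client as joined only when the SA response arrives. */
static void on_join_response(struct mcast_group *g, unsigned client)
{
    if (client < MAX_CLIENTS)
        g->joined_mask |= UINT32_C(1) << client;
}

/* Rule 2: count the leave at the moment the client's MAD is sent out. */
static enum leave_action on_leave_send(struct mcast_group *g, unsigned client)
{
    if (client < MAX_CLIENTS)
        g->joined_mask &= ~(UINT32_C(1) << client);
    return g->joined_mask ? LEAVE_ANSWER_LOCALLY : LEAVE_SEND_TO_SA;
}
```

With two clients joined, the first "leave" would be answered locally and only the second would reach the SA.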

> If that client decides to 
> unregister in the middle of their send, the operation is canceled, and now needs 
> to be restarted on some other registration.  And even though the operation was 
> canceled, we still need to know whether it was seen by the SA.  This requires 
> sniffing all MADs, and quickly gets extremely complex.
>   
Cancel does not really revert a post_send, does it?
So if we catch the MAD just before it is posted, we should be fine.
> In order to avoid these issues with which registered client is actually 
> performing the operation, the solution is to filter multicast requests through a 
> single registration.  
If each client uses its own agent ID then it is available in the 
trans_id of the MAD.
> The ib_mad layer is complex enough as it is.  (Have you 
> tried tracing a MAD through the send path?)  We don't need to push even more 
> functionality down into it.
>   
I agree that layering on top is easier. But does it really solve the 
bug? I think not. If you REPLACED the API instead of providing both options 
(one above and one below the refcount enforcement), it would make sense to me.
> - Sean