[openib-general] RE: Failed multicast join with new multicast module
Sean Hefty
sean.hefty at intel.com
Wed Jun 7 19:48:42 PDT 2006
>I might be missing your point but UD is unreliable so the sends can be
>dropped. The delay/retry is to make sure the join does occur,
This is different than a dropped request or reply. In this case, the receiver
gets a reply, but it will be a failure from the SA to join the group. For
example, a NonMember tries to re-join before a FullMember which would have
created the group does. The result is that requests that receive a reply also
need to be retried, with the timeout dependent on some remote node in the fabric
creating the group.
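
To make the distinction concrete, here is a minimal sketch in Python. It is a toy model, not the real SA client API: FakeSA, JoinResult, join_with_retry, and the between_attempts hook are all hypothetical names invented for illustration. It only shows that a join which receives a reply can still fail, and must be retried until some other node has created the group.

```python
from enum import Enum


class JoinResult(Enum):
    JOINED = "joined"
    NO_REPLY = "no_reply"    # request or reply was dropped (UD is unreliable)
    REJECTED = "rejected"    # the SA replied, but refused the join


class FakeSA:
    """Hypothetical stand-in for the SA: it only tracks which groups exist."""

    def __init__(self):
        self.groups = set()

    def join(self, mgid, full_member):
        if full_member:
            self.groups.add(mgid)   # in this model a FullMember join creates the group
            return JoinResult.JOINED
        if mgid in self.groups:
            return JoinResult.JOINED
        return JoinResult.REJECTED  # a NonMember cannot create the group


def join_with_retry(sa, mgid, full_member, max_attempts,
                    between_attempts=lambda: None):
    """Retry on both NO_REPLY and REJECTED.

    Returns the attempt number that succeeded, or None if every attempt failed.
    between_attempts stands in for the delay during which a remote node may
    (or may not) create the group.
    """
    for attempt in range(1, max_attempts + 1):
        if sa.join(mgid, full_member) is JoinResult.JOINED:
            return attempt
        between_attempts()
    return None
```

In this model the NonMember succeeds only on the first attempt made after the FullMember has joined, so the number of retries it needs is set by another node, not by the local timeout.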
>> So, the only safe thing to do is for all multicast clients to detach from all
>> multicast groups, destroy all address handles,
>
>Why all groups ?
Because the SM has lost track that any groups in the fabric existed, so those
groups must be recreated, all potentially with different mlids.
>> possibly wait for a new group to be created, and then start all over again.
>
>Start what all over again ?
I meant attach the QP to the new group and allocate a new address handle.
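
As a rough sketch of that sequence (Python, toy model only): Client, attach, detach_all, and recover are hypothetical names, the address handle is modelled as a plain tuple, and rejoin stands in for whatever call returns the mlid of the recreated group. None of this is a real verbs interface.

```python
class Client:
    """Hypothetical multicast client state; not a real verbs API."""

    def __init__(self):
        self.attached = {}         # mgid -> mlid the QP is currently attached with
        self.address_handles = {}  # mgid -> address handle (modelled as a tuple)

    def attach(self, mgid, mlid):
        # Attach the QP to the group and allocate an address handle for it.
        self.attached[mgid] = mlid
        self.address_handles[mgid] = ("ah", mgid, mlid)

    def detach_all(self):
        # Detach from every group and destroy every address handle.
        mgids = list(self.attached)
        self.attached.clear()
        self.address_handles.clear()
        return mgids


def recover(client, rejoin):
    """Tear everything down, then rejoin each group.

    rejoin(mgid) is assumed to return the mlid of the recreated group,
    which may differ from the old one.
    """
    for mgid in client.detach_all():
        client.attach(mgid, rejoin(mgid))
```

Note that between detach_all and the last attach the client is out of every group, including groups whose traffic was flowing normally.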
This is a general comment, and not directed at anyone specific, but is this
really the architecture and implementation that we want to aim for? I really
think that we need to look at solutions that don't break existing communication,
unless the links providing that communication actually go down, even if this
means extending the architecture.
- Sean