[openib-general] multicast code/merge status

Or Gerlitz ogerlitz at voltaire.com
Tue Jan 9 23:10:44 PST 2007


Sean Hefty wrote:
>> Other than that, as we discussed at SC06, some changes need to be 
>> integrated into the code to allow interoperability between a 
>> multicast RDMA CM based app and IPoIB. Specifically, the RDMA CM 
>> signature must be removed from the MGID that is generated from the IP 
>> address and P_Key, among other changes.

> ...I have not completed these changes yet.  Specifically, I have not 
> added a send only join parameter or changed the qkey.

OK, I understand that adding a send-only join parameter changes the 
librdmacm/ucma ABI, and further that you may be too busy to fully 
implement the send-only scheme in the multicast code for the 2.6.21 
time frame.

How about adding a send-only parameter to the ABI and having the ucma 
kernel code return -EINVAL if someone tries to set it to true? Such code 
can be pushed to 2.6.21, and you can complete the implementation when 
you have the time.
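A minimal sketch of this proposal, assuming a hypothetical `send_only` byte is added to the multicast join command. The struct and function names below are illustrative only and are not the real ucma ABI. The point is that the field becomes part of the ABI now, while the kernel rejects any request that sets it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical layout of a multicast join command carrying the proposed
 * send_only flag; this is NOT the actual ucma ABI structure. */
struct hyp_ucma_join_mcast {
	uint64_t uid;         /* user context echoed back in events */
	uint32_t id;          /* rdma_cm id handle */
	uint8_t  send_only;   /* proposed: nonzero requests a send-only join */
	uint8_t  reserved[3]; /* pads the struct to a 64-bit multiple */
};

/* Hypothetical validation step run before the join is processed.
 * The flag exists in the ABI, but send-only joins are refused with
 * -EINVAL until the send-only scheme is actually implemented. */
static int hyp_ucma_check_join(const struct hyp_ucma_join_mcast *cmd)
{
	if (cmd->send_only)
		return -EINVAL;
	return 0;
}
```

Applications that leave the field zeroed keep their current full-member behavior, and a later kernel can start honoring the flag without another ABI change.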

> I have also not fully examined an issue where the SM log fills up with 
> bad multicast join requests.

Is this the issue Dotan reported? I recall that his test uses neither 
librdmacm nor IPoIB, so how does it exercise the kernel ib_sa API at 
all? I guess it uses libibmad or libibumad to send the joins, etc.


>> The second change is related to the qkey. Looking at the current code 
>> of cma_join_ib_multicast() (in the multicast-sa_cache branch of the 
>> rdma-dev git tree), I see that the qkey is the multicast IP address, 
>> which is not consistent with what librdmacm assumes (0x1234567 etc).
> 
> This is a bug in the kernel code.  It should be using the standard qkey 
> of 0x12345678 - for now anyway.

OK

>> Anyway, what we need here is to plug into the IPoIB scheme, which 
>> uses the qkey associated with the IPv4 broadcast multicast group. It 
>> turns out that there is some twilight zone here which I am working to 
>> understand better. For the IPv4 broadcast group, IPoIB lets the SM 
>> allocate the group and qkey (i.e. the create param of 
>> ipoib_mcast_join is zero). I will give it some thought and let you 
>> know how I think the RDMA CM can plug into this scheme; I will be 
>> happy to get other ideas as well.

> The rdma_cm knows the qkey that ipoib uses before it joins a multicast 
> group. See cma_join_ib_multicast() - call to ib_sa_get_mcmember_rec().

Looking at the code, I understand that if a multicast consumer attempts 
to join a group that another consumer has already joined, it just gets 
the group params; that is, the MGID is your discriminator (with the 
exception of an all-zeros MGID, which is treated differently). This 
makes a lot of sense to me.
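To make the "MGID is the discriminator" point concrete, here is a simplified userspace sketch of the lookup logic in the quoted ib_sa_get_mcmember_rec(). The types, the fixed-size group table, and the `hyp_` names are hypothetical stand-ins for the kernel's per-port structures, and the zero-MGID branch fills only a partial template (no port GID or random qkey).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified stand-ins for the kernel types. */
struct hyp_mgid { uint8_t raw[16]; };

struct hyp_mcmember_rec {
	struct hyp_mgid mgid;
	uint32_t qkey;
	uint16_t pkey;
};

#define HYP_MAX_GROUPS 8

struct hyp_port {
	struct hyp_mcmember_rec groups[HYP_MAX_GROUPS]; /* groups already joined */
	int ngroups;
};

static const struct hyp_mgid hyp_mgid0; /* all-zeros MGID */

/* A nonzero MGID is looked up among groups already joined on the port:
 * a match copies that group's record (including its qkey), no match
 * returns -EADDRNOTAVAIL. A NULL or all-zeros MGID never matches an
 * existing group; it yields a default template instead. */
static int hyp_get_mcmember_rec(const struct hyp_port *port,
				const struct hyp_mgid *mgid,
				struct hyp_mcmember_rec *rec)
{
	int i;

	if (mgid && memcmp(mgid, &hyp_mgid0, sizeof hyp_mgid0)) {
		for (i = 0; i < port->ngroups; i++) {
			if (!memcmp(&port->groups[i].mgid, mgid, sizeof *mgid)) {
				*rec = port->groups[i];
				return 0;
			}
		}
		return -EADDRNOTAVAIL;
	}

	/* Zero-MGID case: partial template, mirroring the kernel's else branch. */
	memset(rec, 0, sizeof *rec);
	rec->pkey = 0xFFFF;
	return 0;
}
```

A second consumer asking for an already joined MGID therefore inherits the first consumer's qkey, which is what the approach below relies on.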

Going forward with this idea, a CMA consumer that wants to use the IPv4 
broadcast group qkey can join the group and learn the qkey.

However, there are two problems with this approach:

a) it can't provide the qkey to the RDMA CM for another group it wants to 
join; assuming the --local-- IPoIB is not joined to that other group, 
we are back to the original problem.

b) assuming the above problem is solved, the CMA consumer must stay 
joined to (i.e. not leave) the broadcast group and hence will receive 
all the IPv4 broadcast traffic of the cluster.

We can assume that at least some of the node's multicast traffic is 
routed to an IPoIB subnet; we can further assume that the net stack 
would cause IPoIB to join the multicast group related to the "all 
hosts" IPv4 address --> 224.0.0.1.

Since our apps do intend to join the 224.0.0.1 group anyway, resolving 
a) is fine for us --> we will join 224.0.0.1, provide its qkey to the 
RDMA CM, and it will join the other group (e.g. 224.5.5.5) with this 
qkey.
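For reference, RFC 4391 maps an IPv4 multicast address to an IPoIB MGID laid out as FF | 1 | scope | 0x401B (IPv4 signature) | P_Key | 48 zero bits | low 28 bits of the group address; an MGID built this way carries the IPoIB signature rather than an RDMA CM one. The sketch below follows that layout; the function name is hypothetical, and the scope value 2 (link-local) used in the usage note is an assumption matching the default IPoIB broadcast group.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the RFC 4391 IPoIB MGID for an IPv4
 * multicast group. addr is in host byte order (0xE0000001 = 224.0.0.1). */
static void hyp_ipv4_mc_to_ipoib_mgid(uint32_t addr, uint16_t pkey,
				      uint8_t scope, uint8_t mgid[16])
{
	memset(mgid, 0, 16);             /* bytes 6..11 stay zero */
	mgid[0]  = 0xff;                 /* multicast prefix */
	mgid[1]  = 0x10 | (scope & 0x0f);/* flags nibble 1, then scope */
	mgid[2]  = 0x40;                 /* IPv4 IPoIB signature 0x401B */
	mgid[3]  = 0x1b;
	mgid[4]  = pkey >> 8;            /* P_Key, big-endian */
	mgid[5]  = pkey & 0xff;
	mgid[12] = (addr >> 24) & 0x0f;  /* keep only the low 28 bits */
	mgid[13] = (addr >> 16) & 0xff;
	mgid[14] = (addr >> 8) & 0xff;
	mgid[15] = addr & 0xff;
}
```

With P_Key 0xFFFF and scope 2, 224.0.0.1 maps to FF12:401B:FFFF::1 and 224.5.5.5 to FF12:401B:FFFF::5:505. These are two distinct groups, so under the plan above the qkey learned from the first group's member record is what gets handed to the RDMA CM for the second.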

What do you think?

> int ib_sa_get_mcmember_rec(struct ib_device *device, u8 port_num,
> 			   union ib_gid *mgid, struct ib_sa_mcmember_rec *rec)
> {
> 	struct mcast_device *dev;
> 	struct mcast_port *port;
> 	struct mcast_group *group;
> 	unsigned long flags;
> 	int ret = 0;
> 
> 	dev = ib_get_client_data(device, &mcast_client);
> 	if (!dev)
> 		return -ENODEV;
> 
> 	port = &dev->port[port_num - dev->start_port];
> 	if (mgid && memcmp(mgid, &mgid0, sizeof mgid0)) {
> 		spin_lock_irqsave(&port->lock, flags);
> 		group = mcast_find(port, mgid);
> 		if (group)
> 			*rec = group->rec;
> 		else
> 			ret = -EADDRNOTAVAIL;
> 		spin_unlock_irqrestore(&port->lock, flags);
> 	} else {
> 		memset(rec, 0, sizeof *rec);
> 		ib_get_cached_gid(device, port_num, 0, &rec->port_gid);
> 		rec->pkey = 0xFFFF;
> 		get_random_bytes(&rec->qkey, sizeof rec->qkey);
> 		rec->join_state = 1;

Can you remind me what the idea/trick is here? Aren't you supposed to 
generate an MGID for this case?

> 	}
> 
> 	return ret;
> }

Or.
