[ofa-general] [PATCH/RFC] IPoIB: Don't drop multicast packets sent before group is joined

Wed Jun 3 23:54:21 PDT 2009

Roland Dreier wrote:
>> arps are a different thing from regular payloads. They are not handled
>> using socket semantics.

> Oh right, I misunderstood the way the net core is using queue_len.

I assume we are talking about this piece of code in net/core/neighbour.c :: __neigh_event_send():

	if (neigh->nud_state == NUD_INCOMPLETE) {
		if (skb) {
			if (skb_queue_len(&neigh->arp_queue) >=
			    neigh->parms->queue_len) {
				struct sk_buff *buff;
				buff = __skb_dequeue(&neigh->arp_queue);
				kfree_skb(buff);
				NEIGH_CACHE_STAT_INC(neigh->tbl, unres_discards);
			}
			__skb_queue_tail(&neigh->arp_queue, skb);
		}
		rc = 1;
	}

My understanding is that the packet being queued is not the ARP itself but whatever the socket / user attempted to send, which the kernel can't transmit until the resolution is known.
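
To make the queueing policy in the snippet above concrete, here is a minimal user-space sketch of the same drop-oldest behaviour. It is not kernel code: the names pending_queue, pending_init, pending_enqueue, pending_dequeue and PENDING_CAP are hypothetical, plain integers stand in for sk_buffs, and the limit and discards fields only play the roles of parms->queue_len and the unres_discards counter.

```c
#include <assert.h>
#include <stddef.h>

#define PENDING_CAP 16	/* hypothetical size of the backing ring */

struct pending_queue {
	int slots[PENDING_CAP];	/* packet ids stand in for sk_buffs */
	size_t head;		/* index of the oldest queued packet */
	size_t len;		/* number of packets currently queued */
	size_t limit;		/* plays the role of parms->queue_len */
	size_t discards;	/* plays the role of unres_discards */
};

static void pending_init(struct pending_queue *q, size_t limit)
{
	/* The sketch only supports limits that fit in the ring. */
	assert(limit > 0 && limit <= PENDING_CAP);
	q->head = 0;
	q->len = 0;
	q->limit = limit;
	q->discards = 0;
}

/* Queue pkt at the tail; if the queue is full, drop the oldest first. */
static void pending_enqueue(struct pending_queue *q, int pkt)
{
	if (q->len >= q->limit) {
		q->head = (q->head + 1) % PENDING_CAP;	/* discard the head */
		q->len--;
		q->discards++;
	}
	q->slots[(q->head + q->len) % PENDING_CAP] = pkt;
	q->len++;
}

/* Return the oldest queued packet id, or -1 if the queue is empty. */
static int pending_dequeue(struct pending_queue *q)
{
	int pkt;

	if (q->len == 0)
		return -1;
	pkt = q->slots[q->head];
	q->head = (q->head + 1) % PENDING_CAP;
	q->len--;
	return pkt;
}
```

With a limit of 3, enqueueing packets 1 through 5 leaves 3, 4 and 5 queued and counts two discards: the newest payloads survive and the oldest ones are dropped while the resolution is still pending.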

In that respect, IPoIB needs two lookups: first, resolving the L3 IP address to the L3 IPoIB address, and second, resolving from L3 to the L2 IB address (GID --> LID, MGID --> MLID). Given that, maybe we could integrate somehow with that sysctl param, so that the user could control how many packets are queued before the IB L2 resolution is done. As was stated here, with the two patches the user still has control through the socket send buffer, which sounds fair enough to me.

> OK, I think your patch makes sense and I'll add Or's too.

thanks

Or.