[ofw] [IPoIB_CM] Problems with large pings

Alex Estrin alex.estrin at qlogic.com
Thu Jan 8 09:37:00 PST 2009


Hello,

Thank you for your feedback.

1. This structure was used for header dereference only.
CM receive buffers are allocated with size = cm_xfer_block_size.
Please see  __cm_recv_desc_ctor() in ipoib_endpoint.c
Could you please point me to the spot where memory corruption is possible?

2-3. Yes, this is an issue I'm aware of.
The major problem, I think, is that the send descriptor is allocated on the stack, so setting MAX_WRS_PER_MSG > 16 would make the send descriptor too big and cause a stack overflow. A possible solution would be to use a pool of send descriptors (similar to what we use on the receive side).
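
A minimal sketch of what such a pool could look like, to make the idea concrete. All names here (sketch_send_desc_t, sketch_pool_get, SKETCH_MAX_WRS and so on) are hypothetical and not taken from the driver; the sketch uses plain calloc and no locking, whereas a real NDIS driver would presumably need non-paged memory and a lock around the free list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_WRS 32   /* hypothetical: any value above the current 16 */

/* Stand-in for a send work request; the real driver type is not shown here. */
typedef struct sketch_wr {
    size_t length;
} sketch_wr_t;

/* Send descriptor that grows with SKETCH_MAX_WRS, so it lives in the pool
 * instead of on the caller's stack. */
typedef struct sketch_send_desc {
    struct sketch_send_desc *next;      /* free-list link */
    sketch_wr_t wrs[SKETCH_MAX_WRS];
} sketch_send_desc_t;

typedef struct sketch_send_pool {
    sketch_send_desc_t *storage;        /* one allocation backing all descriptors */
    sketch_send_desc_t *free_list;      /* singly linked list of free descriptors */
    size_t free_count;
} sketch_send_pool_t;

/* Allocate 'count' (> 0) descriptors up front; returns 0 on success, -1 on failure. */
static int sketch_pool_init(sketch_send_pool_t *pool, size_t count)
{
    size_t i;

    pool->free_list = NULL;
    pool->free_count = 0;
    pool->storage = calloc(count, sizeof(*pool->storage));
    if (pool->storage == NULL)
        return -1;

    for (i = 0; i < count; i++) {
        pool->storage[i].next = pool->free_list;
        pool->free_list = &pool->storage[i];
        pool->free_count++;
    }
    return 0;
}

/* Take a descriptor from the pool; NULL means the pool is exhausted and the
 * caller would report a resource shortage instead of overflowing the stack. */
static sketch_send_desc_t *sketch_pool_get(sketch_send_pool_t *pool)
{
    sketch_send_desc_t *desc = pool->free_list;

    if (desc != NULL) {
        pool->free_list = desc->next;
        desc->next = NULL;
        pool->free_count--;
    }
    return desc;
}

/* Return a descriptor to the pool once the send has completed. */
static void sketch_pool_put(sketch_send_pool_t *pool, sketch_send_desc_t *desc)
{
    assert(desc != NULL);
    desc->next = pool->free_list;
    pool->free_list = desc;
    pool->free_count++;
}

/* Release the backing storage; no descriptor may be used afterwards. */
static void sketch_pool_destroy(sketch_send_pool_t *pool)
{
    free(pool->storage);
    pool->storage = NULL;
    pool->free_list = NULL;
    pool->free_count = 0;
}
```

With something along these lines, the send path would call sketch_pool_get() at the start of a send and sketch_pool_put() on completion, so the descriptor size no longer depends on how much stack is available.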

4. For 4K fabrics, setting the IPoIB parameter is sometimes not enough.
Legacy HCAs and switches can also coexist on the same fabric.
However, you are right, the way payload_mtu is adjusted could be done better.
I think the best way to handle this is to get the actual port MTU value from the CA query during port initialization,
and then adjust the payload_mtu parameter accordingly
(in this case NdisMInitializeScatterGatherDma should also be called after the parameter is fixed).
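
As a rough sketch of that adjustment, assuming the CA query reports the port MTU as the usual IBA code (1 = 256 bytes up to 5 = 4096 bytes) and that the payload limit is the port MTU minus the 4-byte IPoIB encapsulation header. The function and macro names below are made up for illustration and are not the driver's own; clamping the configured value to the port limit is only one possible reading of "adjust".

```c
#include <stdint.h>

#define SKETCH_IPOIB_HDR_LEN 4u   /* IPoIB encapsulation header size in bytes */

/* Convert an IBA MTU code (1..5) to bytes; returns 0 for an invalid code. */
static uint32_t sketch_mtu_code_to_bytes(uint8_t mtu_code)
{
    if (mtu_code < 1 || mtu_code > 5)
        return 0;
    return 128u << mtu_code;      /* 1 -> 256, 4 -> 2048, 5 -> 4096 */
}

/* Clamp the configured UD payload MTU to what the port can actually carry.
 * Returns 0 if the port MTU code is invalid, so the caller can fail the
 * initialization or fall back to a default of its choice. */
static uint32_t sketch_adjust_payload_mtu(uint32_t configured_mtu,
                                          uint8_t port_mtu_code)
{
    uint32_t port_bytes = sketch_mtu_code_to_bytes(port_mtu_code);
    uint32_t limit;

    if (port_bytes == 0)
        return 0;

    limit = port_bytes - SKETCH_IPOIB_HDR_LEN;
    return (configured_mtu < limit) ? configured_mtu : limit;
}
```

For a port reporting a 4K MTU, a configured value of 4092 would be kept, while on a 2K port the same setting would be reduced to 2044.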

Thoughts?

Thanks,
Alex. 

> -----Original Message-----
> From: Alex Naslednikov [mailto:xalex at mellanox.co.il] 
> Sent: Thursday, January 08, 2009 11:02 AM
> To: Alex Estrin
> Cc: Tzachi Dar; Ishai Rabinovitz; ofw at lists.openfabrics.org
> Subject: [ofw] [IPoIB_CM] Problems with large pings
> 
> Hello, Alex,
> I tried to test IPoIB_CM module (in connected mode, rev. 
> 1797) and found a problem when sending large ping. 
> Settings: MTU_SIZE == 32K (32768), Connected Mode: On, ping size: 32K 
> Below is the description of the problem and some investigations. 
> Please let us know your suggestions and comments regarding 
> this issue; meanwhile, we continue to debug.
>  
>  
> This ping always fails and almost always leads both machines to crash.
> 1. I enlarged the size of _ipoib_pkt!data to be 
> MAX_CM_PAYLOAD_MTU just to avoid possible memory corruption 
> on receive side during the debug.
> I found that this struct is also used during CM operations and 
> theoretically can receive buffers larger than 4K (Max MTU size for UD)
>  
> typedef struct _ipoib_pkt
> {
>  ipoib_hdr_t  hdr;
>  union _payload
>  {
> - uint8_t   data[MAX_UD_PAYLOAD_MTU];
> + uint8_t   data[MAX_CM_PAYLOAD_MTU];
>   ipoib_arp_pkt_t arp;
>   ip_pkt_t  ip;
>  
>  } PACK_SUFFIX type;
>  
> } PACK_SUFFIX ipoib_pkt_t;
>
> 2. Now the crash was not reproduced, but ping still didn't work.
> That's because MAX_WRS_PER_MSG was defined as 
> MAX_CM_PAYLOAD_MTU/MAX_UD_PAYLOAD_MTU.
> In our case, MAX_WRS_PER_MSG  was equal to 16 (65520/4092).
> In the case of ICMP packet IPoIB_CM still uses the UD mode; 
> i.e. it tries to fragment the message into chunks of 'payload_mtu'.
> In our case, params.payload_mtu == 2K. That is, Ping of 32K 
> should be fragmented into 16 chunks.
>  
> From the __send_fragments:
> seg_len = ( next_sge > ( 
> p_port->p_adapter->params.payload_mtu - wr_size ) )?
>     ( p_port->p_adapter->params.payload_mtu - wr_size ) : next_sge;
>  
> 3. But there's the following check inside __send_fragments:
>  
>   if( wr_idx >= ( MAX_WRS_PER_MSG - 1 ) )
>    return NDIS_STATUS_RESOURCES;
>  
> In our case, wr_idx always reaches 15 (i.e. 16 elements), so 
> this check fails.
>  
> 4. And there's yet another inconsistency:
> Not all cards support 4K MTU (in UD mode). Originally, it was 
> possible to enlarge MTU to 4K by using IPoIB parameters.
> Now consider the following code:
>  
>  p_adapter->params.cm_payload_mtu =
>    min( MAX_CM_PAYLOAD_MTU, p_adapter->params.payload_mtu );
>  p_adapter->params.cm_xfer_block_size = 
>    p_adapter->params.cm_payload_mtu + sizeof(eth_hdr_t);
>  p_adapter->params.payload_mtu = 
>    min( DEFAULT_PAYLOAD_MTU, p_adapter->params.payload_mtu);
>  p_adapter->params.xfer_block_size = (sizeof(eth_hdr_t) + 
> p_adapter->params.payload_mtu);
>  
> It means that if you have a card that supports 4K MTU, 
> p_adapter->params.payload_mtu will nevertheless get only the 2K 
> value (and that's still important in UD mode)
>  
> Thanks,
> XaleX
>  
> 

