[ofw] [IPoIB_CM] Problems with large pings

Alex Naslednikov xalex at mellanox.co.il
Thu Jan 8 08:01:54 PST 2009


Hello, Alex,
I tested the IPoIB_CM module (in connected mode, rev. 1797) and found
a problem when sending a large ping.
Settings: MTU_SIZE == 32K (32768), Connected Mode: On, ping size: 32K
Below is a description of the problem and our findings so far.
Please let us know your suggestions and comments regarding this issue;
meanwhile, we continue to debug.
 
 
This ping always fails and almost always crashes both machines.
1. I enlarged _ipoib_pkt!data to MAX_CM_PAYLOAD_MTU, just
to avoid possible memory corruption on the receive side while debugging.
I found that this struct is also used during CM operations and
can theoretically receive buffers larger than 4K (the max MTU size for UD):
 
typedef struct _ipoib_pkt
{
 ipoib_hdr_t  hdr;
 union _payload
 {
- uint8_t   data[MAX_UD_PAYLOAD_MTU];
+ uint8_t   data[MAX_CM_PAYLOAD_MTU];
  ipoib_arp_pkt_t arp;
  ip_pkt_t  ip;
 
 } PACK_SUFFIX type;
 
} PACK_SUFFIX ipoib_pkt_t;
 
2. With this change the crash no longer reproduced, but ping still didn't work.
That's because MAX_WRS_PER_MSG is defined as
MAX_CM_PAYLOAD_MTU/MAX_UD_PAYLOAD_MTU.
In our case, MAX_WRS_PER_MSG was equal to 16 (65520/4092).
For ICMP packets, IPoIB_CM still uses UD mode; i.e. it
tries to fragment the message into chunks of 'payload_mtu'.
In our case, params.payload_mtu == 2K. That is, a ping of 32K should be
fragmented into 16 chunks.
 
From __send_fragments:
seg_len = ( next_sge > ( p_port->p_adapter->params.payload_mtu - wr_size ) )?
    ( p_port->p_adapter->params.payload_mtu - wr_size ) : next_sge;
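The expression above is just a clamp: take whatever is left of the current scatter/gather element, but never more than the room remaining in the fragment being built. A minimal sketch of the same logic follows; the helper name seg_len_for and the plain uint32_t types are my own for illustration and are not taken from the driver, and it assumes wr_size never exceeds payload_mtu.

```c
#include <stdint.h>

/* Hypothetical helper (not in the driver): number of bytes of the
 * current SGE that fit into the work request being filled.
 * Assumes wr_size <= payload_mtu, otherwise the subtraction wraps. */
static uint32_t seg_len_for(uint32_t next_sge, uint32_t payload_mtu,
                            uint32_t wr_size)
{
    uint32_t room = payload_mtu - wr_size;   /* space left in this fragment */

    /* Same ternary as in __send_fragments: min(next_sge, room). */
    return (next_sge > room) ? room : next_sge;
}
```

For example, with payload_mtu == 2048 and an empty fragment, a 5000-byte SGE contributes 2048 bytes; with 2000 bytes already queued, a 100-byte SGE contributes only 48.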
 
3. But there is the following check inside __send_fragments:
 
  if( wr_idx >= ( MAX_WRS_PER_MSG - 1 ) )
   return NDIS_STATUS_RESOURCES;
 
In our case, wr_idx always reaches 15 (i.e. 16 elements), so this check
triggers and the send fails with NDIS_STATUS_RESOURCES.
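To make the arithmetic concrete, here is a small stand-alone model of the fragmentation loop. It is only a sketch: count_fragments is a hypothetical function, MAX_WRS_PER_MSG is hard-coded to the value 16 reported above, and I assume the check is evaluated before each fragment is filled (the exact position of the check in __send_fragments is not shown here).

```c
#include <stdint.h>

#define MAX_WRS_PER_MSG 16u   /* value observed in our build (see above) */

/* Hypothetical model (not driver code): returns the number of fragments
 * used, or -1 where the driver would return NDIS_STATUS_RESOURCES. */
static int count_fragments(uint32_t total_len, uint32_t payload_mtu)
{
    uint32_t remaining = total_len;
    uint32_t wr_idx;

    for (wr_idx = 0; remaining > 0; wr_idx++) {
        /* Same condition as the check quoted above. */
        if (wr_idx >= (MAX_WRS_PER_MSG - 1))
            return -1;

        /* Consume at most one payload_mtu worth of data per fragment. */
        remaining -= (remaining > payload_mtu) ? payload_mtu : remaining;
    }
    return (int)wr_idx;
}
```

Under these assumptions, count_fragments(32768, 2048) hits the limit when it reaches the 16th fragment, while 30720 bytes (15 fragments) still goes through, and the same 32K message would fit in 8 fragments if payload_mtu were 4096.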
 
4. And there's yet another inconsistency:
Not all cards support 4K MTU (in UD mode). Originally, it was possible
to enlarge MTU to 4K by using IPoIB parameters.
Now consider the following code:
 
 p_adapter->params.cm_payload_mtu =
   min( MAX_CM_PAYLOAD_MTU, p_adapter->params.payload_mtu );
 p_adapter->params.cm_xfer_block_size = 
   p_adapter->params.cm_payload_mtu + sizeof(eth_hdr_t);
 p_adapter->params.payload_mtu = 
   min( DEFAULT_PAYLOAD_MTU, p_adapter->params.payload_mtu);
 p_adapter->params.xfer_block_size = 
   (sizeof(eth_hdr_t) + p_adapter->params.payload_mtu);
 
It means that if you have a card that supports 4K MTU,
p_adapter->params.payload_mtu will nevertheless be set to only 2K (and
that still matters in UD mode).
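The effect of that ordering can be reproduced in isolation. The sketch below mirrors the four assignments with a hypothetical struct mtu_params and function clamp_mtu; the two limits are passed in as arguments rather than taken from the driver headers, and ETH_HDR_SIZE (14 bytes, the standard Ethernet header) stands in for sizeof(eth_hdr_t).

```c
#include <stdint.h>

#define ETH_HDR_SIZE 14u   /* stand-in for sizeof(eth_hdr_t) */

/* Hypothetical mirror of the relevant adapter parameters. */
struct mtu_params {
    uint32_t payload_mtu;
    uint32_t xfer_block_size;
    uint32_t cm_payload_mtu;
    uint32_t cm_xfer_block_size;
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return (a < b) ? a : b;
}

/* Same order of assignments as the snippet quoted above: the CM values
 * are derived from the configured payload_mtu first, then payload_mtu
 * itself is capped at the default. */
static void clamp_mtu(struct mtu_params *p, uint32_t default_payload_mtu,
                      uint32_t max_cm_payload_mtu)
{
    p->cm_payload_mtu     = min_u32(max_cm_payload_mtu, p->payload_mtu);
    p->cm_xfer_block_size = p->cm_payload_mtu + ETH_HDR_SIZE;
    p->payload_mtu        = min_u32(default_payload_mtu, p->payload_mtu);
    p->xfer_block_size    = ETH_HDR_SIZE + p->payload_mtu;
}
```

Starting from payload_mtu == 4096 with a 2K default, the UD payload_mtu comes out as 2048 even though the card and the configuration both asked for 4K.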
 
Thanks,
XaleX
 