[ofw] [IPoIB_CM] Problems with large pings
Alex Naslednikov
xalex at mellanox.co.il
Thu Jan 8 08:01:54 PST 2009
Hello, Alex,
I tested the IPoIB_CM module (in connected mode, rev. 1797) and found
a problem when sending large pings.
Settings: MTU_SIZE == 32K (32768), Connected Mode: On, ping size: 32K
Below is the description of the problem and some investigations.
Please let us know your suggestions and comments regarding this issue;
meanwhile, we are continuing to debug.
This ping always fails and almost always causes both machines to crash.
1. I enlarged the data array in _ipoib_pkt to MAX_CM_PAYLOAD_MTU, just
to rule out memory corruption on the receive side while debugging.
This struct is also used during CM operations, so in theory it can
receive buffers larger than 4K (the maximum MTU for UD):
 typedef struct _ipoib_pkt
 {
     ipoib_hdr_t hdr;
     union _payload
     {
-        uint8_t data[MAX_UD_PAYLOAD_MTU];
+        uint8_t data[MAX_CM_PAYLOAD_MTU];
         ipoib_arp_pkt_t arp;
         ip_pkt_t ip;
     } PACK_SUFFIX type;
 } PACK_SUFFIX ipoib_pkt_t;
2. With this change the crash no longer reproduced, but the ping still didn't work.
This is because MAX_WRS_PER_MSG is defined as
MAX_CM_PAYLOAD_MTU/MAX_UD_PAYLOAD_MTU.
In our case, MAX_WRS_PER_MSG equals 16 (62520/4092).
For ICMP packets IPoIB_CM still uses UD mode, i.e. it
fragments the message into chunks of 'payload_mtu' bytes.
In our case, params.payload_mtu == 2K, so a 32K ping has to be
split into 16 chunks.
From __send_fragments:
    seg_len = ( next_sge > ( p_port->p_adapter->params.payload_mtu - wr_size ) ) ?
              ( p_port->p_adapter->params.payload_mtu - wr_size ) : next_sge;
3. However, __send_fragments also contains the following check:
    if( wr_idx >= ( MAX_WRS_PER_MSG - 1 ) )
        return NDIS_STATUS_RESOURCES;
In our case, wr_idx always reaches 15 (i.e. 16 elements), so the
condition is true and the function returns NDIS_STATUS_RESOURCES.
4. There is yet another inconsistency:
Not all cards support a 4K MTU (in UD mode). Originally, it was possible
to raise the MTU to 4K through the IPoIB parameters.
Now consider the following code:
    p_adapter->params.cm_payload_mtu =
        min( MAX_CM_PAYLOAD_MTU, p_adapter->params.payload_mtu );
    p_adapter->params.cm_xfer_block_size =
        p_adapter->params.cm_payload_mtu + sizeof(eth_hdr_t);
    p_adapter->params.payload_mtu =
        min( DEFAULT_PAYLOAD_MTU, p_adapter->params.payload_mtu );
    p_adapter->params.xfer_block_size =
        ( sizeof(eth_hdr_t) + p_adapter->params.payload_mtu );
This means that even if your card supports a 4K MTU,
p_adapter->params.payload_mtu is still capped at 2K (DEFAULT_PAYLOAD_MTU),
and that value still matters in UD mode.
Thanks,
XaleX