[ofw] [PATCH] IPOIB_NDIS6_CM remove NBL/MDL alloc & dealloc from UD recv data path

Smith, Stan stan.smith at intel.com
Tue Nov 9 14:44:46 PST 2010


Fab Tillier wrote:
> Hi Stan,
>
> Smith, Stan wrote on Tue, 9 Nov 2010 at 10:10:18
>
>> Hello,
>>   Enclosed you will find a technology preview patch implementing
>> Fab's suggestions about removing NETWORK_BUFFER_LIST and MDL
>> allocation and deallocation for every UD receive operation; now
>> allocated/deallocated in recv_pool constructor & destructor routines.
>> The results is approximately a 3 Mbyte/sec transfer rate increase;
>> less than I had envisioned although good nonetheless.
>
> What does it do for small message latencies?

Have not measured latencies per se.
Code instrumentation shows a .3529 micro-seconds removed from a single receive operation (CQ callback to rx repost).
Evidence would suggest a latency reduction; how much is not yet clear.
Will look into some MPI latency tests and let you know.


<snip...>
>> @@ -1506,11 +1579,19 @@
>>         /* Destroy the receive packet and buffer pools.
>>         if( p_port->buf_mgr.h_buffer_pool )
>>                 NdisFreeBufferPool( p_port->buf_mgr.h_buffer_pool);*/
+#if UD_NBL_IN_DESC
+       /* Free the receive and send descriptors. */
+       cl_qpool_destroy( &p_port->buf_mgr.recv_pool);

>>+     if( p_port->buf_mgr.h_packet_pool )
>> +               NdisFreeNetBufferListPool ( p_port->buf_mgr.h_packet_pool );
>> +#else
>>      if( p_port->buf_mgr.h_packet_pool )
>>                 NdisFreeNetBufferListPool ( p_port-> buf_mgr.h_packet_pool );
>>
>>         /* Free the receive and send descriptors. */
>>         cl_qpool_destroy( &p_port->buf_mgr.recv_pool );
>
> Would it be invalid to free the recv_pool before the packet_pool in
> the current code flow?  If not, no need to #if UD_NBL_IN_DESC this
> part, as the change would be valid regardless.  As it stands, this
> looks like a bigger change than it is at first glance.

Yes, the recv_pool holds a reference(NB alloc) from the h_packet_pool.
Best not to destroy packet pool before all elements are returned via recv_dtor() on recv_pool destroy.

<snip>
>> @@ -1657,12 +1800,29 @@
>>         IN                              ipoib_recv_desc_t* const
>>         p_desc, IN                              NET_BUFFER_LIST*
>>  const p_net_buffer_list OPTIONAL ) {
>> -       NET_BUFFER              *p_buf = NULL;
>> -       MDL                             *p_mdl = NULL;
>>         IPOIB_ENTER(IPOIB_DBG_RECV );
>>
>>         if( p_net_buffer_list )
>>         {
>> +               NET_BUFFER      *p_buf = NULL;
>> +               MDL                     *p_mdl = NULL; +#if
>> UD_NBL_IN_DESC +               ASSERT( p_desc->p_NBL );
>> +               ASSERT( p_desc->p_mdl );
>> +               ASSERT( p_net_buffer_list == p_desc->p_NBL ); +
>> +               NET_BUFFER_LIST_NEXT_NBL(p_net_buffer_list) = NULL;
>
> This code seems to assume that the NBL passed in is not chained to
> any others.  Might be worth asserting that NET_BUFFER_LIST_NEXT_NBL
> is NULL when you enter, rather than setting it?

This is original Mellanox IPoIB code.
There are cases where the caller is processing a list of NBLs so NEXT_NBL could be set.

>
>> +
>> +               p_buf = NET_BUFFER_LIST_FIRST_NB( p_net_buffer_list
>> );
>
> You also assume there's only one buffer in the NBL.  Worth an assert,
> IMO.

NBL pool is allocated such that there is only one buffer per NBL structure.
Assert() would validate initial assumption; will investigate.

>
>> +               p_mdl = NET_BUFFER_FIRST_MDL( p_buf );
>
> Same here with the MDLs.

ACK.


>
>> +
>> +               ASSERT( p_mdl == p_desc->p_mdl );
>> +               ASSERT( NET_BUFFER_CURRENT_MDL( p_buf ) == p_mdl ); +
>> +               /* reset Avail buffer lengths to full Rx size */
>
> Why do you need to reset these here?  Don't the length get set when
> the receive completes?  Do you use these fields when pre-posting the
> receive buffers?

Umm, seems like overkill now that you mention it.
Just keeping an orderly house.
Will investigate.


>
>> +               NET_BUFFER_DATA_LENGTH( p_buf ) =
>> sizeof(ipoib_pkt_t) + sizeof(ib_grh_t); +
>> NdisAdjustMdlLength( p_mdl, sizeof(ipoib_pkt_t) + sizeof(ib_grh_t)
>>                 );
+#else
>>                 NET_BUFFER_LIST_NEXT_NBL(p_net_buffer_list) = NULL;
>>                 p_buf = NET_BUFFER_LIST_FIRST_NB(p_net_buffer_list);
>> CL_ASSERT( p_buf ); @@ -1670,6 +1830,7 @@
>>                 CL_ASSERT( p_mdl );
>>                 NdisFreeMdl(p_mdl);
>>                 NdisFreeNetBufferList(p_net_buffer_list);
+#endif
>>         }
>>
>>         /* Return the descriptor to its pools. */
>> @@ -1701,7 +1862,25 @@
>>         MDL                                             *p_mdl;
>>
>>         IPOIB_ENTER(  IPOIB_DBG_RECV );
>> +#if UD_NBL_IN_DESC
>> +       PNET_BUFFER                     NetBuffer;
>> +       UNREFERENCED_PARAMETER(p_port);
>> +
>> +       IPOIB_ENTER( IPOIB_DBG_BUF );
>> +
>> +       ASSERT( p_desc->p_NBL );
>> +       ASSERT( p_desc->p_mdl );
>> +       p_net_buffer_list = p_desc->p_NBL;
>> +
>> +       NetBuffer = NET_BUFFER_LIST_FIRST_NB( p_net_buffer_list );
>> +       p_mdl = NET_BUFFER_FIRST_MDL( NetBuffer );
>> +
>> +       ASSERT( p_mdl == p_desc->p_mdl );
>> +       ASSERT( NET_BUFFER_CURRENT_MDL( NetBuffer ) == p_mdl );
>>
>> +       NET_BUFFER_DATA_LENGTH( NetBuffer ) = p_desc->len;
>> +       NdisAdjustMdlLength( p_mdl, p_desc->len );
>> +#else  // !UD_NBL_IN_DESC
>>         p_mdl = NdisAllocateMdl(p_port->p_adapter->h_adapter,
>>                                                         &p_desc-
>>> buf.eth.pkt,
>>                                                         p_desc->len
>>         ); @@ -1732,6 +1911,7 @@ IPOIB_PORT_FROM_NBL(
>>         p_net_buffer_list ) = p_port; IPOIB_RECV_FROM_NBL(
>>         p_net_buffer_list ) = p_desc;
>> p_net_buffer_list->SourceHandle = p_port->p_adapter->h_adapter;
>> +#endif // !UD_NBL_IN_DESC
>>
>>         IPOIB_EXIT(  IPOIB_DBG_RECV );
>>         return p_net_buffer_list;

TBC.

stan.




More information about the ofw mailing list