[ofw] [PATCH] IPOIB_NDIS6_CM remove NBL/MDL alloc & dealloc from UD recv data path
Smith, Stan
stan.smith at intel.com
Tue Nov 9 14:44:46 PST 2010
Fab Tillier wrote:
> Hi Stan,
>
> Smith, Stan wrote on Tue, 9 Nov 2010 at 10:10:18
>
>> Hello,
>> Enclosed you will find a technology preview patch implementing
>> Fab's suggestions about removing NETWORK_BUFFER_LIST and MDL
>> allocation and deallocation for every UD receive operation; now
>> allocated/deallocated in recv_pool constructor & destructor routines.
>> The results is approximately a 3 Mbyte/sec transfer rate increase;
>> less than I had envisioned although good nonetheless.
>
> What does it do for small message latencies?
Have not measured latencies per se.
Code instrumentation shows a .3529 micro-seconds removed from a single receive operation (CQ callback to rx repost).
Evidence would suggest a latency reduction; how much is not yet clear.
Will look into some MPI latency tests and let you know.
<snip...>
>> @@ -1506,11 +1579,19 @@
>> /* Destroy the receive packet and buffer pools.
>> if( p_port->buf_mgr.h_buffer_pool )
>> NdisFreeBufferPool( p_port->buf_mgr.h_buffer_pool);*/
+#if UD_NBL_IN_DESC
+ /* Free the receive and send descriptors. */
+ cl_qpool_destroy( &p_port->buf_mgr.recv_pool);
>>+ if( p_port->buf_mgr.h_packet_pool )
>> + NdisFreeNetBufferListPool ( p_port->buf_mgr.h_packet_pool );
>> +#else
>> if( p_port->buf_mgr.h_packet_pool )
>> NdisFreeNetBufferListPool ( p_port-> buf_mgr.h_packet_pool );
>>
>> /* Free the receive and send descriptors. */
>> cl_qpool_destroy( &p_port->buf_mgr.recv_pool );
>
> Would it be invalid to free the recv_pool before the packet_pool in
> the current code flow? If not, no need to #if UD_NBL_IN_DESC this
> part, as the change would be valid regardless. As it stands, this
> looks like a bigger change than it is at first glance.
Yes, the recv_pool holds a reference(NB alloc) from the h_packet_pool.
Best not to destroy packet pool before all elements are returned via recv_dtor() on recv_pool destroy.
<snip>
>> @@ -1657,12 +1800,29 @@
>> IN ipoib_recv_desc_t* const
>> p_desc, IN NET_BUFFER_LIST*
>> const p_net_buffer_list OPTIONAL ) {
>> - NET_BUFFER *p_buf = NULL;
>> - MDL *p_mdl = NULL;
>> IPOIB_ENTER(IPOIB_DBG_RECV );
>>
>> if( p_net_buffer_list )
>> {
>> + NET_BUFFER *p_buf = NULL;
>> + MDL *p_mdl = NULL; +#if
>> UD_NBL_IN_DESC + ASSERT( p_desc->p_NBL );
>> + ASSERT( p_desc->p_mdl );
>> + ASSERT( p_net_buffer_list == p_desc->p_NBL ); +
>> + NET_BUFFER_LIST_NEXT_NBL(p_net_buffer_list) = NULL;
>
> This code seems to assume that the NBL passed in is not chained to
> any others. Might be worth asserting that NET_BUFFER_LIST_NEXT_NBL
> is NULL when you enter, rather than setting it?
This is original Mellanox IPoIB code.
There are cases where the caller is processing a list of NBLs so NEXT_NBL could be set.
>
>> +
>> + p_buf = NET_BUFFER_LIST_FIRST_NB( p_net_buffer_list
>> );
>
> You also assume there's only one buffer in the NBL. Worth an assert,
> IMO.
NBL pool is allocated such that there is only one buffer per NBL structure.
Assert() would validate initial assumption; will investigate.
>
>> + p_mdl = NET_BUFFER_FIRST_MDL( p_buf );
>
> Same here with the MDLs.
ACK.
>
>> +
>> + ASSERT( p_mdl == p_desc->p_mdl );
>> + ASSERT( NET_BUFFER_CURRENT_MDL( p_buf ) == p_mdl ); +
>> + /* reset Avail buffer lengths to full Rx size */
>
> Why do you need to reset these here? Don't the length get set when
> the receive completes? Do you use these fields when pre-posting the
> receive buffers?
Umm, seems like overkill now that you mention it.
Just keeping an orderly house.
Will investigate.
>
>> + NET_BUFFER_DATA_LENGTH( p_buf ) =
>> sizeof(ipoib_pkt_t) + sizeof(ib_grh_t); +
>> NdisAdjustMdlLength( p_mdl, sizeof(ipoib_pkt_t) + sizeof(ib_grh_t)
>> );
+#else
>> NET_BUFFER_LIST_NEXT_NBL(p_net_buffer_list) = NULL;
>> p_buf = NET_BUFFER_LIST_FIRST_NB(p_net_buffer_list);
>> CL_ASSERT( p_buf ); @@ -1670,6 +1830,7 @@
>> CL_ASSERT( p_mdl );
>> NdisFreeMdl(p_mdl);
>> NdisFreeNetBufferList(p_net_buffer_list);
+#endif
>> }
>>
>> /* Return the descriptor to its pools. */
>> @@ -1701,7 +1862,25 @@
>> MDL *p_mdl;
>>
>> IPOIB_ENTER( IPOIB_DBG_RECV );
>> +#if UD_NBL_IN_DESC
>> + PNET_BUFFER NetBuffer;
>> + UNREFERENCED_PARAMETER(p_port);
>> +
>> + IPOIB_ENTER( IPOIB_DBG_BUF );
>> +
>> + ASSERT( p_desc->p_NBL );
>> + ASSERT( p_desc->p_mdl );
>> + p_net_buffer_list = p_desc->p_NBL;
>> +
>> + NetBuffer = NET_BUFFER_LIST_FIRST_NB( p_net_buffer_list );
>> + p_mdl = NET_BUFFER_FIRST_MDL( NetBuffer );
>> +
>> + ASSERT( p_mdl == p_desc->p_mdl );
>> + ASSERT( NET_BUFFER_CURRENT_MDL( NetBuffer ) == p_mdl );
>>
>> + NET_BUFFER_DATA_LENGTH( NetBuffer ) = p_desc->len;
>> + NdisAdjustMdlLength( p_mdl, p_desc->len );
>> +#else // !UD_NBL_IN_DESC
>> p_mdl = NdisAllocateMdl(p_port->p_adapter->h_adapter,
>> &p_desc-
>>> buf.eth.pkt,
>> p_desc->len
>> ); @@ -1732,6 +1911,7 @@ IPOIB_PORT_FROM_NBL(
>> p_net_buffer_list ) = p_port; IPOIB_RECV_FROM_NBL(
>> p_net_buffer_list ) = p_desc;
>> p_net_buffer_list->SourceHandle = p_port->p_adapter->h_adapter;
>> +#endif // !UD_NBL_IN_DESC
>>
>> IPOIB_EXIT( IPOIB_DBG_RECV );
>> return p_net_buffer_list;
TBC.
stan.
More information about the ofw
mailing list