[openib-general] RE: [RFC] DAT 2.0 immediate data proposal

Kanevsky, Arkady Arkady.Kanevsky at netapp.com
Tue Jan 24 15:46:18 PST 2006


But this penalizes users, who would need to deal with two ways
of handling post calls and completions.

I do not think we are too far from consensus.
A transport-independent App will allocate 4 extra bytes
for buffers that can match immediate data.
Completion data will report where the immediate data was returned
(the Consumer cannot request a location on posting), and there will be
4 bytes for immediate data in the completion event.
The rest is ironing out details for a complete specification.
This is no different than for any other new functionality proposed.
And except for wasting 4 bytes per buffer or completion, I do
not see how it penalizes IB. Moreover, if the App knows that the Provider
returns immediate data in the completion event, it can avoid any penalty.
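
To make the consumer side concrete, here is a minimal sketch of how
an App might pick up the immediate data from either place. Every name
below (immed_location, struct completion, get_immed) is hypothetical
and is not taken from the DAT specification. The sketch also assumes
that, when the value lands in the buffer, it occupies the first 4
bytes in network byte order and is counted in xfer_length; both of
those points are still open in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMMED_SIZE 4u  /* immediate data is 32 bits */

/* Hypothetical: where the Provider delivered the immediate data. */
enum immed_location {
    IMMED_NONE = 0,   /* no immediate data came with this message */
    IMMED_IN_EVENT,   /* value is carried in the completion event */
    IMMED_IN_BUFFER   /* value is the first 4 bytes of the recv buffer */
};

/* Hypothetical completion record, NOT the DAT dto_completion_data. */
struct completion {
    enum immed_location where;
    uint32_t immed;       /* valid only for IMMED_IN_EVENT */
    size_t xfer_length;   /* bytes placed in the receive buffer */
};

/*
 * Fetch the immediate value and locate the payload.
 * Returns 1 if immediate data was present, 0 otherwise.
 */
static int get_immed(const struct completion *c, const unsigned char *buf,
                     uint32_t *immed, const unsigned char **payload,
                     size_t *payload_len)
{
    switch (c->where) {
    case IMMED_IN_EVENT:
        *immed = c->immed;
        *payload = buf;
        *payload_len = c->xfer_length;
        return 1;
    case IMMED_IN_BUFFER:
        /* Assumption: xfer_length includes the 4 immediate bytes. */
        assert(c->xfer_length >= IMMED_SIZE);
        /* Assumption: the value is stored in network byte order. */
        *immed = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
        *payload = buf + IMMED_SIZE;
        *payload_len = c->xfer_length - IMMED_SIZE;
        return 1;
    default:
        *payload = buf;
        *payload_len = c->xfer_length;
        return 0;
    }
}
```

An App that knows its Provider always reports IMMED_IN_EVENT could
skip the switch and the extra 4 bytes entirely, which is the case
where I would expect no penalty.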

Arkady

Arkady Kanevsky                       email: arkady at netapp.com
Network Appliance Inc.               phone: 781-768-5395
1601 Trapelo Rd. - Suite 16.        Fax: 781-895-1195
Waltham, MA 02451                   central phone: 781-768-5300
 

> -----Original Message-----
> From: Arlin Davis [mailto:ardavis at ichips.intel.com] 
> Sent: Tuesday, January 24, 2006 5:42 PM
> To: Caitlin Bestler
> Cc: Davis, Arlin R; Kanevsky, Arkady; Lentini, James; 
> dat-discussions at yahoogroups.com; openib-general at openib.org
> Subject: Re: [openib-general] RE: [RFC] DAT 2.0 immediate 
> data proposal
> 
> ok, maybe we should back up and start over....
> 
> This is exactly why immediate data was initially proposed as 
> an extension instead of general API. We start to penalize 
> native IB features based on the requirements of other RDMA 
> interfaces that have to emulate the feature anyway.  What 
> prevents the next  RDMA interface that comes along from 
> requiring other variations of the interface due to 
> implementation implications?  This is an IB specific feature 
> that does not map well on iWARP so let's just call it what it 
> is and let IB providers supply immediate data capabilities 
> via the extension interface.
> 
> -arlin
> 
> Caitlin Bestler wrote:
> 
> >>
> >>Maybe we need to just go back to one model and always deliver
> >>via the event? With the post_recv_immed requirements, other
> >>transports have a mechanism to emulate and create the
> >>necessary resources on the recv side to place idata and copy
> >>to event when operation is completed. Would this work for iWARP?
> >>
> >>
> >>
> >>Two different models for receiving idata should be avoided if
> >>at all possible.
> >>
> >>
> >>
> >>    
> >>
> >
> >Always delivering by the event is not feasible for an iWARP vendor.
> >If you are working over RDMAC verbs then the work completion is no
> >longer accessible by the time the Work Completion is reaped. So copying
> >from the receive buffer to the event does not work since the location
> >of the receive buffer is now known only to the application.
> >
> >The same problem exists in the opposite direction for InfiniBand HCAs
> >using standard verbs. They cannot copy from the CQE to the receive
> >buffer.
> >
> >So the user is stuck checking a flag or the event type to know where
> >their data is. This is not terribly user friendly, but it is the best
> >that can be offered if we want to enable this optimization. The need
> >to check the flag does reduce the value of the optimization though.
> >
> >
> >  
> >
> >>
> >>6. Does dto_completion_data xfer_length include immediate_data
> >>size or not?
> >>
> >>
> >>
> >>no
> >>
> >>
> >>
> >>    
> >>
> >
> >Then how does the receiver know how much data there is?
> >
> >Even if an iWarp Provider attempts to optimize immediate
> >placement into the CQ, it will end up setting the xfer_length
> >whenever the packet is received out of order.
> >
> >So it is far simpler for the application to simply know that
> >the data will be in the buffer, and that the xfer_length will
> >be set. It doesn't need to worry about whether they were set
> >by the cq_poll verb or by the hardware.
> >
> >  
> >
> >>
> >>11. Need to clean up the operation description to make it clear
> >>that the Send|RDMA_write and the immediate data part
> >>form a single atomic operation. The current "followed by"
> >>language is misleading.
> >>
> >>Make it explicit that there is a single local DTO completion
> >>and single remote DTO completion.
> >>
> >>
> >>
> >>Ok, I will clean that up
> >>
> >>
> >>    
> >>
> >
> >The best mapping available over RDMAC-compliant firmware for
> >an iWARP NIC would be to post two operations (RDMA Write followed
> >by a short Send). That would require additional space in the send
> >and completion queues since a completion for the write can only
> >be suppressed for a successful completion.
> >
> >Whether these extra slots were required would be an IA attribute.
> >
> >And the requirement is that nothing for that QP can come between
> >the iWARP Write and the Send. How the provider does that is up
> >to it. Options include locking over both posts and a composite
> >work request. Anyone working over existing RDMAC-compliant
> >verbs will have to use the first approach.
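
A minimal sketch of the "locking over both posts" option may help
here. It models the send queue as a plain array guarded by a pthread
mutex; struct send_queue, sq_init and post_write_with_immed are
hypothetical names for illustration only and are not RDMAC verbs or
DAT calls. The sketch assumes every other poster to the same QP takes
the same lock, and it checks for the two free slots that the emulation
needs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define SQ_DEPTH 8

enum wr_kind { WR_RDMA_WRITE, WR_SEND };

struct work_request {
    enum wr_kind kind;
    uint32_t immed;   /* carried by the short Send only */
};

/* Hypothetical stand-in for one QP's send queue. */
struct send_queue {
    pthread_mutex_t lock;   /* serializes all posts to this QP */
    struct work_request slots[SQ_DEPTH];
    size_t count;
};

static void sq_init(struct send_queue *sq)
{
    pthread_mutex_init(&sq->lock, NULL);
    sq->count = 0;
}

/*
 * Emulate write-with-immediate as RDMA Write + short Send.
 * Both requests are queued under one lock, so nothing else for this
 * QP can come between them. Returns 0 on success, or -1 if fewer
 * than two slots are free (nothing is posted in that case).
 */
static int post_write_with_immed(struct send_queue *sq, uint32_t immed)
{
    int rc = -1;

    pthread_mutex_lock(&sq->lock);
    if (SQ_DEPTH - sq->count >= 2) {
        sq->slots[sq->count++] = (struct work_request){ WR_RDMA_WRITE, 0 };
        sq->slots[sq->count++] = (struct work_request){ WR_SEND, immed };
        rc = 0;
    }
    pthread_mutex_unlock(&sq->lock);
    return rc;
}
```

The two-slot check is the reason the extra queue space would need to
be visible to the Consumer, for example as an IA attribute.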
> >
> >
> >  
> >
> >>12. Is your intention that post_recv_immed can ONLY accept
> >>immediate data and is not capable of receiving any message?
> >>
> >>
> >>
> >>No, the intention is to extend the post_recv to handle 32-bit
> >>idata which may arrive with or without other send or
> >>rdma_write data.
> >>
> >>
> >>
> >>Does it make more sense to add a dto_flags to the existing
> >>post_recv?
> >>
> >>
> >>    
> >>
> >
> >How does this map to iWARP?
> >
> >When the data can be sent as an immediate OR as data, then when
> >received it can be placed into the receive buffer or even potentially
> >directly into the CQ when everything aligns just right.
> >
> >But an iWARP sender has to place the immediate value as the first
> >four bytes of a Send message. There is no other mapping that makes
> >sense. Shoving the rest of the message up is complex, as is using
> >the last four bytes of the message since the last four bytes *could*
> >cross a DDP Segment boundary, and would require the user to provide
> >a buffer that was 4 bytes larger.
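
The "first four bytes of a Send message" mapping can be sketched as
below. frame_send and read_immed are hypothetical helpers, not part
of any iWARP or DAT interface, and the choice of network byte order
for the 4-byte value is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMMED_SIZE 4u

/*
 * Build a Send message: 4-byte immediate value, then the payload.
 * Returns the total message length, or 0 if 'out' is too small.
 * Assumption: the immediate value travels in network byte order.
 */
static size_t frame_send(unsigned char *out, size_t out_cap, uint32_t immed,
                         const void *payload, size_t payload_len)
{
    if (out_cap < IMMED_SIZE || payload_len > out_cap - IMMED_SIZE)
        return 0;

    out[0] = (unsigned char)(immed >> 24);
    out[1] = (unsigned char)(immed >> 16);
    out[2] = (unsigned char)(immed >> 8);
    out[3] = (unsigned char)immed;

    if (payload_len > 0)
        memcpy(out + IMMED_SIZE, payload, payload_len);
    return IMMED_SIZE + payload_len;
}

/* Recover the immediate value from the head of a received message. */
static uint32_t read_immed(const unsigned char *msg)
{
    return ((uint32_t)msg[0] << 24) | ((uint32_t)msg[1] << 16) |
           ((uint32_t)msg[2] << 8)  |  (uint32_t)msg[3];
}
```

Because the value sits at a fixed offset at the head of the message,
the receiver never has to reassemble it across a segment boundary,
but the receive buffer still has to be 4 bytes larger than the
payload it expects.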
> > 
> >
> >
> >
> 


