[openib-general] RE: [RFC] DAT 2.0 immediate data proposal

Mon Jan 23 16:47:45 PST 2006

> 
> 
> 
> Maybe we need to just go back to one model and always deliver
> via the event? With the post_recv_immed requirements, other
> transports have a mechanism to emulate and create the
> necessary resources on the recv side to place idata and copy
> to event when operation is completed. Would this work for iWARP?
> 
> 
> 
> Two different models for receiving idata should be avoided if
> at all possible.
> 
> 
> 

Always delivering by the event is not feasible for an iWARP vendor.
If you are working over RDMAC verbs then the work request is no
longer accessible by the time the Work Completion is reaped. So copying
from the receive buffer to the event does not work since the location
of the receive buffer is now known only to the application.

The same problem exists in the opposite direction for InfiniBand HCAs
using standard verbs. They cannot copy from the CQE to the receive
buffer.

So the user is stuck checking a flag or the event type to know where
their data is. This is not terribly user friendly, but it is the best
that can be offered if we want to enable this optimization. The need
to check the flag does reduce the value of the optimization though.
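
A rough C sketch of what that check looks like from the consumer's
side. Everything here is hypothetical: the enum, the struct and
get_idata() are made-up names for illustration, not part of DAT or of
any verbs API, and the buffer case assumes the value sits in the first
four bytes of the receive buffer in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag: where the provider left the immediate data. */
enum idata_location { IDATA_IN_EVENT, IDATA_IN_BUFFER };

/* Hypothetical stand-in for a DTO completion event. */
struct completion_event {
    enum idata_location where;  /* flag the consumer has to check */
    uint32_t idata;             /* valid only for IDATA_IN_EVENT */
    size_t xfer_length;         /* bytes placed in the receive buffer */
};

/* Return the immediate data regardless of which path delivered it. */
static uint32_t get_idata(const struct completion_event *ev,
                          const unsigned char *recv_buf)
{
    uint32_t value;

    if (ev->where == IDATA_IN_EVENT)
        return ev->idata;

    /* Buffer delivery: assumed to be the first 4 bytes, host order. */
    assert(ev->xfer_length >= sizeof value);
    memcpy(&value, recv_buf, sizeof value);
    return value;
}
```

The branch is cheap, but every consumer has to carry it, which is the
usability cost described above.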

> 
> 
> 
> 6. Does dto_completion_data xfer_length include the immediate_data
> size or not?
> 
> 
> 
> no
> 
> 
> 

Then how does the receiver know how much data there is?

Even if an iWARP Provider attempts to optimize immediate
placement into the CQ, it will end up setting the xfer_length
whenever the packet is received out of order.

So it is far simpler for the application to simply know that
the data will be in the buffer, and that the xfer_length will
be set. It doesn't need to worry about whether they were set
by the cq_poll verb or by the hardware.

> 
> 
> 
> 11. Need to clean up the operation description to make it clear
> that the Send|RDMA_write and the immediate data part
> are a single atomic operation. The current "followed by"
> language is misleading.
> 
> Make it explicit that there is a single local DTO completion
> and single remote DTO completion.
> 
> 
> 
> Ok, I will clean that up
> 
> 

The best mapping available over RDMAC-compliant firmware for
an iWARP NIC would be to post two operations (RDMA Write followed
by a short Send). That would require additional space in the send
and completion queues, since the completion for the write can be
suppressed only when the write succeeds.

Whether these extra slots were required would be an IA attribute.

And the requirement is that nothing for that QP can come between
the iWARP Write and the Send. How the provider does that is up
to it. Options include locking over both posts and a composite
work request. Anyone working over existing RDMAC-compliant
verbs will have to use the first approach.
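
As a minimal sketch of the "locking over both posts" option: the C
below models a send queue as a plain array guarded by a pthread mutex.
The struct and function names are invented for illustration and do not
correspond to any real RDMAC verbs or DAT call. The point is only that
the lock is held across both posts and that two free slots are needed
up front.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum wr_kind { WR_RDMA_WRITE, WR_SEND };

#define SQ_DEPTH 16

/* Hypothetical model of a QP send queue; not a real verbs structure. */
struct send_queue {
    pthread_mutex_t lock;
    enum wr_kind slots[SQ_DEPTH];
    size_t count;
};

static void sq_init(struct send_queue *sq)
{
    pthread_mutex_init(&sq->lock, NULL);
    sq->count = 0;
}

/* Post one work request. Caller holds sq->lock and has checked space. */
static void post_locked(struct send_queue *sq, enum wr_kind kind)
{
    assert(sq->count < SQ_DEPTH);
    sq->slots[sq->count++] = kind;
}

/*
 * Post the RDMA Write and the short Send as an adjacent pair.
 * Returns 0 on success, -1 if fewer than two slots are free.
 */
static int post_write_with_immed(struct send_queue *sq)
{
    int rc = -1;

    pthread_mutex_lock(&sq->lock);
    if (SQ_DEPTH - sq->count >= 2) {
        post_locked(sq, WR_RDMA_WRITE);
        post_locked(sq, WR_SEND);  /* nothing can slip in between */
        rc = 0;
    }
    pthread_mutex_unlock(&sq->lock);
    return rc;
}
```

Any other poster on the same QP would have to take the same lock for
this to hold, which is why a composite work request is the cleaner
option where the verbs layer allows it.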

> 
> 12. Is your intention that post_recv_immed can ONLY accept
> immediate data and is not
> capable of receiving any message?
> 
> 
> 
> No, the intention is to extend the post_recv to handle 32bit
> idata which may arrive with or without other send or rdma_write data.
> 
> 
> 
> Does it make more sense to add a dto_flags to the existing post_recv?
> 
>

How does this map to iWARP?

When the data can be sent as an immediate OR as data, then when received
it can be placed into the receive buffer or even potentially directly
into the CQ when everything aligns just right.

But an iWARP sender has to place the immediate value as the first
four bytes of a Send message. There is no other mapping that makes
sense. Shoving the rest of the message up is complex, as is using
the last four bytes of the message since the last four bytes *could*
cross a DDP Segment boundary, and would require the user to provide
a buffer that was 4 bytes larger.
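
To illustrate the first-four-bytes mapping, here is a small C sketch
of packing and unpacking such a Send message. The function names are
hypothetical, and the choice of network (big-endian) byte order for
the immediate value is an assumption made for the example, not
something the proposal specifies. Note that the received length here
covers the immediate value as well as the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IDATA_SIZE 4u

/*
 * Build a Send message: 4-byte immediate value first, payload after.
 * Returns the total message length.
 */
static size_t pack_send_with_idata(unsigned char *msg, size_t msg_cap,
                                   uint32_t idata,
                                   const void *payload, size_t payload_len)
{
    assert(msg_cap >= IDATA_SIZE + payload_len);
    msg[0] = (unsigned char)(idata >> 24);  /* big-endian (assumed) */
    msg[1] = (unsigned char)(idata >> 16);
    msg[2] = (unsigned char)(idata >> 8);
    msg[3] = (unsigned char)idata;
    if (payload_len > 0)
        memcpy(msg + IDATA_SIZE, payload, payload_len);
    return IDATA_SIZE + payload_len;
}

/*
 * Split a received message into immediate value and payload.
 * xfer_length is the full received length, immediate value included.
 */
static uint32_t unpack_idata(const unsigned char *msg, size_t xfer_length,
                             const unsigned char **payload,
                             size_t *payload_len)
{
    assert(xfer_length >= IDATA_SIZE);
    *payload = msg + IDATA_SIZE;
    *payload_len = xfer_length - IDATA_SIZE;
    return ((uint32_t)msg[0] << 24) | ((uint32_t)msg[1] << 16) |
           ((uint32_t)msg[2] << 8)  |  (uint32_t)msg[3];
}
```

With the value at the front, the receiver never has to reassemble it
across a segment boundary at the tail of the message.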