[ofiwg] raw Ethernet support

Christoph Lameter christoph at graphe.net
Sat Jul 19 08:07:27 PDT 2014



>> There are multiple elements to this since one would need l2 and l3 access.
> 
> Are you treating these as independent sets of functionality?

They are meshed together in the flow steering API. Maybe it would be better to separate those?

> 
>> On the l2 level you would use the MAC address for packets and then format
>> the content of the frames yourself. This would allow support of an
>> arbitrary ethernet protocol. Applications would have to be able to set the
>> MAC addresses where they would like to receive traffic (multiple in order
>> to support broadcast and multicast addresses).
> 
> So, for raw Ethernet access, you would expect the application to format the Ethernet header and perform the CRC, or at least provide the space for the CRC with some mechanism for the hardware to perform the calculation?

Right. Access to the CRC is currently not possible, but some devices use it to provide timestamp information. The device needs a mechanism to indicate which acceleration features are available, and a mechanism to switch them on and off.
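
To illustrate what application-side frame formatting could look like, here is a minimal sketch in Python. The application lays out the Ethernet header itself and, in this sketch, also computes the trailing CRC (FCS) in software. The function name, the MAC addresses and the EtherType 0x88B5 (an IEEE local experimental value) are assumptions chosen for the example. They are not part of any fabric or verbs API.

```python
import struct
import zlib

ETH_MIN_PAYLOAD = 46  # shortest payload; gives a 64-byte frame once the FCS is added


def build_raw_frame(dst_mac: bytes, src_mac: bytes, ethertype: int,
                    payload: bytes, with_fcs: bool = True) -> bytes:
    """Format a complete Ethernet II frame in application memory (hypothetical helper)."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    # Pad short payloads with zeros up to the Ethernet minimum.
    payload = payload.ljust(ETH_MIN_PAYLOAD, b"\x00")
    # Header: destination MAC, source MAC, EtherType in network byte order.
    frame = dst_mac + src_mac + struct.pack("!H", ethertype) + payload
    if with_fcs:
        # The Ethernet FCS is the standard CRC-32, appended least significant byte first.
        frame += struct.pack("<I", zlib.crc32(frame) & 0xFFFFFFFF)
    return frame


# Broadcast destination, locally administered source address (both assumed values).
frame = build_raw_frame(bytes.fromhex("ffffffffffff"),
                        bytes.fromhex("020000000001"),
                        0x88B5, b"hello")
```

With with_fcs=False the same helper leaves the CRC out, which matches the case where the hardware performs the calculation.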

> 
> And received frames would be the entire frame, not just the data portion, correct?

Both may be useful. Maybe two modes?

> 
>> On the l3 level the IP address would be used as the address and supportive
>> functions are available (like NIC calculation / verification of checksums
>> and the remainder of the typically provided offload for regular NICs).
> 
> Does the raw IP format use the same rules as the raw Ethernet format?  I.e. the app provides a formatted packet header (assuming that the answers to my previous questions are yes).

Correct.
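
As a sketch of what "the app provides a formatted packet header" means at the IP level, the following Python builds a 20-byte IPv4 header and computes the header checksum in software (the standard ones' complement sum). A NIC with checksum offload would fill in that field instead. The helper names and the addresses are assumptions made for the example.

```python
import socket
import struct

IPV4_FMT = "!BBHHHBBH4s4s"  # 20-byte IPv4 header without options


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of all 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        # Fold the carries back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int,
                      proto: int = socket.IPPROTO_UDP,
                      ttl: int = 64, ident: int = 0) -> bytes:
    """Format an IPv4 header in application memory (hypothetical helper)."""
    fields = [
        0x45,               # version 4, header length 5 words
        0,                  # DSCP/ECN
        20 + payload_len,   # total length
        ident,              # identification
        0x4000,             # flags: don't fragment, offset 0
        ttl,
        proto,
        0,                  # checksum placeholder
        socket.inet_aton(src),
        socket.inet_aton(dst),
    ]
    header = struct.pack(IPV4_FMT, *fields)
    fields[7] = ipv4_checksum(header)
    return struct.pack(IPV4_FMT, *fields)


hdr = build_ipv4_header("192.168.0.1", "192.168.0.199", payload_len=95)
```

Running the checksum over the finished header, checksum field included, yields zero. That is the check a receiver (or a verifying NIC) performs.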
> 
> Would you expect IPv4 and IPv6 to be exposed separately?

Hmm, I have not seen much IPv6 so far. Longer headers increase latency, so we stick to IPv4.

>> The flow steering API kind of covers both of these levels to some extent
>> but it is something to the side.
> 
> The verbs flow steering seems reasonable to me.  Using a control interface to specify those sorts of filters on an endpoint should be possible.  Is there anything about the verbs flow steering data structures that you would want to improve upon?

That would require a bit more time than I have now to answer.
> 
>> The fundamental idea is that these protocols would allow the use of the
>> Fabric API to generate IP datagrams and thereby native IP format traffic
>> can be supported. This allows writing of userspace fabric communication
>> layers based on IP that would work with commodity hardware.
>> 
>> For us also the effect is that we would like to create a user space IP
>> stack and use the fabric API to communicate directly with IP based servers
>> and services that may just operate with a standard OS IP stack. The fabric
>> API then works as a way to bypass the operating system.
>> 
>> This is a solution that currently works very well for us. We do both IP based
>> and IB based communication through the IBverbs/RDMA APIs.
> 
> I assume that the kernel would validate that an application has permissions to open such an endpoint.  I haven't looked at the flow steering implementation details.

Correct. This works via capabilities. CAP_NET_RAW is required for such access.
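
For reference, a small Python sketch of how an application could check up front whether it holds CAP_NET_RAW, by reading the effective capability mask that Linux exposes in /proc/self/status. The helper names are made up for the example. The kernel does the actual enforcement when the endpoint is opened, so this is only a convenience check.

```python
CAP_NET_RAW = 13  # capability bit number from <linux/capability.h>


def parse_cap_eff(status_text: str) -> int:
    """Extract the effective capability mask from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap_net_raw(status_text: str) -> bool:
    """True if the CAP_NET_RAW bit is set in the effective mask."""
    return bool((parse_cap_eff(status_text) >> CAP_NET_RAW) & 1)


def current_process_has_cap_net_raw() -> bool:
    """Check the calling process (Linux only)."""
    with open("/proc/self/status") as f:
        return has_cap_net_raw(f.read())
```

A binary can be granted the capability without running as root, for example with setcap cap_net_raw+ep on the executable.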

