[ofa-general] general questions about librdmacm

Steven Dake sdake at redhat.com
Fri Sep 26 13:17:02 PDT 2008


On Fri, 2008-09-26 at 10:54 -0500, Scott M. Ferris wrote:
> On Wed, Sep 24, 2008 at 03:43:56PM -0700, Steven Dake wrote:
> > 
> > Totem is a reliable virtual synchrony multicast protocol which transmits
> > a message from any node to all nodes in a collection of computers
> > (called the configuration or membership).  It has a few requirements:
> > 
> > unreliable datagram multicast
> > unreliable datagram unicast
> > ability to bind to a specific port and interface
> > ability to poll() (POLLIN) via system call for new multicast datagram
> > messages
> 
> It's been a few years since I read the totem paper, but if I recall
> correctly, totem also has certain ordering requirements about unicast
> and multicast, which the paper asserts are true for ethernet.  
> 

Totem can deal with reordering of any packets and has no other
requirements than those above.  It is designed for these sorts of
networks.
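
To make the requirements listed above concrete, here is a minimal
sketch with the standard socket API, assuming Linux and Python's
standard library.  The helper names are hypothetical and this is not
Totem's actual transport code.  It assumes the usual Linux behaviour
that a multicast receiver binds to the group address and names the
interface in the membership request, while the unicast socket binds to
the interface address itself.

```python
import select
import socket
import struct


def open_unicast(local_addr, port):
    """UDP socket bound to one interface address and port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((local_addr, port))
    return sock


def open_multicast_receiver(group, port, local_addr):
    """UDP socket that receives `group` traffic arriving on the
    interface that owns `local_addr` (Linux-style bind to the group)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Let several processes on one host share the group port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((group, port))
    # struct ip_mreq: group address followed by local interface address.
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(local_addr))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock


def wait_readable(socks, timeout_ms):
    """poll() for POLLIN; return the sockets with a datagram waiting."""
    by_fd = {s.fileno(): s for s in socks}
    poller = select.poll()
    for fd in by_fd:
        poller.register(fd, select.POLLIN)
    return [by_fd[fd] for fd, ev in poller.poll(timeout_ms)
            if ev & select.POLLIN]
```

A caller would pass both the unicast and the multicast socket to
wait_readable() and read from whichever comes back.  Any group, port,
and interface address used with these helpers are placeholders to be
filled in from the cluster configuration.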

Thanks for your detailed response.  I have a small question below...

> It's not clear to me that Infiniband will provide the same guarantees
> in all cases.  If unicasts and multicasts are sent on different queue
> pairs, I'm not sure any ordering is guaranteed.  I can also imagine
> the IB virtual lane (VL) feature potentially reordering delivery if
> messages end up in different VLs.  I'd recommend talking to someone
> more knowledgeable in Infiniband than I am to check what you need to do
> to meet the totem ordering requirement.
> 
> > Few questions:
> > 1)  I would like to continue to use IP addressing but it looks like I
> > have to use a different addressing model in librdmacm.  I looked at the
> > examples in the library and it isn't clear to me whether they use IP
> > addressing or some other addressing model.  I see references to IPoverIB
> > but I don't see any information in the wiki on the topic.  Anyone have
> > links to documentation on the topic of node addressing?
> 
> IPoIB is fairly transparent to you.  The OFED software provides Linux
> netdevices (e.g. ib0, ib1) which pass IP traffic just like an ethernet
> netdevice would.  Normal IP routing controls what interface gets used.
> For the most part IP-based software just works.  Low-level things like
> DHCP daemons will notice differences, since the hardware addresses are
> larger.
> 
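
Since IPoIB devices behave like ordinary netdevices, the usual way of
asking the kernel which local address it would route from should apply
unchanged.  A minimal sketch, assuming Linux and Python's standard
library; the helper name and the default port are hypothetical:

```python
import socket


def local_address_for(dest_ip, port=9):
    """Return the local source address the routing table picks for dest_ip."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket only performs the route lookup;
        # no datagram is sent.
        probe.connect((dest_ip, port))
        return probe.getsockname()[0]
    finally:
        probe.close()
```

If the address returned for a peer is the one configured on ib0, IP
traffic to that peer would leave through the IPoIB interface.
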
> If IPoIB connected mode is being used, unicasts and multicasts will be
> sent on different queue pairs, since the unicasts will use a
> connection, and the multicasts can't.  You may need to disable IPoIB
> connected mode in order to get the ordering guarantees totem needs,
> since I'm not sure any ordering is guaranteed between messages queued
> to different queue pairs.
> 
> If you want to use native IB protocols instead of IPoIB, librdmacm
> will be easier than using libibcm.  The RDMA communication manager
> uses IPoIB to resolve IP addresses to native IB addresses, so you can
> avoid changing your addressing model.  The rdma_cm(7) man page
> describes how to bind and connect.
> 
> > 2)  The library doesn't have any non-blocking (kernel wait queue based)
> > polling mechanism that I can see.  Am I missing a call here?
> 
> Look at ibv_create_comp_channel(3) and ibv_get_cq_event(3).
> 
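
The completion channel returned by ibv_create_comp_channel() carries a
file descriptor (its fd member), so it can sit in the same poll() set
as ordinary sockets.  The sketch below only illustrates that loop
shape in Python: an os.pipe() stands in for the channel descriptor,
and the verbs calls a real C program would make appear only as
comments.  The helper names are hypothetical.

```python
import os
import select


def wait_for_events(fds, timeout_ms):
    """Sleep in poll() until a descriptor is readable; return those fds."""
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)
    return [fd for fd, ev in poller.poll(timeout_ms) if ev & select.POLLIN]


def demo():
    # Stand-in only: real code would use the fd of the completion
    # channel, after arming the CQ with ibv_req_notify_cq().
    channel_fd, notify_fd = os.pipe()
    os.set_blocking(channel_fd, False)
    try:
        idle = wait_for_events([channel_fd], 0)      # nothing pending yet
        os.write(notify_fd, b"\0")                   # stand-in for a completion event
        ready = wait_for_events([channel_fd], 1000)  # wakes up on the event
        # Real code at this point: ibv_get_cq_event(), ibv_ack_cq_events(),
        # ibv_req_notify_cq() to re-arm, then drain with ibv_poll_cq().
        os.read(channel_fd, 1)
        return idle, ready == [channel_fd]
    finally:
        os.close(channel_fd)
        os.close(notify_fd)
```

In a real program the channel descriptor would be polled together
with any other descriptors the application already waits on.
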
> > 3)  Of course using the standard socket API would be highly desirable,
> > as it requires fewer code changes.  Is there some other library I
> > should be using?
> 
> IPoIB gives you the standard socket API.  
> 

Where is IPoIB?  Is it a standard feature of the OFED software stack?

Thanks again!
regards
-steve

> RDMA CM uses a similar set of calls for binding and connecting,
> though there are some differences.  rdma_cm_ids are somewhat like
> socket fds.  Once you have a multicast or unicast rdma_cm_id, you use
> the IB verbs API (libibverbs) to send and receive.
> 