[ofa-general] general questions about librdmacm
Steven Dake
sdake at redhat.com
Fri Sep 26 13:17:02 PDT 2008
On Fri, 2008-09-26 at 10:54 -0500, Scott M. Ferris wrote:
> On Wed, Sep 24, 2008 at 03:43:56PM -0700, Steven Dake wrote:
> >
> > Totem is a reliable virtual synchrony multicast protocol which transmits
> > a message from any node to all nodes in a collection of computers
> > (called the configuration or membership). It has a few requirements:
> >
> > unreliable datagram multicast
> > unreliable datagram unicast
> > ability to bind to a specific port and interface
> > ability to poll() (POLLIN) via system call for new multicast datagram
> > messages
>
> It's been a few years since I read the totem paper, but if I recall
> correctly, totem also has certain ordering requirements about unicast
> and multicast, which the paper asserts are true for ethernet.
>
Totem can deal with reordering of any packets and has no other
requirements than those above. It is designed for these sorts of
networks.
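
To illustrate what tolerating reordering can look like on the receiving side, here is a minimal sketch of a sequence-numbered reorder buffer. It is not Totem's actual mechanism; the class name, the method names, and the assumption that every datagram carries an integer sequence number are all hypothetical, chosen only for illustration.

```python
class ReorderBuffer:
    """Hand datagrams to the application in sequence order, whatever the arrival order."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # next sequence number owed to the application
        self.pending = {}          # seq -> payload for datagrams that arrived early

    def receive(self, seq, payload):
        """Accept one datagram and return the payloads that are now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # already delivered, or a duplicate of a buffered datagram
        self.pending[seq] = payload
        deliverable = []
        # Drain the buffer for as long as the next expected datagram is present.
        while self.next_seq in self.pending:
            deliverable.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return deliverable

    def missing(self):
        """Sequence numbers below the highest buffered one that have not arrived yet."""
        if not self.pending:
            return []
        highest = max(self.pending)
        return [s for s in range(self.next_seq, highest) if s not in self.pending]
```

A receiver built this way does not care whether unicasts and multicasts overtake each other on the wire; gaps reported by missing() are what a retransmission request would be based on.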
Thanks for your detailed response. I have a small question below...
> It's not clear to me that Infiniband will provide the same guarantees
> in all cases. If unicasts and multicasts are sent on different queue
> pairs, I'm not sure any ordering is guaranteed. I can also imagine
> the IB virtual lane (VL) feature potentially reordering delivery if
> messages end up in different VLs. I'd recommend talking to someone
> more knowledgeable in Infiniband than I am to check what you need to do
> to meet the totem ordering requirement.
>
> > Few questions:
> > 1) I would like to continue to use IP addressing but it looks like I
> > have to use a different addressing model in librdmacm. I looked at the
> > examples in the library and it isn't clear to me whether they use IP
> > addressing or some other addressing model. I see references to IPoverIB
> > but I don't see any information in the wiki on the topic. Anyone have
> > links to documentation on the topic of node addressing?
>
> IPoIB is fairly transparent to you. The OFED software provides Linux
> netdevices (e.g. ib0, ib1) which pass IP traffic just like an ethernet
> netdevice would. Normal IP routing controls what interface gets used.
> For the most part IP-based software just works. Low-level things like
> DHCP daemons will notice differences, since the hardware addresses are
> larger.
>
> If IPoIB connected mode is being used, unicasts and multicasts will be
> sent on different queue pairs, since the unicasts will use a
> connection, and the multicasts can't. You may need to disable IPoIB
> connected mode in order to get the ordering guarantees totem needs,
> since I'm not sure any ordering is guaranteed between messages queued
> to different queue pairs.
>
> If you want to use native IB protocols instead of IPoIB, librdmacm
> will be easier than using libibcm. The RDMA communication manager
> uses IPoIB to resolve IP addresses to native IB addresses, so you can
> avoid changing your addressing model. The rdma_cm(7) man page
> describes how to bind and connect.
>
> > 2) The library doesn't have any non blocking (kernel wait queue based)
> > polling mechanism that I can see. Am I missing a call here?
>
> Look at ibv_create_comp_channel(3) and ibv_get_cq_event(3).
>
> > 3) Of course using the standard socket API would be highly desired as
> > it requires fewer code changes. Is there some other library I should be
> > using?
>
> IPoIB gives you the standard socket API.
>
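
Since IPoIB exposes ordinary netdevices, the four requirements listed at the top of the thread can be sketched with nothing but the standard socket API. The helper names below are hypothetical, and the sketch assumes IPv4 and a platform that provides poll() (for example Linux); it is an illustration, not code from Totem.

```python
import select
import socket
import struct


def open_datagram_socket(bind_addr, port):
    """Create a non-blocking UDP socket bound to a specific address and port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((bind_addr, port))
    sock.setblocking(False)
    return sock


def join_multicast(sock, group, iface_addr):
    """Join an IPv4 multicast group on the interface that owns iface_addr."""
    # ip_mreq layout: group address followed by the local interface address.
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_addr))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    # Send outgoing multicast datagrams through the same interface.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(iface_addr))


def wait_readable(sock, timeout_ms):
    """Block in poll() until a datagram is ready (POLLIN) or the timeout expires."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    return any(event & select.POLLIN for _fd, event in poller.poll(timeout_ms))
```

With an IPoIB interface configured, usage would look like open_datagram_socket("", 5405) followed by join_multicast(sock, "239.192.0.1", "192.168.1.10"), where the port, group, and interface address are made-up values. On Linux the receiving socket generally has to be bound to the group address or the wildcard address to see group traffic; a socket bound to the interface's unicast address covers the unicast requirement.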
Where is IPoIB? Is it a standard feature of the OFED software stack?
Thanks again!
regards
-steve
> RDMA CM uses a similar set of calls for binding and connecting,
> though there are some differences. rdma_cm_ids are somewhat like
> socket fds. Once you have a multicast or unicast rdma_cm_id, you use
> the IB verbs API (libibverbs) to send and receive.
>