[ofa-general] Multi-threaded diags (Was: Re: [PATCH 4/5] infiniband-diags/libibnetdisc: Introduce a context object.)
Jason Gunthorpe
jgunthorpe at obsidianresearch.com
Thu Aug 27 11:20:56 PDT 2009
On Thu, Aug 27, 2009 at 09:48:10AM -0700, Ira Weiny wrote:
> > FSM multiplexing the recv path usually gives much better performance,
> > something like net discovery is quite easy..
>
> Using the original algorithm and data structures lent itself to
> threading. Now that I am neck deep in all this I have thought that
> rewriting it all might be easier.
Yah. mayhaps..
> > main loop:
> > fill tx queue from next list
> > receive replies and correlate with next list
> This would still need additional code (or additional synchronization in the
> API to libibnetdisc) if you wanted a user app to be multi-threaded. Someone
> has to be in charge of receiving all replies on that ibmad_port object and
> handing them to the proper owner. Of course one could open multiple
> ibmad_port objects but how is the app writer to know to do that? Digging
> through the code to find out that libibnetdisc is consuming all the replies?
What is the use case here? I thought the app would be something like:
main()
{
    foo = libibnetdisc_setup();
    libibnetdisc_discover_all(foo, res);
    // Do interesting things with res.
}
Where the goal is to have libibnetdisc_discover_all complete
expediently.
As long as the context 'foo' is re-entrant in all ways with all other
libraries and contexts I think useful threaded apps can be created.
> This is what got me on this in the first place. smp_query_via
> (_do_madrpc) is not thread safe.
Sure, the entire library is not thread safe around the ibmad_port
context. But who cares? If the caller to libibnetdisc wants to thread
that way they need to open another context.
> Also, I feel that someone down the road might fall into the same
> trap that I did thinking that smp_query_via is thread safe and I
> would like to fix that.
Well.. How can it be threaded? umad_send/umad_recv are inherently
single-threaded APIs. You have to layer a TID-based threading dispatch
mechanism on top of it. Much better to let the kernel do that and open
multiple umad fds.
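For illustration, a TID-based dispatch layer of the kind described above
could look roughly like the sketch below. It is not code from libibmad or
libibumad: every name in it (dispatch_table, dispatch_register,
dispatch_reply, count_reply) is hypothetical, and a real version would also
need locking, timeouts and retries. The point is only that a single receiver
has to own the table and hand each reply to whoever registered its
transaction ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PENDING 64

/* Hypothetical callback type: called with the reply and the owner's cookie. */
typedef void (*reply_cb)(const void *reply, void *owner);

struct pending {
    uint64_t tid;    /* transaction ID carried by the request */
    reply_cb cb;     /* who to notify when the reply arrives */
    void *owner;     /* opaque per-request state */
    int in_use;
};

struct dispatch_table {
    struct pending slots[MAX_PENDING];
    uint64_t next_tid;
};

void dispatch_init(struct dispatch_table *t)
{
    memset(t, 0, sizeof(*t));
    t->next_tid = 1;             /* 0 is reserved to mean "table full" */
}

/* Record a request before it is sent. Returns its TID, or 0 if full. */
uint64_t dispatch_register(struct dispatch_table *t, reply_cb cb, void *owner)
{
    assert(cb != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        struct pending *p = &t->slots[i];
        if (!p->in_use) {
            p->tid = t->next_tid++;
            p->cb = cb;
            p->owner = owner;
            p->in_use = 1;
            return p->tid;
        }
    }
    return 0;
}

/* Called by the one receiver for each reply. Returns 1 if an owner was
 * found and notified, 0 if the TID is unknown (late or duplicate reply). */
int dispatch_reply(struct dispatch_table *t, uint64_t tid, const void *reply)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        struct pending *p = &t->slots[i];
        if (p->in_use && p->tid == tid) {
            p->in_use = 0;       /* free the slot before calling out */
            p->cb(reply, p->owner);
            return 1;
        }
    }
    return 0;
}

/* Example owner callback: counts replies in the int that owner points to. */
void count_reply(const void *reply, void *owner)
{
    (void)reply;
    (*(int *)owner)++;
}
```

Opening one umad fd per thread avoids all of this in user space, since the
matching of replies to requests then happens per fd.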
> > each entry:
> > add to next list additional ports
> >
> > Repeat until dead.
> >
> > Where a 'next list' would be a set of actions along the lines of
> > 'query node' or 'query port' the action on a 'query node' completion
> > is to generate 'query port' next list items for all the ports, and on
> > 'query port' completion is to generate 'query node' items for all
> > enabled ports..
> >
> > libumad is nonblocking, parallel, etc...
>
> Yes, and libibmad layers on top of it an easier interface to issue common
> queries. Why should we ask the user to re-implement that code?
Well, the very best way to do this is to have a FSM engine API at the
core of the MAD library:
    mad_ctx->callback = done_this;
    mad_post(mad, mad_ctx);

    done_this(reply):
        ...
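To make the 'next list' idea concrete, here is a minimal sketch of a
callback-driven discovery loop in C. It runs against a toy in-memory fabric
instead of real MADs, so every "reply" completes immediately. All names
(fabric, disc, post, node_done, port_done, discover_all) are hypothetical
and are not part of libibnetdisc, libibmad or libibumad. A 'query node'
completion queues 'query port' items for each port, and a 'query port'
completion queues a 'query node' item for any unseen peer.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 16
#define MAX_PORTS 4
/* Each node is queried once, plus one query per port of each node. */
#define MAX_WORK  (MAX_NODES * (MAX_PORTS + 1))

/* Toy fabric standing in for SMP replies: peer[n][p] is the node reached
 * through port p of node n, or -1 if that port is down. */
struct fabric {
    int num_nodes;
    int peer[MAX_NODES][MAX_PORTS];
};

struct disc;
typedef void (*done_cb)(struct disc *d, int node, int port);

struct work { done_cb done; int node; int port; };

struct disc {
    const struct fabric *fab;
    struct work queue[MAX_WORK];   /* the 'next list' */
    int head, tail;
    int seen[MAX_NODES];
    int found;
};

void fabric_init(struct fabric *f, int num_nodes)
{
    f->num_nodes = num_nodes;
    for (int n = 0; n < MAX_NODES; n++)
        for (int p = 0; p < MAX_PORTS; p++)
            f->peer[n][p] = -1;
}

void fabric_link(struct fabric *f, int a, int pa, int b, int pb)
{
    f->peer[a][pa] = b;
    f->peer[b][pb] = a;
}

static void post(struct disc *d, done_cb done, int node, int port)
{
    assert(d->tail < MAX_WORK);
    d->queue[d->tail++] = (struct work){ done, node, port };
}

static void node_done(struct disc *d, int node, int port);

/* 'query port' completion: follow the link if the port is up. */
static void port_done(struct disc *d, int node, int port)
{
    int peer = d->fab->peer[node][port];
    if (peer >= 0 && !d->seen[peer]) {
        d->seen[peer] = 1;         /* mark at post time to avoid duplicates */
        post(d, node_done, peer, -1);
    }
}

/* 'query node' completion: queue a query for each of its ports. */
static void node_done(struct disc *d, int node, int port)
{
    (void)port;
    d->found++;
    for (int p = 0; p < MAX_PORTS; p++)
        post(d, port_done, node, p);
}

/* Main loop: drain the next list until empty. Returns nodes found. */
int discover_all(struct disc *d, const struct fabric *fab, int start)
{
    memset(d, 0, sizeof(*d));
    d->fab = fab;
    d->seen[start] = 1;
    post(d, node_done, start, -1);
    while (d->head < d->tail) {
        struct work w = d->queue[d->head++];
        w.done(d, w.node, w.port);
    }
    return d->found;
}
```

With real MADs the loop body would instead send as many queued items as the
tx queue allows and run each callback when its reply (or timeout) arrives;
the bookkeeping stays the same.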
> For example, mad_rpc now handles redirection. My implementation
> does not yet. So now I have to handle that on my own as well...
> :-(
To be honest, I don't like the libibmad/libibumad APIs one bit - I'm
not surprised they don't work for you..
Frankly, we really need a usable MAD library with sane APIs, and very
high level APIs on top of that. You cannot make an IB application
without doing SA queries at a minimum and the current process is
HORRID.
I see nothing of value in libibmad and libibumad to support that :|
Jason