[openib-general] umad_recv won't block after first read...

Hal Rosenstock halr at voltaire.com
Thu Aug 10 20:52:43 PDT 2006


Hi Abhijit,

On Thu, 2006-08-10 at 10:55, Abhijit Gadgil wrote:
> Hi Hal,

> I tried using the umad code as per the latest repository. 
> (The latest fix is on libibumad/umad.c Line # 806 right?) 

Yes.

> I manually applied that patch.

OK but not sure why you did this "manually".

>  It doesn't seem to work yet. 

What do you mean? Do you mean that the change makes no difference here
and you still have the same problem?

> In fact, what I figured out was that the 'poll' on the umad->fd isn't 
> blocking either. 

What do you mean by "either"?

A poll with a negative timeout should wait indefinitely, which means it
blocks, so something is happening on the fd but is perhaps not being
reported correctly. To my knowledge this particular usage has not been
tried, although it is used in a similar manner for some other things (by
OpenSM).
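
To make the expected poll() behaviour concrete, here is a minimal sketch
on a plain pipe rather than a umad fd. The names wait_readable and
demo_blocking_poll are made up for illustration and are not part of
libibumad; with libibumad the descriptor would presumably be the one
returned by umad_get_fd(). Printing revents is the point: it shows which
event is waking the caller when the wait returns unexpectedly.

```c
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd is readable.
 * A negative timeout_ms asks poll() to wait indefinitely.
 * Returns 1 if readable, 0 on timeout, -1 if poll() failed or the fd
 * woke up without POLLIN (POLLERR, POLLHUP or POLLNVAL only).
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    /* Retry if a signal interrupts the wait (the timeout restarts). */
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n <= 0)
        return n;               /* 0: timed out, -1: poll() failed */

    /* Report exactly which events ended the wait. */
    fprintf(stderr, "fd %d revents=0x%x\n", fd, (unsigned int)pfd.revents);

    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Exercises the helper on a pipe; returns 0 if poll() behaved as expected. */
int demo_blocking_poll(void)
{
    int p[2];
    char c = 'x';
    int rc = -1;

    if (pipe(p) < 0)
        return -1;

    /* Nothing written yet: a zero timeout must report "not ready". */
    if (wait_readable(p[0], 0) != 0)
        goto out;

    /* With data pending, an infinite-timeout poll returns at once. */
    if (write(p[1], &c, 1) != 1)
        goto out;
    if (wait_readable(p[0], -1) != 1)
        goto out;

    /* Once the byte is consumed, the fd goes back to "not ready". */
    if (read(p[0], &c, 1) != 1)
        goto out;
    if (wait_readable(p[0], 0) != 0)
        goto out;

    rc = 0;
out:
    close(p[0]);
    close(p[1]);
    return rc;
}
```

Calling demo_blocking_poll() from a small main() and checking for a zero
return is enough to try it. If the same helper never reports "not ready"
on the umad fd after the data has been read, that would suggest the fd
keeps signalling readiness, which is the behaviour described above.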

What kernel version are you using? Are you using OpenIB from svn or
OFED or something else? What version is this up to?

> The read returns the correct 'mad_agent', i.e. 0 in this case, and some length, which is usually 24 for this specific code.

That shows the breakage. Not sure why.

> I am attaching my local copies of infiniband/include/mad.h and src/fields.c so that you can try this code. (There may be stray printf's in those files!) Also, I was not quite clear about whether the subscriptions should include the RID information (as per section 15.2.5), so I tried including it first, which the SA doesn't seem to like; the subscriptions work after I get rid of the RID header. This particular aspect is not quite clear to me yet. 
> 
> Please let me know what you find.

I'll try to look at this more tomorrow. I have some other nits on the
test code you sent. I'll comment on these later as well although I don't
think they are the crux of the issue.

-- Hal

> Regards.
> 
> -abhijit
> 
> 
> On Aug 10, 2006 08:02 PM, Hal Rosenstock <halr at voltaire.com> wrote:
> 
> > Hi again Abhijit,
> > 
> > On Thu, 2006-08-10 at 09:46, Abhijit Gadgil wrote:
> > > Hi Hal, 
> > > 
> > > Please see below.
> > > 
> > > On Aug 10, 2006 07:01 PM, Hal Rosenstock <halr at voltaire.com> wrote:
> > > 
> > > > Hi Abhijit,
> > > > 
> > > > On Thu, 2006-08-10 at 07:21, Abhijit Gadgil wrote:
> > > > > Hi All, 
> > > > > 
> > > > > I am trying to write a simple program using libibumad to 'subscribe' for traps and then receive traps from the SA. Most things seem to work fine; however, I am facing a small problem where, after the first read for the trap, all subsequent reads do not block (and return some incorrect length). 
> > > > 
> > > > What do those calls return? What version of management are you using? 
> > > > 
> > > 
> > > I am running the management code from the SVN (svn release 8781, it may be slightly outdated!) 
> > 
> > A fix just went in to libibumad:umad_recv which may impact your results.
> > Can you update this and retry ?
> > 
> > What do the reads return other than an incorrect length? 
> > 
> > -- Hal
> > 
> > > > > Attached is the simple code; can someone tell me what exactly is wrong here? 
> > > > 
> > > > I didn't build and run this so my comments are based on just looking at
> > > > the code. I don't think it would build as there are other changes needed
> > > > to support this (e.g. IB_SA_INFINFO_XXX in libibmad at a minimum).
> > > > 
> > > 
> > > Oh, I am sorry, I didn't mention this before: I modified the libibmad sources (specifically src/fields.c and include/infiniband/mad.h) to accomplish this. Once I get it right, I will submit a patch. (It's too hacky right now.)
> > > 
> > > > Is the main loop based on some operational program? If so, which one?
> > > > 
> > > > A couple of specific comments:
> > > > 
> > > > init_sa_headers: InformInfo does not actually use RMPP so the
> > > > initialization here needs to change. Not sure what doing this would
> > > > cause without actually building and running this.
> > > > 
> > > 
> > > This was my first attempt at using umad, so for simplicity I copied from some reference code that had RMPP enabled. I think I should get rid of this as well. 
> > > 
> > > 
> > > > Based on this, what is the result of the subscription? Does it really
> > > > succeed?
> > > 
> > > Well, the subscriptions did indeed succeed and I was able to receive IPoIB broadcast multicast group creation/deletion traps as well, but the problem mentioned below (i.e. non-blocking reads) started appearing. 
> > > 
> > > > main: Rather than hard coding SM LID to 0x12, there are ways to get this
> > > > dynamically. There are examples of how to do this.
> > > 
> > > Sorry about this again. I realized later that it is stupid to hard code it (e.g. I could have got it from the ca[].port->sm_lid); I will fix that eventually. 
> > > 
> > > Thanks.
> > > 
> > > -abhijit
> > > 
> > > > -- Hal
> > > > 
> > > > > Thanks
> > > > > 
> > > > > -abhijit
> 
> 
> 




