[openib-general] umad_recv won't block after first read...

Abhijit Gadgil Abhijit.Gadgil at pantasys.com
Thu Aug 10 21:44:25 PDT 2006


Hi Hal, 

Sorry for being ambiguous in my answers below. However, I figured out what the problem was (while away from the code, thinking it over offline). The main mistake was the umad_send call inside the while(1) loop, where I had specified a timeout value greater than 0, which means the MADs were sent as solicited (i.e. a response was expected for them). The SubnAdmResponse should not be sent as solicited, and that was the main problem. If I set both the timeout value and the retry count to 0, there is no data available for subsequent reads, and the 'read' blocks as expected.
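The fix comes down to choosing umad_send()'s timeout_ms and retries arguments by whether the outgoing MAD is a request or a response. The sketch below assumes the standard MAD header convention that bit 0x80 of the Method field marks a response (e.g. GetResp 0x81, ReportResp 0x86). The helper name send_params_for_method() and the 1000 ms / 3 retries values for requests are hypothetical choices, not part of libibumad.

```c
#include <assert.h>
#include <stdint.h>

/* IB MAD header convention: bit 0x80 of the Method field is the response
 * bit, e.g. Set = 0x02, GetResp = 0x81, Report = 0x06, ReportResp = 0x86. */
#define MAD_METHOD_RESP_BIT 0x80u

/* Hypothetical values for MADs that expect an answer; tune as needed. */
#define REQUEST_TIMEOUT_MS 1000
#define REQUEST_RETRIES    3

/* Hypothetical helper: pick umad_send()'s timeout_ms and retries arguments.
 * A response (R bit set) is sent with 0/0 so the MAD layer does not wait
 * for an answer to it; a request keeps a non-zero timeout and retry count. */
static void send_params_for_method(uint8_t method, int *timeout_ms, int *retries)
{
    assert(timeout_ms && retries);

    if (method & MAD_METHOD_RESP_BIT) {
        *timeout_ms = 0;
        *retries = 0;
    } else {
        *timeout_ms = REQUEST_TIMEOUT_MS;
        *retries = REQUEST_RETRIES;
    }
}
```

The two values would then be passed as the last two arguments of umad_send(portid, agentid, umad, length, timeout_ms, retries), so a response goes out unsolicited (0/0) while the subscription request itself can still be retried.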

Thanks for the help. My answers to your earlier questions are inline below.

On Aug 11, 2006 09:22 AM, Hal Rosenstock <halr at voltaire.com> wrote:

> Hi Abhijit,
> 
> On Thu, 2006-08-10 at 10:55, Abhijit Gadgil wrote:
> > Hi Hal,
> 
> > I tried using the umad code from the latest repository.
> > (The latest fix is in libibumad/umad.c at line 806, right?)
> 
> Yes.
> 
> > I manually applied that patch.
> 
> OK, but I'm not sure why you did this "manually".
> 

Sorry about that; the machine where I am testing this code cannot pull code from the svn repository directly, so I edited the file by hand.

> >  It doesn't seem to work yet. 
> 
> What do you mean? Do you mean that change makes no difference for this
> and you still have the same problem?
> 
> > In fact, what I figured out was that the 'poll' on the umad->fd isn't
> > blocking either.
> 
> What do you mean by "either"?
> 

Well, both 'read' and 'poll' were returning immediately because of the 'timeout' parameter specified in umad_send. So even when I specified a negative timeout in umad_poll, data was always available. :-(
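Why a negative timeout did not help can be seen with plain POSIX poll(): an infinite timeout only blocks while nothing is queued, and it returns at once for as long as unread data stays pending on the descriptor. The sketch below uses a pipe as a stand-in for the umad file descriptor (an assumption made so it runs without InfiniBand hardware); both function names are hypothetical.

```c
#include <poll.h>
#include <unistd.h>

/* Wraps poll() on a single fd: returns >0 if readable, 0 on timeout,
 * -1 on error. A negative timeout_ms means "wait indefinitely". */
static int poll_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    return poll(&pfd, 1, timeout_ms);
}

/* Demo on a pipe standing in for the umad fd (hypothetical setup).
 * Returns 1 if an infinite-timeout poll() returned immediately because data
 * was already pending, 0 if some step behaved differently, -1 on setup error. */
static int infinite_poll_returns_when_data_pending(void)
{
    int fds[2];
    char byte = 'x';
    int ok;

    if (pipe(fds) != 0)
        return -1;

    /* Nothing queued yet: a zero timeout reports "not readable". */
    ok = (poll_readable(fds[0], 0) == 0);

    /* Queue one byte, like an unread item left on the umad fd. */
    if (write(fds[1], &byte, 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Infinite timeout, yet poll() returns at once: data is pending. */
    ok = ok && (poll_readable(fds[0], -1) == 1);

    /* Once the pending data is consumed, the fd is no longer readable. */
    ok = ok && (read(fds[0], &byte, 1) == 1);
    ok = ok && (poll_readable(fds[0], 0) == 0);

    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

With umad, as described in the reply above, something was always pending on the fd because of the timeout passed to umad_send, so poll() and read() returned immediately regardless of the timeout given to umad_poll.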

> A poll with a negative timeout should be infinite, which means blocking,
> so something is happening on the fd but perhaps is not being reported
> correctly. This particular usage has not been tried to my knowledge
> although it is used in a similar manner for some other things (by
> OpenSM).
> 
> What kernel version are you using? Are you using OpenIB from svn or
> OFED or something else? What version is this up to?
> 

I am using the latest kernel, version 2.6.17, and OpenIB from svn as well (the same revision, i.e. 8781).

> > The read returns the correct 'mad_agent', i.e. 0 in this case, and some length, which is usually 24 for this particular code.
> 
> That shows the breakage. Not sure why.
> 
> > I am attaching my local copies of infiniband/include/mad.h and src/fields.c so that you can try this code (there may be stray printfs in those files!). Also, since I was not quite clear about whether the subscriptions should include the RID information (as per section 15.2.5), I tried including it first. The SA doesn't seem to like that, but the subscriptions work once I remove the RID header. This particular aspect is not quite clear to me yet.
> > 
> > Please let me know what you find.
> 
> I'll try to look at this more tomorrow. I have some other nits on the
> test code you sent. I'll comment on these later as well although I don't
> think they are the crux of the issue.

Please let me know any additional comments you have.

Further, it is not quite clear from the specification whether one should include the RIDs in the InformInfo records during subscription. What is the intended behavior?

Regards

-abhijit

> 
> -- Hal
> 
> > Regards.
> > 
> > -abhijit
> > 
> > 
> > On Aug 10, 2006 08:02 PM, Hal Rosenstock <halr at voltaire.com> wrote:
> > 
> > > Hi again Abhijit,
> > > 
> > > On Thu, 2006-08-10 at 09:46, Abhijit Gadgil wrote:
> > > > Hi Hal, 
> > > > 
> > > > Please see below.
> > > > 
> > > > On Aug 10, 2006 07:01 PM, Hal Rosenstock <halr at voltaire.com> wrote:
> > > > 
> > > > > Hi Abhijit,
> > > > > 
> > > > > On Thu, 2006-08-10 at 07:21, Abhijit Gadgil wrote:
> > > > > > Hi All, 
> > > > > > 
> > > > > > I am trying to write a simple program using libibumad to subscribe to traps and then receive traps from the SA. Most things seem to work fine; however, I am facing a small problem: after the first read of a trap, all subsequent reads do not block (and return an incorrect length).
> > > > > 
> > > > > What do those calls return? What version of management are you using?
> > > > > 
> > > > 
> > > > I am running the management code from SVN (svn revision 8781; it may be slightly outdated!)
> > > 
> > > A fix just went into libibumad:umad_recv which may impact your results.
> > > Can you update this and retry ?
> > > 
> > > What do the reads return other than an incorrect length?
> > > 
> > > -- Hal
> > > 
> > > > > > Attached is the simple code; can someone tell me what exactly is wrong here?
> > > > > 
> > > > > I didn't build and run this so my comments are based on just looking at
> > > > > the code. I don't think it would build as there are other changes needed
> > > > > to support this (e.g. IB_SA_INFINFO_XXX in libibmad at a minimum).
> > > > > 
> > > > 
> > > > Oh, I am sorry, I didn't mention this before: I modified the libibmad sources (specifically the src/fields.c and include/infiniband/mad.h files) to accomplish this. Once I get it right, I will submit a patch (it's too hacky right now).
> > > > 
> > > > > Is the main loop based on some operational program? If so, which one?
> > > > > 
> > > > > A couple of specific comments:
> > > > > 
> > > > > init_sa_headers: InformInfo does not actually use RMPP, so the
> > > > > initialization here needs to change. I'm not sure what effect this
> > > > > would have without actually building and running this.
> > > > > 
> > > > 
> > > > This was my first attempt at using umad, so for simplicity I copied from some reference code that had RMPP enabled. I think I should get rid of this as well.
> > > > 
> > > > 
> > > > > Based on this, what is the result of the subscription? Does it really
> > > > > succeed?
> > > > 
> > > > Well, the subscriptions did indeed succeed, and I was able to receive the IPoIB broadcast multicast group creation/deletion traps as well, but the problem I mentioned (i.e. non-blocking reads) started appearing.
> > > > 
> > > > > main: Rather than hard-coding the SM LID to 0x12, there are ways to get this
> > > > > dynamically. There are examples of how to do this.
> > > > 
> > > > Sorry about this again. I realized later that it is silly to hard-code it (e.g. I could have gotten it from ca[].port->sm_lid); I will fix that eventually.
> > > > 
> > > > Thanks.
> > > > 
> > > > -abhijit
> > > > 
> > > > > -- Hal
> > > > > 
> > > > > > Thanks
> > > > > > 
> > > > > > -abhijit