[ofa-general] message is received but sender report error.

Tang, Changqing changquing.tang at hp.com
Mon Oct 29 07:18:14 PDT 2007


Yes, I think it is a sync problem when closing the connection. The sender sent a zero-byte message with immediate data. The receiver received the message correctly and destroyed the corresponding QP immediately. The sender got the completion with status=12.
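
For reference, status 12 in libibverbs' enum ibv_wc_status is IBV_WC_RETRY_EXC_ERR (transport retry counter exceeded). The following is a minimal Python sketch, not verbs code, that decodes the status and estimates how long the sender may wait before reporting it, using the timeout=18 and retry=7 values quoted further down in this thread. The function names are made up for illustration, and the retry bound is only a rough estimate, since real HCAs may wait longer per attempt.

```python
# Subset of libibverbs' enum ibv_wc_status, numbered as in <infiniband/verbs.h>.
WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    5: "IBV_WC_WR_FLUSH_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}


def local_ack_timeout_seconds(timeout: int) -> float:
    """IB local ACK timeout: 4.096 us * 2**timeout (0 means wait forever)."""
    if timeout == 0:
        return float("inf")
    return 4.096e-6 * (2 ** timeout)


def worst_case_wait_seconds(timeout: int, retry_cnt: int) -> float:
    """Rough estimate (assumption): the first attempt plus retry_cnt retries,
    each waiting one full ACK timeout before giving up."""
    return (retry_cnt + 1) * local_ack_timeout_seconds(timeout)


if __name__ == "__main__":
    print(WC_STATUS[12])
    print(local_ack_timeout_seconds(18))   # one ACK timeout for timeout=18
    print(worst_case_wait_seconds(18, 7))  # rough bound for retry=7
```

With timeout=18 a single ACK timeout is about 1.07 seconds, which matches the "~1sec" figure quoted below.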

If I delay destroying the QP, the code works fine.
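
The ordering can be illustrated with a toy Python model. It is a sketch under stated assumptions, not verbs code: SimQP, sender_completion and run_teardown are hypothetical names, and the model simply assumes that the sender's ACK can only arrive while the receiver's QP still exists. The "wait" variant also assumes some out-of-band signal (for example a socket) that tells the receiver the sender has reaped its completion; nothing in this thread confirms that such a channel is available.

```python
IBV_WC_SUCCESS = 0
IBV_WC_RETRY_EXC_ERR = 12


class SimQP:
    """Toy stand-in for a queue pair; not a real verbs object."""

    def __init__(self):
        self.alive = True

    def destroy(self):
        self.alive = False


def sender_completion(receiver_qp):
    # Modeling assumption: the ACK reaches the sender only while the
    # receiver's QP still exists; otherwise the sender retries until
    # the retry count is exhausted and reports status 12.
    return IBV_WC_SUCCESS if receiver_qp.alive else IBV_WC_RETRY_EXC_ERR


def run_teardown(wait_for_sender):
    """Return (sender status, whether the receiver QP is still alive)."""
    receiver_qp = SimQP()

    # Step 1: the receiver has reaped the receive completion for the
    # zero-byte message. The hasty variant destroys its QP right away.
    if not wait_for_sender:
        receiver_qp.destroy()

    # Step 2: the sender polls for its send completion.
    status = sender_completion(receiver_qp)

    # Step 3: in the deferred variant the sender signals out of band that
    # it has its completion, and only then does the receiver destroy the QP.
    if wait_for_sender:
        receiver_qp.destroy()

    return status, receiver_qp.alive


if __name__ == "__main__":
    print("destroy immediately:", run_teardown(False))
    print("destroy after sender confirms:", run_teardown(True))
```

The point of the sketch is that the destroy is keyed to an event (the sender's confirmation) rather than to a fixed delay whose length nobody knows.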


--CQ

> -----Original Message-----
> From: Dotan Barak [mailto:dotanb at dev.mellanox.co.il]
> Sent: Monday, October 29, 2007 6:47 AM
> To: Tang, Changqing
> Cc: Sean Hefty; Roland Dreier; general at lists.openfabrics.org
> Subject: Re: [ofa-general] message is received but sender
> report error.
>
> If you are not connecting the QPs using the CM, maybe you have a
> sync problem? For example, one side (the sender) is in RTS while
> the other side isn't yet in RTR (or there is a sync problem when
> closing the connection).
>
> Dotan
>
> Tang, Changqing wrote:
> > The timeout is 18 (~1sec), and retry is 7 (max).
> >
> > The error only occurs in about 1% of runs. Sometimes I run the same
> > hello_world code in a loop and catch it after 1500 runs. So I don't
> > think it is a cable issue (but I have not checked the port error
> > counters).
> >
> > --CQ
> >
> >
> >> -----Original Message-----
> >> From: Dotan Barak [mailto:dotanb at dev.mellanox.co.il]
> >> Sent: Sunday, October 28, 2007 2:48 AM
> >> To: Tang, Changqing
> >> Cc: Sean Hefty; Roland Dreier; general at lists.openfabrics.org
> >> Subject: Re: [ofa-general] message is received but sender report
> >> error.
> >>
> >> Hi.
> >>
> >> Maybe you should increase the timeout/retry count for your
> >> application? Can you also check the port error counters (using
> >> perfquery)? Maybe you have bad cables in your subnet.
> >>
> >> Dotan
> >>
> >> Tang, Changqing wrote:
> >>
> >>> This is Verbs layer code, no IB CM is used.
> >>>
> >>> --CQ
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Sean Hefty [mailto:sean.hefty at intel.com]
> >>>> Sent: Thursday, October 25, 2007 12:38 PM
> >>>> To: Tang, Changqing; Roland Dreier
> >>>> Cc: general at lists.openfabrics.org
> >>>> Subject: RE: [ofa-general] message is received but sender report
> >>>> error.
> >>>>
> >>>>> If this is the case, how would we fix the problem? It's hard for us
> >>>>> to delay destroying the QP, because we don't know how long to delay.
> >>>>> The other way is to do something from the driver, or firmware.
> >>>>
> >>>> Do you disconnect the QPs using the IB CM?
> >>>>
> >>>> - Sean


