[ofa-general] message is received but sender report error.

Tang, Changqing changquing.tang at hp.com
Sun Oct 28 20:09:25 PDT 2007


The timeout is 18 (~1sec), and retry is 7 (max).
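
For reference, the timeout value is an exponent rather than a time in seconds. A minimal sketch of the conversion, assuming the standard InfiniBand local ACK timeout formula of 4.096 usec * 2^timeout (with 0 meaning "wait forever"). The function names are hypothetical and not part of libibverbs, and the total-wait figure is only a rough estimate, since the HCA is allowed to time out later than the nominal value:

```python
def local_ack_timeout_seconds(timeout: int) -> float:
    """Nominal local ACK timeout for a 5-bit QP timeout field value."""
    if not 0 <= timeout <= 31:
        raise ValueError("timeout must fit in 5 bits (0..31)")
    if timeout == 0:
        return float("inf")  # 0 is treated as "no timeout"
    return 4.096e-6 * (2 ** timeout)


def rough_total_wait_seconds(timeout: int, retry_cnt: int) -> float:
    """Rough estimate: one initial attempt plus retry_cnt retries.

    Assumption: each attempt waits exactly the nominal timeout; real
    hardware may wait longer before declaring a retry-exceeded error.
    """
    if not 0 <= retry_cnt <= 7:
        raise ValueError("retry_cnt must fit in 3 bits (0..7)")
    return (retry_cnt + 1) * local_ack_timeout_seconds(timeout)


if __name__ == "__main__":
    print(f"timeout=18 -> {local_ack_timeout_seconds(18):.3f} s per attempt")
    print(f"timeout=18, retry=7 -> about {rough_total_wait_seconds(18, 7):.1f} s total")
```

With the settings above (timeout 18, retry 7), the nominal per-attempt timeout is a little over one second, so the sender should wait several seconds in total before reporting the error.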

The error occurs in only about 1% of runs. Sometimes I run the same hello_world code in a loop and catch it only after 1500 runs. So I don't think it is a cable issue (but I have not checked the port error counters).

--CQ

> -----Original Message-----
> From: Dotan Barak [mailto:dotanb at dev.mellanox.co.il]
> Sent: Sunday, October 28, 2007 2:48 AM
> To: Tang, Changqing
> Cc: Sean Hefty; Roland Dreier; general at lists.openfabrics.org
> Subject: Re: [ofa-general] message is received but sender
> report error.
>
> Hi.
>
> Maybe you should increase the timeout/retry count for your
> application?
> Can you check the port error counters (using perfquery)?
> Maybe you have bad cables in your subnet.
>
> Dotan
>
> Tang, Changqing wrote:
> > This is Verbs layer code, no IB CM is used.
> >
> > --CQ
> >
> >
> >> -----Original Message-----
> >> From: Sean Hefty [mailto:sean.hefty at intel.com]
> >> Sent: Thursday, October 25, 2007 12:38 PM
> >> To: Tang, Changqing; Roland Dreier
> >> Cc: general at lists.openfabrics.org
> >> Subject: RE: [ofa-general] message is received but sender report
> >> error.
> >>
> >>
> >>> If this is the case, how would we fix the problem? It's hard for us to
> >>> delay destroying the QP, because we don't know how long to delay.
> >>> The other way is to do something from the driver, or firmware.
> >>>
> >> Do you disconnect the QPs using the IB CM?
> >>
> >> - Sean
> >>
> >>
>
>


