[ofa-general] Is response time of GMP more than SMP?
Hal Rosenstock
hrosenstock at xsigo.com
Wed Jun 25 08:55:31 PDT 2008
Hi Sumit,
On Wed, 2008-06-25 at 20:21 +0530, Sumit Gaur - Sun Microsystem wrote:
> Hi Hal,
>
> Hal Rosenstock wrote:
> > Hi Sumit,
> >
> > On Wed, 2008-06-25 at 12:11 +0530, Sumit Gaur - Sun Microsystem wrote:
> >
> >>Hi Hal/Sashak,
> >>
> >>Hal Rosenstock wrote:
> >>
> >>>Hi Sumit,
> >>>
> >>>On Tue, 2008-06-24 at 14:21 +0530, Sumit Gaur - Sun Microsystem wrote:
> >>>
> >>>
> >>>>Hi,
> >>>>I am using OFED 2.5.*
> >>>
> >>> ^^^^^
> >>> 1.2.5.* ?
> >>>
> >>
> >>Sorry for the typo, it is 1.2.5.*
> >>
> >>>>and observing that my SMI requests are served very fast with very low
> >>>>response time; on the contrary, my GSI requests take longer to be served
> >>>>and the response time sometimes exceeds 2 sec. Any light on this
> >>>>different behavior?
> >>>
> >>>
> >>>What are the specific GS requests which are slow in response ? Are they
> >>>compute intensive ?
> >>
> >>I am only sending requests with
> >>
> >> rpc.mgtclass = IB_PERFORMANCE_CLASS;
> >> rpc.method = IB_MAD_METHOD_GET;
> >>
> >>every second.
Does perfquery work reliably with the same node(s) you are having
trouble with ?
Does your app follow what perfquery does ?
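For comparison with what perfquery sends, here is a minimal C sketch of the 24-byte common MAD header for a Performance Management Get of the PortCounters attribute. The offsets and values (class 0x04, class version 1, method 0x01 for Get, attribute ID 0x0012) follow the InfiniBand common MAD header layout. build_perf_get_hdr and the put_be* helpers are hypothetical names, not libibmad or libibumad calls, and the PortCounters payload (PortSelect and so on) is deliberately not filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAD_HDR_LEN        24      /* common MAD header size in bytes */
#define MAD_BASE_VERSION   0x01
#define PERF_MGMT_CLASS    0x04    /* Performance Management (IB_PERFORMANCE_CLASS) */
#define PERF_CLASS_VERSION 0x01
#define METHOD_GET         0x01    /* IB_MAD_METHOD_GET */
#define ATTR_PORT_COUNTERS 0x0012  /* PortCounters attribute ID */

/* MAD header fields are big-endian (network byte order). */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (24 - 8 * i));
}

static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Hypothetical helper: fill the common MAD header of a PerfMgt Get(PortCounters). */
static void build_perf_get_hdr(uint8_t hdr[MAD_HDR_LEN], uint64_t tid)
{
    assert(hdr != NULL);
    memset(hdr, 0, MAD_HDR_LEN);
    hdr[0] = MAD_BASE_VERSION;
    hdr[1] = PERF_MGMT_CLASS;
    hdr[2] = PERF_CLASS_VERSION;
    hdr[3] = METHOD_GET;                    /* response bit clear: this is a request */
    /* bytes 4-7: Status and ClassSpecific, zero in a request */
    put_be64(hdr + 8, tid);                 /* TransactionID */
    put_be16(hdr + 16, ATTR_PORT_COUNTERS); /* AttributeID */
    /* bytes 18-19: reserved */
    put_be32(hdr + 20, 0);                  /* AttributeModifier */
}
```

Using a fresh TransactionID for every request is what later lets a late response to an old request be told apart from the answer to the current one.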
> >>>In general, there are a few possible causes for this. SM traffic
> >>>uses VL15 whereas GS traffic is on a data VL (usually VL0 in most
> >>>subnets).
> >>>
> >>>Some possibilities are:
> >>>1. Timeout/retry being hit for some GS traffic (GS request or response
> >>>lost/corrupted)
> >>
> >>Yes, this is also happening. Sometimes I am getting corrupt data back,
> >
> >
> > Is there an error indicated ?
> For such packets I am getting a umad_status of 110.
That's ETIMEDOUT. You need to handle the errors (and not treat the
receive as a valid packet). Are you doing that ?
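A minimal C sketch of that error handling, assuming the application has already copied the umad_status value and the decoded MAD header fields of each receive into the hypothetical struct mad_rx below (it is not a libibumad type). A status of 110 (ETIMEDOUT on Linux) means no response arrived, so the buffer must not be parsed as counter data. The TID check compares only the low 32 bits, assuming, as with the Linux MAD layer, that the kernel may overwrite the upper 32 bits with its own agent identifier.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PERF_MGMT_CLASS 0x04
#define METHOD_GET_RESP 0x81   /* GetResp: Get (0x01) with the response bit set */

/* Hypothetical summary of one receive; not a libibumad structure. */
struct mad_rx {
    int      umad_status;  /* 0 on success, ETIMEDOUT (110 on Linux) on timeout */
    uint8_t  mgmt_class;
    uint8_t  method;
    uint16_t mad_status;   /* Status field of the MAD header */
    uint64_t tid;          /* TransactionID */
};

enum rx_verdict { RX_OK, RX_TIMEOUT, RX_TRANSPORT_ERR, RX_STALE, RX_BAD_REPLY };

static enum rx_verdict classify_rx(const struct mad_rx *rx, uint64_t expected_tid)
{
    assert(rx != NULL);
    if (rx->umad_status == ETIMEDOUT)
        return RX_TIMEOUT;        /* no response: do not parse the payload */
    if (rx->umad_status != 0)
        return RX_TRANSPORT_ERR;
    /* Assumption: only the low 32 bits of the TID are under our control. */
    if ((rx->tid & 0xffffffffULL) != (expected_tid & 0xffffffffULL))
        return RX_STALE;          /* reply to an earlier request */
    if (rx->mgmt_class != PERF_MGMT_CLASS || rx->method != METHOD_GET_RESP ||
        rx->mad_status != 0)
        return RX_BAD_REPLY;
    return RX_OK;
}
```

A reply classified as RX_STALE belongs to an earlier request that had already timed out; treating it as the answer to the current request is one way to end up with what looks like corrupted data.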
The underlying question is why you are getting the timeout relatively
frequently, so I recommend checking all the error counters along the
path.
Are you sure the request gets to the responder? Or does the responder
respond and the response doesn't make it back?
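Either way, every lost request or response costs a full timeout before the retry goes out, which by itself can account for response times above 2 seconds. A minimal C sketch of that arithmetic; the timeout and round-trip values are illustrative parameters supplied by the caller, not OFED defaults.

```c
#include <assert.h>

/* Time until the caller finally gets an answer when the first `lost`
   attempts all time out and the next attempt is answered after `rtt_ms`.
   All values are caller-supplied; none of them are OFED defaults. */
static unsigned long observed_latency_ms(unsigned long timeout_ms,
                                         unsigned lost,
                                         unsigned long rtt_ms)
{
    assert(timeout_ms > 0);  /* this model assumes a positive retry timeout */
    return timeout_ms * lost + rtt_ms;
}
```

With a hypothetical 1000 ms timeout, two lost attempts and a 1 ms round trip, observed_latency_ms(1000, 2, 1) returns 2001, so a query issued every second would still be waiting for its answer when the next one is due.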
-- Hal
> >>and if I retry sending the same request, it fails or sends corrupted data back again.
> >
> >
> >>>2. Data VL busy (is there anything else utilizing VL0 ?)
> >>
> >>Not sure about it. Is there any way to verify it?
> >
> >
> > There's an optional counter for this, but it is not commonly implemented,
> > so maybe start by verifying all the PortCounters along the path from
> > requester to responder to see whether there are any low-level issues
> > with your subnet.
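One way to do that check is to read PortCounters (for example with perfquery) on each port along the path twice, a few seconds apart, and see which error counters moved. A minimal C sketch of the comparison, assuming the values have already been decoded into the hypothetical struct port_counters below, which holds only a subset of the PortCounters error fields. These error counters saturate at their maximum instead of wrapping, so only an increase is counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of some PortCounters error fields (spec names
   in the comments); this is not the on-the-wire layout. */
struct port_counters {
    uint16_t symbol_errors;          /* SymbolErrorCounter */
    uint8_t  link_error_recovery;    /* LinkErrorRecoveryCounter */
    uint8_t  link_downed;            /* LinkDownedCounter */
    uint16_t rcv_errors;             /* PortRcvErrors */
    uint16_t rcv_remote_phys_errors; /* PortRcvRemotePhysicalErrors */
    uint16_t xmit_discards;          /* PortXmitDiscards */
    uint16_t vl15_dropped;           /* VL15Dropped */
};

/* Count how many of the error counters above increased between two
   snapshots of the same port. A value that went down means the counters
   were cleared in between, so it is not counted. */
static int error_counters_increased(const struct port_counters *before,
                                    const struct port_counters *after)
{
    assert(before != NULL && after != NULL);
    int n = 0;
    n += after->symbol_errors > before->symbol_errors;
    n += after->link_error_recovery > before->link_error_recovery;
    n += after->link_downed > before->link_downed;
    n += after->rcv_errors > before->rcv_errors;
    n += after->rcv_remote_phys_errors > before->rcv_remote_phys_errors;
    n += after->xmit_discards > before->xmit_discards;
    n += after->vl15_dropped > before->vl15_dropped;
    return n;
}
```

A nonzero result on any hop between requester and responder points at a link worth inspecting before looking further at the application.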
> >