[ofa-general] Is response time of GMP is more than SMP

Hal Rosenstock hrosenstock at xsigo.com
Wed Jun 25 08:55:31 PDT 2008


Hi Sumit,

On Wed, 2008-06-25 at 20:21 +0530, Sumit Gaur - Sun Microsystem wrote:
> Hi Hal,
> 
> Hal Rosenstock wrote:
> > Hi Sumit,
> > 
> > On Wed, 2008-06-25 at 12:11 +0530, Sumit Gaur - Sun Microsystem wrote:
> > 
> >>Hi Hal/Sashak,
> >>
> >>Hal Rosenstock wrote:
> >>
> >>>Hi Sumit,
> >>>
> >>>On Tue, 2008-06-24 at 14:21 +0530, Sumit Gaur - Sun Microsystem wrote:
> >>>
> >>>
> >>>>Hi,
> >>>>I am using OFED 2.5.*
> >>>
> >>>                  ^^^^^
> >>>                  1.2.5.* ?
> >>>
> >>
> >>Sorry for typo .. it is 1.2.5.*
> >>
> >>>>and observing that my SMI requests are served very quickly, with very short
> >>>>response times. My GSI requests, on the contrary, take longer to be served,
> >>>>and the response time sometimes exceeds 2 sec. Can anyone shed light on this
> >>>>difference in behavior?
> >>>
> >>>
> >>>What are the specific GS requests which are slow in response ? Are they
> >>>compute intensive ?
> >>
> >>I am only sending requests with
> >>
> >>	rpc.mgtclass = IB_PERFORMANCE_CLASS;
> >>	rpc.method = IB_MAD_METHOD_GET;
> >>
> >>once every second.

Does perfquery work reliably with the same node(s) you are having
trouble with ?

Does your app follow what perfquery does ?

> >>>In general, there are a few possibilities (which can cause this). SM
> >>>traffic is VL15 whereas GS traffic is on a data VL (usually VL0 in most
> >>>subnets).
> >>>
> >>>Some possibilities are:
> >>>1. Timeout/retry being hit for some GS traffic (GS request or response
> >>>lost/corrupted)
> >>
> >>Yes, this is also happening. Sometimes I am getting corrupt data back,
> > 
> > 
> > Is there an error indicated ?
> For such packets I am getting umad_status as 110.

That's ETIMEDOUT. You need to handle the errors (and not treat the
receive as a valid packet). Are you doing that ?
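
As a rough illustration of the error handling meant here, below is a minimal
C sketch. It assumes the status attached to a received MAD is 0 on success or
an errno-style value otherwise (for example, the value umad_status() reports
for the receive buffer). The names mad_result, classify_mad_status and
payload_is_valid are hypothetical and exist only for this example.

```c
#include <errno.h>

/* Hypothetical outcome of inspecting one received MAD. */
enum mad_result { MAD_OK, MAD_TIMED_OUT, MAD_FAILED };

/*
 * Classify the status attached to a receive.
 * Assumption: status is 0 on success, otherwise an errno-style code
 * (ETIMEDOUT is 110 on Linux).
 */
static enum mad_result classify_mad_status(int status)
{
    if (status == 0)
        return MAD_OK;
    if (status == ETIMEDOUT)
        return MAD_TIMED_OUT;   /* no response arrived; buffer is not a reply */
    return MAD_FAILED;          /* any other error: do not parse the buffer */
}

/* Returns 1 only when the payload may be parsed as a real response. */
static int payload_is_valid(int status)
{
    return classify_mad_status(status) == MAD_OK;
}
```

A caller would parse the PortCounters data only when payload_is_valid()
returns 1, and would log or retry on MAD_TIMED_OUT instead of reading the
buffer as if it were a response.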

The underlying question is why you are getting timeouts relatively
frequently, so I recommend checking all the error counters along the
path.

Are you sure the request gets to the responder ? Or does the responder
respond but the response doesn't make it back ?

-- Hal

> >>and if I retry sending the same request, it fails again or returns corrupted data again.
> > 
> > 
> >>>2. Data VL busy (is there anything else utilizing VL0 ?)
> >>
> >>Not sure about it. Is there any way to verify it?
> > 
> > 
> > There's an optional counter, but it is not commonly implemented, so maybe
> > start by verifying all the PortCounters along the path from requester to
> > responder to see whether there are any low level issues with your
> > subnet.
> > 



