[openib-general] Unusable QP's on CM established connections from gen2 client to gen1 server.

Todd Rimmer todd.rimmer at qlogic.com
Tue Nov 14 04:58:40 PST 2006


> From: Bub Thomas
> Sent: Monday, November 13, 2006 3:37 AM
> To: Sean Hefty; Bub Thomas
> Cc: Erez Cohen; openib-general at openib.org
> Subject: Re: [openib-general] Unusable QP's on CM established connections
> from gen2 client to gen1 server.
> 
> Sean,
> you got it!
> Setting the hop_limit from 64 down to 0 or 1 solved the problem. :-)
> Don't ask me where I got that hop_limit from, it must have been an
> example I found somewhere.
> Can you explain why that hop_limit/is_global makes a difference in
> communication between gen1 and gen2? Does the counterpart need to have
> the same hop_limit?

Hop limit is often used to distinguish local from global routes.  A hop
limit greater than 1 can result in a global route being assumed, and
hence in the unexpected use of GRH network layer headers.
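
As a rough sketch of the heuristic described above: a stack that infers
"global" from the hop limit might apply a test like the one below.  The
threshold (greater than 1) is an assumption, chosen because it matches the
observation that a hop limit of 0 or 1 fixed the problem.  The function name
is hypothetical and is not taken from either the gen1 or the gen2 stack.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: guess from the hop limit whether a path is global,
 * i.e. whether packets on it would carry a GRH.
 * Assumption: a hop limit of 0 or 1 means the path stays on the local subnet. */
static bool path_looks_global(uint8_t hop_limit)
{
    return hop_limit > 1;
}
```

With this rule, path_looks_global(64) is true while path_looks_global(0) and
path_looks_global(1) are false, so a peer that does not expect a GRH would
only interoperate in the latter two cases.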

> The path record values I use are queried from the OSM using a
> SERVICE_RECORD query followed by a path record query.
> I'm not using any alternate path record values, is this critical?
> In addition I enclose the values I pass into the ib_cm_send_req call.
> Can you pls have a look if you find something else looking abnormal.
> Thanks
> Thomas Bub
> 
>     req_param.qp_type                    = IBV_QPT_RC;
>     req_param.qp_num                     = _dataQpNum;
>     req_param.starting_psn               = _dataQpNum;
>     req_param.service_id                 = htonll(SERVICE_ID);
> 
>     req_param.primary_path               = &path_record;
>     req_param.alternate_path             = NULL;
>     req_param.private_data               = NULL;
>     req_param.private_data_len           = 0;
> 
>     req_param.responder_resources        = 4;
>     req_param.initiator_depth            = 4;

These should not be hardcoded, but should come from a query of the CA
capabilities.
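
A minimal sketch of that idea, assuming the limits have already been queried.
In libibverbs they are reported by ibv_query_device() in struct
ibv_device_attr as max_qp_rd_atom and max_qp_init_rd_atom.  The struct and
function names below are hypothetical stand-ins so the example compiles
without the verbs library.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two device attributes of interest
 * (max_qp_rd_atom and max_qp_init_rd_atom in struct ibv_device_attr). */
struct ca_rdma_caps {
    int max_qp_rd_atom;      /* RDMA reads/atomics the CA can serve per QP */
    int max_qp_init_rd_atom; /* RDMA reads/atomics the CA can initiate per QP */
};

/* Clamp a wanted depth to a device limit; the result fits the 8-bit CM field. */
static uint8_t clamp_to_cap(int wanted, int cap)
{
    if (cap < 0)
        cap = 0;
    if (cap > 255)
        cap = 255;
    if (wanted < 0)
        wanted = 0;
    return (uint8_t)(wanted < cap ? wanted : cap);
}

/* Choose responder_resources and initiator_depth from the CA limits
 * instead of hardcoding them. */
static void pick_rdma_depths(const struct ca_rdma_caps *caps, int wanted,
                             uint8_t *responder_resources,
                             uint8_t *initiator_depth)
{
    *responder_resources = clamp_to_cap(wanted, caps->max_qp_rd_atom);
    *initiator_depth = clamp_to_cap(wanted, caps->max_qp_init_rd_atom);
}
```

The results would then be assigned to req_param.responder_resources and
req_param.initiator_depth in place of the constant 4.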

>     req_param.remote_cm_response_timeout = 20;
>     req_param.local_cm_response_timeout  = 20;

These should be computed based on the path record packet lifetime and
the local CA Ack turnaround time.  Check the archives; about a month ago
I posted some computations for these values.
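
One possible computation, offered only as a sketch and not as the one from
the earlier posting: IB timeouts of this kind are encoded as an exponent n
meaning 4.096 us * 2^n.  The assumption here is that the timeout should
cover one round trip (twice the packet lifetime) plus the CA ack delay.  The
function name and the cap at 31 (a 5-bit field) are part of that assumption.

```c
#include <stdint.h>

/* Hypothetical helper: smallest exponent n such that 4.096us * 2^n covers
 * 2 * packet lifetime + CA ack delay.  Both inputs are exponents in the
 * same 4.096us * 2^x encoding.  The result is capped at 31. */
static uint8_t cm_timeout_exponent(uint8_t pkt_life_exp, uint8_t ack_delay_exp)
{
    uint64_t total;
    uint8_t n = 0;

    /* Keep the shifts in range; larger inputs saturate the result anyway. */
    if (pkt_life_exp > 31)
        pkt_life_exp = 31;
    if (ack_delay_exp > 31)
        ack_delay_exp = 31;

    /* Work in units of 4.096 us so only powers of two are involved. */
    total = 2 * (UINT64_C(1) << pkt_life_exp) + (UINT64_C(1) << ack_delay_exp);

    while (n < 31 && (UINT64_C(1) << n) < total)
        n++;
    return n;
}
```

For example, a packet lifetime exponent of 14 and an ack delay exponent of 15
give 2 * 2^14 + 2^15 = 2^16 units, so the function returns 16.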

>     req_param.retry_count                = 7;
>     req_param.rnr_retry_count            = 7;

FYI for RNR retry, 7=infinite.

>     req_param.max_cm_retries             = 5;
> 
>     path_record.sgid             = _localGid;
>     path_record.dgid             = _remoteGid;
>     path_record.slid             = htons(_localLID);
>     path_record.dlid             = htons(_remoteLID);
>     path_record.flow_label       = 0;
>     path_record.hop_limit        = 0;
>     path_record.traffic_class    = 0;
>     path_record.pkey             = 0xffff;
>     path_record.sl               = 0;
>     path_record.rate             = IBV_RATE_10_GBPS;
>     path_record.packet_life_time = 0;
>     path_record.mtu              = IBV_MTU_2048;

All the path record values should come from the SA.  While hardcoding
might work in some cases, it will not work on all fabrics.  For example,
in a DDR fabric setting the rate to 10 GBPS will run at 1/2 the
potential bandwidth.
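
The arithmetic behind that example, as a small sketch.  It assumes a 4x link
and signalling rates of 2.5 Gbps per lane for SDR and 5 Gbps per lane for
DDR.  The enum and function names are hypothetical.

```c
/* Per-lane signalling rates in Mbps (SDR = 2.5 Gbps, DDR = 5 Gbps). */
enum lane_speed_mbps { LANE_SDR = 2500, LANE_DDR = 5000 };

/* Signalling rate of a link with the given width (1, 4 or 12 lanes). */
static unsigned link_rate_mbps(unsigned width, enum lane_speed_mbps lane)
{
    return width * (unsigned)lane;
}

/* Share of the link rate, in percent, that a path rate allows.
 * A path rate at or above the link rate does not limit the link. */
static unsigned percent_of_link(unsigned path_rate_mbps, unsigned link_mbps)
{
    if (link_mbps == 0)
        return 0;
    if (path_rate_mbps >= link_mbps)
        return 100;
    return path_rate_mbps * 100 / link_mbps;
}
```

A 4x DDR link signals at 20000 Mbps, so a hardcoded path rate of 10000 Mbps
allows 50 percent of it, while the same value on a 4x SDR link allows 100
percent.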

Todd Rimmer




