[openib-general] [CM] question about the code that handle the attribute: packet_life_time

Todd Rimmer todd.rimmer at qlogic.com
Wed Nov 22 11:14:37 PST 2006


> From: Sean Hefty
> Sent: Wednesday, November 22, 2006 1:30 PM
> To: Dotan Barak
> Cc: openib
> Subject: Re: [openib-general] [CM] question about the code that handle the
> attribute: packet_life_time
> 
> Dotan Barak wrote:
> > I noticed the following code lines in the cm.c (this code handles
> > packet_life_time):
> >
> > in function: cm_format_req
> >        cm_req_set_primary_local_ack_timeout(req_msg,
> >                 min(31, param->primary_path->packet_life_time + 1));
> >
> > in function: cm_format_paths_from_req
> >         primary_path->packet_life_time_selector = IB_SA_EQ;
> >         primary_path->packet_life_time =
> >                 cm_req_get_primary_local_ack_timeout(req_msg);
> >         primary_path->packet_life_time -=
> >                 (primary_path->packet_life_time > 0);
> >
> > Why do you check the minimum between 31 and packet_life_time + 1?
> > I understand where the value 31 is coming from, but why do you add 1 to
> > the original packet_life_time value?
> 
> Packet life time is the time that it takes a packet to traverse the path,
> and is a 6-bit value.
> 
> Primary local ack timeout is a 5-bit value calculated as: 2 x packet life
> time + local CA ack delay.  Adding 1 to the packet life time doubles it
> because of how it's represented.
> 
> I subtract one on the remote side to get back to the original packet life
> time.  This is off if the true packet life time is really 31 (about 2.75
> hours), but more accurate for practical values.  The check for 31 is a
> result of trying to cram the 6-bit value into a 5-bit field.
> 

While this happens to work for many fabrics, the real calculation for
the ack timeouts should include the local and remote CA Ack Delays.
This is a little tricky because, when sending the REQ, the active side puts
into the REQ:
local_ack_timeout = 2*(PktLifeTime) + local CA Ack Delay
	- note that 2*PktLifeTime above refers to the lifetime in usecs, not
the 5-bit log2 value stored in the packets.

The receiver uses this value + its own local CA Ack Delay to compute the Ack
Delay for the QP.  (Note that there is insufficient info in the REQ to
accurately reverse compute the PktLifeTime; AckDelay/2 (-1 for log2
math) is an approximation which can be on the high side.)
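
As a rough sketch of the arithmetic described above (not the actual cm.c
code), the following assumes that each timeout field holds an exponent n
meaning 4.096 usec * 2^n. The helper names ib_decode_usec, ib_encode_usec,
req_local_ack_timeout and estimate_pkt_life_time are hypothetical and chosen
only for illustration; cm_style_timeout mirrors the min(31, x + 1) shortcut
from the quoted snippet.

```c
#include <assert.h>

/* Assumed encoding: a field value n means 4.096 usec * 2^n. */
static double ib_decode_usec(unsigned n)
{
    assert(n < 64);                      /* PktLifeTime is at most 6 bits */
    return 4.096 * (double)(1ULL << n);
}

/* Smallest exponent whose time covers `usec`, clamped to the 5-bit max 31. */
static unsigned ib_encode_usec(double usec)
{
    unsigned n = 0;
    while (n < 31 && ib_decode_usec(n) < usec)
        n++;
    return n;
}

/* Shortcut from the quoted snippet: exponent + 1 doubles the time. */
static unsigned cm_style_timeout(unsigned pkt_life_time)
{
    return pkt_life_time + 1 < 31 ? pkt_life_time + 1 : 31;
}

/* Hypothetical helper: 2*PktLifeTime + local CA Ack Delay, summed in usec. */
static unsigned req_local_ack_timeout(unsigned pkt_life_time,
                                      unsigned ca_ack_delay)
{
    double usec = 2.0 * ib_decode_usec(pkt_life_time)
                + ib_decode_usec(ca_ack_delay);
    return ib_encode_usec(usec);
}

/* Receiver-side PktLifeTime estimate, as in the quoted subtract-one code. */
static unsigned estimate_pkt_life_time(unsigned local_ack_timeout)
{
    return local_ack_timeout - (local_ack_timeout > 0);
}
```

With a PktLifeTime exponent of 10 (about 4.2 ms) and a CA Ack Delay exponent
of 15 (about 134 ms), this sketch encodes 16 where the +1 shortcut encodes
11, and the subtract-one estimate on the receiver then yields 15 rather than
10, which illustrates the "high side" approximation.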

In the REP, the TargetAckDelay is supplied and this should be used by
the active side to adjust its QP AckTimeout by adding this to the
local_ack_timeout in the REQ which it sent.
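
The active-side adjustment could be sketched as follows, again assuming the
4.096 usec * 2^n encoding for both the local_ack_timeout sent in the REQ and
the TargetAckDelay received in the REP. The function active_qp_ack_timeout
and its helpers are hypothetical names, not existing CM functions; the point
is only that the two values are added as times, not as exponents.

```c
#include <assert.h>

/* Assumed encoding: a field value n means 4.096 usec * 2^n. */
static double ib_decode_usec(unsigned n)
{
    assert(n < 64);
    return 4.096 * (double)(1ULL << n);
}

/* Smallest exponent whose time covers `usec`, clamped to the 5-bit max 31. */
static unsigned ib_encode_usec(double usec)
{
    unsigned n = 0;
    while (n < 31 && ib_decode_usec(n) < usec)
        n++;
    return n;
}

/*
 * Hypothetical helper: QP ack timeout for the active side once the REP
 * arrives. Both inputs are encoded exponents; the sum is taken in usec
 * and rounded up to the next representable exponent.
 */
static unsigned active_qp_ack_timeout(unsigned req_local_ack_timeout,
                                      unsigned target_ack_delay)
{
    double usec = ib_decode_usec(req_local_ack_timeout)
                + ib_decode_usec(target_ack_delay);
    return ib_encode_usec(usec);
}
```

For example, a REQ value of 16 combined with a TargetAckDelay of 15 rounds up
to 17 in this sketch, since the sum falls between the times encoded by 16 and
17.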

When stress on HCAs gets high and/or fabrics start to get congested,
accurate settings for these values can matter.  The Mellanox HCA
firmware reports a value of 134ms for the Ack Delay.  In contrast, short
paths in a fabric can have PktLifeTimes < 66ms, in which case the Ack
Delay becomes the dominant factor.  In fact, for a 0 hop path (host
talking to itself), the PktLifeTime is technically 0 and Ack Delay is
the only factor.

To date this has not been an issue, since most of the MPIs hard-code the
AckTimeouts and don't use the Path Records.  However, we have been using
PathRecords for MPI for a few years now and found that accuracy in these
computations is important, especially under certain stressful benchmarks
and applications.

Todd Rimmer



