[openib-general] APM support in openib stack

Dotan Barak dotanb at dev.mellanox.co.il
Sun Oct 29 07:35:14 PST 2006


Hi.


Venkatesh Babu wrote:
>  I don't think there is any event which says "path1 is back again". It 
> is the application which needs to load the alternate path. The HW just 
> sends an IB_EVENT_PORT_ACTIVE event when the port comes up. Upon receipt of 
> this event the application has to see if there exists a path from 
> this port to the remote node and then load this alternate path by 
> sending the APR message.
> PS: In Gen1 implementation there was an event called IB_PATH_MIG_ARMED 
> which was generated by HW/FW after loading the alternate path by the 
> application.
>
>  SA event notification just calls back the registered handlers when an 
> IB_EVENT_PORT_ACTIVE event occurs on any node in the subnet or on a 
> specific node, according to the registration parameters.
>
>  VBabu
>
> somenath wrote:
>
>   
>> Sean,
>>
>> will there be a new API for SA event notification?
>> today we already get this IB_EVENT_PATH_MIG (as defined below),  will  
>> "path1 is back again" event
>> be delivered the same way?
>>
>> thanks, som.
>>
>> enum ib_event_type {
>>    IB_EVENT_CQ_ERR,
>>    IB_EVENT_QP_FATAL,
>>    IB_EVENT_QP_REQ_ERR,
>>    IB_EVENT_QP_ACCESS_ERR,
>>    IB_EVENT_COMM_EST,
>>    IB_EVENT_SQ_DRAINED,
>>    IB_EVENT_PATH_MIG,
>>    IB_EVENT_PATH_MIG_ERR,
>>    IB_EVENT_DEVICE_FATAL,
>>    IB_EVENT_PORT_ACTIVE,
>>    IB_EVENT_PORT_ERR,
>>    IB_EVENT_LID_CHANGE,
>>    IB_EVENT_PKEY_CHANGE,
>>    IB_EVENT_SM_CHANGE,
>>    IB_EVENT_SRQ_ERR,
>>    IB_EVENT_SRQ_LIMIT_REACHED,
>>    IB_EVENT_QP_LAST_WQE_REACHED
>> };
>>     

I checked the code of the file cm.c (of OFED 1.1) and the attribute 
alt_timeout is not mentioned anywhere in this code.
I believe that the value of this attribute is left at zero, which means 
that the QP will wait an infinite time for a response (that will never come).

Venkatesh, can you check this issue by querying the QP attributes after 
the path was migrated?
I think that you will find that the value of the timeout attribute is zero.
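
To illustrate why a zero value matters: the QP timeout attributes (timeout and alt_timeout) hold a 5-bit exponent, not a duration. A non-zero value t stands for roughly 4.096 usec * 2^t, and zero is a special value meaning "wait forever". Below is a minimal sketch of that decoding in C. The helper name ack_timeout_ns and the ACK_TIMEOUT_INFINITE constant are made up for this illustration and are not part of cm.c or the verbs API.

```c
#include <stdint.h>

/* Hypothetical sentinel used here to represent "no timeout at all". */
#define ACK_TIMEOUT_INFINITE UINT64_MAX

/*
 * Decode a 5-bit local ACK timeout value into nanoseconds.
 * Non-zero t maps to 4096 ns * 2^t; zero means the QP waits
 * forever for the ACK/NAK.
 */
static uint64_t ack_timeout_ns(uint8_t timeout)
{
        timeout &= 0x1f;                /* only 5 bits are meaningful */
        if (timeout == 0)
                return ACK_TIMEOUT_INFINITE;
        return UINT64_C(4096) << timeout;
}
```

For example, a value of 14 decodes to about 67 msec, while a value of 0 decodes to the infinite sentinel, which matches the behaviour I suspect on the migrated path.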

Sean, I am not familiar with the cm.c code, but I believe that the 
following patch will solve this issue:

Index: last_stable/drivers/infiniband/core/cm.c
===================================================================
--- last_stable.orig/drivers/infiniband/core/cm.c       2006-10-29 16:58:08.000000000 +0200
+++ last_stable/drivers/infiniband/core/cm.c    2006-10-29 17:31:57.000000000 +0200
@@ -3221,6 +3221,7 @@ static int cm_init_qp_rtr_attr(struct cm
                if (cm_id_priv->alt_av.ah_attr.dlid) {
                        *qp_attr_mask |= IB_QP_ALT_PATH;
                        qp_attr->alt_port_num = cm_id_priv->alt_av.port->port_num;
+                       qp_attr->alt_timeout = cm_id_priv->alt_av.packet_life_time;
                        qp_attr->alt_ah_attr = cm_id_priv->alt_av.ah_attr;
                }
                ret = 0;


thanks
Dotan



