[ofw] Bugfix for IPOIB failure

Alex Naslednikov alexn at mellanox.co.il
Thu Jun 26 05:01:07 PDT 2008


 


Hi all,

The original problem was an IPoIB failure (the link going up and down)
while running 'heavy' applications.

After investigating, we found that when an application puts a heavy load
on the HCA, the driver always has non-empty CQs, so it keeps working in
its own DPC and does not allow other DPCs to run.
 
The solution
Based on an analogous solution for the mthca driver, the driver has to
allow DPCs other than its own to run. It can measure the time it has
spent in DPC handling and exit once a certain limit is exceeded, thus
allowing other DPCs to run:

 

 

Index: hw/mlx4/kernel/bus/net/eq.c
===================================================================
--- hw/mlx4/kernel/bus/net/eq.c (revision 2634)
+++ hw/mlx4/kernel/bus/net/eq.c (revision 2635)
@@ -160,6 +160,9 @@
 	int cqn;
 	int eqes_found = 0;
 	int set_ci = 0;
+	static const uint32_t cDpcMaxTime = 10000; //max time to spend in a while loop
+
+	uint64_t start = cl_get_time_stamp();
 	while ((eqe = next_eqe_sw(eq))) {
 		/*
@@ -222,6 +225,7 @@
 		default:
 			mlx4_warn(dev, "Unhandled event %02x(%02x) on EQ %d at index %u\n",
 				eqe->type, eqe->subtype, eq->eqn, eq->cons_index);
+
 			break;
 		};
@@ -244,6 +248,10 @@
 			eq_set_ci(eq, 0);
 			set_ci = 0;
 		}
+
+		if (cl_get_time_stamp() - start > cDpcMaxTime ) {
+			break; //allow other DPCs as well
+		}
 	}
 	eq_set_ci(eq, 1);
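
For readers who want to see the time-budget idea in isolation, here is a
minimal stand-alone sketch in plain C. It is not the driver code: the
names fake_time_stamp, event_queue and drain_with_budget are hypothetical,
and the fake clock merely stands in for a real time-stamp source such as
cl_get_time_stamp() so that the behaviour is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a real time-stamp source. Each call advances
 * a fake clock by 100 units so the sketch behaves deterministically. */
static uint64_t fake_now = 0;

static uint64_t fake_time_stamp(void)
{
    fake_now += 100;
    return fake_now;
}

/* Hypothetical event queue: 'pending' counts events still to be handled. */
struct event_queue {
    size_t pending;
};

/* Handle queued events, but stop once more than 'budget' time units have
 * elapsed since the loop started. Returns the number of events handled. */
static size_t drain_with_budget(struct event_queue *q, uint64_t budget)
{
    size_t handled = 0;
    uint64_t start;

    assert(q != NULL);
    start = fake_time_stamp();

    while (q->pending > 0) {
        q->pending--;           /* "handle" one event */
        handled++;

        /* Same shape as the check added by the patch: leave the loop
         * when the budget is exceeded so other deferred work can run. */
        if (fake_time_stamp() - start > budget)
            break;
    }
    return handled;
}
```

With a queue of 100 pending events and a budget of 1000 units, one call
handles only part of the queue and leaves the rest pending, so whatever
invokes the routine has to call it again later to finish the work.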
