[ofw] RE: Bugfix for IPOIB failure

Fab Tillier ftillier at windows.microsoft.com
Thu Jun 26 09:34:49 PDT 2008


Note that all kernel ULPs (SRP, IPoIB, VNIC), as well as internal IBAL services, should do something along these lines: limit how much time they spend processing completions, and requeue their DPC if they exceed that limit.
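
A minimal user-space sketch of that pattern follows. It is only an illustration under stated assumptions: every `sim_*` name is hypothetical and none comes from the WinOF sources. The fake clock stands in for a real timestamp source such as the `cl_get_time_stamp()` call used in the quoted patch, and the "requeue" is modeled as a return value rather than an actual kernel DPC insertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a completion queue; not a WinOF type. */
typedef struct {
    int pending;    /* completions still waiting on the simulated CQ */
    int processed;  /* completions handled so far */
    int requeues;   /* times the routine asked to be run again */
} sim_cq_t;

/* Fake clock in microseconds, advanced only by the simulated work. */
static uint64_t sim_now_us;

static uint64_t sim_time_stamp(void)
{
    return sim_now_us;
}

/* Handle one completion; assume (arbitrarily) that it costs 100 us. */
static bool sim_poll_one(sim_cq_t *cq)
{
    if (cq->pending == 0)
        return false;
    cq->pending--;
    cq->processed++;
    sim_now_us += 100;
    return true;
}

/*
 * One DPC invocation: drain completions until the CQ is empty or the
 * time budget is exceeded. Returns true when work remains, meaning the
 * caller should requeue the DPC instead of looping any longer.
 */
static bool sim_dpc_routine(sim_cq_t *cq, uint64_t budget_us)
{
    uint64_t start = sim_time_stamp();

    assert(cq != NULL);
    while (sim_poll_one(cq)) {
        if (cq->pending > 0 && sim_time_stamp() - start > budget_us) {
            cq->requeues++;
            return true;    /* yield so other DPCs get a turn */
        }
    }
    return false;
}

/* Model the requeue: rerun the routine until it reports no work left. */
static int sim_run_until_drained(sim_cq_t *cq, uint64_t budget_us)
{
    int passes = 1;

    while (sim_dpc_routine(cq, budget_us))
        passes++;
    return passes;
}
```

In a real driver, other DPCs would run between the passes that `sim_run_until_drained` performs back to back here. That gap is the point of yielding.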

The natural progression of this idea is that a ULP would provide the DPC object to be queued in response to a CQ event, rather than registering a callback.
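
One way to picture that idea is the user-space model below. It is a sketch, not an existing IBAL interface: all `sim_*` types and functions are hypothetical. The only behavior borrowed from the real kernel is that queuing an already-queued DPC is a no-op, as with `KeInsertQueueDpc`. Each ULP owns its DPC object, the CQ event merely queues it, and a ULP with work left requeues itself behind the other pending DPCs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a DPC object owned by a ULP. */
typedef struct sim_dpc {
    void (*routine)(struct sim_dpc *dpc, void *context);
    void *context;
    bool queued;
    struct sim_dpc *next;
} sim_dpc_t;

/* Simple FIFO of pending DPCs. */
typedef struct {
    sim_dpc_t *head;
    sim_dpc_t *tail;
} sim_dpc_queue_t;

/* Queue a DPC; returns false if it was already pending. */
static bool sim_queue_dpc(sim_dpc_queue_t *q, sim_dpc_t *dpc)
{
    if (dpc->queued)
        return false;
    dpc->queued = true;
    dpc->next = NULL;
    if (q->tail != NULL)
        q->tail->next = dpc;
    else
        q->head = dpc;
    q->tail = dpc;
    return true;
}

/* The CQ holds the ULP-supplied DPC instead of a callback pointer. */
typedef struct {
    sim_dpc_queue_t *queue;
    sim_dpc_t *ulp_dpc;
} sim_cq_t;

/* CQ event: queue the ULP's DPC and return immediately. */
static bool sim_cq_event(sim_cq_t *cq)
{
    return sim_queue_dpc(cq->queue, cq->ulp_dpc);
}

/* Per-ULP state for the example routine. */
typedef struct {
    sim_dpc_queue_t *queue;
    int pending;  /* completions waiting */
    int batch;    /* max completions handled per DPC run */
    int runs;     /* number of times the DPC routine ran */
} sim_ulp_t;

/* ULP DPC routine: handle one batch, requeue itself if work remains. */
static void sim_ulp_dpc(sim_dpc_t *dpc, void *context)
{
    sim_ulp_t *ulp = context;
    int n = ulp->pending < ulp->batch ? ulp->pending : ulp->batch;

    ulp->pending -= n;
    ulp->runs++;
    if (ulp->pending > 0)
        sim_queue_dpc(ulp->queue, dpc);
}

/* Run queued DPCs in FIFO order; returns how many routines ran. */
static int sim_run_dpcs(sim_dpc_queue_t *q)
{
    int runs = 0;

    while (q->head != NULL) {
        sim_dpc_t *dpc = q->head;

        q->head = dpc->next;
        if (q->head == NULL)
            q->tail = NULL;
        dpc->next = NULL;
        dpc->queued = false;
        dpc->routine(dpc, dpc->context);
        runs++;
    }
    return runs;
}
```

Because a requeued DPC goes to the tail of the queue, a busy ULP cannot run twice in a row while another ULP's DPC is waiting.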

-Fab

From: ofw-bounces at lists.openfabrics.org [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Alex Naslednikov
Sent: Thursday, June 26, 2008 5:01 AM
To: ofw at lists.openfabrics.org; Tzachi Dar
Subject: [ofw] Bugfix for IPOIB failure



________________________________

Hi all,

The original problem was an IPoIB failure (link up/down) while running 'heavy' applications.
Our investigation found that when an application puts a heavy load on the HCA, the driver always has non-empty CQs, so it keeps working in its own DPC and does not allow other DPCs to run.

The solution
Based on an analogous solution for the mthca driver, the driver has to allow DPCs other than its own to run. It can measure how much time it has spent on DPC handling and exit once a limit is exceeded, thus allowing other DPCs to run:





Index: hw/mlx4/kernel/bus/net/eq.c
===================================================================
--- hw/mlx4/kernel/bus/net/eq.c (revision 2634)
+++ hw/mlx4/kernel/bus/net/eq.c (revision 2635)
@@ -160,6 +160,9 @@
 int cqn;
 int eqes_found = 0;
 int set_ci = 0;
+ static const uint32_t cDpcMaxTime = 10000; //max time to spend in a while loop
+
+ uint64_t start = cl_get_time_stamp();
 while ((eqe = next_eqe_sw(eq))) {
 /*
@@ -222,6 +225,7 @@
 default:
 mlx4_warn(dev, "Unhandled event %02x(%02x) on EQ %d at index %u\n",
 eqe->type, eqe->subtype, eq->eqn, eq->cons_index);
+
 break;
 };
@@ -244,6 +248,10 @@
 eq_set_ci(eq, 0);
 set_ci = 0;
 }
+
+ if (cl_get_time_stamp() - start > cDpcMaxTime ) {
+ break; //allow other DPCs as well
+ }
 }
 eq_set_ci(eq, 1);