[openib-general] RHEL 4 U3 - lost completions

Bill Hartner bhartner at austin.rr.com
Mon Oct 2 10:26:36 PDT 2006


I am testing an app in development on RHEL 4 U3 using uDAPL.  The app runs OK on gen1 stacks, but does not run on any OFED-based stack I have tried on RHEL 4 U3.  The symptom is RDMAs never completing: a completion notification is sent, but mthca_poll_cq() finds no completion.  I debugged the problem to this: the memory for the completion queue is not pinned, and at some point the page struct changes *after* the HCA has been handed the address of the completion queue, so subsequent completions are written elsewhere in memory and the app hangs waiting for a completion.

I hacked in the following to get the app running.  I replaced the allocation of the completion buffer in libmthca,

ret = posix_memalign(memptr, alignment, size);

with,

size = (size + (4096-1)) & ~(4096-1);
*memptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
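For anyone who wants to try the same workaround, here is a minimal sketch of it as a standalone helper. It is not the libmthca code: the names alloc_locked_buf and round_up_to_page are hypothetical, the page size is queried with sysconf() instead of being hardcoded to 4096, and the return convention is only assumed to mirror posix_memalign() (0 on success, an errno value on failure).

```c
#define _GNU_SOURCE           /* exposes MAP_ANONYMOUS and MAP_LOCKED on glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round size up to a multiple of page; page must be a power of two. */
static size_t round_up_to_page(size_t size, size_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    return (size + page - 1) & ~(page - 1);
}

/*
 * Hypothetical stand-in for the posix_memalign() call: map whole pages
 * that are shared, anonymous and locked in memory.
 * Returns 0 on success or an errno value on failure.
 */
static int alloc_locked_buf(void **memptr, size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    void *buf;

    if (memptr == NULL || size == 0 || page <= 0)
        return EINVAL;

    len = round_up_to_page(size, (size_t)page);

    /* Anonymous mappings take no file, so pass fd = -1 and offset = 0. */
    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
    if (buf == MAP_FAILED)
        return errno;

    *memptr = buf;
    return 0;
}
```

A buffer obtained this way has to be released with munmap() using the rounded-up length, not free().  The mmap() call can also fail with EAGAIN when the process is over its RLIMIT_MEMLOCK limit, so the error path matters.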

Is there a restriction on using completion queues on a RHEL 4 Update 3 kernel?  Am I missing a patch?

Details in http://openib.org/bugzilla/show_bug.cgi?id=147

-Bill



