[openib-general] RHEL 4 U3 - lost completions
Bill Hartner
bhartner at austin.rr.com
Mon Oct 2 10:26:36 PDT 2006
I am testing an app in development on RHEL 4 U3 using uDAPL. The app runs OK on gen1 stacks, but does not run on any OFED-based stack I have tried on RHEL 4 U3. The symptom is that RDMAs never complete: a completion notification is delivered, but mthca_poll_cq() finds no completion. I debugged the problem to this: the memory for the completion queue is not pinned, and at some point the page struct changes *after* the HCA has been handed the address of the completion queue, so subsequent completions are written elsewhere in memory and the app hangs waiting for a completion.
I hacked in the following to get the app running. I replaced the allocation of the completion buffer in libmthca,
ret = posix_memalign(memptr, alignment, size);
with,
size = (size + (4096-1)) & ~(4096-1);
*memptr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS | MAP_LOCKED,0,0);
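For anyone who wants to try the same workaround, here is a slightly more defensive sketch of the replacement allocation. It is not the libmthca code. The helper names alloc_locked_buf and round_up_to_page are hypothetical, the page size is queried with sysconf() instead of hard-coding 4096, and fd is passed as -1, which is the portable value for MAP_ANONYMOUS. It assumes Linux, where MAP_LOCKED is available, and it returns an errno-style code the way posix_memalign() does.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round size up to the next multiple of page (page must be a power of two). */
static size_t round_up_to_page(size_t size, size_t page)
{
    return (size + (page - 1)) & ~(page - 1);
}

/*
 * Hypothetical helper, not part of libmthca: map a zero-filled,
 * page-aligned anonymous buffer and ask the kernel to lock it in RAM.
 * Returns 0 on success or an errno value, like posix_memalign().
 * On success *mapped_len holds the rounded-up length, which the
 * caller needs later for munmap().
 */
static int alloc_locked_buf(void **memptr, size_t size, size_t *mapped_len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    void *p;

    if (page <= 0 || size == 0)
        return EINVAL;

    len = round_up_to_page(size, (size_t)page);

    /* Same flags as the workaround above; fd is -1 for portability. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS | MAP_LOCKED, -1, 0);
    if (p == MAP_FAILED)
        return errno;

    *memptr = p;
    *mapped_len = len;
    return 0;
}
```

A buffer obtained this way has to be released with munmap(buf, mapped_len) rather than free(), and the mmap() call can fail if the process is over its RLIMIT_MEMLOCK limit.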
Is there a restriction on using completion queues on a RHEL 4 Update 3 kernel? Am I missing a patch?
Details in http://openib.org/bugzilla/show_bug.cgi?id=147
-Bill