[openib-general] RHEL 4 U3 - lost completions

Roland Dreier rdreier at cisco.com
Mon Oct 2 10:47:57 PDT 2006


    Bill> I am testing an app in development on RHEL 4 U3 using uDAPL.
    Bill> The app runs OK on gen1 stacks, but cannot run on any OFED
    Bill> based stack I have tried on RHEL 4 U3.  The symptom is RDMAs
    Bill> not getting completion.  A completion notification is sent,
    Bill> but mthca_poll_cq() finds no completion.  I debugged the
    Bill> problem to this: the memory for the completion queue is not
    Bill> pinned and at some point the page struct changes *after* the
    Bill> HCA has been handed the address of the completion queue, so
    Bill> subsequent completions are written elsewhere in memory and
    Bill> the app hangs waiting for completion.

The memory should be pinned by the call to __mthca_reg_mr() in
mthca_create_cq(), since the kernel will do get_user_pages() on the
memory when it registers the region.

By any chance, does your app do fork() or system() or something like that?

 - R.
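
One reading of the fork() question, offered as an assumption rather
than something stated in this message: get_user_pages() pins the
physical pages, but fork() marks the parent's mappings copy-on-write,
so a later write by the parent can move its virtual address to a fresh
page while the HCA keeps writing to the original one.  A minimal
sketch in C of a buffer that is kept out of the child, so it is never
made copy-on-write, might look like the following.  MADV_DONTFORK
needs Linux 2.6.16 or later, which the 2.6.9 kernel in RHEL 4
predates, and alloc_dontfork() and free_dontfork() are hypothetical
names that are not part of libmthca or libibverbs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a length up to a whole number of pages. */
static size_t round_to_pages(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    return (len + (size_t) page - 1) / (size_t) page * (size_t) page;
}

/*
 * Hypothetical helper: map an anonymous, page-aligned buffer and ask
 * the kernel not to share it with children created by fork().
 * Returns NULL if the mapping or the madvise() call fails, e.g. on a
 * kernel that does not know MADV_DONTFORK.
 */
static void *alloc_dontfork(size_t len)
{
    size_t rounded = round_to_pages(len);
    void *buf = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (buf == MAP_FAILED)
        return NULL;
    if (madvise(buf, rounded, MADV_DONTFORK) != 0) {
        munmap(buf, rounded);
        return NULL;
    }
    return buf;
}

/* Hypothetical helper: release a buffer from alloc_dontfork(). */
static void free_dontfork(void *buf, size_t len)
{
    munmap(buf, round_to_pages(len));
}
```

A child process must not touch such a buffer, since the range is
simply absent from its address space.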



