[Users] RDMA issues in ib_qib [was:IPoIB on CentOS 6.5]
Peter Kjellström
cap at nsc.liu.se
Tue Mar 24 06:05:55 PDT 2015
On Mon, 23 Mar 2015 22:01:42 +0000
"Foraker, Jim" <foraker1 at llnl.gov> wrote:
> On 3/23/15, 2:24 AM, "Peter Kjellström" <cap at nsc.liu.se> wrote:
>
> >On Thu, 19 Mar 2015 16:17:09 +0000
> >"Foraker, Jim" <foraker1 at llnl.gov> wrote:
> >
> >> Peter,
> >> Thanks. I've told our RedHat folks that the IPoIB issue is a
> >> high priority for us. Our bug for the qib kernel RDMA issue is
> >> 1188417, which was closed as a duplicate of
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1171803.
> >
> >And (no surprise) both those are non-public. Can you give a short
> >summary (root cause, work-around, fix planned for, ..)?
> Sorry, I prodded one of our RH folks to make our BZ public
> again. It should be visible now. In case it's not, the problem
> boils down to a snippet of qib_map_sg() in qib_dma.c:
>
> #ifdef CONFIG_NEED_SG_DMA_LENGTH
> sg->dma_length = sg->length;
> #endif
>
> CONFIG_NEED_SG_DMA_LENGTH is a config option from more recent
> kernels not present in RHEL6. It appears to have been inadvertently
> brought in while backporting an upstream patch. (struct
> scatterlist).dma_length should be getting set in RHEL6, but because
> of the ifdef, it's not. Remove the ifdef and kernel RDMA works fine
> again.
Many thanks, we'll probably include that patch in our local rebuilds.
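
For anyone following along, here is a rough user-space sketch of why that
guard matters. It is not the driver code: struct sg_entry, map_sg_guarded()
and map_sg_unguarded() are made-up names, and the struct models only the two
fields mentioned above. The point is just that an #ifdef on a macro the
build never defines silently compiles the assignment away.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct scatterlist; only the two fields
 * discussed in this thread are modeled. */
struct sg_entry {
    unsigned int length;      /* length of the buffer segment */
    unsigned int dma_length;  /* length the DMA consumer will read */
};

/* CONFIG_NEED_SG_DMA_LENGTH is intentionally NOT defined in this file,
 * mimicking a kernel whose config system does not know the option. */

/* Guarded variant: with the macro undefined, the assignment is removed
 * by the preprocessor and dma_length is left untouched. */
void map_sg_guarded(struct sg_entry *sg, size_t nents)
{
    for (size_t i = 0; i < nents; i++) {
#ifdef CONFIG_NEED_SG_DMA_LENGTH
        sg[i].dma_length = sg[i].length;
#endif
        (void)sg; /* keeps the loop body non-empty when compiled out */
    }
}

/* Variant with the guard removed: dma_length is always set. */
void map_sg_unguarded(struct sg_entry *sg, size_t nents)
{
    for (size_t i = 0; i < nents; i++)
        sg[i].dma_length = sg[i].length;
}
```

With a zero-initialized dma_length, the guarded variant leaves it at zero
while the unguarded one copies length into it, which matches the behaviour
described above.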
Out of curiosity, which infinipath-psm do you match the ib_qib from the
rhel-6.6 kernel with?
We noticed the ipath_userinit complaining about version mismatch
(kernel too new) and rebuilt a newer upstream version. This caused the
warning to go away but MPI behavior to degrade (hangs during MPI
teardown mostly). Currently we run with a locally patched version that
simply demoted that version mismatch warning to a debug print...
/Peter