[openib-general] problems with lustre o2ib module & ofed

Thierry Delaitre delaitt at cpc.wmin.ac.uk
Mon Sep 25 00:42:59 PDT 2006


On Mon, 25 Sep 2006, Or Gerlitz wrote:

> Jack Morgenstein wrote:
> > Did you recompile Lustre following the installation of ofed-1.1?
> > I'm not familiar with the Lustre installation procedure (i.e., if it
> > gets compiled on the current host).  If yes, you probably merely need
> > to uninstall and reinstall Lustre o2ib.
>
> OK, can we state clearly what the user needs to do with modules
> directly dependent on ofed symbols (e.g. Lustre's o2ib, NFSoRDMA, RDS and
> hopefully more to come)?
>
> Is it recompile / uninstall / install ???

The issue is about the installation of Lustre 1.5.95 o2ib with OFED-1.1rc6
for SLES10.

ofed-1.1-rc6 compiles nicely, as shown below. The ib kernel modules all
reside under /lib/modules/2.6.16.21-0.8-smp/kernel/drivers/infiniband/
and match the ones compiled by ofed. I have tried these steps several
times.

n32:~ # lsmod | grep ib
libcfs                103060  1 lnet
ib_ucm                 19332  0
ib_addr                10756  1 rdma_cm
ib_cm                  31968  2 ib_ucm,rdma_cm
ib_ipoib               48144  0
ib_sa                  16652  3 rdma_cm,ib_cm,ib_ipoib
ib_uverbs              38312  2 rdma_ucm,ib_ucm
ib_umad                17968  0
ib_mthca              116240  0
ib_mad                 36116  4 ib_cm,ib_sa,ib_umad,ib_mthca
ib_core                49024  9 ib_ucm,rdma_cm,ib_cm,ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad

I compiled Lustre for the above kernel and ofed installation. I get the
following when doing a 'lctl network up' in Lustre. I have module
versioning (modversions) enabled in the kernel. If I set it to 'n', I
get a null pointer exception and the module crashes.

ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
ko2iblnd: disagrees about version of symbol ib_destroy_cq
ko2iblnd: Unknown symbol ib_destroy_cq
ko2iblnd: disagrees about version of symbol ib_get_dma_mr
ko2iblnd: Unknown symbol ib_get_dma_mr
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
ko2iblnd: disagrees about version of symbol ib_modify_qp
ko2iblnd: Unknown symbol ib_modify_qp
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
LustreError: 5725:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
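
To summarise a log like the one above, the following is a minimal sketch
that collects the symbols the kernel rejected for each module. The function
name mismatched_symbols and the SAMPLE text are illustrative assumptions;
the only thing taken from the report is the wording of the kernel messages.

```python
import re

# Matches kernel messages such as:
#   ko2iblnd: disagrees about version of symbol ib_create_cq
MISMATCH_RE = re.compile(r"^(\S+): disagrees about version of symbol (\S+)$")


def mismatched_symbols(log_text):
    """Return {module: sorted list of symbols} for version-mismatch lines.

    The 'Unknown symbol' lines are ignored on purpose: in the log above they
    simply repeat each symbol that was already rejected.
    """
    found = {}
    for line in log_text.splitlines():
        match = MISMATCH_RE.match(line.strip())
        if match:
            module, symbol = match.groups()
            found.setdefault(module, set()).add(symbol)
    return {module: sorted(symbols) for module, symbols in found.items()}


# Hypothetical excerpt in the same format as the log in this message.
SAMPLE = """\
ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
"""

if __name__ == "__main__":
    for module, symbols in mismatched_symbols(SAMPLE).items():
        print(module, ", ".join(symbols))
```

Feeding it the output of dmesg would give one line per module, which makes
it easier to see that every rejected symbol is an ib_* export.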

I have tried with ofed-1.1-rc5 and experience the same issue.
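
One way to narrow this down would be to compare the symbol CRCs that
ko2iblnd was built to expect with the CRCs recorded when the ib modules
were built. The sketch below assumes two text inputs: the output of
'modprobe --dump-modversions' run on the ko2iblnd module file, and the
Module.symvers produced by the ofed build. Both are assumed to have lines
starting with a hex CRC followed by the symbol name. The function names
and the CRC values in the example are made up for illustration.

```python
def parse_crcs(text):
    """Parse lines of the form '<0xCRC> <symbol> [...]' into {symbol: crc}."""
    crcs = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank or unrelated lines; extra columns after the symbol
        # (as found in Module.symvers) are ignored.
        if len(fields) >= 2 and fields[0].startswith("0x"):
            crcs[fields[1]] = int(fields[0], 16)
    return crcs


def crc_mismatches(required_text, exported_text):
    """Return sorted symbols whose required CRC differs from the exported one."""
    required = parse_crcs(required_text)
    exported = parse_crcs(exported_text)
    return sorted(
        symbol
        for symbol, crc in required.items()
        if symbol in exported and exported[symbol] != crc
    )


# Hypothetical data: these CRC values are invented, not taken from a real build.
REQUIRED = """\
0x11111111\tib_create_cq
0x22222222\tib_alloc_pd
"""
EXPORTED = """\
0xaaaaaaaa\tib_create_cq\tdrivers/infiniband/core/ib_core
0x22222222\tib_alloc_pd\tdrivers/infiniband/core/ib_core
"""

if __name__ == "__main__":
    for symbol in crc_mismatches(REQUIRED, EXPORTED):
        print("CRC differs for", symbol)
```

If the symbols listed here match the ones in the kernel log, that would
suggest the module was built against a different set of ib headers or
symbol versions than the ib modules that are actually loaded.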

Thierry.

> Or.
>
> > On Sunday 24 September 2006 12:57, Thierry Delaitre wrote:
> >> I get the following when loading the lustre o2ib module. I'm using ofed-1.1
> >> rc6 on sles10 and I'm sure the ib modules are the ones recompiled for the
> >> kernel I'm using, and lustre too. I don't understand why I get the
> >> following, as I only have one version of the ib modules.

----------------------------------------
Dr Thierry DELAITRE
Systems and Services Manager, CSCS
University of Westminster
115 New Cavendish Street, London W1W 6UW

Tel: 020 7911 5000 ext: 3586
Fax: 020 7911 5089
Mobile short dial code 1788

http://www.cscs.wmin.ac.uk/~delaitt
----------------------------------------
