[openib-general] problems with lustre o2ib module & ofed

Jack Morgenstein jackm at dev.mellanox.co.il
Sun Sep 24 23:58:11 PDT 2006


Robert,

We build "external modules that depend on other external
modules that you previously built" all the time in our
regression testing -- and this runs properly under lots of
distributions and under lots of different Linux kernels.

We do not experience the problem you describe.

We build kernel modules which exercise various installed OFED 1.1
kernel modules (ib_verbs, ib_mad, etc.).  We then load these
kernel modules during our regression testing to verify the operation
of the OFED 1.1 kernel modules.

If the explanation you provide below were correct, our kernel module
testing would not work at all. (We do not use any of the workarounds
you describe below.)

We have seen problems like the one described when either:
a. The dependent external modules were not rebuilt following
   OFED installation.
or
b. There were old .ko files lying around which were loaded instead of
   the installed OFED .ko files.
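
Case (b) can be checked mechanically. The following is only a rough
sketch in Python, not something taken from our regression scripts: it
walks a module tree and reports every .ko file name that appears in more
than one place. The function name find_duplicate_modules is hypothetical,
and the assumption that stale copies share a file name with the installed
OFED ones may not hold on every system.

```python
import os
from collections import defaultdict


def find_duplicate_modules(modules_root):
    """Return {file_name: [paths]} for every .ko name found more than once.

    modules_root is assumed to be a module tree such as
    /lib/modules/<kernel release>; any directory works.
    """
    found = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(modules_root):
        for name in filenames:
            if name.endswith(".ko"):
                found[name].append(os.path.join(dirpath, name))
    # Keep only names that occur in two or more directories.
    return {name: sorted(paths) for name, paths in found.items() if len(paths) > 1}


# Example call (path is illustrative):
#   find_duplicate_modules("/lib/modules/" + os.uname().release)
```

Any name it reports is only a candidate for inspection; which copy is
actually loaded still has to be checked by hand.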

- Jack

On Sunday 24 September 2006 20:45, Robert Walsh wrote:
> This explanation gets ugly :-)
> 
> The short description is: you can't build external modules that depend 
> on other external modules that you previously built.
> 
> The reason why is: the kernel devel stuff ships with a file called 
> Module.symvers, which contains all the version information for all the 
> symbols in the kernel and in all the modules built when the kernel was 
> built.  When you build an external module, the kernel build stuff looks 
> in here to get the version information for any symbol referenced that it 
> can't find in the group of modules you're building.  If you've replaced 
> some modules with newer ones (like what happens when you install 
> OFED-1.1, for example), then the symbol versions in the new modules will 
> not match what's in the Module.symvers file.
> 
> In your case, you installed a bunch of new modules (OFED-1.1) and then, 
> in a second step, installed another new module (Lustre).  The OFED-1.1 
> build was OK because all external symbols that it referenced (all of 
> which are in vmlinux, I think) had properly matching version entries in 
> Module.symvers.  The Lustre build, however, was pulling ib_* symbols 
> from the new OFED-1.1 modules, while Module.symvers still held the 
> mismatching symbol versions from the original kernel modules.  (I 
> don't remember if the kernel build warns about mismatching symbol 
> versions at build time.)
> 
> At insmod time, the kernel checks that the symbol versions of 
> already-loaded modules match the expected versions in the to-be-loaded 
> module.  In your case, they will not.
> 
> One solution is: extract the kernel sources from the OFED-1.1 
> distribution, patch them as the OFED build script would, add in the 
> Lustre bits and build the whole thing yourself manually.
> 
> Another solution is: update the Module.symvers file.
> 
> Neither is a terribly satisfactory solution.
> 
> Regards,
>   Robert.
> 
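
The mismatch described in the quoted message can be looked for directly
by comparing symbol versions. The Python sketch below is illustrative
only. It assumes each Module.symvers line starts with a hexadecimal CRC
followed by the symbol name, separated by whitespace, and ignores any
further columns. It also assumes a second symvers-style listing for the
rebuilt modules is available, which the thread does not confirm. The
function names are hypothetical.

```python
def parse_symvers(text):
    """Map symbol name -> CRC from Module.symvers-style text.

    Assumed layout per line: "<hex crc> <symbol> [other columns...]".
    """
    crcs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        crcs[fields[1]] = int(fields[0], 16)
    return crcs


def mismatched_symbols(kernel_symvers, rebuilt_symvers):
    """Return {symbol: (kernel_crc, rebuilt_crc)} where the two listings disagree."""
    old = parse_symvers(kernel_symvers)
    new = parse_symvers(rebuilt_symvers)
    return {
        sym: (old[sym], new[sym])
        for sym in old.keys() & new.keys()
        if old[sym] != new[sym]
    }
```

An empty result means the two listings agree on every symbol they share;
any entry names a symbol whose version differs between them.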
