[ofa-general] Mellanox ibv_reg_mr (memory region) function call fails under load when using the mlx4 driver

Phillip Wilson phillipwils at gmail.com
Tue Feb 24 21:51:31 PST 2009


The “ibv_reg_mr()” function call fails on the HCA (DID=0x634A) that uses the
mlx4 driver (device mlx4_0) when the system is under load (memory and CPU).
The system usually still has over 500MB of free system memory when the
“ibv_reg_mr()” call fails.



If I run only the HCA (DID=0x6278) that uses the mthca driver (device mthca0),
together with the other tools that generate stress, the “ibv_reg_mr()” call
always passes.  If I run only the HCA (DID=0x634A) with the same stress tools,
the “ibv_reg_mr()” call always fails eventually; it usually takes less than 30
minutes for the failure to occur.





With two dual-port HCA cards, up to 8 memory regions are requested at one time
(8 x 4MB buffers = 32MB), and the maximum size of a single memory region is
1MB.
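For anyone trying to reproduce this: the sizing above works out to 8 x 4MB = 32MB of buffers, with at most 1MB registered per region. Because ibv_reg_mr() pins the pages it registers, one cheap thing to rule out is the per-process locked-memory limit (RLIMIT_MEMLOCK). The following is only a minimal sketch, assuming Linux; the helper names pinned_bytes() and fits_memlock_limit() are hypothetical and are not part of the test program.

```c
#include <sys/resource.h>

/* Worst-case bytes pinned: number of regions times bytes per region. */
unsigned long long pinned_bytes(unsigned regions,
                                unsigned long long region_bytes)
{
    return (unsigned long long)regions * region_bytes;
}

/* Returns 1 if `needed` bytes fit under the soft RLIMIT_MEMLOCK of the
 * calling process, 0 if they do not, and -1 if the limit cannot be read. */
int fits_memlock_limit(unsigned long long needed)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;                       /* no limit configured */
    return needed <= (unsigned long long)rl.rlim_cur ? 1 : 0;
}
```

Calling fits_memlock_limit(pinned_bytes(1, 1024ULL * 1024)) in each process before the first registration would show whether a single 1MB region fits. A low limit would not by itself explain why mthca0 passes under the same settings, so treat this only as a quick sanity check.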



i.e.

    ctx->mr = ibv_reg_mr(ctx->pd,
                         buffer,  /* malloc'd 4MB buffer per process */
                         size,    /* 2 bytes to 1MB */
                         IBV_ACCESS_LOCAL_WRITE);
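To narrow down a failure like this, it may help to record the errno value, the requested size and the iteration count at the moment the call returns NULL. The sketch below is an assumption-laden illustration: format_reg_mr_failure() is a hypothetical helper, not part of libibverbs, and whether errno is reliably set by ibv_reg_mr() on OFED 1.2.5 is itself an assumption (in practice it usually carries the kernel's error code, e.g. ENOMEM).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one log line describing a failed
 * registration. `err` is the errno value saved immediately after
 * ibv_reg_mr() returned NULL. Returns the snprintf() result.
 *
 * Intended call site (not compiled here; needs <infiniband/verbs.h>):
 *
 *     ctx->mr = ibv_reg_mr(ctx->pd, buffer, size, IBV_ACCESS_LOCAL_WRITE);
 *     if (!ctx->mr) {
 *         int err = errno;   // save before another call overwrites it
 *         char line[128];
 *         format_reg_mr_failure(line, sizeof line, err, size, iteration);
 *         fprintf(stderr, "%s\n", line);
 *     }
 */
int format_reg_mr_failure(char *out, size_t cap, int err,
                          size_t size, unsigned long iteration)
{
    return snprintf(out, cap,
                    "ibv_reg_mr failed: iter=%lu size=%zu errno=%d (%s)",
                    iteration, size, err, strerror(err));
}
```

Logging the size alongside errno would show whether the failures cluster at the large (1MB) registrations or occur at any size.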



I modified the ibv_rc_pingpong test to use a parent-child paradigm instead of
the stock client/server approach, to suit my environment.  The code forks a
parent process and a child process per port, which serve the same purpose as
the client and server.  The code also forks a process to run on each HCA.
Basically, the same test code is executed on each HCA; what differs is the
user library (libmlx4.so, libmthca.so), the kernel module (mlx4.ko, mthca.ko)
and the firmware on each HCA.
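The process layout described above can be sketched roughly as follows. This is not the modified ibv_rc_pingpong code, only a minimal illustration of forking one worker per port and collecting the results; run_workers() and example_worker() are hypothetical names, and example_worker() merely stands in for the per-port ping-pong loop.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for the per-port test body: returns 0 on success.
 * Here worker 1 is made to fail so the bookkeeping can be exercised. */
int example_worker(int index)
{
    return index == 1 ? 1 : 0;
}

/* Fork one child per worker index (e.g. one per HCA port), run work(i)
 * in each child, and return how many children exited with a non-zero
 * status or were killed by a signal. Returns -1 if a fork() failed. */
int run_workers(int count, int (*work)(int))
{
    int failed = 0;
    int started = 0;

    for (int i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                            /* stop forking, reap below */
        if (pid == 0)
            _exit(work(i) == 0 ? 0 : 1);      /* child never returns */
        started++;
    }

    /* Reap every child that was started. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            failed++;
    }
    return started == count ? failed : -1;
}
```

With this layout, a single failing port shows up as a non-zero count in the parent, which makes it easier to tell which HCA and port the failed registration belonged to.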



Since the code in the two user libraries is very similar, I suspect the issue
is in the kernel code or the HCA firmware.



Does anyone know which kernel patch, between kernel 2.6.24 and 2.6.28, fixes
this issue?  Has anyone else seen this issue?



System Information:



The system has 4GB of memory.



uname -a

Linux (none) 2.6.24.02.02.08 #21 SMP Thu Feb 19 11:04:35 PST 2009 ia64 unknown



OFED 1.2.5



lspci -d 15b3:



0000:10:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20)

0000:c3:00.0 InfiniBand: Mellanox Technologies: Unknown device 634a (rev a0)



lspci -d 15b3: -n

0000:10:00.0 0c06: 15b3:6278 (rev 20)

0000:c3:00.0 0c06: 15b3:634a (rev a0)



ibv_devinfo -v

hca_id: mlx4_0

        fw_ver:                         2.5.000



hca_id: mthca0

        fw_ver:                         4.8.930