Re: [ofa-general] Mellanox ibv_reg_mr (memory region) function call fails under load when using the mlx4 driver

Dotan Barak dotanba at gmail.com
Tue Feb 24 23:50:54 PST 2009


Do you run your program as root or as some other user?
(The call may be failing because of the ulimit on the amount of memory
that can be pinned.)
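
To make the ulimit question concrete, here is a minimal C sketch that reads
the locked-memory limit (RLIMIT_MEMLOCK, the value "ulimit -l" reports in
kilobytes) for the current user. The helper names print_memlock_limit() and
memlock_limit_allows() are hypothetical, chosen only for this illustration,
and the sketch does not call libibverbs at all. It assumes that memory pinned
by ibv_reg_mr() for a non-root process is counted against this limit.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: print the soft and hard RLIMIT_MEMLOCK values.
 * Returns 0 on success, -1 if the limit could not be read. */
int print_memlock_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit(RLIMIT_MEMLOCK)");
        return -1;
    }

    if (rl.rlim_cur == RLIM_INFINITY)
        printf("soft limit: unlimited\n");
    else
        printf("soft limit: %llu bytes\n", (unsigned long long)rl.rlim_cur);

    if (rl.rlim_max == RLIM_INFINITY)
        printf("hard limit: unlimited\n");
    else
        printf("hard limit: %llu bytes\n", (unsigned long long)rl.rlim_max);

    return 0;
}

/* Hypothetical helper: returns 1 if `bytes` of pinned memory fit within the
 * soft limit, 0 if they do not, and -1 if the limit could not be read.
 * This only compares against the limit; it does not account for memory the
 * process has already locked. */
int memlock_limit_allows(unsigned long long bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return bytes <= (unsigned long long)rl.rlim_cur ? 1 : 0;
}
```

Calling print_memlock_limit() from the test program, run as the same user
that sees the failure, shows which limit applies. If
memlock_limit_allows(4ULL * 1024 * 1024) returns 0 for a non-root user, the
limit is a more likely suspect than the driver or the firmware.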


Dotan

On Wed, Feb 25, 2009 at 7:51 AM, Phillip Wilson <phillipwils at gmail.com> wrote:
> The “ibv_reg_mr()” function call fails on the HCA (DID=0x634A) that uses the
> mlx4_0 driver when the system is under load (memory and CPU).  The system
> usually has over 500MB of system memory when the “ibv_reg_mr()” call fails.
>
>
>
> If I run only the HCA (DID=0x6278) that uses the mthca0 driver, with the
> other tools generating stress, the “ibv_reg_mr()” call always passes.  If I
> run only the HCA (DID=0x634A), with the other tools generating stress,
> the “ibv_reg_mr()” call always fails; it usually takes less than 30
> minutes for the failure to occur.
>
>
>
>
>
> The maximum number of memory regions requested at one time is 8 (32MB)
> with two dual-port HCA cards, and the maximum size of a memory region is
> 1 MB.
>
>
>
> (i.e. ctx->mr = ibv_reg_mr(ctx->pd,
>                            buffer, /* malloc 4MB buffer per process */
>                            size,   /* 2 bytes to 1MB */
>                            IBV_ACCESS_LOCAL_WRITE);
> )
>
>
>
> I modified the ibv_rc_pingpong test to use a parent-child paradigm instead
> of the current client/server approach for my environment.  The code forks a
> parent process and a child process per port, which serves the same purpose as
> the current client/server approach.  The code also forks a process to run on
> each HCA.  Basically, the same code is executed on each HCA; only the user
> libraries (libmlx4.so, libmthca.so), the kernel modules (mlx4.ko, mthca.ko)
> and the firmware differ between the HCAs.
>
>
>
> Since the code in the two user libraries is very similar, I
> suspect the issue is in the kernel code or HCA firmware.
>
>
>
> Does anyone know which kernel patch, from kernel 2.6.24 through 2.6.28,
> fixes this issue?  Has anyone else seen this issue?
>
>
>
> System Information:
>
> The system has 4GB of memory.
>
> uname -a
> Linux (none) 2.6.24.02.02.08 #21 SMP Thu Feb 19 11:04:35 PST 2009 ia64
> unknown
>
> OFED 1.2.5
>
> lspci -d 15b3:
> 0000:10:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex
> (Tavor compatibility mode) (rev 20)
> 0000:c3:00.0 InfiniBand: Mellanox Technologies: Unknown device 634a (rev a0)
>
> lspci -d 15b3: -n
> 0000:10:00.0 0c06: 15b3:6278 (rev 20)
> 0000:c3:00.0 0c06: 15b3:634a (rev a0)
>
> ibv_devinfo -v
> hca_id: mlx4_0
>         fw_ver:                         2.5.000
>
> hca_id: mthca0
>         fw_ver:                         4.8.930
>
