Re: [ofa-general] Mellanox ibv_reg_mr (memory region) function call fails under load when using the mlx4 driver

Phillip Wilson phillipwils at gmail.com
Wed Feb 25 00:29:34 PST 2009


All programs are executed as the root user.

ulimit -a

time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        unlimited
coredump(blocks)     0
memory(kbytes)       unlimited
locked memory(kbytes) unlimited
process              8063
nofiles              1048576
vmemory(kbytes)      unlimited
locks                unlimited
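
As a further sanity check, the pinned-memory limit the process actually sees
at run time can be confirmed with getrlimit(); this is only a small sketch for
illustration, not part of the test code:

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        /* RLIMIT_MEMLOCK is the limit that memory registration pins against */
        if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
            perror("getrlimit(RLIMIT_MEMLOCK)");
            return 1;
        }

        if (rl.rlim_cur == RLIM_INFINITY)
            printf("RLIMIT_MEMLOCK: unlimited\n");
        else
            printf("RLIMIT_MEMLOCK: %lu bytes\n", (unsigned long)rl.rlim_cur);

        return 0;
    }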


On Tue, Feb 24, 2009 at 11:50 PM, Dotan Barak <dotanba at gmail.com> wrote:

> Do you execute your program under the root user or under any other user?
> (maybe you fail because of the ulimit value of memory which can be pinned)
>
>
> Dotan
>
> On Wed, Feb 25, 2009 at 7:51 AM, Phillip Wilson <phillipwils at gmail.com>
> wrote:
> > The “ibv_reg_mr()” function call fails with the HCA (DID=0x634A) that uses
> > the mlx4_0 driver when the system is under load (memory and CPU).  The
> > system usually has over 500MB of free system memory when the “ibv_reg_mr()”
> > call fails.
> >
> >
> >
> > If I run only the HCA (DID=0x6278) that uses the mthca0 driver together
> > with the other tools that generate stress, the “ibv_reg_mr()” call always
> > passes.  If I run only the HCA (DID=0x634A) together with the other tools
> > that generate stress, the “ibv_reg_mr()” call always fails; it usually
> > takes less than 30 minutes for the failure to occur.
> >
> >
> >
> >
> >
> > The maximum number of memory regions requested at any one time is up to 8
> > (32MB) with the two dual-port HCA cards, and the maximum size for a memory
> > region is 1 MB.
> >
> >
> >
> > (i.e.
> >     ctx->mr = ibv_reg_mr(ctx->pd,
> >                          buffer,   /* malloc'd 4MB buffer per process */
> >                          size,     /* 2 bytes to 1MB */
> >                          IBV_ACCESS_LOCAL_WRITE);
> > )
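
For reference, here is a minimal standalone sketch of that registration path
with error reporting, built against libibverbs (-libverbs); the device
selection, buffer size, and access flags below are illustrative and not taken
from the actual test code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <errno.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        struct ibv_device **dev_list = ibv_get_device_list(NULL);
        struct ibv_context *ctx;
        struct ibv_pd *pd;
        struct ibv_mr *mr;
        size_t size = 1024 * 1024;   /* 1 MB, the largest region size used */
        void *buf;

        if (!dev_list || !dev_list[0]) {
            fprintf(stderr, "no IB devices found\n");
            return 1;
        }

        ctx = ibv_open_device(dev_list[0]);  /* first device; pick mlx4_0 in practice */
        pd  = ctx ? ibv_alloc_pd(ctx) : NULL;
        buf = malloc(size);
        if (!ctx || !pd || !buf) {
            fprintf(stderr, "setup failed\n");
            return 1;
        }

        mr = ibv_reg_mr(pd, buf, size, IBV_ACCESS_LOCAL_WRITE);
        if (!mr) {
            /* errno shows whether the failure is ENOMEM, EINVAL, ... */
            fprintf(stderr, "ibv_reg_mr failed: %s (errno %d)\n",
                    strerror(errno), errno);
            return 1;
        }

        printf("registered %lu bytes, lkey=0x%x\n",
               (unsigned long)size, mr->lkey);

        ibv_dereg_mr(mr);
        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(dev_list);
        free(buf);
        return 0;
    }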
> >
> >
> >
> > I modified the ibv_rc_pingpong test to use a parent-child paradigm instead
> > of the current client/server approach for my environment.  The code forks a
> > parent process and a child process per port, which serves the same purpose
> > as the current client/server approach.  The code also forks a process to
> > run on each HCA.  Basically, the same code is executed on each HCA except
> > for the user libraries (libmlx4.so, libmthca.so), mlx4.ko, mthca.ko and the
> > firmware on each HCA.
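
Roughly, the fork structure looks like the sketch below; run_pingpong() is a
hypothetical stand-in for the per-port ping-pong loop, and the HCA and port
counts are illustrative, not the actual modified test:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    /* Placeholder for the per-port ping-pong loop. */
    static void run_pingpong(int hca, int port, int is_parent)
    {
        printf("hca %d port %d %s pid %d\n",
               hca, port, is_parent ? "parent" : "child", (int)getpid());
    }

    int main(void)
    {
        int hca, port;
        const int num_hcas = 2, ports_per_hca = 2;

        for (hca = 0; hca < num_hcas; hca++) {
            if (fork() == 0) {                      /* one process per HCA */
                for (port = 1; port <= ports_per_hca; port++) {
                    pid_t pid = fork();             /* parent/child pair per port */
                    if (pid == 0) {
                        run_pingpong(hca, port, 0); /* child side of the pair */
                        _exit(0);
                    }
                    run_pingpong(hca, port, 1);     /* parent side of the pair */
                    waitpid(pid, NULL, 0);
                }
                _exit(0);
            }
        }

        while (wait(NULL) > 0)                      /* reap the per-HCA processes */
            ;
        return 0;
    }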
> >
> >
> >
> > Since the code in the two user libraries is very similar, I suspect the
> > issue is in the kernel code or the HCA firmware.
> >
> >
> >
> > Does anyone know what kernel patch fixes this issue, starting from kernel
> > 2.6.24 through 2.6.28?  Has anyone else seen this issue?
> >
> >
> >
> > System Information:
> >
> >
> >
> > The system has 4GB of memory.
> >
> >
> >
> > uname -a
> >
> > Linux (none) 2.6.24.02.02.08 #21 SMP Thu Feb 19 11:04:35 PST 2009 ia64
> > unknown
> >
> >
> >
> > OFED 1.2.5
> >
> >
> >
> > lspci -d 15b3:
> >
> >
> >
> > 0000:10:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex
> > (Tavor compatibility mode) (rev 20)
> >
> > 0000:c3:00.0 InfiniBand: Mellanox Technologies: Unknown device 634a (rev a0)
> >
> >
> >
> > lspci -d 15b3: -n
> >
> > 0000:10:00.0 0c06: 15b3:6278 (rev 20)
> >
> > 0000:c3:00.0 0c06: 15b3:634a (rev a0)
> >
> >
> >
> > ibv_devinfo -v
> >
> > hca_id: mlx4_0
> >
> >         fw_ver:                         2.5.000
> >
> >
> >
> > hca_id: mthca0
> >
> >         fw_ver:                         4.8.930
> >
>

