[ofa-general] interesting ibv_reg_mr failures
Dotan Barak
dotanb at dev.mellanox.co.il
Sun Mar 25 02:22:20 PDT 2007
Hi Troy.
I can only comment on the parts of your report that relate to the mthca devices.
Troy Benjegerdes wrote:
> We have been getting some interesting failures with ibv_reg_mr..
>
> gcc -ggdb -libverbs -o mr-test mr-test.c
> /usr/src/ibv-mr-test/mr-test
> mr-test: bufsize 1048576
> device # 0 name="mthca0" guid="00066a0098000464"
> ibv_open_device() context=0x10012c98
> ibv_alloc_pd() pd=0x10013678
> alloc: 2482
> ibv_reg_mr failed:: Cannot allocate memory
> fw_ver: 3.3.2
> max_mr_size 0xffffffffffffffff
> max_mr: 131056, could only register 2482 regions
> sleep 5 sec
> free: 0
> done
I wasn't able to reproduce this failure, but I noticed that you are using
an old FW version (the current version is 3.5.0).
>
> with a 10MB buffer:
>
> gcc -ggdb -libverbs -o mr-test mr-test.c
> /usr/src/ibv-mr-test/mr-test
> mr-test: bufsize 10485760
> device # 0 name="mthca0" guid="00066a0098000464"
> ibv_open_device() context=0x10012c98
> ibv_alloc_pd() pd=0x10013678
> alloc: 2482
> ibv_reg_mr failed:: Cannot allocate memory
> fw_ver: 3.3.2
> max_mr_size 0xffffffffffffffff
> max_mr: 131056, could only register 2482 regions
> sleep 5 sec
> free: 0
> done
On a 64-bit machine I got a kernel oops; bug number 490 was opened in
Bugzilla and we are analyzing this failure.
> And, on an PCI-express mellanox hca:
> /afs/scl.ameslab.gov/user/troy/src/ibv-mr-test/mr-test
> mr-test: bufsize 10485760
> device # 0 name="mthca0" guid="0002c9020040272c"
> ibv_open_device() context=0x504c00
> ibv_alloc_pd() pd=0x503f30
> alloc: 12277
> ibv_reg_mr failed:: Cannot allocate memory
> fw_ver: 5.1.0
> max_mr_size 0xffffffffffffffff
> max_mr: 131056, could only register 12277 regions
> sleep 5 sec
> free: 0
> done
I'm checking this issue and will let you know what I find.
>
> On the pci-express hca, it also looks like the memory usage, as
> reported by "free" goes down by about 300MB once all these regions are
> allocated.. but the process usage as reported by top is only 20mb
> total virtual size. What's going on here?
Are you talking about the "free memory" figure that is reported by top?
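To make it unambiguous which number is meant, here is a minimal sketch that prints the system-wide free memory next to the process's own virtual size. It is not taken from mr-test.c: the helper names read_kb_field and print_memory_snapshot are made up for illustration, and it assumes a Linux /proc layout where /proc/meminfo has a "MemFree:" line and /proc/self/status has a "VmSize:" line, both in kB.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of mr-test.c): scan a /proc-style file for a
 * line beginning with `key` (e.g. "MemFree:") and return its value in kB.
 * Returns -1 if the file cannot be opened or the key is not found.
 */
long read_kb_field(const char *path, const char *key)
{
    char line[256];
    long value = -1;
    size_t keylen;
    FILE *fp;

    assert(path != NULL && key != NULL);
    keylen = strlen(key);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strncmp(line, key, keylen) == 0) {
            long parsed;
            /* %ld skips the spaces or tabs that follow the key */
            if (sscanf(line + keylen, "%ld", &parsed) == 1)
                value = parsed;
            break;
        }
    }
    fclose(fp);
    return value;
}

/* Print system-wide free memory next to this process's virtual size. */
void print_memory_snapshot(const char *label)
{
    long mem_free = read_kb_field("/proc/meminfo", "MemFree:");
    long vm_size = read_kb_field("/proc/self/status", "VmSize:");

    printf("%s: MemFree=%ld kB, VmSize=%ld kB\n", label, mem_free, vm_size);
}
```

Calling print_memory_snapshot("before") ahead of the ibv_reg_mr loop and print_memory_snapshot("after") once registration fails would show both figures side by side, so it is clear whether the drop is in system-wide free memory or in the process's own size.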
Thanks
Dotan