[ofa-general] interesting ibv_reg_mr failures

Dotan Barak dotanb at dev.mellanox.co.il
Sun Mar 25 02:22:20 PDT 2007


Hi Troy.

I can only comment on the parts of your report that relate to the mthca devices.


Troy Benjegerdes wrote:
> We have been getting some interesting failures with ibv_reg_mr..
>
> gcc -ggdb -libverbs -o mr-test mr-test.c
> /usr/src/ibv-mr-test/mr-test
> mr-test: bufsize 1048576
> device # 0 name="mthca0" guid="00066a0098000464"
>         ibv_open_device() context=0x10012c98
>         ibv_alloc_pd() pd=0x10013678
>         alloc: 2482
>         ibv_reg_mr failed:: Cannot allocate memory
>         fw_ver: 3.3.2
>         max_mr_size 0xffffffffffffffff
>         max_mr: 131056, could only register 2482 regions
>         sleep 5 sec
>         free: 0
> done
I wasn't able to reproduce this failure, but I noticed that you are using
an old firmware version (the current version is 3.5.0).
>
> with a 10MB buffer:
>
> gcc -ggdb -libverbs -o mr-test mr-test.c
> /usr/src/ibv-mr-test/mr-test
> mr-test: bufsize 10485760
> device # 0 name="mthca0" guid="00066a0098000464"
>         ibv_open_device() context=0x10012c98
>         ibv_alloc_pd() pd=0x10013678
>         alloc: 2482
>         ibv_reg_mr failed:: Cannot allocate memory
>         fw_ver: 3.3.2
>         max_mr_size 0xffffffffffffffff
>         max_mr: 131056, could only register 2482 regions
>         sleep 5 sec
>         free: 0
> done
On a 64-bit machine I got a kernel oops; bug number 490 was opened in
Bugzilla, and we are analyzing this failure.
> And, on an PCI-express mellanox hca:
> /afs/scl.ameslab.gov/user/troy/src/ibv-mr-test/mr-test
> mr-test: bufsize 10485760
> device # 0 name="mthca0" guid="0002c9020040272c"
>         ibv_open_device() context=0x504c00
>         ibv_alloc_pd() pd=0x503f30
>         alloc: 12277
>         ibv_reg_mr failed:: Cannot allocate memory
>         fw_ver: 5.1.0
>         max_mr_size 0xffffffffffffffff
>         max_mr: 131056, could only register 12277 regions
>         sleep 5 sec
>         free: 0
> done
I'm checking this issue and will let you know what I find.
>
> On the pci-express hca, it also looks like the memory usage, as 
> reported by "free" goes down by about 300MB once all these regions are 
> allocated.. but the process usage as reported by top is only 20mb 
> total virtual size. What's going on here?
Are you talking about the "free memory" that is reported by top?
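
To pin down exactly which two numbers are being compared, a small helper
along the following lines could print both values before and after the
registrations. This is only a sketch, not part of mr-test.c: the names
read_kb_field and print_mem_snapshot are made up here, and it assumes a
Linux /proc whose files contain the usual "Key:   value kB" lines.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the value (in kB) of a "Key:   12345 kB"
 * line from a /proc-style file, or -1 if the file or key is missing. */
static long read_kb_field(const char *path, const char *key)
{
    char line[256];
    size_t keylen = strlen(key);
    long value = -1;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        /* Match "key:" at the start of the line. */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            /* strtol skips the blanks or tabs before the number. */
            value = strtol(line + keylen + 1, NULL, 10);
            break;
        }
    }
    fclose(f);
    return value;
}

/* Print system-wide free memory next to this process's virtual size. */
static void print_mem_snapshot(const char *label)
{
    printf("%s: MemFree %ld kB, VmSize %ld kB\n", label,
           read_kb_field("/proc/meminfo", "MemFree"),
           read_kb_field("/proc/self/status", "VmSize"));
}
```

Calling print_mem_snapshot("before") ahead of the ibv_reg_mr loop and
print_mem_snapshot("after") once it fails would show whether the drop
appears in the system-wide MemFree figure while the process VmSize stays
small.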

thanks
Dotan


