[ofa-general] interesting ibv_reg_mr failures

Troy Benjegerdes troy at scl.ameslab.gov
Sun Mar 25 17:41:11 PDT 2007


On Mar 25, 2007, at 4:22 AM, Dotan Barak wrote:

> Hi Troy.
>
> I can only answer the part of your report that relates to the mthca
> devices.
>
>
> Troy Benjegerdes wrote:
>> We have been getting some interesting failures with ibv_reg_mr.
>>
>> gcc -ggdb -libverbs -o mr-test mr-test.c
>> /usr/src/ibv-mr-test/mr-test
>> mr-test: bufsize 1048576
>> device # 0 name="mthca0" guid="00066a0098000464"
>>         ibv_open_device() context=0x10012c98
>>         ibv_alloc_pd() pd=0x10013678
>>         alloc: 2482
>>         ibv_reg_mr failed:: Cannot allocate memory
>>         fw_ver: 3.3.2
>>         max_mr_size 0xffffffffffffffff
>>         max_mr: 131056, could only register 2482 regions
>>         sleep 5 sec
>>         free: 0
>> done
> I wasn't able to reproduce this failure, but I noticed that you are
> using an old FW version (the current version is 3.5.0).
>>
>> with a 10MB buffer:
>>
>> gcc -ggdb -libverbs -o mr-test mr-test.c
>> /usr/src/ibv-mr-test/mr-test
>> mr-test: bufsize 10485760
>> device # 0 name="mthca0" guid="00066a0098000464"
>>         ibv_open_device() context=0x10012c98
>>         ibv_alloc_pd() pd=0x10013678
>>         alloc: 2482
>>         ibv_reg_mr failed:: Cannot allocate memory
>>         fw_ver: 3.3.2
>>         max_mr_size 0xffffffffffffffff
>>         max_mr: 131056, could only register 2482 regions
>>         sleep 5 sec
>>         free: 0
>> done
> On a 64-bit machine I got a kernel oops; bug number 490 was opened in
> Bugzilla and we are analyzing this failure.
>> And, on a PCI Express Mellanox HCA:
>> /afs/scl.ameslab.gov/user/troy/src/ibv-mr-test/mr-test
>> mr-test: bufsize 10485760
>> device # 0 name="mthca0" guid="0002c9020040272c"
>>         ibv_open_device() context=0x504c00
>>         ibv_alloc_pd() pd=0x503f30
>>         alloc: 12277
>>         ibv_reg_mr failed:: Cannot allocate memory
>>         fw_ver: 5.1.0
>>         max_mr_size 0xffffffffffffffff
>>         max_mr: 131056, could only register 12277 regions
>>         sleep 5 sec
>>         free: 0
>> done
> I'm checking this issue and will let you know what I find.
>>
>> On the PCI Express HCA, it also looks like free memory, as
>> reported by "free", goes down by about 300 MB once all these regions
>> are allocated, but the process usage as reported by top is only
>> 20 MB total virtual size. What's going on here?
> Are you talking about the "free memory" which is reported by top?

Both: the free memory reported by 'top' and the free memory reported
by the 'free' command on Debian.

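For anyone who wants to put numbers on this comparison, here is a minimal
sketch, assuming a Linux /proc filesystem. It is not part of mr-test.c; the
helper names read_proc_kb and print_mem_snapshot are made up for
illustration. It prints the system-wide MemFree value from /proc/meminfo
(which, as far as I know, is where both 'top' and 'free' get their numbers)
next to the calling process's own VmSize from /proc/self/status.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from mr-test.c): find a "Key:  <n> kB" line in a
 * /proc-style file and return <n>, or -1 if the file or key is missing. */
static long read_proc_kb(const char *path, const char *key)
{
    FILE *f = fopen(path, "r");
    char line[256];
    long value = -1;
    size_t keylen = strlen(key);

    if (!f)
        return -1;

    while (fgets(line, sizeof line, f)) {
        /* Match the key exactly, followed by the ':' separator. */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            if (sscanf(line + keylen + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}

/* Hypothetical helper: print system-wide free memory next to this
 * process's total virtual size, both in kB. */
static void print_mem_snapshot(const char *label)
{
    long mem_free = read_proc_kb("/proc/meminfo", "MemFree");
    long vm_size = read_proc_kb("/proc/self/status", "VmSize");

    printf("%s: MemFree=%ld kB, VmSize=%ld kB\n", label, mem_free, vm_size);
}
```

Calling print_mem_snapshot("before") ahead of the ibv_reg_mr loop and
print_mem_snapshot("after") once registration fails would show whether
MemFree drops by far more than VmSize grows, which is the gap described
above.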