[ofa-general] interesting ibv_reg_mr failures

Troy Benjegerdes troy at scl.ameslab.gov
Fri Mar 23 10:30:55 PDT 2007


We have been getting some interesting failures with ibv_reg_mr.

The max_mr value reported by the devices is on the order of 100K
regions, yet we are only able to register under 2500 regions on a
Mellanox card, and around 4800 regions on an ehca.

A testcase is available here:

http://source.scl.ameslab.gov/hg/ibv-mr-test?f=b117b624511e;file=mr-test.c

raw wget-able form:
http://source.scl.ameslab.gov/hg/ibv-mr-test?f=b117b624511e;file=mr-test.c;style=raw


Here's what the output looks like:

gcc -ggdb -libverbs -o mr-test mr-test.c
/usr/src/ibv-mr-test/mr-test
mr-test: bufsize 1048576
device # 0 name="mthca0" guid="00066a0098000464"
         ibv_open_device() context=0x10012c98
         ibv_alloc_pd() pd=0x10013678
         alloc: 2482
         ibv_reg_mr failed:: Cannot allocate memory
         fw_ver: 3.3.2
         max_mr_size 0xffffffffffffffff
         max_mr: 131056, could only register 2482 regions
         sleep 5 sec
         free: 0
done
device # 1 name="ehca0" guid="000255000001c900"
         ibv_open_device() context=0x10012c98
         ibv_alloc_pd() pd=0x10012080
         alloc: 3067
         free: 0
done

With a 10MB buffer:

gcc -ggdb -libverbs -o mr-test mr-test.c
/usr/src/ibv-mr-test/mr-test
mr-test: bufsize 10485760
device # 0 name="mthca0" guid="00066a0098000464"
         ibv_open_device() context=0x10012c98
         ibv_alloc_pd() pd=0x10013678
         alloc: 2482
         ibv_reg_mr failed:: Cannot allocate memory
         fw_ver: 3.3.2
         max_mr_size 0xffffffffffffffff
         max_mr: 131056, could only register 2482 regions
         sleep 5 sec
         free: 0
done
device # 1 name="ehca0" guid="000255000001c900"
         ibv_open_device() context=0x10012c98
         ibv_alloc_pd() pd=0x10012080
         alloc: 4119   PID264f ehca0 EHCA_ERR:ehcau_reg_mr ibv_cmd_reg_mr ret=c
         alloc: 4120
         ibv_reg_mr failed:: Cannot allocate memory
         fw_ver:
         max_mr_size 0x200000000
         max_mr: 61382, could only register 4120 regions
         sleep 5 sec
         free: 0
done

And on a PCI Express Mellanox HCA:
/afs/scl.ameslab.gov/user/troy/src/ibv-mr-test/mr-test
mr-test: bufsize 10485760
device # 0 name="mthca0" guid="0002c9020040272c"
         ibv_open_device() context=0x504c00
         ibv_alloc_pd() pd=0x503f30
         alloc: 12277
         ibv_reg_mr failed:: Cannot allocate memory
         fw_ver: 5.1.0
         max_mr_size 0xffffffffffffffff
         max_mr: 131056, could only register 12277 regions
         sleep 5 sec
         free: 0
done
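
To put the shortfall in numbers, here is a small C sketch that computes
how much of the advertised max_mr each failing run actually reached. It
is not part of mr-test.c; the struct, the labels and the function names
are made up for illustration, and the figures are copied from the
output above.

```c
#include <assert.h>
#include <stdio.h>

/* One row per failing run, copied from the mr-test output. */
struct mr_run {
    const char *label;        /* illustrative label, not from the tool */
    unsigned long max_mr;     /* max_mr printed for the device */
    unsigned long registered; /* regions registered before the failure */
};

const struct mr_run mr_runs[] = {
    { "mthca0, fw 3.3.2",              131056UL,  2482UL },
    { "ehca0, 10MB buffer",             61382UL,  4120UL },
    { "mthca0 PCI Express, fw 5.1.0",  131056UL, 12277UL },
};
const int mr_run_count = sizeof mr_runs / sizeof mr_runs[0];

/* Percentage of the advertised max_mr that was actually registered. */
double mr_percent_used(unsigned long registered, unsigned long max_mr)
{
    assert(max_mr > 0);
    return 100.0 * (double)registered / (double)max_mr;
}

/* Print one summary line per run. */
void print_mr_summary(const struct mr_run *runs, int n)
{
    for (int i = 0; i < n; i++)
        printf("%-30s %6lu of %6lu regions (%.1f%% of max_mr)\n",
               runs[i].label, runs[i].registered, runs[i].max_mr,
               mr_percent_used(runs[i].registered, runs[i].max_mr));
}
```

Calling print_mr_summary(mr_runs, mr_run_count) from a main() shows that
the three runs reach roughly 1.9%, 6.7% and 9.4% of max_mr respectively.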

On the PCI Express HCA, it also looks like available memory, as
reported by "free", goes down by about 300MB once all these regions
are allocated, but the process usage as reported by top is only 20MB
total virtual size. What's going on here?
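
As a rough back-of-the-envelope check, the 300MB figure can be divided
by the number of regions registered. The C sketch below does that
arithmetic. It rests on several assumptions that are not in the
output above: that "about 300MB" means 300 * 1024 * 1024 bytes, that
the whole drop is caused by the registrations, and that the page size
is 4096 bytes. The function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Average bytes of system memory consumed per registered region. */
double bytes_per_region(double total_bytes, unsigned long regions)
{
    assert(regions > 0);
    return total_bytes / (double)regions;
}

/* Number of pages needed to cover a buffer of bufsize bytes. */
unsigned long pages_per_buffer(unsigned long bufsize, unsigned long page_size)
{
    assert(page_size > 0);
    return (bufsize + page_size - 1) / page_size;
}

/* Print the estimate for the PCI Express run with the 10MB buffer. */
void print_overhead_estimate(void)
{
    const double drop = 300.0 * 1024.0 * 1024.0; /* "about 300MB": assumed MiB */
    const unsigned long regions = 12277UL;       /* from the mr-test output */
    const unsigned long bufsize = 10485760UL;    /* from the mr-test output */
    const unsigned long page_size = 4096UL;      /* assumed, not in the post */

    double per_region = bytes_per_region(drop, regions);
    unsigned long pages = pages_per_buffer(bufsize, page_size);

    printf("about %.0f bytes per region, %lu pages per buffer, "
           "about %.1f bytes per page\n",
           per_region, pages, per_region / (double)pages);
}
```

Under those assumptions this works out to roughly 25KB per region, or
about 10 bytes for each 4096-byte page of the buffer. That is only an
estimate and says nothing about where the memory actually goes.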



