[ofa-general] interesting ibv_reg_mr failures
Troy Benjegerdes
troy at scl.ameslab.gov
Fri Mar 23 10:30:55 PDT 2007
We have been getting some interesting failures with ibv_reg_mr.
The max_mr number is on the order of 100K regions, yet we are only
able to register under 2500 regions on a Mellanox card, and around
4800 regions on an ehca.
A testcase is available here:
http://source.scl.ameslab.gov/hg/ibv-mr-test?f=b117b624511e;file=mr-test.c
Raw wget-able form:
http://source.scl.ameslab.gov/hg/ibv-mr-test?f=b117b624511e;file=mr-test.c;style=raw
Here's what the output looks like:
gcc -ggdb -libverbs -o mr-test mr-test.c
/usr/src/ibv-mr-test/mr-test
mr-test: bufsize 1048576
device # 0 name="mthca0" guid="00066a0098000464"
ibv_open_device() context=0x10012c98
ibv_alloc_pd() pd=0x10013678
alloc: 2482
ibv_reg_mr failed:: Cannot allocate memory
fw_ver: 3.3.2
max_mr_size 0xffffffffffffffff
max_mr: 131056, could only register 2482 regions
sleep 5 sec
free: 0
done
device # 1 name="ehca0" guid="000255000001c900"
ibv_open_device() context=0x10012c98
ibv_alloc_pd() pd=0x10012080
alloc: 3067
free: 0
done
With a 10MB buffer:
gcc -ggdb -libverbs -o mr-test mr-test.c
/usr/src/ibv-mr-test/mr-test
mr-test: bufsize 10485760
device # 0 name="mthca0" guid="00066a0098000464"
ibv_open_device() context=0x10012c98
ibv_alloc_pd() pd=0x10013678
alloc: 2482
ibv_reg_mr failed:: Cannot allocate memory
fw_ver: 3.3.2
max_mr_size 0xffffffffffffffff
max_mr: 131056, could only register 2482 regions
sleep 5 sec
free: 0
done
device # 1 name="ehca0" guid="000255000001c900"
ibv_open_device() context=0x10012c98
ibv_alloc_pd() pd=0x10012080
alloc: 4119 PID264f ehca0 EHCA_ERR:ehcau_reg_mr
ibv_cmd_reg_mr ret=c
alloc: 4120
ibv_reg_mr failed:: Cannot allocate memory
fw_ver:
max_mr_size 0x200000000
max_mr: 61382, could only register 4120 regions
sleep 5 sec
free: 0
done
And on a PCI Express Mellanox HCA:
/afs/scl.ameslab.gov/user/troy/src/ibv-mr-test/mr-test
mr-test: bufsize 10485760
device # 0 name="mthca0" guid="0002c9020040272c"
ibv_open_device() context=0x504c00
ibv_alloc_pd() pd=0x503f30
alloc: 12277
ibv_reg_mr failed:: Cannot allocate memory
fw_ver: 5.1.0
max_mr_size 0xffffffffffffffff
max_mr: 131056, could only register 12277 regions
sleep 5 sec
free: 0
done
On the PCI Express HCA, it also looks like the free memory, as
reported by "free", goes down by about 300MB once all these regions
are allocated, but the process usage as reported by top is only 20MB
total virtual size. What's going on here?