Re: [ofa-general] ib_reg_phys_mr( ) results in crash

neutron neutronsharc at gmail.com
Fri Feb 20 12:44:12 PST 2009


When we installed OFED, we used "<OFED_1.3.1_dir>/install.pl --all",
so we expected it to have installed everything.

"ofed_info" shows that "ofa_kernel-1.3.1" is installed, but
"ofa_kernel_devel" is not.  What is that package for, and where can we get it?
It does not seem to be in "<OFED_1.3.1_dir>/SRPMS".  Thanks.
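
For context on why the header package might matter: Liang's reply, quoted
further down, describes ofa_kernel_devel as the OFED kernel headers. If a
module were compiled against headers whose struct layout differs from the
one the loaded core was built with, an indirect call could read its function
pointer from the wrong slot. The userspace sketch below only illustrates that
general C hazard. The struct and function names are hypothetical (this is not
the real struct ib_device), and nothing here confirms that this is what
happened in our case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*op_fn)(void);

/* Hypothetical layout the running core was built with. */
struct dev_ops_runtime {
    op_fn query;
    op_fn extra_op;      /* member that exists only in this header version */
    op_fn reg_phys_mr;
};

/* Hypothetical layout from a different (stale) header version. */
struct dev_ops_stale {
    op_fn query;
    op_fn reg_phys_mr;
};

int real_query(void)       { return 1; }
int real_reg_phys_mr(void) { return 2; }

/* Build an ops table the way the "running core" would: extra_op unset. */
struct dev_ops_runtime make_runtime_ops(void)
{
    struct dev_ops_runtime ops = { real_query, NULL, real_reg_phys_mr };
    return ops;
}

/*
 * Return the pointer a module compiled against the stale header would load
 * when it asks for reg_phys_mr: it uses the stale offset on the runtime
 * object. memcpy is used so the read itself is well defined.
 */
op_fn slot_seen_by_stale_module(const struct dev_ops_runtime *ops)
{
    op_fn fn;

    assert(ops != NULL);
    memcpy(&fn,
           (const unsigned char *)ops + offsetof(struct dev_ops_stale, reg_phys_mr),
           sizeof fn);
    return fn;
}
```

In this sketch slot_seen_by_stale_module() returns NULL even though
ops.reg_phys_mr is set, so a call through the returned pointer would jump to
address 0.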

Below is the output given by "ofed_info".
-----------------
OFED-1.3.1
libibverbs:
git://git.openfabrics.org/ofed_1_3/libibverbs.git ofed_1_3
commit 40b771aa6a9c0ad092b2e20775b4723d3b173792
libmthca:
git://git.openfabrics.org/ofed_1_3/libmthca.git ofed_1_3
commit 9501e698d257949acfab2edc90812602966dbcc9
libmlx4:
git://git.openfabrics.org/ofed_1_3/libmlx4.git ofed_1_3
commit 3869d6dab7e12fe452270ca641f7dd7082b42482
libehca:
git://git.openfabrics.org/ofed_1_3/libehca.git ofed_1_3
commit fd898180cfa3b737f893f432a80b91bac3396325
libipathverbs:
git://git.openfabrics.org/ofed_1_3/libipathverbs.git ofed_1_3
commit 82be4d81859d1fd2edf830220fe65a9923b80a46
libcxgb3:
git://git.openfabrics.org/ofed_1_3/libcxgb3.git ofed_1_3
commit 6f7485feb244d8571fcab2292ef92c97bea48df0
libnes:
git://git.openfabrics.org/ofed_1_3/libnes.git ofed_1_3
commit 471fa2e5a7bb2f8946119396358c31adcc6c2fb3
libibcm:
git://git.openfabrics.org/ofed_1_3/libibcm.git ofed_1_3
commit 53ec35f544bbc1838bbadc2210909c25a954a5e2
librdmacm:
git://git.openfabrics.org/ofed_1_3/librdmacm.git ofed_1_3
commit a0ef80a1e0d5debdae48a844fbc8d09aec5b24b1
dapl1:
git://git.openfabrics.org/ofed_1_3/dapl1.git ofed_1_3
commit 7a9b58d6c50fc0a357de540ec3eb2ab2e07f8779
dapl2:
git://git.openfabrics.org/ofed_1_3/dapl2.git ofed_1_3
commit 2583f07d9d0f55eee14e0b0e6074bc6fd0712177
libsdp:
git://git.openfabrics.org/ofed_1_3/libsdp.git ofed_1_3
commit c8102dccc502930442b23de658674d386456b350
sdpnetstat:
git://git.openfabrics.org/ofed_1_3/sdpnetstat.git ofed_1_3
commit 3341620a7259c4f7bdd4180864b98e260c3dc223
srptools:
git://git.openfabrics.org/ofed_1_3/srptools.git ofed_1_3
commit e0ce2d42eeb25f8e89b8f6daaa32a630c9b64f0d
perftest:
git://git.openfabrics.org/ofed_1_3/perftest.git ofed_1_3
commit 6321b5468f7293088cc003809049c02b176130d8
qlvnictools:
git://git.openfabrics.org/ofed_1_3/qlvnictools.git ofed_1_3
commit 086f9cb80ee790d61bddaf201ecbae32a2ff21dd
tvflash:
git://git.openfabrics.org/ofed_1_3/tvflash.git ofed_1_3
commit f5e7407a7f2058448df5e5320d9843f944427429
mstflint:
git://git.openfabrics.org/ofed_1_3/mstflint.git ofed_1_3
commit 78bbd3d521a9078553a991111ffb6f76665b9ee9

qperf:
git://git.openfabrics.org/ofed_1_3/qperf.git ofed_1_3
commit 6221aabd038df0b7033e035378ca190641ed2295
management:
git://git.openfabrics.org/ofed_1_3/management.git ofed_1_3
commit d9c852406dae14e8284f9cfb1c7f495bbb55fddf
ibutils:
git://git.openfabrics.org/ofed_1_3/ibutils.git ofed_1_3
commit 7daf94fab6eaf307316326f3f49704e6080a1508
ibsim:
git://git.openfabrics.org/ofed_1_3/ibsim.git ofed_1_3
commit 55113d9f919709c7c97ea41d29991941b9c8be70

ofa_kernel-1.3.1:
Git:
git://git.openfabrics.org/ofed_1_3/linux-2.6.git ofed_kernel
commit 39e1dc833f98e5134f91fcf7f33df402adf4bc0c

# MPI
mvapich-1.0.1-2533.src.rpm
mvapich2-1.0.3-1.src.rpm
openmpi-1.2.6-1.src.rpm
mpitests-3.0-773.src.rpm


-----------------

On Fri, Feb 20, 2009 at 3:21 AM, Liang Zhen <Zhen.Liang at sun.com> wrote:
> Hmm, I didn't see any problem in your code. Have you installed
> ofa_kernel_devel (kernel headers of  OFED) after installation of
> ofa_kernel_1_3_1?
>
> Regards
> Liang
>
> neutron:
>>
>> I'm using Mellanox HCA 'mthca0' type: MT25208, kernel version:
>> 2.6.18-53.1.14.el5,  ofed 1.3.1.
>>
>> The failed function call is like:
>>
>> {
>>
>> ctx->send_buf = dma_alloc_coherent(ctx->ib_dev->dma_device, MAX_SIZE,
>>                &dma_addr, GFP_KERNEL);
>>
>> ctx->phy_buf[0].addr = dma_addr;
>> ctx->phy_buf[0].size = MAX_SIZE;
>> ctx->iovstart = (u64) ctx->send_buf;
>>
>> printk("pd=%p, phy_buf[0].addr=%016llx,size=%llu, iovstart=%llx\n",
>>       ctx->pd, (unsigned long long) ctx->phy_buf[0].addr,
>>       (unsigned long long) ctx->phy_buf[0].size,
>>       (unsigned long long) ctx->iovstart);
>>
>> send_mr = ib_reg_phys_mr( ctx->pd, &ctx->phy_buf[0], 1,
>>                        IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ
>>                         | IB_ACCESS_LOCAL_WRITE, &(ctx->iovstart));
>> }
>>
>> The phy_buf[0] is a "ib_phys_buf" corresponding to "ctx->send_buf".
>>
>> Below is /var/log/messages output around the crash.
>> ----------------
>> Feb 19 12:50:22 wci30 kernel:  pd=ffff8101da3ddce0, phy_buf[0].addr=00000001bbe4b000,size=1024, iovstart=ffff8101bbe4b000
>>
>> Feb 19 12:50:22 wci30 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
>> Feb 19 12:50:22 wci30 kernel:  [<0000000000000000>] _stext+0x7ffff000/0x1000
>> Feb 19 12:50:22 wci30 kernel: PGD 1c06d5067 PUD 1c9dcd067 PMD 0
>> Feb 19 12:50:22 wci30 kernel: Oops: 0010 [1] SMP
>> Feb 19 12:50:22 wci30 kernel: last sysfs file: /module/libata/version
>> Feb 19 12:50:22 wci30 kernel: CPU 0
>> Feb 19 12:54:05 wci30 syslogd 1.4.1: restart.
>> Feb 19 12:54:05 wci30 kernel: klogd 1.4.1, log source = /proc/kmsg started.
>> Feb 19 12:54:05 wci30 kernel: Linux version 2.6.18-53.1.14.el5 (brewbuilder at hs20-bc2-3.build.redhat.com) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Tue Feb 19 07:18:46 EST 2008
>> Feb 19 12:54:05 wci30 kernel: Command line: ro root=LABEL=/ rhgb quiet
>>
>> ====================
>> It's strange that the kernel doesn't print out the function call stack
>> before crashing.
>>
>> Any hints?  Thanks a lot!
>>
>> On Wed, Feb 18, 2009 at 7:40 PM, Roland Dreier <rdreier at cisco.com> wrote:
>>
>>>
>>>  > Before calling ib_reg_phys_mr,  printk() shows that all its arguments
>>>  > are valid.  But the system always crashes immediately after entering
>>>  > the function ib_reg_phys_mr( ).    Any possible reasons ?  Thanks!!
>>>
>>> What do you mean by "immediately after entering ib_reg_phys_mr()"?  Do
>>> you get an oops message?  If so that would be very important info for
>>> debugging this.
>>>
>>> - R.
>>>
>>>
>>
>> _______________________________________________
>> general mailing list
>> general at lists.openfabrics.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>>
>> To unsubscribe, please visit
>> http://openib.org/mailman/listinfo/openib-general
>>
>
>

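A note on the "Oops: 0010" line in the log quoted above: on x86 the number
printed after "Oops:" is the page-fault error code, and its bits can be
decoded by hand. Below is a small userspace sketch of that decoding. The
macro and function names are chosen for this example only, and the output
wording is my own.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* x86 page-fault error code bits. */
#define PF_PROT   (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE  (1u << 1)  /* 0: read access,      1: write access         */
#define PF_USER   (1u << 2)  /* 0: kernel mode,      1: user mode            */
#define PF_RSVD   (1u << 3)  /* reserved bit set in a paging structure       */
#define PF_INSTR  (1u << 4)  /* fault was caused by an instruction fetch     */

/*
 * Write a short description of a page-fault error code into buf.
 * Returns the value returned by snprintf().
 */
int describe_pf_error(unsigned int code, char *buf, size_t len)
{
    const char *cause  = (code & PF_PROT) ? "protection violation"
                                          : "not-present page";
    const char *access = (code & PF_INSTR) ? "instruction fetch"
                       : (code & PF_WRITE) ? "write"
                                           : "read";
    const char *mode   = (code & PF_USER) ? "user" : "kernel";

    assert(buf != NULL);
    return snprintf(buf, len, "%s; %s; %s mode", cause, access, mode);
}
```

For 0x0010 this gives "not-present page; instruction fetch; kernel mode",
which is consistent with the RIP of 0000000000000000 in the log: the CPU
faulted while trying to execute at address 0, as it would after a call
through a NULL function pointer.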

