[openib-general] Re: [PATCHv3] kDAPL: remove use of HANDLE's (vs. r2564)

Hal Rosenstock halr at voltaire.com
Thu Jun 9 12:11:49 PDT 2005


On Wed, 2005-06-08 at 16:44, Tom Duffy wrote: 
> This just in!  Don't know if it was caused by your patches or mine, but I
> ran the server on 192.168.0.26, and tried to connect with the client, it
> didn't connect right away, so I did a control-c.  Then, the kernel
> panic'ed:
> 
> [root@sins-stinger-10 ~]# ./kdapltest -T Q -d -s 192.168.0.26 -D mthca0a
> Server Name: 192.168.0.26
> Server Net Address: 192.168.0.26
> DT_cs_Client: Starting Test ...
> DT_cs_Client: IA mthca0a opened
> DT_cs_Client: EP created
> *****  DAPL  Characteristics  *****
> Provider: mthca0a  Version 1.0  DAPL 1.2
> Adapter: Generic InfiniBand HCA by Linux Version 0.0
> Supporting:
>         64512 EPs with 65535 DTOs and 0 in RDMA/RDs and 0ut RDMA/RDs each
>         65408 EVDs of up to 65535 entries  (default S/R size is 256/256)
>         IOVs of up to 28 elements
>         131056 LMRs (and 131056 RMRs) of up to 0xffffffffffffffff bytes
>         Maximum MTU 0x80000000 bytes, RDMA 0x80000000 bytes
>         Maximum Private data size 92 bytes
> ***** ***** ***** ***** ***** *****
> DT_cs_Client: Posting 1 recv buffer
> DT_cs_Client: Connect Endpoint
> DT_cs_Client: Await connection ...
> 
> <-- I DID A CONTROL-C HERE -->
> 
> [root@sins-stinger-10 ~]# dapl_path_comp_handler: path resolution failed -110 retry 1802201964!!!
> d<ap4>l_ibpa_atht:_c romeqp__henand:dl peren: d epff_pfftr81 000x65ab622b6bbe46b0 6bal6bre6bad6by
> completed? status 3
> dapl_path_comp_handler: path resolution failed -110 retry 1802201965!!!
> dapl_path_comp_handler: ep_ptr 0x6b6b6b6b6b6b6b6b
> general protection fault: 0000 [1] SMP
> CPU 0
> Modules linked in: kdapltest ib_dat_provider dat ib_at ib_ipoib ib_sdp ib_cm md5 ipv6 parport_pc lp parport autofs4 nfs lockd rfcomm l2cap bluetooth pcmcia yenta_socket rsrc_nonstatic pcmcia_core sunrpc ext3 jbd dm_mod video container button battery ac ohci_hcd tpm_nsc tpm i2c_amd756 i2c_core ib_mthca ib_sa ib_mad ib_core tg3 floppy xfs exportfs mptscsih mptbase sd_mod scsi_mod
> Pid: 11881, comm: ib_at_wq/0 Not tainted 2.6.12-rc6openib
> RIP: 0010:[<ffffffff88309c2d>] <ffffffff88309c2d>{:ib_dat_provider:dapl_evd_connection_callback+67}
> RSP: 0018:ffff8100218b9dc8  EFLAGS: 00010296
> RAX: 6b6b6b6b6b6b6b6b RBX: ffff81005a22be40 RCX: 0000000000004008
> RDX: ffff810075baaaf8 RSI: ffffffff883141a0 RDI: 0000000000000048
> RBP: ffff81005a22be70 R08: 6b6b6b6b6b6b6b6b R09: 0000000000000000
> R10: 0000000000000010 R11: 0000000000000010 R12: ffff81007d9b8c50
> R13: ffff81005a22be40 R14: 0000000000000292 R15: ffffffff882f5280
> FS:  00002aaaaaad7d60(0000) GS:ffffffff804e7880(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00007fffffd1dce6 CR3: 00000000769db000 CR4: 00000000000006e0
> Process ib_at_wq/0 (pid: 11881, threadinfo ffff8100218b8000, task ffff81004167caa0)
> Stack: ffff81004167caa0 6b6b6b6b6b6b6b6b 0000000000000000 000040080000006e
>        ffff810075baaaf8 ffff81003fdc8f10 0000000000000000 0000000000000003
>        ffff8100218b9e48 ffffffff88302d18
> Call Trace:<ffffffff88302d18>{:ib_dat_provider:dapl_path_comp_handler+416}
>        <ffffffff882f529b>{:ib_at:req_comp_work+27} <ffffffff8014912c>{worker_thread+476}
>        <ffffffff801327c0>{default_wake_function+0} <ffffffff8014d7d0>{keventd_create_kthread+0}
>        <ffffffff80148f50>{worker_thread+0} <ffffffff8014d7d0>{keventd_create_kthread+0}
>        <ffffffff8014da59>{kthread+217} <ffffffff80133d10>{schedule_tail+64}
>        <ffffffff8010f6db>{child_rip+8} <ffffffff8014d7d0>{keventd_create_kthread+0}
>        <ffffffff8014d980>{kthread+0} <ffffffff8010f6d3>{child_rip+0}
> 
> 
> Code: 48 8b 80 90 00 00 00 48 89 44 24 28 c7 44 24 30 00 00 00 00
> RIP <ffffffff88309c2d>{:ib_dat_provider:dapl_evd_connection_callback+67} RSP <ffff8100218b9dc8>
>  <<0>3>geSlneabra cl orprruotpteciotin:on s ftaaurtlt=f: ff00f80010 [072]5b aa<4af>S8,MP l en
> 51CP2        =
> 1R e<dz4>on
> e:M 0odx5ula2escf l07in1/ke0xd 5ain2c:f0 k71da.
> teLastst  iusb_erda: t_[<prffovffidfferff8 d83at02 3fib5>_a]t( daibpl_i_dpoesibtro iy_b_cms_dpid +0ibxb_c9/m0 xbmde 5[ ibip_dv6at _pparorpviordter_p])c l
> p<04>f0 p:a rp6bor t6 ba ut6bofs 64b n 6fsb  l6bock 6db  rf6bco mm6d l 62cba p6b bl 6uebto 6otbh  6bpc m6ciba  6bye
> aP_sreocv keobtj:<4 s> tarsrtrc=f_nffonf8st10at07ic5b aapc8emc0,ia l_cenor=5e1 2 nRrpedcz<on4e> : ex0xt3170 jfcbd2a 5/dm0x_m17od0fc v2aid5.eo
>  Lcoasntt aiusneerr:  [bu<fttffonfff bffat88te0ary21 91ac>] (ohkmciem_h_acdllo tc+pm0x_n61sc/0 xetp0 m[ xfi2s]c_)am
> d750060: i2 0c_0co 0re0  i00b_ m0th0ca 00 i b_00sa  00ib _m00ad 0 i1b _c00or ec 0tg a32  fl0o6pp 0y0  xf00s  0ex0p
> tf01s0 :mp 0ts0c si02h  m00ptb 0as0e  00sd _m00od 0 s0c si00_mo 0d2
> 4>Pi 0d:0 <114>88 82,0  coa3mm : 06ib_ 0at0_w 0q/01  N00ot         <
> taiNentxted o 2bj.6:. s12ta-rrtc6=fopffenf8ib10
> 075RIbaP:ad 01001, 0:le[<n=ff51ff2
> ffRe88dz30on9ce:2d 0>]x1 70<4fc><2af5f/ff0xff17ff0f88c230a59c.
> >L{:asibt _dusater_p: ro[<viffdeffr:ffdaffpl88_e0avd21_c91on>]nec(ktimeonm__calallolbc+ac0xk+6167/0}xe
> 0 R[xSPfs:] 0)01<48:>
> f0f80010:0 5900c3 dd0c08  0 E0F LA00GS : 00000 010029 06
>  0R0AX:<4 6> b601b6 b600b6b a6b06b f6be  RB00X:  f04fff 081000 05a02
> e4010 0:RC X:00 00 000200 00000 000004 00008                        2b
>  00RD<X:4> f 0ff0f 810000 750b1aa 0af08  R00SI : cbfff 0ff5ff f0488 310041a 00 0RD
> I: 0000000000000048
> RBP: ffff81
> Message from syslogd at sins-st0inger-10 at Wed 0Jun  8 13:36:54 2005 ...
> sins-5stinger-10 kerneal: general protection fault: 00020 [1] SMP
> 2be70 R08: 6b6b6b6b6b6b6b6b R09: 0000000000000033
> R10: 0000000000000010 R11: 0000000000000010 R12: ffff81007d9b8cd0
> R13: ffff81005a22be40 R14: 0000000000000292 R15: ffffffff882f5280
> FS:  00002aaaaae0ae60(0000) GS:ffffffff804e7900(0000) knlGS:0000000000000000
> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00007fffffbd2e96 CR3: 000000007cb2f000 CR4: 00000000000006e0
> Process ib_at_wq/1 (pid: 11882, threadinfo ffff810059c3c000, task ffff81004167d190)
> Stack: 00000000ffffff92 6b6b6b6b6b6b6b6b 0000000000000000 000040087d9b8d10
>        ffff810075baaaf8 0000000000000001 0000000000000092 0000000000000003
>        ffff810059c3de48 ffffffff88302d18
> Call Trace:<ffffffff88302d18>{:ib_dat_provider:dapl_path_comp_handler+416}
>        <ffffffff882f529b>{:ib_at:req_comp_work+27} <ffffffff8014912c>{worker_thread+476}
>        <ffffffff801327c0>{default_wake_function+0} <ffffffff8014d7d0>{keventd_create_kthread+0}
>        <ffffffff80148f50>{worker_thread+0} <ffffffff8014d7d0>{keventd_create_kthread+0}
>        <ffffffff8014da59>{kthread+217} <ffffffff80133d10>{schedule_tail+64}
>        <ffffffff8010f6db>{child_rip+8} <ffffffff8014d7d0>{keventd_create_kthread+0}
>        <ffffffff8014d980>{kthread+0} <ffffffff8010f6d3>{child_rip+0}
> 
> 
> Code: 48 8b 80 90 00 00 00 48 89 44 24 28 c7 44 24 30 00 00 00 00
> RIP <ffffffff88309c2d>{:ib_dat_provider:dapl_evd_connection_callback+67} RSP <ffff810059c3ddc8>

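The output is somewhat garbled (two oopses are interleaved), but it
appears that the connection was destroyed out from under the path
resolution (during dapl_ib_connect): by the time dapl_path_comp_handler
runs, ep_ptr is 0x6b6b6b6b6b6b6b6b, which is the slab free-poison
pattern. It looks like there is another case to protect against :-(
This one is particularly nasty.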
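
One possible way to guard against this kind of case, shown only as a
rough user-space sketch and not as the actual kDAPL fix: let the pending
path resolution request hold its own reference on the endpoint, and have
the completion handler check a teardown flag before it retries. All of
the names below (struct ep, ep_get, ep_put, connect_start, ep_destroy,
path_comp_handler) are hypothetical stand-ins, and the plain int
reference count assumes a single thread; kernel code would need an
atomic count and proper locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the endpoint; NOT the real kDAPL structure. */
struct ep {
    int  refcnt;        /* one reference per owner, including a pending request */
    bool disconnecting; /* set once teardown has begun */
    int  retries;       /* number of path resolution retries issued */
};

static int ep_frees;    /* counts how many times an endpoint was released */

static struct ep *ep_create(void)
{
    struct ep *ep = calloc(1, sizeof(*ep));
    if (ep)
        ep->refcnt = 1; /* the caller's reference */
    return ep;
}

static void ep_get(struct ep *ep)
{
    ep->refcnt++;
}

static void ep_put(struct ep *ep)
{
    if (--ep->refcnt == 0) {
        ep_frees++;
        free(ep);
    }
}

/* Start path resolution: the pending request pins the endpoint. */
static struct ep *connect_start(struct ep *ep)
{
    ep_get(ep);
    return ep; /* stands in for the context handed to the async request */
}

/* Teardown (e.g. after a control-C): mark the ep, drop the caller's reference. */
static void ep_destroy(struct ep *ep)
{
    ep->disconnecting = true;
    ep_put(ep);
}

/*
 * Completion handler. Returns true if it decided to retry.
 * This sketch always drops the request's reference on exit; real code
 * that resubmits would carry the reference over to the new request.
 */
static bool path_comp_handler(struct ep *ep, int status)
{
    bool retried = false;

    if (status != 0 && !ep->disconnecting) {
        ep->retries++;
        retried = true;
    }
    ep_put(ep); /* may free the endpoint if teardown already happened */
    return retried;
}

/* Replays the sequence from the report: connect, destroy, then a timeout. */
static int demo_destroy_during_resolution(void)
{
    struct ep *ep = ep_create();
    if (!ep)
        return -1;

    struct ep *ctx = connect_start(ep); /* refcnt 1 -> 2 */
    ep_destroy(ep);                     /* refcnt 2 -> 1, memory still valid */
    assert(ep_frees == 0);

    /* -110 is the status seen in the log; handler sees valid memory. */
    bool retried = path_comp_handler(ctx, -110); /* refcnt 1 -> 0, freed here */
    return retried ? 1 : 0;
}
```

In this model the endpoint memory stays valid until the last holder
drops it, so the handler never dereferences freed memory, and the
teardown flag keeps it from retrying on a dying endpoint. Cancelling the
outstanding request during destroy would be an alternative, if the
request interface allows it.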

-- Hal



