[openib-general] [kDAPL][PATCH] fix panic in server side
Hal Rosenstock
halr at voltaire.com
Wed May 4 23:00:56 PDT 2005
On Wed, 2005-05-04 at 17:59, Tom Duffy wrote:
> This occurred when I did a connect without first setting up a listen
> using kdapltest.
>
> Unable to handle kernel NULL pointer dereference at 0000000000000020 RIP:
> <ffffffff882b2c61>{:ib_dat_provider:dapl_cm_passive_cb_handler+27}
> PGD 371ab067 PUD 36822067 PMD 0
> Oops: 0000 [1] SMP
> CPU 2
> Modules linked in: kdapltest ib_dat_provider dat ib_at ib_cm ib_ipoib ib_sa ib_umad md5 ipv6 parport_pc lp parport autofs4 nfs lockd sunrpc pcmcia yenta_socket rsrc_nonstatic pcmcia_core ext3 jbd video container button battery ac uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core ib_mthca ib_mad ib_core e1000 floppy dm_snapshot dm_zero dm_mirror xfs exportfs dm_mod mptscsih mptbase sd_mod scsi_mod
> Pid: 11326, comm: ib_cm/2 Not tainted 2.6.12-rc3openib
> RIP: 0010:[<ffffffff882b2c61>] <ffffffff882b2c61>{:ib_dat_provider:dapl_cm_passive_cb_handler+27}
> RSP: 0018:ffff810038f65d90 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffff81000b3764b0 RCX: ffff81000b376550
> RDX: 0000000000000056 RSI: ffff81000b376548 RDI: ffff81003f0cd8a0
> RBP: ffff81003f0cd8a0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000020 R11: ffff81003925ae10 R12: ffff81003f0cd8a0
> R13: ffff81003dcb8a88 R14: 0000000000000000 R15: ffff81000b3765a0
> FS: 0000000000000000(0000) GS:ffffffff80487a00(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 0000000000000020 CR3: 0000000038bbd000 CR4: 00000000000006e0
> Process ib_cm/2 (pid: 11326, threadinfo ffff810038f64000, task ffff81003e75af30)Stack: ffffffff882aae32 0100000000000293 0000000000000000 ffff81000b3764b0
> 000000003f0cd8a0 ffff81003dcb8a88 0000000000000000 ffff81000b3765a0
> ffffffff882abd1e ffff810001e1b680
> Call Trace:<ffffffff882aae32>{:ib_cm:cm_process_work+50} <ffffffff882abd1e>{:ib_cm:cm_work_handler+1438}
> <ffffffff801435f8>{do_sigaction+520} <ffffffff882ab780>{:ib_cm:cm_work_handler+0}
> <ffffffff801489ac>{worker_thread+476} <ffffffff801321c0>{default_wake_function+0}
> <ffffffff80130283>{__wake_up_common+67} <ffffffff801487d0>{worker_thread+0}
> <ffffffff8014cfe0>{keventd_create_kthread+0} <ffffffff8014d259>{kthread+217}
> <ffffffff801336d0>{schedule_tail+64} <ffffffff8010f57b>{child_rip+8}
> <ffffffff8014cfe0>{keventd_create_kthread+0} <ffffffff8014d180>{kthread+0}
> <ffffffff8010f573>{child_rip+0}
>
> Code: 8b 40 20 89 44 24 04 83 7c 24 04 03 74 30 83 7c 24 04 03 77
> RIP <ffffffff882b2c61>{:ib_dat_provider:dapl_cm_passive_cb_handler+27} RSP <ffff810038f65d90>
> CR2: 0000000000000020
>
> This cuteness happened in the kdapltest userland program:
>
> Warning: conn_event_wait DAT_CONNECTION_EVENT_DISCONNECTED
> DT_cs_Client: bad connection event
> DT_cs_Client: Cleaning Up ...
> DT_cs_Client: IA mthca0a closed
> DT_cs_Client: ========== End of Work -- Client Exiting
> TEST INSTANCE 1
> *** glibc detected *** free(): invalid next size (fast): 0x000000000050ca50 ***
> Aborted (core dumped)
Hmm, when I do this (on x86 and ), I get the following:
DT_cs_Client: Connect Endpoint
DT_cs_Client: Await connection ...
Warning: conn_event_wait DAT_CONNECTION_EVENT_DISCONNECTED
DT_cs_Client: bad connection event
DT_cs_Client: Cleaning Up ...
DT_cs_Client: IA mthca0a closed
DT_cs_Client: ========== End of Work -- Client Exiting
This takes a while until REQ timeout/retries are exhausted.
On x86_64, I do get the following (same as you in userland):
DT_cs_Client: Connect Endpoint
DT_cs_Client: Await connection ...
Warning: conn_event_wait DAT_CONNECTION_EVENT_DISCONNECTED
DT_cs_Client: bad connection event
DT_cs_Client: Cleaning Up ...
DT_cs_Client: IA mthca0a closed
DT_cs_Client: ========== End of Work -- Client Exiting
*** glibc detected *** double free or corruption: 0x000000000050c010 ***
Aborted
but no oops. That is a kdapltest test program issue as you noted.
How long did you wait? Did you Ctrl-C out?
I'm presuming you loaded IPoIB and are pointing the client at an IPoIB
address which is pingable but does not have a kDAPL server (connection listener)
started up. I don't understand why the passive CM handler would be
invoked without a server started.
Would you describe your procedure in more detail so I can see if I can
recreate this?
-- Hal