[openib-general] Re: [kDAPL][PATCH] fix panic in server side

James Lentini jlentini at netapp.com
Fri May 6 10:35:10 PDT 2005


> Tom,
>
> There were two changes in this patch:
>
> - check for a NULL comm_id pointer
> - add error returns
>
> My understanding of the subsequent discussion is that:
>
> - we should not have been using the comm_id->state value. We should
>  have been using the event->event value instead
>
> - if we return a non 0 error code, our CM ID will be destroyed
>
> Given those pieces of information:
>
> Do we still want to check for a NULL comm_id pointer?
> Should we always return 0 from this function?
>
> I think the answer is yes to both of the above, but I want to make sure there 
> is consensus.

On second thought, I'd like to retract that last statement. If the 
comm_id pointer is NULL, that strikes me as a bug somewhere that needs 
to be fixed. An debug assert to verify that the comm_id is not null 
seems like a better option.

>
> james
>
> On Wed, 4 May 2005, Tom Duffy wrote:
>
>> This occurred when I did a connect without first setting up a listen
>> using kdapltest.
>> 
>> Unable to handle kernel NULL pointer dereference at 0000000000000020 RIP:
>> <ffffffff882b2c61>{:ib_dat_provider:dapl_cm_passive_cb_handler+27}
>> PGD 371ab067 PUD 36822067 PMD 0
>> Oops: 0000 [1] SMP
>> CPU 2
>> Modules linked in: kdapltest ib_dat_provider dat ib_at ib_cm ib_ipoib ib_sa 
>> ib_umad md5 ipv6 parport_pc lp parport autofs4 nfs lockd sunrpc pcmcia 
>> yenta_socket rsrc_nonstatic pcmcia_core ext3 jbd video container button 
>> battery ac uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core ib_mthca ib_mad 
>> ib_core e1000 floppy dm_snapshot dm_zero dm_mirror xfs exportfs dm_mod 
>> mptscsih mptbase sd_mod scsi_mod
>> Pid: 11326, comm: ib_cm/2 Not tainted 2.6.12-rc3openib
>> RIP: 0010:[<ffffffff882b2c61>] 
>> <ffffffff882b2c61>{:ib_dat_provider:dapl_cm_passive_cb_handler+27}
>> RSP: 0018:ffff810038f65d90  EFLAGS: 00010202
>> RAX: 0000000000000000 RBX: ffff81000b3764b0 RCX: ffff81000b376550
>> RDX: 0000000000000056 RSI: ffff81000b376548 RDI: ffff81003f0cd8a0
>> RBP: ffff81003f0cd8a0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000020 R11: ffff81003925ae10 R12: ffff81003f0cd8a0
>> R13: ffff81003dcb8a88 R14: 0000000000000000 R15: ffff81000b3765a0
>> FS:  0000000000000000(0000) GS:ffffffff80487a00(0000) 
>> knlGS:0000000000000000
>> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> CR2: 0000000000000020 CR3: 0000000038bbd000 CR4: 00000000000006e0
>> Process ib_cm/2 (pid: 11326, threadinfo ffff810038f64000, task 
>> ffff81003e75af30)Stack: ffffffff882aae32 0100000000000293 0000000000000000 
>> ffff81000b3764b0
>>       000000003f0cd8a0 ffff81003dcb8a88 0000000000000000 ffff81000b3765a0
>>       ffffffff882abd1e ffff810001e1b680
>> Call Trace:<ffffffff882aae32>{:ib_cm:cm_process_work+50} 
>> <ffffffff882abd1e>{:ib_cm:cm_work_handler+1438}
>>       <ffffffff801435f8>{do_sigaction+520} 
>> <ffffffff882ab780>{:ib_cm:cm_work_handler+0}
>>       <ffffffff801489ac>{worker_thread+476} 
>> <ffffffff801321c0>{default_wake_function+0}
>>       <ffffffff80130283>{__wake_up_common+67} 
>> <ffffffff801487d0>{worker_thread+0}
>>       <ffffffff8014cfe0>{keventd_create_kthread+0} 
>> <ffffffff8014d259>{kthread+217}
>>       <ffffffff801336d0>{schedule_tail+64} <ffffffff8010f57b>{child_rip+8}
>>       <ffffffff8014cfe0>{keventd_create_kthread+0} 
>> <ffffffff8014d180>{kthread+0}
>>       <ffffffff8010f573>{child_rip+0}
>> 
>> Code: 8b 40 20 89 44 24 04 83 7c 24 04 03 74 30 83 7c 24 04 03 77
>> RIP <ffffffff882b2c61>{:ib_dat_provider:dapl_cm_passive_cb_handler+27} RSP 
>> <ffff810038f65d90>
>> CR2: 0000000000000020
>> 
>> This cuteness happened in the kdapltest userland program:
>> 
>> Warning: conn_event_wait DAT_CONNECTION_EVENT_DISCONNECTED
>> DT_cs_Client: bad connection event
>> DT_cs_Client: Cleaning Up ...
>> DT_cs_Client: IA mthca0a closed
>> DT_cs_Client: ========== End of Work -- Client Exiting
>> TEST INSTANCE 1
>> *** glibc detected *** free(): invalid next size (fast): 0x000000000050ca50 
>> ***
>> Aborted (core dumped)
>> 
>> This patches fixes two things.  It makes it so that
>> dapl_cm_passive_cb_handler won't dereference a potentially NULL ib_cm_id
>> and also that it returns ret instead of 0 all the time.
>> 
>> Signed-off-by: Tom Duffy <tduffy at sun.com>
>> 
>> Index: dapl_openib_cm.c
>> ===================================================================
>> --- linux-kernel/dat-provider/dapl_openib_cm.c	(revision 2257)
>> +++ linux-kernel/dat-provider/dapl_openib_cm.c	(working copy)
>> @@ -227,6 +227,9 @@ int dapl_cm_passive_cb_handler(struct ib
>> {
>> 	int ret = 0;
>> 
>> +	if (!comm_id)
>> +		return -1;
>> +
>> 	switch (comm_id->state) {
>> 	case IB_CM_IDLE:
>> 		ret = do_passive_idle(comm_id);
>> @@ -247,7 +250,7 @@ int dapl_cm_passive_cb_handler(struct ib
>> 		break;
>> 	}
>> 
>> -	return 0;
>> +	return ret;
>> }
>> 
>> static void dapl_ib_destroy_cm_id_work(void *data)
>> 
>



More information about the general mailing list