[ewg] RE: OFED-1.3-beta sdp issue

Jim Mott jim at mellanox.com
Wed Dec 5 05:31:08 PST 2007


Hi,
  This looks very much like bug 793 (https://bugs.openfabrics.org/show_bug.cgi?id=793). The sk_buff definition changed in 2.6.22 and later kernels.

  Could you verify that the fix posted in the bug is in your sdp_bcopy.c (or just send me your drivers/infiniband/ulp/sdp/sdp_bcopy.c file)? This fix went in as a patch that the build process applies, not as a change to the base code, so perhaps it is not being applied for PPC. I'll check it out.

Thanks,
Jim

Jim Mott
Mellanox Technologies Ltd.
mail: jim at mellanox.com
Phone: 512-294-5481

From: Stefan Roscher [mailto:stefan.roscher at de.ibm.com] 
Sent: Wednesday, December 05, 2007 4:32 AM
To: Jim Mott
Cc: Hoang-Nam Nguyen; Christoph Raisch; ewg at lists.openfabrics.org
Subject: OFED-1.3-beta sdp issue


Hi Jim, 

During the OFED-1.3-beta2 test on ppc64 systems with SLES10-SP1, I saw the following issue.

I booted Linux kernels 2.6.22 and 2.6.23 on SLES10-SP1, and NetPIPE over SDP fails with the following oops:


REGS: c000000008ccf930 TRAP: 0700   Not tainted  (2.6.23-ppc64)                 
MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 24000044  XER: 00000005                 
TASK = c000000008ccb6a0[25] 'events/6' THREAD: c000000008ccc000 CPU: 6           
GPR00: c000000000322b98 c000000008ccfbb0 c000000000680048 0000000000000087       
GPR04: 0000000000000000 0000000000000000 0000000000000000 000000000024a7d8       
GPR08: 0000001bac9151b0 c0000000005c8108 c0000001daa87b58 c0000000005c8110       
GPR12: 0000000000000000 c00000000059a300 0000000000000000 0000000000000000       
GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000002100000       
GPR20: c00000000054de98 c0000001a0bc4b00 0000000000000001 0000000000000000       
GPR24: 0000000000000000 c0000001beb7d000 00000000beb7d014 0000000000000006       
GPR28: c0000001aae86100 c0000001d433c080 c00000000062ef28 c0000001db841880       
NIP [c000000000322b9c] .skb_over_panic+0x50/0x58                                 
LR [c000000000322b98] .skb_over_panic+0x4c/0x58                                 
Call Trace:                                                                     
[c000000008ccfbb0] [c000000000322b98] .skb_over_panic+0x4c/0x58 (unreliable)     
[c000000008ccfc40] [d000000000559df0] .sdp_poll_cq+0x380/0xa68 [ib_sdp]         
[c000000008ccfd10] [d00000000055a8fc] .sdp_work+0xe8/0x10c [ib_sdp]             
[c000000008ccfda0] [c000000000076fac] .run_workqueue+0x118/0x208                 
[c000000008ccfe40] [c000000000077f70] .worker_thread+0xcc/0xf0                   
[c000000008ccff00] [c00000000007caa4] .kthread+0x78/0xc4                         
[c000000008ccff90] [c000000000026be4] .kernel_thread+0x4c/0x68                   
Instruction dump:                                                               
80a30068 e8e300b8 e90300c0 812300ac 814300b0 2fa00000 409e0008 e81e8028         
e87e8038 f8010070 4bd3e4d1 60000000 <0fe00000> 48000000 7c0802a6 faa1ffa8   

This issue occurs only on the two kernels mentioned above. 

My question is: is this the bug you described here: https://bugs.openfabrics.org/show_bug.cgi?id=807

Or should I open a new one?

Kind regards,

Stefan Roscher 


