[openib-general] A Couple of CM Questions

Hal Rosenstock halr at voltaire.com
Tue Mar 8 11:45:37 PST 2005


Hi Sean,

My main question is about an error path in cm_req_handler. If
cm_init_av fails (at line 1098 or 1103), I get the following crash:

Mar  8 14:19:04 localhost kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Mar  8 14:19:04 localhost kernel:  printing eip:
Mar  8 14:19:04 localhost kernel: d09db042
Mar  8 14:19:04 localhost kernel: *pde = 0ba1d067
Mar  8 14:19:04 localhost kernel: *pte = 00000000
Mar  8 14:19:04 localhost kernel: Oops: 0000 [#1]
Mar  8 14:19:04 localhost kernel: Modules linked in: ib_cm ib_umad ide_cd cdrom lp ipv6 autofs parport_pc parport uhci_hcd ehci_hcd ib_mthca ib_mad ib_core ohci_hcd eepro100 mii evdev usbcore
Mar  8 14:19:04 localhost kernel: CPU:    0
Mar  8 14:19:04 localhost kernel: EIP:    0060:[<d09db042>]    Tainted: P      VLI
Mar  8 14:19:04 localhost kernel: EFLAGS: 00010286   (2.6.10) 
Mar  8 14:19:04 localhost kernel: EIP is at cm_alloc_msg+0x42/0x100 [ib_cm]
Mar  8 14:19:04 localhost kernel: eax: 00000000   ebx: cf641800   ecx: 00000000   edx: cfffa340
Mar  8 14:19:04 localhost kernel: esi: c1ca9400   edi: cf641958   ebp: 00000000   esp: c30d5e38
Mar  8 14:19:04 localhost kernel: ds: 007b   es: 007b   ss: 0068
Mar  8 14:19:04 localhost kernel: Process ib_cm/0 (pid: 4948, threadinfo=c30d4000 task=c2863aa0)
Mar  8 14:19:04 localhost kernel: Stack: cffff560 000000d0 00000028 00000000 c1ca9400 00000000 00000004 d09e0330 
Mar  8 14:19:04 localhost kernel:        c1ca9400 c30d5e80 33215650 000040a9 0000407e 00000296 ffffffc2 00000282 
Mar  8 14:19:04 localhost kernel:        0400407e 000040a9 00000246 00000292 c1ca9400 ffffffea c1c90ea8 d09dc531 
Mar  8 14:19:04 localhost kernel: Call Trace:
Mar  8 14:19:04 localhost kernel:  [<d09e0330>] ib_send_cm_rej+0x70/0x2d0 [ib_cm]
Mar  8 14:19:04 localhost kernel:  [<d09dc531>] ib_destroy_cm_id+0x3b1/0x780 [ib_cm]
Mar  8 14:19:04 localhost kernel:  [<c0217ecb>] rb_erase+0x4b/0xf0
Mar  8 14:19:04 localhost kernel:  [<d09dd96c>] cm_req_handler+0x16c/0x780 [ib_cm]
Mar  8 14:19:04 localhost kernel:  [<d09e3490>] cm_work_handler+0x0/0x130 [ib_cm]
Mar  8 14:19:04 localhost kernel:  [<d09e34c2>] cm_work_handler+0x32/0x130 [ib_cm]
Mar  8 14:19:04 localhost kernel:  [<c012e751>] worker_thread+0x251/0x470
Mar  8 14:19:04 localhost kernel:  [<c0112b20>] default_wake_function+0x0/0x20
Mar  8 14:19:04 localhost kernel:  [<c0112b20>] default_wake_function+0x0/0x20
Mar  8 14:19:04 localhost kernel:  [<c012e500>] worker_thread+0x0/0x470
Mar  8 14:19:04 localhost kernel:  [<c013525a>] kthread+0xaa/0xb0
Mar  8 14:19:04 localhost kernel:  [<c01351b0>] kthread+0x0/0xb0
Mar  8 14:19:04 localhost kernel:  [<c0100885>] kernel_thread_helper+0x5/0x10
Mar  8 14:19:04 localhost kernel: Code: 74 24 20 89 04 24 e8 be 05 77 ef 89 c3 b8 f4 ff ff ff 85 db 0f 84 b5 00 00 00 b9 56 00 00 00 89 df 89 e8 f3 ab 8b 86 8c 00 00 00 <8b> 10 8d 86 a0 00 00 00 89 44 24 04 8b 42 04 8b 40 04 89 04 24 
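
To make the trace easier to discuss: the faulting instruction in the
Code line (<8b> 10) loads through eax, which is 0, and eax was just
loaded from offset 0x8c of the structure in esi. So cm_alloc_msg seems
to be following a pointer that was never set, which would fit a cm_id
whose address vector was not initialized before ib_destroy_cm_id tried
to send a REJ. Below is a minimal user-space sketch of that pattern and
one possible guard. All of the demo_* names and types are hypothetical
stand-ins, not the real ib_cm structures, and I am only guessing that
the unset pointer is the port in the address vector.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real ib_cm types. */
struct demo_port { int port_num; };
struct demo_av { struct demo_port *port; };
struct demo_cm_id { struct demo_av av; int rej_sent; };

/* Models an address-vector init that can fail and leave av->port unset. */
static int demo_init_av(struct demo_av *av, struct demo_port *port)
{
    if (port == NULL)
        return -EINVAL;
    av->port = port;
    return 0;
}

/* Models sending a REJ, which needs a valid port to allocate a message. */
static int demo_send_rej(const struct demo_cm_id *id)
{
    if (id->av.port == NULL)
        return -EINVAL;   /* without this check we would dereference NULL */
    return 0;
}

/* Models the destroy path: only send the REJ if init got far enough. */
static void demo_destroy(struct demo_cm_id *id)
{
    if (id->av.port != NULL && demo_send_rej(id) == 0)
        id->rej_sent = 1;
}
```

With a zeroed demo_cm_id whose init failed, demo_destroy returns without
touching the port. Whether the real fix belongs in the destroy path or
in the cm_req_handler error path is the part I am unsure about.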

Also, it appears to me that the comm IDs in the CM messages are not
converted to network byte order on the IB "wire". This causes no
interoperability problem, but it makes the IDs slightly less clean to
read in a trace.
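
For illustration, here is a minimal user-space sketch of what I mean by
storing a comm ID in network (big-endian) byte order. The demo_cm_hdr
layout and the helper names are hypothetical and are not taken from the
CM code; htonl and ntohl are the standard user-space conversions, used
here only as a stand-in for whatever the kernel code would use. My
understanding is that the lack of conversion is harmless because the
peer simply echoes the ID back without interpreting it.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical header fragment; fields hold big-endian values. */
struct demo_cm_hdr {
    uint32_t local_comm_id;
    uint32_t remote_comm_id;
};

/* Store a host-order ID in wire (big-endian) order. */
static void demo_set_local_comm_id(struct demo_cm_hdr *hdr, uint32_t id)
{
    hdr->local_comm_id = htonl(id);
}

/* Read the wire-order ID back into host order. */
static uint32_t demo_get_local_comm_id(const struct demo_cm_hdr *hdr)
{
    return ntohl(hdr->local_comm_id);
}
```

With this, an ID of 0x12345678 shows up on the wire as the bytes
12 34 56 78 regardless of host endianness.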

Thanks for your help with this.

-- Hal



