[ofa-general] crash in ipoib

Sean Hefty sean.hefty at intel.com
Tue Jun 12 11:03:24 PDT 2007


Copying ofa general list.

We've now seen a crash similar to this a total of 4 times.

These are x64 systems running kernel 2.6.9-42.EL. The crashes only seem to occur
on a specific set of systems in our cluster.

The latest crash has a stack trace similar to the one listed below.

Badness in i8042_panic_blink at drivers/input/serio/i8042.c:992
i8042_panic_blink + 485
panic + 445
apic_timer_interrupt + 133
oops_end + 38
oops_end + 65
do_page_fault + 1204
ipoib_cm_send + 433
error_exit
ipoib_ib_completion + 0
ipoib_cm_handle_rx_wc + 239

(the trace goes on and on)

- Sean

>>No known issues with IPoIB. Can you send the command line and all
>>details on the machine you are working on?
>>Also, do you have the oops printout?
>
>Woody will need to provide details on the machine. Here's what's available
>from the oops printout (it might not be related to ipoib or cm):
>
>(top portion is cut off)
>Badness in i8042_panic_blink at drivers/input/serio/i8042.c:992
>i8042_panic_blink + 485
>panic + 445
>apic_timer_interrupt + 133
>oops_end + 38
>oops_end + 65
>do_page_fault + 1204
>error_exit
>ipoib_ib_completion
>ipoib_cm_handle_rx_wc + 378
>ipoib_ib_completion + 144
>usb_hcd_irq
>mthca_eq_int + 221
>ret_from_intr
>mthca_tavor_interrupt + 95
>handle_IRQ_event
>do_IRQ
>ret_from_intr
>csum_partial + 725
>skb_checksum + 308
>ip_conntrack:tcp_error + 312
>ip_conntrack_in + 163
>try_to_wake_up + 876
>nf_iterate + 82
>ip_rcv_finish
>ip_rcv + 1119
>netif_receive_skb + 791
>process_backlog + 136
>net_rx_action
>do_softirq
>do_IRQ
>ret_from_intr
>spin_unlock_irqrestore
>ib_send_cm_rep
>ib_ipoib_cm_rx_handler
>cm_alloc_msg
>ib_send_cm_rtu
>ipoib_cm_rx_event_handler
>ib_find_cached_pkey
>cm_process_work
>cm_req_handler
>cm_work_handler
>cm_work_handler
>worker_thread
>blah blah blah


