[ofa-general] Re: [PATCH 4/4] [RFC] IPoIB/cm: Add connected mode support for devices without SRQs

Pradeep Satyanarayana pradeeps at linux.vnet.ibm.com
Fri Nov 2 09:50:47 PDT 2007


Pradeep Satyanarayana wrote:
> Roland Dreier wrote:
>> FWIW, I left netpipe-tcp running in a loop overnight over a connected
>> mode IPoIB interface on a system running my for-2.6.25 tree (plus a
>> hack to use the non-SRQ code on mlx4 by forcing create SRQ to fail).
>> It ran with no problems (and transferred nearly a billion packets and
>> 10 TB of data).
>>
> Yes, it definitely seems much better with the for-2.6.25 tree and it all 
> seems to go off well. Except for one crash in cache_alloc_refill() 
> all of the other test runs have completed. BTW, I have been using SLAB thus
> far. I will switch to SLUB and see if that makes any difference.
> 
> And thanks for testing it out on mlx4.

I ran into some crashes with SLUB too. Maybe it is PPC64-specific. What machine
did you run the mlx4 tests on?

However, here is a stack trace with e1000 (I was running netperf on IB when I
saw this crash) that indicates that this is unlikely to be an IB issue.


0:mon> t
[c00000000ffff7f0] c000000000361340 .skb_release_data+0xf0/0x120
[c00000000ffff880] c000000000360ee0 .kfree_skbmem+0x20/0x130
[c00000000ffff900] c0000000003cc118 .__udp4_lib_rcv+0x408/0x950
[c00000000ffffa10] c00000000039efc0 .ip_local_deliver_finish+0x170/0x340
[c00000000ffffab0] c00000000039eb28 .ip_rcv_finish+0x198/0x4c0
[c00000000ffffb70] c00000000036a268 .netif_receive_skb+0x3a8/0x6e0
[c00000000ffffc40] d0000000003b34b0 .e1000_clean_rx_irq+0x250/0x6c0 [e1000]
[c00000000ffffd50] d0000000003b08a0 .e1000_clean+0x2e0/0x390 [e1000]
[c00000000ffffe10] c00000000036daa0 .net_rx_action+0x1f0/0x2a0
[c00000000ffffed0] c000000000064d48 .__do_softirq+0xe8/0x1e0
[c00000000fffff90] c00000000002ad88 .call_do_softirq+0x14/0x24
[c0000000006678a0] c00000000000c2b8 .do_softirq+0x88/0xe0
[c000000000667930] c000000000064f04 .irq_exit+0x74/0x90
[c0000000006679b0] c00000000000cccc .do_IRQ+0xec/0x1e0
[c000000000667a40] c000000000004780 hardware_interrupt_entry+0x18/0x98
--- Exception: 501 (Hardware Interrupt) at c00000000003d94c .pseries_dedicated_idle_sleep+0xdc/0x1c0
[c000000000667d30] 00000000021464d8 (unreliable)
[c000000000667dd0] c00000000001207c .cpu_idle+0x13c/0x250
[c000000000667e60] c0000000004349b8 .rest_init+0x78/0x90
[c000000000667ee0] c000000000510a24 .start_kernel+0x354/0x400
[c000000000667f90] c000000000434930 .start_here_common+0x54/0x64
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c00000000ffff4d0]
    pc: c0000000000a842c: .put_page+0x2c/0x1a0
    lr: c000000000361340: .skb_release_data+0xf0/0x120
    sp: c00000000ffff750
   msr: 8000000000009032
   dar: 6e65747065726600
 dsisr: 40000000
  current = 0xc00000000058e450
  paca    = 0xc00000000058ed00
    pid   = 0, comm = swapper
0:mon>
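
One detail in the dump above: the faulting address in dar, 6e65747065726600,
does not look like a kernel pointer; its bytes fall in the printable ASCII
range. A minimal sketch for decoding it follows (Python, not part of the
original test run; the helper name decode_dar is made up for illustration, and
it assumes the register bytes are read in big-endian order, as on this PPC64
machine):

```python
def decode_dar(hex_value: str) -> str:
    """Interpret a register value, given as a hex string, as ASCII bytes."""
    raw = bytes.fromhex(hex_value)
    # Show non-printable bytes (such as a trailing NUL) as '.'.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# dar value copied from the xmon output above
dar = "6e65747065726600"
print(decode_dar(dar))  # prints "netperf."
```

If that reading is right, put_page() was handed a page pointer that had been
overwritten with the string "netperf" followed by a NUL, which would suggest
memory corruption near the skb's fragment data. This is only an observation
from the register value, not a confirmed diagnosis.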

I would like to pursue this a bit further since I think this is an issue that
needs to be addressed. I pulled from your for-2.6.25 git tree. Is this the same
as Linus' 2.6.24-rc1?

Pradeep



