[openib-general] Fwd: Re: Problems with OFED IPoIB HA on SLES10

Michael S. Tsirkin mst at mellanox.co.il
Wed Oct 4 05:46:57 PDT 2006


Another point: the crash seems to happen while we
are requeueing packets through dev_queue_xmit upon
path record completion.
It looks like this could try to requeue even though the
interface is going down. Could this trigger some problems?
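A minimal userspace model in C may help illustrate the suspected race. It is not the ib_ipoib code: struct fake_dev, enqueue, fake_xmit, fake_dev_down and path_rec_complete are hypothetical stand-ins for the net device, the per-path packet queue, dev_queue_xmit, the interface-down path and path_rec_completion. The assumption being modelled is that the completion handler checks, under the same lock the down path takes, whether the interface is still up, and frees the queued packets instead of requeueing them if it is not.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued sk_buff. */
struct pkt {
    int id;
    struct pkt *next;
};

/* Hypothetical stand-in for the net device plus one path's queue. */
struct fake_dev {
    pthread_mutex_t lock;  /* protects 'up' and 'queue' */
    bool up;               /* cleared by the interface-down path */
    struct pkt *queue;     /* packets waiting for a path record */
    int sent;              /* packets handed back to the transmit path */
    int dropped;           /* packets freed because the device was down */
};

/* Queue a packet while the path record lookup is pending. */
static void enqueue(struct fake_dev *dev, int id)
{
    struct pkt *p = malloc(sizeof *p);
    if (p == NULL)
        abort();
    p->id = id;
    pthread_mutex_lock(&dev->lock);
    p->next = dev->queue;
    dev->queue = p;
    pthread_mutex_unlock(&dev->lock);
}

/* Stand-in for dev_queue_xmit: counts the packet as resubmitted and frees it. */
static void fake_xmit(struct fake_dev *dev, struct pkt *p)
{
    dev->sent++;
    free(p);
}

/* Interface-down path: mark the device down under the lock. */
static void fake_dev_down(struct fake_dev *dev)
{
    pthread_mutex_lock(&dev->lock);
    dev->up = false;
    pthread_mutex_unlock(&dev->lock);
}

/* Path record completion: drain the queue while holding the lock, so the
 * "is the device up?" check cannot race with fake_dev_down(). */
static void path_rec_complete(struct fake_dev *dev)
{
    pthread_mutex_lock(&dev->lock);
    struct pkt *p = dev->queue;
    dev->queue = NULL;
    while (p != NULL) {
        struct pkt *next = p->next;
        if (dev->up) {
            fake_xmit(dev, p);   /* device still up: resubmit */
        } else {
            dev->dropped++;      /* device going down: drop, do not requeue */
            free(p);
        }
        p = next;
    }
    pthread_mutex_unlock(&dev->lock);
}
```

This only shows the intended ordering. In the kernel the locking rules are stricter (dev_queue_xmit is documented as requiring interrupts to be enabled, so it cannot simply be called under spin_lock_irqsave), so a real fix would need a different mechanism.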


Quoting r. Michael S. Tsirkin <mst at mellanox.co.il>:
Subject: Fwd: Re: Problems with OFED IPoIB HA on SLES10

BTW, any ideas?
ipoib_ha is just a script that brings interfaces up and down and configures them,
so it seems this crash could also happen on systems without it.

-- 
MST

Date: Tue, 3 Oct 2006 22:39:54 -0700
From: "Scott Weitzenkamp (sweitzen)" <sweitzen at cisco.com>
Subject: Re: [openib-general] Problems with OFED IPoIB HA on SLES10

If I fail back and forth between ib0 and ib1 every 30 seconds or so for several hours while IPoIB traffic is running, the IPoIB host gets an Oops and IPoIB stops working:
 
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
general protection fault: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 7
Modules linked in: af_packet ib_sdp rdma_ucm rdma_cm ib_addr ib_cm ib_ipoib ib_sa
ib_uverbs ib_umad ib_mthca ib_mad ib_core nls_utf8 st ipv6 nfs lockd nfs_acl sunrpc
button battery ac apparmor aamatch_pcre loop usbhid dm_mod hw_random ide_cd
ehci_hcd uhci_hcd cdrom i8xx_tco ide_floppy usbcore shpchp e1000 pci_hotplug floppy
reiserfs edd fan thermal processor siimage sg mptspi mptscsih mptbase scsi_transport_spi
piix sd_mod scsi_mod ide_disk ide_core
Pid: 23541, comm: ib_mad1 Tainted: G     U 2.6.16.21-0.8-smp #1
RIP: 0010:[<ffffffff802cffea>] <ffffffff802cffea>{_spin_lock_irqsave+3}
RSP: 0018:ffff810132a4fc20  EFLAGS: 00010086
RAX: 0000000000000286 RBX: 0000000000000000 RCX: ffffffff883324ee
RDX: ffff810128d5e380 RSI: 0000000000000000 RDI: 0000ffff1b6017ff
RBP: 00000000fffffffc R08: ffffffff803d3260 R09: ffff810140333800
R10: ffff81000107d400 R11: 0000000000000292 R12: ffff810128d5e380
R13: ffff810132a4fc78 R14: 0000ffff1b6017ff R15: 0000000000000003
FS:  0000000000000000(0000) GS:ffff810142d19740(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b0b5e6ae180 CR3: 0000000128cbc000 CR4: 00000000000006e0
Process ib_mad1 (pid: 23541, threadinfo ffff810132a4e000, task ffff810142b56100)
Stack: ffffffff8833c5f5 ffff8101302b3000 0000ffff1b6012ff 0000000000000002
       0000000000000296 ffff8101302b3500 ffffffff8027753e ffff810128d5e3a0
       ffff81012bce1680 ffff810128d5e380
Call Trace: <ffffffff8833c5f5>{:ib_ipoib:path_rec_completion+862}
       <ffffffff8027753e>{dev_queue_xmit+545} <ffffffff8833c5b2>{:ib_ipoib:path_rec_completion+795}
       <ffffffff8833252e>{:ib_sa:ib_sa_path_rec_callback+64}
       <ffffffff80138f17>{lock_timer_base+27} <ffffffff80138f89>{try_to_del_timer_sync+81}
       <ffffffff883322b3>{:ib_sa:send_handler+72} <ffffffff8826762f>{:ib_mad:ib_mad_complete_send_wr+421}
       <ffffffff88267f00>{:ib_mad:ib_mad_completion_handler+947}
       <ffffffff88267b4d>{:ib_mad:ib_mad_completion_handler+0}
       <ffffffff80140177>{run_workqueue+153} <ffffffff8014081e>{worker_thread+0}
       <ffffffff801437e5>{keventd_create_kthread+0} <ffffffff80140927>{worker_thread+265}
       <ffffffff8012787f>{__wake_up_common+62} <ffffffff8012905a>{default_wake_function+0}
       <ffffffff801437e5>{keventd_create_kthread+0} <ffffffff80143aca>{kthread+236}
       <ffffffff8010b60a>{child_rip+8} <ffffffff801437e5>{keventd_create_kthread+0}
       <ffffffff801439de>{kthread+0} <ffffffff8010b602>{child_rip+0}
 
Code: f0 ff 0f 0f 88 29 01 00 00 c3 fa f0 ff 0f 0f 88 2a 01 00 00
RIP <ffffffff802cffea>{_spin_lock_irqsave+3} RSP <ffff810132a4fc20>
 
 
 
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
 


    ------------------------------------------------------------------------
    From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Scott Weitzenkamp (sweitzen)
    Sent: Tuesday, October 03, 2006 2:53 PM
    To: Vladimir Sokolovsky
    Cc: EWG; openib-General
    Subject: Re: [openib-general] [openfabrics-ewg] Problems with OFED IPoIB HA on SLES10
   
    Vlad, thanks for the fast response. I have some follow-up questions about configuring IPoIB HA; see below.
   
                    3) I got IPoIB HA working on SLES 10, but the documentation is a little lacking. It looks like I have to put the same IP address in ifcfg-ib0 and ifcfg-ib1; is this correct?
                   
   
            Yes, the IP address should be the same. Actually, the configuration of the secondary interface does not matter.
            The High Availability daemon reads the configuration of the primary interface and migrates it between the interfaces in case of failure.
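The reply above says the daemon takes its settings only from the primary interface. A minimal C sketch of that rule follows, assuming ifcfg files are plain KEY=VALUE lines like the ifcfg-ib0 shown later in this thread. ifcfg_get and active_iface are hypothetical helpers; this is not the ipoib_ha.pl code, which is a Perl script.

```c
#include <stdbool.h>
#include <string.h>

/* Look up KEY in ifcfg-style text (one KEY=VALUE pair per line).
 * On success, copies the value into out (truncated to outlen - 1
 * characters) and returns true; returns false if KEY is absent. */
static bool ifcfg_get(const char *text, const char *key,
                      char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *line = text;

    if (outlen == 0)
        return false;
    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* Match "KEY=" exactly, so IPADDR does not match IPADDR_1. */
        if (len > klen && strncmp(line, key, klen) == 0 && line[klen] == '=') {
            size_t vlen = len - klen - 1;
            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, line + klen + 1, vlen);
            out[vlen] = '\0';
            return true;
        }
        line = eol ? eol + 1 : NULL;
    }
    return false;
}

/* Hypothetical failover rule: the addressing always comes from the
 * primary's (ib0's) config; only the interface that carries it changes. */
static const char *active_iface(bool primary_port_up)
{
    return primary_port_up ? "ib0" : "ib1";
}
```

Under this rule, parsing the ifcfg-ib0 quoted in this thread yields IPADDR 192.168.2.46 and NETMASK 255.255.255.0, which would move to active_iface(false), i.e. ib1, after a failover; ifcfg-ib1 is never consulted.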
           
           
    If I don't have an ifcfg-ib1 file, then ipoib_ha.pl won't start. I would prefer not to configure ifcfg-ib1, since I don't plan to use it.
   
    # ipoib_ha.pl --with-arping --with-multicast -v
    Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory
    Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory
    Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory
    Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory
    Can't open conf /etc/sysconfig/network/ifcfg-ib1: No such file or directory
    ...
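Since only the primary's configuration matters, a missing ifcfg-ib1 arguably should not be fatal. A minimal C sketch of treating the secondary interface's config file as optional; open_optional_conf is a hypothetical helper and not part of ipoib_ha.pl (a Perl script).

```c
#include <errno.h>
#include <stdio.h>

/* Open a config file that is allowed to be absent.
 * Returns the open FILE * on success. If the file does not exist
 * (ENOENT), returns NULL with *missing set to 1 and prints nothing;
 * any other failure is reported with perror() and *missing stays 0. */
static FILE *open_optional_conf(const char *path, int *missing)
{
    FILE *f = fopen(path, "r");

    *missing = 0;
    if (f == NULL) {
        if (errno == ENOENT)
            *missing = 1;
        else
            perror(path);
    }
    return f;
}
```

A caller would fall back to the primary's settings when *missing is set, and abort only on other errors such as a permission problem, instead of repeating "Can't open conf" in a loop.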
   
    If I put different IP addresses in ifcfg-ib0 and ifcfg-ib1, then the ifcfg-ib1 IP address is used for both ib0 and ib1!
   
    # pwd
    /etc/sysconfig/network
    # cat ifcfg-ib0
    DEVICE=ib0
    BOOTPROTO=static
    IPADDR=192.168.2.46
    NETMASK=255.255.255.0
    ONBOOT=yes
    # cat ifcfg-ib1
    DEVICE=ib1
    BOOTPROTO=static
    IPADDR=192.168.6.46
    NETMASK=255.255.255.0
    ONBOOT=yes
    # /etc/init.d/openibd start
    Loading HCA driver and Access Layer:                       [  OK  ]
    Setting up InfiniBand network interfaces:
        ib0       device: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20)
        ib0       configuration: ib1
    Bringing up interface ib0:                                 [  OK  ]
        ib1       device: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20)
    Bringing up interface ib1:                                 [  OK  ]
    Setting up service network . . .                           [  done  ]
    # ifconfig ib0
    ib0       Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
              inet addr:192.168.6.46  Bcast:192.168.6.255  Mask:255.255.255.0
              inet6 addr: fe80::202:c902:21:700d/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:3 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:128
              RX bytes:0 (0.0 b)  TX bytes:224 (224.0 b)
   
    # ifconfig ib1
    ib1       Link encap:UNSPEC  HWaddr 00-00-04-05-FE-80-00-00-00-00-00-00-00-00-00-00
              inet addr:192.168.6.46  Bcast:192.168.6.255  Mask:255.255.255.0
              inet6 addr: fe80::202:c902:21:700e/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:128
              RX bytes:0 (0.0 b)  TX bytes:304 (304.0 b)
   
    Notice how both ib0 and ib1 have the IP address from ifcfg-ib1. This contradicts the following statement in ipoib_release_notes.txt:
   
           b.   The ib1 interface uses the configuration script of ib0.
       
    Scott
   

_______________________________________________
openfabrics-ewg mailing list
openfabrics-ewg at openib.org
http://openib.org/mailman/listinfo/openfabrics-ewg


_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general




