[ofa-general] How does ib_srpt decide which ports to use?

Chris Worley worleys at gmail.com
Sat Jun 6 10:27:34 PDT 2009


On Sat, Jun 6, 2009 at 1:36 AM, Bart Van Assche
<bart.vanassche at gmail.com> wrote:
> On Sat, Jun 6, 2009 at 1:15 AM, Chris Worley<worleys at gmail.com> wrote:
>> Setup: 1.4.1 w/ 3 dual-port QDR cards in each of two hosts, all ports
>> direct connected, opensm running on all port GUIDs from one host, all
>> links active.
>>
>> Problem: ibsrpdm only advertises the first port of the first HCA of the target.
>> Next problem: I can add targets via
>> /sys/class/infiniband_srp/srp-*/add_target on the initiator, but only
>> when naming the two port guids of the first HCA on the target.  In
>> testing, both ports are used.
>>
>> Can somebody aim me in the right direction of what/who's stopping
>> after the first HCA?
>
> Please have a look at the /sys/class/infiniband_srpt/srpt-*/login_info
> information on the target. The following information should be
> present:
> * One /sys/class/infiniband_srpt/srpt-* entry per HCA.
> * For each HCA, /sys/class/infiniband_srpt/srpt-${HCA}/login_info
> should contain one line for each port of that HCA.

# cat /sys/class/infiniband_srpt/srpt-*/login_info
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000041,service_id=0024710000000040
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000042,service_id=0024710000000040
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000045,service_id=0024710000000040
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000046,service_id=0024710000000040
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000002c903000292af,service_id=0024710000000040
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000002c903000292b0,service_id=0024710000000040

Each port has an entry, and the port GUIDs (the low 64 bits of each
dgid) are correct, but tid_ext, ioc_guid and service_id on every line
all carry the node GUID of the first IB HCA: 0024710000000040.

Is that expected?
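
To make the comparison above easier to repeat, here is a minimal Python sketch that parses login_info lines and separates the port GUID from each dgid. It assumes the comma-separated key=value layout shown in the output above and that a GID is a 64-bit subnet prefix followed by a 64-bit port GUID. The helper names (parse_login_info, port_guid) are made up for illustration and are not part of any SRP tool.

```python
# Two of the six lines from the login_info output above, copied verbatim.
LOGIN_INFO = """\
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000041,service_id=0024710000000040
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000002c903000292af,service_id=0024710000000040
"""


def parse_login_info(text):
    """Turn each 'key=value,key=value' line into a dict (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        entries.append(dict(field.split("=", 1) for field in line.split(",")))
    return entries


def port_guid(dgid):
    """Return the low 64 bits (last 16 hex digits) of a GID, i.e. the port GUID."""
    return dgid[-16:]


entries = parse_login_info(LOGIN_INFO)
for entry in entries:
    print("port GUID", port_guid(entry["dgid"]), "ioc_guid", entry["ioc_guid"])

# A single element here means every port advertises the same ioc_guid.
distinct_ioc_guids = {entry["ioc_guid"] for entry in entries}
print("distinct ioc_guid values:", sorted(distinct_ioc_guids))
```

Run against the full six-line output, the set of distinct ioc_guid values would show at a glance whether each HCA reports its own GUID or all of them share one.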

>
> On the initiator you can use the information obtained from
> "login_info" (after having replaced tid_ext by id_ext) to log in to
> the target:
> echo ... > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target
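
The tid_ext to id_ext substitution described above can be sketched in Python as follows. This is only an illustration under assumptions: the function names are hypothetical, the default sysfs path is the srp-mlx4_0-1 example from the quoted text, and whether the initiator accepts every key from login_info unchanged should be checked against the ib_srp documentation for the installed release.

```python
def login_info_to_add_target(line):
    """Rename tid_ext to id_ext in one login_info line (hypothetical helper)."""
    fields = []
    for field in line.strip().split(","):
        key, _, value = field.partition("=")
        if key == "tid_ext":
            key = "id_ext"
        fields.append(f"{key}={value}")
    return ",".join(fields)


def add_target(target_string,
               path="/sys/class/infiniband_srp/srp-mlx4_0-1/add_target"):
    """Write the login string to add_target; needs root on a real initiator."""
    with open(path, "w") as sysfs_file:
        sysfs_file.write(target_string)


# One line copied from the target's login_info output.
sample = ("tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,"
          "dgid=fe800000000000000024710000000041,service_id=0024710000000040")
target_string = login_info_to_add_target(sample)
print(target_string)
# add_target(target_string)  # left commented out: this touches sysfs
```

Matching on the exact key name rather than doing a plain text replace avoids accidentally rewriting any other field that happens to contain the same substring.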

Adding targets on the initiator with the first HCA's node GUID from
the target seems to work, but soon afterwards (with no I/O to the
devices) the system panicked, and remote power cycling is not working.
It doesn't look like the panic was anywhere in the IB or SRP modules:

...
SCSI device sdbo: 314287168 512-byte hdwr sectors (160915 MB)
sdbo: Write Protect is off
sdbo: Mode Sense: 83 00 10 08
SCSI device sdbo: drive cache: write back w/ FUA
SCSI device sdbo: 314287168 512-byte hdwr sectors (160915 MB)
sdbo: Write Protect is off
sdbo: Mode Sense: 83 00 10 08
SCSI device sdbo: drive cache: write back w/ FUA
 sdbo: unknown partition table
sd 42:0:0:5: Attached scsi disk sdbo
  Vendor: SCST_BIO  Model: vdisk6            Rev:  102
  Type:   Direct-Access                      ANSI SCSI revision: 04
SCSI device sdbp: 314287168 512-byte hdwr sectors (160915 MB)
sdbp: Write Protect is off
sdbp: Mode Sense: 83 00 10 08
SCSI device sdbp: drive cache: write back w/ FUA
 host32: ib_srp: connection closed
ib_srp:  host32: add qp_in_err timer
SCSI device sdbp: 314287168 512-byte hdwr sectors (160915 MB)
sdbp: Write Protect is off
sdbp: Mode Sense: 83 00 10 08
SCSI device sdbp: drive cache: write back w/ FUA
 sdbp: unknown partition table
sd 42:0:0:6: Attached scsi disk sdbp
  Vendor: SCST_BIO  Model: vdisk7            Rev:  102
  Type:   Direct-Access                      ANSI SCSI revision: 04
SCSI device sdbq: 314287168 512-byte hdwr sectors (160915 MB)
sdbq: Write Protect is off
sdbq: Mode Sense: 83 00 10 08
SCSI device sdbq: drive cache: write back w/ FUA
SCSI device sdbq: 314287168 512-byte hdwr sectors (160915 MB)
sdbq: Write Protect is off
sdbq: Mode Sense: 83 00 10 08
SCSI device sdbq: drive cache: write back w/ FUA
 sdbq: unknown partition table
sd 42:0:0:7: Attached scsi disk sdbq
 host31: ib_srp: srp_qp_in_err_timer called
 host31: ib_srp: srp_qp_in_err_timer flushed reset - done
 host31: ib_srp: Sending CM DREQ failed
 host37: ib_srp: DREQ received - connection closed
 host32: ib_srp: srp_qp_in_err_timer called
 host32: ib_srp: srp_qp_in_err_timer flushed reset - done
 host32: ib_srp: Sending CM DREQ failed
 host38: ib_srp: DREQ received - connection closed
 host37: ib_srp: connection closed
ib_srp:  host37: add qp_in_err timer
 host38: ib_srp: connection closed
ib_srp:  host38: add qp_in_err timer
 host37: ib_srp: srp_qp_in_err_timer called
 host37: ib_srp: srp_qp_in_err_timer flushed reset - done
 host37: ib_srp: Sending CM DREQ failed
 host31: ib_srp: DREQ received - connection closed
 host38: ib_srp: srp_qp_in_err_timer called
 host38: ib_srp: srp_qp_in_err_timer flushed reset - done
 host38: ib_srp: Sending CM DREQ failed
 host32: ib_srp: DREQ received - connection closed
 host31: ib_srp: connection closed
ib_srp:  host31: add qp_in_err timer
 host32: ib_srp: connection closed
ib_srp:  host32: add qp_in_err timer
 host31: ib_srp: srp_qp_in_err_timer called
 host31: ib_srp: srp_qp_in_err_timer flushed reset - done
 host31: ib_srp: Sending CM DREQ failed
 host37: ib_srp: DREQ received - connection closed
 host32: ib_srp: srp_qp_in_err_timer called
 host32: ib_srp: srp_qp_in_err_timer flushed reset - done
 host32: ib_srp: Sending CM DREQ failed
 host38: ib_srp: DREQ received - connection closed
 host37: ib_srp: connection closed
ib_srp:  host37: add qp_in_err timer
 host38: ib_srp: connection closed
ib_srp:  host38: add qp_in_err timer
 host37: ib_srp: srp_qp_in_err_timer called
 host37: ib_srp: srp_qp_in_err_timer flushed reset - done
 host37: ib_srp: Sending CM DREQ failed
 host31: ib_srp: DREQ received - connection closed
 host38: ib_srp: srp_qp_in_err_timer called
 host38: ib_srp: srp_qp_in_err_timer flushed reset - done
 host38: ib_srp: Sending CM DREQ failed
 host32: ib_srp: DREQ received - connection closed
 host31: ib_srp: connection closed
ib_srp:  host31: add qp_in_err timer
 host32: ib_srp: connection closed
ib_srp:  host32: add qp_in_err timer
 host31: ib_srp: srp_qp_in_err_timer called
 host31: ib_srp: srp_qp_in_err_timer flushed reset - done
 host31: ib_srp: Sending CM DREQ failed
 host37: ib_srp: DREQ received - connection closed
 host32: ib_srp: srp_qp_in_err_timer called
 host32: ib_srp: srp_qp_in_err_timer flushed reset - done
 host32: ib_srp: Sending CM DREQ failed
 host38: ib_srp: DREQ received - connection closed
 host37: ib_srp: connection closed
ib_srp:  host37: add qp_in_err timer
 host38: ib_srp: connection closed
ib_srp:  host38: add qp_in_err timer
 host37: ib_srp: srp_qp_in_err_timer called
 host37: ib_srp: srp_qp_in_err_timer flushed reset - done
 host37: ib_srp: Sending CM DREQ failed
 host31: ib_srp: DREQ received - connection closed
 host38: ib_srp: srp_qp_in_err_timer called
 host38: ib_srp: srp_qp_in_err_timer flushed reset - done
 host38: ib_srp: Sending CM DREQ failed
 host32: ib_srp: DREQ received - connection closed
 host31: ib_srp: connection closed
ib_srp:  host31: add qp_in_err timer
 host32: ib_srp: connection closed
ib_srp:  host32: add qp_in_err timer
 host31: ib_srp: srp_qp_in_err_timer called
 host31: ib_srp: srp_qp_in_err_timer flushed reset - done
 host31: ib_srp: Sending CM DREQ failed
 host37: ib_srp: DREQ received - connection closed
 host32: ib_srp: srp_qp_in_err_timer called
 host32: ib_srp: srp_qp_in_err_timer flushed reset - done
 host32: ib_srp: Sending CM DREQ failed
 host38: ib_srp: DREQ received - connection closed
 host37: ib_srp: connection closed
ib_srp:  host37: add qp_in_err timer
 host38: ib_srp: connection closed
ib_srp:  host38: add qp_in_err timer
 host37: ib_srp: srp_qp_in_err_timer called
 host37: ib_srp: srp_qp_in_err_timer flushed reset - done
 host37: ib_srp: Sending CM DREQ failed
 host31: ib_srp: DREQ received - connection closed
 host38: ib_srp: srp_qp_in_err_timer called
 host38: ib_srp: srp_qp_in_err_timer flushed reset - done
 host38: ib_srp: Sending CM DREQ failed
 host32: ib_srp: DREQ received - connection closed
 host31: ib_srp: connection closed
ib_srp:  host31: add qp_in_err timer
 host32: ib_srp: connection closed
ib_srp:  host32: add qp_in_err timer
 host31: ib_srp: srp_qp_in_err_timer called
 host31: ib_srp: srp_qp_in_err_timer flushed reset - done
 host31: ib_srp: Sending CM DREQ failed
 host37: ib_srp: DREQ received - connection closed
 host32: ib_srp: srp_qp_in_err_timer called
 host32: ib_srp: srp_qp_in_err_timer flushed reset - done
 host32: ib_srp: Sending CM DREQ failed
 host38: ib_srp: DREQ received - connection closed
 host37: ib_srp: connection closed
ib_srp:  host37: add qp_in_err timer
 host38: ib_srp: connection closed
ib_srp:  host38: add qp_in_err timer
 host37: ib_srp: srp_qp_in_err_timer called
 host37: ib_srp: srp_qp_in_err_timer flushed reset - done
 host37: ib_srp: Sending CM DREQ failed
 host31: ib_srp: DREQ received - connection closed
 host38: ib_srp: srp_qp_in_err_timer called
 host38: ib_srp: srp_qp_in_err_timer flushed reset - done
 host38: ib_srp: Sending CM DREQ failed
 host32: ib_srp: DREQ received - connection closed
 host31: ib_srp: connection closed
ib_srp:  host31: add qp_in_err timer
 host32: ib_srp: connection closed
ib_srp:  host32: add qp_in_err timer
 host31: ib_srp: srp_qp_in_err_timer called
 host31: ib_srp: srp_qp_in_err_timer flushed reset - done
 host31: ib_srp: Sending CM DREQ failed
 host37: ib_srp: DREQ received - connection closed
 host32: ib_srp: srp_qp_in_err_timer called
 host32: ib_srp: srp_qp_in_err_timer flushed reset - done
 host32: ib_srp: Sending CM DREQ failed
 host38: ib_srp: DREQ received - connection closed
 host37: ib_srp: connection closed
ib_srp:  host37: add qp_in_err timer
 host38: ib_srp: connection closed
ib_srp:  host38: add qp_in_err timer
 host37: ib_srp: srp_qp_in_err_timer called
 host37: ib_srp: srp_qp_in_err_timer flushed reset - done
 host37: ib_srp: Sending CM DREQ failed
 host31: ib_srp: DREQ received - connection closed
 host38: ib_srp: srp_qp_in_err_timer called
 host38: ib_srp: srp_qp_in_err_timer flushed reset - done
 host38: ib_srp: Sending CM DREQ failed
 host32: ib_srp: DREQ received - connection closed
 host31: ib_srp: connection closed
ib_srp:  host31: add qp_in_err timer
 host32: ib_srp: connection closed
ib_srp:  host32: add qp_in_err timer
 host31: ib_srp: srp_qp_in_err_timer called
 host31: ib_srp: srp_qp_in_err_timer flushed reset - done
 host31: ib_srp: Sending CM DREQ failed
 host37: ib_srp: DREQ received - connection closed
 host32: ib_srp: srp_qp_in_err_timer called
 host32: ib_srp: srp_qp_in_err_timer flushed reset - done
 host32: ib_srp: Sending CM DREQ failed
 host38: ib_srp: DREQ received - connection closed
 host37: ib_srp: connection closed
ib_srp:  host37: add qp_in_err timer
 host38: ib_srp: connection closed
ib_srp:  host38: add qp_in_err timer
 host37: ib_srp: srp_qp_in_err_timer called
 host37: ib_srp: srp_qp_in_err_timer flushed reset - done
 host37: ib_srp: Sending CM DREQ failed
 host31: ib_srp: DREQ received - connection closed
 host38: ib_srp: srp_qp_in_err_timer called
 host38: ib_srp: srp_qp_in_err_timer flushed reset - done
 host38: ib_srp: Sending CM DREQ failed
 host32: ib_srp: DREQ received - connection closed
 host31: ib_srp: connection closed
ib_srp:  host31: add qp_in_err timer
 host32: ib_srp: connection closed
ib_srp:  host32: add qp_in_err timer
 host31: ib_srp: srp_qp_in_err_timer called
 host31: ib_srp: srp_qp_in_err_timer flushed reset - done
 host31: ib_srp: Sending CM DREQ failed
 host37: ib_srp: DREQ received - connection closed
 host32: ib_srp: srp_qp_in_err_timer called
 host32: ib_srp: srp_qp_in_err_timer flushed reset - done
 host32: ib_srp: Sending CM DREQ failed
 host38: ib_srp: DREQ received - connection closed
 host37: ib_srp: connection closed
ib_srp:  host37: add qp_in_err timer
 host38: ib_srp: connection closed
ib_srp:  host38: add qp_in_err timer
 host37: ib_srp: srp_qp_in_err_timer called
 host37: ib_srp: srp_qp_in_err_timer flushed reset - done
 host37: ib_srp: Sending CM DREQ failed
 host31: ib_srp: DREQ received - connection closed
 host38: ib_srp: srp_qp_in_err_timer called
 host38: ib_srp: srp_qp_in_err_timer flushed reset - done
 host38: ib_srp: Sending CM DREQ failed
 host32: ib_srp: DREQ received - connection closed
 host31: ib_srp: connection closed
ib_srp:  host31: add qp_in_err timer
 host32: ib_srp: connection closed
ib_srp:  host32: add qp_in_err timer
 host31: ib_srp: srp_qp_in_err_timer called
 host31: ib_srp: srp_qp_in_err_timer flushed reset - done
 host31: ib_srp: Sending CM DREQ failed
 host37: ib_srp: DREQ received - connection closed
 host32: ib_srp: srp_qp_in_err_timer called
 host32: ib_srp: srp_qp_in_err_timer flushed reset - done
 host32: ib_srp: Sending CM DREQ failed
 host38: ib_srp: DREQ received - connection closed
 host37: ib_srp: connection closed
ib_srp:  host37: add qp_in_err timer
 host38: ib_srp: connection closed
ib_srp:  host38: add qp_in_err timer
 host37: ib_srp: srp_qp_in_err_timer called
 host37: ib_srp: srp_qp_in_err_timer flushed reset - done
 host37: ib_srp: Sending CM DREQ failed
 host31: ib_srp: DREQ received - connection closed
 host38: ib_srp: srp_qp_in_err_timer called
 host38: ib_srp: srp_qp_in_err_timer flushed reset - done
 host38: ib_srp: Sending CM DREQ failed
 host32: ib_srp: DREQ received - connection closed
 host31: ib_srp: connection closed
ib_srp:  host31: add qp_in_err timer
 host32: ib_srp: connection closed
ib_srp:  host32: add qp_in_err timer
 host31: ib_srp: srp_qp_in_err_timer called
 host31: ib_srp: srp_qp_in_err_timer flushed reset - done
 host31: ib_srp: Sending CM DREQ failed
 host37: ib_srp: DREQ received - connection closed
 host32: ib_srp: srp_qp_in_err_timer called
 host32: ib_srp: srp_qp_in_err_timer flushed reset - done
 host32: ib_srp: Sending CM DREQ failed
 host38: ib_srp: DREQ received - connection closed
 host37: ib_srp: connection closed
ib_srp:  host37: add qp_in_err timer
 host38: ib_srp: connection closed
ib_srp:  host38: add qp_in_err timer
 host37: ib_srp: srp_qp_in_err_timer called
 host37: ib_srp: srp_qp_in_err_timer flushed reset - done
 host37: ib_srp: Sending CM DREQ failed
 host31: ib_srp: DREQ received - connection closed
 host38: ib_srp: srp_qp_in_err_timer called
 host38: ib_srp: srp_qp_in_err_timer flushed reset - done
 host38: ib_srp: Sending CM DREQ failed
 host32: ib_srp: DREQ received - connection closed
 host31: ib_srp: connection closed
ib_srp:  host31: add qp_in_err timer
 host32: ib_srp: connection closed
ib_srp:  host32: add qp_in_err timer
 host31: ib_srp: Sending CM DREQ failed
 host32: ib_srp: Sending CM DREQ failed
Unable to handle kernel paging request at ffffffff882539ee RIP:
 [<ffffffff882539ee>]
PGD 203027 PUD 205027 PMD 407f4f067 PTE 0
Oops: 0010 [1] PREEMPT SMP
CPU 0
Modules linked in: mlx4_ib mlx4_core ib_uverbs ib_umad ib_mad ib_core
ppdev parport_pc lp parport button ac battery tsdev dm_snapshot
dm_mirror dm_mod loop i2c_i801 psmouse i2c_core floppy serio_raw
pcspkr shpchp pci_hotplug evdev ext2 mbcache ide_cd cdrom piix
ata_piix libata sd_mod generic ehci_hcd ide_core uhci_hcd e1000
qla2xxx firmware_class scsi_transport_fc scsi_mod thermal processor
fan
Pid: 0, comm: swapper Not tainted 2.6.18-6-clim-amd64 #1
RIP: 0010:[<ffffffff882539ee>]  [<ffffffff882539ee>]
RSP: 0018:ffffffff80597ef8  EFLAGS: 00010246
RAX: ffffffff80625fd8 RBX: ffff8103f12584f0 RCX: ffff8103f125a840
RDX: ffffffff80597f00 RSI: 1144ab87d59a6f6a RDI: ffff8103f12584f0
RBP: ffffffff805cc400 R08: 0000000000000000 R09: ffffffff80597ed8
R10: 00004131a65e699e R11: 0000000000000000 R12: 0000000000000102
R13: ffffffff882539ee R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff80616000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffffffff882539ee CR3: 000000041ac7e000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff80624000, task ffffffff805144c0)
Stack:  ffffffff8028de7c ffffffff80597f00 ffffffff80597f00 ffff810001035400
 0000000000000001 ffffffff80619110 000000000000000a 0000000000000000
 ffffffff8020ffbc ffffffff805144c0 0000000000000046 ffffffff80597f78
Call Trace:
 <IRQ> [<ffffffff8028de7c>] run_timer_softirq+0x13b/0x1be
 [<ffffffff8020ffbc>] __do_softirq+0x52/0xcb
 [<ffffffff8025c31c>] call_softirq+0x1c/0x28
 [<ffffffff8026990d>] do_softirq+0x2c/0x7d
 [<ffffffff8028a7a1>] irq_exit+0x3f/0x4c
 [<ffffffff80272d19>] smp_apic_timer_interrupt+0x3d/0x3f
 [<ffffffff80255a47>] mwait_idle+0x0/0x4a
 [<ffffffff8025bcba>] apic_timer_interrupt+0x66/0x6c
 <EOI> [<ffffffff80255a7d>] mwait_idle+0x36/0x4a
 [<ffffffff80247a78>] cpu_idle+0x92/0xc9
 [<ffffffff80267617>] rest_init+0x3f/0x41
 [<ffffffff8062e8bd>] start_kernel+0x241/0x246
 [<ffffffff8062e288>] _sinittext+0x288/0x28c


Code:  Bad RIP value.
RIP  [<ffffffff882539ee>]
 RSP <ffffffff80597ef8>
CR2: ffffffff882539ee
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!

I'll go in and power-cycle this in a few hours and try again.

Chris
>
> Bart.
>


