[ofa-general] How does ib_srpt decide which ports to use?

Bart Van Assche bart.vanassche at gmail.com
Sat Jun 6 10:40:05 PDT 2009


On Sat, Jun 6, 2009 at 7:27 PM, Chris Worley <worleys at gmail.com> wrote:
>
> On Sat, Jun 6, 2009 at 1:36 AM, Bart Van Assche
> <bart.vanassche at gmail.com> wrote:
> > On Sat, Jun 6, 2009 at 1:15 AM, Chris Worley<worleys at gmail.com> wrote:
> >> Setup: 1.4.1 w/ 3 dual-port QDR cards in each of two hosts, all ports
> >> direct connected, opensm running on all port GUIDs from one host, all
> >> links active.
> >>
> >> Problem: ibsrpdm only advertises the first port of the first HCA of the target.
> >> Next problem: I can add targets via
> >> /sys/class/infiniband_srp/srp-*/add_target on the initiator, but only
> >> when naming the two port guids of the first HCA on the target.  In
> >> testing, both ports are used.
> >>
> >> Can somebody aim me in the right direction of what/who's stopping
> >> after the first HCA?
> >
> > Please have a look at the /sys/class/infiniband_srpt/srpt-*/login_info
> > information on the target. The following information should be
> > present:
> > * One /sys/class/infiniband_srpt/srpt-* entry per HCA.
> > * For each HCA, /sys/class/infiniband_srpt/srpt-${HCA}/login_info
> > should contain one line for each port of that HCA.
>
> # cat /sys/class/infiniband_srpt/srpt-*/login_info
> tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000041,service_id=0024710000000040
> tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000042,service_id=0024710000000040
> tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000045,service_id=0024710000000040
> tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000046,service_id=0024710000000040
> tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000002c903000292af,service_id=0024710000000040
> tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000002c903000292b0,service_id=0024710000000040
>
> Each port has an entry, and the port GUIDs are correct (dgid's), but
> the rest of the GUIDs refer to the node GUID of the first IB HCA:
> 0024710000000040.
>
> Is that expected?

Yes, that is expected. The ioc_guid in the above output is a GUID that
identifies the SRP target as a whole rather than a single port or HCA.
A quote from the ib_srpt source code:

/*
 * We do not have a consistent service_id (ie. also id_ext of target_id)
 * to identify this target. We currently use the guid of the first HCA
 * in the system as service_id; therefore, the target_id will change
 * if this HCA is gone bad and replaced by different HCA.
 */
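
To make the distinction concrete, here is a minimal Python sketch (not
part of ib_srpt; the helper names parse_login_info and port_guid are
made up for illustration) that parses two of the login_info lines you
posted. It assumes the usual GID layout, where the low 64 bits of the
dgid are the port GUID. The per-target fields come out identical, and
only the dgid differs per port:

```python
# Two of the login_info lines from the output above, copied verbatim.
LOGIN_INFO = """\
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000024710000000041,service_id=0024710000000040
tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,dgid=fe800000000000000002c903000292af,service_id=0024710000000040
"""


def parse_login_info(line):
    """Split one comma-separated login_info line into a dict of its fields."""
    return dict(field.split("=", 1) for field in line.strip().split(","))


def port_guid(dgid):
    """Return the last 16 hex digits of a GID, i.e. the port GUID
    (assumption: GID = 64-bit subnet prefix followed by 64-bit port GUID)."""
    return dgid[-16:]


entries = [parse_login_info(l) for l in LOGIN_INFO.splitlines() if l]

# Fields that identify the target: the same for every port.
ioc_guids = {e["ioc_guid"] for e in entries}
# Field that identifies the port: differs per line.
port_guids = [port_guid(e["dgid"]) for e in entries]

print("target identifiers:", sorted(ioc_guids))
print("port GUIDs:", port_guids)
```

With your data this should report a single target identifier and one
distinct port GUID per line.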

However, I'm not sure what ibsrpdm should display: a single set of
login parameters or all possible login parameters.

> > On the initiator you can use the information obtained from
> > "login_info" (after having replaced tid_ext by id_ext) to log in to
> > the target:
> > echo ... > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target
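
The tid_ext to id_ext renaming mentioned above can be scripted. Below is
a minimal Python sketch, not a tested tool: the function names
to_add_target and add_target are hypothetical, and the sysfs write is
left commented out because it needs root and a real initiator port. The
add_target path is the one from the example above and will differ on
other systems.

```python
def to_add_target(login_info_line):
    """Turn one login_info line from the target into a string for the
    initiator's add_target file by renaming tid_ext to id_ext.
    All other fields are passed through unchanged."""
    renamed = []
    for field in login_info_line.strip().split(","):
        key, value = field.split("=", 1)
        if key == "tid_ext":
            key = "id_ext"
        renamed.append(f"{key}={value}")
    return ",".join(renamed)


def add_target(sysfs_path, target_string):
    """Write the login string to an add_target sysfs file (needs root)."""
    with open(sysfs_path, "w") as f:
        f.write(target_string)


# One login_info line copied from the target output quoted earlier.
line = ("tid_ext=0024710000000040,ioc_guid=0024710000000040,pkey=ffff,"
        "dgid=fe800000000000000024710000000041,service_id=0024710000000040")

target_string = to_add_target(line)
print(target_string)

# On a real initiator, as root, one would then do something like:
# add_target("/sys/class/infiniband_srp/srp-mlx4_0-1/add_target", target_string)
```

Running this once per login_info line gives one login string per target
port; which initiator port each string is written to is up to you.
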
>
> Using the first HCA's node GUIDs from my target adds on the initiator
> seems to work, but soon after (and not doing anything w/ the devices)
> the system panic'd (and remote power cycling is not working).  It
> doesn't look like the panic was anywhere in IB or SRP modules:
> [ ... ]

That's bad news. In any case, if the kernel on the initiator system
crashes, that is a bug in the initiator's kernel. I hope that this can
be resolved through a support contract. If not, I'm afraid that you
will have to experiment with kernel versions and OFED versions in
order to find a combination that works.

Bart.


