[ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures

Philippe.GREGOIRE at CEA.FR Philippe.GREGOIRE at CEA.FR
Thu Mar 29 00:09:48 PDT 2007


Michael,
Tracing the route between an HCA port and the subnet manager gives the LID of the switch connected to that HCA port:

[root at cors127 ~]# ibstat
CA 'mthca0'
        CA type: MT23108
        Number of ports: 2
        Firmware version: 3.0.0
        Hardware version: a1
        Node GUID: 0x0008f10403962eb0
        System image GUID: 0x0008f10403962eb3
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 10
                Base lid: 26
                LMC: 1
                SM lid: 14
                Capability mask: 0x00110a68
                Port GUID: 0x0008f10403962eb1
        Port 2:
                State: Down
                Physical state: Polling
                Rate: 2
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x00110a68
                Port GUID: 0x0008f10403962eb2
[root at cors127 ~]# ibtracert 26 14
From ca {0x0008f10403962eb0} portnum 1 lid 0x1a-0x1b "cors127 HCA-1"
[1] -> switch port {0x0005ad000001a775}[2] lid 0x2-0x2 "Cisco Switch SFS7000"
[24] -> switch port {0x0005ad0000001834}[5] lid 0x10-0x10 "Topspin Switch - U3"
[3] -> switch port {0x0005ad0000001830}[1] lid 0xe-0xe "Topspin Switch - U1"
To switch {0x0005ad0000001830} portnum 0 lid 0xe-0xe "Topspin Switch - U1"
[root at cors127 ~]# ibtracert 26 14 2>&1 | awk '(NR==2) {print $7}'
0x2-0x2

The HCA port LID and its subnet manager LID are available under /sys/class/infiniband, so
it is better to do:

[root at cors127 ~]# ibtracert $(</sys/class/infiniband/mthca0/ports/1/lid) $(</sys/class/infiniband/mthca0/ports/1/sm_lid) 2>&1 | awk '(NR==2) {sub(/-.*/, "", $7); print $7}'
0x2

PS: redirecting stderr to stdout is required because ibtracert writes all of its output to stderr.
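
For scripting, the one-liner above could be wrapped roughly as follows. This is only a sketch: the function names first_hop_lid and switch_lid_for_port are made up for illustration, and it assumes the ibtracert output has the layout shown above, with the first hop on a line starting with "[n] ->" and the LID range in the seventh field. It matches that line by pattern rather than by line number, which is a small change from the NR==2 test.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and the parsing assumes
# the ibtracert output layout shown earlier in this message.

# Read ibtracert output on stdin and print the base LID of the first hop,
# i.e. the switch directly connected to the HCA port.
first_hop_lid() {
    awk '!found && /^\[[0-9]+\] -> / {
        sub(/-.*/, "", $7)   # "0x2-0x2" -> "0x2"
        print $7
        found = 1
    }'
}

# Print the LID of the switch attached to HCA $1, port $2,
# using the lid and sm_lid files under /sys/class/infiniband.
switch_lid_for_port() {
    port_dir=/sys/class/infiniband/$1/ports/$2
    # 2>&1 because ibtracert reports the route on stderr (see the PS above).
    ibtracert "$(cat "$port_dir/lid")" "$(cat "$port_dir/sm_lid")" 2>&1 |
        first_hop_lid
}

# Example: switch_lid_for_port mthca0 1
```

On the node above, calling switch_lid_for_port mthca0 1 should print the same 0x2 as the one-liner.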

Philippe
-------- Original message --------
From: general-bounces at lists.openfabrics.org on behalf of Michael S. Tsirkin
Date: Wed. 28/03/2007 22:12
To: Hal Rosenstock
Cc: Michael S. Tsirkin; general at lists.openfabrics.org; bugmail at lists.openfabrics.org
Subject: Re: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures
 
> > > Not true; ibportstate can do this.
> > 
> > I found that, yes.
> > However, to automate this fully I need to find the lid
> > of the switch that is connected to specific HCA ports.
> 
> So do you have the GUID or LID of the HCA port(s) in question?

Yes, that's easy to get.

> > I expect ibnetdiscover can do this, but was unable to grok
> > the output syntax. 
> 
> I'll explain once I have the answer to the above question.
> 
> > Is it documented somewhere?
> 
> In the man page but this may not be sufficient for your purposes.
> 
> > Alternatively, can linkinfo be queried with saquery?
> 
> Not currently.



-- 
MST