[ofa-general] RE: [Bug 465] IPoIB CM HA fails after several hours of failures

Hal Rosenstock halr at voltaire.com
Thu Mar 29 04:34:54 PDT 2007


On Thu, 2007-03-29 at 02:09, Philippe.GREGOIRE at CEA.FR wrote:
> Michael,
> Tracing the route between an HCA port and the subnet manager gives the
> LID of the switch connected to that HCA port:
> 
> [root@cors127 ~]# ibstat
> CA 'mthca0'
>         CA type: MT23108
>         Number of ports: 2
>         Firmware version: 3.0.0
>         Hardware version: a1
>         Node GUID: 0x0008f10403962eb0
>         System image GUID: 0x0008f10403962eb3
>         Port 1:
>                 State: Active
>                 Physical state: LinkUp
>                 Rate: 10
>                 Base lid: 26
>                 LMC: 1
>                 SM lid: 14
>                 Capability mask: 0x00110a68
>                 Port GUID: 0x0008f10403962eb1
>         Port 2:
>                 State: Down
>                 Physical state: Polling
>                 Rate: 2
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x00110a68
>                 Port GUID: 0x0008f10403962eb2
> [root@cors127 ~]# ibtracert 26 14
> From ca {0x0008f10403962eb0} portnum 1 lid 0x1a-0x1b "cors127 HCA-1"
> [1] -> switch port {0x0005ad000001a775}[2] lid 0x2-0x2 "Cisco Switch SFS7000"
> [24] -> switch port {0x0005ad0000001834}[5] lid 0x10-0x10 "Topspin Switch - U3"
> [3] -> switch port {0x0005ad0000001830}[1] lid 0xe-0xe "Topspin Switch - U1"
> To switch {0x0005ad0000001830} portnum 0 lid 0xe-0xe "Topspin Switch - U1"
> [root@cors127 ~]# ibtracert 26 14 2>&1 | awk '(NR==2) {print $7}'
> 0x2-0x2
> 
> The HCA port LID and its subnet manager LID are available under
> /sys/class/infiniband, so it is better to do:
> 
> [root@cors127 ~]# ibtracert \
>     $(</sys/class/infiniband/mthca0/ports/1/lid) \
>     $(</sys/class/infiniband/mthca0/ports/1/sm_lid) 2>&1 |
>     awk '(NR==2) {sub(/-.*/, "", $7); print $7}'
> 0x2
> 
> PS: Redirecting stderr to stdout is required because ibtracert writes
> all of its output to stderr.

The stderr output was fixed recently, so whether the redirection is needed
depends on the ibtracert version in use.
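
Since the stream that carries the route depends on the version, a script
can merge both streams and pick the first hop line by its shape instead of
by line number. The following is only a sketch under that assumption:
first_hop_lid is a hypothetical helper name, not part of the diags, and it
assumes the hop lines look like the ones quoted above.

```shell
# first_hop_lid (hypothetical helper): read ibtracert-style text on stdin
# and print the LID of the first hop, i.e. the first line that looks like
# "[N] -> ...". The "-0x2" end of the LID range is stripped, as in the
# awk one-liner quoted above.
first_hop_lid() {
    awk '
        /^\[[0-9]+\] -> / {
            # The LID range is the field that follows the word "lid".
            for (i = 1; i < NF; i++) {
                if ($i == "lid") {
                    lid = $(i + 1)
                    sub(/-.*/, "", lid)
                    print lid
                    exit
                }
            }
        }
    '
}
```

It would be used as "ibtracert 26 14 2>&1 | first_hop_lid". Keeping the
2>&1 should be harmless on versions that already write to stdout, so the
same pipeline can be used with either.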

-- Hal

> Philippe
> -------- Original Message --------
> From: general-bounces at lists.openfabrics.org on behalf of Michael S.
> Tsirkin
> Date: Wed. 28/03/2007 22:12
> To: Hal Rosenstock
> Cc: Michael S. Tsirkin; general at lists.openfabrics.org;
> bugmail at lists.openfabrics.org
> Subject: Re: [ofa-general] RE: [Bug 465] IPoIB CM HA fails after
> several hours of failures
> 
> > > > Not true; ibportstate can do this.
> > >
> > > I found that, yes.
> > > However, to automate this fully I need to find the lid
> > > of the switch that is connected to specific HCA ports.
> >
> > So do you have the GUID or LID of the HCA port(s) in question?
> 
> Yes, that's easy to get.
> 
> > > I expect ibnetdiscover can do this, but was unable to grok
> > > the output syntax.
> >
> > I'll explain once I have the answer to the above question.
> >
> > > Is it documented somewhere?
> >
> > In the man page but this may not be sufficient for your purposes.
> >
> > > Alternatively, can linkinfo be queried with saquery?
> >
> > Not currently.
> 
> 
> 
> --
> MST