[ofa-general] Re: "ibdiagnet -r" and zero systemguids

Craig Prescott prescott at hpc.ufl.edu
Wed Jul 16 17:41:12 PDT 2008


I forgot to add that other than this
SystemGUID=0x0000000000000000 issue, the HCA appears
to work perfectly.
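
For anyone who wants to see the suspected failure mode in isolation,
here is a minimal Python sketch.  It is not ibdiagnet's actual Tcl code:
both function names are hypothetical, and it only assumes, going by the
error text, that stripping the leading zeros from an all-zero string
leaves an empty string that later arithmetic cannot use.

```python
def strip_leading_zeros(digits: str) -> str:
    """Naive version: drops every leading '0', so an all-zero string becomes ''."""
    return digits.lstrip("0")


def strip_leading_zeros_safe(digits: str) -> str:
    """Guarded version (hypothetical fix): keeps one '0' when nothing else is left."""
    stripped = digits.lstrip("0")
    return stripped if stripped else "0"


system_guid = "0000000000000000"  # the zero SystemGUID from the warning
port_guid = "0005ad0000050949"    # Port1 GUID from the tvflash output

print(repr(strip_leading_zeros(system_guid)))       # ''
print(repr(strip_leading_zeros_safe(system_guid)))  # '0'
print(repr(strip_leading_zeros_safe(port_guid)))    # '5ad0000050949'
```

The guarded variant keeps a single "0", so a later numeric comparison
or addition still has an operand to work with.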

Thanks,
Craig

Craig Prescott wrote:
> 
> Hi;
> 
> When we run 'ibdiagnet -r' on our OFED 1.2 cluster,
> it bombs with a complaint about a system guid that is
> zero on our only PCI-X HCA in the fabric (see appended).
> ibdiagnet seems to be trying to strip the leading zeroes
> from the system guid, and it apparently does not expect
> to have nothing left afterwards.
> 
> Running 'ibdiagnet -r' from an OFED 1.3.1 machine does
> not bomb, but I am still concerned/unclear.
> 
> My questions are: is it ok to have an HCA running
> around on your fabric with a system guid of zero?
> What if there was more than one?  Is there any way to
> assign this HCA a sensible system guid, and would it
> be useful?
> 
> The HCA in question is a Cougar Cub running the 3.5.0
> firmware from Mellanox.  FWIW, the node and port guids
> for this HCA look sensible:
> 
> [root@submit ~]# tvflash -g
> HCA #0
> Node  GUID = 0005ad0000050948
> Port1 GUID = 0005ad0000050949
> Port2 GUID = 0005ad000005094a
> 
> If it isn't obvious already, I confess I'm not clear
> about how system guids are used.  From what I can gather
> from googling around, a system guid of zero for an HCA
> means that the HCA vendor simply did not assign one.  I
> am under the impression that this is uncommon, but not
> unheard of.  Is that correct?
> 
> I did some searches through both volumes of the 1.2.1 IB
> spec and came up empty, but I could have easily missed any
> substantial discussion about system guids.  Any pointers or
> enlightenment in this area would be appreciated.
> 
> Thanks,
> Craig Prescott
> UF HPC Center
> 
> [root@submit ~]# ibdiagnet -r
> Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
> Loading IBDM from: /usr/lib64/ibdm1.2
> -W- Topology file is not specified.
>     Reports regarding cluster links will use direct routes.
> -I- Using port 1 as the local port.
> -I- Discovering the subnet ... 394 nodes (46 Switches & 348 CA-s) discovered.
> 
> -I- Parsing Subnet file:/tmp/ibdiagnet.lst
> -I- Defined 382/394 systems/nodes
> 
> -I---------------------------------------------------
> -I- Bad Guids Info
> -I---------------------------------------------------
> -W- Found Device with SystemGUID=0x0000000000000000:
>     a HCA    The Local Device "submit.ufhpc/P1" PortGUID=0x0005ad0000050949 at direct path=""
> ...
> -I---------------------------------------------------
> -I- mgid-mlid-HCAs matching table
> -I---------------------------------------------------
> mgid                                  | mlid   | HCAs
> -------------------------------------------------------------------------------- 
> 
> 
> 
> ERROR can't use empty string as operand of "+"
>     while executing
> "if {([removeLeadingZeros $n] > [removeLeadingZeros $end] + 1)} {
>          if {$start == $end} {
>             append res "$end,"
>          } else {
>      ..."
>     (procedure "groupNumRanges" line 15)
>     invoked from within
> "groupNumRanges $NEW_GROUPS($pNs)"
>     (procedure "groupingEngine" line 24)
>     invoked from within
> "groupingEngine $groups"
>     (procedure "compressNames" line 12)
>     invoked from within
> "compressNames $mlidHcas"
>     (procedure "reportFabQualities" line 82)
>     invoked from within
> "reportFabQualities" can't use empty string as operand of "+"
> 
> 



