[ofa-general] Re: "ibdiagnet -r" and zero systemguids
Craig Prescott
prescott at hpc.ufl.edu
Wed Jul 16 17:41:12 PDT 2008
I forgot to add that other than this
SystemGUID=0x0000000000000000 issue, the HCA appears
to work perfectly.
Thanks,
Craig
Craig Prescott wrote:
>
> Hi;
>
> When we run 'ibdiagnet -r' on our OFED 1.2 cluster,
> it bombs with a complaint about a system guid that is
> zero on our only PCI-X HCA in the fabric (see appended).
> ibdiagnet seems to be trying to strip the leading zeroes
> from the system guid, and apparently does not expect to
> have nothing left afterwards.
>
> Running 'ibdiagnet -r' from an OFED 1.3.1 machine does
> not bomb, but I am still concerned/unclear.
>
> My questions are: is it ok to have an HCA running
> around on your fabric with a system guid of zero?
> What if there was more than one? Is there any way to
> assign this HCA a sensible system guid, and would it
> be useful?
>
> The HCA in question is a Cougar Cub running the 3.5.0
> firmware from Mellanox. FWIW, the node and port guids
> for this HCA look sensible:
>
> [root@submit ~]# tvflash -g
> HCA #0
> Node GUID = 0005ad0000050948
> Port1 GUID = 0005ad0000050949
> Port2 GUID = 0005ad000005094a
>
> If it isn't obvious already, I confess I'm not clear
> about how system guids are used. From what I can gather
> from googling around, a system guid of zero for an HCA
> means that the HCA vendor simply did not assign one. I
> am under the impression that this is uncommon, but not
> unheard of. Is that correct?
>
> I did some searches through both volumes of the 1.2.1 IB
> spec and came up empty, but I could have easily missed any
> substantial discussion about system guids. Any pointers or
> enlightenment in this area would be appreciated.
>
> Thanks,
> Craig Prescott
> UF HPC Center
>
> [root@submit ~]# ibdiagnet -r
> Loading IBDIAGNET from: /usr/lib64/ibdiagnet1.2
> Loading IBDM from: /usr/lib64/ibdm1.2
> -W- Topology file is not specified.
> Reports regarding cluster links will use direct routes.
> -I- Using port 1 as the local port.
> -I- Discovering the subnet ... 394 nodes (46 Switches & 348 CA-s)
> discovered.
>
> -I- Parsing Subnet file:/tmp/ibdiagnet.lst
> -I- Defined 382/394 systems/nodes
>
> -I---------------------------------------------------
> -I- Bad Guids Info
> -I---------------------------------------------------
> -W- Found Device with SystemGUID=0x0000000000000000:
> a HCA The Local Device "submit.ufhpc/P1"
> PortGUID=0x0005ad0000050949 at direct path=""
> ...
> -I---------------------------------------------------
> -I- mgid-mlid-HCAs matching table
> -I---------------------------------------------------
> mgid | mlid | HCAs
> --------------------------------------------------------------------------------
>
>
>
> ERROR can't use empty string as operand of "+"
> while executing
> "if {([removeLeadingZeros $n] > [removeLeadingZeros $end] + 1)} {
> if {$start == $end} {
> append res "$end,"
> } else {
> ..."
> (procedure "groupNumRanges" line 15)
> invoked from within
> "groupNumRanges $NEW_GROUPS($pNs)"
> (procedure "groupingEngine" line 24)
> invoked from within
> "groupingEngine $groups"
> (procedure "compressNames" line 12)
> invoked from within
> "compressNames $mlidHcas"
> (procedure "reportFabQualities" line 82)
> invoked from within
> "reportFabQualities" can't use empty string as operand of "+"
>
>
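
The traceback quoted above fails inside groupNumRanges, where the result of removeLeadingZeros is used as an operand of "+". The source of removeLeadingZeros is not shown in this message, so the following is only a minimal Python sketch of that failure mode, assuming the helper strips every leading zero and therefore returns an empty string for an all-zero input. The function names are hypothetical analogues, not ibdiagnet code.

```python
def strip_zeros_naive(digits: str) -> str:
    """Hypothetical analogue of a helper that drops every leading '0'."""
    return digits.lstrip("0")


def strip_zeros_safe(digits: str) -> str:
    """Same idea, but keep a single '0' when the input is all zeros."""
    stripped = digits.lstrip("0")
    return stripped if stripped else "0"


def next_value(digits: str, strip) -> int:
    """Mimic the arithmetic step: strip the zeros, then add 1."""
    return int(strip(digits)) + 1  # int("") raises ValueError


if __name__ == "__main__":
    for text in ("0042", "0000"):
        try:
            print(text, "naive ->", next_value(text, strip_zeros_naive))
        except ValueError as err:
            print(text, "naive -> error:", err)
        print(text, "safe  ->", next_value(text, strip_zeros_safe))
```

With the naive version, an all-zero field reaches the addition as an empty string, which matches the shape of the error in the traceback. Whether the zero system guid or some other all-zero numeric field is the actual trigger cannot be determined from the output shown.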