[ofw] RE: installation/connectivity problems on hpc server

Hal Rosenstock hal.rosenstock at gmail.com
Mon Nov 16 14:32:19 PST 2009


On Mon, Nov 16, 2009 at 4:52 PM, Anatoly Greenblatt
<anatolyg at voltaire.com> wrote:
> This change was somewhere between ofed 1.3.1 and 1.4.1 timeframe. Until recently I used opensm with ofed 1.3.1.
>
> The sa_query from __port_get_bcast fails.
>
> But now I'm confused. I looked into the linux code and the sm_key is not set there either.
>
> This leads me to think that the difference is in some packing or alignment.

I think it may be an RMPP issue.

-- Hal

>
> Thanks,
> Anatoly.
>
> -----Original Message-----
> From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com]
> Sent: Monday, November 16, 2009 11:31 PM
> To: Anatoly Greenblatt
> Cc: Sean Hefty; Fab Tillier; Tzachi Dar; Smith, Stan; ofw at lists.openfabrics.org
> Subject: Re: [ofw] RE: installation/connectivity problems on hpc server
>
> On Mon, Nov 16, 2009 at 4:13 PM, Anatoly Greenblatt
> <anatolyg at voltaire.com> wrote:
>> Thanks Hal,
>>
>> But the reason for the difference in behavior between linux and windows
>> nodes is probably that in the windows drivers the sm_key is never set. The
>> sm_key is defined in sa_hdr, which is part of the mad. If I'm not mistaken,
>> mad buffers are zallocated.
>>
>> Opensm expects OSM_DEFAULT_SM_KEY and it worked until recently because of
>> this endianness bug.
>
> That change was made 5/22/08. That's recent in terms of Windows OpenSM :-)
>
>> Sean, can you check if in linux code (specifically in ipoib default
>> broadcast query/join path) the sm_key is set?
>
> Which SA query/queries is/are failing ? The trust is only needed for
> certain MCMemberRecord queries (not joins/leaves/etc. though), certain
> ServiceRecord queries, PortInfoRecord and PKeyTableRecord queries, and
> certain Sets of InformInfo.
>
> -- Hal
>
>
>> Anatoly.
>>
>>
>> -----Original Message-----
>> From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com]
>> Sent: Monday, November 16, 2009 10:52 PM
>> To: Anatoly Greenblatt
>> Cc: Sean Hefty; Fab Tillier; Tzachi Dar; Smith, Stan;
>> ofw at lists.openfabrics.org
>> Subject: Re: [ofw] RE: installation/connectivity problems on hpc server
>>
>> On Mon, Nov 16, 2009 at 3:32 PM, Anatoly Greenblatt
>> <anatolyg at voltaire.com> wrote:
>>> I think this bug can be noticed only when opensm runs on ppc while the
>>> hosts run on intel platforms, or vice versa.
>>
>> From opensm_release_notes:
>> OpenSM Compatibility
>> --------------------
>> Note that OpenSM version 3.2.1 and earlier used a value of 1 in host
>> byte order for the default SM_Key, so there is a compatibility issue
>> with these earlier versions of OpenSM when the 3.2.2 or later version
>> is running on a little endian machine. This affects SM handover as well
>> as SA queries (saquery tool in infiniband-diags).
>>
>>> Also, I'd appreciate it if anyone can point me to the place where this
>>> sa_key is filled in for the sa query (or how it is mapped).
>>
>> The --smkey option in saquery allows this to be specified.
>>
>> -- Hal
>>
>>> Thanks,
>>> Anatoly.
>>>
>>> -----Original Message-----
>>> From: Hal Rosenstock [mailto:hal.rosenstock at gmail.com]
>>> Sent: Monday, November 16, 2009 8:53 PM
>>> To: Sean Hefty
>>> Cc: Anatoly Greenblatt; Fab Tillier; Tzachi Dar; Smith, Stan;
>>> ofw at lists.openfabrics.org
>>> Subject: Re: [ofw] RE: installation/connectivity problems on hpc server
>>>
>>> On Mon, Nov 16, 2009 at 11:46 AM, Sean Hefty <sean.hefty at intel.com>
>>> wrote:
>>>>>Stan/Sean, when you perform linux/windows dapl tests do you use windows
>>>>>or linux opensm and what is the value of sa_key?
>>>>
>>>> I always use a linux version of opensm, since I share switches with a linux
>>>> cluster.  I don't know what value the sa_key is, but it's 99% likely to be
>>>> whatever the default is.
>>>>
>>>> Do you know what exactly the sa_key maps to?
>>>
>>> The default sa_key depends on which version of OpenSM is being used.
>>>
>>> # Note that for both values above (sm_key and sa_key)
>>> # OpenSM version 3.2.1 and below used the default value '1'
>>> # in a host byte order, it is fixed now but you may need to
>>> # change the values to interoperate with old OpenSM running
>>> # on a little endian machine.
>>>
>>>> Is this how the user specifies the
>>>> SM key, or is it something else?
>>>
>>> There are two separate keys: sm_key and sa_key.
>>>
>>> -- Hal
>>>
>>>> - Sean
>>>>
>>>> _______________________________________________
>>>> ofw mailing list
>>>> ofw at lists.openfabrics.org
>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
>>>>
>>>
>>
>
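
The byte-order issue described in the release notes quoted above can be
illustrated with a short sketch. This is not OpenSM code. It only assumes
that the key is a 64-bit field carried in network (big endian) byte order
on the wire, and the helper names are hypothetical. It shows why a default
key of 1 stored in host byte order looks like a different value when it is
sent from a little endian machine.

```python
import struct

DEFAULT_KEY = 1  # default value '1' mentioned in the release notes

def wire_bytes_host_order(value, byteorder):
    """Hypothetical helper: bytes sent when the key is left in host order."""
    return value.to_bytes(8, byteorder)

def wire_bytes_network_order(value):
    """Hypothetical helper: bytes sent when the key is converted to big endian."""
    return struct.pack(">Q", value)

def read_as_network_order(wire):
    """Interpret 8 wire bytes as a big endian 64-bit key."""
    return struct.unpack(">Q", wire)[0]

# Old behavior (host byte order) on little endian and big endian machines.
old_le = wire_bytes_host_order(DEFAULT_KEY, "little")
old_be = wire_bytes_host_order(DEFAULT_KEY, "big")
# Fixed behavior (network byte order), identical on every machine.
fixed = wire_bytes_network_order(DEFAULT_KEY)

for label, wire in (("old, little endian", old_le),
                    ("old, big endian", old_be),
                    ("fixed", fixed)):
    print(f"{label:20s} {wire.hex()} -> {read_as_network_order(wire):#018x}")
```

Under these assumptions, only the old little endian case differs from the
fixed encoding, which matches the note that the compatibility problem shows
up when a little endian machine is involved.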
