[Fwd: Re: [openib-general] Re: IBM eHCA testing..]
Brett Bode
brett at scl.ameslab.gov
Fri Oct 14 13:22:47 PDT 2005
On Oct 14, 2005, at 2:27 PM, Troy Benjegerdes wrote:
>
>
> From: Hal Rosenstock <halr at voltaire.com>
> Date: October 14, 2005 12:41:13 PM CDT
> To: Troy Benjegerdes <troy at scl.ameslab.gov>
> Cc: IBMEHCA DD <IBMEHCAD at de.ibm.com>, openib-general at openib.org
> Subject: Re: [openib-general] Re: IBM eHCA testing..
>
>
> On Fri, 2005-10-14 at 12:08, Troy Benjegerdes wrote:
>> Hal Rosenstock wrote:
>>
>>> On Thu, 2005-10-13 at 18:46, Troy Benjegerdes wrote:
>>>
>>>
>>>> I'm also attaching part of an opensm log file.
>>>>
>>>> (the full copy is at http://scl.ameslab.gov/~troy/osm-ehca.log )
>>>>
>>>> The IBM galaxy adapters are at:
>>>> Initial path: [0][1][16]
>>>> Initial path: [0][1][13]
>>>>
>>>>
>>>>
>>>
>>> The OpenSM is just saying that a SMP transaction it issued (in this
>>> case, SM Get P_KeyTable) is timing out (no response made it back to
>>> OpenSM).
>>>
>>> BTW, what svn rev is OpenSM up to ?
>>>
>>> -- Hal
>>>
>>>
>> So, how about a patch to opensm to report what svn rev it was built
>> from ;)
>
> Can you do svn info in the userspace/management/osm directory ?
Path: .
URL: https://openib.org/svn/gen2/trunk/src/linux-kernel/infiniband
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3493
Node Kind: directory
Schedule: normal
Last Changed Author: roland
Last Changed Rev: 3487
Last Changed Date: 2005-09-19 17:59:27 -0500 (Mon, 19 Sep 2005)
Properties Last Updated: 2005-02-15 16:24:20 -0600 (Tue, 15 Feb 2005)
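
On the idea of having opensm report the svn rev it was built from: a
build step could pull the numbers out of output like the above. This is
only a minimal Python sketch, assuming the text is exactly what svn info
printed here. parse_svn_info and build_banner are hypothetical helper
names, not anything that exists in OpenSM.

```python
# Sample text copied from the svn info output in this message.
SVN_INFO = """\
Path: .
URL: https://openib.org/svn/gen2/trunk/src/linux-kernel/infiniband
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3493
Node Kind: directory
Schedule: normal
Last Changed Author: roland
Last Changed Rev: 3487
Last Changed Date: 2005-09-19 17:59:27 -0500 (Mon, 19 Sep 2005)
"""


def parse_svn_info(text):
    """Turn the 'Key: value' lines of svn info output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so URLs and dates stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def build_banner(text):
    """Hypothetical banner string a tool could print at startup."""
    info = parse_svn_info(text)
    return "svn r{} (last changed r{})".format(
        info["Revision"], info["Last Changed Rev"]
    )


print(build_banner(SVN_INFO))
```

For the checkout shown here this yields the working revision 3493 and
the last changed revision 3487.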
>
>> I just discovered another problem. We have been running PVFS2 over
>> IPoIB on the same subnet, and while debugging this I restarted opensm
>> several times, and somewhere in the stack a PVFS2 write failed. I
>> wouldn't think that a short downtime of the SM from restarting it
>> would cause any IPoIB TCP sessions to fall over.
>
> As Fab indicated, there are a number of places where the SM/SA is
> needed:
> 1. SA PathRecords (used when a path to a new IP end node is needed or
> an existing one times out)
> 2. SA MCMemberRecord joins, queries, and leaves (used when an interface
> is up'ed, down'ed, etc.)
>
> Is this on an existing TCP session ? Is it OpenIB IPoIB clients at each
> end ? What svn version is being used for this ?
>
> -- Hal
>
It looks like each client node maintains an open TCP stream to each of
the servers. pvfs2 does not appear to be very robust to failure. However,
the pvfs2 folks just released a new version which changes their network
protocol somewhat. I plan to get the new version installed next week
and will see if it handles things a bit more robustly.
Brett