[ewg] Update from September OpenFabrics Interoperability Event at UNH-IOL

Bob Noseworthy ren at iol.unh.edu
Sat Sep 5 10:18:57 PDT 2009


Yes indeed; please refer to the Logo List published in June 09 for details:
http://www.iol.unh.edu/services/testing/ofa/interoplist/09jun/#rnic

Best Regards,
- Bob Noseworthy
  Chief Engineer / Technical Sherpa
  +1-909-891-0090 {unified phone number for office, cell, etc}
  +1-603-862-0090 {IOL Main number-associate this with any shipments}
  University of New Hampshire's InterOperability Laboratory (UNH-IOL)

pandit ib wrote:
> Has there been any new interoperability testing between the iWARP
> vendors since Oct 08?
>
> Ranjit
>
>
> On Tue, Oct 21, 2008 at 9:40 AM, Bob Noseworthy<ren at iol.unh.edu> wrote:
>   
>> Greetings EWG members,
>>  A bug for the observed IPoIB issue was logged last Friday,  and updated
>> yesterday confirming that RC3 still demonstrates the issue. This is logged
>> as #1287 --  https://bugs.openfabrics.org/show_bug.cgi?id=1287
>>
>> Further issues/observations from the recent OFA Interoperability Logo
>> Group's September Interoperability Event are at the end of this email.
>> Summary of reported IPoIB issue:
>> If IPoIB datagram mode is enabled, IP frames of 8K or larger are sent, and
>> no ARP entry exists for the destination, then the first IP frame is always
>> lost (tested with ping), no matter what the timeout is set to (as high as
>> 15s).
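
The reproduction conditions summarized above could be scripted roughly as follows. This is only a sketch, not the procedure used at the event: the interface name ib0, the helper names, and the choice of an 8192-byte ICMP payload are assumptions made for illustration. The sysfs "mode" file and the ping options -c, -s and -W are the usual Linux ones.

```python
from pathlib import Path


def ipoib_mode(iface="ib0", sysfs_root="/sys/class/net"):
    """Return the IPoIB mode string ('datagram' or 'connected') for iface."""
    return Path(sysfs_root, iface, "mode").read_text().strip()


def flush_neighbors_command(iface="ib0"):
    """Command that clears neighbor (ARP) entries so none exists for the target."""
    return ["ip", "neigh", "flush", "dev", iface]


def first_ping_command(target, payload_bytes=8192, timeout_s=15):
    """Single ping with a large payload and a long reply timeout."""
    return ["ping", "-c", "1", "-s", str(payload_bytes),
            "-W", str(timeout_s), target]
```

The commands are returned as argument lists so they can be inspected first and then passed to subprocess.run() on a test host, with root privileges for the neighbor flush.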
>>
>>
>> The following is a short summary of various updates from the September
>> OpenFabrics Interoperability Event.  Due to confidentiality reasons, many
>> details are occluded.  Per the request of the IWG on Oct 14, this
>> information is being shared with the EWG.
>>
>> ==================
>>
>>
>> Below are rough notes from our testers, principally Nick Wood and Mike
>> Hagen.
>> IB update:
>>
>> 1. An SDP issue was observed once and not reproduced - suspected to be an
>> issue with starting testing too soon after netserver was started while all
>> three SDP tests were running simultaneously.   When retesting was performed
>> tests were not run simultaneously and no issues were seen.
>>
>> 2. An SRP issue was observed once and not reproduced - A vendor's SRP target
>> was seen to become unresponsive when srp_sg_tablesize was increased to 255.
>>  Subsequent testing did not reproduce this behavior, but it is still being
>> pursued.
>>
>> 2a.  A vendor's HCA was seen to perform slowly on SRP transfers;  this was
>> traced to the default srp_sg_tablesize of 16, which had to be
>> increased to 131 for reasonable performance.    Reminder - performance is
>> outside the scope of the Logo program. Tziporet - this default value perhaps
>> could be increased as recommended unless there is a reason 16 is preferred.
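
As an illustration of the srp_sg_tablesize note above, the current value can be checked and a persistent override line generated along these lines. This is a sketch under assumptions: it presumes the parameter belongs to the ib_srp module and is exposed at the standard sysfs module-parameter path, and the helper names are hypothetical. The value 131 is simply the one reported above, not a general recommendation.

```python
from pathlib import Path

# Value reported in the note above as giving reasonable performance.
REPORTED_SG_TABLESIZE = 131

# Assumed location of the parameter once the ib_srp module is loaded.
DEFAULT_PARAM_PATH = "/sys/module/ib_srp/parameters/srp_sg_tablesize"


def read_sg_tablesize(param_path=DEFAULT_PARAM_PATH):
    """Read the scatter/gather table size currently in effect."""
    return int(Path(param_path).read_text().strip())


def modprobe_option_line(value=REPORTED_SG_TABLESIZE):
    """Build a modprobe configuration line that sets the parameter at load time."""
    return f"options ib_srp srp_sg_tablesize={value}"
```

The generated line would go into a modprobe configuration file, and the module has to be reloaded for it to take effect.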
>>
>>
>>
>> 3.     There is a link issue between two vendors' HCA cards. The fix that
>> was introduced allowed the link indication light to come up; however,
>> ibdiagnet never completes (hangs at IPoIB subnets check) and had to be
>> killed. ibdiagnet also reports the following error:
>>
>>
>> -I---------------------------------------------------
>> -I- PM Counters Info
>> -I---------------------------------------------------
>> -E- Could not get PM info:
>>  "pmGetPortCounters 0xffff 1" failed 4 consecutive times.
>> -E- Could not get PM info:
>>  "pmGetPortCounters 0xffff 1" failed 4 consecutive times.
>> -I- No illegal PM counters values were found
>>
>>  This happens with both VendorA cards when linked to any speed card from
>> VendorB *without* an sm running. If there is an sm running and the fix is in
>> place on the machines housing the VendorA cards then everything works
>> flawlessly when linked with any speed VendorB card.
>>
>>  Upon removal of the cable from the VendorA card, with the fix in place and
>> an sm running, that card gets put into a bad state: the sm does not activate
>> the newly established link. This happened with VendorA cards to any VendorB
>> card. OpenSM also reports an error on screen: OpenSM: SM port is down.
>> Reestablishing the connection that was in place when the opensm instance was
>> started restores the active state.
>>
>>  One final bit of information that I have been able to glean: it does not
>> appear to matter if you restore the original connection that the opensm was
>> started on. The only connection that brings the card back to an active state
>> is a link with a qdr hca, even if that connection was not the
>> original. If you then attempt to restore the original, the active state will
>> not be restored.
>> Currently this issue is presumed to be principally a vendor matter, but if
>> evidence points to additional issues with ibdiagnet, or other OFED matters,
>> then bugs will be filed.
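
When collecting evidence for a report like the one in item 3, the error lines can be pulled out of a saved ibdiagnet log with a few lines of Python. This is a minimal sketch: it assumes only the line prefixes visible in the excerpt above (-I- for info, -E- for errors, with an indented continuation line), and the function name is hypothetical.

```python
SAMPLE_LOG = '''\
-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-E- Could not get PM info:
 "pmGetPortCounters 0xffff 1" failed 4 consecutive times.
-E- Could not get PM info:
 "pmGetPortCounters 0xffff 1" failed 4 consecutive times.
-I- No illegal PM counters values were found
'''


def extract_errors(log_text):
    """Return each -E- message joined with its continuation line, if any."""
    lines = log_text.splitlines()
    errors = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith("-E-"):
            continue
        message = stripped[3:].strip()
        # A following line that does not start with '-' continues the message.
        if i + 1 < len(lines):
            nxt = lines[i + 1].strip()
            if nxt and not nxt.startswith("-"):
                message += " " + nxt
        errors.append(message)
    return errors
```

Run against the excerpt above, this returns two identical entries, one for each failed pmGetPortCounters query.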
>>
>>
>> 4.  Similar to the above issue,  it was observed that two vendors' HCAs that
>> should link at DDR when directly connected were actually linking at SDR
>> speeds, regardless of the cable used.  This is a known issue; however, it
>> appears to be a failure of the Link Init test procedure, as the highest
>> common speed is not achieved.
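
The check described in item 4 amounts to comparing the negotiated speed with the highest speed both ends support. A rough sketch follows, assuming the rate string has the form found in the sysfs "rate" file of an InfiniBand port, for example "20 Gb/sec (4X DDR)", and that a string with no speed suffix means SDR. The function names are hypothetical, and only SDR, DDR and QDR are considered.

```python
import re

# Speeds ordered from slowest to fastest.
SPEED_ORDER = ["SDR", "DDR", "QDR"]


def parse_rate(rate_text):
    """Return (width, speed) from a string such as '20 Gb/sec (4X DDR)'."""
    match = re.search(r"\((\d+)X(?:\s+(\w+))?\)", rate_text)
    if match is None:
        raise ValueError(f"unrecognized rate string: {rate_text!r}")
    # Assumption: a missing speed suffix means SDR.
    return int(match.group(1)), match.group(2) or "SDR"


def expected_speed(speed_a, speed_b):
    """Highest speed supported by both ends, i.e. the slower of the two."""
    return min(speed_a, speed_b, key=SPEED_ORDER.index)


def link_at_expected_speed(rate_text, speed_a, speed_b):
    """True if the negotiated speed equals the highest common speed."""
    _, negotiated = parse_rate(rate_text)
    return negotiated == expected_speed(speed_a, speed_b)
```

For the case reported above, two DDR-capable HCAs whose port reports "10 Gb/sec (4X)" would fail this check.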
>>
>> 5. An issue with ibdiagnet was discovered by a vendor and bugs were
>> submitted (unrelated to issue 3 above).
>>
>> ==================
>>
>> iWARP update:
>>
>> 1. "dapltest -T P" will not work between two cards.  Each has implemented
>> a different peer2peer protocol to ensure that the client does a transfer
>> before the server, working around the limitation in the iWARP standard
>> that the client must send the first data or the connection must be torn
>> down.
>>
>> 2. The section in the IWG test suite covering dapl must be updated to
>> include at least some reference to /etc/dat.conf which must be configured in
>> order to use any dapl based application including many MPIs and dapltest.
>> (This was being addressed by Arlin Davis)
>>
>> 3. dapl2.0 and dapltest2.0 do not work with iWARP devices.  From the base
>> OFED1.4 install dapl2.0-utils must be uninstalled and compat-dapl must be
>> installed from the OFED website.
>>
>> 4. Due to the dapl problems, Intel MPI works in single vendor environments
>> but will not work in multi-vendor environments.
>>
>> 5. The default OpenMPI installed with OFED 1.4 is version 1.2.7.  iWARP
>> support is officially not added until OpenMPI 1.3.
>>
>> 6. Loopback functionality is still not seen by all vendors.  (This has
>> relevance to OFED feature enhancement #1275
>> <https://bugs.openfabrics.org/show_bug.cgi?id=1275>.)
>>
>> 7.  Dynamic links support was not seen by all vendors when using Intel MPI.
>>
>>
>> ==================
>> ==================
>>
>>
>> Testing is ongoing with RC3 and future 1.4RCs on a best effort basis until
>> the GA, at which time the Logo Event will be held for those participating.
>>  If you have additional questions about these comments,  the
>> Interoperability Events, Logo Events,  or the OFA Interoperability Test
>> Plan, please feel free to contact us here at UNH-IOL;  our OFA
>> Interoperability Logo Group team can be reached at ofalab at iol.unh.edu.
>> The testplan, logo list and past logo reports can be reviewed at
>> http://www.iol.unh.edu/services/testing/ofa/
>>
>>
>> Best Regards,
>> - Bob Noseworthy
>>  Chief Engineer / Technical Sherpa
>>    +1-909-891-0090 {unified phone number for office, cell, etc}
>>  +1-603-862-0090 {IOL Main number-associate this with any shipments}
>>  UNH-IOL
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Rupert Dance wrote:
>>     
>>> I have sent another reminder to UNH IOL to get this logged. I will
>>> continue
>>> to follow up on this.
>>>
>>> Thanks
>>>
>>> Rupert
>>> -----Original Message-----
>>> From: Tziporet Koren [mailto:tziporet at dev.mellanox.co.il]
>>> Sent: Sunday, October 19, 2008 8:48 AM
>>> To: Rupert Dance
>>> Cc: EWG
>>> Subject: Have you opened bugs to OFED 1.4
>>>
>>> I mean the bugs you explained in the last OFED meeting.
>>>
>>> Thanks
>>> Tziporet
>>>
>>>
>>>       
>> _______________________________________________
>> ewg mailing list
>> ewg at lists.openfabrics.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>
>>     