[ofa-general] Re: [ewg] Update from September OpenFabrics Interoperability Event at UNH-IOL

pandit ib ranjit.pandit.ib at gmail.com
Fri Sep 4 11:27:41 PDT 2009


Has there been any new interoperability testing between the iWARP
vendors since Oct 08?

Ranjit


On Tue, Oct 21, 2008 at 9:40 AM, Bob Noseworthy <ren at iol.unh.edu> wrote:
> Greetings EWG members,
> A bug for the observed IPoIB issue was logged last Friday and updated
> yesterday to confirm that RC3 still exhibits the issue. It is logged
> as #1287: https://bugs.openfabrics.org/show_bug.cgi?id=1287
>
> Further issues/observations from the recent OFA Interoperability Logo
> Group's September Interoperability Event are at the end of this email.
> Summary of the reported IPoIB issue:
> If IPoIB datagram mode is enabled, IP frames of 8K or larger are sent,
> and no ARP entry exists for the destination, then the first IP frame is
> always lost (observed with ping), no matter what the timeout is set to
> (values as high as 15 s were tried).
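The reproduction conditions above can be captured in a small script. This is a minimal Python sketch, assuming a Linux host where the IPoIB interface exposes its mode in `/sys/class/net/<iface>/mode` and where iputils `ping` and iproute2 `ip` are installed; the helper names, the interface name `ib0`, and the mapping of "timeout" to `ping -W` are assumptions for illustration, not the testers' actual procedure.

```python
from __future__ import annotations

from pathlib import Path


def ipoib_mode(iface: str, sysfs_root: str = "/sys/class/net") -> str:
    """Return the IPoIB mode ("datagram" or "connected") read from sysfs."""
    return (Path(sysfs_root) / iface / "mode").read_text().strip()


def build_repro_commands(iface: str, dest: str, payload: int = 8192,
                         timeout_s: int = 15) -> list[list[str]]:
    """Hypothetical helper: argv lists recreating the conditions of bug #1287.

    Running them needs root and a real IPoIB interface; this function only
    builds the commands and does not execute anything.
    """
    return [
        # Remove cached neighbour (ARP) entries so the destination is unresolved.
        ["ip", "neigh", "flush", "dev", iface],
        # Send one echo request; an ICMP payload of 8192 bytes yields an IP
        # frame larger than 8K once the ICMP and IP headers are added.
        ["ping", "-c", "1", "-s", str(payload), "-W", str(timeout_s), dest],
    ]
```

On a host where `ipoib_mode("ib0")` returns `"datagram"`, the two commands would be run in order (for example with `subprocess.run(cmd)`); per the report, the single echo request would go unanswered even with the 15 s wait.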
>
>
> The following is a short summary of various updates from the September
> OpenFabrics Interoperability Event. For confidentiality reasons, many
> details are omitted. Per the request of the IWG on Oct 14, this
> information is being shared with the EWG.
>
> ==================
>
>
> Below are rough notes from our testers, principally Nick Wood and Mike
> Hagen.
> IB update:
>
> 1. An SDP issue was observed once and not reproduced. It is suspected to
> have been caused by starting the tests too soon after netserver was
> started, with all three SDP tests running simultaneously. During retesting
> the tests were not run simultaneously, and no issues were seen.
>
> 2. An SRP issue was observed once and not reproduced: a vendor's SRP target
> became unresponsive when srp_sg_tablesize was increased to 255. Subsequent
> testing did not reproduce this behavior, but the issue is still being
> pursued.
>
> 2a. A vendor's HCA was seen to perform slowly on SRP transfers. This was
> traced to the default srp_sg_tablesize of 16, which had to be increased to
> 131 for reasonable performance. Reminder: performance is outside the scope
> of the Logo program. Tziporet, perhaps this default value could be
> increased as recommended, unless there is a reason 16 is preferred.
>
>
>
> 3. There is a link issue between two vendors' HCA cards. The fix that
> was introduced allowed the link indication light to come up; however,
> ibdiagnet never completes (it hangs at the IPoIB subnets check) and had to
> be killed. ibdiagnet also reports the following error:
>
>
> -I---------------------------------------------------
> -I- PM Counters Info
> -I---------------------------------------------------
> -E- Could not get PM info:
>  "pmGetPortCounters 0xffff 1" failed 4 consecutive times.
> -E- Could not get PM info:
>  "pmGetPortCounters 0xffff 1" failed 4 consecutive times.
> -I- No illegal PM counters values were found
>
> This happens with both VendorA cards when linked to a VendorB card of any
> speed *without* an SM running. If an SM is running and the fix is in
> place on the machines housing the VendorA cards, everything works
> flawlessly when linked to a VendorB card of any speed.
>
> When the cable is removed from the VendorA card, that card is put into a
> bad state, with the fix in place and an SM running: the SM does not
> activate the newly established link. This happened when connecting any
> VendorA card to any VendorB card. OpenSM also reports an error on screen:
> "OpenSM: SM port is down". Reestablishing the connection that was in place
> when the OpenSM instance was started restores the active state.
>
> One final bit of information that I have been able to glean: it does not
> appear to matter whether you restore the original connection that OpenSM
> was started on. The only connection that brings the card back to an active
> state is a link to a QDR HCA, even if that connection was not the
> original. If you then attempt to restore the original connection, the
> active state is not restored.
> Currently this issue is presumed to be principally a vendor matter, but if
> evidence points to additional issues with ibdiagnet or other OFED
> components, bugs will be filed.
>
>
> 4. Similar to the above issue, two vendors' HCAs that should link at DDR
> when directly connected were observed to link at SDR speeds, regardless of
> the cable used. This is a known issue; however, it appears to be a failure
> of the Link Init test procedure, since the highest common speed is not
> achieved.
>
> 5. An issue with ibdiagnet was discovered by a vendor and bugs were
> submitted (unrelated to issue 3 above).
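The expectation behind item 4 (two directly connected ports should come up at the fastest speed both support) can be sketched as a simple check. This is a minimal Python sketch under stated assumptions: the per-lane rates are the standard SDR/DDR/QDR signalling rates, while the port capability sets and the `link_init_ok` pass/fail rule are hypothetical illustrations, not the IWG test plan's actual Link Init procedure.

```python
from __future__ import annotations

# Per-lane signalling rate in Gb/s for the InfiniBand speeds discussed here.
LANE_RATE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}


def expected_speed(port_a: set[str], port_b: set[str]) -> str | None:
    """Fastest speed supported by both ends of a link, or None if none is shared."""
    common = port_a & port_b
    if not common:
        return None
    return max(common, key=lambda speed: LANE_RATE_GBPS[speed])


def link_init_ok(port_a: set[str], port_b: set[str], negotiated: str) -> bool:
    """Hypothetical pass/fail rule: the link must come up at the expected speed."""
    expected = expected_speed(port_a, port_b)
    return expected is not None and negotiated == expected
```

Under this rule, a DDR-capable HCA linked to a QDR-capable HCA is expected at DDR, so the SDR links reported in item 4 would make `link_init_ok(...)` return `False`.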
>
> ==================
>
> iWARP update:
>
> 1. "dapltest -T P" will not work between two cards. Each has implemented a
> different peer-to-peer protocol that ensures the client performs a transfer
> before the server does, to overcome the limitation in the iWARP standard
> that the client must send the first data or the connection must be torn
> down.
>
> 2. The dapl section of the IWG test suite must be updated to include at
> least some reference to /etc/dat.conf, which must be configured in order to
> use any dapl-based application, including many MPIs and dapltest.
> (Arlin Davis was addressing this.)
>
> 3. dapl2.0 and dapltest2.0 do not work with iWARP devices. Starting from a
> base OFED 1.4 install, dapl2.0-utils must be uninstalled and compat-dapl
> must be installed from the OFED website.
>
> 4. Due to the dapl problems, Intel MPI works in single-vendor environments
> but will not work in multi-vendor environments.
>
> 5. The default OpenMPI installed with OFED 1.4 is version 1.2.7. iWARP
> support is not officially added until OpenMPI 1.3.
>
> 6. Loopback functionality has still not been observed on all vendors'
> devices (this is relevant to OFED feature enhancement #1275,
> <https://bugs.openfabrics.org/show_bug.cgi?id=1275>).
>
> 7. Dynamic link support was not observed on all vendors' devices when using Intel MPI.
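Items 2 and 3 of the iWARP update both hinge on what /etc/dat.conf contains. The following is a minimal Python sketch, assuming the common eight-field dat.conf line layout (IA name, API version, thread safety, default flag, library path, provider version, quoted IA parameters, quoted platform parameters), that separates entries targeting the DAT 1.2 API (as installed by compat-dapl) from those targeting DAT 2.0. The `DatEntry`, `parse_dat_conf`, and `entries_for_api` names are hypothetical, and real files may carry provider-specific variations this parser does not handle.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass


@dataclass
class DatEntry:
    ia_name: str
    api_version: str       # e.g. "u1.2" (DAT 1.2) or "u2.0" (DAT 2.0)
    threadsafety: str
    default: str
    lib_path: str
    provider_version: str
    ia_params: str         # quoted in the file, e.g. "ib0 0"
    platform_params: str   # often an empty quoted string ""


def parse_dat_conf(text: str) -> list[DatEntry]:
    """Parse dat.conf text, skipping blank lines and '#' comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # shlex keeps quoted fields such as "ib0 0" together and turns ""
        # into an empty string, so a well-formed line yields eight fields.
        fields = shlex.split(line)
        if len(fields) != 8:
            raise ValueError(f"expected 8 fields, got {len(fields)}: {raw!r}")
        entries.append(DatEntry(*fields))
    return entries


def entries_for_api(entries: list[DatEntry], major: int) -> list[DatEntry]:
    """Select entries whose API version starts with u<major>."""
    return [e for e in entries if e.api_version.startswith(f"u{major}.")]
```

A typical call would be `entries_for_api(parse_dat_conf(Path("/etc/dat.conf").read_text()), 1)` to confirm that DAT 1.2 entries for the iWARP interfaces are present before running dapltest or a dapl-based MPI.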
>
>
> ==================
> ==================
>
>
> Testing is ongoing with RC3 and future 1.4 RCs on a best-effort basis until
> the GA, at which time the Logo Event will be held for those participating.
> If you have additional questions about these comments, the
> Interoperability Events, Logo Events, or the OFA Interoperability Test
> Plan, please feel free to contact us here at UNH-IOL. Our OFA
> Interoperability Logo Group team can be reached at ofalab at iol.unh.edu.
> The test plan, logo list and past logo reports can be reviewed at
> http://www.iol.unh.edu/services/testing/ofa/
>
>
> Best Regards,
> - Bob Noseworthy
>  Chief Engineer / Technical Sherpa
>    +1-909-891-0090 {unified phone number for office, cell, etc}
>  +1-603-862-0090 {IOL Main number-associate this with any shipments}
>  UNH-IOL
>
> Rupert Dance wrote:
>>
>> I have sent another reminder to UNH IOL to get this logged. I will
>> continue to follow up on this.
>>
>> Thanks
>>
>> Rupert
>> -----Original Message-----
>> From: Tziporet Koren [mailto:tziporet at dev.mellanox.co.il]
>> Sent: Sunday, October 19, 2008 8:48 AM
>> To: Rupert Dance
>> Cc: EWG
>> Subject: Have you opened bugs to OFED 1.4
>>
>> I mean the bugs you explained in the last OFED meeting.
>>
>> Thanks
>> Tziporet
>>
>>
>
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>