[ewg] Interop test failure using OFED-3.5 RC4

Elken, Tom tom.elken at intel.com
Mon Jan 14 10:03:24 PST 2013


BTW,
Mike posted an alternate patch to Bug 2410 that removes the hard-coded values for _all_ HCAs by calling ibv_query_device() to query the HCA.
Thankfully, Ido used that alternate patch.

-Tom
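
For readers following the thread, here is a minimal sketch of the clamping logic discussed in the quoted messages. It is not the perftest source: clamp_out_reads() is a hypothetical helper, and the device limit is passed in as a plain integer so the example builds without libibverbs. In the patched tool that limit would come from attr.max_qp_rd_atom after a successful ibv_query_device() call, as in Mike's device_max_reads().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of perftest): clamp the user-requested
 * number of outstanding RDMA reads to the limit in max_reads, mirroring
 * the if/else logic of ctx_set_out_reads() quoted in this thread.
 */
static int clamp_out_reads(int num_user_reads, int max_reads)
{
    /* Assumption: a device limit is never negative. */
    assert(max_reads >= 0);

    if (num_user_reads > max_reads) {
        fprintf(stderr, " Number of outstanding reads is above max = %d\n",
                max_reads);
        fprintf(stderr, " Changing to that max value\n");
        return max_reads;
    }

    /* A request of zero or less means "use the device maximum". */
    if (num_user_reads <= 0)
        return max_reads;

    return num_user_reads;
}
```

With max_reads set to 0, which is what the 2.0 default case produced for unlisted HCAs, every request collapses to 0 outstanding reads. With a limit reported by the device (16 is only an illustrative value here), a request of 4 is kept, 32 is clamped to 16, and 0 selects 16.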

> -----Original Message-----
> From: Marciniszyn, Mike
> Sent: Monday, January 14, 2013 9:58 AM
> To: Woodruff, Robert J; Ido Shamai
> Cc: Elken, Tom; ewg at lists.openfabrics.org; Hefty, Sean; Mascarenhas, Edward;
> Tziporet Koren; rsdance at soft-forge.com
> Subject: RE: Interop test failure using OFED-3.5 RC4
> 
> The new package has been posted, and I verified that the qib <-> qib issue is
> gone with the new tarball. Ido has RESOLVED bz 2410 as well.
> 
> Interop could be done with the new perftest/rc4 or just wait for the next RC.
> 
> Mike
> 
> > -----Original Message-----
> > From: Woodruff, Robert J
> > Sent: Monday, January 14, 2013 12:52 PM
> > To: Ido Shamai; Marciniszyn, Mike
> > Cc: Elken, Tom; ewg at lists.openfabrics.org; Hefty, Sean; Mascarenhas, Edward;
> > Tziporet Koren
> > Subject: RE: Interop test failure using OFED-3.5 RC4
> >
> > Were you able to get the new package posted yet?
> >
> > We need this ASAP so we can do another OFED-3.5 RC.
> >
> > Woody
> >
> >
> > -----Original Message-----
> > From: Ido Shamai [mailto:idos at dev.mellanox.co.il]
> > Sent: Friday, January 11, 2013 12:32 PM
> > To: Marciniszyn, Mike
> > Cc: Woodruff, Robert J; Elken, Tom; ewg at lists.openfabrics.org; Hefty, Sean;
> > Mascarenhas, Edward
> > Subject: Re: Interop test failure using OFED-3.5 RC4
> >
> > On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:
> > > I've opened OFED bz 2410 for this issue.
> > >
> > > Mike
> >
> > Great, thanks.
> > I will apply the patch and release a new version to the OFED website tomorrow
> > morning.
> >
> > Ido
> >
> > >> -----Original Message-----
> > >> From: Woodruff, Robert J
> > >> Sent: Friday, January 11, 2013 1:30 PM
> > >> To: Marciniszyn, Mike; Elken, Tom; ewg at lists.openfabrics.org; Ido
> > >> Shamai
> > >> Subject: RE: Interop test failure using OFED-3.5 RC4
> > >>
> > >>
> > >> Adding Shamai from Mellanox to this thread.
> > >>
> > >> Woody
> > >>
> > >> -----Original Message-----
> > >> From: ewg-bounces at lists.openfabrics.org [mailto:ewg-bounces at lists.openfabrics.org]
> > >> On Behalf Of Marciniszyn, Mike
> > >> Sent: Friday, January 11, 2013 7:51 AM
> > >> To: Elken, Tom; ewg at lists.openfabrics.org
> > >> Subject: Re: [ewg] Interop test failure using OFED-3.5 RC4
> > >>
> > >> This is definitely a perftest bug.
> > >>
> > >> This is a significant re-write of these utilities and this bug is a
> > >> regression in the routine ctx_set_out_reads().
> > >>
> > >> In 1.4 the code is this:
> > >>
> > >> /******************************************************************************
> > >>  *
> > >>  ******************************************************************************/
> > >> static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {
> > >>
> > >>          int max_reads;
> > >>
> > >>          max_reads = (is_dev_hermon(context) == HERMON) ? MAX_OUT_READ_HERMON : MAX_OUT_READ; <---------------
> > >>
> > >>          if (num_user_reads > max_reads) {
> > >>                  fprintf(stderr," Number of outstanding reads is above max = %d\n",max_reads);
> > >>                  fprintf(stderr," Changing to that max value\n");
> > >>                  num_user_reads = max_reads;
> > >>          }
> > >>          else if (num_user_reads <= 0) {
> > >>                  num_user_reads = max_reads;
> > >>          }
> > >>
> > >>          return num_user_reads;
> > >> }
> > >>
> > >> The new 2.0 code is:
> > >>
> > >> /******************************************************************************
> > >>  *
> > >>  ******************************************************************************/
> > >> static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {
> > >>
> > >>          int max_reads;
> > >>
> > >>          Device ib_fdev = ib_dev_name(context);
> > >>
> > >>          switch (ib_fdev) {
> > >>                  case CONNECTIB : ;
> > >>                  case CONNECTX3 : ;
> > >>                  case CONNECTX2 : ;
> > >>                  case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
> > >>                  case LEGACY : max_reads = MAX_OUT_READ; break;
> > >>                  default : max_reads = 0; <--------------------
> > >>          }
> > >>
> > >>          if (num_user_reads > max_reads) {
> > >>                  printf(RESULT_LINE);
> > >>                  fprintf(stderr," Number of outstanding reads is above max = %d\n",max_reads);
> > >>                  fprintf(stderr," Changing to that max value\n");
> > >>                  num_user_reads = max_reads;
> > >>          }
> > >>          else if (num_user_reads <= 0) {
> > >>                  num_user_reads = max_reads;
> > >>          }
> > >>
> > >>          return num_user_reads;
> > >> }
> > >>
> > >> For any HCA not listed in the switch (qib and probably others), the old
> > >> code returns MAX_OUT_READ, while the new code returns 0.
> > >>
> > >> I have a patch that works, while preserving the desired hardcoded
> > >> values for "known/legacy" devices:
> > >> +
> > >> +/******************************************************************************
> > >> + *
> > >> + ******************************************************************************/
> > >> +static int device_max_reads(struct ibv_context *context) {
> > >> +       struct ibv_device_attr attr;
> > >> +       int ret = 0;
> > >> +
> > >> +       if (!ibv_query_device(context,&attr)) {
> > >> +               ret = attr.max_qp_rd_atom;
> > >> +       }
> > >> +       return ret;
> > >> +}
> > >> +
> > >>  /******************************************************************************
> > >>   *
> > >>   ******************************************************************************/
> > >> @@ -496,7 +510,7 @@ static int ctx_set_out_reads(struct ibv_
> > >>                  case CONNECTX2 : ;
> > >>                  case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
> > >>                  case LEGACY : max_reads = MAX_OUT_READ; break;
> > >> -               default : max_reads = 0;
> > >> +               default : max_reads = device_max_reads(context);
> > >>          }
> > >>
> > >>          if (num_user_reads > max_reads) {
> > >>
> > >> I'm curious why the old and new code used hardcoded values?
> > >>
> > >> Mike