[ewg] Interop test failure using OFED-3.5 RC4

Marciniszyn, Mike mike.marciniszyn at intel.com
Mon Jan 14 09:58:17 PST 2013


The new package has been posted, and I verified that the qib <-> qib issue is gone with the new tarball. Ido has RESOLVED bz 2410 as well.

Interop testing could be done with the new perftest on RC4, or it could just wait for the next RC.

Mike

> -----Original Message-----
> From: Woodruff, Robert J
> Sent: Monday, January 14, 2013 12:52 PM
> To: Ido Shamai; Marciniszyn, Mike
> Cc: Elken, Tom; ewg at lists.openfabrics.org; Hefty, Sean; Mascarenhas, Edward;
> Tziporet Koren
> Subject: RE: Interop test failure using OFED-3.5 RC4
> 
> Were you able to get the new package posted yet?
> 
> We need this ASAP so we can do another OFED-3.5 RC.
> 
> Woody
> 
> 
> -----Original Message-----
> From: Ido Shamai [mailto:idos at dev.mellanox.co.il]
> Sent: Friday, January 11, 2013 12:32 PM
> To: Marciniszyn, Mike
> Cc: Woodruff, Robert J; Elken, Tom; ewg at lists.openfabrics.org; Hefty, Sean;
> Mascarenhas, Edward
> Subject: Re: Interop test failure using OFED-3.5 RC4
> 
> On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:
> > I've opened OFED bz 2410 for this issue.
> >
> > Mike
> 
> Great, thanks.
> I will apply the patch and release a new version to the OFED website tomorrow
> morning.
> 
> Ido
> 
> >> -----Original Message-----
> >> From: Woodruff, Robert J
> >> Sent: Friday, January 11, 2013 1:30 PM
> >> To: Marciniszyn, Mike; Elken, Tom; ewg at lists.openfabrics.org; Ido
> >> Shamai
> >> Subject: RE: Interop test failure using OFED-3.5 RC4
> >>
> >>
> >> Adding Shamai from Mellanox to this thread.
> >>
> >> Woody
> >>
> >> -----Original Message-----
> >> From: ewg-bounces at lists.openfabrics.org [mailto:ewg-
> >> bounces at lists.openfabrics.org] On Behalf Of Marciniszyn, Mike
> >> Sent: Friday, January 11, 2013 7:51 AM
> >> To: Elken, Tom; ewg at lists.openfabrics.org
> >> Subject: Re: [ewg] Interop test failure using OFED-3.5 RC4
> >>
> >> This is definitely a perftest bug.
> >>
> >> This is a significant re-write of these utilities and this bug is a
> >> regression in the routine ctx_set_out_reads().
> >>
> >> In 1.4 the code is this:
> >>
> >> /******************************************************************************
> >>   *
> >>   ******************************************************************************/
> >> static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {
> >>
> >>          int max_reads;
> >>
> >>          max_reads = (is_dev_hermon(context) == HERMON) ? MAX_OUT_READ_HERMON : MAX_OUT_READ; <---------------
> >>
> >>          if (num_user_reads > max_reads) {
> >>                  fprintf(stderr," Number of outstanding reads is above max = %d\n",max_reads);
> >>                  fprintf(stderr," Changing to that max value\n");
> >>                  num_user_reads = max_reads;
> >>          }
> >>          else if (num_user_reads <= 0) {
> >>                  num_user_reads = max_reads;
> >>          }
> >>
> >>          return num_user_reads;
> >> }
> >>
> >> The new 2.0 code is:
> >>
> >> /******************************************************************************
> >>   *
> >>   ******************************************************************************/
> >> static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads) {
> >>
> >>          int max_reads;
> >>
> >>          Device ib_fdev = ib_dev_name(context);
> >>
> >>          switch (ib_fdev) {
> >>                  case CONNECTIB : ;
> >>                  case CONNECTX3 : ;
> >>                  case CONNECTX2 : ;
> >>                  case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
> >>                  case LEGACY : max_reads = MAX_OUT_READ; break;
> >>                  default : max_reads = 0; <--------------------
> >>          }
> >>
> >>          if (num_user_reads > max_reads) {
> >>                  printf(RESULT_LINE);
> >>                  fprintf(stderr," Number of outstanding reads is above max = %d\n",max_reads);
> >>                  fprintf(stderr," Changing to that max value\n");
> >>                  num_user_reads = max_reads;
> >>          }
> >>          else if (num_user_reads <= 0) {
> >>                  num_user_reads = max_reads;
> >>          }
> >>
> >>          return num_user_reads;
> >> }
> >>
> >> For any HCA not listed in the switch (qib and probably others), the old
> >> code falls back to MAX_OUT_READ, while the new code sets the maximum to 0
> >> and therefore returns 0.
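
The effect of a zero maximum can be seen in isolation with a minimal sketch of the clamping step. The helper name clamp_out_reads and the limit of 4 are assumptions made for illustration; they are not taken from perftest, and the real function also prints warnings.

```c
/* Hypothetical stand-in for the clamping logic in ctx_set_out_reads().
 * max_reads is passed in directly instead of being derived from the device. */
static int clamp_out_reads(int num_user_reads, int max_reads)
{
    if (num_user_reads > max_reads)
        return max_reads;   /* more requested than allowed: use the max */
    if (num_user_reads <= 0)
        return max_reads;   /* nothing requested: default to the max */
    return num_user_reads;  /* request is within range: keep it */
}
```

With a nonzero limit such as clamp_out_reads(1, 4), the request of 1 is kept. With clamp_out_reads(1, 0), which is what an unlisted device gets in the 2.0 code, any positive request is above the maximum and collapses to 0 outstanding reads.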
> >>
> >> I have a patch that works, while preserving the desired hardcoded
> >> values for "known/legacy" devices:
> >> +
> >> +/******************************************************************************
> >> + *
> >> + ******************************************************************************/
> >> +static int device_max_reads(struct ibv_context *context) {
> >> +       struct ibv_device_attr attr;
> >> +       int ret = 0;
> >> +
> >> +       if (!ibv_query_device(context,&attr)) {
> >> +               ret = attr.max_qp_rd_atom;
> >> +       }
> >> +       return ret;
> >> +}
> >> +
> >>  /******************************************************************************
> >>    *
> >>    ******************************************************************************/
> >> @@ -496,7 +510,7 @@ static int ctx_set_out_reads(struct ibv_
> >>                  case CONNECTX2 : ;
> >>                  case CONNECTX : max_reads = MAX_OUT_READ_HERMON; break;
> >>                  case LEGACY : max_reads = MAX_OUT_READ; break;
> >> -               default : max_reads = 0;
> >> +               default : max_reads = device_max_reads(context);
> >>          }
> >>
> >>          if (num_user_reads > max_reads) {
> >>
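
The fallback in the patch can be sketched without libibverbs by substituting a stub for the device query. Everything named here (device_kind, pick_max_reads, the SKETCH_ constants and the fake_query_ helpers) is hypothetical and only mirrors the shape of the patch. The stubs follow the ibv_query_device() convention of returning 0 on success.

```c
/* Hypothetical device classes; only the shape of the switch matters here. */
enum device_kind { KNOWN_HERMON, KNOWN_LEGACY, UNKNOWN };

/* Placeholder limits, not the real perftest constants. */
#define SKETCH_MAX_OUT_READ_HERMON 16
#define SKETCH_MAX_OUT_READ        4

/* Stand-in for a device query: returns 0 on success and fills in the limit. */
typedef int (*query_fn)(int *max_rd_atom);

static int pick_max_reads(enum device_kind kind, query_fn query)
{
    int limit = 0;

    switch (kind) {
    case KNOWN_HERMON:
        return SKETCH_MAX_OUT_READ_HERMON;
    case KNOWN_LEGACY:
        return SKETCH_MAX_OUT_READ;
    default:
        /* Unlisted device: ask the device, fall back to 0 if the query fails. */
        if (query(&limit) != 0)
            limit = 0;
        return limit;
    }
}

/* Stub queries for illustration: one succeeds and reports 16, one fails. */
static int fake_query_ok(int *max_rd_atom)   { *max_rd_atom = 16; return 0; }
static int fake_query_fail(int *max_rd_atom) { (void)max_rd_atom; return -1; }
```

In this sketch an unlisted device gets whatever the query reports, and the result is 0 only when the query itself fails, which matches the behavior of device_max_reads() in the patch.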
> >> I'm curious why both the old and the new code use hardcoded values.
> >>
> >> Mike
> >> _______________________________________________
> >> ewg mailing list
> >> ewg at lists.openfabrics.org
> >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg



