[ewg] Interop test failure using OFED-3.5 RC4
Marciniszyn, Mike
mike.marciniszyn at intel.com
Mon Jan 14 09:58:17 PST 2013
The new package has been posted, and I verified that the qib <-> qib issue is gone with the new tarball. Ido has RESOLVED bz 2410 as well.
Interop testing could proceed with the new perftest/rc4, or just wait for the next RC.
Mike
> -----Original Message-----
> From: Woodruff, Robert J
> Sent: Monday, January 14, 2013 12:52 PM
> To: Ido Shamai; Marciniszyn, Mike
> Cc: Elken, Tom; ewg at lists.openfabrics.org; Hefty, Sean; Mascarenhas, Edward;
> Tziporet Koren
> Subject: RE: Interop test failure using OFED-3.5 RC4
>
> Were you able to get the new package posted yet?
>
> We need this ASAP so we can do another OFED-3.5 RC.
>
> Woody
>
>
> -----Original Message-----
> From: Ido Shamai [mailto:idos at dev.mellanox.co.il]
> Sent: Friday, January 11, 2013 12:32 PM
> To: Marciniszyn, Mike
> Cc: Woodruff, Robert J; Elken, Tom; ewg at lists.openfabrics.org; Hefty, Sean;
> Mascarenhas, Edward
> Subject: Re: Interop test failure using OFED-3.5 RC4
>
> On 1/11/2013 9:36 PM, Marciniszyn, Mike wrote:
> > I've opened OFED bz 2410 for this issue.
> >
> > Mike
>
> Great, thanks.
> I will apply the patch and release a new version to the OFED website tomorrow
> morning.
>
> Ido
>
> >> -----Original Message-----
> >> From: Woodruff, Robert J
> >> Sent: Friday, January 11, 2013 1:30 PM
> >> To: Marciniszyn, Mike; Elken, Tom; ewg at lists.openfabrics.org; Ido
> >> Shamai
> >> Subject: RE: Interop test failure using OFED-3.5 RC4
> >>
> >>
> >> Adding Shamai from Mellanox to this thread.
> >>
> >> Woody
> >>
> >> -----Original Message-----
> >> From: ewg-bounces at lists.openfabrics.org [mailto:ewg-
> >> bounces at lists.openfabrics.org] On Behalf Of Marciniszyn, Mike
> >> Sent: Friday, January 11, 2013 7:51 AM
> >> To: Elken, Tom; ewg at lists.openfabrics.org
> >> Subject: Re: [ewg] Interop test failure using OFED-3.5 RC4
> >>
> >> This is definitely a perftest bug.
> >>
> >> Version 2.0 is a significant rewrite of these utilities, and this bug is
> >> a regression in the routine ctx_set_out_reads().
> >>
> >> In 1.4 the code is this:
> >>
> >> /******************************************************************************
> >>  *
> >>  ******************************************************************************/
> >> static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads)
> >> {
> >>     int max_reads;
> >>
> >>     max_reads = (is_dev_hermon(context) == HERMON) ?
> >>                 MAX_OUT_READ_HERMON : MAX_OUT_READ; <---------------
> >>
> >>     if (num_user_reads > max_reads) {
> >>         fprintf(stderr," Number of outstanding reads is above max = %d\n",max_reads);
> >>         fprintf(stderr," Changing to that max value\n");
> >>         num_user_reads = max_reads;
> >>     }
> >>     else if (num_user_reads <= 0) {
> >>         num_user_reads = max_reads;
> >>     }
> >>
> >>     return num_user_reads;
> >> }
> >>
> >> The new 2.0 code is:
> >>
> >> /******************************************************************************
> >>  *
> >>  ******************************************************************************/
> >> static int ctx_set_out_reads(struct ibv_context *context,int num_user_reads)
> >> {
> >>     int max_reads;
> >>
> >>     Device ib_fdev = ib_dev_name(context);
> >>
> >>     switch (ib_fdev) {
> >>         case CONNECTIB : ;
> >>         case CONNECTX3 : ;
> >>         case CONNECTX2 : ;
> >>         case CONNECTX  : max_reads = MAX_OUT_READ_HERMON; break;
> >>         case LEGACY    : max_reads = MAX_OUT_READ; break;
> >>         default        : max_reads = 0; <--------------------
> >>     }
> >>
> >>     if (num_user_reads > max_reads) {
> >>         printf(RESULT_LINE);
> >>         fprintf(stderr," Number of outstanding reads is above max = %d\n",max_reads);
> >>         fprintf(stderr," Changing to that max value\n");
> >>         num_user_reads = max_reads;
> >>     }
> >>     else if (num_user_reads <= 0) {
> >>         num_user_reads = max_reads;
> >>     }
> >>
> >>     return num_user_reads;
> >> }
> >>
> >> For any HCA outside the known list (qib and probably others), the old
> >> code returns MAX_OUT_READ, while the new code returns 0.
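
To make the difference concrete, here is a minimal, self-contained sketch of the clamping logic that both versions share. It does not use libibverbs. `clamp_out_reads` is a hypothetical helper that mirrors the tail of ctx_set_out_reads(), and the value 4 for MAX_OUT_READ is an assumption for illustration, not a value quoted in this thread.

```c
#include <stdio.h>

/* Assumed value, for illustration only; the thread does not quote the define. */
#define MAX_OUT_READ 4

/*
 * Hypothetical helper mirroring the clamp at the end of ctx_set_out_reads():
 * a request above the limit is cut down to the limit, and a non-positive
 * request defaults to the limit.
 */
static int clamp_out_reads(int num_user_reads, int max_reads)
{
    if (num_user_reads > max_reads) {
        fprintf(stderr, " Number of outstanding reads is above max = %d\n",
                max_reads);
        fprintf(stderr, " Changing to that max value\n");
        num_user_reads = max_reads;
    } else if (num_user_reads <= 0) {
        num_user_reads = max_reads;
    }
    return num_user_reads;
}
```

With a limit of 0, as in the 2.0 default branch, every request collapses to 0: `clamp_out_reads(8, 0)`, `clamp_out_reads(1, 0)` and `clamp_out_reads(0, 0)` all return 0. With the assumed MAX_OUT_READ as the limit, a request of 2 passes through unchanged and a request of 0 or 8 becomes 4.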
> >>
> >> I have a patch that works, while preserving the desired hardcoded
> >> values for "known/legacy" devices:
> >> +
> >> +/******************************************************************************
> >> + *
> >> + ******************************************************************************/
> >> +static int device_max_reads(struct ibv_context *context)
> >> +{
> >> +    struct ibv_device_attr attr;
> >> +    int ret = 0;
> >> +
> >> +    if (!ibv_query_device(context,&attr)) {
> >> +        ret = attr.max_qp_rd_atom;
> >> +    }
> >> +    return ret;
> >> +}
> >> +
> >>  /******************************************************************************
> >>   *
> >>   ******************************************************************************/
> >> @@ -496,7 +510,7 @@ static int ctx_set_out_reads(struct ibv_
> >>          case CONNECTX2 : ;
> >>          case CONNECTX  : max_reads = MAX_OUT_READ_HERMON; break;
> >>          case LEGACY    : max_reads = MAX_OUT_READ; break;
> >> -        default        : max_reads = 0;
> >> +        default        : max_reads = device_max_reads(context);
> >>      }
> >>
> >>      if (num_user_reads > max_reads) {
> >>
> >> I'm curious why both the old and the new code use hardcoded values.
> >>
> >> Mike
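
The fallback in the patch can be sketched without libibverbs as follows. Everything here is a stand-in: `enum dev_kind`, `struct fake_attr`, `query_fn`, `pick_max_reads` and the two query callbacks are hypothetical names, and the values 16 and 4 for the two defines are assumptions. The only behavior borrowed from the real API is that ibv_query_device() returns 0 on success and that the limit is read from the `max_qp_rd_atom` field.

```c
#include <stddef.h>

/* Hypothetical device classes standing in for perftest's Device enum. */
enum dev_kind { DEV_CONNECTX, DEV_LEGACY, DEV_OTHER };

/* Assumed values, for illustration only. */
#define MAX_OUT_READ_HERMON 16
#define MAX_OUT_READ 4

/* Stand-in for struct ibv_device_attr: only the field the patch reads. */
struct fake_attr {
    int max_qp_rd_atom;
};

/* Stand-in for ibv_query_device(): returns 0 on success. */
typedef int (*query_fn)(struct fake_attr *attr);

/* Mirrors device_max_reads() from the patch: 0 if the query fails. */
static int device_max_reads(query_fn query)
{
    struct fake_attr attr;
    int ret = 0;

    if (query != NULL && !query(&attr))
        ret = attr.max_qp_rd_atom;
    return ret;
}

/* Mirrors the patched switch: hardcoded for known kinds, queried otherwise. */
static int pick_max_reads(enum dev_kind kind, query_fn query)
{
    switch (kind) {
    case DEV_CONNECTX: return MAX_OUT_READ_HERMON;
    case DEV_LEGACY:   return MAX_OUT_READ;
    default:           return device_max_reads(query);
    }
}

/* Example callbacks: a device reporting a limit of 8, and a failing query. */
static int query_ok(struct fake_attr *attr)
{
    attr->max_qp_rd_atom = 8;
    return 0;
}

static int query_fail(struct fake_attr *attr)
{
    (void)attr;
    return -1;
}
```

In this sketch an unrecognized device now gets whatever limit it reports (8 with `query_ok`), while the known kinds keep their hardcoded values. If the query itself fails, the limit is still 0, the same as before the patch.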
> >> _______________________________________________
> >> ewg mailing list
> >> ewg at lists.openfabrics.org
> >> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg