[openib-general] Performance Degradation with OFED v. Voltaire(lustre)
Michael S. Tsirkin
mst at mellanox.co.il
Tue Dec 19 07:48:00 PST 2006
Interesting. So, does lustre actually work on top of rdma_cm?
Quoting Bernadat, Philippe <philippe_bernadat at hp.com>:
Subject: RE: Performance Degradation with OFED v. Voltaire(lustre)
I checked. We apparently never go through this path (with lustre).
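A quick way to verify that, as a purely illustrative sketch and not something from this thread: hook cma_query_ib_route with a small kprobes module and watch the kernel log while lustre sets up connections. This assumes a kernel with CONFIG_KPROBES and the (static) symbol visible in kallsyms.

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kprobes.h>

/* Illustrative only: log every call to cma_query_ib_route() so we can
 * tell whether the rdma_cm route-query path is exercised at all. */
static int trace_pre(struct kprobe *p, struct pt_regs *regs)
{
        printk(KERN_INFO "cma_query_ib_route() hit\n");
        return 0;
}

static struct kprobe kp = {
        .symbol_name = "cma_query_ib_route", /* static symbol; needs kallsyms */
        .pre_handler = trace_pre,
};

static int __init trace_init(void)
{
        return register_kprobe(&kp);
}

static void __exit trace_exit(void)
{
        unregister_kprobe(&kp);
}

module_init(trace_init);
module_exit(trace_exit);
MODULE_LICENSE("GPL");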
> -----Original Message-----
> From: Bernadat, Philippe
> Sent: Tuesday, December 19, 2006 4:20 PM
> To: Michael S. Tsirkin
> Cc: Or Gerlitz; Roland Dreier; openib-general at openib.org
> Subject: RE: Performance Degradation with OFED v. Voltaire(lustre)
>
> Sorry to say that this still doesn't do it.
> Are we sure we actually take this path?
>
> I double-checked; the code I compiled and tried was:
>
> static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
>                               struct cma_work *work)
> {
>         struct rdma_dev_addr *addr = &id_priv->id.route.addr.dev_addr;
>         struct ib_sa_path_rec path_rec;
>         ib_sa_comp_mask mask;
>
>         memset(&path_rec, 0, sizeof path_rec);
>         ib_addr_get_sgid(addr, &path_rec.sgid);
>         ib_addr_get_dgid(addr, &path_rec.dgid);
>         path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr));
>         path_rec.numb_path = 1;
>
>         if (tavor_quirk) {
>                 /* Ask the SA only for paths with MTU strictly less than
>                  * 2048, i.e. at most 1K, which gives better performance
>                  * on Tavor. */
>                 path_rec.mtu_selector = IB_SA_LT;
>                 path_rec.mtu = IB_MTU_2048;
>                 mask = IB_SA_PATH_REC_MTU_SELECTOR | IB_SA_PATH_REC_MTU;
>         } else
>                 mask = 0;
>
>         id_priv->query_id = ib_sa_path_rec_get(id_priv->id.device,
>                                 id_priv->id.port_num, &path_rec, mask |
>                                 IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
>                                 IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH,
>                                 timeout_ms, GFP_KERNEL, cma_query_handler,
>                                 work, &id_priv->query);
>
>         return (id_priv->query_id < 0) ? id_priv->query_id : 0;
> }
>
> Philippe
>
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il]
> > Sent: Tuesday, December 19, 2006 1:25 PM
> > To: Bernadat, Philippe
> > Cc: Or Gerlitz; Roland Dreier; openib-general at openib.org
> > Subject: Re: Performance Degradation with OFED v. Voltaire(lustre)
> >
> > > So after a bit more testing, setting the route path MTU to 1024
> > > before the QP creation (rdma_create_qp()) seems sufficient (see
> > > the sketch appended at the end of this message).
> >
> > OK, so the following fixes the tavor_quirk flag in cma so that it
> > actually does something. Could you please replace the patch
> > cma_tavor_quirk.patch with this one, set the tavor_quirk option for
> > the cma module, and see whether this works as expected?
> >
> > Unpack OFED 1.1, copy the following to
> > OFED-1.1/openib-1.1/kernel_patches/fixes/cma_tavor_quirk.patch,
> > replacing the patch of the same name that is already in OFED
> > (also remove xxx_cma_tavor_quirk.txt or any other patches you
> > have put there), then repack OFED 1.1 and rebuild.
> >
> >
> > Thanks,
> >
> > -----------------
> >
> > Tavor systems get better performance with a 1K MTU. Since there
> > does not seem to be any way to find out whether the remote system
> > uses Tavor, add an option to limit the MTU globally.
> >
> > Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>
> >
> > ---
> >
> > diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> > index 50150c8..261bf45 100644
> > --- a/drivers/infiniband/core/cma.c
> > +++ b/drivers/infiniband/core/cma.c
> > @@ -48,6 +48,10 @@ MODULE_AUTHOR("Sean Hefty");
> >  MODULE_DESCRIPTION("Generic RDMA CM Agent");
> >  MODULE_LICENSE("Dual BSD/GPL");
> >
> > +static int tavor_quirk = 0;
> > +module_param_named(tavor_quirk, tavor_quirk, int, 0644);
> > +MODULE_PARM_DESC(tavor_quirk, "Tavor performance quirk: limit MTU to 1K if > 0");
> > +
> >  #define CMA_CM_RESPONSE_TIMEOUT 20
> >  #define CMA_MAX_CM_RETRIES 3
> >
> > @@ -1138,6 +1142,7 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
> >  {
> >          struct rdma_dev_addr *addr = &id_priv->id.route.addr.dev_addr;
> >          struct ib_sa_path_rec path_rec;
> > +        ib_sa_comp_mask mask;
> >
> >          memset(&path_rec, 0, sizeof path_rec);
> >          ib_addr_get_sgid(addr, &path_rec.sgid);
> > @@ -1145,8 +1150,15 @@ static int cma_query_ib_route(struct rdma_id_private *id_priv, int timeout_ms,
> >          path_rec.pkey = cpu_to_be16(ib_addr_get_pkey(addr));
> >          path_rec.numb_path = 1;
> >
> > +        if (tavor_quirk) {
> > +                path_rec.mtu_selector = IB_SA_LT;
> > +                path_rec.mtu = IB_MTU_2048;
> > +                mask = IB_SA_PATH_REC_MTU_SELECTOR | IB_SA_PATH_REC_MTU;
> > +        } else
> > +                mask = 0;
> > +
> >          id_priv->query_id = ib_sa_path_rec_get(id_priv->id.device,
> > -                                id_priv->id.port_num, &path_rec,
> > +                                id_priv->id.port_num, &path_rec, mask |
> >                                  IB_SA_PATH_REC_DGID | IB_SA_PATH_REC_SGID |
> >                                  IB_SA_PATH_REC_PKEY | IB_SA_PATH_REC_NUMB_PATH,
> >                                  timeout_ms, GFP_KERNEL, cma_query_handler,
> > --
> > MST
> >
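For reference, here is a minimal sketch of the workaround Philippe describes above: forcing the resolved route down to a 1K path MTU before rdma_create_qp(), from the ULP side rather than via the cma module option. This is an illustration, not code from this thread; the function name and the assumption that the ULP calls it once RDMA_CM_EVENT_ROUTE_RESOLVED has been delivered are illustrative.

#include <rdma/ib_verbs.h>
#include <rdma/ib_sa.h>
#include <rdma/rdma_cm.h>

/* Sketch only: clamp every resolved path record to at most a 1K MTU,
 * then create the QP, so the connection is brought up with a path MTU
 * no larger than IB_MTU_1024. */
static int create_qp_with_1k_mtu(struct rdma_cm_id *id, struct ib_pd *pd,
                                 struct ib_qp_init_attr *qp_attr)
{
        int i;

        for (i = 0; i < id->route.num_paths; i++)
                if (id->route.path_rec[i].mtu > IB_MTU_1024)
                        id->route.path_rec[i].mtu = IB_MTU_1024;

        return rdma_create_qp(id, pd, qp_attr);
}

The tavor_quirk patch above gets the same effect globally, by asking the SA only for paths with an MTU below 2K.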
--
MST