From mst at mellanox.co.il Thu Sep 1 00:13:01 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Sep 2005 10:13:01 +0300
Subject: [openib-general] [PATCH] memory leaks in ipoib, srp
Message-ID: <20050901071301.GF1707@mellanox.co.il>

I noticed the following while working on the client data patch.
BTW, opinions on the newer version I sent?

---

Fix IPoIB and SRP memory leak on device removal.

Signed-off-by: Michael S. Tsirkin

Index: linux-2.6.12.2/drivers/infiniband/ulp/srp/ib_srp.c
===================================================================
--- linux-2.6.12.2.orig/drivers/infiniband/ulp/srp/ib_srp.c	2005-09-01 12:01:29.000000000 +0300
+++ linux-2.6.12.2/drivers/infiniband/ulp/srp/ib_srp.c	2005-09-01 12:04:54.000000000 +0300
@@ -1416,6 +1416,8 @@ static void srp_remove_one(struct ib_dev
 		ib_dealloc_pd(host->pd);
 		kfree(host);
 	}
+
+	kfree(dev_list);
 }
 
 static int __init srp_init_module(void)
Index: linux-2.6.12.2/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- linux-2.6.12.2.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2005-09-01 12:01:29.000000000 +0300
+++ linux-2.6.12.2/drivers/infiniband/ulp/ipoib/ipoib_main.c	2005-09-01 12:04:31.000000000 +0300
@@ -1065,6 +1065,8 @@ static void ipoib_remove_one(struct ib_d
 		ipoib_dev_cleanup(priv->dev);
 		free_netdev(priv->dev);
 	}
+
+	kfree(dev_list);
 }
 
 static int __init ipoib_init_module(void)
--
MST

From mst at mellanox.co.il Thu Sep 1 00:25:30 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Sep 2005 10:25:30 +0300
Subject: [openib-general] [PATCH] sa_query: avoid unnecessary list scan
Message-ID: <20050901072530.GG1707@mellanox.co.il>

Here's a small cleanup patch for sa query.

---

Using ib_get_client_data on sa event performs a list scan.
It's better to use container_of to get the sa device directly.

Signed-off-by: Michael S. Tsirkin

Index: linux-2.6.12.2/drivers/infiniband/core/sa_query.c
===================================================================
--- linux-2.6.12.2.orig/drivers/infiniband/core/sa_query.c	2005-09-01 12:01:29.000000000 +0300
+++ linux-2.6.12.2/drivers/infiniband/core/sa_query.c	2005-09-01 12:15:57.000000000 +0300
@@ -431,8 +431,8 @@ static void ib_sa_event(struct ib_event_
 	    event->event == IB_EVENT_LID_CHANGE ||
 	    event->event == IB_EVENT_PKEY_CHANGE ||
 	    event->event == IB_EVENT_SM_CHANGE) {
-		struct ib_sa_device *sa_dev =
-			ib_get_client_data(event->device, &sa_client);
+		struct ib_sa_device *sa_dev;
+		sa_dev = container_of(handler, typeof(*sa_dev), event_handler);
 
 		schedule_work(&sa_dev->port[event->element.port_num -
 					    sa_dev->start_port].update_task);
--
MST

From danb at voltaire.com Thu Sep 1 02:55:14 2005
From: danb at voltaire.com (Dan Bar Dov)
Date: Thu, 1 Sep 2005 12:55:14 +0300
Subject: [openib-general] ISER cleanup
Message-ID:

The ISER SVN head is all formatted with the kernel's scripts/Lindent.

Dan

> -----Original Message-----
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of Grant Grundler
> Sent: Thursday, August 25, 2005 6:32 AM
> To: Sean Hubbell
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] ISER cleanup
>
> On Wed, Aug 24, 2005 at 12:27:38PM -0400, Sean Hubbell wrote:
> > Just a thought, but you can use the gnu indent application to do this
> > very easily (not sure if you did this, I just thought it might help if
> > you have not). Here is a sample command:
> >
> > indent -kr --use-tabs -i2 -l80 -nhnl sourceFilename
>
> Please use what is recommended in Documentation/CodingStyle:
> Now, again, GNU indent has the same brain-dead settings that GNU emacs
> has, which is why you need to give it a few command line options.
> However, that's not too bad, because even the makers of GNU indent
> recognize the authority of K&R (the GNU people aren't evil, they are
> just severely misguided in this matter), so you just give indent the
> options "-kr -i8" (stands for "K&R, 8 character indents"), or use
> "scripts/Lindent", which indents in the latest style.
>
> grant
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
>

From halr at voltaire.com Thu Sep 1 03:34:48 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 Sep 2005 06:34:48 -0400
Subject: [openib-general] Re: OpenSM 1.8.0 libvendor initial merge nits
In-Reply-To: <4316A3B4.8040703@mellanox.co.il>
References: <1125509574.4401.3801.camel@hal.voltaire.com> <4315EC87.4060808@mellanox.co.il> <1125515837.4401.3944.camel@hal.voltaire.com> <4316A3B4.8040703@mellanox.co.il>
Message-ID: <1125570888.4401.4975.camel@hal.voltaire.com>

On Thu, 2005-09-01 at 02:46, Eitan Zahavi wrote:
> > Hmm, I update autoconf, automake, and libtool and now get:
> > + automake --foreign --add-missing --copy
> > configure.in: installing `config/install-sh'
> > configure.in: installing `config/missing'
> > Makefile.am:27: OSMV_OPENIB does not appear in AM_CONDITIONAL
> > Makefile.am:31: OSMV_SIM does not appear in AM_CONDITIONAL
> > Makefile.am:44: OSMV_GEN1 does not appear in AM_CONDITIONAL
> ...
> > Any ideas ?
> I see this too on some machines (RH 7.3) but you can see these
> variables are declared AM_CONDITIONAL in the config/osmv.m4
> I will double check. Might be missing a "fi" somewhere...

I think you mean ../osm/config/osmvsel.m4.

How is this/does it need to be included at the libvendor level when
autogen.sh is run from there ?

Also, now that there are new minimum requirements for at least automake and
autoconf (not sure about libtool), shouldn't these be reflected in all the
osm autotools scripts ? Do you know what they are ?

--
Hal

From halr at voltaire.com Thu Sep 1 04:18:32 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 Sep 2005 07:18:32 -0400
Subject: [openib-general] OpenSM 1.8.0 osmtest changes
Message-ID: <1125573512.4401.5060.camel@hal.voltaire.com>

Hi Yael,

I noticed the following files changed in osmtest but were not listed in
the file list you provided:
M osmt_service.c
M main.c
M osmt_slvl_vl_arb.c

Just want to verify that these should be merged. Thanks.

--
Hal

From guyg at voltaire.com Thu Sep 1 04:14:15 2005
From: guyg at voltaire.com (Guy German)
Date: Thu, 01 Sep 2005 14:14:15 +0300
Subject: [openib-general][PATCH][kdapl]: FMR and EVD patch
In-Reply-To: <1125494751.3794.24.camel@r2d2>
References: <1125413613.4127.13.camel@r2d2> <1125494751.3794.24.camel@r2d2>
Message-ID: <1125573255.7485.19.camel@r2d2>

Hi James,

I don't think I completely made sense in the former flow description.
Let me present 2 options for flows with interrupt<->thread handle for the
gen2 verbs, and please correct me if I am missing something :

- completion callback is called (interrupt/tasklet context)
- callback wakes up thread
- thread polls all completions
- request completion notification
- if cq !empty go back to continue polling

Another option (that Gleb suggested) can be :

- completion callback is called
- callback requests completion notification
- callback polls the cq and inserts all completions in a ulp's queue
- thread is waiting on the ulp-queue and handles the wc's.

Thanks,
Guy

From mst at mellanox.co.il Thu Sep 1 05:50:45 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 1 Sep 2005 15:50:45 +0300
Subject: [openib-general] [PATCH applied] sdp_link locking
Message-ID: <20050901125045.GJ1707@mellanox.co.il>

The following patch adds locking to sdp_link.
It does not yet include request cancellation, and the workqueue handling code also seems buggy (cancels must flush workqueue, and ideally we wouldn't need a separate workqueue at all). --- Add locking to sdp link code. Signed-off-by: Michael S. Tsirkin Index: linux-2.6.12.2/drivers/infiniband/ulp/sdp/sdp_link.c =================================================================== --- linux-2.6.12.2.orig/drivers/infiniband/ulp/sdp/sdp_link.c 2005-09-01 17:35:49.000000000 +0300 +++ linux-2.6.12.2/drivers/infiniband/ulp/sdp/sdp_link.c 2005-09-01 17:38:44.000000000 +0300 @@ -128,6 +128,7 @@ struct sdp_link_arp { static kmem_cache_t *wait_cache; static kmem_cache_t *info_cache; +static DECLARE_MUTEX(sdp_link_mutex); static LIST_HEAD(info_list); static struct workqueue_struct *link_wq; @@ -150,7 +151,7 @@ static u64 path_lookup_id; /* * proto */ -static void do_link_path_lookup(void *data); +static void retry_link_path_lookup(void *); /* * sdp_link_path_complete - generate a path record completion for user @@ -219,8 +220,8 @@ static struct sdp_path_info *sdp_path_in info->arp_time = SDP_LINK_ARP_TIME_MIN; INIT_LIST_HEAD(&info->wait_list); + INIT_WORK(&info->timer, retry_link_path_lookup, info); list_add(&info->info_list, &info_list); - INIT_WORK(&info->timer, do_link_path_lookup, info); return info; } @@ -252,6 +253,8 @@ static void sdp_link_path_rec_done(int s struct sdp_path_wait *sweep; int result; + down(&sdp_link_mutex); + info->query = NULL; sdp_dbg_data(NULL, "Path Record status <%d>", status); @@ -301,6 +304,7 @@ static void sdp_link_path_rec_done(int s sdp_path_info_destroy(info, result); } } + up(&sdp_link_mutex); } /* @@ -345,9 +349,8 @@ static int sdp_link_path_rec_get(struct /* * do_link_path_lookup - resolve an ip address to a path record */ -static void do_link_path_lookup(void *data) +static void do_link_path_lookup(struct sdp_path_info *info) { - struct sdp_path_info *info = data; struct ipoib_dev_priv *priv; struct net_device *dev = NULL; struct rtable
*rt; @@ -376,11 +379,6 @@ static void do_link_path_lookup(void *da */ if (info->flags & SDP_LINK_F_PATH) goto done; - /* - * route information present, but no path query. - */ - if (info->ca) - goto path; result = ip_route_output_key(&rt, &fl); if (result < 0 || !rt) { @@ -543,6 +541,13 @@ error: ip_rt_put(rt); } +static void retry_link_path_lookup(void *data) +{ + down(&sdp_link_mutex); + do_link_path_lookup(data); + up(&sdp_link_mutex); +} + /* * Public functions */ @@ -566,7 +571,9 @@ int sdp_link_path_lookup(u32 dst_addr, { struct sdp_path_info *info; struct sdp_path_wait *wait; - int result; + int result = 0; + + down(&sdp_link_mutex); *id = _SDP_PATH_LOOKUP_ID(); @@ -589,7 +596,7 @@ int sdp_link_path_lookup(u32 dst_addr, */ if (info->flags & SDP_LINK_F_VALID) { sdp_link_path_complete(*id, 0, info, completion, arg); - return 0; + goto done; } /* * add request to list of lookups. @@ -598,7 +605,7 @@ int sdp_link_path_lookup(u32 dst_addr, if (!wait) { sdp_dbg_warn(NULL, "Failed to create path wait object"); result = -ENOMEM; - goto error; + goto done; } wait->id = *id; @@ -613,8 +620,8 @@ int sdp_link_path_lookup(u32 dst_addr, if (!((SDP_LINK_F_ARP|SDP_LINK_F_PATH) & info->flags)) do_link_path_lookup(info); - return 0; -error: +done: + up(&sdp_link_mutex); return result; } @@ -630,6 +637,7 @@ static void sdp_link_sweep(void *data) struct sdp_path_info *info; struct sdp_path_info *sweep; + down(&sdp_link_mutex); list_for_each_entry_safe(info, sweep, &info_list, info_list) { if (jiffies > (info->use + SDP_LINK_INFO_TIMEOUT)) { sdp_dbg_ctrl(NULL, @@ -643,6 +651,7 @@ static void sdp_link_sweep(void *data) sdp_path_info_destroy(info, -ETIMEDOUT); } } + up(&sdp_link_mutex); queue_delayed_work(link_wq, &link_timer, SDP_LINK_SWEEP_INTERVAL); } @@ -656,8 +665,8 @@ static void sdp_link_sweep(void *data) */ static void sdp_link_arp_work(void *data) { - struct sdp_work *work = (struct sdp_work *)data; - struct sk_buff *skb = (struct sk_buff *)work->arg; + struct sdp_work 
*work = data; + struct sk_buff *skb = work->arg; struct sdp_path_info *info; struct sdp_link_arp *arp; int result; @@ -670,6 +679,8 @@ static void sdp_link_arp_work(void *data (arp->src_ip & 0x00ff0000) >> 16, (arp->src_ip & 0xff000000) >> 24, GID_ARG(arp->src_gid)); + + down(&sdp_link_mutex); /* * find a path info structure for the source IP address. */ @@ -696,6 +707,7 @@ static void sdp_link_arp_work(void *data } done: + up(&sdp_link_mutex); kfree_skb(skb); kfree(work); } @@ -811,6 +823,7 @@ void sdp_link_addr_cleanup(void) * remove ARP packet processing. */ dev_remove_pack(&sdp_arp_type); + /* * destroy work queue */ -- MST From tom at ammasso.com Thu Sep 1 06:10:43 2005 From: tom at ammasso.com (Tom Tucker) Date: Thu, 1 Sep 2005 09:10:43 -0400 Subject: [openib-general] [PATCH][iWARP] Added provider CM verbsandquery provider methods Message-ID: <8E9D028761D8264D910612167E8457E8FA3C94@mail2.ammasso.com> > -----Original Message----- > From: Gleb Natapov [mailto:glebn at voltaire.com] > Sent: Wednesday, August 31, 2005 1:55 AM > To: Caitlin Bestler > Cc: Tom Tucker; openib-general at openib.org > Subject: Re: [openib-general] [PATCH][iWARP] Added provider > CM verbsandquery provider methods > > On Tue, Aug 30, 2005 at 12:00:42PM -0700, Caitlin Bestler wrote: > > > > TOE for the purposes of RDMA may have more legs within the > > > community, > > > > however, this has yet to be tested. > > > Is it possible to implement RDMA semantics using linux native > > > TCP stack (with hardware assistance of course)? Just asking. > > > > > > > > > > It is possible to implement RDMA on the host processor. But > > it will not match the performance of hardware. The difference > > will be substantial at 10G. If somebody could build a software > > only solution that performed at 10G they would have done so. > > Having zero manufacturing cost would give them quite a > > competitive edge over solutions that required hardware. > > > I am not talking about software only solution.
Hardware assistance > is needed, but something less than TOE. Something stateless like > Dave wants. > > > The need for offload has more to do with memory bandwidth > > than raw processing power. The data bandwidth required to > > support look-up of large data structures and for placement > > of the raw payload nearly consumes the bus bandwidth when > > operating at peak wire speeds. If you make that worse by > > moving the raw packets over the wire, and *then* copying > > them to a final location (a second memory move) *and* > > additional memory touches for accessing control structures... > > > Linux already have the infrastructure for zero-copy send, with > some hardware help it is possible to implement zero-copy receive too. > Moving data in memory is out of the question. > > Anyway I think this questions should be answered before moving this > discussion to netdev. I don't know how to do it and I've thought quite a bit about it. If you want the semantics specified for RDDP over a reliable transport, then you need to do TCP in hardware. Consider RDMA_WRITE. The target buffer is specified in the RDDP header which is in turn carried as part of the TCP payload. Therefore in order for the hardware to get to the RDDP header, it must crack the TCP header. The TCP processing can't be stateless because the hardware must discard things such as duplicates or risk overwriting an application buffer that has already been written, delivered, and reused for some other purpose. There are many other examples, but the point is TCP state information is needed by the hardware. > > -- > Gleb.
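Tom's duplicate-discard argument can be made concrete with a small userspace sketch. It is not any adapter's actual logic. It keeps one piece of per-connection state, the next expected sequence number (called rcv_nxt here after TCP's RCV.NXT), and classifies each arriving segment. The names struct rx_state, classify_segment and enum seg_action are hypothetical. The wraparound-safe comparison is the usual 32-bit serial-number trick, the same idea as the Linux kernel's before() helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a precedes b in 32-bit sequence space
 * (wraparound-safe comparison). */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Hypothetical per-connection receive state: only the next expected byte. */
struct rx_state {
	uint32_t rcv_nxt;
};

enum seg_action {
	SEG_PLACE,        /* new in-order data: safe to place in the target buffer */
	SEG_DUPLICATE,    /* every byte was already received: must be dropped */
	SEG_OUT_OF_ORDER  /* starts beyond rcv_nxt: a gap precedes it */
};

/* Classify the segment covering [seq, seq + len). When new in-order data
 * is accepted, advance rcv_nxt. A segment that overlaps already-received
 * data but extends past rcv_nxt is accepted. */
static enum seg_action classify_segment(struct rx_state *st,
					uint32_t seq, uint32_t len)
{
	uint32_t end = seq + len;

	if (!seq_before(st->rcv_nxt, end))
		return SEG_DUPLICATE;     /* end <= rcv_nxt */
	if (seq_before(st->rcv_nxt, seq))
		return SEG_OUT_OF_ORDER;  /* rcv_nxt < seq */
	st->rcv_nxt = end;
	return SEG_PLACE;
}
```

Without rcv_nxt, a retransmitted copy of bytes 1000..1100 looks the same as fresh data. Placing it again is exactly the overwrite hazard Tom describes for a buffer that has already been delivered and reused.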
>

From jlentini at netapp.com Thu Sep 1 06:46:40 2005
From: jlentini at netapp.com (James Lentini)
Date: Thu, 1 Sep 2005 09:46:40 -0400 (EDT)
Subject: [openib-general] [PATCH][MSTFLINT] fix segfault in full usage print
Message-ID:

fix segfault in full usage print

Signed-off-by: James Lentini

Index: flint.cpp
===================================================================
--- flint.cpp	(revision 3279)
+++ flint.cpp	(working copy)
@@ -4649,7 +4649,8 @@ void usage(const char *sname, bool full
     printf(descr, sname);
 
     if (full) {
-        printf(full_descr);
+        printf(full_descr, sname, sname, sname, sname, sname, sname, sname,
+               sname, sname, sname, sname);
     }
 }

From dotanb at mellanox.co.il Thu Sep 1 07:33:25 2005
From: dotanb at mellanox.co.il (Dotan Barak)
Date: Thu, 1 Sep 2005 17:33:25 +0300
Subject: [openib-general] can one create a QP in kernel level with inline data?
Message-ID: <506C3D7B14CDD411A52C00025558DED6089DCA13@mtlex01.yok.mtl.com>

Can one create a QP in kernel level with inline data? and what about posting data with the inline flag being set?

thanks

Dotan Barak
Software Verification Engineer
Mellanox Technologies LTD
Tel: +972-4-9097200 Ext: 231 Fax: +972-4-9593245
P.O. Box 86 Yokneam 20692 ISRAEL.
Home: +972-77-8841095 Cell: 052-4222383

[ May the fork be with you ]
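The segfault fixed by James's flint patch comes from passing printf a format string that contains %s conversions without the matching arguments. printf then reads argument slots that were never supplied, which is undefined behavior. The patch does not show flint's real full_descr text, so the two-line string below is a hypothetical stand-in. format_full_usage is an illustrative helper that writes into a buffer so the result can be checked.

```c
#include <stdio.h>

/* Hypothetical stand-in for flint's full_descr: each %s expects the
 * program name, so the caller must supply one argument per conversion. */
static const char full_descr[] =
	"Usage: %s [options] <command>\n"
	"   e.g. %s -h\n";

/* Buggy form (do not use): printf(full_descr); reads two missing arguments.
 * Fixed form: pass sname once for every %s in the format string. */
static int format_full_usage(char *buf, size_t size, const char *sname)
{
	return snprintf(buf, size, full_descr, sname, sname);
}
```

When a help text contains no conversions at all, fputs(text, stdout) avoids the problem entirely. GCC's -Wformat-security warns about printf calls whose format is not a string literal and that pass no further arguments, which is the pattern this patch removes. The argument count still has to be kept in step with the number of %s conversions by hand.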

From halr at voltaire.com Thu Sep 1 07:53:07 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 01 Sep 2005 10:53:07 -0400
Subject: [openib-general] kernel oops
In-Reply-To: <1125532576.4401.4283.camel@hal.voltaire.com>
References: <430F4DBD.4070703@xsigo.com> <43138B0E.1090309@ichips.intel.com> <1125407465.4401.1246.camel@hal.voltaire.com> <43148390.7010605@ichips.intel.com> <1125445011.4401.2434.camel@hal.voltaire.com> <4314F0DC.6020803@ichips.intel.com> <1125532576.4401.4283.camel@hal.voltaire.com>
Message-ID: <1125586387.4398.74.camel@hal.voltaire.com>

On Wed, 2005-08-31 at 19:56, Hal Rosenstock wrote:
> I'll work on a patch for this.

Here's a patch for this. Let me know if it works. [I tried it out and it
works for me.] If it does, the next question is how does the pointer get
trashed.

--
Hal

In ib_at_route_by_ip, validate IB device supplied in ib_at_ib_route
structure passed in.

Signed-off-by: Hal Rosenstock

Index: at.c
===================================================================
--- at.c	(revision 3291)
+++ at.c	(working copy)
@@ -1369,11 +1369,30 @@ int ib_at_paths_by_route(struct ib_at_ib
 {
 	struct path_req *preq;
 	struct async *parent;
+	struct ib_at_dev *ib_dev, *e;
+	struct ipoib_dev_priv *priv;
+	int found = 0;
 	/* int r; */
 
 	if (!ib_route || npath <= 0 || !path_arr)
 		return -EINVAL;
 
+	/* If supplied, validate ib_device pointer in supplied ib_route */
+	if (ib_route->out_dev) {
+		for (ib_dev = ib_at_devs, e = ib_dev + IB_AT_MAX_DEV;
+		     ib_dev < e; ib_dev++) {
+			if (!ib_dev->netdev || !ib_dev->valid)
+				continue;
+			priv = ib_dev->netdev->priv;
+			if (priv->ca == ib_route->out_dev) {
+				found = 1;
+				break;
+			}
+		}
+		if (!found)
+			return -EINVAL;
+	}
+
 	if (!(preq = kmem_cache_alloc(path_req_cache, SLAB_KERNEL)))
 		return -ENOMEM;
 
@@ -1475,6 +1494,7 @@ EXPORT_SYMBOL(ib_at_ips_by_subnet);
 
 int ib_at_invalidate_paths(struct ib_at_ib_route *ib_route)
 {
+	/* Need to validate ib_route->out_dev if supplied */
 	return 0; /* no caching for now */
 }
EXPORT_SYMBOL(ib_at_invalidate_paths); From jlentini at netapp.com Thu Sep 1 08:01:44 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 1 Sep 2005 11:01:44 -0400 (EDT) Subject: [openib-general] [ANNOUNCE] Address Translation Service (ATS) Specification posted Message-ID: The DAT Collaborative has ratified a formal specification of the Address Translation Service (ATS). It is available on their website: http://www.datcollaborative.org/ATS_v1.pdf From glebn at voltaire.com Thu Sep 1 08:12:50 2005 From: glebn at voltaire.com (Gleb Natapov) Date: Thu, 1 Sep 2005 18:12:50 +0300 Subject: [openib-general] RDMA Generic Connection Management In-Reply-To: <4315F2DC.5060202@ichips.intel.com> References: <1125323947.6584.106.camel@r2d2> <431374D4.5080909@ichips.intel.com> <52hdd8lb1e.fsf@cisco.com> <4315E0D6.6060508@ichips.intel.com> <52d5nudmdc.fsf@cisco.com> <4315F2DC.5060202@ichips.intel.com> Message-ID: <20050901151249.GG21040@minantech.com> On Wed, Aug 31, 2005 at 11:11:40AM -0700, Sean Hefty wrote: > I don't have a good solution yet for calls like ib_cma_get_device(). Yet > another possibility is to have it return a device pointer in a callback. > Then it can synchronize with device removal internally. > What if ib_cma_get_device() will return client data for the device and we let the ULP to figure out whether the data is still valid in the way most suitable for the ULP? -- Gleb. From rolandd at cisco.com Thu Sep 1 08:32:12 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 01 Sep 2005 08:32:12 -0700 Subject: [openib-general] Re: ibv_get_async_event References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <43162ECE.3070405@ichips.intel.com> Message-ID: <52y86gajnn.fsf@cisco.com> I found at least one silly bug in my code that could cause the crash you saw. Can you pull the latest svn kernel code and try again? 
By the way, I found regress.sh in the dapl source but I'm not sure what it means to "set to 40x40." Can you explain that for a non-DAPL expert? Thanks, Roland From rolandd at cisco.com Thu Sep 1 08:41:15 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 01 Sep 2005 08:41:15 -0700 Subject: [openib-general] Re: can one create a QP in kernel level with inline data? In-Reply-To: <506C3D7B14CDD411A52C00025558DED6089DCA13@mtlex01.yok.mtl.com> (Dotan Barak's message of "Thu, 1 Sep 2005 17:33:25 +0300") References: <506C3D7B14CDD411A52C00025558DED6089DCA13@mtlex01.yok.mtl.com> Message-ID: <52psrsaj8k.fsf@cisco.com> Dotan> Can one create a QP in kernel level with inline data? and Dotan> what about posting data with the inline flag being set? No, this is not implemented yet. - R. From mshefty at ichips.intel.com Thu Sep 1 09:04:37 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Sep 2005 09:04:37 -0700 Subject: [openib-general] kernel oops In-Reply-To: <1125586387.4398.74.camel@hal.voltaire.com> References: <430F4DBD.4070703@xsigo.com> <43138B0E.1090309@ichips.intel.com> <1125407465.4401.1246.camel@hal.voltaire.com> <43148390.7010605@ichips.intel.com> <1125445011.4401.2434.camel@hal.voltaire.com> <4314F0DC.6020803@ichips.intel.com> <1125532576.4401.4283.camel@hal.voltaire.com> <1125586387.4398.74.camel@hal.voltaire.com> Message-ID: <43172695.3000200@ichips.intel.com> Hal Rosenstock wrote: > Here's a patch for this. Let me know if it works. [I tried it out and it > works for me.] If it does, the next question is how does the pointer get > trashed. I don't think that the pointer is getting trashed. The SA was not running, so I don't think that any route was returned. 
- Sean From mshefty at ichips.intel.com Thu Sep 1 09:10:14 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Sep 2005 09:10:14 -0700 Subject: [openib-general] RDMA Generic Connection Management In-Reply-To: <20050901151249.GG21040@minantech.com> References: <1125323947.6584.106.camel@r2d2> <431374D4.5080909@ichips.intel.com> <52hdd8lb1e.fsf@cisco.com> <4315E0D6.6060508@ichips.intel.com> <52d5nudmdc.fsf@cisco.com> <4315F2DC.5060202@ichips.intel.com> <20050901151249.GG21040@minantech.com> Message-ID: <431727E6.9010408@ichips.intel.com> Gleb Natapov wrote: > On Wed, Aug 31, 2005 at 11:11:40AM -0700, Sean Hefty wrote: > >>I don't have a good solution yet for calls like ib_cma_get_device(). Yet >>another possibility is to have it return a device pointer in a callback. >>Then it can synchronize with device removal internally. >> > > What if ib_cma_get_device() will return client data for the device and > we let the ULP to figure out whether the data is still valid in the way > most suitable for the ULP? While I think that returning the client data would be useful, I don't think that this really helps the ULP any. It seems likely that a client would free their client data upon removal of the associated device. So they can't trust this pointer any more than the device pointer. - Sean From rolandd at cisco.com Thu Sep 1 09:14:28 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 01 Sep 2005 09:14:28 -0700 Subject: [openib-general] Re: [PATCH] memory leaks in ipoib, srp In-Reply-To: <20050901071301.GF1707@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 1 Sep 2005 10:13:01 +0300") References: <20050901071301.GF1707@mellanox.co.il> Message-ID: <52ll2gahp7.fsf@cisco.com> Michael> I noticed the following while working on the client data patch. Good catch, applied. Michael> BTW, opinions on the newer version I sent? I'm undecided. On the one hand, it seems like a reasonable thing to do. 
On the other hand, I'm not sure whether having two remove entry points is just going to confuse people. And in any case I don't see the real motivation for making the change now. - R. From rolandd at cisco.com Thu Sep 1 09:20:36 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 01 Sep 2005 09:20:36 -0700 Subject: [openib-general] Re: [PATCH] sa_query: avoid unnecessary list scan In-Reply-To: <20050901072530.GG1707@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 1 Sep 2005 10:25:30 +0300") References: <20050901072530.GG1707@mellanox.co.il> Message-ID: <52br3cahez.fsf@cisco.com> Thanks, applied. From glebn at voltaire.com Thu Sep 1 09:23:07 2005 From: glebn at voltaire.com (Gleb Natapov) Date: Thu, 1 Sep 2005 19:23:07 +0300 Subject: [openib-general] RDMA Generic Connection Management In-Reply-To: <431727E6.9010408@ichips.intel.com> References: <1125323947.6584.106.camel@r2d2> <431374D4.5080909@ichips.intel.com> <52hdd8lb1e.fsf@cisco.com> <4315E0D6.6060508@ichips.intel.com> <52d5nudmdc.fsf@cisco.com> <4315F2DC.5060202@ichips.intel.com> <20050901151249.GG21040@minantech.com> <431727E6.9010408@ichips.intel.com> Message-ID: <20050901162307.GH21040@minantech.com> On Thu, Sep 01, 2005 at 09:10:14AM -0700, Sean Hefty wrote: > Gleb Natapov wrote: > >On Wed, Aug 31, 2005 at 11:11:40AM -0700, Sean Hefty wrote: > > > >>I don't have a good solution yet for calls like ib_cma_get_device(). Yet > >>another possibility is to have it return a device pointer in a callback. > >>Then it can synchronize with device removal internally. > >> > > > >What if ib_cma_get_device() will return client data for the device and > >we let the ULP to figure out whether the data is still valid in the way > >most suitable for the ULP? > > While I think that returning the client data would be useful, I don't think > that this really helps the ULP any. It seems likely that a client would > free their client data upon removal of the associated device. 
So they > can't trust this pointer any more than the device pointer. > The point is to let client decide how it synchronise access to the data with remove callback. For instance it may use semaphore like this: in connect: in remove callback: down(sem) down(sem) ptr = ib_cma_get_device() free(ptr) use ptr up(sem) up(sem) -- Gleb. From mshefty at ichips.intel.com Thu Sep 1 09:27:11 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Sep 2005 09:27:11 -0700 Subject: [openib-general] Re: [RFC] change to ib_create_cm_id() In-Reply-To: <20050901065904.GC1707@mellanox.co.il> References: <4316302D.2030401@ichips.intel.com> <20050901065904.GC1707@mellanox.co.il> Message-ID: <43172BDF.4040207@ichips.intel.com> Michael S. Tsirkin wrote: >>This will bind all cm_id's to a specific device, including cm_id's >>associated with listens. This will help prevent the CM from returning a >>cm_id associated with a device that a consumer may have already seen as >>removed. > > Looking at the API, cm_ids are not currently associated with a specific device. > What am I missing? The proposal is to change the cm_id's so that they become associated with a specific device. Currently, they are not visibly associated with a device. > So, I gather a ULP would need a list of cm_ids per connection, scanning > all of them on each cm operation, scanning and updating > these lists in all listening connections on each hotplug event. I'm not following you here. For active connections, clients already need to create a QP that is associated with a device. This simply binds the cm_id to that same device. On passive connection callbacks, the created cm_id would likewise be associated with the same device. > I wander whether cm can do the same thing internally, making the list > part of the cm id object? The intent is to make the associated between a cm_id and a device explicit. A client would then be able to destroy all cm_id's associated with a device that was being removed. 
The change affects listeners the most. Instead of calling listen once, it would need to be called once for each device. Since most clients have per device context, the listen cm_id would need to move from a single global structure into the per device structures maintained by the client. - Sean From caitlinb at broadcom.com Thu Sep 1 09:35:02 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Thu, 1 Sep 2005 09:35:02 -0700 Subject: [openib-general] [PATCH][iWARP] Added provider CM verbsandquery provider methods Message-ID: <54AD0F12E08D1541B826BE97C98F99F1F592@NT-SJCA-0751.brcm.ad.broadcom.com> > > Linux already have the infrastructure for zero-copy send, with some > > hardware help it is possible to implement zero-copy receive too. > > Moving data in memory is out of the question. > > > > Anyway I think this questions should be answered before moving this > > discussion to netdev. > > I don't know how to do it and I've thought quite a bit about it. > > If you want the semantics specified for RDDP over a reliable > transport, then you need to do TCP in hardware. > Consider RDMA_WRITE. The target buffer is specified in the > RDDP header which is in turn carried as part of the TCP payload. > Therefore in order for the hardware to get to the RDDP > header, it must crack the TCP header. The TCP processing > can't be stateless because the hardware must discard things > such as duplicates or risk overwriting an application buffer > that has already been written, delivered, and reused for some > other purpose. There are many other examples, but the point > is TCP state information is needed by the hardware. > I fully agree with this analysis. To go further, if the receive buffer is specified by *either* RDMA or TCP semantics then you cannot place directly to that buffer from the hardware unless the hardware owns the TCP connection. Doing zero-copy at the Ethernet or IP layer does not accomplish anything -- the application is thinking TCP or RDMA. 
It is also important to remember that there are multiple aspects to how RDMA and/or a QP/CQ interface improves efficiency. Placing directly from the raw receive buffer to the user's buffer without requiring a context switch is only part of it. The elimination of excess notifications to the user is equally vital. Any stateless design will not only target the wrong buffers, it will have no idea as to when the application really needs to be woken up. This is not really the right forum to discuss whether these hardware enhancements are valuable. If they were not valuable, companies would not be building them. Building products presupposes end customers that are willing to pay for those products. If there are no customers then these features will go away no matter what is decided here. If the customers want them they will find a way to deploy the hardware. So the question is how to integrate these features in a way that preserves the authority of the main stack in defending against attacks, regulating traffic, etc. The alternative is not for hardware offload to go away, but for it not to be integrated. I hope we can all agree that the latter is not a desirable outcome. From halr at voltaire.com Thu Sep 1 09:32:55 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Sep 2005 12:32:55 -0400 Subject: [openib-general] OpenSM Build from osm level Message-ID: <1125592374.4398.149.camel@hal.voltaire.com> Hi Eitan, When I run autogen.sh from the osm level, libvendor now works (but opensm and osmtest don't). -- Hal Visiting libvendor/autogen.sh | + aclocal -I config -I ../config | + libtoolize --force --copy | Putting files in AC_CONFIG_AUX_DIR, `config'. | + autoheader | + automake --foreign --add-missing --copy | + autoconf Visiting opensm/autogen.sh | + aclocal -I config -I ../config | + libtoolize --force --copy | Putting files in AC_CONFIG_AUX_DIR, `config'.
| + autoheader | + automake --foreign --add-missing --copy | automake: LINK was already defined in condition OSMV_SIM, which is included in condition TRUE ... | Makefile.am:72: ... `LINK' previously defined here | + autoconf Visiting osmtest/autogen.sh | + aclocal -I config -I ../config | + libtoolize --force --copy | Putting files in AC_CONFIG_AUX_DIR, `config'. | + autoheader | + automake --foreign --add-missing --copy | automake: LINK was already defined in condition OSMV_SIM, which is included in condition TRUE ... | Makefile.am:24: ... `LINK' previously defined here | + autoconf From ardavis at ichips.intel.com Thu Sep 1 09:45:03 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Thu, 01 Sep 2005 09:45:03 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <52y86gajnn.fsf@cisco.com> References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <43162ECE.3070405@ichips.intel.com> <52y86gajnn.fsf@cisco.com> Message-ID: <4317300F.9040903@ichips.intel.com> Roland Dreier wrote: >I found at least one silly bug in my code that could cause the crash >you saw. Can you pull the latest svn kernel code and try again? > >By the way, I found regress.sh in the dapl source but I'm not sure >what it means to "set to 40x40." Can you explain that for a non-DAPL >expert? > >Thanks, > Roland > > > sorry, dapltest has parameters for threads and connections per thread that I use for my scale-up testing. I just took the "client 6" section of regress.sh (see below) and changed to -t 40 -w 40 and ran against a srv.sh running on a separate node. The device is the device name used in /etc/dat.conf and the host is the hostname (IPoIB ip address) where srv.sh is running. The current version of uDAPL does not have the async processing hooks so you may not see any problems. I am working on these changes now and will send out a patch shortly. I will give your latest changes a try. 
#==================================================================== #client6 #==================================================================== ./dapltest -T T -s ${host} -D ${device} -i 10000 -t 4 -w 8 \ client SR 256 \ server RW 4096 \ server SR 256 \ client SR 256 \ server RW 4096 \ server SR 256 \ client SR 4096 \ server SR 256 From rolandd at cisco.com Thu Sep 1 09:47:33 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 01 Sep 2005 09:47:33 -0700 Subject: [openib-general] at won't compile with gcc-2.95 Message-ID: <52slwo91lm.fsf@cisco.com> Not sure how much it matters, given that it seems all this will need to be rewritten before going upstream, but at doesn't compile with gcc-2.95: CC [M] drivers/infiniband/core/at.o In file included from /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at.c:57: /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:142: badly punctuated parameter list in `#define' In file included from /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at.c:57: /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:71: field `offset_words' already initialized /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:72: field `offset_bits' already initialized /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:72: duplicate initializer /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:72: (near initialization for `ats_rec_table[2].field_name') /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:73: field `size_bits' already initialized /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:73: warning: excess elements in struct initializer /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:73: warning: (near initialization for `ats_rec_table[2]') /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at.c: In function `ats_op_complete': 
/data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at.c:134: warning: implicit declaration of function `WARN' make[4]: *** [drivers/infiniband/core/at.o] Error 1 make[3]: *** [drivers/infiniband/core] Error 2 make[2]: *** [drivers/infiniband] Error 2 make[1]: *** [drivers] Error 2 make: *** [_all] Error 2 From halr at voltaire.com Thu Sep 1 09:45:02 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Sep 2005 12:45:02 -0400 Subject: [openib-general] Re: OpenSM Build from osm level In-Reply-To: <43172E25.9030804@mellanox.co.il> References: <1125592374.4398.149.camel@hal.voltaire.com> <43172E25.9030804@mellanox.co.il> Message-ID: <1125593101.4398.169.camel@hal.voltaire.com> On Thu, 2005-09-01 at 12:36, Eitan Zahavi wrote: > > When I run autogen.sh from the osm level, libvendor now works (but > > opensm and osmtest don't) > Is this the 1.8.0 branch or the trunk? Same for this on both 1.8.0 branch and merge of 1.8.0 to trunk (with identical autogen.sh, configure.in, Makefile.am, at all levels from osm down and osm/config/osmvsel.m4 -- Hal From viswa.krish at gmail.com Thu Sep 1 09:49:18 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Thu, 1 Sep 2005 09:49:18 -0700 Subject: [openib-general] kernel oops Message-ID: <4df28be405090109492c323780@mail.gmail.com> I will try out this patch and let you know.. Hal Rosenstock wrote: > Here's a patch for this. Let me know if it works. [I tried it out and it > works for me.] If it does, the next question is how does the pointer get > trashed. I don't think that the pointer is getting trashed. The SA was not running, so I don't think that any route was returned. - Sean -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From eitan at mellanox.co.il Thu Sep 1 09:48:17 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 01 Sep 2005 19:48:17 +0300 Subject: [openib-general] Re: OpenSM Build from osm level In-Reply-To: <1125592374.4398.149.camel@hal.voltaire.com> References: <1125592374.4398.149.camel@hal.voltaire.com> Message-ID: <431730D1.3080402@mellanox.co.il> Hal Rosenstock wrote: > Hi Eitan, > > When I run autogen.sh from the osm level, libvendor now works (but > opensm and osmtest don't). Does the build fail or you only refer to the warning messages below? I was able to start fresh and complete a build after these warnings. The cause of the warning is the need to override the LINK rule such that the simulator build will use g++ for linking ... EZ > > -- Hal > > Visiting libvendor/autogen.sh > | + aclocal -I config -I ../config > | + libtoolize --force --copy > | Putting files in AC_CONFIG_AUX_DIR, `config'. > | + autoheader > | + automake --foreign --add-missing --copy > | + autoconf > Visiting opensm/autogen.sh > | + aclocal -I config -I ../config > | + libtoolize --force --copy > | Putting files in AC_CONFIG_AUX_DIR, `config'. > | + autoheader > | + automake --foreign --add-missing --copy > | automake: LINK was already defined in condition OSMV_SIM, which is included in condition TRUE ... > | Makefile.am:72: ... `LINK' previously defined here > | + autoconf > Visiting osmtest/autogen.sh > | + aclocal -I config -I ../config > | + libtoolize --force --copy > | Putting files in AC_CONFIG_AUX_DIR, `config'. > | + autoheader > | + automake --foreign --add-missing --copy > | automake: LINK was already defined in condition OSMV_SIM, which is included in condition TRUE ... > | Makefile.am:24: ... 
`LINK' previously defined here > | + autoconf From viswak at yahoo.com Thu Sep 1 09:58:23 2005 From: viswak at yahoo.com (viswanath krishnamurthy) Date: Thu, 1 Sep 2005 09:58:23 -0700 (PDT) Subject: [openib-general] Re: List of issues in uverbs In-Reply-To: <52mzmxd8ar.fsf@cisco.com> Message-ID: <20050901165823.59182.qmail@web33201.mail.mud.yahoo.com> --- Roland Dreier wrote: > viswanath> Here is new list of issues with > uverbs > > Thanks for the reports. > > viswanath> I have attached the firmware > version/svn info in the > viswanath> attachment. > > In the future can you attach things as text/plain > (or just include > them in your email)? If you attach it as > application/octet-stream > then I have to save the attachment and open it > manually, rather than > just reading it as part of your email. OK.. > > viswanath> 2. libmthca library crashes when a > server accepts lots > viswanath> of new incoming sessions. See log > (gdb) in the > viswanath> attachment. (It accepts about 170 > connections) Looks > viswanath> like a memory allocation issue. > > I found a few bugs in libmthca relating to > allocating doorbell records > for memfree HCAs. I've checked in fixes. Please > try the latest > subversion libmthca and let me know if it helps. This definitely helped. No more crashes in the library. Thanks > > viswanath> 3. Kernel oops when lots of traffic > between multiple > viswanath> clients and server. Very consistently > reproducible. > viswanath> See attachment for details > > Can you post the application you use to reproduce > this? I still see the crash with yesterday's checkout consistently at the same place. I will send the application today to reproduce. If some debug log needs to be collected let me know. > > Thanks, > Roland > Thanks, Viswa
From rolandd at cisco.com Thu Sep 1 10:02:08 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 01 Sep 2005 10:02:08 -0700 Subject: [openib-general] Re: List of issues in uverbs In-Reply-To: <20050901165823.59182.qmail@web33201.mail.mud.yahoo.com> (viswanath krishnamurthy's message of "Thu, 1 Sep 2005 09:58:23 -0700 (PDT)") References: <20050901165823.59182.qmail@web33201.mail.mud.yahoo.com> Message-ID: <52oe7c90xb.fsf@cisco.com> viswanath> I still see the crash with yesterday's checkout viswanath> consistently at the same place. I will send the viswanath> application today to reproduce. If some debug log needs viswanath> to be collected let me know. If you can just send the app and instructions on how to reproduce, that's perfect. That way we don't have to go back and forth adding tracing patches and so on -- I can just debug it directly. - R. From jlentini at netapp.com Thu Sep 1 10:07:19 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 1 Sep 2005 13:07:19 -0400 (EDT) Subject: [openib-general] Re: [PATCH] memory leaks in ipoib, srp In-Reply-To: <52ll2gahp7.fsf@cisco.com> References: <20050901071301.GF1707@mellanox.co.il> <52ll2gahp7.fsf@cisco.com> Message-ID: On Thu, 1 Sep 2005, Roland Dreier wrote: > Michael> I noticed the following while working on the client data patch. > > Good catch, applied. > > Michael> BTW, opinions on the newer version I sent? > > I'm undecided. On the one hand, it seems like a reasonable thing to > do. On the other hand, I'm not sure whether having two remove entry > points is just going to confuse people. And in any case I don't see > the real motivation for making the change now. I agree. Adding two removal functions is confusing. Michael, I liked your original idea of having the add upcall return a context value.
The issue with that approach was that the return value was both an indication of the client's interest in a future device removal event and the client's context. Ideally, the client's context would remain opaque, but because it was being used for two things, this wasn't the case. Off the top of my head, I can think of two ways to fix that: make the return value a structure:
struct ib_context { // you can probably think of a better name
	int subscribe; // ditto
	void *context;
};
struct ib_context *(*add)(struct ib_device *);
or add an additional parameter to the client's add call for the context:
int (*add)(struct ib_device *, void **context);
where the int return value is !0 if the client wants to receive a removal callback. From ardavis at ichips.intel.com Thu Sep 1 10:07:29 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Thu, 01 Sep 2005 10:07:29 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <4317300F.9040903@ichips.intel.com> References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <43162ECE.3070405@ichips.intel.com> <52y86gajnn.fsf@cisco.com> <4317300F.9040903@ichips.intel.com> Message-ID: <43173551.1080906@ichips.intel.com> Roland Dreier wrote: > >> I found at least one silly bug in my code that could cause the crash >> you saw. Can you pull the latest svn kernel code and try again? >> > Your latest changes in 3293 work great.
thanks, -arlin From halr at voltaire.com Thu Sep 1 10:21:12 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Sep 2005 13:21:12 -0400 Subject: [openib-general] Re: at won't compile with gcc-2.95 In-Reply-To: <52slwo91lm.fsf@cisco.com> References: <52slwo91lm.fsf@cisco.com> Message-ID: <1125595270.4398.215.camel@hal.voltaire.com> On Thu, 2005-09-01 at 12:47, Roland Dreier wrote: > Not sure how much it matters, given that it seems all this will need > to be rewritten before going upstream, but at doesn't compile with gcc-2.95: > > CC [M] drivers/infiniband/core/at.o > In file included from /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at.c:57: > /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:142: badly punctuated parameter list in `#define' > In file included from /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at.c:57: > /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:71: field `offset_words' already initialized > /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:72: field `offset_bits' already initialized > /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:72: duplicate initializer > /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:72: (near initialization for `ats_rec_table[2].field_name') > /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:73: field `size_bits' already initialized > /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:73: warning: excess elements in struct initializer > /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at_priv.h:73: warning: (near initialization for `ats_rec_table[2]') > /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at.c: In function `ats_op_complete': > /data/home/roland/Src/linux-2.6.13/drivers/infiniband/core/at.c:134: warning: implicit declaration of function `WARN' > make[4]: *** [drivers/infiniband/core/at.o] Error 1 > make[3]: *** 
[drivers/infiniband/core] Error 2 > make[2]: *** [drivers/infiniband] Error 2 > make[1]: *** [drivers] Error 2 > make: *** [_all] Error 2 I see the cause of the duplicate initializer and have just checked in a fix for this. Not sure yet on what gcc 2.95 doesn't like about: #define WARN(fmt, arg ...) printk("ib_at: %s: " fmt "\n", __FUNCTION__, ## arg); but that is the cause of the other compile issue in at.c. -- Hal From rolandd at cisco.com Thu Sep 1 10:32:53 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 01 Sep 2005 10:32:53 -0700 Subject: [openib-general] Re: at won't compile with gcc-2.95 In-Reply-To: <1125595270.4398.215.camel@hal.voltaire.com> (Hal Rosenstock's message of "01 Sep 2005 13:21:12 -0400") References: <52slwo91lm.fsf@cisco.com> <1125595270.4398.215.camel@hal.voltaire.com> Message-ID: <527je08zi2.fsf@cisco.com> > Not sure yet on what gcc 2.95 doesn't like about: > #define WARN(fmt, arg ...) printk("ib_at: %s: " fmt "\n", __FUNCTION__, ## arg); > but that is the cause of the other compile issue in at.c. Probably it doesn't like the extra space in WARN(fmt, arg ...). Hmm... changing it to #define WARN(fmt, arg...) printk("ib_at: %s: " fmt "\n", __FUNCTION__, ## arg); fixes the original complaint, but then gcc 2.95 doesn't like things like: WARN("pending request not found in parent request!"); ie WARN() with no arg parameter. Not sure how to make gcc 2.95 happy about that... - R. 
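The zero-argument failure Roland describes comes from the macro shape itself: with a named `fmt` parameter followed by `...`, a call such as `WARN("msg")` leaves nothing for the variadic part, and a plain C99 `__VA_ARGS__` expansion then ends in a dangling comma. One possible workaround, sketched here under stated assumptions, is to fold the format string into `__VA_ARGS__` and forward everything to a helper function. This is a userspace C99 sketch, not the ib_at code: `warn_impl` and `last_warning` are hypothetical names, writing to `stderr` stands in for `printk`, and it has not been tried with gcc 2.95 (which may predate C99 `__func__`).

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical buffer holding the most recent warning, so the
 * formatted result can be inspected; not part of the ib_at code. */
static char last_warning[256];

/* Hypothetical helper: formats "ib_at: <func>: <message>\n" and writes it
 * to stderr, standing in for printk in this userspace sketch. */
static void warn_impl(const char *func, const char *fmt, ...)
{
	char msg[200];
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(msg, sizeof msg, fmt, ap);
	va_end(ap);

	snprintf(last_warning, sizeof last_warning, "ib_at: %s: %s\n", func, msg);
	fputs(last_warning, stderr);
}

/* The format string travels inside __VA_ARGS__, so WARN("msg") expands to
 * warn_impl(__func__, "msg") with no trailing comma, and
 * WARN("bad port %d", 3) expands to warn_impl(__func__, "bad port %d", 3). */
#define WARN(...) warn_impl(__func__, __VA_ARGS__)
```

With this shape both `WARN("pending request not found in parent request!")` and calls with extra arguments expand to valid C99. GCC's `, ##__VA_ARGS__` extension, which drops the comma when the variadic part is empty, is the other common fix. Inside the kernel the helper would presumably use `vprintk` or format into a buffer for `printk` instead of `stderr`.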
From halr at voltaire.com Thu Sep 1 11:01:16 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Sep 2005 14:01:16 -0400 Subject: [openib-general] Re: OpenSM Build from osm level In-Reply-To: <431730D1.3080402@mellanox.co.il> References: <1125592374.4398.149.camel@hal.voltaire.com> <431730D1.3080402@mellanox.co.il> Message-ID: <1125597675.4398.336.camel@hal.voltaire.com> On Thu, 2005-09-01 at 12:48, Eitan Zahavi wrote: > Hal Rosenstock wrote: > > Hi Eitan, > > > > When I run autogen.sh from the osm level, libvendor now works (but > > opensm and osmtest don't). > Does the build fail or you only refer to the warning messages below? > I was able to start fresh and complete a build after these warnings. > > The cause of the warning is the need to override the LINK rule such that > the simulator build will use g++ for linking ... It was unclear whether it was a warning or an error. Can this be eliminated ? If I ignore that and continue, it configures and builds. -- Hal > EZ > > > > -- Hal > > > > Visiting libvendor/autogen.sh > > | + aclocal -I config -I ../config > > | + libtoolize --force --copy > > | Putting files in AC_CONFIG_AUX_DIR, `config'. > > | + autoheader > > | + automake --foreign --add-missing --copy > > | + autoconf > > Visiting opensm/autogen.sh > > | + aclocal -I config -I ../config > > | + libtoolize --force --copy > > | Putting files in AC_CONFIG_AUX_DIR, `config'. > > | + autoheader > > | + automake --foreign --add-missing --copy > > | automake: LINK was already defined in condition OSMV_SIM, which is included in condition TRUE ... > > | Makefile.am:72: ... `LINK' previously defined here > > | + autoconf > > Visiting osmtest/autogen.sh > > | + aclocal -I config -I ../config > > | + libtoolize --force --copy > > | Putting files in AC_CONFIG_AUX_DIR, `config'. > > | + autoheader > > | + automake --foreign --add-missing --copy > > | automake: LINK was already defined in condition OSMV_SIM, which is included in condition TRUE ... 
> > | Makefile.am:24: ... `LINK' previously defined here > > | + autoconf > > From jlentini at netapp.com Thu Sep 1 11:06:50 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 1 Sep 2005 14:06:50 -0400 (EDT) Subject: [openib-general] Re: at won't compile with gcc-2.95 In-Reply-To: <527je08zi2.fsf@cisco.com> References: <52slwo91lm.fsf@cisco.com> <1125595270.4398.215.camel@hal.voltaire.com> <527je08zi2.fsf@cisco.com> Message-ID: On Thu, 1 Sep 2005, Roland Dreier wrote: > > Not sure yet on what gcc 2.95 doesn't like about: > > #define WARN(fmt, arg ...) printk("ib_at: %s: " fmt "\n", __FUNCTION__, ## arg); > > but that is the cause of the other compile issue in at.c. > > Probably it doesn't like the extra space in WARN(fmt, arg ...). > Hmm... changing it to > > #define WARN(fmt, arg...) printk("ib_at: %s: " fmt "\n", __FUNCTION__, ## arg); > > fixes the original complaint, but then gcc 2.95 doesn't like things like: > > WARN("pending request not found in parent request!"); > > ie WARN() with no arg parameter. Not sure how to make gcc 2.95 happy > about that... I think you want #define WARN(fmt, ...) \ printk("ib_at: %s: " fmt "\n", __FUNCTION__, __VA_ARGS__) (note that I've removed the ";") but I don't have gcc 2.95 to test it. Roland, out of curiosity, why are you using gcc 2.95? According to http://www.gnu.org/software/gcc/releases.html, it is fairly old. From arlin.r.davis at intel.com Thu Sep 1 11:08:15 2005 From: arlin.r.davis at intel.com (Arlin Davis) Date: Thu, 1 Sep 2005 11:08:15 -0700 Subject: [openib-general] [PATCH] uDAPL changes to support async events Message-ID: James, Here are the changes to support async events. Also consolidated the uAT,uCM,uCQ threads into one processing thread. 
Thanks, -arlin Signed-off-by: Arlin Davis ardavis at ichips.intel.com Index: dapl/openib/dapl_ib_util.c =================================================================== --- dapl/openib/dapl_ib_util.c (revision 3293) +++ dapl/openib/dapl_ib_util.c (working copy) @@ -55,13 +55,14 @@ #include #include -#include -#include -#include -#include - -int g_dapl_loopback_connection = 0; +#include +int g_dapl_loopback_connection = 0; +int g_ib_destroy = 0; +int g_ib_pipe[2]; +DAPL_OS_THREAD g_ib_thread; +DAPL_OS_LOCK g_hca_lock; +struct dapl_llist_entry *g_hca_list; /* just get IP address, IPv4 only for now */ int dapli_get_hca_addr( struct dapl_hca *hca_ptr ) @@ -130,7 +131,18 @@ int32_t dapls_ib_init (void) { dapl_dbg_log (DAPL_DBG_TYPE_UTIL, " dapl_ib_init: \n" ); - if (dapli_cm_thread_init() || dapli_at_thread_init()) + + /* initialize hca_list lock */ + dapl_os_lock_init(&g_hca_lock); + + /* initialize hca list for CQ events */ + dapl_llist_init_head(&g_hca_list); + + /* create pipe for waking up work thread */ + if (pipe(g_ib_pipe)) + return 1; + + if (dapli_ib_thread_init()) return 1; return 0; @@ -139,8 +151,7 @@ int32_t dapls_ib_release (void) { dapl_dbg_log (DAPL_DBG_TYPE_UTIL, " dapl_ib_release: \n" ); - dapli_at_thread_destroy(); - dapli_cm_thread_destroy(); + dapli_ib_thread_destroy(); return 0; } @@ -196,6 +207,7 @@ ibv_get_device_name(hca_ptr->ib_trans.ib_dev) ); return DAT_INTERNAL_ERROR; } + hca_ptr->ib_trans.ib_ctx = hca_ptr->ib_hca_handle; /* set inline max with enviromment or default, get local lid and gid 0 */ hca_ptr->ib_trans.max_inline_send = @@ -223,19 +235,22 @@ goto bail; } - /* one thread for each device open */ - if (dapli_cq_thread_init(hca_ptr)) { - dapl_dbg_log (DAPL_DBG_TYPE_ERR, - " open_hca: cq_thread_init failed for %s\n", - ibv_get_device_name(hca_ptr->ib_trans.ib_dev) ); - goto bail; - } + /* initialize hca wait object for uAT event */ + dapl_os_wait_object_init(&hca_ptr->ib_trans.wait_object); - /* initialize cq_lock and wait object */ 
- dapl_os_lock_init(&hca_ptr->ib_trans.cq_lock); - dapl_os_wait_object_init (&hca_ptr->ib_trans.wait_object); - - dapl_dbg_log (DAPL_DBG_TYPE_UTIL, + /* + * Put new hca_transport on list for async and CQ event processing + * Wakeup work thread to add to polling list + */ + dapl_llist_init_entry((DAPL_LLIST_ENTRY*)&hca_ptr->ib_trans.entry); + dapl_os_lock( &g_hca_lock ); + dapl_llist_add_tail(&g_hca_list, + (DAPL_LLIST_ENTRY*)&hca_ptr->ib_trans.entry, + &hca_ptr->ib_trans.entry); + write(g_ib_pipe[1], "w", sizeof "w"); + dapl_os_unlock(&g_hca_lock); + + dapl_dbg_log (DAPL_DBG_TYPE_UTIL, " open_hca: %s, port %d, %s %d.%d.%d.%d INLINE_MAX=%d\n", ibv_get_device_name(hca_ptr->ib_trans.ib_dev), hca_ptr->port_num, ((struct sockaddr_in *)&hca_ptr->hca_address)->sin_family == AF_INET ? "AF_INET":"AF_INET6", @@ -245,7 +260,6 @@ ((struct sockaddr_in *)&hca_ptr->hca_address)->sin_addr.s_addr >> 24 & 0xff, hca_ptr->ib_trans.max_inline_send ); - return DAT_SUCCESS; bail: @@ -276,16 +290,28 @@ dapl_dbg_log (DAPL_DBG_TYPE_UTIL," close_hca: %p->%p\n", hca_ptr,hca_ptr->ib_hca_handle); - dapli_cq_thread_destroy(hca_ptr); - if (hca_ptr->ib_hca_handle != IB_INVALID_HANDLE) { if (ibv_close_device(hca_ptr->ib_hca_handle)) return(dapl_convert_errno(errno,"ib_close_device")); hca_ptr->ib_hca_handle = IB_INVALID_HANDLE; } - - dapl_os_lock_destroy(&hca_ptr->ib_trans.cq_lock); + /* + * Remove hca from async and CQ event processing list + * Wakeup work thread to remove from polling list + */ + hca_ptr->ib_trans.destroy = 1; + write(g_ib_pipe[1], "w", sizeof "w"); + + /* wait for thread to remove HCA references */ + while (hca_ptr->ib_trans.destroy != 2) { + struct timespec sleep, remain; + sleep.tv_sec = 0; + sleep.tv_nsec = 10000000; /* 10 ms */ + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, + " ib_thread_destroy: waiting on hca %p destroy\n"); + nanosleep (&sleep, &remain); + } return (DAT_SUCCESS); } @@ -432,31 +458,285 @@ IN void *context ) { - ib_hca_transport_t *hca_ptr; + ib_hca_transport_t 
*hca_ptr; - dapl_dbg_log (DAPL_DBG_TYPE_UTIL, - " setup_async_cb: ia %p type %d handle %p cb %p ctx %p\n", - ia_ptr, handler_type, evd_ptr, callback, context); - - hca_ptr = &ia_ptr->hca_ptr->ib_trans; - switch(handler_type) - { - case DAPL_ASYNC_UNAFILIATED: - hca_ptr->async_unafiliated = callback; - break; - case DAPL_ASYNC_CQ_ERROR: - hca_ptr->async_cq_error = callback; - break; - case DAPL_ASYNC_CQ_COMPLETION: - hca_ptr->async_cq = callback; - break; - case DAPL_ASYNC_QP_ERROR: - hca_ptr->async_qp_error = callback; - break; - default: - break; - } - return DAT_SUCCESS; + dapl_dbg_log (DAPL_DBG_TYPE_UTIL, + " setup_async_cb: ia %p type %d handle %p cb %p ctx %p\n", + ia_ptr, handler_type, evd_ptr, callback, context); + + hca_ptr = &ia_ptr->hca_ptr->ib_trans; + switch(handler_type) + { + case DAPL_ASYNC_UNAFILIATED: + hca_ptr->async_unafiliated = callback; + hca_ptr->async_un_ctx = context; + break; + case DAPL_ASYNC_CQ_ERROR: + hca_ptr->async_cq_error = callback; + hca_ptr->async_cq_ctx = context; + break; + case DAPL_ASYNC_CQ_COMPLETION: + hca_ptr->async_cq = callback; + hca_ptr->async_ctx = context; + break; + case DAPL_ASYNC_QP_ERROR: + hca_ptr->async_qp_error = callback; + hca_ptr->async_qp_ctx = context; + break; + default: + break; + } + return DAT_SUCCESS; } +int dapli_ib_thread_init(void) +{ + DAT_RETURN dat_status; + + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, + " ib_thread_init(%d)\n", getpid()); + + /* create thread to process inbound connect request */ + dat_status = dapl_os_thread_create(dapli_thread, NULL, &g_ib_thread); + if (dat_status != DAT_SUCCESS) + { + dapl_dbg_log(DAPL_DBG_TYPE_ERR, + " ib_thread_init: failed to create thread\n"); + return 1; + } + return 0; +} + +void dapli_ib_thread_destroy(void) +{ + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, + " ib_thread_destroy(%d)\n", getpid()); + + /* destroy ib_thread, wait for termination */ + g_ib_destroy = 1; + write(g_ib_pipe[1], "w", sizeof "w"); + while (g_ib_destroy != 2) { + struct timespec sleep, remain; + 
sleep.tv_sec = 0; + sleep.tv_nsec = 10000000; /* 10 ms */ + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, + " ib_thread_destroy: waiting for ib_thread\n"); + nanosleep(&sleep, &remain); + } + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, + " ib_thread_destroy(%d) exit\n",getpid()); +} + +void dapli_async_event_cb(struct _ib_hca_transport *hca) +{ + struct ibv_async_event event; + struct pollfd async_fd = { + .fd = hca->ib_ctx->async_fd, + .events = POLLIN, + .revents = 0 + }; + + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " dapli_async_event_cb(%p)\n",hca); + + if (hca->destroy) + return; + + if ((poll(&async_fd, 1, 0)==1) && + (!ibv_get_async_event(hca->ib_ctx, &event))) { + + switch (event.event_type) { + + case IBV_EVENT_CQ_ERR: + { + dapl_dbg_log(DAPL_DBG_TYPE_WARN, + " dapli_async_event CQ ERR %d\n", + event.event_type); + + /* report up if async callback still setup */ + if (hca->async_cq_error) + hca->async_cq_error(hca->ib_ctx, + &event, + hca->async_cq_ctx); + break; + } + case IBV_EVENT_COMM_EST: + { + /* Received messages on connected QP before RTU */ + struct dapl_ep *ep_ptr = event.element.qp->qp_context; + + /* TODO: cannot process COMM_EST until ibv + * guarantees valid QP context for events. + * Race conditions exist with QP destroy call. + * For now, assume the RTU will arrive. 
+ */ + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, + " dapli_async_event COMM_EST (qp=%p)\n", + event.element.qp); + + if (!DAPL_BAD_HANDLE(ep_ptr, DAPL_MAGIC_EP) && + ep_ptr->cm_handle != IB_INVALID_HANDLE) + ib_cm_establish(ep_ptr->cm_handle->cm_id); + + break; + } + case IBV_EVENT_QP_FATAL: + case IBV_EVENT_QP_REQ_ERR: + case IBV_EVENT_QP_ACCESS_ERR: + case IBV_EVENT_QP_LAST_WQE_REACHED: + case IBV_EVENT_SRQ_ERR: + case IBV_EVENT_SRQ_LIMIT_REACHED: + case IBV_EVENT_SQ_DRAINED: + { + dapl_dbg_log(DAPL_DBG_TYPE_WARN, + " dapli_async_event QP ERR %d\n", + event.event_type); + + /* report up if async callback still setup */ + if (hca->async_qp_error) + hca->async_qp_error(hca->ib_ctx, + &event, + hca->async_qp_ctx); + break; + } + case IBV_EVENT_PATH_MIG: + case IBV_EVENT_PATH_MIG_ERR: + case IBV_EVENT_DEVICE_FATAL: + case IBV_EVENT_PORT_ACTIVE: + case IBV_EVENT_PORT_ERR: + case IBV_EVENT_LID_CHANGE: + case IBV_EVENT_PKEY_CHANGE: + case IBV_EVENT_SM_CHANGE: + { + dapl_dbg_log(DAPL_DBG_TYPE_WARN, + " dapli_async_event DEV ERR %d\n", + event.event_type); + + /* report up if async callback still setup */ + if (hca->async_unafiliated) + hca->async_unafiliated( + hca->ib_ctx, + &event, + hca->async_un_ctx); + break; + } + default: + { + dapl_dbg_log (DAPL_DBG_TYPE_WARN, + "--> DsEventCb: UNKNOWN\n"); + break; + } + } + ibv_put_async_event(&event); + } +} + + +/* work thread for uAT, uCM, CQ, and async events */ +void dapli_thread(void *arg) +{ + struct pollfd ufds[__FD_SETSIZE]; + struct _ib_hca_transport *uhca[__FD_SETSIZE]={NULL}; + struct _ib_hca_transport *hca; + int ret,idx,fds; + char rbuf[2]; + + dapl_dbg_log (DAPL_DBG_TYPE_UTIL, + " ib_thread(%d,0x%x): ENTER: pipe %d cm %d at %d\n", + getpid(), g_ib_thread, + g_ib_pipe[0], ib_cm_get_fd(), + ib_at_get_fd()); + + /* Poll across pipe, CM, AT never changes */ + dapl_os_lock( &g_hca_lock ); + + ufds[0].fd = g_ib_pipe[0]; /* pipe */ + ufds[0].events = POLLIN; + ufds[1].fd = ib_cm_get_fd(); /* uCM */ + ufds[1].events = POLLIN; + 
ufds[2].fd = ib_at_get_fd(); /* uAT */ + ufds[2].events = POLLIN; + + while (!g_ib_destroy) { + + /* build ufds after pipe, cm, at events */ + ufds[0].revents = 0; + ufds[1].revents = 0; + ufds[2].revents = 0; + idx=2; + + /* Walk HCA list and setup async and CQ events */ + if (!dapl_llist_is_empty(&g_hca_list)) + hca = dapl_llist_peek_head(&g_hca_list); + else + hca = NULL; + + while(hca) { + int i; + ufds[++idx].fd = hca->ib_ctx->async_fd; /* uASYNC */ + ufds[idx].events = POLLIN; + ufds[idx].revents = 0; + uhca[idx] = hca; + for (i=0;i<hca->ib_ctx->num_comp;i++) { /* uCQ */ + ufds[++idx].fd = hca->ib_ctx->cq_fd[i]; + ufds[idx].events = POLLIN; + ufds[idx].revents = 0; + uhca[idx] = hca; + } + hca = dapl_llist_next_entry( + &g_hca_list, + (DAPL_LLIST_ENTRY*)&hca->entry); + } + + /* unlock, and setup poll */ + fds = idx+1; + dapl_os_unlock(&g_hca_lock); + ret = poll(ufds, fds, -1); + if (ret <= 0) { + dapl_dbg_log(DAPL_DBG_TYPE_WARN, + " ib_thread(%d): ERR %s poll\n", + getpid(),strerror(errno)); + dapl_os_lock(&g_hca_lock); + continue; + } + + /* check and process CQ and ASYNC events, each open device */ + for(idx=3;idxdestroy == 1) { + dapl_os_lock(&g_hca_lock); + dapl_llist_remove_entry( + &g_hca_list, + (DAPL_LLIST_ENTRY*) + &uhca[idx]->entry); + dapl_os_unlock(&g_hca_lock); + uhca[idx]->destroy = 2; + } + } + } + + /* CM and AT events */ + if (ufds[1].revents == POLLIN) + dapli_cm_event_cb(); + + if (ufds[2].revents == POLLIN) + dapli_at_event_cb(); + + dapl_os_lock(&g_hca_lock); + } + dapl_dbg_log(DAPL_DBG_TYPE_UTIL," ib_thread(%d) EXIT\n",getpid()); + g_ib_destroy = 2; + dapl_os_unlock(&g_hca_lock); +} Index: dapl/openib/dapl_ib_cm.c =================================================================== --- dapl/openib/dapl_ib_cm.c (revision 3293) +++ dapl/openib/dapl_ib_cm.c (working copy) @@ -70,90 +70,6 @@ static inline uint64_t cpu_to_be64(uint64_t x) { return x; } #endif -static int g_at_destroy; -static DAPL_OS_THREAD g_at_thread; -static int g_cm_destroy;
-static DAPL_OS_THREAD g_cm_thread; -static DAPL_OS_LOCK g_cm_lock; -static struct dapl_llist_entry *g_cm_list; - -int dapli_cm_thread_init(void) -{ - DAT_RETURN dat_status; - - dapl_dbg_log(DAPL_DBG_TYPE_CM," cm_thread_init(%d)\n", getpid()); - - /* initialize cr_list lock */ - dapl_os_lock_init(&g_cm_lock); - - /* initialize CM list for listens on this HCA */ - dapl_llist_init_head(&g_cm_list); - - /* create thread to process inbound connect request */ - dat_status = dapl_os_thread_create(cm_thread, NULL, &g_cm_thread); - if (dat_status != DAT_SUCCESS) - { - dapl_dbg_log(DAPL_DBG_TYPE_ERR, - " cm_thread_init: failed to create thread\n"); - return 1; - } - return 0; -} - -void dapli_cm_thread_destroy(void) -{ - dapl_dbg_log(DAPL_DBG_TYPE_CM," cm_thread_destroy(%d)\n", getpid()); - - /* destroy cr_thread and lock */ - g_cm_destroy = 1; - pthread_kill( g_cm_thread, SIGUSR1 ); - dapl_dbg_log(DAPL_DBG_TYPE_CM," cm_thread_destroy(%d) SIGUSR1 sent\n",getpid()); - while (g_cm_destroy) { - struct timespec sleep, remain; - sleep.tv_sec = 0; - sleep.tv_nsec = 10000000; /* 10 ms */ - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " cm_thread_destroy: waiting for cm_thread\n"); - nanosleep (&sleep, &remain); - } - dapl_dbg_log(DAPL_DBG_TYPE_CM," cm_thread_destroy(%d) exit\n",getpid()); -} - -int dapli_at_thread_init(void) -{ - DAT_RETURN dat_status; - - dapl_dbg_log(DAPL_DBG_TYPE_CM," at_thread_init(%d)\n", getpid()); - - /* create thread to process AT async requests */ - dat_status = dapl_os_thread_create(at_thread, NULL, &g_at_thread); - if (dat_status != DAT_SUCCESS) - { - dapl_dbg_log(DAPL_DBG_TYPE_ERR, - " at_thread_init: failed to create thread\n"); - return 1; - } - return 0; -} - -void dapli_at_thread_destroy(void) -{ - dapl_dbg_log(DAPL_DBG_TYPE_CM," at_thread_destroy(%d)\n", getpid()); - - /* destroy cr_thread and lock */ - g_at_destroy = 1; - pthread_kill( g_at_thread, SIGUSR1 ); - dapl_dbg_log(DAPL_DBG_TYPE_CM," at_thread_destroy(%d) SIGUSR1 sent\n",getpid()); - while 
(g_at_destroy) { - struct timespec sleep, remain; - sleep.tv_sec = 0; - sleep.tv_nsec = 10000000; /* 10 ms */ - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " at_thread_destroy: waiting for at_thread\n"); - nanosleep (&sleep, &remain); - } - dapl_dbg_log(DAPL_DBG_TYPE_CM," at_thread_destroy(%d) exit\n",getpid()); -} void dapli_ip_comp_handler(uint64_t req_id, void *context, int rec_num) { @@ -348,12 +264,6 @@ if (conn->ep) conn->ep->cm_handle = IB_INVALID_HANDLE; - /* take off the CM thread work queue and free */ - dapl_os_lock( &g_cm_lock ); - dapl_llist_remove_entry(&g_cm_list, - (DAPL_LLIST_ENTRY*)&conn->entry); - dapl_os_unlock(&g_cm_lock); - dapl_os_free(conn, sizeof(*conn)); } } @@ -426,8 +336,8 @@ if (new_conn) { (void)dapl_os_memzero(new_conn, sizeof(*new_conn)); - dapl_os_lock_init(&new_conn->lock); new_conn->cm_id = event->cm_id; /* provided by uCM */ + event->cm_id->context = new_conn; /* update CM_ID context */ new_conn->sp = conn->sp; new_conn->hca = conn->hca; new_conn->service_id = conn->service_id; @@ -444,13 +354,6 @@ event->param.req_rcvd.primary_path, sizeof(struct ib_sa_path_rec)); - /* put new CR on CM thread event work queue */ - dapl_llist_init_entry((DAPL_LLIST_ENTRY*)&new_conn->entry); - dapl_os_lock( &g_cm_lock ); - dapl_llist_add_tail(&g_cm_list, - (DAPL_LLIST_ENTRY*)&new_conn->entry, new_conn); - dapl_os_unlock(&g_cm_lock); - dapl_dbg_log(DAPL_DBG_TYPE_CM, " passive_cb: " "REQ on HCA %p SP %p SID %d L_ID %d new_id %d p_data %p\n", new_conn->hca, new_conn->sp, @@ -521,18 +424,13 @@ if (conn->ep) conn->ep->cm_handle = IB_INVALID_HANDLE; - /* take off the CM thread work queue and free */ - dapl_os_lock( &g_cm_lock ); - dapl_llist_remove_entry(&g_cm_list, - (DAPL_LLIST_ENTRY*)&conn->entry); - dapl_os_unlock(&g_cm_lock); dapl_os_free(conn, sizeof(*conn)); } return(destroy); } static int dapli_cm_passive_cb(struct dapl_cm_id *conn, - struct ib_cm_event *event) + struct ib_cm_event *event) { int destroy; struct dapl_cm_id *new_conn; @@ -541,9 +439,6 @@ " 
passive_cb: conn %p id %d event %d\n", conn, conn->cm_id, event->event ); - if (conn->cm_id == 0) - return 0; - dapl_os_lock(&conn->lock); if (conn->destroy) { dapl_os_unlock(&conn->lock); @@ -608,155 +503,11 @@ if (conn->ep) conn->ep->cm_handle = IB_INVALID_HANDLE; - /* take off the CM thread work queue and free */ - dapl_os_lock( &g_cm_lock ); - dapl_llist_remove_entry(&g_cm_list, - (DAPL_LLIST_ENTRY*)&conn->entry); - dapl_os_unlock(&g_cm_lock); - dapl_os_free(conn, sizeof(*conn)); } return(destroy); } -/* something to catch the signal */ -static void ib_sig_handler(int signum) -{ - return; -} - -/* async CM processing thread */ -void cm_thread(void *arg) -{ - struct dapl_cm_id *conn, *next_conn; - struct ib_cm_event *event; - struct pollfd ufds; - sigset_t sigset; - - dapl_dbg_log (DAPL_DBG_TYPE_CM, - " cm_thread(%d,0x%x): ENTER: cm_fd %d\n", - getpid(), g_cm_thread, ib_cm_get_fd()); - - sigemptyset(&sigset); - sigaddset(&sigset, SIGUSR1); - pthread_sigmask(SIG_UNBLOCK, &sigset, NULL); - signal( SIGUSR1, ib_sig_handler); - - dapl_os_lock( &g_cm_lock ); - while (!g_cm_destroy) { - struct ib_cm_id *cm_id; - int ret; - - /* select for CM event, all events process via cm_fd */ - ufds.fd = ib_cm_get_fd(); - ufds.events = POLLIN; - ufds.revents = 0; - - dapl_os_unlock(&g_cm_lock); - ret = poll(&ufds, 1, -1); - if (ret <= 0) { - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " cm_thread(%d): ERR %s poll\n", - getpid(),strerror(errno)); - dapl_os_lock(&g_cm_lock); - continue; - } - - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " cm_thread: GET EVENT fd=%d n=%d\n", - ib_cm_get_fd(),ret); - - if (ib_cm_event_get_timed(0,&event)) { - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " cm_thread: ERR %s event_get on %d\n", - strerror(errno), ib_cm_get_fd() ); - dapl_os_lock(&g_cm_lock); - continue; - } - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " cm_thread: GET EVENT fd=%d woke\n",ib_cm_get_fd()); - dapl_os_lock(&g_cm_lock); - - /* set proper cm_id */ - if (event->event == IB_CM_REQ_RECEIVED || - event->event == 
IB_CM_SIDR_REQ_RECEIVED) - cm_id = event->param.req_rcvd.listen_id; - else - cm_id = event->cm_id; - - dapl_dbg_log (DAPL_DBG_TYPE_CM, - " cm_thread: EVENT event(%d) cm_id=%d (%d)\n", - event->event, event->cm_id, cm_id ); - - /* - * Walk cm_list looking for connection id in event - * no need to walk if uCM would provide context with event - */ - if (!dapl_llist_is_empty(&g_cm_list)) - next_conn = dapl_llist_peek_head(&g_cm_list); - else - next_conn = NULL; - - ret = 0; - while (next_conn) { - conn = next_conn; - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " cm_thread: LIST cm %p c_id %d e_id %d)\n", - conn, conn->cm_id, cm_id ); - - next_conn = dapl_llist_next_entry( - &g_cm_list, - (DAPL_LLIST_ENTRY*)&conn->entry ); - - if (cm_id == conn->cm_id) { - dapl_os_unlock(&g_cm_lock); - if (conn->sp) - ret = dapli_cm_passive_cb(conn,event); - else - ret = dapli_cm_active_cb(conn,event); - dapl_os_lock(&g_cm_lock); - break; - } - } - ib_cm_event_put(event); - if (ret) { - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " cm_thread: destroy cm_id %d\n",cm_id); - ib_cm_destroy_id(cm_id); - } - } - dapl_os_unlock(&g_cm_lock); - dapl_dbg_log(DAPL_DBG_TYPE_CM," cm_thread(%d) EXIT, cm_list=%s\n", - getpid(),dapl_llist_is_empty(&g_cm_list) ? "EMPTY":"NOT EMPTY"); - g_cm_destroy = 0; -} - -/* async AT processing thread */ -void at_thread(void *arg) -{ - sigset_t sigset; - - dapl_dbg_log (DAPL_DBG_TYPE_CM, - " at_thread(%d,0x%x): ENTER: at_fd %d\n", - getpid(), g_at_thread, ib_at_get_fd()); - - sigemptyset(&sigset); - sigaddset(&sigset, SIGUSR1); - pthread_sigmask(SIG_UNBLOCK, &sigset, NULL); - signal(SIGUSR1, ib_sig_handler); - - while (!g_at_destroy) { - /* poll forever until callback or signal */ - if (ib_at_callback_get_timed(-1) < 0) { - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " at_thread: SIG? 
ret=%s, destroy=%d\n", - strerror(errno), g_at_destroy ); - } - dapl_dbg_log(DAPL_DBG_TYPE_CM," at_thread: callback woke\n"); - } - dapl_dbg_log(DAPL_DBG_TYPE_CM," at_thread(%d) EXIT \n", getpid()); - g_at_destroy = 0; -} /************************ DAPL provider entry points **********************/ @@ -853,13 +604,6 @@ conn->retries = 0; dapl_os_memcpy(&conn->r_addr, r_addr, sizeof(DAT_SOCK_ADDR6)); - /* put on CM thread work queue */ - dapl_llist_init_entry((DAPL_LLIST_ENTRY*)&conn->entry); - dapl_os_lock( &g_cm_lock ); - dapl_llist_add_tail(&g_cm_list, - (DAPL_LLIST_ENTRY*)&conn->entry, conn); - dapl_os_unlock(&g_cm_lock); - status = ib_at_route_by_ip( ((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr, ((struct sockaddr_in *)&conn->hca->hca_address)->sin_addr.s_addr, @@ -1019,13 +763,6 @@ conn->hca = ia_ptr->hca_ptr; conn->service_id = ServiceID; - /* put on CM thread work queue */ - dapl_llist_init_entry((DAPL_LLIST_ENTRY*)&conn->entry); - dapl_os_lock( &g_cm_lock ); - dapl_llist_add_tail(&g_cm_list, - (DAPL_LLIST_ENTRY*)&conn->entry, conn); - dapl_os_unlock(&g_cm_lock); - dapl_dbg_log(DAPL_DBG_TYPE_EP, " setup_listener(conn=%p cm_id=%d)\n", sp_ptr->cm_srvc_handle,conn->cm_id); @@ -1345,8 +1082,6 @@ return size; } -#ifndef SOCKET_CM - /* * Map all socket CM event codes to the DAT equivelent. 
*/ @@ -1457,7 +1192,44 @@ return ib_cm_event; } -#endif +void dapli_cm_event_cb() +{ + struct ib_cm_event *event; + + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " dapli_cm_event()\n"); + + /* process one CM event, fairness */ + if(!ib_cm_event_get_timed(0,&event)) { + struct dapl_cm_id *conn; + int ret; + dapl_dbg_log(DAPL_DBG_TYPE_CM, + " dapli_cm_event: EVENT=%p ID=%p CTX=%p\n", + event->event, event->cm_id, + event->cm_id->context); + + /* set proper conn from cm_id context*/ + conn = (struct dapl_cm_id*)event->cm_id->context; + + if (conn->sp) + ret = dapli_cm_passive_cb(conn,event); + else + ret = dapli_cm_active_cb(conn,event); + + ib_cm_event_put(event); + + if (ret) + ib_cm_destroy_id(conn->cm_id); + } +} + +void dapli_at_event_cb() +{ + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " dapli_at_event_cb()\n"); + + /* process one AT event, fairness */ + ib_at_callback_get_timed(0); +} + /* * Local variables: Index: dapl/openib/dapl_ib_util.h =================================================================== --- dapl/openib/dapl_ib_util.h (revision 3293) +++ dapl/openib/dapl_ib_util.h (working copy) @@ -231,18 +231,22 @@ /* ib_hca_transport_t, specific to this implementation */ typedef struct _ib_hca_transport { - struct ibv_device *ib_dev; + struct ib_llist_entry entry; + int destroy; + struct ibv_device *ib_dev; + struct ibv_context *ib_ctx; ib_cq_handle_t ib_cq_empty; - DAPL_OS_LOCK cq_lock; DAPL_OS_WAIT_OBJECT wait_object; - int cq_destroy; - DAPL_OS_THREAD cq_thread; int max_inline_send; union ibv_gid gid; ib_async_handler_t async_unafiliated; + void *async_un_ctx; ib_async_handler_t async_cq_error; + void *async_ctx; ib_async_handler_t async_cq; + void *async_cq_ctx; ib_async_handler_t async_qp_error; + void *async_qp_ctx; } ib_hca_transport_t; @@ -252,21 +256,15 @@ /* prototypes */ int32_t dapls_ib_init (void); int32_t dapls_ib_release (void); -void cm_thread (void *arg); -int dapli_cm_thread_init(void); -void dapli_cm_thread_destroy(void); -void at_thread (void *arg); -int 
dapli_at_thread_init(void); -void dapli_at_thread_destroy(void); -void cq_thread (void *arg); -int dapli_cq_thread_init(struct dapl_hca *hca_ptr); -void dapli_cq_thread_destroy(struct dapl_hca *hca_ptr); - -int dapli_get_lid(struct dapl_hca *hca_ptr, int port, uint16_t *lid); -int dapli_get_gid(struct dapl_hca *hca_ptr, int port, int index, - union ibv_gid *gid); -int dapli_get_hca_addr(struct dapl_hca *hca_ptr); +void dapli_thread(void *arg); +int dapli_ib_thread_init(void); +void dapli_ib_thread_destroy(void); +int dapli_get_hca_addr(struct dapl_hca *hca_ptr); void dapli_ip_comp_handler(uint64_t req_id, void *context, int rec_num); +void dapli_cm_event_cb(void); +void dapli_at_event_cb(void); +void dapli_cq_event_cb(struct _ib_hca_transport *hca); +void dapli_async_event_cb(struct _ib_hca_transport *hca); DAT_RETURN dapls_modify_qp_state ( IN ib_qp_handle_t qp_handle, Index: dapl/openib/dapl_ib_cq.c =================================================================== --- dapl/openib/dapl_ib_cq.c (revision 3293) +++ dapl/openib/dapl_ib_cq.c (working copy) @@ -52,94 +52,40 @@ #include "dapl_evd_util.h" #include "dapl_ring_buffer_util.h" #include -#include -int dapli_cq_thread_init(struct dapl_hca *hca_ptr) +void dapli_cq_event_cb(struct _ib_hca_transport *hca) { - DAT_RETURN dat_status; + int i; + dapl_dbg_log(DAPL_DBG_TYPE_UTIL," dapli_cq_event_cb(%p)\n", hca); - dapl_dbg_log(DAPL_DBG_TYPE_UTIL," cq_thread_init(%p)\n", hca_ptr); - - /* create thread to process inbound connect request */ - dat_status = dapl_os_thread_create( cq_thread, (void*)hca_ptr,&hca_ptr->ib_trans.cq_thread); - if (dat_status != DAT_SUCCESS) - { - dapl_dbg_log(DAPL_DBG_TYPE_ERR, - " cq_thread_init: failed to create thread\n"); - return 1; - } - return 0; -} - -void dapli_cq_thread_destroy(struct dapl_hca *hca_ptr) -{ - dapl_dbg_log(DAPL_DBG_TYPE_UTIL," cq_thread_destroy(%p)\n", hca_ptr); - - /* destroy cr_thread and lock */ - hca_ptr->ib_trans.cq_destroy = 1; - 
pthread_kill(hca_ptr->ib_trans.cq_thread, SIGUSR1); - dapl_dbg_log(DAPL_DBG_TYPE_CM," cq_thread_destroy(%p) SIGUSR1 sent\n",hca_ptr); - while (hca_ptr->ib_trans.cq_destroy != 2) { - struct timespec sleep, remain; - sleep.tv_sec = 0; - sleep.tv_nsec = 10000000; /* 10 ms */ - dapl_dbg_log(DAPL_DBG_TYPE_UTIL, - " cq_thread_destroy: waiting for cq_thread\n"); - nanosleep (&sleep, &remain); - } - dapl_dbg_log(DAPL_DBG_TYPE_UTIL," cq_thread_destroy(%d) exit\n",getpid()); - return; -} - -/* something to catch the signal */ -static void ib_cq_handler(int signum) -{ - return; -} - -void cq_thread( void *arg ) -{ - struct dapl_hca *hca_ptr = arg; - struct dapl_evd *evd_ptr; - struct ibv_cq *ibv_cq = NULL; - sigset_t sigset; - int status = 0; - - dapl_dbg_log ( DAPL_DBG_TYPE_UTIL," cq_thread: ENTER hca %p\n",hca_ptr); - - sigemptyset(&sigset); - sigaddset(&sigset,SIGUSR1); - pthread_sigmask(SIG_UNBLOCK, &sigset, NULL); - signal(SIGUSR1, ib_cq_handler); - - /* wait on DTO event, or signal to abort */ - while (!hca_ptr->ib_trans.cq_destroy) { - - struct pollfd cq_poll = { - .fd = hca_ptr->ib_hca_handle->cq_fd[0], + /* check all comp events on this device */ + for(i=0;iib_ctx->num_comp;i++) { + struct dapl_evd *evd_ptr = NULL; + struct ibv_cq *ibv_cq = NULL; + struct pollfd cq_fd = { + .fd = hca->ib_ctx->cq_fd[i], .events = POLLIN, .revents = 0 }; - - status = poll(&cq_poll, 1, -1); - if ((status == 1) && - (!ibv_get_cq_event(hca_ptr->ib_hca_handle, 0, &ibv_cq, (void*)&evd_ptr))) { - + if ((poll(&cq_fd, 1, 0) == 1) && + (!ibv_get_cq_event(hca->ib_ctx, i, + &ibv_cq, (void*)&evd_ptr))) { + + /* + * TODO: ibv put event to protect against + * destroy CQ race conditions? 
+ */ if (DAPL_BAD_HANDLE(evd_ptr, DAPL_MAGIC_EVD)) continue; /* process DTO event via callback */ - dapl_evd_dto_callback ( evd_ptr->header.owner_ia->hca_ptr->ib_hca_handle, + dapl_evd_dto_callback ( hca->ib_ctx, evd_ptr->ib_cq_handle, (void*)evd_ptr ); - } else { - - } - } - hca_ptr->ib_trans.cq_destroy = 2; - dapl_dbg_log(DAPL_DBG_TYPE_UTIL," cq_thread: EXIT: hca %p \n", hca_ptr); - return; + } + } } + /* * Map all verbs DTO completion codes to the DAT equivelent. * From ardavis at ichips.intel.com Thu Sep 1 11:21:55 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Thu, 01 Sep 2005 11:21:55 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <528xyidkfi.fsf@cisco.com> References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> Message-ID: <431746C3.3050300@ichips.intel.com> Roland Dreier wrote: >OK, I checked in changes to libibverbs and the kernel uverbs to handle >cleaning up stale events when destroying a CQ/QP/SRQ. All the changes >are in svn r3279. > > Roland, Shouldn't there be a new ibv_put_cq_event() to go with the ibv_get_cq_event() ? -arlin >The changes require a kernel ABI bump. The new libibverbs works with >both the old kernel and new kernel, but the old libibverbs will only >work with the old kernel. So in other words, if you upgrade your >kernel, then make sure you upgrade libibverbs as well. If you upgrade >libibverbs, then you don't have to upgrade your kernel but you can if >you want. (Confused yet? Or should I write still more?) > >I did some light testing but I don't have any tests that generate lots >of async events. Sean and Arlin, if you could retest uDAPL or >whatever was choking on QP connected events, that would be great.
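The dapl_ib_cq.c hunk earlier in this archive replaces the blocking per-HCA cq_thread with dapli_cq_event_cb(). That function checks each completion-channel descriptor with a zero-timeout poll() and calls ibv_get_cq_event() only on descriptors that are ready. The sketch below shows the same non-blocking dispatch pattern in plain POSIX C. Unlike the patch, it polls the whole descriptor array in a single call. dispatch_ready() and drain_one() are hypothetical helpers, not uDAPL or libibverbs functions.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Callback invoked once for each descriptor that has input pending. */
typedef void (*ready_cb)(int fd, void *ctx);

/*
 * Check every descriptor once with a zero timeout, so the call never
 * blocks, and invoke cb for each descriptor reporting POLLIN.
 * Returns the number of descriptors handled, or -1 if poll() fails.
 */
static int dispatch_ready(struct pollfd *fds, nfds_t n, ready_cb cb, void *ctx)
{
	nfds_t i;
	int handled = 0;

	assert(cb != NULL);
	for (i = 0; i < n; i++) {
		fds[i].events = POLLIN;
		fds[i].revents = 0;
	}
	if (poll(fds, n, 0) < 0)
		return -1;
	for (i = 0; i < n; i++) {
		if (fds[i].revents & POLLIN) {
			cb(fds[i].fd, ctx);
			handled++;
		}
	}
	return handled;
}

/* Example callback: consume one byte and bump the counter in ctx. */
static void drain_one(int fd, void *ctx)
{
	char c;

	if (read(fd, &c, 1) == 1)
		(*(int *)ctx)++;
}
```

Because the timeout is 0, a caller such as a single event thread can check many sources in turn, handling one event from each for fairness. It never parks inside any one of them.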
> >Thanks, > Roland >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > > From mst at mellanox.co.il Thu Sep 1 11:34:33 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Sep 2005 21:34:33 +0300 Subject: [openib-general] Re: [PATCH] memory leaks in ipoib, srp In-Reply-To: <52ll2gahp7.fsf@cisco.com> References: <20050901071301.GF1707@mellanox.co.il> <52ll2gahp7.fsf@cisco.com> Message-ID: <20050901183433.GA16664@mellanox.co.il> Quoting r. Roland Dreier : > Michael> BTW, opinions on the newer version I sent? > > I'm undecided. On the one hand, it seems like a reasonable thing to > do. On the other hand, I'm not sure whether having two remove entry > points is just going to confuse people. OK... let's just add a flag to the client instead. struct ib_client { char *name; struct ib_client_data *(*add) (struct ib_device *); void (*remove)(struct ib_device *, struct ib_client_data *); int have_client_data; struct list_head list; }; Better? > And in any case I don't see > the real motivation for making the change now. Type-safety is one. Cleaner memory management is another: it's better for clients to allocate their own memory, as the two leaks that I sent patches for previously demonstrate. An additional motivation behind this is: ULPs (e.g. SDP, CM) need to keep lists of per-device objects and kill them on device removal. For example, with the change Sean proposes, SDP will need to keep a list of per-device cm_ids in each connection. One idea, then, is to make each cm_id a client in this example; this list is then managed by device.c. The client list then becomes very long, so it's important to get client data from the device without scanning the client list.
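A callback that receives a pointer to a structure embedded in the client's own object can recover that object in constant time, without scanning any list. The sa_query cleanup patch earlier in this archive does exactly this with container_of(). Below is a minimal userspace sketch. The types struct event_handler and struct sa_device are hypothetical simplified stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace form of the kernel's container_of(): step back from a
 * pointer to a member to the start of the enclosing structure. The
 * kernel macro also type-checks ptr via typeof; this one does not.
 */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins, loosely modeled on the sa_query structures. */
struct event_handler {
	void (*handler)(struct event_handler *h, int event);
};

struct sa_device {
	int start_port;
	int last_event;
	struct event_handler event_handler; /* embedded, not a pointer */
};

/*
 * The callback receives only the embedded handler, yet reaches the
 * owning sa_device in O(1) without consulting any list.
 */
static void sa_event(struct event_handler *h, int event)
{
	struct sa_device *sa_dev;

	assert(h != NULL);
	sa_dev = container_of(h, struct sa_device, event_handler);
	sa_dev->last_event = event;
}
```

This only works when the callback is given the embedded member itself. With just a struct ib_device pointer in hand, as with ib_get_client_data(), there is nothing to offset from, so a lookup is still needed there.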
-- MST From rolandd at cisco.com Thu Sep 1 11:46:13 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 01 Sep 2005 11:46:13 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <431746C3.3050300@ichips.intel.com> (Arlin Davis's message of "Thu, 01 Sep 2005 11:21:55 -0700") References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> Message-ID: <52u0h47hje.fsf@cisco.com> Arlin> Shouldn't there be a new ibv_put_cq_event() to go with the Arlin> ibv_get_cq_event() ? No, I think that's dealt with by sweeping the CQ in userspace when destroying a QP. - R. From mshefty at ichips.intel.com Thu Sep 1 11:46:18 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Sep 2005 11:46:18 -0700 Subject: [openib-general] Re: [PATCH] memory leaks in ipoib, srp In-Reply-To: <20050901183433.GA16664@mellanox.co.il> References: <20050901071301.GF1707@mellanox.co.il> <52ll2gahp7.fsf@cisco.com> <20050901183433.GA16664@mellanox.co.il> Message-ID: <43174C7A.8070900@ichips.intel.com> Michael S. Tsirkin wrote: > An additional thinking behind this is: ULPs (e.g. SDP, CM) > need to keep lists of per-device objects and kill them on device > removal. > For example with change Sean proposes SDP will need to keep > a list of per-device cm_ids in each connection. > One idea, then, is in this example to make each cm_id a client, > then this list is managed by device.c > > Client list then becomes very long, so its important to get > client data from device without scanning the client list. How does SDP currently track QPs? You should be able to track most cm_id's using the same method. Only listen cm_id's should need to be tracked separately. If we're trying to solve the issue of searching a list looking for user context after a remove event occurs, can't we just modify the remove device handling and pass in the user context? 
Is this the extent of your proposal, or am I missing something else? - Sean From rolandd at cisco.com Thu Sep 1 11:46:54 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 01 Sep 2005 11:46:54 -0700 Subject: [openib-general] Re: at won't compile with gcc-2.95 In-Reply-To: (James Lentini's message of "Thu, 1 Sep 2005 14:06:50 -0400 (EDT)") References: <52slwo91lm.fsf@cisco.com> <1125595270.4398.215.camel@hal.voltaire.com> <527je08zi2.fsf@cisco.com> Message-ID: <52psrs7hi9.fsf@cisco.com> James> Roland, out of curiosity, why are you using gcc 2.95? James> According to http://www.gnu.org/software/gcc/releases.html, James> it is fairly old. It's still supported for building the kernel, so anything we expect to send upstream needs to build with gcc 2.95. - R. From mst at mellanox.co.il Thu Sep 1 11:48:45 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Sep 2005 21:48:45 +0300 Subject: [openib-general] Re: [RFC] change to ib_create_cm_id() In-Reply-To: <43172BDF.4040207@ichips.intel.com> References: <4316302D.2030401@ichips.intel.com> <20050901065904.GC1707@mellanox.co.il> <43172BDF.4040207@ichips.intel.com> Message-ID: <20050901184845.GC16664@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [RFC] change to ib_create_cm_id() > > Michael S. Tsirkin wrote: > >>This will bind all cm_id's to a specific device, including cm_id's > >>associated with listens. This will help prevent the CM from returning a > >>cm_id associated with a device that a consumer may have already seen as > >>removed. > > > >Looking at the API, cm_ids are not currently associated with a specific > >device. > >What am I missing? > > The proposal is to change the cm_id's so that they become associated with a > specific device. Currently, they are not visibly associated with a device. I understand. But you said "this will help prevent the CM from returning a cm_id associated with a device" which seems to correspond to the existing API. 
> >So, I gather a ULP would need a list of cm_ids per connection, scanning > >all of them on each cm operation, scanning and updating > >these lists in all listening connections on each hotplug event. > > I'm not following you here. I was asking to keep the API that listens on all devices for listeners, and require the device on the active side only. > The change affects listeners the most. Instead of calling listen once, it > would need to be called once for each device. Since most clients have per > device context, the listen cm_id would need to move from a single global > structure into the per device structures maintained by the client. > > - Sean > That's what I was trying to say: e.g. for SDP, when a device is added, we need to scan all listening connections and add a new cm id for the device that is being added. When a device is being removed, again scan all listening connections and remove the cm id for the device that is being removed. So, I guess, I'll just have to add a list of active cm ids, per device. I would have to allocate a chunk of memory and keep a cm id, an sdp connection and a list head in each. Ugh. Currently the CM handles this fine for the listening side. -- MST From mst at mellanox.co.il Thu Sep 1 11:50:42 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 1 Sep 2005 21:50:42 +0300 Subject: [openib-general] Re: [PATCH] memory leaks in ipoib, srp In-Reply-To: <43174C7A.8070900@ichips.intel.com> References: <20050901071301.GF1707@mellanox.co.il> <52ll2gahp7.fsf@cisco.com> <20050901183433.GA16664@mellanox.co.il> <43174C7A.8070900@ichips.intel.com> Message-ID: <20050901185042.GD16664@mellanox.co.il> Quoting r. Sean Hefty : > If we're trying to solve the issue of searching a list looking for user > context after a remove event occurs, can't we just modify the remove device > handling and pass in the user context? Is this the extent of your > proposal, or am I missing something else? Correct. This is the extent of the proposal.
To make it possible without complexity, I killed set user context and made add return the context. -- MST From mshefty at ichips.intel.com Thu Sep 1 11:57:51 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Sep 2005 11:57:51 -0700 Subject: [openib-general] Re: [RFC] change to ib_create_cm_id() In-Reply-To: <20050901184845.GC16664@mellanox.co.il> References: <4316302D.2030401@ichips.intel.com> <20050901065904.GC1707@mellanox.co.il> <43172BDF.4040207@ichips.intel.com> <20050901184845.GC16664@mellanox.co.il> Message-ID: <43174F2F.3070509@ichips.intel.com> Michael S. Tsirkin wrote: >>The proposal is to change the cm_id's so that they become associated with a >>specific device. Currently, they are not visibly associated with a device. > > I understand. But you said "this will help prevent the CM from returning > a cm_id associated with a device" which seems to correspond to > the existing API. Currently the CM returns a pointer to the device that a REQ or SIDR REQ is received on. This assists the user in allocating the QP used with the connection. > I was asking to keep the API that listens on all devices > for listeners, and require the device on active side only. Unfortunately, the problem is with listens. On the active side, the cm_id is implicitly bound to a device through the associated QP. > Thats what I was trying to say: e.g. for SDP, when a device is > added, we need to scan all listening connections and add a new cm id > for the device that is being added. Why do you need to scan all listening connections when a new device is added? > When a device is being removed, again scan all listening connections > and remove the cm id for the device that is being removed. Assuming that you have some sort of per device context, couldn't you just store the cm_id's with that structure? This would avoid any scans. As an aside, device removal should be uncommon, so I don't see efficiency as a high priority. 
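Sean's suggestion of keeping the listen cm_id in the client's per-device context can be sketched as follows. The types and fake_create_listen() are hypothetical stand-ins, not the ib_cm API. The sketch shows only that the add callback binds one listen id per device, and that the remove callback, given the same context back, frees it without scanning any connection list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the real IB types; this is not the ib_cm API. */
struct fake_device { const char *name; };
struct fake_cm_id  { struct fake_device *device; void *context; };

/* Per-device client context: the listen id for this device lives here. */
struct ulp_dev_ctx {
	struct fake_device *device;
	struct fake_cm_id *listen_id;
};

/* Stand-in for creating a cm_id bound to one device and starting a listen. */
static struct fake_cm_id *fake_create_listen(struct fake_device *dev, void *context)
{
	struct fake_cm_id *id = malloc(sizeof(*id));

	if (id) {
		id->device = dev;
		id->context = context;
	}
	return id;
}

/* "add" callback: allocate the context and bind one listen id to the device. */
static struct ulp_dev_ctx *ulp_add_one(struct fake_device *dev)
{
	struct ulp_dev_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;
	ctx->device = dev;
	ctx->listen_id = fake_create_listen(dev, ctx);
	if (!ctx->listen_id) {
		free(ctx);
		return NULL;
	}
	return ctx;
}

/* "remove" callback: the context comes back directly, so nothing is scanned. */
static void ulp_remove_one(struct ulp_dev_ctx *ctx)
{
	assert(ctx != NULL);
	free(ctx->listen_id); /* a real ULP would destroy the cm_id here */
	free(ctx);
}
```

Storing the owning context in the id's context field also lets event handlers reach the per-device state directly. The uDAPL patch above makes the same move with `event->cm_id->context = new_conn`.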
> So, I guess, I'll just have to add a list of active cm ids, per device. > I would have to allocate chunk of memory and keep > a cm id, and sdp connection and the list head on each. Ugh. > Currently cm handles this fine for the listening side. Without changes to the CM API, the device pointer returned in a REQ may be invalid. This pushes validation of the pointer up to the ULP which would need to be verified for every connection. - Sean From jlentini at netapp.com Thu Sep 1 12:03:33 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 1 Sep 2005 15:03:33 -0400 (EDT) Subject: [openib-general] [mstflint] firmware upgrade instructions Message-ID: Hi Michael, I'm guessing that you are the maintainer of mstflint. Two questions: What is the difference between mstflint and tvflash? Using mstflint, how can the firmware located on the Mellanox website: http://www.mellanox.com/products/firmware.html be used to upgrade an HCA? I tried # /sbin/lspci -d 15b3:5a44 02:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1) # ./mstflint -d 02:00.0 -i fw-23108-rel-3_3_3/fw-23108-a1-rel.mlx b *** ERROR *** Image file open failed: Image size should be 4-bytes aligned. I'll add the answers to the Wiki for posterity. james From halr at voltaire.com Thu Sep 1 12:06:06 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Sep 2005 15:06:06 -0400 Subject: [openib-general] Re: at won't compile with gcc-2.95 In-Reply-To: References: <52slwo91lm.fsf@cisco.com> <1125595270.4398.215.camel@hal.voltaire.com> <527je08zi2.fsf@cisco.com> Message-ID: <1125601281.4398.489.camel@hal.voltaire.com> Hi James, On Thu, 2005-09-01 at 14:06, James Lentini wrote: > I think you want > > #define WARN(fmt, ...) \ > printk("ib_at: %s: " fmt "\n", __FUNCTION__, __VA_ARGS__) > > (note that I've removed the ";") but I don't have gcc 2.95 to test it. Did this work for you ? 
I get: drivers/infiniband/core/at_priv.h:140:78: warning: __VA_ARGS__ can only appear in the expansion of a C99 variadic macro followed by other errors as a result of this. -- Hal From halr at voltaire.com Thu Sep 1 12:13:49 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Sep 2005 15:13:49 -0400 Subject: [openib-general] [mstflint] firmware upgrade instructions In-Reply-To: References: Message-ID: <1125601877.4398.523.camel@hal.voltaire.com> On Thu, 2005-09-01 at 15:03, James Lentini wrote: > What is the difference between mstflint and tvflash? Note that tvflash is not being supported although it may work. There was an earlier thread on this. -- Hal From mshefty at ichips.intel.com Thu Sep 1 12:27:09 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Sep 2005 12:27:09 -0700 Subject: [openib-general] add guid to struct ib_device Message-ID: <4317560D.40200@ichips.intel.com> Is there any objection to adding the node_guid to struct ib_device? The CM queries for this, and it looks like SRP does too. To support per device listens from userspace, I was considering adding the same functionality to uCM as well. I didn't see where RNICs have this concept; although, the spec does allow for vendor specific data. - Sean From jlentini at netapp.com Thu Sep 1 13:06:19 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 1 Sep 2005 16:06:19 -0400 (EDT) Subject: [openib-general] Re: at won't compile with gcc-2.95 In-Reply-To: <1125601281.4398.489.camel@hal.voltaire.com> References: <52slwo91lm.fsf@cisco.com> <1125595270.4398.215.camel@hal.voltaire.com> <527je08zi2.fsf@cisco.com> <1125601281.4398.489.camel@hal.voltaire.com> Message-ID: On Thu, 1 Sep 2005, Hal Rosenstock wrote: > Hi James, > > On Thu, 2005-09-01 at 14:06, James Lentini wrote: > > I think you want > > > > #define WARN(fmt, ...) \ > > printk("ib_at: %s: " fmt "\n", __FUNCTION__, __VA_ARGS__) > > > > (note that I've removed the ";") but I don't have gcc 2.95 to test it. 
> > Did this work for you ? As I said, I don't have gcc 2.95 installed so I didn't test it. We use a similar definition in uDAPL, something along the lines of #define foo(...) bar(__VA_ARGS__) I bet the initial fmt parameter is confusing it. In kDAPL, the debug print function is defined as a static inline function when debugging is on and an empty macro when it is off. That might be easier than fighting with the C preprocessor. > I get: > drivers/infiniband/core/at_priv.h:140:78: warning: __VA_ARGS__ can only > appear in the expansion of a C99 variadic macro > followed by other errors as a result of this. > > -- Hal > From shubbell at dbresearch.net Thu Sep 1 12:46:31 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Thu, 01 Sep 2005 15:46:31 -0400 Subject: [openib-general] Connectivity Message-ID: <43175A97.3000300@dbresearch.net> Hello, I have a few questions that I would like to ask, and possibly get some help with. Again, I am trying to get my ib0 interface up and running using opensm and IPoIB. I have this working on my local host, but when I want to go over the network to one of the other nodes, this does not work. So I would like to ask the following questions to assist me in debugging: 1) How does one find out what the LID is for a given node and is LID 0 the localhost LID? 2) How does one find out what the guid is for a given node? 3) What are the best practices on debugging either IPoIB and/or opensm?
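For questions 1 and 2, the values can also be read programmatically. The following C sketch assumes the sysfs layout /sys/class/infiniband/<device>/ports/<port>/lid and /sys/class/infiniband/<device>/node_guid. It also assumes the LID is printed in hex (e.g. "0x5") and the GUID as four colon-separated 16-bit hex groups. The helper names read_line(), parse_lid() and parse_node_guid() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read the first line of a small text file (e.g. a sysfs attribute). */
static int read_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(buf != NULL && len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0'; /* drop the trailing newline */
	return 0;
}

/* Parse a LID printed in hex, e.g. "0x5" (assumed sysfs format). */
static int parse_lid(const char *s, uint16_t *lid)
{
	char *end;
	unsigned long v = strtoul(s, &end, 16);

	if (end == s || *end != '\0' || v > 0xffff)
		return -1;
	*lid = (uint16_t)v;
	return 0;
}

/* Parse a node GUID printed as four 16-bit hex groups,
 * e.g. "0002:c902:0020:b4f8" (assumed sysfs format). */
static int parse_node_guid(const char *s, uint64_t *guid)
{
	unsigned int g[4];

	if (sscanf(s, "%4x:%4x:%4x:%4x", &g[0], &g[1], &g[2], &g[3]) != 4)
		return -1;
	*guid = ((uint64_t)g[0] << 48) | ((uint64_t)g[1] << 32) |
		((uint64_t)g[2] << 16) | (uint64_t)g[3];
	return 0;
}
```

For example, call read_line("/sys/class/infiniband/mthca0/ports/1/lid", buf, sizeof(buf)) and then parse_lid(buf, &lid). The device name mthca0 and port 1 are assumptions. LID 0 is reserved in InfiniBand. A port reporting 0 has most likely not yet been assigned a LID by the subnet manager; it does not denote a "localhost" LID.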
Thanks, Sean From iod00d at hp.com Thu Sep 1 13:49:07 2005 From: iod00d at hp.com (Grant Grundler) Date: Thu, 1 Sep 2005 13:49:07 -0700 Subject: [openib-general] RDMA Read performance In-Reply-To: <43060911.3030904@ichips.intel.com> References: <20050818222606.GM15077@esmail.cup.hp.com> <34441d4cc93fdf624a1dd42263b89429@lanl.gov> <43060911.3030904@ichips.intel.com> Message-ID: <20050901204907.GA7118@esmail.cup.hp.com> On Fri, Aug 19, 2005 at 09:30:09AM -0700, Sean Hefty wrote: > Galen Shipman wrote: > >Using: ibv_rc_pingpong --size=1048576 I am seeing 942 Mbytes per sec. > > > >As I said previously, we have an internal application that sees ~950 > >MBytes per sec using RDMA Write. It looks like ibv_rc_pingpong is using > >send receive and not RDMA. Perhaps someone (Roland) has an RDMA Read > >test they can point me to? > > I'm not aware of an RDMA read test. Try increasing max_rd_atomic and > max_dest_rd_atomic to the maximum values supported by the HCA. Galen, Have you had a chance to try Sean's suggestion? I want to but only have lame excuses (time) for not doing so. thanks, grant From mshefty at ichips.intel.com Thu Sep 1 13:45:53 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Sep 2005 13:45:53 -0700 Subject: [openib-general] Connectivity In-Reply-To: <43175A97.3000300@dbresearch.net> References: <43175A97.3000300@dbresearch.net> Message-ID: <43176881.6050302@ichips.intel.com> Sean Hubbell wrote: > 1) How does one find out what the LID is for a given node and is LID 0 > the localhost LID? Easiest way is to look in /sys/class/infiniband/mthca0/ports/1/lid. The actual path may vary based on your system config. To do it programmatically, you'll need to perform SA queries. > 2) How does one find out what the guid is for a given node? 
cat /sys/class/infiniband/mthca0/node_guid - Sean From halr at voltaire.com Thu Sep 1 13:42:40 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Sep 2005 16:42:40 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43175A97.3000300@dbresearch.net> References: <43175A97.3000300@dbresearch.net> Message-ID: <1125607360.4398.842.camel@hal.voltaire.com> On Thu, 2005-09-01 at 15:46, Sean Hubbell wrote: > Hello, > > I have a few question that I would like to ask and possible get some > help. Again, I am trying to get my ib0 interface up and running using > opensm and IPoIB. I have this working on my local host, but when I want > to go over the network to one of the other nodes, this does not work. So > I would like to ask the following question to assist me in debugging: > > 1) How does one find out what the LID is for a given node and is LID 0 > the localhost LID? ibstat or ibstatus > 2) How does one find out what the guid is for a given node? Same > 3) What are the best practices on debugging either IPoIB and/or opensm? There's a FAQ for IPoIB (http://www.openib.org/docs/ipoib_faq.txt). Running opensm with -V shows more of what is going on. The log (/var/log/osm.log) usually shows what is going on. -- Hal From halr at voltaire.com Thu Sep 1 13:48:07 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Sep 2005 16:48:07 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43175A97.3000300@dbresearch.net> References: <43175A97.3000300@dbresearch.net> Message-ID: <1125607390.4398.851.camel@hal.voltaire.com> Hi Sean, On Thu, 2005-09-01 at 15:46, Sean Hubbell wrote: > Hello, > > I have a few question that I would like to ask and possible get some > help. Again, I am trying to get my ib0 interface up and running using > opensm and IPoIB. I have this working on my local host, but when I want > to go over the network to one of the other nodes, this does not work. 
So > I would like to ask the following question to assist me in debugging: > > 1) How does one find out what the LID is for a given node and is LID 0 > the localhost LID? ibstat or ibstatus > 2) How does one find out what the guid is for a given node? Same > 3) What are the best practices on debugging either IPoIB and/or opensm? There's a FAQ for IPoIB (http://www.openib.org/docs/ipoib_faq.txt). Running opensm with -V shows more of what is going on. The log (/var/log/osm.log) usually shows what is going on. -- Hal From shubbell at dbresearch.net Thu Sep 1 13:07:24 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Thu, 01 Sep 2005 16:07:24 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125607390.4398.851.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> Message-ID: <43175F7C.3020309@dbresearch.net> Thank you Sean and Hal, Would you happen to know how the lid is generated (static or dynamically)? Does this depend on when opensm is started? Sean From halr at voltaire.com Thu Sep 1 14:11:42 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Sep 2005 17:11:42 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43175F7C.3020309@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> Message-ID: <1125609102.4398.990.camel@hal.voltaire.com> Hi again Sean, On Thu, 2005-09-01 at 16:07, Sean Hubbell wrote: > Thank you Sean and Hal, Would you happen to know how the lid is > generated (static or dynamically)? Does this depend on when opensm is > started? Not exactly sure what you mean by static or dynamic here as there are a couple of interpretations. In any case, the current OpenIB OpenSM is dynamic as to LIDs although it may preserve a previous setting if the SM goes down and back up. The 1.8.0 OpenSM will store these persistently and I think restore them. 
That merge is in process now but you could have this with Mellanox Gold 1.8.0. On a Set of PortInfo, the OpenIB IPoIB will reregister its multicast group(s) for IPoIB. Are the ports active or not ? -- Hal From halr at voltaire.com Thu Sep 1 14:19:51 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Sep 2005 17:19:51 -0400 Subject: [openib-general] OpenSM 1.8.0 Merge Status and Operational Issue Message-ID: <1125609366.4398.1014.camel@hal.voltaire.com> Hi, It's the good news... bad news... I've got the merged 1.8.0 OpenSM up and running. I have a number of questions (which I'll send separately) but the one main problem I have now is the following: I have a 4x HCA port (1x/4x LinkWidthEnable and Supported) connected via a 1x analyzer connected to a switch (so is 1x LinkWidthActive). OpenSM does not seem to want to bring this port up. It tries once and gives up until the physical link is cycled (cable pull and reinsertion). It does work running over a 4x link with 4x neighbor ports. I see the following:
SM side                            HCA side
Set PortInfo (NoStateChange) ->
                                   <- GetResp PortInfo (Init)
Set PortInfo (Armed) ->
                                   <- GetResp PortInfo (Armed)
Set PortInfo (Active) ->
                                   <- GetResp PortInfo (Init)
I didn't track the settings on the switch side neighbor port but assume they mirror this. OpenSM just seems to never try to bring this port active again without some external stimulus. That's the secondary issue. Can you try this ? -- Hal From mshefty at ichips.intel.com Thu Sep 1 17:41:07 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 01 Sep 2005 17:41:07 -0700 Subject: [openib-general] SDP connection context Message-ID: <43179FA3.5010209@ichips.intel.com> While updating the SDP code to use a per device cm_id, I noticed that it sets the cm_id->context to a hash table entry value, rather than using a pointer directly to the connection object. Is this necessary? Also, as part of the changes I modified the CM event handling code to be event rather than state driven.
I still need to test my changes, but hope to have a patch available later tomorrow. - Sean From eitan at mellanox.co.il Thu Sep 1 23:04:43 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 02 Sep 2005 09:04:43 +0300 Subject: [openib-general] Re: OpenSM Build from osm level In-Reply-To: <1125597675.4398.336.camel@hal.voltaire.com> References: <1125592374.4398.149.camel@hal.voltaire.com> <431730D1.3080402@mellanox.co.il> <1125597675.4398.336.camel@hal.voltaire.com> Message-ID: <4317EB7B.7030205@mellanox.co.il> Hal Rosenstock wrote: > On Thu, 2005-09-01 at 12:48, Eitan Zahavi wrote: > >>Hal Rosenstock wrote: >> >>>Hi Eitan, >>> >>>When I run autogen.sh from the osm level, libvendor now works (but >>>opensm and osmtest don't). >> >>Does the build fail or you only refer to the warning messages below? >>I was able to start fresh and complete a build after these warnings. >> >>The cause of the warning is the need to override the LINK rule such that >>the simulator build will use g++ for linking ... > > > It was unclear whether it was a warning or an error. Can this be > eliminated ? No, this is automake code that sends this message. I was looking for "official" ways to replace the "LINK" rule statement but with no success. I could, however, add a "success" message at the end... Would it help? > > If I ignore that and continue, it configures and builds. > > -- Hal > > >>EZ >> >>>-- Hal >>>
>>>Visiting libvendor/autogen.sh
>>>| + aclocal -I config -I ../config
>>>| + libtoolize --force --copy
>>>| Putting files in AC_CONFIG_AUX_DIR, `config'.
>>>| + autoheader
>>>| + automake --foreign --add-missing --copy
>>>| + autoconf
>>>Visiting opensm/autogen.sh
>>>| + aclocal -I config -I ../config
>>>| + libtoolize --force --copy
>>>| Putting files in AC_CONFIG_AUX_DIR, `config'.
>>>| + autoheader
>>>| + automake --foreign --add-missing --copy
>>>| automake: LINK was already defined in condition OSMV_SIM, which is included in condition TRUE ...
>>>| Makefile.am:72: ...
`LINK' previously defined here
>>>| + autoconf
>>>Visiting osmtest/autogen.sh
>>>| + aclocal -I config -I ../config
>>>| + libtoolize --force --copy
>>>| Putting files in AC_CONFIG_AUX_DIR, `config'.
>>>| + autoheader
>>>| + automake --foreign --add-missing --copy
>>>| automake: LINK was already defined in condition OSMV_SIM, which is included in condition TRUE ...
>>>| Makefile.am:24: ... `LINK' previously defined here
>>>| + autoconf
>> >> From eitan at mellanox.co.il Thu Sep 1 23:11:02 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 02 Sep 2005 09:11:02 +0300 Subject: [openib-general] Re: OpenSM 1.8.0 Merge Status and Operational Issue In-Reply-To: <1125609366.4398.1014.camel@hal.voltaire.com> References: <1125609366.4398.1014.camel@hal.voltaire.com> Message-ID: <4317ECF6.9070507@mellanox.co.il> Hal Rosenstock wrote: > Hi, > > It's the good news... bad news... > > I've got the merged 1.8.0 OpenSM up and running. I have a number of > questions (which I'll send separately) but the one main problem I have > now is the following: That was quick. I guess Yael's work on merging 1.8.0 against the trunk did help. > > I have a 4x HCA port (1x/4x LinkWidthEnable and Supported) connected via > a 1x analyzer connected to a switch (so is 1x LinkWidthActive). > OpenSM does not seem to want to bring this port up. It tries once and > gives up until the physical link is cycled (cable pull and reinsertion). > It does work running over a 4x link with 4x neighbor ports. Can you try this with the pre-merge build? Does it work? Just to make sure it is a new bug. Also, with 1.8.0 you could force the SM to do a heavy sweep by sending it kill -HUP. Please provide a detailed (-V) log file so we can see what is happening too. > > > Can you try this ? We will on Sunday/Monday. I have a feeling I ran into this once. I'm not sure it isn't an analyzer issue. Could you please take the analyzer out and retry?
> > -- Hal From shubbell at dbresearch.net Fri Sep 2 04:57:54 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 07:57:54 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125609102.4398.990.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> Message-ID: <43183E42.8040404@dbresearch.net> Hal Rosenstock wrote: >Hi again Sean, > >On Thu, 2005-09-01 at 16:07, Sean Hubbell wrote: > > >>Thank you Sean and Hal, Would you happen to know how the lid is >>generated (static or dynamically)? Does this depend on when opensm is >>started? >> >> > >Not exactly sure what you mean by static or dynamic here as there are a >couple of interpretations. In any case, the current OpenIB OpenSM is >dynamic as to LIDs although it may preserve a previous setting if the SM >goes down and back up. The 1.8.0 OpenSM will store these persistently >and I think restore them. That merge is in process now but you could >have this will Mellanox Gold 1.8.0. > >On a Set of PortInfo, the OpenIB IPoIB will reregister its multicast >group(s) for IPoIB. Are the ports active or not ? > >-- Hal > > > > Yes, the port (port 1) that I am using is ACTIVE. Sean From halr at voltaire.com Fri Sep 2 06:10:59 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 09:10:59 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43183E42.8040404@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> Message-ID: <1125666659.4398.5213.camel@hal.voltaire.com> On Fri, 2005-09-02 at 07:57, Sean Hubbell wrote: > >On a Set of PortInfo, the OpenIB IPoIB will reregister its multicast > >group(s) for IPoIB. Are the ports active or not ? 
> > > Yes, the port (port 1) that I am using is ACTIVE. and all ports in path to SM (including SM port) are also active ? If so, the osm.log might be instructive. -- Hal From shubbell at dbresearch.net Fri Sep 2 05:28:45 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 08:28:45 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125666659.4398.5213.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> Message-ID: <4318457D.10308@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-02 at 07:57, Sean Hubbell wrote: > > >>>On a Set of PortInfo, the OpenIB IPoIB will reregister its multicast >>>group(s) for IPoIB. Are the ports active or not ? >>> >>> >>> >>Yes, the port (port 1) that I am using is ACTIVE. >> >> > >and all ports in path to SM (including SM port) are also active ? If so, >the osm.log might be instructive. > >-- Hal > > > > Again, thanks Hal. Yes, I can perform an ibping on all of the nodes so connectivity and the ports inbetween are up. I am almost positive now that this has something to do with the IPoIB. I am going to try to ping each node and then look at the arp table. Do you know of anything I can do to look specifically at the IPoIB "exchange". 
Sean From shubbell at dbresearch.net Fri Sep 2 05:38:23 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 08:38:23 -0400 Subject: [openib-general] Connectivity In-Reply-To: <4318457D.10308@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> Message-ID: <431847BF.5080507@dbresearch.net> Sean Hubbell wrote: > Hal Rosenstock wrote: > >> On Fri, 2005-09-02 at 07:57, Sean Hubbell wrote: >> >> >>>> On a Set of PortInfo, the OpenIB IPoIB will reregister its multicast >>>> group(s) for IPoIB. Are the ports active or not ? >>>> >>>> >>> >>> Yes, the port (port 1) that I am using is ACTIVE. >>> >> >> >> and all ports in path to SM (including SM port) are also active ? If so, >> the osm.log might be instructive. >> >> -- Hal >> >> >> >> > Again, thanks Hal. Yes, I can perform an ibping on all of the nodes so > connectivity and the ports inbetween are up. I am almost positive now > that this has something to do with the IPoIB. I am going to try to > ping each node and then look at the arp table. Do you know of anything > I can do to look specifically at the IPoIB "exchange". > > Sean > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > I looked at the arp table and the ib0 device is not listed. The state is marked as ACTIVE for port 1. Does this mean that a connection to opensm has been established and not necessarily that this interface has been enabled for use (meaning the connection to opensm with over eth0 instead of ib0)? 
Sean From halr at voltaire.com Fri Sep 2 06:33:42 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 09:33:42 -0400 Subject: [openib-general] Connectivity In-Reply-To: <4318457D.10308@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> Message-ID: <1125668022.4398.5341.camel@hal.voltaire.com> On Fri, 2005-09-02 at 08:28, Sean Hubbell wrote: > Again, thanks Hal. Yes, I can perform an ibping on all of the nodes so > connectivity and the ports inbetween are up. I am almost positive now > that this has something to do with the IPoIB. I am going to try to ping > each node and then look at the arp table. Do you know of anything I can > do to look specifically at the IPoIB "exchange". If ibping works, UD unicast is working (and would work for that part of IPoIB). What I suspect is not working is multicast. I suspect some issue with the IPoIB broadcast group. So can you comment on the topology and provide an OpenSM log when run with verbose ? [Also can you down and then up all the ib interfaces and see if connectivity is restored. Also, is the SM running on a node also running IPoIB ?] If not, you can debug this using the following:
1. Using ibroute, you can display the multicast tables in the switches. Using ibtracert, you can trace the route of a multicast group.
   Multicast examples:
     ibroute -M 4                  # dump all non empty mlids of switch with lid 4
     ibroute -M 4 0xc010 0xc020    # same, but with range
     ibroute -M -n 4               # simple dump format
   Multicast example:
     ibtracert -m 0xc000 4 16      # show multicast path of mlid 0xc000 between lids 4 and 16
2. There are 2 levels of debug tracing in IPoIB.
You can enable these in the build with CONFIG_INFINIBAND_IPOIB_DEBUG and CONFIG_INFINIBAND_IPOIB_DEBUG_DATA -- Hal From halr at voltaire.com Fri Sep 2 06:38:22 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 09:38:22 -0400 Subject: [openib-general] Connectivity In-Reply-To: <431847BF.5080507@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <431847BF.5080507@dbresearch.net> Message-ID: <1125668301.4398.5367.camel@hal.voltaire.com> On Fri, 2005-09-02 at 08:38, Sean Hubbell wrote: > I looked at the arp table and the ib0 device is not listed. The state is > marked as ACTIVE for port 1. Does this mean that a connection to opensm > has been established and not necessarily that this interface has been > enabled for use (meaning the connection to opensm with over eth0 instead > of ib0)? If there are no ARP entries on an ib interface, I don't think it would show up there. If you ping an explicit address on that ib subnet, you would see something like: ? (192.168.0.2) at on ib0 I presume you've modprobe'd ib_ipoib and ifconfig'd ib to get it up and running (ifconfig -a shows ib in an UP state). 
-- Hal From shubbell at dbresearch.net Fri Sep 2 05:59:02 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 08:59:02 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125668022.4398.5341.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> Message-ID: <43184C96.8020304@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-02 at 08:28, Sean Hubbell wrote: > > >>Again, thanks Hal. Yes, I can perform an ibping on all of the nodes so >>connectivity and the ports inbetween are up. I am almost positive now >>that this has something to do with the IPoIB. I am going to try to ping >>each node and then look at the arp table. Do you know of anything I can >>do to look specifically at the IPoIB "exchange". >> >> > >If ibping works, UD unicast is working (and would work for that part of >IPoIB). What I suspect is not working is multicast. I suspect some issue >with the IPoIB broadcast group. So can you comment on the topology and >provide an OpenSM log when run with verbose ? [Also can you down and >then up all the ib interfaces and see if connectivity is restored. >Also, is the SM running on a node also running IPoIB ?] > >If not, you can debug this using the following: > > > The current topology of the system is 4 Dell PowerEdge 2.8 GHz machines with hyperthreading. There is also another Dell, and eventually there will be 48 more nodes as blades in 4 other chassis. There are 12 InfiniBand switches; basically, three of them route to the other switches. The log file I cannot send. I can go through it and answer any questions that you have. I realize this is stupid, but this is well above me.
I am not sure about the Subnet Manager. How can I tell where it is running? >1. Using ibroute, you can display the multicast tables in the switches. >Using ibtracert you can trace the route of a multicast group. > > Multicast examples: > ibroute -M 4 # dump all non empty mlids of switch with lid 4 > ibroute -M 4 0xc010 0xc020 # same, but with range > ibroute -M -n 4 # simple dump format > > Multicast example: > ibtracert -m 0xc000 4 16 # show multicast path of mlid 0xc000 between lids 4 and 16 > > > I will try these. >2. There are 2 levels of debug tracing in IPoIB. You can enable these in >the build with CONFIG_INFINIBAND_IPOIB_DEBUG and >CONFIG_INFINIBAND_IPOIB_DEBUG_DATA > > Sorry for my ignorance, but how would one go about doing this? Sean From shubbell at dbresearch.net Fri Sep 2 06:01:30 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 09:01:30 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125668301.4398.5367.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <431847BF.5080507@dbresearch.net> <1125668301.4398.5367.camel@hal.voltaire.com> Message-ID: <43184D2A.7060703@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-02 at 08:38, Sean Hubbell wrote: > > >>I looked at the arp table and the ib0 device is not listed. The state is >>marked as ACTIVE for port 1. Does this mean that a connection to opensm >>has been established and not necessarily that this interface has been >>enabled for use (meaning the connection to opensm with over eth0 instead >>of ib0)? >> >> > >If there are no ARP entries on an ib interface, I don't think it >would show up there. If you ping an explicit address on that ib >subnet, you would see something like: >? 
(192.168.0.2) at on ib0 > > > Yes, sorry I should have mentioned I pinged the address before I ran arp and received a reply. >I presume you've modprobe'd ib_ipoib and ifconfig'd ib to get it up >and running (ifconfig -a shows ib in an UP state). > > Yes, I have checked the modules for each node and all are there. This was actually the first thing I looked at as I thought that IPoIB was not there. Yes, each ib0 is up supporting multicast. Sean From halr at voltaire.com Fri Sep 2 06:59:54 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 09:59:54 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43184D2A.7060703@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <431847BF.5080507@dbresearch.net> <1125668301.4398.5367.camel@hal.voltaire.com> <43184D2A.7060703@dbresearch.net> Message-ID: <1125669594.4398.5481.camel@hal.voltaire.com> On Fri, 2005-09-02 at 09:01, Sean Hubbell wrote: > Yes, each ib0 is up supporting multicast. That multicast indicates IP multicast (which is different from IB multicast). IB multicast is needed for IP multicast but also IP unicast to emulate the broadcast network on a LAN. 
I was asking whether ifconfig showed UP:
ib0   Link encap:UNSPEC HWaddr 00-00-04-05-FE-80-00-00-00-00-00-00-00-00-00-00
      inet addr:192.168.0.11 Bcast:192.168.0.255 Mask:255.255.255.0
      inet6 addr: fe80::208:f104:396:55a/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
-- Hal From halr at voltaire.com Fri Sep 2 07:05:18 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 10:05:18 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43184C96.8020304@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> Message-ID: <1125669917.4398.5519.camel@hal.voltaire.com> On Fri, 2005-09-02 at 08:59, Sean Hubbell wrote: > The log file I cannot send. I can go through it and answer any questions > that you have. I realize this is stupid, but this is well above me. It will just take longer for us to converge on what is going on, but this is not the first time we have been in this situation... > I am not sure about the Subnet Manager. How can I tell where it is running? So I take it you didn't start up the SM... You can tell that by the SM LID of the end nodes. You can find that out via:
/sys/class/infiniband/mthca0/ports/1/sm_lid
ibstat
ibstatus
You can then use the diag tool smpquery to display more info about the SM node: smpquery nodeinfo and use ibnetdiscover to correlate the GUID of the node to the topology to determine where it is running. > >2. There are 2 levels of debug tracing in IPoIB. You can enable these in > >the build with CONFIG_INFINIBAND_IPOIB_DEBUG and > >CONFIG_INFINIBAND_IPOIB_DEBUG_DATA > > > > > Sorry for my ignorance, but how would one go about doing this?
Those are in the kernel config and then the kernel would need rebuilding. -- Hal From shubbell at dbresearch.net Fri Sep 2 06:19:13 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 09:19:13 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125669594.4398.5481.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <431847BF.5080507@dbresearch.net> <1125668301.4398.5367.camel@hal.voltaire.com> <43184D2A.7060703@dbresearch.net> <1125669594.4398.5481.camel@hal.voltaire.com> Message-ID: <43185151.5090109@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-02 at 09:01, Sean Hubbell wrote: > > >>Yes, each ib0 is up supporting multicast. >> >> > >That multicast indicates IP multicast (which is different from IB >multicast). IB multicast is needed for IP multicast but also IP unicast >to emulate the broadcast network on a LAN. 
> >I was asking about if ifconfig showed UP:
>ib0 Link encap:UNSPEC HWaddr 00-00-04-05-FE-80-00-00-00-00-00-00-00-00-00-00
>     inet addr:192.168.0.11 Bcast:192.168.0.255 Mask:255.255.255.0
>     inet6 addr: fe80::208:f104:396:55a/64 Scope:Link
>     UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
> >-- Hal > > > Here are the results:
>-bash-3.00# ifconfig ib0
>ib0 Link encap:InfiniBand HWaddr 00:00:04:04:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
>     inet addr:192.168.1.4 Bcast:192.168.1.255 Mask:255.255.0.0
>     UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
>     RX packets:118699 errors:0 dropped:0 overruns:0 frame:0
>     TX packets:60208 errors:0 dropped:0 overruns:0 carrier:0
>     collisions:0 txqueuelen:128
From shubbell at dbresearch.net Fri Sep 2 06:45:25 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 09:45:25 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125669917.4398.5519.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> Message-ID: <43185775.4080506@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-02 at 08:59, Sean Hubbell wrote: > > >>The log file I cannot send. I can go through it and answer any questions >>that you have. I realize this is stupid, but this is well above me. >> >> > >It will just take longer for us to converge on what is going on but this >is not the first time being in this situation... > > > >>I am not sure about the Subnet Manager. How can I tell where it is running? >> >> > >So I take it you didn't start up the SM... > >You can tell that by the SM LID of the end nodes. You can find that out
You can find that out >via: >/sys/class/infiniband/mthca0/ports/1/sm_lid >ibstat >ibstatus > >You can then use the diag tool smpquery to display more info about the >SM node: smpquery nodeinfo >and use ibnetdiscover to correlate the GUID of the node to the topology >to determine where it is running. > > > >>>2. There are 2 levels of debug tracing in IPoIB. You can enable these in >>>the build with CONFIG_INFINIBAND_IPOIB_DEBUG and >>>CONFIG_INFINIBAND_IPOIB_DEBUG_DATA >>> >>> >>> >>> >>Sorry for my ignorance, but how would one go about doing this? >> >> > >Those are in the kernel config and then the kernel would need >rebuilding. > >-- Hal > > > > Just to make sure I understand. I run smpquery nodeinfo 0x1. I get the GUID from that command and then run ibnetdiscover and see if that GUID exists in the results? If that is correct, the GUID does not exist, but when I run ibnetdiscover I see switchguids so I am thinking I am not doing this correctly??? Sean From halr at voltaire.com Fri Sep 2 07:46:59 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 10:46:59 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43185775.4080506@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> Message-ID: <1125672418.4398.5735.camel@hal.voltaire.com> On Fri, 2005-09-02 at 09:45, Sean Hubbell wrote: > Just to make sure I understand. I run smpquery nodeinfo 0x1. Assuming SMLID is 1 > I get the GUID from that command and then run ibnetdiscover and see if > that GUID exists in the results? 
> If that is correct, the GUID does not exist, but when I run > ibnetdiscover I see switchguids so I am thinking I am not doing this > correctly??? Sorry; I didn't actually try this before so here are more precise directions: smpquery nodeinfo should give you the Guid and then match that to ibnetdiscover's output which would then show which HCA the SM is running on. The NodeInfo PortGuid will tell you which port. -- Hal From shubbell at dbresearch.net Fri Sep 2 07:00:52 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 10:00:52 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43185775.4080506@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> Message-ID: <43185B14.6020205@dbresearch.net> I got the results of the osm.log (ended up being ok to send it): I ran tail -f /var/log/osm.log and this was the error that I kept getting:
Sep 02 09:49:46 [42FFF960] -> osm_vendor_send: RMPP 0 length 256
Sep 02 09:49:46 [42FFF960] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0001 GID:0xfe80000000000000,0x0005ad000003d269
Sep 02 09:49:46 [42FFF960] -> osm_vendor_send: RMPP 0 length 112
Sep 02 09:49:46 [42FFF960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet,scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7.
Sep 02 09:49:46 [42FFF960] -> osm_vendor_send: RMPP 0 length 256
Are the error codes defined somewhere?
Sean From shubbell at dbresearch.net Fri Sep 2 07:11:59 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 10:11:59 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125672418.4398.5735.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> Message-ID: <43185DAF.3070809@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-02 at 09:45, Sean Hubbell wrote: > > >>Just to make sure I understand. I run smpquery nodeinfo 0x1. >> >> > >Assuming SMLID is 1 > > > >>I get the GUID from that command and then run ibnetdiscover and see if >>that GUID exists in the results? >>If that is correct, the GUID does not exist, but when I run >>ibnetdiscover I see switchguids so I am thinking I am not doing this >>correctly??? >> >> > >Sorry; I didn't actually try this before so here are more precise >directions: > >smpquery nodeinfo should give you the Guid and then match that to >ibnetdiscover's output which would then show which HCA the SM is running >on. The NodeInfo PortGuid will tell you which port. > >-- Hal > > > > OK, the guid from the smpquery command does not exist in the output of ibnetdiscover. 
Sean From halr at voltaire.com Fri Sep 2 08:09:29 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 11:09:29 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43185DAF.3070809@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> Message-ID: <1125673768.4398.5857.camel@hal.voltaire.com> On Fri, 2005-09-02 at 10:11, Sean Hubbell wrote: > OK, the guid from the smpquery command does not exist in the output of > ibnetdiscover. OK but not good :-( Is sminfo consistent with the GUID for the SM obtained by previous method ? Might you be running a (switch) embedded SM ? 
-- Hal From shubbell at dbresearch.net Fri Sep 2 07:31:37 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 10:31:37 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125673768.4398.5857.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> <1125673768.4398.5857.camel@hal.voltaire.com> Message-ID: <43186249.7020902@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-02 at 10:11, Sean Hubbell wrote: > > >>OK, the guid from the smpquery command does not exist in the output of >>ibnetdiscover. >> >> > >OK but not good :-( > >Is sminfo consistent with the GUID for the SM obtained by >previous method ? > > > >Might you be running a (switch) embedded SM ? > > sminfo 0x1 returns a GUID that matches the PortGUID from smpquery nodeinfo 0x1, but not its (node) Guid... I do not know anything about the SM; I followed the directions on the wiki when I installed. How would one check?
Sean From halr at voltaire.com Fri Sep 2 08:28:48 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 11:28:48 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43186249.7020902@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> <1125673768.4398.5857.camel@hal.voltaire.com> <43186249.7020902@dbresearch.net> Message-ID: <1125674928.4398.5977.camel@hal.voltaire.com> On Fri, 2005-09-02 at 10:31, Sean Hubbell wrote: > >Might you be running a (switch) embedded SM ? > > > > > > sminfo 0x1 returns a guid that matches the PortGUID for the smpquery > nodeinfo 0x1 but not the guid... > I do not know anything about the sm, I followed the directions on the > wiki when I installed. How would one check? Don't start the OpenIB OpenSM and see if the ports come to active and if the SM can be found.
-- Hal From halr at voltaire.com Fri Sep 2 08:33:52 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 11:33:52 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43185B14.6020205@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> Message-ID: <1125675232.4398.6009.camel@hal.voltaire.com> On Fri, 2005-09-02 at 10:00, Sean Hubbell wrote: > I got the results of the osm.log (ended up being ok to send it): > > I ran tail -f /var/log/osm.log and this was the error that I kept getting: > > Sep 02 09:49:46 [42FFF960] -> osm_vendor_send: RMPP 0 length 256 > Sep 02 09:49:46 [42FFF960] -> osm_report_notice: Reporting Generic > Notice type:3 num:66 from LID:0x0001 > GID:0xfe80000000000000,0x0005ad000003d269 > Sep 02 09:49:46 [42FFF960] -> osm_vendor_send: RMPP 0 length 112 > Sep 02 09:49:46 [42FFF960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: method = > SubnAdmSet,scope_state = 0x1, component mask = 0x0000000000010083, > expected comp mask = 0x00000000000130c7. > Sep 02 09:49:46 [42FFF960] -> osm_vendor_send: RMPP 0 length 256 That means that there is a join for a group which has not been created. Unfortunately the MC group is not indicated in the message. Are you running OpenSM with -V ? If not, please do so; more information should then be logged. Additionally, I would like to see all the MCMemberRecord exchanges from that log if you can share them. This in and of itself is not necessarily the problem. It depends on which group this is occurring on. The IP broadcast group is precreated so this should not occur on it.
It does occur on certain groups associated with IP routers like 224.0.0.2 (All routers on this subnet) as one common example which is "OK". > Are the error codes defined somewhere? In the code only. -- Hal From shubbell at dbresearch.net Fri Sep 2 07:56:06 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 10:56:06 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125674928.4398.5977.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> <1125673768.4398.5857.camel@hal.voltaire.com> <43186249.7020902@dbresearch.net> <1125674928.4398.5977.camel@hal.voltaire.com> Message-ID: <43186806.9040807@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-02 at 10:31, Sean Hubbell wrote: > > >>>Might you be running a (switch) embedded SM ? >>> >>> >>> >>> >>sminfo 0x1 returns a guid that matches the PortGUID for the smpquery >>nodeinfo 0x1 but not the guid... >>I do not know anything about the sm, I followed the directions on the >>wiki when I installed. How would one check? >> >> > >Don't start the OpenIB OpenSM and see if the ports come to active and if >the SM can be found. > >-- Hal > > > > OK, here are my results. On my master node, the one that typically runs opensm, states the port is INIT. The other nodes still say that they are ACTIVE.
Sean From halr at voltaire.com Fri Sep 2 08:53:00 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 11:53:00 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43186806.9040807@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> <1125673768.4398.5857.camel@hal.voltaire.com> <43186249.7020902@dbresearch.net> <1125674928.4398.5977.camel@hal.voltaire.com> <43186806.9040807@dbresearch.net> Message-ID: <1125676380.4398.6107.camel@hal.voltaire.com> On Fri, 2005-09-02 at 10:56, Sean Hubbell wrote: > OK, here are my results. On my master node, the one that typically runs > opensm, states the port is INIT. The other nodes still say that they are > ACTIVE. If the SM port is not active, any SA requests (multicast requests) will not be received by the SA (and hence not responded to) as they require the port to be active. That's the key issue: getting that port to ACTIVE. 
-- Hal From shubbell at dbresearch.net Fri Sep 2 08:05:29 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 11:05:29 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125675232.4398.6009.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> Message-ID: <43186A39.7000800@dbresearch.net> Attached is the log file after I restarted opensm. Sean -------------- next part -------------- A non-text attachment was scrubbed... Name: opensm050902.log Type: text/x-log Size: 46912 bytes Desc: not available URL: From shubbell at dbresearch.net Fri Sep 2 08:10:35 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 11:10:35 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125676380.4398.6107.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> <1125673768.4398.5857.camel@hal.voltaire.com> <43186249.7020902@dbresearch.net> <1125674928.4398.5977.camel@hal.voltaire.com> <43186806.9040807@dbresearch.net> <1125676380.4398.6107.camel@hal.voltaire.com> 
Message-ID: <43186B6B.4080605@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-02 at 10:56, Sean Hubbell wrote: > > >>OK, here are my results. On my master node, the one that typically runs >>opensm, states the port is INIT. The other nodes still say that they are >>ACTIVE. >> >> > >If the SM port is not active, any SA requests (multicast requests) will >not be received by the SA (and hence not responded to) as they require >the port to be active. That's the key issue: getting that port to >ACTIVE. > >-- Hal > > > > This was what happened when I shut down opensm as per your suggestion. I may have misunderstood. What I did was shut down opensm and then check the /sys/class/infiniband/mthca/ports/1/state to see what the status was for the node that usually runs opensm and then the other nodes. The master node when opensm was down was INIT. The other nodes were active. When I restarted opensm, the master node went to ACTIVE. Sean From halr at voltaire.com Fri Sep 2 09:09:14 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 12:09:14 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43186B6B.4080605@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> <1125673768.4398.5857.camel@hal.voltaire.com> <43186249.7020902@dbresearch.net> <1125674928.4398.5977.camel@hal.voltaire.com> <43186806.9040807@dbresearch.net> <1125676380.4398.6107.camel@hal.voltaire.com> <43186B6B.4080605@dbresearch.net> Message-ID: <1125677354.4398.6191.camel@hal.voltaire.com> On Fri, 2005-09-02
at 11:10, Sean Hubbell wrote: > This was what happened when I shut down opensm as per your suggestion. I > may have misunderstood. What I did was shut down opensm and then check > the /sys/class/infiniband/mthca/ports/1/state to see what the status was > for the node that usually runs opensm and then the other nodes. The > master node when opensm was down was INIT. The other nodes were active. > When I restarted opensm, the master node went to ACTIVE. SM needs to be running. You needed to bounce OpenSM to get the verbosity. Might be nice to be able to dial this up and down at run time. In the log you sent the port states all appear ACTIVE so that is not the issue. Can you down and up all the IPoIB interfaces (with ifconfig) and see if you have IP connectivity ? -- Hal From shubbell at dbresearch.net Fri Sep 2 08:35:25 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 11:35:25 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125677354.4398.6191.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> <1125673768.4398.5857.camel@hal.voltaire.com> <43186249.7020902@dbresearch.net> <1125674928.4398.5977.camel@hal.voltaire.com> <43186806.9040807@dbresearch.net> <1125676380.4398.6107.camel@hal.voltaire.com> <43186B6B.4080605@dbresearch.net> <1125677354.4398.6191.camel@hal.voltaire.com> Message-ID: <4318713D.6050204@dbresearch.net> >SM needs to be running. You needed to bounce OpenSM to get the >verbosity.
Might be nice to be able to dial this up and down at run >time. > > > Here is exactly what I did. I restarted opensm by the following: cd /usr/local/ib/bin ./opensm -V | tee opensm050902.log I then sshed to each node and performed: ifconfig ib0 down ifconfig ib0 up I then checked the state and all are active. I then ran my test program which sends data over the multicast address and still it does not work over the nodes but does work sending the data to the localhost. >In the log you sent the port states all appear ACTIVE so that is not the >issue. Can you down and up all the IPoIB interfaces (with ifconfig) and >see if you have IP connectivity ? > > Attached is the new log. Sean -------------- next part -------------- A non-text attachment was scrubbed... Name: opensm050902.log Type: text/x-log Size: 46831 bytes Desc: not available URL: From halr at voltaire.com Fri Sep 2 09:29:39 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 12:29:39 -0400 Subject: [openib-general] Connectivity In-Reply-To: <4318713D.6050204@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> <1125673768.4398.5857.camel@hal.voltaire.com> <43186249.7020902@dbresearch.net> <1125674928.4398.5977.camel@hal.voltaire.com> <43186806.9040807@dbresearch.net> <1125676380.4398.6107.camel@hal.voltaire.com> <43186B6B.4080605@dbresearch.net> <1125677354.4398.6191.camel@hal.voltaire.com> <4318713D.6050204@dbresearch.net> Message-ID: <1125678578.4398.6295.camel@hal.voltaire.com> On Fri, 2005-09-02 at 11:35, Sean 
Hubbell wrote: > I then ran my test program which sends data over the multicast address > and still it does not work over the nodes but does work sending the data > to the localhost. So you are sending (and receiving) to/from an IP multicast address in your application ? I mistakenly thought the problem was just basic unicast IPoIB connectivity (ping -b, etc.). This is a different matter and the join refusals may be what is causing the issue. -- Hal From shubbell at dbresearch.net Fri Sep 2 08:46:32 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 11:46:32 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125678578.4398.6295.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> <1125673768.4398.5857.camel@hal.voltaire.com> <43186249.7020902@dbresearch.net> <1125674928.4398.5977.camel@hal.voltaire.com> <43186806.9040807@dbresearch.net> <1125676380.4398.6107.camel@hal.voltaire.com> <43186B6B.4080605@dbresearch.net> <1125677354.4398.6191.camel@hal.voltaire.com> <4318713D.6050204@dbresearch.net> <1125678578.4398.6295.camel@hal.voltaire.com> Message-ID: <431873D8.2070708@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-02 at 11:35, Sean Hubbell wrote: > > >>I then ran my test program which sends data over the multicast address >>and still it does not work over the nodes but does work sending the data >>to the localhost. >> >> > >So you are sending (and receiving) to/from an IP multicast address in >your application ? 
I mistakenly thought the problem was just basic unicast IPoIB connectivity (ping -b, etc.). > >This is a different matter and the join refusals may be what is causing >the issue. > >-- Hal > > > > Yes, I am sending multicast. What would cause the join refusal/s? Does opensm think that node has already joined? Sean From halr at voltaire.com Fri Sep 2 09:42:40 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 12:42:40 -0400 Subject: [openib-general] Connectivity In-Reply-To: <431873D8.2070708@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> <1125673768.4398.5857.camel@hal.voltaire.com> <43186249.7020902@dbresearch.net> <1125674928.4398.5977.camel@hal.voltaire.com> <43186806.9040807@dbresearch.net> <1125676380.4398.6107.camel@hal.voltaire.com> <43186B6B.4080605@dbresearch.net> <1125677354.4398.6191.camel@hal.voltaire.com> <4318713D.6050204@dbresearch.net> <1125678578.4398.6295.camel@hal.voltaire.com> <431873D8.2070708@dbresearch.net> Message-ID: <1125679359.4398.6366.camel@hal.voltaire.com> On Fri, 2005-09-02 at 11:46, Sean Hubbell wrote: > Yes, I am sending multicast. What would cause the join refusal/s? Does > opensm think that node has already joined? The group needs to be created before it is joined. Is this a static IP multicast address which is being used ? Can you say what IP multicast is being used ? Are the receivers started first ? I think they may create the IB multicast group.
-- Hal From shubbell at dbresearch.net Fri Sep 2 09:02:38 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 12:02:38 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125679359.4398.6366.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> <1125673768.4398.5857.camel@hal.voltaire.com> <43186249.7020902@dbresearch.net> <1125674928.4398.5977.camel@hal.voltaire.com> <43186806.9040807@dbresearch.net> <1125676380.4398.6107.camel@hal.voltaire.com> <43186B6B.4080605@dbresearch.net> <1125677354.4398.6191.camel@hal.voltaire.com> <4318713D.6050204@dbresearch.net> <1125678578.4398.6295.camel@hal.voltaire.com> <431873D8.2070708@dbresearch.net> <1125679359.4398.6366.camel@hal.voltaire.com> Message-ID: <4318779E.90401@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-02 at 11:46, Sean Hubbell wrote: > > >>Yes, I am sending multicast. What would cause the join refusal/s? Does >>opensm think that node has already joined? >> >> > >The group needs to be created before it is joined. Is this a static IP >multicast address which is being used ? Can you say what IP multicast is >being used ? Are the receivers started first ? I think they may create >the IB multicast group. > >-- Hal > > Yes, this is a static IP address. The multicast address used is in the range 224.10.10.10 - 224.10.10.40. The receivers are started first.
Sean From eitan at mellanox.co.il Fri Sep 2 09:59:34 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 02 Sep 2005 19:59:34 +0300 Subject: [openib-general] Connectivity In-Reply-To: <43186A39.7000800@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> Message-ID: <431884F6.4080101@mellanox.co.il> Hi Sean, OpenSM creates a log file (very detailed) if you start it with -V . The file is located under /var/log/osm.log (if you do not redirect it using the -f flag). The detailed log file shows all the SM operations, requests, responses, you name it. If you attach this file, we will be able to tell exactly which multicast group was opened by which client. We can also tell which request to join the group was accepted and which was not (both senders and receivers must join the group for any IP multicast group communication). You can grep -i for mgrp or mlid or mgid in the log file and see for yourself what is going on. Also I will be happy to assist in interpreting it.
Eitan From shubbell at dbresearch.net Fri Sep 2 09:20:01 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 12:20:01 -0400 Subject: [openib-general] Connectivity In-Reply-To: <431884F6.4080101@mellanox.co.il> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> <431884F6.4080101@mellanox.co.il> Message-ID: <43187BB1.1060402@dbresearch.net> Eitan Zahavi wrote: > Hi Sean, > > OpenSM creates a log file (very detailed) if you start it with -V . > The file is located under /var/log/osm.log (if you do not redirect it > using the -f flag). > > The detailed log file shows all the SM operations, requests, > responses, you name it. > If you attach this file, we will be able to tell exactly which > multicast group was opened by which client. > We can also tell which request to join the group was accepted and > which was not > (both senders and receivers must join the group for any IP > multicast group communication). > > You can grep -i for mgrp or mlid or mgid in the log file and see for > yourself what is going on. > Also I will be happy to assist in interpreting it. > > Eitan > > I can send it but it is ~42 MB... Would you rather I send the greps? Attached are the last 25 lines in the osm.log file. Sean -------------- next part -------------- A non-text attachment was scrubbed...
Name: last25lines.log Type: text/x-log Size: 2437 bytes Desc: not available URL: From halr at voltaire.com Fri Sep 2 10:12:01 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 13:12:01 -0400 Subject: [openib-general] Connectivity In-Reply-To: <4318779E.90401@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> <1125673768.4398.5857.camel@hal.voltaire.com> <43186249.7020902@dbresearch.net> <1125674928.4398.5977.camel@hal.voltaire.com> <43186806.9040807@dbresearch.net> <1125676380.4398.6107.camel@hal.voltaire.com> <43186B6B.4080605@dbresearch.net> <1125677354.4398.6191.camel@hal.voltaire.com> <4318713D.6050204@dbresearch.net> <1125678578.4398.6295.camel@hal.voltaire.com> <431873D8.2070708@dbresearch.net> <1125679359.4398.6366.camel@hal.voltaire.com> <4318779E.90401@dbresearch.net> Message-ID: <1125681121.4398.6565.camel@hal.voltaire.com> On Fri, 2005-09-02 at 12:02, Sean Hubbell wrote: > Yes, this is a static IP address. The multicast address used is in the > range 224.10.10.10 - 224.10.10.40. > The receivers are started first. The receivers _should_ create the IB MC groups over which the IPmc groups run. The transmitter(s) should join those groups. Can you send the portion of the OSM log (in verbose mode) from a receiver startup to validate the group creation ? Another one for a single transmitter side (after the receive side). Can the topology be reduced to something simpler till it starts to work ? Out of curiosity, what kernel version are you using ? 
I think there may be a temporary workaround for this so you can get on with your real work (not troubleshooting this :-) -- Hal From halr at voltaire.com Fri Sep 2 10:17:11 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 13:17:11 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43187BB1.1060402@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> <431884F6.4080101@mellanox.co.il> <43187BB1.1060402@dbresearch.net> Message-ID: <1125681282.4398.6587.camel@hal.voltaire.com> On Fri, 2005-09-02 at 12:20, Sean Hubbell wrote: > I can send it but it is ~42 MB... Would you rather I send the greps? > Attached are the last 25 lines in the osm.log file. Those errors are join errors as there is no MC group. Can you find the MCMemberRecord Set and GetResp with a component mask of 130c7 in the log ? Can you indicate the MGID ? -- Hal > > ______________________________________________________________________ > Sep 02 12:04:06 [42FFF960] -> osm_vendor_send: RMPP 0 length 256 > Sep 02 12:04:10 [42FFF960] -> osm_vendor_send: RMPP 0 length 112 > Sep 02 12:04:10 [42FFF960] -> osm_vendor_send: RMPP 0 length 112 > Sep 02 12:04:10 [42FFF960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet,scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7.
> Sep 02 12:04:10 [42FFF960] -> osm_vendor_send: RMPP 0 length 256 > Sep 02 12:04:11 [42FFF960] -> osm_vendor_send: RMPP 0 length 112 > Sep 02 12:04:11 [42FFF960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet,scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7. > Sep 02 12:04:11 [42FFF960] -> osm_vendor_send: RMPP 0 length 256 > Sep 02 12:04:11 [42FFF960] -> osm_vendor_send: RMPP 0 length 112 > Sep 02 12:04:11 [42FFF960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet,scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7. > Sep 02 12:04:11 [42FFF960] -> osm_vendor_send: RMPP 0 length 256 > Sep 02 12:04:12 [42FFF960] -> osm_vendor_send: RMPP 0 length 112 > Sep 02 12:04:12 [42FFF960] -> osm_vendor_send: RMPP 0 length 112 > Sep 02 12:04:12 [42FFF960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet,scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7. > Sep 02 12:04:12 [42FFF960] -> osm_vendor_send: RMPP 0 length 256 > Sep 02 12:04:58 [42FFF960] -> osm_vendor_send: RMPP 0 length 112 > Sep 02 12:04:58 [42FFF960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet,scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7. > Sep 02 12:04:58 [42FFF960] -> osm_vendor_send: RMPP 0 length 256 > Sep 02 12:04:58 [42FFF960] -> osm_vendor_send: RMPP 0 length 112 > Sep 02 12:04:58 [42FFF960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet,scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7. 
> Sep 02 12:04:58 [42FFF960] -> osm_vendor_send: RMPP 0 length 256 > Sep 02 12:05:02 [42FFF960] -> osm_vendor_send: RMPP 0 length 112 > Sep 02 12:05:02 [42FFF960] -> osm_vendor_send: RMPP 0 length 112 > Sep 02 12:05:02 [42FFF960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet,scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7. > Sep 02 12:05:02 [42FFF960] -> osm_vendor_send: RMPP 0 length 256 From roel at yottayotta.com Fri Sep 2 10:52:18 2005 From: roel at yottayotta.com (Roel van der Goot) Date: Fri, 2 Sep 2005 11:52:18 -0600 (MDT) Subject: [openib-general] Problem following trunk/userspace/management/README Message-ID: Hi all, I am trying to get openib installed. This is what I have done so far: - installed Fedora Core 4 - downloaded sources for linux-2.6.13 - svn co https://openib.org/svn/gen2/trunk - replaced linux-2.6.13/drivers/infiniband with trunk/src/linux-kernel/infiniband - compiled and installed this new kernel Now I am trying to follow the instructions in trunk/userspace/management/README: [root at topcom-63 libibcommon]# ./autogen.sh + aclocal -I config Can't locate object method "path" via package "Request" at /usr/share/autoconf/Autom4te/C4che.pm line 69, line 111. aclocal: autom4te failed with exit status: 1 + libtoolize --force --copy You should update your `aclocal.m4' by running aclocal. Putting files in AC_CONFIG_AUX_DIR, `config'. + autoheader Can't locate object method "path" via package "Request" at /usr/share/autoconf/Autom4te/C4che.pm line 69, line 111. autoheader: /usr/bin/autom4te failed with exit status: 1 + automake --foreign --add-missing --copy Can't locate object method "path" via package "Request" at /usr/share/autoconf/Autom4te/C4che.pm line 69, line 111. automake: autoconf failed with exit status: 1 + autoconf Can't locate object method "path" via package "Request" at /usr/share/autoconf/Autom4te/C4che.pm line 69, line 111.
[root at topcom-63 libibcommon]# I googled for this problem on the openib-general list and it appears to have been reported in June 2004; however, that month has just been removed from the openib-general archives. I have no problem running autogen.sh in the same directory on an older kernel (2.4.20-8), but I expect that to cause problems later. Does anybody know how I can get past this hurdle? Thank you, Roel. From shubbell at dbresearch.net Fri Sep 2 10:13:29 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 13:13:29 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125681282.4398.6587.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> <431884F6.4080101@mellanox.co.il> <43187BB1.1060402@dbresearch.net> <1125681282.4398.6587.camel@hal.voltaire.com> Message-ID: <43188839.5030703@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-02 at 12:20, Sean Hubbell wrote: > > > >>I can send it but it is ~42 MB... Would you rather I send the greps? >>Attached are the last 25 lines in the osm.log file. >> >> > >Those errors are join errors as there is no MC group. > >Can you find the MCMemberRecord Set and GetResp with a component >mask of 130c7 in the log ? Can you indicate the MGID ?
> > > Sep 02 11:44:38 [447FF960] -> SA MAD dump: base_ver................0x1 mgmt_class..............0x3 class_ver...............0x2 method..................0x2 (SubnAdmSet) status..................0x0 resv....................0x0 trans_id................0x7333c794d attr_id.................0x38 (MCMemberRecord) resv1...................0x0 attr_mod................0x0 rmpp_version............0x0 rmpp_type...............0x0 rmpp_flags..............0x0 rmpp_status.............0x0 seg_num.................0x0 payload_len/new_win.....0x0 sm_key..................0x0000000000000000 attr_offset.............0x0 resv2...................0x0 comp_mask...............0x00000000000130c7 Sep 02 11:44:38 [447FF960] -> __osm_sa_mad_ctrl_process: [ Sep 02 11:44:38 [447FF960] -> __osm_sa_mad_ctrl_process: Posting Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD. Sep 02 11:44:38 [447FF960] -> __osm_sa_mad_ctrl_process: ] Sep 02 11:44:38 [42FFF960] -> osm_mcmr_rcv_process: [ Sep 02 11:44:38 [42FFF960] -> osm_mcmr_rcv_join_mgrp: [ Sep 02 11:44:38 [447FF960] -> __osm_sa_mad_ctrl_rcv_callback: ] Sep 02 11:44:38 [42FFF960] -> osm_mcmr_rcv_join_mgrp: Dump of incoming record. Sep 02 11:44:38 [42FFF960] -> MCMember Record dump: MGID....................0xff12401bffff0000 : 0x0000000000090a0b PortGid.................0xfe80000000000000 : 0x0005ad000003d221 qkey....................0xB1B Mlid....................0x0 ScopeState..............0x1 Rate....................0x0 Mtu.....................0x0 TClass..................0x0 SLFlowLabelHopLimit.....0x0 Attached is the results of grep 1307c osm.log > 1307c.log. Sean -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 1307c.log Type: text/x-log Size: 126112 bytes Desc: not available URL: From shubbell at dbresearch.net Fri Sep 2 10:26:55 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 13:26:55 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125681121.4398.6565.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <1125672418.4398.5735.camel@hal.voltaire.com> <43185DAF.3070809@dbresearch.net> <1125673768.4398.5857.camel@hal.voltaire.com> <43186249.7020902@dbresearch.net> <1125674928.4398.5977.camel@hal.voltaire.com> <43186806.9040807@dbresearch.net> <1125676380.4398.6107.camel@hal.voltaire.com> <43186B6B.4080605@dbresearch.net> <1125677354.4398.6191.camel@hal.voltaire.com> <4318713D.6050204@dbresearch.net> <1125678578.4398.6295.camel@hal.voltaire.com> <431873D8.2070708@dbresearch.net> <1125679359.4398.6366.camel@hal.voltaire.com> <4318779E.90401@dbresearch.net> <1125681121.4398.6565.camel@hal.voltaire.com> Message-ID: <43188B5F.8090403@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-02 at 12:02, Sean Hubbell wrote: > > >>Yes, this is a static IP address. The multicast address used is in the >>range 224.10.10.10 - 224.10.10.40. >>The receivers are started first. >> >> > >The receivers _should_ create the IB MC groups over which the IPmc >groups run. The transmitter(s) should join those groups. Can you send >the portion of the OSM log (in verbose mode) from a receiver startup to >validate the group creation ? Another one for a single transmitter side >(after the receive side). Can the topology be reduced to something >simpler till it starts to work ? 
> > > On reducing the topology, I would rather wait until all other options are exhausted before this, as it was working last week. >Out of curiosity, what kernel version are you using ? > >I think there may be a temporary workaround for this so you can get on >with your real work (not troubleshooting this :-) > > > I'm game, but also I do not mind debugging this. Sean From halr at voltaire.com Fri Sep 2 11:20:15 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 14:20:15 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43188839.5030703@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> <431884F6.4080101@mellanox.co.il> <43187BB1.1060402@dbresearch.net> <1125681282.4398.6587.camel@hal.voltaire.com> <43188839.5030703@dbresearch.net> Message-ID: <1125685128.4398.7046.camel@hal.voltaire.com> Hi again Sean, On Fri, 2005-09-02 at 13:13, Sean Hubbell wrote: Thanks for bearing with this process...
> Sep 02 11:44:38 [447FF960] -> SA MAD dump: > base_ver................0x1 > mgmt_class..............0x3 > class_ver...............0x2 > method..................0x2 (SubnAdmSet) > status..................0x0 > resv....................0x0 > trans_id................0x7333c794d > attr_id.................0x38 > (MCMemberRecord) > resv1...................0x0 > attr_mod................0x0 > rmpp_version............0x0 > rmpp_type...............0x0 > rmpp_flags..............0x0 > rmpp_status.............0x0 > seg_num.................0x0 > payload_len/new_win.....0x0 > sm_key..................0x0000000000000000 > attr_offset.............0x0 > resv2...................0x0 > comp_mask...............0x00000000000130c7 > > > Sep 02 11:44:38 [447FF960] -> __osm_sa_mad_ctrl_process: [ > Sep 02 11:44:38 [447FF960] -> __osm_sa_mad_ctrl_process: Posting > Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD. > Sep 02 11:44:38 [447FF960] -> __osm_sa_mad_ctrl_process: ] > Sep 02 11:44:38 [42FFF960] -> osm_mcmr_rcv_process: [ > Sep 02 11:44:38 [42FFF960] -> osm_mcmr_rcv_join_mgrp: [ > Sep 02 11:44:38 [447FF960] -> __osm_sa_mad_ctrl_rcv_callback: ] > Sep 02 11:44:38 [42FFF960] -> osm_mcmr_rcv_join_mgrp: Dump of incoming > record. > Sep 02 11:44:38 [42FFF960] -> MCMember Record dump: > > MGID....................0xff12401bffff0000 : 0x0000000000090a0b > > PortGid.................0xfe80000000000000 : 0x0005ad000003d221 > qkey....................0xB1B > Mlid....................0x0 > ScopeState..............0x1 > Rate....................0x0 > Mtu.....................0x0 > TClass..................0x0 > SLFlowLabelHopLimit.....0x0 This looks like a create for 224.9.10.11. What was the response just below it ? > Attached is the results of grep 1307c osm.log > 1307c.log. 
Many of those are join failures but there are others like this: > comp_mask...............0x00000000000130c7 > comp_mask...............0x00000000000130c7 > comp_mask...............0x00000000000130c7 > comp_mask...............0x00000000000130c7 > comp_mask...............0x00000000000130c7 > comp_mask...............0x00000000000130c7 > comp_mask...............0x00000000000130c7 > comp_mask...............0x00000000000130c7 > comp_mask...............0x00000000000130c7 I would like the part of the log around these if possible. Let me know you svn version and I will provide a patch to display the MGID when the join fails. Also, it may be getting time to use those multicast routing and tracing tools mentioned earlier. -- Hal From shubbell at dbresearch.net Fri Sep 2 10:36:25 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 13:36:25 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125685128.4398.7046.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> <431884F6.4080101@mellanox.co.il> <43187BB1.1060402@dbresearch.net> <1125681282.4398.6587.camel@hal.voltaire.com> <43188839.5030703@dbresearch.net> <1125685128.4398.7046.camel@hal.voltaire.com> Message-ID: <43188D99.9050708@dbresearch.net> >I would like the part of the log around these if possible. > >Let me know you svn version and I will provide a patch to display the >MGID when the join fails. 
> >Also, it may be getting time to use those multicast routing and tracing >tools mentioned earlier. > >-- Hal > > > > Can I stop osm.log and then blow the log file away, start it back and then send the log? Would this not be easier? Sean From rolandd at cisco.com Fri Sep 2 11:28:48 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 02 Sep 2005 11:28:48 -0700 Subject: [openib-general] Re: [PATCH] memory leaks in ipoib, srp In-Reply-To: <20050901183433.GA16664@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 1 Sep 2005 21:34:33 +0300") References: <20050901071301.GF1707@mellanox.co.il> <52ll2gahp7.fsf@cisco.com> <20050901183433.GA16664@mellanox.co.il> Message-ID: <52k6hz5nof.fsf@cisco.com> Michael> Type-safety is one. Cleaner memory management is another: Michael> its better for clients to allocate their own memory, as Michael> the two leaks that I sent patches for previously Michael> demonstrate. Not sure I understand this. The memory leaks in IPoIB/SRP were leaks of memory that the clients did allocate themselves. Michael> An additional thinking behind this is: ULPs (e.g. SDP, Michael> CM) need to keep lists of per-device objects and kill Michael> them on device removal. For example with change Sean Michael> proposes SDP will need to keep a list of per-device Michael> cm_ids in each connection. One idea, then, is in this Michael> example to make each cm_id a client, then this list is Michael> managed by device.c This doesn't really make sense to me. Each cm_id is attached to a device. You don't want to register/deregister a client and get a callback for every device when you create or destroy an SDP connection. Right now SDP has a single global CM listen -- it calls ib_cm_listen in sdp_conn.c when it's loaded. Sean's change would just require that SDP do a single listen for each device in the add callback and destroy that cm_id in the remove callback. - R. 
From shubbell at dbresearch.net Fri Sep 2 10:42:43 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 13:42:43 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125685128.4398.7046.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> <431884F6.4080101@mellanox.co.il> <43187BB1.1060402@dbresearch.net> <1125681282.4398.6587.camel@hal.voltaire.com> <43188839.5030703@dbresearch.net> <1125685128.4398.7046.camel@hal.voltaire.com> Message-ID: <43188F13.3070608@dbresearch.net> Hal Rosenstock wrote: >Hi again Sean, > >On Fri, 2005-09-02 at 13:13, Sean Hubbell wrote: > >Thanks for bearing with this process... 
> > > >>Sep 02 11:44:38 [447FF960] -> SA MAD dump: >> base_ver................0x1 >> mgmt_class..............0x3 >> class_ver...............0x2 >> method..................0x2 (SubnAdmSet) >> status..................0x0 >> resv....................0x0 >> trans_id................0x7333c794d >> attr_id.................0x38 >>(MCMemberRecord) >> resv1...................0x0 >> attr_mod................0x0 >> rmpp_version............0x0 >> rmpp_type...............0x0 >> rmpp_flags..............0x0 >> rmpp_status.............0x0 >> seg_num.................0x0 >> payload_len/new_win.....0x0 >> sm_key..................0x0000000000000000 >> attr_offset.............0x0 >> resv2...................0x0 >> comp_mask...............0x00000000000130c7 >> >> >>Sep 02 11:44:38 [447FF960] -> __osm_sa_mad_ctrl_process: [ >>Sep 02 11:44:38 [447FF960] -> __osm_sa_mad_ctrl_process: Posting >>Dispatcher message OSM_MSG_MAD_MCMEMBER_RECORD. >>Sep 02 11:44:38 [447FF960] -> __osm_sa_mad_ctrl_process: ] >>Sep 02 11:44:38 [42FFF960] -> osm_mcmr_rcv_process: [ >>Sep 02 11:44:38 [42FFF960] -> osm_mcmr_rcv_join_mgrp: [ >>Sep 02 11:44:38 [447FF960] -> __osm_sa_mad_ctrl_rcv_callback: ] >>Sep 02 11:44:38 [42FFF960] -> osm_mcmr_rcv_join_mgrp: Dump of incoming >>record. >>Sep 02 11:44:38 [42FFF960] -> MCMember Record dump: >> >>MGID....................0xff12401bffff0000 : 0x0000000000090a0b >> >>PortGid.................0xfe80000000000000 : 0x0005ad000003d221 >> qkey....................0xB1B >> Mlid....................0x0 >> ScopeState..............0x1 >> Rate....................0x0 >> Mtu.....................0x0 >> TClass..................0x0 >> SLFlowLabelHopLimit.....0x0 >> >> > >This looks like a create for 224.9.10.11. What was the response just >below it ? > > > >>Attached is the results of grep 1307c osm.log > 1307c.log. 
>> >> > >Many of those are join failures but there are others like this: > > >> comp_mask...............0x00000000000130c7 >> comp_mask...............0x00000000000130c7 >> comp_mask...............0x00000000000130c7 >> comp_mask...............0x00000000000130c7 >> comp_mask...............0x00000000000130c7 >> comp_mask...............0x00000000000130c7 >> comp_mask...............0x00000000000130c7 >> comp_mask...............0x00000000000130c7 >> comp_mask...............0x00000000000130c7 >> >> > >I would like the part of the log around these if possible. > >Let me know you svn version and I will provide a patch to display the >MGID when the join fails. > >Also, it may be getting time to use those multicast routing and tracing >tools mentioned earlier. > >-- Hal > > > Results of ibroute: ibroute -M 4 0xc010 0xc020 Multicast mlids [0xc010-0xc020] of switch Lid 0x4 guid 0x0002c9010c584d40 (MT47396 Infiniscale-III Mellanox Technologies): 0 1 2 Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 MLid 0 valid mlids dumped Results for ibtracert: >ibtracert -m 0xc000 4 16 >warn: [5512] find_target_portguid: can't find to port > >ibtracert: error: can't find a multicast route from src to dest > > > From halr at voltaire.com Fri Sep 2 11:30:56 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 14:30:56 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43188D99.9050708@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> 
<431884F6.4080101@mellanox.co.il> <43187BB1.1060402@dbresearch.net> <1125681282.4398.6587.camel@hal.voltaire.com> <43188839.5030703@dbresearch.net> <1125685128.4398.7046.camel@hal.voltaire.com> <43188D99.9050708@dbresearch.net> Message-ID: <1125685713.4398.7125.camel@hal.voltaire.com> On Fri, 2005-09-02 at 13:36, Sean Hubbell wrote: > Can I stop osm.log and then blow the log file away, start it back and > then send the log? Would this not be easier? Absolutely (or you can rename the old one). -- Hal From halr at voltaire.com Fri Sep 2 11:40:17 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 14:40:17 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43188F13.3070608@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> <431884F6.4080101@mellanox.co.il> <43187BB1.1060402@dbresearch.net> <1125681282.4398.6587.camel@hal.voltaire.com> <43188839.5030703@dbresearch.net> <1125685128.4398.7046.camel@hal.voltaire.com> <43188F13.3070608@dbresearch.net> Message-ID: <1125686416.4398.7228.camel@hal.voltaire.com> On Fri, 2005-09-02 at 13:42, Sean Hubbell wrote: > Results of ibroute: > > ibroute -M 4 0xc010 0xc020 > Multicast mlids [0xc010-0xc020] of switch Lid 0x4 guid > 0x0002c9010c584d40 (MT47396 Infiniscale-III Mellanox Technologies): > 0 1 2 > Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 > MLid > 0 valid mlids dumped How about MLIDs from 0xc000 to 0xc07f just to be really safe we should have it covered ? Also, that's one switch. 
Can you do this for all the switches ? > Results for ibtracert: > > >ibtracert -m 0xc000 4 16 > >warn: [5512] find_target_portguid: can't find to port > > > >ibtracert: error: can't find a multicast route from src to dest The broadcast group should be everywhere. What ports do LIDs 4 and 16 correspond to ? Are they switch port 0s or HCA ports ? At this point you have bounced the SM numerous times. Does it work after first bringing up the SM and then either bringing up the end nodes from scratch ? -- Hal From shubbell at dbresearch.net Fri Sep 2 11:03:59 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 14:03:59 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125686416.4398.7228.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> <431884F6.4080101@mellanox.co.il> <43187BB1.1060402@dbresearch.net> <1125681282.4398.6587.camel@hal.voltaire.com> <43188839.5030703@dbresearch.net> <1125685128.4398.7046.camel@hal.voltaire.com> <43188F13.3070608@dbresearch.net> <1125686416.4398.7228.camel@hal.voltaire.com> Message-ID: <4318940F.3060000@dbresearch.net> >At this point you have bounced the SM numerous times. Does it work after >first bringing up the SM and then either bringing up the end nodes >from scratch ? > >-- Hal > > > > I am trying this as we speak, it takes the machines a little while to come up.
Sean From shubbell at dbresearch.net Fri Sep 2 11:14:13 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 14:14:13 -0400 Subject: [openib-general] Connectivity In-Reply-To: <431884F6.4080101@mellanox.co.il> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> <431884F6.4080101@mellanox.co.il> Message-ID: <43189675.7010704@dbresearch.net> Eitan Zahavi wrote: > > Eitan > > Hal and Eitan, The log file when tarred and gzipped is 1748K do you want me to send it to you off line or at all? Sean From mshefty at ichips.intel.com Fri Sep 2 12:18:24 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 02 Sep 2005 12:18:24 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <52u0h47hje.fsf@cisco.com> References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> Message-ID: <4318A580.9040201@ichips.intel.com> Roland Dreier wrote: > Arlin> Shouldn't there be a new ibv_put_cq_event() to go with the > Arlin> ibv_get_cq_event() ? > > No, I think that's dealt with by sweeping the CQ in userspace when > destroying a QP. I don't think that sweeping the CQ in userspace eliminates the race. The call to ibv_get_cq_event() can be just about to return to the user when they call destroy in a separate thread. Destroy has no way of blocking, so get could return an invalid pointer. 
- Sean From halr at voltaire.com Fri Sep 2 12:15:19 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 15:15:19 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43189675.7010704@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> <431884F6.4080101@mellanox.co.il> <43189675.7010704@dbresearch.net> Message-ID: <1125688334.4398.7491.camel@hal.voltaire.com> Hi Sean, On Fri, 2005-09-02 at 14:14, Sean Hubbell wrote: > Hal and Eitan, > > The log file when tarred and gzipped is 1748K do you want me to send > it to you off line or at all? Offline. I don't think everyone wants this. I take it it doesn't work after a clean start up of everything... -- Hal From halr at voltaire.com Fri Sep 2 12:18:49 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 15:18:49 -0400 Subject: [openib-general] Problem following trunk/userspace/management/README In-Reply-To: References: Message-ID: <1125688518.4398.7517.camel@hal.voltaire.com> Hi Roel, On Fri, 2005-09-02 at 13:52, Roel van der Goot wrote: > Hi all, > > I am trying to get openib installed. 
> This is what I have done sofar: > > - installed Fedora Core 4 > - downloaded sources for linux-2.6.13 > - svn co https://openib.org/svn/gen2/trunk > - replace linux-2.6.13/drivers/infiniband with > trunk/src/linux-kernel/infiniband > - compiled and installed this new kernel > > Now I am trying to follow the instructions in > trunk/userspace/management/README: > > [root at topcom-63 libibcommon]# ./autogen.sh > + aclocal -I config > Can't locate object method "path" via package "Request" at /usr/share/autoconf/Autom4te/C4che.pm line 69, line 111. > aclocal: autom4te failed with exit status: 1 > + libtoolize --force --copy > You should update your `aclocal.m4' by running aclocal. > Putting files in AC_CONFIG_AUX_DIR, `config'. > + autoheader > Can't locate object method "path" via package "Request" at /usr/share/autoconf/Autom4te/C4che.pm line 69, line 111. > autoheader: /usr/bin/autom4te failed with exit status: 1 > + automake --foreign --add-missing --copy > Can't locate object method "path" via package "Request" at /usr/share/autoconf/Autom4te/C4che.pm line 69, line 111. > automake: autoconf failed with exit status: 1 > + autoconf > Can't locate object method "path" via package "Request" at /usr/share/autoconf/Autom4te/C4che.pm line 69, line 111. > [root at topcom-63 libibcommon]# > > I googled for this problem on the openib-general list and it > appears to have been reported in June 2004, however this month > just got removed from the openib-general archives. > > I have no problem running autogen.sh in the same directory on an > older kernel (2.4.20-8), but expect that to cause problems later. > > Anybody know how I can take this hurdle? I have done this with an FC3 without problem (automake 1.9.2) and don't have a FC4 machine to try this on. Does the same thing occur with the other management libraries (libibmad, libibumad) ? Can you clean out any non svn files and start over ? What version of automake is on FC4 ? 
Did you look for this issue related to automake (aclocal) rather than OpenIB to look for any pointers ? -- Hal From shubbell at dbresearch.net Fri Sep 2 11:32:45 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 14:32:45 -0400 Subject: [openib-general] Connectivity In-Reply-To: <1125688334.4398.7491.camel@hal.voltaire.com> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> <431884F6.4080101@mellanox.co.il> <43189675.7010704@dbresearch.net> <1125688334.4398.7491.camel@hal.voltaire.com> Message-ID: <43189ACD.8070903@dbresearch.net> Hal Rosenstock wrote: >Hi Sean, > >On Fri, 2005-09-02 at 14:14, Sean Hubbell wrote: > > >>Hal and Eitan, >> >> The log file when tarred and gzipped is 1748K do you want me to send >>it to you off line or at all? >> >> > >Offline. I don't think everyone wants this. I take it it doesn't work >after a clean start up of everything... > >-- Hal > > > > Check this out... It has started working now. Is there a timeout if something fails when starting up? I am still sending the log file to see if we can still find out what it is, but I understand if you do not have time... I will send the log offline to you Hal and thanks for the great help. At least now I have a lot more understanding of what is actually going on ...
Sean Sean From rolandd at cisco.com Fri Sep 2 12:38:35 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 02 Sep 2005 12:38:35 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <4318A580.9040201@ichips.intel.com> (Sean Hefty's message of "Fri, 02 Sep 2005 12:18:24 -0700") References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> Message-ID: <52br3b5kg4.fsf@cisco.com> Sean> I don't think that sweeping the CQ in userspace eliminates Sean> the race. The call to ibv_get_cq_event() can be just about Sean> to return to the user when they call destroy in a separate Sean> thread. Destroy has no way of blocking, so get could return Sean> an invalid pointer. Sorry, I was a little sloppy in my reasoning. The real reason I sweep the CQ before returning from the destroy QP operation is for internal libmthca implementation reasons. There's no requirement that this be done in general, and in fact CQ entries corresponding to work requests posted to a given QP may be polled after the QP is destroyed. Section 11.2.4.4 of the IB spec (Destroy Queue Pair) says: The CI does not guarantee that CQEs generated for a QP prior to its destruction can be retrieved from the CQ after that QP has been destroyed. I take this to mean that it's fine if CQEs _are_ retrieved after a QP is destroyed. Since a CQE does not have a pointer to the QP, but only a QP number and a consumer-defined work request ID, I think this is OK; there's no direct reference to a stale resource. - R. 
From viswa.krish at gmail.com Fri Sep 2 12:39:15 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Fri, 2 Sep 2005 12:39:15 -0700 Subject: [Fwd: Re: [openib-general] kernel oops] In-Reply-To: <43172EFA.7040709@xsigo.com> References: <43172EFA.7040709@xsigo.com> Message-ID: <4df28be405090212395a3804c3@mail.gmail.com> The patch failed to fix the panic.. subnetmgr5 login: ib_at: ib_dev_ats_op: dev (c0449800) ib0 already has pending op 2 Unable to handle kernel NULL pointer dereference at virtual address 00000068 printing eip: c02fee65 *pde = 365a7001 Oops: 0000 [#1] SMP Modules linked in: nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod CPU: 0 EIP: 0060:[] Not tainted VLI EFLAGS: 00010086 (2.6.13) EIP is at _spin_lock_irqsave+0xa/0x51 eax: 00000064 ebx: 00000286 ecx: f665de6c edx: c037bcd0 esi: 00000064 edi: 00000064 ebp: 00000000 esp: f665de00 ds: 007b es: 007b ss: 0068 Process lt-ucmpost (pid: 3749, threadinfo=f665c000 task=f6478020) Stack: c01410ed 00000001 00000000 c037bcd0 c0272f87 00000000 000000d0 f665deac f67abe80 c027f14c c035ef80 c17f8ec0 f665de6c 0c300000 00000064 f665deac f67abe80 c0284cfa 00000000 0c300000 00000064 000000d0 c02847b8 f67abe80 Call Trace: [] __alloc_pages+0x324/0x3f1 [] ib_get_client_data+0x14/0x54 [] ib_sa_path_rec_get+0x1b/0x138 [] resolve_path+0x8c/0x15b [] path_req_complete+0x0/0xf7 [] rtnetlink_dump_all+0x0/0x9e [] rtnetlink_done+0x0/0x3 [] ib_at_paths_by_route+0xf5/0x10f [] same_path_req+0x0/0x95 [] ib_uat_paths_by_route+0xef/0x1c4 [] rtnetlink_dump_all+0x0/0x9e > > > > ---------- Forwarded message ---------- > From: Sean Hefty > To: Hal Rosenstock > Date: Thu, 01 Sep 2005 09:04:37 -0700 > Subject: Re: [openib-general] kernel oops > Hal Rosenstock wrote: > > Here's a patch for this. Let me know if it works. [I tried it out and it > > works for me.] If it does, the next question is how does the pointer get > > trashed. > > I don't think that the pointer is getting trashed. 
The SA was not running, > so I > don't think that any route was returned. > > - Sean > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rolandd at cisco.com Fri Sep 2 12:43:18 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 02 Sep 2005 12:43:18 -0700 Subject: [Fwd: Re: [openib-general] kernel oops] In-Reply-To: <4df28be405090212395a3804c3@mail.gmail.com> (Viswanath Krishnamurthy's message of "Fri, 2 Sep 2005 12:39:15 -0700") References: <43172EFA.7040709@xsigo.com> <4df28be405090212395a3804c3@mail.gmail.com> Message-ID: <527jdz5k89.fsf@cisco.com> Not really related to the ib_at oops, since I don't know that code. But have you made any progress in being able to post the code to reproduce the other oops (at mthca_poll_cq)? Thanks, Roland From halr at voltaire.com Fri Sep 2 12:42:59 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 15:42:59 -0400 Subject: [openib-general] Re: OpenSM 1.8.0 Merge Status and Operational Issue In-Reply-To: <4317ECF6.9070507@mellanox.co.il> References: <1125609366.4398.1014.camel@hal.voltaire.com> <4317ECF6.9070507@mellanox.co.il> Message-ID: <1125690178.4398.7656.camel@hal.voltaire.com> On Fri, 2005-09-02 at 02:11, Eitan Zahavi wrote: > This was quick. I guess Yael's work on merging the 1.8.0 against > the truck did help. Yes, Yael's hard work made this easy for me. > > I have a 4x HCA port (1x/4x LinkWidthEnable and Supported) connected via > > a 1x analyzer connected to a switch (so is 1x LinkWidthActive). > > OpenSM does not seem to want to bring this port up. It tries once and > > gives up until the physical link is cycled (cable pull and reinsertion). > > It does work running over a 4x link with 4x neighbor ports. > Can you try this with the pre-merge build? Does it work? Just to make sure it > is a new bug. It works with the old OpenIB OpenSM (at least it will bring the port back if it doesn't come up the first time). 
> Also with the 1.8.0 you could force the SM to heavy sweep by sending it > kill -HUP Yes, that makes it go through the cycle again but it still doesn't come up. > Please provide detailed (-V) log file so we can see what is happening too. Attached is a gzip'd version of the log. The port in question is GUID 0x0008f10403960559. There are a number of errors detected on Sets of PortInfo, which are consistent with what I saw on the wire. I also see these all over the log: Sep 02 15:27:02 548378 [B572BA40] -> osm_state_mgr_process: Received signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST in state OSM_SM_STATE_SET_ACTIVE_WAIT. Sep 02 15:27:02 548398 [B572BA40] -> __osm_state_mgr_signal_warning: Invalid signal OSM_SIGNAL_IDLE_TIME_PROCESS_REQUEST(9) in state OSM_SM_STATE_SET_ACTIVE_WAIT. > > Can you try this ? > We will on Sunday/Monday. I have a feeling I ran into this once. I'm not sure it isn't an Analyzer issue. > Could you please take the Analyzer out and retry? Unfortunately I can't take it out and still keep the link at 1x, as I don't have a cable to do this. -- Hal [Attachment scrubbed from archive: osm.log.gz, application/x-gzip, 181007 bytes] From halr at voltaire.com Fri Sep 2 13:02:44 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 16:02:44 -0400 Subject: [Fwd: Re: [openib-general] kernel oops] In-Reply-To: <4df28be405090212395a3804c3@mail.gmail.com> References: <43172EFA.7040709@xsigo.com> <4df28be405090212395a3804c3@mail.gmail.com> Message-ID: <1125690601.4398.7695.camel@hal.voltaire.com> On Fri, 2005-09-02 at 15:39, Viswanath Krishnamurthy wrote: > The patch failed to fix the panic.. Can you describe your setup ? Did you just run ucmpost without an SM/SA running or is it a different scenario ? Thanks.
-- Hal From viswa.krish at gmail.com Fri Sep 2 13:53:27 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Fri, 2 Sep 2005 13:53:27 -0700 Subject: [Fwd: Re: [openib-general] kernel oops] In-Reply-To: <527jdz5k89.fsf@cisco.com> References: <43172EFA.7040709@xsigo.com> <4df28be405090212395a3804c3@mail.gmail.com> <527jdz5k89.fsf@cisco.com> Message-ID: <4df28be405090213534bec335d@mail.gmail.com> I am working on it. With the updated version of code, slightly difficult to reproduce. -Viswa On 9/2/05, Roland Dreier wrote: > > Not really related to the ib_at oops, since I don't know that code. > > But have you made any progress in being able to post the code to > reproduce the other oops (at mthca_poll_cq)? > > Thanks, > Roland > -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Fri Sep 2 13:52:54 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 16:52:54 -0400 Subject: [openib-general] IPoIB Multicast Connectivity Message-ID: <1125694369.4398.8005.camel@hal.voltaire.com> Hi Sean, Here's my (somewhat long winded) analysis of your osm.log: First I see: Sep 02 13:46:34 [AB43F140] -> osm_vendor_bind: Unable to register class 129 version 1. Sep 02 13:46:34 [AB43F140] -> osm_vendor_bind: ] Sep 02 13:46:34 [AB43F140] -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind() failed. Sep 02 13:46:34 [AB43F140] -> osm_sm_mad_ctrl_bind: ] Sep 02 13:46:34 [AB43F140] -> osm_sm_bind: ERR 2E10: SM MAD Controller bind() failed (IB_ERROR). and then OpenSM shuts down and is restarted 4 minutes later. It does that again and then it is up and running. Class 129 is 0x81 which is SubnGet. Was the ib_umad module running ? What OpenIB svn version are you running ? What Linux kernel version ? 
In terms of failures, I then see a join failure on 224.0.0.22 MGID....................0xff12401bffff0000 : 0x000000000000016 PortGid.................0xfe80000000000000 : 0x0005ad000003cfb9 Sep 02 14:01:56 [42FFF960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet,scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7. That is repeated a number of times from this port and some other ports. PortGid.................0xfe80000000000000 : 0x0005ad000003d269 PortGid.................0xfe80000000000000 : 0x0005ad0000047a81 That may be OK as 224.0.0.22 is for IGMP and perhaps there are no IPmc routers on this IPoIB subnet ? All the IPmc is subnet local, right ? In terms of MC groups, I do see the IPv4 broadcast group being setup MGID....................0xff12401bffff0000 : 0x00000000ffffffff PortGid.................0xfe80000000000000 : 0x0005ad000003d269 Mlid....................0xC000 and others too: MGID....................0xff12401bffff0000 : 0x0000000000000001 PortGid.................0xfe80000000000000 : 0x0005ad000003d269 Mlid....................0xC001 MGID....................0xff12401bffff0000 : 0x00000000000000fb PortGid.................0xfe80000000000000 : 0x0005ad000003d269 Mlid....................0xC002 MGID....................0xff12601bffff0000 : 0x00000001ff03d269 PortGid.................0xfe80000000000000 : 0x0005ad000003d269 Mlid....................0xC003 MGID....................0xff12601bffff0000 : 0x0000000000000001 PortGid.................0xfe80000000000000 : 0x0005ad000003d269 Mlid....................0xC004 I then see the next node come up: MGID....................0xff12401bffff0000 : 0x00000000ffffffff PortGid.................0xfe80000000000000 : 0x0005ad0000047a81 Mlid....................0xC000 MGID....................0xff12401bffff0000 : 0x0000000000000001 PortGid.................0xfe80000000000000 : 0x0005ad0000047a81 Mlid....................0xC001 and then the next one: 
MGID....................0xff12401bffff0000 : 0x00000000ffffffff PortGid.................0xfe80000000000000 : 0x0005ad000003cfb9 Mlid....................0xC000 MGID....................0xff12401bffff0000 : 0x00000000ffffffff PortGid.................0xfe80000000000000 : 0x0005ad000003cfb9 Mlid....................0xC000 Perhaps the response doesn't make it back so the end node rerequested this. MGID....................0xff12401bffff0000 : 0x0000000000000001 PortGid.................0xfe80000000000000 : 0x0005ad000003cfb9 Mlid....................0xC001 MGID....................0xff12401bffff0000 : 0x0000000000000001 PortGid.................0xfe80000000000000 : 0x0005ad000003cfb9 Mlid....................0xC001 MGID....................0xff12401bffff0000 : 0x0000000000000001 PortGid.................0xfe80000000000000 : 0x0005ad000003cfb9 Mlid....................0xC001 MGID....................0xff12401bffff0000 : 0x0000000000000001 PortGid.................0xfe80000000000000 : 0x0005ad000003cfb9 Mlid....................0xC001 MGID....................0xff12401bffff0000 : 0x0000000000000001 PortGid.................0xfe80000000000000 : 0x0005ad000003cfb9 Mlid....................0xC001 MGID....................0xff12401bffff0000 : 0x0000000000000001 PortGid.................0xfe80000000000000 : 0x0005ad000003cfb9 Mlid....................0xC001 MGID....................0xff12401bffff0000 : 0x0000000000000001 PortGid.................0xfe80000000000000 : 0x0005ad000003cfb9 Mlid....................0xC001 ... Same thing (but worse) on this group... Perhaps there is some problem with the HCA or the path to that HCA. I then see that node rerequest the broadcast group and then 224.0.0.1. Was it rebooted ? That node seems to be rerequesting quite a number of times. I think you are also a candidate to try out the new OpenSM when it is available (I expect early next week) as the multicast handling by the SM is much better. I'll be curious to see if this still occurs or not.
This is not to say there might not be other issues, but these would be the first ones to get squared away. I'm not exactly sure what the SA client retry strategy is in IPoIB in the end node but that may be germane to this as well. I also see several of your IPmc addresses flow by in the log: MGID....................0xff12401bffff0000 : 0x00000000000a0a15 PortGid.................0xfe80000000000000 : 0x0005ad000003cfb9 Mlid....................0xC005 MGID....................0xff12401bffff0000 : 0x00000000000a0a15 PortGid.................0xfe80000000000000 : 0x0005ad0000047a81 Mlid....................0xC007 MGID....................0xff12401bffff0000 : 0x00000000000a0a15 PortGid.................0xfe80000000000000 : 0x0005ad000003d269 Mlid....................0xC007 That looks like the SM set up a different MLID for the same group (1 port on one MLID and 2 other ports on the second MLID). MGID....................0xff12401bffff0000 : 0x00000000000a0a0a PortGid.................0xfe80000000000000 : 0x0005ad0000047a81 Mlid....................0xC009 MGID....................0xff12401bffff0000 : 0x00000000000a0a0a PortGid.................0xfe80000000000000 : 0x0005ad000003d269 Mlid....................0xC009 That one repeats a bunch of times. There are several oddities here that need further investigation. -- Hal From viswa.krish at gmail.com Fri Sep 2 13:59:00 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Fri, 2 Sep 2005 13:59:00 -0700 Subject: [Fwd: Re: [openib-general] kernel oops] In-Reply-To: <1125690601.4398.7695.camel@hal.voltaire.com> References: <43172EFA.7040709@xsigo.com> <4df28be405090212395a3804c3@mail.gmail.com> <1125690601.4398.7695.camel@hal.voltaire.com> Message-ID: <4df28be4050902135926d078f5@mail.gmail.com> Here is the setup: #svn info Path: .
URL: https://openib.org/svn/gen2/trunk Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd Revision: 3295 Node Kind: directory Schedule: normal Last Changed Author: halr Last Changed Rev: 3295 Last Changed Date: 2005-09-01 12:07:54 -0700 (Thu, 01 Sep 2005) Patch applied to core/at.c and kernel 2.6.13 recompiled. Machine A ========= Running opensm Run ucmpost machine B ========= ./ucmpost The problem is reproducible when you *cannot* ping each other [root at subnetmgr4 ~]# ibv_devinfo hca_id: mthca0 fw_ver: 1.0.1 node_guid: 0002:c902:0040:0d00 sys_image_guid: 0002:c902:0040:0d03 max_mr_size: 0xffffffffffffffff page_size_cap: 0x0 vendor_id: 0x02c9 vendor_part_id: 25204 hw_ver: 0x0 phys_port_cnt: 1 port: 1 state: PORT_ACTIVE (4) max_mtu: invalid MTU (0) < What is this ??> active_mtu: invalid MTU (0) sm_lid: 1 port_lid: 3 port_lmc: 0x00 -Viswa On 02 Sep 2005 16:02:44 -0400, Hal Rosenstock wrote: > > On Fri, 2005-09-02 at 15:39, Viswanath Krishnamurthy wrote: > > The patch failed to fix the panic.. > > Can you describe your setup ? Did you just run ucmpost without an SM/SA > running or is it a different scenario ? > > Thanks. > > -- Hal > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From halr at voltaire.com Fri Sep 2 13:57:06 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 16:57:06 -0400 Subject: [openib-general] Connectivity In-Reply-To: <43189ACD.8070903@dbresearch.net> References: <43175A97.3000300@dbresearch.net> <1125607390.4398.851.camel@hal.voltaire.com> <43175F7C.3020309@dbresearch.net> <1125609102.4398.990.camel@hal.voltaire.com> <43183E42.8040404@dbresearch.net> <1125666659.4398.5213.camel@hal.voltaire.com> <4318457D.10308@dbresearch.net> <1125668022.4398.5341.camel@hal.voltaire.com> <43184C96.8020304@dbresearch.net> <1125669917.4398.5519.camel@hal.voltaire.com> <43185775.4080506@dbresearch.net> <43185B14.6020205@dbresearch.net> <1125675232.4398.6009.camel@hal.voltaire.com> <43186A39.7000800@dbresearch.net> <431884F6.4080101@mellanox.co.il> <43189675.7010704@dbresearch.net> <1125688334.4398.7491.camel@hal.voltaire.com> <43189ACD.8070903@dbresearch.net> Message-ID: <1125694625.4398.8027.camel@hal.voltaire.com> On Fri, 2005-09-02 at 14:32, Sean Hubbell wrote: > Check this out... It has started working now. Is there a timeout if > something fails when starting up? There are retry timers in the SA client of the IPoIB end nodes. That may be what ultimately makes it work. > I am still sending the log file to see if we can still find out what it > is, but I understand if you do not have time... > I will send the log offline to you Hal and thanks for the great help. At > least now I have a lot more understanding of what is actually going on ... It looks like there may be some noise issues on your network. I'd be curious about the error counters in the HCA ports and perhaps the switch ports along the paths to the OpenSM.
-- Hal From jlentini at netapp.com Fri Sep 2 14:02:28 2005 From: jlentini at netapp.com (James Lentini) Date: Fri, 2 Sep 2005 17:02:28 -0400 (EDT) Subject: [openib-general] Re: RDMA Generic Connection Management In-Reply-To: <52r7cadqsy.fsf@cisco.com> References: <521x4bjqls.fsf@cisco.com> <6.2.3.4.2.20050830135332.064b9030@exnane01.nane.netapp.com> <521x4bi9su.fsf@cisco.com> <6.2.3.4.2.20050830140954.060c9030@exnane01.nane.netapp.com> <52k6i3gukm.fsf@cisco.com> <6.2.3.4.2.20050830142125.063e1cc0@exnane01.nane.netapp.com> <527je3gtmq.fsf@cisco.com> <6.2.3.4.2.20050830145906.05104890@exnane01.nane.netapp.com> <523borgs4r.fsf@cisco.com> <52zmqyeq58.fsf@cisco.com> <52r7cadqsy.fsf@cisco.com> Message-ID: On Wed, 31 Aug 2005, Roland Dreier wrote: > James> The device could still be used after it's gone. For > James> example: > > James> - the user is configuring SRP via sysfs. The thread in > James> srp_create_target() has just called ib_sa_path_rec_get() > James> [srp.c line 1209] and is waiting for the path record query > James> to complete in wait_for_completion() - the SA callback, > James> srp_path_rec_completion(), is called. This callback thread > James> will make several verb calls (ib_create_cq, > James> ib_req_notify_cq, ib_create_qp, ...) without any > James> coordination with the hotplug device removal callback, > James> srp_remove_one > > I don't think this can happen. How could srp_remove_one get past > > wait_for_completion(&host->released); > > if the sysfs file is still in use? You're right. srp_remove_one will wait for the sysfs file to close. What about SRP's interactions with the SCSI layer? When scsi_remove_host() returns are you guaranteed that there are no SCSI calls into your code in progress (e.g. in srp_queuecommand)? > James> Notice that if the SA client's hotplug removal function, > James> ib_sa_remove_one(), ensured that all callbacks had > James> completed before returning the problem would be fixed. 
This > James> would protect all ULPs from having to deal with hotplug > James> races in their SA callback function. The fix belongs in the > James> SA client (the core stack), not in SRP. > > All SA client callbacks are driven by the MAD layer. And > ib_sa_remove_one() does ib_unregister_mad_agent(), which should wait > for all callbacks to finish. So I think we already do the best we can > here. Unfortunately the SA client code must clean up after all the > ULPs that depend on it, because ULPs can use the SA up until they know > the device is gone. But I don't see a way around that. > > James> All the ULPs are deficient with respect to their hotplug > James> synchronization. Given that there is a common problem, > James> doesn't it make sense to try and solve it in a generic way > James> instead of in each ULP? > > Yes, but what is the generic way? The generic way would be to handle this in a common layer. For the IB verbs + RDMA connection API to be as easy to use as the sockets API, it needs to make this issue transparent. Take the current rpc code in net/sunrpc as an example. It uses sock_create_kern(), kernel_sendmsg(), kernel_recvmsg(), etc. without ever needing to worry about hotplug events. The layers between it and the low level drivers (Ethernet, IPoIB, etc.) take care of that. From halr at voltaire.com Fri Sep 2 14:04:42 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 17:04:42 -0400 Subject: [Fwd: Re: [openib-general] kernel oops] In-Reply-To: <4df28be4050902135926d078f5@mail.gmail.com> References: <43172EFA.7040709@xsigo.com> <4df28be405090212395a3804c3@mail.gmail.com> <1125690601.4398.7695.camel@hal.voltaire.com> <4df28be4050902135926d078f5@mail.gmail.com> Message-ID: <1125695082.4398.8037.camel@hal.voltaire.com> On Fri, 2005-09-02 at 16:59, Viswanath Krishnamurthy wrote: > Here is the setup.. Thanks. A couple more questions: > #svn info > Path: .
> > URL: https://openib.org/svn/gen2/trunk > Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd > Revision: 3295 > Node Kind: directory > Schedule: normal > Last Changed Author: halr > Last Changed Rev: 3295 > Last Changed Date: 2005-09-01 12:07:54 -0700 (Thu, 01 Sep 2005) > > > Patch applied to core/at.c and kernel 2.6.13 recompiled. > > > Machine A > ========= > Running opensm > > Run ucmpost > > machine B > ========= > ./ucmpost Are these back to back HCAs or is there a switch in between ? > The problem is reproducible when you *cannot* ping each other over IPoIB ? > [root at subnetmgr4 ~]# ibv_devinfo > hca_id: mthca0 > fw_ver: 1.0.1 > node_guid: 0002:c902:0040:0d00 > sys_image_guid: 0002:c902:0040:0d03 > max_mr_size: 0xffffffffffffffff > page_size_cap: 0x0 > vendor_id: 0x02c9 > vendor_part_id: 25204 > hw_ver: 0x0 > phys_port_cnt: 1 > port: 1 > state: PORT_ACTIVE (4) > max_mtu: invalid MTU (0) < > What is this ??> > active_mtu: invalid MTU (0) If the program is right and those are the real values, somehow max_mtu is trashed which causes active_mtu to be invalid which could break all sorts of things... > sm_lid: 1 > port_lid: 3 > port_lmc: 0x00 That's on the remote (from the SM) machine. -- Hal From shubbell at dbresearch.net Fri Sep 2 13:28:25 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 02 Sep 2005 16:28:25 -0400 Subject: [openib-general] Re: IPoIB Multicast Connectivity In-Reply-To: <1125694369.4398.8005.camel@hal.voltaire.com> References: <1125694369.4398.8005.camel@hal.voltaire.com> Message-ID: <4318B5E9.9020606@dbresearch.net> Hal Rosenstock wrote: >Hi Sean, > >Here's my (somewhat long winded) analysis of your osm.log: > >[snip] I'll get the new version next week and then look into it. I'll try that and let you know the results. If I have problems, I'll send the version and we'll at least know what version of openib I have as I cannot find it. On a side note, I could not ask for better assistance. Thanks Hal.
Sean From jlentini at netapp.com Fri Sep 2 14:22:09 2005 From: jlentini at netapp.com (James Lentini) Date: Fri, 2 Sep 2005 17:22:09 -0400 (EDT) Subject: [openib-general] [PATCH] only build userspace verbs support if requested Message-ID:
only build userspace verbs support if requested

Signed-off-by: James Lentini

Index: core/Makefile
===================================================================
--- core/Makefile	(revision 3295)
+++ core/Makefile	(working copy)
@@ -1,9 +1,8 @@
 EXTRA_CFLAGS += -Idrivers/infiniband/include -Idrivers/infiniband/ulp/ipoib
 
-obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_ping.o \
-			    ib_cm.o ib_sa.o ib_umad.o ib_ucm.o \
-			    ib_at.o ib_uat.o
-obj-$(CONFIG_INFINIBAND_USER_VERBS) += ib_uverbs.o
+obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_ping.o ib_cm.o \
+			    ib_sa.o ib_at.o
+obj-$(CONFIG_INFINIBAND_USER_VERBS) += ib_uverbs.o ib_umad.o ib_ucm.o ib_uat.o
 
 ib_core-y := packer.o ud_header.o verbs.o sysfs.o \
 		device.o fmr_pool.o cache.o
From rolandd at cisco.com Fri Sep 2 14:28:42 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 02 Sep 2005 14:28:42 -0700 Subject: [openib-general] Re: [PATCH] only build userspace verbs support if requested In-Reply-To: (James Lentini's message of "Fri, 2 Sep 2005 17:22:09 -0400 (EDT)") References:
Message-ID: <523bon5fcl.fsf@cisco.com> Makes sense I guess... should the help text for INFINIBAND_USER_VERBS be rewritten? Actually, should we rename the option to INFINIBAND_USER_ACCESS if we're going to make this change? - R. From jlentini at netapp.com Fri Sep 2 14:32:08 2005 From: jlentini at netapp.com (James Lentini) Date: Fri, 2 Sep 2005 17:32:08 -0400 (EDT) Subject: [openib-general] Re: [PATCH] uDAPL changes to support async events In-Reply-To: References: Message-ID: On Thu, 1 Sep 2005, Arlin Davis wrote: > James, > > Here are the changes to support async events. Also consolidated the > uAT,uCM,uCQ threads into one processing thread. > > Thanks, > -arlin Hi Arlin, I'm having trouble applying this cleanly. Several of the lines wrapped. I went through and tried to fix them by hand, but then I started seeing errors like this:

patch -p0 -b < ~/patch
patching file dapl/openib/dapl_ib_util.c
Hunk #1 FAILED at 55.
Hunk #2 FAILED at 131.
Hunk #3 FAILED at 151.
Hunk #4 FAILED at 207.
Hunk #5 FAILED at 235.
Hunk #6 FAILED at 260.
Hunk #7 FAILED at 290.
...

Could you resend it as an attachment?
Thanks, james From rolandd at cisco.com Fri Sep 2 14:40:49 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 02 Sep 2005 14:40:49 -0700 Subject: [openib-general] Re: RDMA Generic Connection Management In-Reply-To: (James Lentini's message of "Fri, 2 Sep 2005 17:02:28 -0400 (EDT)") References: <521x4bjqls.fsf@cisco.com> <6.2.3.4.2.20050830135332.064b9030@exnane01.nane.netapp.com> <521x4bi9su.fsf@cisco.com> <6.2.3.4.2.20050830140954.060c9030@exnane01.nane.netapp.com> <52k6i3gukm.fsf@cisco.com> <6.2.3.4.2.20050830142125.063e1cc0@exnane01.nane.netapp.com> <527je3gtmq.fsf@cisco.com> <6.2.3.4.2.20050830145906.05104890@exnane01.nane.netapp.com> <523borgs4r.fsf@cisco.com> <52zmqyeq58.fsf@cisco.com> <52r7cadqsy.fsf@cisco.com> Message-ID: <52y86f407y.fsf@cisco.com> James> When scsi_remove_host() returns are you guaranteed that James> there are no SCSI calls into your code in progress (e.g. in James> srp_queuecommand)? I think so. Roland> Yes, but what is the generic way? James> The generic way would be to handle this in a common James> layer. For the IB verbs + RDMA connection API to be as easy James> to use as the sockets API, then it needs to make this issue James> transparent. I don't think the kernel design philosophy is to hide these sorts of object lifetime issues from consumers. And my real question is how it's even possible to handle this efficiently in a generic layer. If you want consumers to be able to ignore hotplug, then the generic layer needs to handle device removal even in the middle of fast path work request posting operations. And I don't see how to do that without changing to reference counted handles (from the current scheme of directly using pointers). And that's going to have a serious performance impact that I don't think is worth it. - R. 
From halr at voltaire.com Fri Sep 2 14:39:16 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 17:39:16 -0400 Subject: [openib-general] Re: [PATCH] only build userspace verbs support if requested In-Reply-To: <523bon5fcl.fsf@cisco.com> References: <523bon5fcl.fsf@cisco.com> Message-ID: <1125697155.4398.8039.camel@hal.voltaire.com> On Fri, 2005-09-02 at 17:28, Roland Dreier wrote: > Makes sense I guess... should the help text for INFINIBAND_USER_VERBS > be rewritten? Actually, should we rename the option to > INFINIBAND_USER_ACCESS if we're going to make this change? I would vote for yes to both if this is to be done. -- Hal From roel at yottayotta.com Fri Sep 2 14:45:36 2005 From: roel at yottayotta.com (Roel van der Goot) Date: Fri, 2 Sep 2005 15:45:36 -0600 (MDT) Subject: [openib-general] Problem following trunk/userspace/management/README In-Reply-To: <1125688518.4398.7517.camel@hal.voltaire.com> References: <1125688518.4398.7517.camel@hal.voltaire.com> Message-ID: > Hi Roel, Hi Hal, Thank you for your reply. > I have done this with an FC3 without problem (automake 1.9.2) and don't > have a FC4 machine to try this on. It is good to know that worst case I can go to FC3. > Does the same thing occur with the other management libraries > (libibmad, libibumad) ? They seem to be fine. [root at topcom-63 libibcommon]# cd ../libibumad/ [root at topcom-63 libibumad]# ./autogen.sh + aclocal -I config + libtoolize --force --copy Putting files in AC_CONFIG_AUX_DIR, `config'. + autoheader + automake --foreign --add-missing --copy configure.in: installing `config/install-sh' configure.in: installing `config/missing' Makefile.am: installing `config/compile' Makefile.am: installing `config/depcomp' + autoconf [root at topcom-63 libibumad]# cd ../libibmad/ [root at topcom-63 libibmad]# ./autogen.sh + aclocal -I config + libtoolize --force --copy Putting files in AC_CONFIG_AUX_DIR, `config'. 
+ autoheader + automake --foreign --add-missing --copy configure.in: installing `config/install-sh' configure.in: installing `config/missing' Makefile.am: installing `config/compile' Makefile.am: installing `config/depcomp' + autoconf [root at topcom-63 libibmad]# > Can you clean out any non svn files and start over ? Cleaned up, here are the results: [root at topcom-63 libibcommon]# ./autogen.sh + aclocal -I config configure.in:20: warning: AC_LANG_CALL: no function given autoconf/lang.m4:242: AC_LANG_CALL is expanded from... autoconf/general.m4:2215: AC_LINK_IFELSE is expanded from... autoconf/general.m4:1799: AC_CACHE_VAL is expanded from... autoconf/general.m4:1808: AC_CACHE_CHECK is expanded from... autoconf/libs.m4:134: AC_CHECK_LIB is expanded from... configure.in:20: the top level + libtoolize --force --copy Putting files in AC_CONFIG_AUX_DIR, `config'. + autoheader configure.in:20: warning: AC_LANG_CALL: no function given autoconf/lang.m4:242: AC_LANG_CALL is expanded from... autoconf/general.m4:2215: AC_LINK_IFELSE is expanded from... autoconf/general.m4:1799: AC_CACHE_VAL is expanded from... autoconf/general.m4:1808: AC_CACHE_CHECK is expanded from... autoconf/libs.m4:134: AC_CHECK_LIB is expanded from... configure.in:20: the top level + automake --foreign --add-missing --copy configure.in:20: warning: AC_LANG_CALL: no function given autoconf/lang.m4:242: AC_LANG_CALL is expanded from... autoconf/general.m4:2215: AC_LINK_IFELSE is expanded from... autoconf/general.m4:1799: AC_CACHE_VAL is expanded from... autoconf/general.m4:1808: AC_CACHE_CHECK is expanded from... autoconf/libs.m4:134: AC_CHECK_LIB is expanded from... configure.in:20: the top level configure.in: installing `config/install-sh' configure.in: installing `config/missing' Makefile.am: installing `config/compile' Makefile.am: installing `config/depcomp' + autoconf configure.in:20: warning: AC_LANG_CALL: no function given autoconf/lang.m4:242: AC_LANG_CALL is expanded from... 
autoconf/general.m4:2215: AC_LINK_IFELSE is expanded from... autoconf/general.m4:1799: AC_CACHE_VAL is expanded from... autoconf/general.m4:1808: AC_CACHE_CHECK is expanded from... autoconf/libs.m4:134: AC_CHECK_LIB is expanded from... configure.in:20: the top level [root at topcom-63 libibcommon]# > What version of automake is on FC4 ? [root at topcom-63 libibcommon]# automake --version automake (GNU automake) 1.9.5 Written by Tom Tromey . Copyright 2005 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. [root at topcom-63 libibcommon]# > Did you look for this issue related to automake (aclocal) rather than > OpenIB to look for any pointers ? No, but I will have a look at that. Thank you again. > -- Hal Cheers :-), Roel. From arlin.r.davis at intel.com Fri Sep 2 14:52:10 2005 From: arlin.r.davis at intel.com (Arlin Davis) Date: Fri, 2 Sep 2005 14:52:10 -0700 Subject: [openib-general] RE: [PATCH] uDAPL changes to support async events In-Reply-To: Message-ID: James, Here is the patch as an attachment. -arlin >-----Original Message----- >From: James Lentini [mailto:jlentini at netapp.com] >Sent: Friday, September 02, 2005 2:32 PM >To: Davis, Arlin R >Cc: openib-general at openib.org >Subject: Re: [PATCH] uDAPL changes to support async events > > > >On Thu, 1 Sep 2005, Arlin Davis wrote: > >> James, >> >> Here are the changes to support async events. Also consolidated the >> uAT,uCM,uCQ threads into one processing thread. >> >> Thanks, >> -arlin > >Hi Arlin, > >I'm having trouble applying this cleanly. Several of the lines >wrapped. I went through and tried to fix them by hand, but then I >started seeing errors like this: > >patch -p0 -b < ~/patch >patching file dapl/openib/dapl_ib_util.c >Hunk #1 FAILED at 55. >Hunk #2 FAILED at 131. >Hunk #3 FAILED at 151. >Hunk #4 FAILED at 207. >Hunk #5 FAILED at 235. >Hunk #6 FAILED at 260. 
>Hunk #7 FAILED at 290. >... > >Could resend as an attachment? > >Thanks, >james -------------- next part -------------- A non-text attachment was scrubbed... Name: async_events.patch Type: application/octet-stream Size: 29024 bytes Desc: not available URL: From rolandd at cisco.com Fri Sep 2 14:52:47 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 02 Sep 2005 14:52:47 -0700 Subject: [openib-general] Problem following trunk/userspace/management/README In-Reply-To: (Roel van der Goot's message of "Fri, 2 Sep 2005 15:45:36 -0600 (MDT)") References: <1125688518.4398.7517.camel@hal.voltaire.com> Message-ID: <52psrr3zo0.fsf@cisco.com> Try this (untested) patch. Calling AC_CHECK_LIB with no parameters doesn't make sense. - R. --- management/libibcommon/configure.in (revision 3295) +++ management/libibcommon/configure.in (working copy) @@ -16,9 +16,6 @@ AC_PROG_LN_S AC_PROG_MAKE_SET AM_PROG_LIBTOOL -dnl Checks for libraries -AC_CHECK_LIB - dnl Checks for header files. AC_HEADER_STDC AC_CHECK_HEADERS([fcntl.h inttypes.h netinet/in.h stdint.h stdlib.h string.h sys/ioctl.h syslog.h unistd.h]) From roel at yottayotta.com Fri Sep 2 15:15:48 2005 From: roel at yottayotta.com (Roel van der Goot) Date: Fri, 2 Sep 2005 16:15:48 -0600 (MDT) Subject: [openib-general] Problem following trunk/userspace/management/README In-Reply-To: <52psrr3zo0.fsf@cisco.com> References: <1125688518.4398.7517.camel@hal.voltaire.com> <52psrr3zo0.fsf@cisco.com> Message-ID: Hi Roland, > Try this (untested) patch. Calling AC_CHECK_LIB with no parameters > doesn't make sense. I get further now. Thank you. Roel. 
From viswa.krish at gmail.com Fri Sep 2 15:25:17 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Fri, 2 Sep 2005 15:25:17 -0700 Subject: [Fwd: Re: [openib-general] kernel oops] In-Reply-To: <1125695082.4398.8037.camel@hal.voltaire.com> References: <43172EFA.7040709@xsigo.com> <4df28be405090212395a3804c3@mail.gmail.com> <1125690601.4398.7695.camel@hal.voltaire.com> <4df28be4050902135926d078f5@mail.gmail.com> <1125695082.4398.8037.camel@hal.voltaire.com> Message-ID: <4df28be405090215252604cace@mail.gmail.com> See inline.. On 02 Sep 2005 17:04:42 -0400, Hal Rosenstock wrote: > > On Fri, 2005-09-02 at 16:59, Viswanath Krishnamurthy wrote: > > Here is the setup.. > > Thanks. A couple more questions: > > > #svn info > > Path: . > > > > URL: https://openib.org/svn/gen2/trunk > > Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd > > Revision: 3295 > > Node Kind: directory > > Schedule: normal > > Last Changed Author: halr > > Last Changed Rev: 3295 > > Last Changed Date: 2005-09-01 12:07:54 -0700 (Thu, 01 Sep 2005) > > > > > > Patch applied to core/at.c and kernel 2.6.13 recompiled. > > > > > > Machine A > > ========= > > Running opensm > > > > Run ucmpost > > > > machine B > > ========= > > ./ucmpost > > Are these back to back HCAs or is there a switch in between ? There is a switch in between. A simple setup with 2 machines and a switch. The machines are running 2.6.13. One of them is running opensm. > The problem is reproducible when you *cannot* ping each other > > over IPoIB ? Yes.. 
> [root at subnetmgr4 ~]# ibv_devinfo > > hca_id: mthca0 > > fw_ver: 1.0.1 > > node_guid: 0002:c902:0040:0d00 > > sys_image_guid: 0002:c902:0040:0d03 > > max_mr_size: 0xffffffffffffffff > > page_size_cap: 0x0 > > vendor_id: 0x02c9 > > vendor_part_id: 25204 > > hw_ver: 0x0 > > phys_port_cnt: 1 > > port: 1 > > state: PORT_ACTIVE (4) > > max_mtu: invalid MTU (0) < > > What is this ??> > > active_mtu: invalid MTU (0) > > If the program is right and those are the real values, somehow max_mtu > is trashed which causes active_mtu to be invalid which could break all > sorts of things... Is there some issue with the HCA ? > sm_lid: 1 > > port_lid: 3 > > port_lmc: 0x00 > > That's on the remote (from the SM) machine. > > -- Hal > > From rolandd at cisco.com Fri Sep 2 15:30:47 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 02 Sep 2005 15:30:47 -0700 Subject: [Fwd: Re: [openib-general] kernel oops] In-Reply-To: <1125695082.4398.8037.camel@hal.voltaire.com> (Hal Rosenstock's message of "02 Sep 2005 17:04:42 -0400") References: <43172EFA.7040709@xsigo.com> <4df28be405090212395a3804c3@mail.gmail.com> <1125690601.4398.7695.camel@hal.voltaire.com> <4df28be4050902135926d078f5@mail.gmail.com> <1125695082.4398.8037.camel@hal.voltaire.com> Message-ID: <52ll2f3xwo.fsf@cisco.com> Viswanath> max_mtu: invalid MTU (0) Viswanath> active_mtu: invalid MTU (0) Hal> If the program is right and those are the real values, Hal> somehow max_mtu is trashed which causes active_mtu to be Hal> invalid which could break all sorts of things... No, it's just that the MTU is not reported by the underlying verbs, so ibv_devinfo just prints '0'. - R.
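Roland's answer hinges on how a port MTU is encoded. As a minimal sketch, assuming the InfiniBand MTU encoding in which 1 = 256 bytes through 5 = 4096 bytes (the enum, `mtu_code_to_bytes`, and `print_mtu` are hypothetical names, not libibverbs API), a value of 0 simply falls outside the valid range, so a devinfo-style tool reports it as invalid rather than implying a corrupted port:

```c
#include <stdio.h>

/* Hypothetical local mirror of the InfiniBand MTU encoding. */
enum ib_mtu_code {
    MTU_CODE_256  = 1,
    MTU_CODE_512  = 2,
    MTU_CODE_1024 = 3,
    MTU_CODE_2048 = 4,
    MTU_CODE_4096 = 5
};

/* Map an encoded MTU to bytes; return 0 for any code outside 1..5,
 * such as the 0 left behind when the verbs layer does not fill it in. */
static int mtu_code_to_bytes(int code)
{
    if (code < MTU_CODE_256 || code > MTU_CODE_4096)
        return 0;
    return 128 << code; /* 1 -> 256, 2 -> 512, ..., 5 -> 4096 */
}

/* Print a port attribute the way a devinfo-style tool might. */
static void print_mtu(const char *field, int code)
{
    int bytes = mtu_code_to_bytes(code);

    if (bytes)
        printf("%s: %d (%d)\n", field, bytes, code);
    else
        printf("%s: invalid MTU (%d)\n", field, code);
}
```

For example, `print_mtu("max_mtu", 0)` prints `max_mtu: invalid MTU (0)`, while `print_mtu("active_mtu", 4)` prints `active_mtu: 2048 (4)`.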
From halr at voltaire.com Fri Sep 2 15:27:43 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 18:27:43 -0400 Subject: [openib-general] Problem following trunk/userspace/management/README In-Reply-To: References: <1125688518.4398.7517.camel@hal.voltaire.com> <52psrr3zo0.fsf@cisco.com> Message-ID: <1125700062.4398.8051.camel@hal.voltaire.com> On Fri, 2005-09-02 at 18:15, Roel van der Goot wrote: > Hi Roland, > > > Try this (untested) patch. Calling AC_CHECK_LIB with no parameters > > doesn't make sense. > > I get further now. Thank you. Further but not all the way ? -- Hal From halr at voltaire.com Fri Sep 2 15:31:20 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 18:31:20 -0400 Subject: [openib-general] Problem following trunk/userspace/management/README In-Reply-To: <52psrr3zo0.fsf@cisco.com> References: <1125688518.4398.7517.camel@hal.voltaire.com> <52psrr3zo0.fsf@cisco.com> Message-ID: <1125700203.4398.8055.camel@hal.voltaire.com> On Fri, 2005-09-02 at 17:52, Roland Dreier wrote: > Try this (untested) patch. Calling AC_CHECK_LIB with no parameters > doesn't make sense. Thanks. Applied. -- Hal From roel at yottayotta.com Fri Sep 2 15:36:56 2005 From: roel at yottayotta.com (Roel van der Goot) Date: Fri, 2 Sep 2005 16:36:56 -0600 (MDT) Subject: [openib-general] Problem following trunk/userspace/management/README In-Reply-To: <1125700062.4398.8051.camel@hal.voltaire.com> References: <1125688518.4398.7517.camel@hal.voltaire.com> <52psrr3zo0.fsf@cisco.com> <1125700062.4398.8051.camel@hal.voltaire.com> Message-ID: Hi Hal, > On Fri, 2005-09-02 at 18:15, Roel van der Goot wrote: >> Hi Roland, >> >>> Try this (untested) patch. Calling AC_CHECK_LIB with no parameters >>> doesn't make sense. >> >> I get further now. Thank you. > > Further but not all the way ? Sorry for the ambiguous message. I have not found any new problems yet. ;-) It is still building. Cheers :-), Roel. 
From mshefty at ichips.intel.com Fri Sep 2 15:41:22 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 02 Sep 2005 15:41:22 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <52br3b5kg4.fsf@cisco.com> References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> Message-ID: <4318D511.9030305@ichips.intel.com> Roland Dreier wrote: > I take this to mean that it's fine if CQEs _are_ retrieved after a QP > is destroyed. Since a CQE does not have a pointer to the QP, but only > a QP number and a consumer-defined work request ID, I think this is > OK; there's no direct reference to a stale resource. I was actually thinking about the destruction of the CQ, not the QP. Thinking about it more, it seems unlikely, but couldn't a user destroy a QP followed by the CQ before ibv_get_cq_event() returns? - Sean From halr at voltaire.com Fri Sep 2 15:40:38 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 18:40:38 -0400 Subject: [Fwd: Re: [openib-general] kernel oops] In-Reply-To: <52ll2f3xwo.fsf@cisco.com> References: <43172EFA.7040709@xsigo.com> <4df28be405090212395a3804c3@mail.gmail.com> <1125690601.4398.7695.camel@hal.voltaire.com> <4df28be4050902135926d078f5@mail.gmail.com> <1125695082.4398.8037.camel@hal.voltaire.com> <52ll2f3xwo.fsf@cisco.com> Message-ID: <1125700447.4398.8068.camel@hal.voltaire.com> On Fri, 2005-09-02 at 18:30, Roland Dreier wrote: > Viswanath> max_mtu: invalid MTU (0) > Viswanath> active_mtu: invalid MTU (0) > > Hal> If the program is right and those are the real values, > Hal> somehow max_mtu is trashed which causes active_mtu to be > Hal> invalid which could break all sorts of things... > > No, it's just that the MTU is not reported by the underlying verbs, so > ibv_devinfo just prints '0'. Glad to hear it. 
No need to chase red herrings. -- Hal From rolandd at cisco.com Fri Sep 2 15:46:19 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 02 Sep 2005 15:46:19 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <4318D511.9030305@ichips.intel.com> (Sean Hefty's message of "Fri, 02 Sep 2005 15:41:22 -0700") References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> Message-ID: <52hdd33x6s.fsf@cisco.com> Sean> I was actually thinking about the destruction of the CQ, not Sean> the QP. Thinking about it more, it seems unlikely, but Sean> couldn't a user destroy a QP followed by the CQ before Sean> ibv_get_cq_event() returns? I guess so. I was actually also confusing myself between CQ events and CQ entries. Yeah, we probably need a ibv_put_cq_event() call as well. - R. From halr at voltaire.com Fri Sep 2 15:47:05 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 18:47:05 -0400 Subject: [openib-general] Re: IPoIB Multicast Connectivity In-Reply-To: <4318B5E9.9020606@dbresearch.net> References: <1125694369.4398.8005.camel@hal.voltaire.com> <4318B5E9.9020606@dbresearch.net> Message-ID: <1125700599.4398.8079.camel@hal.voltaire.com> Hi again Sean, On Fri, 2005-09-02 at 16:28, Sean Hubbell wrote: > I'll get the new version next week and then look into it. I'll try that > and let you know the results. If I have problems, I'll send the version > and we'll at least know what version of openib I have as I cannot find it. One more question/thing to try: How sure are you that you have good IB cables ? Are there others to swap in and out, especially on those 2 ports which seem to have trouble ? > On a side note, I could not ask for better assistance. Thanks Hal. Thanks.
-- Hal From halr at voltaire.com Fri Sep 2 16:08:53 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 19:08:53 -0400 Subject: [openib-general] Re: at won't compile with gcc-2.95 In-Reply-To: <527je08zi2.fsf@cisco.com> References: <52slwo91lm.fsf@cisco.com> <1125595270.4398.215.camel@hal.voltaire.com> <527je08zi2.fsf@cisco.com> Message-ID: <1125702531.4398.8121.camel@hal.voltaire.com> On Thu, 2005-09-01 at 13:32, Roland Dreier wrote: > > Not sure yet on what gcc 2.95 doesn't like about: > > #define WARN(fmt, arg ...) printk("ib_at: %s: " fmt "\n", __FUNCTION__, ## arg); > > but that is the cause of the other compile issue in at.c. > > Probably it doesn't like the extra space in WARN(fmt, arg ...). > Hmm... changing it to > > #define WARN(fmt, arg...) printk("ib_at: %s: " fmt "\n", __FUNCTION__, ## arg); > > fixes the original complaint, but then gcc 2.95 doesn't like things like: > > WARN("pending request not found in parent request!"); > > ie WARN() with no arg parameter. Not sure how to make gcc 2.95 happy > about that... Can you try the following patch with gcc 2.95 and let me know if this works ? Thanks. -- Hal Use __VA_ARGS__ so will build with gcc-2.95 Signed-off-by: Hal Rosenstock Index: at_priv.h =================================================================== --- at_priv.h (revision 3295) +++ at_priv.h (working copy) @@ -137,9 +137,10 @@ #define DEBUG(fmt, ...) while (0) {} #define DEBUG_VAR(x, y...) -#define WARN(fmt, arg ...) printk("ib_at: %s: " fmt "\n", __FUNCTION__, ## arg); +#define WARN(fmt, ...) printk("ib_at: %s: " fmt "\n", __FUNCTION__, ## __VA_ARGS__) #define WARN_VAR(x, y...) x, ## y -//#define DEBUG(fmt, arg ...) printk("ib_at: %s: " fmt "\n", __FUNCTION__, ## arg); + +//#define DEBUG(fmt, ...) printk("ib_at: %s: " fmt "\n", __FUNCTION__, ## __VA_ARGS__) //#define DEBUG_VAR(x, y...) 
x, ## y static kmem_cache_t *route_req_cache = NULL; From halr at voltaire.com Fri Sep 2 18:56:20 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Sep 2005 21:56:20 -0400 Subject: [openib-general] RE: [openib-commits] r3137 -gen2/trunk/src/linux-kernel/infiniband/ulp/ipoib In-Reply-To: References: <5CE025EE7D88BA4599A2C8FEFCF226F5175BDA@taurus.voltaire.com> Message-ID: <1125712580.4398.8182.camel@hal.voltaire.com> Hi Roland, On Sat, 2005-08-20 at 16:10, Roland Dreier wrote: > [Resending because I forgot to reply to all] > > > What happens when the port is not a full member of the partition (and only > > a partial member) ? Is it just that the SA should reject those requests or > > does some other failure occur ? > > I'm not sure I understand the question. This patch fixes the situation where > one host has, say, 0xffff in its P_Key table and another host has 0x7fff. They > should be able to talk to each other, but with the old code the second host > would join the wrong broadcast group and not be able to exchange ARPs with > the first host. Will 2 limited members now be able to talk with each other ? If this change makes the MC requests to the full partition for limited members, that seems correct (as limited MC groups are useless). If it causes/also causes the UD AVs used to send to have their limited PKey to be promoted to full member, then that is not fine. Does it do the latter as well as the former ? 
-- Hal From rolandd at cisco.com Fri Sep 2 19:21:44 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 02 Sep 2005 19:21:44 -0700 Subject: [openib-general] RE: [openib-commits] r3137 -gen2/trunk/src/linux-kernel/infiniband/ulp/ipoib In-Reply-To: <1125712580.4398.8182.camel@hal.voltaire.com> (Hal Rosenstock's message of "02 Sep 2005 21:56:20 -0400") References: <5CE025EE7D88BA4599A2C8FEFCF226F5175BDA@taurus.voltaire.com> <1125712580.4398.8182.camel@hal.voltaire.com> Message-ID: <52d5nq51s7.fsf@cisco.com> Hal> Will 2 limited members now be able to talk with each other ? No, that's not possible. Hal> If this change makes the MC requests to the full partition Hal> for limited members, that seems correct (as limited MC groups Hal> are useless). If it causes/also causes the UD AVs used to Hal> send to have their limited PKey to be promoted to full Hal> member, then that is not fine. Does it do the latter as well Hal> as the former ? No, that's not possible either. An HCA port can only send UD messages with one of the P_Keys programmed into its P_Key table. - R. From iod00d at hp.com Fri Sep 2 22:42:38 2005 From: iod00d at hp.com (Grant Grundler) Date: Fri, 2 Sep 2005 22:42:38 -0700 Subject: [openib-general] RDMA Generic Connection Management In-Reply-To: <52hdd8lb1e.fsf@cisco.com> References: <1125323947.6584.106.camel@r2d2> <431374D4.5080909@ichips.intel.com> <52hdd8lb1e.fsf@cisco.com> Message-ID: <20050903054238.GA13440@esmail.cup.hp.com> On Mon, Aug 29, 2005 at 01:53:49PM -0700, Roland Dreier wrote: > Sean> To focus on something a little different... do we want an > Sean> API that returns a pointer to a device structure? > Sean> Specifically, how does this affect device removal? > > Hey, that's a really good point. We should make sure that our API > makes it easy to handle device hotplug. 
> > One solution is to start reference counting device references, but > that inevitably leads to bugs in ULPs -- protocol authors won't get it > right unless we make it really easy. And I don't see how to make the > reference counting trivial. > > Anyone have a better idea? Maybe let the kernel return ENODEV (or something equivalent) if the hotplug occurs before the connection has opened. Caller could fail (un-)gracefully or try again with the hope it would get a working device on the next try. grant From halr at voltaire.com Sat Sep 3 12:43:42 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Sep 2005 15:43:42 -0400 Subject: [openib-general] OpenSM 1.8.0 merge status Message-ID: <1125776622.4398.8986.camel@hal.voltaire.com> Hi Yael, I've made my way through the rest of the OpenSM changes. First, I'd like to thank you for the great job you did merging the OpenIB OpenSM up to 1.8.0. I hope that once I check things back into the trunk, this can be the basis for future OpenSM development rather than some other code base. Aside from the issue with the 4x port operating at 1x, which I will get back to shortly, I also need to do some testing with Solaris 10. With that and the answers, I should be ready to check things back into the trunk early next week (Monday is a holiday), so I am shooting for Wednesday. I do have some additional comments (including nits) and questions based on inspection of the changes: 1. include/opensm/osm_sm.h typedef struct _osm_sm { osm_thread_state_t thread_state; // osm_sm_state_t state; Should state be removed ? 2.
osm_port.c has the following: /* allocate enough SL2VL tables */ if (osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH) { /* we need node num ports + 1 SL2VL tables */ num_slvl = osm_node_get_num_physp( p_node ) + 1; } else { /* An end node - we need only one SL2VL */ num_slvl = osm_node_get_num_physp( p_node ) + 1; } Seems like it should just be either: num_slvl = osm_node_get_num_physp( p_node ) + 1; or the end node case should be different per the comment. 3. osm_subnet.c /* by default we will consider waiting for 20x transaction timeout normal */ p_opt->max_msg_fifo_timeout = 50*OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC; The comment and code are inconsistent. Also, p_opt->single_thread = TRUE; /* HACK : Modified until SMP bug is resolved */ Is the comment accurate ? What's the SMP bug ? Can this run multithreaded ? 4. osm_db_files.c: cl_list_init( &p_db->domains, 5 ); Should 5 be defined ? Should the test code at the bottom of this file be separated out somewhere else ? 5. osm_helper.c Should Infinicon now be listed as SilverStorm ? 6. osm_multicast.c: In osm_mcast_mgr_process_mgrp status = osm_mcast_mgr_process_tree( p_mgr, p_mgrp, req_type, port_guid ); if( status != IB_SUCCESS ) { CL_PLOCK_RELEASE( p_mgr->p_lock ); Should this lock release be removed as the other one was ? 7. osm_mad_pool.c osm_mad_pool_destroy( IN osm_mad_pool_t* const p_pool ) { CL_ASSERT( p_pool ); /* HACK: we still rarely see some mads leaking - so ignore this */ /* cl_qlock_pool_destroy( &p_pool->madw_pool ); */ Should this be removed ? Also, do MADs leak ? 8.
osm_ucast_updn.c #if 0 /* This function insert a new element by guid index with rank value into the qmap list */ int __updn_insert_rank ( OUT cl_qmap_t *p_guid_rank_tbl, IN ib_net64_t guid_index, IN uint8_t rank) { cl_status_t status; updn_rank_t *p_rank,*p_check; p_rank = (updn_rank_t*) cl_malloc(sizeof(updn_rank_t)); CL_ASSERT (p_rank != NULL); p_rank->rank = rank; p_check = (updn_rank_t*) cl_qmap_insert(p_guid_rank_tbl, guid_index , &p_rank->map_item); /* No check for same key required since we support mutiple guids ranking */ return 0; } #endif Should this be removed ? Also in __updn_bfs_by_node /* make sure that all five of the following occur: 1. The port isn't NULL 2. The port is a valid port */ It looks like it is 2 rather than 5 (comment). 9. osm_sa_mad_ctrl.c __osm_sa_mad_ctrl_rcv_callback switch( p_sa_mad->method ) { case IB_MAD_METHOD_REPORT_RESP: ... if (p_req_madw) osm_mad_pool_put( p_ctrl->p_mad_pool, p_req_madw ); osm_mad_pool_put( p_ctrl->p_mad_pool, p_madw ); Should the second duplicate call be eliminated ? 10. osm_sm_mad_ctrl.c __osm_sm_mad_ctrl_send_err_cb /* For now - do not add the alternate dr path to the release */ if (0) Should this code be removed ? 11. osm_sa_service_record.c osm_sr_rcv_process_set_method if( (comp_mask & ( IB_SR_COMPMASK_SID | IB_SR_COMPMASK_SGID )) != (IB_SR_COMPMASK_SID | IB_SR_COMPMASK_SGID )) What about SGID and PKEY as well ? Looks like there is some code for PKEY mask not set for the response. 12. I also have a question about files/database and file/database formats: For those files which are input (and perhaps output ones) as well, is there some version number kept ? How is compatibility going forward dealt with (especially if it is a generated file whose format may change) ? Others: You might want to change the permission on st.c in merge branch (no +x) Please be careful about adding extra whitespace to files. The questions about the osmtest changes are still pending. Thanks again for your efforts. 
-- Hal From pfister at us.ibm.com Sat Sep 3 20:58:57 2005 From: pfister at us.ibm.com (Greg Pfister) Date: Sat, 3 Sep 2005 23:58:57 -0400 Subject: [openib-general] Re: [mgtwg] RMPP Middle Segments Payload Length In-Reply-To: <1125406230.4401.1178.camel@hal.voltaire.com> Message-ID: Hal, I think you're right. The language currently in the spec implies that for intermediate packets PayloadLength is ignored on receive, but I don't think it anywhere explicitly reconfirms that, as a reserved field, it should be set to 0 on transmit. My take is that, as a result, on that field it wouldn't be as straightforward to use the technique we used elsewhere -- have nonzero in a previously-reserved field turn on some new optional function. But I don't think there's any other effect of that spec looseness. And using that field for such a purpose, only on intermediate packets, strikes me as pretty unlikely. Greg Pfister IBM Distinguished Engineer, Member IBM Academy of Technology IBM Systems & Technology Group, Server Technology & Architecture (512) 838-8338 | IBM tieline 678-8338 | FAX (512) 838-3418 Sic Crustulum Frangitur Hal Rosenstock wrote on 08/30/2005 07:50:31 AM: > Hi Greg, > > In addition to the question about whether the first packet Payload > Length only includes valid bytes in Transferred Data or all bytes in all > Transferred Data in all sent segments in the case of a > multipacket/segment send, there is also a question about the Payload > Length in middle segments/packets. It looks to me like there is just a > comment about the Payload Length being valid in first (optional) and > last (mandatory) segments/packets. So that means it is ignored on > receive but does it need to be set to 0 on transmit ? It seems possibly > different from a reserved field in those cases by language in the spec > but I'm not sure whether this is the case or not. > > Thanks. > > -- Hal > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5207 bytes Desc: S/MIME Cryptographic Signature URL: From pfister at us.ibm.com Sat Sep 3 21:13:11 2005 From: pfister at us.ibm.com (Greg Pfister) Date: Sun, 4 Sep 2005 00:13:11 -0400 Subject: [openib-general] Re: [mgtwg] Payload Length in first RMPP sent segment In-Reply-To: <1125362437.4401.161.camel@hal.voltaire.com> Message-ID: Hal Rosenstock wrote on 08/29/2005 07:40:38 PM: > Hi Greg, > > On Mon, 2005-08-29 at 18:56, Greg Pfister wrote: > > Hal, > > > > My take is that there's no ambiguity. Then again, I wrote it, so I > > would think that, right? :-) > > > > The idea is that we're trying to allow *either* of the usual two > > options for specifying a string of stuff: (a) Start out by giving the > > length; or (b) go until you reach a special mark meaning "the end." > > The latter being "streaming" mode. Or C string mode. > > The thing is it gets complicated when there is only one packet. So > > take two cases: >1 packet, and ==1 packet. > > It seems more complicated (perhaps 2 options when there is more than 1 > packet). > > > length >1 packet: > > > > -- PayloadLength <> 0 on 1st packet means case (a). Just read until > > you get that many bytes, which may use only part of the last packet. > > If the last packet isn't also marked last, scream about inconsistency. > > So if one is using this option, does the payload length in the 1st > packet reduced by 220 * (number of packets - 1) need to match the > payload length in the last packet ? Yes, if by "match" you mean that the number of bytes indicated in the last packet, when added to all the prior bytes, equals the 1st packet's PayloadLength. You don't literally have the two PayloadLength fields matching bit-for-bit. > That's a slightly different > inconsistency from the packet not being marked last but the original > length not exhausted. 
I agree, you could justifiably argue that it is a different inconsistency. However, the error codes built into the spec lump the two cases together. > > -- PayloadLength=0 on first packet - case (b). Read until you get a > > marked last packet. PayloadLength in that last packet tells you how > > many are valid in that packet (zero in that case -- I'm not sure; > > whole packet, I think). > > For SA, wouldn't anything less than 20 would be an error in the last > packet ? Yes. > If it were 20, it would be legal but an inefficient > implementation (as really the previous packet was full and could have > terminated the RMPP send). Yes, in streaming / C string mode. No way for the receiver to know it's over until it gets the "last packet" code. In "total count in the first packet" mode, I think implementations *shall* scream inconsistency if the packet actually containing the last byte isn't marked as the last packet. > > length ==1 packet meaning RMPPFlags.Last=1 and RMPPFlags.First=1 in > > the same packet. > > > > -- Interpretation is the same as the "last packet" case above, i.e., > > RMPPFlags.Last=1 dominates the interpretation. > > > > As far as I know, that's it. Any comments from others? > > > > (This may not forward to openib-general, since I'm not on that list; > > if it doesn't please forward.) > > It made it to openib. It's an open list as far as posting goes. OK. So I guess this one will, too. > Thanks. > > > > Hal Rosenstock > > > > 08/29/2005 08:14 AM > > To > > mgtwg at infinibandta.org > > cc > > openib-general at openib.org > > Subject > > [mgtwg] Payload > > Length in first > > RMPP sent segment > > > > > > > > Hi, > > > > On the RMPP send side, while the Payload Length field in the last > > segment is clear that it indicates the number of valid bytes in > > Transferred Data, there seems to be some ambiguity in the optional > > Payload Length field in the first segment. I think it can work either
I think it can work either > > way but I also think the intent was to reflect the valid bytes. Maybe > > it > > is this way to allow flexibility (choice in the implementation). What > > is > > the correct interpretation ? Should I enter a comment on this ? > > Thanks. > > > > -- Hal > > > > IBA 1.2 p.775 line 37 > > > > In the first packet of an RMPP transfer (RMPPFlags.First=1), > > PayloadLength may indicate the sum of the lengths, in bytes, of the > > TransferredData fields in all packets of the entire multipacket > > response; this is done by using a nonzero value for PayloadLength in > > the > > first packet. > > > > IBA 1.2 p. 776 line 8 > > > > In the last packet of an RMPP transfer (RMPPFlags.Last=1), > > PayloadLength > > indicates the number of valid bytes in the TransferredData field, > > allowing data transfers that are not an integral multiple of the > > length > > of the TransferredData field. A transfer terminates when either: (a) a > > packet containing RMPPFlags.Last=1 is received; or (b) a nonzero > > PayloadLength was given in the first packet of a transfer, and a > > packet > > is received containing sufficient TransferredData bytes to equal or > > exceed the PayloadLength originally provided. If case (b) occurs and > > RMPPFlags.Last is not 1 for that packet, the Receiver sends an ABORT > > packet with RMPPStatus of "Inconsistent Last and PayloadLength" and > > terminates the transfer. Greg Pfister http://pfister.userv.ibm.com/ IBM Distinguished Engineer, Member IBM Academy of Technology IBM Systems & Technology Group, Server Technology & Architecture (512) 838-8338 | IBM tieline 678-8338 | FAX (512) 838-3418 Sic Crustulum Frangitur -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/x-pkcs7-signature Size: 5207 bytes Desc: S/MIME Cryptographic Signature URL: From halr at voltaire.com Sun Sep 4 05:30:02 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 04 Sep 2005 08:30:02 -0400 Subject: [openib-general] MgtWG RMPP Answers Message-ID: <1125836964.4398.9193.camel@hal.voltaire.com> Hi Sean, Here's my quick summary of Greg's answers on RMPP: 1. The first payload length is now correct as changed. 2. The middle payload lengths do not need to be set to 0. Should that be backed out ? -- Hal From halr at voltaire.com Sun Sep 4 05:33:18 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 04 Sep 2005 08:33:18 -0400 Subject: [openib-general] [Fwd: [openib-commits] r3298 - gen2/branches/osm-1.8.0-merge/src/userspace/management/osm/opensm] Message-ID: <1125836979.4398.9195.camel@hal.voltaire.com> If this was done by osm_check_n_fix, it is inserting a new line ahead of the license :-( -----Forwarded Message----- From: eitan at openib.org To: openib-commits at openib.org Subject: [openib-commits] r3298 - gen2/branches/osm-1.8.0-merge/src/userspace/management/osm/opensm Date: 03 Sep 2005 22:48:51 -0700 Author: eitan Date: 2005-09-03 22:48:50 -0700 (Sat, 03 Sep 2005) New Revision: 3298 Modified: gen2/branches/osm-1.8.0-merge/src/userspace/management/osm/opensm/osm_opensm.c gen2/branches/osm-1.8.0-merge/src/userspace/management/osm/opensm/osm_sm.c gen2/branches/osm-1.8.0-merge/src/userspace/management/osm/opensm/osm_sm_state_mgr.c gen2/branches/osm-1.8.0-merge/src/userspace/management/osm/opensm/osm_state_mgr.c Log: Using the comment /* Format Waved */ to avoid osm_check_n_fix error on the intentional exceptional message...
Modified: gen2/branches/osm-1.8.0-merge/src/userspace/management/osm/opensm/osm_opensm.c =================================================================== --- gen2/branches/osm-1.8.0-merge/src/userspace/management/osm/opensm/osm_opensm.c 2005-09-04 05:38:34 UTC (rev 3297) +++ gen2/branches/osm-1.8.0-merge/src/userspace/management/osm/opensm/osm_opensm.c 2005-09-04 05:48:50 UTC (rev 3298) @@ -1,3 +1,4 @@ + /* * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. * Copyright (c) 2002-2005 Mellanox Technologies LTD. All rights reserved. From eitan at mellanox.co.il Sun Sep 4 05:40:56 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sun, 4 Sep 2005 15:40:56 +0300 Subject: [openib-general] RE: [Fwd: [openib-commits] r3298 - gen2/branches/osm-1.8.0-merge/ src/userspace/management/osm/opensm] Message-ID: <506C3D7B14CDD411A52C00025558DED607C3076E@mtlex01.yok.mtl.com> Actually it is done by osm_indent. How troubling is it? > If this was done by osm_check_n_fix, it is inserting a new line ahead of > the license :-( From mst at mellanox.co.il Sun Sep 4 07:14:37 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 4 Sep 2005 17:14:37 +0300 Subject: [openib-general] Re: Re: ibv_get_async_event In-Reply-To: <52hdd33x6s.fsf@cisco.com> References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> Message-ID: <20050904141437.GT1707@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: Re: ibv_get_async_event > > Sean> I was actually thinking about the destruction of the CQ, not > Sean> the QP.
Thinking about it more, it seems unlikely, but > Sean> couldn't a user destroy a QP followed by the CQ before > Sean> ibv_get_cq_event() returns? > > I guess so. I was actually also confusing myself between CQ events > and CQ entries. Yeah, we probably need a ibv_put_cq_event() call as > well. > > - R. If I'm not mistaken, the idea is that ibv_put_cq_event will release a cq reference and allow it to be destroyed? My problem with this approach is that it adds overhead to the data-path cq event rather than to the non-data-path cq destroy. If that's what we are trying to do, I'd like to propose another idea: when a cq is destroyed, and after all its cq events have been queued to the user, put a special "cq destroyed" event into the event queue. When a user calls destroy_cq, the library releases the kernel/hardware resources and moves the cq structure to a special cleanup list; the userspace library only destroys the userspace cq structure when it gets this event. Instead of reference counting, we require the user to poll for this event after destroying the cq. Since all events are polled from a single thread, when the library sees this event it knows the user does not keep a pointer to the cq any more. Let me know if this makes sense and I can code up a patch. Hope this helps, MST -- MST From halr at voltaire.com Sun Sep 4 08:57:39 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 04 Sep 2005 11:57:39 -0400 Subject: [openib-general] [PATCH] iSER: Make iser depend on kdapl Message-ID: <1125849459.4418.18.camel@hal.voltaire.com> iSER: Make iser depend on kdapl Signed-off-by: Hal Rosenstock Index: Kconfig =================================================================== --- Kconfig (revision 3293) +++ Kconfig (working copy) @@ -1,6 +1,6 @@ config INFINIBAND_ISER tristate "ISCSI RDMA Protocol" - depends on INFINIBAND && SCSI + depends on INFINIBAND && KDAPL_INFINIBAND && SCSI ---help--- Support for the ISCSI RDMA Protocol over InfiniBand. This @@ -9,4 +9,3 @@ The ISER protocol is defined by IETF. See .
- From halr at voltaire.com Sun Sep 4 09:00:56 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 04 Sep 2005 12:00:56 -0400 Subject: [openib-general] RE: [Fwd: [openib-commits] r3298 - gen2/branches/osm-1.8.0-merge/ src/userspace/management/osm/opensm] In-Reply-To: <506C3D7B14CDD411A52C00025558DED607C3076E@mtlex01.yok.mtl.com> References: <506C3D7B14CDD411A52C00025558DED607C3076E@mtlex01.yok.mtl.com> Message-ID: <1125849615.4418.20.camel@hal.voltaire.com> On Sun, 2005-09-04 at 08:40, Eitan Zahavi wrote: > > If this was done by osm_check_n_fix, it is inserting a new line > ahead of > > the license :-( > Actually it is done by osm_indent > How troubling is it? Not so troubling but it would be nice if this weren't introduced into every file. Is the "answer" to hand edit it out afterwards ? -- Hal From rajib.majumder at csfb.com Sun Sep 4 20:01:35 2005 From: rajib.majumder at csfb.com (Majumder, Rajib) Date: Mon, 5 Sep 2005 11:01:35 +0800 Subject: [openib-general] SDP message loss Message-ID: hi, We are doing some proof of concept over SDP. During that we noticed that not all messages are received by the data sink. Can there be a message loss in SDP? If yes, what could be the possible reasons? In this case, is there any diagnostic tool, eg. a packet filter, that can give us some idea? We are using RHAS 3.0. thanks. Rajib ============================================================================== Please access the attached hyperlink for an important electronic communications disclaimer: http://www.csfb.com/legal_terms/disclaimer_external_email.shtml ============================================================================== From tomduffy at gmail.com Sun Sep 4 20:15:31 2005 From: tomduffy at gmail.com (Tom Duffy) Date: Sun, 4 Sep 2005 20:15:31 -0700 Subject: [openib-general] SDP message loss In-Reply-To: References: Message-ID: <9d3b7de705090420152dece465@mail.gmail.com> On 9/4/05, Majumder, Rajib wrote: > hi, > > We are doing some proof of concept over SDP.
During that we noticed that not all messages are received by the data sink. > Can there be a message loss in SDP? If yes, what could be the possible reasons? No, there shouldn't be a loss. SDP uses the reliable connected mode of InfiniBand. It is possible that there is a bug in the software, but the hardware should be delivering all of the bits. > In this case, is there any diagnostic tool, eg. a packet filter, that can give us some idea? Have you tried some of the simpler test ulps? Or DAPL which also uses RC? That would help narrow it down to SDP. > We are using RHAS 3.0. Hrm, that is old and 2.4-based. Not that it should really matter as I am assuming you are compiling your own kernel...and got all the 2.6 userland stuff on there. But, in any event, that shouldn't have any impact on this problem. -tduffy From rajib.majumder at csfb.com Sun Sep 4 20:33:12 2005 From: rajib.majumder at csfb.com (Majumder, Rajib) Date: Mon, 5 Sep 2005 11:33:12 +0800 Subject: [openib-general] SDP message loss Message-ID: -----Original Message----- From: Tom Duffy [mailto:tomduffy at gmail.com] Sent: 05 September 2005 11:16 To: Majumder, Rajib Cc: openib-general at openib.org Subject: Re: [openib-general] SDP message loss On 9/4/05, Majumder, Rajib wrote: > hi, > > We are doing some proof of concept over SDP. During that we noticed that not all messages are received by the data sink. > Can there be a message loss in SDP? If yes, what could be the possible reasons? No, there shouldn't be a loss. SDP uses the reliable connected mode of InfiniBand. It is possible that there is a bug in the software, but the hardware should be delivering all of the bits. In case of RNR NAK/PSN error NAK/fatal error NAK, when the data source's RC QP's retry count is exhausted, what would be the fate of the packet? > In this case, is there any diagnostic tool, eg. a packet filter, that can give us some idea? Have you tried some of the simpler test ulps? Or DAPL which also uses RC?
That would help narrow it down to SDP. > We are using RHAS 3.0. Hrm, that is old and 2.4-based. Not that it should really matter as I am assuming you are compiling your own kernel...and got all the 2.6 userland stuff on there. But, in any event, that shouldn't have any impact on this problem. -tduffy From rajib.majumder at csfb.com Sun Sep 4 20:59:56 2005 From: rajib.majumder at csfb.com (Majumder, Rajib) Date: Mon, 5 Sep 2005 11:59:56 +0800 Subject: [openib-general] SDP message loss Message-ID: hi tom, In case of RNR NAK/PSN error NAK/fatal error NAK, when the data source's RC QP's retry count is exhausted, what would be the fate of the packet? thanks. rajib -----Original Message----- From: Tom Duffy [mailto:tomduffy at gmail.com] Sent: 05 September 2005 11:16 To: Majumder, Rajib Cc: openib-general at openib.org Subject: Re: [openib-general] SDP message loss On 9/4/05, Majumder, Rajib wrote: > hi, > > We are doing some proof of concept over SDP. During that we noticed that not all messages are received by the data sink. > Can there be a message loss in SDP? If yes, what could be the possible reasons? No, there shouldn't be a loss. SDP uses the reliable connected mode of InfiniBand. It is possible that there is a bug in the software, but the hardware should be delivering all of the bits. > In this case, is there any diagnostic tool, eg. a packet filter, that can give us some idea? Have you tried some of the simpler test ulps? Or DAPL which also uses RC? That would help narrow it down to SDP. > We are using RHAS 3.0. Hrm, that is old and 2.4-based.
Not that it should really matter as I am assuming you are compiling your own kernel...and got all the 2.6 userland stuff on there. But, in any event, that shouldn't have any impact on this problem. -tduffy From eitan at mellanox.co.il Sun Sep 4 22:42:56 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 05 Sep 2005 08:42:56 +0300 Subject: [openib-general] Re: [Fwd: [openib-commits] r3298 - gen2/branches/osm-1.8.0-merge/ src/userspace/management/osm/opensm] In-Reply-To: <1125849615.4418.20.camel@hal.voltaire.com> References: <506C3D7B14CDD411A52C00025558DED607C3076E@mtlex01.yok.mtl.com> <1125849615.4418.20.camel@hal.voltaire.com> Message-ID: <431BDAE0.7080603@mellanox.co.il> Hal Rosenstock wrote: > On Sun, 2005-09-04 at 08:40, Eitan Zahavi wrote: > > Not so troubling but it would be nice if this weren't introduced into > every file. Is the "answer" to hand edit it out afterwards ? I updated osm_indent to avoid this first empty line (on the branch). > > -- Hal From danb at voltaire.com Mon Sep 5 00:00:58 2005 From: danb at voltaire.com (Dan Bar Dov) Date: Mon, 5 Sep 2005 10:00:58 +0300 Subject: [openib-general] RE: [PATCH] iSER: Make iser depend on kdapl Message-ID: Merged. Also removed the SCSI dependency; ISER really depends on ISCSI, but that is not in the kernel yet.
Dan > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Sunday, September 04, 2005 6:58 PM > To: Dan Bar Dov > Cc: openib-general at openib.org > Subject: [PATCH] iSER: Make iser depend on kdapl > > iSER: Make iser depend on kdapl > > Signed-off-by: Hal Rosenstock > > Index: Kconfig > =================================================================== > --- Kconfig (revision 3293) > +++ Kconfig (working copy) > @@ -1,6 +1,6 @@ > config INFINIBAND_ISER > tristate "ISCSI RDMA Protocol" > - depends on INFINIBAND && SCSI > + depends on INFINIBAND && KDAPL_INFINIBAND && SCSI > ---help--- > > Support for the ISCSI RDMA Protocol over InfiniBand. This > @@ -9,4 +9,3 @@ > > The ISER protocol is defined by IETF. > See . > - > > > From mst at mellanox.co.il Mon Sep 5 01:18:17 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 5 Sep 2005 11:18:17 +0300 Subject: [openib-general] Re: SDP message loss In-Reply-To: References: Message-ID: <20050905081817.GE17291@mellanox.co.il> Quoting r. Majumder, Rajib : > Subject: SDP message loss > > hi, > > We are doing some proof of concept over SDP. During that we noticed that not all messages are received by the data sink. Do you mean that reading from the socket returns less bytes than you have written into it? > Can there be a message loss in SDP? No, if the retry counter is exceeded, the QP will be closed and you will notice since no more data will get transferred. > If yes, what could be the possible reasons? Software bug. > In this case, is there any diagnostic tool, eg. a packet filter, that can give us some idea? IB analyser is what I use in such cases. > > We are using RHAS 3.0. Is that 2.4 based? > > thanks. > > Rajib -- MST From rajib.majumder at csfb.com Mon Sep 5 01:26:28 2005 From: rajib.majumder at csfb.com (Majumder, Rajib) Date: Mon, 5 Sep 2005 16:26:28 +0800 Subject: [openib-general] RE: SDP message loss Message-ID: hi mst, -----Original Message----- From: Michael S.
Tsirkin [mailto:mst at mellanox.co.il] Sent: 05 September 2005 16:18 To: Majumder, Rajib Cc: 'openib-general at openib.org' Subject: Re: SDP message loss Quoting r. Majumder, Rajib : > Subject: SDP message loss > > hi, > > We are doing some proof of concept over SDP. During that we noticed that not all messages are received by the data sink. Do you mean that reading from the socket returns less bytes than you have written into it? we are writing a large number of messages. sometimes, some of the messages are lost, i.e. not received by the sink at all. > Can there be a message loss in SDP? No, if the retry counter is exceeded, the QP will be closed and you will notice since no more data will get transferred. > If yes, what could be the possible reasons? Software bug. what software are you talking about here? our application or SDP? > In this case, is there any diagnostic tool, eg. a packet filter, that can give us some idea? IB analyser is what I use in such cases. > > We are using RHAS 3.0. Is that 2.4 based? Yes. > > thanks. > > Rajib -- MST From mst at mellanox.co.il Mon Sep 5 01:35:15 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 5 Sep 2005 11:35:15 +0300 Subject: [openib-general] Re: SDP message loss In-Reply-To: References: Message-ID: <20050905083515.GH17291@mellanox.co.il> Quoting Majumder, Rajib : > > > We are doing some proof of concept over SDP. During that we noticed > > > that not all messages are received by the data sink. > > > > Do you mean that reading from the socket returns less bytes than you > > have written into it? > > we are writing a large number of messages. sometimes, some of the
sometime, some of the > messages are lost i.e not received by sink at all. > > > > > If yes, what could be the possible reasons? > > > > > Software bug. > > > what software are you talking about here? our application or SDP? Could be either one, I guess :). > > We are using RHAS 3.0. > > Is that 2.4 based? > > Yes. Did you backport gen2 to 2.4 then? I'll be interested to see the backport patches. -- MST From rajib.majumder at csfb.com Mon Sep 5 01:51:33 2005 From: rajib.majumder at csfb.com (Majumder, Rajib) Date: Mon, 5 Sep 2005 16:51:33 +0800 Subject: [openib-general] RE: SDP message loss Message-ID: hi mst, -----Original Message----- From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] Sent: 05 September 2005 16:35 To: Majumder, Rajib Cc: 'openib-general at openib.org' Subject: Re: SDP message loss Quoting Majumder, Rajib : > > > We are doing some proof of concept over SDP. During that we noticed > > > that not all messages are received by the data sink. > > > > Do you mean that reading from the socket returns less bytes than you > > have written into it? > > we are writing large number of messages. sometime, some of the > messages are lost i.e not received by sink at all. > > > > > If yes, what could be the possible reasons? > > > > > Software bug. > > > what software are you talking about here? our application or SDP? Could be either one, I guess :). We never noticed this kind of message loss when we run our application over TCP/IP. > > We are using RHAS 3.0. > > Is that 2.4 based? > > Yes. Did you backport gen2 to 2.4 then? I'll be interested to see the backport patches. Do you think gen2 on 2.4 could be the reason? 
-- MST From rajib.majumder at csfb.com Mon Sep 5 02:15:42 2005 From: rajib.majumder at csfb.com (Majumder, Rajib) Date: Mon, 5 Sep 2005 17:15:42 +0800 Subject: [openib-general] RE: SDP message loss Message-ID: hi mst, What's the latest stable revision of SDP? What rev of gen2 stack is recommended at a minimum for RHAS 3.0? Thanks. Rajib -----Original Message----- From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] Sent: 05 September 2005 16:35 To: Majumder, Rajib Cc: 'openib-general at openib.org' Subject: Re: SDP message loss Quoting Majumder, Rajib : > > > We are doing some proof of concept over SDP. During that we noticed > > > that not all messages are received by the data sink. > > > > Do you mean that reading from the socket returns less bytes than you > > have written into it? > > we are writing a large number of messages. sometimes, some of the > messages are lost, i.e. not received by the sink at all. > > > > > If yes, what could be the possible reasons? > > > > > Software bug. > > > what software are you talking about here? our application or SDP? Could be either one, I guess :). > > We are using RHAS 3.0. > > Is that 2.4 based? > > Yes. Did you backport gen2 to 2.4 then? I'll be interested to see the backport patches.
-- MST From rajib.majumder at csfb.com Mon Sep 5 03:38:39 2005 From: rajib.majumder at csfb.com (Majumder, Rajib) Date: Mon, 5 Sep 2005 18:38:39 +0800 Subject: [openib-general] SDP Revision Message-ID: hi, What's the latest stable revision of SDP? What rev of gen2 stack is recommended at a minimum for RHEL AS 3.0? Thanks. Rajib From mst at mellanox.co.il Mon Sep 5 04:03:51 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 5 Sep 2005 14:03:51 +0300 Subject: [openib-general] Re: SDP message loss In-Reply-To: References: Message-ID: <20050905110351.GO17291@mellanox.co.il> Quoting Majumder, Rajib : > Subject: RE: SDP message loss > > hi mst, > > What's the latest stable revision of SDP? What rev of gen2 stack is > recommended at a minimum for RHAS 3.0? > > Thanks. > > Rajib Hi, Personally I am using svn rev 3300 without problems. My system is SuSe pro 9.3 with kernel 2.6.13. Gen2 is kernel 2.6 only. -- MST From mst at mellanox.co.il Mon Sep 5 06:47:43 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 5 Sep 2005 16:47:43 +0300 Subject: [openib-general] QP 000404 not found in MGM Message-ID: <20050905134743.GQ17291@mellanox.co.il> Hi! I see this if opensm is killed when I ping over IPoIB.
ib_mthca 0000:07:00.0: QP 000404 not found in MGM ib0: ib_detach_mcast failed (result = -22) ib0: ipoib_mcast_detach failed (result = -22) Are these benign messages? -- MST From eitan at mellanox.co.il Mon Sep 5 08:16:39 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 05 Sep 2005 18:16:39 +0300 Subject: [openib-general] Re: OpenSM 1.8.0 Merge Status and Operational Issue In-Reply-To: <1125690178.4398.7656.camel@hal.voltaire.com> References: <1125609366.4398.1014.camel@hal.voltaire.com> <4317ECF6.9070507@mellanox.co.il> <1125690178.4398.7656.camel@hal.voltaire.com> Message-ID: <431C6157.50500@mellanox.co.il> Hi Hal, We were looking at the section below and were not sure we understood. Is your setup such that a 4x HCA is connected with a 4x -> 4* 1x "split" cable to a 1x analyzer and then on the other side using a similar cable back to a 4x HCA ? In that case which 1x plug out of the 4 are you using to connect to the Analyzer? Also we were looking at the differences between the pre 1.8.0 and the latest code and we concluded that there is a "missing feature" in the 1.8.0 code: OpenSM (from version 1.7.0 or so) does not do multiple sweeps in case of error - unless it has the "errors during initialization" flag set. This prevents repetitive sweeps if a port is not responding... The missing feature - not to say bug - is that in the case of a SetResp (yes we are able to differentiate it from GetResp) returned by the target SMA with status != 0, the current code does not declare the fabric initialization as erroneous. So in our example - if for some reason the 1x configuration requires a link reset due to the 4x -> 1x and the link responds with a Set error, the new OpenSM misses the Set failure and thus fails to bring the link up. However, it is not clear to me why a second sweep is required in the first place. Yael will be working on it. It might take us some time as we need a special setup to reproduce the problem.
Meanwhile I think Yael will provide a simple patch for considering the SetResp error as a valid cause for "errors during initialization". Eitan Hal Rosenstock wrote: > >>>I have a 4x HCA port (1x/4x LinkWidthEnable and Supported) connected via >>>a 1x analyzer connected to a switch (so is 1x LinkWidthActive). >>>OpenSM does not seem to want to bring this port up. It tries once and >>>gives up until the physical link is cycled (cable pull and reinsertion). >>>It does work running over a 4x link with 4x neighbor ports. From tomduffy at gmail.com Mon Sep 5 08:39:44 2005 From: tomduffy at gmail.com (Tom Duffy) Date: Mon, 5 Sep 2005 08:39:44 -0700 Subject: [openib-general] SDP Revision In-Reply-To: References: Message-ID: <9d3b7de705090508394e31e1ba@mail.gmail.com> On 9/5/05, Majumder, Rajib wrote: > hi, > > What's the latest stable revision of SDP? What rev of gen2 stack is recommended at a minimum for RHEL AS 3.0? Just go for the latest version of SDP. There haven't been any major patches recently, so it should be in good shape. You should use a 2.6.13 kernel with that. Or use one of the backport patches (which are less widely tested, I think). I would recommend updating to RHEL 4.0 as this is 2.6-based and should work better with openib. Another distro option would be Fedora Core 4, which is even closer to working with the latest kernel. -tduffy From halr at voltaire.com Mon Sep 5 08:39:19 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 05 Sep 2005 11:39:19 -0400 Subject: [openib-general] Re: OpenSM 1.8.0 Merge Status and Operational Issue In-Reply-To: <431C6157.50500@mellanox.co.il> References: <1125609366.4398.1014.camel@hal.voltaire.com> <4317ECF6.9070507@mellanox.co.il> <1125690178.4398.7656.camel@hal.voltaire.com> <431C6157.50500@mellanox.co.il> Message-ID: <1125934759.4403.64.camel@hal.voltaire.com> Hi Eitan, On Mon, 2005-09-05 at 11:16, Eitan Zahavi wrote: > We were looking at the section below and were not sure we understood.
> Is your setup such that a 4x HCA is connected with a 4x -> 4* 1x "split" cable > to a 1x analyzer and then on the other side using a similar cable back to a 4x HCA ? Yes. > In that case which 1x plug out of the 4 are you using to connect to the Analyzer? The correct one. It works with the pre 1.8.0 OpenSM. > Also we were looking at the differences between the pre 1.8.0 and the latest code and > we concluded that there is a "missing feature" in the 1.8.0 code: OK. > OpenSM (from version 1.7.0 or so) does not do multiple sweeps in case of error - unless > it has the "errors during initialization" flag set. This prevents repetitive sweeps if a > port is not responding... The port responded; just not with Active (It responded with Init rather than Active). > The missing feature - not to say bug - is that in the case of a SetResp (yes we are able to > differentiate it from GetResp) returned by the target SMA with status != 0, the > current code does not declare the fabric initialization as erroneous. The status is 0x1c which says that the HCA is responding with status 7 to the Set PortInfo. Not sure why this is. (That comes from the firmware). That's the first level problem. > So in our example - if for some reason the 1x configuration requires a link reset due to the > 4x -> 1x and the link responds with a Set error, the new OpenSM misses the Set failure and thus > fails to bring the link up. > However, it is not clear to me why a second sweep is required in the first place. Right, but only trying once doesn't seem right to me as it never recovers automatically although perhaps it could. This seems like the second level problem to me which should be solved first. > Yael will be working on it. It might take us some time as we need a special setup to > reproduce the problem. > > Meanwhile I think Yael will provide a simple patch for considering the SetResp error > as a valid cause for "errors during initialization". OK. Thanks.
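Hal's reading of the 0x1c status as "status 7" follows from the MAD status layout, in which bit 0 is busy, bit 1 is redirect required, and bits 2-4 hold the invalid-field code. The IBA MAD status definition uses code 7 for an invalid value in one or more fields of the attribute or attribute modifier. Below is a minimal sketch of that decoding, together with Eitan's proposed rule that any SetResp with a nonzero status flags the sweep as having "errors during initialization". The struct and function names are hypothetical rather than OpenSM's, and the status is assumed to have already been converted to host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MAD status bits (host byte order assumed): bit 0 = busy,
 * bit 1 = redirect required, bits 2-4 = invalid-field code. */
static unsigned mad_status_field_code(uint16_t status)
{
    return (status >> 2) & 0x7u;
}

/* Hypothetical sweep bookkeeping; not OpenSM's real data structures. */
struct sweep_state {
    bool errors_during_init; /* when set, another sweep would be scheduled */
};

/* Proposed rule: any SetResp carrying a nonzero status marks the sweep
 * as having errors during initialization. */
static void note_set_resp(struct sweep_state *s, uint16_t status)
{
    assert(s != NULL);
    if (status != 0)
        s->errors_during_init = true;
}
```

With this rule, the Set PortInfo that came back with 0x1c in Hal's 1x setup would mark the sweep as erroneous instead of being silently accepted, which is what would let OpenSM try again rather than give up after one attempt.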
-- Hal From mst at mellanox.co.il Mon Sep 5 10:04:51 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 5 Sep 2005 20:04:51 +0300 Subject: [openib-general] [PATCH] sdp: set default zcopy threshold by module parameter Message-ID: <20050905170451.GA19358@mellanox.co.il> Make default zcopy threshold values configurable by a module parameter. Signed-off-by: Michael S. Tsirkin Index: linux-2.6.12.2/drivers/infiniband/ulp/sdp/sdp_conn.c =================================================================== --- linux-2.6.12.2.orig/drivers/infiniband/ulp/sdp/sdp_conn.c 2005-09-05 19:37:49.000000000 +0300 +++ linux-2.6.12.2/drivers/infiniband/ulp/sdp/sdp_conn.c 2005-09-05 19:41:18.000000000 +0300 @@ -35,6 +35,13 @@ #include "sdp_main.h" +static int sdp_zcopy_thrsh_src_default = SDP_ZCOPY_THRSH_SRC_DEFAULT; +module_param(sdp_zcopy_thrsh_src_default, int, 0666); +MODULE_PARM_DESC(sdp_zcopy_thrsh_src_default, "Default ZCopy Threshold for Data Source"); +static int sdp_zcopy_thrsh_snk_default = SDP_ZCOPY_THRSH_SNK_DEFAULT; +module_param(sdp_zcopy_thrsh_snk_default, int, 0666); +MODULE_PARM_DESC(sdp_zcopy_thrsh_snk_default, "Default ZCopy Threshold for Data Sink"); + static struct sdev_root dev_root_s; static void sdp_device_init_one(struct ib_device *device); @@ -1178,8 +1185,8 @@ struct sdp_sock *sdp_conn_alloc(unsigned conn->rcv_urg_cnt = 0; conn->nodelay = 0; - conn->src_zthresh = SDP_ZCOPY_THRSH_SRC_DEFAULT; - conn->snk_zthresh = SDP_ZCOPY_THRSH_SNK_DEFAULT; + conn->src_zthresh = sdp_zcopy_thrsh_src_default; + conn->snk_zthresh = sdp_zcopy_thrsh_snk_default; conn->accept_next = NULL; conn->accept_prev = NULL; -- MST From rajib.majumder at csfb.com Mon Sep 5 21:06:57 2005 From: rajib.majumder at csfb.com (Majumder, Rajib) Date: Tue, 6 Sep 2005 12:06:57 +0800 Subject: [openib-general] RE: SDP message loss Message-ID: Did you notice, or has anybody reported, this kind of issue with the gen1 stack? -----Original Message----- From: Michael S.
Tsirkin [mailto:mst at mellanox.co.il] Sent: 05 September 2005 19:04 To: Majumder, Rajib Cc: 'openib-general at openib.org' Subject: Re: SDP message loss Quoting Majumder, Rajib : > Subject: RE: SDP message loss > > hi mst, > > What's the latest stable revision of SDP? What rev of gen2 stack is > recommended at a minimum for RHAS 3.0? > > Thanks. > > Rajib Hi, Personally I am using svn rev 3300 without problems. My system is SuSe pro 9.3 with kernel 2.6.13. Gen2 is kernel 2.6 only. -- MST From rajib.majumder at csfb.com Tue Sep 6 03:10:16 2005 From: rajib.majumder at csfb.com (Majumder, Rajib) Date: Tue, 6 Sep 2005 18:10:16 +0800 Subject: [openib-general] RE: SDP message loss Message-ID: hi mst, how do I enable debug trace of SDP or activate some debug hooks that can be added to SDP? thanks. rajib -----Original Message----- From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] Sent: 05 September 2005 19:04 To: Majumder, Rajib Cc: 'openib-general at openib.org' Subject: Re: SDP message loss Quoting Majumder, Rajib : > Subject: RE: SDP message loss > > hi mst, > > What's the latest stable revision of SDP? What rev of gen2 stack is > recommended at a minimum for RHAS 3.0? > > Thanks. > > Rajib Hi, Personally I am using svn rev 3300 without problems. My system is SuSe pro 9.3 with kernel 2.6.13. Gen2 is kernel 2.6 only.
-- MST From mst at mellanox.co.il Tue Sep 6 04:01:34 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Sep 2005 14:01:34 +0300 Subject: [openib-general] Re: SDP message loss In-Reply-To: References: Message-ID: <20050906110134.GE19358@mellanox.co.il> Quoting Majumder, Rajib : > how do I enable debug trace of SDP or activate some debug hooks that > can be added to SDP? Set "Sockets Direct Protocol debugging" under InfiniBand support/Sockets Direct Protocol -- MST From halr at voltaire.com Tue Sep 6 04:39:25 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 07:39:25 -0400 Subject: [openib-general] [PATCH} OpenSM: Add MGID to 1B11 error message Message-ID: <1126006765.4403.234.camel@hal.voltaire.com> OpenSM: Add MGID to error message when MC group can't be created due to only being supplied with join (rather than create) component mask This appears to be the most common "failure" so it would be good to have the MGID in the error message even though it is redundant when running in verbose mode. [Are there others like this we should add ?]
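The patch below prints the 128-bit MGID as two 64-bit halves (subnet prefix and interface ID), each converted to host order and formatted with `"0x%016" PRIx64`. Here is a minimal stand-alone sketch of that formatting. The `struct mgid_halves` type and `format_mgid` function are hypothetical stand-ins for the OpenSM MCMemberRecord fields, and both halves are assumed to already be in host byte order, as they would be after cl_ntoh64.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for mgid.unicast.prefix / mgid.unicast.interface_id,
 * both assumed already converted to host byte order. */
struct mgid_halves {
    uint64_t prefix;
    uint64_t interface_id;
};

/* Same layout as the patch's format string: zero-padded 16-digit hex halves.
 * Returns the snprintf result (characters that would have been written). */
static int format_mgid(char *buf, size_t len, const struct mgid_halves *g)
{
    assert(buf != NULL && g != NULL);
    return snprintf(buf, len, "MGID: 0x%016" PRIx64 " : 0x%016" PRIx64,
                    g->prefix, g->interface_id);
}
```

With prefix 0xff12401bffff0000 and interface ID 0x00000000ffffffff (the IPoIB broadcast MGID on the default P_Key), this yields `MGID: 0xff12401bffff0000 : 0x00000000ffffffff`. The `%016` padding keeps leading zeros, so both halves are always shown at full width.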
Signed-off-by: Hal Rosenstock 1581c1581,1583 < "expected comp mask = 0x%016" PRIx64 ".\n", --- > "expected comp mask = 0x%016" PRIx64 ", " > "MGID: 0x%016" PRIx64 " : " > "0x%016" PRIx64 "\n", 1585c1587,1589 < cl_ntoh64(REQUIRED_MC_CREATE_COMP_MASK)); --- > cl_ntoh64(REQUIRED_MC_CREATE_COMP_MASK), > cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.prefix ), > cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.interface_id ) ); From eitan at mellanox.co.il Tue Sep 6 06:59:05 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 06 Sep 2005 16:59:05 +0300 Subject: [openib-general] Re: [PATCH} OpenSM: Add MGID to 1B11 error message In-Reply-To: <1126006765.4403.234.camel@hal.voltaire.com> References: <1126006765.4403.234.camel@hal.voltaire.com> Message-ID: <431DA0A9.70605@mellanox.co.il> Good - we needed it long ago. Hal Rosenstock wrote: >> cl_ntoh64(REQUIRED_MC_CREATE_COMP_MASK), It is common practice to use CL_NTOH (upper case) for converting constants. The claim in cl_byteswap.h is that the upper-case version, applied to constants, is evaluated by the compiler and is thus faster. I must say I have not verified that but it seems logical to me. Anyway, the upper-case version should never be used on anything that is not a constant!
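Eitan's distinction between the upper-case macro for constants and the lower-case function for runtime values can be illustrated with a generic 64-bit byte swap. This is a sketch with hypothetical names (`CONST_BSWAP64`, `bswap64_fn`), not the actual cl_byteswap.h definitions, and it swaps unconditionally, modelling a little-endian host where network-to-host conversion is a byte swap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro form (in the spirit of CL_NTOH64, not its real
 * definition). Built only from casts, masks and shifts, so it is an integer
 * constant expression whenever x is one, and the compiler folds it. It also
 * expands x eight times, so an argument with side effects (i++, a function
 * call) would be evaluated repeatedly: use it on constants only. */
#define CONST_BSWAP64(x)                                   \
    ((((uint64_t)(x) & 0x00000000000000ffULL) << 56) |     \
     (((uint64_t)(x) & 0x000000000000ff00ULL) << 40) |     \
     (((uint64_t)(x) & 0x0000000000ff0000ULL) << 24) |     \
     (((uint64_t)(x) & 0x00000000ff000000ULL) << 8)  |     \
     (((uint64_t)(x) & 0x000000ff00000000ULL) >> 8)  |     \
     (((uint64_t)(x) & 0x0000ff0000000000ULL) >> 24) |     \
     (((uint64_t)(x) & 0x00ff000000000000ULL) >> 40) |     \
     (((uint64_t)(x) & 0xff00000000000000ULL) >> 56))

/* Compile-time check: possible only because the macro folds to a constant. */
static_assert(CONST_BSWAP64(0x0102030405060708ULL) == 0x0807060504030201ULL,
              "CONST_BSWAP64 must reverse byte order");

/* Legal at file scope for the same reason. */
static const uint64_t swapped_const = CONST_BSWAP64(0x0102030405060708ULL);

/* Function form (in the spirit of cl_ntoh64): the argument is evaluated
 * exactly once, so it is safe for runtime values such as MGID fields. */
static inline uint64_t bswap64_fn(uint64_t x)
{
    return CONST_BSWAP64(x);
}
```

In the patch above, `REQUIRED_MC_CREATE_COMP_MASK` is a constant and so fits the macro form, while the MGID fields read from the received record are runtime values that belong with the function form. `static_assert` requires C11.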
>> cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.prefix ), >> cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.interface_id ) ); From mst at mellanox.co.il Tue Sep 6 07:04:28 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Sep 2005 17:04:28 +0300 Subject: [openib-general] Re: [PATCH][MSTFLINT] fix segfault in full usage print In-Reply-To: References: Message-ID: <20050906140427.GL19358@mellanox.co.il> Quoting r. James Lentini : > Subject: [PATCH][MSTFLINT] fix segfault in full usage print > > > fix segfault in full usage print > > Signend-off-by: James Lentini I've updated flint to the revision from 1.8.0 and this should now be fixed. -- MST From halr at voltaire.com Tue Sep 6 07:16:22 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 10:16:22 -0400 Subject: [openib-general] [PATCH] OpenSM: OpenIB vendor layer: Implement osm_vendor_delete Message-ID: <1126016180.4405.159.camel@hal.voltaire.com> OpenSM: OpenIB vendor layer: Implement osm_vendor_delete [I've done some testing of this; are there any regressions for this ?] Signed-off-by: Hal Rosenstock 486,491d485 < int agent_id; < < /* unregister UMAD agents */ < for (agent_id = 0; agent_id < UMAD_CA_MAX_AGENTS; agent_id++) < if ( (*pp_vend)->agents[agent_id] ) < umad_unregister( (*pp_vend)->umad_port_id, agent_id ); 493c487 < /* make sure all ports are closed? */ --- > /* make sure all ports are closed */ 596c590 < int --- > static int 831c825,836 < osm_vendor_t *p_vend = p_bind->p_vend; --- > osm_vendor_t *p_vend; > > if (p_bind) { > p_vend = p_bind->p_vend; > > OSM_LOG_ENTER( p_vend->p_log, osm_vendor_unbind ); > > /* Unregister UMAD agents */ > if (p_vend->agents[p_bind->agent_id1]) > umad_unregister(p_bind->port_id, p_bind->agent_id1); > if (p_vend->agents[p_bind->agent_id]) > umad_unregister(p_bind->port_id, p_bind->agent_id); 833c838,844 < OSM_LOG_ENTER( p_vend->p_log, osm_vendor_unbind ); --- > /* close port ??? 
*/ > > free(p_bind); > > OSM_LOG_EXIT( p_vend->p_log); > > } 835d845 < OSM_LOG_EXIT( p_vend->p_log); From halr at voltaire.com Tue Sep 6 07:26:56 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 10:26:56 -0400 Subject: [openib-general] [PATCHv2] OpenSM: OpenIB vendor layer: Implement osm_vendor_delete Message-ID: <1126016815.4405.185.camel@hal.voltaire.com> [same patch just generated with diff -up] OpenSM: OpenIB vendor layer: Implement osm_vendor_delete [I've done some testing of this; are there any regressions for this ?] Signed-off-by: Hal Rosenstock --- osm_vendor_ibumad.c.1 2005-08-31 12:26:03.000000000 -0400 +++ osm_vendor_ibumad.c 2005-09-06 09:35:27.000000000 -0400 @@ -483,14 +483,8 @@ void osm_vendor_delete( IN osm_vendor_t** const pp_vend ) { - int agent_id; - - /* unregister UMAD agents */ - for (agent_id = 0; agent_id < UMAD_CA_MAX_AGENTS; agent_id++) - if ( (*pp_vend)->agents[agent_id] ) - umad_unregister( (*pp_vend)->umad_port_id, agent_id ); clear_madw( *pp_vend ); - /* make sure all ports are closed? 
*/ + /* make sure all ports are closed */ umad_done(); cl_free( *pp_vend ); *pp_vend = NULL; @@ -593,7 +587,7 @@ Exit: /********************************************************************** **********************************************************************/ -int +static int osm_vendor_open_port( IN osm_vendor_t* const p_vend, IN const ib_net64_t port_guid ) @@ -828,11 +822,27 @@ osm_vendor_unbind( IN osm_bind_handle_t h_bind) { osm_umad_bind_info_t *p_bind = ( osm_umad_bind_info_t * ) h_bind; - osm_vendor_t *p_vend = p_bind->p_vend; + osm_vendor_t *p_vend; + + if (p_bind) { + p_vend = p_bind->p_vend; + + OSM_LOG_ENTER( p_vend->p_log, osm_vendor_unbind ); + + /* Unregister UMAD agents */ + if (p_vend->agents[p_bind->agent_id1]) + umad_unregister(p_bind->port_id, p_bind->agent_id1); + if (p_vend->agents[p_bind->agent_id]) + umad_unregister(p_bind->port_id, p_bind->agent_id); - OSM_LOG_ENTER( p_vend->p_log, osm_vendor_unbind ); + /* close port ??? */ + + free(p_bind); + + OSM_LOG_EXIT( p_vend->p_log); + + } - OSM_LOG_EXIT( p_vend->p_log); } /********************************************************************** From mst at mellanox.co.il Tue Sep 6 07:43:44 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Sep 2005 17:43:44 +0300 Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: References: Message-ID: <20050906144344.GO19358@mellanox.co.il> Quoting r. James Lentini : > Subject: [mstflint] firmware upgrade instructions > > > Hi Michael, > > I'm guessing that you are the maintainer of mstflint. Two questions: > What is the difference between mstflint and tvflash? I didn't personally use tvflash. I think this tool is specific to Topspin cards. I think you'll need to use it if you are using the Topspin gen1 driver. For Mellanox cards, you can also use the mlxburn tool, part of the ibadm package in Mellanox IB Gold.
> Using mstflint, how can the firmware located on the Mellanox website: > > http://www.mellanox.com/products/firmware.html > > be used to upgrade an HCA? James, .mlx is a generic firmware image common to all boards based on Mellanox silicon. Therefore, to get a firmware image that flint can burn, you need, in addition to the .mlx file, a .brd file that matches the board you have. For Mellanox cards you can usually figure that out from the board ID, which is reported by running "mstflint -d q" or can be read from /sys/class/infiniband/mthcaX/board_id. Once you have the .mlx and .brd files, compile them into a firmware image specific to your board. This can be done with the imgen tools that I have just uploaded to the src/userspace/imgen directory. Please see the imgen/README file, which explains how to do it. HTH -- MST From halr at voltaire.com Tue Sep 6 07:40:09 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 10:40:09 -0400 Subject: [openib-general] Re: [PATCH} OpenSM: Add MGID to 1B11 error message In-Reply-To: <431DA0A9.70605@mellanox.co.il> References: <1126006765.4403.234.camel@hal.voltaire.com> <431DA0A9.70605@mellanox.co.il> Message-ID: <1126017606.4405.213.camel@hal.voltaire.com> On Tue, 2005-09-06 at 09:59, Eitan Zahavi wrote: > Good - we needed it long ago. > > Hal Rosenstock wrote: > >> cl_ntoh64(REQUIRED_MC_CREATE_COMP_MASK), > It is common practice to use CL_NTOH (upper case) for converting constants. > The claim in the cl_byteswap.h is that the upper case version of constants is handled > by the compiler and thus is faster to evaluate. I must say I have not verified that but > it seems logical to me. That was a preexisting line of code (and I am sure there are other instances of similar things). I changed it and retested it as you suggest. I will reissue the patch. > Anyway the upper case version should never be used on anything that is not a constant! I found another MC error message to update too. Patch coming for this too...
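As a rough illustration of the sysfs route MST mentions for finding the board ID, the sketch below builds the attribute path for a given device and reads its first line. Both helper names are hypothetical, the device name (for example mthca0) must be supplied by the caller, and matching the result to a .brd file is left to the imgen README.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build /sys/class/infiniband/<dev>/board_id.
 * Returns 0 on success, -1 if the path does not fit in out. */
static int board_id_path(const char *dev, char *out, size_t len)
{
    int n = snprintf(out, len, "/sys/class/infiniband/%s/board_id", dev);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: read the first line of a sysfs attribute into buf,
 * dropping the trailing newline. Returns 0 on success, -1 on error. */
static int read_board_id(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(path != NULL && buf != NULL && len > 0);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';     /* strip the newline sysfs appends */
    return 0;
}
```

A caller would typically build the path for whichever mthcaX device is present and pass it to read_board_id(); reading fails with -1 if the driver is not loaded and the attribute does not exist.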
-- Hal From halr at voltaire.com Tue Sep 6 07:43:52 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 10:43:52 -0400 Subject: [openib-general] [PATCHv2] OpenSM: Add MGID to 1B11 error message Message-ID: <1126017831.4405.221.camel@hal.voltaire.com> [This is version 2 of this patch with the minor change suggested by Eitan incorporated.] OpenSM: Add MGID to error message when MC group can't be created due to only being supplied with join (rather than create) component mask This appears to be the most common "failure" so it would be good to have the MGID in the error message even though it is redundant when running in verbose mode. Signed-off-by: Hal Rosenstock --- osm_sa_mcmember_record.c.1 2005-09-06 07:09:30.000000000 -0400 +++ osm_sa_mcmember_record.c 2005-09-06 10:37:06.000000000 -0400 @@ -1578,11 +1578,15 @@ osm_mcmr_rcv_join_mgrp( "method = %s," "scope_state = 0x%x, " "component mask = 0x%016" PRIx64 ", " - "expected comp mask = 0x%016" PRIx64 ".\n", + "expected comp mask = 0x%016" PRIx64 ", " + "MGID: 0x%016" PRIx64 " : " + "0x%016" PRIx64 "\n", ib_get_sa_method_str(p_sa_mad->method), p_recvd_mcmember_rec->scope_state, cl_ntoh64(p_sa_mad->comp_mask), - cl_ntoh64(REQUIRED_MC_CREATE_COMP_MASK)); + CL_NTOH64(REQUIRED_MC_CREATE_COMP_MASK), + cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.prefix ), + cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.interface_id ) ); CL_PLOCK_RELEASE( p_rcv->p_lock ); sa_status = IB_SA_MAD_STATUS_INSUF_COMPS; From eitan at mellanox.co.il Tue Sep 6 08:04:25 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 06 Sep 2005 18:04:25 +0300 Subject: [openib-general] Re: [PATCHv2] OpenSM: OpenIB vendor layer: Implement osm_vendor_delete In-Reply-To: <1126016815.4405.185.camel@hal.voltaire.com> References: <1126016815.4405.185.camel@hal.voltaire.com> Message-ID: <431DAFF9.50600@mellanox.co.il> Hal Rosenstock wrote: > [same patch just generated with diff -up] > > OpenSM: OpenIB vendor layer: Implement osm_vendor_delete > 
> [I've done some testing of this; are there any regressions for this ?] OpenSM call osm_vendor_delete during osm_opensm_destroy. It is invoked during exit. EZ > From halr at voltaire.com Tue Sep 6 08:09:28 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 11:09:28 -0400 Subject: [openib-general] [PATCH] OpenSM: Add MGID and PortGID to 1B25 error message Message-ID: <1126019367.4405.271.camel@hal.voltaire.com> OpenSM: Add MGID and PortGID to 1B25 error message Signed-off-by: Hal Rosenstock --- osm_sa_mcmember_record.c.3 2005-09-06 10:52:22.000000000 -0400 +++ osm_sa_mcmember_record.c 2005-09-06 11:05:02.000000000 -0400 @@ -1412,7 +1412,15 @@ osm_mcmr_rcv_leave_mgrp( CL_PLOCK_RELEASE( p_rcv->p_lock ); osm_log( p_rcv->p_log, OSM_LOG_ERROR, "osm_mcmr_rcv_leave_mgrp: ERR 1B25: " - "Received an Invalid Delete Request.\n"); + "Received an Invalid Delete Request on " + "MGID: 0x%016" PRIx64 " : " + "0x%016" PRIx64 " for " + "PortGID: 0x%016" PRIx64 " : " + "0x%016" PRIx64 "\n", + cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.prefix ), + cl_ntoh64( p_recvd_mcmember_rec->mgid.unicast.interface_id ), + cl_ntoh64( p_recvd_mcmember_rec->port_gid.unicast.prefix ), + cl_ntoh64( p_recvd_mcmember_rec->port_gid.unicast.interface_id ) ); sa_status = IB_SA_MAD_STATUS_REQ_INVALID; osm_sa_send_error( p_rcv->p_resp, p_madw, sa_status); goto Exit; From halr at voltaire.com Tue Sep 6 08:12:45 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 11:12:45 -0400 Subject: [openib-general] Re: [PATCHv2] OpenSM: OpenIB vendor layer: Implement osm_vendor_delete In-Reply-To: <431DAFF9.50600@mellanox.co.il> References: <1126016815.4405.185.camel@hal.voltaire.com> <431DAFF9.50600@mellanox.co.il> Message-ID: <1126019486.4405.279.camel@hal.voltaire.com> On Tue, 2005-09-06 at 11:04, Eitan Zahavi wrote: > Hal Rosenstock wrote: > > [same patch just generated with diff -up] > > > > OpenSM: OpenIB vendor layer: Implement osm_vendor_delete > > > > [I've done some testing of 
this; are there any regressions for this ?] > OpenSM call osm_vendor_delete during osm_opensm_destroy. > It is invoked during exit. Actually, it should have said "Implement osm_vendor_unbind" rather than delete. OpenSM calls it in a similar place (as it stops the SA and SM MAD controllers). I went through this starting and stopping the OpenSM a number of times although did not do this "infinitely". I was asking about any other explicit regressions. -- Hal From eitan at mellanox.co.il Tue Sep 6 08:17:14 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 06 Sep 2005 18:17:14 +0300 Subject: [openib-general] Re: [PATCHv2] OpenSM: OpenIB vendor layer: Implement osm_vendor_delete In-Reply-To: <1126019486.4405.279.camel@hal.voltaire.com> References: <1126016815.4405.185.camel@hal.voltaire.com> <431DAFF9.50600@mellanox.co.il> <1126019486.4405.279.camel@hal.voltaire.com> Message-ID: <431DB2FA.5020504@mellanox.co.il> Hal Rosenstock wrote: > On Tue, 2005-09-06 at 11:04, Eitan Zahavi wrote: > >>Hal Rosenstock wrote: >> >>>[same patch just generated with diff -up] >>> >>>OpenSM: OpenIB vendor layer: Implement osm_vendor_delete >>> >>>[I've done some testing of this; are there any regressions for this ?] >> >>OpenSM call osm_vendor_delete during osm_opensm_destroy. >>It is invoked during exit. > > > Actually, it should have said "Implement osm_vendor_unbind" rather than > delete. Well the semantics are very old. We stick to the old osm_vendor_api.h. Maybe we should not have. But now we have too much depending on this API that I urge you not to modify it if possible. > > OpenSM calls it in a similar place (as it stops the SA and SM MAD > controllers). I went through this starting and stopping the OpenSM a > number of times although did not do this "infinitely". > > I was asking about any other explicit regressions. No we do not have tests for the osm_vendor_api.h. It might have been a good idea. We do have now some stuff for the Windows code. 
Maybe we should try and make a test suite from it. Liran - what do you think ? > > -- Hal From halr at voltaire.com Tue Sep 6 08:22:09 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 11:22:09 -0400 Subject: [openib-general] Re: [PATCHv2] OpenSM: OpenIB vendor layer: Implement osm_vendor_delete In-Reply-To: <431DB2FA.5020504@mellanox.co.il> References: <1126016815.4405.185.camel@hal.voltaire.com> <431DAFF9.50600@mellanox.co.il> <1126019486.4405.279.camel@hal.voltaire.com> <431DB2FA.5020504@mellanox.co.il> Message-ID: <1126020129.4405.294.camel@hal.voltaire.com> On Tue, 2005-09-06 at 11:17, Eitan Zahavi wrote: > Hal Rosenstock wrote: > > On Tue, 2005-09-06 at 11:04, Eitan Zahavi wrote: > > > >>Hal Rosenstock wrote: > >> > >>>[same patch just generated with diff -up] > >>> > >>>OpenSM: OpenIB vendor layer: Implement osm_vendor_delete > >>> > >>>[I've done some testing of this; are there any regressions for this ?] > >> > >>OpenSM call osm_vendor_delete during osm_opensm_destroy. > >>It is invoked during exit. > > > > > > Actually, it should have said "Implement osm_vendor_unbind" rather than > > delete. > Well the semantics are very old. We stick to the old osm_vendor_api.h. Maybe we should not have. > But now we have too much depending on this API that I urge you not to modify it if possible. I didn't change the semantics. I implemented the OpenIB version. Recall that Yael had indicated this was needed when she started on the 1.8.0 merge work. 
-- Hal From eitan at mellanox.co.il Tue Sep 6 08:33:33 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 06 Sep 2005 18:33:33 +0300 Subject: [openib-general] Re: [PATCHv2] OpenSM: OpenIB vendor layer: Implement osm_vendor_delete In-Reply-To: <1126020129.4405.294.camel@hal.voltaire.com> References: <1126016815.4405.185.camel@hal.voltaire.com> <431DAFF9.50600@mellanox.co.il> <1126019486.4405.279.camel@hal.voltaire.com> <431DB2FA.5020504@mellanox.co.il> <1126020129.4405.294.camel@hal.voltaire.com> Message-ID: <431DB6CD.9050805@mellanox.co.il> Hal Rosenstock wrote: > On Tue, 2005-09-06 at 11:17, Eitan Zahavi wrote: > >>Hal Rosenstock wrote: >> >>>On Tue, 2005-09-06 at 11:04, Eitan Zahavi wrote: >>> >>> >>>>Hal Rosenstock wrote: >>>> >>>> >>>>>[same patch just generated with diff -up] >>>>> >>>>>OpenSM: OpenIB vendor layer: Implement osm_vendor_delete >>>>> >>>>>[I've done some testing of this; are there any regressions for this ?] >>>> >>>>OpenSM call osm_vendor_delete during osm_opensm_destroy. >>>>It is invoked during exit. >>> >>> >>>Actually, it should have said "Implement osm_vendor_unbind" rather than >>>delete. >> >>Well the semantics are very old. We stick to the old osm_vendor_api.h. Maybe we should not have. >>But now we have too much depending on this API that I urge you not to modify it if possible. > > > I didn't change the semantics. I implemented the OpenIB version. Recall > that Yael had indicated this was needed when she started on the 1.8.0 > merge work. I did not mean that. I thought you proposed to rename osm_opensm_destroy osm_vendor_unbind but now I'm not sure this is what you meant. My comment was about renaming. I think I have to go home... so I do not waste your time not being able to read right. I'll probably login in 2 hours or so. 
> > -- Hal From rolandd at cisco.com Tue Sep 6 08:41:53 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 08:41:53 -0700 Subject: [openib-general] [PATCH] sdp: set default zcopy threshold by module parameter References: <20050905170451.GA19358@mellanox.co.il> Message-ID: <5264teyzi6.fsf@cisco.com> > +static int sdp_zcopy_thrsh_src_default = SDP_ZCOPY_THRSH_SRC_DEFAULT; > +module_param(sdp_zcopy_thrsh_src_default, int, 0666); > +MODULE_PARM_DESC(sdp_zcopy_thrsh_src_default, "Default ZCopy Threshold for Data Source"); > +static int sdp_zcopy_thrsh_snk_default = SDP_ZCOPY_THRSH_SNK_DEFAULT; > +module_param(sdp_zcopy_thrsh_src_default, int, 0666); > +MODULE_PARM_DESC(sdp_zcopy_thrsh_snk_default, "Default ZCopy Threshold for Data Sink"); 0666 seems like a strange choice of permissions here. Would it be better as 0644? - R. From rolandd at cisco.com Tue Sep 6 08:41:56 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 08:41:56 -0700 Subject: [openib-general] Re: QP 000404 not found in MGM References: <20050905134743.GQ17291@mellanox.co.il> Message-ID: <52zmqqxkxn.fsf@cisco.com> Michael> ib_mthca 0000:07:00.0: QP 000404 not found in MGM Michael> Are these benign messages? No, something is wrong. Either there's a bug in the mthca multicast group code, or IPoIB is trying to detach a QP that is not attached to the multicast group. I'd bet on the bug being in IPoIB. I think I probably just need to rewrite all the netdevice up/down, port state change and device removal code in IPoIB. - R. From rolandd at cisco.com Tue Sep 6 08:42:00 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 08:42:00 -0700 Subject: [openib-general] SDP Revision References: Message-ID: <52u0gyxkxj.fsf@cisco.com> Rajib> hi, What's the latest stable revision of SDP? What rev of Rajib> gen2 stack is recommended at a minimum for RHEL AS 3.0? In general I would recommend running the latest gen2 code. What version are you running now? 
I would also recommend using a newer distribution. RHEL3 does not really support 2.6 kernels very well. You have to replace modutils with module-init-tools, RHEL3 doesn't have udev, and so on. - R. From rolandd at cisco.com Tue Sep 6 08:42:03 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 08:42:03 -0700 Subject: [openib-general] RE: [PATCH] iSER: Make iser depend on kdapl References: Message-ID: <52oe76xkxg.fsf@cisco.com> Dan> Merged. Also removed SCSI dependency, ISER really depends on Dan> ISCSI, but that is not in the kernel yet. Shouldn't iSER depend on something having to do with SCSI? Otherwise it's too easy to create configurations that don't compile. For example, the latest iSCSI submission that I saw had: config ISCSI_TCP tristate "iSCSI Initiator over TCP/IP" depends on SCSI && INET && SCSI_ISCSI_ATTRS - R. From mst at mellanox.co.il Tue Sep 6 08:43:48 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Sep 2005 18:43:48 +0300 Subject: [openib-general] [PATCH] sdp: set default zcopy threshold by module parameter In-Reply-To: <5264teyzi6.fsf@cisco.com> References: <20050905170451.GA19358@mellanox.co.il> <5264teyzi6.fsf@cisco.com> Message-ID: <20050906154348.GA29024@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [openib-general] [PATCH] sdp: set default zcopy threshold by module parameter > > > +static int sdp_zcopy_thrsh_src_default = SDP_ZCOPY_THRSH_SRC_DEFAULT; > > +module_param(sdp_zcopy_thrsh_src_default, int, 0666); > > +MODULE_PARM_DESC(sdp_zcopy_thrsh_src_default, "Default ZCopy Threshold for Data Source"); > > +static int sdp_zcopy_thrsh_snk_default = SDP_ZCOPY_THRSH_SNK_DEFAULT; > > +module_param(sdp_zcopy_thrsh_src_default, int, 0666); > > +MODULE_PARM_DESC(sdp_zcopy_thrsh_snk_default, "Default ZCopy Threshold for Data Sink"); > > 0666 seems like a strange choice of permissions here. Would it be > better as 0644? > > - R. > True. I'll fix that tomorrow. 
-- MST From mst at mellanox.co.il Tue Sep 6 08:50:08 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Sep 2005 18:50:08 +0300 Subject: [openib-general] Re: QP 000404 not found in MGM In-Reply-To: <52zmqqxkxn.fsf@cisco.com> References: <20050905134743.GQ17291@mellanox.co.il> <52zmqqxkxn.fsf@cisco.com> Message-ID: <20050906155008.GA29302@mellanox.co.il> Quoting r. Roland Dreier : > I think I probably just need to rewrite all the netdevice up/down, > port state change and device removal code in IPoIB. It would be nice not to bring the interface down and then up on a LID change. E.g. SDP and AT rely on the interface being up for address resolution. -- MST From jlentini at netapp.com Tue Sep 6 08:53:32 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 6 Sep 2005 11:53:32 -0400 (EDT) Subject: [openib-general] [mstflint] firmware upgrade instructions In-Reply-To: <1125601877.4398.523.camel@hal.voltaire.com> References: <1125601877.4398.523.camel@hal.voltaire.com> Message-ID: On Thu, 1 Sep 2005, Hal Rosenstock wrote: > On Thu, 2005-09-01 at 15:03, James Lentini wrote: > > What is the difference between mstflint and tvflash? > > Note that tvflash is not being supported although it may work. > There was an earlier thread on this. I'll add this information to the Wiki. From halr at voltaire.com Tue Sep 6 08:49:55 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 11:49:55 -0400 Subject: [openib-general] RE: [openib-commits] r3137 -gen2/trunk/src/linux-kernel/infiniband/ulp/ipoib In-Reply-To: <52d5nq51s7.fsf@cisco.com> References: <5CE025EE7D88BA4599A2C8FEFCF226F5175BDA@taurus.voltaire.com> <1125712580.4398.8182.camel@hal.voltaire.com> <52d5nq51s7.fsf@cisco.com> Message-ID: <1126021794.4406.3.camel@hal.voltaire.com> On Fri, 2005-09-02 at 22:21, Roland Dreier wrote: > Hal> Will 2 limited members now be able to talk with each other ? > > No, that's not possible.
> > Hal> If this change makes the MC requests to the full partition > Hal> for limited members, that seems correct (as limited MC groups > Hal> are useless). If it causes/also causes the UD AVs used to > Hal> send to have their limited PKey to be promoted to full > Hal> member, then that is not fine. Does it do the latter as well > Hal> as the former ? > > No, that's not possible either. An HCA port can only send UD messages > with one of the P_Keys programmed into its P_Key table. Maybe it is just the interface name (w/ifconfig) that has its PKey promoted to a full member of the partition then. -- Hal From rolandd at cisco.com Tue Sep 6 08:56:53 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 08:56:53 -0700 Subject: [openib-general] Re: at won't compile with gcc-2.95 References: <52slwo91lm.fsf@cisco.com> <1125595270.4398.215.camel@hal.voltaire.com> <527je08zi2.fsf@cisco.com> <1125702531.4398.8121.camel@hal.voltaire.com> Message-ID: <52y86aw5oa.fsf@cisco.com> Hal> Can you try the following patch with gcc 2.95 and let me know Hal> if this works ? No, that doesn't help. However adding a space before the last comma and removing the space before the '...' does build with gcc 2.95. - R. Index: infiniband/core/at_priv.h =================================================================== --- infiniband/core/at_priv.h (revision 3319) +++ infiniband/core/at_priv.h (working copy) @@ -137,9 +137,9 @@ static const struct ib_field ats_rec_tab #define DEBUG(fmt, ...) while (0) {} #define DEBUG_VAR(x, y...) -#define WARN(fmt, arg ...) printk("ib_at: %s: " fmt "\n", __FUNCTION__, ## arg); +#define WARN(fmt, arg...) printk("ib_at: %s: " fmt "\n", __FUNCTION__ , ## arg); #define WARN_VAR(x, y...) x, ## y -//#define DEBUG(fmt, arg ...) printk("ib_at: %s: " fmt "\n", __FUNCTION__, ## arg); +//#define DEBUG(fmt, arg...) printk("ib_at: %s: " fmt "\n", __FUNCTION__ , ## arg); //#define DEBUG_VAR(x, y...)
x, ## y static kmem_cache_t *route_req_cache = NULL; From rolandd at cisco.com Tue Sep 6 08:56:54 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 08:56:54 -0700 Subject: [openib-general] Re: ibv_get_async_event References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> Message-ID: <52slwiw5o9.fsf@cisco.com> Michael> If thats what we are trying to do, I'd like to propose Michael> another idea: when cq is destroyed, and after all cq Michael> events are queued to user, put a special "cq destroyed" Michael> event into the event queue. When a user calls Michael> destroy_cq, and after releasing kernel/hardware Michael> resources, move the cq structure to a special cleanup Michael> list, userspace library will only destroy the userspace Michael> cq structure when it gets this event. Makes sense. I coded this up with the API as below: /** * ibv_destroy_cq - Destroy a completion queue * @cq: CQ to destroy. * * If ibv_req_notify_cq() has ever been called for @cq, then the * CQ structure will not be freed until ibv_get_cq_event() has * returned a "dead CQ" event (ie is_dead != 0) for this CQ. * * Calling ibv_destroy_cq() from one thread while another thread calls * ibv_req_notify_cq() for the same CQ will lead to unpredictable * results. Don't do that. */ extern int ibv_destroy_cq(struct ibv_cq *cq); /** * ibv_get_cq_event - Read next CQ event * @context: Context to get CQ event for * @comp_num: Index of completion event to check. Must be >= 0 and * <= context->num_comp. * @cq: Used to return pointer to CQ. * @cq_context: Used to return consumer-supplied CQ context. 
* @is_dead: If non-zero, then the event indicates that the CQ has * been destroyed and no more completion events will be generated * for the CQ. */ extern int ibv_get_cq_event(struct ibv_context *context, int comp_num, struct ibv_cq **cq, void **cq_context, int *is_dead); I'll post the full kernel patches and userspace changes in separate emails shortly. Michael> Instead of reference counting, we require the user to Michael> poll for this event after destroying the cq. Since all Michael> events are polled from a single thread, when the library Michael> sees this event it knows the user does not keep a pointer Michael> to the cq any more. Actually there's no rule that all events need to be read from a single thread. But if an application wants to use multiple threads and ends up getting things out of order, then I think that's the app's fault. - R. From rolandd at cisco.com Tue Sep 6 08:56:54 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 08:56:54 -0700 Subject: [openib-general] [PATCH] Kernel stale CQ event handling (was: ibv_get_async_event) References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> Message-ID: <52mzmqw5o9.fsf@cisco.com> This is the kernel side of MST's idea for stale CQ event handling. When a CQ is destroyed, we sweep all the existing completion events for that CQ and, if requested, create a "dead CQ" event so that userspace can know the CQ is gone. 
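The ordering guarantee this scheme relies on can be modelled in a few lines of userspace C. This is a simplified, single-threaded sketch with hypothetical names, not the uverbs code: one FIFO stands in for the completion event file, destroying a CQ removes that CQ's still-queued completion events and appends a dead event, and once the consumer reads the dead event it knows no further events for that CQ can follow. Unlike the patch, the model always queues the dead event rather than only when dead_event is set.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of the completion event queue; not the kernel code. */
struct ex_event {
    int cq_handle;
    int is_dead;
    struct ex_event *next;
};

struct ex_queue {
    struct ex_event *head, *tail;
};

static void ex_push(struct ex_queue *q, int cq_handle, int is_dead)
{
    struct ex_event *e = malloc(sizeof *e);

    if (!e)
        abort();
    e->cq_handle = cq_handle;
    e->is_dead = is_dead;
    e->next = NULL;
    if (q->tail)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
}

/* Completion handler: queue an ordinary completion event. */
static void ex_comp_event(struct ex_queue *q, int cq_handle)
{
    ex_push(q, cq_handle, 0);
}

/* Destroy: sweep queued completion events for this CQ, then queue a dead event. */
static void ex_destroy_cq(struct ex_queue *q, int cq_handle)
{
    struct ex_event **pp = &q->head;

    q->tail = NULL;
    while (*pp) {
        struct ex_event *e = *pp;

        if (e->cq_handle == cq_handle && !e->is_dead) {
            *pp = e->next;              /* unlink and drop the stale event */
            free(e);
        } else {
            q->tail = e;                /* last surviving event so far */
            pp = &e->next;
        }
    }
    ex_push(q, cq_handle, 1);
}

/* Consumer: pop the next event. Returns 1 if one was read, 0 if empty. */
static int ex_get_event(struct ex_queue *q, int *cq_handle, int *is_dead)
{
    struct ex_event *e = q->head;

    assert(cq_handle != NULL && is_dead != NULL);
    if (!e)
        return 0;
    q->head = e->next;
    if (!q->head)
        q->tail = NULL;
    *cq_handle = e->cq_handle;
    *is_dead = e->is_dead;
    free(e);
    return 1;
}
```

In the proposed API, the point at which the consumer sees is_dead != 0 is when it may finally free its CQ structure, as the ibv_destroy_cq comment above describes.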
--- infiniband/core/uverbs_cmd.c (revision 3319) +++ infiniband/core/uverbs_cmd.c (working copy) @@ -590,7 +590,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uv struct ib_uverbs_create_cq cmd; struct ib_uverbs_create_cq_resp resp; struct ib_udata udata; - struct ib_uevent_object *uobj; + struct ib_ucq_object *uobj; struct ib_cq *cq; int ret; @@ -614,6 +614,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uv uobj->uobject.user_handle = cmd.user_handle; uobj->uobject.context = file->ucontext; uobj->events_reported = 0; + INIT_LIST_HEAD(&uobj->comp_list); INIT_LIST_HEAD(&uobj->event_list); cq = file->device->ib_dev->create_cq(file->device->ib_dev, cmd.cqe, @@ -685,8 +686,9 @@ ssize_t ib_uverbs_destroy_cq(struct ib_u struct ib_uverbs_destroy_cq cmd; struct ib_uverbs_destroy_cq_resp resp; struct ib_cq *cq; - struct ib_uevent_object *uobj; - struct ib_uverbs_async_event *evt, *tmp; + struct ib_ucq_object *uobj; + struct ib_uverbs_event *evt, *tmp, *dead_evt; + u64 user_handle; int ret = -EINVAL; if (copy_from_user(&cmd, buf, sizeof cmd)) @@ -700,7 +702,8 @@ ssize_t ib_uverbs_destroy_cq(struct ib_u if (!cq || cq->uobject->context != file->ucontext) goto out; - uobj = container_of(cq->uobject, struct ib_uevent_object, uobject); + user_handle = cq->uobject->user_handle; + uobj = container_of(cq->uobject, struct ib_ucq_object, uobject); ret = ib_destroy_cq(cq); if (ret) @@ -712,6 +715,31 @@ ssize_t ib_uverbs_destroy_cq(struct ib_u list_del(&uobj->uobject.list); spin_unlock_irq(&file->ucontext->lock); + if (cmd.dead_event) { + dead_evt = kmalloc(sizeof *dead_evt, GFP_KERNEL); + if (dead_evt) { + dead_evt->desc.comp.cq_handle = user_handle; + dead_evt->desc.comp.is_dead = 1; + dead_evt->desc.comp.reserved = 0; + } + } else + dead_evt = NULL; + + spin_lock_irq(&file->comp_file[0].lock); + + list_for_each_entry_safe(evt, tmp, &uobj->comp_list, obj_list) { + list_del(&evt->list); + kfree(evt); + } + + if (dead_evt) { + list_add_tail(&dead_evt->list, &file->comp_file[0].event_list); + 
wake_up_interruptible(&file->comp_file[0].poll_wait); + kill_fasync(&file->comp_file[0].async_queue, SIGIO, POLL_IN); + } + + spin_unlock_irq(&file->comp_file[0].lock); + spin_lock_irq(&file->async_file.lock); list_for_each_entry_safe(evt, tmp, &uobj->event_list, obj_list) { list_del(&evt->list); @@ -955,7 +983,7 @@ ssize_t ib_uverbs_destroy_qp(struct ib_u struct ib_uverbs_destroy_qp_resp resp; struct ib_qp *qp; struct ib_uevent_object *uobj; - struct ib_uverbs_async_event *evt, *tmp; + struct ib_uverbs_event *evt, *tmp; int ret = -EINVAL; if (copy_from_user(&cmd, buf, sizeof cmd)) @@ -1193,7 +1221,7 @@ ssize_t ib_uverbs_destroy_srq(struct ib_ struct ib_uverbs_destroy_srq_resp resp; struct ib_srq *srq; struct ib_uevent_object *uobj; - struct ib_uverbs_async_event *evt, *tmp; + struct ib_uverbs_event *evt, *tmp; int ret = -EINVAL; if (copy_from_user(&cmd, buf, sizeof cmd)) --- infiniband/core/uverbs.h (revision 3319) +++ infiniband/core/uverbs.h (working copy) @@ -76,21 +76,26 @@ struct ib_uverbs_file { struct ib_uverbs_event_file comp_file[1]; }; -struct ib_uverbs_async_event { - struct ib_uverbs_async_event_desc desc; +struct ib_uverbs_event { + union { + struct ib_uverbs_async_event_desc async; + struct ib_uverbs_comp_event_desc comp; + } desc; struct list_head list; struct list_head obj_list; u32 *counter; }; -struct ib_uverbs_comp_event { - struct ib_uverbs_comp_event_desc desc; - struct list_head list; +struct ib_uevent_object { + struct ib_uobject uobject; + struct list_head event_list; + u32 events_reported; }; -struct ib_uevent_object { +struct ib_ucq_object { struct ib_uobject uobject; struct list_head event_list; + struct list_head comp_list; u32 events_reported; }; --- infiniband/core/uverbs_main.c (revision 3319) +++ infiniband/core/uverbs_main.c (working copy) @@ -128,7 +128,7 @@ static int ib_dealloc_ucontext(struct ib idr_remove(&ib_uverbs_cq_idr, uobj->id); ib_destroy_cq(cq); list_del(&uobj->list); - kfree(container_of(uobj, struct ib_uevent_object, 
uobject)); + kfree(container_of(uobj, struct ib_ucq_object, uobject)); } list_for_each_entry_safe(uobj, tmp, &context->srq_list, list) { @@ -182,9 +182,8 @@ static ssize_t ib_uverbs_event_read(stru size_t count, loff_t *pos) { struct ib_uverbs_event_file *file = filp->private_data; - struct ib_uverbs_async_event *async_evt = NULL; + struct ib_uverbs_event *event; u32 *counter = NULL; - void *event; int eventsz; int ret = 0; @@ -209,19 +208,17 @@ static ssize_t ib_uverbs_event_read(stru return -ENODEV; } + event = list_entry(file->event_list.next, struct ib_uverbs_event, list); + if (file->is_async) { - async_evt = list_entry(file->event_list.next, - struct ib_uverbs_async_event, list); - event = async_evt; - eventsz = sizeof *async_evt; - counter = async_evt->counter; + eventsz = sizeof (struct ib_uverbs_async_event_desc); + counter = event->counter; if (counter) ++*counter; } else { - event = list_entry(file->event_list.next, - struct ib_uverbs_comp_event, list); eventsz = sizeof (struct ib_uverbs_comp_event_desc); + counter = NULL; } if (eventsz > count) { @@ -229,8 +226,8 @@ static ssize_t ib_uverbs_event_read(stru event = NULL; } else { list_del(file->event_list.next); - if (counter) - list_del(&async_evt->obj_list); + if (counter || (!file->is_async && !event->desc.comp.is_dead)) + list_del(&event->obj_list); } spin_unlock_irq(&file->lock); @@ -267,16 +264,13 @@ static unsigned int ib_uverbs_event_poll static void ib_uverbs_event_release(struct ib_uverbs_event_file *file) { - struct list_head *entry, *tmp; + struct ib_uverbs_event *entry, *tmp; spin_lock_irq(&file->lock); if (file->fd != -1) { file->fd = -1; - list_for_each_safe(entry, tmp, &file->event_list) - if (file->is_async) - kfree(list_entry(entry, struct ib_uverbs_async_event, list)); - else - kfree(list_entry(entry, struct ib_uverbs_comp_event, list)); + list_for_each_entry_safe(entry, tmp, &file->event_list, list) + kfree(entry); } spin_unlock_irq(&file->lock); } @@ -314,18 +308,24 @@ static struct 
file_operations uverbs_eve void ib_uverbs_comp_handler(struct ib_cq *cq, void *cq_context) { - struct ib_uverbs_file *file = cq_context; - struct ib_uverbs_comp_event *entry; - unsigned long flags; + struct ib_uverbs_file *file = cq_context; + struct ib_ucq_object *uobj; + struct ib_uverbs_event *entry; + unsigned long flags; entry = kmalloc(sizeof *entry, GFP_ATOMIC); if (!entry) return; - entry->desc.cq_handle = cq->uobject->user_handle; + uobj = container_of(cq->uobject, struct ib_ucq_object, uobject); + + entry->desc.comp.cq_handle = cq->uobject->user_handle; + entry->desc.comp.is_dead = 0; + entry->desc.comp.reserved = 0; spin_lock_irqsave(&file->comp_file[0].lock, flags); list_add_tail(&entry->list, &file->comp_file[0].event_list); + list_add_tail(&entry->obj_list, &uobj->comp_list); spin_unlock_irqrestore(&file->comp_file[0].lock, flags); wake_up_interruptible(&file->comp_file[0].poll_wait); @@ -337,16 +337,16 @@ static void ib_uverbs_async_handler(stru struct list_head *obj_list, u32 *counter) { - struct ib_uverbs_async_event *entry; + struct ib_uverbs_event *entry; unsigned long flags; entry = kmalloc(sizeof *entry, GFP_ATOMIC); if (!entry) return; - entry->desc.element = element; - entry->desc.event_type = event; - entry->counter = counter; + entry->desc.async.element = element; + entry->desc.async.event_type = event; + entry->counter = counter; spin_lock_irqsave(&file->async_file.lock, flags); list_add_tail(&entry->list, &file->async_file.event_list); @@ -360,10 +360,10 @@ static void ib_uverbs_async_handler(stru void ib_uverbs_cq_event_handler(struct ib_event *event, void *context_ptr) { - struct ib_uevent_object *uobj; + struct ib_ucq_object *uobj; uobj = container_of(event->element.cq->uobject, - struct ib_uevent_object, uobject); + struct ib_ucq_object, uobject); ib_uverbs_async_handler(context_ptr, uobj->uobject.user_handle, event->event, &uobj->event_list, --- infiniband/include/rdma/ib_user_verbs.h (revision 3319) +++ 
infiniband/include/rdma/ib_user_verbs.h (working copy) @@ -102,6 +102,8 @@ struct ib_uverbs_async_event_desc { struct ib_uverbs_comp_event_desc { __u64 cq_handle; + __u32 is_dead; + __u32 reserved; }; /* @@ -294,6 +296,7 @@ struct ib_uverbs_create_cq_resp { struct ib_uverbs_destroy_cq { __u64 response; __u32 cq_handle; + __u32 dead_event; }; struct ib_uverbs_destroy_cq_resp { From rolandd at cisco.com Tue Sep 6 08:56:52 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 08:56:52 -0700 Subject: [openib-general] RE: SDP message loss References: Message-ID: <524q8yxk8r.fsf@cisco.com> Rajib> we are writing large number of messages. sometime, some of Rajib> the messages are lost i.e not received by sink at all. Can you post the application that has this problem? The easiest way for us to debug this is if we can reproduce the problem on our own machines. Thanks, Roland From rolandd at cisco.com Tue Sep 6 08:56:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 08:56:55 -0700 Subject: [openib-general] [PATCH] libibverbs handling for stale CQ events (was: ibv_get_async_event) References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> Message-ID: <52hdcyw5o8.fsf@cisco.com> Here's the libibverbs part of MST's idea for stale CQ event handling. 
--- libibverbs/include/infiniband/verbs.h (revision 3296) +++ libibverbs/include/infiniband/verbs.h (working copy) @@ -503,6 +503,7 @@ struct ibv_cq { pthread_mutex_t mutex; pthread_cond_t cond; uint32_t events_completed; + int need_dead_event; }; struct ibv_ah { @@ -545,6 +546,7 @@ struct ibv_context_ops { int (*req_notify_cq)(struct ibv_cq *cq, int solicited); void (*cq_event)(struct ibv_cq *cq); int (*destroy_cq)(struct ibv_cq *cq); + int (*free_cq)(struct ibv_cq *cq); struct ibv_srq * (*create_srq)(struct ibv_pd *pd, struct ibv_srq_init_attr *srq_init_attr); int (*modify_srq)(struct ibv_srq *srq, @@ -677,14 +679,32 @@ extern struct ibv_cq *ibv_create_cq(stru /** * ibv_destroy_cq - Destroy a completion queue + * @cq: CQ to destroy. + * + * If ibv_req_notify_cq() has ever been called for @cq, then the + * CQ structure will not be freed until ibv_get_cq_event() has + * returned a "dead CQ" event (ie is_dead != 0) for this CQ. + * + * Calling ibv_destroy_cq() from one thread while another thread calls + * ibv_req_notify_cq() for the same CQ will lead to unpredictable + * results. Don't do that. */ extern int ibv_destroy_cq(struct ibv_cq *cq); /** * ibv_get_cq_event - Read next CQ event + * @context: Context to get CQ event for + * @comp_num: Index of completion event to check. Must be >= 0 and + * <= context->num_comp. + * @cq: Used to return pointer to CQ. + * @cq_context: Used to return consumer-supplied CQ context. + * @is_dead: If non-zero, then the event indicates that the CQ has + * been destroyed and no more completion events will be generated + * for the CQ. 
*/ extern int ibv_get_cq_event(struct ibv_context *context, int comp_num, - struct ibv_cq **cq, void **cq_context); + struct ibv_cq **cq, void **cq_context, + int *is_dead); /** @@ -710,6 +730,7 @@ static inline int ibv_poll_cq(struct ibv */ static inline int ibv_req_notify_cq(struct ibv_cq *cq, int solicited) { + cq->need_dead_event = 1; return cq->context->ops.req_notify_cq(cq, solicited); } --- libibverbs/include/infiniband/kern-abi.h (revision 3296) +++ libibverbs/include/infiniband/kern-abi.h (working copy) @@ -105,8 +105,14 @@ struct ibv_kern_async_event { __u32 reserved; }; +struct ibv_comp_event_v1 { + __u64 cq_handle; +}; + struct ibv_comp_event { __u64 cq_handle; + __u32 is_dead; + __u32 reserved; }; /* @@ -332,6 +338,7 @@ struct ibv_destroy_cq { __u16 out_words; __u64 response; __u32 cq_handle; + __u32 dead_event; }; struct ibv_destroy_cq_resp { --- libibverbs/src/verbs.c (revision 3296) +++ libibverbs/src/verbs.c (working copy) @@ -42,6 +42,17 @@ #include "ibverbs.h" +/* + * Keep a dead CQ list in userspace to simulate the dead CQ completion + * event introduced with ABI version 2. This is slightly racy but it + * at least allows us to keep the same library API. 
+ */ +static pthread_mutex_t cq_dead_mutex = PTHREAD_MUTEX_INITIALIZER; +static struct ibv_dead_cq { + struct ibv_cq *cq; + struct ibv_dead_cq *next; +} *cq_dead_list = NULL; + int ibv_query_device(struct ibv_context *context, struct ibv_device_attr *device_attr) { @@ -110,6 +121,7 @@ struct ibv_cq *ibv_create_cq(struct ibv_ cq->context = context; cq->cq_context = cq_context; cq->events_completed = 0; + cq->need_dead_event = 0; pthread_mutex_init(&cq->mutex, NULL); pthread_cond_init(&cq->cond, NULL); } @@ -119,25 +131,89 @@ struct ibv_cq *ibv_create_cq(struct ibv_ int ibv_destroy_cq(struct ibv_cq *cq) { - return cq->context->ops.destroy_cq(cq); + struct ibv_dead_cq *dead; + int need_dead_event = cq->need_dead_event; + int ret; + + ret = cq->context->ops.destroy_cq(cq); + + if (abi_ver == 1 && !ret && need_dead_event) { + pthread_mutex_lock(&cq_dead_mutex); + dead = malloc(sizeof *dead); + if (dead) { + dead->cq = cq; + dead->next = cq_dead_list; + cq_dead_list = dead; + } + pthread_mutex_unlock(&cq_dead_mutex); + } + + return ret; } +int ibv_get_cq_event_v1(struct ibv_context *context, int comp_num, + struct ibv_cq **cq, void **cq_context) +{ + struct ibv_dead_cq *dead; + struct ibv_comp_event_v1 ev; + + /* + * Check if there are any synthetic "dead CQ" events before + * trying to read our file descriptor. There are some race + * windows here -- for example, we might enter this function + * too early to see a dead CQ event and then block forever if + * no real CQ events are generated later.
+ */ + + pthread_mutex_lock(&cq_dead_mutex); + if (cq_dead_list != NULL) { + *cq = cq_dead_list->cq; + *cq_context = (*cq)->cq_context; + dead = cq_dead_list; + cq_dead_list = dead->next; + free(dead); + pthread_mutex_unlock(&cq_dead_mutex); + + (*cq)->context->ops.free_cq(*cq); + return 0; + } + pthread_mutex_unlock(&cq_dead_mutex); + + if (read(context->cq_fd[comp_num], &ev, sizeof ev) != sizeof ev) + return -1; + + *cq = (struct ibv_cq *) (uintptr_t) ev.cq_handle; + *cq_context = (*cq)->cq_context; + + if ((*cq)->context->ops.cq_event) + (*cq)->context->ops.cq_event(*cq); + + return 0; +} int ibv_get_cq_event(struct ibv_context *context, int comp_num, - struct ibv_cq **cq, void **cq_context) + struct ibv_cq **cq, void **cq_context, int *is_dead) { struct ibv_comp_event ev; if (comp_num < 0 || comp_num >= context->num_comp) return -1; + if (abi_ver == 1) { + *is_dead = 0; + return ibv_get_cq_event_v1(context, comp_num, cq, cq_context); + } + if (read(context->cq_fd[comp_num], &ev, sizeof ev) != sizeof ev) return -1; *cq = (struct ibv_cq *) (uintptr_t) ev.cq_handle; *cq_context = (*cq)->cq_context; + *is_dead = ev.is_dead; - if ((*cq)->context->ops.cq_event) + if (ev.is_dead) + (*cq)->context->ops.free_cq(*cq); + else if ((*cq)->context->ops.cq_event) + (*cq)->context->ops.cq_event(*cq); return 0; --- libibverbs/src/cmd.c (revision 3296) +++ libibverbs/src/cmd.c (working copy) @@ -297,7 +297,8 @@ int ibv_cmd_destroy_cq(struct ibv_cq *cq return ibv_cmd_destroy_cq_v1(cq); IBV_INIT_CMD_RESP(&cmd, sizeof cmd, DESTROY_CQ, &resp, sizeof resp); - cmd.cq_handle = cq->handle; + cmd.cq_handle = cq->handle; + cmd.dead_event = cq->need_dead_event; if (write(cq->context->cmd_fd, &cmd, sizeof cmd) != sizeof cmd) return errno; --- libibverbs/ChangeLog (revision 3296) +++ libibverbs/ChangeLog (working copy) @@ -1,3 +1,13 @@ +2005-09-04 Roland Dreier + + * include/infiniband/verbs.h, include/infiniband/kern-abi.h, + src/verbs.c, src/cmd.c, examples/rc_pingpong.c, +
examples/srq_pingpong.c, examples/uc_pingpong.c, + examples/ud_pingpong.c: Update to handle new kernel ABI for + avoiding stale completion events. Change ibv_get_cq_event() to + return "dead CQ" events, and synthesize these events for old + kernels. + 2005-08-31 Roland Dreier * include/infiniband/kern-abi.h, include/infiniband/verbs.h, --- libibverbs/examples/rc_pingpong.c (revision 3296) +++ libibverbs/examples/rc_pingpong.c (working copy) @@ -604,12 +604,18 @@ int main(int argc, char *argv[]) if (use_event) { struct ibv_cq *ev_cq; void *ev_ctx; + int dead; - if (ibv_get_cq_event(ctx->context, 0, &ev_cq, &ev_ctx)) { + if (ibv_get_cq_event(ctx->context, 0, &ev_cq, &ev_ctx, &dead)) { fprintf(stderr, "Failed to get cq_event\n"); return 1; } + if (dead) { + fprintf(stderr, "Unexpected CQ dead event\n"); + return 1; + } + if (ev_cq != ctx->cq) { fprintf(stderr, "CQ event for unknown CQ %p\n", ev_cq); return 1; --- libibverbs/examples/srq_pingpong.c (revision 3296) +++ libibverbs/examples/srq_pingpong.c (working copy) @@ -677,12 +677,18 @@ int main(int argc, char *argv[]) if (use_event) { struct ibv_cq *ev_cq; void *ev_ctx; + int dead; - if (ibv_get_cq_event(ctx->context, 0, &ev_cq, &ev_ctx)) { + if (ibv_get_cq_event(ctx->context, 0, &ev_cq, &ev_ctx, &dead)) { fprintf(stderr, "Failed to get cq_event\n"); return 1; } + if (dead) { + fprintf(stderr, "Unexpected CQ dead event\n"); + return 1; + } + if (ev_cq != ctx->cq) { fprintf(stderr, "CQ event for unknown CQ %p\n", ev_cq); return 1; --- libibverbs/examples/uc_pingpong.c (revision 3296) +++ libibverbs/examples/uc_pingpong.c (working copy) @@ -596,12 +596,18 @@ int main(int argc, char *argv[]) if (use_event) { struct ibv_cq *ev_cq; void *ev_ctx; + int dead; - if (ibv_get_cq_event(ctx->context, 0, &ev_cq, &ev_ctx)) { + if (ibv_get_cq_event(ctx->context, 0, &ev_cq, &ev_ctx, &dead)) { fprintf(stderr, "Failed to get cq_event\n"); return 1; } + if (dead) { + fprintf(stderr, "Unexpected CQ dead event\n"); + return 1; + } + if 
(ev_cq != ctx->cq) { fprintf(stderr, "CQ event for unknown CQ %p\n", ev_cq); return 1; --- libibverbs/examples/ud_pingpong.c (revision 3296) +++ libibverbs/examples/ud_pingpong.c (working copy) @@ -600,12 +600,18 @@ int main(int argc, char *argv[]) if (use_event) { struct ibv_cq *ev_cq; void *ev_ctx; + int dead; - if (ibv_get_cq_event(ctx->context, 0, &ev_cq, &ev_ctx)) { + if (ibv_get_cq_event(ctx->context, 0, &ev_cq, &ev_ctx, &dead)) { fprintf(stderr, "Failed to get cq_event\n"); return 1; } + if (dead) { + fprintf(stderr, "Unexpected CQ dead event\n"); + return 1; + } + if (ev_cq != ctx->cq) { fprintf(stderr, "CQ event for unknown CQ %p\n", ev_cq); return 1; From rolandd at cisco.com Tue Sep 6 08:56:56 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 08:56:56 -0700 Subject: [openib-general] [PATCH] libmthca stale CQ event handling (was: ibv_get_async_event) References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> Message-ID: <52br36w5o7.fsf@cisco.com> Finally, the simple libmthca changes. We just need to split up destroy_cq and free_cq so that libibverbs can wait to actually free a CQ structure. 
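To make the destroy/free split concrete, here is a minimal single-threaded simulation in C. It is a sketch of the idea, not the libmthca or libibverbs code: the names `sim_cq`, `push_event`, `destroy_cq`, `free_cq`, `get_cq_event` and `run_teardown_demo` are hypothetical, a linked list stands in for the completion event file descriptor, and `destroy_cq()` plays both the kernel role (queueing the dead-CQ event) and the library role. It also assumes that only one thread consumes events.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a userspace CQ (not struct ibv_cq) */
struct sim_cq {
	int id;			/* stands in for the consumer's cq_context */
	int need_dead_event;	/* set once notification has been requested */
};

struct sim_event {
	struct sim_cq *cq;
	int is_dead;
	struct sim_event *next;
};

/* FIFO standing in for the completion event file descriptor */
static struct sim_event *ev_head, *ev_tail;

static void push_event(struct sim_cq *cq, int is_dead)
{
	struct sim_event *ev = malloc(sizeof *ev);

	if (!ev)
		abort();
	ev->cq = cq;
	ev->is_dead = is_dead;
	ev->next = NULL;
	if (ev_tail)
		ev_tail->next = ev;
	else
		ev_head = ev;
	ev_tail = ev;
}

static void free_cq(struct sim_cq *cq)
{
	free(cq);
}

/* Phase 1: tear down device state; defer the free if events may be queued */
static int destroy_cq(struct sim_cq *cq)
{
	/* ...hardware/kernel resources would be released here... */
	if (cq->need_dead_event)
		push_event(cq, 1);	/* queued behind any stale completion events */
	else
		free_cq(cq);
	return 0;
}

/* Phase 2: dequeue one event; free the CQ only when its dead event arrives */
static int get_cq_event(int *cq_id, int *is_dead)
{
	struct sim_event *ev = ev_head;

	if (!ev)
		return -1;	/* a real implementation would block here */
	ev_head = ev->next;
	if (!ev_head)
		ev_tail = NULL;

	*cq_id = ev->cq->id;	/* safe: the CQ is not freed before its dead event */
	*is_dead = ev->is_dead;
	if (ev->is_dead)
		free_cq(ev->cq);
	free(ev);
	return 0;
}

/* Scenario from the patch: a completion event is still queued at destroy time */
int run_teardown_demo(int *completions, int *dead_events)
{
	struct sim_cq *cq = calloc(1, sizeof *cq);
	int id, is_dead;

	if (!cq)
		return -1;
	*completions = 0;
	*dead_events = 0;
	cq->id = 7;
	cq->need_dead_event = 1;	/* as if notification had been requested */

	push_event(cq, 0);		/* stale completion event */
	destroy_cq(cq);			/* cq must stay allocated until its dead event */

	while (get_cq_event(&id, &is_dead) == 0) {
		assert(id == 7);
		if (is_dead)
			++*dead_events;
		else
			++*completions;
	}
	assert(ev_head == NULL && ev_tail == NULL);	/* queue fully drained */
	return 0;
}
```

Calling `run_teardown_demo()` delivers the stale completion first while the CQ is still allocated, then the dead event, which is the point at which the structure is freed. Without the deferral, `destroy_cq()` would free the CQ while the queued completion still points at it. The multi-threaded concern raised in this thread is not addressed by the sketch: with two consumers, one could dequeue the dead event and free the CQ while another is still handling an earlier completion for it.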
--- libmthca/src/verbs.c (revision 3296) +++ libmthca/src/verbs.c (working copy) @@ -247,9 +247,16 @@ int mthca_destroy_cq(struct ibv_cq *cq) int ret; ret = ibv_cmd_destroy_cq(cq); - if (ret) + if (ret || cq->need_dead_event) return ret; + mthca_free_cq(cq); + + return 0; +} + +void mthca_free_cq(struct ibv_cq *cq) +{ if (mthca_is_memfree(cq->context)) { mthca_free_db(to_mctx(cq->context)->db_tab, MTHCA_DB_TYPE_CQ_SET_CI, to_mcq(cq)->set_ci_db_index); @@ -261,8 +268,6 @@ int mthca_destroy_cq(struct ibv_cq *cq) free(to_mcq(cq)->buf); free(to_mcq(cq)); - - return 0; } static int align_queue_size(struct ibv_context *context, int size, int spare) --- libmthca/src/mthca.h (revision 3296) +++ libmthca/src/mthca.h (working copy) @@ -299,6 +299,7 @@ extern int mthca_dereg_mr(struct ibv_mr extern struct ibv_cq *mthca_create_cq(struct ibv_context *context, int cqe); extern int mthca_destroy_cq(struct ibv_cq *cq); +extern void mthca_free_cq(struct ibv_cq *cq); extern int mthca_poll_cq(struct ibv_cq *cq, int ne, struct ibv_wc *wc); extern int mthca_tavor_arm_cq(struct ibv_cq *cq, int solicited); extern int mthca_arbel_arm_cq(struct ibv_cq *cq, int solicited); --- libmthca/src/mthca.c (revision 3296) +++ libmthca/src/mthca.c (working copy) @@ -108,6 +108,7 @@ static struct ibv_context_ops mthca_ctx_ .create_cq = mthca_create_cq, .poll_cq = mthca_poll_cq, .destroy_cq = mthca_destroy_cq, + .free_cq = mthca_free_cq, .create_srq = mthca_create_srq, .destroy_srq = mthca_destroy_srq, .create_qp = mthca_create_qp, --- libmthca/ChangeLog (revision 3296) +++ libmthca/ChangeLog (working copy) @@ -1,3 +1,10 @@ +2005-09-04 Roland Dreier + + * src/mthca.c, src/mthca.h, src/verbs.c: Update for new kernel + ABI, which generates synthetic "CQ dead" events. To handle this + we need to split freeing the userspace CQ from the destroy CQ + operation. 
+ 2005-08-31 Roland Dreier * src/memfree.c (mthca_free_db): When we free a doorbell record, From jlentini at netapp.com Tue Sep 6 09:00:39 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 6 Sep 2005 12:00:39 -0400 (EDT) Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: <20050906144344.GO19358@mellanox.co.il> References: <20050906144344.GO19358@mellanox.co.il> Message-ID: On Tue, 6 Sep 2005, Michael S. Tsirkin wrote: > Quoting r. James Lentini : > > Subject: [mstflint] firmware upgrade instructions > > > > > > Hi Michael, > > > > I'm guessing that you are the maintainer of mstflint. Two questions: > > What is the difference between mstflint and tvflash? > > I didnt personally use tvflash. I think this tool is specific for > topspin cards. I think you'll need to use it if you are using the > topspin gen1 driver. If tvflash requires the gen1 stack, why is it located at https://openib.org/svn/gen2/trunk/src/userspace/tvflash/ ^^^^ Is it in the wrong place? > For mellanox cards, you can also use the mlxburn tool, part of the > ibadm package in mellanox ib gold. > > > Using mstflint, how can the firmware located on the Mellanox website: > > > > http://www.mellanox.com/products/firmware.html > > > > be used to upgrade an HCA? > > James, .mlx is a generic firmware common for all boards based > on mellanox silicon. > Therefore, to get a firmware image that flint can burn, > in addition to the .mlx file, you also need a .brd file > that matches the board you have. > > For mellanox cards you can usually figure that our by the board id, > which is reported by running "mstflint -d q" > or by looking at board id in /sys/class/infiniband/mthcaX/board_id > > Once you have .mlx and .brd, compile these into a firmware image > specific for your board. This can be done by imgen tools that I have just > uploaded to the src/userspace/imgen directory. > > Pls look at imgen/README file that explains how to do it. > > HTH It is a big help. 
I'll add this information to the Wiki. From danb at voltaire.com Tue Sep 6 09:02:18 2005 From: danb at voltaire.com (Dan Bar Dov) Date: Tue, 6 Sep 2005 19:02:18 +0300 Subject: [openib-general] RE: [PATCH] iSER: Make iser depend on kdapl Message-ID: > > Dan> Merged. Also removed SCSI dependency, ISER really depends on > Dan> ISCSI, but that is not in the kernel yet. > > Shouldn't iSER depend on something having to do with SCSI? Otherwise > it's too easy to create configurations that don't compile. Yes, iSER will depend on ISCSI_ISER, as well as on IB core (currently kDAPL). ISCSI_ISER is the parallel of ISCSI_TCP, and it depends on SCSI and SCSI_ISCSI_ATTRS. The ISCSI_ISER is submitted through open-iscsi. It is also possible to merge ISCSI_ISER and ISER and then it will depend directly on SCSI and the rest. Dan > > For example, the latest iSCSI submission that I saw had: > > config ISCSI_TCP > tristate "iSCSI Initiator over TCP/IP" > depends on SCSI && INET && SCSI_ISCSI_ATTRS > > - R. > From halr at voltaire.com Tue Sep 6 09:06:11 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 12:06:11 -0400 Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: References: <20050906144344.GO19358@mellanox.co.il> Message-ID: <1126022770.4406.21.camel@hal.voltaire.com> On Tue, 2005-09-06 at 12:00, James Lentini wrote: > If tvflash requires the gen1 stack, why is it located at > > https://openib.org/svn/gen2/trunk/src/userspace/tvflash/ > ^^^^ > Is it in the wrong place? No. It works with gen2. 
-- Hal From mshefty at ichips.intel.com Tue Sep 6 09:13:31 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 06 Sep 2005 09:13:31 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <20050904141437.GT1707@mellanox.co.il> References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> Message-ID: <431DC02B.6010506@ichips.intel.com> Michael S. Tsirkin wrote: > If thats what we are trying to do, I'd like to propose another idea: > when cq is destroyed, and after all cq events are queued to user, > put a special "cq destroyed" event into the event queue. > When a user calls destroy_cq, and after releasing kernel/hardware resources, > move the cq structure to a special cleanup list, userspace library will > only destroy the userspace cq structure when it gets this event. I think that this will only work if users are using a single thread to poll for events. I don't think that we want to impose such a restriction. - Sean From liran at mellanox.co.il Tue Sep 6 09:15:26 2005 From: liran at mellanox.co.il (Liran Sorani) Date: Tue, 6 Sep 2005 19:15:26 +0300 Subject: [openib-general] Re: [PATCHv2] OpenSM: OpenIB vendor layer: Implement osm_vendor_d elete Message-ID: <506C3D7B14CDD411A52C00025558DED608B3642E@mtlex01.yok.mtl.com> > Hal Rosenstock wrote: > > On Tue, 2005-09-06 at 11:04, Eitan Zahavi wrote: > > > >Hal Rosenstock wrote: > > > >>[same patch just generated with diff -up] > >> > >>OpenSM: OpenIB vendor layer: Implement osm_vendor_delete > >> > >>[I've done some testing of this; are there any regressions for this ?] > >> > >>OpenSM call osm_vendor_delete during osm_opensm_destroy. > >>It is invoked during exit. 
> > > > > > Actually, it should have said "Implement osm_vendor_unbind" rather than > > delete. > Well the semantics are very old. We stick to the old osm_vendor_api.h. Maybe we should not have. > But now we have too much depending on this API that I urge you not to modify it if possible. > > > > OpenSM calls it in a similar place (as it stops the SA and SM MAD > > controllers). I went through this starting and stopping the OpenSM a > > number of times although did not do this "infinitely". > > > > I was asking about any other explicit regressions. > No we do not have tests for the osm_vendor_api.h. It might have been a good idea. > We do have now some stuff for the Windows code. Maybe we should try and make a test suite > from it. Liran - what do you think ? I think that the vendor api should be tested with several bad flows (under OpenSM / Osmtest we use only the good flow ) , althrough not tested , I'm guiding a verficator (Yoav) to write directly over umad api test that send / receive SMP / GSI packets . Liran. > > > > -- Hal -------------- next part -------------- An HTML attachment was scrubbed... URL: From rolandd at cisco.com Tue Sep 6 09:15:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 09:15:11 -0700 Subject: [openib-general] RE: [openib-commits] r3137 -gen2/trunk/src/linux-kernel/infiniband/ulp/ipoib References: <5CE025EE7D88BA4599A2C8FEFCF226F5175BDA@taurus.voltaire.com> <1125712580.4398.8182.camel@hal.voltaire.com> <52d5nq51s7.fsf@cisco.com> <1126021794.4406.3.camel@hal.voltaire.com> Message-ID: <52ek82uq9c.fsf@cisco.com> Hal> Maybe it is just the interface name (w/ifconfig) that has Hal> it's PKey promoted to a full member of the partition then. Yes, that is what happens. - R. 
From halr at voltaire.com Tue Sep 6 09:11:44 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 12:11:44 -0400 Subject: [openib-general] Re: at won't compile with gcc-2.95 In-Reply-To: <52y86aw5oa.fsf@cisco.com> References: <52slwo91lm.fsf@cisco.com> <1125595270.4398.215.camel@hal.voltaire.com> <527je08zi2.fsf@cisco.com> <1125702531.4398.8121.camel@hal.voltaire.com> <52y86aw5oa.fsf@cisco.com> Message-ID: <1126023104.4406.30.camel@hal.voltaire.com> On Tue, 2005-09-06 at 11:56, Roland Dreier wrote: > Hal> Can you try the following patch with gcc 2.95 and let me know > Hal> if this works ? > > No, that doesn't help. However adding a space before the last comma > and removing the space before the '...' does build with gcc 2.95. Thanks. Applied. -- Hal From mshefty at ichips.intel.com Tue Sep 6 09:17:17 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 06 Sep 2005 09:17:17 -0700 Subject: [openib-general] Re: MgtWG RMPP Answers In-Reply-To: <1125836964.4398.9193.camel@hal.voltaire.com> References: <1125836964.4398.9193.camel@hal.voltaire.com> Message-ID: <431DC10D.6030606@ichips.intel.com> Hal Rosenstock wrote: > 2. The middle payload lengths are not necessary to be set to 0. Should > that be backed out ? I'm fine either way on this. There isn't much overhead to set them to 0. - Sean From rolandd at cisco.com Tue Sep 6 09:25:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 09:25:11 -0700 Subject: [openib-general] Re: [mstflint] firmware upgrade instructions References: <20050906144344.GO19358@mellanox.co.il> Message-ID: <52ek82tb88.fsf@cisco.com> James> If tvflash requires the gen1 stack, why is it located at It doesn't require the gen1 stack or any stack at all for that matter. - R. 
From rolandd at cisco.com Tue Sep 6 09:25:12 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 09:25:12 -0700 Subject: [openib-general] RE: [PATCH] iSER: Make iser depend on kdapl References: Message-ID: <528xyatb87.fsf@cisco.com> Dan> Yes, iSER will depend on ISCSI_ISER, as well as on IB core Dan> (currently kDAPL). ISCSI_ISER is the parallel of ISCSI_TCP, Dan> and it depends on SCSI and SCSI_ISCSI_ATTRS. The ISCSI_ISER Dan> is submitted through open-iscsi. It is also possible to merge Dan> ISCSI_ISER and ISER and then it will depend directly on SCSI Dan> and the rest. If it will depend on ISCSI_ISER, why not put the dependency in now? Otherwise we'll get reports of "iser doesn't compile" because of broken .configs. - R. From halr at voltaire.com Tue Sep 6 09:25:12 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 12:25:12 -0400 Subject: [openib-general] Re: MgtWG RMPP Answers In-Reply-To: <431DC10D.6030606@ichips.intel.com> References: <1125836964.4398.9193.camel@hal.voltaire.com> <431DC10D.6030606@ichips.intel.com> Message-ID: <1126023544.4406.43.camel@hal.voltaire.com> On Tue, 2005-09-06 at 12:17, Sean Hefty wrote: > Hal Rosenstock wrote: > > 2. The middle payload lengths are not necessary to be set to 0. Should > > that be backed out ? > > I'm fine either way on this. There isn't much overhead to set them to 0. I prefer to leave it at 0 (unless I hear otherwise from someone for some reason). -- Hal From mst at mellanox.co.il Tue Sep 6 09:32:04 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Sep 2005 19:32:04 +0300 Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: References: <20050906144344.GO19358@mellanox.co.il> Message-ID: <20050906163204.GA30290@mellanox.co.il> Quoting r. James Lentini : > > > I'm guessing that you are the maintainer of mstflint. Two questions: > > > What is the difference between mstflint and tvflash? > > > > I didnt personally use tvflash. 
I think this tool is specific for > > topspin cards. I think you'll need to use it if you are using the > > topspin gen1 driver. > > If tvflash requires the gen1 stack, why is it located at > > https://openib.org/svn/gen2/trunk/src/userspace/tvflash/ > ^^^^ > Is it in the wrong place? No, I meant it the other way around: topspin stack needs tvflash on topspin cards. Roland will know for sure. -- MST From rolandd at cisco.com Tue Sep 6 09:34:00 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 09:34:00 -0700 Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: <20050906163204.GA30290@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 6 Sep 2005 19:32:04 +0300") References: <20050906144344.GO19358@mellanox.co.il> <20050906163204.GA30290@mellanox.co.il> Message-ID: <52zmqqrw93.fsf@cisco.com> Michael> No, I meant it the other way around: topspin stack needs Michael> tvflash on topspin cards. Roland will know for sure. No, that's actually not the case either. The Topspin stack will work with non-Topspin firmware. The only thing you really need tvflash for is setting information used by the bootable HCA firmware. - R. From rolandd at cisco.com Tue Sep 6 09:35:10 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 09:35:10 -0700 Subject: [openib-general] Re: ibv_get_async_event References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> Message-ID: <52u0gyrw75.fsf@cisco.com> Sean> I think that this will only work if users are using a single Sean> thread to poll for events. I don't think that we want to Sean> impose such a restriction. But I think Michael has a point. 
Do we really want to impose the cost of an extra pthread_mutex_lock/unlock for every completion event just so we make sure a pathological app is race-free? - R. From danb at voltaire.com Tue Sep 6 09:36:27 2005 From: danb at voltaire.com (Dan Bar Dov) Date: Tue, 6 Sep 2005 19:36:27 +0300 Subject: [openib-general] RE: [PATCH] iSER: Make iser depend on kdapl Message-ID: > > Dan> Yes, iSER will depend on ISCSI_ISER, as well as on IB core > Dan> (currently kDAPL). ISCSI_ISER is the parallel of ISCSI_TCP, > Dan> and it depends on SCSI and SCSI_ISCSI_ATTRS. The ISCSI_ISER > Dan> is submitted through open-iscsi. It is also possible to merge > Dan> ISCSI_ISER and ISER and then it will depend directly on SCSI > Dan> and the rest. > > If it will depend on ISCSI_ISER, why not put the dependency in now? > Otherwise we'll get reports of "iser doesn't compile" because of > broken .configs. I got it wrong. ISER is a transport provider, ISCSI_ISER will depend on it. ISER depends only on underlying IB. It compiles and loads regardless of ISCSI or SCSI. Dan > > - R. > From mst at mellanox.co.il Tue Sep 6 09:44:31 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Sep 2005 19:44:31 +0300 Subject: [openib-general] Re: Re: [mstflint] firmware upgrade instructions In-Reply-To: <52zmqqrw93.fsf@cisco.com> References: <20050906144344.GO19358@mellanox.co.il> <20050906163204.GA30290@mellanox.co.il> <52zmqqrw93.fsf@cisco.com> Message-ID: <20050906164431.GA30610@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: Re: [mstflint] firmware upgrade instructions > > Michael> No, I meant it the other way around: topspin stack needs > Michael> tvflash on topspin cards. Roland will know for sure. > > No, that's actually not the case either. The Topspin stack will work > with non-Topspin firmware. The only thing you really need tvflash for > is setting information used by the bootable HCA firmware. By bootable HCA firmware do you mean like e.g. PXE? 
-- MST From rolandd at cisco.com Tue Sep 6 09:45:28 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 09:45:28 -0700 Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: <20050906164431.GA30610@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 6 Sep 2005 19:44:31 +0300") References: <20050906144344.GO19358@mellanox.co.il> <20050906163204.GA30290@mellanox.co.il> <52zmqqrw93.fsf@cisco.com> <20050906164431.GA30610@mellanox.co.il> Message-ID: <52psrmrvpz.fsf@cisco.com> Michael> By bootable HCA firmware do you mean like e.g. PXE? Yes, exactly. - R. From mshefty at ichips.intel.com Tue Sep 6 09:46:49 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 06 Sep 2005 09:46:49 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <52u0gyrw75.fsf@cisco.com> References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> <52u0gyrw75.fsf@cisco.com> Message-ID: <431DC7F9.2050307@ichips.intel.com> Roland Dreier wrote: > Sean> I think that this will only work if users are using a single > Sean> thread to poll for events. I don't think that we want to > Sean> impose such a restriction. > > But I think Michael has a point. Do we really want to impose the cost > of an extra pthread_mutex_lock/unlock for every completion event just > so we make sure a pathological app is race-free? An app that simply uses multiple threads for event processing can hit this issue. If we can avoid this overhead for completion processing that would be good, but I don't see a way to do it without imposing some sort of restriction. 
Does the problem go away if we require users to poll for all CQ events after destroying a QP, but before destroying a CQ? - Sean From rolandd at cisco.com Tue Sep 6 09:54:24 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 09:54:24 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <431DC7F9.2050307@ichips.intel.com> (Sean Hefty's message of "Tue, 06 Sep 2005 09:46:49 -0700") References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> <52u0gyrw75.fsf@cisco.com> <431DC7F9.2050307@ichips.intel.com> Message-ID: <52ll2arvb3.fsf@cisco.com> Sean> Does the problem go away if we require users to poll for all Sean> CQ events after destroying a QP, but before destroying a CQ? I don't see how an app could do this. It doesn't know how many CQ events it needs to retrieve, and there could be arbitrarily many events from other CQs to retrieve first. However, this is essentially the same as Michael's scheme, which I implemented. The app destroys the CQ and then retrieves events until it gets the "dead CQ" event. - R. From jlentini at netapp.com Tue Sep 6 09:55:40 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 6 Sep 2005 12:55:40 -0400 (EDT) Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: <20050906144344.GO19358@mellanox.co.il> References: <20050906144344.GO19358@mellanox.co.il> Message-ID: On Tue, 6 Sep 2005, Michael S. 
Tsirkin wrote: > For mellanox cards you can usually figure that our by the board id, > which is reported by running "mstflint -d q" > or by looking at board id in /sys/class/infiniband/mthcaX/board_id What if my board ID doesn't match any of the *.brd file names? # cat /sys/class/infiniband/mthca0/board_id MT_0030000001 # ./mstflint -d `lspci -d 15b3:5a44 | cut -f 1 -d ' '` q Image type: FailSafe Chip rev.: A1 GUID Des: Node Port1 Port2 Sys image GUIDs: 0002c90200003098 0002c90200003099 0002c9020000309a 0002c9020000309b Board ID: (MT_0030000001) > ls -1 fw-23108-rel-3_3_3/*.brd fw-23108-rel-3_3_3/MHX-CE128-T.brd fw-23108-rel-3_3_3/MHX-CE256-T.brd fw-23108-rel-3_3_3/MHX-CE512-T.brd fw-23108-rel-3_3_3/MHXL-CF128-T.brd fw-23108-rel-3_3_3/MHXL-CF256-T.brd fw-23108-rel-3_3_3/MTLP23108_jaguar.brd I know that it is a MHXL-CF128-T though. Is it safe to upgrade with that board file? From mshefty at ichips.intel.com Tue Sep 6 10:04:06 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 06 Sep 2005 10:04:06 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <52ll2arvb3.fsf@cisco.com> References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> <52u0gyrw75.fsf@cisco.com> <431DC7F9.2050307@ichips.intel.com> <52ll2arvb3.fsf@cisco.com> Message-ID: <431DCC06.8050403@ichips.intel.com> Roland Dreier wrote: > Sean> Does the problem go away if we require users to poll for all > Sean> CQ events after destroying a QP, but before destroying a CQ? > > I don't see how an app could do this. It doesn't know how many CQ > events it needs to retrieve, and there could be arbitrarily many > events from other CQs to retrieve first. 
The intent is that after all QPs on a CQ are destroyed and all events are removed, then no new completion events could ever occur on that CQ. Destroying the CQ at this point should now be safe. > However, this is essentially the same as Michael's scheme, which I > implemented. The app destroys the CQ and then retrieves events until > it gets the "dead CQ" event. It's not quite the same. With a destroy event scheme, a call is made to destroy the CQ, but completion events could still be outstanding. I'm proposing delaying the call to destroy the CQ until no more completion events are possible. - Sean From jlentini at netapp.com Tue Sep 6 10:05:53 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 6 Sep 2005 13:05:53 -0400 (EDT) Subject: [openib-general] Re: [PATCH] only build userspace verbs support if requested In-Reply-To: <1125697155.4398.8039.camel@hal.voltaire.com> References: <523bon5fcl.fsf@cisco.com> <1125697155.4398.8039.camel@hal.voltaire.com> Message-ID: On Fri, 2 Sep 2005, Hal Rosenstock wrote: > On Fri, 2005-09-02 at 17:28, Roland Dreier wrote: > > Makes sense I guess... should the help text for INFINIBAND_USER_VERBS > > be rewritten? Actually, should we rename the option to > > INFINIBAND_USER_ACCESS if we're going to make this change? > > I would vote for yes to both if this is to be done. Are the other userspace components useful without uverbs support? I didn't think the umad, ucm, and uat layers would work without the uverbs.
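Returning to the ibv_get_async_event discussion: the "dead CQ" drain scheme Roland describes can be modeled in a few lines of C. This is a minimal single-threaded sketch under stated assumptions: the queue, the event kinds and the functions (q_push, destroy_cq, drain_until_dead) are hypothetical and are not the libibverbs or uverbs API. It only shows why queuing a marker behind the events already queued tells the app when no more events for the destroyed CQ can come out of the queue; it does not address the multithreaded case raised in this thread, where another thread has already dequeued an event but not yet processed it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds; not the uverbs ABI. */
enum ev_type { EV_COMP, EV_CQ_DEAD };

struct ev {
    enum ev_type type;
    int cq_id;
};

#define QLEN 16

/* FIFO standing in for the per-process completion event queue. */
struct ev_queue {
    struct ev buf[QLEN];
    size_t head, tail;
};

static int q_push(struct ev_queue *q, enum ev_type type, int cq_id)
{
    if (q->tail - q->head == QLEN)
        return -1;                          /* queue full */
    q->buf[q->tail % QLEN] = (struct ev){ type, cq_id };
    q->tail++;
    return 0;
}

static int q_pop(struct ev_queue *q, struct ev *out)
{
    if (q->head == q->tail)
        return -1;                          /* queue empty */
    *out = q->buf[q->head % QLEN];
    q->head++;
    return 0;
}

/* Destroy queues a "dead CQ" marker behind every event already queued. */
static int destroy_cq(struct ev_queue *q, int cq_id)
{
    return q_push(q, EV_CQ_DEAD, cq_id);
}

/*
 * Drain until the marker for cq_id is seen. Stale completion events for
 * the destroyed CQ are discarded and counted; events for other CQs are
 * counted in *other (a real app would dispatch them instead).
 * Returns the number of stale events, or -1 if the marker never appeared.
 */
static int drain_until_dead(struct ev_queue *q, int cq_id, int *other)
{
    struct ev e;
    int stale = 0;

    while (q_pop(q, &e) == 0) {
        if (e.cq_id != cq_id) {
            ++*other;
            continue;
        }
        if (e.type == EV_CQ_DEAD)
            return stale;
        ++stale;
    }
    return -1;
}
```

For example, with two completion events queued for CQ 1 and one for CQ 2, calling destroy_cq(&q, 1) and then drain_until_dead(&q, 1, &other) discards the two stale CQ 1 events and passes the CQ 2 event through.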
From rolandd at cisco.com Tue Sep 6 10:08:44 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 10:08:44 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <431DCC06.8050403@ichips.intel.com> (Sean Hefty's message of "Tue, 06 Sep 2005 10:04:06 -0700") References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> <52u0gyrw75.fsf@cisco.com> <431DC7F9.2050307@ichips.intel.com> <52ll2arvb3.fsf@cisco.com> <431DCC06.8050403@ichips.intel.com> Message-ID: <52hdcyrun7.fsf@cisco.com> Sean> The intent is that after all QPs on a CQ are destroyed and Sean> all events are removed, then no new completion events could Sean> ever occur on that CQ. Destroying the CQ at this point Sean> should now be safe. Sure, but how do you know all events are removed? There's still the race of a thread retrieving a completion event and then being delayed arbitrarily long before it gets to process the event. - R. From rolandd at cisco.com Tue Sep 6 10:11:02 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 10:11:02 -0700 Subject: [openib-general] Re: [PATCH] only build userspace verbs support if requested In-Reply-To: (James Lentini's message of "Tue, 6 Sep 2005 13:05:53 -0400 (EDT)") References: <523bon5fcl.fsf@cisco.com> <1125697155.4398.8039.camel@hal.voltaire.com> Message-ID: <52d5nmrujd.fsf@cisco.com> James> Are the other userspace components useful without uverbs James> support? I didn't think the umad, ucm, and uat layers would James> work without the uverbs. Certainly umad is useful without uverbs. For example one can use just umad to run OpenSM. In fact umad predates uverbs by quite a long time. 
Thanks for pointing this out -- certainly we don't want to lump umad and uverbs into the same config option. ucm and uat probably aren't that useful without uverbs. So I guess the final patch should have CONFIG_INFINIBAND_USER_MAD and CONFIG_INFINIBAND_USER_ACCESS as separate config options. - R. From jlentini at netapp.com Tue Sep 6 10:12:13 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 6 Sep 2005 13:12:13 -0400 (EDT) Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: <52ek82tb88.fsf@cisco.com> References: <20050906144344.GO19358@mellanox.co.il> <52ek82tb88.fsf@cisco.com> Message-ID: On Tue, 6 Sep 2005, Roland Dreier wrote: roland> James> If tvflash requires the gen1 stack, why is it located at roland> roland> It doesn't require the gen1 stack or any stack at all for that roland> matter. Is it still supported? From rolandd at cisco.com Tue Sep 6 10:15:44 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 10:15:44 -0700 Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: (James Lentini's message of "Tue, 6 Sep 2005 13:12:13 -0400 (EDT)") References: <20050906144344.GO19358@mellanox.co.il> <52ek82tb88.fsf@cisco.com> Message-ID: <528xyarubj.fsf@cisco.com> James> Is it still supported? I'm not sure what "supported" means precisely, but it should still work fine for the cards it works for. I haven't had time to add the changes required for all the new flash devices used on new HCAs, and given that mstflint also works fine, I'm not making it a high priority. - R. 
From halr at voltaire.com Tue Sep 6 10:36:20 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 13:36:20 -0400 Subject: [openib-general] Re: [PATCH] only build userspace verbs support if requested In-Reply-To: References: <523bon5fcl.fsf@cisco.com> <1125697155.4398.8039.camel@hal.voltaire.com> Message-ID: <1126028180.4406.89.camel@hal.voltaire.com> On Tue, 2005-09-06 at 13:05, James Lentini wrote: > On Fri, 2 Sep 2005, Hal Rosenstock wrote: > > > On Fri, 2005-09-02 at 17:28, Roland Dreier wrote: > > > Makes sense I guess... should the help text for INFINIBAND_USER_VERBS > > > be rewritten? Actually, should we rename the option to > > > INFINIBAND_USER_ACCESS if we're going to make this change? > > > > I would vote for yes to both if this is to be done. > > Are the other userspace components useful without uverbs support? > I didn't think the umad, ucm, and uat layers would work without > the uverbs. umad is useful without uverbs. Not sure ucm or uat are. -- Hal From mst at mellanox.co.il Tue Sep 6 10:41:35 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 6 Sep 2005 20:41:35 +0300 Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: References: <20050906144344.GO19358@mellanox.co.il> Message-ID: <20050906174135.GB30610@mellanox.co.il> Quoting r. James Lentini : > Subject: Re: [mstflint] firmware upgrade instructions > > > > On Tue, 6 Sep 2005, Michael S. Tsirkin wrote: > > > For mellanox cards you can usually figure that our by the board id, > > which is reported by running "mstflint -d q" > > or by looking at board id in /sys/class/infiniband/mthcaX/board_id > > What if my board ID doesn't match any of the *.brd file names? 
> > # cat /sys/class/infiniband/mthca0/board_id > MT_0030000001 > > # ./mstflint -d `lspci -d 15b3:5a44 | cut -f 1 -d ' '` q > Image type: FailSafe > Chip rev.: A1 > GUID Des: Node Port1 Port2 Sys > image > GUIDs: 0002c90200003098 0002c90200003099 0002c9020000309a > 0002c9020000309b > Board ID: (MT_0030000001) > > > ls -1 fw-23108-rel-3_3_3/*.brd > fw-23108-rel-3_3_3/MHX-CE128-T.brd > fw-23108-rel-3_3_3/MHX-CE256-T.brd > fw-23108-rel-3_3_3/MHX-CE512-T.brd > fw-23108-rel-3_3_3/MHXL-CF128-T.brd > fw-23108-rel-3_3_3/MHXL-CF256-T.brd > fw-23108-rel-3_3_3/MTLP23108_jaguar.brd MHX-CE256-T etc. is not the board ID; I think it's called the product number or something. MT_0030000001 is MHXL-CF128-T (Previously: MTLP23108-CF128), right enough. > I know that it is a MHXL-CF128-T though. Is it safe to upgrade with > that board file? > After compiling the image, you can run flint -i q on the image and see that the board ID matches. If it does, it's safe to burn. -- MST From halr at voltaire.com Tue Sep 6 10:42:30 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 13:42:30 -0400 Subject: [openib-general] Re: [PATCH] only build userspace verbs support if requested In-Reply-To: References: <523bon5fcl.fsf@cisco.com> <1125697155.4398.8039.camel@hal.voltaire.com> Message-ID: <1126028235.4406.91.camel@hal.voltaire.com> On Tue, 2005-09-06 at 13:05, James Lentini wrote: > On Fri, 2 Sep 2005, Hal Rosenstock wrote: > > > On Fri, 2005-09-02 at 17:28, Roland Dreier wrote: > > > Makes sense I guess... should the help text for INFINIBAND_USER_VERBS > > > be rewritten? Actually, should we rename the option to > > > INFINIBAND_USER_ACCESS if we're going to make this change? > > > > I would vote for yes to both if this is to be done. > > Are the other userspace components useful without uverbs support? > I didn't think the umad, ucm, and uat layers would work without > the uverbs. umad is useful without uverbs. Not sure ucm or uat are (useful without uverbs also).
-- Hal From jlentini at netapp.com Tue Sep 6 11:03:14 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 6 Sep 2005 14:03:14 -0400 (EDT) Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: <528xyarubj.fsf@cisco.com> References: <20050906144344.GO19358@mellanox.co.il> <52ek82tb88.fsf@cisco.com> <528xyarubj.fsf@cisco.com> Message-ID: On Tue, 6 Sep 2005, Roland Dreier wrote: > James> Is it still supported? > > I'm not sure what "supported" means precisely, but it should still > work fine for the cards it works for. I haven't had time to add the > changes required for all the new flash devices used on new HCAs, and > given that mstflint also works fine, I'm not making it a high priority. Ok, I'm trying to figure out how to describe it on the Wiki. I'll say that it works with "certain Topspin HCAs". From jlentini at netapp.com Tue Sep 6 11:59:53 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 6 Sep 2005 14:59:53 -0400 (EDT) Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: <20050906174135.GB30610@mellanox.co.il> References: <20050906144344.GO19358@mellanox.co.il> <20050906174135.GB30610@mellanox.co.il> Message-ID: On Tue, 6 Sep 2005, Michael S. Tsirkin wrote: > After compiling the image, you can run flint -i q > on the image and see that the board ID matches. I've added your instructions to the "Installation Cheat Sheet" page of the Wiki. After I created the binary image, I checked to see that board id's matched: # ./mstflint -i /tmp/fw-23108-a1-rel.bin q Image type: FailSafe Chip rev.: A1 GUID Des: Node Port1 Port2 Sys image GUIDs: 0002c9000100d050 0002c9000100d051 0002c9000100d052 0002c9000100d050 Board ID: V_ym (MT_0030000001) I'm not sure what the "V_ym" text means, but the MT_0030000001 matched, so I assumed that the image was ready. > If it does, its safe to burn. 
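The check Michael suggests, comparing the board ID that mstflint reports for the flash with the one it reports for the image, can be scripted. A minimal sketch in C, assuming the query output contains a line of the form "Board ID: <optional text> (<ID>)" as in the outputs quoted here; the function names are hypothetical and not part of mstflint. It compares only the ID in parentheses, so the unexplained "V_ym" prefix is ignored, and it automates the by-eye comparison only: it does not replace any check mstflint itself performs at burn time.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the text inside the parentheses of a "Board ID:" line (for example
 * "Board ID: V_ym (MT_0030000001)") into out.
 * Returns 0 on success, -1 if the line is not a Board ID line, has no
 * "(...)" field, or the ID does not fit in outlen bytes.
 */
static int board_id_from_line(const char *line, char *out, size_t outlen)
{
    const char *lparen, *rparen;
    size_t len;

    while (*line == ' ' || *line == '\t')   /* tolerate leading indentation */
        line++;
    if (strncmp(line, "Board ID:", 9) != 0)
        return -1;
    lparen = strchr(line, '(');
    if (lparen == NULL)
        return -1;
    rparen = strchr(lparen + 1, ')');
    if (rparen == NULL)
        return -1;
    len = (size_t)(rparen - lparen - 1);
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, lparen + 1, len);
    out[len] = '\0';
    return 0;
}

/* Returns 1 if both query lines carry the same board ID, 0 otherwise. */
static int board_ids_match(const char *flash_line, const char *image_line)
{
    char a[64], b[64];

    if (board_id_from_line(flash_line, a, sizeof a) != 0 ||
        board_id_from_line(image_line, b, sizeof b) != 0)
        return 0;
    return strcmp(a, b) == 0;
}
```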
When I tried to burn the image, I received the following error: # ./mstflint -d `lspci -d 15b3:5a44 | cut -f 1 -d ' '` -i /tmp/fw-23108-a1-rel.bin burn Image type: FailSafe Chip rev.: A1 GUID Des: Node Port1 Port2 Sys image GUIDs: 0002c90200003098 0002c90200003099 0002c9020000309a 0002c9020000309b Board ID: (MT_0030000001) Burn image with the following GUIDs: Node: 0002c90200003098 Port1: 0002c90200003099 Port2: 0002c9020000309a Sys.Image: 0002c9020000309b Read and verify Invariant Sector - FAILED *** ERROR *** Failsafe burn failed: Invariant sector doesn't match. Word #446 (0x1be) in image: 0xffffffff, while in flash: 0x00009cfc It is impossible to burn this image in a failsafe mode. If you want to burn in non failsafe mode, use the "-nofs" switch. Any ideas? From iod00d at hp.com Tue Sep 6 12:13:49 2005 From: iod00d at hp.com (Grant Grundler) Date: Tue, 6 Sep 2005 12:13:49 -0700 Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: References: <20050906144344.GO19358@mellanox.co.il> <52ek82tb88.fsf@cisco.com> <528xyarubj.fsf@cisco.com> Message-ID: <20050906191349.GF25539@esmail.cup.hp.com> On Tue, Sep 06, 2005 at 02:03:14PM -0400, James Lentini wrote: > Ok, I'm trying to figure out how to describe it on the Wiki. I'll say > that it works with "certain Topspin HCAs". I'll assert it will work with any Mellanox-based PCI-X card. I've used it for both Mellanox and Topspin cards on IA64 platforms. grant From jlentini at netapp.com Tue Sep 6 12:22:04 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 6 Sep 2005 15:22:04 -0400 (EDT) Subject: [openib-general] Re: [PATCH] uDAPL changes to support async events In-Reply-To: <431DD1F9.5000707@ichips.intel.com> References: <431DD1F9.5000707@ichips.intel.com> Message-ID: On Tue, 6 Sep 2005, Arlin Davis wrote: > James Lentini wrote: > > > Could resend as an attachment? > > > Did the attachment work? I just retried the patch on a fresh svn3324 and it > worked fine for me. It did.
I've committed it in revision 3326. From eitan at mellanox.co.il Tue Sep 6 12:18:16 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 06 Sep 2005 22:18:16 +0300 Subject: [openib-general] OpenSM Routing Algorithms Scalability and Enhancements Message-ID: <431DEB78.4040800@mellanox.co.il> Hi All, As we are about to start working on the fast routing algorithms, here is the writeup about proposed algorithms for your review. The plan is to start development once the merge of 1.8.0 into the trunk is done. Thanks Eitan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: OpenSM ROUTING.txt URL: From jlentini at netapp.com Tue Sep 6 13:03:11 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 6 Sep 2005 16:03:11 -0400 (EDT) Subject: [openib-general] Re: [PATCH] only build userspace verbs support if requested In-Reply-To: <52d5nmrujd.fsf@cisco.com> References: <523bon5fcl.fsf@cisco.com> <1125697155.4398.8039.camel@hal.voltaire.com> <52d5nmrujd.fsf@cisco.com> Message-ID: On Tue, 6 Sep 2005, Roland Dreier wrote: > James> Are the other userspace components useful without uverbs > James> support? I didn't think the umad, ucm, and uat layers would > James> work without the uverbs. > > Certainly umad is useful without uverbs. For example one can use > just umad to run OpenSM. In fact umad predates uverbs by quite a long > time. Thanks for pointing this out -- certainly we don't want to lump > umad and uverbs into the same config option. > > ucm and uat probably aren't that useful without uverbs. > > So I guess the final patch should have CONFIG_INFINIBAND_USER_MAD and > CONFIG_INFINIBAND_USER_ACCESS as separate config options. Here's an updated patch. Note that I did not modify the INFINIBAND_USER_ACCESS option's help text.
In particular, it is still described as "InfiniBand userspace verbs support" and not "InfiniBand userspace access support" since I find the former more descriptive of what is being enabled than the latter. Signed-off-by: James Lentini Index: Kconfig =================================================================== --- Kconfig (revision 3326) +++ Kconfig (working copy) @@ -7,7 +7,16 @@ any protocols you wish to use as well as drivers for your InfiniBand hardware. -config INFINIBAND_USER_VERBS +config INFINIBAND_USER_MAD + tristate "InfiniBand userspace MAD support" + depends on INFINIBAND + ---help--- + Userspace InfiniBand Management Datagram (MAD) support. This + is the kernel side of the userspace MAD support, which allows + userspace processes to send and receive MADs. You will also + need libibumad from . + +config INFINIBAND_USER_ACCESS tristate "InfiniBand userspace verbs support" depends on INFINIBAND ---help--- Index: core/Makefile =================================================================== --- core/Makefile (revision 3326) +++ core/Makefile (working copy) @@ -1,9 +1,9 @@ EXTRA_CFLAGS += -Idrivers/infiniband/include -Idrivers/infiniband/ulp/ipoib -obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_ping.o \ - ib_cm.o ib_sa.o ib_umad.o ib_ucm.o \ - ib_at.o ib_uat.o -obj-$(CONFIG_INFINIBAND_USER_VERBS) += ib_uverbs.o +obj-$(CONFIG_INFINIBAND) += ib_core.o ib_mad.o ib_ping.o ib_cm.o \ + ib_sa.o ib_at.o +obj-$(CONFIG_INFINIBAND_USER_MAD) += ib_umad.o +obj-$(CONFIG_INFINIBAND_USER_ACCESS) += ib_uverbs.o ib_ucm.o ib_uat.o ib_core-y := packer.o ud_header.o verbs.o sysfs.o \ device.o fmr_pool.o cache.o From jlentini at netapp.com Tue Sep 6 13:10:33 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 6 Sep 2005 16:10:33 -0400 (EDT) Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: <20050906174135.GB30610@mellanox.co.il> References: <20050906144344.GO19358@mellanox.co.il> <20050906174135.GB30610@mellanox.co.il>
Message-ID: On Tue, 6 Sep 2005, Michael S. Tsirkin wrote: > MT_0030000001 is MHXL-CF128-T (Previously: MTLP23108-CF128) How would one determine this? Is there a tool or web page that does the mapping? From jlentini at netapp.com Tue Sep 6 13:35:26 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 6 Sep 2005 16:35:26 -0400 (EDT) Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: <20050906191349.GF25539@esmail.cup.hp.com> References: <20050906144344.GO19358@mellanox.co.il> <52ek82tb88.fsf@cisco.com> <528xyarubj.fsf@cisco.com> <20050906191349.GF25539@esmail.cup.hp.com> Message-ID: On Tue, 6 Sep 2005, Grant Grundler wrote: > On Tue, Sep 06, 2005 at 02:03:14PM -0400, James Lentini wrote: > > Ok, I'm trying to figure out how to describe it on the Wiki. I'll say > > that it works with "certain Topspin HCAs". > > I'll assert it will work with any Mellanox-based PCI-X card. > I've used it for both Mellanox and Topspin cards on IA64 platforms. I've updated the wiki with this info. From pw at osc.edu Tue Sep 6 13:37:34 2005 From: pw at osc.edu (Pete Wyckoff) Date: Tue, 6 Sep 2005 16:37:34 -0400 Subject: [openib-general] mvapich-gen2 needs mvapich-gen1 patches Message-ID: <20050906203734.GA3800@osc.edu> I applied by hand some of the patches here: http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich/mvapich-0.9.5-patch.html to my local mvapich subtree of openib svn current. You may wish to do the same to the repository so others get the fixes too.
-- Pete From panda at cse.ohio-state.edu Tue Sep 6 13:48:27 2005 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Tue, 6 Sep 2005 16:48:27 -0400 (EDT) Subject: [openib-general] mvapich-gen2 needs mvapich-gen1 patches In-Reply-To: <20050906203734.GA3800@osc.edu> from "Pete Wyckoff" at Sep 06, 2005 04:37:34 PM Message-ID: <200509062048.j86KmRWL012331@xi.cse.ohio-state.edu> Pete, > I applied by hand some of the patches here: > > http://nowlab.cse.ohio-state.edu/projects/mpi-iba/download-mvapich/mvapich-0.9.5-patch.html > > to my local mvapich subtree of openib svn current. You may wish to > do the same to the repository so others get the fixes too. Thanks for your note. We plan to do that in the near future. Thanks, DK > -- Pete > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From steve_wooding at keysounds.co.uk Tue Sep 6 14:22:12 2005 From: steve_wooding at keysounds.co.uk (Steve Wooding) Date: Tue, 06 Sep 2005 22:22:12 +0100 Subject: [openib-general] Re: [PATCH] [uCM] user specified context in CM events + new test program Message-ID: <431E0884.5080308@keysounds.co.uk> Hi Sean, I've just started looking into IB connection establishment and I was wondering what the ib_cm_init_qp_attr() function actually does. Studying the MindShare IB book, it talks about exchanging the QPNs, PSNs etc. via the REQ and REP messages. However, looking at your cmpost.c example, I see, for example, that in the REQ handler the event containing the REQ message is never actually used when modifying the QP to RTR. The ib_cm_init_qp_attr() function is used instead. Does the info in the REQ message get read in kernel space? Maybe you could expand a bit on the description of the ib_cm_init_qp_attr() function. Thanks a lot. Steve.
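On Steve's question: Sean's patch description quoted below says ib_cm_init_qp_attr() is there so that clients can set QP attributes properly before modifying the QP. The idea is that the CM records what was exchanged in the REQ/REP (such as the remote QPN and starting PSN) and fills the QP attributes for the requested state, so the app never copies those fields out of the event itself. The sketch below models that idea in plain C. It is illustrative only: the struct names, attribute bits and cm_init_qp_attr() here are hypothetical stand-ins, not the kernel or userspace CM types, and the real call also sets attributes this sketch omits (for example the address vector and the RDMA read/atomic and retry settings).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified QP states and attribute-mask bits. */
enum qp_state { QPS_INIT, QPS_RTR, QPS_RTS };

enum {
    ATTR_STATE    = 1 << 0,
    ATTR_DEST_QPN = 1 << 1,
    ATTR_RQ_PSN   = 1 << 2,
    ATTR_PATH_MTU = 1 << 3,
    ATTR_SQ_PSN   = 1 << 4,
};

/* Fields the passive side learns from a received REQ. */
struct req_info {
    uint32_t local_qpn;      /* the requester's QPN */
    uint32_t starting_psn;   /* the requester's initial send PSN */
    int      path_mtu;
};

/* Per-connection state the CM keeps after processing the REQ. */
struct cm_conn {
    uint32_t remote_qpn;
    uint32_t rq_psn;
    uint32_t sq_psn;         /* our own starting PSN, sent back in the REP */
    int      path_mtu;
};

struct qp_attr {
    enum qp_state qp_state;
    uint32_t dest_qp_num;
    uint32_t rq_psn;
    uint32_t sq_psn;
    int      path_mtu;
};

/* CM side: remember what the REQ carried when it arrives. */
static void cm_record_req(struct cm_conn *c, const struct req_info *req,
                          uint32_t my_sq_psn)
{
    c->remote_qpn = req->local_qpn;
    c->rq_psn     = req->starting_psn;
    c->path_mtu   = req->path_mtu;
    c->sq_psn     = my_sq_psn;
}

/* Fill the attributes needed for attr->qp_state from the stored state. */
static int cm_init_qp_attr(const struct cm_conn *c, struct qp_attr *attr,
                           int *mask)
{
    switch (attr->qp_state) {
    case QPS_RTR:
        attr->dest_qp_num = c->remote_qpn;
        attr->rq_psn      = c->rq_psn;
        attr->path_mtu    = c->path_mtu;
        *mask = ATTR_STATE | ATTR_DEST_QPN | ATTR_RQ_PSN | ATTR_PATH_MTU;
        return 0;
    case QPS_RTS:
        attr->sq_psn = c->sq_psn;
        *mask = ATTR_STATE | ATTR_SQ_PSN;
        return 0;
    default:
        return -1;           /* states this sketch does not model */
    }
}
```

In this model the app's REQ handler only sets attr.qp_state = QPS_RTR, calls cm_init_qp_attr(), and passes the attributes and mask on to its modify-QP call, which mirrors the pattern Steve sees in cmpost.c.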
Sean Hefty wrote: The following patch: * Adds user specified context to all uCM events. Users will not retrieve any events associated with the context after destroying the corresponding cm_id. * Provides the ib_cm_init_qp_attr() call to userspace clients of the CM. This call may be used to set QP attributes properly before modifying the QP. * Fixes some error handling synchronization and cleanup issues. * Performs some minor code cleanup. * Replaces the ucm_simple test program with a userspace version of cmpost. The userspace version of cmpost uses the uAT interface to retrieve path records based on a remote host name, establishes a connection over a QP, and performs some simple message passing between the nodes. From rolandd at cisco.com Tue Sep 6 14:23:19 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 14:23:19 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <431DCC06.8050403@ichips.intel.com> (Sean Hefty's message of "Tue, 06 Sep 2005 10:04:06 -0700") References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> <52u0gyrw75.fsf@cisco.com> <431DC7F9.2050307@ichips.intel.com> <52ll2arvb3.fsf@cisco.com> <431DCC06.8050403@ichips.intel.com> Message-ID: <52fyshriuw.fsf@cisco.com> I thought about this some more and I came to the conclusion that Sean is right. We should come up with something race-free, even if an app is perverse enough to use multiple threads to read CQ events. 
I think the only way to do that is for the app to acknowledge completion events, since a completion event could be read by a thread that loses the CPU before returning to the app and then delayed for arbitrarily long before the app sees the event. However, it is possible to amortize the locking cost of acknowledging events by allowing the app to acknowledge multiple events in a single call. The API I came up with is the following:

/**
 * ibv_ack_cq_events - Acknowledge completion events
 * @cq: CQ to acknowledge events for
 * @nevents: Number of events to acknowledge.
 *
 * All completion events which are returned by ibv_get_cq_event() must
 * be acknowledged. ibv_destroy_cq() will wait for all completion
 * events to be acknowledged, so there should be a one-to-one
 * correspondence between acks and successful gets. An application
 * may accumulate multiple completion events and acknowledge them in a
 * single call by passing the number of events to ack in @nevents.
 */
extern void ibv_ack_cq_events(struct ibv_cq *cq, unsigned int nevents);

(I also renamed ibv_put_async_event() to ibv_ack_async_event() for symmetry.) I coded this up and did some unscientific measurements using ibv_rc_pingpong (using CQ events with --size=1). Even with a call to ibv_ack_cq_events() every time a CQ event is read, the cost is too small to measure. In other words, the run-to-run variability of my test drowns out the cost of the call to ibv_ack_cq_events(). Patches to follow... - R.
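The counting rule behind ibv_ack_cq_events() can be sketched as follows: the library counts the events it hands out per CQ, the app acknowledges them (possibly in batches), and destroy may proceed only once the two counts agree. This is a minimal single-threaded model with hypothetical names (struct cq_events, ack_cq_events, can_destroy_cq, process_events); a real library would protect the counters with a lock and have ibv_destroy_cq() sleep on a condition variable rather than return a flag.

```c
#include <assert.h>

/* Hypothetical per-CQ bookkeeping for the ack scheme (not libibverbs code). */
struct cq_events {
    unsigned int delivered;   /* events returned to the app by a "get" */
    unsigned int acked;       /* events the app has acknowledged */
};

/* Library side: called whenever a completion event is handed to the app. */
static void on_event_delivered(struct cq_events *c)
{
    c->delivered++;
}

/* App side: one call (in a real library, one lock round trip) per batch. */
static void ack_cq_events(struct cq_events *c, unsigned int nevents)
{
    c->acked += nevents;
}

/* Destroy may proceed only when every delivered event has been acked. */
static int can_destroy_cq(const struct cq_events *c)
{
    return c->acked == c->delivered;
}

/*
 * App loop: handle nevents events, acking every `batch` (>= 1) events and
 * flushing the remainder at the end. *ack_calls counts the ack calls made.
 */
static void process_events(struct cq_events *c, unsigned int nevents,
                           unsigned int batch, unsigned int *ack_calls)
{
    unsigned int pending = 0;

    for (unsigned int i = 0; i < nevents; i++) {
        on_event_delivered(c);   /* stands in for a successful get */
        /* ... poll the CQ and handle the completions here ... */
        if (++pending == batch) {
            ack_cq_events(c, pending);
            ++*ack_calls;
            pending = 0;
        }
    }
    if (pending) {
        ack_cq_events(c, pending);
        ++*ack_calls;
    }
}
```

With batch = 4, ten events cost three ack calls instead of ten, which is the amortization described above; destroy is only allowed after the final flush.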
From rolandd at cisco.com Tue Sep 6 14:29:26 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 14:29:26 -0700 Subject: [openib-general] [PATCH] new kernel side of stale CQ event handling In-Reply-To: <52fyshriuw.fsf@cisco.com> (Roland Dreier's message of "Tue, 06 Sep 2005 14:23:19 -0700") References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> <52u0gyrw75.fsf@cisco.com> <431DC7F9.2050307@ichips.intel.com> <52ll2arvb3.fsf@cisco.com> <431DCC06.8050403@ichips.intel.com> <52fyshriuw.fsf@cisco.com> Message-ID: <52br35rikp.fsf_-_@cisco.com> This is completely analogous to the async events change we just made. I did take the opportunity to clean up some of the code by consolidating struct ib_uverbs_async_event and struct ib_uverbs_comp_event into a single struct ib_uverbs_event. - R. 
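The core of the consolidation in the patch below is a single event struct carrying a union of the two descriptor types plus an optional pointer to the owning object's counter; the read path chooses the copy size from the queue type and bumps the counter only when the event is actually handed out (on the -EINVAL path nothing is counted). A userspace model of that read path, with simplified stand-in types: the descriptor layouts and read_event() are illustrative assumptions, not the uverbs ABI.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the two descriptor types. */
struct async_desc { uint64_t element; uint32_t event_type; uint32_t reserved; };
struct comp_desc  { uint64_t cq_handle; };

/* One event type for both queues, in the spirit of struct ib_uverbs_event. */
struct event {
    union {
        struct async_desc async;
        struct comp_desc  comp;
    } desc;
    uint32_t *counter;   /* owning object's "events reported" count, or NULL */
};

/*
 * Copy out only the descriptor size for this queue type and, only if the
 * buffer is large enough, bump the owning object's counter.
 * Returns the number of bytes copied, or -1 if the buffer is too small.
 */
static int read_event(struct event *ev, int is_async, void *buf, size_t count)
{
    size_t eventsz = is_async ? sizeof(struct async_desc)
                              : sizeof(struct comp_desc);

    if (eventsz > count)
        return -1;
    memcpy(buf, &ev->desc, eventsz);
    if (ev->counter)
        ++*ev->counter;
    return (int)eventsz;
}
```

Keeping the counter pointer in the event itself is what lets destroy report comp_events_reported and async_events_reported without knowing which queue each event came from.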
--- infiniband/include/rdma/ib_user_verbs.h (revision 3324) +++ infiniband/include/rdma/ib_user_verbs.h (working copy) @@ -297,7 +297,8 @@ struct ib_uverbs_destroy_cq { }; struct ib_uverbs_destroy_cq_resp { - __u32 events_reported; + __u32 comp_events_reported; + __u32 async_events_reported; }; struct ib_uverbs_create_qp { --- infiniband/core/uverbs_main.c (revision 3324) +++ infiniband/core/uverbs_main.c (working copy) @@ -128,7 +128,7 @@ static int ib_dealloc_ucontext(struct ib idr_remove(&ib_uverbs_cq_idr, uobj->id); ib_destroy_cq(cq); list_del(&uobj->list); - kfree(container_of(uobj, struct ib_uevent_object, uobject)); + kfree(container_of(uobj, struct ib_ucq_object, uobject)); } list_for_each_entry_safe(uobj, tmp, &context->srq_list, list) { @@ -182,9 +182,7 @@ static ssize_t ib_uverbs_event_read(stru size_t count, loff_t *pos) { struct ib_uverbs_event_file *file = filp->private_data; - struct ib_uverbs_async_event *async_evt = NULL; - u32 *counter = NULL; - void *event; + struct ib_uverbs_event *event; int eventsz; int ret = 0; @@ -209,28 +207,22 @@ static ssize_t ib_uverbs_event_read(stru return -ENODEV; } - if (file->is_async) { - async_evt = list_entry(file->event_list.next, - struct ib_uverbs_async_event, list); - event = async_evt; - eventsz = sizeof *async_evt; - counter = async_evt->counter; + event = list_entry(file->event_list.next, struct ib_uverbs_event, list); - if (counter) - ++*counter; - } else { - event = list_entry(file->event_list.next, - struct ib_uverbs_comp_event, list); + if (file->is_async) + eventsz = sizeof (struct ib_uverbs_async_event_desc); + else eventsz = sizeof (struct ib_uverbs_comp_event_desc); - } if (eventsz > count) { ret = -EINVAL; event = NULL; } else { list_del(file->event_list.next); - if (counter) - list_del(&async_evt->obj_list); + if (event->counter) { + ++(*event->counter); + list_del(&event->obj_list); + } } spin_unlock_irq(&file->lock); @@ -267,16 +259,13 @@ static unsigned int ib_uverbs_event_poll static void 
ib_uverbs_event_release(struct ib_uverbs_event_file *file) { - struct list_head *entry, *tmp; + struct ib_uverbs_event *entry, *tmp; spin_lock_irq(&file->lock); if (file->fd != -1) { file->fd = -1; - list_for_each_safe(entry, tmp, &file->event_list) - if (file->is_async) - kfree(list_entry(entry, struct ib_uverbs_async_event, list)); - else - kfree(list_entry(entry, struct ib_uverbs_comp_event, list)); + list_for_each_entry_safe(entry, tmp, &file->event_list, list) + kfree(entry); } spin_unlock_irq(&file->lock); } @@ -314,18 +303,23 @@ static struct file_operations uverbs_eve void ib_uverbs_comp_handler(struct ib_cq *cq, void *cq_context) { - struct ib_uverbs_file *file = cq_context; - struct ib_uverbs_comp_event *entry; - unsigned long flags; + struct ib_uverbs_file *file = cq_context; + struct ib_ucq_object *uobj; + struct ib_uverbs_event *entry; + unsigned long flags; entry = kmalloc(sizeof *entry, GFP_ATOMIC); if (!entry) return; - entry->desc.cq_handle = cq->uobject->user_handle; + uobj = container_of(cq->uobject, struct ib_ucq_object, uobject); + + entry->desc.comp.cq_handle = cq->uobject->user_handle; + entry->counter = &uobj->comp_events_reported; spin_lock_irqsave(&file->comp_file[0].lock, flags); list_add_tail(&entry->list, &file->comp_file[0].event_list); + list_add_tail(&entry->obj_list, &uobj->comp_list); spin_unlock_irqrestore(&file->comp_file[0].lock, flags); wake_up_interruptible(&file->comp_file[0].poll_wait); @@ -337,16 +331,16 @@ static void ib_uverbs_async_handler(stru struct list_head *obj_list, u32 *counter) { - struct ib_uverbs_async_event *entry; + struct ib_uverbs_event *entry; unsigned long flags; entry = kmalloc(sizeof *entry, GFP_ATOMIC); if (!entry) return; - entry->desc.element = element; - entry->desc.event_type = event; - entry->counter = counter; + entry->desc.async.element = element; + entry->desc.async.event_type = event; + entry->counter = counter; spin_lock_irqsave(&file->async_file.lock, flags); list_add_tail(&entry->list, 
&file->async_file.event_list); @@ -360,14 +354,14 @@ static void ib_uverbs_async_handler(stru void ib_uverbs_cq_event_handler(struct ib_event *event, void *context_ptr) { - struct ib_uevent_object *uobj; + struct ib_ucq_object *uobj; uobj = container_of(event->element.cq->uobject, - struct ib_uevent_object, uobject); + struct ib_ucq_object, uobject); ib_uverbs_async_handler(context_ptr, uobj->uobject.user_handle, - event->event, &uobj->event_list, - &uobj->events_reported); + event->event, &uobj->async_list, + &uobj->async_events_reported); } --- infiniband/core/uverbs.h (revision 3324) +++ infiniband/core/uverbs.h (working copy) @@ -76,24 +76,30 @@ struct ib_uverbs_file { struct ib_uverbs_event_file comp_file[1]; }; -struct ib_uverbs_async_event { - struct ib_uverbs_async_event_desc desc; +struct ib_uverbs_event { + union { + struct ib_uverbs_async_event_desc async; + struct ib_uverbs_comp_event_desc comp; + } desc; struct list_head list; struct list_head obj_list; u32 *counter; }; -struct ib_uverbs_comp_event { - struct ib_uverbs_comp_event_desc desc; - struct list_head list; -}; - struct ib_uevent_object { struct ib_uobject uobject; struct list_head event_list; u32 events_reported; }; +struct ib_ucq_object { + struct ib_uobject uobject; + struct list_head comp_list; + struct list_head async_list; + u32 comp_events_reported; + u32 async_events_reported; +}; + extern struct semaphore ib_uverbs_idr_mutex; extern struct idr ib_uverbs_pd_idr; extern struct idr ib_uverbs_mr_idr; --- infiniband/core/uverbs_cmd.c (revision 3324) +++ infiniband/core/uverbs_cmd.c (working copy) @@ -590,7 +590,7 @@ ssize_t ib_uverbs_create_cq(struct ib_uv struct ib_uverbs_create_cq cmd; struct ib_uverbs_create_cq_resp resp; struct ib_udata udata; - struct ib_uevent_object *uobj; + struct ib_ucq_object *uobj; struct ib_cq *cq; int ret; @@ -611,10 +611,12 @@ ssize_t ib_uverbs_create_cq(struct ib_uv if (!uobj) return -ENOMEM; - uobj->uobject.user_handle = cmd.user_handle; - 
uobj->uobject.context = file->ucontext; - uobj->events_reported = 0; - INIT_LIST_HEAD(&uobj->event_list); + uobj->uobject.user_handle = cmd.user_handle; + uobj->uobject.context = file->ucontext; + uobj->comp_events_reported = 0; + uobj->async_events_reported = 0; + INIT_LIST_HEAD(&uobj->comp_list); + INIT_LIST_HEAD(&uobj->async_list); cq = file->device->ib_dev->create_cq(file->device->ib_dev, cmd.cqe, file->ucontext, &udata); @@ -685,8 +687,9 @@ ssize_t ib_uverbs_destroy_cq(struct ib_u struct ib_uverbs_destroy_cq cmd; struct ib_uverbs_destroy_cq_resp resp; struct ib_cq *cq; - struct ib_uevent_object *uobj; - struct ib_uverbs_async_event *evt, *tmp; + struct ib_ucq_object *uobj; + struct ib_uverbs_event *evt, *tmp; + u64 user_handle; int ret = -EINVAL; if (copy_from_user(&cmd, buf, sizeof cmd)) @@ -700,7 +703,8 @@ ssize_t ib_uverbs_destroy_cq(struct ib_u if (!cq || cq->uobject->context != file->ucontext) goto out; - uobj = container_of(cq->uobject, struct ib_uevent_object, uobject); + user_handle = cq->uobject->user_handle; + uobj = container_of(cq->uobject, struct ib_ucq_object, uobject); ret = ib_destroy_cq(cq); if (ret) @@ -712,14 +716,22 @@ ssize_t ib_uverbs_destroy_cq(struct ib_u list_del(&uobj->uobject.list); spin_unlock_irq(&file->ucontext->lock); + spin_lock_irq(&file->comp_file[0].lock); + list_for_each_entry_safe(evt, tmp, &uobj->comp_list, obj_list) { + list_del(&evt->list); + kfree(evt); + } + spin_unlock_irq(&file->comp_file[0].lock); + spin_lock_irq(&file->async_file.lock); - list_for_each_entry_safe(evt, tmp, &uobj->event_list, obj_list) { + list_for_each_entry_safe(evt, tmp, &uobj->async_list, obj_list) { list_del(&evt->list); kfree(evt); } spin_unlock_irq(&file->async_file.lock); - resp.events_reported = uobj->events_reported; + resp.comp_events_reported = uobj->comp_events_reported; + resp.async_events_reported = uobj->async_events_reported; kfree(uobj); @@ -955,7 +967,7 @@ ssize_t ib_uverbs_destroy_qp(struct ib_u struct ib_uverbs_destroy_qp_resp 
resp; struct ib_qp *qp; struct ib_uevent_object *uobj; - struct ib_uverbs_async_event *evt, *tmp; + struct ib_uverbs_event *evt, *tmp; int ret = -EINVAL; if (copy_from_user(&cmd, buf, sizeof cmd)) @@ -1193,7 +1205,7 @@ ssize_t ib_uverbs_destroy_srq(struct ib_ struct ib_uverbs_destroy_srq_resp resp; struct ib_srq *srq; struct ib_uevent_object *uobj; - struct ib_uverbs_async_event *evt, *tmp; + struct ib_uverbs_event *evt, *tmp; int ret = -EINVAL; if (copy_from_user(&cmd, buf, sizeof cmd)) From caitlinb at broadcom.com Tue Sep 6 14:29:47 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Tue, 6 Sep 2005 14:29:47 -0700 Subject: [openib-general] Re: ibv_get_async_event Message-ID: <54AD0F12E08D1541B826BE97C98F99F1F5D5@NT-SJCA-0751.brcm.ad.broadcom.com> I'm not sure I follow the rationale as to why acking is needed. You're requiring a solution to a problem that most apps and most devices do not have. But anyway, if you *do* have an acked cq reap, you should take advantage of it by having the cq_poll return a *pointer* to the work completion rather than copying it to a supplied buffer. This avoids an unnecessary copy whenever the work completion will be processed immediately or transformed into another format (which, between them, I believe account for most applications). The only problem with a cq_poll routine that returns a pointer is that it *requires* an ack call -- but if you're going to do it anyway you might as well get the benefit of a two-stage call. > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Roland Dreier > Sent: Tuesday, September 06, 2005 2:23 PM > To: Sean Hefty > Cc: openib > Subject: [openib-general] Re: ibv_get_async_event > > I thought about this some more and I came to the conclusion > that Sean is right. We should come up with something > race-free, even if an app is perverse enough to use multiple > threads to read CQ events. 
> > I think the only way to do that is for the app to acknowledge > completion events, since a completion event could be read by > a thread that loses the CPU before returning to the app and > then delayed for arbitrarily long before the app sees the > event. However, it is possible to amortize the locking cost > of acknowledging events by allowing the app to acknowledge > multiple events in a single call. > > The API I came up with is the following: > > /** > * ibv_ack_cq_events - Free an async event > * @cq: CQ to acknowledge events for > * @nevents: Number of events to acknowledge. > * > * All completion events which are returned by > ibv_get_cq_event() must > * be acknowledged. ibv_destroy_cq() will wait for all > completion > * events to be acknowledged, so there should be a one-to-one > * correspondence between acks and successful gets. An > application > * may accumulate multiple completion events and > acknowledge them in a > * single call by passing the number of events to ack > in @nevents. > */ > extern void ibv_ack_cq_events(struct ibv_cq *cq, > unsigned int nevents); > > (I also renamed ibv_put_async_event() to ibv_ack_async_event() for > symmetry) > > I coded this up and did some unscientific measurements using > ibv_rc_pingpong (using CQ events with --size=1). Even with a call to > ibv_async_event() every time a CQ event is read, the cost is > too small to measure. In other words, the variability from > run to run of my test drowns out the cost of the call to > ibv_ack_cq_events(). > > Patches to follow... > > - R. 
> _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > From rolandd at cisco.com Tue Sep 6 14:30:09 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 14:30:09 -0700 Subject: [openib-general] [PATCH] new libibverbs handling of stale CQ events In-Reply-To: <52fyshriuw.fsf@cisco.com> (Roland Dreier's message of "Tue, 06 Sep 2005 14:23:19 -0700") References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> <52u0gyrw75.fsf@cisco.com> <431DC7F9.2050307@ichips.intel.com> <52ll2arvb3.fsf@cisco.com> <431DCC06.8050403@ichips.intel.com> <52fyshriuw.fsf@cisco.com> Message-ID: <527jdtriji.fsf_-_@cisco.com> Again, completely analogous to the existing stale async event handling... - R. --- libibverbs/include/infiniband/verbs.h (revision 3324) +++ libibverbs/include/infiniband/verbs.h (working copy) @@ -502,7 +502,8 @@ struct ibv_cq { pthread_mutex_t mutex; pthread_cond_t cond; - uint32_t events_completed; + uint32_t comp_events_completed; + uint32_t async_events_completed; }; struct ibv_ah { @@ -608,21 +609,22 @@ extern int ibv_close_device(struct ibv_c * ibv_get_async_event - Get next async event * @event: Pointer to use to return async event * - * The event returned must eventually be released via ibv_put_async_event(). + * All async events returned by ibv_get_async_event() must eventually + * be acknowledged with ibv_ack_async_event(). 
*/ extern int ibv_get_async_event(struct ibv_context *context, struct ibv_async_event *event); /** - * ibv_put_async_event - Free an async event - * @event: Event to be released. + * ibv_ack_async_event - Free an async event + * @event: Event to be acknowledged. * - * All events which are returned by ib_get_async_event() must be - * released. There should be a one-to-one correspondence between - * successful gets and puts. + * All async events which are returned by ibv_get_async_event() must + * be acknowledged. Destroying an object (CQ, SRQ or QP) will wait + * for all affiliated events to be acknowledged, so there should be a + * one-to-one correspondence between acks and successful gets. */ -extern void ibv_put_async_event(struct ibv_async_event *event); - +extern void ibv_ack_async_event(struct ibv_async_event *event); /** * ibv_query_device - Get device properties @@ -682,10 +684,31 @@ extern int ibv_destroy_cq(struct ibv_cq /** * ibv_get_cq_event - Read next CQ event + * @context: Context to get CQ event for + * @comp_num: Index of completion event to check. Must be >= 0 and + * <= context->num_comp. + * @cq: Used to return pointer to CQ. + * @cq_context: Used to return consumer-supplied CQ context. + * + * All completion events returned by ibv_get_cq_event() must + * eventually be acknowledged with ibv_ack_cq_events(). */ extern int ibv_get_cq_event(struct ibv_context *context, int comp_num, struct ibv_cq **cq, void **cq_context); - + +/** + * ibv_ack_cq_events - Free an async event + * @cq: CQ to acknowledge events for + * @nevents: Number of events to acknowledge. + * + * All completion events which are returned by ibv_get_cq_event() must + * be acknowledged. ibv_destroy_cq() will wait for all completion + * events to be acknowledged, so there should be a one-to-one + * correspondence between acks and successful gets. 
An application + * may accumulate multiple completion events and acknowledge them in a + * single call by passing the number of events to ack in @nevents. + */ +extern void ibv_ack_cq_events(struct ibv_cq *cq, unsigned int nevents); /** * ibv_poll_cq - Poll a CQ for work completions --- libibverbs/include/infiniband/kern-abi.h (revision 3324) +++ libibverbs/include/infiniband/kern-abi.h (working copy) @@ -335,7 +335,8 @@ struct ibv_destroy_cq { }; struct ibv_destroy_cq_resp { - __u32 events_reported; + __u32 comp_events_reported; + __u32 async_events_reported; }; struct ibv_create_qp { --- libibverbs/src/libibverbs.map (revision 3324) +++ libibverbs/src/libibverbs.map (working copy) @@ -6,7 +6,7 @@ IBVERBS_1.0 { ibv_open_device; ibv_close_device; ibv_get_async_event; - ibv_put_async_event; + ibv_ack_async_event; ibv_query_device; ibv_query_port; ibv_query_gid; @@ -18,6 +18,7 @@ IBVERBS_1.0 { ibv_create_cq; ibv_destroy_cq; ibv_get_cq_event; + ibv_ack_cq_events; ibv_create_srq; ibv_modify_srq; ibv_destroy_srq; --- libibverbs/src/device.c (revision 3324) +++ libibverbs/src/device.c (working copy) @@ -171,7 +171,7 @@ int ibv_get_async_event(struct ibv_conte return 0; } -void ibv_put_async_event(struct ibv_async_event *event) +void ibv_ack_async_event(struct ibv_async_event *event) { switch (event->event_type) { case IBV_EVENT_CQ_ERR: @@ -179,7 +179,7 @@ void ibv_put_async_event(struct ibv_asyn struct ibv_cq *cq = event->element.cq; pthread_mutex_lock(&cq->mutex); - ++cq->events_completed; + ++cq->async_events_completed; pthread_cond_signal(&cq->cond); pthread_mutex_unlock(&cq->mutex); --- libibverbs/src/verbs.c (revision 3324) +++ libibverbs/src/verbs.c (working copy) @@ -107,9 +107,10 @@ struct ibv_cq *ibv_create_cq(struct ibv_ struct ibv_cq *cq = context->ops.create_cq(context, cqe); if (cq) { - cq->context = context; - cq->cq_context = cq_context; - cq->events_completed = 0; + cq->context = context; + cq->cq_context = cq_context; + cq->comp_events_completed = 0; + 
cq->async_events_completed = 0; pthread_mutex_init(&cq->mutex, NULL); pthread_cond_init(&cq->cond, NULL); } @@ -143,6 +144,14 @@ int ibv_get_cq_event(struct ibv_context return 0; } +void ibv_ack_cq_events(struct ibv_cq *cq, unsigned int nevents) +{ + pthread_mutex_lock(&cq->mutex); + cq->comp_events_completed += nevents; + pthread_cond_signal(&cq->cond); + pthread_mutex_unlock(&cq->mutex); +} + struct ibv_srq *ibv_create_srq(struct ibv_pd *pd, struct ibv_srq_init_attr *srq_init_attr) { --- libibverbs/src/cmd.c (revision 3324) +++ libibverbs/src/cmd.c (working copy) @@ -303,7 +303,8 @@ int ibv_cmd_destroy_cq(struct ibv_cq *cq return errno; pthread_mutex_lock(&cq->mutex); - while (cq->events_completed != resp.events_reported) + while (cq->comp_events_completed != resp.comp_events_reported || + cq->async_events_completed != resp.async_events_reported) pthread_cond_wait(&cq->cond, &cq->mutex); pthread_mutex_unlock(&cq->mutex); --- libibverbs/ChangeLog (revision 3324) +++ libibverbs/ChangeLog (working copy) @@ -1,8 +1,16 @@ +2005-09-06 Roland Dreier + + * include/infiniband/kern-abi.h, include/infiniband/verbs.h, + src/cmd.c, src/device.c, src/verbs.c, examples/asyncwatch.c: + Update to handle new kernel ABI for avoiding stale completion + events. This is completely analogous to the previous asynchronous + event change. + 2005-08-31 Roland Dreier * include/infiniband/kern-abi.h, include/infiniband/verbs.h, src/cmd.c, src/device.c, src/ibverbs.h, src/init.c, src/verbs.c, - examples/asyncwatch.h: Update to handle new kernel ABI for + examples/asyncwatch.c: Update to handle new kernel ABI for avoiding stale asynchronous events. 
When a CQ, QP or SRQ is destroyed, the kernel reports the number of events it has given to userspace, and we wait until we've handled the same number of --- libibverbs/examples/asyncwatch.c (revision 3324) +++ libibverbs/examples/asyncwatch.c (working copy) @@ -86,7 +86,7 @@ int main(int argc, char *argv[]) printf(" event_type %d, port %d\n", event.event_type, event.element.port_num); - ibv_put_async_event(&event); + ibv_ack_async_event(&event); } return 0; From mshefty at ichips.intel.com Tue Sep 6 14:37:04 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 06 Sep 2005 14:37:04 -0700 Subject: [openib-general] ibv_get_device_guid() not byte swapping Message-ID: <431E0C00.8030105@ichips.intel.com> Has anyone else seen errors with byte swapping in ibv_get_device_guid()? I'm seeing a condition where the initial 2-bytes of the GUID are not swapped. The actual code in device.c appears to be correct, and if I insert a printf before returning from the call, then the returned GUID is correct. (I'm using SuSE, with gcc 3.3.3.) If I change guid from a uint16_t guid[4] to: union { uint16_t parts[4]; uint64_t whole; } guid; I can get the expected results. - Sean From rolandd at cisco.com Tue Sep 6 14:40:45 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 14:40:45 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F1F5D5@NT-SJCA-0751.brcm.ad.broadcom.com> (Caitlin Bestler's message of "Tue, 6 Sep 2005 14:29:47 -0700") References: <54AD0F12E08D1541B826BE97C98F99F1F5D5@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <523bohri1u.fsf@cisco.com> Caitlin> I'm not sure I follow the rationale as to why acking is Caitlin> needed. You're requiring a solution to a problem that Caitlin> most apps and most devices do not have. I think you're getting confused between polling a CQ for work completions, and receiving a completion notification event. We're talking about the second one here. 
The race condition that exists is device-independent. Any application that reads completion events in one thread and destroys CQs in another thread is susceptible. I believe this includes every application that uses uDAPL. Caitlin> But anyway, if you *do* have an acked cq reap, you should Caitlin> take advantage of it by having the cq_poll return a Caitlin> *pointer* to the work completion rather than copying it Caitlin> to a supplied buffer. This avoids an unnecessary copy Caitlin> whenever the work completion will be processed Caitlin> immediately or transformed into another format (which, Caitlin> between them, I believe account for most applications). Once again -- we're talking about completion notification events, not polling a CQ for work completions. But if we were polling for work completions, I don't see why this is an improvement. A device driver can already write the work completion directly into the supplied buffer as it converts it from hardware format. This avoids having to allocate memory in the fast path, and makes it easy to implement polling multiple work completions in a single call (which is useful for amortizing locking). - R. From rolandd at cisco.com Tue Sep 6 14:47:21 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 14:47:21 -0700 Subject: [openib-general] Re: ibv_get_device_guid() not byte swapping In-Reply-To: <431E0C00.8030105@ichips.intel.com> (Sean Hefty's message of "Tue, 06 Sep 2005 14:37:04 -0700") References: <431E0C00.8030105@ichips.intel.com> Message-ID: <52y869q36e.fsf@cisco.com> Sean> Has anyone else seen errors with byte swapping in Sean> ibv_get_device_guid()? I'm seeing a condition where the Sean> initial 2-bytes of the GUID are not swapped. The actual Sean> code in device.c appears to be correct, and if I insert a Sean> printf before returning from the call, then the returned Sean> GUID is correct. (I'm using SuSE, with gcc 3.3.3.) 
I haven't seen that, but it certainly looks like something optimization-related. I'm not sure if it's a compiler bug or a bug in the libibverbs code. I don't think the code violates any pointer aliasing rules, but I'm not enough of a C lawyer to be positive. Perhaps changing the guid to a union is the easiest thing to do. - R. From mshefty at ichips.intel.com Tue Sep 6 15:01:51 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 06 Sep 2005 15:01:51 -0700 Subject: [openib-general] Re: [PATCH] [uCM] user specified context in CM events + new test program In-Reply-To: <431E0884.5080308@keysounds.co.uk> References: <431E0884.5080308@keysounds.co.uk> Message-ID: <431E11CF.3010903@ichips.intel.com> Steve Wooding wrote: > I've just started looking into IB connection establishment and I was > wondering what the ib_cm_init_qp_attr() function actually does. Studying > the MindShare IB book, it talks about exchanging the QPNs, PSNs etc., > via the REQ and REP messages. However, looking at your cmpost.c example, > I see, for example, that in the req handler the event containing the REQ > message is never actually used when modifying the QP to RTR. The > ib_cm_init_qp_attr() function is used instead. Does the info in the REQ > message get read in kernel space? The kernel CM stores the information used in the establishment of a connection. It then formats the QP attribute structure for the user. This saves every app from having to store these same values and format the QP attribute structure.
- Sean From mshefty at ichips.intel.com Tue Sep 6 15:03:27 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 06 Sep 2005 15:03:27 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <52fyshriuw.fsf@cisco.com> References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> <52u0gyrw75.fsf@cisco.com> <431DC7F9.2050307@ichips.intel.com> <52ll2arvb3.fsf@cisco.com> <431DCC06.8050403@ichips.intel.com> <52fyshriuw.fsf@cisco.com> Message-ID: <431E122F.5040709@ichips.intel.com> Roland Dreier wrote: > The API I came up with is the following: > > /** > * ibv_ack_cq_events - Free an async event > * @cq: CQ to acknowledge events for > * @nevents: Number of events to acknowledge. > * > * All completion events which are returned by ibv_get_cq_event() must > * be acknowledged. ibv_destroy_cq() will wait for all completion > * events to be acknowledged, so there should be a one-to-one > * correspondence between acks and successful gets. An application > * may accumulate multiple completion events and acknowledge them in a > * single call by passing the number of events to ack in @nevents. > */ > extern void ibv_ack_cq_events(struct ibv_cq *cq, unsigned int nevents); > > (I also renamed ibv_put_async_event() to ibv_ack_async_event() for > symmetry) I think that this would work well. I will update the uCM put event to match. 
- Sean From halr at voltaire.com Tue Sep 6 15:15:51 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 18:15:51 -0400 Subject: [openib-general] Another OpenSM 1.8.0 nit Message-ID: <1126044950.4396.133.camel@hal.voltaire.com> Hi Yael, Here's another OpenSM 1.8.0 nit: opensm/osm_base.h:/****d* OpenSM: Base/OSM_SM_DEFAULT_POLLING_TIMEOUT_MILISECS opensm/osm_base.h:* OSM_SM_DEFAULT_POLLING_TIMEOUT_MILISECS opensm/osm_base.h:#define OSM_SM_DEFAULT_POLLING_TIMEOUT_MILISECS 10000 Is this used? Also, are there updated docs (user manual, release notes) for 1.8.0? Thanks. -- Hal From halr at voltaire.com Tue Sep 6 15:28:01 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 18:28:01 -0400 Subject: [openib-general] OpenSM 1.8.0 Merge Status and Operational Issue In-Reply-To: <1125609366.4398.1014.camel@hal.voltaire.com> References: <1125609366.4398.1014.camel@hal.voltaire.com> Message-ID: <1126045651.4396.163.camel@hal.voltaire.com> On Thu, 2005-09-01 at 17:19, Hal Rosenstock wrote: > I've got the merged 1.8.0 OpenSM up and running. I have a number of > questions (which I'll send separately) but the one main problem I have > now is the following: > > I have a 4x HCA port (1x/4x LinkWidthEnable and Supported) connected via > a 1x analyzer connected to a switch (so is 1x LinkWidthActive). > OpenSM does not seem to want to bring this port up. It tries once and > gives up until the physical link is cycled (cable pull and reinsertion). > It does work running over a 4x link with 4x neighbor ports. > > I see the following: > SM side HCA side > Set PortInfo (NoStateChange) -> > <- GetResp PortInfo (Init) > Set PortInfo (Armed) -> > <- GetResp PortInfo (Armed) > Set PortInfo (Active) -> > <- GetResp PortInfo (Init) > > I didn't track the settings on the switch side neighbor port but assume > they mirror this. OpenSM just seems to never try to bring this port > active again without some external stimulus. That's the secondary issue.
Aside from the issue above, I did some Solaris 10 testing today and there appears to be some regression which I need to investigate further. Based on these, I am currently ambivalent about putting the 1.8.0 changes back as yet (even though I was hoping I could go ahead tomorrow AM). Does anybody else have an opinion on this? -- Hal From sean.hefty at intel.com Tue Sep 6 16:32:46 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 6 Sep 2005 16:32:46 -0700 Subject: [openib-general] [PATCH] use union in ibv_get_device_guid() Message-ID: This patch replaces the uint16_t array with a union to avoid a compiler-related optimization issue with SuSE gcc 3.3.3. Signed-off-by: Sean Hefty Index: libibverbs/src/device.c =================================================================== --- libibverbs/src/device.c (revision 3295) +++ libibverbs/src/device.c (working copy) @@ -63,7 +63,10 @@ uint64_t ibv_get_device_guid(struct ibv_device *device) { struct sysfs_attribute *attr; - uint16_t guid[4]; + union { + uint16_t parts[4]; + uint64_t whole; + } guid; int i; attr = sysfs_get_classdev_attr(device->ibdev, "node_guid"); @@ -71,13 +74,13 @@ return 0; if (sscanf(attr->value, "%hx:%hx:%hx:%hx", - guid, guid + 1, guid + 2, guid + 3) != 4) + guid.parts, guid.parts + 1, guid.parts + 2, guid.parts + 3) != 4) return 0; for (i = 0; i < 4; ++i) - guid[i] = htons(guid[i]); + guid.parts[i] = htons(guid.parts[i]); - return *(uint64_t *) guid; + return guid.whole; } struct ibv_context *ibv_open_device(struct ibv_device *device) From sean.hefty at intel.com Tue Sep 6 16:41:34 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 6 Sep 2005 16:41:34 -0700 Subject: [openib-general] [PATCH] [CM] 1/6 core kernel changes to bind cm_id's to a device Message-ID: The following patch will bind communication identifiers to a specific device.
Signed-off-by: Sean Hefty Index: core/cm.c =================================================================== --- core/cm.c (revision 3295) +++ core/cm.c (working copy) @@ -365,9 +365,15 @@ static struct cm_id_private * cm_insert_ cur_cm_id_priv = rb_entry(parent, struct cm_id_private, service_node); if ((cur_cm_id_priv->id.service_mask & service_id) == - (service_mask & cur_cm_id_priv->id.service_id)) - return cm_id_priv; - if (service_id < cur_cm_id_priv->id.service_id) + (service_mask & cur_cm_id_priv->id.service_id) && + (cm_id_priv->id.device == cur_cm_id_priv->id.device)) + return cur_cm_id_priv; + + if (cm_id_priv->id.device < cur_cm_id_priv->id.device) + link = &(*link)->rb_left; + else if (cm_id_priv->id.device > cur_cm_id_priv->id.device) + link = &(*link)->rb_right; + else if (service_id < cur_cm_id_priv->id.service_id) link = &(*link)->rb_left; else link = &(*link)->rb_right; @@ -377,7 +383,8 @@ static struct cm_id_private * cm_insert_ return NULL; } -static struct cm_id_private * cm_find_listen(__be64 service_id) +static struct cm_id_private * cm_find_listen(struct ib_device *device, + __be64 service_id) { struct rb_node *node = cm.listen_service_table.rb_node; struct cm_id_private *cm_id_priv; @@ -385,9 +392,15 @@ static struct cm_id_private * cm_find_li while (node) { cm_id_priv = rb_entry(node, struct cm_id_private, service_node); if ((cm_id_priv->id.service_mask & service_id) == - (cm_id_priv->id.service_mask & cm_id_priv->id.service_id)) + cm_id_priv->id.service_id && + (cm_id_priv->id.device == device)) return cm_id_priv; - if (service_id < cm_id_priv->id.service_id) + + if (device < cm_id_priv->id.device) + node = node->rb_left; + else if (device > cm_id_priv->id.device) + node = node->rb_right; + else if (service_id < cm_id_priv->id.service_id) node = node->rb_left; else node = node->rb_right; @@ -522,7 +535,8 @@ static void cm_reject_sidr_req(struct cm ib_send_cm_sidr_rep(&cm_id_priv->id, ¶m); } -struct ib_cm_id *ib_create_cm_id(ib_cm_handler 
cm_handler, +struct ib_cm_id *ib_create_cm_id(struct ib_device *device, + ib_cm_handler cm_handler, void *context) { struct cm_id_private *cm_id_priv; @@ -534,6 +548,7 @@ struct ib_cm_id *ib_create_cm_id(ib_cm_h memset(cm_id_priv, 0, sizeof *cm_id_priv); cm_id_priv->id.state = IB_CM_IDLE; + cm_id_priv->id.device = device; cm_id_priv->id.cm_handler = cm_handler; cm_id_priv->id.context = context; ret = cm_alloc_id(cm_id_priv); @@ -1045,7 +1060,6 @@ static void cm_format_req_event(struct c req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad; param = &work->cm_event.param.req_rcvd; param->listen_id = listen_id; - param->device = cm_id_priv->av.port->mad_agent->device; param->port = cm_id_priv->av.port->port_num; param->primary_path = &work->path[0]; if (req_msg->alt_local_lid) @@ -1224,7 +1238,8 @@ static struct cm_id_private * cm_match_r } /* Find matching listen request. */ - listen_cm_id_priv = cm_find_listen(req_msg->service_id); + listen_cm_id_priv = cm_find_listen(cm_id_priv->id.device, + req_msg->service_id); if (!listen_cm_id_priv) { spin_unlock_irqrestore(&cm.lock, flags); cm_issue_rej(work->port, work->mad_recv_wc, @@ -1252,7 +1267,7 @@ static int cm_req_handler(struct cm_work req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad; - cm_id = ib_create_cm_id(NULL, NULL); + cm_id = ib_create_cm_id(work->port->cm_dev->device, NULL, NULL); if (IS_ERR(cm_id)) return PTR_ERR(cm_id); @@ -2626,7 +2641,6 @@ static void cm_format_sidr_req_event(str param = &work->cm_event.param.sidr_req_rcvd; param->pkey = __be16_to_cpu(sidr_req_msg->pkey); param->listen_id = listen_id; - param->device = work->port->mad_agent->device; param->port = work->port->port_num; work->cm_event.private_data = &sidr_req_msg->private_data; } @@ -2639,7 +2653,7 @@ static int cm_sidr_req_handler(struct cm struct ib_wc *wc; unsigned long flags; - cm_id = ib_create_cm_id(NULL, NULL); + cm_id = ib_create_cm_id(work->port->cm_dev->device, NULL, NULL); if (IS_ERR(cm_id)) return 
PTR_ERR(cm_id); cm_id_priv = container_of(cm_id, struct cm_id_private, id); @@ -2663,7 +2677,8 @@ static int cm_sidr_req_handler(struct cm spin_unlock_irqrestore(&cm.lock, flags); goto out; /* Duplicate message. */ } - cur_cm_id_priv = cm_find_listen(sidr_req_msg->service_id); + cur_cm_id_priv = cm_find_listen(cm_id->device, + sidr_req_msg->service_id); if (!cur_cm_id_priv) { rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table); spin_unlock_irqrestore(&cm.lock, flags); Index: core/ucm.c =================================================================== --- core/ucm.c (revision 3295) +++ core/ucm.c (working copy) @@ -70,22 +70,40 @@ enum { printk(KERN_DEBUG PFX format, ## arg); \ } while (0) -static struct semaphore ctx_id_mutex; -static struct idr ctx_id_table; +static void ib_ucm_add_one(struct ib_device *device); +static void ib_ucm_remove_one(struct ib_device *device); + +static struct ib_client ucm_client = { + .name = "ucm", + .add = ib_ucm_add_one, + .remove = ib_ucm_remove_one +}; + +static struct ib_ucm { + struct semaphore mutex; + struct idr ctx_id_table; + struct list_head device_list; +} ucm; + +struct ucm_device { + struct list_head list; + struct ib_device *device; + __be64 guid; +}; static struct ib_ucm_context *ib_ucm_ctx_get(struct ib_ucm_file *file, int id) { struct ib_ucm_context *ctx; - down(&ctx_id_mutex); - ctx = idr_find(&ctx_id_table, id); + down(&ucm.mutex); + ctx = idr_find(&ucm.ctx_id_table, id); if (!ctx) ctx = ERR_PTR(-ENOENT); else if (ctx->file != file) ctx = ERR_PTR(-EINVAL); else atomic_inc(&ctx->ref); - up(&ctx_id_mutex); + up(&ucm.mutex); return ctx; } @@ -139,13 +157,13 @@ static struct ib_ucm_context *ib_ucm_ctx INIT_LIST_HEAD(&ctx->events); do { - result = idr_pre_get(&ctx_id_table, GFP_KERNEL); + result = idr_pre_get(&ucm.ctx_id_table, GFP_KERNEL); if (!result) goto error; - down(&ctx_id_mutex); - result = idr_get_new(&ctx_id_table, ctx, &ctx->id); - up(&ctx_id_mutex); + down(&ucm.mutex); + result = 
idr_get_new(&ucm.ctx_id_table, ctx, &ctx->id); + up(&ucm.mutex); } while (result == -EAGAIN); if (result) @@ -209,6 +227,7 @@ static void ib_ucm_event_req_get(struct ureq->retry_count = kreq->retry_count; ureq->rnr_retry_count = kreq->rnr_retry_count; ureq->srq = kreq->srq; + ureq->port = kreq->port; ib_ucm_event_path_get(&ureq->primary_path, kreq->primary_path); ib_ucm_event_path_get(&ureq->alternate_path, kreq->alternate_path); @@ -295,6 +314,8 @@ static int ib_ucm_event_process(struct i case IB_CM_SIDR_REQ_RECEIVED: uvt->resp.u.sidr_req_resp.pkey = evt->param.sidr_req_rcvd.pkey; + uvt->resp.u.sidr_req_resp.port = + evt->param.sidr_req_rcvd.port; uvt->data_len = IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE; break; case IB_CM_SIDR_REP_RECEIVED: @@ -471,6 +492,16 @@ done: return result; } +static struct ib_device *ib_ucm_get_device(__be64 guid) +{ + struct ucm_device *ucm_dev; + + list_for_each_entry(ucm_dev, &ucm.device_list, list) { + if (ucm_dev->guid == guid) + return ucm_dev->device; + } + return NULL; +} static ssize_t ib_ucm_create_id(struct ib_ucm_file *file, const char __user *inbuf, @@ -479,6 +510,7 @@ static ssize_t ib_ucm_create_id(struct i struct ib_ucm_create_id cmd; struct ib_ucm_create_id_resp resp; struct ib_ucm_context *ctx; + struct ib_device *device; int result; if (out_len < sizeof(resp)) @@ -489,12 +521,19 @@ static ssize_t ib_ucm_create_id(struct i down(&file->mutex); ctx = ib_ucm_ctx_alloc(file); - up(&file->mutex); - if (!ctx) + if (!ctx) { + up(&file->mutex); return -ENOMEM; + } + device = ib_ucm_get_device(cmd.device_guid); + up(&file->mutex); + if (!device) { + result = -EINVAL; + goto err; + } ctx->uid = cmd.uid; - ctx->cm_id = ib_create_cm_id(ib_ucm_event_handler, ctx); + ctx->cm_id = ib_create_cm_id(device, ib_ucm_event_handler, ctx); if (IS_ERR(ctx->cm_id)) { result = PTR_ERR(ctx->cm_id); goto err; @@ -510,13 +549,15 @@ static ssize_t ib_ucm_create_id(struct i return 0; err: - down(&ctx_id_mutex); - idr_remove(&ctx_id_table, ctx->id); - 
up(&ctx_id_mutex); + down(&ucm.mutex); + idr_remove(&ucm.ctx_id_table, ctx->id); + up(&ucm.mutex); - if (!IS_ERR(ctx->cm_id)) + if (ctx->cm_id && !IS_ERR(ctx->cm_id)) ib_destroy_cm_id(ctx->cm_id); + ib_ucm_cleanup_events(ctx); + kfree(ctx); return result; } @@ -536,15 +577,15 @@ static ssize_t ib_ucm_destroy_id(struct if (copy_from_user(&cmd, inbuf, sizeof(cmd))) return -EFAULT; - down(&ctx_id_mutex); - ctx = idr_find(&ctx_id_table, cmd.id); + down(&ucm.mutex); + ctx = idr_find(&ucm.ctx_id_table, cmd.id); if (!ctx) ctx = ERR_PTR(-ENOENT); else if (ctx->file != file) ctx = ERR_PTR(-EINVAL); else - idr_remove(&ctx_id_table, ctx->id); - up(&ctx_id_mutex); + idr_remove(&ucm.ctx_id_table, ctx->id); + up(&ucm.mutex); if (IS_ERR(ctx)) return PTR_ERR(ctx); @@ -1248,9 +1289,9 @@ static int ib_ucm_close(struct inode *in struct ib_ucm_context, file_list); up(&file->mutex); - down(&ctx_id_mutex); - idr_remove(&ctx_id_table, ctx->id); - up(&ctx_id_mutex); + down(&ucm.mutex); + idr_remove(&ucm.ctx_id_table, ctx->id); + up(&ucm.mutex); ib_destroy_cm_id(ctx->cm_id); ib_ucm_cleanup_events(ctx); @@ -1263,6 +1304,60 @@ static int ib_ucm_close(struct inode *in return 0; } +static __be64 ib_ucm_get_ca_guid(struct ib_device *device) +{ + struct ib_device_attr *device_attr; + __be64 guid; + int ret; + + device_attr = kmalloc(sizeof *device_attr, GFP_KERNEL); + if (!device_attr) + return 0; + + ret = ib_query_device(device, device_attr); + guid = ret ? 
0 : device_attr->node_guid; + kfree(device_attr); + return guid; +} + +static void ib_ucm_add_one(struct ib_device *device) +{ + struct ucm_device *ucm_dev; + + ucm_dev = kmalloc(sizeof(*ucm_dev), GFP_KERNEL); + if (!ucm_dev) + return; + + ucm_dev->device = device; + ucm_dev->guid = ib_ucm_get_ca_guid(device); + if (!ucm_dev->guid) + goto error; + + ib_set_client_data(device, &ucm_client, ucm_dev); + + down(&ucm.mutex); + list_add_tail(&ucm_dev->list, &ucm.device_list); + up(&ucm.mutex); + return; + +error: + kfree(ucm_dev); +} + +static void ib_ucm_remove_one(struct ib_device *device) +{ + struct ucm_device *ucm_dev; + + ucm_dev = ib_get_client_data(device, &ucm_client); + if (!ucm_dev) + return; + + down(&ucm.mutex); + list_del(&ucm_dev->list); + up(&ucm.mutex); + kfree(ucm_dev); +} + static struct file_operations ib_ucm_fops = { .owner = THIS_MODULE, .open = ib_ucm_open, @@ -1271,14 +1366,22 @@ static struct file_operations ib_ucm_fop .poll = ib_ucm_poll, }; - static struct class *ib_ucm_class; -static struct cdev ib_ucm_cdev; +static struct cdev ib_ucm_cdev; static int __init ib_ucm_init(void) { int result; + memset(&ucm, 0, sizeof ucm); + INIT_LIST_HEAD(&ucm.device_list); + idr_init(&ucm.ctx_id_table); + init_MUTEX(&ucm.mutex); + + result = ib_register_client(&ucm_client); + if (result) + goto err_reg; + result = register_chrdev_region(IB_UCM_DEV, 1, "infiniband_cm"); if (result) { ucm_dbg("Error <%d> registering dev\n", result); @@ -1302,15 +1405,14 @@ static int __init ib_ucm_init(void) class_device_create(ib_ucm_class, IB_UCM_DEV, NULL, "ucm"); - idr_init(&ctx_id_table); - init_MUTEX(&ctx_id_mutex); - return 0; err_class: cdev_del(&ib_ucm_cdev); err_cdev: unregister_chrdev_region(IB_UCM_DEV, 1); err_chr: + ib_unregister_client(&ucm_client); +err_reg: return result; } @@ -1320,6 +1422,7 @@ static void __exit ib_ucm_cleanup(void) class_destroy(ib_ucm_class); cdev_del(&ib_ucm_cdev); unregister_chrdev_region(IB_UCM_DEV, 1); + ib_unregister_client(&ucm_client); 
} module_init(ib_ucm_init); Index: include/rdma/ib_cm.h =================================================================== --- include/rdma/ib_cm.h (revision 3295) +++ include/rdma/ib_cm.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004 Intel Corporation. All rights reserved. + * Copyright (c) 2004, 2005 Intel Corporation. All rights reserved. * Copyright (c) 2004 Topspin Corporation. All rights reserved. * Copyright (c) 2004 Voltaire Corporation. All rights reserved. * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. @@ -109,7 +109,6 @@ struct ib_cm_id; struct ib_cm_req_event_param { struct ib_cm_id *listen_id; - struct ib_device *device; u8 port; struct ib_sa_path_rec *primary_path; @@ -220,7 +219,6 @@ struct ib_cm_apr_event_param { struct ib_cm_sidr_req_event_param { struct ib_cm_id *listen_id; - struct ib_device *device; u8 port; u16 pkey; }; @@ -284,6 +282,7 @@ typedef int (*ib_cm_handler)(struct ib_c struct ib_cm_id { ib_cm_handler cm_handler; void *context; + struct ib_device *device; __be64 service_id; __be64 service_mask; enum ib_cm_state state; /* internal CM/debug use */ @@ -294,6 +293,8 @@ struct ib_cm_id { /** * ib_create_cm_id - Allocate a communication identifier. + * @device: Device associated with the cm_id. All related communication will + * be associated with the specified device. * @cm_handler: Callback invoked to notify the user of CM events. * @context: User specified context associated with the communication * identifier. @@ -301,7 +302,8 @@ struct ib_cm_id { * Communication identifiers are used to track connection states, service * ID resolution requests, and listen requests. 
*/ -struct ib_cm_id *ib_create_cm_id(ib_cm_handler cm_handler, +struct ib_cm_id *ib_create_cm_id(struct ib_device *device, + ib_cm_handler cm_handler, void *context); /** Index: include/rdma/ib_user_cm.h =================================================================== --- include/rdma/ib_user_cm.h (revision 3295) +++ include/rdma/ib_user_cm.h (working copy) @@ -38,7 +38,7 @@ #include -#define IB_USER_CM_ABI_VERSION 2 +#define IB_USER_CM_ABI_VERSION 3 enum { IB_USER_CM_CMD_CREATE_ID, @@ -74,6 +74,7 @@ struct ib_ucm_cmd_hdr { struct ib_ucm_create_id { __u64 uid; + __be64 device_guid; __u64 response; }; @@ -299,8 +300,6 @@ struct ib_ucm_event_get { }; struct ib_ucm_req_event_resp { - /* device */ - /* port */ struct ib_ucm_path_rec primary_path; struct ib_ucm_path_rec alternate_path; __be64 remote_ca_guid; @@ -316,6 +315,7 @@ struct ib_ucm_req_event_resp { __u8 retry_count; __u8 rnr_retry_count; __u8 srq; + __u8 port; }; struct ib_ucm_rep_event_resp { @@ -353,10 +353,9 @@ struct ib_ucm_apr_event_resp { }; struct ib_ucm_sidr_req_event_resp { - /* device */ - /* port */ __u16 pkey; - __u8 reserved[2]; + __u8 port; + __u8 reserved; }; struct ib_ucm_sidr_rep_event_resp { From sean.hefty at intel.com Tue Sep 6 16:46:10 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 6 Sep 2005 16:46:10 -0700 Subject: [openib-general] [PATCH] [CM] 2/6 SRP updates to bind cm_id's to a device In-Reply-To: Message-ID: This patch should update SRP to use the new ib_create_cm_id() API. This patch is untested. 
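The ib_user_cm.h hunk of patch 1/6, just before this SRP message, changes struct ib_ucm_sidr_req_event_resp: `__u8 reserved[2]` becomes `__u8 port; __u8 reserved;`. This reuses a padding byte instead of growing the struct. The sketch below checks that at compile time. It uses local copies of the two layouts with <stdint.h> types standing in for __u16/__u8; that substitution is an assumption of the example, not the real header.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch (ABI version 2), local copy for illustration. */
struct sidr_req_resp_v2 {
	uint16_t pkey;
	uint8_t  reserved[2];
};

/* Layout after the patch (ABI version 3): one reserved byte becomes "port". */
struct sidr_req_resp_v3 {
	uint16_t pkey;
	uint8_t  port;
	uint8_t  reserved;
};

/* Compile-time checks: the struct size is unchanged and pkey stays first. */
_Static_assert(sizeof(struct sidr_req_resp_v2) == sizeof(struct sidr_req_resp_v3),
	       "sidr_req_event_resp size changed");
_Static_assert(offsetof(struct sidr_req_resp_v3, pkey) == 0, "pkey moved");
_Static_assert(offsetof(struct sidr_req_resp_v3, port) == 2, "port misplaced");

size_t sidr_req_resp_size(void)
{
	return sizeof(struct sidr_req_resp_v3);
}
```

The same hunk still bumps IB_USER_CM_ABI_VERSION from 2 to 3. That bump is needed because struct ib_ucm_create_id gains a device_guid field and struct ib_ucm_req_event_resp gains port.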
Signed-off-by: Sean Hefty Index: ulp/srp/ib_srp.c =================================================================== --- ulp/srp/ib_srp.c (revision 3295) +++ ulp/srp/ib_srp.c (working copy) @@ -1178,7 +1178,7 @@ static ssize_t srp_create_target(struct goto err; spin_lock_init(&target->lock); - target->cm_id = ib_create_cm_id(srp_cm_handler, target); + target->cm_id = ib_create_cm_id(host->dev, srp_cm_handler, target); if (IS_ERR(target->cm_id)) { ret = -ENOMEM; goto err; From sean.hefty at intel.com Tue Sep 6 16:50:51 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 6 Sep 2005 16:50:51 -0700 Subject: [openib-general] [PATCH] [CM] 3/6 SDP updates to bind cm_id's to a device In-Reply-To: Message-ID: This patch updates SDP to use the new ib_create_cm_id() API. It also replaces the state driven CM callback processing model with the more reliable event driven processing model. This patch is for review and is untested. Signed-off-by: Sean Hefty Index: ulp/sdp/sdp_actv.c =================================================================== --- ulp/sdp/sdp_actv.c (revision 3295) +++ ulp/sdp/sdp_actv.c (working copy) @@ -480,7 +480,7 @@ static void sdp_cm_path_complete(u64 id, /* XXX set timeout to default value of 14 */ path->packet_life = 13; #endif - conn->cm_id = ib_create_cm_id(sdp_cm_event_handler, + conn->cm_id = ib_create_cm_id(ca, sdp_cm_event_handler, hashent_arg(conn->hashent)); if (!conn->cm_id) { sdp_dbg_warn(conn, "Failed to create CM handle, %d", Index: ulp/sdp/sdp_conn.c =================================================================== --- ulp/sdp/sdp_conn.c (revision 3295) +++ ulp/sdp/sdp_conn.c (working copy) @@ -1801,11 +1801,29 @@ static void sdp_device_init_one(struct i } } + hca->listen_id = ib_create_cm_id(device, sdp_cm_event_handler, hca); + if (IS_ERR(hca->listen_id)) { + sdp_warn("Error <%ld> creating listen ID on <%s>.", + PTR_ERR(hca->listen_id), device->name); + goto error; + } + + result = ib_cm_listen(hca->listen_id, + 
cpu_to_be64(SDP_MSG_SERVICE_ID_VALUE), + cpu_to_be64(SDP_MSG_SERVICE_ID_MASK)); + if (result) { + sdp_warn("Error <%d> listening for SDP connections", result); + goto error; + } + ib_set_client_data(device, &sdp_client, hca); return; error: + if (!IS_ERR(hca->listen_id)) + ib_destroy_cm_id(hca->listen_id); + list_for_each_entry_safe(port, tmp, &hca->port_list, list) { list_del(&port->list); kfree(port); @@ -1838,6 +1856,9 @@ static void sdp_device_remove_one(struct return; } + if (!IS_ERR(hca->listen_id)) + ib_destroy_cm_id(hca->listen_id); + list_for_each_entry_safe(port, tmp, &hca->port_list, list) { list_del(&port->list); kfree(port); @@ -1938,31 +1959,9 @@ int sdp_conn_table_init(int proto_family goto error_iocb; } - /* - * start listening - */ - dev_root_s.listen_id = ib_create_cm_id(sdp_cm_event_handler, - (void *)SDP_DEV_SK_INVALID); - if (!dev_root_s.listen_id) { - sdp_warn("Failed to create listen connection identifier."); - result = -ENOMEM; - goto error_conn; - } - - result = ib_cm_listen(dev_root_s.listen_id, - cpu_to_be64(SDP_MSG_SERVICE_ID_VALUE), - cpu_to_be64(SDP_MSG_SERVICE_ID_MASK)); - if (result) { - sdp_warn("Error <%d> listening for SDP connections", result); - goto error_listen; - - } - sdp_dbg_init("Started listening for SDP connection requests"); return 0; -error_listen: - ib_destroy_cm_id(dev_root_s.listen_id); error_conn: sdp_main_iocb_cleanup(); error_iocb: @@ -2003,8 +2002,4 @@ void sdp_conn_table_clear(void) * delete IOCB table */ sdp_main_iocb_cleanup(); - /* - * stop listening - */ - ib_destroy_cm_id(dev_root_s.listen_id); } Index: ulp/sdp/sdp_event.c =================================================================== --- ulp/sdp/sdp_event.c (revision 3295) +++ ulp/sdp/sdp_event.c (working copy) @@ -384,45 +384,46 @@ int sdp_cm_event_handler(struct ib_cm_id struct sdp_sock *conn = NULL; int result = 0; - sdp_dbg_ctrl(NULL, "CM state <%d> event <%d> commID <%08x> ID <%d>", - cm_id->state, event->event, cm_id->local_id, hashent); - /* - 
* lookup the connection, on a REQ_RECV the sk will be empty. - */ - conn = sdp_conn_table_lookup(hashent); - if (conn) - sdp_conn_lock(conn); - else - if (cm_id->state != IB_CM_REQ_RCVD) { - sdp_dbg_warn(NULL, - "No conn <%d> CM state <%d> event <%d>", - hashent, cm_id->state, event->event); + sdp_dbg_ctrl(NULL, "event <%d> commID <%08x> ID <%d>", + event->event, cm_id->local_id, hashent); + + if (event->event != IB_CM_REQ_RECEIVED) { + conn = sdp_conn_table_lookup(hashent); + if (conn) + sdp_conn_lock(conn); + else return -EINVAL; - } + /* Can this fail? Why not just set context = conn? */ + } - switch (cm_id->state) { - case IB_CM_REQ_RCVD: + switch (event->event) { + case IB_CM_REQ_RECEIVED: result = sdp_cm_req_handler(cm_id, event); break; - case IB_CM_REP_RCVD: + case IB_CM_REP_RECEIVED: result = sdp_cm_rep_handler(cm_id, event, conn); break; - case IB_CM_IDLE: + case IB_CM_REQ_ERROR: + case IB_CM_REP_ERROR: + case IB_CM_REJ_RECEIVED: + case IB_CM_TIMEWAIT_EXIT: result = sdp_cm_idle(cm_id, event, conn); break; - case IB_CM_ESTABLISHED: + case IB_CM_RTU_RECEIVED: + case IB_CM_USER_ESTABLISHED: result = sdp_cm_established(cm_id, event, conn); break; - case IB_CM_DREQ_RCVD: + case IB_CM_DREQ_RECEIVED: result = sdp_cm_dreq_rcvd(cm_id, event, conn); if (result) break; /* fall through on success to handle state transition */ - case IB_CM_TIMEWAIT: + case IB_CM_DREQ_ERROR: + case IB_CM_DREP_RECEIVED: result = sdp_cm_timewait(cm_id, event, conn); break; default: - sdp_dbg_warn(conn, "Unexpected CM state <%d>", cm_id->state); + sdp_dbg_warn(conn, "Unhandled CM event <%d>", event->event); result = -EINVAL; } /* Index: ulp/sdp/sdp_dev.h =================================================================== --- ulp/sdp/sdp_dev.h (revision 3295) +++ ulp/sdp/sdp_dev.h (working copy) @@ -161,6 +161,7 @@ struct sdev_hca { u32 r_key; /* remote key */ struct ib_fmr_pool *fmr_pool; /* fast memory for Zcopy */ struct list_head port_list; /* ports on this HCA */ + struct ib_cm_id 
*listen_id; }; struct sdev_root { @@ -194,10 +195,6 @@ struct sdev_root { spinlock_t bind_lock; spinlock_t sock_lock; spinlock_t listen_lock; - /* - * SDP wide listen - */ - struct ib_cm_id *listen_id; /* listen handle */ }; #endif /* _SDP_DEV_H */ From sean.hefty at intel.com Tue Sep 6 16:53:54 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 6 Sep 2005 16:53:54 -0700 Subject: [openib-general] [PATCH] [CM] 4/6 userspace CM changes for per device cm_id's In-Reply-To: Message-ID: This patch extends binding cm_id's to a device to userspace. Signed-off-by: Sean Hefty Index: libibcm/include/infiniband/cm_abi.h =================================================================== --- libibcm/include/infiniband/cm_abi.h (revision 3295) +++ libibcm/include/infiniband/cm_abi.h (working copy) @@ -42,7 +42,7 @@ * drivers/infiniband/include/ib_user_cm.h */ -#define IB_USER_CM_ABI_VERSION 2 +#define IB_USER_CM_ABI_VERSION 3 enum { IB_USER_CM_CMD_CREATE_ID, @@ -78,6 +78,7 @@ struct cm_abi_cmd_hdr { struct cm_abi_create_id { __u64 uid; + __u64 device_guid; __u64 response; }; @@ -303,8 +304,6 @@ struct cm_abi_event_get { }; struct cm_abi_req_event_resp { - /* device */ - /* port */ struct cm_abi_path_rec primary_path; struct cm_abi_path_rec alternate_path; __u64 remote_ca_guid; @@ -320,6 +319,7 @@ struct cm_abi_req_event_resp { __u8 retry_count; __u8 rnr_retry_count; __u8 srq; + __u8 port; }; struct cm_abi_rep_event_resp { @@ -357,10 +357,9 @@ struct cm_abi_apr_event_resp { }; struct cm_abi_sidr_req_event_resp { - /* device */ - /* port */ __u16 pkey; - __u8 reserved[2]; + __u8 port; + __u8 reserved; }; struct cm_abi_sidr_rep_event_resp { Index: libibcm/include/infiniband/cm.h =================================================================== --- libibcm/include/infiniband/cm.h (revision 3295) +++ libibcm/include/infiniband/cm.h (working copy) @@ -79,11 +79,13 @@ enum ib_cm_data_size { struct ib_cm_id { void *context; + struct ibv_context *device_context; uint32_t 
handle; }; struct ib_cm_req_event_param { struct ib_cm_id *listen_id; + uint8_t port; struct ib_sa_path_rec *primary_path; struct ib_sa_path_rec *alternate_path; @@ -193,7 +195,6 @@ struct ib_cm_apr_event_param { struct ib_cm_sidr_req_event_param { struct ib_cm_id *listen_id; - struct ib_device *device; uint8_t port; uint16_t pkey; }; @@ -292,7 +293,8 @@ int ib_cm_get_fd(void); * Communication identifiers are used to track connection states, service * ID resolution requests, and listen requests. */ -int ib_cm_create_id(struct ib_cm_id **cm_id, void *context); +int ib_cm_create_id(struct ibv_context *device_context, + struct ib_cm_id **cm_id, void *context); /** * ib_cm_destroy_id - Destroy a connection identifier. Index: libibcm/src/cm.c =================================================================== --- libibcm/src/cm.c (revision 3295) +++ libibcm/src/cm.c (working copy) @@ -146,7 +146,8 @@ static void ib_cm_free_id(struct cm_id_p free(cm_id_priv); } -static struct cm_id_private *ib_cm_alloc_id(void *context) +static struct cm_id_private *ib_cm_alloc_id(struct ibv_context *device_context, + void *context) { struct cm_id_private *cm_id_priv; @@ -155,6 +156,7 @@ static struct cm_id_private *ib_cm_alloc return NULL; memset(cm_id_priv, 0, sizeof *cm_id_priv); + cm_id_priv->id.device_context = device_context; cm_id_priv->id.context = context; pthread_mutex_init(&cm_id_priv->mut, NULL); if (pthread_cond_init(&cm_id_priv->cond, NULL)) @@ -166,7 +168,8 @@ err: ib_cm_free_id(cm_id_priv); return NULL; } -int ib_cm_create_id(struct ib_cm_id **cm_id, void *context) +int ib_cm_create_id(struct ibv_context *device_context, + struct ib_cm_id **cm_id, void *context) { struct cm_abi_create_id_resp *resp; struct cm_abi_create_id *cmd; @@ -175,12 +178,13 @@ int ib_cm_create_id(struct ib_cm_id **cm int result; int size; - cm_id_priv = ib_cm_alloc_id(context); + cm_id_priv = ib_cm_alloc_id(device_context, context); if (!cm_id_priv) return -ENOMEM; CM_CREATE_MSG_CMD_RESP(msg, cmd, 
resp, IB_USER_CM_CMD_CREATE_ID, size); cmd->uid = (uintptr_t) cm_id_priv; + cmd->device_guid = ibv_get_device_guid(device_context->device); result = write(fd, msg, size); if (result != size) @@ -750,6 +754,7 @@ static void cm_event_req_get(struct ib_c ureq->retry_count = kreq->retry_count; ureq->rnr_retry_count = kreq->rnr_retry_count; ureq->srq = kreq->srq; + ureq->port = kreq->port; cm_event_path_get(ureq->primary_path, &kreq->primary_path); cm_event_path_get(ureq->alternate_path, &kreq->alternate_path); @@ -868,7 +873,8 @@ int ib_cm_event_get(struct ib_cm_event * switch (evt->event) { case IB_CM_REQ_RECEIVED: evt->param.req_rcvd.listen_id = evt->cm_id; - cm_id_priv = ib_cm_alloc_id(evt->cm_id->context); + cm_id_priv = ib_cm_alloc_id(evt->cm_id->device_context, + evt->cm_id->context); if (!cm_id_priv) { result = -ENOMEM; goto done; @@ -905,7 +911,8 @@ int ib_cm_event_get(struct ib_cm_event * break; case IB_CM_SIDR_REQ_RECEIVED: evt->param.sidr_req_rcvd.listen_id = evt->cm_id; - cm_id_priv = ib_cm_alloc_id(evt->cm_id->context); + cm_id_priv = ib_cm_alloc_id(evt->cm_id->device_context, + evt->cm_id->context); if (!cm_id_priv) { result = -ENOMEM; goto done; @@ -913,6 +920,7 @@ int ib_cm_event_get(struct ib_cm_event * cm_id_priv->id.handle = resp->id; evt->cm_id = &cm_id_priv->id; evt->param.sidr_req_rcvd.pkey = resp->u.sidr_req_resp.pkey; + evt->param.sidr_req_rcvd.port = resp->u.sidr_req_resp.port; break; case IB_CM_SIDR_REP_RECEIVED: cm_event_sidr_rep_get(&evt->param.sidr_rep_rcvd, Index: libibcm/examples/cmpost.c =================================================================== --- libibcm/examples/cmpost.c (revision 3295) +++ libibcm/examples/cmpost.c (working copy) @@ -307,7 +307,7 @@ static int init_node(struct cmtest_node int cqe, ret; if (!is_server) { - ret = ib_cm_create_id(&node->cm_id, node); + ret = ib_cm_create_id(test.verbs, &node->cm_id, node); if (ret) { printf("failed to create cm_id: %d\n", ret); return ret; @@ -554,7 +554,7 @@ static void 
run_server(void) int i, ret; printf("starting server\n"); - if (ib_cm_create_id(&listen_id, &test)) { + if (ib_cm_create_id(test.verbs, &listen_id, &test)) { printf("listen request failed\n"); return; } From sean.hefty at intel.com Tue Sep 6 16:57:01 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 6 Sep 2005 16:57:01 -0700 Subject: [openib-general] [PATCH] [CM] 5/6 DAPL changes to support per device cm_id's In-Reply-To: Message-ID: This patch updates DAPL to use the new ib_cm_create_id() API. This patch is for review and is untested. Signed-off-by: Sean Hefty Index: dapl/dapl/openib/dapl_ib_cm.c =================================================================== --- dapl/dapl/openib/dapl_ib_cm.c (revision 3295) +++ dapl/dapl/openib/dapl_ib_cm.c (working copy) @@ -821,7 +821,7 @@ dapls_ib_connect ( conn->ep = ep_ptr; conn->hca = ep_ptr->header.owner_ia->hca_ptr; - status = ib_cm_create_id(&conn->cm_id, conn); + status = ib_cm_create_id(conn->hca->ib_hca_handle, &conn->cm_id, conn); if (status < 0) { dat_status = dapl_convert_errno(errno,"create_cm_id"); dapl_os_free(conn, sizeof(*conn)); @@ -1003,7 +1003,7 @@ dapls_ib_setup_conn_listener ( return DAT_INTERNAL_ERROR; } - status = ib_cm_create_id(&conn->cm_id, conn); + status = ib_cm_create_id(ia_ptr->hca_ptr->ib_hca_handle, &conn->cm_id, conn); if (status < 0) { dat_status = dapl_convert_errno(errno,"create_cm_id"); dapl_os_free(conn, sizeof(*conn)); From sean.hefty at intel.com Tue Sep 6 17:00:46 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 6 Sep 2005 17:00:46 -0700 Subject: [openib-general] [PATCH] [CM] 6/6 update cmpost to use per device cm_id's In-Reply-To: Message-ID: Patch updates kernel cmpost test utility to updated ib_create_cm_id() API. 
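In patch 4/6, libibcm's ib_cm_create_id() fills cmd->device_guid from ibv_get_device_guid(). The kernel's struct ib_ucm_create_id declares that field as __be64, while libibcm's struct cm_abi_create_id declares it as __u64. The byte order of that value is the subject of the later "ibv_get_device_guid() not byte swapping" thread. The sketch below shows a portable way to move a 64-bit GUID to and from big-endian wire bytes using shifts. Because it avoids union or pointer type punning, the result is the same on any host. The helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: write a 64-bit GUID as big-endian (network order)
 * bytes, most significant byte first, independent of host endianness. */
void guid_to_be_bytes(uint64_t guid, uint8_t out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(guid >> (56 - 8 * i));
}

/* Hypothetical helper: rebuild the host-order GUID from big-endian bytes. */
uint64_t guid_from_be_bytes(const uint8_t in[8])
{
	uint64_t guid = 0;

	for (int i = 0; i < 8; i++)
		guid = (guid << 8) | in[i];
	return guid;
}
```

Inside the kernel, the corresponding conversions for __be64 fields are cpu_to_be64() and be64_to_cpu(); the SDP patch uses cpu_to_be64() for its service ID. In userspace, glibc also provides htobe64() and be64toh() in <endian.h>.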
Signed-off-by: Sean Hefty Index: cmpost/cmpost.c =================================================================== --- cmpost/cmpost.c (revision 3327) +++ cmpost/cmpost.c (working copy) @@ -408,7 +408,7 @@ static int init_node(struct cmtest_node int cqe, ret; if (!is_server) { - node->cm_id = ib_create_cm_id(cm_handler, node); + node->cm_id = ib_create_cm_id(test.device, cm_handler, node); if (IS_ERR(node->cm_id)) { ret = PTR_ERR(node->cm_id); printk("cmpost: failed to create cm_id: %d\n", ret); @@ -634,7 +634,7 @@ static void run_server(void) int i, ret; printk("cmpost: starting server\n"); - listen_id = ib_create_cm_id(cm_handler, &test); + listen_id = ib_create_cm_id(test.device, cm_handler, &test); if (IS_ERR(listen_id)) { ret = PTR_ERR(listen_id); printk("cmpost: listen request failed: %d\n", ret); From rolandd at cisco.com Tue Sep 6 17:40:12 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 17:40:12 -0700 Subject: [openib-general] [PATCH] [CM] 1/6 core kernel changes to bind cm_id's to a device In-Reply-To: (Sean Hefty's message of "Tue, 6 Sep 2005 16:41:34 -0700") References: Message-ID: <52hdcxpv6b.fsf@cisco.com> Now that cm_id's are per-IB-device, does it make sense to have the userspace CM create a character node for each IB device? It seems that might simplify the interface. uverbs handles up to 32 IB devices with minors 192...223, so using the minors 224...255 for 32 ucm devices would make sense. Unfortunately uat is using minor 254, so we would have to do some rejiggering. I guess minor numbers shouldn't be driving this -- we should just pick the right interface, whatever it is. - R.
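Roland's minor-number proposal can be checked with a little arithmetic. The sketch below maps an IB device index to a ucm minor and reports which index would collide with uat's current minor. The helper names are hypothetical; only the numbers (192...223 for uverbs, 224...255 for up to 32 ucm devices, uat at 254) come from his message.

```c
/* Minor-number layout taken from the message above. */
enum {
	UVERBS_BASE_MINOR = 192,	/* uverbs: 192..223 */
	UCM_BASE_MINOR    = 224,	/* proposed ucm range: 224..255 */
	MAX_IB_DEVICES    = 32,
	UAT_MINOR         = 254,	/* minor currently used by uat */
};

/* Hypothetical helper: ucm minor for IB device index idx, or -1 if out of range. */
int ucm_minor(int idx)
{
	if (idx < 0 || idx >= MAX_IB_DEVICES)
		return -1;
	return UCM_BASE_MINOR + idx;
}

/* Hypothetical helper: device index whose ucm minor would clash with uat, or -1. */
int ucm_uat_conflict_index(void)
{
	for (int i = 0; i < MAX_IB_DEVICES; i++)
		if (ucm_minor(i) == UAT_MINOR)
			return i;
	return -1;
}
```

With this layout, only the 31st device (index 30) would collide with uat. That is the "rejiggering" Roland mentions and that Hal says is easy to do.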
From rolandd at cisco.com Tue Sep 6 17:46:02 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 06 Sep 2005 17:46:02 -0700 Subject: [openib-general] [PATCH] [CM] 1/6 core kernel changes to bind cm_id's to a device In-Reply-To: (Sean Hefty's message of "Tue, 6 Sep 2005 16:41:34 -0700") References: Message-ID: <52d5nlpuwl.fsf@cisco.com> > +static struct ib_ucm { > + struct semaphore mutex; > + struct idr ctx_id_table; > + struct list_head device_list; > +} ucm; Just out of curiosity, why put these in a struct? It seems like you might as well just use three static variables. You could even use DEFINE_IDR() and avoid a call to idr_init()... - R. From halr at voltaire.com Tue Sep 6 19:48:25 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 06 Sep 2005 22:48:25 -0400 Subject: [openib-general] [PATCH] [CM] 1/6 core kernel changes to bind cm_id's to a device In-Reply-To: <52hdcxpv6b.fsf@cisco.com> References: <52hdcxpv6b.fsf@cisco.com> Message-ID: <1126061217.4396.427.camel@hal.voltaire.com> On Tue, 2005-09-06 at 20:40, Roland Dreier wrote: > Now that cm_id's are per-IB-device, does it make sense to have the > userspace CM create a character node for each IB device? It seems that > might simplify the interface. > > uverbs handles up to 32 IB devices with minors 192...223, so using the > minors 224...255 for 32 ucm devices would make sense. Unfortunately > uat is using minor 254, so we would have to do some rejiggering. I > guess minor numbers shouldn't be driving this -- we should just pick > the right interface, whatever it is. uat can easily be moved. -- Hal From sean.hefty at intel.com Tue Sep 6 20:51:48 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 6 Sep 2005 20:51:48 -0700 Subject: [openib-general] [PATCH] [CM] 1/6 core kernel changes to bind cm_id's to a device In-Reply-To: <52hdcxpv6b.fsf@cisco.com> Message-ID: >Now that cm_id's are per-IB-device, does it make sense to have the >userspace CM create a character node for each IB device?
It seems that >might simplify the interface. That makes sense. I'll work on updating to that model. - Sean From sean.hefty at intel.com Tue Sep 6 20:54:59 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 6 Sep 2005 20:54:59 -0700 Subject: [openib-general] [PATCH] [CM] 1/6 core kernel changes to bind cm_id's to a device In-Reply-To: <52d5nlpuwl.fsf@cisco.com> Message-ID: > > +static struct ib_ucm { > > + struct semaphore mutex; > > + struct idr ctx_id_table; > > + struct list_head device_list; > > +} ucm; > >Just out of curiosity, why put these in a struct? It seems like you >might as well just use three static variables. You could even use >DEFINE_IDR() and avoid a call to idr_init()... This was a by-product of re-using some code from the kernel CM, coupled with my aversion to global variables. I'll remove the struct. - Sean From mst at mellanox.co.il Wed Sep 7 00:50:08 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Sep 2005 10:50:08 +0300 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <52fyshriuw.fsf@cisco.com> References: <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> <52u0gyrw75.fsf@cisco.com> <431DC7F9.2050307@ichips.intel.com> <52ll2arvb3.fsf@cisco.com> <431DCC06.8050403@ichips.intel.com> <52fyshriuw.fsf@cisco.com> Message-ID: <20050907075007.GS19358@mellanox.co.il> Quoting r. Roland Dreier : > I thought about this some more and I came to the conclusion that Sean > is right. We should come up with something race-free, even if an app > is perverse enough to use multiple threads to read CQ events. You must be right. Some ULPs come up with really strange ideas.
-- MST From eitan at mellanox.co.il Wed Sep 7 00:48:19 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 07 Sep 2005 10:48:19 +0300 Subject: [openib-general] OpenSM 1.8.0 Merge Status and Operational Issue In-Reply-To: <1126045651.4396.163.camel@hal.voltaire.com> References: <1125609366.4398.1014.camel@hal.voltaire.com> <1126045651.4396.163.camel@hal.voltaire.com> Message-ID: <431E9B43.2030402@mellanox.co.il> Hal Rosenstock wrote: Based on these, I am currently ambivalent about putting the 1.8.0 > changes back as yet (even though I was hoping I could go ahead tomorrow > AM). Does anybody else have an opinion on this ? > The faster we move, the more people we can get checking it. I suspect that after such a huge merge we might find more issues. From mst at mellanox.co.il Wed Sep 7 00:57:40 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Sep 2005 10:57:40 +0300 Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: References: <20050906144344.GO19358@mellanox.co.il> <20050906174135.GB30610@mellanox.co.il> Message-ID: <20050907075740.GT19358@mellanox.co.il> Quoting r. James Lentini : > Subject: Re: [mstflint] firmware upgrade instructions > > > > On Tue, 6 Sep 2005, Michael S. Tsirkin wrote: > > > MT_0030000001 is MHXL-CF128-T (Previously: MTLP23108-CF128) > > How would one determine this? Is there a tool or web page that does the > mapping? > We plan to put out a web page to help with this soonish. -- MST From mst at mellanox.co.il Wed Sep 7 01:30:12 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Sep 2005 11:30:12 +0300 Subject: [openib-general] Re: ibv_get_device_guid() not byte swapping In-Reply-To: <52y869q36e.fsf@cisco.com> References: <431E0C00.8030105@ichips.intel.com> <52y869q36e.fsf@cisco.com> Message-ID: <20050907083012.GU19358@mellanox.co.il> Quoting Roland Dreier : > Perhaps changing the guid to a union is the easiest thing to do.
Take care, however: taking pointers to a union member and passing them down to another function also seems to be a buggy area in gcc. See e.g. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12404 (seems fixed in 3.4.0) -- MST From mst at mellanox.co.il Wed Sep 7 02:14:14 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Sep 2005 12:14:14 +0300 Subject: [openib-general] Re: [PATCH] [CM] 3/6 SDP updates to bind cm_id's to a device In-Reply-To: References: Message-ID: <20050907091414.GV19358@mellanox.co.il> Hello, Sean! Thanks for the patch. The changes in SDP seem quite small; I think this validates that the new API is easy enough to use. Quoting Sean Hefty : > Subject: RE: [PATCH] [CM] 3/6 SDP updates to bind cm_id's to a device > > This patch updates SDP to use the new ib_create_cm_id() API. I wonder if the API updates could be split into a separate patch. > It also replaces the state driven CM callback processing model with > the more reliable event driven processing model. Could you please elaborate on why an event-driven model is more reliable than the state-driven one? It certainly seems to require more code: isn't cm_id->state set by the CM to a valid value? It seems that the CM needs to track connection state anyway, as per "12.9.2 Invalid State Input Handling", to know which messages are legal. Is this done? If so, why isn't it a good idea to reuse that state in SDP? > This patch is for review and is untested. > > Signed-off-by: Sean Hefty A couple of comments below.
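MST's review of sdp_device_init_one() asks for separate error labels instead of testing IS_ERR(hca->listen_id) twice on the error path. The sketch below is a userspace illustration only. ERR_PTR()/IS_ERR()/PTR_ERR() are simplified re-implementations of the kernel convention of encoding a negative errno in the top 4095 pointer values. create_id(), listen_id() and destroy_id() are hypothetical stand-ins for ib_create_cm_id(), ib_cm_listen() and ib_destroy_cm_id().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace copies of the kernel error-pointer helpers:
 * errno values -1..-4095 are encoded in the top page of the address space. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct id { int unused; };

int destroyed;	/* counts destroy_id() calls, for illustration */

/* Hypothetical stand-ins for the CM calls; "fail" forces an error. */
static struct id *create_id(int fail)
{
	struct id *p = fail ? NULL : calloc(1, sizeof(*p));

	return p ? p : ERR_PTR(-ENOMEM);
}
static int listen_id(struct id *id, int fail) { (void)id; return fail ? -EINVAL : 0; }
void destroy_id(struct id *id) { destroyed++; free(id); }

/* One label per acquired resource: each failure jumps past the cleanup of
 * anything it never set up, so no pointer is re-tested on the way out. */
int init_one(int fail_create, int fail_listen, struct id **out)
{
	struct id *id;
	int ret;

	id = create_id(fail_create);
	if (IS_ERR(id)) {
		ret = (int)PTR_ERR(id);
		goto err_create;
	}

	ret = listen_id(id, fail_listen);
	if (ret)
		goto err_listen;

	*out = id;
	return 0;

err_listen:
	destroy_id(id);
err_create:
	return ret;
}
```

With one label per resource, the cleanup path follows the order of setup in reverse. This is the common kernel idiom and avoids guessing which handles are valid.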
> Index: ulp/sdp/sdp_actv.c > =================================================================== > --- ulp/sdp/sdp_actv.c (revision 3295) > +++ ulp/sdp/sdp_actv.c (working copy) > @@ -480,7 +480,7 @@ static void sdp_cm_path_complete(u64 id, > /* XXX set timeout to default value of 14 */ > path->packet_life = 13; > #endif > - conn->cm_id = ib_create_cm_id(sdp_cm_event_handler, > + conn->cm_id = ib_create_cm_id(ca, sdp_cm_event_handler, > hashent_arg(conn->hashent)); > if (!conn->cm_id) { > sdp_dbg_warn(conn, "Failed to create CM handle, %d", > Index: ulp/sdp/sdp_conn.c > =================================================================== > --- ulp/sdp/sdp_conn.c (revision 3295) > +++ ulp/sdp/sdp_conn.c (working copy) > @@ -1801,11 +1801,29 @@ static void sdp_device_init_one(struct i > } > } > > + hca->listen_id = ib_create_cm_id(device, sdp_cm_event_handler, hca); > + if (IS_ERR(hca->listen_id)) { > + sdp_warn("Error <%ld> creating listen ID on <%s>.", > + PTR_ERR(hca->listen_id), device->name); > + goto error; > + } > + > + result = ib_cm_listen(hca->listen_id, > + cpu_to_be64(SDP_MSG_SERVICE_ID_VALUE), > + cpu_to_be64(SDP_MSG_SERVICE_ID_MASK)); > + if (result) { > + sdp_warn("Error <%d> listening for SDP connections", result); > + goto error; > + } > + > ib_set_client_data(device, &sdp_client, hca); > > return; Can a listen event arrive before we call ib_set_client_data? > error: > + if (!IS_ERR(hca->listen_id)) > + ib_destroy_cm_id(hca->listen_id); > + > list_for_each_entry_safe(port, tmp, &hca->port_list, list) { > list_del(&port->list); > kfree(port); > @@ -1838,6 +1856,9 @@ static void sdp_device_remove_one(struct > return; > } > > + if (!IS_ERR(hca->listen_id)) > + ib_destroy_cm_id(hca->listen_id); > + > list_for_each_entry_safe(port, tmp, &hca->port_list, list) { > list_del(&port->list); > kfree(port); I'd prefer separate labels for two errors, instead of testing IS_ERR twice. 
> @@ -1938,31 +1959,9 @@ int sdp_conn_table_init(int proto_family > goto error_iocb; > } > > - /* > - * start listening > - */ > - dev_root_s.listen_id = ib_create_cm_id(sdp_cm_event_handler, > - (void *)SDP_DEV_SK_INVALID); > - if (!dev_root_s.listen_id) { > - sdp_warn("Failed to create listen connection identifier."); > - result = -ENOMEM; > - goto error_conn; > - } > - > - result = ib_cm_listen(dev_root_s.listen_id, > - cpu_to_be64(SDP_MSG_SERVICE_ID_VALUE), > - cpu_to_be64(SDP_MSG_SERVICE_ID_MASK)); > - if (result) { > - sdp_warn("Error <%d> listening for SDP connections", result); > - goto error_listen; > - > - } > - > sdp_dbg_init("Started listening for SDP connection requests"); > > return 0; > -error_listen: > - ib_destroy_cm_id(dev_root_s.listen_id); > error_conn: > sdp_main_iocb_cleanup(); > error_iocb: > @@ -2003,8 +2002,4 @@ void sdp_conn_table_clear(void) > * delete IOCB table > */ > sdp_main_iocb_cleanup(); > - /* > - * stop listening > - */ > - ib_destroy_cm_id(dev_root_s.listen_id); > } > Index: ulp/sdp/sdp_event.c > =================================================================== > --- ulp/sdp/sdp_event.c (revision 3295) > +++ ulp/sdp/sdp_event.c (working copy) > @@ -384,45 +384,46 @@ int sdp_cm_event_handler(struct ib_cm_id > struct sdp_sock *conn = NULL; > int result = 0; > > - sdp_dbg_ctrl(NULL, "CM state <%d> event <%d> commID <%08x> ID <%d>", > - cm_id->state, event->event, cm_id->local_id, hashent); > - /* > - * lookup the connection, on a REQ_RECV the sk will be empty. 
> - */ > - conn = sdp_conn_table_lookup(hashent); > - if (conn) > - sdp_conn_lock(conn); > - else > - if (cm_id->state != IB_CM_REQ_RCVD) { > - sdp_dbg_warn(NULL, > - "No conn <%d> CM state <%d> event <%d>", > - hashent, cm_id->state, event->event); > + sdp_dbg_ctrl(NULL, "event <%d> commID <%08x> ID <%d>", > + event->event, cm_id->local_id, hashent); > + > + if (event->event != IB_CM_REQ_RECEIVED) { > + conn = sdp_conn_table_lookup(hashent); > + if (conn) > + sdp_conn_lock(conn); > + else > return -EINVAL; > - } > + /* Can this fail? Why not just set context = conn? */ > + } Regarding the question: the first thing we do in sdp_conn_put is sdp_conn_table_remove, so I think this lookup can fail. > - switch (cm_id->state) { > - case IB_CM_REQ_RCVD: > + switch (event->event) { > + case IB_CM_REQ_RECEIVED: > result = sdp_cm_req_handler(cm_id, event); > break; > - case IB_CM_REP_RCVD: > + case IB_CM_REP_RECEIVED: > result = sdp_cm_rep_handler(cm_id, event, conn); > break; > - case IB_CM_IDLE: > + case IB_CM_REQ_ERROR: > + case IB_CM_REP_ERROR: > + case IB_CM_REJ_RECEIVED: > + case IB_CM_TIMEWAIT_EXIT: > result = sdp_cm_idle(cm_id, event, conn); > break; > - case IB_CM_ESTABLISHED: > + case IB_CM_RTU_RECEIVED: > + case IB_CM_USER_ESTABLISHED: > result = sdp_cm_established(cm_id, event, conn); > break; > - case IB_CM_DREQ_RCVD: > + case IB_CM_DREQ_RECEIVED: > result = sdp_cm_dreq_rcvd(cm_id, event, conn); > if (result) > break; > /* fall through on success to handle state transition */ > - case IB_CM_TIMEWAIT: > + case IB_CM_DREQ_ERROR: > + case IB_CM_DREP_RECEIVED: > result = sdp_cm_timewait(cm_id, event, conn); > break; > default: > - sdp_dbg_warn(conn, "Unexpected CM state <%d>", cm_id->state); > + sdp_dbg_warn(conn, "Unhandled CM event <%d>", event->event); > result = -EINVAL; > } > /* This seems like more code. Could you please elaborate on why this is more correct?
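The sketch below makes the comparison in this question concrete by rewriting the event-to-handler mapping of the patched sdp_cm_event_handler() as a pure function. The enum values are local stand-ins for the IB_CM_* constants, not the real ib_cm.h definitions. The handler enum is hypothetical and only records which sdp_cm_* routine the patch would call.

```c
/* Local stand-ins for the IB_CM_* events named in the patch (not ib_cm.h). */
enum cm_event {
	REQ_RECEIVED, REP_RECEIVED,
	REQ_ERROR, REP_ERROR, REJ_RECEIVED, TIMEWAIT_EXIT,
	RTU_RECEIVED, USER_ESTABLISHED,
	DREQ_RECEIVED, DREQ_ERROR, DREP_RECEIVED,
	MRA_RECEIVED,	/* an event the patch's switch does not list */
};

/* Hypothetical labels for the sdp_cm_* routine each event reaches. */
enum sdp_handler {
	H_REQ,			/* sdp_cm_req_handler() */
	H_REP,			/* sdp_cm_rep_handler() */
	H_IDLE,			/* sdp_cm_idle() */
	H_ESTABLISHED,		/* sdp_cm_established() */
	H_DREQ_THEN_TIMEWAIT,	/* sdp_cm_dreq_rcvd(), then sdp_cm_timewait() on success */
	H_TIMEWAIT,		/* sdp_cm_timewait() */
	H_UNHANDLED,		/* default: warning and -EINVAL */
};

enum sdp_handler sdp_dispatch(enum cm_event ev)
{
	switch (ev) {
	case REQ_RECEIVED:
		return H_REQ;
	case REP_RECEIVED:
		return H_REP;
	case REQ_ERROR:
	case REP_ERROR:
	case REJ_RECEIVED:
	case TIMEWAIT_EXIT:
		return H_IDLE;
	case RTU_RECEIVED:
	case USER_ESTABLISHED:
		return H_ESTABLISHED;
	case DREQ_RECEIVED:
		return H_DREQ_THEN_TIMEWAIT;
	case DREQ_ERROR:
	case DREP_RECEIVED:
		return H_TIMEWAIT;
	default:
		return H_UNHANDLED;
	}
}
```

In this form, several distinct events (for example REJ_RECEIVED and TIMEWAIT_EXIT) reach sdp_cm_idle(), where the old code keyed on the single IB_CM_IDLE state. Whether that mapping is more reliable is the point under discussion.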
> Index: ulp/sdp/sdp_dev.h > =================================================================== > --- ulp/sdp/sdp_dev.h (revision 3295) > +++ ulp/sdp/sdp_dev.h (working copy) > @@ -161,6 +161,7 @@ struct sdev_hca { > u32 r_key; /* remote key */ > struct ib_fmr_pool *fmr_pool; /* fast memory for Zcopy */ > struct list_head port_list; /* ports on this HCA */ > + struct ib_cm_id *listen_id; > }; > > struct sdev_root { > @@ -194,10 +195,6 @@ struct sdev_root { > spinlock_t bind_lock; > spinlock_t sock_lock; > spinlock_t listen_lock; > - /* > - * SDP wide listen > - */ > - struct ib_cm_id *listen_id; /* listen handle */ > }; > > #endif /* _SDP_DEV_H */ > Thanks, MST -- MST From mst at mellanox.co.il Wed Sep 7 06:29:53 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Sep 2005 16:29:53 +0300 Subject: [openib-general] mthca: qp->wait uninitialized? Message-ID: <20050907132953.GB19358@mellanox.co.il> ----- Forwarded message from Leonid Keller ----- From: Leonid Keller Subject: a bug or feature ? i failed to find init if the wait queue of qp (init_waitqueue_head(qp>wait)).
but there exist wake_up(qp->wait) and wait_on(qp->wait) - see mthca_qp.c ----- End forwarded message ----- Roland, does the following make sense? qp->wait does not seem to be initialized anywhere. Signed-off-by: Michael S. Tsirkin Index: linux-kernel/drivers/infiniband/hw/mthca/mthca_qp.c =================================================================== --- linux-kernel.orig/drivers/infiniband/hw/mthca/mthca_qp.c 2005-09-07 16:20:18.025526000 +0300 +++ linux-kernel/drivers/infiniband/hw/mthca/mthca_qp.c 2005-09-07 16:20:37.435068000 +0300 @@ -1024,6 +1024,7 @@ static int mthca_alloc_qp_common(struct int i; atomic_set(&qp->refcount, 1); + init_waitqueue_head(&qp->wait); qp->state = IB_QPS_RESET; qp->atomic_rd_en = 0; qp->resp_depth = 0; -- MST From halr at voltaire.com Wed Sep 7 06:52:27 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 07 Sep 2005 09:52:27 -0400 Subject: [openib-general] Some OpenSM 1.8.0 Anomalies Message-ID: <1126101147.4396.1405.camel@hal.voltaire.com> Hi Eitan, I see the following message (not necessarily new to 1.8.0): Sep 06 15:42:28 743720 [B76A4C40] -> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state IB_SMINFO_STATE_DISCOVERING. Is this really an error ? Also, I see the following messages although these should not be the case: Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007 ... Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559 Also, should these 2 (slightly) different error messages have the same error number ? 
-- Hal From jlentini at netapp.com Wed Sep 7 07:23:28 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 7 Sep 2005 10:23:28 -0400 (EDT) Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: <20050907075740.GT19358@mellanox.co.il> References: <20050906144344.GO19358@mellanox.co.il> <20050906174135.GB30610@mellanox.co.il> <20050907075740.GT19358@mellanox.co.il> Message-ID: On Wed, 7 Sep 2005, Michael S. Tsirkin wrote: > Quoting r. James Lentini : > > Subject: Re: [mstflint] firmware upgrade instructions > > > > > > > > On Tue, 6 Sep 2005, Michael S. Tsirkin wrote: > > > > > MT_0030000001 is MHXL-CF128-T (Previously: MTLP23108-CF128) > > > > How would on determine this? Is there a tool or web page that does the > > mapping? > > > > We plan to put out a web page to help with this soonish. Thanks. Any ideas on why my burn failed? From mst at mellanox.co.il Wed Sep 7 07:36:47 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Sep 2005 17:36:47 +0300 Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: References: <20050906144344.GO19358@mellanox.co.il> <20050906174135.GB30610@mellanox.co.il> <20050907075740.GT19358@mellanox.co.il> Message-ID: <20050907143647.GC19358@mellanox.co.il> Quoting r. James Lentini : > Subject: Re: [mstflint] firmware upgrade instructions > > > > On Wed, 7 Sep 2005, Michael S. Tsirkin wrote: > > > Quoting r. James Lentini : > > > Subject: Re: [mstflint] firmware upgrade instructions > > > > > > > > > > > > On Tue, 6 Sep 2005, Michael S. Tsirkin wrote: > > > > > > > MT_0030000001 is MHXL-CF128-T (Previously: MTLP23108-CF128) > > > > > > How would on determine this? Is there a tool or web page that does the > > > mapping? > > > > > > > We plan to put out a web page to help with this soonish. > > Thanks. > > Any ideas on why my burn failed? > For now, just give it the -nofs flag. 
-- MST From jlentini at netapp.com Wed Sep 7 08:00:32 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 7 Sep 2005 11:00:32 -0400 (EDT) Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: <20050907143647.GC19358@mellanox.co.il> References: <20050906144344.GO19358@mellanox.co.il> <20050906174135.GB30610@mellanox.co.il> <20050907075740.GT19358@mellanox.co.il> <20050907143647.GC19358@mellanox.co.il> Message-ID: On Wed, 7 Sep 2005, Michael S. Tsirkin wrote: > > Any ideas on why my burn failed? > > For now, just give it the -nofs flag. Is the failure without the -nofs flag expected? If there is a bug in one of the burn utilities, I can wait until it is fixed. I don't want to toast my cards. From mst at mellanox.co.il Wed Sep 7 08:17:02 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Sep 2005 18:17:02 +0300 Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: References: <20050906144344.GO19358@mellanox.co.il> <20050906174135.GB30610@mellanox.co.il> <20050907075740.GT19358@mellanox.co.il> <20050907143647.GC19358@mellanox.co.il> Message-ID: <20050907151702.GD19358@mellanox.co.il> Quoting r. James Lentini : > > > Any ideas on why my burn failed? > > > > For now, just give it the -nofs flag. > > Is the failure without the -nofs flag expected? Yes. The new firmware comes with an updated first sector code, and flint currently forces you to use this updated version. Future versions of flint will support upgrading just main flash, skipping the first sector. > If there is a bug in > one of the burn utilities, I can wait until it is fixed. You can verify the image with mstflint -i v > I don't want > to toast my cards. -nofs just disables "failsafe" feature. You wont have problems unless you reset the card while burning is in progress. -- MST From mst at mellanox.co.il Wed Sep 7 08:49:32 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 7 Sep 2005 18:49:32 +0300 Subject: [openib-general] Re: [mstflint] firmware upgrade instructions In-Reply-To: References: <20050906144344.GO19358@mellanox.co.il> <20050906174135.GB30610@mellanox.co.il> <20050907075740.GT19358@mellanox.co.il> <20050907143647.GC19358@mellanox.co.il> Message-ID: <20050907154932.GA9911@mellanox.co.il> Quoting r. James Lentini : > I don't want > to toast my cards. You can also read the flash content into a file with "ri" command: ./mstflint -d /sys/class/infiniband/mthca0/device/resource0 ri ~/oldimage.bin -- MST From rolandd at cisco.com Wed Sep 7 09:41:07 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 07 Sep 2005 09:41:07 -0700 Subject: [openib-general] Re: mthca: qp->wait uninitialized? In-Reply-To: <20050907132953.GB19358@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 7 Sep 2005 16:29:53 +0300") References: <20050907132953.GB19358@mellanox.co.il> Message-ID: <52u0gwomos.fsf@cisco.com> Yes, I think this is correct. I guess we never got in trouble because qp->refcount was always 0 by the time we reached the wait_event(). Good catch. - R. From mshefty at ichips.intel.com Wed Sep 7 09:44:54 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 07 Sep 2005 09:44:54 -0700 Subject: [openib-general] Re: [PATCH] [CM] 3/6 SDP updates to bind cm_id's to a device In-Reply-To: <20050907091414.GV19358@mellanox.co.il> References: <20050907091414.GV19358@mellanox.co.il> Message-ID: <431F1906.1050301@ichips.intel.com> Michael S. Tsirkin wrote: > Could you please elaborate on why is an event driven model more reliable than > the state driven one? > It certainly seems to require more code: isnt cm_id->state set by CM to > a valid value? It seems that cm needs to track connection state > anyway as per "12.9.2 Invalid State Input Handling", to know which messages > are legal. Is this done? If so why isnt it a good idea to reuse that state > in SDP? 
The state of the cm_id is controlled by the CM and can change at any time as a result of processing a received MAD. It's only exposed for debug purposes. >>+ result = ib_cm_listen(hca->listen_id, >>+ cpu_to_be64(SDP_MSG_SERVICE_ID_VALUE), >>+ cpu_to_be64(SDP_MSG_SERVICE_ID_MASK)); >>+ if (result) { >>+ sdp_warn("Error <%d> listening for SDP connections", result); >>+ goto error; >>+ } >>+ >> ib_set_client_data(device, &sdp_client, hca); >> >> return; > > > Can a listen event arrive before we call ib_set_client_data? Yes. We may need to swap those two statements. >> error: >>+ if (!IS_ERR(hca->listen_id)) >>+ ib_destroy_cm_id(hca->listen_id); >>+ >> list_for_each_entry_safe(port, tmp, &hca->port_list, list) { >> list_del(&port->list); >> kfree(port); >>@@ -1838,6 +1856,9 @@ static void sdp_device_remove_one(struct >> return; >> } >> >>+ if (!IS_ERR(hca->listen_id)) >>+ ib_destroy_cm_id(hca->listen_id); >>+ >> list_for_each_entry_safe(port, tmp, &hca->port_list, list) { >> list_del(&port->list); >> kfree(port); > > > I'd prefer separate labels for two errors, instead of testing IS_ERR twice. So do I, but I was trying to match the coding style used throughout the SDP code. Fixing the error handling seems like another set of changes to me. >>+ if (event->event != IB_CM_REQ_RECEIVED) { >>+ conn = sdp_conn_table_lookup(hashent); >>+ if (conn) >>+ sdp_conn_lock(conn); >>+ else >> return -EINVAL; >>- } >>+ /* Can this fail? Why not just set context = conn? */ >>+ } > > > Regarding the question: the first thing we do in sdp_conn_put > is sdp_conn_table_remove, so I think this lookup can fail. I should have removed that comment. I'm not overly familiar with the SDP code, but it seems wrong to need to verify that a requested callback can be executed. By returning -EINVAL above, the cm_id associated with the callback will be destroyed. 
This indicates to me that either that cm_id will be destroyed twice by SDP (once when the SDP conn object is destroyed and again here), or that SDP does not usually destroy cm_id's as part of its normal cleanup. I'm also wondering about possible race conditions that could occur between calling table_lookup and conn_lock, but I don't know the code well enough to say if one exists. >>- switch (cm_id->state) { >>- case IB_CM_REQ_RCVD: >>+ switch (event->event) { >>+ case IB_CM_REQ_RECEIVED: >> result = sdp_cm_req_handler(cm_id, event); >> break; >>- case IB_CM_REP_RCVD: >>+ case IB_CM_REP_RECEIVED: >> result = sdp_cm_rep_handler(cm_id, event, conn); >> break; >>- case IB_CM_IDLE: >>+ case IB_CM_REQ_ERROR: >>+ case IB_CM_REP_ERROR: >>+ case IB_CM_REJ_RECEIVED: >>+ case IB_CM_TIMEWAIT_EXIT: >> result = sdp_cm_idle(cm_id, event, conn); >> break; >>- case IB_CM_ESTABLISHED: >>+ case IB_CM_RTU_RECEIVED: >>+ case IB_CM_USER_ESTABLISHED: >> result = sdp_cm_established(cm_id, event, conn); >> break; >>- case IB_CM_DREQ_RCVD: >>+ case IB_CM_DREQ_RECEIVED: >> result = sdp_cm_dreq_rcvd(cm_id, event, conn); >> if (result) >> break; >> /* fall through on success to handle state transition */ >>- case IB_CM_TIMEWAIT: >>+ case IB_CM_DREQ_ERROR: >>+ case IB_CM_DREP_RECEIVED: >> result = sdp_cm_timewait(cm_id, event, conn); >> break; >> default: >>- sdp_dbg_warn(conn, "Unexpected CM state <%d>", cm_id->state); >>+ sdp_dbg_warn(conn, "Unhandled CM event <%d>", event->event); >> result = -EINVAL; >> } >> /* > > Seems more code. Could you please elaborate on why is this more correct? See comment above. Using the cm_id state is incorrect as it can change while a client's callback is running. 
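The distinction can be sketched in userspace C. The enum values and counters below are hypothetical stand-ins modelled loosely on the SDP patch above. They are not the real ib_cm.h definitions. The handler branches only on the event code passed into the callback, which stays fixed for the duration of the call. It never reads a state field that the CM may update while the callback runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical subset of CM event codes (modelled on, not copied from, ib_cm.h). */
enum cm_event_type {
	CM_REQ_RECEIVED,
	CM_REP_RECEIVED,
	CM_REJ_RECEIVED,
	CM_RTU_RECEIVED,
	CM_DREQ_RECEIVED,
	CM_DREP_RECEIVED,
	CM_TIMEWAIT_EXIT,
};

/* Hypothetical per-connection counters standing in for the real SDP handlers. */
struct conn_trace {
	int req, rep, idle, established, dreq, timewait;
};

/*
 * Dispatch on the event handed to this callback.  The event code is a value
 * passed in by the caller, so it cannot change while we run, unlike a state
 * field that the CM may update from another context.
 */
static int cm_event_dispatch(struct conn_trace *t, enum cm_event_type ev)
{
	int result = 0;

	assert(t != NULL);

	switch (ev) {
	case CM_REQ_RECEIVED:
		t->req++;
		break;
	case CM_REP_RECEIVED:
		t->rep++;
		break;
	case CM_REJ_RECEIVED:
	case CM_TIMEWAIT_EXIT:
		t->idle++;
		break;
	case CM_RTU_RECEIVED:
		t->established++;
		break;
	case CM_DREQ_RECEIVED:
		t->dreq++;
		/* fall through - as in the patch, a handled DREQ continues to timewait */
	case CM_DREP_RECEIVED:
		t->timewait++;
		break;
	default:
		fprintf(stderr, "Unhandled CM event <%d>\n", (int)ev);
		result = -EINVAL;
	}
	return result;
}
```

Because the event is a value owned by this particular callback invocation, the switch cannot select one branch and then see a different state partway through. That is the race described above.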
- Sean From jlentini at netapp.com Wed Sep 7 09:55:27 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 7 Sep 2005 12:55:27 -0400 (EDT) Subject: [openib-general] Re: RDMA Generic Connection Management In-Reply-To: <52y86f407y.fsf@cisco.com> References: <521x4bjqls.fsf@cisco.com> <6.2.3.4.2.20050830135332.064b9030@exnane01.nane.netapp.com> <521x4bi9su.fsf@cisco.com> <6.2.3.4.2.20050830140954.060c9030@exnane01.nane.netapp.com> <52k6i3gukm.fsf@cisco.com> <6.2.3.4.2.20050830142125.063e1cc0@exnane01.nane.netapp.com> <527je3gtmq.fsf@cisco.com> <6.2.3.4.2.20050830145906.05104890@exnane01.nane.netapp.com> <523borgs4r.fsf@cisco.com> <52zmqyeq58.fsf@cisco.com> <52r7cadqsy.fsf@cisco.com> <52y86f407y.fsf@cisco.com> Message-ID: On Fri, 2 Sep 2005, Roland Dreier wrote: > Roland> Yes, but what is the generic way? > > James> The generic way would be to handle this in a common > James> layer. For the IB verbs + RDMA connection API to be as easy > James> to use as the sockets API, then it needs to make this issue > James> transparent. > > I don't think the kernel design philosophy is to hide these sorts of > object lifetime issues from consumers. I disagree. The kernel Sockets API makes these issues transparent. The interactions between the PPPoE driver and the Sockets layer are a nice example. The PPPoE driver registers a callback for network device events by calling register_netdevice_notifier(), which is similar to our ib_register_client() function. When a hotplug event occurs, the pppoe_flush_dev() function is called. This function cleans up all sockets on the device that is being removed. For each socket, this function obtains a lock on the socket via lock_sock(). The pppoe_sendmsg() function (which is invoked by net/socket.c's kernel_sendmsg() function for PPPoE sockets) also obtains this lock via lock_sock(). Therefore the socket lock ensures that a consumer can send a message without worrying about hotplug events.
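The locking argument above can be sketched in userspace C. A pthread mutex stands in for lock_sock(), and every name here is hypothetical; none of it is PPPoE or IB code. The send path and the device-removal path take the same per-socket lock. As a result, a send either completes before removal or sees the device as gone. It never runs against a half-removed device.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-socket object; the mutex plays the role of lock_sock(). */
struct demo_sock {
	pthread_mutex_t lock;
	bool dev_present;	/* cleared by the removal path */
	int sent;		/* messages handed to the device */
};

static int demo_sock_init(struct demo_sock *sk)
{
	sk->dev_present = true;
	sk->sent = 0;
	return pthread_mutex_init(&sk->lock, NULL);
}

/* Send path: check for and use the device under the socket lock. */
static int demo_sendmsg(struct demo_sock *sk)
{
	int ret = 0;

	pthread_mutex_lock(&sk->lock);
	if (!sk->dev_present)
		ret = -ENODEV;
	else
		sk->sent++;
	pthread_mutex_unlock(&sk->lock);
	return ret;
}

/* Hotplug-removal path: detach the device under the same lock. */
static void demo_flush_dev(struct demo_sock *sk)
{
	pthread_mutex_lock(&sk->lock);
	sk->dev_present = false;
	pthread_mutex_unlock(&sk->lock);
}
```

The only design point is that the check and the use happen under one lock. Whether that lock belongs in each ULP or in a common RDMA layer is the question under debate in this thread.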
> And my real question is how it's even possible to handle this > efficiently in a generic layer. > > If you want consumers to be able to ignore hotplug, then the generic > layer needs to handle device removal even in the middle of fast path > work request posting operations. And I don't see how to do that > without changing to reference counted handles (from the current scheme > of directly using pointers). And that's going to have a serious > performance impact that I don't think is worth it. I agree that if hotplug is hidden inside the common RDMA layer, additional locking will be necessary. I think the performance impact will vary depending on the ULP. Some ULPs (SRP) connect into layers (SCSI) that are designed for hotplug. For this class of ULPs, hiding hotplug events inside the common RDMA layer is unnecessary. There is another class of ULPs (NFS RPC) that currently use the Sockets API but want to start using RDMA. For these ULPs to use the common RDMA layer they need additional locking to deal with hotplug. The locking could be done in a common layer or in each ULP. I expect both these options to have the same performance. However, consolidating the locking into a common layer will benefit correctness, code reuse, and ease the task of moving ULPs from the sockets API to RDMA. From mst at mellanox.co.il Wed Sep 7 10:01:55 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Sep 2005 20:01:55 +0300 Subject: [openib-general] Re: [PATCH] [CM] 3/6 SDP updates to bind cm_id's to a device In-Reply-To: <431F1906.1050301@ichips.intel.com> References: <20050907091414.GV19358@mellanox.co.il> <431F1906.1050301@ichips.intel.com> Message-ID: <20050907170155.GA10137@mellanox.co.il> Quoting Sean Hefty : > The state of the cm_id is controlled by the CM and can change at any time > as a result of processing a received MAD. I see. Lets hide this field then. At least, this warrants a comment in the header file. > It's only exposed for debug purposes. 
I'd say you cant usefully debug with something that changes at any time, anyway. Let's just have a compile time flag for dumping cm traffic to syslog. Makes sense? [...] > >I'd prefer separate labels for two errors, instead of testing IS_ERR twice. > > So do I, but I was trying to match the coding style used throughout the SDP > code. Fixing the error handling seems like another set of changes to me. Lets do it correctly in this function, no need to add more cleanup work. > >Regarding the question: the first thing we do in sdp_conn_put > >is sdp_conn_table_remove, so I think this lookup can fail. > > I should have removed that comment. I'm not overly familiar with the SDP > code, but it seems wrong to need to verify that a requested callback can be > executed. By returning -EINVAL above, the cm_id associated with the > callback will be destroyed. This indicates to me that either that cm_id > will be destroyed twice by SDP (once when the SDP conn object is destroyed > and again here), or that SDP does not usually destroy cm_id's as part of > its normal cleanup. Hmm. Got to think about it. > I'm also wondering about possible race conditions that > could occur between calling table_lookup and conn_lock, but I don't know > the code well enough to say if one exists. No, conn_lock does not handle connection reference counting, it only protects against two threads running on a connection. -- MST From rolandd at cisco.com Wed Sep 7 10:01:38 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 07 Sep 2005 10:01:38 -0700 Subject: [openib-general] RMPP fixes for 2.6.14 Message-ID: <52psrkolql.fsf@cisco.com> I found this RMPP difference between the current kernel and our subversion tree. Is there anything else that needs to be merged for the kernel 2.6.14 tree? - R. 
--- old/drivers/infiniband/core/mad_rmpp.c 2005-09-07 09:48:48.232278654 -0700 +++ new/drivers/infiniband/core/mad_rmpp.c 2005-08-30 20:26:41.989894000 -0700 @@ -593,7 +593,8 @@ rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(mad_send_wr->total_seg * (sizeof(struct ib_rmpp_mad) - - offsetof(struct ib_rmpp_mad, data))); + offsetof(struct ib_rmpp_mad, data)) - + mad_send_wr->pad); mad_send_wr->sg_list[0].length = sizeof(struct ib_rmpp_mad); } else { mad_send_wr->send_wr.num_sge = 2; @@ -602,6 +603,7 @@ mad_send_wr->sg_list[1].length = sizeof(struct ib_rmpp_mad) - mad_send_wr->data_offset; mad_send_wr->sg_list[1].lkey = mad_send_wr->sg_list[0].lkey; + rmpp_mad->rmpp_hdr.paylen_newwin = 0; } if (mad_send_wr->seg_num == mad_send_wr->total_seg) { From eitan at mellanox.co.il Wed Sep 7 10:00:50 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 07 Sep 2005 20:00:50 +0300 Subject: [openib-general] Re: Some OpenSM 1.8.0 Anomalies In-Reply-To: <1126101147.4396.1405.camel@hal.voltaire.com> References: <1126101147.4396.1405.camel@hal.voltaire.com> Message-ID: <431F1CC2.9020104@mellanox.co.il> Hal Rosenstock wrote: > Hi Eitan, > > I see the following message (not necessarily new to 1.8.0): > Sep 06 15:42:28 743720 [B76A4C40] -> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_DISCOVER in state IB_SMINFO_STATE_DISCOVERING. > Is this really an error ? I will let Yael answer this - she wrote the sm state machine. > > Also, I see the following messages although these should not be the case: > Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007 This means that the LID of the port registered as the source for this inform info is not recognized as a valid LID. > ... 
> Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559 The meaning of this is that the incoming trap source is not a recognized (included in the SM database) guid > > Also, should these 2 (slightly) different error messages have the same > error number ? You are right. The errors should be different. > > -- Hal From mst at mellanox.co.il Wed Sep 7 10:07:19 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Sep 2005 20:07:19 +0300 Subject: [openib-general] Re: RMPP fixes for 2.6.14 In-Reply-To: <52psrkolql.fsf@cisco.com> References: <52psrkolql.fsf@cisco.com> Message-ID: <20050907170718.GC10137@mellanox.co.il> Quoting r. Roland Dreier : > Subject: RMPP fixes for 2.6.14 > > I found this RMPP difference between the current kernel and our > subversion tree. What should I be looking at? Linus's git? > Is there anything else that needs to be merged for > the kernel 2.6.14 tree? > > - R. The qp->wait init patch :) -- MST From kingman at storagegear.com Wed Sep 7 10:09:23 2005 From: kingman at storagegear.com (John Kingman) Date: Wed, 7 Sep 2005 12:09:23 -0500 (CDT) Subject: [openib-general] [PATCH] [CM] 1/2 Fix CM redirection Message-ID: I found that CM handling for SRP is broken when handling a REJ with reason 24 (Port and CM Redirection) with a RedirectLID supplied. As stated in the spec, if RedirectLID is non-zero, it is the DLID a requester _shall_ use to access the class services. I believe that without this support, CM does not comply with C13-28 with respect to RedirectLID. In my testing, the following patches seem to fix the problem. If there is a better way to fix the problem, I would appreciate the input. Summary of patches: Part 1/2 include/rdma/ib_cm.h: A new field is added to struct ib_cm_id (redirect_qpn) to hold the RedirectQP number returned in the reject message. 
core/cm.c: Is modified to use redirect_qpn, if it is non-zero, when calling ib_create_send_mad, instead of QP1. Part 2/2 ulp/srp/ib_srp.h: A new target status is added (SRP_DLID_REDIRECT). ulp/srp/ib_srp.c: Is modified to check for RedirectLID in ClassPortInfo when it receives a reject with reason IB_CM_REJ_PORT_CM_REDIRECT (reason 24) and to save the redirect information (RedirectLID, RedirectP_Key, and RedirectQP) and set SRP_DLID_REDIRECT target status. If target status on completion of srp_send_req is SRP_DLID_REDIRECT, the send request is retried with the new redirect information. Signed-off-by: John Kingman storagegear.com> Index: ib_cm.h =================================================================== --- ib_cm.h (revision 3328) +++ ib_cm.h (working copy) @@ -290,6 +290,7 @@ struct ib_cm_id { enum ib_cm_lap_state lap_state; /* internal CM/debug use */ __be32 local_id; __be32 remote_id; + u32 redirect_qpn; }; /** Index: cm.c =================================================================== --- cm.c (revision 3328) +++ cm.c (working copy) @@ -167,13 +167,14 @@ static int cm_alloc_msg(struct cm_id_pri struct ib_mad_agent *mad_agent; struct ib_mad_send_buf *m; struct ib_ah *ah; + u32 qpn = cm_id_priv->id.redirect_qpn? cm_id_priv->id.redirect_qpn: 1; mad_agent = cm_id_priv->av.port->mad_agent; ah = ib_create_ah(mad_agent->qp->pd, &cm_id_priv->av.ah_attr); if (IS_ERR(ah)) return PTR_ERR(ah); - m = ib_create_send_mad(mad_agent, 1, cm_id_priv->av.pkey_index, + m = ib_create_send_mad(mad_agent, qpn, cm_id_priv->av.pkey_index, ah, 0, sizeof(struct ib_mad_hdr), sizeof(struct ib_mad)-sizeof(struct ib_mad_hdr), GFP_ATOMIC); From kingman at storagegear.com Wed Sep 7 10:09:36 2005 From: kingman at storagegear.com (John Kingman) Date: Wed, 7 Sep 2005 12:09:36 -0500 (CDT) Subject: [openib-general] [PATCH] [CM] 2/2 Fix CM redirection in SRP Message-ID: SRP changes to handle IB_CM_REJ_PORT_CM_REDIRECT with RedirectLID. 
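The following sketch decodes the redirect fields that this patch reads out of the REJ's Additional Reject Info. It assumes the ARI holds a ClassPortInfo laid out as the patch comment describes: RedirectGID at byte 8, RedirectLID at 28, RedirectP_Key at 30, and RedirectQP in the low 24 bits of the big-endian word at byte 32. The struct and function names are hypothetical. Decoding byte by byte avoids the unaligned pointer casts in the patch and yields host-order values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-order view of the redirect part of ClassPortInfo. */
struct redirect_info {
	uint8_t  gid[16];	/* RedirectGID */
	uint16_t lid;		/* RedirectLID; 0 means "resolve the GID via SA" */
	uint16_t pkey;		/* RedirectP_Key */
	uint32_t qpn;		/* RedirectQP (24 bits) */
};

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode from a buffer holding at least the first 36 bytes of ClassPortInfo. */
static void parse_redirect(const uint8_t *ari, struct redirect_info *ri)
{
	assert(ari != NULL && ri != NULL);

	memcpy(ri->gid, ari + 8, sizeof(ri->gid));
	ri->lid  = get_be16(ari + 28);
	ri->pkey = get_be16(ari + 30);
	/* The top byte of the word at offset 32 is reserved; the QPN is the low 24 bits. */
	ri->qpn  = get_be32(ari + 32) & 0x00ffffff;
}
```

A caller would then test `ri.lid != 0` to choose between retrying the REQ directly (SRP_DLID_REDIRECT) and issuing a new path query for the RedirectGID (SRP_PORT_REDIRECT), as the patch does.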
Signed-off-by: John Kingman storagegear.com> Index: ib_srp.h =================================================================== --- ib_srp.h (revision 3328) +++ ib_srp.h (working copy) @@ -49,6 +49,7 @@ enum { SRP_PORT_REDIRECT = 1, + SRP_DLID_REDIRECT = 2, SRP_MAX_IU_LEN = 256, Index: ib_srp.c =================================================================== --- ib_srp.c (revision 3328) +++ ib_srp.c (working copy) @@ -738,15 +738,38 @@ static int srp_cm_handler(struct ib_cm_i comp = 1; if (event->param.rej_rcvd.reason == IB_CM_REJ_PORT_CM_REDIRECT) { - /* - * Additional Reject Info contains - * ClassPortInfo, which has the RedirectGID - * field at an offset of 8 bytes. - */ - memcpy(target->path.dgid.raw, - event->param.rej_rcvd.ari + 8, 16); + /* + * Additional Reject Info contains ClassPortInfo, which has + * the RedirectGID field at an offset of 8 bytes, + * the RedirectLID field at an offset of 28 bytes, + * the RedirectP_Key field at an offset of 30 bytes, and + * the RedirectQP field at an offset of 33 bytes. + */ + target->path.dlid = *(__be16 *)(event->param.rej_rcvd.ari + 28); + target->path.pkey = *(__be16 *)(event->param.rej_rcvd.ari + 30); + cm_id->redirect_qpn = + be32_to_cpu(*(u32 *)(event->param.rej_rcvd.ari + 32)) + & 0x00ffffff; + if (target->path.dlid) { + /* + * If RedirectLID is non-zero, it is the DLID a + * requester shall use to access the class services. + */ + target->status = SRP_DLID_REDIRECT; + } else { + /* + * If the RedirectLID value is zero, the redirect + * requires the requester to use the supplied + * RedirectGID to request further path resolution + * from subnet administration. 
+ */ + memcpy(target->path.dgid.raw, + event->param.rej_rcvd.ari + 8, 16); - target->status = SRP_PORT_REDIRECT; + target->status = SRP_PORT_REDIRECT; + } } else if (topspin_workarounds && !memcmp(&target->ioc_guid, topspin_oui, 3) && event->param.rej_rcvd.reason == IB_CM_REJ_PORT_REDIRECT) { @@ -1230,6 +1253,7 @@ retry_path: goto err; } +retry_send: init_completion(&target->done); ret = srp_send_req(target); if (ret) @@ -1240,9 +1264,12 @@ retry_path: /* * The CM event handling code will set status to * SRP_PORT_REDIRECT if we get a port redirect REJ back. + * or SRP_DLID_REDIRECT if we get a lid/qp redirect REJ back. */ if (target->status == SRP_PORT_REDIRECT) goto retry_path; + else if (target->status == SRP_DLID_REDIRECT) + goto retry_send; else if (target->status < 0) { printk(KERN_ERR PFX "Connection failed\n"); ret = target->status; From rolandd at cisco.com Wed Sep 7 10:13:51 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 07 Sep 2005 10:13:51 -0700 Subject: [openib-general] Re: RMPP fixes for 2.6.14 In-Reply-To: <20050907170718.GC10137@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 7 Sep 2005 20:07:19 +0300") References: <52psrkolql.fsf@cisco.com> <20050907170718.GC10137@mellanox.co.il> Message-ID: <52ll28ol68.fsf@cisco.com> Michael> What should I be looking at? Linus's git? You can look at my git: http://www.kernel.org/git/?p=linux/kernel/git/roland/infiniband.git;a=summary I just pushed a few more things, so it will take a few more minutes to propagate to all the mirrors. In my previous email, I was a little unclear. I was just asking for more RMPP changes specifically, since I know there's something to merge there. But letting me know about anything that I'm missing would be good. Michael> The qp->wait init patch :) Yes, that's in there. I have the following in my git tree on top of what's already in Linus's tree: Michael S. 
Tsirkin: IPoIB: fix memory leak IB/sa_query: avoid unnecessary list scan IB: Initialize qp->wait Roland Dreier: IB: really reset QPs Sean Hefty: IB: Add user-supplied context to userspace CM ABI - R. From mshefty at ichips.intel.com Wed Sep 7 10:13:59 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 07 Sep 2005 10:13:59 -0700 Subject: [openib-general] Re: [PATCH] [CM] 3/6 SDP updates to bind cm_id's to a device In-Reply-To: <20050907170155.GA10137@mellanox.co.il> References: <20050907091414.GV19358@mellanox.co.il> <431F1906.1050301@ichips.intel.com> <20050907170155.GA10137@mellanox.co.il> Message-ID: <431F1FD7.9050808@ichips.intel.com> Michael S. Tsirkin wrote: >>The state of the cm_id is controlled by the CM and can change at any time >>as a result of processing a received MAD. > > I see. Lets hide this field then. > At least, this warrants a comment in the header file. In ib_cm.h: enum ib_cm_state state; /* internal CM/debug use */ enum ib_cm_lap_state lap_state; /* internal CM/debug use */ > I'd say you cant usefully debug with something that changes at any time, anyway. > Let's just have a compile time flag for dumping cm traffic to syslog. > Makes sense? I use madeye to dump CM MAD traffic. I can use that to verify that cm_id states are correct based on which messages have been received. I did try to make these states internal to the CM a few months ago, but did not want to remove them from the API until all clients no longer access them. I believe that once SDP is fixed, these can be moved into the internal structure to avoid this issue in the future. - Sean From rolandd at cisco.com Wed Sep 7 10:15:53 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 07 Sep 2005 10:15:53 -0700 Subject: [openib-general] [PATCH] [CM] 1/2 Fix CM redirection In-Reply-To: (John Kingman's message of "Wed, 7 Sep 2005 12:09:23 -0500 (CDT)") References: Message-ID: <52hdcwol2u.fsf@cisco.com> a very very minor nit: > + u32 qpn = cm_id_priv->id.redirect_qpn? 
cm_id_priv->id.redirect_qpn: 1; Please put a space before the '?' - R. From rolandd at cisco.com Wed Sep 7 10:21:22 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 07 Sep 2005 10:21:22 -0700 Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP In-Reply-To: (John Kingman's message of "Wed, 7 Sep 2005 12:09:36 -0500 (CDT)") References: Message-ID: <52d5nkoktp.fsf@cisco.com> Thanks, looks pretty good. Have you tested with an SRP target that actually sends back a redirected CM LID? Does this mean you are developing such a target? ;) One question on the CM interface: > + cm_id->redirect_qpn = > + be32_to_cpu(*(u32 *)(event->param.rej_rcvd.ari + 32)) > + & 0x00ffffff; It seems a little awkward that a consumer has to poke a value back into the cm_id structure. Sean, how do you want to handle this? - R. From mst at mellanox.co.il Wed Sep 7 10:22:30 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 7 Sep 2005 20:22:30 +0300 Subject: [openib-general] Re: Re: [PATCH] [CM] 3/6 SDP updates to bind cm_id's to a device In-Reply-To: <431F1FD7.9050808@ichips.intel.com> References: <20050907091414.GV19358@mellanox.co.il> <431F1906.1050301@ichips.intel.com> <20050907170155.GA10137@mellanox.co.il> <431F1FD7.9050808@ichips.intel.com> Message-ID: <20050907172230.GD10137@mellanox.co.il> Quoting Sean Hefty : > >At least, this warrants a comment in the header file. > > In ib_cm.h: > > enum ib_cm_state state; /* internal CM/debug use */ > enum ib_cm_lap_state lap_state; /* internal CM/debug use */ Doh. Should have checked my facts better before posting. [...] > I believe that once SDP is fixed, these can be moved into the internal > structure to avoid this issue in the future. I agree. In fact, if you post a separate patch for just the cm state I'm ready to apply that right away. 
Thanks, -- MST From mshefty at ichips.intel.com Wed Sep 7 10:28:36 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 07 Sep 2005 10:28:36 -0700 Subject: [openib-general] [PATCH] [CM] 1/2 Fix CM redirection In-Reply-To: References: Message-ID: <431F2344.1090400@ichips.intel.com> John Kingman wrote: > I found that CM handling for SRP is broken when handling a REJ with > reason 24 (Port and CM Redirection) with a RedirectLID supplied. As > stated in the spec, if RedirectLID is non-zero, it is the DLID a > requester _shall_ use to access the class services. I believe that > without this support, CM does not comply with C13-28 with respect to > RedirectLID. > > In my testing, the following patches seem to fix the problem. If there > is a better way to fix the problem, I would appreciate the input. Thanks for the feedback. I didn't realize anyone was using redirection. Many months ago we had some discussion around supporting QP redirection. I don't think that we reached any conclusion on the best way to handle it. So, you are correct in that the current code base does not handle QP redirection. I'll need to spend some time thinking about what we may want to do here. Your CM patch seems good for handling a single connection, but won't result in automatically redirecting future connection requests to the redirected port. I'm not sure if QP redirection should be provided by a library or integrated with the MAD layer. - Sean From rolandd at cisco.com Wed Sep 7 10:33:17 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 07 Sep 2005 10:33:17 -0700 Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP In-Reply-To: (John Kingman's message of "Wed, 7 Sep 2005 12:09:36 -0500 (CDT)") References: Message-ID: <528xy8ok9u.fsf@cisco.com> By the way, the old code: - /* - * Additional Reject Info contains - * ClassPortInfo, which has the RedirectGID - * field at an offset of 8 bytes. 
- */
- memcpy(target->path.dgid.raw,
- event->param.rej_rcvd.ari + 8, 16);

was hokey enough, but this stuff:

+ /*
+ * Additional Reject Info contains ClassPortInfo, which has
+ * the RedirectGID field at an offset of 8 bytes,
+ * the RedirectLID field at an offset of 28 bytes,
+ * the RedirectP_Key field at an offset of 30 bytes, and
+ * the RedirectQP field at an offset of 33 bytes.
+ */
+ target->path.dlid = *(__be16 *)(event->param.rej_rcvd.ari + 28);
+ target->path.pkey = *(__be16 *)(event->param.rej_rcvd.ari + 30);
+ cm_id->redirect_qpn =
+ be32_to_cpu(*(u32 *)(event->param.rej_rcvd.ari + 32))
+ & 0x00ffffff;
+ if (target->path.dlid) {
+ /*
+ * If RedirectLID is non-zero, it is the DLID a
+ * requester shall use to access the class services.
+ */
+ target->status = SRP_DLID_REDIRECT;
+ } else {
+ /*
+ * If the RedirectLID value is zero, the redirect
+ * requires the requester to use the supplied
+ * RedirectGID to request further path resolution
+ * from subnet administration.
+ */
+ memcpy(target->path.dgid.raw,
+ event->param.rej_rcvd.ari + 8, 16);

seems to be screaming that we need some generic handling of ClassPortInfo for CM redirects.

- R.

From Thomas.Duffy.99 at alumni.brown.edu Wed Sep 7 10:35:39 2005
From: Thomas.Duffy.99 at alumni.brown.edu (Tom Duffy)
Date: Wed, 7 Sep 2005 10:35:39 -0700
Subject: [openib-general] Re: [PATCH] [CM] 3/6 SDP updates to bind cm_id's to a device
In-Reply-To: <431F1FD7.9050808@ichips.intel.com>
References: <20050907091414.GV19358@mellanox.co.il> <431F1906.1050301@ichips.intel.com> <20050907170155.GA10137@mellanox.co.il> <431F1FD7.9050808@ichips.intel.com>
Message-ID: <695231A3-DDD9-4BBD-809E-72DE2281644E@alumni.brown.edu>

On Sep 7, 2005, at 10:13 AM, Sean Hefty wrote:

> I use madeye to dump CM MAD traffic. I can use that to verify that
> cm_id states are correct based on which messages have been
> received.
> I did try to make these states internal to the CM a few
> months ago, but did not want to remove them from the API until all
> clients no longer access them. I believe that once SDP is fixed,
> these can be moved into the internal structure to avoid this issue
> in the future.

I remember this discussion. And thanks for taking the time to update SDP. Sounds like a good plan going forward.

-tduffy

From halr at voltaire.com Wed Sep 7 10:38:47 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 07 Sep 2005 13:38:47 -0400
Subject: [openib-general] RMPP fixes for 2.6.14
In-Reply-To: <52psrkolql.fsf@cisco.com>
References: <52psrkolql.fsf@cisco.com>
Message-ID: <1126114726.4401.153.camel@hal.voltaire.com>

On Wed, 2005-09-07 at 13:01, Roland Dreier wrote:
> I found this RMPP difference between the current kernel and our
> subversion tree. Is there anything else that needs to be merged for
> the kernel 2.6.14 tree?
>
> - R.
>
> --- old/drivers/infiniband/core/mad_rmpp.c 2005-09-07 09:48:48.232278654 -0700
> +++ new/drivers/infiniband/core/mad_rmpp.c 2005-08-30 20:26:41.989894000 -0700
> @@ -593,7 +593,8 @@
> rmpp_mad->rmpp_hdr.paylen_newwin =
> cpu_to_be32(mad_send_wr->total_seg *
> (sizeof(struct ib_rmpp_mad) -
> - offsetof(struct ib_rmpp_mad, data)));
> + offsetof(struct ib_rmpp_mad, data)) -
> + mad_send_wr->pad);
> mad_send_wr->sg_list[0].length = sizeof(struct ib_rmpp_mad);
> } else {
> mad_send_wr->send_wr.num_sge = 2;
> @@ -602,6 +603,7 @@
> mad_send_wr->sg_list[1].length = sizeof(struct ib_rmpp_mad) -
> mad_send_wr->data_offset;
> mad_send_wr->sg_list[1].lkey = mad_send_wr->sg_list[0].lkey;
> + rmpp_mad->rmpp_hdr.paylen_newwin = 0;
> }
>
> if (mad_send_wr->seg_num == mad_send_wr->total_seg) {

Yes, that looks like the RMPP change since last pushed upstream. I will see if there is anything else which might be a candidate.

-- Hal

From mst at mellanox.co.il Wed Sep 7 10:52:56 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Sep 2005 20:52:56 +0300
Subject: [openib-general] Re: RMPP fixes for 2.6.14
In-Reply-To: <52ll28ol68.fsf@cisco.com>
References: <52psrkolql.fsf@cisco.com> <20050907170718.GC10137@mellanox.co.il> <52ll28ol68.fsf@cisco.com>
Message-ID: <20050907175256.GA11128@mellanox.co.il>

Quoting r. Roland Dreier :
> But letting me know about anything that I'm missing
> would be good.

Roland, what do you say to the idea of moving mthca_doorbell.h to somewhere under include/asm? It's not really mthca specific, is it?

-- MST

From mshefty at ichips.intel.com Wed Sep 7 10:52:47 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 07 Sep 2005 10:52:47 -0700
Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP
In-Reply-To: <52d5nkoktp.fsf@cisco.com>
References: <52d5nkoktp.fsf@cisco.com>
Message-ID: <431F28EF.8020200@ichips.intel.com>

Roland Dreier wrote:
> One question on the CM interface:
>
> > + cm_id->redirect_qpn =
> > + be32_to_cpu(*(u32 *)(event->param.rej_rcvd.ari + 32))
> > + & 0x00ffffff;
>
> It seems a little awkward that a consumer has to poke a value back
> into the cm_id structure. Sean, how do you want to handle this?

I'm not sure. The first thought that comes to mind is having a MAD redirection module that the CM could query before sending any message.

- Sean

From rolandd at cisco.com Wed Sep 7 10:58:21 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 07 Sep 2005 10:58:21 -0700
Subject: [openib-general] Re: RMPP fixes for 2.6.14
In-Reply-To: <20050907175256.GA11128@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 7 Sep 2005 20:52:56 +0300")
References: <52psrkolql.fsf@cisco.com> <20050907170718.GC10137@mellanox.co.il> <52ll28ol68.fsf@cisco.com> <20050907175256.GA11128@mellanox.co.il>
Message-ID: <52zmqon4jm.fsf@cisco.com>

Michael> Roland, what do you say to the idea of moving
Michael> mthca_doorbell.h to somewhere under include/asm? It's not
Michael> really mthca specific, is it?
Some of it definitely seems like it could be made generic. I'm not sure whether mthca_write_db_rec() is worth it, but the write64() emulation with a lock might be worth it on 32-bit systems.

- R.

From rolandd at cisco.com Wed Sep 7 11:00:12 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 07 Sep 2005 11:00:12 -0700
Subject: [openib-general] RMPP fixes for 2.6.14
In-Reply-To: <1126114726.4401.153.camel@hal.voltaire.com> (Hal Rosenstock's message of "07 Sep 2005 13:38:47 -0400")
References: <52psrkolql.fsf@cisco.com> <1126114726.4401.153.camel@hal.voltaire.com>
Message-ID: <52vf1cn4gj.fsf@cisco.com>

OK, I'll add this to my git tree.

- R.

From mst at mellanox.co.il Wed Sep 7 11:00:58 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Sep 2005 21:00:58 +0300
Subject: [openib-general] Re: RMPP fixes for 2.6.14
In-Reply-To: <52zmqon4jm.fsf@cisco.com>
References: <52psrkolql.fsf@cisco.com> <20050907170718.GC10137@mellanox.co.il> <52ll28ol68.fsf@cisco.com> <20050907175256.GA11128@mellanox.co.il> <52zmqon4jm.fsf@cisco.com>
Message-ID: <20050907180058.GB11245@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: RMPP fixes for 2.6.14
>
> Michael> Roland, what do you say to the idea of moving
> Michael> mthca_doorbell.h to somewhere under include/asm? It's not
> Michael> really mthca specific, is it?
>
> Some of it definitely seems like it could be made generic. I'm not
> sure whether mthca_write_db_rec() is worth it, but the write64()
> emulation with a lock might be worth it on 32-bit systems.

Yes, that's what I was referring to. Too late for 2.6.14?

-- MST

From kingman at storagegear.com Wed Sep 7 11:14:50 2005
From: kingman at storagegear.com (John Kingman)
Date: Wed, 7 Sep 2005 18:14:50 +0000 (UTC)
Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP
References: <52d5nkoktp.fsf@cisco.com>
Message-ID:

Roland Dreier cisco.com> writes:

>
> Thanks, looks pretty good.
> Have you tested with an SRP target that
> actually sends back a redirected CM LID?

Yes, but the set of fields in cm rej, cm rep, classportinfo, etc. that have not been tested yet is quite large.

>Does this mean you are developing such a target? ;)

Yes.

John

From halr at voltaire.com Wed Sep 7 11:21:25 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 07 Sep 2005 14:21:25 -0400
Subject: [openib-general] More 2.6.14 Candidates
Message-ID: <1126117285.4401.208.camel@hal.voltaire.com>

Hi Roland,

I found some more candidates for 2.6.14:

More mad_rmpp.c changes

r2928 | sean.hefty | 2005-07-28 14:45:56 -0400 (Thu, 28 Jul 2005) | 5 lines

Fix sparse warnings.
Use __be* where appropriate.

Signed-off-by: Sean Hefty
------------------------------------------------------------------------
r2868 | sean.hefty | 2005-07-15 19:34:32 -0400 (Fri, 15 Jul 2005) | 4 lines

Add handling for ABORT / STOP RMPP MADs.

Signed-off-by: Sean Hefty

Sparse changes to:
ib_cm.h
ib_mad.h
ib_sa.h
ib_smi.h
ib_user_cm.h
cm.c
cm_msgs.h
mad.c
mad_priv.h
sa_query.c
sysfs.c
ucm.h
ucm.c
ud_header.c
user_mad.c

SRQ support
ib_verbs.h
uverbs.h
verbs.c (uverbs_cmd.c, uverbs_main.c, uverbs_mem.c)

fmr_pool.c fix
r3086 | roland | 2005-08-15 10:46:43 -0400 (Mon, 15 Aug 2005) | 6 lines

Make sure that all FMRs are unmapped before we deallocate them so that we don't leak references to our protection domain when destroying an FMR pool.

(Bug reported by Guy German)

Signed-off-by: Roland Dreier

I haven't looked at mthca or IPoIB yet. Should I verify these too ?
-- Hal

From halr at voltaire.com Wed Sep 7 11:29:58 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 07 Sep 2005 14:29:58 -0400
Subject: [openib-general] Re: Some OpenSM 1.8.0 Anomalies
In-Reply-To: <431F1CC2.9020104@mellanox.co.il>
References: <1126101147.4396.1405.camel@hal.voltaire.com> <431F1CC2.9020104@mellanox.co.il>
Message-ID: <1126117797.4401.215.camel@hal.voltaire.com>

On Wed, 2005-09-07 at 13:00, Eitan Zahavi wrote:
> > Also, I see the following messages although these should not be the case:
> > Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007
> This means that the LID of the port registered as the source for this inform info is not recognized as a valid LID.
> > ...
> > Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559
> The meaning of this is that the incoming trap source is not a recognized (included in the SM database) guid

Both this LID and GUID were discovered so I don't understand how this is the case. I can send the log separately if interested (it is over >1M gzipped).

-- Hal

From sean.hefty at intel.com Wed Sep 7 11:32:58 2005
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 7 Sep 2005 11:32:58 -0700
Subject: [openib-general] Re: Re: [PATCH] [SDP] change CM event processing
In-Reply-To: <20050907172230.GD10137@mellanox.co.il>
Message-ID:

>> I believe that once SDP is fixed, these can be moved into the internal
>> structure to avoid this issue in the future.
>
>I agree. In fact, if you post a separate patch for just the cm state
>I'm ready to apply that right away.

I think that all you need is the diff from sdp_event.c (extracted below).
Signed-off-by: Sean Hefty

Index: sdp_event.c
===================================================================
--- sdp_event.c (revision 3295)
+++ sdp_event.c (working copy)
@@ -384,45 +384,45 @@ int sdp_cm_event_handler(struct ib_cm_id
 struct sdp_sock *conn = NULL;
 int result = 0;
- sdp_dbg_ctrl(NULL, "CM state <%d> event <%d> commID <%08x> ID <%d>",
- cm_id->state, event->event, cm_id->local_id, hashent);
- /*
- * lookup the connection, on a REQ_RECV the sk will be empty.
- */
- conn = sdp_conn_table_lookup(hashent);
- if (conn)
- sdp_conn_lock(conn);
- else
- if (cm_id->state != IB_CM_REQ_RCVD) {
- sdp_dbg_warn(NULL,
- "No conn <%d> CM state <%d> event <%d>",
- hashent, cm_id->state, event->event);
+ sdp_dbg_ctrl(NULL, "event <%d> commID <%08x> ID <%d>",
+ event->event, cm_id->local_id, hashent);
+
+ if (event->event != IB_CM_REQ_RECEIVED) {
+ conn = sdp_conn_table_lookup(hashent);
+ if (conn)
+ sdp_conn_lock(conn);
+ else
 return -EINVAL;
- }
+ }
- switch (cm_id->state) {
- case IB_CM_REQ_RCVD:
+ switch (event->event) {
+ case IB_CM_REQ_RECEIVED:
 result = sdp_cm_req_handler(cm_id, event);
 break;
- case IB_CM_REP_RCVD:
+ case IB_CM_REP_RECEIVED:
 result = sdp_cm_rep_handler(cm_id, event, conn);
 break;
- case IB_CM_IDLE:
+ case IB_CM_REQ_ERROR:
+ case IB_CM_REP_ERROR:
+ case IB_CM_REJ_RECEIVED:
+ case IB_CM_TIMEWAIT_EXIT:
 result = sdp_cm_idle(cm_id, event, conn);
 break;
- case IB_CM_ESTABLISHED:
+ case IB_CM_RTU_RECEIVED:
+ case IB_CM_USER_ESTABLISHED:
 result = sdp_cm_established(cm_id, event, conn);
 break;
- case IB_CM_DREQ_RCVD:
+ case IB_CM_DREQ_RECEIVED:
 result = sdp_cm_dreq_rcvd(cm_id, event, conn);
 if (result)
 break;
 /* fall through on success to handle state transition */
- case IB_CM_TIMEWAIT:
+ case IB_CM_DREQ_ERROR:
+ case IB_CM_DREP_RECEIVED:
 result = sdp_cm_timewait(cm_id, event, conn);
 break;
 default:
- sdp_dbg_warn(conn, "Unexpected CM state <%d>", cm_id->state);
+ sdp_dbg_warn(conn, "Unhandled CM event <%d>", event->event);
 result = -EINVAL;
 }
 /*
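Roland remarks earlier in the thread that the write64() emulation with a lock might be worth making generic on 32-bit systems, where a 64-bit doorbell can only be written as two 32-bit stores. What follows is a minimal userspace sketch of that idea, assuming C11 `<stdatomic.h>`. The "register" is an ordinary struct standing in for MMIO, a spinning `atomic_flag` stands in for a kernel spinlock, and the names (`fake_doorbell`, `write64_emulated`) and the high-word-first order are choices made for this example, not what mthca_doorbell.h actually contains.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit MMIO doorbell that a 32-bit CPU
 * can only write as two separate 32-bit words. */
struct fake_doorbell {
	volatile uint32_t word[2];
};

/* One lock shared by every writer of this doorbell. */
static atomic_flag doorbell_lock = ATOMIC_FLAG_INIT;

static void doorbell_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&doorbell_lock,
						 memory_order_acquire))
		; /* spin until the current holder releases the lock */
}

static void doorbell_lock_release(void)
{
	atomic_flag_clear_explicit(&doorbell_lock, memory_order_release);
}

/* Emulate an atomic 64-bit write: both halves are stored while the lock
 * is held, so no other writer can slip its own half in between.
 * Writing the high word first is an assumption of this sketch. */
static void write64_emulated(uint64_t val, struct fake_doorbell *db)
{
	doorbell_lock_acquire();
	db->word[0] = (uint32_t)(val >> 32);
	db->word[1] = (uint32_t)val;
	doorbell_lock_release();
}
```

The lock only helps if every writer of that doorbell takes it; without it, two CPUs could interleave their halves and the device would see a 64-bit value that nobody wrote.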
From rolandd at cisco.com Wed Sep 7 11:39:39 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 07 Sep 2005 11:39:39 -0700
Subject: [openib-general] Re: RMPP fixes for 2.6.14
In-Reply-To: <20050907180058.GB11245@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 7 Sep 2005 21:00:58 +0300")
References: <52psrkolql.fsf@cisco.com> <20050907170718.GC10137@mellanox.co.il> <52ll28ol68.fsf@cisco.com> <20050907175256.GA11128@mellanox.co.il> <52zmqon4jm.fsf@cisco.com> <20050907180058.GB11245@mellanox.co.il>
Message-ID: <52r7c0n2ms.fsf@cisco.com>

Michael> Yes, that's what I was referring to. Too late for 2.6.14?

Probably. I wouldn't be comfortable pushing that into all the arch trees through my git tree. I think we would need to go through lkml, and I think 2.6.14 will be closed to this sort of stuff around Friday.

- R.

From mst at mellanox.co.il Wed Sep 7 11:40:25 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Sep 2005 21:40:25 +0300
Subject: [openib-general] Re: Re: [PATCH] [SDP] change CM event processing
In-Reply-To:
References: <20050907172230.GD10137@mellanox.co.il>
Message-ID: <20050907184025.GA11647@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: RE: [openib-general] Re: Re: [PATCH] [SDP] change CM event processing
>
> >> I believe that once SDP is fixed, these can be moved into the internal
> >> structure to avoid this issue in the future.
> >
> >I agree. In fact, if you post a separate patch for just the cm state
> >I'm ready to apply that right away.
>
> I think that all you need is the diff from sdp_event.c (extracted below).
>
> Signed-off-by: Sean Hefty

Looks good. Is that tested?
-- MST

From ftillier at silverstorm.com Wed Sep 7 11:42:25 2005
From: ftillier at silverstorm.com (Fab Tillier)
Date: Wed, 7 Sep 2005 11:42:25 -0700
Subject: [openib-general] [PATCH] [CM] 1/2 Fix CM redirection
In-Reply-To:
Message-ID: <002701c5b3db$e536f100$9e5aa8c0@infiniconsys.com>

> From: John Kingman [mailto:kingman at storagegear.com]
> Sent: Wednesday, September 07, 2005 10:09 AM
>
> I found that CM handling for SRP is broken when handling a REJ with
> reason 24 (Port and CM Redirection) with a RedirectLID supplied. As
> stated in the spec, if RedirectLID is non-zero, it is the DLID a
> requester _shall_ use to access the class services. I believe that
> without this support, CM does not comply with C13-28 with respect to
> RedirectLID.
>
> In my testing, the following patches seem to fix the problem. If there
> is a better way to fix the problem, I would appreciate the input.
>
> Signed-off-by: John Kingman storagegear.com>
>
> Index: ib_cm.h
> ===================================================================
> --- ib_cm.h (revision 3328)
> +++ ib_cm.h (working copy)
> @@ -290,6 +290,7 @@ struct ib_cm_id {
> enum ib_cm_lap_state lap_state; /* internal CM/debug use */
> __be32 local_id;
> __be32 remote_id;
> + u32 redirect_qpn;
> };
>
> /**
>
> Index: cm.c
> ===================================================================
> --- cm.c (revision 3328)
> +++ cm.c (working copy)
> @@ -167,13 +167,14 @@ static int cm_alloc_msg(struct cm_id_pri
> struct ib_mad_agent *mad_agent;
> struct ib_mad_send_buf *m;
> struct ib_ah *ah;
> + u32 qpn = cm_id_priv->id.redirect_qpn? cm_id_priv->id.redirect_qpn:
> 1;
>
> mad_agent = cm_id_priv->av.port->mad_agent;
> ah = ib_create_ah(mad_agent->qp->pd, &cm_id_priv->av.ah_attr);
> if (IS_ERR(ah))
> return PTR_ERR(ah);
>
> - m = ib_create_send_mad(mad_agent, 1, cm_id_priv->av.pkey_index,
> + m = ib_create_send_mad(mad_agent, qpn, cm_id_priv->av.pkey_index,
> ah, 0, sizeof(struct ib_mad_hdr),
> sizeof(struct ib_mad)-sizeof(struct ib_mad_hdr),
> GFP_ATOMIC);

Why not just initialize redirect_qpn to 1 and just use it always?

From mst at mellanox.co.il Wed Sep 7 11:43:21 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Sep 2005 21:43:21 +0300
Subject: [openib-general] Re: RMPP fixes for 2.6.14
In-Reply-To: <52r7c0n2ms.fsf@cisco.com>
References: <52psrkolql.fsf@cisco.com> <20050907170718.GC10137@mellanox.co.il> <52ll28ol68.fsf@cisco.com> <20050907175256.GA11128@mellanox.co.il> <52zmqon4jm.fsf@cisco.com> <20050907180058.GB11245@mellanox.co.il> <52r7c0n2ms.fsf@cisco.com>
Message-ID: <20050907184321.GB11647@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: RMPP fixes for 2.6.14
>
> Michael> Yes, that's what I was referring to. Too late for 2.6.14?
>
> Probably. I wouldn't be comfortable pushing that into all the arch
> trees through my git tree. I think we would need to go through lkml,
> and I think 2.6.14 will be closed to this sort of stuff around Friday.

It'll wait then. I won't be online this weekend.

-- MST

From rolandd at cisco.com Wed Sep 7 11:45:08 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 07 Sep 2005 11:45:08 -0700
Subject: [openib-general] More 2.6.14 Candidates
In-Reply-To: <1126117285.4401.208.camel@hal.voltaire.com> (Hal Rosenstock's message of "07 Sep 2005 14:21:25 -0400")
References: <1126117285.4401.208.camel@hal.voltaire.com>
Message-ID: <52mzmon2dn.fsf@cisco.com>

Hal> Hi Roland, I found some more candidates for 2.6.14:

Are you comparing to my git tree? A lot of these are already there, and in fact a lot are already merged into Linus's git tree.
- R.

From mshefty at ichips.intel.com Wed Sep 7 11:46:51 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 07 Sep 2005 11:46:51 -0700
Subject: [openib-general] Re: Re: [PATCH] [SDP] change CM event processing
In-Reply-To: <20050907184025.GA11647@mellanox.co.il>
References: <20050907172230.GD10137@mellanox.co.il> <20050907184025.GA11647@mellanox.co.il>
Message-ID: <431F359B.5070201@ichips.intel.com>

Michael S. Tsirkin wrote:
> Looks good. Is that tested?

No. It will take me a while to get to that, but it's on my list to do.

- Sean

From ftillier at silverstorm.com Wed Sep 7 11:50:03 2005
From: ftillier at silverstorm.com (Fab Tillier)
Date: Wed, 7 Sep 2005 11:50:03 -0700
Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP
In-Reply-To: <431F28EF.8020200@ichips.intel.com>
Message-ID: <002801c5b3dc$f6b15140$9e5aa8c0@infiniconsys.com>

> From: Sean Hefty [mailto:mshefty at ichips.intel.com]
> Sent: Wednesday, September 07, 2005 10:53 AM
>
> Roland Dreier wrote:
> > One question on the CM interface:
> >
> > > + cm_id->redirect_qpn =
> > > + be32_to_cpu(*(u32 *)(event->param.rej_rcvd.ari + 32))
> > > + & 0x00ffffff;
> >
> > It seems a little awkward that a consumer has to poke a value back
> > into the cm_id structure. Sean, how do you want to handle this?
>
> I'm not sure. The first thought that comes to mind is having a MAD
> redirection module that the CM could query before sending any message.

Since the ARI for a redirect is defined by the IB spec, why not have the CM just update the redirect_qpn (or better yet the forthcoming redirection gizmo) itself when it receives such an REJ?

In any case, the CM has to drive any updates of cached redirection information since it is the recipient of the REJ carrying that information.

- Fab

From mst at mellanox.co.il Wed Sep 7 11:50:21 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Sep 2005 21:50:21 +0300
Subject: [openib-general] Re: Re: [PATCH] [SDP] change CM event processing
In-Reply-To: <431F359B.5070201@ichips.intel.com>
References: <20050907172230.GD10137@mellanox.co.il> <20050907184025.GA11647@mellanox.co.il> <431F359B.5070201@ichips.intel.com>
Message-ID: <20050907185021.GD11647@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: Re: [openib-general] Re: Re: [PATCH] [SDP] change CM event processing
>
> Michael S. Tsirkin wrote:
> >Looks good. Is that tested?
>
> No. It will take me a while to get to that, but it's on my list to do.

If you don't get to this by Sunday, I'll test and merge it.

-- MST

From mshefty at ichips.intel.com Wed Sep 7 11:54:45 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 07 Sep 2005 11:54:45 -0700
Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP
In-Reply-To: <002801c5b3dc$f6b15140$9e5aa8c0@infiniconsys.com>
References: <002801c5b3dc$f6b15140$9e5aa8c0@infiniconsys.com>
Message-ID: <431F3775.3050503@ichips.intel.com>

Fab Tillier wrote:
>>I'm not sure. The first thought that comes to mind is having a MAD
>>redirection module that the CM could query before sending any message.
>
> Since the ARI for a redirect is defined by the IB spec, why not have the CM just
> update the redirect_qpn (or better yet the forthcoming redirection gizmo) itself
> when it receives such an REJ?

The CM still needs to know that redirection should occur, cache that information somewhere, and use it for other CM traffic to that same destination. Updating the redirect_qpn only fixes the issue for that single connection. My thought was to have a module that the CM would call to insert a redirected endpoint, and then call again to see if an outbound MAD should be redirected. Hopefully, such a module could be defined to be useful for other management classes.
> In any case, the CM has to drive any updates of cached redirection information
> since it is the recipient of the REJ carrying that information.

I agree that the CM should control updates to the cache.

- Sean

From roel at yottayotta.com Wed Sep 7 11:57:30 2005
From: roel at yottayotta.com (Roel van der Goot)
Date: Wed, 7 Sep 2005 12:57:30 -0600 (MDT)
Subject: [openib-general] SC|05 version of OpenIB
Message-ID:

Hi all,

Does anybody know what version of OpenIB is going to be used at SC|05 for the OpenIb network?

Cheers :-),

Roel.

From halr at voltaire.com Wed Sep 7 12:38:22 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 07 Sep 2005 15:38:22 -0400
Subject: [openib-general] SC|05 version of OpenIB
In-Reply-To:
References:
Message-ID: <1126121901.4401.224.camel@hal.voltaire.com>

Hi Roel,

On Wed, 2005-09-07 at 14:57, Roel van der Goot wrote:
> Does anybody know what version of OpenIB is going to be used at
> SC|05 for the OpenIb network?

The minimum requirement has yet to be decided. It will use an OpenIB OpenSM with the 1.8.0 changes merged in. Are you exhibiting there ?

-- Hal

From rolandd at cisco.com Wed Sep 7 12:43:41 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 07 Sep 2005 12:43:41 -0700
Subject: [openib-general] Re: [PATCH] only build userspace verbs support if requested
In-Reply-To: (James Lentini's message of "Tue, 6 Sep 2005 16:03:11 -0400 (EDT)")
References: <523bon5fcl.fsf@cisco.com> <1125697155.4398.8039.camel@hal.voltaire.com> <52d5nmrujd.fsf@cisco.com>
Message-ID: <52aciomzo2.fsf@cisco.com>

Thanks, I applied this (with a few tweaks to the Kconfig help text). I also added it to my git tree to push for 2.6.14.

- R.
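In the redirection discussion above, Sean proposes a module that the CM would call to insert a redirected endpoint and then query before sending a MAD, while Fab argues that the CM itself should parse the spec-defined ARI from the REJ, and Roland asks for generic ClassPortInfo handling. The userspace C sketch below combines those ideas. It assumes the ClassPortInfo offsets listed in the comment of John's SRP patch (RedirectGID at byte 8, RedirectLID at 28, RedirectP_Key at 30, 24-bit RedirectQP at 33). All names (`parse_redirect`, `redirect_insert`, `redirect_lookup`) are hypothetical and not part of the OpenIB CM API. Keying the cache by the original DLID is likewise an assumption made only to keep the example small. Reading the QPN byte by byte yields the same value as the patch's `be32_to_cpu(...) & 0x00ffffff` without an unaligned `u32` load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ClassPortInfo offsets as given in the SRP patch comment. */
enum {
	CPI_REDIRECT_GID_OFFSET  = 8,  /* 16 bytes */
	CPI_REDIRECT_LID_OFFSET  = 28, /* 16 bits, big-endian */
	CPI_REDIRECT_PKEY_OFFSET = 30, /* 16 bits, big-endian */
	CPI_REDIRECT_QP_OFFSET   = 33  /* 24 bits, big-endian */
};

struct redirect_info {
	uint8_t  gid[16];
	uint16_t lid;  /* host byte order */
	uint16_t pkey; /* host byte order */
	uint32_t qpn;  /* host byte order, 24 significant bits */
};

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode the redirect fields from a REJ's Additional Reject Info.
 * Byte-wise reads avoid the unaligned pointer casts in the patch. */
static void parse_redirect(const uint8_t *ari, struct redirect_info *ri)
{
	const uint8_t *q = ari + CPI_REDIRECT_QP_OFFSET;

	memcpy(ri->gid, ari + CPI_REDIRECT_GID_OFFSET, sizeof ri->gid);
	ri->lid  = get_be16(ari + CPI_REDIRECT_LID_OFFSET);
	ri->pkey = get_be16(ari + CPI_REDIRECT_PKEY_OFFSET);
	ri->qpn  = ((uint32_t)q[0] << 16) | ((uint32_t)q[1] << 8) | q[2];
}

/* Hypothetical direct-mapped cache keyed by the original DLID. */
#define REDIRECT_CACHE_SIZE 16

struct redirect_entry {
	int valid;
	uint16_t dlid;
	struct redirect_info info;
};

static struct redirect_entry redirect_cache[REDIRECT_CACHE_SIZE];

/* Record a redirection; a colliding older entry is simply replaced. */
static void redirect_insert(uint16_t dlid, const struct redirect_info *ri)
{
	struct redirect_entry *e = &redirect_cache[dlid % REDIRECT_CACHE_SIZE];

	e->valid = 1;
	e->dlid = dlid;
	e->info = *ri;
}

/* Returns NULL when traffic to dlid has not been redirected. */
static const struct redirect_info *redirect_lookup(uint16_t dlid)
{
	const struct redirect_entry *e =
		&redirect_cache[dlid % REDIRECT_CACHE_SIZE];

	return (e->valid && e->dlid == dlid) ? &e->info : NULL;
}
```

A real module would also need locking, entry expiry, and probably a GID- or path-based key, since the same destination can be reached through more than one LID.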
From rolandd at cisco.com Wed Sep 7 12:45:20 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 07 Sep 2005 12:45:20 -0700
Subject: [openib-general] Re: [PATCH] use union in ibv_get_device_guid()
In-Reply-To: (Sean Hefty's message of "Tue, 6 Sep 2005 16:32:46 -0700")
References:
Message-ID: <5264tcmzlb.fsf@cisco.com>

Because of MST's warning about union member aliasing, I think it's better to fix this mess up like this. No reason to mess around with pointer aliasing at all -- let's just define htonll() and use that.

- R.

--- libibverbs/include/infiniband/arch.h (revision 3327)
+++ libibverbs/include/infiniband/arch.h (working copy)
@@ -35,6 +35,17 @@
 #ifndef INFINIBAND_ARCH_H
 #define INFINIBAND_ARCH_H
+#include
+#include
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+static inline uint64_t htonll(uint64_t x) { return bswap_64(x); }
+static inline uint64_t ntohll(uint64_t x) { return bswap_64(x); }
+#elif __BYTE_ORDER == __BIG_ENDIAN
+static inline uint64_t htonll(uint64_t x) { return x; }
+static inline uint64_t ntohll(uint64_t x) { return x; }
+#endif
+
 /*
 * Architecture-specific defines. Currently, an architecture is
Currently, an architecture is * required to implement the following operations: --- libibverbs/src/device.c (revision 3335) +++ libibverbs/src/device.c (working copy) @@ -44,6 +44,8 @@ #include #include +#include + #include "ibverbs.h" static struct dlist *device_list; @@ -63,7 +65,8 @@ const char *ibv_get_device_name(struct i uint64_t ibv_get_device_guid(struct ibv_device *device) { struct sysfs_attribute *attr; - uint16_t guid[4]; + uint64_t guid = 0; + uint16_t parts[4]; int i; attr = sysfs_get_classdev_attr(device->ibdev, "node_guid"); @@ -71,13 +74,13 @@ uint64_t ibv_get_device_guid(struct ibv_ return 0; if (sscanf(attr->value, "%hx:%hx:%hx:%hx", - guid, guid + 1, guid + 2, guid + 3) != 4) + parts, parts + 1, parts + 2, parts + 3) != 4) return 0; for (i = 0; i < 4; ++i) - guid[i] = htons(guid[i]); + guid = (guid << 16) | parts[i]; - return *(uint64_t *) guid; + return htonll(guid); } struct ibv_context *ibv_open_device(struct ibv_device *device) --- libmthca/src/mthca.h (revision 3327) +++ libmthca/src/mthca.h (working copy) @@ -36,9 +36,6 @@ #ifndef MTHCA_H #define MTHCA_H -#include -#include - #include #include @@ -212,14 +209,6 @@ struct mthca_ah { uint32_t key; }; -#if __BYTE_ORDER == __LITTLE_ENDIAN -static inline uint64_t htonll(uint64_t x) { return bswap_64(x); } -static inline uint64_t ntohll(uint64_t x) { return bswap_64(x); } -#elif __BYTE_ORDER == __BIG_ENDIAN -static inline uint64_t htonll(uint64_t x) { return x; } -static inline uint64_t ntohll(uint64_t x) { return x; } -#endif - static inline unsigned long align(unsigned long val, unsigned long align) { return (val + align - 1) & ~(align - 1); From ftillier at silverstorm.com Wed Sep 7 12:47:33 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Wed, 7 Sep 2005 12:47:33 -0700 Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP In-Reply-To: <431F3775.3050503@ichips.intel.com> Message-ID: <002901c5b3e4$fe8d5dc0$9e5aa8c0@infiniconsys.com> > From: Sean Hefty 
[mailto:mshefty at ichips.intel.com] > Sent: Wednesday, September 07, 2005 11:55 AM > > Fab Tillier wrote: > >>I'm not sure. The first thought that comes to mind is having a MAD > >>redirection module that the CM could query before sending any message. > > > > Since the ARI for a redirect is defined by the IB spec, why not have > > the CM just update the redirect_qpn (or better yet the forthcoming > > redirection gizmo) itself when it receives such an REJ? > > The CM still needs to know that redirection should occur, cache that > information somewhere, and use it for other CM traffic to that same > destination. Updating the redirect_qpn only fixes the issue for that > single connection. My thought was to have a module that the CM would > call to insert a redirected endpoint, and then call again to see if an > outbound MAD should be redirected. Hopefully, such a module could be > defined to be useful for other management classes. Right, hence the "or better yet the forthcoming redirection gizmo". My point was that the CM should handle this, not the CM client (in this case SDP). - Fab From roel at yottayotta.com Wed Sep 7 12:56:23 2005 From: roel at yottayotta.com (Roel van der Goot) Date: Wed, 7 Sep 2005 13:56:23 -0600 (MDT) Subject: [openib-general] SC|05 version of OpenIB In-Reply-To: <1126121901.4401.224.camel@hal.voltaire.com> References: <1126121901.4401.224.camel@hal.voltaire.com> Message-ID: Hi Hal, Hal Rosenstock wrote: > Hi Roel, > > On Wed, 2005-09-07 at 14:57, Roel van der Goot wrote: >> Does anybody know what version of OpenIB is going to be used at >> SC|05 for the OpenIb network? > > The minimum requirement has yet to be decided. It will use an OpenIB > OpenSM with the 1.8.0 changes merged in. Are you exhibiting there ? If I get everything running, YottaYotta will be exhibiting, yes. ;-) Have you made a decision on when are you planning on checking your changes in? Or do you prefer me to work from patches for the time being? Cheers :-), Roel. 
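Roland's ibv_get_device_guid() patch above drops the cast of a `uint16_t` array to `uint64_t`, builds the GUID arithmetically, and converts it with a new htonll(); later in the thread MST suggests an `#error` branch for unknown byte orders. The standalone sketch below puts both together, assuming glibc's `<endian.h>` and `<byteswap.h>`. `parse_node_guid()` is a hypothetical helper mirroring the loop in the patched function, not a libibverbs API; the sample GUID is the one from Hal's OpenSM log.

```c
#include <assert.h>
#include <byteswap.h>
#include <endian.h>
#include <stdint.h>
#include <stdio.h>

/* Same definitions as the arch.h hunk, plus the suggested #error. */
#if __BYTE_ORDER == __LITTLE_ENDIAN
static inline uint64_t htonll(uint64_t x) { return bswap_64(x); }
static inline uint64_t ntohll(uint64_t x) { return bswap_64(x); }
#elif __BYTE_ORDER == __BIG_ENDIAN
static inline uint64_t htonll(uint64_t x) { return x; }
static inline uint64_t ntohll(uint64_t x) { return x; }
#else
#error "__BYTE_ORDER is neither __LITTLE_ENDIAN nor __BIG_ENDIAN"
#endif

/*
 * Parse a sysfs-style "xxxx:xxxx:xxxx:xxxx" node_guid string and return
 * the GUID in network byte order, or 0 if the string does not parse.
 * No pointer aliasing is involved: the value is assembled with shifts.
 */
static uint64_t parse_node_guid(const char *s)
{
	uint16_t parts[4];
	uint64_t guid = 0;
	int i;

	if (sscanf(s, "%hx:%hx:%hx:%hx",
		   &parts[0], &parts[1], &parts[2], &parts[3]) != 4)
		return 0;

	/* First group is the most significant 16 bits (host order). */
	for (i = 0; i < 4; ++i)
		guid = (guid << 16) | parts[i];

	return htonll(guid);
}
```

Because the result is in network byte order, its first byte in memory is the most significant byte of the GUID on both little- and big-endian hosts.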
From halr at voltaire.com Wed Sep 7 13:01:11 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 07 Sep 2005 16:01:11 -0400
Subject: [openib-general] SC|05 version of OpenIB
In-Reply-To:
References: <1126121901.4401.224.camel@hal.voltaire.com>
Message-ID: <1126123174.4401.249.camel@hal.voltaire.com>

On Wed, 2005-09-07 at 15:56, Roel van der Goot wrote:
> If I get everything running, YottaYotta will be exhibiting, yes. ;-)
> Have you made a decision on when are you planning on checking your
> changes in?

Not yet.

> Or do you prefer me to work from patches for the time
> being?

Not sure what you mean. The patches are to a yet to be checked in 1.8.0 version on the trunk. They may apply to the osm-1.8.0-merge branch as well. Is that what you are using ?

-- Hal

From mst at mellanox.co.il Wed Sep 7 13:11:54 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Wed, 7 Sep 2005 23:11:54 +0300
Subject: [openib-general] Re: [PATCH] use union in ibv_get_device_guid()
In-Reply-To: <5264tcmzlb.fsf@cisco.com>
References: <5264tcmzlb.fsf@cisco.com>
Message-ID: <20050907201154.GF11647@mellanox.co.il>

Quoting r. Roland Dreier :
> --- libibverbs/include/infiniband/arch.h (revision 3327)
> +++ libibverbs/include/infiniband/arch.h (working copy)
> @@ -35,6 +35,17 @@
> #ifndef INFINIBAND_ARCH_H
> #define INFINIBAND_ARCH_H
>
> +#include
> +#include
> +
> +#if __BYTE_ORDER == __LITTLE_ENDIAN
> +static inline uint64_t htonll(uint64_t x) { return bswap_64(x); }
> +static inline uint64_t ntohll(uint64_t x) { return bswap_64(x); }
> +#elif __BYTE_ORDER == __BIG_ENDIAN
> +static inline uint64_t htonll(uint64_t x) { return x; }
> +static inline uint64_t ntohll(uint64_t x) { return x; }
> +#endif
> +
> /*
> * Architecture-specific defines. Currently, an architecture is
> * required to implement the following operations:

Let's also add

#else
#error __BYTE_ORDER differs from both __LITTLE_ENDIAN and __BIG_ENDIAN
#endif

just in case?
-- MST

From roel at yottayotta.com Wed Sep 7 13:15:30 2005
From: roel at yottayotta.com (Roel van der Goot)
Date: Wed, 7 Sep 2005 14:15:30 -0600 (MDT)
Subject: [openib-general] SC|05 version of OpenIB
In-Reply-To: <1126123174.4401.249.camel@hal.voltaire.com>
References: <1126121901.4401.224.camel@hal.voltaire.com> <1126123174.4401.249.camel@hal.voltaire.com>
Message-ID:

Hi Hal:

Hal Rosenstock wrote:
> Not sure what you mean. The patches are to a yet to be checked in 1.8.0
> version on the trunk. They may apply to the osm-1.8.0-merge branch as
> well. Is that what you are using ?

I am using the default tree without your merges for osm. Anyway, I am not in a real hurry to move onto that tree, because I am still fighting with getting Linux to shut up about some ACPI problems. Old hardware I think. The Linux box will mainly be running as the OpenSM on an InfiniBand network.

Cheers :-),

Roel.

From steve_wooding at keysounds.co.uk Wed Sep 7 13:37:01 2005
From: steve_wooding at keysounds.co.uk (Steve Wooding)
Date: Wed, 07 Sep 2005 21:37:01 +0100
Subject: [openib-general] Re: [PATCH] [uCM] user specified context in CM events + new test program
In-Reply-To: <431E11CF.3010903@ichips.intel.com>
References: <431E0884.5080308@keysounds.co.uk> <431E11CF.3010903@ichips.intel.com>
Message-ID: <431F4F6D.8030105@keysounds.co.uk>

Thanks for clarifying that, Sean.

Cheers,

Steve.

Sean Hefty wrote:

> Steve Wooding wrote:
>
>> I've just started looking into IB connection establishment and I was
>> wondering what the ib_cm_init_qp_attr() function actually does.
>> Studying the MindShare IB book, it talks about exchanging the QPNs,
>> pSNs etc., via the REQ and REP messages. However, looking at your
>> cmpost.c example, I see, for example, that in the req handler the
>> event containing the REQ message is never actually used when modifying
>> the QP to RTR. The ib_cm_init_qp_attr() function is used instead.
>> Does the info in the REQ message get read in kernel space?
>
>
> The kernel CM stores the information used in the establishment of a
> connection. It then formats the QP attribute structure for the user.
> This saves every app from having to store these same values and
> format the QP attribute structure.
>
> - Sean
>

From rolandd at cisco.com Wed Sep 7 13:45:38 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 07 Sep 2005 13:45:38 -0700
Subject: [openib-general] Re: ibv_get_async_event
In-Reply-To: <20050907075007.GS19358@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 7 Sep 2005 10:50:08 +0300")
References: <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> <52u0gyrw75.fsf@cisco.com> <431DC7F9.2050307@ichips.intel.com> <52ll2arvb3.fsf@cisco.com> <431DCC06.8050403@ichips.intel.com> <52fyshriuw.fsf@cisco.com> <20050907075007.GS19358@mellanox.co.il>
Message-ID: <52ll28li8d.fsf@cisco.com>

Given that Sean and Michael both seem OK with this approach, I went
ahead and checked the changes into the OpenIB svn. Sean, if you and
Arlin get a chance to test this with uDAPL before Friday morning, I
will push this upstream for 2.6.14.

Thanks,
  Roland

From sean.hefty at intel.com Wed Sep 7 14:29:58 2005
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 7 Sep 2005 14:29:58 -0700
Subject: [openib-general] [PATCH] [MAD] define MAD data field sizes
Message-ID:

The following patch defines the MAD data sizes to clean up the code in a
couple of places.
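The four constants are what remains of a fixed 256-byte MAD once the headers in front of the data field are subtracted:

- 24 bytes of common MAD header (leaving 232);
- plus 12 bytes of RMPP header (leaving 220);
- plus 4 bytes of reserved/OUI for vendor MADs (216), or 20 bytes of SA header for SA MADs (200).

The sketch below mirrors those layouts in userspace to check the arithmetic at compile time. It also reproduces the RMPP `paylen_newwin` arithmetic that `send_next_seg()` switches to.

Assumptions:

- The field lists follow the kernel's `ib_mad_hdr`, `ib_rmpp_hdr` and `ib_sa_hdr`.
- The `*_sketch` structs, `MAD_SIZE` and the `rmpp_*` helpers are hypothetical names.
- Byte order is ignored.
- `__attribute__((packed))` is GCC/Clang-specific.
- `pad` is taken to be the unused bytes in the last segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
	MAD_SIZE            = 256, /* hypothetical name: every MAD is 256 bytes */
	IB_MGMT_MAD_DATA    = 232,
	IB_MGMT_RMPP_DATA   = 220,
	IB_MGMT_VENDOR_DATA = 216,
	IB_MGMT_SA_DATA     = 200
};

/* Common MAD header: 24 bytes. */
struct mad_hdr_sketch {
	uint8_t  base_version, mgmt_class, class_version, method;
	uint16_t status, class_specific;
	uint64_t tid;
	uint16_t attr_id, resv;
	uint32_t attr_mod;
} __attribute__((packed));

/* RMPP header: 12 bytes. */
struct rmpp_hdr_sketch {
	uint8_t  rmpp_version, rmpp_type, rmpp_rtime_flags, rmpp_status;
	uint32_t seg_num, paylen_newwin;
} __attribute__((packed));

/* SA header: 20 bytes. */
struct sa_hdr_sketch {
	uint64_t sm_key;
	uint16_t attr_offset, reserved;
	uint64_t comp_mask;
} __attribute__((packed));

struct mad_sketch {
	struct mad_hdr_sketch mad_hdr;
	uint8_t data[IB_MGMT_MAD_DATA];
} __attribute__((packed));

struct rmpp_mad_sketch {
	struct mad_hdr_sketch mad_hdr;
	struct rmpp_hdr_sketch rmpp_hdr;
	uint8_t data[IB_MGMT_RMPP_DATA];
} __attribute__((packed));

struct vendor_mad_sketch {
	struct mad_hdr_sketch mad_hdr;
	struct rmpp_hdr_sketch rmpp_hdr;
	uint8_t reserved;
	uint8_t oui[3];
	uint8_t data[IB_MGMT_VENDOR_DATA];
} __attribute__((packed));

struct sa_mad_sketch {
	struct mad_hdr_sketch mad_hdr;
	struct rmpp_hdr_sketch rmpp_hdr;
	struct sa_hdr_sketch sa_hdr;
	uint8_t data[IB_MGMT_SA_DATA];
} __attribute__((packed));

/* Each layout must fill exactly one MAD. */
_Static_assert(sizeof(struct mad_sketch) == MAD_SIZE, "MAD size");
_Static_assert(sizeof(struct rmpp_mad_sketch) == MAD_SIZE, "RMPP MAD size");
_Static_assert(sizeof(struct vendor_mad_sketch) == MAD_SIZE, "vendor MAD size");
_Static_assert(sizeof(struct sa_mad_sketch) == MAD_SIZE, "SA MAD size");
/* The expression the patch replaces: sizeof - offsetof(data) == 220. */
_Static_assert(sizeof(struct rmpp_mad_sketch) -
	       offsetof(struct rmpp_mad_sketch, data) == IB_MGMT_RMPP_DATA,
	       "RMPP data size");

/* Hypothetical helper: segments needed for len payload bytes. */
static uint32_t rmpp_total_seg(uint32_t len)
{
	return len ? (len + IB_MGMT_RMPP_DATA - 1) / IB_MGMT_RMPP_DATA : 1;
}

/* Hypothetical helper: unused bytes (pad) left in the last segment. */
static uint32_t rmpp_pad(uint32_t len)
{
	uint32_t pad = rmpp_total_seg(len) * IB_MGMT_RMPP_DATA - len;

	assert(pad <= IB_MGMT_RMPP_DATA);
	return pad;
}

/* paylen_newwin for the first segment: total_seg * RMPP_DATA - pad. */
static uint32_t rmpp_first_paylen(uint32_t len)
{
	return rmpp_total_seg(len) * IB_MGMT_RMPP_DATA - rmpp_pad(len);
}

/* paylen_newwin for the last segment: RMPP_DATA - pad. */
static uint32_t rmpp_last_paylen(uint32_t len)
{
	return IB_MGMT_RMPP_DATA - rmpp_pad(len);
}
```

For a 500-byte payload this gives 3 segments and a pad of 160 bytes:

- the first segment carries paylen 500 (3 * 220 - 160);
- the last carries 60 (220 - 160).

These are the same values the old `sizeof`/`offsetof` expressions produced.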
Signed-off-by: Sean Hefty Index: include/rdma/ib_mad.h =================================================================== --- include/rdma/ib_mad.h (revision 3335) +++ include/rdma/ib_mad.h (working copy) @@ -108,6 +108,13 @@ #define IB_QP1_QKEY 0x80010000 #define IB_QP_SET_QKEY 0x80000000 +enum { + IB_MGMT_MAD_DATA = 232, + IB_MGMT_RMPP_DATA = 220, + IB_MGMT_VENDOR_DATA = 216, + IB_MGMT_SA_DATA = 200 +}; + struct ib_mad_hdr { u8 base_version; u8 mgmt_class; @@ -149,20 +156,20 @@ struct ib_sa_hdr { struct ib_mad { struct ib_mad_hdr mad_hdr; - u8 data[232]; + u8 data[IB_MGMT_MAD_DATA]; }; struct ib_rmpp_mad { struct ib_mad_hdr mad_hdr; struct ib_rmpp_hdr rmpp_hdr; - u8 data[220]; + u8 data[IB_MGMT_RMPP_DATA]; }; struct ib_sa_mad { struct ib_mad_hdr mad_hdr; struct ib_rmpp_hdr rmpp_hdr; struct ib_sa_hdr sa_hdr; - u8 data[200]; + u8 data[IB_MGMT_SA_DATA]; } __attribute__ ((packed)); struct ib_vendor_mad { @@ -170,7 +177,7 @@ struct ib_vendor_mad { struct ib_rmpp_hdr rmpp_hdr; u8 reserved; u8 oui[3]; - u8 data[216]; + u8 data[IB_MGMT_VENDOR_DATA]; }; /** Index: core/mad_rmpp.c =================================================================== --- core/mad_rmpp.c (revision 3335) +++ core/mad_rmpp.c (working copy) @@ -583,6 +583,7 @@ static int send_next_seg(struct ib_mad_s { struct ib_rmpp_mad *rmpp_mad; int timeout; + u32 paylen; rmpp_mad = (struct ib_rmpp_mad *)mad_send_wr->send_wr.wr.ud.mad_hdr; ib_set_rmpp_flags(&rmpp_mad->rmpp_hdr, IB_MGMT_RMPP_FLAG_ACTIVE); @@ -590,11 +591,9 @@ static int send_next_seg(struct ib_mad_s if (mad_send_wr->seg_num == 1) { rmpp_mad->rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_FIRST; - rmpp_mad->rmpp_hdr.paylen_newwin = - cpu_to_be32(mad_send_wr->total_seg * - (sizeof(struct ib_rmpp_mad) - - offsetof(struct ib_rmpp_mad, data)) - - mad_send_wr->pad); + paylen = mad_send_wr->total_seg * IB_MGMT_RMPP_DATA - + mad_send_wr->pad; + rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(paylen); mad_send_wr->sg_list[0].length = sizeof(struct 
ib_rmpp_mad); } else { mad_send_wr->send_wr.num_sge = 2; @@ -608,10 +607,8 @@ static int send_next_seg(struct ib_mad_s if (mad_send_wr->seg_num == mad_send_wr->total_seg) { rmpp_mad->rmpp_hdr.rmpp_rtime_flags |= IB_MGMT_RMPP_FLAG_LAST; - rmpp_mad->rmpp_hdr.paylen_newwin = - cpu_to_be32(sizeof(struct ib_rmpp_mad) - - offsetof(struct ib_rmpp_mad, data) - - mad_send_wr->pad); + paylen = IB_MGMT_RMPP_DATA - mad_send_wr->pad; + rmpp_mad->rmpp_hdr.paylen_newwin = cpu_to_be32(paylen); } /* 2 seconds for an ACK until we can find the packet lifetime */ From halr at voltaire.com Wed Sep 7 14:33:30 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 07 Sep 2005 17:33:30 -0400 Subject: [openib-general] [PATCH] [MAD] define MAD data field sizes In-Reply-To: References: Message-ID: <1126128810.4401.356.camel@hal.voltaire.com> On Wed, 2005-09-07 at 17:29, Sean Hefty wrote: > The following patch defines the MAD data sizes to cleanup the code in a > couple of places. Looks good to me. -- Hal From mshefty at ichips.intel.com Wed Sep 7 14:57:19 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 07 Sep 2005 14:57:19 -0700 Subject: [openib-general] Re: ibv_get_async_event In-Reply-To: <431E122F.5040709@ichips.intel.com> References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> <52u0gyrw75.fsf@cisco.com> <431DC7F9.2050307@ichips.intel.com> <52ll2arvb3.fsf@cisco.com> <431DCC06.8050403@ichips.intel.com> <52fyshriuw.fsf@cisco.com> <431E122F.5040709@ichips.intel.com> Message-ID: <431F623F.6000408@ichips.intel.com> Sean Hefty wrote: > I think that this would work well. I will update the uCM put event to > match. 
After looking at the uCM more, changing from its current model of
put_event to ack_events would require changes to the get_event, which
requires changes to the ib_cm_event structure. I think that having
matching event models across IB interfaces is ideal, but it requires
re-working a fair portion of the uCM event processing. How important do
others feel that this is?

- Sean

From rolandd at cisco.com Wed Sep 7 15:01:05 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 07 Sep 2005 15:01:05 -0700
Subject: [openib-general] Re: ibv_get_async_event
In-Reply-To: <431F623F.6000408@ichips.intel.com> (Sean Hefty's message of "Wed, 07 Sep 2005 14:57:19 -0700")
References: <000e01c5adbd$8c977ed0$9e5aa8c0@infiniconsys.com> <4314F264.6010207@ichips.intel.com> <528xyidkfi.fsf@cisco.com> <431746C3.3050300@ichips.intel.com> <52u0h47hje.fsf@cisco.com> <4318A580.9040201@ichips.intel.com> <52br3b5kg4.fsf@cisco.com> <4318D511.9030305@ichips.intel.com> <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> <52u0gyrw75.fsf@cisco.com> <431DC7F9.2050307@ichips.intel.com> <52ll2arvb3.fsf@cisco.com> <431DCC06.8050403@ichips.intel.com> <52fyshriuw.fsf@cisco.com> <431E122F.5040709@ichips.intel.com> <431F623F.6000408@ichips.intel.com>
Message-ID: <52hdcwleqm.fsf@cisco.com>

    Sean> After looking at the uCM more, changing from its current
    Sean> model of put_event to ack_events would require changes to
    Sean> the get_event, which requires changes to the ib_cm_event
    Sean> structure. I think that having matching event models across
    Sean> IB interfaces is ideal, but it requires re-working a fair
    Sean> portion of the uCM event processing. How important do
    Sean> others feel that this is?

I don't think it's important to be able to ack multiple events. I
only did it because there's only one type of CQ completion event. For
async events, libibverbs only allows a single event to be acked at a
time. So I think libibcm is fine the way it is.

 - R.
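The trade-off in this exchange can be modeled with plain counters:

- every delivered event must eventually be acknowledged;
- the object can only be torn down once acked equals delivered;
- each ack call is assumed to cost one lock round trip.

The libibverbs `ibv_get_cq_event(3)` documentation gives this mutex cost as the reason to ack several CQ events per `ibv_ack_cq_events()` call. Everything below, including the struct and function names, is a hypothetical single-threaded model, not the libibverbs or libibcm implementation.

```c
#include <assert.h>

/* Hypothetical model of the get-event / ack-event contract. */
struct event_counts {
	unsigned delivered;        /* events handed to the consumer */
	unsigned acked;            /* events the consumer has acknowledged */
	unsigned lock_round_trips; /* one per ack call (assumed cost) */
};

static void deliver_event(struct event_counts *c)
{
	c->delivered++;
}

/* Acknowledge n events in one call (cf. ibv_ack_cq_events(cq, n)). */
static void ack_events(struct event_counts *c, unsigned n)
{
	assert(c->acked + n <= c->delivered); /* cannot ack undelivered events */
	c->lock_round_trips++;                /* one lock/unlock per call */
	c->acked += n;
}

/* Destroying the object is only safe once every event was acked. */
static int can_destroy(const struct event_counts *c)
{
	return c->acked == c->delivered;
}

/* Deliver `total` events, acking them in groups of `batch` (batch >= 1).
 * Returns how many ack calls (lock round trips) that took. */
static unsigned ack_round_trips(unsigned total, unsigned batch)
{
	struct event_counts c = { 0, 0, 0 };
	unsigned unacked = 0;

	assert(batch >= 1);
	for (unsigned i = 0; i < total; i++) {
		deliver_event(&c);
		if (++unacked == batch) {
			ack_events(&c, unacked);
			unacked = 0;
		}
	}
	if (unacked)
		ack_events(&c, unacked); /* flush the remainder before teardown */
	assert(can_destroy(&c));
	return c.lock_round_trips;
}
```

Acking 100 events one at a time costs 100 round trips. Acking them in batches of 16 costs 7: six full batches plus the remainder. This is why acking one event at a time, as kept for async events and libibcm, costs little when events are infrequent.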
From rolandd at cisco.com Wed Sep 7 15:02:02 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 07 Sep 2005 15:02:02 -0700
Subject: [openib-general] Re: [PATCH] use union in ibv_get_device_guid()
In-Reply-To: <20050907201154.GF11647@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 7 Sep 2005 23:11:54 +0300")
References: <5264tcmzlb.fsf@cisco.com> <20050907201154.GF11647@mellanox.co.il>
Message-ID: <52d5nklep1.fsf@cisco.com>

    Michael> Let's also add #else #error __BYTE_ORDER differs from both
    Michael> __LITTLE_ENDIAN and __BIG_ENDIAN #endif just in case?

OK, I guess I can do that. I don't think it's that big a deal though,
since we'll immediately see ntohll and htonll as being undefined.

 - R.

From mst at mellanox.co.il Wed Sep 7 15:10:41 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 8 Sep 2005 01:10:41 +0300
Subject: [openib-general] Re: [PATCH] use union in ibv_get_device_guid()
In-Reply-To: <52d5nklep1.fsf@cisco.com>
References: <5264tcmzlb.fsf@cisco.com> <20050907201154.GF11647@mellanox.co.il> <52d5nklep1.fsf@cisco.com>
Message-ID: <20050907221041.GB12853@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: Re: [PATCH] use union in ibv_get_device_guid()
>
>     Michael> Let's also add #else #error __BYTE_ORDER differs from both
>     Michael> __LITTLE_ENDIAN and __BIG_ENDIAN #endif just in case?
>
> OK, I guess I can do that. I don't think it's that big a deal though,
> since we'll immediately see ntohll and htonll as being undefined.
>
>  - R.
>

Not a big deal, I just thought an explicit error message would be more
readable.

--
MST

From mst at mellanox.co.il Wed Sep 7 16:23:11 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 8 Sep 2005 02:23:11 +0300
Subject: [openib-general] Re: Re: ibv_get_async_event
In-Reply-To: <431F623F.6000408@ichips.intel.com>
References: <52hdd33x6s.fsf@cisco.com> <20050904141437.GT1707@mellanox.co.il> <431DC02B.6010506@ichips.intel.com> <52u0gyrw75.fsf@cisco.com> <431DC7F9.2050307@ichips.intel.com> <52ll2arvb3.fsf@cisco.com> <431DCC06.8050403@ichips.intel.com> <52fyshriuw.fsf@cisco.com> <431E122F.5040709@ichips.intel.com> <431F623F.6000408@ichips.intel.com>
Message-ID: <20050907232311.GC11245@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: Re: Re: ibv_get_async_event
>
> Sean Hefty wrote:
> >I think that this would work well. I will update the uCM put event to
> >match.
>
> After looking at the uCM more, changing from its current model of
> put_event to ack_events would require changes to the get_event, which
> requires changes to the ib_cm_event structure. I think that having
> matching event models across IB interfaces is ideal, but it requires
> re-working a fair portion of the uCM event processing. How important do
> others feel that this is?

I don't think it's too important: I think Roland added multiple events
as a performance optimization for completion events. CM speed isn't that
critical a data path IMO to really need such an optimization.

--
MST

From sean.hefty at intel.com Wed Sep 7 16:24:56 2005
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 7 Sep 2005 16:24:56 -0700
Subject: [openib-general] [PATCH] [verbs] add node_guid to device structure
Message-ID:

This patch adds the node_guid to struct ib_device to avoid ULPs needing
to query for it. It will also make it possible to give users the
attributes of a device as part of their add_device routine.

If this patch is okay with everyone, I will submit patches to remove the
device attribute queries in the CM, SRP, and sysfs.
Signed-off-by: Sean Hefty Index: include/rdma/ib_verbs.h =================================================================== --- include/rdma/ib_verbs.h (revision 3340) +++ include/rdma/ib_verbs.h (working copy) @@ -952,6 +952,7 @@ struct ib_device { IB_DEV_UNREGISTERED } reg_state; + __be64 node_guid; u8 node_type; u8 phys_port_cnt; }; Index: core/device.c =================================================================== --- core/device.c (revision 3340) +++ core/device.c (working copy) @@ -227,8 +227,15 @@ static int add_client_context(struct ib_ */ int ib_register_device(struct ib_device *device) { + struct ib_device_attr *device_attr = NULL; int ret; + device_attr = kmalloc(sizeof *device_attr, GFP_KERNEL); + if (!device_attr) { + ret = -ENOMEM; + goto out; + } + down(&device_sem); if (strchr(device->name, '%')) { @@ -247,6 +254,12 @@ int ib_register_device(struct ib_device spin_lock_init(&device->event_handler_lock); spin_lock_init(&device->client_data_lock); + ret = ib_query_device(device, device_attr); + if (ret) + goto out; + + device->node_guid = device_attr->node_guid; + ret = ib_device_register_sysfs(device); if (ret) { printk(KERN_WARNING "Couldn't register device %s with driver model\n", @@ -268,6 +281,7 @@ int ib_register_device(struct ib_device out: up(&device_sem); + kfree(device_attr); return ret; } EXPORT_SYMBOL(ib_register_device); From rolandd at cisco.com Wed Sep 7 16:30:34 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 07 Sep 2005 16:30:34 -0700 Subject: [openib-general] [PATCH] [verbs] add node_guid to device structure In-Reply-To: (Sean Hefty's message of "Wed, 7 Sep 2005 16:24:56 -0700") References: Message-ID: <528xy8lalh.fsf@cisco.com> Sean> This patch adds the node_guid to struct ib_device to avoid Sean> ULPs needing to query for it. Seems reasonable. I can't think of any valid reason why the node_guid would ever change during a device's lifetime. 
If we're going to put node_guid in the ib_device_attr, should we
remove it from struct ib_device_attr?

 - R.

From mshefty at ichips.intel.com Wed Sep 7 16:33:45 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 07 Sep 2005 16:33:45 -0700
Subject: [openib-general] [PATCH] [verbs] add node_guid to device structure
In-Reply-To: <528xy8lalh.fsf@cisco.com>
References: <528xy8lalh.fsf@cisco.com>
Message-ID: <431F78D9.8080809@ichips.intel.com>

Roland Dreier wrote:
>     Sean> This patch adds the node_guid to struct ib_device to avoid
>     Sean> ULPs needing to query for it.
>
> Seems reasonable. I can't think of any valid reason why the node_guid
> would ever change during a device's lifetime.
>
> If we're going to put node_guid in the ib_device_attr, should we
> remove it from struct ib_device_attr?

I call ib_query_device() to set the node_guid. I didn't see any other
way of getting it reading through the mthca code.

- Sean

From rolandd at cisco.com Wed Sep 7 16:40:17 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 07 Sep 2005 16:40:17 -0700
Subject: [openib-general] [PATCH] [verbs] add node_guid to device structure
In-Reply-To: <431F78D9.8080809@ichips.intel.com> (Sean Hefty's message of "Wed, 07 Sep 2005 16:33:45 -0700")
References: <528xy8lalh.fsf@cisco.com> <431F78D9.8080809@ichips.intel.com>
Message-ID: <52zmqojvku.fsf@cisco.com>

    Sean> I call ib_query_device() to set the node_guid. I didn't see
    Sean> any other way of getting it reading through the mthca code.

I think that it should be the responsibility of the device provider to
set the node_guid field before registering the struct ib_device via
ib_register_device().

I can add code to mthca to do this if needed.

 - R.
From mshefty at ichips.intel.com Wed Sep 7 16:52:28 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 07 Sep 2005 16:52:28 -0700
Subject: [openib-general] [PATCH] [verbs] add node_guid to device structure
In-Reply-To: <52zmqojvku.fsf@cisco.com>
References: <528xy8lalh.fsf@cisco.com> <431F78D9.8080809@ichips.intel.com> <52zmqojvku.fsf@cisco.com>
Message-ID: <431F7D3C.4070602@ichips.intel.com>

Roland Dreier wrote:
>     Sean> I call ib_query_device() to set the node_guid. I didn't see
>     Sean> any other way of getting it reading through the mthca code.
>
> I think that it should be the responsibility of the device provider to
> set the node_guid field before registering the struct ib_device via
> ib_register_device().
>
> I can add code to mthca to do this if needed.

If you could add the code to perform this it would save me a lot of time
floundering.

Thanks,
- Sean

From rolandd at cisco.com Wed Sep 7 17:21:54 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 07 Sep 2005 17:21:54 -0700
Subject: [openib-general] [PATCH] [verbs] add node_guid to device structure
In-Reply-To: <431F7D3C.4070602@ichips.intel.com> (Sean Hefty's message of "Wed, 07 Sep 2005 16:52:28 -0700")
References: <528xy8lalh.fsf@cisco.com> <431F78D9.8080809@ichips.intel.com> <52zmqojvku.fsf@cisco.com> <431F7D3C.4070602@ichips.intel.com>
Message-ID: <52vf1cjtnh.fsf@cisco.com>

    Sean> If you could add the code to perform this it would save me a
    Sean> lot of time floundering.

OK, I'll code something up and post a patch. You can merge it at the
same time as the other node_guid changes.

 - R.
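A hypothetical userspace model of what caching node_guid in the device structure buys. Without the cache, every ULP issues its own device query. With it, the value is fetched once at registration and then just read. `fake_device` and the helper functions are illustrative only. Treating 0 as "not yet filled in by the provider" is an assumption of this sketch. It stands in for the suggestion that the provider set the field itself before calling `ib_register_device()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a device; hw_guid is what a query reports. */
struct fake_device {
	uint64_t hw_guid;
	uint64_t node_guid; /* cached copy, immutable after registration */
	unsigned queries;   /* number of (expensive) device queries issued */
};

/* Stand-in for ib_query_device(): assumed costly, so count every call. */
static uint64_t fake_query_node_guid(struct fake_device *dev)
{
	dev->queries++;
	return dev->hw_guid;
}

/* Before the patch: each ULP queries the device itself. */
static uint64_t ulp_get_guid_uncached(struct fake_device *dev)
{
	return fake_query_node_guid(dev);
}

/* After the patch: the field is filled once, at or before registration.
 * Assumption: 0 means "provider did not fill it in", so the core queries. */
static void fake_register_device(struct fake_device *dev)
{
	if (!dev->node_guid)
		dev->node_guid = fake_query_node_guid(dev);
	assert(dev->node_guid == dev->hw_guid);
}

static uint64_t ulp_get_guid_cached(const struct fake_device *dev)
{
	return dev->node_guid; /* plain read, no query */
}

/* Returns how many device queries nclients ULPs cause, with or without
 * the cached field. */
static unsigned count_queries(uint64_t guid, unsigned nclients, int cached)
{
	struct fake_device dev = { .hw_guid = guid, .node_guid = 0, .queries = 0 };

	if (cached)
		fake_register_device(&dev);
	for (unsigned i = 0; i < nclients; i++) {
		uint64_t g = cached ? ulp_get_guid_cached(&dev)
				    : ulp_get_guid_uncached(&dev);
		assert(g == guid);
	}
	return dev.queries;
}
```

With three ULPs, `count_queries()` reports three queries for the uncached path and one for the cached path. The ULP side looks the same whether the core or the provider fills in the field.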
From eitan at mellanox.co.il Wed Sep 7 23:13:45 2005
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Thu, 08 Sep 2005 09:13:45 +0300
Subject: [openib-general] Re: Some OpenSM 1.8.0 Anomalies
In-Reply-To: <1126117797.4401.215.camel@hal.voltaire.com>
References: <1126101147.4396.1405.camel@hal.voltaire.com> <431F1CC2.9020104@mellanox.co.il> <1126117797.4401.215.camel@hal.voltaire.com>
Message-ID: <431FD699.9060404@mellanox.co.il>

Hal Rosenstock wrote:
> On Wed, 2005-09-07 at 13:00, Eitan Zahavi wrote:
>
>>>Also, I see the following messages although these should not be the case:
>>>Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007
>>
>>This means that the LID of the port registered as the source for this inform info is not recognized as a valid LID.
>>
>>>...
>>>Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559
>>
>>The meaning of this is that the incoming trap source is not a recognized (included in the SM database) guid
>
>
> Both this LID and GUID were discovered so I don't understand how this is
> the case. I can send the log separately if interested (it is over >1M
> gzipped).

I guess this trap report and inform info flow are new.
Which ULP is using it? Can you describe the flow?
Then if you can send us the log file it will help. We should be able to at
least track the inform info registration and the trap received from the log
(actually this is mostly what we need).
It is very possible this is a new untested flow. We did not validate all the
possible filtering of Inform Info based reports. So there might be a simple
bug in there (NTOH maybe).

>
> -- Hal

From mst at mellanox.co.il Thu Sep 8 04:41:56 2005
From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Thu, 8 Sep 2005 14:41:56 +0300 Subject: [openib-general] sdp: kill sdp buff pool Message-ID: <20050908114156.GI19358@mellanox.co.il> SDP seems to have a re-implementation of kmem_cache in sdp_buff. Killing it actually results in a small bandwidth increase for both bcopy and zcopy versions, so I plan to put this in. Note that sdp_buff_pool_chain_link is now same as sdp_buff_pool_put, and sdp_buff_pool_chain_put is now a nop, but I decided to keep it in for now, just in case there's another use for it. sdp_buff.c | 388 ++++++------------------------------------------------------ sdp_dev.h | 7 - sdp_inet.c | 17 -- sdp_proc.c | 6 sdp_proc.h | 11 - sdp_proto.h | 10 - 6 files changed, 50 insertions(+), 389 deletions(-) --- Replace sdp buff pool by kmem_cache. Signed-off-by: Michael S. Tsirkin Index: linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_buff.c =================================================================== --- linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_buff.c 2005-09-08 15:06:20.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_buff.c 2005-09-08 16:03:16.000000000 +0300 @@ -35,7 +35,7 @@ #include "sdp_main.h" -static struct sdpc_buff_root *main_pool = NULL; +static struct sdpc_buff_root main_pool; /* * data buffers managment API */ @@ -305,194 +305,78 @@ void sdp_buff_q_clear_unmap(struct sdpc_ /* * sdp_buff_pool_release - release allocated buffers from the main pool */ -static void sdp_buff_pool_release(struct sdpc_buff_root *m_pool, int count) +static void sdp_buff_pool_release(struct sdpc_buff *buff) { - struct sdpc_buff *buff; - - /* - * Release count buffers. - */ - while (count--) { - buff = sdp_buff_q_get(&m_pool->pool); - if (!buff) - break; - /* - * decrement global buffer count, free buffer page, and free - * buffer descriptor. 
- */ - m_pool->buff_cur--; - free_page((unsigned long)buff->head); - kmem_cache_free(m_pool->buff_cache, buff); - } -} - -/* - * sdp_buff_pool_release_check - check for buffer release from main pool - */ -static inline void sdp_buff_pool_release_check(struct sdpc_buff_root *m_pool) -{ - /* - * If there are more then minimum buffers outstanding, free half of - * the available buffers. - */ - if (m_pool->buff_cur > m_pool->buff_min && - m_pool->pool.size > m_pool->free_mark) { - int count; - /* - * Always leave at least minimum buffers, otherwise remove - * either half of the pool, which is more then the mark - */ - count = min(m_pool->buff_cur - m_pool->buff_min, - m_pool->free_mark/2); - - sdp_buff_pool_release(m_pool, count); - } + kmem_cache_free(main_pool.pool_cache, buff->head); + kmem_cache_free(main_pool.buff_cache, buff); } /* * sdp_buff_pool_alloc - allocate more buffers for the main pool */ -static int sdp_buff_pool_alloc(struct sdpc_buff_root *m_pool) +static struct sdpc_buff *sdp_buff_pool_alloc(void) { struct sdpc_buff *buff; - int total; - /* - * Calculate the total number of buffers. - */ - total = max(m_pool->buff_min, m_pool->buff_cur + m_pool->alloc_inc); - total = min(total, m_pool->buff_max); - - while (total > m_pool->buff_cur) { - /* - * allocate a buffer descriptor, buffer, and then add it to - * the pool. - */ - buff = kmem_cache_alloc(m_pool->buff_cache, GFP_ATOMIC); - if (!buff) { - sdp_warn("Failed to allocate buffer. <%d:%d>", - total, m_pool->buff_cur); - break; - } - - buff->head = (void *)__get_free_page(GFP_ATOMIC); - if (!buff->head) { - sdp_warn("Failed to allocate buffer page. 
<%d:%d>", - total, m_pool->buff_cur); - - kmem_cache_free(m_pool->buff_cache, buff); - break; - } - - buff->end = buff->head + PAGE_SIZE; - buff->data = buff->head; - buff->tail = buff->head; - buff->sge.lkey = 0; - buff->sge.addr = 0; - buff->sge.length = 0; - buff->pool = NULL; - buff->type = SDP_DESC_TYPE_BUFF; - buff->release = sdp_buff_pool_put; - - sdp_buff_q_put(&m_pool->pool, buff); - - m_pool->buff_cur++; + buff = kmem_cache_alloc(main_pool.buff_cache, GFP_ATOMIC); + if (!buff) { + sdp_warn("Failed to allocate buffer."); + return NULL; } - if (!main_pool->pool.head) { - sdp_warn("Failed to allocate any buffers. <%d:%d:%d>", - total, m_pool->buff_cur, m_pool->alloc_inc); - - return -ENOMEM; + buff->head = kmem_cache_alloc(main_pool.pool_cache, GFP_ATOMIC); + if (!buff->head) { + sdp_warn("Failed to allocate buffer page"); + kmem_cache_free(main_pool.buff_cache, buff); + return NULL; } - return 0; + buff->end = buff->head + PAGE_SIZE; + buff->data = buff->head; + buff->tail = buff->head; + buff->sge.lkey = 0; + buff->sge.addr = 0; + buff->sge.length = 0; + buff->pool = NULL; + buff->type = SDP_DESC_TYPE_BUFF; + buff->release = sdp_buff_pool_put; + return buff; } /* * sdp_buff_pool_init - Initialize the main buffer pool of memory */ -int sdp_buff_pool_init(int buff_min, int buff_max, int alloc_inc, int free_mark) +int sdp_buff_pool_init(void) { int result; - if (main_pool) { - sdp_warn("Main pool already initialized!"); - return -EEXIST; - } - - if (buff_min <= 0 || - alloc_inc <= 0 || - free_mark <= 0 || - buff_max < buff_min) { - - sdp_warn("Pool allocation count error. 
<%d:%d:%d:%d>", - buff_min, buff_max, alloc_inc, free_mark); - return -ERANGE; - } - /* - * allocate the main pool structures - */ - main_pool = kmalloc(sizeof(struct sdpc_buff_root), GFP_KERNEL); - if (!main_pool) { - sdp_warn("Main pool initialization failed."); - result = -ENOMEM; - goto done; - } - - memset(main_pool, 0, sizeof(struct sdpc_buff_root)); - - main_pool->buff_size = PAGE_SIZE; - main_pool->buff_min = buff_min; - main_pool->buff_max = buff_max; - main_pool->alloc_inc = alloc_inc; - main_pool->free_mark = free_mark; - - spin_lock_init(&main_pool->lock); - sdp_buff_q_init(&main_pool->pool); - - main_pool->pool_cache = kmem_cache_create("sdp_buff_pool", - sizeof(struct sdpc_buff_q), - 0, SLAB_HWCACHE_ALIGN, + main_pool.pool_cache = kmem_cache_create("sdp_buff_pool", + PAGE_SIZE, + 0, 0, NULL, NULL); - if (!main_pool->pool_cache) { + if (!main_pool.pool_cache) { sdp_warn("Failed to allocate pool cache."); result = -ENOMEM; goto error_pool; } - main_pool->buff_cache = kmem_cache_create("sdp_buff_desc", + main_pool.buff_cache = kmem_cache_create("sdp_buff_desc", sizeof(struct sdpc_buff), 0, SLAB_HWCACHE_ALIGN, NULL, NULL); - if (!main_pool->buff_cache) { + if (!main_pool.buff_cache) { sdp_warn("Failed to allocate buffer cache."); result = -ENOMEM; goto error_buff; } - /* - * allocate the minimum number of buffers. - */ - result = sdp_buff_pool_alloc(main_pool); - if (result < 0) { - sdp_warn("Error <%d> allocating buffers. 
<%d>", - result, buff_min); - goto error_alloc; - } - /* - * done - */ sdp_dbg_init("Main pool initialized with min:max <%d:%d> buffers.", buff_min, buff_max); - return 0; /* success */ -error_alloc: - kmem_cache_destroy(main_pool->buff_cache); + return 0; + + kmem_cache_destroy(main_pool.buff_cache); error_buff: - kmem_cache_destroy(main_pool->pool_cache); + kmem_cache_destroy(main_pool.pool_cache); error_pool: - kfree(main_pool); -done: - main_pool = NULL; return result; } @@ -501,33 +385,8 @@ done: */ void sdp_buff_pool_destroy(void) { - if (!main_pool) { - sdp_warn("Main pool dosn't exist."); - return; - } - /* - * Free all the buffers. - */ - sdp_buff_pool_release(main_pool, main_pool->buff_cur); - /* - * Sanity check that the current number of buffers was released. - */ - if (main_pool->buff_cur) - sdp_warn("Leaking buffers during cleanup. <%d>", - main_pool->buff_cur); - /* - * free pool cache - */ - kmem_cache_destroy(main_pool->pool_cache); - kmem_cache_destroy(main_pool->buff_cache); - /* - * free main - */ - kfree(main_pool); - main_pool = NULL; - /* - * done - */ + kmem_cache_destroy(main_pool.pool_cache); + kmem_cache_destroy(main_pool.buff_cache); sdp_dbg_init("Main pool destroyed."); } @@ -537,37 +396,10 @@ void sdp_buff_pool_destroy(void) struct sdpc_buff *sdp_buff_pool_get(void) { struct sdpc_buff *buff; - int result; - unsigned long flags; - - /* - * get buffer - */ - spin_lock_irqsave(&main_pool->lock, flags); - if (!main_pool->pool.head) { - result = sdp_buff_pool_alloc(main_pool); - if (result < 0) { - sdp_warn("Error <%d> allocating buffers.", result); - spin_unlock_irqrestore(&main_pool->lock, flags); - return NULL; - } - } - - buff = main_pool->pool.head; - - if (buff->next == buff) - main_pool->pool.head = NULL; - else { - buff->next->prev = buff->prev; - buff->prev->next = buff->next; - - main_pool->pool.head = buff->next; - } - - main_pool->pool.size--; - - spin_unlock_irqrestore(&main_pool->lock, flags); + buff = sdp_buff_pool_alloc(); + 
if (!buff) + return NULL; buff->next = NULL; buff->prev = NULL; @@ -590,39 +422,7 @@ struct sdpc_buff *sdp_buff_pool_get(void */ void sdp_buff_pool_put(struct sdpc_buff *buff) { - unsigned long flags; - - if (!buff) - return; - - BUG_ON(buff->pool); - BUG_ON(buff->next || buff->prev); - /* - * reset pointers - */ - buff->data = buff->head; - buff->tail = buff->head; - buff->pool = &main_pool->pool; - - spin_lock_irqsave(&main_pool->lock, flags); - - if (!main_pool->pool.head) { - buff->next = buff; - buff->prev = buff; - main_pool->pool.head = buff; - } else { - buff->next = main_pool->pool.head; - buff->prev = main_pool->pool.head->prev; - - buff->next->prev = buff; - buff->prev->next = buff; - } - - main_pool->pool.size++; - - sdp_buff_pool_release_check(main_pool); - - spin_unlock_irqrestore(&main_pool->lock, flags); + sdp_buff_pool_release(buff); } /* @@ -630,20 +430,7 @@ void sdp_buff_pool_put(struct sdpc_buff */ void sdp_buff_pool_chain_link(struct sdpc_buff *head, struct sdpc_buff *buff) { - buff->data = buff->head; - buff->tail = buff->head; - buff->pool = &main_pool->pool; - - if (!head) { - buff->next = buff; - buff->prev = buff; - } else { - buff->next = head; - buff->prev = head->prev; - - buff->next->prev = buff; - buff->prev->next = buff; - } + sdp_buff_pool_release(buff); } /* @@ -651,38 +438,6 @@ void sdp_buff_pool_chain_link(struct sdp */ void sdp_buff_pool_chain_put(struct sdpc_buff *buff, u32 count) { - unsigned long flags; - struct sdpc_buff *next; - struct sdpc_buff *prev; - /* - * return an entire Link of buffers to the queue, this save on - * lock contention for the buffer pool, for code paths where - * a number of buffers are processed in a loop, before being - * returned. (e.g. send completions, recv to userspace. 
- */ - if (!buff || count <= 0) - return; - - spin_lock_irqsave(&main_pool->lock, flags); - - if (!main_pool->pool.head) - main_pool->pool.head = buff; - else { - prev = buff->prev; - next = main_pool->pool.head->next; - - buff->prev = main_pool->pool.head; - main_pool->pool.head->next = buff; - - prev->next = next; - next->prev = prev; - } - - main_pool->pool.size += count; - - sdp_buff_pool_release_check(main_pool); - - spin_unlock_irqrestore(&main_pool->lock, flags); } /* @@ -690,62 +445,5 @@ void sdp_buff_pool_chain_put(struct sdpc */ int sdp_buff_pool_buff_size(void) { - int result; - - if (!main_pool) - result = -1; - else - result = main_pool->buff_size; - - return result; -} - -/* - * sdp_proc_dump_buff_pool - write the buffer pool stats to a file (/proc) - */ -int sdp_proc_dump_buff_pool(char *buffer, int max_size, off_t start_index, - long *end_index) -{ - unsigned long flags; - int offset = 0; - - /* - * simple table read, without page boundry handling. - */ - *end_index = 0; - /* - * lock the table - */ - spin_lock_irqsave(&main_pool->lock, flags); - - if (!start_index) { - offset += sprintf(buffer + offset, - " buffer size: %8d\n", - main_pool->buff_size); - offset += sprintf(buffer + offset, - " buffers maximum: %8d\n", - main_pool->buff_max); - offset += sprintf(buffer + offset, - " buffers minimum: %8d\n", - main_pool->buff_min); - offset += sprintf(buffer + offset, - " buffers increment: %8d\n", - main_pool->alloc_inc); - offset += sprintf(buffer + offset, - " buffers decrement: %8d\n", - main_pool->free_mark); - offset += sprintf(buffer + offset, - " buffers allocated: %8d\n", - main_pool->buff_cur); - offset += sprintf(buffer + offset, - " buffers available: %8d\n", - main_pool->pool.size); - offset += sprintf(buffer + offset, - " buffers outstanding: %8d\n", - main_pool->buff_cur - main_pool->pool.size); - } - - spin_unlock_irqrestore(&main_pool->lock, flags); - - return offset; + return PAGE_SIZE; } Index: 
linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_proc.c =================================================================== --- linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_proc.c 2005-09-08 15:06:20.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_proc.c 2005-09-08 15:57:45.000000000 +0300 @@ -85,12 +85,6 @@ static int sdp_proc_read_parse(char *pag static struct sdpc_proc_ent file_entry_list[SDP_PROC_ENTRIES] = { { .entry = NULL, - .type = SDP_PROC_ENTRY_MAIN_BUFF, - .name = "buffer_pool", - .read = sdp_proc_dump_buff_pool, - }, - { - .entry = NULL, .type = SDP_PROC_ENTRY_MAIN_CONN, .name = "conn_main", .read = sdp_proc_dump_conn_main, Index: linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_proc.h =================================================================== --- linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_proc.h 2005-09-08 15:06:20.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_proc.h 2005-09-08 15:57:45.000000000 +0300 @@ -48,12 +48,11 @@ * proc filesystem framework table/file entries */ enum sdp_proc_ent_list { - SDP_PROC_ENTRY_MAIN_BUFF = 0, /* buffer pool */ - SDP_PROC_ENTRY_MAIN_CONN = 1, /* connection table */ - SDP_PROC_ENTRY_DATA_CONN = 2, /* connection table */ - SDP_PROC_ENTRY_RDMA_CONN = 3, /* connection table */ - SDP_PROC_ENTRY_OPT_CONN = 4, /* socket option table */ - SDP_PROC_ENTRY_ROOT_TABLE = 5, /* device table */ + SDP_PROC_ENTRY_MAIN_CONN = 0, /* connection table */ + SDP_PROC_ENTRY_DATA_CONN = 1, /* connection table */ + SDP_PROC_ENTRY_RDMA_CONN = 2, /* connection table */ + SDP_PROC_ENTRY_OPT_CONN = 3, /* socket option table */ + SDP_PROC_ENTRY_ROOT_TABLE = 4, /* device table */ SDP_PROC_ENTRIES /* number of entries in framework */ }; Index: linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_proto.h =================================================================== --- linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_proto.h 2005-09-08 15:57:21.000000000 +0300 +++ 
linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_proto.h 2005-09-08 16:02:50.000000000 +0300 @@ -93,18 +93,10 @@ struct sdpc_buff *sdp_buff_q_fetch(struc void *arg), void *usr_arg); -int sdp_buff_pool_init(int buff_min, - int buff_max, - int alloc_inc, - int free_mark); +int sdp_buff_pool_init(void); void sdp_buff_pool_destroy(void); -int sdp_proc_dump_buff_pool(char *buffer, - int max_size, - off_t start_index, - long *end_index); - /* * Wall between userspace protocol and SDP protocol proper */ Index: linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_dev.h =================================================================== --- linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_dev.h 2005-09-08 16:01:48.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_dev.h 2005-09-08 16:02:06.000000000 +0300 @@ -114,13 +114,6 @@ #define SDP_SEND_POST_SLOW 0x01 #define SDP_SEND_POST_COUNT 0x0A /* - * Buffer pool initialization defaul values. - */ -#define SDP_BUFF_POOL_COUNT_MIN 1024 -#define SDP_BUFF_POOL_COUNT_MAX 1048576 -#define SDP_BUFF_POOL_COUNT_INC 128 -#define SDP_BUFF_POOL_FREE_MARK 1024 -/* * SDP experimental parameters. 
*/ Index: linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_inet.c =================================================================== --- linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_inet.c 2005-09-08 15:59:32.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_inet.c 2005-09-08 16:00:28.000000000 +0300 @@ -42,10 +42,6 @@ * list of connections waiting for an incoming connection */ static int proto_family = AF_INET_SDP; -static int buff_min = SDP_BUFF_POOL_COUNT_MIN; -static int buff_max = SDP_BUFF_POOL_COUNT_MAX; -static int alloc_inc = SDP_BUFF_POOL_COUNT_INC; -static int free_mark = SDP_BUFF_POOL_FREE_MARK; static int conn_size = SDP_DEV_SK_LIST_SIZE; static int recv_post_max = SDP_CQ_RECV_SIZE; @@ -64,14 +60,6 @@ module_param(proto_family, int, 0); MODULE_PARM_DESC(proto_family, "Override the default protocol family value of 27."); -module_param(buff_min, int, 0); -MODULE_PARM_DESC(buff_min, - "Set the minimum number of buffers to allocate."); - -module_param(buff_max, int, 0); -MODULE_PARM_DESC(buff_max, - "Set the maximum number of buffers to allocate."); - module_param(conn_size, int, 0); MODULE_PARM_DESC(conn_size, "Set the maximum number of active sockets."); @@ -1366,10 +1354,7 @@ static int __init sdp_init(void) /* * buffer memory */ - result = sdp_buff_pool_init(buff_min, - buff_max, - alloc_inc, - free_mark); + result = sdp_buff_pool_init(); if (result < 0) { sdp_warn("Error <%d> initializing buffer pool.", result); goto error_buff; From halr at voltaire.com Thu Sep 8 05:21:00 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 08 Sep 2005 08:21:00 -0400 Subject: [openib-general] Re: Some OpenSM 1.8.0 Anomalies In-Reply-To: <431FD699.9060404@mellanox.co.il> References: <1126101147.4396.1405.camel@hal.voltaire.com> <431F1CC2.9020104@mellanox.co.il> <1126117797.4401.215.camel@hal.voltaire.com> <431FD699.9060404@mellanox.co.il> Message-ID: <1126182059.4401.2269.camel@hal.voltaire.com> On Thu, 2005-09-08 at 02:13, Eitan Zahavi wrote: > Hal 
Rosenstock wrote: > > On Wed, 2005-09-07 at 13:00, Eitan Zahavi wrote: > > > >>>Also, I see the following messages although these should not be the case: > >>>Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007 > >> > >>This means that the LID of the port registered as the source for this inform info is not recognized as a valid LID. > >> > >>>... > >>>Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559 > >> > >>The meaning of this is that the incoming trap source is not a recognized (included in the SM database) guid > > > > > > Both this LID and GUID were discovered so I don't understand how this is > > the case. I can send the log separately if interested (it is over >1M > > gzipped). > I guess this trap report and inform info flow are new. > Which ULP is using it? Solaris 10. > Can you describe the flow? It looks like it occurs on SM port down which seems OK. Here's an extract of that portion of the log: Sep 06 15:41:48 724961 [B76A4C40] -> __osm_state_mgr_is_sm_port_down: ] Sep 06 15:41:48 724980 [0000] -> SM port is down. Sep 06 15:41:48 724980 [B76A4C40] -> SM port is down.Sep 06 15:41:48 725261 [B76A4C40] -> __osm_state_mgr_sm_port_down_msg: ****************************************************************** ************************** SM PORT DOWN ************************** ****************************************************************** Sep 06 15:41:48 725283 [B76A4C40] -> osm_drop_mgr_process: [ Sep 06 15:41:48 725303 [B76A4C40] -> osm_drop_mgr_process: Checking node 0x0008f1040396040c. Sep 06 15:41:48 725324 [B76A4C40] -> __osm_drop_mgr_process_node: [ Sep 06 15:41:48 725342 [B76A4C40] -> __osm_drop_mgr_process_node: Unreachable node 0x0008f1040396040c. 
Sep 06 15:41:48 725364 [B76A4C40] -> __osm_drop_mgr_remove_port: [ Sep 06 15:41:48 725383 [B76A4C40] -> __osm_drop_mgr_remove_port: Unreachable port 0x0008f1040396040e. Sep 06 15:41:48 725417 [B76A4C40] -> __osm_drop_mgr_remove_port: Clearing abandoned LID range [0x7,0x7]. Sep 06 15:41:48 725480 [B76A4C40] -> __osm_drop_mgr_remove_port: Unlinking local node 0x0008f1040396040c, port 0x2 and remote node 0x0008f10403960558, port 0x1. Sep 06 15:41:48 725504 [B76A4C40] -> __osm_drop_mgr_remove_port: resetting discovery count of node: 0x0008f10403960558 port num:1. Sep 06 15:41:48 725525 [B76A4C40] -> __osm_drop_mgr_remove_port: Clearing physical port number 2. Sep 06 15:41:48 725563 [B76A4C40] -> osm_report_notice: [ Sep 06 15:41:48 725583 [B76A4C40] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0003 GID:0xfe80000000000000,0x0008f10403960559 Sep 06 15:41:48 725612 [B76A4C40] -> __match_notice_to_inf_rec: [ Sep 06 15:41:48 725632 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000003 Trap=0x000004 Sep 06 15:41:48 725653 [B76A4C40] -> __match_notice_to_inf_rec: ] Sep 06 15:41:48 725671 [B76A4C40] -> __match_notice_to_inf_rec: [ Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007 Sep 06 15:41:48 725710 [B76A4C40] -> __match_notice_to_inf_rec: ] Sep 06 15:41:48 725728 [B76A4C40] -> __match_notice_to_inf_rec: [ Sep 06 15:41:48 725747 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000001 Trap=0x000004 Sep 06 15:41:48 725767 [B76A4C40] -> __match_notice_to_inf_rec: ] Sep 06 15:41:48 725785 [B76A4C40] -> __match_notice_to_inf_rec: [ Sep 06 15:41:48 725804 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000002 Trap=0x000004 Sep 06 15:41:48 725823 [B76A4C40] -> __match_notice_to_inf_rec: ] Sep 06 15:41:48 725843 [B76A4C40] -> osm_report_notice: ] Sep 06 15:41:48 725862 [B76A4C40] -> Removed port with GUID:0x0008f1040396040e 
LID range [0x7,0x7] of node:Voltaire HCA400 Sep 06 15:41:48 725883 [B76A4C40] -> __osm_drop_mgr_remove_port: ] Sep 06 15:41:48 725904 [B76A4C40] -> __osm_drop_mgr_process_node: ] Sep 06 15:41:48 725923 [B76A4C40] -> osm_drop_mgr_process: Checking node 0x0008f10403960558. Sep 06 15:41:48 725943 [B76A4C40] -> osm_drop_mgr_process: Checking full discovery of node 0x0008f10403960558. Sep 06 15:41:48 725964 [B76A4C40] -> osm_drop_mgr_process: Checking port 0x0008f10403960559. Sep 06 15:41:48 725984 [B76A4C40] -> __osm_drop_mgr_remove_port: [ Sep 06 15:41:48 726002 [B76A4C40] -> __osm_drop_mgr_remove_port: Unreachable port 0x0008f10403960559. Sep 06 15:41:48 726023 [B76A4C40] -> __osm_drop_mgr_remove_port: Clearing abandoned LID range [0x3,0x3]. Sep 06 15:41:48 726043 [B76A4C40] -> __osm_drop_mgr_remove_port: Clearing physical port number 1. Sep 06 15:41:48 726067 [B76A4C40] -> osm_report_notice: [ Sep 06 15:41:48 726086 [B76A4C40] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0003 GID:0xfe80000000000000,0x0008f10403960559 Sep 06 15:41:48 726110 [B76A4C40] -> __match_notice_to_inf_rec: [ Sep 06 15:41:48 726129 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000003 Trap=0x000004 Sep 06 15:41:48 726149 [B76A4C40] -> __match_notice_to_inf_rec: ] Sep 06 15:41:48 726167 [B76A4C40] -> __match_notice_to_inf_rec: [ Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559 Sep 06 15:41:48 726206 [B76A4C40] -> __match_notice_to_inf_rec: ] Sep 06 15:41:48 726225 [B76A4C40] -> __match_notice_to_inf_rec: [ Sep 06 15:41:48 726243 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000001 Trap=0x000004 Sep 06 15:41:48 726263 [B76A4C40] -> __match_notice_to_inf_rec: ] Sep 06 15:41:48 726281 [B76A4C40] -> __match_notice_to_inf_rec: [ Sep 06 15:41:48 726300 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000002 Trap=0x000004 Sep 06 
15:41:48 726319 [B76A4C40] -> __match_notice_to_inf_rec: ] Sep 06 15:41:48 726339 [B76A4C40] -> osm_report_notice: ] Sep 06 15:41:48 726357 [B76A4C40] -> Removed port with GUID:0x0008f10403960559 LID range [0x3,0x3] of node:MT23108 InfiniHost Mellanox Technologies Sep 06 15:41:48 726378 [B76A4C40] -> __osm_drop_mgr_remove_port: ] Sep 06 15:41:48 726426 [B76A4C40] -> osm_drop_mgr_process: ] > Then if you can send us the log file it will help. I'll send you the whole log offline if you still want it. -- Hal > We should be able to at least track the inform info registration and > the trap received from the log (actually this is most what we need). > It is very possible this is a new untested flow. > We did not validate all the possible filtering of > Inform Info based reports. So there might be a simple bug in there > (NTOH maybe). > > > > > -- Hal > From eitan at mellanox.co.il Thu Sep 8 06:02:58 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 08 Sep 2005 16:02:58 +0300 Subject: [openib-general] Re: Some OpenSM 1.8.0 Anomalies In-Reply-To: <1126182059.4401.2269.camel@hal.voltaire.com> References: <1126101147.4396.1405.camel@hal.voltaire.com> <431F1CC2.9020104@mellanox.co.il> <1126117797.4401.215.camel@hal.voltaire.com> <431FD699.9060404@mellanox.co.il> <1126182059.4401.2269.camel@hal.voltaire.com> Message-ID: <43203682.50600@mellanox.co.il> Hal Rosenstock wrote: >>>>>Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007 >>>> >>>>This means that the LID of the port registered as the source for this inform info is not recognized as a valid LID. >>>> >>>> >>>>>... >>>>>Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559 >>>> >>>>The meaning of this is that the incoming trap source is not a recognized (included in the SM database) guid >>> >>> > It looks like it occurs on SM port down which seems OK. 
OK, that explains it:

The errors occur when the SM port has gone down. In that case all the ports that were previously found on the fabric are now inaccessible, and the SM should send a Report (Notice with trap #65) for each of these ports. To do so, it scans through the InformInfo database. Apparently an InformInfo with LID=7 has requested this report. But LID 7 no longer exists, so the first message is valid:

> Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007

(Actually, this should have caused the InformInfo record to be deleted, which I do not think is happening.)

Later we see the following error:

> Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559

This is emitted while node 0x0008f10403960559 is being torn down from the SMDB. The code in osm_inform.c says:

  /* Check if there is a pkey match. o13-17.1.1*/
  /* Check if the issuer of the trap is the SM. If it is, then the pkey
     comparison should be done on the trap source (saved as the gid in the
     data details field). If the issuer gid is not the SM - then it is the
     guid of the trap source. */
  if ( (cl_ntoh64(p_ntc->issuer_gid.unicast.prefix) == p_subn->opt.subnet_prefix) &&
       (cl_ntoh64(p_ntc->issuer_gid.unicast.interface_id) == p_subn->sm_port_guid) )
  {
    /* The issuer is the SM this is trap 64-67 - compare the pkey with the
       gid saved on the data details. */
    source_gid = p_ntc->data_details.ntc_64_67.gid;
  }
  else
  {
    source_gid = p_ntc->issuer_gid;
  }

In our case the trap is 65 and it is sent by the SM. However, the spec requires checking that the torn-down port and the target of the Report share a PKey. In our case the source of the event is considered to be the port being torn down (we want to prevent any case where ports that do not share a PKey get reports about each other). But since the "source" port is being torn down, we cannot find its PKey table ...
(actually we look first in the Port by LID table - and can not find it). This means we will never send Report(Notice trap#65) to any node. How do we solve that bug? Maybe we have a way to find the "source" port PKey that is not yet corrupted. > Here's an > extract of that portion of the log: > > Sep 06 15:41:48 724961 [B76A4C40] -> __osm_state_mgr_is_sm_port_down: ] > Sep 06 15:41:48 724980 [0000] -> SM port is down. > Sep 06 15:41:48 724980 [B76A4C40] -> SM port is down.Sep 06 15:41:48 725261 [B76A4C40] -> __osm_state_mgr_sm_port_down_msg: > > > ****************************************************************** > ************************** SM PORT DOWN ************************** > ****************************************************************** > > > Sep 06 15:41:48 725283 [B76A4C40] -> osm_drop_mgr_process: [ > Sep 06 15:41:48 725303 [B76A4C40] -> osm_drop_mgr_process: Checking node 0x0008f1040396040c. > Sep 06 15:41:48 725324 [B76A4C40] -> __osm_drop_mgr_process_node: [ > Sep 06 15:41:48 725342 [B76A4C40] -> __osm_drop_mgr_process_node: Unreachable node 0x0008f1040396040c. > Sep 06 15:41:48 725364 [B76A4C40] -> __osm_drop_mgr_remove_port: [ > Sep 06 15:41:48 725383 [B76A4C40] -> __osm_drop_mgr_remove_port: Unreachable port 0x0008f1040396040e. > Sep 06 15:41:48 725417 [B76A4C40] -> __osm_drop_mgr_remove_port: Clearing abandoned LID range [0x7,0x7]. > Sep 06 15:41:48 725480 [B76A4C40] -> __osm_drop_mgr_remove_port: Unlinking local node 0x0008f1040396040c, port 0x2 > and remote node 0x0008f10403960558, port 0x1. > Sep 06 15:41:48 725504 [B76A4C40] -> __osm_drop_mgr_remove_port: resetting discovery count of node: 0x0008f10403960558 port num:1. > Sep 06 15:41:48 725525 [B76A4C40] -> __osm_drop_mgr_remove_port: Clearing physical port number 2. 
> Sep 06 15:41:48 725563 [B76A4C40] -> osm_report_notice: [ > Sep 06 15:41:48 725583 [B76A4C40] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0003 GID:0xfe80000000000000,0x0008f10403960559 > Sep 06 15:41:48 725612 [B76A4C40] -> __match_notice_to_inf_rec: [ > Sep 06 15:41:48 725632 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000003 Trap=0x000004 > Sep 06 15:41:48 725653 [B76A4C40] -> __match_notice_to_inf_rec: ] > Sep 06 15:41:48 725671 [B76A4C40] -> __match_notice_to_inf_rec: [ > Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007 > Sep 06 15:41:48 725710 [B76A4C40] -> __match_notice_to_inf_rec: ] > Sep 06 15:41:48 725728 [B76A4C40] -> __match_notice_to_inf_rec: [ > Sep 06 15:41:48 725747 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000001 Trap=0x000004 > Sep 06 15:41:48 725767 [B76A4C40] -> __match_notice_to_inf_rec: ] > Sep 06 15:41:48 725785 [B76A4C40] -> __match_notice_to_inf_rec: [ > Sep 06 15:41:48 725804 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000002 Trap=0x000004 > Sep 06 15:41:48 725823 [B76A4C40] -> __match_notice_to_inf_rec: ] > Sep 06 15:41:48 725843 [B76A4C40] -> osm_report_notice: ] > Sep 06 15:41:48 725862 [B76A4C40] -> Removed port with GUID:0x0008f1040396040e LID range [0x7,0x7] of node:Voltaire HCA400 > Sep 06 15:41:48 725883 [B76A4C40] -> __osm_drop_mgr_remove_port: ] > Sep 06 15:41:48 725904 [B76A4C40] -> __osm_drop_mgr_process_node: ] > Sep 06 15:41:48 725923 [B76A4C40] -> osm_drop_mgr_process: Checking node 0x0008f10403960558. > Sep 06 15:41:48 725943 [B76A4C40] -> osm_drop_mgr_process: Checking full discovery of node 0x0008f10403960558. > Sep 06 15:41:48 725964 [B76A4C40] -> osm_drop_mgr_process: Checking port 0x0008f10403960559. 
> Sep 06 15:41:48 725984 [B76A4C40] -> __osm_drop_mgr_remove_port: [ > Sep 06 15:41:48 726002 [B76A4C40] -> __osm_drop_mgr_remove_port: Unreachable port 0x0008f10403960559. > Sep 06 15:41:48 726023 [B76A4C40] -> __osm_drop_mgr_remove_port: Clearing abandoned LID range [0x3,0x3]. > Sep 06 15:41:48 726043 [B76A4C40] -> __osm_drop_mgr_remove_port: Clearing physical port number 1. > Sep 06 15:41:48 726067 [B76A4C40] -> osm_report_notice: [ > Sep 06 15:41:48 726086 [B76A4C40] -> osm_report_notice: Reporting Generic Notice type:3 num:65 from LID:0x0003 GID:0xfe80000000000000,0x0008f10403960559 > Sep 06 15:41:48 726110 [B76A4C40] -> __match_notice_to_inf_rec: [ > Sep 06 15:41:48 726129 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000003 Trap=0x000004 > Sep 06 15:41:48 726149 [B76A4C40] -> __match_notice_to_inf_rec: ] > Sep 06 15:41:48 726167 [B76A4C40] -> __match_notice_to_inf_rec: [ > Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559 > Sep 06 15:41:48 726206 [B76A4C40] -> __match_notice_to_inf_rec: ] > Sep 06 15:41:48 726225 [B76A4C40] -> __match_notice_to_inf_rec: [ > Sep 06 15:41:48 726243 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000001 Trap=0x000004 > Sep 06 15:41:48 726263 [B76A4C40] -> __match_notice_to_inf_rec: ] > Sep 06 15:41:48 726281 [B76A4C40] -> __match_notice_to_inf_rec: [ > Sep 06 15:41:48 726300 [B76A4C40] -> __match_notice_to_inf_rec: Mismatch by Node Type: II=0x000002 Trap=0x000004 > Sep 06 15:41:48 726319 [B76A4C40] -> __match_notice_to_inf_rec: ] > Sep 06 15:41:48 726339 [B76A4C40] -> osm_report_notice: ] > Sep 06 15:41:48 726357 [B76A4C40] -> Removed port with GUID:0x0008f10403960559 LID range [0x3,0x3] of node:MT23108 InfiniHost Mellanox Technologies > Sep 06 15:41:48 726378 [B76A4C40] -> __osm_drop_mgr_remove_port: ] > Sep 06 15:41:48 726426 [B76A4C40] -> osm_drop_mgr_process: ] > > >>Then if you can send us the log file 
it will help.
>
>
> I'll send you the whole log offline if you still want it.

No, no need to.

From thomas.duffy.99 at alumni.brown.edu Thu Sep 8 08:20:03 2005
From: thomas.duffy.99 at alumni.brown.edu (Tom Duffy)
Date: Thu, 8 Sep 2005 08:20:03 -0700
Subject: [openib-general] Re: sdp: kill sdp buff pool
In-Reply-To: <20050908114156.GI19358@mellanox.co.il>
References: <20050908114156.GI19358@mellanox.co.il>
Message-ID:

On Sep 8, 2005, at 4:41 AM, Michael S. Tsirkin wrote:

> SDP seems to have a re-implementation of kmem_cache in sdp_buff.
> Killing it actually results in a small bandwidth increase
> for both bcopy and zcopy versions, so I plan to put this in.
>

Can you please post some numbers?

> Note that sdp_buff_pool_chain_link is now same as sdp_buff_pool_put,
> and sdp_buff_pool_chain_put is now a nop, but I decided to
> keep it in for now, just in case there's another use for it.
>

like what?

> sdp_buff.c | 388 ++++++------------------------------------------------------
> sdp_dev.h | 7 -
> sdp_inet.c | 17 --
> sdp_proc.c | 6
> sdp_proc.h | 11 -
> sdp_proto.h | 10 -
> 6 files changed, 50 insertions(+), 389 deletions(-)
>

I love patches like this. Much better than what I had done ;)

-tduffy

From mst at mellanox.co.il Thu Sep 8 09:18:49 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 8 Sep 2005 19:18:49 +0300
Subject: [openib-general] Re: sdp: kill sdp buff pool
In-Reply-To:
References: <20050908114156.GI19358@mellanox.co.il>
Message-ID: <20050908161849.GA21522@mellanox.co.il>

Quoting r. Tom Duffy :
> Subject: Re: sdp: kill sdp buff pool
>
>
> On Sep 8, 2005, at 4:41 AM, Michael S. Tsirkin wrote:
> >
> >SDP seems to have a re-implementation of kmem_cache in sdp_buff.
> >Killing it actually results in a small bandwidth increase
> >for both bcopy and zcopy versions, so I plan to put this in.
> >
>
> Can you please post some numbers?

Up to 720 from around 700 MB/sec with 128KB buffers.
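For readers following the chain_put question: the `sdp_buff_pool_chain_put()` code removed by this patch spliced a whole circular, doubly linked chain of buffers into the pool's circular list in constant time (new chain inserted right after the pool head). The following is a minimal userspace sketch of that splice only, assuming a stripped-down, hypothetical `struct buff` in place of `struct sdpc_buff` (only the `next`/`prev` links matter) and hypothetical helpers `make_ring()` and `walk()`; it is not the SDP code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sdpc_buff: only the links matter here. */
struct buff {
	int id;
	struct buff *next;
	struct buff *prev;
};

/* Hypothetical pool: a circular doubly linked list reached through head. */
struct pool {
	struct buff *head;
	int size;
};

/* Link n buffers from an array into a circular chain; returns its first element. */
static struct buff *make_ring(struct buff *b, int n)
{
	assert(n > 0);
	for (int i = 0; i < n; i++) {
		b[i].next = &b[(i + 1) % n];
		b[i].prev = &b[(i + n - 1) % n];
	}
	return &b[0];
}

/*
 * Splice the circular chain starting at 'chain' (count buffers) into the
 * pool directly after the head, as the removed sdp_buff_pool_chain_put() did.
 */
static void chain_put(struct pool *p, struct buff *chain, int count)
{
	struct buff *prev, *next;

	if (!chain || count <= 0)
		return;

	if (!p->head) {
		p->head = chain;           /* empty pool: the chain becomes the pool */
	} else {
		prev = chain->prev;        /* last buffer of the incoming chain */
		next = p->head->next;      /* buffer that used to follow the head */

		chain->prev = p->head;     /* head <-> first buffer of the chain */
		p->head->next = chain;

		prev->next = next;         /* last buffer of the chain <-> old successor */
		next->prev = prev;
	}
	p->size += count;
}

/* Copy ids into out in forward order starting at the head; returns the count. */
static int walk(const struct pool *p, int *out, int max)
{
	const struct buff *b = p->head;
	int n = 0;

	if (!b || max <= 0)
		return 0;
	do {
		out[n++] = b->id;
		b = b->next;
	} while (b != p->head && n < max);
	return n;
}
```

Starting from a pool 0 → 1 → 2 and putting the chain 10 → 11 yields 0 → 10 → 11 → 1 → 2 with size 5. Presumably, once buffers are returned to a slab cache one at a time (as `sdp_buff_pool_chain_link` now does), there is no chain left to splice, which would be why `sdp_buff_pool_chain_put` becomes a no-op.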
>
> >Note that sdp_buff_pool_chain_link is now same as sdp_buff_pool_put,
> >and sdp_buff_pool_chain_put is now a nop, but I decided to
> >keep it in for now, just in case there's another use for it.
> >
>
> like what?

Another re-implementation of kmem_cache, I guess :)
Do you prefer replacing them all with sdp_buff_pool_put?

> > sdp_buff.c | 388 ++++++------------------------------------------------------
> > sdp_dev.h | 7 -
> > sdp_inet.c | 17 --
> > sdp_proc.c | 6
> > sdp_proc.h | 11 -
> > sdp_proto.h | 10 -
> > 6 files changed, 50 insertions(+), 389 deletions(-)
> >
>
> I love patches like this. Much better than what I had done ;)
>
> -tduffy
>

Yep, simply replacing the sdp lists with list_head (the mega patch that I posted previously) hurts performance by some 10%, and I haven't figured out why yet.

-- 
MST

From rolandd at cisco.com Thu Sep 8 12:38:08 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Thu, 08 Sep 2005 12:38:08 -0700
Subject: [openib-general] [PATCH] CM ACK timeout fix
Message-ID: <5264tbjqov.fsf@cisco.com>

Sean, is the patch below for the CM correct? Unless I'm mistaken, the current code always forces the ACK timeout to be 31 (~2 hours).

 - R.

--- infiniband/core/cm.c (revision 3327) +++ infiniband/core/cm.c (working copy) @@ -814,7 +814,7 @@ static void cm_format_req(struct cm_req_ cm_req_set_primary_sl(req_msg, param->primary_path->sl); cm_req_set_primary_subnet_local(req_msg, 1); /* local only... */ cm_req_set_primary_local_ack_timeout(req_msg, - min(31, param->primary_path->packet_life_time + 1)); + max(31, param->primary_path->packet_life_time + 1)); if (param->alternate_path) { req_msg->alt_local_lid = param->alternate_path->slid; @@ -829,7 +829,7 @@ static void cm_format_req(struct cm_req_ cm_req_set_alt_sl(req_msg, param->alternate_path->sl); cm_req_set_alt_subnet_local(req_msg, 1); /* local only...
*/ cm_req_set_alt_local_ack_timeout(req_msg, - min(31, param->alternate_path->packet_life_time + 1)); + max(31, param->alternate_path->packet_life_time + 1)); } if (param->private_data && param->private_data_len) @@ -2257,7 +2257,7 @@ static void cm_format_lap(struct cm_lap_ cm_lap_set_sl(lap_msg, alternate_path->sl); cm_lap_set_subnet_local(lap_msg, 1); /* local only... */ cm_lap_set_local_ack_timeout(lap_msg, - min(31, alternate_path->packet_life_time + 1)); + max(31, alternate_path->packet_life_time + 1)); if (private_data && private_data_len) memcpy(lap_msg->private_data, private_data, private_data_len); From mshefty at ichips.intel.com Thu Sep 8 12:58:03 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 08 Sep 2005 12:58:03 -0700 Subject: [openib-general] [PATCH] CM ACK timeout fix In-Reply-To: <5264tbjqov.fsf@cisco.com> References: <5264tbjqov.fsf@cisco.com> Message-ID: <432097CB.30009@ichips.intel.com> Roland Dreier wrote: > --- infiniband/core/cm.c (revision 3327) > +++ infiniband/core/cm.c (working copy) > @@ -814,7 +814,7 @@ static void cm_format_req(struct cm_req_ > cm_req_set_primary_sl(req_msg, param->primary_path->sl); > cm_req_set_primary_subnet_local(req_msg, 1); /* local only... */ > cm_req_set_primary_local_ack_timeout(req_msg, > - min(31, param->primary_path->packet_life_time + 1)); > + max(31, param->primary_path->packet_life_time + 1)); Wouldn't using max increase the timeout (to 31)? - Sean From rolandd at cisco.com Thu Sep 8 12:59:49 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 08 Sep 2005 12:59:49 -0700 Subject: [openib-general] [PATCH] CM ACK timeout fix In-Reply-To: <5264tbjqov.fsf@cisco.com> (Roland Dreier's message of "Thu, 08 Sep 2005 12:38:08 -0700") References: <5264tbjqov.fsf@cisco.com> Message-ID: <521x3zjpoq.fsf@cisco.com> Never mind ... 
I am completely mistaken :) From sean.hefty at intel.com Thu Sep 8 15:56:12 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 8 Sep 2005 15:56:12 -0700 Subject: [openib-general] [PATCH] [DAPL] update to match new event processing APIs Message-ID: The following patch updates DAPL to match the verbs and CM event processing APIs. Signed-off-by: Sean Hefty Index: dapl/openib/dapl_ib_util.c =================================================================== --- dapl/openib/dapl_ib_util.c (revision 3342) +++ dapl/openib/dapl_ib_util.c (working copy) @@ -626,7 +626,7 @@ void dapli_async_event_cb(struct _ib_hca break; } } - ibv_put_async_event(&event); + ibv_ack_async_event(&event); } } Index: dapl/openib/dapl_ib_cm.c =================================================================== --- dapl/openib/dapl_ib_cm.c (revision 3342) +++ dapl/openib/dapl_ib_cm.c (working copy) @@ -1199,7 +1199,7 @@ void dapli_cm_event_cb() dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " dapli_cm_event()\n"); /* process one CM event, fairness */ - if(!ib_cm_event_get_timed(0,&event)) { + if(!ib_cm_get_event_timed(0,&event)) { struct dapl_cm_id *conn; int ret; dapl_dbg_log(DAPL_DBG_TYPE_CM, @@ -1215,7 +1215,7 @@ void dapli_cm_event_cb() else ret = dapli_cm_active_cb(conn,event); - ib_cm_event_put(event); + ib_cm_ack_event(event); if (ret) ib_cm_destroy_id(conn->cm_id); Index: dapl/openib/dapl_ib_cq.c =================================================================== --- dapl/openib/dapl_ib_cq.c (revision 3342) +++ dapl/openib/dapl_ib_cq.c (working copy) @@ -71,10 +71,6 @@ void dapli_cq_event_cb(struct _ib_hca_tr (!ibv_get_cq_event(hca->ib_ctx, i, &ibv_cq, (void*)&evd_ptr))) { - /* - * TODO: ibv put event to protect against - * destroy CQ race conditions? 
- */ if (DAPL_BAD_HANDLE(evd_ptr, DAPL_MAGIC_EVD)) continue; @@ -82,6 +78,8 @@ void dapli_cq_event_cb(struct _ib_hca_tr dapl_evd_dto_callback ( hca->ib_ctx, evd_ptr->ib_cq_handle, (void*)evd_ptr ); + + ibv_ack_cq_events(ibv_cq, 1); } } } From thomas.duffy.99 at alumni.brown.edu Thu Sep 8 17:54:20 2005 From: thomas.duffy.99 at alumni.brown.edu (Tom Duffy) Date: Thu, 8 Sep 2005 17:54:20 -0700 Subject: [openib-general] Re: sdp: kill sdp buff pool In-Reply-To: <20050908161849.GA21522@mellanox.co.il> References: <20050908114156.GI19358@mellanox.co.il> <20050908161849.GA21522@mellanox.co.il> Message-ID: On Sep 8, 2005, at 9:18 AM, Michael S. Tsirkin wrote: > Up to 720 from around 700 MB/sec with 128KB buffers. Nice, not shabby at all. > Another re-implementation of kmem_cache, I guess :) > Do you prefer replacing them all with sdp_buff_pool_put? Yeah, I think so. Simpler... > > Yep, simple replacing of sdp lists with list_head (mega patch that > I posted > previously) hurts performance by some 10%, and I didnt yet figure why. Which patch was that? Did that include the sdp_buff.[ch] changes I posted? I don't remember that going across the list and a quick search didn't illuminate anything. -tduffy From kingman at storagegear.com Thu Sep 8 20:43:50 2005 From: kingman at storagegear.com (John Kingman) Date: Thu, 8 Sep 2005 22:43:50 -0500 (CDT) Subject: [openib-general] Re: [PATCH] [CM] 1/2 Fix CM redirection In-Reply-To: References: Message-ID: Here's an updated change. Took the suggestions and it should be cleaner. Not the be-all, end-all answer for redirection, of course, but it works for my situation. Any chance of inclusion as an interim fix? 
Signed-off-by: John Kingman storagegear.com> Index: cm.c =================================================================== --- cm.c (revision 3344) +++ cm.c (working copy) @@ -173,7 +173,8 @@ static int cm_alloc_msg(struct cm_id_pri if (IS_ERR(ah)) return PTR_ERR(ah); - m = ib_create_send_mad(mad_agent, 1, cm_id_priv->av.pkey_index, + m = ib_create_send_mad(mad_agent, cm_id_priv->id.qpn, + cm_id_priv->av.pkey_index, ah, 0, sizeof(struct ib_mad_hdr), sizeof(struct ib_mad)-sizeof(struct ib_mad_hdr), GFP_ATOMIC); @@ -536,6 +537,7 @@ struct ib_cm_id *ib_create_cm_id(ib_cm_h cm_id_priv->id.state = IB_CM_IDLE; cm_id_priv->id.cm_handler = cm_handler; cm_id_priv->id.context = context; + cm_id_priv->id.qpn = 1; ret = cm_alloc_id(cm_id_priv); if (ret) goto error; Index: ib_cm.h =================================================================== --- ib_cm.h (revision 3344) +++ ib_cm.h (working copy) @@ -290,6 +290,7 @@ struct ib_cm_id { enum ib_cm_lap_state lap_state; /* internal CM/debug use */ __be32 local_id; __be32 remote_id; + u32 qpn; /* will be 1 unless redirected */ }; /** From kingman at storagegear.com Thu Sep 8 20:44:01 2005 From: kingman at storagegear.com (John Kingman) Date: Thu, 8 Sep 2005 22:44:01 -0500 (CDT) Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP In-Reply-To: References: Message-ID: SRP part of updated redirection change. Any chance of inclusion as an interim fix? 
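The redirect branch of the SRP patch that follows reads the RedirectGID, RedirectLID, RedirectP_Key and RedirectQP fields out of the ClassPortInfo carried in the REJ's Additional Reject Info. As a minimal userspace sketch of that decision, assuming the 72-byte layout of the `class_port_info` struct in the patch (RedirectGID at byte 8, RedirectLID at 28, RedirectP_Key at 30, RedirectQP at 32), the code below parses the fields byte by byte in network order; `struct redirect` and `parse_redirect()` are hypothetical names, not part of the SRP driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets into ClassPortInfo, matching struct class_port_info in the patch. */
enum {
	CPI_REDIRECT_GID  = 8,
	CPI_REDIRECT_LID  = 28,
	CPI_REDIRECT_PKEY = 30,
	CPI_REDIRECT_QP   = 32,
	CPI_LEN           = 72
};

/* Hypothetical host-order view of the redirect fields. */
struct redirect {
	uint8_t  gid[16];
	uint16_t lid;
	uint16_t pkey;
	uint32_t qpn;
	int      by_lid;	/* 1: retry at lid/qpn; 0: resolve a path for gid */
};

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Extract the redirect fields from a CPI_LEN-byte ClassPortInfo buffer. */
static void parse_redirect(const uint8_t *ari, struct redirect *r)
{
	assert(ari != NULL && r != NULL);
	memcpy(r->gid, ari + CPI_REDIRECT_GID, sizeof(r->gid));
	r->lid  = get_be16(ari + CPI_REDIRECT_LID);
	r->pkey = get_be16(ari + CPI_REDIRECT_PKEY);
	/* RedirectQP: top 8 bits reserved, low 24 bits hold the QP number. */
	r->qpn  = get_be32(ari + CPI_REDIRECT_QP) & 0x00ffffff;
	/* Non-zero RedirectLID: use it directly; zero: go back to the SA with the GID. */
	r->by_lid = (r->lid != 0);
}
```

Reading bytes explicitly sidesteps struct padding and host byte order, which the patch handles by casting `ari` to `struct class_port_info` and applying `be32_to_cpu()`. A non-zero `lid` corresponds to the patch's `SRP_DLID_REDIRECT` path (resend the REQ directly), a zero one to `SRP_PORT_REDIRECT` (query a new path for the GID).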
Signed-off-by: John Kingman storagegear.com> Index: ib_srp.h =================================================================== --- ib_srp.h (revision 3344) +++ ib_srp.h (working copy) @@ -49,6 +49,7 @@ enum { SRP_PORT_REDIRECT = 1, + SRP_DLID_REDIRECT = 2, SRP_MAX_IU_LEN = 256, @@ -316,4 +317,24 @@ struct srp_rsp { u8 data[0]; }; +struct class_port_info +{ + u8 base_version; + u8 class_version; + __be16 capability_mask; + __be32 resp_time_value; + u8 redirect_gid[16]; + __be32 redirect_tcslfl; + __be16 redirect_lid; + __be16 redirect_p_key; + __be32 redirect_qp; + __be32 redirect_q_key; + u8 trap_gid[16]; + __be32 trap_tcslfl; + __be16 trap_lid; + __be16 trap_p_key; + __be32 trap_hlqp; + __be32 trap_q_key; +}; + #endif /* IB_SRP_H */ Index: ib_srp.c =================================================================== --- ib_srp.c (revision 3344) +++ ib_srp.c (working copy) @@ -875,6 +875,7 @@ static int srp_cm_handler(struct ib_cm_i { struct srp_target_port *target = cm_id->context; struct ib_qp_attr *qp_attr = NULL; + struct class_port_info *cpi; int attr_mask = 0; int comp = 0; int ret = 0; @@ -944,17 +945,34 @@ static int srp_cm_handler(struct ib_cm_i case IB_CM_REJ_RECEIVED: printk(KERN_DEBUG PFX "REJ received\n"); comp = 1; + cpi = (struct class_port_info *)event->param.rej_rcvd.ari; if (event->param.rej_rcvd.reason == IB_CM_REJ_PORT_CM_REDIRECT) { - /* - * Additional Reject Info contains - * ClassPortInfo, which has the RedirectGID - * field at an offset of 8 bytes. + /* + * Additional Reject Info contains ClassPortInfo, of which + * we need the RedirectGID, RedirectLID, RedirectP_Key, and + * the RedirectQP fields.
*/ - memcpy(target->path.dgid.raw, - event->param.rej_rcvd.ari + 8, 16); + target->path.dlid = cpi->redirect_lid; + target->path.pkey = cpi->redirect_p_key; + cm_id->qpn = be32_to_cpu(cpi->redirect_qp) & 0x00ffffff; + if (target->path.dlid) { + /* + * If RedirectLID is non-zero, it is the DLID a + * requester shall use to access the class services. + */ + target->status = SRP_DLID_REDIRECT; + } else { + /* + * If the RedirectLID value is zero, the redirect + * requires the requester to use the supplied + * RedirectGID to request further path resolution + * from subnet administration. + */ + memcpy(target->path.dgid.raw, cpi->redirect_gid, 16); - target->status = SRP_PORT_REDIRECT; + target->status = SRP_PORT_REDIRECT; + } } else if (topspin_workarounds && !memcmp(&target->ioc_guid, topspin_oui, 3) && event->param.rej_rcvd.reason == IB_CM_REJ_PORT_REDIRECT) { @@ -963,8 +981,7 @@ static int srp_cm_handler(struct ib_cm_i * reject reason code 25 when they mean 24 * (port redirect). */ - memcpy(target->path.dgid.raw, - event->param.rej_rcvd.ari + 0, 16); + memcpy(target->path.dgid.raw, cpi->redirect_gid, 16); printk(KERN_DEBUG PFX "Topspin/Cisco redirect to target port GID %016llx%016llx\n", (unsigned long long) be64_to_cpu(target->path.dgid.global.subnet_prefix), @@ -1333,6 +1350,7 @@ retry_path: goto err; } +retry_send: init_completion(&target->done); ret = srp_send_req(target); if (ret) @@ -1341,10 +1359,13 @@ retry_path: /* * The CM event handling code will set status to - * SRP_PORT_REDIRECT if we get a port redirect REJ back. + * SRP_PORT_REDIRECT if we get a port redirect REJ back, + * or SRP_DLID_REDIRECT if we get a lid/qp redirect REJ back. 
*/ if (target->status == SRP_PORT_REDIRECT) goto retry_path; + else if (target->status == SRP_DLID_REDIRECT) + goto retry_send; else if (target->status < 0) { printk(KERN_ERR PFX "Connection failed\n"); ret = target->status;
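The redirect handling above depends on the ClassPortInfo layout carried in the REJ's Additional Reject Information. RedirectGID sits at byte offset 8 (the offset the old code hard-coded), RedirectLID at 28, RedirectP_Key at 30 and RedirectQP at 32, and only the low 24 bits of RedirectQP hold the QP number. The following is a minimal user-space sketch of the same decision, assuming a raw ARI byte buffer. It decodes the big-endian fields byte by byte instead of casting to a struct. The names `redirect_info`, `parse_cm_redirect` and the `REDIRECT_*` values are hypothetical and are not part of the SRP driver.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the two redirect outcomes the patch distinguishes. */
enum redirect_kind {
	REDIRECT_DLID = 1,	/* keep the path, resend REQ to RedirectLID/QP */
	REDIRECT_PORT = 2	/* resolve a new path to RedirectGID via the SA */
};

struct redirect_info {
	enum redirect_kind kind;
	uint8_t  gid[16];	/* RedirectGID (used for REDIRECT_PORT) */
	uint16_t lid;		/* RedirectLID, host byte order */
	uint16_t pkey;		/* RedirectP_Key, host byte order */
	uint32_t qpn;		/* low 24 bits of RedirectQP */
};

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode a ClassPortInfo image taken from the REJ ARI field.
 * The buffer must hold at least 36 bytes (through RedirectQP).
 */
static void parse_cm_redirect(const uint8_t *ari, struct redirect_info *out)
{
	memcpy(out->gid, ari + 8, 16);			/* RedirectGID */
	out->lid  = get_be16(ari + 28);			/* RedirectLID */
	out->pkey = get_be16(ari + 30);			/* RedirectP_Key */
	out->qpn  = get_be32(ari + 32) & 0x00ffffff;	/* 24-bit QPN */

	/* Non-zero RedirectLID: use it directly; zero: ask the SA about RedirectGID. */
	out->kind = out->lid ? REDIRECT_DLID : REDIRECT_PORT;
}
```

Decoding byte by byte avoids alignment and endianness concerns in this stand-alone sketch. The kernel patch instead keeps the fields as `__be16`/`__be32` and converts only the QPN with `be32_to_cpu()`.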
From halr at voltaire.com Fri Sep 9 05:09:52 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 08:09:52 -0400 Subject: [openib-general] Re: when executing sminfo with a port in down state, there is a return value 0 In-Reply-To: <506C3D7B14CDD411A52C00025558DED6089DBA60@mtlex01.yok.mtl.com> References: <506C3D7B14CDD411A52C00025558DED6089DBA60@mtlex01.yok.mtl.com> Message-ID: <1126267791.4401.5914.camel@hal.voltaire.com> Hi Dotan, On Tue, 2005-08-23 at 02:33, Dotan Barak wrote: > I'm working with gen2 svn rev. 3155 with 2 Mellanox HCAs (23108) (1 on > each host; they are connected b2b: port 1 to port 1). > > I executed opensm on host 1, port 1. > When i executed sminfo on host 2 port 1 everything was as expected > (return value = 0). > > I killed the opensm > When i executed sminfo on host 2 port 1 everything was as expected > (return value = 255). > > When i executed sminfo on host 2 port 2 everything i got 0 (i expected > to get return value = 255). > > Port 2 in host 2 was down, so i don't know why i got the return value > 0. I just tried this and got 255. Can you try this again ? > here is the output: > > host2:~ # /usr/local/bin/sminfo -C mthca0 -P 2 > sminfo: sm lid 0x0 sm guid 0x8200000000, activity count 0 priority 0 > state SMINFO_NOTACT 0 > host2:~ # echo $? > 0 > host2:~ # /usr/local/bin/sminfo -C mthca0 -P 2 > sminfo: sm lid 0x0 sm guid 0x0, activity count 0 priority 0 state > SMINFO_STANDBY 2 > host2:~ # echo $? > 0 It looks like 0 is set because there is some SMInfo response but I don't understand how that would be the case. sminfo can use either DR or LR. This form (without -D) uses LR so that shouldn't work if the port is down. In either case, the other end wouldn't respond if there is no SM there. Also, the GUID looks suspicious.
> host2:~ # vstat > hca_id: mthca0 > phys_port_cnt: 2 > port: 1 > state: PORT_ACTIVE (4) > max_mtu: 0 (0) > active_mtu: 0 (0) > sm_lid: 1 > port_lid: 2 > port_lmc: 0x00 > > port: 2 > state: PORT_DOWN (1) > max_mtu: 0 (0) > active_mtu: 0 (0) > sm_lid: 0 > port_lid: 0 > port_lmc: 0x00 How about ibstat or ibstatus ? > can you please help me with this issue? If you can reproduce this, not sure what is different about your setup. Is port 2 on host 2 cabled to anything ? -- Hal From halr at voltaire.com Fri Sep 9 06:44:42 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 09:44:42 -0400 Subject: [openib-general] osmtest/OpenSM: ServiceGID and busy status Message-ID: <1126273481.4401.6431.camel@hal.voltaire.com> Hi, A couple of things about osmtest (and one is related to OpenSM): 1. It appears that osmt_service.c sets ServiceRecords with the subnet prefix of the ServiceGID set to 0 ? Is that the correct thing to do (from an osmtest perspective) ? ServiceGID..............0x0000000000000000 : 0x0008f10403960559 More importantly, should the SM allow this (is this a valid GID) ? Shouldn't it match one of the GIDs for that port that is setting the ServiceRecord ? 2. In general in osmtest (and other SA client code using the vendor layer), when a remote error is indicated (MAD status != success), this is indicated as a remote error. It appears that the various clients/applications (osmtest is one) is not dealing with BUSY which can be returned by an SM. -- Hal From kingman at storagegear.com Fri Sep 9 07:22:59 2005 From: kingman at storagegear.com (John Kingman) Date: Fri, 9 Sep 2005 09:22:59 -0500 (CDT) Subject: [openib-general] Re: [PATCH] [CM] 1/2 Fix CM redirection Message-ID: Second attempt; sent this last night. I apologize if it is a duplicate. Here's an updated change. Took the suggestions and it should be cleaner. Not the be-all, end-all answer for redirection, of course, but it works for my situation. Any chance of inclusion as an interim fix? 
Signed-off-by: John Kingman storagegear.com> Index: cm.c =================================================================== --- cm.c (revision 3344) +++ cm.c (working copy) @@ -173,7 +173,8 @@ static int cm_alloc_msg(struct cm_id_pri if (IS_ERR(ah)) return PTR_ERR(ah); - m = ib_create_send_mad(mad_agent, 1, cm_id_priv->av.pkey_index, + m = ib_create_send_mad(mad_agent, cm_id_priv->id.qpn, + cm_id_priv->av.pkey_index, ah, 0, sizeof(struct ib_mad_hdr), sizeof(struct ib_mad)-sizeof(struct ib_mad_hdr), GFP_ATOMIC); @@ -536,6 +537,7 @@ struct ib_cm_id *ib_create_cm_id(ib_cm_h cm_id_priv->id.state = IB_CM_IDLE; cm_id_priv->id.cm_handler = cm_handler; cm_id_priv->id.context = context; + cm_id_priv->id.qpn = 1; ret = cm_alloc_id(cm_id_priv); if (ret) goto error; Index: ib_cm.h =================================================================== --- ib_cm.h (revision 3344) +++ ib_cm.h (working copy) @@ -290,6 +290,7 @@ struct ib_cm_id { enum ib_cm_lap_state lap_state; /* internal CM/debug use */ __be32 local_id; __be32 remote_id; + u32 qpn; /* will be 1 unless redirected */ }; /** From kingman at storagegear.com Fri Sep 9 07:24:06 2005 From: kingman at storagegear.com (John Kingman) Date: Fri, 9 Sep 2005 09:24:06 -0500 (CDT) Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP (fwd) Message-ID: Second attempt; sent this last night. I apologize if it is a duplicate. SRP part of updated redirection change. Any chance of inclusion as an interim fix? 
Signed-off-by: John Kingman storagegear.com> Index: ib_srp.h =================================================================== --- ib_srp.h (revision 3344) +++ ib_srp.h (working copy) @@ -49,6 +49,7 @@ enum { SRP_PORT_REDIRECT = 1, + SRP_PORT_REDIRECT = 2, SRP_MAX_IU_LEN = 256, @@ -316,4 +317,24 @@ struct srp_rsp { u8 data[0]; }; +struct class_port_info +{ + u8 base_version; + u8 class_version; + __be16 capability_mask; + __be32 resp_time_value; + u8 redirect_gid[16]; + __be32 redirect_tcslfl; + __be16 redirect_lid; + __be16 redirect_p_key; + __be32 redirect_qp; + __be32 redirect_q_key; + u8 trap_gid[16]; + __be32 trap_tcslfl; + __be16 trap_lid; + __be16 trap_p_key; + __be32 trap_hlqp; + __be32 trap_q_key; +}; + #endif /* IB_SRP_H */ Index: ib_srp.c =================================================================== --- ib_srp.c (revision 3344) +++ ib_srp.c (working copy) @@ -875,6 +875,7 @@ static int srp_cm_handler(struct ib_cm_i { struct srp_target_port *target = cm_id->context; struct ib_qp_attr *qp_attr = NULL; + struct class_port_info *cpi; int attr_mask = 0; int comp = 0; int ret = 0; @@ -944,17 +945,34 @@ static int srp_cm_handler(struct ib_cm_i case IB_CM_REJ_RECEIVED: printk(KERN_DEBUG PFX "REJ received\n"); comp = 1; + cpi = (struct class_port_info *)event->param.rej_rcvd.ari; if (event->param.rej_rcvd.reason == IB_CM_REJ_PORT_CM_REDIRECT) { - /* - * Additional Reject Info contains - * ClassPortInfo, which has the RedirectGID - * field at an offset of 8 bytes. + /* + * Additional Reject Info contains ClassPortInfo, of which + * we need the RedirectGID, RedirectLID, RedirectP_Key, and + * the RedirectQP fields. 
*/ - memcpy(target->path.dgid.raw, - event->param.rej_rcvd.ari + 8, 16); + target->path.dlid = cpi->redirect_lid; + target->path.pkey = cpi->redirect_p_key; + cm_id->qpn = be32_to_cpu(cpi->redirect_qp) & 0x00ffffff; + if (target->path.dlid) { + /* + * If RedirectLID is non-zero, it is the DLID a + * requester shall use to access the class services. + */ + target->status = SRP_DLID_REDIRECT; + } else { + /* + * If the RedirectLID value is zero, the redirect + * requires the requester to use the supplied + * RedirectGID to request further path resolution + * from subnet administration. + */ + memcpy(target->path.dgid.raw, cpi->redirect_gid, 16); - target->status = SRP_PORT_REDIRECT; + target->status = SRP_PORT_REDIRECT; + } } else if (topspin_workarounds && !memcmp(&target->ioc_guid, topspin_oui, 3) && event->param.rej_rcvd.reason == IB_CM_REJ_PORT_REDIRECT) { @@ -963,8 +981,7 @@ static int srp_cm_handler(struct ib_cm_i * reject reason code 25 when they mean 24 * (port redirect). */ - memcpy(target->path.dgid.raw, - event->param.rej_rcvd.ari + 0, 16); + memcpy(target->path.dgid.raw, cpi->redirect_gid, 16); printk(KERN_DEBUG PFX "Topspin/Cisco redirect to target port GID %016llx%016llx\n", (unsigned long long) be64_to_cpu(target->path.dgid.global.subnet_prefix), @@ -1333,6 +1350,7 @@ retry_path: goto err; } +retry_send: init_completion(&target->done); ret = srp_send_req(target); if (ret) @@ -1341,10 +1359,13 @@ retry_path: /* * The CM event handling code will set status to - * SRP_PORT_REDIRECT if we get a port redirect REJ back. + * SRP_PORT_REDIRECT if we get a port redirect REJ back, + * or SRP_DLID_REDIRECT if we get a lid/qp redirect REJ back. 
*/ if (target->status == SRP_PORT_REDIRECT) goto retry_path; + else if (target->status == SRP_DLID_REDIRECT) + goto retry_send; else if (target->status < 0) { printk(KERN_ERR PFX "Connection failed\n"); ret = target->status; From halr at voltaire.com Fri Sep 9 07:31:47 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 10:31:47 -0400 Subject: [openib-general] [PATCH] ib_sa.h: Define more SA methods Message-ID: <1126276306.4401.6563.camel@hal.voltaire.com> ib_sa.h: Define more SA methods (initially for madeye decode) Signed-off-by: Hal Rosenstock Index: ib_sa.h =================================================================== --- ib_sa.h (revision 3342) +++ ib_sa.h (working copy) @@ -46,7 +46,11 @@ enum { IB_SA_METHOD_GET_TABLE = 0x12, IB_SA_METHOD_GET_TABLE_RESP = 0x92, - IB_SA_METHOD_DELETE = 0x15 + IB_SA_METHOD_DELETE = 0x15, + IB_SA_METHOD_DELETE_RESP = 0x95, + IB_SA_METHOD_GET_MULTI = 0x14, + IB_SA_METHOD_GET_MULTI_RESP = 0x94, + IB_SA_METHOD_GET_TRACE_TBL = 0x13 }; enum ib_sa_selector { From rolandd at cisco.com Fri Sep 9 08:32:35 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 08:32:35 -0700 Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP References: Message-ID: <528xy6i7e4.fsf@cisco.com> > SRP_PORT_REDIRECT = 1, > + SRP_PORT_REDIRECT = 2, You must never have compiled this ;) > +struct class_port_info > +{ > + u8 base_version; > + u8 class_version; > + __be16 capability_mask; > + __be32 resp_time_value; > + u8 redirect_gid[16]; > + __be32 redirect_tcslfl; > + __be16 redirect_lid; > + __be16 redirect_p_key; > + __be32 redirect_qp; > + __be32 redirect_q_key; > + u8 trap_gid[16]; > + __be32 trap_tcslfl; > + __be16 trap_lid; > + __be16 trap_p_key; > + __be32 trap_hlqp; > + __be32 trap_q_key; > +}; I think this belongs in ib_mad.h, not ib_srp.h. Other than this, I'm OK with this approach if Sean is OK with the CM changes. 
Unfortunately I'm just about to land some changes to the connection code to handle reconnecting for host resets, so you'll have to rebase your patches. Once I get that done (should happen today), please respin your patch, test with your target and resend. Thanks, Roland From halr at voltaire.com Fri Sep 9 08:29:46 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 11:29:46 -0400 Subject: [openib-general] Re: Some OpenSM 1.8.0 Anomalies In-Reply-To: <43203682.50600@mellanox.co.il> References: <1126101147.4396.1405.camel@hal.voltaire.com> <431F1CC2.9020104@mellanox.co.il> <1126117797.4401.215.camel@hal.voltaire.com> <431FD699.9060404@mellanox.co.il> <1126182059.4401.2269.camel@hal.voltaire.com> <43203682.50600@mellanox.co.il> Message-ID: <1126279785.4401.6739.camel@hal.voltaire.com> On Thu, 2005-09-08 at 09:02, Eitan Zahavi wrote: > Hal Rosenstock wrote: > >>>>>Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007 > >>>> > >>>>This means that the LID of the port registered as the source for this inform info is not recognized as a valid LID. > >>>> > >>>> > >>>>>... > >>>>>Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559 > >>>> > >>>>The meaning of this is that the incoming trap source is not a recognized (included in the SM database) guid > >>> > >>> > > It looks like it occurs on SM port down which seems OK. > OK that explains it: > The errors are when the SM port has turned down. In that case all the ports that were previously > found on the fabric are now inaccessible. The SM should Report(Notice with trap #65) for each of these ports. Right, GID out of service should be and is indicated. > For that sake it scans through the InformInfo database. > Apparently an InformInfo with LID=7 has requested for this report. > But LID 7 does not exist anymore It exists. 
It is just not reachable via GS (SA) LID routed packets. > - so the first message is valid: Not sure what you mean exactly by valid here. > > Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007 > (actually this should have caused the InformInfo record to be deleted... which I do not think happening) What should have caused the InformInfo record to be deleted ? This error being detected ? If so, should it wait for the error or should it occur when the SM port goes down do this (clear the inform list perhaps with the exception of the local node) ? That would require/mean reregistration is required when the node comes back. SA clients won't necessarily do this when the SM port comes back without something like ClientReregistration. > Later we see the following error: > > Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559 > This is sent during the section where node 0x0008f10403960559 is being teared off from the SMDB. > > The code in osm_inform.c say: > /* Check if there is a pkey match. o13-17.1.1*/ Where is this performed ? > /* Check if the issuer of the trap is the SM. If it is, then the pkey ^^ gid > comparison should be done on the trap source (saved as the gid in the > data details field). > If the issuer gid is not the SM - then it is the guid of the trap > source. */ > if ( (cl_ntoh64(p_ntc->issuer_gid.unicast.prefix) == p_subn->opt.subnet_prefix) && > (cl_ntoh64(p_ntc->issuer_gid.unicast.interface_id) == p_subn->sm_port_guid) ) > { > /* The issuer is the SM this is trap 64-67 - compare the pkey > with the gid saved on the data details. */ > source_gid = p_ntc->data_details.ntc_64_67.gid; > } > else > { > source_gid = p_ntc->issuer_gid; > } > > In our case the trap is 65 and sent by the SM. However, the spec required to check > the tear down port and the target of the Report will share a PKey. 
I'm not sure what you are referring to in the spec. In any case, shouldn't the local ports perhaps be an exception to this ? > In out case the > source of the event is considered to be the port that is tear down. (As we want to > prevent any case where port not sharing PKey will get reports on each other). > But since the "source" port is being teared down we can not find it's PKey table ... > (actually we look first in the Port by LID table - and can not find it). > > This means we will never send Report(Notice trap#65) to any node. > How do we solve that bug? Maybe we have a way to find the "source" port PKey that > is not yet corrupted. I'm not totally following this because of the PKey v. GID issue above and I think local ports may be (needed to be) treated differently. -- Hal From kingman at storagegear.com Fri Sep 9 09:14:20 2005 From: kingman at storagegear.com (John Kingman) Date: Fri, 9 Sep 2005 11:14:20 -0500 (CDT) Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP In-Reply-To: <528xy6i7e4.fsf@cisco.com> References: <528xy6i7e4.fsf@cisco.com> Message-ID: On Fri, 9 Sep 2005, Roland Dreier wrote: > > SRP_PORT_REDIRECT = 1, > > + SRP_PORT_REDIRECT = 2, > >You must never have compiled this ;) Oops, not that patch. I did compile the right one on the prior version, however. I'll update everything once your latest changes are in and test it before creating the next patch. > > +struct class_port_info > > +{ > > + u8 base_version; > > + u8 class_version; > > + __be16 capability_mask; > > + __be32 resp_time_value; > > + u8 redirect_gid[16]; > > + __be32 redirect_tcslfl; > > + __be16 redirect_lid; > > + __be16 redirect_p_key; > > + __be32 redirect_qp; > > + __be32 redirect_q_key; > > + u8 trap_gid[16]; > > + __be32 trap_tcslfl; > > + __be16 trap_lid; > > + __be16 trap_p_key; > > + __be32 trap_hlqp; > > + __be32 trap_q_key; > > +}; > >I think this belongs in ib_mad.h, not ib_srp.h. OK, wasn't sure where it should go. I'll move it. 
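Wherever `struct class_port_info` ends up (Roland suggests ib_mad.h), its field offsets have to match the 72-byte ClassPortInfo attribute exactly, because the SRP code casts the raw ARI bytes to it. The sketch below shows compile-time layout checks. It uses fixed-width user-space types in place of `u8`/`__be16`/`__be32`, which is an assumption for portability; a kernel header would keep the annotated types.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the patch's struct; multi-byte fields are big-endian on the wire. */
struct class_port_info {
	uint8_t  base_version;
	uint8_t  class_version;
	uint16_t capability_mask;
	uint32_t resp_time_value;
	uint8_t  redirect_gid[16];
	uint32_t redirect_tcslfl;
	uint16_t redirect_lid;
	uint16_t redirect_p_key;
	uint32_t redirect_qp;
	uint32_t redirect_q_key;
	uint8_t  trap_gid[16];
	uint32_t trap_tcslfl;
	uint16_t trap_lid;
	uint16_t trap_p_key;
	uint32_t trap_hlqp;
	uint32_t trap_q_key;
};

/* Offsets the redirect code relies on, including the 8-byte RedirectGID offset. */
_Static_assert(offsetof(struct class_port_info, redirect_gid) == 8, "RedirectGID");
_Static_assert(offsetof(struct class_port_info, redirect_lid) == 28, "RedirectLID");
_Static_assert(offsetof(struct class_port_info, redirect_p_key) == 30, "RedirectP_Key");
_Static_assert(offsetof(struct class_port_info, redirect_qp) == 32, "RedirectQP");
_Static_assert(offsetof(struct class_port_info, trap_gid) == 40, "TrapGID");
_Static_assert(sizeof(struct class_port_info) == 72, "ClassPortInfo is 72 bytes");
```

Because every check is a `_Static_assert`, a misordered or missized field breaks the build instead of silently corrupting redirect handling at run time.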
>Other than this, I'm OK with this approach if Sean is OK with the CM >changes. Unfortunately I'm just about to land some changes to the >connection code to handle reconnecting for host resets, so you'll have >to rebase your patches. > >Once I get that done (should happen today), please respin your patch, >test with your target and resend. Will do. Thanks, John From mshefty at ichips.intel.com Fri Sep 9 09:36:48 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 09 Sep 2005 09:36:48 -0700 Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP In-Reply-To: <528xy6i7e4.fsf@cisco.com> References: <528xy6i7e4.fsf@cisco.com> Message-ID: <4321BA20.50403@ichips.intel.com> Roland Dreier wrote: > Other than this, I'm OK with this approach if Sean is OK with the CM > changes. Unfortunately I'm just about to land some changes to the > connection code to handle reconnecting for host resets, so you'll have > to rebase your patches. I'm okay with this approach until we can come up with a more complete solution. I'm not thrilled with requiring the user to set the QPN, but I'm not sure if hiding it inside the private cm_id structure is a better solution. At least this way the ULP can set the QPN for future requests. 
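The flow the two patches set up can be sketched as a connect loop. The CM destination QP starts at 1, and the REJ handler may overwrite it. A port (GID) redirect goes back to path resolution (`retry_path`). A LID/QP redirect only resends the REQ (`retry_send`), now aimed at the QPN the ULP stored. Everything here is hypothetical and simulated: `fake_cm_id`, `connect_with_redirect` and the scripted `send_req` callback are not kernel APIs. The `max_tries` bound is an addition that the posted patch does not have.

```c
#include <stdint.h>

enum conn_status {
	CONN_OK = 0,
	CONN_PORT_REDIRECT = 1,	/* new GID: re-run path resolution */
	CONN_DLID_REDIRECT = 2,	/* new LID/QP: resend REQ on the same path */
	CONN_FAILED = -1
};

struct fake_cm_id {
	uint32_t remote_cm_qpn;	/* 1 (the GSI QP) unless redirected */
	uint16_t dlid;
	int path_queries;	/* counters so callers can observe the control flow */
	int reqs_sent;
};

/* Stand-in for "send REQ, wait for REP/REJ"; may update dlid/remote_cm_qpn. */
typedef enum conn_status (*send_req_fn)(struct fake_cm_id *id, void *ctx);

static enum conn_status connect_with_redirect(struct fake_cm_id *id,
					      send_req_fn send_req, void *ctx,
					      int max_tries)
{
	enum conn_status st;

retry_path:
	id->path_queries++;		/* a path record query would happen here */
retry_send:
	if (max_tries-- <= 0)
		return CONN_FAILED;	/* give up instead of looping forever */
	id->reqs_sent++;
	st = send_req(id, ctx);
	if (st == CONN_PORT_REDIRECT)
		goto retry_path;
	if (st == CONN_DLID_REDIRECT)
		goto retry_send;	/* handler already set dlid and remote_cm_qpn */
	return st;
}
```

Keeping the QPN in the caller-visible ID, as Sean notes, is what lets the `retry_send` path reuse the same ID for the redirected request.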
- Sean From eitan at mellanox.co.il Fri Sep 9 10:22:04 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 09 Sep 2005 20:22:04 +0300 Subject: [openib-general] Re: Some OpenSM 1.8.0 Anomalies In-Reply-To: <1126279785.4401.6739.camel@hal.voltaire.com> References: <1126101147.4396.1405.camel@hal.voltaire.com> <431F1CC2.9020104@mellanox.co.il> <1126117797.4401.215.camel@hal.voltaire.com> <431FD699.9060404@mellanox.co.il> <1126182059.4401.2269.camel@hal.voltaire.com> <43203682.50600@mellanox.co.il> <1126279785.4401.6739.camel@hal.voltaire.com> Message-ID: <4321C4BC.7080709@mellanox.co.il> Hal Rosenstock wrote: > On Thu, 2005-09-08 at 09:02, Eitan Zahavi wrote: > >>Hal Rosenstock wrote: >> >>>>>>>Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007 >>>>>> >>>>>>This means that the LID of the port registered as the source for this inform info is not recognized as a valid LID. >>>>>> >>>>>> >>>>>> >>>>>>>... >>>>>>>Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559 >>>>>> >>>>>>The meaning of this is that the incoming trap source is not a recognized (included in the SM database) guid >>>>> >>>>> > >>>It looks like it occurs on SM port down which seems OK. >> >>OK that explains it: >>The errors are when the SM port has turned down. In that case all the ports that were previously >>found on the fabric are now inaccessible. The SM should Report(Notice with trap #65) for each of these ports. > > > Right, GID out of service should be and is indicated. > > >>For that sake it scans through the InformInfo database. >>Apparently an InformInfo with LID=7 has requested for this report. >>But LID 7 does not exist anymore > > > It exists. It is just not reachable via GS (SA) LID routed packets. Well from the point of view of the SM it does not once the SM can not reach it. 
> > >> - so the first message is valid: > > > Not sure what you mean exactly by valid here. Valid means that it is correct. The destination port to send the Report to is not part of any partition any more. I would rephrase the error message and make it Info. There is no ERROR in losing some ports. > > >> > Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007 >>(actually this should have caused the InformInfo record to be deleted... which I do not think happening) > > > What should have caused the InformInfo record to be deleted ? "o13-17.1.2: If a Set(InformInfo) specified a valid trap source at the time of subscription (see o13-14.1.1: on page 746), yet Trap() forwarding fails because the subscriber and trap source are no longer permitted to access each other according to current partitioning (see o13-17.1.1: on page 747), then the manager shall permanently discontinue all event forwarding caused by the Set(InformInfo) which created a subscription to that trap source, except if InformInfo:LIDRangeBegin was 0xFFFF; in the latter case, event forwarding is discontinued only for the now-invalid trap source." Later on the same page: "Note also that “permanently discontinue all event forwarding” is meant to indicate that the subscription for forwarding is dropped by the manager; if the source later becomes reachable again by the subscriber, a new Set(InformInfo) is required to re-establish event forwarding, if that is what is desired. (This may not be desired; when the source becomes reachable again, it may have acquired new characteristics, such as new, different software functions, that make such forwarding inappropriate.)" > This error being detected ? Not currently > If so, should it wait for the error or should it occur > when the SM port goes down do this (clear the inform list perhaps with > the exception of the local node) ?
Maybe, or just write generic code to handle o13-17.1.2. > That would require/mean > reregistration is required when the node comes back. SA clients won't > necessarily do this when the SM port comes back without something like > ClientReregistration. Correct. This is another reason why ClientReRegistration is an important feature of the access layer. > > >>Later we see the following error: >> > Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559 >>This is sent during the section where node 0x0008f10403960559 is being teared off from the SMDB. >> >>The code in osm_inform.c say: >> /* Check if there is a pkey match. o13-17.1.1*/ > > > Where is this performed ? osm_inform.c __match_notice_to_inf_rec > > >> /* Check if the issuer of the trap is the SM. If it is, then the pkey > > ^^ > gid The requirement is to have a shared PKey, according to the PKey sharing rules, between the InformInfo requester and the Trap generator. However, in the case of traps 64-67 the SM is the Trap generator. So we need the special logic below to obtain the port GID that the trap refers to from the notice data details fields and not from the issuer field. > >> comparison should be done on the trap source (saved as the gid in the >> data details field). >> If the issuer gid is not the SM - then it is the guid of the trap >> source. */ >> if ( (cl_ntoh64(p_ntc->issuer_gid.unicast.prefix) == p_subn->opt.subnet_prefix) && >> (cl_ntoh64(p_ntc->issuer_gid.unicast.interface_id) == p_subn->sm_port_guid) ) >> { >> /* The issuer is the SM this is trap 64-67 - compare the pkey >> with the gid saved on the data details. */ >> source_gid = p_ntc->data_details.ntc_64_67.gid; >> } >> else >> { >> source_gid = p_ntc->issuer_gid; >> } >> >>In our case the trap is 65 and sent by the SM. However, the spec required to check >>the tear down port and the target of the Report will share a PKey.
> > > I'm not sure what you are referring to in the spec. In any case, > shouldn't the local ports perhaps be an exception to this ? I do not think so. The requirement makes sense for all traps: if the Trap describes a port A, then it should not be forwarded to another port B unless they share a PKey: "o13-17.1.1: Managers that support event forwarding and have confirmed a request for event subscription shall forward corresponding events to the subscriber using a Report(Notice) MAD, as long as the subscriber and Trap() source are permitted to access each other according to current partitioning." > > >> In out case the >>source of the event is considered to be the port that is tear down. (As we want to >>prevent any case where port not sharing PKey will get reports on each other). >>But since the "source" port is being teared down we can not find it's PKey table ... >>(actually we look first in the Port by LID table - and can not find it). >> >>This means we will never send Report(Notice trap#65) to any node. >>How do we solve that bug? Maybe we have a way to find the "source" port PKey that >>is not yet corrupted. > > > I'm not totally following this because of the PKey v. GID issue above and > I think local ports may be (needed to be) treated differently. I hope the above o13-17.1.1 convinces you. The GID vs. PKey wording in the code comment is just unclear documentation. The idea is that for traps 64-67, which are generated by the SM, you cannot simply use the SM's PKey. You must look up the GID of the reported port in the notice data details and then look up that port's PKey.
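The check Eitan describes can be sketched in two steps. First, pick the port the notice is *about*: for SM-generated traps 64-67 that is the GID in the data details, not the SM's issuer GID. Second, apply the partition rule: two ports may communicate only if they hold P_Keys with the same 15-bit base and at least one of the two is a full member (bit 15 set). The types and helpers below are simplified and hypothetical. They are not OpenSM's `osm_inform.c` API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct gid {
	uint64_t prefix;
	uint64_t guid;
};

/* Simplified notice: issuer GID plus the GID carried in the data details of traps 64-67. */
struct notice {
	uint16_t trap_num;
	struct gid issuer_gid;
	struct gid details_gid;
};

struct pkey_table {
	const uint16_t *keys;
	size_t n;
};

/* For traps raised by the SM itself, the partition check must use the reported port. */
static struct gid trap_source_gid(const struct notice *ntc, const struct gid *sm_gid)
{
	bool from_sm = ntc->issuer_gid.prefix == sm_gid->prefix &&
		       ntc->issuer_gid.guid == sm_gid->guid;
	return from_sm ? ntc->details_gid : ntc->issuer_gid;
}

/* Partition rule: same 15-bit base and at least one full member (bit 15). */
static bool pkeys_compatible(uint16_t a, uint16_t b)
{
	return (a & 0x7fff) == (b & 0x7fff) && ((a | b) & 0x8000);
}

static bool ports_share_partition(const struct pkey_table *a, const struct pkey_table *b)
{
	for (size_t i = 0; i < a->n; i++) {
		/* Base 0 (0x0000/0x8000) is invalid; since bases must match, checking one side suffices. */
		if ((a->keys[i] & 0x7fff) == 0)
			continue;
		for (size_t j = 0; j < b->n; j++)
			if (pkeys_compatible(a->keys[i], b->keys[j]))
				return true;
	}
	return false;
}
```

The sketch also makes the reported bug visible. `ports_share_partition()` needs the reported port's P_Key table, so for trap 65 the SM would have to keep (or snapshot) that table until the notice has been matched against the InformInfo subscriptions, rather than dropping the port from its database first.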
> > -- Hal From eitan at mellanox.co.il Fri Sep 9 10:29:14 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 09 Sep 2005 20:29:14 +0300 Subject: [openib-general] Re: osmtest/OpenSM: ServiceGID and busy status In-Reply-To: <1126273481.4401.6431.camel@hal.voltaire.com> References: <1126273481.4401.6431.camel@hal.voltaire.com> Message-ID: <4321C66A.3000604@mellanox.co.il> Hal Rosenstock wrote: > Hi, > > A couple of things about osmtest (and one is related to OpenSM): > > 1. It appears that osmt_service.c sets ServiceRecords with the subnet > prefix of the ServiceGID set to 0 ? Is that the correct thing to do > (from an osmtest perspective) ? > > ServiceGID..............0x0000000000000000 : 0x0008f10403960559 Well, I could not find where the spec require the validation of the provided GID field for ServiceRecords. The fact we allow non valid or unknown GIDs to be registered might become useful. > > More importantly, should the SM allow this (is this a valid GID) ? > Shouldn't it match one of the GIDs for that port that is setting the > ServiceRecord ? As I said - I did not see anywhere in the spec a specific requirement for that. Why do you see this as an issue? > > 2. In general in osmtest (and other SA client code using the vendor > layer), when a remote error is indicated (MAD status != success), this > is indicated as a remote error. It appears that the various > clients/applications (osmtest is one) is not dealing with BUSY which can > be returned by an SM. This is a big hole! Thanks for bringing it up. I think we should enhance the SA client code to recognize this and re-issue the MAD. Can this be done in the lowest possible layer? > > -- Hal From sean.hefty at intel.com Fri Sep 9 11:03:03 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 9 Sep 2005 11:03:03 -0700 Subject: [openib-general] RE: [PATCH] [CM] 1/2 Fix CM redirection In-Reply-To: Message-ID: >Second attempt; sent this last night. I apologize if it is a duplicate. 
> >Here's an updated change. Took the suggestions and it should be cleaner. >Not the be-all, end-all answer for redirection, of course, but it works >for my situation. > >Any chance of inclusion as an interim fix? Committed this with one minor change: named qpn to remote_cm_qpn to indicate which QPN was being reported. See below for patch. - Sean Index: include/rdma/ib_cm.h =================================================================== --- include/rdma/ib_cm.h (revision 3342) +++ include/rdma/ib_cm.h (working copy) @@ -290,6 +290,7 @@ struct ib_cm_id { enum ib_cm_lap_state lap_state; /* internal CM/debug use */ __be32 local_id; __be32 remote_id; + u32 remote_cm_qpn; /* 1 unless redirected */ }; /** Index: core/cm.c =================================================================== --- core/cm.c (revision 3342) +++ core/cm.c (working copy) @@ -173,7 +173,8 @@ static int cm_alloc_msg(struct cm_id_pri if (IS_ERR(ah)) return PTR_ERR(ah); - m = ib_create_send_mad(mad_agent, 1, cm_id_priv->av.pkey_index, + m = ib_create_send_mad(mad_agent, cm_id_priv->id.remote_cm_qpn, + cm_id_priv->av.pkey_index, ah, 0, sizeof(struct ib_mad_hdr), sizeof(struct ib_mad)-sizeof(struct ib_mad_hdr), GFP_ATOMIC); @@ -536,6 +537,7 @@ struct ib_cm_id *ib_create_cm_id(ib_cm_h cm_id_priv->id.state = IB_CM_IDLE; cm_id_priv->id.cm_handler = cm_handler; cm_id_priv->id.context = context; + cm_id_priv->id.remote_cm_qpn = 1; ret = cm_alloc_id(cm_id_priv); if (ret) goto error; From rolandd at cisco.com Fri Sep 9 12:36:14 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 12:36:14 -0700 Subject: [openib-general] [PATCH] [DAPL] update to match new event processing APIs In-Reply-To: (Sean Hefty's message of "Thu, 8 Sep 2005 15:56:12 -0700") References: Message-ID: <52ll26ghjl.fsf@cisco.com> Sean, what's your feeling about merging the new uverbs stale event handling stuff for 2.6.14? I'm inclined to get it in early, so that it gets wider exposure. 
And I think the ABI is good, so we won't have to break it again. - R. From mshefty at ichips.intel.com Fri Sep 9 12:43:24 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 09 Sep 2005 12:43:24 -0700 Subject: [openib-general] [PATCH] [DAPL] update to match new event processing APIs In-Reply-To: <52ll26ghjl.fsf@cisco.com> References: <52ll26ghjl.fsf@cisco.com> Message-ID: <4321E5DC.4010301@ichips.intel.com> Roland Dreier wrote: > Sean, what's your feeling about merging the new uverbs stale event > handling stuff for 2.6.14? I'm inclined to get it in early, so that > it gets wider exposure. And I think the ABI is good, so we won't have > to break it again. I think that earlier would be better. - Sean From halr at voltaire.com Fri Sep 9 12:52:32 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 15:52:32 -0400 Subject: [openib-general] Re: osmtest/OpenSM: ServiceGID and busy status In-Reply-To: <4321C66A.3000604@mellanox.co.il> References: <1126273481.4401.6431.camel@hal.voltaire.com> <4321C66A.3000604@mellanox.co.il> Message-ID: <1126295551.4401.7529.camel@hal.voltaire.com> On Fri, 2005-09-09 at 13:29, Eitan Zahavi wrote: > Hal Rosenstock wrote: > > Hi, > > > > A couple of things about osmtest (and one is related to OpenSM): > > > > 1. It appears that osmt_service.c sets ServiceRecords with the subnet > > prefix of the ServiceGID set to 0 ? Is that the correct thing to do > > (from an osmtest perspective) ? > > > > ServiceGID..............0x0000000000000000 : 0x0008f10403960559 > Well, I could not find where the spec require the validation of the provided GID field for > ServiceRecords. The fact we allow non valid or unknown GIDs to be registered might become useful. I may be wrong but: ServiceGID says port GID for service. A port GID must meet the requirements in the addressing section. > > More importantly, should the SM allow this (is this a valid GID) ? 
> > Shouldn't it match one of the GIDs for that port that is setting the > > ServiceRecord ? > As I said - I did not see anywhere in the spec a specific requirement for that. > Why do you see this as an issue? See above. > > 2. In general in osmtest (and other SA client code using the vendor > > layer), when a remote error is indicated (MAD status != success), this > > is indicated as a remote error. It appears that the various > > clients/applications (osmtest is one) is not dealing with BUSY which can > > be returned by an SM. > This is a big hole! Thanks for bringing it up. I think we should enhance the > SA client code to recognize this and re-issue the MAD. Can this be done in the > lowest possible layer? For busy, it might be possible but is there one timeout retry strategy or should this be left to the client ? For other errors, I think it needs to be left to the client/application to determine whether it is in error. -- Hal From eitan at mellanox.co.il Fri Sep 9 13:05:37 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 9 Sep 2005 23:05:37 +0300 Subject: [openib-general] RE: osmtest/OpenSM: ServiceGID and busy status Message-ID: <506C3D7B14CDD411A52C00025558DED607C307A8@mtlex01.yok.mtl.com> > On Fri, 2005-09-09 at 13:29, Eitan Zahavi wrote: > > Hal Rosenstock wrote: > > > Hi, > > > > > > A couple of things about osmtest (and one is related to OpenSM): > > > > > > 1. It appears that osmt_service.c sets ServiceRecords with the subnet > > > prefix of the ServiceGID set to 0 ? Is that the correct thing to do > > > (from an osmtest perspective) ? > > > > > > ServiceGID..............0x0000000000000000 : 0x0008f10403960559 > > Well, I could not find where the spec require the validation of the provided GID > field for > > ServiceRecords. The fact we allow non valid or unknown GIDs to be registered > might become useful. > > I may be wrong but: > ServiceGID says port GID for service. A port GID must meet the > requirements in the addressing section. 
[EZ] I think the spec intentionally leaves this open. The intent is to use this as a GID, but no check is defined. According to your interpretation, no "proxy" (where node A publishes services of node B) is allowed > > > > More importantly, should the SM allow this (is this a valid GID) ? > > > Shouldn't it match one of the GIDs for that port that is setting the > > > ServiceRecord ? > > As I said - I did not see anywhere in the spec a specific requirement for that. > > Why do you see this as an issue? > > See above. > > > > 2. In general in osmtest (and other SA client code using the vendor > > > layer), when a remote error is indicated (MAD status != success), this > > > is indicated as a remote error. It appears that the various > > > clients/applications (osmtest is one) is not dealing with BUSY which can > > > be returned by an SM. > > This is a big hole! Thanks for bringing it up. I think we should enhance the > > SA client code to recognize this and re-issue the MAD. Can this be done in the > > lowest possible layer? > > For busy, it might be possible but is there one timeout retry strategy > or should this be left to the client ? For other errors, I think it > needs to be left to the client/application to determine whether it is in > error. [EZ] Agree about the need to pass up the error codes. Just handle BUSY at a lower level, since that handling is probably common to most applications. Or we might at least make it an optional service of the low level? > -- Hal From jlentini at netapp.com Fri Sep 9 13:07:00 2005 From: jlentini at netapp.com (James Lentini) Date: Fri, 9 Sep 2005 16:07:00 -0400 (EDT) Subject: [openib-general] [IBAT][PATCH] simplify debug/warn enable/disable Message-ID: Simplify the enabling/disabling of debugging and warning messages in IBAT.
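The patch that follows switches WARN and DEBUG on and off with `#if 1` / `#if 0` blocks. As a hedged, user-space illustration of the same compile-time toggle (not the IBAT code itself), the sketch below uses hypothetical switch names `AT_WARN_ENABLED` and `AT_DEBUG_ENABLED`, a hypothetical sink `at_log()` standing in for `printk()`, and a hypothetical caller `check_dest_lid()`; it relies on the GNU `##__VA_ARGS__` extension (gcc/clang), much as the kernel macros rely on `## arg`.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical switches mirroring the patch's "#if 1" / "#if 0" blocks:
 * 1 compiles the messages in, 0 compiles them out entirely. */
#define AT_WARN_ENABLED  1
#define AT_DEBUG_ENABLED 0

/* Counts messages that actually reached the log (handy for testing). */
static int at_messages_emitted;

/* Common sink: prefix each message with the module tag and the caller's
 * function name, the way the printk() in the patch does. */
static void at_log(const char *func, const char *fmt, ...)
{
	va_list ap;

	fprintf(stderr, "ib_at: %s: ", func);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	at_messages_emitted++;
}

/* ##__VA_ARGS__ (GNU extension) drops the comma when no arguments
 * follow fmt, like "## arg" in the kernel macros. */
#if AT_WARN_ENABLED
#define WARN(fmt, ...)  at_log(__func__, fmt, ##__VA_ARGS__)
#else
#define WARN(fmt, ...)  do { } while (0)
#endif

#if AT_DEBUG_ENABLED
#define DEBUG(fmt, ...) at_log(__func__, fmt, ##__VA_ARGS__)
#else
#define DEBUG(fmt, ...) do { } while (0)
#endif

/* Hypothetical caller: with AT_DEBUG_ENABLED at 0 the DEBUG line
 * generates no code and its arguments are never evaluated. */
static void check_dest_lid(unsigned int lid)
{
	if (lid == 0)
		WARN("invalid destination LID 0x%04x", lid);
	else
		DEBUG("destination LID 0x%04x looks valid", lid);
}
```

One design note: `do { } while (0)` is used here instead of the patch's `while (0) {}` because it stays a single statement in an unbraced `if`/`else`. With `while (0) {}`, the caller's semicolon adds a second (empty) statement that detaches the `else`; the trailing semicolon inside the enabled `printk()` macros has the same effect.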
Signed-off-by: James Lentini Index: infiniband/core/at_priv.h =================================================================== --- infiniband/core/at_priv.h (revision 3345) +++ infiniband/core/at_priv.h (working copy) @@ -132,15 +132,21 @@ static const struct ib_field ats_rec_tab #define IB_ATS_LAST_SERVICE_ID cpu_to_be64(0x10000ce1ff415453ULL) #define IB_ATS_OPENIB_MAGIC_KEY cpu_to_be16(IB_OPENIB_OUI & 0xffff) -//#define WARN(fmt, ...) while (0) {} -//#define WARN_VAR(x, y...) -#define DEBUG(fmt, ...) while (0) {} -#define DEBUG_VAR(x, y...) - +#if 1 #define WARN(fmt, arg...) printk("ib_at: %s: " fmt "\n", __FUNCTION__ , ## arg); #define WARN_VAR(x, y...) x, ## y -//#define DEBUG(fmt, arg...) printk("ib_at: %s: " fmt "\n", __FUNCTION__ , ## arg); -//#define DEBUG_VAR(x, y...) x, ## y +#else +#define WARN(fmt, ...) while (0) {} +#define WARN_VAR(x, y...) +#endif + +#if 0 +#define DEBUG(fmt, arg...) printk("ib_at: %s: " fmt "\n", __FUNCTION__ , ## arg); +#define DEBUG_VAR(x, y...) x, ## y +#else +#define DEBUG(fmt, ...) while (0) {} +#define DEBUG_VAR(x, y...) +#endif static kmem_cache_t *route_req_cache = NULL; static kmem_cache_t *path_req_cache = NULL; From halr at voltaire.com Fri Sep 9 13:03:14 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 16:03:14 -0400 Subject: [openib-general] Re: Some OpenSM 1.8.0 Anomalies In-Reply-To: <4321C4BC.7080709@mellanox.co.il> References: <1126101147.4396.1405.camel@hal.voltaire.com> <431F1CC2.9020104@mellanox.co.il> <1126117797.4401.215.camel@hal.voltaire.com> <431FD699.9060404@mellanox.co.il> <1126182059.4401.2269.camel@hal.voltaire.com> <43203682.50600@mellanox.co.il> <1126279785.4401.6739.camel@hal.voltaire.com> <4321C4BC.7080709@mellanox.co.il> Message-ID: <1126296193.4401.7571.camel@hal.voltaire.com> On Fri, 2005-09-09 at 13:22, Eitan Zahavi wrote: > Hal Rosenstock wrote: > > On Thu, 2005-09-08 at 09:02, Eitan Zahavi wrote: > > > >>Hal Rosenstock wrote: > >> > >>>>>>>Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007 > >>>>>> > >>>>>>This means that the LID of the port registered as the source for this inform info is not recognized as a valid LID. > >>>>>> > >>>>>> > >>>>>> > >>>>>>>... > >>>>>>>Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559 > >>>>>> > >>>>>>The meaning of this is that the incoming trap source is not a recognized (included in the SM database) guid > >>>>> > >>>>> > > > >>>It looks like it occurs on SM port down which seems OK. > >> > >>OK that explains it: > >>The errors are when the SM port has turned down. In that case all the ports that were previously > >>found on the fabric are now inaccessible.
The SM should Report(Notice with trap #65) for each of these ports. > > > > > > Right, GID out of service should be and is indicated. > > > > > >>For that sake it scans through the InformInfo database. > >>Apparently an InformInfo with LID=7 has requested for this report. > >>But LID 7 does not exist anymore > > > > > > It exists. It is just not reachable via GS (SA) LID routed packets. > Well from the point of view of the SM it does not once the SM can not reach it. OK. > >> - so the first message is valid: > > > > > > Not sure what you mean exactly by valid here. > Valid means that it is correct. The destination port to send the Report to is not part of any partition any more. > I would rephrase the error message and make it Info. There is no ERROR in loosing some ports. Right. This should be made into something less than error. > >> > Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find destination port with LID:0x0007 > >>(actually this should have caused the InformInfo record to be deleted... which I do not think happening) > > > > What should have caused the InformInfo record to be deleted ? > "o13-17.1.2: If a Set(InformInfo) specified a valid trap source at the time of > subscription (see o13-14.1.1: on page 746), yet Trap() forwarding fails because > the subscriber and trap source are no longer permitted to access > each other according to current partitioning (see o13-17.1.1: on page > 747), then the manager shall permanently discontinue all event forwarding > caused by the Set(InformInfo) which created a subscription to > that trap source, except if InformInfo:LIDRangeBegin was 0xFFFF; in the > latter case, event forwarding is discontinued only for the now-invalid trap > source." 
> Later on the same page: > "Note also that “permanently discontinue all event forwarding” is meant to > indicate that the subscription for forwarding is dropped by the manager; if > the source later becomes reachable again by the subscriber, a new > Set(InformInfo) is required to re-establish event forwarding, if that is what > is desired. (This may not be desired; when the source becomes reachable > again, it may have acquired new characteristics, such as new, different > software functions, that make such forwarding inappropriate.)" > > > This error being detected ? > Not currently > > If so, should it wait for the error or should it occur > > when the SM port goes down do this (clear the inform list perhaps with > > the exception of the local node) ? > Maybe or just code the generic code to handle 013-17.1.2 > > That would require/mean > > reregistration is required when the node comes back. SA clients won't > > necessarily do this when the SM port comes back without something like > > ClientReregistration. > Correct. This is another reason why ClientReRegistration is an important feature of the > access layer. I would have ended that sentence after feature. It does not need to be implemented in the access layer. > >>Later we see the following error: > >> > Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: Cannot find source port with GUID:0x0008f10403960559 > >>This is sent during the section where node 0x0008f10403960559 is being teared off from the SMDB. > >> > >>The code in osm_inform.c say: > >> /* Check if there is a pkey match. o13-17.1.1*/ > > > > > > Where is this performed ? > osm_inform.c > __match_notice_to_inf_rec > > > > > > >> /* Check if the issuer of the trap is the SM. If it is, then the pkey > > > > ^^ > > gid > The requirement is to have a shared PKey according to PKey sharing rules between the > InformInfo requester and the Trap generator. However, in the case of traps 64-67 > the SM is the Trap generator. 
So we need the spacial logic below to obtain the port gid > that the trap refers to from within the notice data details fields and not from the issuer field. I think the comment in the code is wrong here and should be gid rather than pkey. I do agree that the pkey sharing needs checking but that is separate. > >> comparison should be done on the trap source (saved as the gid in the > >> data details field). > >> If the issuer gid is not the SM - then it is the guid of the trap > >> source. */ > >> if ( (cl_ntoh64(p_ntc->issuer_gid.unicast.prefix) == p_subn->opt.subnet_prefix) && > >> (cl_ntoh64(p_ntc->issuer_gid.unicast.interface_id) == p_subn->sm_port_guid) ) > >> { > >> /* The issuer is the SM this is trap 64-67 - compare the pkey > >> with the gid saved on the data details. */ > >> source_gid = p_ntc->data_details.ntc_64_67.gid; > >> } > >> else > >> { > >> source_gid = p_ntc->issuer_gid; > >> } > >> > >>In our case the trap is 65 and sent by the SM. However, the spec required to check > >>the tear down port and the target of the Report will share a PKey. > > > > > > I'm not sure what you are referring to in the spec. In any case, > > shouldn't the local ports perhaps be an exception to this ? > I do not think so. The requirement make sense for all traps: > If the Trap describes a port A then it should not be forwarded to another port B unless they > share a PKey: > "o13-17.1.1: Managers that support event forwarding and have confirmed > a request for event subscription shall forward corresponding events to the > subscriber using a Report(Notice) MAD, as long as the subscriber and > Trap() source are permitted to access each other according to current partitioning." > > > > > >> In out case the > >>source of the event is considered to be the port that is tear down. (As we want to > >>prevent any case where port not sharing PKey will get reports on each other). > >>But since the "source" port is being teared down we can not find it's PKey table ... 
> >>(actually we look first in the Port by LID table - and can not find it). > >> > >>This means we will never send Report(Notice trap#65) to any node. > >>How do we solve that bug? Maybe we have a way to find the "source" port PKey that > >>is not yet corrupted. > > > > > > I'm not totally following this because of the PKey v. GID issue above and > > I think local ports may be (needed to be) treated differently. > I hope the above 17.1.1 convinced you. The GID vs PKey is just unclear documentation. > The idea is that for trap# 64-67 which are generated by the SM you can not simply use the SM PKey but > lookup the gid of the reported port from within the notice data details and then lookup that port PKey. OK. I'm convinced. I'm still not sure what is the bug you are referring to above though. -- Hal From eitan at mellanox.co.il Fri Sep 9 13:14:35 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 9 Sep 2005 23:14:35 +0300 Subject: [openib-general] RE: Some OpenSM 1.8.0 Anomalies Message-ID: <506C3D7B14CDD411A52C00025558DED607C307AA@mtlex01.yok.mtl.com> > On Fri, 2005-09-09 at 13:22, Eitan Zahavi wrote: > > Hal Rosenstock wrote: > > > On Thu, 2005-09-08 at 09:02, Eitan Zahavi wrote: > > > > > >>Hal Rosenstock wrote: > > >> > > >>>>>>>Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: > Cannot find destination port with LID:0x0007 > > >>>>>> > > >>>>>>This means that the LID of the port registered as the source for this inform > info is not recognized as a valid LID. > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>>>... > > >>>>>>>Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: > Cannot find source port with GUID:0x0008f10403960559 > > >>>>>> > > >>>>>>The meaning of this is that the incoming trap source is not a recognized > (included in the SM database) guid > > >>>>> > > >>>>> > > > > > >>>It looks like it occurs on SM port down which seems OK. 
> > >> > > >>OK that explains it: > > >>The errors are when the SM port has turned down. In that case all the ports that > were previously > > >>found on the fabric are now inaccessible. The SM should Report(Notice with trap > #65) for each of these ports. > > > > > > > > > Right, GID out of service should be and is indicated. > > > > > > > > >>For that sake it scans through the InformInfo database. > > >>Apparently an InformInfo with LID=7 has requested for this report. > > >>But LID 7 does not exist anymore > > > > > > > > > It exists. It is just not reachable via GS (SA) LID routed packets. > > Well from the point of view of the SM it does not once the SM can not reach it. > > OK. > > > >> - so the first message is valid: > > > > > > > > > Not sure what you mean exactly by valid here. > > Valid means that it is correct. The destination port to send the Report to is not part > of any partition any more. > > I would rephrase the error message and make it Info. There is no ERROR in loosing > some ports. > > Right. This should be made into something less than error. > > > >> > Sep 06 15:41:48 725691 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: > Cannot find destination port with LID:0x0007 > > >>(actually this should have caused the InformInfo record to be deleted... which I do > not think happening) > > > > > > What should have caused the InformInfo record to be deleted ? 
> > "o13-17.1.2: If a Set(InformInfo) specified a valid trap source at the time of > > subscription (see o13-14.1.1: on page 746), yet Trap() forwarding fails because > > the subscriber and trap source are no longer permitted to access > > each other according to current partitioning (see o13-17.1.1: on page > > 747), then the manager shall permanently discontinue all event forwarding > > caused by the Set(InformInfo) which created a subscription to > > that trap source, except if InformInfo:LIDRangeBegin was 0xFFFF; in the > > latter case, event forwarding is discontinued only for the now-invalid trap > > source." > > Later on the same page: > > "Note also that "permanently discontinue all event forwarding" is meant to > > indicate that the subscription for forwarding is dropped by the manager; if > > the source later becomes reachable again by the subscriber, a new > > Set(InformInfo) is required to re-establish event forwarding, if that is what > > is desired. (This may not be desired; when the source becomes reachable > > again, it may have acquired new characteristics, such as new, different > > software functions, that make such forwarding inappropriate.)" > > > > > This error being detected ? > > Not currently > > > If so, should it wait for the error or should it occur > > > when the SM port goes down do this (clear the inform list perhaps with > > > the exception of the local node) ? > > Maybe or just code the generic code to handle 013-17.1.2 > > > That would require/mean > > > reregistration is required when the node comes back. SA clients won't > > > necessarily do this when the SM port comes back without something like > > > ClientReregistration. > > Correct. This is another reason why ClientReRegistration is an important feature of > the > > access layer. > > I would have ended that sentence after feature. It does not need to be > implemented in the access layer. 
> > > >>Later we see the following error: > > >> > Sep 06 15:41:48 726186 [B76A4C40] -> __match_notice_to_inf_rec: ERR 0207: > Cannot find source port with GUID:0x0008f10403960559 > > >>This is sent during the section where node 0x0008f10403960559 is being teared off > from the SMDB. > > >> > > >>The code in osm_inform.c say: > > >> /* Check if there is a pkey match. o13-17.1.1*/ > > > > > > > > > Where is this performed ? > > osm_inform.c > > __match_notice_to_inf_rec > > > > > > > > > > >> /* Check if the issuer of the trap is the SM. If it is, then the pkey > > > > > > ^^ > > > gid > > The requirement is to have a shared PKey according to PKey sharing rules between > the > > InformInfo requester and the Trap generator. However, in the case of traps 64-67 > > the SM is the Trap generator. So we need the spacial logic below to obtain the port > gid > > that the trap refers to from within the notice data details fields and not from the > issuer field. > > I think the comment in the code is wrong here and should be gid rather > than pkey. I do agree that the pkey sharing needs checking but that is > separate. [EZ] OK we can improve the comments accuracy and readability. As always comments written by the developers are somewhat biased by the fact he already understands the code. So the first time reader can do a better job... > > > >> comparison should be done on the trap source (saved as the gid in the > > >> data details field). > > >> If the issuer gid is not the SM - then it is the guid of the trap > > >> source. */ > > >> if ( (cl_ntoh64(p_ntc->issuer_gid.unicast.prefix) == p_subn->opt.subnet_prefix) > && > > >> (cl_ntoh64(p_ntc->issuer_gid.unicast.interface_id) == p_subn->sm_port_guid) > ) > > >> { > > >> /* The issuer is the SM this is trap 64-67 - compare the pkey > > >> with the gid saved on the data details. 
*/ > > >> source_gid = p_ntc->data_details.ntc_64_67.gid; > > >> } > > >> else > > >> { > > >> source_gid = p_ntc->issuer_gid; > > >> } > > >> > > >>In our case the trap is 65 and sent by the SM. However, the spec required to check > > >>the tear down port and the target of the Report will share a PKey. > > > > > > > > > I'm not sure what you are referring to in the spec. In any case, > > > shouldn't the local ports perhaps be an exception to this ? > > I do not think so. The requirement make sense for all traps: > > If the Trap describes a port A then it should not be forwarded to another port B > unless they > > share a PKey: > > "o13-17.1.1: Managers that support event forwarding and have confirmed > > a request for event subscription shall forward corresponding events to the > > subscriber using a Report(Notice) MAD, as long as the subscriber and > > Trap() source are permitted to access each other according to current partitioning." > > > > > > > > >> In out case the > > >>source of the event is considered to be the port that is tear down. (As we want to > > >>prevent any case where port not sharing PKey will get reports on each other). > > >>But since the "source" port is being teared down we can not find it's PKey table ... > > >>(actually we look first in the Port by LID table - and can not find it). > > >> > > >>This means we will never send Report(Notice trap#65) to any node. > > >>How do we solve that bug? Maybe we have a way to find the "source" port PKey > that > > >>is not yet corrupted. > > > > > > > > > I'm not totally following this because of the PKey v. GID issue above and > > > I think local ports may be (needed to be) treated differently. > > I hope the above 17.1.1 convinced you. The GID vs PKey is just unclear > documentation. > > The idea is that for trap# 64-67 which are generated by the SM you can not simply > use the SM PKey but > > lookup the gid of the reported port from within the notice data details and then > lookup that port PKey. 
> > OK. I'm convinced. > > I'm still not sure what is the bug you are referring to above though. [EZ] The bug is that the code does not perform the operations required for o13-17.1.2 compliance. > > -- Hal From jlentini at netapp.com Fri Sep 9 13:18:25 2005 From: jlentini at netapp.com (James Lentini) Date: Fri, 9 Sep 2005 16:18:25 -0400 (EDT) Subject: [openib-general] IBAT ATS registration issue Message-ID: Hi Hal, I've been doing some interoperability testing between IBAT's ATS code and OnTap's ATS code. For the most part, everything works great. I've only seen one issue that I don't have a handle on. Sometimes IBAT doesn't register an ATS SA record. This happens when I set up the IB stack via a script like this: modprobe ib_mthca modprobe ib_ipoib ifconfig ib0 w.x.y.z modprobe kdapl_ib // this brings in ib_at.ko If I change the script as follows: modprobe ib_mthca modprobe ib_ipoib ifconfig ib0 w.x.y.z sleep 10 modprobe kdapl_ib // this brings in ib_at.ko everything seems to work. I plan to look through at.c's ib_devs_changed() and ib_devs_sweep() to see if I can spot the problem. Before I do that, I wanted to see if you had observed this problem. Could it be as simple as uncommenting the queue_delayed_work line in ib_at_ats_reg()? james From halr at voltaire.com Fri Sep 9 13:14:01 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 16:14:01 -0400 Subject: [openib-general] RE: osmtest/OpenSM: ServiceGID and busy status In-Reply-To: <506C3D7B14CDD411A52C00025558DED607C307A8@mtlex01.yok.mtl.com> References: <506C3D7B14CDD411A52C00025558DED607C307A8@mtlex01.yok.mtl.com> Message-ID: <1126296841.4401.7610.camel@hal.voltaire.com> On Fri, 2005-09-09 at 16:05, Eitan Zahavi wrote: > > I may be wrong but: > > ServiceGID says port GID for service. A port GID must meet the > > requirements in the addressing section. > [EZ] I think the spec intentionally leaves this open.
The intent is to > use this as GID but no check is defined. According to your > interpretation no "proxy" - where node A publish services of node B - > is allowed Proxy would be allowed. There are 2 possibilities: 1. Allow valid looking GIDs or 2. Only allow GIDs present in the subnet > > For busy, it might be possible but is there one timeout retry > strategy > > or should this be left to the client ? For other errors, I think it > > needs to be left to the client/application to determine whether it > is in > > error. > > [EZ] Agree about the need to pass up the error codes. Just handle the > BUSY at a lower level which is probably common to most applications. > But we might at least make it an optional service of the low level? The OpenSM SA client API needs changing to make it optional. Other than that it is a matter of the default policy: retries and timeout (with backoff) to be used. -- Hal From halr at voltaire.com Fri Sep 9 13:24:46 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 16:24:46 -0400 Subject: [openib-general] RE: Some OpenSM 1.8.0 Anomalies In-Reply-To: <506C3D7B14CDD411A52C00025558DED607C307AA@mtlex01.yok.mtl.com> References: <506C3D7B14CDD411A52C00025558DED607C307AA@mtlex01.yok.mtl.com> Message-ID: <1126297216.4401.7637.camel@hal.voltaire.com> On Fri, 2005-09-09 at 16:14, Eitan Zahavi wrote: > > I think the comment in the code is wrong here and should be gid > rather > > than pkey. I do agree that the pkey sharing needs checking but that > is > > separate. > [EZ] OK we can improve the comments accuracy and readability. As > always comments written by the developers are somewhat biased by the > fact he already understands the code. So the first time reader can do > a better job... Understood. > > I'm still not sure what is the bug you are referring to above > though. > [EZ] The bug is that the code does not perform the required operations > to meet o13-17.1.2 compliancy. What is missing ? 
Is it the part relating to "except if InformInfo:LIDRangeBegin was 0xFFFF" ? -- Hal From rolandd at cisco.com Fri Sep 9 13:46:18 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 13:46:18 -0700 Subject: [openib-general] Re: [PATCH] ib_sa.h: Define more SA methods In-Reply-To: <1126276306.4401.6563.camel@hal.voltaire.com> (Hal Rosenstock's message of "09 Sep 2005 10:31:47 -0400") References: <1126276306.4401.6563.camel@hal.voltaire.com> Message-ID: <52acimgeat.fsf@cisco.com> thanks, applied. I'll push this for 2.6.14 too. From rolandd at cisco.com Fri Sep 9 13:50:17 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 13:50:17 -0700 Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP In-Reply-To: (John Kingman's message of "Thu, 8 Sep 2005 22:44:01 -0500 (CDT)") References: Message-ID: <5264tage46.fsf@cisco.com> Sean/Hal, does this make sense as a place to stick the ClassPortInfo structure (initially for use with CM redirection)? If so I'll go ahead and commit it and put it in my git tree as well. - R. Index: include/rdma/ib_mad.h =================================================================== --- include/rdma/ib_mad.h (revision 3345) +++ include/rdma/ib_mad.h (working copy) @@ -180,6 +180,26 @@ struct ib_vendor_mad { u8 data[IB_MGMT_VENDOR_DATA]; }; +struct ib_class_port_info +{ + u8 base_version; + u8 class_version; + __be16 capability_mask; + __be32 resp_time_value; + u8 redirect_gid[16]; + __be32 redirect_tcslfl; + __be16 redirect_lid; + __be16 redirect_p_key; + __be32 redirect_qp; + __be32 redirect_q_key; + u8 trap_gid[16]; + __be32 trap_tcslfl; + __be16 trap_lid; + __be16 trap_p_key; + __be32 trap_hlqp; + __be32 trap_q_key; +}; + /** * ib_mad_send_buf - MAD data buffer and work request for sends. * @mad: References an allocated MAD data buffer. 
The size of the data From mshefty at ichips.intel.com Fri Sep 9 13:52:04 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 09 Sep 2005 13:52:04 -0700 Subject: [openib-general] Re: [PATCH] [CM] 2/2 Fix CM redirection in SRP In-Reply-To: <5264tage46.fsf@cisco.com> References: <5264tage46.fsf@cisco.com> Message-ID: <4321F5F4.90202@ichips.intel.com> Roland Dreier wrote: > Sean/Hal, does this make sense as a place to stick the ClassPortInfo > structure (initially for use with CM redirection)? > > If so I'll go ahead and commit it and put it in my git tree as well. Looks good to me. - Sean From jlentini at netapp.com Fri Sep 9 14:38:30 2005 From: jlentini at netapp.com (James Lentini) Date: Fri, 9 Sep 2005 17:38:30 -0400 (EDT) Subject: [openib-general] Re: [PATCH] [DAPL] update to match new event processing APIs In-Reply-To: References: Message-ID: On Thu, 8 Sep 2005, Sean Hefty wrote: > The following patch updates DAPL to match the verbs and CM event > processing APIs. Committed in revision 3349. From halr at voltaire.com Fri Sep 9 14:54:25 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 17:54:25 -0400 Subject: [openib-general] [PATCH] SA: Move SA attributes to ib_sa.h Message-ID: <1126302861.4401.7921.camel@hal.voltaire.com> SA: Move SA attributes to ib_sa.h so they are accessible outside sa_query.c. Also, remove the deprecated attributes (IB_SA_ATTR_RANGE_REC and IB_SA_ATTR_MC_GROUP_REC) and add the missing one (IB_SA_ATTR_INFORM_INFO_REC).
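Once the attribute IDs live in `ib_sa.h`, code other than `sa_query.c` can refer to them by name. As a hedged, user-space illustration (not part of the patch that follows), the sketch below copies a subset of the values from that patch's `ib_sa.h` hunk and maps an attribute ID to a printable name for log messages. The helper `sa_attr_str()`, the `struct sa_attr_name` table and the name strings are hypothetical. The sketch assumes the ID is already in host byte order; the MAD header carries `attr_id` big-endian, so a caller would convert it first.

```c
#include <stddef.h>
#include <stdint.h>

/* Subset of the SA attribute IDs, values copied from the ib_sa.h hunk. */
enum {
	IB_SA_ATTR_CLASS_PORTINFO  = 0x01,
	IB_SA_ATTR_NOTICE          = 0x02,
	IB_SA_ATTR_INFORM_INFO     = 0x03,
	IB_SA_ATTR_NODE_REC        = 0x11,
	IB_SA_ATTR_PORT_INFO_REC   = 0x12,
	IB_SA_ATTR_SERVICE_REC     = 0x31,
	IB_SA_ATTR_PATH_REC        = 0x35,
	IB_SA_ATTR_MC_MEMBER_REC   = 0x38,
	IB_SA_ATTR_INFORM_INFO_REC = 0xf3
};

/* Hypothetical lookup table: attribute ID (host order) -> printable name. */
struct sa_attr_name {
	uint16_t id;
	const char *name;
};

static const struct sa_attr_name sa_attr_names[] = {
	{ IB_SA_ATTR_CLASS_PORTINFO,  "ClassPortInfo" },
	{ IB_SA_ATTR_NOTICE,          "Notice" },
	{ IB_SA_ATTR_INFORM_INFO,     "InformInfo" },
	{ IB_SA_ATTR_NODE_REC,        "NodeRecord" },
	{ IB_SA_ATTR_PORT_INFO_REC,   "PortInfoRecord" },
	{ IB_SA_ATTR_SERVICE_REC,     "ServiceRecord" },
	{ IB_SA_ATTR_PATH_REC,        "PathRecord" },
	{ IB_SA_ATTR_MC_MEMBER_REC,   "MCMemberRecord" },
	{ IB_SA_ATTR_INFORM_INFO_REC, "InformInfoRecord" },
};

/* Hypothetical helper: a linear scan is fine for a table this small.
 * IDs not in the table (e.g. the removed 0x34 and 0x37) map to "unknown". */
static const char *sa_attr_str(uint16_t attr_id)
{
	size_t i;

	for (i = 0; i < sizeof(sa_attr_names) / sizeof(sa_attr_names[0]); i++)
		if (sa_attr_names[i].id == attr_id)
			return sa_attr_names[i].name;
	return "unknown";
}
```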
Signed-off-by: Hal Rosenstock Index: ib_sa.h =================================================================== --- ib_sa.h (revision 3349) +++ ib_sa.h (working copy) @@ -53,6 +53,31 @@ enum { IB_SA_METHOD_GET_TRACE_TBL = 0x13 }; +enum { + IB_SA_ATTR_CLASS_PORTINFO = 0x01, + IB_SA_ATTR_NOTICE = 0x02, + IB_SA_ATTR_INFORM_INFO = 0x03, + IB_SA_ATTR_NODE_REC = 0x11, + IB_SA_ATTR_PORT_INFO_REC = 0x12, + IB_SA_ATTR_SL2VL_REC = 0x13, + IB_SA_ATTR_SWITCH_REC = 0x14, + IB_SA_ATTR_LINEAR_FDB_REC = 0x15, + IB_SA_ATTR_RANDOM_FDB_REC = 0x16, + IB_SA_ATTR_MCAST_FDB_REC = 0x17, + IB_SA_ATTR_SM_INFO_REC = 0x18, + IB_SA_ATTR_LINK_REC = 0x20, + IB_SA_ATTR_GUID_INFO_REC = 0x30, + IB_SA_ATTR_SERVICE_REC = 0x31, + IB_SA_ATTR_PARTITION_REC = 0x33, + IB_SA_ATTR_PATH_REC = 0x35, + IB_SA_ATTR_VL_ARB_REC = 0x36, + IB_SA_ATTR_MC_MEMBER_REC = 0x38, + IB_SA_ATTR_TRACE_REC = 0x39, + IB_SA_ATTR_MULTI_PATH_REC = 0x3a, + IB_SA_ATTR_SERVICE_ASSOC_REC = 0x3b, + IB_SA_ATTR_INFORM_INFO_REC = 0xf3 +}; + enum ib_sa_selector { IB_SA_GTE = 0, IB_SA_LTE = 1, Index: sa_query.c =================================================================== --- sa_query.c (revision 3349) +++ sa_query.c (working copy) @@ -113,32 +113,6 @@ static DEFINE_IDR(query_idr); static spinlock_t tid_lock; static u32 tid; -enum { - IB_SA_ATTR_CLASS_PORTINFO = 0x01, - IB_SA_ATTR_NOTICE = 0x02, - IB_SA_ATTR_INFORM_INFO = 0x03, - IB_SA_ATTR_NODE_REC = 0x11, - IB_SA_ATTR_PORT_INFO_REC = 0x12, - IB_SA_ATTR_SL2VL_REC = 0x13, - IB_SA_ATTR_SWITCH_REC = 0x14, - IB_SA_ATTR_LINEAR_FDB_REC = 0x15, - IB_SA_ATTR_RANDOM_FDB_REC = 0x16, - IB_SA_ATTR_MCAST_FDB_REC = 0x17, - IB_SA_ATTR_SM_INFO_REC = 0x18, - IB_SA_ATTR_LINK_REC = 0x20, - IB_SA_ATTR_GUID_INFO_REC = 0x30, - IB_SA_ATTR_SERVICE_REC = 0x31, - IB_SA_ATTR_PARTITION_REC = 0x33, - IB_SA_ATTR_RANGE_REC = 0x34, - IB_SA_ATTR_PATH_REC = 0x35, - IB_SA_ATTR_VL_ARB_REC = 0x36, - IB_SA_ATTR_MC_GROUP_REC = 0x37, - IB_SA_ATTR_MC_MEMBER_REC = 0x38, - IB_SA_ATTR_TRACE_REC = 0x39, - 
IB_SA_ATTR_MULTI_PATH_REC = 0x3a, - IB_SA_ATTR_SERVICE_ASSOC_REC = 0x3b -}; - #define PATH_REC_FIELD(field) \ .struct_offset_bytes = offsetof(struct ib_sa_path_rec, field), \ .struct_size_bytes = sizeof ((struct ib_sa_path_rec *) 0)->field, \ From rolandd at cisco.com Fri Sep 9 15:21:36 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 15:21:36 -0700 Subject: [openib-general] Re: [PATCH] SA: Move SA attributes to ib_sa.h In-Reply-To: <1126302861.4401.7921.camel@hal.voltaire.com> (Hal Rosenstock's message of "09 Sep 2005 17:54:25 -0400") References: <1126302861.4401.7921.camel@hal.voltaire.com> Message-ID: <52wtlpg9vz.fsf@cisco.com> This seems OK, so I applied it. But where is this reorganization leading? - R. From viswa.krish at gmail.com Fri Sep 9 15:21:44 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Fri, 9 Sep 2005 15:21:44 -0700 Subject: [openib-general] completion Q overflow error/panic Message-ID: <4df28be4050909152118f3e947@mail.gmail.com> I modified the cmpost program to use separate send and receive completion queues. The cmpost server acts as an echo server, echoing back anything it receives, and the client program keeps sending packets. The test works fine up to around 600 connections. Beyond 600 connections, I start to see ibv_post_send errors. I added some debug messages in libmthca/src/qp.c where the wq_overflow check is made, and the queue is in fact overflowing. I checked the code to make sure all the send descriptors are recovered by the cq_poll operation. The wc.status field is also checked for errors. I am attaching the modified code. bash-3.00$ svn info Path: .
URL: https://openib.org/svn/gen2/trunk Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd Revision: 3344 Node Kind: directory Schedule: normal Last Changed Author: jlentini Last Changed Rev: 3344 Last Changed Date: 2005-09-08 16:39:25 -0700 (Thu, 08 Sep 2005) To run the test, compile the code: cc -o cmpost cmpost.c -libcm -libverbs -libat $ cmpost -n 1024 <=== as server $ cmpost -c -n 1024 -l -g After some time you start seeing post_send errors. On my system, up to 600 connections work fine. While running the test I saw a panic a couple of times, but it is difficult to reproduce: kernel BUG at include/asm/spinlock.h:149! invalid operand: 0000 [#1] SMP Modules linked in: nfs nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbdsd_mod CPU: 1 EIP: 0060:[] Not tainted VLI EFLAGS: 00010086 (2.6.13) EIP is at _spin_lock_irqsave+0x47/0x51 eax: 00000011 ebx: 00000282 ecx: c035950c edx: 00000082 esi: f7d82010 edi: 00000000 ebp: f6792c80 esp: c1a33ed0 ds: 007b es: 007b ss: 0068 Process ib_mad1 (pid: 308, threadinfo=c1a32000 task=f7e3c540) Stack: c03123ee c0276963 f6792c80 f7d82010 c0276963 f79a6adc f7974b00 00000001 c1a33f0c f7912e00 f7df2000 f7df4200 c1a33f0c 00000292 c0276b96 f6792c80 00000000 00000000 00000000 b93e2c00 00000128 00000296 00000402 00000001 Call Trace: [] ib_mad_send_done_handler+0x72/0x11e [] ib_mad_send_done_handler+0x72/0x11e [] ib_mad_completion_handler+0x80/0x8d [] wait_noreap_copyout+0x55/0xbe [] worker_thread+0x1b0/0x23a [] schedule+0x5d3/0xbdf [] ib_mad_completion_handler+0x0/0x8d [] default_wake_function+0x0/0xc [] default_wake_function+0x0/0xc [] worker_thread+0x0/0x23a [] kthread+0x8a/0xb2 [] kthread+0x0/0xb2 [] kernel_thread_helper+0x5/0xb Code: 00 00 74 01 fb f3 90 80 3e 00 7e f9 fa eb e8 83 c4 08 89 d8 5b 5e c3 8b 44 24 10 c7 04 24 ee 23 31 c0 89 44 24 04 e8 2f e7 e1 ff <0f> 0b 95 00 39 1c 31 c0 eb c2 53 89 c3 83 ec 08 fa 81 78 04 ad -Viswa -------------- next part -------------- A non-text attachment was scrubbed... Name: cmpost.c Type: application/octet-stream Size: 18912 bytes Desc: not available From viswa.krish at gmail.com Fri Sep 9 15:30:05 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Fri, 9 Sep 2005 15:30:05 -0700 Subject: [openib-general] completion Q overflow error/panic Message-ID: <4df28be40509091530757a3581@mail.gmail.com> Somehow gmail ate away the main content of my mail. Here it is: I modified the cmpost program to use individual send/receive completion queues. The cmpost server acts as an echo server, echoing back anything it receives. The client program keeps sending packets. The test works fine up to around 600 connections. After 600 connections, I start to see ibv_post_send errors. I added some debug messages in libmthca/src/qp.c where a check is made for wq_overflow, and in fact the queue is overflowing. I checked the code to make sure all the send descriptors are recovered with a cq_poll operation. The wc.status field is also checked for any errors. I am attaching the modified code. bash-3.00$ svn info Path: . URL: https://openib.org/svn/gen2/trunk Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd Revision: 3344 Node Kind: directory Schedule: normal Last Changed Author: jlentini Last Changed Rev: 3344 Last Changed Date: 2005-09-08 16:39:25 -0700 (Thu, 08 Sep 2005) To run the test, compile the code: cc -o cmpost cmpost.c -libcm -libverbs -libat $ cmpost -n 1024 <=== as server $ cmpost -c -n 1024 -l -g After some time you start seeing post_send errors. On my system, up to 600 connections work fine. While running the test I saw a panic a couple of times, but it is difficult to reproduce: kernel BUG at include/asm/spinlock.h:149!
invalid operand: 0000 [#1] SMP Modules linked in: nfs nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbdsd_mod CPU: 1 EIP: 0060:[] Not tainted VLI EFLAGS: 00010086 (2.6.13) EIP is at _spin_lock_irqsave+0x47/0x51 eax: 00000011 ebx: 00000282 ecx: c035950c edx: 00000082 esi: f7d82010 edi: 00000000 ebp: f6792c80 esp: c1a33ed0 ds: 007b es: 007b ss: 0068 Process ib_mad1 (pid: 308, threadinfo=c1a32000 task=f7e3c540) Stack: c03123ee c0276963 f6792c80 f7d82010 c0276963 f79a6adc f7974b00 00000001 c1a33f0c f7912e00 f7df2000 f7df4200 c1a33f0c 00000292 c0276b96 f6792c80 00000000 00000000 00000000 b93e2c00 00000128 00000296 00000402 00000001 Call Trace: [] ib_mad_send_done_handler+0x72/0x11e [] ib_mad_send_done_handler+0x72/0x11e [] ib_mad_completion_handler+0x80/0x8d [] wait_noreap_copyout+0x55/0xbe [] worker_thread+0x1b0/0x23a [] schedule+0x5d3/0xbdf [] ib_mad_completion_handler+0x0/0x8d [] default_wake_function+0x0/0xc [] default_wake_function+0x0/0xc [] worker_thread+0x0/0x23a [] kthread+0x8a/0xb2 [] kthread+0x0/0xb2 [] kernel_thread_helper+0x5/0xb Code: 00 00 74 01 fb f3 90 80 3e 00 7e f9 fa eb e8 83 c4 08 89 d8 5b 5e c3 8b 44 24 10 c7 04 24 ee 23 31 c0 89 44 24 04 e8 2f e7 e1 ff <0f> 0b 95 00 39 1c 31 c0 eb c2 53 89 c3 83 ec 08 fa 81 78 04 ad -Viswa From halr at voltaire.com Fri Sep 9 15:36:58 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 18:36:58 -0400 Subject: [openib-general] Re: [PATCH] SA: Move SA attributes to ib_sa.h In-Reply-To: <52wtlpg9vz.fsf@cisco.com> References: <1126302861.4401.7921.camel@hal.voltaire.com> <52wtlpg9vz.fsf@cisco.com> Message-ID: <1126305418.4401.8046.camel@hal.voltaire.com> On Fri, 2005-09-09 at 18:21, Roland Dreier wrote: > This seems OK, so I applied it. But where is this reorganization leading? Again, this was for a madeye change that is forthcoming to decode SA packets.
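Hal's stated goal is for madeye to decode SA packets by attribute ID. A minimal sketch of such a decoder, assuming a simple lookup table over a few of the attribute IDs from the enum moved into ib_sa.h; the table, the name strings, sa_attr_name() and sa_attr_name_demo() are hypothetical and not madeye's actual code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A subset of the SA attribute IDs from the enum moved into ib_sa.h. */
enum {
	IB_SA_ATTR_NODE_REC        = 0x11,
	IB_SA_ATTR_PORT_INFO_REC   = 0x12,
	IB_SA_ATTR_SERVICE_REC     = 0x31,
	IB_SA_ATTR_PATH_REC        = 0x35,
	IB_SA_ATTR_MC_MEMBER_REC   = 0x38,
	IB_SA_ATTR_INFORM_INFO_REC = 0xf3
};

/* Hypothetical decode table: attribute ID -> printable name. */
static const struct {
	unsigned short id;
	const char *name;
} sa_attr_names[] = {
	{ IB_SA_ATTR_NODE_REC,        "NodeRecord" },
	{ IB_SA_ATTR_PORT_INFO_REC,   "PortInfoRecord" },
	{ IB_SA_ATTR_SERVICE_REC,     "ServiceRecord" },
	{ IB_SA_ATTR_PATH_REC,        "PathRecord" },
	{ IB_SA_ATTR_MC_MEMBER_REC,   "MCMemberRecord" },
	{ IB_SA_ATTR_INFORM_INFO_REC, "InformInfoRecord" },
};

/*
 * Look up the name for an attribute ID given in host byte order.
 * Returns "Unknown" for IDs that are not in the table.
 */
const char *sa_attr_name(unsigned short attr_id)
{
	size_t i;

	for (i = 0; i < sizeof(sa_attr_names) / sizeof(sa_attr_names[0]); i++)
		if (sa_attr_names[i].id == attr_id)
			return sa_attr_names[i].name;
	return "Unknown";
}

/* Usage: a couple of lookups checked with assert(). */
void sa_attr_name_demo(void)
{
	assert(strcmp(sa_attr_name(IB_SA_ATTR_PATH_REC), "PathRecord") == 0);
	assert(strcmp(sa_attr_name(0x99), "Unknown") == 0);
}
```

The MAD header carries the attribute ID in network byte order, so a caller would convert it first (for example with ntohs() in userspace or be16_to_cpu() in the kernel) before doing the lookup.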
-- Hal From halr at voltaire.com Fri Sep 9 16:02:36 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 19:02:36 -0400 Subject: [openib-general] Re: [IBAT][PATCH] simplify debug/warn enable/disable In-Reply-To: References: Message-ID: <1126306952.4401.8123.camel@hal.voltaire.com> On Fri, 2005-09-09 at 16:07, James Lentini wrote: > Simplify the enabling/disabling of debugging and warning messages in > IBAT. Thanks. Applied. From rolandd at cisco.com Fri Sep 9 16:08:34 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 16:08:34 -0700 Subject: [openib-general] completion Q overflow error/panic In-Reply-To: <4df28be4050909152118f3e947@mail.gmail.com> (Viswanath Krishnamurthy's message of "Fri, 9 Sep 2005 15:21:44 -0700") References: <4df28be4050909152118f3e947@mail.gmail.com> Message-ID: <52ll25g7pp.fsf@cisco.com> Thanks for the excellent bug report. I'll try your code and see if I can reproduce the problem. If I can, then I should be able to fix the bugs. - R. From rolandd at cisco.com Fri Sep 9 16:12:04 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 16:12:04 -0700 Subject: [openib-general] netdev reference counting problem with ib_at Message-ID: <52hdctg7jv.fsf@cisco.com> If I configure an IPoIB interface, load the ib_at module, and then try to remove the ib_ipoib module, the module removal gets stuck with an endless stream of unregister_netdevice: waiting for ib0 to become free. Usage count = 1 This is really bad, because as far as I can tell, there's no way to recover from this situation other than rebooting. ib_at needs to be reworked so that it doesn't keep perpetual references to netdevs. - R. 
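The hang follows from how netdev reference counting works: device unregistration waits until the device's reference count drops to zero, printing the "waiting for ib0 to become free" message while it waits, so a reference taken with dev_hold() and never released with dev_put() blocks removal indefinitely. A userspace model of that rule, in which fake_netdev, fake_dev_hold(), fake_dev_put(), try_unregister() and refcount_demo() are hypothetical stand-ins rather than kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a net_device's reference count. */
struct fake_netdev {
	int refcnt;
};

static void fake_dev_hold(struct fake_netdev *dev)
{
	dev->refcnt++;
}

static void fake_dev_put(struct fake_netdev *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
}

/*
 * The kernel keeps waiting (and printing "Usage count = N") until the
 * count reaches zero; this model only reports whether it could finish.
 */
bool try_unregister(const struct fake_netdev *dev)
{
	return dev->refcnt == 0;
}

/* Usage: a cached reference blocks unregistration until it is dropped. */
void refcount_demo(void)
{
	struct fake_netdev ib0 = { 0 };

	fake_dev_hold(&ib0);            /* e.g. a module caches a pointer to ib0 */
	assert(!try_unregister(&ib0));  /* "Usage count = 1": removal would hang */
	fake_dev_put(&ib0);             /* holder releases its reference */
	assert(try_unregister(&ib0));
}
```

One conventional remedy (not necessarily what the ib_at rework will choose) is for the holder to register a netdevice notifier and drop its cached references when NETDEV_UNREGISTER arrives.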
From rolandd at cisco.com Fri Sep 9 16:41:15 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 16:41:15 -0700 Subject: [openib-general] libibat/libibcm build mess Message-ID: <528xy5g678.fsf@cisco.com> Building libibat and libibcm right now is somewhat of a mess: - libibat and libibcm both have an include file named infiniband/at.h. It's actually installed by libibcm, but the version in libibat has some structures not defined in the libibcm version. - the cmpost example in libibcm uses , so in theory the libibcm build depends on libibat. But the libibcm configure script doesn't check for this. - include/infiniband/sa.h is not in the libibat EXTRA_DIST list, so libibat tarballs are not buildable. I'm not sure exactly how to untangle all of this. I don't think we want build dependencies between libibcm and libibat. So I'm not sure of a sensible home for the include file. As a side note, does it make sense for libibcm and libibat to install the cm_abi.h and at_abi.h files? Those ABI definitions should probably be kept internal to the libraries. libibverbs only exports its ABI because the individual device-specific drivers need to use it too. - R. From mshefty at ichips.intel.com Fri Sep 9 16:53:10 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 09 Sep 2005 16:53:10 -0700 Subject: [openib-general] libibat/libibcm build mess In-Reply-To: <528xy5g678.fsf@cisco.com> References: <528xy5g678.fsf@cisco.com> Message-ID: <43222066.2010403@ichips.intel.com> Roland Dreier wrote: > - libibat and libibcm both have an include file named > infiniband/at.h. It's actually installed by libibcm, but the > version in libibat has some structures not defined in the libibcm > version. I think you meant sa.h. I agree that there should be a single file. Did you have an idea of how to handle this? It seems more natural to me for this file to be located with umad, but I don't know if we want that dependency either. 
> I'm not sure exactly how to untangle all of this. I don't think we > want build dependencies between libibcm and libibat. So I'm not sure > of a sensible home for the include file. The only reason for the dependency is the ucmpost sample. I think it's useful to keep that example around somewhere, but maybe we can move it outside libibcm. > As a side note, does it make sense for libibcm and libibat to install > the cm_abi.h and at_abi.h files? Those ABI definitions should > probably be kept internal to the libraries. libibverbs only exports > its ABI because the individual device-specific drivers need to use it too. I can't think of a reason why cm_abi.h would need to be installed. - Sean From rolandd at cisco.com Fri Sep 9 16:57:27 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 16:57:27 -0700 Subject: [openib-general] libibat/libibcm build mess In-Reply-To: <528xy5g678.fsf@cisco.com> (Roland Dreier's message of "Fri, 09 Sep 2005 16:41:15 -0700") References: <528xy5g678.fsf@cisco.com> Message-ID: <521x3xg5g8.fsf@cisco.com> We probably want a patch like the below for libibcm. I don't like it, because it makes libibcm officially depend on libibat, but on the other hand it's just making the current situation explicit. Only this doesn't actually work: if we build libibcm after installing libibat, the check for fails because at.h includes sa.h, and libibat doesn't install sa.h. - R. --- libibcm/configure.in (revision 3345) +++ libibcm/configure.in (working copy) @@ -13,10 +13,16 @@ dnl Checks for programs AC_PROG_CC dnl Checks for libraries +AC_CHECK_LIB(ibverbs, ibv_get_devices, [], + AC_MSG_ERROR([ibv_get_devices() not found. libibcm requires libibverbs.])) +AC_CHECK_LIB(ibat, ib_at_route_by_ip, [], + AC_MSG_ERROR([ib_at_route_by_ip() not found. libibcm requires libibat.])) dnl Checks for header files. AC_CHECK_HEADER(infiniband/verbs.h, [], AC_MSG_ERROR([ not found.
Is libibverbs installed?])) +AC_CHECK_HEADER(infiniband/at.h, [], + AC_MSG_ERROR([ not found. Is libibat installed?])) AC_HEADER_STDC dnl Checks for typedefs, structures, and compiler characteristics. --- libibcm/Makefile.am (revision 3355) +++ libibcm/Makefile.am (working copy) @@ -20,9 +20,7 @@ src_libibcm_la_LDFLAGS = -avoid-version bin_PROGRAMS = examples/ucmpost examples_ucmpost_SOURCES = examples/cmpost.c -examples_ucmpost_LDADD = $(top_builddir)/src/libibcm.la \ - $(libdir)/libibverbs.la \ - $(libdir)/libibat.la +examples_ucmpost_LDADD = $(top_builddir)/src/libibcm.la libibcmincludedir = $(includedir)/infiniband From rolandd at cisco.com Fri Sep 9 17:02:14 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 17:02:14 -0700 Subject: [openib-general] libibat/libibcm build mess In-Reply-To: <43222066.2010403@ichips.intel.com> (Sean Hefty's message of "Fri, 09 Sep 2005 16:53:10 -0700") References: <528xy5g678.fsf@cisco.com> <43222066.2010403@ichips.intel.com> Message-ID: <52wtlpeqnt.fsf@cisco.com> Sean> I think you meant sa.h. I agree that there should be a Sean> single file. Did you have an idea of how to handle this? Sean> It seems more natural to me for this file to be located with Sean> umad, but I don't know if we want that dependency either. Yep, sa.h, sorry. As I said before, I'm not sure how to handle it either. Oh well. Sean> The only reason for the dependency is because of the ucmpost Sean> sample. I think it's useful to keep that example around Sean> somewhere, but maybe we can move it outside libibcm. That would work. Maybe we should create some kind of examples package that has the spiffy apps that show everything off. - R. 
From halr at voltaire.com Fri Sep 9 17:00:07 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 20:00:07 -0400 Subject: [openib-general] libibat/libibcm build mess In-Reply-To: <528xy5g678.fsf@cisco.com> References: <528xy5g678.fsf@cisco.com> Message-ID: <1126310407.4382.45.camel@hal.voltaire.com> On Fri, 2005-09-09 at 19:41, Roland Dreier wrote: > Building libibat and libibcm right now is somewhat of a mess: > > - libibat and libibcm both have an include file named > infiniband/at.h. It's actually installed by libibcm, but the > version in libibat has some structures not defined in the libibcm > version. I think you mean sa.h rather than at.h. I believe the one in libibat is a superset of the one in libibcm in that it includes MCMemberRecords and ServiceRecord structures. More SA structures could be added. > - the cmpost example in libibcm uses , so in theory > the libibcm build depends on libibat. But the libibcm configure > script doesn't check for this. This is a recent change. > - include/infiniband/sa.h is not in the libibat EXTRA_DIST list, so > libibat tarballs are not buildable. Guess this should be added then. > I'm not sure exactly how to untangle all of this. I don't think we > want build dependencies between libibcm and libibat. Agreed. > So I'm not sure > of a sensible home for the include file. Perhaps there should be a libibsa ? > As a side note, does it make sense for libibcm and libibat to install > the cm_abi.h and at_abi.h files? Those ABI definitions should > probably be kept internal to the libraries. libibverbs only exports > its ABI because the individual device-specific drivers need to use it too. Sounds like there is no good reason for this. 
-- Hal From halr at voltaire.com Fri Sep 9 17:04:07 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 20:04:07 -0400 Subject: [openib-general] libibat/libibcm build mess In-Reply-To: <521x3xg5g8.fsf@cisco.com> References: <528xy5g678.fsf@cisco.com> <521x3xg5g8.fsf@cisco.com> Message-ID: <1126310570.4382.55.camel@hal.voltaire.com> On Fri, 2005-09-09 at 19:57, Roland Dreier wrote: > We probably want a patch like the below for libibcm. I don't like it, > because it makes libibcm officially depend on libibat, but on the > other hand it's just making the current situation explicit. > > Only this doesn't actually work: if we build libibcm after installing > libibat, the check for fails because at.h includes > sa.h, and libibat doesn't install sa.h. So should we make libibat install sa.h ? Does that solve this ? -- Hal From rolandd at cisco.com Fri Sep 9 17:10:15 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 17:10:15 -0700 Subject: [openib-general] libibat/libibcm build mess In-Reply-To: <1126310407.4382.45.camel@hal.voltaire.com> (Hal Rosenstock's message of "09 Sep 2005 20:00:07 -0400") References: <528xy5g678.fsf@cisco.com> <1126310407.4382.45.camel@hal.voltaire.com> Message-ID: <52slwdeqag.fsf@cisco.com> Hal> Perhaps there should be a libibsa ? That would be kind of silly unless there was something more than the sa.h include file to distribute. Maybe in the opposite direction it would make sense to fold libibat into libibcm. I suspect that there's not that much use for AT except for connecting. - R. 
From rolandd at cisco.com Fri Sep 9 17:11:44 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 17:11:44 -0700 Subject: [openib-general] libibat/libibcm build mess In-Reply-To: <1126310570.4382.55.camel@hal.voltaire.com> (Hal Rosenstock's message of "09 Sep 2005 20:04:07 -0400") References: <528xy5g678.fsf@cisco.com> <521x3xg5g8.fsf@cisco.com> <1126310570.4382.55.camel@hal.voltaire.com> Message-ID: <52oe71eq7z.fsf@cisco.com> Hal> So should we make libibat install sa.h ? Does that solve this? Yes, at least for the short term, one solution is to take sa.h out of libibcm completely and have libibat install it. And of course make sure it gets added to the EXTRA_DIST. - R. From rolandd at cisco.com Fri Sep 9 17:14:46 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 17:14:46 -0700 Subject: different CM panic (was: [openib-general] completion Q overflow error/panic) In-Reply-To: <4df28be4050909152118f3e947@mail.gmail.com> (Viswanath Krishnamurthy's message of "Fri, 9 Sep 2005 15:21:44 -0700") References: <4df28be4050909152118f3e947@mail.gmail.com> Message-ID: <52k6hpeq2x.fsf@cisco.com> No time to analyze this just now, but trying to run Viswa's test, I got the following CM panic: [ 3335.553628] Unable to handle kernel paging request at 000000013a7f0e18 RIP: [ 3335.567573] {rb_erase+213} [ 3335.587832] PGD 13cc64067 PUD 0 [ 3335.597568] Oops: 0000 [1] SMP [ 3335.607051] CPU 3 [ 3335.613110] Modules linked in: ib_uat ib_at ib_ucm ib_cm ib_uverbs ib_ipoib ib_sa ib_mthca ib_mad ib_core nfsd exportfs ipv6 nfs lockd sunrpc ehci_hcd ohci_hcd i2c_nforce2 i2c_core tg3 psmouse ide_generic ide_disk unix [ 3335.671837] Pid: 3050, comm: ib_cm/3 Not tainted 2.6.13 [ 3335.687608] RIP: 0010:[] {rb_erase+213} [ 3335.708115] RSP: 0018:ffff81007e1fbd10 EFLAGS: 00010006 [ 3335.724688] RAX: ffff8101382b3200 RBX: ffff81013a7f0e00 RCX: ffff8101382b3200 [ 3335.746243] RDX: 0000000000000000 RSI: 000000013a7f0e00 RDI: ffff81013a7f0f08 [ 3335.767791] RBP: 
0000000000000082 R08: ffffffff881da2b8 R09: 0000000002e97c10 [ 3335.789347] R10: ffff81013a7f0f00 R11: ffff81007e67c800 R12: 0000000000000008 [ 3335.810891] R13: 0000000000000000 R14: 0000000000000000 R15: ffff81013ff74518 [ 3335.832422] FS: 00002aaaab2c5700(0000) GS:ffffffff804cc980(0000) knlGS:0000000000000000 [ 3335.856872] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b [ 3335.874220] CR2: 000000013a7f0e18 CR3: 000000013eb56000 CR4: 00000000000006e0 [ 3335.895767] Process ib_cm/3 (pid: 3050, threadinfo ffff81007e1fa000, task ffff8100bfe9e2f0) [ 3335.920986] Stack: ffffffff881d0862 ffff81007e67c800 ffff8101382b3200 0000000000000000 [ 3335.944670] ffffffff881d098e ffff8101382b3200 ffffffff881d2ee5 0000000000000000 [ 3335.968894] 0000000000000287 00000004382b30e8 [ 3335.984157] Call Trace:{:ib_cm:cm_cleanup_timewait+60} {:ib_cm:cm_reset_to_idle+28} [ 3336.016783] {:ib_cm:ib_send_cm_rej+174} {:ib_cm:ib_destroy_cm_id+249} [ 3336.047562] {_read_unlock_irqrestore+5} {:ib_cm:cm_init_av_by_path+115} [ 3336.078857] {:ib_cm:cm_req_handler+1050} {:ib_cm:cm_work_handler+0} [ 3336.109107] {:ib_cm:cm_work_handler+31} {worker_thread+424} [ 3336.137279] {default_wake_function+0} {default_wake_function+0} [ 3336.166478] {keventd_create_kthread+0} {worker_thread+0} [ 3336.193836] {keventd_create_kthread+0} {kthread+129} [ 3336.220142] {child_rip+8} {keventd_create_kthread+0} [ 3336.246451] {kthread+0} {child_rip+0} [ 3336.268811] [ 3336.275661] [ 3336.275662] Code: 48 39 7e 18 75 06 48 89 4e 18 eb 09 48 89 4e 10 eb 03 49 89 From halr at voltaire.com Fri Sep 9 17:25:12 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 20:25:12 -0400 Subject: [openib-general] libibat/libibcm build mess In-Reply-To: <52slwdeqag.fsf@cisco.com> References: <528xy5g678.fsf@cisco.com> <1126310407.4382.45.camel@hal.voltaire.com> <52slwdeqag.fsf@cisco.com> Message-ID: <1126311828.4382.147.camel@hal.voltaire.com> On Fri, 2005-09-09 at 20:10, Roland Dreier wrote: > Hal> Perhaps there 
should be a libibsa ? > > That would be kind of silly unless there was something more than the > sa.h include file to distribute. Someday, there might be a user space SA. There are needs in this area that are currently unmet or poorly met. > Maybe in the opposite direction it would make sense to fold libibat > into libibcm. I suspect that there's not that much use for AT except > for connecting. Yes, AT is used for connecting, so this might be a better approach, and it would eliminate a library. -- Hal From halr at voltaire.com Fri Sep 9 17:28:26 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 09 Sep 2005 20:28:26 -0400 Subject: [openib-general] libibat/libibcm build mess In-Reply-To: <52oe71eq7z.fsf@cisco.com> References: <528xy5g678.fsf@cisco.com> <521x3xg5g8.fsf@cisco.com> <1126310570.4382.55.camel@hal.voltaire.com> <52oe71eq7z.fsf@cisco.com> Message-ID: <1126311890.4382.154.camel@hal.voltaire.com> On Fri, 2005-09-09 at 20:11, Roland Dreier wrote: > Hal> So should we make libibat install sa.h ? Does that solve this? > > Yes, at least for the short term, one solution is to take sa.h out of > libibcm completely and have libibat install it. And of course make > sure it gets added to the EXTRA_DIST. Shall I take care of this or do you have it covered ? -- Hal From viswa.krish at gmail.com Fri Sep 9 17:58:02 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Fri, 9 Sep 2005 17:58:02 -0700 Subject: [openib-general] completion Q overflow error/panic In-Reply-To: <52ll25g7pp.fsf@cisco.com> References: <4df28be4050909152118f3e947@mail.gmail.com> <52ll25g7pp.fsf@cisco.com> Message-ID: <4df28be40509091758227211b8@mail.gmail.com> Some more info: this also happens at the kernel level. I have a small kernel module which does the echo reply.
After about 100-200 connections, I start to see the following messages: ib_mthca 0000:05:00.0: SQ 590473 full (8 head, 0 tail, 8 max, 0 nreq) ib_mthca 0000:05:00.0: SQ 590477 full (8 head, 0 tail, 8 max, 0 nreq) ib_mthca 0000:05:00.0: SQ 59040c full (8 head, 0 tail, 8 max, 0 nreq) Below 100 connections I do not see any such messages. It looks like, if there is a problem, it exists in both the kernel and userland APIs. -Viswa On 9/9/05, Roland Dreier wrote: > > Thanks for the excellent bug report. I'll try your code and see if I > can reproduce the problem. If I can, then I should be able to fix the > bugs. > > - R. From rolandd at cisco.com Fri Sep 9 20:54:05 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 20:54:05 -0700 Subject: [openib-general] Re: different CM panic In-Reply-To: <52k6hpeq2x.fsf@cisco.com> (Roland Dreier's message of "Fri, 09 Sep 2005 17:14:46 -0700") References: <4df28be4050909152118f3e947@mail.gmail.com> <52k6hpeq2x.fsf@cisco.com> Message-ID: <52br31efxe.fsf@cisco.com> Well, at least I tracked this down to a use-after-free bug in the CM. I went ahead and committed this trivial fix: If the CM REQ handling function gets to error2, then it frees cm_id_priv->timewait_info. But the next line goes through ib_destroy_cm_id() -> ib_send_cm_rej() -> cm_reset_to_idle(), which ends up calling cm_cleanup_timewait(), which dereferences the pointer we just freed.
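The sequence just described is the classic free-without-clearing bug: an error path frees a pointer that a later teardown step still reaches through the owning structure. A simplified userspace model, in which the structs and functions are hypothetical stand-ins for the CM's and the teardown is assumed (as the one-line fix implies) to skip a NULL timewait_info:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the CM structures. */
struct timewait_info {
	int linked;             /* models membership in the CM's lookup trees */
};

struct cm_id_priv {
	struct timewait_info *timewait_info;
};

static int cleanups;            /* counts real dereferences of timewait_info */

/* Models cm_cleanup_timewait(): writes through tw, so tw must be live. */
static void cleanup_timewait(struct timewait_info *tw)
{
	tw->linked = 0;
	cleanups++;
}

/* Models the destroy path: only cleans up a timewait_info that exists. */
static void destroy_id(struct cm_id_priv *p)
{
	if (p->timewait_info) {
		cleanup_timewait(p->timewait_info);
		free(p->timewait_info);
		p->timewait_info = NULL;
	}
}

/* Models the REQ handler's error2/error1 labels after the fix. */
void req_error_path(struct cm_id_priv *p)
{
	free(p->timewait_info);
	p->timewait_info = NULL;   /* without this, destroy_id() touches freed memory */
	destroy_id(p);
}

/* Usage: the error path completes without dereferencing freed memory. */
void uaf_demo(void)
{
	struct cm_id_priv p;

	p.timewait_info = calloc(1, sizeof(*p.timewait_info));
	assert(p.timewait_info != NULL);
	req_error_path(&p);
	assert(p.timewait_info == NULL);
	assert(cleanups == 0);     /* cleanup skipped: nothing left to clean */
}
```

Clearing the pointer right after freeing it turns the later teardown into a no-op for that field, instead of a write to freed memory followed by a double free.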
--- infiniband/core/cm.c (revision 3352) +++ infiniband/core/cm.c (working copy) @@ -1315,6 +1315,7 @@ error3: atomic_dec(&cm_id_priv->refcount cm_deref_id(listen_cm_id_priv); cm_cleanup_timewait(cm_id_priv->timewait_info); error2: kfree(cm_id_priv->timewait_info); + cm_id_priv->timewait_info = NULL; error1: ib_destroy_cm_id(&cm_id_priv->id); return ret; } From rolandd at cisco.com Fri Sep 9 22:02:34 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 09 Sep 2005 22:02:34 -0700 Subject: [openib-general] completion Q overflow error/panic In-Reply-To: <4df28be4050909152118f3e947@mail.gmail.com> (Viswanath Krishnamurthy's message of "Fri, 9 Sep 2005 15:21:44 -0700") References: <4df28be4050909152118f3e947@mail.gmail.com> Message-ID: <527jdpecr9.fsf@cisco.com> I found one bug in your cmpost.c program that could cause CQ overruns. When you create your receive and send CQs, you create them with a cqe value of 5, so they can hold at most 5 entries. However, you create the send and receive work queues so they can hold up to 10 entries, and in fact the code will post up to 8 entries at a time. So it's possible to overflow the CQ. The fix is to create the CQs to have at least as many entries as the work queues -- in other words, change cqe to 10. However, even with this fixed I do see some strange behavior that I'm still debugging. More details on Monday. What HCA firmware version do your systems have? - R. 
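Roland's sizing rule and the ib_mthca "SQ ... full (8 head, 0 tail, 8 max, 0 nreq)" messages reported earlier in the thread both come down to simple queue accounting. A sketch under two assumptions: a work queue has no room for another request once the entries already outstanding (head - tail) plus those queued in the current post (nreq) reach its capacity, and every work request is signaled and so produces one completion. wq_full(), cq_may_overflow() and queue_demo() are hypothetical helpers, not libmthca or libibverbs functions:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True if the work queue has no room for another request. head and tail
 * are free-running counters, so unsigned subtraction handles wraparound.
 */
bool wq_full(unsigned head, unsigned tail, unsigned nreq, unsigned max)
{
	assert(max > 0);
	return head - tail + nreq >= max;
}

/*
 * A CQ with room for cqe entries can overflow if the work queues feeding
 * it can have more signaled requests outstanding than that in total.
 */
bool cq_may_overflow(int cqe, const int *wq_depths, int nqueues)
{
	int i, total = 0;

	for (i = 0; i < nqueues; i++)
		total += wq_depths[i];
	return total > cqe;
}

/* Usage: the numbers from this thread. */
void queue_demo(void)
{
	int send_wq_depth[] = { 10 };

	assert(wq_full(8, 0, 0, 8));                   /* the logged SQ state */
	assert(!wq_full(3, 0, 1, 8));
	assert(cq_may_overflow(5, send_wq_depth, 1));  /* cqe = 5: too small */
	assert(!cq_may_overflow(10, send_wq_depth, 1));/* cqe = 10: enough */
}
```

When several queues share one CQ their depths add up, so each additional QP attached to a CQ raises the cqe it needs.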
From eitan at mellanox.co.il Sat Sep 10 00:24:41 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sat, 10 Sep 2005 10:24:41 +0300 Subject: [openib-general] Re: Some OpenSM 1.8.0 Anomalies In-Reply-To: <1126297216.4401.7637.camel@hal.voltaire.com> References: <506C3D7B14CDD411A52C00025558DED607C307AA@mtlex01.yok.mtl.com> <1126297216.4401.7637.camel@hal.voltaire.com> Message-ID: <43228A39.2060209@mellanox.co.il> Hal Rosenstock wrote: > On Fri, 2005-09-09 at 16:14, Eitan Zahavi wrote: > >>>I think the comment in the code is wrong here and should be gid >> >>rather >> >>>than pkey. I do agree that the pkey sharing needs checking but that >> >>is >> >>>separate. >> >>[EZ] OK we can improve the comments accuracy and readability. As >>always comments written by the developers are somewhat biased by the >>fact he already understands the code. So the first time reader can do >>a better job... > > > Understood. > > >>>I'm still not sure what is the bug you are referring to above >> >>though. >>[EZ] The bug is that the code does not perform the required operations >>to meet o13-17.1.2 compliancy. > > > What is missing ? Is it the part relating to "except if > InformInfo:LIDRangeBegin was 0xFFFF" ? Oops. I was the one who coded it, but I still forgot. Yes, you are right: everything is already there. > > -- Hal From eitan at mellanox.co.il Sat Sep 10 00:26:42 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sat, 10 Sep 2005 10:26:42 +0300 Subject: [openib-general] Re: osmtest/OpenSM: ServiceGID and busy status In-Reply-To: <1126296841.4401.7610.camel@hal.voltaire.com> References: <506C3D7B14CDD411A52C00025558DED607C307A8@mtlex01.yok.mtl.com> <1126296841.4401.7610.camel@hal.voltaire.com> Message-ID: <43228AB2.3080806@mellanox.co.il> Hal Rosenstock wrote: > On Fri, 2005-09-09 at 16:05, Eitan Zahavi wrote: > >>>I may be wrong but: >>>ServiceGID says port GID for service. A port GID must meet the >>>requirements in the addressing section.
>> >>[EZ] I think the spec intentionally leaves this open. The intent is to >>use this as GID but no check is defined. According to your >>interpretation no "proxy" - where node A publish services of node B - >>is allowed > > > Proxy would be allowed. There are 2 possibilities: > 1. Allow valid looking GIDs > or > 2. Only allow GIDs present in the subnet > > >>>For busy, it might be possible but is there one timeout retry >> >>strategy >> >>>or should this be left to the client ? For other errors, I think it >>>needs to be left to the client/application to determine whether it >> >>is in >> >>>error. >> >>[EZ] Agree about the need to pass up the error codes. Just handle the >>BUSY at a lower level which is probably common to most applications. >>But we might at least make it an optional service of the low level? > > > The OpenSM SA client API needs changing to make it optional. Other than > that it is a matter of the default policy: retries and timeout (with > backoff) to be used. We should add it to the todo list. BTW, is the future work presented at the last OpenIB workshop reflected in a todo file? > > -- Hal From ebiederm at xmission.com Sat Sep 10 10:25:27 2005 From: ebiederm at xmission.com (Eric W. Biederman) Date: Sat, 10 Sep 2005 11:25:27 -0600 Subject: [openib-general] [PATCH] af_packet: Allow for > 8 byte hardware addresses. In-Reply-To: <20050811.124916.77057824.davem@davemloft.net> (David S. Miller's message of "Thu, 11 Aug 2005 12:49:16 -0700 (PDT)") References: <1123786117.4403.5835.camel@hal.voltaire.com> <20050811.124916.77057824.davem@davemloft.net> Message-ID: Does this approach look sound? The convention it adopts is that longer addresses will simply extend the hardware address byte arrays at the end of sockaddr_ll and packet_mreq. In making this change, a small information leak was also closed. The code only initializes the hardware address bytes that are used, but all of struct sockaddr_ll was copied to userspace.
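The length convention Eric describes is the offset of sll_addr plus the number of hardware address bytes actually in use. A sketch of that arithmetic, using a local mirror of the struct sockaddr_ll layout rather than the real header so it builds anywhere; sockaddr_ll_mirror, sll_namelen(), sll_addr_ok() and sll_demo() are hypothetical, MAX_ADDR_LEN is 32 as in <linux/netdevice.h>, and the validity check is modeled on the checks the patch describes rather than copied from it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ADDR_LEN 32          /* as in <linux/netdevice.h> */

/* Local mirror of the struct sockaddr_ll field layout. */
struct sockaddr_ll_mirror {
	uint16_t sll_family;
	uint16_t sll_protocol;       /* network byte order */
	int32_t  sll_ifindex;
	uint16_t sll_hatype;
	uint8_t  sll_pkttype;
	uint8_t  sll_halen;
	uint8_t  sll_addr[8];        /* longer addresses run past the end */
};

/* Address length under the new convention: header plus used address bytes. */
size_t sll_namelen(unsigned halen)
{
	return offsetof(struct sockaddr_ll_mirror, sll_addr) + halen;
}

/*
 * Modeled on the described checks: the buffer must be at least a full
 * struct sockaddr_ll, must really contain sll_halen address bytes, and
 * the address must fit the kernel's MAX_ADDR_LEN storage.
 */
int sll_addr_ok(size_t namelen, unsigned halen)
{
	return namelen >= sizeof(struct sockaddr_ll_mirror) &&
	       namelen >= sll_namelen(halen) &&
	       halen <= MAX_ADDR_LEN;
}

/* Usage: Ethernet (6-byte) versus IPoIB (20-byte) hardware addresses. */
void sll_demo(void)
{
	assert(sizeof(struct sockaddr_ll_mirror) == 20);
	assert(sll_namelen(6) == 18);    /* 2 bytes under sizeof */
	assert(sll_namelen(20) == 32);   /* overruns the 8-byte array */
	assert(!sll_addr_ok(20, 20));    /* buffer too short for the address */
	assert(sll_addr_ok(32, 20));
}
```

With a 6-byte Ethernet address the reported length is 18, two bytes short of sizeof(struct sockaddr_ll), which is the compatibility risk Eric mentions; a 20-byte IPoIB address runs 12 bytes past the end of the 8-byte sll_addr array.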
Now we just copy sockaddr_ll to the last byte of the hardware address used. Hopefully for the common case of ethernet returning a value that is 2 bytes smaller than sockaddr_ll won't break anything. Given that it simplifies the code and removes an information leak I thought the small chance of breakage was worth it. If not this can be easily fixed. For error checking larger structures than our internal maximums continue to be allowed but an error is signaled if we can not fit the hardware address into our internal structure. Signed-off-by: Eric W. Biederman --- net/packet/af_packet.c | 64 +++++++++++++++++++++++++++++++++++------------- 1 files changed, 47 insertions(+), 17 deletions(-) 39a41e0363bab6661f40f24b404416b99de0c15a diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -36,6 +36,11 @@ * Michal Ostrowski : Module initialization cleanup. * Ulises Alonso : Frame number limit removal and * packet_set_ring memory leak. + * Eric Biederman : Allow for > 8 byte hardware addresses. + * The convention is that longer addresses + * will simply extend the hardware address + * byte arrays at the end of sockaddr_ll + * and packet_mreq. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -161,7 +166,17 @@ struct packet_mclist int count; unsigned short type; unsigned short alen; - unsigned char addr[8]; + unsigned char addr[MAX_ADDR_LEN]; +}; +/* identical to struct packet_mreq except it has + * a longer address field. 
+ */ +struct packet_mreq_max +{ + int mr_ifindex; + unsigned short mr_type; + unsigned short mr_alen; + unsigned char mr_address[MAX_ADDR_LEN]; }; #endif #ifdef CONFIG_PACKET_MMAP @@ -716,6 +731,8 @@ static int packet_sendmsg(struct kiocb * err = -EINVAL; if (msg->msg_namelen < sizeof(struct sockaddr_ll)) goto out; + if (msg->msg_namelen < (saddr->sll_halen + offsetof(struct sockaddr_ll, sll_addr))) + goto out; ifindex = saddr->sll_ifindex; proto = saddr->sll_protocol; addr = saddr->sll_addr; @@ -744,6 +761,12 @@ static int packet_sendmsg(struct kiocb * if (dev->hard_header) { int res; err = -EINVAL; + if (saddr) { + if (saddr->sll_halen != dev->addr_len) + goto out_free; + if (saddr->sll_hatype != dev->type) + goto out_free; + } res = dev->hard_header(skb, dev, ntohs(proto), addr, NULL, len); if (sock->type != SOCK_DGRAM) { skb->tail = skb->data; @@ -1045,6 +1068,7 @@ static int packet_recvmsg(struct kiocb * struct sock *sk = sock->sk; struct sk_buff *skb; int copied, err; + struct sockaddr_ll *sll; err = -EINVAL; if (flags & ~(MSG_PEEK|MSG_DONTWAIT|MSG_TRUNC|MSG_CMSG_COMPAT)) @@ -1057,16 +1081,6 @@ static int packet_recvmsg(struct kiocb * #endif /* - * If the address length field is there to be filled in, we fill - * it in now. - */ - - if (sock->type == SOCK_PACKET) - msg->msg_namelen = sizeof(struct sockaddr_pkt); - else - msg->msg_namelen = sizeof(struct sockaddr_ll); - - /* * Call the generic datagram receiver. This handles all sorts * of horrible races and re-entrancy so we can forget about it * in the protocol layers. @@ -1087,6 +1101,17 @@ static int packet_recvmsg(struct kiocb * goto out; /* + * If the address length field is there to be filled in, we fill + * it in now. + */ + + sll = (struct sockaddr_ll*)skb->cb; + if (sock->type == SOCK_PACKET) + msg->msg_namelen = sizeof(struct sockaddr_pkt); + else + msg->msg_namelen = sll->sll_halen + offsetof(struct sockaddr_ll, sll_addr); + + /* * You lose any data beyond the buffer you gave. 
If it worries a * user program they can ask the device for its MTU anyway. */ @@ -1166,7 +1191,7 @@ static int packet_getname(struct socket sll->sll_hatype = 0; /* Bad: we have no ARPHRD_UNSPEC */ sll->sll_halen = 0; } - *uaddr_len = sizeof(*sll); + *uaddr_len = offsetof(struct sockaddr_ll, sll_addr) + sll->sll_halen; return 0; } @@ -1199,7 +1224,7 @@ static void packet_dev_mclist(struct net } } -static int packet_mc_add(struct sock *sk, struct packet_mreq *mreq) +static int packet_mc_add(struct sock *sk, struct packet_mreq_max *mreq) { struct packet_sock *po = pkt_sk(sk); struct packet_mclist *ml, *i; @@ -1249,7 +1274,7 @@ done: return err; } -static int packet_mc_drop(struct sock *sk, struct packet_mreq *mreq) +static int packet_mc_drop(struct sock *sk, struct packet_mreq_max *mreq) { struct packet_mclist *ml, **mlp; @@ -1315,11 +1340,16 @@ packet_setsockopt(struct socket *sock, i case PACKET_ADD_MEMBERSHIP: case PACKET_DROP_MEMBERSHIP: { - struct packet_mreq mreq; - if (optlen sizeof(mreq)) + len = sizeof(mreq); + if (copy_from_user(&mreq,optval,len)) return -EFAULT; + if (len < (mreq.mr_alen + offsetof(struct packet_mreq, mr_address))) + return -EINVAL; if (optname == PACKET_ADD_MEMBERSHIP) ret = packet_mc_add(sk, &mreq); else From viswa.krish at gmail.com Sat Sep 10 10:53:28 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Sat, 10 Sep 2005 10:53:28 -0700 Subject: [openib-general] completion Q overflow error/panic In-Reply-To: <527jdpecr9.fsf@cisco.com> References: <4df28be4050909152118f3e947@mail.gmail.com> <527jdpecr9.fsf@cisco.com> Message-ID: <4df28be405091010536a2675c1@mail.gmail.com> Here is ibv_devinfo output. 
It is InfiniHost_III_Lx.

0 ]# ibv_devinfo
hca_id: mthca0
        fw_ver:                 1.0.1
        node_guid:              0002:c902:0040:0cfc
        sys_image_guid:         0002:c902:0040:0cff
        max_mr_size:            0xffffffffffffffff
        page_size_cap:          0x0
        vendor_id:              0x02c9
        vendor_part_id:         25204
        hw_ver:                 0x0
        phys_port_cnt:          1
                port:   1
                        state:          PORT_ACTIVE (4)
                        max_mtu:        invalid MTU (0)
                        active_mtu:     invalid MTU (0)
                        sm_lid:         1
                        port_lid:       1
                        port_lmc:       0x00

Yes, the CQE size is a bug. But in this case there should be only one outstanding packet in the pipe at any time. The client sends one packet, waits for the response with a pause (delay), then sends the next packet. If everything works, we should be using at most one CQ entry. Initially I had more CQ entries, but the problem appeared later. It looks like the packet is getting stuck somewhere, with no error notification coming back. Do we need to tweak any of the QP parameters (packet lifetime, retries, etc.)?

-Viswa

On 9/9/05, Roland Dreier wrote:
>
> I found one bug in your cmpost.c program that could cause CQ
> overruns. When you create your receive and send CQs, you create them
> with a cqe value of 5, so they can hold at most 5 entries. However,
> you create the send and receive work queues so they can hold up to 10
> entries, and in fact the code will post up to 8 entries at a time. So
> it's possible to overflow the CQ.
>
> The fix is to create the CQs to have at least as many entries as the
> work queues -- in other words, change cqe to 10.
>
> However, even with this fixed I do see some strange behavior that I'm
> still debugging. More details on Monday.
>
> What HCA firmware version do your systems have?
>
> - R.
>

From zadzyftd at shawcable.net Fri Sep 9 23:20:43 2005
From: zadzyftd at shawcable.net (Ursula Nicholson)
Date: Sat, 10 Sep 2005 04:20:43 -0200
Subject: [openib-general] Hey Ursula!
Message-ID: <589p495x.8642544@shawcable.net>

We are happy to present you with six deals from four different brokers.
Please remember that there is no commitment required on your part, and your credit is not an issue. Please validate your information with our secure and private database to ensure our records are up to date and accurate. http://her0es.net/p3.asp Have a good day. Sincerely, Ursula Nicholson Customer Service Rep eSOX Inc. ludlow on proxy be the decatur some try bagging the try crescendo in and tendency the some lengthy ! it's abash seeit's maine may. neural a copybook but try naval ! the jawbreak some or plunk some it's liven may it cost on in baseball bebut sloane or. From kingman at storagegear.com Sat Sep 10 15:16:47 2005 From: kingman at storagegear.com (John Kingman) Date: Sat, 10 Sep 2005 17:16:47 -0500 (CDT) Subject: [openib-general] [PATCH] [SRP] Fix CM redirection in SRP In-Reply-To: <528xy6i7e4.fsf@cisco.com> References: <528xy6i7e4.fsf@cisco.com> Message-ID: On Fri, 9 Sep 2005, Roland Dreier wrote: >Once I get that done (should happen today), please respin your patch, >test with your target and resend. Here is the updated and tested patch. I believe all of the other changes have been committed. 
Signed-off-by: John Kingman storagegear.com> Index: ib_srp.h =================================================================== --- ib_srp.h (revision 3359) +++ ib_srp.h (working copy) @@ -52,6 +52,7 @@ enum { SRP_ABORT_TIMEOUT_MS = 5000, SRP_PORT_REDIRECT = 1, + SRP_DLID_REDIRECT = 2, SRP_MAX_IU_LEN = 256, Index: ib_srp.c =================================================================== --- ib_srp.c (revision 3359) +++ ib_srp.c (working copy) @@ -1028,11 +1028,18 @@ static int srp_cm_handler(struct ib_cm_i case IB_CM_REJ_RECEIVED: printk(KERN_DEBUG PFX "REJ received\n"); comp = 1; + cpi = event->param.rej_rcvd.ari; if (event->param.rej_rcvd.reason == IB_CM_REJ_PORT_CM_REDIRECT) { - cpi = event->param.rej_rcvd.ari; - memcpy(target->path.dgid.raw, cpi->redirect_gid, 16); - target->status = SRP_PORT_REDIRECT; + target->path.dlid = cpi->redirect_lid; + target->path.pkey = cpi->redirect_pkey; + cm_id->remote_cm_qpn = be32_to_cpu(cpi->redirect_qp) & 0x00ffffff; + if (target->path.dlid) { + target->status = SRP_DLID_REDIRECT; + } else { + memcpy(target->path.dgid.raw, cpi->redirect_gid, 16); + target->status = SRP_PORT_REDIRECT; + } } else if (topspin_workarounds && !memcmp(&target->ioc_guid, topspin_oui, 3) && event->param.rej_rcvd.reason == IB_CM_REJ_PORT_REDIRECT) { @@ -1041,8 +1048,7 @@ static int srp_cm_handler(struct ib_cm_i * reject reason code 25 when they mean 24 * (port redirect). */ - memcpy(target->path.dgid.raw, - event->param.rej_rcvd.ari + 0, 16); + memcpy(target->path.dgid.raw, cpi->redirect_gid, 16); printk(KERN_DEBUG PFX "Topspin/Cisco redirect to target port GID %016llx%016llx\n", (unsigned long long) be64_to_cpu(target->path.dgid.global.subnet_prefix), @@ -1393,6 +1399,7 @@ retry_path: goto err; } +retry_send: init_completion(&target->done); ret = srp_send_req(target); if (ret) @@ -1401,10 +1408,13 @@ retry_path: /* * The CM event handling code will set status to - * SRP_PORT_REDIRECT if we get a port redirect REJ back. 
+ * SRP_PORT_REDIRECT if we get a port redirect REJ back, + * or SRP_DLID_REDIRECT if we get a lid/qp redirect REJ back. */ if (target->status == SRP_PORT_REDIRECT) goto retry_path; + else if (target->status == SRP_DLID_REDIRECT) + goto retry_send; else if (target->status < 0) { printk(KERN_ERR PFX "Connection failed\n"); ret = target->status; From rolandd at cisco.com Sat Sep 10 16:02:12 2005 From: rolandd at cisco.com (Roland Dreier) Date: Sat, 10 Sep 2005 16:02:12 -0700 Subject: [openib-general] [git pull] InfiniBand updates Message-ID: <523bocedcb.fsf@cisco.com> Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git This tree is also available from kernel.org mirrors at: rsync://rsync.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git This will update the following files: drivers/infiniband/Kconfig | 25 +- drivers/infiniband/core/Makefile | 5 drivers/infiniband/core/cm.c | 5 drivers/infiniband/core/mad_rmpp.c | 4 drivers/infiniband/core/sa_query.c | 30 --- drivers/infiniband/core/ucm.c | 289 +++++++++++++++++++----------- drivers/infiniband/core/ucm.h | 11 - drivers/infiniband/core/uverbs.h | 26 +- drivers/infiniband/core/uverbs_cmd.c | 155 +++++++++++----- drivers/infiniband/core/uverbs_main.c | 98 ++++++---- drivers/infiniband/hw/mthca/mthca_qp.c | 45 +++- drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 include/rdma/ib_cm.h | 1 include/rdma/ib_mad.h | 21 ++ include/rdma/ib_sa.h | 31 +++ include/rdma/ib_user_cm.h | 72 +++++++ include/rdma/ib_user_verbs.h | 21 ++ 17 files changed, 590 insertions(+), 251 deletions(-) through the following changes: commit 1b205c2d2464bfecbba80227e74b412596dc5521 Author: Roland Dreier Date: Fri Sep 9 20:52:00 2005 -0700 [PATCH] IB: fix CM use-after-free If the CM REQ handling function gets to error2, then it frees cm_id_priv->timewait_info. 
But the next line goes through ib_destroy_cm_id() -> ib_send_cm_rej() -> cm_reset_to_idle(), which ends up calling cm_cleanup_timewait(), which dereferences the pointer we just freed. Make sure we clear cm_id_priv->timewait_info after freeing it, so that doesn't happen. Signed-off-by: Roland Dreier commit 354ba39cf96e439149541acf3c6c7c0df0a3ef25 Author: John Kingman Date: Fri Sep 9 18:23:32 2005 -0700 [PATCH] IB CM: support CM redir Changes to CM to support CM and port redirection (REJ reason 24). Signed-off-by: John Kingman storagegear.com> Signed-off-by: Sean Hefty Signed-off-by: Roland Dreier commit 63aaf647529e8a56bdf31fd8f2979d4371c6a332 Author: Roland Dreier Date: Fri Sep 9 15:55:08 2005 -0700 Make sure that userspace does not retrieve stale asynchronous or completion events after destroying a CQ, QP or SRQ. We do this by sweeping the event lists before returning from a destroy calls, and then return the number of events already reported before the destroy call. This allows userspace wait until it has processed all events for an object returned from the kernel before it frees its context for the object. The ABI of the destroy CQ, destroy QP and destroy SRQ commands has to change to return the event count, so bump the ABI version from 1 to 2. The userspace libibverbs library has already been updated to handle both the old and new ABI versions. Signed-off-by: Roland Dreier commit 2e9f7cb7869059e55cd91f5e23c6380f3763db56 Author: Roland Dreier Date: Fri Sep 9 15:45:57 2005 -0700 [PATCH] IB: Add struct for ClassPortInfo Add structure definition for ClassPortInfo format. This is needed for (at least) handling CM redirects. Signed-off-by: Roland Dreier commit fbed8eee70cf7e11fbf231afafc0ccb313acc62e Author: Hal Rosenstock Date: Fri Sep 9 15:24:04 2005 -0700 [PATCH] IB: Move SA attributes to ib_sa.h SA: Move SA attributes to ib_sa.h so are accessible to more than sa_query.c. Also, remove deprecated attributes and add one missing one. 
Signed-off-by: Hal Rosenstock Signed-off-by: Roland Dreier commit 1325cc79163058739b70bed9860fccbecac6236b Author: Hal Rosenstock Date: Fri Sep 9 13:45:51 2005 -0700 [PATCH] IB: Define more SA methods ib_sa.h: Define more SA methods (initially for madeye decode) Signed-off-by: Hal Rosenstock Signed-off-by: Roland Dreier commit 17781cd6186cb3472ff34b2d9a15e647bd311e8b Author: James Lentini Date: Wed Sep 7 12:43:08 2005 -0700 [PATCH] IB: clean up user access config options Add a new config option INFINIBAND_USER_MAD to control whether we build ib_umad. Change INFINIBAND_USER_VERBS to INFINIBAND_USER_ACCESS, and have it control ib_ucm and ib_uat as well as ib_uverbs. Signed-off-by: James Lentini Signed-off-by: Roland Dreier commit b5dcbf47e10e568273213a4410daa27c11cdba3a Author: Hal Rosenstock Date: Wed Sep 7 11:03:41 2005 -0700 [PATCH] IB: RMPP fixes - Fix payload length of middle RMPP sent segments. Middle payload lengths should be 0 on the send side. (This is perhaps a compliance and should not be an interop issue as middle payload lengths are supposed to be ignored on receive). - Fix length in first segment of multipacket sends (This is a compliance issue but does not affect at least OpenIB to OpenIB RMPP transfers). Signed-off-by: Hal Rosenstock Signed-off-by: Roland Dreier commit 30a7e8ef13b2ff0db7b15af9afdd12b93783f01e Author: Michael S. Tsirkin Date: Wed Sep 7 09:45:00 2005 -0700 [PATCH] IB: Initialize qp->wait Add missing call to init_waitqueue_head(). Signed-off-by: Michael S. Tsirkin Signed-off-by: Roland Dreier commit c9fe2b3287498b80781284306064104ef9c8a31a Author: Roland Dreier Date: Wed Sep 7 09:43:23 2005 -0700 [PATCH] IB: really reset QPs When we modify a QP to the RESET state, completely clean up the QP so that it is really and truly reset. 
Signed-off-by: Roland Dreier commit 0b2b35f68140ceeb1b78ef85680198e63ebc8649 Author: Sean Hefty Date: Thu Sep 1 09:28:03 2005 -0700 [PATCH] IB: Add user-supplied context to userspace CM ABI - Add user specified context to all uCM events. Users will not retrieve any events associated with the context after destroying the corresponding cm_id. - Provide the ib_cm_init_qp_attr() call to userspace clients of the CM. This call may be used to set QP attributes properly before modifying the QP. - Fixes some error handling synchonization and cleanup issues. - Performs some minor code cleanup. Signed-off-by: Sean Hefty Signed-off-by: Roland Dreier commit 1d6801f9dd3ebb054ae685153a01b1a4ec817f46 Author: Michael S. Tsirkin Date: Thu Sep 1 09:19:44 2005 -0700 [PATCH] IB/sa_query: avoid unnecessary list scan Using ib_get_client_data in SA event handler performs a list scan. It's better to use container_of to get the sa device directly. Signed-off-by: Michael S. Tsirkin Signed-off-by: Roland Dreier commit 06c56e44f3e32a859420ecac97996cc6f12827bb Author: Michael S. Tsirkin Date: Thu Sep 1 09:19:02 2005 -0700 [PATCH] IPoIB: fix memory leak Fix IPoIB memory leak on device removal. Signed-off-by: Michael S. Tsirkin Signed-off-by: Roland Dreier From cw at f00f.org Sat Sep 10 20:03:45 2005 From: cw at f00f.org (Chris Wedgwood) Date: Sat, 10 Sep 2005 20:03:45 -0700 Subject: [openib-general] Re: [git pull] InfiniBand updates In-Reply-To: <523bocedcb.fsf@cisco.com> References: <523bocedcb.fsf@cisco.com> Message-ID: <20050911030345.GA14593@taniwha.stupidest.org> On Sat, Sep 10, 2005 at 04:02:12PM -0700, Roland Dreier wrote: > include/rdma/ib_cm.h | 1 > include/rdma/ib_mad.h | 21 ++ > include/rdma/ib_sa.h | 31 +++ > include/rdma/ib_user_cm.h | 72 +++++++ > include/rdma/ib_user_verbs.h | 21 ++ Do these really need to be here? if we really must merge RDMA can we not hide these headers in drivers/inifiniband for now? 
From mst at mellanox.co.il Sun Sep 11 00:22:16 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Sep 2005 10:22:16 +0300
Subject: [openib-general] Re: sdp: kill sdp buff pool
In-Reply-To: 
References: <20050908114156.GI19358@mellanox.co.il> <20050908161849.GA21522@mellanox.co.il>
Message-ID: <20050911072216.GM19358@mellanox.co.il>

Quoting Tom Duffy :
> >Yep, simple replacing of sdp lists with list_head (mega patch that
> >I posted previously) hurts performance by some 10%, and I didnt yet
> >figure why.
>
> Which patch was that?

It turns out I haven't posted that patch yet. It might not apply cleanly now, since I haven't updated it after the recent changes. Do you want to see it?

> Did that include the sdp_buff.[ch] changes I
> posted?

No, it took a more drastic approach: it replaced all calls to the sdp_buff, sdp_queue, sdp_advt and sdp_iocb functions with list.h macros. This was probably too big a change. Replacing just the list traversal functions and the casts with list_for_each and container_of calls would likely work better.

--
MST

From mst at mellanox.co.il Sun Sep 11 01:04:38 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Sep 2005 11:04:38 +0300
Subject: [openib-general] Re: Re: Re: [PATCH] [SDP] change CM event processing
In-Reply-To: 
References: <20050907172230.GD10137@mellanox.co.il>
Message-ID: <20050911080438.GP19358@mellanox.co.il>

Quoting Sean Hefty :
> >In fact, if you post a separate patch for just the cm state
> >I'm ready to apply that right away.
>
> I think that all you need is the diff from sdp_event.c (extracted below).

Thanks for looking into this. This is what I checked in as rev 3363:

---

The state of the cm_id is controlled by the CM and can change at any time as a result of processing a received MAD. It's only exposed for debug purposes.

Signed-off-by: Michael S.
Tsirkin Signed-off-by: Sean Hefty Index: linux-kernel/drivers/infiniband/ulp/sdp/sdp_event.c =================================================================== --- linux-kernel/drivers/infiniband/ulp/sdp/sdp_event.c (revision 3359) +++ linux-kernel/drivers/infiniband/ulp/sdp/sdp_event.c (working copy) @@ -384,45 +384,48 @@ int sdp_cm_event_handler(struct ib_cm_id struct sdp_sock *conn = NULL; int result = 0; - sdp_dbg_ctrl(NULL, "CM state <%d> event <%d> commID <%08x> ID <%d>", - cm_id->state, event->event, cm_id->local_id, hashent); - /* - * lookup the connection, on a REQ_RECV the sk will be empty. - */ - conn = sdp_conn_table_lookup(hashent); - if (conn) - sdp_conn_lock(conn); - else - if (cm_id->state != IB_CM_REQ_RCVD) { - sdp_dbg_warn(NULL, - "No conn <%d> CM state <%d> event <%d>", - hashent, cm_id->state, event->event); + sdp_dbg_ctrl(NULL, "event <%d> commID <%08x> ID <%d>", + event->event, cm_id->local_id, hashent); + + if (event->event != IB_CM_REQ_RECEIVED) { + conn = sdp_conn_table_lookup(hashent); + if (conn) + sdp_conn_lock(conn); + else { + sdp_dbg_warn(NULL, "No conn <%d> CM event <%d>", + hashent, event->event); return -EINVAL; } + } - switch (cm_id->state) { - case IB_CM_REQ_RCVD: + switch (event->event) { + case IB_CM_REQ_RECEIVED: result = sdp_cm_req_handler(cm_id, event); break; - case IB_CM_REP_RCVD: + case IB_CM_REP_RECEIVED: result = sdp_cm_rep_handler(cm_id, event, conn); break; - case IB_CM_IDLE: + case IB_CM_REQ_ERROR: + case IB_CM_REP_ERROR: + case IB_CM_REJ_RECEIVED: + case IB_CM_TIMEWAIT_EXIT: result = sdp_cm_idle(cm_id, event, conn); break; - case IB_CM_ESTABLISHED: + case IB_CM_RTU_RECEIVED: + case IB_CM_USER_ESTABLISHED: result = sdp_cm_established(cm_id, event, conn); break; - case IB_CM_DREQ_RCVD: + case IB_CM_DREQ_RECEIVED: result = sdp_cm_dreq_rcvd(cm_id, event, conn); if (result) break; /* fall through on success to handle state transition */ - case IB_CM_TIMEWAIT: + case IB_CM_DREQ_ERROR: + case IB_CM_DREP_RECEIVED: result = 
sdp_cm_timewait(cm_id, event, conn); break; default: - sdp_dbg_warn(conn, "Unexpected CM state <%d>", cm_id->state); + sdp_dbg_warn(conn, "Unhandled CM event <%d>", event->event); result = -EINVAL; } /* -- MST From yael at mellanox.co.il Sun Sep 11 01:11:46 2005 From: yael at mellanox.co.il (Yael Kalka) Date: Sun, 11 Sep 2005 11:11:46 +0300 Subject: [openib-general] RE: [PATCHv2] OpenSM: OpenIB vendor layer: Implement osm_vendor_d elete Message-ID: <506C3D7B14CDD411A52C00025558DED60CCF35@mtlex01.yok.mtl.com> Hi Hal, There is a problem with the patch. 1. In osm_vendor_unbind: You used free(p_bind), when the pointer was allocated using cl_zalloc. You need to use cl_free. 2. There is a race between the cl_free and the receiver thread. We get a segmentation fault due to the fact that the thread isn't destroyed before freeing the p_bind object. There should be a way to signal the reciever thread to exit, and the unbind should wait for that thread to join. Thanks, Yael -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Tuesday, September 06, 2005 5:27 PM To: Yael Kalka; Eitan Zahavi Cc: openib-general at openib.org Subject: [PATCHv2] OpenSM: OpenIB vendor layer: Implement osm_vendor_delete [same patch just generated with diff -up] OpenSM: OpenIB vendor layer: Implement osm_vendor_delete [I've done some testing of this; are there any regressions for this ?] Signed-off-by: Hal Rosenstock --- osm_vendor_ibumad.c.1 2005-08-31 12:26:03.000000000 -0400 +++ osm_vendor_ibumad.c 2005-09-06 09:35:27.000000000 -0400 @@ -483,14 +483,8 @@ void osm_vendor_delete( IN osm_vendor_t** const pp_vend ) { - int agent_id; - - /* unregister UMAD agents */ - for (agent_id = 0; agent_id < UMAD_CA_MAX_AGENTS; agent_id++) - if ( (*pp_vend)->agents[agent_id] ) - umad_unregister( (*pp_vend)->umad_port_id, agent_id ); clear_madw( *pp_vend ); - /* make sure all ports are closed? 
*/ + /* make sure all ports are closed */ umad_done(); cl_free( *pp_vend ); *pp_vend = NULL; @@ -593,7 +587,7 @@ Exit: /********************************************************************** **********************************************************************/ -int +static int osm_vendor_open_port( IN osm_vendor_t* const p_vend, IN const ib_net64_t port_guid ) @@ -828,11 +822,27 @@ osm_vendor_unbind( IN osm_bind_handle_t h_bind) { osm_umad_bind_info_t *p_bind = ( osm_umad_bind_info_t * ) h_bind; - osm_vendor_t *p_vend = p_bind->p_vend; + osm_vendor_t *p_vend; + + if (p_bind) { + p_vend = p_bind->p_vend; + + OSM_LOG_ENTER( p_vend->p_log, osm_vendor_unbind ); + + /* Unregister UMAD agents */ + if (p_vend->agents[p_bind->agent_id1]) + umad_unregister(p_bind->port_id, p_bind->agent_id1); + if (p_vend->agents[p_bind->agent_id]) + umad_unregister(p_bind->port_id, p_bind->agent_id); - OSM_LOG_ENTER( p_vend->p_log, osm_vendor_unbind ); + /* close port ??? */ + + free(p_bind); + + OSM_LOG_EXIT( p_vend->p_log); + + } - OSM_LOG_EXIT( p_vend->p_log); } /********************************************************************** -------------- next part -------------- An HTML attachment was scrubbed... URL: From hch at infradead.org Sun Sep 11 02:00:22 2005 From: hch at infradead.org (Christoph Hellwig) Date: Sun, 11 Sep 2005 10:00:22 +0100 Subject: [openib-general] Re: [git pull] InfiniBand updates In-Reply-To: <20050911030345.GA14593@taniwha.stupidest.org> References: <523bocedcb.fsf@cisco.com> <20050911030345.GA14593@taniwha.stupidest.org> Message-ID: <20050911090022.GA4841@infradead.org> On Sat, Sep 10, 2005 at 08:03:45PM -0700, Chris Wedgwood wrote: > On Sat, Sep 10, 2005 at 04:02:12PM -0700, Roland Dreier wrote: > > > include/rdma/ib_cm.h | 1 > > include/rdma/ib_mad.h | 21 ++ > > include/rdma/ib_sa.h | 31 +++ > > include/rdma/ib_user_cm.h | 72 +++++++ > > include/rdma/ib_user_verbs.h | 21 ++ > > Do these really need to be here? 
if we really must merge RDMA can we > not hide these headers in drivers/inifiniband for now? No. They've been there before, but it's just wrong. This stuff is kernel-wide interfaces and having them under drivers/ was wrong to start with. From yael at mellanox.co.il Sun Sep 11 02:02:41 2005 From: yael at mellanox.co.il (Yael Kalka) Date: Sun, 11 Sep 2005 12:02:41 +0300 Subject: [openib-general] OpenSM - branch for current merging Message-ID: <506C3D7B14CDD411A52C00025558DED60CCF36@mtlex01.yok.mtl.com> Hi Hal, I want to test and debug the files you are merging in the new opensm. Can you create a branch with your merged files of the osm, so I can use that branch both to try and debug your problems and try and test it on some systems here? I assume there are differences between your branch and our osm-1.8.0-merge branch. Thanks, Yael -------------- next part -------------- An HTML attachment was scrubbed... URL: From hch at lst.de Sun Sep 11 02:05:55 2005 From: hch at lst.de (Christoph Hellwig) Date: Sun, 11 Sep 2005 11:05:55 +0200 Subject: [openib-general] sdp: kill sdp buff pool In-Reply-To: <20050908114156.GI19358@mellanox.co.il> References: <20050908114156.GI19358@mellanox.co.il> Message-ID: <20050911090555.GA8570@lst.de> On Thu, Sep 08, 2005 at 02:41:56PM +0300, Michael S. Tsirkin wrote: > SDP seems to have a re-implementation of kmem_cache in sdp_buff. > Killing it actually results in a small bandwidth increase > for both bcopy and zcopy versions, so I plan to put this in. > > Note that sdp_buff_pool_chain_link is now same as sdp_buff_pool_put, > and sdp_buff_pool_chain_put is now a nop, but I decided to > keep it in for now, just in case there's another use for it. Btw, the whole sdb_buff code looks a little like it's duplicating the network layers sk_buff code. Anyone looking into this higher-level thing? 
From yael at mellanox.co.il Sun Sep 11 02:47:01 2005
From: yael at mellanox.co.il (Yael Kalka)
Date: Sun, 11 Sep 2005 12:47:01 +0300
Subject: [openib-general] RE: Another OpenSM 1.8.0 nit
Message-ID: <506C3D7B14CDD411A52C00025558DED60CCF38@mtlex01.yok.mtl.com>

Hi Hal,

The constant should be used, and I have added a use of it to our code (osm-1.8.0-merge). The problem was that the constant was defined, but osm_subnet.c hardcoded sminfo_polling_timeout to 10000 instead of using the constant. Do you want me to send a patch for this too?

Regarding documentation: we do have a user manual for 1.8.0. How do you want to add it?

Yael

-----Original Message-----
From: Hal Rosenstock [mailto:halr at voltaire.com]
Sent: Wednesday, September 07, 2005 1:16 AM
To: Yael Kalka
Cc: openib-general at openib.org
Subject: Another OpenSM 1.8.0 nit

Hi Yael,

Here'a another OpenSM 1.8.0 nit:

opensm/osm_base.h:/****d* OpenSM: Base/OSM_SM_DEFAULT_POLLING_TIMEOUT_MILISECS
opensm/osm_base.h:* OSM_SM_DEFAULT_POLLING_TIMEOUT_MILISECS
opensm/osm_base.h:#define OSM_SM_DEFAULT_POLLING_TIMEOUT_MILISECS 10000

Is this used ? Also are there updated docs (user manual, release notes) for 1.8.0 ?

Thanks.

-- Hal

From hch at lst.de Sun Sep 11 02:48:23 2005
From: hch at lst.de (Christoph Hellwig)
Date: Sun, 11 Sep 2005 11:48:23 +0200
Subject: [openib-general] [PATCH] [verbs] add node_guid to device structure
In-Reply-To: 
References: 
Message-ID: <20050911094823.GA9217@lst.de>

On Wed, Sep 07, 2005 at 04:24:56PM -0700, Sean Hefty wrote:
> This patch adds the node_guid to struct ib_device to avoid ULPs needing
> to query for it.
>
> It will also make it possible to give users the attributes of a device
> as part of their add_device routine.
>
> If this patch is okay with everyone, I will submit patches to remove
> the device attribute queries in the CM, SRP, and sysfs.
The PAGE_SIZE slab cache doesn't make sense - just use alloc_page to get a page or merge it into the other slab cache. Probably the former makes more sense.

From mst at mellanox.co.il Sun Sep 11 03:11:04 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Sep 2005 13:11:04 +0300
Subject: [openib-general] sdp: kill sdp buff pool
In-Reply-To: <20050911090555.GA8570@lst.de>
References: <20050908114156.GI19358@mellanox.co.il> <20050911090555.GA8570@lst.de>
Message-ID: <20050911101104.GQ19358@mellanox.co.il>

Quoting Christoph Hellwig :
> Btw, the whole sdb_buff code looks a little like it's duplicating the
> network layers sk_buff code. Anyone looking into this higher-level
> thing?

I was looking into using list.h to manage queues. But you are right - there is a similarity, and it might make sense to use skbuff.h rather than list.h to manage sdpc_buffs. I'd need to think about what to do with sdpc_iocb/sdpc_advt then: currently these are sometimes kept on the same queue as sdpc_buff objects. It's worth keeping in mind, and I'll be interested to see such a patch emerge.

Another option would be to generalize the linked-list-with-size code in skbuff.h and move it into list.h, so that sdp could then reuse it. Something like:

struct slist_entry {
	struct slist_entry *next;
	struct slist_entry *prev;
	struct slist_head *list;
};

Does this make sense to anyone?
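The list-with-size idea can be tried out in userspace before anything touches list.h. What follows is a minimal sketch under stated assumptions. struct slist_entry uses the three fields proposed above. struct slist_head, its qlen counter, and the helpers slist_init, slist_add_tail, slist_del and slist_dequeue are hypothetical names chosen for illustration; they are loosely modeled on sk_buff_head keeping a qlen next to its links. The head embeds a sentinel entry, which keeps the list circular without casting between different struct types.

```c
#include <assert.h>
#include <stddef.h>

struct slist_head;

/* Entry layout as proposed: links plus a back-pointer to the owning list. */
struct slist_entry {
	struct slist_entry *next;
	struct slist_entry *prev;
	struct slist_head *list;	/* NULL while the entry is on no list */
};

/* Hypothetical head: a sentinel entry (keeps the list circular) plus a length. */
struct slist_head {
	struct slist_entry anchor;
	unsigned int qlen;
};

static void slist_init(struct slist_head *h)
{
	h->anchor.next = &h->anchor;
	h->anchor.prev = &h->anchor;
	h->anchor.list = NULL;
	h->qlen = 0;
}

static unsigned int slist_len(const struct slist_head *h)
{
	return h->qlen;
}

/* Append e at the tail of h and record h as its owner. */
static void slist_add_tail(struct slist_head *h, struct slist_entry *e)
{
	e->prev = h->anchor.prev;
	e->next = &h->anchor;
	h->anchor.prev->next = e;
	h->anchor.prev = e;
	e->list = h;
	h->qlen++;
}

/* Unlink e from the list it is on (e must currently be on a list).
 * Thanks to the back-pointer, the caller does not need to pass the head. */
static void slist_del(struct slist_entry *e)
{
	struct slist_head *h = e->list;

	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->next = NULL;
	e->prev = NULL;
	e->list = NULL;
	h->qlen--;
}

/* Remove and return the first entry, or NULL if the list is empty. */
static struct slist_entry *slist_dequeue(struct slist_head *h)
{
	struct slist_entry *e;

	if (h->qlen == 0)
		return NULL;
	e = h->anchor.next;
	slist_del(e);
	return e;
}
```

The per-entry list pointer lets slist_del keep the correct qlen up to date even when the caller does not know which queue the entry is on. The cost is one extra pointer per entry, compared with a plain list_head.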
Thanks,
--
MST

From rolandd at cisco.com Sun Sep 11 08:15:42 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Sun, 11 Sep 2005 08:15:42 -0700
Subject: [openib-general] Re: [git pull] InfiniBand updates
In-Reply-To: <20050911030345.GA14593@taniwha.stupidest.org> (Chris Wedgwood's message of "Sat, 10 Sep 2005 20:03:45 -0700")
References: <523bocedcb.fsf@cisco.com> <20050911030345.GA14593@taniwha.stupidest.org>
Message-ID: <52u0grd49t.fsf@cisco.com>

> > include/rdma/ib_cm.h | 1
> > include/rdma/ib_mad.h | 21 ++
> > include/rdma/ib_sa.h | 31 +++
> > include/rdma/ib_user_cm.h | 72 +++++++
> > include/rdma/ib_user_verbs.h | 21 ++
> Do these really need to be here? if we really must merge RDMA can we
> not hide these headers in drivers/inifiniband for now?

The includes were moved from drivers/infiniband a few weeks ago for various good reasons. I really wish you had replied to the initial RFC (http://lkml.org/lkml/2005/8/4/191) or the merge where the headers were actually moved (http://lkml.org/lkml/2005/8/29/105). I don't think there's much point in moving the files back now.

- R.

From mst at mellanox.co.il Sun Sep 11 08:19:34 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 11 Sep 2005 18:19:34 +0300
Subject: [openib-general] [PATCH] ipoib: fix module removal race
Message-ID: <20050911151934.GB19358@mellanox.co.il>

Roland, does the following patch make sense? IP over IB seems more stable with this applied (it hasn't crashed for me yet).

---

Since ipoib uses queue_delayed_work to run the flush task on port state events, it must flush scheduled work after unregistering the event handler.

Signed-off-by: Michael S.
Tsirkin Index: linux-2.6.13/drivers/infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- linux-2.6.13.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c 2005-09-11 12:36:47.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/ulp/ipoib/ipoib_main.c 2005-09-11 14:13:49.000000000 +0300 @@ -1005,6 +1005,7 @@ debug_failed: register_failed: ib_unregister_event_handler(&priv->event_handler); + flush_scheduled_work(); event_failed: ipoib_dev_cleanup(priv->dev); @@ -1057,6 +1058,7 @@ static void ipoib_remove_one(struct ib_d list_for_each_entry_safe(priv, tmp, dev_list, list) { ib_unregister_event_handler(&priv->event_handler); + flush_scheduled_work(); unregister_netdev(priv->dev); ipoib_dev_cleanup(priv->dev); -- MST From rolandd at cisco.com Sun Sep 11 08:34:32 2005 From: rolandd at cisco.com (Roland Dreier) Date: Sun, 11 Sep 2005 08:34:32 -0700 Subject: [openib-general] [PATCH] [verbs] add node_guid to device structure In-Reply-To: <20050911094823.GA9217@lst.de> (Christoph Hellwig's message of "Sun, 11 Sep 2005 11:48:23 +0200") References: <20050911094823.GA9217@lst.de> Message-ID: <52psrfd3ef.fsf@cisco.com> Christoph> The PAGE_SIZE slab cache doesn't make sense - just use Christoph> alloc_page to get a page or merge it into the other Christoph> slab cache. Probably the former makes more sense. I'm confused. I couldn't find any use of PAGE_SIZE in the patch you're replying to. More details please... - R. 
From hch at lst.de Sun Sep 11 08:58:26 2005
From: hch at lst.de (Christoph Hellwig)
Date: Sun, 11 Sep 2005 17:58:26 +0200
Subject: [openib-general] [PATCH] [verbs] add node_guid to device structure
In-Reply-To: <52psrfd3ef.fsf@cisco.com>
References: <20050911094823.GA9217@lst.de> <52psrfd3ef.fsf@cisco.com>
Message-ID: <20050911155826.GA14460@lst.de>

On Sun, Sep 11, 2005 at 08:34:32AM -0700, Roland Dreier wrote:
> Christoph> The PAGE_SIZE slab cache doesn't make sense - just use
> Christoph> alloc_page to get a page or merge it into the other
> Christoph> slab cache. Probably the former makes more sense.
>
> I'm confused. I couldn't find any use of PAGE_SIZE in the patch
> you're replying to. More details please...

Sorry, looks like I replied to a totally unrelated mail for some reason. I'll resend it in the right thread.

From bohra at cs.rutgers.edu Sun Sep 11 10:50:28 2005
From: bohra at cs.rutgers.edu (Aniruddha Bohra)
Date: Sun, 11 Sep 2005 13:50:28 -0400
Subject: [openib-general] dapl_ep_connect problems
Message-ID: <43246E64.9010309@cs.rutgers.edu>

Hello,

I tried connecting two endpoints using the new uDAPL library. The connection fails with an invalid route option. Attached is the log with DAT_DBG_TYPE=0xffff and DAPL_DBG_TYPE=0xffff.

I traced the call to ib_at_route_by_ip(); it seems that all the arguments (dst_ip, src_ip, r_qual, ...) are 0x00. I have also attached my dat.conf, lsmod output, /etc/ibhosts, and /etc/hosts.

Could you please point me to where I should look?

Thanks,
Aniruddha

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dapl_fail.log
Type: text/x-log
Size: 5742 bytes
Desc: not available
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dat.conf
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hosts
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lsmod.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ibhosts URL: From mst at mellanox.co.il Sun Sep 11 13:31:42 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sun, 11 Sep 2005 23:31:42 +0300 Subject: [openib-general] RFC: ib_set_comp_handler Message-ID: <20050911203142.GA25325@mellanox.co.il> Hi! I'd like to add a capability to change the cq completion handler. It seems this cant be done in the ULP without introducing additional indirection and/or locking, which I'd like to avoid. I'd use it in sdp to disable cq events while a connection is destroyed. It also seems like ipoib could use such a capability, simply blocking completion events instead of waiting for 5 seconds in ipoib_ib_dev_stop. I expect this to be useful in other scenarious (IPoIB NAPI?). If this makes sense, I'll code up a patch. Opinions? Thanks, -- MST From mst at mellanox.co.il Mon Sep 12 00:39:55 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Sep 2005 10:39:55 +0300 Subject: [openib-general] opensm and signals Message-ID: <20050912073955.GD19358@mellanox.co.il> Hi, Hal, Eitan! Whats the reason opensm needs to catch and try to handle signals such as SIGINT? It seems that we can let the default handler simply kill the application. If this is required for some vendor layer, shouldnt the signal handling be part of that vendor layer? Please note that glibc manual says http://www.gnu.org/software/libc/manual/html_node/Defining-Handlers.html "You need to take special care in writing handler functions because they can be called asynchronously. That is, a handler might be called at any point in the program, unpredictably." and "The best practice is to write a handler that does nothing but set an external variable that the program checks regularly, and leave all serious work to the program." Based on this, I really wander whether its best to avoid signal handlers altogether, if possible. 
Thanks, -- MST From eitan at mellanox.co.il Mon Sep 12 01:00:16 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 12 Sep 2005 11:00:16 +0300 Subject: [openib-general] RE: opensm and signals Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30691CC@mtlexch01.mtl.com> Hi Michael, > > Hi, Hal, Eitan! > Whats the reason opensm needs to catch and try to handle signals such as SIGINT? [EZ] The reason was that, way back, some drivers had resource tracking problems: if OpenSM exited without cleaning up all the resources it used (like MAD buffers and UD-AVs), the driver oopsed. > It seems that we can let the default handler simply kill the application. > If this is required for some vendor layer, shouldnt the signal > handling be part of that vendor layer? I will be glad to remove that code... > > Please note that glibc manual says > > http://www.gnu.org/software/libc/manual/html_node/Defining-Handlers.html > > "You need to take special care in writing handler functions because they > can be called asynchronously. That is, a handler might be called at any > point in the program, unpredictably." > > and > > "The best practice is to write a handler that does nothing but set an > external variable that the program checks regularly, and leave all > serious work to the program." [EZ] This is exactly what the handler does - set osm_exit_flag ... > From mst at mellanox.co.il Mon Sep 12 01:07:55 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Sep 2005 11:07:55 +0300 Subject: [openib-general] [PATCH] ib_sync_cq ( was Re: RFC: ib_set_comp_handler) In-Reply-To: <20050911203142.GA25325@mellanox.co.il> References: <20050911203142.GA25325@mellanox.co.il> Message-ID: <20050912080755.GE19358@mellanox.co.il> Roland, Sean, With the following patch, it becomes legal for clients to modify the comp_handler or cq_context fields in the ib_cq structure of an existing CQ.
To avoid races, and to make it possible for the hw layer to cache these values, I added a new API, ib_sync_cq, which must be called after either comp_handler or cq_context is changed. I plan to use this capability in SDP to disable CQ events while a connection is being destroyed. I expect this to be useful in other scenarios as well (IPoIB NAPI?). Comments? --- Make it possible to flush completion events for a specific cq. Signed-off-by: Michael S. Tsirkin Index: linux-2.6.13/drivers/infiniband/hw/mthca/mthca_cq.c =================================================================== --- linux-2.6.13.orig/drivers/infiniband/hw/mthca/mthca_cq.c 2005-09-11 17:52:37.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/hw/mthca/mthca_cq.c 2005-09-12 10:33:13.000000000 +0300 @@ -789,6 +789,15 @@ err_out: return err; } +void mthca_sync_cq(struct ib_cq *ibcq) +{ + struct mthca_dev *dev = to_mdev(ibcq->device); + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector); + else + synchronize_irq(dev->pdev->irq); +} + void mthca_free_cq(struct mthca_dev *dev, struct mthca_cq *cq) { Index: linux-2.6.13/drivers/infiniband/include/rdma/ib_verbs.h =================================================================== --- linux-2.6.13.orig/drivers/infiniband/include/rdma/ib_verbs.h 2005-09-11 10:24:36.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/include/rdma/ib_verbs.h 2005-09-12 10:40:33.000000000 +0300 @@ -884,6 +884,7 @@ struct ib_device { struct ib_cq * (*create_cq)(struct ib_device *device, int cqe, struct ib_ucontext *context, struct ib_udata *udata); + void (*sync_cq)(struct ib_cq *cq); int (*destroy_cq)(struct ib_cq *cq); int (*resize_cq)(struct ib_cq *cq, int *cqe); int (*poll_cq)(struct ib_cq *cq, int num_entries, @@ -1227,6 +1228,16 @@ struct ib_cq *ib_create_cq(struct ib_dev void *cq_context, int cqe); /** + * ib_sync_cq - flush CQ completion event handler. + * This must be used after modifying comp_handler or cq_context.
+ * @cq: The CQ to flush events for. + */ +static inline void ib_sync_cq(struct ib_cq *cq) +{ + return cq->device->sync_cq(cq); +} + +/** * ib_resize_cq - Modifies the capacity of the CQ. * @cq: The CQ to resize. * @cqe: The minimum size of the CQ. Index: linux-2.6.13/drivers/infiniband/hw/mthca/mthca_provider.c =================================================================== --- linux-2.6.13.orig/drivers/infiniband/hw/mthca/mthca_provider.c 2005-09-11 10:24:37.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/hw/mthca/mthca_provider.c 2005-09-12 10:34:00.000000000 +0300 @@ -1090,6 +1090,7 @@ int mthca_register_device(struct mthca_d dev->ib_dev.destroy_qp = mthca_destroy_qp; dev->ib_dev.create_cq = mthca_create_cq; dev->ib_dev.destroy_cq = mthca_destroy_cq; + dev->ib_dev.sync_cq = mthca_sync_cq; dev->ib_dev.poll_cq = mthca_poll_cq; dev->ib_dev.get_dma_mr = mthca_get_dma_mr; dev->ib_dev.reg_phys_mr = mthca_reg_phys_mr; -- MST From halr at voltaire.com Mon Sep 12 03:32:14 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 06:32:14 -0400 Subject: [openib-general] Re: [openib-commits] r3362 - gen2/trunk/src/linux-kernel/infiniband/ulp/sdp In-Reply-To: <20050911075359.7686F2283E7@openib.ca.sandia.gov> References: <20050911075359.7686F2283E7@openib.ca.sandia.gov> Message-ID: <1126520743.4382.28476.camel@hal.voltaire.com> On Sun, 2005-09-11 at 03:53, mst at openib.org wrote: > Author: mst > Date: 2005-09-11 00:53:58 -0700 (Sun, 11 Sep 2005) > New Revision: 3362 > > Modified: > gen2/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_buff.c > Modified: gen2/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_buff.c > =================================================================== > --- gen2/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_buff.c 2005-09-11 07:48:41 UTC (rev 3361) > +++ gen2/trunk/src/linux-kernel/infiniband/ulp/sdp/sdp_buff.c 2005-09-11 07:53:58 UTC (rev 3362) > @@ -305,194 +305,78 @@ ... 
> /* > * sdp_buff_pool_init - Initialize the main buffer pool of memory > */ > -int sdp_buff_pool_init(int buff_min, int buff_max, int alloc_inc, int free_mark) > +int sdp_buff_pool_init(void) ... > sdp_dbg_init("Main pool initialized with min:max <%d:%d> buffers.", > buff_min, buff_max); These variables are no longer declared. -- Hal From mst at mellanox.co.il Mon Sep 12 04:05:58 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Sep 2005 14:05:58 +0300 Subject: [openib-general] Re: [openib-commits] r3362 - gen2/trunk/src/linux-kernel/infiniband/ulp/sdp In-Reply-To: <1126520743.4382.28476.camel@hal.voltaire.com> References: <20050911075359.7686F2283E7@openib.ca.sandia.gov> <1126520743.4382.28476.camel@hal.voltaire.com> Message-ID: <20050912110558.GE845@mellanox.co.il> Quoting Hal Rosenstock : > > sdp_dbg_init("Main pool initialized with min:max <%d:%d> buffers.", > > buff_min, buff_max); > > These variables are no longer declared I wonder why this compiles for me. Hal, does the following fix your problem? 5 files changed, 8 insertions(+), 79 deletions(-) --- Kill unused code/data in sdp_buff.* Signed-off-by: Michael S.
Tsirkin Index: linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_proto.h =================================================================== --- linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_proto.h 2005-09-12 13:58:10.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_proto.h 2005-09-12 15:49:37.000000000 +0300 @@ -52,12 +52,6 @@ struct sdpc_buff *sdp_buff_pool_get(void void sdp_buff_pool_put(struct sdpc_buff *buff); -void sdp_buff_pool_chain_put(struct sdpc_buff *buff, u32 count); - -void sdp_buff_pool_chain_link(struct sdpc_buff *head, struct sdpc_buff *buff); - -int sdp_buff_pool_buff_size(void); - void sdp_buff_q_init(struct sdpc_buff_q *pool); void sdp_buff_q_clear_unmap(struct sdpc_buff_q *pool, Index: linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_recv.c =================================================================== --- linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_recv.c 2005-09-08 11:55:59.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_recv.c 2005-09-12 15:49:37.000000000 +0300 @@ -1064,7 +1064,6 @@ int sdp_inet_recv(struct kiocb *req, st struct sdp_sock *conn; struct sdpc_iocb *iocb; struct sdpc_buff *buff; - struct sdpc_buff *head = NULL; long timeout; size_t length; int result = 0; @@ -1073,7 +1072,6 @@ int sdp_inet_recv(struct kiocb *req, st int copied = 0; int copy; int update; - int free_count = 0; s8 oob = 0; s8 ack = 0; struct sdpc_buff_q peek_queue; @@ -1215,15 +1213,8 @@ int sdp_inet_recv(struct kiocb *req, st else { if (buff->flags & SDP_BUFF_F_OOB_PRES) conn->rcv_urg_cnt -= 1; - /* - * create a link of buffers which - * will be returned to the free pool - * in one group. 
- */ - sdp_buff_pool_chain_link(head, buff); - head = buff; - free_count++; + sdp_buff_pool_put(buff); /* * post additional recv buffers if * needed, but check only every N @@ -1383,8 +1374,6 @@ done: sdp_dbg_warn(conn, "Error <%d> flushing recv queue.", expect); } - - sdp_buff_pool_chain_put(head, free_count); /* * return any peeked buffers to the recv queue, in the correct order. */ Index: linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_sent.c =================================================================== --- linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_sent.c 2005-09-08 11:55:59.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_sent.c 2005-09-12 15:52:41.000000000 +0300 @@ -117,7 +117,6 @@ static int sdp_sent_abort(struct sdp_soc */ int sdp_event_send(struct sdp_sock *conn, struct ib_wc *comp) { - struct sdpc_buff *head = NULL; struct sdpc_buff *buff; u64 current_wrid = 0; u32 free_count = 0; @@ -240,21 +239,15 @@ int sdp_event_send(struct sdp_sock *conn if (SDP_BUFF_F_GET_UNSIG(buff) > 0) conn->send_usig--; - /* - * create a link of buffers which will be returned to - * the free pool in one group. - */ - sdp_buff_pool_chain_link(head, buff); - head = buff; + sdp_buff_pool_put(buff); + free_count++; if (comp->wr_id == current_wrid) break; } - sdp_buff_pool_chain_put(head, free_count); - if (free_count <= 0 || conn->send_usig < 0) { sdp_dbg_warn(conn, "Send processing mismatch. 
<%llu:%llu:%d:%d>", @@ -276,7 +269,6 @@ int sdp_event_send(struct sdp_sock *conn return 0; drop: sdp_buff_pool_put(buff); - sdp_buff_pool_chain_put(head, free_count); done: return result; } Index: linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_buff.c =================================================================== --- linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_buff.c 2005-09-11 12:36:48.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_buff.c 2005-09-12 15:51:18.000000000 +0300 @@ -305,7 +305,7 @@ void sdp_buff_q_clear_unmap(struct sdpc_ /* * sdp_buff_pool_release - release allocated buffers from the main pool */ -static void sdp_buff_pool_release(struct sdpc_buff *buff) +void sdp_buff_pool_put(struct sdpc_buff *buff) { kmem_cache_free(main_pool.pool_cache, buff->head); kmem_cache_free(main_pool.buff_cache, buff); @@ -368,8 +368,7 @@ int sdp_buff_pool_init(void) result = -ENOMEM; goto error_buff; } - sdp_dbg_init("Main pool initialized with min:max <%d:%d> buffers.", - buff_min, buff_max); + sdp_dbg_init("Main pool initialized."); return 0; @@ -416,34 +415,3 @@ struct sdpc_buff *sdp_buff_pool_get(void return buff; } - -/* - * sdp_buff_pool_put - Return a buffer to the main buffer pool - */ -void sdp_buff_pool_put(struct sdpc_buff *buff) -{ - sdp_buff_pool_release(buff); -} - -/* - * sdp_buff_pool_chain_link - create chain of buffers which can be returned - */ -void sdp_buff_pool_chain_link(struct sdpc_buff *head, struct sdpc_buff *buff) -{ - sdp_buff_pool_release(buff); -} - -/* - * sdp_buff_pool_chain_put - Return a buffer to the main buffer pool - */ -void sdp_buff_pool_chain_put(struct sdpc_buff *buff, u32 count) -{ -} - -/* - * sdp_buff_pool_buff_size - return the size of buffers in the main pool - */ -int sdp_buff_pool_buff_size(void) -{ - return PAGE_SIZE; -} Index: linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_buff.h =================================================================== --- 
linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_buff.h 2005-09-08 19:19:46.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_buff.h 2005-09-12 15:49:37.000000000 +0300 @@ -76,24 +76,8 @@ struct sdpc_buff { }; struct sdpc_buff_root { - /* - * variant - */ - struct sdpc_buff_q pool; /* actual pool of buffers */ - spinlock_t lock; /* spin lock for pool access */ - /* - * invariant - */ - kmem_cache_t *pool_cache; /* cache of pool objects */ + kmem_cache_t *pool_cache; /* pool of buffers */ kmem_cache_t *buff_cache; /* cache of buffer descriptor objects */ - - int buff_min; /* minimum allocated buffers */ - int buff_max; /* maximum allocated buffers */ - int buff_cur; /* total allocated buffers */ - int buff_size; /* size of each buffer in the pool */ - - int alloc_inc; /* allocation increment */ - int free_mark; /* start freeing unused buffers */ }; /* @@ -117,4 +101,6 @@ struct sdpc_buff_root { */ #define sdp_buff_q_size(pool) ((pool)->size) +#define sdp_buff_pool_buff_size() PAGE_SIZE + #endif /* _SDP_BUFF_H */ -- MST From halr at voltaire.com Mon Sep 12 05:26:56 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 08:26:56 -0400 Subject: [openib-general] Re: osmtest/OpenSM: ServiceGID and busy status In-Reply-To: <43228AB2.3080806@mellanox.co.il> References: <506C3D7B14CDD411A52C00025558DED607C307A8@mtlex01.yok.mtl.com> <1126296841.4401.7610.camel@hal.voltaire.com> <43228AB2.3080806@mellanox.co.il> Message-ID: <1126528015.4382.29787.camel@hal.voltaire.com> On Sat, 2005-09-10 at 03:26, Eitan Zahavi wrote: > > The OpenSM SA client API needs changing to make it optional. Other than > > that it is a matter of the default policy: retries and timeout (with > > backoff) to be used. > We should add it to the todo list. Done. > BTW is the presented future work during the last OpenIB workshop reflected in a > todo file ? Are there other items for this list (osm/doc/todo) ? 
-- Hal From halr at voltaire.com Mon Sep 12 05:36:26 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 08:36:26 -0400 Subject: [openib-general] osmtest osmt_multicast.c physible Message-ID: <1126528390.4382.29845.camel@hal.voltaire.com> Hi Yael, What is meant by "physible" in the lines below? osmtest/osmt_multicast.c: "Fifth exact MTU & RATE physible, Sixth exact RATE physible\n\t\t" osmtest/osmt_multicast.c: "Seventh exact MTU physible (o15.0.1.4)...\n" osmtest/osmt_multicast.c: /* Using Exact physible MTU & RATE */ osmtest/osmt_multicast.c: /* Using Exact physible RATE */ osmtest/osmt_multicast.c: /* Using Exact physible MTU */ Also, o15.0.1.4 is obsolete as of 1.2 and is replaced by o15-0.2.2. Thanks. -- Hal From halr at voltaire.com Mon Sep 12 05:40:06 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 08:40:06 -0400 Subject: [openib-general] RE: Another OpenSM 1.8.0 nit In-Reply-To: <506C3D7B14CDD411A52C00025558DED60CCF38@mtlex01.yok.mtl.com> References: <506C3D7B14CDD411A52C00025558DED60CCF38@mtlex01.yok.mtl.com> Message-ID: <1126528734.4382.29904.camel@hal.voltaire.com> Hi Yael, On Sun, 2005-09-11 at 05:47, Yael Kalka wrote: > The constant should be used. I added to our code (osm-1.8.0-merge) use > of this code. > There was a problem that the constant was defined, but in osm_subnet.c > the sminfo_polling_timeout > was hardcoded given the value of 10000, instead of using this > constant. > Do you want me to send a patch for this too? I am still picking up changes from osm-1.8.0-merge and merging them into the trunk version, so I think I already have this change. > Regarding documentation - we do have user manual for the 1.8.0. How do > you want to add it? Are there release notes too? Previously, I had these added under osm/doc. Ideally, it would be nice to make these more OpenIB-specific, which would require having the sources for editing, etc. Thanks.
-- Hal > Yael > > > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, September 07, 2005 1:16 AM > To: Yael Kalka > Cc: openib-general at openib.org > Subject: Another OpenSM 1.8.0 nit > > > Hi Yael, > > Here'a another OpenSM 1.8.0 nit: > opensm/osm_base.h:/****d* OpenSM: > Base/OSM_SM_DEFAULT_POLLING_TIMEOUT_MILISECS > opensm/osm_base.h:* OSM_SM_DEFAULT_POLLING_TIMEOUT_MILISECS > opensm/osm_base.h:#define OSM_SM_DEFAULT_POLLING_TIMEOUT_MILISECS > 10000 > Is this used ? > > Also are there updated docs (user manual, release notes) for 1.8.0 ? > > Thanks. > > -- Hal > From halr at voltaire.com Mon Sep 12 05:50:21 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 08:50:21 -0400 Subject: [openib-general] Re: [openib-commits] r3362 - gen2/trunk/src/linux-kernel/infiniband/ulp/sdp In-Reply-To: <20050912110558.GE845@mellanox.co.il> References: <20050911075359.7686F2283E7@openib.ca.sandia.gov> <1126520743.4382.28476.camel@hal.voltaire.com> <20050912110558.GE845@mellanox.co.il> Message-ID: <1126529414.4382.30010.camel@hal.voltaire.com> On Mon, 2005-09-12 at 07:05, Michael S. Tsirkin wrote: > Quoting Hal Rosenstock : > > > sdp_dbg_init("Main pool initialized with min:max <%d:%d> buffers.", > > > buff_min, buff_max); > > > > These variables are no longer declared > > I wander why does this compile for me? I wonder why too. > Hal, does the following fix your problem? That worked for me. Thanks. 
-- Hal From halr at voltaire.com Mon Sep 12 06:17:24 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 09:17:24 -0400 Subject: [openib-general] RE: osmtest/OpenSM: ServiceGID and busy status In-Reply-To: <1126296841.4401.7610.camel@hal.voltaire.com> References: <506C3D7B14CDD411A52C00025558DED607C307A8@mtlex01.yok.mtl.com> <1126296841.4401.7610.camel@hal.voltaire.com> Message-ID: <1126531043.4382.30246.camel@hal.voltaire.com> On Fri, 2005-09-09 at 16:14, Hal Rosenstock wrote: > On Fri, 2005-09-09 at 16:05, Eitan Zahavi wrote: > > > I may be wrong but: > > > ServiceGID says port GID for service. A port GID must meet the > > > requirements in the addressing section. > > [EZ] I think the spec intentionally leaves this open. The intent is to > > use this as GID but no check is defined. According to your > > interpretation no "proxy" - where node A publish services of node B - > > is allowed > > Proxy would be allowed. There are 2 possibilities: > 1. Allow valid looking GIDs > or > 2. Only allow GIDs present in the subnet I think that C15-0.0.1.13 (IBA 1.2 p.896) implies the second possibility above, as it states: C15-0.1.13: SA shall reject as invalid any attempt to create, modify, or delete a ServiceRecord in which the ServiceP_Key is not present in the P_Key Tables of both the port identified by the ServiceGID and the port from which the request came. So that is more stringent: the ServiceGID must be a (valid) GID in the subnet. I believe the test is attempting to create the SR with an invalid ServiceGID, since its subnet prefix is 0. It should be rejected by the SA per the rule above.
-- Hal From johann at pathscale.com Mon Sep 12 06:51:20 2005 From: johann at pathscale.com (Johann George) Date: Mon, 12 Sep 2005 06:51:20 -0700 Subject: [openib-general] cannot seem to compile SDP Message-ID: <20050912135120.GA32672@cuprite.internal.keyresearch.com> When I compile SDP (using revision 3369) I get the following errors: drivers/infiniband/ulp/sdp/sdp_buff.c: In function `sdp_buff_pool_init': drivers/infiniband/ulp/sdp/sdp_buff.c:371: error: `buff_min' undeclared (first use in this function) drivers/infiniband/ulp/sdp/sdp_buff.c:371: error: (Each undeclared identifier is reported only once drivers/infiniband/ulp/sdp/sdp_buff.c:371: error: for each function it appears in.) drivers/infiniband/ulp/sdp/sdp_buff.c:371: error: `buff_max' undeclared (first use in this function) Johann From halr at voltaire.com Mon Sep 12 06:52:15 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 09:52:15 -0400 Subject: [openib-general] cannot seem to compile SDP In-Reply-To: <20050912135120.GA32672@cuprite.internal.keyresearch.com> References: <20050912135120.GA32672@cuprite.internal.keyresearch.com> Message-ID: <1126533135.4382.30548.camel@hal.voltaire.com> Hi Johann, On Mon, 2005-09-12 at 09:51, Johann George wrote: > When I compile SDP (using revision 3369) I get the following errors: > > drivers/infiniband/ulp/sdp/sdp_buff.c: In function `sdp_buff_pool_init': > drivers/infiniband/ulp/sdp/sdp_buff.c:371: error: `buff_min' undeclared (first use in this function) > drivers/infiniband/ulp/sdp/sdp_buff.c:371: error: (Each undeclared identifier is reported only once > drivers/infiniband/ulp/sdp/sdp_buff.c:371: error: for each function it appears in.) > drivers/infiniband/ulp/sdp/sdp_buff.c:371: error: `buff_max' undeclared (first use in this function) See http://openib.org/pipermail/openib-general/2005-September/011042.html for a patch for this. -- Hal From mst at mellanox.co.il Mon Sep 12 07:02:55 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 12 Sep 2005 17:02:55 +0300 Subject: [openib-general] Re: cannot seem to compile SDP In-Reply-To: <20050912135120.GA32672@cuprite.internal.keyresearch.com> References: <20050912135120.GA32672@cuprite.internal.keyresearch.com> Message-ID: <20050912140255.GJ845@mellanox.co.il> Quoting r. Johann George : > When I compile SDP (using revision 3369) I get the following errors: That's fixed now (rev 3370). -- MST From johann at pathscale.com Mon Sep 12 07:05:57 2005 From: johann at pathscale.com (Johann George) Date: Mon, 12 Sep 2005 07:05:57 -0700 Subject: [openib-general] Re: cannot seem to compile SDP In-Reply-To: <20050912140255.GJ845@mellanox.co.il> References: <20050912135120.GA32672@cuprite.internal.keyresearch.com> <20050912140255.GJ845@mellanox.co.il> Message-ID: <20050912140557.GA604@cuprite.internal.keyresearch.com> > Thats fixed now (rev 3370). Thanks. Johann From halr at voltaire.com Mon Sep 12 07:18:37 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 10:18:37 -0400 Subject: [openib-general] imgen build Message-ID: <1126534716.4382.30787.camel@hal.voltaire.com> Hi Michael, A couple of build-related imgen questions: 1. How is imgen related to IBADM? g++ -Wall -W -Werror -g -O2 -MP -MD '-DBLD_VER_STR="devel"' '-DIBADM_VER_STR=""' -fno-exceptions -c -o mic.o mic.cpp 2. Should there be a make install for these tools (t2a, mic)? Thanks.
-- Hal From halr at voltaire.com Mon Sep 12 07:25:27 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 10:25:27 -0400 Subject: [openib-general] dapls_ib_connect comment Message-ID: <1126535126.4382.30849.camel@hal.voltaire.com> Hi Arlin, In userspace/dapl/dapl/openib/dapl_ib_cm.c::dapls_ib_connect: status = ib_at_route_by_ip( ((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr, ((struct sockaddr_in *)&conn->hca->hca_address)->sin_addr.s_addr, 0, 0, &conn->dapl_rt, &conn->dapl_comp, &conn->dapl_comp.req_id); dapl_dbg_log(DAPL_DBG_TYPE_CM, " connect: at_route ret=%d,%s req_id %d GID %016llx %016llx\n", status, strerror(errno), conn->dapl_comp.req_id, (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.subnet_prefix), (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.interface_id) ); The GID is part of the route structure which is an output and not filled in at completion of this call. I think that the destination and source IP addresses would be more useful here. -- Hal From mst at mellanox.co.il Mon Sep 12 07:48:08 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Sep 2005 17:48:08 +0300 Subject: [openib-general] Re: imgen build In-Reply-To: <1126534716.4382.30787.camel@hal.voltaire.com> References: <1126534716.4382.30787.camel@hal.voltaire.com> Message-ID: <20050912144808.GK845@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: imgen build > > Hi Michael, > > A couple of build related imgen questions: > > 1. How is imgen related to IBADM ? > g++ -Wall -W -Werror -g -O2 -MP -MD '-DBLD_VER_STR="devel"' '-DIBADM_VER_STR=""' -fno-exceptions -c -o mic.o mic.cpp IBADM also includes imgen. The only use of IBADM_VER_STR is for version reporting. > 2. Should there be a make install for these tools (t2a, mic) ? No idea. It really does not make sense to install it on all nodes in a cluster, and it works fine from any directory. What do you think? 
-- MST From halr at voltaire.com Mon Sep 12 07:58:49 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 10:58:49 -0400 Subject: [openib-general] Re: imgen build In-Reply-To: <20050912144808.GK845@mellanox.co.il> References: <1126534716.4382.30787.camel@hal.voltaire.com> <20050912144808.GK845@mellanox.co.il> Message-ID: <1126536754.4382.31138.camel@hal.voltaire.com> On Mon, 2005-09-12 at 10:48, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: imgen build > > > > Hi Michael, > > > > A couple of build related imgen questions: > > > > 1. How is imgen related to IBADM ? > > g++ -Wall -W -Werror -g -O2 -MP -MD '-DBLD_VER_STR="devel"' '-DIBADM_VER_STR=""' -fno-exceptions -c -o mic.o mic.cpp > > IBADM also includes imgen. > The only use of IBADM_VER_STR is for version reporting. Is its version tied to IBADM, or is this to save a separate version string? > > 2. Should there be a make install for these tools (t2a, mic) ? > > No idea. > It really does not make sense to install it on all nodes in a cluster, OK, but shouldn't one have the option of installing it or not? > and it works fine from any directory. True. > What do you think? I'm ambivalent about this. Does anyone else have an opinion? -- Hal From mst at mellanox.co.il Mon Sep 12 08:08:25 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Sep 2005 18:08:25 +0300 Subject: [openib-general] Re: imgen build In-Reply-To: <1126536754.4382.31138.camel@hal.voltaire.com> References: <1126534716.4382.30787.camel@hal.voltaire.com> <20050912144808.GK845@mellanox.co.il> <1126536754.4382.31138.camel@hal.voltaire.com> Message-ID: <20050912150825.GL845@mellanox.co.il> Quoting Hal Rosenstock : > > IBADM also includes imgen. > > The only use of IBADM_VER_STR is for version reporting. > > Is it's version tied to IBADM or is this to save a separate version > string ? Both :) BLD_VER_STR is supposed to hold imgen's own version string, and IBADM_VER_STR the IBADM version string.
-- MST From halr at voltaire.com Mon Sep 12 08:02:11 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 11:02:11 -0400 Subject: [openib-general] dapl_quit_util.c Message-ID: <1126537051.4382.31182.camel@hal.voltaire.com> Hi James, .../src/userspace/dapl/test/dapltest/test/dapl_quit_util.c has +x permissions, which should be removed. Thanks. -- Hal From halr at voltaire.com Mon Sep 12 08:15:26 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 11:15:26 -0400 Subject: [openib-general] uDAPL README comment Message-ID: <1126538125.4382.31372.camel@hal.voltaire.com> Hi again, /userspace/dapl/dapl/openib/README says: - modify doc/dat.conf to add a example openib configuration ... doc/dat.conf The doc subdirectory and sample dat.conf appear to be missing. Thanks. -- Hal From eitan at openib.org Mon Sep 12 08:53:45 2005 From: eitan at openib.org (Eitan Zahavi) Date: Mon, 12 Sep 2005 18:53:45 +0300 Subject: [openib-general] RE: osmtest/OpenSM: ServiceGID and busy status In-Reply-To: <1126531043.4382.30246.camel@hal.voltaire.com> References: <1126531043.4382.30246.camel@hal.voltaire.com> Message-ID: <20050912161141.CE2D422834D@openib.ca.sandia.gov> Hal Rosenstock wrote: >> >>Proxy would be allowed. There are 2 possibilities: >>1. Allow valid looking GIDs >>or >>2. Only allow GIDs present in the subnet > > > I think that C15-0.0.1.13 (IBA 1.2 p.896) results in the second > possibility above as it states: > > C15-0.1.13: SA shall reject as invalid any attempt to create, modify, or > delete a ServiceRecord in which the ServiceP_Key is not present in the > P_Key Tables of both the port identified by the ServiceGID and the port > from which the request came. > Thanks for finding and pointing this out. I am not sure the implementation covers this as of today, since that would cause the osmtest requests to be dropped. Let us add it to the todo list. Thanks > So that is more stringent and the ServiceGID must be a (valid) GID in > the subnet.
> > I believe the test ais ttempting to create the SR with an invalid > ServiceGID as its subnet prefix is 0. It should be rejected by the SA > per the rule above. > > -- Hal > From rolandd at cisco.com Mon Sep 12 09:02:26 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 12 Sep 2005 09:02:26 -0700 Subject: [openib-general] libibat/libibcm build mess In-Reply-To: <1126311890.4382.154.camel@hal.voltaire.com> (Hal Rosenstock's message of "09 Sep 2005 20:28:26 -0400") References: <528xy5g678.fsf@cisco.com> <521x3xg5g8.fsf@cisco.com> <1126310570.4382.55.camel@hal.voltaire.com> <52oe71eq7z.fsf@cisco.com> <1126311890.4382.154.camel@hal.voltaire.com> Message-ID: <528xy2cm0d.fsf@cisco.com> Hal> Shall I take care of this or do you have it covered ? I just checked in a change that makes libibat the single source of and makes libibcm depend on libibat. I don't really like this solution but at least now everything builds sanely. - R. From mshefty at ichips.intel.com Mon Sep 12 09:23:48 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 12 Sep 2005 09:23:48 -0700 Subject: [openib-general] Re: different CM panic In-Reply-To: <52br31efxe.fsf@cisco.com> References: <4df28be4050909152118f3e947@mail.gmail.com> <52k6hpeq2x.fsf@cisco.com> <52br31efxe.fsf@cisco.com> Message-ID: <4325AB94.2060404@ichips.intel.com> Roland Dreier wrote: > Well, at least I tracked this down to a use-after-free bug in the CM. > I went ahead and committed this trivial fix: > > If the CM REQ handling function gets to error2, then it frees > cm_id_priv->timewait_info. But the next line goes through > ib_destroy_cm_id() -> ib_send_cm_rej() -> cm_reset_to_idle(), > which ends up calling cm_cleanup_timewait(), which dereferences the > pointer we just freed. Thanks for fixing this. 
- Sean From rolandd at cisco.com Mon Sep 12 09:33:09 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 12 Sep 2005 09:33:09 -0700 Subject: [openib-general] Re: [PATCH] [SRP] Fix CM redirection in SRP In-Reply-To: (John Kingman's message of "Sat, 10 Sep 2005 17:16:47 -0500 (CDT)") References: <528xy6i7e4.fsf@cisco.com> Message-ID: <524q8qckl6.fsf@cisco.com> I rewrote your patch to fix a few things: - I believe we always need to copy the redirect GID, even if the redirect LID is non-zero; otherwise we will end up putting the GID of the original port in our CM REQ. From my reading of the description of ClassPortInfo in the IB spec, I think the redirect GID must always be valid if redirection is being done. - I moved the connect handling into a new function srp_connect_target(), so that both the initial connect and future reconnects get handling for port and DLID redirection. - I got rid of the change for CM REJ reason 25 -- that reject code does not carry a ClassPortInfo in the ARI, but rather just the GID at offset 0. By the way, it's not that critical for SRP, since it's not in the upstream kernel yet, but I don't think we should use obfuscated email addresses in Signed-off-by lines: > Signed-off-by: John Kingman storagegear.com> Just put the '@' in there -- I think there are much bigger spam sources to worry about. Anyway, here's the updated patch. Does this look OK to you? - R. 
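As a reading aid, here is a minimal user-space model of the retry loop that the patch factors into srp_connect_target(). The struct fake_target type and the fake_* helpers are hypothetical stand-ins for srp_send_req(), wait_for_completion() and srp_lookup_path(). Only the two status values mirror the patch's ib_srp.h. The model also shows why no return statement is needed after the loop: every case either returns or goes round again.

```c
#include <assert.h>
#include <stddef.h>

/* Status values as defined in the patch's ib_srp.h enum. */
enum { SRP_PORT_REDIRECT = 1, SRP_DLID_REDIRECT = 2 };

/*
 * Hypothetical model of a target: each connect attempt "receives" the next
 * scripted status, standing in for what the CM handler leaves in
 * target->status after a REP or REJ.
 */
struct fake_target {
	const int *outcomes;
	size_t     attempt;
	int        path_lookups;
};

/* Stands in for srp_send_req() followed by wait_for_completion(). */
static int fake_send_req_and_wait(struct fake_target *t)
{
	assert(t->outcomes != NULL);
	return t->outcomes[t->attempt++];
}

/* Stands in for srp_lookup_path(); always succeeds in this model. */
static int fake_lookup_path(struct fake_target *t)
{
	t->path_lookups++;
	return 0;
}

static int connect_target(struct fake_target *t)
{
	for (;;) {
		int status = fake_send_req_and_wait(t);
		int ret;

		switch (status) {
		case 0:
			return 0;		/* connected */
		case SRP_PORT_REDIRECT:
			/* New port GID: redo the path query, then resend the REQ. */
			ret = fake_lookup_path(t);
			if (ret)
				return ret;
			break;
		case SRP_DLID_REDIRECT:
			/* In the real code the REJ handler already updated DLID/QPN: just resend. */
			break;
		default:
			return status;		/* any other failure ends the loop */
		}
	}
}
```

For example, the scripted sequence "port redirect, DLID redirect, success" connects on the third REQ, with a single path lookup.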
Index: infiniband/ulp/srp/ib_srp.c =================================================================== --- infiniband/ulp/srp/ib_srp.c (revision 3372) +++ infiniband/ulp/srp/ib_srp.c (working copy) @@ -386,11 +386,48 @@ static void srp_remove_work(void *target scsi_host_put(target->scsi_host); } +static int srp_connect_target(struct srp_target_port *target) +{ + int ret; + + while (1) { + init_completion(&target->done); + ret = srp_send_req(target); + if (ret) + return ret; + wait_for_completion(&target->done); + + /* + * The CM event handling code will set status to + * SRP_PORT_REDIRECT if we get a port redirect REJ + * back, or SRP_DLID_REDIRECT if we get a lid/qp + * redirect REJ back. + */ + switch (target->status) { + case 0: + return 0; + + case SRP_PORT_REDIRECT: + ret = srp_lookup_path(target); + if (ret) + return ret; + break; + + case SRP_DLID_REDIRECT: + break; + + default: + return target->status; + } + } +} + static int srp_reconnect_target(struct srp_target_port *target) { struct ib_qp_attr qp_attr; struct srp_request *req; struct ib_wc wc; + u32 remote_cm_qpn; int ret; int i; @@ -402,6 +439,8 @@ static int srp_reconnect_target(struct s target->state = SRP_TARGET_CONNECTING; spin_unlock_irq(target->scsi_host->host_lock); + remote_cm_qpn = target->cm_id->remote_cm_qpn; + srp_disconnect_target(target); target->cm_id = ib_create_cm_id(srp_cm_handler, target); @@ -411,6 +450,8 @@ static int srp_reconnect_target(struct s goto err; } + target->cm_id->remote_cm_qpn = remote_cm_qpn; + qp_attr.qp_state = IB_QPS_RESET; ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE); if (ret) @@ -438,24 +479,9 @@ static int srp_reconnect_target(struct s target->req_ring[SRP_SQ_SIZE - 1].next = -1; INIT_LIST_HEAD(&target->req_queue); -retry_connect: - init_completion(&target->done); - ret = srp_send_req(target); + ret = srp_connect_target(target); if (ret) goto err; - wait_for_completion(&target->done); - - /* - * The CM event handling code will set status to - * 
SRP_PORT_REDIRECT if we get a port redirect REJ back. - */ - if (target->status == SRP_PORT_REDIRECT) { - ret = srp_lookup_path(target); - if (ret) - goto err; - goto retry_connect; - } else if (target->status < 0) - goto err; spin_lock_irq(target->scsi_host->host_lock); if (target->state == SRP_TARGET_CONNECTING) { @@ -1031,8 +1057,13 @@ static int srp_cm_handler(struct ib_cm_i if (event->param.rej_rcvd.reason == IB_CM_REJ_PORT_CM_REDIRECT) { cpi = event->param.rej_rcvd.ari; + target->path.dlid = cpi->redirect_lid; + target->path.pkey = cpi->redirect_pkey; + cm_id->remote_cm_qpn = be32_to_cpu(cpi->redirect_qp) & 0x00ffffff; memcpy(target->path.dgid.raw, cpi->redirect_gid, 16); - target->status = SRP_PORT_REDIRECT; + + target->status = target->path.dlid ? + SRP_DLID_REDIRECT : SRP_PORT_REDIRECT; } else if (topspin_workarounds && !memcmp(&target->ioc_guid, topspin_oui, 3) && event->param.rej_rcvd.reason == IB_CM_REJ_PORT_REDIRECT) { @@ -1042,7 +1073,7 @@ static int srp_cm_handler(struct ib_cm_i * (port redirect). */ memcpy(target->path.dgid.raw, - event->param.rej_rcvd.ari + 0, 16); + event->param.rej_rcvd.ari, 16); printk(KERN_DEBUG PFX "Topspin/Cisco redirect to target port GID %016llx%016llx\n", (unsigned long long) be64_to_cpu(target->path.dgid.global.subnet_prefix), @@ -1386,28 +1417,15 @@ static ssize_t srp_create_target(struct (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[12]), (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[14])); -retry_path: ret = srp_lookup_path(target); if (ret) { ib_destroy_cm_id(target->cm_id); goto err; } - init_completion(&target->done); - ret = srp_send_req(target); - if (ret) - goto err; - wait_for_completion(&target->done); - - /* - * The CM event handling code will set status to - * SRP_PORT_REDIRECT if we get a port redirect REJ back. 
- */ - if (target->status == SRP_PORT_REDIRECT) - goto retry_path; - else if (target->status < 0) { + ret = srp_connect_target(target); + if (ret) { printk(KERN_ERR PFX "Connection failed\n"); - ret = target->status; goto err; } Index: infiniband/ulp/srp/ib_srp.h =================================================================== --- infiniband/ulp/srp/ib_srp.h (revision 3372) +++ infiniband/ulp/srp/ib_srp.h (working copy) @@ -52,6 +52,7 @@ enum { SRP_ABORT_TIMEOUT_MS = 5000, SRP_PORT_REDIRECT = 1, + SRP_DLID_REDIRECT = 2, SRP_MAX_IU_LEN = 256, From ftillier at silverstorm.com Mon Sep 12 09:47:20 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Mon, 12 Sep 2005 09:47:20 -0700 Subject: [openib-general] RFC: ib_set_comp_handler In-Reply-To: <20050911203142.GA25325@mellanox.co.il> Message-ID: <003e01c5b7b9$a9855ed0$9e5aa8c0@infiniconsys.com> > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > Sent: Sunday, September 11, 2005 1:32 PM > > Hi! > I'd like to add a capability to change the cq completion handler. > > It seems this can't be done in the ULP without introducing additional > indirection and/or locking, which I'd like to avoid. You need to have locking to properly synchronize so that the user can know when their "old" callback handler will cease to be invoked. I agree however that you need some help from the verbs layer because only it knows when callbacks are in progress. > I'd use it in sdp to disable cq events while a connection is destroyed. Why not just move the QP to the reset state to suppress any further completions, then poll the CQ for any prior completions? Aren't you guaranteed that once the QP is in reset, all pending CQEs have been written? > It also seems like ipoib could use such a capability, simply blocking > completion events instead of waiting for 5 seconds in ipoib_ib_dev_stop. > I expect this to be useful in other scenarios (IPoIB NAPI?).
It seems that what you really want is a way to disarm a CQ, not change the completion handler. Are CQs shared between sockets in SDP, or does each socket have its own CQ? - Fab From viswa.krish at gmail.com Mon Sep 12 09:49:05 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Mon, 12 Sep 2005 09:49:05 -0700 Subject: [openib-general] Status of opensm 1.8 merge Message-ID: <4df28be405091209494a5a4236@mail.gmail.com> Can I start testing opensm 1.8 merge on gen2 ? What is the current status ? -Viswa From mst at mellanox.co.il Mon Sep 12 10:02:57 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Sep 2005 20:02:57 +0300 Subject: [openib-general] RFC: ib_set_comp_handler In-Reply-To: <003e01c5b7b9$a9855ed0$9e5aa8c0@infiniconsys.com> References: <20050911203142.GA25325@mellanox.co.il> <003e01c5b7b9$a9855ed0$9e5aa8c0@infiniconsys.com> Message-ID: <20050912170257.GO845@mellanox.co.il> Quoting r. Fab Tillier : > It seems that what you really want is a way to disarm a CQ, not change the > completion handler. Yes, but changing the handler to an empty function looks like an easy way to do it, without adding conditions on typical event path. See the patch I posted separately. > Are CQs shared between sockets in SDP, or does each socket > have its own CQ? Currently each socket has 2 CQs. -- MST From rolandd at cisco.com Mon Sep 12 10:06:51 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 12 Sep 2005 10:06:51 -0700 Subject: [openib-general] Re: [PATCH] ipoib: fix module removal race In-Reply-To: <20050911151934.GB19358@mellanox.co.il> (Michael S. Tsirkin's message of "Sun, 11 Sep 2005 18:19:34 +0300") References: <20050911151934.GB19358@mellanox.co.il> Message-ID: <52y862b4gk.fsf@cisco.com> Michael> Roland, does the following patch make sense? IP over IB Michael> seems more stable (didnt yet crash for me) with this Michael> applied. Yes, this makes sense.
Applied to svn, and queued in my git tree for 2.6.14. - R. From jlentini at netapp.com Mon Sep 12 10:11:46 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 12 Sep 2005 13:11:46 -0400 (EDT) Subject: [openib-general] Re: imgen build In-Reply-To: <1126536754.4382.31138.camel@hal.voltaire.com> References: <1126534716.4382.30787.camel@hal.voltaire.com> <20050912144808.GK845@mellanox.co.il> <1126536754.4382.31138.camel@hal.voltaire.com> Message-ID: On Mon, 12 Sep 2005, Hal Rosenstock wrote: > > > 2. Should there be a make install for these tools (t2a, mic) ? > > > > No idea. > > It really does not make sense to install it on all nodes in a cluster, > > OK but shouldn't one have the option of installing it or not ? > > > and it works fine from any directory. > > True. > > > What do you think? > > I'm ambivalent about this. Anyone else have an opinion ? If there is an option to install them, they should go in a mellanox/infiniband specific location. I've seen problems in the past where tools in /bin and /sbin get overwritten. The more pressing issue in my opinion is to provide a way to map the board ids in /sys/class/infiniband/mthca0/board_id to the correct .brd files in the Mellanox firmware download. From ftillier at silverstorm.com Mon Sep 12 10:21:25 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Mon, 12 Sep 2005 10:21:25 -0700 Subject: [openib-general] RFC: ib_set_comp_handler In-Reply-To: <20050912170257.GO845@mellanox.co.il> Message-ID: <004001c5b7be$698a7cc0$9e5aa8c0@infiniconsys.com> > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > Sent: Monday, September 12, 2005 10:03 AM > > Quoting r. Fab Tillier : > > It seems that what you really want is a way to disarm a CQ, not change the > > completion handler. > Yes, but changing the handler to an empty function looks like an easy > way to do it, without adding conditions on typical event path. 
It might be worth changing the name of the function to reflect that - ib_clr_comp_handler, for example. > See the patch I posted separately. Sorry, I missed that. I'll take a look. > > Are CQs shared between sockets in SDP, or does each socket > > have its own CQ? > > Currently each socket has 2 CQs. Doesn't the verbs layer provide synchronization between CQ callbacks and CQ destruction? Why not just destroy the CQs and avoid having to change the handler? - Fab From jlentini at netapp.com Mon Sep 12 10:33:39 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 12 Sep 2005 13:33:39 -0400 (EDT) Subject: [openib-general] Re: dapl_ep_connect problems In-Reply-To: <43246E64.9010309@cs.rutgers.edu> References: <43246E64.9010309@cs.rutgers.edu> Message-ID: On Sun, 11 Sep 2005, Aniruddha Bohra wrote: > Hello, > I tried connecting 2 endpoints using the new uDAPL library. > The connection fails with an invalid route option. > Attached is the log with DAT_DBG_TYPE=0xffff and DAPL_DBG_TYPE=0xffff. > I traced the call to ib_at_route_by_ip(), > It seems like all the arguments (dst_ip, src_ip, r_qual..) are 0x00. > > I have also attached my dat.conf, lsmod, /etc/ibhosts, and /etc/hosts. > > Could you please direct me where to look? > > Thanks > Aniruddha > > The parameters you passed to dapl_ep_connect don't look good: dapl_ep_connect (0x8057218, {4294967280.4294967295.4294967295.4294967295}, 4A275800, 0, -1, (nil), 0, 0) The second argument should be the IP address you want to connect to. The address doesn't look good. Can you verify that you are passing uDAPL a good address? I'll look over our code to see if there is a bug in our debug statement. We are printing out the 8-bit octets of the IP address as unsigned integer values. I can't explain why octet 1 is 0xFFFFFFF0 and octets 2,3, and 4 are 0xFFFFFFFF. Also your private data size is -1. I would change that to 0, but this isn't the problem. 
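The printed octets fit a common C pitfall. Suppose the address bytes are held in (signed) char, which is signed on x86, and each is converted straight to unsigned int for %u. Then the bytes 0xF0 and 0xFF sign-extend to 4294967280 and 4294967295. The sketch below is not the uDAPL debug code, whose source is not shown here. The function names are hypothetical, and the wrapped values assume a 32-bit unsigned int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Buggy conversion: a negative signed char wraps modulo UINT_MAX + 1. */
static unsigned int octet_buggy(signed char c)
{
	return (unsigned int)c;
}

/* Fixed conversion: go through unsigned char first, giving 0..255. */
static unsigned int octet_fixed(signed char c)
{
	return (unsigned int)(unsigned char)c;
}

/*
 * Format four address bytes as a dotted quad, using either conversion.
 * Returns the snprintf() result (length of the formatted string).
 */
static int format_ip(char *buf, size_t len, const signed char oct[4], int fixed)
{
	unsigned int (*conv)(signed char) = fixed ? octet_fixed : octet_buggy;

	/* Room for four 10-digit values, three dots and the terminating NUL. */
	assert(len >= 44);
	return snprintf(buf, len, "%u.%u.%u.%u",
			conv(oct[0]), conv(oct[1]), conv(oct[2]), conv(oct[3]));
}
```

Take the bytes { -16, -1, -1, -1 } (bit patterns F0 FF FF FF). The buggy form yields 4294967280.4294967295.4294967295.4294967295, the same values as in the log. The fixed form yields 240.255.255.255.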
From bohra at cs.rutgers.edu Mon Sep 12 10:37:27 2005 From: bohra at cs.rutgers.edu (Aniruddha Bohra) Date: Mon, 12 Sep 2005 13:37:27 -0400 Subject: [openib-general] Re: dapl_ep_connect problems In-Reply-To: References: <43246E64.9010309@cs.rutgers.edu> Message-ID: <4325BCD7.8020509@cs.rutgers.edu> James Lentini wrote: > >On Sun, 11 Sep 2005, Aniruddha Bohra wrote: > > > >>Hello, >> I tried connecting 2 endpoints using the new uDAPL library. >>The connection fails with an invalid route option. >>Attached is the log with DAT_DBG_TYPE=0xffff and DAPL_DBG_TYPE=0xffff. >>I traced the call to ib_at_route_by_ip(), >>It seems like all the arguments (dst_ip, src_ip, r_qual..) are 0x00. >> >>I have also attached my dat.conf, lsmod, /etc/ibhosts, and /etc/hosts. >> >>Could you please direct me where to look? >> >>Thanks >>Aniruddha >> >> >> >> > >The parameters you passed to dapl_ep_connect don't look good: > >dapl_ep_connect (0x8057218, >{4294967280.4294967295.4294967295.4294967295}, 4A275800, 0, -1, (nil), >0, 0) > >The second argument should be the IP address you want to connect to. >The address doesn't look good. Can you verify that you are passing >uDAPL a good address? > >I'll look over our code to see if there is a bug in our debug >statement. We are printing out the 8-bit octets of the IP address as >unsigned integer values. I can't explain why octet 1 is 0xFFFFFFF0 and >octets 2,3, and 4 are 0xFFFFFFFF. > >Also your private data size is -1. I would change that to 0, but this >isn't the problem. > > Hi I checked the code and it seems there was indeed a problem with that. I changed the code and now it posts the connect request and does not get any event. The ep_status is DAT_EP_STATE_CONNECT_PENDING. I am trying now with dapltest to see if it is a problem with my server (netapp filer NFSoRDMA) or the Open IB stack. Thanks for your help. If you have any idea about some special configuration required for the connection, I would really appreciate your help. 
Is there any way to see the requests similar to tcpdump? Thanks Aniruddha From mst at mellanox.co.il Mon Sep 12 10:42:22 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Sep 2005 20:42:22 +0300 Subject: [openib-general] Re: imgen build In-Reply-To: References: <1126534716.4382.30787.camel@hal.voltaire.com> <20050912144808.GK845@mellanox.co.il> <1126536754.4382.31138.camel@hal.voltaire.com> Message-ID: <20050912174222.GA3184@mellanox.co.il> Quoting r. James Lentini : > The more pressing issue in my opinion is to provide a way to map the > board ids in /sys/class/infiniband/mthca0/board_id to the correct .brd > files in the Mellanox firmware download. > A web page with the table that does the mapping is in the works. -- MST From mst at mellanox.co.il Mon Sep 12 10:44:22 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 12 Sep 2005 20:44:22 +0300 Subject: [openib-general] RFC: ib_set_comp_handler In-Reply-To: <004001c5b7be$698a7cc0$9e5aa8c0@infiniconsys.com> References: <20050912170257.GO845@mellanox.co.il> <004001c5b7be$698a7cc0$9e5aa8c0@infiniconsys.com> Message-ID: <20050912174422.GB3184@mellanox.co.il> Quoting r. Fab Tillier : > Doesn't the verbs layer provide synchronization between CQ callbacks and CQ > destruction? Why not just destroy the CQs and avoid having to change the > handler? I can't destroy the cq without destroying the qps. And I don't want to destroy the qp which has events outstanding, to avoid conditional code in the event handler. -- MST From ardavis at ichips.intel.com Mon Sep 12 10:46:14 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Mon, 12 Sep 2005 10:46:14 -0700 Subject: [openib-general] Re: dapl_ep_connect problems In-Reply-To: <43246E64.9010309@cs.rutgers.edu> References: <43246E64.9010309@cs.rutgers.edu> Message-ID: <4325BEE6.7080109@ichips.intel.com> Aniruddha Bohra wrote: > Hello, > I tried connecting 2 endpoints using the new uDAPL library. > The connection fails with an invalid route option.
> Attached is the log with DAT_DBG_TYPE=0xffff and DAPL_DBG_TYPE=0xffff. > I traced the call to ib_at_route_by_ip(), > It seems like all the arguments (dst_ip, src_ip, r_qual..) are 0x00. > > I have also attached my dat.conf, lsmod, /etc/ibhosts, and /etc/hosts. > > Could you please direct me where to look? Looks like your configuration is ok. The debug message prints the dest GID on return which is a little misleading since the route information is not updated until the async callback fires. I will change this to print the src and dst ip addresses. What address are you using for dap_ep_connect? The src_ip address will be set to 10.10.10.12 which was retrieved from the hca_open and the dst_ip address is taken from your (struct sockaddr_in *) remote_ia_address provided with your dat_ep_connect call. Did you ifconfig the IPoIB devices on both sides? Can you ping using the IPoIB addresses? Thanks, -arlin > > Thanks > Aniruddha > >dapl_ep_connect (0x8057218, {4294967280.4294967295.4294967295.4294967295}, 4A275800, 0, -1, (nil), 0, 0) > connect: r_SID 1244092416, pdata (nil), plen 0 > connect: at_route ret=-1,Invalid argument req_id 0 GID 0000000000000000 0000000000000000 > ib_at_route_by_ip Invalid argument > destroy_cm_id: conn 0x80832c8 id 134755424 >dapl_ep_connect () returns 0x50000 > > From bohra at cs.rutgers.edu Mon Sep 12 10:50:49 2005 From: bohra at cs.rutgers.edu (Aniruddha Bohra) Date: Mon, 12 Sep 2005 13:50:49 -0400 Subject: [openib-general] Re: dapl_ep_connect problems In-Reply-To: <4325BEE6.7080109@ichips.intel.com> References: <43246E64.9010309@cs.rutgers.edu> <4325BEE6.7080109@ichips.intel.com> Message-ID: <4325BFF9.5090501@cs.rutgers.edu> Arlin Davis wrote: > Aniruddha Bohra wrote: > >> Hello, >> I tried connecting 2 endpoints using the new uDAPL library. >> The connection fails with an invalid route option. >> Attached is the log with DAT_DBG_TYPE=0xffff and DAPL_DBG_TYPE=0xffff. 
>> I traced the call to ib_at_route_by_ip(), >> It seems like all the arguments (dst_ip, src_ip, r_qual..) are 0x00. >> >> I have also attached my dat.conf, lsmod, /etc/ibhosts, and /etc/hosts. >> >> Could you please direct me where to look? > > > Looks like your configuration is ok. > > The debug message prints the dest GID on return which is a little > misleading since the route information is not updated until the async > callback fires. I will change this to print the src and dst ip addresses. > > What address are you using for dap_ep_connect? The src_ip address will > be set to 10.10.10.12 which was retrieved from the hca_open and the > dst_ip address is taken from your (struct sockaddr_in *) > remote_ia_address provided with your dat_ep_connect call. Did you > ifconfig the IPoIB devices on both sides? Can you ping using the > IPoIB addresses? My server is a netapp filer. I attached the IP address to the interface, 10.10.10.11 but the ping fails.. Also, with the correct IP address, I can now post a connect request, but it never gets back with a response (other than timeout). Unfortunately I do not control the server, so I have no way of confirming if it is sent out. I am now trying against an identical OpenIB instance. I will let you know the result. Thanks Aniruddha From jlentini at netapp.com Mon Sep 12 10:58:52 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 12 Sep 2005 13:58:52 -0400 (EDT) Subject: [openib-general] Re: dapl_ep_connect problems In-Reply-To: <4325BCD7.8020509@cs.rutgers.edu> References: <43246E64.9010309@cs.rutgers.edu> <4325BCD7.8020509@cs.rutgers.edu> Message-ID: On Mon, 12 Sep 2005, Aniruddha Bohra wrote: > Hi > I checked the code and it seems there was indeed a problem with that. > I changed the code and now it posts the connect request and does not > get any event. The ep_status is DAT_EP_STATE_CONNECT_PENDING. 
> I am trying now with dapltest to see if it is a problem with my server > (netapp filer NFSoRDMA) or the Open IB stack. > > Thanks for your help. If you have any idea about some special > configuration required for the connection, I would really appreciate > your help. Is there any way to see the requests similar to tcpdump? I believe you've run into an ATS issue. The ATS implementations in OnTap and OpenIB were done before the ATS specification was drafted in the DAT Collaborative. The interoperability problem stems from the fact that OnTap ATS records and OpenIB ATS records use a different default value for the PKey field. When the issue was discussed in the DAT Collaborative, it was decided that the "default PKey value", 0xFF, should be the default for this field. This is what OpenIB is using. Your version of OnTap needs an update for this. The quick fix is to modify OpenIB to use a default value of 0: Index: core/at.c =================================================================== --- core/at.c (revision 3375) +++ core/at.c (working copy) @@ -95,6 +95,9 @@ static void build_ats_req(struct ib_sa_s { struct ib_sa_ats_rec *ats; + /* FIXME Filer interop change */ + pkey = 0; + memset(rec, 0, sizeof *rec); rec->id = IB_ATS_SERVICE_ID; The long term solution will be to give you an updated version of OnTap that uses the new default value. Let's work directly with one another on that since it is not an OpenIB issue. From itamar at mellanox.co.il Mon Sep 12 11:01:33 2005 From: itamar at mellanox.co.il (Itamar Rabenstein) Date: Mon, 12 Sep 2005 21:01:33 +0300 Subject: [openib-general] Re: dapl_ep_connect problems Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3B01F@mtlexch01.mtl.com> As far as I know, NetApp's filer does not have IPoIB. You must have ATS in your system for this to work. If you are using opensm, try enabling debug and check that both sides register the ATS service record ("set") and that your system gets a correct reply to the ATS "get".
Itamar -----Original Message----- From: Aniruddha Bohra [mailto:bohra at cs.rutgers.edu] Sent: Monday, September 12, 2005 8:51 PM To: Arlin Davis Cc: openib-general at openib.org Subject: [openib-general] Re: dapl_ep_connect problems Arlin Davis wrote: > Aniruddha Bohra wrote: > >> Hello, >> I tried connecting 2 endpoints using the new uDAPL library. >> The connection fails with an invalid route option. >> Attached is the log with DAT_DBG_TYPE=0xffff and DAPL_DBG_TYPE=0xffff. >> I traced the call to ib_at_route_by_ip(), >> It seems like all the arguments (dst_ip, src_ip, r_qual..) are 0x00. >> >> I have also attached my dat.conf, lsmod, /etc/ibhosts, and /etc/hosts. >> >> Could you please direct me where to look? > > > Looks like your configuration is ok. > > The debug message prints the dest GID on return which is a little > misleading since the route information is not updated until the async > callback fires. I will change this to print the src and dst ip addresses. > > What address are you using for dap_ep_connect? The src_ip address will > be set to 10.10.10.12 which was retrieved from the hca_open and the > dst_ip address is taken from your (struct sockaddr_in *) > remote_ia_address provided with your dat_ep_connect call. Did you > ifconfig the IPoIB devices on both sides? Can you ping using the > IPoIB addresses? My server is a netapp filer. I attached the IP address to the interface, 10.10.10.11 but the ping fails.. Also, with the correct IP address, I can now post a connect request, but it never gets back with a response (other than timeout). Unfortunately I do not control the server, so I have no way of confirming if it is sent out. I am now trying against an identical OpenIB instance. I will let you know the result. 
Thanks Aniruddha From halr at voltaire.com Mon Sep 12 11:03:42 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 14:03:42 -0400 Subject: [openib-general] Re: dapl_ep_connect problems In-Reply-To: References: <43246E64.9010309@cs.rutgers.edu> <4325BCD7.8020509@cs.rutgers.edu> Message-ID: <1126548221.4382.33334.camel@hal.voltaire.com> On Mon, 2005-09-12 at 13:58, James Lentini wrote: > On Mon, 12 Sep 2005, Aniruddha Bohra wrote: > > > Hi > > I checked the code and it seems there was indeed a problem with that. > > I changed the code and now it posts the connect request and does not > > get any event. The ep_status is DAT_EP_STATE_CONNECT_PENDING. > > I am trying now with dapltest to see if it is a problem with my server > > (netapp filer NFSoRDMA) or the Open IB stack. > > > > Thanks for your help. If you have any idea about some special > > configuration required for the connection, I would really appreciate > > your help. Is there any way to see the requests similar to tcpdump? > > I believe you've run into an ATS issue. The ATS implementations in > OnTap and OpenIB were done before the ATS specification was drafted in > the DAT Collaborative. > > The interoperability problem stems from the fact that OnTap ATS > records and OpenIB ATS records use a different default value for the > PKey field. When the issue was discussed in the DAT Collaborative, it > was decided that the "default PKey value", 0xFF, should be the default ^^^^ 0xFFFF > for this field. This is what OpenIB is using. Your version of OnTap > needs an update for this.
> > The quick fix is to modify OpenIB to use a default value of 0: > > > Index: core/at.c > =================================================================== > --- core/at.c (revision 3375) > +++ core/at.c (working copy) > @@ -95,6 +95,9 @@ static void build_ats_req(struct ib_sa_s > { > struct ib_sa_ats_rec *ats; > > + /* FIXME Filer interop change */ > + pkey = 0; > + > memset(rec, 0, sizeof *rec); > > rec->id = IB_ATS_SERVICE_ID; > > > The long term solution will be to give you an updated version of OnTap > that uses the new default value. Let's work directly with one > another on that since it is not an OpenIB issue. Did you try this change ? -- Hal From bohra at cs.rutgers.edu Mon Sep 12 11:11:29 2005 From: bohra at cs.rutgers.edu (Aniruddha Bohra) Date: Mon, 12 Sep 2005 14:11:29 -0400 Subject: [openib-general] Re: dapl_ep_connect problems In-Reply-To: <1126548221.4382.33334.camel@hal.voltaire.com> References: <43246E64.9010309@cs.rutgers.edu> <4325BCD7.8020509@cs.rutgers.edu> <1126548221.4382.33334.camel@hal.voltaire.com> Message-ID: <4325C4D1.5080709@cs.rutgers.edu> I just got it -- I will try it right away. Thanks Aniruddha Hal Rosenstock wrote: >On Mon, 2005-09-12 at 13:58, James Lentini wrote: > > >>On Mon, 12 Sep 2005, Aniruddha Bohra wrote: >> >> >> >>>Hi >>> I checked the code and it seems there was indeed a problem with that. >>>I changed the code and now it posts the connect request and does not >>>get any event. The ep_status is DAT_EP_STATE_CONNECT_PENDING. >>>I am trying now with dapltest to see if it is a problem with my server >>>(netapp filer NFSoRDMA) or the Open IB stack. >>> >>>Thanks for your help. If you have any idea about some special >>>configuration required for the connection, I would really appreciate >>>your help. Is there any way to see the requests similar to tcpdump? >>> >>> >>I believe you've run into an ATS issue.
The ATS implementations in >>OnTap and OpenIB were done before the ATS specification was drafted in >>the DAT Collaborative. >> >>The interoperability problem stems from the fact that OnTap ATS >>records and OpenIB ATS records use a different default value for the >>PKey field. When the issue was discussed in the DAT Collaborative, it >>was decided that the "default PKey value", 0xFF, should be the default >> >> > ^^^^ > 0xFFFF > > > >>for this field. This is what OpenIB is using. Your version of OnTap >>needs an update for this. >> >>The quick fix is to modify OpenIB to use a default value of 0: >> >> >>Index: core/at.c >>=================================================================== >>--- core/at.c (revision 3375) >>+++ core/at.c (working copy) >>@@ -95,6 +95,9 @@ static void build_ats_req(struct ib_sa_s >> { >> struct ib_sa_ats_rec *ats; >> >>+ /* FIXME Filer interop change */ >>+ pkey = 0; >>+ >> memset(rec, 0, sizeof *rec); >> >> rec->id = IB_ATS_SERVICE_ID; >> >> >>The long term solution will be to give you an updated version of OnTap >>that uses the new default value. Let's work directly with one >>another on that since it is not an OpenIB issue. >> >> > >Did you try this change ? > >-- Hal > > > From jlentini at netapp.com Mon Sep 12 11:15:43 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 12 Sep 2005 14:15:43 -0400 (EDT) Subject: [openib-general] Re: dapl_ep_connect problems In-Reply-To: <1126548221.4382.33334.camel@hal.voltaire.com> References: <43246E64.9010309@cs.rutgers.edu> <4325BCD7.8020509@cs.rutgers.edu> <1126548221.4382.33334.camel@hal.voltaire.com> Message-ID: On Mon, 12 Sep 2005, Hal Rosenstock wrote: > On Mon, 2005-09-12 at 13:58, James Lentini wrote: > > On Mon, 12 Sep 2005, Aniruddha Bohra wrote: > > > > > Hi > > > I checked the code and it seems there was indeed a problem with that. > > > I changed the code and now it posts the connect request and does not > > > get any event. The ep_status is DAT_EP_STATE_CONNECT_PENDING. 
> > > I am trying now with dapltest to see if it is a problem with my server > > > (netapp filer NFSoRDMA) or the Open IB stack. > > > > > > Thanks for your help. If you have any idea about some special > > > configuration required for the connection, I would really appreciate > > > your help. Is there any way to see the requests similar to tcpdump? > > > > I believe you've run into an ATS issue. The ATS implementations in > > OnTap and OpenIB were done before the ATS specification was drafted in > > the DAT Collaborative. > > > > The interoperability problem stems from the fact that OnTap ATS > > records and OpenIB ATS records use a different default value for the > > PKey field. When the issue was discussed in the DAT Collaborative, it > > was decided that the "default PKey value", 0xFF, should be the default > ^^^^ > 0xFFFF > > > for this field. This is what OpenIB is using. Your version of OnTap > > needs an update for this. > > > > The quick fix is to modify OpenIB to use a default value of 0: > > > > > > Index: core/at.c > > =================================================================== > > --- core/at.c (revision 3375) > > +++ core/at.c (working copy) > > @@ -95,6 +95,9 @@ static void build_ats_req(struct ib_sa_s > > { > > struct ib_sa_ats_rec *ats; > > > > + /* FIXME Filer interop change */ > > + pkey = 0; > > + > > memset(rec, 0, sizeof *rec); > > > > rec->id = IB_ATS_SERVICE_ID; > > > > > > The long term solution will be to give you an updated version of OnTap > > that uses the new default value. Let's work directly with one > > another on that since it is not an OpenIB issue. > > Did you try this change ? It works for our kernel NFS-RDMA client. Does IBAT use a different code path for creating ATS records if userspace access is enabled? 
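A toy model of SA-style record matching helps show why a differing default P_Key can make the ATS lookup come back empty. A query carries a component mask, and a stored record matches only if every field the mask selects is equal. The struct, the mask bits and the service ID below are hypothetical simplifications, not the OpenIB or OnTap code. The sketch also assumes the ATS lookup selects the P_Key field, which the effect of the pkey = 0 quick fix suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical component-mask bits for the fields modeled here. */
#define COMP_SERVICE_ID (1u << 0)
#define COMP_PKEY       (1u << 1)

/* Hypothetical, heavily simplified service record. */
struct svc_rec {
	uint64_t service_id;
	uint16_t pkey;
};

/* A record matches when every field selected by the mask is equal. */
static int rec_matches(const struct svc_rec *query, uint32_t mask,
		       const struct svc_rec *rec)
{
	if ((mask & COMP_SERVICE_ID) && query->service_id != rec->service_id)
		return 0;
	if ((mask & COMP_PKEY) && query->pkey != rec->pkey)
		return 0;
	return 1;
}

/* Return the first stored record matching the query, or NULL. */
static const struct svc_rec *lookup(const struct svc_rec *table, size_t n,
				    const struct svc_rec *query, uint32_t mask)
{
	assert(table != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (rec_matches(query, mask, &table[i]))
			return &table[i];
	return NULL;
}
```

Consider a record registered with P_Key 0 (as the quick fix implies OnTap does). A query that also selects P_Key 0xFFFF finds nothing, while a query with P_Key 0 finds the record. A query whose mask omits the P_Key bit finds the record either way.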
From iod00d at hp.com Mon Sep 12 11:20:49 2005 From: iod00d at hp.com (Grant Grundler) Date: Mon, 12 Sep 2005 11:20:49 -0700 Subject: [openib-general] Re: [PATCH] [SRP] Fix CM redirection in SRP In-Reply-To: <524q8qckl6.fsf@cisco.com> References: <528xy6i7e4.fsf@cisco.com> <524q8qckl6.fsf@cisco.com> Message-ID: <20050912182049.GB21820@esmail.cup.hp.com> On Mon, Sep 12, 2005 at 09:33:09AM -0700, Roland Dreier wrote: ... > Index: infiniband/ulp/srp/ib_srp.c > =================================================================== > --- infiniband/ulp/srp/ib_srp.c (revision 3372) > +++ infiniband/ulp/srp/ib_srp.c (working copy) > @@ -386,11 +386,48 @@ static void srp_remove_work(void *target > scsi_host_put(target->scsi_host); > } > > +static int srp_connect_target(struct srp_target_port *target) > +{ > + int ret; > + > + while (1) { > + init_completion(&target->done); > + ret = srp_send_req(target); > + if (ret) > + return ret; > + wait_for_completion(&target->done); > + > + /* > + * The CM event handling code will set status to > + * SRP_PORT_REDIRECT if we get a port redirect REJ > + * back, or SRP_DLID_REDIRECT if we get a lid/qp > + * redirect REJ back. > + */ > + switch (target->status) { > + case 0: > + return 0; > + > + case SRP_PORT_REDIRECT: > + ret = srp_lookup_path(target); > + if (ret) > + return ret; > + break; > + > + case SRP_DLID_REDIRECT: > + break; > + > + default: > + return target->status; > + } > + } > +} Roland, Nothing is returned in the SRP_DLID_REDIRECT case. I expect this will generate a compiler warning. I have no clue what it's supposed to do in that case. 
grant From iod00d at hp.com Mon Sep 12 11:23:51 2005 From: iod00d at hp.com (Grant Grundler) Date: Mon, 12 Sep 2005 11:23:51 -0700 Subject: [openib-general] Re: [PATCH] [SRP] Fix CM redirection in SRP In-Reply-To: <20050912182049.GB21820@esmail.cup.hp.com> References: <528xy6i7e4.fsf@cisco.com> <524q8qckl6.fsf@cisco.com> <20050912182049.GB21820@esmail.cup.hp.com> Message-ID: <20050912182351.GC21820@esmail.cup.hp.com> On Mon, Sep 12, 2005 at 11:20:49AM -0700, Grant Grundler wrote: > Roland, > Nothing is returned in the SRP_DLID_REDIRECT case. > I expect this will generate a compiler warning. Sorry, looking at the code again, I just realized the "while(1)" won't let it exit. grant From halr at voltaire.com Mon Sep 12 11:20:14 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 14:20:14 -0400 Subject: [openib-general] Re: dapl_ep_connect problems In-Reply-To: References: <43246E64.9010309@cs.rutgers.edu> <4325BCD7.8020509@cs.rutgers.edu> <1126548221.4382.33334.camel@hal.voltaire.com> Message-ID: <1126549009.4382.33519.camel@hal.voltaire.com> On Mon, 2005-09-12 at 14:15, James Lentini wrote: > It works for our kernel NFS-RDMA client. running OpenIB ? > Does IBAT use a different code path for creating ATS records if > userspace access is enabled? No. -- Hal From jlentini at netapp.com Mon Sep 12 11:29:23 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 12 Sep 2005 14:29:23 -0400 (EDT) Subject: [openib-general] Re: dapl_ep_connect problems In-Reply-To: <1126549009.4382.33519.camel@hal.voltaire.com> References: <43246E64.9010309@cs.rutgers.edu> <4325BCD7.8020509@cs.rutgers.edu> <1126548221.4382.33334.camel@hal.voltaire.com> <1126549009.4382.33519.camel@hal.voltaire.com> Message-ID: On Mon, 12 Sep 2005, Hal Rosenstock wrote: > On Mon, 2005-09-12 at 14:15, James Lentini wrote: > > It works for our kernel NFS-RDMA client. > > running OpenIB ? Yes. The configuration is the current OpenIB svn code (including kDAPL). 
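Grant's follow-up depends on a C detail that is easy to misread. Inside srp_connect_target(), a `break` in the SRP_PORT_REDIRECT or SRP_DLID_REDIRECT case leaves only the `switch`, not the enclosing `while (1)`. Control therefore returns to the top of the loop and the connect request is sent again. The sketch below reproduces only that control flow in plain C and is not the SRP code. The status values, the scripted `statuses[]` replies, and the -1 "out of replies" return are hypothetical stand-ins, added so the loop terminates under test.

```c
/* Hypothetical stand-ins for SRP_PORT_REDIRECT / SRP_DLID_REDIRECT. */
enum { SIM_PORT_REDIRECT = 1, SIM_DLID_REDIRECT = 2 };

/*
 * Mirrors the shape of srp_connect_target(): each iteration is one
 * connection attempt whose outcome is read from statuses[].  A `break`
 * inside the switch leaves only the switch, so both redirect cases reach
 * the end of the loop body and retry.
 */
static int sim_connect_target(const int *statuses, int n, int *attempts)
{
	*attempts = 0;
	while (1) {
		if (*attempts == n)
			return -1;	/* simulation ran out of scripted replies */
		int status = statuses[(*attempts)++];

		switch (status) {
		case 0:
			return 0;	/* connected */
		case SIM_PORT_REDIRECT:
			/* the real code calls srp_lookup_path() here */
			break;
		case SIM_DLID_REDIRECT:
			break;
		default:
			return status;	/* any other failure ends the loop */
		}
		/* reached only via `break`: loop around and resend the REQ */
	}
}
```

For example, a scripted sequence of DLID redirect, port redirect, then success returns 0 after three attempts. No explicit `return` is needed after the loop because the only ways out of `while (1)` are the `return` statements.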
From jlentini at netapp.com Mon Sep 12 11:56:42 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 12 Sep 2005 14:56:42 -0400 (EDT) Subject: [openib-general] Re: dapl_quit_util.c In-Reply-To: <1126537051.4382.31182.camel@hal.voltaire.com> References: <1126537051.4382.31182.camel@hal.voltaire.com> Message-ID: On Mon, 12 Sep 2005, Hal Rosenstock wrote: > Hi James, > > .../src/userspace/dapl/test/dapltest/test/dapl_quit_util.c has +x > permissions which should be removed. Thanks. Fixed in revision 3377. From jlentini at netapp.com Mon Sep 12 12:00:57 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 12 Sep 2005 15:00:57 -0400 (EDT) Subject: [openib-general] Re: uDAPL README comment In-Reply-To: <1126538125.4382.31372.camel@hal.voltaire.com> References: <1126538125.4382.31372.camel@hal.voltaire.com> Message-ID: On Mon, 12 Sep 2005, Hal Rosenstock wrote: > Hi again, > > /userspace/dapl/dapl/openib/README says: > > - modify doc/dat.conf to add a example openib configuration > ... > doc/dat.conf > > > The doc subdirectory and sample dat.conf appear to be missing. Fixed in revision 3378. From halr at voltaire.com Mon Sep 12 12:03:49 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 15:03:49 -0400 Subject: [openib-general] Re: [openib-commits] r3378 - in gen2/trunk/src/userspace/dapl: . doc In-Reply-To: <20050912191444.C2D7C2283D4@openib.ca.sandia.gov> References: <20050912191444.C2D7C2283D4@openib.ca.sandia.gov> Message-ID: <1126551828.4382.34111.camel@hal.voltaire.com> On Mon, 2005-09-12 at 15:14, jlentini at openib.org wrote: > Added: ... > gen2/trunk/src/userspace/dapl/doc/dapl_ibm_api_variations.txt Is this needed anymore ? From jlentini at netapp.com Mon Sep 12 12:29:23 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 12 Sep 2005 15:29:23 -0400 (EDT) Subject: [openib-general] Re: [openib-commits] r3378 - in gen2/trunk/src/userspace/dapl: . 
doc In-Reply-To: <1126551828.4382.34111.camel@hal.voltaire.com> References: <20050912191444.C2D7C2283D4@openib.ca.sandia.gov> <1126551828.4382.34111.camel@hal.voltaire.com> Message-ID: On Mon, 12 Sep 2005, Hal Rosenstock wrote: > On Mon, 2005-09-12 at 15:14, jlentini at openib.org wrote: > > Added: > ... > > gen2/trunk/src/userspace/dapl/doc/dapl_ibm_api_variations.txt > Is this needed anymore ? That document is not relevant to the implementation of DAPL on OpenIB but it is part of the common DAPL documentation. There are references to other verbs implementations in the other documents as well. I'd like to keep the documentation the same, regardless of the verbs layer. From halr at voltaire.com Mon Sep 12 12:24:58 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 15:24:58 -0400 Subject: [openib-general] Status of opensm 1.8 merge In-Reply-To: <4df28be405091209494a5a4236@mail.gmail.com> References: <4df28be405091209494a5a4236@mail.gmail.com> Message-ID: <1126553002.4382.34348.camel@hal.voltaire.com> Hi Viswa, On Mon, 2005-09-12 at 12:49, Viswanath Krishnamurthy wrote: > Can I start testing opensm 1.8 merge on gen2 ? What is the current > status ? I will be checking it in shortly. You can either wait for this or start with the osm-1.8.0-merge branch. -- Hal From halr at voltaire.com Mon Sep 12 12:28:12 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 15:28:12 -0400 Subject: [openib-general] Re: OpenSM - branch for current merging In-Reply-To: <506C3D7B14CDD411A52C00025558DED60CCF36@mtlex01.yok.mtl.com> References: <506C3D7B14CDD411A52C00025558DED60CCF36@mtlex01.yok.mtl.com> Message-ID: <1126553008.4382.34350.camel@hal.voltaire.com> Hi Yael, On Sun, 2005-09-11 at 05:02, Yael Kalka wrote: > I want to test and debug the files you are merging in the new opensm. > Can you create a branch with your merged files of the osm, so I can > use that branch both > to try and debug your problems and try and test it on some systems > here? 
I will be checking things back into the trunk shortly. > I assume there are differences between your branch and our > osm-1.8.0-merge branch. Yes. -- Hal From jlentini at netapp.com Mon Sep 12 13:14:39 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 12 Sep 2005 16:14:39 -0400 (EDT) Subject: [openib-general] [PATCH] ib_sync_cq ( was Re: RFC: ib_set_comp_handler) In-Reply-To: <20050912080755.GE19358@mellanox.co.il> References: <20050911203142.GA25325@mellanox.co.il> <20050912080755.GE19358@mellanox.co.il> Message-ID: This would be a useful feature. The purpose of this function would be more obvious if you included the new comp_handler and cq_contex in the function signature. A different name would help as well. I would suggest: void ib_modify_cq(struct ib_cq *cq, void (*event_handler)(struct ib_event *, void *), void *cq_context); On Mon, 12 Sep 2005, Michael S. Tsirkin wrote: > Roland, Sean, > With the following patch, it becomes legal for clients to modify comp_handler > or cq_context fields in ib_cq structure of an existing cq. > > To avoid races, and to make it possible for hw layer to cache > these values, I added a new API ib_sync_cq which must be called > after one of comp_handler and cq_context is changed. > > I plan to use this capability in sdp, to disable cq events while a connection > is destroyed. I expect this to be useful in other scenarious (IPoIB NAPI?). > > Comments? > > --- > > Make it possible to flush completion events for a specific cq. > > Signed-off-by: Michael S. 
Tsirkin > > Index: linux-2.6.13/drivers/infiniband/hw/mthca/mthca_cq.c > =================================================================== > --- linux-2.6.13.orig/drivers/infiniband/hw/mthca/mthca_cq.c 2005-09-11 17:52:37.000000000 +0300 > +++ linux-2.6.13/drivers/infiniband/hw/mthca/mthca_cq.c 2005-09-12 10:33:13.000000000 +0300 > @@ -789,6 +789,15 @@ err_out: > return err; > } > > +void mthca_sync_cq(struct ib_cq *ibcq) > +{ > + struct mthca_dev *dev = to_mdev(ibcq->device); > + if (dev->mthca_flags & MTHCA_FLAG_MSI_X) > + synchronize_irq(dev->eq_table.eq[MTHCA_EQ_COMP].msi_x_vector); > + else > + synchronize_irq(dev->pdev->irq); > +} > + > void mthca_free_cq(struct mthca_dev *dev, > struct mthca_cq *cq) > { > Index: linux-2.6.13/drivers/infiniband/include/rdma/ib_verbs.h > =================================================================== > --- linux-2.6.13.orig/drivers/infiniband/include/rdma/ib_verbs.h 2005-09-11 10:24:36.000000000 +0300 > +++ linux-2.6.13/drivers/infiniband/include/rdma/ib_verbs.h 2005-09-12 10:40:33.000000000 +0300 > @@ -884,6 +884,7 @@ struct ib_device { > struct ib_cq * (*create_cq)(struct ib_device *device, int cqe, > struct ib_ucontext *context, > struct ib_udata *udata); > + void (*sync_cq)(struct ib_cq *cq); > int (*destroy_cq)(struct ib_cq *cq); > int (*resize_cq)(struct ib_cq *cq, int *cqe); > int (*poll_cq)(struct ib_cq *cq, int num_entries, > @@ -1227,6 +1228,16 @@ struct ib_cq *ib_create_cq(struct ib_dev > void *cq_context, int cqe); > > /** > + * ib_sync_cq - flush CQ completion event handler. > + * This must be used after modifying comp_handler or cq_context. > + * @cq: The CQ to flush events for. > + */ > +static inline void ib_sync_cq(struct ib_cq *cq) > +{ > + return cq->device->sync_cq(cq); > +} > + > +/** > * ib_resize_cq - Modifies the capacity of the CQ. > * @cq: The CQ to resize. > * @cqe: The minimum size of the CQ. 
> Index: linux-2.6.13/drivers/infiniband/hw/mthca/mthca_provider.c > =================================================================== > --- linux-2.6.13.orig/drivers/infiniband/hw/mthca/mthca_provider.c 2005-09-11 10:24:37.000000000 +0300 > +++ linux-2.6.13/drivers/infiniband/hw/mthca/mthca_provider.c 2005-09-12 10:34:00.000000000 +0300 > @@ -1090,6 +1090,7 @@ int mthca_register_device(struct mthca_d > dev->ib_dev.destroy_qp = mthca_destroy_qp; > dev->ib_dev.create_cq = mthca_create_cq; > dev->ib_dev.destroy_cq = mthca_destroy_cq; > + dev->ib_dev.sync_cq = mthca_sync_cq; > dev->ib_dev.poll_cq = mthca_poll_cq; > dev->ib_dev.get_dma_mr = mthca_get_dma_mr; > dev->ib_dev.reg_phys_mr = mthca_reg_phys_mr; > > -- > MST > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From thomas.duffy.99 at alumni.brown.edu Mon Sep 12 13:29:51 2005 From: thomas.duffy.99 at alumni.brown.edu (Tom Duffy) Date: Mon, 12 Sep 2005 13:29:51 -0700 Subject: [openib-general] Re: sdp: kill sdp buff pool In-Reply-To: <20050911072216.GM19358@mellanox.co.il> References: <20050908114156.GI19358@mellanox.co.il> <20050908161849.GA21522@mellanox.co.il> <20050911072216.GM19358@mellanox.co.il> Message-ID: On Sep 11, 2005, at 12:22 AM, Michael S. Tsirkin wrote: > It turns out I didnt post that patch yet. > It might not apply cleanly now since I didnt yet update it after > recent changes. Do you want to see it? Doesn't really matter at this point. What was left in it that hasn't changed? 
-tduffy From mshefty at ichips.intel.com Mon Sep 12 14:09:17 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 12 Sep 2005 14:09:17 -0700 Subject: [openib-general] [PATCH] ib_sync_cq ( was Re: RFC: ib_set_comp_handler) In-Reply-To: References: <20050911203142.GA25325@mellanox.co.il> <20050912080755.GE19358@mellanox.co.il> Message-ID: <4325EE7D.2080503@ichips.intel.com> James Lentini wrote: > The purpose of this function would be more obvious if you included the > new comp_handler and cq_contex in the function signature. A different > name would help as well. > > I would suggest: > > void ib_modify_cq(struct ib_cq *cq, > void (*event_handler)(struct ib_event *, void *), > void *cq_context); I think that this makes more sense. It keeps the synchronization internal to the verbs layer, and prevents the user from overwriting the event_handler at the same time that it may be read by the hca driver. Can we rely on a write to the cq->event_handler being atomic wrt a read of the same value? - Sean From halr at voltaire.com Mon Sep 12 14:05:06 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 17:05:06 -0400 Subject: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features Message-ID: <1126559105.4382.35460.camel@hal.voltaire.com> OpenIB OpenSM now includes the OpenSM 1.8.0 functionality. Major thanks go to Yael Kalka and Eitan Zahavi of Mellanox. This is a complete merge of the osm-1.8.0-branch up through version 3368. There are 2 known caveats with this so far: 1. Some Anafa ports cannot be brought to active if not most recent firmware (5.3.3) 2. 
Solaris interoperability needs work again (I'm working on this)

New Features
- Semistatic LID assignment
  - No LID change on SM restart or node reboot
  - Critical for IPoIB to avoid communication loss
- Irresponsive port scan during light sweep
  - No response but Link state not down
- Switch ports with HCA neighbor have lower HOQLife
  - Faster drain so bad HCA not impact subnet
- Pkeys
  - Not reordered
  - Default values not set
- DDR and QDR support
- Options Cache including all non command line
  - Use -c flag to create /var/cache/osm/opensm.opts
- Kill -HUP
  - Forces a new full sweep

Bug Fixes
- Overflow on SA queries (now drops them if overflow)
- Multicast tree build took forever on large clusters
- MTU and Rate selectors ignored during MCMemberRecord Query
- Deleted multicast groups existing until deferred deletion
- Crashed on any zero Port or Node GUID
- SMInfo with a non default PKey was dropped
- DDR and QDR rates were not calculated correctly
- Fail to error Service Record delete of non-existing record
- Memory leak in SA Client code
- Multicast Join did not check for ‘JoinState != 0’
- PortInfo SA query fail if base_lid component used
- OpenSM runs out of MLIDs even though some groups were deleted
- Complib race in Passive Lock caused a deadlock (now use rwlock)
- Many more less severe bugs fixed

NOTE: The old OpenIB OpenSM is still available on the osm-pre-1.8.0 branch. -- Hal From davem at davemloft.net Mon Sep 12 14:13:51 2005 From: davem at davemloft.net (David S. Miller) Date: Mon, 12 Sep 2005 14:13:51 -0700 (PDT) Subject: [openib-general] Re: [PATCH] af_packet: Allow for > 8 byte hardware addresses. In-Reply-To: References: <1123786117.4403.5835.camel@hal.voltaire.com> <20050811.124916.77057824.davem@davemloft.net> Message-ID: <20050912.141351.50320521.davem@davemloft.net> From: ebiederm at xmission.com (Eric W.
Biederman) Date: Sat, 10 Sep 2005 11:25:27 -0600 > @@ -1315,11 +1340,16 @@ packet_setsockopt(struct socket *sock, i > case PACKET_ADD_MEMBERSHIP: > case PACKET_DROP_MEMBERSHIP: > { > - struct packet_mreq mreq; > - if (optlen + struct packet_mreq_max mreq; > + int len = optlen; > + if (len < sizeof(struct packet_mreq)) > return -EINVAL; > - if (copy_from_user(&mreq,optval,sizeof(mreq))) > + if (len > sizeof(mreq)) > + len = sizeof(mreq); > + if (copy_from_user(&mreq,optval,len)) > return -EFAULT; I would suggest memset()'ing out any packet_mreq_max structure, before copying a smaller amount of data into it, just to be safe. Please check this out in all such possible uses in the patch. Thanks. From rolandd at cisco.com Mon Sep 12 14:15:07 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 12 Sep 2005 14:15:07 -0700 Subject: [openib-general] [PATCH] ib_sync_cq ( was Re: RFC: ib_set_comp_handler) In-Reply-To: <4325EE7D.2080503@ichips.intel.com> (Sean Hefty's message of "Mon, 12 Sep 2005 14:09:17 -0700") References: <20050911203142.GA25325@mellanox.co.il> <20050912080755.GE19358@mellanox.co.il> <4325EE7D.2080503@ichips.intel.com> Message-ID: <523boaasys.fsf@cisco.com> James> The purpose of this function would be more obvious if you James> included the new comp_handler and cq_contex in the function James> signature. A different name would help as well. I would James> suggest: void ib_modify_cq(struct ib_cq *cq, void James> (*event_handler)(struct ib_event *, void *), void James> *cq_context); Sean> I think that this makes more sense. It keeps the Sean> synchronization internal to the verbs layer, and prevents Sean> the user from overwriting the event_handler at the same time Sean> that it may be read by the hca driver. Can we rely on a Sean> write to the cq->event_handler being atomic wrt a read of Sean> the same value? I'm not sure that writes to function pointers are always atomic. For example the ppc64 ABI does some crazy stuff with function descriptors. 
On the other hand I'm not sure I like wrapping things up in an ib_modify_cq() function. Calling ib_modify_cq() from a completion event handler will deadlock, and if we allow clearing the CQ event handler, it seems legitimate thing to do so from a CQ event handler. In any case I'm not sure I buy the motivation behind adding this function. It saves a conditional branch in SDP's completion handler, but on the other hand, that branch is going to be a test of a value in a cache line that already gets used, and it's almost always going to be predicted correctly. So I'm not sure the performance gain even exists. - R. From jlentini at netapp.com Mon Sep 12 14:15:46 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 12 Sep 2005 17:15:46 -0400 (EDT) Subject: [openib-general] [PATCH] ib_sync_cq ( was Re: RFC: ib_set_comp_handler) In-Reply-To: <4325EE7D.2080503@ichips.intel.com> References: <20050911203142.GA25325@mellanox.co.il> <20050912080755.GE19358@mellanox.co.il> <4325EE7D.2080503@ichips.intel.com> Message-ID: On Mon, 12 Sep 2005, Sean Hefty wrote: > James Lentini wrote: > > The purpose of this function would be more obvious if you included the new > > comp_handler and cq_contex in the function signature. A different name would > > help as well. > > I would suggest: > > > > void ib_modify_cq(struct ib_cq *cq, void (*event_handler)(struct > > ib_event *, void *), > > void *cq_context); > > I think that this makes more sense. It keeps the synchronization > internal to the verbs layer, and prevents the user from overwriting > the event_handler at the same time that it may be read by the hca > driver. Can we rely on a write to the cq->event_handler being > atomic wrt a read of the same value? Along those same lines, we should also ensure that when both the event_handler and cq_context are changed at the same time, the update is atomic. 
james From halr at voltaire.com Mon Sep 12 14:17:08 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 17:17:08 -0400 Subject: [openib-general] udapl copyrights and OpenIB Message-ID: <1126559793.4382.35592.camel@hal.voltaire.com> Hi Arlin & James, The udapl code appears to have the original license on it. Should it have the OpenIB copyright ? -- Hal From kingman at storagegear.com Mon Sep 12 14:25:41 2005 From: kingman at storagegear.com (John Kingman) Date: Mon, 12 Sep 2005 16:25:41 -0500 (CDT) Subject: [openib-general] Re: [PATCH] [SRP] Fix CM redirection in SRP In-Reply-To: <524q8qckl6.fsf@cisco.com> References: <528xy6i7e4.fsf@cisco.com> <524q8qckl6.fsf@cisco.com> Message-ID: On Mon, 12 Sep 2005, Roland Dreier wrote: >I rewrote your patch to fix a few things: > > - I believe we always need to copy the redirect GID, even if the > redirect LID is non-zero; otherwise we will end up putting the GID > of the original port in our CM REQ. From my reading of the > description of ClassPortInfo in the IB spec, I think the redirect > GID must always be valid if redirection is being done. OK. > - I moved the connect handling into a new function > srp_connect_target(), so that both the initial connect and future > reconnects get handling for port and DLID redirection. OK. > - I got rid of the change for CM REJ reason 25 -- that reject code > does not carry a ClassPortInfo in the ARI, but rather just the > GID at offset 0. OK. >By the way, it's not that critical for SRP, since it's not in the >upstream kernel yet, but I don't think we should use obfuscated email >addresses in Signed-off-by lines: > > > Signed-off-by: John Kingman storagegear.com> > >Just put the '@' in there -- I think there are much bigger spam >sources to worry about. OK. >Anyway, here's the updated patch. Does this look OK to you? I believe that there are some errors that have already been noted. Will you be sending another patch? 
John From rolandd at cisco.com Mon Sep 12 14:28:14 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 12 Sep 2005 14:28:14 -0700 Subject: [openib-general] Re: [PATCH] [SRP] Fix CM redirection in SRP In-Reply-To: (John Kingman's message of "Mon, 12 Sep 2005 16:25:41 -0500 (CDT)") References: <528xy6i7e4.fsf@cisco.com> <524q8qckl6.fsf@cisco.com> Message-ID: <52u0gq9dsh.fsf@cisco.com> John> I believe that there are some errors that have already been John> noted. Will you be sending another patch? I didn't see them -- as far as I know, the patch is OK. Can you resend? - R.
From roel at yottayotta.com Mon Sep 12 15:09:53 2005 From: roel at yottayotta.com (Roel van der Goot) Date: Mon, 12 Sep 2005 16:09:53 -0600 (MDT) Subject: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features In-Reply-To: <1126559105.4382.35460.camel@hal.voltaire.com> References: <1126559105.4382.35460.camel@hal.voltaire.com> Message-ID: Hi Hal, Hal Rosenstock wrote: > There are 2 known caveats with this so far: > 1. Some Anafa ports cannot be brought to active if not most recent > firmware (5.3.3) Are you sure the firmware release is 5.3.3, because on the Mellanox web site we can only find 5.3.0? > 2. Solaris interoperability needs work again (I'm working on this) Cheers :-), Roel. From ebiederm at xmission.com Mon Sep 12 15:13:23 2005 From: ebiederm at xmission.com (Eric W. Biederman) Date: Mon, 12 Sep 2005 16:13:23 -0600 Subject: [openib-general] Re: [PATCH] af_packet: Allow for > 8 byte hardware addresses. In-Reply-To: <20050912.141351.50320521.davem@davemloft.net> (David S. Miller's message of "Mon, 12 Sep 2005 14:13:51 -0700 (PDT)") References: <1123786117.4403.5835.camel@hal.voltaire.com> <20050811.124916.77057824.davem@davemloft.net> <20050912.141351.50320521.davem@davemloft.net> Message-ID: "David S. Miller" writes: > From: ebiederm at xmission.com (Eric W.
Biederman) > Date: Sat, 10 Sep 2005 11:25:27 -0600 > >> @@ -1315,11 +1340,16 @@ packet_setsockopt(struct socket *sock, i >> case PACKET_ADD_MEMBERSHIP: >> case PACKET_DROP_MEMBERSHIP: >> { >> - struct packet_mreq mreq; >> - if (optlen> + struct packet_mreq_max mreq; >> + int len = optlen; >> + if (len < sizeof(struct packet_mreq)) >> return -EINVAL; >> - if (copy_from_user(&mreq,optval,sizeof(mreq))) >> + if (len > sizeof(mreq)) >> + len = sizeof(mreq); >> + if (copy_from_user(&mreq,optval,len)) >> return -EFAULT; > > I would suggest memset()'ing out any packet_mreq_max structure, > before copying a smaller amount of data into it, just to be > safe. Please check this out in all such possible uses in > the patch. > > Thanks. Ok. For that specific case you have quoted the only instance. In a practical sense it doesn't matter because halen determines how many of the bytes we actually look at. But if something is buggy I can see the memset causing the bug to act in a more deterministic fashion. Updated patch will follow in a bit. Eric From halr at voltaire.com Mon Sep 12 15:38:59 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 18:38:59 -0400 Subject: [openib-general] RE: osmtest/OpenSM: ServiceGID and busy status In-Reply-To: <20050912155358.B6F6A253369@mailgw.voltaire.com> References: <1126531043.4382.30246.camel@hal.voltaire.com> <20050912155358.B6F6A253369@mailgw.voltaire.com> Message-ID: <1126564181.4382.36164.camel@hal.voltaire.com> On Mon, 2005-09-12 at 11:53, Eitan Zahavi wrote: > Hal Rosenstock wrote: ... > > C15-0.1.13: SA shall reject as invalid any attempt to create, modify, or > > delete a ServiceRecord in which the ServiceP_Key is not present in the > > P_Key Tables of both the port identified by the ServiceGID and the port > > from which the request came. > > > Thanks for finding/pointing it. I am not sure this is covered in the > implementation as of today. As it would cause the osmtest requests to be dropped. 
The tests should be fixed to do the right thing although there can be negative tests as well. > Let us add it to the todo list. Done. -- Hal From davem at davemloft.net Mon Sep 12 15:45:27 2005 From: davem at davemloft.net (David S. Miller) Date: Mon, 12 Sep 2005 15:45:27 -0700 (PDT) Subject: [openib-general] Re: [PATCH] af_packet: Allow for > 8 byte hardware addresses. In-Reply-To: References: <20050912.141351.50320521.davem@davemloft.net> Message-ID: <20050912.154527.48978091.davem@davemloft.net> From: ebiederm at xmission.com (Eric W. Biederman) Date: Mon, 12 Sep 2005 16:13:23 -0600 > Updated patch will follow in a bit. Thanks for following up on this Eric. From xma at us.ibm.com Mon Sep 12 15:47:18 2005 From: xma at us.ibm.com (Shirley Ma) Date: Mon, 12 Sep 2005 15:47:18 -0700 Subject: [openib-general] build userspace Message-ID: Anything changed to build the userspace? I need to manually export LD_LIBRARY_PATH=/usr/local/lib (svn 3380) to build libibcm. which didn't need before. It used to check /usr/local/lib by default. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From rolandd at cisco.com Mon Sep 12 15:50:21 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 12 Sep 2005 15:50:21 -0700 Subject: [openib-general] build userspace In-Reply-To: (Shirley Ma's message of "Mon, 12 Sep 2005 15:47:18 -0700") References: Message-ID: <52hdcpaok2.fsf@cisco.com> Shirley> Anything changed to build the userspace? I need to Shirley> manually export LD_LIBRARY_PATH=/usr/local/lib (svn 3380) Shirley> to build libibcm. which didn't need before. It used to Shirley> check /usr/local/lib by default. I just fixed some bugs in the libibcm Makefile.am that could have made it seem the LD_LIBRARY_PATH wasn't needed. - R.
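David Miller's suggestion in the af_packet thread is to zero the larger `packet_mreq_max` buffer before copying a possibly shorter option into it. That pattern can be sketched in user space as follows. `memcpy()` stands in for the kernel's `copy_from_user()`, and the -1 return stands in for -EINVAL. The two struct layouts are assumptions: the first mirrors `struct packet_mreq` (8-byte address), and the second is a hypothetical larger variant with a 32-byte address. Neither is copied from Eric's patch.

```c
#include <stddef.h>
#include <string.h>

/* Assumed layouts: the first mirrors struct packet_mreq; the second is a
 * hypothetical larger variant with room for a 32-byte hardware address. */
struct mreq_small {
	int ifindex;
	unsigned short type;
	unsigned short alen;
	unsigned char address[8];
};

struct mreq_large {
	int ifindex;
	unsigned short type;
	unsigned short alen;
	unsigned char address[32];
};

/*
 * Copy a caller-supplied option of optlen bytes into a fixed-size buffer.
 * Reject anything shorter than the old struct, clamp anything longer, and
 * zero the whole destination first, so bytes the caller did not supply
 * are zeros rather than leftover stack contents.
 */
static int copy_mreq(struct mreq_large *dst, const void *optval, size_t optlen)
{
	size_t len = optlen;

	if (len < sizeof(struct mreq_small))
		return -1;			/* -EINVAL in the kernel */
	if (len > sizeof(*dst))
		len = sizeof(*dst);
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, optval, len);		/* copy_from_user() in the kernel */
	return 0;
}
```

A caller that passes only the short form gets a destination whose `address[8]` through `address[31]` are guaranteed to be zero. Code that mistakenly reads past `alen` therefore behaves deterministically, which matches the benefit Eric describes in his reply.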
From halr at voltaire.com Mon Sep 12 15:48:48 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 18:48:48 -0400 Subject: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features In-Reply-To: References: <1126559105.4382.35460.camel@hal.voltaire.com> Message-ID: <1126564330.4382.36185.camel@hal.voltaire.com> On Mon, 2005-09-12 at 18:09, Roel van der Goot wrote: > Are you sure the firmware release is 5.3.3, because on the > Mellanox web site we can only find 5.3.0? You're right. It should have said 5.3.0. -- Hal From halr at voltaire.com Mon Sep 12 15:57:04 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 12 Sep 2005 18:57:04 -0400 Subject: [openib-general] RE: opensm and signals In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30691CC@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30691CC@mtlexch01.mtl.com> Message-ID: <1126564672.4382.36243.camel@hal.voltaire.com> On Mon, 2005-09-12 at 04:00, Eitan Zahavi wrote: > > Hi, Hal, Eitan! > > Whats the reason opensm needs to catch and try to handle signals > such as SIGINT? > [EZ] The reason was way back some drivers had resource tracking > problems. So if OpenSM left without cleaning up all used resources > (like MAD buffers and UD-AVs) > > the driver oops'ed. > > It seems that we can let the default handler simply kill the > application. > > If this is required for some vendor layer, shouldnt the signal > > handling be part of that vendor layer? > I will be glad to remove that code... Do we still need to any supported vendor layer ? 
-- Hal From rpandit at silverstorm.com Mon Sep 12 16:04:33 2005 From: rpandit at silverstorm.com (Pandit, Ranjit) Date: Mon, 12 Sep 2005 19:04:33 -0400 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB Message-ID: <5D78D28F88822E4D8702BB9EEF1A4367AE3CF3@mercury.infiniconsys.com> Hello, Following up on the commitment made in the last OpenIB conference, I'm pleased to announce that SilverStorm would like to contribute RDS to OpenIB and take on the role of maintainer for this component. The current reference implementation of RDS is on SilverStorm's Access Layer. Our intention is to post the reference implementation ASAP and begin porting to OpenIB. Please let me know the best way to get started. Thanks, Ranjit Pandit SilverStorm Technologies From rolandd at cisco.com Mon Sep 12 16:14:29 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 12 Sep 2005 16:14:29 -0700 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB In-Reply-To: <5D78D28F88822E4D8702BB9EEF1A4367AE3CF3@mercury.infiniconsys.com> (Ranjit Pandit's message of "Mon, 12 Sep 2005 19:04:33 -0400") References: <5D78D28F88822E4D8702BB9EEF1A4367AE3CF3@mercury.infiniconsys.com> Message-ID: <52d5ndanfu.fsf@cisco.com> Ranjit> The current reference implementation of RDS is on Ranjit> SilverStorm's Access Layer. Our intention is to post the Ranjit> reference implementation ASAP and begin porting to OpenIB. Ranjit> Please let me know the best way to get started. The plan above sounds reasonable to me. - R.
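The ib_sync_cq / ib_modify_cq thread earlier in this digest raises two related questions. The first is whether a store to the handler pointer is atomic with respect to a concurrent read. The second is how to change the handler and cq_context together so that a reader never sees a mismatched pair. One common pattern is to put both fields in one structure and publish that structure through a single atomic pointer. The sketch below shows this pattern in user-space C11. Everything in it (`struct cq_callback`, `struct fake_cq`, and the function names) is hypothetical and is not the ib_verbs API. The sketch deliberately omits the hard part that the patch handles with synchronize_irq(): knowing when no dispatcher can still be using the old pair. For that reason it never frees a pair. Because the setter never waits, it also avoids the deadlock Roland describes for a blocking call made from inside a handler, at the cost of that missing lifetime guarantee.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical pairing of a completion handler with its context. */
struct cq_callback {
	void (*handler)(void *cq, void *ctx);
	void *ctx;
};

/* Hypothetical stand-in for a CQ: only the published callback pair. */
struct fake_cq {
	_Atomic(const struct cq_callback *) cb;
};

/* Writer: swap handler and context together with a single release store.
 * NULL disables dispatch.  The old pair must stay valid until no reader
 * can still hold it; this sketch does not track that. */
static void fake_cq_set_callback(struct fake_cq *cq,
				 const struct cq_callback *cb)
{
	atomic_store_explicit(&cq->cb, cb, memory_order_release);
}

/* Reader (the completion-event path): load the pair once and use only
 * that snapshot, so handler and ctx always come from the same store. */
static void fake_cq_dispatch(struct fake_cq *cq)
{
	const struct cq_callback *cb =
		atomic_load_explicit(&cq->cb, memory_order_acquire);

	if (cb != NULL && cb->handler != NULL)
		cb->handler(cq, cb->ctx);
}

/* Example handler: counts events in the int its context points to. */
static void count_events(void *cq, void *ctx)
{
	(void)cq;
	++*(int *)ctx;
}
```

After `fake_cq_set_callback()` switches from one pair to another, every later dispatch uses the new handler with the new context. Neither field can change independently of the other.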
From kingman at storagegear.com Mon Sep 12 16:30:42 2005 From: kingman at storagegear.com (John Kingman) Date: Mon, 12 Sep 2005 18:30:42 -0500 (CDT) Subject: [openib-general] Re: [PATCH] [SRP] Fix CM redirection in SRP In-Reply-To: <52u0gq9dsh.fsf@cisco.com> References: <528xy6i7e4.fsf@cisco.com> <524q8qckl6.fsf@cisco.com> <52u0gq9dsh.fsf@cisco.com> Message-ID: On Mon, 12 Sep 2005, Roland Dreier wrote: > John> I believe that there are some errors that have already been > John> noted. Will you be sending another patch? > >I didn't see them -- as far as I know, the patch is OK. Can you resend? Sorry, I misread Grant's posts. The patch looks good and works for me! John From rolandd at cisco.com Mon Sep 12 17:30:10 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 12 Sep 2005 17:30:10 -0700 Subject: strange mem-free bug (was: [openib-general] completion Q overflow error/panic) In-Reply-To: <4df28be4050909152118f3e947@mail.gmail.com> (Viswanath Krishnamurthy's message of "Fri, 9 Sep 2005 15:21:44 -0700") References: <4df28be4050909152118f3e947@mail.gmail.com> Message-ID: <52u0gp95d9.fsf@cisco.com> While looking at Viswa's example, I've found what seems to be a problem using lots of QPs on mem-free HCAs. This could easily be an mthca driver bug, but I'd appreciate it if Mellanox would take a look and help track down the issue. I looked at the mthca code and don't see anything wrong, so either narrowing down the software bug or telling me it's actually a FW/HW bug would be great. I'm attaching a fairly simple program that shows the problem on my systems. It just creates a bunch of QPs and has one side send one message from each QP. The other side waits for receives and sends a reply back for every receive it gets. When all the replies are received, it loops around and does it again. To build the example, just do: gcc -o rc-test rc-test.c -libverbs To run, do rc-test on one system, and rc-test on the other. 
In fact, I can reproduce the problem even on a single system just with rc-test & rc-test localhost On a system with a PCI-X HCA, this works perfectly. However, on a system with Arbel HCAs (with mem-free FW 5.1.0), I get the following output (going on forever): local address: LID 0x0008 remote address: LID 0x0007 After 1.000066 sec, 104/4000 comps After 2.000276 sec, 104/4000 comps After 3.000295 sec, 104/4000 comps After 4.000332 sec, 104/4000 comps After 5.000375 sec, 104/4000 comps which shows that only 104 out of the 4000 send/receive pairs ever complete. On the other side I see the same number of completions. It seems the HCA loses a bunch of doorbells, although IPoIB traffic running in the background continues fine. Viswa seems to have seen the same problem with Sinai & FW 1.0.1. Let me know if you need more info. Thanks, Roland -------------- next part -------------- A non-text attachment was scrubbed... Name: rc-test.c Type: text/x-csrc Size: 16011 bytes Desc: not available URL: From rajib.majumder at csfb.com Mon Sep 12 20:26:54 2005 From: rajib.majumder at csfb.com (Majumder, Rajib) Date: Tue, 13 Sep 2005 11:26:54 +0800 Subject: [openib-general] SDP over local communication Message-ID: hi, I am wondering if SDP has support for local communication, i.e, when both the data source and the sink run on the same physical hardware and uses SOCK_STREAM. I have another application that chooses between AF_UNIX and AF_INET, depending on whether the peer is on the same host or remote host. Will SDP work in this case? any opinion is appreciated. thanks. rajib From mst at mellanox.co.il Mon Sep 12 22:07:07 2005 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 13 Sep 2005 08:07:07 +0300 Subject: [openib-general] Re: SDP over local communication In-Reply-To: References: Message-ID: <20050913050706.GB12526@mellanox.co.il> Quoting Majumder, Rajib : > Subject: SDP over local communication > > hi, > > I am wondering if SDP has support for local communication, i.e, when both the > data source and the sink run on the same physical hardware and uses > SOCK_STREAM. > > I have another application that chooses between AF_UNIX and AF_INET, depending > on whether the peer is on the same host or remote host. Will SDP work in this > case? > > any opinion is appreciated. > > thanks. > > rajib Yes, SDP implements a special test for loopback address. However, you currently still need opensm running on the subnet, since path record queries are sent to opensm, anyway. -- MST From mst at mellanox.co.il Mon Sep 12 22:11:54 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 13 Sep 2005 08:11:54 +0300 Subject: [openib-general] Re: [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB In-Reply-To: <5D78D28F88822E4D8702BB9EEF1A4367AE3CF3@mercury.infiniconsys.com> References: <5D78D28F88822E4D8702BB9EEF1A4367AE3CF3@mercury.infiniconsys.com> Message-ID: <20050913051154.GC12526@mellanox.co.il> Quoting Pandit, Ranjit : > Our intention is to post the reference implementation ASAP and begin porting > to OpenIB. > > Please let me know the best way to get started. Sounds good, thanks! I'd suggest you drop it somewhere under https://openib.org/svn/trunk/contrib When the port is complete, you'll be able to easily move to gen2/src/ -- MST From yael at mellanox.co.il Mon Sep 12 22:20:44 2005 From: yael at mellanox.co.il (Yael Kalka) Date: Tue, 13 Sep 2005 08:20:44 +0300 Subject: [openib-general] RE: osmtest osmt_multicast.c physible Message-ID: <506C3D7B14CDD411A52C00025558DED60CCF40@mtlex01.yok.mtl.com> Liran, As owner of the osmtest - please answer the below. 
Yael -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Monday, September 12, 2005 3:36 PM To: Yael Kalka Cc: openib-general at openib.org Subject: osmtest osmt_multicast.c physible Hi Yael, What is meant by physible in the below ? osmtest/osmt_multicast.c: "Fifth exact MTU & RATE physible, Sixth exact RATE physible\n\t\t" osmtest/osmt_multicast.c: "Seventh exact MTU physible (o15.0.1.4)...\n" osmtest/osmt_multicast.c: /* Using Exact physible MTU & RATE */ osmtest/osmt_multicast.c: /* Using Exact physible RATE */ osmtest/osmt_multicast.c: /* Using Exact physible MTU */ Also, o15.0.1.4 is obsolete at 1.2 and is replaced by o15-0.2.2.. Thanks. -- Hal From eitan at openib.org Mon Sep 12 22:59:00 2005 From: eitan at openib.org (Eitan Zahavi) Date: Tue, 13 Sep 2005 08:59:00 +0300 Subject: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features In-Reply-To: <1126559105.4382.35460.camel@hal.voltaire.com> References: <1126559105.4382.35460.camel@hal.voltaire.com> Message-ID: <20050913061656.BD3F622834D@openib.ca.sandia.gov> Great news! We will be sending patches for the main trunk from now on. Eitan Hal Rosenstock wrote: > OpenIB OpenSM now includes the OpenSM 1.8.0 functionality. > > Major thanks go to Yael Kalka and Eitan Zahavi of Mellanox. > > This is a complete merge of the osm-1.8.0-branch up through version > 3368. > > There are 2 known caveats with this so far: > 1. Some Anafa ports cannot be brought to active if not most recent > firmware (5.3.3) > 2.
Solaris interoperability needs work again (I'm working on this) > > New Features > > Semistatic LID assignment > No LID change on SM restart or node reboot > Critical for IPoIB to avoid communication loss > Irresponsive port scan during light sweep > No response but Link state not down > Switch ports with HCA neighbor have lower HOQLife > Faster drain so bad HCA not impact subnet > Pkeys > Not reordered > Default values not set > DDR and QDR support > Options Cache > including all non command line > Use –c flag to create /var/cache/osm/opensm.opts > Kill –HUP > Forces a new full sweep > > Bug Fixes > > Overflow on SA queries (now drops them if overflow) > Multicast tree build took forever on large clusters > MTU and Rate selectors ignored during MCMemberRecord Query > Deleted multicast groups existing until deferred deletion > Crashed on any zero Port or Node GUID > SMInfo with a non default PKey was dropped > DDR and QDR rates were not calculated correctly > Fail to error Service Record delete of non-existing record > Memory leak in SA Client code > Multicast Join did not check for ‘JoinState != 0’ > PortInfo SA query fail if base_lid component used > OpenSM runs out of MLIDs even though some groups were deleted > Complib race in Passive Lock caused a deadlock (now use rwlock) > Many more less severe bugs fixed > > NOTE: The old OpenIB OpenSM is still available on the osm-pre-1.8.0 > branch. 
> > -- Hal From eitan at openib.org Mon Sep 12 23:06:44 2005 From: eitan at openib.org (Eitan Zahavi) Date: Tue, 13 Sep 2005 09:06:44 +0300 Subject: [openib-general] Re: opensm and signals In-Reply-To: <1126564672.4382.36243.camel@hal.voltaire.com> References: <1126564672.4382.36243.camel@hal.voltaire.com> Message-ID: <20050913062440.F0AB422834D@openib.ca.sandia.gov> Hal Rosenstock wrote: > On Mon, 2005-09-12 at 04:00, Eitan Zahavi wrote: > >>>Hi, Hal, Eitan! >>>Whats the reason opensm needs to catch and try to handle signals >> >>such as SIGINT? > > >>[EZ] The reason was way back some drivers had resource tracking >>problems. So if OpenSM left without cleaning up all used resources >>(like MAD buffers and UD-AVs) >> >>the driver oops'ed. >> >>>It seems that we can let the default handler simply kill the >> >>application. >> >>>If this is required for some vendor layer, shouldnt the signal >>>handling be part of that vendor layer? >> >>I will be glad to remove that code... > > > Do we still need to any supported vendor layer ? I do not know. I guess that once we turn off the signal handlers we will find out. Anyway the races on the vendor exit will not be solved by that as osmtest or opensm -o still needs to exit somehow. EZ > > -- Hal > From mst at mellanox.co.il Tue Sep 13 00:06:13 2005 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 13 Sep 2005 10:06:13 +0300 Subject: strange mem-free bug (was: [openib-general] completion Q overflow error/panic) In-Reply-To: <52u0gp95d9.fsf@cisco.com> References: <52u0gp95d9.fsf@cisco.com> Message-ID: <20050913070613.GQ845@mellanox.co.il> Quoting Roland Dreier : > Subject: strange mem-free bug (was: [openib-general] completion Q overflow error/panic) > > While looking at Viswa's example, I've found what seems to be a > problem using lots of QPs on mem-free HCAs. Thanks, Roland, I'll look into this. -- MST From liran at mellanox.co.il Tue Sep 13 00:30:22 2005 From: liran at mellanox.co.il (Liran Sorani) Date: Tue, 13 Sep 2005 10:30:22 +0300 Subject: [openib-general] RE: osmtest osmt_multicast.c physible Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA372@mtlexch01.mtl.com> Hi , The o15-0.2.2 is dealing with MC group creation its required components . The section you've pointed at the osmtest does MC creation by variant MTU & RATE values. A physible MTU / RATE means valid subnet MTU/RATE values for the port Osmtest running on. These values are responsed by OpenSM as the correct values this port can use. -----Original Message----- From: Yael Kalka Sent: Tuesday, September 13, 2005 8:21 AM To: 'Hal Rosenstock'; Yael Kalka; Liran Sorani Cc: openib-general at openib.org Subject: RE: osmtest osmt_multicast.c physible Liran, As owner of the osmtest - please answer the below. Yael -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Monday, September 12, 2005 3:36 PM To: Yael Kalka Cc: openib-general at openib.org Subject: osmtest osmt_multicast.c physible Hi Yael, What is meant by physible in the below ? 
osmtest/osmt_multicast.c: "Fifth exact MTU & RATE physible, Sixth exact RATE physible\n\t\t" osmtest/osmt_multicast.c: "Seventh exact MTU physible (o15.0.1.4)...\n" osmtest/osmt_multicast.c: /* Using Exact physible MTU & RATE */ osmtest/osmt_multicast.c: /* Using Exact physible RATE */ osmtest/osmt_multicast.c: /* Using Exact physible MTU */ Also, o15.0.1.4 is obsolete at 1.2 and is replaced by o15-0.2.2.. Thanks. -- Hal From eitan at mtl001.openib.org Tue Sep 13 01:19:09 2005 From: eitan at mtl001.openib.org (Yael Kalka) Date: 13 Sep 2005 11:19:09 +0300 Subject: [openib-general] Opensm - casting issues Message-ID: <5zirx5tm6a.fsf@mtl066.yok.mtl.com> Hi Hal, Attached is a patch to fix some casting issues in osm_pkey.h. In Linux it compiles fine, but under Windows I get compilation errors due to the problem. Thanks, Yael Signed-off-by: Yael Kalka Index: opensm/osm_pkey.c =================================================================== --- opensm/osm_pkey.c (revision 3395) +++ opensm/osm_pkey.c (working copy) @@ -202,7 +202,8 @@ osm_physp_share_pkey( IN const osm_physp_t* const p_physp_1, IN const osm_physp_t* const p_physp_2 ) { - ib_net16_t *pkey1, *pkey2, pkey1_base, pkey2_base; + ib_net16_t *pkey1, *pkey2; + uint64_t pkey1_base, pkey2_base; const osm_pkey_tbl_t *pkey_tbl1, *pkey_tbl2; cl_map_iterator_t map_iter1, map_iter2; From eitan at mtl001.openib.org Tue Sep 13 01:43:04 2005 From: eitan at mtl001.openib.org (Yael Kalka) Date: 13 Sep 2005 11:43:04 +0300 Subject: [openib-general] Opensm - casting issues Message-ID: <5zhdcptl2f.fsf@mtl066.yok.mtl.com> Hi Hal, Attached is a patch for comments left by mistake when you merged the files.
Thanks, Yael Signed-off-by: Yael Kalka Index: libvendor/osm_vendor_mlx_anafa.c =================================================================== --- libvendor/osm_vendor_mlx_anafa.c (revision 3395) +++ libvendor/osm_vendor_mlx_anafa.c (working copy) @@ -447,11 +447,7 @@ osm_vendor_send (IN osm_bind_handle_t h_ /* Make our operations with the send context atomic */ osmv_txn_lock (p_bo); -<<<<<<< .working - -======= ->>>>>>> .merge-right.r3275 if (TRUE == p_bo->is_closing) { osm_log (p_bo->p_vendor->p_log, OSM_LOG_ERROR, "osm_vendor_send: ERR 7410: " From eitan at mtl001.openib.org Tue Sep 13 02:08:10 2005 From: eitan at mtl001.openib.org (Yael Kalka) Date: 13 Sep 2005 12:08:10 +0300 Subject: [openib-general] Opensm - casting issues #2 Message-ID: <5zfys9tjwl.fsf@mtl066.yok.mtl.com> Hi Hal, Attached is a patch to fix some casting issues in ib_types.h. In Linux it compiles fine, but under Windows I get compilation errors due to the problem. Thanks, Yael Signed-off-by: Yael Kalka Index: ib_types.h =================================================================== --- ib_types.h (revision 3395) +++ ib_types.h (working copy) @@ -5669,7 +5669,7 @@ ib_member_get_sl_flow_hop( tmp_sl_flow_hop = tmp_sl_flow_hop >> 20; if (p_hop) - *p_hop = tmp_sl_flow_hop & 0xff; + *p_hop = (uint8_t)(tmp_sl_flow_hop & 0xff); } /* * PARAMETERS @@ -6083,7 +6083,7 @@ ib_notice_set_prod_type( IN ib_net32_t prod_type_val) { uint32_t ptv = cl_ntoh32(prod_type_val); - p_ntc->g_or_v.generic.prod_type_lsb = cl_hton16( ptv & 0x0000ffff); + p_ntc->g_or_v.generic.prod_type_lsb = cl_hton16((uint16_t)(ptv & 0x0000ffff)); p_ntc->g_or_v.generic.prod_type_msb = (uint8_t)( (ptv & 0x00ff0000) >> 16); } /* @@ -6146,7 +6146,7 @@ ib_notice_set_vend_id( IN ib_net32_t vend_id) { uint32_t vi = cl_ntoh32(vend_id); - p_ntc->g_or_v.vend.vend_id_lsb = cl_hton16(vi & 0x0000ffff); + p_ntc->g_or_v.vend.vend_id_lsb = cl_hton16((uint16_t)(vi & 0x0000ffff)); p_ntc->g_or_v.vend.vend_id_msb = (uint8_t)((vi & 0x00ff0000) >> 
16); } /* From eitan at mtl001.openib.org Tue Sep 13 02:27:31 2005 From: eitan at mtl001.openib.org (Yael Kalka) Date: 13 Sep 2005 12:27:31 +0300 Subject: [openib-general] Opensm - osm_vendor_mlx_ts.c patches Message-ID: <5zek7ttj0c.fsf@mtl066.yok.mtl.com> Hi Hal, I think you forgot to add the updates on osm_vendor_mlx_ts.c. Your branch break compilation on this file. Thanks, Yael Signed-off-by: Yael Kalka Index: osm_vendor_mlx_ts.c =================================================================== --- osm_vendor_mlx_ts.c (revision 3395) +++ osm_vendor_mlx_ts.c (working copy) @@ -52,7 +52,7 @@ #include #include -#include +#include #include #include #include @@ -178,6 +178,7 @@ __osmv_TOPSPIN_receiver_thr(void* p_ctx) ib_api_status_t osmv_transport_init(IN osm_bind_info_t *p_info, + IN uint8_t hca_idx, IN char hca_id[VENDOR_HCA_MAXNAMES], IN osmv_bind_obj_t *p_bo) { @@ -195,7 +196,7 @@ osmv_transport_init(IN osm_bind_info_t * /* open TopSpin file device */ /* HACK: assume last char in hostid is the HCA index */ - sprintf(device_file, "/dev/ts_ua%s", &hca_id[strlen(hca_id) -1]); + sprintf(device_file, "/dev/ts_ua%s", hca_idx); device_fd = open(device_file, O_RDWR ); if (device_fd < 0) { @@ -348,7 +349,7 @@ osmv_transport_mad_send(IN const osm_bin if( ret != sizeof(ts_mad) ) { osm_log( p_vend->p_log, OSM_LOG_ERROR, - "osm_ts_send_mad: ERR 6804: " + "osmv_transport_mad_send: ERR 6804: " "Error sending mad (%d).\n", ret ); status = IB_ERROR; goto Exit; @@ -407,7 +408,7 @@ osmv_transport_done(IN const osm_bind_ha it'll know that we are currently closing down, and will not handle the mad. */ p_bo->magic_ptr = 0; - usleep(3000000); + /* usleep(3000000); */ /* seems the only way to abort a blocking read is to make it read something */ __osm_transport_gen_dummy_mad(p_bo); @@ -497,7 +498,7 @@ __osmv_TOPSPIN_mad_addr_to_osm_addr( * DESCRIPTION Modifies the port info for the bound port to set the "IS_SM" bit * according to the value given (TRUE or FALSE). 
*/ -#ifdef OSM_VENDOR_INTF_TS_NO_VAPI +#if (defined(OSM_VENDOR_INTF_TS_NO_VAPI) || defined(OSM_VENDOR_INTF_TS)) void @@ -522,7 +523,7 @@ osm_vendor_set_sm( if ( ts_ioctl_ret < 0 ) { osm_log( p_vend->p_log, OSM_LOG_ERROR, - "osm_vendor_set_sm: ERR 7312: " + "osm_vendor_set_sm: ERR 6805: " "Unable set 'IS_SM' bit to:%u in port attributes (%d).\n", is_sm_val, ts_ioctl_ret ); } From halr at voltaire.com Tue Sep 13 03:08:46 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 06:08:46 -0400 Subject: [openib-general] Re: Opensm - casting issues In-Reply-To: <5zhdcptl2f.fsf@mtl066.yok.mtl.com> References: <5zhdcptl2f.fsf@mtl066.yok.mtl.com> Message-ID: <1126606125.4382.42275.camel@hal.voltaire.com> On Tue, 2005-09-13 at 04:43, Yael Kalka wrote: > Attached is a patch for comments left by mistake when you merged the > files. Thanks. Applied. Please note your email address in the patch email is coming through as: Yael Kalka -- Hal From halr at voltaire.com Tue Sep 13 03:14:30 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 06:14:30 -0400 Subject: [openib-general] Re: Opensm - osm_vendor_mlx_ts.c patches In-Reply-To: <5zek7ttj0c.fsf@mtl066.yok.mtl.com> References: <5zek7ttj0c.fsf@mtl066.yok.mtl.com> Message-ID: <1126606468.4382.42337.camel@hal.voltaire.com> On Tue, 2005-09-13 at 05:27, Yael Kalka wrote: > I think you forgot to add the updates on osm_vendor_mlx_ts.c. > Your branch break compilation on this file. Thanks. Applied. 
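For reference, the osm_vendor_mlx_ts.c patch above builds the device path with `sprintf(device_file, "/dev/ts_ua%s", hca_idx)`, which passes a `uint8_t` to a `%s` conversion; in C that is undefined behavior, so the numeric index needs an integer conversion such as `%u`. A minimal sketch, assuming the `/dev/ts_uaN` naming from the patch and a hypothetical helper `ts_device_path()` that is not part of OpenSM:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of OpenSM): format the TopSpin
 * user-access device path for a numeric HCA index, e.g. 0 -> "/dev/ts_ua0".
 * Returns the path length on success, or -1 if buf is too small.
 */
int ts_device_path(char *buf, size_t len, uint8_t hca_idx)
{
	int n;

	assert(buf != NULL && len > 0);

	/* A uint8_t must be printed with an integer conversion, never %s. */
	n = snprintf(buf, len, "/dev/ts_ua%u", (unsigned int)hca_idx);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output truncated */
	return n;
}
```

With a 16-byte buffer every index from 0 to 255 fits, and using `snprintf` instead of `sprintf` also turns a too-small buffer into an error return rather than an overrun.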
-- Hal From halr at voltaire.com Tue Sep 13 03:38:31 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 06:38:31 -0400 Subject: [openib-general] RE: osmtest osmt_multicast.c physible In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA372@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA372@mtlexch01.mtl.com> Message-ID: <1126607715.4382.42558.camel@hal.voltaire.com> On Tue, 2005-09-13 at 03:30, Liran Sorani wrote: > The o15-0.2.2 is dealing with MC group creation its required > components . as was the old o15.0.1.4 from 1.1 which the comment in the code was referring to. Shouldn't it be updated to 1.2 ? > The section you've pointed at the osmtest does MC creation by variant > MTU & RATE values. Right. > A physible MTU / RATE means valid subnet MTU/RATE values for the port > Osmtest running on. These values are responsed by OpenSM as the > correct values this port can use. I understand now: Feasible rather than physible. -- Hal > -----Original Message----- > From: Yael Kalka > Sent: Tuesday, September 13, 2005 8:21 AM > To: 'Hal Rosenstock'; Yael Kalka; Liran Sorani > Cc: openib-general at openib.org > Subject: RE: osmtest osmt_multicast.c physible > > > Liran, > As owner of the osmtest - please answer the below. > Yael > > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Monday, September 12, 2005 3:36 PM > To: Yael Kalka > Cc: openib-general at openib.org > Subject: osmtest osmt_multicast.c physible > > > Hi Yael, > > What is meant by physible in the below ? > > osmtest/osmt_multicast.c: "Fifth exact MTU & RATE physible, > Sixth exact RATE physible\n\t\t" > osmtest/osmt_multicast.c: "Seventh exact MTU physible > (o15.0.1.4)...\n" > osmtest/osmt_multicast.c: /* Using Exact physible MTU & RATE */ > osmtest/osmt_multicast.c: /* Using Exact physible RATE */ > osmtest/osmt_multicast.c: /* Using Exact physible MTU */ > > Also, o15.0.1.4 is obsolete at 1.2 and is replaced by o15-0.2.2.. > > Thanks. 
> > -- Hal > From halr at voltaire.com Tue Sep 13 04:20:27 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 07:20:27 -0400 Subject: [openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features Message-ID: <1126609953.4382.42857.camel@hal.voltaire.com> [This is a minor update to the previous announcement on this.] OpenIB OpenSM 1.1.0 now includes the OpenSM 1.8.0 functionality. Major thanks go to Yael Kalka and Eitan Zahavi of Mellanox. This is a complete merge of the osm-1.8.0-branch up through version 3368. There are 2 known caveats with this so far: 1. Some Anafa ports cannot be brought to active if not most recent firmware (5.3.0) 2. Solaris interoperability needs work again (I'm working on this) New Features Semistatic LID assignment No LID change on SM restart or node reboot Critical for IPoIB to avoid communication loss Irresponsive port scan during light sweep No response but Link state not down Switch ports with HCA neighbor have lower HOQLife Faster drain so bad HCA not impact subnet Pkeys Not reordered Default values not set DDR and QDR support Options Cache including all non command line Use –c flag to create /var/cache/osm/opensm.opts Kill –HUP Forces a new full sweep Bug Fixes Overflow on SA queries (now drops them if overflow) Multicast tree build took forever on large clusters MTU and Rate selectors ignored during MCMemberRecord Query Deleted multicast groups existing until deferred deletion Crashed on any zero Port or Node GUID SMInfo with a non default PKey was dropped DDR and QDR rates were not calculated correctly Fail to error Service Record delete of non-existing record Memory leak in SA Client code Multicast Join did not check for ‘JoinState != 0’ PortInfo SA query fail if base_lid component used OpenSM runs out of MLIDs even though some groups were deleted Complib race in Passive Lock caused a deadlock (now use rwlock) Many more less severe bugs fixed NOTE: The old OpenIB OpenSM is still available on 
the osm-pre-1.8.0 branch. -- Hal From halr at voltaire.com Tue Sep 13 04:24:07 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 07:24:07 -0400 Subject: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features In-Reply-To: <20050913055909.C5B92253423@mailgw.voltaire.com> References: <1126559105.4382.35460.camel@hal.voltaire.com> <20050913055909.C5B92253423@mailgw.voltaire.com> Message-ID: <1126610125.4382.42878.camel@hal.voltaire.com> Hi Eitan, On Tue, 2005-09-13 at 01:59, Eitan Zahavi wrote: > Great news! > > We will be sending patches for the main trunk from now on. Glad to hear it. Will this now be the master code repository for OpenSM development ? (If so, does that include Windows as well as Linux) ? BTW, your email address on this came through as: Eitan Zahavi -- Hal From hani at mellanox.co.il Tue Sep 13 04:52:32 2005 From: hani at mellanox.co.il (Hani Salloum) Date: Tue, 13 Sep 2005 14:52:32 +0300 Subject: [openib-general] New Mellanox Firmware Page on the web Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E306795B@mtlexch01.mtl.com> Hello, The firmware webpage of Mellanox Technologies (www.mellanox.com/products/firmware.html) has been updated in the hope of making things easier for customers. Here are a few notes about it. * The main firmware page now contains a table of HCA firmware images which can be burnt onto Mellanox HCA cards using the tool flint. It also includes simple instructions for burning firmware. * The Mellanox Firmware Tools (MFT) package, which includes mlxburn (image generator), flint (HCA firmware burner), spark (single switch firmware burner), can be downloaded straight from the web (5MByte only). Release Notes and User's Manual are there as well. * The main page contains links to the following pages: * How to identify your HCA - which basically helps the customer find the PSID (a firmware configuration identifier). Using the PSID, the correct firmware image can be downloaded from the table.
* How to customize & update firmware using .ini files. This page contains a table of .mlx firmware packages and .ini files that can be modified. * How to update firmware for a cluster using ibadm(ibfwmgr). Missing: How to update firmware for a single switch device - TBD I'll be glad to receive your feedback. Best Regards, Hani Salloum Technical Support Mellanox Technologies From eitan at openib.org Tue Sep 13 04:48:31 2005 From: eitan at openib.org (Eitan Zahavi) Date: Tue, 13 Sep 2005 14:48:31 +0300 Subject: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features In-Reply-To: <1126610125.4382.42878.camel@hal.voltaire.com> References: <1126610125.4382.42878.camel@hal.voltaire.com> Message-ID: <20050913120629.D1EED22834D@openib.ca.sandia.gov> Hal Rosenstock wrote: > Hi Eitan, > > On Tue, 2005-09-13 at 01:59, Eitan Zahavi wrote: > >>Great news! >> >>We will be sending patches for the main trunk from now on. > > > Glad to hear it. Will this now be the master code repository for OpenSM > development ? (If so, does that include Windows as well as Linux) ? > > BTW, your email address on this came through as: > Eitan Zahavi We changed Exchange server and got this nice feature ... I will ask our IT to work on it. > > -- Hal > From eitan at mtl001.openib.org Tue Sep 13 05:11:05 2005 From: eitan at mtl001.openib.org (Yael Kalka) Date: 13 Sep 2005 15:11:05 +0300 Subject: [openib-general] [PATCH] Opensm - default cache dir as constant Message-ID: <5zd5ndtbfq.fsf@mtl066.yok.mtl.com> Hi Hal, The default cache dir used was hard coded. I replaced it with a constant in osm_base.h. Attached is a patch for that.
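In the osm_subnet.c hunks of the patch, the options file name is built by falling back to `OSM_DEFAULT_CACHE_DIR` when no cache directory is given, then appending `/opensm.opts` with `strcpy`/`strcat`. A minimal sketch of the same lookup using a bounded `snprintf`, assuming a hypothetical helper `build_opts_path()` (not part of OpenSM) and the macro values introduced in osm_base.h:

```c
#include <assert.h>
#include <stdio.h>

/* Same values as the OSM_DEFAULT_CACHE_DIR macro added to osm_base.h */
#ifdef __WIN__
#define OSM_DEFAULT_CACHE_DIR "C:\\Windows\\Temp\\"
#else
#define OSM_DEFAULT_CACHE_DIR "/var/cache/osm"
#endif

/*
 * Hypothetical helper (not part of OpenSM): write "<cache_dir>/opensm.opts"
 * into buf, falling back to OSM_DEFAULT_CACHE_DIR when cache_dir is NULL.
 * Returns the path length, or -1 if the path does not fit in buf.
 */
int build_opts_path(char *buf, size_t len, const char *cache_dir)
{
	int n;

	assert(buf != NULL && len > 0);

	if (cache_dir == NULL)
		cache_dir = OSM_DEFAULT_CACHE_DIR;

	/* snprintf never writes past len bytes, unlike strcpy + strcat */
	n = snprintf(buf, len, "%s/opensm.opts", cache_dir);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* truncated: caller's buffer is too small */
	return n;
}
```

On a non-Windows build, `build_opts_path(buf, sizeof(buf), NULL)` would produce `/var/cache/osm/opensm.opts`. Keeping the fallback in one helper also means both the parse and write paths pick up any later change to the default directory.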
Thanks, Yael Signed-off-by: Yael Kalka Index: include/opensm/osm_base.h =================================================================== --- include/opensm/osm_base.h (revision 3395) +++ include/opensm/osm_base.h (working copy) @@ -193,6 +193,22 @@ BEGIN_C_DECLS #endif /***********/ +/****d* OpenSM: Base/OSM_DEFAULT_CACHE_DIR +* NAME +* OSM_DEFAULT_CACHE_DIR +* +* DESCRIPTION +* Specifies the default cache directory for the db files. +* +* SYNOPSIS +*/ +#ifdef __WIN__ +#define OSM_DEFAULT_CACHE_DIR "C:\\Windows\\Temp\\" +#else +#define OSM_DEFAULT_CACHE_DIR "/var/cache/osm" +#endif +/***********/ + /****d* OpenSM: Base/OSM_DEFAULT_LOG_FILE * NAME * OSM_DEFAULT_LOG_FILE Index: include/opensm/osm_subnet.h =================================================================== --- include/opensm/osm_subnet.h (revision 3395) +++ include/opensm/osm_subnet.h (working copy) @@ -978,7 +978,7 @@ osm_subn_parse_conf_file( * * NOTES * Assumes the conf file is part of the cache dir which defaults to -* /var/cache/osm or OSM_CACHE_DIR the name is opensm.opts +* OSM_DEFAULT_CACHE_DIR or OSM_CACHE_DIR the name is opensm.opts * * SEE ALSO * Subnet object, osm_subn_construct, osm_subn_destroy, @@ -1008,7 +1008,7 @@ osm_subn_write_conf_file( * * NOTES * Assumes the conf file is part of the cache dir which defaults to -* /var/cache/osm or OSM_CACHE_DIR the name is opensm.opts +* OSM_DEFAULT_CACHE_DIR or OSM_CACHE_DIR the name is opensm.opts * * SEE ALSO * Subnet object, osm_subn_construct, osm_subn_destroy, Index: opensm/osm_subnet.c =================================================================== --- opensm/osm_subnet.c (revision 3395) +++ opensm/osm_subnet.c (working copy) @@ -609,7 +609,7 @@ osm_subn_parse_conf_file( char *p_key, *p_val ,*p_last; /* try to open the options file from the cache dir */ - if (! p_cache_dir) p_cache_dir = "/var/cache/osm"; + if (! 
p_cache_dir) p_cache_dir = OSM_DEFAULT_CACHE_DIR; strcpy(file_name, p_cache_dir); strcat(file_name,"/opensm.opts"); @@ -773,7 +773,7 @@ osm_subn_write_conf_file( FILE *opts_file; /* try to open the options file from the cache dir */ - if (! p_cache_dir) p_cache_dir = "/var/cache/osm"; + if (! p_cache_dir) p_cache_dir = OSM_DEFAULT_CACHE_DIR; strcpy(file_name, p_cache_dir); strcat(file_name,"/opensm.opts"); From eitan at mtl001.openib.org Tue Sep 13 05:18:03 2005 From: eitan at mtl001.openib.org (Yael Kalka) Date: 13 Sep 2005 15:18:03 +0300 Subject: [openib-general] [PATCH] Opensm - declaration/implementation missmatch Message-ID: <5zbr2xtb44.fsf@mtl066.yok.mtl.com> Hi Hal, The following functions have a missmatch between their declaration and their implementation. Attached is a patch for that. Thanks, Yael Signed-off-by: Yael Kalka Index: opensm/osm_sminfo_rcv.c =================================================================== --- opensm/osm_sminfo_rcv.c (revision 3400) +++ opensm/osm_sminfo_rcv.c (working copy) @@ -97,7 +97,7 @@ osm_sminfo_rcv_init( IN osm_resp_t* const p_resp, IN osm_log_t* const p_log, IN osm_state_mgr_t* const p_state_mgr, - IN osm_sm_state_mgr_t* p_sm_state_mgr, + IN osm_sm_state_mgr_t* const p_sm_state_mgr, IN cl_plock_t* const p_lock ) { ib_api_status_t status = IB_SUCCESS; Index: opensm/osm_sa_mcmember_record.c =================================================================== --- opensm/osm_sa_mcmember_record.c (revision 3400) +++ opensm/osm_sa_mcmember_record.c (working copy) @@ -1151,7 +1151,7 @@ ib_api_status_t osm_mcmr_rcv_create_new_mgrp( IN osm_mcmr_recv_t* const p_rcv, IN ib_net64_t comp_mask, - IN const ib_member_rec_t* p_recvd_mcmember_rec, + IN const ib_member_rec_t* const p_recvd_mcmember_rec, OUT osm_mgrp_t **pp_mgrp) { ib_net16_t mlid; From halr at voltaire.com Tue Sep 13 05:44:21 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 08:44:21 -0400 Subject: [openib-general] Re: [PATCH] Opensm - default cache 
dir as constant In-Reply-To: <5zd5ndtbfq.fsf@mtl066.yok.mtl.com> References: <5zd5ndtbfq.fsf@mtl066.yok.mtl.com> Message-ID: <1126615418.4382.43383.camel@hal.voltaire.com> On Tue, 2005-09-13 at 08:11, Yael Kalka wrote: > The default cache dir used was hard coded. I replaced it with a > constant in osm_base.h. Thanks. Applied. -- Hal From eitan at mellanox.co.il Tue Sep 13 05:53:16 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 13 Sep 2005 15:53:16 +0300 Subject: [openib-general] RE: osmtest/OpenSM: ServiceGID and busy status In-Reply-To: <1126564181.4382.36164.camel@hal.voltaire.com> References: <1126564181.4382.36164.camel@hal.voltaire.com> Message-ID: <4326CBBC.4010609@mellanox.co.il> Hal Rosenstock wrote: > On Mon, 2005-09-12 at 11:53, Eitan Zahavi wrote: > >>Hal Rosenstock wrote: > > ... > >>>C15-0.1.13: SA shall reject as invalid any attempt to create, > > modify, or > >>>delete a ServiceRecord in which the ServiceP_Key is not present in > > the > >>>P_Key Tables of both the port identified by the ServiceGID and the > > port > >>>from which the request came. >>> >> >>Thanks for finding/pointing it. I am not sure this is covered in the >>implementation as of today. As it would cause the osmtest requests to > > be dropped. > > The tests should be fixed to do the right thing although there can be > negative tests as well. Agree. Liran, Please plan to change the ServiceGID to use the local GID and use ServiceID for differentiating multiple registrations. Eitan > > >>Let us add it to the todo list. > > > Done. 
> > -- Hal > From halr at voltaire.com Tue Sep 13 06:06:44 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 09:06:44 -0400 Subject: [openib-general] Re: [PATCH] Opensm - declaration/implementation missmatch In-Reply-To: <5zbr2xtb44.fsf@mtl066.yok.mtl.com> References: <5zbr2xtb44.fsf@mtl066.yok.mtl.com> Message-ID: <1126616710.4382.43410.camel@hal.voltaire.com> On Tue, 2005-09-13 at 08:18, Yael Kalka wrote: > The following functions have a missmatch between their declaration and > their implementation. Thanks. Applied. -- Hal From halr at voltaire.com Tue Sep 13 06:10:23 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 09:10:23 -0400 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <5zfys9tjwl.fsf@mtl066.yok.mtl.com> References: <5zfys9tjwl.fsf@mtl066.yok.mtl.com> Message-ID: <1126616983.4382.43413.camel@hal.voltaire.com> Hi Yael, On Tue, 2005-09-13 at 05:08, Yael Kalka wrote: > Attached is a patch to fix some casting issues in ib_types.h. > In Linux it compiles fine, but under Windows I get compilation errors due to the problem. Have the casting changes been tested (as well as compiled) ? Also, is the OpenIB svn tree the master code repository for OpenSM for both Linux and Windows ? Thanks. -- Hal From eitan at mellanox.co.il Tue Sep 13 06:16:01 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 13 Sep 2005 16:16:01 +0300 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <1126616983.4382.43413.camel@hal.voltaire.com> References: <1126616983.4382.43413.camel@hal.voltaire.com> Message-ID: <4326D111.7090109@mellanox.co.il> Hal Rosenstock wrote: > Hi Yael, > > On Tue, 2005-09-13 at 05:08, Yael Kalka wrote: > >>Attached is a patch to fix some casting issues in ib_types.h. >>In Linux it compiles fine, but under Windows I get compilation errors > > due to the problem. > > Have the casting changes been tested (as well as compiled) ? 
> > Also, is the OpenIB svn tree the master code repository for OpenSM for > both Linux and Windows ? The answer here is a little more complex: The WinIB repository will be used to keep the windows code. However, we are looking for a way to automatically update it with changes from the OpenIB trunk. So we are looking for having all the "core" of OpenSM be maintained only in OpenIB and be automatically "copied" into the WinIB. EZ > > Thanks. > > -- Hal > From mst at mellanox.co.il Tue Sep 13 06:25:04 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 13 Sep 2005 16:25:04 +0300 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <4326D111.7090109@mellanox.co.il> References: <4326D111.7090109@mellanox.co.il> Message-ID: <20050913132504.GD14121@mellanox.co.il> Quoting Eitan Zahavi : > > Also, is the OpenIB svn tree the master code repository for OpenSM for > > both Linux and Windows ? > > The answer here is a little more complex: > The WinIB repository will be used to keep the windows code. > However, we are looking for a way to automatically update it > with changes from the OpenIB trunk. So we are looking for > having all the "core" of OpenSM be maintained only in OpenIB and > be automatically "copied" into the WinIB. Try svn exports feature. -- MST From eitan at mellanox.co.il Tue Sep 13 06:35:36 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 13 Sep 2005 16:35:36 +0300 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <20050913132504.GD14121@mellanox.co.il> References: <20050913132504.GD14121@mellanox.co.il> Message-ID: <4326D5A8.4060803@mellanox.co.il> Michael S. Tsirkin wrote: > Quoting Eitan Zahavi : > >>>Also, is the OpenIB svn tree the master code repository for OpenSM > > for > >>>both Linux and Windows ? >> >>The answer here is a little more complex: >>The WinIB repository will be used to keep the windows code. 
>>However, we are looking for a way to automatically update it >>with changes from the OpenIB trunk. So we are looking for >>having all the "core" of OpenSM be maintained only in OpenIB and >>be automatically "copied" into the WinIB. > > > Try svn exports feature. The problem is not with the script to copy the files. Even though I guess we will need quilt on the way too. (your comment:"use svn export" was even a bit insulting - but I will ignore it). The problem is maintaining the code in a mode that will support both OS without too many patches to apply in between. > From jlentini at netapp.com Tue Sep 13 06:40:46 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 13 Sep 2005 09:40:46 -0400 (EDT) Subject: [openib-general] Re: udapl copyrights and OpenIB In-Reply-To: <1126559793.4382.35592.camel@hal.voltaire.com> References: <1126559793.4382.35592.camel@hal.voltaire.com> Message-ID: On Mon, 12 Sep 2005, Hal Rosenstock wrote: > Hi Arlin & James, > > The udapl code appears to have the original license on it. Should it > have the OpenIB copyright ? What is the OpenIB copyright? Like other OpenIB code, the uDAPL code can be taken under either a BSD license or GPLv2. uDAPL also allows the code to be licensed under the Common Public License 1.0. From halr at voltaire.com Tue Sep 13 06:44:45 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 09:44:45 -0400 Subject: [openib-general] Re: udapl copyrights and OpenIB In-Reply-To: References: <1126559793.4382.35592.camel@hal.voltaire.com> Message-ID: <1126619085.4514.14.camel@hal.voltaire.com> On Tue, 2005-09-13 at 09:40, James Lentini wrote: > On Mon, 12 Sep 2005, Hal Rosenstock wrote: > > > Hi Arlin & James, > > > > The udapl code appears to have the original license on it. Should it > > have the OpenIB copyright ? > > What is the OpenIB copyright? It is the dual GPL/BSD license. > Like other OpenIB code, the uDAPL code can be taken under either a BSD > license or GPLv2. 
uDAPL also allows the code to be licensed under the > Common Public License 1.0. What license is actually on the uDAPL files themselves ? -- Hal From jlentini at netapp.com Tue Sep 13 07:01:27 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 13 Sep 2005 10:01:27 -0400 (EDT) Subject: [openib-general] Re: udapl copyrights and OpenIB In-Reply-To: <1126619085.4514.14.camel@hal.voltaire.com> References: <1126559793.4382.35592.camel@hal.voltaire.com> <1126619085.4514.14.camel@hal.voltaire.com> Message-ID: On Tue, 13 Sep 2005, Hal Rosenstock wrote: > On Tue, 2005-09-13 at 09:40, James Lentini wrote: > > On Mon, 12 Sep 2005, Hal Rosenstock wrote: > > > > > Hi Arlin & James, > > > > > > The udapl code appears to have the original license on it. Should it > > > have the OpenIB copyright ? > > > > What is the OpenIB copyright? > > It is the dual GPL/BSD license. My understanding is as follows: A copyright and a license are different. A copyright gives the creator the right to license the code. > > Like other OpenIB code, the uDAPL code can be taken under either a BSD > > license or GPLv2. uDAPL also allows the code to be licensed under the > > Common Public License 1.0. > > What license is actually on the uDAPL files themselves ? The license text gives a user the option of any of the three licenses: /* * Copyright (c) 2002-2003, Network Appliance, Inc. All rights * reserved. * * This Software is licensed under one of the following licenses: * * 1) under the terms of the "Common Public License 1.0" a copy of * which is available from the Open Source Initiative, see * http://www.opensource.org/licenses/cpl.php. * * 2) under the terms of the "The BSD License" a copy of which is * available from the Open Source Initiative, see * http://www.opensource.org/licenses/bsd-license.php. 
* * 3) under the terms of the "GNU General Public License (GPL) Version * 2" a copy of which is available from the Open Source Initiative, * see http://www.opensource.org/licenses/gpl-license.php. * * Licensee has the right to choose one of the above licenses. * * Redistributions of source code must retain the above copyright * notice and one of the license notices. * * Redistributions in binary form must reproduce both the above * copyright notice, one of the license notices in the documentation * and/or other materials provided with the distribution. */ From halr at voltaire.com Tue Sep 13 07:13:18 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 10:13:18 -0400 Subject: [openib-general] Re: udapl copyrights and OpenIB In-Reply-To: References: <1126559793.4382.35592.camel@hal.voltaire.com> <1126619085.4514.14.camel@hal.voltaire.com> Message-ID: <1126620797.4514.68.camel@hal.voltaire.com> On Tue, 2005-09-13 at 10:01, James Lentini wrote: > My understanding is as follows: A copyright and a license are > different. A copyright gives the creator the right to license the > code. > > > > Like other OpenIB code, the uDAPL code can be taken under either a BSD > > > license or GPLv2. uDAPL also allows the code to be licensed under the > > > Common Public License 1.0. > > > > What license is actually on the uDAPL files themselves ? > > The license text gives a user the option of any of the three licenses: > > /* > * Copyright (c) 2002-2003, Network Appliance, Inc. All rights > * reserved. > * > * This Software is licensed under one of the following licenses: > * > * 1) under the terms of the "Common Public License 1.0" a copy of > * which is available from the Open Source Initiative, see > * http://www.opensource.org/licenses/cpl.php. > * > * 2) under the terms of the "The BSD License" a copy of which is > * available from the Open Source Initiative, see > * http://www.opensource.org/licenses/bsd-license.php. 
> * > * 3) under the terms of the "GNU General Public License (GPL) Version > * 2" a copy of which is available from the Open Source Initiative, > * see http://www.opensource.org/licenses/gpl-license.php. > * > * Licensee has the right to choose one of the above licenses. > * > * Redistributions of source code must retain the above copyright > * notice and one of the license notices. > * > * Redistributions in binary form must reproduce both the above > * copyright notice, one of the license notices in the documentation > * and/or other materials provided with the distribution. > */ That's fine. I didn't look closely enough. I didn't see the choices. I thought it was the old license. Sorry. -- Hal From halr at voltaire.com Tue Sep 13 07:16:56 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 10:16:56 -0400 Subject: [openib-general] Some More Operational Issues with OpenSM 1.1.0 Message-ID: <1126620933.4514.79.camel@hal.voltaire.com> Hi, Here are some additional operational issues with OpenSM 1.1.0: 1. The following warning now appears when OpenSM is started up: opensm: /usr/local/lib/libopensm.so.1: no version information available (required by opensm) 2. Not sure what the LID manager doesn't like about the old settings (from OpenSM 1.1.0). Sep 13 09:34:59 330140 [B7F144A0] -> __osm_lid_mgr_validate_db: [ Sep 13 09:34:59 330260 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR 0312: Ilegal LID range [0x4:0x0] for guid:0x0008f10403961355. Sep 13 09:34:59 330289 [B7F144A0] -> osm_db_delete: [ Sep 13 09:34:59 330313 [B7F144A0] -> osm_db_delete: ] Sep 13 09:34:59 330337 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR 0312: Ilegal LID range [0x3:0x0] for guid:0x0008f10403960559. Sep 13 09:34:59 330360 [B7F144A0] -> osm_db_delete: [ Sep 13 09:34:59 330379 [B7F144A0] -> osm_db_delete: ] Sep 13 09:34:59 330402 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR 0312: Ilegal LID range [0x5:0x0] for guid:0x005442ba00003080. 
Sep 13 09:34:59 330424 [B7F144A0] -> osm_db_delete: [ Sep 13 09:34:59 330443 [B7F144A0] -> osm_db_delete: ] Sep 13 09:34:59 330466 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR 0312: Ilegal LID range [0x7:0x0] for guid:0x0008f1040396055a. Sep 13 09:34:59 330535 [B7F144A0] -> osm_db_delete: [ Sep 13 09:34:59 330556 [B7F144A0] -> osm_db_delete: ] 3. LinearFDBTop is being detected as corrupted. This is bad. Sep 13 09:34:59 732496 [B7713C40] -> osm_si_rcv_process: [ Sep 13 09:34:59 732514 [B7713C40] -> osm_si_rcv_process: Switch GUID = 0x0008f10400410015, TID = 0x1273. Sep 13 09:34:59 732535 [B7713C40] -> osm_si_rcv_process: ERR 3610: Bad LinearFDBTop value = 0xC000 on switch 0x8f10400410015. Forcing correction to 0x0. 4. SM Set PortInfo being rejected with status 7. Not sure why that would be. Also, in this case (and probably others which are similar), OpenSM continues as if things succeeded. Is that right ? Sep 13 09:35:00 326832 [B6F13BC0] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x2 (SubnSet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x1 trans_id................0x12c9 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0xA m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1] Return path: [0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0C 03 03 02 14 02 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Sep 13 09:35:00 326970 [B6F13BC0] -> osm_vendor_send: [ Sep 13 09:35:00 327426 [B6F13BC0] -> osm_vendor_send: Completed Sending Request p_madw = 0x80a44a8. 
Sep 13 09:35:00 327453 [B6F13BC0] -> osm_vendor_send: ] Sep 13 09:35:00 327473 [B6F13BC0] -> __osm_vl15_poller: 1 on wire, 6 outstanding, 0 unicasts sent, 150 sent total. Sep 13 09:35:00 327634 [B5F13AC0] -> osm_mad_pool_get: [ Sep 13 09:35:00 327755 [B5F13AC0] -> osm_vendor_get: [ Sep 13 09:35:00 327775 [B5F13AC0] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x80a46c4, size = 256. Sep 13 09:35:00 327893 [B5F13AC0] -> osm_vendor_get: Acquired UMAD 0x80dbb18, size = 256. Sep 13 09:35:00 327914 [B5F13AC0] -> osm_vendor_get: ] Sep 13 09:35:00 327933 [B5F13AC0] -> osm_mad_pool_get: Acquired p_madw = 0x80a46b8, p_mad = 0x80dbb50, size = 256. Sep 13 09:35:00 328050 [B5F13AC0] -> osm_mad_pool_get: ] Sep 13 09:35:00 328070 [B5F13AC0] -> __osm_sm_mad_ctrl_rcv_callback: [ Sep 13 09:35:00 328183 [B5F13AC0] -> __osm_sm_mad_ctrl_rcv_callback: 150 QP0 MADs received. Sep 13 09:35:00 328362 [B5F13AC0] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x1 trans_id................0x12c9 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0xA m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1] Return path: [0][C] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0C 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Sep 13 09:35:00 328481 [B5F13AC0] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00. 
Sep 13 09:35:00 328655 [B5F13AC0] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x1 trans_id................0x12c9 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0xA m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1] Return path: [0][C] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0C 03 03 02 14 52 00 11 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Sep 13 09:35:00 336766 [B7713C40] -> osm_pi_rcv_process: [ Sep 13 09:35:00 336786 [B7713C40] -> PortInfo dump: port number.............0xA node_guid...............0x005442ba00003080 port_guid...............0x005442ba00003080 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0xC link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap..................0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 mtu_cap.................0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Sep 13 09:35:00 336954 [B7713C40] -> Capabilities 
Mask: Sep 13 09:35:00 336999 [B7713C40] -> osm_pi_rcv_process_set: [ Sep 13 09:35:00 337018 [B7713C40] -> osm_pi_rcv_process_set: ERR 0F10: Received Error Status for SetResp() Sep 13 09:35:00 337133 [B7713C40] -> PortInfo dump: port number.............0xA node_guid...............0x005442ba00003080 port_guid...............0x005442ba00003080 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0xC link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x1 port_state..............ACTIVE state_info2.............0x52 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x11 mtu_smsl................0x40 vl_cap..................0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 mtu_cap.................0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Sep 13 09:35:00 337176 [B7713C40] -> Capabilities Mask: Sep 13 09:35:00 337216 [B7713C40] -> osm_pi_rcv_process_set: Received logical SetResp() for GUID = 0x5442ba00003080, port num = 10 for parent node GUID = 0x5442ba00003080 TID = 0x12c9. Sep 13 09:35:00 337238 [B7713C40] -> osm_pi_rcv_process_set: ] Sep 13 09:35:00 337257 [B7713C40] -> osm_pi_rcv_process: ] Similarly for some other ports (0xC) Thanks. 
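A small decoder reproduces the "status 7" reading of the dumped value 0x1C00. Below is a minimal sketch in C. It assumes that the logger printed the 16-bit status in network byte order on a little-endian host, which would make the host-order value 0x001C. That assumption is inferred from the dump and is not stated in the thread.

In the IBA MAD common status layout:
- bit 0 is busy
- bit 1 is redirect
- bits 2-4 hold the invalid-field code, where 7 means a field in the attribute or attribute modifier has an invalid value

The function names are illustrative only.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value (network <-> little-endian host). */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Bits 2-4 of the host-order MAD status hold the invalid-field code.
 * (For directed-route SMPs bit 15 is the D bit; it does not affect this.) */
unsigned mad_status_code(uint16_t status_host)
{
    return (status_host >> 2) & 0x7;
}

/* Bit 0: the responder was busy. */
int mad_status_busy(uint16_t status_host)
{
    return status_host & 0x1;
}

/* Human-readable meaning of the invalid-field code. */
const char *mad_status_str(unsigned code)
{
    switch (code) {
    case 0: return "no invalid fields";
    case 1: return "bad base or class version";
    case 2: return "method not supported";
    case 3: return "method/attribute combination not supported";
    case 7: return "invalid value in attribute or attribute modifier";
    default: return "reserved";
    }
}
```

With this decoding, the SetResp for port 0xA points at an invalid PortInfo field value rather than a busy or redirect condition.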
-- Hal From hch at lst.de Tue Sep 13 07:24:51 2005 From: hch at lst.de (Christoph Hellwig) Date: Tue, 13 Sep 2005 16:24:51 +0200 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <4326D111.7090109@mellanox.co.il> References: <1126616983.4382.43413.camel@hal.voltaire.com> <4326D111.7090109@mellanox.co.il> Message-ID: <20050913142451.GA21653@lst.de> On Tue, Sep 13, 2005 at 04:16:01PM +0300, Eitan Zahavi wrote: > The answer here is a little more complex: > The WinIB repository will be used to keep the windows code. > However, we are looking for a way to automatically update it > with changes from the OpenIB trunk. So we are looking for > having all the "core" of OpenSM be maintained only in OpenIB and > be automatically "copied" into the WinIB. Why does the windows port needs a separate repository? Please just check all windows code (not just opensm) into the openib repository. From eitan at mellanox.co.il Tue Sep 13 07:30:10 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 13 Sep 2005 17:30:10 +0300 Subject: [openib-general] Re: Some More Operational Issues with OpenSM 1.1.0 In-Reply-To: <1126620933.4514.79.camel@hal.voltaire.com> References: <1126620933.4514.79.camel@hal.voltaire.com> Message-ID: <4326E272.1060307@mellanox.co.il> Hal Rosenstock wrote: > Hi, > > Here are some additional operational issues with OpenSM 1.1.0: > > 1. The following warning now appears when OpenSM is started up: > opensm: /usr/local/lib/libopensm.so.1: no version information available > (required by opensm) > > 2. Not sure what the LID manager doesn't like about the old settings > (from OpenSM 1.1.0). > > Sep 13 09:34:59 330140 [B7F144A0] -> __osm_lid_mgr_validate_db: [ > Sep 13 09:34:59 330260 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR > 0312: Ilegal LID range [0x4:0x0] for guid:0x0008f10403961355. 
> Sep 13 09:34:59 330289 [B7F144A0] -> osm_db_delete: [ > Sep 13 09:34:59 330313 [B7F144A0] -> osm_db_delete: ] > Sep 13 09:34:59 330337 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR > 0312: Ilegal LID range [0x3:0x0] for guid:0x0008f10403960559. > Sep 13 09:34:59 330360 [B7F144A0] -> osm_db_delete: [ > Sep 13 09:34:59 330379 [B7F144A0] -> osm_db_delete: ] > Sep 13 09:34:59 330402 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR > 0312: Ilegal LID range [0x5:0x0] for guid:0x005442ba00003080. > Sep 13 09:34:59 330424 [B7F144A0] -> osm_db_delete: [ > Sep 13 09:34:59 330443 [B7F144A0] -> osm_db_delete: ] > Sep 13 09:34:59 330466 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR > 0312: Ilegal LID range [0x7:0x0] for guid:0x0008f1040396055a. > Sep 13 09:34:59 330535 [B7F144A0] -> osm_db_delete: [ > Sep 13 09:34:59 330556 [B7F144A0] -> osm_db_delete: ] The cache file should have the format: 0x0008f1040396055a 0x7 0x7 I wonder if this is what it looks like. From the complaint it looks like the line is: 0x0008f1040396055a 0x7 0x0 or 0x0008f1040396055a 0x7 Can you tell us what is it really? Also there might be a bug in parsing that file too. But it is a new bug caused by the merges... I tested this feature on 1.8.0 very thoroughly. > > > 3. LinearFDBTop is being detected as corrupted. This is bad. > Sep 13 09:34:59 732496 [B7713C40] -> osm_si_rcv_process: [ > Sep 13 09:34:59 732514 [B7713C40] -> osm_si_rcv_process: Switch GUID = > 0x0008f10400410015, TID = 0x1273. > Sep 13 09:34:59 732535 [B7713C40] -> osm_si_rcv_process: ERR 3610: > Bad LinearFDBTop value = 0xC000 on > switch 0x8f10400410015. > Forcing correction to 0x0. This is an old message that is caused by the way the Anafa firmware reports the LinearFDBTop after reboot. The SM forces the value 0x0 and this clears the issue until the next boot of the switch. We should make this into a warning. > > 4. SM Set PortInfo being rejected with status 7. Not sure why that would > be. 
Also, in this case (and probably others which are similar), OpenSM > continues as if things succeeded. Is that right ? Yes it continues but should report "Errors in Intialization" and retry. We should be able to reproduce it here, and will. The key is to understand what in the PortInfo caused the "illegal value" error. From halr at voltaire.com Tue Sep 13 08:20:57 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 11:20:57 -0400 Subject: [openib-general] RE: osmtest/OpenSM: ServiceGID and busy status In-Reply-To: <20050912155358.B6F6A253369@mailgw.voltaire.com> References: <1126531043.4382.30246.camel@hal.voltaire.com> <20050912155358.B6F6A253369@mailgw.voltaire.com> Message-ID: <1126624475.4514.300.camel@hal.voltaire.com> On Mon, 2005-09-12 at 11:53, Eitan Zahavi wrote: > Hal Rosenstock wrote: > > >> > >>Proxy would be allowed. There are 2 possibilities: > >>1. Allow valid looking GIDs > >>or > >>2. Only allow GIDs present in the subnet > > > > > > I think that C15-0.1.13 (IBA 1.2 p.896) results in the second > > possibility above as it states: > > > > C15-0.1.13: SA shall reject as invalid any attempt to create, modify, or > > delete a ServiceRecord in which the ServiceP_Key is not present in the > > P_Key Tables of both the port identified by the ServiceGID and the port > > from which the request came. > > > Thanks for finding/pointing it. I am not sure this is covered in the > implementation as of today. As it would cause the osmtest requests to be dropped. > > Let us add it to the todo list. > > Thanks > > So that is more stringent and the ServiceGID must be a (valid) GID in > > the subnet. > > > > > I believe the test is attempting to create the SR with an invalid > > ServiceGID as its subnet prefix is 0. It should be rejected by the SA > > per the rule above. In terms of the SA code, it does not enforce the PKey sharing as follows: /* if the pkey given is an invalid pkey - return TRUE.
*/ if(ib_pkey_is_invalid(pkey)) { osm_log( p_log, OSM_LOG_DEBUG, "osm_physp_has_pkey: " "Given invalid PKey - we treat it loosely and allow it.\n"); res = TRUE; goto Exit; } Wonder what relies on this (looseness)... -- Hal From mst at mellanox.co.il Tue Sep 13 08:31:55 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 13 Sep 2005 18:31:55 +0300 Subject: [openib-general] [PATCH] libmthca: fix wqe post (was Re: strange mem-free bug) In-Reply-To: <52u0gp95d9.fsf@cisco.com> References: <52u0gp95d9.fsf@cisco.com> Message-ID: <20050913153155.GK14121@mellanox.co.il> Quoting r. Roland Dreier : > Subject: strange mem-free bug (was: [openib-general] completion Q overflow error/panic) > > While looking at Viswa's example, I've found what seems to be a > problem using lots of QPs on mem-free HCAs. Hi, Roland! This seems to be a bug in libmthca. Patch below. We probably need a similar fix for kernel mthca - let me know if you plan to work on that, otherwise I'll look into it tomorrow. And it's probably something we want fixed for 2.6.14, right? Let me know. With regard to the test code that you posted - I also have some small comments. If you plan to use it in the future, you can stick it in svn somewhere and I'll send patches. --- Fix posting of the first work request for memfree hardware. Simplify code for tavor mode hardware. Signed-off-by: Michael S.
Tsirkin Index: userspace/libmthca/src/qp.c =================================================================== --- userspace.orig/libmthca/src/qp.c 2005-09-13 17:17:58.000000000 +0300 +++ userspace/libmthca/src/qp.c 2005-09-13 17:26:23.000000000 +0300 @@ -259,15 +259,13 @@ int mthca_tavor_post_send(struct ibv_qp goto out; } - if (prev_wqe) { - ((struct mthca_next_seg *) prev_wqe)->nda_op = - htonl(((ind << qp->sq.wqe_shift) + - qp->send_wqe_offset) | - mthca_opcode[wr->opcode]); + ((struct mthca_next_seg *) prev_wqe)->nda_op = + htonl(((ind << qp->sq.wqe_shift) + + qp->send_wqe_offset) | + mthca_opcode[wr->opcode]); - ((struct mthca_next_seg *) prev_wqe)->ee_nds = - htonl((size0 ? 0 : MTHCA_NEXT_DBD) | size); - } + ((struct mthca_next_seg *) prev_wqe)->ee_nds = + htonl((size0 ? 0 : MTHCA_NEXT_DBD) | size); if (!size0) { size0 = size; @@ -353,12 +351,10 @@ int mthca_tavor_post_recv(struct ibv_qp qp->wrid[ind] = wr->wr_id; - if (prev_wqe) { - ((struct mthca_next_seg *) prev_wqe)->nda_op = - htonl((ind << qp->rq.wqe_shift) | 1); - ((struct mthca_next_seg *) prev_wqe)->ee_nds = - htonl(MTHCA_NEXT_DBD | size); - } + ((struct mthca_next_seg *) prev_wqe)->nda_op = + htonl((ind << qp->rq.wqe_shift) | 1); + ((struct mthca_next_seg *) prev_wqe)->ee_nds = + htonl(MTHCA_NEXT_DBD | size); if (!size0) size0 = size; @@ -562,15 +558,13 @@ int mthca_arbel_post_send(struct ibv_qp goto out; } - if (prev_wqe) { - ((struct mthca_next_seg *) prev_wqe)->nda_op = - htonl(((ind << qp->sq.wqe_shift) + - qp->send_wqe_offset) | - mthca_opcode[wr->opcode]); - mb(); - ((struct mthca_next_seg *) prev_wqe)->ee_nds = - htonl(MTHCA_NEXT_DBD | size); - } + ((struct mthca_next_seg *) prev_wqe)->nda_op = + htonl(((ind << qp->sq.wqe_shift) + + qp->send_wqe_offset) | + mthca_opcode[wr->opcode]); + mb(); + ((struct mthca_next_seg *) prev_wqe)->ee_nds = + htonl(MTHCA_NEXT_DBD | size); if (!size0) { size0 = size; @@ -767,6 +761,8 @@ int mthca_alloc_qp_buf(struct ibv_pd *pd } } + qp->sq.last = 
get_send_wqe(qp, qp->sq.max - 1); + qp->rq.last = get_recv_wqe(qp, qp->sq.max - 1); return 0; } Index: userspace/libmthca/src/srq.c =================================================================== --- userspace.orig/libmthca/src/srq.c 2005-09-13 17:25:41.000000000 +0300 +++ userspace/libmthca/src/srq.c 2005-09-13 17:25:51.000000000 +0300 @@ -142,13 +142,11 @@ int mthca_tavor_post_srq_recv(struct ibv ((struct mthca_data_seg *) wqe)->addr = 0; } - if (prev_wqe) { - ((struct mthca_next_seg *) prev_wqe)->nda_op = - htonl((ind << srq->wqe_shift) | 1); - mb(); - ((struct mthca_next_seg *) prev_wqe)->ee_nds = - htonl(MTHCA_NEXT_DBD); - } + ((struct mthca_next_seg *) prev_wqe)->nda_op = + htonl((ind << srq->wqe_shift) | 1); + mb(); + ((struct mthca_next_seg *) prev_wqe)->ee_nds = + htonl(MTHCA_NEXT_DBD); srq->wrid[ind] = wr->wr_id; srq->first_free = next_ind; @@ -294,6 +292,7 @@ int mthca_alloc_srq_buf(struct ibv_pd *p srq->first_free = 0; srq->last_free = srq->max - 1; + srq->last = get_wqe(srq, srq->max - 1); return 0; } Index: userspace/libmthca/src/verbs.c =================================================================== --- userspace.orig/libmthca/src/verbs.c 2005-08-23 14:03:12.000000000 +0300 +++ userspace/libmthca/src/verbs.c 2005-09-13 17:25:14.000000000 +0300 @@ -306,7 +306,6 @@ struct ibv_srq *mthca_create_srq(struct srq->max = align_queue_size(pd->context, attr->attr.max_wr, 1); srq->max_gs = attr->attr.max_sge; - srq->last = NULL; srq->counter = 0; if (mthca_alloc_srq_buf(pd, &attr->attr, srq)) @@ -413,14 +412,12 @@ struct ibv_qp *mthca_create_qp(struct ib qp->sq.last_comp = qp->sq.max - 1; qp->sq.head = 0; qp->sq.tail = 0; - qp->sq.last = NULL; qp->rq.max = align_queue_size(pd->context, attr->cap.max_recv_wr, 0); qp->rq.next_ind = 0; qp->rq.last_comp = qp->rq.max - 1; qp->rq.head = 0; qp->rq.tail = 0; - qp->rq.last = NULL; if (mthca_alloc_qp_buf(pd, &attr->cap, qp)) goto err; -- MST From halr at voltaire.com Tue Sep 13 08:30:46 2005 From: halr at 
voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 11:30:46 -0400 Subject: [openib-general] Re: Some More Operational Issues with OpenSM 1.1.0 In-Reply-To: <4326E272.1060307@mellanox.co.il> References: <1126620933.4514.79.camel@hal.voltaire.com> <4326E272.1060307@mellanox.co.il> Message-ID: <1126625088.4514.346.camel@hal.voltaire.com> Hi Eitan, On Tue, 2005-09-13 at 10:30, Eitan Zahavi wrote: > Hal Rosenstock wrote: > > 2. Not sure what the LID manager doesn't like about the old settings > > (from OpenSM 1.1.0). > > > > Sep 13 09:34:59 330140 [B7F144A0] -> __osm_lid_mgr_validate_db: [ > > Sep 13 09:34:59 330260 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR > > 0312: Ilegal LID range [0x4:0x0] for guid:0x0008f10403961355. > > Sep 13 09:34:59 330289 [B7F144A0] -> osm_db_delete: [ > > Sep 13 09:34:59 330313 [B7F144A0] -> osm_db_delete: ] > > Sep 13 09:34:59 330337 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR > > 0312: Ilegal LID range [0x3:0x0] for guid:0x0008f10403960559. > > Sep 13 09:34:59 330360 [B7F144A0] -> osm_db_delete: [ > > Sep 13 09:34:59 330379 [B7F144A0] -> osm_db_delete: ] > > Sep 13 09:34:59 330402 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR > > 0312: Ilegal LID range [0x5:0x0] for guid:0x005442ba00003080. > > Sep 13 09:34:59 330424 [B7F144A0] -> osm_db_delete: [ > > Sep 13 09:34:59 330443 [B7F144A0] -> osm_db_delete: ] > > Sep 13 09:34:59 330466 [B7F144A0] -> __osm_lid_mgr_validate_db: ERR > > 0312: Ilegal LID range [0x7:0x0] for guid:0x0008f1040396055a. > > Sep 13 09:34:59 330535 [B7F144A0] -> osm_db_delete: [ > > Sep 13 09:34:59 330556 [B7F144A0] -> osm_db_delete: ] > The cache file should have the format: > 0x0008f1040396055a 0x7 0x7 > > I wonder if this is what it looks like. From the complaint it looks like > the line is: > 0x0008f1040396055a 0x7 0x0 > or > 0x0008f1040396055a 0x7 > > Can you tell us what is it really? 
/var/cache/osm/guid2lid 0x0008f10403960985 0x0007 0x0007 0x0008f10400410015 0x0003 0x0003 0x005442ba00003080 0x0005 0x0005 0x0008f1040396055a 0x0006 0x0006 0x005442b100004901 0x0002 0x0002 0x0008f10403961355 0x0004 0x0004 0x0008f10403960559 0x0001 0x0001 > Also there might be a bug in parsing that file too. But it is a new bug > caused by the merges... I tested this feature on 1.8.0 very thoroughly. > > > > > > > > 3. LinearFDBTop is being detected as corrupted. This is bad. > > Sep 13 09:34:59 732496 [B7713C40] -> osm_si_rcv_process: [ > > Sep 13 09:34:59 732514 [B7713C40] -> osm_si_rcv_process: Switch GUID = > > 0x0008f10400410015, TID = 0x1273. > > Sep 13 09:34:59 732535 [B7713C40] -> osm_si_rcv_process: ERR 3610: > > Bad LinearFDBTop value = 0xC000 on > > switch 0x8f10400410015. > > Forcing correction to 0x0. > This is an old message that is caused by the way the Anafa firmware reports > the LinearFDBTop after reboot. The SM forces the value 0x0 and this clears > the issue until the next boot of the switch. We should make this into a warning. Is this still an outstanding Anafa firmware bug ? > > 4. SM Set PortInfo being rejected with status 7. Not sure why that would > > be. Also, in this case (and probably others which are similar), OpenSM > > continues as if things succeeded. Is that right ? > Yes it continues but should report "Errors in Intialization" and retry. I don't see this. It might depend on when it occurs. > We should be able to reproduce it here. and will. Good. Thanks. > The key is to understand what in the PortInfo caused the "illegal value" error. OK. 
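The cache file shown above has single-LID ranges, with min equal to max. The "[0x4:0x0]" complaints therefore suggest that the maximum LID is lost while the file is read back.

Below is a minimal sketch in C that parses and validates one `guid min_lid max_lid` line in the hex format Eitan describes. The names are hypothetical, and this is not OpenSM's osm_db or LID manager code. It treats a range as valid when 1 <= min <= max <= 0xBFFF, which is the unicast LID space.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One guid2lid cache entry: port GUID and its assigned LID range. */
typedef struct {
    uint64_t guid;
    uint16_t min_lid;
    uint16_t max_lid;
} guid2lid_entry_t;

/* Parse "0x<guid> 0x<min_lid> 0x<max_lid>". Returns false if any of the
 * three fields is missing or a LID does not fit in 16 bits. */
bool guid2lid_parse_line(const char *line, guid2lid_entry_t *out)
{
    uint64_t guid;
    unsigned int min_lid, max_lid;

    /* %x accepts an optional 0x prefix, as strtoul(..., 16) does. */
    if (sscanf(line, "%" SCNx64 " %x %x", &guid, &min_lid, &max_lid) != 3)
        return false;
    if (min_lid > 0xFFFF || max_lid > 0xFFFF)
        return false;

    out->guid = guid;
    out->min_lid = (uint16_t)min_lid;
    out->max_lid = (uint16_t)max_lid;
    return true;
}

/* A stored range is usable only if it is non-empty and unicast. */
bool guid2lid_range_valid(const guid2lid_entry_t *e)
{
    return e->min_lid >= 0x0001 &&
           e->min_lid <= e->max_lid &&
           e->max_lid <= 0xBFFF;
}
```

This sketch reports a line that is missing its third field as malformed, instead of silently producing a zero maximum.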
-- Hal From jackm at mellanox.co.il Tue Sep 13 08:54:17 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Tue, 13 Sep 2005 18:54:17 +0300 Subject: [openib-general] ipoib send-only join to IGMP multicast group Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA532@mtlexch01.mtl.com> I noticed that at startup, IPoIB attempts a send-only join to the MGID ff12:401b:ffff:0:0:0:0:16 (equivalent to IP 224.0.0.22 -- the IGMP multicast group -- see http://www.iana.org/assignments/multicast-addresses). 1. Why is this a send-only join? Is this just so that the local host to "announce" itself to a router (for neighbor calculations)? 2. Who is responsible for creating this multicast group in IPoIB? (A send-only join will not cause a group to be created if it does not yet exist) Jack Morgenstein From halr at voltaire.com Tue Sep 13 09:00:11 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 12:00:11 -0400 Subject: [openib-general] ipoib send-only join to IGMP multicast group In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA532@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA532@mtlexch01.mtl.com> Message-ID: <1126627023.4514.439.camel@hal.voltaire.com> On Tue, 2005-09-13 at 11:54, Jack Morgenstein wrote: > I noticed that at startup, IPoIB attempts a send-only join to the MGID > ff12:401b:ffff:0:0:0:0:16 (equivalent to IP 224.0.0.22 -- the IGMP > multicast group -- see > http://www.iana.org/assignments/multicast-addresses). > > 1. Why is this a send-only join? Is this just so that the local host > to "announce" itself to a router (for neighbor calculations)? The router needs to create the group for the hosts to join. > 2. Who is responsible for creating this multicast group in IPoIB? See above. > (A send-only join will not cause a group to be created if it does not > yet exist) Correct. BTW, I think there may be a deficiency in Linux multicast semantics v.
IB multicast semantics which has been commented on previously (a long time ago). -- Hal From rolandd at cisco.com Tue Sep 13 09:08:49 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 09:08:49 -0700 Subject: [openib-general] Re: ipoib send-only join to IGMP multicast group In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA532@mtlexch01.mtl.com> (Jack Morgenstein's message of "Tue, 13 Sep 2005 18:54:17 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA532@mtlexch01.mtl.com> Message-ID: <52acih7xwu.fsf@cisco.com> Jack> 1. Why is this a send-only join? Is this just so that the Jack> local host to "announce" itself to a router (for neighbor Jack> calculations)? Because the kernel has asked the IPoIB driver to send a packet to this group, but has not asked the IPoIB driver to start Jack> 2. Who is responsible for creating this multicast group in Jack> IPoIB? (A send-only join will not cause a group to be Jack> created if it does not yet exist) The group will be created if any full-member joins are done. But that won't happen unless some entity wants to receive packets from the group -- in other words, some multicast router. - R. From hozer at hozed.org Tue Sep 13 09:15:34 2005 From: hozer at hozed.org (Troy Benjegerdes) Date: Tue, 13 Sep 2005 11:15:34 -0500 Subject: [openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features In-Reply-To: <1126609953.4382.42857.camel@hal.voltaire.com> References: <1126609953.4382.42857.camel@hal.voltaire.com> Message-ID: <20050913161534.GD1685@kalmia.hozed.org> On Tue, Sep 13, 2005 at 07:20:27AM -0400, Hal Rosenstock wrote: > [This is a minor update to the previous announcement on this.] > > OpenIB OpenSM 1.1.0 now includes the OpenSM 1.8.0 functionality. > > Major thanks go to Yael Kalka and Eitan Zahavi of Mellanox. > > This is a complete merge of the osm-1.8.0-branch up through version > 3368. > > There are 2 known caveats with this so far: > 1. 
Some Anafa ports cannot be brought to active if not most recent > firmware (5.3.0) > 2. Solaris interoperability needs work again (I'm working on this) > > New Features > > Semistatic LID assignment > No LID change on SM restart or node reboot > Critical for IPoIB to avoid communication loss > Irresponsive port scan during light sweep > No response but Link state not down We just had a node crash on our network, and it caused our OpenSM to stop working.. we were running version openib-1.0.0.. I suppose this means I should start beating up on 1.1.0 now, right? From halr at voltaire.com Tue Sep 13 09:19:30 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 12:19:30 -0400 Subject: [openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features In-Reply-To: <20050913161534.GD1685@kalmia.hozed.org> References: <1126609953.4382.42857.camel@hal.voltaire.com> <20050913161534.GD1685@kalmia.hozed.org> Message-ID: <1126628283.4514.496.camel@hal.voltaire.com> On Tue, 2005-09-13 at 12:15, Troy Benjegerdes wrote: > We just had a node crash on our network, and it caused our OpenSM to > stop working.. we were running version openib-1.0.0.. Can you define stop working (more details) ? Are there any logs ? > I suppose this means I should start beating up on 1.1.0 now, right? Yes but the same issue might still exist. Can you reproduce it on the OpenSM you are running on now and then move up and see if it still exists ? Thanks. -- Hal From mshefty at ichips.intel.com Tue Sep 13 09:26:31 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 13 Sep 2005 09:26:31 -0700 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <20050913142451.GA21653@lst.de> References: <1126616983.4382.43413.camel@hal.voltaire.com> <4326D111.7090109@mellanox.co.il> <20050913142451.GA21653@lst.de> Message-ID: <4326FDB7.1070207@ichips.intel.com> Christoph Hellwig wrote: > Why does the windows port needs a separate repository? 
Please just > check all windows code (not just opensm) into the openib repository. My understanding is that the labs, who control the OpenIB servers, refused to host any Windows related code, forcing it to have a separate repository. - Sean From mst at mellanox.co.il Tue Sep 13 09:29:04 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 13 Sep 2005 19:29:04 +0300 Subject: [openib-general] Re: ipoib send-only join to IGMP multicast group In-Reply-To: <52acih7xwu.fsf@cisco.com> References: <52acih7xwu.fsf@cisco.com> Message-ID: <20050913162904.GA16651@mellanox.co.il> Quoting r. Roland Dreier : > Jack> 2. Who is responsible for creating this multicast group in > Jack> IPoIB? (A send-only join will not cause a group to be > Jack> created if it does not yet exist) > > The group will be created if any full-member joins are done. Forgive me if I'm asking a dumb question, but wouldn't it be simpler to join as a full member? It seems that if we are joining a group and it doesn't exist, we have to handle this specially by forwarding packets to the all-IP broadcast group. Further, since the group can be added/deleted at any time, it seems that we should also request, and handle, delete updates and 'creation' reports from the SM. > But that > won't happen unless some entity wants to receive packets from the > group -- in other words, some multicast router. The ipoib draft seems to imply that routers should typically perform nonmember joins. Do I misunderstand it?
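The MGID ff12:401b:ffff:0:0:0:0:16 in Jack's question follows the IPv4-to-MGID mapping from the IPoIB draft (later RFC 4391): prefix 0xff, flags 0x1, a scope nibble, the IPv4 signature 0x401b, the P_Key, zeros, and the low 28 bits of the IPv4 group address in the last bytes. A minimal C sketch of that mapping is shown here, assuming link-local scope 2 and the default P_Key 0xffff. `ipv4_mcast_to_mgid()` and `mgid_to_str()` are hypothetical helpers, not functions from the IPoIB driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build the IPoIB MGID for an IPv4 multicast group (assumed layout):
 *   ff | 1<scope> | 40 1b | <pkey> | 0...0 | <low 28 bits of group> */
static void ipv4_mcast_to_mgid(uint32_t ip, uint8_t scope, uint16_t pkey,
			       uint8_t mgid[16])
{
	uint32_t low28 = ip & 0x0fffffffu; /* top nibble 1110 is implied */

	assert((ip >> 28) == 0xe);         /* must be in 224.0.0.0/4 */
	memset(mgid, 0, 16);
	mgid[0] = 0xff;                              /* multicast prefix */
	mgid[1] = (uint8_t)(0x10 | (scope & 0x0f));  /* flags=1, scope */
	mgid[2] = 0x40;                              /* IPv4 signature */
	mgid[3] = 0x1b;
	mgid[4] = (uint8_t)(pkey >> 8);
	mgid[5] = (uint8_t)(pkey & 0xff);
	mgid[12] = (uint8_t)(low28 >> 24);
	mgid[13] = (uint8_t)(low28 >> 16);
	mgid[14] = (uint8_t)(low28 >> 8);
	mgid[15] = (uint8_t)low28;
}

/* Print the MGID as eight 16-bit hex groups without leading zeros,
 * matching the notation used in this thread. len must be > 0. */
static void mgid_to_str(const uint8_t mgid[16], char *buf, size_t len)
{
	size_t off = 0;
	int i;

	buf[0] = '\0';
	for (i = 0; i < 8 && off < len; i++) {
		unsigned int grp =
			((unsigned int)mgid[2 * i] << 8) | mgid[2 * i + 1];
		int n = snprintf(buf + off, len - off, i ? ":%x" : "%x", grp);

		if (n < 0)
			break;
		off += (size_t)n;
	}
}
```

Feeding it 224.0.0.22 (0xE0000016) with scope 2 and P_Key 0xffff yields the string ff12:401b:ffff:0:0:0:0:16, i.e. the group IPoIB tried to join at startup. In the Linux driver the scope and P_Key are believed to come from the interface's broadcast address rather than being fixed constants.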
-- MST From hch at lst.de Tue Sep 13 10:01:22 2005 From: hch at lst.de (Christoph Hellwig) Date: Tue, 13 Sep 2005 19:01:22 +0200 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <4326FDB7.1070207@ichips.intel.com> References: <1126616983.4382.43413.camel@hal.voltaire.com> <4326D111.7090109@mellanox.co.il> <20050913142451.GA21653@lst.de> <4326FDB7.1070207@ichips.intel.com> Message-ID: <20050913170122.GA24527@lst.de> On Tue, Sep 13, 2005 at 09:26:31AM -0700, Sean Hefty wrote: > Christoph Hellwig wrote: > >Why does the windows port needs a separate repository? Please just > >check all windows code (not just opensm) into the openib repository. > > My understanding is that the labs, who control the OpenIB servers, refused > to host any Windows related code, forcing it to have a separate repository. It shouldn't be difficult to find someone to host it. I could maybe ask if such a repo could be put at the lst.de servers. From alexn at voltaire.com Tue Sep 13 10:01:31 2005 From: alexn at voltaire.com (Alex Nezhinsky) Date: Tue, 13 Sep 2005 20:01:31 +0300 Subject: [openib-general] [PATCH] iSER - changes in API, socket-based connect Message-ID: Hi, Attached is a patch with changes in iSER API. 1. Got rid of iscsi entities stuff, now iser implies a single iscsi entity. 2. Connections are established using iser sockets. The iser module registers itself as a new socket provider. Connections are established by creating and connecting a socket. Then iscsi should call a new iser_conn_bind() api function. It associates an instance of struct socket * with a pair of reciprocal connection handles. All further api calls identify the connection using these handles. Finally, conn_terminate() releases the socket as part of the connection shutdown routine. Files added: iser_socket.c, iser_socket.h 3. Some cosmetic changes included, too.
Files deleted: iser_pdu.c, include/iser_types.h, include/iser_pdu.h Some leftovers from the deleted files in include/*.h moved into include/iser_api.h. --- Changes in iSER API. Single iSCSI entity supported. Connection establishment using sockets, registered by iSER module. Header files cleanup. Signed-off-by: Alexander Nezhinsky Index: iser_memory.h =================================================================== --- iser_memory.h (revision 3404) +++ iser_memory.h (working copy) @@ -66,9 +66,9 @@ iser_regd_buff_release(struct iser_regd_buf *p_regd_buf, int *release_deferred); -int iser_init_regd_buff_cache(struct iser_adaptor *p_iser_adaptor); +int iser_reg_all_mem(struct iser_adaptor *p_iser_adaptor); -int iser_release_regd_buff_cache(struct iser_adaptor *p_iser_adaptor); +int iser_unreg_all_mem(struct iser_adaptor *p_iser_adaptor); struct iser_regd_buf *iser_regd_buff_lookup(struct iser_adaptor *p_iser_adaptor, Index: iser_mod.c =================================================================== --- iser_mod.c (revision 3404) +++ iser_mod.c (working copy) @@ -49,7 +49,11 @@ #include #include "iser.h" +#include "iser_socket.h" +#include "iser_conn.h" +#include "iser_task.h" #include "iser_initiator.h" +#include "iser_utils.h" int iser_fmr_enabled = 1; int iser_trace_enabled = 0; @@ -65,20 +69,184 @@ MODULE_PARM_DESC(iser_trace_enabled, "enable/disable trace upon loading (default:disabled)"); +struct iser_global ig; + /** + * iser_global_init - Initializes the global iSER context structure. 
+ * + * returns 0 on success, -1 on failure + */ +int iser_global_init(void) +{ + memset(&ig, 0, sizeof(struct iser_global)); + + /* Allocate adaptors; currently single adaptor */ + ig.num_adaptors = 1; + if (iser_adaptor_init(&ig.adaptor[0], "InfiniHost0") != 0) { + printk(KERN_ERR PFX "initializing iser failed!\n"); + iser_adaptor_release(&ig.adaptor[0]); + return -1; + } + + /* Allocate kmem_cache for iser_task structures */ + ig.task_mem_cache = + kmem_cache_create("iser_task", sizeof(struct iser_task), 0, + SLAB_HWCACHE_ALIGN, NULL, NULL); + + if (ig.task_mem_cache == NULL) { + printk(KERN_ERR PFX + "Failed to alloc task_mem_cache, name: iser_task\n"); + return -1; + } + + /* Allocate kmem_cache for iser_dto structures, for post-recv */ + ig.recv_dto_mem_cache = + kmem_cache_create("iser_recv_dto", + sizeof(struct iser_dto), + 0, SLAB_HWCACHE_ALIGN, NULL, NULL); + if (ig.recv_dto_mem_cache == NULL) { + printk(KERN_ERR PFX + "Failed to alloc recv_dto_mem_cache, " + "name: iser_recv_dto\n"); + return -1; + } + + /* Allocate kmem_cache for iser_dto structures, for send */ + ig.send_dto_mem_cache = + kmem_cache_create("iser_send_dto", + sizeof(struct iser_dto), + 0, SLAB_HWCACHE_ALIGN, NULL, NULL); + if (ig.send_dto_mem_cache == NULL) { + printk(KERN_ERR PFX + "Failed to alloc send_dto_mem_cache, iser_send_dto\n"); + return -1; + } + + /* Allocate kmem_cache for iser_regd_buf structures */ + ig.regd_buf_mem_cache = + kmem_cache_create("iser_regbuf", + sizeof(struct iser_regd_buf), + 0, SLAB_HWCACHE_ALIGN, NULL, NULL); + if (ig.regd_buf_mem_cache == NULL) { + printk(KERN_ERR PFX + "Failed to alloc regd_buf_mem_cache, " + "name: iser_regbuf\n"); + return -1; + } + /* Initialize task hash table */ + hash_init(&ig.task_hash); + + return 0; +} /* iser_global_init */ + +/** + * iser_global_release - Releases all res through + * the global iSER context structure. 
+ * + * returns 0 on success, -1 on failure + */ +int iser_global_release(void) +{ + /* Release all adaptors */ + iser_adaptor_release(&ig.adaptor[0]); + ig.num_adaptors = 0; + + if (ig.task_mem_cache != NULL) { + kmem_cache_destroy(ig.task_mem_cache); + ig.task_mem_cache = NULL; + } + + if (ig.recv_dto_mem_cache != NULL) { + kmem_cache_destroy(ig.recv_dto_mem_cache); + ig.recv_dto_mem_cache = NULL; + } + + if (ig.send_dto_mem_cache != NULL) { + kmem_cache_destroy(ig.send_dto_mem_cache); + ig.send_dto_mem_cache = NULL; + } + + if (ig.regd_buf_mem_cache != NULL) { + kmem_cache_destroy(ig.regd_buf_mem_cache); + ig.regd_buf_mem_cache = NULL; + } + + return 0; +} /* iser_global_release */ + +/** + * iser_api_register - register iSER API, two-way + */ +int +iser_api_register(char *provider_name, + struct iser_api * api, + struct iser_api_cb * api_cb) +{ + int iser_err = 0; + + if (provider_name == NULL) { + printk(KERN_ERR PFX "NULL *provider_name\n"); + iser_err = -EINVAL; + goto api_register_exit; + } + if (api == NULL) { + printk(KERN_ERR PFX "NULL *api structure\n"); + iser_err = -EINVAL; + goto api_register_exit; + } + if (api_cb == NULL) { + printk(KERN_ERR PFX "NULL *api_cb structure\n"); + iser_err = -EINVAL; + goto api_register_exit; + } + + api->conn_bind = iser_conn_bind; + api->notice_key_values = iser_notice_key_values; + api->conn_enable_rdma = iser_conn_enable_rdma; + api->send_control = iser_send_control; + api->release_control = iser_release_control; + api->conn_terminate = iser_conn_term; + api->dealloc_conn_res = iser_dealloc_conn_res; + api->dealloc_task_res = iser_dealloc_task_res; + + ig.api_cb.control_notify = api_cb->control_notify; + ig.api_cb.conn_term_notify = api_cb->conn_term_notify; + + strncpy(ig.provider_name,provider_name,ISER_OBJECT_NAME_SIZE); + + api_register_exit: + return iser_err; +} /* iser_api_register */ +EXPORT_SYMBOL(iser_api_register); + +/** + * iser_api_unregister - Unregister API + */ +int iser_api_unregister(void) +{ + 
ig.api_cb.conn_term_notify = NULL; + ig.api_cb.control_notify = NULL; + return 0; +} /* iser_api_unregister */ +EXPORT_SYMBOL(iser_api_unregister); + +/** * init_module - module initialialization function */ int init_module(void) { ITRACE(ISER_TRACE_MODULE, "Starting iSER datamover...\n"); - /* Initialize the global iSER context */ - if (ig_init() != 0) { + if (iser_global_init() != 0) { printk(KERN_ERR PFX "initializing iser global structures failed!\n"); return -1; } - + if (iser_register_sockets() != 0) { + printk(KERN_ERR PFX "iser socket init failed!\n"); + iser_global_release(); + return -1; + } return 0; } /* init_module */ @@ -88,6 +256,6 @@ void cleanup_module(void) { ITRACE(ISER_TRACE_MODULE, "Removing iSER datamover...\n"); - - ig_release(); -} + iser_global_release(); + iser_unreg_sockets(); +} /* cleanup_module */ Index: include/iser_pdu.h =================================================================== --- include/iser_pdu.h (revision 3404) +++ include/iser_pdu.h (working copy) @@ -1,288 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. 
- * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -#ifndef __ISER_PDU_H__ -#define __ISER_PDU_H__ - -#include - -#include "iser_types.h" - -/*! -------------------------------------------------------------------- - [iser_pdu_bhs] - - Description: iSCSI PDU Basic Header Segment (BHS) - - The purpose of defining BHS here is define the fields relevant - to iSER operation. iSCSI layer should have its own definitions and - may view the BHS passed to iSER just as a buffer. - In order to avoid potential name collision, the structure has - "iser" prefix, although strictly speaking it must be "iscsi". 
- -------------------------------------------------------------------- */ - -#define ISER_PDU_BHS_LENGTH 48 - -union iser_pdu_bhs { - - unsigned char buf[ISER_PDU_BHS_LENGTH]; - - struct { - uint8_t opcode; /* optional bits + opcode */ - uint8_t flags; /* opcode-specific flags */ - uint8_t rsvd[2]; /* usually reserved, used in response */ - uint8_t ahs_length; /* AHS total length */ - uint8_t dlength[3]; /* Data segment length */ - uint8_t lun[8]; /* LUN */ - uint8_t itt[4]; /* Initiator Task Tag */ - uint8_t other[28]; /* opcode-specific */ - } byte; - - struct { - uint32_t op_flg_rsvd; /* opcode, flags, reserved */ - uint32_t length; /* AHS length, Data segment length */ - uint32_t lun[2]; /* LUN */ - uint32_t itt; /* Initiator Task Tag */ - uint32_t other[7]; /* opcode-specific */ - } dword; -}; /* iser_pdu_bhs */ - -/* -------------------------------------------------------- - * opcode byte - * -------------------------------------------------------- - */ - -/* Set when marked for immediate delivery */ -#define ISCSI_OP_IMMEDIATE 0x40 -/* Masks out the opcode itself */ -#define ISCSI_OPCODE_MASK 0x3F - -/* Initiator only opcode values */ -#define ISCSI_OP_NOP_OUT 0x00 -#define ISCSI_OP_SCSI_CMD 0x01 -#define ISCSI_OP_TASK_MGT_REQ 0x02 -#define ISCSI_OP_LOGIN_REQ 0x03 -#define ISCSI_OP_TEXT_REQ 0x04 -#define ISCSI_OP_DATA_OUT 0x05 -#define ISCSI_OP_LOGOUT_REQ 0x06 -#define ISCSI_OP_SNACK_REQ 0x10 - -/* Target only opcode values */ -#define ISCSI_OP_NOP_IN 0x20 -#define ISCSI_OP_SCSI_RSP 0x21 -#define ISCSI_OP_TASK_MGT_RSP 0x22 -#define ISCSI_OP_LOGIN_RSP 0x23 -#define ISCSI_OP_TEXT_RSP 0x24 -#define ISCSI_OP_DATA_IN 0x25 -#define ISCSI_OP_LOGOUT_RSP 0x26 -#define ISCSI_OP_R2T 0x31 -#define ISCSI_OP_ASYNC 0x32 -#define ISCSI_OP_REJECT 0x3f - -/* -------------------------------------------------------- - * flags byte - * -------------------------------------------------------- - */ - -/* When set indicates the final (or only) PDU of a sequence */ -#define 
ISCSI_FLAG_FINAL 0x80 -/* When set indicates that data is to be read */ -#define ISCSI_FLAG_READ_CMD 0x40 -/* When set indicates that data is to be written */ -#define ISCSI_FLAG_WRITE_CMD 0x20 - -#define IS_SET_ISCSI_FLAG_FINAL(x) \ - (ISCSI_FLAG_FINAL & x) - -#define IS_SET_ISCSI_FLAG_READ_CMD(x) \ - (ISCSI_FLAG_READ_CMD & x) - -#define IS_SET_ISCSI_FLAG_WRITE_CMD(x) \ - (ISCSI_FLAG_WRITE_CMD & x) - -/* Login specific */ -#define ISCSI_FLAG_LOGIN_TRANSIT 0x80 -#define ISCSI_FLAG_LOGIN_CONTINUE 0x40 - -#define IS_SET_ISCSI_FLAG_LOGIN_TRANSIT(x) \ - (ISCSI_FLAG_LOGIN_TRANSIT & x) - -#define IS_SET_ISCSI_FLAG_LOGIN_CONTINUE(x) \ - (ISCSI_FLAG_LOGIN_CONTINUE & x) - -#define ISCSI_FLAG_LOGIN_CSG_MASK 0x0C -#define ISCSI_FLAG_LOGIN_NSG_MASK 0x03 - -#define GET_ISCSI_FLAG_LOGIN_CSG(x) \ - (ISCSI_FLAG_LOGIN_CSG_MASK & x) - -#define GET_ISCSI_FLAG_LOGIN_NSG(x) \ - (ISCSI_FLAG_LOGIN_NSG_MASK & x) - -#define ISCSI_LOGIN_STAGE_SECURITY 0x00 -#define ISCSI_LOGIN_STAGE_OPERATIONAL 0x01 -#define ISCSI_LOGIN_STAGE_FULL_FEATURE 0x03 - -#define IS_ISCSI_LOGIN_NSG_OPERATIONAL(x) \ - (GET_ISCSI_FLAG_LOGIN_NSG(x) == ISCSI_LOGIN_STAGE_OPERATIONAL) - -#define IS_ISCSI_LOGIN_NSG_FULL_FEATURE(x) \ - (GET_ISCSI_FLAG_LOGIN_NSG(x) == ISCSI_LOGIN_STAGE_FULL_FEATURE) - -#define ISCSI_AHSL_MASK 0xFF000000 -#define ISCSI_DSL_MASK 0x00FFFFFF - -#define ISCSI_INVALID_ITT 0xFFFFFFFF - -/* -------------------------------------------------------- - * opcode-specific dword fields, - * offsets are indices of: iser_pdu_bhs.dword.other[7] - * -------------------------------------------------------- - */ - -/* SCSI Command */ -#define ISCSI_CMD_FIELD_EDTL 0 -#define ISCSI_CMD_FIELD_CMD_SN 1 -#define ISCSI_CMD_FIELD_EXP_STAT_SN 2 - -/* Data-OUT */ -#define ISCSI_DATA_OUT_FIELD_TTT 0 -#define ISCSI_DATA_OUT_FIELD_EXP_STAT_SN 2 -#define ISCSI_DATA_OUT_FIELD_DATA_SN 4 -#define ISCSI_DATA_OUT_FIELD_OFFSET 5 - -/* Data-IN */ -#define ISCSI_DATA_IN_FIELD_TTT 0 -#define ISCSI_DATA_IN_FIELD_STAT_SN 1 -#define 
ISCSI_DATA_IN_FIELD_EXP_CMD_SN 2 -#define ISCSI_DATA_IN_FIELD_MAX_CMD_SN 3 -#define ISCSI_DATA_IN_FIELD_DATA_SN 4 -#define ISCSI_DATA_IN_FIELD_OFFSET 5 -#define ISCSI_DATA_IN_FIELD_RESID_CNT 6 - -/* R2T */ -#define ISCSI_R2T_FIELD_TTT 0 -#define ISCSI_R2T_FIELD_STAT_SN 1 -#define ISCSI_R2T_FIELD_EXP_CMD_SN 2 -#define ISCSI_R2T_FIELD_MAX_CMD_SN 3 -#define ISCSI_R2T_FIELD_R2T_SN 4 -#define ISCSI_R2T_FIELD_OFFSET 5 -#define ISCSI_R2T_FIELD_DESIRED_CNT 6 - -/*! -------------------------------------------------------------------- - [iser_send_pdu] - - Description: descriptor of a control PDU to be sent by Send_Control - -------------------------------------------------------------------- */ -struct iser_send_pdu { - union iser_pdu_bhs *p_bhs; /* BHS */ - union { - /* Command */ - struct { - /* data-out buffer meant for the entire - write/bidir command */ - struct iser_data_buf buf_out; - /* data-in buffer meant for the - entire read/bidir command */ - struct iser_data_buf buf_in; - /* although not data segment, ahs defined here */ - struct iser_data_buf ahs; - /* size of immediate unsolicited data for - write/bidir command, will be sent as - the data segment of the command PDU */ - unsigned int immediate_sz; - /* entire unsolicited data for write/bidir command, - will be sent in data segments of command - and data-out PDUs */ - unsigned int unsolicited_sz; - } command; - /* Response */ - struct { - /* the sense and response information for the command */ - struct iser_data_buf buf_status; - - } response; - /* Task Mgt Request */ - struct { - /* data-out for the entire write/bidir command, - valid only if Function=TASK REASSIGN */ - struct iser_data_buf buf_in; - /* data-out for the entire read/bidir command, - valid only if Function=TASK REASSIGN */ - struct iser_data_buf buf_out; - } task_mgt_req; - /* Data-In - not sent */ - struct { - /* buffer from which the data-in is to be sent */ - struct iser_data_buf src_buf; - } data_in; - /* R2T - not sent */ - struct { - /* 
buffer to which the data-out is to be received */ - struct iser_data_buf dst_buf; - } r2t; - - /* Other Transmitted PDUs - * ------------------------------------------------------------ - * Data-Out - disregarded, offset & DSL fields of BHS are used - * Asynchronous message - sense and iSCSI Event information - * Text Request - iSCSI Text Request - * Text Response - iSCSI Text Response - * Login Request - iSCSI Login Request - * Login Response - iSCSI Login Response - * Reject - reject desriptor - * Nop-Out - data accompanying Nop-Out PDU (iSCSI ping) - * Nop-In - data accompanying Nop-In PDU (iSCSI return ping) - * ------------------------------------------------------------ - */ - struct { - /* data segment of a generic tx PDU */ - struct iser_data_buf buf; - } tx; - /* Any Received PDU */ - } data; /* PDU data segment descriptor */ - -}; /* iser_send_pdu */ - -/*! -------------------------------------------------------------------- - [iser_recv_pdu] - - Description: received control PDU descriptor; passed to Control_Notify - -------------------------------------------------------------------- */ -struct iser_recv_pdu { - - union iser_pdu_bhs *p_bhs; /* BHS */ - - struct iser_data_buf rx_data; /* data segment */ -}; /* iser_recv_pdu */ - -#endif /* __ISER_PDU_H__ */ Index: include/iser_api.h =================================================================== --- include/iser_api.h (revision 3404) +++ include/iser_api.h (working copy) @@ -34,98 +34,294 @@ #ifndef __ISER_API_H__ #define __ISER_API_H__ -#include -#include +#include +#include -#include "iser_types.h" -#include "iser_pdu.h" +/** + * iser_data_buf - generic iSER buffer descriptor + */ -/* Functions exported by iSER datamover layer for iSCSI layer */ +enum iser_buf_type { + ISER_BUF_TYPE_SINGLE = 0, /* single contiguous buffer */ + ISER_BUF_TYPE_SCATTERLIST, /* struct scatterlist array */ + ISER_BUF_TYPES_NUM +}; -typedef iser_status - (*iser_conn_establish_func) (void *api_h, - void *iscsi_conn_h, - 
struct sockaddr_in * dst_addr, - struct sockaddr_in * src_addr); -typedef iser_status - (*iser_enable_datamover_func) (void *conn_h, void *tport_conn); +struct iser_data_buf { + void *p_buf; + unsigned int size; + enum iser_buf_type type; +}; -typedef iser_status(*iser_dealloc_conn_res_func) (void *conn_h); +#define AF_ISER 28 /* to be defined properly */ -typedef iser_status - (*iser_send_control_func) (void *conn_h, - struct iser_send_pdu * p_ctrl_pdu); +/** + * iser_pdu_bhs - iSCSI PDU Basic Header Segment (BHS) + * + * The purpose of defining BHS here is define the fields relevant + * to iSER operation. iSCSI layer should have its own definitions and + * may view the BHS passed to iSER just as a buffer. + * In order to avoid potential name collision, the structure has + * "iser" prefix, although strictly speaking it must be "iscsi". + */ +#define ISER_PDU_BHS_LENGTH 48 +union iser_pdu_bhs { -/* extention API function, iSCSI layer notifies iSER datamover layer - that an iSCSI control-type PDU previously passed through the Control_Notify - primitive is no longer in use and its memory may be released. 
-*/ -typedef iser_status - (*iser_release_control_func) (void *conn_h, - struct iser_recv_pdu * p_ctrl_pdu); + unsigned char buf[ISER_PDU_BHS_LENGTH]; -typedef iser_status(*iser_connectionerminate_func) (void *conn_h); + struct { + uint8_t opcode; /* optional bits + opcode */ + uint8_t flags; /* opcode-specific flags */ + uint8_t rsvd[2]; /* usually reserved, used in response */ + uint8_t ahs_length; /* AHS total length */ + uint8_t dlength[3]; /* Data segment length */ + uint8_t lun[8]; /* LUN */ + uint8_t itt[4]; /* Initiator Task Tag */ + uint8_t other[28]; /* opcode-specific */ + } byte; -typedef iser_status - (*iser_dealloc_task_res_func) (void *conn_h, unsigned int itt); + struct { + uint32_t op_flg_rsvd; /* opcode, flags, reserved */ + uint32_t length; /* AHS length, Data segment length */ + uint32_t lun[2]; /* LUN */ + uint32_t itt; /* Initiator Task Tag */ + uint32_t other[7]; /* opcode-specific */ + } dword; +}; /* iser_pdu_bhs */ -typedef iser_status - (*iser_notice_key_values_func) (void *conn_h, char *key, char *value); +/** + * opcode byte + */ -struct iser_conn_res { - void *api_h; /* IN */ - struct sockaddr_in *dst_addr; /* IN */ - struct sockaddr_in *src_addr; /* IN */ - unsigned int max_recv_pdu_sz; /* to be deleted!!! 
*/ - /* Maximal PDU size, receiving of which should be anticipated */ - unsigned int first_burst_length; /* IN */ - unsigned int max_recv_dsl; /* IN */ - unsigned int max_outstand_cmds; /* IN */ +/* Set when marked for immediate delivery */ +#define ISCSI_OP_IMMEDIATE 0x40 +/* Masks out the opcode itself */ +#define ISCSI_OPCODE_MASK 0x3F - unsigned int num_reg_buf; /* IN */ -}; /* iser_conn_res */ +/* Initiator only opcode values */ +#define ISCSI_OP_NOP_OUT 0x00 +#define ISCSI_OP_SCSI_CMD 0x01 +#define ISCSI_OP_TASK_MGT_REQ 0x02 +#define ISCSI_OP_LOGIN_REQ 0x03 +#define ISCSI_OP_TEXT_REQ 0x04 +#define ISCSI_OP_DATA_OUT 0x05 +#define ISCSI_OP_LOGOUT_REQ 0x06 +#define ISCSI_OP_SNACK_REQ 0x10 -typedef iser_status(*iser_alloc_conn_res_func) - (void *conn_h, struct iser_conn_res * conn_res); +/* Target only opcode values */ +#define ISCSI_OP_NOP_IN 0x20 +#define ISCSI_OP_SCSI_RSP 0x21 +#define ISCSI_OP_TASK_MGT_RSP 0x22 +#define ISCSI_OP_LOGIN_RSP 0x23 +#define ISCSI_OP_TEXT_RSP 0x24 +#define ISCSI_OP_DATA_IN 0x25 +#define ISCSI_OP_LOGOUT_RSP 0x26 +#define ISCSI_OP_R2T 0x31 +#define ISCSI_OP_ASYNC 0x32 +#define ISCSI_OP_REJECT 0x3f -struct iser_api { - iser_conn_establish_func conn_establish; - iser_send_control_func send_control; - iser_release_control_func release_control; - iser_alloc_conn_res_func alloc_conn_res; - iser_notice_key_values_func notice_key_values; - iser_enable_datamover_func enable_datamover; - iser_connectionerminate_func conn_terminate; - iser_dealloc_conn_res_func dealloc_conn_res; - iser_dealloc_task_res_func dealloc_task_res; +/** + * flags byte + */ + +/* When set indicates the final (or only) PDU of a sequence */ +#define ISCSI_FLAG_FINAL 0x80 +/* When set indicates that data is to be read */ +#define ISCSI_FLAG_READ_CMD 0x40 +/* When set indicates that data is to be written */ +#define ISCSI_FLAG_WRITE_CMD 0x20 + +#define IS_SET_ISCSI_FLAG_FINAL(x) \ + (ISCSI_FLAG_FINAL & x) + +#define IS_SET_ISCSI_FLAG_READ_CMD(x) \ + (ISCSI_FLAG_READ_CMD & 
x) + +#define IS_SET_ISCSI_FLAG_WRITE_CMD(x) \ + (ISCSI_FLAG_WRITE_CMD & x) + +/* Login specific */ +#define ISCSI_FLAG_LOGIN_TRANSIT 0x80 +#define ISCSI_FLAG_LOGIN_CONTINUE 0x40 + +#define IS_SET_ISCSI_FLAG_LOGIN_TRANSIT(x) \ + (ISCSI_FLAG_LOGIN_TRANSIT & x) + +#define IS_SET_ISCSI_FLAG_LOGIN_CONTINUE(x) \ + (ISCSI_FLAG_LOGIN_CONTINUE & x) + +#define ISCSI_FLAG_LOGIN_CSG_MASK 0x0C +#define ISCSI_FLAG_LOGIN_NSG_MASK 0x03 + +#define GET_ISCSI_FLAG_LOGIN_CSG(x) \ + (ISCSI_FLAG_LOGIN_CSG_MASK & x) + +#define GET_ISCSI_FLAG_LOGIN_NSG(x) \ + (ISCSI_FLAG_LOGIN_NSG_MASK & x) + +#define ISCSI_LOGIN_STAGE_SECURITY 0x00 +#define ISCSI_LOGIN_STAGE_OPERATIONAL 0x01 +#define ISCSI_LOGIN_STAGE_FULL_FEATURE 0x03 + +#define IS_ISCSI_LOGIN_NSG_OPERATIONAL(x) \ + (GET_ISCSI_FLAG_LOGIN_NSG(x) == ISCSI_LOGIN_STAGE_OPERATIONAL) + +#define IS_ISCSI_LOGIN_NSG_FULL_FEATURE(x) \ + (GET_ISCSI_FLAG_LOGIN_NSG(x) == ISCSI_LOGIN_STAGE_FULL_FEATURE) + +#define ISCSI_AHSL_MASK 0xFF000000 +#define ISCSI_DSL_MASK 0x00FFFFFF + +#define ISCSI_INVALID_ITT 0xFFFFFFFF + +/** + * opcode-specific dword fields, + * offsets are indices of: iser_pdu_bhs.dword.other + */ + +/* SCSI Command */ +#define ISCSI_CMD_F_EDTL 0 +#define ISCSI_CMD_F_CMDSN 1 +#define ISCSI_CMD_F_EXP_STATSN 2 + +/* Data-OUT */ +#define ISCSI_DOUT_F_TTT 0 +#define ISCSI_DOUT_F_EXP_STATSN 2 +#define ISCSI_DOUT_F_DATASN 4 +#define ISCSI_DOUT_F_OFFSET 5 + +/** + * iser_send_pdu - descriptor of a control PDU to be sent by send_control + */ +struct iser_send_pdu { + union iser_pdu_bhs *p_bhs; /* BHS */ + union { + /* Command */ + struct { + struct iser_data_buf buf_out; + struct iser_data_buf buf_in; + struct iser_data_buf ahs; + /* size of immediate unsolicited data for + write/bidir command, will be sent as + the data segment of the command PDU */ + unsigned int immediate_sz; + /* entire unsolicited data for write/bidir command, + will be sent in data segments of command + and data-out PDUs */ + unsigned int unsolicited_sz; + } command; + 
/* Task Mgt Request */ + struct { + /* data-out for the entire write/bidir command, + valid only if Function=TASK REASSIGN */ + struct iser_data_buf buf_in; + /* data-out for the entire read/bidir command, + valid only if Function=TASK REASSIGN */ + struct iser_data_buf buf_out; + } task_mgt_req; + /* Other Transmitted PDUs: + * Data-Out - disregarded, offset & DSL fields of BHS are used + * Text Request - iSCSI Text Request + * Login Request - iSCSI Login Request + * Nop-Out - data accompanying Nop-Out PDU (iSCSI ping) + */ + struct { + struct iser_data_buf buf; + } tx; + } data; +}; /* iser_send_pdu */ + +/** + * iser_recv_pdu - received control PDU descriptor; passed to control_notify + */ +struct iser_recv_pdu { + union iser_pdu_bhs *p_bhs; /* BHS */ + struct iser_data_buf rx_data; /* data segment */ +}; /* iser_recv_pdu */ + +/** + * Functions exported by iSER layer for iSCSI layer + */ + +struct iser_conn_res { + unsigned int first_burst_length; + unsigned int max_recv_dsl; + unsigned int max_outstand_cmds; }; -/* Functions exported by iSCSI layer for iSER datamover layer */ +/* bind connection previosuly established thru a socket */ +typedef int +(*iser_conn_bind_func) (void *iscsi_conn_h, /* IN */ + struct socket *sock, /* IN */ + void **iser_conn_h); /* OUT */ -typedef iser_status(*iser_conn_establish_notify_func) (void *iscsi_conn_h, - int established); +/* pass a finalized key-value pair */ +typedef int +(*iser_notice_key_values_func) (void *iser_conn_h, + char *key, char *value); -typedef void (*iser_control_notify_func) (void *conn_h, - struct iser_recv_pdu * p_ctrl_pdu); +/* allocate connection resources and enable datamover rdma functionality */ +typedef int +(*iser_conn_enable_rdma_func) (void *iser_conn_h, + struct iser_conn_res *conn_res); -typedef void (*iser_connectionerm_notify_func) (void *conn_h); +/* send iSCSI control-type PDU, also appropriate for Login phase */ +typedef int +(*iser_send_control_func) (void *iser_conn_h, + struct 
iser_send_pdu *p_ctrl_pdu); -/* Callbacks registration */ +/* deallocate connection resources */ +typedef int +(*iser_dealloc_conn_res_func) (void *iser_conn_h); +/* signal that iSCSI control-type PDU previously passed through + control_notify() is no longer in use, its resources may be released */ +typedef int +(*iser_release_control_func) (void *iser_conn_h, + struct iser_recv_pdu *p_ctrl_pdu); + +/* terminate a connection */ +typedef int +(*iser_conn_terminate_func) (void *iser_conn_h); + +/* release all iSER task resources */ +typedef int +(*iser_dealloc_task_res_func) (void *iser_conn_h, unsigned int itt); + +struct iser_api { + iser_conn_bind_func conn_bind; + iser_notice_key_values_func notice_key_values; + iser_conn_enable_rdma_func conn_enable_rdma; + iser_send_control_func send_control; + iser_release_control_func release_control; + iser_conn_terminate_func conn_terminate; + iser_dealloc_conn_res_func dealloc_conn_res; + iser_dealloc_task_res_func dealloc_task_res; +}; + +/** + * Functions exported by iSCSI layer for iSER layer + */ + +typedef void +(*iser_control_notify_func) (void *iscsi_conn_h, + struct iser_recv_pdu *p_ctrl_pdu); +typedef void +(*iser_conn_term_notify_func) (void *iscsi_conn_h); + struct iser_api_cb { - iser_conn_establish_notify_func conn_establish_notify; - iser_connectionerm_notify_func conn_terminate_notify; iser_control_notify_func control_notify; + iser_conn_term_notify_func conn_term_notify; }; -#define ISER_INVALID_API_H ((void *)~0x0) +/** + * iSER API registration + */ -iser_status -iser_api_register(char *provider_name, - struct iser_api *api, - struct iser_api_cb *api_cb, void **p_api_h); +int iser_api_register(char *provider_name, + struct iser_api *api, /* OUT */ + struct iser_api_cb *api_cb); /* IN */ -/* Unregister API previously registered for this entity */ -iser_status iser_api_unregister(void *api_h); /* IN */ +int iser_api_unregister(void); -#endif /* __ISER_API_H__ */ +#endif /* __ISER_API_H__ */ Index: 
include/iser_types.h =================================================================== --- include/iser_types.h (revision 3404) +++ include/iser_types.h (working copy) @@ -1,63 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -#ifndef __ISER_TYPES_H__ -#define __ISER_TYPES_H__ - -typedef enum { - ISER_SUCCESS = 0, /* operation successful */ - ISER_FAILURE, /* operation failed */ - ISER_INVALID_CONN, /* invalid conn handle supplied */ - ISER_INVALID_ITT, /* invalid Initiator Task Tag supplied */ - ISER_ILLEGAL_PARAM, /* illegal value of a parameter, - e.g. 
NULL pointer */ - ISER_UNREGISTERED, /* this function has not been registered */ - ISER_INVALID_KEY, /* Unsupported or invalid key passed to - Notice_Key_Value */ - ISER_INVALID_VALUE, /* Unsupported or invalid value passed to - Notice_Key_Value */ -} iser_status; - -enum iser_buf_type { - ISER_BUF_TYPE_SINGLE = 0, /* single contiguous buffer */ - ISER_BUF_TYPE_SCATTERLIST, /* struct scatterlist array */ - ISER_BUF_TYPES_NUM -}; - -struct iser_data_buf { - void *p_buf; - unsigned int size; - enum iser_buf_type type; -}; - -#endif /* __ISER_TYPES_H__ */ Index: iser.h =================================================================== --- iser.h (revision 3404) +++ iser.h (working copy) @@ -65,7 +65,6 @@ /* Various size limits */ #define ISER_LOGIN_PHASE_PDU_DATA_LEN (8*1024) /* 8K */ #define ISER_MIN_RECV_DSL (8*1024) /* 8K */ -#define ISER_MAX_CMD_SIZE (2*1024*1024) /* 2M */ #define ISER_MAX_FIRST_BURST (128*1024) /* 128K */ #define ISER_MAX_CTRLS_PER_CMD(first_burst,recv_dsl,imm) \ @@ -226,7 +225,6 @@ struct iser_connection { void *iscsi_conn_h; /* iSCSI-supplied handle */ struct iser_adaptor *p_adaptor; /* IA context */ - struct iser_entity *p_entity; /* iSCSI entity */ atomic_t state; /* Connection state */ spinlock_t conn_lock; /* Guards the conn and related structures */ @@ -239,37 +237,21 @@ hash buckets */ struct list_head adaptor_list; /* Entry in the adaptor's list of conns */ - struct list_head entity_list; /* Entry in the entity - list of conns */ - struct dat_ep *ep_handle; /* End-Point of the conn */ - struct iser_buf_pool *post_recv_pool; /* Pool of post_recv - buffers */ - struct iser_buf_pool *send_data_pool; /* Pool of send data - buffers */ - atomic_t post_recv_buf_count; /* Counter of posted - recv buffers */ - atomic_t post_send_buf_count; /* Counter of posted - send buffers */ - struct list_head ctrl_notify_dto_list; /* controls passed - to ctrl_notify() */ - struct iser_op_params param; /* Session operational parameters */ - unsigned int 
max_outstand_cmds; /* current max allowed - commands */ - unsigned int initial_post_recv_bufs_num; /* posted at - the beginning */ - /* allocd through a buffer pool */ + struct dat_ep *ep_handle; + struct iser_buf_pool *post_recv_pool; + struct iser_buf_pool *send_data_pool; + atomic_t post_recv_buf_count; + atomic_t post_send_buf_count; + struct list_head ctrl_notify_dto_list; /* controls pushed to iSCSI */ + struct iser_op_params param; /* operational parameters */ + unsigned int max_outstand_cmds; + unsigned int initial_post_recv_bufs_num; unsigned int alloc_post_recv_bufs_num; - int disc_evt_flag; /* Set if EVENT_DISCONNECTED - is received from DAPL */ - wait_queue_head_t disconnect_wait_q; /* for waiting for a - disconn completion */ - struct iser_buf_pool *spare_post_recv_pool; /* Pool of post_recv - buffers */ - struct iser_buf_pool *spare_send_data_pool; /* Pool of send_data - buffers */ - char name[ISER_OBJECT_NAME_SIZE]; /* Conn. name - - e.g. ip addr */ - + int disc_evt_flag; /* Set if disconnect event recvd */ + wait_queue_head_t disconnect_wait_q; + struct iser_buf_pool *spare_post_recv_pool; + struct iser_buf_pool *spare_send_data_pool; + char name[ISER_OBJECT_NAME_SIZE]; }; struct iser_regd_buf { @@ -388,19 +370,8 @@ }; #define ISER_MAX_ADAPTORS 1 -#define ISER_MAX_ENTITIES 1 #define ISER_MAX_CONN 4 -struct iser_entity { - int registered; - struct iser_api_cb api_cb; - /* List of all iSER conns on this adaptor */ - spinlock_t conn_lock; - struct list_head conn_list; - /* name of the iSCSI layer entity provider */ - char provider_name[ISER_OBJECT_NAME_SIZE]; -}; - struct iser_adaptor { struct dat_ia *ia_handle; /* Interface Adaptor */ struct dat_pz *pz_handle; /* Protection Zone */ @@ -445,23 +416,17 @@ * providers use this iSER datamover.
*/ struct iser_global { - /* IA context, Future: array of adaptors, by name */ unsigned int num_adaptors; struct iser_adaptor adaptor[ISER_MAX_ADAPTORS]; - /* Ext API callbacks registration */ - unsigned int num_entities; - struct iser_entity entity[ISER_MAX_ENTITIES]; - /* Memory pools */ - kmem_cache_t *conn_mem_cache; /* slab for iser_connection */ + kmem_cache_t *task_mem_cache; /* slab for iser_task */ kmem_cache_t *recv_dto_mem_cache; /* slab for iser_dto */ kmem_cache_t *send_dto_mem_cache; /* slab for iser_dto */ kmem_cache_t *regd_buf_mem_cache; /* for iser_regd_buf */ - /* Hash tables */ struct hash_table task_hash; /* hash table for tasks */ - struct hash_table conn_hash; /* conns */ - + struct iser_api_cb api_cb; + char provider_name[ISER_OBJECT_NAME_SIZE]; }; /* iser_global */ extern struct iser_global ig; Index: iser_dto.c =================================================================== --- iser_dto.c (revision 3404) +++ iser_dto.c (working copy) @@ -144,7 +144,7 @@ cur_buf = 0; do { - contig = iser_iovec_contig_length(p_mem, cur_buf, + contig = iser_data_contig_length(p_mem, cur_buf, &contig_start_addr, &contig_size); /* Try to find the buffer in the pre-registered cache */ @@ -163,16 +163,16 @@ /* If got here using a pre-registered buffer is not an option, register memory */ /* Determine an aligned stretch */ - aligned_len = iser_iovec_aligned_length(p_mem, cur_buf); + aligned_len = iser_data_aligned_length(p_mem, cur_buf); if (p_mem->type != ISER_BUF_TYPE_SINGLE) { - p_phys_vec = iser_alloc_phys_mem(p_mem, cur_buf, + p_phys_vec = iser_alloc_phys_desc(p_mem, cur_buf, aligned_len); - iser_convert_mem_to_phys(p_mem, p_phys_vec, cur_buf, + iser_data_convert_to_phys(p_mem, p_phys_vec, cur_buf, aligned_len); } else { p_phys_vec = - iser_alloc_phys_mem(p_mem, 0, p_mem->size); - iser_convert_mem_to_phys(p_mem, p_phys_vec, 0, 0); + iser_alloc_phys_desc(p_mem, 0, p_mem->size); + iser_data_convert_to_phys(p_mem, p_phys_vec, 0, 0); } p_regd_buf = (struct 
iser_regd_buf *) @@ -182,7 +182,7 @@ printk(KERN_ERR PFX "Failed to alloc registered buffer\n"); ret_val = -1; - iser_free_phys_mem(p_phys_vec); + iser_free_phys_desc(p_phys_vec); goto dto_add_local_memory_exit; } memset(p_regd_buf, 0, sizeof(struct iser_regd_buf)); @@ -198,7 +198,7 @@ "Failed to register %d phys entries " "starting from %d\n", aligned_len, cur_buf); /* ToDo: deregister all previously allocd memory */ - iser_free_phys_mem(p_phys_vec); + iser_free_phys_desc(p_phys_vec); goto dto_add_local_memory_exit; } @@ -218,11 +218,10 @@ "to add %d phys.entries\n", p_regd_buf, p_dto, p_dto->regd_vector_len, p_mem->size); - iser_free_phys_mem(p_phys_vec); + iser_free_phys_desc(p_phys_vec); } while (cur_buf < p_mem->size); - dto_add_local_memory_exit: - + dto_add_local_memory_exit: return ret_val; } /* iser_dto_add_local_memory */ Index: iser_pdu.c =================================================================== --- iser_pdu.c (revision 3404) +++ iser_pdu.c (working copy) @@ -1,1274 +0,0 @@ -/* - * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. 
- * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - - -#include -#include -#include - -#include "iser.h" -#include "iser_bhs.h" -#include "iser_global.h" -#include "iser_utils.h" -#include "iser_trace.h" - -/* size of the buffer for trace string accumulation */ -#define PRINT_BUF_LEN 1024 -/* max size of data printed in a single block */ -#define ISER_PDU_PRINT_MAX_DATA_BLOCK_LEN 256 - -/* current position witin the print buffer */ -static int print_buf_pos = 0; -/* set if the buffer was truncated to avoid overflow */ -static int print_buf_trunc = 0; -/* print buffer */ -static char print_buf[PRINT_BUF_LEN]; - -#define PRINT_BUFF_CAT( fmt , args... ) \ - do { \ - if( !print_buf_trunc && print_buf_pos + 32 < PRINT_BUF_LEN ) \ - print_buf_pos += sprintf(print_buf+print_buf_pos, fmt , ## args); \ - else if( !print_buf_trunc ) { \ - print_buf_pos += \ - sprintf(print_buf+print_buf_pos, " ... 
(deleted)\n"); \ - print_buf_trunc = 1; \ - } \ - } while(0) - -#define PRINT_BUFF_OUT() \ - do { \ - sprintf(print_buf+print_buf_pos, "\n"); \ - print_buf_pos = 0; print_buf_trunc = 0; \ - ISER_DEBUG_LOG(print_buf); \ - } while(0) - -union u64_view { - __u64 x; - __u8 c[8]; -}; - -static char * -string_llx( __u64 x ) -{ - int i; - static char str[20]; - static char convert[] = "0123456789abcdef"; - char *ptr; - __u32 c; - union u64_view tmp; - - tmp.x = x; - ptr = str; - *ptr++ = '0'; - *ptr++ = 'x'; - for( i = 7; i >= 0; i-- ) - { - c = tmp.c[i]; - *ptr++ = convert[c>>4]; - *ptr++ = convert[c&0xf]; - } - *ptr = '\0'; - return str; -} - - -static void -print_rsvd_u8( int n, __u8 rsvd ) -{ - if( rsvd != 0 ) - PRINT_BUFF_CAT("rsvd%d: 0x%.2x ", n, rsvd); -} - - -static void -print_rsvd_u16( int n, __u16 rsvd ) -{ - if( rsvd != 0 ) - PRINT_BUFF_CAT("rsvd%d: 0x%.4x ", n, rsvd); -} - - -static void -print_rsvd_u32( int n, __u32 rsvd ) -{ - if( rsvd != 0 ) - PRINT_BUFF_CAT("rsvd%d: 0x%.8x ", n, rsvd); -} - - -static void -print_rsvd_u64( int n, __u64 rsvd ) -{ - if( rsvd != 0ll ) - PRINT_BUFF_CAT("rsvd%d: %s ", n, string_llx(rsvd)); -} - - -static void -print_opcode( __u8 opcode ) -{ - PRINT_BUFF_CAT("(0x%.2X%s) ", - opcode&0x3f, ((opcode&0x40)!=0 ? 
"-Imm" : "") ); -} - - -static void -print_flags( __u8 flags ) -{ - PRINT_BUFF_CAT("flags: 0x%.2x ", flags); -} - - -static void -print_version( char *which, __u8 version ) -{ - PRINT_BUFF_CAT("Version%s: 0x%.2x ", which, version); -} - - -static void -print_response( __u8 response ) -{ - if ( response == 0x00 ) { - PRINT_BUFF_CAT("Resp: Completed(0) "); - } - else if ( response == 0x01 ) { - PRINT_BUFF_CAT("Resp: Failure(1) "); - } - else { - PRINT_BUFF_CAT("Resp: Vendor(%d) ", response); - } -} - - -static void -print_status( __u8 status ) -{ - if ( status == 0x00 ) { - PRINT_BUFF_CAT("Status: GOOD(0) "); - } - else if ( status == 0x02 ) { - PRINT_BUFF_CAT("Status: CHCK(2) "); - } - else if ( status == 0x08 ) { - PRINT_BUFF_CAT("Status: BUSY(2) "); - } - else { - PRINT_BUFF_CAT("Status: 0x%.2x ", status); - } -} - - -static void -print_lun( __u64 lun ) -{ - PRINT_BUFF_CAT("LUN: %s ", string_llx(lun)); -} - - -static void -print_isid_tsih( __u8 isid[6], __u16 tsih ) -{ - PRINT_BUFF_CAT("ISID: 0x%.2x %.2x %.2x %.2x %.2x %.2x ", - isid[0], isid[1], isid[2], isid[3], isid[4], isid[5]); - PRINT_BUFF_CAT("TSIH: %u ", be16_to_cpu(tsih)); -} - - -static void -print_dsl( __u32 length ) -{ - __u32 be32_length = be32_to_cpu(length); - - PRINT_BUFF_CAT("DSL: 0x%X(%u) ", be32_length, be32_length); -} - -static void -print_itt( __u32 init_task_tag ) -{ - if( init_task_tag == 0xffffffff ) - PRINT_BUFF_CAT("ITT: 0x%08X ", init_task_tag); - else { - __u32 be32_init_task_tag = be32_to_cpu(init_task_tag); - PRINT_BUFF_CAT("ITT: 0x%X(%u) ", - be32_init_task_tag, be32_init_task_tag); - } -} - - -static void -print_ttt( __u32 target_xfer_tag ) -{ - if( target_xfer_tag == 0xffffffff ) - PRINT_BUFF_CAT("TTT: 0x%08x ", target_xfer_tag); - else - PRINT_BUFF_CAT("TTT: %u ", be32_to_cpu(target_xfer_tag)); -} - - -static void -print_cid( __u16 cid ) -{ - PRINT_BUFF_CAT("CID: %u ", be16_to_cpu(cid)); -} - - -static void -print_expstatsn( __u32 exp_stat_sn ) -{ - if( exp_stat_sn != 0 ) - 
PRINT_BUFF_CAT("ExpStatSN: %u ", be32_to_cpu(exp_stat_sn)); -} - - -static void -print_cmdsn_expstatsn( __u32 cmd_sn, __u32 exp_stat_sn ) -{ - PRINT_BUFF_CAT("CmdSN: %u ", be32_to_cpu(cmd_sn)); - print_expstatsn(exp_stat_sn); -} - - -static void -print_statsn_exp_max( __u32 stat_sn, __u32 exp_cmd_sn, __u32 max_cmd_sn ) -{ - if( stat_sn != 0 ) - PRINT_BUFF_CAT("StatSN: %u ", be32_to_cpu(stat_sn)); - PRINT_BUFF_CAT("ExpCmdSN: %u ", be32_to_cpu(exp_cmd_sn)); - PRINT_BUFF_CAT("MaxCmdSN: %u ", be32_to_cpu(max_cmd_sn)); -} - - -static void -print_residual( __u32 resid ) -{ - if( resid != 0 ) - PRINT_BUFF_CAT("ResidualCount: %u ", be32_to_cpu(resid)); -} - - -static void -print_datasn( __u32 data_sn ) -{ - PRINT_BUFF_CAT("DataSN: %u ", be32_to_cpu(data_sn)); -} - -static void -print_offset( __u32 offset, __u32 length ) -{ - __u32 be32_offset = be32_to_cpu(offset); - __u32 be32_length = be32_to_cpu(length); - - PRINT_BUFF_CAT("Buf: %u-%u ", - be32_offset, be32_offset+be32_length); -} - - -static void -print_rtt( __u32 ref_task_tag ) -{ - if( ref_task_tag != 0 ) - PRINT_BUFF_CAT("RTT: 0x%.8x ", be32_to_cpu(ref_task_tag)); -} - - -static void -print_exp_data_sn( __u32 exp_data_sn ) -{ - if( exp_data_sn != 0 ) - PRINT_BUFF_CAT("ExpDataSN: %u ", be32_to_cpu(exp_data_sn)); -} - - -static void -print_begrun( __u32 begrun ) -{ - PRINT_BUFF_CAT("BegRun: %u ", be32_to_cpu(begrun)); -} - - -static void -print_runlen( __u32 runlen ) -{ - PRINT_BUFF_CAT("RunLength: %u ", be32_to_cpu(runlen)); -} - -/* command opcodes which don't appear in */ -#define EXTENDED_COPY 0x83 -#define MOVE_MEDIUM_ATTACHED 0xa7 -#define READ_ELEMENT_STATUS_ATTACHED 0xb4 -#define RECEIVE_COPY_RESULTS 0x84 -#define REPORT_DEVICE_ID 0xa3 -#define REPORT_LUNS 0xa0 -#define SET_DEVICE_ID 0xa4 - -static void -print_scsi_cdb_op( __u8 opcode ) -{ - char *result = "UNKNOWN"; - - /* opcode symbols defined in */ - /* this table includes only those symbols we have actually seen - being sent to/from a vendor */ - switch( 
opcode ) - { - case TEST_UNIT_READY: /* 0x00 */ - result = "TEST_UNIT_READY"; - break; - case REZERO_UNIT: /* 0x01 */ - result = "REWIND"; - break; - case REQUEST_SENSE: /* 0x03 */ - result = "REQUEST_SENSE"; - break; - case READ_BLOCK_LIMITS: /* 0x05 */ - result = "READ_BLOCK_LIMITS"; - break; - case READ_6: /* 0x08 */ - result = "READ_6"; - break; - case WRITE_6: /* 0x0a */ - result = "WRITE_6"; - break; - case WRITE_FILEMARKS: /* 0x10 */ - result = "WRITE_FILEMARKS"; - break; - case INQUIRY: /* 0x12 */ - result = "INQUIRY"; - break; - case MODE_SENSE: /* 0x1a */ - result = "MODE_SENSE"; - break; - case READ_CAPACITY: /* 0x25 */ - result = "READ_CAPACITY"; - break; - case READ_10: /* 0x28 */ - result = "READ_10"; - break; - case WRITE_10: /* 0x2a */ - result = "WRITE_10"; - break; - case READ_12: /* 0xa8 */ - result = "READ_12"; - break; - case WRITE_12: /* 0xaa */ - result = "WRITE_12"; - break; - case FORMAT_UNIT: /* 0x04 */ - result = "FORMAT_UNIT"; - break; - case REASSIGN_BLOCKS: /* 0x07 */ - result = "REASSIGN_BLOCKS"; - break; - case SEEK_6: /* 0x0b */ - result = "SEEK_6"; - break; - case READ_REVERSE: /* 0x0f */ - result = "READ_REVERSE"; - break; - case SPACE: /* 0x11 */ - result = "SPACE"; - break; - case RECOVER_BUFFERED_DATA: /* 0x14 */ - result = "RECOVER_BUFFERED_DATA"; - break; - case MODE_SELECT: /* 0x15 */ - result = "MODE_SELECT"; - break; - case RESERVE: /* 0x16 */ - result = "RESERVE"; - break; - case RELEASE: /* 0x17 */ - result = "RELEASE"; - break; - case COPY: /* 0x18 */ - result = "COPY"; - break; - case ERASE: /* 0x19 */ - result = "ERASE"; - break; - case START_STOP: /* 0x1b */ - result = "START_STOP"; - break; - case RECEIVE_DIAGNOSTIC: /* 0x1c */ - result = "RECEIVE_DIAGNOSTIC"; - break; - case SEND_DIAGNOSTIC: /* 0x1d */ - result = "SEND_DIAGNOSTIC"; - break; - case ALLOW_MEDIUM_REMOVAL: /* 0x1e */ - result = "ALLOW_MEDIUM_REMOVAL"; - break; - case SET_WINDOW: /* 0x24 */ - result = "SET_WINDOW"; - break; - case SEEK_10: /* 0x2b */ - 
result = "SEEK_10"; - break; - case WRITE_VERIFY: /* 0x2e */ - result = "WRITE_VERIFY"; - break; - case VERIFY: /* 0x2f */ - result = "VERIFY"; - break; - case SEARCH_HIGH: /* 0x30 */ - result = "SEARCH_HIGH"; - break; - case SEARCH_EQUAL: /* 0x31 */ - result = "SEARCH_EQUAL"; - break; - case SEARCH_LOW: /* 0x32 */ - result = "SEARCH_LOW"; - break; - case SET_LIMITS: /* 0x33 */ - result = "SET_LIMITS"; - break; - case PRE_FETCH: /* 0x34 */ - result = "PRE_FETCH"; - break; - case SYNCHRONIZE_CACHE: /* 0x35 */ - result = "SYNCHRONIZE_CACHE"; - break; - case LOCK_UNLOCK_CACHE: /* 0x36 */ - result = "LOCK_UNLOCK_CACHE"; - break; - case READ_DEFECT_DATA: /* 0x37 */ - result = "READ_DEFECT_DATA"; - break; - case MEDIUM_SCAN: /* 0x38 */ - result = "MEDIUM_SCAN"; - break; - case COMPARE: /* 0x39 */ - result = "COMPARE"; - break; - case COPY_VERIFY: /* 0x3a */ - result = "COPY_VERIFY"; - break; - case WRITE_BUFFER: /* 0x3b */ - result = "WRITE_BUFFER"; - break; - case READ_BUFFER: /* 0x3c */ - result = "READ_BUFFER"; - break; - case UPDATE_BLOCK: /* 0x3d */ - result = "UPDATE_BLOCK"; - break; - case READ_LONG: /* 0x3e */ - result = "READ_LONG"; - break; - case WRITE_LONG: /* 0x3f */ - result = "WRITE_LONG"; - break; - case CHANGE_DEFINITION: /* 0x40 */ - result = "CHANGE_DEFINITION"; - break; - case WRITE_SAME: /* 0x41 */ - result = "WRITE_SAME"; - break; - case READ_TOC: /* 0x43 */ - result = "READ_TOC"; - break; - case LOG_SELECT: /* 0x4c */ - result = "LOG_SELECT"; - break; - case LOG_SENSE: /* 0x4d */ - result = "LOG_SENSE"; - break; - case MODE_SELECT_10: /* 0x55 */ - result = "MODE_SELECT_10"; - break; - case RESERVE_10: /* 0x56 */ - result = "RESERVE_10"; - break; - case RELEASE_10: /* 0x57 */ - result = "RELEASE_10"; - break; - case MODE_SENSE_10: /* 0x5a */ - result = "MODE_SENSE_10"; - break; - case PERSISTENT_RESERVE_IN: /* 0x5e */ - result = "PERSISTENT_RESERVE_IN"; - break; - case PERSISTENT_RESERVE_OUT: /* 0x5f */ - result = "PERSISTENT_RESERVE_OUT"; - break; 
- case EXTENDED_COPY: /* 0x83 */ - result = "EXTENDED_COPY"; - break; - case RECEIVE_COPY_RESULTS: /* 0x84 */ - result = "RECEIVE_COPY_RESULTS"; - break; - case REPORT_DEVICE_ID: /* 0xa3 */ - result = "REPORT_DEVICE_ID"; - break; - case REPORT_LUNS: /* 0xa0 */ - result = "REPORT_LUNS"; - break; - case SET_DEVICE_ID: /* 0xa4 */ - result = "SET_DEVICE_ID"; - break; - case MOVE_MEDIUM: /* 0xa5 */ - result = "MOVE_MEDIUM"; - break; - case MOVE_MEDIUM_ATTACHED: /* 0xa7 */ - result = "MOVE_MEDIUM_ATTACHED"; - break; - case WRITE_VERIFY_12: /* 0xae */ - result = "WRITE_VERIFY_12"; - break; - case SEARCH_HIGH_12: /* 0xb0 */ - result = "SEARCH_HIGH_12"; - break; - case SEARCH_EQUAL_12: /* 0xb1 */ - result = "SEARCH_EQUAL_12"; - break; - case SEARCH_LOW_12: /* 0xb2 */ - result = "SEARCH_LOW_12"; - break; - case READ_ELEMENT_STATUS_ATTACHED: /* 0xb4 */ - result = "READ_ELEMENT_STATUS_ATTACHED"; - break; - case READ_ELEMENT_STATUS: /* 0xb8 */ - result = "READ_ELEMENT_STATUS"; - break; - case SEND_VOLUME_TAG: /* 0xb6 */ - result = "SEND_VOLUME_TAG"; - break; - case WRITE_LONG_2: /* 0xea */ - result = "WRITE_LONG_2"; - break; - } /* switch */ - - PRINT_BUFF_CAT( "CDB-OP: %s ", result ); -} - -static void -print_init_scsi_cmnd( char * buffer ) -{ - struct iscsi_init_scsi_cmnd *cmd = (struct iscsi_init_scsi_cmnd *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_rsvd_u16(1, cmd->rsvd1); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - PRINT_BUFF_CAT("EDTL: %u ", be32_to_cpu(cmd->xfer_len)); - print_cmdsn_expstatsn(cmd->cmd_sn ,cmd->exp_stat_sn); - print_scsi_cdb_op(cmd->cdb[0]); - PRINT_BUFF_CAT("CDB-DATA: 0x%.2x%.2x%.2x%.2x %.2x%.2x%.2x%.2x " - "%.2x%.2x%.2x%.2x %.2x%.2x%.2x%.2x ", - cmd->cdb[0], cmd->cdb[1], cmd->cdb[2], cmd->cdb[3], - cmd->cdb[4], cmd->cdb[5], cmd->cdb[6], cmd->cdb[7], - cmd->cdb[8], cmd->cdb[9], cmd->cdb[10], cmd->cdb[11], - cmd->cdb[12], cmd->cdb[13], cmd->cdb[14], cmd->cdb[15]); -} - - -static void 
-print_targ_scsi_rsp( char * buffer ) -{ - struct iscsi_targ_scsi_rsp *cmd = (struct iscsi_targ_scsi_rsp *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_response(cmd->response); - print_status(cmd->status); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_ttt(cmd->target_xfer_tag); - print_statsn_exp_max( cmd->stat_sn , cmd->exp_cmd_sn, cmd->max_cmd_sn); - print_exp_data_sn(cmd->exp_data_sn); - if( cmd->bidi_resid != 0 ) - PRINT_BUFF_CAT("BidirResidualCount: %u ", be32_to_cpu(cmd->bidi_resid)); - print_residual(cmd->resid); -} - - -static void -print_init_text_cmnd( char * buffer ) -{ - struct iscsi_init_text_cmnd *cmd = (struct iscsi_init_text_cmnd *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_rsvd_u16(2, cmd->rsvd2); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_ttt(cmd->target_xfer_tag); - print_cmdsn_expstatsn(cmd->cmd_sn ,cmd->exp_stat_sn); - print_rsvd_u64(4, cmd->rsvd4); - print_rsvd_u64(5, cmd->rsvd5); -} - - -static void -print_targ_text_rsp( char * buffer ) -{ - struct iscsi_targ_text_rsp *cmd = (struct iscsi_targ_text_rsp *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_rsvd_u16(2, cmd->rsvd2); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_ttt(cmd->target_xfer_tag); - print_statsn_exp_max( cmd->stat_sn , cmd->exp_cmd_sn, cmd->max_cmd_sn); - print_rsvd_u32(4, cmd->rsvd4); - print_rsvd_u64(5, cmd->rsvd5); -} - - -static void -print_init_login_cmnd( char * buffer ) -{ - struct iscsi_init_login_cmnd *cmd = (struct iscsi_init_login_cmnd *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_version("Max", cmd->version_max); - print_version("Min", cmd->version_min); - print_dsl(cmd->length); - print_isid_tsih(cmd->isid, cmd->tsih); - print_itt(cmd->init_task_tag); - print_cid(cmd->cid); - print_rsvd_u16(1, cmd->rsvd1); - 
print_cmdsn_expstatsn(cmd->cmd_sn ,cmd->exp_stat_sn); - print_rsvd_u64(2, cmd->rsvd2); - print_rsvd_u64(3, cmd->rsvd3); -} - - -static void -print_targ_login_rsp( char * buffer ) -{ - struct iscsi_targ_login_rsp *cmd = (struct iscsi_targ_login_rsp *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_version("Max", cmd->version_max); - print_version("Active", cmd->version_active); - print_dsl(cmd->length); - print_isid_tsih(cmd->isid, cmd->tsih); - print_itt(cmd->init_task_tag); - print_rsvd_u32(1, cmd->rsvd1); - print_statsn_exp_max( cmd->stat_sn , cmd->exp_cmd_sn, cmd->max_cmd_sn); - if( cmd->status_class != 0 ) - PRINT_BUFF_CAT("StatusClass: 0x%.2x ", cmd->status_class); - if( cmd->status_detail != 0 ) - PRINT_BUFF_CAT("StatusDetail: 0x%.2x ", cmd->status_detail); - print_rsvd_u16(2, cmd->rsvd2); - print_rsvd_u64(3, cmd->rsvd3); -} - - -static void -print_init_logout_cmnd( char * buffer ) -{ - struct iscsi_init_logout_cmnd *cmd = - (struct iscsi_init_logout_cmnd *)buffer; - - print_opcode(cmd->opcode); - PRINT_BUFF_CAT("reasoncod: 0x%.2x ", cmd->flags); - print_rsvd_u16(1, cmd->rsvd1); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_cid(cmd->cid); - print_rsvd_u16(2, cmd->rsvd2); - print_cmdsn_expstatsn(cmd->cmd_sn ,cmd->exp_stat_sn); - print_rsvd_u64(4, cmd->rsvd4); - print_rsvd_u64(5, cmd->rsvd5); -} - - -static void -print_targ_logout_rsp( char * buffer ) -{ - struct iscsi_targ_logout_rsp *cmd = (struct iscsi_targ_logout_rsp *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_response(cmd->response); - print_rsvd_u8(1, cmd->rsvd1); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_rsvd_u32(3, cmd->rsvd3); - print_statsn_exp_max( cmd->stat_sn , cmd->exp_cmd_sn, cmd->max_cmd_sn); - print_rsvd_u32(4, cmd->rsvd4); - PRINT_BUFF_CAT("Time2Wait: 0x%.8x ", be16_to_cpu(cmd->time2wait)); - PRINT_BUFF_CAT("Tm2Retain: 0x%.8x ", 
be16_to_cpu(cmd->time2retain)); - print_rsvd_u32(5, cmd->rsvd5); -} - - -static void -print_init_scsi_data_out( char * buffer ) -{ - struct iscsi_init_scsi_data_out *cmd = - (struct iscsi_init_scsi_data_out *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_rsvd_u16(2, cmd->rsvd2); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_ttt(cmd->target_xfer_tag); - print_rsvd_u32(3, cmd->rsvd3); - print_expstatsn(cmd->exp_stat_sn); - print_rsvd_u32(4, cmd->rsvd4); - print_datasn(cmd->data_sn); - print_offset(cmd->offset,cmd->length); - print_rsvd_u32(5, cmd->rsvd5); -} - - -static void -print_targ_scsi_data_in( char * buffer ) -{ - struct iscsi_targ_scsi_data_in *cmd = - (struct iscsi_targ_scsi_data_in *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_rsvd_u8(1, cmd->rsvd1); - print_status(cmd->status); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_ttt(cmd->target_xfer_tag); - print_statsn_exp_max( cmd->stat_sn , cmd->exp_cmd_sn, cmd->max_cmd_sn); - print_datasn(cmd->data_sn); - print_offset(cmd->offset,cmd->length); - print_residual(cmd->resid); -} - - -static void -print_targ_rjt( char * buffer ) -{ - struct iscsi_targ_rjt *cmd = (struct iscsi_targ_rjt *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - if( cmd->reason != 0 ) - PRINT_BUFF_CAT("Reason: 0x%.2x ", cmd->reason); - print_rsvd_u8(2, cmd->rsvd2); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_rsvd_u32(4, cmd->rsvd4); - print_statsn_exp_max( cmd->stat_sn , cmd->exp_cmd_sn, cmd->max_cmd_sn); - print_datasn(cmd->data_sn); - print_rsvd_u64(4, cmd->rsvd4); - print_rsvd_u64(5, cmd->rsvd5); -} - - -static void -print_init_nopout( char * buffer ) -{ - struct iscsi_init_nopout *cmd = (struct iscsi_init_nopout *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_rsvd_u16(1, cmd->rsvd1); - 
print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_ttt(cmd->target_xfer_tag); - print_cmdsn_expstatsn(cmd->cmd_sn ,cmd->exp_stat_sn); - print_rsvd_u64(2, cmd->rsvd2); - print_rsvd_u64(3, cmd->rsvd3); -} - - -static void -print_targ_nopin( char * buffer ) -{ - struct iscsi_targ_nopin *cmd = (struct iscsi_targ_nopin *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_rsvd_u16(1, cmd->rsvd1); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_ttt(cmd->target_xfer_tag); - print_statsn_exp_max(cmd->stat_sn , cmd->exp_cmd_sn, cmd->max_cmd_sn); - print_rsvd_u32(2, cmd->rsvd2); - print_rsvd_u64(3, cmd->rsvd3); -} - - -static void -print_targ_r2t( char * buffer ) -{ - struct iscsi_targ_r2t *cmd = (struct iscsi_targ_r2t *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_rsvd_u16(2, cmd->rsvd2); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_ttt(cmd->target_xfer_tag); - print_statsn_exp_max( cmd->stat_sn , cmd->exp_cmd_sn, cmd->max_cmd_sn); - PRINT_BUFF_CAT("R2TSN: %u ", be32_to_cpu(cmd->r2t_sn)); - print_offset(cmd->offset,cmd->xfer_len); -} - - -static void -print_targ_async_msg( char * buffer ) -{ - struct iscsi_targ_async_msg *cmd = (struct iscsi_targ_async_msg *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_rsvd_u16(2, cmd->rsvd2); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_rsvd_u32(3, cmd->rsvd3); - print_statsn_exp_max( cmd->stat_sn , cmd->exp_cmd_sn, cmd->max_cmd_sn); - PRINT_BUFF_CAT("AsyncEvnt: %u ", cmd->async_event); - PRINT_BUFF_CAT("AsyncVCod: %u ", cmd->async_vcode); - if( cmd->parameter1 != 0 ) - PRINT_BUFF_CAT(" Param1: %u ", cmd->parameter1); - if( cmd->parameter2 != 0 ) - PRINT_BUFF_CAT(" Param2: %u ", cmd->parameter2); - if( cmd->parameter3 != 0 ) - PRINT_BUFF_CAT(" Param3: %u ", cmd->parameter3); - print_rsvd_u32(5, 
cmd->rsvd5); -} - - -static void -print_init_task_mgt_command( char * buffer ) -{ - struct iscsi_init_task_mgt_command *cmd = - (struct iscsi_init_task_mgt_command *)buffer; - - print_opcode(cmd->opcode); - PRINT_BUFF_CAT("Function: 0x%.2x ", cmd->function); - print_rsvd_u16(1, cmd->rsvd1); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_rtt(cmd->ref_task_tag); - print_cmdsn_expstatsn(cmd->cmd_sn, cmd->exp_stat_sn); - if( cmd->ref_cmd_sn != 0 ) - PRINT_BUFF_CAT("RefCmdSN: %u ", be32_to_cpu(cmd->ref_cmd_sn)); - print_exp_data_sn(cmd->exp_data_sn); - print_rsvd_u64(4, cmd->rsvd4); -} - - -static void -print_targ_task_mgt_response( char * buffer ) -{ - struct iscsi_targ_task_mgt_response *cmd = - (struct iscsi_targ_task_mgt_response *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_response(cmd->response); - print_rsvd_u8(1, cmd->rsvd1); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_rsvd_u32(2, cmd->rsvd2); - print_statsn_exp_max( cmd->stat_sn , cmd->exp_cmd_sn, cmd->max_cmd_sn); - print_rsvd_u32(4, cmd->rsvd4); - print_rsvd_u64(5, cmd->rsvd5); -} - - -static void -print_init_snack( char * buffer ) -{ - struct iscsi_init_snack *cmd = (struct iscsi_init_snack *)buffer; - - print_opcode(cmd->opcode); - print_flags(cmd->flags); - print_rsvd_u16(1, cmd->rsvd1); - print_dsl(cmd->length); - print_lun(cmd->lun); - print_itt(cmd->init_task_tag); - print_ttt(cmd->target_xfer_tag); - print_rsvd_u32(2, cmd->rsvd2); - print_expstatsn(cmd->exp_stat_sn); - print_rsvd_u64(3, cmd->rsvd3); - print_begrun(cmd->begrun); - print_runlen(cmd->runlen); -} - - -/*--------------------------------------------------------------------- - */ -/*! 
- @brief Prints out an iSCSI PDU - its BHS and, if supplied, the data segment -*/ -/* --------------------------------------------------------------------- */ -void -iser_pdu_print(char *prefix, - void *p_id, - char *p_bhs, - struct iser_data_buf_t *p_data) -{ - __u32 opcode; - int max_data_print_len = iser_trace_get_data_max_print_len(); - - PRINT_BUFF_CAT( prefix ); - if ( (char *)p_id != NULL ) - PRINT_BUFF_CAT( "(0x%p): ",(char *)p_id ); - - opcode = p_bhs[0] & ISCSI_OPCODE_MASK; - - switch (opcode) { - case ISCSI_OP_LOGIN_REQ: - { - PRINT_BUFF_CAT( "LOGIN REQUEST" ); - print_init_login_cmnd( p_bhs ); - break; - } - - case ISCSI_OP_TEXT_REQ: - { - PRINT_BUFF_CAT( "TEXT REQUEST" ); - print_init_text_cmnd( p_bhs ); - break; - } - - case ISCSI_OP_SCSI_CMD: - { - PRINT_BUFF_CAT( "SCSI COMMAND" ); - print_init_scsi_cmnd( p_bhs ); - break; - } - - case ISCSI_OP_DATA_OUT: - { - PRINT_BUFF_CAT( "DATA_OUT" ); - print_init_scsi_data_out( p_bhs ); - break; - } - - case ISCSI_OP_TASK_MGT_REQ: - { - PRINT_BUFF_CAT("TAST MNGT CMD" ); - print_init_task_mgt_command( p_bhs ); - break; - } - - case ISCSI_OP_LOGOUT_REQ: - { - PRINT_BUFF_CAT( "LOGOUT REQUEST" ); - print_init_logout_cmnd( p_bhs ); - break; - } - - case ISCSI_OP_NOP_OUT: - { - PRINT_BUFF_CAT( "NOP_OUT" ); - print_init_nopout( p_bhs ); - break; - } - - /* SNACK Handling by Target - SAI */ - case ISCSI_OP_SNACK_REQ: - { - PRINT_BUFF_CAT( "SNACK Request" ); - print_init_snack( p_bhs ); - break; - } - - case ISCSI_OP_NOP_IN: - { - PRINT_BUFF_CAT( "NOP_IN" ); - print_targ_nopin( p_bhs ); - break; - } - - case ISCSI_OP_SCSI_RSP: - { - PRINT_BUFF_CAT( "SCSI RESPONSE" ); - print_targ_scsi_rsp( p_bhs ); - break; - } - - case ISCSI_OP_TASK_MGT_RSP: - { - PRINT_BUFF_CAT( "MGMT RESPONSE" ); - print_targ_task_mgt_response( p_bhs ); - break; - } - - case ISCSI_OP_LOGIN_RSP: - { - PRINT_BUFF_CAT( "LOGIN RESPONSE" ); - print_targ_login_rsp( p_bhs ); - break; - } - - case ISCSI_OP_TEXT_RSP: - { - PRINT_BUFF_CAT( "TEXT RESPONSE" ); - 
print_targ_text_rsp( p_bhs ); - break; - } - - case ISCSI_OP_DATA_IN: - { - PRINT_BUFF_CAT( "DATA_IN" ); - print_targ_scsi_data_in( p_bhs ); - break; - } - - case ISCSI_OP_LOGOUT_RSP: - { - PRINT_BUFF_CAT( "LOGOUT RESPONSE" ); - print_targ_logout_rsp( p_bhs ); - break; - } - - case ISCSI_OP_R2T: - { - PRINT_BUFF_CAT( "R2T" ); - print_targ_r2t( p_bhs ); - break; - } - - case ISCSI_OP_ASYNC: - { - PRINT_BUFF_CAT( "ASYNC MSG" ); - print_targ_async_msg( p_bhs ); - break; - } - - case ISCSI_OP_REJECT: - { - PRINT_BUFF_CAT( "TARGET REJECT"); - print_targ_rjt( p_bhs ); - break; - } - - default: - { - PRINT_BUFF_CAT("UNKNOWN OPCODE"); - break; - } - } - if (p_data != NULL) { - PRINT_BUFF_CAT("DATA: 0x%p (sz: %d, ptr: 0x%p, type: %s)", - p_data, p_data->size, p_data->p_buf, - iser_data_buf_get_type_name(p_data)); - } - PRINT_BUFF_OUT(); - - /* Print data if available */ - if (max_data_print_len > 0 && p_data != NULL && p_data->size > 0) { - char *p_chunk_buf = NULL; - unsigned int chunk_len = 0; - unsigned int len_to_print = 0; - unsigned int chunk_printed_len = 0; - unsigned int total_printed_len = 0; - unsigned int cur_chunk = 0; - int all_printed = 0; - - do { - if (p_data->type == ISER_BUF_TYPE_SINGLE) { - p_chunk_buf = p_data->p_buf + total_printed_len; - len_to_print = min(p_data->size - total_printed_len, - max_data_print_len - total_printed_len); - len_to_print = min(len_to_print, - (unsigned int) - ISER_PDU_PRINT_MAX_DATA_BLOCK_LEN); - - PRINT_BUFF_CAT("DATA(0x%p,SINGLE,%d+%d/%d): ", - p_data, total_printed_len, len_to_print, - p_data->size); - - /* Update counters */ - total_printed_len += len_to_print; - if (total_printed_len == max_data_print_len || - total_printed_len == p_data->size) { - all_printed = 1; - } - } - else { - switch (p_data->type) { - case ISER_BUF_TYPE_IOVEC_VIRT: { - p_chunk_buf = - iser_iovec_virt_entry_addr(p_data->p_buf,cur_chunk) - + chunk_printed_len; - chunk_len = iser_iovec_entry_len(p_data->p_buf, - cur_chunk); - - len_to_print = 
min(chunk_len - chunk_printed_len, - max_data_print_len - - total_printed_len); - len_to_print = min(len_to_print, - (unsigned int) - ISER_PDU_PRINT_MAX_DATA_BLOCK_LEN); - - PRINT_BUFF_CAT("DATA(0x%p,IOVECVIRT[%d],%d+%d/%d): ", - p_data, cur_chunk, chunk_printed_len, - len_to_print, chunk_len); - break; - } - case ISER_BUF_TYPE_IOVEC_PHYS: { - p_chunk_buf = - iser_iovec_phys_entry_to_virt(p_data->p_buf, - cur_chunk) + - chunk_printed_len; - chunk_len = iser_iovec_entry_len(p_data->p_buf, - cur_chunk); - - len_to_print = min(chunk_len - chunk_printed_len, - max_data_print_len - - total_printed_len); - len_to_print = min(len_to_print, - (unsigned int) - ISER_PDU_PRINT_MAX_DATA_BLOCK_LEN); - - PRINT_BUFF_CAT("DATA(0x%p,IOVECPHYS[%d],%d+%d/%d): ", - p_data, cur_chunk, - chunk_printed_len, - len_to_print, chunk_len); - break; - } - case ISER_BUF_TYPE_SCATTERLIST: { - p_chunk_buf = - iser_scatterlist_entry_to_virt(p_data->p_buf, - cur_chunk) + - chunk_printed_len; - chunk_len = - iser_scatterlist_entry_len(p_data->p_buf,cur_chunk); - - len_to_print = min(chunk_len - chunk_printed_len, - max_data_print_len - - total_printed_len); - len_to_print = min(len_to_print, - (unsigned int) - ISER_PDU_PRINT_MAX_DATA_BLOCK_LEN); - - PRINT_BUFF_CAT("DATA(0x%p,SCATTERLIST[%d],%d+%d/%d): ", - p_data, cur_chunk, chunk_printed_len, - len_to_print, chunk_len); - break; - } - default: - all_printed = 1; - break; - } /* switch */ - - /* Update counters */ - total_printed_len += len_to_print; - if (total_printed_len == max_data_print_len) { - all_printed = 1; - } - else { - chunk_printed_len += len_to_print; - if (chunk_printed_len == chunk_len) { - chunk_printed_len = 0; - cur_chunk++; - if (cur_chunk == p_data->size) { - all_printed = 1; - } - } - } - } - - if (p_chunk_buf != NULL && len_to_print > 0) { - int i; - - for (i=0; i +#include +#include +#include + +#include "iser.h" +#include "iser_conn.h" +#include "iser_initiator.h" +#include "iser_socket.h" + +#define PF_ISER AF_ISER + 
+static int iser_sock_create(struct socket *, int); +static int iser_sock_release(struct socket *); +static int iser_sock_connect(struct socket *, struct sockaddr *, int, int); +static int iser_sock_shutdown(struct socket *,int); +static int iser_sock_getsockopt(struct socket *,int,int,char *,int *); +static unsigned int iser_sock_poll(struct file *,struct socket *, + struct poll_table_struct *); + +struct iser_sock { + struct sock sock; + wait_queue_head_t conn_sleep_q; + struct iser_connection iser_conn; +}; + +static struct net_proto_family iser_proto_family = { + family: PF_ISER, + create: iser_sock_create, + authentication: 0, + encryption: 0, + encrypt_net: 0, +}; + +static struct proto_ops iser_proto_ops = { + family: AF_ISER, + owner: THIS_MODULE, + + connect: iser_sock_connect, + release: iser_sock_release, + shutdown: iser_sock_shutdown, + + bind: sock_no_bind, + poll: iser_sock_poll, + socketpair: sock_no_socketpair, + accept: sock_no_accept, + getname: sock_no_getname, + ioctl: sock_no_ioctl, + listen: sock_no_listen, + setsockopt: sock_setsockopt, + getsockopt: iser_sock_getsockopt, + sendmsg: sock_no_sendmsg, + recvmsg: sock_no_recvmsg, + mmap: sock_no_mmap, + sendpage: sock_no_sendpage, +}; + +static struct proto iser_sock_proto = { + name: "ib_iser", + owner: THIS_MODULE, + obj_size: sizeof(struct iser_sock), +}; + +struct iser_connection *iser_conn_from_sock(struct socket *sock) +{ + struct iser_sock *iser_sk = (struct iser_sock *)sock->sk; + return &iser_sk->iser_conn; +} /* iser_conn_from_sock */ + +struct socket *iser_conn_to_sock(struct iser_connection *p_iser_conn) +{ + struct iser_sock *iser_sk; + iser_sk = container_of(p_iser_conn,struct iser_sock,iser_conn); + return iser_sk->sock.sk_socket; +} /* iser_conn_to_sock */ + +int iser_register_sockets(void) +{ + int error = 0; + + error = proto_register(&iser_sock_proto, 1); + if (error < 0) { + printk(KERN_ERR "proto_register failed (%d)\n", error); + goto register_iser_socket_exit; + } + error 
= sock_register(&iser_proto_family); + if (error < 0) { + printk(KERN_ERR "sock_register failed (%d)\n", error); + } + + register_iser_socket_exit: + return error; +} /* iser_register_sockets */ + +void iser_unreg_sockets(void) +{ + sock_unregister(PF_ISER); + proto_unregister(&iser_sock_proto); +} /* iser_unreg_sockets */ + +static int iser_sock_create(struct socket *sock, int protocol) +{ + struct iser_sock *iser_sk = NULL; + + if (sock->type != SOCK_STREAM) + return -ESOCKTNOSUPPORT; + + iser_sk = (struct iser_sock *)sk_alloc(PF_INET, GFP_KERNEL, + &iser_sock_proto, 1); + if (iser_sk == NULL) + return -ENOBUFS; + + sock_init_data(sock, &iser_sk->sock); + iser_sk->sock.sk_destruct = NULL; + iser_sk->sock.sk_family = PF_ISER; + iser_sk->sock.sk_sndbuf = 64*1024; + + init_waitqueue_head(&iser_sk->conn_sleep_q); + iser_conn_init(&iser_sk->iser_conn); + + sock->ops = &iser_proto_ops; + sock->state = SS_UNCONNECTED; + sock_graft(&iser_sk->sock, sock); + + return 0; +} /* iser_sock_create */ + +int iser_sock_connect(struct socket *sock, struct sockaddr *uservaddr, + int sockaddr_len, int flags) +{ + struct sockaddr_in *dst_addr = (struct sockaddr_in *)uservaddr; + struct iser_sock *iser_sk = (struct iser_sock *)sock->sk; + struct iser_connection *p_iser_conn = &iser_sk->iser_conn; + int iser_err = 0; + + dst_addr->sin_port = htons(dst_addr->sin_port); + printk("%s: ip = %d.%d.%d.%d, port = %d\n", __func__, + NIPQUAD(dst_addr->sin_addr), dst_addr->sin_port); + + iser_err = iser_conn_establish(p_iser_conn, dst_addr, NULL); + if (iser_err) { + printk(KERN_ERR "conn_establish failed: %d\n",iser_err); + goto iser_connect_exit; + } + + /* Sleep until the connection is established or rejected */ + wait_event_interruptible(iser_sk->conn_sleep_q, + atomic_read(&p_iser_conn->state) != ISER_CONN_PENDING); + + if (atomic_read(&p_iser_conn->state) != ISER_CONN_UP) { + iser_err = -EIO; + } + + iser_connect_exit: + return iser_err; +} /* iser_sock_connect */ + +int 
iser_conn_establish_notify(struct iser_connection *p_iser_conn) +{ + struct iser_sock *iser_sk; + + iser_sk = container_of(p_iser_conn,struct iser_sock,iser_conn); + wake_up_interruptible(&iser_sk->conn_sleep_q); + + return 0; +} /* iser_conn_establish_notify */ + +static inline void iser_sock_free(struct socket *sock) +{ + struct sock *sk = sock->sk; + sock->sk = NULL; + sock_orphan(sk); + sk_free(sk); +} + +int iser_sock_release(struct socket *sock) +{ + struct iser_sock *iser_sock = (struct iser_sock *)sock->sk; + struct iser_connection *p_iser_conn = &iser_sock->iser_conn; + int iser_err = 0; + + if (atomic_read(&p_iser_conn->state) == ISER_CONN_DOWN) { + iser_sock_free(sock); + } else { + iser_err = -EPERM; + } + return iser_err; +} /* iser_sock_release */ + +int iser_sock_shutdown(struct socket *sock, int how) +{ + return 0; +} /* iser_sock_shutdown */ + +static int iser_sock_getsockopt(struct socket *sock, int level, int optname, + char *optval, int *optlen) +{ + return 0; +} /* iser_sock_getsockopt */ + +static unsigned int iser_sock_poll(struct file *file, struct socket *sock, + struct poll_table_struct *wait) +{ + return POLLOUT; +} /* iser_sock_poll */ Index: iser_socket.h =================================================================== --- iser_socket.h (revision 0) +++ iser_socket.h (revision 0) @@ -0,0 +1,56 @@ + +/* + * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + */ + +#ifndef __ISER_SOCKETS_H__ +#define __ISER_SOCKETS_H__ + +#include +#include "iser.h" + +enum iscsi_iser_conn_status { + ISER_SOCK_INVALID, + ISER_SOCK_PENDING, + ISER_SOCK_CONNECTED, + ISER_SOCK_DISCONNECTING, + ISER_SOCK_DISCONNECTED +}; + +int iser_register_sockets(void); +void iser_unreg_sockets(void); + +struct iser_connection *iser_conn_from_sock(struct socket *sock); +struct socket *iser_conn_to_sock(struct iser_connection *p_iser_conn); +int iser_conn_establish_notify(struct iser_connection *p_iser_conn); + +#endif /* __ISER_SOCKETS_H__ */ Index: iser_conn.c =================================================================== --- iser_conn.c (revision 3404) +++ iser_conn.c (working copy) @@ -31,9 +31,16 @@ * */ +#include +#include +#include +#include +#include + #include "iser.h" #include "iser_initiator.h" #include "iser_conn.h" +#include "iser_socket.h" #include "iser_task.h" #include "iser_dto.h" #include "iser_kdapl.h" @@ -67,22 +74,12 @@ } /* iser_conn_get_state_name */ /** - * iser_conn_alloc - Allocates and initializes a conn descriptor + * iser_conn_init - initializes connection structure * - * returns iSER conn descriptor, or NULL on failure + * no return value */ -struct iser_connection *iser_conn_alloc() +void iser_conn_init(struct iser_connection *p_iser_conn) { - struct iser_connection *p_iser_conn; - - /* Allocate iSER conn structure */ - p_iser_conn = kmem_cache_alloc(ig.conn_mem_cache, - GFP_KERNEL | __GFP_NOFAIL); - if (p_iser_conn == NULL) { - printk(KERN_ERR PFX "Failed to alloc iSER conn descriptor.\n"); - return NULL; - } - memset(p_iser_conn, 0, sizeof(struct iser_connection)); ITRACE(ISER_TRACE_CONN, "memset at 0x%p to 0x%lX\n", p_iser_conn, ((long)p_iser_conn) + sizeof(struct iser_connection)); @@ -94,7 +91,6 @@ INIT_LIST_HEAD(&p_iser_conn->task_list); INIT_LIST_HEAD(&p_iser_conn->hash_list); INIT_LIST_HEAD(&p_iser_conn->adaptor_list); - INIT_LIST_HEAD(&p_iser_conn->entity_list); 
atomic_set(&p_iser_conn->post_recv_buf_count, 0); atomic_set(&p_iser_conn->post_send_buf_count, 0); @@ -119,13 +115,435 @@ p_iser_conn->param.DataPDUInOrder = defaultDataPDUInOrder; p_iser_conn->param.DataSequenceInOrder = defaultDataSequenceInOrder; p_iser_conn->param.ErrorRecoveryLevel = defaultErrorRecoveryLevel; - init_waitqueue_head(&p_iser_conn->disconnect_wait_q); +} /* iser_conn_init */ +/** + * iser_adaptor_init - Initializes iSER adaptor structure. + * + * + * Creates adaptor-scope objects (Interface Adaptor, Protection Zone, + * Public Service Points). + * + * returns 0 on success, -1 on failure + */ +int iser_adaptor_init(struct iser_adaptor *p_iser_adaptor, + char *name) +{ + if (p_iser_adaptor == NULL) + return -1; + + memset(p_iser_adaptor, 0, sizeof(struct iser_adaptor)); + ITRACE(ISER_TRACE_CONN, "memset at 0x%p to 0x%lX\n", + p_iser_adaptor, + ((long)p_iser_adaptor) + sizeof(struct iser_adaptor)); + + /* Initialize waiting queue for the event handler thread */ + init_waitqueue_head(&p_iser_adaptor->dat_events_wait_q); + init_waitqueue_head(&p_iser_adaptor->connect_wait_q); + + /* Initialize list of conns */ + spin_lock_init(&p_iser_adaptor->conn_lock); + INIT_LIST_HEAD(&p_iser_adaptor->conn_list); + + /* Create IA, PZ, EVD */ + if (iser_create_ia_pz_evd(p_iser_adaptor) != 0) { + return -1; + } + + /* Allocate pool of pre-registered iSER headers */ + p_iser_adaptor->header_pool = + iser_small_bpool_create(p_iser_adaptor, "headers", + ISER_TOTAL_HEADERS_LEN, 0, 1); + if (p_iser_adaptor->header_pool == NULL) { + iser_adaptor_release(p_iser_adaptor); + return -1; + } + + /* Initialize the pre-registered buffers cache */ + iser_reg_all_mem(p_iser_adaptor); + + /* Start the event thread */ + p_iser_adaptor->terminate_thread = 0; + init_MUTEX_LOCKED(&p_iser_adaptor->startstop_sem); + p_iser_adaptor->event_thrd_pid = + kernel_thread(iser_event_handler_thread, + p_iser_adaptor, + CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND); + + if 
(p_iser_adaptor->event_thrd_pid <= 0) { + printk(KERN_ERR PFX "Failed to start event kernel thread\n"); + iser_adaptor_release(p_iser_adaptor); + return -1; + } + /* wait for the event thread to start */ + down(&p_iser_adaptor->startstop_sem); + + return 0; +} /* iser_adaptor_init */ + +/** + * iser_adaptor_release - Releases all adaptor-related res. + * + * returns 0 on success, -1 on failure + */ +int iser_adaptor_release(struct iser_adaptor *p_iser_adaptor) +{ + struct iser_connection *p_iser_conn; + + /* Free all conns and associated objects, + must be done before freeing adaptor objects */ + while (!list_empty(&p_iser_adaptor->conn_list)) { + p_iser_conn = list_entry(p_iser_adaptor->conn_list.next, + struct iser_connection, adaptor_list); + /* Connection should be shut down before releasing res */ + iser_conn_sync_terminate(p_iser_conn); + iser_conn_release(p_iser_conn); + } + /* Release buffer pool and unregister its memory */ + if (p_iser_adaptor->header_pool != NULL) { + iser_bpool_destroy(p_iser_adaptor->header_pool); + p_iser_adaptor->header_pool = NULL; + } + + if (iser_unreg_all_mem(p_iser_adaptor) != 0) { + ITRACE(ISER_TRACE_ERRORS, + "iser_unreg_all_mem failed\n"); + } + + /* Free adaptor-related objects */ + if (iser_free_ia_pz_evd(p_iser_adaptor) != 0) { + return -1; + } + + /* kill event thread (if exists) */ + if (p_iser_adaptor->event_thrd_pid > 0) { + ITRACE(ISER_TRACE_EVENT_THREAD, + "start terminating event thread\n"); + lock_kernel(); + init_MUTEX_LOCKED(&p_iser_adaptor->startstop_sem); + mb(); + p_iser_adaptor->terminate_thread = 1; + mb(); + kill_proc(p_iser_adaptor->event_thrd_pid, SIGKILL, 1); + ITRACE(ISER_TRACE_EVENT_THREAD, + "waiting for event thread to terminate\n"); + down(&p_iser_adaptor->startstop_sem); + unlock_kernel(); + ITRACE(ISER_TRACE_EVENT_THREAD, "event thread terminated\n"); + } + + return 0; +} /* iser_adaptor_release */ + +/** + * iser_adaptor_add_conn - Adds a conn to adaptor + */ +void 
iser_adaptor_add_conn(struct iser_adaptor *p_iser_adaptor, + struct iser_connection *p_iser_conn) +{ + p_iser_conn->p_adaptor = p_iser_adaptor; + spin_lock(&p_iser_adaptor->conn_lock); + list_add(&p_iser_conn->adaptor_list, &p_iser_adaptor->conn_list); + spin_unlock(&p_iser_adaptor->conn_lock); + +} /* iser_adaptor_add_conn */ + +/** + * iser_adaptor_find_conn - Finds the conn bound to an endpoint handle + * + * returns the matching conn, or NULL if none is found + */ +struct iser_connection *iser_adaptor_find_conn(struct iser_adaptor + *p_iser_adaptor, + void *ep_handle) +{ + + struct iser_connection *p_iser_conn = NULL; + struct list_head *p_list; + + spin_lock(&p_iser_adaptor->conn_lock); + p_list = p_iser_adaptor->conn_list.next; + while (p_list != &p_iser_adaptor->conn_list) { + p_iser_conn = list_entry(p_list, struct iser_connection, + adaptor_list); + if (((void *)p_iser_conn->ep_handle) == ep_handle) + break; + p_iser_conn = NULL; + p_list = p_list->next; + } + spin_unlock(&p_iser_adaptor->conn_lock); + return p_iser_conn; -} /* iser_conn_alloc */ +} /* iser_adaptor_find_conn */ /** + * iser_adaptor_remove_conn - Removes a conn from adaptor + */ +void iser_adaptor_remove_conn(struct iser_connection *p_iser_conn) +{ + struct iser_adaptor *p_iser_adaptor; + + if (!list_empty(&p_iser_conn->adaptor_list)) { + p_iser_adaptor = p_iser_conn->p_adaptor; + if (p_iser_adaptor == NULL) { + IPANIC("NULL adaptor in conn: 0x%p\n", p_iser_conn); + } + spin_lock(&p_iser_adaptor->conn_lock); + list_del(&p_iser_conn->adaptor_list); + spin_unlock(&p_iser_adaptor->conn_lock); + } + +} /* iser_adaptor_remove_conn */ + +/** + * iser_conn_establish - establish iser connection + */ +int iser_conn_establish(struct iser_connection *p_iser_conn, + struct sockaddr_in *dst_addr, + struct sockaddr_in *src_addr) +{ + struct iser_adaptor *p_adaptor; + char bufpool_name[128]; + int ep_create_ret; + int iser_err = 0; + + p_adaptor = &ig.adaptor[0]; + 
iser_adaptor_add_conn(p_adaptor, p_iser_conn); + + /* Allocate post-receive 
buffers for the login phase */ + sprintf(bufpool_name, "login-pdu-data"); + p_iser_conn->post_recv_pool = + iser_large_bpool_create(p_iser_conn->p_adaptor, bufpool_name, + ISER_LOGIN_PHASE_PDU_DATA_LEN, 4, 0); + if (p_iser_conn->post_recv_pool == NULL) { + printk(KERN_ERR PFX "login pool = NULL\n"); + iser_err = -ENOMEM; + goto iser_conn_establish_failure; + } + + /* During the login phase data for sent PDUs is drawn from */ + /* the same buffer pool as recv */ + p_iser_conn->send_data_pool = p_iser_conn->post_recv_pool; + + ep_create_ret = iser_create_ep(p_iser_conn); + if (ep_create_ret != 0) { + iser_err = -EIO; + goto iser_conn_establish_failure; + } + + if (dst_addr != NULL) { + sprintf(p_iser_conn->name, "%d.%d.%d.%d", + NIPQUAD(dst_addr->sin_addr)); + } else { + sprintf(p_iser_conn->name, "Unknown"); + } + ITRACE(ISER_TRACE_CONN, "Connecting to: %s, port 0x%x\n", + p_iser_conn->name, dst_addr->sin_port); + + atomic_set(&p_iser_conn->state, ISER_CONN_PENDING); + + if (iser_connect(p_iser_conn, dst_addr, src_addr) != 0) { + printk(KERN_ERR PFX "iser_connect failed\n"); + iser_err = -EIO; + goto iser_conn_establish_failure; + } + return 0; + +iser_conn_establish_failure: + if (p_iser_conn) { + atomic_set(&p_iser_conn->state, ISER_CONN_DOWN); + } + printk(KERN_ERR PFX "%s: failed\n", __FUNCTION__); + return iser_err; +} /* iser_conn_establish */ + +/** + * iser_conn_bind - iSER API. + * Binds iSER connection and previously connected socket + * Exchanges connection handles with iSCSI layer. + * + */ +int iser_conn_bind(void *iscsi_conn_h, /* IN */ + struct socket *sock, /* IN */ + void **iser_conn_h) /* OUT */ +{ + struct iser_connection *p_iser_conn; + + p_iser_conn = iser_conn_from_sock(sock); + p_iser_conn->iscsi_conn_h = iscsi_conn_h; + *iser_conn_h = p_iser_conn; + return 0; +} /* iser_conn_bind */ + +/** + * iser_conn_enable_rdma - iSER API. Implements + * Allocate_Connection_Resources and Enable_Datamover primitives. 
+ * + */ +int iser_conn_enable_rdma(void *iser_conn_h, + struct iser_conn_res *conn_res) +{ + struct iser_connection *p_iser_conn = iser_conn_h; + int iser_err = 0; + int i; + char bufpool_name[ISER_BUF_POOL_NAME_SIZE]; + + if (p_iser_conn == NULL) { + printk(KERN_ERR PFX "NULL iser conn handle\n"); + iser_err = -EINVAL; + goto conn_enable_rdma_exit; + } + + p_iser_conn->max_outstand_cmds = conn_res->max_outstand_cmds; + p_iser_conn->alloc_post_recv_bufs_num = ISER_EP_AVG_POST_RECV + 2; + p_iser_conn->initial_post_recv_bufs_num = ISER_INITIAL_POST_RECV + 2; + + ITRACE(ISER_TRACE_CONN, + "Max outst. cmds: %d, Allocate post recv bufs:" + "%d, Initially post: %d\n", + p_iser_conn->max_outstand_cmds, + p_iser_conn->alloc_post_recv_bufs_num, + p_iser_conn->initial_post_recv_bufs_num); + + /* Allocate post-receive buffers for the full-featured phase */ + snprintf(bufpool_name, ISER_BUF_POOL_NAME_SIZE, "%s-%s-post-recv", + ig.provider_name, p_iser_conn->name); + + p_iser_conn->spare_post_recv_pool = + iser_small_bpool_create(p_iser_conn->p_adaptor, + bufpool_name, + 128, + p_iser_conn->alloc_post_recv_bufs_num, + 0); + if (p_iser_conn->spare_post_recv_pool == NULL) { + printk(KERN_ERR PFX "Failed to alloc the post receive buffer " + "pool for conn = 0x%p, iscsi_h = 0x%p\n", + p_iser_conn, p_iser_conn->iscsi_conn_h); + iser_err = -EINVAL; + } + + /* Allocate send-data buffers for the full-featured phase */ + snprintf(bufpool_name, ISER_BUF_POOL_NAME_SIZE, "%s-%s-send-data", + ig.provider_name, p_iser_conn->name); + + /* ToDo: TargetMaxRecvDSL */ + p_iser_conn->spare_send_data_pool = + iser_large_bpool_create(p_iser_conn->p_adaptor, bufpool_name, + 8 * 1024, + ISER_MAX_NOP_OUT + ISER_MAX_TASK_MGT_REQ, + 0); + if (p_iser_conn->spare_send_data_pool == NULL) { + printk(KERN_ERR PFX + "Failed to alloc the send data buffer pool " + "for conn = 0x%p, iscsi_h = 0x%p\n", p_iser_conn, + p_iser_conn->iscsi_conn_h); + iser_err = -EINVAL; + } + + /* Check that there is no posted recv 
or send buffers left - */ + /* they must be consumed during the login phase */ + if (atomic_read(&p_iser_conn->post_recv_buf_count) != 0) { + IPANIC("Number of currently posted recv bufs non-zero\n"); + } + if (atomic_read(&p_iser_conn->post_send_buf_count) != 0) { + IPANIC("Number of currently posted send bufs non-zero\n"); + } + /* If a buffer pool has been allocd for loginphase, destroy it */ + if (p_iser_conn->post_recv_pool != NULL) { + iser_bpool_destroy(p_iser_conn->post_recv_pool); + } + + /* Switch to the main post-recv buffer pool */ + p_iser_conn->post_recv_pool = p_iser_conn->spare_post_recv_pool; + p_iser_conn->spare_post_recv_pool = NULL; + + /* Switch to the main send-data buffer pool */ + p_iser_conn->send_data_pool = p_iser_conn->spare_send_data_pool; + p_iser_conn->spare_send_data_pool = NULL; + + /* Initial Post-Receive buffers */ + for (i = 0; i < p_iser_conn->initial_post_recv_bufs_num; i++) { + if (iser_post_receive_control(p_iser_conn) != 0) { + printk(KERN_ERR PFX "Failed to post recv buffers\n"); + iser_err = -ENOMEM; + goto conn_enable_rdma_exit; + } + } + ITRACE(ISER_TRACE_CONN, "Allocated %d post recv bufs\n", i); + + conn_enable_rdma_exit: + return iser_err; +} /* iser_conn_enable_rdma */ + +/** + * iser_conn_term - iSER API. Implements + * Connection_Terminate primitive. + * + * Starts conn teardown process. Waits until all previously posted + * buffers get flushed. Deallocs all conn res. 
+ */ +int iser_conn_term(void *iser_conn_h) +{ + struct iser_connection *p_iser_conn = iser_conn_h; + struct iser_dto *p_recv_dto; + struct iser_task *p_iser_task; + int iser_err = 0; + + if (p_iser_conn == NULL) { + iser_err = -EINVAL; + goto iser_conn_term_exit; + } + if (atomic_read(&p_iser_conn->state) != ISER_CONN_UP) { + iser_err = -EPERM; + goto iser_conn_term_exit; + } + + /* Release all receive control DTOs passed to Control_Notify */ + spin_lock(&p_iser_conn->conn_lock); + while (!list_empty(&p_iser_conn->ctrl_notify_dto_list)) { + /* Get the next rcv buffer & remove it + from the list */ + p_recv_dto = + list_entry(p_iser_conn-> + ctrl_notify_dto_list.next, + struct iser_dto, dto_list); + list_del(&p_recv_dto->dto_list); + spin_unlock(&p_iser_conn->conn_lock); + + /* Get the recv DTO descriptor coupled + with the PDU */ + if (p_recv_dto->type != ISER_DTO_RCV) { + IPANIC("Releasing non-RECV type dto: " + "0x%p, type: %d\n", + p_recv_dto, p_recv_dto->type); + } + + /* Save the owning task before the DTO + descriptor is freed */ + p_iser_task = p_recv_dto->p_task; + + /* Release the buffers and dto descriptor */ + iser_dto_free(p_recv_dto); + + /* If this is the last DTO in a completed + task then free the task itself */ + if (p_iser_task != NULL && + iser_task_ctrl_notify_count_dec_and_test + (p_iser_task)) { + iser_task_free(p_iser_task); + } + + spin_lock(&p_iser_conn->conn_lock); + } + spin_unlock(&p_iser_conn->conn_lock); + + /* We need to terminate the conn synchronously */ + iser_conn_sync_terminate(p_iser_conn); + iser_conn_release(p_iser_conn); + + iser_conn_term_exit: + return iser_err; +} /* iser_conn_term */ + +/** * iser_dealloc_conn_res - iSER API. * Implements Dealloc_Connection_Resources primitive. * @@ -133,29 +551,19 @@ * Deallocs conn res previously allocd using * alloc_conn_res(), if the conn becomes * unnecessary. 
- * - * @returns iSER status (ISER_SUCCESS, ISER_FAILURE, ISER_INVALID_CONN) */ -iser_status iser_dealloc_conn_res(void *iscsi_conn_h) +int iser_dealloc_conn_res(void *iser_conn_h) { - iser_status iser_ret = ISER_SUCCESS; - struct iser_connection *p_iser_conn; + struct iser_connection *p_iser_conn = iser_conn_h; + int iser_err = 0; - /* Find the conn */ - p_iser_conn = hash_find_iser_conn(iscsi_conn_h); - if (p_iser_conn == NULL) { - printk(KERN_ERR PFX - "Failed to find iSER conn, for iscsi_h = 0x%p\n", - iscsi_conn_h); - return ISER_INVALID_CONN; - goto dealloc_conn_res_exit; + if (p_iser_conn != NULL) { + iser_conn_release(p_iser_conn); + } else { + printk(KERN_ERR PFX "NULL conn handle\n"); + iser_err = -EINVAL; } - - /* Dealloc all res, free conn descriptor */ - iser_conn_free(p_iser_conn); - - dealloc_conn_res_exit: - return iser_ret; + return iser_err; } /* iser_dealloc_conn_res */ /** @@ -179,26 +587,19 @@ * * An iSCSI layer requests its local iSER datamover layer * to take note of the negotiated values of the listed keys. 
- * - * returns iSER status (ISER_SUCCESS, ISER_FAILURE, - * ISER_INVALID_CONN, ISER_ILLEGAL_PARAM) */ -iser_status iser_notice_key_values(void *iscsi_conn_h, char *key, char *value) +int iser_notice_key_values(void *iser_conn_h, char *key, char *value) { - struct iser_connection *p_iser_conn; + struct iser_connection *p_iser_conn = iser_conn_h; int num_val; - /* Find the conn */ - p_iser_conn = hash_find_iser_conn(iscsi_conn_h); if (p_iser_conn == NULL) { - printk(KERN_ERR PFX "Connection not found, conn_h: %ld\n", - (unsigned long)iscsi_conn_h); - return ISER_INVALID_CONN; + printk(KERN_ERR PFX "NULL conn handle \n"); + return -EINVAL; } - if (key == NULL || value == NULL) { printk(KERN_ERR PFX "NULL key or value\n"); - return ISER_ILLEGAL_PARAM; + return -EINVAL; } /* Compare key to the supported key p_iser_connnames @@ -250,15 +651,15 @@ p_iser_conn->param.ErrorRecoveryLevel = num_val; } else { printk(KERN_ERR PFX "Unsupported key name: %s\n", key); - return ISER_INVALID_KEY; + return -EINVAL; } ITRACE(ISER_TRACE_CONN, "Set %s to %d\n", key, num_val); - return ISER_SUCCESS; + return 0; - notice_key_values_invalid_val: + notice_key_values_invalid_val: printk(KERN_ERR PFX "Invalid value: %s for key %s\n", value, key); - return ISER_INVALID_VALUE; + return -EINVAL; } /* iser_notice_key_values */ @@ -337,26 +738,22 @@ * * returns allocd DTO descriptor */ -iser_status -iser_release_control(void *iscsi_conn_h, struct iser_recv_pdu *p_ctrl_pdu) +int +iser_release_control(void *iser_conn_h, struct iser_recv_pdu *p_ctrl_pdu) { - struct iser_connection *p_iser_conn; + struct iser_connection *p_iser_conn = iser_conn_h; struct iser_dto *p_recv_dto; struct iser_task *p_iser_task; - iser_status iser_ret = ISER_SUCCESS; + int iser_err = 0; - /* Find the conn */ - p_iser_conn = hash_find_iser_conn(iscsi_conn_h); if (p_iser_conn == NULL) { - iser_ret = ISER_INVALID_CONN; + iser_err = -EINVAL; goto release_control_exit; } - if (p_ctrl_pdu == NULL) { - printk(KERN_ERR PFX - "NULL 
receive control to release, conn: 0x%p\n", - iscsi_conn_h); - iser_ret = ISER_ILLEGAL_PARAM; + printk(KERN_ERR PFX "NULL recv ctrl to release, conn: 0x%p\n", + p_iser_conn); + iser_err = -EINVAL; goto release_control_exit; } @@ -384,13 +781,14 @@ } release_control_exit: - return iser_ret; + return iser_err; } /* iser_release_control */ /** * iser_conn_wait_for_disconn - Sleep on queue and wait * until the conn is fully disconnected - **/ + * + */ void iser_conn_wait_for_disconn(struct iser_connection *p_iser_conn) { ITRACE(ISER_TRACE_CONN, "Waiting for the disconnect event, p_conn: " @@ -400,23 +798,19 @@ ISER_CONN_DOWN)); ITRACE(ISER_TRACE_CONN, "Got the disconnect event, p_conn: 0x%p\n", p_iser_conn); +} /* iser_conn_wait_for_disconn */ - /* Destroy the conn finally */ - /*iser_conn_free(p_iser_conn); */ -} - /** - * iser_conn_sync_terminate - Triggers start of the disconn procedures + * iser_conn_sync_terminate - Triggers start of the disconnect procedures */ int iser_conn_sync_terminate(struct iser_connection *p_iser_conn) { - int ret_val = -1; + int ret_val = 0; switch (atomic_read(&p_iser_conn->state)) { case ISER_CONN_UP: case ISER_CONN_PENDING: /* Signal that the conn is being terminated synchronously */ - /*p_iser_conn->state = ISER_CONN_SYNC_TERM; */ atomic_set(&p_iser_conn->state, ISER_CONN_SYNC_TERM); /* Start the disconnect procedures */ ret_val = iser_disconnect(p_iser_conn); @@ -432,25 +826,23 @@ case ISER_CONN_SYNC_TERM: iser_conn_wait_for_disconn(p_iser_conn); - ret_val = 0; break; case ISER_CONN_ASYNC_TERM: /* Signal that the conn is being terminated synchronously */ - /*p_iser_conn->state = ISER_CONN_SYNC_TERM; */ atomic_set(&p_iser_conn->state, ISER_CONN_SYNC_TERM); /* Sleep on queue and wait until the conn is fully disconnected */ iser_conn_wait_for_disconn(p_iser_conn); - ret_val = 0; break; case ISER_CONN_DOWN: - ret_val = 0; /* this may happen only when the iSCSI is - being currently notified */ + /* this may happen only when iSCSI is being 
notified */ + break; default: printk(KERN_ERR PFX "called when in state %s\n", iser_conn_get_state_name(p_iser_conn)); + ret_val = -EPERM; break; } @@ -481,116 +873,40 @@ } /* iser_conn_async_terminate */ /** - * iser_connectionerminate - iSER API. Implements - * Connection_Terminate primitive. - * - * - * Starts conn teardown process. Waits until all previously posted - * buffers get flushed. Deallocs all conn res. - * - * returns iSER status (ISER_SUCCESS, ISER_FAILURE, ISER_INVALID_CONN) - */ -iser_status iser_connectionerminate(void *iscsi_conn_h) -{ - iser_status iser_ret = ISER_SUCCESS; - struct iser_connection *p_iser_conn; - - /* Find the conn */ - p_iser_conn = hash_find_iser_conn(iscsi_conn_h); - if (p_iser_conn != NULL) { - struct iser_dto *p_recv_dto; - struct iser_task *p_iser_task; - - /* Release all receive control DTOs passed to Control_Notify */ - spin_lock(&p_iser_conn->conn_lock); - while (!list_empty(&p_iser_conn->ctrl_notify_dto_list)) { - /* Get the next rcv buffer & remove it - from the list */ - p_recv_dto = - list_entry(p_iser_conn-> - ctrl_notify_dto_list.next, - struct iser_dto, dto_list); - list_del(&p_recv_dto->dto_list); - spin_unlock(&p_iser_conn->conn_lock); - - /* Get the recv DTO descriptor coupled - with the PDU */ - if (p_recv_dto->type != ISER_DTO_RCV) { - IPANIC("Releasing non-RECV type dto: " - "0x%p, type: %d\n", - p_recv_dto, p_recv_dto->type); - } - - /* Release the buffers and dto descriptor */ - iser_dto_free(p_recv_dto); - - /* It this is the last DTO in a completed - task then free the task itself */ - p_iser_task = p_recv_dto->p_task; - if (p_iser_task != NULL && - iser_task_ctrl_notify_count_dec_and_test - (p_iser_task)) { - iser_task_free(p_iser_task); - } - - spin_lock(&p_iser_conn->conn_lock); - } - spin_unlock(&p_iser_conn->conn_lock); - - /* We need to terminate the conn syncronously */ - if (iser_conn_sync_terminate(p_iser_conn) != 0) { - iser_ret = ISER_FAILURE; - } - } else { - iser_ret = ISER_INVALID_CONN; - 
} - - return iser_ret; -} /* iser_connectionerminate */ - -/** - * iser_conn_free - Frees all conn objects and + * iser_conn_release - Frees all conn objects and * deallocs conn descriptor */ -void iser_conn_free(struct iser_connection *p_iser_conn) +void iser_conn_release(struct iser_connection *p_iser_conn) { - if (atomic_read(&p_iser_conn->state) == ISER_CONN_DOWN) { - /* Free all previosly allocd kDAPL objects, */ - /* this may be not the first call */ - /* but the end-point is freed only once */ - iser_free_ep(p_iser_conn); - - /* Detach from the adaptor conns list */ + iser_free_ep(p_iser_conn); /* ep is freed only once */ iser_adaptor_remove_conn(p_iser_conn); - /* Detach from the entity conns list */ - iser_entity_remove_conn(p_iser_conn); - - /* Detach from conns hash */ - hash_delete_iser_conn(p_iser_conn); - /* Destroy the buffer pools */ if (p_iser_conn->post_recv_pool != NULL) { iser_bpool_destroy(p_iser_conn->post_recv_pool); + if (p_iser_conn->post_recv_pool == + p_iser_conn->send_data_pool) { + p_iser_conn->send_data_pool = NULL; + } + p_iser_conn->post_recv_pool = NULL; } - if (p_iser_conn->send_data_pool != NULL && - p_iser_conn->send_data_pool != - p_iser_conn->post_recv_pool) { + if (p_iser_conn->send_data_pool != NULL) { iser_bpool_destroy(p_iser_conn->send_data_pool); + p_iser_conn->send_data_pool = NULL; } - if (p_iser_conn->spare_post_recv_pool != NULL) { iser_bpool_destroy(p_iser_conn->spare_post_recv_pool); + p_iser_conn->spare_post_recv_pool = NULL; } if (p_iser_conn->spare_send_data_pool != NULL) { iser_bpool_destroy(p_iser_conn->spare_send_data_pool); + p_iser_conn->spare_send_data_pool = NULL; } - /* Return the memory to the slab */ - kmem_cache_free(ig.conn_mem_cache, p_iser_conn); + sock_release(iser_conn_to_sock(p_iser_conn)); } -} /* iser_conn_free */ +} /* iser_conn_release */ /** * iser_complete_conn_termination - Checks if the conn @@ -631,9 +947,7 @@ /* If the conn was terminated asynchronously, */ /* notify the upper layer */ 
if (cur_conn_state == ISER_CONN_ASYNC_TERM) { - p_iser_conn->p_entity-> - api_cb.conn_terminate_notify(p_iser_conn-> - iscsi_conn_h); + ig.api_cb.conn_term_notify(p_iser_conn->iscsi_conn_h); } /* Free all tasks in this conn */ @@ -647,15 +961,11 @@ } spin_unlock(&p_iser_conn->conn_lock); - /* Destroy the conn finally */ - iser_conn_free(p_iser_conn); - if (cur_conn_state == ISER_CONN_SYNC_TERM) { - /* If the conn was terminated syncronously, wake up - the upper layer (waiting on a sleep queue) */ wake_up_interruptible(&p_iser_conn->disconnect_wait_q); + } else { + iser_conn_release(p_iser_conn); } - retval = 0; } else { ITRACE(ISER_TRACE_CONN, @@ -665,7 +975,6 @@ send_buf_count, iser_conn_get_state_name(p_iser_conn)); retval = -1; } - return retval; } /* iser_complete_conn_termination */ Index: iser_utils.c =================================================================== --- iser_utils.c (revision 3404) +++ iser_utils.c (working copy) @@ -35,8 +35,8 @@ #include #include #include -#include /* struct iovec */ -#include /* struct scatterlist */ +#include +#include #include "iser.h" #include "iser_utils.h" @@ -134,7 +134,6 @@ */ void hash_delete_iser_task(struct iser_task *it) { - spin_lock(&ig.task_hash.lock); if (!list_empty(&it->hash_list)) list_del_init(&it->hash_list); @@ -143,128 +142,10 @@ } /* hash_delete_iser_task */ /* --------------------------------------------------------------------- - * ISER CONNECTION-SPECIFIC HASH MANAGEMENT + * BUFFERS * ------------------------------------------------------------------ */ /** - * hash_add_iser_conn - Add iSER conn descriptor to the - * conns hash table - */ -void hash_add_iser_conn(struct iser_connection *iser_conn) -{ - int hash_val; - - hash_val = hash_func((u32) (long)iser_conn->iscsi_conn_h); - - spin_lock(&ig.conn_hash.lock); - INIT_LIST_HEAD(&iser_conn->hash_list); - list_add_tail(&iser_conn->hash_list, - &(ig.conn_hash.bucket_head[hash_val])); - spin_unlock(&ig.conn_hash.lock); - -} /* hash_add_iser_conn */ 
- -/** - * hash_find_iser_conn - Find am iSER conn descriptor - * in the conns hash table. - * - * Use conn handle supplied by iSCSI as the key. - * - * returns found conn decriptor or NULL. - */ -struct iser_connection *hash_find_iser_conn(void *iscsi_conn_h) -{ - int hash_val; - struct list_head *p_bucket; - struct list_head *p_list; - struct iser_connection *iser_conn = NULL; - - hash_val = hash_func((u32) (long)iscsi_conn_h); - p_bucket = &(ig.conn_hash.bucket_head[hash_val]); - - spin_lock(&ig.conn_hash.lock); - p_list = p_bucket->next; - while (p_list != p_bucket) { - iser_conn = - list_entry(p_list, struct iser_connection, hash_list); - if (iser_conn->iscsi_conn_h == iscsi_conn_h) - break; - iser_conn = NULL; - p_list = p_list->next; - } - spin_unlock(&ig.conn_hash.lock); - - return iser_conn; -} /* hash_find_iser_conn */ - -/* - * Removes an iSER conn from the conns hash - */ -void hash_delete_iser_conn(struct iser_connection *iser_conn) -{ - - spin_lock(&ig.conn_hash.lock); - if (!list_empty(&iser_conn->hash_list)) { - list_del_init(&iser_conn->hash_list); - } - spin_unlock(&ig.conn_hash.lock); - -} /* hash_delete_iser_conn */ - -/** - * iser_get_data_total_length - Calculates the total buffer - * length for different buffer - * descriptor types - * - * returns total buffer length in bytes - */ -unsigned long iser_get_data_total_length(struct iser_data_buf *p_data, - int skip, int count) -{ - unsigned long total_len = 0; - - if (p_data == NULL) { - IPANIC("NULL data buffer descriptor\n"); - } - switch (p_data->type) { - case ISER_BUF_TYPE_SINGLE: - total_len = p_data->size; - break; - - case ISER_BUF_TYPE_SCATTERLIST:{ - struct scatterlist *p_sg = - (struct scatterlist *)p_data->p_buf; - int i, last; - - if (p_sg == NULL) { - IPANIC("NULL data buffer's sglist\n"); - } - - last = skip + count; - /* check for last>p_mem->size error ? 
*/ - for (i = skip; i < last; i++) { - total_len += (unsigned long)p_sg[i].length; - } - break; - } - - default: - IPANIC("Unsupported buffer type: %d\n", p_data->type); - break; - } - - ITRACE(ISER_TRACE_PHYS_MEM_REG, - "calculated total length=%ld for p_data->type = %d\n", - total_len, p_data->type); - /* total_len is expected to be > 0 */ - if (total_len == 0) { - IPANIC("total_len is 0\n"); - } - - return total_len; -} /* iser_get_data_total_length */ - -/** * iser_single_virt_to_phys - Translates virtual addresses from a * single pointer to * physical addresses in a user-supplied output array @@ -290,7 +171,7 @@ } ITRACE(ISER_TRACE_PHYS_MEM_REG, - "Translating data: 0x%p, single virt: " "0x%p, data size: %d\n", + "Translating data:0x%p, single virt:0x%p, data sz: %d\n", p_data, p_data->p_buf, p_data->size); /* compute the offset of first element */ @@ -310,8 +191,8 @@ for (i = 0, page = fpage; page < lpage; page += PAGE_SIZE, i++) { p_phys->addrs[i] = page; ITRACE(ISER_TRACE_PHYS_MEM_REG, - "SINGLE VIRT ADDED page[%d]=0x%lX " "at pmt %p\n", i, - (long)page, p_phys); + "SINGLE VIRT ADDED page[%d]=0x%lX at phys_desc %p\n", + i, (long)page, p_phys); } p_phys->data_size = p_data->size; @@ -322,12 +203,13 @@ /** * iser_sglist_virt_to_phys - Translates scatterlist entries to * physical addresses + * * returns the length of resulting physical address array (may be less - * than the original - * due to possible compaction). + * than the original due to possible compaction). 
*/ int iser_sglist_virt_to_phys(struct iser_data_buf *p_data, - struct iser_phys_mem *p_phys, int skip, int count) + struct iser_phys_mem *p_phys, + int skip, int count) { struct scatterlist *p_sg; unsigned int cur_phys = 0; @@ -382,16 +264,17 @@ return cur_phys; } /* iser_sglist_virt_to_phys */ -#define MASK_4K ((1UL << 12) - 1) /* 0xFFF */ -#define IS_PAGE_ALIGNED(addr) ((((unsigned long)addr) & ~MASK_4K) == 0) +#define MASK_4K ((1UL << 12) - 1) /* 0xFFF */ +#define IS_4K_ALIGNED(addr) ((((unsigned long)addr) & MASK_4K) == 0) /** - * iser_iovec_aligned_length - Tries to determine the maximal + * iser_data_aligned_length - Tries to determine the maximal * correctly aligned sub-list of a * scatter-gather list of memory buffers * returns the number of entries which are aligned correctly */ -unsigned int iser_iovec_aligned_length(struct iser_data_buf *p_data, int skip) +unsigned int iser_data_aligned_length(struct iser_data_buf *p_data, + int skip) { unsigned int ret_len = 0; int i, count; @@ -427,7 +310,7 @@ "Checking sg iobuf end address " "0x%08lX\n", end_addr); if ((i + 1 < p_data->size) - && !IS_PAGE_ALIGNED(end_addr)) { + && !IS_4K_ALIGNED(end_addr)) { ret_len = count + 1; break; } @@ -447,15 +330,15 @@ ret_len, p_data->size, p_data); return ret_len; -} /* iser_iovec_aligned_length */ +} /* iser_data_aligned_length */ /** - * iser_iovec_contig_length - Tries to determine the maximal + * iser_data_contig_length - Tries to determine the maximal * contiguous sub-list of a * scatter-gather list of memory buffers * returns the number of entries which are aligned correctly */ -unsigned int iser_iovec_contig_length(struct iser_data_buf *p_data, int skip, +unsigned int iser_data_contig_length(struct iser_data_buf *p_data, int skip, uint64_t * start_addr, int *size) { unsigned int ret_len = 0; @@ -509,75 +392,25 @@ return ret_len; } -/** - * iser_phys_to_virt - Translates physical address to virtual kernel address - * returns virtual kernel address - */ -inline void 
*iser_phys_to_virt(void *phys_addr) -{ - return phys_to_virt((unsigned long)phys_addr); -} /** - * iser_page_to_virt - Translates page descriptor to virtual kernel address - * returns virtual kernel address - */ -inline void *iser_page_to_virt(struct page *page) -{ - return phys_to_virt(page_to_phys(page)); -} - -/** - * iser_iovec_virt_entry_addr - Retrieves address of a single IOVEC_VIRT entry - * returns IOVEC_VIRT entry address - */ -inline void *iser_iovec_virt_entry_addr(void *p_iovec_virt, int i) -{ - struct iovec *p_iovec; - p_iovec = (struct iovec *)p_iovec_virt; - return p_iovec[i].iov_base; -} - -/** - * iser_iovec_phys_entry_to_virt - Translates address of a single - * IOVEC_PHYS entry to virtual kernel address - * returns IOVEC_PHYS entry virtual address - */ -inline void *iser_iovec_phys_entry_to_virt(void *p_iovec_phys, int i) -{ - struct iovec *p_iovec; - p_iovec = (struct iovec *)p_iovec_phys; - return phys_to_virt((unsigned long)p_iovec[i].iov_base); -} - -/** * iser_sglist_entry_to_virt - Translates address of a single * SCATTERLIST entry to virtual kernel address * returns SCATTERLIST entry virtual address */ -inline void *iser_sglist_entry_to_virt(void *p_sglist, int i) +static inline void *iser_sglist_entry_to_virt(void *p_sglist, int i) { struct scatterlist *p_sg; p_sg = (struct scatterlist *)p_sglist; return phys_to_virt(page_to_phys(p_sg[i].page) + p_sg[i].offset); } -/** - * iser_iovec_entry_len - Retrieves length of a single IOVEC_PHYS/VIRT entry - * returns IOVEC_PHYS/VIRT entry lengths - */ -inline unsigned long iser_iovec_entry_len(void *p_iovec_arr, int i) -{ - struct iovec *p_iovec; - p_iovec = (struct iovec *)p_iovec_arr; - return (unsigned long)p_iovec[i].iov_len; -} /** * iser_sglist_entry_len - Retrieves length of a single SCATTERLIST entry * returns SCATTERLIST entry lengths */ -inline unsigned long iser_sglist_entry_len(void *p_sglist, int i) +static inline unsigned long iser_sglist_entry_len(void *p_sglist, int i) { struct 
scatterlist *p_sg; p_sg = (struct scatterlist *)p_sglist; @@ -592,9 +425,6 @@ struct iser_data_buf *p_src_data, unsigned long *p_total_copied_sz) { - unsigned char *chunk_addr = 0; - unsigned int chunk_size = 0; - int i; if (p_src_data->type == ISER_BUF_TYPE_SINGLE) { ITRACE(ISER_TRACE_PHYS_MEM_REG, "copy SINGLE virt: 0x%p -> 0x%p, " "sz: %d\n", @@ -603,66 +433,90 @@ if (p_total_copied_sz != NULL) { *p_total_copied_sz = p_src_data->size; } - return; } + else { + unsigned char *chunk_addr = 0; + unsigned int chunk_size = 0; + unsigned long total_sz = 0; + int i; - if (p_total_copied_sz != NULL) { - *p_total_copied_sz = 0; - } - - for (i = 0; i < p_src_data->size; i++) { - switch (p_src_data->type) { - case ISER_BUF_TYPE_SCATTERLIST: + for (i = 0; i < p_src_data->size; i++) { chunk_addr = (unsigned char *) iser_sglist_entry_to_virt(p_src_data->p_buf, i); chunk_size = iser_sglist_entry_len(p_src_data->p_buf, i); - break; - - default: - IPANIC("Unexpected buffer type\n"); - break; + ITRACE(ISER_TRACE_PHYS_MEM_REG, + "copy SG[%d]: 0x%p -> 0x%p, sz: %d\n", + i, chunk_addr, p_dst_buf, chunk_size); + memcpy(p_dst_buf, chunk_addr, chunk_size); + p_dst_buf += chunk_size; + total_sz += chunk_size; } - - ITRACE(ISER_TRACE_PHYS_MEM_REG, - "copy IOVEC[%d] virt: 0x%p -> 0x%p, sz: %d\n", i, - chunk_addr, p_dst_buf, chunk_size); - memcpy(p_dst_buf, chunk_addr, chunk_size); - p_dst_buf += chunk_size; if (p_total_copied_sz != NULL) { - *p_total_copied_sz += chunk_size; + *p_total_copied_sz = total_sz; } } } +void iser_data_desc_dump(struct iser_data_buf *p_data) +{ + if (p_data->type == ISER_BUF_TYPE_SINGLE) { + IINFO("single addr:0x%p sz:%d\n", + p_data->p_buf, p_data->size); + } else { + struct scatterlist *p_sg = (struct scatterlist *)p_data->p_buf; + int i; + for (i = 0; i < p_data->size; i++) { + IINFO("sg[%d] dma_addr:0x%lX page:0x%p off:%d sz:%d\n", + i, (unsigned long)p_sg[i].dma_address, + p_sg[i].page, + p_sg[i].offset, + p_sg[i].length); + } + } +} /* iser_data_desc_dump 
*/ + /** - * iser_data_buf_get_type_name - Retrieves name of a iSER data buffer type - * returns name string + * iser_sg_subset_len - Calculates the total buffer length + * for different buffer types + * + * returns total buffer length in bytes */ -static char *iser_data_bufype_name[ISER_BUF_TYPES_NUM + 1] = { - "SINGLE", /* ISER_BUF_TYPE_SINGLE */ - "SCATTERLIST", /* ISER_BUF_TYPE_SCATTERLIST */ - - "ILLEGAL" /* ISER_BUF_TYPES_NUM */ -}; - -char *iser_data_buf_get_type_name(struct iser_data_buf *p_data) +unsigned long iser_sg_subset_len(struct iser_data_buf *p_data, + int skip_entries, + int count_entries) { - if (p_data != NULL && p_data->type < ISER_BUF_TYPES_NUM) { - return iser_data_bufype_name[p_data->type]; - } else { - return iser_data_bufype_name[ISER_BUF_TYPES_NUM]; + struct scatterlist *p_sg = (struct scatterlist *)p_data->p_buf; + unsigned long total_len = 0; + int last_entry; + int i; + + if (p_sg == NULL) { + IPANIC("NULL data buffer's sglist\n"); } -} /* iser_data_buf_get_type_name */ + last_entry = skip_entries + count_entries; + /* check for last>p_mem->size error ? 
*/ + for (i = skip_entries; i < last_entry; i++) { + total_len += (unsigned long)p_sg[i].length; + } + ITRACE(ISER_TRACE_PHYS_MEM_REG, + "calculated total length=%ld for p_data->type = %d\n", + total_len, p_data->type); + /* total_len is expected to be > 0 */ + if (total_len == 0) { + IPANIC("total_len is 0\n"); + } + return total_len; +} /* iser_sg_subset_len */ /** - * iser_alloc_phys_mem - alloc phys_mem structure and page array + * iser_alloc_phys_desc - alloc phys_mem structure and page array * large enough to contain the translated iser data buffer */ -struct iser_phys_mem *iser_alloc_phys_mem(struct iser_data_buf *p_data, +struct iser_phys_mem *iser_alloc_phys_desc(struct iser_data_buf *p_data, int skip, int count) { - struct iser_phys_mem *pmt; + struct iser_phys_mem *phys_desc; int pages, total_size; /* compute number of elements that should be in the array */ @@ -670,49 +524,52 @@ IPANIC("count == 0 - do not expect that\n"); } - total_size = iser_get_data_total_length(p_data, skip, count); + if (p_data->type == ISER_BUF_TYPE_SINGLE) { + total_size = p_data->size; + } + else { + total_size = iser_sg_subset_len(p_data, skip, count); + } pages = total_size / PAGE_SIZE + 2; /* alloc a structure and the array */ - pmt = kmalloc(sizeof(struct iser_phys_mem) + sizeof(uint64_t) * pages, - GFP_KERNEL | __GFP_NOFAIL); - if (pmt == NULL) { + phys_desc = kmalloc(sizeof(struct iser_phys_mem) + + (sizeof(uint64_t) * pages), + GFP_KERNEL | __GFP_NOFAIL); + if (phys_desc == NULL) { IPANIC("Failed to alloc phys_mem for data size=%d " "in approximately %d pages\n", total_size, pages); return NULL; } - pmt->addrs = (uint64_t *) (pmt + 1); - pmt->data_size = total_size; - pmt->length = 0; - pmt->offset = 0; + phys_desc->addrs = (uint64_t *) (phys_desc + 1); + phys_desc->data_size = total_size; + phys_desc->length = 0; + phys_desc->offset = 0; ITRACE(ISER_TRACE_PHYS_MEM_REG, - "Allocated phys_mem %p for " - "data size=%d in approximately %d pages\n", pmt, total_size, - 
pages); + "Allocated phys_mem %p for size=%d in appr. %d pages\n", + phys_desc, total_size, pages); - return pmt; -} + return phys_desc; +} /* iser_alloc_phys_desc */ -void iser_free_phys_mem(struct iser_phys_mem *pmt) +void iser_free_phys_desc(struct iser_phys_mem *phys_desc) { - - if (pmt == NULL) { + if (phys_desc == NULL) { IPANIC("Called with NULL phys_mem\n"); } + ITRACE(ISER_TRACE_PHYS_MEM_REG, "Freeing phys_mem %p\n", phys_desc); + kfree(phys_desc); +} /* iser_free_phys_desc */ - ITRACE(ISER_TRACE_PHYS_MEM_REG, "Freeing phys_mem %p\n", pmt); - kfree(pmt); - -} - /** - * iser_convert_mem_to_phys - expand iov/sg elements into an + * iser_data_convert_to_phys - expand iov/sg elements into an * array of physical addresses * @return number of addresses in the expanded array */ -int iser_convert_mem_to_phys(struct iser_data_buf *p_data, - struct iser_phys_mem *p_phys, int skip, int count) +int iser_data_convert_to_phys(struct iser_data_buf *p_data, + struct iser_phys_mem *p_phys, + int skip, int count) { int phys_vec_len = 0; @@ -742,4 +599,5 @@ p_phys->length = phys_vec_len; return phys_vec_len; -} +} /* iser_data_convert_to_phys */ + Index: iser_kdapl.h =================================================================== --- iser_kdapl.h (revision 3404) +++ iser_kdapl.h (working copy) @@ -80,7 +80,7 @@ ISER_EVD_MAX_REQ_DTOS) #define ISER_MAX_ASYNC_QLEN 8 -#define ISER_MAX_TOTAL_QLEN (ISER_MAX_ENTITIES * ISER_MAX_QLEN) +#define ISER_MAX_TOTAL_QLEN ISER_MAX_QLEN int iser_create_ia_pz_evd(struct iser_adaptor *p_adaptor); @@ -97,13 +97,6 @@ int iser_event_handler_thread(void *arg_unused); -int iser_register_virt_mem(struct iser_adaptor *p_adaptor, - void *p_buf, - unsigned long data_sz, - enum dat_mem_priv_flags priv_flags, - struct iser_mem_handles *mem_reg, - DAT_RMR_CONTEXT * rmr_ctxt); - int iser_register_phys_mem(struct iser_adaptor *p_adaptor, struct iser_phys_mem *p_phys_vec, enum dat_mem_priv_flags priv_flags, Index: iser_initiator.c 
=================================================================== --- iser_initiator.c (revision 3404) +++ iser_initiator.c (working copy) @@ -31,12 +31,6 @@ * */ -#include -#include -#include -#include -#include - #include "iser.h" #include "iser_conn.h" #include "iser_task.h" @@ -45,123 +39,15 @@ #include "iser_memory.h" #include "iser_utils.h" -struct iser_global ig; - -iser_status iser_conn_establish(void *api_h, - void *iscsi_conn_h, - struct sockaddr_in *dst_addr, - struct sockaddr_in *src_addr); - -iser_status iser_alloc_conn_res(void *conn_h, struct iser_conn_res *conn_res); - -iser_status iser_enable_datamover(void *iser_conn_h, void *tport_conn); - -iser_status iser_conn_accept(void *conn_req_h, void *iscsi_conn_h, int accept); - -iser_status iser_send_control(void *iscsi_conn_h, - struct iser_send_pdu *p_ctrl_pdu); - -void iser_adaptor_add_conn(struct iser_adaptor *p_iser_adaptor, - struct iser_connection *p_iser_conn); - -void iser_entity_add_conn(struct iser_entity *p_entity, - struct iser_connection *p_iser_conn); - -int iser_adaptor_init(struct iser_adaptor *p_iser_adaptor, char *name); - -int iser_adaptor_release(struct iser_adaptor *p_iser_adaptor); -int ig_init(void); -int ig_release(void); - /** - * iser_conn_establish - Initiates conn establishment process. 
- * - * returns iSER status (ISER_SUCCESS, ISER_FAILURE, ISER_ILLEGAL_PARAM) - */ -iser_status iser_conn_establish(void *api_h, - void *iscsi_conn_h, - struct sockaddr_in *dst_addr, - struct sockaddr_in *src_addr) -{ - struct iser_entity *p_entity; - struct iser_connection *p_iser_conn; - struct iser_adaptor *p_adaptor; - char bufpool_name[128]; - int ep_create_ret; - - if (api_h == NULL || api_h == ISER_INVALID_API_H) { - printk(KERN_ERR PFX "API Handle is illegal: 0x%p\n", api_h); - return ISER_ILLEGAL_PARAM; - } - p_entity = (struct iser_entity *)api_h; - - /* Allocate and initialize iSER conn structure */ - p_iser_conn = iser_conn_alloc(); - if (p_iser_conn == NULL) { - printk(KERN_ERR PFX "iser_conn_alloc failed\n"); - goto iser_conn_establish_failure; - } - - p_iser_conn->iscsi_conn_h = iscsi_conn_h; - p_iser_conn->p_entity = p_entity; - p_adaptor = &ig.adaptor[0]; - - iser_adaptor_add_conn(p_adaptor, p_iser_conn); - iser_entity_add_conn(p_entity, p_iser_conn); - hash_add_iser_conn(p_iser_conn); - - /* Allocate post-receive buffers for the login phase */ - sprintf(bufpool_name, "%s-login-pdu-data", p_entity->provider_name); - p_iser_conn->post_recv_pool = - iser_large_bpool_create(p_iser_conn->p_adaptor, bufpool_name, - ISER_LOGIN_PHASE_PDU_DATA_LEN, 4, 0); - if (p_iser_conn->post_recv_pool == NULL) { - printk(KERN_ERR PFX "login pool = NULL\n"); - goto iser_conn_establish_failure; - } - - /* During the login phase data for sent PDUs is drawn from */ - /* the same buffer pool as recv */ - p_iser_conn->send_data_pool = p_iser_conn->post_recv_pool; - - ep_create_ret = iser_create_ep(p_iser_conn); - if (ep_create_ret != 0) - goto iser_conn_establish_failure; - - if (dst_addr != NULL) { - sprintf(p_iser_conn->name, "%d.%d.%d.%d", - NIPQUAD(dst_addr->sin_addr)); - } else { - sprintf(p_iser_conn->name, "Unknown"); - } - ITRACE(ISER_TRACE_CONN, "Connecting to: %s, port 0x%x\n", - p_iser_conn->name, dst_addr->sin_port); - - atomic_set(&p_iser_conn->state, 
ISER_CONN_PENDING); - - if (iser_connect(p_iser_conn, dst_addr, src_addr) != 0) { - printk(KERN_ERR PFX "iser_connect failed\n"); - goto iser_conn_establish_failure; - } - return ISER_SUCCESS; - - iser_conn_establish_failure: - if (p_iser_conn) { - atomic_set(&p_iser_conn->state, ISER_CONN_DOWN); - iser_conn_free(p_iser_conn); - } - printk(KERN_ERR PFX "%s: failed\n", __FUNCTION__); - return ISER_FAILURE; -} /* iser_conn_establish */ - -/** * iser_reg_rdma_mem - Registers memory * intended for RDMA, obtaining RMR context * * returns 0 on success, -1 on failure */ int iser_reg_rdma_mem(struct iser_task *p_iser_task, - enum iser_data_dir cmd_dir, struct iser_data_buf *p_mem) + enum iser_data_dir cmd_dir, + struct iser_data_buf *p_mem) { struct iser_dto *p_dto = NULL; struct iser_regd_buf *p_rdma_regd = NULL; @@ -187,40 +73,26 @@ cmd_dir, p_iser_task); } - p_phys_vec = iser_alloc_phys_mem(p_mem, 0, p_mem->size); + p_phys_vec = iser_alloc_phys_desc(p_mem, 0, p_mem->size); if (p_mem->type != ISER_BUF_TYPE_SINGLE) { ITRACE(ISER_TRACE_SEND_CONTROL, "converting non-single to phys\n"); - /* Determine an aligned stretch and check that - the entire vector is aligned correctly */ - aligned_len = iser_iovec_aligned_length(p_mem, 0); + aligned_len = iser_data_aligned_length(p_mem, 0); if (aligned_len != p_mem->size) { - int i; - /* Get the input iovec array */ - struct iovec *p_iovec_phys = - (struct iovec *)p_mem->p_buf;; - - printk(KERN_ERR PFX - "Can't register for rdma because of alignment " - "violation\n"); - for (i = 0; i < p_mem->size; i++) { - IINFO("iov[%d] addr=0x%lX size=%ld\n", - i, - (unsigned long)p_iovec_phys[i].iov_base, - (unsigned long)p_iovec_phys[i].iov_len); - } + printk(KERN_ERR PFX "Can't register for rdma, " + "alignment violation\n"); + iser_data_desc_dump(p_mem); ret_val = -1; goto register_rdma_memory_exit; } - phys_vec_len = iser_convert_mem_to_phys(p_mem, p_phys_vec, 0, - aligned_len); + phys_vec_len = iser_data_convert_to_phys(p_mem, p_phys_vec, + 
0, aligned_len); } else { ITRACE(ISER_TRACE_SEND_CONTROL, "converting single to phys\n"); - phys_vec_len = - iser_convert_mem_to_phys(p_mem, p_phys_vec, 0, 0); + phys_vec_len = iser_data_convert_to_phys(p_mem, p_phys_vec, + 0, 0); } - /* Start initializing the RDMA regsitered memory descriptor */ memset(p_rdma_regd, 0, sizeof(struct iser_regd_buf)); ITRACE(ISER_TRACE_CONN, "memset at 0x%p to 0x%lX\n", p_rdma_regd, @@ -259,7 +131,7 @@ spin_unlock(&p_iser_task->task_lock); register_rdma_memory_exit: - iser_free_phys_mem(p_phys_vec); + iser_free_phys_desc(p_phys_vec); return ret_val; } /* iser_reg_rdma_mem */ @@ -439,49 +311,42 @@ * operating in the remote iSCSI node. The iSER datamover layer sends * the PDU after adding its own protocol header. */ -iser_status iser_send_control(void *iscsi_conn_h, - struct iser_send_pdu * p_ctrl_pdu) +int iser_send_control(void *iser_conn_h, + struct iser_send_pdu * p_ctrl_pdu) { - iser_status iser_ret = ISER_SUCCESS; + struct iser_connection *p_iser_conn = iser_conn_h; union iser_pdu_bhs *p_bhs; - struct iser_connection *p_iser_conn = NULL; struct iser_dto *p_send_dto = NULL; struct iser_task *p_iser_task = NULL; unsigned long buf_offset; unsigned long data_seg_len; unsigned int itt; unsigned char opcode; + int iser_err = 0; int ret_val; - /* Check the parameters correctness */ + if (p_iser_conn == NULL) { + printk(KERN_ERR PFX "NULL conn handle\n"); + iser_err = -EINVAL; + goto send_control_error; + } if (p_ctrl_pdu == NULL) { printk(KERN_ERR PFX "NULL control PDU descriptor\n"); - iser_ret = ISER_ILLEGAL_PARAM; + iser_err = -EINVAL; goto send_control_error; } p_bhs = p_ctrl_pdu->p_bhs; if (p_bhs == NULL) { printk(KERN_ERR PFX "NULL BHS in control PDU descriptor\n"); - iser_ret = ISER_ILLEGAL_PARAM; + iser_err = -EINVAL; goto send_control_error; } - /* Look up the conn */ - p_iser_conn = hash_find_iser_conn(iscsi_conn_h); - - if (p_iser_conn == NULL) { - printk(KERN_ERR PFX "Failed to find conn, iscsi_conn_h: %lX\n", - (unsigned 
long)iscsi_conn_h); - iser_ret = ISER_INVALID_CONN; - goto send_control_error; - } if (atomic_read(&p_iser_conn->state) != ISER_CONN_UP) { - printk(KERN_ERR PFX - "Failed to send_control as conn is not up, " - "iscsi_conn_h: %lX, p_conn: 0x%p\n", - (unsigned long)iscsi_conn_h, p_iser_conn); + printk(KERN_ERR PFX "Failed to send, conn: 0x%p is not up\n", + p_iser_conn); p_iser_conn = NULL; /* Inhibits conn shutdown */ - iser_ret = ISER_FAILURE; + iser_err = -EPERM; goto send_control_error; } @@ -491,7 +356,7 @@ if (p_send_dto == NULL) { printk(KERN_ERR PFX "Failed to create send DTO, conn: 0x%p\n", p_iser_conn); - iser_ret = ISER_FAILURE; + iser_err = -ENOMEM; goto send_control_error; } @@ -509,7 +374,7 @@ /* Retrieve ExpectedDataTransferLength from the Command BHS */ - edtl = ntohl(p_bhs->dword.other[ISCSI_CMD_FIELD_EDTL]); + edtl = ntohl(p_bhs->dword.other[ISCSI_CMD_F_EDTL]); /* Allocate new task descriptor */ p_iser_task = iser_task_alloc(p_iser_conn, itt); @@ -517,7 +382,7 @@ printk(KERN_ERR PFX "Failed to alloc iser task, conn: " "0x%p, itt: %d\n", p_iser_conn, itt); - iser_ret = ISER_FAILURE; + iser_err = -ENOMEM; goto send_control_error; } p_send_dto->p_task = p_iser_task; @@ -532,11 +397,11 @@ edtl, p_iser_header); if (ret_val) { - iser_ret = ISER_FAILURE; + iser_err = -EINVAL; goto send_control_error; } - } - /* read cmd */ + } /* read cmd */ + if (IS_SET_ISCSI_FLAG_WRITE_CMD(p_bhs->byte.flags)) { ret_val = iser_prepare_write_cmd(p_iser_task, p_ctrl_pdu, @@ -544,10 +409,10 @@ edtl, p_iser_header); if (ret_val) { - iser_ret = ISER_FAILURE; + iser_err = -EINVAL; goto send_control_error; } - } /* write cmd */ + } /* write cmd */ } break; /* ISCSI_OP_SCSI_CMD */ @@ -557,15 +422,14 @@ } /* Retrive BufferOffset from Data-OUT BHS */ - buf_offset = - ntohl(p_ctrl_pdu->p_bhs-> - dword.other[ISCSI_DATA_OUT_FIELD_OFFSET]); + buf_offset = ntohl( + p_ctrl_pdu->p_bhs->dword.other[ISCSI_DOUT_F_OFFSET]); /* Find task in the hash */ p_iser_task = 
hash_find_iser_task(p_iser_conn, itt); if (p_iser_task == NULL) { printk(KERN_ERR PFX "Task not found, itt=%d\n", itt); - iser_ret = ISER_INVALID_ITT; + iser_err = -EINVAL; goto send_control_error; } if (!p_iser_task->dir[ISER_DIR_OUT]) { @@ -589,7 +453,7 @@ "len (%ld), itt=%d\n", buf_offset, data_seg_len, p_iser_task->data_len[ISER_DIR_OUT], itt); - iser_ret = ISER_ILLEGAL_PARAM; + iser_err = -EINVAL; goto send_control_error; } @@ -615,29 +479,22 @@ /* Allocate data regd buffer and copy the user data */ iser_dto_copy_send_data(p_send_dto, &p_ctrl_pdu->data.tx.buf); - - /* We compare the actual buf size (total_buf_sz) - * to the dsl in the BHS (data_seg_len). - * We assume that padding (up to 4 bytes) - * may be used - */ } break; default: printk(KERN_ERR PFX "Unsupported opcode = %d\n", opcode); - iser_ret = ISER_ILLEGAL_PARAM; + iser_err = -EINVAL; goto send_control_error; break; + } - } /* switch p_bhs->byte.opcode */ - /* Post-Receive buffer for a reply PDU */ /* ToDo: account for different types of NOP-OUT (req, resp) */ if (opcode != ISCSI_OP_DATA_OUT) { if (iser_post_receive_control(p_iser_conn) != 0) { printk(KERN_ERR PFX "post_rcv_buff failed!\n"); - iser_ret = ISER_FAILURE; + iser_err = -ENOMEM; /* ISER_ERROR: 12.1.3.1 (insufficient res) */ goto send_control_error; } @@ -653,11 +510,11 @@ } if (iser_start_dto(p_send_dto) != 0) { printk(KERN_ERR PFX "Failed to start send DTO\n"); - iser_ret = ISER_FAILURE; + iser_err = -EIO; goto send_control_error; } - return ISER_SUCCESS; + return 0; send_control_error: if (p_send_dto != NULL) { @@ -671,14 +528,19 @@ iser_conn_async_terminate(p_iser_conn); } - return iser_ret; + return iser_err; } /* iser_send_control */ +/** + * iser_rcv_dto_task: task-related recv DTO completion + * + */ void -iser_task_rcv_dto(struct iser_dto *p_dto, +iser_rcv_dto_task(struct iser_dto *p_dto, struct iser_connection *p_iser_conn, - int itt, unsigned char opcode) + int itt, + unsigned char opcode) { struct iser_task *p_iser_task = NULL; 
unsigned int rx_count; @@ -693,11 +555,9 @@ return; p_iser_task = hash_find_iser_task(p_iser_conn, itt); - if (p_iser_task == NULL) return; - /* The rcv buffer belongs now to the task */ p_dto->p_task = p_iser_task; /* Account for the used up posted recv buffer */ iser_task_recvd_pdu_count_inc(p_iser_task); @@ -716,18 +576,21 @@ itt, rx_count); } iser_task_ctrl_notify_count_inc(p_iser_task); -} +} /* iser_rcv_dto_task */ -void iser_dto_rcv(struct iser_dto *p_dto, unsigned long dto_xfer_len) +/** + * iser_rcv_dto_completion - recv DTO completion + * + */ +void iser_rcv_dto_completion(struct iser_dto *p_dto, + unsigned long dto_xfer_len) { + struct iser_connection *p_iser_conn = p_dto->p_conn; union iser_pdu_bhs *p_bhs; - struct iser_connection *p_iser_conn; unsigned char opcode; unsigned int itt; - p_iser_conn = p_dto->p_conn; - - /* Number of posted receive buffers has decreased by one */ + /* Number of posted receive buffers has decreased by one */ atomic_dec(&p_iser_conn->post_recv_buf_count); /* Create descriptor for the received PDU */ @@ -739,24 +602,24 @@ /* Retrieve opcode */ opcode = p_bhs->byte.opcode & ISCSI_OPCODE_MASK; - iser_task_rcv_dto(p_dto, p_iser_conn, itt, opcode); + iser_rcv_dto_task(p_dto, p_iser_conn, itt, opcode); /* Notify iSCSI layer */ iser_conn_add_recv_ctrl_dto(p_iser_conn, p_dto); ITRACE(ISER_TRACE_CTRL_NOTIFY, "Control notify, DTO:0x%p, as PDU:0x%p\n", p_dto, &p_dto->as.recv.pdu); - p_iser_conn->p_entity->api_cb. 
- control_notify(p_iser_conn->iscsi_conn_h, &p_dto->as.recv.pdu); -} -void iser_dto_snd(struct iser_dto *p_dto, unsigned long dto_xfer_len) + ig.api_cb.control_notify(p_iser_conn->iscsi_conn_h, + &p_dto->as.recv.pdu); +} /* iser_rcv_dto_completion */ + +void iser_snd_dto_completion(struct iser_dto *p_dto, + unsigned long dto_xfer_len) { + struct iser_connection *p_iser_conn = p_dto->p_conn; struct iser_task *p_iser_task = NULL; - struct iser_connection *p_iser_conn; - p_iser_conn = p_dto->p_conn; - ITRACE(ISER_TRACE_SEND_DTO, "Initiator, Data sent p_dto=0x%p\n", p_dto); /* One posted send DTO less */ @@ -773,724 +636,26 @@ iser_task_post_send_count_dec_and_test(p_iser_task)) { iser_task_free(p_iser_task); } -} +} /* iser_snd_dto_completion */ /** * iser_dto_completion - Handle a successful * DTO completion at Initiator * - * returns 0 on success, -1 on failure */ -int iser_dto_completion(struct iser_dto *p_dto, unsigned long dto_xfer_len) +void iser_dto_completion(struct iser_dto *p_dto, + unsigned long dto_xfer_len) { - - if (p_dto->p_conn == NULL) { - IPANIC("NULL conn in p_dto: 0x%p\n", p_dto); - } - switch (p_dto->type) { case ISER_DTO_RCV: - iser_dto_rcv(p_dto, dto_xfer_len); + iser_rcv_dto_completion(p_dto, dto_xfer_len); break; - case ISER_DTO_SEND: /* handle sent messages */ - iser_dto_snd(p_dto, dto_xfer_len); + case ISER_DTO_SEND: + iser_snd_dto_completion(p_dto, dto_xfer_len); + break; default: IPANIC("Illegal initiator iSER DTO type: %d\n", p_dto->type); break; } - - return 0; } /* iser_dto_completion */ -/** - * iser_alloc_conn_res - iSER API. Implements - * Allocate_Connection_Resources primitive. - * - * Performs exchange of API functions. Allocates additional receive - * buffers. Performs static memory registration if instructed to do so. 
- * - * returns iSER status (ISER_SUCCESS, ISER_FAILURE, ISER_INVALID_CONN) - */ -iser_status iser_alloc_conn_res(void *iscsi_conn_h, - struct iser_conn_res * conn_res) -{ - struct iser_entity *p_entity; - struct iser_connection *p_iser_conn; - int uncache_buf_pools = 0; - iser_status iser_ret = ISER_SUCCESS; - char bufpool_name[ISER_BUF_POOL_NAME_SIZE]; - - p_entity = (struct iser_entity *)conn_res->api_h; - if (p_entity == NULL) { - printk(KERN_ERR PFX "NULL API handle\n"); - iser_ret = ISER_ILLEGAL_PARAM; - /* ISER_ERROR: 12.1.3.1 (insufficient res) */ - goto alloc_conn_res_exit; - } - - if (conn_res->max_recv_pdu_sz > 256 * 1024) { - printk(KERN_ERR PFX "max_recv_pdu_sz too large\n"); - iser_ret = ISER_ILLEGAL_PARAM; - /* ISER_ERROR: 12.1.3.1 (insufficient res) */ - goto alloc_conn_res_exit; - } - - p_iser_conn = hash_find_iser_conn(iscsi_conn_h); - if (p_iser_conn == NULL) { - printk(KERN_ERR PFX - "Could not find iSER conn for iscsi_h = 0x%p\n", - iscsi_conn_h); - iser_ret = ISER_INVALID_CONN; - /* ISER_ERROR: 12.1.3.1 (insufficient res) */ - goto alloc_conn_res_exit; - } - if (p_iser_conn->p_entity != p_entity) { - printk(KERN_ERR PFX "Entity in conn_res does not match " - "conn for iscsi_h = 0x%p\n", p_iser_conn->iscsi_conn_h); - iser_ret = ISER_ILLEGAL_PARAM; - /* ISER_ERROR: 12.1.3.1 (insufficient res) */ - goto alloc_conn_res_exit; - } - - p_iser_conn->max_outstand_cmds = conn_res->max_outstand_cmds; - - if (p_iser_conn->param.InitiatorRecvDataSegmentLength == - defaultInitiatorRecvDataSegmentLength) { - p_iser_conn->param.InitiatorRecvDataSegmentLength = - conn_res->max_recv_pdu_sz; - } else if (p_iser_conn->param.InitiatorRecvDataSegmentLength != - conn_res->max_recv_pdu_sz) { - printk(KERN_ERR PFX - "max_recv_pdu_sz(%d) does not match non-default value " - "of InitiatorRecvDataSegmentLength(%d)\n", - conn_res->max_recv_pdu_sz, - p_iser_conn->param.InitiatorRecvDataSegmentLength); - iser_ret = ISER_ILLEGAL_PARAM; - /* ISER_ERROR: 12.1.3.1 (insufficient 
res) */ - goto alloc_conn_res_exit; - } - - p_iser_conn->alloc_post_recv_bufs_num = ISER_EP_AVG_POST_RECV + 2; - p_iser_conn->initial_post_recv_bufs_num = ISER_INITIAL_POST_RECV + 2; - - ITRACE(ISER_TRACE_CONN, - "Max outst. cmds: %d, Allocate post recv bufs:" - "%d, Initially post: %d\n", - p_iser_conn->max_outstand_cmds, - p_iser_conn->alloc_post_recv_bufs_num, - p_iser_conn->initial_post_recv_bufs_num); - - /* Allocate post-receive buffers for the full-featured phase */ - snprintf(bufpool_name, ISER_BUF_POOL_NAME_SIZE, "%s-%s-post-recv", - p_entity->provider_name, p_iser_conn->name); - - p_iser_conn->spare_post_recv_pool = - iser_small_bpool_create(p_iser_conn->p_adaptor, - bufpool_name, - 128, - p_iser_conn->alloc_post_recv_bufs_num, - uncache_buf_pools); - if (p_iser_conn->spare_post_recv_pool == NULL) { - printk(KERN_ERR PFX "Failed to alloc the post receive buffer " - "pool for conn = 0x%p, iscsi_h = 0x%p\n", - p_iser_conn, p_iser_conn->iscsi_conn_h); - iser_ret = ISER_ILLEGAL_PARAM; - } - - /* Allocate send-data buffers for the full-featured phase */ - snprintf(bufpool_name, ISER_BUF_POOL_NAME_SIZE, "%s-%s-send-data", - p_entity->provider_name, p_iser_conn->name); - - /* ToDo: TargetMaxRecvDSL */ - p_iser_conn->spare_send_data_pool = - iser_large_bpool_create(p_iser_conn->p_adaptor, bufpool_name, - 8 * 1024, - ISER_MAX_NOP_OUT + ISER_MAX_TASK_MGT_REQ, - uncache_buf_pools); - if (p_iser_conn->spare_send_data_pool == NULL) { - printk(KERN_ERR PFX - "Failed to alloc the send data buffer pool " - "for conn = 0x%p, iscsi_h = 0x%p\n", p_iser_conn, - p_iser_conn->iscsi_conn_h); - iser_ret = ISER_ILLEGAL_PARAM; - } - - alloc_conn_res_exit: - return iser_ret; -} /* iser_alloc_conn_res */ - -/** - * iser_enable_datamover - iSER API. Implements - * Enable_Datamover primitive. - * - * Posts receive buffers. - * Upgrades the conn to RDMA-assisted. 
- * - * returns iSER status (ISER_SUCCESS, ISER_FAILURE, ISER_INVALID_CONN) - */ -iser_status iser_enable_datamover(void *iscsi_conn_h, void *tport_conn) -{ - iser_status iser_ret = ISER_SUCCESS; - struct iser_connection *p_iser_conn; - int i; - - /* Find the conn */ - p_iser_conn = hash_find_iser_conn(iscsi_conn_h); - if (p_iser_conn == NULL) { - iser_ret = ISER_INVALID_CONN; - /* ISER_ERROR: 12.1.3.1 (insufficient res) */ - goto enable_data_mover_exit; - } - - /* Check that there is no posted recv or send buffers left - */ - /* they must be consumed during the login phase */ - if (atomic_read(&p_iser_conn->post_recv_buf_count) != 0) { - IPANIC - ("Number of currently posted receive buffers non-zero\n"); - } - if (atomic_read(&p_iser_conn->post_send_buf_count) != 0) { - IPANIC("Number of currently posted send buffers non-zero\n"); - } - /* If a buffer pool has been allocd for loginphase, destroy it */ - if (p_iser_conn->post_recv_pool != NULL) { - iser_bpool_destroy(p_iser_conn->post_recv_pool); - } - - /* Switch to the main post-recv buffer pool */ - p_iser_conn->post_recv_pool = p_iser_conn->spare_post_recv_pool; - p_iser_conn->spare_post_recv_pool = NULL; - - /* Switch to the main send-data buffer pool */ - p_iser_conn->send_data_pool = p_iser_conn->spare_send_data_pool; - p_iser_conn->spare_send_data_pool = NULL; - - /* Initial Post-Receive buffers */ - for (i = 0; i < p_iser_conn->initial_post_recv_bufs_num; i++) { - if (iser_post_receive_control(p_iser_conn) != 0) { - printk(KERN_ERR PFX "Failed to post recv buffers\n"); - /* ToDo: recovery? */ - iser_ret = ISER_FAILURE; - goto enable_data_mover_exit; - } - } - ITRACE(ISER_TRACE_CONN, "Allocated %d post recv bufs\n", i); - - enable_data_mover_exit: - return iser_ret; -} /* iser_enable_datamover */ - -/* -- from ig.c */ -/** - * iser_adaptor_init - Initializes iSER adaptor structure. - * - * - * Creates adaptor-scope objects (Interface Adaptor, Protection Zone, - * Public Service Points). 
- * - * returns 0 on success, -1 on failure - */ -int iser_adaptor_init(struct iser_adaptor *p_iser_adaptor, char *name) -{ - if (p_iser_adaptor == NULL) - return -1; - - memset(p_iser_adaptor, 0, sizeof(struct iser_adaptor)); - ITRACE(ISER_TRACE_CONN, "memset at 0x%p to 0x%lX\n", - p_iser_adaptor, - ((long)p_iser_adaptor) + sizeof(struct iser_adaptor)); - - /* Initialize waiting queue for the event handler thread */ - init_waitqueue_head(&p_iser_adaptor->dat_events_wait_q); - init_waitqueue_head(&p_iser_adaptor->connect_wait_q); - - /* Initialize list of conns */ - spin_lock_init(&p_iser_adaptor->conn_lock); - INIT_LIST_HEAD(&p_iser_adaptor->conn_list); - - /* Create IA, PZ, EVD */ - if (iser_create_ia_pz_evd(p_iser_adaptor) != 0) { - return -1; - } - - /* Allocate pool of pre-registered iSER headers */ - p_iser_adaptor->header_pool = - iser_small_bpool_create(p_iser_adaptor, "headers", - ISER_TOTAL_HEADERS_LEN, 0, 1); - if (p_iser_adaptor->header_pool == NULL) { - iser_adaptor_release(p_iser_adaptor); - return -1; - } - - /* Initizlize the pre-registered buffers cache */ - iser_init_regd_buff_cache(p_iser_adaptor); - - /* Start the event thread */ - p_iser_adaptor->terminate_thread = 0; - init_MUTEX_LOCKED(&p_iser_adaptor->startstop_sem); - p_iser_adaptor->event_thrd_pid = - kernel_thread(iser_event_handler_thread, - p_iser_adaptor, - CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND); - - if (p_iser_adaptor->event_thrd_pid <= 0) { - printk(KERN_ERR PFX "Failed to start event kernel thread\n"); - iser_adaptor_release(p_iser_adaptor); - return -1; - } - printk("p1\n"); - /* wait for the event thread to start */ - down(&p_iser_adaptor->startstop_sem); - - return 0; -} /* iser_adaptor_init */ - -/** - * iser_adaptor_release - Releases all adaptor-related res. 
- * - * returns 0 on success, -1 on failure - */ -int iser_adaptor_release(struct iser_adaptor *p_iser_adaptor) -{ - struct iser_connection *p_iser_conn; - - /* Free all conns and associated objects, - must be done before freeing adaptor objects */ - while (!list_empty(&p_iser_adaptor->conn_list)) { - p_iser_conn = list_entry(p_iser_adaptor->conn_list.next, - struct iser_connection, adaptor_list); - /* Connection should be shut down before releasing res */ - iser_conn_sync_terminate(p_iser_conn); - iser_conn_free(p_iser_conn); - } - /* Release buffer pool and unregister its memory */ - if (p_iser_adaptor->header_pool != NULL) { - iser_bpool_destroy(p_iser_adaptor->header_pool); - p_iser_adaptor->header_pool = NULL; - } - - if (iser_release_regd_buff_cache(p_iser_adaptor) != 0) { - ITRACE(ISER_TRACE_ERRORS, - "iser_release_regd_buff_cache failed\n"); - } - - /* Free adaptor-related objects */ - if (iser_free_ia_pz_evd(p_iser_adaptor) != 0) { - return -1; - } - - /* DAN: kill event thread (if exists) */ - if (p_iser_adaptor->event_thrd_pid > 0) { - ITRACE(ISER_TRACE_EVENT_THREAD, - "start terminating event thread\n"); - lock_kernel(); - init_MUTEX_LOCKED(&p_iser_adaptor->startstop_sem); - mb(); - p_iser_adaptor->terminate_thread = 1; - mb(); - kill_proc(p_iser_adaptor->event_thrd_pid, SIGKILL, 1); - ITRACE(ISER_TRACE_EVENT_THREAD, - "waiting for event thread to terminate down\n"); - down(&p_iser_adaptor->startstop_sem); - unlock_kernel(); - ITRACE(ISER_TRACE_EVENT_THREAD, "event thread terminated\n"); - } - - return 0; -} /* iser_adaptor_release */ - -/** - * iser_adaptor_add_conn - Adds a conn to adaptor - */ -void iser_adaptor_add_conn(struct iser_adaptor *p_iser_adaptor, - struct iser_connection *p_iser_conn) -{ - p_iser_conn->p_adaptor = p_iser_adaptor; - spin_lock(&p_iser_adaptor->conn_lock); - list_add(&p_iser_conn->adaptor_list, &p_iser_adaptor->conn_list); - spin_unlock(&p_iser_adaptor->conn_lock); - -} /* iser_adaptor_add_conn */ - -/** - * 
iser_adaptor_find_conn - Adds a conn to adaptor - * - * returns 0 on success, -1 on failure - */ -struct iser_connection *iser_adaptor_find_conn(struct iser_adaptor - *p_iser_adaptor, - void *ep_handle) -{ - - struct iser_connection *p_iser_conn = NULL; - struct list_head *p_list; - - spin_lock(&p_iser_adaptor->conn_lock); - p_list = p_iser_adaptor->conn_list.next; - while (p_list != &p_iser_adaptor->conn_list) { - p_iser_conn = list_entry(p_list, struct iser_connection, - adaptor_list); - if (((void *)p_iser_conn->ep_handle) == ep_handle) - break; - p_iser_conn = NULL; - p_list = p_list->next; - } - spin_unlock(&p_iser_adaptor->conn_lock); - - return p_iser_conn; -} /* iser_adaptor_find_conn */ - -/** - * iser_adaptor_remove_conn - Removes a conn from adaptor - */ -void iser_adaptor_remove_conn(struct iser_connection *p_iser_conn) -{ - struct iser_adaptor *p_iser_adaptor; - - if (!list_empty(&p_iser_conn->adaptor_list)) { - p_iser_adaptor = p_iser_conn->p_adaptor; - if (p_iser_adaptor == NULL) { - IPANIC("NULL adaptor in conn: 0x%p\n", p_iser_conn); - } - spin_lock(&p_iser_adaptor->conn_lock); - list_del(&p_iser_conn->adaptor_list); - spin_unlock(&p_iser_adaptor->conn_lock); - } - -} /* iser_adaptor_remove_conn */ - -/** - * iser_entity_add_conn - Adds a conn to entity - */ -void iser_entity_add_conn(struct iser_entity *p_entity, - struct iser_connection *p_iser_conn) -{ - p_iser_conn->p_entity = p_entity; - spin_lock(&p_entity->conn_lock); - list_add(&p_iser_conn->entity_list, &p_entity->conn_list); - spin_unlock(&p_entity->conn_lock); - -} /* iser_entity_add_conn */ - -/** - * iser_entity_remove_conn - Removes a conn from adaptor - */ -void iser_entity_remove_conn(struct iser_connection *p_iser_conn) -{ - struct iser_entity *p_entity; - - if (!list_empty(&p_iser_conn->entity_list)) { - p_entity = p_iser_conn->p_entity; - if (p_entity == NULL) { - IPANIC("NULL entity in conn: 0x%p\n", p_iser_conn); - } - spin_lock(&p_entity->conn_lock); - 
list_del(&p_iser_conn->entity_list); - spin_unlock(&p_entity->conn_lock); - } -} /* iser_entity_remove_conn */ - -/** - * ig_init - Initializes the global iSER context structure. - * - * returns 0 on success, -1 on failure - */ -int ig_init() -{ - memset(&ig, 0, sizeof(struct iser_global)); - - /* Allocate adaptors; currently single adaptor */ - ig.num_adaptors = 1; - if (iser_adaptor_init(&ig.adaptor[0], "InfiniHost0") != 0) { - printk(KERN_ERR PFX "initializing iser failed!\n"); - iser_adaptor_release(&ig.adaptor[0]); - return -1; - } - - /* Start with no entities defined */ - ig.num_entities = 0; - - /* Allocate kmem_cache for iser_connection structures */ - ig.conn_mem_cache = - kmem_cache_create("iser_conn", sizeof(struct iser_connection), 0, - SLAB_HWCACHE_ALIGN, NULL, NULL); - if (ig.conn_mem_cache == NULL) { - printk(KERN_ERR PFX - "Failed to alloc conn_mem_cache, name: iser_conn\n"); - return -1; - } - - /* Allocate kmem_cache for iser_task structures */ - ig.task_mem_cache = - kmem_cache_create("iser_task", sizeof(struct iser_task), 0, - SLAB_HWCACHE_ALIGN, NULL, NULL); - - if (ig.task_mem_cache == NULL) { - printk(KERN_ERR PFX - "Failed to alloc task_mem_cache, name: iser_task\n"); - return -1; - } - - /* Allocate kmem_cache for iser_dto structures, for post-recv */ - ig.recv_dto_mem_cache = - kmem_cache_create("iser_recv_dto", - sizeof(struct iser_dto), - 0, SLAB_HWCACHE_ALIGN, NULL, NULL); - if (ig.recv_dto_mem_cache == NULL) { - printk(KERN_ERR PFX - "Failed to alloc recv_dto_mem_cache, " - "name: iser_recv_dto\n"); - return -1; - } - - /* Allocate kmem_cache for iser_dto structures, for send */ - ig.send_dto_mem_cache = - kmem_cache_create("iser_send_dto", - sizeof(struct iser_dto), - 0, SLAB_HWCACHE_ALIGN, NULL, NULL); - if (ig.send_dto_mem_cache == NULL) { - printk(KERN_ERR PFX - "Failed to alloc send_dto_mem_cache, iser_send_dto\n"); - return -1; - } - - /* Allocate kmem_cache for iser_regd_buf structures */ - ig.regd_buf_mem_cache = - 
kmem_cache_create("iser_regbuf", - sizeof(struct iser_regd_buf), - 0, SLAB_HWCACHE_ALIGN, NULL, NULL); - if (ig.regd_buf_mem_cache == NULL) { - printk(KERN_ERR PFX - "Failed to alloc regd_buf_mem_cache, " - "name: iser_regbuf\n"); - return -1; - } - - /* Initialize task hash table */ - hash_init(&ig.task_hash); - /* Initialize conns hash table */ - hash_init(&ig.conn_hash); - - return 0; -} /* ig_init */ - -/** - * ig_release - Releases all res through - * the global iSER context structure. - * - * returns 0 on success, -1 on failure - */ -int ig_release() -{ - - /* Are all entities released? */ - if (ig.num_entities > 0) { - printk(KERN_ERR PFX - "Some entities did not unregister. " - "Can't release module.\n"); - return -1; - } - /* Release all adaptors */ - iser_adaptor_release(&ig.adaptor[0]); - ig.num_adaptors = 0; - - if (ig.conn_mem_cache != NULL) { - kmem_cache_destroy(ig.conn_mem_cache); - ig.conn_mem_cache = NULL; - } - - if (ig.task_mem_cache != NULL) { - kmem_cache_destroy(ig.task_mem_cache); - ig.task_mem_cache = NULL; - } - - if (ig.recv_dto_mem_cache != NULL) { - kmem_cache_destroy(ig.recv_dto_mem_cache); - ig.recv_dto_mem_cache = NULL; - } - - if (ig.send_dto_mem_cache != NULL) { - kmem_cache_destroy(ig.send_dto_mem_cache); - ig.send_dto_mem_cache = NULL; - } - - if (ig.regd_buf_mem_cache != NULL) { - kmem_cache_destroy(ig.regd_buf_mem_cache); - ig.regd_buf_mem_cache = NULL; - } - - return 0; -} /* ig_release */ - -/** - * iser_find_regd_entity - Look for an iSCSI entity - * using its type and provider name as the key. 
- * - * returns entity descriptor if found, NULL - if not found - */ -struct iser_entity *iser_find_regd_entity(char *provider_name) -{ - struct iser_entity *p_entity = NULL; - int i; - - for (i = 0; i < ISER_MAX_ENTITIES; i++) { - if (ig.entity[i].registered && - strncmp(provider_name, - ig.entity[i].provider_name, 64) == 0) { - p_entity = &ig.entity[i]; - break; - } - } - - return p_entity; -} /* iser_find_regd_entity */ - -/** - * iser_find_free_entity - Look for a free entity - * descriptor within the global context struct. - * - * returns index into the entities array if found, -1 if not found - */ - -static int iser_find_free_entity(void) -{ - int i; - - for (i = 0; i < ISER_MAX_ENTITIES; i++) { - if (!ig.entity[i].registered) - return i; - } - /* no free entities */ - return -1; -} /* iser_find_free_entity */ - -/** - * iser_api_register - register with iser extended api - * - * returns iSER status - */ -iser_status -iser_api_register(char *provider_name, - struct iser_api * api, - struct iser_api_cb * api_cb, void **p_api_h) -{ - iser_status iser_ret = ISER_SUCCESS; - struct iser_entity *p_entity; - int entity_num; - - if (p_api_h == NULL) { - printk(KERN_ERR PFX "NULL *p_api_h \n"); - iser_ret = ISER_ILLEGAL_PARAM; - goto api_register_exit; - } - *p_api_h = ISER_INVALID_API_H; - - if (provider_name == NULL) { - printk(KERN_ERR PFX "NULL *provider_name\n"); - iser_ret = ISER_ILLEGAL_PARAM; - goto api_register_exit; - } - if (api == NULL) { - printk(KERN_ERR PFX "NULL *api structure\n"); - iser_ret = ISER_ILLEGAL_PARAM; - goto api_register_exit; - } - if (api_cb == NULL) { - printk(KERN_ERR PFX "NULL *api_cb structure\n"); - iser_ret = ISER_ILLEGAL_PARAM; - goto api_register_exit; - } - - api->conn_establish = iser_conn_establish; - api->send_control = iser_send_control; - api->release_control = iser_release_control; - api->alloc_conn_res = iser_alloc_conn_res; - api->notice_key_values = iser_notice_key_values; - api->enable_datamover = 
iser_enable_datamover; - api->conn_terminate = iser_connectionerminate; - api->dealloc_conn_res = iser_dealloc_conn_res; - api->dealloc_task_res = iser_dealloc_task_res; - - p_entity = iser_find_regd_entity(provider_name); - if (p_entity != NULL) { - printk(KERN_ERR PFX "This entity is already registered: %s\n", - provider_name); - iser_ret = ISER_FAILURE; - goto api_register_exit; - } - entity_num = iser_find_free_entity(); - if (entity_num < 0) { - printk(KERN_ERR PFX - "Failed to find a free entity entry for: %s\n", - provider_name); - iser_ret = ISER_FAILURE; - goto api_register_exit; - } - p_entity = &ig.entity[entity_num]; - memset(p_entity, 0, sizeof(struct iser_entity)); - - p_entity->api_cb.conn_establish_notify = api_cb->conn_establish_notify; - p_entity->api_cb.conn_terminate_notify = api_cb->conn_terminate_notify; - p_entity->api_cb.control_notify = api_cb->control_notify; - p_entity->registered = 1; - *p_api_h = (void *)p_entity; - - ig.num_entities++; - strncpy(p_entity->provider_name, provider_name, 64); - spin_lock_init(&p_entity->conn_lock); - INIT_LIST_HEAD(&p_entity->conn_list); - - api_register_exit: - return iser_ret; -} /* iser_api_register */ - -/** - * iser_api_unregister - Unregister API entity - * - * returns iSER status - */ -iser_status iser_api_unregister(void *api_h) -{ - iser_status iser_ret = ISER_SUCCESS; - struct iser_entity *p_entity; - struct iser_connection *p_iser_conn; - - if (api_h == NULL || api_h == ISER_INVALID_API_H) { - printk(KERN_ERR PFX "Invalid API registration handle"); - iser_ret = ISER_ILLEGAL_PARAM; - goto api_unregister_exit; - } - p_entity = (struct iser_entity *)api_h; - - if (!p_entity->registered) { - printk(KERN_ERR PFX - "Trying to deregister unregistered entity\n"); - iser_ret = ISER_ILLEGAL_PARAM; - goto api_unregister_exit; - } - - /* Terminate and release all conns on that entity */ - spin_lock(&p_entity->conn_lock); - while (!list_empty(&p_entity->conn_list)) { - p_iser_conn = 
list_entry(p_entity->conn_list.next, - struct iser_connection, entity_list); - spin_unlock(&p_entity->conn_lock); - /* ToDo: we'll have to wait - is it OK? */ - iser_conn_sync_terminate(p_iser_conn); - spin_lock(&p_entity->conn_lock); - } - spin_unlock(&p_entity->conn_lock); - - p_entity->registered = 0; - if (ig.num_entities > 0) { - ig.num_entities--; - } else { - IPANIC("Zero registered entities when " - "trying to unregister one\n"); - } - - api_unregister_exit: - return iser_ret; -} /* iser_api_unregister */ - -EXPORT_SYMBOL(iser_api_unregister); -EXPORT_SYMBOL(iser_api_register); Index: iser_conn.h =================================================================== --- iser_conn.h (revision 3404) +++ iser_conn.h (working copy) @@ -36,40 +36,42 @@ #include "iser.h" -/* iSER API Functions */ +/* adaptor-related */ +int iser_adaptor_init(struct iser_adaptor *p_iser_adaptor, char *name); +int iser_adaptor_release(struct iser_adaptor *p_iser_adaptor); +struct iser_connection *iser_adaptor_find_conn( + struct iser_adaptor *p_iser_adaptor, void *ep_handle); -iser_status iser_notice_key_values(void *conn_h, char *key, char *value); -iser_status iser_connectionerminate(void *iscsi_conn_h); +/* iSER connection related API */ +int iser_conn_bind(void *iscsi_conn_h, + struct socket *sock, + void **iser_conn_h); +int iser_conn_enable_rdma(void *iser_conn_h, + struct iser_conn_res *conn_res); +int iser_notice_key_values(void *iser_conn_h, char *key, char *value); +int iser_release_control(void *iser_conn_h, + struct iser_recv_pdu *p_ctrl_pdu); +int iser_conn_term(void *iser_conn_h); +int iser_dealloc_conn_res(void *iser_conn_h); -iser_status iser_dealloc_conn_res(void *iscsi_conn_h); - -/* Extension */ -iser_status iser_release_control(void *iscsi_conn_h, - struct iser_recv_pdu *p_ctrl_pdu); - -/* Internal iSER Functions */ - -struct iser_connection *iser_conn_alloc(void); -void iser_conn_free(struct iser_connection *iser_conn); - +/* internal connection handling */ +void 
iser_conn_init(struct iser_connection *p_iser_conn); +int iser_conn_establish(struct iser_connection *p_iser_conn, + struct sockaddr_in *dst_addr, + struct sockaddr_in *src_addr); +void iser_conn_release(struct iser_connection *p_iser_conn); void iser_conn_add_task(struct iser_connection *p_iser_conn, struct iser_task *p_iser_task); - void iser_conn_delete_task(struct iser_connection *p_iser_conn, struct iser_task *p_iser_task); - void iser_conn_add_recv_ctrl_dto(struct iser_connection *p_iser_conn, struct iser_dto *p_ctrl_dto); - void iser_conn_delete_recv_ctrl_dto(struct iser_connection *p_iser_conn, struct iser_dto *p_ctrl_dto); - -int iser_complete_conn_termination(struct iser_connection *p_iser_conn); - int iser_post_receive_control(struct iser_connection *p_iser_conn); char *iser_conn_get_state_name(struct iser_connection *p_iser_conn); - int iser_conn_async_terminate(struct iser_connection *p_iser_conn); int iser_conn_sync_terminate(struct iser_connection *p_iser_conn); +int iser_complete_conn_termination(struct iser_connection *p_iser_conn); #endif /* __ISER_CONN_H__ */ Index: iser_task.c =================================================================== --- iser_task.c (revision 3404) +++ iser_task.c (working copy) @@ -361,15 +361,23 @@ /** * iser_dealloc_task_res - Deallocs task res */ -iser_status iser_dealloc_task_res(void *conn_h, unsigned int itt) +int iser_dealloc_task_res(void *iser_conn_h, unsigned int itt) { - struct iser_connection *p_iser_conn; + struct iser_connection *p_iser_conn = iser_conn_h; struct iser_task *p_iser_task; + int iser_err = 0; - p_iser_conn = hash_find_iser_conn(conn_h); - p_iser_task = hash_find_iser_task(p_iser_conn, itt); - - iser_task_free(p_iser_task); - - return ISER_SUCCESS; + if (p_iser_conn != NULL) { + p_iser_task = hash_find_iser_task(p_iser_conn, itt); + if (p_iser_task != NULL) { + iser_task_free(p_iser_task); + } + else { + iser_err = -EINVAL; + } + } + else { + iser_err = -EINVAL; + } + return iser_err; } /* 
iser_dealloc_task_res */ Index: iser_utils.h =================================================================== --- iser_utils.h (revision 3404) +++ iser_utils.h (working copy) @@ -54,66 +54,35 @@ void hash_delete_iser_task(struct iser_task *iser_task); /* --------------------------------------------------------------------- - * ISER CONNECTION-SPECIFIC HASH MANAGEMENT - * ------------------------------------------------------------------ */ - -struct iser_connection *hash_find_iser_conn(void *iscsi_conn_h); - -void hash_add_iser_conn(struct iser_connection *iser_conn); - -void hash_delete_iser_conn(struct iser_connection *iser_conn); - -/* --------------------------------------------------------------------- * BUFFER DESCRIPTORS * ------------------------------------------------------------------ */ -unsigned long -iser_get_data_total_length(struct iser_data_buf *p_data, int skip, int count); +unsigned int iser_data_contig_length(struct iser_data_buf *p_data, + int skip, + uint64_t *start_addr, + int *size); -int -iser_iovec_virt_to_phys(struct iser_data_buf *p_data, - struct iser_phys_mem *p_phys, int skip, int count); +unsigned long iser_sg_subset_len(struct iser_data_buf *p_data, + int skip_entries, + int count_entries); -int -iser_sglist_virt_to_phys(struct iser_data_buf *p_data, - struct iser_phys_mem *p_phys, int skip, int count); -unsigned int -iser_iovec_contig_length(struct iser_data_buf *p_data, - int skip, uint64_t * start_addr, int *size); +int iser_data_convert_to_phys(struct iser_data_buf *p_data, + struct iser_phys_mem *p_phys, + int skip, + int count); -int -iser_iovec_phys_copy_to_phys(struct iser_data_buf *p_data, - struct iser_phys_mem *p_phys, int skip, - int count); -int iser_convert_mem_to_phys(struct iser_data_buf *p_data, - struct iser_phys_mem *p_phys, int skip, - int count); +unsigned int iser_data_aligned_length(struct iser_data_buf *p_data, + int skip); -unsigned int iser_iovec_aligned_length(struct iser_data_buf *p_data, int skip); 
+void iser_data_buf_memcpy(unsigned char *p_dst_buf, + struct iser_data_buf *p_src_data, + unsigned long *p_total_copied_sz); -void *iser_phys_to_virt(void *phys_addr); +void iser_data_desc_dump(struct iser_data_buf *p_data); -void *iser_page_to_virt(struct page *page); +struct iser_phys_mem *iser_alloc_phys_desc(struct iser_data_buf *p_data, + int skip, + int count); +void iser_free_phys_desc(struct iser_phys_mem *phys_desc); -void *iser_iovec_virt_entry_addr(void *p_iovec_virt, int i); - -void *iser_iovec_phys_entry_to_virt(void *p_iovec_phys, int i); - -void *iser_sglist_entry_to_virt(void *p_sglist, int i); - -unsigned long iser_iovec_entry_len(void *p_iovec_arr, int i); - -unsigned long iser_sglist_entry_len(void *p_sglist, int i); - -void -iser_data_buf_memcpy(unsigned char *p_dst_buf, - struct iser_data_buf *p_src_data, - unsigned long *p_total_copied_sz); - -char *iser_data_buf_get_type_name(struct iser_data_buf *p_data); - -struct iser_phys_mem *iser_alloc_phys_mem(struct iser_data_buf *p_data, - int skip, int count); -void iser_free_phys_mem(struct iser_phys_mem *pmt); - #endif /* __ISER_UTILS_H__ */ Index: iser_task.h =================================================================== --- iser_task.h (revision 3404) +++ iser_task.h (working copy) @@ -36,32 +36,20 @@ #include "iser.h" -/* iSER API Functions */ +int iser_dealloc_task_res(void *conn_h, unsigned int itt); -iser_status iser_dealloc_task_res(void *conn_h, unsigned int itt); -/* Internal iSER Functions */ - struct iser_task *iser_task_alloc(struct iser_connection *iser_conn, unsigned int itt); void iser_task_ctrl_notify_count_inc(struct iser_task *p_iser_task); - int iser_task_ctrl_notify_count_dec_and_test(struct iser_task *p_iser_task); - void iser_task_post_send_count_inc(struct iser_task *p_iser_task); - int iser_task_post_send_count_dec_and_test(struct iser_task *p_iser_task); - void iser_task_recvd_pdu_count_inc(struct iser_task *p_iser_task); - int iser_task_recvd_pdu_count_reset(struct 
iser_task *p_iser_task); - -void -iser_task_set_status(struct iser_task *p_iser_task, - enum iser_task_status status); +void iser_task_set_status(struct iser_task *p_iser_task, + enum iser_task_status status); void iser_task_free(struct iser_task *iser_task); - void iser_task_release_recv_buffers(struct iser_task *p_iser_task); - void iser_task_release_send_buffers(struct iser_task *p_iser_task); #endif /* __ISER_TASK_H__ */ Index: iser_lkdapl.c =================================================================== --- iser_lkdapl.c (revision 3404) +++ iser_lkdapl.c (working copy) @@ -47,6 +47,7 @@ #include "iser_kdapl.h" #include "iser_utils.h" #include "iser_memory.h" +#include "iser_socket.h" #include "iser_initiator.h" /* Service ID generation */ @@ -152,7 +153,7 @@ ITRACE(ISER_TRACE_KDAPL, "failed to find a working adaptor\n"); kfree(list); kfree(tmem); - return -1; + return -ENODEV; } /* done with lists */ @@ -437,61 +438,11 @@ } /* iser_disconnect */ /** - * iser_register_virt_mem - Register a memory buffer. + * iser_register_ia_mem - Register a memory buffer using IA method. * * returns: 0 on success, -1 on failure */ int -iser_register_virt_mem(struct iser_adaptor *p_adaptor, - void *p_buf, - unsigned long data_sz, - enum dat_mem_priv_flags priv_flags, - struct iser_mem_handles *p_mem_reg, - DAT_RMR_CONTEXT * p_rmr_ctxt) -{ - DAT_REGION_DESCRIPTION mem_region; - u32 dat_ret; - DAT_RMR_CONTEXT dummy_rmr; - int ret_val = 0; - - mem_region.for_va = p_buf; - dat_ret = dat_lmr_kcreate(p_adaptor->ia_handle, - DAT_MEM_TYPE_VIRTUAL, - mem_region, - data_sz, - p_adaptor->pz_handle, - priv_flags, - DAT_MEM_OPTIMIZE_DONT_CARE, - &p_mem_reg->lmr_handle, - &p_mem_reg->lmr_triplet.lmr_context, - p_rmr_ctxt == NULL ? 
&dummy_rmr : p_rmr_ctxt, - &p_mem_reg->lmr_triplet.segment_length, - &p_mem_reg->lmr_triplet.virtual_address); - - ITRACE(ISER_TRACE_PHYS_MEM_REG, - "VIRTUAL Mem.register [SINGLE p_buf: 0x%p, sz: %ld] " - "-> [lkey: 0x%08X lmr_h: 0x%p va: 0x%08lX sz: %ld]\n", - p_buf, data_sz, - (unsigned int)p_mem_reg->lmr_triplet.lmr_context, - p_mem_reg->lmr_handle, - (unsigned long)p_mem_reg->lmr_triplet.virtual_address, - (unsigned long)p_mem_reg->lmr_triplet.segment_length); - - if (dat_ret) { - printk(KERN_ERR PFX "dat_lmr_kcreate VIRTUAL failed ret=%d\n", - dat_ret); - ret_val = -1; - } - - return ret_val; -} /* iser_register_virt_mem */ - -/** - * iser_register_ia_mem - Register a virtual memory buffer using IA method. - * - * returns: 0 on success, -1 on failure - */ -int iser_register_ia_mem(struct iser_adaptor *p_adaptor, void *virt_addr, unsigned long data_sz, @@ -640,11 +591,11 @@ } /* iser_unregister_memory */ /** - * iser_dtoo_iov - Posts a receive buffer. + * iser_dto_to_iov - Posts a receive buffer. * * returns 0 on success, -1 on failure */ -static void iser_dtoo_iov(struct iser_dto *p_dto, struct dat_lmr_triplet *iov) +static void iser_dto_to_iov(struct iser_dto *p_dto, struct dat_lmr_triplet *iov) { int i; @@ -697,7 +648,7 @@ (unsigned int)iov[i].lmr_context, p_dto->regd[i]->mem_reg.lmr_handle); } -} /* iser_dtoo_iov */ +} /* iser_dto_to_iov */ /** * iser_post_recv - Posts a receive buffer. 
@@ -726,7 +677,7 @@ IPANIC("DTO regd_vector_len exceeds maximal IOV len\n"); } - iser_dtoo_iov(p_recv_dto, iov); + iser_dto_to_iov(p_recv_dto, iov); /* Post rcv buffers */ spin_lock(&p_iser_conn->post_rx_lock); @@ -784,7 +735,7 @@ IPANIC("DTO regd_vector_len exceeds maximal IOV len\n"); } - iser_dtoo_iov(p_dto, iov); + iser_dto_to_iov(p_dto, iov); switch (p_dto->type) { case ISER_DTO_SEND: @@ -1009,11 +960,8 @@ switch (event_number) { case DAT_CONNECTION_EVENT_ESTABLISHED: - /* Set the new conn state */ atomic_set(&p_iser_conn->state, ISER_CONN_UP); - /* Notify about conn establishment */ - p_iser_conn->p_entity->api_cb. - conn_establish_notify(p_iser_conn->iscsi_conn_h, 1); + iser_conn_establish_notify(p_iser_conn); break; case DAT_CONNECTION_EVENT_DISCONNECTED: @@ -1025,10 +973,8 @@ the iSCSI layer's perspective. */ if (atomic_read(&p_iser_conn->state) == ISER_CONN_PENDING) { atomic_set(&p_iser_conn->state, ISER_CONN_DOWN); - p_iser_conn->p_entity->api_cb. - conn_establish_notify(p_iser_conn->iscsi_conn_h, - 0); - iser_conn_free(p_iser_conn); + iser_conn_establish_notify(p_iser_conn); + iser_conn_release(p_iser_conn); } else { if (atomic_read(&p_iser_conn->state) == ISER_CONN_UP) { atomic_set(&p_iser_conn->state, @@ -1051,10 +997,8 @@ case DAT_CONNECTION_EVENT_UNREACHABLE: if (atomic_read(&p_iser_conn->state) == ISER_CONN_PENDING) { atomic_set(&p_iser_conn->state, ISER_CONN_DOWN); - p_iser_conn->p_entity->api_cb. 
- conn_establish_notify(p_iser_conn->iscsi_conn_h, - 0); - iser_conn_free(p_iser_conn); + iser_conn_establish_notify(p_iser_conn); + iser_conn_release(p_iser_conn); break; } else { printk(KERN_ERR PFX @@ -1138,13 +1082,7 @@ dto_xfer_len, ev->event_data.dto_completion_event_data.ep); - /* Handle successful DTO completion */ - ret_val = iser_dto_completion(p_dto, dto_xfer_len); - - if (ret_val != 0) { - printk(KERN_ERR PFX "Failed to handle " - "successful DTO completion\n"); - } + iser_dto_completion(p_dto, dto_xfer_len); break; case DAT_DTO_ERR_FLUSHED: Index: iser_initiator.h =================================================================== --- iser_initiator.h (revision 3404) +++ iser_initiator.h (working copy) @@ -36,18 +36,10 @@ #include "iser.h" -int ig_init(void); -int ig_release(void); +int iser_send_control(void *iser_conn_h, + struct iser_send_pdu *p_ctrl_pdu); -struct iser_connection *iser_adaptor_find_conn(struct iser_adaptor - *p_iser_adaptor, - void *ep_handle); -void iser_adaptor_remove_conn(struct iser_connection *p_iser_conn); +void iser_dto_completion(struct iser_dto *p_dto, + unsigned long dto_xfer_len); -void iser_entity_add_conn(struct iser_entity *p_entity, - struct iser_connection *p_iser_conn); -void iser_entity_remove_conn(struct iser_connection *p_iser_conn); - -int iser_dto_completion(struct iser_dto *p_dto, unsigned long dto_xfer_len); - #endif /* __ISER_INITIATOR_H__ */ Index: iser_memory.c =================================================================== --- iser_memory.c (revision 3404) +++ iser_memory.c (working copy) @@ -35,8 +35,8 @@ #include #include #include -#include /* struct iovec */ -#include /* struct scatterlist */ +#include +#include #include "iser.h" #include "iser_memory.h" @@ -157,24 +157,14 @@ p_pool_region->gfp_order = gfp_order; p_pool_region->id = ++p_bufpool->num_regions; - /* Register the memory */ -#if VIRTUAL_SUPPORTED - /* remote ctxt */ - ret_val = - iser_register_virt_mem(p_bufpool->p_adaptor, p_pool_mem, 
gfp_size, - DAT_MEM_PRIV_ALL_FLAG, - &p_pool_region->mem_reg, NULL); -#else /* DAN */ - /* remote ctxt */ - ret_val = - iser_register_ia_mem(p_bufpool->p_adaptor, p_pool_mem, gfp_size, - DAT_MEM_PRIV_ALL_FLAG, - &p_pool_region->mem_reg, NULL); -#endif - + ret_val = iser_register_ia_mem(p_bufpool->p_adaptor, + p_pool_mem, + gfp_size, + DAT_MEM_PRIV_ALL_FLAG, + &p_pool_region->mem_reg, + NULL); if (ret_val != 0) { - printk(KERN_ERR PFX - "Failed to register buffer pool memory: 0x%p, " + printk(KERN_ERR PFX "Failed to register buff pool mem:0x%p, " "sz: %ld, region: 0x%p\n", p_pool_mem, gfp_size, p_pool_region); iser_bufpool_region_destroy(p_pool_region); @@ -750,12 +740,12 @@ } /* iser_regd_buff_release */ /** - * iser_init_regd_buff_cache - Pre-registers the entire memory buffers, + * Pre-registers the entire memory buffers, * for single buffers lookup * * returns 0 on success, -1 otherwise */ -int iser_init_regd_buff_cache(struct iser_adaptor *p_iser_adaptor) +int iser_reg_all_mem(struct iser_adaptor *p_iser_adaptor) { struct sysinfo si; int phys_mem_size; @@ -779,9 +769,9 @@ } return ret_val; -} /* iser_init_regd_buff_cache */ +} /* iser_reg_all_mem */ -int iser_release_regd_buff_cache(struct iser_adaptor *p_iser_adaptor) +int iser_unreg_all_mem(struct iser_adaptor *p_iser_adaptor) { int ret_val = 0; @@ -794,7 +784,7 @@ } return ret_val; -} /* iser_release_regd_buff_cache */ +} /* iser_unreg_all_mem */ /** * iser_regd_buff_lookup - Checks if a memory buffer is contained Index: Makefile =================================================================== --- Makefile (revision 3404) +++ Makefile (working copy) @@ -10,5 +10,6 @@ iser_task.o \ iser_utils.o \ iser_dto.o \ - iser_lkdapl.o + iser_lkdapl.o \ + iser_socket.o From rminnich at lanl.gov Tue Sep 13 10:04:51 2005 From: rminnich at lanl.gov (Ronald G Minnich) Date: Tue, 13 Sep 2005 11:04:51 -0600 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <20050913170122.GA24527@lst.de> References: 
<1126616983.4382.43413.camel@hal.voltaire.com> <4326D111.7090109@mellanox.co.il> <20050913142451.GA21653@lst.de> <4326FDB7.1070207@ichips.intel.com> <20050913170122.GA24527@lst.de> Message-ID: <432706B3.4050800@lanl.gov> > On Tue, Sep 13, 2005 at 09:26:31AM -0700, Sean Hefty wrote: >>My understanding is that the labs, who control the OpenIB servers, refused >>to host any Windows related code, forcing it to have a separate repository. wow, that's news to me! Maybe I'm at the wrong lab! Anybody have a source for this "understanding", because off the top of my head, it just doesn't sound right. ron From rolandd at cisco.com Tue Sep 13 10:08:09 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 10:08:09 -0700 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <20050913170122.GA24527@lst.de> (Christoph Hellwig's message of "Tue, 13 Sep 2005 19:01:22 +0200") References: <1126616983.4382.43413.camel@hal.voltaire.com> <4326D111.7090109@mellanox.co.il> <20050913142451.GA21653@lst.de> <4326FDB7.1070207@ichips.intel.com> <20050913170122.GA24527@lst.de> Message-ID: <5264t499qe.fsf@cisco.com> Sean> My understanding is that the labs, who control the OpenIB Sean> servers, refused to host any Windows related code, forcing Sean> it to have a separate repository. Christoph> It shouldn't be difficult to find someone to host it. Christoph> I could maybe ask if such a repo could be put at the Christoph> lst.de servers. Actually I think the issue was somewhat different. Microsoft is so allergic to the GPL that they asked for the code to be in a physically separate repository. Before you ask: no, it doesn't make any sense to me either. - R. 
From rminnich at lanl.gov Tue Sep 13 10:10:50 2005 From: rminnich at lanl.gov (Ronald G Minnich) Date: Tue, 13 Sep 2005 11:10:50 -0600 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <5264t499qe.fsf@cisco.com> References: <1126616983.4382.43413.camel@hal.voltaire.com> <4326D111.7090109@mellanox.co.il> <20050913142451.GA21653@lst.de> <4326FDB7.1070207@ichips.intel.com> <20050913170122.GA24527@lst.de> <5264t499qe.fsf@cisco.com> Message-ID: <4327081A.1020603@lanl.gov> Roland Dreier wrote: > Actually I think the issue was somewhat different. Microsoft is so > allergic to the GPL that they asked for the code to be in a physically > separate repository. > that makes much more sense, ah, well, not really, but it is easier to understand. I doubt the Labs would have any objection to Windows code. Actually, I kind of wish that the code were all at openib.org. Should we really pay that much heed to requests of this sort if it makes life hard for openib people? ron From rolandd at cisco.com Tue Sep 13 10:12:40 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 10:12:40 -0700 Subject: [openib-general] Re: [PATCH] libmthca: fix wqe post In-Reply-To: <20050913153155.GK14121@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 13 Sep 2005 18:31:55 +0300") References: <52u0gp95d9.fsf@cisco.com> <20050913153155.GK14121@mellanox.co.il> Message-ID: <521x3s99iv.fsf@cisco.com> Michael> This seems to be a bug in libmthca. Patch below. Thanks for the really fast debugging. This patch does fix it for me, and I'll apply it now. Michael> We probably need a similiar fix for kernel mthca - let me Michael> know if you plan to work on that, otherwise I'll look Michael> into it tomorrow. And its probably something we want Michael> fixed for 2.6.14, right? Let me know. I can fix the kernel side of things, and I'll queue it for 2.6.14. Michael> With regard to the test code that you posted - I also Michael> have some small comments. 
If you plan to use it in the Michael> future, you can stick it in svn somewhere and I'll send Michael> patches. I just thought of that program as a throw-away test case, but if you think it's useful I could check it in somewhere. Is it worth it? - R. From viswa.krish at gmail.com Tue Sep 13 10:13:31 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Tue, 13 Sep 2005 10:13:31 -0700 Subject: [openib-general] Re: [PATCH] libmthca: fix wqe post (was Re: strange mem-free bug) In-Reply-To: <20050913153155.GK14121@mellanox.co.il> References: <52u0gp95d9.fsf@cisco.com> <20050913153155.GK14121@mellanox.co.il> Message-ID: <4df28be4050913101339456607@mail.gmail.com> Michael, Thanks.. Roland, Once you generate a kernel patch, I can test out both user and kernel mthca since I have the tests ready.. -Viswa On 9/13/05, Michael S. Tsirkin wrote: > > Quoting r. Roland Dreier : > > Subject: strange mem-free bug (was: [openib-general] completion Q > overflow error/panic) > > > > While looking at Viswa's example, I've found what seems to be a > > problem using lots of QPs on mem-free HCAs. > > Hi, Roland! > This seems to be a bug in libmthca. Patch below. > > We probably need a similiar fix for kernel mthca - let me know if > you plan to work on that, otherwise I'll look into it tomorrow. > And its probably something we want fixed for 2.6.14, right? > Let me know. > > With regard to the test code that you posted - I also have some small > comments. If you plan to use it in the future, you can stick it > in svn somewhere and I'll send patches. > > --- > > Fix posting of the first work request for memfree hardware. > Simplify code for tavor mode hardware. > > Signed-off-by: Michael S. 
Tsirkin > > Index: userspace/libmthca/src/qp.c > =================================================================== > --- userspace.orig/libmthca/src/qp.c 2005-09-13 17:17:58.000000000 +0300 > +++ userspace/libmthca/src/qp.c 2005-09-13 17:26:23.000000000 +0300 > @@ -259,15 +259,13 @@ int mthca_tavor_post_send(struct ibv_qp > goto out; > } > > - if (prev_wqe) { > - ((struct mthca_next_seg *) prev_wqe)->nda_op = > - htonl(((ind << qp->sq.wqe_shift) + > - qp->send_wqe_offset) | > - mthca_opcode[wr->opcode]); > + ((struct mthca_next_seg *) prev_wqe)->nda_op = > + htonl(((ind << qp->sq.wqe_shift) + > + qp->send_wqe_offset) | > + mthca_opcode[wr->opcode]); > > - ((struct mthca_next_seg *) prev_wqe)->ee_nds = > - htonl((size0 ? 0 : MTHCA_NEXT_DBD) | size); > - } > + ((struct mthca_next_seg *) prev_wqe)->ee_nds = > + htonl((size0 ? 0 : MTHCA_NEXT_DBD) | size); > > if (!size0) { > size0 = size; > @@ -353,12 +351,10 @@ int mthca_tavor_post_recv(struct ibv_qp > > qp->wrid[ind] = wr->wr_id; > > - if (prev_wqe) { > - ((struct mthca_next_seg *) prev_wqe)->nda_op = > - htonl((ind << qp->rq.wqe_shift) | 1); > - ((struct mthca_next_seg *) prev_wqe)->ee_nds = > - htonl(MTHCA_NEXT_DBD | size); > - } > + ((struct mthca_next_seg *) prev_wqe)->nda_op = > + htonl((ind << qp->rq.wqe_shift) | 1); > + ((struct mthca_next_seg *) prev_wqe)->ee_nds = > + htonl(MTHCA_NEXT_DBD | size); > > if (!size0) > size0 = size; > @@ -562,15 +558,13 @@ int mthca_arbel_post_send(struct ibv_qp > goto out; > } > > - if (prev_wqe) { > - ((struct mthca_next_seg *) prev_wqe)->nda_op = > - htonl(((ind << qp->sq.wqe_shift) + > - qp->send_wqe_offset) | > - mthca_opcode[wr->opcode]); > - mb(); > - ((struct mthca_next_seg *) prev_wqe)->ee_nds = > - htonl(MTHCA_NEXT_DBD | size); > - } > + ((struct mthca_next_seg *) prev_wqe)->nda_op = > + htonl(((ind << qp->sq.wqe_shift) + > + qp->send_wqe_offset) | > + mthca_opcode[wr->opcode]); > + mb(); > + ((struct mthca_next_seg *) prev_wqe)->ee_nds = > + htonl(MTHCA_NEXT_DBD | 
size); > > if (!size0) { > size0 = size; > @@ -767,6 +761,8 @@ int mthca_alloc_qp_buf(struct ibv_pd *pd > } > } > > + qp->sq.last = get_send_wqe(qp, qp->sq.max - 1); > + qp->rq.last = get_recv_wqe(qp, qp->sq.max - 1); > return 0; > } > > Index: userspace/libmthca/src/srq.c > =================================================================== > --- userspace.orig/libmthca/src/srq.c 2005-09-13 17:25:41.000000000 +0300 > +++ userspace/libmthca/src/srq.c 2005-09-13 17:25:51.000000000 +0300 > @@ -142,13 +142,11 @@ int mthca_tavor_post_srq_recv(struct ibv > ((struct mthca_data_seg *) wqe)->addr = 0; > } > > - if (prev_wqe) { > - ((struct mthca_next_seg *) prev_wqe)->nda_op = > - htonl((ind << srq->wqe_shift) | 1); > - mb(); > - ((struct mthca_next_seg *) prev_wqe)->ee_nds = > - htonl(MTHCA_NEXT_DBD); > - } > + ((struct mthca_next_seg *) prev_wqe)->nda_op = > + htonl((ind << srq->wqe_shift) | 1); > + mb(); > + ((struct mthca_next_seg *) prev_wqe)->ee_nds = > + htonl(MTHCA_NEXT_DBD); > > srq->wrid[ind] = wr->wr_id; > srq->first_free = next_ind; > @@ -294,6 +292,7 @@ int mthca_alloc_srq_buf(struct ibv_pd *p > > srq->first_free = 0; > srq->last_free = srq->max - 1; > + srq->last = get_wqe(srq, srq->max - 1); > > return 0; > } > Index: userspace/libmthca/src/verbs.c > =================================================================== > --- userspace.orig/libmthca/src/verbs.c 2005-08-23 14:03:12.000000000+0300 > +++ userspace/libmthca/src/verbs.c 2005-09-13 17:25:14.000000000 +0300 > @@ -306,7 +306,6 @@ struct ibv_srq *mthca_create_srq(struct > > srq->max = align_queue_size(pd->context, attr->attr.max_wr, 1); > srq->max_gs = attr->attr.max_sge; > - srq->last = NULL; > srq->counter = 0; > > if (mthca_alloc_srq_buf(pd, &attr->attr, srq)) > @@ -413,14 +412,12 @@ struct ibv_qp *mthca_create_qp(struct ib > qp->sq.last_comp = qp->sq.max - 1; > qp->sq.head = 0; > qp->sq.tail = 0; > - qp->sq.last = NULL; > > qp->rq.max = align_queue_size(pd->context, attr->cap.max_recv_wr, 0); > 
qp->rq.next_ind = 0; > qp->rq.last_comp = qp->rq.max - 1; > qp->rq.head = 0; > qp->rq.tail = 0; > - qp->rq.last = NULL; > > if (mthca_alloc_qp_buf(pd, &attr->cap, qp)) > goto err; > > > -- > MST > From rolandd at cisco.com Tue Sep 13 10:19:17 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 10:19:17 -0700 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <4327081A.1020603@lanl.gov> (Ronald G. Minnich's message of "Tue, 13 Sep 2005 11:10:50 -0600") References: <1126616983.4382.43413.camel@hal.voltaire.com> <4326D111.7090109@mellanox.co.il> <20050913142451.GA21653@lst.de> <4326FDB7.1070207@ichips.intel.com> <20050913170122.GA24527@lst.de> <5264t499qe.fsf@cisco.com> <4327081A.1020603@lanl.gov> Message-ID: <52wtlk7une.fsf@cisco.com> Ronald> Actually, I kind of wish that the code were all at Ronald> openib.org. Should we really pay that much heed to Ronald> requests of this sort if it makes life hard for openib Ronald> people? The Windows source tree is actually at svn://windows.openib.org/svn, so it is at openib.org. So, there's more than one repository, but given how little the code has in common with the Linux code, I don't think that's a big deal. - R. From halr at voltaire.com Tue Sep 13 10:14:36 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 13:14:36 -0400 Subject: [openib-general] Re: ipoib send-only join to IGMP multicast group In-Reply-To: <20050913162904.GA16651@mellanox.co.il> References: <52acih7xwu.fsf@cisco.com> <20050913162904.GA16651@mellanox.co.il> Message-ID: <1126631676.4514.660.camel@hal.voltaire.com> On Tue, 2005-09-13 at 12:29, Michael S. Tsirkin wrote: > Quoting r. Roland Dreier : > > Jack> 2. Who is responsible for creating this multicast group in > > Jack> IPoIB?
(A send-only join will not cause a group to be > > Jack> created if it does not yet exist) > > > > The group will be created if any full-member joins are done. > > Forgive me if I'm asking a dump question - but wouldnt it be simpler > to join as a full member? It would just need the additional characteristics for that group which are available from the broadcast group. > It seems that if we are joining a group and it doesnt exist, > we have to handle this specially by forwarding packets to > all-IP broadcast group. > Further, since the group can be added/deleted at any time. > it seems that we also should request, and handle, > delete updates and 'creation' reports from the SM. > > > But that > > won't happen unless some entity wants to receive packets from the > > group -- in other words, some multicast router. > > The ipoib draft seems to imply that routers should typically > perform nonmember joins. Do I misunderstand it? No and these nonmember joins are based on it subscribing to MGID created/deleted notices.. This is all covered better in the IPoIB Architecture I-D http://www.ietf.org/internet-drafts/draft-ietf-ipoib-architecture-04.txt IPmc senders join groups either as send only (nonmembers) or full members. A pure sender may choose to join the multicast group as a FullMember. In such a case the sender will receive all the multicast packets transmitted to the IB group. Additionally, the IB group will not be deleted until the sender leaves the group. Alternatively, a sender might IB_join as a SendOnlyNonMember. In such a case the packets are not routed to the sender though packets transmitted by it can reach the other group members. Additionally, the group can be deleted when all FullMembers have left the group. The sender can further request delete updates from the SM. All that is said about receivers is that they need to join and create the group if necessary. The IP host must join the IB multicast group corresponding to the IP address. 
This follows from the IBA requirement that the receiver must join the relevant IB multicast group. The group is automatically created if it does not exist [IB_ARCH]. The IP receivers must IB_leave the IB group when the IP layer stops listening of the corresponding IP address. The SM can then choose to delete the group. IP multicast routers need to listen promiscuously and joins these discovered groups as nonmembers. IP routers know of the new IP groups created in the subnet by the use of protocols such as IGMP/MLD. However, this is not enough for IPoIB since the router needs to IB_join the relevant IB groups to be able to receive and transmit the packets. There is no promiscuous mode for listening to all packets. The IPoIB routers therefore need to request the SM to report all creations of IB groups in the fabric. The IPoIB router can then IB_join the reported group. It is not desirable that the router's IB_joining of a multicast group be considered the same as the IB_join from a receiver - the router's IB_join shouldn't disallow the group's deletion when all receivers leave. To overcome just this type of situations, IBA provides the NonMember IB_join mode. The NonMember IB_join mode can be used by IP routers when they join in response to the create reports. A router should ideally request the delete reports too so that it can release all the resources associated with the group. T -- Hal
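The join modes discussed in this thread (receivers as FullMembers, pure senders as SendOnlyNonMembers or FullMembers, routers as NonMembers) and the rule that a group survives only while a FullMember remains can be modeled in a few lines of C. This is a sketch, not OpenIB code: the JoinState bit values follow the MCMemberRecord definition in the IBA specification, while the enum names and the helpers `choose_join_state()`, `join_receives()` and `group_can_be_deleted()` are hypothetical.

```c
/*
 * Hypothetical model of IB multicast join modes as used by IPoIB.
 * JoinState bits follow the IBA MCMemberRecord (bit 0 FullMember,
 * bit 1 NonMember, bit 2 SendOnlyNonMember); all names are illustrative.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mc_join_state {
	MC_JOIN_FULL_MEMBER          = 1 << 0,
	MC_JOIN_NON_MEMBER           = 1 << 1,
	MC_JOIN_SEND_ONLY_NON_MEMBER = 1 << 2,
};

/* Roles an IPoIB port may play for one IP multicast group (hypothetical). */
enum ipoib_mc_role {
	ROLE_RECEIVER,    /* IP layer listens on the group */
	ROLE_PURE_SENDER, /* only transmits to the group */
	ROLE_ROUTER,      /* joins in response to an SM create report */
};

/* Join mode for each role, as suggested by the IPoIB architecture draft. */
static unsigned int choose_join_state(enum ipoib_mc_role role)
{
	switch (role) {
	case ROLE_RECEIVER:
		/* Receivers must be full members; the join creates the group. */
		return MC_JOIN_FULL_MEMBER;
	case ROLE_PURE_SENDER:
		/* FullMember is also allowed; send-only neither receives
		 * traffic nor keeps the group alive. */
		return MC_JOIN_SEND_ONLY_NON_MEMBER;
	case ROLE_ROUTER:
		/* NonMember: receives and sends, but does not block deletion. */
		return MC_JOIN_NON_MEMBER;
	}
	return 0;
}

/* FullMembers and NonMembers receive group traffic; send-only joins do not. */
static bool join_receives(unsigned int join_state)
{
	return (join_state & (MC_JOIN_FULL_MEMBER | MC_JOIN_NON_MEMBER)) != 0;
}

/* The SM may delete the group once no remaining join is a FullMember. */
static bool group_can_be_deleted(const unsigned int *join_states, size_t n)
{
	assert(join_states != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (join_states[i] & MC_JOIN_FULL_MEMBER)
			return false;
	return true;
}
```

With only a router (NonMember) and a send-only sender left, `group_can_be_deleted()` returns true, which is exactly why the draft recommends that routers also request delete reports: the group can disappear underneath them.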
From halr at voltaire.com Tue Sep 13 10:22:50 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 13:22:50 -0400 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <52wtlk7une.fsf@cisco.com> References: <1126616983.4382.43413.camel@hal.voltaire.com> <4326D111.7090109@mellanox.co.il> <20050913142451.GA21653@lst.de> <4326FDB7.1070207@ichips.intel.com> <20050913170122.GA24527@lst.de> <5264t499qe.fsf@cisco.com> <4327081A.1020603@lanl.gov> <52wtlk7une.fsf@cisco.com> Message-ID: <1126632035.4514.680.camel@hal.voltaire.com> On Tue, 2005-09-13 at 13:19, Roland Dreier wrote: > The Windows source tree is actually at svn://windows.openib.org/svn, > so it is at openib.org. So, there's more than one repository, but > given how little the code has in common with the Linux code, I don't > think that's a big deal. OpenSM and (in the future) the Mellanox diagnostic tools are what I am aware of that is shared between the 2 environments.
-- Hal From ftillier at silverstorm.com Tue Sep 13 10:35:48 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Tue, 13 Sep 2005 10:35:48 -0700 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <5264t499qe.fsf@cisco.com> Message-ID: <004701c5b889$95074880$9e5aa8c0@infiniconsys.com> > From: Roland Dreier [mailto:rolandd at cisco.com] > Sent: Tuesday, September 13, 2005 10:08 AM > > Sean> My understanding is that the labs, who control the OpenIB > Sean> servers, refused to host any Windows related code, forcing > Sean> it to have a separate repository. > > Christoph> It shouldn't be difficult to find someone to host it. > Christoph> I could maybe ask if such a repo could be put at the > Christoph> lst.de servers. > > Actually I think the issue was somewhat different. Microsoft is so > allergic to the GPL that they asked for the code to be in a physically > separate repository. Microsoft requested a separate repository, not separate servers. Sandia currently hosts the OpenIB SVN repository for Linux and did not want to host the Windows code since they have no interest in it. Yes, this makes things a bit more cumbersome, but such is life. The DDK license supposedly has limitations that make it incompatible with the GPL license - building GPL code with the DDK would be a violation of the DDK license somehow. I have no interest in revisiting this topic - it is what it is, we've argued endlessly about it already, so let's just move on. That said, I personally don't see any issue with user-mode tools being dual-license - it's the core bits that can't be. As far as I'm concerned, having OpenSM maintained in the Linux SVN repository is fine. It would be handy to have a shadow in the Windows repository so that it's easy to get and build, and that's what I think the plan is. As a note, the uDAPL code in the Windows SVN has the uDAPL triple license. 
- Fab From ftillier at silverstorm.com Tue Sep 13 10:38:57 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Tue, 13 Sep 2005 10:38:57 -0700 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <4327081A.1020603@lanl.gov> Message-ID: <004801c5b88a$0602c690$9e5aa8c0@infiniconsys.com> > From: Ronald G Minnich [mailto:rminnich at lanl.gov] > Sent: Tuesday, September 13, 2005 10:11 AM > > Roland Dreier wrote: > > > Actually I think the issue was somewhat different. Microsoft is so > > allergic to the GPL that they asked for the code to be in a physically > > separate repository. > > > > that makes much more sense, ah, well, not really, but it is easier to > understand. I doubt the Labs would have any objection to Windows code. Sandia did object, so we found an alternate host for the Windows SVN repository. > Actually, I kind of wish that the code were all at openib.org. Should we > really pay that much heed to requests of this sort if it makes life hard > for openib people? The code is all under the openib.org domain. The Windows SVN repository is at svn://windows.openib.org. Given the goals of the Windows project, heeding Microsoft's requests made sense. - Fab From jim.ryan at intel.com Tue Sep 13 10:07:08 2005 From: jim.ryan at intel.com (Ryan, Jim) Date: Tue, 13 Sep 2005 10:07:08 -0700 Subject: [openib-general] Re: Opensm - casting issues #2 Message-ID: My recollection is Matt Leininger, could be wrong Jim -----Original Message----- From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Ronald G Minnich Sent: Tuesday, September 13, 2005 10:05 AM To: Christoph Hellwig Cc: openib-general at openib.org Subject: Re: [openib-general] Re: Opensm - casting issues #2 > On Tue, Sep 13, 2005 at 09:26:31AM -0700, Sean Hefty wrote: >>My understanding is that the labs, who control the OpenIB servers, refused >>to host any Windows related code, forcing it to have a separate repository. 
wow, that's news to me! Maybe I'm at the wrong lab! Anybody have a source for this "understanding", because off the top of my head, it just doesn't sound right. ron _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From ftillier at silverstorm.com Tue Sep 13 11:13:13 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Tue, 13 Sep 2005 11:13:13 -0700 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: Message-ID: <004d01c5b88e$cf6a1570$9e5aa8c0@infiniconsys.com> > From: Ryan, Jim [mailto:jim.ryan at intel.com] > Sent: Tuesday, September 13, 2005 10:07 AM > > My recollection is Matt Leininger, could be wrong I believe that Matt was just the messenger, conveying his organization's position on the matter. Whether or not we agree with that position is immaterial - it is Sandia's prerogative. There was a need to have a separate repository independent of hosting issues. - Fab From mlleini at ca.sandia.gov Tue Sep 13 11:04:07 2005 From: mlleini at ca.sandia.gov (Matt L. Leininger) Date: Tue, 13 Sep 2005 11:04:07 -0700 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <4326FDB7.1070207@ichips.intel.com> References: <1126616983.4382.43413.camel@hal.voltaire.com> <4326D111.7090109@mellanox.co.il> <20050913142451.GA21653@lst.de> <4326FDB7.1070207@ichips.intel.com> Message-ID: <1126634647.19055.89.camel@localhost> On Tue, 2005-09-13 at 09:26 -0700, Sean Hefty wrote: > Christoph Hellwig wrote: > > Why does the windows port needs a separate repository? Please just > > check all windows code (not just opensm) into the openib repository. > > My understanding is that the labs, who control the OpenIB servers, refused to > host any Windows related code, forcing it to have a separate repository. > The windows stack is a completely separate code base. 
Folks wanted to host it on another server. - Matt From rolandd at cisco.com Tue Sep 13 11:14:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 11:14:55 -0700 Subject: [openib-general] [PATCH v1/RFC] IB: Add SCSI RDMA Protocol (SRP) initiator Message-ID: <52ll207s2o.fsf@cisco.com> Sorry to interrupt the SAS arguments, but... Here's the latest version of the InfiniBand SRP initiator. I think it's ready for merging; I implemented error handling, which was the main thing pointed out from the generally positive reviews of my previous posting. I've done a decent amount of testing without seeing any problems, and John Kingman has also tested against his SRP target. Since this is a completely new driver and can't break anything, assuming the code looks good, does it seem OK to merge for 2.6.14? Thanks, Roland Add an InfiniBand SCSI RDMA Protocol (SRP) initiator. This lets us talk to InfiniBand SRP targets (storage devices). Signed-off-by: Roland Dreier --- drivers/infiniband/Kconfig | 2 drivers/infiniband/Makefile | 1 drivers/infiniband/ulp/srp/Kbuild | 3 drivers/infiniband/ulp/srp/Kconfig | 11 drivers/infiniband/ulp/srp/ib_srp.c | 1637 +++++++++++++++++++++++++++++++++++ drivers/infiniband/ulp/srp/ib_srp.h | 324 +++++++ 6 files changed, 1978 insertions(+), 0 deletions(-) 891de9c3d67dc4afc2e5c941bba96613bca81ae1 diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -33,4 +33,6 @@ source "drivers/infiniband/hw/mthca/Kcon source "drivers/infiniband/ulp/ipoib/Kconfig" +source "drivers/infiniband/ulp/srp/Kconfig" + endmenu diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile --- a/drivers/infiniband/Makefile +++ b/drivers/infiniband/Makefile @@ -1,3 +1,4 @@ obj-$(CONFIG_INFINIBAND) += core/ obj-$(CONFIG_INFINIBAND_MTHCA) += hw/mthca/ obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/ +obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ diff --git 
a/drivers/infiniband/ulp/srp/Kbuild b/drivers/infiniband/ulp/srp/Kbuild new file mode 100644 --- /dev/null +++ b/drivers/infiniband/ulp/srp/Kbuild @@ -0,0 +1,3 @@ +EXTRA_CFLAGS += -Idrivers/infiniband/include + +obj-$(CONFIG_INFINIBAND_SRP) += ib_srp.o diff --git a/drivers/infiniband/ulp/srp/Kconfig b/drivers/infiniband/ulp/srp/Kconfig new file mode 100644 --- /dev/null +++ b/drivers/infiniband/ulp/srp/Kconfig @@ -0,0 +1,11 @@ +config INFINIBAND_SRP + tristate "InfiniBand SCSI RDMA Protocol" + depends on INFINIBAND && SCSI + ---help--- + Support for the SCSI RDMA Protocol over InfiniBand. This + allows you to access storage devices that speak SRP over + InfiniBand. + + The SRP protocol is defined by the INCITS T10 technical + committee. See . + diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c new file mode 100644 --- /dev/null +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -0,0 +1,1637 @@ +/* + * Copyright (c) 2005 Cisco Systems. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: ib_srp.c 3395 2005-09-13 05:10:39Z roland $ + */ + +#include +#include +#include +#include +#include +#include +#include + +#include + +#include +#include +#include + +#include + +#include "ib_srp.h" + +#define DRV_NAME "ib_srp" +#define PFX DRV_NAME ": " +#define DRV_VERSION "0.01" +#define DRV_RELDATE "January 11, 2005" + +MODULE_AUTHOR("Roland Dreier"); +MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol driver"); +MODULE_LICENSE("Dual BSD/GPL"); + +static int topspin_workarounds = 1; + +module_param(topspin_workarounds, int, 0444); +MODULE_PARM_DESC(topspin_workarounds, + "Enable workarounds for Topspin/Cisco SRP target bugs if != 0"); + +static const u8 topspin_oui[3] = { 0x00, 0x05, 0xad }; + +static void srp_add_one(struct ib_device *device); +static void srp_remove_one(struct ib_device *device); +static void srp_completion(struct ib_cq *cq, void *target_ptr); +static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event); + +static struct ib_client srp_client = { + .name = "srp", + .add = srp_add_one, + .remove = srp_remove_one +}; + +static inline struct srp_target_port *host_to_target(struct Scsi_Host *host) +{ + return (struct srp_target_port *) host->hostdata; +} + +static const char *srp_target_info(struct Scsi_Host *host) +{ + return host_to_target(host)->target_name; +} + +static struct srp_iu *srp_alloc_iu(struct srp_host *host, size_t size, + unsigned int __nocast gfp_mask, + enum dma_data_direction direction) +{ + struct srp_iu *iu; 
+ + iu = kmalloc(sizeof *iu, gfp_mask); + if (!iu) + goto out; + + iu->buf = kzalloc(size, gfp_mask); + if (!iu->buf) + goto out_free_iu; + + iu->dma = dma_map_single(host->dev->dma_device, iu->buf, size, direction); + if (dma_mapping_error(iu->dma)) + goto out_free_buf; + + iu->size = size; + iu->direction = direction; + + return iu; + +out_free_buf: + kfree(iu->buf); +out_free_iu: + kfree(iu); +out: + return NULL; +} + +static void srp_free_iu(struct srp_host *host, struct srp_iu *iu) +{ + if (!iu) + return; + + dma_unmap_single(host->dev->dma_device, iu->dma, iu->size, iu->direction); + kfree(iu->buf); + kfree(iu); +} + +static void srp_qp_event(struct ib_event *event, void *context) +{ + printk(KERN_ERR PFX "QP event %d\n", event->event); +} + +static int srp_init_qp(struct srp_target_port *target, + struct ib_qp *qp) +{ + struct ib_qp_attr *attr; + int ret; + + attr = kmalloc(sizeof *attr, GFP_KERNEL); + if (!attr) + return -ENOMEM; + + ret = ib_find_cached_pkey(target->srp_host->dev, + target->srp_host->port, + be16_to_cpu(target->path.pkey), + &attr->pkey_index); + if (ret) + return ret; + + attr->qp_state = IB_QPS_INIT; + attr->qp_access_flags = (IB_ACCESS_REMOTE_READ | + IB_ACCESS_REMOTE_WRITE); + attr->port_num = target->srp_host->port; + + return ib_modify_qp(qp, attr, + IB_QP_STATE | + IB_QP_PKEY_INDEX | + IB_QP_ACCESS_FLAGS | + IB_QP_PORT); +} + +static struct ib_qp *srp_create_qp(struct srp_target_port *target, + struct ib_qp_init_attr *init_attr) +{ + struct ib_qp *qp; + int ret; + + qp = ib_create_qp(target->srp_host->pd, init_attr); + if (IS_ERR(qp)) + return qp; + + ret = srp_init_qp(target, qp); + if (ret) { + ib_destroy_qp(qp); + qp = ERR_PTR(ret); + } + + return qp; +} + +static void srp_path_rec_completion(int status, + struct ib_sa_path_rec *pathrec, + void *target_ptr) +{ + struct srp_target_port *target = target_ptr; + struct srp_host *host = target->srp_host; + struct ib_qp_init_attr *init_attr = NULL; + + if (status) { + printk(KERN_ERR 
PFX "Got failed path rec status %d\n", status); + target->status = status; + goto out; + } + + target->path = *pathrec; + + /* + * We may be getting a path for the second time because we + * were redirected to a different port. In that case, there's + * no reason to create our CQ and QP again. + */ + if (target->cq) { + target->status = 0; + goto out; + } + + init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL); + if (!init_attr) { + target->status = -ENOMEM; + goto out; + } + + target->cq = ib_create_cq(host->dev, srp_completion, + NULL, target, SRP_CQ_SIZE); + if (IS_ERR(target->cq)) { + target->status = PTR_ERR(target->cq); + goto out_free; + } + + ib_req_notify_cq(target->cq, IB_CQ_NEXT_COMP); + + init_attr->event_handler = srp_qp_event; + init_attr->cap.max_send_wr = SRP_SQ_SIZE; + init_attr->cap.max_recv_wr = SRP_RQ_SIZE; + init_attr->cap.max_recv_sge = 1; + init_attr->cap.max_send_sge = 1; + init_attr->sq_sig_type = IB_SIGNAL_ALL_WR; + init_attr->qp_type = IB_QPT_RC; + init_attr->send_cq = target->cq; + init_attr->recv_cq = target->cq; + + target->qp = srp_create_qp(target, init_attr); + if (IS_ERR(target->qp)) { + target->status = PTR_ERR(target->qp); + ib_destroy_cq(target->cq); + goto out_free; + } + + target->status = 0; + +out_free: + kfree(init_attr); + +out: + complete(&target->done); +} + +static int srp_lookup_path(struct srp_target_port *target) +{ + target->path.numb_path = 1; + + init_completion(&target->done); + + target->path_query_id = ib_sa_path_rec_get(target->srp_host->dev, + target->srp_host->port, + &target->path, + IB_SA_PATH_REC_DGID | + IB_SA_PATH_REC_SGID | + IB_SA_PATH_REC_NUMB_PATH | + IB_SA_PATH_REC_PKEY, + SRP_PATH_REC_TIMEOUT_MS, + GFP_KERNEL, + srp_path_rec_completion, + target, &target->path_query); + if (target->path_query_id < 0) + return target->path_query_id; + + wait_for_completion(&target->done); + + if (target->status < 0) + printk(KERN_WARNING PFX "Path record query failed\n"); + + return target->status; +} + +static int 
srp_send_req(struct srp_target_port *target) +{ + struct { + struct ib_cm_req_param param; + struct srp_login_req priv; + } *req = NULL; + int status; + + req = kzalloc(sizeof *req, GFP_KERNEL); + if (!req) + return -ENOMEM; + + req->param.primary_path = &target->path; + req->param.alternate_path = NULL; + req->param.service_id = target->service_id; + req->param.qp_num = target->qp->qp_num; + req->param.qp_type = target->qp->qp_type; + req->param.starting_psn = 0; /* XXX */ + req->param.private_data = &req->priv; + req->param.private_data_len = sizeof req->priv; + req->param.responder_resources = 4; + req->param.remote_cm_response_timeout = 20; + req->param.flow_control = 1; + req->param.local_cm_response_timeout = 20; + req->param.retry_count = 7; + req->param.rnr_retry_count = 7; + req->param.max_cm_retries = 15; + + req->priv.opcode = SRP_LOGIN_REQ; + req->priv.tag = 0; + req->priv.req_it_iu_len = cpu_to_be32(SRP_MAX_IU_LEN); + req->priv.req_buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | + SRP_BUF_FORMAT_INDIRECT); + memcpy(req->priv.initiator_port_id, target->srp_host->initiator_port_id, 16); + /* + * Topspin/Cisco SRP targets will reject our login unless we + * zero out the first 8 bytes of our initiator port ID. The + * second 8 bytes must be our local node GUID, but we always + * use that anyway. 
+ */ + if (topspin_workarounds && !memcmp(&target->ioc_guid, topspin_oui, 3)) { + printk(KERN_DEBUG PFX "Topspin/Cisco initiator port ID workaround " + "activated for target GUID %016llx\n", + (unsigned long long) be64_to_cpu(target->ioc_guid)); + memset(req->priv.initiator_port_id, 0, 8); + } + memcpy(req->priv.target_port_id, &target->id_ext, 8); + memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8); + + status = ib_send_cm_req(target->cm_id, &req->param); + if (status) { + ib_destroy_qp(target->qp); + ib_destroy_cq(target->cq); + } + + return status; +} + +static void srp_disconnect_target(struct srp_target_port *target) +{ + /* XXX should send SRP_I_LOGOUT request */ + + init_completion(&target->done); + ib_send_cm_dreq(target->cm_id, NULL, 0); + wait_for_completion(&target->done); +} + +static void srp_free_target_ib(struct srp_target_port *target) +{ + int i; + + ib_destroy_qp(target->qp); + ib_destroy_cq(target->cq); + + for (i = 0; i < SRP_RQ_SIZE; ++i) + srp_free_iu(target->srp_host, target->rx_ring[i]); + for (i = 0; i < SRP_SQ_SIZE + 1; ++i) + srp_free_iu(target->srp_host, target->tx_ring[i]); +} + +static void srp_remove_work(void *target_ptr) +{ + struct srp_target_port *target = target_ptr; + + spin_lock_irq(target->scsi_host->host_lock); + if (target->state != SRP_TARGET_DEAD) { + spin_unlock_irq(target->scsi_host->host_lock); + scsi_host_put(target->scsi_host); + return; + } + target->state = SRP_TARGET_REMOVED; + spin_unlock_irq(target->scsi_host->host_lock); + + down(&target->srp_host->target_mutex); + list_del(&target->list); + up(&target->srp_host->target_mutex); + + scsi_remove_host(target->scsi_host); + ib_destroy_cm_id(target->cm_id); + srp_free_target_ib(target); + scsi_host_put(target->scsi_host); + /* And another put to really free the target port... 
*/ + scsi_host_put(target->scsi_host); +} + +static int srp_connect_target(struct srp_target_port *target) +{ + int ret; + + while (1) { + init_completion(&target->done); + ret = srp_send_req(target); + if (ret) + return ret; + wait_for_completion(&target->done); + + /* + * The CM event handling code will set status to + * SRP_PORT_REDIRECT if we get a port redirect REJ + * back, or SRP_DLID_REDIRECT if we get a lid/qp + * redirect REJ back. + */ + switch (target->status) { + case 0: + return 0; + + case SRP_PORT_REDIRECT: + ret = srp_lookup_path(target); + if (ret) + return ret; + break; + + case SRP_DLID_REDIRECT: + break; + + default: + return target->status; + } + } +} + +static int srp_reconnect_target(struct srp_target_port *target) +{ + struct ib_qp_attr qp_attr; + struct srp_request *req; + struct ib_wc wc; + u32 remote_cm_qpn; + int ret; + int i; + + spin_lock_irq(target->scsi_host->host_lock); + if (target->state != SRP_TARGET_LIVE) { + spin_unlock_irq(target->scsi_host->host_lock); + return -EAGAIN; + } + target->state = SRP_TARGET_CONNECTING; + spin_unlock_irq(target->scsi_host->host_lock); + + remote_cm_qpn = target->cm_id->remote_cm_qpn; + + srp_disconnect_target(target); + + target->cm_id = ib_create_cm_id(srp_cm_handler, target); + if (IS_ERR(target->cm_id)) { + ret = PTR_ERR(target->cm_id); + target->cm_id = NULL; + goto err; + } + + target->cm_id->remote_cm_qpn = remote_cm_qpn; + + qp_attr.qp_state = IB_QPS_RESET; + ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE); + if (ret) + goto err; + + ret = srp_init_qp(target, target->qp); + if (ret) + goto err; + + while (ib_poll_cq(target->cq, 1, &wc) > 0) + ; /* nothing */ + + list_for_each_entry(req, &target->req_queue, list) { + req->scmnd->result = DID_RESET << 16; + req->scmnd->scsi_done(req->scmnd); + } + + target->rx_head = 0; + target->rx_tail = 0; + target->tx_head = 0; + target->tx_tail = 0; + target->req_head = 0; + for (i = 0; i < SRP_SQ_SIZE - 1; ++i) + target->req_ring[i].next = i + 1; 
+ target->req_ring[SRP_SQ_SIZE - 1].next = -1; + INIT_LIST_HEAD(&target->req_queue); + + ret = srp_connect_target(target); + if (ret) + goto err; + + spin_lock_irq(target->scsi_host->host_lock); + if (target->state == SRP_TARGET_CONNECTING) { + ret = 0; + target->state = SRP_TARGET_LIVE; + } else + ret = -EAGAIN; + spin_unlock_irq(target->scsi_host->host_lock); + + return ret; + +err: + printk(KERN_ERR PFX "reconnect failed, removing target port.\n"); + + /* + * We couldn't reconnect, so kill our target port off. + * However, we have to defer the real removal because we might + * be in the context of the SCSI error handler now, which + * would deadlock if we call scsi_remove_host(). + */ + spin_lock_irq(target->scsi_host->host_lock); + if (target->state == SRP_TARGET_CONNECTING) { + target->state = SRP_TARGET_DEAD; + INIT_WORK(&target->work, srp_remove_work, target); + schedule_work(&target->work); + } + spin_unlock_irq(target->scsi_host->host_lock); + + return ret; +} + +static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target, + struct srp_iu *iu) +{ + struct srp_cmd *cmd = iu->buf; + int len; + u8 fmt; + + if (!scmnd->request_buffer || scmnd->sc_data_direction == DMA_NONE) + return sizeof (struct srp_cmd); + + if (scmnd->sc_data_direction != DMA_FROM_DEVICE && + scmnd->sc_data_direction != DMA_TO_DEVICE) { + printk(KERN_WARNING PFX "Unhandled data direction %d\n", + scmnd->sc_data_direction); + return -EINVAL; + } + + if (scmnd->use_sg) { + struct scatterlist *scat = scmnd->request_buffer; + int n; + int i; + + n = dma_map_sg(target->srp_host->dev->dma_device, + scat, scmnd->use_sg, scmnd->sc_data_direction); + + if (n == 1) { + struct srp_direct_buf *buf = (void *) cmd->add_data; + + fmt = SRP_DATA_DESC_DIRECT; + + buf->va = cpu_to_be64(sg_dma_address(scat)); + buf->key = cpu_to_be32(target->srp_host->mr->rkey); + buf->len = cpu_to_be32(sg_dma_len(scat)); + + len = sizeof (struct srp_cmd) + + sizeof (struct srp_direct_buf); + } else { + 
struct srp_indirect_buf *buf = (void *) cmd->add_data; + u32 datalen = 0; + + fmt = SRP_DATA_DESC_INDIRECT; + + if (scmnd->sc_data_direction == DMA_TO_DEVICE) + cmd->data_out_desc_cnt = n; + else + cmd->data_in_desc_cnt = n; + + buf->table_desc.va = cpu_to_be64(iu->dma + + sizeof *cmd + + sizeof *buf); + buf->table_desc.key = + cpu_to_be32(target->srp_host->mr->rkey); + buf->table_desc.len = + cpu_to_be32(n * sizeof (struct srp_direct_buf)); + + for (i = 0; i < n; ++i) { + buf->desc_list[i].va = cpu_to_be64(sg_dma_address(&scat[i])); + buf->desc_list[i].key = + cpu_to_be32(target->srp_host->mr->rkey); + buf->desc_list[i].len = cpu_to_be32(sg_dma_len(&scat[i])); + + datalen += sg_dma_len(&scat[i]); + } + + buf->len = cpu_to_be32(datalen); + + len = sizeof (struct srp_cmd) + + sizeof (struct srp_indirect_buf) + + n * sizeof (struct srp_direct_buf); + } + } else { + struct srp_direct_buf *buf = (void *) cmd->add_data; + dma_addr_t dma; + + dma = dma_map_single(target->srp_host->dev->dma_device, + scmnd->request_buffer, scmnd->request_bufflen, + scmnd->sc_data_direction); + if (dma_mapping_error(dma)) { + printk(KERN_WARNING PFX "unable to map %p/%d (dir %d)\n", + scmnd->request_buffer, (int) scmnd->request_bufflen, + scmnd->sc_data_direction); + return -EINVAL; + } + + buf->va = cpu_to_be64(dma); + buf->key = cpu_to_be32(target->srp_host->mr->rkey); + buf->len = cpu_to_be32(scmnd->request_bufflen); + + fmt = SRP_DATA_DESC_DIRECT; + + len = sizeof (struct srp_cmd) + sizeof (struct srp_direct_buf); + } + + if (scmnd->sc_data_direction == DMA_TO_DEVICE) + cmd->buf_fmt = fmt << 4; + else + cmd->buf_fmt = fmt; + + + return len; +} + +static void srp_unmap_data(struct scsi_cmnd *scmnd, + struct srp_target_port *target, + struct srp_cmd *cmd) +{ + if (!scmnd->request_buffer || + (scmnd->sc_data_direction != DMA_TO_DEVICE && + scmnd->sc_data_direction != DMA_FROM_DEVICE)) + return; + + if (scmnd->use_sg) + dma_unmap_sg(target->srp_host->dev->dma_device, + (struct scatterlist 
*) scmnd->request_buffer, + scmnd->use_sg, scmnd->sc_data_direction); + else + dma_unmap_single(target->srp_host->dev->dma_device, + be64_to_cpu(((struct srp_direct_buf *) cmd->add_data)->va), + scmnd->request_bufflen, + scmnd->sc_data_direction); +} + +static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp) +{ + struct srp_request *req; + struct scsi_cmnd *scmnd; + struct srp_iu *iu; + unsigned long flags; + s32 delta; + + delta = (s32) be32_to_cpu(rsp->req_lim_delta); + + spin_lock_irqsave(target->scsi_host->host_lock, flags); + + target->req_lim += delta; + + req = &target->req_ring[rsp->tag & ~SRP_TAG_TSK_MGMT]; + + if (rsp->tag & SRP_TAG_TSK_MGMT) { + if (be32_to_cpu(rsp->resp_data_len) < 4) + req->tsk_status = -1; + else + req->tsk_status = rsp->data[3]; + complete(&req->done); + } else { + iu = req->cmd; + scmnd = req->scmnd; + scmnd->result = rsp->status; + + if (rsp->flags & SRP_RSP_FLAG_SNSVALID) { + memcpy(scmnd->sense_buffer, rsp->data + + be32_to_cpu(rsp->resp_data_len), + min_t(int, be32_to_cpu(rsp->sense_data_len), + SCSI_SENSE_BUFFERSIZE)); + } + + if (rsp->flags & (SRP_RSP_FLAG_DOOVER | SRP_RSP_FLAG_DOUNDER)) + scmnd->resid = be32_to_cpu(rsp->data_out_res_cnt); + else if (rsp->flags & (SRP_RSP_FLAG_DIOVER | SRP_RSP_FLAG_DIUNDER)) + scmnd->resid = be32_to_cpu(rsp->data_in_res_cnt); + + srp_unmap_data(scmnd, target, iu->buf); + + if (!req->tsk_mgmt) { + req->scmnd = NULL; + scmnd->host_scribble = (void *) -1L; + scmnd->scsi_done(scmnd); + + list_del(&req->list); + req->next = target->req_head; + target->req_head = rsp->tag & ~SRP_TAG_TSK_MGMT; + } else + req->cmd_done = 1; + } + + spin_unlock_irqrestore(target->scsi_host->host_lock, flags); +} + +static void srp_reconnect_work(void *target_ptr) +{ + struct srp_target_port *target = target_ptr; + + srp_reconnect_target(target); +} + +static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc) +{ + struct srp_iu *iu; + u8 opcode; + + iu = 
target->rx_ring[wc->wr_id & ~SRP_OP_RECV]; + + dma_sync_single_for_cpu(target->srp_host->dev->dma_device, iu->dma, + target->max_ti_iu_len, DMA_FROM_DEVICE); + + opcode = *(u8 *) iu->buf; + + if (0) { + int i; + + printk(KERN_ERR PFX "recv completion, opcode 0x%02x\n", opcode); + + for (i = 0; i < wc->byte_len; ++i) { + if (i % 8 == 0) + printk(KERN_ERR " [%02x] ", i); + printk(" %02x", ((u8 *) iu->buf)[i]); + if ((i + 1) % 8 == 0) + printk("\n"); + } + + if (wc->byte_len % 8) + printk("\n"); + } + + switch (opcode) { + case SRP_RSP: + srp_process_rsp(target, iu->buf); + break; + + case SRP_T_LOGOUT: + /* XXX Handle target logout */ + printk(KERN_WARNING PFX "Got target logout request\n"); + break; + + default: + printk(KERN_WARNING PFX "Unhandled SRP opcode 0x%02x\n", opcode); + break; + } + + dma_sync_single_for_device(target->srp_host->dev->dma_device, iu->dma, + target->max_ti_iu_len, DMA_FROM_DEVICE); + + ++target->rx_tail; +} + +static void srp_completion(struct ib_cq *cq, void *target_ptr) +{ + struct srp_target_port *target = target_ptr; + struct ib_wc wc; + unsigned long flags; + + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + while (ib_poll_cq(cq, 1, &wc) > 0) { + if (wc.status) { + printk(KERN_ERR PFX "failed %s status %d\n", + wc.wr_id & SRP_OP_RECV ? 
"receive" : "send", + wc.status); + spin_lock_irqsave(target->scsi_host->host_lock, flags); + if (target->state == SRP_TARGET_LIVE) + schedule_work(&target->work); + spin_unlock_irqrestore(target->scsi_host->host_lock, flags); + break; + } + + if (wc.wr_id & SRP_OP_RECV) + srp_handle_recv(target, &wc); + else + ++target->tx_tail; + } +} + +static int __srp_post_recv(struct srp_target_port *target, + unsigned int __nocast gfp_mask) +{ + struct srp_iu *iu; + struct ib_sge list; + struct ib_recv_wr wr, *bad_wr; + unsigned int next; + int ret; + + next = target->rx_head & (SRP_RQ_SIZE - 1); + wr.wr_id = next | SRP_OP_RECV; + iu = target->rx_ring[next]; + + list.addr = iu->dma; + list.length = iu->size; + list.lkey = target->srp_host->mr->lkey; + + wr.next = NULL; + wr.sg_list = &list; + wr.num_sge = 1; + + ret = ib_post_recv(target->qp, &wr, &bad_wr); + if (!ret) + ++target->rx_head; + + return ret; +} + +static int srp_post_recv(struct srp_target_port *target, + unsigned int __nocast gfp_mask) +{ + unsigned long flags; + int ret; + + spin_lock_irqsave(target->scsi_host->host_lock, flags); + ret = __srp_post_recv(target, gfp_mask); + spin_unlock_irqrestore(target->scsi_host->host_lock, flags); + + return ret; +} + +/* + * Must be called with target->scsi_host->host_lock held to protect + * req_lim and tx_head. + */ +static struct srp_iu *__srp_get_tx_iu(struct srp_target_port *target) +{ + if (target->tx_head - target->tx_tail >= SRP_SQ_SIZE) + return NULL; + + return target->tx_ring[target->tx_head & SRP_SQ_SIZE]; +} + +/* + * Must be called with target->scsi_host->host_lock held to protect + * req_lim and tx_head. 
+ */ +static int __srp_post_send(struct srp_target_port *target, + struct srp_iu *iu, int len) +{ + struct ib_sge list; + struct ib_send_wr wr, *bad_wr; + int ret = 0; + + if (target->req_lim < 1) { + printk(KERN_ERR PFX "Target has req_lim %d\n", target->req_lim); + return -EAGAIN; + } + + list.addr = iu->dma; + list.length = len; + list.lkey = target->srp_host->mr->lkey; + + wr.next = NULL; + wr.wr_id = target->tx_head & SRP_SQ_SIZE; + wr.sg_list = &list; + wr.num_sge = 1; + wr.opcode = IB_WR_SEND; + wr.send_flags = IB_SEND_SIGNALED; + + ret = ib_post_send(target->qp, &wr, &bad_wr); + + if (!ret) { + ++target->tx_head; + --target->req_lim; + } + + return ret; +} + +static int srp_queuecommand(struct scsi_cmnd *scmnd, + void (*done)(struct scsi_cmnd *)) +{ + struct srp_target_port *target = host_to_target(scmnd->device->host); + struct srp_request *req; + struct srp_iu *iu; + struct srp_cmd *cmd; + long req_index; + int len; + + if (target->state == SRP_TARGET_CONNECTING) + goto err; + + if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_REMOVED) { + scmnd->result = DID_BAD_TARGET << 16; + done(scmnd); + return 0; + } + + iu = __srp_get_tx_iu(target); + if (!iu) + goto err; + + dma_sync_single_for_cpu(target->srp_host->dev->dma_device, iu->dma, + SRP_MAX_IU_LEN, DMA_TO_DEVICE); + + req_index = target->req_head; + + scmnd->scsi_done = done; + scmnd->result = 0; + scmnd->host_scribble = (void *) req_index; + + cmd = iu->buf; + memset(cmd, 0, sizeof *cmd); + + cmd->opcode = SRP_CMD; + cmd->lun = cpu_to_be64((u64) scmnd->device->lun << 48); + cmd->tag = req_index; + memcpy(cmd->cdb, scmnd->cmnd, scmnd->cmd_len); + + req = &target->req_ring[req_index]; + + req->scmnd = scmnd; + req->cmd = iu; + req->cmd_done = 0; + req->tsk_mgmt = NULL; + + len = srp_map_data(scmnd, target, iu); + if (len < 0) { + printk(KERN_ERR PFX "Failed to map data\n"); + goto err; + } + + if (__srp_post_recv(target, GFP_ATOMIC)) { + printk(KERN_ERR PFX "Recv failed\n"); + goto 
err_unmap; + } + + dma_sync_single_for_device(target->srp_host->dev->dma_device, iu->dma, + SRP_MAX_IU_LEN, DMA_TO_DEVICE); + + if (__srp_post_send(target, iu, len)) { + printk(KERN_ERR PFX "Send failed\n"); + goto err_unmap; + } + + target->req_head = req->next; + list_add_tail(&req->list, &target->req_queue); + + return 0; + +err_unmap: + srp_unmap_data(scmnd, target, cmd); + +err: + return SCSI_MLQUEUE_HOST_BUSY; +} + +static int srp_alloc_iu_bufs(struct srp_target_port *target) +{ + int i; + + for (i = 0; i < SRP_RQ_SIZE; ++i) { + target->rx_ring[i] = srp_alloc_iu(target->srp_host, + target->max_ti_iu_len, + GFP_KERNEL, DMA_FROM_DEVICE); + if (!target->rx_ring[i]) + goto err; + } + + for (i = 0; i < SRP_SQ_SIZE + 1; ++i) { + target->tx_ring[i] = srp_alloc_iu(target->srp_host, + SRP_MAX_IU_LEN, + GFP_KERNEL, DMA_TO_DEVICE); + if (!target->tx_ring[i]) + goto err; + } + + return 0; + +err: + for (i = 0; i < SRP_RQ_SIZE; ++i) { + srp_free_iu(target->srp_host, target->rx_ring[i]); + target->rx_ring[i] = NULL; + } + + for (i = 0; i < SRP_SQ_SIZE + 1; ++i) { + srp_free_iu(target->srp_host, target->tx_ring[i]); + target->tx_ring[i] = NULL; + } + + return -ENOMEM; +} + +static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) +{ + struct srp_target_port *target = cm_id->context; + struct ib_class_port_info *cpi; + struct ib_qp_attr *qp_attr = NULL; + int attr_mask = 0; + int comp = 0; + int ret = 0; + + switch (event->event) { + case IB_CM_REQ_ERROR: + printk(KERN_DEBUG PFX "Sending CM REQ failed\n"); + comp = 1; + target->status = -ECONNRESET; + break; + + case IB_CM_REP_RECEIVED: + comp = 1; + + { + struct srp_login_rsp *rsp = event->private_data; + + /* XXX check that opcode is SRP RSP */ + + target->max_ti_iu_len = be32_to_cpu(rsp->max_ti_iu_len); + target->req_lim = be32_to_cpu(rsp->req_lim_delta); + + target->scsi_host->can_queue = min(target->req_lim, + target->scsi_host->can_queue); + } + + target->status = srp_alloc_iu_bufs(target); + if 
(target->status) + break; + + qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL); + if (!qp_attr) { + target->status = -ENOMEM; + break; + } + + qp_attr->qp_state = IB_QPS_RTR; + target->status = ib_cm_init_qp_attr(cm_id, qp_attr, &attr_mask); + if (target->status) + break; + + qp_attr->rq_psn = 0; /* XXX */ + attr_mask |= IB_QP_RQ_PSN; + + target->status = ib_modify_qp(target->qp, qp_attr, attr_mask); + if (target->status) + break; + + target->status = srp_post_recv(target, GFP_KERNEL); + if (target->status) + break; + + qp_attr->qp_state = IB_QPS_RTS; + target->status = ib_cm_init_qp_attr(cm_id, qp_attr, &attr_mask); + if (target->status) + break; + + target->status = ib_modify_qp(target->qp, qp_attr, attr_mask); + if (target->status) + break; + + target->status = ib_send_cm_rtu(cm_id, NULL, 0); + if (target->status) + break; + + break; + + case IB_CM_REJ_RECEIVED: + printk(KERN_DEBUG PFX "REJ received\n"); + comp = 1; + + if (event->param.rej_rcvd.reason == IB_CM_REJ_PORT_CM_REDIRECT) { + cpi = event->param.rej_rcvd.ari; + target->path.dlid = cpi->redirect_lid; + target->path.pkey = cpi->redirect_pkey; + cm_id->remote_cm_qpn = be32_to_cpu(cpi->redirect_qp) & 0x00ffffff; + memcpy(target->path.dgid.raw, cpi->redirect_gid, 16); + + target->status = target->path.dlid ? + SRP_DLID_REDIRECT : SRP_PORT_REDIRECT; + } else if (topspin_workarounds && + !memcmp(&target->ioc_guid, topspin_oui, 3) && + event->param.rej_rcvd.reason == IB_CM_REJ_PORT_REDIRECT) { + /* + * Topspin/Cisco SRP gateways incorrectly send + * reject reason code 25 when they mean 24 + * (port redirect). 
+ */ + memcpy(target->path.dgid.raw, + event->param.rej_rcvd.ari, 16); + + printk(KERN_DEBUG PFX "Topspin/Cisco redirect to target port GID %016llx%016llx\n", + (unsigned long long) be64_to_cpu(target->path.dgid.global.subnet_prefix), + (unsigned long long) be64_to_cpu(target->path.dgid.global.interface_id)); + + target->status = SRP_PORT_REDIRECT; + } else { + printk(KERN_WARNING " REJ reason 0x%x\n", + event->param.rej_rcvd.reason); + target->status = -ECONNRESET; + ret = 1; + } + + break; + + case IB_CM_MRA_RECEIVED: + printk(KERN_ERR PFX "MRA received\n"); + break; + + case IB_CM_DREP_RECEIVED: + break; + + case IB_CM_TIMEWAIT_EXIT: + printk(KERN_ERR PFX "connection closed\n"); + + comp = 1; + ret = 1; + target->status = 0; + break; + + default: + printk(KERN_WARNING PFX "Unhandled CM event %d\n", event->event); + break; + } + + if (comp) + complete(&target->done); + + kfree(qp_attr); + + return ret; +} + +static int srp_send_tsk_mgmt(struct scsi_cmnd *scmnd, u8 func) +{ + struct srp_target_port *target = host_to_target(scmnd->device->host); + struct srp_request *req; + struct srp_iu *iu; + struct srp_tsk_mgmt *tsk_mgmt; + int req_index; + int ret = FAILED; + + spin_lock_irq(target->scsi_host->host_lock); + + if (scmnd->host_scribble == (void *) -1L) + goto out; + + req_index = (long) scmnd->host_scribble; + + req = &target->req_ring[req_index]; + init_completion(&req->done); + + iu = __srp_get_tx_iu(target); + if (!iu) + goto out; + + tsk_mgmt = iu->buf; + memset(tsk_mgmt, 0, sizeof *tsk_mgmt); + + tsk_mgmt->opcode = SRP_TSK_MGMT; + tsk_mgmt->lun = cpu_to_be64((u64) scmnd->device->lun << 48); + tsk_mgmt->tag = req_index | SRP_TAG_TSK_MGMT; + tsk_mgmt->tsk_mgmt_func = func; + tsk_mgmt->task_tag = req_index; + + if (__srp_post_send(target, iu, sizeof *tsk_mgmt)) + goto out; + + req->tsk_mgmt = iu; + + spin_unlock_irq(target->scsi_host->host_lock); + if (!wait_for_completion_timeout(&req->done, + msecs_to_jiffies(SRP_ABORT_TIMEOUT_MS))) + return FAILED; +
spin_lock_irq(target->scsi_host->host_lock); + + if (req->cmd_done) { + list_del(&req->list); + req->next = target->req_head; + target->req_head = req_index; + + scmnd->scsi_done(scmnd); + } else if (!req->tsk_status) { + scmnd->result = DID_ABORT << 16; + ret = SUCCESS; + } + +out: + spin_unlock_irq(target->scsi_host->host_lock); + return ret; +} + +static int srp_abort(struct scsi_cmnd *scmnd) +{ + printk(KERN_ERR "SRP abort called\n"); + + return srp_send_tsk_mgmt(scmnd, SRP_TSK_ABORT_TASK); +} + +static int srp_reset_device(struct scsi_cmnd *scmnd) +{ + printk(KERN_ERR "SRP reset_device called\n"); + + return srp_send_tsk_mgmt(scmnd, SRP_TSK_LUN_RESET); +} + +static int srp_reset_host(struct scsi_cmnd *scmnd) +{ + struct srp_target_port *target = host_to_target(scmnd->device->host); + int ret = FAILED; + + printk(KERN_ERR PFX "SRP reset_host called\n"); + + if (!srp_reconnect_target(target)) + ret = SUCCESS; + + return ret; +} + +static struct scsi_host_template srp_template = { + .module = THIS_MODULE, + .name = DRV_NAME, + .info = srp_target_info, + .queuecommand = srp_queuecommand, + .eh_abort_handler = srp_abort, + .eh_device_reset_handler = srp_reset_device, + .eh_host_reset_handler = srp_reset_host, + .can_queue = SRP_SQ_SIZE, + .this_id = -1, + .sg_tablesize = SRP_MAX_INDIRECT, + .cmd_per_lun = SRP_SQ_SIZE, + .use_clustering = ENABLE_CLUSTERING +}; + +static int srp_add_target(struct srp_host *host, struct srp_target_port *target) +{ + sprintf(target->target_name, "SRP.T10:%016llX", + (unsigned long long) be64_to_cpu(target->id_ext)); + + if (scsi_add_host(target->scsi_host, host->dev->dma_device)) + return -ENODEV; + + down(&host->target_mutex); + list_add_tail(&target->list, &host->target_list); + up(&host->target_mutex); + + target->state = SRP_TARGET_LIVE; + + /* XXX: are we supposed to have a definition of SCAN_WILD_CARD ?? 
*/ + scsi_scan_target(&target->scsi_host->shost_gendev, + 0, target->scsi_id, ~0, 0); + + return 0; +} + +static void srp_release_class_dev(struct class_device *class_dev) +{ + struct srp_host *host = + container_of(class_dev, struct srp_host, class_dev); + + complete(&host->released); +} + +static struct class srp_class = { + .name = "infiniband_srp", + .release = srp_release_class_dev +}; + +/* + * Target ports are added by writing + * + * id_ext=,ioc_guid=,dgid=, + * pkey=,service_id= + * + * to the add_target sysfs attribute. + */ +enum { + SRP_OPT_ERR = 0, + SRP_OPT_ID_EXT = 1 << 0, + SRP_OPT_IOC_GUID = 1 << 1, + SRP_OPT_DGID = 1 << 2, + SRP_OPT_PKEY = 1 << 3, + SRP_OPT_SERVICE_ID = 1 << 4, + SRP_OPT_MAX_SECT = 1 << 5, + SRP_OPT_ALL = (SRP_OPT_ID_EXT | + SRP_OPT_IOC_GUID | + SRP_OPT_DGID | + SRP_OPT_PKEY | + SRP_OPT_SERVICE_ID), +}; + +static match_table_t srp_opt_tokens = { + { SRP_OPT_ID_EXT, "id_ext=%s" }, + { SRP_OPT_IOC_GUID, "ioc_guid=%s" }, + { SRP_OPT_DGID, "dgid=%s" }, + { SRP_OPT_PKEY, "pkey=%x" }, + { SRP_OPT_SERVICE_ID, "service_id=%s" }, + { SRP_OPT_MAX_SECT, "max_sect=%d" }, + { SRP_OPT_ERR, NULL } +}; + +static int srp_parse_options(const char *buf, struct srp_target_port *target) +{ + char *options; + char *p; + char dgid[3]; + substring_t args[MAX_OPT_ARGS]; + int opt_mask = 0; + int token; + int ret = -EINVAL; + int i; + + options = kstrdup(buf, GFP_KERNEL); + if (!options) + return -ENOMEM; + + while ((p = strsep(&options, ",")) != NULL) { + if (!*p) + continue; + + token = match_token(p, srp_opt_tokens, args); + opt_mask |= token; + + switch (token) { + case SRP_OPT_ID_EXT: + p = match_strdup(args); + target->id_ext = cpu_to_be64(simple_strtoull(p, NULL, 16)); + kfree(p); + break; + + case SRP_OPT_IOC_GUID: + p = match_strdup(args); + target->ioc_guid = cpu_to_be64(simple_strtoull(p, NULL, 16)); + kfree(p); + break; + + case SRP_OPT_DGID: + p = match_strdup(args); + if (strlen(p) != 32) + goto out; + + for (i = 0; i < 16; ++i) { + 
+ strlcpy(dgid, p + i * 2, 3); + target->path.dgid.raw[i] = simple_strtoul(dgid, NULL, 16); + } + break; + + case SRP_OPT_PKEY: + if (match_hex(args, &token)) + goto out; + target->path.pkey = cpu_to_be16(token); + break; + + case SRP_OPT_SERVICE_ID: + p = match_strdup(args); + target->service_id = cpu_to_be64(simple_strtoull(p, NULL, 16)); + kfree(p); + break; + + case SRP_OPT_MAX_SECT: + if (match_int(args, &token)) + goto out; + target->scsi_host->max_sectors = token; + break; + + default: + goto out; + } + } + + if ((opt_mask & SRP_OPT_ALL) == SRP_OPT_ALL) + ret = 0; + +out: + kfree(options); + return ret; +} + +static ssize_t srp_create_target(struct class_device *class_dev, + const char *buf, size_t count) +{ + struct srp_host *host = + container_of(class_dev, struct srp_host, class_dev); + struct Scsi_Host *target_host; + struct srp_target_port *target; + int ret; + int i; + + target_host = scsi_host_alloc(&srp_template, + sizeof (struct srp_target_port)); + if (!target_host) + return -ENOMEM; + + target = host_to_target(target_host); + memset(target, 0, sizeof *target); + + target->scsi_host = target_host; + target->srp_host = host; + + INIT_WORK(&target->work, srp_reconnect_work, target); + + for (i = 0; i < SRP_SQ_SIZE - 1; ++i) + target->req_ring[i].next = i + 1; + target->req_ring[SRP_SQ_SIZE - 1].next = -1; + INIT_LIST_HEAD(&target->req_queue); + + ret = srp_parse_options(buf, target); + if (ret) + goto err; + + target->cm_id = ib_create_cm_id(srp_cm_handler, target); + if (IS_ERR(target->cm_id)) { + ret = PTR_ERR(target->cm_id); + goto err; + } + + ib_get_cached_gid(host->dev, host->port, 0, &target->path.sgid); + + printk(KERN_DEBUG PFX "new target: id_ext %016llx ioc_guid %016llx pkey %04x " + "service_id %016llx dgid %04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", + (unsigned long long) be64_to_cpu(target->id_ext), + (unsigned long long) be64_to_cpu(target->ioc_guid), + be16_to_cpu(target->path.pkey), + (unsigned long long) be64_to_cpu(target->service_id), + (int)
be16_to_cpu(*(__be16 *) &target->path.dgid.raw[0]), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[2]), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[4]), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[6]), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[8]), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[10]), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[12]), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[14])); + + ret = srp_lookup_path(target); + if (ret) { + ib_destroy_cm_id(target->cm_id); + goto err; + } + + ret = srp_connect_target(target); + if (ret) { + printk(KERN_ERR PFX "Connection failed\n"); + goto err; + } + + ret = srp_add_target(host, target); + if (ret) + goto err_disconnect; + + return count; + +err_disconnect: + init_completion(&target->done); + ib_send_cm_dreq(target->cm_id, NULL, 0); + wait_for_completion(&target->done); + + ib_destroy_qp(target->qp); + ib_destroy_cq(target->cq); + +err: + for (i = 0; i < SRP_RQ_SIZE; ++i) + srp_free_iu(target->srp_host, target->rx_ring[i]); + for (i = 0; i < SRP_SQ_SIZE + 1; ++i) + srp_free_iu(target->srp_host, target->tx_ring[i]); + + scsi_host_put(target_host); + + return ret; +} + +static CLASS_DEVICE_ATTR(add_target, S_IWUSR, NULL, srp_create_target); + +static struct srp_host *srp_add_port(struct ib_device *device, + __be64 node_guid, u8 port) +{ + struct srp_host *host; + + host = kzalloc(sizeof *host, GFP_KERNEL); + if (!host) + return NULL; + + INIT_LIST_HEAD(&host->target_list); + init_MUTEX(&host->target_mutex); + init_completion(&host->released); + host->dev = device; + host->port = port; + + host->initiator_port_id[7] = port; + memcpy(host->initiator_port_id + 8, &node_guid, 8); + + host->pd = ib_alloc_pd(device); + if (IS_ERR(host->pd)) + goto err_free; + + host->mr = ib_get_dma_mr(host->pd, + IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_READ | + IB_ACCESS_REMOTE_WRITE); + if (IS_ERR(host->mr)) + goto err_pd; + + host->class_dev.class = 
&srp_class; + host->class_dev.dev = device->dma_device; + snprintf(host->class_dev.class_id, BUS_ID_SIZE, "srp-%s-%d", + device->name, port); + + if (class_device_register(&host->class_dev)) + goto err_mr; + if (class_device_create_file(&host->class_dev, &class_device_attr_add_target)) + goto err_class; + /* XXX ibdev / port files as well */ + + return host; + +err_class: + class_device_unregister(&host->class_dev); + +err_mr: + ib_dereg_mr(host->mr); + +err_pd: + ib_dealloc_pd(host->pd); + +err_free: + kfree(host); + + return NULL; +} + +static void srp_add_one(struct ib_device *device) +{ + struct list_head *dev_list; + struct srp_host *host; + struct ib_device_attr *dev_attr; + int s, e, p; + + dev_attr = kmalloc(sizeof *dev_attr, GFP_KERNEL); + if (!dev_attr) + return; + + if (ib_query_device(device, dev_attr)) { + printk(KERN_WARNING PFX "Couldn't query node GUID for %s.\n", + device->name); + goto out; + } + + dev_list = kmalloc(sizeof *dev_list, GFP_KERNEL); + if (!dev_list) + goto out; + + INIT_LIST_HEAD(dev_list); + + if (device->node_type == IB_NODE_SWITCH) { + s = 0; + e = 0; + } else { + s = 1; + e = device->phys_port_cnt; + } + + for (p = s; p <= e; ++p) { + host = srp_add_port(device, dev_attr->node_guid, p); + if (host) + list_add_tail(&host->list, dev_list); + } + + ib_set_client_data(device, &srp_client, dev_list); + +out: + kfree(dev_attr); +} + +static void srp_remove_one(struct ib_device *device) +{ + struct list_head *dev_list; + struct srp_host *host, *tmp_host; + LIST_HEAD(target_list); + struct srp_target_port *target, *tmp_target; + unsigned long flags; + + dev_list = ib_get_client_data(device, &srp_client); + + list_for_each_entry_safe(host, tmp_host, dev_list, list) { + class_device_unregister(&host->class_dev); + /* + * Wait for the sysfs entry to go away, so that no new + * target ports can be created. 
+ */ + wait_for_completion(&host->released); + + /* + * Mark all target ports as removed, so we stop queueing + * commands and don't try to reconnect. + */ + down(&host->target_mutex); + list_for_each_entry_safe(target, tmp_target, + &host->target_list, list) { + spin_lock_irqsave(target->scsi_host->host_lock, flags); + if (target->state != SRP_TARGET_REMOVED) + target->state = SRP_TARGET_REMOVED; + spin_unlock_irqrestore(target->scsi_host->host_lock, flags); + } + up(&host->target_mutex); + + /* + * Wait for any reconnection tasks that may have + * started before we marked our target ports as + * removed, and any target port removal tasks. + */ + flush_scheduled_work(); + + list_for_each_entry_safe(target, tmp_target, + &host->target_list, list) { + scsi_remove_host(target->scsi_host); + srp_disconnect_target(target); + srp_free_target_ib(target); + scsi_host_put(target->scsi_host); + } + + ib_dereg_mr(host->mr); + ib_dealloc_pd(host->pd); + kfree(host); + } + + kfree(dev_list); +} + +static int __init srp_init_module(void) +{ + int ret; + + ret = class_register(&srp_class); + if (ret) { + printk(KERN_ERR PFX "couldn't register class infiniband_srp\n"); + return ret; + } + + ret = ib_register_client(&srp_client); + if (ret) { + printk(KERN_ERR PFX "couldn't register IB client\n"); + class_unregister(&srp_class); + return ret; + } + + return 0; +} + +static void __exit srp_cleanup_module(void) +{ + ib_unregister_client(&srp_client); + class_unregister(&srp_class); +} + +module_init(srp_init_module); +module_exit(srp_cleanup_module); diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h new file mode 100644 --- /dev/null +++ b/drivers/infiniband/ulp/srp/ib_srp.h @@ -0,0 +1,324 @@ +/* + * Copyright (c) 2005 Cisco Systems. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. 
You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. 
+ * + * $Id: ib_srp.h 3394 2005-09-13 05:04:31Z roland $ + */ + +#ifndef IB_SRP_H +#define IB_SRP_H + +#include +#include + +#include + +#include +#include + +#include +#include +#include + +enum { + SRP_PATH_REC_TIMEOUT_MS = 1000, + SRP_ABORT_TIMEOUT_MS = 5000, + + SRP_PORT_REDIRECT = 1, + SRP_DLID_REDIRECT = 2, + + SRP_MAX_IU_LEN = 256, + + SRP_RQ_SHIFT = 6, + SRP_RQ_SIZE = 1 << SRP_RQ_SHIFT, + SRP_SQ_SIZE = SRP_RQ_SIZE - 1, + SRP_CQ_SIZE = SRP_SQ_SIZE + SRP_RQ_SIZE, + + SRP_TAG_TSK_MGMT = 1 << (SRP_RQ_SHIFT + 1) +}; + +#define SRP_OP_RECV (1 << 31) +#define SRP_MAX_INDIRECT ((SRP_MAX_IU_LEN - \ + sizeof (struct srp_cmd) - \ + sizeof (struct srp_indirect_buf)) / 16) + +enum srp_target_state { + SRP_TARGET_LIVE, + SRP_TARGET_CONNECTING, + SRP_TARGET_DEAD, + SRP_TARGET_REMOVED +}; + +struct srp_host { + u8 initiator_port_id[16]; + struct ib_device *dev; + u8 port; + struct ib_pd *pd; + struct ib_mr *mr; + struct class_device class_dev; + struct list_head target_list; + struct semaphore target_mutex; + struct completion released; + struct list_head list; +}; + +struct srp_request { + struct list_head list; + struct scsi_cmnd *scmnd; + struct srp_iu *cmd; + struct srp_iu *tsk_mgmt; + struct completion done; + short next; + u8 cmd_done; + u8 tsk_status; +}; + +struct srp_target_port { + __be64 id_ext; + __be64 ioc_guid; + __be64 service_id; + struct srp_host *srp_host; + struct Scsi_Host *scsi_host; + char target_name[32]; + unsigned int scsi_id; + + struct ib_sa_path_rec path; + struct ib_sa_query *path_query; + int path_query_id; + + struct ib_cm_id *cm_id; + struct ib_cq *cq; + struct ib_qp *qp; + + int max_ti_iu_len; + s32 req_lim; + + unsigned rx_head; + unsigned rx_tail; + struct srp_iu *rx_ring[SRP_RQ_SIZE]; + + unsigned tx_head; + unsigned tx_tail; + struct srp_iu *tx_ring[SRP_SQ_SIZE + 1]; + + int req_head; + struct list_head req_queue; + struct srp_request req_ring[SRP_SQ_SIZE]; + + struct work_struct work; + + struct list_head list; + struct completion 
done; + int status; + enum srp_target_state state; +}; + +struct srp_iu { + dma_addr_t dma; + void *buf; + size_t size; + enum dma_data_direction direction; +}; + +/* + * SRP protocol definitions + */ + +enum { + SRP_LOGIN_REQ = 0x00, + SRP_TSK_MGMT = 0x01, + SRP_CMD = 0x02, + SRP_I_LOGOUT = 0x03, + SRP_LOGIN_RSP = 0xc0, + SRP_RSP = 0xc1, + SRP_LOGIN_REJ = 0xc2, + SRP_T_LOGOUT = 0x80, + SRP_CRED_REQ = 0x81, + SRP_AER_REQ = 0x82, + SRP_CRED_RSP = 0x41, + SRP_AER_RSP = 0x42 +}; + +enum { + SRP_BUF_FORMAT_DIRECT = 1 << 1, + SRP_BUF_FORMAT_INDIRECT = 1 << 2 +}; + +enum { + SRP_NO_DATA_DESC = 0, + SRP_DATA_DESC_DIRECT = 1, + SRP_DATA_DESC_INDIRECT = 2 +}; + +enum { + SRP_TSK_ABORT_TASK = 0x01, + SRP_TSK_ABORT_TASK_SET = 0x02, + SRP_TSK_CLEAR_TASK_SET = 0x04, + SRP_TSK_LUN_RESET = 0x08, + SRP_TSK_CLEAR_ACA = 0x40 +}; + +struct srp_direct_buf { + __be64 va; + __be32 key; + __be32 len; +}; + +/* + * We need the packed attribute because the SRP spec puts the list of + * descriptors at an offset of 20, which is not aligned to the size + * of struct srp_direct_buf. 
+ */ +struct srp_indirect_buf { + struct srp_direct_buf table_desc; + __be32 len; + struct srp_direct_buf desc_list[0] __attribute__((packed)); +}; + +enum { + SRP_MULTICHAN_SINGLE = 0, + SRP_MULTICHAN_MULTI = 1 +}; + +struct srp_login_req { + u8 opcode; + u8 reserved1[7]; + u64 tag; + __be32 req_it_iu_len; + u8 reserved2[4]; + __be16 req_buf_fmt; + u8 req_flags; + u8 reserved3[5]; + u8 initiator_port_id[16]; + u8 target_port_id[16]; +}; + +struct srp_login_rsp { + u8 opcode; + u8 reserved1[3]; + __be32 req_lim_delta; + u64 tag; + __be32 max_it_iu_len; + __be32 max_ti_iu_len; + __be16 buf_fmt; + u8 rsp_flags; + u8 reserved2[25]; +}; + +struct srp_login_rej { + u8 opcode; + u8 reserved1[3]; + __be32 reason; + u64 tag; + u8 reserved2[8]; + __be16 buf_fmt; + u8 reserved3[6]; +}; + +struct srp_i_logout { + u8 opcode; + u8 reserved[7]; + u64 tag; +}; + +struct srp_t_logout { + u8 opcode; + u8 sol_not; + u8 reserved[2]; + __be32 reason; + u64 tag; +}; + +/* + * We need the packed attribute because the SRP spec only aligns the + * 8-byte LUN field to 4 bytes. + */ +struct srp_tsk_mgmt { + u8 opcode; + u8 sol_not; + u8 reserved1[6]; + u64 tag; + u8 reserved2[4]; + __be64 lun __attribute__((packed)); + u8 reserved3[2]; + u8 tsk_mgmt_func; + u8 reserved4; + u64 task_tag; + u8 reserved5[8]; +}; + +/* + * We need the packed attribute because the SRP spec only aligns the + * 8-byte LUN field to 4 bytes. 
+ */ +struct srp_cmd { + u8 opcode; + u8 sol_not; + u8 reserved1[3]; + u8 buf_fmt; + u8 data_out_desc_cnt; + u8 data_in_desc_cnt; + u64 tag; + u8 reserved2[4]; + __be64 lun __attribute__((packed)); + u8 reserved3; + u8 task_attr; + u8 reserved4; + u8 add_cdb_len; + u8 cdb[16]; + u8 add_data[0]; +}; + +enum { + SRP_RSP_FLAG_RSPVALID = 1 << 0, + SRP_RSP_FLAG_SNSVALID = 1 << 1, + SRP_RSP_FLAG_DOOVER = 1 << 2, + SRP_RSP_FLAG_DOUNDER = 1 << 3, + SRP_RSP_FLAG_DIOVER = 1 << 4, + SRP_RSP_FLAG_DIUNDER = 1 << 5 +}; + +struct srp_rsp { + u8 opcode; + u8 sol_not; + u8 reserved1[2]; + __be32 req_lim_delta; + u64 tag; + u8 reserved2[2]; + u8 flags; + u8 status; + __be32 data_out_res_cnt; + __be32 data_in_res_cnt; + __be32 sense_data_len; + __be32 resp_data_len; + u8 data[0]; +}; + +#endif /* IB_SRP_H */ From mlleini at ca.sandia.gov Tue Sep 13 11:07:46 2005 From: mlleini at ca.sandia.gov (Matt L. Leininger) Date: Tue, 13 Sep 2005 11:07:46 -0700 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <004801c5b88a$0602c690$9e5aa8c0@infiniconsys.com> References: <004801c5b88a$0602c690$9e5aa8c0@infiniconsys.com> Message-ID: <1126634866.19052.92.camel@localhost> On Tue, 2005-09-13 at 10:38 -0700, Fab Tillier wrote: > > From: Ronald G Minnich [mailto:rminnich at lanl.gov] > > Sent: Tuesday, September 13, 2005 10:11 AM > > > > Roland Dreier wrote: > > > > > Actually I think the issue was somewhat different. Microsoft is so > > > allergic to the GPL that they asked for the code to be in a physically > > > separate repository. > > > > > > > that makes much more sense, ah, well, not really, but it is easier to > > understand. I doubt the Labs would have any objection to Windows code. > > Sandia did object, so we found an alternate host for the Windows SVN repository. > It was more than just Sandia. But with the big difference between the Linux and Windows stacks, having separate repositories made sense at the time.
- Matt From rolandd at cisco.com Tue Sep 13 11:16:27 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 11:16:27 -0700 Subject: [openib-general] Re: [PATCH] libmthca: fix wqe post In-Reply-To: <4df28be4050913101339456607@mail.gmail.com> (Viswanath Krishnamurthy's message of "Tue, 13 Sep 2005 10:13:31 -0700") References: <52u0gp95d9.fsf@cisco.com> <20050913153155.GK14121@mellanox.co.il> <4df28be4050913101339456607@mail.gmail.com> Message-ID: <52hdco7s04.fsf@cisco.com> Viswanath> Once you generate a kernel patch, I can test out both Viswanath> user and kernel mthca since I have the tests ready.. Excellent. I merged MST's patch, and applied the patch below to the kernel. (So you can either update from svn or apply the patches) Thanks for testing -- let me know if you still see problems. Index: infiniband/hw/mthca/mthca_srq.c =================================================================== --- infiniband/hw/mthca/mthca_srq.c (revision 3404) +++ infiniband/hw/mthca/mthca_srq.c (working copy) @@ -189,7 +189,6 @@ int mthca_alloc_srq(struct mthca_dev *de srq->max = attr->max_wr; srq->max_gs = attr->max_sge; - srq->last = NULL; srq->counter = 0; if (mthca_is_memfree(dev)) @@ -264,6 +263,7 @@ int mthca_alloc_srq(struct mthca_dev *de srq->first_free = 0; srq->last_free = srq->max - 1; + srq->last = get_wqe(srq, srq->max - 1); return 0; @@ -446,13 +446,11 @@ int mthca_tavor_post_srq_recv(struct ib_ ((struct mthca_data_seg *) wqe)->addr = 0; } - if (likely(prev_wqe)) { - ((struct mthca_next_seg *) prev_wqe)->nda_op = - cpu_to_be32((ind << srq->wqe_shift) | 1); - wmb(); - ((struct mthca_next_seg *) prev_wqe)->ee_nds = - cpu_to_be32(MTHCA_NEXT_DBD); - } + ((struct mthca_next_seg *) prev_wqe)->nda_op = + cpu_to_be32((ind << srq->wqe_shift) | 1); + wmb(); + ((struct mthca_next_seg *) prev_wqe)->ee_nds = + cpu_to_be32(MTHCA_NEXT_DBD); srq->wrid[ind] = wr->wr_id; srq->first_free = next_ind; Index: infiniband/hw/mthca/mthca_qp.c 
=================================================================== --- infiniband/hw/mthca/mthca_qp.c (revision 3404) +++ infiniband/hw/mthca/mthca_qp.c (working copy) @@ -227,7 +227,6 @@ static void mthca_wq_init(struct mthca_w wq->last_comp = wq->max - 1; wq->head = 0; wq->tail = 0; - wq->last = NULL; } void mthca_qp_event(struct mthca_dev *dev, u32 qpn, @@ -1103,6 +1102,9 @@ static int mthca_alloc_qp_common(struct } } + qp->sq.last = get_send_wqe(qp, qp->sq.max - 1); + qp->rq.last = get_recv_wqe(qp, qp->rq.max - 1); + return 0; } @@ -1583,15 +1585,13 @@ int mthca_tavor_post_send(struct ib_qp * goto out; } - if (prev_wqe) { - ((struct mthca_next_seg *) prev_wqe)->nda_op = - cpu_to_be32(((ind << qp->sq.wqe_shift) + - qp->send_wqe_offset) | - mthca_opcode[wr->opcode]); - wmb(); - ((struct mthca_next_seg *) prev_wqe)->ee_nds = - cpu_to_be32((size0 ? 0 : MTHCA_NEXT_DBD) | size); - } + ((struct mthca_next_seg *) prev_wqe)->nda_op = + cpu_to_be32(((ind << qp->sq.wqe_shift) + + qp->send_wqe_offset) | + mthca_opcode[wr->opcode]); + wmb(); + ((struct mthca_next_seg *) prev_wqe)->ee_nds = + cpu_to_be32((size0 ? 
0 : MTHCA_NEXT_DBD) | size); if (!size0) { size0 = size; @@ -1688,13 +1688,11 @@ int mthca_tavor_post_receive(struct ib_q qp->wrid[ind] = wr->wr_id; - if (likely(prev_wqe)) { - ((struct mthca_next_seg *) prev_wqe)->nda_op = - cpu_to_be32((ind << qp->rq.wqe_shift) | 1); - wmb(); - ((struct mthca_next_seg *) prev_wqe)->ee_nds = - cpu_to_be32(MTHCA_NEXT_DBD | size); - } + ((struct mthca_next_seg *) prev_wqe)->nda_op = + cpu_to_be32((ind << qp->rq.wqe_shift) | 1); + wmb(); + ((struct mthca_next_seg *) prev_wqe)->ee_nds = + cpu_to_be32(MTHCA_NEXT_DBD | size); if (!size0) size0 = size; @@ -1905,15 +1903,13 @@ int mthca_arbel_post_send(struct ib_qp * goto out; } - if (likely(prev_wqe)) { - ((struct mthca_next_seg *) prev_wqe)->nda_op = - cpu_to_be32(((ind << qp->sq.wqe_shift) + - qp->send_wqe_offset) | - mthca_opcode[wr->opcode]); - wmb(); - ((struct mthca_next_seg *) prev_wqe)->ee_nds = - cpu_to_be32(MTHCA_NEXT_DBD | size); - } + ((struct mthca_next_seg *) prev_wqe)->nda_op = + cpu_to_be32(((ind << qp->sq.wqe_shift) + + qp->send_wqe_offset) | + mthca_opcode[wr->opcode]); + wmb(); + ((struct mthca_next_seg *) prev_wqe)->ee_nds = + cpu_to_be32(MTHCA_NEXT_DBD | size); if (!size0) { size0 = size; From liran at mellanox.co.il Tue Sep 13 11:30:25 2005 From: liran at mellanox.co.il (Liran Sorani) Date: Tue, 13 Sep 2005 21:30:25 +0300 Subject: [openib-general] RE: osmtest osmt_multicast.c physible Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA55B@mtlexch01.mtl.com> -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Tuesday, September 13, 2005 1:39 PM To: Liran Sorani Cc: Yael Kalka; 'openib-general at openib.org' Subject: RE: osmtest osmt_multicast.c physible > On Tue, 2005-09-13 at 03:30, Liran Sorani wrote: > > The o15-0.2.2 is dealing with MC group creation its required > > components . > as was the old o15.0.1.4 from 1.1 which the comment in the code was > referring to. Shouldn't it be updated to 1.2 ? 
Yes, it is updated, but only in our local repository. The Mellanox test package will be uploaded to openib in the near future. > > The section you pointed at in osmtest does MC creation with various > > MTU & RATE values. > Right. > > A physible MTU / RATE means a valid subnet MTU/RATE value for the port > > osmtest is running on. These values are returned by OpenSM as the > > correct values this port can use. > I understand now: Feasible rather than physible. Sorry for the misleading name. > -- Hal > > -----Original Message----- > > From: Yael Kalka > > Sent: Tuesday, September 13, 2005 8:21 AM > > To: 'Hal Rosenstock'; Yael Kalka; Liran Sorani > > Cc: openib-general at openib.org > > Subject: RE: osmtest osmt_multicast.c physible > > > > > > Liran, > > As owner of the osmtest - please answer the below. > > Yael > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Monday, September 12, 2005 3:36 PM > > To: Yael Kalka > > Cc: openib-general at openib.org > > Subject: osmtest osmt_multicast.c physible > > > > > > Hi Yael, > > > > What is meant by physible in the below? > > > > osmtest/osmt_multicast.c: "Fifth exact MTU & RATE physible, > > Sixth exact RATE physible\n\t\t" > > osmtest/osmt_multicast.c: "Seventh exact MTU physible > > (o15.0.1.4)...\n" > > osmtest/osmt_multicast.c: /* Using Exact physible MTU & RATE */ > > osmtest/osmt_multicast.c: /* Using Exact physible RATE */ > > osmtest/osmt_multicast.c: /* Using Exact physible MTU */ > > > > Also, o15.0.1.4 is obsolete at 1.2 and is replaced by o15-0.2.2. > > > > Thanks. > > > > -- Hal > > From mlleini at ca.sandia.gov Tue Sep 13 11:30:07 2005 From: mlleini at ca.sandia.gov (Matt L.
Leininger) Date: Tue, 13 Sep 2005 11:30:07 -0700 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <004d01c5b88e$cf6a1570$9e5aa8c0@infiniconsys.com> References: <004d01c5b88e$cf6a1570$9e5aa8c0@infiniconsys.com> Message-ID: <1126636207.19052.99.camel@localhost> On Tue, 2005-09-13 at 11:13 -0700, Fab Tillier wrote: > > From: Ryan, Jim [mailto:jim.ryan at intel.com] > > Sent: Tuesday, September 13, 2005 10:07 AM > > > > My recollection is Matt Leininger, could be wrong > > I believe that Matt was just the messenger, conveying his organization's > position on the matter. Whether or not we agree with that position is > immaterial - it is Sandia's prerogative. My position was that we should use the same code from the Linux stack for the Windows stack. MS and some openib members didn't seem to like that, even though the code was dual-licensed GPL/BSD. Speculation, fear, and uncertainty about MS also led to the idea of having a separate code repository, to show MS that there was a clear separation between the openib Linux and Windows code. I always preferred to do something reasonable. Don't blame me for the strange MS requirements. - Matt > > There was a need to have a separate repository independent of hosting issues. > > - Fab From xma at us.ibm.com Tue Sep 13 11:53:58 2005 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 13 Sep 2005 11:53:58 -0700 Subject: [openib-general] Mellanox device in INIT state Message-ID: After loading the ib_mthca module on PPC, the device state is in INIT not ACTIVE state. Any clue?
0000:d9:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost HCA (rev a1) Subsystem: Mellanox Technologies MT23108 InfiniHost HCA Flags: bus master, 66Mhz, medium devsel, latency 144, IRQ 137 Memory at c0800000 (64-bit, non-prefetchable) [size=1M] Memory at c0000000 (64-bit, prefetchable) [size=8M] Capabilities: [40] #11 [001f] Capabilities: [50] Vital Product Data Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable- Capabilities: [70] PCI-X non-bridge device. Below are from /var/log/messages: Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: FW version 000300030003, max commands 64 Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: FW size 6143 KB (start c7a00000, end c7ffffff) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: HCA memory size 131071 KB (start c0000000, end c7ffffff) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: Max QPs: 16777216, reserved QPs: 1024, entry size: 256 Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: Max SRQs: 1024, reserved SRQs: 16, entry size: 32 Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 64 Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: Max EQs: 64, reserved EQs: 1, entry size: 64 Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: reserved MPTs: 16, reserved MTTs: 16 Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: Max PDs: 16777216, reserved PDs: 0, reserved UARs: 1 Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: Max QP/MCG: 16777216, reserved MGMs: 0 Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: Flags: 00370347 Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: profile[ 0]--10/20 @ 0x c0000000 (size 0x 4000000) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: profile[ 1]-- 0/16 @ 0x c4000000 (size 0x 1000000) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: profile[ 2]-- 7/18 @ 0x c5000000 (size 0x 800000) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: 
profile[ 3]-- 9/17 @ 0x c5800000 (size 0x 800000) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: profile[ 4]-- 3/16 @ 0x c6000000 (size 0x 400000) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: profile[ 5]-- 4/16 @ 0x c6400000 (size 0x 200000) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: profile[ 6]--12/15 @ 0x c6600000 (size 0x 100000) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: profile[ 7]-- 8/13 @ 0x c6700000 (size 0x 80000) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: profile[ 8]--11/11 @ 0x c6780000 (size 0x 10000) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: profile[ 9]-- 2/10 @ 0x c6790000 (size 0x 8000) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: profile[10]-- 6/ 5 @ 0x c6798000 (size 0x 800) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: HCA memory: allocated 106082 KB/124928 KB (18846 KB free) Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: Allocated EQ 1 with 65536 entries Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: Allocated EQ 2 with 128 entries Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: Allocated EQ 3 with 128 entries Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: Setting mask 00000000000f43fe for eqn 2 Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: Setting mask 0000000000000400 for eqn 3 Sep 13 10:49:12 elm3b39 kernel: ib_mthca 0000:d9:00.0: NOP command IRQ test passed Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638
From halr at voltaire.com Tue Sep 13 11:56:03 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 14:56:03 -0400 Subject: [openib-general] Mellanox device in INIT state In-Reply-To: References: Message-ID: <1126637643.4514.879.camel@hal.voltaire.com> On Tue, 2005-09-13 at 14:53, Shirley Ma wrote: > After loading the ib_mthca module on PPC, the device state is in INIT > not ACTIVE state. Any clue? You need to run an SM on the subnet to bring it from INIT to ACTIVE. -- Hal From boas1 at llnl.gov Tue Sep 13 11:36:44 2005 From: boas1 at llnl.gov (Bill Boas) Date: Tue, 13 Sep 2005 11:36:44 -0700 Subject: [openib-general] Location of OpenIB Windows repository In-Reply-To: References: Message-ID: <6.2.1.2.2.20050913112903.036896c0@mail-lc.llnl.gov> All, To set the record straight, the discussion about the location of the repository centered on placing it so that no perception of a "mix" of IP between the Linux and Windows stacks could arise from the repositories being on the same server or at the same site. Therefore, as the Linux repository is at Sandia and neither Livermore nor Los Alamos offered to host the Windows repository, an alternative was sought and Cornell was selected. One can challenge the validity of this thinking, but that's what happened, to my recollection. Bill. At 10:07 AM 9/13/2005, Ryan, Jim wrote: >My recollection is Matt Leininger, could be wrong > >Jim > >-----Original Message----- >From: openib-general-bounces at openib.org >[mailto:openib-general-bounces at openib.org] On Behalf Of Ronald G Minnich >Sent: Tuesday, September 13, 2005 10:05 AM >To: Christoph Hellwig >Cc: openib-general at openib.org >Subject: Re: [openib-general] Re: Opensm - casting issues #2 > > > > On Tue, Sep 13, 2005 at 09:26:31AM -0700, Sean Hefty wrote: > > >>My understanding is that the labs, who control the OpenIB servers, >refused > >>to host any Windows related code, forcing it to have a separate >repository. > >wow, that's news to me!
Maybe I'm at the wrong lab! > >Anybody have a source for this "understanding", because off the top of >my head, it just doesn't sound right. > >ron Bill Boas bboas at llnl.gov ICCD LLNL, B-453, R-2018 Wk: 925-422-4110 7000 East Ave, L-555 Cell: 925-337-2224 Livermore, CA 94551 Pgr: 877-203-2248 From mshefty at ichips.intel.com Tue Sep 13 12:03:14 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 13 Sep 2005 12:03:14 -0700 Subject: [openib-general] [PATCH] [CM] 1/6 core kernel changes to bind cm_id's to a device In-Reply-To: <52hdcxpv6b.fsf@cisco.com> References: <52hdcxpv6b.fsf@cisco.com> Message-ID: <43272272.6030606@ichips.intel.com> Roland Dreier wrote: > Now that cm_id's are per-IB-device, does it make sense to have the > userspace CM create a character node for each IB device? It seems that > might simplify the interface. I've modified the patch to have the uCM create a character node for each IB device. > uverbs handles up to 32 IB devices with minors 192...223, so using the > minors 224...255 for 32 ucm devices would make sense. Unfortunately > uat is using minor 254, so we would have to do some rejiggering. I > guess minor numbers shouldn't be driving this -- we should just pick > the right interface, whatever it is. I did go with base minor 224, requiring that uat move its minor. For the userspace portion, I'm still trying to decide what the correct API should be.
I'd rather not require apps to call something like ib_cm_get_devices(), which would mirror the verbs call. I was thinking of having ib_cm_create_id() still take a struct ibv_context* as input, opening the corresponding CM node, and managing that internally. Thoughts? - Sean From viswa.krish at gmail.com Tue Sep 13 12:29:50 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Tue, 13 Sep 2005 12:29:50 -0700 Subject: [openib-general] Strange configure error in libibcm Message-ID: <4df28be405091312295f6500d5@mail.gmail.com> I got the latest code from the repository to verify mthca fixes, I ran into this strange configure error in libibcm checking infiniband/at.h usability... yes checking infiniband/at.h presence... yes checking for infiniband/at.h... yes checking for ANSI C header files... (cached) yes checking for an ANSI C-conforming const... yes checking for long... yes checking size of long... configure: error: cannot compute sizeof (long), 77 See `config.log' for more details. gcc version is 3.4 Linux 2.6.13 I was able to build earlier versions on the same machine. This happens only with libibcm Any clues ? -Viswa From xma at us.ibm.com Tue Sep 13 12:51:23 2005 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 13 Sep 2005 12:51:23 -0700 Subject: [openib-general] Strange configure error in libibcm In-Reply-To: <4df28be405091312295f6500d5@mail.gmail.com> Message-ID: export LD_LIBRARY_PATH=/usr/local/lib will solve this problem.
Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 Viswanath Krishnamurthy Sent by: openib-general-bounces at openib.org 09/13/2005 12:29 PM To openib-general at openib.org cc Subject [openib-general] Strange configure error in libibcm I got the latest code from the repository to verify mthca fixes, I ran into this strange configure error in libibcm checking infiniband/at.h usability... yes checking infiniband/at.h presence... yes checking for infiniband/at.h... yes checking for ANSI C header files... (cached) yes checking for an ANSI C-conforming const... yes checking for long... yes checking size of long... configure: error: cannot compute sizeof (long), 77 See `config.log' for more details. gcc version is 3.4 Linux 2.6.13 I was able to build earlier versions on the same machine. This happens only with libibcm Any clues ? -Viswa From steve_wooding at keysounds.co.uk Tue Sep 13 13:25:58 2005 From: steve_wooding at keysounds.co.uk (Steve Wooding) Date: Tue, 13 Sep 2005 21:25:58 +0100 Subject: [openib-general] Userspace function to query SA for ServiceRecord by name Message-ID: <432735D6.9040403@keysounds.co.uk> Hi, I was wondering if a function to query the SA for a ServiceRecord by its name has been implemented at all? I've found the kernelspace SA query function, but not a userspace one. Cheers, Steve.
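Steve's question is about matching on the ServiceName field of an SA ServiceRecord. The C sketch below is not an existing OpenIB userspace API. The struct, macro and function names are hypothetical, and it only covers the client-side half of such a query: building the template record and checking the names that come back. It assumes the IBA ServiceRecord layout (176 bytes, with a 64-byte ServiceName) and assumes ServiceName is component-mask bit 6, which is how later Linux kernels' `ib_sa.h` number it. Byte swapping and sending the MAD to the SA are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical host-order mirror of the IBA SA ServiceRecord (176 bytes).
 * On the wire every multi-byte field is big-endian; not handled here.
 */
struct sr_sketch {
	uint64_t service_id;
	uint8_t  service_gid[16];
	uint16_t service_pkey;
	uint16_t reserved;
	uint32_t service_lease;
	uint8_t  service_key[16];
	char     service_name[64];	/* this sketch keeps it NUL-terminated */
	uint8_t  service_data8[16];
	uint16_t service_data16[8];
	uint32_t service_data32[4];
	uint64_t service_data64[2];
};

/* Assumed component-mask bit for ServiceName (component 6), host order. */
#define SR_SKETCH_COMP_NAME (1ULL << 6)

/*
 * Build a query template that asks the SA to match on ServiceName only.
 * Returns 0 on success, -1 if the name (plus its NUL) does not fit.
 */
static int sr_sketch_name_query(const char *name, struct sr_sketch *rec,
				uint64_t *comp_mask)
{
	size_t len;

	assert(name && rec && comp_mask);
	len = strlen(name);
	if (len >= sizeof rec->service_name)
		return -1;

	memset(rec, 0, sizeof *rec);		/* all other fields stay zero */
	memcpy(rec->service_name, name, len);	/* remaining bytes are zero padding */
	*comp_mask = SR_SKETCH_COMP_NAME;
	return 0;
}

/* Return nonzero if a record's ServiceName is exactly @name. */
static int sr_sketch_name_matches(const struct sr_sketch *rec, const char *name)
{
	size_t len = strlen(name);

	if (len >= sizeof rec->service_name ||
	    memcmp(rec->service_name, name, len) != 0)
		return 0;
	/* Reject prefixes: the stored name must end right where @name ends. */
	return rec->service_name[len] == '\0';
}
```

Usage would be to call `sr_sketch_name_query("my-service", &rec, &mask)`, convert the record to wire order, send it with the mask in a SubnAdmGetTable request, and filter the returned records with `sr_sketch_name_matches()`. The explicit NUL check matters because a plain `memcmp` over the requested length would also accept longer names that share the same prefix.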
From halr at voltaire.com Tue Sep 13 13:24:40 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 16:24:40 -0400 Subject: [openib-general] imgen mic messages Message-ID: <1126643080.4514.1101.camel@hal.voltaire.com> Hi Michael, I got the following running mic: mic -dev_type MT23108 -fw fw-23108-a1-rel.mic -conf MHXL-CF128-T.brd -format BINARY -wrimage fw-23108-a1-rel.bin -W- Trying to assign float value (330357.142857143) to an integer parameter (dram_frequency_2x_khz). Fraction truncated. -W- Parameter "FW.b2.AutoRename.258" fw-23108-a1-rel.mic:2146 initializes same CR-SPACE location (0x20a0a.0) as parameter "FW.b2.AutoRename.257" fw-23108-a1-rel.mic:2145 -W- Parameter "FW.b2.AutoRename.804" fw-23108-a1-rel.mic:2826 initializes same CR-SPACE location (0x3cc15.0) as parameter "FW.pcu debug out posted enable" fw-23108-a1-rel.mic:2761 -W- Parameter "FW.b2.AutoRename.805" fw-23108-a1-rel.mic:2827 initializes same CR-SPACE location (0x3cc15.1) as parameter "FW.pcu debug out nonposted enable" fw-23108-a1-rel.mic:2762 -W- Parameter "FW.b2.AutoRename.811" fw-23108-a1-rel.mic:2839 initializes same CR-SPACE location (0x3d690.22) as parameter "FW.DMUrx1d" fw-23108-a1-rel.mic:2757 They all appear to be warnings rather than errors. Is this OK to burn ? -- Hal From jlentini at netapp.com Tue Sep 13 13:34:22 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 13 Sep 2005 16:34:22 -0400 (EDT) Subject: [openib-general] imgen mic messages In-Reply-To: <1126643080.4514.1101.camel@hal.voltaire.com> References: <1126643080.4514.1101.camel@hal.voltaire.com> Message-ID: These are expected. 
I documented the procedure I used to flash my cards in the installation cheat sheet: https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet james On Tue, 13 Sep 2005, Hal Rosenstock wrote: > Hi Michael, > > I got the following running mic: > mic -dev_type MT23108 -fw fw-23108-a1-rel.mic -conf MHXL-CF128-T.brd -format BINARY -wrimage fw-23108-a1-rel.bin > -W- Trying to assign float value (330357.142857143) to an integer parameter (dram_frequency_2x_khz). Fraction truncated. > -W- Parameter "FW.b2.AutoRename.258" fw-23108-a1-rel.mic:2146 initializes same CR-SPACE location (0x20a0a.0) as parameter "FW.b2.AutoRename.257" fw-23108-a1-rel.mic:2145 > -W- Parameter "FW.b2.AutoRename.804" fw-23108-a1-rel.mic:2826 initializes same CR-SPACE location (0x3cc15.0) as parameter "FW.pcu debug out posted enable" fw-23108-a1-rel.mic:2761 > -W- Parameter "FW.b2.AutoRename.805" fw-23108-a1-rel.mic:2827 initializes same CR-SPACE location (0x3cc15.1) as parameter "FW.pcu debug out nonposted enable" fw-23108-a1-rel.mic:2762 > -W- Parameter "FW.b2.AutoRename.811" fw-23108-a1-rel.mic:2839 initializes same CR-SPACE location (0x3d690.22) as parameter "FW.DMUrx1d" fw-23108-a1-rel.mic:2757 > > They all appear to be warnings rather than errors. > > Is this OK to burn ? 
> > -- Hal > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From rolandd at cisco.com Tue Sep 13 13:35:24 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 13:35:24 -0700 Subject: [openib-general] Userspace function to query SA for ServiceRecord by name In-Reply-To: <432735D6.9040403@keysounds.co.uk> (Steve Wooding's message of "Tue, 13 Sep 2005 21:25:58 +0100") References: <432735D6.9040403@keysounds.co.uk> Message-ID: <527jdk7lkj.fsf@cisco.com> Steve> Hi, I was wondering if a function to query the SA for a Steve> ServiceRecord by its name has been implemented at all? I've Steve> found the kernelspace SA query function, but not a Steve> userspace one. I don't think anyone has implemented that. It should be pretty easy to do on top of the userspace MAD access -- just send the query and then read the reply back. - R. From halr at voltaire.com Tue Sep 13 13:33:47 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 16:33:47 -0400 Subject: [openib-general] Userspace function to query SA for ServiceRecord by name In-Reply-To: <432735D6.9040403@keysounds.co.uk> References: <432735D6.9040403@keysounds.co.uk> Message-ID: <1126643626.4514.1131.camel@hal.voltaire.com> Hi Steve, On Tue, 2005-09-13 at 16:25, Steve Wooding wrote: > I was wondering if a function to query the SA for a ServiceRecord by its > name has been implemented at all? I've found the kernelspace SA query > function, but not a userspace one. The closest thing for this is the OpenSM SA client in the vendor layer in osm/libvendor/osm_vendor_ibumad_sa.c. [There are some examples of how to use this if you want.] A gen2 SA client for this has not been implemented. Do you also have the need to register (set) and delete ServiceRecords as well from userspace?
-- Hal From halr at voltaire.com Tue Sep 13 13:43:53 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 16:43:53 -0400 Subject: [openib-general] imgen mic messages In-Reply-To: References: <1126643080.4514.1101.camel@hal.voltaire.com> Message-ID: <1126644232.4514.1157.camel@hal.voltaire.com> On Tue, 2005-09-13 at 16:34, James Lentini wrote: > These are expected. The warnings you documented appear different: NOTE: The following warning can be ignored: -W- Parameter FW.tpt.scratchpad.pciex.endpoint_mask.pci_cfg_space.cfg_hdr.reg12.exp_rom_en already assigned in Perl. Old = 0x0; new = 0x0 -W- Parameter FW.tpt.scratchpad.pciex.endpoint_mask.pci_cfg_space.cfg_hdr.reg12.addr_31_11 already assigned in Perl. Old = 0x0; new = 0x0 -W- Parameter FW.tpt.scratchpad.pciex.endpoint_mask.pci_cfg_space.cfg_hdr.reg12.exp_rom_en already assigned in/mswg/release/fw-25208/fw-25208-rel-4_7_0-rc24/lion_cub_128.brd. Old = 0x0; new = 0x0 -W- Parameter FW.tpt.scratchpad.pciex.endpoint_mask.pci_cfg_space.cfg_hdr.reg12.addr_31_11 already assigned in /mswg/release/fw-25208/fw-25208-rel-4_7_0-rc24/lion_cub_128.brd. Old = 0x0; new = 0x0 > I documented the procedure I used to flash my cards in the > installation cheat sheet: > > https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet Thanks. -- Hal > > james > > On Tue, 13 Sep 2005, Hal Rosenstock wrote: > > > Hi Michael, > > > > I got the following running mic: > > mic -dev_type MT23108 -fw fw-23108-a1-rel.mic -conf MHXL-CF128-T.brd -format BINARY -wrimage fw-23108-a1-rel.bin > > -W- Trying to assign float value (330357.142857143) to an integer parameter (dram_frequency_2x_khz). Fraction truncated. 
> > -W- Parameter "FW.b2.AutoRename.258" fw-23108-a1-rel.mic:2146 initializes same CR-SPACE location (0x20a0a.0) as parameter "FW.b2.AutoRename.257" fw-23108-a1-rel.mic:2145 > > -W- Parameter "FW.b2.AutoRename.804" fw-23108-a1-rel.mic:2826 initializes same CR-SPACE location (0x3cc15.0) as parameter "FW.pcu debug out posted enable" fw-23108-a1-rel.mic:2761 > > -W- Parameter "FW.b2.AutoRename.805" fw-23108-a1-rel.mic:2827 initializes same CR-SPACE location (0x3cc15.1) as parameter "FW.pcu debug out nonposted enable" fw-23108-a1-rel.mic:2762 > > -W- Parameter "FW.b2.AutoRename.811" fw-23108-a1-rel.mic:2839 initializes same CR-SPACE location (0x3d690.22) as parameter "FW.DMUrx1d" fw-23108-a1-rel.mic:2757 > > > > They all appear to be warnings rather than errors. > > > > Is this OK to burn ? > > > > -- Hal > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > > From steve_wooding at keysounds.co.uk Tue Sep 13 13:55:29 2005 From: steve_wooding at keysounds.co.uk (Steve Wooding) Date: Tue, 13 Sep 2005 21:55:29 +0100 Subject: [openib-general] Userspace function to query SA for ServiceRecord by name In-Reply-To: <1126643626.4514.1131.camel@hal.voltaire.com> References: <432735D6.9040403@keysounds.co.uk> <1126643626.4514.1131.camel@hal.voltaire.com> Message-ID: <43273CC1.5070303@keysounds.co.uk> Hi Hal, Hal Rosenstock wrote: > > >The closest thing for this is the OpenSM SA client in the vendor layer >in osm/libvendor/osm_vendor_ibumad_sa.c. [There are some examples of how >to use this if you want.] A gen2 SA client for this has not been >implemented. > >Do you also have the need to register (set) and delete ServiceRecords as >well from userspace ? > > > Some examples would be great. 
I probably do need to set and delete the ServiceRecords for in-house testing purposes, though in the final solution another party will create the ServiceRecords. Cheers, Steve. From xma at us.ibm.com Tue Sep 13 13:56:49 2005 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 13 Sep 2005 13:56:49 -0700 Subject: [openib-general] Mellanox device in INIT state In-Reply-To: <1126637643.4514.879.camel@hal.voltaire.com> Message-ID: > You need to run an SM on the subnet to bring it from INIT to ACTIVE. I have a Topspin switch connected which has an SM running. I have 1x/4x/12x connections to a Topspin 120 switch. Something could be wrong in the port configuration. thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From halr at voltaire.com Tue Sep 13 14:03:54 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 17:03:54 -0400 Subject: [openib-general] Mellanox device in INIT state In-Reply-To: References: Message-ID: <1126645434.4425.12.camel@hal.voltaire.com> On Tue, 2005-09-13 at 16:56, Shirley Ma wrote: > > You need to run an SM on the subnet to bring it from INIT to ACTIVE. > > I have a Topspin switch connected which has an SM running. I have > 1x/4x/12x connections to a Topspin 120 switch. Something could be > wrong in the port configuration. Or the (Topspin) SM does not know how to deal with a 12x HCA yet...
-- Hal From halr at voltaire.com Tue Sep 13 14:07:29 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 17:07:29 -0400 Subject: [openib-general] Userspace function to query SA for ServiceRecord by name In-Reply-To: <43273CC1.5070303@keysounds.co.uk> References: <432735D6.9040403@keysounds.co.uk> <1126643626.4514.1131.camel@hal.voltaire.com> <43273CC1.5070303@keysounds.co.uk> Message-ID: <1126645554.4425.20.camel@hal.voltaire.com> On Tue, 2005-09-13 at 16:55, Steve Wooding wrote: > Some examples would be great. > > I probably do need to set and delete the ServiceRecords for in-house > testing purposes, though in the final solution another party will > create the ServiceRecords. Try osmtest/osmt_service.c for some code which uses this API. -- Hal From rolandd at cisco.com Tue Sep 13 14:15:47 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 14:15:47 -0700 Subject: [openib-general] Mellanox device in INIT state In-Reply-To: (Shirley Ma's message of "Tue, 13 Sep 2005 11:53:58 -0700") References: Message-ID: <523bo87jp8.fsf@cisco.com> Shirley> After loading the ib_mthca module on PPC, the device Shirley> state is in INIT not ACTIVE state. Any clue? Is this a regression or are you trying a completely new setup? Do the counters under /sys/class/infiniband/mthca0/ports/1/counters/ show any packets sent or received? How about any errors? - R. From rolandd at cisco.com Tue Sep 13 14:17:10 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 14:17:10 -0700 Subject: [openib-general] Mellanox device in INIT state In-Reply-To: <1126645434.4425.12.camel@hal.voltaire.com> (Hal Rosenstock's message of "13 Sep 2005 17:03:54 -0400") References: <1126645434.4425.12.camel@hal.voltaire.com> Message-ID: <52y860652h.fsf@cisco.com> Hal> Or the (Topspin) SM does not know how to deal with a 12x HCA yet... Actually I think the SM would be OK. The fact that 12X Mellanox HCAs don't exist yet might be more of an obstacle... - R. 
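Roland's question about the counters under /sys/class/infiniband/mthca0/ports/1/counters/ can be checked with a small program. What follows is a minimal C sketch. It assumes the ib_core sysfs layout: a per-port `state` file whose text looks like `2: INIT`, plus counter files named `port_xmit_packets`, `port_rcv_packets`, `port_rcv_errors` and `symbol_error`. Check these names against the running kernel. The helpers parse_port_state(), read_counter() and dump_port() are hypothetical names for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Parse the text of a sysfs port "state" file, e.g. "2: INIT\n".
 * Copies the state name into name[] and returns the numeric state,
 * or -1 if the text is not in "<number>: <NAME>" form. */
int parse_port_state(const char *text, char *name, size_t name_len)
{
    int state;
    char buf[32];

    if (sscanf(text, "%d: %31s", &state, buf) != 2)
        return -1;
    if (name && name_len > 0) {
        strncpy(name, buf, name_len - 1);
        name[name_len - 1] = '\0';
    }
    return state;
}

/* Read one decimal counter from a sysfs file; -1 if it cannot be read. */
long long read_counter(const char *path)
{
    FILE *f = fopen(path, "r");
    unsigned long long val;

    if (!f)
        return -1;
    if (fscanf(f, "%llu", &val) != 1) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return (long long) val;
}

/* Print the state and a few traffic/error counters for one HCA port. */
void dump_port(const char *dev, int port)
{
    static const char *counters[] = {
        "port_xmit_packets", "port_rcv_packets",
        "port_rcv_errors", "symbol_error",
    };
    char path[256], line[64], name[32];
    size_t i;
    FILE *f;

    snprintf(path, sizeof path,
             "/sys/class/infiniband/%s/ports/%d/state", dev, port);
    f = fopen(path, "r");
    if (f && fgets(line, sizeof line, f) &&
        parse_port_state(line, name, sizeof name) >= 0)
        printf("%s port %d: %s\n", dev, port, name);
    else
        printf("%s port %d: state unavailable\n", dev, port);
    if (f)
        fclose(f);

    for (i = 0; i < sizeof counters / sizeof counters[0]; ++i) {
        snprintf(path, sizeof path,
                 "/sys/class/infiniband/%s/ports/%d/counters/%s",
                 dev, port, counters[i]);
        printf("  %-18s %lld\n", counters[i], read_counter(path));
    }
}
```

Calling dump_port("mthca0", 1) from main() prints the port state followed by the four counters. A counter printed as -1 could not be opened or parsed on that system.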
From rolandd at cisco.com Tue Sep 13 14:29:04 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 14:29:04 -0700 Subject: [openib-general] Strange configure error in libibcm In-Reply-To: <4df28be405091312295f6500d5@mail.gmail.com> (Viswanath Krishnamurthy's message of "Tue, 13 Sep 2005 12:29:50 -0700") References: <4df28be405091312295f6500d5@mail.gmail.com> Message-ID: <52r7bs64in.fsf@cisco.com> This is an odd error that seems to be some sort of autotools or binutils bug. I see the same thing on my system, and what seems to be happening is: checking for ib_at_route_by_ip in -libat... yes configure tries to link a program that calls ib_at_route_by_ip, and succeeds because ld searches /usr/local/lib. checking size of long... configure: error: cannot compute sizeof (long), 77 It then tries to run a program to see how big a long is, but the program can't run because the dynamic loader can't find libibat, since /usr/local/lib isn't in its search path. So I'm not sure how to get a better error message, but I don't think it's a libibcm problem. - R. From hozer at hozed.org Tue Sep 13 14:42:27 2005 From: hozer at hozed.org (Troy Benjegerdes) Date: Tue, 13 Sep 2005 16:42:27 -0500 Subject: [openib-general] [PATCH v1/RFC] IB: Add SCSI RDMA Protocol (SRP) initiator In-Reply-To: <52ll207s2o.fsf@cisco.com> References: <52ll207s2o.fsf@cisco.com> Message-ID: <20050913214226.GE1685@kalmia.hozed.org> Is there anyplace I can find an SRP target for Linux? What is available? (Ideally, I'd like one for 2.6.1[3,4] ) On Tue, Sep 13, 2005 at 11:14:55AM -0700, Roland Dreier wrote: > Sorry to interrupt the SAS arguments, but... > > Here's the latest version of the InfiniBand SRP initiator. I think > it's ready for merging; I implemented error handling, which was the > main thing pointed out from the generally positive reviews of my > previous posting. I've done a decent amount of testing without seeing > any problems, and John Kingman has also tested against his SRP target.
> > Since this is a completely new driver and can't break anything, > assuming the code looks good, does it seem OK to merge for 2.6.14? > > Thanks, > Roland > > > Add an InfiniBand SCSI RDMA Protocol (SRP) initiator. This lets us > talk to InfiniBand SRP targets (storage devices). > > Signed-off-by: Roland Dreier > > --- > > drivers/infiniband/Kconfig | 2 > drivers/infiniband/Makefile | 1 > drivers/infiniband/ulp/srp/Kbuild | 3 > drivers/infiniband/ulp/srp/Kconfig | 11 > drivers/infiniband/ulp/srp/ib_srp.c | 1637 +++++++++++++++++++++++++++++++++++ > drivers/infiniband/ulp/srp/ib_srp.h | 324 +++++++ > 6 files changed, 1978 insertions(+), 0 deletions(-) > > 891de9c3d67dc4afc2e5c941bba96613bca81ae1 > diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig > --- a/drivers/infiniband/Kconfig > +++ b/drivers/infiniband/Kconfig > @@ -33,4 +33,6 @@ source "drivers/infiniband/hw/mthca/Kcon > > source "drivers/infiniband/ulp/ipoib/Kconfig" > > +source "drivers/infiniband/ulp/srp/Kconfig" > + > endmenu > diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile > --- a/drivers/infiniband/Makefile > +++ b/drivers/infiniband/Makefile > @@ -1,3 +1,4 @@ > obj-$(CONFIG_INFINIBAND) += core/ > obj-$(CONFIG_INFINIBAND_MTHCA) += hw/mthca/ > obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/ > +obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ > diff --git a/drivers/infiniband/ulp/srp/Kbuild b/drivers/infiniband/ulp/srp/Kbuild > new file mode 100644 > --- /dev/null > +++ b/drivers/infiniband/ulp/srp/Kbuild > @@ -0,0 +1,3 @@ > +EXTRA_CFLAGS += -Idrivers/infiniband/include > + > +obj-$(CONFIG_INFINIBAND_SRP) += ib_srp.o > diff --git a/drivers/infiniband/ulp/srp/Kconfig b/drivers/infiniband/ulp/srp/Kconfig > new file mode 100644 > --- /dev/null > +++ b/drivers/infiniband/ulp/srp/Kconfig > @@ -0,0 +1,11 @@ > +config INFINIBAND_SRP > + tristate "InfiniBand SCSI RDMA Protocol" > + depends on INFINIBAND && SCSI > + ---help--- > + Support for the SCSI RDMA Protocol over 
InfiniBand. This > + allows you to access storage devices that speak SRP over > + InfiniBand. > + > + The SRP protocol is defined by the INCITS T10 technical > + committee. See . > + > diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c > new file mode 100644 > --- /dev/null > +++ b/drivers/infiniband/ulp/srp/ib_srp.c > @@ -0,0 +1,1637 @@ > +/* > + * Copyright (c) 2005 Cisco Systems. All rights reserved. > + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + * $Id: ib_srp.c 3395 2005-09-13 05:10:39Z roland $ > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include > + > +#include > +#include > +#include > + > +#include > + > +#include "ib_srp.h" > + > +#define DRV_NAME "ib_srp" > +#define PFX DRV_NAME ": " > +#define DRV_VERSION "0.01" > +#define DRV_RELDATE "January 11, 2005" > + > +MODULE_AUTHOR("Roland Dreier"); > +MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol driver"); > +MODULE_LICENSE("Dual BSD/GPL"); > + > +static int topspin_workarounds = 1; > + > +module_param(topspin_workarounds, int, 0444); > +MODULE_PARM_DESC(topspin_workarounds, > + "Enable workarounds for Topspin/Cisco SRP target bugs if != 0"); > + > +static const u8 topspin_oui[3] = { 0x00, 0x05, 0xad }; > + > +static void srp_add_one(struct ib_device *device); > +static void srp_remove_one(struct ib_device *device); > +static void srp_completion(struct ib_cq *cq, void *target_ptr); > +static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event); > + > +static struct ib_client srp_client = { > + .name = "srp", > + .add = srp_add_one, > + .remove = srp_remove_one > +}; > + > +static inline struct srp_target_port *host_to_target(struct Scsi_Host *host) > +{ > + return (struct srp_target_port *) host->hostdata; > +} > + > +static const char *srp_target_info(struct Scsi_Host *host) > +{ > + return host_to_target(host)->target_name; > +} > + > +static struct srp_iu *srp_alloc_iu(struct srp_host *host, size_t size, > + unsigned int __nocast gfp_mask, > + enum dma_data_direction direction) > +{ > + struct srp_iu *iu; > + > + iu = kmalloc(sizeof *iu, gfp_mask); > + if (!iu) > + goto out; > + > + iu->buf = kzalloc(size, gfp_mask); > + if (!iu->buf) > + goto out_free_iu; > + > + iu->dma = dma_map_single(host->dev->dma_device, iu->buf, size, direction); > + if (dma_mapping_error(iu->dma)) > + goto out_free_buf; > + > + iu->size = size; > + iu->direction = direction; > + > + return 
iu; > + > +out_free_buf: > + kfree(iu->buf); > +out_free_iu: > + kfree(iu); > +out: > + return NULL; > +} > + > +static void srp_free_iu(struct srp_host *host, struct srp_iu *iu) > +{ > + if (!iu) > + return; > + > + dma_unmap_single(host->dev->dma_device, iu->dma, iu->size, iu->direction); > + kfree(iu->buf); > + kfree(iu); > +} > + > +static void srp_qp_event(struct ib_event *event, void *context) > +{ > + printk(KERN_ERR PFX "QP event %d\n", event->event); > +} > + > +static int srp_init_qp(struct srp_target_port *target, > + struct ib_qp *qp) > +{ > + struct ib_qp_attr *attr; > + int ret; > + > + attr = kmalloc(sizeof *attr, GFP_KERNEL); > + if (!attr) > + return -ENOMEM; > + > + ret = ib_find_cached_pkey(target->srp_host->dev, > + target->srp_host->port, > + be16_to_cpu(target->path.pkey), > + &attr->pkey_index); > + if (ret) > + return ret; > + > + attr->qp_state = IB_QPS_INIT; > + attr->qp_access_flags = (IB_ACCESS_REMOTE_READ | > + IB_ACCESS_REMOTE_WRITE); > + attr->port_num = target->srp_host->port; > + > + return ib_modify_qp(qp, attr, > + IB_QP_STATE | > + IB_QP_PKEY_INDEX | > + IB_QP_ACCESS_FLAGS | > + IB_QP_PORT); > +} > + > +static struct ib_qp *srp_create_qp(struct srp_target_port *target, > + struct ib_qp_init_attr *init_attr) > +{ > + struct ib_qp *qp; > + int ret; > + > + qp = ib_create_qp(target->srp_host->pd, init_attr); > + if (IS_ERR(qp)) > + return qp; > + > + ret = srp_init_qp(target, qp); > + if (ret) { > + ib_destroy_qp(qp); > + qp = ERR_PTR(ret); > + } > + > + return qp; > +} > + > +static void srp_path_rec_completion(int status, > + struct ib_sa_path_rec *pathrec, > + void *target_ptr) > +{ > + struct srp_target_port *target = target_ptr; > + struct srp_host *host = target->srp_host; > + struct ib_qp_init_attr *init_attr = NULL; > + > + if (status) { > + printk(KERN_ERR PFX "Got failed path rec status %d\n", status); > + target->status = status; > + goto out; > + } > + > + target->path = *pathrec; > + > + /* > + * We may be getting a 
path for the second time because we > + * were redirected to a different port. In that case, there's > + * no reason to create our CQ and QP again. > + */ > + if (target->cq) { > + target->status = 0; > + goto out; > + } > + > + init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL); > + if (!init_attr) { > + target->status = -ENOMEM; > + goto out; > + } > + > + target->cq = ib_create_cq(host->dev, srp_completion, > + NULL, target, SRP_CQ_SIZE); > + if (IS_ERR(target->cq)) { > + target->status = PTR_ERR(target->cq); > + goto out_free; > + } > + > + ib_req_notify_cq(target->cq, IB_CQ_NEXT_COMP); > + > + init_attr->event_handler = srp_qp_event; > + init_attr->cap.max_send_wr = SRP_SQ_SIZE; > + init_attr->cap.max_recv_wr = SRP_RQ_SIZE; > + init_attr->cap.max_recv_sge = 1; > + init_attr->cap.max_send_sge = 1; > + init_attr->sq_sig_type = IB_SIGNAL_ALL_WR; > + init_attr->qp_type = IB_QPT_RC; > + init_attr->send_cq = target->cq; > + init_attr->recv_cq = target->cq; > + > + target->qp = srp_create_qp(target, init_attr); > + if (IS_ERR(target->qp)) { > + target->status = PTR_ERR(target->qp); > + ib_destroy_cq(target->cq); > + goto out_free; > + } > + > + target->status = 0; > + > +out_free: > + kfree(init_attr); > + > +out: > + complete(&target->done); > +} > + > +static int srp_lookup_path(struct srp_target_port *target) > +{ > + target->path.numb_path = 1; > + > + init_completion(&target->done); > + > + target->path_query_id = ib_sa_path_rec_get(target->srp_host->dev, > + target->srp_host->port, > + &target->path, > + IB_SA_PATH_REC_DGID | > + IB_SA_PATH_REC_SGID | > + IB_SA_PATH_REC_NUMB_PATH | > + IB_SA_PATH_REC_PKEY, > + SRP_PATH_REC_TIMEOUT_MS, > + GFP_KERNEL, > + srp_path_rec_completion, > + target, &target->path_query); > + if (target->path_query_id < 0) > + return target->path_query_id; > + > + wait_for_completion(&target->done); > + > + if (target->status < 0) > + printk(KERN_WARNING PFX "Path record query failed\n"); > + > + return target->status; > +} > + > +static 
int srp_send_req(struct srp_target_port *target) > +{ > + struct { > + struct ib_cm_req_param param; > + struct srp_login_req priv; > + } *req = NULL; > + int status; > + > + req = kzalloc(sizeof *req, GFP_KERNEL); > + if (!req) > + return -ENOMEM; > + > + req->param.primary_path = &target->path; > + req->param.alternate_path = NULL; > + req->param.service_id = target->service_id; > + req->param.qp_num = target->qp->qp_num; > + req->param.qp_type = target->qp->qp_type; > + req->param.starting_psn = 0; /* XXX */ > + req->param.private_data = &req->priv; > + req->param.private_data_len = sizeof req->priv; > + req->param.responder_resources = 4; > + req->param.remote_cm_response_timeout = 20; > + req->param.flow_control = 1; > + req->param.local_cm_response_timeout = 20; > + req->param.retry_count = 7; > + req->param.rnr_retry_count = 7; > + req->param.max_cm_retries = 15; > + > + req->priv.opcode = SRP_LOGIN_REQ; > + req->priv.tag = 0; > + req->priv.req_it_iu_len = cpu_to_be32(SRP_MAX_IU_LEN); > + req->priv.req_buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | > + SRP_BUF_FORMAT_INDIRECT); > + memcpy(req->priv.initiator_port_id, target->srp_host->initiator_port_id, 16); > + /* > + * Topspin/Cisco SRP targets will reject our login unless we > + * zero out the first 8 bytes of our initiator port ID. The > + * second 8 bytes must be our local node GUID, but we always > + * use that anyway. 
> + */ > + if (topspin_workarounds && !memcmp(&target->ioc_guid, topspin_oui, 3)) { > + printk(KERN_DEBUG PFX "Topspin/Cisco initiator port ID workaround " > + "activated for target GUID %016llx\n", > + (unsigned long long) be64_to_cpu(target->ioc_guid)); > + memset(req->priv.initiator_port_id, 0, 8); > + } > + memcpy(req->priv.target_port_id, &target->id_ext, 8); > + memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8); > + > + status = ib_send_cm_req(target->cm_id, &req->param); > + if (status) { > + ib_destroy_qp(target->qp); > + ib_destroy_cq(target->cq); > + } > + > + return status; > +} > + > +static void srp_disconnect_target(struct srp_target_port *target) > +{ > + /* XXX should send SRP_I_LOGOUT request */ > + > + init_completion(&target->done); > + ib_send_cm_dreq(target->cm_id, NULL, 0); > + wait_for_completion(&target->done); > +} > + > +static void srp_free_target_ib(struct srp_target_port *target) > +{ > + int i; > + > + ib_destroy_qp(target->qp); > + ib_destroy_cq(target->cq); > + > + for (i = 0; i < SRP_RQ_SIZE; ++i) > + srp_free_iu(target->srp_host, target->rx_ring[i]); > + for (i = 0; i < SRP_SQ_SIZE + 1; ++i) > + srp_free_iu(target->srp_host, target->tx_ring[i]); > +} > + > +static void srp_remove_work(void *target_ptr) > +{ > + struct srp_target_port *target = target_ptr; > + > + spin_lock_irq(target->scsi_host->host_lock); > + if (target->state != SRP_TARGET_DEAD) { > + spin_unlock_irq(target->scsi_host->host_lock); > + scsi_host_put(target->scsi_host); > + return; > + } > + target->state = SRP_TARGET_REMOVED; > + spin_unlock_irq(target->scsi_host->host_lock); > + > + down(&target->srp_host->target_mutex); > + list_del(&target->list); > + up(&target->srp_host->target_mutex); > + > + scsi_remove_host(target->scsi_host); > + ib_destroy_cm_id(target->cm_id); > + srp_free_target_ib(target); > + scsi_host_put(target->scsi_host); > + /* And another put to really free the target port... 
*/ > + scsi_host_put(target->scsi_host); > +} > + > +static int srp_connect_target(struct srp_target_port *target) > +{ > + int ret; > + > + while (1) { > + init_completion(&target->done); > + ret = srp_send_req(target); > + if (ret) > + return ret; > + wait_for_completion(&target->done); > + > + /* > + * The CM event handling code will set status to > + * SRP_PORT_REDIRECT if we get a port redirect REJ > + * back, or SRP_DLID_REDIRECT if we get a lid/qp > + * redirect REJ back. > + */ > + switch (target->status) { > + case 0: > + return 0; > + > + case SRP_PORT_REDIRECT: > + ret = srp_lookup_path(target); > + if (ret) > + return ret; > + break; > + > + case SRP_DLID_REDIRECT: > + break; > + > + default: > + return target->status; > + } > + } > +} > + > +static int srp_reconnect_target(struct srp_target_port *target) > +{ > + struct ib_qp_attr qp_attr; > + struct srp_request *req; > + struct ib_wc wc; > + u32 remote_cm_qpn; > + int ret; > + int i; > + > + spin_lock_irq(target->scsi_host->host_lock); > + if (target->state != SRP_TARGET_LIVE) { > + spin_unlock_irq(target->scsi_host->host_lock); > + return -EAGAIN; > + } > + target->state = SRP_TARGET_CONNECTING; > + spin_unlock_irq(target->scsi_host->host_lock); > + > + remote_cm_qpn = target->cm_id->remote_cm_qpn; > + > + srp_disconnect_target(target); > + > + target->cm_id = ib_create_cm_id(srp_cm_handler, target); > + if (IS_ERR(target->cm_id)) { > + ret = PTR_ERR(target->cm_id); > + target->cm_id = NULL; > + goto err; > + } > + > + target->cm_id->remote_cm_qpn = remote_cm_qpn; > + > + qp_attr.qp_state = IB_QPS_RESET; > + ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE); > + if (ret) > + goto err; > + > + ret = srp_init_qp(target, target->qp); > + if (ret) > + goto err; > + > + while (ib_poll_cq(target->cq, 1, &wc) > 0) > + ; /* nothing */ > + > + list_for_each_entry(req, &target->req_queue, list) { > + req->scmnd->result = DID_RESET << 16; > + req->scmnd->scsi_done(req->scmnd); > + } > + > + target->rx_head 
= 0; > + target->rx_tail = 0; > + target->tx_head = 0; > + target->tx_tail = 0; > + target->req_head = 0; > + for (i = 0; i < SRP_SQ_SIZE - 1; ++i) > + target->req_ring[i].next = i + 1; > + target->req_ring[SRP_SQ_SIZE - 1].next = -1; > + INIT_LIST_HEAD(&target->req_queue); > + > + ret = srp_connect_target(target); > + if (ret) > + goto err; > + > + spin_lock_irq(target->scsi_host->host_lock); > + if (target->state == SRP_TARGET_CONNECTING) { > + ret = 0; > + target->state = SRP_TARGET_LIVE; > + } else > + ret = -EAGAIN; > + spin_unlock_irq(target->scsi_host->host_lock); > + > + return ret; > + > +err: > + printk(KERN_ERR PFX "reconnect failed, removing target port.\n"); > + > + /* > + * We couldn't reconnect, so kill our target port off. > + * However, we have to defer the real removal because we might > + * be in the context of the SCSI error handler now, which > + * would deadlock if we call scsi_remove_host(). > + */ > + spin_lock_irq(target->scsi_host->host_lock); > + if (target->state == SRP_TARGET_CONNECTING) { > + target->state = SRP_TARGET_DEAD; > + INIT_WORK(&target->work, srp_remove_work, target); > + schedule_work(&target->work); > + } > + spin_unlock_irq(target->scsi_host->host_lock); > + > + return ret; > +} > + > +static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target, > + struct srp_iu *iu) > +{ > + struct srp_cmd *cmd = iu->buf; > + int len; > + u8 fmt; > + > + if (!scmnd->request_buffer || scmnd->sc_data_direction == DMA_NONE) > + return sizeof (struct srp_cmd); > + > + if (scmnd->sc_data_direction != DMA_FROM_DEVICE && > + scmnd->sc_data_direction != DMA_TO_DEVICE) { > + printk(KERN_WARNING PFX "Unhandled data direction %d\n", > + scmnd->sc_data_direction); > + return -EINVAL; > + } > + > + if (scmnd->use_sg) { > + struct scatterlist *scat = scmnd->request_buffer; > + int n; > + int i; > + > + n = dma_map_sg(target->srp_host->dev->dma_device, > + scat, scmnd->use_sg, scmnd->sc_data_direction); > + > + if (n == 1) { > + 
struct srp_direct_buf *buf = (void *) cmd->add_data; > + > + fmt = SRP_DATA_DESC_DIRECT; > + > + buf->va = cpu_to_be64(sg_dma_address(scat)); > + buf->key = cpu_to_be32(target->srp_host->mr->rkey); > + buf->len = cpu_to_be32(sg_dma_len(scat)); > + > + len = sizeof (struct srp_cmd) + > + sizeof (struct srp_direct_buf); > + } else { > + struct srp_indirect_buf *buf = (void *) cmd->add_data; > + u32 datalen = 0; > + > + fmt = SRP_DATA_DESC_INDIRECT; > + > + if (scmnd->sc_data_direction == DMA_TO_DEVICE) > + cmd->data_out_desc_cnt = n; > + else > + cmd->data_in_desc_cnt = n; > + > + buf->table_desc.va = cpu_to_be64(iu->dma + > + sizeof *cmd + > + sizeof *buf); > + buf->table_desc.key = > + cpu_to_be32(target->srp_host->mr->rkey); > + buf->table_desc.len = > + cpu_to_be32(n * sizeof (struct srp_direct_buf)); > + > + for (i = 0; i < n; ++i) { > + buf->desc_list[i].va = cpu_to_be64(sg_dma_address(&scat[i])); > + buf->desc_list[i].key = > + cpu_to_be32(target->srp_host->mr->rkey); > + buf->desc_list[i].len = cpu_to_be32(sg_dma_len(&scat[i])); > + > + datalen += sg_dma_len(&scat[i]); > + } > + > + buf->len = cpu_to_be32(datalen); > + > + len = sizeof (struct srp_cmd) + > + sizeof (struct srp_indirect_buf) + > + n * sizeof (struct srp_direct_buf); > + } > + } else { > + struct srp_direct_buf *buf = (void *) cmd->add_data; > + dma_addr_t dma; > + > + dma = dma_map_single(target->srp_host->dev->dma_device, > + scmnd->request_buffer, scmnd->request_bufflen, > + scmnd->sc_data_direction); > + if (dma_mapping_error(dma)) { > + printk(KERN_WARNING PFX "unable to map %p/%d (dir %d)\n", > + scmnd->request_buffer, (int) scmnd->request_bufflen, > + scmnd->sc_data_direction); > + return -EINVAL; > + } > + > + buf->va = cpu_to_be64(dma); > + buf->key = cpu_to_be32(target->srp_host->mr->rkey); > + buf->len = cpu_to_be32(scmnd->request_bufflen); > + > + fmt = SRP_DATA_DESC_DIRECT; > + > + len = sizeof (struct srp_cmd) + sizeof (struct srp_direct_buf); > + } > + > + if 
(scmnd->sc_data_direction == DMA_TO_DEVICE) > + cmd->buf_fmt = fmt << 4; > + else > + cmd->buf_fmt = fmt; > + > + > + return len; > +} > + > +static void srp_unmap_data(struct scsi_cmnd *scmnd, > + struct srp_target_port *target, > + struct srp_cmd *cmd) > +{ > + if (!scmnd->request_buffer || > + (scmnd->sc_data_direction != DMA_TO_DEVICE && > + scmnd->sc_data_direction != DMA_FROM_DEVICE)) > + return; > + > + if (scmnd->use_sg) > + dma_unmap_sg(target->srp_host->dev->dma_device, > + (struct scatterlist *) scmnd->request_buffer, > + scmnd->use_sg, scmnd->sc_data_direction); > + else > + dma_unmap_single(target->srp_host->dev->dma_device, > + be64_to_cpu(((struct srp_direct_buf *) cmd->add_data)->va), > + scmnd->request_bufflen, > + scmnd->sc_data_direction); > +} > + > +static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp) > +{ > + struct srp_request *req; > + struct scsi_cmnd *scmnd; > + struct srp_iu *iu; > + unsigned long flags; > + s32 delta; > + > + delta = (s32) be32_to_cpu(rsp->req_lim_delta); > + > + spin_lock_irqsave(target->scsi_host->host_lock, flags); > + > + target->req_lim += delta; > + > + req = &target->req_ring[rsp->tag & ~SRP_TAG_TSK_MGMT]; > + > + if (rsp->tag & SRP_TAG_TSK_MGMT) { > + if (be32_to_cpu(rsp->resp_data_len) < 4) > + req->tsk_status = -1; > + else > + req->tsk_status = rsp->data[3]; > + complete(&req->done); > + } else { > + iu = req->cmd; > + scmnd = req->scmnd; > + scmnd->result = rsp->status; > + > + if (rsp->flags & SRP_RSP_FLAG_SNSVALID) { > + memcpy(scmnd->sense_buffer, rsp->data + > + be32_to_cpu(rsp->resp_data_len), > + min_t(int, be32_to_cpu(rsp->sense_data_len), > + SCSI_SENSE_BUFFERSIZE)); > + } > + > + if (rsp->flags & (SRP_RSP_FLAG_DOOVER | SRP_RSP_FLAG_DOUNDER)) > + scmnd->resid = be32_to_cpu(rsp->data_out_res_cnt); > + else if (rsp->flags & (SRP_RSP_FLAG_DIOVER | SRP_RSP_FLAG_DIUNDER)) > + scmnd->resid = be32_to_cpu(rsp->data_in_res_cnt); > + > + srp_unmap_data(scmnd, target, iu->buf); > + > 
+ if (!req->tsk_mgmt) { > + req->scmnd = NULL; > + scmnd->host_scribble = (void *) -1L; > + scmnd->scsi_done(scmnd); > + > + list_del(&req->list); > + req->next = target->req_head; > + target->req_head = rsp->tag & ~SRP_TAG_TSK_MGMT; > + } else > + req->cmd_done = 1; > + } > + > + spin_unlock_irqrestore(target->scsi_host->host_lock, flags); > +} > + > +static void srp_reconnect_work(void *target_ptr) > +{ > + struct srp_target_port *target = target_ptr; > + > + srp_reconnect_target(target); > +} > + > +static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc) > +{ > + struct srp_iu *iu; > + u8 opcode; > + > + iu = target->rx_ring[wc->wr_id & ~SRP_OP_RECV]; > + > + dma_sync_single_for_cpu(target->srp_host->dev->dma_device, iu->dma, > + target->max_ti_iu_len, DMA_FROM_DEVICE); > + > + opcode = *(u8 *) iu->buf; > + > + if (0) { > + int i; > + > + printk(KERN_ERR PFX "recv completion, opcode 0x%02x\n", opcode); > + > + for (i = 0; i < wc->byte_len; ++i) { > + if (i % 8 == 0) > + printk(KERN_ERR " [%02x] ", i); > + printk(" %02x", ((u8 *) iu->buf)[i]); > + if ((i + 1) % 8 == 0) > + printk("\n"); > + } > + > + if (wc->byte_len % 8) > + printk("\n"); > + } > + > + switch (opcode) { > + case SRP_RSP: > + srp_process_rsp(target, iu->buf); > + break; > + > + case SRP_T_LOGOUT: > + /* XXX Handle target logout */ > + printk(KERN_WARNING PFX "Got target logout request\n"); > + break; > + > + default: > + printk(KERN_WARNING PFX "Unhandled SRP opcode 0x%02x\n", opcode); > + break; > + } > + > + dma_sync_single_for_device(target->srp_host->dev->dma_device, iu->dma, > + target->max_ti_iu_len, DMA_FROM_DEVICE); > + > + ++target->rx_tail; > +} > + > +static void srp_completion(struct ib_cq *cq, void *target_ptr) > +{ > + struct srp_target_port *target = target_ptr; > + struct ib_wc wc; > + unsigned long flags; > + > + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); > + while (ib_poll_cq(cq, 1, &wc) > 0) { > + if (wc.status) { > + printk(KERN_ERR PFX "failed %s status 
%d\n", > + wc.wr_id & SRP_OP_RECV ? "receive" : "send", > + wc.status); > + spin_lock_irqsave(target->scsi_host->host_lock, flags); > + if (target->state == SRP_TARGET_LIVE) > + schedule_work(&target->work); > + spin_unlock_irqrestore(target->scsi_host->host_lock, flags); > + break; > + } > + > + if (wc.wr_id & SRP_OP_RECV) > + srp_handle_recv(target, &wc); > + else > + ++target->tx_tail; > + } > +} > + > +static int __srp_post_recv(struct srp_target_port *target, > + unsigned int __nocast gfp_mask) > +{ > + struct srp_iu *iu; > + struct ib_sge list; > + struct ib_recv_wr wr, *bad_wr; > + unsigned int next; > + int ret; > + > + next = target->rx_head & (SRP_RQ_SIZE - 1); > + wr.wr_id = next | SRP_OP_RECV; > + iu = target->rx_ring[next]; > + > + list.addr = iu->dma; > + list.length = iu->size; > + list.lkey = target->srp_host->mr->lkey; > + > + wr.next = NULL; > + wr.sg_list = &list; > + wr.num_sge = 1; > + > + ret = ib_post_recv(target->qp, &wr, &bad_wr); > + if (!ret) > + ++target->rx_head; > + > + return ret; > +} > + > +static int srp_post_recv(struct srp_target_port *target, > + unsigned int __nocast gfp_mask) > +{ > + unsigned long flags; > + int ret; > + > + spin_lock_irqsave(target->scsi_host->host_lock, flags); > + ret = __srp_post_recv(target, gfp_mask); > + spin_unlock_irqrestore(target->scsi_host->host_lock, flags); > + > + return ret; > +} > + > +/* > + * Must be called with target->scsi_host->host_lock held to protect > + * req_lim and tx_head. > + */ > +static struct srp_iu *__srp_get_tx_iu(struct srp_target_port *target) > +{ > + if (target->tx_head - target->tx_tail >= SRP_SQ_SIZE) > + return NULL; > + > + return target->tx_ring[target->tx_head & SRP_SQ_SIZE]; > +} > + > +/* > + * Must be called with target->scsi_host->host_lock held to protect > + * req_lim and tx_head. 
> + */ > +static int __srp_post_send(struct srp_target_port *target, > + struct srp_iu *iu, int len) > +{ > + struct ib_sge list; > + struct ib_send_wr wr, *bad_wr; > + int ret = 0; > + > + if (target->req_lim < 1) { > + printk(KERN_ERR PFX "Target has req_lim %d\n", target->req_lim); > + return -EAGAIN; > + } > + > + list.addr = iu->dma; > + list.length = len; > + list.lkey = target->srp_host->mr->lkey; > + > + wr.next = NULL; > + wr.wr_id = target->tx_head & SRP_SQ_SIZE; > + wr.sg_list = &list; > + wr.num_sge = 1; > + wr.opcode = IB_WR_SEND; > + wr.send_flags = IB_SEND_SIGNALED; > + > + ret = ib_post_send(target->qp, &wr, &bad_wr); > + > + if (!ret) { > + ++target->tx_head; > + --target->req_lim; > + } > + > + return ret; > +} > + > +static int srp_queuecommand(struct scsi_cmnd *scmnd, > + void (*done)(struct scsi_cmnd *)) > +{ > + struct srp_target_port *target = host_to_target(scmnd->device->host); > + struct srp_request *req; > + struct srp_iu *iu; > + struct srp_cmd *cmd; > + long req_index; > + int len; > + > + if (target->state == SRP_TARGET_CONNECTING) > + goto err; > + > + if (target->state == SRP_TARGET_DEAD || > + target->state == SRP_TARGET_REMOVED) { > + scmnd->result = DID_BAD_TARGET << 16; > + done(scmnd); > + return 0; > + } > + > + iu = __srp_get_tx_iu(target); > + if (!iu) > + goto err; > + > + dma_sync_single_for_cpu(target->srp_host->dev->dma_device, iu->dma, > + SRP_MAX_IU_LEN, DMA_TO_DEVICE); > + > + req_index = target->req_head; > + > + scmnd->scsi_done = done; > + scmnd->result = 0; > + scmnd->host_scribble = (void *) req_index; > + > + cmd = iu->buf; > + memset(cmd, 0, sizeof *cmd); > + > + cmd->opcode = SRP_CMD; > + cmd->lun = cpu_to_be64((u64) scmnd->device->lun << 48); > + cmd->tag = req_index; > + memcpy(cmd->cdb, scmnd->cmnd, scmnd->cmd_len); > + > + req = &target->req_ring[req_index]; > + > + req->scmnd = scmnd; > + req->cmd = iu; > + req->cmd_done = 0; > + req->tsk_mgmt = NULL; > + > + len = srp_map_data(scmnd, target, iu); > + if 
(len < 0) { > + printk(KERN_ERR PFX "Failed to map data\n"); > + goto err; > + } > + > + if (__srp_post_recv(target, GFP_ATOMIC)) { > + printk(KERN_ERR PFX "Recv failed\n"); > + goto err_unmap; > + } > + > + dma_sync_single_for_device(target->srp_host->dev->dma_device, iu->dma, > + SRP_MAX_IU_LEN, DMA_TO_DEVICE); > + > + if (__srp_post_send(target, iu, len)) { > + printk(KERN_ERR PFX "Send failed\n"); > + goto err_unmap; > + } > + > + target->req_head = req->next; > + list_add_tail(&req->list, &target->req_queue); > + > + return 0; > + > +err_unmap: > + srp_unmap_data(scmnd, target, cmd); > + > +err: > + return SCSI_MLQUEUE_HOST_BUSY; > +} > + > +static int srp_alloc_iu_bufs(struct srp_target_port *target) > +{ > + int i; > + > + for (i = 0; i < SRP_RQ_SIZE; ++i) { > + target->rx_ring[i] = srp_alloc_iu(target->srp_host, > + target->max_ti_iu_len, > + GFP_KERNEL, DMA_FROM_DEVICE); > + if (!target->rx_ring[i]) > + goto err; > + } > + > + for (i = 0; i < SRP_SQ_SIZE + 1; ++i) { > + target->tx_ring[i] = srp_alloc_iu(target->srp_host, > + SRP_MAX_IU_LEN, > + GFP_KERNEL, DMA_TO_DEVICE); > + if (!target->tx_ring[i]) > + goto err; > + } > + > + return 0; > + > +err: > + for (i = 0; i < SRP_RQ_SIZE; ++i) { > + srp_free_iu(target->srp_host, target->rx_ring[i]); > + target->rx_ring[i] = NULL; > + } > + > + for (i = 0; i < SRP_SQ_SIZE + 1; ++i) { > + srp_free_iu(target->srp_host, target->tx_ring[i]); > + target->tx_ring[i] = NULL; > + } > + > + return -ENOMEM; > +} > + > +static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) > +{ > + struct srp_target_port *target = cm_id->context; > + struct ib_class_port_info *cpi; > + struct ib_qp_attr *qp_attr = NULL; > + int attr_mask = 0; > + int comp = 0; > + int ret = 0; > + > + switch (event->event) { > + case IB_CM_REQ_ERROR: > + printk(KERN_DEBUG PFX "Sending CM REQ failed\n"); > + comp = 1; > + target->status = -ECONNRESET; > + break; > + > + case IB_CM_REP_RECEIVED: > + comp = 1; > + > + { > + struct 
srp_login_rsp *rsp = event->private_data; > + > + /* XXX check that opcode is SRP RSP */ > + > + target->max_ti_iu_len = be32_to_cpu(rsp->max_ti_iu_len); > + target->req_lim = be32_to_cpu(rsp->req_lim_delta); > + > + target->scsi_host->can_queue = min(target->req_lim, > + target->scsi_host->can_queue); > + } > + > + target->status = srp_alloc_iu_bufs(target); > + if (target->status) > + break; > + > + qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL); > + if (!qp_attr) { > + target->status = -ENOMEM; > + break; > + } > + > + qp_attr->qp_state = IB_QPS_RTR; > + target->status = ib_cm_init_qp_attr(cm_id, qp_attr, &attr_mask); > + if (target->status) > + break; > + > + qp_attr->rq_psn = 0; /* XXX */ > + attr_mask |= IB_QP_RQ_PSN; > + > + target->status = ib_modify_qp(target->qp, qp_attr, attr_mask); > + if (target->status) > + break; > + > + target->status = srp_post_recv(target, GFP_KERNEL); > + if (target->status) > + break; > + > + qp_attr->qp_state = IB_QPS_RTS; > + target->status = ib_cm_init_qp_attr(cm_id, qp_attr, &attr_mask); > + if (target->status) > + break; > + > + target->status = ib_modify_qp(target->qp, qp_attr, attr_mask); > + if (target->status) > + break; > + > + target->status = ib_send_cm_rtu(cm_id, NULL, 0); > + if (target->status) > + break; > + > + break; > + > + case IB_CM_REJ_RECEIVED: > + printk(KERN_DEBUG PFX "REJ received\n"); > + comp = 1; > + > + if (event->param.rej_rcvd.reason == IB_CM_REJ_PORT_CM_REDIRECT) { > + cpi = event->param.rej_rcvd.ari; > + target->path.dlid = cpi->redirect_lid; > + target->path.pkey = cpi->redirect_pkey; > + cm_id->remote_cm_qpn = be32_to_cpu(cpi->redirect_qp) & 0x00ffffff; > + memcpy(target->path.dgid.raw, cpi->redirect_gid, 16); > + > + target->status = target->path.dlid ? 
> + SRP_DLID_REDIRECT : SRP_PORT_REDIRECT; > + } else if (topspin_workarounds && > + !memcmp(&target->ioc_guid, topspin_oui, 3) && > + event->param.rej_rcvd.reason == IB_CM_REJ_PORT_REDIRECT) { > + /* > + * Topspin/Cisco SRP gateways incorrectly send > + * reject reason code 25 when they mean 24 > + * (port redirect). > + */ > + memcpy(target->path.dgid.raw, > + event->param.rej_rcvd.ari, 16); > + > + printk(KERN_DEBUG PFX "Topspin/Cisco redirect to target port GID %016llx%016llx\n", > + (unsigned long long) be64_to_cpu(target->path.dgid.global.subnet_prefix), > + (unsigned long long) be64_to_cpu(target->path.dgid.global.interface_id)); > + > + target->status = SRP_PORT_REDIRECT; > + } else { > + printk(KERN_WARNING " REJ reason 0x%x\n", > + event->param.rej_rcvd.reason); > + target->status = -ECONNRESET; > + ret = 1; > + } > + > + break; > + > + case IB_CM_MRA_RECEIVED: > + printk(KERN_ERR PFX "MRA received\n"); > + break; > + > + case IB_CM_DREP_RECEIVED: > + break; > + > + case IB_CM_TIMEWAIT_EXIT: > + printk(KERN_ERR PFX "connection closed\n"); > + > + comp = 1; > + ret = 1; > + target->status = 0; > + break; > + > + default: > + printk(KERN_WARNING PFX "Unhandled CM event %d\n", event->event); > + break; > + } > + > + if (comp) > + complete(&target->done); > + > + kfree(qp_attr); > + > + return ret; > +} > + > +static int srp_send_tsk_mgmt(struct scsi_cmnd *scmnd, u8 func) > +{ > + struct srp_target_port *target = host_to_target(scmnd->device->host); > + struct srp_request *req; > + struct srp_iu *iu; > + struct srp_tsk_mgmt *tsk_mgmt; > + int req_index; > + int ret = FAILED; > + > + spin_lock_irq(target->scsi_host->host_lock); > + > + if (scmnd->host_scribble == (void *) -1L) > + goto out; > + > + req_index = (long) scmnd->host_scribble; > + > + req = &target->req_ring[req_index]; > + init_completion(&req->done); > + > + iu = __srp_get_tx_iu(target); > + if (!iu) > + goto out; > + > + tsk_mgmt = iu->buf; > + memset(tsk_mgmt, 0, sizeof *tsk_mgmt); > + > + 
> + tsk_mgmt->opcode = SRP_TSK_MGMT; > + tsk_mgmt->lun = cpu_to_be64((u64) scmnd->device->lun << 48); > + tsk_mgmt->tag = req_index | SRP_TAG_TSK_MGMT; > + tsk_mgmt->tsk_mgmt_func = func; > + tsk_mgmt->task_tag = req_index; > + > + if (__srp_post_send(target, iu, sizeof *tsk_mgmt)) > + goto out; > + > + req->tsk_mgmt = iu; > + > + spin_unlock_irq(target->scsi_host->host_lock); > + if (!wait_for_completion_timeout(&req->done, > + msecs_to_jiffies(SRP_ABORT_TIMEOUT_MS))) > + return FAILED; > + spin_lock_irq(target->scsi_host->host_lock); > + > + if (req->cmd_done) { > + list_del(&req->list); > + req->next = target->req_head; > + target->req_head = req_index; > + > + scmnd->scsi_done(scmnd); > + } else if (!req->tsk_status) { > + scmnd->result = DID_ABORT << 16; > + ret = SUCCESS; > + } > + > +out: > + spin_unlock_irq(target->scsi_host->host_lock); > + return ret; > +} > + > +static int srp_abort(struct scsi_cmnd *scmnd) > +{ > + printk(KERN_ERR "SRP abort called\n"); > + > + return srp_send_tsk_mgmt(scmnd, SRP_TSK_ABORT_TASK); > +} > + > +static int srp_reset_device(struct scsi_cmnd *scmnd) > +{ > + printk(KERN_ERR "SRP reset_device called\n"); > + > + return srp_send_tsk_mgmt(scmnd, SRP_TSK_LUN_RESET); > +} > + > +static int srp_reset_host(struct scsi_cmnd *scmnd) > +{ > + struct srp_target_port *target = host_to_target(scmnd->device->host); > + int ret = FAILED; > + > + printk(KERN_ERR PFX "SRP reset_host called\n"); > + > + if (!srp_reconnect_target(target)) > + ret = SUCCESS; > + > + return ret; > +} > + > +static struct scsi_host_template srp_template = { > + .module = THIS_MODULE, > + .name = DRV_NAME, > + .info = srp_target_info, > + .queuecommand = srp_queuecommand, > + .eh_abort_handler = srp_abort, > + .eh_device_reset_handler = srp_reset_device, > + .eh_host_reset_handler = srp_reset_host, > + .can_queue = SRP_SQ_SIZE, > + .this_id = -1, > + .sg_tablesize = SRP_MAX_INDIRECT, > + .cmd_per_lun = SRP_SQ_SIZE, > + .use_clustering = ENABLE_CLUSTERING > +}; > + >
+static int srp_add_target(struct srp_host *host, struct srp_target_port *target) > +{ > + sprintf(target->target_name, "SRP.T10:%016llX", > + (unsigned long long) be64_to_cpu(target->id_ext)); > + > + if (scsi_add_host(target->scsi_host, host->dev->dma_device)) > + return -ENODEV; > + > + down(&host->target_mutex); > + list_add_tail(&target->list, &host->target_list); > + up(&host->target_mutex); > + > + target->state = SRP_TARGET_LIVE; > + > + /* XXX: are we supposed to have a definition of SCAN_WILD_CARD ?? */ > + scsi_scan_target(&target->scsi_host->shost_gendev, > + 0, target->scsi_id, ~0, 0); > + > + return 0; > +} > + > +static void srp_release_class_dev(struct class_device *class_dev) > +{ > + struct srp_host *host = > + container_of(class_dev, struct srp_host, class_dev); > + > + complete(&host->released); > +} > + > +static struct class srp_class = { > + .name = "infiniband_srp", > + .release = srp_release_class_dev > +}; > + > +/* > + * Target ports are added by writing > + * > + * id_ext=,ioc_guid=,dgid=, > + * pkey=,service_id= > + * > + * to the add_target sysfs attribute. 
> + */ > +enum { > + SRP_OPT_ERR = 0, > + SRP_OPT_ID_EXT = 1 << 0, > + SRP_OPT_IOC_GUID = 1 << 1, > + SRP_OPT_DGID = 1 << 2, > + SRP_OPT_PKEY = 1 << 3, > + SRP_OPT_SERVICE_ID = 1 << 4, > + SRP_OPT_MAX_SECT = 1 << 5, > + SRP_OPT_ALL = (SRP_OPT_ID_EXT | > + SRP_OPT_IOC_GUID | > + SRP_OPT_DGID | > + SRP_OPT_PKEY | > + SRP_OPT_SERVICE_ID), > +}; > + > +static match_table_t srp_opt_tokens = { > + { SRP_OPT_ID_EXT, "id_ext=%s" }, > + { SRP_OPT_IOC_GUID, "ioc_guid=%s" }, > + { SRP_OPT_DGID, "dgid=%s" }, > + { SRP_OPT_PKEY, "pkey=%x" }, > + { SRP_OPT_SERVICE_ID, "service_id=%s" }, > + { SRP_OPT_MAX_SECT, "max_sect=%d" }, > + { SRP_OPT_ERR, NULL } > +}; > + > +static int srp_parse_options(const char *buf, struct srp_target_port *target) > +{ > + char *options; > + char *sep_opt; > + char *p; > + char dgid[3]; > + substring_t args[MAX_OPT_ARGS]; > + int opt_mask = 0; > + int token; > + int ret = -EINVAL; > + int i; > + > + options = kstrdup(buf, GFP_KERNEL); > + if (!options) > + return -ENOMEM; > + > + sep_opt = options; > + while ((p = strsep(&sep_opt, ",")) != NULL) { > + if (!*p) > + continue; > + > + token = match_token(p, srp_opt_tokens, args); > + opt_mask |= token; > + > + switch (token) { > + case SRP_OPT_ID_EXT: > + p = match_strdup(args); > + target->id_ext = cpu_to_be64(simple_strtoull(p, NULL, 16)); > + kfree(p); > + break; > + > + case SRP_OPT_IOC_GUID: > + p = match_strdup(args); > + target->ioc_guid = cpu_to_be64(simple_strtoull(p, NULL, 16)); > + kfree(p); > + break; > + > + case SRP_OPT_DGID: > + p = match_strdup(args); > + if (strlen(p) != 32) { > + kfree(p); > + goto out; > + } > + > + for (i = 0; i < 16; ++i) { > + strlcpy(dgid, p + i * 2, 3); > + target->path.dgid.raw[i] = simple_strtoul(dgid, NULL, 16); > + } > + kfree(p); > + break; > + > + case SRP_OPT_PKEY: > + if (match_hex(args, &token)) > + goto out; > + target->path.pkey = cpu_to_be16(token); > + break; > + > + case SRP_OPT_SERVICE_ID: > + p = match_strdup(args); > + target->service_id = 
cpu_to_be64(simple_strtoull(p, NULL, 16)); > + kfree(p); > + break; > + > + case
SRP_OPT_MAX_SECT: > + if (match_int(args, &token)) > + goto out; > + target->scsi_host->max_sectors = token; > + break; > + > + default: > + goto out; > + } > + } > + > + if ((opt_mask & SRP_OPT_ALL) == SRP_OPT_ALL) > + ret = 0; > + > +out: > + kfree(options); > + return ret; > +} > + > +static ssize_t srp_create_target(struct class_device *class_dev, > + const char *buf, size_t count) > +{ > + struct srp_host *host = > + container_of(class_dev, struct srp_host, class_dev); > + struct Scsi_Host *target_host; > + struct srp_target_port *target; > + int ret; > + int i; > + > + target_host = scsi_host_alloc(&srp_template, > + sizeof (struct srp_target_port)); > + if (!target_host) > + return -ENOMEM; > + > + target = host_to_target(target_host); > + memset(target, 0, sizeof *target); > + > + target->scsi_host = target_host; > + target->srp_host = host; > + > + INIT_WORK(&target->work, srp_reconnect_work, target); > + > + for (i = 0; i < SRP_SQ_SIZE - 1; ++i) > + target->req_ring[i].next = i + 1; > + target->req_ring[SRP_SQ_SIZE - 1].next = -1; > + INIT_LIST_HEAD(&target->req_queue); > + > + ret = srp_parse_options(buf, target); > + if (ret) > + goto err; > + > + target->cm_id = ib_create_cm_id(srp_cm_handler, target); > + if (IS_ERR(target->cm_id)) { > + ret = PTR_ERR(target->cm_id); > + goto err; > + } > + > + ib_get_cached_gid(host->dev, host->port, 0, &target->path.sgid); > + > + printk(KERN_DEBUG PFX "new target: id_ext %016llx ioc_guid %016llx pkey %04x " > + "service_id %016llx dgid %04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", > + (unsigned long long) be64_to_cpu(target->id_ext), > + (unsigned long long) be64_to_cpu(target->ioc_guid), > + be16_to_cpu(target->path.pkey), > + (unsigned long long) be64_to_cpu(target->service_id), > + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[0]), > + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[2]), > + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[4]), > + (int) 
be16_to_cpu(*(__be16 *) &target->path.dgid.raw[6]), > +
(int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[8]), > + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[10]), > + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[12]), > + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[14])); > + > + ret = srp_lookup_path(target); > + if (ret) { > + ib_destroy_cm_id(target->cm_id); > + goto err; > + } > + > + ret = srp_connect_target(target); > + if (ret) { > + printk(KERN_ERR PFX "Connection failed\n"); > + goto err; > + } > + > + ret = srp_add_target(host, target); > + if (ret) > + goto err_disconnect; > + > + return count; > + > +err_disconnect: > + init_completion(&target->done); > + ib_send_cm_dreq(target->cm_id, NULL, 0); > + wait_for_completion(&target->done); > + > + ib_destroy_qp(target->qp); > + ib_destroy_cq(target->cq); > + > +err: > + for (i = 0; i < SRP_RQ_SIZE; ++i) > + srp_free_iu(target->srp_host, target->rx_ring[i]); > + for (i = 0; i < SRP_SQ_SIZE + 1; ++i) > + srp_free_iu(target->srp_host, target->tx_ring[i]); > + > + scsi_host_put(target_host); > + > + return ret; > +} > + > +static CLASS_DEVICE_ATTR(add_target, S_IWUSR, NULL, srp_create_target); > + > +static struct srp_host *srp_add_port(struct ib_device *device, > + __be64 node_guid, u8 port) > +{ > + struct srp_host *host; > + > + host = kzalloc(sizeof *host, GFP_KERNEL); > + if (!host) > + return NULL; > + > + INIT_LIST_HEAD(&host->target_list); > + init_MUTEX(&host->target_mutex); > + init_completion(&host->released); > + host->dev = device; > + host->port = port; > + > + host->initiator_port_id[7] = port; > + memcpy(host->initiator_port_id + 8, &node_guid, 8); > + > + host->pd = ib_alloc_pd(device); > + if (IS_ERR(host->pd)) > + goto err_free; > + > + host->mr = ib_get_dma_mr(host->pd, > + IB_ACCESS_LOCAL_WRITE | > + IB_ACCESS_REMOTE_READ | > + IB_ACCESS_REMOTE_WRITE); > + if (IS_ERR(host->mr)) > + goto err_pd; > + > + host->class_dev.class = &srp_class; > + host->class_dev.dev = device->dma_device; > + 
snprintf(host->class_dev.class_id, BUS_ID_SIZE, "srp-%s-%d", > + device->name, port); > + > + if (class_device_register(&host->class_dev)) > + goto err_mr; > + if (class_device_create_file(&host->class_dev, &class_device_attr_add_target)) > + goto err_class; > + /* XXX ibdev / port files as well */ > + > + return host; > + > +err_class: > + class_device_unregister(&host->class_dev); > + > +err_mr: > + ib_dereg_mr(host->mr); > + > +err_pd: > + ib_dealloc_pd(host->pd); > + > +err_free: > + kfree(host); > + > + return NULL; > +} > + > +static void srp_add_one(struct ib_device *device) > +{ > + struct list_head *dev_list; > + struct srp_host *host; > + struct ib_device_attr *dev_attr; > + int s, e, p; > + > + dev_attr = kmalloc(sizeof *dev_attr, GFP_KERNEL); > + if (!dev_attr) > + return; > + > + if (ib_query_device(device, dev_attr)) { > + printk(KERN_WARNING PFX "Couldn't query node GUID for %s.\n", > + device->name); > + goto out; > + } > + > + dev_list = kmalloc(sizeof *dev_list, GFP_KERNEL); > + if (!dev_list) > + goto out; > + > + INIT_LIST_HEAD(dev_list); > + > + if (device->node_type == IB_NODE_SWITCH) { > + s = 0; > + e = 0; > + } else { > + s = 1; > + e = device->phys_port_cnt; > + } > + > + for (p = s; p <= e; ++p) { > + host = srp_add_port(device, dev_attr->node_guid, p); > + if (host) > + list_add_tail(&host->list, dev_list); > + } > + > + ib_set_client_data(device, &srp_client, dev_list); > + > +out: > + kfree(dev_attr); > +} > + > +static void srp_remove_one(struct ib_device *device) > +{ > + struct list_head *dev_list; > + struct srp_host *host, *tmp_host; > + LIST_HEAD(target_list); > + struct srp_target_port *target, *tmp_target; > + unsigned long flags; > + > + dev_list = ib_get_client_data(device, &srp_client); > + > + list_for_each_entry_safe(host, tmp_host, dev_list, list) { > + class_device_unregister(&host->class_dev); > + /* > + * Wait for the sysfs entry to go away, so that no new > + * target ports can be created. 
> + */ > + wait_for_completion(&host->released); > + > + /* > + * Mark all target ports as removed, so we stop queueing > + * commands and don't try to reconnect. > + */ > + down(&host->target_mutex); > + list_for_each_entry_safe(target, tmp_target, > + &host->target_list, list) { > + spin_lock_irqsave(target->scsi_host->host_lock, flags); > + if (target->state != SRP_TARGET_REMOVED) > + target->state = SRP_TARGET_REMOVED; > + spin_unlock_irqrestore(target->scsi_host->host_lock, flags); > + } > + up(&host->target_mutex); > + > + /* > + * Wait for any reconnection tasks that may have > + * started before we marked our target ports as > + * removed, and any target port removal tasks. > + */ > + flush_scheduled_work(); > + > + list_for_each_entry_safe(target, tmp_target, > + &host->target_list, list) { > + scsi_remove_host(target->scsi_host); > + srp_disconnect_target(target); > + srp_free_target_ib(target); > + scsi_host_put(target->scsi_host); > + } > + > + ib_dereg_mr(host->mr); > + ib_dealloc_pd(host->pd); > + kfree(host); > + } > + > + kfree(dev_list); > +} > + > +static int __init srp_init_module(void) > +{ > + int ret; > + > + ret = class_register(&srp_class); > + if (ret) { > + printk(KERN_ERR PFX "couldn't register class infiniband_srp\n"); > + return ret; > + } > + > + ret = ib_register_client(&srp_client); > + if (ret) { > + printk(KERN_ERR PFX "couldn't register IB client\n"); > + class_unregister(&srp_class); > + return ret; > + } > + > + return 0; > +} > + > +static void __exit srp_cleanup_module(void) > +{ > + ib_unregister_client(&srp_client); > + class_unregister(&srp_class); > +} > + > +module_init(srp_init_module); > +module_exit(srp_cleanup_module); > diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h > new file mode 100644 > --- /dev/null > +++ b/drivers/infiniband/ulp/srp/ib_srp.h > @@ -0,0 +1,324 @@ > +/* > + * Copyright (c) 2005 Cisco Systems. All rights reserved. 
> + * > + * This software is available to you under a choice of one of two > + * licenses. You may choose to be licensed under the terms of the GNU > + * General Public License (GPL) Version 2, available from the file > + * COPYING in the main directory of this source tree, or the > + * OpenIB.org BSD license below: > + * > + * Redistribution and use in source and binary forms, with or > + * without modification, are permitted provided that the following > + * conditions are met: > + * > + * - Redistributions of source code must retain the above > + * copyright notice, this list of conditions and the following > + * disclaimer. > + * > + * - Redistributions in binary form must reproduce the above > + * copyright notice, this list of conditions and the following > + * disclaimer in the documentation and/or other materials > + * provided with the distribution. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > + * SOFTWARE. 
> + * > + * $Id: ib_srp.h 3394 2005-09-13 05:04:31Z roland $ > + */ > + > +#ifndef IB_SRP_H > +#define IB_SRP_H > + > +#include > +#include > + > +#include > + > +#include > +#include > + > +#include > +#include > +#include > + > +enum { > + SRP_PATH_REC_TIMEOUT_MS = 1000, > + SRP_ABORT_TIMEOUT_MS = 5000, > + > + SRP_PORT_REDIRECT = 1, > + SRP_DLID_REDIRECT = 2, > + > + SRP_MAX_IU_LEN = 256, > + > + SRP_RQ_SHIFT = 6, > + SRP_RQ_SIZE = 1 << SRP_RQ_SHIFT, > + SRP_SQ_SIZE = SRP_RQ_SIZE - 1, > + SRP_CQ_SIZE = SRP_SQ_SIZE + SRP_RQ_SIZE, > + > + SRP_TAG_TSK_MGMT = 1 << (SRP_RQ_SHIFT + 1) > +}; > + > +#define SRP_OP_RECV (1 << 31) > +#define SRP_MAX_INDIRECT ((SRP_MAX_IU_LEN - \ > + sizeof (struct srp_cmd) - \ > + sizeof (struct srp_indirect_buf)) / 16) > + > +enum srp_target_state { > + SRP_TARGET_LIVE, > + SRP_TARGET_CONNECTING, > + SRP_TARGET_DEAD, > + SRP_TARGET_REMOVED > +}; > + > +struct srp_host { > + u8 initiator_port_id[16]; > + struct ib_device *dev; > + u8 port; > + struct ib_pd *pd; > + struct ib_mr *mr; > + struct class_device class_dev; > + struct list_head target_list; > + struct semaphore target_mutex; > + struct completion released; > + struct list_head list; > +}; > + > +struct srp_request { > + struct list_head list; > + struct scsi_cmnd *scmnd; > + struct srp_iu *cmd; > + struct srp_iu *tsk_mgmt; > + struct completion done; > + short next; > + u8 cmd_done; > + u8 tsk_status; > +}; > + > +struct srp_target_port { > + __be64 id_ext; > + __be64 ioc_guid; > + __be64 service_id; > + struct srp_host *srp_host; > + struct Scsi_Host *scsi_host; > + char target_name[32]; > + unsigned int scsi_id; > + > + struct ib_sa_path_rec path; > + struct ib_sa_query *path_query; > + int path_query_id; > + > + struct ib_cm_id *cm_id; > + struct ib_cq *cq; > + struct ib_qp *qp; > + > + int max_ti_iu_len; > + s32 req_lim; > + > + unsigned rx_head; > + unsigned rx_tail; > + struct srp_iu *rx_ring[SRP_RQ_SIZE]; > + > + unsigned tx_head; > + unsigned tx_tail; > + struct srp_iu 
*tx_ring[SRP_SQ_SIZE + 1]; > + > + int req_head; > + struct list_head req_queue; > + struct srp_request req_ring[SRP_SQ_SIZE]; > + > + struct work_struct work; > + > + struct list_head list; > + struct completion done; > + int status; > + enum srp_target_state state; > +}; > + > +struct srp_iu { > + dma_addr_t dma; > + void *buf; > + size_t size; > + enum dma_data_direction direction; > +}; > + > +/* > + * SRP protocol definitions > + */ > + > +enum { > + SRP_LOGIN_REQ = 0x00, > + SRP_TSK_MGMT = 0x01, > + SRP_CMD = 0x02, > + SRP_I_LOGOUT = 0x03, > + SRP_LOGIN_RSP = 0xc0, > + SRP_RSP = 0xc1, > + SRP_LOGIN_REJ = 0xc2, > + SRP_T_LOGOUT = 0x80, > + SRP_CRED_REQ = 0x81, > + SRP_AER_REQ = 0x82, > + SRP_CRED_RSP = 0x41, > + SRP_AER_RSP = 0x42 > +}; > + > +enum { > + SRP_BUF_FORMAT_DIRECT = 1 << 1, > + SRP_BUF_FORMAT_INDIRECT = 1 << 2 > +}; > + > +enum { > + SRP_NO_DATA_DESC = 0, > + SRP_DATA_DESC_DIRECT = 1, > + SRP_DATA_DESC_INDIRECT = 2 > +}; > + > +enum { > + SRP_TSK_ABORT_TASK = 0x01, > + SRP_TSK_ABORT_TASK_SET = 0x02, > + SRP_TSK_CLEAR_TASK_SET = 0x04, > + SRP_TSK_LUN_RESET = 0x08, > + SRP_TSK_CLEAR_ACA = 0x40 > +}; > + > +struct srp_direct_buf { > + __be64 va; > + __be32 key; > + __be32 len; > +}; > + > +/* > + * We need the packed attribute because the SRP spec puts the list of > + * descriptors at an offset of 20, which is not aligned to the size > + * of struct srp_direct_buf. 
> + */ > +struct srp_indirect_buf { > + struct srp_direct_buf table_desc; > + __be32 len; > + struct srp_direct_buf desc_list[0] __attribute__((packed)); > +}; > + > +enum { > + SRP_MULTICHAN_SINGLE = 0, > + SRP_MULTICHAN_MULTI = 1 > +}; > + > +struct srp_login_req { > + u8 opcode; > + u8 reserved1[7]; > + u64 tag; > + __be32 req_it_iu_len; > + u8 reserved2[4]; > + __be16 req_buf_fmt; > + u8 req_flags; > + u8 reserved3[5]; > + u8 initiator_port_id[16]; > + u8 target_port_id[16]; > +}; > + > +struct srp_login_rsp { > + u8 opcode; > + u8 reserved1[3]; > + __be32 req_lim_delta; > + u64 tag; > + __be32 max_it_iu_len; > + __be32 max_ti_iu_len; > + __be16 buf_fmt; > + u8 rsp_flags; > + u8 reserved2[25]; > +}; > + > +struct srp_login_rej { > + u8 opcode; > + u8 reserved1[3]; > + __be32 reason; > + u64 tag; > + u8 reserved2[8]; > + __be16 buf_fmt; > + u8 reserved3[6]; > +}; > + > +struct srp_i_logout { > + u8 opcode; > + u8 reserved[7]; > + u64 tag; > +}; > + > +struct srp_t_logout { > + u8 opcode; > + u8 sol_not; > + u8 reserved[2]; > + __be32 reason; > + u64 tag; > +}; > + > +/* > + * We need the packed attribute because the SRP spec only aligns the > + * 8-byte LUN field to 4 bytes. > + */ > +struct srp_tsk_mgmt { > + u8 opcode; > + u8 sol_not; > + u8 reserved1[6]; > + u64 tag; > + u8 reserved2[4]; > + __be64 lun __attribute__((packed)); > + u8 reserved3[2]; > + u8 tsk_mgmt_func; > + u8 reserved4; > + u64 task_tag; > + u8 reserved5[8]; > +}; > + > +/* > + * We need the packed attribute because the SRP spec only aligns the > + * 8-byte LUN field to 4 bytes. 
> + */ > +struct srp_cmd { > + u8 opcode; > + u8 sol_not; > + u8 reserved1[3]; > + u8 buf_fmt; > + u8 data_out_desc_cnt; > + u8 data_in_desc_cnt; > + u64 tag; > + u8 reserved2[4]; > + __be64 lun __attribute__((packed)); > + u8 reserved3; > + u8 task_attr; > + u8 reserved4; > + u8 add_cdb_len; > + u8 cdb[16]; > + u8 add_data[0]; > +}; > + > +enum { > + SRP_RSP_FLAG_RSPVALID = 1 << 0, > + SRP_RSP_FLAG_SNSVALID = 1 << 1, > + SRP_RSP_FLAG_DOOVER = 1 << 2, > + SRP_RSP_FLAG_DOUNDER = 1 << 3, > + SRP_RSP_FLAG_DIOVER = 1 << 4, > + SRP_RSP_FLAG_DIUNDER = 1 << 5 > +}; > + > +struct srp_rsp { > + u8 opcode; > + u8 sol_not; > + u8 reserved1[2]; > + __be32 req_lim_delta; > + u64 tag; > + u8 reserved2[2]; > + u8 flags; > + u8 status; > + __be32 data_out_res_cnt; > + __be32 data_in_res_cnt; > + __be32 sense_data_len; > + __be32 resp_data_len; > + u8 data[0]; > +}; > + > +#endif /* IB_SRP_H */ > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -- -------------------------------------------------------------------------- Troy Benjegerdes 'da hozer' hozer at hozed.org Someone asked me why I work on this free (http://www.fsf.org/philosophy/) software stuff and not get a real job. Charles Schulz had the best answer: "Why do musicians compose symphonies and poets write poems? They do it because life wouldn't have any meaning for them if they didn't. That's why I draw cartoons. It's my life."
-- Charles Schulz From hozer at hozed.org Tue Sep 13 14:46:26 2005 From: hozer at hozed.org (Troy Benjegerdes) Date: Tue, 13 Sep 2005 16:46:26 -0500 Subject: [openib-general] IBM eHCA Device Driver for gen2 IB stack In-Reply-To: References: Message-ID: <20050913214626.GF1685@kalmia.hozed.org> On Fri, Jul 22, 2005 at 01:41:31PM +0200, IBMEHCA DD wrote: > Hi, > we've completed the first alpha code drop of the Power5 IBM eHCA Device > Driver for the gen2 openib.org stack. > We're running IPoIB and ibv userspace programs successfully with this code > in our lab setup. > > The source files can be downloaded from > https://sourceforge.net/projects/ibmehcad/ > ehca2_0011e Has this been updated at all? Are there any new drops? From rolandd at cisco.com Tue Sep 13 14:52:06 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 14:52:06 -0700 Subject: [openib-general] [PATCH v1/RFC] IB: Add SCSI RDMA Protocol (SRP) initiator In-Reply-To: <20050913214226.GE1685@kalmia.hozed.org> (Troy Benjegerdes's message of "Tue, 13 Sep 2005 16:42:27 -0500") References: <52ll207s2o.fsf@cisco.com> <20050913214226.GE1685@kalmia.hozed.org> Message-ID: <52mzmg63g9.fsf@cisco.com> >>>>> "Troy" == Troy Benjegerdes writes: Troy> Is there anyplace I can find an SRP target for Linux? What Troy> is available? (Ideally, I'd like one for 2.6.1[3,4] ) I don't believe there are any Linux targets available. Just FC gateways and IB storage devices. (And I don't think it takes quoting a 2000-line email and top-posting to ask the question either ;) - R.
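The `__attribute__((packed))` comments in the ib_srp.h header quoted earlier in this thread (descriptor list at offset 20, 8-byte LUN only 4-byte aligned) can be checked directly with `offsetof()`. What follows is a minimal userspace sketch, not the patch itself. It assumes `uint32_t`/`uint64_t` as stand-ins for the kernel's `__be32`/`__be64` (only sizes and offsets matter here, not byte order), a C11 compiler with the GCC/Clang `packed` attribute, and C99 flexible array members in place of the patch's `[0]` arrays. The typedef names `be32_t`/`be64_t` are hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; byte order does not affect layout. */
typedef uint8_t  u8;
typedef uint64_t u64;
typedef uint32_t be32_t;
typedef uint64_t be64_t;

struct srp_direct_buf {
	be64_t va;
	be32_t key;
	be32_t len;
};					/* 16 bytes, 8-byte aligned on x86-64 */

struct srp_indirect_buf {
	struct srp_direct_buf table_desc;	/* bytes 0..15 */
	be32_t len;				/* bytes 16..19 */
	/* Without packed, x86-64 would pad this out to offset 24. */
	struct srp_direct_buf desc_list[] __attribute__((packed));
};

struct srp_cmd {
	u8     opcode;
	u8     sol_not;
	u8     reserved1[3];
	u8     buf_fmt;
	u8     data_out_desc_cnt;
	u8     data_in_desc_cnt;
	u64    tag;				/* bytes 8..15 */
	u8     reserved2[4];			/* bytes 16..19 */
	/* The wire format gives the 8-byte LUN only 4-byte alignment. */
	be64_t lun __attribute__((packed));	/* bytes 20..27 */
	u8     reserved3;
	u8     task_attr;
	u8     reserved4;
	u8     add_cdb_len;
	u8     cdb[16];				/* bytes 32..47 */
	u8     add_data[];			/* additional CDB and descriptors */
};

/* The build fails if the in-memory layout drifts from the wire layout. */
static_assert(offsetof(struct srp_indirect_buf, desc_list) == 20,
	      "descriptor list must start at byte 20");
static_assert(offsetof(struct srp_cmd, lun) == 20,
	      "LUN must start at byte 20");
static_assert(offsetof(struct srp_cmd, cdb) == 32,
	      "CDB must start at byte 32");
static_assert(offsetof(struct srp_cmd, add_data) == 48,
	      "additional data must start at byte 48");
```

Dropping `packed` from `lun` would move it to byte 24 on x86-64, and the `srp_cmd` assertions would stop the build. That silent-misalignment failure is what the header comments guard against.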
From hozer at hozed.org Tue Sep 13 15:05:34 2005 From: hozer at hozed.org (Troy Benjegerdes) Date: Tue, 13 Sep 2005 17:05:34 -0500 Subject: [openib-general] [PATCH v1/RFC] IB: Add SCSI RDMA Protocol (SRP) initiator In-Reply-To: <52mzmg63g9.fsf@cisco.com> References: <52ll207s2o.fsf@cisco.com> <20050913214226.GE1685@kalmia.hozed.org> <52mzmg63g9.fsf@cisco.com> Message-ID: <20050913220534.GG1685@kalmia.hozed.org> On Tue, Sep 13, 2005 at 02:52:06PM -0700, Roland Dreier wrote: > >>>>> "Troy" == Troy Benjegerdes writes: > > Troy> Is there anyplace I can find an SRP target for Linux? What > Troy> is available? (Ideally, I'd like one for 2.6.1[3,4] ) > > I don't believe there are any Linux targets available. Just FC > gateways and IB storage devices. > > (And I don't think it takes quoting a 2000-line email and top-posting > to ask the question either ;) I coulda swore I deleted the rest. At least I trimmed lkml from the list, or I'd really get flamed ;) Does anyone even have a binary SRP target module for $SOME_LINUX_DISTRO? From halr at voltaire.com Tue Sep 13 15:23:46 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 18:23:46 -0400 Subject: [openib-general] Strange configure error in libibcm In-Reply-To: <52r7bs64in.fsf@cisco.com> References: <4df28be405091312295f6500d5@mail.gmail.com> <52r7bs64in.fsf@cisco.com> Message-ID: <1126649995.4425.47.camel@hal.voltaire.com> On Tue, 2005-09-13 at 17:29, Roland Dreier wrote: > This is an odd error that seems to be some sort of autotools or > binutils bug. I see the same thing on my system, and what seems to be > happening is: > > checking for ib_at_route_by_ip in -libat... yes > > configure tries to link a program that calls ib_at_route_by_ip, and > succeeds because ld searches /usr/local/lib. > > checking size of long...
configure: error: cannot compute sizeof (long), 77 > > it then tries to run a program to see how big a long is, but it can't > run because the dynamic loader can't find libibat, since > /usr/local/lib isn't in its search path. > > So I'm not sure how to get a better error message, but I don't think > it's a libibcm problem. Just to be clear, it isn't a libibat problem either, right? -- Hal From rolandd at cisco.com Tue Sep 13 15:31:46 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 15:31:46 -0700 Subject: [openib-general] Strange configure error in libibcm In-Reply-To: <1126649995.4425.47.camel@hal.voltaire.com> (Hal Rosenstock's message of "13 Sep 2005 18:23:46 -0400") References: <4df28be405091312295f6500d5@mail.gmail.com> <52r7bs64in.fsf@cisco.com> <1126649995.4425.47.camel@hal.voltaire.com> Message-ID: <52acig61m5.fsf@cisco.com> Hal> Just to be clear, it isn't a libibat problem either, right? Right. - R. From xma at us.ibm.com Tue Sep 13 16:32:13 2005 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 13 Sep 2005 16:32:13 -0700 Subject: [openib-general] Mellanox device in INIT state In-Reply-To: <52y860652h.fsf@cisco.com> Message-ID: Thanks all. Found the problem. I had a 12x octopus cable connected to a Topspin 120 switch. The port speed for the Mellanox 12x port was sometimes set to 10G, sometimes to 30G. After I disconnected these cables, the ports came up correctly. It seems per-port autonegotiation doesn't work well on the Topspin 120 switch with mixed injection speeds. But during the test (removing and reinserting the ib_mthca module), I hit another problem. The stack was SVN 3380. Sep 13 15:21:29 elm3b37 kernel: unregister_netdevice: waiting for ib0 to become free. Usage count = 1 Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 -------------- next part -------------- An HTML attachment was scrubbed...
URL: From mshefty at ichips.intel.com Tue Sep 13 17:00:57 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 13 Sep 2005 17:00:57 -0700 Subject: [openib-general] userspace CM API for per device handling In-Reply-To: <43272272.6030606@ichips.intel.com> References: <52hdcxpv6b.fsf@cisco.com> <43272272.6030606@ichips.intel.com> Message-ID: <43276839.40605@ichips.intel.com> Sean Hefty wrote: > For the userspace portion, I'm still trying to decide what the correct > API should be. I'd like to avoid apps from having to call something > like ib_cm_get_devices(), which would mirror the verbs call. I was > thinking of having ib_cm_create_id() still take a struct ibv_context* as > input, opening the corresponding CM node, and managing that internally. > Thoughts? To further define this: The kernel ucm module creates one CM device per physical device, somewhat mirroring the work done by uverbs. (E.g. infiniband_cm/ucm0 references the same device as infiniband_verbs/uverbs0). All CM devices are opened internally by the userspace CM and can be mapped to a corresponding ibv_device using a GUID. This works okay, except for the current call: ib_cm_get_event(**event); which can now map to multiple fd's.
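With one CM device file per HCA, a single blocking call that waits for "the next CM event" has to watch several file descriptors at once. Here is a minimal sketch of that wait using poll(2), assuming the caller already holds the open per-device CM descriptors in an array. `cm_wait_any()` is a hypothetical helper for illustration, not a libibcm API, and it only reports which descriptor is ready. Reading and parsing the event is left to the caller.

```c
#include <errno.h>
#include <poll.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of libibcm): block until any of the
 * per-HCA CM device descriptors in fds[] is ready.
 *
 * Returns the index of the first ready descriptor, -1 on timeout,
 * or -2 on error (errno set by poll()/calloc(), or EINVAL for nfds <= 0).
 * A timeout_ms of -1 waits forever, as with poll() itself.
 */
int cm_wait_any(const int *fds, int nfds, int timeout_ms)
{
	struct pollfd *pfd;
	int i, ret, saved_errno, ready = -1;

	if (nfds <= 0) {
		errno = EINVAL;
		return -2;
	}

	pfd = calloc((size_t)nfds, sizeof(*pfd));
	if (!pfd)
		return -2;

	for (i = 0; i < nfds; i++) {
		pfd[i].fd = fds[i];
		pfd[i].events = POLLIN;
	}

	/* Retry if a signal interrupts the wait (restarts the full timeout). */
	do {
		ret = poll(pfd, (nfds_t)nfds, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0) {
		ready = -2;
	} else {
		/* Any revents (POLLIN, POLLERR, POLLHUP): caller should read that fd. */
		for (i = 0; i < nfds; i++) {
			if (pfd[i].revents) {
				ready = i;
				break;
			}
		}
	}

	saved_errno = errno;	/* keep poll()'s errno across free() */
	free(pfd);
	errno = saved_errno;
	return ready;
}
```

Returning the lowest ready index is the simplest policy. A real dispatcher might rotate the starting index so one busy HCA cannot starve the others.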
Some possible solutions are: 1. Add calls similar to ib_cm_get_devices() and ib_cm_open_device(), making the CM devices explicit to the user. ib_cm_get_event() would take a CM device as input. This requires that users manage not only a list of HCAs, but also a mirror list of CM devices. 2. Change ib_cm_get_event(struct ibv_context *device_context, **event). The mapping from the device to the corresponding CM fd is performed internally, but requires a search based on the GUID. 3. Same as #2, but store the CM fd in the ibv_context to avoid the search. This breaks the encapsulation between the CM and verbs. 4. Have ib_cm_get_event() operate across all CM devices. - Sean From hozer at hozed.org Tue Sep 13 17:12:36 2005 From: hozer at hozed.org (Troy Benjegerdes) Date: Tue, 13 Sep 2005 19:12:36 -0500 Subject: [openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features In-Reply-To: <1126628283.4514.496.camel@hal.voltaire.com> References: <1126609953.4382.42857.camel@hal.voltaire.com> <20050913161534.GD1685@kalmia.hozed.org> <1126628283.4514.496.camel@hal.voltaire.com> Message-ID: <20050914001235.GJ1685@kalmia.hozed.org> On Tue, Sep 13, 2005 at 12:19:30PM -0400, Hal Rosenstock wrote: > On Tue, 2005-09-13 at 12:15, Troy Benjegerdes wrote: > > We just had a node crash on our network, and it caused our OpenSM to > > stop working.. we were running version openib-1.0.0.. > > Can you define stop working (more details) ? Are there any logs ? > > > I suppose this means I should start beating up on 1.1.0 now, right? > > Yes but the same issue might still exist. Can you reproduce it on the > OpenSM you are running on now and then move up and see if it still > exists ? Stop working as in IPoIB arp seems to stop. I've got a log now of the latest opensm-1.1.0 attached. The time (was) off on that machine, FYI. At the log entry 'Sep 13 12:06:55', I plugged in the node that is hung/crashed .. which caused a bunch of opensm errors.. 
I have since unplugged that node, and can put it back in tomorrow if you want more debug info. -------------- next part -------------- Sep 13 12:03:12 584304 [AB448A30] -> OpenSM Rev:openib-1.1.0 Sep 13 12:03:12 584427 [0000] -> OpenSM Rev:openib-1.1.0 Sep 13 12:03:12 585421 [AB448A30] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000 Sep 13 12:03:12 585486 [AB448A30] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0000 GID:0xfe80000000000000,0x0000000000000000 Sep 13 12:03:12 588442 [AB448A30] -> osm_vendor_get_all_port_attr: ERR 5420: assign CA mthca0 port 1 guid (0x2c90200402781) as the default port. Sep 13 12:03:12 588531 [AB448A30] -> osm_vendor_bind: Binding to port 0x2c90200402781. Sep 13 12:03:12 590652 [AB448A30] -> osm_vendor_bind: Binding to port 0x2c90200402781. Sep 13 12:03:12 605601 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:03:12 612854 [0000] -> SUBNET UP Sep 13 12:03:12 632300 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0001 TID:0x0000000000000004 Sep 13 12:03:12 632467 [417FF970] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:12 644055 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:03:12 872848 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x04 num:144 Producer:1 from LID:0x0003 TID:0x0000000000000005 Sep 13 12:03:12 872966 [417FF970] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x0003 GID:0xfe80000000000000,0x0002c9020040272d Sep 13 12:03:12 882910 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:03:13 652369 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0002 TID:0x0000000000000019 Sep 13 12:03:13 652509 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402917 Sep 13 12:03:13 662539 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:03:51 924918 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x0002 TID:0x000000000000001a Sep 13 12:03:51 925056 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402917 Sep 13 12:03:51 959970 [417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:51 959986 [417FF970] -> Discovered new port with GUID:0x0002c90200402915 LID range [0xE,0xE] of node:MT47396 Infiniscale-III Mellanox Technologies Sep 13 12:03:51 959996 [417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:51 960006 [417FF970] -> Discovered new port with GUID:0x0002c90108cd0b71 LID range [0x10,0x10] of node:MT23108 InfiniHost Mellanox Technologies Sep 13 12:03:51 960015 [417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:51 960025 [417FF970] -> Discovered new port with GUID:0x00066a00a000044e LID range [0x11,0x11] of node:MT23108 InfiniHost Mellanox Technologies Sep 13 12:03:51 960034 [417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:51 960043 [417FF970] -> Discovered new port with GUID:0x00066a00a0000444 LID range [0xF,0xF] of node:MT23108 InfiniHost Mellanox Technologies Sep 13 12:03:51 960052 
[417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:51 960062 [417FF970] -> Discovered new port with GUID:0x0002c90108cd85f1 LID range [0xC,0xC] of node:MT23108 InfiniHost Mellanox Technologies Sep 13 12:03:51 960071 [417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:51 960080 [417FF970] -> Discovered new port with GUID:0x0002c90108cd84a1 LID range [0xB,0xB] of node:MT23108 InfiniHost Mellanox Technologies Sep 13 12:03:51 960089 [417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:51 960099 [417FF970] -> Discovered new port with GUID:0x00066a00a000043c LID range [0x12,0x12] of node:MT23108 InfiniHost Mellanox Technologies Sep 13 12:03:51 960108 [417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:51 960118 [417FF970] -> Discovered new port with GUID:0x0002c90108cd9bd1 LID range [0x9,0x9] of node:MT23108 InfiniHost Mellanox Technologies Sep 13 12:03:51 960127 [417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:51 960136 [417FF970] -> Discovered new port with GUID:0x0002c90200007c91 LID range [0xA,0xA] of node:MT23108 InfiniHost Mellanox Technologies Sep 13 12:03:51 960145 [417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:51 960155 [417FF970] -> Discovered new port with GUID:0x00066a00a0000441 LID range [0x8,0x8] of node:MT23108 InfiniHost Mellanox Technologies Sep 13 12:03:51 960164 [417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:51 
960173 [417FF970] -> Discovered new port with GUID:0x0002c90108ccc571 LID range [0xD,0xD] of node:MT23108 InfiniHost Mellanox Technologies Sep 13 12:03:51 960298 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:03:51 979975 [417FF970] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25: Received an Invalid Delete Request on MGID: 0xff12401bffff0000 : 0x0000000000000001 for PortGID: 0xfe80000000000000 : 0x0002c90108ccc571 Sep 13 12:03:51 980910 [40FFF970] -> osm_mcmr_rcv_leave_mgrp: ERR 1B25: Received an Invalid Delete Request on MGID: 0xff12401bffff0000 : 0x00000000ffffffff for PortGID: 0xfe80000000000000 : 0x0002c90108ccc571 Sep 13 12:03:52 007394 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:52 009667 [417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:52 862217 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000012 Sep 13 12:03:52 862341 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:03:52 889912 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:06:55 936933 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000013 Sep 13 12:06:55 937087 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:06:56 354422 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. 
Sep 13 12:06:56 354439 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:06:56 354449 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Sep 13 12:06:56 354511 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1611 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:06:56 363771 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:06:56 363815 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:06:56 364602 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:06:56 370884 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000014 Sep 13 12:06:56 371013 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:06:56 794437 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. 
Sep 13 12:06:56 794445 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:06:56 794455 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Sep 13 12:06:56 794489 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x16c3 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:06:56 803608 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:06:56 803623 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:06:56 804375 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:06:56 810409 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000015 Sep 13 12:06:56 810557 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:06:57 230427 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. 
Sep 13 12:06:57 230437 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:06:57 230455 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Sep 13 12:06:57 230490 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1774 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:06:57 239625 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:06:57 239639 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:06:57 240336 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:06:57 246604 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000016 Sep 13 12:06:57 246733 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:06:57 666427 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. 
Sep 13 12:06:57 666435 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:06:57 666445 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Sep 13 12:06:57 666480 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1825 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:06:57 675698 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:06:57 675709 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:06:57 676374 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:06:57 682388 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000017 Sep 13 12:06:57 682529 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:06:58 102430 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. 
Sep 13 12:06:58 102440 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:06:58 102450 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Sep 13 12:06:58 102484 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x18d6 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:06:58 111732 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:06:58 111748 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:06:58 112437 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:06:58 118714 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000018 Sep 13 12:06:58 118852 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:06:58 538435 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. 
Sep 13 12:06:58 538445 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:06:58 538454 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Sep 13 12:06:58 538488 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1987 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:06:58 547707 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:06:58 547718 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:06:58 548381 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:06:58 554395 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000019 Sep 13 12:06:58 554540 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:06:58 974436 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. 
Sep 13 12:06:58 974446 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:06:58 974456 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Sep 13 12:06:58 974490 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1a38 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:06:58 983728 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:06:58 983738 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:06:58 984394 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:06:58 990412 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000001a Sep 13 12:06:58 990544 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:06:59 410441 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. 
Sep 13 12:06:59 410451 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:06:59 410461 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Sep 13 12:06:59 410496 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1ae9 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:06:59 419699 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:06:59 419713 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:06:59 420420 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:06:59 426687 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000001b Sep 13 12:06:59 426817 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:06:59 846442 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. 
Sep 13 12:06:59 846451 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:06:59 846460 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Sep 13 12:06:59 846494 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1b9a attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:06:59 855582 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:06:59 855596 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:06:59 856268 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:06:59 862523 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000001c Sep 13 12:06:59 862670 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:00 282446 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. 
Sep 13 12:07:00 282458 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:00 282467 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Sep 13 12:07:00 282501 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1c4b attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:00 291676 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:00 291687 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:00 292349 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:00 298385 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000001d Sep 13 12:07:00 298519 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:00 718448 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. 
Sep 13 12:07:00 718458 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:00 718467 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Sep 13 12:07:00 718502 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1cfc attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:00 727677 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:00 727688 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:00 728344 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:00 734376 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000001e Sep 13 12:07:00 734431 [40FFF970] -> __osm_trap_rcv_process_request: ERR 3804: Received the trap 11 times continuously. Sep 13 12:07:03 098464 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=15) -- dropping. 
Sep 13 12:07:03 098474 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:03 098486 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Sep 13 12:07:03 098520 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1d32 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:03 518467 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:03 518476 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:03 518486 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:03 518520 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1db0 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:03 527688 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:03 527699 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:03 528354 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:03 534370 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000001f Sep 13 12:07:03 534425 [40FFF970] -> __osm_trap_rcv_process_request: ERR 3804: Received the trap 12 times continuously. Sep 13 12:07:13 102531 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=15) -- dropping. Sep 13 12:07:13 102542 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:13 102551 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:13 102585 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1de6 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:13 522533 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:13 522542 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:13 522552 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:13 522586 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1e64 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:13 531754 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:13 531764 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:13 532428 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:13 538455 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000020 Sep 13 12:07:13 538583 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:13 958535 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:13 958544 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:13 958554 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:13 958588 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1f15 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:13 967703 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:13 967714 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:13 968369 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:13 974394 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000021 Sep 13 12:07:13 974522 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:14 394541 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:14 394554 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:14 394563 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:14 394598 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x1fc6 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:14 403807 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:14 403822 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:14 404580 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:14 410846 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000022 Sep 13 12:07:14 410983 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:14 830541 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:14 830549 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:14 830559 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:14 830592 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x2077 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:14 839738 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:14 839749 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:14 840411 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:14 846428 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000023 Sep 13 12:07:14 846560 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:15 266547 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:15 266558 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:15 266567 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:15 266601 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x2128 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:15 275750 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:15 275765 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:15 276460 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:15 282727 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000024 Sep 13 12:07:15 282864 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:15 702546 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:15 702554 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:15 702564 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:15 702597 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x21d9 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:15 711774 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:15 711785 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:15 712447 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:15 718461 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000025 Sep 13 12:07:15 718593 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:16 138551 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:16 138561 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:16 138570 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:16 138604 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x228a attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:16 147713 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:16 147726 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:16 148411 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:16 154684 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000026 Sep 13 12:07:16 154813 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:16 574554 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:16 574563 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:16 574573 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:16 574607 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x233b attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:16 583808 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:16 583819 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:16 584481 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:16 590518 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000027 Sep 13 12:07:16 590651 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:17 010556 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:17 010566 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:17 010576 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:17 010610 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x23ec attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:17 019789 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:17 019802 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:17 020499 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:17 026766 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000028 Sep 13 12:07:17 026896 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:17 446560 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:17 446569 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:17 446579 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:17 446613 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x249d attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:17 455776 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:17 455790 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:17 456467 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:17 462731 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000029 Sep 13 12:07:17 462868 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:17 882562 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:17 882570 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:17 882579 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:17 882613 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x254e attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:17 891770 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:17 891781 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:17 892444 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:17 898467 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000002a Sep 13 12:07:17 898601 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:18 318566 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:18 318577 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:18 318586 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:18 318621 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x25ff attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:18 327752 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:18 327780 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:18 328474 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:18 334741 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000002b Sep 13 12:07:18 334802 [417FF970] -> __osm_trap_rcv_process_request: ERR 3804: Received the trap 11 times continuously. Sep 13 12:07:23 106596 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=15) -- dropping. Sep 13 12:07:23 106605 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:23 106614 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:23 106648 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x2635 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:23 526602 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:23 526611 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:23 526623 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:23 526657 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x26b3
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:23 535916 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:23 535926 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:23 536610 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:23 542642 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000002c
Sep 13 12:07:23 542771 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:23 962601 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:23 962611 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:23 962621 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:23 962654 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x2764
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:23 971816 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:23 971826 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:23 972495 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:23 978531 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000002d
Sep 13 12:07:23 978659 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:24 398606 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:24 398616 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:24 398626 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:24 398660 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x2815
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:24 407894 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:24 407904 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:24 408577 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:24 414649 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000002e
Sep 13 12:07:24 414779 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:24 834608 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:24 834617 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:24 834627 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:24 834661 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x28c6
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:24 843917 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:24 843931 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:24 844671 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:24 850948 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000002f
Sep 13 12:07:24 851085 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:25 270612 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:25 270621 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:25 270631 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:25 270665 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x2977
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:25 279825 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:25 279839 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:25 280528 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:25 286786 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000030
Sep 13 12:07:25 286922 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:25 706614 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:25 706622 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:25 706631 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:25 706665 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x2a28
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:25 715859 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:25 715870 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:25 716549 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:25 722607 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000031
Sep 13 12:07:25 722740 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:26 142616 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:26 142626 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:26 142636 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:26 142670 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x2ad9
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:26 151864 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:26 151877 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:26 152570 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:26 158845 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000032
Sep 13 12:07:26 158974 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:26 578621 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:26 578630 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:26 578640 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:26 578674 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x2b8a
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:26 587819 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:26 587833 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:26 588526 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:26 594820 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000033
Sep 13 12:07:26 594956 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:27 014621 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:27 014629 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:27 014639 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:27 014672 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x2c3b
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:27 023896 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:27 023906 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:27 024583 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:27 030621 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000034
Sep 13 12:07:27 030752 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:27 446626 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:27 446636 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:27 446646 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:27 446680 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x2cec
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:27 455910 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:27 455924 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:27 456623 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:27 462900 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000035
Sep 13 12:07:27 463028 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:27 882629 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:27 882637 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:27 882646 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:27 882680 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x2d9d
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:27 891749 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:27 891762 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:27 892453 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:27 898712 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000036
Sep 13 12:07:27 898849 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:28 318633 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:28 318642 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:28 318652 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:28 318686 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x2e4e
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:28 327869 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:28 327883 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:28 328570 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:28 334831 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000037
Sep 13 12:07:28 334884 [417FF970] -> __osm_trap_rcv_process_request: ERR 3804: Received the trap 11 times continuously.
Sep 13 12:07:33 110661 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=15) -- dropping.
Sep 13 12:07:33 110671 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:33 110680 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:33 110714 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x2e84
        attr_id.................0x15 (PortInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:33 530667 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:33 530676 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:33 530686 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:33 530720 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x2f02
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:33 539999 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:33 540014 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:33 540697 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:33 546985 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000038
Sep 13 12:07:33 547118 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:33 966667 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:33 966675 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:33 966685 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:33 966719 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x2fb3
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:33 975842 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:33 975853 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:33 976525 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:33 982537 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000039
Sep 13 12:07:33 982673 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:34 398673 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:34 398683 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:34 398693 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:34 398728 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3064
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:34 407934 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:34 407949 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:34 408646 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:34 414922 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000003a
Sep 13 12:07:34 415057 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:34 834673 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:34 834681 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:34 834691 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:34 834725 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3115
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:34 844013 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:34 844024 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:34 844694 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:34 850732 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000003b
Sep 13 12:07:34 850863 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:35 266677 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:35 266687 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:35 266696 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:35 266731 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x31c6
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:35 275935 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:35 275947 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:35 276635 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:35 282910 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000003c
Sep 13 12:07:35 283039 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:35 702679 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:35 702688 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:35 702698 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:35 702732 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3277
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:35 711994 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:35 712009 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:35 712690 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:35 718977 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000003d
Sep 13 12:07:35 719106 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:36 138684 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:36 138692 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:36 138701 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:36 138735 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3328
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:36 147903 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:36 147917 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:36 148591 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:36 154866 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000003e
Sep 13 12:07:36 154995 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:36 574686 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:36 574696 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:36 574705 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:36 574739 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x33d9
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:36 583962 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:36 583973 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:36 584638 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:36 590676 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000003f
Sep 13 12:07:36 590811 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:37 006688 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:37 006698 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:37 006708 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:37 006741 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x348a
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:37 015886 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:37 015896 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:37 016556 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:37 022568 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000040
Sep 13 12:07:37 022698 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:37 438692 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:37 438702 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:37 438712 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:37 438746 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x353b
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:37 447934 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:37 447944 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:37 448600 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:37 454615 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000041
Sep 13 12:07:37 454746 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:37 870693 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:37 870703 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:37 870713 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:37 870747 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x35ec
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:37 879876 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:37 879887 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:37 880542 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:37 886564 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000042
Sep 13 12:07:37 886699 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:38 302698 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:38 302708 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:38 302718 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:38 302752 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x369d
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:38 313032 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:38 313042 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:38 313696 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:38 319714 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000043
Sep 13 12:07:38 319767 [40FFF970] -> __osm_trap_rcv_process_request: ERR 3804: Received the trap 11 times continuously.
Sep 13 12:07:43 114729 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=15) -- dropping.
Sep 13 12:07:43 114739 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:43 114749 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:43 114783 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x36d3
        attr_id.................0x15 (PortInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:43 530734 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:43 530743 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:43 530752 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:43 530787 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3751
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:43 539933 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:43 539950 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:43 540688 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:43 546948 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000044
Sep 13 12:07:43 547081 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:43 962734 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:43 962743 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:43 962752 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:43 962786 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3802
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:43 971861 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:43 971871 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:43 972537 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:43 978558 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000045
Sep 13 12:07:43 978690 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:44 394739 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:44 394750 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:44 394760 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:44 394794 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x38b3
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:44 403994 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:44 404008 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:44 404700 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:44 410967 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000046
Sep 13 12:07:44 411103 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:44 826740 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:44 826748 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:44 826757 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:44 826791 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3964
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:44 835961 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:44 835972 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:44 836639 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:44 842655 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000047
Sep 13 12:07:44 842787 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:45 258745 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:45 258755 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:45 258765 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:45 258799 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3a15
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:45 267949 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:45 267965 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:45 268652 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:45 274911 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000048
Sep 13 12:07:45 275049 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:45 690746 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:45 690756 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:45 690765 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:45 690799 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3ac6
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:45 699968 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:45 699979 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:45 700641 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:45 706675 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000049
Sep 13 12:07:45 706808 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:46 122749 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:46 122758 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:46 122768 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:46 122802 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3b77
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:46 131999 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:46 132009 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:46 132665 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:46 138697 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000004a
Sep 13 12:07:46 138829 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:46 554753 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:46 554764 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:46 554774 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:46 554808 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3c28
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:46 563953 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:46 563967 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:46 564679 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:46 570953 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000004b
Sep 13 12:07:46 571090 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:46 986754 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:46 986762 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:46 986772 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:46 986805 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3cd9
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:46 996022 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:46 996033 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:46 996694 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:47 002730 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000004c
Sep 13 12:07:47 002861 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:47 418759 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:47 418772 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:47 418781 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:47 418816 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3d8a
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:47 428089 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:47 428103 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:47 428797 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:47 435076 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000004d
Sep 13 12:07:47 435212 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:47 850760 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:47 850768 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:47 850777 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:47 850811 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3e3b
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:47 860066 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:47 860077 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:47 860740 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:47 866763 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000004e
Sep 13 12:07:47 866894 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915
Sep 13 12:07:48 282763 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping.
Sep 13 12:07:48 282772 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:48 282784 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:48 282818 [42FFF970] -> SMP dump:
        base_ver................0x1
        mgmt_class..............0x81
        class_ver...............0x1
        method..................0x1 (SubnGet)
        D bit...................0x0
        status..................0x0
        hop_ptr.................0x0
        hop_count...............0x3
        trans_id................0x3eec
        attr_id.................0x11 (NodeInfo)
        resv....................0x0
        attr_mod................0x0
        m_key...................0x0000000000000000
        dr_slid.................0xFFFF
        dr_dlid.................0xFFFF
        Initial path: [0][1][D][C]
        Return path: [0][0][0][0]
        Reserved: [0][0][0][0][0][0][0]
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 13 12:07:48 291948 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list.
Sep 13 12:07:48 291959 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D]
Sep 13 12:07:48 292615 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Sep 13 12:07:48 298632 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000004f
Sep 13 12:07:48 298685 [40FFF970] -> __osm_trap_rcv_process_request: ERR 3804: Received the trap 11 times continuously.
Sep 13 12:07:53 118795 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=15) -- dropping.
Sep 13 12:07:53 118805 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0
Sep 13 12:07:53 118815 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
Sep 13 12:07:53 118849 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x3f22 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:53 534798 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:53 534808 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:53 534817 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:53 534851 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x3fa0 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:53 544030 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:53 544041 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:53 544705 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:53 550724 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000050 Sep 13 12:07:53 550852 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:53 966800 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:53 966810 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:53 966819 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:53 966853 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x4051 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:53 975975 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:53 975985 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:53 976641 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:53 982670 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000051 Sep 13 12:07:53 982799 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:54 398806 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:54 398816 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:54 398826 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:54 398860 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x4102 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:54 408050 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:54 408070 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:54 408805 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:54 415078 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000052 Sep 13 12:07:54 415215 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:54 830806 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:54 830814 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:54 830824 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:54 830857 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x41b3 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:54 840063 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:54 840074 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:54 840735 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:54 846766 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000053 Sep 13 12:07:54 846898 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:55 262809 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:55 262819 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:55 262829 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:55 262863 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x4264 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:55 272123 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:55 272133 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:55 272788 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:55 278813 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000054 Sep 13 12:07:55 278943 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:55 694814 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:55 694824 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:55 694834 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:55 694868 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x4315 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:55 704098 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:55 704112 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:55 704813 [40FFF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:55 711069 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000055 Sep 13 12:07:55 711206 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:56 126814 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:56 126822 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:56 126832 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:56 126866 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x43c6 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:56 136024 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:56 136035 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:56 136696 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:56 142734 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000056 Sep 13 12:07:56 142865 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:56 558818 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:07:56 558831 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:56 558841 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). 
Sep 13 12:07:56 558875 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x4477 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:56 568129 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:56 568140 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:56 568797 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:56 574831 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000057 Sep 13 12:07:56 574961 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:56 719968 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000058 Sep 13 12:07:56 720052 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:56 990820 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. 
Sep 13 12:07:56 990828 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:07:56 990837 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Sep 13 12:07:56 990871 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x3 trans_id................0x4528 attr_id.................0x11 (NodeInfo) resv....................0x0 attr_mod................0x0 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D][C] Return path: [0][0][0][0] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:57 000088 [417FF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:07:57 000098 [417FF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] Sep 13 12:07:57 000754 [417FF970] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Sep 13 12:07:57 005797 [42FFF970] -> __osm_sm_mad_ctrl_rcv_callback: ERR 3111: Error status = 0x1C00. 
Sep 13 12:07:57 005832 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x2 trans_id................0x455a attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0xC m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D] Return path: [0][1][18] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 18 03 03 02 31 22 00 13 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:57 005891 [40FFF970] -> osm_pi_rcv_process_set: ERR 0F10: Received Error Status for SetResp() Sep 13 12:07:57 005908 [40FFF970] -> PortInfo dump: port number.............0xC node_guid...............0x0002c90200402915 port_guid...............0x0002c90200402915 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x18 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x3 port_state..............DOWN state_info2.............0x22 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x13 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 subnet_timeout..........0x0 resp_time_value.........0x0 
error_threshold.........0x88 Sep 13 12:07:57 005951 [40FFF970] -> Capabilities Mask: From viswa.krish at gmail.com Tue Sep 13 17:22:01 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Tue, 13 Sep 2005 17:22:01 -0700 Subject: [openib-general] Re: [PATCH] libmthca: fix wqe post In-Reply-To: <52hdco7s04.fsf@cisco.com> References: <52u0gp95d9.fsf@cisco.com> <20050913153155.GK14121@mellanox.co.il> <4df28be4050913101339456607@mail.gmail.com> <52hdco7s04.fsf@cisco.com> Message-ID: <4df28be40509131722146e527c@mail.gmail.com> Roland, I got the latest sources and built them along with the drivers. Userland mthca ============ Your test application ran fine without any issue. (rctest) When I ran the cmpost program which I sent you, I started getting errors from the mthca library even for smaller number of connections (Earlier it was working). This looks like an error dump in the mthca library. .............. [ 0] 00000493 [ 4] 00000000 [ 8] 00000000 [ c] 00000000 [10] 05f40000 [14] 00000000 [18] 00000042 [1c] fe100000 failed polling CQ: 142: err 1 <=== This is from cmpost program [ 0] 00000493 [ 4] 00000000 [ 8] 00000000 [ c] 00000000 [10] 05f90000 [14] 00000000 [18] 00000082 [1c] fe100000 failed polling CQ: 142: err 1 [ 0] 00000493 Also it is now easier to create the panic when you kill the cmpost server program. The panic may be happening on an error path.
printing eip: c029197d *pde = 35d56001 Oops: 0000 [#1] SMP Modules linked in: nfs nfsd exportfs lockd autofs4 sunrpc uhci_hcd ehci_hcd hw_random e1000 ext3 jbd sd_mod CPU: 0 EIP: 0060:[] Not tainted VLI EFLAGS: 00010002 (2.6.13) EIP is at mthca_poll_cq+0x158/0x534 eax: 00000000 ebx: f5e90280 ecx: 00000006 edx: 00001250 esi: 0000023a edi: f5e90304 ebp: f7941f0c esp: f7941ea4 ds: 007b es: 007b ss: 0068 Process ib_mad1 (pid: 308, threadinfo=f7940000 task=f7cb7540) Stack: f7941ed0 c0118c7d f7def41c c0355dc0 f7cb7540 f7dea41c c1a01bc0 00000000 00000080 00000000 00000000 00000286 f7ce1000 f7941f0c 00000001 f7dea400 f8806000 00000292 00000001 00000000 f5e90280 f7ce1000 f7def400 f7941f0c Call Trace: [] load_balance_newidle+0x23/0xa2 [] ib_mad_completion_handler+0x2c/0x8d [] remove_wait_queue+0xf/0x34 [] worker_thread+0x1b0/0x23a [] schedule+0x5d3/0xbdf [] ib_mad_completion_handler+0x0/0x8d [] default_wake_function+0x0/0xc [] default_wake_function+0x0/0xc [] worker_thread+0x0/0x23a [] kthread+0x8a/0xb2 [] kthread+0x0/0xb2 [] kernel_thread_helper+0x5/0xb Code: 01 00 00 8b 44 24 18 8d bb 84 00 00 00 8b 53 5c 8b 70 18 8b 4f 24 0f ce 2b b3 b8 00 00 00 8b 83 bc 00 00 00 d3 ee 01 f2 8d 14 d0 <8b> 02 8b 52 04 85 ff 89 45 00 89 55 04 74 16 8b 57 10 89 f0 39 -Viswa On 9/13/05, Roland Dreier wrote: > > Viswanath> Once you generate a kernel patch, I can test out both > Viswanath> user and kernel mthca since I have the tests ready.. > > Excellent. I merged MST's patch, and applied the patch below to the > kernel. (So you can either update from svn or apply the patches) > > Thanks for testing -- let me know if you still see problems. 
> > Index: infiniband/hw/mthca/mthca_srq.c > =================================================================== > --- infiniband/hw/mthca/mthca_srq.c (revision 3404) > +++ infiniband/hw/mthca/mthca_srq.c (working copy) > @@ -189,7 +189,6 @@ int mthca_alloc_srq(struct mthca_dev *de > > srq->max = attr->max_wr; > srq->max_gs = attr->max_sge; > - srq->last = NULL; > srq->counter = 0; > > if (mthca_is_memfree(dev)) > @@ -264,6 +263,7 @@ int mthca_alloc_srq(struct mthca_dev *de > > srq->first_free = 0; > srq->last_free = srq->max - 1; > + srq->last = get_wqe(srq, srq->max - 1); > > return 0; > > @@ -446,13 +446,11 @@ int mthca_tavor_post_srq_recv(struct ib_ > ((struct mthca_data_seg *) wqe)->addr = 0; > } > > - if (likely(prev_wqe)) { > - ((struct mthca_next_seg *) prev_wqe)->nda_op = > - cpu_to_be32((ind << srq->wqe_shift) | 1); > - wmb(); > - ((struct mthca_next_seg *) prev_wqe)->ee_nds = > - cpu_to_be32(MTHCA_NEXT_DBD); > - } > + ((struct mthca_next_seg *) prev_wqe)->nda_op = > + cpu_to_be32((ind << srq->wqe_shift) | 1); > + wmb(); > + ((struct mthca_next_seg *) prev_wqe)->ee_nds = > + cpu_to_be32(MTHCA_NEXT_DBD); > > srq->wrid[ind] = wr->wr_id; > srq->first_free = next_ind; > Index: infiniband/hw/mthca/mthca_qp.c > =================================================================== > --- infiniband/hw/mthca/mthca_qp.c (revision 3404) > +++ infiniband/hw/mthca/mthca_qp.c (working copy) > @@ -227,7 +227,6 @@ static void mthca_wq_init(struct mthca_w > wq->last_comp = wq->max - 1; > wq->head = 0; > wq->tail = 0; > - wq->last = NULL; > } > > void mthca_qp_event(struct mthca_dev *dev, u32 qpn, > @@ -1103,6 +1102,9 @@ static int mthca_alloc_qp_common(struct > } > } > > + qp->sq.last = get_send_wqe(qp, qp->sq.max - 1); > + qp->rq.last = get_recv_wqe(qp, qp->rq.max - 1); > + > return 0; > } > > @@ -1583,15 +1585,13 @@ int mthca_tavor_post_send(struct ib_qp * > goto out; > } > > - if (prev_wqe) { > - ((struct mthca_next_seg *) prev_wqe)->nda_op = > - cpu_to_be32(((ind << 
qp->sq.wqe_shift) + > - qp->send_wqe_offset) | > - mthca_opcode[wr->opcode]); > - wmb(); > - ((struct mthca_next_seg *) prev_wqe)->ee_nds = > - cpu_to_be32((size0 ? 0 : MTHCA_NEXT_DBD) | size); > - } > + ((struct mthca_next_seg *) prev_wqe)->nda_op = > + cpu_to_be32(((ind << qp->sq.wqe_shift) + > + qp->send_wqe_offset) | > + mthca_opcode[wr->opcode]); > + wmb(); > + ((struct mthca_next_seg *) prev_wqe)->ee_nds = > + cpu_to_be32((size0 ? 0 : MTHCA_NEXT_DBD) | size); > > if (!size0) { > size0 = size; > @@ -1688,13 +1688,11 @@ int mthca_tavor_post_receive(struct ib_q > > qp->wrid[ind] = wr->wr_id; > > - if (likely(prev_wqe)) { > - ((struct mthca_next_seg *) prev_wqe)->nda_op = > - cpu_to_be32((ind << qp->rq.wqe_shift) | 1); > - wmb(); > - ((struct mthca_next_seg *) prev_wqe)->ee_nds = > - cpu_to_be32(MTHCA_NEXT_DBD | size); > - } > + ((struct mthca_next_seg *) prev_wqe)->nda_op = > + cpu_to_be32((ind << qp->rq.wqe_shift) | 1); > + wmb(); > + ((struct mthca_next_seg *) prev_wqe)->ee_nds = > + cpu_to_be32(MTHCA_NEXT_DBD | size); > > if (!size0) > size0 = size; > @@ -1905,15 +1903,13 @@ int mthca_arbel_post_send(struct ib_qp * > goto out; > } > > - if (likely(prev_wqe)) { > - ((struct mthca_next_seg *) prev_wqe)->nda_op = > - cpu_to_be32(((ind << qp->sq.wqe_shift) + > - qp->send_wqe_offset) | > - mthca_opcode[wr->opcode]); > - wmb(); > - ((struct mthca_next_seg *) prev_wqe)->ee_nds = > - cpu_to_be32(MTHCA_NEXT_DBD | size); > - } > + ((struct mthca_next_seg *) prev_wqe)->nda_op = > + cpu_to_be32(((ind << qp->sq.wqe_shift) + > + qp->send_wqe_offset) | > + mthca_opcode[wr->opcode]); > + wmb(); > + ((struct mthca_next_seg *) prev_wqe)->ee_nds = > + cpu_to_be32(MTHCA_NEXT_DBD | size); > > if (!size0) { > size0 = size; > -------------- next part -------------- An HTML attachment was scrubbed... 
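Roland's reply that follows traces the cmpost errors to `sa.packet_life_time = 2`. InfiniBand time fields such as the path's PacketLifeTime and a QP's Local ACK Timeout are encoded as an exponent, with the time equal to 4.096 µs × 2^value. Assuming the CM derives the ACK timeout as roughly twice the one-way packet lifetime (exponent + 1; this derivation is an assumption here, not something stated in the thread), a lifetime of 2 gives 4.096 µs × 2^3 ≈ 32.8 µs, which matches the "something like 32 microseconds" figure. The helper names `ib_time_ns()` and `ack_timeout_exp()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* IBA time encoding: 4.096 us * 2^exp, with exp held in a 5-bit field.
 * 4.096 us is 4096 ns, so the result is exact in integer nanoseconds. */
static uint64_t ib_time_ns(unsigned exp)
{
	assert(exp <= 31);
	return 4096ULL << exp;
}

/* Assumption for illustration: the ACK timeout covers a round trip, i.e.
 * twice the one-way packet lifetime, which is the exponent plus one.
 * The result is clamped to the largest 5-bit value. */
static unsigned ack_timeout_exp(unsigned packet_life_time)
{
	unsigned exp = packet_life_time + 1;

	return exp > 31 ? 31 : exp;
}
```

Under the same assumption, the suggested values of 14 and 15 give ACK timeouts of about 134 ms and 268 ms, several thousand times longer than the roughly 32 µs produced by a lifetime of 2.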
From rolandd at cisco.com Tue Sep 13 17:53:35 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 17:53:35 -0700 Subject: [openib-general] Re: [PATCH] libmthca: fix wqe post In-Reply-To: <4df28be40509131722146e527c@mail.gmail.com> (Viswanath Krishnamurthy's message of "Tue, 13 Sep 2005 17:22:01 -0700") References: <52u0gp95d9.fsf@cisco.com> <20050913153155.GK14121@mellanox.co.il> <4df28be4050913101339456607@mail.gmail.com> <52hdco7s04.fsf@cisco.com> <4df28be40509131722146e527c@mail.gmail.com> Message-ID: <52br2w4ghc.fsf@cisco.com> Viswanath> When I ran the cmpost program which I sent you, I Viswanath> started getting errors from the mthca library even for Viswanath> smaller number of connections (Earlier it was Viswanath> working). Yeah, I found another problem with your cmpost program. I think you're setting the packet lifetime far too low. You have: sa.packet_life_time = 2; This ends up having the CM set an ACK timeout of something like 32 microseconds, which is way too low. If you poll the send CQ, you'll probably see some "retries exceeded" errors. Setting the packet_life_time to something like 14 or 15 should work better. Viswanath> Also it is now easier to create the panic when you kill Viswanath> the cmpost server program. The panic may be happening Viswanath> on an error path. I still have never been able to reproduce this panic (and believe me, I've killed the cmpost program many time). Anyway, I'll take a look at the traceback and see if anything jumps out at me. - R. From viswa.krish at gmail.com Tue Sep 13 18:08:54 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Tue, 13 Sep 2005 18:08:54 -0700 Subject: [openib-general] Re: [PATCH] libmthca: fix wqe post In-Reply-To: <52br2w4ghc.fsf@cisco.com> References: <52u0gp95d9.fsf@cisco.com> <20050913153155.GK14121@mellanox.co.il> <4df28be4050913101339456607@mail.gmail.com> <52hdco7s04.fsf@cisco.com> <4df28be40509131722146e527c@mail.gmail.com> <52br2w4ghc.fsf@cisco.com> Message-ID: <4df28be405091318086da5e217@mail.gmail.com> Thanks.. yes that was the problem... The panic was happening when I was getting these errors and pressed Ctrl-C on the server. This may be an error path issue. I am not seeing it now.. -Viswa On 9/13/05, Roland Dreier wrote: > > Viswanath> When I ran the cmpost program which I sent you, I > Viswanath> started getting errors from the mthca library even for > Viswanath> smaller number of connections (Earlier it was > Viswanath> working).
> > Yeah, I found another problem with your cmpost program. I think > you're setting the packet lifetime far too low. You have: > > sa.packet_life_time = 2; > > This ends up having the CM set an ACK timeout of something like 32 > microseconds, which is way too low. If you poll the send CQ, you'll > probably see some "retries exceeded" errors. Setting the > packet_life_time to something like 14 or 15 should work better. > > Viswanath> Also it is now easier to create the panic when you kill > Viswanath> the cmpost server program. The panic may be happening > Viswanath> on an error path. > > I still have never been able to reproduce this panic (and believe me, > I've killed the cmpost program many time). Anyway, I'll take a look > at the traceback and see if anything jumps out at me. > > - R. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From viswa.krish at gmail.com Tue Sep 13 18:14:47 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Tue, 13 Sep 2005 18:14:47 -0700 Subject: [openib-general] Re: [PATCH] libmthca: fix wqe post In-Reply-To: <4df28be405091318086da5e217@mail.gmail.com> References: <52u0gp95d9.fsf@cisco.com> <20050913153155.GK14121@mellanox.co.il> <4df28be4050913101339456607@mail.gmail.com> <52hdco7s04.fsf@cisco.com> <4df28be40509131722146e527c@mail.gmail.com> <52br2w4ghc.fsf@cisco.com> <4df28be405091318086da5e217@mail.gmail.com> Message-ID: <4df28be405091318143664a789@mail.gmail.com> Just wanted to confirm kernel mthca also works fine.. Thanks Roland & Michael -Viswa On 9/13/05, Viswanath Krishnamurthy wrote: > > Thanks.. yes that was the problem... > > The panic was happening when I was getting these errors and pressed Ctrl-C > on > the server. This may be an error path issue. > > I am not seeing it now.. 
> > -Viswa > > > On 9/13/05, Roland Dreier wrote: > > > > Viswanath> When I ran the cmpost program which I sent you, I > > Viswanath> started getting errors from the mthca library even for > > Viswanath> smaller number of connections (Earlier it was > > Viswanath> working). > > > > Yeah, I found another problem with your cmpost program. I think > > you're setting the packet lifetime far too low. You have: > > > > sa.packet_life_time = 2; > > > > This ends up having the CM set an ACK timeout of something like 32 > > microseconds, which is way too low. If you poll the send CQ, you'll > > probably see some "retries exceeded" errors. Setting the > > packet_life_time to something like 14 or 15 should work better. > > > > Viswanath> Also it is now easier to create the panic when you kill > > Viswanath> the cmpost server program. The panic may be happening > > Viswanath> on an error path. > > > > I still have never been able to reproduce this panic (and believe me, > > I've killed the cmpost program many time). Anyway, I'll take a look > > at the traceback and see if anything jumps out at me. > > > > - R. > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Tue Sep 13 18:14:19 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 21:14:19 -0400 Subject: [openib-general] Mellanox device in INIT state In-Reply-To: References: Message-ID: <1126660458.4425.68.camel@hal.voltaire.com> On Tue, 2005-09-13 at 19:32, Shirley Ma wrote: > But during the test (rm and ins ib_mthca modules), I hit another > problem. The stack was SVN 3380. > > Sep 13 15:21:29 elm3b37 kernel: unregister_netdevice: waiting for ib0 > to become free. Usage count = 1 What other IB modules were you running when you removed ib_mthca (aside from ib_ipoib) ? 
-- Hal From makia at llnl.gov Tue Sep 13 18:31:53 2005 From: makia at llnl.gov (Makia Minich) Date: Tue, 13 Sep 2005 18:31:53 -0700 Subject: [openib-general] mvapich-gen2 question Message-ID: <20050914013153.GK16264@langley.llnl.gov> I'm using a RHEL4 based system with the backport-2.6.9 svn drop (svn3279). Building the mvapich-gen2 from subversion against this, everything seems to be ok, and installing it goes well. The problem is when I run a test I get the following error: :::::: => mpicc -o osu-bw osu-bw.c => mpirun_rsh -rsh -hostfile ~/machines -np 2 ./osu-bw /benchmarks/osu/src /benchmarks/osu/src [1] Abort: Error creating CQ at line 121 in file viainit.c mpirun: executable version 1 does not match our version 2. done. => :::::: I see in the code for mvapich (in ch-gen2) that there is a check against the version, but I'm not quite sure where this version is defined in my compiled code. Perhaps there's something I'm just not seeing. Thanks.... (((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))) Makia Minich Money is the Devil's toothpaste. 925.XXX.XXXX --The Flea (Mucha Lucha) (((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))) From rolandd at cisco.com Tue Sep 13 18:33:47 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 13 Sep 2005 18:33:47 -0700 Subject: [openib-general] mvapich-gen2 question In-Reply-To: <20050914013153.GK16264@langley.llnl.gov> (Makia Minich's message of "Tue, 13 Sep 2005 18:31:53 -0700") References: <20050914013153.GK16264@langley.llnl.gov> Message-ID: <523bo84emc.fsf@cisco.com> Makia> I'm using a RHEL4 based system with the backport-2.6.9 svn Makia> drop (svn3279). Building the mvapich-gen2 from subversion Makia> against this, everything seems to be ok, and installing it Makia> goes well. The problem is when I run a test I get the Makia> following error: Makia> mpirun: executable version 1 does not match our version 2. 
Do you possibly have old MPI executables or libraries somewhere that are getting picked up? - R. From halr at voltaire.com Tue Sep 13 18:31:11 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 21:31:11 -0400 Subject: [openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features In-Reply-To: <20050914001235.GJ1685@kalmia.hozed.org> References: <1126609953.4382.42857.camel@hal.voltaire.com> <20050913161534.GD1685@kalmia.hozed.org> <1126628283.4514.496.camel@hal.voltaire.com> <20050914001235.GJ1685@kalmia.hozed.org> Message-ID: <1126661470.4425.81.camel@hal.voltaire.com> On Tue, 2005-09-13 at 20:12, Troy Benjegerdes wrote: > On Tue, Sep 13, 2005 at 12:19:30PM -0400, Hal Rosenstock wrote: > > On Tue, 2005-09-13 at 12:15, Troy Benjegerdes wrote: > > > We just had a node crash on our network, and it caused our OpenSM to > > > stop working.. we were running version openib-1.0.0.. > > > > Can you define stop working (more details) ? Are there any logs ? > > > > > I suppose this means I should start beating up on 1.1.0 now, right? > > > > Yes but the same issue might still exist. Can you reproduce it on the > > OpenSM you are running on now and then move up and see if it still > > exists ? > > Stop working as in IPoIB arp seems to stop. I suspect that the multicast tree for the broadcast group somehow gets broken. Were you running any other ULPs other than IPoIB ? > I've got a log now of the latest opensm-1.1.0 attached. > > The time (was) off on that machine, FYI. > > At the log entry 'Sep 13 12:06:55', I plugged in the node that is hung/crashed > .. which caused a bunch of opensm errors.. Thanks. That helps orient me. What was the opensm crash ? The log just ends abruptly. > I have since unplugged that > node, and can put it back in tomorrow if you want more debug info. Great. More later on the log itself...
-- Hal From hozer at hozed.org Tue Sep 13 18:46:05 2005 From: hozer at hozed.org (Troy Benjegerdes) Date: Tue, 13 Sep 2005 20:46:05 -0500 Subject: [openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features In-Reply-To: <1126661470.4425.81.camel@hal.voltaire.com> References: <1126609953.4382.42857.camel@hal.voltaire.com> <20050913161534.GD1685@kalmia.hozed.org> <1126628283.4514.496.camel@hal.voltaire.com> <20050914001235.GJ1685@kalmia.hozed.org> <1126661470.4425.81.camel@hal.voltaire.com> Message-ID: <20050914014605.GK1685@kalmia.hozed.org> > Were you running any other ULPs other than IPoIB ? Not at the time. Maybe only MPI. > > I've got a log now of the latest opensm-1.1.0 attached. > > > > The time (was) off on that machine, FYI. > > > > At the log entry 'Sep 13 12:06:55', I plugged in the node that is hung/crashed > > .. which caused a bunch of opensm errors.. > > Thanks. That helps orient me. What was the opensm crash ? The log just > ends abruptly. OpenSM didn't crash, but the IPoIB multicast seemed broken. The opensm instance that generated that log is still currently running. Brett can hopefully tell us tomorrow what (if anything) is still broken. We have another node that crashed, and have seen this behavior before where one node crashes and gets into some strange state that OpenSM can't handle well. > > > I have since unplugged that > > node, and can put it back in tomorrow if you want more debug info. > > Great. More later on the log itself... > > -- Hal > -- -------------------------------------------------------------------------- Troy Benjegerdes 'da hozer' hozer at hozed.org Someone asked me why I work on this free (http://www.fsf.org/philosophy/) software stuff and not get a real job. Charles Shultz had the best answer: "Why do musicians compose symphonies and poets write poems? They do it because life wouldn't have any meaning for them if they didn't. That's why I draw cartoons. It's my life."
-- Charles Shultz From halr at voltaire.com Tue Sep 13 19:12:24 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 13 Sep 2005 22:12:24 -0400 Subject: [openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features In-Reply-To: <20050914001235.GJ1685@kalmia.hozed.org> References: <1126609953.4382.42857.camel@hal.voltaire.com> <20050913161534.GD1685@kalmia.hozed.org> <1126628283.4514.496.camel@hal.voltaire.com> <20050914001235.GJ1685@kalmia.hozed.org> Message-ID: <1126663943.4425.127.camel@hal.voltaire.com> Hi Troy, On Tue, 2005-09-13 at 20:12, Troy Benjegerdes wrote: Here is my analysis of the log you provided. I need to do a little more digging. I am curious as to the switch type and firmware versions of that switch and the failed HCA. > At the log entry 'Sep 13 12:06:55', I plugged in the node that is hung/crashed > .. which caused a bunch of opensm errors.. I have since unplugged that > node, and can put it back in tommorow if you want more debug info. At that point in time, we see the following: Sep 13 12:06:55 936933 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000013 Sep 13 12:06:55 937087 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:06:56 354422 [42FFF970] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=11) -- dropping. Sep 13 12:06:56 354439 [42FFF970] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 Sep 13 12:06:56 354449 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT). Trap 128 is an urgent Link state of switch port changed trap. It looks like a solicited send failed (SubnGet NodeInfo). We had an exchange on this a while ago on the list in terms of an unresponsive port. 
Sep 13 12:06:56 363771 [40FFF970] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c90200402915 port 12. Adding to light sweep sampling list. Sep 13 12:06:56 363815 [40FFF970] -> Directed Path Dump of 2 hop path: Path = [0][1][D] The DR display is showing the path to the switch. The dump of the SMP shows: hop_ptr.................0x0 hop_count...............0x3 Initial path: [0][1][D][C] Also, the GUID cited is an HCA GUID rather than a switch GUID so I doubt it has 12 ports. I think these are just problems with the debug messages. Earlier in the log: Sep 13 12:03:51 959970 [417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 Sep 13 12:03:51 959986 [417FF970] -> Discovered new port with GUID:0x0002c90200402915 LID range [0xE,0xE] of node:MT47396 Infiniscale-III Mellanox Technologies Sep 13 12:03:51 959996 [417FF970] -> osm_report_notice: Reporting Generic Notice type:3 num:64 from LID:0x0001 GID:0xfe80000000000000,0x0002c90200402781 It appears that the failed node is a MT47396 off switch 0x0002c90200402781. What firmware version is running in both of these ? What is switch 0x0002c90200402781 ? A minor issue but the DR display above is not correct. The dump of the SMP shows: hop_ptr.................0x0 hop_count...............0x3 Initial path: [0][1][D][C] It seems to repeat this over and over again every few seconds until things break I presume at 12:07:57. The key to me is that OpenSM continues to receive: Sep 13 12:07:23 542642 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x000000000000002c Sep 13 12:07:23 542771 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Either OpenSM never shuts this off or it keeps bouncing the port in the light sweep. I need to investigate this further. 
It all ends when: Sep 13 12:07:56 574831 [40FFF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000057 Sep 13 12:07:56 574961 [40FFF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 Sep 13 12:07:56 719968 [417FF970] -> __osm_trap_rcv_process_request: Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E TID:0x0000000000000058 Sep 13 12:07:56 720052 [417FF970] -> osm_report_notice: Reporting Generic Notice type:1 num:128 from LID:0x0000 GID:0xfe80000000000000,0x0002c90200402915 and then that switch returns a bad status in a SM GetResp PortInfo (in response to a SM Set PortInfo): Sep 13 12:07:57 005832 [42FFF970] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x1C00 hop_ptr.................0x0 hop_count...............0x2 trans_id................0x455a attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0xC m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][D] Return path: [0][1][18] Reserved: [0][0][0][0][0][0][0] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 18 03 03 02 31 22 00 13 40 40 00 08 08 04 F2 40 00 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 00 00 00 Sep 13 12:07:57 005891 [40FFF970] -> osm_pi_rcv_process_set: ERR 0F10: Received Error Status for SetResp() Sep 13 12:07:57 005908 [40FFF970] -> PortInfo dump: port number.............0xC node_guid...............0x0002c90200402915 port_guid...............0x0002c90200402915 m_key...................0x0000000000000000 subnet_prefix...........0x0000000000000000 base_lid................0x0 master_sm_base_lid......0x0 capability_mask.........0x0 
diag_code...............0x0 m_key_lease_period......0x0 local_port_num..........0x18 link_width_enabled......0x3 link_width_supported....0x3 link_width_active.......0x2 link_speed_supported....0x3 port_state..............DOWN state_info2.............0x22 m_key_protect_bits......0x0 lmc.....................0x0 link_speed..............0x13 mtu_smsl................0x40 vl_cap_init_type........0x40 vl_high_limit...........0x0 vl_arb_high_cap.........0x8 vl_arb_low_cap..........0x8 init_rep_mtu_cap........0x4 vl_stall_life...........0xF2 vl_enforce..............0x40 m_key_violations........0x0 p_key_violations........0x0 q_key_violations........0x0 guid_cap................0x0 subnet_timeout..........0x0 resp_time_value.........0x0 error_threshold.........0x88 Sep 13 12:07:57 005951 [40FFF970] -> Capabilities Mask: That is when things stop working. Likely multicast in that switch is not working. I'd be curious whether the multicast setup in that switch is trashed or not. That can be determined with the diag tools. Let me know if you would like me to document the procedure for this. There is a pending issue with Sets of PortInfo getting this status back which has been on this list. Not sure whether this is a related problem or not. -- Hal From mlleinin at hpcn.ca.sandia.gov Tue Sep 13 19:50:40 2005 From: mlleinin at hpcn.ca.sandia.gov (Matt Leininger) Date: Tue, 13 Sep 2005 19:50:40 -0700 Subject: [openib-general] mvapich-gen2 question In-Reply-To: <20050914013153.GK16264@langley.llnl.gov> References: <20050914013153.GK16264@langley.llnl.gov> Message-ID: <1126666240.1382.216.camel@localhost> Including mvapich-help at cse.ohio-state.edu in this thread. - Matt On Tue, 2005-09-13 at 18:31 -0700, Makia Minich wrote: > I'm using a RHEL4 based system with the backport-2.6.9 svn drop (svn3279). > Building the mvapich-gen2 from subversion against this, everything seems to > be ok, and installing it goes well. 
The problem is when I run a test I > get the following error: > > :::::: > => mpicc -o osu-bw osu-bw.c > => mpirun_rsh -rsh -hostfile ~/machines -np 2 ./osu-bw > /benchmarks/osu/src > /benchmarks/osu/src > [1] Abort: Error creating CQ > at line 121 in file viainit.c > mpirun: executable version 1 does not match our version 2. > > done. > => > :::::: > > I see in the code for mvapich (in ch-gen2) that there is a check against the > version, but I'm not quite sure where this version is defined in my compiled > code. Perhaps there's something I'm just not seeing. > > Thanks.... > > (((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))) > Makia Minich Money is the Devil's toothpaste. > 925.XXX.XXXX --The Flea (Mucha Lucha) > (((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))) > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From yuw at cse.ohio-state.edu Tue Sep 13 20:05:20 2005 From: yuw at cse.ohio-state.edu (Weikuan Yu) Date: Tue, 13 Sep 2005 23:05:20 -0400 Subject: [openib-general] mvapich-gen2 question In-Reply-To: <20050914013153.GK16264@langley.llnl.gov> References: <20050914013153.GK16264@langley.llnl.gov> Message-ID: <62B3DD38-24CC-11DA-97AD-000D932C3754@cse.ohio-state.edu> On Sep 13, 2005, at 9:31 PM, Makia Minich wrote: > I'm using a RHEL4 based system with the backport-2.6.9 svn drop > (svn3279). > Building the mvapich-gen2 from subversion against this, everything > seems to > be ok, and installing it goes well. 
The problem is when I run a test I > get the following error: > > :::::: > => mpicc -o osu-bw osu-bw.c > => mpirun_rsh -rsh -hostfile ~/machines -np 2 ./osu-bw > /benchmarks/osu/src > /benchmarks/osu/src > [1] Abort: Error creating CQ > at line 121 in file viainit.c This means one of your nodes has a problem allocating resources. Please check the output of the following command: # ulimit -l You may have a default, limited memlock limit, 32k for example. If so, please do these steps: a) Uncomment the following line in /etc/security/limits.conf to remove the memlock limit. # * soft memlock unlimited b) Also put the following line at the beginning of /etc/init.d/sshd to make it the default for any new login. ulimit -l unlimited Please let us know if the memlock limit is the problem you are facing. > mpirun: executable version 1 does not match our version 2. > > done. > => > :::::: > > I see in the code for mvapich (in ch-gen2) that there is a check > against the > version, but I'm not quite sure where this version is defined in my > compiled > code. Perhaps there's something I'm just not seeing. If the problem is not due to the memlock limit, we will be happy to look into this further. If possible, a temporary account that helps to reproduce the problem would speed things up significantly. BTW, the version number here is defined so that an external process manager can check/match the protocol used at startup time. The actual code is defined in this file: mpid/ch_gen2/process/pmgr_client.h #define PMGR_VERSION 2 Thanks, Weikuan > > Thanks.... > > (((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))) > Makia Minich Money is the Devil's toothpaste.
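Weikuan traces the "Error creating CQ" abort to a low locked-memory limit. That limit can also be checked from inside a program, because the value `ulimit -l` reports is the process's RLIMIT_MEMLOCK resource limit (in kilobytes). The following is a minimal sketch using getrlimit()/setrlimit() with the Linux RLIMIT_MEMLOCK resource. memlock_ok() and raise_memlock_to_hard_limit() are hypothetical helpers, not part of MVAPICH. An unprivileged process can raise only its soft limit, and only up to the hard limit. That is why the limits.conf change is still needed when the hard limit itself is small.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Returns 1 if the soft memlock limit is unlimited or >= min_bytes,
 * 0 if it is lower, and -1 if the limit cannot be read. */
static int memlock_ok(rlim_t min_bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit(RLIMIT_MEMLOCK)");
        return -1;
    }
    /* The kernel keeps the soft limit at or below the hard limit. */
    assert(rl.rlim_cur <= rl.rlim_max);
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return rl.rlim_cur >= min_bytes ? 1 : 0;
}

/* Raise the soft limit to the hard limit. No privilege is needed for
 * this, but it cannot go past the hard limit set by the administrator. */
static int raise_memlock_to_hard_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("setrlimit(RLIMIT_MEMLOCK)");
        return -1;
    }
    return 0;
}
```

A launcher could call memlock_ok() with the amount of memory it expects to register. If the check fails, it could print a hint about the memlock limit instead of failing later during CQ creation.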
> 925.XXX.XXXX --The Flea (Mucha Lucha) > (((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))) > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From iod00d at hp.com Tue Sep 13 22:00:03 2005 From: iod00d at hp.com (Grant Grundler) Date: Tue, 13 Sep 2005 22:00:03 -0700 Subject: [openib-general] Mellanox device in INIT state In-Reply-To: References: <52y860652h.fsf@cisco.com> Message-ID: <20050914050003.GA29137@esmail.cup.hp.com> On Tue, Sep 13, 2005 at 04:32:13PM -0700, Shirley Ma wrote: > But during the test (rm and ins ib_mthca modules), I hit another problem. > The stack was SVN 3380. > > Sep 13 15:21:29 elm3b37 kernel: unregister_netdevice: waiting for ib0 to > become free. Usage count = 1 I've also got a problem on ia64 though it's clearly related to sdp_init(). It's possible yours is caused by the same issue. SDP causes modprobe to segfault in sdp_init and I suspect leaves a reference count on ib_mthca modules. I've not had a chance to look into this...I'll look tomorrow if it's not obvious to someone else. oh - this was with SVN r3391. 
thanks, grant gsyprf3:~# reload_ib + IPoIB=51 + ifconfig ib0 down ib0: ERROR while getting interface flags: No such device + ifconfig ib1 down ib1: ERROR while getting interface flags: No such device + rmmod ib_ipoib ib_uverbs ib_sdp ib_cm ib_sa ib_mthca ib_mad ib_core ERROR: Module ib_ipoib does not exist in /proc/modules ERROR: Module ib_uverbs does not exist in /proc/modules ERROR: Module ib_sdp does not exist in /proc/modules ERROR: Module ib_cm does not exist in /proc/modules ERROR: Module ib_sa does not exist in /proc/modules ERROR: Module ib_mthca does not exist in /proc/modules ERROR: Module ib_mad does not exist in /proc/modules ERROR: Module ib_core does not exist in /proc/modules + modprobe ib_mthca msi_x=1 ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005) ib_mthca: Initializing ((¥) GSI 60 (level, low) -> CPU 0 (0x0000) vector 69 ACPI: PCI Interrupt 0000:81:00.0[A] -> GSI 60 (level, low) -> IRQ 69 (¥: Missing DCS, aborting. ACPI: PCI interrupt for device 0000:81:00.0 disabled GSI 60 (level, low) -> CPU 0 (0x0000) vector 69 unregistered + modprobe ib_ipoib + modprobe ib_sdp kmem_cache_create: Early error in slab request_sock_ kernel BUG at mm/slab.c:1220! modprobe[1947]: bugcheck! 
0 [1] Modules linked in: ib_sdp ib_cm ib_ipoib ib_sa ib_mthca ib_mad ib_core qla2300 qla2xxx firmware_class scsi_transport_fc e1000 tg3 e100 dm_mod Pid: 1947, CPU 1, comm: modprobe psr : 00001010085a6010 ifs : 8000000000000a1a ip : [] Not tainted ip is at kmem_cache_create+0x1c0/0x1000 unat: 0000000000000000 pfs : 0000000000000a1a rsc : 0000000000000003 rnat: 0000000000000000 bsps: 0000000000000000 pr : 0000000000159959 ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f csd : 0000000000000000 ssd : 0000000000000000 b0 : a00000010010e700 b6 : a0000001000d7c40 b7 : a00000010000b130 f6 : 1003e00000000000000a0 f7 : 1003e20c49ba5e353f7cf f8 : 1003e00000000000004e2 f9 : 1003e000000000fa00000 f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db r1 : a000000100d23ce0 r2 : 0000000000000000 r3 : 0000000000000000 r8 : 0000000000000021 r9 : 00000000000000fd r10 : a000000100b3ad40 r11 : 0000000000000000 r12 : e000004042c57e20 r13 : e000004042c50000 r14 : 0000000000004000 r15 : a000000100967f68 r16 : a000000100967f70 r17 : e00000003e3f7e18 r18 : 0000000000000000 r19 : 0000000000000000 r20 : a000000100b43a48 r21 : a000000100b43a48 r22 : 0000000000000000 r23 : 0000000000000000 r24 : 0000000000000000 r25 : 0000000000000004 r26 : a0000001008f4d50 r27 : 0000000000000001 r28 : e000004042c50d54 r29 : a0000001008f4d54 r30 : 0000000000000000 r31 : 0000000000000000 Call Trace: [] show_stack+0x80/0xa0 sp=e000004042c579c0 bsp=e000004042c51128 [] show_regs+0x900/0x940 sp=e000004042c57b9/usr/local/bin/r0 bseloadp_ib:04ine 9=e: 1947 Segmenta00tion fault 00modprobe ib_sdp + modprobe ib_u04verbs FATAL: Mo2cdule ib_uverbs n10dot found. + ifc0 .51 netmask 255. 
[] die+0x150/0x200 sp=e000004042c57ba0 bsp=e000004042c51088 [] die_if_kernel+0x50/0x80 sp=e000004042c57ba0 bsp=e000004042c51058 [] ia64_bad_break+0x530/0x900 sp=e000004042c57ba0 bsp=e000004042c51030 [] ia64_leave_kernel+0x0/0x280 sp=e000004042c57c50 bsp=e000004042c51030 [] kmem_cache_create+0x1c0/0x1000 sp=e000004042c57e20 bsp=e000004042c50f58 [] proto_register+0x140/0x2a0 sp=e000004042c57e30 bsp=e000004042c50f10 [] sdp_init+0x30/0x830 [ib_sdp] sp=e000004042c57e30 bsp=e000004042c50ee8 [] sys_init_module+0x2e0/0x680 sp=e000004042c57e30 bsp=e000004042c50e60 [] ia64_ret_from_syscall+0x0/0x20 sp=e000004042c57e30 bsp=e000004042c50e60 [] __kernel_syscall_via_break+0x0/0x20 sp=e000004042c58000 bsp=e000004042c50e60 From IBMEHCAD at de.ibm.com Tue Sep 13 23:55:36 2005 From: IBMEHCAD at de.ibm.com (IBMEHCA DD) Date: Wed, 14 Sep 2005 08:55:36 +0200 Subject: [openib-general] IBM eHCA Device Driver for gen2 IB stack In-Reply-To: <20050913214626.GF1685@kalmia.hozed.org> Message-ID: We're currently integrating much better userspace support and the mailing list comments into the next tarball. That code should be available on SourceForge within the next few days. With that version we're able to run initial MPI tests on mvapich with ehca. We're still working on the license issue and the approval to use openib.org svn instead of tar files on SourceForge. Christoph > Troy Benjegerdes wrote on 13.09.2005 23:46:26: > > https://sourceforge.net/projects/ibmehcad/ > > ehca2_0011e > Has this been updated at all? Are there any new drops? From yael at mellanox.co.il Wed Sep 14 01:02:02 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 14 Sep 2005 11:02:02 +0300 Subject: [openib-general] [PATCH] Opensm - change error to log Message-ID: <5zy860oz5x.fsf@mtl066.yok.mtl.com> Hi Hal, The following is printed as error, but should be printed as info instead. Attached is a patch for that.
Thanks, Yael Signed-off-by: Yael Kalka Index: osm_vendor_ibumad.c =================================================================== --- osm_vendor_ibumad.c (revision 3412) +++ osm_vendor_ibumad.c (working copy) @@ -565,8 +565,8 @@ osm_vendor_get_all_port_attr( lids[0] = def_port.base_lid; linkstates[0] = def_port.state; - osm_log( p_vend->p_log, OSM_LOG_ERROR, - "osm_vendor_get_all_port_attr: ERR 5420: " + osm_log( p_vend->p_log, OSM_LOG_INFO, + "osm_vendor_get_all_port_attr: " "assign CA %s port %d guid (0x%"PRIx64") as the default port.\n", def_port.ca_name, def_port.portnum, cl_hton64(def_port.port_guid)); From yael at mellanox.co.il Wed Sep 14 01:04:14 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 14 Sep 2005 11:04:14 +0300 Subject: [openib-general] [PATCH] Opensm - bug in parsing db file Message-ID: <5zwtlkoz29.fsf@mtl066.yok.mtl.com> Hi Hal, There was a bug in the parsing of the guid2lid file. Attached is a patch for that. Thanks, Yael Signed-off-by: Yael Kalka Index: osm_db_files.c =================================================================== --- osm_db_files.c (revision 3412) +++ osm_db_files.c (working copy) @@ -387,9 +387,6 @@ osm_db_restore( p_prev_val = NULL; } - /* the last char of the value is newline - remove it */ - p_accum_val[strlen(p_accum_val) - 1] = '\0'; - /* store our key and value */ st_insert(p_domain_imp->p_hash, (st_data_t)p_key, (st_data_t)p_accum_val); From danb at voltaire.com Wed Sep 14 01:16:13 2005 From: danb at voltaire.com (Dan Bar Dov) Date: Wed, 14 Sep 2005 11:16:13 +0300 Subject: [openib-general] [PATCH] iSER - changes in API, socket-based connect Message-ID: Thanks, applied, rev 3415. Dan > -----Original Message----- > From: Alex Nezhinsky > Sent: Tuesday, September 13, 2005 8:02 PM > To: Dan Bar Dov > Cc: openib-general at openib.org > Subject: [openib-general] [PATCH] iSER - changes in API, > socket-based connect > > Hi, > > Attached is a patch with changes in iSER API. > 1. 
Got rid of iscsi entities stuff, now iser implies a single > iscsi entity. > 2. Connections are established using iser sockets. > The iser module registers itself as a new socket provider. > Connections are established by creating and connecting a socket. > Then iscsi should call a new iser_conn_bind() api function. > It associates an instance of struct socket * with a pair of > reciprocal connection handles. > All further api calls identify the connection using these handles. > Finally, conn_terminate() releases the socket as part of the > connection shutdown routine. > Files added: iser_socket.c, iser_socket.h > 3. Some cosmetic changes included, too. > Files deleted: iser_pdu.c, include/iser_types.h, include/iser_pdu.h > Some leftovers from the deleted files in include/*.h moved > into include/iser_api.h. > > --- > > Changes in iSER API. Single iSCSI entity supported. > Connection establishment using > sockets, registered by iSER module. Header files cleanup. > > Signed-off-by: Alexander Nezhinsky > From mst at mellanox.co.il Wed Sep 14 01:26:10 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Sep 2005 11:26:10 +0300 Subject: [openib-general] Re: Mellanox device in INIT state In-Reply-To: <20050914050003.GA29137@esmail.cup.hp.com> References: <20050914050003.GA29137@esmail.cup.hp.com> Message-ID: <20050914082610.GB28025@mellanox.co.il> Quoting Grant Grundler : > + modprobe ib_sdp > kmem_cache_create: Early error in slab request_sock_ > kernel BUG at mm/slab.c:1220! This seems to be an earlier memory corruption that is biting us now. It looks like prot->rsk_prot isn't NULL, and prot->name seems to point to zeroed memory. Grant, is this reproducible? If so, could you please try running with the following patch and see what it prints?
MST Index: linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_inet.c =================================================================== --- linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_inet.c 2005-09-11 12:36:48.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_inet.c 2005-09-14 13:14:35.000000000 +0300 @@ -1321,6 +1321,11 @@ static int __init sdp_init(void) sdp_dbg_init("SDP module load."); + printk("sdp_sk_proto.name = %s\n", sdp_sk_proto.name); + printk("sdp_sk_proto.obj_size = %lld\n", (long long)sdp_sk_proto.obj_size); + printk("sdp_init in_interrupt = %d\n", in_interrupt()); + printk("sdp_init prot->rsk_prot = %p\n", prot->rsk_prot); + result = proto_register(&sdp_sk_proto, 1); if (result < 0) { sdp_warn("INIT: Error <%d> registering sk proto,", result); -- MST From danb at voltaire.com Wed Sep 14 02:53:29 2005 From: danb at voltaire.com (Dan Bar Dov) Date: Wed, 14 Sep 2005 12:53:29 +0300 Subject: [openib-general] ISER cleanup Message-ID: The following change was committed: replace all ITRACE and IINFO with error printk Dan From halr at voltaire.com Wed Sep 14 03:27:25 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 06:27:25 -0400 Subject: [openib-general] Re: [PATCH] Opensm - change error to log In-Reply-To: <5zy860oz5x.fsf@mtl066.yok.mtl.com> References: <5zy860oz5x.fsf@mtl066.yok.mtl.com> Message-ID: <1126693488.4425.129.camel@hal.voltaire.com> On Wed, 2005-09-14 at 04:02, Yael Kalka wrote: > The following is printed as error, but should be printed as info instead. > Attached is a patch for that. Thanks. Applied.
-- Hal From halr at voltaire.com Wed Sep 14 03:30:40 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 06:30:40 -0400 Subject: [openib-general] Re: [PATCH] Opensm - bug in parsing db file In-Reply-To: <5zwtlkoz29.fsf@mtl066.yok.mtl.com> References: <5zwtlkoz29.fsf@mtl066.yok.mtl.com> Message-ID: <1126693590.4425.131.camel@hal.voltaire.com> On Wed, 2005-09-14 at 04:04, Yael Kalka wrote: > - /* the last char of the value is newline - remove it */ > - p_accum_val[strlen(p_accum_val) - 1] = '\0'; > - Is this needed for Windows ? Did you try this there ? -- Hal From yael at mellanox.co.il Wed Sep 14 04:15:45 2005 From: yael at mellanox.co.il (Yael Kalka) Date: Wed, 14 Sep 2005 14:15:45 +0300 Subject: [openib-general] RE: [PATCH] Opensm - bug in parsing db file Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30E22E5@mtlexch01.mtl.com> It is not only for Windows. We changed the file before, and now saw that we inserted a bug. We checked it on gen2. Yael -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Wednesday, September 14, 2005 1:31 PM To: Yael Kalka Cc: openib-general at openib.org; Eitan Zahavi Subject: Re: [PATCH] Opensm - bug in parsing db file On Wed, 2005-09-14 at 04:04, Yael Kalka wrote: > - /* the last char of the value is newline - remove it */ > - p_accum_val[strlen(p_accum_val) - 1] = '\0'; > - Is this needed for Windows ? Did you try this there ? -- Hal From halr at voltaire.com Wed Sep 14 04:43:14 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 07:43:14 -0400 Subject: [openib-general] RE: [PATCH] Opensm - bug in parsing db file In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30E22E5@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30E22E5@mtlexch01.mtl.com> Message-ID: <1126697998.4425.257.camel@hal.voltaire.com> On Wed, 2005-09-14 at 07:15, Yael Kalka wrote: > It is not only for Windows.
> We changed the file before, and now saw that we inserted a bug. > We checked it on gen2. Thanks. Applied. -- Hal From halr at voltaire.com Wed Sep 14 04:51:25 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 07:51:25 -0400 Subject: [openib-general] [PATCH] OpenSM: osm_inform.c: Change error message 0207 to informational Message-ID: <1126698684.4425.283.camel@hal.voltaire.com> osm_inform.c: Change error message 0207 to informational Signed-off-by: Hal Rosenstock Index: osm_inform.c =================================================================== --- osm_inform.c (revision 3417) +++ osm_inform.c (working copy) @@ -574,8 +574,8 @@ __match_notice_to_inf_rec( if( p_src_port == (osm_port_t*)cl_qmap_end( &(p_subn->port_guid_tbl)) ) { - osm_log(p_log, OSM_LOG_ERROR, - "__match_notice_to_inf_rec: ERR 0207: " + osm_log(p_log, OSM_LOG_INFO, + "__match_notice_to_inf_rec: " "Cannot find source port with GUID:0x%016" PRIx64 "\n", cl_ntoh64(source_gid.unicast.interface_id) ); goto Exit; From jackm at mellanox.co.il Wed Sep 14 05:09:03 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Wed, 14 Sep 2005 15:09:03 +0300 Subject: [PATCH] [openib-general] Strange configure error in libibcm Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA6CE@mtlexch01.mtl.com> The problem is in the ordering of checks in file userspace/libibcm/configure.in Below is a patch for the problem: ------------------------------------------------------------------------- Index: configure.in =================================================================== --- configure.in (revision 3404) +++ configure.in (working copy) @@ -12,6 +12,10 @@ dnl Checks for programs AC_PROG_CC +dnl Checks for typedefs, structures, and compiler characteristics. +AC_C_CONST +AC_CHECK_SIZEOF(long) + dnl Checks for libraries AC_CHECK_LIB(ibverbs, ibv_get_devices, [], AC_MSG_ERROR([ibv_get_devices() not found. libibcm requires libibcm.])) @@ -25,10 +29,6 @@ AC_MSG_ERROR([ not found. 
Is libibat installed?])) AC_HEADER_STDC -dnl Checks for typedefs, structures, and compiler characteristics. -AC_C_CONST -AC_CHECK_SIZEOF(long) - AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script, if test -n "`$LD --help < /dev/null 2>/dev/null | grep version-script`"; then ac_cv_version_script=yes ---------------------------------------------------------------------------- ---------------- Jack From halr at voltaire.com Wed Sep 14 06:14:07 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 09:14:07 -0400 Subject: [openib-general] RE: [PATCHv2] OpenSM: OpenIB vendor layer: Implement osm_vendor_d elete In-Reply-To: <506C3D7B14CDD411A52C00025558DED60CCF35@mtlex01.yok.mtl.com> References: <506C3D7B14CDD411A52C00025558DED60CCF35@mtlex01.yok.mtl.com> Message-ID: <1126703647.4425.358.camel@hal.voltaire.com> Hi Yael, On Sun, 2005-09-11 at 04:11, Yael Kalka wrote: > There is a problem with the patch. > 1. In osm_vendor_unbind: You used free(p_bind), when the pointer was > allocated using cl_zalloc. > You need to use cl_free. OK. > 2. There is a race between the cl_free and the receiver thread. We get > a segmentation fault due to the fact that the thread isn't destroyed > before freeing the p_bind object. There should be a way to signal the > reciever thread to exit, and the unbind should wait for that thread to > join. How can I reproduce that seg fault ? I never saw that in my testing of this. I will issue an updated version of this patch when I fix these issues. Thanks.
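Yael's second point, that unbind must signal the receiver thread to exit and wait for it to join before freeing p_bind, can be sketched with POSIX threads. This is a minimal illustration under stated assumptions, not the OpenSM vendor-layer code: struct rx_bind, rx_bind_create(), rx_bind_destroy() and the stop flag are hypothetical names, calloc()/free() stand in for the cl_zalloc()/cl_free() pair (the point from item 1 being that allocator and deallocator must match), and the receiver is assumed to wake up periodically. A real receiver blocked indefinitely in a receive call would need a receive timeout or some other way to be woken.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical bind handle; OpenSM's real p_bind object differs. */
struct rx_bind {
	pthread_t receiver;
	atomic_bool stop;	/* set by destroy, polled by the receiver */
};

/* Counts receiver threads that have left their loop (for illustration). */
static atomic_int g_receiver_exits;

static void *receiver_loop(void *arg)
{
	struct rx_bind *b = arg;
	const struct timespec tick = { 0, 10 * 1000 * 1000 };	/* 10 ms */

	while (!atomic_load(&b->stop)) {
		/* Stand-in for a receive call that returns after a timeout. */
		nanosleep(&tick, NULL);
	}
	atomic_fetch_add(&g_receiver_exits, 1);
	return NULL;
}

struct rx_bind *rx_bind_create(void)
{
	struct rx_bind *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	atomic_init(&b->stop, false);
	if (pthread_create(&b->receiver, NULL, receiver_loop, b) != 0) {
		free(b);
		return NULL;
	}
	return b;
}

/* Signal the receiver, wait for it to exit, and only then free. */
void rx_bind_destroy(struct rx_bind *b)
{
	if (!b)
		return;
	atomic_store(&b->stop, true);
	pthread_join(b->receiver, NULL);
	free(b);	/* safe: no thread can touch *b any more */
}
```

The ordering that matters is set flag, join, free: once pthread_join() returns, the receiver can no longer dereference the bind object, so the free cannot race with it the way the reported segmentation fault suggests.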
-- Hal From eitan at mellanox.co.il Wed Sep 14 06:32:37 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 14 Sep 2005 16:32:37 +0300 Subject: [openib-general] Re: [PATCHv2] OpenSM: OpenIB vendor layer: Implement osm_vendor_d elete In-Reply-To: <1126703647.4425.358.camel@hal.voltaire.com> References: <1126703647.4425.358.camel@hal.voltaire.com> Message-ID: <43282675.6040903@mellanox.co.il> Hal Rosenstock wrote: > Hi Yael, > > On Sun, 2005-09-11 at 04:11, Yael Kalka wrote: > >>There is a problem with the patch. >>1. In osm_vendor_unbind: You used free(p_bind), when the pointer was >>allocated using cl_zalloc. >> You need to use cl_free. > > > OK. > > >>2. There is a race between the cl_free and the receiver thread. We get >>a segmentation fault due to the fact that the thread isn't destroyed >>before freeing the p_bind object. There should be a way to signal the >>reciever thread to exit, and the unbind should wait for that thread to >>join. > > > How can I reproduce that seg fault ? I never saw that in my testing of > this. For us it happened every time we run OpenSM: SuSE Linux 9.3, 2.6.11.4-20a-smp I think it depends on the OS or the way glibc was compiled: gcc (GCC) 3.3.5 20050117 (prerelease) (SUSE Linux) > > I will issue an updated version of this patch when I fix these issues. > Thanks. > > -- Hal > From eitan at mellanox.co.il Wed Sep 14 06:34:27 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 14 Sep 2005 16:34:27 +0300 Subject: [openib-general] Re: [PATCH] OpenSM: osm_inform.c: Change error message 0207 to informational In-Reply-To: <1126698684.4425.283.camel@hal.voltaire.com> References: <1126698684.4425.283.camel@hal.voltaire.com> Message-ID: <432826E3.2090005@mellanox.co.il> Sure - this is what we talked about regarding the InformInfo in case the port that caused the trap is no longer accessible. What about the other ERR 0207 ? I thought it is also not more then an info. 
Hal Rosenstock wrote: > osm_inform.c: Change error message 0207 to informational > > Signed-off-by: Hal Rosenstock > > Index: osm_inform.c > =================================================================== > --- osm_inform.c (revision 3417) > +++ osm_inform.c (working copy) > @@ -574,8 +574,8 @@ __match_notice_to_inf_rec( > > if( p_src_port == (osm_port_t*)cl_qmap_end( &(p_subn->port_guid_tbl)) > ) > { > - osm_log(p_log, OSM_LOG_ERROR, > - "__match_notice_to_inf_rec: ERR 0207: " > + osm_log(p_log, OSM_LOG_INFO, > + "__match_notice_to_inf_rec: " > "Cannot find source port with GUID:0x%016" PRIx64 "\n", > cl_ntoh64(source_gid.unicast.interface_id) ); > goto Exit; > From mst at mellanox.co.il Wed Sep 14 06:44:33 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 14 Sep 2005 16:44:33 +0300 Subject: [openib-general] [PATCH] remove unnecessary include from mad.h Message-ID: <20050914134433.GE28025@mellanox.co.il> mad.h pulls in common.h which it does not actually depend on. The user should include it if he wants to. This approach reduces global namespace pollution. Signed-off-by: Michael S. 
Tsirkin Index: management/libibmad/include/infiniband/mad.h =================================================================== --- management/libibmad/include/infiniband/mad.h (revision 3423) +++ management/libibmad/include/infiniband/mad.h (working copy) @@ -36,7 +36,6 @@ #include #include -#include #ifdef __cplusplus # define BEGIN_C_DECLS extern "C" { Index: management/libibmad/src/resolve.c =================================================================== --- management/libibmad/src/resolve.c (revision 3423) +++ management/libibmad/src/resolve.c (working copy) @@ -45,6 +45,7 @@ #include #include +#include #undef DEBUG Index: management/libibmad/src/smp.c =================================================================== --- management/libibmad/src/smp.c (revision 3423) +++ management/libibmad/src/smp.c (working copy) @@ -44,6 +44,7 @@ #include #include +#include #undef DEBUG #define DEBUG if (ibdebug) WARN Index: management/libibmad/src/serv.c =================================================================== --- management/libibmad/src/serv.c (revision 3423) +++ management/libibmad/src/serv.c (working copy) @@ -47,6 +47,7 @@ #include #include +#include #undef DEBUG #define DEBUG if (ibdebug) WARN Index: management/libibmad/src/mad.c =================================================================== --- management/libibmad/src/mad.c (revision 3423) +++ management/libibmad/src/mad.c (working copy) @@ -45,6 +45,7 @@ #include #include +#include #undef DEBUG #define DEBUG if (ibdebug) WARN Index: management/libibmad/src/portid.c =================================================================== --- management/libibmad/src/portid.c (revision 3423) +++ management/libibmad/src/portid.c (working copy) @@ -44,6 +44,7 @@ #include #include +#include #undef DEBUG #define DEBUG if (ibdebug) WARN Index: management/libibmad/src/sa.c =================================================================== --- management/libibmad/src/sa.c (revision 3423) +++ 
management/libibmad/src/sa.c (working copy) @@ -44,6 +44,7 @@ #include #include +#include #undef DEBUG #define DEBUG if (ibdebug) WARN Index: management/libibmad/src/dump.c =================================================================== --- management/libibmad/src/dump.c (revision 3423) +++ management/libibmad/src/dump.c (working copy) @@ -43,6 +43,7 @@ #include #include +#include void mad_dump_int(char *buf, int bufsz, void *val, int valsz) Index: management/libibmad/src/fields.c =================================================================== --- management/libibmad/src/fields.c (revision 3423) +++ management/libibmad/src/fields.c (working copy) @@ -42,6 +42,7 @@ #include #include +#include /* * BITSOFFS and BE_OFFS are required due the fact that the bit offsets are inconsistently Index: management/libibmad/src/vendor.c =================================================================== --- management/libibmad/src/vendor.c (revision 3423) +++ management/libibmad/src/vendor.c (working copy) @@ -44,6 +44,7 @@ #include #include +#include #undef DEBUG #define DEBUG if (ibdebug) WARN -- MST From halr at voltaire.com Wed Sep 14 06:49:14 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 09:49:14 -0400 Subject: [openib-general] Re: [PATCH] OpenSM: osm_inform.c: Change error message 0207 to informational In-Reply-To: <432826E3.2090005@mellanox.co.il> References: <1126698684.4425.283.camel@hal.voltaire.com> <432826E3.2090005@mellanox.co.il> Message-ID: <1126705753.4425.362.camel@hal.voltaire.com> On Wed, 2005-09-14 at 09:34, Eitan Zahavi wrote: > Sure - this is what we talked about regarding the InformInfo > in case the port that caused the trap is no longer accessible. > > What about the other ERR 0207 ? I thought it is also not more then > an info. OK. I had previously changed that to ERR 0208 but I will change that to an info message also. Updated patch to follow. 
-- Hal From halr at voltaire.com Wed Sep 14 06:52:35 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 09:52:35 -0400 Subject: [openib-general] [PATCHv2] OpenSM: osm_inform.c: Change error messages 0207 and 0208 to informational Message-ID: <1126705954.4425.366.camel@hal.voltaire.com> osm_inform.c: Change error messages 0207 and 0208 to informational Signed-off-by: Hal Rosenstock Index: osm_inform.c =================================================================== --- osm_inform.c (revision 3417) +++ osm_inform.c (working copy) @@ -574,8 +574,8 @@ __match_notice_to_inf_rec( if( p_src_port == (osm_port_t*)cl_qmap_end( &(p_subn->port_guid_tbl)) ) { - osm_log(p_log, OSM_LOG_ERROR, - "__match_notice_to_inf_rec: ERR 0207: " + osm_log(p_log, OSM_LOG_INFO, + "__match_notice_to_inf_rec: " "Cannot find source port with GUID:0x%016" PRIx64 "\n", cl_ntoh64(source_gid.unicast.interface_id) ); goto Exit; @@ -586,8 +586,8 @@ __match_notice_to_inf_rec( cl_ntoh16(p_infr_rec->report_addr.dest_lid) ); if( !p_dest_port ) { - osm_log(p_log, OSM_LOG_ERROR, - "__match_notice_to_inf_rec: ERR 0208: " + osm_log(p_log, OSM_LOG_INFO, + "__match_notice_to_inf_rec: " "Cannot find destination port with LID:0x%04x\n", cl_ntoh16(p_infr_rec->report_addr.dest_lid) ); goto Exit; From brett at scl.ameslab.gov Wed Sep 14 07:01:13 2005 From: brett at scl.ameslab.gov (Brett Bode) Date: Wed, 14 Sep 2005 09:01:13 -0500 Subject: [openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features In-Reply-To: <1126663943.4425.127.camel@hal.voltaire.com> References: <1126609953.4382.42857.camel@hal.voltaire.com> <20050913161534.GD1685@kalmia.hozed.org> <1126628283.4514.496.camel@hal.voltaire.com> <20050914001235.GJ1685@kalmia.hozed.org> <1126663943.4425.127.camel@hal.voltaire.com> Message-ID: <952e87492dd535e7c98a841852bd3c62@scl.ameslab.gov> Hal, Let's see how many of these I can tackle... 
There are two switches in the setup and both are brand new 24 port DDR2 switches from Mellanox (sorry i don't know the switch part off the top of my head). Most of the NICs are rev a1 based NICs that have a fairly recent firmware on them. Though the opensm is running on a MT25208 InfiniHost III Ex in a dual opteron. The node that failed is an 8 way IBM pSeries p655 with: [ 176.575945] ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005) [ 176.674674] ib_mthca: Initializing Mellanox Technologies MT23108 InfiniHost (0001:62:00.0) [ 176.800432] PCI: Enabling device: (0001:62:00.0), cmd 142 lspci on a matching node gives: 0001:62:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1) Subsystem: Mellanox Technology MT23108 InfiniHost Flags: bus master, 66MHz, medium devsel, latency 144, IRQ 201 Memory at d8800000 (64-bit, non-prefetchable) [size=1M] Memory at d8000000 (64-bit, prefetchable) [size=8M] Capabilities: The pSeries nodes have firmware to hide the onboard memory due to some issues with openfirmware... At this point the node that had the issue is in a weird state where I can still login and perform some commands, but some fail (lspci hangs) and its routing is a bit screwed up as it can't see hosts over the ethernet properly either. I think that is because it crashed in the ipoib code. 
Here is the kernel oops: [2507694.118336] Oops: Kernel access of bad area, sig: 11 [#1] [2507694.131400] SMP NR_CPUS=8 PSERIES [2507694.139788] Modules linked in: pvfs2 ib_ipoib ib_sa ib_mthca ib_mad ib_core [2507694.156537] NIP: D0000000006216A0 XER: 20000000 LR: D000000000621680 CTR: C0000000001C5CD8 [2507694.176313] REGS: c0000007fe73f8a0 TRAP: 0300 Not tainted (2.6.12.3-power4) [2507694.193613] MSR: 9000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11 CR: 24000084 [2507694.211341] DAR: 0000000000000008 DSISR: 0000000042000000 [2507694.224396] TASK: c0000000081e1000[23] 'events/5' THREAD: c0000007fe73c000 CPU: 5 [2507694.241849] GPR00: D000000000621680 C0000007FE73FB20 D000000000636A98 C00000079D14A2C0 [2507694.261054] GPR04: D00000000062E648 C00000003FA4B080 0000000000000000 0000000000000000 [2507694.280223] GPR08: 0000000000000000 0000000000000000 C00000003FA557D0 C00000077CA81E80 [2507694.299362] GPR12: D0000000006289D0 C00000000052BC00 0000000000000000 0000000000000000 [2507694.318568] GPR16: 0000000000000000 0000000000000000 0000000003A10000 000000000291FE84 [2507694.337756] GPR20: 0000000000000038 0000000003EB8E28 0000000800000000 C0000000041B1AC0 [2507694.356925] GPR24: C00000002CE34380 9000000000009032 C00000003FA557E8 C00000003FA55780 [2507694.376130] GPR28: 0000000000000000 C0000007918252C0 D000000000635F80 C0000007918252C0 [2507694.395650] NIP [d0000000006216a0] .path_free+0x1a8/0x26c [ib_ipoib] [2507694.410972] LR [d000000000621680] .path_free+0x188/0x26c [ib_ipoib] [2507694.426015] Call Trace: [2507694.432189] [c0000007fe73fb20] [d000000000621680] .path_free+0x188/0x26c [ib_ipoib] (unreliable) [2507694.453198] [c0000007fe73fbd0] [d000000000621864] .ipoib_flush_paths+0x100/0x148 [ib_ipoib] [2507694.473184] [c0000007fe73fc80] [d0000000006249c0] .ipoib_ib_dev_down+0x13c/0x194 [ib_ipoib] [2507694.493149] [c0000007fe73fd20] [d000000000625004] .ipoib_ib_dev_flush+0x44/0xac [ib_ipoib] [2507694.512946] [c0000007fe73fdb0] [c00000000005ca0c] 
.worker_thread+0x244/0x318 [2507694.529880] [c0000007fe73fee0] [c0000000000630e4] .kthread+0x154/0x1a4 [2507694.545601] [c0000007fe73ff90] [c000000000013508] .kernel_thread+0x4c/0x6c [2507694.562225] Instruction dump: [2507694.569557] 38630020 419effb8 e89e8008 48007355 e8410028 e93d0020 7fa3eb78 fb890058 [2507694.588139] 60000000 e97d0020 7ffdfb78 e92b00d8 480072fd e8410028 381f0028 [2507694.607156] What is the procedure for determining if the multicast setup on the switch is trashed? I suspect that if it is, the crashed node is causing it as I had power cycled the switch yesterday which seemed to get things working up until I plugged the crashed node in again. Brett On Sep 13, 2005, at 9:12 PM, Hal Rosenstock wrote: > Hi Troy, > > On Tue, 2005-09-13 at 20:12, Troy Benjegerdes wrote: > > Here is my analysis of the log you provided. I need to do a little more > digging. I am curious as to the switch type and firmware versions of > that switch and the failed HCA. > >> At the log entry 'Sep 13 12:06:55', I plugged in the node that is >> hung/crashed >> .. which caused a bunch of opensm errors.. I have since unplugged that >> node, and can put it back in tommorow if you want more debug info. > > At that point in time, we see the following: > > Sep 13 12:06:55 936933 [417FF970] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E > TID:0x0000000000000013 > Sep 13 12:06:55 937087 [417FF970] -> osm_report_notice: Reporting > Generic Notice type:1 num:128 from LID:0x0000 > GID:0xfe80000000000000,0x0002c90200402915 > Sep 13 12:06:56 354422 [42FFF970] -> umad_receiver: ERR 5409: send > completed with error (method=1 attr=11) -- dropping. > Sep 13 12:06:56 354439 [42FFF970] -> umad_receiver: ERR 5411: DR SMP > hop ptr 0 hop count 3 DR SLID 0x0 DR DLID 0x0 > Sep 13 12:06:56 354449 [42FFF970] -> __osm_sm_mad_ctrl_send_err_cb: > ERR 3113: MAD completed in error (IB_TIMEOUT). 
> > Trap 128 is an urgent Link state of switch port changed trap. > It looks like a solicited send failed (SubnGet NodeInfo). We had an > exchange on this a while ago on the list in terms of an unresponsive > port. > > Sep 13 12:06:56 363771 [40FFF970] -> osm_drop_mgr_process: ERR 0108: > Unknown remote side for node 0x0002c90200402915 port 12. Adding to > light sweep sampling list. > Sep 13 12:06:56 363815 [40FFF970] -> Directed Path Dump of 2 hop path: > Path = [0][1][D] > > The DR display is showing the path to the switch. The dump of the SMP > shows: > hop_ptr.................0x0 > hop_count...............0x3 > Initial path: [0][1][D][C] > Also, the GUID cited is an HCA GUID rather than a switch GUID so I > doubt > it has 12 ports. I think these are just problems with the debug > messages. > > Earlier in the log: > > Sep 13 12:03:51 959970 [417FF970] -> osm_report_notice: Reporting > Generic Notice type:3 num:64 from LID:0x0001 > GID:0xfe80000000000000,0x0002c90200402781 > Sep 13 12:03:51 959986 [417FF970] -> Discovered new port with > GUID:0x0002c90200402915 LID range [0xE,0xE] of node:MT47396 > Infiniscale-III Mellanox Technologies > Sep 13 12:03:51 959996 [417FF970] -> osm_report_notice: Reporting > Generic Notice type:3 num:64 from LID:0x0001 > GID:0xfe80000000000000,0x0002c90200402781 > > It appears that the failed node is a MT47396 off switch > 0x0002c90200402781. > What firmware version is running in both of these ? What is switch > 0x0002c90200402781 ? > > A minor issue but the DR display above is not correct. The dump of the > SMP shows: > hop_ptr.................0x0 > hop_count...............0x3 > Initial path: [0][1][D][C] > > It seems to repeat this over and over again every few seconds until > things break I presume at 12:07:57. 
> > The key to me is that OpenSM continues to receive: > Sep 13 12:07:23 542642 [40FFF970] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E > TID:0x000000000000002c > Sep 13 12:07:23 542771 [40FFF970] -> osm_report_notice: Reporting > Generic Notice type:1 num:128 from LID:0x0000 > GID:0xfe80000000000000,0x0002c90200402915 > > Either OpenSM never shuts this off or it keeps bouncing the port in the > light sweep. I need to investigate this further. > > It all ends when: > Sep 13 12:07:56 574831 [40FFF970] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E > TID:0x0000000000000057 > Sep 13 12:07:56 574961 [40FFF970] -> osm_report_notice: Reporting > Generic Notice type:1 num:128 from LID:0x0000 > GID:0xfe80000000000000,0x0002c90200402915 > Sep 13 12:07:56 719968 [417FF970] -> __osm_trap_rcv_process_request: > Received Generic Notice type:0x01 num:128 Producer:2 from LID:0x000E > TID:0x0000000000000058 > Sep 13 12:07:56 720052 [417FF970] -> osm_report_notice: Reporting > Generic Notice type:1 num:128 from LID:0x0000 > GID:0xfe80000000000000,0x0002c90200402915 > > and then that switch returns a bad status in a SM GetResp PortInfo (in > response to a SM Set PortInfo): > > Sep 13 12:07:57 005832 [42FFF970] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 > (SubnGetResp) > D bit...................0x1 > status..................0x1C00 > hop_ptr.................0x0 > hop_count...............0x2 > trans_id................0x455a > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0xC > > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][D] > Return path: [0][1][18] > Reserved: [0][0][0][0][0][0][0] > > 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 > > 
00 00 00 00 00 00 00 00 00 00 00 00 > 18 03 03 02 > > 31 22 00 13 40 40 00 08 08 04 F2 40 > 00 00 00 00 > > 00 00 00 00 00 88 00 00 00 00 00 00 > 00 00 00 00 > > Sep 13 12:07:57 005891 [40FFF970] -> osm_pi_rcv_process_set: ERR 0F10: > Received Error Status for SetResp() > Sep 13 12:07:57 005908 [40FFF970] -> PortInfo dump: > port number.............0xC > > node_guid...............0x0002c90200402915 > > port_guid...............0x0002c90200402915 > > m_key...................0x0000000000000000 > > subnet_prefix...........0x0000000000000000 > base_lid................0x0 > master_sm_base_lid......0x0 > capability_mask.........0x0 > diag_code...............0x0 > m_key_lease_period......0x0 > local_port_num..........0x18 > link_width_enabled......0x3 > link_width_supported....0x3 > link_width_active.......0x2 > link_speed_supported....0x3 > port_state..............DOWN > state_info2.............0x22 > m_key_protect_bits......0x0 > lmc.....................0x0 > link_speed..............0x13 > mtu_smsl................0x40 > vl_cap_init_type........0x40 > vl_high_limit...........0x0 > vl_arb_high_cap.........0x8 > vl_arb_low_cap..........0x8 > init_rep_mtu_cap........0x4 > vl_stall_life...........0xF2 > vl_enforce..............0x40 > m_key_violations........0x0 > p_key_violations........0x0 > q_key_violations........0x0 > guid_cap................0x0 > subnet_timeout..........0x0 > resp_time_value.........0x0 > error_threshold.........0x88 > Sep 13 12:07:57 005951 [40FFF970] -> Capabilities Mask: > > That is when things stop working. Likely multicast in that switch is > not > working. I'd be curious whether the multicast setup in that switch is > trashed or not. That can be determined with the diag tools. Let me know > if you would like me to document the procedure for this. > > There is a pending issue with Sets of PortInfo getting this status back > which has been on this list. Not sure whether this is a related problem > or not. 
> > -- Hal > > From halr at voltaire.com Wed Sep 14 07:05:15 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 10:05:15 -0400 Subject: [openib-general] Re: [PATCH] remove unnecessary include from mad.h In-Reply-To: <20050914134433.GE28025@mellanox.co.il> References: <20050914134433.GE28025@mellanox.co.il> Message-ID: <1126706714.4425.368.camel@hal.voltaire.com> On Wed, 2005-09-14 at 09:44, Michael S. Tsirkin wrote: > mad.h pulls in common.h which it does not actually depend on. > The user should include it if he wants to. > This approach reduces global namespace pollution. Thanks. Applied. -- Hal From eitan at mellanox.co.il Wed Sep 14 07:33:37 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 14 Sep 2005 17:33:37 +0300 Subject: [openib-general] Re: [PATCHv2] OpenSM: osm_inform.c: Change error messages 0207 and 0208 to informational In-Reply-To: <1126705954.4425.366.camel@hal.voltaire.com> References: <1126705954.4425.366.camel@hal.voltaire.com> Message-ID: <432834C1.2010302@mellanox.co.il> Approved. 
Hal Rosenstock wrote: > osm_inform.c: Change error messages 0207 and 0208 to informational > > Signed-off-by: Hal Rosenstock > > Index: osm_inform.c > =================================================================== > --- osm_inform.c (revision 3417) > +++ osm_inform.c (working copy) > @@ -574,8 +574,8 @@ __match_notice_to_inf_rec( > > if( p_src_port == (osm_port_t*)cl_qmap_end( &(p_subn->port_guid_tbl)) > ) > { > - osm_log(p_log, OSM_LOG_ERROR, > - "__match_notice_to_inf_rec: ERR 0207: " > + osm_log(p_log, OSM_LOG_INFO, > + "__match_notice_to_inf_rec: " > "Cannot find source port with GUID:0x%016" PRIx64 "\n", > cl_ntoh64(source_gid.unicast.interface_id) ); > goto Exit; > @@ -586,8 +586,8 @@ __match_notice_to_inf_rec( > cl_ntoh16(p_infr_rec->report_addr.dest_lid) ); > if( !p_dest_port ) > { > - osm_log(p_log, OSM_LOG_ERROR, > - "__match_notice_to_inf_rec: ERR 0208: " > + osm_log(p_log, OSM_LOG_INFO, > + "__match_notice_to_inf_rec: " > "Cannot find destination port with LID:0x%04x\n", > cl_ntoh16(p_infr_rec->report_addr.dest_lid) ); > goto Exit; > From eitan at mellanox.co.il Wed Sep 14 08:08:27 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 14 Sep 2005 18:08:27 +0300 Subject: [openib-general] RMPP Message Format Errors In-Reply-To: <506C3D7B14CDD411A52C00025558DED607C306CE@mtlex01.yok.mtl.com> References: <506C3D7B14CDD411A52C00025558DED607C306CE@mtlex01.yok.mtl.com> Message-ID: <43283CEB.6050402@mellanox.co.il> Hi Hal,Sean I tested today what I think is the trunk Gen2 core with trunk OpenSM and still see some RMPP packet issues. It looks from the osmtest log file that the calculation of the reassembled packet size as there is always some extra bytes: Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88) Aug 21 14:46:56 [4017F6C0] -> osmtest_write_all_node_recs: Received 1 records. 
Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 3 = 200 / 64 (8) Aug 21 14:46:56 [4017F6C0] -> osmtest_write_all_port_recs: Received 3 records. I attach also some Analyzer file that show paylen is wrong, Is it possible the patches (of the core) that fix the rmpp paylen are not part of the main trunk? Thanks Eitan Eitan Zahavi wrote: > Hi Sean, Hal, > > We have started testing RMPP packets with osmtest and opensm (gen2 > version). > > We did not go very far. The first NodeRecord GetTable of all the nodes > in a "loopback" case, has some issues. > > The explanation is below: > > 1. NodeRecord MAD size is 112bytes (note the required padding of 4 > bytes at the end of the NodeRec data). > 2. OpenSM log file shows the query should return 2 records one for > each end-port. This really happens: > > > Aug 21 14:59:49 998104 [40D9DBB0] -> __osm_nr_rcv_create_nr: > Looking for NodeRecord with LID: 0x0 GUID:0x0000000000000000 > > Aug 21 14:59:49 998224 [40D9DBB0] -> __osm_nr_rcv_new_nr: New > NodeRecord: node 0x0002c902000017a0 > > port 0x0002c902000017a1, lid > 0x1. > > Aug 21 14:59:49 998327 [40D9DBB0] -> __osm_nr_rcv_new_nr: New > NodeRecord: node 0x0002c902000017a0 > > port 0x0002c902000017a2, lid > 0x2. > > Aug 21 14:59:49 998395 [40D9DBB0] -> osm_nr_rcv_process: > Returning 2 records. > > 3. On the wire we see the following (see attached gif for more > details): > a. Two data segments were sent and two ACKs were returned. This is > OK. > b. The first segment reports PayLen = 440bytes. According to the > spec the first segment might provide paylen != 0 and when it is done it > should be equal to the (class header * Num-Segments) + data length. In > our case we have data length = 2*112, and SA extra header = 20byte * > 2seg. This leads to peylen=264 and not 440!!! > The spec defines that in p775-l37. > So this is a violation of the spec. > c. The last segment (segment 2) provides the paylen field of 100. 
> The expected value for the last segment length should have been: SA > extra header + leftover data size from prev segments. Since the first > segment has 200bytes for data the left over should have been 112*2 - 200 > = 24. With the SA extra header 44bytes. > So this is another violation of the spec. > d. The analyzer is confused by the above and reports the result as > having 3 NodeRecords. > e. <> > 4. Following that when we trace the log file of osmtest we find > more issues. Probably caused by changes to the vendor layer or the rmpp > assembly: It is expected that after assembly the size of the RMPP mad > reported to the osm vendor layer will be the rmpp header + SA extra > header + data-size. In our case that is 32 + 20 + 2*112 = 276. > > The log file shows: > > Aug 21 14:59:49 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = > 200 / 112 (88) > > Aug 21 14:59:49 [4017F6C0] -> osmtest_write_all_node_recs: > Received 1 records > > So this is another problem - probably with the way RMPP results > are assembled or pass back to the vendor. > > Please let me know if you will have time to dig into these problems or > if I should try and resolve them myself and provide patches. > > Thanks > > Eitan > > Eitan Zahavi > > Design Technology Director > > Mellanox Technologies LTD > > Tel:+972-4-9097208 > Fax:+972-4-9593245 > > P.O. Box 586 Yokneam 20692 ISRAEL > > > > > ------------------------------------------------------------------------ > > > ------------------------------------------------------------------------ > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- A non-text attachment was scrubbed... 
Name: gen2 rmpp 14 sep 2005.iba Type: application/octet-stream Size: 15449 bytes Desc: not available URL: From halr at voltaire.com Wed Sep 14 08:14:30 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 11:14:30 -0400 Subject: [openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features In-Reply-To: <952e87492dd535e7c98a841852bd3c62@scl.ameslab.gov> References: <1126609953.4382.42857.camel@hal.voltaire.com> <20050913161534.GD1685@kalmia.hozed.org> <1126628283.4514.496.camel@hal.voltaire.com> <20050914001235.GJ1685@kalmia.hozed.org> <1126663943.4425.127.camel@hal.voltaire.com> <952e87492dd535e7c98a841852bd3c62@scl.ameslab.gov> Message-ID: <1126710869.4425.436.camel@hal.voltaire.com> Hi Brett, On Wed, 2005-09-14 at 10:01, Brett Bode wrote: > Let's see how many of these I can tackle... Thanks for the configuration information. It puts things into more perspective. There may be more specifics needed but we'll see where we get to on this. > There are two switches > in the setup and both are brand new 24 port DDR2 switches from Mellanox > (sorry i don't know the switch part off the top of my head). Most of > the NICs are rev a1 based NICs that have a fairly recent firmware on > them. Though the opensm is running on a MT25208 InfiniHost III Ex in a > dual opteron. 
The node that failed is an 8 way IBM pSeries p655 with: > [ 176.575945] ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, > 2005) > [ 176.674674] ib_mthca: Initializing Mellanox Technologies MT23108 > InfiniHost (0001:62:00.0) > [ 176.800432] PCI: Enabling device: (0001:62:00.0), cmd 142 > > lspci on a matching node gives: > 0001:62:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1) > Subsystem: Mellanox Technology MT23108 InfiniHost > Flags: bus master, 66MHz, medium devsel, latency 144, IRQ 201 > Memory at d8800000 (64-bit, non-prefetchable) [size=1M] > Memory at d8000000 (64-bit, prefetchable) [size=8M] > Capabilities: > > The pSeries nodes have firmware to hide the onboard memory due to some > issues with openfirmware... At this point the node that had the issue > is in a weird state where I can still login and perform some commands, > but some fail (lspci hangs) and its routing is a bit screwed up as it > can't see hosts over the ethernet properly either. I think that is > because it crashed in the ipoib code. 
Here is the kernel oops: > [2507694.118336] Oops: Kernel access of bad area, sig: 11 [#1] > [2507694.131400] SMP NR_CPUS=8 PSERIES > [2507694.139788] Modules linked in: pvfs2 ib_ipoib ib_sa ib_mthca > ib_mad ib_core > [2507694.156537] NIP: D0000000006216A0 XER: 20000000 LR: > D000000000621680 CTR: C0000000001C5CD8 > [2507694.176313] REGS: c0000007fe73f8a0 TRAP: 0300 Not tainted > (2.6.12.3-power4) > [2507694.193613] MSR: 9000000000001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: > 11 CR: 24000084 > [2507694.211341] DAR: 0000000000000008 DSISR: 0000000042000000 > [2507694.224396] TASK: c0000000081e1000[23] 'events/5' THREAD: > c0000007fe73c000 CPU: 5 > [2507694.241849] GPR00: D000000000621680 C0000007FE73FB20 > D000000000636A98 C00000079D14A2C0 > [2507694.261054] GPR04: D00000000062E648 C00000003FA4B080 > 0000000000000000 0000000000000000 > [2507694.280223] GPR08: 0000000000000000 0000000000000000 > C00000003FA557D0 C00000077CA81E80 > [2507694.299362] GPR12: D0000000006289D0 C00000000052BC00 > 0000000000000000 0000000000000000 > [2507694.318568] GPR16: 0000000000000000 0000000000000000 > 0000000003A10000 000000000291FE84 > [2507694.337756] GPR20: 0000000000000038 0000000003EB8E28 > 0000000800000000 C0000000041B1AC0 > [2507694.356925] GPR24: C00000002CE34380 9000000000009032 > C00000003FA557E8 C00000003FA55780 > [2507694.376130] GPR28: 0000000000000000 C0000007918252C0 > D000000000635F80 C0000007918252C0 > [2507694.395650] NIP [d0000000006216a0] .path_free+0x1a8/0x26c > [ib_ipoib] > [2507694.410972] LR [d000000000621680] .path_free+0x188/0x26c [ib_ipoib] > [2507694.426015] Call Trace: > [2507694.432189] [c0000007fe73fb20] [d000000000621680] > .path_free+0x188/0x26c [ib_ipoib] (unreliable) > [2507694.453198] [c0000007fe73fbd0] [d000000000621864] > .ipoib_flush_paths+0x100/0x148 [ib_ipoib] > [2507694.473184] [c0000007fe73fc80] [d0000000006249c0] > .ipoib_ib_dev_down+0x13c/0x194 [ib_ipoib] > [2507694.493149] [c0000007fe73fd20] [d000000000625004] > .ipoib_ib_dev_flush+0x44/0xac 
[ib_ipoib] > [2507694.512946] [c0000007fe73fdb0] [c00000000005ca0c] > .worker_thread+0x244/0x318 > [2507694.529880] [c0000007fe73fee0] [c0000000000630e4] > .kthread+0x154/0x1a4 > [2507694.545601] [c0000007fe73ff90] [c000000000013508] > .kernel_thread+0x4c/0x6c > [2507694.562225] Instruction dump: > [2507694.569557] 38630020 419effb8 e89e8008 48007355 e8410028 e93d0020 > 7fa3eb78 fb890058 > [2507694.588139] 60000000 e97d0020 7ffdfb78 e92b00d8 > 480072fd e8410028 381f0028 > [2507694.607156] What OpenIB svn version are you running ? > What is the procedure for determining if the multicast setup on the > switch is trashed? When the failure occurs: Please run ibnetdiscover and send the output. Also run ibchecknet to see what this shows ibroute - display unicast and multicast forwarding tables of switches So determine the LIDs of the switches (ibswitches can help with this) So it's something like: ibnetdiscover top1 ibswitches top1 Switch : 0x005442ba00003080 ports 24 "MT47396 Infiniscale-III Mellanox Technologies" port 0 lid 2 Switch : 0x0008f10400410015 ports 8 "SW-6IB4 Voltaire" port 0 lid 5 ibroute -M 2 Multicast mlids [0xc000-0xc3ff] of switch Lid 0x2 guid 0x005442ba00003080 (MT47396 Infiniscale-III Mellanox Technologies): 0 1 2 Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 MLid 0xc000 x 0xc001 x 0xc002 x 0xc003 x 4 valid mlids dumped ibroute -M 5 Multicast mlids [0xc000-0xc3ff] of switch Lid 0x5 guid 0x0008f10400410015 (SW-6IB4 Voltaire): Ports: 0 1 2 3 4 5 6 7 8 MLid 0xc000 x x x 0xc001 x x x 0xc003 x x x 0xc004 x x 0xc005 x 0xc006 x 6 valid mlids dumped The LIDs to use are configuration dependent and depend on what the OpenSM hands out. There is also ibtracert ibtracert - display unicast or multicast route from source to destination > I suspect that if it is, the crashed node is causing it as I had power > cycled the switch yesterday which seemed to get things working up until > I plugged the crashed node in again. 
But without recycling the switch things don't work, right ? With just unplugging this node, it doesn't work ? It sounds like the switch has some issue. Can you tell if it forwards any packets ? -- Hal From halr at voltaire.com Wed Sep 14 08:19:50 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 11:19:50 -0400 Subject: [openib-general] RMPP Message Format Errors In-Reply-To: <43283CEB.6050402@mellanox.co.il> References: <506C3D7B14CDD411A52C00025558DED607C306CE@mtlex01.yok.mtl.com> <43283CEB.6050402@mellanox.co.il> Message-ID: <1126711190.4425.439.camel@hal.voltaire.com> On Wed, 2005-09-14 at 11:08, Eitan Zahavi wrote: > Hi Hal,Sean > > I tested today what I think is the trunk Gen2 core with trunk OpenSM and > still see some RMPP packet issues. What svn version ? > It looks from the osmtest log file that > the calculation of the reassembled packet size as there is always some > extra bytes: > > Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88) > Aug 21 14:46:56 [4017F6C0] -> osmtest_write_all_node_recs: Received 1 records. > > Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 3 = 200 / 64 (8) > Aug 21 14:46:56 [4017F6C0] -> osmtest_write_all_port_recs: Received 3 records. > > I attach also some Analyzer file that show paylen is wrong, I will look at this shortly. Unfortunately I do not have IBA 1.2 decode so this is by hand. > Is it possible the patches (of the core) that fix the rmpp paylen are not > part of the main trunk? No. The patches are part of this. It would depend on what OpenIB svn version you are running with but if it is a recent pull then they are all there. 
-- Hal From halr at voltaire.com Wed Sep 14 08:26:43 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 11:26:43 -0400 Subject: [openib-general] RMPP Message Format Errors In-Reply-To: <1126711190.4425.439.camel@hal.voltaire.com> References: <506C3D7B14CDD411A52C00025558DED607C306CE@mtlex01.yok.mtl.com> <43283CEB.6050402@mellanox.co.il> <1126711190.4425.439.camel@hal.voltaire.com> Message-ID: <1126711603.4425.443.camel@hal.voltaire.com> On Wed, 2005-09-14 at 11:19, Hal Rosenstock wrote: > On Wed, 2005-09-14 at 11:08, Eitan Zahavi wrote: > > I attach also some Analyzer file that show paylen is wrong, > > I will look at this shortly. Unfortunately I do not have IBA 1.2 decode > so this is by hand. It doesn't look right. You either don't have all the changes or haven't restarted things since you picked them up. The first thing I notice in the trace is that middle segments don't have their payload length set to 0. So I'm not sure this was a valid experiment. -- Hal
From ardavis at ichips.intel.com Wed Sep 14 09:57:43 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 14 Sep 2005 09:57:43 -0700 Subject: [openib-general] userspace CM API for per device handling In-Reply-To: <43276839.40605@ichips.intel.com> References: <52hdcxpv6b.fsf@cisco.com> <43272272.6030606@ichips.intel.com> <43276839.40605@ichips.intel.com> Message-ID: <43285687.6040402@ichips.intel.com> Sean Hefty wrote: > Sean Hefty wrote: > >> For the userspace portion, I'm still trying to decide what the >> correct API should be. I'd like to avoid apps from having to call >> something like ib_cm_get_devices(), which would mirror the verbs >> call. I was thinking of having ib_cm_create_id() still take a struct >> ibv_context* as input, opening the corresponding CM node, and >> managing that internally. Thoughts? > > > To further define this: > > The kernel ucm module creates one CM device per physical device, > somewhat mirroring the work done by uverbs. (E.g. infiniband_cm/ucm0 > references the same device as infiniband_verbs/uverbs0). > > All CM devices are opened internally by the userspace CM and can be > mapped to a corresponding ibv_device using a GUID. This works okay, > except for the current call: > > ib_cm_get_event(**event); > > which can now map to multiple fd's. Some possible solutions are: > > 1. Add calls similar to ib_cm_get_devices() and ib_cm_open_device(), > making the CM devices explicit to the user. ib_cm_get_event() would > take a CM device as input. This requires that users manage not only a > list of HCAs, but also a mirror list of CM devices. > > 2. Change ib_cm_get_event(struct ibv_context *device_context, > **event). The mapping from the device to the corresponding CM fd is > performed internally, but requires a search based on the GUID. > > 3. Same as #2, but store the CM fd in the ibv_context to avoid the > search. This breaks the encapsulation between the CM and verbs. > > 4. 
Have ib_cm_get_event() operate across all CM devices. User events are processed (poll/select) with FDs, so can we just use the FD to get events? This would give the user a direct mapping back to the correct device based on the poll or select results. Something like... 5. ib_cm_get_fd(struct ibv_context *device_context) and ib_cm_get_event(int fd, **event). and maybe consider a verbs change like.... ibv_get_async_event(int fd, struct ibv_async_event *event); ibv_get_cq_event(int fd, struct ibv_cq **cq, void **cq_context); comments? -arlin > > - Sean > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From rolandd at cisco.com Wed Sep 14 10:00:40 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 14 Sep 2005 10:00:40 -0700 Subject: [PATCH] [openib-general] Strange configure error in libibcm In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA6CE@mtlexch01.mtl.com> (Jack Morgenstein's message of "Wed, 14 Sep 2005 15:09:03 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA6CE@mtlexch01.mtl.com> Message-ID: <52ek7r1t53.fsf@cisco.com> Jack> The problem is in the ordering of checks in file Jack> userspace/libibcm/configure.in Jack> Below is a patch for the problem: Yes, this will mask the problem for sure. However the underlying issue is still that ld has a different search path than ld.so. I'll apply this since it will make the error message easier to understand. - R.
From rolandd at cisco.com Wed Sep 14 10:01:47 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 14 Sep 2005 10:01:47 -0700 Subject: [PATCH] [openib-general] Strange configure error in libibcm In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA6CE@mtlexch01.mtl.com> (Jack Morgenstein's message of "Wed, 14 Sep 2005 15:09:03 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E30AA6CE@mtlexch01.mtl.com> Message-ID: <52acif1t38.fsf@cisco.com> Oh, Sean beat me to applying it. - R. From jlentini at netapp.com Wed Sep 14 10:05:45 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 14 Sep 2005 13:05:45 -0400 (EDT) Subject: [openib-general] [PATCH][uDAPL] use IBAT ATS Message-ID: Hi Arlin, I'd like to checkin the following patch (also attached) to uDAPL. Please let me know if you see any issues. james Index: dapl/openib/dapl_ib_cm.c =================================================================== --- dapl/openib/dapl_ib_cm.c (revision 3408) +++ dapl/openib/dapl_ib_cm.c (working copy) @@ -207,7 +207,8 @@ static void dapli_rt_comp_handler(uint64 } status = ib_at_route_by_ip(((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr, - 0, 0, 0, &conn->dapl_rt, + 0, 0, IB_AT_ROUTE_FORCE_ATS, + &conn->dapl_rt, &conn->dapl_comp,&conn->dapl_comp.req_id); if (status < 0) { dapl_dbg_log(DAPL_DBG_TYPE_ERR, "dapl_rt_comp_handler: " @@ -607,7 +608,8 @@ dapls_ib_connect ( status = ib_at_route_by_ip( ((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr, ((struct sockaddr_in *)&conn->hca->hca_address)->sin_addr.s_addr, - 0, 0, &conn->dapl_rt, &conn->dapl_comp, &conn->dapl_comp.req_id); + 0, IB_AT_ROUTE_FORCE_ATS, &conn->dapl_rt, &conn->dapl_comp, + &conn->dapl_comp.req_id); dapl_dbg_log(DAPL_DBG_TYPE_CM, " connect: at_route ret=%d,%s req_id %d GID %016llx %016llx\n", status, strerror(errno), conn->dapl_comp.req_id, From mst at mellanox.co.il Wed Sep 14 10:15:21 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 14 Sep 2005 20:15:21 +0300 Subject: [openib-general] [PATCH] libibcm/libibat disable-libcheck option Message-ID: <20050914171521.GA31528@mellanox.co.il> Add an option to disable configure checks for ib libraries. This makes it possible to first configure all libraries, then make them all. Signed-off-by: Michael S.
Tsirkin Index: userspace/libibcm/configure.in =================================================================== --- userspace.orig/libibcm/configure.in 2005-09-14 20:06:55.000000000 +0300 +++ userspace/libibcm/configure.in 2005-09-14 20:09:22.000000000 +0300 @@ -9,6 +9,12 @@ AM_INIT_AUTOMAKE(libibcm, 0.9.0) AC_DISABLE_STATIC AM_PROG_LIBTOOL +AC_ARG_ENABLE(libcheck, [ --disable-libcheck do not test for presence of ib libraries], +[ if test x$enableval = xno ; then + disable_libcheck=yes + fi +]) + dnl Checks for programs AC_PROG_CC @@ -17,16 +23,22 @@ AC_C_CONST AC_CHECK_SIZEOF(long) dnl Checks for libraries +if test "$disable_libcheck" != "yes" +then AC_CHECK_LIB(ibverbs, ibv_get_devices, [], AC_MSG_ERROR([ibv_get_devices() not found. libibcm requires libibcm.])) AC_CHECK_LIB(ibat, ib_at_route_by_ip, [], AC_MSG_ERROR([ib_at_route_by_ip() not found. libibcm requires libat.])) +fi dnl Checks for header files. +if test "$disable_libcheck" != "yes" +then AC_CHECK_HEADER(infiniband/verbs.h, [], AC_MSG_ERROR([ not found. Is libibverbs installed?])) AC_CHECK_HEADER(infiniband/at.h, [], AC_MSG_ERROR([ not found. Is libibat installed?])) +fi AC_HEADER_STDC AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script, Index: userspace/libibat/configure.in =================================================================== --- userspace.orig/libibat/configure.in 2005-08-03 11:38:47.000000000 +0300 +++ userspace/libibat/configure.in 2005-09-14 20:12:03.000000000 +0300 @@ -9,14 +9,23 @@ AM_INIT_AUTOMAKE(libibat, 0.9.0) AC_DISABLE_STATIC AM_PROG_LIBTOOL +AC_ARG_ENABLE(libcheck, [ --disable-libcheck do not test for presence of ib libraries], +[ if test x$enableval = xno ; then + disable_libcheck=yes + fi +]) + dnl Checks for programs AC_PROG_CC dnl Checks for libraries dnl Checks for header files. +if test "$disable_libcheck" != "yes" +then AC_CHECK_HEADER(infiniband/verbs.h, [], AC_MSG_ERROR([ not found. 
Is libibverbs installed?])) +fi AC_HEADER_STDC dnl Checks for typedefs, structures, and compiler characteristics. -- MST From ardavis at ichips.intel.com Wed Sep 14 10:19:23 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 14 Sep 2005 10:19:23 -0700 Subject: [openib-general] Re: [PATCH][uDAPL] use IBAT ATS In-Reply-To: References: Message-ID: <43285B9B.9080800@ichips.intel.com> James Lentini wrote: >Hi Arlin, > >I'd like to checkin the following patch (also attached) to uDAPL. >Please let me know if you see any issues. > > Look fine to me. >james > >Index: dapl/openib/dapl_ib_cm.c >=================================================================== >--- dapl/openib/dapl_ib_cm.c (revision 3408) >+++ dapl/openib/dapl_ib_cm.c (working copy) >@@ -207,7 +207,8 @@ static void dapli_rt_comp_handler(uint64 > } > > status = ib_at_route_by_ip(((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr, >- 0, 0, 0, &conn->dapl_rt, >+ 0, 0, IB_AT_ROUTE_FORCE_ATS, >+ &conn->dapl_rt, > &conn->dapl_comp,&conn->dapl_comp.req_id); > if (status < 0) { > dapl_dbg_log(DAPL_DBG_TYPE_ERR, "dapl_rt_comp_handler: " >@@ -607,7 +608,8 @@ dapls_ib_connect ( > status = ib_at_route_by_ip( > ((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr, > ((struct sockaddr_in *)&conn->hca->hca_address)->sin_addr.s_addr, >- 0, 0, &conn->dapl_rt, &conn->dapl_comp, &conn->dapl_comp.req_id); >+ 0, IB_AT_ROUTE_FORCE_ATS, &conn->dapl_rt, &conn->dapl_comp, >+ &conn->dapl_comp.req_id); > > dapl_dbg_log(DAPL_DBG_TYPE_CM, " connect: at_route ret=%d,%s req_id %d GID %016llx %016llx\n", > status, strerror(errno), conn->dapl_comp.req_id, > >------------------------------------------------------------------------ > >Index: dapl/openib/dapl_ib_cm.c >=================================================================== >--- dapl/openib/dapl_ib_cm.c (revision 3408) >+++ dapl/openib/dapl_ib_cm.c (working copy) >@@ -207,7 +207,8 @@ static void dapli_rt_comp_handler(uint64 > } > > status = 
ib_at_route_by_ip(((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr, >- 0, 0, 0, &conn->dapl_rt, >+ 0, 0, IB_AT_ROUTE_FORCE_ATS, >+ &conn->dapl_rt, > &conn->dapl_comp,&conn->dapl_comp.req_id); > if (status < 0) { > dapl_dbg_log(DAPL_DBG_TYPE_ERR, "dapl_rt_comp_handler: " >@@ -607,7 +608,8 @@ dapls_ib_connect ( > status = ib_at_route_by_ip( > ((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr, > ((struct sockaddr_in *)&conn->hca->hca_address)->sin_addr.s_addr, >- 0, 0, &conn->dapl_rt, &conn->dapl_comp, &conn->dapl_comp.req_id); >+ 0, IB_AT_ROUTE_FORCE_ATS, &conn->dapl_rt, &conn->dapl_comp, >+ &conn->dapl_comp.req_id); > > dapl_dbg_log(DAPL_DBG_TYPE_CM, " connect: at_route ret=%d,%s req_id %d GID %016llx %016llx\n", > status, strerror(errno), conn->dapl_comp.req_id, > > From mshefty at ichips.intel.com Wed Sep 14 10:25:18 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 14 Sep 2005 10:25:18 -0700 Subject: [openib-general] userspace CM API for per device handling In-Reply-To: <43285687.6040402@ichips.intel.com> References: <52hdcxpv6b.fsf@cisco.com> <43272272.6030606@ichips.intel.com> <43276839.40605@ichips.intel.com> <43285687.6040402@ichips.intel.com> Message-ID: <43285CFE.2080502@ichips.intel.com> Arlin Davis wrote: > User events are processed (poll/select) with FD's so can we just use the > FD to get events? This would give the user a direct mapping back to the > correct device based on the poll or select results. > > Something like... > > 5. ib_cm_get_fd( struct ibv_context *device_context) and > ib_cm_get_event(int fd, **event). I think that this is a good approach. It avoids searching lists when calling ib_cm_get_event(), but still gives the flexibility of per device event handling. Plus users don't have to track CM devices separately from verb devices. 
> ibv_get_async_event(int fd, struct ibv_async_event *event); > ibv_get_cq_event(int fd, struct ibv_cq **cq, void **cq_context); This makes sense and avoids users from having to map from an fd back to a verbs defined data structure, that is used to reference the fd. Roland, what do you think? - Sean From jlentini at netapp.com Wed Sep 14 10:40:56 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 14 Sep 2005 13:40:56 -0400 (EDT) Subject: [openib-general] Re: [PATCH][uDAPL] use IBAT ATS In-Reply-To: <43285B9B.9080800@ichips.intel.com> References: <43285B9B.9080800@ichips.intel.com> Message-ID: On Wed, 14 Sep 2005, Arlin Davis wrote: > James Lentini wrote: > > > Hi Arlin, > > > > I'd like to checkin the following patch (also attached) to uDAPL. Please let > > me know if you see any issues. > > > Look fine to me. Committed in revision 3432. From brett at scl.ameslab.gov Wed Sep 14 12:13:32 2005 From: brett at scl.ameslab.gov (Brett Bode) Date: Wed, 14 Sep 2005 14:13:32 -0500 Subject: [openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features In-Reply-To: <1126710869.4425.436.camel@hal.voltaire.com> References: <1126609953.4382.42857.camel@hal.voltaire.com> <20050913161534.GD1685@kalmia.hozed.org> <1126628283.4514.496.camel@hal.voltaire.com> <20050914001235.GJ1685@kalmia.hozed.org> <1126663943.4425.127.camel@hal.voltaire.com> <952e87492dd535e7c98a841852bd3c62@scl.ameslab.gov> <1126710869.4425.436.camel@hal.voltaire.com> Message-ID: <4c50eb9e7a888cebad3dd931b9677592@scl.ameslab.gov> Hal, I have found out a bit more information. I think you are correct that the switch was getting messed up. I had tried resetting the switch with the old opensm code we had been running and found that fixed things up until the bad node was plugged in. We had not reset the switch since upgrading the opensm code. Upon doing that all seems to work again. 
OpenSM throws the errors below because of the bad node, but it appears to continue to correctly configure the remaining network. So I am currently thinking the latest opensm more or less correctly deals with the failed node. I also suspect the older opensm not only handled the error badly but somehow caused the switch to get into a confused state that the new opensm couldn't fix without a reset. Here are the repeated errors thrown: -------------- next part -------------- A non-text attachment was scrubbed... Name: osm-error.log Type: application/octet-stream Size: 24279 bytes Desc: not available URL: -------------- next part -------------- Here is the output of the other commands you suggested with everything working: -------------- next part -------------- A non-text attachment was scrubbed... Name: ib-works Type: application/octet-stream Size: 7629 bytes Desc: not available URL: -------------- next part -------------- Brett On Sep 14, 2005, at 10:14 AM, Hal Rosenstock wrote: > > What OpenIB svn version are you running ? > >> What is the procedure for determining if the multicast setup on the >> switch is trashed? > > When the failure occurs: > > Please run ibnetdiscover and send the output.
> Also run ibchecknet to see what this shows > > ibroute - display unicast and multicast forwarding tables of switches > > So determine the LIDs of the switches (ibswitches can help with this) > > So it's something like: > ibnetdiscover top1 > ibswitches top1 > Switch : 0x005442ba00003080 ports 24 "MT47396 Infiniscale-III > Mellanox Technologies" port 0 lid 2 > Switch : 0x0008f10400410015 ports 8 "SW-6IB4 Voltaire" port 0 lid 5 > > ibroute -M 2 > Multicast mlids [0xc000-0xc3ff] of switch Lid 0x2 guid > 0x005442ba00003080 (MT47396 Infiniscale-III Mellanox Technologies): > 0 1 2 > Ports: 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 > MLid > 0xc000 x > 0xc001 x > 0xc002 x > 0xc003 x > 4 valid mlids dumped > > ibroute -M 5 > Multicast mlids [0xc000-0xc3ff] of switch Lid 0x5 guid > 0x0008f10400410015 (SW-6IB4 Voltaire): > Ports: 0 1 2 3 4 5 6 7 8 > MLid > 0xc000 x x x > 0xc001 x x x > 0xc003 x x x > 0xc004 x x > 0xc005 x > 0xc006 x > 6 valid mlids dumped > > The LIDs to use are configuration dependent and depend on what the > OpenSM hands out. > > There is also ibtracert > ibtracert - display unicast or multicast route from source to > destination > >> I suspect that if it is, the crashed node is causing it as I had power >> cycled the switch yesterday which seemed to get things working up >> until >> I plugged the crashed node in again. > > But without recycling the switch things don't work, right ? With just > unplugging this node, it doesn't work ? It sounds like the switch has > some issue. Can you tell if it forwards any packets ? 
> > -- Hal > > From makia at llnl.gov Wed Sep 14 12:22:04 2005 From: makia at llnl.gov (Makia Minich) Date: Wed, 14 Sep 2005 12:22:04 -0700 Subject: [openib-general] mvapich-gen2 question In-Reply-To: <62B3DD38-24CC-11DA-97AD-000D932C3754@cse.ohio-state.edu> References: <20050914013153.GK16264@langley.llnl.gov> <62B3DD38-24CC-11DA-97AD-000D932C3754@cse.ohio-state.edu> Message-ID: <20050914192203.GL16264@langley.llnl.gov> Thanks for the help; the memlock change seems to have fixed this problem. * Weikuan Yu (yuw at cse.ohio-state.edu) wrote: > > On Sep 13, 2005, at 9:31 PM, Makia Minich wrote: > > >I'm using a RHEL4 based system with the backport-2.6.9 svn drop > >(svn3279). > >Building the mvapich-gen2 from subversion against this, everything > >seems to > >be ok, and installing it goes well. The problem is when I run a test I > >get the following error: > > > >:::::: > >=> mpicc -o osu-bw osu-bw.c > >=> mpirun_rsh -rsh -hostfile ~/machines -np 2 ./osu-bw > >/benchmarks/osu/src > >/benchmarks/osu/src > >[1] Abort: Error creating CQ > > at line 121 in file viainit.c > > This means one of your nodes has some problems in allocating resources. > Please check the output of the following command > > # ulimit -l > > You may have a default, limited mlock limit, 32k for example. If so, > please do these steps > a) un-comment the following line in /etc/security/limits.conf to remove memlock > limit. > # * soft memlock unlimited > > b) And also put another line to the beginning of /etc/init.d/sshd to > make it default for any new login. > ulimit -l unlimited > > Please let us know if the memlock limit is the problem you are facing. > > >mpirun: executable version 1 does not match our version 2. > > > >done. > >=> > >:::::: > > > >I see in the code for mvapich (in ch-gen2) that there is a check > >against the > >version, but I'm not quite sure where this version is defined in my > >compiled > >code. Perhaps there's something I'm just not seeing.
> > > If the problem is not due to memlock limit, we will be happy to look into > this further. If possible, a temporary account that helps to reproduce > the problem would speed up things significantly. > > BTW, the version number here is defined to allow an external process > manager to check/match the protocol used at startup time. The actual > code is defined in this file: mpid/ch_gen2/process/pmgr_client.h > > #define PMGR_VERSION 2 > > Thanks, > Weikuan > > > > >Thanks.... > > > >(((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))) > > Makia Minich Money is the Devil's toothpaste. > > 925.XXX.XXXX --The Flea (Mucha Lucha) > >(((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))) > >_______________________________________________ > >openib-general mailing list > >openib-general at openib.org > >http://openib.org/mailman/listinfo/openib-general > > > >To unsubscribe, please visit > >http://openib.org/mailman/listinfo/openib-general > > (((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))) Makia Minich Money is the Devil's toothpaste. 925.424.5675 --The Flea (Mucha Lucha) (((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))) From ardavis at ichips.intel.com Wed Sep 14 12:28:13 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 14 Sep 2005 12:28:13 -0700 Subject: [openib-general] IBAT kernel oops with latest 3432 svn drop.... Message-ID: <432879CD.7060003@ichips.intel.com> Hal, Can you take a look at this? My SM was down (ports in INIT state) and I started a uDAPL (dtest -s) test that called ib_at_ips_by_gid() and then called ib_at_cancel() when it did not complete the request. My application exited with the expected open error and then I saw the kernel panic.
Thanks, -arlin Sep 14 12:05:12 iclust-20 start_udev: Starting udev: failed Sep 14 12:05:12 iclust-20 kernel: ib_at: ib_dev_ats_op: dev (ffffffff8807c620) ib0 already has pending op 2 Sep 14 12:05:38 iclust-20 kernel: ib_at: req_end: pend ffff81003d473e40 already completed? status 0 Sep 14 12:05:38 iclust-20 kernel: ib_at: req_free: bad async req type 0 Sep 14 12:05:51 iclust-20 kernel: general protection fault: 0000 [1] SMP Sep 14 12:05:51 iclust-20 kernel: CPU 1 Sep 14 12:05:51 iclust-20 kernel: Modules linked in: ib_uat ib_at ib_ucm ib_cm ib_umad ib_uverbs ib_ipoib ib_sa det ib_mthca ib_mad ib_core Sep 14 12:05:51 iclust-20 kernel: Pid: 2249, comm: ib_at_wq/1 Tainted: P 2.6.13 Sep 14 12:05:51 iclust-20 kernel: RIP: 0010:[] {kfree+168} Sep 14 12:05:51 iclust-20 kernel: RSP: 0018:ffff810036b1be18 EFLAGS: 00010003 Sep 14 12:05:51 iclust-20 kernel: RAX: 015492e2f29413d0 RBX: ffff81003e108e40 RCX: ffff81000000c000 Sep 14 12:05:51 iclust-20 kernel: RDX: 002aa23c5e32827a RSI: 0000000000000002 RDI: 6172542073736572 Sep 14 12:05:51 iclust-20 kernel: RBP: 6172542073736572 R08: ffff810036b1a000 R09: 0000000000000000 Sep 14 12:05:51 iclust-20 kernel: R10: 0000000000000001 R11: ffffffff805250f8 R12: 0000000000000002 Sep 14 12:05:51 iclust-20 kernel: R13: 0000000000000292 R14: ffff81003e108e40 R15: ffffffff88078a19 Sep 14 12:05:51 iclust-20 kernel: FS: 0000000000000000(0000) GS:ffffffff80597880(0000) knlGS:0000000000000000 Sep 14 12:05:51 iclust-20 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Sep 14 12:05:51 iclust-20 kernel: CR2: 00002aaaaaaac000 CR3: 000000003708f000 CR4: 00000000000006e0 Sep 14 12:05:51 iclust-20 kernel: Process ib_at_wq/1 (pid: 2249, threadinfo ffff810036b1a000, task ffff810036af8730) Sep 14 12:05:51 iclust-20 kernel: Stack: 0000000000000206 ffff81003e108e40 ffff81003c3beb80 ffffffff8807e2a6 Sep 14 12:05:51 iclust-20 kernel: ffff81003e108e40 ffff81003e108e68 ffff81003a5e8880 ffffffff88078a34 Sep 14 12:05:51 iclust-20 kernel: ffff81003e108e60
ffffffff801427f6 Sep 14 12:05:51 iclust-20 kernel: Call Trace:{:ib_uat:ib_uat_callback+46} {:ib_at:req_comp_work+27} Sep 14 12:05:51 iclust-20 kernel: {worker_thread+503} {default_wake_function+0} Sep 14 12:05:51 iclust-20 kernel: {__wake_up_common+64} {default_wake_function+0} Sep 14 12:05:51 iclust-20 kernel: {keventd_create_kthread+0} {worker_thread+0} Sep 14 12:05:51 iclust-20 kernel: {keventd_create_kthread+0} {kthread+204} Sep 14 12:05:51 iclust-20 kernel: {child_rip+8} {keventd_create_kthread+0} Sep 14 12:05:51 iclust-20 kernel: {kthread+0} {child_rip+0} Sep 14 12:05:51 iclust-20 kernel: Sep 14 12:05:51 iclust-20 kernel: Sep 14 12:05:51 iclust-20 kernel: Code: 48 8b 78 28 65 8b 04 25 34 00 00 00 48 98 48 8b 1c c7 8b 03 Sep 14 12:05:51 iclust-20 kernel: RIP {kfree+168} RSP From rolandd at cisco.com Wed Sep 14 12:39:49 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 14 Sep 2005 12:39:49 -0700 Subject: [openib-general] userspace CM API for per device handling In-Reply-To: <43285CFE.2080502@ichips.intel.com> (Sean Hefty's message of "Wed, 14 Sep 2005 10:25:18 -0700") References: <52hdcxpv6b.fsf@cisco.com> <43272272.6030606@ichips.intel.com> <43276839.40605@ichips.intel.com> <43285687.6040402@ichips.intel.com> <43285CFE.2080502@ichips.intel.com> Message-ID: <527jdjzbei.fsf@cisco.com> > ibv_get_async_event(int fd, struct ibv_async_event *event); > ibv_get_cq_event(int fd, struct ibv_cq **cq, void **cq_context); This seems like mostly pain with little gain to me. A consumer doing a poll or something with multiple file descriptors still needs some mapping to some per-fd context so that it knows which fds are CQ event fds, which ones are async event fds, and which ones are neither. So it's pretty easy to go back to a verbs context. The advantage of going through the verbs context when reading events is that it makes it harder to pass a bogus fd into the functions.
If an app passes an inappropriate fd into an API like the one above, then some funky bugs could be introduced. I'm not dead-set here, but making this change seems like a net loss to me. If we wanted to be more symmetrical, we could have a CM API like struct ib_cm_context *ib_cm_get_context(struct ibv_context *dev_context); int ib_cm_get_event(struct ib_cm_context *context, struct ib_cm_event **event); - R. From ardavis at ichips.intel.com Wed Sep 14 13:04:31 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 14 Sep 2005 13:04:31 -0700 Subject: [openib-general] IBAT kernel oops with latest 3432 svn drop.... In-Reply-To: <432879CD.7060003@ichips.intel.com> References: <432879CD.7060003@ichips.intel.com> Message-ID: <4328824F.4020805@ichips.intel.com> Arlin Davis wrote: > Hal, > > Can you take a look at this? > > My SM was down (ports in INIT state) and I started a uDAPL (dtest -s) > test that called ib_at_ips_by_gid() and then called ib_at_cancel() > when it did not complete the request. My application exited with the > expected open error and then I saw the kernel panic. > Question: What can I expect as a result of ib_at_cancel()? Will I always get an event with -EINTR for the cancelled request id? > Thanks, > > -arlin > > Sep 14 12:05:12 iclust-20 start_udev: Starting udev: failed > Sep 14 12:05:12 iclust-20 kernel: ib_at: ib_dev_ats_op: dev > (ffffffff8807c620) ib0 already has pending op 2 > Sep 14 12:05:38 iclust-20 kernel: ib_at: req_end: pend > ffff81003d473e40 already completed? 
status 0 > Sep 14 12:05:38 iclust-20 kernel: ib_at: req_free: bad async req type 0 > Sep 14 12:05:51 iclust-20 kernel: general protection fault: 0000 [1] SMP > Sep 14 12:05:51 iclust-20 kernel: CPU 1 > Sep 14 12:05:51 iclust-20 kernel: Modules linked in: ib_uat ib_at > ib_ucm ib_cm ib_umad ib_uverbs ib_ipoib ib_sa det i b_mthca ib_mad > ib_core > Sep 14 12:05:51 iclust-20 kernel: Pid: 2249, comm: ib_at_wq/1 Tainted: > P 2.6.13 > Sep 14 12:05:51 iclust-20 kernel: RIP: 0010:[] > {kfree+168} > Sep 14 12:05:51 iclust-20 kernel: RSP: 0018:ffff810036b1be18 EFLAGS: > 00010003 > Sep 14 12:05:51 iclust-20 kernel: RAX: 015492e2f29413d0 RBX: > ffff81003e108e40 RCX: ffff81000000c000 > Sep 14 12:05:51 iclust-20 kernel: RDX: 002aa23c5e32827a RSI: > 0000000000000002 RDI: 6172542073736572 > Sep 14 12:05:51 iclust-20 kernel: RBP: 6172542073736572 R08: > ffff810036b1a000 R09: 0000000000000000 > Sep 14 12:05:51 iclust-20 kernel: R10: 0000000000000001 R11: > ffffffff805250f8 R12: 0000000000000002 > Sep 14 12:05:51 iclust-20 kernel: R13: 0000000000000292 R14: > ffff81003e108e40 R15: ffffffff88078a19 > Sep 14 12:05:51 iclust-20 kernel: FS: 0000000000000000(0000) > GS:ffffffff80597880(0000) knlGS:0000000000000000 > Sep 14 12:05:51 iclust-20 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: > 000000008005003b > Sep 14 12:05:51 iclust-20 kernel: CR2: 00002aaaaaaac000 CR3: > 000000003708f000 CR4: 00000000000006e0 > Sep 14 12:05:51 iclust-20 kernel: Process ib_at_wq/1 (pid: 2249, > threadinfo ffff810036b1a000, task ffff810036af8730) > Sep 14 12:05:51 iclust-20 kernel: Stack: 0000000000000206 > ffff81003e108e40 ffff81003c3beb80 ffffffff8807e2a6 > Sep 14 12:05:51 iclust-20 kernel: ffff81003e108e40 > ffff81003e108e68 ffff81003a5e8880 ffffffff88078a34 > Sep 14 12:05:51 iclust-20 kernel: ffff81003e108e60 > ffffffff801427f6 > Sep 14 12:05:51 iclust-20 kernel: Call > Trace:{:ib_uat:ib_uat_callback+46} > {:ib_at :req_comp_work+27} > Sep 14 12:05:51 iclust-20 kernel: > {worker_thread+503} > 
{default_wake_function+0} > Sep 14 12:05:51 iclust-20 kernel: > {__wake_up_common+64} > {default_wake_function+0} > Sep 14 12:05:51 iclust-20 kernel: > {keventd_create_kthread+0} > {worker_thread+0} > Sep 14 12:05:51 iclust-20 kernel: > {keventd_create_kthread+0} > {kthread+204} > Sep 14 12:05:51 iclust-20 kernel: > {child_rip+8} > {keventd_create_kthread+0} > Sep 14 12:05:51 iclust-20 kernel: {kthread+0} > {child_rip+0} > Sep 14 12:05:51 iclust-20 kernel: > Sep 14 12:05:51 iclust-20 kernel: > Sep 14 12:05:51 iclust-20 kernel: Code: 48 8b 78 28 65 8b 04 25 34 00 > 00 00 48 98 48 8b 1c c7 8b 03 > Sep 14 12:05:51 iclust-20 kernel: RIP {kfree+168} > RSP > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From panda at cse.ohio-state.edu Wed Sep 14 13:53:01 2005 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Wed, 14 Sep 2005 16:53:01 -0400 (EDT) Subject: [openib-general] A stripped down version of mvapich-gen2 has been checked in Message-ID: <200509142053.j8EKr1It027872@xi.cse.ohio-state.edu> Based on the feedback we received about the size of the mvapich-gen2 code base, I had a discussion with the Argonne folks about removing NT-related directories, binary Java files, etc. from the MPICH stack. The Argonne folks have agreed to this proposal of making a stripped-down version of MPICH, together with MVAPICH, available in the OpenIB SVN. Based on the discussion with the Argonne folks, many directories and files have been removed from the MPICH stack, and a stripped-down version has been checked in. The code size (without the entries) is now 23MB instead of the original 53MB. I encourage people to take a look at this stripped-down version.
People interested in the complete integrated stack without the removal of any files/directories can always download the complete stack from the OSU/MVAPICH download site. Thanks, DK From robert.j.woodruff at intel.com Wed Sep 14 14:25:44 2005 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Wed, 14 Sep 2005 14:25:44 -0700 Subject: [openib-general] [PATCH] iSER - changes in API, socket-based connect Message-ID: <1AC79F16F5C5284499BB9591B33D6F0005886E48@orsmsx408> Dan wrote, > Files added: iser_socket.c, iser_socket.h > 3. Some cosmetic changes included, too. > Files deleted: iser_pdu.c, include/iser_types.h, include/iser_pdu.h > Some leftovers from the deleted files in include/*.h moved > into include/iser_api.h. > I am trying to backport iSER (svn 3432) to 2.6.9 and I am running into issues compiling things like static struct proto iser_sock_proto = { name: "ib_iser", owner: THIS_MODULE, obj_size: sizeof(struct iser_sock), }; Would you happen to have a backport patch for this file that allows it to work on 2.6.9 kernels? woody From mshefty at ichips.intel.com Wed Sep 14 14:32:55 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 14 Sep 2005 14:32:55 -0700 Subject: [openib-general] userspace CM API for per device handling In-Reply-To: <527jdjzbei.fsf@cisco.com> References: <52hdcxpv6b.fsf@cisco.com> <43272272.6030606@ichips.intel.com> <43276839.40605@ichips.intel.com> <43285687.6040402@ichips.intel.com> <43285CFE.2080502@ichips.intel.com> <527jdjzbei.fsf@cisco.com> Message-ID: <43289707.60203@ichips.intel.com> Roland Dreier wrote: >>ibv_get_async_event(int fd, struct ibv_async_event *event); >>ibv_get_cq_event(int fd, struct ibv_cq **cq, void **cq_context); > > This seems like mostly pain with little gain to me.
A consumer doing > a poll or something with multiple file descriptors still needs some > mapping to some per-fd context so that it knows which fds are CQ event > fds, which ones are async event fds, and which ones are neither. So > it's pretty easy to go back to a verbs context. I'm not sold on this change either. Right now I'm just trying to find a decent API for the CM, and the one you mentioned works just as well. As for mapping fd's to context, I think this depends on how the user groups multiple file descriptors together and their threading model. A user could poll only fd's associated with CQs; although, I don't think that the current implementation of DAPL does this. > If we wanted to be more symmetrical, we could have a CM API like > > struct ib_cm_context *ib_cm_get_context(struct ibv_context *dev_context); > int ib_cm_get_event(struct ib_cm_context *context, struct ib_cm_event **event); This API still gives the benefits that I was looking for, so I will go with something like this for now. Thanks. - Sean From halr at voltaire.com Wed Sep 14 14:36:12 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 17:36:12 -0400 Subject: [openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features In-Reply-To: <4c50eb9e7a888cebad3dd931b9677592@scl.ameslab.gov> References: <1126609953.4382.42857.camel@hal.voltaire.com> <20050913161534.GD1685@kalmia.hozed.org> <1126628283.4514.496.camel@hal.voltaire.com> <20050914001235.GJ1685@kalmia.hozed.org> <1126663943.4425.127.camel@hal.voltaire.com> <952e87492dd535e7c98a841852bd3c62@scl.ameslab.gov> <1126710869.4425.436.camel@hal.voltaire.com> <4c50eb9e7a888cebad3dd931b9677592@scl.ameslab.gov> Message-ID: <1126733771.5425.106.camel@hal.voltaire.com> Hi Brett, On Wed, 2005-09-14 at 15:13, Brett Bode wrote: > I have found out a bit more information. I think you are correct > that the switch was getting messed up. 
I had tried resetting the switch > with the old opensm code we had been running and found that fixed > things up until the bad node was plugged in. We had not reset the > switch since upgrading the opensm code. Upon doing that all seems to > work again. Opensm throws some error below due to the bad node, but it > appears to continue to correctly configure the remaining network. and the switch continues to work ? (That's with the new (1.1.0) OpenSM, right ? > So I > am currently thinking the latest opensm more or less correctly deals > with the failed node. I also suspect the older opensm not only handled > the error badly but somehow caused the switch to get into a confused > state that the new opensm couldn't fix without a reset. > > Here is the repeated errors thrown: Right, that looks similar to yesterday's log except that the DR is a little different. Did the misbehaving HCA node get plugged into a different switch port perhaps ? > > ______________________________________________________________________ > Here is the output of the other commands you suggested with everything > working: I'm not sure which HCA port the SM ran on but... The multicast tree appears only set up on the one switch. Were the other nodes off the other switch not involved ? Also, port 8 off the switch appears not in the multicast tree although I see it in the topology file. Not sure why that would be. -- Hal From mst at mellanox.co.il Wed Sep 14 14:47:59 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Sep 2005 00:47:59 +0300 Subject: [openib-general] Re: [PATCH] libibcm/libibat disable-libcheck option In-Reply-To: <20050914171521.GA31528@mellanox.co.il> References: <20050914171521.GA31528@mellanox.co.il> Message-ID: <20050914214759.GA31953@mellanox.co.il> > Subject: [PATCH] libibcm/libibat disable-libcheck option > > Add an option to disable configure checks for ib libraries. > This makes it possible to first configure all libraries, > then make them all. 
Guys, are you OK with checking in this change? -- MST From rolandd at cisco.com Wed Sep 14 14:51:53 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 14 Sep 2005 14:51:53 -0700 Subject: [openib-general] Re: [PATCH] libibcm/libibat disable-libcheck option In-Reply-To: <20050914214759.GA31953@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 15 Sep 2005 00:47:59 +0300") References: <20050914171521.GA31528@mellanox.co.il> <20050914214759.GA31953@mellanox.co.il> Message-ID: <52psrbxqpy.fsf@cisco.com> Michael> Add an option to disable configure checks for ib Michael> libraries. This makes it possible to first configure all Michael> libraries, then make them all. Why do we really want this? Is it so hard to build things in order? Does libibcm even build without libibat installed? It seems like libibcm needs to be able to find sa.h. - R. From halr at voltaire.com Wed Sep 14 14:48:36 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 17:48:36 -0400 Subject: [openib-general] IBAT kernel oops with latest 3432 svn drop.... In-Reply-To: <4328824F.4020805@ichips.intel.com> References: <432879CD.7060003@ichips.intel.com> <4328824F.4020805@ichips.intel.com> Message-ID: <1126734515.5425.127.camel@hal.voltaire.com> Hi Arlin, On Wed, 2005-09-14 at 16:04, Arlin Davis wrote: Arlin Davis wrote: > > > Hal, > > > > Can you take a look at this? Yes, but I have a few things in front of this right now. > > My SM was down (ports in INIT state) Had an SM been up previously ? Just wondering how to recreate this. > and I started a uDAPL (dtest -s) > > test that called ib_at_ips_by_gid() and then called ib_at_cancel() > > when it did not complete the request. My application exited with the > > expected open error and then I saw the kernel panic. > > > Question: What can I expect as a result of ib_at_cancel()? Return 0 if canceled, -1 if cancel failed (e.g. bad ID) > Will I always > get an event with -EINTR for the cancelled request id? Is that what you are seeing ? 
-- Hal From brett at scl.ameslab.gov Wed Sep 14 15:03:05 2005 From: brett at scl.ameslab.gov (Brett Bode) Date: Wed, 14 Sep 2005 17:03:05 -0500 Subject: [openib-general] [ANNOUCEv2] OpenIB OpenSM 1.1.0: trunk now supports 1.8.0 features In-Reply-To: <1126733771.5425.106.camel@hal.voltaire.com> References: <1126609953.4382.42857.camel@hal.voltaire.com> <20050913161534.GD1685@kalmia.hozed.org> <1126628283.4514.496.camel@hal.voltaire.com> <20050914001235.GJ1685@kalmia.hozed.org> <1126663943.4425.127.camel@hal.voltaire.com> <952e87492dd535e7c98a841852bd3c62@scl.ameslab.gov> <1126710869.4425.436.camel@hal.voltaire.com> <4c50eb9e7a888cebad3dd931b9677592@scl.ameslab.gov> <1126733771.5425.106.camel@hal.voltaire.com> Message-ID: <40c1f975afb8d903d2421b35cdbc7d65@scl.ameslab.gov> On Sep 14, 2005, at 4:36 PM, Hal Rosenstock wrote: > Hi Brett, > > On Wed, 2005-09-14 at 15:13, Brett Bode wrote: >> I have found out a bit more information. I think you are correct >> that the switch was getting messed up. I had tried resetting the >> switch >> with the old opensm code we had been running and found that fixed >> things up until the bad node was plugged in. We had not reset the >> switch since upgrading the opensm code. Upon doing that all seems to >> work again. Opensm throws some error below due to the bad node, but it >> appears to continue to correctly configure the remaining network. > > and the switch continues to work ? (That's with the new (1.1.0) OpenSM, > right ? Yes > >> So I >> am currently thinking the latest opensm more or less correctly deals >> with the failed node. I also suspect the older opensm not only handled >> the error badly but somehow caused the switch to get into a confused >> state that the new opensm couldn't fix without a reset. >> >> Here is the repeated errors thrown: > > Right, that looks similar to yesterday's log except that the DR is a > little different. Did the misbehaving HCA node get plugged into a > different switch port perhaps ? 
That is possible. >> >> ______________________________________________________________________ >> Here is the output of the other commands you suggested with everything >> working: > > I'm not sure which HCA port the SM ran on but... > > The multicast tree appears only set up on the one switch. Were the > other > nodes off the other switch not involved ? > > Also, port 8 off the switch appears not in the multicast tree although > I > see it in the topology file. Not sure why that would be. > I think we only have the IPoIB modules loaded on the systems on the one switch. The system connected to port 8 also does not have the IP module loaded. Originally we did not have the two switches linked together, but we had a system on the second switch with more up-to-date software, so we loaded the new opensm onto it and connected the switches together. We are just getting the stuff on the second switch installed and are still waiting on some parts as well... Thanks, Brett From ardavis at ichips.intel.com Wed Sep 14 16:16:39 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 14 Sep 2005 16:16:39 -0700 Subject: [openib-general] IBAT kernel oops with latest 3432 svn drop.... In-Reply-To: <1126734515.5425.127.camel@hal.voltaire.com> References: <432879CD.7060003@ichips.intel.com> <4328824F.4020805@ichips.intel.com> <1126734515.5425.127.camel@hal.voltaire.com> Message-ID: <4328AF57.2020306@ichips.intel.com> Hal Rosenstock wrote: >Hi Arlin, > >On Wed, 2005-09-14 at 16:04, Arlin Davis wrote: >Arlin Davis wrote: > > >>>Hal, >>> >>>Can you take a look at this? >>> >>> > >Yes, but I have a few things in front of this right now. > > > >>>My SM was down (ports in INIT state) >>> >>> > >Had an SM been up previously ? Just wondering how to recreate this. > > SM was actually running, but for some reason it was not sweeping and configuring my ports.
> > >> and I started a uDAPL (dtest -s) >> >> >>>test that called ib_at_ips_by_gid() and then called ib_at_cancel() >>>when it did not complete the request. My application exited with the >>>expected open error and then I saw the kernel panic. >>> >>> >>> >>Question: What can I expect as a result of ib_at_cancel()? >> >> > >Return 0 if canceled, -1 if cancel failed (e.g. bad ID) > > > >>Will I always >>get an event with -EINTR for the cancelled request id? >> >> > >Is that what you are seeing ? > > Sometimes I see an event with -EINTR and sometimes I don't see any event. >-- Hal > > > From iod00d at hp.com Wed Sep 14 16:17:47 2005 From: iod00d at hp.com (Grant Grundler) Date: Wed, 14 Sep 2005 16:17:47 -0700 Subject: [openib-general] Re: Mellanox device in INIT state In-Reply-To: <20050914082610.GB28025@mellanox.co.il> References: <20050914050003.GA29137@esmail.cup.hp.com> <20050914082610.GB28025@mellanox.co.il> Message-ID: <20050914231747.GK31182@esmail.cup.hp.com> On Wed, Sep 14, 2005 at 11:26:10AM +0300, Michael S. Tsirkin wrote: > Seems to be a previous memory corruption that is biting us now. > Looks like prot->rsk_prot isnt NULL, and prot->name seems to > point to zeroed memory. Grant, is this reproducible? Yes - I think so. At least SDP is generating a segfault/stack trace to the console when it's loaded. Now that I'm recording the failures, I'm not certain the previous two failures were the same. > If so, could you please try running with the following patch, > and see what does it print?
yup > MST > > Index: linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_inet.c > =================================================================== > --- linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_inet.c 2005-09-11 12:36:48.000000000 +0300 > +++ linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_inet.c 2005-09-14 13:14:35.000000000 +0300 > @@ -1321,6 +1321,11 @@ static int __init sdp_init(void) > > sdp_dbg_init("SDP module load."); > > + printk("sdp_sk_proto.name = %s\n", sdp_sk_proto.name); > + printk("sdp_sk_proto.obj_size = %lld\n", (long long)sdp_sk_proto.obj_size); > + printk("sdp_init in_interrupt = %d\n", in_interrupt()); > + printk("sdp_init prot->rsk_prot = %p\n", prot->rsk_prot); The last printk failed to compile: vers/infiniband/ulp/sdp/sdp_inet.c:1327: error: 'proto' undeclared (first use in this function) I assume that was intended to be "sdp_sk_proto.rsk_prot". Output follows - but with a different failure this time. Something weird is definitely going on. gsyprf3:/usr/src/linux-2.6.13# reload_ib + IPoIB=51 + ifconfig ib0 down ib0: ERROR while getting interface flags: No such device + ifconfig ib1 down ib1: ERROR while getting interface flags: No such device + rmmod ib_ipoib ib_uverbs ib_sdp ib_cm ib_sa ib_mthca ib_mad ib_core ERROR: Module ib_ipoib does not exist in /proc/modules ERROR: Module ib_uverbs does not exist in /proc/modules ERROR: Module ib_sdp does not exist in /proc/modules ERROR: Module ib_cm does not exist in /proc/modules ERROR: Module ib_sa does not exist in /proc/modules ERROR: Module ib_mthca does not exist in /proc/modules ERROR: Module ib_mad does not exist in /proc/modules ERROR: Module ib_core does not exist in /proc/modules + modprobe ib_mthca msi_x=1 ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005) ib_mthca: Initializing ((¥) GSI 60 (level, low) -> CPU 0 (0x0000) vector 69 ACPI: PCI Interrupt 0000:81:00.0[A] -> GSI 60 (level, low) -> IRQ 69 (¥: Missing DCS, aborting.
ACPI: PCI interrupt for device 0000:81:00.0 disabled GSI 60 (level, low) -> CPU 0 (0x0000) vector 69 unregistered + modprobe ib_ipoib + modprobe ib_sdp sdp_sk_proto.name = SDP sdp_sk_proto.obj_size = 1744 sdp_init in_interrupt = 0 sdp_init prot->rsk_prot = 0000000000000000 Uninitialised timer! This is just a warning. Your computer is OK function=0xa0000001008ac990, data=0xa00000020021b600 Call Trace: [] show_stack+0x80/0xa0 sp=e000004041267c50 bsp=e000004041260fe0 [] dump_stack+0x30/0x60 sp=e000004041267e20 bsp=e000004041260fc8 [] check_timer_failed+0xe0/0x120 sp=e000004041267e20 bsp=e000004041260fa8 [] __mod_timer+0x60/0x200 sp=e000004041267e20 bsp=e000004041260f68 [] queue_delayed_work+0x110/0x1c0 sp=e000004041267e30 bsp=e000004041260f38 [] sdp_link_addr_init+0x1a0/0x3e0 [ib_sdp] sp=e000004041267e30 bsp=e000004041260f10 [] sdp_init+0x160/0x900 [ib_sdp] sp=e000004041267e30 bsp=e000004041260ee8 [] sys_init_module+0x2e0/0x680 sp=e000004041267e30 bsp=e000004041260e60 [] ia64_ret_from_syscall+0x0/0x20 sp=e000004041267e30 bsp=e000004041260e60 [] __kernel_syscall_via_break+0x0/0x20 sp=e000004041268000 bsp=e000004041260e60 [ console hangs ] I can't abort/interrupt the modprobe command and it's not segfaulting this time. "ps -ef" shows (among other things): grundler at gsyprf3:~$ ps -ef UID PID PPID C STIME TTY TIME CMD root 1 0 0 15:32 ? 00:00:04 init [2] ... root 3972 2250 0 15:58 ttyS3 00:00:00 /bin/sh -x /usr/local/bin/reload root 3998 9 0 15:58 ? 00:00:00 [ipoib] root 3999 3972 99 15:58 ttyS3 00:08:30 modprobe ib_sdp root 4003 9 0 15:58 ? 00:00:00 [ib_cm/0] root 4004 9 0 15:58 ? 00:00:00 [ib_cm/1] root 4008 9 0 15:58 ? 00:00:00 [sdp_wq/0] root 4009 9 0 15:58 ? 00:00:00 [sdp_wq/1] ... grundler at gsyprf3:~$ ps aux USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 1 0.2 0.0 3440 1328 ? S 15:32 0:04 init [2] ... root 3972 0.0 0.1 5584 2624 ttyS3 S+ 15:58 0:00 /bin/sh -x /usr root 3998 0.0 0.0 0 0 ? 
S< 15:58 0:00 [ipoib] root 3999 99.9 0.2 6624 4592 ttyS3 R+ 15:58 9:50 modprobe ib_sdp root 4003 0.0 0.0 0 0 ? S< 15:58 0:00 [ib_cm/0] root 4004 0.0 0.0 0 0 ? S< 15:58 0:00 [ib_cm/1] root 4008 0.0 0.0 0 0 ? S< 15:58 0:00 [sdp_wq/0] root 4009 0.0 0.0 0 0 ? S< 15:58 0:00 [sdp_wq/1] ... "kill -9 3999" didn't have the intended effect either. I'll rebuild with SDP_DEBUG options and see if that changes it yet again. grant From rolandd at cisco.com Wed Sep 14 16:43:44 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 14 Sep 2005 16:43:44 -0700 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: (Sean Hefty's message of "Wed, 7 Sep 2005 16:24:56 -0700") References: Message-ID: <52ek7rxljj.fsf@cisco.com> This patch does a few things: - Adds node_guid and node_desc fields to struct ib_device - Has mthca set these fields on startup - Extends modify_device method to handle setting node_desc - Exposes node_desc in sysfs - Allows userspace to set node_desc by writing into sysfs file, eg. echo -n `hostname` >> /sys/class/infiniband/mthca0/node_desc This should probably be combined with Sean's work to get rid of node_guid queries in ULPs. Comments? - R. 
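The cover letter sets NodeDescription from userspace by writing the sysfs file with `echo -n`. A program (or hotplug helper) that wants to do the same could use something like the minimal C sketch below. The names `write_node_desc` and `set_node_desc_to_hostname` are hypothetical and not part of any OpenIB library; the 64-byte limit follows the `char node_desc[64]` field added by the patch, and `/sys/class/infiniband/mthca0/node_desc` is only the example path from the message.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* NodeDescription is 64 bytes, matching char node_desc[64] in the patch. */
#define NODE_DESC_LEN 64

/*
 * Hypothetical helper: write desc to a node_desc sysfs file, truncated to
 * 64 bytes and with no trailing newline (like `echo -n ... > node_desc`).
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_node_desc(const char *path, const char *desc)
{
	size_t len = strnlen(desc, NODE_DESC_LEN);
	ssize_t n;
	int err = 0;
	int fd = open(path, O_WRONLY | O_TRUNC);	/* never creates the file */

	if (fd < 0)
		return -errno;

	n = write(fd, desc, len);
	if (n < 0)
		err = -errno;
	else if ((size_t) n != len)
		err = -EIO;	/* treat a short write as an error */

	if (close(fd) < 0 && !err)
		err = -errno;
	return err;
}

/* Hypothetical wrapper: use the host name, as `hostname` does in the example. */
static int set_node_desc_to_hostname(const char *path)
{
	char host[256];

	if (gethostname(host, sizeof host) < 0)
		return -errno;
	host[sizeof host - 1] = '\0';	/* may be unterminated if truncated */
	return write_node_desc(path, host);
}
```

A caller would pass the device's node_desc path and run as root, since the patch creates the attribute with `S_IRUGO | S_IWUSR`. Truncating in userspace keeps each write within the 64 bytes that `set_node_desc` copies into `struct ib_device_modify`.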
Index: infiniband/include/rdma/ib_verbs.h =================================================================== --- infiniband/include/rdma/ib_verbs.h (revision 3432) +++ infiniband/include/rdma/ib_verbs.h (working copy) @@ -223,11 +223,13 @@ struct ib_port_attr { }; enum ib_device_modify_flags { - IB_DEVICE_MODIFY_SYS_IMAGE_GUID = 1 + IB_DEVICE_MODIFY_SYS_IMAGE_GUID = 1 << 0, + IB_DEVICE_MODIFY_NODE_DESC = 1 << 1 }; struct ib_device_modify { u64 sys_image_guid; + char node_desc[64]; }; enum ib_port_modify_flags { @@ -952,6 +954,8 @@ struct ib_device { IB_DEV_UNREGISTERED } reg_state; + char node_desc[64]; + __be64 node_guid; u8 node_type; u8 phys_port_cnt; }; Index: infiniband/core/sysfs.c =================================================================== --- infiniband/core/sysfs.c (revision 3432) +++ infiniband/core/sysfs.c (working copy) @@ -609,28 +609,50 @@ static ssize_t show_sys_image_guid(struc static ssize_t show_node_guid(struct class_device *cdev, char *buf) { struct ib_device *dev = container_of(cdev, struct ib_device, class_dev); - struct ib_device_attr attr; - ssize_t ret; - ret = ib_query_device(dev, &attr); + return sprintf(buf, "%04x:%04x:%04x:%04x\n", + be16_to_cpu(((__be16 *) &dev->node_guid)[0]), + be16_to_cpu(((__be16 *) &dev->node_guid)[1]), + be16_to_cpu(((__be16 *) &dev->node_guid)[2]), + be16_to_cpu(((__be16 *) &dev->node_guid)[3])); +} + +static ssize_t show_node_desc(struct class_device *cdev, char *buf) +{ + struct ib_device *dev = container_of(cdev, struct ib_device, class_dev); + + return sprintf(buf, "%.64s\n", dev->node_desc); +} + +static ssize_t set_node_desc(struct class_device *cdev, const char *buf, + size_t count) +{ + struct ib_device *dev = container_of(cdev, struct ib_device, class_dev); + struct ib_device_modify desc; + int ret; + + if (!dev->modify_device) + return -EIO; + + memcpy(desc.node_desc, buf, min_t(int, count, 64)); + ret = ib_modify_device(dev, IB_DEVICE_MODIFY_NODE_DESC, &desc); if (ret) return ret; - return 
sprintf(buf, "%04x:%04x:%04x:%04x\n", - be16_to_cpu(((__be16 *) &attr.node_guid)[0]), - be16_to_cpu(((__be16 *) &attr.node_guid)[1]), - be16_to_cpu(((__be16 *) &attr.node_guid)[2]), - be16_to_cpu(((__be16 *) &attr.node_guid)[3])); + return count; } static CLASS_DEVICE_ATTR(node_type, S_IRUGO, show_node_type, NULL); static CLASS_DEVICE_ATTR(sys_image_guid, S_IRUGO, show_sys_image_guid, NULL); static CLASS_DEVICE_ATTR(node_guid, S_IRUGO, show_node_guid, NULL); +static CLASS_DEVICE_ATTR(node_desc, S_IRUGO | S_IWUSR, show_node_desc, + set_node_desc); static struct class_device_attribute *ib_class_attributes[] = { &class_device_attr_node_type, &class_device_attr_sys_image_guid, - &class_device_attr_node_guid + &class_device_attr_node_guid, + &class_device_attr_node_desc }; static struct class ib_class = { Index: infiniband/hw/mthca/mthca_dev.h =================================================================== --- infiniband/hw/mthca/mthca_dev.h (revision 3432) +++ infiniband/hw/mthca/mthca_dev.h (working copy) @@ -283,7 +283,7 @@ struct mthca_dev { u64 ddr_end; MTHCA_DECLARE_DOORBELL_LOCK(doorbell_lock) - struct semaphore cap_mask_mutex; + struct semaphore dev_attr_mutex; void __iomem *hcr; void __iomem *kar; @@ -517,4 +517,17 @@ static inline int mthca_is_memfree(struc return dev->mthca_flags & MTHCA_FLAG_MEMFREE; } +/* + * XXX remove once 2.6.14 is released. 
+ */ +static inline void *mthca_kzalloc(size_t size, unsigned int __nocast flags) +{ + void *ret = kmalloc(size, flags); + if (ret) + memset(ret, 0, size); + return ret; +} +#undef kzalloc +#define kzalloc(s, f) mthca_kzalloc(s, f); + #endif /* MTHCA_DEV_H */ Index: infiniband/hw/mthca/mthca_provider.c =================================================================== --- infiniband/hw/mthca/mthca_provider.c (revision 3432) +++ infiniband/hw/mthca/mthca_provider.c (working copy) @@ -44,6 +44,14 @@ #include "mthca_user.h" #include "mthca_memfree.h" +static void init_query_mad(struct ib_smp *mad) +{ + mad->base_version = 1; + mad->mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED; + mad->class_version = 1; + mad->method = IB_MGMT_METHOD_GET; +} + static int mthca_query_device(struct ib_device *ibdev, struct ib_device_attr *props) { @@ -54,7 +62,7 @@ static int mthca_query_device(struct ib_ u8 status; - in_mad = kmalloc(sizeof *in_mad, GFP_KERNEL); + in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL); out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL); if (!in_mad || !out_mad) goto out; @@ -63,12 +71,8 @@ static int mthca_query_device(struct ib_ props->fw_ver = mdev->fw_ver; - memset(in_mad, 0, sizeof *in_mad); - in_mad->base_version = 1; - in_mad->mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED; - in_mad->class_version = 1; - in_mad->method = IB_MGMT_METHOD_GET; - in_mad->attr_id = IB_SMP_ATTR_NODE_INFO; + init_query_mad(in_mad); + in_mad->attr_id = IB_SMP_ATTR_NODE_INFO; err = mthca_MAD_IFC(mdev, 1, 1, 1, NULL, NULL, in_mad, out_mad, @@ -115,20 +119,16 @@ static int mthca_query_port(struct ib_de int err = -ENOMEM; u8 status; - in_mad = kmalloc(sizeof *in_mad, GFP_KERNEL); + in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL); out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL); if (!in_mad || !out_mad) goto out; memset(props, 0, sizeof *props); - memset(in_mad, 0, sizeof *in_mad); - in_mad->base_version = 1; - in_mad->mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED; - in_mad->class_version = 1; - 
in_mad->method = IB_MGMT_METHOD_GET; - in_mad->attr_id = IB_SMP_ATTR_PORT_INFO; - in_mad->attr_mod = cpu_to_be32(port); + init_query_mad(in_mad); + in_mad->attr_id = IB_SMP_ATTR_PORT_INFO; + in_mad->attr_mod = cpu_to_be32(port); err = mthca_MAD_IFC(to_mdev(ibdev), 1, 1, port, NULL, NULL, in_mad, out_mad, @@ -160,6 +160,23 @@ static int mthca_query_port(struct ib_de return err; } +static int mthca_modify_device(struct ib_device *ibdev, + int mask, + struct ib_device_modify *props) +{ + if (mask & ~IB_DEVICE_MODIFY_NODE_DESC) + return -EOPNOTSUPP; + + if (mask & IB_DEVICE_MODIFY_NODE_DESC) { + if (down_interruptible(&to_mdev(ibdev)->dev_attr_mutex)) + return -ERESTARTSYS; + memcpy(ibdev->node_desc, props->node_desc, 64); + up(&to_mdev(ibdev)->dev_attr_mutex); + } + + return 0; +} + static int mthca_modify_port(struct ib_device *ibdev, u8 port, int port_modify_mask, struct ib_port_modify *props) @@ -169,7 +186,7 @@ static int mthca_modify_port(struct ib_d int err; u8 status; - if (down_interruptible(&to_mdev(ibdev)->cap_mask_mutex)) + if (down_interruptible(&to_mdev(ibdev)->dev_attr_mutex)) return -ERESTARTSYS; err = mthca_query_port(ibdev, port, &attr); @@ -191,7 +208,7 @@ static int mthca_modify_port(struct ib_d } out: - up(&to_mdev(ibdev)->cap_mask_mutex); + up(&to_mdev(ibdev)->dev_attr_mutex); return err; } @@ -203,18 +220,14 @@ static int mthca_query_pkey(struct ib_de int err = -ENOMEM; u8 status; - in_mad = kmalloc(sizeof *in_mad, GFP_KERNEL); + in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL); out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL); if (!in_mad || !out_mad) goto out; - memset(in_mad, 0, sizeof *in_mad); - in_mad->base_version = 1; - in_mad->mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED; - in_mad->class_version = 1; - in_mad->method = IB_MGMT_METHOD_GET; - in_mad->attr_id = IB_SMP_ATTR_PKEY_TABLE; - in_mad->attr_mod = cpu_to_be32(index / 32); + init_query_mad(in_mad); + in_mad->attr_id = IB_SMP_ATTR_PKEY_TABLE; + in_mad->attr_mod = cpu_to_be32(index / 32); err 
= mthca_MAD_IFC(to_mdev(ibdev), 1, 1, port, NULL, NULL, in_mad, out_mad, @@ -242,18 +255,14 @@ static int mthca_query_gid(struct ib_dev int err = -ENOMEM; u8 status; - in_mad = kmalloc(sizeof *in_mad, GFP_KERNEL); + in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL); out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL); if (!in_mad || !out_mad) goto out; - memset(in_mad, 0, sizeof *in_mad); - in_mad->base_version = 1; - in_mad->mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED; - in_mad->class_version = 1; - in_mad->method = IB_MGMT_METHOD_GET; - in_mad->attr_id = IB_SMP_ATTR_PORT_INFO; - in_mad->attr_mod = cpu_to_be32(port); + init_query_mad(in_mad); + in_mad->attr_id = IB_SMP_ATTR_PORT_INFO; + in_mad->attr_mod = cpu_to_be32(port); err = mthca_MAD_IFC(to_mdev(ibdev), 1, 1, port, NULL, NULL, in_mad, out_mad, @@ -267,13 +276,9 @@ static int mthca_query_gid(struct ib_dev memcpy(gid->raw, out_mad->data + 8, 8); - memset(in_mad, 0, sizeof *in_mad); - in_mad->base_version = 1; - in_mad->mgmt_class = IB_MGMT_CLASS_SUBN_LID_ROUTED; - in_mad->class_version = 1; - in_mad->method = IB_MGMT_METHOD_GET; - in_mad->attr_id = IB_SMP_ATTR_GUID_INFO; - in_mad->attr_mod = cpu_to_be32(index / 8); + init_query_mad(in_mad); + in_mad->attr_id = IB_SMP_ATTR_GUID_INFO; + in_mad->attr_mod = cpu_to_be32(index / 8); err = mthca_MAD_IFC(to_mdev(ibdev), 1, 1, port, NULL, NULL, in_mad, out_mad, @@ -1050,11 +1055,62 @@ static struct class_device_attribute *mt &class_device_attr_board_id }; +static int mthca_init_node_data(struct mthca_dev *dev) +{ + struct ib_smp *in_mad = NULL; + struct ib_smp *out_mad = NULL; + int err = -ENOMEM; + u8 status; + + in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL); + out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL); + if (!in_mad || !out_mad) + goto out; + + init_query_mad(in_mad); + in_mad->attr_id = IB_SMP_ATTR_NODE_DESC; + + err = mthca_MAD_IFC(dev, 1, 1, + 1, NULL, NULL, in_mad, out_mad, + &status); + if (err) + goto out; + if (status) { + err = -EINVAL; + goto out; + } + + 
memcpy(dev->ib_dev.node_desc, out_mad->data, 64); + + in_mad->attr_id = IB_SMP_ATTR_NODE_INFO; + + err = mthca_MAD_IFC(dev, 1, 1, + 1, NULL, NULL, in_mad, out_mad, + &status); + if (err) + goto out; + if (status) { + err = -EINVAL; + goto out; + } + + memcpy(&dev->ib_dev.node_guid, out_mad->data + 12, 8); + +out: + kfree(in_mad); + kfree(out_mad); + return err; +} + int mthca_register_device(struct mthca_dev *dev) { int ret; int i; + ret = mthca_init_node_data(dev); + if (ret) + return ret; + strlcpy(dev->ib_dev.name, "mthca%d", IB_DEVICE_NAME_MAX); dev->ib_dev.owner = THIS_MODULE; @@ -1064,6 +1120,7 @@ int mthca_register_device(struct mthca_d dev->ib_dev.class_dev.dev = &dev->pdev->dev; dev->ib_dev.query_device = mthca_query_device; dev->ib_dev.query_port = mthca_query_port; + dev->ib_dev.modify_device = mthca_modify_device; dev->ib_dev.modify_port = mthca_modify_port; dev->ib_dev.query_pkey = mthca_query_pkey; dev->ib_dev.query_gid = mthca_query_gid; @@ -1120,7 +1177,7 @@ int mthca_register_device(struct mthca_d dev->ib_dev.post_recv = mthca_tavor_post_receive; } - init_MUTEX(&dev->cap_mask_mutex); + init_MUTEX(&dev->dev_attr_mutex); ret = ib_register_device(&dev->ib_dev); if (ret) Index: infiniband/hw/mthca/mthca_mad.c =================================================================== --- infiniband/hw/mthca/mthca_mad.c (revision 3432) +++ infiniband/hw/mthca/mthca_mad.c (working copy) @@ -111,6 +111,19 @@ static void smp_snoop(struct ib_device * } } +static void node_desc_override(struct ib_device *dev, + struct ib_mad *mad) +{ + if ((mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_LID_ROUTED || + mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) && + mad->mad_hdr.method == IB_MGMT_METHOD_GET_RESP && + mad->mad_hdr.attr_id == IB_SMP_ATTR_NODE_DESC) { + down(&to_mdev(dev)->dev_attr_mutex); + memcpy(((struct ib_smp *) mad)->data, dev->node_desc, 64); + up(&to_mdev(dev)->dev_attr_mutex); + } +} + static void forward_trap(struct mthca_dev *dev, u8 
port_num, struct ib_mad *mad) @@ -250,8 +263,10 @@ int mthca_process_mad(struct ib_device * return IB_MAD_RESULT_FAILURE; } - if (!out_mad->mad_hdr.status) + if (!out_mad->mad_hdr.status) { smp_snoop(ibdev, port_num, in_mad); + node_desc_override(ibdev, out_mad); + } /* set return bit in status of directed route responses */ if (in_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) From halr at voltaire.com Wed Sep 14 17:08:41 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 20:08:41 -0400 Subject: [openib-general] IBAT kernel oops with latest 3432 svn drop.... In-Reply-To: <4328AF57.2020306@ichips.intel.com> References: <432879CD.7060003@ichips.intel.com> <4328824F.4020805@ichips.intel.com> <1126734515.5425.127.camel@hal.voltaire.com> <4328AF57.2020306@ichips.intel.com> Message-ID: <1126742920.5425.395.camel@hal.voltaire.com> On Wed, 2005-09-14 at 19:16, Arlin Davis wrote: > >>>My SM was down (ports in INIT state) > >>> > >>> > > > >Had an SM been up previously ? Just wondering how to recreate this. > > > > > SM was actually running, but for some reason it was not sweeping and > configuring my ports. OpenSM 1.0.0 or 1.1.0 or some other SM ? > >>Question: What can I expect as a result of ib_at_cancel()? > >> > >> > > > >Return 0 if canceled, -1 if cancel failed (e.g. bad ID) > > > > > > > >>Will I always > >>get an event with -EINTR for the cancelled request id? > >> > >> > > > >Is that what you are seeing ? > > > > > sometimes I see an event with -EINTR and sometimes I don't see any event. Yes, it looks like in kernel AT it depends on whether the request is still pending or not as to what occurs (at least right now). 
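The cancel behaviour discussed here (0 if cancelled, -1 if the cancel fails, and an `-EINTR` event only when the request was still pending) can be illustrated with a toy model. This is a hypothetical sketch, not the kernel AT code: `struct at_req`, `at_complete` and `at_cancel` are invented names, and the assumption that a successful cancel always delivers exactly one `-EINTR` completion is taken from the thread, not from the ib_at source.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical request states; not the real ib_at data structures. */
enum at_state { AT_PENDING, AT_DONE };

struct at_req {
	int id;
	enum at_state state;
	int status;	/* status delivered with the completion event */
	int events;	/* number of completion events delivered */
};

/* Deliver the single completion event a request may ever receive. */
static void at_complete(struct at_req *req, int status)
{
	if (req->state != AT_PENDING)
		return;	/* already completed: no second event */
	req->state = AT_DONE;
	req->status = status;
	req->events++;
}

/*
 * Model of the described cancel semantics:
 *   pending request            -> returns 0, completes it with -EINTR
 *   completed or unknown request -> returns -1, no new event
 */
static int at_cancel(struct at_req *req)
{
	if (req == NULL || req->state != AT_PENDING)
		return -1;
	at_complete(req, -EINTR);
	return 0;
}
```

Under this model, whether the consumer sees an `-EINTR` event depends on whether the cancel or the normal completion wins the race, which would be consistent with the mixed results reported above; a real implementation would also need locking around the state check.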
-- Hal From halr at voltaire.com Wed Sep 14 17:55:23 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 20:55:23 -0400 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <52ek7rxljj.fsf@cisco.com> References: <52ek7rxljj.fsf@cisco.com> Message-ID: <1126745723.5425.495.camel@hal.voltaire.com> On Wed, 2005-09-14 at 19:43, Roland Dreier wrote: > - Allows userspace to set node_desc by writing into sysfs file, eg. > echo -n `hostname` >> /sys/class/infiniband/mthca0/node_desc Shouldn't there be a non volatile way to do this ? Are there issues with userspace being able to do this or is the access rights on /sys/class/infiniband/mthca0/node_desc sufficient ? -- Hal From rolandd at cisco.com Wed Sep 14 18:56:07 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 14 Sep 2005 18:56:07 -0700 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <1126745723.5425.495.camel@hal.voltaire.com> (Hal Rosenstock's message of "14 Sep 2005 20:55:23 -0400") References: <52ek7rxljj.fsf@cisco.com> <1126745723.5425.495.camel@hal.voltaire.com> Message-ID: <527jdjxfew.fsf@cisco.com> Roland> - Allows userspace to set node_desc by writing into sysfs file, eg. Roland> echo -n `hostname` >> /sys/class/infiniband/mthca0/node_desc Hal> Shouldn't there be a non volatile way to do this ? I'm not sure what "non volatile" means in this context. For Mellanox HCAs, one can already set a permanent NodeDescription in the flash when burning firmware. Also, the echo command above was just an example; a hotplug script could also do cat /my/permanent/node_desc > /sys/class/infiniband/mthca0/node_desc or anything more elaborate it wants. Hal> Are there issues with userspace being able to do this or is Hal> the access rights on /sys/class/infiniband/mthca0/node_desc Hal> sufficient ? No, there are no issues. 
The node_desc file is owned by root and has perms 0644, and root can already change the node_desc by more nefarious means anyway. - R. From ftillier at silverstorm.com Wed Sep 14 20:31:15 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Wed, 14 Sep 2005 20:31:15 -0700 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <527jdjxfew.fsf@cisco.com> Message-ID: <005b01c5b9a5$eec70550$9e5aa8c0@infiniconsys.com> > From: Roland Dreier [mailto:rolandd at cisco.com] > Sent: Wednesday, September 14, 2005 6:56 PM > > Roland> - Allows userspace to set node_desc by writing into sysfs file, > Hal> Shouldn't there be a non volatile way to do this ? > > I'm not sure what "non volatile" means in this context. For Mellanox > HCAs, one can already set a permanent NodeDescription in the flash > when burning firmware. To me, non volatile in this context means something like using the system name. Would this be hard to do? In fact, I would prefer to see the system name used instead of whatever is programmed in the HCA as the default. Granted, this makes the node description burned in the firmware useless, but that doesn't seem like a big deal. - Fab From halr at voltaire.com Wed Sep 14 20:38:43 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 14 Sep 2005 23:38:43 -0400 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <527jdjxfew.fsf@cisco.com> References: <52ek7rxljj.fsf@cisco.com> <1126745723.5425.495.camel@hal.voltaire.com> <527jdjxfew.fsf@cisco.com> Message-ID: <1126755522.5425.824.camel@hal.voltaire.com> On Wed, 2005-09-14 at 21:56, Roland Dreier wrote: > Roland> - Allows userspace to set node_desc by writing into sysfs file, eg. > Roland> echo -n `hostname` >> /sys/class/infiniband/mthca0/node_desc > > Hal> Shouldn't there be a non volatile way to do this ? > > I'm not sure what "non volatile" means in this context. 
For Mellanox > HCAs, one can already set a permanent NodeDescription in the flash > when burning firmware. Also, the echo command above was just an > example; a hotplug script could also do > > cat /my/permanent/node_desc > /sys/class/infiniband/mthca0/node_desc > > or anything more elaborate it wants. The issue I see with changing this "on the fly" is that the SM has no way of knowing it changed (other than polling this which is less than optimal). -- Hal From rolandd at cisco.com Wed Sep 14 22:18:28 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 14 Sep 2005 22:18:28 -0700 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <005b01c5b9a5$eec70550$9e5aa8c0@infiniconsys.com> (Fab Tillier's message of "Wed, 14 Sep 2005 20:31:15 -0700") References: <005b01c5b9a5$eec70550$9e5aa8c0@infiniconsys.com> Message-ID: <52y85yx61n.fsf@cisco.com> Fab> To me, non volatile in this context means something like Fab> using the system name. Huh?? To me non-volatile means not changing. Fab> Would this be hard to do? In fact, I would prefer to see the Fab> system name used instead of whatever is programmed in the HCA Fab> as the default. Granted, this makes the node description Fab> burned in the firmware useless, but that doesn't seem like a Fab> big deal. It's easy to do, but I don't want to put naming policy in the kernel. I'll try to think of a clean way to give userspace a chance to set the node description before the HCA ports are exposed to the SM. However, there are various unsolvable cases like boot over IB, where the HCA ports need to be active to mount the root filesystem, but the system doesn't have its hostname set until after the root filesystem is mounted. - R. From mst at mellanox.co.il Wed Sep 14 22:19:31 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 15 Sep 2005 08:19:31 +0300 Subject: [openib-general] Re: [PATCH] libibcm/libibat disable-libcheck option In-Reply-To: <52psrbxqpy.fsf@cisco.com> References: <52psrbxqpy.fsf@cisco.com> Message-ID: <20050915051931.GA7802@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] libibcm/libibat disable-libcheck option > > Michael> Add an option to disable configure checks for ib > Michael> libraries. This makes it possible to first configure all > Michael> libraries, then make them all. > > Why do we really want this? Is it so hard to build things in order? The point is to be able to first configure all libraries, then make them. I have a central configure script that configures the rest of the libraries. In monolithic builds, configure checks are just a hassle. > Does libibcm even build without libibat installed? It seems like > libibcm needs to be able to find sa.h. Yes, that's unfortunate. But that's not a problem for me since I make things in order. It's only the configure step compiling test programs that presents a problem. -- MST From jackm at mellanox.co.il Wed Sep 14 22:31:43 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Thu, 15 Sep 2005 08:31:43 +0300 Subject: [PATCH] [openib-general] Strange configure error in libibcm Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEB1D@mtlexch01.mtl.com> No error message is generated (i.e., the patch fixes the problem). autoconf and configure work as they should with this patch. Jack -----Original Message----- From: Roland Dreier [mailto:rolandd at cisco.com] Sent: Wednesday, September 14, 2005 8:01 PM To: Jack Morgenstein Cc: openib-general at openib.org Subject: Re: [PATCH] [openib-general] Strange configure error in libibcm Jack> The problem is in the ordering of checks in file Jack> userspace/libibcm/configure.in Jack> Below is a patch for the problem: Yes, this will mask the problem for sure. However the underlying issue is still that ld has a different search path than ld.so.
I'll apply this since it will make the error message easier to understand. - R. From rolandd at cisco.com Wed Sep 14 22:32:09 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 14 Sep 2005 22:32:09 -0700 Subject: [PATCH] [openib-general] Strange configure error in libibcm In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEB1D@mtlexch01.mtl.com> (Jack Morgenstein's message of "Thu, 15 Sep 2005 08:31:43 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEB1D@mtlexch01.mtl.com> Message-ID: <52k6hix5eu.fsf@cisco.com> Jack> No error message is generated (i.e., the patch fixes the Jack> problem). autoconf and configure work as they should with Jack> this patch. Right, but the program that gets built won't run because ld.so won't find the library it's linked with. That's what I meant about the error message being easier to understand: it's easy to see you have to set LD_LIBRARY_PATH in that case. - R. From ftillier at silverstorm.com Wed Sep 14 22:43:45 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Wed, 14 Sep 2005 22:43:45 -0700 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <52y85yx61n.fsf@cisco.com> Message-ID: <005c01c5b9b8$713a62e0$9e5aa8c0@infiniconsys.com> > From: Roland Dreier [mailto:rolandd at cisco.com] > Sent: Wednesday, September 14, 2005 10:18 PM > > Fab> To me, non volatile in this context means something like > Fab> using the system name. > > Huh?? To me non-volatile means not changing. What I meant is that having the users be able to set anything they want at runtime isn't non-volatile. One could argue that the system name is volatile, since it can be changed, but how often do people change their system names once the system is set up? Just ignore me if I don't make sense.
:) - Fab From jackm at mellanox.co.il Wed Sep 14 22:54:26 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Thu, 15 Sep 2005 08:54:26 +0300 Subject: [PATCH] [openib-general] Strange configure error in libibcm Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEB36@mtlexch01.mtl.com> What I noticed is that moving the sizeof check before the library check resulted in the generated GCC command not including the libraries (no -l parameters). Therefore, the sizeof check program did indeed run, and yielded the correct result. Evidently, once the library check is performed, subsequent checks all generate a GCC command line which includes the checked libraries as "-l" parameters. You simply need to organize configure.in so that all tests which do not require extra libraries are executed before any library-check tests. (I assume that the macro language generates a "LIB=" line when a library check is requested, and that this line is not unset after the test -- a probable bug in autoconf or configure). BTW, I noticed that the library check only compiled test programs -- it did not attempt to run them! (That's why the library check succeeded even though the checked libraries are not in the search path.) Strange. Jack -----Original Message----- From: Roland Dreier [mailto:rolandd at cisco.com] Sent: Thursday, September 15, 2005 8:32 AM To: Jack Morgenstein Cc: openib-general at openib.org Subject: Re: [PATCH] [openib-general] Strange configure error in libibcm Jack> No error message is generated (i.e., the patch fixes the Jack> problem). autoconf and configure work as they should with Jack> this patch. Right, but the program that gets built won't run because ld.so won't find the library it's linked with. That's what I meant about the error message being easier to understand: it's easy to see you have to set LD_LIBRARY_PATH in that case. - R.
From eitan at mellanox.co.il Wed Sep 14 22:51:06 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 15 Sep 2005 08:51:06 +0300 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <1126745723.5425.495.camel@hal.voltaire.com> References: <1126745723.5425.495.camel@hal.voltaire.com> Message-ID: <43290BCA.5050308@mellanox.co.il> Hal Rosenstock wrote: > On Wed, 2005-09-14 at 19:43, Roland Dreier wrote: > >> - Allows userspace to set node_desc by writing into sysfs file, eg. >> echo -n `hostname` >> /sys/class/infiniband/mthca0/node_desc I would think the way to use this is to add the above call to the modprobe sequence. From mst at mellanox.co.il Thu Sep 15 00:36:29 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Sep 2005 10:36:29 +0300 Subject: [openib-general] Re: [PATCH] Allow setting of NodeDescription In-Reply-To: <52ek7rxljj.fsf@cisco.com> References: <52ek7rxljj.fsf@cisco.com> Message-ID: <20050915073629.GQ28025@mellanox.co.il> Quoting r. Roland Dreier : > Subject: [PATCH] Allow setting of NodeDescription > > This patch does a few things: > - Adds node_guid and node_desc fields to struct ib_device > - Has mthca set these fields on startup > - Extends modify_device method to handle setting node_desc > - Exposes node_desc in sysfs > - Allows userspace to set node_desc by writing into sysfs file, eg. > echo -n `hostname` >> /sys/class/infiniband/mthca0/node_desc > > This should probably be combined with Sean's work to get rid of > node_guid queries in ULPs. > > Comments? > > - R. Good stuff. I think echo -n `hostname` mthca0 >> /sys/class/infiniband/mthca0/node_desc is even more useful, but that's now up to the user, isn't it?
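Eitan suggests doing this write from the modprobe sequence, and Michael suggests appending the HCA name to the hostname. The following is a minimal C sketch of that write. It assumes the sysfs path from Roland's example, and it assumes the kernel keeps at most 64 bytes, the size of the NodeDescription field that the mthca patch copies with memcpy. Both helper names are hypothetical and are not part of any OpenIB library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Assumption: NodeDescription is 64 bytes, as copied by mthca_init_node_data(). */
#define NODE_DESC_LEN 64

/* Hypothetical helper: format "<host> <hca>" into out, truncated to
 * NODE_DESC_LEN bytes. Returns the number of bytes to write. */
size_t build_node_desc(char out[NODE_DESC_LEN + 1],
                       const char *host, const char *hca)
{
    assert(out != NULL && host != NULL && hca != NULL);
    snprintf(out, NODE_DESC_LEN + 1, "%s %s", host, hca);
    return strlen(out);
}

/* Hypothetical helper: roughly the C equivalent of
 *   echo -n `hostname` mthca0 >> /sys/class/infiniband/mthca0/node_desc
 * Returns 0 on success, -1 on error. */
int set_node_desc_from_hostname(const char *hca)
{
    char host[256];
    char desc[NODE_DESC_LEN + 1];
    char path[256];
    size_t len;
    FILE *f;

    if (gethostname(host, sizeof host) != 0)
        return -1;
    host[sizeof host - 1] = '\0'; /* termination is not guaranteed on truncation */

    len = build_node_desc(desc, host, hca);
    snprintf(path, sizeof path, "/sys/class/infiniband/%s/node_desc", hca);

    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    /* No trailing newline is written, matching `echo -n`. */
    if (fwrite(desc, 1, len, f) != len) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

A modprobe or hotplug hook would call set_node_desc_from_hostname("mthca0") once the device appears. As Jack's race scenario later in this thread points out, the SM is only guaranteed to see the new string if it is written before QP0 on the HCA reaches RTS.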
> +static int mthca_init_node_data(struct mthca_dev *dev) > +{ > + struct ib_smp *in_mad = NULL; > + struct ib_smp *out_mad = NULL; > + int err = -ENOMEM; > + u8 status; > + > + in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL); > + out_mad = kmalloc(sizeof *out_mad, GFP_KERNEL); > + if (!in_mad || !out_mad) > + goto out; > + > + init_query_mad(in_mad); > + in_mad->attr_id = IB_SMP_ATTR_NODE_DESC; > + > + err = mthca_MAD_IFC(dev, 1, 1, > + 1, NULL, NULL, in_mad, out_mad, > + &status); > + if (err) > + goto out; > + if (status) { > + err = -EINVAL; > + goto out; > + } > + > + memcpy(dev->ib_dev.node_desc, out_mad->data, 64); Would make more sense to initialize node_desc to an empty string instead of whatever is programmed in the device flash? This creates a way for the remote user to find out that the "echo" script above did not run yet, and try again later. > @@ -1064,6 +1120,7 @@ int mthca_register_device(struct mthca_d > dev->ib_dev.class_dev.dev = &dev->pdev->dev; > dev->ib_dev.query_device = mthca_query_device; > dev->ib_dev.query_port = mthca_query_port; > + dev->ib_dev.modify_device = mthca_modify_device; > dev->ib_dev.modify_port = mthca_modify_port; > dev->ib_dev.query_pkey = mthca_query_pkey; > dev->ib_dev.query_gid = mthca_query_gid; By the way, why do we need ib_modify_device in ib_verbs, at all? The code basically does memcpy from ibdev, in addition to some locking. This seems something that belongs in ib_mad, the mad snooping logic would be exactly the same for any provider. No? -- MST From mst at mellanox.co.il Thu Sep 15 00:44:07 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Sep 2005 10:44:07 +0300 Subject: [openib-general] RFC: struct netdevice changes for IPoIB UC support Message-ID: <20050915074407.GS28025@mellanox.co.il> Hi! 
As was already discussed on this list, one of the difficulties with IP over IB support for UC mode, is the fact that the same device has to support sending both UC (max MTU 2Gbyte) and UD (max MTU 2Kbyte) packets, depending on packet link address. I propose the following simple patch to let the netdevice override the path MTU per dst entry. The patch was tested by modifying existing IPoIB code to use MTU of 1K for some addresses, and 2K for others. Please comment on this approach: if it makes sense to you guys, I'll try forwarding this to netdev and lkml lists. Thanks, MST --- Make it possible for a network device to support more than one MTU value at a time (depending on packet link address, or other criteria). Signed-off-by: Michael S. Tsirkin Index: linux-2.6.12.5/include/linux/netdevice.h =================================================================== --- linux-2.6.12.5.orig/include/linux/netdevice.h +++ linux-2.6.12.5/include/linux/netdevice.h @@ -454,6 +454,10 @@ struct net_device #define HAVE_CHANGE_MTU int (*change_mtu)(struct net_device *dev, int new_mtu); +#define HAVE_GET_MTU + u32 (*get_mtu)(struct net_device *dev, + struct neighbour *neigh, + int path_mtu); #define HAVE_TX_TIMEOUT void (*tx_timeout) (struct net_device *dev); Index: linux-2.6.12.5/include/net/dst.h =================================================================== --- linux-2.6.12.5.orig/include/net/dst.h +++ linux-2.6.12.5/include/net/dst.h @@ -111,7 +111,12 @@ dst_metric(const struct dst_entry *dst, static inline u32 dst_mtu(const struct dst_entry *dst) { - u32 mtu = dst_metric(dst, RTAX_MTU); + u32 mtu; + if (dst->dev && dst->dev->get_mtu) + mtu = dst->dev->get_mtu(dst->dev, dst->neighbour, + dst_metric(dst, RTAX_MTU)); + else + mtu = dst_metric(dst, RTAX_MTU); /* * Alexey put it here, so ask him about it :) */ -- MST From eitan at mellanox.co.il Thu Sep 15 01:01:47 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Thu, 15 Sep 2005 11:01:47 +0300 Subject: [openib-general] 
RMPP Message Format Errors In-Reply-To: <1126711190.4425.439.camel@hal.voltaire.com> References: <1126711190.4425.439.camel@hal.voltaire.com> Message-ID: <43292A6B.7040906@mellanox.co.il> Hal Rosenstock wrote: > > > No. The patches are part of this. It would depend on what OpenIB svn > version you are running with but if it is a recent pull then they are > all there. OK, I got the kernel restarted now. From the analyzer dump I can see the intermediate segments' paylen is 0, so I guess I'm up to date. But osmtest produces an inventory file that is missing some of the records being sent. Now let's go back to the test: I use a machine connected through a single switch (IS3) to itself. I use osmtest -f c to get Nodes, Ports and PathRecords from the SM. From the OpenSM log file I see: Sep 15 09:47:37 531029 [8003] -> osm_nr_rcv_process: Returning 3 records. Sep 15 09:47:37 538586 [C004] -> osm_pir_rcv_process: Returning 27 records. So we can conclude that the following RMPP transactions should be sent:
1. NodeRec: attrOffset is 14 and each record size with padding is 112 bytes. The RMPP with 336 bytes of data should require 2 segments = ceiling(336/200). First segment paylen should be 336 + 2 * 20 = 376. Last segment paylen should be 336 - 200 + 20 = 156.
2. PortInfoRecords: attrOffset is 8 and each record size with padding is 64 bytes. The RMPP with 1728 = 27 * 64 bytes of data should require 9 segments = ceiling(1728/200). First segment paylen should be 1728 + 9 * 20 = 1908. Last segment paylen should be 1728 - 8*200 + 20 = 148.
What we see in the attached analyzer capture:

NodeInfoRec Attr   Expected   Measured
Num Segments       2          2
First Paylen       376        376
Last Paylen        156        156

PortInfoRec Attr   Expected   Measured
Num Segments       9          9
First Paylen       1908       1908
Last Paylen        148        148

So the response on the wire is 100% OK. Thanks Sean. Now let me turn to the SA client side: From the osmtest log I see: NodeInfoRec: Aug 21 14:46:56 [4017F6C0] -> __osmv_send_sa_req: Waiting for async event.
Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: [ Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: [ Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x807b8a4, size = 256. Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquired UMAD 0x807c198, size = 256. Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: ] Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: Acquired p_madw = 0x807b898, p_mad = 0x807c1d0, size = 256. Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: ] Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: [ Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88) Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: [ Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: ] Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: ] I wonder why the received MAD is only 256 bytes. I expected it to be headers + data = 56 + 336 = 392 bytes. So my conclusion is that for some reason the response MAD is not reassembled correctly, or the communication between the reassembly and the umad layer is broken. Or maybe I am missing some patches. I see that in osm_vendor_ibumad.c the receive flow allocates a MAD using: p_osm_madw = osm_mad_pool_get_wrapper(p_mad_bind_info->p_mad_pool, p_mad_bind_info, MAD_BLOCK_SIZE, (ib_mad_t*)&pRecvMad->IBMad, &osm_mad_addr); I suspect the allocation should use the received MAD size. Thanks Eitan From mst at mellanox.co.il Thu Sep 15 01:09:08 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Sep 2005 11:09:08 +0300 Subject: [openib-general] Re: [PATCH] Allow setting of NodeDescription In-Reply-To: <20050915073629.GQ28025@mellanox.co.il> References: <20050915073629.GQ28025@mellanox.co.il> Message-ID: <20050915080908.GU28025@mellanox.co.il> Quoting Michael S.
Tsirkin : > Would make more sense to initialize node_desc to an empty string Typo fix: Would *it* make more sense to initialize node_desc to an empty string -- MST From jackm at mellanox.co.il Thu Sep 15 02:00:18 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Thu, 15 Sep 2005 12:00:18 +0300 Subject: [openib-general] [PATCH] Allow setting of NodeDescription Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEBF6@mtlexch01.mtl.com> Unless the Node Description is changed before QP0 on the HCA enters the RTS state, there is no guarantee that the SM will receive the updated Node Description string. Note the following scenario (in which the SM is already up and running on another host in the network):
1. Local driver starts up.
2. QP0 on local HCA enters RTS state.
3. SM sweep occurs just after step 2.
4. Local SM agent provides SM with the Node Description string as specified at HCA startup.
5. SM provides SA with Node Description info for the NodeRecord query for each node in the net.
6. User now modifies Node Description to something else.
7. NO guarantee that SM will pay attention to the change of Node Description at the next sweep (depends on SM implementation).
8. SA will then use the original Node Description string when responding to a NodeRecord query.
The resulting set of NodeDescription strings present in the SM and SA could then be a race-dependent salad (depending on the timing of QP0 entering RTS state, SM subnet sweep, and resetting of the local NodeDescription string). Jack -----Original Message----- From: Roland Dreier [mailto:rolandd at cisco.com] Sent: Thursday, September 15, 2005 8:18 AM To: Fab Tillier Cc: openib-general at openib.org Subject: Re: [openib-general] [PATCH] Allow setting of NodeDescription Fab> To me, non volatile in this context means something like Fab> using the system name. Huh?? To me non-volatile means not changing. Fab> Would this be hard to do?
In fact, I would prefer to see the Fab> system name used instead of whatever is programmed in the HCA Fab> as the default. Granted, this makes the node description Fab> burned in the firmware useless, but that doesn't seem like a Fab> big deal. It's easy to do, but I don't want to put naming policy in the kernel. I'll try to think of a clean way to give userspace a chance to set the node description before the HCA ports are exposed to the SM. However, there are various unsolvable cases like boot over IB, where the HCA ports need to be active to mount the root filesystem, but the system doesn't have its hostname set until after the root filesystem is mounted. - R. _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From QiWang.Chen at Clustars.CN Thu Sep 15 03:11:26 2005 From: QiWang.Chen at Clustars.CN (QiWang, Chen) Date: Thu, 15 Sep 2005 18:11:26 +0800 Subject: [openib-general] could not add HCA InfiniHost0 Message-ID: <1126779086.22691.7.camel@QiWang> Hello everyone, I have Mellanox MT23108 HCAs (RHEL4 U1, kernel 2.6.9-11), and node1 to node8 work fine. Drivers: IBGD-1.8.0, FW = 3.3.3. lspci: 02:00.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) 03:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) But node9 to node16 don't work. lspci: 02:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) 03:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) There is one difference: 02:00.0 works, 02:01.0 fails. The first time I installed ib-verbs on node1 it also failed, because lspci showed 02:01.0. I do not know how it changed from 02:01.0 to 02:00.0, but after that it worked fine for me.
the error logs list here: _--------------------------------------------------------------------- Hostname: c01-14 OS: Red Hat Enterprise Linux AS release 4 (Nahant Update 1) Kernel \r on an \m Current kernel: 2.6.9-11.ELsmp Architecture: i686 GCC version: gcc (GCC) 3.4.3 20050227 (Red Hat 3.4.3-22.1) Copyright (C) 2004 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. CPU: model name : Intel(R) Xeon(TM) CPU 2.66GHz MemTotal: 2075004 kB Chipset: 00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01) Device /dev/mst/mt23108_pci_cr0 Info: Firmware: Version: 3.03.0003 Date: 05/07/2005 18:46:35 ############# LSPCI ############## 00:00.0 Host bridge: Intel Corporation E7501 Memory Controller Hub (rev 01) 00:00.1 Class ff00: Intel Corporation E7500/E7501 Host RASUM Controller (rev 01) 00:04.0 PCI bridge: Intel Corporation E7500/E7501 Hub Interface D PCI- to-PCI Bridge (rev 01) 00:1d.0 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #1) (rev 02) 00:1d.1 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #2) (rev 02) 00:1d.2 USB Controller: Intel Corporation 82801CA/CAM USB (Hub #3) (rev 02) 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 42) 00:1f.0 ISA bridge: Intel Corporation 82801CA LPC Interface Controller (rev 02) 00:1f.1 IDE interface: Intel Corporation 82801CA Ultra ATA Storage Controller (rev 02) 00:1f.3 SMBus: Intel Corporation 82801CA/CAM SMBus Controller (rev 02) 01:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04) 01:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04) 01:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04) 01:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04) 02:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) 03:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) 04:01.0 Ethernet 
controller: Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet (rev 03) 04:01.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704S Gigabit Ethernet (rev 03) 05:00.0 VGA compatible controller: Chips and Technologies F69030 (rev 61) 05:08.0 Ethernet controller: Intel Corporation 82801BA/BAM/CA/CAM Ethernet Controller (rev 42) ############# LSPCI -N ############## 00:00.0 Class 0600: 8086:254c (rev 01) 00:00.1 Class ff00: 8086:2541 (rev 01) 00:04.0 Class 0604: 8086:2547 (rev 01) 00:1d.0 Class 0c03: 8086:2482 (rev 02) 00:1d.1 Class 0c03: 8086:2484 (rev 02) 00:1d.2 Class 0c03: 8086:2487 (rev 02) 00:1e.0 Class 0604: 8086:244e (rev 42) 00:1f.0 Class 0601: 8086:2480 (rev 02) 00:1f.1 Class 0101: 8086:248b (rev 02) 00:1f.3 Class 0c05: 8086:2483 (rev 02) 01:1c.0 Class 0800: 8086:1461 (rev 04) 01:1d.0 Class 0604: 8086:1460 (rev 04) 01:1e.0 Class 0800: 8086:1461 (rev 04) 01:1f.0 Class 0604: 8086:1460 (rev 04) 02:01.0 Class 0604: 15b3:5a46 (rev a1) 03:00.0 Class 0c06: 15b3:5a44 (rev a1) 04:01.0 Class 0200: 14e4:16a8 (rev 03) 04:01.1 Class 0200: 14e4:16a8 (rev 03) 05:00.0 Class 0300: 102c:0c30 (rev 61) 05:08.0 Class 0200: 8086:2449 (rev 42) ############# LSMOD ############## Module Size Used by ib_sa_client 34312 0 ib_client_query 22240 1 ib_sa_client ib_poll 21560 1 ib_client_query ib_useraccess 16708 0 ib_tavor 39972 0 ib_mad 26380 3 ib_client_query,ib_useraccess,ib_tavor ib_core 237588 4 ib_sa_client,ib_useraccess,ib_tavor,ib_mad ib_services 22468 7 ib_sa_client,ib_client_query,ib_poll,ib_useraccess,ib_tavor,ib_mad,ib_core mod_thh 290020 0 mst_pciconf 87296 0 mst_pci 84352 0 mod_vip 329288 2 ib_tavor,mod_thh mlxsys 95664 2 mod_thh,mod_vip nfs 200869 0 nfsd 205281 9 exportfs 10049 1 nfsd lockd 65257 3 nfs,nfsd md5 8001 1 ipv6 238817 12 autofs4 22085 2 sunrpc 138789 20 nfs,nfsd,lockd dm_mod 58949 0 button 10449 0 battery 12869 0 ac 8773 0 uhci_hcd 32729 0 tg3 82373 0 e100 36673 0 mii 8641 1 e100 floppy 58065 0 ext3 118729 3 jbd 59481 1 ext3 ############# DMESG 
############## iband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=31, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=32, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=33, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=34, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=35, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=36, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=37, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=38, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=39, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=40, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=41, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=42, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=43, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=44, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=45, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=46, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=47, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=48, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=49, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=50, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=51, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=52, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=53, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=54, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=55, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=56, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=57, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=58, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=59, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=60, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=61, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=62, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=63, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=64, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=65, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=66, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=67, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=68, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=69, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=70, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=71, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=72, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=73, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=74, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=75, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=76, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=77, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=78, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=79, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=80, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=81, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=82, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=83, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=84, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=85, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=86, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=87, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=88, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=89, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=90, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=91, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=92, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=93, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=94, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=95, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=96, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=97, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=98, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=99, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=100, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=101, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=102, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=103, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=104, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=105, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=106, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=107, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=108, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=109, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=110, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=111, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=112, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=113, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=114, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=115, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=116, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=117, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=118, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=119, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=120, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=121, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=122, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=123, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=124, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=125, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=126, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=127, token=0x0000, counter=0
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif.c[211]: Failed command 0x24 (TAVOR_IF_CMD_MAD_IFC): status=0x103 (0x0103 - unexpected error - fatal)
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/hob_comm.c[250]: XHH_hob_query_port_prop: cmdif returned FATAL
VIPKL(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/vip/qpm.c[291]: QPM_new: HOBKL_query_port_prop returned with error: -254 = VAPI_EFATAL
VIPKL(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/vip/qpm.c[322]: QPM_new: returned with error: -254 = VAPI_EFATAL
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/hob_comm.c[2323]: XHH_hob_halt_hca: HALT HCA returned 0x103
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/hob_comm.c[2699]: XHH_hob_restart: destroying old HOB
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/hob.c[1581]: XHH_hob_destroy_internal: FATAL ERROR
THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/hob_comm.c[2705]: XHH_hob_restart: creating new HOB
Mellanox Tavor Device Driver is creating device "InfiniHost0" (bus=03, devfn=00)
[KERNEL_IB][_tsIbTavorInitOne][/var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/provider/tavor_main.c:178]InfiniHost0: VAPI_open_hca failed, status -254 (Fatal error (Local Catastrophic Error))
[KERNEL_IB][_tslbTavorPnPEventHandler][/var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/provider/tavor_main.c:352]_tslbTavorPnPEventHandler: could not add HCA InfiniHost0 (-19)

############# Messages ##############
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=106, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=107, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=108, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=109, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=110, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=111, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=112, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=113, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=114, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=115, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=116, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=117, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=118, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=119, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=120, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=121, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=122, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=123, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=124, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=125, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=126, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif_comm.c[1482]: print_track_arr: idx=127, token=0x0000, counter=0
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/cmdif.c[211]: Failed command 0x24 (TAVOR_IF_CMD_MAD_IFC): status=0x103 (0x0103 - unexpected error - fatal)
Sep 15 01:08:54 c01-14 kernel:
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/hob_comm.c[250]: XHH_hob_query_port_prop: cmdif returned FATAL
Sep 15 01:08:54 c01-14 kernel: VIPKL(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/vip/qpm.c[291]: QPM_new: HOBKL_query_port_prop returned with error: -254 = VAPI_EFATAL
Sep 15 01:08:54 c01-14 kernel: VIPKL(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/vip/qpm.c[322]: QPM_new: returned with error: -254 = VAPI_EFATAL
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/hob_comm.c[2323]: XHH_hob_halt_hca: HALT HCA returned 0x103
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/hob_comm.c[2699]: XHH_hob_restart: destroying old HOB
Sep 15 01:08:54 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/hob.c[1581]: XHH_hob_destroy_internal: FATAL ERROR
Sep 15 01:08:55 c01-14 kernel: THH(1): var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/mellanox-hca/mlxhh/thh/hob_comm.c[2705]: XHH_hob_restart: creating new HOB
Sep 15 01:08:55 c01-14 kernel:
Sep 15 01:08:55 c01-14 kernel: Mellanox Tavor Device Driver is creating device "InfiniHost0" (bus=03, devfn=00)
Sep 15 01:08:55 c01-14 kernel:
Sep 15 01:08:56 c01-14 kernel: [KERNEL_IB][_tsIbTavorInitOne][/var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/provider/tavor_main.c:178]InfiniHost0: VAPI_open_hca failed, status -254 (Fatal error (Local Catastrophic Error))
Sep 15 01:08:56 c01-14 kernel: [KERNEL_IB][_tslbTavorPnPEventHandler][/var/tmp/IBGD//tmp/openib/infiniband/ib_verbs/hw/provider/tavor_main.c:352]_tslbTavorPnPEventHandler: could not add HCA InfiniHost0 (-19)
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error inserting ib_ipoib (/lib/modules/2.6.9-11.ELsmp/kernel/drivers/infiniband/ib_ipoib.ko): No such device
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error running install command for ib_ipoib
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error inserting ib_ipoib (/lib/modules/2.6.9-11.ELsmp/kernel/drivers/infiniband/ib_ipoib.ko): No such device
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error running install command for ib_ipoib
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error inserting ib_ipoib (/lib/modules/2.6.9-11.ELsmp/kernel/drivers/infiniband/ib_ipoib.ko): No such device
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error running install command for ib_ipoib
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error inserting ib_ipoib (/lib/modules/2.6.9-11.ELsmp/kernel/drivers/infiniband/ib_ipoib.ko): No such device
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error running install command for ib_ipoib
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error inserting ib_ipoib (/lib/modules/2.6.9-11.ELsmp/kernel/drivers/infiniband/ib_ipoib.ko): No such device
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error running install command for ib_ipoib
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error inserting ib_ipoib (/lib/modules/2.6.9-11.ELsmp/kernel/drivers/infiniband/ib_ipoib.ko): No such device
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error running install command for ib_ipoib
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error inserting ib_ipoib (/lib/modules/2.6.9-11.ELsmp/kernel/drivers/infiniband/ib_ipoib.ko): No such device
Sep 15 01:08:56 c01-14 modprobe: FATAL: Error running install command for ib_ipoib

############# Running Processes ##############
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 00:32 ?        00:00:00 init [3]
root         2     1  0 00:32 ?        00:00:00 [migration/0]
root         3     1  0 00:32 ?        00:00:00 [ksoftirqd/0]
root         4     1  0 00:32 ?        00:00:00 [migration/1]
root         5     1  0 00:32 ?        00:00:00 [ksoftirqd/1]
root         6     1  0 00:32 ?        00:00:00 [events/0]
root         7     1  0 00:32 ?        00:00:00 [events/1]
root         8     6  0 00:32 ?        00:00:00 [khelper]
root         9     6  0 00:32 ?        00:00:00 [kacpid]
root        36     6  0 00:32 ?        00:00:00 [kblockd/0]
root        37     6  0 00:32 ?        00:00:00 [kblockd/1]
root        47     6  0 00:32 ?        00:00:00 [pdflush]
root        50     6  0 00:32 ?        00:00:00 [aio/0]
root        51     6  0 00:32 ?        00:00:00 [aio/1]
root        38     1  0 00:32 ?        00:00:00 [khubd]
root        49     1  0 00:32 ?        00:00:00 [kswapd0]
root       124     1  0 00:32 ?        00:00:00 [kseriod]
root       198     1  0 00:32 ?        00:00:00 [kjournald]
root      1020     1  0 00:32 ?        00:00:00 udevd
root      1331     1  0 00:32 ?        00:00:00 [kjournald]
root      1332     1  0 00:32 ?        00:00:00 [kjournald]
root      1723     1  0 00:32 ?        00:00:00 /sbin/dhclient -1 -q -lf /var/lib/dhcp/dhclient-eth0.leases -pf /var/run/dhclient-eth0.pid eth0
root      1782     1  0 00:32 ?        00:00:00 syslogd -m 0
root      1787     1  0 00:32 ?        00:00:00 klogd -x
root      1798     1  0 00:32 ?        00:00:00 irqbalance
rpc       1816     1  0 00:32 ?        00:00:00 portmap
rpcuser   1836     1  0 00:32 ?        00:00:00 rpc.statd
root      1931     1  0 00:32 ?        00:00:00 rpc.idmapd
root      1987     1  0 00:32 ?        00:00:00 ypbind
root      2137     1  0 00:32 ?        00:00:00 /usr/sbin/automount --timeout=60 /home yp auto.home
root      2139     1  0 00:32 ?        00:00:00 /usr/sbin/automount --timeout=60 /export yp auto.export
root      2157     1  0 00:32 ?        00:00:00 /usr/sbin/smartd
root      2167     1  0 00:32 ?        00:00:00 /usr/sbin/acpid
root      2267     1  0 00:32 ?        00:00:00 /usr/sbin/sshd
root      2282     1  0 00:32 ?        00:00:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
ntp       2298     1  0 00:32 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid
root      2312     1  0 00:32 ?        00:00:00 rpc.rquotad
root      2321     1  0 00:32 ?        00:00:00 [nfsd]
root      2322     1  0 00:32 ?        00:00:00 [nfsd]
root      2323     1  0 00:32 ?        00:00:00 [nfsd]
root      2324     1  0 00:32 ?        00:00:00 [nfsd]
root      2325     1  0 00:32 ?        00:00:00 [nfsd]
root      2326     1  0 00:32 ?        00:00:00 [nfsd]
root      2327     1  0 00:32 ?        00:00:00 [nfsd]
root      2328     1  0 00:32 ?        00:00:00 [nfsd]
root      2329     1  0 00:32 ?        00:00:00 [lockd]
root      2330     1  0 00:32 ?        00:00:00 [rpciod]
root      2334     1  0 00:32 ?        00:00:00 rpc.mountd
nobody    2362     1  0 00:32 ?        00:00:00 /usr/sbin/gmond
root      2378     1  0 00:32 ?        00:00:00 gpm -m /dev/input/mice -t imps2
root      2425     1  0 00:32 ?        00:00:00 /sbin/dhclient -1 -q -lf /var/lib/dhcp/dhclient-eth0.leases -pf /var/run/dhclient-eth0.pid eth0
root      2450     1  0 00:32 ?        00:00:00 /opt/torque-1.2.0p5/sbin/pbs_mom -r
root      2459     1  0 00:32 ?        00:00:00 crond
xfs       2487     1  0 00:32 ?        00:00:00 xfs -droppriv -daemon
root      2497     1  0 00:32 ?        00:00:00 anacron -s
daemon    2506     1  0 00:32 ?        00:00:00 /usr/sbin/atd
dbus      2516     1  0 00:32 ?        00:00:00 dbus-daemon-1 --system
root      2527     1  0 00:32 ?        00:00:00 cups-config-daemon
root      2538     1  0 00:32 ?        00:00:00 hald
root      2547     1  0 00:32 tty1     00:00:00 /sbin/mingetty tty1
root      2548     1  0 00:32 tty2     00:00:00 /sbin/mingetty tty2
root      2549     1  0 00:32 tty3     00:00:00 /sbin/mingetty tty3
root      2550     1  0 00:32 tty4     00:00:00 /sbin/mingetty tty4
root      2551     1  0 00:32 tty5     00:00:00 /sbin/mingetty tty5
root      2552     1  0 00:32 tty6     00:00:00 /sbin/mingetty tty6
root      3687     6  0 01:05 ?        00:00:00 [pdflush]
root      3855  2282  0 01:07 ?        00:00:00 in.rshd
root      3856  3855  0 01:07 ?        00:00:00 /bin/bash /etc/rc.d/init.d/openibd start
root      4109     1  0 01:08 ?        00:00:00 [cleanup_thread]
root      4136     1  0 01:08 ?        00:00:00 [ts_poll]
root      4235  3856  0 01:08 ?        00:00:00 /bin/ps -ef
##############################################

Can anybody help me??? Thx

--
QiWang, Chen
Clustars Supercomputing Technology corp.
http://www.Clustars.CN
TEL:+86-0816-2546345-815
FAX:+86-0816-2546370
Mobile:+86-13096497499

From halr at voltaire.com Thu Sep 15 03:40:26 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Sep 2005 06:40:26 -0400
Subject: [openib-general] [PATCH] Allow setting of NodeDescription
In-Reply-To: <43290BCA.5050308@mellanox.co.il>
References: <1126745723.5425.495.camel@hal.voltaire.com> <43290BCA.5050308@mellanox.co.il>
Message-ID: <1126780671.5425.1694.camel@hal.voltaire.com>

On Thu, 2005-09-15 at 01:51, Eitan Zahavi wrote:
> Hal Rosenstock wrote:
> > On Wed, 2005-09-14 at 19:43, Roland Dreier wrote:
> >
> >> - Allows userspace to set node_desc by writing into sysfs file, eg.
> >> echo -n `hostname` >> /sys/class/infiniband/mthca0/node_desc
> I would think the way to use this is to add the above call to the modprobe sequence.

That still leaves a window where the SM could get this after the driver
and core are started but before this is set.

--
Hal

From halr at voltaire.com Thu Sep 15 03:43:41 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Sep 2005 06:43:41 -0400
Subject: [openib-general] Re: [PATCH] Allow setting of NodeDescription
In-Reply-To: <20050915073629.GQ28025@mellanox.co.il>
References: <52ek7rxljj.fsf@cisco.com> <20050915073629.GQ28025@mellanox.co.il>
Message-ID: <1126780676.5425.1696.camel@hal.voltaire.com>

On Thu, 2005-09-15 at 03:36, Michael S. Tsirkin wrote:
> Would make more sense to initialize node_desc to an empty string
> instead of whatever is programmed in the device flash?
> This creates a way for the remote user to find out that the "echo"
> script above did not run yet, and try again later.

I think this is the wrong direction to go as the SM may now see many
nodes without any NodeDescription. I don't think the tradeoff to know
that it wasn't set this way is worth it on what it potentially causes to
SM and any management/diagnostic tools.
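Roland's proposed interface, quoted above, makes NodeDescription a writable sysfs attribute. The following is a minimal C sketch of the same write that `echo -n \`hostname\`` performs, for an init helper that wants to set the string as early as possible; it narrows, but does not close, the window discussed in this thread. The helper name `set_node_desc` is hypothetical, and truncating to 64 bytes (the size of the IBA NodeDescription field) is this sketch's own choice, not necessarily what the kernel's store routine does.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* The IBA NodeDescription attribute carries a fixed 64-byte string. */
#define NODE_DESC_MAX 64

/*
 * Hypothetical helper: write desc to a node_desc attribute such as
 * /sys/class/infiniband/mthca0/node_desc. Like `echo -n`, no trailing
 * newline is written. Strings longer than NODE_DESC_MAX bytes are
 * truncated by this sketch. Returns 0 on success, -errno on failure.
 */
int set_node_desc(const char *path, const char *desc)
{
	size_t len = strlen(desc);
	ssize_t n;
	int fd, err = 0;

	if (len > NODE_DESC_MAX)
		len = NODE_DESC_MAX;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, desc, len);
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;	/* short write: treat as failure */

	close(fd);
	return err;
}
```

A caller would typically fill a buffer with `gethostname()` and pass it along with the per-HCA path.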
--
Hal

From halr at voltaire.com Thu Sep 15 03:46:54 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Sep 2005 06:46:54 -0400
Subject: [openib-general] [PATCH] Allow setting of NodeDescription
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEBF6@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEBF6@mtlexch01.mtl.com>
Message-ID: <1126780826.5425.1706.camel@hal.voltaire.com>

On Thu, 2005-09-15 at 05:00, Jack Morgenstein wrote:
> Unless the Node Description is changed before QP0 on the HCA enters
> the RTS state, there is no guarantee that the SM will receive the
> updated Node Description string.

> 6. NO guarantee that SM will pay attention to change of Node
> Description at next sweep. (depends on SM implementation).

That was/is my point exactly.

--
Hal

From mst at mellanox.co.il Thu Sep 15 03:58:30 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 15 Sep 2005 13:58:30 +0300
Subject: [openib-general] Re: [PATCH] Allow setting of NodeDescription
In-Reply-To: <1126780676.5425.1696.camel@hal.voltaire.com>
References: <1126780676.5425.1696.camel@hal.voltaire.com>
Message-ID: <20050915105830.GY28025@mellanox.co.il>

Quoting Hal Rosenstock :
> Subject: Re: [openib-general] Re: [PATCH] Allow setting of NodeDescription
>
> On Thu, 2005-09-15 at 03:36, Michael S. Tsirkin wrote:
> > Would make more sense to initialize node_desc to an empty string
> > instead of whatever is programmed in the device flash?
> > This creates a way for the remote user to find out that the "echo"
> > script above did not run yet, and try again later.
>
> I think this is the wrong direction to go as the SM may now see many
> nodes without any NodeDescription. I don't think the tradeoff to know
> that it wasn't set this way is worth it on what it potentially causes to
> SM and any management/diagnostic tools.

Sorry, I don't really understand.
Does the spec imply NodeDescription must have non-zero bytes?
If not, maybe the management/diagnostic tools should be fixed.

--
MST

From mst at mellanox.co.il Thu Sep 15 04:18:04 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 15 Sep 2005 14:18:04 +0300
Subject: [openib-general] Re: [PATCH] Allow setting of NodeDescription
In-Reply-To: <1126780671.5425.1694.camel@hal.voltaire.com>
References: <1126780671.5425.1694.camel@hal.voltaire.com>
Message-ID: <20050915111804.GA28025@mellanox.co.il>

Quoting Hal Rosenstock :
> > > On Wed, 2005-09-14 at 19:43, Roland Dreier wrote:
> > >
> > >> - Allows userspace to set node_desc by writing into sysfs file, eg.
> > >> echo -n `hostname` >> /sys/class/infiniband/mthca0/node_desc
> >
> > I would think the way to use this is to add the above call to the
> > modprobe sequence.
>
> That still leaves a window where the SM could get this after the driver
> and core are started but before this is set.

Well, if someone is really worried about this, we could add a module
parameter to drop NodeInfo MADs by default. Then you can

modprobe ib_mad nodeinfo_drop=1
echo `hostname` > /sys/class/infiniband/mthca0/node_desc
echo 0 > /sys/class/infiniband/mthca0/nodeinfo_drop

Hmm, maybe this can be generalized to have a mask of MADs to drop.
I don't know if this is a good idea. Roland?

--
MST

From halr at voltaire.com Thu Sep 15 04:33:25 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 15 Sep 2005 07:33:25 -0400
Subject: [openib-general] RMPP Message Format Errors
In-Reply-To: <43292A6B.7040906@mellanox.co.il>
References: <1126711190.4425.439.camel@hal.voltaire.com> <43292A6B.7040906@mellanox.co.il>
Message-ID: <1126784004.5425.1904.camel@hal.voltaire.com>

On Thu, 2005-09-15 at 04:01, Eitan Zahavi wrote:
> OK I got the kernel restarted now. From the Analyzer dump I can see the intermediate segments
> paylen is 0 so I guess I'm up to date.

Good.

> But the osmtest produces an inventory file that misses
> some of the records being sent.
>
> Now let's go back to the test:
>
> I use a machine connected through a single switch (IS3) to itself.
>
> I use osmtest -f c to get Nodes, Ports and PathRecords from the SM.
>
> From OpenSM Log file I see:
> Sep 15 09:47:37 531029 [8003] -> osm_nr_rcv_process: Returning 3 records.
> Sep 15 09:47:37 538586 [C004] -> osm_pir_rcv_process: Returning 27 records.
>
> So we can conclude the following RMPP transactions should be sent:
> 1. NodeRec:
> attrOffset is 14 and each record size with padding is 112 bytes.
> The RMPP with 336 bytes of data should require 2 segments = ceiling(336/200).
> First segment paylen should be 336 + 2 * 20 = 376.
> Last segment paylen should be 336 - 200 + 20 = 156.
>
> 2. PortInfoRecords:
> attrOffset is 8 and each record size with padding is 64 bytes.
> The RMPP with 1728 = 27 * 64 bytes of data should require 9 segments = ceiling(1728/200).
> First segment paylen should be 1728 + 9 * 20 = 1908.
> Last segment paylen should be 1728 - 8*200 + 20 = 148.

Yes, those calculations appear correct to me.

> What we see in the attached analyzer capture:
> NodeInfoRec
> Attr          Expected  Measured
> Num Segments  2         2
> First Paylen  376       376
> Last Paylen   156       156
>
> PortInfoRec
> Attr          Expected  Measured
> Num Segments  9         9
> First Paylen  1908      1908
> Last Paylen   148       148
>
> So the response on the wire is 100% OK. Thanks Sean.

BTW, I did some work here to get this right too.

> Now I go to the SA client section:
>
> From osmtest log I see:
>
> NodeInfoRec:
> Aug 21 14:46:56 [4017F6C0] -> __osmv_send_sa_req: Waiting for async event.
> Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: [
> Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: [
> Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x807b8a4, size = 256.
> Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquired UMAD 0x807c198, size = 256.
> Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: ]
> Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: Acquired p_madw = 0x807b898, p_mad = 0x807c1d0, size = 256.
> Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: ]
> Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: [
> Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88)
> Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: [
> Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: ]
> Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: ]
> I wonder how come the received MAD is only of 256 bytes. I expected it to be of headers + data = 56 + 336 = 392 bytes.
>
> So my conclusion is that for some reason the response MAD is not re-assembled correctly or the communication between the
> assembly to the umad layer is broken.

I believe there is something wrong in osm_vendor_ibumad_sa.c in terms of
this. I will look into it. Note that the RMPP part of this had little
testing and the only consumer right now is osmtest which is just
emerging in terms of OpenIB.

> Or maybe I am missing some patches.

No.

> I see that in the osm_vendor_ibumad.c the receive flow is allocating a MAD using:
> p_osm_madw = osm_mad_pool_get_wrapper(p_mad_bind_info->p_mad_pool,
>                                       p_mad_bind_info,
>                                       MAD_BLOCK_SIZE,
>                                       (ib_mad_t*)&pRecvMad->IBMad,
>                                       &osm_mad_addr);
>
> I suspect the allocation should use the receive mad size.

I don't see that call in osm_vendor_ibumad.c; only in osm_vendor_al.c,
osm_vendor_mtl.c, osm_vendor_ts.c, and osm_vendor_umadt.c.

I think there is a problem on the receive side of osm_vendor_ibumad_sa.c
for RMPP. I am looking into it now.

--
Hal

From mst at mellanox.co.il Thu Sep 15 05:39:07 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 15 Sep 2005 15:39:07 +0300
Subject: [openib-general] [PATCH] set eq->nent earlier in mthca_create_eq
Message-ID: <20050915123907.GE28025@mellanox.co.il>

Hi!
Since get_eqe uses eq->nent, it seems cleaner to set this field to its
proper value before calling get_eqe. Existing code works because
ib_alloc_device pre-zeroes the memory that it allocates, so eq->nent is 0.

Signed-off-by: Michael S. Tsirkin

Index: linux-2.6.13/drivers/infiniband/hw/mthca/mthca_eq.c
===================================================================
--- linux-2.6.13.orig/drivers/infiniband/hw/mthca/mthca_eq.c	2005-07-31 14:12:06.000000000 +0300
+++ linux-2.6.13/drivers/infiniband/hw/mthca/mthca_eq.c	2005-09-15 17:20:22.000000000 +0300
@@ -479,7 +479,7 @@ static int __devinit mthca_create_eq(str
 	/* Make sure EQ size is aligned to a power of 2 size. */
 	for (i = 1; i < nent; i <<= 1)
 		; /* nothing */
-	nent = i;
+	eq->nent = nent = i;
 
 	eq->dev = dev;
 
@@ -528,8 +528,6 @@ static int __devinit mthca_create_eq(str
 	if (err)
 		goto err_out_free_eq;
 
-	eq->nent = nent;
-
 	memset(eq_context, 0, sizeof *eq_context);
 	eq_context->flags = cpu_to_be32(MTHCA_EQ_STATUS_OK |
 					MTHCA_EQ_OWNER_HW |

--
MST

From eitan at mellanox.co.il Thu Sep 15 05:56:30 2005
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: 15 Sep 2005 15:56:30 +0300
Subject: [openib-general] [PATCH] osm: osm_vendor_umad osm_vendor_get_all_port_attr bug
Message-ID: <86u0gmsd4x.fsf@mtl066.yok.mtl.com>

Hi Hal

Seems like last patch fixing osmv_transport_init was broken.
This fixes it.
Thanks Eitan Signed-off-by: Eitan Zahavi Index: libvendor/osm_vendor_mlx_ts.c =================================================================== --- libvendor/osm_vendor_mlx_ts.c (revision 3443) +++ libvendor/osm_vendor_mlx_ts.c (working copy) @@ -177,8 +177,8 @@ __osmv_TOPSPIN_receiver_thr(void* p_ctx) ib_api_status_t osmv_transport_init(IN osm_bind_info_t *p_info, - IN uint8_t hca_idx, IN char hca_id[VENDOR_HCA_MAXNAMES], + IN uint8_t hca_idx, IN osmv_bind_obj_t *p_bo) { cl_status_t cl_st; @@ -195,7 +195,7 @@ osmv_transport_init(IN osm_bind_info_t * /* open TopSpin file device */ /* HACK: assume last char in hostid is the HCA index */ - sprintf(device_file, "/dev/ts_ua%s", hca_idx); + sprintf(device_file, "/dev/ts_ua%u", hca_idx); device_fd = open(device_file, O_RDWR ); if (device_fd < 0) { From halr at voltaire.com Thu Sep 15 06:12:11 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Sep 2005 09:12:11 -0400 Subject: [openib-general] Re: [PATCH] osm: osm_vendor_umad osm_vendor_get_all_port_attr bug In-Reply-To: <86u0gmsd4x.fsf@mtl066.yok.mtl.com> References: <86u0gmsd4x.fsf@mtl066.yok.mtl.com> Message-ID: <1126789929.5425.2347.camel@hal.voltaire.com> On Thu, 2005-09-15 at 08:56, Eitan Zahavi wrote: > Seems like last patch fixing osmv_transport_init was broken. > This fixes it. OK but this isn't the OpenIB vendor layer (for SA client). -- Hal From halr at voltaire.com Thu Sep 15 06:42:54 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Sep 2005 09:42:54 -0400 Subject: [openib-general] Re: [PATCH] Allow setting of NodeDescription In-Reply-To: <20050915105830.GY28025@mellanox.co.il> References: <1126780676.5425.1696.camel@hal.voltaire.com> <20050915105830.GY28025@mellanox.co.il> Message-ID: <1126791600.5425.2462.camel@hal.voltaire.com> On Thu, 2005-09-15 at 06:58, Michael S. Tsirkin wrote: > Quoting Hal Rosenstock : > > Subject: Re: [openib-general] Re: [PATCH] Allow setting of NodeDescription > > > > On Thu, 2005-09-15 at 03:36, Michael S. 
Tsirkin wrote: > > > Would make more sense to initialize node_desc to an empty string > > > instead of whatever is programmed in the device flash? > > > This creates a way for the remote user to find out that the "echo" > > > script above did not run yet, and try again later. > > > > I think this is the wrong direction to go as the SM may now see many > > nodes without any NodeDescription. I don't think the tradeoff to know > > that it wasn't set this way is worth it on what it potentially causes to > > SM and any management/diagnostic tools. > > Sorry, I don't really understand. Does the spec imply NodeDescription > must have non-zero bytes? If not, maybe the management/diagnostic tools > should be fixed. NodeDescription itself is always 64 bytes. Whether it is null terminated at 0 bytes and the implications of this are another story. This approach could result in nodes not having a 0 byte null terminated NodeDescription. Since this is being used as a convenient administrator (human) node identification string to say it is that node over there, it loses that meaning when it is not filled in. That was the problem I was referring to. It's the human side of using the tools. -- Hal From yipeeyipeeyipeeyipee at yahoo.com Thu Sep 15 06:21:59 2005 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Thu, 15 Sep 2005 13:21:59 +0000 (UTC) Subject: [openib-general] rdma_lat vs. perf_main Message-ID: Hi, I'm trying to measure rdma write latencies. The platforms I'm using are a pair of 3GHz 64bit Xeons, 2GB ram, 2.6.13 vanilla kernel machines. The two machines are connected back-to-back through PCIe memfree DDR Mellanox HCA's (OpenSM handles the initialization of the "fabric"). When using the rdma_lat utility (gen2/trunk/src/userspace/perftest/rdma_lat/) to measure 4K rdma write latencies I get 22.8 usec (one way). Notice that I removed the IBV_SEND_INLINE flag from rdma_lat.c:376 in order to be able to send large non-inlined data.
Also ".max_inline_data=size" was changed to ".max_inline_data=128" (line 342) otherwise the qp creation fails. When trying the perf_main utility from Mellanox 4.1 stack (vapi-linux-4_1_0.tgz) I get latencies of 12.8 usec. Is there a reason for the major latency differences? Do the two programs do rdma write latency tests in a fundamentally different way? Am I doing something wrong? The commands I used for rdma_lat are 10.100.1.130] ./rdma_lat 10.100.1.129] ./rdma_lat -s 4096 10.100.1.130 The commands for perf_main are: 10.100.1.129] perf_main --send --test=lat --rdma=write --size=4096 --iter=1000 -a10.100.1.130 10.100.1.130] perf_main --server Thanks, y From dotanb at mellanox.co.il Thu Sep 15 07:14:46 2005 From: dotanb at mellanox.co.il (Dotan Barak) Date: Thu, 15 Sep 2005 17:14:46 +0300 Subject: [openib-general] question about register MR from user level Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30FECA6@mtlexch01.mtl.com> Hi. i try to register 2 types of memory buffers with local write permission enabled: if the memory was declared as: const char ptr{5}; , then the registration can be done if the memory was declared as: const char *ptr = "somthing "; then the registration fails. what is the difference between the 2 memory buffers? should the MR registration verbs handle both of the buffers as read only memory buffers? thanks Dotan Barak Software Verification Engineer Mellanox Technologies LTD Tel: +972-4-9097200 Ext: 231 Fax: +972-4-9593245 P.O. Box 86 Yokneam 20692 ISRAEL. Home: +972-77-8841095 Cell: 052-4222383 [ May the fork be with you ] -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Thu Sep 15 07:21:41 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 15 Sep 2005 17:21:41 +0300 Subject: [openib-general] Re: [PATCH] Allow setting of NodeDescription In-Reply-To: <1126791600.5425.2462.camel@hal.voltaire.com> References: <1126791600.5425.2462.camel@hal.voltaire.com> Message-ID: <20050915142141.GG28025@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: [openib-general] Re: [PATCH] Allow setting of NodeDescription > > On Thu, 2005-09-15 at 06:58, Michael S. Tsirkin wrote: > > Quoting Hal Rosenstock : > > > Subject: Re: [openib-general] Re: [PATCH] Allow setting of > NodeDescription > > > > > > On Thu, 2005-09-15 at 03:36, Michael S. Tsirkin wrote: > > > > Would make more sense to initialize node_desc to an empty string > > > > instead of whatever is programmed in the device flash? > > > > This creates a way for the remote user to find out that the "echo" > > > > script above did not run yet, and try again later. > > [ skip .... ] > > This approach could result in nodes not having a 0 byte null terminated > NodeDescription. I am proposing filling NodeDescription with 0's (that's "", i.e. an empty string). Clearly this produces a 0 byte terminated NodeDescription. What am I missing? BTW, it seems obvious that tools should use something like printf("%.64s", NodeDescription) to print it, allowing nodes to use all 64 bytes of the description, rather than assume that the name is at most 63 bytes long. -- MST From mst at mellanox.co.il Thu Sep 15 07:33:15 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 15 Sep 2005 17:33:15 +0300 Subject: [openib-general] Re: rdma_lat vs. perf_main In-Reply-To: References: Message-ID: <20050915143315.GH28025@mellanox.co.il> Quoting yipee : > Subject: rdma_lat vs. perf_main > > Hi, > > I'm trying to measure rdma write latencies. > The platforms I'm using are a pair of 3GHz 64bit Xeons, 2GB ram, 2.6.13 > vanilla > kernel machines.
> The two machines are connected back-to-back through PCIe memfree DDR > Mellanox > HCA's (OpenSM handles the initialization of the "fabric"). > > When using the rdma_lat utility > (gen2/trunk/src/userspace/perftest/rdma_lat/) > to measure 4K rdma write latencies I get 22.8 usec (one way). > Notice that I removed the IBV_SEND_INLINE flag from rdma_lat.c:376 in > order to > be able to send large non-inlined data. > Also ".max_inline_data=size" was changed to ".max_inline_data=128" (line > 342) > otherwise the qp creation fails. Hi! 1. Try setting max_inline_data to 0. You really don't need it since you've removed IBV_SEND_INLINE. 2. Try changing the MTU: rdma_lat sets it to IBV_MTU_256. Try IBV_MTU_2048 or IBV_MTU_4096. Let me know how it goes. As a side note, rdma_lat measures memory-to-memory latency. Some people mean other things by latency, e.g. TCP guys often measure the time until an ack is sent, ignoring the time it takes to pass the data from TCP stack to the application buffer. Ack latency would be a lower number than what rdma_lat reports. -- MST From dotanb at mellanox.co.il Thu Sep 15 07:40:52 2005 From: dotanb at mellanox.co.il (Dotan Barak) Date: Thu, 15 Sep 2005 17:40:52 +0300 Subject: [openib-general] RE: when executing sminfo with a port in down state, there is a return value 0 Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30FECBD@mtlexch01.mtl.com> > > I just tried this and got 255. Can you try this again ? I tried again and got the same result (return value 0). > > It looks like 0 is set because there is some SMInfo response > but I don't > understand how that would be the case. sminfo can use either DR or LR. > This form (without -D) uses LR so that shouldn't work if the port is > down. In either case, the other end wouldn't respond if there is no SM > there. Also, the GUID looks suspicious. There isn't any OpenSM in the subnet. > > If you can reproduce this, not sure what is different about > your setup.
> Is port 2 on host 2 cabled to anything ? > > -- Hal > I have 2 machines (with 23108 HCAs), port 1 is connected to port 1, port 2 on both of the machines doesn't have any cable at all. Thanx Dotan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlentini at netapp.com Thu Sep 15 07:46:00 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 15 Sep 2005 10:46:00 -0400 (EDT) Subject: [openib-general] question about register MR from user level In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30FECA6@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30FECA6@mtlexch01.mtl.com> Message-ID: On Thu, 15 Sep 2005, Dotan Barak wrote: > Hi. > > i try to register 2 types of memory buffers with local write permission > enabled: > if the memory was declared as: const char ptr{5}; , > then the registration can be done > > if the memory was declared as: const char *ptr = "somthing "; > then the registration fails. > > what is the difference between the 2 memory buffers? I'm not sure what your problem is, but I'll make this observation: Since buffer #1 is not initialized and buffer #2 is initialized, they will be located in different parts of the process address space. Buffer #1 will go into the bss and #2 will go into the data area. It may be that buffer #2 is placed on a read only page. What is the error you receive? Why are you requesting local write permission on buffers you have declared const, ie. read only? > should the MR registration verbs handle both of the buffers as read only > memory buffers? If you request local write permission, I believe the MR registration will attempt to set that up. If it finds that the memory is not writable, I would expect an error. 
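James's point that the string literal may sit on a read-only page can be checked from user space. The sketch below is a minimal, Linux-only illustration and is not part of libibverbs: the helper name is_writable() is hypothetical, and it simply looks up an address in /proc/self/maps and reports whether the containing mapping has write permission. It assumes that a registration asking for local write access needs writable pages, so a buffer for which it returns 0 is a plausible cause of the failed registration. Note that with const char *ptr = "...", the pointer variable itself is writable data, but on typical ELF toolchains the characters it points to are placed in the read-only .rodata section.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (Linux-only): report whether `addr` falls inside a
 * writable mapping of the current process, by scanning /proc/self/maps.
 * Returns 1 if writable, 0 if mapped without write permission, and -1 if
 * the address was not found or /proc/self/maps could not be read.
 */
static int is_writable(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    char line[4096];
    int result = -1;
    FILE *maps = fopen("/proc/self/maps", "r");

    if (!maps)
        return -1;

    while (fgets(line, sizeof line, maps)) {
        unsigned long start, end;
        char perms[5];

        /* Each line begins "start-end perms ...", e.g. "00400000-00452000 r-xp ..." */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        if (a >= start && a < end) {
            /* perms is e.g. "rw-p"; the second character is the write bit */
            result = (perms[1] == 'w');
            break;
        }
    }

    fclose(maps);
    return result;
}
```

On a typical Linux/gcc build, is_writable("something ") is expected to return 0, while a plain char buf[5] (static or on the stack) returns 1. If the buffer is really meant to be read-only, registering it without local write permission, as James's question implies, or copying the data into a writable buffer avoids the mismatch.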
From iod00d at hp.com Thu Sep 15 08:50:35 2005 From: iod00d at hp.com (Grant Grundler) Date: Thu, 15 Sep 2005 08:50:35 -0700 Subject: [openib-general] could not add HCA InfiniHost0 In-Reply-To: <1126779086.22691.7.camel@QiWang> References: <1126779086.22691.7.camel@QiWang> Message-ID: <20050915155035.GA3013@esmail.cup.hp.com> On Thu, Sep 15, 2005 at 06:11:26PM +0800, QiWang, Chen wrote: > there are some diff: > 02:00.0 --> work > 02:01.0 --> failed > > and first time I install the ib-verbs on node1, It also failed, because > lspci= 02:01.0, an I don not know how i change 02:01.0 to 02:00.0, and > it works fine for me. You can only change it by removing the Mellanox card and re-installing in the other slot. Can you post "lspci -vvs 02:01.0" output from the machine that failed? Can you post "lspci -vvs 02:00.0" output from the machine that worked? grant From rolandd at cisco.com Thu Sep 15 08:59:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 15 Sep 2005 08:59:11 -0700 Subject: [openib-general] Re: [PATCH] Allow setting of NodeDescription In-Reply-To: <20050915073629.GQ28025@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 15 Sep 2005 10:36:29 +0300") References: <52ek7rxljj.fsf@cisco.com> <20050915073629.GQ28025@mellanox.co.il> Message-ID: <527jdiwcds.fsf@cisco.com> Michael> I think echo -n `hostname` mthca0 >> Michael> /sys/class/infiniband/mthca0/node_desc is even more Michael> useful, but thats now up to the user, isnt it? Yes, the whole idea is to leave naming policy in userspace. - R. 
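Roland's point is that naming policy stays in userspace: once the host name is known, a boot script runs something like echo -n `hostname` mthca0 >> /sys/class/infiniband/mthca0/node_desc. The C sketch below performs the same step, under these assumptions: format_node_desc(), write_node_desc() and print_node_desc() are hypothetical helpers, the sysfs path is the one quoted in this thread, and the string is truncated to 64 characters because NodeDescription is a fixed 64-byte field. print_node_desc() follows Michael's suggestion of printing the raw field with "%.64s", so that all 64 bytes can be used without a terminating NUL.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NODE_DESC_LEN 64 /* NodeDescription is a fixed 64-byte field */

/*
 * Hypothetical helper: build "<host> <ibdev>" into out, which must hold
 * NODE_DESC_LEN + 1 bytes. Anything past 64 characters is truncated.
 */
static void format_node_desc(char out[NODE_DESC_LEN + 1],
                             const char *host, const char *ibdev)
{
    snprintf(out, NODE_DESC_LEN + 1, "%s %s", host, ibdev);
}

/*
 * Hypothetical helper: write desc to path (e.g. a node_desc sysfs file)
 * with a single write() and no trailing newline, mirroring "echo -n".
 * Returns 0 on success, -1 on error.
 */
static int write_node_desc(const char *path, const char *desc)
{
    size_t len = strlen(desc);
    ssize_t n;
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;
    n = write(fd, desc, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

/* Print a raw 64-byte NodeDescription that need not be NUL-terminated. */
void print_node_desc(const char raw[NODE_DESC_LEN])
{
    printf("%.64s\n", raw);
}
```

A boot-time caller would fill a host buffer with gethostname(), pass it with "mthca0" to format_node_desc(), and hand the result to write_node_desc("/sys/class/infiniband/mthca0/node_desc", desc). Keeping this in a userspace tool rather than in the driver matches Roland's design choice: the kernel cannot know the host name when mthca loads during a boot over IB.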
From rolandd at cisco.com Thu Sep 15 09:00:35 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 15 Sep 2005 09:00:35 -0700 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEBF6@mtlexch01.mtl.com> (Jack Morgenstein's message of "Thu, 15 Sep 2005 12:00:18 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEBF6@mtlexch01.mtl.com> Message-ID: <523bo6wcbg.fsf@cisco.com> Jack> The resulting set of NodeDescription strings present in the Jack> SM and SA could then be a race-dependent salad (depending on Jack> the timing of QP0 entering RTS state, SM subnet sweep, and Jack> resetting of the local NodeDescription string). Yes, it's unfortunate. But I don't see any way to handle the situation arising when booting over IB, where a system needs the SM to bring its port to active before it can boot, but where the system doesn't know its host name until after it boots. - R. From rolandd at cisco.com Thu Sep 15 09:02:06 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 15 Sep 2005 09:02:06 -0700 Subject: [openib-general] Re: RFC: struct netdevice changes for IPoIB UC support In-Reply-To: <20050915074407.GS28025@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 15 Sep 2005 10:44:07 +0300") References: <20050915074407.GS28025@mellanox.co.il> Message-ID: <52y85yuxoh.fsf@cisco.com> Michael> Please comment on this approach: if it makes sense to you Michael> guys, I'll try forwarding this to netdev and lkml lists. Seems reasonable to me. It doesn't hurt to see what netdev has to say. From rolandd at cisco.com Thu Sep 15 09:54:15 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 15 Sep 2005 09:54:15 -0700 Subject: [openib-general] Re: [PATCH] set eq->nent earlier in mthca_create_eq In-Reply-To: <20050915123907.GE28025@mellanox.co.il> (Michael S. 
Tsirkin's message of "Thu, 15 Sep 2005 15:39:07 +0300") References: <20050915123907.GE28025@mellanox.co.il> Message-ID: <52slw6uv9k.fsf@cisco.com> Thanks, good catch. How about if we get rid of any confusion by just using eq->nent all the time? And since we're touching the code anyway, we might as well use roundup_pow_of_two()... How does this look to you? --- infiniband/hw/mthca/mthca_eq.c (revision 3432) +++ infiniband/hw/mthca/mthca_eq.c (working copy) @@ -476,12 +476,8 @@ static int __devinit mthca_create_eq(str int i; u8 status; - /* Make sure EQ size is aligned to a power of 2 size. */ - for (i = 1; i < nent; i <<= 1) - ; /* nothing */ - nent = i; - - eq->dev = dev; + eq->dev = dev; + eq->nent = roundup_pow_of_two(nent); eq->page_list = kmalloc(npages * sizeof *eq->page_list, GFP_KERNEL); @@ -512,7 +508,7 @@ static int __devinit mthca_create_eq(str memset(eq->page_list[i].buf, 0, PAGE_SIZE); } - for (i = 0; i < nent; ++i) + for (i = 0; i < eq->nent; ++i) set_eqe_hw(get_eqe(eq, i)); eq->eqn = mthca_alloc(&dev->eq_table.alloc); @@ -528,8 +524,6 @@ static int __devinit mthca_create_eq(str if (err) goto err_out_free_eq; - eq->nent = nent; - memset(eq_context, 0, sizeof *eq_context); eq_context->flags = cpu_to_be32(MTHCA_EQ_STATUS_OK | MTHCA_EQ_OWNER_HW | @@ -538,7 +532,7 @@ static int __devinit mthca_create_eq(str if (mthca_is_memfree(dev)) eq_context->flags |= cpu_to_be32(MTHCA_EQ_STATE_ARBEL); - eq_context->logsize_usrpage = cpu_to_be32((ffs(nent) - 1) << 24); + eq_context->logsize_usrpage = cpu_to_be32((ffs(eq->nent) - 1) << 24); if (mthca_is_memfree(dev)) { eq_context->arbel_pd = cpu_to_be32(dev->driver_pd.pd_num); } else { @@ -569,7 +563,7 @@ static int __devinit mthca_create_eq(str dev->eq_table.arm_mask |= eq->eqn_mask; mthca_dbg(dev, "Allocated EQ %d with %d entries\n", - eq->eqn, nent); + eq->eqn, eq->nent); return err; From ftillier at silverstorm.com Thu Sep 15 10:10:42 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Thu, 15 Sep 2005 10:10:42
-0700 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <523bo6wcbg.fsf@cisco.com> Message-ID: <006001c5ba18$684dfc90$9e5aa8c0@infiniconsys.com> > From: Roland Dreier [mailto:rolandd at cisco.com] > Sent: Thursday, September 15, 2005 9:01 AM > > Jack> The resulting set of NodeDescription strings present in the > Jack> SM and SA could then be a race-dependent salad (depending on > Jack> the timing of QP0 entering RTS state, SM subnet sweep, and > Jack> resetting of the local NodeDescription string). > > Yes, it's unfortunate. > > But I don't see any way to handle the situation arising when booting > over IB, where a system needs the SM to bring its port to active > before it can boot, but where the system doesn't know its host name > until after it boots. What happens during the handoff from the boot environment to the OS? Does the HCA get disabled and then the mthca driver starts fresh? Or does the mthca driver inherit a device that is already fully initialized. If it gets re-initialized, don't the ports go down when the boot agent shuts down (and the SM should get a GID out of service trap), followed by the ports going up when mthca starts? Or is the problem that the boot driver doesn't know when the handoff is, and thus can't disable the device? Thanks, - Fab From rolandd at cisco.com Thu Sep 15 10:17:08 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 15 Sep 2005 10:17:08 -0700 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <006001c5ba18$684dfc90$9e5aa8c0@infiniconsys.com> (Fab Tillier's message of "Thu, 15 Sep 2005 10:10:42 -0700") References: <006001c5ba18$684dfc90$9e5aa8c0@infiniconsys.com> Message-ID: <52oe6uuu7f.fsf@cisco.com> Fab> What happens during the handoff from the boot environment to Fab> the OS? Does the HCA get disabled and then the mthca driver Fab> starts fresh? Or does the mthca driver inherit a device that Fab> is already fully initialized. 
If it gets re-initialized, Fab> don't the ports go down when the boot agent shuts down (and Fab> the SM should get a GID out of service trap), followed by the Fab> ports going up when mthca starts? Or is the problem that the Fab> boot driver doesn't know when the handoff is, and thus can't Fab> disable the device? After the kernel takes over, mthca will reset the HCA and of course the SM will have to bring the port back up. But at the point that mthca is loaded, the system typically won't have a hostname set. The kernel will need to have the HCA port active with the mthca driver running before it can mount root and get to /etc/sysconfig/network or wherever the hostname is set. - R. From sean.hefty at intel.com Thu Sep 15 10:35:48 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 15 Sep 2005 10:35:48 -0700 Subject: [openib-general] [PATCH] [CM] 1/5 per device communication identifiers Message-ID: The following patch binds communication identifiers to a device. It exports per HCA devices to userspace. 
Signed-off-by: Sean Hefty Index: userspace/libibcm/include/infiniband/cm_abi.h =================================================================== --- userspace/libibcm/include/infiniband/cm_abi.h (revision 3433) +++ userspace/libibcm/include/infiniband/cm_abi.h (working copy) @@ -42,7 +42,8 @@ * drivers/infiniband/include/ib_user_cm.h */ -#define IB_USER_CM_ABI_VERSION 2 +#define IB_USER_CM_MIN_ABI_VERSION 3 +#define IB_USER_CM_MAX_ABI_VERSION 3 enum { IB_USER_CM_CMD_CREATE_ID, @@ -303,8 +304,6 @@ struct cm_abi_event_get { }; struct cm_abi_req_event_resp { - /* device */ - /* port */ struct cm_abi_path_rec primary_path; struct cm_abi_path_rec alternate_path; __u64 remote_ca_guid; @@ -320,6 +319,7 @@ struct cm_abi_req_event_resp { __u8 retry_count; __u8 rnr_retry_count; __u8 srq; + __u8 port; }; struct cm_abi_rep_event_resp { @@ -357,10 +357,9 @@ struct cm_abi_apr_event_resp { }; struct cm_abi_sidr_req_event_resp { - /* device */ - /* port */ __u16 pkey; - __u8 reserved[2]; + __u8 port; + __u8 reserved; }; struct cm_abi_sidr_rep_event_resp { Index: userspace/libibcm/include/infiniband/cm.h =================================================================== --- userspace/libibcm/include/infiniband/cm.h (revision 3433) +++ userspace/libibcm/include/infiniband/cm.h (working copy) @@ -77,13 +77,21 @@ enum ib_cm_data_size { IB_CM_SIDR_REP_INFO_LENGTH = 72 }; +struct ib_cm_device { + uint64_t guid; + int fd; +}; + struct ib_cm_id { void *context; + struct ibv_context *device_context; + struct ib_cm_device *device; uint32_t handle; }; struct ib_cm_req_event_param { struct ib_cm_id *listen_id; + uint8_t port; struct ib_sa_path_rec *primary_path; struct ib_sa_path_rec *alternate_path; @@ -193,7 +201,6 @@ struct ib_cm_apr_event_param { struct ib_cm_sidr_req_event_param { struct ib_cm_id *listen_id; - struct ib_device *device; uint8_t port; uint16_t pkey; }; @@ -239,6 +246,7 @@ struct ib_cm_event { /** * ib_cm_get_event - Retrieves the next pending communications event, * if 
no event is pending waits for an event. + * @device: CM device to retrieve the event. * @event: Allocated information about the next communication event. * Event should be freed using ib_cm_ack_event() * @@ -249,19 +257,7 @@ struct ib_cm_event { * IB_CM_REQ_RECEIVED and all other events, the returned @cm_id corresponds * to a user's existing communication identifier. */ -int ib_cm_get_event(struct ib_cm_event **event); - -/** - * ib_cm_get_event_timed - Retrieves the next pending communications event, - * if no event is pending wait up to a certain timeout for an event. - * @timeout_ms: Maximum time in milliseconds to wait for an event. - * @event: Allocated information about the next communication event. - * Event should be freed using ib_cm_ack_event() - * - * If timeout expires without an event, the error -ETIMEDOUT will be - * returned - */ -int ib_cm_get_event_timed(int timeout_ms, struct ib_cm_event **event); +int ib_cm_get_event(struct ib_cm_device *device, struct ib_cm_event **event); /** * ib_cm_ack_event - Free a communications event. @@ -272,19 +268,21 @@ int ib_cm_get_event_timed(int timeout_ms * and puts. */ int ib_cm_ack_event(struct ib_cm_event *event); - + /** - * ib_cm_get_fd - Returns the file descriptor which the CM uses to - * submit requests and retrieve events. + * ib_cm_get_device - Returns the device the CM uses to submit requests + * and retrieve events that corresponds to the specified verbs device. * - * The primary use of the file descriptor is to test for CM readiness - * events. When the CM becomes ready to READ there is a pending event - * ready, and a subsequent call to ib_cm_get_event will not block. + * The CM device contains the file descriptor that the CM uses to + * communicate with the kernel CM component. The primary use of the + * file descriptor is to test for CM readiness events. When the CM + * becomes ready to READ there is a pending event ready, and a subsequent + * call to ib_cm_get_event will not block. 
* Note: The user should not read or write directly to the CM file * descriptor, it will likely result in an error or unexpected * results. */ -int ib_cm_get_fd(void); +struct ib_cm_device* ib_cm_get_device(struct ibv_context *device_context); /** * ib_cm_create_id - Allocate a communication identifier. @@ -292,7 +290,8 @@ int ib_cm_get_fd(void); * Communication identifiers are used to track connection states, service * ID resolution requests, and listen requests. */ -int ib_cm_create_id(struct ib_cm_id **cm_id, void *context); +int ib_cm_create_id(struct ibv_context *device_context, + struct ib_cm_id **cm_id, void *context); /** * ib_cm_destroy_id - Destroy a connection identifier. Index: userspace/libibcm/src/cm.c =================================================================== --- userspace/libibcm/src/cm.c (revision 3433) +++ userspace/libibcm/src/cm.c (working copy) @@ -47,12 +47,21 @@ #include #include #include +#include +#include #include #include -#define IB_UCM_DEV_PATH "/dev/infiniband/ucm" -#define PFX "libucm: " +#define PFX "libibcm: " + +#if __BYTE_ORDER == __LITTLE_ENDIAN +static inline uint64_t htonll(uint64_t x) { return bswap_64(x); } +static inline uint64_t ntohll(uint64_t x) { return bswap_64(x); } +#else +static inline uint64_t htonll(uint64_t x) { return x; } +static inline uint64_t ntohll(uint64_t x) { return x; } +#endif #define CM_CREATE_MSG_CMD_RESP(msg, cmd, resp, type, size) \ do { \ @@ -97,19 +106,164 @@ struct cm_id_private { pthread_mutex_t mut; }; -static int fd; +static struct dlist *device_list; #define container_of(ptr, type, field) \ ((type *) ((void *)ptr - offsetof(type, field))) +static int check_abi_version(void) +{ + char path[256]; + char val[16]; + int abi_ver; + + if (sysfs_get_mnt_path(path, sizeof path)) { + fprintf(stderr, PFX "couldn't find sysfs mount.\n"); + return -1; + } + + strncat(path, "/class/infiniband_cm/abi_version", sizeof path); + if (sysfs_read_attribute_value(path, val, sizeof val)) { + 
fprintf(stderr, PFX "couldn't read ucm ABI version.\n"); + return -1; + } + + abi_ver = strtol(val, NULL, 10); + if (abi_ver < IB_USER_CM_MIN_ABI_VERSION || + abi_ver > IB_USER_CM_MAX_ABI_VERSION) { + fprintf(stderr, PFX "kernel ABI version %d " + "doesn't match library version %d.\n", + abi_ver, IB_USER_CM_MAX_ABI_VERSION); + return -1; + } + return 0; +} + +static uint64_t get_device_guid(struct sysfs_class_device *ibdev) +{ + struct sysfs_attribute *attr; + uint64_t guid = 0; + uint16_t parts[4]; + int i; + + attr = sysfs_get_classdev_attr(ibdev, "node_guid"); + if (!attr) + return 0; + + if (sscanf(attr->value, "%hx:%hx:%hx:%hx", + parts, parts + 1, parts + 2, parts + 3) != 4) + return 0; + + for (i = 0; i < 4; ++i) + guid = (guid << 16) | parts[i]; + + return htonll(guid); +} + +static struct ib_cm_device* open_device(struct sysfs_class_device *cm_dev) +{ + struct sysfs_class_device *ib_dev; + struct sysfs_attribute *attr; + struct ib_cm_device *dev; + char ibdev_name[64]; + char *devpath; + + dev = malloc(sizeof *dev); + if (!dev) + return NULL; + + attr = sysfs_get_classdev_attr(cm_dev, "ibdev"); + if (!attr) { + fprintf(stderr, PFX "no ibdev class attr for %s\n", + cm_dev->name); + goto err; + } + + sscanf(attr->value, "%63s", ibdev_name); + ib_dev = sysfs_open_class_device("infiniband", ibdev_name); + if (!ib_dev) + goto err; + + dev->guid = get_device_guid(ib_dev); + sysfs_close_class_device(ib_dev); + if (!dev->guid) + goto err; + + asprintf(&devpath, "/dev/infiniband/%s", cm_dev->name); + dev->fd = open(devpath, O_RDWR); + if (dev->fd < 0) { + fprintf(stderr, PFX "error <%d:%d> opening device <%s>\n", + dev->fd, errno, devpath); + goto err; + } + return dev; +err: + free(dev); + return NULL; +} + static void __attribute__((constructor)) ib_cm_init(void) { - fd = open(IB_UCM_DEV_PATH, O_RDWR); - if (fd < 0) - fprintf(stderr, PFX - "Error <%d:%d> couldn't open IB cm device <%s>\n", - fd, errno, IB_UCM_DEV_PATH); + struct sysfs_class *cls; + struct dlist 
*cm_dev_list; + struct sysfs_class_device *cm_dev; + struct ib_cm_device *dev; + + device_list = dlist_new(sizeof(struct ib_cm_device)); + if (!device_list) { + fprintf(stderr, PFX "couldn't allocate device list.\n"); + abort(); + } + + cls = sysfs_open_class("infiniband_cm"); + if (!cls) { + fprintf(stderr, PFX "couldn't open 'infiniband_cm'.\n"); + goto err; + } + + if (check_abi_version()) + goto err; + cm_dev_list = sysfs_get_class_devices(cls); + if (!cm_dev_list) { + fprintf(stderr, PFX "no class devices found.\n"); + goto err; + } + + dlist_for_each_data(cm_dev_list, cm_dev, struct sysfs_class_device) { + dev = open_device(cm_dev); + if (dev) + dlist_push(device_list, dev); + } + return; +err: + sysfs_close_class(cls); +} + +static void __attribute__((destructor)) ib_cm_fini(void) +{ + struct ib_cm_device *dev; + + if (!device_list) + return; + + dlist_for_each_data(device_list, dev, struct ib_cm_device) + close(dev->fd); + + dlist_destroy(device_list); +} + +struct ib_cm_device* ib_cm_get_device(struct ibv_context *device_context) +{ + struct ib_cm_device *dev; + uint64_t guid; + + guid = ibv_get_device_guid(device_context->device); + dlist_for_each_data(device_list, dev, struct ib_cm_device) + if (dev->guid == guid) + return dev; + + return NULL; } static void cm_param_path_get(struct cm_abi_path_rec *abi, @@ -146,7 +300,8 @@ static void ib_cm_free_id(struct cm_id_p free(cm_id_priv); } -static struct cm_id_private *ib_cm_alloc_id(void *context) +static struct cm_id_private *ib_cm_alloc_id(struct ibv_context *device_context, + void *context) { struct cm_id_private *cm_id_priv; @@ -155,18 +310,24 @@ static struct cm_id_private *ib_cm_alloc return NULL; memset(cm_id_priv, 0, sizeof *cm_id_priv); + cm_id_priv->id.device_context = device_context; cm_id_priv->id.context = context; pthread_mutex_init(&cm_id_priv->mut, NULL); if (pthread_cond_init(&cm_id_priv->cond, NULL)) goto err; + cm_id_priv->id.device = ib_cm_get_device(device_context); + if 
(!cm_id_priv->id.device) + goto err; + return cm_id_priv; err: ib_cm_free_id(cm_id_priv); return NULL; } -int ib_cm_create_id(struct ib_cm_id **cm_id, void *context) +int ib_cm_create_id(struct ibv_context *device_context, + struct ib_cm_id **cm_id, void *context) { struct cm_abi_create_id_resp *resp; struct cm_abi_create_id *cmd; @@ -175,14 +336,14 @@ int ib_cm_create_id(struct ib_cm_id **cm int result; int size; - cm_id_priv = ib_cm_alloc_id(context); + cm_id_priv = ib_cm_alloc_id(device_context, context); if (!cm_id_priv) return -ENOMEM; CM_CREATE_MSG_CMD_RESP(msg, cmd, resp, IB_USER_CM_CMD_CREATE_ID, size); cmd->uid = (uintptr_t) cm_id_priv; - result = write(fd, msg, size); + result = write(cm_id_priv->id.device->fd, msg, size); if (result != size) goto err; @@ -206,7 +367,7 @@ int ib_cm_destroy_id(struct ib_cm_id *cm CM_CREATE_MSG_CMD_RESP(msg, cmd, resp, IB_USER_CM_CMD_DESTROY_ID, size); cmd->id = cm_id->handle; - result = write(fd, msg, size); + result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ? -ENODATA : result; @@ -235,7 +396,7 @@ int ib_cm_attr_id(struct ib_cm_id *cm_id CM_CREATE_MSG_CMD_RESP(msg, cmd, resp, IB_USER_CM_CMD_ATTR_ID, size); cmd->id = cm_id->handle; - result = write(fd, msg, size); + result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ? -ENODATA : result; @@ -317,7 +478,7 @@ int ib_cm_init_qp_attr(struct ib_cm_id * cmd->id = cm_id->handle; cmd->qp_state = qp_attr->qp_state; - result = write(fd, msg, size); + result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ? -ENODATA : result; @@ -341,7 +502,7 @@ int ib_cm_listen(struct ib_cm_id *cm_id, cmd->service_id = service_id; cmd->service_mask = service_mask; - result = write(fd, msg, size); + result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ? 
-ENODATA : result; @@ -400,7 +561,7 @@ int ib_cm_send_req(struct ib_cm_id *cm_i cmd->len = param->private_data_len; } - result = write(fd, msg, size); + result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ? -ENODATA : result; @@ -435,7 +596,7 @@ int ib_cm_send_rep(struct ib_cm_id *cm_i cmd->len = param->private_data_len; } - result = write(fd, msg, size); + result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ? -ENODATA : result; @@ -460,7 +621,7 @@ static inline int cm_send_private_data(s cmd->len = private_data_len; } - result = write(fd, msg, size); + result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ? -ENODATA : result; @@ -501,7 +662,7 @@ int ib_cm_establish(struct ib_cm_id *cm_ CM_CREATE_MSG_CMD(msg, cmd, IB_USER_CM_CMD_ESTABLISH, size); cmd->id = cm_id->handle; - result = write(fd, msg, size); + result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ? -ENODATA : result; @@ -535,7 +696,7 @@ static inline int cm_send_status(struct cmd->info_len = info_length; } - result = write(fd, msg, size); + result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ? -ENODATA : result; @@ -585,7 +746,7 @@ int ib_cm_send_mra(struct ib_cm_id *cm_i cmd->len = private_data_len; } - result = write(fd, msg, size); + result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ? -ENODATA : result; @@ -620,7 +781,7 @@ int ib_cm_send_lap(struct ib_cm_id *cm_i cmd->len = private_data_len; } - result = write(fd, msg, size); + result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ? -ENODATA : result; @@ -660,7 +821,7 @@ int ib_cm_send_sidr_req(struct ib_cm_id cmd->len = param->private_data_len; } - result = write(fd, msg, size); + result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ? 
-ENODATA : result; @@ -694,7 +855,7 @@ int ib_cm_send_sidr_rep(struct ib_cm_id cmd->info_len = param->info_length; } - result = write(fd, msg, size); + result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ? -ENODATA : result; @@ -750,6 +911,7 @@ static void cm_event_req_get(struct ib_c ureq->retry_count = kreq->retry_count; ureq->rnr_retry_count = kreq->rnr_retry_count; ureq->srq = kreq->srq; + ureq->port = kreq->port; cm_event_path_get(ureq->primary_path, &kreq->primary_path); cm_event_path_get(ureq->alternate_path, &kreq->alternate_path); @@ -779,7 +941,7 @@ static void cm_event_sidr_rep_get(struct urep->qpn = krep->qpn; }; -int ib_cm_get_event(struct ib_cm_event **event) +int ib_cm_get_event(struct ib_cm_device *device, struct ib_cm_event **event) { struct cm_id_private *cm_id_priv; struct cm_abi_cmd_hdr *hdr; @@ -832,7 +994,7 @@ int ib_cm_get_event(struct ib_cm_event * cmd->data = (uintptr_t) data; cmd->info = (uintptr_t) info; - result = write(fd, msg, size); + result = write(device->fd, msg, size); if (result != size) { result = (result > 0) ? 
-ENODATA : result; goto done; @@ -868,7 +1030,8 @@ int ib_cm_get_event(struct ib_cm_event * switch (evt->event) { case IB_CM_REQ_RECEIVED: evt->param.req_rcvd.listen_id = evt->cm_id; - cm_id_priv = ib_cm_alloc_id(evt->cm_id->context); + cm_id_priv = ib_cm_alloc_id(evt->cm_id->device_context, + evt->cm_id->context); if (!cm_id_priv) { result = -ENOMEM; goto done; @@ -905,7 +1068,8 @@ int ib_cm_get_event(struct ib_cm_event * break; case IB_CM_SIDR_REQ_RECEIVED: evt->param.sidr_req_rcvd.listen_id = evt->cm_id; - cm_id_priv = ib_cm_alloc_id(evt->cm_id->context); + cm_id_priv = ib_cm_alloc_id(evt->cm_id->device_context, + evt->cm_id->context); if (!cm_id_priv) { result = -ENOMEM; goto done; @@ -913,6 +1077,7 @@ int ib_cm_get_event(struct ib_cm_event * cm_id_priv->id.handle = resp->id; evt->cm_id = &cm_id_priv->id; evt->param.sidr_req_rcvd.pkey = resp->u.sidr_req_resp.pkey; + evt->param.sidr_req_rcvd.port = resp->u.sidr_req_resp.port; break; case IB_CM_SIDR_REP_RECEIVED: cm_event_sidr_rep_get(&evt->param.sidr_rep_rcvd, @@ -998,32 +1163,3 @@ int ib_cm_ack_event(struct ib_cm_event * free(event); return 0; } - -int ib_cm_get_fd(void) -{ - return fd; -} - -int ib_cm_get_event_timed(int timeout_ms, struct ib_cm_event **event) -{ - struct pollfd ufds; - int result; - - if (!event) - return -EINVAL; - - ufds.fd = ib_cm_get_fd(); - ufds.events = POLLIN; - ufds.revents = 0; - - *event = NULL; - - result = poll(&ufds, 1, timeout_ms); - if (!result) - return -ETIMEDOUT; - - if (result < 0) - return result; - - return ib_cm_get_event(event); -} Index: userspace/libibcm/examples/cmpost.c =================================================================== --- userspace/libibcm/examples/cmpost.c (revision 3433) +++ userspace/libibcm/examples/cmpost.c (working copy) @@ -307,7 +307,7 @@ static int init_node(struct cmtest_node int cqe, ret; if (!is_server) { - ret = ib_cm_create_id(&node->cm_id, node); + ret = ib_cm_create_id(test.verbs, &node->cm_id, node); if (ret) { printf("failed to 
create cm_id: %d\n", ret); return ret; @@ -526,7 +526,7 @@ static void connect_events(void) int err = 0; while (test.connects_left && !err) { - err = ib_cm_get_event(&event); + err = ib_cm_get_event(ib_cm_get_device(test.verbs), &event); if (!err) { cm_handler(event->cm_id, event); ib_cm_ack_event(event); @@ -540,7 +540,7 @@ static void disconnect_events(void) int err = 0; while (test.disconnects_left && !err) { - err = ib_cm_get_event(&event); + err = ib_cm_get_event(ib_cm_get_device(test.verbs), &event); if (!err) { cm_handler(event->cm_id, event); ib_cm_ack_event(event); @@ -554,7 +554,7 @@ static void run_server(void) int i, ret; printf("starting server\n"); - if (ib_cm_create_id(&listen_id, &test)) { + if (ib_cm_create_id(test.verbs, &listen_id, &test)) { printf("listen request failed\n"); return; } Index: linux-kernel/infiniband/include/rdma/ib_cm.h =================================================================== --- linux-kernel/infiniband/include/rdma/ib_cm.h (revision 3433) +++ linux-kernel/infiniband/include/rdma/ib_cm.h (working copy) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2004 Intel Corporation. All rights reserved. + * Copyright (c) 2004, 2005 Intel Corporation. All rights reserved. * Copyright (c) 2004 Topspin Corporation. All rights reserved. * Copyright (c) 2004 Voltaire Corporation. All rights reserved. * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved. 
@@ -109,7 +109,6 @@ struct ib_cm_id; struct ib_cm_req_event_param { struct ib_cm_id *listen_id; - struct ib_device *device; u8 port; struct ib_sa_path_rec *primary_path; @@ -220,7 +219,6 @@ struct ib_cm_apr_event_param { struct ib_cm_sidr_req_event_param { struct ib_cm_id *listen_id; - struct ib_device *device; u8 port; u16 pkey; }; @@ -284,6 +282,7 @@ typedef int (*ib_cm_handler)(struct ib_c struct ib_cm_id { ib_cm_handler cm_handler; void *context; + struct ib_device *device; __be64 service_id; __be64 service_mask; enum ib_cm_state state; /* internal CM/debug use */ @@ -295,6 +294,8 @@ struct ib_cm_id { /** * ib_create_cm_id - Allocate a communication identifier. + * @device: Device associated with the cm_id. All related communication will + * be associated with the specified device. * @cm_handler: Callback invoked to notify the user of CM events. * @context: User specified context associated with the communication * identifier. @@ -302,7 +303,8 @@ struct ib_cm_id { * Communication identifiers are used to track connection states, service * ID resolution requests, and listen requests. 
*/ -struct ib_cm_id *ib_create_cm_id(ib_cm_handler cm_handler, +struct ib_cm_id *ib_create_cm_id(struct ib_device *device, + ib_cm_handler cm_handler, void *context); /** Index: linux-kernel/infiniband/include/rdma/ib_user_cm.h =================================================================== --- linux-kernel/infiniband/include/rdma/ib_user_cm.h (revision 3433) +++ linux-kernel/infiniband/include/rdma/ib_user_cm.h (working copy) @@ -38,7 +38,7 @@ #include -#define IB_USER_CM_ABI_VERSION 2 +#define IB_USER_CM_ABI_VERSION 3 enum { IB_USER_CM_CMD_CREATE_ID, @@ -299,8 +299,6 @@ struct ib_ucm_event_get { }; struct ib_ucm_req_event_resp { - /* device */ - /* port */ struct ib_ucm_path_rec primary_path; struct ib_ucm_path_rec alternate_path; __be64 remote_ca_guid; @@ -316,6 +314,7 @@ struct ib_ucm_req_event_resp { __u8 retry_count; __u8 rnr_retry_count; __u8 srq; + __u8 port; }; struct ib_ucm_rep_event_resp { @@ -353,10 +352,9 @@ struct ib_ucm_apr_event_resp { }; struct ib_ucm_sidr_req_event_resp { - /* device */ - /* port */ __u16 pkey; - __u8 reserved[2]; + __u8 port; + __u8 reserved; }; struct ib_ucm_sidr_rep_event_resp { Index: linux-kernel/infiniband/core/cm.c =================================================================== --- linux-kernel/infiniband/core/cm.c (revision 3433) +++ linux-kernel/infiniband/core/cm.c (working copy) @@ -366,9 +366,15 @@ static struct cm_id_private * cm_insert_ cur_cm_id_priv = rb_entry(parent, struct cm_id_private, service_node); if ((cur_cm_id_priv->id.service_mask & service_id) == - (service_mask & cur_cm_id_priv->id.service_id)) - return cm_id_priv; - if (service_id < cur_cm_id_priv->id.service_id) + (service_mask & cur_cm_id_priv->id.service_id) && + (cm_id_priv->id.device == cur_cm_id_priv->id.device)) + return cur_cm_id_priv; + + if (cm_id_priv->id.device < cur_cm_id_priv->id.device) + link = &(*link)->rb_left; + else if (cm_id_priv->id.device > cur_cm_id_priv->id.device) + link = &(*link)->rb_right; + else if (service_id < 
cur_cm_id_priv->id.service_id) link = &(*link)->rb_left; else link = &(*link)->rb_right; @@ -378,7 +384,8 @@ static struct cm_id_private * cm_insert_ return NULL; } -static struct cm_id_private * cm_find_listen(__be64 service_id) +static struct cm_id_private * cm_find_listen(struct ib_device *device, + __be64 service_id) { struct rb_node *node = cm.listen_service_table.rb_node; struct cm_id_private *cm_id_priv; @@ -386,9 +393,15 @@ static struct cm_id_private * cm_find_li while (node) { cm_id_priv = rb_entry(node, struct cm_id_private, service_node); if ((cm_id_priv->id.service_mask & service_id) == - (cm_id_priv->id.service_mask & cm_id_priv->id.service_id)) + cm_id_priv->id.service_id && + (cm_id_priv->id.device == device)) return cm_id_priv; - if (service_id < cm_id_priv->id.service_id) + + if (device < cm_id_priv->id.device) + node = node->rb_left; + else if (device > cm_id_priv->id.device) + node = node->rb_right; + else if (service_id < cm_id_priv->id.service_id) node = node->rb_left; else node = node->rb_right; @@ -523,7 +536,8 @@ static void cm_reject_sidr_req(struct cm ib_send_cm_sidr_rep(&cm_id_priv->id, ¶m); } -struct ib_cm_id *ib_create_cm_id(ib_cm_handler cm_handler, +struct ib_cm_id *ib_create_cm_id(struct ib_device *device, + ib_cm_handler cm_handler, void *context) { struct cm_id_private *cm_id_priv; @@ -535,6 +549,7 @@ struct ib_cm_id *ib_create_cm_id(ib_cm_h memset(cm_id_priv, 0, sizeof *cm_id_priv); cm_id_priv->id.state = IB_CM_IDLE; + cm_id_priv->id.device = device; cm_id_priv->id.cm_handler = cm_handler; cm_id_priv->id.context = context; cm_id_priv->id.remote_cm_qpn = 1; @@ -1047,7 +1062,6 @@ static void cm_format_req_event(struct c req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad; param = &work->cm_event.param.req_rcvd; param->listen_id = listen_id; - param->device = cm_id_priv->av.port->mad_agent->device; param->port = cm_id_priv->av.port->port_num; param->primary_path = &work->path[0]; if (req_msg->alt_local_lid) @@ -1226,7 
+1240,8 @@ static struct cm_id_private * cm_match_r } /* Find matching listen request. */ - listen_cm_id_priv = cm_find_listen(req_msg->service_id); + listen_cm_id_priv = cm_find_listen(cm_id_priv->id.device, + req_msg->service_id); if (!listen_cm_id_priv) { spin_unlock_irqrestore(&cm.lock, flags); cm_issue_rej(work->port, work->mad_recv_wc, @@ -1254,7 +1269,7 @@ static int cm_req_handler(struct cm_work req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad; - cm_id = ib_create_cm_id(NULL, NULL); + cm_id = ib_create_cm_id(work->port->cm_dev->device, NULL, NULL); if (IS_ERR(cm_id)) return PTR_ERR(cm_id); @@ -2629,7 +2644,6 @@ static void cm_format_sidr_req_event(str param = &work->cm_event.param.sidr_req_rcvd; param->pkey = __be16_to_cpu(sidr_req_msg->pkey); param->listen_id = listen_id; - param->device = work->port->mad_agent->device; param->port = work->port->port_num; work->cm_event.private_data = &sidr_req_msg->private_data; } @@ -2642,7 +2656,7 @@ static int cm_sidr_req_handler(struct cm struct ib_wc *wc; unsigned long flags; - cm_id = ib_create_cm_id(NULL, NULL); + cm_id = ib_create_cm_id(work->port->cm_dev->device, NULL, NULL); if (IS_ERR(cm_id)) return PTR_ERR(cm_id); cm_id_priv = container_of(cm_id, struct cm_id_private, id); @@ -2666,7 +2680,8 @@ static int cm_sidr_req_handler(struct cm spin_unlock_irqrestore(&cm.lock, flags); goto out; /* Duplicate message. 
*/ } - cur_cm_id_priv = cm_find_listen(sidr_req_msg->service_id); + cur_cm_id_priv = cm_find_listen(cm_id->device, + sidr_req_msg->service_id); if (!cur_cm_id_priv) { rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table); spin_unlock_irqrestore(&cm.lock, flags); Index: linux-kernel/infiniband/core/ucm.c =================================================================== --- linux-kernel/infiniband/core/ucm.c (revision 3433) +++ linux-kernel/infiniband/core/ucm.c (working copy) @@ -52,12 +52,20 @@ MODULE_AUTHOR("Libor Michalek"); MODULE_DESCRIPTION("InfiniBand userspace Connection Manager access"); MODULE_LICENSE("Dual BSD/GPL"); +struct ib_ucm_device { + int devnum; + struct cdev dev; + struct class_device class_dev; + struct ib_device *ib_dev; +}; + struct ib_ucm_file { struct semaphore mutex; struct file *filp; + struct ib_ucm_device *device; - struct list_head ctxs; /* list of active connections */ - struct list_head events; /* list of pending events */ + struct list_head ctxs; + struct list_head events; wait_queue_head_t poll_wait; }; @@ -90,14 +98,24 @@ struct ib_ucm_event { enum { IB_UCM_MAJOR = 231, - IB_UCM_MINOR = 255 + IB_UCM_BASE_MINOR = 224, + IB_UCM_MAX_DEVICES = 32 }; -#define IB_UCM_DEV MKDEV(IB_UCM_MAJOR, IB_UCM_MINOR) +#define IB_UCM_BASE_DEV MKDEV(IB_UCM_MAJOR, IB_UCM_BASE_MINOR) -static struct semaphore ctx_id_mutex; -static struct idr ctx_id_table; +static void ib_ucm_add_one(struct ib_device *device); +static void ib_ucm_remove_one(struct ib_device *device); +static struct ib_client ucm_client = { + .name = "ucm", + .add = ib_ucm_add_one, + .remove = ib_ucm_remove_one +}; + +DECLARE_MUTEX(ctx_id_mutex); +DEFINE_IDR(ctx_id_table); +static DECLARE_BITMAP(dev_map, IB_UCM_MAX_DEVICES); static struct ib_ucm_context *ib_ucm_ctx_get(struct ib_ucm_file *file, int id) { @@ -184,10 +202,7 @@ error: kfree(ctx); return NULL; } -/* - * Event portion of the API, handle CM events - * and allow event polling. 
- */ + static void ib_ucm_event_path_get(struct ib_ucm_path_rec *upath, struct ib_sa_path_rec *kpath) { @@ -234,6 +249,7 @@ static void ib_ucm_event_req_get(struct ureq->retry_count = kreq->retry_count; ureq->rnr_retry_count = kreq->rnr_retry_count; ureq->srq = kreq->srq; + ureq->port = kreq->port; ib_ucm_event_path_get(&ureq->primary_path, kreq->primary_path); ib_ucm_event_path_get(&ureq->alternate_path, kreq->alternate_path); @@ -320,6 +336,8 @@ static int ib_ucm_event_process(struct i case IB_CM_SIDR_REQ_RECEIVED: uvt->resp.u.sidr_req_resp.pkey = evt->param.sidr_req_rcvd.pkey; + uvt->resp.u.sidr_req_resp.port = + evt->param.sidr_req_rcvd.port; uvt->data_len = IB_CM_SIDR_REQ_PRIVATE_DATA_SIZE; break; case IB_CM_SIDR_REP_RECEIVED: @@ -412,9 +430,7 @@ static ssize_t ib_ucm_event(struct ib_uc if (copy_from_user(&cmd, inbuf, sizeof(cmd))) return -EFAULT; - /* - * wait - */ + down(&file->mutex); while (list_empty(&file->events)) { @@ -496,7 +512,6 @@ done: return result; } - static ssize_t ib_ucm_create_id(struct ib_ucm_file *file, const char __user *inbuf, int in_len, int out_len) @@ -519,29 +534,27 @@ static ssize_t ib_ucm_create_id(struct i return -ENOMEM; ctx->uid = cmd.uid; - ctx->cm_id = ib_create_cm_id(ib_ucm_event_handler, ctx); + ctx->cm_id = ib_create_cm_id(file->device->ib_dev, + ib_ucm_event_handler, ctx); if (IS_ERR(ctx->cm_id)) { result = PTR_ERR(ctx->cm_id); - goto err; + goto err1; } resp.id = ctx->id; if (copy_to_user((void __user *)(unsigned long)cmd.response, &resp, sizeof(resp))) { result = -EFAULT; - goto err; + goto err2; } - return 0; -err: +err2: + ib_destroy_cm_id(ctx->cm_id); +err1: down(&ctx_id_mutex); idr_remove(&ctx_id_table, ctx->id); up(&ctx_id_mutex); - - if (!IS_ERR(ctx->cm_id)) - ib_destroy_cm_id(ctx->cm_id); - kfree(ctx); return result; } @@ -1253,6 +1266,7 @@ static int ib_ucm_open(struct inode *ino filp->private_data = file; file->filp = filp; + file->device = container_of(inode->i_cdev, struct ib_ucm_device, dev); return 0; } @@ 
-1283,7 +1297,17 @@ static int ib_ucm_close(struct inode *in return 0; } -static struct file_operations ib_ucm_fops = { +static void ib_ucm_release_class_dev(struct class_device *class_dev) +{ + struct ib_ucm_device *dev; + + dev = container_of(class_dev, struct ib_ucm_device, class_dev); + cdev_del(&dev->dev); + clear_bit(dev->devnum, dev_map); + kfree(dev); +} + +static struct file_operations ucm_fops = { .owner = THIS_MODULE, .open = ib_ucm_open, .release = ib_ucm_close, @@ -1291,55 +1315,141 @@ static struct file_operations ib_ucm_fop .poll = ib_ucm_poll, }; +static struct class ucm_class = { + .name = "infiniband_cm", + .release = ib_ucm_release_class_dev +}; -static struct class *ib_ucm_class; -static struct cdev ib_ucm_cdev; +static ssize_t show_dev(struct class_device *class_dev, char *buf) +{ + struct ib_ucm_device *dev; + + dev = container_of(class_dev, struct ib_ucm_device, class_dev); + return print_dev_t(buf, dev->dev.dev); +} +static CLASS_DEVICE_ATTR(dev, S_IRUGO, show_dev, NULL); -static int __init ib_ucm_init(void) +static ssize_t show_ibdev(struct class_device *class_dev, char *buf) { - int result; + struct ib_ucm_device *dev; + + dev = container_of(class_dev, struct ib_ucm_device, class_dev); + return sprintf(buf, "%s\n", dev->ib_dev->name); +} +static CLASS_DEVICE_ATTR(ibdev, S_IRUGO, show_ibdev, NULL); - result = register_chrdev_region(IB_UCM_DEV, 1, "infiniband_cm"); - if (result) { - printk(KERN_ERR "ucm: Error <%d> registering dev\n", result); - goto err_chr; - } +static void ib_ucm_add_one(struct ib_device *device) +{ + struct ib_ucm_device *ucm_dev; - cdev_init(&ib_ucm_cdev, &ib_ucm_fops); + if (!device->alloc_ucontext) + return; - result = cdev_add(&ib_ucm_cdev, IB_UCM_DEV, 1); - if (result) { - printk(KERN_ERR "ucm: Error <%d> adding cdev\n", result); + ucm_dev = kmalloc(sizeof *ucm_dev, GFP_KERNEL); + if (!ucm_dev) + return; + + memset(ucm_dev, 0, sizeof *ucm_dev); + ucm_dev->ib_dev = device; + + ucm_dev->devnum = 
find_first_zero_bit(dev_map, IB_UCM_MAX_DEVICES); + if (ucm_dev->devnum >= IB_UCM_MAX_DEVICES) + goto err; + + set_bit(ucm_dev->devnum, dev_map); + + cdev_init(&ucm_dev->dev, &ucm_fops); + ucm_dev->dev.owner = THIS_MODULE; + kobject_set_name(&ucm_dev->dev.kobj, "ucm%d", ucm_dev->devnum); + if (cdev_add(&ucm_dev->dev, IB_UCM_BASE_DEV + ucm_dev->devnum, 1)) + goto err; + + ucm_dev->class_dev.class = &ucm_class; + ucm_dev->class_dev.dev = device->dma_device; + snprintf(ucm_dev->class_dev.class_id, BUS_ID_SIZE, "ucm%d", + ucm_dev->devnum); + if (class_device_register(&ucm_dev->class_dev)) goto err_cdev; - } - ib_ucm_class = class_create(THIS_MODULE, "infiniband_cm"); - if (IS_ERR(ib_ucm_class)) { - result = PTR_ERR(ib_ucm_class); - printk(KERN_ERR "Error <%d> creating class\n", result); + if (class_device_create_file(&ucm_dev->class_dev, + &class_device_attr_dev)) goto err_class; + if (class_device_create_file(&ucm_dev->class_dev, + &class_device_attr_ibdev)) + goto err_class; + + ib_set_client_data(device, &ucm_client, ucm_dev); + return; + +err_class: + class_device_unregister(&ucm_dev->class_dev); +err_cdev: + cdev_del(&ucm_dev->dev); + clear_bit(ucm_dev->devnum, dev_map); +err: + kfree(ucm_dev); + return; +} + +static void ib_ucm_remove_one(struct ib_device *device) +{ + struct ib_ucm_device *ucm_dev = ib_get_client_data(device, &ucm_client); + + if (!ucm_dev) + return; + + class_device_unregister(&ucm_dev->class_dev); +} + +static ssize_t show_abi_version(struct class *class, char *buf) +{ + return sprintf(buf, "%d\n", IB_USER_CM_ABI_VERSION); +} +static CLASS_ATTR(abi_version, S_IRUGO, show_abi_version, NULL); + +static int __init ib_ucm_init(void) +{ + int ret; + + ret = register_chrdev_region(IB_UCM_BASE_DEV, IB_UCM_MAX_DEVICES, + "infiniband_cm"); + if (ret) { + printk(KERN_ERR "ucm: couldn't register device number\n"); + goto err; } - class_device_create(ib_ucm_class, IB_UCM_DEV, NULL, "ucm"); + ret = class_register(&ucm_class); + if (ret) { + printk(KERN_ERR 
"ucm: couldn't create class infiniband_cm\n"); + goto err_chrdev; + } - idr_init(&ctx_id_table); - init_MUTEX(&ctx_id_mutex); + ret = class_create_file(&ucm_class, &class_attr_abi_version); + if (ret) { + printk(KERN_ERR "ucm: couldn't create abi_version attribute\n"); + goto err_class; + } + ret = ib_register_client(&ucm_client); + if (ret) { + printk(KERN_ERR "ucm: couldn't register client\n"); + goto err_class; + } return 0; + err_class: - cdev_del(&ib_ucm_cdev); -err_cdev: - unregister_chrdev_region(IB_UCM_DEV, 1); -err_chr: - return result; + class_unregister(&ucm_class); +err_chrdev: + unregister_chrdev_region(IB_UCM_BASE_DEV, IB_UCM_MAX_DEVICES); +err: + return ret; } static void __exit ib_ucm_cleanup(void) { - class_device_destroy(ib_ucm_class, IB_UCM_DEV); - class_destroy(ib_ucm_class); - cdev_del(&ib_ucm_cdev); - unregister_chrdev_region(IB_UCM_DEV, 1); + ib_unregister_client(&ucm_client); + class_unregister(&ucm_class); + unregister_chrdev_region(IB_UCM_BASE_DEV, IB_UCM_MAX_DEVICES); } module_init(ib_ucm_init); From sean.hefty at intel.com Thu Sep 15 10:40:58 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 15 Sep 2005 10:40:58 -0700 Subject: [openib-general] [PATCH] [UAT] 2/5 change UAT minor device number In-Reply-To: Message-ID: Change the minor device number for uAT, which is now used by uCM. 
Signed-off-by: Sean Hefty Index: linux-kernel/infiniband/core/uat.c =================================================================== --- linux-kernel/infiniband/core/uat.c (revision 3433) +++ linux-kernel/infiniband/core/uat.c (working copy) @@ -57,7 +57,7 @@ MODULE_PARM_DESC(debug_level, "Enable de enum { IB_UAT_MAJOR = 231, - IB_UAT_MINOR = 254 + IB_UAT_MINOR = 191 }; #define IB_UAT_DEV MKDEV(IB_UAT_MAJOR, IB_UAT_MINOR) From sean.hefty at intel.com Thu Sep 15 10:44:05 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 15 Sep 2005 10:44:05 -0700 Subject: [openib-general] [PATCH] [kDAPL] 3/5 per device communication identifiers In-Reply-To: Message-ID: Patch fixes up kDAPL to use per device cm_id's. I did not test this, but the changes appear straightforward. Signed-off-by: Sean Hefty Index: linux-kernel/infiniband/ulp/kdapl/ib/dapl_openib_cm.c =================================================================== --- linux-kernel/infiniband/ulp/kdapl/ib/dapl_openib_cm.c (revision 3433) +++ linux-kernel/infiniband/ulp/kdapl/ib/dapl_openib_cm.c (working copy) @@ -481,7 +481,8 @@ int dapl_ib_connect(struct dapl_ep *ep, spin_lock_init(&cm_ctx->lock); init_waitqueue_head(&cm_ctx->wait); cm_ctx->ep = ep; - cm_ctx->cm_id = ib_create_cm_id(dapl_cm_active_cb_handler, cm_ctx); + cm_ctx->cm_id = ib_create_cm_id(ep->qp->device, + dapl_cm_active_cb_handler, cm_ctx); if (IS_ERR(cm_ctx->cm_id)) { dapl_dbg_log(DAPL_DBG_TYPE_ERR, " CM ID creation failed\n"); kfree(cm_ctx); @@ -683,7 +684,8 @@ int dapl_ib_setup_conn_listener(struct d { int status; - sp->cm_srvc_handle = ib_create_cm_id(dapl_cm_passive_cb_handler, sp); + sp->cm_srvc_handle = ib_create_cm_id(ia->provider->device, + dapl_cm_passive_cb_handler, sp); if (IS_ERR(sp->cm_srvc_handle)) { dapl_dbg_log(DAPL_DBG_TYPE_ERR, " CM ID creation failed\n"); return -EAGAIN; From sean.hefty at intel.com Thu Sep 15 10:49:37 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 15 Sep 2005 10:49:37 -0700 Subject: 
[openib-general] [PATCH] [uDAPL] 4/5 per device communication identifiers In-Reply-To: Message-ID: Convert uDAPL to use per device cm_id's. Untested, but changes appear straightforward. Signed-off-by: Sean Hefty Index: userspace/dapl/dapl/openib/dapl_ib_cm.c =================================================================== --- userspace/dapl/dapl/openib/dapl_ib_cm.c (revision 3433) +++ userspace/dapl/dapl/openib/dapl_ib_cm.c (working copy) @@ -573,7 +573,7 @@ dapls_ib_connect ( conn->ep = ep_ptr; conn->hca = ep_ptr->header.owner_ia->hca_ptr; - status = ib_cm_create_id(&conn->cm_id, conn); + status = ib_cm_create_id(conn->hca->ib_hca_handle, &conn->cm_id, conn); if (status < 0) { dat_status = dapl_convert_errno(errno,"create_cm_id"); dapl_os_free(conn, sizeof(*conn)); @@ -749,7 +749,8 @@ dapls_ib_setup_conn_listener ( return DAT_INTERNAL_ERROR; } - status = ib_cm_create_id(&conn->cm_id, conn); + status = ib_cm_create_id(ia_ptr->hca_ptr->ib_hca_handle, &conn->cm_id, + conn); if (status < 0) { dat_status = dapl_convert_errno(errno,"create_cm_id"); dapl_os_free(conn, sizeof(*conn)); From sean.hefty at intel.com Thu Sep 15 10:53:31 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 15 Sep 2005 10:53:31 -0700 Subject: [openib-general] [PATCH] [SRP] 5/5 per device communication identifiers In-Reply-To: Message-ID: Patch to update SRP to per device cm_id's. I don't have an SRP target to test this against, but changes appear straightforward. 
Signed-off-by: Sean Hefty Index: linux-kernel/infiniband/ulp/srp/ib_srp.c =================================================================== --- linux-kernel/infiniband/ulp/srp/ib_srp.c (revision 3450) +++ linux-kernel/infiniband/ulp/srp/ib_srp.c (working copy) @@ -449,7 +449,8 @@ static int srp_reconnect_target(struct s srp_disconnect_target(target); - target->cm_id = ib_create_cm_id(srp_cm_handler, target); + target->cm_id = ib_create_cm_id(target->qp->device, srp_cm_handler, + target); if (IS_ERR(target->cm_id)) { ret = PTR_ERR(target->cm_id); target->cm_id = NULL; @@ -1397,7 +1398,7 @@ static ssize_t srp_create_target(struct if (ret) goto err; - target->cm_id = ib_create_cm_id(srp_cm_handler, target); + target->cm_id = ib_create_cm_id(host->dev, srp_cm_handler, target); if (IS_ERR(target->cm_id)) { ret = PTR_ERR(target->cm_id); goto err; From halr at voltaire.com Thu Sep 15 11:08:39 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Sep 2005 14:08:39 -0400 Subject: [openib-general] Re: [PATCH] [UAT] 2/5 change UAT minor device number In-Reply-To: References: Message-ID: <1126807719.5425.3693.camel@hal.voltaire.com> Hi Sean, On Thu, 2005-09-15 at 13:40, Sean Hefty wrote: > Change the minor device number for uAT, which is now used by uCM. Feel free to apply this or let me know if you want me to. -- Hal From mshefty at ichips.intel.com Thu Sep 15 11:16:42 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 15 Sep 2005 11:16:42 -0700 Subject: [openib-general] Re: [PATCH] [UAT] 2/5 change UAT minor device number In-Reply-To: <1126807719.5425.3693.camel@hal.voltaire.com> References: <1126807719.5425.3693.camel@hal.voltaire.com> Message-ID: <4329BA8A.2050306@ichips.intel.com> Hal Rosenstock wrote: > Hi Sean, > > On Thu, 2005-09-15 at 13:40, Sean Hefty wrote: > >>Change the minor device number for uAT, which is now used by uCM. > > > Feel free to apply this or let me know if you want me to. Ok - I'll apply this when committing the CM changes. 
- Sean From sean.hefty at intel.com Thu Sep 15 11:23:21 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 15 Sep 2005 11:23:21 -0700 Subject: [openib-general] [PATCH] [SDP] 6/5 per device communication identifiers In-Reply-To: Message-ID: I can't count. (Actually I counted SRP and SDP together...) Here's a patch to update SDP to per device cm_id's. Signed-off-by: Sean Hefty Index: linux-kernel/infiniband/ulp/sdp/sdp_actv.c =================================================================== --- linux-kernel/infiniband/ulp/sdp/sdp_actv.c (revision 3450) +++ linux-kernel/infiniband/ulp/sdp/sdp_actv.c (working copy) @@ -480,7 +480,7 @@ static void sdp_cm_path_complete(u64 id, /* XXX set timeout to default value of 14 */ path->packet_life = 13; #endif - conn->cm_id = ib_create_cm_id(sdp_cm_event_handler, + conn->cm_id = ib_create_cm_id(ca, sdp_cm_event_handler, hashent_arg(conn->hashent)); if (!conn->cm_id) { sdp_dbg_warn(conn, "Failed to create CM handle, %d", Index: linux-kernel/infiniband/ulp/sdp/sdp_conn.c =================================================================== --- linux-kernel/infiniband/ulp/sdp/sdp_conn.c (revision 3450) +++ linux-kernel/infiniband/ulp/sdp/sdp_conn.c (working copy) @@ -1742,7 +1742,7 @@ static void sdp_device_init_one(struct i if (IS_ERR(hca->pd)) { sdp_warn("Error <%ld> creating HCA <%s> protection domain.", PTR_ERR(hca->pd), device->name); - goto error; + goto err1; } /* * memory registration @@ -1751,7 +1751,7 @@ static void sdp_device_init_one(struct i if (IS_ERR(hca->mem_h)) { sdp_warn("Error <%ld> registering HCA <%s> memory.", PTR_ERR(hca->mem_h), device->name); - goto error; + goto err2; } hca->l_key = hca->mem_h->lkey; @@ -1788,7 +1788,7 @@ static void sdp_device_init_one(struct i device->name, port_count, device->phys_port_cnt); - goto error; + goto err3; } memset(port, 0, sizeof *port); @@ -1804,15 +1804,32 @@ static void sdp_device_init_one(struct i sdp_warn("Error <%d> getting GID for port <%s:%d:%d>", result, 
device->name, port->index, device->phys_port_cnt); - goto error; + goto err3; } } + hca->listen_id = ib_create_cm_id(device, sdp_cm_event_handler, hca); + if (IS_ERR(hca->listen_id)) { + sdp_warn("Error <%ld> creating listen ID on <%s>.", + PTR_ERR(hca->listen_id), device->name); + goto err3; + } + + result = ib_cm_listen(hca->listen_id, + cpu_to_be64(SDP_MSG_SERVICE_ID_VALUE), + cpu_to_be64(SDP_MSG_SERVICE_ID_MASK)); + if (result) { + sdp_warn("Error <%d> listening for SDP connections", result); + goto err4; + } + ib_set_client_data(device, &sdp_client, hca); return; -error: +err4: + ib_destroy_cm_id(hca->listen_id); +err3: list_for_each_entry_safe(port, tmp, &hca->port_list, list) { list_del(&port->list); kfree(port); @@ -1820,13 +1837,10 @@ error: if (!IS_ERR(hca->fmr_pool)) ib_destroy_fmr_pool(hca->fmr_pool); - - if (!IS_ERR(hca->mem_h)) - (void)ib_dereg_mr(hca->mem_h); - - if (!IS_ERR(hca->pd)) - (void)ib_dealloc_pd(hca->pd); - + ib_dereg_mr(hca->mem_h); +err2: + ib_dealloc_pd(hca->pd); +err1: kfree(hca); } @@ -1845,6 +1859,8 @@ static void sdp_device_remove_one(struct return; } + ib_destroy_cm_id(hca->listen_id); + list_for_each_entry_safe(port, tmp, &hca->port_list, list) { list_del(&port->list); kfree(port); @@ -1853,12 +1869,8 @@ static void sdp_device_remove_one(struct if (!IS_ERR(hca->fmr_pool)) ib_destroy_fmr_pool(hca->fmr_pool); - if (!IS_ERR(hca->mem_h)) - (void)ib_dereg_mr(hca->mem_h); - - if (!IS_ERR(hca->pd)) - (void)ib_dealloc_pd(hca->pd); - + ib_dereg_mr(hca->mem_h); + ib_dealloc_pd(hca->pd); kfree(hca); } @@ -1945,33 +1957,9 @@ int sdp_conn_table_init(int proto_family goto error_iocb; } - /* - * start listening - */ - dev_root_s.listen_id = ib_create_cm_id(sdp_cm_event_handler, - (void *)SDP_DEV_SK_INVALID); - if (!dev_root_s.listen_id) { - sdp_warn("Failed to create listen connection identifier."); - result = -ENOMEM; - goto error_conn; - } - - result = ib_cm_listen(dev_root_s.listen_id, - cpu_to_be64(SDP_MSG_SERVICE_ID_VALUE), - 
cpu_to_be64(SDP_MSG_SERVICE_ID_MASK)); - if (result) { - sdp_warn("Error <%d> listening for SDP connections", result); - goto error_listen; - - } - sdp_dbg_init("Started listening for SDP connection requests"); return 0; -error_listen: - ib_destroy_cm_id(dev_root_s.listen_id); -error_conn: - sdp_main_iocb_cleanup(); error_iocb: dev_root_s.sk_array--; free_pages((unsigned long)dev_root_s.sk_array, dev_root_s.sk_ordr); @@ -2010,8 +1998,4 @@ void sdp_conn_table_clear(void) * delete IOCB table */ sdp_main_iocb_cleanup(); - /* - * stop listening - */ - ib_destroy_cm_id(dev_root_s.listen_id); } Index: linux-kernel/infiniband/ulp/sdp/sdp_pass.c =================================================================== --- linux-kernel/infiniband/ulp/sdp/sdp_pass.c (revision 3450) +++ linux-kernel/infiniband/ulp/sdp/sdp_pass.c (working copy) @@ -449,7 +449,7 @@ int sdp_cm_req_handler(struct ib_cm_id * * associate connection with a hca/port, and allocate IB. */ result = sdp_conn_alloc_ib(conn, - event->param.req_rcvd.device, + cm_id->device, event->param.req_rcvd.port, event->param.req_rcvd.primary_path->pkey); if (result < 0) { Index: linux-kernel/infiniband/ulp/sdp/sdp_dev.h =================================================================== --- linux-kernel/infiniband/ulp/sdp/sdp_dev.h (revision 3450) +++ linux-kernel/infiniband/ulp/sdp/sdp_dev.h (working copy) @@ -154,6 +154,7 @@ struct sdev_hca { u32 r_key; /* remote key */ struct ib_fmr_pool *fmr_pool; /* fast memory for Zcopy */ struct list_head port_list; /* ports on this HCA */ + struct ib_cm_id *listen_id; }; struct sdev_root { @@ -187,10 +188,6 @@ struct sdev_root { spinlock_t bind_lock; spinlock_t sock_lock; spinlock_t listen_lock; - /* - * SDP wide listen - */ - struct ib_cm_id *listen_id; /* listen handle */ }; #endif /* _SDP_DEV_H */ From rolandd at cisco.com Thu Sep 15 11:28:03 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 15 Sep 2005 11:28:03 -0700 Subject: [openib-general] [PATCH] [SRP] 5/5 per 
device communication identifiers In-Reply-To: (Sean Hefty's message of "Thu, 15 Sep 2005 10:53:31 -0700") References: Message-ID: <527jdiuqx8.fsf@cisco.com> Looks fine to commit when the CM changes go in. - R. From halr at voltaire.com Thu Sep 15 11:27:41 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Sep 2005 14:27:41 -0400 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <523bo6wcbg.fsf@cisco.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEBF6@mtlexch01.mtl.com> <523bo6wcbg.fsf@cisco.com> Message-ID: <1126808787.5425.3837.camel@hal.voltaire.com> On Thu, 2005-09-15 at 12:00, Roland Dreier wrote: > But I don't see any way to handle the situation arising when booting > over IB, where a system needs the SM to bring its port to active > before it can boot, but where the system doesn't know its host name > until after it boots. This is somewhat cumbersome but clearly the host name is known and could be burnt nonvolatilely into the SMA. -- Hal From jlentini at netapp.com Thu Sep 15 11:34:58 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 15 Sep 2005 14:34:58 -0400 (EDT) Subject: [openib-general] Re: [PATCH] [kDAPL] 3/5 per device communication identifiers In-Reply-To: References: Message-ID: On Thu, 15 Sep 2005, Sean Hefty wrote: > Patch fixes up kDAPL to use per device cm_id's. > > I did not test this, but the changes appear straightforward. > > Signed-off-by: Sean Hefty This is ok to commit with the CM change. From jlentini at netapp.com Thu Sep 15 11:35:49 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 15 Sep 2005 14:35:49 -0400 (EDT) Subject: [openib-general] [PATCH] [uDAPL] 4/5 per device communication identifiers In-Reply-To: References: Message-ID: On Thu, 15 Sep 2005, Sean Hefty wrote: > Convert uDAPL to use per device cm_id's. > > Untested, but changes appear straightforward. > > Signed-off-by: Sean Hefty This also is ok to commit with the CM change. 
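The SDP patch earlier in this thread replaces one `error:` label guarded by `IS_ERR()` checks with a ladder of labels (`err1` through `err4`). Each failure now jumps to the label that releases exactly what has been set up so far, and the new per-HCA listen ID is torn down first. Below is a simplified userspace sketch of that unwinding pattern. The step numbering and the `acquire`/`release` helpers are hypothetical stand-ins for the HCA, PD, memory region, port list and CM ID setup; they are not SDP code.

```c
#include <assert.h>

/* Number of completed setup steps still in effect; lets a test check the unwind. */
static int live;

/* Hypothetical setup step: fails (returns -1) when step == fail_at,
 * otherwise records one completed step. */
static int acquire(int step, int fail_at)
{
	if (step == fail_at)
		return -1;
	live++;
	return 0;
}

/* Hypothetical undo of the most recently completed step. */
static void release(void)
{
	assert(live > 0);
	live--;
}

/* Simplified add-device path: each failure jumps to the label that undoes
 * only the steps already completed, in reverse order.
 * fail_at == 0 means "never fail". Returns 0 on success, -1 on failure. */
static int device_add(int fail_at)
{
	if (acquire(1, fail_at))	/* allocate hca (freed with kfree)  */
		goto err0;
	if (acquire(2, fail_at))	/* protection domain                */
		goto err1;
	if (acquire(3, fail_at))	/* memory region, ports, FMR pool   */
		goto err2;
	if (acquire(4, fail_at))	/* per-HCA listen CM ID             */
		goto err3;
	if (acquire(5, fail_at))	/* start listening on the CM ID     */
		goto err4;
	return 0;

err4:
	release();	/* like ib_destroy_cm_id(hca->listen_id) */
err3:
	release();	/* like freeing ports and ib_dereg_mr()  */
err2:
	release();	/* like ib_dealloc_pd(hca->pd)           */
err1:
	release();	/* like kfree(hca)                       */
err0:
	return -1;
}
```

A failure at any step leaves `live` back at zero. On success, all five steps remain in effect until the matching remove path undoes them.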
From rolandd at cisco.com Thu Sep 15 11:38:05 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 15 Sep 2005 11:38:05 -0700 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <1126808787.5425.3837.camel@hal.voltaire.com> (Hal Rosenstock's message of "15 Sep 2005 14:27:41 -0400") References: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEBF6@mtlexch01.mtl.com> <523bo6wcbg.fsf@cisco.com> <1126808787.5425.3837.camel@hal.voltaire.com> Message-ID: <52y85ytbw2.fsf@cisco.com> Hal> This is somewhat cumbersome but clearly the host name is Hal> known and could be burnt nonvolatilely into the SMA. Actually it can't in general, since the same hardware can be rebooted into different roles and possibly different hostnames. - R. From halr at voltaire.com Thu Sep 15 11:40:23 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 15 Sep 2005 14:40:23 -0400 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <52y85ytbw2.fsf@cisco.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEBF6@mtlexch01.mtl.com> <523bo6wcbg.fsf@cisco.com> <1126808787.5425.3837.camel@hal.voltaire.com> <52y85ytbw2.fsf@cisco.com> Message-ID: <1126809623.5425.3921.camel@hal.voltaire.com> On Thu, 2005-09-15 at 14:38, Roland Dreier wrote: > Hal> This is somewhat cumbersome but clearly the host name is > Hal> known and could be burnt nonvolatilely into the SMA. > > Actually it can't in general, since the same hardware can be rebooted > into different roles and possibly different hostnames. As I said, it is cumbersome. The nonvolatile way would require the host name to be updated if it were to change. Anyhow, what I said before was a little off, the diagnostics are capable of seeing the new NodeDescription. Whether an SM does is implementation dependent. 
-- Hal From mshefty at ichips.intel.com Thu Sep 15 14:31:27 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 15 Sep 2005 14:31:27 -0700 Subject: [openib-general] [PATCH] [CM] 1/5 per device communication identifiers In-Reply-To: References: Message-ID: <4329E82F.10106@ichips.intel.com> Sean Hefty wrote: > The following patch binds communication identifiers to a device. > It exports per HCA devices to userspace. I've committed all changes except those for SDP. NOTE: This patch may require changes to udev.rules to ensure that the ucm devices are exported correctly. - Sean From ardavis at ichips.intel.com Thu Sep 15 15:47:46 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Thu, 15 Sep 2005 15:47:46 -0700 Subject: [openib-general] IBAT kernel oops with latest 3432 svn drop.... In-Reply-To: <1126742920.5425.395.camel@hal.voltaire.com> References: <432879CD.7060003@ichips.intel.com> <4328824F.4020805@ichips.intel.com> <1126734515.5425.127.camel@hal.voltaire.com> <4328AF57.2020306@ichips.intel.com> <1126742920.5425.395.camel@hal.voltaire.com> Message-ID: <4329FA12.8090101@ichips.intel.com> Hal Rosenstock wrote: >On Wed, 2005-09-14 at 19:16, Arlin Davis wrote: > > >>>>>My SM was down (ports in INIT state) >>>>> >>>>> >>>>> >>>>> >>>Had an SM been up previously ? Just wondering how to recreate this. >>> >>> >>> >>> >>SM was actually running, but for some reason it was not sweeping and >>configuring my ports. >> >> > >OpenSM 1.0.0 or 1.1.0 or some other SM ? > > 1.0.0 > > >>>>Question: What can I expect as a result of ib_at_cancel()? >>>> >>>> >>>> >>>> >>>Return 0 if canceled, -1 if cancel failed (e.g. bad ID) >>> >>> >>> >>> >>> >>>>Will I always >>>>get an event with -EINTR for the cancelled request id? >>>> >>>> >>>> >>>> >>>Is that what you are seeing ? >>> >>> >>> >>> >>sometimes I see an event with -EINTR and sometimes I don't see any event. 
>> >> > >Yes, it looks like in kernel AT it depends on whether the request is >still pending or not as to what occurs (at least right now). > > We need some deterministic results. Maybe a guarantee that all events for the cancelled req_id will be processed before ib_at_cancel returns? Is that possible? -arlin >-- Hal > > > From ardavis at ichips.intel.com Thu Sep 15 16:58:30 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Thu, 15 Sep 2005 16:58:30 -0700 Subject: [openib-general] user event processing question... Message-ID: <432A0AA6.5050505@ichips.intel.com> I am running into some multi-threaded issues with blocking get_event calls. There is a window of opportunity where a destroy can remove the event between the time my poll wakes up and the time I call get_event, which results in a blocking get_event call. Any suggestions, other than switching between non-blocking and blocking in my processing thread, would be appreciated. Thanks, -arlin From rolandd at cisco.com Thu Sep 15 17:06:51 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 15 Sep 2005 17:06:51 -0700 Subject: [openib-general] Re: user event processing question... In-Reply-To: <432A0AA6.5050505@ichips.intel.com> (Arlin Davis's message of "Thu, 15 Sep 2005 16:58:30 -0700") References: <432A0AA6.5050505@ichips.intel.com> Message-ID: <52k6hhnaec.fsf@cisco.com> Arlin> I am running into some multi-threaded issues with blocking Arlin> get_event calls. There is a window of opportunity where a Arlin> destroy can remove the event between the time my poll wakes Arlin> up and the time I call get_event, which results in a Arlin> blocking get_event call. Any suggestions, other than Arlin> switching between non-blocking and blocking in my Arlin> processing thread, would be appreciated. I guess you could just set the file descriptors to always be non-blocking. You might get EAGAIN back from the get event calls sometimes but that doesn't seem like a real issue. - R.
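Roland's suggestion can be sketched in plain POSIX C. Keep the event descriptor in `O_NONBLOCK` mode permanently, and if a `poll()` wakeup is followed by `EAGAIN`, treat it as a spurious wakeup (another thread destroyed or consumed the event) rather than as an error. In this sketch `read()` is a hypothetical stand-in for the uCM/uAT get-event call, whose real interface is not shown in this thread. `wait_event` and `set_nonblocking` are illustrative names only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put an event-channel fd into non-blocking mode once, up front. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Wait for one event record. Returns the number of bytes read, 0 on
 * timeout (or end of file), and -1 on a hard error. If the event vanishes
 * between poll() and read(), read() fails with EAGAIN instead of blocking,
 * and the loop simply goes back to poll(). */
static ssize_t wait_event(int fd, void *buf, size_t len, int timeout_ms)
{
	for (;;) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int n = poll(&pfd, 1, timeout_ms);

		if (n == 0)
			return 0;		/* timed out, no event */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}

		ssize_t r = read(fd, buf, len);	/* stand-in for get_event */
		if (r >= 0)
			return r;
		if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
			return -1;
		/* Event was consumed elsewhere: loop back to poll(). */
	}
}
```

This keeps the processing thread free of mode switching. The only extra cost is an occasional extra trip through `poll()`.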
From roel at yottayotta.com Thu Sep 15 17:21:07 2005 From: roel at yottayotta.com (Roel van der Goot) Date: Thu, 15 Sep 2005 18:21:07 -0600 (MDT) Subject: [openib-general] OpenSM problem?! Message-ID: Hi Hal, If my interpretation of the spec is correct, it seems that OpenSM has problems with a window size of one during the RMPP protocol. Shortly after receiving a lid from the OpenSM, our application asks for subnAdmGetTable(SANodeRecord): sending MAD: MAD: struct SAHeader (56 bytes) - SA Header (section 15.2.1.1) { RMPPHeader: struct RMPPHeader (36 bytes) - RMPP Header Fields (section 13.6.2.1) { MADHeader: struct MADHeader (24 bytes) - MAD Base Header (section 13.4.3) { baseVersion: 0x01 (8 bit uint) mgmtClass: 0x03 (8 bit uint) classVersion: 0x02 (8 bit uint) method: 0x12 (8 bit uint) status: 0x0000 (16 bit uint) classSpecific: 0x0000 (16 bit uint) transactionID: 0x000000000003B59E (64 bit uint) attributeID: 0x0011 (16 bit uint) rsv0: 0x0000 (16 bit uint) attributeModifier: 0x00000000 (32 bit uint) } RMPPVersion: 0x00 (8 bit uint) RMPPType: 0x00 (8 bit uint) RRespTime: 0x00 (5 bit uint) RMPPFlags: 0x0 (3 bit uint) RMPPStatus: 0x00 (8 bit uint) data1: 0x4E4F4E4F (32 bit uint) data2: 0x4E4F4E4F (32 bit uint) } SMKey: 0x0000000000000000 (64 bit uint) attributeOffset: 0x000E (16 bit uint) rsv0: 0x0000 (16 bit uint) componentMask: 0x0000000000000000 (64 bit uint) } The OpenSM replies: received MAD: struct HdrLRH (8 bytes) - Local Route Header (section 7.7) { VL: 0x0 (4 bit uint) LVer: 0x0 (4 bit uint) SL: 0x0 (4 bit uint) rsv0: 0x0 (2 bit uint) LNH: 0x2 (2 bit uint) DLID: 0x0001 (16 bit uint) rsv1: 0x00 (5 bit uint) pktLen: 0x048 (11 bit uint) SLID: 0x0002 (16 bit uint) } MAD: struct SAHeader (56 bytes) - SA Header (section 15.2.1.1) { RMPPHeader: struct RMPPHeader (36 bytes) - RMPP Header Fields (section 13.6.2.1) { MADHeader: struct MADHeader (24 bytes) - MAD Base Header (section 13.4.3) { baseVersion: 0x01 (8 bit uint) mgmtClass: 0x03 (8 bit uint) classVersion: 
0x02 (8 bit uint) method: 0x92 (8 bit uint) status: 0x0000 (16 bit uint) classSpecific: 0x0000 (16 bit uint) transactionID: 0x000000000003B59E (64 bit uint) attributeID: 0x0011 (16 bit uint) rsv0: 0x0000 (16 bit uint) attributeModifier: 0x00000000 (32 bit uint) } RMPPVersion: 0x01 (8 bit uint) RMPPType: 0x01 (8 bit uint) RRespTime: 0x00 (5 bit uint) RMPPFlags: 0x3 (3 bit uint) RMPPStatus: 0x00 (8 bit uint) data1: 0x00000001 (32 bit uint) data2: 0x000002F0 (32 bit uint) } SMKey: 0x0000000000000000 (64 bit uint) attributeOffset: 0x000E (16 bit uint) rsv0: 0x0000 (16 bit uint) componentMask: 0x0000000000000000 (64 bit uint) } The application acks with a window size of 1: sending MAD: unrecognized MAD class attribute: 0x0000 MAD: struct SAHeader (56 bytes) - SA Header (section 15.2.1.1) { RMPPHeader: struct RMPPHeader (36 bytes) - RMPP Header Fields (section 13.6.2.1) { MADHeader: struct MADHeader (24 bytes) - MAD Base Header (section 13.4.3) { baseVersion: 0x01 (8 bit uint) mgmtClass: 0x03 (8 bit uint) classVersion: 0x00 (8 bit uint) method: 0x00 (8 bit uint) status: 0x0000 (16 bit uint) classSpecific: 0x0000 (16 bit uint) transactionID: 0x000000000003B59E (64 bit uint) attributeID: 0x0000 (16 bit uint) rsv0: 0x0000 (16 bit uint) attributeModifier: 0x00000000 (32 bit uint) } RMPPVersion: 0x01 (8 bit uint) RMPPType: 0x02 (8 bit uint) RRespTime: 0x14 (5 bit uint) RMPPFlags: 0x1 (3 bit uint) RMPPStatus: 0x00 (8 bit uint) data1: 0x00000001 (32 bit uint) data2: 0x00000002 (32 bit uint) } SMKey: 0x0000000000000000 (64 bit uint) attributeOffset: 0x0000 (16 bit uint) rsv0: 0x0000 (16 bit uint) componentMask: 0x0000000000000000 (64 bit uint) } OpenSM says that it received the initial packet again: received MAD: struct HdrLRH (8 bytes) - Local Route Header (section 7.7) { VL: 0x0 (4 bit uint) LVer: 0x0 (4 bit uint) SL: 0x0 (4 bit uint) rsv0: 0x0 (2 bit uint) LNH: 0x2 (2 bit uint) DLID: 0x0001 (16 bit uint) rsv1: 0x00 (5 bit uint) pktLen: 0x048 (11 bit uint) SLID: 0x0002 (16 bit 
uint) } MAD: struct SAHeader (56 bytes) - SA Header (section 15.2.1.1) { RMPPHeader: struct RMPPHeader (36 bytes) - RMPP Header Fields (section 13.6.2.1) { MADHeader: struct MADHeader (24 bytes) - MAD Base Header (section 13.4.3) { baseVersion: 0x01 (8 bit uint) mgmtClass: 0x03 (8 bit uint) classVersion: 0x02 (8 bit uint) method: 0x92 (8 bit uint) status: 0x0000 (16 bit uint) classSpecific: 0x0000 (16 bit uint) transactionID: 0x000000000003B59E (64 bit uint) attributeID: 0x0011 (16 bit uint) rsv0: 0x0000 (16 bit uint) attributeModifier: 0x00000000 (32 bit uint) } RMPPVersion: 0x01 (8 bit uint) RMPPType: 0x01 (8 bit uint) RRespTime: 0x00 (5 bit uint) RMPPFlags: 0x3 (3 bit uint) RMPPStatus: 0x00 (8 bit uint) data1: 0x00000001 (32 bit uint) data2: 0x000002F0 (32 bit uint) } SMKey: 0x0000000000000000 (64 bit uint) attributeOffset: 0x000E (16 bit uint) rsv0: 0x0000 (16 bit uint) componentMask: 0x0000000000000000 (64 bit uint) } Our application acks once again: sending MAD: unrecognized MAD class attribute: 0x0000 MAD: struct SAHeader (56 bytes) - SA Header (section 15.2.1.1) { RMPPHeader: struct RMPPHeader (36 bytes) - RMPP Header Fields (section 13.6.2.1) { MADHeader: struct MADHeader (24 bytes) - MAD Base Header (section 13.4.3) { baseVersion: 0x01 (8 bit uint) mgmtClass: 0x03 (8 bit uint) classVersion: 0x00 (8 bit uint) method: 0x00 (8 bit uint) status: 0x0000 (16 bit uint) classSpecific: 0x0000 (16 bit uint) transactionID: 0x000000000003B59E (64 bit uint) attributeID: 0x0000 (16 bit uint) rsv0: 0x0000 (16 bit uint) attributeModifier: 0x00000000 (32 bit uint) } RMPPVersion: 0x01 (8 bit uint) RMPPType: 0x02 (8 bit uint) RRespTime: 0x14 (5 bit uint) RMPPFlags: 0x1 (3 bit uint) RMPPStatus: 0x00 (8 bit uint) data1: 0x00000001 (32 bit uint) data2: 0x00000002 (32 bit uint) } SMKey: 0x0000000000000000 (64 bit uint) attributeOffset: 0x0000 (16 bit uint) rsv0: 0x0000 (16 bit uint) componentMask: 0x0000000000000000 (64 bit uint) } The RMPP sequence times out, our application 
sends another subnAdmGetTable(SANodeRecord), ad infinitum. Cheers :-), Roel. From eitan at mellanox.co.il Thu Sep 15 20:58:02 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 16 Sep 2005 06:58:02 +0300 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <006001c5ba18$684dfc90$9e5aa8c0@infiniconsys.com> References: <006001c5ba18$684dfc90$9e5aa8c0@infiniconsys.com> Message-ID: <432A42CA.9000802@mellanox.co.il> Fab Tillier wrote: >>From: Roland Dreier [mailto:rolandd at cisco.com] >>Sent: Thursday, September 15, 2005 9:01 AM >> >> Jack> The resulting set of NodeDescription strings present in the >> Jack> SM and SA could then be a race-dependent salad (depending on >> Jack> the timing of QP0 entering RTS state, SM subnet sweep, and >> Jack> resetting of the local NodeDescription string). >> >>Yes, it's unfortunate. >> >>But I don't see any way to handle the situation arising when booting >>over IB, where a system needs the SM to bring its port to active >>before it can boot, but where the system doesn't know its host name >>until after it boots. > > > What happens during the handoff from the boot environment to the OS? > Does the > HCA get disabled and then the mthca driver starts fresh? Or does the > mthca > driver inherit a device that is already fully initialized. If it gets > re-initialized, don't the ports go down when the boot agent shuts down > (and the > SM should get a GID out of service trap), Actually the SM will get either a port state change unaffiliated async event if the subnet has no switch, or a Trap 128 from the switch connected to the rebooted HCA. EZ followed by the ports going up > when > mthca starts? Or is the problem that the boot driver doesn't know when > the > handoff is, and thus can't disable the device? 
> > Thanks, > > - Fab > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From eitan at mellanox.co.il Thu Sep 15 21:55:14 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 16 Sep 2005 07:55:14 +0300 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <52oe6uuu7f.fsf@cisco.com> References: <52oe6uuu7f.fsf@cisco.com> Message-ID: <432A5032.7050303@mellanox.co.il> Roland Dreier wrote: > Fab> What happens during the handoff from the boot environment to > Fab> the OS? Does the HCA get disabled and then the mthca driver > Fab> starts fresh? Or does the mthca driver inherit a device that > Fab> is already fully initialized. If it gets re-initialized, > Fab> don't the ports go down when the boot agent shuts down (and > Fab> the SM should get a GID out of service trap), followed by the > Fab> ports going up when mthca starts? Or is the problem that the > Fab> boot driver doesn't know when the handoff is, and thus can't > Fab> disable the device? > > After the kernel takes over, mthca will reset the HCA and of course > the SM will have to bring the port back up. But at the point that > mthca is loaded, the system typically won't have a hostname set. > > The kernel will need to have the HCA port active with the mthca driver > running before it can mount root and get to /etc/sysconfig/network or > wherever the hostname is set. Maybe we could use some parameter passing between the boot OS and the post boot OS? Are there mechanisms to do that? For OpenSM case the node description is only used for informational purposes. I think it will be very confusing if a node with a user given description will show up with no description after it is rebooted. 
For the gen2 stack we could use the following "hack": OpenSM scans all nodes for their description every time it does a full sweep. So we could cause an extra sweep after each node description change by faking trap 144 (HCA port capability mask change) and sending it over. However, this is non standard and each SM can treat it differently. If we limit our solution to the common case: On a regular non disk-less machine is it possible to have the node description be set before the QP0 is physically UP? EZ From halr at voltaire.com Fri Sep 16 02:59:13 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Sep 2005 05:59:13 -0400 Subject: [openib-general] Re: OpenSM problem?! In-Reply-To: References: Message-ID: <1126864422.5425.11372.camel@hal.voltaire.com> Hi Roel, On Thu, 2005-09-15 at 20:21, Roel van der Goot wrote: > Hi Hal, > > If my interpretation of the spec is correct, it seems that OpenSM > has problems with a window size of one during the RMPP protocol. > Shortly after receiving a lid from the OpenSM, our application > asks for subnAdmGetTable(SANodeRecord): ^^^^^^^^^^^^ (nit) NodeRecord > sending MAD: > > MAD: struct SAHeader (56 bytes) - > SA Header (section 15.2.1.1) > { > RMPPHeader: struct RMPPHeader (36 bytes) - > RMPP Header Fields (section 13.6.2.1) > { > MADHeader: struct MADHeader (24 bytes) - > MAD Base Header (section 13.4.3) > { > baseVersion: 0x01 (8 bit uint) > mgmtClass: 0x03 (8 bit uint) > classVersion: 0x02 (8 bit uint) > method: 0x12 (8 bit uint) > status: 0x0000 (16 bit uint) > classSpecific: 0x0000 (16 bit uint) > transactionID: 0x000000000003B59E (64 bit uint) > attributeID: 0x0011 (16 bit uint) > rsv0: 0x0000 (16 bit uint) > attributeModifier: 0x00000000 (32 bit uint) > } > RMPPVersion: 0x00 (8 bit uint) > RMPPType: 0x00 (8 bit uint) > RRespTime: 0x00 (5 bit uint) > RMPPFlags: 0x0 (3 bit uint) > RMPPStatus: 0x00 (8 bit uint) > data1: 0x4E4F4E4F (32 bit uint) > data2: 0x4E4F4E4F (32 bit uint) > } > SMKey: 0x0000000000000000 (64 bit 
uint) > attributeOffset: 0x000E (16 bit uint) > rsv0: 0x0000 (16 bit uint) > componentMask: 0x0000000000000000 (64 bit uint) > } Right, that is the SubnAdmGetTable(NodeRecord) request which asks for all NodeRecords. AttributeOffset need not be set here but it causes no harm. > The OpenSM replies: > > received MAD: > struct HdrLRH (8 bytes) - > Local Route Header (section 7.7) > { > VL: 0x0 (4 bit uint) > LVer: 0x0 (4 bit uint) > SL: 0x0 (4 bit uint) > rsv0: 0x0 (2 bit uint) > LNH: 0x2 (2 bit uint) > DLID: 0x0001 (16 bit uint) > rsv1: 0x00 (5 bit uint) > pktLen: 0x048 (11 bit uint) > SLID: 0x0002 (16 bit uint) > } > MAD: struct SAHeader (56 bytes) - > SA Header (section 15.2.1.1) > { > RMPPHeader: struct RMPPHeader (36 bytes) - > RMPP Header Fields (section 13.6.2.1) > { > MADHeader: struct MADHeader (24 bytes) - > MAD Base Header (section 13.4.3) > { > baseVersion: 0x01 (8 bit uint) > mgmtClass: 0x03 (8 bit uint) > classVersion: 0x02 (8 bit uint) > method: 0x92 (8 bit uint) > status: 0x0000 (16 bit uint) > classSpecific: 0x0000 (16 bit uint) > transactionID: 0x000000000003B59E (64 bit uint) > attributeID: 0x0011 (16 bit uint) > rsv0: 0x0000 (16 bit uint) > attributeModifier: 0x00000000 (32 bit uint) > } > RMPPVersion: 0x01 (8 bit uint) > RMPPType: 0x01 (8 bit uint) > RRespTime: 0x00 (5 bit uint) > RMPPFlags: 0x3 (3 bit uint) > RMPPStatus: 0x00 (8 bit uint) > data1: 0x00000001 (32 bit uint) > data2: 0x000002F0 (32 bit uint) > } > SMKey: 0x0000000000000000 (64 bit uint) > attributeOffset: 0x000E (16 bit uint) > rsv0: 0x0000 (16 bit uint) > componentMask: 0x0000000000000000 (64 bit uint) > } This is a first DATA packet (with more to come). 
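The field values in these dumps are easier to check with a small C sketch that extracts the RMPP fields from a raw MAD. The byte offsets come from the field widths printed in the dumps: a 24-byte MAD base header followed by the RMPP fields, all big-endian. The sketch assumes that RRespTime occupies the upper five bits of byte 26 and RMPPFlags the lower three; verify this against IBA section 13.6.2.1. `rmpp_decode` and `struct rmpp_info` are hypothetical names, not part of any OpenIB API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RMPP_TYPE_DATA   1
#define RMPP_TYPE_ACK    2
#define RMPP_TYPE_STOP   3
#define RMPP_TYPE_ABORT  4

#define RMPP_FLAG_ACTIVE 0x1
#define RMPP_FLAG_FIRST  0x2
#define RMPP_FLAG_LAST   0x4

struct rmpp_info {
	uint8_t  class_version;	/* MAD base header byte 2 */
	uint8_t  method;	/* byte 3; 0x80 is the response bit */
	uint8_t  version;	/* RMPPVersion, byte 24 */
	uint8_t  type;		/* RMPPType, byte 25 */
	uint8_t  resp_time;	/* RRespTime, upper 5 bits of byte 26 */
	uint8_t  flags;		/* RMPPFlags, lower 3 bits of byte 26 */
	uint8_t  status;	/* RMPPStatus, byte 27 */
	uint32_t data1;		/* DATA and ACK: SegmentNumber */
	uint32_t data2;		/* DATA: PayloadLength (first/last); ACK: NewWindowLast */
};

static uint32_t get_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Decode the RMPP fields of a raw MAD. Returns 0, or -1 if too short. */
static int rmpp_decode(const uint8_t *mad, size_t len, struct rmpp_info *ri)
{
	assert(mad != NULL && ri != NULL);
	if (len < 36)		/* 24-byte base header + 12 RMPP bytes */
		return -1;
	ri->class_version = mad[2];
	ri->method = mad[3];
	ri->version = mad[24];
	ri->type = mad[25];
	ri->resp_time = mad[26] >> 3;
	ri->flags = mad[26] & 0x7;
	ri->status = mad[27];
	ri->data1 = get_be32(mad + 28);
	ri->data2 = get_be32(mad + 32);
	return 0;
}
```

Applied to the DATA segment quoted above, it yields type 1 (DATA), flags 0x3 (Active and First), data1 = segment 1, and data2 = PayloadLength 0x2F0 (752 bytes). The ACK dumps decode to type 2 with data1 = 1 and data2 = 2, which is the ACK for segment 1 with a new window last of 2.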
> The application acks with a window size of 1: > sending MAD: > unrecognized MAD class attribute: 0x0000 > MAD: struct SAHeader (56 bytes) - > SA Header (section 15.2.1.1) > { > RMPPHeader: struct RMPPHeader (36 bytes) - > RMPP Header Fields (section 13.6.2.1) > { > MADHeader: struct MADHeader (24 bytes) - > MAD Base Header (section 13.4.3) > { > baseVersion: 0x01 (8 bit uint) > mgmtClass: 0x03 (8 bit uint) > classVersion: 0x00 (8 bit uint) > method: 0x00 (8 bit uint) > status: 0x0000 (16 bit uint) > classSpecific: 0x0000 (16 bit uint) > transactionID: 0x000000000003B59E (64 bit uint) > attributeID: 0x0000 (16 bit uint) > rsv0: 0x0000 (16 bit uint) > attributeModifier: 0x00000000 (32 bit uint) > } > RMPPVersion: 0x01 (8 bit uint) > RMPPType: 0x02 (8 bit uint) > RRespTime: 0x14 (5 bit uint) > RMPPFlags: 0x1 (3 bit uint) FYI, this doesn't need setting for ACKs. > RMPPStatus: 0x00 (8 bit uint) > data1: 0x00000001 (32 bit uint) > data2: 0x00000002 (32 bit uint) > } > SMKey: 0x0000000000000000 (64 bit uint) > attributeOffset: 0x0000 (16 bit uint) > rsv0: 0x0000 (16 bit uint) > componentMask: 0x0000000000000000 (64 bit uint) > } That is an ACK for segment 1 (which OpenSM just sent) with a new window last of 2. (So your SA client appears to be other than OpenIB, right ?). The one thing wrong with it which causes it to be ignored on the OpenSM side is that the method is not set properly. It should be SubnAdmGetTable (0x12). Can you change that and retry ?
> OpenSM says that it received the initial packet again: ^^^^^^^^^^^^^^^^^^^^ sends > received MAD: > struct HdrLRH (8 bytes) - > Local Route Header (section 7.7) > { > VL: 0x0 (4 bit uint) > LVer: 0x0 (4 bit uint) > SL: 0x0 (4 bit uint) > rsv0: 0x0 (2 bit uint) > LNH: 0x2 (2 bit uint) > DLID: 0x0001 (16 bit uint) > rsv1: 0x00 (5 bit uint) > pktLen: 0x048 (11 bit uint) > SLID: 0x0002 (16 bit uint) > } > MAD: struct SAHeader (56 bytes) - > SA Header (section 15.2.1.1) > { > RMPPHeader: struct RMPPHeader (36 bytes) - > RMPP Header Fields (section 13.6.2.1) > { > MADHeader: struct MADHeader (24 bytes) - > MAD Base Header (section 13.4.3) > { > baseVersion: 0x01 (8 bit uint) > mgmtClass: 0x03 (8 bit uint) > classVersion: 0x02 (8 bit uint) > method: 0x92 (8 bit uint) > status: 0x0000 (16 bit uint) > classSpecific: 0x0000 (16 bit uint) > transactionID: 0x000000000003B59E (64 bit uint) > attributeID: 0x0011 (16 bit uint) > rsv0: 0x0000 (16 bit uint) > attributeModifier: 0x00000000 (32 bit uint) > } > RMPPVersion: 0x01 (8 bit uint) > RMPPType: 0x01 (8 bit uint) > RRespTime: 0x00 (5 bit uint) > RMPPFlags: 0x3 (3 bit uint) > RMPPStatus: 0x00 (8 bit uint) > data1: 0x00000001 (32 bit uint) > data2: 0x000002F0 (32 bit uint) > } > SMKey: 0x0000000000000000 (64 bit uint) > attributeOffset: 0x000E (16 bit uint) > rsv0: 0x0000 (16 bit uint) > componentMask: 0x0000000000000000 (64 bit uint) > } This is probably a retransmission on timeout, as the OpenSM side has not received a proper ACK. Note that it is the kernel RMPP code which handles most of what is being discussed here.
-- Hal > Our application acks once again: > > sending MAD: > unrecognized MAD class attribute: 0x0000 > MAD: struct SAHeader (56 bytes) - > SA Header (section 15.2.1.1) > { > RMPPHeader: struct RMPPHeader (36 bytes) - > RMPP Header Fields (section 13.6.2.1) > { > MADHeader: struct MADHeader (24 bytes) - > MAD Base Header (section 13.4.3) > { > baseVersion: 0x01 (8 bit uint) > mgmtClass: 0x03 (8 bit uint) > classVersion: 0x00 (8 bit uint) > method: 0x00 (8 bit uint) > status: 0x0000 (16 bit uint) > classSpecific: 0x0000 (16 bit uint) > transactionID: 0x000000000003B59E (64 bit uint) > attributeID: 0x0000 (16 bit uint) > rsv0: 0x0000 (16 bit uint) > attributeModifier: 0x00000000 (32 bit uint) > } > RMPPVersion: 0x01 (8 bit uint) > RMPPType: 0x02 (8 bit uint) > RRespTime: 0x14 (5 bit uint) > RMPPFlags: 0x1 (3 bit uint) > RMPPStatus: 0x00 (8 bit uint) > data1: 0x00000001 (32 bit uint) > data2: 0x00000002 (32 bit uint) > } > SMKey: 0x0000000000000000 (64 bit uint) > attributeOffset: 0x0000 (16 bit uint) > rsv0: 0x0000 (16 bit uint) > componentMask: 0x0000000000000000 (64 bit uint) > } > > The RMPP sequence times out, our application sends another > subnAdmGetTable(SANodeRecord), ad infinitum. > > Cheers :-), > Roel. From halr at voltaire.com Fri Sep 16 03:02:28 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Sep 2005 06:02:28 -0400 Subject: [openib-general] RMPP Message Format Errors In-Reply-To: <43292A6B.7040906@mellanox.co.il> References: <1126711190.4425.439.camel@hal.voltaire.com> <43292A6B.7040906@mellanox.co.il> Message-ID: <1126864484.5425.11385.camel@hal.voltaire.com> On Thu, 2005-09-15 at 04:01, Eitan Zahavi wrote: > Now lets go back to the test: > > I use a machine connected through a single switch (IS3) to itself. > > I use osmtest -f c to get Nodes,Ports and PathRecords from the SM. > > From OpenSM Log file I see: > Sep 15 09:47:37 531029 [8003] -> osm_nr_rcv_process: Returning 3 records. 
> Sep 15 09:47:37 538586 [C004] -> osm_pir_rcv_process: Returning 27 records. > > So we can conclude the following RMPP transactions should be sent: > 1. NodeRec: > attrOffset is 14 and each record size with padding is 112 bytes. > The RMPP with 336-byte data should require 2 segments = ceiling(336/200). > First segment paylen should be 336 + 2 * 20 = 376. > Last segment paylen should be 336 - 200 + 20 = 156. > > 2. PortInfoRecords: > attrOffset is 8 and each record size with padding is 64 bytes. > The RMPP with 1728 = 27 * 64-byte data should require 9 segments = ceiling(1728/200). > First segment paylen should be 1728 + 9 * 20 = 1908. > Last segment paylen should be 1728 - 8*200 + 20 = 148. > > What we see in the attached analyzer capture: > NodeInfoRec > Attr Expected Measured > Num Segments 2 2 > First Paylen 376 376 > Last Paylen 156 156 > > PortInfoRec > Attr Expected Measured > Num Segments 9 9 > First Paylen 1908 1908 > Last Paylen 148 148 > > So the response on the wire is 100% OK. Thanks Sean. > > Now I go to the SA client section: > > From osmtest log I see: > > NodeInfoRec: > Aug 21 14:46:56 [4017F6C0] -> __osmv_send_sa_req: Waiting for async event. > Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: [ > Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: [ > Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x807b8a4, size = 256. > Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquired UMAD 0x807c198, size = 256. > Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: ] > Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: Acquired p_madw = 0x807b898, p_mad = 0x807c1d0, size = 256.
> Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: ] > Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: [ > Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88) > Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: [ > Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: ] > Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: ] > I wonder how come the received MAD is only of 256 bytes. I expected it to be of headers + data = 56 + 336 = 392byte. Isn't the vendor_get above for the SA request being sent ? That's why the buffer allocated here is 256 bytes. The OpenIB umad_receiver is the only one handling the RMPP receive. It allocates the proper size buffer for this. It first tries with a 256 byte buffer and, if a bigger one is needed, the first umad_recv fails with the length of the buffer needed and is then reissued. The length that __osmv_sa_mad_rcv_cb is seeing is wrong. I'm not sure why yet (I am looking into this) but the length will not be what you are expecting due to the following: Buffers supplied to and from RMPP are assembled and contain 1 SA header's worth and the data (so there is space for only 1 RMPP header there). -- Hal From halr at voltaire.com Fri Sep 16 06:04:52 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Sep 2005 09:04:52 -0400 Subject: [openib-general] RMPP Message Format Errors In-Reply-To: <1126864484.5425.11385.camel@hal.voltaire.com> References: <1126711190.4425.439.camel@hal.voltaire.com> <43292A6B.7040906@mellanox.co.il> <1126864484.5425.11385.camel@hal.voltaire.com> Message-ID: <1126875891.5425.13400.camel@hal.voltaire.com> Hi Eitan, On Fri, 2005-09-16 at 06:02, Hal Rosenstock wrote: > On Thu, 2005-09-15 at 04:01, Eitan Zahavi wrote: > > Now lets go back to the test: > > > > I use a machine connected through a single switch (IS3) to itself. > > > > I use osmtest -f c to get Nodes,Ports and PathRecords from the SM.
> > Now I go to the SA client section: > > > > From osmtest log I see: > > > > NodeInfoRec: > > Aug 21 14:46:56 [4017F6C0] -> __osmv_send_sa_req: Waiting for async event. > > Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: [ > > Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: [ > > Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x807b8a4, size = 256. > > Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquired UMAD 0x807c198, size = 256. > > Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: ] > > Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: Acquired p_madw = 0x807b898, p_mad = 0x807c1d0, size = 256. > > Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: ] > > Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: [ > > Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 / 112 (88) > > Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: [ > > Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: ] > > Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: ] > > I wonder how come the received MAD is only of 256 bytes. I expected it to be of headers + data = 56 + 336 = 392byte. I see the problem: The 200 byte response is a short RMPP packet and the length is not being properly handled in the OpenIB OpenSM vendor layer. Patch for this shortly. BTW, are you running with OpenSM from osm-1.8.0-merge or the trunk for this ? 
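Eitan's expected values follow from two rules that his message uses. First, each 256-byte segment has room for 200 bytes of records after the 56-byte SA header. Second, every segment adds the 20 bytes of SA-specific header to PayloadLength. A small C sketch of that arithmetic follows. The helper names are hypothetical, and the 8-byte unit for AttributeOffset is inferred from his examples (14 gives 112 bytes, 8 gives 64 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Sizes as used in this thread: a 256-byte MAD carries a 56-byte SA header
 * (24 base + 12 RMPP + 20 SA-specific), leaving 200 bytes of records. */
#define SA_MAD_SIZE     256
#define SA_HEADER_SIZE  56
#define SA_CLASS_HDR    20
#define SA_SEG_DATA     (SA_MAD_SIZE - SA_HEADER_SIZE)	/* 200 */

struct sa_rmpp_plan {
	uint32_t segments;
	uint32_t first_paylen;	/* PayloadLength in segment 1 */
	uint32_t last_paylen;	/* PayloadLength in the last segment */
};

/* Total record bytes; AttributeOffset counts 8-byte words per record. */
static uint32_t sa_table_bytes(uint32_t records, uint16_t attr_offset)
{
	assert(attr_offset > 0);
	return records * attr_offset * 8u;
}

/* Segment count and PayloadLength values for a response of data_len bytes. */
static struct sa_rmpp_plan sa_rmpp_plan_for(uint32_t data_len)
{
	struct sa_rmpp_plan p;

	p.segments = data_len ? (data_len + SA_SEG_DATA - 1) / SA_SEG_DATA : 1;
	p.first_paylen = data_len + p.segments * SA_CLASS_HDR;
	p.last_paylen = data_len - (p.segments - 1) * SA_SEG_DATA + SA_CLASS_HDR;
	return p;
}
```

For 3 NodeRecords (336 bytes) the sketch gives 2 segments, 376 and 156. For 27 PortInfoRecords (1728 bytes) it gives 9 segments, 1908 and 148, which matches the analyzer capture. The first-segment PayloadLength of 0x2F0 (752) in Roel's dump is consistent with 672 bytes of data, that is, six 112-byte NodeRecords in four segments.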
-- Hal From halr at voltaire.com Fri Sep 16 06:08:42 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Sep 2005 09:08:42 -0400 Subject: [openib-general] [PATCH] osm_vendor_ibumad.c::umad_receiver: Fix length of short RMPP packets Message-ID: <1126876121.5425.13438.camel@hal.voltaire.com> osm_vendor_ibumad.c::umad_receiver: Fix length of short RMPP packets Signed-off-by: Hal Rosenstock Index: osm_vendor_ibumad.c =================================================================== --- osm_vendor_ibumad.c (revision 3457) +++ osm_vendor_ibumad.c (working copy) @@ -287,6 +287,10 @@ umad_receiver(void *p_ptr) continue; } + /* Need to fix up MAD size if short RMPP packet */ + if (length < MAD_BLOCK_SIZE) + madw_p->mad_size = length; + /* * Avoid copying by swapping mad buf pointers. * Do not use umad after this line. From ftillier at silverstorm.com Fri Sep 16 10:15:14 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Fri, 16 Sep 2005 10:15:14 -0700 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <432A5032.7050303@mellanox.co.il> Message-ID: <006501c5bae2$34f6c550$9e5aa8c0@infiniconsys.com> > From: Eitan Zahavi [mailto:eitan at mellanox.co.il] > Sent: Thursday, September 15, 2005 9:55 PM > > Roland Dreier wrote: > > The kernel will need to have the HCA port active with the mthca driver > > running before it can mount root and get to /etc/sysconfig/network or > > wherever the hostname is set. > > If we limit our solution to the common case: > On a regular non disk-less machine is it possible to have the node description > be set before the QP0 is physically UP? I don't know where the machine name is set in Linux. In Windows, it is stored in the registry, and loaded when the access layer first loads (before any QPs are allocated). 
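Eitan asks about the common case where the hostname is already known before QP0 becomes active. In that case, filling in the NodeDescription is mostly string handling. The sketch below assumes the 64-byte NodeDescription field and a purely hypothetical `<hostname> HCA-<n>` format. How the resulting string reaches the SMA is outside its scope, and `format_node_desc`/`node_desc_from_hostname` are illustrative names only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NODE_DESC_LEN 64	/* NodeDescription attribute size in bytes */

/* Format "<host> HCA-<n>" into a zero-padded 64-byte field. Long names are
 * truncated; a completely full field carries no terminating NUL. */
static void format_node_desc(char desc[NODE_DESC_LEN], const char *host, int hca)
{
	char tmp[NODE_DESC_LEN + 1];

	memset(desc, 0, NODE_DESC_LEN);
	snprintf(tmp, sizeof tmp, "%s HCA-%d", host, hca);
	memcpy(desc, tmp, strlen(tmp));	/* at most NODE_DESC_LEN bytes */
}

/* Build the description from the running system's hostname. */
static int node_desc_from_hostname(char desc[NODE_DESC_LEN], int hca)
{
	char host[256];

	if (gethostname(host, sizeof host) != 0)
		return -1;
	host[sizeof host - 1] = '\0';	/* may be unterminated if truncated */
	format_node_desc(desc, host, hca);
	return 0;
}
```

On a diskless node booting over IB, this runs too late for the first SM sweep, so it does not address the boot-time case Roland describes.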
- Fab From halr at voltaire.com Fri Sep 16 11:14:28 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Sep 2005 14:14:28 -0400 Subject: [openib-general] IPoIB SA Multicast Reregistration Message-ID: <1126894467.5425.17169.camel@hal.voltaire.com> Hi Roland, The following is what I am seeing: SM brings the subnet up. IPoIB does its multicast registration. That all works fine. Sometime later, the SM does a SM Set of PortInfo which causes IPoIB to first deregister all its multicasts and then register them. What I see is the following: If the SM does not see or respond to these requests, these requests do not seem to time out and be rerequested. Any idea what is going on ? On some rare occasions, I do see one rerequest 4 minutes later but most of the time there are no rerequests. -- Hal
From roel at yottayotta.com Fri Sep 16 12:27:10 2005 From: roel at yottayotta.com (Roel van der Goot) Date: Fri, 16 Sep 2005 13:27:10 -0600 (MDT) Subject: [openib-general] Re: OpenSM problem?! In-Reply-To: <1126864422.5425.11372.camel@hal.voltaire.com> References: <1126864422.5425.11372.camel@hal.voltaire.com> Message-ID: Hal Rosenstock wrote: > Hi Roel, Hi Hal, > (So your SA client appears to be other than OpenIB, right ?). Very astute! ;-) > The one thing wrong with it which causes it to be ignored on the OpenSM side is that the method is not set properly. It should be SubadmGetTable. Can you change that and retry ? I added both classVersion and method and things are fine. Thank you for your help, Hal. Cheers :-), Roel.
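For reference, the two fields discussed in the exchange above (ClassVersion and Method) sit in the 24-byte MAD common header. The following is a minimal sketch, not OpenSM or libibumad code, of filling that header for an SA GetTable request for NodeRecords. It assumes the byte offsets and values of the InfiniBand Architecture Specification, Vol. 1 (MgmtClass 0x03 for SA, ClassVersion 2, SubnAdmGetTable method 0x12, NodeRecord AttributeID 0x0011); the names `mad_common_hdr`, `put_be` and `build_sa_get_table` are hypothetical. The RMPP and SA headers that follow the common header are not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values assumed from the InfiniBand Architecture Specification, Vol. 1 */
#define MAD_BASE_VERSION     0x01
#define MGMT_CLASS_SUBN_ADM  0x03   /* Subnet Administration (SA) */
#define SA_CLASS_VERSION     0x02
#define SA_METHOD_GET_TABLE  0x12   /* SubnAdmGetTable */
#define SA_ATTR_NODE_RECORD  0x0011 /* NodeRecord */

/* Hypothetical type: the MAD common header kept as raw bytes, so there is
 * no struct padding and every multi-byte field is stored big-endian. */
typedef struct {
    uint8_t b[24];
} mad_common_hdr;

/* Store the low n bytes of v at p, most significant byte first. */
static void put_be(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * (n - 1 - i)));
}

/* Fill in the common header of an SA GetTable request for NodeRecords. */
static void build_sa_get_table(mad_common_hdr *h, uint64_t tid)
{
    assert(h != NULL);
    memset(h->b, 0, sizeof h->b);               /* Status, ClassSpecific, Reserved = 0 */
    h->b[0] = MAD_BASE_VERSION;                 /* BaseVersion */
    h->b[1] = MGMT_CLASS_SUBN_ADM;              /* MgmtClass */
    h->b[2] = SA_CLASS_VERSION;                 /* ClassVersion */
    h->b[3] = SA_METHOD_GET_TABLE;              /* R bit (0x80) clear: a request */
    put_be(&h->b[8], tid, 8);                   /* TransactionID, bytes 8-15 */
    put_be(&h->b[16], SA_ATTR_NODE_RECORD, 2);  /* AttributeID, bytes 16-17 */
    put_be(&h->b[20], 0, 4);                    /* AttributeModifier, bytes 20-23 */
}
```

Keeping the header as raw bytes with explicit big-endian stores avoids depending on host byte order or compiler struct layout; a caller would still append the RMPP header, SA header and record data before handing the MAD to the transport.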
From eitan at mellanox.co.il Fri Sep 16 12:24:53 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 16 Sep 2005 22:24:53 +0300 Subject: [openib-general] RMPP Message Format Errors In-Reply-To: <1126875891.5425.13400.camel@hal.voltaire.com> References: <1126875891.5425.13400.camel@hal.voltaire.com> Message-ID: <432B1C05.5020309@mellanox.co.il> Hal Rosenstock wrote: > Hi Eitan, > > On Fri, 2005-09-16 at 06:02, Hal Rosenstock wrote: > >>On Thu, 2005-09-15 at 04:01, Eitan Zahavi wrote: >> >>>Now lets go back to the test: >>> >>>I use a machine connected through a single switch (IS3) to itself. >>> >>>I use osmtest -f c to get Nodes,Ports and PathRecords from the SM. > > >>>Now I go to the SA client section: >>> >>> From osmtest log I see: >>> >>>NodeInfoRec: >>>Aug 21 14:46:56 [4017F6C0] -> __osmv_send_sa_req: Waiting for async > > event. > >>>Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: [ >>>Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: [ >>>Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquiring UMAD for > > p_madw = 0x807b8a4, size = 256. > >>>Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: Acquired UMAD > > 0x807c198, size = 256. > >>>Aug 21 14:46:56 [40D87BB0] -> osm_vendor_get: ] >>>Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: Acquired p_madw = > > 0x807b898, p_mad = 0x807c1d0, size = 256. > >>>Aug 21 14:46:56 [40D87BB0] -> osm_mad_pool_get: ] >>>Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: [ >>>Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: Count = 1 = 200 > > / 112 (88) > >>>Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: [ >>>Aug 21 14:46:56 [40D87BB0] -> osmtest_query_res_cb: ] >>>Aug 21 14:46:56 [40D87BB0] -> __osmv_sa_mad_rcv_cb: ] >>>I wonder how come the received MAD is only of 256 bytes. I expected > > it to be of headers + data = 56 + 336 = 392byte. > > I see the problem: The 200 byte response is a short RMPP packet and the > length is not being properly handled in the OpenIB OpenSM vendor layer. > Patch for this shortly. 
But I see the error also on multi segment RMPP transaction... The __osmv_sa_mad_rcv_cb reports it got 256 byte mad. > > BTW, are you running with OpenSM from osm-1.8.0-merge or the trunk for > this ? No I'm using the trunk! I was happy to delete my copy of the merge branch. > > -- Hal > From halr at voltaire.com Fri Sep 16 12:40:40 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Sep 2005 15:40:40 -0400 Subject: [openib-general] Re: OpenSM problem?! In-Reply-To: References: <1126864422.5425.11372.camel@hal.voltaire.com> Message-ID: <1126899639.5425.18271.camel@hal.voltaire.com> Hi Roel, On Fri, 2005-09-16 at 15:27, Roel van der Goot wrote: > I added both classVersion and method and things are fine. Yes, I missed setting the class version too. > Thank you for your help, Hal. Glad it is working now :-) -- Hal From halr at voltaire.com Fri Sep 16 12:44:16 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 16 Sep 2005 15:44:16 -0400 Subject: [openib-general] RMPP Message Format Errors In-Reply-To: <432B1C05.5020309@mellanox.co.il> References: <1126875891.5425.13400.camel@hal.voltaire.com> <432B1C05.5020309@mellanox.co.il> Message-ID: <1126899855.5425.18321.camel@hal.voltaire.com> On Fri, 2005-09-16 at 15:24, Eitan Zahavi wrote: > > I see the problem: The 200 byte response is a short RMPP packet and the > > length is not being properly handled in the OpenIB OpenSM vendor layer. > > Patch for this shortly. > But I see the error also on multi segment RMPP transaction... Did you try it with the patch (or the svn version 3458 or later) ? The longer packets looked right to me. Maybe I missed something. Can you send me the details on this ? > The __osmv_sa_mad_rcv_cb reports it got 256 byte mad. Is that RMPP or a non RMPP response ? > > BTW, are you running with OpenSM from osm-1.8.0-merge or the trunk for > > this ? > No I'm using the trunk! I was happy to delete my copy of the merge branch. Glad to hear it. 
-- Hal From xma at us.ibm.com Fri Sep 16 13:29:26 2005 From: xma at us.ibm.com (Shirley Ma) Date: Fri, 16 Sep 2005 13:29:26 -0700 Subject: [openib-general] Re: Mellanox device in INIT state In-Reply-To: <20050914231747.GK31182@esmail.cup.hp.com> Message-ID: I might have hit this problem -- the netdev reference counting problem with ib_at that Roland pointed out a week ago. The difference is that I tried to remove ib_mthca, not ib_ipoib. The process hung in the kernel and couldn't recover. The counter goes to -1 if the interface is brought down first. With both ib_at and ib_uat loaded, removing the ipoib module without bringing the interface down leaves the reference count at 2; with the interface down, the reference count is -3. ib_devs_changed() doesn't handle these events correctly. The current workaround is to always remove ib_at/ib_uat before removing ib_ipoib/ib_mthca. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 Grant Grundler 09/14/2005 04:17 PM To "Michael S. Tsirkin" cc Grant Grundler , Shirley Ma/Beaverton/IBM at IBMUS, openib-general at openib.org Subject Re: Mellanox device in INIT state On Wed, Sep 14, 2005 at 11:26:10AM +0300, Michael S. Tsirkin wrote: > Seems to be a previous memory corruption that is biting us now. > Looks like prot->rsk_prot isn't NULL, and prot->name seems to > point to zeroed memory. Grant, is this reproducible? Yes - I think so. At least SDP is generating a segfault/stack trace to the console when it's loaded. Now that I'm recording the failures, I'm not certain the previous two failures were the same. > If so, could you please try running with the following patch, > and see what it prints?
yup > MST > > Index: linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_inet.c > =================================================================== > --- linux-2.6.13.orig/drivers/infiniband/ulp/sdp/sdp_inet.c 2005-09-11 12:36:48.000000000 +0300 > +++ linux-2.6.13/drivers/infiniband/ulp/sdp/sdp_inet.c 2005-09-14 13:14:35.000000000 +0300 > @@ -1321,6 +1321,11 @@ static int __init sdp_init(void) > > sdp_dbg_init("SDP module load."); > > + printk("sdp_sk_proto.name = %s\n", sdp_sk_proto.name); > + printk("sdp_sk_proto.obj_size = %lld\n", (long long)sdp_sk_proto.obj_size); > + printk("sdp_init in_interrupt = %d\n", in_interrupt()); > + printk("sdp_init prot->rsk_prot = %p\n", prot->rsk_prot); The last printk failed to compile: vers/infiniband/ulp/sdp/sdp_inet.c:1327: error: 'proto' undeclared (first use in this function) I assume that was intended to be "sdp_sk_proto.rsk_prot". Output follows - but with a different failure this time. Something weird is definitely going on. gsyprf3:/usr/src/linux-2.6.13# reload_ib + IPoIB=51 + ifconfig ib0 down ib0: ERROR while getting interface flags: No such device + ifconfig ib1 down ib1: ERROR while getting interface flags: No such device + rmmod ib_ipoib ib_uverbs ib_sdp ib_cm ib_sa ib_mthca ib_mad ib_core ERROR: Module ib_ipoib does not exist in /proc/modules ERROR: Module ib_uverbs does not exist in /proc/modules ERROR: Module ib_sdp does not exist in /proc/modules ERROR: Module ib_cm does not exist in /proc/modules ERROR: Module ib_sa does not exist in /proc/modules ERROR: Module ib_mthca does not exist in /proc/modules ERROR: Module ib_mad does not exist in /proc/modules ERROR: Module ib_core does not exist in /proc/modules + modprobe ib_mthca msi_x=1 ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005) ib_mthca: Initializing ((¥) GSI 60 (level, low) -> CPU 0 (0x0000) vector 69 ACPI: PCI Interrupt 0000:81:00.0[A] -> GSI 60 (level, low) -> IRQ 69 (¥: Missing DCS, aborting.
ACPI: PCI interrupt for device 0000:81:00.0 disabled GSI 60 (level, low) -> CPU 0 (0x0000) vector 69 unregistered + modprobe ib_ipoib + modprobe ib_sdp sdp_sk_proto.name = SDP sdp_sk_proto.obj_size = 1744 sdp_init in_interrupt = 0 sdp_init prot->rsk_prot = 0000000000000000 Uninitialised timer! This is just a warning. Your computer is OK function=0xa0000001008ac990, data=0xa00000020021b600 Call Trace: [] show_stack+0x80/0xa0 sp=e000004041267c50 bsp=e000004041260fe0 [] dump_stack+0x30/0x60 sp=e000004041267e20 bsp=e000004041260fc8 [] check_timer_failed+0xe0/0x120 sp=e000004041267e20 bsp=e000004041260fa8 [] __mod_timer+0x60/0x200 sp=e000004041267e20 bsp=e000004041260f68 [] queue_delayed_work+0x110/0x1c0 sp=e000004041267e30 bsp=e000004041260f38 [] sdp_link_addr_init+0x1a0/0x3e0 [ib_sdp] sp=e000004041267e30 bsp=e000004041260f10 [] sdp_init+0x160/0x900 [ib_sdp] sp=e000004041267e30 bsp=e000004041260ee8 [] sys_init_module+0x2e0/0x680 sp=e000004041267e30 bsp=e000004041260e60 [] ia64_ret_from_syscall+0x0/0x20 sp=e000004041267e30 bsp=e000004041260e60 [] __kernel_syscall_via_break+0x0/0x20 sp=e000004041268000 bsp=e000004041260e60 [ console hangs ] I can't abort/interrupt the modprobe command and it's not segfaulting this time. "ps -ef" shows (among other things): grundler at gsyprf3:~$ ps -ef UID PID PPID C STIME TTY TIME CMD root 1 0 0 15:32 ? 00:00:04 init [2] ... root 3972 2250 0 15:58 ttyS3 00:00:00 /bin/sh -x /usr/local/bin/reload root 3998 9 0 15:58 ? 00:00:00 [ipoib] root 3999 3972 99 15:58 ttyS3 00:08:30 modprobe ib_sdp root 4003 9 0 15:58 ? 00:00:00 [ib_cm/0] root 4004 9 0 15:58 ? 00:00:00 [ib_cm/1] root 4008 9 0 15:58 ? 00:00:00 [sdp_wq/0] root 4009 9 0 15:58 ? 00:00:00 [sdp_wq/1] ... grundler at gsyprf3:~$ ps aux USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 1 0.2 0.0 3440 1328 ? S 15:32 0:04 init [2] ... root 3972 0.0 0.1 5584 2624 ttyS3 S+ 15:58 0:00 /bin/sh -x /usr root 3998 0.0 0.0 0 0 ? 
S< 15:58 0:00 [ipoib] root 3999 99.9 0.2 6624 4592 ttyS3 R+ 15:58 9:50 modprobe ib_sdp root 4003 0.0 0.0 0 0 ? S< 15:58 0:00 [ib_cm/0] root 4004 0.0 0.0 0 0 ? S< 15:58 0:00 [ib_cm/1] root 4008 0.0 0.0 0 0 ? S< 15:58 0:00 [sdp_wq/0] root 4009 0.0 0.0 0 0 ? S< 15:58 0:00 [sdp_wq/1] ... "kill -9 3999" didn't have the intended effect either. I'll rebuild with SDP_DEBUG options and see if that changes it yet again. grant From mshefty at ichips.intel.com Fri Sep 16 14:27:40 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 16 Sep 2005 14:27:40 -0700 Subject: [openib-general] [RFC] send side QP redirection Message-ID: <432B38CC.50106@ichips.intel.com> I'd like to get feedback about a possible implementation for requester QP redirection based on the APIs given below. Specifically, I'm referring to GSI redirection (spec 13.5.2) and port and CM redirection (REJ code 24). The basic proposal is to combine QP redirection as an extension to an address handle cache. Remote agents are identified by management class, LID, and GID. struct ib_mad_av { struct ib_ah *ah; u32 remote_qpn; u32 remote_qkey; u16 pkey_index; }; /* * Insert a new destination into the map and create a new address * vector to it. If the destination already exists, simply return * the current address vector. */ struct ib_mad_av* ib_insert_mad_dest(struct ib_mad_agent*, struct ib_ah_attr*, mgmt_class, lid, gid, pkey); struct ib_mad_av* ib_get_mad_av(struct ib_mad_agent*, mgmt_class, lid, gid); /* * TBD: need to determine when to remove a destination. Can remove * always if references go to 0. Can add a delay before removal. Can * maintain destinations that have been redirected. ?
*/ void ib_free_mad_av(struct ib_mad_av*); ib_redirect_mads(struct ib_mad_av*, struct ib_mad_recv_wc*, struct ib_class_port_info*); /* GID redirection with unknown LID would be deferred */ int ib_get_mad_redirect_path(...); Questions or thoughts? - Sean From mshefty at ichips.intel.com Fri Sep 16 17:11:10 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 16 Sep 2005 17:11:10 -0700 Subject: [openib-general] netdev reference counting problem with ib_at In-Reply-To: <52hdctg7jv.fsf@cisco.com> References: <52hdctg7jv.fsf@cisco.com> Message-ID: <432B5F1E.7090909@ichips.intel.com> Roland Dreier wrote: > ib_at needs to be reworked so that it doesn't keep perpetual > references to netdevs. I continue to hit this same issue, so I've started looking at the ib_at code. Ib_at accesses struct ipoib_dev_priv to get information about the related port that IPoIB is using. Is there some other way for AT to get to the same information? It seems wrong for AT to poke into the priv data of a net_device. Should IPoIB expose a function that AT can call to map IP addresses (or net_device) to IB ports? How do we want to handle this long term? - Sean From rolandd at cisco.com Fri Sep 16 17:15:22 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Sep 2005 17:15:22 -0700 Subject: [openib-general] netdev reference counting problem with ib_at In-Reply-To: <432B5F1E.7090909@ichips.intel.com> (Sean Hefty's message of "Fri, 16 Sep 2005 17:11:10 -0700") References: <52hdctg7jv.fsf@cisco.com> <432B5F1E.7090909@ichips.intel.com> Message-ID: <528xxwk0rp.fsf@cisco.com> Sean> I continue to hit this same issue, so I've started looking Sean> at the ib_at code. Ib_at accesses struct ipoib_dev_priv to Sean> get information about the related port that IPoIB is using. Sean> Is there some other way for AT to get to the same Sean> information? It seems wrong for AT to poke into the priv Sean> data of a net_device. 
Should IPoIB expose a function that AT Sean> can call to map IP addresses (or net_device) to IB ports? Sean> How do we want to handle this long term? It probably makes sense to add an ib_ptr (or rdma_ptr) to struct net_device (along with all the other ones like ip_ptr, dn_ptr, ax25_ptr, etc). - R. From arlin.r.davis at intel.com Fri Sep 16 17:31:00 2005 From: arlin.r.davis at intel.com (Arlin Davis) Date: Fri, 16 Sep 2005 17:31:00 -0700 Subject: [openib-general] [PATCH] uDAPL, support for ib_cm_init_qp_attr and new cm event model Message-ID: James, Here are some changes to support ib_cm_init_qp_attr() and the cm event processing on a per device basis. Also, added copyright credits for kDAPL cm work that was used in uDAPL. Attachment included. Thanks, -arlin Signed-off by: Arlin Davis Index: dapl/openib/TODO =================================================================== --- dapl/openib/TODO (revision 3459) +++ dapl/openib/TODO (working copy) @@ -7,8 +7,6 @@ DAPL: - reinit EP needs a QP timewait completion notification - direct cq_wait_object when multi-CQ verbs event support arrives -- async event support -- add support for ib_cm_init_qp_attr - shared receive queue support Under discussion: Index: dapl/openib/dapl_ib_util.c =================================================================== --- dapl/openib/dapl_ib_util.c (revision 3459) +++ dapl/openib/dapl_ib_util.c (working copy) @@ -56,6 +56,7 @@ #include #include #include +#include int g_dapl_loopback_connection = 0; int g_ib_destroy = 0; @@ -85,26 +86,49 @@ at_rec.retries = 0; /* call with async_comp until the sync version works */ - status = ib_at_ips_by_gid(&hca_ptr->ib_trans.gid, &ipv4_addr->sin_addr.s_addr, 1, + status = ib_at_ips_by_gid(&hca_ptr->ib_trans.gid, + &ipv4_addr->sin_addr.s_addr, 1, &at_comp, &at_rec.req_id); if (status < 0) { dapl_dbg_log (DAPL_DBG_TYPE_ERR, - " get_hca_addr: ERR ips_by_gid %d %s \n", + " ips_by_gid: ERR ips_by_gid %d %s \n", status, strerror(errno)); return 1; } 
dapl_dbg_log (DAPL_DBG_TYPE_UTIL, - " get_hca_addr: ips_by_gid ret %d at_rec %p -> id %lld\n", + " ips_by_gid: RET %d at_rec %p -> id %lld\n", status, &at_rec, at_rec.req_id ); if (status > 0) { dapli_ip_comp_handler(at_rec.req_id, (void*)&at_rec, status); } else { - dat_status = dapl_os_wait_object_wait(&hca_ptr->ib_trans.wait_object,500000); - if (dat_status != DAT_SUCCESS) - ib_at_cancel(at_rec.req_id); + /* limit the resolution and cancel times */ + dat_status = dapl_os_wait_object_wait( + &hca_ptr->ib_trans.wait_object, + 500000); + if (dat_status != DAT_SUCCESS) { + dapl_dbg_log( + DAPL_DBG_TYPE_UTIL, + " ips_by_gid: REQ TIMEOUT, cancel %lld\n", + at_rec.req_id); + + /* + * FIX: indeterministic + * AT may or may not provide -EINTR event + */ + ib_at_cancel(at_rec.req_id); + + if (dapl_os_wait_object_wait( + &hca_ptr->ib_trans.wait_object, + 500000) != DAT_SUCCESS) + { + dapl_dbg_log(DAPL_DBG_TYPE_ERR, + " ips_by_gid: cancel %lld failed\n", + at_rec.req_id); + } + } } if (!ipv4_addr->sin_addr.s_addr) @@ -130,6 +154,7 @@ */ int32_t dapls_ib_init (void) { + long opts; dapl_dbg_log (DAPL_DBG_TYPE_UTIL, " dapl_ib_init: \n" ); /* initialize hca_list lock */ @@ -142,6 +167,11 @@ if (pipe(g_ib_pipe)) return 1; + /* set AT fd to non-blocking */ + opts = fcntl(ib_at_get_fd(), F_GETFL); + if (fcntl(ib_at_get_fd(), F_SETFL, opts | O_NONBLOCK) < 0) + return 1; + if (dapli_ib_thread_init()) return 1; @@ -177,6 +207,8 @@ IN DAPL_HCA *hca_ptr) { struct dlist *dev_list; + long opts; + int i; dapl_dbg_log (DAPL_DBG_TYPE_UTIL, " open_hca: %s - %p\n", hca_name, hca_ptr ); @@ -227,10 +259,10 @@ (unsigned long long)bswap_64(hca_ptr->ib_trans.gid.global.subnet_prefix), (unsigned long long)bswap_64(hca_ptr->ib_trans.gid.global.interface_id) ); - /* get the IP address of the device */ + /* get the IP address of the device using GID */ if (dapli_get_hca_addr(hca_ptr)) { dapl_dbg_log (DAPL_DBG_TYPE_ERR, - " open_hca: IB get ADDR failed for %s\n", + " open_hca: ERR ib_at_ips_by_gid for 
%s\n", ibv_get_device_name(hca_ptr->ib_trans.ib_dev) ); goto bail; } @@ -238,6 +270,34 @@ /* initialize hca wait object for uAT event */ dapl_os_wait_object_init(&hca_ptr->ib_trans.wait_object); + /* set event FD's to non-blocking */ + opts = fcntl(hca_ptr->ib_hca_handle->async_fd, F_GETFL); /* uASYNC */ + if (opts < 0 || fcntl(hca_ptr->ib_hca_handle->async_fd, + F_SETFL, opts | O_NONBLOCK) < 0) { + dapl_dbg_log (DAPL_DBG_TYPE_ERR, + " open_hca: ERR with async FD\n" ); + goto bail; + } + for (i=0;i<hca_ptr->ib_hca_handle->num_comp;i++) { /* uCQ */ + opts = fcntl(hca_ptr->ib_hca_handle->cq_fd[i], F_GETFL); + if (opts < 0 || fcntl(hca_ptr->ib_hca_handle->cq_fd[i], + F_SETFL, opts | O_NONBLOCK) < 0) { + dapl_dbg_log(DAPL_DBG_TYPE_ERR, + " open_hca: ERR with CQ FD\n"); + goto bail; + } + } + + /* Get CM device handle for events, and set to non-blocking */ + hca_ptr->ib_trans.ib_cm = ib_cm_get_device(hca_ptr->ib_hca_handle); + opts = fcntl(hca_ptr->ib_trans.ib_cm->fd, F_GETFL); /* uCM */ + if (opts < 0 || fcntl(hca_ptr->ib_trans.ib_cm->fd, + F_SETFL, opts | O_NONBLOCK) < 0) { + dapl_dbg_log (DAPL_DBG_TYPE_ERR, + " open_hca: ERR with CM FD\n" ); + goto bail; + } + /* * Put new hca_transport on list for async and CQ event processing * Wakeup work thread to add to polling list @@ -509,20 +569,34 @@ void dapli_ib_thread_destroy(void) { + int retries = 10; dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread_destroy(%d)\n", getpid()); - /* destroy ib_thread, wait for termination */ + /* + * wait for async thread to terminate.
+ * pthread_join would be the correct method + * but some applications have some issues + */ + + /* destroy ib_thread, wait for termination, if not already */ + dapl_os_lock( &g_hca_lock ); g_ib_destroy = 1; write(g_ib_pipe[1], "w", sizeof "w"); - while (g_ib_destroy != 2) { + + while ((g_ib_destroy != 2) && (retries--)) { struct timespec sleep, remain; sleep.tv_sec = 0; - sleep.tv_nsec = 10000000; /* 10 ms */ + sleep.tv_nsec = 20000000; /* 20 ms */ dapl_dbg_log(DAPL_DBG_TYPE_UTIL, - " ib_thread_destroy: waiting for ib_thread\n"); + " ib_thread_destroy: waiting for ib_thread\n"); + write(g_ib_pipe[1], "w", sizeof "w"); + dapl_os_unlock( &g_hca_lock ); nanosleep(&sleep, &remain); + dapl_os_lock( &g_hca_lock ); } + dapl_os_unlock( &g_hca_lock ); + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " ib_thread_destroy(%d) exit\n",getpid()); } @@ -639,30 +713,26 @@ struct _ib_hca_transport *hca; int ret,idx,fds; char rbuf[2]; - + dapl_dbg_log (DAPL_DBG_TYPE_UTIL, - " ib_thread(%d,0x%x): ENTER: pipe %d cm %d at %d\n", + " ib_thread(%d,0x%x): ENTER: pipe %d at %d\n", getpid(), g_ib_thread, - g_ib_pipe[0], ib_cm_get_fd(), - ib_at_get_fd()); + g_ib_pipe[0], ib_at_get_fd()); /* Poll across pipe, CM, AT never changes */ dapl_os_lock( &g_hca_lock ); ufds[0].fd = g_ib_pipe[0]; /* pipe */ ufds[0].events = POLLIN; - ufds[1].fd = ib_cm_get_fd(); /* uCM */ + ufds[1].fd = ib_at_get_fd(); /* uAT */ ufds[1].events = POLLIN; - ufds[2].fd = ib_at_get_fd(); /* uAT */ - ufds[2].events = POLLIN; - + while (!g_ib_destroy) { - /* build ufds after pipe, cm, at events */ + /* build ufds after pipe, at events */ ufds[0].revents = 0; ufds[1].revents = 0; - ufds[2].revents = 0; - idx=2; + idx=1; /* Walk HCA list and setup async and CQ events */ if (!dapl_llist_is_empty(&g_hca_list)) @@ -672,6 +742,10 @@ while(hca) { int i; + ufds[++idx].fd = hca->ib_cm->fd; /* uCM */ + ufds[idx].events = POLLIN; + ufds[idx].revents = 0; + uhca[idx] = hca; ufds[++idx].fd = hca->ib_ctx->async_fd; /* uASYNC */ ufds[idx].events = 
POLLIN; ufds[idx].revents = 0; @@ -699,10 +773,11 @@ continue; } - /* check and process CQ and ASYNC events, each open device */ - for(idx=3;idxid %lld id %lld rec_num %d %x\n", + " ip_comp_handler: rec %p ->id %lld id %lld num %d %x\n", context, at_rec->req_id, req_id, rec_num, ipv4_addr->sin_addr.s_addr); - if (rec_num <= 0) { + if (rec_num <= 0) { struct ib_at_completion at_comp; dapl_dbg_log(DAPL_DBG_TYPE_CM, @@ -91,7 +95,7 @@ ipv4_addr->sin_addr.s_addr = 0; - if (++at_rec->retries > IB_MAX_AT_RETRY) + if ((++at_rec->retries > IB_MAX_AT_RETRY) || (rec_num == -EINTR)) goto bail; at_comp.fn = dapli_ip_comp_handler; @@ -103,9 +107,9 @@ if (status < 0) goto bail; - dapl_dbg_log (DAPL_DBG_TYPE_UTIL, - " ip_comp_handler: NEW ips_by_gid ret %d at_rec %p -> id %lld\n", - status, at_rec, at_rec->req_id ); + dapl_dbg_log(DAPL_DBG_TYPE_UTIL, + " ip_comp_handler: ips_by_gid %d rec %p->id %lld\n", + status, at_rec, at_rec->req_id ); } if (ipv4_addr->sin_addr.s_addr) @@ -114,7 +118,7 @@ return; bail: dapl_dbg_log(DAPL_DBG_TYPE_CM, - " ip_comp_handler: ERR: at_rec %p, req_id %lld rec_num %d\n", + " ip_comp_handler: ERR: at_rec %p, id %lld num %d\n", at_rec, req_id, rec_num); dapl_os_wait_object_wakeup(at_rec->wait_object); @@ -130,23 +134,35 @@ " path_comp_handler: ctxt %p, req_id %lld rec_num %d\n", context, req_id, rec_num); + dapl_dbg_log(DAPL_DBG_TYPE_CM, + " path_comp_handler: SRC GID subnet %016llx id %016llx\n", + (unsigned long long)cpu_to_be64(conn->dapl_rt.sgid.global.subnet_prefix), + (unsigned long long)cpu_to_be64(conn->dapl_rt.sgid.global.interface_id) ); + + dapl_dbg_log(DAPL_DBG_TYPE_CM, + " path_comp_handler: DST GID subnet %016llx id %016llx\n", + (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.subnet_prefix), + (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.interface_id) ); + if (rec_num <= 0) { dapl_dbg_log(DAPL_DBG_TYPE_CM, - " path_comp_handler: resolution err %d retry %d\n", + " path_comp_handler: ERR %d retry %d\n", rec_num, 
conn->retries + 1); if (++conn->retries > IB_MAX_AT_RETRY) { dapl_dbg_log(DAPL_DBG_TYPE_CM, - " path_comp_handler: ep_ptr 0x%p\n",conn->ep); + " path_comp_handler: ERR no PATH (ep=%p)\n", + conn->ep); event = IB_CME_DESTINATION_UNREACHABLE; goto bail; } status = ib_at_paths_by_route(&conn->dapl_rt, 0, &conn->dapl_path, 1, - &conn->dapl_comp, &conn->dapl_comp.req_id); + &conn->dapl_comp, + &conn->dapl_comp.req_id); if (status) { - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " path_by_route: err %d id %lld\n", + dapl_dbg_log(DAPL_DBG_TYPE_ERR, + " path_by_route: retry ERR %d id %lld\n", status, conn->dapl_comp.req_id); event = IB_CME_LOCAL_FAILURE; goto bail; @@ -185,20 +201,9 @@ " rt_comp_handler: conn %p, req_id %lld rec_num %d\n", conn, req_id, rec_num); - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " rt_comp_handler: SRC GID subnet %016llx id %016llx\n", - (unsigned long long)cpu_to_be64(conn->dapl_rt.sgid.global.subnet_prefix), - (unsigned long long)cpu_to_be64(conn->dapl_rt.sgid.global.interface_id) ); - - dapl_dbg_log(DAPL_DBG_TYPE_CM, - " rt_comp_handler: DST GID subnet %016llx id %016llx\n", - (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.subnet_prefix), - (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.interface_id) ); - - if (rec_num <= 0) { dapl_dbg_log(DAPL_DBG_TYPE_ERR, - " dapl_rt_comp_handler: rec %d retry %d\n", + " dapl_rt_comp_handler: ERROR rec %d retry %d\n", rec_num, conn->retries+1 ); if (++conn->retries > IB_MAX_AT_RETRY) { @@ -206,10 +211,11 @@ goto bail; } - status = ib_at_route_by_ip(((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr, - 0, 0, IB_AT_ROUTE_FORCE_ATS, - &conn->dapl_rt, - &conn->dapl_comp,&conn->dapl_comp.req_id); + status = ib_at_route_by_ip( + ((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr, + 0, 0, IB_AT_ROUTE_FORCE_ATS, + &conn->dapl_rt, + &conn->dapl_comp,&conn->dapl_comp.req_id); if (status < 0) { dapl_dbg_log(DAPL_DBG_TYPE_ERR, "dapl_rt_comp_handler: " "ib_at_route_by_ip failed with status %d\n", @@ -223,9 
+229,10 @@ return; } - if (!conn->dapl_rt.dgid.global.subnet_prefix || req_id != conn->dapl_comp.req_id) { + if (!conn->dapl_rt.dgid.global.subnet_prefix || + req_id != conn->dapl_comp.req_id) { dapl_dbg_log(DAPL_DBG_TYPE_ERR, - " dapl_rt_comp_handler: ERROR: unexpected callback req_id=%d(%d)\n", + " dapl_rt_comp_handler: ERROR: cb id=%d(%d)\n", req_id, conn->dapl_comp.req_id ); return; } @@ -234,11 +241,13 @@ conn->dapl_comp.context = conn; conn->retries = 0; status = ib_at_paths_by_route(&conn->dapl_rt, 0, &conn->dapl_path, 1, - &conn->dapl_comp, &conn->dapl_comp.req_id); + &conn->dapl_comp, + &conn->dapl_comp.req_id); if (status) { dapl_dbg_log(DAPL_DBG_TYPE_ERR, "dapl_rt_comp_handler: ib_at_paths_by_route " - "returned %d id %lld\n", status, conn->dapl_comp.req_id); + "returned %d id %lld\n", status, + conn->dapl_comp.req_id); event = IB_CME_LOCAL_FAILURE; goto bail; } @@ -284,19 +293,16 @@ /* move QP state to RTR and RTS */ /* TODO: could use a ib_cm_init_qp_attr() call here */ dapl_dbg_log(DAPL_DBG_TYPE_CM, - " rep_recv: RTR_RTS: cm_id %d r_qp 0x%x r_lid 0x%x r_SID %d\n", + " rep_recv: RTR_RTS: id %d rqp %x rlid %x rSID %d\n", conn->cm_id,event->param.rep_rcvd.remote_qpn, - ntohs(conn->req.primary_path->dlid),conn->service_id ); + ntohs(conn->req.primary_path->dlid),conn->service_id); if ( dapls_modify_qp_state( conn->ep->qp_handle, - IBV_QPS_RTR, - event->param.rep_rcvd.remote_qpn, - ntohs(conn->req.primary_path->dlid), - 1 ) != DAT_SUCCESS ) + IBV_QPS_RTR, conn ) != DAT_SUCCESS ) goto disc; if ( dapls_modify_qp_state( conn->ep->qp_handle, - IBV_QPS_RTS,0,0,0 ) != DAT_SUCCESS) + IBV_QPS_RTS, conn ) != DAT_SUCCESS) goto disc; @@ -356,10 +362,10 @@ sizeof(struct ib_sa_path_rec)); dapl_dbg_log(DAPL_DBG_TYPE_CM, " passive_cb: " - "REQ on HCA %p SP %p SID %d L_ID %d new_id %d p_data %p\n", - new_conn->hca, new_conn->sp, - conn->service_id, conn->cm_id, new_conn->cm_id, - event->private_data ); + "REQ on HCA %p SP %p SID %d LID %d new_id %d pd %p\n", + 
new_conn->hca, new_conn->sp, + conn->service_id, conn->cm_id, new_conn->cm_id, + event->private_data ); } return new_conn; @@ -454,7 +460,8 @@ new_conn = dapli_req_recv(conn,event); if (new_conn) - dapls_cr_callback(new_conn, IB_CME_CONNECTION_REQUEST_PENDING, + dapls_cr_callback(new_conn, + IB_CME_CONNECTION_REQUEST_PENDING, event->private_data, new_conn->sp); break; case IB_CM_REP_ERROR: @@ -468,7 +475,7 @@ case IB_CM_RTU_RECEIVED: /* move QP to RTS state */ if ( dapls_modify_qp_state(conn->ep->qp_handle, - IBV_QPS_RTS,0,0,0 ) != DAT_SUCCESS) { + IBV_QPS_RTS, conn ) != DAT_SUCCESS) { dapls_cr_callback(conn, IB_CME_LOCAL_FAILURE, NULL, conn->sp); } else { @@ -556,7 +563,7 @@ ep_ptr = (DAPL_EP*)ep_handle; qp_ptr = ep_ptr->qp_handle; - dapl_dbg_log(DAPL_DBG_TYPE_CM, " connect: r_SID %d, pdata %p, plen %d\n", + dapl_dbg_log(DAPL_DBG_TYPE_CM, " connect: rSID %d, pdata %p, ln %d\n", r_qual,p_data,p_size); /* Allocate CM and initialize lock */ @@ -573,20 +580,21 @@ conn->ep = ep_ptr; conn->hca = ep_ptr->header.owner_ia->hca_ptr; - status = ib_cm_create_id(conn->hca->ib_hca_handle, &conn->cm_id, conn); + status = ib_cm_create_id(conn->hca->ib_hca_handle, + &conn->cm_id, conn); if (status < 0) { dat_status = dapl_convert_errno(errno,"create_cm_id"); dapl_os_free(conn, sizeof(*conn)); return dat_status; } - conn->ep->cm_handle = conn; + ep_ptr->cm_handle = conn; /* Setup QP/CM parameters */ (void)dapl_os_memzero(&conn->req,sizeof(conn->req)); conn->service_id = r_qual; conn->req.qp_num = ep_ptr->qp_handle->qp_num; conn->req.qp_type = IBV_QPT_RC; - conn->req.starting_psn = 1; + conn->req.starting_psn = ep_ptr->qp_handle->qp_num; conn->req.private_data = p_data; conn->req.private_data_len = p_size; conn->req.peer_to_peer = 0; @@ -607,14 +615,14 @@ status = ib_at_route_by_ip( ((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr, - ((struct sockaddr_in *)&conn->hca->hca_address)->sin_addr.s_addr, - 0, IB_AT_ROUTE_FORCE_ATS, &conn->dapl_rt, &conn->dapl_comp, - 
&conn->dapl_comp.req_id); - - dapl_dbg_log(DAPL_DBG_TYPE_CM, " connect: at_route ret=%d,%s req_id %d GID %016llx %016llx\n", - status, strerror(errno), conn->dapl_comp.req_id, - (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.subnet_prefix), - (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.interface_id) ); + ((struct sockaddr_in *)&conn->hca->hca_address)->sin_addr.s_addr, + 0, 0, &conn->dapl_rt, &conn->dapl_comp, &conn->dapl_comp.req_id); + + dapl_dbg_log(DAPL_DBG_TYPE_CM, + " connect: at_route requested(ret=%d,id=%d): SRC %x DST %x\n", + status, conn->dapl_comp.req_id, + ((struct sockaddr_in *)&conn->hca->hca_address)->sin_addr.s_addr, + ((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr); if (status < 0) { dat_status = dapl_convert_errno(errno,"ib_at_route_by_ip"); @@ -653,7 +661,7 @@ int status; dapl_dbg_log (DAPL_DBG_TYPE_CM, - " disconnect(ep_handle %p, conn %p, cm_id %d flags %x)\n", + " disconnect(ep %p, conn %p, id %d flags %x)\n", ep_ptr,conn, (conn?conn->cm_id:0),close_flags); if (conn == IB_INVALID_HANDLE) @@ -703,7 +711,7 @@ dapls_ib_disconnect(ep_ptr, DAT_CLOSE_ABRUPT_FLAG); if (ep_ptr->qp_handle != IB_INVALID_HANDLE) - dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_ERR, 0,0,0); + dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_ERR, 0 ); } /* @@ -749,8 +757,8 @@ return DAT_INTERNAL_ERROR; } - status = ib_cm_create_id(ia_ptr->hca_ptr->ib_hca_handle, &conn->cm_id, - conn); + status = ib_cm_create_id(ia_ptr->hca_ptr->ib_hca_handle, + &conn->cm_id, conn); if (status < 0) { dat_status = dapl_convert_errno(errno,"create_cm_id"); dapl_os_free(conn, sizeof(*conn)); @@ -758,8 +766,8 @@ } dapl_dbg_log(DAPL_DBG_TYPE_CM, - " setup_listener(ia_ptr %p SID %d sp_ptr %p conn %p cm_id %d)\n", - ia_ptr, ServiceID, sp_ptr, conn, conn->cm_id ); + " setup_listener(ia_ptr %p SID %d sp %p conn %p id %d)\n", + ia_ptr, ServiceID, sp_ptr, conn, conn->cm_id ); sp_ptr->cm_srvc_handle = conn; conn->sp = sp_ptr; @@ -864,7 +872,7 @@ conn = 
cr_ptr->ib_cm_handle; dapl_dbg_log (DAPL_DBG_TYPE_CM, - " accept_connection(cr %p conn %p, cm_id %d, p_data %p, p_sz=%d)\n", + " accept(cr %p conn %p, id %d, p_data %p, p_sz=%d)\n", cr_ptr, conn, conn->cm_id, p_data, p_size ); /* Obtain size of private data structure & contents */ @@ -888,13 +896,9 @@ } } - /* move QP to RTR state, TODO fix port setting */ - /* TODO: could use a ib_cm_init_qp_attr() call here */ + /* move QP to RTR state */ dat_status = dapls_modify_qp_state(ep_ptr->qp_handle, - IBV_QPS_RTR, - conn->req_rcvd.remote_qpn, - ntohs(conn->req_rcvd.primary_path->dlid), - 1 ); + IBV_QPS_RTR, conn); if (dat_status != DAT_SUCCESS ) { dapl_dbg_log(DAPL_DBG_TYPE_ERR, " accept: modify_qp_state failed: %d\n", @@ -910,6 +914,7 @@ passive_params.private_data = p_data; passive_params.private_data_len = p_size; passive_params.qp_num = ep_ptr->qp_handle->qp_num; + passive_params.starting_psn = ep_ptr->qp_handle->qp_num; passive_params.responder_resources = IB_TARGET_MAX; passive_params.initiator_depth = IB_INITIATOR_DEPTH; passive_params.rnr_retry_count = IB_RNR_RETRY_COUNT; @@ -1156,8 +1161,8 @@ } } dapl_dbg_log (DAPL_DBG_TYPE_CALLBACK, - "dapls_ib_get_dat_event: event translate(%s) ib=0x%x dat=0x%x\n", - active ? "active" : "passive", ib_cm_event, dat_event_num); + "dapls_ib_get_dat_event: event(%s) ib=0x%x dat=0x%x\n", + active ? 
"active" : "passive", ib_cm_event, dat_event_num); return dat_event_num; } @@ -1195,14 +1200,15 @@ return ib_cm_event; } -void dapli_cm_event_cb() + +void dapli_cm_event_cb(struct _ib_hca_transport *hca) { struct ib_cm_event *event; dapl_dbg_log(DAPL_DBG_TYPE_UTIL, " dapli_cm_event()\n"); /* process one CM event, fairness */ - if(!ib_cm_get_event_timed(0,&event)) { + if(!ib_cm_get_event(hca->ib_cm,&event)) { struct dapl_cm_id *conn; int ret; dapl_dbg_log(DAPL_DBG_TYPE_CM, Index: dapl/openib/dapl_ib_qp.c =================================================================== --- dapl/openib/dapl_ib_qp.c (revision 3459) +++ dapl/openib/dapl_ib_qp.c (working copy) @@ -112,7 +112,7 @@ qp_create.qp_type = IBV_QPT_RC; qp_create.qp_context = (void*)ep_ptr; - ep_ptr->qp_handle = ibv_create_qp( ib_pd_handle, &qp_create); + ep_ptr->qp_handle = ibv_create_qp(ib_pd_handle, &qp_create); if (!ep_ptr->qp_handle) return(dapl_convert_errno(ENOMEM, "create_qp")); @@ -121,10 +121,10 @@ ep_ptr->qp_handle->qp_num, qp_create.cap.max_send_wr,qp_create.cap.max_send_sge, qp_create.cap.max_recv_wr,qp_create.cap.max_recv_sge ); - + /* Setup QP attributes for INIT state on the way out */ if (dapls_modify_qp_state(ep_ptr->qp_handle, - IBV_QPS_INIT,0,0,0 ) != DAT_SUCCESS ) { + IBV_QPS_INIT, 0) != DAT_SUCCESS ) { ibv_destroy_qp(ep_ptr->qp_handle); ep_ptr->qp_handle = IB_INVALID_HANDLE; return DAT_INTERNAL_ERROR; @@ -161,7 +161,7 @@ if (ep_ptr->qp_handle != IB_INVALID_HANDLE) { /* force error state to flush queue, then destroy */ - dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_ERR, 0,0,0); + dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_ERR, 0); if (ibv_destroy_qp(ep_ptr->qp_handle)) return(dapl_convert_errno(errno,"destroy_qp")); @@ -217,7 +217,7 @@ (ep_ptr->qp_handle->state != IBV_QPS_ERR)) { ep_ptr->qp_state = IB_QP_STATE_ERROR; return (dapls_modify_qp_state(ep_ptr->qp_handle, - IBV_QPS_ERR,0,0,0)); + IBV_QPS_ERR, 0)); } /* @@ -272,8 +272,8 @@ if ( ep_ptr->qp_handle != IB_INVALID_HANDLE ) { 
/* move to RESET state and then to INIT */ - dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_RESET, 0,0,0); - dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_INIT, 0,0,0); + dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_RESET, 0); + dapls_modify_qp_state(ep_ptr->qp_handle, IBV_QPS_INIT, 0); ep_ptr->qp_state = IB_QP_STATE_INIT; } @@ -283,104 +283,101 @@ } /* - * Generic QP modify for init, reset, error, RTS, RTR + * Generic QP modify for reset, error, INIT, RTS, RTR */ DAT_RETURN dapls_modify_qp_state ( IN ib_qp_handle_t qp_handle, IN ib_qp_state_t qp_state, - IN uint32_t qpn, - IN uint16_t lid, - IN uint8_t port ) + IN struct dapl_cm_id *conn ) { - struct ibv_qp_attr qp_attr; - enum ibv_qp_attr_mask mask = IBV_QP_STATE; - - dapl_os_memzero((void*)&qp_attr, sizeof(qp_attr)); - qp_attr.qp_state = qp_state; + struct ibv_qp_attr attr; + int mask = 0; + + dapl_dbg_log (DAPL_DBG_TYPE_EP, + " modify_qp: qp %p, state %d qp_num 0x%x\n", + qp_handle, qp_state, qp_handle->qp_num); + + dapl_os_memzero((void*)&attr, sizeof(attr)); + attr.qp_state = qp_state; switch (qp_state) { - /* additional attributes with RTR and RTS */ - case IBV_QPS_RTR: - { - mask |= IBV_QP_AV | - IBV_QP_PATH_MTU | - IBV_QP_DEST_QPN | - IBV_QP_RQ_PSN | - IBV_QP_MAX_DEST_RD_ATOMIC | - IBV_QP_MIN_RNR_TIMER; - qp_attr.qp_state = IBV_QPS_RTR; - qp_attr.path_mtu = IBV_MTU_1024; - qp_attr.dest_qp_num = qpn; - qp_attr.rq_psn = 1; - qp_attr.max_dest_rd_atomic = IB_TARGET_MAX; - qp_attr.min_rnr_timer = 12; - qp_attr.ah_attr.is_global = 0; - qp_attr.ah_attr.dlid = lid; - qp_attr.ah_attr.sl = 0; - qp_attr.ah_attr.src_path_bits = 0; - qp_attr.ah_attr.port_num = port; - - dapl_dbg_log (DAPL_DBG_TYPE_EP, - " modify_qp_RTR: qpn %x lid %x port %x, rq_psn %x\n", - qpn,lid,port,ntohl(qp_attr.rq_psn) ); - break; - - } - case IBV_QPS_RTS: - { - mask |= IBV_QP_TIMEOUT | - IBV_QP_RETRY_CNT | - IBV_QP_RNR_RETRY | - IBV_QP_SQ_PSN | - IBV_QP_MAX_QP_RD_ATOMIC; - qp_attr.qp_state = IBV_QPS_RTS; - qp_attr.timeout = 14; - 
qp_attr.retry_cnt = 7; - qp_attr.rnr_retry = 7; - qp_attr.sq_psn = 1; - qp_attr.max_rd_atomic = IB_TARGET_MAX; - dapl_dbg_log (DAPL_DBG_TYPE_EP, - " modify_qp_RTS: psn %x or %x\n", - ntohl(qp_attr.sq_psn), qp_attr.max_rd_atomic ); - break; - } - case IBV_QPS_INIT: + case IBV_QPS_INIT: { DAPL_IA *ia_ptr; DAPL_EP *ep_ptr; + /* need to find way back to port num */ ep_ptr = (DAPL_EP*)qp_handle->qp_context; if (ep_ptr) ia_ptr = ep_ptr->header.owner_ia; else - break; + return(dapl_convert_errno(EINVAL," qp_CTX")); - mask |= IBV_QP_PKEY_INDEX | - IBV_QP_PORT | - IBV_QP_ACCESS_FLAGS; - - qp_attr.pkey_index = 0; - qp_attr.port_num = ia_ptr->hca_ptr->port_num; - qp_attr.qp_access_flags = + /* + * Set qp attributes by hand for INIT state. Allows + * consumers to pre-post receives, per uDAPL + * specification, before IB has path record info + * with connect request processing + */ + mask = IBV_QP_STATE | IBV_QP_PKEY_INDEX | + IBV_QP_PORT | IBV_QP_ACCESS_FLAGS; + + attr.pkey_index = 0; + attr.port_num = ia_ptr->hca_ptr->port_num; + attr.qp_access_flags = IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_READ | IBV_ACCESS_REMOTE_ATOMIC; - dapl_dbg_log (DAPL_DBG_TYPE_EP, - " modify_qp_INIT: pi %x port %x acc %x\n", - qp_attr.pkey_index, qp_attr.port_num, - qp_attr.qp_access_flags ); + ep_ptr->qp_state = IB_QP_STATE_INIT; break; } + case IBV_QPS_RTR: + if (!conn) + return(dapl_convert_errno(EINVAL," qp_RTR")); + /* + * Get pkey_index from CM, move from INIT to INIT + * to update index. The initial value was set by hand + * to allow consumers to pre-post receives. 
+ */ + attr.qp_state = IBV_QPS_INIT; + + /* get pkey_index from CM, move from INIT to INIT */ + if (ib_cm_init_qp_attr(conn->cm_id, &attr, &mask)) + return(dapl_convert_errno(errno," qp_cINIT")); + + mask = IBV_QP_PKEY_INDEX; + if (ibv_modify_qp(qp_handle, &attr, mask)) + return(dapl_convert_errno(errno," reINIT")); + + /* get qp attributes from CM, move to RTR */ + attr.qp_state = IBV_QPS_RTR; + if (ib_cm_init_qp_attr(conn->cm_id, &attr, &mask)) + return(dapl_convert_errno(errno," qp_cRTR")); + + attr.path_mtu = IBV_MTU_1024; + attr.rq_psn = qp_handle->qp_num; + break; + + case IBV_QPS_RTS: + if (!conn) + return(dapl_convert_errno(EINVAL," qp_RTS")); + + /* get qp attributes from CM, move to RTS */ + if (ib_cm_init_qp_attr(conn->cm_id, &attr, &mask)) + return(dapl_convert_errno(errno," qp_cRTS")); + + break; + default: + mask = IBV_QP_STATE; break; - } - - if (ibv_modify_qp(qp_handle, &qp_attr, mask)) - return(dapl_convert_errno(errno,"modify_qp_state")); + if (ibv_modify_qp(qp_handle, &attr, mask)) + return(dapl_convert_errno(errno," modify_qp")); + return DAT_SUCCESS; } Index: dapl/openib/README =================================================================== --- dapl/openib/README (revision 3459) +++ dapl/openib/README (working copy) @@ -47,5 +47,4 @@ Known issues: no memory windows support in ibverbs, dat_create_rmr fails. - hard coded modify QP RTR to port 1, waiting for ib_cm_init_qp_attr call. 
Index: dapl/openib/dapl_ib_util.h =================================================================== --- dapl/openib/dapl_ib_util.h (revision 3459) +++ dapl/openib/dapl_ib_util.h (working copy) @@ -235,6 +235,7 @@ int destroy; struct ibv_device *ib_dev; struct ibv_context *ib_ctx; + struct ib_cm_device *ib_cm; ib_cq_handle_t ib_cq_empty; DAPL_OS_WAIT_OBJECT wait_object; int max_inline_send; @@ -261,7 +262,7 @@ void dapli_ib_thread_destroy(void); int dapli_get_hca_addr(struct dapl_hca *hca_ptr); void dapli_ip_comp_handler(uint64_t req_id, void *context, int rec_num); -void dapli_cm_event_cb(void); +void dapli_cm_event_cb(struct _ib_hca_transport *hca); void dapli_at_event_cb(void); void dapli_cq_event_cb(struct _ib_hca_transport *hca); void dapli_async_event_cb(struct _ib_hca_transport *hca); @@ -269,9 +270,7 @@ DAT_RETURN dapls_modify_qp_state ( IN ib_qp_handle_t qp_handle, IN ib_qp_state_t qp_state, - IN uint32_t qpn, - IN uint16_t lid, - IN uint8_t port ); + IN struct dapl_cm_id *conn ); /* inline functions */ STATIC _INLINE_ IB_HCA_NAME dapl_ib_convert_name (IN char *name) Index: dapl/openib/dapl_ib_cq.c =================================================================== --- dapl/openib/dapl_ib_cq.c (revision 3459) +++ dapl/openib/dapl_ib_cq.c (working copy) @@ -71,8 +71,10 @@ (!ibv_get_cq_event(hca->ib_ctx, i, &ibv_cq, (void*)&evd_ptr))) { - if (DAPL_BAD_HANDLE(evd_ptr, DAPL_MAGIC_EVD)) + if (DAPL_BAD_HANDLE(evd_ptr, DAPL_MAGIC_EVD)) { + ibv_ack_cq_events(ibv_cq, 1); continue; + } /* process DTO event via callback */ dapl_evd_dto_callback ( hca->ib_ctx, -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: udapl.patch Type: application/octet-stream Size: 30789 bytes Desc: not available URL: From manpreets7 at yahoo.com Fri Sep 16 18:29:11 2005 From: manpreets7 at yahoo.com (Manpreet Singh) Date: Fri, 16 Sep 2005 18:29:11 -0700 (PDT) Subject: [openib-general] Tavor HCAs with openib Message-ID: <20050917012911.9327.qmail@web34207.mail.mud.yahoo.com> Hi, I was wondering if the Mellanox PCI-X cards (with memory) are still supported in the openib stack. Although the code seems to include them in the PCI device ID list, I get the following error when I load ib_mthca:

ib_mthca 0000:04:00.0: Missing UAR, aborting

Details of the configuration:
Kernel version: 2.6.12.3.
OpenIB kernel stack version: 3459 (from Sep 16).
HCA card device ID: 0x5a44.

The following is the PCI configuration space dump of the HCA:

[root at driver5-linux linux-kernel]# lspci -s 04:00.00 -xxx
04:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1)
00: b3 15 44 5a 53 01 30 02 a1 00 06 0c 08 40 00 00
10: 04 00 d0 df 00 00 00 00 0c 00 80 ff 0f 00 00 00
20: 0c 00 00 f0 0f 00 00 00 00 00 00 00 b3 15 44 5a
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
40: 11 50 1f 00 00 20 08 00 00 22 08 00 00 00 00 00
50: 03 60 ff 7f 11 11 00 00 00 00 00 00 00 00 00 00
60: 05 70 8a 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 07 00 1c 00 00 04 e3 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

I'd appreciate any help/comments on this. Thanks, Manpreet.

From rolandd at cisco.com Fri Sep 16 20:21:35 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Sep 2005 20:21:35 -0700 Subject: [openib-general] Tavor HCAs with openib In-Reply-To: <20050917012911.9327.qmail@web34207.mail.mud.yahoo.com> (Manpreet Singh's message of "Fri, 16 Sep 2005 18:29:11 -0700 (PDT)") References: <20050917012911.9327.qmail@web34207.mail.mud.yahoo.com> Message-ID: <524q8kjs5c.fsf@cisco.com>

Manpreet> I was wondering if the Mellanox PCI-X cards (with
Manpreet> memory) are still supported in the openib stack.

Yes, they are fully supported (and work well in my test systems).

Manpreet> Although the code seems to include them in the PCI device
Manpreet> ID list, I get the following error when I load
Manpreet> ib_mthca:

Manpreet> ib_mthca 0000:04:00.0: Missing UAR, aborting

There's something that the driver doesn't like about the second BAR of the PCI device. Can you add the code:

    dev_err(&pdev->dev, "flags: %lx, len: %lx\n",
            pci_resource_flags(pdev, 2),
            pci_resource_len(pdev, 2));

right after the line that prints "Missing UAR" in mthca_main.c, rebuild the driver, and send the new output?

Manpreet> Kernel version: 2.6.12.3. OpenIB kernel stack version:
Manpreet> 3459 (from Sep 16). HCA card device ID: 0x5a44.

Manpreet> The following is the PCI configuration space dump of the HCA:

I notice that in the lspci output, the second two BARs of the HCA device are assigned addresses above 4G. I wonder if this confuses the kernel's PCI core? What kind of system are you running?

 - R.
From rolandd at cisco.com Fri Sep 16 20:26:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 16 Sep 2005 20:26:11 -0700 Subject: [openib-general] Re: IPoIB SA Multicast Reregistration In-Reply-To: <1126894467.5425.17169.camel@hal.voltaire.com> (Hal Rosenstock's message of "16 Sep 2005 14:14:28 -0400") References: <1126894467.5425.17169.camel@hal.voltaire.com> Message-ID: <52zmqcidd8.fsf@cisco.com> Hal> What I see is the following: If the SM does not see or Hal> respond to these requests, these requests do not seem to Hal> timeout and be rerequested. Any idea what is going on ? Nope, the IPoIB driver should continue to retry until it joins all the groups it wants to. - R. From halr at voltaire.com Sat Sep 17 03:33:43 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Sep 2005 06:33:43 -0400 Subject: [openib-general] [RFC] send side QP redirection In-Reply-To: <432B38CC.50106@ichips.intel.com> References: <432B38CC.50106@ichips.intel.com> Message-ID: <1126953213.5425.30291.camel@hal.voltaire.com> On Fri, 2005-09-16 at 17:27, Sean Hefty wrote: > I'd like to get feedback about a possible implementation for requester QP > redirection based on the APIs given below. > Specifically, I'm referring to GSI > redirection (spec 13.5.2) and port and CM redirection (REJ code 24). > > The basic proposal is to combine QP redirection as an extension to an address > handle cache. Remote agents are identified by management class, LID, and GID. > struct ib_mad_av { > struct ib_ah *ah; > u32 remote_qpn; > u32 remote_qkey; > u16 pkey_index; > }; What about SL and the other redirect GRH fields (TC and FL) ? > /* > * Insert a new destination into the map and create a new address > * vector to it. If the destination already exists, simply return > * the current address vector. > * / > struct ib_mad_av* ib_insert_mad_dest(struct ib_mad_agent*, > struct ib_ah_attr*, > mgmt_class, lid, gid, pkey); So the last three parameters are transferred from the received ClassPortInfo. 
> struct ib_mad_av* ib_get_mad_av(struct ib_mad_agent*, > mgmt_class, lid, gid); I'm missing why LID and GID are needed here and why pkey is missing. Is there more than 1 struct ib_mad_av allowed per mgmt_class ? > /* > * TBD: need to determine when to remove a destination. Can remove > * always if references go to 0. Can add a delay before removal. Can > * maintain destinations that have been redirected. ? > */ > void ib_free_mad_av(struct ib_mad_av*); > > ib_redirect_mads(struct ib_mad_av*, struct ib_mad_recv_wc*, > struct ib_class_port_info*); > > /* GID redirection with unknown LID would be deferred */ > int ib_get_mad_redirect_path(...); Just curious why ? > Questions or thoughts? Is multiple redirection handled by this ? Would these semantics need to be extended to user space by user_mad ? -- Hal From halr at voltaire.com Sat Sep 17 04:12:36 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Sep 2005 07:12:36 -0400 Subject: [openib-general] Re: IPoIB SA Multicast Reregistration In-Reply-To: <52zmqcidd8.fsf@cisco.com> References: <1126894467.5425.17169.camel@hal.voltaire.com> <52zmqcidd8.fsf@cisco.com> Message-ID: <1126955555.4401.61.camel@hal.voltaire.com> On Fri, 2005-09-16 at 23:26, Roland Dreier wrote: > Hal> What I see is the following: If the SM does not see or > Hal> respond to these requests, these requests do not seem to > Hal> timeout and be rerequested. Any idea what is going on ? > > Nope, the IPoIB driver should continue to retry until it joins all the > groups it wants to. Do you have time to look into this ? -- Hal
From halr at voltaire.com Sat Sep 17 08:06:12 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Sep 2005 11:06:12 -0400 Subject: [openib-general] [PATCH] IPoIB: Fix SA client retransmission strategy Message-ID: <1126969571.4401.815.camel@hal.voltaire.com> IPoIB: Fix SA client retransmission strategy Signed-off-by: Hal Rosenstock Index: ipoib_multicast.c =================================================================== --- ipoib_multicast.c (revision 3455) +++ ipoib_multicast.c (working copy) @@ -145,7 +145,7 @@ static struct ipoib_mcast *ipoib_mcast_a mcast->dev = dev; mcast->created = jiffies; - mcast->backoff = HZ; + mcast->backoff = 1; mcast->logcount = 0; INIT_LIST_HEAD(&mcast->list); @@ -366,7 +366,7 @@ static int ipoib_mcast_sendonly_join(str IB_SA_MCMEMBER_REC_PORT_GID | IB_SA_MCMEMBER_REC_PKEY | IB_SA_MCMEMBER_REC_JOIN_STATE, - 1000, GFP_ATOMIC, + mcast->backoff * 1000, GFP_ATOMIC, ipoib_mcast_sendonly_join_complete, mcast, &mcast->query); if (ret < 0) { @@ -396,7 +396,7 @@ static void ipoib_mcast_join_complete(in IPOIB_GID_ARG(mcast->mcmember.mgid), status); if (!status && !ipoib_mcast_join_finish(mcast, mcmember)) { - mcast->backoff = HZ; + mcast->backoff = 1; down(&mcast_mutex); if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) queue_work(ipoib_workqueue, &priv->mcast_task); @@ -496,7 +496,7 @@ static void ipoib_mcast_join(struct net_ if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) queue_delayed_work(ipoib_workqueue, &priv->mcast_task, - mcast->backoff); + mcast->backoff * HZ); up(&mcast_mutex); } else mcast->query_id = ret; From johann at pathscale.com Sat Sep 17 10:10:43 2005 From: johann at pathscale.com (Johann George) Date: Sat, 17 Sep 2005 10:10:43 -0700 Subject: [openib-general] sdp does not compile (3462) Message-ID: <20050917171043.GA7582@cuprite.internal.keyresearch.com> drivers/infiniband/ulp/sdp/sdp_conn.c: In function `sdp_conn_table_init': drivers/infiniband/ulp/sdp/sdp_conn.c:1952: warning: passing arg 1 of `ib_create_cm_id' from 
incompatible pointer type drivers/infiniband/ulp/sdp/sdp_conn.c:1952: error: too few arguments to function `ib_create_cm_id' m From halr at voltaire.com Sat Sep 17 10:16:42 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 17 Sep 2005 13:16:42 -0400 Subject: [openib-general] sdp does not compile (3462) In-Reply-To: <20050917171043.GA7582@cuprite.internal.keyresearch.com> References: <20050917171043.GA7582@cuprite.internal.keyresearch.com> Message-ID: <1126977402.4401.1575.camel@hal.voltaire.com> Hi Johann, On Sat, 2005-09-17 at 13:10, Johann George wrote: > drivers/infiniband/ulp/sdp/sdp_conn.c: In function `sdp_conn_table_init': > drivers/infiniband/ulp/sdp/sdp_conn.c:1952: warning: passing arg 1 of `ib_create_cm_id' from incompatible pointer type > drivers/infiniband/ulp/sdp/sdp_conn.c:1952: error: too few arguments to function `ib_create_cm_id' That's because there is an outstanding patch for this which has not yet been applied. See Sean's post from 9/15 entitled "[PATCH] [SDP] 6/5 per device communication identifiers" (http://openib.org/pipermail/openib-general/2005-September/011321.html). You can use that if you need to proceed ahead of this being checked in. -- Hal From johann at pathscale.com Sat Sep 17 10:37:24 2005 From: johann at pathscale.com (Johann George) Date: Sat, 17 Sep 2005 10:37:24 -0700 Subject: [openib-general] sdp does not compile (3462) In-Reply-To: <1126977402.4401.1575.camel@hal.voltaire.com> References: <20050917171043.GA7582@cuprite.internal.keyresearch.com> <1126977402.4401.1575.camel@hal.voltaire.com> Message-ID: <20050917173724.GA8053@cuprite.internal.keyresearch.com> > (http://openib.org/pipermail/openib-general/2005-September/011321.html). > You can use that if you need to proceed ahead of this being checked in. Thanks Hal. Will do. Johann From mst at mellanox.co.il Sat Sep 17 11:00:39 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sat, 17 Sep 2005 21:00:39 +0300 Subject: [openib-general] Re: Mellanox device in INIT state In-Reply-To: References: Message-ID: <20050917180039.GA28562@mellanox.co.il> Quoting r. Shirley Ma : > Subject: Re: Mellanox device in INIT state > > > I might hit this problem -- netdev reference counting problem with ib_at, which was pointed out by Roland a week ago. The difference was I tried to remove ib_mthca, not ib_ipoib. The process hung in the kernel and couldn't recover. > > The counter would go to -1 if bringing the interface down first. > > If loading both ib_at & ib_uat, when removing the ipoib module without bringing the interface down, the reference count is 2; with the interface down, the reference count is -3. ib_devs_changed() doesn't handle these events correctly. > > The workaround now is always removing ib_at/ib_uat before removing ib_ipoib/ib_mthca. > > Thanks > Shirley Ma > IBM Linux Technology Center > 15300 SW Koll Parkway > Beaverton, OR 97006-6063 > Phone(Fax): (503) 578-7638 Might make sense to go over sdp_link.c and compare to ib_at.c and friends. -- MST From mst at mellanox.co.il Sat Sep 17 11:11:43 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 17 Sep 2005 21:11:43 +0300 Subject: [openib-general] mthca_arbel_post_srq_recv/mthca_tavor_post_srq_recv Message-ID: <20050917181143.GA28659@mellanox.co.il> Hi, Roland! The code in mthca_arbel_post_srq_recv/mthca_tavor_post_srq_recv looks very strange: there seems to be unreachable code, spinlocks don't seem to be dropped on error, etc. Further, it seems that the functions return the number of posted descriptors on error. This differs from post_recv, which always returns an error code on error. Is that intentional? Am I missing something? Could you comment on this design please? Thanks, -- MST From mst at mellanox.co.il Sat Sep 17 11:23:16 2005 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Sat, 17 Sep 2005 21:23:16 +0300 Subject: [openib-general] Re: [PATCH] set eq->nent earlier in mthca_create_eq In-Reply-To: <52slw6uv9k.fsf@cisco.com> References: <52slw6uv9k.fsf@cisco.com> Message-ID: <20050917182316.GD28659@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [PATCH] set eq->nent earlier in mthca_create_eq > > Thanks, good catch. > > How about if we get rid of any confusion by just using eq->nent all > the time? And since we're touching the code anyway, we might as well > use roundup_pow_of_two()... > > How's this seem to you? > > --- infiniband/hw/mthca/mthca_eq.c (revision 3432) > +++ infiniband/hw/mthca/mthca_eq.c (working copy) > @@ -476,12 +476,8 @@ static int __devinit mthca_create_eq(str > int i; > u8 status; > > - /* Make sure EQ size is aligned to a power of 2 size. */ > - for (i = 1; i < nent; i <<= 1) > - ; /* nothing */ > - nent = i; > - > - eq->dev = dev; > + eq->dev = dev; > + eq->nent = roundup_pow_of_two(nent); Looks good. One nit: the minimum EQ size is 2, and roundup_pow_of_two can't handle a 0 parameter. Of course create_eq is an internal function, so we can either rely on all callers to pass at least 2 as a parameter, or write roundup_pow_of_two(max(nent, 2)). I suggest the latter as a form of documentation. Thanks, -- MST From sean.hefty at intel.com Sat Sep 17 11:58:13 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Sat, 17 Sep 2005 11:58:13 -0700 Subject: [openib-general] [RFC] send side QP redirection In-Reply-To: <1126953213.5425.30291.camel@hal.voltaire.com> Message-ID: >> struct ib_mad_av { >> struct ib_ah *ah; >> u32 remote_qpn; >> u32 remote_qkey; >> u16 pkey_index; >> }; > >What about SL and the other redirect GRH fields (TC and FL) ? These would have been specified through the ib_ah_attr when the destination was added. I think that these are the four fields needed to allocate and send a MAD. To clarify, an ib_mad_av specifies a tuple, which I refer to as a destination.
>> struct ib_mad_av* ib_insert_mad_dest(struct ib_mad_agent*, >> struct ib_ah_attr*, >> mgmt_class, lid, gid, pkey); > >So the last three parameters are transferred from the received >ClassPortInfo. Not exactly, the destination is inserted based on mgmt_class, lid, and gid before sending any request. These are likely coming from a path record. The pkey is translated into an index and returned as part of the MAD address vector. Basically, a user would call this routine in place of ib_create_ah. >> struct ib_mad_av* ib_get_mad_av(struct ib_mad_agent*, >> mgmt_class, lid, gid); > >I'm missing why LID and GID are needed here and why pkey is missing. >Is there more than 1 struct ib_mad_av allowed per mgmt_class ? There will be one ib_mad_av per mgmt_class per remote destination. A remote destination is identified by the class and LID. (For the initial implementation, the GID will essentially be ignored.) >> /* >> * TBD: need to determine when to remove a destination. Can remove >> * always if references go to 0. Can add a delay before removal. Can >> * maintain destinations that have been redirected. ? >> */ >> void ib_free_mad_av(struct ib_mad_av*); >> >> ib_redirect_mads(struct ib_mad_av*, struct ib_mad_recv_wc*, >> struct ib_class_port_info*); >> >> /* GID redirection with unknown LID would be deferred */ >> int ib_get_mad_redirect_path(...); > >Just curious why ? The spec permits redirection to another GID without specifying the LID. This requires that the requester send a path record query to the SA to obtain the rest of the information before it can send to the redirected QP. This results in an asynchronous operation, which becomes more difficult to deal with. The original MAD and others destined for that QP now need to be queued until the query completes and the full destination is known. >> Questions or thoughts? > >Is multiple redirection handled by this ? It should be. ib_redirect_mads uses the existing ib_mad_av to record that redirection has occurred. 
This should allow for multiple redirection. >Would these semantics need to be extended to user space by user_mad ? I can't think of a reason why they couldn't. It would only be called by userspace clients that initiate sending MADs. - Sean From mst at mellanox.co.il Sat Sep 17 12:04:14 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Sat, 17 Sep 2005 22:04:14 +0300 Subject: [openib-general] Re: [PATCH] libibcm/libibat disable-libcheck option In-Reply-To: <20050915051931.GA7802@mellanox.co.il> References: <20050915051931.GA7802@mellanox.co.il> Message-ID: <20050917190414.GC29221@mellanox.co.il> Quoting Michael S. Tsirkin : > Quoting r. Roland Dreier : > > Subject: Re: [PATCH] libibcm/libibat disable-libcheck option > > > > Michael> Add an option to disable configure checks for ib > > Michael> libraries. This makes it possible to first configure all > > Michael> libraries, then make them all. > > > > Why do we really want this? Is it so hard to build things in order? > > The point is to be able to first configure all libraries, then make > them. > I have a central configure script that configures the rest of the > libraries. In monolithic builds, configure checks are just a hassle. So, any chance of this patch being accepted? I really want an option to first configure all libraries, then build them all. Configure checks break this, but they aren't really needed in a monolithic build, so an option to disable ib library checks makes sense IMO. I don't think there are other ways to do this, are there? Thanks, -- MST From eitan at mellanox.co.il Sat Sep 17 13:11:01 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sat, 17 Sep 2005 23:11:01 +0300 Subject: [openib-general] [PATCH] OpenSM: Missing file for making dist Message-ID: <432C7855.70900@mellanox.co.il> Hi Hal, Apparently you did not try to "make dist" on the merged trunk. I see that opensm/libopensm.ver is missing. You can find a copy in the merge 1.8.0 branch.
Thanks
Eitan

Signed-off-by: Eitan Zahavi

Index: libopensm.ver
===================================================================
--- libopensm.ver (revision 0)
+++ libopensm.ver (revision 0)
@@ -0,0 +1,9 @@
+# In this file we track the current API version
+# of the opensm common interface (and libraries)
+# The version is built of the following
+# tree numbers:
+# API_REV:RUNNING_REV:AGE
+# API_REV - advance on any added API
+# RUNNING_REV - advance any change to the vendor files
+# AGE - number of backword versions the API still supports
+LIBVERSION=1:0:0

From eitan at mellanox.co.il Sat Sep 17 13:48:16 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sat, 17 Sep 2005 23:48:16 +0300 Subject: [openib-general] RMPP Message Format Errors In-Reply-To: <1126864484.5425.11385.camel@hal.voltaire.com> References: <1126864484.5425.11385.camel@hal.voltaire.com> Message-ID: <432C8110.4070301@mellanox.co.il>
Hi Hal and Sean I wrote: >>I wonder how come the received MAD is only of 256 bytes. I expected it > > to be of headers + data = 56 + 336 = 392byte. > I have rerun the test on a fresh build from the main trunk. What I see now is the following: I have noticed that the last record in each RMPP GetTableResp are only partly filled. I have traced that to the actual sent data on the wire so I guess there is another bug in the sender. I attach here the text dump of the analyzer trace. You can see how the "node description" field is cut in the NodeInfoRec query and how the last PortInfoRec is mostly zeros in the second MAD. To reproduce the situation you need a switch and a two port HCA. Then run opensm and osmtest -f c (to create the inventory file).
What you should see is broken node description for the last node and port record full of zeros for the last port: DEFINE_NODE lid 0x3 base_version 0x1 class_version 0x1 node_type 0x2 # (Switch) num_ports 0x18 sys_guid 0x0000000000000000 node_guid 0x0002c900deadbeaf port_guid 0x0002c900deadbeaf partition_cap 0x8 device_id 0xB924 revision 0xA0 # port_num 0x5 # vendor_id 0x2C9 # node_desc MT47396 Infi END snip DEFINE_PORT lid 0x3 port_num 0x18 m_key 0x0000000000000000 subnet_prefix 0x0000000000000000 base_lid 0x0 master_sm_base_lid 0x0 capability_mask 0x0 diag_code 0x0 m_key_lease_period 0x0 local_port_num 0x0 link_width_enabled 0x0 link_width_supported 0x0 link_width_active 0x0 link_speed_supported 0x0 port_state No State Change (NOP) state_info2 0x0 mpb 0x0 lmc 0x0 link_speed 0x0 mtu_smsl 0x0 vl_cap 0x0 vl_high_limit 0x0 vl_arb_high_cap 0x0 vl_arb_low_cap 0x0 mtu_cap 0x0 vl_stall_life 0x0 vl_enforce 0x0 m_key_violations 0x0 p_key_violations 0x0 q_key_violations 0x0 guid_cap 0x0 subnet_timeout 0x0 resp_time_value 0x0 error_threshold 0x0 END Also for some reason after the receiver gets the reassembled data it always gets an extra 20 bytes: Sep 17 23:23:13 807391 [8003] -> osm_vendor_get: [ Sep 17 23:23:13 807408 [8003] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x8085cf8, size = 412. Sep 17 23:23:13 807429 [8003] -> osm_vendor_get: Acquired UMAD 0x8087868, size = 412. Sep 17 23:23:13 807447 [8003] -> osm_vendor_get: ] Sep 17 23:23:13 807464 [8003] -> osm_mad_pool_get: Acquired p_madw = 0x8085ce4, p_mad = 0x80878a0, size = 412. 
Sep 17 23:23:13 807481 [8003] -> osm_mad_pool_get: ] Sep 17 23:23:13 807498 [8003] -> __osmv_sa_mad_rcv_cb: [ Sep 17 23:23:13 807515 [8003] -> __osmv_sa_mad_rcv_cb: Count = 3 = 356 / 112 (20) Sep 17 23:23:13 807534 [8003] -> osmtest_query_res_cb: [ Sep 17 23:23:13 807551 [8003] -> osmtest_query_res_cb: ] Sep 17 23:23:13 807583 [8003] -> __osmv_sa_mad_rcv_cb: ] Sep 17 23:23:13 807591 [4000] -> __osmv_send_sa_req: ] The expected size of the first GetTable(NodeInfoRecord) MAD with 3 records should have been: 392 bytes long. On top of that I do not see now the issue with the 256 byte.. So I guess we are left with the sender bug and extra SA mad header in the reassembly. Thanks Eitan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Bad RMPP gen2 17 Sep 2005.txt URL:
From QiWang.Chen at Clustars.CN Sat Sep 17 23:56:29 2005 From: QiWang.Chen at Clustars.CN (QiWang, Chen) Date: Sun, 18 Sep 2005 14:56:29 +0800 Subject: [openib-general] could not add HCA InfiniHost0 In-Reply-To: <20050915155035.GA3013@esmail.cup.hp.com> References: <1126779086.22691.7.camel@QiWang> <20050915155035.GA3013@esmail.cup.hp.com> Message-ID: <1127026589.23305.3.camel@QiWang> Hi, Grant I can not change the slot, I only have one slot (BladeServer), -------------------------------------------------------------------- lspci -vvs 02:00.0 : 02:00.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- Reset- FastB2B- Capabilities: [70] PCI-X bridge device.
Secondary Status: 64bit+, 133MHz+, SCD-, USC-, SCO-, SRD- Freq=3 Status: Bus=2 Dev=0 Func=0 64bit+ 133MHz+ SCD- USC-, SCO-, SRD- : Upstream: Capacity=512, Commitment Limit=512 : Downstream: Capacity=128, Commitment Limit=128 --------------------------------------------------------------------------------------------------- lspci -vvs 02:01.0 02:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- Reset- FastB2B- Capabilities: [70] PCI-X bridge device. Secondary Status: 64bit+, 133MHz+, SCD-, USC-, SCO-, SRD- Freq=3 Status: Bus=2 Dev=1 Func=0 64bit+ 133MHz+ SCD- USC-, SCO-, SRD- : Upstream: Capacity=512, Commitment Limit=512 : Downstream: Capacity=128, Commitment Limit=128 ----------------------------------------------------------------------------------------------------- Thx On Thu, 2005-09-15 at 08:50 -0700, Grant Grundler wrote: > On Thu, Sep 15, 2005 at 06:11:26PM +0800, QiWang, Chen wrote: > > there are some diff: > > 02:00.0 --> work > > 02:01.0 --> failed > > > > and first time I install the ib-verbs on node1, It also failed, because > > lspci= 02:01.0, an I don not know how i change 02:01.0 to 02:00.0, and > > it works fine for me. > > You can only change it by removing the Mellanox card and re-installing > in the other slot. > > Can you post "lspci -vvs 02:01.0" output from the machine that failed? > Can you post "lspci -vvs 02:00.0" output from the machine that worked? > > grant -- QiWang, Chen Clustars Supercomputing Technology corp. http://www.Clustars.CN TEL:+86-0816-2546345-815 FAX:+86-0816-2546370 Mobile:+86-13096497499 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Sun Sep 18 01:16:00 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 18 Sep 2005 11:16:00 +0300 Subject: [openib-general] Re: [PATCH] [SDP] 6/5 per device communication identifiers In-Reply-To: References: Message-ID: <20050918081600.GA13105@mellanox.co.il> Quoting Sean Hefty : > Here's a patch to update SDP to per device cm_id's. Thanks, applied. -- MST From shvatvat at mellanox.co.il Sun Sep 18 01:22:11 2005 From: shvatvat at mellanox.co.il (Guy Shevet) Date: Sun, 18 Sep 2005 11:22:11 +0300 Subject: [openib-general] could not add HCA InfiniHost0 Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEDDD@mtlexch01.mtl.com> Hi , Can you please send "lspci -vv " for all Mellanox devices on the system ( the virtual bridge and the simple device ) . --- Guy _____ From: QiWang, Chen [mailto:QiWang.Chen at Clustars.CN] Sent: Sunday, September 18, 2005 9:56 AM To: Grant Grundler Cc: openib-general at openib.org Subject: Re: [openib-general] could not add HCA InfiniHost0 Hi, Grant I can not change the slot, I only have one slot (BladeServer), -------------------------------------------------------------------- lspci -vvs 02:00.0 : 02:00.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- Reset- FastB2B- Capabilities: [70] PCI-X bridge device. 
Secondary Status: 64bit+, 133MHz+, SCD-, USC-, SCO-, SRD- Freq=3 Status: Bus=2 Dev=0 Func=0 64bit+ 133MHz+ SCD- USC-, SCO-, SRD- : Upstream: Capacity=512, Commitment Limit=512 : Downstream: Capacity=128, Commitment Limit=128 ---------------------------------------------------------------------------- ----------------------- lspci -vvs 02:01.0 02:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- Reset- FastB2B- Capabilities: [70] PCI-X bridge device. Secondary Status: 64bit+, 133MHz+, SCD-, USC-, SCO-, SRD- Freq=3 Status: Bus=2 Dev=1 Func=0 64bit+ 133MHz+ SCD- USC-, SCO-, SRD- : Upstream: Capacity=512, Commitment Limit=512 : Downstream: Capacity=128, Commitment Limit=128 ---------------------------------------------------------------------------- ------------------------- Thx On Thu, 2005-09-15 at 08:50 -0700, Grant Grundler wrote: On Thu, Sep 15, 2005 at 06:11:26PM +0800, QiWang, Chen wrote: > there are some diff: > 02:00.0 --> work > 02:01.0 --> failed > > and first time I install the ib-verbs on node1, It also failed, because > lspci= 02:01.0, an I don not know how i change 02:01.0 to 02:00.0, and > it works fine for me. You can only change it by removing the Mellanox card and re-installing in the other slot. Can you post "lspci -vvs 02:01.0" output from the machine that failed? Can you post "lspci -vvs 02:00.0" output from the machine that worked? grant -- QiWang, Chen > Clustars Supercomputing Technology corp. http://www.Clustars.CN TEL:+86-0816-2546345-815 FAX:+86-0816-2546370 Mobile:+86-13096497499 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mst at mellanox.co.il Sun Sep 18 05:03:51 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Sun, 18 Sep 2005 15:03:51 +0300 Subject: [openib-general] imgen mic messages In-Reply-To: <1126644232.4514.1157.camel@hal.voltaire.com> References: <1126644232.4514.1157.camel@hal.voltaire.com> Message-ID: <20050918120351.GC14107@mellanox.co.il> Quoting r. Hal Rosenstock : > > > They all appear to be warnings rather than errors. > > > > > > Is this OK to burn ? It's OK to burn as long as there are no errors. -- MST From halr at voltaire.com Sun Sep 18 04:10:44 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 18 Sep 2005 07:10:44 -0400 Subject: [openib-general] Re: [PATCH] OpenSM: Missing file for making dist In-Reply-To: <432C7855.70900@mellanox.co.il> References: <432C7855.70900@mellanox.co.il> Message-ID: <1127041844.4401.8936.camel@hal.voltaire.com> On Sat, 2005-09-17 at 16:11, Eitan Zahavi wrote: > Apparently you did not try to "make dist" on the merged trunk. > I see that opensm/libopensm.ver is missing. You can find a copy in the merge 1.8.0 branch. Thanks. Applied. -- Hal From jackm at mellanox.co.il Sun Sep 18 08:41:43 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Sun, 18 Sep 2005 18:41:43 +0300 Subject: [openib-general] kernel oops upon unloading ib_sa module Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEF60@mtlexch01.mtl.com> We loaded and unloaded the openib stack in a tight loop (shell script). After unloading ib_ipoib, and while unloading ib_sa, we got the oops below. It seems that ipoib did not clean itself up completely at module unload, and a callback (to be invoked by ib_sa) was not properly cancelled. Note that the failure did not occur immediately -- but only after about 15-20 iterations of the script.
(openib svn version 3450) Jack ============================================================================ ========================== Sep 19 01:57:10 swlab32 kernel: ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 20 (level, low) -> IRQ 21 Sep 19 01:57:15 swlab32 kernel: Unable to handle kernel paging request at virtual address f898f350 Sep 19 01:57:15 swlab32 kernel: printing eip: Sep 19 01:57:15 swlab32 kernel: f898f350 Sep 19 01:57:15 swlab32 kernel: *pde = 0183b067 Sep 19 01:57:15 swlab32 kernel: *pte = 00000000 Sep 19 01:57:15 swlab32 kernel: Oops: 0000 [#1] Sep 19 01:57:15 swlab32 kernel: PREEMPT SMP Sep 19 01:57:15 swlab32 kernel: Modules linked in: ib_sa ib_uverbs ib_umad ib_mthca ib_mad ib_core Sep 19 01:57:15 swlab32 kernel: CPU: 0 Sep 19 01:57:15 swlab32 kernel: EIP: 0060:[] Not tainted VLI Sep 19 01:57:15 swlab32 kernel: EFLAGS: 00010246 (2.6.13) Sep 19 01:57:15 swlab32 kernel: EIP is at 0xf898f350 Sep 19 01:57:15 swlab32 kernel: eax: 00000000 ebx: 00000286 ecx: f897a9d0 edx: 00000001 Sep 19 01:57:15 swlab32 kernel: esi: e1b222a0 edi: e1b222a8 ebp: fffffffc esp: d8d3ddfc Sep 19 01:57:15 swlab32 kernel: ds: 007b es: 007b ss: 0068 Sep 19 01:57:15 swlab32 ifdown: Interface not available and no configuration found. Sep 19 01:57:17 swlab32 ifdown: Interface not available and no configuration found. Sep 19 01:57:23 swlab32 kernel: Process modprobe (pid: 19620, threadinfo=d8d3c000 task=f62f0a20) Sep 19 01:57:23 swlab32 kernel: Stack: f897aa53 fffffffc 00000000 da38f180 f8999bb3 d97da000 00000000 00000000 Sep 19 01:57:23 swlab32 kernel: d8d3de38 00000000 00000026 00000001 0000000f 00003a98 d8d3de93 00000000 Sep 19 01:57:23 swlab32 kernel: 00000000 d97da000 f416c8c0 f4554c00 d948c980 00000286 e1b222a8 d8d3de98 Sep 19 01:57:23 swlab32 ifdown: Interface not available and no configuration found. 
Sep 19 01:57:23 swlab32 kernel: Call Trace: Sep 19 01:57:24 swlab32 kernel: [] ib_sa_mcmember_rec_callback+0x83/0xa0 [ib_sa] Sep 19 01:57:24 swlab32 kernel: [] mthca_cmd_box+0x83/0xf0 [ib_mthca] Sep 19 01:57:24 swlab32 kernel: [] send_handler+0xd4/0x110 [ib_sa] Sep 19 01:57:24 swlab32 kernel: [] cancel_mads+0x130/0x180 [ib_mad] Sep 19 01:57:24 swlab32 kernel: [] unregister_mad_agent+0x13/0x150 [ib_mad] Sep 19 01:57:24 swlab32 kernel: [] _spin_unlock_irqrestore+0xf/0x30 Sep 19 01:57:24 swlab32 kernel: [] free_sm_ah+0x0/0x30 [ib_sa] Sep 19 01:57:24 swlab32 kernel: [] mthca_ah_destroy+0x1e/0x30 [ib_mthca] Sep 19 01:57:24 swlab32 kernel: [] ib_destroy_ah+0x16/0x30 [ib_core] Sep 19 01:57:24 swlab32 kernel: [] free_sm_ah+0x0/0x30 [ib_sa] Sep 19 01:57:24 swlab32 kernel: [] kref_put+0x45/0xc0 Sep 19 01:57:24 swlab32 kernel: [] ib_unregister_mad_agent+0x19/0x30 [ib_mad] Sep 19 01:57:24 swlab32 kernel: [] ib_sa_remove_one+0x6c/0xa0 [ib_sa] Sep 19 01:57:24 swlab32 kernel: [] free_sm_ah+0x0/0x30 [ib_sa] Sep 19 01:57:24 swlab32 kernel: [] ib_unregister_client+0xc7/0xf0 [ib_core] Sep 19 01:57:24 swlab32 kernel: [] try_stop_module+0x38/0x40 Sep 19 01:57:24 swlab32 kernel: [] __try_stop_module+0x0/0x50 Sep 19 01:57:24 swlab32 kernel: [] ib_sa_cleanup+0xf/0x11 [ib_sa] Sep 19 01:57:24 swlab32 kernel: [] sys_delete_module+0x19a/0x1b0 Sep 19 01:57:24 swlab32 kernel: [] get_init_ra_size+0x41/0x90 Sep 19 01:57:24 swlab32 kernel: [] sys_munmap+0x51/0x80 Sep 19 01:57:24 swlab32 kernel: [] syscall_call+0x7/0xb Sep 19 01:57:24 swlab32 kernel: Code: Bad EIP value. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From QiWang.Chen at Clustars.CN Sun Sep 18 08:49:06 2005 From: QiWang.Chen at Clustars.CN (QiWang, Chen) Date: Sun, 18 Sep 2005 23:49:06 +0800 Subject: [openib-general] could not add HCA InfiniHost0 In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEDDD@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEDDD@mtlexch01.mtl.com> Message-ID: <1127058546.7204.2.camel@QiWang> lspci -vv : ( 02:00.0 works fine) 02:00.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- Reset- FastB2B- Capabilities: [70] PCI-X bridge device. Secondary Status: 64bit+, 133MHz+, SCD-, USC-, SCO-, SRD- Freq=3 Status: Bus=2 Dev=0 Func=0 64bit+ 133MHz+ SCD- USC-, SCO-, SRD- : Upstream: Capacity=512, Commitment Limit=512 : Downstream: Capacity=128, Commitment Limit=128 03:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) Subsystem: Mellanox Technologies MT23108 InfiniHost Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B- Capabilities: [70] PCI-X bridge device. Secondary Status: 64bit+, 133MHz+, SCD-, USC-, SCO-, SRD- Freq=3 Status: Bus=2 Dev=1 Func=0 64bit+ 133MHz+ SCD- USC-, SCO-, SRD- : Upstream: Capacity=512, Commitment Limit=512 : Downstream: Capacity=128, Commitment Limit=128 03:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) Subsystem: Mellanox Technologies MT23108 InfiniHost Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- Hi , > > Can you please send “lspci –vv “ for all Mellanox devices > on the system ( the virtual bridge and the simple device ) . 
> > > > --- Guy > > > > ______________________________________________________________________ > > From: QiWang, Chen [mailto:QiWang.Chen at Clustars.CN] > Sent: Sunday, September 18, 2005 9:56 AM > To: Grant Grundler > Cc: openib-general at openib.org > Subject: Re: [openib-general] could not add HCA InfiniHost0 > > > > > Hi, Grant > I can not change the slot, I only have one slot (BladeServer), > > -------------------------------------------------------------------- > lspci -vvs 02:00.0 : > > 02:00.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) > (prog-if 00 [Normal decode]) > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- > ParErr- Stepping- SERR+ FastB2B- > Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium > >TAbort- > SERR- Latency: 32, Cache Line Size 08 > Bus: primary=02, secondary=03, subordinate=03, sec-latency=32 > Memory behind bridge: e0000000-efffffff > Secondary status: 66Mhz+ FastB2B- ParErr- DEVSEL=medium > >TAbort- > BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B- > Capabilities: [70] PCI-X bridge device. 
> Secondary Status: 64bit+, 133MHz+, SCD-, USC-, SCO-, > SRD- Freq=3 > Status: Bus=2 Dev=0 Func=0 64bit+ 133MHz+ SCD- USC-, > SCO-, SRD- > : Upstream: Capacity=512, Commitment Limit=512 > : Downstream: Capacity=128, Commitment Limit=128 > > --------------------------------------------------------------------------------------------------- > > lspci -vvs 02:01.0 > > > 02:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) > (prog-if 00 [Normal decode]) > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- > ParErr- Stepping- SERR+ FastB2B- > Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium > >TAbort- > SERR- Latency: 32, Cache Line Size 08 > Bus: primary=02, secondary=03, subordinate=03, sec-latency=32 > Memory behind bridge: e0000000-efffffff > Secondary status: 66Mhz+ FastB2B- ParErr- DEVSEL=medium > >TAbort- > BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B- > Capabilities: [70] PCI-X bridge device. > Secondary Status: 64bit+, 133MHz+, SCD-, USC-, SCO-, > SRD- Freq=3 > Status: Bus=2 Dev=1 Func=0 64bit+ 133MHz+ SCD- USC-, > SCO-, SRD- > : Upstream: Capacity=512, Commitment Limit=512 > : Downstream: Capacity=128, Commitment Limit=128 > > ----------------------------------------------------------------------------------------------------- > > Thx > > On Thu, 2005-09-15 at 08:50 -0700, Grant Grundler wrote: > > > > On Thu, Sep 15, 2005 at 06:11:26PM +0800, QiWang, Chen wrote: > > there are some diff: > > 02:00.0 --> work > > 02:01.0 --> failed > > > > and first time I install the ib-verbs on node1, It also failed, because > > lspci= 02:01.0, an I don not know how i change 02:01.0 to 02:00.0, and > > it works fine for me. > > You can only change it by removing the Mellanox card and re-installing > in the other slot. > > Can you post "lspci -vvs 02:01.0" output from the machine that failed? > Can you post "lspci -vvs 02:00.0" output from the machine that worked? 
> > grant > > > -- > QiWang, Chen > Clustars Supercomputing Technology corp. > http://www.Clustars.CN > TEL:+86-0816-2546345-815 > FAX:+86-0816-2546370 > Mobile:+86-13096497499 > > > > -- QiWang, Chen Clustars Supercomputing Technology corp. http://www.Clustars.CN TEL:+86-0816-2546345-815 FAX:+86-0816-2546370 Mobile:+86-13096497499
From dotanb at mellanox.co.il Sun Sep 18 23:51:26 2005 From: dotanb at mellanox.co.il (Dotan Barak) Date: Mon, 19 Sep 2005 09:51:26 +0300 Subject: [openib-general] executing the SRQ pingpong example Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEFD2@mtlexch01.mtl.com> svn rev.: 3470 kernel: 2.6.9-5.EL (AS4) HCA: Mellanox HCA 25204 here is the command line: /usr/local/bin/ibv_srq_pingpong --port=19872 --ib-dev=mthca0 --ib-port=1 --num-qp=255 backtrace: Sep 19 09:37:48 swlab153 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000030 RIP: Sep 19 09:37:48 swlab153 kernel: {:ib_mthca:get_wqe+48} Sep 19 09:37:48 swlab153 kernel: PML4 143fe3067 PGD 13fe4d067 PMD 0 Sep 19 09:37:48 swlab153 kernel: Oops: 0000 [1] SMP Sep 19 09:37:48 swlab153 kernel: CPU 0 Sep 19 09:37:48 swlab153 kernel: Modules linked in: ib_ipoib(U) ib_sa(U) ib_uverbs(U) ib_umad(U) ib_mthca(U) ib_mad(U) ib_core(U) nfsd expor tfs md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd mst_pciconf(U) mst_pci(U) sunrpc ds yenta_socket pcmcia_core dm_mod bu tton battery ac uhci_hcd ehci_hcd hw_random e1000 floppy ext3 jbd ata_piix libata aic79xx sd_mod scsi_mod Sep 19 09:37:48 swlab153 kernel: Pid: 1659, comm: ibv_srq_pingpon Not tainted 2.6.9-5.ELsmp Sep 19 09:37:48 swlab153 kernel: RIP: 0010:[] {:ib_mthca:get_wqe+48} Sep 19 09:37:48 swlab153 kernel: RSP: 0018:0000010142e41da0 EFLAGS: 00010206 Sep 19 09:37:48 swlab153 kernel: RAX: 0000000000000030 RBX: 000001015c8fa7d8 RCX: 0000000000000005 Sep 19 09:37:48 swlab153 kernel: RDX: 0000000000000fe0 RSI: 00000000000001ff RDI: 0000000000000000 Sep 19 09:37:48 swlab153 kernel: RBP: 000001015c556580 R08: 0000000000000000 R09: 0000000000000002 Sep 19 09:37:48 swlab153 kernel: R10: 0000000000000000 R11: 0000000000000002 R12: 000001015c8fa000 Sep 19 09:37:48 swlab153 kernel: R13: 000001015f5d28c0 R14: 000001015b157400 R15: 0000010142e41e48 Sep 19 09:37:48 swlab153 kernel: FS: 0000002a95674fe0(0000) GS:ffffffff804bf300(0000) 
knlGS:0000000000000000 Sep 19 09:37:48 swlab153 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Sep 19 09:37:48 swlab153 kernel: CR2: 0000000000000030 CR3: 0000000000101000 CR4: 00000000000006e0 Sep 19 09:37:48 swlab153 kernel: Process ibv_srq_pingpon (pid: 1659, threadinfo 0000010142e40000, task 0000010140c96030) Sep 19 09:37:48 swlab153 kernel: Stack: ffffffffa01690d3 0000000000518000 0000000000518000 0000000000000000 Sep 19 09:37:48 swlab153 kernel: 000001015c556580 000001015b157400 000001015b15f6c0 0000010142e41e68 Sep 19 09:37:48 swlab153 kernel: ffffffffa0166668 00007ffe0c002400 Sep 19 09:37:48 swlab153 kernel: Call Trace:{:ib_mthca:mthca_alloc_srq+1545} Sep 19 09:37:48 swlab153 kernel: {:ib_mthca:mthca_create_srq+185} Sep 19 09:37:48 swlab153 kernel: {:ib_uverbs:ib_uverbs_create_srq+349} Sep 19 09:37:48 swlab153 kernel: {:ib_uverbs:ib_uverbs_srq_event_handler+0} Sep 19 09:37:48 swlab153 kernel: {:ib_uverbs:ib_uverbs_write+139} Sep 19 09:37:48 swlab153 kernel: {vfs_write+207} {sys_write+69} Sep 19 09:37:48 swlab153 kernel: {system_call+126} Sep 19 09:37:48 swlab153 kernel: Sep 19 09:37:48 swlab153 kernel: Code: 48 03 14 38 48 89 d0 c3 53 48 89 f3 44 8b 4e 70 8b 43 48 48 Sep 19 09:37:48 swlab153 kernel: RIP {:ib_mthca:get_wqe+48} RSP <0000010142e41da0> Sep 19 09:37:48 swlab153 kernel: CR2: 0000000000000030 did anyone see this oops too? Dotan Barak Software Verification Engineer Mellanox Technologies LTD Tel: +972-4-9097200 Ext: 231 Fax: +972-4-9593245 P.O. Box 86 Yokneam 20692 ISRAEL. Home: +972-77-8841095 Cell: 052-4222383 [ May the fork be with you ] -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From halr at voltaire.com Mon Sep 19 04:32:24 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2005 07:32:24 -0400 Subject: [openib-general] RMPP Message Format Errors In-Reply-To: <432C8110.4070301@mellanox.co.il> References: <1126864484.5425.11385.camel@hal.voltaire.com> <432C8110.4070301@mellanox.co.il> Message-ID: <1127129539.4401.19026.camel@hal.voltaire.com> On Sat, 2005-09-17 at 16:48, Eitan Zahavi wrote: > I have rerun the test on a fresh build from the main trunk. > What I see now is the following: > > I have noticed that the last record in each RMPP GetTableResp are only partly filled. > I have traced that to the actual sent data on the wire so I guess there > is another bug in the sender. I attach here the text dump of the analyzer trace. > You can see how the "node description" field is cut in the NodeInfoRec query and how > the last PortInfoRec is mostly zeros in the second MAD. Yes, there is a bug in the amount of the buffer copied on the send side in user_mad which truncates the last record. Patch for this shortly. 
-- Hal

From halr at voltaire.com Mon Sep 19 04:38:51 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2005 07:38:51 -0400
Subject: [openib-general] [PATCH] user_mad: Fix length of user buffer copied when sending RMPP
Message-ID: <1127129930.4401.19073.camel@hal.voltaire.com>

user_mad: Fix length of user buffer copied when sending RMPP

Signed-off-by: Hal Rosenstock

Index: user_mad.c
===================================================================
--- user_mad.c (revision 3472)
+++ user_mad.c (working copy)
@@ -273,6 +273,7 @@ static ssize_t ib_umad_write(struct file
 	u8 method;
 	__be64 *tid;
 	int ret, length, hdr_len, data_len, rmpp_hdr_size;
+	int class_hdr_len = 0;
 	int rmpp_active = 0;
 
 	if (count < sizeof (struct ib_user_mad))
@@ -338,10 +339,12 @@ static ssize_t ib_umad_write(struct file
 		if (rmpp_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_ADM) {
 			hdr_len = offsetof(struct ib_sa_mad, data);
 			data_len = length;
+			class_hdr_len = sizeof(struct ib_sa_hdr);
 		} else if ((rmpp_mad->mad_hdr.mgmt_class >= IB_MGMT_CLASS_VENDOR_RANGE2_START) &&
 			   (rmpp_mad->mad_hdr.mgmt_class <= IB_MGMT_CLASS_VENDOR_RANGE2_END)) {
 			hdr_len = offsetof(struct ib_vendor_mad, data);
 			data_len = length - hdr_len;
+			class_hdr_len = 4;
 		} else {
 			ret = -EINVAL;
 			goto err_ah;
@@ -390,7 +393,7 @@ static ssize_t ib_umad_write(struct file
 		/* Now, copy rest of message from user into send buffer */
 		if (copy_from_user(((struct ib_rmpp_mad *) packet->msg->mad)->data,
 				   buf + sizeof (struct ib_user_mad) + rmpp_hdr_size,
-				   length)) {
+				   length + class_hdr_len)) {
 			ret = -EFAULT;
 			goto err_msg;
 		}

From halr at voltaire.com Mon Sep 19 06:03:38 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2005 09:03:38 -0400 Subject: [openib-general] [RFC] send side QP redirection In-Reply-To: References: Message-ID: <1127135018.4401.19694.camel@hal.voltaire.com> On Sat, 2005-09-17 at 14:58, Sean Hefty wrote: > >> struct ib_mad_av { > >> struct ib_ah *ah; > >> u32 remote_qpn; > >> u32 remote_qkey; > >> u16
pkey_index; > >> }; > > > >What about SL and the other redirect GRH fields (TC and FL) ? > > These would have been specified through the ib_ah_attr when the destination was > added. I think that these are the four fields needed to allocate and send a > MAD. To clarify, an ib_mad_av specifies a tuple, which I > refer to as a destination. Aren't LID and GID also part of ib_ah_attr struct too ? > >> struct ib_mad_av* ib_insert_mad_dest(struct ib_mad_agent*, > >> struct ib_ah_attr*, > >> mgmt_class, lid, gid, pkey); > > > >So the last three parameters are transferred from the received > >ClassPortInfo. > > Not exactly, the destination is inserted based on mgmt_class, lid, and gid > before sending any request. These are likely coming from a path record. The > pkey is translated into an index and returned as part of the MAD address vector. > Basically, a user would call this routine in place of ib_create_ah. But this call takes an ib_ah_attr struct so I don't see how this call is in place of ib_create_ah. > >> struct ib_mad_av* ib_get_mad_av(struct ib_mad_agent*, > >> mgmt_class, lid, gid); > > > >I'm missing why LID and GID are needed here and why pkey is missing. > >Is there more than 1 struct ib_mad_av allowed per mgmt_class ? > > There will be one ib_mad_av per mgmt_class per remote destination. A remote > destination is identified by the class and LID. (For the initial > implementation, the GID will essentially be ignored.) Got it. > >> /* > >> * TBD: need to determine when to remove a destination. Can remove > >> * always if references go to 0. Can add a delay before removal. Can > >> * maintain destinations that have been redirected. ? > >> */ > >> void ib_free_mad_av(struct ib_mad_av*); > >> > >> ib_redirect_mads(struct ib_mad_av*, struct ib_mad_recv_wc*, > >> struct ib_class_port_info*); Does this return void or int ? > >> /* GID redirection with unknown LID would be deferred */ > >> int ib_get_mad_redirect_path(...); > > > >Just curious why ? 
> > The spec permits redirection to another GID without specifying the LID. This > requires that the requester send a path record query to the SA to obtain the > rest of the information before it can send to the redirected QP. This results > in an asynchronous operation, which becomes more difficult to deal with. The > original MAD and others destined for that QP now need to be queued until the > query completes and the full destination is known. I can see why this functionality should be deferred from a complexity standpoint. Any idea where this might be used rather than LID redirection ? > >> Questions or thoughts? > >Is multiple redirection handled by this ? > > It should be. ib_redirect_mads uses the existing ib_mad_av to record that > redirection has occurred. This should allow for multiple redirection. Understood. If multiple redirections occurred, this would be called multiple times. > >Would these semantics need to be extended to user space by user_mad ? > > I can't think of a reason why they couldn't. It would only be called by > userspace clients that initiate sending MADs. That would be the next step. -- Hal From guyg at voltaire.com Mon Sep 19 07:54:53 2005 From: guyg at voltaire.com (Guy German) Date: Mon, 19 Sep 2005 17:54:53 +0300 Subject: [openib-general] [RFC] CMA - generic CM implementaion for IB Message-ID: <432ED13D.2010800@voltaire.com> This is a draft of a generic cm abstraction layer implementation for infiniband. I would like to get your comments on it. Disclaimer: ---------- It is just a skeleton implementation, *very* basically tested (it compiles). There are things not implemented yet, e.g.: - ib_cma_get_device - ib_cma_get_src_ip - some cm event cases - arp retries (maybe this should be implemented in at.c) - protection from destroy while in callback (route/path) - APM The implementation is based on the openib kdapl implementation, therefore I added the Mellanox and NetApp copyrights.
If other copyrights need to be added, I will do so immediately. I am attaching the files to this mail and I will send them inlined in 2 separate mails (one for the header and one for the c implementation). Thanks, Guy -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: cma.c URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: ib_cma.h URL: From guyg at voltaire.com Mon Sep 19 07:55:30 2005 From: guyg at voltaire.com (Guy German) Date: Mon, 19 Sep 2005 17:55:30 +0300 (IDT) Subject: [openib-general][PATCH][RFC]: CMA header Message-ID: /* * Copyright (c) 2005 Voltaire Inc. All rights reserved. * * This Software is licensed under one of the following licenses: * * 1) under the terms of the "Common Public License 1.0" a copy of which is * available from the Open Source Initiative, see * http://www.opensource.org/licenses/cpl.php. * * 2) under the terms of the "The BSD License" a copy of which is * available from the Open Source Initiative, see * http://www.opensource.org/licenses/bsd-license.php. * * 3) under the terms of the "GNU General Public License (GPL) Version 2" a * copy of which is available from the Open Source Initiative, see * http://www.opensource.org/licenses/gpl-license.php. * * Licensee has the right to choose one of the above licenses. * * Redistributions of source code must retain the above copyright * notice and one of the license notices. * * Redistributions in binary form must reproduce both the above copyright * notice, one of the license notices in the documentation * and/or other materials provided with the distribution. * */ /* * - for calling accept/reject or disconnect on the passive side you need to * use the cma handle received in the ib_cma_listen callback.
* - cma_id is created when calling connect or listen and destroyed when * receiving disconnected/rejected/unreachable events on either the active * side (connect cb) or the passive side (accept cb) */ #ifndef IB_CMA_H #define IB_CMA_H #include #include #include enum ib_cma_event { IB_CMA_EVENT_ESTABLISHED = 1, IB_CMA_EVENT_REJECTED, IB_CMA_EVENT_NON_PEER_REJECTED, IB_CMA_EVENT_DISCONNECTED, IB_CMA_EVENT_UNREACHABLE }; enum ib_qos { IB_QOS_BEST_EFFORT = 0, IB_QOS_HIGH_THROUGHPUT = (1 << 0), IB_QOS_LOW_LATENCY = (1 << 1), IB_QOS_ECONOMY = (1 << 2), IB_QOS_PREMIUM = (1 << 3) }; enum ib_connect_flags { IB_CONNECT_DEFAULT_FLAG = 0x00, IB_CONNECT_MULTIPATH_FLAG = 0x01 }; typedef void (*ib_cma_addr_handler)(struct sockaddr *src_ip, void *context); typedef void (*ib_cma_ac_handler)(enum ib_cma_event event, void *context); typedef void (*ib_cma_event_handler)(enum ib_cma_event event, void *context, const void *private_data); typedef void (*ib_cma_listen_handler)(void *cma_id, struct ib_device *device, void *private_data, void *context); struct ib_cma_conn { struct ib_qp *qp; struct ib_qp_attr *qp_attr; struct sockaddr *dst_ip; __be64 service_id; struct ib_device *device; void *context; ib_cma_event_handler cma_event_handler; const void *private_data; u8 private_data_len; enum ib_qos qos; enum ib_connect_flags connect_flags; }; /** * ib_cma_get_device - Returns the device and port to be used according * to the destination ip address (this can be determined according * to the local routing table). Call this function before * creating the qp. If using link-local IPv6 addresses * there is no need to call this function.
* @remote_address - The destination address for the connection * @qos - desired quality of service * @device - The device to use (output) * @port - port to use (output) * */ int ib_cma_get_device(struct sockaddr *remote_address, enum ib_qos qos, struct ib_device **device, u8 *port); /** * ib_cma_create_qp - creates and returns a qp for a specified pd and port * and modifies it to the INIT state. * @pd - The protection domain associated with the QP, created on the device * retrieved by ib_cma_get_device * @port - The port to use when modifying the QP to INIT * @qp - The qp created (out) * @init_attr - attributes for qp creation */ int ib_cma_create_qp(struct ib_pd *pd, u8 port, struct ib_qp **qp, struct ib_qp_init_attr *init_attr); /** * ib_cma_connect - this is the connect request function, called by * the active side. The consumer registers an upcall that will be * invoked by the cma with an appropriate connection event * notification (established/rejected/disconnected etc.) * @cma_conn: This structure contains the following connection parameters: * @qp: qp for establishing the connection * @qp_attr: only relevant attributes are used * @dst_ip: destination ip address * @service_id: destination service id (port) * @device: device on which the connection is made * @context: context to be returned in the callback * @cma_event_handler: the upcall function for the active side * @private_data: private data to be received at the listener upcall * @private_data_len: private data length (max 255) * @qos: quality of service for the rc * @connect_flags: default or multipath connection * @cma_id: This returned handle is a void* (different in ib and iwarp); * in ib it is a pointer to struct cma_context. */ int ib_cma_connect(struct ib_cma_conn *cma_conn, void **cma_id); /** * ib_cma_disconnect - this function disconnects the rc. It can be * called by either the passive or active side * @qp: the connected qp to disconnect * @cma_id: On the active side, this handle is the one returned * when ib_cma_connect was called.
* On the passive side, this handle was received in the cma_listen callback */ int ib_cma_disconnect(struct ib_qp *qp, void *cma_id); /** * ib_cma_listen - this function is called by the passive side. It listens * on the specified port (ib service id) for incoming * connection requests * @device: the device to listen on * @address: local address to listen on (currently ignored) * @service_id: service id (port) to listen on * @context: user context to be returned in the callback * @cm_listen_handler: the listen callback * @cma_id: cma handle for the passive side */ int ib_cma_listen(struct ib_device *device, struct sockaddr *address, __be64 service_id, void *context, ib_cma_listen_handler cm_listen_handler, void **cma_id); /** * ib_cma_destroy - this function is called on the passive side to * stop listening on a given service id * @cma_id: the cma handle returned when ib_cma_listen was called */ int ib_cma_destroy(void *cma_id); /** * ib_cma_accept - call on the passive side to accept a connection request. * Note that if the function returns an error, a reject message has been * sent to the remote side and the cma_id has been destroyed * @cma_id: pass the handle that was returned in the cma_listen callback for * this connection * @qp: the connection's qp * @private_data: private data to send back to the initiator * @private_data_len: private data length * @context: user context to be returned in the callback * @cm_accept_handler: the cma accept callback, triggered when the RTU is * received */ int ib_cma_accept(void *cma_id, struct ib_qp *qp, const void *private_data, u8 private_data_len, void *context, ib_cma_ac_handler cm_accept_handler); /** * ib_cma_reject - call on the passive side to reject a connection request. * This call destroys the cma_id, so by the time the active side receives * the reject, the cma_id is already destroyed.
* @cma_id: this handle was received in the cma_listen callback * @private_data: private data to send back to the initiator * @private_data_len: private data length */ int ib_cma_reject(void *cma_id, const void *private_data, u8 private_data_len); /** * ib_cma_get_src_ip - this function asynchronously finds the * source ip of the initiator from cma_id * @cma_id: the cma_id will have to include the path data received * in the request handler * @addr_handler: callback invoked with the source ip of the initiator * @context: user context to be passed to addr_handler */ int ib_cma_get_src_ip(void *cma_id, ib_cma_addr_handler addr_handler, void *context); #endif /* IB_CMA_H */ From guyg at voltaire.com Mon Sep 19 07:56:39 2005 From: guyg at voltaire.com (Guy German) Date: Mon, 19 Sep 2005 17:56:39 +0300 (IDT) Subject: [openib-general][PATCH][RFC]: CMA IB implementation Message-ID: /* * Copyright (c) 2005 Voltaire Inc. All rights reserved. * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved. * * This Software is licensed under one of the following licenses: * * 1) under the terms of the "Common Public License 1.0" a copy of which is * available from the Open Source Initiative, see * http://www.opensource.org/licenses/cpl.php. * * 2) under the terms of the "The BSD License" a copy of which is * available from the Open Source Initiative, see * http://www.opensource.org/licenses/bsd-license.php. * * 3) under the terms of the "GNU General Public License (GPL) Version 2" a * copy of which is available from the Open Source Initiative, see * http://www.opensource.org/licenses/gpl-license.php. * * Licensee has the right to choose one of the above licenses. * * Redistributions of source code must retain the above copyright * notice and one of the license notices. * * Redistributions in binary form must reproduce both the above copyright * notice, one of the license notices in the documentation * and/or other materials provided with the distribution.
* */ #include "ib_cma.h" // move to #include #include MODULE_AUTHOR("Guy German"); MODULE_DESCRIPTION("Generic RDMA CM - Infiniband implementation"); MODULE_LICENSE("Dual BSD/GPL"); #define PFX "ib_cma: " #define CMA_TARGET_MAX 4 #define CMA_INITIATOR_DEPTH 4 #define CMA_RC_RETRY_COUNT 7 #define CMA_RNR_RETRY_COUNT 6 #define CMA_CM_RESPONSE_TIMEOUT 20 /* 4 sec */ #define CMA_MAX_CM_RETRIES 0 enum cma_close_flags { CMA_CLOSE_ABRUPT = 0, CMA_CLOSE_GRACEFUL }; struct accept_callback { ib_cma_ac_handler func; void *context; }; struct listen_callback { ib_cma_listen_handler func; void *context; }; struct cma_context { struct ib_cm_id *cm_id; struct ib_cma_conn cma_conn; struct ib_cm_req_param cma_param; struct ib_at_ib_route cma_route; struct ib_sa_path_rec cma_path; struct ib_at_completion ibat_comp; struct accept_callback accept_cb; struct listen_callback listen_cb; struct cma_context *creq_cma_ctx; spinlock_t lock; unsigned long flags; int in_callback; int destroy; }; /* Static functions */ /* * approximately transforms microseconds to 4.096us*2^x* * 63(+8) is max return */ static inline u8 us_to_cmt(u32 us) { u32 ms = us, converged = 2; u8 i; do_div(ms, 1000UL); if (ms < 2) return 8; for (i = 1; i < 63; i++) { if (converged >= ms) break; converged *= 2; } return i+8; } static int cma_modify_qp_state(struct ib_cm_id *cm_id, struct ib_qp *qp, enum ib_qp_state qp_state, int qp_attr_mask) { struct ib_qp_attr qp_attr; int status = 0; printk(KERN_DEBUG PFX "%s: enter >>> modify to %d\n", __func__, qp_state); if (qp == NULL) return -EINVAL; memset(&qp_attr, 0, sizeof qp_attr); qp_attr.qp_state = qp_state; if (cm_id && !qp_attr_mask) status = ib_cm_init_qp_attr(cm_id, &qp_attr, &qp_attr_mask); if (!status) status = ib_modify_qp(qp, &qp_attr, qp_attr_mask); if (status) printk(KERN_ERR PFX "%s: qp_state=%d status%d\n", __func__, qp_state, status); return status; } static int destroy_cma_ctx(struct cma_context *cma_ctx) { if(!IS_ERR(cma_ctx->cm_id)) 
ib_destroy_cm_id(cma_ctx->cm_id); kfree(cma_ctx->cma_param.private_data); /* kfree(NULL) is a no-op */ kfree(cma_ctx); return 0; } static int cma_disconnect(struct ib_qp *qp, struct cma_context *cma_ctx, enum cma_close_flags cflags) { int status; if (cma_ctx == NULL) goto modqp; if (cflags == CMA_CLOSE_ABRUPT) status = destroy_cma_ctx(cma_ctx); else if (cflags == CMA_CLOSE_GRACEFUL){ status = ib_send_cm_dreq(cma_ctx->cm_id, NULL, 0); } modqp: status = cma_modify_qp_state(NULL, qp, IB_QPS_ERR, IB_QP_STATE); return status; } static void cma_connection_callback(struct cma_context *cma_ctx, const enum ib_cma_event event, const void *private_data) { ib_cma_event_handler conn_cb = cma_ctx->cma_conn.cma_event_handler; /* save the consumer context now: cma_disconnect() below frees cma_ctx */ void *conn_context = cma_ctx->cma_conn.context; struct ib_qp *qp = cma_ctx->cma_conn.qp; switch (event) { case IB_CMA_EVENT_ESTABLISHED: break; case IB_CMA_EVENT_DISCONNECTED: case IB_CMA_EVENT_REJECTED: case IB_CMA_EVENT_UNREACHABLE: case IB_CMA_EVENT_NON_PEER_REJECTED: cma_disconnect(qp, cma_ctx, CMA_CLOSE_ABRUPT); break; default: printk(KERN_ERR PFX "%s: unknown event !!\n", __func__); } printk(KERN_DEBUG PFX "%s: event=%d\n", __func__, event); conn_cb(event, conn_context, private_data); } static inline int cma_rep_recv(struct cma_context *cma_ctx, struct ib_cm_event *rep_cm_event) { int status; status = cma_modify_qp_state(cma_ctx->cm_id, cma_ctx->cma_conn.qp, IB_QPS_RTR, 0); if (status) { printk(KERN_ERR PFX "%s: failed to modify QPS_RTR %d\n", __func__, status); return status; } status = cma_modify_qp_state(cma_ctx->cm_id, cma_ctx->cma_conn.qp, IB_QPS_RTS, 0); if (status) { printk(KERN_ERR PFX "%s: failed to modify QPS_RTS %d\n", __func__, status); return status; } status = ib_send_cm_rtu(cma_ctx->cm_id, NULL, 0); if (status) { printk(KERN_ERR PFX "%s: failed to send cm rtu %d\n", __func__, status); return status; } return 0; } static int cma_active_cb_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) { int status = 0; enum
ib_cma_event cma_event = 0; struct cma_context *cma_ctx = cm_id->context; printk(KERN_DEBUG PFX "%s: enter >>> cm_id=%p cma_ctx=%p\n",__func__, cm_id, cma_ctx); switch (event->event) { case IB_CM_REQ_ERROR: cma_event = IB_CMA_EVENT_UNREACHABLE; break; case IB_CM_REJ_RECEIVED: cma_event = IB_CMA_EVENT_NON_PEER_REJECTED; break; case IB_CM_DREP_RECEIVED: case IB_CM_TIMEWAIT_EXIT: cma_event = IB_CMA_EVENT_DISCONNECTED; break; case IB_CM_REP_RECEIVED: status = cma_rep_recv(cma_ctx, event); if (!status) cma_event = IB_CMA_EVENT_ESTABLISHED; else cma_event = IB_CMA_EVENT_DISCONNECTED; break; case IB_CM_DREQ_RECEIVED: ib_send_cm_drep(cm_id, NULL, 0); cma_event = IB_CMA_EVENT_DISCONNECTED; break; case IB_CM_DREQ_ERROR: break; default: printk(KERN_WARNING PFX "%s: cm event (%d) not handled\n", __func__, event->event); break; } printk(KERN_WARNING PFX "%s: cm_event=%d cma_event=%d\n", __func__, event->event, cma_event); if (cma_event) cma_connection_callback(cma_ctx, cma_event, event->private_data); return status; } static struct cma_context *get_cma_ctx(struct ib_cm_id *cm_id, struct ib_cm_event *event) { struct cma_context *new_cma_ctx; int status; if (event->event != IB_CM_REQ_RECEIVED) return cm_id->context; if (((struct cma_context *)cm_id->context)->cm_id != cm_id) printk(KERN_DEBUG PFX "%s: old_cm_id=%p new_cm_id=%p\n", __func__, ((struct cma_context *)cm_id->context)->cm_id, cm_id); new_cma_ctx = kmalloc(sizeof *new_cma_ctx, GFP_KERNEL); if (!new_cma_ctx) { status = ib_send_cm_rej(cm_id, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, NULL, 0); return NULL; } memset(new_cma_ctx, 0, sizeof *new_cma_ctx); new_cma_ctx->cm_id = cm_id; new_cma_ctx->creq_cma_ctx = cm_id->context; cm_id->context = new_cma_ctx; return new_cma_ctx; } static int cma_passive_cb_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) { struct cma_context *cma_ctx; ib_cma_listen_handler crcb; ib_cma_ac_handler accb; void *cr_ctx, *ac_ctx; int status = 0; printk(KERN_DEBUG PFX "%s: enter >>> 
cm_id=%p\n",__func__, cm_id); cma_ctx = get_cma_ctx(cm_id, event); if (!cma_ctx) return -EINVAL; accb = cma_ctx->accept_cb.func; ac_ctx = cma_ctx->accept_cb.context; switch (event->event) { case IB_CM_REQ_RECEIVED: crcb = cma_ctx->creq_cma_ctx->listen_cb.func; cr_ctx = cma_ctx->creq_cma_ctx->listen_cb.context; memcpy(&cma_ctx->cma_path, ((struct ib_cm_req_event_param *) &event->param)->primary_path, sizeof cma_ctx->cma_path); crcb(cma_ctx, cm_id->device, event->private_data, cr_ctx); break; case IB_CM_REP_ERROR: accb(IB_CMA_EVENT_UNREACHABLE, ac_ctx); break; case IB_CM_REJ_RECEIVED: accb(IB_CMA_EVENT_REJECTED, ac_ctx); break; case IB_CM_RTU_RECEIVED: status = cma_modify_qp_state(cma_ctx->cm_id, cma_ctx->cma_conn.qp, IB_QPS_RTS, 0); if (!status) accb(IB_CMA_EVENT_ESTABLISHED, ac_ctx); else { accb(IB_CMA_EVENT_DISCONNECTED, ac_ctx); status = cma_disconnect(cma_ctx->cma_conn.qp, cma_ctx, CMA_CLOSE_ABRUPT); } break; case IB_CM_DREQ_RECEIVED: ib_send_cm_drep(cm_id, NULL, 0); break; case IB_CM_DREQ_ERROR: break; case IB_CM_DREP_RECEIVED: case IB_CM_TIMEWAIT_EXIT: accb(IB_CMA_EVENT_DISCONNECTED, ac_ctx); status = cma_disconnect(cma_ctx->cma_conn.qp, cma_ctx, CMA_CLOSE_ABRUPT); break; default: break; } destroy_cma_ctx(cma_ctx); return status; } static void cma_path_handler(u64 req_id, void *context, int rec_num) { struct cma_context *cma_ctx = context; enum ib_cma_event event; int status = 0; if (!cma_ctx) { printk(KERN_ERR PFX "%s: context received null\n",__func__); return; } if (rec_num <= 0) { event = IB_CMA_EVENT_UNREACHABLE; goto error; } cma_ctx->cma_param.primary_path = &cma_ctx->cma_path; cma_ctx->cma_param.alternate_path = NULL; printk(KERN_DEBUG PFX "%s: dlid=%d slid=%d pkey=%d mtu=%d sid=%llx " "qpn=%d qpt=%d psn=%d prd=%s respres=%d rcm=%d flc=%d " "cmt=%d rtrc=%d rntrtr=%d maxcm=%d \n",__func__, cma_ctx->cma_param.primary_path->dlid , cma_ctx->cma_param.primary_path->slid , cma_ctx->cma_param.primary_path->pkey , cma_ctx->cma_param.primary_path->mtu , 
cma_ctx->cma_param.service_id, cma_ctx->cma_param.qp_num, cma_ctx->cma_param.qp_type, cma_ctx->cma_param.starting_psn, (char *)cma_ctx->cma_param.private_data, cma_ctx->cma_param.responder_resources, cma_ctx->cma_param.remote_cm_response_timeout, cma_ctx->cma_param.flow_control, cma_ctx->cma_param.local_cm_response_timeout, cma_ctx->cma_param.retry_count, cma_ctx->cma_param.rnr_retry_count, cma_ctx->cma_param.max_cm_retries); status = ib_send_cm_req(cma_ctx->cm_id, &cma_ctx->cma_param); if (status) { printk(KERN_ERR PFX "%s: cm_req failed %d\n",__func__, status); event = IB_CMA_EVENT_REJECTED; goto error; } return; error: printk(KERN_ERR PFX "%s: return error %d \n",__func__, status); cma_connection_callback(cma_ctx, event, NULL); } static void cma_route_handler(u64 req_id, void *context, int rec_num) { struct cma_context *cma_ctx = context; enum ib_cma_event event; int status = 0; if (rec_num <= 0) { event = IB_CMA_EVENT_UNREACHABLE; goto error; } cma_ctx->ibat_comp.fn = &cma_path_handler; cma_ctx->ibat_comp.context = cma_ctx; status = ib_at_paths_by_route(&cma_ctx->cma_route, 0, &cma_ctx->cma_path, 1, &cma_ctx->ibat_comp); if (status) { event = IB_CMA_EVENT_DISCONNECTED; goto error; } return; error: printk(KERN_ERR PFX "%s: return error %d \n",__func__, status); cma_connection_callback(cma_ctx, event, NULL); } /* API functions */ int ib_cma_create_qp(struct ib_pd *pd, u8 port, struct ib_qp **qp_in, struct ib_qp_init_attr *init_attr) { struct ib_qp_attr qp_attr; int qp_attr_mask; struct ib_qp *qp; qp = ib_create_qp(pd, init_attr); if (IS_ERR(qp)) return PTR_ERR(qp); *qp_in = qp; printk(KERN_DEBUG PFX "%s: qp created (%p)\n",__func__, qp); memset(&qp_attr, 0, sizeof qp_attr); qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS | IB_QP_PKEY_INDEX | IB_QP_PORT; qp_attr.qp_access_flags = IB_ACCESS_REMOTE_READ | IB_ACCESS_REMOTE_WRITE; qp_attr.qp_state = IB_QPS_INIT; qp_attr.pkey_index = 0; qp_attr.port_num = port; printk(KERN_DEBUG PFX "%s: call ib_modify_qp
(qp_attr_mask=%x)\n", __func__, qp_attr_mask); return ib_modify_qp(qp, &qp_attr, qp_attr_mask); }; EXPORT_SYMBOL(ib_cma_create_qp); int ib_cma_connect(struct ib_cma_conn *cma_conn, void **cma_id) { struct cma_context *cma_ctx; int status; u32 timeout; u32 dst_ip; dst_ip = (((struct sockaddr_in *)(cma_conn->dst_ip))->sin_addr).s_addr; printk(KERN_DEBUG PFX "%s: enter >>> dst_ip=%d.%d.%d.%d\n",__func__, (dst_ip & 0x000000ff), (dst_ip & 0x0000ff00) >> 8, (dst_ip & 0x00ff0000) >> 16, (dst_ip & 0xff000000) >> 24); cma_ctx = kmalloc(sizeof *cma_ctx, GFP_KERNEL); if (!cma_ctx) return -ENOMEM; memset(cma_ctx, 0, sizeof *cma_ctx); timeout = us_to_cmt(cma_conn->qp_attr->timeout); cma_ctx->cm_id = ib_create_cm_id(cma_conn->device, cma_active_cb_handler, (void *)cma_ctx); if (IS_ERR(cma_ctx->cm_id)) { printk(KERN_ERR PFX "%s: cm_id creation failed\n", __func__); destroy_cma_ctx(cma_ctx); return -EAGAIN; } else printk(KERN_DEBUG PFX "%s: cm_id created %p\n", __func__, cma_ctx->cm_id); printk(KERN_DEBUG PFX "%s: cma_event_handler=%p\n", __func__, cma_conn->cma_event_handler ); memcpy(&cma_ctx->cma_conn, cma_conn, sizeof *cma_conn); cma_ctx->cma_param.service_id = cma_conn->service_id; cma_ctx->cma_param.qp_num = cma_conn->qp->qp_num; cma_ctx->cma_param.qp_type = IB_QPT_RC; cma_ctx->cma_param.private_data = kmalloc(cma_conn->private_data_len, GFP_KERNEL); memcpy((u8 *)cma_ctx->cma_param.private_data, (u8 *)cma_conn->private_data, cma_conn->private_data_len); cma_ctx->cma_param.private_data_len = cma_conn->private_data_len; cma_ctx->cma_param.responder_resources = CMA_TARGET_MAX; cma_ctx->cma_param.initiator_depth = CMA_INITIATOR_DEPTH; cma_ctx->cma_param.remote_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT; //timeout; cma_ctx->cma_param.local_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT; cma_ctx->cma_param.retry_count = CMA_RC_RETRY_COUNT; cma_ctx->cma_param.rnr_retry_count = CMA_RNR_RETRY_COUNT; cma_ctx->cma_param.max_cm_retries = CMA_MAX_CM_RETRIES; cma_ctx->ibat_comp.fn = 
&cma_route_handler; cma_ctx->ibat_comp.context = cma_ctx; status = ib_at_route_by_ip(dst_ip, 0, 0, 0, &cma_ctx->cma_route, &cma_ctx->ibat_comp); if (status < 0) { printk(KERN_ERR PFX " ib_at_route_by_ip failed (%d)\n", status); destroy_cma_ctx(cma_ctx); return -EAGAIN; } if (status == 1) { printk(KERN_DEBUG PFX "%s: immidiate route - call " "route_handler\n",__func__); cma_route_handler(cma_ctx->ibat_comp.req_id, cma_ctx, 1); } *cma_id = (void *)cma_ctx; return 0; }; EXPORT_SYMBOL(ib_cma_connect); int ib_cma_disconnect(struct ib_qp *qp, void *cma_id) { struct cma_context *cma_ctx = cma_id; int status; status = cma_disconnect(qp, cma_ctx, CMA_CLOSE_ABRUPT); return status; }; EXPORT_SYMBOL(ib_cma_disconnect); int ib_cma_listen(struct ib_device *device, struct sockaddr *address, __be64 service_id, void *context, ib_cma_listen_handler cm_listen_handler, void **cma_id) { struct cma_context *cma_ctx; int status; printk(KERN_DEBUG PFX "%s: enter >> \n",__func__); cma_ctx = kmalloc(sizeof *cma_ctx, GFP_KERNEL); if (!cma_ctx) return -ENOMEM; memset(cma_ctx, 0, sizeof *cma_ctx); cma_ctx->listen_cb.func = cm_listen_handler; cma_ctx->listen_cb.context = context; cma_ctx->cm_id = ib_create_cm_id(device, cma_passive_cb_handler, (void *)cma_ctx); if (IS_ERR(cma_ctx->cm_id)) { printk(KERN_ERR PFX "%s: cm_id creation failed\n", __func__); destroy_cma_ctx(cma_ctx); return -EAGAIN; } else printk(KERN_DEBUG PFX "%s: cm_id created %p\n", __func__, cma_ctx->cm_id); /* `address` is ignored at the moment ... 
*/ status = ib_cm_listen(cma_ctx->cm_id, service_id, 0); if (status) { printk(KERN_ERR PFX "%s: cm_listen failed %d\n", __func__, status); destroy_cma_ctx(cma_ctx); return status; } printk(KERN_INFO PFX "%s:cm_id=%p cma_id=%p\n", __func__, cma_ctx->cm_id, cma_ctx); *cma_id = (void *)cma_ctx; return 0; }; EXPORT_SYMBOL(ib_cma_listen); int ib_cma_destroy(void *cma_id) { return destroy_cma_ctx((struct cma_context *)cma_id); }; EXPORT_SYMBOL(ib_cma_destroy); int ib_cma_accept(void *cma_id, struct ib_qp *qp, const void *private_data, u8 private_data_len, void *context, ib_cma_ac_handler cm_accept_handler) { struct cma_context *cma_ctx = cma_id; struct ib_cm_rep_param passive_params; int status; printk(KERN_DEBUG PFX "%s: enter >> private_data = %s (len=%d)\n", __func__, (char *)private_data, private_data_len); if (private_data_len > IB_CM_REP_PRIVATE_DATA_SIZE) { status = -EINVAL; goto reject; } memset(&passive_params, 0, sizeof passive_params); passive_params.private_data = private_data; passive_params.private_data_len = private_data_len; passive_params.qp_num = qp->qp_num; passive_params.responder_resources = CMA_TARGET_MAX; passive_params.initiator_depth = CMA_INITIATOR_DEPTH; passive_params.rnr_retry_count = CMA_RNR_RETRY_COUNT; status = cma_modify_qp_state(cma_ctx->cm_id, qp, IB_QPS_RTR, 0); if (status) goto reject; cma_ctx->accept_cb.func = cm_accept_handler; cma_ctx->accept_cb.context = context; status = ib_send_cm_rep(cma_ctx->cm_id, &passive_params); if (status) goto reject; printk(KERN_DEBUG PFX "%s: return success\n", __func__); return 0; reject: printk(KERN_ERR PFX "%s: error status %d\n", __func__, status); ib_send_cm_rej(cma_ctx->cm_id, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, NULL, 0); destroy_cma_ctx(cma_ctx); return status; }; EXPORT_SYMBOL(ib_cma_accept); int ib_cma_reject(void *cma_id, const void *private_data, u8 private_data_len) { struct cma_context *cma_ctx = cma_id; int status; if (cma_ctx == NULL) return -EINVAL; status = 
ib_send_cm_rej(cma_ctx->cm_id, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, private_data, private_data_len); destroy_cma_ctx(cma_ctx); return status; }; EXPORT_SYMBOL(ib_cma_reject); int ib_cma_get_device(struct sockaddr *remote_address, enum ib_qos qos, struct ib_device **device, u8 *port) { return 0; }; EXPORT_SYMBOL(ib_cma_get_device); int ib_cma_get_src_ip(void *cma_id, ib_cma_addr_handler addr_handler, void *context) { return 0; }; EXPORT_SYMBOL(ib_cma_get_src_ip); static int cma_init(void) { printk(KERN_WARNING PFX "START generic CM module\n"); return 0; } static void cma_cleanup(void) { printk(KERN_WARNING PFX "EXIT generic CM module\n"); } module_init(cma_init); module_exit(cma_cleanup); From jackm at mellanox.co.il Mon Sep 19 08:21:56 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Mon, 19 Sep 2005 18:21:56 +0300 Subject: [openib-general] recursion depth exceeded in ipoib_workqueue Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30FF168@mtlexch01.mtl.com> environment: HCA Port 1 of Host 1 is connected back-to-back to HCA port 1 of Host 2. A shell script running on Host 1 loads and unloads the openib driver. On Host 2, the openib driver is up and opensm is running. Host 1: while date ; do /etc/init.d/openibd start sleep 3 /etc/init.d/openibd stop sleep 1 done NOTES: a. sleeps were inserted to give time to opensm on host 2 to respond to changes b. 
openibd script attached Problem -- recursion depth exceeded in ipoib_workqueue: /var/log/messages from Host 1 ------------------------------ ib_mthca: Initializing (0000:04:00.0) ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 29 (level, low) -> IRQ 185 run_workqueue: recursion depth exceeded: 4 Call Trace:{flush_cpu_workqueue+87} {wait_for_completion+230} {default_wake_function+0} {lock_timer_base+41} {:ib_ipoib:ipoib_mcast_stop_thread+99} {:ib_ipoib:ipoib_mcast_restart_task+44} {flush_cpu_workqueue+205} {:ib_ipoib:ipoib_mcast_restart_task+0} {lock_timer_base+41} {:ib_ipoib:ipoib_mcast_stop_thread+99} {:ib_ipoib:ipoib_mcast_restart_task+44} {flush_cpu_workqueue+205} {:ib_ipoib:ipoib_mcast_restart_task+0} {lock_timer_base+41} {:ib_ipoib:ipoib_mcast_stop_thread+99} {:ib_ipoib:ipoib_mcast_restart_task+44} {:ib_ipoib:ipoib_mcast_restart_task+0} {worker_thread+478} {default_wake_function+0} {__wake_up_common+67} {default_wake_function+0} {keventd_create_kthread+0} {worker_thread+0} {keventd_create_kthread+0} {kthread+217} {child_rip+8} {keventd_create_kthread+0} {kthread+0} {child_rip+0} Please Note: -- Set Multicast List posts the restart task to the ipoib_workqueue (ipoib_main.c:675) -- ipoib_mcast_restart_task (ipoib_multicast.c) calls ipoib_mcast_stop_thread(), which calls flush_workqueue(ipoib_workqueue) -- so the restart task flushes the work queue it is running from. -- Linux prevents the deadlock by testing whether the flush is called from the same thread (see linux/workqueue.c:223). If it is, Linux flushes the remaining tasks in the work queue (without waiting). This both breaks serialization of tasks in the work queue and can cause the recursion overflow seen above. Jack <> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: openibd Type: application/octet-stream Size: 24304 bytes Desc: not available URL: From halr at voltaire.com Mon Sep 19 08:24:14 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2005 11:24:14 -0400 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <432A5032.7050303@mellanox.co.il> References: <52oe6uuu7f.fsf@cisco.com> <432A5032.7050303@mellanox.co.il> Message-ID: <1127143328.4401.20682.camel@hal.voltaire.com> On Fri, 2005-09-16 at 00:55, Eitan Zahavi wrote: > For the gen2 stack we could use the following "hack": > OpenSM scans all nodes for their description every time it does a full sweep. > So we could cause an extra sweep after each node description change by faking trap 144 > (HCA port capability mask change) and sending it over. Is a full sweep needed to handle a single HCA port capability mask change ? Also, couldn't the NodeDescription change be handled with a light sweep (if SM Get NodeDescription were added there) ? That might be better or are there other issues with doing it that way ? -- Hal From rolandd at cisco.com Mon Sep 19 09:02:23 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 09:02:23 -0700 Subject: [openib-general] Re: [PATCH] IPoIB: Fix SA client retransmission strategy In-Reply-To: <1126969571.4401.815.camel@hal.voltaire.com> (Hal Rosenstock's message of "17 Sep 2005 11:06:12 -0400") References: <1126969571.4401.815.camel@hal.voltaire.com> Message-ID: <52mzm9hwq8.fsf@cisco.com> Thanks, committed. I'm guessing this fixes the problems you reported with retries, right? - R. From rolandd at cisco.com Mon Sep 19 09:04:21 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 09:04:21 -0700 Subject: [openib-general] Re: [PATCH] set eq->nent earlier in mthca_create_eq In-Reply-To: <20050917182316.GD28659@mellanox.co.il> (Michael S. 
Tsirkin's message of "Sat, 17 Sep 2005 21:23:16 +0300") References: <52slw6uv9k.fsf@cisco.com> <20050917182316.GD28659@mellanox.co.il> Message-ID: <52irwxhwmy.fsf@cisco.com> OK, I committed this with the max(, 2) added. - R. From rolandd at cisco.com Mon Sep 19 09:06:29 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 09:06:29 -0700 Subject: [openib-general] Re: mthca_arbel_post_srq_recv/mthca_tavor_post_srq_recv In-Reply-To: <20050917181143.GA28659@mellanox.co.il> (Michael S. Tsirkin's message of "Sat, 17 Sep 2005 21:11:43 +0300") References: <20050917181143.GA28659@mellanox.co.il> Message-ID: <52ek7lhwje.fsf@cisco.com> Michael> Hi, Roland! The code in Michael> mthca_arbel_post_srq_recv/mthca_tavor_post_srq_recv looks Michael> very strange: there seems to be unreachable code, Michael> spinlocks don't seem to be dropped on error, etc. Michael> Further, it seems that the functions return the number of Michael> posted descriptors on error. This differs from post_recv Michael> which always returns an error code on error. Is that Michael> intentional? Am I missing something? Could you comment Michael> on this design please? No, you're right, it's absolutely broken. The code is the result of a screwed-up merge from an earlier version. I fixed it up now. - R. From mshefty at ichips.intel.com Mon Sep 19 09:06:42 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 19 Sep 2005 09:06:42 -0700 Subject: [openib-general] Re: [PATCH] libibcm/libibat disable-libcheck option In-Reply-To: <20050917190414.GC29221@mellanox.co.il> References: <20050915051931.GA7802@mellanox.co.il> <20050917190414.GC29221@mellanox.co.il> Message-ID: <432EE212.70206@ichips.intel.com> Michael S. Tsirkin wrote: > So, any chance of this patch being accepted? > > I really want an option to first configure all libraries, then build them all.
> configure checks break this, but they aren't really needed in a > monolithic build, so an option to disable ib library checks makes sense IMO. > > I don't think there are other ways to do this, are there? I don't have any objection to adding this as an option. - Sean From halr at voltaire.com Mon Sep 19 09:04:46 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2005 12:04:46 -0400 Subject: [openib-general] [PATCH] madeye: Mainly add more SA decode Message-ID: <1127145883.4401.20988.camel@hal.voltaire.com> madeye: Mainly add more SA decode Support SA attributes and add support for some missing SA methods Also, display data for received RMPP messages (next step is to do this on the send side) Also, allow filtering of messages by attribute ID Signed-off-by: Hal Rosenstock Index: madeye.c =================================================================== --- madeye.c (revision 3450) +++ madeye.c (working copy) @@ -1,5 +1,6 @@ /* - * Copyright (c) 2004 Intel Corporation. All rights reserved. + * Copyright (c) 2004, 2005 Intel Corporation. All rights reserved. + * Copyright (c) 2005 Voltaire Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses.
You may choose to be licensed under the terms of the GNU @@ -60,16 +61,19 @@ struct madeye_port { static int smp = 1; static int gmp = 1; static int mgmt_class = 0; +static int attr_id = 0; static int data = 0; module_param(smp, int, 0444); module_param(gmp, int, 0444); module_param(mgmt_class, int, 0444); +module_param(attr_id, int, 0444); module_param(data, int, 0444); MODULE_PARM_DESC(smp, "Display all SMPs (default=1)"); MODULE_PARM_DESC(gmp, "Display all GMPs (default=1)"); MODULE_PARM_DESC(mgmt_class, "Display all MADs of specified class (default=0)"); +MODULE_PARM_DESC(attr_id, "Display add MADs of specified attribute ID (default=0)"); MODULE_PARM_DESC(data, "Display data area of MADs (default=0)"); static char * get_class_name(u8 mgmt_class) @@ -130,6 +134,14 @@ static char * get_method_name(u8 mgmt_cl return "Get table response"; case IB_SA_METHOD_DELETE: return "Delete"; + case IB_SA_METHOD_DELETE_RESP: + return "Delete response"; + case IB_SA_METHOD_GET_MULTI: + return "Get Multi"; + case IB_SA_METHOD_GET_MULTI_RESP: + return "Get Multi response"; + case IB_SA_METHOD_GET_TRACE_TBL: + return "Get Trace Table response"; default: break; } @@ -162,6 +174,58 @@ static void print_status_details(u16 sta } } +static char * get_sa_attr(__be16 attr) +{ + switch(attr) { + case IB_SA_ATTR_CLASS_PORTINFO: + return "Class Port Info"; + case IB_SA_ATTR_NOTICE: + return "Notice"; + case IB_SA_ATTR_INFORM_INFO: + return "Inform Info"; + case IB_SA_ATTR_NODE_REC: + return "Node Record"; + case IB_SA_ATTR_PORT_INFO_REC: + return "PortInfo Record"; + case IB_SA_ATTR_SL2VL_REC: + return "SL to VL Record"; + case IB_SA_ATTR_SWITCH_REC: + return "Switch Record"; + case IB_SA_ATTR_LINEAR_FDB_REC: + return "Linear FDB Record"; + case IB_SA_ATTR_RANDOM_FDB_REC: + return "Random FDB Record"; + case IB_SA_ATTR_MCAST_FDB_REC: + return "Multicast FDB Record"; + case IB_SA_ATTR_SM_INFO_REC: + return "SM Info Record"; + case IB_SA_ATTR_LINK_REC: + return "Link Record"; + case 
IB_SA_ATTR_GUID_INFO_REC: + return "Guid Info Record"; + case IB_SA_ATTR_SERVICE_REC: + return "Service Record"; + case IB_SA_ATTR_PARTITION_REC: + return "Partition Record"; + case IB_SA_ATTR_PATH_REC: + return "Path Record"; + case IB_SA_ATTR_VL_ARB_REC: + return "VL Arb Record"; + case IB_SA_ATTR_MC_MEMBER_REC: + return "MC Member Record"; + case IB_SA_ATTR_TRACE_REC: + return "Trace Record"; + case IB_SA_ATTR_MULTI_PATH_REC: + return "Multi Path Record"; + case IB_SA_ATTR_SERVICE_ASSOC_REC: + return "Service Assoc Record"; + case IB_SA_ATTR_INFORM_INFO_REC: + return "Inform Info Record"; + default: + return ""; + } +} + static void print_mad_hdr(struct ib_mad_hdr *mad_hdr) { printk("MAD version....0x%01x\n", mad_hdr->base_version); @@ -175,7 +239,11 @@ static void print_mad_hdr(struct ib_mad_ print_status_details(be16_to_cpu(mad_hdr->status)); printk("Class specific.0x%02x\n", be16_to_cpu(mad_hdr->class_specific)); printk("Trans ID.......0x%llx\n", mad_hdr->tid); - printk("Attr ID........0x%02x\n", be16_to_cpu(mad_hdr->attr_id)); + if (mad_hdr->mgmt_class == IB_MGMT_CLASS_SUBN_ADM) + printk("Attr ID........0x%02x (%s)\n", be16_to_cpu(mad_hdr->attr_id), + get_sa_attr(be16_to_cpu(mad_hdr->attr_id))); + else + printk("Attr ID........0x%02x\n", be16_to_cpu(mad_hdr->attr_id)); printk("Attr modifier..0x%04x\n", be32_to_cpu(mad_hdr->attr_mod)); } @@ -326,6 +394,8 @@ static void snoop_smi_handler(struct ib_ { if (!smp && send_wr->wr.ud.mad_hdr->mgmt_class != mgmt_class) return; + if (attr_id && send_wr->wr.ud.mad_hdr->attr_id != attr_id) + return; printk("Madeye:sent SMP\n"); if (send_wr->num_sge > 1) { @@ -342,6 +412,8 @@ static void recv_smi_handler(struct ib_m { if (!smp && mad_recv_wc->recv_buf.mad->mad_hdr.mgmt_class != mgmt_class) return; + if (attr_id && mad_recv_wc->recv_buf.mad->mad_hdr.attr_id != attr_id) + return; printk("Madeye:recv SMP\n"); print_smp((struct ib_smp *)&mad_recv_wc->recv_buf.mad->mad_hdr); @@ -353,6 +425,7 @@ static int is_rmpp_mad(struct 
ib_mad_hdr switch (mad_hdr->method) { case IB_SA_METHOD_GET_TABLE: case IB_SA_METHOD_GET_TABLE_RESP: + case IB_SA_METHOD_GET_MULTI_RESP: return 1; default: break; @@ -373,6 +446,8 @@ static void snoop_gsi_handler(struct ib_ if (!gmp && send_wr->wr.ud.mad_hdr->mgmt_class != mgmt_class) return; + if (attr_id && send_wr->wr.ud.mad_hdr->attr_id != attr_id) + return; printk("Madeye:sent GMP\n"); print_mad_hdr(hdr); @@ -381,16 +456,23 @@ static void snoop_gsi_handler(struct ib_ mad = (struct ib_rmpp_mad *) hdr; print_rmpp_hdr(&mad->rmpp_hdr); } + } static void recv_gsi_handler(struct ib_mad_agent *mad_agent, struct ib_mad_recv_wc *mad_recv_wc) { struct ib_mad_hdr *hdr = &mad_recv_wc->recv_buf.mad->mad_hdr; - struct ib_rmpp_mad *mad; + struct ib_rmpp_mad *mad = NULL; + struct ib_sa_mad *sa_mad; + struct ib_vendor_mad *vendor_mad; + u8 *mad_data; + int i, j; if (!gmp && hdr->mgmt_class != mgmt_class) return; + if (attr_id && mad_recv_wc->recv_buf.mad->mad_hdr.attr_id != attr_id) + return; printk("Madeye:recv GMP\n"); print_mad_hdr(hdr); @@ -399,6 +481,40 @@ static void recv_gsi_handler(struct ib_m mad = (struct ib_rmpp_mad *) hdr; print_rmpp_hdr(&mad->rmpp_hdr); } + + if (data) { + if (hdr->mgmt_class == IB_MGMT_CLASS_SUBN_ADM) { + j = IB_MGMT_SA_DATA; + /* Display SA header */ + sa_mad = (struct ib_sa_mad *) &mad_recv_wc->recv_buf.mad; + + if (is_rmpp_mad(hdr)) { + if (mad->rmpp_hdr.rmpp_type != IB_MGMT_RMPP_TYPE_DATA) + return; + } + mad_data = sa_mad->data; + } else { + if (is_rmpp_mad(hdr)) { + j = IB_MGMT_VENDOR_DATA; + /* Display OUI */ + vendor_mad = (struct ib_vendor_mad *) &mad_recv_wc->recv_buf.mad; + printk("Vendor OUI......%01x %01x %01x\n", + vendor_mad->oui[0], + vendor_mad->oui[1], + vendor_mad->oui[2]); + mad_data = vendor_mad->data; + } else { + j = IB_MGMT_MAD_DATA; + mad_data = mad_recv_wc->recv_buf.mad->data; + } + } + for (i = 0; i < j; i++) { + if (i % 16 == 0) + printk("\nData..........."); + printk("%01x ", mad_data[i]); + } + printk("\n"); + } } 
static void madeye_add_one(struct ib_device *device) From halr at voltaire.com Mon Sep 19 09:08:30 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2005 12:08:30 -0400 Subject: [openib-general] Re: [PATCH] IPoIB: Fix SA client retransmission strategy In-Reply-To: <52mzm9hwq8.fsf@cisco.com> References: <1126969571.4401.815.camel@hal.voltaire.com> <52mzm9hwq8.fsf@cisco.com> Message-ID: <1127146110.4401.21012.camel@hal.voltaire.com> On Mon, 2005-09-19 at 12:02, Roland Dreier wrote: > Thanks, committed. > > I'm guessing this fixes the problems you reported with retries, right? Yup. Can it be pushed for 2.6.14 ? (Also, user_mad.c changes should be pushed upstream as well). -- Hal From rolandd at cisco.com Mon Sep 19 09:21:10 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 09:21:10 -0700 Subject: [openib-general] Re: executing the SRQ pingpong example In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEFD2@mtlexch01.mtl.com> (Dotan Barak's message of "Mon, 19 Sep 2005 09:51:26 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEFD2@mtlexch01.mtl.com> Message-ID: <52aci9hvux.fsf@cisco.com> Thanks, I think this patch (already committed) should fix the crash. 
Index: linux-kernel/infiniband/hw/mthca/mthca_srq.c =================================================================== --- linux-kernel/infiniband/hw/mthca/mthca_srq.c (revision 3478) +++ linux-kernel/infiniband/hw/mthca/mthca_srq.c (working copy) @@ -172,6 +172,8 @@ static int mthca_alloc_srq_buf(struct mt scatter->lkey = cpu_to_be32(MTHCA_INVAL_LKEY); } + srq->last = get_wqe(srq, srq->max - 1); + return 0; } @@ -263,7 +265,6 @@ int mthca_alloc_srq(struct mthca_dev *de srq->first_free = 0; srq->last_free = srq->max - 1; - srq->last = get_wqe(srq, srq->max - 1); return 0; From halr at voltaire.com Mon Sep 19 09:23:21 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2005 12:23:21 -0400 Subject: [openib-general] Re: [PATCH] libibcm/libibat disable-libcheck option In-Reply-To: <432EE212.70206@ichips.intel.com> References: <20050915051931.GA7802@mellanox.co.il> <20050917190414.GC29221@mellanox.co.il> <432EE212.70206@ichips.intel.com> Message-ID: <1127146844.4401.21112.camel@hal.voltaire.com> On Mon, 2005-09-19 at 12:06, Sean Hefty wrote: > Michael S. Tsirkin wrote: > > So, any chance of this patch being accepted? > > > > I really want an option to first configure all libraries, then build them all. > > configure checks break this, but they aren't really needed in a > > monolitic build, so an option to disable ib library checks makes sense IMO. > > > > I dont think there are other ways to do this, are there? > > I don't have any objection as adding this as an option. A similar change was accepted to userspace/management; granted the library dependencies there are more complex. Anyhow, I don't think it does any harm to allow this as an option. It's not the normal case so if someone wants to defeat the check intentionally they can. 
-- Hal From rolandd at cisco.com Mon Sep 19 09:33:15 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 09:33:15 -0700 Subject: [openib-general] [PATCH] user_mad: Fix length of user buffer copied when sending RMPP In-Reply-To: <1127129930.4401.19073.camel@hal.voltaire.com> (Hal Rosenstock's message of "19 Sep 2005 07:38:51 -0400") References: <1127129930.4401.19073.camel@hal.voltaire.com> Message-ID: <521x3lhvas.fsf@cisco.com> What version of user_mad.c is this against? It doesn't apply to the latest subversion, since you have the chunk if (copy_from_user(((struct ib_rmpp_mad *) packet->msg->mad)->data, buf + sizeof (struct ib_user_mad) + rmpp_hdr_size, - length)) { + length + class_hdr_len)) { but the current code looks like if (copy_from_user(((struct ib_rmpp_mad *) packet->msg->mad)->data, buf + sizeof (struct ib_user_mad) + rmpp_hdr_size, length - rmpp_hdr_size)) { I don't see how the current code could be wrong: at the beginning of the function, we do: length = count - sizeof (struct ib_user_mad); so length is the size of the buffer passed in by userspace, less the size of our user_mad header. Then in the copy_from_user() call, we're copying from an offset of sizeof (struct ib_user_mad) + rmpp_hdr_size after the beginning of the buffer, so we should copy at most the size of the buffer less that offset, which is exactly length - rmpp_hdr_size. If I'm wrong, can you regenerate your patch against the current code and provide a better changelog entry that describes what you're fixing? - R. 
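Roland's bounds argument can be checked mechanically. The sketch below is a userspace model, not code from user_mad.c: `copy_fits()` and `rmpp_payload_len()` are hypothetical helpers, and the header sizes in the usage note are illustrative placeholders rather than the real `sizeof (struct ib_user_mad)` or RMPP header size.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from user_mad.c): true if copying copy_len
 * bytes starting at byte `offset` of a `count`-byte user buffer stays
 * inside that buffer. Checks offset first so count - offset cannot wrap.
 */
static bool copy_fits(size_t count, size_t offset, size_t copy_len)
{
	return offset <= count && copy_len <= count - offset;
}

/*
 * Model of the RMPP send path under discussion: the payload copy starts
 * after the user header plus one MAD+RMPP header, so the most that can
 * be copied is (count - user_hdr) - rmpp_hdr, i.e. length - rmpp_hdr_size.
 * Returns 0 if the buffer is too short to hold the headers at all.
 */
static size_t rmpp_payload_len(size_t count, size_t user_hdr, size_t rmpp_hdr)
{
	size_t length;

	if (count < user_hdr + rmpp_hdr)
		return 0;
	length = count - user_hdr;	/* length = count - sizeof (struct ib_user_mad) */
	return length - rmpp_hdr;	/* length - rmpp_hdr_size */
}
```

With, say, count = 400, a 64-byte user header and a 36-byte MAD+RMPP header (illustrative numbers only), the `length - rmpp_hdr_size` form copies 300 bytes starting at offset 100 and ends exactly at byte 400. A `length + class_hdr_len` form would request 336 + class_hdr_len bytes from the same offset, overrunning the buffer by rmpp_hdr_size + class_hdr_len bytes.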
From mshefty at ichips.intel.com Mon Sep 19 09:35:38 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 19 Sep 2005 09:35:38 -0700 Subject: [openib-general] [RFC] send side QP redirection In-Reply-To: <1127135018.4401.19694.camel@hal.voltaire.com> References: <1127135018.4401.19694.camel@hal.voltaire.com> Message-ID: <432EE8DA.1090705@ichips.intel.com> Hal Rosenstock wrote: >>>>struct ib_mad_av { >>>> struct ib_ah *ah; >>>> u32 remote_qpn; >>>> u32 remote_qkey; >>>> u16 pkey_index; >>>>}; >>> >>>What about SL and the other redirect GRH fields (TC and FL) ? >> >>These would have been specified through the ib_ah_attr when the destination was >>added. I think that these are the four fields needed to allocate and send a >>MAD. To clarify, an ib_mad_av specifies a tuple, which I >>refer to as a destination. > > Aren't LID and GID also part of ib_ah_attr struct too ? Yes. The idea is that are used to identify a unique remote agent. See next comment. >>>>struct ib_mad_av* ib_insert_mad_dest(struct ib_mad_agent*, >>>> struct ib_ah_attr*, >>>> mgmt_class, lid, gid, pkey); >>> >>>So the last three parameters are transferred from the received >>>ClassPortInfo. >> >>Not exactly, the destination is inserted based on mgmt_class, lid, and gid >>before sending any request. These are likely coming from a path record. The >>pkey is translated into an index and returned as part of the MAD address vector. >>Basically, a user would call this routine in place of ib_create_ah. > > But this call takes an ib_ah_attr struct so I don't see how this call is > in place of ib_create_ah. In place of calling ib_create_ah(struct ib_pd*, struct ib_ah_attr*) to obtain struct ib_ah*, this call is made. Since struct ib_mad_av contains struct ib_ah*, it can be used in place of it. As you pointed out above, the lid and gid are part of struct ib_ah_attr and can be removed as parameters. >>>>/* >>>> * TBD: need to determine when to remove a destination. 
Can remove >>>> * always if references go to 0. Can add a delay before removal. Can >>>> * maintain destinations that have been redirected. ? >>>> */ >>>>void ib_free_mad_av(struct ib_mad_av*); >>>> >>>>ib_redirect_mads(struct ib_mad_av*, struct ib_mad_recv_wc*, >>>> struct ib_class_port_info*); > > Does this return void or int ? This will return int or struct ib_mad_av*, since it may fail. Internally, it will allocate a new ib_mad_av, since the existing one may still be in use. >>>>/* GID redirection with unknown LID would be deferred */ >>>>int ib_get_mad_redirect_path(...); > > I can see why this functionality should be deferred from a complexity > standpoint. Any idea where this might be used rather than LID > redirection ? My guess is only once global routing is defined. - Sean From rolandd at cisco.com Mon Sep 19 09:42:33 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 09:42:33 -0700 Subject: [openib-general] Re: recursion depth exceeded in ipoib_workqueue In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30FF168@mtlexch01.mtl.com> (Jack Morgenstein's message of "Mon, 19 Sep 2005 18:21:56 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E30FF168@mtlexch01.mtl.com> Message-ID: <52wtldggau.fsf@cisco.com> Thanks, good catch. It seems that since ipoib_mcast_restart_task() runs from the same workqueue as the "multicast thread", we don't actually need to stop the thread -- if the restart task is running, we know that the multicast thread isn't running and can't run until the restart task returns. So no need to flush the workqueue... Something like this should work, right?
--- linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c (revision 3480) +++ linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c (working copy) @@ -832,8 +832,6 @@ void ipoib_mcast_restart_task(void *dev_ ipoib_dbg_mcast(priv, "restarting multicast task\n"); - ipoib_mcast_stop_thread(dev); - spin_lock_irqsave(&priv->lock, flags); /* From halr at voltaire.com Mon Sep 19 09:43:44 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2005 12:43:44 -0400 Subject: [openib-general] [PATCH] user_mad: Fix length of user buffer copied when sending RMPP In-Reply-To: <521x3lhvas.fsf@cisco.com> References: <1127129930.4401.19073.camel@hal.voltaire.com> <521x3lhvas.fsf@cisco.com> Message-ID: <1127148222.4401.21294.camel@hal.voltaire.com> On Mon, 2005-09-19 at 12:33, Roland Dreier wrote: > What version of user_mad.c is this against? I already checked it in. There was an earlier change which was just intended to change some formatting but I made a mistake and made part of this change there where I (mistakenly) also eliminated the subtraction of rmpp_hdr_size you cite below. > It doesn't apply to the > latest subversion, since you have the chunk > > if (copy_from_user(((struct ib_rmpp_mad *) packet->msg->mad)->data, > buf + sizeof (struct ib_user_mad) + rmpp_hdr_size, > - length)) { > + length + class_hdr_len)) { > > but the current code looks like > > if (copy_from_user(((struct ib_rmpp_mad *) packet->msg->mad)->data, > buf + sizeof (struct ib_user_mad) + rmpp_hdr_size, > length - rmpp_hdr_size)) { > > I don't see how the current code could be wrong: at the beginning of > the function, we do: > > length = count - sizeof (struct ib_user_mad); > > so length is the size of the buffer passed in by userspace, less the > size of our user_mad header.
Then in the copy_from_user() call, we're > copying from an offset of sizeof (struct ib_user_mad) + rmpp_hdr_size > after the beginning of the buffer, so we should copy at most the size > of the buffer less that offset, which is exactly length - rmpp_hdr_size. The length passed in for RMPP MADs is a little funny. In osm_vendor_ibumad.c::osm_vendor_send for RMPP, the length of the SA MAD header is subtracted off (but this includes the MAD header, the RMPP header, and the SA class header). Even if that length were to be made "more correct", it would only include 1 RMPP header's worth as that is what is in the buffer being transmitted. That approach would require some slightly different changes to user_mad to make the proper adjustments. Would that approach be better ? > If I'm wrong, can you regenerate your patch against the current code > and provide a better changelog entry that describes what you're fixing? I can regenerate the diff of these 2 versions together if you want or redo this again with the other approach which might be clearer. What do you think ? -- Hal From rolandd at cisco.com Mon Sep 19 09:53:49 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 09:53:49 -0700 Subject: [openib-general][PATCH][RFC]: CMA header In-Reply-To: (Guy German's message of "Mon, 19 Sep 2005 17:55:30 +0300 (IDT)") References: Message-ID: <52slw1gfs2.fsf@cisco.com> This isn't horrible, but you seem to have ignored most of the discussion from last month: > int ib_cma_get_device(struct sockaddr *remote_address, > enum ib_qos qos, struct ib_device **device, u8 *port); How are you dealing with hotplug and object lifetime issues here? > int ib_cma_get_src_ip(void *cma_id, ib_cma_addr_handler addr_handler, > void *context); There's no point in making this asynchronous, since we're putting the source/dest information in the CM REQ private data.
> int ib_cma_create_qp(struct ib_pd *pd, u8 port, struct ib_qp **qp, > struct ib_qp_init_attr *init_attr); What's the point of this function? > int ib_cma_listen(struct ib_device *device, struct sockaddr *address, > __be64 service_id, void *context, > ib_cma_listen_handler cm_listen_handler, > void **cma_id); A minor point, but there's no need for the service_id parameter. We'll just use the sin_port (or sin6_port) member of the sockaddr. Similarly struct ib_cma_conn doesn't need the service_id member either. - R. From rolandd at cisco.com Mon Sep 19 09:56:22 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 09:56:22 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: (Guy German's message of "Mon, 19 Sep 2005 17:56:39 +0300 (IDT)") References: Message-ID: <52oe6pgfnt.fsf@cisco.com> > /* > * approximately transforms microseconds to 4.096us*2^x* > * 63(+8) is max return > */ > static inline u8 us_to_cmt(u32 us) { This looks pretty bogus -- it only handles timeouts up to 2^32 microseconds, ie only 4 seconds. But fortunately... > timeout = us_to_cmt(cma_conn->qp_attr->timeout); the only use of it is bogus as well. You never use timeout again, which is good because qp_attr->timeout is not in units of microseconds; it's already in the IB logarithmic scale. - R. 
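Roland's point is that `qp_attr->timeout` is already on the IB logarithmic scale (4.096 us * 2^x), so no conversion is needed there. Where a conversion from microseconds really is wanted, a minimal sketch might look like the following. Assumptions: the exponent goes into a 5-bit field (0 to 31), the caller wants the smallest exponent whose timeout is at least the requested time, and `usecs_to_ib_timeout()` / `ib_timeout_to_ns()` are hypothetical names, not functions from the CMA patch or the IB core. Working in 64-bit nanoseconds (4.096 us = 4096 ns) avoids both floating point and a 32-bit input limit.

```c
#include <stdint.h>

#define IB_TIMEOUT_MAX_EXP 31u	/* assumption: exponent lives in a 5-bit field */

/*
 * Hypothetical helper: smallest x such that 4.096 us * 2^x >= usecs,
 * capped at IB_TIMEOUT_MAX_EXP. Uses integer nanoseconds throughout.
 */
static uint8_t usecs_to_ib_timeout(uint64_t usecs)
{
	uint64_t ns;
	uint8_t x = 0;

	if (usecs > UINT64_MAX / 1000)	/* would overflow; saturate */
		return IB_TIMEOUT_MAX_EXP;
	ns = usecs * 1000;

	/* 4096 ns << 31 is 2^43, so the shift cannot overflow 64 bits */
	while (x < IB_TIMEOUT_MAX_EXP && (4096ULL << x) < ns)
		x++;
	return x;
}

/* Inverse mapping, handy for checking: nanoseconds encoded by exponent x. */
static uint64_t ib_timeout_to_ns(uint8_t x)
{
	return 4096ULL << x;
}
```

With the exponent capped at 31, the longest encodable timeout is 4096 ns * 2^31, roughly 8796 seconds; for example, 4 seconds (4000000 us) encodes as x = 20.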
From mshefty at ichips.intel.com Mon Sep 19 10:02:19 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 19 Sep 2005 10:02:19 -0700 Subject: [openib-general] Re: [PATCH] madeye: Mainly add more SA decode In-Reply-To: <1127145883.4401.20988.camel@hal.voltaire.com> References: <1127145883.4401.20988.camel@hal.voltaire.com> Message-ID: <432EEF1B.2000109@ichips.intel.com> Hal Rosenstock wrote: > madeye: Mainly add more SA decode > Support SA attributes and add support for some missing SA methods > Also, display data for received RMPP messages (next step is to do this > on the send side) > Also, allow filtering of messages by attribute ID > > Signed-off-by: Hal Rosenstock Thanks - committed with minor changes to fix up some longer line lengths. - Sean From mst at mellanox.co.il Mon Sep 19 10:04:05 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Sep 2005 20:04:05 +0300 Subject: [openib-general] Re: recursion depth exceeded in ipoib_workqueue In-Reply-To: <52wtldggau.fsf@cisco.com> References: <52wtldggau.fsf@cisco.com> Message-ID: <20050919170405.GA25887@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: recursion depth exceeded in ipoib_workqueue > > Thanks, good catch. It seems that since ipoib_mcast_restart_task() > runs from the same workqueue as the "multicast thread", we don't > actually need to stop the thread -- if the restart task is running, > we know that the multicast thread isn't running and can't run until > the restart task returns. So no need to flush the workqueue... > > Something like this should work, right? Don't we need to cancel outstanding mcast queries?
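The recursion Jack reported, and why Roland's change (dropping the flush from the restart task) avoids it, can be seen in a toy single-threaded model. This is an illustrative userspace sketch, not kernel code: `toy_wq`, `toy_flush()` and the other names are hypothetical, and the warning threshold of 3 is chosen to match the report, where depth 4 triggered the message. As in the behavior described in the report, a flush issued from the queue's own worker runs the pending items inline instead of waiting. With three restart requests pending, the flushing version nests to depth 4; the non-flushing version never nests. Cancelling outstanding SA queries, which Michael raises, is outside what this toy models.

```c
#include <stddef.h>
#include <stdio.h>

#define TOY_QUEUE_LEN  16
#define TOY_DEPTH_WARN 3	/* assumption: warn when nesting exceeds 3 */

struct toy_wq;
typedef void (*toy_work_fn)(struct toy_wq *wq);

/* Toy single-threaded workqueue: a ring of function pointers. */
struct toy_wq {
	toy_work_fn items[TOY_QUEUE_LEN];
	size_t head, tail;
	int in_worker;	/* nonzero while the worker is running items */
	int run_depth;	/* current nesting of toy_run() */
	int max_depth;	/* deepest nesting observed */
	int ran;	/* number of items executed */
};

static int toy_queue(struct toy_wq *wq, toy_work_fn fn)
{
	if (wq->tail - wq->head >= TOY_QUEUE_LEN)
		return -1;
	wq->items[wq->tail++ % TOY_QUEUE_LEN] = fn;
	return 0;
}

/* Run queued items until empty, tracking how deeply this call is nested. */
static void toy_run(struct toy_wq *wq)
{
	if (++wq->run_depth > wq->max_depth)
		wq->max_depth = wq->run_depth;
	if (wq->run_depth > TOY_DEPTH_WARN)
		fprintf(stderr, "toy_run: recursion depth exceeded: %d\n",
			wq->run_depth);

	while (wq->head != wq->tail) {
		toy_work_fn fn = wq->items[wq->head++ % TOY_QUEUE_LEN];
		wq->ran++;
		fn(wq);
	}
	wq->run_depth--;
}

/* The worker thread body: drain the queue. */
static void toy_worker(struct toy_wq *wq)
{
	wq->in_worker = 1;
	toy_run(wq);
	wq->in_worker = 0;
}

/*
 * Flush: from outside, let the worker drain the queue. From inside the
 * worker, waiting would deadlock, so pending items run inline; this is
 * the nesting that breaks serialization.
 */
static void toy_flush(struct toy_wq *wq)
{
	if (wq->in_worker)
		toy_run(wq);
	else
		toy_worker(wq);
}

/* Restart task that flushes its own queue (the reported pattern). */
static void restart_task_flushing(struct toy_wq *wq)
{
	toy_flush(wq);
}

/* Restart task that does not flush (the pattern after the fix). */
static void restart_task_no_flush(struct toy_wq *wq)
{
	(void)wq;
}
```

In the flushing case the second restart task starts before the first has returned, which is the loss of serialization Jack describes. The non-flushing variant relies on the same reasoning as Roland's patch: work items on one single-threaded queue cannot overlap.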
-- MST From rolandd at cisco.com Mon Sep 19 10:05:58 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 10:05:58 -0700 Subject: [openib-general] [PATCH] user_mad: Fix length of user buffer copied when sending RMPP In-Reply-To: <1127148222.4401.21294.camel@hal.voltaire.com> (Hal Rosenstock's message of "19 Sep 2005 12:43:44 -0400") References: <1127129930.4401.19073.camel@hal.voltaire.com> <521x3lhvas.fsf@cisco.com> <1127148222.4401.21294.camel@hal.voltaire.com> Message-ID: <52k6hdgf7t.fsf@cisco.com> Hal> I already checked it in. There was an earlier change which Hal> was just intended to change some formatting but I made a Hal> mistake and made part of this change there where I Hal> (mistakenly) also eliminated the subtraction of rmpp_hdr_size Hal> you cite below. I see... I hadn't done svn up. I still think the change has to be wrong, though. With your latest code: /* Now, copy rest of message from user into send buffer */ if (copy_from_user(((struct ib_rmpp_mad *) packet->msg->mad)->data, buf + sizeof (struct ib_user_mad) + rmpp_hdr_size, length + class_hdr_len)) { At the beginning of the function, length = count - sizeof (struct ib_user_mad); We know class_hdr_len >= 0. So that copy is copying count - sizeof (struct ib_user_mad) + class_hdr_len bytes from buf, at an offset of sizeof (struct ib_user_mad) + rmpp_hdr_size into the userspace buffer. So it copies up to an offset of count + class_hdr_len + rmpp_hdr_size in buf. But userspace only did a write of count bytes, so we're reading past the end of the userspace buffer. What am I missing? Hal> The length passed in for RMPP MADs is a little funny. In Hal> osm_vendor_ibumad.c::osm_vendor_send for RMPP, the length of Hal> the SA MAD header is subtracted off (but this includes the Hal> MAD header, the RMPP header, and the SA class header).
Even Hal> if that length were to be made "more correct", it would only Hal> include 1 RMPP header's worth as that is what is in the buffer Hal> being transmitted. That approach would require some slightly Hal> different changes to user_mad to make the proper adjustments. Hal> Would that approach be better ? I don't really understand this either. Doesn't userspace just pass in the data that the kernel passes on to ib_post_send_mad()? - R. From rolandd at cisco.com Mon Sep 19 10:09:17 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 10:09:17 -0700 Subject: [openib-general] Re: recursion depth exceeded in ipoib_workqueue In-Reply-To: <20050919170405.GA25887@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 19 Sep 2005 20:04:05 +0300") References: <52wtldggau.fsf@cisco.com> <20050919170405.GA25887@mellanox.co.il> Message-ID: <52fys1gf2a.fsf@cisco.com> Michael> Don't we need to cancel outstanding mcast queries? Yes, I missed that. How about this? Index: linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c (revision 3480) +++ linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c (working copy) @@ -598,20 +598,11 @@ int ipoib_mcast_start_thread(struct net_ return 0; } -int ipoib_mcast_stop_thread(struct net_device *dev) +static void ipoib_mcast_cancel_queries(struct net_device *dev) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_mcast *mcast; - ipoib_dbg_mcast(priv, "stopping multicast thread\n"); - - down(&mcast_mutex); - clear_bit(IPOIB_MCAST_RUN, &priv->flags); - cancel_delayed_work(&priv->mcast_task); - up(&mcast_mutex); - - flush_workqueue(ipoib_workqueue); - if (priv->broadcast && priv->broadcast->query) { ib_sa_cancel_query(priv->broadcast->query_id, priv->broadcast->query); priv->broadcast->query = NULL; @@ -628,6 +619,22 @@ int ipoib_mcast_stop_thread(struct net_d wait_for_completion(&mcast->done); } } +}
+ +int ipoib_mcast_stop_thread(struct net_device *dev) +{ + struct ipoib_dev_priv *priv = netdev_priv(dev); + + ipoib_dbg_mcast(priv, "stopping multicast thread\n"); + + down(&mcast_mutex); + clear_bit(IPOIB_MCAST_RUN, &priv->flags); + cancel_delayed_work(&priv->mcast_task); + up(&mcast_mutex); + + flush_workqueue(ipoib_workqueue); + + ipoib_mcast_cancel_queries(dev); return 0; } @@ -832,7 +839,7 @@ void ipoib_mcast_restart_task(void *dev_ ipoib_dbg_mcast(priv, "restarting multicast task\n"); - ipoib_mcast_stop_thread(dev); + ipoib_mcast_cancel_queries(dev); spin_lock_irqsave(&priv->lock, flags); From rolandd at cisco.com Mon Sep 19 10:11:25 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 10:11:25 -0700 Subject: [openib-general] [PATCH] Fix SDP debug build for new CM API Message-ID: <52br2pgeyq.fsf@cisco.com> The CM API has changed so that the req_rcvd event no longer has a device member. Use cm_id->device instead when printing the device's name in a debug message. Signed-off-by: Roland Dreier --- linux-kernel/infiniband/ulp/sdp/sdp_pass.c (revision 3480) +++ linux-kernel/infiniband/ulp/sdp/sdp_pass.c (working copy) @@ -360,8 +360,7 @@ int sdp_cm_req_handler(struct ib_cm_id * sdp_dbg_ctrl(NULL, "CM REQ. comm <%08x> SID <%016llx> ca <%s> port <%d>", cm_id->local_id, (unsigned long long)cm_id->service_id, - event->param.req_rcvd.device->name, - event->param.req_rcvd.port); + cm_id->device->name, event->param.req_rcvd.port); /* * check Hello Header, to determine if we want the connection. 
*/ From halr at voltaire.com Mon Sep 19 10:18:51 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2005 13:18:51 -0400 Subject: [openib-general] [PATCH] user_mad: Fix length of user buffer copied when sending RMPP In-Reply-To: <52k6hdgf7t.fsf@cisco.com> References: <1127129930.4401.19073.camel@hal.voltaire.com> <521x3lhvas.fsf@cisco.com> <1127148222.4401.21294.camel@hal.voltaire.com> <52k6hdgf7t.fsf@cisco.com> Message-ID: <1127150330.4401.21556.camel@hal.voltaire.com> On Mon, 2005-09-19 at 13:05, Roland Dreier wrote: > Hal> I already checked it in. There was an earlier change which > Hal> was just inteneded to change some formatting but I made a > Hal> mistake and made part of this change there where I > Hal> (mistakenly) also eliminated the subtraction of rmpp_hdr_size > Hal> you cite below. > > I see... I hadn't done svn up. I still think the change has to be > wrong, though. With your latest code: > > /* Now, copy rest of message from user into send buffer */ > if (copy_from_user(((struct ib_rmpp_mad *) packet->msg->mad)->data, > buf + sizeof (struct ib_user_mad) + rmpp_hdr_size, > length + class_hdr_len)) { > > At the beginning of the function, > > length = count - sizeof (struct ib_user_mad); > > We know class_hdr_len >= 0. So that copy is copying > > count - sizeof (struct ib_user_mad) + class_hdr_len > > bytes from buf, at an offset of > > sizeof (struct ib_user_mad) + rmpp_hdr_size > > into the userspace buffer. So it copies up to an offset of > > count + class_hdr_len + rmpp_hdr_size > > in buf. But userspace only did a write of count bytes, so we're > reading past the end of the userspace buffer. > > What am I missing? You are right that it is going past the end of the buffer :-( It does seem to work but it appears that is just luck... I will fix it hopefully correctly this time. > Hal> The length passed in for RMPP MADs is a little funny. 
In > Hal> osm_vendor_ibumad.c::osm_vendor_send for RMPP, the length of > Hal> the SA MAD header is subtracted off (but this includes the > Hal> MAD header, the RMPP header, and the SA class header). Even > Hal> if that length were to be made "more correct", it would only > Hal> include 1 RMPP header's worth as that is what is in the buffer > Hal> being transmitted. That approach would require some slightly > Hal> different changes to user_mad to make the proper adjustments. > Hal> Would that approach be better ? > > I don't really understand this either. Doesn't userspace just pass in > the data that the kernel passes on to ib_post_send_mad()? It's not so direct (to ib_post_send_mad) for RMPP MADs. -- Hal From mshefty at ichips.intel.com Mon Sep 19 10:27:35 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 19 Sep 2005 10:27:35 -0700 Subject: [openib-general][PATCH][RFC]: CMA header In-Reply-To: References: Message-ID: <432EF507.60707@ichips.intel.com> Guy German wrote: > typedef void (*ib_cma_event_handler)(enum ib_cma_event event, void *context, > const void *private_data); > typedef void (*ib_cma_listen_handler)(void *cma_id, struct ib_device *device, > void *private_data, void *context); I think we can merge these two handlers. We do not want to pass back struct ib_device* to a caller. The device needs to be associated with the cma_id up front. > /** > * ib_cma_get_device - Returns the device and port to be used according > * to the destination ip address (this can be determined according > * to the local routing table). Call this function before > * creating the qp. If using link-local IPv6 addresses > * there is no need to call this function.
> * @remote_address - The destination address for connection > * @qos - desired quality of service > * @device - The device to use (output) > * @port - port to use (output) > * > */ > int ib_cma_get_device(struct sockaddr *remote_address, > enum ib_qos qos, struct ib_device **device, u8 *port); I don't believe that we can support this function and still work with device removal. > /** > * ib_cma_connect - this is the connect request function, called by > * the active side. The consumer registers an upcall that will be > * initiated by the cma with an appropriate connection event > * notification (established/rejected/disconnected etc) > * @cma_conn: This structure contains the following connection parameters: > * @qp: qp for establishing the connection > * @qp_attr: only relevant attributes are used > * @dst_ip: destination ip address > * @service_id: destination service id (port) > * @context: context to be returned in the callback > * @cma_event_handler: the upcall function for the active side > * @private_data: private data to be received at the listener upcall > * @private_data_len: private data length (max 255) > * @qos: Quality os service for the rc > * @connect_flags: default or multipath connection > * @cma_id: This returned handle is a void* (different in ib and iwarp) > * in ib - it is pointer to struct cma_context. > */ > int ib_cma_connect(struct ib_cma_conn *cma_conn, void **cma_id); Creating the cma_id inside this call, rather than using a separate call means that the user must be able to handle a connection request callback before the cma_id is known. I.e. a callback can occur before this call returns. (In fact, the entire connection could be established, data transfered, and disconnected before this call returns.) It may be easier to have a separate call to allocate the cma_id that records the context and event handler. > /** > * ib_cma_listen - this function is called by the passive side. 
It is > * listening on a the specified port (ib service id) for incomming > * connection requests > * @device: > * @address: > * @service_id: service id (port) to listen on > * @context: user context to be returned in the callback > * @cm_listen_handler: the listen callback > * @cma_id: cma handle for the passive side > */ > int ib_cma_listen(struct ib_device *device, struct sockaddr *address, > __be64 service_id, void *context, > ib_cma_listen_handler cm_listen_handler, > void **cma_id); Same issue as above. > /** > * ib_cma_destroy - this functionis is called on the passive side, to > * stop listenning on a certain sevice id > * @cma_id: the same cma handle received when ib_cma_sid_listen was called > */ > int ib_cma_destroy(void *cma_id); Why not have this apply to both active and passive sides? Will this interface support peer to peer connections? If so, then we may not want to distinguish between active and passive at this level. - Sean From mst at mellanox.co.il Mon Sep 19 10:33:04 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Sep 2005 20:33:04 +0300 Subject: [openib-general] Re: [PATCH] Fix SDP debug build for new CM API In-Reply-To: <52br2pgeyq.fsf@cisco.com> References: <52br2pgeyq.fsf@cisco.com> Message-ID: <20050919173304.GB25887@mellanox.co.il> Quoting r. Roland Dreier : > Subject: [PATCH] Fix SDP debug build for new CM API > > The CM API has changed so that the req_rcvd event no longer has a > device member. Use cm_id->device instead when printing the device's > name in a debug message. > > Signed-off-by: Roland Dreier Thanks, applied. -- MST From mst at mellanox.co.il Mon Sep 19 10:39:29 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 19 Sep 2005 20:39:29 +0300 Subject: [openib-general] Re: Re: [PATCH] libibcm/libibat disable-libcheck option In-Reply-To: <432EE212.70206@ichips.intel.com> References: <432EE212.70206@ichips.intel.com> Message-ID: <20050919173929.GC25887@mellanox.co.il> Quoting Sean Hefty : > I don't have any objection as adding this as an option. > > - Sean > Here it is then: could you check this in. or do you want me to? --- Add an option to disable configure checks for ib libraries. This makes it possible to first configure all libraries, then make them all. Signed-off-by: Michael S. Tsirkin Index: userspace/libibcm/configure.in =================================================================== --- userspace.orig/libibcm/configure.in 2005-09-14 20:06:55.000000000 +0300 +++ userspace/libibcm/configure.in 2005-09-14 20:09:22.000000000 +0300 @@ -9,6 +9,12 @@ AM_INIT_AUTOMAKE(libibcm, 0.9.0) AC_DISABLE_STATIC AM_PROG_LIBTOOL +AC_ARG_ENABLE(libcheck, [ --disable-libcheck do not test for presence of ib libraries], +[ if test x$enableval = xno ; then + disable_libcheck=yes + fi +]) + dnl Checks for programs AC_PROG_CC @@ -17,16 +23,22 @@ AC_C_CONST AC_CHECK_SIZEOF(long) dnl Checks for libraries +if test "$disable_libcheck" != "yes" +then AC_CHECK_LIB(ibverbs, ibv_get_devices, [], AC_MSG_ERROR([ibv_get_devices() not found. libibcm requires libibcm.])) AC_CHECK_LIB(ibat, ib_at_route_by_ip, [], AC_MSG_ERROR([ib_at_route_by_ip() not found. libibcm requires libat.])) +fi dnl Checks for header files. +if test "$disable_libcheck" != "yes" +then AC_CHECK_HEADER(infiniband/verbs.h, [], AC_MSG_ERROR([ not found. Is libibverbs installed?])) AC_CHECK_HEADER(infiniband/at.h, [], AC_MSG_ERROR([ not found. 
Is libibat installed?])) +fi AC_HEADER_STDC AC_CACHE_CHECK(whether ld accepts --version-script, ac_cv_version_script, Index: userspace/libibat/configure.in =================================================================== --- userspace.orig/libibat/configure.in 2005-08-03 11:38:47.000000000 +0300 +++ userspace/libibat/configure.in 2005-09-14 20:12:03.000000000 +0300 @@ -9,14 +9,23 @@ AM_INIT_AUTOMAKE(libibat, 0.9.0) AC_DISABLE_STATIC AM_PROG_LIBTOOL +AC_ARG_ENABLE(libcheck, [ --disable-libcheck do not test for presence of ib libraries], +[ if test x$enableval = xno ; then + disable_libcheck=yes + fi +]) + dnl Checks for programs AC_PROG_CC dnl Checks for libraries dnl Checks for header files. +if test "$disable_libcheck" != "yes" +then AC_CHECK_HEADER(infiniband/verbs.h, [], AC_MSG_ERROR([ not found. Is libibverbs installed?])) +fi AC_HEADER_STDC dnl Checks for typedefs, structures, and compiler characteristics. -- MST -- MST From mst at mellanox.co.il Mon Sep 19 10:44:53 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Sep 2005 20:44:53 +0300 Subject: [openib-general] Re: recursion depth exceeded in ipoib_workqueue In-Reply-To: <52fys1gf2a.fsf@cisco.com> References: <52fys1gf2a.fsf@cisco.com> Message-ID: <20050919174453.GD25887@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: recursion depth exceeded in ipoib_workqueue > > Michael> Dont we need to cancel outstanding mcast queries? > > Yes, I missed that. How about this? It seems that wait_for_completion can now deadlock, since it is waiting for a work queued on ipoib_workqueue. 
-- MST

From halr at voltaire.com Mon Sep 19 10:44:18 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2005 13:44:18 -0400
Subject: [openib-general] [PATCH] user_mad.c::ib_umad_write don't go past end of user buffer
Message-ID: <1127151857.4401.21709.camel@hal.voltaire.com>

user_mad.c::ib_umad_write don't go past end of user buffer

Fix to previous fix for length of user buffer copied when sending RMPP

Signed-off-by: Hal Rosenstock

Index: user_mad.c
===================================================================
--- user_mad.c (revision 3480)
+++ user_mad.c (working copy)
@@ -273,7 +273,6 @@ static ssize_t ib_umad_write(struct file
  u8 method;
  __be64 *tid;
  int ret, length, hdr_len, data_len, rmpp_hdr_size;
- int class_hdr_len = 0;
  int rmpp_active = 0;

  if (count < sizeof (struct ib_user_mad))
@@ -335,16 +334,15 @@ static ssize_t ib_umad_write(struct file
  ret = -EINVAL;
  goto err_ah;
  }

+ /* Validate that the management class can support RMPP */
  if (rmpp_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_ADM) {
  hdr_len = offsetof(struct ib_sa_mad, data);
- data_len = length;
- class_hdr_len = sizeof(struct ib_sa_hdr);
+ data_len = length - hdr_len;
  } else if ((rmpp_mad->mad_hdr.mgmt_class >= IB_MGMT_CLASS_VENDOR_RANGE2_START) &&
  (rmpp_mad->mad_hdr.mgmt_class <= IB_MGMT_CLASS_VENDOR_RANGE2_END)) {
  hdr_len = offsetof(struct ib_vendor_mad, data);
  data_len = length - hdr_len;
- class_hdr_len = 4;
  } else {
  ret = -EINVAL;
  goto err_ah;
@@ -393,7 +391,7 @@ static ssize_t ib_umad_write(struct file
  /* Now, copy rest of message from user into send buffer */
  if (copy_from_user(((struct ib_rmpp_mad *) packet->msg->mad)->data,
  buf + sizeof (struct ib_user_mad) + rmpp_hdr_size,
- length + class_hdr_len)) {
+ length - rmpp_hdr_size)) {
  ret = -EFAULT;
  goto err_msg;
  }

From halr at voltaire.com Mon Sep 19 10:47:51 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 19 Sep 2005 13:47:51 -0400
Subject: [openib-general] [PATCH} OpenSM osm_vendor_ibumad.c::osm_vendor_send Fix
length of umad_send Message-ID: <1127152070.4401.21713.camel@hal.voltaire.com> osm_vendor_ibumad.c::osm_vendor_send Fix length of umad_send when VENDOR_RMPP_SUPPORT is defined. NOTE: This requires the user_mad.c::ib_umad_write patch just sent. Signed-off-by: Hal Rosenstock Index: osm_vendor_ibumad.c =================================================================== --- osm_vendor_ibumad.c (revision 3480) +++ osm_vendor_ibumad.c (working copy) @@ -983,8 +983,12 @@ osm_vendor_send( put_madw(p_vend, p_madw, &p_mad->trans_id); if ((ret = umad_send(p_bind->port_id, p_bind->agent_id, p_vw->umad, +#ifdef VENDOR_RMPP_SUPPORT + p_madw->mad_size, +#else is_rmpp ? p_madw->mad_size - IB_SA_MAD_HDR_SIZE : p_madw->mad_size, +#endif resp_expected ? p_vend->timeout : 0, p_vend->max_retries)) < 0) { if (resp_expected) From halr at voltaire.com Mon Sep 19 10:52:13 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2005 13:52:13 -0400 Subject: [openib-general] user_mad.c changes for upstream Message-ID: <1127152333.4401.21716.camel@hal.voltaire.com> Hi Roland, The last patch I sent is incremental off of what is in the OpenIB svn tree. If that looks right, I will check it in. It does work. Do you want a patch which is the consolidated difference from what has been pushed upstream or would you be all set on this ? -- Hal From mshefty at ichips.intel.com Mon Sep 19 10:59:16 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 19 Sep 2005 10:59:16 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: References: Message-ID: <432EFC74.6050105@ichips.intel.com> Guy German wrote: > #define CMA_TARGET_MAX 4 > #define CMA_INITIATOR_DEPTH 4 > #define CMA_RC_RETRY_COUNT 7 > #define CMA_RNR_RETRY_COUNT 6 > #define CMA_CM_RESPONSE_TIMEOUT 20 /* 4 sec */ > #define CMA_MAX_CM_RETRIES 0 Are these values hard-coded just for the initial implementation? How would these change? 
> enum cma_close_flags { > CMA_CLOSE_ABRUPT = 0, > CMA_CLOSE_GRACEFUL > }; Not sure what these are for. Why not have the user always destroy the cma_id? If it hasn't yet been destroyed when a disconnect comes in, callback the user. If a connection hasn't been disconnected when it is destroyed, automatically send a disconnect message. > struct accept_callback { > ib_cma_ac_handler func; > void *context; > }; > > struct listen_callback { > ib_cma_listen_handler func; > void *context; > }; These could be eliminated if we just associated a context with a cma_id and left it at that. I think we're asking for race conditions if we try to update the context with every callback. > static int cma_modify_qp_state(struct ib_cm_id *cm_id, struct ib_qp *qp, > enum ib_qp_state qp_state, > int qp_attr_mask) > { > struct ib_qp_attr qp_attr; > int status = 0; > printk(KERN_DEBUG PFX "%s: enter >>> modify to %d\n", > __func__, qp_state); > > if (qp == NULL) > return -EINVAL; We shouldn't need checks like this in the kernel. > memset(&qp_attr, 0, sizeof qp_attr); > qp_attr.qp_state = qp_state; > > if (cm_id && !qp_attr_mask) Or this check... > static int destroy_cma_ctx(struct cma_context *cma_ctx) > { > if(!IS_ERR(cma_ctx->cm_id)) > ib_destroy_cm_id(cma_ctx->cm_id); > if (cma_ctx->cma_param.private_data) > kfree(cma_ctx->cma_param.private_data); Is this the outbound private data or inbound? Why not tie the private data to an event and avoid storing it with the cma_ctx? > if (cma_ctx) > kfree(cma_ctx); > > cma_ctx = NULL; Is there a reason to set this to NULL? > return 0; > } > static int cma_disconnect(struct ib_qp *qp, > struct cma_context *cma_ctx, > enum cma_close_flags cflags) > { > int status; > > if (cma_ctx == NULL) > goto modqp; We shouldn't need this check. > if (cflags == CMA_CLOSE_ABRUPT) > status = destroy_cma_ctx(cma_ctx); Why would this call fail? What would the user do if it does? 
> else if (cflags == CMA_CLOSE_GRACEFUL){ > status = ib_send_cm_dreq(cma_ctx->cm_id, NULL, 0); > } See comments above. Eliminate the GRACEFUL/ABRUPT flags and just let the user either issue the disconnect or just destroy the cma_ctx. > modqp: > status = cma_modify_qp_state(0, qp, IB_QPS_ERR, IB_QP_STATE); > return status; > } > > void cma_connection_callback(struct cma_context *cma_ctx, > const enum ib_cma_event event, > const void *private_data) > { > ib_cma_event_handler conn_cb; > struct ib_qp *qp = cma_ctx->cma_conn.qp; > int status; > > conn_cb = cma_ctx->cma_conn.cma_event_handler; > > switch (event) { > case IB_CMA_EVENT_ESTABLISHED: > break; > case IB_CMA_EVENT_DISCONNECTED: > case IB_CMA_EVENT_REJECTED: > case IB_CMA_EVENT_UNREACHABLE: > case IB_CMA_EVENT_NON_PEER_REJECTED: > status = cma_disconnect(qp, cma_ctx, CMA_CLOSE_ABRUPT); This is destroying the cma_ctx without the user knowing it. The dereference to cma_ctx below will crash. We shouldn't take any action on behalf of the user. Simply report the error and let the user destroy the cma_id. 
> break; > default: > printk(KERN_ERR PFX "%s: unknown event !!\n", __func__); > } > > printk(KERN_DEBUG PFX "%s: event=%d\n", __func__, event); > > conn_cb(event, cma_ctx->cma_conn.context, private_data); > } > > static inline int cma_rep_recv(struct cma_context *cma_ctx, > struct ib_cm_event *rep_cm_event) > { > int status; > > status = cma_modify_qp_state(cma_ctx->cm_id, cma_ctx->cma_conn.qp, > IB_QPS_RTR, 0); > if (status) { > printk(KERN_ERR PFX "%s: fail to modify QPS_RTR %d\n", > __func__, status); > return status; > } > > status = cma_modify_qp_state(cma_ctx->cm_id, cma_ctx->cma_conn.qp, > IB_QPS_RTS, 0); > if (status) { > printk(KERN_ERR PFX "%s: fail to modify QPS_RTR %d\n", > __func__, status); > return status; > } > > status = ib_send_cm_rtu(cma_ctx->cm_id, NULL, 0); > if (status) { > printk(KERN_ERR PFX "%s: fail to send cm rtu %d\n", > __func__, status); > return status; > } > > return 0; > } > > int cma_active_cb_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) > { > int status = 0; > enum ib_cma_event cma_event = 0; > struct cma_context *cma_ctx = cm_id->context; > > printk(KERN_DEBUG PFX "%s: enter >>> cm_id=%p cma_ctx=%p\n",__func__, > cm_id, cma_ctx); > > switch (event->event) { > case IB_CM_REQ_ERROR: > cma_event = IB_CMA_EVENT_UNREACHABLE; > break; > case IB_CM_REJ_RECEIVED: > cma_event = IB_CMA_EVENT_NON_PEER_REJECTED; > break; > case IB_CM_DREP_RECEIVED: > case IB_CM_TIMEWAIT_EXIT: > cma_event = IB_CMA_EVENT_DISCONNECTED; > break; > case IB_CM_REP_RECEIVED: > status = cma_rep_recv(cma_ctx, event); > if (!status) > cma_event = IB_CMA_EVENT_ESTABLISHED; > else > cma_event = IB_CMA_EVENT_DISCONNECTED; > break; > case IB_CM_DREQ_RECEIVED: > ib_send_cm_drep(cm_id, NULL, 0); > cma_event = IB_CMA_EVENT_DISCONNECTED; > break; > case IB_CM_DREQ_ERROR: > break; > default: > printk(KERN_WARNING PFX "%s: cm event (%d) not handled\n", > __func__, event->event); > break; > } > > printk(KERN_WARNING PFX "%s: cm_event=%d cma_event=%d\n", > 
__func__, event->event, cma_event); > > if (cma_event) This check isn't needed. > cma_connection_callback(cma_ctx, cma_event, > event->private_data); > > return status; Returning non-zero will destroy the underlying cm_id. We can avoid some synchronization by letting it exist until the user destroys the corresponding cma_id. Otherwise, there's the potential of the user trying to destroy it twice. Once from the cma_connection_callback reporting an error, and then again here. > } > > static struct cma_context *get_cma_ctx(struct ib_cm_id *cm_id, > struct ib_cm_event *event) > { > struct cma_context *new_cma_ctx; > int status; > > > if (event->event != IB_CM_REQ_RECEIVED) > return cm_id->context; > > if (((struct cma_context *)cm_id->context)->cm_id != cm_id) > printk(KERN_DEBUG PFX "%s: old_cm_id=%p new_cm_id=%p\n", > __func__, ((struct cma_context *)cm_id->context)->cm_id, > cm_id); This check shouldn't be needed. > new_cma_ctx = kmalloc(sizeof *new_cma_ctx, GFP_KERNEL); > if (!new_cma_ctx) { > status = ib_send_cm_rej(cm_id, > IB_CM_REJ_CONSUMER_DEFINED, > NULL, 0, NULL, 0); > return NULL; > } > > memset(new_cma_ctx, 0, sizeof *new_cma_ctx); > new_cma_ctx->cm_id = cm_id; > new_cma_ctx->creq_cma_ctx = cm_id->context; > cm_id->context = new_cma_ctx; > > return new_cma_ctx; > } > > static void cma_path_handler(u64 req_id, void *context, int rec_num) > { > struct cma_context *cma_ctx = context; > enum ib_cma_event event; > int status = 0; > > if (!cma_ctx) { This check isn't needed. 
> printk(KERN_ERR PFX "%s: context received null\n",__func__); > return; > } > > if (rec_num <= 0) { > event = IB_CMA_EVENT_UNREACHABLE; > goto error; > } > > cma_ctx->cma_param.primary_path = &cma_ctx->cma_path; > cma_ctx->cma_param.alternate_path = NULL; > > printk(KERN_DEBUG PFX "%s: dlid=%d slid=%d pkey=%d mtu=%d sid=%llx " > "qpn=%d qpt=%d psn=%d prd=%s respres=%d rcm=%d flc=%d " > "cmt=%d rtrc=%d rntrtr=%d maxcm=%d \n",__func__, > cma_ctx->cma_param.primary_path->dlid , > cma_ctx->cma_param.primary_path->slid , > cma_ctx->cma_param.primary_path->pkey , > cma_ctx->cma_param.primary_path->mtu , > cma_ctx->cma_param.service_id, > cma_ctx->cma_param.qp_num, > cma_ctx->cma_param.qp_type, > cma_ctx->cma_param.starting_psn, > (char *)cma_ctx->cma_param.private_data, > cma_ctx->cma_param.responder_resources, > cma_ctx->cma_param.remote_cm_response_timeout, > cma_ctx->cma_param.flow_control, > cma_ctx->cma_param.local_cm_response_timeout, > cma_ctx->cma_param.retry_count, > cma_ctx->cma_param.rnr_retry_count, > cma_ctx->cma_param.max_cm_retries); > > status = ib_send_cm_req(cma_ctx->cm_id, &cma_ctx->cma_param); > if (status) { > printk(KERN_ERR PFX "%s: cm_req failed %d\n",__func__, status); > event = IB_CMA_EVENT_REJECTED; > goto error; > } > > return; > > error: > printk(KERN_ERR PFX "%s: return error %d \n",__func__, status); > cma_connection_callback(cma_ctx, event, NULL); > } > > static void cma_route_handler(u64 req_id, void *context, int rec_num) > { > struct cma_context *cma_ctx = context; > enum ib_cma_event event; > int status = 0; > > if (rec_num <= 0) { > event = IB_CMA_EVENT_UNREACHABLE; > goto error; > } > cma_ctx->ibat_comp.fn = &cma_path_handler; > cma_ctx->ibat_comp.context = cma_ctx; > > status = ib_at_paths_by_route(&cma_ctx->cma_route, 0, > &cma_ctx->cma_path, 1, > &cma_ctx->ibat_comp); > > if (status) { > event = IB_CMA_EVENT_DISCONNECTED; > goto error; > } > return; > > error: > printk(KERN_ERR PFX "%s: return error %d \n",__func__, status); > 
cma_connection_callback(cma_ctx, event ,NULL); > } > > /* API functions */ > > int ib_cma_create_qp(struct ib_pd *pd, u8 port, struct ib_qp **qp_in, > struct ib_qp_init_attr *init_attr) > { Why not return struct ib_qp* similar to how the other APIs operate? > struct ib_qp_attr qp_attr; > int qp_attr_mask; > struct ib_qp *qp; > > qp = ib_create_qp(pd, init_attr); > if (IS_ERR(qp)) > return IS_ERR(qp); > *qp_in = qp; > printk(KERN_DEBUG PFX "%s: qp created (%p)\n",__func__, qp); > memset(&qp_attr, 0, sizeof qp_attr); > > qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS | > IB_QP_PKEY_INDEX | IB_QP_PORT; > > qp_attr.qp_access_flags = IB_ACCESS_REMOTE_READ | > IB_ACCESS_REMOTE_WRITE; > qp_attr.qp_state = IB_QPS_INIT; > qp_attr.pkey_index = 0; > qp_attr.port_num = port; > printk(KERN_DEBUG PFX "%s: call ib_modify_qp (qp_attr_mask=%x)\n", > __func__, qp_attr_mask); > > return ib_modify_qp(qp, &qp_attr, qp_attr_mask); If modify fails, the QP should be destroyed. > }; > EXPORT_SYMBOL(ib_cma_create_qp); > > int ib_cma_connect(struct ib_cma_conn *cma_conn, void **cma_id) > { > struct cma_context *cma_ctx; > int status; > u32 timeout; > u32 dst_ip; > dst_ip = (((struct sockaddr_in *)(cma_conn->dst_ip))->sin_addr).s_addr; > > printk(KERN_DEBUG PFX "%s: enter >>> dst_ip=%d.%d.%d.%d\n",__func__, > (dst_ip & 0x000000ff), > (dst_ip & 0x0000ff00) >> 8, > (dst_ip & 0x00ff0000) >> 16, > (dst_ip & 0xff000000) >> 24); > cma_ctx = kmalloc(sizeof *cma_ctx, GFP_KERNEL); > if (!cma_ctx) > return -ENOMEM; > memset(cma_ctx, 0, sizeof *cma_ctx); > > timeout = us_to_cmt(cma_conn->qp_attr->timeout); > > cma_ctx->cm_id = ib_create_cm_id(cma_conn->device, > cma_active_cb_handler, > (void *)cma_ctx); > if (IS_ERR(cma_ctx->cm_id)) { > printk(KERN_ERR PFX "%s: cm_id creation failed\n", __func__); > destroy_cma_ctx(cma_ctx); > return -EAGAIN; > } > else > printk(KERN_DEBUG PFX "%s: cm_id created %p\n", __func__, > cma_ctx->cm_id); > > printk(KERN_DEBUG PFX "%s: cma_event_handler=%p\n", __func__, > 
cma_conn->cma_event_handler ); > memcpy(&cma_ctx->cma_conn, cma_conn, sizeof *cma_conn); > cma_ctx->cma_param.service_id = cma_conn->service_id; > cma_ctx->cma_param.qp_num = cma_conn->qp->qp_num; > cma_ctx->cma_param.qp_type = IB_QPT_RC; > cma_ctx->cma_param.private_data = kmalloc(cma_conn->private_data_len, > GFP_KERNEL); > memcpy((u8 *)cma_ctx->cma_param.private_data, > (u8 *)cma_conn->private_data, cma_conn->private_data_len); > cma_ctx->cma_param.private_data_len = cma_conn->private_data_len; > cma_ctx->cma_param.responder_resources = CMA_TARGET_MAX; > cma_ctx->cma_param.initiator_depth = CMA_INITIATOR_DEPTH; > cma_ctx->cma_param.remote_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT; //timeout; > cma_ctx->cma_param.local_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT; > cma_ctx->cma_param.retry_count = CMA_RC_RETRY_COUNT; > cma_ctx->cma_param.rnr_retry_count = CMA_RNR_RETRY_COUNT; > cma_ctx->cma_param.max_cm_retries = CMA_MAX_CM_RETRIES; > cma_ctx->ibat_comp.fn = &cma_route_handler; > cma_ctx->ibat_comp.context = cma_ctx; > > status = ib_at_route_by_ip(dst_ip, 0, 0, 0, &cma_ctx->cma_route, > &cma_ctx->ibat_comp); > if (status < 0) { > printk(KERN_ERR PFX " ib_at_route_by_ip failed (%d)\n", > status); > destroy_cma_ctx(cma_ctx); > return -EAGAIN; > } > > if (status == 1) { > printk(KERN_DEBUG PFX "%s: immidiate route - call " > "route_handler\n",__func__); > cma_route_handler(cma_ctx->ibat_comp.req_id, cma_ctx, 1); > } > *cma_id = (void *)cma_ctx; > return 0; > }; > EXPORT_SYMBOL(ib_cma_connect); > > int ib_cma_disconnect(struct ib_qp *qp, void *cma_id) > { > struct cma_context *cma_ctx = cma_id; > int status; > > status = cma_disconnect(qp, cma_ctx, CMA_CLOSE_ABRUPT); > > return status; > }; > EXPORT_SYMBOL(ib_cma_disconnect); > > int ib_cma_listen(struct ib_device *device, struct sockaddr *address, > __be64 service_id, void *context, > ib_cma_listen_handler cm_listen_handler, > void **cma_id) > { > struct cma_context *cma_ctx; > int status; > > 
printk(KERN_DEBUG PFX "%s: enter >> \n",__func__); > cma_ctx = kmalloc(sizeof *cma_ctx, GFP_KERNEL); > if (!cma_ctx) > return -ENOMEM; > memset(cma_ctx, 0, sizeof *cma_ctx); > cma_ctx->listen_cb.func = cm_listen_handler; > cma_ctx->listen_cb.context = context; > cma_ctx->cm_id = ib_create_cm_id(device, cma_passive_cb_handler, > (void *)cma_ctx); > if (IS_ERR(cma_ctx->cm_id)) { > printk(KERN_ERR PFX "%s: cm_id creation failed\n", __func__); > destroy_cma_ctx(cma_ctx); > return -EAGAIN; > } > else > printk(KERN_DEBUG PFX "%s: cm_id created %p\n", __func__, > cma_ctx->cm_id); > /* `address` is ignored at the moment ... */ > status = ib_cm_listen(cma_ctx->cm_id, service_id, 0); > if (status) { > printk(KERN_ERR PFX "%s: cm_listen failed %d\n", __func__, > status); > destroy_cma_ctx(cma_ctx); > return status; > } > printk(KERN_INFO PFX "%s:cm_id=%p cma_id=%p\n", __func__, > cma_ctx->cm_id, cma_ctx); > > *cma_id = (void *)cma_ctx; > return 0; > }; > EXPORT_SYMBOL(ib_cma_listen); > > int ib_cma_destroy(void *cma_id) > { > return destroy_cma_ctx((struct cma_context *)cma_id); > }; > EXPORT_SYMBOL(ib_cma_destroy); > > int ib_cma_accept(void *cma_id, struct ib_qp *qp, > const void *private_data, u8 private_data_len, > void *context, ib_cma_ac_handler cm_accept_handler) > { > struct cma_context *cma_ctx = cma_id; > struct ib_cm_rep_param passive_params; > int status; > > printk(KERN_DEBUG PFX "%s: enter >> private_data = %s (len=%d)\n", > __func__, (char *)private_data, private_data_len); > > if (private_data_len > IB_CM_REP_PRIVATE_DATA_SIZE) { > status = -EINVAL; > goto reject; > } > > memset(&passive_params, 0, sizeof passive_params); > passive_params.private_data = private_data; > passive_params.private_data_len = private_data_len; > passive_params.qp_num = qp->qp_num; > passive_params.responder_resources = CMA_TARGET_MAX; > passive_params.initiator_depth = CMA_INITIATOR_DEPTH; > passive_params.rnr_retry_count = CMA_RNR_RETRY_COUNT; > > status = 
cma_modify_qp_state(cma_ctx->cm_id, qp, IB_QPS_RTR, 0); > if (status) > goto reject; > > cma_ctx->accept_cb.func = cm_accept_handler; > cma_ctx->accept_cb.context = context; > status = ib_send_cm_rep(cma_ctx->cm_id, &passive_params); > if (status) > goto reject; > > printk(KERN_DEBUG PFX "%s: return success\n", __func__); > return 0; > > reject: > printk(KERN_ERR PFX "%s: error status %d\n", __func__, status); > ib_send_cm_rej(cma_ctx->cm_id, IB_CM_REJ_CONSUMER_DEFINED, > NULL, 0, NULL, 0); > destroy_cma_ctx(cma_ctx); > return status; > }; > EXPORT_SYMBOL(ib_cma_accept); > > int ib_cma_reject(void *cma_id, const void *private_data, > u8 private_data_len) > { > struct cma_context *cma_ctx = cma_id; > int status; > > if (cma_ctx == NULL) This check isn't needed. > return -EINVAL; > > status = ib_send_cm_rej(cma_ctx->cm_id, IB_CM_REJ_CONSUMER_DEFINED, > NULL, 0, private_data, private_data_len); > destroy_cma_ctx(cma_ctx); > > return status; > }; > EXPORT_SYMBOL(ib_cma_reject); As a general statement, can we reduce the number of debug print statements? - Sean From mshefty at ichips.intel.com Mon Sep 19 11:02:36 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 19 Sep 2005 11:02:36 -0700 Subject: [openib-general] Re: [PATCH] libibcm/libibat disable-libcheck option In-Reply-To: <20050919173929.GC25887@mellanox.co.il> References: <432EE212.70206@ichips.intel.com> <20050919173929.GC25887@mellanox.co.il> Message-ID: <432EFD3C.2050201@ichips.intel.com> Michael S. Tsirkin wrote: > Add an option to disable configure checks for ib libraries. > This makes it possible to first configure all libraries, > then make them all. Committed. 
- Sean From rolandd at cisco.com Mon Sep 19 11:09:52 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 11:09:52 -0700 Subject: [openib-general] Re: user_mad.c changes for upstream In-Reply-To: <1127152333.4401.21716.camel@hal.voltaire.com> (Hal Rosenstock's message of "19 Sep 2005 13:52:13 -0400") References: <1127152333.4401.21716.camel@hal.voltaire.com> Message-ID: <52u0ghexov.fsf@cisco.com> Hal> Hi Roland, The last patch I sent is incremental off of what Hal> is in the OpenIB svn tree. If that looks right, I will check Hal> it in. It does work. Do you want a patch which is the Hal> consolidated difference from what has been pushed upstream or Hal> would you be all set on this ? What I really need is a changelog comment explaining the final change. A diff against what is upstream is helpful, but I can generate it from subversion without too much trouble. - R. From rolandd at cisco.com Mon Sep 19 11:13:44 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 11:13:44 -0700 Subject: [openib-general] Re: user_mad.c changes for upstream In-Reply-To: <1127152333.4401.21716.camel@hal.voltaire.com> (Hal Rosenstock's message of "19 Sep 2005 13:52:13 -0400") References: <1127152333.4401.21716.camel@hal.voltaire.com> Message-ID: <52mzm9exif.fsf@cisco.com> Formatting changes aside, it seems that the change is just the below. Is that all there is? - R. 
--- drivers/infiniband/core/user_mad.c +++ drivers/infiniband/core/user_mad.c @@ -337,7 +337,7 @@ static ssize_t ib_umad_write(struct file /* Validate that management class can support RMPP */ if (rmpp_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_ADM) { hdr_len = offsetof(struct ib_sa_mad, data); - data_len = length; + data_len = length - hdr_len; } else if ((rmpp_mad->mad_hdr.mgmt_class >= IB_MGMT_CLASS_VENDOR_RANGE2_START) && (rmpp_mad->mad_hdr.mgmt_class <= IB_MGMT_CLASS_VENDOR_RANGE2_END)) { hdr_len = offsetof(struct ib_vendor_mad, data); From halr at voltaire.com Mon Sep 19 11:15:09 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2005 14:15:09 -0400 Subject: [openib-general] [Fwd: [PATCH] user_mad.c::ib_umad_write don't go past end of user buffer] Message-ID: <1127153708.24173.51.camel@hal.voltaire.com> This didn't seem to make it to the list. -----Forwarded Message----- From: Hal Rosenstock To: Roland Dreier Cc: Eitan Zahavi , openib-general at openib.org Subject: [PATCH] user_mad.c::ib_umad_write don't go past end of user buffer Date: 19 Sep 2005 13:44:18 -0400 user_mad.c::ib_umad_write don't go past end of user buffer Fix to previous fix for length of user buffer copied when sending RMPP Signed-off-by: Hal Rosenstock Index: user_mad.c =================================================================== --- user_mad.c (revision 3480) +++ user_mad.c (working copy) @@ -273,7 +273,6 @@ static ssize_t ib_umad_write(struct file u8 method; __be64 *tid; int ret, length, hdr_len, data_len, rmpp_hdr_size; - int class_hdr_len = 0; int rmpp_active = 0; if (count < sizeof (struct ib_user_mad)) @@ -335,16 +334,15 @@ static ssize_t ib_umad_write(struct file ret = -EINVAL; goto err_ah; } + /* Validate that the management class can support RMPP */ if (rmpp_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_ADM) { hdr_len = offsetof(struct ib_sa_mad, data); - data_len = length; - class_hdr_len = sizeof(struct ib_sa_hdr); + data_len = length - hdr_len; } else if 
((rmpp_mad->mad_hdr.mgmt_class >= IB_MGMT_CLASS_VENDOR_RANGE2_START) && (rmpp_mad->mad_hdr.mgmt_class <= IB_MGMT_CLASS_VENDOR_RANGE2_END)) { hdr_len = offsetof(struct ib_vendor_mad, data); data_len = length - hdr_len; - class_hdr_len = 4; } else { ret = -EINVAL; goto err_ah; @@ -393,7 +391,7 @@ static ssize_t ib_umad_write(struct file /* Now, copy rest of message from user into send buffer */ if (copy_from_user(((struct ib_rmpp_mad *) packet->msg->mad)->data, buf + sizeof (struct ib_user_mad) + rmpp_hdr_size, - length + class_hdr_len)) { + length - rmpp_hdr_size)) { ret = -EFAULT; goto err_msg; } From halr at voltaire.com Mon Sep 19 11:21:31 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2005 14:21:31 -0400 Subject: [openib-general] Re: user_mad.c changes for upstream In-Reply-To: <52u0ghexov.fsf@cisco.com> References: <1127152333.4401.21716.camel@hal.voltaire.com> <52u0ghexov.fsf@cisco.com> Message-ID: <1127154091.24173.105.camel@hal.voltaire.com> On Mon, 2005-09-19 at 14:09, Roland Dreier wrote: > Hal> Hi Roland, The last patch I sent is incremental off of what > Hal> is in the OpenIB svn tree. If that looks right, I will check > Hal> it in. It does work. Do you want a patch which is the > Hal> consolidated difference from what has been pushed upstream or > Hal> would you be all set on this ? > > What I really need is a changelog comment explaining the final change. > A diff against what is upstream is helpful, but I can generate it from > subversion without too much trouble. OK. Here's what I think it nets out to other than the formatting changes: user_mad::ib_umad_write Fix the data length for RMPP of user buffer copied when sending SA class RMPP The salient diff is below. 
-- Hal --- user_mad.c (revision 3471) +++ user_mad.c (revision 3484) @@ -334,10 +334,11 @@ static ssize_t ib_umad_write(struct file ret = -EINVAL; goto err_ah; } - /* Validate that management class can support RMPP */ + + /* Validate that the management class can support RMPP */ if (rmpp_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_ADM) { hdr_len = offsetof(struct ib_sa_mad, data); - data_len = length; + data_len = length - hdr_len; } else if ((rmpp_mad->mad_hdr.mgmt_class >= IB_MGMT_CLASS_VENDOR_RANGE2_START) && (rmpp_mad->mad_hdr.mgmt_class <= IB_MGMT_CLASS_VENDOR_RANGE2_END)) { hdr_len = offsetof(struct ib_vendor_mad, data); From halr at voltaire.com Mon Sep 19 11:24:49 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2005 14:24:49 -0400 Subject: [openib-general] Re: user_mad.c changes for upstream In-Reply-To: <52mzm9exif.fsf@cisco.com> References: <1127152333.4401.21716.camel@hal.voltaire.com> <52mzm9exif.fsf@cisco.com> Message-ID: <1127154157.24173.115.camel@hal.voltaire.com> On Mon, 2005-09-19 at 14:13, Roland Dreier wrote: > Formatting changes aside, it seems that the change is just the below. A patch didn't make it to the list so I checked it in. > Is that all there is? I sent an email summarizing this. See if you think the same thing. Thanks. -- Hal > - R.
> > --- drivers/infiniband/core/user_mad.c > +++ drivers/infiniband/core/user_mad.c > @@ -337,7 +337,7 @@ static ssize_t ib_umad_write(struct file > /* Validate that management class can support RMPP */ > if (rmpp_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_ADM) { > hdr_len = offsetof(struct ib_sa_mad, data); > - data_len = length; > + data_len = length - hdr_len; > } else if ((rmpp_mad->mad_hdr.mgmt_class >= IB_MGMT_CLASS_VENDOR_RANGE2_START) && > (rmpp_mad->mad_hdr.mgmt_class <= IB_MGMT_CLASS_VENDOR_RANGE2_END)) { > hdr_len = offsetof(struct ib_vendor_mad, data); From rolandd at cisco.com Mon Sep 19 11:54:49 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 11:54:49 -0700 Subject: [openib-general] Re: recursion depth exceeded in ipoib_workqueue In-Reply-To: <20050919174453.GD25887@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 19 Sep 2005 20:44:53 +0300") References: <52fys1gf2a.fsf@cisco.com> <20050919174453.GD25887@mellanox.co.il> Message-ID: <52ek7kga6e.fsf@cisco.com> Michael> It seems that wait_for_completion can now deadlock, since Michael> it is waiting for a work queued on ipoib_workqueue. I don't think so -- all the complete() calls are in functions called from the SA module's context. Am I wrong? - R. From manpreets7 at yahoo.com Mon Sep 19 12:16:19 2005 From: manpreets7 at yahoo.com (Manpreet Singh) Date: Mon, 19 Sep 2005 12:16:19 -0700 (PDT) Subject: [openib-general] Re: Tavor HCAs with openib Message-ID: <20050919191619.65414.qmail@web34207.mail.mud.yahoo.com> Hi Roland, The following is the output with the card after adding the lines you mentioned: ib_mthca 0000:04:00.0: Missing UAR, aborting. ib_mthca 0000:04:00.0: flags: 0, len: 0 Thanks, Manpreet.
From eitan at mellanox.co.il Mon Sep 19 13:01:37 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 19 Sep 2005 23:01:37 +0300 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <1127143328.4401.20682.camel@hal.voltaire.com> References: <1127143328.4401.20682.camel@hal.voltaire.com> Message-ID: <432F1921.3070909@mellanox.co.il> Hal Rosenstock wrote: > On Fri, 2005-09-16 at 00:55, Eitan Zahavi wrote: > >>For the gen2 stack we could use the following "hack": >>OpenSM scans all nodes for their description every time it does a full > > sweep. > >>So we could cause an extra sweep after each node description change by > > faking trap 144 > >>(HCA port capability mask change) and sending it over. > > > Is a full sweep needed to handle a single HCA port capability mask > change ? Current implementation does either a full sweep or light sweep (scan only switch info for change bit and non responsive ports). > > Also, couldn't the NodeDescription change be handled with a light sweep > (if SM Get NodeDescription were added there) ? That might be better or > are there other issues with doing it that way ? See above. No other issues but we might eventually define incremental algorithm. > > -- Hal > From eitan at mellanox.co.il Mon Sep 19 13:09:13 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 19 Sep 2005 23:09:13 +0300 Subject: [openib-general] Re: [PATCH} OpenSM osm_vendor_ibumad.c::osm_vendor_send Fix length of umad_send In-Reply-To: <1127152070.4401.21713.camel@hal.voltaire.com> References: <1127152070.4401.21713.camel@hal.voltaire.com> Message-ID: <432F1AE9.803@mellanox.co.il> Hi Hal, Hal Rosenstock wrote: > osm_vendor_ibumad.c::osm_vendor_send Fix length of umad_send when > VENDOR_RMPP_SUPPORT is defined. > > NOTE: This requires the user_mad.c::ib_umad_write patch just sent. Please send a clear message when this can be tested.
I see many mails between you and Roland and am now confused about when it is going to be fully supported in the trunk core and management. I guess this is all about the truncated RMPP send I have complained about. Thanks for fixing it. Eitan > > Signed-off-by: Hal Rosenstock > > Index: osm_vendor_ibumad.c > =================================================================== > --- osm_vendor_ibumad.c (revision 3480) > +++ osm_vendor_ibumad.c (working copy) > @@ -983,8 +983,12 @@ osm_vendor_send( > put_madw(p_vend, p_madw, &p_mad->trans_id); > > if ((ret = umad_send(p_bind->port_id, p_bind->agent_id, > p_vw->umad, > +#ifdef VENDOR_RMPP_SUPPORT > + p_madw->mad_size, > +#else > is_rmpp ? p_madw->mad_size - > IB_SA_MAD_HDR_SIZE : > p_madw->mad_size, > +#endif > resp_expected ? p_vend->timeout : 0, > p_vend->max_retries)) < 0) { > if (resp_expected) > From mst at mellanox.co.il Mon Sep 19 13:14:17 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 19 Sep 2005 23:14:17 +0300 Subject: [openib-general] Re: recursion depth exceeded in ipoib_workqueue In-Reply-To: <52ek7kga6e.fsf@cisco.com> References: <52ek7kga6e.fsf@cisco.com> Message-ID: <20050919201417.GA27254@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: recursion depth exceeded in ipoib_workqueue > > Michael> It seems that wait_for_completion can now deadlock, since > Michael> it is waiting for a work queued on ipoib_workqueue. > > I don't think so -- all the complete() calls are in functions called > from the SA module's context. Am I wrong? > > - R. > What about this: down(&mcast_mutex); if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) { if (status == -ETIMEDOUT) queue_work(ipoib_workqueue, &priv->mcast_task); else queue_delayed_work(ipoib_workqueue, &priv->mcast_task, mcast->backoff * HZ); } else complete(&mcast->done); up(&mcast_mutex); Can we get to this code?
-- MST From rolandd at cisco.com Mon Sep 19 13:44:09 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 13:44:09 -0700 Subject: [openib-general] Re: recursion depth exceeded in ipoib_workqueue In-Reply-To: <20050919201417.GA27254@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 19 Sep 2005 23:14:17 +0300") References: <52ek7kga6e.fsf@cisco.com> <20050919201417.GA27254@mellanox.co.il> Message-ID: <52aci8g546.fsf@cisco.com> Yep, I was wrong again. OK, how about this: don't do the flush when we're calling from the same workqueue context. Index: linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c (revision 3483) +++ linux-kernel/infiniband/ulp/ipoib/ipoib_multicast.c (working copy) @@ -598,7 +598,7 @@ int ipoib_mcast_start_thread(struct net_ return 0; } -int ipoib_mcast_stop_thread(struct net_device *dev) +int ipoib_mcast_stop_thread(struct net_device *dev, int flush) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_mcast *mcast; @@ -610,7 +610,8 @@ int ipoib_mcast_stop_thread(struct net_d cancel_delayed_work(&priv->mcast_task); up(&mcast_mutex); - flush_workqueue(ipoib_workqueue); + if (flush) + flush_workqueue(ipoib_workqueue); if (priv->broadcast && priv->broadcast->query) { ib_sa_cancel_query(priv->broadcast->query_id, priv->broadcast->query); @@ -832,7 +833,7 @@ void ipoib_mcast_restart_task(void *dev_ ipoib_dbg_mcast(priv, "restarting multicast task\n"); - ipoib_mcast_stop_thread(dev); + ipoib_mcast_stop_thread(dev, 0); spin_lock_irqsave(&priv->lock, flags); Index: linux-kernel/infiniband/ulp/ipoib/ipoib.h =================================================================== --- linux-kernel/infiniband/ulp/ipoib/ipoib.h (revision 3483) +++ linux-kernel/infiniband/ulp/ipoib/ipoib.h (working copy) @@ -257,7 +257,7 @@ void ipoib_mcast_send(struct net_device void ipoib_mcast_restart_task(void *dev_ptr); int 
ipoib_mcast_start_thread(struct net_device *dev); -int ipoib_mcast_stop_thread(struct net_device *dev); +int ipoib_mcast_stop_thread(struct net_device *dev, int flush); void ipoib_mcast_dev_down(struct net_device *dev); void ipoib_mcast_dev_flush(struct net_device *dev); Index: linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c (revision 3483) +++ linux-kernel/infiniband/ulp/ipoib/ipoib_ib.c (working copy) @@ -432,7 +432,7 @@ int ipoib_ib_dev_down(struct net_device flush_workqueue(ipoib_workqueue); } - ipoib_mcast_stop_thread(dev); + ipoib_mcast_stop_thread(dev, 1); /* * Flush the multicast groups first so we stop any multicast joins. The @@ -599,7 +599,7 @@ void ipoib_ib_dev_cleanup(struct net_dev ipoib_dbg(priv, "cleaning up ib_dev\n"); - ipoib_mcast_stop_thread(dev); + ipoib_mcast_stop_thread(dev, 1); /* Delete the broadcast address and the local address */ ipoib_mcast_dev_down(dev); From rolandd at cisco.com Mon Sep 19 13:47:51 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 13:47:51 -0700 Subject: [openib-general] Re: Tavor HCAs with openib In-Reply-To: <20050919191619.65414.qmail@web34207.mail.mud.yahoo.com> (Manpreet Singh's message of "Mon, 19 Sep 2005 12:16:19 -0700 (PDT)") References: <20050919191619.65414.qmail@web34207.mail.mud.yahoo.com> Message-ID: <523bo0g4y0.fsf@cisco.com> > ib_mthca 0000:04:00.0: Missing UAR, aborting. > ib_mthca 0000:04:00.0: flags: 0, len: 0 OK, there's something wrong with the PCI bus setup in your system. The kernel is not giving us the correct BAR configuration -- pci_resource_flags(pdev, 2) and pci_resource_len(pdev, 2) are returning 0, instead of telling us we have 8 MB of memory. Have you tried a newer kernel like 2.6.13 or 2.6.14-rc1? What does lspci -vv say for this device? 
In any case you're probably going to have to report this to linux-kernel to get it resolved -- the issue seems to be in the PCI core code or in your BIOS. - R. From halr at voltaire.com Mon Sep 19 15:20:15 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 19 Sep 2005 18:20:15 -0400 Subject: [openib-general] RMPP Message Format Errors Message-ID: <1127168205.24173.1859.camel@hal.voltaire.com> Hi Eitan, The send side RMPP changes for the truncation of the last SA record have now stabilized. With the latest user_mad.c and osm_vendor_ibumad.c changes which are in the OpenIB svn tree (svn revision 3485), this is ready to be verified again. It's safe to come out now :-) -- Hal From iod00d at hp.com Mon Sep 19 16:46:10 2005 From: iod00d at hp.com (Grant Grundler) Date: Mon, 19 Sep 2005 16:46:10 -0700 Subject: [openib-general] could not add HCA InfiniHost0 In-Reply-To: <1127026589.23305.3.camel@QiWang> References: <1126779086.22691.7.camel@QiWang> <20050915155035.GA3013@esmail.cup.hp.com> <1127026589.23305.3.camel@QiWang> Message-ID: <20050919234610.GH20254@esmail.cup.hp.com> On Sun, Sep 18, 2005 at 02:56:29PM +0800, QiWang, Chen wrote: > Hi, Grant > I can not change the slot, I only have one slot (BladeServer), Well, then it sounds like you have two different versions of the Blade server. You might pull them out and compare markings on the boards to see if they are the same rev/model/etc. I'll look at the output tomorrow and comment if anything unusual catches my attention.
thanks, grant > > -------------------------------------------------------------------- > lspci -vvs 02:00.0 : > > 02:00.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) > (prog-if 00 [Normal decode]) > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- > ParErr- Stepping- SERR+ FastB2B- > Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- > SERR- Latency: 32, Cache Line Size 08 > Bus: primary=02, secondary=03, subordinate=03, sec-latency=32 > Memory behind bridge: e0000000-efffffff > Secondary status: 66Mhz+ FastB2B- ParErr- DEVSEL=medium >TAbort- > BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B- > Capabilities: [70] PCI-X bridge device. > Secondary Status: 64bit+, 133MHz+, SCD-, USC-, SCO-, > SRD- Freq=3 > Status: Bus=2 Dev=0 Func=0 64bit+ 133MHz+ SCD- USC-, > SCO-, SRD- > : Upstream: Capacity=512, Commitment Limit=512 > : Downstream: Capacity=128, Commitment Limit=128 > > --------------------------------------------------------------------------------------------------- > > lspci -vvs 02:01.0 > > > 02:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1) > (prog-if 00 [Normal decode]) > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- > ParErr- Stepping- SERR+ FastB2B- > Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- > SERR- Latency: 32, Cache Line Size 08 > Bus: primary=02, secondary=03, subordinate=03, sec-latency=32 > Memory behind bridge: e0000000-efffffff > Secondary status: 66Mhz+ FastB2B- ParErr- DEVSEL=medium >TAbort- > BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B- > Capabilities: [70] PCI-X bridge device. 
> Secondary Status: 64bit+, 133MHz+, SCD-, USC-, SCO-, > SRD- Freq=3 > Status: Bus=2 Dev=1 Func=0 64bit+ 133MHz+ SCD- USC-, > SCO-, SRD- > : Upstream: Capacity=512, Commitment Limit=512 > : Downstream: Capacity=128, Commitment Limit=128 > > ----------------------------------------------------------------------------------------------------- > > Thx > > On Thu, 2005-09-15 at 08:50 -0700, Grant Grundler wrote: > > > On Thu, Sep 15, 2005 at 06:11:26PM +0800, QiWang, Chen wrote: > > > there are some diff: > > > 02:00.0 --> work > > > 02:01.0 --> failed > > > > > > and first time I install the ib-verbs on node1, It also failed, because > > > lspci= 02:01.0, and I do not know how to change 02:01.0 to 02:00.0, and > > > it works fine for me. > > > > You can only change it by removing the Mellanox card and re-installing > > in the other slot. > > > > Can you post "lspci -vvs 02:01.0" output from the machine that failed? > > Can you post "lspci -vvs 02:00.0" output from the machine that worked? > > > > grant > > -- > QiWang, Chen > Clustars Supercomputing Technology corp. > http://www.Clustars.CN > TEL:+86-0816-2546345-815 > FAX:+86-0816-2546370 > Mobile:+86-13096497499 From jcdjobs at yahoo.com Mon Sep 19 17:02:54 2005 From: jcdjobs at yahoo.com (Jonathan Day) Date: Mon, 19 Sep 2005 17:02:54 -0700 (PDT) Subject: [openib-general] Several questions wrt OpenIB Message-ID: <20050920000255.61167.qmail@web34415.mail.mud.yahoo.com> Hi, A few questions with regards to OpenIB, running on a Linux environment. I'm using a 64-bit Linux, running on Opterons and the Broadcom SB1 1250 (a dual-core MIPS64 chip that isn't known for doing things in a standard way) where I will be needing to use Infiniband, VIA, DAPL and MPI-2 at various points. The Opterons seem to be generally well-supported on everything I've seen so far, but MIPS64 coders seem to be somewhat of a rarity and Broadcom coders more so.
As a result, I've been having a hard time getting things to run correctly or (in more cases than I'd like) at all. Are there any components of OpenIB which will definitely NOT work on a MIPS64 board or over the MIPS portion of the Linux kernel? Are there any projects out there to support VIA over OpenIB? Are there any major latency issues with the current svn version of OpenIB that I would need to take account of (and/or ignore) when doing timing tests? Thanks, Jonathan Day From rolandd at cisco.com Mon Sep 19 17:16:29 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 19 Sep 2005 17:16:29 -0700 Subject: [openib-general] Several questions wrt OpenIB In-Reply-To: <20050920000255.61167.qmail@web34415.mail.mud.yahoo.com> (Jonathan Day's message of "Mon, 19 Sep 2005 17:02:54 -0700 (PDT)") References: <20050920000255.61167.qmail@web34415.mail.mud.yahoo.com> Message-ID: <52ek7kr3tu.fsf@cisco.com> Jonathan> Are there any components of OpenIB which will definitely Jonathan> NOT work on a MIPS64 board or over the MIPS portion of Jonathan> the Linux kernel? Is your platform cache coherent between PCI and CPU accesses to memory? Based on my reading of arch/mips/Kconfig, it seems to be the case that the Sibyte SOCs are cache-coherent. If so, then everything _should_ work OK -- the IB drivers are 32/64 clean, endian clean, etc. However, you would probably be the first to even build the drivers for MIPS, so I would be curious to hear your experiences. Jonathan> Are there any projects out there to support VIA over OpenIB? Not that I know of. As far as I know, VIA is considered obsoleted by uDAPL. Jonathan> Are there any major latency issues with the current svn Jonathan> version of OpenIB that I would need to take account of Jonathan> (and/or ignore) when doing timing tests? No, as far as I know the current OpenIB provides latency at least as good as any other IB stack. - R.
From Richard.Frank at oracle.com Mon Sep 19 19:01:51 2005 From: Richard.Frank at oracle.com (Rick Frank) Date: Mon, 19 Sep 2005 22:01:51 -0400 Subject: [openib-general] Managing SRP devices via iSCSI ? Message-ID: <00f101c5bd87$4500c480$6501a8c0@YOURA11C73D0FD> One key argument I've heard in favor of iSER vs SRP is that iSCSI (top level iSER driver) has a very strong management infrastructure - as it is fairly mature. However, iSER seems to be just gaining steam in terms of direct attached storage supporting this protocol .vs. SRP. Would it not be possible to implement some glue between SRP and iSCSI to allow for the discovery and management of SRP devices ? From yaronh at voltaire.com Mon Sep 19 21:47:48 2005 From: yaronh at voltaire.com (Yaron Haviv) Date: Tue, 20 Sep 2005 07:47:48 +0300 Subject: [openib-general] Managing SRP devices via iSCSI ? Message-ID: <35EA21F54A45CB47B879F21A91F4862F7B9BF5@taurus.voltaire.com> ________________________________________ >From: openib-general-bounces at openib.org [mailto:openib-general-bounces at openib.org] On Behalf Of Rick Frank >Sent: Monday, September 19, 2005 10:02 PM >To: openib-general at openib.org >Subject: [openib-general] Managing SRP devices via iSCSI ? > >One key argument I've heard in favor of iSER vs SRP is that iSCSI (top >level iSER driver) has a very strong management infrastructure - as it is fairly >mature. > >However, iSER seems to be just gaining steam in terms of direct attached >storage supporting this protocol .vs. SRP. > >Would it not be possible to implement some glue between SRP and iSCSI to >allow for the discovery and management of SRP devices ? Rick, The question is why bother with a new approach when iSER is what you just suggested ? a. After all iSER transactions are similar to SRP ones (derived from SRP) with few enhancements in favor of iSER (SRQ, FMR, MC/S, immediate, recovery,..). b.
iSER header and naming conventions are derived from iSCSI, whereas SRP naming and header structure are different, forcing redundant translation between the two, and some functionality would not be possible (such as Portals, MC/S, ACA, etc.), so it makes more sense to just use the iSCSI base header format (like iSER does). c. The iWarp guys who are now joining OpenIB will never use this non-standard, IB-specific SRP/iSCSI hybrid, but rather the real iSER. d. SRP, which was initially defined in T10, lost all its momentum in T10 (the last SRP meeting was 2 years ago), and it is not clear how you would standardize your proposal, whereas iSER is in IETF (an integral part of iSCSI/IPS) and serves both IB & iWarp, guaranteeing that its momentum will grow and that it will be enhanced over time. So I believe overall it's simpler to move SRP implementations to iSER (some vendors already wisely do that) than to somehow define a non-standard SRP with iSCSI management; after all, iSER is just what you propose (improved SRP with iSCSI services), and it is already defined (last call in IETF). By the way, I wouldn't deduce a whole lot about SRP adoption among key storage vendors, or about their future plans, from a few early experiments with SRP storage in the market. If you are interested in more details on iSER, let me know. Yaron From mst at mellanox.co.il Mon Sep 19 23:48:51 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Sep 2005 09:48:51 +0300 Subject: [openib-general] Re: Tavor HCAs with openib In-Reply-To: <20050917012911.9327.qmail@web34207.mail.mud.yahoo.com> References: <20050917012911.9327.qmail@web34207.mail.mud.yahoo.com> Message-ID: <20050920064851.GF2520@mellanox.co.il> Quoting r. Manpreet Singh : > Subject: Tavor HCAs with openib > > Hi, > > I was wondering if the Mellanox PCI-X cards (with > memory) are still supported in the openib stack.
> Although the code seems to include it in the PCI > device id list, but I get the following error when I > load ib_mthca: > > ib_mthca 0000:04:00.0: Missing UAR, aborting > > Details of the configuration: > > Kernel version: 2.6.12.3. > OpenIB kernel stack version: 3459 (from Sep 16). > HCA card device ID: 0x5a44. > > The following is the PCI configuration space dump of > the HCA: > > [root at driver5-linux linux-kernel]# lspci -s 04:00.00 > -xxx > 04:00.0 InfiniBand: Mellanox Technology MT23108 > InfiniHost (rev a1) > 00: b3 15 44 5a 53 01 30 02 a1 00 06 0c 08 40 00 00 > 10: 04 00 d0 df 00 00 00 00 0c 00 80 ff 0f 00 00 00 > 20: 0c 00 00 f0 0f 00 00 00 00 00 00 00 b3 15 44 5a > 30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00 > 40: 11 50 1f 00 00 20 08 00 00 22 08 00 00 00 00 00 > 50: 03 60 ff 7f 11 11 00 00 00 00 00 00 00 00 00 00 > 60: 05 70 8a 00 00 00 00 00 00 00 00 00 00 00 00 00 > 70: 07 00 1c 00 00 04 e3 00 00 00 00 00 00 00 00 00 > 80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > I'd appreciate any help/comments on this. > > Thanks, > Manpreet. Yes. 10: 04 00 d0 df 00 00 00 00 0c 00 80 ff 0f 00 00 00 20: 0c 00 00 f0 0f 00 00 00 00 00 00 00 b3 15 44 5a Region 0 (offset 0x10) is at dfd00000, but Regions 2 and 4 (0x18 and 0x20) got assigned to 0x0fff800000 and 0x0ff0000000. It really looks like a BIOS/kernel issue. Does Linux actually support 64-bit hardware addresses? pciutils seems not to like them: lspci (rev 2.1.11) says: ~>./lspci -F ~/foot -vv pcilib: 04:00.0 64-bit device address ignored. pcilib: 04:00.0 64-bit device address ignored.
04:00.0 Class 0c06: 15b3:5a44 (rev a1) Subsystem: 15b3:5a44 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- (64-bit, prefetchable) Region 4: Memory at (64-bit, prefetchable) Capabilities: [40] #11 [001f] Capabilities: [50] Vital Product Data Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable- Address: 0000000000000000 Data: 0000 Capabilities: [70] PCI-X non-bridge device. Command: DPERE- ERO- RBC=3 OST=1 Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM- Just to make sure, could you please look at the PCI bridge that the device is behind with lspci -vv? Need to check what happens to prefetchable memory behind bridge base/limit values. If region 2/4 values are outside this range, it's a BIOS issue, otherwise it might be a kernel issue. -- MST From dotanb at mellanox.co.il Tue Sep 20 00:06:49 2005 From: dotanb at mellanox.co.il (Dotan Barak) Date: Tue, 20 Sep 2005 10:06:49 +0300 Subject: [openib-general] RE: executing the SRQ pingpong example Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30FF1F2@mtlexch01.mtl.com> > > Thanks, I think this patch (already committed) should fix the crash. > > Index: linux-kernel/infiniband/hw/mthca/mthca_srq.c > =================================================================== > --- linux-kernel/infiniband/hw/mthca/mthca_srq.c (revision 3478) > +++ linux-kernel/infiniband/hw/mthca/mthca_srq.c (working copy) > @@ -172,6 +172,8 @@ static int mthca_alloc_srq_buf(struct mt > scatter->lkey = cpu_to_be32(MTHCA_INVAL_LKEY); > } > > + srq->last = get_wqe(srq, srq->max - 1); > + > return 0; > } > > @@ -263,7 +265,6 @@ int mthca_alloc_srq(struct mthca_dev *de > > srq->first_free = 0; > srq->last_free = srq->max - 1; > - srq->last = get_wqe(srq, srq->max - 1); > > return 0; > > thanks, the fix does the job ...
Dotan From yael at mellanox.co.il Tue Sep 20 00:48:07 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 20 Sep 2005 10:48:07 +0300 Subject: [openib-general] [PATCH] Opensm - ignore strict-aliasing warning Message-ID: <5z4q8gxjrc.fsf@mtl066.yok.mtl.com> Hi Hal, Attached is a patch forcing the compiler not to do strict-aliasing optimization on the opensm code, due to strict-aliasing warnings in our code. Thanks, Yael Signed-off-by: Yael Kalka Index: opensm/Makefile.am =================================================================== --- opensm/Makefile.am (revision 3487) +++ opensm/Makefile.am (working copy) @@ -64,7 +64,7 @@ opensm_SOURCES = main.c osm_db_files.c o osm_ucast_mgr.c osm_ucast_updn.c \ osm_vl15intf.c osm_vl_arb_rcv.c\ osm_vl_arb_rcv_ctrl.c st.c -opensm_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 +opensm_CFLAGS = -Wall $(OSMV_CFLAGS) -fno-strict-aliasing -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 opensm_CXXFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 # for linking with the simulator client library we have to use g++: From rajib.majumder at csfb.com Tue Sep 20 00:52:13 2005 From: rajib.majumder at csfb.com (Majumder, Rajib) Date: Tue, 20 Sep 2005 15:52:13 +0800 Subject: [openib-general] PathScale InfiniPath Message-ID: Hello, I am just wondering if OpenIB stack supports InfiniPath adapter. Is there any HCA driver available for InfiniPath? If not, is PathScale working on it? Can someone from PathScale/OpenIB let me know the status? Thanks.
Rajib From hch at lst.de Tue Sep 20 01:37:17 2005 From: hch at lst.de (Christoph Hellwig) Date: Tue, 20 Sep 2005 10:37:17 +0200 Subject: [openib-general] Managing SRP devices via iSCSI ? In-Reply-To: <00f101c5bd87$4500c480$6501a8c0@YOURA11C73D0FD> References: <00f101c5bd87$4500c480$6501a8c0@YOURA11C73D0FD> Message-ID: <20050920083717.GA29726@lst.de> On Mon, Sep 19, 2005 at 10:01:51PM -0400, Rick Frank wrote: > One key argument I've heard in favor of iSER vs SRP is that iSCSI (top level > iSER driver) has a very strong management infrastructure - as it is fairly > mature. > > However, iSER seems to be just gaining steam in terms of direct attached > storage supporting this protocol .vs. SRP. > > Would it not be possible to implement some glue between SRP and iSCSI to > allow for the discovery and management of SRP devices ? No. It's also pretty pointless because the iSCSI code doesn't really offer anything fine over SRP. Just use SRP and report anything lacking in management, we'll try to address it. From eitan at mellanox.co.il Tue Sep 20 01:48:00 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 20 Sep 2005 11:48:00 +0300 Subject: [openib-general] Re: RMPP Message Format Errors In-Reply-To: <1127168205.24173.1859.camel@hal.voltaire.com> References: <1127168205.24173.1859.camel@hal.voltaire.com> Message-ID: <432FCCC0.8050202@mellanox.co.il> Hi Hal, Seems like RMPP works ! This is an important milestone for OpenSM as we are now able to test the SM/SA with osmtest. There is still some constant 8 bytes remainder in the RMPP number of received records calculation (see osmtest -V log file) but this is minor (as no SA record is that small).
Thanks for your continuous support. Eitan Hal Rosenstock wrote: > Hi Eitan, > > The send side RMPP changes for the truncation of the last SA > record have now stabilized. With the latest user_mad.c and > osm_vendor_ibumad.c changes which are in the OpenIB svn tree (svn > revision 3485), this is ready to be verified again. It's safe to come out > now :-) > > -- Hal > From yael at mellanox.co.il Tue Sep 20 01:58:05 2005 From: yael at mellanox.co.il (Yael Kalka) Date: Tue, 20 Sep 2005 11:58:05 +0300 Subject: [openib-general] Opensm - osm_sa_path_record.c - variable declaration Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30E22F8@mtlexch01.mtl.com> Hi Hal, I saw that in your code fixes of osm_sa_path_record.c you added some variable declarations in the middle of a function (osm_pr_rcv_process - in the McastDest case). The Windows compiler does not accept declarations that are not at the beginning of the function, so I would like to have it changed. We can either move the declaration to the beginning of the function, or add {} around the declaration. Do you have a preference? Yael -----Original Message----- From: Yael Kalka Sent: Tuesday, September 20, 2005 10:48 AM To: halr at voltaire.com Cc: openib-general at openib.org; Eitan Zahavi; Yael Kalka Subject: [PATCH] Opensm - ignore strict-aliasing warning Hi Hal, Attached is a patch forcing the compiler not to do strict-aliasing optimization on the opensm code, due to strict-aliasing warnings in our code.
Thanks, Yael Signed-off-by: Yael Kalka Index: opensm/Makefile.am =================================================================== --- opensm/Makefile.am (revision 3487) +++ opensm/Makefile.am (working copy) @@ -64,7 +64,7 @@ opensm_SOURCES = main.c osm_db_files.c o osm_ucast_mgr.c osm_ucast_updn.c \ osm_vl15intf.c osm_vl_arb_rcv.c\ osm_vl_arb_rcv_ctrl.c st.c -opensm_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 +opensm_CFLAGS = -Wall $(OSMV_CFLAGS) -fno-strict-aliasing -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 opensm_CXXFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 # for linking with the simulator client library we have to use g++: From mst at mellanox.co.il Tue Sep 20 02:13:08 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Sep 2005 12:13:08 +0300 Subject: [openib-general] Re: recursion depth exceeded in ipoib_workqueue In-Reply-To: <52aci8g546.fsf@cisco.com> References: <52aci8g546.fsf@cisco.com> Message-ID: <20050920091308.GQ2520@mellanox.co.il> Quoting Roland Dreier : > Subject: Re: recursion depth exceeded in ipoib_workqueue > > Yep, I was wrong again. > > OK, how about this: don't do the flush when we're calling from the > same workqueue context. Looks good to me.
-- MST From halr at voltaire.com Tue Sep 20 02:09:19 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 05:09:19 -0400 Subject: [openib-general] Re: Opensm - osm_sa_path_record.c - variable declaration In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30E22F8@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30E22F8@mtlexch01.mtl.com> Message-ID: <1127207357.24173.5800.camel@hal.voltaire.com> Hi Yael, On Tue, 2005-09-20 at 04:58, Yael Kalka wrote: > I saw that in your code fixes of osm_sa_path_record.c you added some > variable declaration > in the middle of function (osm_pr_rcv_process - in the McastDest > case). > Windows compiler does not enable declaration not in the beginning of > the function, so I would > like to have it changed. > We can either move the declaration to the beginning of the function, > or add {} around the declaration. Please note that I don't have a Windows environment for OpenSM development. > Do you have a preference? My preference would be the latter. -- Hal > Yael > > > -----Original Message----- > From: Yael Kalka > Sent: Tuesday, September 20, 2005 10:48 AM > To: halr at voltaire.com > Cc: openib-general at openib.org; Eitan Zahavi; Yael Kalka > Subject: [PATCH] Opensm - ignore strict-aliasing warning > > > Hi Hal, > > Attached is a patch forcing the compiler not to do strict-aliasing > optimization on the opensm code, due to strict-aliasing warnings in > our code.
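Hal's preferred option, wrapping the mid-function declaration in its own `{}` block, works because C89 only requires declarations to come first in a *block*, not first in the function; Yael's report suggests the Windows compiler in use enforces that C89 rule. A minimal sketch follows; the enum, `path_cost()` and its variables are hypothetical stand-ins for the McastDest branch of `osm_pr_rcv_process`, not OpenSM code.

```c
#include <assert.h>

/* Hypothetical destination kinds standing in for the real PathRecord cases. */
enum dest_kind { DEST_UNICAST, DEST_MCAST };

/* Returns a made-up "cost" per destination kind, written in strict C89 style. */
int path_cost(enum dest_kind kind, int hops)
{
	int cost = 0;	/* function-scope declarations come first */

	assert(hops >= 0);

	switch (kind) {
	case DEST_UNICAST:
		cost = hops;
		break;
	case DEST_MCAST:
	{
		/* The braces open a new block, so this declaration is legal
		 * C89 even though it sits in the middle of the function. */
		int mcast_penalty = 2;

		cost = hops + mcast_penalty;
		break;
	}
	}
	return cost;
}
```

Besides satisfying C89, the braces keep the variable's scope limited to the one case that uses it, which is the main advantage over hoisting it to the top of the function.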
> > Thanks, > Yael > > Signed-off-by: Yael Kalka > > Index: opensm/Makefile.am > =================================================================== > --- opensm/Makefile.am (revision 3487) > +++ opensm/Makefile.am (working copy) > @@ -64,7 +64,7 @@ opensm_SOURCES = main.c osm_db_files.c o > osm_ucast_mgr.c osm_ucast_updn.c \ > osm_vl15intf.c osm_vl_arb_rcv.c\ > osm_vl_arb_rcv_ctrl.c st.c > -opensm_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT > $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 > +opensm_CFLAGS = -Wall $(OSMV_CFLAGS) -fno-strict-aliasing > -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 > > opensm_CXXFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT > $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 > > # for linking with the simulator client library we have to use g++: > From halr at voltaire.com Tue Sep 20 02:23:34 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 05:23:34 -0400 Subject: [openib-general] Re: [PATCH] Opensm - ignore strict-aliasing warning In-Reply-To: <5z4q8gxjrc.fsf@mtl066.yok.mtl.com> References: <5z4q8gxjrc.fsf@mtl066.yok.mtl.com> Message-ID: <1127207794.24173.5863.camel@hal.voltaire.com> Hi Yael, On Tue, 2005-09-20 at 03:48, Yael Kalka wrote: > Attached is a patch forcing the compiler not to do strict-aliasing > optimization on the opensm code, due to strict-aliasing warnings in our code. I think this is more of a workaround (to remove the warnings) rather than a fix. It also removes other potential optimizations. -- Hal From halr at voltaire.com Tue Sep 20 02:31:40 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 05:31:40 -0400 Subject: [openib-general] Re: RMPP Message Format Errors In-Reply-To: <432FCCC0.8050202@mellanox.co.il> References: <1127168205.24173.1859.camel@hal.voltaire.com> <432FCCC0.8050202@mellanox.co.il> Message-ID: <1127208699.24173.5970.camel@hal.voltaire.com> On Tue, 2005-09-20 at 04:48, Eitan Zahavi wrote: > Hi Hal, > > Seems like RMPP works ! 
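Hal's point that `-fno-strict-aliasing` is a workaround rather than a fix can be illustrated with the usual alternative: instead of reading a buffer through a pointer cast to an unrelated type (the kind of access GCC's strict-aliasing warnings are about), copy the bytes with `memcpy()`, which is well defined and which compilers generally reduce to a plain load. This is a minimal sketch; `struct raw_mad`, `read_u32()` and `write_u32()` are hypothetical, and the thread does not say which OpenSM expressions actually triggered the warnings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical raw MAD buffer; not an OpenSM type. */
struct raw_mad {
	uint8_t data[256];
};

/* Aliasing-unsafe pattern (undefined behaviour, may also be misaligned):
 *
 *     uint32_t v = *(uint32_t *)&mad->data[off];
 *
 * Copying the bytes instead is valid for any object type. */
uint32_t read_u32(const struct raw_mad *mad, size_t off)
{
	uint32_t v;

	assert(off + sizeof v <= sizeof mad->data);
	memcpy(&v, &mad->data[off], sizeof v);
	return v;	/* host byte order; wire fields would still need ntohl() */
}

void write_u32(struct raw_mad *mad, size_t off, uint32_t v)
{
	assert(off + sizeof v <= sizeof mad->data);
	memcpy(&mad->data[off], &v, sizeof v);
}
```

With accesses written this way, the code can be compiled with strict aliasing left on, keeping the optimizations Hal mentions.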
Yippee :-) > This is an important milestone for OpenSM as we are now able to test the SM/SA with osmtest. and also for Solaris. > There is still some constant 8 bytes remainder in the RMPP number of received records calculation > (see osmtest -V log file) but this is minor (as no SA record is that small). It sounds like there is still a calculation slightly off. I don't see a constant off by 8 remainder issue. In my configuration most seem fine and the only one which is not off by 20 (SA class header size) is the following: Sep 20 05:17:36 292850 [40FFF960] -> osm_vendor_get: Acquired UMAD 0x53cd40, size = 856. Sep 20 05:17:36 292861 [40FFF960] -> osm_vendor_get: ] Sep 20 05:17:36 292870 [40FFF960] -> osm_mad_pool_get: Acquired p_madw = 0x536190, p_mad = 0x53cd78, size = 856. Sep 20 05:17:36 292880 [40FFF960] -> osm_mad_pool_get: ] Sep 20 05:17:36 292889 [40FFF960] -> __osmv_sa_mad_rcv_cb: [ Sep 20 05:17:36 292899 [40FFF960] -> __osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16) Sep 20 05:17:36 292909 [40FFF960] -> osmtest_query_res_cb: [ Sep 20 05:17:36 292918 [40FFF960] -> osmtest_query_res_cb: ] Sep 20 05:17:36 292932 [40FFF960] -> __osmv_sa_mad_rcv_cb: ] Sep 20 05:17:36 292938 [AB001140] -> __osmv_send_sa_req: ] Sep 20 05:17:36 292971 [AB001140] -> osmv_query_sa: ] Sep 20 05:17:36 292980 [AB001140] -> osmtest_get_all_recs: ] Sep 20 05:17:36 292989 [AB001140] -> osmtest_validate_all_node_recs: Received 7 records. Is this what you are referring to ? I do also see: Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370: ib_query failed (IB_REMOTE_ERROR). Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote error = IB_SA_MAD_STATUS_NO_RECORDS. Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected num of records is : 1, Found number of records : 0 and some timeouts: Sep 20 05:17:40 644730 [40FFF960] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=12) -- dropping. 
Sep 20 05:17:40 644740 [40FFF960] -> umad_receiver: ERR 5410: class 0x3 LID 0x0 Sep 20 05:17:40 644750 [40FFF960] -> __osmv_sa_mad_err_cb: [ Sep 20 05:17:40 644760 [40FFF960] -> osmtest_query_res_cb: [ Sep 20 05:17:40 644769 [40FFF960] -> osmtest_query_res_cb: ERR 0003: Error on query (IB_TIMEOUT). Sep 20 05:17:40 644787 [40FFF960] -> osmtest_query_res_cb: ] Sep 20 05:17:40 644801 [40FFF960] -> __osmv_sa_mad_err_cb: ] which then resulted in: Sep 20 05:17:40 644955 [AB001140] -> osmtest_wrong_sm_key_ignored: ERR 0011: Did not get a timeout but got (IB_SUCCESS). > Thanks for your continuous support. > > Eitan > > Hal Rosenstock wrote: > > Hi Eitan, > > > > The send side RMPP changes for the truncation of the last SA > > record have now stabilized. With the latest user_mad.c and > > osm_vendor_ibumad.c changes which are in the OpenIB svn tree (svn > > revision 3485), this is ready to be verified again. It safe to come out > > now :-) > > > > -- Hal > > > From halr at voltaire.com Tue Sep 20 02:37:17 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 05:37:17 -0400 Subject: [openib-general] Re: RMPP Message Format Errors In-Reply-To: <432FCCC0.8050202@mellanox.co.il> References: <1127168205.24173.1859.camel@hal.voltaire.com> <432FCCC0.8050202@mellanox.co.il> Message-ID: <1127208721.24173.5972.camel@hal.voltaire.com> Hi Eitan, On Tue, 2005-09-20 at 04:48, Eitan Zahavi wrote: > Hi Hal, > > Seems like RMPP works ! Yippee :-) > This is an important milestone for OpenSM as we are now able to test the SM/SA with osmtest. and also for Solaris. > There is still some constant 8 bytes remainder in the RMPP number of received records calculation > (see osmtest -V log file) but this is minor (as no SA record is that small). It sounds like there is still a calculation slightly off. I don't see a constant off by 8 remainder issue. 
In my configuration most seem fine and the only one which is not off by 20 (SA class header size) is the following: Sep 20 05:17:36 292850 [40FFF960] -> osm_vendor_get: Acquired UMAD 0x53cd40, size = 856. Sep 20 05:17:36 292861 [40FFF960] -> osm_vendor_get: ] Sep 20 05:17:36 292870 [40FFF960] -> osm_mad_pool_get: Acquired p_madw = 0x536190, p_mad = 0x53cd78, size = 856. Sep 20 05:17:36 292880 [40FFF960] -> osm_mad_pool_get: ] Sep 20 05:17:36 292889 [40FFF960] -> __osmv_sa_mad_rcv_cb: [ Sep 20 05:17:36 292899 [40FFF960] -> __osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16) Sep 20 05:17:36 292909 [40FFF960] -> osmtest_query_res_cb: [ Sep 20 05:17:36 292918 [40FFF960] -> osmtest_query_res_cb: ] Sep 20 05:17:36 292932 [40FFF960] -> __osmv_sa_mad_rcv_cb: ] Sep 20 05:17:36 292938 [AB001140] -> __osmv_send_sa_req: ] Sep 20 05:17:36 292971 [AB001140] -> osmv_query_sa: ] Sep 20 05:17:36 292980 [AB001140] -> osmtest_get_all_recs: ] Sep 20 05:17:36 292989 [AB001140] -> osmtest_validate_all_node_recs: Received 7 records. Is this what you are referring to ? I do also see: Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370: ib_query failed (IB_REMOTE_ERROR). Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote error = IB_SA_MAD_STATUS_NO_RECORDS. Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected num of records is : 1, Found number of records : 0 and some timeouts: Sep 20 05:17:40 644730 [40FFF960] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=12) -- dropping. Sep 20 05:17:40 644740 [40FFF960] -> umad_receiver: ERR 5410: class 0x3 LID 0x0 Sep 20 05:17:40 644750 [40FFF960] -> __osmv_sa_mad_err_cb: [ Sep 20 05:17:40 644760 [40FFF960] -> osmtest_query_res_cb: [ Sep 20 05:17:40 644769 [40FFF960] -> osmtest_query_res_cb: ERR 0003: Error on query (IB_TIMEOUT). 
Sep 20 05:17:40 644787 [40FFF960] -> osmtest_query_res_cb: ] Sep 20 05:17:40 644801 [40FFF960] -> __osmv_sa_mad_err_cb: ] which then resulted in: Sep 20 05:17:40 644955 [AB001140] -> osmtest_wrong_sm_key_ignored: ERR 0011: Did not get a timeout but got (IB_SUCCESS). > Thanks for your continuous support. > > Eitan > > Hal Rosenstock wrote: > > Hi Eitan, > > > > The send side RMPP changes for the truncation of the last SA > > record have now stabilized. With the latest user_mad.c and > > osm_vendor_ibumad.c changes which are in the OpenIB svn tree (svn > > revision 3485), this is ready to be verified again. It safe to come out > > now :-) > > > > -- Hal > > > From guyg at voltaire.com Tue Sep 20 02:48:31 2005 From: guyg at voltaire.com (Guy German) Date: Tue, 20 Sep 2005 12:48:31 +0300 Subject: [openib-general][PATCH][RFC]: CMA header In-Reply-To: <432EF507.60707@ichips.intel.com> References: <432EF507.60707@ichips.intel.com> Message-ID: <432FDAEF.9080901@voltaire.com> Sean Hefty wrote: >> typedef void (*ib_cma_event_handler)(enum ib_cma_event event, void >> *context, const void *private_data); >> typedef void (*ib_cma_listen_handler)(void *cma_id, struct ib_device >> *device, void *private_data, void *context); > > > I think we can merge these two handlers. We do not want to pass back > struct ib_device* to a caller. The device needs to be associated with > the cma_id up front. The listen handler passes also the new cma_id. Do you think it is better to merge them and pass null in this field in the "active side" case? (or maybe pass the same cma_id back in the handler)... >> int ib_cma_get_device(struct sockaddr *remote_address, >> enum ib_qos qos, struct ib_device **device, u8 *port); > > > I don't believe that we can support this function and still work with > device removal. I agree that this is an open issue, at the moment, that's why I did not implement it yet. I will try to open a discussion on it, in a different thread. 
>> int ib_cma_connect(struct ib_cma_conn *cma_conn, void **cma_id); > > Creating the cma_id inside this call, rather than using a separate call > means that the user must be able to handle a connection request callback > before the cma_id is known. I.e. a callback can occur before this call > returns. (In fact, the entire connection could be established, data > transfered, and disconnected before this call returns.) It may be > easier to have a separate call to allocate the cma_id that records the > context and event handler. That's a good point. I wanted to save the cma consumer the trouble of creating and destroying cma_id's, and thought the cma can do it for him, but I agree that this can be problematic. I will change that. >> int ib_cma_listen(struct ib_device *device, struct sockaddr *address, >> __be64 service_id, void *context, >> ib_cma_listen_handler cm_listen_handler, >> void **cma_id); > > > Same issue as above. Point taken. >> int ib_cma_destroy(void *cma_id); > > Why not have this apply to both active and passive sides? Will this > interface support peer to peer connections? If so, then we may not want > to distinguish between active and passive at this level. True. This will have to apply to the active side as well, after we add ib_cma_create. Thanks, Guy From guyg at voltaire.com Tue Sep 20 02:49:09 2005 From: guyg at voltaire.com (Guy German) Date: Tue, 20 Sep 2005 12:49:09 +0300 Subject: [openib-general][PATCH][RFC]: CMA header In-Reply-To: <52slw1gfs2.fsf@cisco.com> References: <52slw1gfs2.fsf@cisco.com> Message-ID: <432FDB15.10809@voltaire.com> Roland Dreier wrote: > This isn't horrible, Cheers ;) > but you seem to have ignored most of the > discussion from last month: I did not implement yet the functions that were controversial (i.e. ib_cma_get_device and ib_cma_get_src_ip). I think it is more productive to discuss those issues over a preliminary implementation. 
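Sean's objection to creating the cma_id inside `ib_cma_connect()` is about ordering: if the connection can complete, and a callback fire, before the call returns, the consumer has no id to match the event against. Allocating the id, with its context and handler, before connecting avoids that. The sketch below models the idea in plain userspace C; `toy_create_id()`, `toy_connect()` and the other names are hypothetical and are not the CMA interface under discussion, and `toy_connect()` deliberately delivers the event synchronously to simulate the worst case.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical event and id types, not the real CMA interface. */
enum toy_event { TOY_EVENT_ESTABLISHED = 1 };

struct toy_cma_id;
typedef void (*toy_handler)(struct toy_cma_id *id, enum toy_event ev);

struct toy_cma_id {
	toy_handler handler;	/* fixed when the id is created */
	void *context;		/* consumer's context, also fixed at creation */
};

struct toy_cma_id *toy_create_id(toy_handler handler, void *context)
{
	struct toy_cma_id *id = malloc(sizeof *id);

	if (id) {
		id->handler = handler;
		id->context = context;
	}
	return id;
}

/* Worst case: the event arrives before connect returns.  Because the id
 * already exists, the handler still receives a valid id and context. */
int toy_connect(struct toy_cma_id *id)
{
	assert(id->handler != NULL);
	id->handler(id, TOY_EVENT_ESTABLISHED);
	return 0;
}

void toy_destroy_id(struct toy_cma_id *id)
{
	free(id);
}

/* Example consumer state, filled in by its handler. */
struct consumer {
	struct toy_cma_id *seen_id;
	int established;
};

void consumer_handler(struct toy_cma_id *id, enum toy_event ev)
{
	struct consumer *c = id->context;

	c->seen_id = id;
	if (ev == TOY_EVENT_ESTABLISHED)
		c->established = 1;
}
```

The same explicitly created id is then the natural argument for a single destroy call on both the active and passive sides, which is what Sean suggests for `ib_cma_destroy()`.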
> > int ib_cma_get_device(struct sockaddr *remote_address, > > enum ib_qos qos, struct ib_device **device, u8 *port); > > How are you dealing with hotplug and object lifetime issues here? I did not fully understand why the user can't deal with hot unplug, if he: - registers as a client (implementing the add and remove functions) - creates a list of devices - gets a device from the cma (requested by IP) - checks that the device is valid and appears in its list - applies a locking/synchronization mechanism between the remove function and the calls to the verbs. I guess I am missing something, so I will start a new thread on that issue. > > int ib_cma_get_src_ip(void *cma_id, ib_cma_addr_handler addr_handler, > > void *context); > > There's no point in making this asynchronous, since we're putting the > source/dest information in the CM REQ private data. Do we have to wait for IBTA approval to do this, or can it be implemented right away in this fashion? > > int ib_cma_create_qp(struct ib_pd *pd, u8 port, struct ib_qp **qp, > > struct ib_qp_init_attr *init_attr); > > What's the point of this function? This function (_in the current implementation_) is a sort of "application note" on what cma consumers need to do. It forces the consumer to use a qp that was modified to INIT after creation. It's only 25 lines long, so I thought I could leave it there even if one chooses not to use it. If we choose to change the implementation of device retrieval, this function might be useful. > > int ib_cma_listen(struct ib_device *device, struct sockaddr *address, > > __be64 service_id, void *context, > > ib_cma_listen_handler cm_listen_handler, > > void **cma_id); > > A minor point, but there's no need for the service_id parameter. > We'll just use the sin_port (or sin6_port) member of the sockaddr. > Similarly struct ib_cma_conn doesn't need the service_id member either. iSER today uses a 64-bit service ID, and other ULPs might use 64-bit service IDs too; why limit them to 16 bits?
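Roland's suggestion (derive the service ID from `sin_port`) and Guy's concern (ULPs such as iSER using full 64-bit service IDs) need not conflict: a 16-bit port can be embedded in a reserved block of the 64-bit service ID space, leaving the rest of the space for other IDs. A minimal sketch, assuming a hypothetical placeholder prefix (`TOY_IP_PS_PREFIX`) and host byte order throughout; the IB CM itself carries service IDs in network order (`__be64`), so real code would convert at the boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder prefix marking the block of service IDs that
 * carry an IP port in their low 16 bits. Not a value from any spec. */
#define TOY_IP_PS_PREFIX 0x00000000ABCD0000ULL
#define TOY_PORT_MASK    0x000000000000FFFFULL

/* Host-order port (e.g. ntohs(sin->sin_port)) -> host-order service ID. */
uint64_t port_to_service_id(uint16_t port)
{
	return TOY_IP_PS_PREFIX | port;
}

/* Returns 1 and stores the port if sid lies in the IP-port block;
 * returns 0 for any other 64-bit service ID (e.g. a ULP-assigned one). */
int service_id_to_port(uint64_t sid, uint16_t *port)
{
	if ((sid & ~TOY_PORT_MASK) != TOY_IP_PS_PREFIX)
		return 0;
	*port = (uint16_t)(sid & TOY_PORT_MASK);
	return 1;
}
```

For example, port 3260 maps to 0x00000000ABCD0CBC, while an arbitrary ULP-chosen ID outside the block is left alone.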
Thanks, Guy From guyg at voltaire.com Tue Sep 20 03:18:47 2005 From: guyg at voltaire.com (Guy German) Date: Tue, 20 Sep 2005 13:18:47 +0300 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <52oe6pgfnt.fsf@cisco.com> References: <52oe6pgfnt.fsf@cisco.com> Message-ID: <432FE207.1050402@voltaire.com> Roland Dreier wrote: > the only use of it is bogus as well. You never use timeout again, > which is good because qp_attr->timeout is not in units of > microseconds; it's already in the IB logarithmic scale. You are right. It is dead code; I will remove it. The first implementation uses default values. Guy From eitan at mellanox.co.il Tue Sep 20 03:33:24 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 20 Sep 2005 13:33:24 +0300 Subject: [openib-general] Re: RMPP Message Format Errors In-Reply-To: <1127208699.24173.5970.camel@hal.voltaire.com> References: <1127208699.24173.5970.camel@hal.voltaire.com> Message-ID: <432FE574.4060800@mellanox.co.il> Hal Rosenstock wrote: > On Tue, 2005-09-20 at 04:48, Eitan Zahavi wrote: > >>Hi Hal, >> >>Seems like RMPP works ! > > > > Is this what you are referring to ? Yes, the line of interest is: __osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16) This shows 16 extra bytes in the data size. > > I do also see: > Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370: > ib_query failed (IB_REMOTE_ERROR). > Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote > error = IB_SA_MAD_STATUS_NO_RECORDS. > Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected > num of records is : 1, Found number of records : 0 The full osmtest flow has some intentional errors injected. If it prints the "PASSED" message at the end, the errors were intentional and expected. Some of the flows have a special output message that wraps the errors in a section like: "vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv" ...
"^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^" We probably need to apply this convention to all the "bad flows". EZ From halr at voltaire.com Tue Sep 20 03:38:16 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 06:38:16 -0400 Subject: [openib-general] Re: RMPP Message Format Errors In-Reply-To: <432FE574.4060800@mellanox.co.il> References: <1127208699.24173.5970.camel@hal.voltaire.com> <432FE574.4060800@mellanox.co.il> Message-ID: <1127212695.24173.6501.camel@hal.voltaire.com> On Tue, 2005-09-20 at 06:33, Eitan Zahavi wrote: > > Is this what you are referring to ? > Yes the line of interest is: > __osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16) > This shows 16byte extra in the data size. Should it be 20 for the SA class header size or 0 here ? > > I do also see: > > Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR 0370: > > ib_query failed (IB_REMOTE_ERROR). > > Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: Remote > > error = IB_SA_MAD_STATUS_NO_RECORDS. > > Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: Expected > > num of records is : 1, Found number of records : 0 > The full osmtest flow has some intentional errors injected. > If it provides the "PASSED" message at the end it means that > the errors were intentional and expected. > > Some of the flows have a special output message that wraps the > errors in a section like: > > "vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv" > ... > "^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^" > > We probably need to apply this convention to all the "bad flows". Then is expected number of records in this test 1 rather than 0 ? 
-- Hal From eitan at mellanox.co.il Tue Sep 20 03:46:25 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 20 Sep 2005 13:46:25 +0300 Subject: [openib-general] Re: RMPP Message Format Errors In-Reply-To: <1127212695.24173.6501.camel@hal.voltaire.com> References: <1127212695.24173.6501.camel@hal.voltaire.com> Message-ID: <432FE881.2080007@mellanox.co.il> Hal Rosenstock wrote: > On Tue, 2005-09-20 at 06:33, Eitan Zahavi wrote: > >>>Is this what you are referring to ? >> >>Yes the line of interest is: >>__osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16) >>This shows 16byte extra in the data size. > > > Should it be 20 for the SA class header size or 0 here ? It should be 0. That means the packet size should accommodate an integer number of SA records (after removing the header sizes). > > >>>I do also see: >>>Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR > > 0370: > >>>ib_query failed (IB_REMOTE_ERROR). >>>Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: > > Remote > >>>error = IB_SA_MAD_STATUS_NO_RECORDS. >>>Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: > > Expected > >>>num of records is : 1, Found number of records : 0 >> >>The full osmtest flow has some intentional errors injected. >>If it provides the "PASSED" message at the end it means that >>the errors were intentional and expected. >> >>Some of the flows have a special output message that wraps the >>errors in a section like: >> >>"vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv" >>... >>"^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^" >> >>We probably need to apply this convention to all the "bad flows". > > > Then is expected number of records in this test 1 rather than 0 ?
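The log line `Count = 7 = 800 / 112 (16)` is a record count computed as payload bytes divided by the SA record size, with the leftover bytes in parentheses; as Eitan says, a well-formed response should leave 0. The sketch below reproduces that arithmetic. The header sizes are assumed here (24-byte common MAD header, 12-byte RMPP header, 20-byte SA header), and together they do account for the 856-byte UMAD versus 800-byte payload in Hal's log; the function names are hypothetical, not the osm_vendor code.

```c
#include <assert.h>
#include <stddef.h>

#define MAD_HDR_LEN  24	/* common MAD header (assumed) */
#define RMPP_HDR_LEN 12	/* RMPP header (assumed) */
#define SA_HDR_LEN   20	/* SM_Key, AttributeOffset, reserved, ComponentMask (assumed) */

/* Bytes of SA record data left after stripping all headers. */
size_t sa_payload_len(size_t umad_len)
{
	size_t hdrs = MAD_HDR_LEN + RMPP_HDR_LEN + SA_HDR_LEN;

	return umad_len > hdrs ? umad_len - hdrs : 0;
}

/* Number of whole records in the payload; *remainder gets the leftover
 * bytes, which should be 0 for a correctly sized response. */
size_t sa_record_count(size_t payload_len, size_t record_len, size_t *remainder)
{
	assert(record_len > 0);
	*remainder = payload_len % record_len;
	return payload_len / record_len;
}
```

With 112-byte records, seven records need 784 bytes, so a correct 7-record response would carry a 784-byte payload rather than 800.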
> > -- Hal > From ali at mellanox.co.il Tue Sep 20 04:31:46 2005 From: ali at mellanox.co.il (Ali Ayoub) Date: Tue, 20 Sep 2005 14:31:46 +0300 Subject: [openib-general] IPoIB interface MAC Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E306A26C@mtlexch01.mtl.com> Hi all, > How can I retrieve the MAC address for a specific IPoIB interface? Using > ifconfig doesn't produce good results; here is ifconfig output for > machines with GEN2: > > SUSE 9.3, 2.6.13 > ib0 Link encap:UNSPEC HWaddr > 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00 > inet addr:11.4.8.156 Bcast:11.255.255.255 Mask:255.0.0.0 > inet6 addr: fe80::202:c912:4538:5345/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 > RX packets:729127 errors:0 dropped:0 overruns:0 frame:0 > TX packets:1764771 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:128 > RX bytes:37915221 (36.1 Mb) TX bytes:3522251817 (3359.0 Mb) > > REDHAT 4, 2.6.9 > ib0 Link encap:UNSPEC HWaddr > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 > inet addr:11.4.8.67 Bcast:11.4.255.255 Mask:255.255.0.0 > UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:128 > RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) > > Thanks, > Ali Ayoub > Mellanox Technologies LTD > Tel: +972-4-9097200 Ext: 251 > Cell: +972-54-5245673 > From halr at voltaire.com Tue Sep 20 04:40:58 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 07:40:58 -0400 Subject: [openib-general] IPoIB interface MAC In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E306A26C@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E306A26C@mtlexch01.mtl.com> Message-ID: <1127216457.24173.7062.camel@hal.voltaire.com> On Tue, 2005-09-20 at 07:31, Ali Ayoub wrote: > Hi all, > How can I retrieve the MAC address for a specific IPoIB interface?
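Part of the trouble with `ifconfig` is size: an IPoIB hardware address is 20 octets (one reserved octet, a 3-octet queue pair number, then the 16-octet port GID, as laid out in RFC 4391), while the `ifconfig` output above shows only 16, so the port GUID at the end is cut off. `ip addr show`, as in the reply that follows, prints all 20. A minimal parsing sketch; `parse_ipoib_addr()` and `struct ipoib_hw_addr` are hypothetical helpers, and the sample address used in the check is the one from that `ip addr` output.

```c
#include <assert.h>
#include <stdint.h>

#define IPOIB_ADDR_LEN 20

struct ipoib_hw_addr {
	uint8_t reserved;	/* first octet */
	uint32_t qpn;		/* 24-bit queue pair number */
	uint8_t gid[16];	/* 8-byte subnet prefix + 8-byte port GUID */
};

static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Parses "xx:xx:...:xx" (exactly 20 octets). Returns 0 or -1 on bad input. */
int parse_ipoib_addr(const char *s, struct ipoib_hw_addr *out)
{
	uint8_t b[IPOIB_ADDR_LEN];
	int i;

	for (i = 0; i < IPOIB_ADDR_LEN; i++) {
		int hi = hexval(s[0]);
		int lo = hi < 0 ? -1 : hexval(s[1]);	/* don't read past a NUL */

		if (hi < 0 || lo < 0)
			return -1;
		b[i] = (uint8_t)(hi << 4 | lo);
		s += 2;
		/* octets are separated by ':'; the last one must end the string */
		if (*s != (i < IPOIB_ADDR_LEN - 1 ? ':' : '\0'))
			return -1;
		if (*s)
			s++;
	}
	out->reserved = b[0];
	out->qpn = ((uint32_t)b[1] << 16) | ((uint32_t)b[2] << 8) | b[3];
	for (i = 0; i < 16; i++)
		out->gid[i] = b[4 + i];
	return 0;
}
```

For that sample the QPN is 0x0e0404, and the last 8 GID octets (00:08:f1:04:03:96:05:59) are the port GUID, which also appears, with the universal/local bit flipped, in the link-local address fe80::208:f104:396:559.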
ip addr show dev ib0 19: ib0: mtu 2044 qdisc pfifo_fast qlen 128 link/[32] 00:0e:04:04:fe:80:00:00:00:00:00:00:00:08:f1:04:03:96:05:59 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff inet 192.168.0.1/24 brd 192.168.0.255 scope global ib0 inet6 fe80::208:f104:396:559/64 scope link valid_lft forever preferred_lft forever > Using ifconfig doesn't produce a good results, here is ifconfig > output for machines with GEN2: > > SUSE 9. 3, 2.6.13 > ib0 Link encap:UNSPEC HWaddr > 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00 > inet addr:11.4.8.156 Bcast:11.255.255.255 Mask:255.0.0.0 > inet6 addr: fe80::202:c912:4538:5345/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 > RX packets:729127 errors:0 dropped:0 overruns:0 frame:0 > TX packets:1764771 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:128 > RX bytes:37915221 (36.1 Mb) TX bytes:3522251817 (3359.0 Mb) > > REDHAT 4, 2.6.9 > ib0 Link encap:UNSPEC HWaddr > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 > inet addr:11.4.8.67 Bcast:11.4.255.255 Mask:255.255.0.0 > UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:128 > RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) > > > Thanks, > Ali Ayoub > Mellanox Technologies LTD > Tel: +972-4-9097200 Ext: 251 > Cell: +972-54-5245673 > > > > ______________________________________________________________________ > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Tue Sep 20 04:53:14 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 07:53:14 -0400 Subject: [openib-general] Re: RMPP Message Format Errors In-Reply-To: <432FE881.2080007@mellanox.co.il> References: 
<1127212695.24173.6501.camel@hal.voltaire.com> <432FE881.2080007@mellanox.co.il> Message-ID: <1127217006.24173.7150.camel@hal.voltaire.com> On Tue, 2005-09-20 at 06:46, Eitan Zahavi wrote: > Hal Rosenstock wrote: > > On Tue, 2005-09-20 at 06:33, Eitan Zahavi wrote: > > > >>>Is this what you are referring to ? > >> > >>Yes the line of interest is: > >>__osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16) > >>This shows 16byte extra in the data size. > > > > > > Should it be 20 for the SA class header size or 0 here ? > Should be 0. Means an the packet size should accommodate an integer number of > SA records (after removing the headers size). OK. There's a problem or problems on the receive side (of RMPP) to look into but these appear OK for SA client right now. > >>>I do also see: > >>>Sep 20 05:16:40 995667 [AB001140] -> osmt_get_service_by_name: ERR > > > > 0370: > > > >>>ib_query failed (IB_REMOTE_ERROR). > >>>Sep 20 05:16:40 995673 [AB001140] -> osmt_get_service_by_name: > > > > Remote > > > >>>error = IB_SA_MAD_STATUS_NO_RECORDS. > >>>Sep 20 05:16:40 995678 [AB001140] -> osmt_get_service_by_name: > > > > Expected > > > >>>num of records is : 1, Found number of records : 0 > >> > >>The full osmtest flow has some intentional errors injected. > >>If it provides the "PASSED" message at the end it means that > >>the errors were intentional and expected. > >> > >>Some of the flows have a special output message that wraps the > >>errors in a section like: > >> > >>"vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv" > >>... > >>"^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^" > >> > >>We probably need to apply this convention to all the "bad flows". > > > > > > Then is expected number of records in this test 1 rather than 0 ? Will these be fixed ? Are these issues being documented along with other ones previously noted ? 
-- Hal From guyg at voltaire.com Tue Sep 20 05:04:45 2005 From: guyg at voltaire.com (Guy German) Date: Tue, 20 Sep 2005 15:04:45 +0300 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <432EFC74.6050105@ichips.intel.com> References: <432EFC74.6050105@ichips.intel.com> Message-ID: <432FFADD.2060002@voltaire.com> Hi Sean, Thanks a lot for all your comments. I think that adding cma_create and forcing the use of cma_destroy, as you suggested, would take care of many of the comments below. Sean Hefty wrote: >> #define CMA_TARGET_MAX 4 >> #define CMA_INITIATOR_DEPTH 4 >> #define CMA_RC_RETRY_COUNT 7 >> #define CMA_RNR_RETRY_COUNT 6 >> #define CMA_CM_RESPONSE_TIMEOUT 20 /* 4 sec */ >> #define CMA_MAX_CM_RETRIES 0 > > Are these values hard-coded just for the initial implementation? How > would these change? I thought the consumer would pass some of them in the qp_attr that is in struct ib_cma_conn; other parameters can be added to the struct, if needed. >> enum cma_close_flags { >> CMA_CLOSE_ABRUPT = 0, >> CMA_CLOSE_GRACEFUL >> }; > > Not sure what these are for. Why not have the user always destroy the > cma_id? If it hasn't yet been destroyed when a disconnect comes in, > callback the user. If a connection hasn't been disconnected when it is > destroyed, automatically send a disconnect message. OK. >> struct accept_callback { >> ib_cma_ac_handler func; >> void *context; >> }; >> >> struct listen_callback { >> ib_cma_listen_handler func; >> void *context; >> }; > > > These could be eliminated if we just associated a context with a cma_id > and left it at that. I think we're asking for race conditions if we try > to update the context with every callback. I don't think I update the context more than once (unless the consumer calls cma_accept twice or something...). It is there for convenience, to distinguish between a context held for an accept callback and one held for a listen callback.
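The `CMA_CM_RESPONSE_TIMEOUT 20 /* 4 sec */` define quoted above, and Roland's earlier remark that `qp_attr->timeout` is already on the IB logarithmic scale, both refer to the same encoding: a 5-bit exponent n meaning 4.096 µs × 2^n. For n = 20 that is 2^32 ns, about 4.29 s, which matches the comment. A minimal conversion sketch (function names are hypothetical); note that, as we understand the spec, a QP Local ACK Timeout of 0 means "infinite", so a caller filling that particular field should treat 0 specially.

```c
#include <assert.h>
#include <stdint.h>

/* IB timeout encoding: 4.096 us * 2^n, i.e. 4096 ns shifted left by n. */
uint64_t ib_timeout_to_ns(uint8_t n)
{
	assert(n < 32);		/* the field is 5 bits wide */
	return 4096ULL << n;
}

/* Smallest exponent whose timeout is at least ns, clamped to 31. */
uint8_t ns_to_ib_timeout(uint64_t ns)
{
	uint8_t n = 0;

	while (n < 31 && ib_timeout_to_ns(n) < ns)
		n++;
	return n;
}
```

Rounding up (rather than to the nearest exponent) means the encoded timeout is never shorter than the one requested.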
>> static int cma_modify_qp_state(struct ib_cm_id *cm_id, struct ib_qp *qp, >> enum ib_qp_state qp_state, >> int qp_attr_mask) >> { >> struct ib_qp_attr qp_attr; >> int status = 0; >> printk(KERN_DEBUG PFX "%s: enter >>> modify to %d\n", >> __func__, qp_state); >> >> if (qp == NULL) >> return -EINVAL; > We shouldn't need checks like this in the kernel. OK >> memset(&qp_attr, 0, sizeof qp_attr); >> qp_attr.qp_state = qp_state; >> >> if (cm_id && !qp_attr_mask) > Or this check... This check we do need, because: - when we modify the qp state to RTR or RTS, cm_id is valid and qp_attr_mask==0, so we need to call ib_cm_init_qp_attr - when we modify the qp state to ERROR, cm_id==0 and qp_attr_mask is valid >> static int destroy_cma_ctx(struct cma_context *cma_ctx) >> { >> if(!IS_ERR(cma_ctx->cm_id)) >> ib_destroy_cm_id(cma_ctx->cm_id); >> if (cma_ctx->cma_param.private_data) >> kfree(cma_ctx->cma_param.private_data); > > > Is this the outbound private data or inbound? Why not tie the private > data to an event and avoid storing it with the cma_ctx? It is the private data passed by the consumer in the connection request. I stored it in cma_ctx to retrieve it in cma_path_handler. For some reason the private data contained garbage when I passed the consumer's private_data pointer directly (maybe it was my improvised test module)... > >> if (cma_ctx) >> kfree(cma_ctx); >> >> cma_ctx = NULL; > > > Is there a reason to set this to NULL? > >> return 0; >> } The reason was to check cma_ctx in cma_disconnect, so as not to free it twice. After the change that forces a cma_destroy call before disconnect, this would no longer be needed.
What would the user do if it does? I will change it to void. > >> else if (cflags == CMA_CLOSE_GRACEFUL){ >> status = ib_send_cm_dreq(cma_ctx->cm_id, NULL, 0); >> } > > > See comments above. Eliminate the GRACEFUL/ABRUPT flags and just let > the user either issue the disconnect or just destroy the cma_ctx. Agreed. >> modqp: >> status = cma_modify_qp_state(0, qp, IB_QPS_ERR, IB_QP_STATE); >> return status; >> } >> >> void cma_connection_callback(struct cma_context *cma_ctx, >> const enum ib_cma_event event, >> const void *private_data) >> { >> ib_cma_event_handler conn_cb; >> struct ib_qp *qp = cma_ctx->cma_conn.qp; >> int status; >> >> conn_cb = cma_ctx->cma_conn.cma_event_handler; >> >> switch (event) { >> case IB_CMA_EVENT_ESTABLISHED: >> break; >> case IB_CMA_EVENT_DISCONNECTED: >> case IB_CMA_EVENT_REJECTED: >> case IB_CMA_EVENT_UNREACHABLE: >> case IB_CMA_EVENT_NON_PEER_REJECTED: >> status = cma_disconnect(qp, cma_ctx, CMA_CLOSE_ABRUPT); > > > This is destroying the cma_ctx without the user knowing it. The > dereference to cma_ctx below will crash. We shouldn't take any action > on behalf of the user. Simply report the error and let the user destroy > the cma_id. This is the same create/destroy issue. I will remove the call.
>> int cma_active_cb_handler(struct ib_cm_id *cm_id, struct ib_cm_event >> *event) >> { >> int status = 0; >> enum ib_cma_event cma_event = 0; >> struct cma_context *cma_ctx = cm_id->context; >> >> printk(KERN_DEBUG PFX "%s: enter >>> cm_id=%p >> cma_ctx=%p\n",__func__, cm_id, cma_ctx); >> >> switch (event->event) { >> case IB_CM_REQ_ERROR: >> cma_event = IB_CMA_EVENT_UNREACHABLE; >> break; >> case IB_CM_REJ_RECEIVED: >> cma_event = IB_CMA_EVENT_NON_PEER_REJECTED; >> break; >> case IB_CM_DREP_RECEIVED: >> case IB_CM_TIMEWAIT_EXIT: >> cma_event = IB_CMA_EVENT_DISCONNECTED; >> break; >> case IB_CM_REP_RECEIVED: >> status = cma_rep_recv(cma_ctx, event); >> if (!status) >> cma_event = IB_CMA_EVENT_ESTABLISHED; >> else >> cma_event = IB_CMA_EVENT_DISCONNECTED; >> break; >> case IB_CM_DREQ_RECEIVED: >> ib_send_cm_drep(cm_id, NULL, 0); >> cma_event = IB_CMA_EVENT_DISCONNECTED; >> break; >> case IB_CM_DREQ_ERROR: >> break; >> default: >> printk(KERN_WARNING PFX "%s: cm event (%d) not handled\n", >> __func__, event->event); >> break; >> } >> >> printk(KERN_WARNING PFX "%s: cm_event=%d cma_event=%d\n", >> __func__, event->event, cma_event); >> >> if (cma_event) > > This check isn't needed. OK >> cma_connection_callback(cma_ctx, cma_event, >> event->private_data); >> >> return status; > > Returning non-zero will destroy the underlying cm_id. Interesting. I didn't know that. > We can avoid some > synchronization by letting it exist until the user destroys the > corresponding cma_id. Otherwise, there's the potential of the user > trying to destroy it twice. Once from the cma_connection_callback > reporting an error, and then again here. So you are suggesting this function will always return 0, then?
>> } >> >> static struct cma_context *get_cma_ctx(struct ib_cm_id *cm_id, >> struct ib_cm_event *event) >> { >> struct cma_context *new_cma_ctx; >> int status; >> >> >> if (event->event != IB_CM_REQ_RECEIVED) >> return cm_id->context; >> >> if (((struct cma_context *)cm_id->context)->cm_id != cm_id) >> printk(KERN_DEBUG PFX "%s: old_cm_id=%p new_cm_id=%p\n", >> __func__, ((struct cma_context *)cm_id->context)->cm_id, >> cm_id); > > > This check shouldn't be needed. It is just for debug purposes - I will remove it later on. >> new_cma_ctx = kmalloc(sizeof *new_cma_ctx, GFP_KERNEL); >> if (!new_cma_ctx) { >> status = ib_send_cm_rej(cm_id, >> IB_CM_REJ_CONSUMER_DEFINED, >> NULL, 0, NULL, 0); >> return NULL; >> } >> >> memset(new_cma_ctx, 0, sizeof *new_cma_ctx); >> new_cma_ctx->cm_id = cm_id; >> new_cma_ctx->creq_cma_ctx = cm_id->context; >> cm_id->context = new_cma_ctx; >> >> return new_cma_ctx; >> } >> >> static void cma_path_handler(u64 req_id, void *context, int rec_num) >> { >> struct cma_context *cma_ctx = context; >> enum ib_cma_event event; >> int status = 0; >> >> if (!cma_ctx) { > > This check isn't needed. What if the consumer destroyed the cma_id, before the path handler cb returned ? >> printk(KERN_ERR PFX "%s: context received null\n",__func__); >> return; >> int ib_cma_create_qp(struct ib_pd *pd, u8 port, struct ib_qp **qp_in, >> struct ib_qp_init_attr *init_attr) >> { > > Why not return struct ib_qp* similar to how the other APIs operate? I thought of returning ib_modify_qp status, but I agree that destroying the qp if failed and just returning the qp if not, is better. 
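Below is a sketch of the shape agreed on for ib_cma_create_qp. It returns either the QP or an encoded error, and it destroys the QP if the transition to INIT fails, so the caller never holds a half-configured QP. Note also that `return IS_ERR(qp);` in the quoted code yields 1 rather than a negative errno. The kernel idiom for extracting the error is `PTR_ERR(qp)`. The helpers below are userspace re-implementations modelled on current kernels' `include/linux/err.h`. The `fake_*` functions stand in for `ib_create_qp`/`ib_modify_qp`/`ib_destroy_qp` and are assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace versions of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical QP and verbs stand-ins. */
struct fake_qp { int state; };
static bool modify_should_fail;
static int destroyed_qps;

static struct fake_qp *fake_create_qp(void)
{
	struct fake_qp *qp = calloc(1, sizeof *qp);

	return qp ? qp : ERR_PTR(-ENOMEM);
}

static int fake_modify_qp_to_init(struct fake_qp *qp)
{
	if (modify_should_fail)
		return -EINVAL;
	qp->state = 1;		/* "INIT" */
	return 0;
}

static void fake_destroy_qp(struct fake_qp *qp)
{
	free(qp);
	destroyed_qps++;
}

/* Return a QP in INIT, or an ERR_PTR; never a half-set-up QP. */
static struct fake_qp *create_qp_in_init(void)
{
	struct fake_qp *qp = fake_create_qp();
	int ret;

	if (IS_ERR(qp))
		return qp;	/* propagate the encoded errno unchanged */

	ret = fake_modify_qp_to_init(qp);
	if (ret) {
		fake_destroy_qp(qp);	/* undo the create on failure */
		return ERR_PTR(ret);
	}
	return qp;
}
```

A caller then checks `IS_ERR(qp)` and, on failure, returns `PTR_ERR(qp)`. There is no out-parameter whose validity on error the caller has to guess.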
>> struct ib_qp_attr qp_attr; >> int qp_attr_mask; >> struct ib_qp *qp; >> >> qp = ib_create_qp(pd, init_attr); >> if (IS_ERR(qp)) >> return IS_ERR(qp); >> *qp_in = qp; >> printk(KERN_DEBUG PFX "%s: qp created (%p)\n",__func__, qp); >> memset(&qp_attr, 0, sizeof qp_attr); >> >> qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS | >> IB_QP_PKEY_INDEX | IB_QP_PORT; >> >> qp_attr.qp_access_flags = IB_ACCESS_REMOTE_READ | >> IB_ACCESS_REMOTE_WRITE; >> qp_attr.qp_state = IB_QPS_INIT; >> qp_attr.pkey_index = 0; >> qp_attr.port_num = port; >> printk(KERN_DEBUG PFX "%s: call ib_modify_qp (qp_attr_mask=%x)\n", >> __func__, qp_attr_mask); >> >> return ib_modify_qp(qp, &qp_attr, qp_attr_mask); > > > If modify fails, the QP should be destroyed. OK. see above. >> int ib_cma_reject(void *cma_id, const void *private_data, >> u8 private_data_len) >> { >> struct cma_context *cma_ctx = cma_id; >> int status; >> >> if (cma_ctx == NULL) > > This check isn't needed. OK > As a general statement, can we reduce the number of debug print statements? Sure. This is mainly for bring up purposes, they can be removed after the code stabilizes. Thanks again for your detailed feedback Guy. From IBMEHCAD at de.ibm.com Tue Sep 20 05:48:14 2005 From: IBMEHCAD at de.ibm.com (IBMEHCA DD) Date: Tue, 20 Sep 2005 14:48:14 +0200 Subject: [openib-general] IBM eHCA Device Driver for gen2 IB stack In-Reply-To: <52y87uk47r.fsf@topspin.com> Message-ID: we released a https://sourceforge.net/projects/ibmehcad/ ehca2_0025 today which adresses most of these comments, and which survives most of pallas Roland Dreier wrote on 25.07.2005 22:42:16: > IBMEHCA> Hi, we've completed the first alpha code drop of the > IBMEHCA> Power5 IBM eHCA Device Driver for the for the gen2 > IBMEHCA> openib.org stack. We're running IPoIB and ibv userspace > IBMEHCA> programs successfully with this code in our lab setup. 
> > IBMEHCA> The source files can be downloaded from > IBMEHCA> https://sourceforge.net/projects/ibmehcad/ ehca2_0011e > > Thanks for posting this. A few comments from a first read through. > These are in no particular order, with minor nitpicks mixed in with > more series problems. Some more review is still needed before this > code should go upstream. > ehca (kernel driver): > - Please use /* */ instead of // for comments done > - Can the debugging stuff be less ugly than EDEB_AV_EN() etc? > Also the formatting of your goto labels is not very nice. Instead > of something like > > EHCA_REG_SMR_EXIT0: > if (retcode == 0) > > I'd rather see > out: > if (retcode == 0) we changed the goto statements to lower case, but didn't remove the numbering yet. The numbering actually is sort of a coding pattern to detect cleanup bugs. The idea behind it is that resources must be freed in the inverse order they where created, so a void * pd=malloc(4096) if (0 == pd) { goto error_out_dealloc_pd } ... void * cq=malloc(4096) if (0 == cq) { goto error_out_dealloc_cq } .... if (register_cq_fails) { goto error_out_dealloc_cq } ... void * qp=malloc(4096) if (0 == qp) { goto error_out_dealloc_qp } goto out; error_out_dealloc_qp free(qp); error_out_dealloc_cq free(cq); error_out_dealloc_pd free(pd); out: return would be more error prone than using numering. The basic idea is as soon as you allocate/register a resource (memory, HCA resources, kobject...) increase the goto statement number by one free in opposite order as allocated, never skip a number during the initial checks ...and if you see sth like if (...) goto out_0 if (...) goto out_2 if (...) goto out_1 it's very obviously wrong. > > - In ehca_classes.c, it would be better to avoid using vmalloc() to > allocate your structures. done > > - In ehca_common.h, all the stuff copied from from hvcall.h should be > deleted. Also, there are some functions that seem to use ugly > return codes like H_Success for no good reason. 
done (mostly, some of these defines are IB specific) > > - The definitions of p_to_u64() etc. seem unnecesary and needlessly > obfuscated. You can always do > > (unsigned long) ptr > to cast a void * to u64, and > (void *) (unsigned long) val done > - Similarly in ehca_qp.c, > #define QP_ATTR_IS_SET(mask,attr) (((mask)&(attr))!=0) > is just needless obfuscation: just do the & directly where you need it. done > - In ehca_qp.c, is there a way we can take modqp_statetrans_table[] > into common code and share with mthca_qp.c? > well, maybe, modify QP is very complicated to get straight, but if somebody from mthca_qp.c thinks these are the right set of transitions (this state machine base algorithm over the last 2 years) > - It seems many source files include ehca_classes_pSeries.h > directly. They should probably include ehca_classes.h. Also > ehca_classes_zSeries.h seems to be missing. done we currently don't have a ehca_classes_zSeries.h > > Instead of defining P_SERIES on your command line, I think you > should just include and test CONFIG_PPC_PSERIES > and CONFIG_ARCH_S390 instead. > > - In ehca_main.c, ehca_hca_resources_show() violates the "one value > per file" rule for sysfs. Either put this in debugfs or split it > up into separate attributes in a subdirectory. done > > - What is ehca_register_pci() doing? Why are you changing the main > kernel's pci_dma_ops?? Shouldn't you be creating a device on your > own virtual bus with it's own dma ops? > done, see ebus we're currently trying to get that integrated into base ppc64 kernel > - It seems ehca_nopage() accesses ehca_idr without taking the > ehca_idr_sem, is this OK? > good, point! done > libehca: > - It would probably be a good idea to use GNU autotools to build > libehca. In particular you want to make the destination directory > configurable. done (partly) this is the initial version of autotools build, still learning how to use autotools in a better way... 
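The numbered-label convention Christoph describes above can be made concrete. In this sketch the names are hypothetical, and `test_alloc` is a fault-injecting malloc wrapper added only so the unwinding can be checked. Label `out_N` is reached with N resources held and frees them in reverse allocation order. A jump to the wrong number therefore stands out at a glance.

```c
#include <assert.h>
#include <stdlib.h>

/* Fault injection for the sketch: make the Nth allocation fail (0 = never). */
static int fail_at, alloc_calls, live_allocs;

static void *test_alloc(size_t n)
{
	if (++alloc_calls == fail_at)
		return NULL;
	void *p = malloc(n);
	if (p)
		live_allocs++;
	return p;
}

static void test_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

struct res { void *pd, *cq, *qp; };

/* Each failure jumps to the label numbered by how many resources are
 * already held; the labels fall through, freeing in reverse order. */
static int setup(struct res *r)
{
	r->pd = test_alloc(64);
	if (!r->pd)
		goto out_0;
	r->cq = test_alloc(64);
	if (!r->cq)
		goto out_1;
	r->qp = test_alloc(64);
	if (!r->qp)
		goto out_2;
	return 0;

out_2:
	test_free(r->cq);
out_1:
	test_free(r->pd);
out_0:
	return -1;
}

static void teardown(struct res *r)
{
	test_free(r->qp);
	test_free(r->cq);
	test_free(r->pd);
}
```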
> > - I don't think you should hard code -m64 in your CFLAGS or put > /usr/local/lib64 anywhere -- I believe nearly everyone running > ppc64 is using 32-bit userspace. Have you tested 32-bit userspace > on a 64-bit kernel? 32 bit userspace will be supported in one of the next versions. we're running mvapich as 64 bit library > > - I don't think the install should unconditionally delete > /dev/infiniband and then create the device nodes -- this should > really be handled by either udev or the distribution. > removed that from standard build, still trying to figure out how to configure udev to automatically add the infiniband device nodes > - In ehca_reqs_core.c, get rid of all the > #ifndef __KERNEL__ > #define ib_recv_wr ibv_recv_wr > #define ib_send_wr ibv_send_wr > //#define ib_ah ibv_ah > #define ehca_av ehcau_av > #define unlikely(x) x > // ib_wr_opcode > #define IB_WR_SEND IBV_WR_SEND > #define IB_WR_SEND_WITH_IMM IBV_WR_SEND_WITH_IMM > #define IB_WR_RDMA_WRITE IBV_WR_RDMA_WRITE > #define IB_WR_RDMA_WRITE_WITH_IMM IBV_WR_RDMA_WRITE_WITH_IMM > > and so on. It's not a good idea to try and use the same source for > code that runs in the kernel and code that runs in userspace. technically the kernel version looks like it's a kernel only module... > Splitting your source also gets rid of stuff like > > #ifdef EHCA_USERDRIVER > u8 *data = (u8 *) sge->addr; > #else > u8 *data = (u8 *) abs_to_virt(sge->addr); > #endif EHCA_USERDRIVER is the rest of a userspace sanbox to run the kernel code in userspace, as done for parts of the scsi stack for example. we'll try to remove that flag on the next release > - In ehca_uinit.c, is the LIBEHCA_FILE environment variable handling > safe? It seems that a user could trick a setuid process into doing > something bad here. > good point! 
now picking up the place where to trace in a config file in /etc > - In ehca_umain.c there seems to be some weird indentation: > my_cq->ehca_cq_core.galpas.kernel.fw_handle = (u64) > mmap(NULL, 4096, PROT_WRITE, MAP_SHARED, context->cmd_fd, > ((u64) (my_cq->token) << 32) | 0x11000000); hmm, changed from bad to not completely good. Will fix in the next version. Hope we didn't add some more of those. > - Does it make sense to encapsulate the use of ppc64-specific assembly > stuff like: > > __asm__ __volatile("dcbz 0,%0"::"r"(adr)); > __asm__ __volatile("dclst 0,%0"::"r"(adr)); > __asm__ __volatile__("sync":::"memory"); // serialize GAL register access done > > - In ehca_utools.h, ehca_swapbl32 and ehca_swapbl16 seem kind of useless. > Why not just use ntohl/ntohs directly? done > > Thanks, > Roland Thanks for your input, I'm pretty sure this only was the start to get that code kernel ready. Christoph From guyg at voltaire.com Tue Sep 20 06:35:13 2005 From: guyg at voltaire.com (Guy German) Date: Tue, 20 Sep 2005 16:35:13 +0300 Subject: [openib-general][RFC][CMA]: ib_cma_get_device hot unplug issue Message-ID: <43301011.1000200@voltaire.com> Hi, I'm sorry for bringing it up again, but I don't understand yet why a cma consumer is different than any other verbs consumer (who needs to synchronize between a device removal cb and device verbs calls). I understand that when returning from ib_cma_get_device the device can be no longer valid, but if the consumer is aware of that and makes sure it is (checks his devices lookup table after return), can't he just be considered as a regular device client, from that point on ? If this is not a valid approach, what is your suggestion for the issue ?
Thanks, Guy From halr at voltaire.com Tue Sep 20 06:42:08 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 09:42:08 -0400 Subject: [openib-general] Re: Opensm - casting issues #2 In-Reply-To: <5zfys9tjwl.fsf@mtl066.yok.mtl.com> References: <5zfys9tjwl.fsf@mtl066.yok.mtl.com> Message-ID: <1127223727.5206.0.camel@hal.voltaire.com> On Tue, 2005-09-13 at 05:08, Yael Kalka wrote: > Attached is a patch to fix some casting issues in ib_types.h. > In Linux it compiles fine, but under Windows I get compilation errors due to the problem. Thanks. Applied. From halr at voltaire.com Tue Sep 20 07:02:42 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 10:02:42 -0400 Subject: [openib-general] OpenSM 1.1.0 and Multicast Registration Message-ID: <1127224962.5206.21.camel@hal.voltaire.com> Hi, I have observed 2 cases of issues with multicast reregistration with OpenSM 1.1.0: 1. In the case where the initial sweep is not completed, when Sep 14 14:14:50 964956 [B5EB7AC0] -> __osm_sa_mad_ctrl_rcv_callback: 9 QP1 MADs received. Sep 14 14:14:50 998694 [B5EB7AC0] -> __osm_sa_mad_ctrl_rcv_callback: Received an SA mad while SM in first sweep. Mad ignored. It seems to me that OpenSM shouldn't send client reregister until after first sweep as it is not prepared to handle any SA requests until then. 2. Delete and then create same group: delete is responded to but create isn't. (Prior to "[PATCH] IPoIB: Fix SA client retransmission strategy" which paves over this). -- Hal From johann at pathscale.com Tue Sep 20 08:06:24 2005 From: johann at pathscale.com (Johann George) Date: Tue, 20 Sep 2005 08:06:24 -0700 Subject: [openib-general] PathScale InfiniPath In-Reply-To: References: Message-ID: <20050920150624.GA6886@cuprite.internal.keyresearch.com> > I am just wondering if OpenIB stack supports InfiniPath adapter. Is there > any HCA driver available for InfiniPath? If not, is PathScale working on > it? OpenIB does run on the InfiniPath Adapter. 
We are hoping to release code to the repository in the near future that will allow you to run OpenIB on InfiniPath. Johann From jackm at mellanox.co.il Tue Sep 20 08:25:17 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Tue, 20 Sep 2005 18:25:17 +0300 Subject: [openib-general] oops on module teardown (was Re: recursion depth exceeded in ipoi b_workqueue ) Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E319A404@mtlexch01.mtl.com> I tested out your recursion patch on SVN 3487, and it works. However, while testing it out, I got the kernel Oops described below (while unloading the driver). Looks like a race condition (Note that this is in the send-timeout flow) . >From disassembly of ib_ipoib.ko (no line-debug info unfortunately), failure is at address 5360: 534c: 48 89 95 b0 00 00 00 mov %rdx,0xb0(%rbp) 5353: f0 ff 0d 00 00 00 00 lock decl 0(%rip) # 535a 535a: 0f 88 d9 03 00 00 js 5739 <.text.lock.ipoib_multicast+0x50> 5360: 41 8b 45 10 mov 0x10(%r13),%eax 5364: a8 20 test $0x20,%al I traced the source code to ipoib_multicast.c:434 ( in ipoib_mcast_join_complete): if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) The dereference failure is in trying to dereference "priv->flags". (dereferencing priv->flags is the code at address 5360). "priv" here is "netdev_priv(dev)", implying that "netdev_priv(dev)" is no longer valid and returns garbage. This garbage gets dereferenced. environment: Host 1 Port 1 connected back-to-back to Host 2 Port 1. Host 1: while date; do /etc/init.d/openibd start ; /etc/init.d/openibd stop ; done Host 2: runs opensm. 
Jack ============================================================================ ==================================== Sep 20 12:05:30 swlab163 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000390 RIP: Sep 20 12:05:30 swlab163 kernel: {:ib_ipoib:ipoib_mcast_join_complete+512} Sep 20 12:05:30 swlab163 kernel: PGD 777d2067 PUD 773ca067 PMD 0 Sep 20 12:05:30 swlab163 kernel: Oops: 0000 [1] SMP Sep 20 12:05:30 swlab163 kernel: CPU 0 Sep 20 12:05:30 swlab163 kernel: Modules linked in: ib_ipoib ib_sa ib_uverbs ib_umad ib_mthca ib_mad ib_core video1394 ohci1394 raw1394 ieee1394 Sep 20 12:05:30 swlab163 kernel: Pid: 11302, comm: ib_mad2 Not tainted 2.6.13 Sep 20 12:05:30 swlab163 kernel: RIP: 0010:[] {:ib_ipoib:ipoib_mcast_join_complete+512} Sep 20 12:05:30 swlab163 kernel: RSP: 0018:ffff810055bc1d38 EFLAGS: 00010247 Sep 20 12:05:30 swlab163 kernel: RAX: 0000000000000000 RBX: ffffffff8807e000 RCX: ffffffff88070e10 Sep 20 12:05:30 swlab163 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8807e000 Sep 20 12:05:30 swlab163 kernel: RBP: ffff810053b10880 R08: ffff810055bc0000 R09: 0000000000000000 Sep 20 12:05:30 swlab163 kernel: R10: 00000000ffffffff R11: ffffffff8055f320 R12: 00000000ffffff92 Sep 20 12:05:30 swlab163 kernel: R13: 0000000000000380 R14: ffff81007e409a78 R15: ffffffff88042bd0 Sep 20 12:05:30 swlab163 kernel: FS: 00002aaaab15db00(0000) GS:ffffffff805d4800(0000) knlGS:0000000056729bb0 Sep 20 12:05:30 swlab163 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Sep 20 12:05:30 swlab163 kernel: CR2: 0000000000000390 CR3: 00000000777d3000 CR4: 00000000000006e0 Sep 20 12:05:30 swlab163 kernel: Process ib_mad2 (pid: 11302, threadinfo ffff810055bc0000, task ffff810054734830) Sep 20 12:05:30 swlab163 kernel: Stack: ffff81007a8324c0 ffff810054734830 ffffffff805dffb0 ffffffff803f8855 Sep 20 12:05:30 swlab163 kernel: ffff810055bc1e58 0000000000000296 ffff810054982f90 00000000ffffff92 Sep 20 12:05:30 swlab163 kernel: 
ffff81007e409a10 ffffffff88070e5c Sep 20 12:05:30 swlab163 kernel: Call Trace:{thread_return+0} {:ib_sa:ib_sa_mcmember_rec_callback+76} Sep 20 12:05:30 swlab163 kernel: {:ib_sa:send_handler+156} {:ib_mad:timeout_sends+382} Sep 20 12:05:30 swlab163 kernel: {__wake_up+67} {worker_thread+478} Sep 20 12:05:30 swlab163 kernel: {default_wake_function+0} {__wake_up_common+67} Sep 20 12:05:30 swlab163 kernel: {default_wake_function+0} {keventd_create_kthread+0} Sep 20 12:05:30 swlab163 kernel: {worker_thread+0} {keventd_create_kthread+0} Sep 20 12:05:30 swlab163 kernel: {kthread+217} {child_rip+8} Sep 20 12:05:30 swlab163 kernel: {keventd_create_kthread+0} {kthread+0} Sep 20 12:05:30 swlab163 kernel: {child_rip+0} Sep 20 12:05:30 swlab163 kernel: Sep 20 12:05:30 swlab163 kernel: Code: 41 8b 45 10 a8 20 74 3e 41 83 fc 92 75 15 48 8b 3d cb 46 00 Sep 20 12:05:30 swlab163 kernel: RIP {:ib_ipoib:ipoib_mcast_join_complete+512} RSP Sep 20 12:05:30 swlab163 kernel: CR2: 0000000000000390 From rolandd at cisco.com Tue Sep 20 08:34:28 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 08:34:28 -0700 Subject: [openib-general][PATCH][RFC]: CMA header In-Reply-To: <432FDB15.10809@voltaire.com> (Guy German's message of "Tue, 20 Sep 2005 12:49:09 +0300") References: <52slw1gfs2.fsf@cisco.com> <432FDB15.10809@voltaire.com> Message-ID: <52mzm7pxbv.fsf@cisco.com> >> There's no point in making this asynchronous, since we're >> putting the source/dest information in the CM REQ private data. Guy> Do we have to wait to an ibta approval to do this or can this Guy> be implemented right away, in this fashion ? Certainly we should coordinate with the IBTA spec, but my feeling is that we reached consensus on using CM private data instead of ATS. Guy> iSER, today, uses sid of 64bit, other ULP's might use 64bit Guy> sid's too, why limit them to 16 bit ? How do you implement 64 bit SIDs over iWARP?
The whole point of this API is to use IP addressing, which means IP address + port number. - R. From guyg at voltaire.com Tue Sep 20 08:43:40 2005 From: guyg at voltaire.com (Guy German) Date: Tue, 20 Sep 2005 18:43:40 +0300 Subject: [openib-general][PATCH][RFC]: CMA header In-Reply-To: <52mzm7pxbv.fsf@cisco.com> References: <52slw1gfs2.fsf@cisco.com> <432FDB15.10809@voltaire.com> <52mzm7pxbv.fsf@cisco.com> Message-ID: <43302E2C.7030803@voltaire.com> Roland Dreier wrote: > Guy> iSER, today, uses sid of 64bit, other ULP's might use 64bit > Guy> sid's too, why limit them to 16 bit ? > > How do you implement 64 bit SIDs over iWARP? The whole point of this > API is to use IP addressing, which means IP address + port number. Right. I will remove service_id and use sin_port, as you suggested. Guy From halr at voltaire.com Tue Sep 20 08:51:16 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 11:51:16 -0400 Subject: [openib-general] Other Outstanding Operational Issues with OpenSM 1.1.0 Message-ID: <1127231476.5206.349.camel@hal.voltaire.com> Hi again, I also see the following operational issues with OpenSM 1.1.0: With an Anafa 2 based switch, I can see several links keep getting bounced (port state changes) by OpenSM. This only occurs when OpenSM is running. As soon as it is killed, this no longer occurs. There are no significant physical errors that could be trigering the LTSM. Any ideas on what is going on here ? [This is the most important of these issues.] In running some SM failover tests, I think there is a minor issue with the SM state machine. It appears that when a new SM comes up with a lower GUID and same priority, it takes over from an already established master. I don't think that is supposed to occur. Also, still outstanding is the SM Set PortInfo from armed to active sometimes doesn't work. This was seen in the trace I sent and could also be seen in Troy's/Brett's osm.log as well. I'm sure we'll see a lot more of this. 
Thanks for your help in chasing these down. -- Hal From mshefty at ichips.intel.com Tue Sep 20 09:05:07 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 09:05:07 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <432FFADD.2060002@voltaire.com> References: <432EFC74.6050105@ichips.intel.com> <432FFADD.2060002@voltaire.com> Message-ID: <43303333.50407@ichips.intel.com> Guy German wrote: >>> memset(&qp_attr, 0, sizeof qp_attr); >>> qp_attr.qp_state = qp_state; >>> >>> if (cm_id && !qp_attr_mask) >> >> Or this check... > > > This check we do need, because: > - when we call modify qp state to RTR or RTS cm_id is valid and > qp_attr_mask==0, so we need to call ib_cm_init_qp_attr > - when we call modify qp state to ERROR cm_id==0 and qp_attr_mask is valid Why is cm_id 0 when modifying the QP to the error state? I need to look back through the code paths... >> Is this the outbound private data or inbound? Why not tie the private >> data to an event and avoid storing it with the cma_ctx? > > It is the private data passed by the consumer in the connection request. > I stored it in cma_ctx to retrieve it back in the cma_path_handler. > For some reason I saw that the private data was garbaged when I passed > the consumer's pointer of private_data (maybe it was my improvised test > module)... With IB it's possible to receive a REQ followed by a REJ, REP followed by a REJ, or a REP followed by a DREQ. (The first two can occur as a result of timeouts.) The private data needs to be handled differently to pass it to the consumer, since there can be multiple events outstanding, each with private data, to a single cma_id. >> We can avoid some synchronization by letting it exist until the user >> destroys the corresponding cma_id. Otherwise, there's the potential >> of the user trying to destroy it twice. Once from the >> cma_connection_callback reporting an error, and then again here. 
> > So you suggests this function will always return 0, then ? I would recommend: If the connection is reported to the user, return 0. If a new connection request cannot be reported to the user (e.g. unable to allocate memory), return a non-zero value to cleanup. >>> static void cma_path_handler(u64 req_id, void *context, int rec_num) >>> { >>> struct cma_context *cma_ctx = context; >>> enum ib_cma_event event; >>> int status = 0; >>> >>> if (!cma_ctx) { >> >> >> This check isn't needed. > > > What if the consumer destroyed the cma_id, before the path handler cb > returned ? We need to add reference counting to handle this. Otherwise there's a race condition where the cma_ctx could be destroyed immediately after the check that will not be detected. >>> int ib_cma_create_qp(struct ib_pd *pd, u8 port, struct ib_qp **qp_in, >>> struct ib_qp_init_attr *init_attr) >>> { >> >> >> Why not return struct ib_qp* similar to how the other APIs operate? > > > I thought of returning ib_modify_qp status, but I agree that destroying > the qp if failed and just returning the qp if not, is better. From the user's perspective, he doesn't know if a QP has been returned or not on a failure, so I think that it does need to change for that reason. - Sean From mshefty at ichips.intel.com Tue Sep 20 09:18:51 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 09:18:51 -0700 Subject: [openib-general][RFC][CMA]: ib_cma_get_device hot unplug issue In-Reply-To: <43301011.1000200@voltaire.com> References: <43301011.1000200@voltaire.com> Message-ID: <4330366B.6020702@ichips.intel.com> Guy German wrote: > I'm sorry for bringing it up again, but I don't understand yet why a cma > consumer is different then any other verbs consumer (who needs to > synchronize between a device removal cb and device verbs calls). The difference is that there isn't a verbs call that returns a pointer to a device. Verbs consumers receive device events through their add/remove device routines. 
The CMA can return a device before the consumer has been notified through their add device call, or after it has been removed by their remove device call. > I understand that when returning from ib_cma_get_device the device can > be no longer valid, but if the consumer is aware of that and makes sure > it is (checks his devices lookup table after return), can't he just be > considered as a regular device client, from that point on ? > > If this is not a valid approach, What is your suggestion for the issue ? Requiring users to validate that a pointer returned from a function is valid seems like a poor API design. Returning a GUID in that case seems like a better approach, so that the user is forced to perform the required lookup. We need to make it easy for users. We could also require users to pass in a device structure as input and let the calls fail if lookup fails. For example, we could add calls to get the IP addresses associated with a particular device port. - Sean From mst at mellanox.co.il Tue Sep 20 09:36:56 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 20 Sep 2005 19:36:56 +0300 Subject: [openib-general] memory leak? Message-ID: <20050920163656.GA6084@mellanox.co.il> ----- Forwarded message from Leonid Keller ----- Look at the end of mthca_qp.c. Why mthca_init_qp_table() performs mthca_array_cleanup(&dev->qp_table.qp, dev->limits.num_qps); on its error cleanup, while mthca_cleanup_qp_table() - not ? ----- End forwarded message ----- Indeed. Roland, does the following make sense (untested, I'm away from the lab now)? Or am I missing something? --- Clean up qp table array on device exit. Signed-off-by: Michael S. 
Tsirkin Index: linux-kernel/drivers/infiniband/hw/mthca/mthca_qp.c =================================================================== --- linux-kernel.orig/drivers/infiniband/hw/mthca/mthca_qp.c 2005-09-20 19:31:44.000000000 +0300 +++ linux-kernel/drivers/infiniband/hw/mthca/mthca_qp.c 2005-09-20 19:32:00.000000000 +0300 @@ -2123,5 +2123,6 @@ void __devexit mthca_cleanup_qp_table(st for (i = 0; i < 2; ++i) mthca_CONF_SPECIAL_QP(dev, i, 0, &status); + mthca_array_cleanup(&dev->qp_table.qp, dev->limits.num_qps); mthca_alloc_cleanup(&dev->qp_table.alloc); } -- MST From bardov at gmail.com Tue Sep 20 09:36:26 2005 From: bardov at gmail.com (Dan Bar Dov) Date: Tue, 20 Sep 2005 19:36:26 +0300 Subject: [openib-general] [PATCH] iSER - changes in API, socket-based connect In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F0005886E48@orsmsx408> References: <1AC79F16F5C5284499BB9591B33D6F0005886E48@orsmsx408> Message-ID: Woody hi, We don't have a backport to 2.6.9. I guess there're some API changes. Are there more issues besides the struct proto? Could you send the compile output? Dan On 9/15/05, Woodruff, Robert J wrote: > Dan wrote, > > Files added: iser_socket.c, iser_socket.h > > 3. Some cosmetic changes included, too. > > Files deleted: iser_pdu.c, include/iser_types.h, include/iser_pdu.h > > Some leftovers from the deleted files in include/*.h moved > > into include/iser_api.h. > > > > I am trying to backbort the iSer (svn 3432) to 2.6.9 and I am running > into > issues with it compiling things like > > static struct proto iser_sock_proto = { > name: "ib_iser", > owner: THIS_MODULE, > obj_size: sizeof(struct iser_sock), > }; > > Would you happen to have a backport patch for this file that > allows it to work on 2.6.9 kernels ? 
> > woody From jlentini at netapp.com Tue Sep 20 09:47:48 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 20 Sep 2005 12:47:48 -0400 (EDT) Subject: [openib-general] Re: [PATCH] uDAPL, support for ib_cm_init_qp_attr and new cm event model In-Reply-To: References: Message-ID: On Fri, 16 Sep 2005, Arlin Davis wrote: > Here are some changes to support ib_cm_init_qp_attr() and the cm > event processing on a per device basis. > > Also, added copyright credits for kDAPL cm work that was used in > uDAPL. Committed in revision 3493. From mshefty at ichips.intel.com Tue Sep 20 09:51:44 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 09:51:44 -0700 Subject: (SPAM?) Re: [openib-general][PATCH][RFC]: CMA header In-Reply-To: <432FDAEF.9080901@voltaire.com> References: <432FDAEF.9080901@voltaire.com> Message-ID: <43303E20.5050509@ichips.intel.com> Guy German wrote: >>> typedef void (*ib_cma_event_handler)(enum ib_cma_event event, void >>> *context, const void *private_data); >>> typedef void (*ib_cma_listen_handler)(void *cma_id, struct ib_device >>> *device, void *private_data, void *context); >> >> I think we can merge these two handlers. We do not want to pass back >> struct ib_device* to a caller. The device needs to be associated with >> the cma_id up front. > > The listen handler passes also the new cma_id. Do you think it is better > to merge them and pass null in this field in the "active side" case? (or > maybe pass the same cma_id back in the handler)... It seems like something like: ib_cma_event_handler(struct ib_cma_id*, struct ib_cma_event event); struct ib_cma_event { enum ib_cma_event event; /* per event needed data, if any... */ struct ib_cma_id *listen_id; void *private_data; int private_data_len; /* needed? */ } would work for any case. I don't know that we need to distinguish between active and passive sides once a connection is established, and if we want to support peer to peer connections at some point in time. 
It's just not clear to me that once a connection has been established, if the listen handler is still invoked for the new cma_id (e.g. disconnect), or is the cma_event_handler invoked? - Sean From caitlinb at broadcom.com Tue Sep 20 09:53:58 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Tue, 20 Sep 2005 09:53:58 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation Message-ID: <54AD0F12E08D1541B826BE97C98F99F1F6E3@NT-SJCA-0751.brcm.ad.broadcom.com> > > >> We can avoid some synchronization by letting it exist > until the user > >> destroys the corresponding cma_id. Otherwise, there's the > potential > >> of the user trying to destroy it twice. Once from the > >> cma_connection_callback reporting an error, and then again here. > > > > So you suggests this function will always return 0, then ? > > I would recommend: If the connection is reported to the > user, return 0. If a new connection request cannot be > reported to the user (e.g. unable to allocate memory), return > a non-zero value to cleanup. > Definitely, but put the emphasis on "cannot be reported to the user". The reason could be "unable to allocate memory" or any form of throttling on the maximum number of pending connection requests. I'm not sure that the device SHOULD forward all connection requests to the ULP without limit until running out of allocatable memory. 
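Caitlin's point about bounding pending requests fits the return-value policy above: a refused request is one that "cannot be reported to the user". Below is a minimal sketch with a hypothetical per-listener counter, which is not an existing CMA field. The REQ path returns a negative errno once the backlog is full. Under the CM convention discussed earlier in the thread, that tells the CM to clean up the new id.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical listener state; not part of any real CMA API. */
struct listener {
	unsigned int pending;	/* reported to the ULP, not yet accepted/rejected */
	unsigned int backlog;	/* cap chosen by the ULP */
};

/* Returns 0 if the request was queued for the ULP, or a negative errno
 * meaning "never reported; the caller should clean the new id up". */
static int listener_conn_req(struct listener *l)
{
	if (l->pending >= l->backlog)
		return -EAGAIN;
	l->pending++;
	return 0;
}

/* Called when the ULP accepts or rejects one of its pending requests. */
static void listener_conn_done(struct listener *l)
{
	assert(l->pending > 0);
	l->pending--;
}
```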
From mshefty at ichips.intel.com Tue Sep 20 10:00:01 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 10:00:01 -0700 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <52ek7rxljj.fsf@cisco.com> References: <52ek7rxljj.fsf@cisco.com> Message-ID: <43304011.6060904@ichips.intel.com> Roland Dreier wrote: > This patch does a few things: > - Adds node_guid and node_desc fields to struct ib_device > - Has mthca set these fields on startup > - Extends modify_device method to handle setting node_desc > - Exposes node_desc in sysfs > - Allows userspace to set node_desc by writing into sysfs file, eg. > echo -n `hostname` >> /sys/class/infiniband/mthca0/node_desc > > This should probably be combined with Sean's work to get rid of > node_guid queries in ULPs. > > Comments? Are there any objections to separating the node_guid changes and commit that portion? - Sean From rolandd at cisco.com Tue Sep 20 10:14:33 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 10:14:33 -0700 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <43304011.6060904@ichips.intel.com> (Sean Hefty's message of "Tue, 20 Sep 2005 10:00:01 -0700") References: <52ek7rxljj.fsf@cisco.com> <43304011.6060904@ichips.intel.com> Message-ID: <52ek7jpsp2.fsf@cisco.com> Sean> Are there any objections to separating the node_guid changes Sean> and commit that portion? Seems like a good idea to me. - R. 
From halr at voltaire.com Tue Sep 20 10:10:14 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 13:10:14 -0400 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <43304011.6060904@ichips.intel.com> References: <52ek7rxljj.fsf@cisco.com> <43304011.6060904@ichips.intel.com> Message-ID: <1127236213.5206.435.camel@hal.voltaire.com> On Tue, 2005-09-20 at 13:00, Sean Hefty wrote: > Roland Dreier wrote: > > This patch does a few things: > > - Adds node_guid and node_desc fields to struct ib_device > > - Has mthca set these fields on startup > > - Extends modify_device method to handle setting node_desc > > - Exposes node_desc in sysfs > > - Allows userspace to set node_desc by writing into sysfs file, eg. > > echo -n `hostname` >> /sys/class/infiniband/mthca0/node_desc > > > > This should probably be combined with Sean's work to get rid of > > node_guid queries in ULPs. > > > > Comments? > > Are there any objections to separating the node_guid changes and commit that > portion? Are you referring to the changes for: Adds node_guid and node_desc fields to struct ib_device At this point, I forget why node_guid needs to be moved from ib_device_attr to ib_device. -- Hal From ebiederm at xmission.com Tue Sep 20 10:17:14 2005 From: ebiederm at xmission.com (Eric W. Biederman) Date: Tue, 20 Sep 2005 11:17:14 -0600 Subject: [openib-general] [PATCH] af_packet: Allow for > 8 byte hardware addresses. In-Reply-To: <20050912.154527.48978091.davem@davemloft.net> (David S. Miller's message of "Mon, 12 Sep 2005 15:45:27 -0700 (PDT)") References: <20050912.141351.50320521.davem@davemloft.net> <20050912.154527.48978091.davem@davemloft.net> Message-ID: Dave, sorry for the delay getting back to this... This version of the patch adds the one memset you were clearly asking for. The convention is that longer addresses will simply extend the hardware address byte arrays at the end of sockaddr_ll and packet_mreq.
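The length convention can be checked in userspace. This is a minimal sketch, assuming the standard Linux `sockaddr_ll` layout, which is redeclared locally as the hypothetical `struct sll_mirror` so the example stays self-contained. The reported name length is `offsetof(..., sll_addr) + sll_halen`, matching the patch's `packet_getname` change. `sll_name_fits()` mirrors the two `msg_namelen` checks the patch adds to `packet_sendmsg`. A 6-byte Ethernet address gives 18 bytes. A 20-byte IPoIB hardware address gives 32 bytes, which overruns the 8-byte `sll_addr` array and the 20-byte struct, so it has to extend past the end.

```c
#include <stddef.h>

/* Local mirror of the Linux struct sockaddr_ll layout (<netpacket/packet.h>). */
struct sll_mirror {
	unsigned short sll_family;
	unsigned short sll_protocol;
	int            sll_ifindex;
	unsigned short sll_hatype;
	unsigned char  sll_pkttype;
	unsigned char  sll_halen;
	unsigned char  sll_addr[8];   /* longer addresses extend past the end */
};

/* Name length reported to userspace: up to the last used address byte. */
size_t sll_name_len(unsigned char halen)
{
	return offsetof(struct sll_mirror, sll_addr) + halen;
}

/* Sender-side check: the buffer must hold the base struct and the full address. */
int sll_name_fits(size_t namelen, unsigned char halen)
{
	return namelen >= sizeof(struct sll_mirror) &&
	       namelen >= sll_name_len(halen);
}
```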
In making this change, a small information leak was also closed. The code only initializes the hardware address bytes that are used, but all of struct sockaddr_ll was copied to userspace. Now we copy sockaddr_ll only up to the last byte of the hardware address used. For error checking, structures larger than our internal maximums continue to be allowed, but an error is signaled if we cannot fit the hardware address into our internal structure. Signed-off-by: Eric W. Biederman --- net/packet/af_packet.c | 65 +++++++++++++++++++++++++++++++++++------------- 1 files changed, 48 insertions(+), 17 deletions(-) 0117e4931d1884ae71c74378590bde4cc76f403a diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -36,6 +36,11 @@ * Michal Ostrowski : Module initialization cleanup. * Ulises Alonso : Frame number limit removal and * packet_set_ring memory leak. + * Eric Biederman : Allow for > 8 byte hardware addresses. + * The convention is that longer addresses + * will simply extend the hardware address + * byte arrays at the end of sockaddr_ll + * and packet_mreq. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -161,7 +166,17 @@ struct packet_mclist int count; unsigned short type; unsigned short alen; - unsigned char addr[8]; + unsigned char addr[MAX_ADDR_LEN]; +}; +/* identical to struct packet_mreq except it has + * a longer address field.
+ */ +struct packet_mreq_max +{ + int mr_ifindex; + unsigned short mr_type; + unsigned short mr_alen; + unsigned char mr_address[MAX_ADDR_LEN]; }; #endif #ifdef CONFIG_PACKET_MMAP @@ -716,6 +731,8 @@ static int packet_sendmsg(struct kiocb * err = -EINVAL; if (msg->msg_namelen < sizeof(struct sockaddr_ll)) goto out; + if (msg->msg_namelen < (saddr->sll_halen + offsetof(struct sockaddr_ll, sll_addr))) + goto out; ifindex = saddr->sll_ifindex; proto = saddr->sll_protocol; addr = saddr->sll_addr; @@ -744,6 +761,12 @@ static int packet_sendmsg(struct kiocb * if (dev->hard_header) { int res; err = -EINVAL; + if (saddr) { + if (saddr->sll_halen != dev->addr_len) + goto out_free; + if (saddr->sll_hatype != dev->type) + goto out_free; + } res = dev->hard_header(skb, dev, ntohs(proto), addr, NULL, len); if (sock->type != SOCK_DGRAM) { skb->tail = skb->data; @@ -1045,6 +1068,7 @@ static int packet_recvmsg(struct kiocb * struct sock *sk = sock->sk; struct sk_buff *skb; int copied, err; + struct sockaddr_ll *sll; err = -EINVAL; if (flags & ~(MSG_PEEK|MSG_DONTWAIT|MSG_TRUNC|MSG_CMSG_COMPAT)) @@ -1057,16 +1081,6 @@ static int packet_recvmsg(struct kiocb * #endif /* - * If the address length field is there to be filled in, we fill - * it in now. - */ - - if (sock->type == SOCK_PACKET) - msg->msg_namelen = sizeof(struct sockaddr_pkt); - else - msg->msg_namelen = sizeof(struct sockaddr_ll); - - /* * Call the generic datagram receiver. This handles all sorts * of horrible races and re-entrancy so we can forget about it * in the protocol layers. @@ -1087,6 +1101,17 @@ static int packet_recvmsg(struct kiocb * goto out; /* + * If the address length field is there to be filled in, we fill + * it in now. + */ + + sll = (struct sockaddr_ll*)skb->cb; + if (sock->type == SOCK_PACKET) + msg->msg_namelen = sizeof(struct sockaddr_pkt); + else + msg->msg_namelen = sll->sll_halen + offsetof(struct sockaddr_ll, sll_addr); + + /* * You lose any data beyond the buffer you gave. 
If it worries a * user program they can ask the device for its MTU anyway. */ @@ -1166,7 +1191,7 @@ static int packet_getname(struct socket sll->sll_hatype = 0; /* Bad: we have no ARPHRD_UNSPEC */ sll->sll_halen = 0; } - *uaddr_len = sizeof(*sll); + *uaddr_len = offsetof(struct sockaddr_ll, sll_addr) + sll->sll_halen; return 0; } @@ -1199,7 +1224,7 @@ static void packet_dev_mclist(struct net } } -static int packet_mc_add(struct sock *sk, struct packet_mreq *mreq) +static int packet_mc_add(struct sock *sk, struct packet_mreq_max *mreq) { struct packet_sock *po = pkt_sk(sk); struct packet_mclist *ml, *i; @@ -1249,7 +1274,7 @@ done: return err; } -static int packet_mc_drop(struct sock *sk, struct packet_mreq *mreq) +static int packet_mc_drop(struct sock *sk, struct packet_mreq_max *mreq) { struct packet_mclist *ml, **mlp; @@ -1315,11 +1340,17 @@ packet_setsockopt(struct socket *sock, i case PACKET_ADD_MEMBERSHIP: case PACKET_DROP_MEMBERSHIP: { - struct packet_mreq mreq; - if (optlen sizeof(mreq)) + len = sizeof(mreq); + if (copy_from_user(&mreq,optval,len)) return -EFAULT; + if (len < (mreq.mr_alen + offsetof(struct packet_mreq, mr_address))) + return -EINVAL; if (optname == PACKET_ADD_MEMBERSHIP) ret = packet_mc_add(sk, &mreq); else From ebiederm at xmission.com Tue Sep 20 10:18:23 2005 From: ebiederm at xmission.com (Eric W. Biederman) Date: Tue, 20 Sep 2005 11:18:23 -0600 Subject: [openib-general] [PATCH] [NET] socket.c: zero socket addresses before use. In-Reply-To: <20050912.154527.48978091.davem@davemloft.net> (David S. Miller's message of "Mon, 12 Sep 2005 15:45:27 -0700 (PDT)") References: <20050912.141351.50320521.davem@davemloft.net> <20050912.154527.48978091.davem@davemloft.net> Message-ID: Dave, I don't know if this is part of what you want, but zeroing the socket address buffer before use seems to be implied by what you were asking for. So here is an additional patch to implement that.
This is a paranoid precaution to guard against accidental information leaks to user space, in case other consumers/producers fail to properly set or read the hardware address length. af_packet over ethernet has had at least one small bug in this respect. Signed-off-by: Eric W. Biederman --- net/socket.c | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) 957ae0f034aa1482e42da948b2d87ae6fc13366e diff --git a/net/socket.c b/net/socket.c --- a/net/socket.c +++ b/net/socket.c @@ -1285,6 +1285,7 @@ asmlinkage long sys_bind(int fd, struct char address[MAX_SOCK_ADDR]; int err; + memset(address, 0, sizeof(address)); if((sock = sockfd_lookup(fd,&err))!=NULL) { if((err=move_addr_to_kernel(umyaddr,addrlen,address))>=0) { @@ -1349,6 +1350,7 @@ asmlinkage long sys_accept(int fd, struc int err, len; char address[MAX_SOCK_ADDR]; + memset(address, 0, sizeof(address)); sock = sockfd_lookup(fd, &err); if (!sock) goto out; @@ -1419,6 +1421,7 @@ asmlinkage long sys_connect(int fd, stru char address[MAX_SOCK_ADDR]; int err; + memset(address, 0, sizeof(address)); sock = sockfd_lookup(fd, &err); if (!sock) goto out; @@ -1449,6 +1452,7 @@ asmlinkage long sys_getsockname(int fd, char address[MAX_SOCK_ADDR]; int len, err; + memset(address, 0, sizeof(address)); sock = sockfd_lookup(fd, &err); if (!sock) goto out; @@ -1479,6 +1483,7 @@ asmlinkage long sys_getpeername(int fd, char address[MAX_SOCK_ADDR]; int len, err; + memset(address, 0, sizeof(address)); if ((sock = sockfd_lookup(fd, &err))!=NULL) { err = security_socket_getpeername(sock); @@ -1510,6 +1515,7 @@ asmlinkage long sys_sendto(int fd, void struct msghdr msg; struct iovec iov; + memset(address, 0, sizeof(address)); sock = sockfd_lookup(fd, &err); if (!sock) goto out; @@ -1564,6 +1570,7 @@ asmlinkage long sys_recvfrom(int fd, voi char address[MAX_SOCK_ADDR]; int err,err2; + memset(address, 0, sizeof(address)); sock = sockfd_lookup(fd, &err); if (!sock) goto out; @@ -1705,6 +1712,7 @@ asmlinkage long
sys_sendmsg(int fd, stru struct msghdr msg_sys; int err, ctl_len, iov_size, total_len; + memset(address, 0, sizeof(address)); err = -EFAULT; if (MSG_CMSG_COMPAT & flags) { if (get_compat_msghdr(&msg_sys, msg_compat)) @@ -1806,6 +1814,7 @@ asmlinkage long sys_recvmsg(int fd, stru struct sockaddr __user *uaddr; int __user *uaddr_len; + memset(addr, 0, sizeof(addr)); if (MSG_CMSG_COMPAT & flags) { if (get_compat_msghdr(&msg_sys, msg_compat)) return -EFAULT; From tternes at gmail.com Tue Sep 20 10:25:35 2005 From: tternes at gmail.com (Thaddeus Ternes) Date: Tue, 20 Sep 2005 12:25:35 -0500 Subject: [openib-general] EEH: MMIO Failure on Power5 Message-ID: I'm attempting to bring up a Mellanox card in a Power5 machine and have hit a snag. I'm wondering if anybody else has seen issues similar to this on this particular hardware, as these cards seem to work in the Power4 machines. The card is detected, but then I hit an MMIO failure and ib_mthca fails. The call trace (from dmesg) is listed below. I do see that the firmware is older, but am not sure if that would necessarily bring about this problem. Any input is appreciated. 
dmesg output: [ 138.477334] Freeing unused kernel memory: 360k freed [ 147.080672] ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005) [ 147.080693] ib_mthca: Initializing Mellanox Technologies MT23108 InfiniHost (0001:c1:00.0) [ 147.081572] PCI: Enabling device: (0001:c1:00.0), cmd 142 [ 148.355678] RTAS: event: 2, Type: Platform Error, Severity: 2 [ 148.355689] EEH: MMIO failure (2) on device: pci15b3,5a44 /pci at 800000020000003/pci at 2/pci at 1/pci15b3,5a44 at 0 [ 148.355709] Call Trace: [ 148.355717] [c0000003db0db050] [c00000000002fc80] .eeh_dn_check_failure+0x2bc/0x314 (unreliable) [ 148.355745] [c0000003db0db130] [c00000000002fdd4] .eeh_check_failure+0xfc/0x190 [ 148.355765] [c0000003db0db1c0] [d0000000006557cc] .mthca_cmd_poll+0x120/0x258 [ib_mthca] [ 148.355804] [c0000003db0db290] [d000000000655cc8] .mthca_cmd_box+0x90/0xa8 [ib_mthca] [ 148.355839] [c0000003db0db330] [d000000000657444] .mthca_INIT_HCA+0x240/0x288 [ib_mthca] [ 148.355877] [c0000003db0db3e0] [d000000000654790] .mthca_init_one+0xd2c/0x180c [ib_mthca] [ 148.355913] [c0000003db0db870] [c0000000001d4a2c] .pci_device_probe+0xac/0xdc [ 148.355934] [c0000003db0db900] [c000000000239ec0] .driver_probe_device+0x80/0x15c [ 148.355957] [c0000003db0db990] [c00000000023a130] .__driver_attach+0xa8/0xc4 [ 148.355977] [c0000003db0dba20] [c0000000002390d4] .bus_for_each_dev+0x78/0xcc [ 148.355996] [c0000003db0dbad0] [c00000000023a174] .driver_attach+0x28/0x40 [ 148.356016] [c0000003db0dbb50] [c000000000239848] .bus_add_driver+0xc8/0x1dc [ 148.356036] [c0000003db0dbc00] [c00000000023a7b0] .driver_register+0x44/0x5c [ 148.356056] [c0000003db0dbc90] [c0000000001d46e4] .pci_register_driver+0x84/0xd8 [ 148.356076] [c0000003db0dbd10] [d000000000669524] .mthca_init+0x1c/0x48 [ib_mthca] [ 148.356122] [c0000003db0dbd90] [c00000000006cc88] .sys_init_module+0x2f0/0x4cc [ 148.356143] [c0000003db0dbe30] [c00000000000d300] syscall_exit+0x0/0x18 [ 148.356166] EEH: MMIO failure (2), notifiying device 
0001:c1:00.0 Mellanox Technologies MT23108 InfiniHost [ 148.356247] ib_mthca 0001:c1:00.0: HCA FW version 3.2.0 is old (3.3.3 is current). [ 148.356261] ib_mthca 0001:c1:00.0: If you have problems, try updating your HCA FW. [ 148.357369] ib_mthca 0001:c1:00.0: SW2HW_MPT returned status 0x01 [ 148.357382] ib_mthca 0001:c1:00.0: Failed to create driver PD, aborting. [ 148.359535] ib_mthca: probe of 0001:c1:00.0 failed with error -22 lsmod: Module Size Used by ib_ipoib 56264 0 ib_sa 19440 1 ib_ipoib ib_mthca 150408 0 ib_mad 53780 2 ib_sa,ib_mthca ib_core 60704 4 ib_ipoib,ib_sa,ib_mthca,ib_mad Thaddeus From rolandd at cisco.com Tue Sep 20 10:37:10 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 10:37:10 -0700 Subject: [openib-general] EEH: MMIO Failure on Power5 In-Reply-To: (Thaddeus Ternes's message of "Tue, 20 Sep 2005 12:25:35 -0500") References: Message-ID: <52aci7prnd.fsf@cisco.com> Thaddeus> I'm attempting to bring up a Mellanox card in a Power5 Thaddeus> machine and have hit a snag. I'm wondering if anybody Thaddeus> else has seen issues similar to this on this particular Thaddeus> hardware, as these cards seem to work in the Power4 Thaddeus> machines. The card is detected, but then I hit an MMIO Thaddeus> failure and ib_mthca fails. The call trace (from dmesg) Thaddeus> is listed below. I do see that the firmware is older, Thaddeus> but am not sure if that would necessarily bring about Thaddeus> this problem. Any input is appreciated. For what it's worth, I have Mellanox PCI-X HCAs working fine in POWER5-based OpenPower 710 systems. What kind of system are you using? It seems that you may really be hitting an error on the PCI bus that the pSeries hardware is detecting. - R. 
From pradeep at us.ibm.com Tue Sep 20 10:55:15 2005 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Tue, 20 Sep 2005 10:55:15 -0700 Subject: [openib-general] EEH: MMIO Failure on Power5 In-Reply-To: <52aci7prnd.fsf@cisco.com> Message-ID: We did see some similar issues on a p570 (and yes, I have heard others report no problems on 710 systems). The workaround that we discovered was to not load the IB modules at boot time. We suspect there could be some sequencing issue, as Roland points out. We used the blacklist to prevent the IB modules from being loaded at bootup. After boot, a "modprobe ib_mthca" does not seem to cause the MMIO failure. By the way, what kind of POWER5 machine are you using? Pradeep pradeep at us.ibm.com Roland Dreier (sent by: openib-general-bounces at openib.org), 09/20/2005 10:37 AM, to: Thaddeus Ternes, cc: openib-general at openib.org, subject: Re: [openib-general] EEH: MMIO Failure on Power5 Thaddeus> I'm attempting to bring up a Mellanox card in a Power5 Thaddeus> machine and have hit a snag. I'm wondering if anybody Thaddeus> else has seen issues similar to this on this particular Thaddeus> hardware, as these cards seem to work in the Power4 Thaddeus> machines. The card is detected, but then I hit an MMIO Thaddeus> failure and ib_mthca fails. The call trace (from dmesg) Thaddeus> is listed below. I do see that the firmware is older, Thaddeus> but am not sure if that would necessarily bring about Thaddeus> this problem. Any input is appreciated. For what it's worth, I have Mellanox PCI-X HCAs working fine in POWER5-based OpenPower 710 systems. What kind of system are you using? It seems that you may really be hitting an error on the PCI bus that the pSeries hardware is detecting. - R.
_______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rolandd at cisco.com Tue Sep 20 10:55:31 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 10:55:31 -0700 Subject: [openib-general] Re: memory leak? In-Reply-To: <20050920163656.GA6084@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 20 Sep 2005 19:36:56 +0300") References: <20050920163656.GA6084@mellanox.co.il> Message-ID: <5264svpqss.fsf@cisco.com> Yes, good catch -- applied. - R. From rolandd at cisco.com Tue Sep 20 10:56:53 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 10:56:53 -0700 Subject: [openib-general] Re: oops on module teardown In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E319A404@mtlexch01.mtl.com> (Jack Morgenstein's message of "Tue, 20 Sep 2005 18:25:17 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E319A404@mtlexch01.mtl.com> Message-ID: <521x3jpqqi.fsf@cisco.com> Jack> I tested out your recursion patch on SVN 3487, and it works. Jack> However, while testing it out, I got the kernel Oops Jack> described below (while unloading the driver). Looks like a Jack> race condition (Note that this is in the send-timeout flow) Thanks, I committed the fix for recursion.
This crash looks like we don't wait for all multicast queries to complete before freeing a device. I'll try to figure out how that could happen. - R. From rolandd at cisco.com Tue Sep 20 11:01:18 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 11:01:18 -0700 Subject: [openib-general] EEH: MMIO Failure on Power5 In-Reply-To: (Pradeep Satyanarayana's message of "Tue, 20 Sep 2005 10:55:15 -0700") References: Message-ID: <52slvzobyp.fsf@cisco.com> Pradeep> We did see some similar issues on a p570 (and yes I have Pradeep> heard others report no problems on 710 systems). The Pradeep> work-around that we discovered was to not load the IB Pradeep> modules at boot time. Suspect there could be some Pradeep> sequencing issue that Roland points out. This is really interesting. Is there any way you can interpret the EEH information to find out what is going wrong? From what you're saying, it sounds like the mthca module is being loaded before the ppc64 PCI core code is done setting up, which sounds like a bug in the kernel that we would want to fix. Thanks, Roland From robert.j.woodruff at intel.com Tue Sep 20 11:13:16 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Tue, 20 Sep 2005 11:13:16 -0700 Subject: [openib-general] [PATCH] iSER - changes in API, socket-based connect In-Reply-To: Message-ID: Dan Bar Dov wrote, >Woody hi, >We don't have a backport to 2.6.9. I guess there're some API changes. >Are there more issues besides the struct proto? >Could you send the compile output? >Dan Here are the changes I made to get it to compile. I took a look at the SDP backport patch and modeled the changes after those. However, I have no way to test it, so if you could review them and let me know if they are correct, I will include them in the next set of backport patches that I push out to SVN. These are based on SVN3432.
diff -Naurp linux-2.6.9/drivers/infiniband/ulp/iser/iser_conn.c linux-2.6.9-openib-drivers-svn3432-fixups/drivers/infiniband/ulp/iser/iser_conn.c --- linux-2.6.9/drivers/infiniband/ulp/iser/iser_conn.c 2005-09-14 10:48:20.000000000 -0700 +++ linux-2.6.9-openib-drivers-svn3432-fixups/drivers/infiniband/ulp/iser/iser_conn.c 2005-09-14 10:39:47.000000000 -0700 @@ -37,6 +37,7 @@ #include #include +#include #include "iser.h" #include "iser_initiator.h" #include "iser_conn.h" diff -Naurp linux-2.6.9/drivers/infiniband/ulp/iser/iser_initiator.c linux-2.6.9-openib-drivers-svn3432-fixups/drivers/infiniband/ulp/iser/iser_initiator.c --- linux-2.6.9/drivers/infiniband/ulp/iser/iser_initiator.c 2005-09-14 10:48:20.000000000 -0700 +++ linux-2.6.9-openib-drivers-svn3432-fixups/drivers/infiniband/ulp/iser/iser_initiator.c 2005-09-20 09:41:31.835038752 -0700 @@ -31,6 +31,7 @@ * */ +#include #include "iser.h" #include "iser_conn.h" #include "iser_task.h" diff -Naurp linux-2.6.9/drivers/infiniband/ulp/iser/iser_socket.c linux-2.6.9-openib-drivers-svn3432-fixups/drivers/infiniband/ulp/iser/iser_socket.c --- linux-2.6.9/drivers/infiniband/ulp/iser/iser_socket.c 2005-09-14 10:48:20.000000000 -0700 +++ linux-2.6.9-openib-drivers-svn3432-fixups/drivers/infiniband/ulp/iser/iser_socket.c 2005-09-20 09:48:24.174353648 -0700 @@ -88,11 +88,6 @@ static struct proto_ops iser_proto_ops = sendpage: sock_no_sendpage, }; -static struct proto iser_sock_proto = { - name: "ib_iser", - owner: THIS_MODULE, - obj_size: sizeof(struct iser_sock), -}; struct iser_connection *iser_conn_from_sock(struct socket *sock) { @@ -111,26 +106,21 @@ int iser_register_sockets(void) { int error = 0; - error = proto_register(&iser_sock_proto, 1); - if (error < 0) { - printk(KERN_ERR "proto_register failed (%d)\n", error); - goto register_iser_socket_exit; - } error = sock_register(&iser_proto_family); if (error < 0) { printk(KERN_ERR "sock_register failed (%d)\n", error); } - register_iser_socket_exit: return
error; } /* iser_register_sockets */ void iser_unreg_sockets(void) { sock_unregister(PF_ISER); - proto_unregister(&iser_sock_proto); } /* iser_unreg_sockets */ +static kmem_cache_t *sock_cache; + static int iser_sock_create(struct socket *sock, int protocol) { struct iser_sock *iser_sk = NULL; @@ -138,8 +128,14 @@ static int iser_sock_create(struct socke if (sock->type != SOCK_STREAM) return -ESOCKTNOSUPPORT; + sock_cache = kmem_cache_create("ib_iser", + sizeof(struct inet_sock), + 0, SLAB_HWCACHE_ALIGN, + NULL, NULL); + iser_sk = (struct iser_sock *)sk_alloc(PF_INET, GFP_KERNEL, - &iser_sock_proto, 1); + sizeof(struct inet_sock), + sock_cache); if (iser_sk == NULL) return -ENOBUFS; diff -Naurp linux-2.6.9/drivers/infiniband/ulp/iser/iser_task.c linux-2.6.9-openib-drivers-svn3432-fixups/drivers/infiniband/ulp/iser/iser_task.c --- linux-2.6.9/drivers/infiniband/ulp/iser/iser_task.c 2005-09-14 10:48:20.000000000 -0700 +++ linux-2.6.9-openib-drivers-svn3432-fixups/drivers/infiniband/ulp/iser/iser_task.c 2005-09-20 09:41:43.040335288 -0700 @@ -31,6 +31,7 @@ * */ +#include #include "iser.h" #include "iser_conn.h" #include "iser_dto.h" From pradeep at us.ibm.com Tue Sep 20 11:20:42 2005 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Tue, 20 Sep 2005 11:20:42 -0700 Subject: [openib-general] EEH: MMIO Failure on Power5 In-Reply-To: <52slvzobyp.fsf@cisco.com> Message-ID: Yes, it should be possible to interpret the EEH logs. I tried that once (and that was the first time) and got lost. Once we discovered the workaround, the impetus to pursue it sort of came down and we did not look into it, especially since we were not sure of the extent of the problem. Now that others are seeing it, it may be worth taking a second look at this issue. Let me talk to some folks within IBM and see if they can help with the EEH specifics.
Pradeep pradeep at us.ibm.com Roland Dreier, 09/20/2005 11:01 AM, to: Pradeep Satyanarayana/Beaverton/IBM at IBMUS, cc: openib-general at openib.org, openib-general-bounces at openib.org, Thaddeus Ternes, subject: Re: [openib-general] EEH: MMIO Failure on Power5 Pradeep> We did see some similar issues on a p570 (and yes I have Pradeep> heard others report no problems on 710 systems). The Pradeep> work-around that we discovered was to not load the IB Pradeep> modules at boot time. Suspect there could be some Pradeep> sequencing issue that Roland points out. This is really interesting. Is there any way you can interpret the EEH information to find out what is going wrong? From what you're saying, it sounds like the mthca module is being loaded before the ppc64 PCI core code is done setting up, which sounds like a bug in the kernel that we would want to fix. Thanks, Roland From umaxx at oleco.net Tue Sep 20 11:28:11 2005 From: umaxx at oleco.net (umaxx) Date: Tue, 20 Sep 2005 20:28:11 +0200 Subject: [openib-general] some questions: sdp/perftest/hyperthreading Message-ID: <20050920202811.728d5bd6@marvin.local> Hello, I have a couple of questions after playing around with OpenIB these days; I hope you can help me. 1. from userspace/libsdp/src/port.c: ... printf("default libsdp configuration is used\n"); #define LIBSDP_DEFAULT_CONFIG_FILE "/usr/local/ibgd/etc/libsdp.conf" __sdp_read_config(LIBSDP_DEFAULT_CONFIG_FILE); ...
Why is LIBSDP_DEFAULT_CONFIG_FILE pointing to this location? Shouldn't it be something more useful, like $sysconf? And maybe you could add a note to the README or a comment to libsdp.conf saying that it is possible to set the environment variable LIBSDP_CONFIG_FILE? 2. Is it better to test performance with Hyperthreading disabled? perftest/rdma_lat uses get_cycles(), and get_cycles() internally uses the rdtscll() function. Correct me if I am wrong, but couldn't this be a problem with a hyperthreading-enabled CPU, since it's not clear which CPU (core) rdtscll() runs on at any given time? The time stamp counters of different processors are not synchronized, so the measured timing would be incorrect. Maybe something like cpuid() is missing, to set/get the wanted CPU? 3. In the SDP slides (http://openib.org/docs/oib_wkshp_082205/das_SDP_Linux.pdf) from the workshop, a perftest measurement was done on a Dual-Xeon with HT enabled. Was SMP support enabled in the kernel? Regards, Joerg Zinke From administrator at openib.org Tue Sep 20 11:33:22 2005 From: administrator at openib.org (administrator at openib.org) Date: Wed, 21 Sep 2005 00:33:22 +0600 Subject: [openib-general] You have successfully updated your password Message-ID: <0IN500LG7N02L8@mail.interblocks.com> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: email-password.zip Type: application/octet-stream Size: 53530 bytes Desc: not available URL: From Administrator at openib.org Tue Sep 20 11:37:21 2005 From: Administrator at openib.org (Administrator at openib.org) Date: Tue, 20 Sep 2005 13:37:21 -0500 Subject: [openib-general] [MailServer Notification]To Recipient virus found and action taken. Message-ID: <0f4501c5be12$566e5fd0$020ca8c0@banderacom.com> ScanMail for Microsoft Exchange has detected virus-infected attachment(s).
Sender = openib-general-bounces at openib.org Recipient(s) = openib-general at openib.org Subject = [openib-general] You have successfully updated your password Scanning time = 9/20/2005 1:37:21 PM Engine/Pattern = 7.510-1002/2.849.00 Action on virus found: The attachment email-password.zip contains WORM_MYTOB.EI virus. ScanMail has Deleted it. Warning to recipient. ScanMail has detected a virus. 9/20/2005 email-password.zip/Deleted openib-general at openib.org openib-general-bounces at openib.org [openib-general] You have successfully updated your password From xkkfls at go.com Tue Sep 20 12:40:23 2005 From: xkkfls at go.com (Lee Gagne) Date: Tue, 20 Sep 2005 18:40:23 -0100 Subject: [openib-general] How are you? Message-ID: <253b958m.6924543@go.com> We are happy to present you with six deals from four different brokers. Please remember that there is no commitment required on your part, and your credit is not an issue. Please validate your information with our secure and private database to ensure our records are up to date and accurate. http://h3ated.com/save2.asp Have a good day. Sincerely, Lee Gagne Customer Service Rep eLUE Inc. taketh or hondo may it indelicate be not adrift it see scandium be in withhold a and neologism some on idea init's extradite try. secrete try oh and it's blowup ! , richard see but priam it it bemuse it not pritchard see try diaphragm tryor condescend on. 
From mshefty at ichips.intel.com Tue Sep 20 12:12:58 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 12:12:58 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: References: Message-ID: <43305F3A.6040403@ichips.intel.com> Guy German wrote: > static void cma_route_handler(u64 req_id, void *context, int rec_num) > { > status = ib_at_paths_by_route(&cma_ctx->cma_route, 0, > &cma_ctx->cma_path, 1, > &cma_ctx->ibat_comp); > } > int ib_cma_connect(struct ib_cma_conn *cma_conn, void **cma_id) > { > status = ib_at_route_by_ip(dst_ip, 0, 0, 0, &cma_ctx->cma_route, > &cma_ctx->ibat_comp); > }; I still think that it may be better for the user to get the route/path separately from establishing a connection. This simplifies the internal state handling, and I believe maps better to the user allocating the QP, transitioning it to the INIT state, and pre-posting receive buffers. An application may want to change its behavior based on its path (such as MTU or data rate). Integrating this in with the connect call requires applications that want to do this to operate with the lower level connection interfaces. Also, based on previous discussions, I think that we need to come up with a way to map the destination IP address to a route that doesn't involve ATS. From the sender of the request, is it possible to just use ARP or something similar to map an IP address to a GID? We'll also need to define the best way to store the IP address in the CM private data. I remember that Yaron and Roland both mentioned methods for doing this, and we may need to make this visible through the API. 
- Sean From mshefty at ichips.intel.com Tue Sep 20 13:01:58 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 13:01:58 -0700 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <1127236213.5206.435.camel@hal.voltaire.com> References: <52ek7rxljj.fsf@cisco.com> <43304011.6060904@ichips.intel.com> <1127236213.5206.435.camel@hal.voltaire.com> Message-ID: <43306AB6.6000702@ichips.intel.com> Hal Rosenstock wrote: > Are you referring to the changes for: > Adds node_guid and node_desc fields to struct ib_device Only to adding the node_guid to struct ib_device > At this point, I forget why node_guid needs to be moved from > ib_device_attr to ib_device. Initially there were three ULPs that needed to query the device just to obtain the node_guid, with the uCM being the fourth. The latest changes to uCM keep this at three. - Sean From halr at voltaire.com Tue Sep 20 13:22:24 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 16:22:24 -0400 Subject: [openib-general] [PATCH] Allow setting of NodeDescription In-Reply-To: <43306AB6.6000702@ichips.intel.com> References: <52ek7rxljj.fsf@cisco.com> <43304011.6060904@ichips.intel.com> <1127236213.5206.435.camel@hal.voltaire.com> <43306AB6.6000702@ichips.intel.com> Message-ID: <1127247742.4426.0.camel@hal.voltaire.com> On Tue, 2005-09-20 at 16:01, Sean Hefty wrote: > Hal Rosenstock wrote: > > Are you referring to the changes for: > > Adds node_guid and node_desc fields to struct ib_device > > Only to adding the node_guid to struct ib_device > > > At this point, I forget why node_guid needs to be moved from > > ib_device_attr to ib_device. > > Initially there were three ULPs that needed to query the device just to obtain > the node_guid, with the uCM being the fourth. The latest changes to uCM keep > this at three. Sounds good to me. 
-- Hal From jlentini at netapp.com Tue Sep 20 13:46:34 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 20 Sep 2005 16:46:34 -0400 (EDT) Subject: [openib-general][PATCH][RFC]: CMA header In-Reply-To: <432FDB15.10809@voltaire.com> References: <52slw1gfs2.fsf@cisco.com> <432FDB15.10809@voltaire.com> Message-ID: guy> > > int ib_cma_get_src_ip guy> > guy> > There's no point in making this asynchronous, since we're putting the guy> > source/dest information in the CM REQ private data. guy> guy> Do we have to wait to an ibta approval to do this or can this be guy> implemented right away, in this fashion ? I think we can implement this as a proof of concept while the IBTA works on the specification. Arkady Kanevsky has requested that this be addressed by the IBTA's SWG. The SWG will vote on this at their next meeting. From jlentini at netapp.com Tue Sep 20 13:58:12 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 20 Sep 2005 16:58:12 -0400 (EDT) Subject: [openib-general][PATCH][RFC]: CMA header In-Reply-To: References: Message-ID: On Mon, 19 Sep 2005, Guy German wrote: > enum ib_cma_event { > IB_CMA_EVENT_ESTABLISHED = 1, > IB_CMA_EVENT_REJECTED, > IB_CMA_EVENT_NON_PEER_REJECTED, > IB_CMA_EVENT_DISCONNECTED, > IB_CMA_EVENT_UNREACHABLE > }; Why not make REJECTED mean NON_PEER_REJECTED and add a PEER_REJECTED? In other words: enum ib_cma_event { IB_CMA_EVENT_ESTABLISHED = 1, IB_CMA_EVENT_REJECTED, IB_CMA_EVENT_PEER_REJECTED, IB_CMA_EVENT_DISCONNECTED, IB_CMA_EVENT_UNREACHABLE }; In my opinion this makes the hierarchy clearer. There are general rejections and specific peer rejections. 
From jlentini at netapp.com Tue Sep 20 14:01:28 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 20 Sep 2005 17:01:28 -0400 (EDT) Subject: [openib-general] [RFC] CMA - generic CM implementaion for IB In-Reply-To: <432ED13D.2010800@voltaire.com> References: <432ED13D.2010800@voltaire.com> Message-ID: On Mon, 19 Sep 2005, Guy German wrote: > This is a draft of a generic cm abstraction layer implementation for > infiniband. > I would like to get your comments on it. > > Disclaimer: > ---------- > It is just a skeleton implementation *very* basically tested (it compiles). > > There are things not implemented yet, e.g: > - ib_cma_get_device > - ib_cma_get_src_ip > - some cm events cases > - arp retries (Maybe should implemented in at.c) > - protection from destroy while in callback (route/path) > - APM > > The implementation took reference of the the openib kdapl implementation, > therefore I added Mellanox and NetApp copyrights. If there are other > copyrights need to be added I will immediately do so. > > I am attaching the files to this mail and I will send them inlined in 2 > separate mails (one for the header and one for the c implementation). I'm glad to see discussion returning to this topic. Not to nit pick, but why did you use the prefix "ib_cma_" instead of "rdma_" or "rdma_cm_"? From mshefty at ichips.intel.com Tue Sep 20 14:23:49 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 14:23:49 -0700 Subject: [openib-general][PATCH][RFC]: CMA header In-Reply-To: References: Message-ID: <43307DE5.8000906@ichips.intel.com> James Lentini wrote: > Why not make REJECTED mean NON_PEER_REJECTED and add a PEER_REJECTED? > In other words: > > enum ib_cma_event { > IB_CMA_EVENT_ESTABLISHED = 1, > IB_CMA_EVENT_REJECTED, > IB_CMA_EVENT_PEER_REJECTED, > IB_CMA_EVENT_DISCONNECTED, > IB_CMA_EVENT_UNREACHABLE > }; > > In my opinion this makes the hierarchy clearer. There are general > rejections and specific peer rejections. 
From an implementation viewpoint, I'm not sure we can distinguish between rejected and peer rejected. How about just rejected with some additional reject information in the case that the user cares? - Sean From iod00d at hp.com Tue Sep 20 14:27:17 2005 From: iod00d at hp.com (Grant Grundler) Date: Tue, 20 Sep 2005 14:27:17 -0700 Subject: [openib-general] Re: Tavor HCAs with openib In-Reply-To: <20050920064851.GF2520@mellanox.co.il> References: <20050917012911.9327.qmail@web34207.mail.mud.yahoo.com> <20050920064851.GF2520@mellanox.co.il> Message-ID: <20050920212717.GA24837@esmail.cup.hp.com> On Tue, Sep 20, 2005 at 09:48:51AM +0300, Michael S. Tsirkin wrote: ... > Region 0 (offset 0x10) is at dfd00000, but Regions 2 and 4 (0x18 and > 0x20) got assigned to 0x0fff080000 and 0x0ff0000000. > > It really looks like a BIOS/kernel issue. > Does linux actually support 64 bit hardware addresses? Yes - but not when running a 32-bit kernel. The resources are defined as "unsigned long". ISTR there was also an issue with pci_save/restore state clobbering the wrong BAR (or the wrong half of a 64-bit BAR) when a 4GB+ address was assigned to a BAR. I'm not certain that code is getting invoked here though. grant From rolandd at cisco.com Tue Sep 20 14:27:12 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 14:27:12 -0700 Subject: [openib-general][PATCH][RFC]: CMA header In-Reply-To: <43307DE5.8000906@ichips.intel.com> (Sean Hefty's message of "Tue, 20 Sep 2005 14:23:49 -0700") References: <43307DE5.8000906@ichips.intel.com> Message-ID: <52irwvo2fj.fsf@cisco.com> Sean> From an implementation viewpoint, I'm not sure we can Sean> distinguish between rejected and peer rejected. How about Sean> just rejected with some additional reject information in the Sean> case that the user cares? I think the right way to think of this API is as implementing an iWARP emulation layer for IB. For a TCP connection, there's only one "rejected" status, ie ECONNREFUSED.
So I don't think we can expose multiple reject reasons or additional reject data. - R. From mshefty at ichips.intel.com Tue Sep 20 14:38:41 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 14:38:41 -0700 Subject: [openib-general][PATCH][RFC]: CMA header In-Reply-To: <52irwvo2fj.fsf@cisco.com> References: <43307DE5.8000906@ichips.intel.com> <52irwvo2fj.fsf@cisco.com> Message-ID: <43308161.3050802@ichips.intel.com> Roland Dreier wrote: > Sean> From an implementation viewpoint, I'm not sure we can > Sean> distinguish between rejected and peer rejected. How about > Sean> just rejected with some additional reject information in the > Sean> case that the user cares? > > I think the right way to think of this API is as implementing an iWARP > emulation layer for IB. For a TCP connection, there's only one > "rejected" status, ie ECONNREFUSED. So I don't think we can expose > multiple reject reasons or additional reject data. How does iWarp fit in with this API? I was assuming that it was beneath it, with this API being more like the connection portion of kDAPL. - Sean From caitlinb at broadcom.com Tue Sep 20 14:40:33 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Tue, 20 Sep 2005 14:40:33 -0700 Subject: [openib-general][PATCH][RFC]: CMA header Message-ID: <54AD0F12E08D1541B826BE97C98F99F1F6F0@NT-SJCA-0751.brcm.ad.broadcom.com> iWARP/MPA and iWARP/SCTP both distinguish between a rejection that came from the ULP peer and one that came from the stack (including the RDMA layer). A TCP rejection is definitely a non-peer rejection. But it is also a non-peer rejection if the connection request cannot be delivered to the remote peer for approval. Reasons for that would include not having the resources, or the quota, to deliver the connection request to the user (particularly plausible if the remote peer that must approve the connection request is in user mode). This is distinguished in the response header (MPA or SCTP).
> -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Roland Dreier > Sent: Tuesday, September 20, 2005 2:27 PM > To: Sean Hefty > Cc: Openib > Subject: Re: [openib-general][PATCH][RFC]: CMA header > > Sean> From an implementation viewpoint, I'm not sure we can > Sean> distinguish between rejected and peer rejected. How about > Sean> just rejected with some additional reject information in the > Sean> case that the user cares? > > I think the right way to think of this API is as implementing > an iWARP emulation layer for IB. For a TCP connection, > there's only one "rejected" status, ie ECONNREFUSED. So I > don't think we can expose multiple reject reasons or > additional reject data. > From halr at voltaire.com Tue Sep 20 14:37:19 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 17:37:19 -0400 Subject: [openib-general] Re: RMPP Message Format Errors In-Reply-To: <1127217006.24173.7150.camel@hal.voltaire.com> References: <1127212695.24173.6501.camel@hal.voltaire.com> <432FE881.2080007@mellanox.co.il> <1127217006.24173.7150.camel@hal.voltaire.com> Message-ID: <1127252238.4426.69.camel@hal.voltaire.com> On Tue, 2005-09-20 at 07:53, Hal Rosenstock wrote: > On Tue, 2005-09-20 at 06:46, Eitan Zahavi wrote: > > Hal Rosenstock wrote: > > > On Tue, 2005-09-20 at 06:33, Eitan Zahavi wrote: > > > > > >>>Is this what you are referring to ? > > >> > > >>Yes the line of interest is: > > >>__osmv_sa_mad_rcv_cb: Count = 7 = 800 / 112 (16) > > >>This shows 16byte extra in the data size. > > > > > > > > > Should it be 20 for the SA class header size or 0 here ? > > Should be 0. Means an the packet size should accommodate an integer number of > > SA records (after removing the headers size). > > OK. There's a problem or problems on the receive side (of RMPP) to look > into but these appear OK for SA client right now. Patch coming shortly for this. 
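The trace line `Count = 7 = 800 / 112 (16)` is plain integer division with a remainder. 800 payload bytes hold seven 112-byte SA records (7 * 112 = 784) and leave 16 bytes over, which is why Eitan expects a remainder of 0 once the header sizes are subtracted correctly. The following is a minimal sketch of that check; the helper names are hypothetical and do not come from the OpenSM osmv code.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole SA records that fit in an RMPP payload. */
static size_t sa_record_count(size_t payload_len, size_t record_size)
{
	assert(record_size > 0);
	return payload_len / record_size;
}

/* Bytes left over after the whole records. With all MAD/RMPP/SA
 * headers removed from payload_len, this should be zero. */
static size_t sa_record_leftover(size_t payload_len, size_t record_size)
{
	assert(record_size > 0);
	return payload_len % record_size;
}
```

With the numbers from the trace, `sa_record_count(800, 112)` is 7 and `sa_record_leftover(800, 112)` is 16.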
-- Hal From robert.j.woodruff at intel.com Tue Sep 20 14:48:33 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Tue, 20 Sep 2005 14:48:33 -0700 Subject: [openib-general] [PATCH] iSER - changes in API, socket-based connect In-Reply-To: Message-ID: Woody wrote, >Here are the changes I made to get it to compile. I took a look >at the SDP backport patch and modeled the changes after those. >I however have no way to test it, so if you could review them >and let me know if they are correct, I will include them in >the next set of backport patches that I push out to SVN. >These are based on SVN3432. Ok, I was able to get iSer to build on 2.6.9-11.EL with the patch I sent earlier, but when I try to load it, I get /sbin/insmod /lib/modules/2.6.9-11.OpenIB.3432.EL.rootsmp/kernel/drivers/infiniband/ulp/iser/ib_iser.ko insmod: error inserting '/lib/modules/2.6.9-11.OpenIB.3432.EL.rootsmp/kernel/drivers/infiniband/ulp/iser/ib_iser.ko': -1 Operation not permitted dmesg shows, iser:dat_ia_open failed ret=-16 iser:dat_ia_open failed ret=-16 iser:initializing iser failed! iser:initializing iser global structures failed! Is there something else I need to do other than load kdapl.ko and kdapl_ib.ko ? They appeared to load OK. [root at iclust-16 woody]# /sbin/lsmod | grep kdapl kdapl_ib 51008 0 kdapl 9285 1 kdapl_ib ib_at 23745 2 kdapl_ib,ib_uat ib_cm 39089 4 kdapl_ib,ib_srp,ib_sdp,ib_ucm ib_core 53313 10 kdapl_ib,ib_srp,ib_sdp,ib_cm,ib_umad,ib_uverbs,ib_ipoib,ib_sa,ib_mthca,ib_mad Any ideas ?
woody From surs at cse.ohio-state.edu Tue Sep 20 14:49:23 2005 From: surs at cse.ohio-state.edu (Sayantan Sur) Date: Tue, 20 Sep 2005 17:49:23 -0400 Subject: [openib-general] Firmware parameters Message-ID: <20050920214921.GA2799@cse.ohio-state.edu> Hi, I read the Wiki page (last updated 09/15) on burning firmware: https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet I couldn't find any information on how to set certain firmware parameters (like TPT values, Number of Outstanding Reads ... etc.) in that page. Another important question is given any HCA (with pre-installed firmware), how do I find out the parameter values on the card? I tried mstflint in the following manner, but wasn't able to get any useful information. I tried this on SuSE Linux 9.3, kernel 2.6.13-1(smp) with SVN revision 3433. [surs at ro0:mstflint] sudo ./mstflint -d 02:00.0 q Image type: FailSafe Chip rev.: A0 GUID Des: Node Port1 Port2 Sys image GUIDs: 0002c902004002e8 0002c902004002e9 0002c902004002ea 0002c902004002eb Board ID: (MT_0150000001) [surs at ro0:mstflint] sudo ./mstflint -d 02:00.0 dc *** ERROR *** Failed dumping FW configuration: Fw configuration section not found in the given image. Could someone please direct me to some means to achieve this? TIA, Sayantan. -- http://www.cse.ohio-state.edu/~surs From iod00d at hp.com Tue Sep 20 14:50:10 2005 From: iod00d at hp.com (Grant Grundler) Date: Tue, 20 Sep 2005 14:50:10 -0700 Subject: [openib-general] some questions: sdp/perftest/hyperthreading In-Reply-To: <20050920202811.728d5bd6@marvin.local> References: <20050920202811.728d5bd6@marvin.local> Message-ID: <20050920215010.GC24837@esmail.cup.hp.com> On Tue, Sep 20, 2005 at 08:28:11PM +0200, umaxx wrote: ... > 2. Is it better to test performance with disabled Hyperthreading? In general, yes. > perftest/rdma_lat is using get_cycles(). get_cycles() uses internal > the rdtscll() function. 
Correct me if I am wrong, but this could be a > problem with a hyperthreading enabled cpu? Because it's not clear which > cpu(-core) is used by rdtsll() at which time? The time stamp counters > between processors are not synchronized, and therefore the timing > measured will be incorrect. I don't know if this is a problem in practice. It potentially could be. > Maybe something like cpuid() is missing - to set/get the wanted? I suggest using taskset(1) to bind a test program to a logical CPU if it really is a problem. The tests have fairly short run times and it's not worth making them more complex to deal with this. Someone else might take a shot at answering the other two questions. hth, grant From halr at voltaire.com Tue Sep 20 14:53:25 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 17:53:25 -0400 Subject: [openib-general] [PATCH] mad_rmpp: Fix receive length calculation Message-ID: <1127253205.4426.80.camel@hal.voltaire.com> mad_rmpp: Fix receive length calculation Signed-off-by: Hal Rosenstock Index: mad_rmpp.c =================================================================== --- mad_rmpp.c (revision 3496) +++ mad_rmpp.c (working copy) @@ -407,13 +407,23 @@ static inline int get_mad_len(struct mad { struct ib_rmpp_mad *rmpp_mad; int hdr_size, data_size, pad; + int class_hdr_len = 0; + u8 mgmt_class; rmpp_mad = (struct ib_rmpp_mad *)rmpp_recv->cur_seg_buf->mad; + mgmt_class = rmpp_mad->mad_hdr.mgmt_class; - hdr_size = data_offset(rmpp_mad->mad_hdr.mgmt_class); + hdr_size = data_offset(mgmt_class); data_size = sizeof(struct ib_rmpp_mad) - hdr_size; pad = data_size - be32_to_cpu(rmpp_mad->rmpp_hdr.paylen_newwin); - if (pad > data_size || pad < 0) + /* Adjust pad by one class header size */ + if (mgmt_class == IB_MGMT_CLASS_SUBN_ADM) + class_hdr_len = sizeof(struct ib_sa_hdr); + else if ((mgmt_class >= IB_MGMT_CLASS_VENDOR_RANGE2_START) && + (mgmt_class <= IB_MGMT_CLASS_VENDOR_RANGE2_END)) + class_hdr_len = 4; + pad += class_hdr_len; + if
(pad > data_size + class_hdr_len || pad < 0) pad = 0; return hdr_size + rmpp_recv->seg_num * data_size - pad; From mshefty at ichips.intel.com Tue Sep 20 15:06:26 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 15:06:26 -0700 Subject: [openib-general] Re: [PATCH] mad_rmpp: Fix receive length calculation In-Reply-To: <1127253205.4426.80.camel@hal.voltaire.com> References: <1127253205.4426.80.camel@hal.voltaire.com> Message-ID: <433087E2.5060802@ichips.intel.com> Hal Rosenstock wrote: > mad_rmpp: Fix receive length calculation Hal, can you explain the problem with the current calculation? Why does pad need to be adjusted by one header size? - Sean From rolandd at cisco.com Tue Sep 20 15:08:10 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 15:08:10 -0700 Subject: [openib-general] [PATCH 01/10] IPoIB: fix module removal race In-Reply-To: <2005920158.2ndgBgW8RwzzVaAk@cisco.com> Message-ID: <2005920158.6GA97hjj1WYfaq3W@cisco.com> From: Michael S. Tsirkin Since ipoib uses queue_delayed_work to run flush task on port state events, it must flush scheduled work after unregistering the event handler. Signed-off-by: Michael S.
Tsirkin Signed-off-by: Roland Dreier --- drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) b21607256f3370b9eba48cd0b67e8686c6b51a64 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -1005,6 +1005,7 @@ debug_failed: register_failed: ib_unregister_event_handler(&priv->event_handler); + flush_scheduled_work(); event_failed: ipoib_dev_cleanup(priv->dev); @@ -1057,6 +1058,7 @@ static void ipoib_remove_one(struct ib_d list_for_each_entry_safe(priv, tmp, dev_list, list) { ib_unregister_event_handler(&priv->event_handler); + flush_scheduled_work(); unregister_netdev(priv->dev); ipoib_dev_cleanup(priv->dev); From rolandd at cisco.com Tue Sep 20 15:08:10 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 15:08:10 -0700 Subject: [openib-general] [git pull] InfiniBand fixes Message-ID: <2005920158.2ndgBgW8RwzzVaAk@cisco.com> Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git This tree is also available from kernel.org mirrors at: rsync://rsync.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git This will pull the following changes (patches also sent as replies to this email): Hal Rosenstock: IPoIB: Fix SA client retransmission strategy IB: Fix data length for RMPP SA sends Michael S. 
Tsirkin: IPoIB: fix module removal race IB/mthca: Fix device removal memory leak Roland Dreier: IB/mthca: assign ACK timeout field correctly IB/mthca: fix posting of first work request IB/mthca: Initialize eq->nent before we use it IB/mthca: Fix posting work requests to shared receive queues IB/mthca: Don't try to set srq->last for userspace SRQs IPoIB: Don't flush workqueue from within workqueue drivers/infiniband/core/user_mad.c | 5 +- drivers/infiniband/hw/mthca/mthca_eq.c | 16 ++------ drivers/infiniband/hw/mthca/mthca_qp.c | 51 +++++++++++------------- drivers/infiniband/hw/mthca/mthca_srq.c | 25 +++++------- drivers/infiniband/ulp/ipoib/ipoib.h | 2 - drivers/infiniband/ulp/ipoib/ipoib_ib.c | 4 +- drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 + drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 13 +++--- 8 files changed, 55 insertions(+), 63 deletions(-) From rolandd at cisco.com Tue Sep 20 15:08:10 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 15:08:10 -0700 Subject: [openib-general] [PATCH 02/10] IB/mthca: assign ACK timeout field correctly In-Reply-To: <2005920158.6GA97hjj1WYfaq3W@cisco.com> Message-ID: <2005920158.rJMu8Og0ayj9lKb3@cisco.com> The hardware reads the ACK timeout field from the most significant 5 bits of struct mthca_qp_path's ackto field, not the least significant bits. This fix has the driver put the timeout in the right place. Without this, we get a timeout that is 2^8 times too small. 
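The fix stores the 5-bit IB local ACK timeout in the most significant 5 bits of the 8-bit `ackto` byte, which is what `attr->timeout << 3` does. The following sketch shows the packing, what the HCA would decode from an unshifted value, and the IBA timeout formula (4.096 us * 2^timeout). The helper names are hypothetical, and it assumes, as the commit message says, that the hardware reads only the top 5 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 5-bit IB ACK timeout into the top 5 bits of the ackto byte,
 * as the fixed driver does with attr->timeout << 3. */
static uint8_t ackto_pack(uint8_t timeout)
{
	assert(timeout < 32); /* the IB timeout field is 5 bits wide */
	return (uint8_t)(timeout << 3);
}

/* What the HCA decodes: the most significant 5 bits of the byte. */
static uint8_t ackto_hw_read(uint8_t ackto)
{
	return ackto >> 3;
}

/* IBA local ACK timeout, 4.096 us * 2^timeout, expressed in ns. */
static uint64_t ack_timeout_ns(uint8_t timeout)
{
	return 4096ULL << timeout;
}
```

For example, a requested timeout of 14 (about 67 ms) round-trips correctly through `ackto_pack()`, but written unshifted, as before the fix, the HCA would decode it as 14 >> 3 = 1.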
Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_qp.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) 6fd9dccd77024ea85b65aa3e8f1cce22caa0d578 diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -687,7 +687,7 @@ int mthca_modify_qp(struct ib_qp *ibqp, } if (attr_mask & IB_QP_TIMEOUT) { - qp_context->pri_path.ackto = attr->timeout; + qp_context->pri_path.ackto = attr->timeout << 3; qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_ACK_TIMEOUT); } From rolandd at cisco.com Tue Sep 20 15:08:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 15:08:11 -0700 Subject: [openib-general] [PATCH 04/10] IPoIB: Fix SA client retransmission strategy In-Reply-To: <2005920158.ofLOVgpQOqw9UR3J@cisco.com> Message-ID: <2005920158.QCEMsU9kiixrkQEA@cisco.com> From: Hal Rosenstock We got a little mixed up with what the backoff member holds in the IPoIB multicast group structure: sometimes it was used as a number of seconds, and sometimes it was used as a number of jiffies. Fix the code so that backoff is always in seconds. 
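After this change, `backoff` is always kept in seconds and is converted to jiffies only at the point where work is queued (`mcast->backoff * HZ`). The following user-space sketch models that bookkeeping. `SKETCH_HZ`, the doubling on each retry, and the 16-second cap are assumptions made for illustration; this patch itself only shows the reset to 1 and the conversion.

```c
#include <assert.h>

/* Assumed tick rate; in the kernel HZ is a build-time constant. */
#define SKETCH_HZ 250
/* Hypothetical upper bound on the retry interval, in seconds. */
#define SKETCH_MAX_BACKOFF_SECONDS 16

struct mcast_sketch {
	unsigned int backoff; /* always seconds, never jiffies */
};

/* Reset after a successful join (mirrors mcast->backoff = 1). */
static void mcast_sketch_reset(struct mcast_sketch *m)
{
	m->backoff = 1;
}

/* Delay to hand to queue_delayed_work(): convert seconds to jiffies
 * only here, then grow the interval for the next failure. */
static unsigned long mcast_sketch_next_delay(struct mcast_sketch *m)
{
	unsigned long delay = (unsigned long)m->backoff * SKETCH_HZ;

	assert(m->backoff >= 1);
	m->backoff *= 2;
	if (m->backoff > SKETCH_MAX_BACKOFF_SECONDS)
		m->backoff = SKETCH_MAX_BACKOFF_SECONDS;
	return delay;
}
```

Keeping one unit in the structure and converting at a single call site avoids the earlier mix-up, where `HZ` was stored in a field that other code treated as seconds.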
Signed-off-by: Hal Rosenstock Signed-off-by: Roland Dreier --- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) d24ef0519e081774db6a1515ad8dadefd3fcd508 diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -145,7 +145,7 @@ static struct ipoib_mcast *ipoib_mcast_a mcast->dev = dev; mcast->created = jiffies; - mcast->backoff = HZ; + mcast->backoff = 1; mcast->logcount = 0; INIT_LIST_HEAD(&mcast->list); @@ -396,7 +396,7 @@ static void ipoib_mcast_join_complete(in IPOIB_GID_ARG(mcast->mcmember.mgid), status); if (!status && !ipoib_mcast_join_finish(mcast, mcmember)) { - mcast->backoff = HZ; + mcast->backoff = 1; down(&mcast_mutex); if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) queue_work(ipoib_workqueue, &priv->mcast_task); @@ -496,7 +496,7 @@ static void ipoib_mcast_join(struct net_ if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) queue_delayed_work(ipoib_workqueue, &priv->mcast_task, - mcast->backoff); + mcast->backoff * HZ); up(&mcast_mutex); } else mcast->query_id = ret; From rolandd at cisco.com Tue Sep 20 15:08:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 15:08:11 -0700 Subject: [openib-general] [PATCH 03/10] IB/mthca: fix posting of first work request In-Reply-To: <2005920158.rJMu8Og0ayj9lKb3@cisco.com> Message-ID: <2005920158.ofLOVgpQOqw9UR3J@cisco.com> Fix posting first WQE for mem-free HCAs: we need to link to previous WQE even in that case. While we're at it, simplify code for Tavor-mode HCAs. We don't really need the conditional test there either; we can similarly always link to the previous WQE. Based on Michael S. Tsirkin's analogous fix for userspace libmthca. 
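The idea behind the fix is to point `last` at the final slot of the ring when the queue is created (`qp->sq.last = get_send_wqe(qp, qp->sq.max - 1)`). That way, even the first posted WQE has a predecessor to link from, and the `if (prev_wqe)` test can go away. The following is a simplified user-space model. The structure and field names are hypothetical stand-ins for `struct mthca_next_seg`; doorbells, ownership bits, and full-queue checks are omitted.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_WQ_MAX 4

/* Hypothetical stand-in for the next-segment at the head of each WQE. */
struct wqe_sketch {
	int next_index; /* -1 until a later WQE links itself here */
};

struct wq_sketch {
	struct wqe_sketch ring[SKETCH_WQ_MAX];
	struct wqe_sketch *last; /* most recently posted WQE */
	unsigned int head;       /* count of posted WQEs; slot = head % max */
};

static void wq_sketch_init(struct wq_sketch *wq)
{
	for (size_t i = 0; i < SKETCH_WQ_MAX; ++i)
		wq->ring[i].next_index = -1;
	wq->head = 0;
	/* The slot before slot 0 in ring order, so the first post has a
	 * predecessor, like qp->sq.last = get_send_wqe(qp, max - 1). */
	wq->last = &wq->ring[SKETCH_WQ_MAX - 1];
}

/* Post one WQE and link the previous one to it unconditionally.
 * No full-queue check: this only models the linking step. */
static unsigned int wq_sketch_post(struct wq_sketch *wq)
{
	unsigned int ind = wq->head % SKETCH_WQ_MAX;

	wq->last->next_index = (int)ind; /* no "if (prev_wqe)" needed */
	wq->last = &wq->ring[ind];
	++wq->head;
	return ind;
}
```

The first post writes its link into slot `max - 1`. This mirrors why the mem-free path previously failed: with `last == NULL`, nothing linked to the first WQE.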
Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_qp.c | 48 ++++++++++++++----------------- drivers/infiniband/hw/mthca/mthca_srq.c | 14 ++++----- 2 files changed, 28 insertions(+), 34 deletions(-) bfb454be124d30e9bd140d9c6ef6aec549f4d293 diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -227,7 +227,6 @@ static void mthca_wq_init(struct mthca_w wq->last_comp = wq->max - 1; wq->head = 0; wq->tail = 0; - wq->last = NULL; } void mthca_qp_event(struct mthca_dev *dev, u32 qpn, @@ -1103,6 +1102,9 @@ static int mthca_alloc_qp_common(struct } } + qp->sq.last = get_send_wqe(qp, qp->sq.max - 1); + qp->rq.last = get_recv_wqe(qp, qp->rq.max - 1); + return 0; } @@ -1583,15 +1585,13 @@ int mthca_tavor_post_send(struct ib_qp * goto out; } - if (prev_wqe) { - ((struct mthca_next_seg *) prev_wqe)->nda_op = - cpu_to_be32(((ind << qp->sq.wqe_shift) + - qp->send_wqe_offset) | - mthca_opcode[wr->opcode]); - wmb(); - ((struct mthca_next_seg *) prev_wqe)->ee_nds = - cpu_to_be32((size0 ? 0 : MTHCA_NEXT_DBD) | size); - } + ((struct mthca_next_seg *) prev_wqe)->nda_op = + cpu_to_be32(((ind << qp->sq.wqe_shift) + + qp->send_wqe_offset) | + mthca_opcode[wr->opcode]); + wmb(); + ((struct mthca_next_seg *) prev_wqe)->ee_nds = + cpu_to_be32((size0 ? 
0 : MTHCA_NEXT_DBD) | size); if (!size0) { size0 = size; @@ -1688,13 +1688,11 @@ int mthca_tavor_post_receive(struct ib_q qp->wrid[ind] = wr->wr_id; - if (likely(prev_wqe)) { - ((struct mthca_next_seg *) prev_wqe)->nda_op = - cpu_to_be32((ind << qp->rq.wqe_shift) | 1); - wmb(); - ((struct mthca_next_seg *) prev_wqe)->ee_nds = - cpu_to_be32(MTHCA_NEXT_DBD | size); - } + ((struct mthca_next_seg *) prev_wqe)->nda_op = + cpu_to_be32((ind << qp->rq.wqe_shift) | 1); + wmb(); + ((struct mthca_next_seg *) prev_wqe)->ee_nds = + cpu_to_be32(MTHCA_NEXT_DBD | size); if (!size0) size0 = size; @@ -1905,15 +1903,13 @@ int mthca_arbel_post_send(struct ib_qp * goto out; } - if (likely(prev_wqe)) { - ((struct mthca_next_seg *) prev_wqe)->nda_op = - cpu_to_be32(((ind << qp->sq.wqe_shift) + - qp->send_wqe_offset) | - mthca_opcode[wr->opcode]); - wmb(); - ((struct mthca_next_seg *) prev_wqe)->ee_nds = - cpu_to_be32(MTHCA_NEXT_DBD | size); - } + ((struct mthca_next_seg *) prev_wqe)->nda_op = + cpu_to_be32(((ind << qp->sq.wqe_shift) + + qp->send_wqe_offset) | + mthca_opcode[wr->opcode]); + wmb(); + ((struct mthca_next_seg *) prev_wqe)->ee_nds = + cpu_to_be32(MTHCA_NEXT_DBD | size); if (!size0) { size0 = size; diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c --- a/drivers/infiniband/hw/mthca/mthca_srq.c +++ b/drivers/infiniband/hw/mthca/mthca_srq.c @@ -189,7 +189,6 @@ int mthca_alloc_srq(struct mthca_dev *de srq->max = attr->max_wr; srq->max_gs = attr->max_sge; - srq->last = NULL; srq->counter = 0; if (mthca_is_memfree(dev)) @@ -264,6 +263,7 @@ int mthca_alloc_srq(struct mthca_dev *de srq->first_free = 0; srq->last_free = srq->max - 1; + srq->last = get_wqe(srq, srq->max - 1); return 0; @@ -446,13 +446,11 @@ int mthca_tavor_post_srq_recv(struct ib_ ((struct mthca_data_seg *) wqe)->addr = 0; } - if (likely(prev_wqe)) { - ((struct mthca_next_seg *) prev_wqe)->nda_op = - cpu_to_be32((ind << srq->wqe_shift) | 1); - wmb(); - ((struct mthca_next_seg 
*) prev_wqe)->ee_nds = - cpu_to_be32(MTHCA_NEXT_DBD); - } + ((struct mthca_next_seg *) prev_wqe)->nda_op = + cpu_to_be32((ind << srq->wqe_shift) | 1); + wmb(); + ((struct mthca_next_seg *) prev_wqe)->ee_nds = + cpu_to_be32(MTHCA_NEXT_DBD); srq->wrid[ind] = wr->wr_id; srq->first_free = next_ind; From rolandd at cisco.com Tue Sep 20 15:08:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 15:08:11 -0700 Subject: [openib-general] [PATCH 06/10] IB/mthca: Fix posting work requests to shared receive queues In-Reply-To: <2005920158.G3atJmbH9pmjSQjI@cisco.com> Message-ID: <2005920158.fH5cjWXDbJWaw3V6@cisco.com> The error handling paths in mthca_tavor_post_srq_recv() and mthca_arbel_post_srq_recv() are quite bogus, the result of a screwed up merge. Fix them so they work as intended. Pointed out by Michael S. Tsirkin Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_srq.c | 10 ++++------ 1 files changed, 4 insertions(+), 6 deletions(-) b95e7b96cd976a5974fa2ae8f3a1af510cb8d4c9 diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c --- a/drivers/infiniband/hw/mthca/mthca_srq.c +++ b/drivers/infiniband/hw/mthca/mthca_srq.c @@ -409,7 +409,7 @@ int mthca_tavor_post_srq_recv(struct ib_ mthca_err(dev, "SRQ %06x full\n", srq->srqn); err = -ENOMEM; *bad_wr = wr; - return nreq; + break; } wqe = get_wqe(srq, ind); @@ -427,7 +427,7 @@ int mthca_tavor_post_srq_recv(struct ib_ err = -EINVAL; *bad_wr = wr; srq->last = prev_wqe; - return nreq; + break; } for (i = 0; i < wr->num_sge; ++i) { @@ -456,8 +456,6 @@ int mthca_tavor_post_srq_recv(struct ib_ srq->first_free = next_ind; } - return nreq; - if (likely(nreq)) { __be32 doorbell[2]; @@ -501,7 +499,7 @@ int mthca_arbel_post_srq_recv(struct ib_ mthca_err(dev, "SRQ %06x full\n", srq->srqn); err = -ENOMEM; *bad_wr = wr; - return nreq; + break; } wqe = get_wqe(srq, ind); @@ -517,7 +515,7 @@ int mthca_arbel_post_srq_recv(struct ib_ if (unlikely(wr->num_sge > 
srq->max_gs)) { err = -EINVAL; *bad_wr = wr; - return nreq; + break; } for (i = 0; i < wr->num_sge; ++i) { From rolandd at cisco.com Tue Sep 20 15:08:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 15:08:11 -0700 Subject: [openib-general] [PATCH 07/10] IB/mthca: Don't try to set srq->last for userspace SRQs In-Reply-To: <2005920158.fH5cjWXDbJWaw3V6@cisco.com> Message-ID: <2005920158.JalL9e5rI88RbY2o@cisco.com> Subject: [PATCH] IB/mthca: Don't try to set srq->last for userspace SRQs Userspace SRQs don't have a buffer allocated for them in the kernel, so it doesn't make sense to set srq->last during initialization. In fact, this can crash trying to follow a nonexistent buffer pointer. Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_srq.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) 6577ae51cf52f5fb0e4a85e673dd7bf2d0074e3e diff --git a/drivers/infiniband/hw/mthca/mthca_srq.c b/drivers/infiniband/hw/mthca/mthca_srq.c --- a/drivers/infiniband/hw/mthca/mthca_srq.c +++ b/drivers/infiniband/hw/mthca/mthca_srq.c @@ -172,6 +172,8 @@ static int mthca_alloc_srq_buf(struct mt scatter->lkey = cpu_to_be32(MTHCA_INVAL_LKEY); } + srq->last = get_wqe(srq, srq->max - 1); + return 0; } @@ -263,7 +265,6 @@ int mthca_alloc_srq(struct mthca_dev *de srq->first_free = 0; srq->last_free = srq->max - 1; - srq->last = get_wqe(srq, srq->max - 1); return 0; From rolandd at cisco.com Tue Sep 20 15:08:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 15:08:11 -0700 Subject: [openib-general] [PATCH 08/10] IB: Fix data length for RMPP SA sends In-Reply-To: <2005920158.JalL9e5rI88RbY2o@cisco.com> Message-ID: <2005920158.z5ALwijFYEKC7SAF@cisco.com> Subject: [PATCH] IB: Fix data length for RMPP SA sends From: Hal Rosenstock Date: 1127163061 -0700 We need to subtract off the header length from our payload length when sending multi-packet SA messages. 
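For an SA MAD sent with RMPP, `data_len` must count only the SA data that follows the headers, so it should be `length - hdr_len` with `hdr_len = offsetof(struct ib_sa_mad, data)`, not `length` itself. The following sketch uses a simplified byte-array layout that is assumed to match the standard 256-byte SA MAD: a 24-byte common MAD header, a 12-byte RMPP header, a 20-byte SA header, and 200 data bytes. It is not the kernel's struct definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified SA MAD layout (assumed sizes; byte arrays only, so there
 * is no padding): 24 + 12 + 20 header bytes, then 200 bytes of data. */
struct sa_mad_sketch {
	uint8_t mad_hdr[24];
	uint8_t rmpp_hdr[12];
	uint8_t sa_hdr[20];
	uint8_t data[200];
};

/* RMPP payload length for a user write of `length` bytes of SA MAD.
 * Before the fix the driver used `length` itself, which overcounts the
 * payload by hdr_len bytes. */
static size_t sa_rmpp_data_len(size_t length)
{
	size_t hdr_len = offsetof(struct sa_mad_sketch, data);

	assert(length >= hdr_len);
	return length - hdr_len;
}
```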
Signed-off-by: Hal Rosenstock Signed-off-by: Roland Dreier --- drivers/infiniband/core/user_mad.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) eff4c654b1a4a5e5493fbdc3affa6dd48765c085 diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -334,10 +334,11 @@ static ssize_t ib_umad_write(struct file ret = -EINVAL; goto err_ah; } - /* Validate that management class can support RMPP */ + + /* Validate that the management class can support RMPP */ if (rmpp_mad->mad_hdr.mgmt_class == IB_MGMT_CLASS_SUBN_ADM) { hdr_len = offsetof(struct ib_sa_mad, data); - data_len = length; + data_len = length - hdr_len; } else if ((rmpp_mad->mad_hdr.mgmt_class >= IB_MGMT_CLASS_VENDOR_RANGE2_START) && (rmpp_mad->mad_hdr.mgmt_class <= IB_MGMT_CLASS_VENDOR_RANGE2_END)) { hdr_len = offsetof(struct ib_vendor_mad, data); From rolandd at cisco.com Tue Sep 20 15:08:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 15:08:11 -0700 Subject: [openib-general] [PATCH 05/10] IB/mthca: Initialize eq->nent before we use it In-Reply-To: <2005920158.QCEMsU9kiixrkQEA@cisco.com> Message-ID: <2005920158.G3atJmbH9pmjSQjI@cisco.com> In mthca_create_eq(), we call get_eqe() before setting eq->nent. This is wrong, because get_eqe() uses eq->nent. Fix this, and clean up the code a little while we're at it. (We got lucky with the current code, because eq->nent was cleared to 0, which get_eqe() made happen to do the right thing) Pointed out by Michael S. 
Tsirkin Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_eq.c | 16 +++++----------- 1 files changed, 5 insertions(+), 11 deletions(-) 8b194835d2f28c793d25d2c41753af2b7ee29f31 diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c b/drivers/infiniband/hw/mthca/mthca_eq.c --- a/drivers/infiniband/hw/mthca/mthca_eq.c +++ b/drivers/infiniband/hw/mthca/mthca_eq.c @@ -476,12 +476,8 @@ static int __devinit mthca_create_eq(str int i; u8 status; - /* Make sure EQ size is aligned to a power of 2 size. */ - for (i = 1; i < nent; i <<= 1) - ; /* nothing */ - nent = i; - - eq->dev = dev; + eq->dev = dev; + eq->nent = roundup_pow_of_two(max(nent, 2)); eq->page_list = kmalloc(npages * sizeof *eq->page_list, GFP_KERNEL); @@ -512,7 +508,7 @@ static int __devinit mthca_create_eq(str memset(eq->page_list[i].buf, 0, PAGE_SIZE); } - for (i = 0; i < nent; ++i) + for (i = 0; i < eq->nent; ++i) set_eqe_hw(get_eqe(eq, i)); eq->eqn = mthca_alloc(&dev->eq_table.alloc); @@ -528,8 +524,6 @@ static int __devinit mthca_create_eq(str if (err) goto err_out_free_eq; - eq->nent = nent; - memset(eq_context, 0, sizeof *eq_context); eq_context->flags = cpu_to_be32(MTHCA_EQ_STATUS_OK | MTHCA_EQ_OWNER_HW | @@ -538,7 +532,7 @@ static int __devinit mthca_create_eq(str if (mthca_is_memfree(dev)) eq_context->flags |= cpu_to_be32(MTHCA_EQ_STATE_ARBEL); - eq_context->logsize_usrpage = cpu_to_be32((ffs(nent) - 1) << 24); + eq_context->logsize_usrpage = cpu_to_be32((ffs(eq->nent) - 1) << 24); if (mthca_is_memfree(dev)) { eq_context->arbel_pd = cpu_to_be32(dev->driver_pd.pd_num); } else { @@ -569,7 +563,7 @@ static int __devinit mthca_create_eq(str dev->eq_table.arm_mask |= eq->eqn_mask; mthca_dbg(dev, "Allocated EQ %d with %d entries\n", - eq->eqn, nent); + eq->eqn, eq->nent); return err; From rolandd at cisco.com Tue Sep 20 15:08:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 15:08:11 -0700 Subject: [openib-general] [PATCH 10/10] IB/mthca: Fix device removal memory leak 
In-Reply-To: <2005920158.3byIBzMT2aE3fDZQ@cisco.com> Message-ID: <2005920158.lu2Dy0AsB1k9T11l@cisco.com> Subject: [PATCH] IB/mthca: Fix device removal memory leak From: Michael S. Tsirkin Date: 1127238888 -0700 Clean up QP table array on device removal. Signed-off-by: Michael S. Tsirkin Signed-off-by: Roland Dreier --- drivers/infiniband/hw/mthca/mthca_qp.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) 71eea47d853bb0ce0c6befe11b3e08111263170f diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c --- a/drivers/infiniband/hw/mthca/mthca_qp.c +++ b/drivers/infiniband/hw/mthca/mthca_qp.c @@ -2123,5 +2123,6 @@ void __devexit mthca_cleanup_qp_table(st for (i = 0; i < 2; ++i) mthca_CONF_SPECIAL_QP(dev, i, 0, &status); + mthca_array_cleanup(&dev->qp_table.qp, dev->limits.num_qps); mthca_alloc_cleanup(&dev->qp_table.alloc); } From rolandd at cisco.com Tue Sep 20 15:08:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 15:08:11 -0700 Subject: [openib-general] [PATCH 09/10] IPoIB: Don't flush workqueue from within workqueue In-Reply-To: <2005920158.z5ALwijFYEKC7SAF@cisco.com> Message-ID: <2005920158.3byIBzMT2aE3fDZQ@cisco.com> Subject: [PATCH] IPoIB: Don't flush workqueue from within workqueue ipoib_mcast_restart_task() is always called from within the single-threaded IPoIB workqueue, so flushing the workqueue from within the function can lead to a recursion overflow. But since we're running in a single-threaded workqueue, we're already synchronized against other items in the workqueue, so just get rid of the flush in ipoib_mcast_restart_task(). 
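The reasoning behind this patch can be modelled with a toy single-threaded queue. The sketch below is a userspace approximation, not the kernel workqueue API. `struct toy_wq`, `toy_flush()`, `toy_stop_thread()` and `toy_restart_task()` are hypothetical names. The model counts queued and completed items. Because there is only one worker, an item that requests a flush while it is still running can never observe `completed == queued`, so the model reports that case instead of recursing or waiting. `toy_stop_thread()` mirrors the new `int flush` argument of `ipoib_mcast_stop_thread()`. Callers outside the workqueue pass 1. The restart task already runs on the queue and is serialized against the other items, so it passes 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a single-threaded workqueue; NOT the kernel API. */
struct toy_wq {
	int queued;      /* items ever queued */
	int completed;   /* items that have finished running */
	bool in_worker;  /* true while the single worker runs an item */
};

/* A flush must wait until completed == queued.  Called from inside the
 * worker, the current item has not completed yet, so the flush cannot
 * finish normally; the model reports this case with -1. */
static int toy_flush(struct toy_wq *wq)
{
	assert(wq->completed <= wq->queued);
	if (wq->in_worker && wq->completed < wq->queued)
		return -1;
	return 0;
}

/* Mirrors the patched ipoib_mcast_stop_thread(dev, flush). */
static int toy_stop_thread(struct toy_wq *wq, int flush)
{
	/* ... cancel pending delayed work here ... */
	if (flush)
		return toy_flush(wq);
	return 0;
}

/* Simulates ipoib_mcast_restart_task() running as a work item. */
static int toy_restart_task(struct toy_wq *wq, int flush)
{
	int ret;

	wq->queued++;
	wq->in_worker = true;
	ret = toy_stop_thread(wq, flush);
	wq->in_worker = false;
	wq->completed++;
	return ret;
}
```

In this model, toy_restart_task(&wq, 1) returns -1, which is the self-flush case. toy_restart_task(&wq, 0) returns 0, and a flush issued from outside the worker afterwards succeeds.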
Signed-off-by: Roland Dreier --- drivers/infiniband/ulp/ipoib/ipoib.h | 2 +- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 4 ++-- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 7 ++++--- 3 files changed, 7 insertions(+), 6 deletions(-) 8d2cae0651502028bf64844508ab18528bbd65c2 diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -257,7 +257,7 @@ void ipoib_mcast_send(struct net_device void ipoib_mcast_restart_task(void *dev_ptr); int ipoib_mcast_start_thread(struct net_device *dev); -int ipoib_mcast_stop_thread(struct net_device *dev); +int ipoib_mcast_stop_thread(struct net_device *dev, int flush); void ipoib_mcast_dev_down(struct net_device *dev); void ipoib_mcast_dev_flush(struct net_device *dev); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -432,7 +432,7 @@ int ipoib_ib_dev_down(struct net_device flush_workqueue(ipoib_workqueue); } - ipoib_mcast_stop_thread(dev); + ipoib_mcast_stop_thread(dev, 1); /* * Flush the multicast groups first so we stop any multicast joins. 
The @@ -599,7 +599,7 @@ void ipoib_ib_dev_cleanup(struct net_dev ipoib_dbg(priv, "cleaning up ib_dev\n"); - ipoib_mcast_stop_thread(dev); + ipoib_mcast_stop_thread(dev, 1); /* Delete the broadcast address and the local address */ ipoib_mcast_dev_down(dev); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -598,7 +598,7 @@ int ipoib_mcast_start_thread(struct net_ return 0; } -int ipoib_mcast_stop_thread(struct net_device *dev) +int ipoib_mcast_stop_thread(struct net_device *dev, int flush) { struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_mcast *mcast; @@ -610,7 +610,8 @@ int ipoib_mcast_stop_thread(struct net_d cancel_delayed_work(&priv->mcast_task); up(&mcast_mutex); - flush_workqueue(ipoib_workqueue); + if (flush) + flush_workqueue(ipoib_workqueue); if (priv->broadcast && priv->broadcast->query) { ib_sa_cancel_query(priv->broadcast->query_id, priv->broadcast->query); @@ -832,7 +833,7 @@ void ipoib_mcast_restart_task(void *dev_ ipoib_dbg_mcast(priv, "restarting multicast task\n"); - ipoib_mcast_stop_thread(dev); + ipoib_mcast_stop_thread(dev, 0); spin_lock_irqsave(&priv->lock, flags); From robert.j.woodruff at intel.com Tue Sep 20 15:09:10 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Tue, 20 Sep 2005 15:09:10 -0700 Subject: [openib-general] [PATCH] iSER - changes in API, socket-based connect In-Reply-To: Message-ID: Woody wrote, >iser:dat_ia_open failed ret=-16 >iser:dat_ia_open failed ret=-16 >iser:initializing iser failed! >iser:initializing iser global structures failed! >Is there something else I need to do other than load kdapl.ko and >kdapl_ib.ko ? Never mind. I read the README file and discovered that I need to ifconfig ipoib before loading iser or it fails with the above error. It might be nice for iSer to not require this in the future. 
It now appears to load, but I cannot test it any further since I do not have an iSer target. If you could at least code review the patches I sent earlier, that would be great. If they look OK, I will include them in the next set of 2.6.9 backport patches I put into SVN and perhaps you can test them with your iSer target. woody From halr at voltaire.com Tue Sep 20 15:25:48 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 18:25:48 -0400 Subject: [openib-general] Re: [PATCH] mad_rmpp: Fix receive length calculation In-Reply-To: <433087E2.5060802@ichips.intel.com> References: <1127253205.4426.80.camel@hal.voltaire.com> <433087E2.5060802@ichips.intel.com> Message-ID: <1127255148.4426.88.camel@hal.voltaire.com> Hi Sean, On Tue, 2005-09-20 at 18:06, Sean Hefty wrote: > Hal Rosenstock wrote: > > mad_rmpp: Fix receive length calculation > > Hal, can you explain problem with the current calculation? Why does pad need to > be adjusted by one header size? Because the data size comes out to be the class specific data size, but the payload length in the last segment includes the class header as well as the filled in data in that last segment. The case that Eitan documented where it wasn't off by 20 (SA class header size) was due to the fact that the pad went negative because the payload length was larger than 200 (the SA class specific data). Make sense now ? -- Hal From mshefty at ichips.intel.com Tue Sep 20 15:42:13 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 15:42:13 -0700 Subject: [openib-general] Re: [PATCH] mad_rmpp: Fix receive length calculation In-Reply-To: <1127255148.4426.88.camel@hal.voltaire.com> References: <1127253205.4426.80.camel@hal.voltaire.com> <433087E2.5060802@ichips.intel.com> <1127255148.4426.88.camel@hal.voltaire.com> Message-ID: <43309045.90604@ichips.intel.com> Hal Rosenstock wrote: >>>mad_rmpp: Fix receive length calculation >> >>Hal, can you explain problem with the current calculation? 
Why does pad need to >>be adjusted by one header size? > > > Because the data size comes out to be the class specific data size, but > the payload length in the last segment includes the class header as well > as the filled in data in that last segment. Ah - got it now. Thanks. What about using a different calculation to determine the pad rather than subtracting the hdr_size (which includes the class specific header) and then adding the size of the class specific header back in? Can we just do something like sizeof MAD - payload? - Sean From halr at voltaire.com Tue Sep 20 15:47:09 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 18:47:09 -0400 Subject: [openib-general] Re: [PATCH] mad_rmpp: Fix receive length calculation In-Reply-To: <43309045.90604@ichips.intel.com> References: <1127253205.4426.80.camel@hal.voltaire.com> <433087E2.5060802@ichips.intel.com> <1127255148.4426.88.camel@hal.voltaire.com> <43309045.90604@ichips.intel.com> Message-ID: <1127256428.4426.96.camel@hal.voltaire.com> On Tue, 2005-09-20 at 18:42, Sean Hefty wrote: > Ah - got it now. Thanks. What about using a different calculation to determine > the pad rather than subtracting the hdr_size (which includes the class specific > header) and then adding the size of the class specific header back in? Right, a little convoluted... > Can we just do something like sizeof MAD - payload? Close but that wouldn't work. You're right that the payload length could be used to simplify this. I think the correct calculation is: IB_MGMT_RMPP_DATA - payload length If you like this better, I'll rework the patch. -- Hal From sean.hefty at intel.com Tue Sep 20 15:55:05 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 15:55:05 -0700 Subject: [openib-general][PATCH][RFC]: CMA header In-Reply-To: <432FDAEF.9080901@voltaire.com> Message-ID: Here's a modified version of the header. Notable changes are: * Functions have been renamed from ib_blah to rdma_blah.
* Creation and destruction of the cma_id are explicit. * A single event handler is defined. * Routing information is retrieved via a separate call. * Listeners must now specify a backlog. * The get device routine has been removed. * Get source IP is no longer needed and has been removed. Functionality that could be added could include changing the route associated with a cma_id and retrieving all IP addresses for a device/port. - Sean /* * Copyright (c) 2005 Voltaire Inc. All rights reserved. * Copyright (c) 2005 Intel Corporation. All rights reserved. * * This Software is licensed under one of the following licenses: * * 1) under the terms of the "Common Public License 1.0" a copy of which is * available from the Open Source Initiative, see * http://www.opensource.org/licenses/cpl.php. * * 2) under the terms of the "The BSD License" a copy of which is * available from the Open Source Initiative, see * http://www.opensource.org/licenses/bsd-license.php. * * 3) under the terms of the "GNU General Public License (GPL) Version 2" a * copy of which is available from the Open Source Initiative, see * http://www.opensource.org/licenses/gpl-license.php. * * Licensee has the right to choose one of the above licenses. * * Redistributions of source code must retain the above copyright * notice and one of the license notices. * * Redistributions in binary form must reproduce both the above copyright * notice, one of the license notices in the documentation * and/or other materials provided with the distribution. * */ #if !defined(RDMA_CMA_H) #define RDMA_CMA_H #include #include #include enum rdma_cma_event_type { RDMA_CMA_EVENT_ROUTE_FOUND, RDMA_CMA_EVENT_ESTABLISHED, RDMA_CMA_EVENT_REJECTED, RDMA_CMA_EVENT_DISCONNECTED, RDMA_CMA_EVENT_UNREACHABLE }; /* How to use? */ enum rdma_qos { RDMA_QOS_BEST_EFFORT = 0, RDMA_QOS_HIGH_THROUGHPUT = (1 << 0), RDMA_QOS_LOW_LATENCY = (1 << 1), RDMA_QOS_ECONOMY = (1 << 2), RDMA_QOS_PREMIUM = (1 << 3) }; /* How to use? 
*/ enum rdma_connect_flags { RDMA_CONNECT_DEFAULT_FLAG = 0x00, RDMA_CONNECT_MULTIPATH_FLAG = 0x01 }; struct rdma_route { struct sockaddr src_ip; struct sockaddr dest_ip; }; struct rdma_cma_event { enum rdma_cma_event_type event; void *private_data; }; struct rdma_cma_id; typedef void (*rdma_cma_event_handler)(struct rdma_cma_id *cma_id, struct rdma_cma_event event); struct rdma_cma_id { struct ib_device *device; void *context; struct ib_qp *qp; rdma_cma_event_handler event_handler; struct rdma_route *route; }; struct rdma_cma_id* rdma_cma_create_id(struct ib_device *device, void *context, rdma_cma_event_handler event_handler); void rdma_cma_destroy_id(struct rdma_cma_id *cma_id); /** * rdma_cma_listen - this function is called by the passive side. It is * listening on the specified port for incoming connection requests. */ int rdma_cma_listen(struct rdma_cma_id *cma_id, struct sockaddr *address, int backlog); int rdma_cma_get_route(struct rdma_cma_id *cma_id, struct sockaddr *src_ip, struct sockaddr *dest_ip); struct rdma_cma_conn_param { struct ib_qp *qp; struct ib_qp_attr *qp_attr; /* Need to clarify what's needed */ const void *private_data; u8 private_data_len; enum rdma_qos qos; /* ? */ enum rdma_connect_flags connect_flags; /* ? */ }; /** * rdma_cma_connect - this is the connect request function, called by * the active side.
The consumer registers an upcall that will be * initiated by the cma with an appropriate connection event * notification (established/rejected/disconnected etc) * @conn_param: This structure contains the following connection parameters: * @qp: qp for establishing the connection * @qp_attr: only relevant attributes are used * @private_data: private data to be received at the listener upcall * @private_data_len: private data length * @qos: Quality of service for the rc * @connect_flags: default or multipath connection */ int rdma_cma_connect(struct rdma_cma_id *cma_id, struct rdma_cma_conn_param *conn_param); /** * rdma_cma_accept - call on the passive side to accept a connection request * note that if the function returned with error - a reject message was * sent to the remote side and the cma_id was destroyed * @cma_id: pass the handle that was returned in cma_listen callback for * this connection * @qp: the connection's qp * @private_data: private data to send back to the initiator * @private_data_len: private data length */ int rdma_cma_accept(struct rdma_cma_id *cma_id, struct ib_qp *qp, const void *private_data, u8 private_data_len); /** * rdma_cma_reject - call on the passive side to reject a connection request. * This call destroys the cma_id, hence when the active side accepts * the reject the cma_id is already destroyed. * @cma_id: this handle was accepted in cma_listen callback * @private_data: private data to send back to the initiator * @private_data_len: private data length */ int rdma_cma_reject(struct rdma_cma_id *cma_id, const void *private_data, u8 private_data_len); /** * rdma_cma_disconnect - this function disconnects the associated QP. 
*/ int rdma_cma_disconnect(struct rdma_cma_id *cma_id); #endif /* RDMA_CMA_H */ From mshefty at ichips.intel.com Tue Sep 20 15:56:44 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 15:56:44 -0700 Subject: [openib-general] Re: [PATCH] mad_rmpp: Fix receive length calculation In-Reply-To: <1127256428.4426.96.camel@hal.voltaire.com> References: <1127253205.4426.80.camel@hal.voltaire.com> <433087E2.5060802@ichips.intel.com> <1127255148.4426.88.camel@hal.voltaire.com> <43309045.90604@ichips.intel.com> <1127256428.4426.96.camel@hal.voltaire.com> Message-ID: <433093AC.30207@ichips.intel.com> Hal Rosenstock wrote: > Close but that wouldn't work. You're right that the payload length could > be used to simplify this. I think the correct calculation is: > > IB_MGMT_RMPP_DATA - payload length > > If you like this better, I'll rework the patch. I think having a simpler calculation would be better. Thanks for looking at this. - Sean From halr at voltaire.com Tue Sep 20 16:00:50 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 19:00:50 -0400 Subject: [openib-general] [PATCHv2] mad_rmpp: Fix receive length calculation Message-ID: <1127257250.4426.107.camel@hal.voltaire.com> mad_rmpp: Fix receive length calculation Based on simplification idea from Sean Hefty Signed-off-by: Hal Rosenstock Index: mad_rmpp.c =================================================================== --- mad_rmpp.c (revision 3496) +++ mad_rmpp.c (working copy) @@ -412,8 +412,8 @@ static inline int get_mad_len(struct mad hdr_size = data_offset(rmpp_mad->mad_hdr.mgmt_class); data_size = sizeof(struct ib_rmpp_mad) - hdr_size; - pad = data_size - be32_to_cpu(rmpp_mad->rmpp_hdr.paylen_newwin); - if (pad > data_size || pad < 0) + pad = IB_MGMT_RMPP_DATA - be32_to_cpu(rmpp_mad->rmpp_hdr.paylen_newwin); + if (pad > IB_MGMT_RMPP_DATA || pad < 0) pad = 0; return hdr_size + rmpp_recv->seg_num * data_size - pad; From halr at voltaire.com Tue Sep 20 16:06:17 2005 From: halr at 
voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 19:06:17 -0400 Subject: [openib-general] grmpp and RMPP fixes Message-ID: <1127257576.4426.110.camel@hal.voltaire.com> Hi Sean, BTW, I haven't looked at (or run) grmpp to see if it is impacted by any of the RMPP changes. -- Hal From mshefty at ichips.intel.com Tue Sep 20 16:14:49 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 16:14:49 -0700 Subject: [openib-general] Re: grmpp and RMPP fixes In-Reply-To: <1127257576.4426.110.camel@hal.voltaire.com> References: <1127257576.4426.110.camel@hal.voltaire.com> Message-ID: <433097E9.3020407@ichips.intel.com> Hal Rosenstock wrote: > Hi Sean, > > BTW, I haven't looked at (or run) grmpp to see if it is impacted by any > of the RMPP changes. Thanks for letting me know. I'll run it and fix anything if it runs into any issues. I expect that it'll work fine. - Sean From mshefty at ichips.intel.com Tue Sep 20 16:18:25 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 16:18:25 -0700 Subject: [openib-general] Re: [PATCHv2] mad_rmpp: Fix receive length calculation In-Reply-To: <1127257250.4426.107.camel@hal.voltaire.com> References: <1127257250.4426.107.camel@hal.voltaire.com> Message-ID: <433098C1.5050608@ichips.intel.com> Hal Rosenstock wrote: > mad_rmpp: Fix receive length calculation Thanks - applied. - Sean From ardavis at ichips.intel.com Tue Sep 20 16:22:40 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Tue, 20 Sep 2005 16:22:40 -0700 Subject: [openib-general] QP with large starting sequence adds latency to RDMA READ??? Message-ID: <433099C0.1070408@ichips.intel.com> I just noticed some RDMA read performance issues that seem to be related to the QP starting sequence number. If I set the starting sequence to 1 then all is fine but if I set it to 0x10000 then it seems to add ~40us to my 32KB RDMA read operation (polling for completions). Has anyone seen anything like this? 
I am running a PCI-e HCA adapter in a EM64T server. RDMA writes and message sends seem to be unaffected. -arlin From halr at voltaire.com Tue Sep 20 16:22:42 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 19:22:42 -0400 Subject: [openib-general] Re: [PATCHv2] mad_rmpp: Fix receive length calculation Message-ID: <1127258561.4426.121.camel@hal.voltaire.com> Hi Roland, Can this also be queued for the next git pull ? It would be preferable if this also could go into 2.6.14. Thanks. -- Hal -----Forwarded Message----- From: Sean Hefty To: Hal Rosenstock Cc: openib-general at openib.org Subject: Re: [PATCHv2] mad_rmpp: Fix receive length calculation Date: 20 Sep 2005 16:18:25 -0700 Hal Rosenstock wrote: > mad_rmpp: Fix receive length calculation Thanks - applied. - Sean From halr at voltaire.com Tue Sep 20 16:26:13 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 19:26:13 -0400 Subject: [openib-general] QP with large starting sequence adds latency to RDMA READ??? In-Reply-To: <433099C0.1070408@ichips.intel.com> References: <433099C0.1070408@ichips.intel.com> Message-ID: <1127258772.4426.123.camel@hal.voltaire.com> On Tue, 2005-09-20 at 19:22, Arlin Davis wrote: > I just noticed some RDMA read performance issues that seem to be related > to the QP starting sequence number. If I set the starting sequence to 1 > then all is fine but if I set it to 0x10000 then it seems to add ~40us > to my 32KB RDMA read operation (polling for completions). Has anyone > seen anything like this? > > I am running a PCI-e HCA adapter in a EM64T server. > > RDMA writes and message sends seem to be unaffected. What application/ULP are you running ? -- Hal From Don.Dhondt at Bull.com Tue Sep 20 16:32:56 2005 From: Don.Dhondt at Bull.com (Don.Dhondt at Bull.com) Date: Tue, 20 Sep 2005 16:32:56 -0700 Subject: [openib-general] Problem Building OpenIB on 2.6.12 kernel Message-ID: Our kernel is 2.6.12 and we try to build the GEN2 drivers. 
The gen2 "linux-kernel" trunk was downloaded yesterday... don't remember the number. When we add IPoIB, we get: drivers/infiniband/ulp/ipoib/ipoib_main.c:478: error: conflicting types for 'path_lookup' include/linux/namei.h:103: error: previous declaration of 'path_lookup' was here drivers/infiniband/ulp/ipoib/ipoib_main.c:478: error: conflicting types for 'path_lookup' include/linux/namei.h:103: error: previous declaration of 'path_lookup' was here make[3]: *** [drivers/infiniband/ulp/ipoib/ipoib_main.o] Error 1 make[2]: *** [drivers/infiniband/ulp/ipoib] Error 2 make[1]: *** [drivers/infiniband] Error 2 make: *** [drivers] Error 2 ipoib_main.c:478: static void path_lookup(struct sk_buff *skb, struct net_device *dev) include/linux/namei.h:103: extern int FASTCALL(path_lookup(const char *, unsigned, struct nameidata *)); Any ideas on what we are doing wrong? Also, will the GEN2 stack work with a 64K page size kernel? If not, is there a plan to make it work? Thanks for any help. Don Dhondt -------------- next part -------------- An HTML attachment was scrubbed... URL: From rolandd at cisco.com Tue Sep 20 16:33:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 16:33:11 -0700 Subject: [openib-general] Re: [PATCHv2] mad_rmpp: Fix receive length calculation In-Reply-To: <1127258561.4426.121.camel@hal.voltaire.com> (Hal Rosenstock's message of "20 Sep 2005 19:22:42 -0400") References: <1127258561.4426.121.camel@hal.voltaire.com> Message-ID: <5264svnwlk.fsf@cisco.com> Hal> Hi Roland, Can this also be queued for the next git pull ? It Hal> would be preferable if this also could go into 2.6.14. Sure, I'll add that into my git tree once Linus responds to my latest pull request. - R. 
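The receive-length calculation from the applied PATCHv2 can be checked with a userspace copy of get_mad_len()'s arithmetic. The constants are assumptions based on the kernel's ib_mad.h from this period. They assume a 256-byte RMPP MAD and 36 bytes of MAD plus RMPP header, which makes IB_MGMT_RMPP_DATA 220. They also assume a 56-byte SA header, made up of those 36 bytes plus the 20-byte SA class header mentioned in the thread. `mad_len_old()` and `mad_len_new()` are hypothetical helpers. They take `hdr_size`, the segment count and the last segment's payload length as arguments instead of reading them from the MAD.

```c
#include <assert.h>

/* Sizes assumed from the kernel's ib_mad.h of this period. */
#define RMPP_MAD_SIZE      256  /* sizeof(struct ib_rmpp_mad) */
#define IB_MGMT_RMPP_DATA  220  /* 256 - 36-byte MAD + RMPP header */
#define IB_MGMT_SA_HDR      56  /* 36 bytes + 20-byte SA class header */

/* Pre-patch pad: measured against the class-specific data size, so it
 * is off by the class header that paylen also counts. */
static int mad_len_old(int hdr_size, int seg_num, int paylen)
{
	int data_size = RMPP_MAD_SIZE - hdr_size;
	int pad = data_size - paylen;

	assert(seg_num >= 1);
	if (pad > data_size || pad < 0)
		pad = 0;
	return hdr_size + seg_num * data_size - pad;
}

/* PATCHv2 pad: measured against IB_MGMT_RMPP_DATA, which, like paylen,
 * includes the class header. */
static int mad_len_new(int hdr_size, int seg_num, int paylen)
{
	int data_size = RMPP_MAD_SIZE - hdr_size;
	int pad = IB_MGMT_RMPP_DATA - paylen;

	assert(seg_num >= 1);
	if (pad > IB_MGMT_RMPP_DATA || pad < 0)
		pad = 0;
	return hdr_size + seg_num * data_size - pad;
}
```

Consider a three-segment SA response whose last segment carries 50 bytes of SA data (paylen = 20 + 50). mad_len_new() gives 56 + 3 * 200 - 150 = 506, which is exactly the header plus 450 data bytes. mad_len_old() gives 526, the 20-byte error Hal described. When the last segment is full (paylen = 220), the old pad goes negative and is clamped to 0, so both versions agree. This matches the case Eitan documented.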
From vuhuong at mellanox.com Tue Sep 20 16:46:07 2005 From: vuhuong at mellanox.com (Vu Pham) Date: Tue, 20 Sep 2005 16:46:07 -0700 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, Message-ID: <43309F3F.3060009@mellanox.com> Hi Roland, Please review the following srp patch: + support more than default 8 luns per target. Should we have max_luns as module param? How about cmds_per_lun, max_sectors, max_targets as module params as well + fix the srp_free_iu free NULL pointer - due to srp_connect_target fail with target->state other than PORT(DLID)_REDIRECT + fix the bug of reuse the iu while it's still in_use + support FMR - srp_map_fmr (if map_fmr failed then fall back to normal indirect mode using global r_key) I tested this patch with Mellanox IB storage Signed-off-by: Vu Pham -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: srp.patch URL: From mshefty at ichips.intel.com Tue Sep 20 16:39:13 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 16:39:13 -0700 Subject: [openib-general] QP with large starting sequence adds latency to RDMA READ??? In-Reply-To: <1127258772.4426.123.camel@hal.voltaire.com> References: <433099C0.1070408@ichips.intel.com> <1127258772.4426.123.camel@hal.voltaire.com> Message-ID: <43309DA1.1060703@ichips.intel.com> Hal Rosenstock wrote: > On Tue, 2005-09-20 at 19:22, Arlin Davis wrote: > >>I just noticed some RDMA read performance issues that seem to be related >>to the QP starting sequence number. If I set the starting sequence to 1 >>then all is fine but if I set it to 0x10000 then it seems to add ~40us >>to my 32KB RDMA read operation (polling for completions). Has anyone >>seen anything like this? >> >>I am running a PCI-e HCA adapter in a EM64T server. >> >>RDMA writes and message sends seem to be unaffected. > > > What application/ULP are you running ? 
He's running a modified version of dtest over DAPL that times RDMA reads from the post time to poll CQ completion. The test setup is actually a little more complex. The latencies for the runs with the various PSNs were: 1 - 60us 0x10000 - 60us (first run) 0x10000 - 100us (subsequent runs) 0x20000 - 100us (first run) 0x20000 - 140us (subsequent runs) 0x40000 - 140us (first run) 0x40000 - 220us (subsequent runs) The application halts and is re-started between each run, and the QP is different each time. The only change from run to run is the PSN. - Sean From rolandd at cisco.com Tue Sep 20 16:42:38 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 16:42:38 -0700 Subject: [openib-general] Problem Building OpenIB on 2.6.12 kernel In-Reply-To: (Don Dhondt's message of "Tue, 20 Sep 2005 16:32:56 -0700") References: Message-ID: <521x3jnw5t.fsf@cisco.com> Don> Our kernel is 2.6.12 and we try to build the GEN2 drivers. Don> The gen2 "linux-kernel" trunk was downloaded Don> yesterday... don't remember the number. You may run into other problems, because now that 2.6.13 is out, the OpenIB subversion tree will support 2.6.13 instead of 2.6.12. Don> When we add IPoIB, we get: Don> drivers/infiniband/ulp/ipoib/ipoib_main.c:478: error: Don> conflicting types for 'path_lookup' Don> include/linux/namei.h:103: error: previous declaration of Don> 'path_lookup' was here Strange, I've never seen that error. Can you figure out how is getting included while building ipoib_main.c? Don> Also, will the GEN2 stack work with a 64K page size kernel? Don> If not, is there a plan to make it work? I don't think anyone has tested it, but if you have new enough HCA firmware (>= 3.3.3 for PCI-X HCAs), it should work. If it doesn't work I'd be interested in hearing about it. - R. 
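The `conflicting types for 'path_lookup'` error reported above is a name clash. ipoib_main.c defines a file-local `static void path_lookup(struct sk_buff *, struct net_device *)`, while <linux/namei.h> declares the VFS function `int path_lookup(const char *, unsigned, struct nameidata *)`. In C, function names share a single file-scope namespace. Once namei.h ends up in the include chain for ipoib_main.c, the two declarations therefore collide. One common way to avoid such a clash is to give the driver's helper a driver-specific prefix. The sketch below illustrates that approach as an assumption, not as the fix the OpenIB tree adopted. The stand-in struct layouts and `ipoib_path_lookup_sketch()` are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel types involved. */
struct nameidata;
struct sk_buff { int len; };
struct net_device { int lookups; };

/* Same shape as the VFS declaration in <linux/namei.h> (FASTCALL
 * omitted).  A static function named path_lookup with a different
 * signature in this file would fail with "conflicting types". */
extern int path_lookup(const char *name, unsigned flags,
		       struct nameidata *nd);

/* Driver-prefixed helper: no clash with the extern declaration. */
static void ipoib_path_lookup_sketch(struct sk_buff *skb,
				     struct net_device *dev)
{
	assert(skb != 0 && dev != 0);
	dev->lookups++; /* placeholder for the real path resolution */
}
```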
From rolandd at cisco.com Tue Sep 20 16:52:54 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 16:52:54 -0700 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, In-Reply-To: <43309F3F.3060009@mellanox.com> (Vu Pham's message of "Tue, 20 Sep 2005 16:46:07 -0700") References: <43309F3F.3060009@mellanox.com> Message-ID: <52wtlbmh49.fsf@cisco.com> Thanks, I haven't read all the FMR stuff through yet, but a few quick comments: > + support more than default 8 luns per target. Should we have max_luns > as module param? How about cmds_per_lun, max_sectors, max_targets as > module params as well I think it makes more sense to handle this the same way I handled max_sectors: make it a per-target parameter passed in when connecting to the target. We could make cmds_per_lun a similar parameter, but are there likely to be any SRP targets that need this to be limited? Also, what is max_targets? > + fix the srp_free_iu free NULL pointer - due to srp_connect_target fail > with target->state other than PORT(DLID)_REDIRECT Yes, good catch. > + fix the bug of reuse the iu while it's still in_use I think I see the bug: a send may complete and have its IU recycled before the corresponding command is completed, and end up screwing things up. Is this right? If so I would prefer to fix things in a slightly different way. Rather than a TX ring, we should just keep a list of free IUs ready to send and only add IUs to the end of the list when we're really done with the IU. > + support FMR - srp_map_fmr (if map_fmr failed then fall back to normal > indirect mode using global r_key) This is good to have. I still need to read the code. - R. 
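Roland's alternative to the TX ring can be sketched as a FIFO of free IUs. An IU is appended to the tail only after both the send completion and the SRP response for its command have been seen. This closes the window described above, where a completed send lets the IU be reused while its command is still outstanding. All names here are hypothetical (`struct iu_sketch`, `iu_get()`, `iu_put()`). The reference count of 2 is an assumption about which two events must occur. Locking is omitted, although the driver would need to hold its host lock around these operations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IU; the real ib_srp structure differs. */
struct iu_sketch {
	struct iu_sketch *next;
	int refs;  /* events still outstanding before the IU may be reused */
};

/* FIFO of IUs that are free to post. */
struct iu_free_list {
	struct iu_sketch *head, *tail;
};

static void iu_list_add_tail(struct iu_free_list *l, struct iu_sketch *iu)
{
	iu->next = NULL;
	if (l->tail)
		l->tail->next = iu;
	else
		l->head = iu;
	l->tail = iu;
}

/* Take a free IU for a new command; NULL if none are free. */
static struct iu_sketch *iu_get(struct iu_free_list *l)
{
	struct iu_sketch *iu = l->head;

	if (!iu)
		return NULL;
	l->head = iu->next;
	if (!l->head)
		l->tail = NULL;
	/* Reuse requires both the send completion and the response. */
	iu->refs = 2;
	return iu;
}

/* Called on send completion and again on command completion; the IU
 * returns to the free list only after the second call. */
static void iu_put(struct iu_free_list *l, struct iu_sketch *iu)
{
	assert(iu->refs > 0);
	if (--iu->refs == 0)
		iu_list_add_tail(l, iu);
}
```

A send completion alone leaves the IU off the list. If it was the last free IU, iu_get() returns NULL until the response arrives, instead of handing out an IU that is still in use.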
From rolandd at cisco.com Tue Sep 20 16:55:14 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 16:55:14 -0700 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, In-Reply-To: <43309F3F.3060009@mellanox.com> (Vu Pham's message of "Tue, 20 Sep 2005 16:46:07 -0700") References: <43309F3F.3060009@mellanox.com> Message-ID: <52slvzmh0d.fsf@cisco.com> > - iu = kmalloc(sizeof *iu, gfp_mask); > + iu = kzalloc(sizeof *iu, gfp_mask); By the way, why do we want this? I think we always clear out the contents of our IUs before we send them. - R. From halr at voltaire.com Tue Sep 20 16:51:22 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 20 Sep 2005 19:51:22 -0400 Subject: [openib-general] QP with large starting sequence adds latency to RDMA READ??? In-Reply-To: <43309DA1.1060703@ichips.intel.com> References: <433099C0.1070408@ichips.intel.com> <1127258772.4426.123.camel@hal.voltaire.com> <43309DA1.1060703@ichips.intel.com> Message-ID: <1127260281.4426.138.camel@hal.voltaire.com> On Tue, 2005-09-20 at 19:39, Sean Hefty wrote: > > What application/ULP are you running ? > > He's running a modified version of dtest over DAPL that times RDMA reads from > the post time to poll CQ completion. > > The test setup is actually a little more complex. The latencies for the runs > with the various PSNs were: > > 1 - 60us > 0x10000 - 60us (first run) > 0x10000 - 100us (subsequent runs) > 0x20000 - 100us (first run) > 0x20000 - 140us (subsequent runs) > 0x40000 - 140us (first run) > 0x40000 - 220us (subsequent runs) > > The application halts and is re-started between each run, and the QP is > different each time. The only change from run to run is the PSN. Might it be some timeout being hit before this starts working ? Not sure I understand it but it looks like the CM starting_psn is set to ep_ptr->qp_handle->qp_num in the uDAPL CM. Don't know if that has anything to do with this but shouldn't this be the same PSN set for the QP ? 
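Hal's question is whether the starting PSN carried in the CM exchange matches the PSN actually programmed into the QP. Below is a minimal sketch of keeping the two consistent. It uses hypothetical `struct cm_req_sketch` and `struct qp_psn_sketch` containers rather than the uDAPL or ib_cm structures. The active side masks its chosen PSN to 24 bits, since InfiniBand PSNs are 24 bits wide. It then uses that value both as the advertised starting PSN and as its QP's send PSN. The passive side sets its receive PSN from the advertised value. The sketch only illustrates the consistency Hal asks about; it does not by itself explain the latency numbers reported in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define IB_PSN_MASK 0xFFFFFFu  /* PSNs are 24 bits wide */

/* Hypothetical containers; not the uDAPL or ib_cm structures. */
struct cm_req_sketch { uint32_t starting_psn; };
struct qp_psn_sketch { uint32_t sq_psn, rq_psn; };

/* Active side: advertise the same PSN the local QP will send with. */
static void active_set_psn(uint32_t psn, struct cm_req_sketch *req,
			   struct qp_psn_sketch *qp)
{
	assert(req && qp);
	psn &= IB_PSN_MASK;
	req->starting_psn = psn;
	qp->sq_psn = psn;
}

/* Passive side: expect the peer's advertised PSN on receive. */
static void passive_set_rq_psn(const struct cm_req_sketch *req,
			       struct qp_psn_sketch *qp)
{
	assert(req && qp);
	qp->rq_psn = req->starting_psn;
}
```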
-- Hal From mshefty at ichips.intel.com Tue Sep 20 17:05:25 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 17:05:25 -0700 Subject: [openib-general] QP with large starting sequence adds latency to RDMA READ??? In-Reply-To: <1127260281.4426.138.camel@hal.voltaire.com> References: <433099C0.1070408@ichips.intel.com> <1127258772.4426.123.camel@hal.voltaire.com> <43309DA1.1060703@ichips.intel.com> <1127260281.4426.138.camel@hal.voltaire.com> Message-ID: <4330A3C5.8050504@ichips.intel.com> Hal Rosenstock wrote: > Might it be some timeout being hit before this starts working ? > > Not sure I understand it but it looks like the > CM starting_psn is set to ep_ptr->qp_handle->qp_num in the uDAPL CM. He is over-riding this value before modifying the QP. > Don't know if that has anything to do with this but shouldn't this be > the same PSN set for the QP ? Not sure what the problem is. Initially, he was seeing a steady increase in RDMA read latency of 40 us per run. He narrowed it down to setting the PSN. Hard-coding the PSN to 1 eliminated the latency issue. Setting it to higher values cause the RDMA read latency to increase. - Sean From vuhuong at mellanox.com Tue Sep 20 17:19:34 2005 From: vuhuong at mellanox.com (Vu Pham) Date: Tue, 20 Sep 2005 17:19:34 -0700 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, In-Reply-To: <52wtlbmh49.fsf@cisco.com> References: <43309F3F.3060009@mellanox.com> <52wtlbmh49.fsf@cisco.com> Message-ID: <4330A716.4030204@mellanox.com> I think it makes more sense to handle this the same way I handled max_sectors: make it a per-target parameter passed in when connecting to the target. We could make cmds_per_lun a similar parameter, but are there likely to be any SRP targets that need this to be limited? Also, what is max_targets? OK we can do the same way that you handled max_sectors. SRP targets may prefer a specific cmds_per_lun to reach max sequential performance. 
max_targets == max_id > + fix the bug of reuse the iu while it's still in_use I think I see the bug: a send may complete and have its IU recycled before the corresponding command is completed, and end up screwing things up. Is this right? Yes If so I would prefer to fix things in a slightly different way. Rather than a TX ring, we should just keep a list of free IUs ready to send and only add IUs to the end of the list when we're really done with the IU. A free list of IUs is fine also Vu From vuhuong at mellanox.com Tue Sep 20 17:26:22 2005 From: vuhuong at mellanox.com (Vu Pham) Date: Tue, 20 Sep 2005 17:26:22 -0700 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, In-Reply-To: <52slvzmh0d.fsf@cisco.com> References: <43309F3F.3060009@mellanox.com> <52slvzmh0d.fsf@cisco.com> Message-ID: <4330A8AE.6030404@mellanox.com> > - iu = kmalloc(sizeof *iu, gfp_mask); > + iu = kzalloc(sizeof *iu, gfp_mask); We only do this once at init time By the way, why do we want this? I think we always clear out the contents of our IUs before we send them. I don't think that we clear out the contents of our IUs and we should not except some extra fields (fmr_arr, fmr_cnt) that my patch introduced Vu
From rolandd at cisco.com Tue Sep 20 17:42:25 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 17:42:25 -0700 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, In-Reply-To: <4330A8AE.6030404@mellanox.com> (Vu Pham's message of "Tue, 20 Sep 2005 17:26:22 -0700") References: <43309F3F.3060009@mellanox.com> <52slvzmh0d.fsf@cisco.com> <4330A8AE.6030404@mellanox.com> Message-ID: <52oe6nmetq.fsf@cisco.com> Vu> I don't think that we clear out the contents of our IUs and we Vu> should not except some extra fields (fmr_arr, fmr_cnt) that my Vu> patch introduced I see -- yes, I was getting confused between iu and iu->buf. - R. From rolandd at cisco.com Tue Sep 20 17:46:06 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 17:46:06 -0700 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, In-Reply-To: <4330A716.4030204@mellanox.com> (Vu Pham's message of "Tue, 20 Sep 2005 17:19:34 -0700") References: <43309F3F.3060009@mellanox.com> <52wtlbmh49.fsf@cisco.com> <4330A716.4030204@mellanox.com> Message-ID: <52k6hbmenl.fsf@cisco.com> Vu> OK we can do the same way that you handled max_sectors. SRP Vu> targets may prefer a specific cmds_per_lun to reach max Vu> sequential performance. max_targets == max_id Does max_id matter at all for SRP? We only use scsi_scan_target() to find targets, so I'm not sure where the SCSI midlayer even looks at max_id for our case. - R. From vuhuong at mellanox.com Tue Sep 20 17:55:33 2005 From: vuhuong at mellanox.com (Vu Pham) Date: Tue, 20 Sep 2005 17:55:33 -0700 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, In-Reply-To: <52k6hbmenl.fsf@cisco.com> References: <43309F3F.3060009@mellanox.com> <52wtlbmh49.fsf@cisco.com> <4330A716.4030204@mellanox.com> <52k6hbmenl.fsf@cisco.com> Message-ID: <4330AF85.8020207@mellanox.com> Does max_id matter at all for SRP?
We only use scsi_scan_target() to find targets, so I'm not sure where the SCSI midlayer even looks at max_id for our case. Our current srp implementation does not support the SCSI naming convention, which requires max_id and max_channel. For example, if there is an srp target with 1024 luns, how would you want to represent this srp target to the SCSI midlayer? a. a target with 1024 luns - or b. max_id targets and 1024/max_id luns per target - or c. max_channel channels, max_id targets and 1024/max_id/max_channel luns per target per channel Right now it only matters for the resources that we allocate/use for this scsi host, i.e. fmr_pool_size... Vu From sean.hefty at intel.com Tue Sep 20 18:05:56 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 20 Sep 2005 18:05:56 -0700 Subject: [openib-general][PATCH][RFC]: CMA header In-Reply-To: Message-ID: Here's a start at porting the CMA implementation to the modified header. I didn't get as far as I had hoped, but will continue tomorrow based on feedback. I commented out code that I hadn't gotten to yet, which is mostly the event handling from the IB CM. - Sean /* * Copyright (c) 2005 Voltaire Inc. All rights reserved. * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved. * Copyright (c) 2005 Intel Corporation. All rights reserved. * * This Software is licensed under one of the following licenses: * * 1) under the terms of the "Common Public License 1.0" a copy of which is * available from the Open Source Initiative, see * http://www.opensource.org/licenses/cpl.php. * * 2) under the terms of the "The BSD License" a copy of which is * available from the Open Source Initiative, see * http://www.opensource.org/licenses/bsd-license.php.
* * 3) under the terms of the "GNU General Public License (GPL) Version 2" a * copy of which is available from the Open Source Initiative, see * http://www.opensource.org/licenses/gpl-license.php. * * Licensee has the right to choose one of the above licenses. * * Redistributions of source code must retain the above copyright * notice and one of the license notices. * * Redistributions in binary form must reproduce both the above copyright * notice, one of the license notices in the documentation * and/or other materials provided with the distribution. * */ #include #include MODULE_AUTHOR("Guy German"); MODULE_DESCRIPTION("Generic RDMA CM Agent"); MODULE_LICENSE("Dual BSD/GPL"); #define PFX "ib_cma: " struct cma_id_private { struct rdma_cma_id cma_id; struct ib_cm_id *cm_id; /* TODO: add state if needed */ /* TODO: might need refcount for route queries */ /* atomic_t refcount; */ spinlock_t lock; int backlog; }; struct cma_route_private { struct rdma_route route; struct ib_path_rec *path_rec; }; struct rdma_cma_id* rdma_cma_create_id(struct ib_device *device, void *context, rdma_cma_event_handler event_handler) { struct cma_id_private *cma_id_priv; struct ib_cm_id *cm_id; cma_id_priv = kzalloc(sizeof *cma_id_priv, GFP_KERNEL); if (!cma_id_priv) return ERR_PTR(-ENOMEM); cma_id_priv->cma_id.device = device; cma_id_priv->cma_id.context = context; cma_id_priv->cma_id.event_handler = event_handler; spin_lock_init(&cma_id_priv->lock); cm_id = ib_create_cm_id(device, cma_ib_handler, cma_id_priv); if (IS_ERR(cm_id)) { kfree(cma_id_priv); return ERR_PTR(PTR_ERR(cm_id)); } cma_id_priv->cm_id = cm_id; return &cma_id_priv->cma_id; } EXPORT_SYMBOL(rdma_cma_create_id); void rdma_cma_destroy_id(struct rdma_cma_id *cma_id) { struct cma_id_private *cma_id_priv; cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); /* TODO: cancel route lookup if active */ ib_destroy_cm_id(cma_id_priv->cm_id); kfree(cma_id->route); kfree(cma_id_priv); } EXPORT_SYMBOL(rdma_cma_destroy_id); int
rdma_cma_listen(struct rdma_cma_id *cma_id, struct sockaddr *address, int backlog) { struct cma_id_private *cma_id_priv; int ret; cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); cma_id->route = kzalloc(sizeof *cma_id->route, GFP_KERNEL); if (!cma_id->route) return -ENOMEM; cma_id_priv->backlog = backlog; cma_id->route->src_ip = address; /* TODO: convert address into a service_id */ ret = ib_cm_listen(cma_id_priv->cm_id, 0, 0); if (ret) goto err; return 0; err: kfree(cma_id->route); cma_id->route = NULL; /* avoid a second kfree in rdma_cma_destroy_id() */ return ret; } EXPORT_SYMBOL(rdma_cma_listen); /* static void cma_path_handler(u64 req_id, void *context, int rec_num) { struct cma_context *cma_id = context; enum ib_cma_event event; int status = 0; if (rec_num <= 0) { event = IB_CMA_EVENT_UNREACHABLE; goto error; } cma_id->cma_param.primary_path = &cma_id->cma_path; cma_id->cma_param.alternate_path = NULL; printk(KERN_DEBUG PFX "%s: dlid=%d slid=%d pkey=%d mtu=%d sid=%llx " "qpn=%d qpt=%d psn=%d prd=%s respres=%d rcm=%d flc=%d " "cmt=%d rtrc=%d rntrtr=%d maxcm=%d \n",__func__, cma_id->cma_param.primary_path->dlid , cma_id->cma_param.primary_path->slid , cma_id->cma_param.primary_path->pkey , cma_id->cma_param.primary_path->mtu , cma_id->cma_param.service_id, cma_id->cma_param.qp_num, cma_id->cma_param.qp_type, cma_id->cma_param.starting_psn, (char *)cma_id->cma_param.private_data, cma_id->cma_param.responder_resources, cma_id->cma_param.remote_cm_response_timeout, cma_id->cma_param.flow_control, cma_id->cma_param.local_cm_response_timeout, cma_id->cma_param.retry_count, cma_id->cma_param.rnr_retry_count, cma_id->cma_param.max_cm_retries); status = ib_send_cm_req(cma_id->cm_id, &cma_id->cma_param); if (status) { printk(KERN_ERR PFX "%s: cm_req failed %d\n",__func__, status); event = IB_CMA_EVENT_REJECTED; goto error; } return; error: printk(KERN_ERR PFX "%s: return error %d \n",__func__, status); cma_connection_callback(cma_id, event, NULL); } static void cma_route_handler(u64 req_id, void *context, int rec_num) {
struct cma_context *cma_id = context; enum ib_cma_event event; int status = 0; if (rec_num <= 0) { event = IB_CMA_EVENT_UNREACHABLE; goto error; } cma_id->ibat_comp.fn = &cma_path_handler; cma_id->ibat_comp.context = cma_id; status = ib_at_paths_by_route(&cma_id->cma_route, 0, &cma_id->cma_path, 1, &cma_id->ibat_comp); if (status) { event = IB_CMA_EVENT_DISCONNECTED; goto error; } return; error: printk(KERN_ERR PFX "%s: return error %d \n",__func__, status); cma_connection_callback(cma_id, event ,NULL); } */ int rdma_cma_get_route(struct rdma_cma_id *cma_id, struct sockaddr *src_ip, struct sockaddr *dest_ip) { struct cma_id_private *cma_id_priv; cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); /* TODO: Get remote GID from ARP table, query for path record */ return 0; } EXPORT_SYMBOL(rdma_cma_get_route); int rdma_cma_connect(struct rdma_cma_id *cma_id, struct rdma_cma_conn_param *conn_param) { struct cma_id_private *cma_id_priv; struct cma_route_private *route; struct ib_cm_req_param req; cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); route = container_of(cma_id->route, struct cma_route_private, route); cma_id->qp = conn_param->qp; memset(&req, 0, sizeof req); req.primary_path = route->path_rec; /* TODO: convert route->route.dest_ip to service id */ req.service_id = 0; req.qp_num = conn_param->qp->qp_num; req.qp_type = IB_QPT_RC; req.starting_psn = req.qp_num; req.private_data = conn_param->private_data; req.private_data_len = conn_param->private_data_len; /* TODO: Get these values from user - from qp_attr ? u8 responder_resources; u8 initiator_depth; u8 remote_cm_response_timeout; u8 flow_control; u8 local_cm_response_timeout; u8 retry_count; u8 rnr_retry_count; u8 max_cm_retries; */ req.srq = conn_param->qp->srq ? 
1 : 0; return ib_send_cm_req(cma_id_priv->cm_id, &req); } EXPORT_SYMBOL(rdma_cma_connect); /* TODO: fix up int rdma_cma_accept(struct rdma_cma_id *cma_id, struct ib_qp *qp, const void *private_data, u8 private_data_len) { struct cma_context *cma_id = cma_id; struct ib_cm_rep_param passive_params; int status; printk(KERN_DEBUG PFX "%s: enter >> private_data = %s (len=%d)\n", __func__, (char *)private_data, private_data_len); if (private_data_len > IB_CM_REP_PRIVATE_DATA_SIZE) { status = -EINVAL; goto reject; } memset(&passive_params, 0, sizeof passive_params); passive_params.private_data = private_data; passive_params.private_data_len = private_data_len; passive_params.qp_num = qp->qp_num; passive_params.responder_resources = CMA_TARGET_MAX; passive_params.initiator_depth = CMA_INITIATOR_DEPTH; passive_params.rnr_retry_count = CMA_RNR_RETRY_COUNT; status = cma_modify_qp_state(cma_id->cm_id, qp, IB_QPS_RTR, 0); if (status) goto reject; cma_id->accept_cb.func = cm_accept_handler; cma_id->accept_cb.context = context; status = ib_send_cm_rep(cma_id->cm_id, &passive_params); if (status) goto reject; printk(KERN_DEBUG PFX "%s: return success\n", __func__); return 0; reject: printk(KERN_ERR PFX "%s: error status %d\n", __func__, status); ib_send_cm_rej(cma_id->cm_id, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, NULL, 0); destroy_cma_id(cma_id); return status; } EXPORT_SYMBOL(rdma_cma_accept); */ int rdma_cma_reject(struct rdma_cma_id *cma_id, const void *private_data, u8 private_data_len) { struct cma_id_private *cma_id_priv; cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); return ib_send_cm_rej(cma_id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, private_data, private_data_len); }; EXPORT_SYMBOL(rdma_cma_reject); int rdma_cma_disconnect(struct rdma_cma_id *cma_id) { struct cma_id_private *cma_id_priv; struct ib_qp_attr qp_attr; int qp_attr_mask; int ret; cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); qp_attr.qp_state = IB_QPS_ERR; 
qp_attr_mask = IB_QP_STATE; ret = ib_modify_qp(cma_id_priv->cma_id.qp, &qp_attr, qp_attr_mask); if (ret) return ret; /* We could either be initiating the disconnect or responding to one. */ ret = ib_send_cm_dreq(cma_id_priv->cm_id, NULL, 0); if (ret) ib_send_cm_drep(cma_id_priv->cm_id, NULL, 0); return 0; } EXPORT_SYMBOL(rdma_cma_disconnect); /* TODO: fixup void cma_connection_callback(struct cma_context *cma_id, const enum ib_cma_event event, const void *private_data) { ib_cma_event_handler conn_cb; struct ib_qp *qp = cma_id->cma_conn.qp; int status; conn_cb = cma_id->cma_conn.cma_event_handler; switch (event) { case IB_CMA_EVENT_ESTABLISHED: break; case IB_CMA_EVENT_DISCONNECTED: case IB_CMA_EVENT_REJECTED: case IB_CMA_EVENT_UNREACHABLE: case IB_CMA_EVENT_NON_PEER_REJECTED: status = cma_disconnect(qp, cma_id, CMA_CLOSE_ABRUPT); break; default: printk(KERN_ERR PFX "%s: unknown event !!\n", __func__); } printk(KERN_DEBUG PFX "%s: event=%d\n", __func__, event); conn_cb(event, cma_id->cma_conn.context, private_data); } static inline int cma_rep_recv(struct cma_context *cma_id, struct ib_cm_event *rep_cm_event) { int status; status = cma_modify_qp_state(cma_id->cm_id, cma_id->cma_conn.qp, IB_QPS_RTR, 0); if (status) { printk(KERN_ERR PFX "%s: fail to modify QPS_RTR %d\n", __func__, status); return status; } status = cma_modify_qp_state(cma_id->cm_id, cma_id->cma_conn.qp, IB_QPS_RTS, 0); if (status) { printk(KERN_ERR PFX "%s: fail to modify QPS_RTR %d\n", __func__, status); return status; } status = ib_send_cm_rtu(cma_id->cm_id, NULL, 0); if (status) { printk(KERN_ERR PFX "%s: fail to send cm rtu %d\n", __func__, status); return status; } return 0; } int cma_active_cb_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) { int status = 0; enum ib_cma_event cma_event = 0; struct cma_context *cma_id = cm_id->context; printk(KERN_DEBUG PFX "%s: enter >>> cm_id=%p cma_id=%p\n",__func__, cm_id, cma_id); switch (event->event) { case IB_CM_REQ_ERROR: cma_event = 
IB_CMA_EVENT_UNREACHABLE; break; case IB_CM_REJ_RECEIVED: cma_event = IB_CMA_EVENT_NON_PEER_REJECTED; break; case IB_CM_DREP_RECEIVED: case IB_CM_TIMEWAIT_EXIT: cma_event = IB_CMA_EVENT_DISCONNECTED; break; case IB_CM_REP_RECEIVED: status = cma_rep_recv(cma_id, event); if (!status) cma_event = IB_CMA_EVENT_ESTABLISHED; else cma_event = IB_CMA_EVENT_DISCONNECTED; break; case IB_CM_DREQ_RECEIVED: ib_send_cm_drep(cm_id, NULL, 0); cma_event = IB_CMA_EVENT_DISCONNECTED; break; case IB_CM_DREQ_ERROR: break; default: printk(KERN_WARNING PFX "%s: cm event (%d) not handled\n", __func__, event->event); break; } printk(KERN_WARNING PFX "%s: cm_event=%d cma_event=%d\n", __func__, event->event, cma_event); if (cma_event) cma_connection_callback(cma_id, cma_event, event->private_data); return status; } static int cma_passive_cb_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) { struct cma_context *cma_id; ib_cma_listen_handler crcb; ib_cma_ac_handler accb; void *cr_ctx, *ac_ctx; int status = 0; printk(KERN_DEBUG PFX "%s: enter >>> cm_id=%p\n",__func__, cm_id); cma_id = get_cma_id(cm_id, event); if (!cma_id) return -EINVAL; accb = cma_id->accept_cb.func; ac_ctx = cma_id->accept_cb.context; switch (event->event) { case IB_CM_REQ_RECEIVED: crcb = cma_id->creq_cma_id->listen_cb.func; cr_ctx = cma_id->creq_cma_id->listen_cb.context; memcpy(&cma_id->cma_path, ((struct ib_cm_req_event_param *) &event->param)->primary_path, sizeof cma_id->cma_path); crcb(cma_id, cm_id->device, event->private_data, cr_ctx); break; case IB_CM_REP_ERROR: accb(IB_CMA_EVENT_UNREACHABLE, ac_ctx); break; case IB_CM_REJ_RECEIVED: accb(IB_CMA_EVENT_REJECTED, ac_ctx); break; case IB_CM_RTU_RECEIVED: status = cma_modify_qp_state(cma_id->cm_id, cma_id->cma_conn.qp, IB_QPS_RTS, 0); if (!status) accb(IB_CMA_EVENT_ESTABLISHED, ac_ctx); else { accb(IB_CMA_EVENT_DISCONNECTED, ac_ctx); status = cma_disconnect(cma_id->cma_conn.qp, cma_id, CMA_CLOSE_ABRUPT); } break; case IB_CM_DREQ_RECEIVED: 
ib_send_cm_drep(cm_id, NULL, 0); break; case IB_CM_DREQ_ERROR: break; case IB_CM_DREP_RECEIVED: case IB_CM_TIMEWAIT_EXIT: accb(IB_CMA_EVENT_DISCONNECTED, ac_ctx); status = cma_disconnect(cma_id->cma_conn.qp, cma_id, CMA_CLOSE_ABRUPT); break; default: break; } destroy_cma_id(cma_id); return status; } */ static int cma_init(void) { return 0; } static void cma_cleanup(void) { } module_init(cma_init); module_exit(cma_cleanup); From lindahl at pathscale.com Tue Sep 20 19:14:58 2005 From: lindahl at pathscale.com (Greg Lindahl) Date: Tue, 20 Sep 2005 19:14:58 -0700 Subject: [openib-general] Re: Opensm - osm_sa_path_record.c - variable declaration In-Reply-To: <1127207357.24173.5800.camel@hal.voltaire.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30E22F8@mtlexch01.mtl.com> <1127207357.24173.5800.camel@hal.voltaire.com> Message-ID: <20050921021458.GC4238@greglaptop> > Windows compiler does not enable declaration not in the beginning of > the function, so I would > like to have it changed. > We can either move the declaration to the beginning of the function, > or add {} around the declaration. Isn't there a gcc flag to convince it to give an error in this case? If so, adding it to the OpenIB makefile would be a good idea. -- greg From rolandd at cisco.com Tue Sep 20 19:30:44 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 19:30:44 -0700 Subject: [openib-general] Re: Opensm - osm_sa_path_record.c - variable declaration In-Reply-To: <20050921021458.GC4238@greglaptop> (Greg Lindahl's message of "Tue, 20 Sep 2005 19:14:58 -0700") References: <6AB138A2AB8C8E4A98B9C0C3D52670E30E22F8@mtlexch01.mtl.com> <1127207357.24173.5800.camel@hal.voltaire.com> <20050921021458.GC4238@greglaptop> Message-ID: <52fyrzm9t7.fsf@cisco.com> Greg> Isn't there a gcc flag to convince it to give an error in Greg> this case? If so, adding it to the OpenIB makefile would be Greg> a good idea. gcc 3.4 added -Wdeclaration-after-statement. 
However, older gcc versions don't accept the option. In fact gcc 2.95 doesn't accept declarations after statements. In any case it takes at least a little autoconf magic to add the flag only if the compiler being used accepts it. - R. From iod00d at hp.com Tue Sep 20 20:22:24 2005 From: iod00d at hp.com (Grant Grundler) Date: Tue, 20 Sep 2005 20:22:24 -0700 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, In-Reply-To: <52wtlbmh49.fsf@cisco.com> References: <43309F3F.3060009@mellanox.com> <52wtlbmh49.fsf@cisco.com> Message-ID: <20050921032224.GF24837@esmail.cup.hp.com> On Tue, Sep 20, 2005 at 04:52:54PM -0700, Roland Dreier wrote: ... > I think it makes more sense to handle this the same way I handled > max_sectors: make it a per-target parameter passed in when connecting > to the target. We could make cmds_per_lun a similar parameter, but > are there likely to be any SRP targets that need this to be limited? Probably. At least it is necessary for regular SCSI devices. This is normally how one can enforce "fairness" by preventing o one (or more) initiator from monopolizing a shared the target device o one LUN from saturating the link (and thus access to other LUNS). There might be other (better?) mechanisms for iSCSI/IB to handle these situations. Limiting the number of pending requests is known to work on previous technologies. hth, grant From caitlin.bestler at gmail.com Tue Sep 20 20:26:17 2005 From: caitlin.bestler at gmail.com (Caitlin Bestler) Date: Tue, 20 Sep 2005 20:26:17 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <432EFC74.6050105@ichips.intel.com> References: <432EFC74.6050105@ichips.intel.com> Message-ID: <469958e00509202026641feda2@mail.gmail.com> > > enum cma_close_flags { > > CMA_CLOSE_ABRUPT = 0, > > CMA_CLOSE_GRACEFUL > > }; > > Not sure what these are for. Why not have the user always destroy the cma_id? > If it hasn't yet been destroyed when a disconnect comes in, callback the user. 
> If a connection hasn't been disconnected when it is destroyed, automatically > send a disconnect message. > DAT defines a graceful close as one that occurs *after* any work requests already posted to the send queue have had a chance to complete normally. It serves the same application layer purpose as half-closing a socket. While this may be a somewhat advanced option for a low level interface there are advantages to letting the verb layer implement it. The device dependent code may have options available that will allow the graceful close to be issued promptly after the currently newest work request already posted to the QP. However it can be, and has been, implemented above the verb layer by the DAT Provider without device specific support. From Don.Dhondt at Bull.com Tue Sep 20 20:33:25 2005 From: Don.Dhondt at Bull.com (Don.Dhondt at Bull.com) Date: Tue, 20 Sep 2005 20:33:25 -0700 Subject: [openib-general] Problem Building OpenIB on 2.6.12 kernel Message-ID: Don> Our kernel is 2.6.12 and we try to build the GEN2 drivers. Don> The gen2 "linux-kernel" trunk was downloaded Don> yesterday... don't remember the number. Roland> You may run into other problems, because now that 2.6.13 is out, the Roland> OpenIB subversion tree will support 2.6.13 instead of 2.6.12. It is not very practical to try to use OpenIB on a continually moving kernel. That is probably why we see so many backport patches for various kernels. I don't suppose you know of any existing backport patches to 2.6.12. Don> When we add IPoIB, we get: Don> drivers/infiniband/ulp/ipoib/ipoib_main.c:478: error: Don> conflicting types for 'path_lookup' Don> include/linux/namei.h:103: error: previous declaration of Don> 'path_lookup' was here Roland> Strange, I've never seen that error. Can you figure out how Roland> is getting included while building ipoib_main.c? Well, I'm told by Jerome it is in <.ipoib_main.o.d> Don> Also, will the GEN2 stack work with a 64K page size kernel? 
Don> If not, is there a plan to make it work? Roland> I don't think anyone has tested it, but if you have new enough HCA Roland> firmware (>= 3.3.3 for PCI-X HCAs), it should work. If it doesn't Roland> work I'd be interested in hearing about it. Roland> - R. We have the firmware but until we get a clean build we can't give it a test. I'll let you know if we get there. =Don From halr at voltaire.com Tue Sep 20 21:15:00 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Sep 2005 00:15:00 -0400 Subject: [openib-general] Problem Building OpenIB on 2.6.12 kernel In-Reply-To: References: Message-ID: <1127276100.4426.179.camel@hal.voltaire.com> On Tue, 2005-09-20 at 23:33, Don.Dhondt at Bull.com wrote: > > Don> Our kernel is 2.6.12 and we try to build the GEN2 drivers. > Don> The gen2 "linux-kernel" trunk was downloaded > Don> yesterday... don't remember the number. > > Roland> You may run into other problems, because now that 2.6.13 is > out, the > Roland> OpenIB subversion tree will support 2.6.13 instead of 2.6.12. > > It is not very practical to try to use OpenIB on a continually moving > kernel. > That is probably why we see so many backport patches for various kernels. > I don't suppose you know of any existing backport patches to 2.6.12. I believe Mellanox has a backport to 2.6.11 and 2.6.12 which should be close to what you need (in https://openib.org/svn/gen2/branches/backport/). -- Hal From rolandd at cisco.com Tue Sep 20 21:29:38 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 20 Sep 2005 21:29:38 -0700 Subject: [openib-general] Problem Building OpenIB on 2.6.12 kernel In-Reply-To: (Don Dhondt's message of "Tue, 20 Sep 2005 20:33:25 -0700") References: Message-ID: <52br2nm4b1.fsf@cisco.com> Don> It is not very practical to try to use OpenIB on a Don> continually moving kernel. That is probably why we see so Don> many backport patches for various kernels. I don't suppose Don> you know of any existing backport patches to 2.6.12. 
I don't think I've seen any backport patches. Of course you can just use the IB code that is already in the stock 2.6.12 kernel, if that works for you. I understand that you'd like to be able to use the latest code from the svn tree with your choice of kernel, but it's not feasible for us to maintain N different backports in the face of changing kernel APIs. Roland> Strange, I've never seen that error. Can you figure out Roland> how is getting included while building Roland> ipoib_main.c? Don> Well, I'm told by Jerome it is in <.ipoib_main.o.d> That's just the automatically generated dependency file. I'd like to understand what's causing gcc to include namei.h when you compile ipoib_main.c. As far as I can tell from a quick grep of the kernel source, linux/namei.h is not included indirectly through any other Linux include files: $ grep -r namei include|grep '#include' include/asm-m68knommu/namei.h:#include include/asm-um/namei.h:#include "asm/arch/namei.h" and ipoib_main.c does not include . So why is your build picking up the declaration of path_lookup() in namei.h? Anyway, one easy fix for you is just to replace the occurrences of 'path_lookup' in ipoib_main.c with 'ipoib_path_lookup' or something like that. - R. From iod00d at hp.com Tue Sep 20 21:37:10 2005 From: iod00d at hp.com (Grant Grundler) Date: Tue, 20 Sep 2005 21:37:10 -0700 Subject: [openib-general] Problem Building OpenIB on 2.6.12 kernel In-Reply-To: References: Message-ID: <20050921043710.GG24837@esmail.cup.hp.com> On Tue, Sep 20, 2005 at 08:33:25PM -0700, Don.Dhondt at Bull.com wrote: > It is not very practical to try to use OpenIB on a continually moving > kernel. You are right - it's not. But it's even less practical to develop code for submission to kernel.org on really old releases either. The primary role of the openib.org SVN tree is as a development branch of the code submitted to kernel.org.
Basing openib.org SVN on the latest minor kernel.org release is a compromise between using "old" releases and Linus' tree - which is really a continually moving target. > That is probably why we see so many backport patches for various kernels. AFAIK, those patches are not intended for development use. The backports are primarily intended for use with distros that are already shipping based on a particular kernel.org version. > I don't suppose you know of any existing backport patches to 2.6.12. If it's not in the SVN, it probably doesn't yet exist. I expect submittals for additional backport patches are welcome and would be committed to the repository. hth, grant From iod00d at hp.com Tue Sep 20 22:47:00 2005 From: iod00d at hp.com (Grant Grundler) Date: Tue, 20 Sep 2005 22:47:00 -0700 Subject: [openib-general] could not add HCA InfiniHost0 In-Reply-To: <1127058546.7204.2.camel@QiWang> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEDDD@mtlexch01.mtl.com> <1127058546.7204.2.camel@QiWang> Message-ID: <20050921054700.GI24837@esmail.cup.hp.com> On Sun, Sep 18, 2005 at 11:49:06PM +0800, QiWang, Chen wrote: > 03:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) > Subsystem: Mellanox Technologies MT23108 InfiniHost ... > Interrupt: pin A routed to IRQ 193 vs. > 03:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1) > Subsystem: Mellanox Technologies MT23108 InfiniHost ... > Interrupt: pin A routed to IRQ 201 suggests the two blades are not identical HW. The MMIO BAR config and routing looked fine. Doesn't BIOS normally assign the PCI_INTERRUPT_LINE values to PCI devices?
hth, grant From guyg at voltaire.com Tue Sep 20 23:57:35 2005 From: guyg at voltaire.com (Guy German) Date: Wed, 21 Sep 2005 09:57:35 +0300 Subject: [openib-general] [RFC] CMA - generic CM implementaion for IB In-Reply-To: References: <432ED13D.2010800@voltaire.com> Message-ID: <4331045F.8020607@voltaire.com> James Lentini wrote: > Not to nit pick, but why did you use the prefix "ib_cma_" instead of > "rdma_" or "rdma_cm_"? Hi James, As I agree with the change I think it applies to _all_ the verbs. Why call ib_post_recv and not rdma_post_recv, if it is actually a generic rdma call that can be implemented over iwarp ? Guy From davem at davemloft.net Wed Sep 21 00:11:56 2005 From: davem at davemloft.net (David S. Miller) Date: Wed, 21 Sep 2005 00:11:56 -0700 (PDT) Subject: [openib-general] Re: [PATCH] af_packet: Allow for > 8 byte hardware addresses. In-Reply-To: References: <20050912.154527.48978091.davem@davemloft.net> Message-ID: <20050921.001156.94450818.davem@davemloft.net> From: ebiederm at xmission.com (Eric W. Biederman) Date: Tue, 20 Sep 2005 11:17:14 -0600 > Dave sorry for the delay getting back to this... > This version of the patch adds the one memset you were clearly > asking for. Applied, thanks a lot Eric. From davem at davemloft.net Wed Sep 21 00:13:21 2005 From: davem at davemloft.net (David S. Miller) Date: Wed, 21 Sep 2005 00:13:21 -0700 (PDT) Subject: [openib-general] Re: [PATCH] [NET] socket.c: zero socket addresses before use. In-Reply-To: References: <20050912.154527.48978091.davem@davemloft.net> Message-ID: <20050921.001321.78997430.davem@davemloft.net> From: ebiederm at xmission.com (Eric W. Biederman) Date: Tue, 20 Sep 2005 11:18:23 -0600 > Dave I don't know if this is part of what you want but > zeroing the socket address buffer before use seem to be implied > by what you were asking for. So here is an additional patch > to implement that. 
> > This is a paranoid precaution to guard against accidental > information leaks to user space or other consumers/producers > may fail to properly fail to set or read the hardware > address length. af_packet over ethernet has had at least > has one small but in this respect. I think this patch might be a bit overkill, but thanks for cooking it up. I'm willing to be convinced otherwise though :-) From danb at voltaire.com Wed Sep 21 00:20:01 2005 From: danb at voltaire.com (Dan Bar Dov) Date: Wed, 21 Sep 2005 10:20:01 +0300 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, Message-ID: The iSCSI protocol has a "max_cmd_sn" parameter passed in each iscsi response. It sets a window size for the initiator indicating the number of commands it is allowed to send to the target. Dan > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Grant Grundler > Sent: Wednesday, September 21, 2005 6:22 AM > To: Roland Dreier > Cc: Vu Pham; openib-general > Subject: Re: [openib-general][PATCH][SRP] bug fixes & fmr supported, > > On Tue, Sep 20, 2005 at 04:52:54PM -0700, Roland Dreier wrote: > ... > > I think it makes more sense to handle this the same way I handled > > max_sectors: make it a per-target parameter passed in when > connecting > > to the target. We could make cmds_per_lun a similar parameter, but > > are there likely to be any SRP targets that need this to be limited? > > Probably. At least it is necessary for regular SCSI devices. > This is normally how one can enforce "fairness" by preventing > o one (or more) initiator from monopolizing a shared the target device > o one LUN from saturating the link (and thus access to other LUNS). > > There might be other (better?) mechanisms for iSCSI/IB to handle > these situations. Limiting the number of pending requests is > known to work on previous technologies. 
> > hth, > grant > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From guyg at voltaire.com Wed Sep 21 00:58:47 2005 From: guyg at voltaire.com (Guy German) Date: Wed, 21 Sep 2005 10:58:47 +0300 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <43305F3A.6040403@ichips.intel.com> References: <43305F3A.6040403@ichips.intel.com> Message-ID: <433112B7.4050802@voltaire.com> Sean Hefty wrote: > Guy German wrote: >> static void cma_route_handler(u64 req_id, void *context, int rec_num) >> { >> status = ib_at_paths_by_route(&cma_ctx->cma_route, 0, >> &cma_ctx->cma_path, 1, >> &cma_ctx->ibat_comp); >> } >> int ib_cma_connect(struct ib_cma_conn *cma_conn, void **cma_id) >> { >> status = ib_at_route_by_ip(dst_ip, 0, 0, 0, &cma_ctx->cma_route, >> &cma_ctx->ibat_comp); >> }; > > I still think that it may be better for the user to get the route/path > separately from establishing a connection. This simplifies the internal > state handling, and I believe maps better to the user allocating the QP, > transitioning it to the INIT state, and pre-posting receive buffers. An > application may want to change its behavior based on its path (such as > MTU or data rate). Integrating this in with the connect call requires > applications that want to do this to operate with the lower level > connection interfaces. The problems I see with the asynchronous route/ARP model in the generic CM API are that consumers will have to deal with complicated caching (which can be implemented in the AT module) and that it is not native to iWARP. What do other people think about it? > Also, based on previous discussions, I think that we need to come up > with a way to map the destination IP address to a route that doesn't > involve ATS.
This way (ib_at_route_by_ip) doesn't involve ATS, it uses ipoib (when called with flag IB_AT_ROUTE_USE_DEFAULTS==0) > We'll also need to define the best way to store the IP address in the CM > private data. I remember that Yaron and Roland both mentioned methods > for doing this, and we may need to make this visible through the API. OK Guy From mst at mellanox.co.il Wed Sep 21 01:02:30 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 21 Sep 2005 11:02:30 +0300 Subject: [openib-general] RFC: struct netdevice changes for IPoIB UC support Message-ID: <20050921080230.GE18449@mellanox.co.il> Hi! I am working on IP over InfiniBand net device support. Existing code in mainline kernel only supports UD (unreliable datagram) mode of operation, with max MTU of 2Kbyte. I'm looking into support for UC (unreliable connected) mode of operation, which can support MTU with theoretical limit up to 2Gbyte. As was discussed on the openib list, one of the difficulties with IP over IB support for UC mode is the fact that the same device has to support sending both UC (max MTU 2Gbyte) and UD (max MTU 2Kbyte) packets, depending on packet link address. I propose the following simple patch to let the netdevice override the path MTU per dst entry. The patch was tested by modifying existing IPoIB code to use MTU of 1K for some addresses, and 2K for others. Please comment on this approach: does it make sense to you guys? Please Cc me directly, I'm not on the list. Thanks a bunch, MST --- Make it possible for a network device to support more than one MTU value at a time (depending on packet link address, or other criteria). Signed-off-by: Michael S.
Tsirkin Index: linux-2.6.12.5/include/linux/netdevice.h =================================================================== --- linux-2.6.12.5.orig/include/linux/netdevice.h +++ linux-2.6.12.5/include/linux/netdevice.h @@ -454,6 +454,10 @@ struct net_device #define HAVE_CHANGE_MTU int (*change_mtu)(struct net_device *dev, int new_mtu); +#define HAVE_GET_MTU + u32 (*get_mtu)(struct net_device *dev, + struct neighbour *neigh, + int path_mtu); #define HAVE_TX_TIMEOUT void (*tx_timeout) (struct net_device *dev); Index: linux-2.6.12.5/include/net/dst.h =================================================================== --- linux-2.6.12.5.orig/include/net/dst.h +++ linux-2.6.12.5/include/net/dst.h @@ -111,7 +111,12 @@ dst_metric(const struct dst_entry *dst, static inline u32 dst_mtu(const struct dst_entry *dst) { - u32 mtu = dst_metric(dst, RTAX_MTU); + u32 mtu; + if (dst->dev && dst->dev->get_mtu) + mtu = dst->dev->get_mtu(dst->dev, dst->neighbour, + dst_metric(dst, RTAX_MTU)); + else + mtu = dst_metric(dst, RTAX_MTU); /* * Alexey put it here, so ask him about it :) */ -- MST From herbert at gondor.apana.org.au Wed Sep 21 01:15:23 2005 From: herbert at gondor.apana.org.au (Herbert Xu) Date: Wed, 21 Sep 2005 18:15:23 +1000 Subject: [openib-general] Re: RFC: struct netdevice changes for IPoIB UC support In-Reply-To: <20050921080230.GE18449@mellanox.co.il> Message-ID: Michael S. Tsirkin wrote: > > Please comment on this approach: does it make sense to you guys? > Please Cc me directly, I'm not on the list. Sorry, this doesn't make sense. > static inline u32 dst_mtu(const struct dst_entry *dst) > { > - u32 mtu = dst_metric(dst, RTAX_MTU); > + u32 mtu; > + if (dst->dev && dst->dev->get_mtu) > + mtu = dst->dev->get_mtu(dst->dev, dst->neighbour, > + dst_metric(dst, RTAX_MTU)); > + else > + mtu = dst_metric(dst, RTAX_MTU); From this I gather that for a given dst the MTU is actually constant. That is, it only varies across different dst's.
In this case you should calculate the correct MTU when the dst is created rather than here. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt From hch at lst.de Wed Sep 21 01:43:00 2005 From: hch at lst.de (Christoph Hellwig) Date: Wed, 21 Sep 2005 10:43:00 +0200 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, In-Reply-To: <43309F3F.3060009@mellanox.com> References: <43309F3F.3060009@mellanox.com> Message-ID: <20050921084300.GA21715@lst.de> > + if ((dma_addr & (PAGE_SIZE - 1)) || > + ((dma_addr + dma_len) & (PAGE_SIZE - 1)) || > + ((i == (sg_cnt - 1)) && !unaligned)) { > + srp_fmr->io_addr = dma_addr & PAGE_MASK; > + ++unaligned; > + } > + > + if (unaligned <= 1) { > + cur_len += dma_len; > + for (base_addr = dma_addr; > + (dma_addr & PAGE_MASK) <= > + ((base_addr + dma_len - 1) & PAGE_MASK); > + dma_addr += PAGE_SIZE) > + dma_pages[page_cnt++] = dma_addr & PAGE_MASK; > + } > + > + if ((unaligned > 1) || (i == (sg_cnt - 1))) { this is definitely completely broken. dma_addr_ts are opaque handles, some platforms use high bits in them for iommu flags. From hch at lst.de Wed Sep 21 01:44:35 2005 From: hch at lst.de (Christoph Hellwig) Date: Wed, 21 Sep 2005 10:44:35 +0200 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, In-Reply-To: <52wtlbmh49.fsf@cisco.com> References: <43309F3F.3060009@mellanox.com> <52wtlbmh49.fsf@cisco.com> Message-ID: <20050921084435.GB21715@lst.de> On Tue, Sep 20, 2005 at 04:52:54PM -0700, Roland Dreier wrote: > Thanks, I haven't read all the FMR stuff through yet, but a few quick comments: > > > + support more than default 8 luns per target. Should we have max_luns > > as module param?
How about cmds_per_lun, max_sectors, max_targets as > > module params as well > > I think it makes more sense to handle this the same way I handled > max_sectors: make it a per-target parameter passed in when connecting > to the target. We could make cmds_per_lun a similar parameter, but > are there likely to be any SRP targets that need this to be limited? > Also, what is max_targets? Why do we need a limited max_luns at all? I hope all SRP targets properly support REPORT_LUNS in which case we couldn't care less about a maximum LUN limit. And even if they don't I hope they handle scanning the first non-existent LUN gracefully. From hch at lst.de Wed Sep 21 01:44:59 2005 From: hch at lst.de (Christoph Hellwig) Date: Wed, 21 Sep 2005 10:44:59 +0200 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, In-Reply-To: <52k6hbmenl.fsf@cisco.com> References: <43309F3F.3060009@mellanox.com> <52wtlbmh49.fsf@cisco.com> <4330A716.4030204@mellanox.com> <52k6hbmenl.fsf@cisco.com> Message-ID: <20050921084459.GC21715@lst.de> On Tue, Sep 20, 2005 at 05:46:06PM -0700, Roland Dreier wrote: > Vu> OK we can do the same way that you handled max_sectors. SRP > Vu> targets may prefer a specific cmds_per_lun to reach max > Vu> sequential performance. max_targets == max_id > > Does max_id matter at all for SRP? We only use scsi_scan_target() to > find targets, so I'm not sure where the SCSI midlayer even looks at > max_id for our case. It shouldn't. From eitan at mellanox.co.il Wed Sep 21 02:14:00 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: 21 Sep 2005 12:14:00 +0300 Subject: [openib-general] [PATCH] osm: backward compat autoconf on 64bit machine Message-ID: <86ll1qvl47.fsf@mtl066.yok.mtl.com> Hi Hal The attached patch allows compiling OpenSM in gen1 mode on 64 bit machine and supports customized driver installation dir through the use of MTHOME and TSHOME environment variables. These are only needed by non-standard installations.
Thanks Eitan Signed-off-by: Eitan Zahavi Index: config/osmvsel.m4 =================================================================== --- config/osmvsel.m4 (revision 3488) +++ config/osmvsel.m4 (working copy) @@ -34,19 +34,31 @@ elif test $with_osmv = "sim" ; then OSMV_LDADD="-L$with_sim/lib -libmscli" elif test $with_osmv = "gen1"; then OSMV_CFLAGS="-DOSM_VENDOR_INTF_TS" - OSMV_INCLUDES="-I/usr/local/ibgd/driver/infinihost/include -I\$(srcdir)/../include" + + if test -z $MTHOME; then + MTHOME=/usr/local/ibgd/driver/infinihost + fi + + OSMV_INCLUDES="-I$MTHOME/include -I\$(srcdir)/../include" dnl we need to find the TS includes somewhere... - osmv_dir=`uname -r|sed 's/smp//'` - osmv_dir_smp=`uname -r` osmv_found=0 - for d in /usr/src/$osmv_dir /usr/src/$osmv_dir_smp /lib/modules/$osmv_dir/build /lib/modules/$osmv_dir_smp/build/; do - if test -d $d/drivers/infiniband/include; then + if test -z $TSHOME; then + osmv_dir=`uname -r|sed 's/-smp//'` + osmv_dir_smp=`uname -r` + for d in /usr/src/linux-$osmv_dir /usr/src/linux-$osmv_dir_smp /lib/modules/$osmv_dir/build /lib/modules/$osmv_dir_smp/build/; do + if test -f $d/drivers/infiniband/include/ts_ib_useraccess.h; then OSMV_INCLUDES="$OSMV_INCLUDES -I$d/drivers/infiniband/include" osmv_found=1 fi done - if test -z $osmv_found; then + else + if test -f $TSHOME/ts_ib_useraccess.h; then + OSMV_INCLUDES="$OSMV_INCLUDES -I$TSHOME" + osmv_found=1 + fi + fi + if test $osmv_found = 0; then AC_MSG_ERROR([Fail to find gen1 include files dir]) fi OSMV_LDADD="-L/usr/local/ibgd/driver/infinihost/lib -lvapi -lmosal -lmtl_common -lmpga" @@ -86,7 +98,7 @@ if test "$disable_libcheck" != "yes"; th AC_MSG_ERROR([ibms_bind() not found. 
libosmvendor of type sim requires libibmscli.])) elif test $with_osmv = "gen1"; then osmv_save_ldflags=$LDFALGS - LDFLAGS="$LDFLAGS -L/usr/local/ibgd/driver/infinihost/lib -lmosal -lmtl_common -lmpga" + LDFLAGS="$LDFLAGS -L$MTHOME/lib -L$MTHOME/lib64 -lmosal -lmtl_common -lmpga" AC_CHECK_LIB(vapi, vipul_init, [], AC_MSG_ERROR([vipul_init() not found. libosmvendor of type gen1 requires libvapi.])) LD_FLAGS=$osmv_save_ldflags
From halr at voltaire.com Wed Sep 21 03:26:02 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Sep 2005 06:26:02 -0400 Subject: [openib-general] Re: [PATCH] osm: backward compat autoconf on 64bit machine In-Reply-To: <86ll1qvl47.fsf@mtl066.yok.mtl.com> References: <86ll1qvl47.fsf@mtl066.yok.mtl.com> Message-ID: <1127298361.4426.292.camel@hal.voltaire.com> On Wed, 2005-09-21 at 05:14, Eitan Zahavi wrote: > The attached patch allows compiling OpenSM in gen1 mode > on 64 bit machine and supports customized driver installation dir > through the use of MTHOME and TSHOME environmnet variables. > These are only needed by non standard installations. Thanks. Applied. -- Hal From guyg at voltaire.com Wed Sep 21 03:58:24 2005 From: guyg at voltaire.com (Guy German) Date: Wed, 21 Sep 2005 13:58:24 +0300 Subject: [openib-general][RFC][CMA]: ib_cma_get_device hot unplug issue In-Reply-To: <4330366B.6020702@ichips.intel.com> References: <43301011.1000200@voltaire.com> <4330366B.6020702@ichips.intel.com> Message-ID: <43313CD0.90608@voltaire.com> Sean Hefty wrote: > Requiring users to validate that a pointer returned from a function is > valid seems like a poor API design. Returning a GUID in that case seems > like a better approach, so that the user is forced to perform the > required lookup. We need to make it easy for users. rdma_cm_get_guid (by dst_ip) sounds like a good idea to me. Although there is not much difference for the user between finding a device by guid and validating a device pointer (functionality wise), I agree the former does look cleaner. > We could also require users to pass in a device structure as input and > let the calls fail if lookup fails. For example, we could add calls to > get the IP addresses associated with a particular device port. 
I am not quite sure I follow you here, but I think it is not enough that lookup succeeds, you need to synchronize between the removal cb and the verbs calls, via locking mechanism of some sort (lock before device free and lock before calling ib_query_port, for example) If the cma would be the one actually registered as the client (to get the removal callbacks), the consumer would practically not be able to call verbs like ib_create_pd, ib_create_cq, ib_query_port etc ... Any way, I don't think it is a good idea to register the cma layer as a device client, if we want it to be a thin mid layer. Guy From mst at mellanox.co.il Wed Sep 21 04:51:20 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 21 Sep 2005 14:51:20 +0300 Subject: [openib-general] Re: some questions: sdp/perftest/hyperthreading In-Reply-To: <20050920202811.728d5bd6@marvin.local> References: <20050920202811.728d5bd6@marvin.local> Message-ID: <20050921115120.GF19932@mellanox.co.il> Quoting r. umaxx : > Subject: some questions: sdp/perftest/hyperthreading > > Hello, > > I have a couple of questions after playing around with OpenIB these > days, hope you can help me. > > 1. from userspace/libsdp/src/port.c: > > ... > printf("default libsdp configuration is used\n"); > #define LIBSDP_DEFAULT_CONFIG_FILE "/usr/local/ibgd/etc/libsdp.conf" > __sdp_read_config(LIBSDP_DEFAULT_CONFIG_FILE); > ... > > Why is LIBSDP_DEFAULT_CONFIG_FILE pointing to this location? > Shouldn't this be something more useful like $sysconf? > And maybe you can add a note to README / or a comment to > libsdp.conf - that it is possible to set the environment > variable LIBSDP_CONFIG_FILE? Yes, that needs to be fixed. Patch? > 3. In the SDP-slide > (http://openib.org/docs/oib_wkshp_082205/das_SDP_Linux.pdf) from the > workshop is a measurement with perftest on a Dual-Xeon with enabled HT > done. Was there SMP-Support enabled in the kernel? Yes. 
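On the first question, the fix could look roughly like the sketch below: honour the LIBSDP_CONFIG_FILE environment variable when it is set, and otherwise fall back to a file under the configured sysconfdir. SDP_SYSCONFDIR and sdp_config_path() are hypothetical names, not part of libsdp; an autoconf build would typically pass the directory in with -D rather than hard-coding it.

```c
/* setenv()/unsetenv(), handy when exercising this helper, are POSIX. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdlib.h>

/* Hypothetical compile-time default; a Makefile.am could instead pass
 * -DSDP_SYSCONFDIR=\"$(sysconfdir)\". */
#ifndef SDP_SYSCONFDIR
#define SDP_SYSCONFDIR "/usr/local/etc"
#endif

static const char sdp_default_config[] = SDP_SYSCONFDIR "/libsdp.conf";

/* Return the config file to read: $LIBSDP_CONFIG_FILE if it is set and
 * non-empty, otherwise the sysconfdir default. */
static const char *sdp_config_path(void)
{
    const char *env = getenv("LIBSDP_CONFIG_FILE");

    if (env != NULL && env[0] != '\0')
        return env;
    return sdp_default_config;
}
```

The result would then be handed to __sdp_read_config() in place of the hard-coded /usr/local/ibgd path.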
-- MST From ebiederm at xmission.com Wed Sep 21 06:48:28 2005 From: ebiederm at xmission.com (Eric W. Biederman) Date: Wed, 21 Sep 2005 07:48:28 -0600 Subject: [openib-general] Re: [PATCH] [NET] socket.c: zero socket addresses before use. In-Reply-To: <20050921.001321.78997430.davem@davemloft.net> (David S. Miller's message of "Wed, 21 Sep 2005 00:13:21 -0700 (PDT)") References: <20050912.154527.48978091.davem@davemloft.net> <20050921.001321.78997430.davem@davemloft.net> Message-ID: "David S. Miller" writes: > From: ebiederm at xmission.com (Eric W. Biederman) > Date: Tue, 20 Sep 2005 11:18:23 -0600 > >> Dave I don't know if this is part of what you want but >> zeroing the socket address buffer before use seem to be implied >> by what you were asking for. So here is an additional patch >> to implement that. >> >> This is a paranoid precaution to guard against accidental >> information leaks to user space or other consumers/producers >> may fail to properly fail to set or read the hardware >> address length. af_packet over ethernet has had at least >> has one small but in this respect. > > I think this patch might be a bit overkill, but thanks for cooking it > up. I'm willing to be convinced otherwise though :-) Personally I agree. But if we don't we probably need to audit all of the other protocols besides af_packet and see if they could possible have the problem that af_packet did with struct sockaddr_ll. What happened is struct sockaddr_ll had an array of 8 bytes where it placed a hardware address. It reported to socket.c the length of struct sockaddr_ll for returning to user space. The code then only filled in enough bytes for the actual hardware address. 6 bytes in the common case of ethernet. So only 6 bytes were ever written and we returned 8 bytes to user space. The transmission is similar except the kernel code was responsible for simply not caring about the additional bytes. But confused kernel code could have the problem. 
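The failure mode Eric describes can be reproduced in a few lines of userspace C. The struct below is a simplified, hypothetical stand-in for struct sockaddr_ll that keeps only the fields that matter here: room for an 8-byte hardware address, of which Ethernet fills 6, while the whole structure is what gets copied out. Without a memset() first, the two unused bytes (and any padding) keep whatever the buffer held before.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct sockaddr_ll: an 8-byte hardware address
 * field of which Ethernet only uses halen = 6 bytes. */
struct hw_addr_buf {
    uint16_t family;
    uint8_t  halen;
    uint8_t  addr[8];
};

/* What the buggy path effectively did: write only halen bytes, although
 * sizeof(struct hw_addr_buf) bytes are later handed back to the caller. */
static void fill_hw_addr(struct hw_addr_buf *sa, uint16_t family,
                         const uint8_t *hw, uint8_t halen)
{
    if (halen > sizeof(sa->addr))
        halen = sizeof(sa->addr);       /* never overrun addr[] */
    sa->family = family;
    sa->halen = halen;
    memcpy(sa->addr, hw, halen);
}

/* The paranoid variant: clear the whole buffer, padding included,
 * before filling it, so nothing stale can leak out. */
static void fill_hw_addr_zeroed(struct hw_addr_buf *sa, uint16_t family,
                                const uint8_t *hw, uint8_t halen)
{
    memset(sa, 0, sizeof(*sa));
    fill_hw_addr(sa, family, hw, halen);
}
```

Filling a buffer pre-set to 0xAA with a 6-byte MAC leaves addr[6] and addr[7] at 0xAA in the first variant and at 0 in the second.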
This is the same problem you were concerned about with my new struct packet_mreq_max. The only way to take the same precautions on the other code paths is to modify socket.c. I am concerned enough to point out the possibility and send a patch to add some memset() as a cheap insurance plan. Auditing the address handling for all of the rest of the network protocols is more than I am ready to volunteer for. Eric From Federico.Sacerdoti at deshaw.com Wed Sep 21 08:03:26 2005 From: Federico.Sacerdoti at deshaw.com (Sacerdoti, Federico) Date: Wed, 21 Sep 2005 11:03:26 -0400 Subject: [openib-general] problem compiling openib on Linux 2.6.12 Message-ID: Hi, Using the openib svn code from yesterday I am compiling the Linux kernel 2.6.12 (from kernel.org). When I finish and try to 'modprobe ib_ucm' I see in dmesg: Unknown symbol class_destroy (also for class_create, etc) When compiling again I see the errors: CC [M] drivers/infiniband/core/uat.o drivers/infiniband/core/uat.c: In function `ib_uat_init': drivers/infiniband/core/uat.c:830: warning: implicit declaration of function `class_create' drivers/infiniband/core/uat.c:830: warning: assignment makes pointer from integer without a cast drivers/infiniband/core/uat.c:837: warning: implicit declaration of function `class_device_create' drivers/infiniband/core/uat.c: In function `ib_uat_cleanup': drivers/infiniband/core/uat.c:853: warning: implicit declaration of function `class_device_destroy' drivers/infiniband/core/uat.c:854: warning: implicit declaration of function `class_destroy' CC [M] drivers/infiniband/core/ucm.o Here are the relevant parts of my kernel config: [root at drdab000 kernel.org]# ggrep INFINIBAND config-2.6.12 config-2.6.12:2224:# InfiniBand support config-2.6.12:2226:CONFIG_INFINIBAND=y config-2.6.12:2227:CONFIG_INFINIBAND_USER_MAD=m config-2.6.12:2228:CONFIG_INFINIBAND_USER_ACCESS=m config-2.6.12:2229:CONFIG_INFINIBAND_MTHCA=m config-2.6.12:2230:# CONFIG_INFINIBAND_MTHCA_DEBUG is not set
config-2.6.12:2231:CONFIG_INFINIBAND_IPOIB=m config-2.6.12:2232:# CONFIG_INFINIBAND_IPOIB_DEBUG is not set config-2.6.12:2233:CONFIG_INFINIBAND_SDP=m config-2.6.12:2234:# CONFIG_INFINIBAND_SDP_DEBUG is not set config-2.6.12:2235:# CONFIG_INFINIBAND_SRP is not set Thank you for your help, -Federico From halr at voltaire.com Wed Sep 21 08:07:46 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Sep 2005 11:07:46 -0400 Subject: [openib-general] problem compiling openib on Linux 2.6.12 In-Reply-To: References: Message-ID: <1127315266.4426.574.camel@hal.voltaire.com> On Wed, 2005-09-21 at 11:03, Sacerdoti, Federico wrote: > Hi, > > Using the openib svn code from yesterday I am compiling the Linux kernel > 2.6.12 (from kernel.org). When I finish and try to 'modprobe ib_ucm' > > I see in dmsg: > > Unknown symbol class_destroy > (also for class_create, etc) > > When compiling again I see the errors: > > CC [M] drivers/infiniband/core/uat.o > drivers/infiniband/core/uat.c: In function `ib_uat_init': > drivers/infiniband/core/uat.c:830: warning: implicit declaration of > function `class_create' > drivers/infiniband/core/uat.c:830: warning: assignment makes pointer > from integer without a c > ast > drivers/infiniband/core/uat.c:837: warning: implicit declaration of > function `class_device_cr > eate' > drivers/infiniband/core/uat.c: In function `ib_uat_cleanup': > drivers/infiniband/core/uat.c:853: warning: implicit declaration of > function `class_device_de > stroy' > drivers/infiniband/core/uat.c:854: warning: implicit declaration of > function `class_destroy' A ucm backpatch for this is similar to the one for uat in https://openib.org/svn/gen2/branches/backport/2.6.12/uat_3465_to_2_6_12.patch I don't know if an explicit one for ucm exists. 
-- Hal From caitlinb at broadcom.com Wed Sep 21 08:35:24 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Wed, 21 Sep 2005 08:35:24 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation Message-ID: <54AD0F12E08D1541B826BE97C98F99F1020853@NT-SJCA-0751.brcm.ad.broadcom.com> > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Guy German > Sent: Wednesday, September 21, 2005 12:59 AM > To: Sean Hefty > Cc: Openib > Subject: Re: [openib-general][PATCH][RFC]: CMA IB implementation > > Sean Hefty wrote: > > Guy German wrote: > >> static void cma_route_handler(u64 req_id, void *context, > int rec_num) > >> { > >> status = ib_at_paths_by_route(&cma_ctx->cma_route, 0, > >> &cma_ctx->cma_path, 1, > >> &cma_ctx->ibat_comp); } int > >> ib_cma_connect(struct ib_cma_conn *cma_conn, void **cma_id) { > >> status = ib_at_route_by_ip(dst_ip, 0, 0, 0, > &cma_ctx->cma_route, > >> &cma_ctx->ibat_comp); }; > > > > I still think that it may be better for the user to get the > route/path > > separately from establishing a connection. This simplifies the > > internal state handling, and I believe maps better to the user > > allocating the QP, transitioning it to the INIT state, and > pre-posting > > receive buffers. An application may want to change its > behavior based > > on its path (such as MTU or data rate). Integrating this in > with the > > connect call requires applications that want to do this to operate > > with the lower level connection interfaces. > > The problems I see with asynchronous route/arp model in the > generic cm API is that consumers will have to deal with > complicated caching (that can be implemented in the at > module) and that it is not native to iwarp. > > What do other people think about it ? > Any split in the API needs to allow iWARP providers to implement the first part as a nop. 
iWARP is defined on top of IP transports, and in fact frequently is cleanly layered over at least the IP layer (full clean separation from the TCP layer being a bit less common). So the iWARP implementation may literally have no access to ARP data. Ultimately, that is an extreme example of the caching benefits Guy referenced even for IB providers. My question on this remains: what applications would really use this split? And wouldn't the special requirements of these applications, which I believe are a subset even of applications that know they are using InfiniBand, mean that they would use the IB-specific CM API (with explicit CM MADs)? From mst at mellanox.co.il Wed Sep 21 08:45:38 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 21 Sep 2005 18:45:38 +0300 Subject: [openib-general] Re: problem compiling openib on Linux 2.6.12 In-Reply-To: References: Message-ID: <20050921154538.GJ19932@mellanox.co.il> Quoting r. Sacerdoti, Federico : > Subject: problem compiling openib on Linux 2.6.12 > > Hi, > > Using the openib svn code from yesterday I am compiling the Linux kernel > 2.6.12 (from kernel.org). Hi Federico! Vanilla trunk only supports 2.6.13 now. If you want to use subversion trunk for other kernel versions, you need to apply the backport patch. Look under https://openib.org/svn/gen2/branches/backport/ -- MST From mshefty at ichips.intel.com Wed Sep 21 09:21:50 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 21 Sep 2005 09:21:50 -0700 Subject: [openib-general][RFC][CMA]: ib_cma_get_device hot unplug issue In-Reply-To: <43313CD0.90608@voltaire.com> References: <43301011.1000200@voltaire.com> <4330366B.6020702@ichips.intel.com> <43313CD0.90608@voltaire.com> Message-ID: <4331889E.3080608@ichips.intel.com> Guy German wrote: >> We could also require users to pass in a device structure as input and >> let the calls fail if lookup fails. For example, we could add calls >> to get the IP addresses associated with a particular device port.
> > I am not quite sure I follow you here, but I think it is not enough that > lookup succeeds, you need to synchronize between the removal cb and the > verbs calls, via locking mechanism of some sort (lock before device free > and lock before calling ib_query_port, for example) I don't think that the CMA needs to register as a device client. I was referring to providing a call such as: ib_device_get_ip(struct ib_device, struct sockaddr*, int num_sockaddr); To return all IP addresses associated with the given device. The caller can now coordinate calling this routine with device removal. - Sean From mshefty at ichips.intel.com Wed Sep 21 09:32:19 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 21 Sep 2005 09:32:19 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <433112B7.4050802@voltaire.com> References: <43305F3A.6040403@ichips.intel.com> <433112B7.4050802@voltaire.com> Message-ID: <43318B13.5060802@ichips.intel.com> Guy German wrote: > The problems I see with asynchronous route/arp model in the generic cm > API is that consumers will have to deal with complicated caching (that > can be implemented in the at module) and that it is not native to iwarp. > > What do other people think about it ? I don't think that they need to deal with caching. And in looking at the ib_at code, I think that caching should be removed. A more generic SA path record cache would work better IMO. I would rather see an interface similar to the caching mechanisms in place for verbs, with calls like ib_get_cached_path(), etc. In any case, I think that we should defer adding caching in an initial implementation, and design an appropriate SA cache separately. 
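If such a cache were designed later, a lookup in the style of the existing verbs cache getters (return an error code, fill in an out-parameter) might look roughly like the minimal userspace sketch below. Every name here (struct path_info, path_cache_insert(), path_cache_lookup(), path_cache_flush()) is hypothetical, the record carries only a few illustrative fields, and there is no locking or expiry; the flush stands in for invalidation on SM or LID change events.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define PATH_CACHE_SIZE 16

/* Illustrative subset of a path record. */
struct path_info {
    uint16_t dlid;
    uint8_t  mtu;   /* encoded IB MTU value */
    uint8_t  rate;  /* encoded static rate */
};

struct path_cache_entry {
    int              valid;
    uint8_t          dgid[16];
    struct path_info path;
};

struct path_cache {
    struct path_cache_entry entry[PATH_CACHE_SIZE];
    unsigned int            next;   /* round-robin replacement slot */
};

static struct path_cache_entry *find(struct path_cache *c, const uint8_t dgid[16])
{
    unsigned int i;

    for (i = 0; i < PATH_CACHE_SIZE; i++)
        if (c->entry[i].valid && memcmp(c->entry[i].dgid, dgid, 16) == 0)
            return &c->entry[i];
    return NULL;
}

/* Insert or update the cached path to dgid. */
static void path_cache_insert(struct path_cache *c, const uint8_t dgid[16],
                              const struct path_info *path)
{
    struct path_cache_entry *e = find(c, dgid);

    if (e == NULL) {
        e = &c->entry[c->next];
        c->next = (c->next + 1) % PATH_CACHE_SIZE;
    }
    e->valid = 1;
    memcpy(e->dgid, dgid, 16);
    e->path = *path;
}

/* 0 and *path filled on a hit; -ENOENT on a miss, in which case the
 * caller would fall back to a real SA path record query. */
static int path_cache_lookup(struct path_cache *c, const uint8_t dgid[16],
                             struct path_info *path)
{
    struct path_cache_entry *e = find(c, dgid);

    if (e == NULL)
        return -ENOENT;
    *path = e->path;
    return 0;
}

/* Drop everything, e.g. when an SM change or LID change event arrives. */
static void path_cache_flush(struct path_cache *c)
{
    memset(c, 0, sizeof(*c));
}
```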
- Sean From vuhuong at mellanox.com Wed Sep 21 09:30:57 2005 From: vuhuong at mellanox.com (Vu Pham) Date: Wed, 21 Sep 2005 09:30:57 -0700 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, In-Reply-To: <20050921084435.GB21715@lst.de> References: <43309F3F.3060009@mellanox.com> <52wtlbmh49.fsf@cisco.com> <20050921084435.GB21715@lst.de> Message-ID: <43318AC1.2020208@mellanox.com> Christoph Hellwig wrote: >>I think it makes more sense to handle this the same way I handled >>max_sectors: make it a per-target parameter passed in when connecting >>to the target. We could make cmds_per_lun a similar parameter, but >>are there likely to be any SRP targets that need this to be limited? >>Also, what is max_targets? > > > Why do we need a limited max_luns at all? I hope all SRP targets > propetly support REPORT_LUNS in which case we couldn't care less about > a maximum LUN limit. And even if they don't I hope they handle scanning > the first non-existant LUN gracefully. > I agree. However, we need to override the default value of the scsi_host->max_lun with some big number, i.e. 1024
From ardavis at ichips.intel.com Wed Sep 21 09:39:29 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 21 Sep 2005 09:39:29 -0700 Subject: [openib-general] QP with large starting sequence adds latency to RDMA READ??? In-Reply-To: <4330A3C5.8050504@ichips.intel.com> References: <433099C0.1070408@ichips.intel.com> <1127258772.4426.123.camel@hal.voltaire.com> <43309DA1.1060703@ichips.intel.com> <1127260281.4426.138.camel@hal.voltaire.com> <4330A3C5.8050504@ichips.intel.com> Message-ID: <43318CC1.80803@ichips.intel.com> Sean Hefty wrote: > Hal Rosenstock wrote: > >> Might it be some timeout being hit before this starts working ? >> >> Not sure I understand it but it looks like the CM starting_psn is set >> to ep_ptr->qp_handle->qp_num in the uDAPL CM. > > Seeding the starting_psn to the QPN is new to this latest drop of uDAPL. Every new QPN created in subsequent runs is 0x10000 greater than the previous and this seems to cause a hit of about 40us. This is why I went back and just seeded by hand to see what happens and sure enough for every 0x1000 increment on the seed there is an additional ~40us hit. I am working on a verbs-based test to make sure it is not some uDAPL issue that I introduced. I will also send out a patch to my dtest.c that measures the RDMA reads. > He is over-riding this value before modifying the QP.
> >> Don't know if that has anything to do with this but shouldn't this be >> the same PSN set for the QP ? > > > Not sure what the problem is. Initially, he was seeing a steady > increase in RDMA read latency of 40 us per run. He narrowed it down > to setting the PSN. Hard-coding the PSN to 1 eliminated the latency > issue. Setting it to higher values cause the RDMA read latency to > increase. > > - Sean > From mshefty at ichips.intel.com Wed Sep 21 09:40:01 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 21 Sep 2005 09:40:01 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F1020853@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F1020853@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <43318CE1.10008@ichips.intel.com> Caitlin Bestler wrote: > Any split in the API needs to allow iWARP providers to > implement the first part as a nop. iWARP is defined on > top of IP transports, and in fact frequently is cleanly > layered over at least the IP layer (full clean separation > from the TCP layer being a bit less common). So the > iWARP implementation may literally have no access to > ARP data. If you look at the modified version of the API that I sent it, the iWarp implementation is limited to allocating a data structure, and copying the source and destination IP addresses. If a provider has a requirement to do something different they can. 
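The sketch below shows how little the iWARP side of a split route/connect API has to do. It uses hypothetical names rather than the API Sean posted: the iWARP "route" step only allocates a structure and copies the caller's source and destination addresses, whereas an IB provider would also resolve a path record at this point.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical per-connection route state. */
struct conn_route {
    struct sockaddr_storage src;
    struct sockaddr_storage dst;
};

/* iWARP flavour of the route step: there is nothing to resolve, so just
 * allocate the route and copy both addresses.  Returns NULL on error. */
static struct conn_route *iwarp_resolve_route(const struct sockaddr *src, socklen_t src_len,
                                              const struct sockaddr *dst, socklen_t dst_len)
{
    struct conn_route *route;

    if (src_len > sizeof(route->src) || dst_len > sizeof(route->dst))
        return NULL;
    route = calloc(1, sizeof(*route));
    if (route == NULL)
        return NULL;
    memcpy(&route->src, src, src_len);
    memcpy(&route->dst, dst, dst_len);
    return route;
}
```

Taking explicit lengths, as connect(2) does, avoids reading past the end of a short address such as a struct sockaddr_in.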
- Sean From caitlinb at broadcom.com Wed Sep 21 09:39:49 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Wed, 21 Sep 2005 09:39:49 -0700 Subject: [openib-general][RFC][CMA]: ib_cma_get_device hot unplug issue Message-ID: <54AD0F12E08D1541B826BE97C98F99F1020855@NT-SJCA-0751.brcm.ad.broadcom.com> > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Sean Hefty > Sent: Wednesday, September 21, 2005 9:22 AM > To: Guy German > Cc: Openib-General at Openib.Org > Subject: Re: [openib-general][RFC][CMA]: ib_cma_get_device > hot unplug issue > > Guy German wrote: > >> We could also require users to pass in a device structure as input > >> and let the calls fail if lookup fails. For example, we could add > >> calls to get the IP addresses associated with a particular > device port. > > > > I am not quite sure I follow you here, but I think it is not enough > > that lookup succeeds, you need to synchronize between the > removal cb > > and the verbs calls, via locking mechanism of some sort > (lock before > > device free and lock before calling ib_query_port, for example) > > I don't think that the CMA needs to register as a device > client. I was referring to providing a call such as: > > ib_device_get_ip(struct ib_device, struct sockaddr*, int > num_sockaddr); > > To return all IP addresses associated with the given device. > The caller can now coordinate calling this routine with > device removal. > Why not simply specify the net device, and use the existing look ups to get the devices IP addresses? 
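Caitlin's suggestion relies on machinery that already exists for ordinary net devices. As a userspace analogue, getifaddrs(3) can enumerate the IPv4 and IPv6 addresses bound to a single interface such as ib0. The helper name count_if_addrs() is illustrative, and this sketch does not address how an ib_device would be mapped to its net device.

```c
#include <ifaddrs.h>
#include <string.h>
#include <sys/socket.h>

/* Count the IPv4/IPv6 addresses assigned to the interface named ifname,
 * or return -1 if the address list cannot be read. */
static int count_if_addrs(const char *ifname)
{
    struct ifaddrs *list, *ifa;
    int n = 0;

    if (getifaddrs(&list) != 0)
        return -1;
    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL || strcmp(ifa->ifa_name, ifname) != 0)
            continue;
        if (ifa->ifa_addr->sa_family == AF_INET ||
            ifa->ifa_addr->sa_family == AF_INET6)
            n++;
    }
    freeifaddrs(list);
    return n;
}
```

A real helper would copy the matching ifa_addr entries into a caller-supplied array, in the spirit of the ib_device_get_ip() call Sean floated, instead of only counting them.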
From halr at voltaire.com Wed Sep 21 09:36:53 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Sep 2005 12:36:53 -0400 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <43318B13.5060802@ichips.intel.com> References: <43305F3A.6040403@ichips.intel.com> <433112B7.4050802@voltaire.com> <43318B13.5060802@ichips.intel.com> Message-ID: <1127320550.4426.671.camel@hal.voltaire.com> On Wed, 2005-09-21 at 12:32, Sean Hefty wrote: > I don't think that they need to deal with caching. And in looking at the ib_at > code, I think that caching should be removed. Just for clarity: There is only a placeholder for caching in the AT API. Nothing has been implemented. -- Hal From caitlinb at broadcom.com Wed Sep 21 09:45:03 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Wed, 21 Sep 2005 09:45:03 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation Message-ID: <54AD0F12E08D1541B826BE97C98F99F1020856@NT-SJCA-0751.brcm.ad.broadcom.com> > -----Original Message----- > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Wednesday, September 21, 2005 9:40 AM > To: Caitlin Bestler > Cc: Guy German; Openib > Subject: Re: [openib-general][PATCH][RFC]: CMA IB implementation > > Caitlin Bestler wrote: > > Any split in the API needs to allow iWARP providers to > implement the > > first part as a nop. iWARP is defined on top of IP > transports, and in > > fact frequently is cleanly layered over at least the IP layer (full > > clean separation from the TCP layer being a bit less > common). So the > > iWARP implementation may literally have no access to ARP data. > > If you look at the modified version of the API that I sent > it, the iWarp implementation is limited to allocating a data > structure, and copying the source and destination IP > addresses. If a provider has a requirement to do something > different they can. 
> That's certainly an acceptably low overhead for iWARP IHVs, provided there are applications that want this control and *not* also need even more IB-specific CM control. I still have the same skepticism I had for the IT-API's exposing of paths via a transport neutral API. Namely, is there really any basis to select amongst multiple paths from transport neutral code? The same applies to caching of address translations on a transport neutral basis. Is it really possible to do in any way that makes sense? Wouldn't caching at a lower layer, with transport/device specific knowledge, make more sense? From mshefty at ichips.intel.com Wed Sep 21 09:47:20 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 21 Sep 2005 09:47:20 -0700 Subject: [openib-general][RFC][CMA]: ib_cma_get_device hot unplug issue In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F1020855@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F1020855@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <43318E98.5070601@ichips.intel.com> Caitlin Bestler wrote: >>I don't think that the CMA needs to register as a device >>client. I was referring to providing a call such as: >> >>ib_device_get_ip(struct ib_device, struct sockaddr*, int >>num_sockaddr); >> >>To return all IP addresses associated with the given device. >>The caller can now coordinate calling this routine with >>device removal. >> > Why not simply specify the net device, and use the existing > look ups to get the devices IP addresses? That might work. I was just trying to toss out some ideas. A problem that I see is trying to associate net_device's with ib_device's. I haven't thought of a good way to obtain that information if it's needed, which I'm guessing it will be. 
- Sean From mshefty at ichips.intel.com Wed Sep 21 09:59:03 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 21 Sep 2005 09:59:03 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F1020856@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F1020856@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <43319157.5020708@ichips.intel.com> Caitlin Bestler wrote: > That's certainly an acceptably low overhead for iWARP IHVs, > provided there are applications that want this control and > *not* also need even more IB-specific CM control. I still > have the same skepticism I had for the IT-API's exposing > of paths via a transport neutral API. Namely, is there > really any basis to select amongst multiple paths from > transport neutral code? The same applies to caching of > address translations on a transport neutral basis. Is > it really possible to do in any way that makes sense? > Wouldn't caching at a lower layer, with transport/device > specific knowledge, make more sense? I guess I view this API slightly differently than being just a transport neutral connection interface. I also see it as a way to connect over IB using IP addresses, which today is only possible if using ib_at. That is, the API could do both. - Sean From mshefty at ichips.intel.com Wed Sep 21 10:03:20 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 21 Sep 2005 10:03:20 -0700 Subject: [openib-general][RFC][CMA]: ib_cma_get_device hot unplug issue In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F1020859@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F1020859@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <43319258.7070509@ichips.intel.com> Caitlin Bestler wrote: > Doesn't IPoIB have to associate a net_device with the ib_device? It does, but this API and the ULPs using it are sitting over verbs. 
I'm not sure how to get from an IP address to the ib_device, but I haven't studied this in detail yet to know what an implementation would look like. This may not be hard, I just don't know how to do it myself. - Sean From arlin.r.davis at intel.com Wed Sep 21 10:24:06 2005 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 21 Sep 2005 10:24:06 -0700 Subject: [openib-general] [PATCH] uDAPL dtest changes to measure RDMA reads Message-ID: James, Here is a patch to improve dtest and measure RDMA reads. Attachment included. -arlin Signed-off by: Arlin Davis Index: test/dtest/dtest.c =================================================================== --- test/dtest/dtest.c (revision 3459) +++ test/dtest/dtest.c (working copy) @@ -42,10 +42,12 @@ #include #ifndef DAPL_PROVIDER -#define DAPL_PROVIDER "OpenIB1_2" +#define DAPL_PROVIDER "OpenIB-ib0" #endif #define MAX_POLLING_CNT 50000 +#define MAX_RDMA_RD 4 +#define MAX_PROCS 1000 /* Header files needed for DAT/uDAPL */ #include "dat/udat.h" @@ -66,7 +68,8 @@ static DAT_CR_HANDLE h_cr = DAT_HANDLE_NULL; static DAT_EVD_HANDLE h_async_evd = DAT_HANDLE_NULL; -static DAT_EVD_HANDLE h_dto_evd = DAT_HANDLE_NULL; +static DAT_EVD_HANDLE h_dto_req_evd = DAT_HANDLE_NULL; +static DAT_EVD_HANDLE h_dto_rcv_evd = DAT_HANDLE_NULL; static DAT_EVD_HANDLE h_cr_evd = DAT_HANDLE_NULL; static DAT_EVD_HANDLE h_conn_evd = DAT_HANDLE_NULL; static DAT_CNO_HANDLE h_dto_cno = DAT_HANDLE_NULL; @@ -123,7 +126,8 @@ double epc; double epf; double rdma_wr; - double rdma_rd; + double rdma_rd[MAX_RDMA_RD]; + double rdma_rd_total; double rtt; double close; } time; @@ -137,7 +141,7 @@ static int polling=1; static int poll_count=0; static int rdma_wr_poll_count=0; -static int rdma_rd_poll_count=0; +static int rdma_rd_poll_count[MAX_RDMA_RD]={0}; static int pin_memory=0; static int delay=0; static int buf_len=RDMA_BUFFER_SIZE; @@ -147,9 +151,6 @@ static int burst_msg_posted=0; static int burst_msg_index=0; -#define MAX_RDMA_RD 4 -#define MAX_PROCS 1000 
- static pid_t child[MAX_PROCS+1]; /* forward prototypes */ @@ -181,7 +182,7 @@ main(int argc, char **argv) { - int c; + int i,c; DAT_RETURN ret; /* parse arguments */ @@ -191,7 +192,6 @@ { case 's': server = 1; - printf("%d Running as server\n",getpid()); fflush(stdout); break; case 'c': @@ -310,7 +310,8 @@ ep_attr.ep_provider_specific = NULL; start = get_time(); - ret = dat_ep_create( h_ia, h_pz, h_dto_evd, h_dto_evd, h_conn_evd, &ep_attr, &h_ep ); + ret = dat_ep_create( h_ia, h_pz, h_dto_rcv_evd, + h_dto_req_evd, h_conn_evd, &ep_attr, &h_ep ); stop = get_time(); time.epc += ((stop - start)*1.0e6); time.total += time.epc; @@ -432,13 +433,16 @@ LOGPRINTF("%d Closed Interface Adaptor\n",getpid()); printf( "\n%d: DAPL Test Complete.\n\n",getpid()); - printf( "%d: RDMA write: Total=%10.2lf usec, %d bursts, itime=%10.2lf usec, pc=%d\n", - getpid(), time.rdma_wr, burst, time.rdma_wr/burst, rdma_wr_poll_count ); - printf( "%d: RDMA read: Total=%10.2lf usec, %d bursts, itime=%10.2lf usec, pc=%d\n", - getpid(), time.rdma_rd, MAX_RDMA_RD, time.rdma_rd/MAX_RDMA_RD, rdma_rd_poll_count ); - printf( "%d: Message RTT: Total=%10.2lf usec, %d bursts, itime=%10.2lf usec, pc=%d\n\n", - getpid(), time.rtt, burst, time.rtt/burst, poll_count ); - + printf( "%d: Message RTT: Total=%10.2lf usec, %d bursts, itime=%10.2lf usec, pc=%d\n", + getpid(), time.rtt, burst, time.rtt/burst, poll_count ); + printf( "%d: RDMA write: Total=%10.2lf usec, %d bursts, itime=%10.2lf usec, pc=%d\n", + getpid(), time.rdma_wr, burst, + time.rdma_wr/burst, rdma_wr_poll_count ); + for(i=0;i= MSG_BUF_COUNT ) return( DAT_ABORT ); @@ -1087,22 +1090,71 @@ l_iov.pad = 0; l_iov.virtual_address = (DAT_VADDR)(unsigned long)sbuf; l_iov.segment_length = buf_len; - - start = get_time(); + for (i=0;i From arlin.r.davis at intel.com Wed Sep 21 10:27:54 2005 From: arlin.r.davis at intel.com (Arlin Davis) Date: Wed, 21 Sep 2005 10:27:54 -0700 Subject: [openib-general] [PATCH] uDAPL workaround for RDMA read performance 
anomaly Message-ID: James, Here is a workaround for the RDMA read performance anomaly until we figure out what is going on. -arlin Signed-off by: Arlin Davis Index: dapl/openib/dapl_ib_qp.c =================================================================== --- dapl/openib/dapl_ib_qp.c (revision 3512) +++ dapl/openib/dapl_ib_qp.c (working copy) @@ -357,7 +357,9 @@ return(dapl_convert_errno(errno," qp_cRTR")); attr.path_mtu = IBV_MTU_1024; - attr.rq_psn = qp_handle->qp_num; + /* RDMA read performance anomaly, high value psn ? */ + /* attr.rq_psn = qp_handle->qp_num */; + attr.rq_psn = 1; break; case IBV_QPS_RTS: @@ -368,6 +370,7 @@ if (ib_cm_init_qp_attr(conn->cm_id, &attr, &mask)) return(dapl_convert_errno(errno," qp_cRTS")); + attr.sq_psn = 1; break; default: From rolandd at cisco.com Wed Sep 21 12:22:15 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 21 Sep 2005 12:22:15 -0700 Subject: [openib-general] [RFC] libibverbs completion event handling Message-ID: <521x3imdjs.fsf@cisco.com> While thinking about how to handle some of the issues raised by Al Viro in , I realized that our verbs interface could be improved to make delivery of completion events more flexible. For example, Arlin's request for using one FD for each CQ can be accommodated quite nicely. The basic idea is to create new objects that I call "completion vectors" and "completion channels." Completion vectors refer to the interrupt generated when a completion event occurs. With the current drivers, there will always be a single completion vector, but once we have full MSI-X support, multiple completion vectors will be possible. Orthogonal to this is the notion of a completion channel. This is an FD used for delivering completion events to userspace. Completion vectors are handled by the kernel, and userspace cannot change the number of vectors that are available.
On the other hand, completion channels are created at the request of a userspace process, and userspace can create as many channels as it wants. Every userspace CQ has a completion vector and a completion channel. Multiple CQs can share the same completion vector and/or the same completion channel. CQs with different completion vectors can still share a completion channel, and vice versa. The exact API would be something like the below. Thoughts? Thanks, Roland struct ibv_comp_channel { int fd; }; /** * ibv_create_comp_channel - Create a completion event channel */ extern struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context); /** * ibv_destroy_comp_channel - Destroy a completion event channel */ extern int ibv_destroy_comp_channel(struct ibv_comp_channel *channel); /** * ibv_create_cq - Create a completion queue * @context - Context CQ will be attached to * @cqe - Minimum number of entries required for CQ * @cq_context - Consumer-supplied context returned for completion events * @channel - Completion channel where completion events will be queued. * May be NULL if completion events will not be used. * @comp_vector - Completion vector used to signal completion events. * Must be >= 0 and < context->num_comp_vectors. */ extern struct ibv_cq *ibv_create_cq(struct ibv_context *context, int cqe, void *cq_context, struct ibv_comp_channel *channel, int comp_vector); /** * ibv_get_cq_event - Read next CQ event * @channel: Channel to get next event from. * @cq: Used to return pointer to CQ. * @cq_context: Used to return consumer-supplied CQ context. * * All completion events returned by ibv_get_cq_event() must * eventually be acknowledged with ibv_ack_cq_events(). 
*/ extern int ibv_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq, void **cq_context); From kingman at austin.rr.com Wed Sep 21 12:24:12 2005 From: kingman at austin.rr.com (John Kingman) Date: Wed, 21 Sep 2005 14:24:12 -0500 (CDT) Subject: [openib-general] [PATCH] [CM] CM DREQ's need the redirected qpn Message-ID: Another piece of CM redirection. If the remote CM has been redirected, need to send DREQs there too. Tested with our target. Signed-off-by: John Kingman Index: cm.c =================================================================== --- cm.c (revision 3502) +++ cm.c (working copy) @@ -1699,7 +1699,7 @@ static void cm_format_dreq(struct cm_dre cm_form_tid(cm_id_priv, CM_MSG_SEQUENCE_DREQ)); dreq_msg->local_comm_id = cm_id_priv->id.local_id; dreq_msg->remote_comm_id = cm_id_priv->id.remote_id; - cm_dreq_set_remote_qpn(dreq_msg, cm_id_priv->remote_qpn); + cm_dreq_set_remote_qpn(dreq_msg, cm_id_priv->id.remote_cm_qpn); if (private_data && private_data_len) memcpy(dreq_msg->private_data, private_data, private_data_len); From xma at us.ibm.com Wed Sep 21 12:28:11 2005 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 21 Sep 2005 12:28:11 -0700 Subject: [openib-general] problems on device/ports initialization Message-ID: If for any reason, some ports of the IB device are not ready during device initialization, none of the ports on that device are being initialized. (see ib_mad_device_init()). In other words the device is not usable by this limitation. Also, once the unready ports are available later, IB should allow these ports to be initialized later. Any objection? Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638
From mshefty at ichips.intel.com Wed Sep 21 12:28:21 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 21 Sep 2005 12:28:21 -0700 Subject: [openib-general] [PATCH] [CM] CM DREQ's need the redirected qpn In-Reply-To: References: Message-ID: <4331B455.7020504@ichips.intel.com> John Kingman wrote: > Another piece of CM redirection. If the remote CM has been redirected, > need to send DREQs there too. Tested with our target. > > Signed-off-by: John Kingman > > Index: cm.c > =================================================================== > --- cm.c (revision 3502) > +++ cm.c (working copy) > @@ -1699,7 +1699,7 @@ static void cm_format_dreq(struct cm_dre > cm_form_tid(cm_id_priv, CM_MSG_SEQUENCE_DREQ)); > dreq_msg->local_comm_id = cm_id_priv->id.local_id; > dreq_msg->remote_comm_id = cm_id_priv->id.remote_id; > - cm_dreq_set_remote_qpn(dreq_msg, cm_id_priv->remote_qpn); > + cm_dreq_set_remote_qpn(dreq_msg, cm_id_priv->id.remote_cm_qpn); The remote QPN in the DREQ is supposed to be the QPN of the connection, and not the CM's QP. I think that the original code is correct. - Sean From Don.Dhondt at Bull.com Wed Sep 21 12:34:10 2005 From: Don.Dhondt at Bull.com (Don.Dhondt at Bull.com) Date: Wed, 21 Sep 2005 12:34:10 -0700 Subject: [openib-general] Problem Building OpenIB on 2.6.12 kernel In-Reply-To: <52br2nm4b1.fsf@cisco.com> Message-ID: Hi >> Don> Well, I'm told by Jerome it is in <.ipoib_main.o.d> > That's just the automatically generated dependency file. I'd like to > understand what's causing gcc to include namei.h when you compile > ipoib_main.c. As far as I can tell from a quick grep of the kernel > source, linux/namei.h is not included indirectly through any other > Linux include files: I found the problem - it is in the patch that we apply for the Lustre product. If I keep this patch out, IPoIB builds correctly. We will be studying the patch to see what we can change ...
And the backport patch that Hal mentioned in a previous email (https://openib.org/svn/gen2/branches/backport/) took care of 4 unresolved symbols in ib_uat.ko (class_destroy, class_create, class_device_create, class_device_destroy). So all drivers are now built - thank you both for your help. Jerome From mshefty at ichips.intel.com Wed Sep 21 12:38:18 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 21 Sep 2005 12:38:18 -0700 Subject: [openib-general] [RFC] libibverbs completion event handling In-Reply-To: <521x3imdjs.fsf@cisco.com> References: <521x3imdjs.fsf@cisco.com> Message-ID: <4331B6AA.40704@ichips.intel.com> Roland Dreier wrote: > While thinking about how to handle some of the issues raised by Al > Viro in , I realized that our Reminds me why I never read lkml. > The exact API would be something like the below. Thoughts? The changes look good to me, but I can't say that I fully understand all of the issues that were raised in the thread. - Sean From jlentini at netapp.com Wed Sep 21 12:40:24 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 21 Sep 2005 15:40:24 -0400 (EDT) Subject: [openib-general] Re: [PATCH] uDAPL workaround for RDMA read performance anomaly In-Reply-To: References: Message-ID: On Wed, 21 Sep 2005, Arlin Davis wrote: arlin> James, arlin> arlin> Here is a workaround for the RDMA read performance anomaly arlin> until we figure out what is going on. arlin> arlin> -arlin Committed in revision 3513.
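The following is background for the PSN workaround committed above and is not a diagnosis of the anomaly. An InfiniBand packet sequence number is a 24-bit field. Any starting value, whether it is a constant such as 1 or a seed such as a QP number, therefore has to be masked to 24 bits. Each side's send PSN (`sq_psn`) must equal the peer's expected receive PSN (`rq_psn`), and PSNs compare modulo 2^24. The helper names below are hypothetical, and the code is a minimal sketch of that arithmetic only:

```c
#include <stdint.h>

#define PSN_MASK 0xffffffu /* IB packet sequence numbers are 24 bits wide */

/* Truncate an arbitrary seed (e.g. a QP number) to a legal starting PSN. */
static uint32_t psn_init(uint32_t seed)
{
    return seed & PSN_MASK;
}

/* Advance a PSN by n packets, wrapping at 2^24. */
static uint32_t psn_add(uint32_t psn, uint32_t n)
{
    return (psn + n) & PSN_MASK;
}

/*
 * Signed distance from a to b in 24-bit serial-number arithmetic:
 * positive if b is ahead of a, mapped into the range [-2^23, 2^23).
 */
static int32_t psn_diff(uint32_t a, uint32_t b)
{
    int32_t d = (int32_t)((b - a) & PSN_MASK);

    if (d & 0x800000)   /* top bit of the 24-bit result is set */
        d -= 0x1000000; /* so treat it as a negative distance */
    return d;
}

/* A connection is consistent when this side starts sending at the PSN
 * that the peer expects to receive first. */
static int psn_pair_ok(uint32_t local_sq_psn, uint32_t remote_rq_psn)
{
    return (local_sq_psn & PSN_MASK) == (remote_rq_psn & PSN_MASK);
}
```

For example, `psn_diff(0xfffffe, 1)` is 3 because the sequence wraps through 0xffffff and 0. Hard-coding both `rq_psn` and `sq_psn` to 1 on both ends, as the workaround does, trivially satisfies `psn_pair_ok()` in both directions.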
From halr at voltaire.com Wed Sep 21 12:36:29 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Sep 2005 15:36:29 -0400 Subject: [openib-general] problems on device/ports initialization In-Reply-To: References: Message-ID: <1127331388.4426.1008.camel@hal.voltaire.com> On Wed, 2005-09-21 at 15:28, Shirley Ma wrote: > If for any reason, some ports of the IB device are not ready during > device initialization, none of the ports on that device are being > initialized. (see ib_mad_device_init()). In other words the device is > not usable by this limitation. Also, once the unready ports are > available later, IB should allow these ports to be initialized later. > Any objection? Is there some additional indication after the IB device is detected which indicates the ports are "ready" ? -- Hal From ardavis at ichips.intel.com Wed Sep 21 12:56:44 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 21 Sep 2005 12:56:44 -0700 Subject: [openib-general] [RFC] libibverbs completion event handling In-Reply-To: <521x3imdjs.fsf@cisco.com> References: <521x3imdjs.fsf@cisco.com> Message-ID: <4331BAFC.8000006@ichips.intel.com> Roland Dreier wrote: >While thinking about how to handle some of the issues raised by Al >Viro in , I realized that our >verbs interface could be improved to make delivery of completion >events more flexible. For example, Arlin's request for using one FD >for each CQ can be accommodated quite nicely. > > Yes, the API maps very nicely to uDAPL direct CQ wait objects. Thanks. From xma at us.ibm.com Wed Sep 21 13:01:53 2005 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 21 Sep 2005 13:01:53 -0700 Subject: [openib-general] problems on device/ports initialization In-Reply-To: <1127331388.4426.1008.camel@hal.voltaire.com> Message-ID: >Is there some additional indication after the IB device is detected >which indicates the ports are "ready" ? No, the only thing I can think of is when the cable is connected.
Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 Hal Rosenstock 09/21/2005 12:36 PM To Shirley Ma/Beaverton/IBM at IBMUS cc openib-general at openib.org Subject Re: [openib-general] problems on device/ports initialization On Wed, 2005-09-21 at 15:28, Shirley Ma wrote: > If for any reason, some ports of the IB device are not ready during > device initialization, none of the ports on that device are being > initialized. (see ib_mad_device_init()). In other words the device is > not usable by this limitation. Also, once the unready ports are > available later, IB should allow these ports to be initialized later. > Any objection? Is there some additional indication after the IB device is detected which indicates the ports are "ready" ? -- Hal From viswa.krish at gmail.com Wed Sep 21 13:14:36 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Wed, 21 Sep 2005 13:14:36 -0700 Subject: [openib-general] ib_create_cq memory leak? Message-ID: <4df28be40509211314243b4bed@mail.gmail.com> I ran into this issue when using the kernel API to create CQ's. In order to reproduce this problem, I wrote a small kernel module which creates 4K CQ's and destroys them. After running the test (8-10 times), I saw create_cq error with error -12 (ENOMEM). I am attaching the test module source code with Makefiles [root src]# svn info (Latest code) Path: .
URL: https://openib.org/svn/gen2/trunk/src Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd Revision: 3512 Node Kind: directory Schedule: normal Last Changed Author: halr Last Changed Rev: 3511 Last Changed Date: 2005-09-21 08:57:38 -0700 (Wed, 21 Sep 2005) To compile the code, change the KERNELSRC variable in mysock.mak to point to your kernel source tree #make -f mysock.mak #insmod mysock.ko To run the test #echo 1 > /dev/mysock After 8-10 times of running the above, you will see a -12 error on the console. This problem does not occur when you create a single CQ and destroy it immediately in a loop (I tried 100000 times). This occurs when you create 4K CQ's and then destroy it. ib_mthca 0000:05:00.0: Mapped page at 362f9000 to 7e000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 362fa000 to 41000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35d10000 to 7d000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35d11000 to 42000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35f27000 to 7c000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35f28000 to 43000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3593f000 to 7b000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35940000 to 44000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35b56000 to 7a000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35b57000 to 45000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3556d000 to 79000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3556e000 to 46000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35785000 to 78000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35786000 to 47000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3519b000 to 77000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3519c000 to 48000 for ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 7e000 from ICM. 
ib_mthca 0000:05:00.0: Unmapping 1 pages at 7d000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 7c000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 7b000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 7a000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 79000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 78000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 48000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 77000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM. ib_mthca 0000:05:00.0: Mapped page at 35b03000 to 76000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 362ba000 to 75000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35d2c000 to 74000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35c83000 to 73000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35b99000 to 72000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35db0000 to 71000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 356c5000 to 70000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35adc000 to 6f000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35add000 to 49000 for ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 76000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 75000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 74000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 73000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 72000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 71000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 70000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM. 
ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 49000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 6f000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM. ib_mthca 0000:05:00.0: Mapped page at 362cf000 to 6e000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35a0f000 to 6d000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35070000 to 6c000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35e83000 to 6b000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35bd8000 to 6a000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 351ef000 to 69000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35c44000 to 68000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 355db000 to 67000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 355dc000 to 4a000 for ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 6e000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 6d000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 6c000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 6b000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 6a000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 69000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 68000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 4a000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 67000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM. ib_mthca 0000:05:00.0: Mapped page at 35d15000 to 66000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 359d0000 to 65000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 36121000 to 64000 for ICM. 
ib_mthca 0000:05:00.0: Mapped page at 35544000 to 63000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35c15000 to 62000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35e2c000 to 61000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 36342000 to 60000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 358d7000 to 5f000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 358d8000 to 4b000 for ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 66000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 65000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 64000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 63000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 62000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 61000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 60000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 4b000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 5f000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM. ib_mthca 0000:05:00.0: Mapped page at 35b83000 to 5e000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35762000 to 5d000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3595b000 to 5c000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3619e000 to 5b000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3574f000 to 5a000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 354a3000 to 59000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35db8000 to 58000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM. 
ib_mthca 0000:05:00.0: Mapped page at 3520e000 to 57000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3520f000 to 4c000 for ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 5e000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 5d000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 5c000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 5b000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 5a000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 59000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 58000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 4c000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 57000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM. ib_mthca 0000:05:00.0: Mapped page at 35b80000 to 56000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35f01000 to 55000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 361ba000 to 54000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3518d000 to 53000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3560e000 to 52000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 362c1000 to 51000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35fb2000 to 50000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3500b000 to 4f000 for ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 4f000 from ICM. mysock: CQ creation err -12, iter 4091 ib_mthca 0000:05:00.0: Unmapping 1 pages at 56000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 55000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 54000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 53000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 52000 from ICM. 
ib_mthca 0000:05:00.0: Unmapping 1 pages at 51000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 50000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM. mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 mysock: CQ creation err -12, iter 507 -Viswa -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mysock.c Type: application/octet-stream Size: 3373 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: mysock.mak Type: application/octet-stream Size: 92 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Makefile Type: application/octet-stream Size: 64 bytes Desc: not available URL: From viswa.krish at gmail.com Wed Sep 21 13:16:32 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Wed, 21 Sep 2005 13:16:32 -0700 Subject: [openib-general] ib_create_cq memory leak? (Resend) Message-ID: <4df28be40509211316678d95e3@mail.gmail.com> I ran into this issue when using the kernel API to create CQ's. In order to reproduce this problem, I wrote a small kernel module which creates 4K CQ's and destroys them. 
After running the test 8-10 times, I saw create_cq fail with error -12
(ENOMEM). I am attaching the test module source code along with the Makefiles.

[root src]# svn info   (latest code)
Path: .
URL: https://openib.org/svn/gen2/trunk/src
Repository UUID: 21a7a0b7-18d7-0310-8e21-e8b31bdbf5cd
Revision: 3512
Node Kind: directory
Schedule: normal
Last Changed Author: halr
Last Changed Rev: 3511
Last Changed Date: 2005-09-21 08:57:38 -0700 (Wed, 21 Sep 2005)

To compile the code, change the KERNELSRC variable in mysock.mak to point to
your kernel source tree:

#make -f mysock.mak
#insmod mysock.ko

To run the test:

#echo 1 > /dev/mysock

After running the above 8-10 times, you will see a -12 error on the console.

The problem does not occur when you create a single CQ and destroy it
immediately in a loop (I tried 100000 times). It only occurs when you create
4K CQs and then destroy them.

ib_mthca 0000:05:00.0: Mapped page at 362f9000 to 7e000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 362fa000 to 41000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35d10000 to 7d000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35d11000 to 42000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35f27000 to 7c000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35f28000 to 43000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3593f000 to 7b000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35940000 to 44000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35b56000 to 7a000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35b57000 to 45000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3556d000 to 79000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3556e000 to 46000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35785000 to 78000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 35786000 to 47000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM.
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM.
ib_mthca 0000:05:00.0: Mapped page at 3519b000 to 77000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3519c000 to 48000 for ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 7e000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 7d000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 7c000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 7b000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 7a000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 79000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 78000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 48000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 77000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM. ib_mthca 0000:05:00.0: Mapped page at 35b03000 to 76000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 362ba000 to 75000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35d2c000 to 74000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35c83000 to 73000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35b99000 to 72000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35db0000 to 71000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 356c5000 to 70000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35adc000 to 6f000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35add000 to 49000 for ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 76000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 75000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 74000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 73000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 72000 from ICM. 
ib_mthca 0000:05:00.0: Unmapping 1 pages at 71000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 70000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 49000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 6f000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM. ib_mthca 0000:05:00.0: Mapped page at 362cf000 to 6e000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35a0f000 to 6d000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35070000 to 6c000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35e83000 to 6b000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35bd8000 to 6a000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 351ef000 to 69000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35c44000 to 68000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 355db000 to 67000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 355dc000 to 4a000 for ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 6e000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 6d000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 6c000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 6b000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 6a000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 69000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 68000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 4a000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 67000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM. 
ib_mthca 0000:05:00.0: Mapped page at 35d15000 to 66000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 359d0000 to 65000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 36121000 to 64000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35544000 to 63000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35c15000 to 62000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35e2c000 to 61000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 36342000 to 60000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 358d7000 to 5f000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 358d8000 to 4b000 for ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 66000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 65000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 64000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 63000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 62000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 61000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 60000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 4b000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 5f000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM. ib_mthca 0000:05:00.0: Mapped page at 35b83000 to 5e000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35762000 to 5d000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3595b000 to 5c000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3619e000 to 5b000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3574f000 to 5a000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 354a3000 to 59000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35db8000 to 58000 for ICM. 
ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3520e000 to 57000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3520f000 to 4c000 for ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 5e000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 5d000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 5c000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 5b000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 5a000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 59000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 58000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 4c000 from ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 57000 from ICM. ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM. ib_mthca 0000:05:00.0: Mapped page at 35b80000 to 56000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35f01000 to 55000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 361ba000 to 54000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3518d000 to 53000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3560e000 to 52000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 362c1000 to 51000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 35fb2000 to 50000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 26040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 20040000 for ICM. ib_mthca 0000:05:00.0: Mapped 1 chunks/256 KB at 25840000 for ICM. ib_mthca 0000:05:00.0: Mapped page at 3500b000 to 4f000 for ICM. ib_mthca 0000:05:00.0: Unmapping 1 pages at 4f000 from ICM. mysock: CQ creation err -12, iter 4091 ib_mthca 0000:05:00.0: Unmapping 1 pages at 56000 from ICM. 
ib_mthca 0000:05:00.0: Unmapping 1 pages at 55000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 54000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 53000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 52000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 51000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 25840000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 20040000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 1 pages at 50000 from ICM.
ib_mthca 0000:05:00.0: Unmapping 64 pages at 26040000 from ICM.
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507
mysock: CQ creation err -12, iter 507

-Viswa

From kingman at austin.rr.com Wed Sep 21 14:00:10 2005
From: kingman at austin.rr.com (John Kingman)
Date: Wed, 21 Sep 2005 16:00:10 -0500 (CDT)
Subject: [openib-general] [PATCH] [CM] CM DREQ's need the redirected qpn
In-Reply-To: <4331B455.7020504@ichips.intel.com>
References: <4331B455.7020504@ichips.intel.com>
Message-ID:

On Wed, 21 Sep 2005, Sean Hefty wrote:

> John Kingman wrote:
>> Another piece of CM redirection. If the remote CM has been redirected,
>> need to send DREQs there too. Tested with our target.
>>
>> Signed-off-by: John Kingman
>>
>> Index: cm.c
>> ===================================================================
>> --- cm.c    (revision 3502)
>> +++ cm.c    (working copy)
>> @@ -1699,7 +1699,7 @@ static void cm_format_dreq(struct cm_dre
>>                    cm_form_tid(cm_id_priv, CM_MSG_SEQUENCE_DREQ));
>>    dreq_msg->local_comm_id = cm_id_priv->id.local_id;
>>    dreq_msg->remote_comm_id = cm_id_priv->id.remote_id;
>> -  cm_dreq_set_remote_qpn(dreq_msg, cm_id_priv->remote_qpn);
>> +  cm_dreq_set_remote_qpn(dreq_msg, cm_id_priv->id.remote_cm_qpn);
>
> The remote QPN in the DREQ is supposed to be the QPN of the connection, and not
> the CM's QP. I think that the original code is correct.

Am I misreading o12-8?

John

From halr at voltaire.com Wed Sep 21 13:56:46 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Sep 2005 16:56:46 -0400
Subject: [openib-general] problems on device/ports initialization
In-Reply-To:
References:
Message-ID: <1127336205.4426.1177.camel@hal.voltaire.com>

On Wed, 2005-09-21 at 16:01, Shirley Ma wrote:
> >Is there some additional indication after the IB device is detected
> >which indicates the ports are "ready" ?
>
> No, the only thing I can think of is after the cable is connected.

I don't think that matters. What is the initialization failure that is
occurring ? Any error message ?

-- hal

From mshefty at ichips.intel.com Wed Sep 21 14:06:47 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Wed, 21 Sep 2005 14:06:47 -0700
Subject: [openib-general] [PATCH] [CM] CM DREQ's need the redirected qpn
In-Reply-To:
References: <4331B455.7020504@ichips.intel.com>
Message-ID: <4331CB67.6090505@ichips.intel.com>

John Kingman wrote:
>>The remote QPN in the DREQ is supposed to be the QPN of the connection, and not
>>the CM's QP. I think that the original code is correct.
>
> Am I misreading o12-8?

The DREQ field "Remote QPN/EECN" is defined in 12.7.37. It indicates the QP
that is being disconnected.
o12.8 just says that the CM shall support responding to DREQ messages.

My take is that the DREQ will be sent to the redirected QP, but that value in
the DREQ message itself is the QPN of the connection.

- Sean

From xma at us.ibm.com Wed Sep 21 14:52:38 2005
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 21 Sep 2005 14:52:38 -0700
Subject: [openib-general] problems on device/ports initialization
In-Reply-To: <1127336205.4426.1177.camel@hal.voltaire.com>
Message-ID:

> I don't think that matters. What is the initialization failure that is
> occurring ? Any error message ?

We saw an error on Galaxy adapter while ib_mad fails to create QP1 for the
second port, then the first port becomes useless. The right behavior
shouldn't prevent port1 from working.

Second, if the ib_ipoib module is loaded afterwards and the ib interface is
configured, the kernel will hang and oops. Simply adding a port_usable[]
field in ib_device will address this problem.

After connecting the IB cable to port1, the port status changes from DOWN to
ACTIVE. QP1 creation would succeed if the async event handler checked
port_usable[port_num] and called mad_port_open() and agent_port_open() for
that particular port. I haven't tested this out yet.

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From rolandd at cisco.com Wed Sep 21 15:05:39 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 21 Sep 2005 15:05:39 -0700
Subject: [openib-general] problems on device/ports initialization
In-Reply-To: (Shirley Ma's message of "Wed, 21 Sep 2005 14:52:38 -0700")
References:
Message-ID: <52hdcekrf0.fsf@cisco.com>

    Shirley> We saw an error on Galaxy adapter while ib_mad fails to
    Shirley> create QP1 for the second port, then the first port
    Shirley> becomes useless.

Actually this ties into some suspicious code I noticed while reading the
ehca driver. Why do you fail the creation of QP1 just because your port
is not active yet? It should be OK to create QP0 and QP1 before a port
is up. How does Galaxy handle it if someone creates QP1 and then the
port later goes down?

 - R.

From rolandd at cisco.com Wed Sep 21 15:07:19 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 21 Sep 2005 15:07:19 -0700
Subject: [openib-general] ib_create_cq memory leak?
In-Reply-To: <4df28be40509211314243b4bed@mail.gmail.com> (Viswanath Krishnamurthy's message of "Wed, 21 Sep 2005 13:14:36 -0700")
References: <4df28be40509211314243b4bed@mail.gmail.com>
Message-ID: <528xxqkrc8.fsf@cisco.com>

Thanks, I think I see the problem. There seems to be a bug in
mthca_memfree.c, where doorbell records are not reclaimed properly. I
should have a fix for you to test by the end of the week.

 - R.

From halr at voltaire.com Wed Sep 21 15:18:03 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 21 Sep 2005 18:18:03 -0400
Subject: [openib-general] problems on device/ports initialization
In-Reply-To:
References:
Message-ID: <1127341083.15613.47.camel@hal.voltaire.com>

On Wed, 2005-09-21 at 17:52, Shirley Ma wrote:
> > I don't think that matters. What is the initialization failure that is
> > occurring ? Any error message ?
>
> We saw an error on Galaxy adapter while ib_mad fails to create QP1 for
> the second port, then the first port becomes useless.
You are talking about creating QPs working or not, not whether the port is
cabled or not. Are those related on Galaxy ?

> The right behavior shouldn't prevent port1 from working.

So you would prefer to work with the ports which can be started ?

> Second, if the ib_ipoib module is loaded afterwards and the ib interface is
> configured, the kernel will hang and oops. Simply adding a port_usable[]
> field in ib_device will address this problem.

And what determines port_usable ?

> After connecting the IB cable to port1, the port status changes from
> DOWN to ACTIVE. QP1 creation would succeed if the async event handler
> checked port_usable[port_num] and called mad_port_open() and
> agent_port_open() for that particular port. I haven't tested this
> out yet.

Was QP0 creation successful on that "failed" port but QP1 creation relies on
port active ?

-- Hal

From kingman at austin.rr.com Wed Sep 21 15:24:51 2005
From: kingman at austin.rr.com (John Kingman)
Date: Wed, 21 Sep 2005 17:24:51 -0500 (CDT)
Subject: [openib-general] [PATCH] [CM] CM DREQ's need the redirected qpn
In-Reply-To: <4331CB67.6090505@ichips.intel.com>
References: <4331B455.7020504@ichips.intel.com> <4331CB67.6090505@ichips.intel.com>
Message-ID:

On Wed, 21 Sep 2005, Sean Hefty wrote:

> John Kingman wrote:
>> > The remote QPN in the DREQ is supposed to be the QPN of the connection, and not
>> > the CM's QP. I think that the original code is correct.
>>
>> Am I misreading o12-8?
>
> The DREQ field "Remote QPN/EECN" is defined in 12.7.37. It indicates the QP
> that is being disconnected. o12.8 just says that the CM shall support
> responding to DREQ messages.
>
> My take is that the DREQ will be sent to the redirected QP, but that value in
> the DREQ message itself is the QPN of the connection.

You're right, of course. I thought my change had fixed a problem, but when I
backed it out, the problem didn't recur. The problem must have been fixed on
the target side at about the same time.
So, as Gilda Radner used to say ... "Nevermind."

Thanks,
John

From caitlinb at broadcom.com Wed Sep 21 15:33:20 2005
From: caitlinb at broadcom.com (Caitlin Bestler)
Date: Wed, 21 Sep 2005 15:33:20 -0700
Subject: [openib-general] [RFC] libibverbs completion event handling
Message-ID: <54AD0F12E08D1541B826BE97C98F99F1020860@NT-SJCA-0751.brcm.ad.broadcom.com>

I'm not sure I follow what a "completion channel" is.

My understanding is that work completions are stored in user-accessible
memory (typically a ring buffer). This enables fast-path reaping of work
completions. The OS has no involvement unless notifications are enabled.

The "completion vector" is used to report completion notifications. So is
the completion vector a *single* resource used by the driver/verbs to report
completions, where said notifications are then split into user context
dependent "completion channels"?

The RDMAC verbs did not define callbacks to userspace at all. Instead it is
assumed that the proxy for user mode services will receive the callbacks,
and how it relays those notifications to userspace is outside the scope of
the verbs.

Both uDAPL and ITAPI define relays of notifications to AEVDS/CNOs and/or
file descriptors. Forwarding a completion notification to userspace in order
to make a callback in userspace so that it can kick an fd to wake up another
thread doesn't make much sense. The uDAPL/ITAPI/whatever proxy can perform
all of these functions without any device dependencies and in a way that is
fully optimal for the usermode API that is being used.

For kernel clients, I don't see any need for anything beyond the already
defined callbacks direct from the device-dependent code.

Even in the typical case where the usermode application does an evd_wait()
on the DAT or ITAPI endpoint, the DAT/ITAPI proxy will be able to determine
which thread should be woken and could even do so optimally.
It also allows the proxy to implement Access Layer features such as EVD
thresholding without device-specific support.

> -----Original Message-----
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of Roland Dreier
> Sent: Wednesday, September 21, 2005 12:22 PM
> To: openib-general at openib.org
> Subject: [openib-general] [RFC] libibverbs completion event handling
>
> While thinking about how to handle some of the issues raised
> by Al Viro in , I
> realized that our verbs interface could be improved to make
> delivery of completion events more flexible. For example,
> Arlin's request for using one FD for each CQ can be
> accommodated quite nicely.
>
> The basic idea is to create new objects that I call
> "completion vectors" and "completion channels." Completion
> vectors refer to the interrupt generated when a completion
> event occurs. With the current drivers, there will always be
> a single completion vector, but once we have full MSI-X
> support, multiple completion vectors will be possible.
> Orthogonal to this is the notion of a completion channel.
> This is an FD used for delivering completion events to userspace.
>
> Completion vectors are handled by the kernel, and userspace
> cannot change the number of vectors that are available. On the
> other hand, completion channels are created at the request of
> a userspace process, and userspace can create as many
> channels as it wants.
>
> Every userspace CQ has a completion vector and a completion channel.
> Multiple CQs can share the same completion vector and/or the
> same completion channel. CQs with different completion
> vectors can still share a completion channel, and vice versa.
>
> The exact API would be something like the below. Thoughts?
>
> Thanks,
>   Roland
>
> struct ibv_comp_channel {
> 	int			fd;
> };
>
> /**
>  * ibv_create_comp_channel - Create a completion event channel
>  */
> extern struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context);
>
> /**
>  * ibv_destroy_comp_channel - Destroy a completion event channel
>  */
> extern int ibv_destroy_comp_channel(struct ibv_comp_channel *channel);
>
> /**
>  * ibv_create_cq - Create a completion queue
>  * @context - Context CQ will be attached to
>  * @cqe - Minimum number of entries required for CQ
>  * @cq_context - Consumer-supplied context returned for completion events
>  * @channel - Completion channel where completion events will be queued.
>  *     May be NULL if completion events will not be used.
>  * @comp_vector - Completion vector used to signal completion events.
>  *     Must be >= 0 and < context->num_comp_vectors.
>  */
> extern struct ibv_cq *ibv_create_cq(struct ibv_context *context, int cqe,
> 				    void *cq_context,
> 				    struct ibv_comp_channel *channel,
> 				    int comp_vector);
>
> /**
>  * ibv_get_cq_event - Read next CQ event
>  * @channel: Channel to get next event from.
>  * @cq: Used to return pointer to CQ.
>  * @cq_context: Used to return consumer-supplied CQ context.
>  *
>  * All completion events returned by ibv_get_cq_event() must
>  * eventually be acknowledged with ibv_ack_cq_events().
>  */
> extern int ibv_get_cq_event(struct ibv_comp_channel *channel,
> 			    struct ibv_cq **cq, void **cq_context);
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general

From xma at us.ibm.com Wed Sep 21 16:20:28 2005
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 21 Sep 2005 16:20:28 -0700
Subject: [openib-general] problems on device/ports initialization
In-Reply-To: <52hdcekrf0.fsf@cisco.com>
Message-ID:

>Why do you fail the creation of QP1 just because your port is not active
>yet? It should be OK to create QP0 and QP1 before a port is up.

To successfully create QP1, the state will change from INIT->RTR->RTS. Is
that allowed if there is no LID assigned?

> How does Galaxy handle it if someone creates QP1 and then the port later
> goes down?

There is no impact on other active ports.

Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From rolandd at cisco.com Wed Sep 21 16:23:12 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 21 Sep 2005 16:23:12 -0700
Subject: [openib-general] [RFC] libibverbs completion event handling
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F1020860@NT-SJCA-0751.brcm.ad.broadcom.com> (Caitlin Bestler's message of "Wed, 21 Sep 2005 15:33:20 -0700")
References: <54AD0F12E08D1541B826BE97C98F99F1020860@NT-SJCA-0751.brcm.ad.broadcom.com>
Message-ID: <52u0gej99b.fsf@cisco.com>

    Caitlin> I'm not sure I follow what a "completion channel" is. My
    Caitlin> understanding is that work completions are stored in
    Caitlin> user-accessible memory (typically a ring buffer). This
    Caitlin> enables fast-path reaping of work completions. The OS has
    Caitlin> no involvement unless notifications are enabled.

Right. Notifications ("events" in the terminology I used) are the only
thing I was talking about.

    Caitlin> The "completion vector" is used to report completion
    Caitlin> notifications. So is the completion vector a *single*
    Caitlin> resource used by the driver/verbs to report completions,
    Caitlin> where said notifications are then split into user context
    Caitlin> dependent "completion channels"?

Yes.

    Caitlin> The RDMAC verbs did not define callbacks to userspace at
    Caitlin> all. Instead it is assumed that the proxy for user mode
    Caitlin> services will receive the callbacks, and how it relays
    Caitlin> those notifications to userspace is outside the scope of
    Caitlin> the verbs.

That "outside the scope" part is exactly what I'm talking about
implementing here.

 - R.

From xma at us.ibm.com Wed Sep 21 16:23:35 2005
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 21 Sep 2005 16:23:35 -0700
Subject: [openib-general] problems on device/ports initialization
In-Reply-To: <1127341083.15613.47.camel@hal.voltaire.com>
Message-ID:

> You are talking about creating QPs working or not, not whether the port is
> cabled or not. Are those related on Galaxy ?

Yes.
>So you would prefer to work with the ports which can be started ?

Yes.

> And what determines port_usable ?

>Was QP0 creation successful on that "failed" port but QP1 creation relies on
>port active ?

Yes. QP1 relies on the port being active for the LID.

Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From rolandd at cisco.com Wed Sep 21 16:26:17 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 21 Sep 2005 16:26:17 -0700
Subject: [openib-general] problems on device/ports initialization
In-Reply-To: (Shirley Ma's message of "Wed, 21 Sep 2005 16:20:28 -0700")
References:
Message-ID: <52psr2j946.fsf@cisco.com>

    Shirley> To successfully create QP1, the state will change from
    Shirley> INIT->RTR->RTS. Is that allowed if there is no LID
    Shirley> assigned?

Why would it not be allowed? I don't know of anything that forbids this.

    Roland> How does Galaxy handle it if someone creates QP1 and then
    Roland> the port later goes down?

    Shirley> There is no impact on other active ports.

But what about the port that goes down? You have QP1 in the RTS state
for port "X", and port "X" goes down. What happens to QP1?

Also it seems that the ehca driver drop does not handle creating QP0
at all. How do you support running a subnet manager on Galaxy?

 - R.

From rolandd at cisco.com Wed Sep 21 16:27:54 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 21 Sep 2005 16:27:54 -0700
Subject: [openib-general] problems on device/ports initialization
In-Reply-To: (Shirley Ma's message of "Wed, 21 Sep 2005 16:23:35 -0700")
References:
Message-ID: <52ll1qj91h.fsf@cisco.com>

    Shirley> Yes. QP1 relies on the port being active for the LID.

What happens if the LID changes after you create QP1? I'm really
confused about how Galaxy handles special QPs.

 - R.
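Roland's libibverbs RFC, quoted in Caitlin's message above, separates the completion vector (which interrupt fires) from the completion channel (which fd the event is queued on), with every CQ naming one of each. The C model below is a hypothetical illustration of that routing only. It is not libibverbs code: the model_* names, the fixed-depth queue and the non-blocking get are assumptions. In the RFC a channel is an fd and ibv_get_cq_event() reads from it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 64  /* hypothetical depth of a channel's event queue */

struct model_cq;

/* Stand-in for a completion channel; in the RFC this is a file descriptor. */
struct model_channel {
	struct model_cq *pending[MAX_PENDING];
	int head, tail;
};

struct model_cq {
	void *cq_context;               /* consumer-supplied context */
	struct model_channel *channel;  /* NULL: completion events not used */
	int comp_vector;                /* interrupt source; only recorded here */
	unsigned events_acked;
};

/* What the kernel side would do when the CQ's completion vector fires. */
static int model_deliver_event(struct model_cq *cq)
{
	struct model_channel *ch = cq->channel;

	if (ch == NULL || ch->tail - ch->head == MAX_PENDING)
		return -1;  /* no channel, or queue full in this model */
	ch->pending[ch->tail++ % MAX_PENDING] = cq;
	return 0;
}

/* Analogue of ibv_get_cq_event(): pop the next event queued on one channel.
 * A real implementation would block reading the channel's fd instead. */
static int model_get_cq_event(struct model_channel *ch,
			      struct model_cq **cq, void **cq_context)
{
	assert(ch != NULL);
	if (ch->head == ch->tail)
		return -1;
	*cq = ch->pending[ch->head++ % MAX_PENDING];
	*cq_context = (*cq)->cq_context;
	return 0;
}

/* Analogue of ibv_ack_cq_events(): count acknowledged events per CQ. */
static void model_ack_cq_events(struct model_cq *cq, unsigned nevents)
{
	cq->events_acked += nevents;
}
```

Because the channel is stored per CQ rather than per vector, two CQs on different vectors can share one channel, and two CQs on the same vector can report to different channels. That is the independence the RFC describes.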
From xma at us.ibm.com Wed Sep 21 16:28:07 2005
From: xma at us.ibm.com (Shirley Ma)
Date: Wed, 21 Sep 2005 16:28:07 -0700
Subject: [openib-general] problems on device/ports initialization
In-Reply-To: <1127341083.15613.47.camel@hal.voltaire.com>
Message-ID:

> And what determines port_usable ?

I forgot to reply to this in the previous email. Once QP1/QP0 have been
successfully created, the port is marked as usable.

Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638

From sean.hefty at intel.com Wed Sep 21 16:34:38 2005
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 21 Sep 2005 16:34:38 -0700
Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates
Message-ID:

The following patch updates the original CMA APIs. Updated implementation to
follow. If there's agreement, can we check this into svn under the ULP
directory?

Signed-off-by: Sean Hefty

/*
 * Copyright (c) 2005 Voltaire Inc.  All rights reserved.
 * Copyright (c) 2005 Intel Corporation.  All rights reserved.
 *
 * This Software is licensed under one of the following licenses:
 *
 * 1) under the terms of the "Common Public License 1.0" a copy of which is
 *    available from the Open Source Initiative, see
 *    http://www.opensource.org/licenses/cpl.php.
 *
 * 2) under the terms of the "The BSD License" a copy of which is
 *    available from the Open Source Initiative, see
 *    http://www.opensource.org/licenses/bsd-license.php.
 *
 * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
 *    copy of which is available from the Open Source Initiative, see
 *    http://www.opensource.org/licenses/gpl-license.php.
 *
 * Licensee has the right to choose one of the above licenses.
 *
 * Redistributions of source code must retain the above copyright
 * notice and one of the license notices.
* * Redistributions in binary form must reproduce both the above copyright * notice, one of the license notices in the documentation * and/or other materials provided with the distribution. * */ #if !defined(RDMA_CMA_H) #define RDMA_CMA_H #include #include #include #include enum rdma_cma_event_type { RDMA_CMA_EVENT_ROUTE_FOUND, RDMA_CMA_EVENT_CONNECT_REQUEST, RDMA_CMA_EVENT_CONNECT_ERROR, RDMA_CMA_EVENT_UNREACHABLE, RDMA_CMA_EVENT_REJECTED, RDMA_CMA_EVENT_ESTABLISHED, RDMA_CMA_EVENT_DISCONNECTED, }; struct rdma_route { struct sockaddr src_addr; struct sockaddr dest_addr; struct ib_sa_path_rec path_rec; }; struct rdma_cma_event { enum rdma_cma_event_type event; void *private_data; }; struct rdma_cma_id; typedef void (*rdma_cma_event_handler)(struct rdma_cma_id *cma_id, struct rdma_cma_event *event); struct rdma_cma_id { struct ib_device *device; void *context; struct ib_qp *qp; rdma_cma_event_handler event_handler; struct rdma_route *route; }; struct rdma_cma_id* rdma_cma_create_id(struct ib_device *device, void *context, rdma_cma_event_handler event_handler); void rdma_cma_destroy_id(struct rdma_cma_id *cma_id); /** * rdma_cma_listen - this function is called by the passive side. It * listens on the specified port for incoming connection requests. */ int rdma_cma_listen(struct rdma_cma_id *cma_id, struct sockaddr *addr); int rdma_cma_get_route(struct rdma_cma_id *cma_id, struct sockaddr *src_addr, struct sockaddr *dest_addr); struct rdma_cma_conn_param { struct ib_qp *qp; const void *private_data; u8 private_data_len; u8 responder_resources; u8 initiator_depth; u8 flow_control; u8 retry_count; /* ignored when accepting */ u8 rnr_retry_count; }; /** * rdma_cma_connect - this is the connect request function, called by * the active side.
The consumer registers an upcall that will be * initiated by the cma with an appropriate connection event * notification (established/rejected/disconnected etc) * * Note that the QP must be in the INIT state before calling this routine. */ int rdma_cma_connect(struct rdma_cma_id *cma_id, struct rdma_cma_conn_param *conn_param); /** * rdma_cma_accept - call on the passive side to accept a connection request * note that if the function returned with error - a reject message was * sent to the remote side and the cma_id was destroyed. * * Note that the QP must be in the INIT state before calling this routine. */ int rdma_cma_accept(struct rdma_cma_id *cma_id, struct rdma_cma_conn_param *conn_param); /** * rdma_cma_reject - call on the passive side to reject a connection request. * This call destroys the cma_id, hence when the active side receives * the reject the cma_id is already destroyed. * @cma_id: this handle was accepted in cma_listen callback * @private_data: private data to send back to the initiator * @private_data_len: private data length */ int rdma_cma_reject(struct rdma_cma_id *cma_id, const void *private_data, u8 private_data_len); /** * rdma_cma_disconnect - this function disconnects the associated QP. */ int rdma_cma_disconnect(struct rdma_cma_id *cma_id); /* TODO: need a way to map IP address to a device or get IP addresses associated with a device */ #endif /* RDMA_CMA_H */ From halr at voltaire.com Wed Sep 21 16:30:02 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Sep 2005 19:30:02 -0400 Subject: [openib-general] problems on device/ports initialization In-Reply-To: <52psr2j946.fsf@cisco.com> References: <52psr2j946.fsf@cisco.com> Message-ID: <1127345296.15613.161.camel@hal.voltaire.com> On Wed, 2005-09-21 at 19:26, Roland Dreier wrote: > Also it seems that the ehca driver drop does not handle creating QP0 > at all. How do you support running a subnet manager on Galaxy? and perhaps SMA as well ?
-- Hal From sean.hefty at intel.com Wed Sep 21 16:36:40 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Wed, 21 Sep 2005 16:36:40 -0700 Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: Message-ID: Here's the updated implementation. It compiles, but that's it. Signed-off-by: Sean Hefty /* * Copyright (c) 2005 Voltaire Inc. All rights reserved. * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved. * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved. * Copyright (c) 2005 Intel Corporation. All rights reserved. * * This Software is licensed under one of the following licenses: * * 1) under the terms of the "Common Public License 1.0" a copy of which is * available from the Open Source Initiative, see * http://www.opensource.org/licenses/cpl.php. * * 2) under the terms of the "The BSD License" a copy of which is * available from the Open Source Initiative, see * http://www.opensource.org/licenses/bsd-license.php. * * 3) under the terms of the "GNU General Public License (GPL) Version 2" a * copy of which is available from the Open Source Initiative, see * http://www.opensource.org/licenses/gpl-license.php. * * Licensee has the right to choose one of the above licenses. * * Redistributions of source code must retain the above copyright * notice and one of the license notices. * * Redistributions in binary form must reproduce both the above copyright * notice, one of the license notices in the documentation * and/or other materials provided with the distribution. 
* */ #include #include MODULE_AUTHOR("Guy German"); MODULE_DESCRIPTION("Generic RDMA CM Agent"); MODULE_LICENSE("Dual BSD/GPL"); #define PFX "rdma_cma: " #define CMA_CM_RESPONSE_TIMEOUT 20 #define CMA_MAX_CM_RETRIES 3 struct cma_id_private { struct rdma_cma_id cma_id; struct ib_cm_id *cm_id; /* TODO: add state if needed */ /* TODO: might need refcount for route queries */ /* atomic_t refcount; */ spinlock_t lock; }; static struct cma_id_private *cma_alloc_id(struct ib_device *device, void *context, rdma_cma_event_handler event_handler) { struct cma_id_private *cma_id_priv; cma_id_priv = kmalloc(sizeof *cma_id_priv, GFP_KERNEL); if (!cma_id_priv) return NULL; memset(cma_id_priv, 0, sizeof *cma_id_priv); cma_id_priv->cma_id.device = device; cma_id_priv->cma_id.context = context; cma_id_priv->cma_id.event_handler = event_handler; spin_lock_init(&cma_id_priv->lock); return cma_id_priv; } static int cma_modify_ib_qp_rtr(struct cma_id_private *cma_id_priv) { struct ib_qp_attr qp_attr; int qp_attr_mask, ret; qp_attr.qp_state = IB_QPS_RTR; ret = ib_cm_init_qp_attr(cma_id_priv->cm_id, &qp_attr, &qp_attr_mask); if (ret) return ret; qp_attr.rq_psn = cma_id_priv->cma_id.qp->qp_num; return ib_modify_qp(cma_id_priv->cma_id.qp, &qp_attr, qp_attr_mask); } static int cma_modify_ib_qp_rts(struct cma_id_private *cma_id_priv) { struct ib_qp_attr qp_attr; int qp_attr_mask, ret; qp_attr.qp_state = IB_QPS_RTS; ret = ib_cm_init_qp_attr(cma_id_priv->cm_id, &qp_attr, &qp_attr_mask); if (ret) return ret; return ib_modify_qp(cma_id_priv->cma_id.qp, &qp_attr, qp_attr_mask); } static struct cma_id_private* cma_req_recv(struct cma_id_private *listen_id, struct ib_cm_event *ib_event) { struct cma_id_private *cma_id_priv; struct rdma_route *route; cma_id_priv = cma_alloc_id(listen_id->cma_id.device, listen_id->cma_id.context, listen_id->cma_id.event_handler); if (!cma_id_priv) return NULL; route = kmalloc(sizeof *route, GFP_KERNEL); if (!route) goto err; memset(route, 0, sizeof *route); /* TODO: get
route information from private data */ route->path_rec = *ib_event->param.req_rcvd.primary_path; cma_id_priv->cma_id.route = route; return cma_id_priv; err: kfree(cma_id_priv); return NULL; } static enum rdma_cma_event_type cma_rep_recv(struct cma_id_private *cma_id_priv) { int ret; ret = cma_modify_ib_qp_rtr(cma_id_priv); if (ret) goto reject; ret = cma_modify_ib_qp_rts(cma_id_priv); if (ret) goto reject; ret = ib_send_cm_rtu(cma_id_priv->cm_id, NULL, 0); if (ret) goto reject; return RDMA_CMA_EVENT_ESTABLISHED; reject: /* TODO: set QP state to ERROR? INIT? RESET? */ rdma_cma_reject(&cma_id_priv->cma_id, NULL, 0); return RDMA_CMA_EVENT_CONNECT_ERROR; } static enum rdma_cma_event_type cma_rtu_recv(struct cma_id_private *cma_id_priv) { int ret; ret = cma_modify_ib_qp_rts(cma_id_priv); if (ret) goto reject; return RDMA_CMA_EVENT_ESTABLISHED; reject: /* TODO: set QP state to ERROR? INIT? RESET? */ rdma_cma_reject(&cma_id_priv->cma_id, NULL, 0); return RDMA_CMA_EVENT_CONNECT_ERROR; } static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { struct cma_id_private *cma_id_priv; struct rdma_cma_event event; cma_id_priv = cm_id->context; switch (ib_event->event) { case IB_CM_REQ_ERROR: case IB_CM_REP_ERROR: event.event = RDMA_CMA_EVENT_UNREACHABLE; break; case IB_CM_REQ_RECEIVED: cma_id_priv = cma_req_recv(cma_id_priv, ib_event); if (!cma_id_priv) return -ENOMEM; /* bind the new cm_id created by the CM for this REQ to the new cma_id */ cma_id_priv->cm_id = cm_id; cm_id->context = cma_id_priv; event.event = RDMA_CMA_EVENT_CONNECT_REQUEST; break; case IB_CM_REP_RECEIVED: event.event = cma_rep_recv(cma_id_priv); break; case IB_CM_RTU_RECEIVED: event.event = cma_rtu_recv(cma_id_priv); break; case IB_CM_DREQ_RECEIVED: case IB_CM_DREQ_ERROR: case IB_CM_DREP_RECEIVED: event.event = RDMA_CMA_EVENT_DISCONNECTED; break; case IB_CM_TIMEWAIT_EXIT: case IB_CM_MRA_RECEIVED: /* ignore event; do not call the consumer with an unset event type */ return 0; case IB_CM_REJ_RECEIVED: /* TODO: set QP state to ERROR? INIT? RESET?
*/ event.event = RDMA_CMA_EVENT_REJECTED; break; default: printk(KERN_ERR PFX "unexpected IB CM event: %d\n", ib_event->event); return 0; } event.private_data = ib_event->private_data; cma_id_priv->cma_id.event_handler(&cma_id_priv->cma_id, &event); return 0; } struct rdma_cma_id* rdma_cma_create_id(struct ib_device *device, void *context, rdma_cma_event_handler event_handler) { struct cma_id_private *cma_id_priv; int ret; cma_id_priv = cma_alloc_id(device, context, event_handler); if (!cma_id_priv) return ERR_PTR(-ENOMEM); switch (device->node_type) { case IB_NODE_CA: cma_id_priv->cm_id = ib_create_cm_id(device, cma_ib_handler, cma_id_priv); ret = IS_ERR(cma_id_priv->cm_id) ? PTR_ERR(cma_id_priv->cm_id) : 0; break; default: ret = -ENOSYS; break; } if (ret) goto err; return &cma_id_priv->cma_id; err: kfree(cma_id_priv); return ERR_PTR(ret); } EXPORT_SYMBOL(rdma_cma_create_id); void rdma_cma_destroy_id(struct rdma_cma_id *cma_id) { struct cma_id_private *cma_id_priv; cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); /* TODO: cancel route lookup if active */ switch (cma_id->device->node_type) { case IB_NODE_CA: ib_destroy_cm_id(cma_id_priv->cm_id); break; default: break; } kfree(cma_id->route); kfree(cma_id_priv); } EXPORT_SYMBOL(rdma_cma_destroy_id); static __be64 cma_get_service_id(struct sockaddr *addr) { /* TODO: write me */ return 42; } int rdma_cma_listen(struct rdma_cma_id *cma_id, struct sockaddr *addr) { struct cma_id_private *cma_id_priv; int ret; cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); cma_id->route = kmalloc(sizeof *cma_id->route, GFP_KERNEL); if (!cma_id->route) return -ENOMEM; memset(cma_id->route, 0, sizeof *cma_id->route); cma_id->route->src_addr = *addr; switch (cma_id->device->node_type) { case IB_NODE_CA: ret = ib_cm_listen(cma_id_priv->cm_id, cma_get_service_id(addr), 0); break; default: ret = -ENOSYS; break; } if (ret) goto err; return 0; err: kfree(cma_id->route); /* avoid a double free in rdma_cma_destroy_id */ cma_id->route = NULL; return ret; }
EXPORT_SYMBOL(rdma_cma_listen); /* static void cma_path_handler(u64 req_id, void *context, int rec_num) { struct cma_context *cma_id = context; enum ib_cma_event event; int status = 0; if (rec_num <= 0) { event = IB_CMA_EVENT_UNREACHABLE; goto error; } cma_id->cma_param.primary_path = &cma_id->cma_path; cma_id->cma_param.alternate_path = NULL; printk(KERN_DEBUG PFX "%s: dlid=%d slid=%d pkey=%d mtu=%d sid=%llx " "qpn=%d qpt=%d psn=%d prd=%s respres=%d rcm=%d flc=%d " "cmt=%d rtrc=%d rntrtr=%d maxcm=%d \n",__func__, cma_id->cma_param.primary_path->dlid , cma_id->cma_param.primary_path->slid , cma_id->cma_param.primary_path->pkey , cma_id->cma_param.primary_path->mtu , cma_id->cma_param.service_id, cma_id->cma_param.qp_num, cma_id->cma_param.qp_type, cma_id->cma_param.starting_psn, (char *)cma_id->cma_param.private_data, cma_id->cma_param.responder_resources, cma_id->cma_param.remote_cm_response_timeout, cma_id->cma_param.flow_control, cma_id->cma_param.local_cm_response_timeout, cma_id->cma_param.retry_count, cma_id->cma_param.rnr_retry_count, cma_id->cma_param.max_cm_retries); status = ib_send_cm_req(cma_id->cm_id, &cma_id->cma_param); if (status) { printk(KERN_ERR PFX "%s: cm_req failed %d\n",__func__, status); event = IB_CMA_EVENT_REJECTED; goto error; } return; error: printk(KERN_ERR PFX "%s: return error %d \n",__func__, status); cma_connection_callback(cma_id, event, NULL); } static void cma_route_handler(u64 req_id, void *context, int rec_num) { struct cma_context *cma_id = context; enum ib_cma_event event; int status = 0; if (rec_num <= 0) { event = IB_CMA_EVENT_UNREACHABLE; goto error; } cma_id->ibat_comp.fn = &cma_path_handler; cma_id->ibat_comp.context = cma_id; status = ib_at_paths_by_route(&cma_id->cma_route, 0, &cma_id->cma_path, 1, &cma_id->ibat_comp); if (status) { event = IB_CMA_EVENT_DISCONNECTED; goto error; } return; error: printk(KERN_ERR PFX "%s: return error %d \n",__func__, status); cma_connection_callback(cma_id, event ,NULL); } */ int 
cma_get_route_ib(struct cma_id_private *cma_id_priv, struct sockaddr *src_addr, struct sockaddr *dest_addr) { /* TODO: Get remote GID from ARP table, query for path record */ return -ENOSYS; } int rdma_cma_get_route(struct rdma_cma_id *cma_id, struct sockaddr *src_addr, struct sockaddr *dest_addr) { struct cma_id_private *cma_id_priv; int ret; cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); switch (cma_id->device->node_type) { case IB_NODE_CA: ret = cma_get_route_ib(cma_id_priv, src_addr, dest_addr); break; default: ret = -ENOSYS; break; } return ret; } EXPORT_SYMBOL(rdma_cma_get_route); static int cma_connect_ib(struct cma_id_private *cma_id_priv, struct rdma_cma_conn_param *conn_param) { struct ib_cm_req_param req; struct rdma_route *route; route = cma_id_priv->cma_id.route; memset(&req, 0, sizeof req); req.primary_path = &route->path_rec; req.service_id = cma_get_service_id(&route->dest_addr); req.qp_num = conn_param->qp->qp_num; req.qp_type = IB_QPT_RC; req.starting_psn = req.qp_num; req.private_data = conn_param->private_data; req.private_data_len = conn_param->private_data_len; req.responder_resources = conn_param->responder_resources; req.initiator_depth = conn_param->initiator_depth; req.flow_control = conn_param->flow_control; req.retry_count = conn_param->retry_count; req.rnr_retry_count = conn_param->rnr_retry_count; req.remote_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT; req.local_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT; req.max_cm_retries = CMA_MAX_CM_RETRIES; req.srq = conn_param->qp->srq ? 
1 : 0; return ib_send_cm_req(cma_id_priv->cm_id, &req); } int rdma_cma_connect(struct rdma_cma_id *cma_id, struct rdma_cma_conn_param *conn_param) { struct cma_id_private *cma_id_priv; int ret; cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); cma_id->qp = conn_param->qp; switch (cma_id->device->node_type) { case IB_NODE_CA: ret = cma_connect_ib(cma_id_priv, conn_param); break; default: ret = -ENOSYS; break; } return ret; } EXPORT_SYMBOL(rdma_cma_connect); static int cma_accept_ib(struct cma_id_private *cma_id_priv, struct rdma_cma_conn_param *conn_param) { struct ib_cm_rep_param rep; int ret; ret = cma_modify_ib_qp_rtr(cma_id_priv); if (ret) return ret; memset(&rep, 0, sizeof rep); rep.qp_num = conn_param->qp->qp_num; rep.starting_psn = rep.qp_num; rep.private_data = conn_param->private_data; rep.private_data_len = conn_param->private_data_len; rep.responder_resources = conn_param->responder_resources; rep.initiator_depth = conn_param->initiator_depth; rep.target_ack_delay = CMA_CM_RESPONSE_TIMEOUT; rep.failover_accepted = 0; rep.flow_control = conn_param->flow_control; rep.rnr_retry_count = conn_param->rnr_retry_count; rep.srq = conn_param->qp->srq ? 1 : 0; return ib_send_cm_rep(cma_id_priv->cm_id, &rep); } int rdma_cma_accept(struct rdma_cma_id *cma_id, struct rdma_cma_conn_param *conn_param) { struct cma_id_private *cma_id_priv; int ret; cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); cma_id->qp = conn_param->qp; switch (cma_id->device->node_type) { case IB_NODE_CA: ret = cma_accept_ib(cma_id_priv, conn_param); break; default: ret = -ENOSYS; break; } if (ret) goto reject; return 0; reject: /* TODO: set QP state to ERROR? INIT? RESET? 
*/ rdma_cma_reject(cma_id, NULL, 0); return ret; } EXPORT_SYMBOL(rdma_cma_accept); int rdma_cma_reject(struct rdma_cma_id *cma_id, const void *private_data, u8 private_data_len) { struct cma_id_private *cma_id_priv; int ret; cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); switch (cma_id->device->node_type) { case IB_NODE_CA: ret = ib_send_cm_rej(cma_id_priv->cm_id, IB_CM_REJ_CONSUMER_DEFINED, NULL, 0, private_data, private_data_len); break; default: ret = -ENOSYS; break; } return ret; } EXPORT_SYMBOL(rdma_cma_reject); int rdma_cma_disconnect(struct rdma_cma_id *cma_id) { struct cma_id_private *cma_id_priv; struct ib_qp_attr qp_attr; int ret; cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); /* TODO: Should we transition here? Compare with error handling processing accept or a CM reply. */ qp_attr.qp_state = IB_QPS_ERR; ret = ib_modify_qp(cma_id_priv->cma_id.qp, &qp_attr, IB_QP_STATE); if (ret) return ret; switch (cma_id->device->node_type) { case IB_NODE_CA: /* Initiate or respond to a disconnect. */ ret = ib_send_cm_dreq(cma_id_priv->cm_id, NULL, 0); if (ret) ib_send_cm_drep(cma_id_priv->cm_id, NULL, 0); break; default: break; } return 0; } EXPORT_SYMBOL(rdma_cma_disconnect); static int cma_init(void) { return 0; } static void cma_cleanup(void) { } module_init(cma_init); module_exit(cma_cleanup); From xma at us.ibm.com Wed Sep 21 16:39:07 2005 From: xma at us.ibm.com (Shirley Ma) Date: Wed, 21 Sep 2005 16:39:07 -0700 Subject: [openib-general] problems on device/ports initialization In-Reply-To: <52ll1qj91h.fsf@cisco.com> Message-ID: To be honest I am not very familiar with the driver code either. I will dig further and we can discuss this later. I am looking for a general answer: if QP1 creation fails for some reason, does ib_mad shut down the special QPs on all other ports? I think the answer is no. Am I right?
Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From halr at voltaire.com Wed Sep 21 16:33:15 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Sep 2005 19:33:15 -0400 Subject: [openib-general] problems on device/ports initialization In-Reply-To: References: Message-ID: <1127345376.15613.166.camel@hal.voltaire.com> On Wed, 2005-09-21 at 19:28, Shirley Ma wrote: > > And what determines port_usable ? > Forgot to reply to this in the previous email. Once the QP1/QP0 has been > successfully created, the port is marked as usable. But if QP0 is created, what happens to packets on it while there is no QP1 ? Are they handled ? If so, the port is partially up. -- Hal From mshefty at ichips.intel.com Wed Sep 21 16:42:17 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 21 Sep 2005 16:42:17 -0700 Subject: [openib-general] problems on device/ports initialization In-Reply-To: References: Message-ID: <4331EFD9.8000402@ichips.intel.com> Shirley Ma wrote: > > To be honest I am not very familiar with the driver code either. I will dig > further and we can discuss this later. > > I am looking for a general answer: if QP1 creation fails for some > reason, does ib_mad shut down the special QPs on all other ports? I think > the answer is no. Am I right? If the MAD code encounters an issue initializing a device, it will clean up all resources allocated to that device. The error handling is per device, rather than per port.
- Sean From viswa.krish at gmail.com Wed Sep 21 16:47:42 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Wed, 21 Sep 2005 16:47:42 -0700 Subject: [openib-general] Modifying QP state error Message-ID: <4df28be40509211647469727df@mail.gmail.com> When I try to modify QP state from RTS to RESET I get the following error ib_mthca 0000:05:00.0: Command 1e completed with status 0a ib_mthca 0000:05:00.0: modify QP 7 returned status 0a. Is modifying QP state from RTS to RESET a valid state transition ? (I guess so) Is there anything else that needs to be taken care of ? -Viswa From mshefty at ichips.intel.com Wed Sep 21 16:51:13 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 21 Sep 2005 16:51:13 -0700 Subject: [openib-general] Modifying QP state error In-Reply-To: <4df28be40509211647469727df@mail.gmail.com> References: <4df28be40509211647469727df@mail.gmail.com> Message-ID: <4331F1F1.2030107@ichips.intel.com> Viswanath Krishnamurthy wrote: > When I try to modify QP state from RTS to RESET I get the following error > > ib_mthca 0000:05:00.0: Command 1e completed with status 0a > ib_mthca 0000:05:00.0: modify QP 7 returned status 0a. > > Is modifying QP state from RTS to RESET a valid state transition ? (I > guess so) > Is there anything else that needs to be taken care of ? You need to modify from RTS to ERROR to RESET.
- Sean From halr at voltaire.com Wed Sep 21 16:52:37 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Sep 2005 19:52:37 -0400 Subject: [openib-general] Modifying QP state error In-Reply-To: <4df28be40509211647469727df@mail.gmail.com> References: <4df28be40509211647469727df@mail.gmail.com> Message-ID: <1127346757.15613.233.camel@hal.voltaire.com> On Wed, 2005-09-21 at 19:47, Viswanath Krishnamurthy wrote: > When I try to modify QP state from RTS to RESET I get the following > error > > ib_mthca 0000:05:00.0: Command 1e completed with status 0a > ib_mthca 0000:05:00.0: modify QP 7 returned status 0a. > > Is modifying QP state from RTS to RESET a valid state transition ? > (I guess so) > Is there anything else that needs to be taken care of ? You can only get to RESET from ERROR. See Figure 124 QP Context State Diagram IBA 1.2 p. 452. -- Hal From rolandd at cisco.com Wed Sep 21 16:59:49 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 21 Sep 2005 16:59:49 -0700 Subject: [openib-general] Modifying QP state error In-Reply-To: <4df28be40509211647469727df@mail.gmail.com> (Viswanath Krishnamurthy's message of "Wed, 21 Sep 2005 16:47:42 -0700") References: <4df28be40509211647469727df@mail.gmail.com> Message-ID: <52ek7ij7ka.fsf@cisco.com> Viswanath> Is modifying QP state from RTS to RESET a valid state Viswanath> transition ? (I guess so) Is there anything else that Viswanath> needs to be taken care of ? Yes, it should work. Can you post the code to reproduce this? - R. From rolandd at cisco.com Wed Sep 21 17:07:07 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 21 Sep 2005 17:07:07 -0700 Subject: [openib-general] Modifying QP state error References: <4df28be40509211647469727df@mail.gmail.com> <1127346757.15613.233.camel@hal.voltaire.com> Message-ID: <52oe6mezis.fsf@cisco.com> Hal> You can only get to RESET from ERROR. See Figure 124 QP Hal> Context State Diagram IBA 1.2 p. 452. I think the figure is drawn in a slightly misleading way.
The text at the lower left says: It is possible to transition from any state to either the Error or the Reset state with the Modify QP/EE Verb. - R. From halr at voltaire.com Wed Sep 21 17:07:25 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Sep 2005 20:07:25 -0400 Subject: [openib-general] Modifying QP state error In-Reply-To: <52oe6mezis.fsf@cisco.com> References: <4df28be40509211647469727df@mail.gmail.com> <1127346757.15613.233.camel@hal.voltaire.com> <52oe6mezis.fsf@cisco.com> Message-ID: <1127347644.15613.267.camel@hal.voltaire.com> On Wed, 2005-09-21 at 20:07, Roland Dreier wrote: > Hal> You can only get to RESET from ERROR. See Figure 124 QP > Hal> Context State Diagram IBA 1.2 p. 452. > > I think the figure is drawn in a slightly misleading way. The text at > the lower left says: > > It is possible to transition from any state to either the Error or > the Reset state with the Modify QP/EE Verb. It certainly does say that. I sit corrected... -- Hal From viswa.krish at gmail.com Wed Sep 21 17:19:51 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Wed, 21 Sep 2005 17:19:51 -0700 Subject: [openib-general] Modifying QP state error In-Reply-To: <52oe6mezis.fsf@cisco.com> References: <4df28be40509211647469727df@mail.gmail.com> <1127346757.15613.233.camel@hal.voltaire.com> <52oe6mezis.fsf@cisco.com> Message-ID: <4df28be405092117191f4361fd@mail.gmail.com> The mthca state transition code allows this transition (RTS --> RESET), but the mthca hardware/firmware does not allow it. It allows RTS->ERR->RESET. I will post the code later to reproduce this. I was trying to work around the CQ destroy memory leak by caching QP entries and reusing them, but ran into other issues. -Viswa On 9/21/05, Roland Dreier wrote: > > Hal> You can only get to RESET from ERROR. See Figure 124 QP > Hal> Context State Diagram IBA 1.2 p. 452. > > I think the figure is drawn in a slightly misleading way.
The text at > the lower left says: > > It is possible to transition from any state to either the Error or > the Reset state with the Modify QP/EE Verb. > > - R. From rolandd at cisco.com Wed Sep 21 17:22:27 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 21 Sep 2005 17:22:27 -0700 Subject: [openib-general] Modifying QP state error In-Reply-To: <4df28be405092117191f4361fd@mail.gmail.com> (Viswanath Krishnamurthy's message of "Wed, 21 Sep 2005 17:19:51 -0700") References: <4df28be40509211647469727df@mail.gmail.com> <1127346757.15613.233.camel@hal.voltaire.com> <52oe6mezis.fsf@cisco.com> <4df28be405092117191f4361fd@mail.gmail.com> Message-ID: <52k6haeyt8.fsf@cisco.com> Viswanath> The mthca state transition code allows this Viswanath> transition (RTS --> RESET), but the mthca Viswanath> hardware/firmware does not allow it. It allows Viswanath> RTS->ERR->RESET. I will post the code later to Viswanath> reproduce this. I was trying to work around the CQ Viswanath> destroy memory leak by caching QP entries and reusing Viswanath> them, but ran into other issues. The hardware should allow it. mthca does a transition directly to reset whenever it destroys a QP, and I've never seen this failure, even when destroying QPs in the RTS state. - R. From viswa.krish at gmail.com Wed Sep 21 17:23:28 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Wed, 21 Sep 2005 17:23:28 -0700 Subject: [openib-general] opensm and SIGINT Message-ID: <4df28be4050921172370bad964@mail.gmail.com> Hal, Currently opensm traps SIGINT. There was some discussion about removing it. I am currently running some tests on opensm by killing (SIGKILL) and restarting opensm. So far I have not found any resource leak issues. Is there a plan to remove that signal handler? Ideally it should not exist. -Viswa
From caitlinb at broadcom.com Wed Sep 21 17:30:29 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Wed, 21 Sep 2005 17:30:29 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <43319157.5020708@ichips.intel.com> References: <54AD0F12E08D1541B826BE97C98F99F1020856@NT-SJCA-0751.brcm.ad.broadcom.com> <43319157.5020708@ichips.intel.com> Message-ID: <469958e0050921173063e14203@mail.gmail.com> On 9/21/05, Sean Hefty wrote: > > Caitlin Bestler wrote: > > That's certainly an acceptably low overhead for iWARP IHVs, > > provided there are applications that want this control and > > *not* also need even more IB-specific CM control. I still > > have the same skepticism I had for the IT-API's exposing > > of paths via a transport neutral API. Namely, is there > > really any basis to select amongst multiple paths from > > transport neutral code? The same applies to caching of > > address translations on a transport neutral basis. Is > > it really possible to do in any way that makes sense? > > Wouldn't caching at a lower layer, with transport/device > > specific knowledge, make more sense? > > I guess I view this API slightly differently than being just a transport > neutral > connection interface. I also see it as a way to connect over IB using IP > addresses, which today is only possible if using ib_at. That is, the API > could > do both. Given that purpose I can envision an IB-aware application that needed to use IP addresses and wanted to take charge of caching the translation. But viewing this in a wider scope raises a second question. Shouldn't iSER be using the same routines to establish connections?
From robert.j.woodruff at intel.com Wed Sep 21 17:45:06 2005 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Wed, 21 Sep 2005 17:45:06 -0700 Subject: [openib-general] 3513 DAPL is Broken Message-ID: <1AC79F16F5C5284499BB9591B33D6F00059AF893@orsmsx408> Seems to hang around the time of the modify QP. ibv_rc_pingpong seems to work OK and also your DAPL-socket CM version that you gave me yesterday seems to work, but the DAPL I pulled from SVN that uses the IB AT/CM has the following problem. I am starting to think that pushing out your socket CM version until things stabilize with the IBAT/IBCM version might be worth considering, so that people that want to use DAPL now have something that is reliable. woody Here is the dapl trace when running Intel MPI on top of uDAPL 3513, dapl_ia_query (0x522860, (nil), 0x0, (nil), 0x3ffffff, 0x7fbfffe510) dapl_ia_query () returns 0x0 dapl_evd_create () returns 0x0 setup_listener(ia_ptr 0x522860 SID 3545 sp 0x5238e0 conn 0x5239a0 id 5389248) setup_listener(conn=0x5239a0 cm_id=5389248) dapl_ep_create (0x522860, 0x5235a0, 0x523620, 0x523620, 0x523780, 0x7fbfffecb0, 0x5201e8) query_hca: MAX msg 2147483648 dto 65535 iov 59 rdma i4,o4 qp_alloc: ia_ptr 0x522860 ep_ptr 0x526740 ep_ctx_ptr 0x526740 qp_alloc: qpn 0xc0409 sq 1000,9 rq 1000,1 modify_qp: qp 0x523c50, state 1 qp_num 0xc0409 dapl_ep_create (0x522860, 0x5235a0, 0x523620, 0x523620, 0x523780, 0x7fbfffecb0, 0x5203b8) query_hca: MAX msg 2147483648 dto 65535 iov 59 rdma i4,o4 qp_alloc: ia_ptr 0x522860 ep_ptr 0x526a20 ep_ctx_ptr 0x526a20 qp_alloc: qpn 0xc040a sq 1000,9 rq 1000,1 modify_qp: qp 0x526d00, state 1 qp_num 0xc040a dapl_ep_create (0x522860, 0x5235a0, 0x523620, 0x523620, 0x523780, 0x7fbfffecb0, 0x520758) query_hca: MAX msg 2147483648 dto 65535 iov 59 rdma i4,o4 qp_alloc: ia_ptr 0x522860 ep_ptr 0x593470 ep_ctx_ptr 0x593470 qp_alloc: qpn 0xc040b sq 1000,9 rq 1000,1 modify_qp: qp 0x526e40, state 1 qp_num 0xc040b From halr at voltaire.com Wed Sep 21 18:00:52
2005 From: halr at voltaire.com (Hal Rosenstock) Date: 21 Sep 2005 21:00:52 -0400 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <4df28be4050921172370bad964@mail.gmail.com> References: <4df28be4050921172370bad964@mail.gmail.com> Message-ID: <1127350851.15613.390.camel@hal.voltaire.com> Hi Viswa, On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote: > Currently opensm traps SIGINT. There was some discussion about removing it. > I am currently running some tests on opensm > by killing (SIGKILL) and restarting opensm. So far I have not found > any resource leak issues. Is there a plan to remove that > signal handler? Ideally it should not exist. Eitan stated that this was historical in nature for gen1 drivers which had resource tracking problems: "if OpenSM left without cleaning up all used resources (like MAD buffers and UD-AVs), the driver oops'ed." I think that (eliminating the handler for SIGINT) can at least be done for OSM_VENDOR_INTF_OPENIB and leave it there for the other vendor layers for starters. I will experiment with gen2 and let you know. -- Hal From ftillier at silverstorm.com Wed Sep 21 21:31:08 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Wed, 21 Sep 2005 21:31:08 -0700 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <1127350851.15613.390.camel@hal.voltaire.com> Message-ID: <000c01c5bf2e$7517ebf0$9e5aa8c0@infiniconsys.com> > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, September 21, 2005 6:01 PM > > Hi Viswa, > > On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote: > > Currently opensm traps SIGINT. There was some discussion about removing it. > > I am currently running some tests on opensm > > by killing (SIGKILL) and restarting opensm. So far I have not found > > any resource leak issues. Is there a plan to remove that > > signal handler? Ideally it should not exist.
> > Eitan stated that this was historical in nature for gen1 drivers which > had resource tracking problems: "if OpenSM left without cleaning up all > used resources (like MAD buffers and UD-AVs), the driver oops'ed." > > I think that (eliminating the handler for SIGINT) can at least be done > for OSM_VENDOR_INTF_OPENIB and leave it there for the other vendor > layers for starters. I will experiment with gen2 and let you know. I'd like to see signal handling removed from the Windows version too. If there's a bug in the transport due to resource leaks, that needs to be fixed, not masked by handling signals. - Fab From rolandd at cisco.com Wed Sep 21 21:33:41 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 21 Sep 2005 21:33:41 -0700 Subject: [openib-general] ib_create_cq memory leak? In-Reply-To: <4df28be40509211314243b4bed@mail.gmail.com> (Viswanath Krishnamurthy's message of "Wed, 21 Sep 2005 13:14:36 -0700") References: <4df28be40509211314243b4bed@mail.gmail.com> Message-ID: <52fyrxg1qy.fsf@cisco.com> Thanks very much for the excellent test case. The following patch (already checked into svn and queued in git for merging into 2.6.14) should fix things -- on my system, your test case ran successfully for many hundreds of iterations.
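As I read the patch below, its key change is to look for a page inside the group's current range whose doorbell memory had already been released (`db_rec == NULL`) before growing the range. Previously such holes were apparently never reused, so repeated CQ create/destroy cycles kept consuming table space. The following is a simplified userspace model, not the mthca code: it treats each page as a single slot (the real driver first searches partially used pages through a per-page bitmap), and `struct db_table`, `alloc_db_page()`, `free_db_page()` and `NPAGES` are hypothetical names.

```c
#include <stdbool.h>

#define NPAGES 8   /* hypothetical table size */

/* Simplified model of the doorbell page table: group 0 grows up from
 * index 0, group 1 grows down from NPAGES - 1. Each page is one slot;
 * "present" stands in for page[i].db_rec != NULL. */
struct db_table {
    bool present[NPAGES];
    int max_group1;   /* group 0 owns [0, max_group1) */
    int min_group2;   /* group 1 owns (min_group2, NPAGES - 1] */
};

/* Returns the page index used, or -1 when the two groups would collide. */
int alloc_db_page(struct db_table *t, int group)
{
    int start, end, dir, i;

    if (group == 0) {
        start = 0;
        end = t->max_group1;
        dir = 1;
    } else {
        start = NPAGES - 1;
        end = t->min_group2;
        dir = -1;
    }

    /* The fix: first reuse a page inside the group's range whose
     * memory was released earlier. */
    for (i = start; i != end; i += dir)
        if (!t->present[i])
            goto alloc;

    /* Only if there is no such hole, grow the group's range. */
    if (t->max_group1 >= t->min_group2 - 1)
        return -1;
    i = end;
    if (group == 0)
        ++t->max_group1;
    else
        --t->min_group2;

alloc:
    t->present[i] = true;   /* real driver: dma_alloc_coherent() here */
    return i;
}

/* In this model, releasing a page leaves a hole; the bounds are not shrunk. */
void free_db_page(struct db_table *t, int i)
{
    t->present[i] = false;
}
```

Without the first loop, the second allocation in the usage below would grow `max_group1` to 2 and leave slot 0 permanently unused.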
--- linux-kernel/infiniband/hw/mthca/mthca_memfree.c (revision 3500) +++ linux-kernel/infiniband/hw/mthca/mthca_memfree.c (working copy) @@ -529,12 +529,25 @@ int mthca_alloc_db(struct mthca_dev *dev goto found; } + for (i = start; i != end; i += dir) + if (!dev->db_tab->page[i].db_rec) { + page = dev->db_tab->page + i; + goto alloc; + } + if (dev->db_tab->max_group1 >= dev->db_tab->min_group2 - 1) { ret = -ENOMEM; goto out; } + if (group == 0) + ++dev->db_tab->max_group1; + else + --dev->db_tab->min_group2; + page = dev->db_tab->page + end; + +alloc: page->db_rec = dma_alloc_coherent(&dev->pdev->dev, 4096, &page->mapping, GFP_KERNEL); if (!page->db_rec) { @@ -554,10 +567,6 @@ int mthca_alloc_db(struct mthca_dev *dev } bitmap_zero(page->used, MTHCA_DB_REC_PER_PAGE); - if (group == 0) - ++dev->db_tab->max_group1; - else - --dev->db_tab->min_group2; found: j = find_first_zero_bit(page->used, MTHCA_DB_REC_PER_PAGE); From vuhuong at mellanox.com Wed Sep 21 22:41:13 2005 From: vuhuong at mellanox.com (Vu Pham) Date: Wed, 21 Sep 2005 22:41:13 -0700 Subject: [openib-general][PATCH][SRP] bug fixes & fmr supported, In-Reply-To: <20050921084300.GA21715@lst.de> References: <43309F3F.3060009@mellanox.com> <20050921084300.GA21715@lst.de> Message-ID: <433243F9.6030304@mellanox.com> Christoph Hellwig wrote: >>+ if ((dma_addr & (PAGE_SIZE - 1)) || >>+ ((dma_addr + dma_len) & (PAGE_SIZE - 1)) || >>+ ((i == (sg_cnt - 1)) && !unaligned)) { >>+ srp_fmr->io_addr = dma_addr & PAGE_MASK; >>+ ++unaligned; >>+ } >>+ >>+ if (unaligned <= 1) { >>+ cur_len += dma_len; >>+ for (base_addr = dma_addr; >>+ (dma_addr & PAGE_MASK) <= >>+ ((base_addr + dma_len - 1) & PAGE_MASK); >>+ dma_addr += PAGE_SIZE) >>+ dma_pages[page_cnt++] = dma_addr & PAGE_MASK; >>+ } >>+ >>+ if ((unaligned > 1) || (i == (sg_cnt - 1))) { > > > this is definitely completely broken. dma_addr_ts are opaque handles, > some platforms use high bits in them for iommu flags. > Yes, I'm busted with such platforms.
Could anyone recommend a generic way to do this? Or is there such a generic way? Thanks From yael at mellanox.co.il Wed Sep 21 23:05:13 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 22 Sep 2005 09:05:13 +0300 Subject: [openib-general] [PATCH] Opensm - ignore strict-aliasing warning Message-ID: <5zaci5mycm.fsf@mtl066.yok.mtl.com> Hi Hal, I saw that you didn't add this patch yet. Using the -fno-strict-aliasing flag prevents the compiler from doing some optimizations that might trigger strict-aliasing bugs. So this is a fix, and not a workaround to remove the warnings. I agree that it is not the best fix. It is better to fix the code in such a way that will enable these optimizations too. I can look into it and have a better fix, but it will take some time. Until then, this fix should be added, since currently we can produce code that might have these problems in it. Here is the patch again (in case you lost it along the mails....) Thanks, Yael Signed-off-by: Yael Kalka Index: opensm/Makefile.am =================================================================== --- opensm/Makefile.am (revision 3487) +++ opensm/Makefile.am (working copy) @@ -64,7 +64,7 @@ opensm_SOURCES = main.c osm_db_files.c o osm_ucast_mgr.c osm_ucast_updn.c \ osm_vl15intf.c osm_vl_arb_rcv.c\ osm_vl_arb_rcv_ctrl.c st.c -opensm_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 +opensm_CFLAGS = -Wall $(OSMV_CFLAGS) -fno-strict-aliasing -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 opensm_CXXFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -D_XOPEN_SOURCE=600 -D_BSD_SOURCE=1 # for linking with the simulator client library we have to use g++: From guyg at voltaire.com Thu Sep 22 02:13:16 2005 From: guyg at voltaire.com (Guy German) Date: Thu, 22 Sep 2005 12:13:16 +0300 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <43319157.5020708@ichips.intel.com> References:
<54AD0F12E08D1541B826BE97C98F99F1020856@NT-SJCA-0751.brcm.ad.broadcom.com> <43319157.5020708@ichips.intel.com> Message-ID: <433275AC.4080005@voltaire.com> Sean Hefty wrote: > I guess I view this API slightly differently than being just a transport > neutral connection interface. I also see it as a way to connect over IB > using IP addresses, which today is only possible if using ib_at. That > is, the API could do both. I don't think this layer should replace ib_at. If you think there are things to be fixed in the ib_at, I suggest we fix them. I do believe that the original purpose of this generic cm was to serve ulps that don't want to be transport oriented (e.g. iSER). Guy From mst at mellanox.co.il Thu Sep 22 02:22:57 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Sep 2005 12:22:57 +0300 Subject: [openib-general] Re: Firmware parameters In-Reply-To: <20050920214921.GA2799@cse.ohio-state.edu> References: <20050920214921.GA2799@cse.ohio-state.edu> Message-ID: <20050922092257.GC31820@mellanox.co.il> Quoting Sayantan Sur : > Subject: Firmware parameters > > Hi, > > I read the Wiki page (last updated 09/15) on burning firmware: > > https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet > > I couldn't find any information on how to set certain firmware > parameters (like TPT values, Number of Outstanding Reads ... etc.) in > that page. You can do this by editing the .brd file. If you do this, I suggest changing the PSID value (also in the brd file) to label the card as having a modified firmware. > Another important question is given any HCA (with pre-installed > firmware), how do I find out the parameter values on the card? > > I tried mstflint in the following manner, but wasn't able to get any > useful information. I tried this on SuSE Linux 9.3, kernel 2.6.13-1(smp) > with SVN revision 3433. 
> > [surs at ro0:mstflint] sudo ./mstflint -d 02:00.0 q > Image type: FailSafe > Chip rev.: A0 > GUID Des: Node Port1 Port2 Sys image > GUIDs: 0002c902004002e8 0002c902004002e9 0002c902004002ea > 0002c902004002eb > Board ID: (MT_0150000001) > [surs at ro0:mstflint] sudo ./mstflint -d 02:00.0 dc > *** ERROR *** Failed dumping FW configuration: Fw configuration section > not found in the given image. > > > Could someone please direct me to some means to achieve this? > > TIA, > Sayantan. Unfortunately, the dc command only works if the image was created with the imgen tool. There's no easy way to do this for existing firmware with just flint. Here's a (somewhat cumbersome) procedure to achieve what you want: 1. read the firmware binary out of the board into a file with the "ri" (read image) mstflint command 2. launch the infiniburn graphical application (can be found on IB gold CD 1.7.x, alternatively a copy appears in the contrib section of the subversion repository) 3. select "read from file", input file format: Raw Binary. Modified parameters now appear, marked in red colour. Hope this helps, -- MST From mst at mellanox.co.il Thu Sep 22 03:31:59 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Sep 2005 13:31:59 +0300 Subject: [openib-general] [PATCH] add cq error events Message-ID: <20050922103159.GE31820@mellanox.co.il> Hello, Roland, Sean! The following implements reporting error events in mthca. (I've renamed mthca_cq_event to mthca_cq_completion, for consistency with qp events). As a side note, the spec says: "Two types of CQ errors can occur: the CQ can overrun or it can become inaccessible". I wonder whether this should be interpreted to mean that there should be two types of events, IB_EVENT_CQ_OVERRUN and IB_EVENT_CQ_ACCESS, rather than just a generic IB_EVENT_CQ_ERR. What do you think? MST --- Implement reporting asynchronous cq events. Signed-off-by: Michael S.
Tsirkin Index: linux-kernel/drivers/infiniband/hw/mthca/mthca_dev.h =================================================================== --- linux-kernel/drivers/infiniband/hw/mthca/mthca_dev.h (revision 3514) +++ linux-kernel/drivers/infiniband/hw/mthca/mthca_dev.h (working copy) @@ -440,7 +440,9 @@ int mthca_init_cq(struct mthca_dev *dev, struct mthca_cq *cq); void mthca_free_cq(struct mthca_dev *dev, struct mthca_cq *cq); -void mthca_cq_event(struct mthca_dev *dev, u32 cqn); +void mthca_cq_completion(struct mthca_dev *dev, u32 cqn); +void mthca_cq_event(struct mthca_dev *dev, u32 cqn, + enum ib_event_type event_type); void mthca_cq_clean(struct mthca_dev *dev, u32 cqn, u32 qpn, struct mthca_srq *srq); Index: linux-kernel/drivers/infiniband/hw/mthca/mthca_cq.c =================================================================== --- linux-kernel/drivers/infiniband/hw/mthca/mthca_cq.c (revision 3514) +++ linux-kernel/drivers/infiniband/hw/mthca/mthca_cq.c (working copy) @@ -208,7 +208,7 @@ static inline void update_cons_index(str } } -void mthca_cq_event(struct mthca_dev *dev, u32 cqn) +void mthca_cq_completion(struct mthca_dev *dev, u32 cqn) { struct mthca_cq *cq; @@ -224,6 +224,35 @@ void mthca_cq_event(struct mthca_dev *de cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context); } +void mthca_cq_event(struct mthca_dev *dev, u32 cqn, + enum ib_event_type event_type) +{ + struct mthca_cq *cq; + struct ib_event event; + + spin_lock(&dev->cq_table.lock); + + cq = mthca_array_get(&dev->cq_table.cq, cqn & (dev->limits.num_cqs - 1)); + + if (cq) + atomic_inc(&cq->refcount); + spin_unlock(&dev->cq_table.lock); + + if (!cq) { + mthca_warn(dev, "Async event for bogus CQ %08x\n", cqn); + return; + } + + event.device = &dev->ib_dev; + event.event = event_type; + event.element.cq = &cq->ibcq; + if (cq->ibcq.event_handler) + cq->ibcq.event_handler(&event, cq->ibcq.cq_context); + + if (atomic_dec_and_test(&cq->refcount)) + wake_up(&cq->wait); +} + void mthca_cq_clean(struct 
mthca_dev *dev, u32 cqn, u32 qpn, struct mthca_srq *srq) { Index: linux-kernel/drivers/infiniband/hw/mthca/mthca_eq.c =================================================================== --- linux-kernel/drivers/infiniband/hw/mthca/mthca_eq.c (revision 3514) +++ linux-kernel/drivers/infiniband/hw/mthca/mthca_eq.c (working copy) @@ -287,7 +287,7 @@ static int mthca_eq_int(struct mthca_dev case MTHCA_EVENT_TYPE_COMP: disarm_cqn = be32_to_cpu(eqe->event.comp.cqn) & 0xffffff; disarm_cq(dev, eq->eqn, disarm_cqn); - mthca_cq_event(dev, disarm_cqn); + mthca_cq_completion(dev, disarm_cqn); break; case MTHCA_EVENT_TYPE_PATH_MIG: @@ -349,6 +349,8 @@ static int mthca_eq_int(struct mthca_dev eqe->event.cq_err.syndrome == 1 ? "overrun" : "access violation", be32_to_cpu(eqe->event.cq_err.cqn) & 0xffffff); + mthca_cq_event(dev, be32_to_cpu(eqe->event.cq_err.cqn), + IB_EVENT_CQ_ERR); break; case MTHCA_EVENT_TYPE_EQ_OVERFLOW: From mst at mellanox.co.il Thu Sep 22 03:40:19 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 22 Sep 2005 13:40:19 +0300 Subject: [openib-general] Re: problem compiling openib on Linux 2.6.12 In-Reply-To: <1127315266.4426.574.camel@hal.voltaire.com> References: <1127315266.4426.574.camel@hal.voltaire.com> Message-ID: <20050922104019.GH31820@mellanox.co.il> Quoting Hal Rosenstock : > Subject: Re: problem compiling openib on Linux 2.6.12 > > On Wed, 2005-09-21 at 11:03, Sacerdoti, Federico wrote: > > Hi, > > > > Using the openib svn code from yesterday I am compiling the Linux > kernel > > 2.6.12 (from kernel.org). When I finish and try to 'modprobe ib_ucm' > > > > I see in dmsg: > > > > Unknown symbol class_destroy > > (also for class_create, etc) > > ... > A ucm backpatch for this is similar to the one for uat in > https://openib.org/svn/gen2/branches/backport/2.6.12/uat_3465_to_2_6_12. > patch I don't see any mentions of class_destroy in ucm. Am I missing something? > I don't know if an explicit one for ucm exists. Care to build one?
-- MST From halr at voltaire.com Thu Sep 22 03:49:40 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Sep 2005 06:49:40 -0400 Subject: [openib-general] Re: problem compiling openib on Linux 2.6.12 In-Reply-To: <20050922104019.GH31820@mellanox.co.il> References: <1127315266.4426.574.camel@hal.voltaire.com> <20050922104019.GH31820@mellanox.co.il> Message-ID: <1127386178.15613.2087.camel@hal.voltaire.com> On Thu, 2005-09-22 at 06:40, Michael S. Tsirkin wrote: > Quoting Hal Rosenstock : > > Subject: Re: problem compiling openib on Linux 2.6.12 > > > > On Wed, 2005-09-21 at 11:03, Sacerdoti, Federico wrote: > > > Hi, > > > > > > Using the openib svn code from yesterday I am compiling the Linux > > kernel > > > 2.6.12 (from kernel.org). When I finish and try to 'modprobe ib_ucm' > > > > > > I see in dmsg: > > > > > > Unknown symbol class_destroy > > > (also for class_create, etc) > > > ... > > A ucm backpatch for this is similar to the one for uat in > > https://openib.org/svn/gen2/branches/backport/2.6.12/uat_3465_to_2_6_12. > > patch > > I don't see any mentions of class_destroy in ucm. > Am I missing something? No, that was eliminated in Sean's recent ucm change; I forgot about this. > > I don't know if an explicit one for ucm exists. > Care to build one? This is moot. -- Hal From danb at voltaire.com Thu Sep 22 04:00:26 2005 From: danb at voltaire.com (Dan Bar Dov) Date: Thu, 22 Sep 2005 14:00:26 +0300 Subject: [openib-general] ISER memory management Message-ID: I've committed the following change in ISER: Changed all memory allocations to be based on kmem_caches instead of private memory pool implementations. This changes memory registration mechanisms as a side effect.
Removed files: iser_utils.c iser_utils.h Signed-off-by: Dan Bar Dov From yael at mellanox.co.il Thu Sep 22 04:43:54 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 22 Sep 2005 14:43:54 +0300 Subject: [openib-general] [PATCH] Opensm - Fix bug in trap receiver Message-ID: <5z8xxpmio5.fsf@mtl066.yok.mtl.com> Hi Hal, There is a bug in the trap handling in opensm. Currently - when OpenSM receives a trap, it tries to send a trap repress even if the OpenSM is not in MASTER state. Also - if OpenSM receives a trap from its local lid (source_lid is 0), and it has not yet configured its lid - it should not try to send trap repress, as this will fail (no destination lid available to define the path). Attached is a patch to fix these issues. Thanks, Yael Index: opensm/osm_sm_mad_ctrl.c =================================================================== --- opensm/osm_sm_mad_ctrl.c (revision 435) +++ opensm/osm_sm_mad_ctrl.c (working copy) @@ -591,6 +591,17 @@ __osm_sm_mad_ctrl_process_trap( p_smp = osm_madw_get_smp_ptr( p_madw ); + /* Make sure OpenSM is master. If not - then we should not process the trap */ + if (p_ctrl->p_subn->sm_state != IB_SMINFO_STATE_MASTER) + { + osm_log( p_ctrl->p_log, OSM_LOG_DEBUG, + "__osm_sm_mad_ctrl_process_trap: " + "Received trap but OpenSM is not in MASTER state. " + "Dropping mad. \n"); + osm_mad_pool_put( p_ctrl->p_mad_pool, p_madw ); + goto Exit; + } + /* Note that attr_id (like the rest of the MAD) is in network byte order. Index: opensm/osm_trap_rcv.c =================================================================== --- opensm/osm_trap_rcv.c (revision 435) +++ opensm/osm_trap_rcv.c (working copy) @@ -375,6 +375,16 @@ __osm_trap_rcv_process_request( if (p_madw->mad_addr.addr_type.smi.source_lid == 0) { + /* Check if the sm_base_lid is 0. If yes - this means that + the local lid wasn't configured yet. Don't send a response + to the trap. 
*/ + if (p_rcv->p_subn->sm_base_lid == 0) + { + osm_log( p_rcv->p_log, OSM_LOG_DEBUG, + "__osm_trap_rcv_process_request: " + "Received a SLID=0 Trap. local LID=0. Ignoring mad. \n"); + goto Exit; + } osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_trap_rcv_process_request: " "Received a SLID=0 Trap. Using local LID:0x%04X instead:.\n", From yael at mellanox.co.il Thu Sep 22 04:45:28 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 22 Sep 2005 14:45:28 +0300 Subject: [openib-general] [PATCH] Opensm - Fix bug in cl_event_wheel Message-ID: <5z7jd9milj.fsf@mtl066.yok.mtl.com> Hi Hal, There is a bug in the init function of cl_event_wheel - initializing of spinlock inside CL_ASSERT. Attached is a patch to fix it. Thanks, Yael Index: complib/cl_event_wheel.c =================================================================== --- complib/cl_event_wheel.c (revision 439) +++ complib/cl_event_wheel.c (working copy) @@ -224,7 +224,14 @@ cl_event_wheel_init( p_event_wheel->p_log = p_log; p_event_wheel->p_external_lock = NULL; p_event_wheel->closing = FALSE; - CL_ASSERT( cl_spinlock_init( &(p_event_wheel->lock) ) == CL_SUCCESS ); + cl_status = cl_spinlock_init( &(p_event_wheel->lock) ); + if (cl_status != CL_SUCCESS) + { + osm_log (p_event_wheel->p_log, OSM_LOG_ERROR, + "cl_event_wheel_init : ERROR 1000: " + "Failed to initialize cl_spinlock\n" ); + goto Exit; + } cl_qlist_init( &p_event_wheel->events_wheel); cl_qmap_init( &p_event_wheel->events_map ); @@ -237,8 +244,10 @@ cl_event_wheel_init( { osm_log (p_event_wheel->p_log, OSM_LOG_ERROR, "cl_event_wheel_init : ERROR 1000: " - "Failed to initialize timer\n" ); + "Failed to initialize cl_timer\n" ); + goto Exit; } + Exit: OSM_LOG_EXIT( p_event_wheel->p_log ); return(cl_status); } From halr at voltaire.com Thu Sep 22 05:18:24 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Sep 2005 08:18:24 -0400 Subject: [openib-general] Re: [PATCH] Opensm - ignore strict-aliasing warning In-Reply-To: 
<5zaci5mycm.fsf@mtl066.yok.mtl.com> References: <5zaci5mycm.fsf@mtl066.yok.mtl.com> Message-ID: <1127391503.15613.2531.camel@hal.voltaire.com> Hi Yael, On Thu, 2005-09-22 at 02:05, Yael Kalka wrote: > Hi Hal, > > I saw that you didn't add this patch yet. No, I didn't, as I had responded to your email on this patch and hadn't heard back until now. > Using the -fno-strict-aliasing flag prevents the compiler from doing some optimizations that might trigger > strict-aliasing bugs. So this is a fix, and not a workaround to remove the warnings. > I agree that it is not the best fix. It is better to fix the code in such a way that will enable these optimizations too. > I can look into it and have a better fix, but it will take some time. > Until then, this fix should be added, since currently we can produce code that might have these problems in it. OK. That seems safer. info gcc is unclear on whether it breaks generated code. It is only a warning. `-Wstrict-aliasing' This option is only active when `-fstrict-aliasing' is active. It warns about code which might break the strict aliasing rules that the compiler is using for optimization. The warning does not catch all cases, but does attempt to catch the more common pitfalls. It is included in `-Wall'. `-fstrict-aliasing' Allows the compiler to assume the strictest aliasing rules applicable to the language being compiled. For C (and C++), this activates optimizations based on the type of expressions. In particular, an object of one type is assumed never to reside at the same address as an object of a different type, unless the types are almost the same. For example, an `unsigned int' can alias an `int', but not a `void*' or a `double'. A character type may alias any other type. > Here is the patch again (in case you lost it along the mails....) Applied. Shouldn't -fno-strict-aliasing be added to opensm_CXXFLAGS as well?
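To make the gcc text quoted above concrete: reading an object through an lvalue of an incompatible type, such as `*(uint32_t *)&f` for a `float f`, is undefined behaviour, and with `-fstrict-aliasing` the optimizer is entitled to assume the two accesses never overlap. One common code-level fix, of the kind Yael describes as the better long-term solution, is to copy the bytes with `memcpy`. This is a generic illustration rather than code from OpenSM; `float_bits()` is a hypothetical helper, and the test assumes IEEE 754 single-precision floats.

```c
#include <stdint.h>
#include <string.h>

/* Return the bit pattern of f.
 *
 * Broken alternative (undefined behaviour under strict aliasing):
 *     return *(uint32_t *)&f;
 * memcpy copies the object representation byte by byte, which the
 * aliasing rules always permit. */
uint32_t float_bits(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

With a fixed, small size, gcc typically compiles such a `memcpy` down to a single register move, so the well-defined version usually costs nothing at `-O2`.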
-- Hal From jackm at mellanox.co.il Thu Sep 22 05:45:11 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Thu, 22 Sep 2005 15:45:11 +0300 Subject: [openib-general] [PATCH] check for valid MGID in user space Message-ID: <20050922124511.GA9109@mellanox.co.il> The following patch checks validity of MGID when attaching/detaching a QP to/from a multicast group (for user-space only). IB spec demands that multicast gids start with 0xFF in 0-th byte. (IB Spec v1.2,section 4.1.1 (page 144)). Index: linux-kernel/infiniband/core/uverbs_cmd.c =================================================================== --- linux-kernel/infiniband/core/uverbs_cmd.c (revision 3505) +++ linux-kernel/infiniband/core/uverbs_cmd.c (working copy) @@ -1040,7 +1040,7 @@ down(&ib_uverbs_idr_mutex); qp = idr_find(&ib_uverbs_qp_idr, cmd.qp_handle); - if (qp && qp->uobject->context == file->ucontext) + if (qp && qp->uobject->context == file->ucontext && cmd.gid[0] == 0xFF) ret = ib_attach_mcast(qp, (union ib_gid *) cmd.gid, cmd.mlid); up(&ib_uverbs_idr_mutex); @@ -1062,7 +1062,7 @@ down(&ib_uverbs_idr_mutex); qp = idr_find(&ib_uverbs_qp_idr, cmd.qp_handle); - if (qp && qp->uobject->context == file->ucontext) + if (qp && qp->uobject->context == file->ucontext && cmd.gid[0] == 0xFF) ret = ib_detach_mcast(qp, (union ib_gid *) cmd.gid, cmd.mlid); up(&ib_uverbs_idr_mutex); From halr at voltaire.com Thu Sep 22 06:13:12 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Sep 2005 09:13:12 -0400 Subject: [openib-general] Re: [PATCH] Opensm - Fix bug in trap receiver In-Reply-To: <5z8xxpmio5.fsf@mtl066.yok.mtl.com> References: <5z8xxpmio5.fsf@mtl066.yok.mtl.com> Message-ID: <1127394790.15613.2771.camel@hal.voltaire.com> On Thu, 2005-09-22 at 07:43, Yael Kalka wrote: > There is a bug in the trap handling in opensm. > Currently - when OpenSM receives a trap, it tries to send a trap > repress even if the OpenSM is not in MASTER state. 
> Also - if OpenSM receives a trap from its local lid (source_lid is 0), > and it has not yet configured its lid - it should not try to send trap > repress, as this will fail (no destination lid available to define the > path). > Attached is a patch to fix these issues. Thanks. Applied. Try not to forget your signed off line. -- Hal From halr at voltaire.com Thu Sep 22 06:18:31 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Sep 2005 09:18:31 -0400 Subject: [openib-general] Re: [PATCH] Opensm - Fix bug in cl_event_wheel In-Reply-To: <5z7jd9milj.fsf@mtl066.yok.mtl.com> References: <5z7jd9milj.fsf@mtl066.yok.mtl.com> Message-ID: <1127395111.15613.2803.camel@hal.voltaire.com> On Thu, 2005-09-22 at 07:45, Yael Kalka wrote: > Hi Hal, > > There is a bug in the init function of cl_event_wheel - initializing > of spinlock inside CL_ASSERT. > Attached is a patch to fix it. > > Thanks, > Yael > > Index: complib/cl_event_wheel.c > =================================================================== > --- complib/cl_event_wheel.c (revision 439) ^^^ What code base is this off of ? > +++ complib/cl_event_wheel.c (working copy) > @@ -224,7 +224,14 @@ cl_event_wheel_init( > p_event_wheel->p_log = p_log; > p_event_wheel->p_external_lock = NULL; > p_event_wheel->closing = FALSE; > - CL_ASSERT( cl_spinlock_init( &(p_event_wheel->lock) ) == CL_SUCCESS ); > + cl_status = cl_spinlock_init( &(p_event_wheel->lock) ); > + if (cl_status != CL_SUCCESS) > + { > + osm_log (p_event_wheel->p_log, OSM_LOG_ERROR, > + "cl_event_wheel_init : ERROR 1000: " ^^^^ This is a duplicate error number. I've committed this patch; just indicate what error number to change this to. 
> + "Failed to initialize cl_spinlock\n" ); > + goto Exit; > + } > cl_qlist_init( &p_event_wheel->events_wheel); > cl_qmap_init( &p_event_wheel->events_map ); > > @@ -237,8 +244,10 @@ cl_event_wheel_init( > { > osm_log (p_event_wheel->p_log, OSM_LOG_ERROR, > "cl_event_wheel_init : ERROR 1000: " > - "Failed to initialize timer\n" ); > + "Failed to initialize cl_timer\n" ); > + goto Exit; > } > + Exit: > OSM_LOG_EXIT( p_event_wheel->p_log ); > return(cl_status); > } Again, please try and remember your signed off line. -- Hal
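Returning to Yael's cl_event_wheel patch and Hal's review of it: the original code called `cl_spinlock_init()` inside `CL_ASSERT`. Assuming `CL_ASSERT`, like the standard `assert()`, expands to nothing in non-debug builds, the initialization itself silently disappears from release binaries. A minimal standalone illustration using the standard `assert()`; `fake_lock_init()`, `init_buggy()`, `init_fixed()` and `init_calls` are hypothetical names, not complib code.

```c
#include <assert.h>

int init_calls;  /* counts how many times the initializer actually ran */

/* Hypothetical stand-in for cl_spinlock_init(): 0 means success. */
int fake_lock_init(void)
{
    ++init_calls;
    return 0;
}

/* Buggy pattern: with NDEBUG defined, assert() expands to ((void)0),
 * so fake_lock_init() is never called and the lock stays uninitialized. */
void init_buggy(void)
{
    assert(fake_lock_init() == 0);
}

/* Fixed pattern, as in the patch: always make the call, then check the
 * status and hand it back to the caller instead of asserting. */
int init_fixed(void)
{
    int status = fake_lock_init();

    if (status != 0)
        return status;
    /* ... initialize the remaining members here ... */
    return 0;
}
```

The general rule: never put an expression with side effects inside an assertion macro; compute it first and assert on, or branch on, the stored result.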
From halr at voltaire.com Thu Sep 22 08:17:46 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Sep 2005 11:17:46 -0400 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <1127350851.15613.390.camel@hal.voltaire.com> References: <4df28be4050921172370bad964@mail.gmail.com> <1127350851.15613.390.camel@hal.voltaire.com> Message-ID: <1127402265.15613.3424.camel@hal.voltaire.com> Hi again Viswa, On Wed, 2005-09-21 at 21:00, Hal Rosenstock wrote: > Hi Viswa, > > On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote: > > Currently opensm traps SIGINT. There was some discussion to remove it. > > I am currently running some tests on opensm > > by killing (SIGKILL) and restarting opensm. So far I have not found > > any resource leak issues. Is there a plan to remove that > > signal handler? Ideally it should not exist. > > Eitan stated that this was historical in nature for gen1 drivers which > had resource tracking problems: "if OpenSM left without cleaning up all > used resources (like MAD buffers and UD-AVs), the driver oops'ed." > > I think that (eliminating the handler for SIGINT) can at least be done > for OSM_VENDOR_INTF_OPENIB and leave it there for the other vendor > layers for starters. I will experiment with gen2 and let you know. Does the patch below do what you want? Can you try it?
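For readers less familiar with the mechanism, here is a minimal POSIX sketch of skipping the SIGINT handler at compile time, mirroring the `#ifndef OSM_VENDOR_INTF_OPENIB` guard in the patch below. It calls `sigaction()` directly; how complib's `cl_reg_sig_hdl()` is actually implemented is not assumed here, and `reg_handlers()`, `reg_one()`, `exit_requested()` and `exit_flag` are hypothetical names. When the guard compiles the registration out, SIGINT keeps its default action, which terminates the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t exit_flag;

/* Async-signal-safe handler: only record that an exit was requested;
 * the main loop notices the flag and performs the cleanup. */
static void sig_handler(int signum)
{
    (void)signum;
    exit_flag = 1;
}

static int reg_one(int signum)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sig_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signum, &sa, NULL);
}

/* Returns 0 on success, -1 if any handler could not be installed. */
int reg_handlers(void)
{
#ifndef OSM_VENDOR_INTF_OPENIB
    /* Legacy vendor layers: also trap Ctrl-C so cleanup always runs. */
    if (reg_one(SIGINT) != 0)
        return -1;
#endif
    if (reg_one(SIGTERM) != 0 || reg_one(SIGHUP) != 0)
        return -1;
    return 0;
}

int exit_requested(void)
{
    return exit_flag != 0;
}
```

Keeping the handler down to setting a `sig_atomic_t` flag is the conventional design: nearly all real cleanup work (freeing MADs, closing ports) is not async-signal-safe and belongs in the main loop.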
-- Hal Index: opensm/osm_opensm.c =================================================================== --- opensm/osm_opensm.c (revision 3513) +++ opensm/osm_opensm.c (working copy) @@ -182,7 +182,9 @@ osm_reg_sig_handler( IN osm_opensm_t * const p_osm ) { __p_osm_to_signal = p_osm; +#ifndef OSM_VENDOR_INTF_OPENIB cl_reg_sig_hdl( SIGINT, __sig_handler ); +#endif cl_reg_sig_hdl( SIGTERM, __sig_handler ); cl_reg_sig_hdl( SIGHUP, __sig_handler ); osm_exit_flag = 0; From caitlin.bestler at gmail.com Thu Sep 22 08:37:44 2005 From: caitlin.bestler at gmail.com (Caitlin Bestler) Date: Thu, 22 Sep 2005 08:37:44 -0700 Subject: [openib-general] [RFC] libibverbs completion event handling In-Reply-To: <52u0gej99b.fsf@cisco.com> References: <54AD0F12E08D1541B826BE97C98F99F1020860@NT-SJCA-0751.brcm.ad.broadcom.com> <52u0gej99b.fsf@cisco.com> Message-ID: <469958e0050922083727fa7b3a@mail.gmail.com> On 9/21/05, Roland Dreier wrote: > > Caitlin> I'm not sure I follow what a "completion channel" is. My > Caitlin> understanding is that work completions are stored in > Caitlin> user-accessible memory (typically a ring buffer). This > Caitlin> enables fast-path reaping of work completions. The OS has > Caitlin> no involvement unless notifications are enabled. > > Right. Notifications ("events" in the terminology I used) are the > only thing I was talking about. > > Caitlin> The "completion vector" is used to report completion > Caitlin> notifications. So is the completion vector a *single* > Caitlin> resource used by the driver/verbs to report completions, > Caitlin> where said notifications are then split into user context > Caitlin> dependent "completion channels"? > > Yes. And how does a completion vector relate to callbacks? Would this be typically identifying the number of "processing resources" (such as distinct execution contexts) that will be allocated? Or more formally, if two CQs use the same completion vector, are we guaranteed that they will not have concurrent callbacks? 
> > Caitlin> The RDMAC verbs did not define callbacks to userspace at > Caitlin> all. Instead it is assumed that the proxy for user mode > Caitlin> services will receive the callbacks, and how it relays > Caitlin> those notifications to userspace is outside the scope of > Caitlin> the verbs. > > That "outside the scope" part is exactly what I'm talking about > implementing here. How would this enable kernel mode processing of notifications for a user-mode CQ by the proxy? From krause at cup.hp.com Thu Sep 22 08:38:44 2005 From: krause at cup.hp.com (Michael Krause) Date: Thu, 22 Sep 2005 08:38:44 -0700 Subject: [openib-general] [RFC] libibverbs completion event handling In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F1020860@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F1020860@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <6.2.0.14.2.20050922082655.022f41a8@esmail.cup.hp.com> At 03:33 PM 9/21/2005, Caitlin Bestler wrote: >I'm not sure I follow what a "completion channel" is. >My understanding is that work completions are stored in >user-accessible memory (typically a ring buffer). This >enables fast-path reaping of work completions. The OS >has no involvement unless notifications are enabled. > >The "completion vector" is used to report completion >notifications. So is the completion vector a *single* >resource used by the driver/verbs to report completions, >where said notifications are then split into user >context dependent "completion channels"? > >The RDMAC verbs did not define callbacks to userspace >at all. Instead it is assumed that the proxy for user >mode services will receive the callbacks, and how it >relays those notifications to userspace is outside >the scope of the verbs. Correct. >Both uDAPL and ITAPI define relays of notifications >to AEVDS/CNOs and/or file descriptors. Forwarding
Forwarding >a completion notification to userspace in order to >make a callback in userspace so that it can kick >an fd to wake up another thread doesn't make much >sense. The uDAPL/ITAPI/whatever proxy can perform >all of these functions without any device dependencies >and in a way that is fully optimal for the usermode >API that is being used. Exactly. This was the intention. Does not really matter what the API is but that there by an API that does this work on behalf of the consumer. >For kernel clients, I don't see any need for anything beyond the already >defined >callbacks direct from the device-dependent code. This was the intention when we designed the verbs. >Even in the typical case where the usermode application >does an evd_wait() on the DAT or ITAPI endpoint, the >DAT/ITAPI proxy will be able to determine which thread >should be woken and could even do so optimally. It >also allows the proxy to implemenet Access Layer features >such as EVD thresholding without device-specific support. Correct. > > -----Original Message----- > > From: openib-general-bounces at openib.org > > [mailto:openib-general-bounces at openib.org] On Behalf Of Roland Dreier > > Sent: Wednesday, September 21, 2005 12:22 PM > > To: openib-general at openib.org > > Subject: [openib-general] [RFC] libibverbs completion event handling > > > > While thinking about how to handle some of the issues raised > > by Al Viro in , I > > realized that our verbs interface could be improved to make > > delivery of completion events more flexible. For example, > > Arlin's request for using one FD for each CQ can be > > accomodated quite nicely. > > > > The basic idea is to create new objects that I call > > "completion vectors" and "completion channels." Completion > > vectors refer to the interrupt generated when a completion > > event occurs. 
With the current drivers, there will always be > > a single completion vector, but once we have full MSI-X > > support, multiple completion vectors will be possible. When I proposed the use of multiple completion handlers, it was based on the operating assumption that either MSI or MSI-X be used by the underlying hardware. Either is possible: MSI limits it to a single address with 32 data values, which allows a different handler to be bound to each value, though all of them target a single processor. MSI-X builds upon technology we've been shipping for nearly 20 years now and allows up to 2048 different addresses, which may target one or multiple processors. Any API should be able to deal with both approaches and thus should not assume anything about whether one or more handlers are bound to a given processor. > > Orthogonal to this is the notion of a completion channel. > > This is an FD used for delivering completion events to userspace. > > > > Completion vectors are handled by the kernel, and userspace > > cannot change the number of vectors that are available. On the > > other hand, completion channels are created at the request of > > a userspace process, and userspace can create as many > > channels as it wants. > > > > Every userspace CQ has a completion vector and a completion channel. > > Multiple CQs can share the same completion vector and/or the > > same completion channel. CQs with different completion > > vectors can still share a completion channel, and vice versa. > > > > The exact API would be something like the below. Thoughts? Why wouldn't it just be akin to the verbs interface, i.e. here are the event handler and callback routines to associate with a given CQ? The handler might be nothing more than an index into a set of functions that are stored within the kernel; these functions are either device-specific (i.e. supplied by the IHV) or OS-specific, such as those dealing with error events (which might also have a device-specific component).
When the routine is invoked, it basically has three parameters: the CQ to target, the number of CQEs to reap, and the address at which to store the CQEs. I do not see what more is required. Mike > > > > Thanks, > > Roland > > > > struct ibv_comp_channel { > > int fd; > > }; > > > > /** > > * ibv_create_comp_channel - Create a completion event > > channel */ extern struct ibv_comp_channel > > *ibv_create_comp_channel(struct ibv_context *context); > > > > /** > > * ibv_destroy_comp_channel - Destroy a completion event > > channel */ extern int ibv_destroy_comp_channel(struct > > ibv_comp_channel *channel); > > > > /** > > * ibv_create_cq - Create a completion queue > > * @context - Context CQ will be attached to > > * @cqe - Minimum number of entries required for CQ > > * @cq_context - Consumer-supplied context returned for > > completion events > > * @channel - Completion channel where completion events will > > be queued. > > * May be NULL if completion events will not be used. > > * @comp_vector - Completion vector used to signal completion events. > > * Must be >= 0 and < context->num_comp_vectors. > > */ > > extern struct ibv_cq *ibv_create_cq(struct ibv_context > > *context, int cqe, > > void *cq_context, > > struct ibv_comp_channel *channel, > > int comp_vector); > > > > /** > > * ibv_get_cq_event - Read next CQ event > > * @channel: Channel to get next event from. > > * @cq: Used to return pointer to CQ. > > * @cq_context: Used to return consumer-supplied CQ context. > > * > > * All completion events returned by ibv_get_cq_event() must > > * eventually be acknowledged with ibv_ack_cq_events().
> > */ > > extern int ibv_get_cq_event(struct ibv_comp_channel *channel, > > struct ibv_cq **cq, void > > **cq_context); _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general From krause at cup.hp.com Thu Sep 22 08:45:50 2005 From: krause at cup.hp.com (Michael Krause) Date: Thu, 22 Sep 2005 08:45:50 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <469958e0050921173063e14203@mail.gmail.com> References: <54AD0F12E08D1541B826BE97C98F99F1020856@NT-SJCA-0751.brcm.ad.broadcom.com> <43319157.5020708@ichips.intel.com> <469958e0050921173063e14203@mail.gmail.com> Message-ID: <6.2.0.14.2.20050922083901.022f42f0@esmail.cup.hp.com> At 05:30 PM 9/21/2005, Caitlin Bestler wrote: >On 9/21/05, Sean Hefty ><mshefty at ichips.intel.com> wrote: >Caitlin Bestler wrote: > > That's certainly an acceptably low overhead for iWARP IHVs, > > provided there are applications that want this control and > > *not* also need even more IB-specific CM control. I still > > have the same skepticism I had for the IT-API's exposing > > of paths via a transport neutral API. Namely, is there > > really any basis to select amongst multiple paths from > > transport neutral code? The same applies to caching of > > address translations on a transport neutral basis. Is > > it really possible to do in any way that makes sense? > > Wouldn't caching at a lower layer, with transport/device > > specific knowledge, make more sense?
> >I guess I view this API slightly differently than being just a transport >neutral >connection interface. I also see it as a way to connect over IB using IP >addresses, which today is only possible if using ib_at. That is, the API >could >do both. > > >Given that purpose I can envision an IB-aware application that needed >to use IP addresses and wanted to take charge of caching the translation. > >But viewing this in a wider scope raises a second question. Shouldn't >iSER be using the same routines to establish connections? While many applications do use IP addresses, unless one goes the route of defining an IP address per path (something that iSCSI does comprehend today), IB multi-path (and, I suspect, eventually Ethernet's multi-path support) will require interconnect-specific interfaces. Ideally, applications / ULPs define the destination and QoS requirements (what we used to call an address vector). Middleware maps those to an interconnect-specific path on behalf of the application / ULP. This is done underneath the API as part of the OS / RDMA infrastructure. Such an approach works quite well for many applications / ULPs; however, it should not be the only one supported, as it assumes that the OS / RDMA infrastructure is sufficiently robust to apply policy management decisions in conjunction with the fabric management being deployed. Given that IB SMs will vary in robustness, there must also exist APIs that allow applications / ULPs to comprehend the set of paths and select accordingly. I can envision how to construct such knowledge in an interconnect-independent way, but it requires more "standardization" of what defines the QoS requirements: latency, bandwidth, service rate, no single point of failure, etc. What I see so far does not address these issues. Mike
From jlentini at netapp.com Thu Sep 22 09:09:39 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 22 Sep 2005 12:09:39 -0400 (EDT) Subject: [openib-general] 3513 DAPL is Broken In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F00059AF893@orsmsx408> References: <1AC79F16F5C5284499BB9591B33D6F00059AF893@orsmsx408> Message-ID: Is there a way to reproduce this with dapltest? Our 6 canonical regression tests (see test/dapltest/scripts/regress.sh) don't encounter this problem on revision 3521. Are you sure the application is hanging in DAPL? Can you enable DAPL debugging and send the output (see doc/dapl_environ.txt and doc/dat_environ.txt)? Thanks, james On Wed, 21 Sep 2005, Woodruff, Robert J wrote: > Seems to hang around the time of the modify QP. > > ibv_rc_pingpong seems to work OK and also your > DAPL-socket CM version that you gave me yesterday seems > to work, but the DAPL I pulled from SVN that uses the IB AT/CM > has the following problem. > > I am starting to think that pushing out your socket CM > version until things stabilize with the IBAT/IBCM version > might be worth considering, so that people that > want to use DAPL now have something that is reliable.
> > woody > > Here is the dapl trace when running Intel MPI on top of uDAPL 3513, > > dapl_ia_query (0x522860, (nil), 0x0, (nil), 0x3ffffff, 0x7fbfffe510) > dapl_ia_query () returns 0x0 > dapl_evd_create () returns 0x0 > setup_listener(ia_ptr 0x522860 SID 3545 sp 0x5238e0 conn 0x5239a0 id > 5389248) > setup_listener(conn=0x5239a0 cm_id=5389248) > dapl_ep_create (0x522860, 0x5235a0, 0x523620, 0x523620, 0x523780, > 0x7fbfffecb0, 0x5201e8) > query_hca: MAX msg 2147483648 dto 65535 iov 59 rdma i4,o4 > qp_alloc: ia_ptr 0x522860 ep_ptr 0x526740 ep_ctx_ptr 0x526740 > qp_alloc: qpn 0xc0409 sq 1000,9 rq 1000,1 > modify_qp: qp 0x523c50, state 1 qp_num 0xc0409 > dapl_ep_create (0x522860, 0x5235a0, 0x523620, 0x523620, 0x523780, > 0x7fbfffecb0, 0x5203b8) > query_hca: MAX msg 2147483648 dto 65535 iov 59 rdma i4,o4 > qp_alloc: ia_ptr 0x522860 ep_ptr 0x526a20 ep_ctx_ptr 0x526a20 > qp_alloc: qpn 0xc040a sq 1000,9 rq 1000,1 > modify_qp: qp 0x526d00, state 1 qp_num 0xc040a > dapl_ep_create (0x522860, 0x5235a0, 0x523620, 0x523620, 0x523780, > 0x7fbfffecb0, 0x520758) > query_hca: MAX msg 2147483648 dto 65535 iov 59 rdma i4,o4 > qp_alloc: ia_ptr 0x522860 ep_ptr 0x593470 ep_ctx_ptr 0x593470 > qp_alloc: qpn 0xc040b sq 1000,9 rq 1000,1 > modify_qp: qp 0x526e40, state 1 qp_num 0xc040b From mshefty at ichips.intel.com Thu Sep 22 09:27:36 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 22 Sep 2005 09:27:36 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <433275AC.4080005@voltaire.com> References: <54AD0F12E08D1541B826BE97C98F99F1020856@NT-SJCA-0751.brcm.ad.broadcom.com> <43319157.5020708@ichips.intel.com> <433275AC.4080005@voltaire.com> Message-ID: <4332DB78.1000808@ichips.intel.com> Guy German wrote: > I don't think this layer should replace ib_at. If you think there are > things to be fixed in the ib_at, I suggest we fix them. 
I do believe > that the original purpose of this generic cm was to serve ulps that > don't want to be transport oriented (e.g. iSER). Based on discussions from last month, the general agreement was to use CM private data in place of ATS. Once that's done, I don't see a need for ib_at. (Also, put simply, I don't believe that ATS can work.) I think that a combination of what Roland (including his original API design) and Yaron proposed is the right direction to go. - Sean From jlentini at netapp.com Thu Sep 22 09:29:36 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 22 Sep 2005 12:29:36 -0400 (EDT) Subject: [openib-general] [IBAT] interface numbering assumption Message-ID: Hal, IBAT's resolve_ip function assumes that network interfaces are consecutively numbered; see at.c line 1691. One of my machines ended up with the following configuration: # ls /sys/class/net/ eth0 eth1 ib0 ib1 lo sit0 # cat /sys/class/net/lo/ifindex 1 # cat /sys/class/net/eth0/ifindex 2 # cat /sys/class/net/eth1/ifindex 3 # cat /sys/class/net/sit0/ifindex 4 # cat /sys/class/net/ib0/ifindex 9 # cat /sys/class/net/ib1/ifindex 10 I'm not sure how this happened. As a result, the for loop on line 1691 exited before finding an IPoIB device. A quick reboot fixed the problem. Is there a better way to enumerate all of the network interfaces? I believe that is what this for loop is attempting to accomplish. james From guyg at voltaire.com Thu Sep 22 09:36:39 2005 From: guyg at voltaire.com (Guy German) Date: Thu, 22 Sep 2005 19:36:39 +0300 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <4332DB78.1000808@ichips.intel.com> References: <54AD0F12E08D1541B826BE97C98F99F1020856@NT-SJCA-0751.brcm.ad.broadcom.com> <43319157.5020708@ichips.intel.com> <433275AC.4080005@voltaire.com> <4332DB78.1000808@ichips.intel.com> Message-ID: <4332DD97.70408@voltaire.com> Sean Hefty wrote: > Guy German wrote: > >> I don't think this layer should replace ib_at.
If you think there are >> things to be fixed in the ib_at, I suggest we fix them. I do believe >> that the original purpose of this generic cm was to serve ulps that >> don't want to be transport oriented (e.g. iSER). > > > Based on discussions from last month, the general agreement was to use > CM private data in place of ATS. Once that's done, I don't see a need > for ib_at. (Also, put simply, I don't believe that ATS can work.) I > think that a combination of what Roland, including his original API > design, and Yaron proposed is the right direction to go. ib_at also works with IPoIB. The way my CMA implementation currently uses it, for instance, does not involve ATS at all. The way I see it, ib_at is an address resolution module for InfiniBand (one that can probably be improved), and the CMA should be a generic connection manager for RDMA transports. Guy From mshefty at ichips.intel.com Thu Sep 22 09:37:09 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 22 Sep 2005 09:37:09 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <6.2.0.14.2.20050922083901.022f42f0@esmail.cup.hp.com> References: <54AD0F12E08D1541B826BE97C98F99F1020856@NT-SJCA-0751.brcm.ad.broadcom.com> <43319157.5020708@ichips.intel.com> <469958e0050921173063e14203@mail.gmail.com> <6.2.0.14.2.20050922083901.022f42f0@esmail.cup.hp.com> Message-ID: <4332DDB5.10707@ichips.intel.com> Michael Krause wrote: > accordingly. I can envision how to construct such a knowledge that is > interconnect independent but it requires more "standardization" about > what defines the QoS requirements - latency, bandwidth, service rate, no > single point of failure, etc. What I see so far does not address these > issues. I removed the QoS structures from my latest version until something usable can be defined and it's clear how the QoS requirements map to the underlying routes.
(My personal experience with QoS parameters in an API is that they never move beyond the stage of being a placeholder.) Any patch that defines QoS and implements its mapping to routing would be greatly appreciated, and a reference to an application that makes use of it would be helpful. - Sean From xma at us.ibm.com Thu Sep 22 09:41:11 2005 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 22 Sep 2005 09:41:11 -0700 Subject: [openib-general] problems on device/ports initialization In-Reply-To: <4331EFD9.8000402@ichips.intel.com> Message-ID: > If the MAD code encounters an issue initializing a device, it will clean up all resources allocated to that device. The error handling is per device, rather than per port. I am not convinced. First, only the MAD resources, not all resources allocated to that device, get cleaned up. (Some resources are still allocated for all ports, and if you continue loading the ib_ipoib module, you can see the ib interface with no QP1, but configuring the interface will fail and hang in the kernel.) Second, making the usability of a port (whose QP0 and QP1 have both been created successfully) depend on another port is not a good design. Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From mshefty at ichips.intel.com Thu Sep 22 09:51:26 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 22 Sep 2005 09:51:26 -0700 Subject: [openib-general] problems on device/ports initialization In-Reply-To: References: Message-ID: <4332E10E.2030208@ichips.intel.com> Shirley Ma wrote: > > > If the MAD code encounters an issue initializing a device, it will > clean up all resources allocated to that device. The error handling is > per device, rather than per port. > > I am not convinced. First only MAD resource not all resources allocated to > that device gets cleaned up. (Some resources are still allocated for all I was only referring to the MAD code.
> ports, and if you continue loading ib_ipoib modules, you can see the ib > interface with no QP1, but configuring the interface will fail, and hung > in the kernel.) > Second, the port usability(QP1 & QP0 have both have been created > successfully) depends on other port is not a good design. This is an error-handling issue that can be fixed, although I don't think it's necessarily worth doing, since errors should be very rare. The issue of ipoib hanging the kernel after an earlier error, however, does need to be fixed. The cause could be in the MAD layer, the SA code, or ipoib itself. - Sean From Administrator at openib.org Thu Sep 22 10:03:14 2005 From: Administrator at openib.org (Administrator at openib.org) Date: Thu, 22 Sep 2005 12:03:14 -0500 Subject: [openib-general] [MailServer Notification]To Recipient virus found and action taken. Message-ID: <000501c5bf97$852860b0$020ca8c0@banderacom.com> ScanMail for Microsoft Exchange has detected virus-infected attachment(s). Sender = openib-general-bounces at openib.org Recipient(s) = openib-general at openib.org Subject = [openib-general] PGMJUHQBYXVQF Scanning time = 9/22/2005 12:03:14 PM Engine/Pattern = 7.510-1002/2.853.00 Action on virus found: The attachment important-details.zip contains WORM_MYTOB.EI virus. ScanMail has Deleted it. Warning to recipient. ScanMail has detected a virus. 9/22/2005 important-details.zip/Deleted openib-general at openib.org openib-general-bounces at openib.org [openib-general] PGMJUHQBYXVQF From robert.j.woodruff at intel.com Thu Sep 22 10:10:48 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Thu, 22 Sep 2005 10:10:48 -0700 Subject: [openib-general] 3513 DAPL is Broken In-Reply-To: Message-ID: James wrote, >Is there a way to reproduce this with dapltest? >Our 6 canonical regression tests (see >test/dapltest/scripts/regress.sh), don't encounter this problem on >revision 3521. I'll try to investigate a bit more this afternoon or tomorrow.
Arlin is OOP on vacation until Monday. I need to double-check to make sure I got everything installed and rebuilt correctly. I can also try dapltest or DAPL netpipe. woody From xma at us.ibm.com Thu Sep 22 10:25:31 2005 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 22 Sep 2005 10:25:31 -0700 Subject: [openib-general] problems on device/ports initialization In-Reply-To: <4332E10E.2030208@ichips.intel.com> Message-ID: > Although, I don't think that it's necessarily worth doing, since errors should be very rare. Agreed. Since Galaxy hits this problem on PPC, we've tried different approaches to fixing it, but none of them works well, and it's hard to change the existing architecture. So I would like to work on a patch to address the error. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From mshefty at ichips.intel.com Thu Sep 22 10:33:22 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 22 Sep 2005 10:33:22 -0700 Subject: [openib-general] problems on device/ports initialization In-Reply-To: References: Message-ID: <4332EAE2.9060107@ichips.intel.com> Shirley Ma wrote: > > Although, I don't think that it's necessarily worth doing, since > errors should be very rare. > Agree. Since Galaxy hits this problem on PPC, we've tried different > approaches to fix this problem. None of them work well. And it's hard to > change the existing architecture. So I would like to work on a patch to > address the error. A patch for this would be accepted. I should note that other modules, such as the CM, follow this same error recovery. If an error occurs trying to initialize any of the ports, the entire device is not used by that module. Ipoib appears to handle each port separately, however, so that a failure on one port does not mark the others as invalid.
Roland should know for certain, but at least that's the way the code looks to me. - Sean From rolandd at cisco.com Thu Sep 22 10:38:58 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 22 Sep 2005 10:38:58 -0700 Subject: [openib-general] problems on device/ports initialization In-Reply-To: <4332EAE2.9060107@ichips.intel.com> (Sean Hefty's message of "Thu, 22 Sep 2005 10:33:22 -0700") References: <4332EAE2.9060107@ichips.intel.com> Message-ID: <52y85pdmtp.fsf@cisco.com> Sean> Ipoib appears to handle each port separately, however, so Sean> that a failure on one port does not mark the others as Sean> invalid. Roland should know for certain, but at least Sean> that's the way the code looks to me. Sort of. However, the whole stack was architected with the expectation that "devices" would be the unit we operate on. It will take quite a bit of work to fix everything up so that all the pieces work with devices where some ports are not accessible. I still would like to understand the real issue with Galaxy. Why do we have to wait for a port to be in the Active state before we use it? - R. From mshefty at ichips.intel.com Thu Sep 22 10:46:32 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 22 Sep 2005 10:46:32 -0700 Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: References: Message-ID: <4332EDF8.7090700@ichips.intel.com> Sean Hefty wrote: > The following patch updates the original CMA APIs. > > Updated implementation to follow. > > If there's agreement, can we check this into svn under the ULP directory? I've checked this into svn under svn/gen2/users/mshefty/linux-kernel/infiniband, so that changes can be tracked more easily.
- Sean From xma at us.ibm.com Thu Sep 22 11:32:14 2005 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 22 Sep 2005 11:32:14 -0700 Subject: [openib-general] problems on device/ports initialization In-Reply-To: <52y85pdmtp.fsf@cisco.com> Message-ID: >I still would like to understand the real issue with Galaxy. Why do we have to wait for a port to be in the Active state before we use it? The real issue is not in Galaxy; it's in the PPC pHype firmware. We are working with the pHype team to address this problem now. It would take quite some time to fix it in the firmware; I was told the architecture is pretty complicated. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From viswa.krish at gmail.com Thu Sep 22 11:32:39 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Thu, 22 Sep 2005 11:32:39 -0700 Subject: [openib-general] ib_create_cq memory leak? In-Reply-To: <52fyrxg1qy.fsf@cisco.com> References: <4df28be40509211314243b4bed@mail.gmail.com> <52fyrxg1qy.fsf@cisco.com> Message-ID: <4df28be4050922113273f4cc59@mail.gmail.com> Roland, Thanks. Tested this out. Works like a charm. -Viswa On 9/21/05, Roland Dreier wrote: > > Thanks very much for the excellent test case. The following patch > (already checked into svn and queued in git for merging into 2.6.14) > should fix things -- on my system, your test case ran successfully for > many hundreds of iterations.
> > --- linux-kernel/infiniband/hw/mthca/mthca_memfree.c (revision 3500) > +++ linux-kernel/infiniband/hw/mthca/mthca_memfree.c (working copy) > @@ -529,12 +529,25 @@ int mthca_alloc_db(struct mthca_dev *dev > goto found; > } > > + for (i = start; i != end; i += dir) > + if (!dev->db_tab->page[i].db_rec) { > + page = dev->db_tab->page + i; > + goto alloc; > + } > + > if (dev->db_tab->max_group1 >= dev->db_tab->min_group2 - 1) { > ret = -ENOMEM; > goto out; > } > > + if (group == 0) > + ++dev->db_tab->max_group1; > + else > + --dev->db_tab->min_group2; > + > page = dev->db_tab->page + end; > + > +alloc: > page->db_rec = dma_alloc_coherent(&dev->pdev->dev, 4096, > &page->mapping, GFP_KERNEL); > if (!page->db_rec) { > @@ -554,10 +567,6 @@ int mthca_alloc_db(struct mthca_dev *dev > } > > bitmap_zero(page->used, MTHCA_DB_REC_PER_PAGE); > - if (group == 0) > - ++dev->db_tab->max_group1; > - else > - --dev->db_tab->min_group2; > > found: > j = find_first_zero_bit(page->used, MTHCA_DB_REC_PER_PAGE); From xma at us.ibm.com Thu Sep 22 11:35:00 2005 From: xma at us.ibm.com (Shirley Ma) Date: Thu, 22 Sep 2005 11:35:00 -0700 Subject: [openib-general] problems on device/ports initialization In-Reply-To: <4332EAE2.9060107@ichips.intel.com> Message-ID: > A patch for this would be accepted. I should note that other modules, such as the CM, follow this same error recovery. If an error occurs trying to initialize any of the ports, the entire device is not used by that module. > Ipoib appears to handle each port separately, however, so that a failure on one port does not mark the others as invalid. Roland should know for certain, but at least that's the way the code looks to me. Great. Thanks! I will try to address these errors as well.
Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From viswa.krish at gmail.com Thu Sep 22 11:37:56 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Thu, 22 Sep 2005 11:37:56 -0700 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <1127402265.15613.3424.camel@hal.voltaire.com> References: <4df28be4050921172370bad964@mail.gmail.com> <1127350851.15613.390.camel@hal.voltaire.com> <1127402265.15613.3424.camel@hal.voltaire.com> Message-ID: <4df28be405092211377f0d4dac@mail.gmail.com> Hi Hal, Sure, I will test it out. I see no issue in this fix. I have run the following test overnight in a script with yesterday's code: 1. Start opensm 2. Ping another node over IB 3. Run osmtest (osmtest -f c, osmtest -f a) 4. Kill opensm with signal -9 and repeat. The failures are captured in a log. This has run more than 2500 times without resource leak issues. I saw about 150 osmtest failures, which I will follow up on in another mail. Once, opensm failed to start correctly with SUBNET UP message in the log. -Viswa On 22 Sep 2005 11:17:46 -0400, Hal Rosenstock wrote: > > Hi again Viswa, > > On Wed, 2005-09-21 at 21:00, Hal Rosenstock wrote: > > Hi Viswa, > > > > On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote: > > > Currently opensm traps SIGINT. There was some discussion about removing it. > > > I am currently running some tests on opensm > > > by killing (SIGKILL) and restarting opensm. So far I have not found > > > any resource leak issues. Is there a plan to remove that > > > signal handler? Ideally it should not exist. > > > > Eitan stated that this was historical in nature for gen1 drivers which > > had resource tracking problems: "if OpenSM left without cleaning up all > > used resources (like MAD buffers and UD-AVs), the driver oops'ed."
> > > > I think that (eliminating the handler for SIGINT) can at least be done > > for OSM_VENDOR_INTF_OPENIB and leave it there for the other vendor > > layers for starters. I will experiment with gen2 and let you know. > > Does the patch below do what you want ? Can you try it ? > > -- Hal > > Index: opensm/osm_opensm.c > =================================================================== > --- opensm/osm_opensm.c (revision 3513) > +++ opensm/osm_opensm.c (working copy) > @@ -182,7 +182,9 @@ osm_reg_sig_handler( > IN osm_opensm_t * const p_osm ) > { > __p_osm_to_signal = p_osm; > +#ifndef OSM_VENDOR_INTF_OPENIB > cl_reg_sig_hdl( SIGINT, __sig_handler ); > +#endif > cl_reg_sig_hdl( SIGTERM, __sig_handler ); > cl_reg_sig_hdl( SIGHUP, __sig_handler ); > osm_exit_flag = 0; From halr at voltaire.com Thu Sep 22 11:41:04 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Sep 2005 14:41:04 -0400 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <4df28be405092211377f0d4dac@mail.gmail.com> References: <4df28be4050921172370bad964@mail.gmail.com> <1127350851.15613.390.camel@hal.voltaire.com> <1127402265.15613.3424.camel@hal.voltaire.com> <4df28be405092211377f0d4dac@mail.gmail.com> Message-ID: <1127414463.15613.4541.camel@hal.voltaire.com> Hi Viswa, On Thu, 2005-09-22 at 14:37, Viswanath Krishnamurthy wrote: > Hi Hal, > > Sure will test it out. I see no issue in this fix. I have run the > following test overnight > in a script with yesterday's code > > 1. Start opensm > 2. Ping another node over IB > 3. Run osmtest (osmtest -f c, osmtest -f a) > 4. Kill opensm with -9 signal and repeat over > > The failures are captured in a log. > > This has run more than 2500 times without resource leak issues. I saw > about 150 osmtest > failures which I will followup with another mail. Some failures are intentional (bad flow tests). Not all of them are obviously marked.
Some of this has been documented on the list but not yet fixed; still, I am interested in seeing what you are referring to. > Once opensm failed to start correctly with SUBNET UP message in the > log. So the subnet didn't come up and the ports didn't become active? Just out of curiosity, could you unload and reload ib_umad and then start opensm when that occurs to see if that fixes things? I'm not sure it would. Thanks. -- Hal From rolandd at cisco.com Thu Sep 22 11:48:15 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 22 Sep 2005 11:48:15 -0700 Subject: [openib-general] FW: SDP problems with 64K page size Message-ID: <52ll1pdjm8.fsf@cisco.com> Hi, Jerome asked me to forward this on, since for some reason his email didn't appear when he sent it. In any case, there seem to be some PAGE_SIZE dependencies in SDP. Libor provided a patch that fixed this up a while ago, but I don't know if this is the right way to handle this. - R. From: Jerome Pioux To: openib-general at openib.org Sent: Wednesday, September 21, 2005 5:36 PM Subject: SDP problem running SDP on 64k kernel page size Hi, I tried to run SDP on a 2.6.12 kernel configured with a 64K page size. I have the Mellanox 3.3.3 rc firmware release (rc7) that allows for this. It does not work; the error returned to the application is 105 = No buffer space available (ENOBUFS). We saw similar results with the Topspin stack in the past, and the problem was fixed by the two patches below. I am not sure how, or even whether, these patches could be reconstructed in the OpenIB source, but I hope that showing them here proves useful to someone. I stand by to provide more info if needed.
- Jerome PS: Patches to allow SDP to work on 64k page size kernel (3.3.3 firmware required) diff -Naur topspin-src-3.0.0-178/host/ulp/sdp/inet/include/sdp_buff_p.h topspin-src-3.0.0-178_bas4v2/host/ulp/sdp/inet/include/sdp_buff_p.h --- topspin-src-3.0.0-178/host/ulp/sdp/inet/include/sdp_buff_p.h 2005-03-17 21:13:14.000000000 +0100 +++ topspin-src-3.0.0-178_bas4v2/host/ulp/sdp/inet/include/sdp_buff_p.h 2005-07-21 11:55:46.895428699 +0200 @@ -45,6 +45,7 @@ #define TS_SDP_BUFFER_COUNT_MAX 1048576 #define TS_SDP_BUFFER_COUNT_INC 128 #define TS_SDP_BUFFER_FREE_MARK 1024 +#define TS_SDP_BUFFER_SIZE 16384 diff -Naur topspin-src-3.0.0-178/host/ulp/sdp/inet/sdp_buff.c topspin-src-3.0.0-178_bas4v2/host/ulp/sdp/inet/sdp_buff.c --- topspin-src-3.0.0-178/host/ulp/sdp/inet/sdp_buff.c 2005-03-17 21:13:14.000000000 +0100 +++ topspin-src-3.0.0-178_bas4v2/host/ulp/sdp/inet/sdp_buff.c 2005-07-21 11:55:46.896405262 +0200 @@ -585,7 +585,7 @@ * buffer descriptor. */ m_pool->buff_cur--; - free_page((unsigned long)buff->head); + kfree(buff->head); kmem_cache_free(m_pool->buff_cache, buff); } @@ -652,7 +652,7 @@ break; } /* if */ - buff->head = (tPTR)__get_free_page(GFP_ATOMIC); + buff->head = kmalloc(TS_SDP_BUFFER_SIZE, GFP_ATOMIC); if (NULL == buff->head) { TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, @@ -663,7 +663,7 @@ break; } /* if */ - buff->end = buff->head + PAGE_SIZE; + buff->end = buff->head + TS_SDP_BUFFER_SIZE; buff->data = buff->head; buff->tail = buff->head; buff->lkey = 0; @@ -679,7 +679,7 @@ TS_TRACE(MOD_LNX_SDP, T_TERSE, TRACE_FLOW_FATAL, "BUFFER: Failed to insert buffer into pool. 
<%d>", result); - free_page((unsigned long)buff->head); + kfree(buff->head); kmem_cache_free(m_pool->buff_cache, buff); break; } /* if */ @@ -742,7 +742,7 @@ memset(main_pool, 0, sizeof(tSDP_MAIN_POOL_STRUCT)); - main_pool->buff_size = PAGE_SIZE; + main_pool->buff_size = TS_SDP_BUFFER_SIZE; main_pool->buff_min = buff_min; main_pool->buff_max = buff_max; main_pool->alloc_inc = alloc_inc; From rolandd at cisco.com Thu Sep 22 11:58:47 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 22 Sep 2005 11:58:47 -0700 Subject: [openib-general] FW: SDP problems with 64K page size In-Reply-To: <52ll1pdjm8.fsf@cisco.com> (Roland Dreier's message of "Thu, 22 Sep 2005 11:48:15 -0700") References: <52ll1pdjm8.fsf@cisco.com> Message-ID: <52fyrxdj4o.fsf@cisco.com> Jerome, You could try this braindead port of Libor's patch and see if it works for you. I've not done anything beyond compile testing. - R. Index: linux-kernel/infiniband/ulp/sdp/sdp_buff.c =================================================================== --- linux-kernel/infiniband/ulp/sdp/sdp_buff.c (revision 3522) +++ linux-kernel/infiniband/ulp/sdp/sdp_buff.c (working copy) @@ -330,7 +330,7 @@ static struct sdpc_buff *sdp_buff_pool_a return NULL; } - buff->end = buff->head + PAGE_SIZE; + buff->end = buff->head + sdp_buff_pool_buff_size(); buff->data = buff->head; buff->tail = buff->head; buff->sge.lkey = 0; @@ -350,7 +350,7 @@ int sdp_buff_pool_init(void) int result; main_pool.pool_cache = kmem_cache_create("sdp_buff_pool", - PAGE_SIZE, + sdp_buff_pool_buff_size(), 0, 0, NULL, NULL); if (!main_pool.pool_cache) { Index: linux-kernel/infiniband/ulp/sdp/sdp_buff.h =================================================================== --- linux-kernel/infiniband/ulp/sdp/sdp_buff.h (revision 3522) +++ linux-kernel/infiniband/ulp/sdp/sdp_buff.h (working copy) @@ -101,6 +101,6 @@ struct sdpc_buff_root { */ #define sdp_buff_q_size(pool) ((pool)->size) -#define sdp_buff_pool_buff_size() PAGE_SIZE +#define 
sdp_buff_pool_buff_size() min(PAGE_SIZE, 16384) #endif /* _SDP_BUFF_H */ From viswa.krish at gmail.com Thu Sep 22 12:06:37 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Thu, 22 Sep 2005 12:06:37 -0700 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <1127414463.15613.4541.camel@hal.voltaire.com> References: <4df28be4050921172370bad964@mail.gmail.com> <1127350851.15613.390.camel@hal.voltaire.com> <1127402265.15613.3424.camel@hal.voltaire.com> <4df28be405092211377f0d4dac@mail.gmail.com> <1127414463.15613.4541.camel@hal.voltaire.com> Message-ID: <4df28be405092212065a7a1329@mail.gmail.com> Hal, On 22 Sep 2005 14:41:04 -0400, Hal Rosenstock wrote: > > Hi Viswa, > > On Thu, 2005-09-22 at 14:37, Viswanath Krishnamurthy wrote: > > Hi Hal, > > > > Sure will test it out. I see no issue in this fix. I have run the > > following test overnight > > in a script with yesterday's code > > > > 1. Start opensm > > 2. Ping another node over IB > > 3. Run osmtest (osmtest -f c, osmtest -f a) > > 4. Kill opensm with -9 signal and repeat over > > > > The failures are captured in a log. > > > > This has run more than 2500 times without resource leak issues. I saw > > about 150 osmtest > > failures which I will follow up with another mail. > > Some failures are intentional (bad flow tests). They are all not marked > obviously. Some of this has been documented on the list but not fixed > yet but I am interested in seeing what you are referring to. I will attach the log later. > Once opensm failed to start correctly with SUBNET UP message in the > > log. > > So the subnet didn't come up and the ports didn't become active ? Just > out of curiosity, could you unload and reload ib_umad and then start > opensm when that occurs to see if that fixes things ? I'm not sure it > would. I do not think this would help. The system is never rebooted. Just opensm is started and stopped. On the next opensm start/stop the subnet came up.
I think it is more of an opensm issue than any kernel module issue. Thanks. > > -- Hal > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tternes at gmail.com Thu Sep 22 12:08:50 2005 From: tternes at gmail.com (Thaddeus Ternes) Date: Thu, 22 Sep 2005 14:08:50 -0500 Subject: [openib-general] EEH: MMIO Failure on Power5 In-Reply-To: <52aci7prnd.fsf@cisco.com> References: <52aci7prnd.fsf@cisco.com> Message-ID: These are OpenPower 720 machines. I've been away from the office for a few days, so I'll do some more poking around to see if I can come up with anything else. Maybe I've missed something in the logs or dmesg... Thaddeus On 9/20/05, Roland Dreier wrote: > Thaddeus> I'm attempting to bring up a Mellanox card in a Power5 > Thaddeus> machine and have hit a snag. I'm wondering if anybody > Thaddeus> else has seen issues similar to this on this particular > Thaddeus> hardware, as these cards seem to work in the Power4 > Thaddeus> machines. The card is detected, but then I hit an MMIO > Thaddeus> failure and ib_mthca fails. The call trace (from dmesg) > Thaddeus> is listed below. I do see that the firmware is older, > Thaddeus> but am not sure if that would necessarily bring about > Thaddeus> this problem. Any input is appreciated. > > For what it's worth, I have Mellanox PCI-X HCAs working fine in > POWER5-based OpenPower 710 systems. What kind of system are you using? > > It seems that you may really be hitting an error on the PCI bus that > the pSeries hardware is detecting. > > - R. > From rolandd at cisco.com Thu Sep 22 12:14:33 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 22 Sep 2005 12:14:33 -0700 Subject: [openib-general] EEH: MMIO Failure on Power5 In-Reply-To: (Thaddeus Ternes's message of "Thu, 22 Sep 2005 14:08:50 -0500") References: <52aci7prnd.fsf@cisco.com> Message-ID: <523bnwewyu.fsf@cisco.com> Thaddeus> These are OpenPower 720 machines. 
I've been away from Thaddeus> the office for a few days, so I'll do some more poking Thaddeus> around to see if I can come up with anything else. Thaddeus> Maybe I've missed something in the logs or dmesg... Have you tried the workaround of adding 'ib_mthca' to /etc/hotplug/blacklist and then loading the module after the system is fully booted? - R. From halr at voltaire.com Thu Sep 22 12:08:02 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Sep 2005 15:08:02 -0400 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <4df28be405092212065a7a1329@mail.gmail.com> References: <4df28be4050921172370bad964@mail.gmail.com> <1127350851.15613.390.camel@hal.voltaire.com> <1127402265.15613.3424.camel@hal.voltaire.com> <4df28be405092211377f0d4dac@mail.gmail.com> <1127414463.15613.4541.camel@hal.voltaire.com> <4df28be405092212065a7a1329@mail.gmail.com> Message-ID: <1127416082.15613.4718.camel@hal.voltaire.com> On Thu, 2005-09-22 at 15:06, Viswanath Krishnamurthy wrote: > I do not think this would help. The system is never rebooted. Just > opensm is started and stopped. On the mext opensm start/stop the > subnet came up. I think it is more of an opensm issue than any kernel > module issue. Can you run opensm in -V mode and send the log. It might be related to the SM Set PortInfo armed->active issue which has been documented but not resolved. 
-- Hal From viswa.krish at gmail.com Thu Sep 22 12:55:59 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Thu, 22 Sep 2005 12:55:59 -0700 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <1127416082.15613.4718.camel@hal.voltaire.com> References: <4df28be4050921172370bad964@mail.gmail.com> <1127350851.15613.390.camel@hal.voltaire.com> <1127402265.15613.3424.camel@hal.voltaire.com> <4df28be405092211377f0d4dac@mail.gmail.com> <1127414463.15613.4541.camel@hal.voltaire.com> <4df28be405092212065a7a1329@mail.gmail.com> <1127416082.15613.4718.camel@hal.voltaire.com> Message-ID: <4df28be405092212553174677@mail.gmail.com> Hal, Here is the log of the osmtest failure. This was seen 150 times out of 2500 iterations. The opensm SUBNET UP failure is tough to reproduce. Saw it once in 2500 iterations. Unfortunately I did not collect the log on that error. The patch worked as expected and I did not see any issues with ctrl-C. When I tried to apply the patch, I got a failure. (I used the patch command). I manually added those 2 lines. Command Line Arguments Done with args Flow = All Validations Sep 21 17:50:56 684254 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port. using default guid 0x2c90200400cfd Sep 21 17:50:56 686301 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port. Sep 21 17:50:56 686347 [B7F026C0] -> osm_vendor_bind: Binding to port 0x2c90200400cfd. Sep 21 17:50:56 689963 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port. Sep 21 17:50:56 691969 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port.
Sep 21 17:50:56 693187 [B7F026C0] -> osmtest_validate_sa_class_port_info: ----------------------------- SA Class Port Info: base_ver:1 class_ver:2 cap_mask:0x202 resp_time_val:0x64 ----------------------------- Sep 21 17:50:56 775383 [B7F026C0] -> osmtest_wrong_sm_key_ignored: Try PortRecord for port with LID 0x0 Num:0x1. Sep 21 17:51:00 775320 [B76FFBB0] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=12 trans_id=0x34) -- dropping. Sep 21 17:51:00 775389 [B76FFBB0] -> umad_receiver: ERR 5410: class 0x3 LID 0x0 Sep 21 17:51:00 775418 [B76FFBB0] -> osmtest_query_res_cb: ERR 0003: Error on query (IB_TIMEOUT). Sep 21 17:51:00 775465 [B7F026C0] -> osmtest_wrong_sm_key_ignored: ERR 0011: Did not get a timeout but got (IB_SUCCESS). Sep 21 17:51:00 775581 [B7F026C0] -> osmt_register_service: Registering Service: name:osmt.srvc.1804289383.7793 id:0x6b8b26f 6. Sep 21 17:51:00 777143 [B7F026C0] -> osmt_register_service: Registering Service: name:osmt.srvc.846930885.7793 id:0x327b0554 Sep 21 17:51:00 777143 [B7F026C0] -> osmt_register_service: Registering Service: name:osmt.srvc.846930885.7793 id:0x327b0554 . Sep 21 17:51:04 779578 [B76FFBB0] -> umad_receiver: ERR 5409: send completed with error (method=2 attr=31 trans_id=0x36) --dropping. Sep 21 17:51:04 779604 [B76FFBB0] -> umad_receiver: ERR 5410: class 0x3 LID 0x0 Sep 21 17:51:04 779631 [B76FFBB0] -> osmtest_query_res_cb: ERR 0003: Error on query (IB_TIMEOUT). Sep 21 17:51:04 779674 [B7F026C0] -> osmt_register_service: ERR 0364: ib_query failed (IB_TIMEOUT). Sep 21 17:51:04 779740 [B7F026C0] -> osmtest_run: ERR 00148: Service Flow failed (IB_TIMEOUT) OSMTEST: TEST "All Validations" FAIL -Viswa On 22 Sep 2005 15:08:02 -0400, Hal Rosenstock wrote: > > On Thu, 2005-09-22 at 15:06, Viswanath Krishnamurthy wrote: > > I do not think this would help. The system is never rebooted. Just > > opensm is started and stopped. On the mext opensm start/stop the > > subnet came up. 
I think it is more of an opensm issue than any kernel > > module issue. > > Can you run opensm in -V mode and send the log. It might be related to > the SM Set PortInfo armed->active issue which has been documented but > not resolved. > > -- Hal From tternes at gmail.com Thu Sep 22 13:42:00 2005 From: tternes at gmail.com (Thaddeus Ternes) Date: Thu, 22 Sep 2005 15:42:00 -0500 Subject: [openib-general] EEH: MMIO Failure on Power5 In-Reply-To: <523bnwewyu.fsf@cisco.com> References: <52aci7prnd.fsf@cisco.com> <523bnwewyu.fsf@cisco.com> Message-ID: Yeah, same result as before. On 9/22/05, Roland Dreier wrote: > Thaddeus> These are OpenPower 720 machines. I've been away from > Thaddeus> the office for a few days, so I'll do some more poking > Thaddeus> around to see if I can come up with anything else. > Thaddeus> Maybe I've missed something in the logs or dmesg... > > Have you tried the workaround of adding 'ib_mthca' to /etc/hotplug/blacklist > and then loading the module after the system is fully booted? > > - R. > From pradeep at us.ibm.com Thu Sep 22 13:55:43 2005 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Thu, 22 Sep 2005 13:55:43 -0700 Subject: [openib-general] EEH: MMIO Failure on Power5 In-Reply-To: Message-ID: Adding ib_mthca to /etc/hotplug/blacklist worked for us (i.e. it is the workaround we adopted). Just to double check, you did reboot after adding it to the blacklist and then loaded ib_mthca after the reboot, right? BTW, what kind of Power5 machine are you using? Pradeep pradeep at us.ibm.com Thaddeus Ternes, 09/22/2005 01:42 PM To: Roland Dreier cc: Pradeep Satyanarayana/Beaverton/IBM at IBMUS, openib-general at openib.org Subject: Re: [openib-general] EEH: MMIO Failure on Power5 Yeah, same result as before. On 9/22/05, Roland Dreier wrote: > Thaddeus> These are OpenPower 720 machines.
I've been away from > Thaddeus> the office for a few days, so I'll do some more poking > Thaddeus> around to see if I can come up with anything else. > Thaddeus> Maybe I've missed something in the logs or dmesg... > > Have you tried the workaround of adding 'ib_mthca' to /etc/hotplug/blacklist > and then loading the module after the system is fully booted? > > - R. > From tternes at gmail.com Thu Sep 22 14:00:48 2005 From: tternes at gmail.com (Thaddeus Ternes) Date: Thu, 22 Sep 2005 16:00:48 -0500 Subject: [openib-general] EEH: MMIO Failure on Power5 In-Reply-To: References: Message-ID: Yeah, did a reboot. I verified the modules weren't loaded (lsmod), and then modprobed ib_mthca.
The same errors that I was seeing during startup were dropped to screen: p5l1:~# lsmod Module Size Used by p5l1:~# modprobe ib_mthca [599947.213712] ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005) [599947.213732] ib_mthca: Initializing Mellanox Technologies MT23108 InfiniHost (0001:c1:00.0) [599948.488315] EEH: MMIO failure (2) on device: pci15b3,5a44 /pci@ 800000020000003/pci at 2/pci at 1/pci15b3,5a44 at 0 [599948.488343] Call Trace: [599948.488351] [c00000000f02b050] [c00000000002fc80] .eeh_dn_check_failure+0x2bc/0x314 (unreliable) [599948.488380] [c00000000f02b130] [c00000000002fdd4] .eeh_check_failure+0xfc/0x190 [599948.488425] [c00000000f02b1c0] [d0000000005f37cc] .mthca_cmd_poll+0x120/0x258 [ib_mthca] [599948.488469] [c00000000f02b290] [d0000000005f3cc8] .mthca_cmd_box+0x90/0xa8 [ib_mthca] [599948.488516] [c00000000f02b330] [d0000000005f5444] .mthca_INIT_HCA+0x240/0x288 [ib_mthca] [599948.488561] [c00000000f02b3e0] [d0000000005f2790] .mthca_init_one+0xd2c/0x180c [ib_mthca] [599948.488600] [c00000000f02b870] [c0000000001d4a2c] .pci_device_probe+0xac/0xdc [599948.488622] [c00000000f02b900] [c000000000239ec0] .driver_probe_device+0x80/0x15c [599948.488647] [c00000000f02b990] [c00000000023a130] .__driver_attach+0xa8/0xc4 [599948.488669] [c00000000f02ba20] [c0000000002390d4] .bus_for_each_dev+0x78/0xcc [599948.488699] [c00000000f02bad0] [c00000000023a174] .driver_attach+0x28/0x40 [599948.488718] [c00000000f02bb50] [c000000000239848] .bus_add_driver+0xc8/0x1dc [599948.488751] [c00000000f02bc00] [c00000000023a7b0] .driver_register+0x44/0x5c [599948.488771] [c00000000f02bc90] [c0000000001d46e4] .pci_register_driver+0x84/0xd8 [599948.488808] [c00000000f02bd10] [d000000000607594] .mthca_init+0x1c/0x48 [ib_mthca] [599948.488857] [c00000000f02bd90] [c00000000006cc88] .sys_init_module+0x2f0/0x4cc [599948.488885] [c00000000f02be30] [c00000000000d300] syscall_exit+0x0/0x18 [599948.488914] EEH: MMIO failure (2), notifiying device 0001:c1:00.0Mellanox 
Technologies MT23108 InfiniHost [599948.488986] ib_mthca 0001:c1:00.0: HCA FW version 3.2.0 is old (3.3.3 is current). [599948.489002] ib_mthca 0001:c1:00.0: If you have problems, try updating your HCA FW. [599948.490093] ib_mthca 0001:c1:00.0: SW2HW_MPT returned status 0x01 [599948.490107] ib_mthca 0001:c1:00.0: Failed to create driver PD, aborting. [599948.492268] ib_mthca: probe of 0001:c1:00.0 failed with error -22 This is on an OpenPower 720... Thaddeus On 9/22/05, Pradeep Satyanarayana wrote: > > Adding ib_mthca to /etc/hotplug/blacklist worked for us (i.e. it is the > workaround we adopted). Just to double check, you did reboot after adding it to > the blacklist and then loaded ib_mthca after the reboot, right? > > BTW, what kind of Power5 machine are you using? > > Pradeep > pradeep at us.ibm.com > > Thaddeus Ternes, 09/22/2005 01:42 PM > To: Roland Dreier > cc: Pradeep Satyanarayana/Beaverton/IBM at IBMUS, openib-general at openib.org > Subject: Re: [openib-general] EEH: MMIO Failure on Power5 > > Yeah, same result as before. > > On 9/22/05, Roland Dreier wrote: > > Thaddeus> These are OpenPower 720 machines. I've been away from > > Thaddeus> the office for a few days, so I'll do some more poking > > Thaddeus> around to see if I can come up with anything else. > > Thaddeus> Maybe I've missed something in the logs or dmesg... > > > > Have you tried the workaround of adding 'ib_mthca' to > /etc/hotplug/blacklist > > and then loading the module after the system is fully booted? > > > > - R. From mshefty at ichips.intel.com Thu Sep 22 14:06:13 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 22 Sep 2005 14:06:13 -0700 Subject: [openib-general] [PATCH] add cq error events In-Reply-To: <20050922103159.GE31820@mellanox.co.il> References: <20050922103159.GE31820@mellanox.co.il> Message-ID: <43331CC5.406@ichips.intel.com> Michael S. Tsirkin wrote: > As a side note, the spec says: "Two types of CQ errors can occur: the CQ can > overrun or it can become inaccessible": I wonder whether this should > be interpreted in the sense that there should be > two types of events: IB_EVENT_CQ_OVERRUN and IB_EVENT_CQ_ACCESS, rather than > just a generic IB_EVENT_CQ_ERR > > What do you think? I'm fine with specifying the CQ errors, though I'm not sure that the user can do anything differently. Maybe use CQ_ERROR or CQ_ACCESS_ERROR, rather than CQ_ACCESS. I'll let Roland comment on the changes to mthca. - Sean From jlentini at netapp.com Thu Sep 22 14:10:59 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 22 Sep 2005 17:10:59 -0400 (EDT) Subject: [openib-general] [PATCH] uDAPL dtest changes to measure RDMA reads In-Reply-To: References: Message-ID: On Wed, 21 Sep 2005, Arlin Davis wrote: arlin> Here is a patch to improve dtest and measure RDMA reads. arlin> Attachment included. Thanks Arlin. Committed in revision 3525. From pradeep at us.ibm.com Thu Sep 22 14:58:30 2005 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Thu, 22 Sep 2005 14:58:30 -0700 Subject: [openib-general] EEH: MMIO Failure on Power5 In-Reply-To: Message-ID: I have filed a bug against the kernel (for p-series) as a starting point. Could you please fill me in on some of the other specifics: a) which kernel were you using, b) firmware level (presumably it is up to date).
One other issue that I failed to mention previously: is the HCA in one of the superslots (I know on my p570 slots 2 and 6 are superslots by default), and is this superslot enabled? Here is a quote describing how to enable superslots: One issue with the Mellanox cards in pSeries systems is to ensure that the card is installed in a superslot, and that the "I/O Adapter Enlarged Capacity" setting has been enabled for the system. For a p570, slots C6 and C2 are the available super slots. To enable the "Enlarged Capacity" feature, go to ASM and select the following screens: System Configuration -> I/O Adapter Enlarged Capacity. Set the setting to Enabled and save it. If this does not help, I have already filed the bug. Please let me know either way. Pradeep pradeep at us.ibm.com From caitlinb at broadcom.com Thu Sep 22 15:03:24 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Thu, 22 Sep 2005 15:03:24 -0700 Subject: [openib-general] [PATCH] add cq error events Message-ID: <54AD0F12E08D1541B826BE97C98F99F102087F@NT-SJCA-0751.brcm.ad.broadcom.com> If the semantics were defined such that an overrun meant that an event had been lost, but that the CQ was still intact, then the user can definitely adjust and continue. The QP for which the event was lost will have its connection broken, which could allow many applications to determine what the lost event was. If they were then able to recover the lost resources, the application could simply destroy the QP, recreate it, and continue (or perhaps only reset it). On the other hand, if the CQ is corrupt there is very little that the application can do to recover. They'll have to tear down every connection that uses the CQ and recreate it. The distinction between "something went wrong and some of *your* resources may be in a funny state" and "something went wrong and some of *my* resources may be in a funny state" can be very important.
The user may be able to repair their resources, but not those of the RDMA device. > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Sean Hefty > Sent: Thursday, September 22, 2005 2:06 PM > To: Michael S. Tsirkin > Cc: Roland Dreier; openib-general at openib.org > Subject: Re: [openib-general] [PATCH] add cq error events > > Michael S. Tsirkin wrote: > > As a side note, the spec says: "Two types of CQ errors can > occur: the > > CQ can overrun or it can become inaccessible": I wander > whether this > > should be interpreted in a sense that that there should be > two types > > of events: IB_EVENT_CQ_OVERRUN and IB_EVENT_CQ_ACCESS, rather than > > just a generic IB_EVENT_CQ_ERR > > > > What do you think? > > I'm fine with specifying the CQ errors, though I'm not sure > that the user can do anything differently. Maybe use > CQ_ERROR or CQ_ACCESS_ERROR, rather than CQ_ACCESS. > > I'll let Roland comment on the changes to mthca. > > - Sean > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > From mshefty at ichips.intel.com Thu Sep 22 15:12:03 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 22 Sep 2005 15:12:03 -0700 Subject: [openib-general] [PATCH] add cq error events In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F102087F@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F102087F@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <43332C33.8050108@ichips.intel.com> Caitlin Bestler wrote: > If the semantics were defined such that an overrun meant > that an event had been lost, but that the CQ was still > intact, then the user can definitely adjust and continue. My understanding is that a CQ overrun error is fatal. 
No additional entries may be added to that CQ. All QPs associated with the CQ will generate an error the next time that they try to access it. And outstanding completions on the CQ may not be retrievable. See IB spec 11.6.3.2, C11-38. C11-37 and C11-38 talk about generating a CQ Error on non-overrun errors and overrun errors. So I'm not sure if it's a requirement for hardware to distinguish between the causes of the error. - Sean From caitlinb at broadcom.com Thu Sep 22 15:32:44 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Thu, 22 Sep 2005 15:32:44 -0700 Subject: [openib-general] [PATCH] add cq error events Message-ID: <54AD0F12E08D1541B826BE97C98F99F1020882@NT-SJCA-0751.brcm.ad.broadcom.com> > -----Original Message----- > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Thursday, September 22, 2005 3:12 PM > To: Caitlin Bestler > Cc: Michael S. Tsirkin; Roland Dreier; openib-general at openib.org > Subject: Re: [openib-general] [PATCH] add cq error events > > Caitlin Bestler wrote: > > If the semantics were defined such that an overrun meant > that an event > > had been lost, but that the CQ was still intact, then the user can > > definitely adjust and continue. > > My understanding is that a CQ overrun error is fatal. No > additional entries may be added to that CQ. All QPs > associated with the CQ will generate an error the next time > that they try to access it. And outstanding completions on > the CQ may not be retrievable. See IB spec 11.6.3.2, C11-38. > Admittedly I was paying more attention to the iWARP specs on this, but my reading of that section in the IB verbs was as follows: > C11-38: The CI shall generate a CQ Error when a CQ overrun is detected. The CI shall generate a CQ Error when it detects that it cannot place a work completion into a CQ. > This condition will result in an Affiliated Asynchronous Error for any associated > Work Queues when they attempt to use that CQ.
While this condition (there not being room in the CQ) persists, an Affiliated Asynchronous Error must be generated for any QP that is prevented from placing a work completion in the CQ. (Failure to place the completion inherently means that the ordering guarantees for the connection cannot be complied with, so the connection cannot recover.) >Completions can no longer be added to the CQ. You cannot recover, so the connection is broken. Since the CQ was already full, don't waste your time trying to flush the work requests that would have been flushed. > It is not guaranteed that completions present in the CQ at > the time the error occurred can be retrieved. Possible causes > include a CQ overrun or a CQ protection error. The implementation is free to detect overflow *after* it has overwritten an older work completion. It is not constrained to guarantee that the CQ is intact other than for the lost work completion. But it is not required to *prevent* those other completions from being retrieved, so a more robust CQ is certainly legal. The RDMAC verbs are not much help here: > The RI is NOT REQUIRED to perform CQ overflow detection or > protection. Therefore, the CQ overflow error codes in this > document are OPTIONAL. When an overflow occurs, the results > are indeterminate. Overflow of a CQ MUST NOT affect QPs which > do not report Work Completions to that CQ and MUST NOT affect > other CQs. Consequently, when creating the CQ, the Consumer > should request enough outstanding Work Requests so that if > every possible outstanding WR were to complete (such as may > happen in an error case), there would be room for the CQE on > the CQ. The RI MUST NOT enforce that every WQE on every Work > Queue associated with the CQ must have a CQE available for the > WQE's Work Completion information. Translation: Only you can prevent CQ overflows. The implementation must guarantee that a CQ overflow does not trash another CQ. Otherwise the Consumer is on their own.
If the CQ is being polite it might tell you that there was an overflow. If it does, there is no guarantee that you can do anything with that CQ or any QP that fed it, nor is it guaranteed that there was any damage. From halr at voltaire.com Thu Sep 22 15:44:44 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Sep 2005 18:44:44 -0400 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <4df28be405092212553174677@mail.gmail.com> References: <4df28be4050921172370bad964@mail.gmail.com> <1127350851.15613.390.camel@hal.voltaire.com> <1127402265.15613.3424.camel@hal.voltaire.com> <4df28be405092211377f0d4dac@mail.gmail.com> <1127414463.15613.4541.camel@hal.voltaire.com> <4df28be405092212065a7a1329@mail.gmail.com> <1127416082.15613.4718.camel@hal.voltaire.com> <4df28be405092212553174677@mail.gmail.com> Message-ID: <1127429082.15613.6073.camel@hal.voltaire.com> Hi Viswa, On Thu, 2005-09-22 at 15:55, Viswanath Krishnamurthy wrote: > Here is the log of osmtest failure. This was seen 150 times out of > 2500 iterations. The opensm SUBNET UP failure is tough to reproduce. > Saw it once in 2500 iterations. Unfortunately I did not collect the > log on that error. I understand but it is hard to know whether this is a known issue or something else without a log of the failure. > The patch worked as expected and did not see any issues with ctrl-C. > When I tried apply the patch, I got a failure. (I used the patch > command). I manually added those 2 lines. Not sure why the patch wouldn't apply. > Command Line Arguments > Done with args > Flow = All Validations > Sep 21 17:50:56 684254 [B7F026C0] -> osm_vendor_get_all_port_attr: > assign CA mthca0 port 1 guid (0x2c90200400cfd) as the def > ault port. > using default guid 0x2c90200400cfd > Sep 21 17:50:56 686301 [B7F026C0] -> osm_vendor_get_all_port_attr: > assign CA mthca0 port 1 guid (0x2c90200400cfd) as the def > ault port. > Sep 21 17:50:56 686347 [B7F026C0] -> osm_vendor_bind: Binding to port > 0x2c90200400cfd. 
> Sep 21 17:50:56 689963 [B7F026C0] -> osm_vendor_get_all_port_attr: > assign CA mthca0 port 1 guid (0x2c90200400cfd) as the def > ault port. > Sep 21 17:50:56 691969 [B7F026C0] -> osm_vendor_get_all_port_attr: > assign CA mthca0 port 1 guid (0x2c90200400cfd) as the def > ault port. > Sep 21 17:50:56 693187 [B7F026C0] -> > osmtest_validate_sa_class_port_info: > ----------------------------- > SA Class Port Info: > base_ver:1 > class_ver:2 > cap_mask:0x202 > resp_time_val:0x64 > ----------------------------- > Sep 21 17:50:56 775383 [B7F026C0] -> osmtest_wrong_sm_key_ignored: Try > PortRecord for port with LID 0x0 Num:0x1. > Sep 21 17:51:00 775320 [B76FFBB0] -> umad_receiver: ERR 5409: send > completed with error (method=1 attr=12 trans_id=0x34) -- > dropping. > Sep 21 17:51:00 775389 [B76FFBB0] -> umad_receiver: ERR 5410: class > 0x3 LID 0x0 > Sep 21 17:51:00 775418 [B76FFBB0] -> osmtest_query_res_cb: ERR 0003: > Error on query (IB_TIMEOUT). > Sep 21 17:51:00 775465 [B7F026C0] -> osmtest_wrong_sm_key_ignored: ERR > 0011: Did not get a timeout but got (IB_SUCCESS). > Sep 21 17:51:00 775581 [B7F026C0] -> osmt_register_service: > Registering Service: name:osmt.srvc.1804289383.7793 id:0x6b8b26f > 6. > Sep 21 17:51:00 777143 [B7F026C0] -> osmt_register_service: > Registering Service: name:osmt.srvc.846930885.7793 id:0x327b0554 > Sep 21 17:51:00 777143 [B7F026C0] -> osmt_register_service: > Registering Service: name:osmt.srvc.846930885.7793 id:0x327b0554 > . > Sep 21 17:51:04 779578 [B76FFBB0] -> umad_receiver: ERR 5409: send > completed with error (method=2 attr=31 trans_id=0x36) --dropping. > Sep 21 17:51:04 779604 [B76FFBB0] -> umad_receiver: ERR 5410: class > 0x3 LID 0x0 > Sep 21 17:51:04 779631 [B76FFBB0] -> osmtest_query_res_cb: ERR 0003: > Error on query (IB_TIMEOUT). > Sep 21 17:51:04 779674 [B7F026C0] -> osmt_register_service: ERR 0364: > ib_query failed (IB_TIMEOUT). 
> Sep 21 17:51:04 779740 [B7F026C0] -> osmtest_run: ERR 00148: Service > Flow failed (IB_TIMEOUT) > OSMTEST: TEST "All Validations" FAIL The final FAIL/PASS is definitive so there are real failures here. Is this consistent or intermittent ? Does this work sometimes or always fail ? -- Hal From robert.j.woodruff at intel.com Thu Sep 22 16:23:55 2005 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Thu, 22 Sep 2005 16:23:55 -0700 Subject: [openib-general] 3513 DAPL is Broken Message-ID: <1AC79F16F5C5284499BB9591B33D6F00059B0547@orsmsx408> >I'll try to investigate a bit more this afternoon or >tomorrow. Arlin is OOP on vacation till monday. >I need to double check to make sure I got everything >installed and re-built correctly. I can also try >dapltest or DAPL netpipe. >woody I tried Netpipe and dapltest and they both appear to run Ok, although dapltest sometimes segvs at the end of the test, which I assume is just a test problem. I will let Arlin investigate why things hang with Intel MPI when he returns. woody From caitlin.bestler at gmail.com Thu Sep 22 16:56:12 2005 From: caitlin.bestler at gmail.com (Caitlin Bestler) Date: Thu, 22 Sep 2005 16:56:12 -0700 Subject: [openib-general] [PATCH] add cq error events In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F1020882@NT-SJCA-0751.brcm.ad.broadcom.com> References: <54AD0F12E08D1541B826BE97C98F99F1020882@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: <469958e005092216565d336a5f@mail.gmail.com> Going below the specmanship, the source of ambiguity comes down to whether the RDMA device checks the consume pointer before writing a CQE. Not checking it means that overflow is either undetectable, or only detected after arbitrary unknown CQEs have been erased. In the case where an unknown CQE was erased every QP that feeds the CQ is at risk. But if the RDMA device checks the consume pointer before writing then the only CQE that can be lost is the one that is being generated. That QP is known. 
It is known that no other QPs have been damaged. The two designs reflect different approaches to fault tolerance. One states a constraint on the application which, if followed, can prevent CQ overflows. Since any CQ overflow represents a failure of the Consumer to comply with the contract, the RDMA device is under no obligation to waste a single flip-flop or line of code to try to minimize the damage, except for damage to third parties (hence the RDMAC constraint that QPs using different CQs are not damaged). The second views a CQ overflow on the same terms as a divide by zero or many other errors that should not happen: you confine the damage and leave as much of the system running as possible. Given that both design approaches are valid, it is not surprising that both the IB and iWARP verb specifications can be construed to be compatible with either design.
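As a rough illustration of the second design Caitlin describes (the device checks the consume pointer before writing a CQE), here is a toy C model of a completion ring. All names are hypothetical and this is not any real verbs or device interface; it only shows why, under this design, an overflow loses exactly one CQE whose QP is known, while earlier completions stay intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; not a real verbs or device interface. */
#define CQ_DEPTH 4u /* power of two, so the modulo stays correct when the counters wrap */

struct cqe {
    uint32_t qp_num; /* QP that generated this completion */
    uint32_t status;
};

struct cq {
    struct cqe entries[CQ_DEPTH];
    uint32_t produce;     /* advanced by the device after writing a CQE */
    uint32_t consume;     /* advanced by the consumer after polling a CQE */
    bool overflowed;
    uint32_t overflow_qp; /* the single QP whose completion was dropped */
};

/* Device side: look at the consume pointer before writing. If the ring
 * is full, only the CQE being generated is lost and its QP is recorded;
 * no previously queued completion is overwritten. */
static bool cq_post(struct cq *cq, struct cqe e)
{
    assert(cq->produce - cq->consume <= CQ_DEPTH);
    if (cq->produce - cq->consume == CQ_DEPTH) {
        cq->overflowed = true;
        cq->overflow_qp = e.qp_num;
        return false;
    }
    cq->entries[cq->produce % CQ_DEPTH] = e;
    cq->produce++;
    return true;
}

/* Consumer side: take the oldest completion, if any. */
static bool cq_poll(struct cq *cq, struct cqe *out)
{
    if (cq->consume == cq->produce)
        return false;
    *out = cq->entries[cq->consume % CQ_DEPTH];
    cq->consume++;
    return true;
}
```

A device without the check would write at `produce % CQ_DEPTH` unconditionally and silently overwrite the oldest unpolled CQE, which could belong to any QP feeding the CQ. That is the "every QP is at risk" case.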
From sean.hefty at intel.com Thu Sep 22 17:42:15 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 22 Sep 2005 17:42:15 -0700 Subject: [openib-general] [PATCH] [CMA] add/retrieve IP information from CM REQ Message-ID: This patch records and retrieves IP addresses from the private data of the CM REQ message. The format of the data is based on: http://openib.org/pipermail/openib-general/2005-August/010318.html It has not yet been committed. Comments? Signed-off-by: Sean Hefty Index: ulp/cma/cma.c =================================================================== --- ulp/cma/cma.c (revision 3524) +++ ulp/cma/cma.c (working copy) @@ -51,8 +51,15 @@ struct cma_id_private { }; struct cma_addr { - /* 128 bit IPv6 src IP */ - /* 128 bit IPv6 dest IP */ + struct { + union { + struct in6_addr ip6; + struct { + __be32 pad[3]; + __be32 addr; + } ip4; + } ver; + } src_addr, dst_addr; u8 version; /* version: 7:4, reserved: 3:0 */ u8 reserved; __be16 port; @@ -121,12 +128,31 @@ static int cma_modify_qp_err(struct rdma return ib_modify_qp(cma_id->qp, &qp_attr, IB_QP_STATE); } +static int cma_verify_addr(struct cma_addr *addr, + struct sockaddr_in *dst_ip) +{ + if (cma_get_version(addr) != 4) + return -EINVAL; + + if (dst_ip->sin_port != be16_to_cpu(addr->port) || + dst_ip->sin_addr.s_addr != be32_to_cpu(addr->dst_addr.ver.ip4.addr)) + return -EINVAL; + + return 0; +} + static struct cma_id_private* cma_req_recv(struct cma_id_private *listen_id, struct ib_cm_event *ib_event) { struct cma_id_private *cma_id_priv; struct rdma_route *route; struct cma_addr *addr; + struct sockaddr_in *dst_ip; + + addr = ib_event->private_data; + dst_ip = (struct sockaddr_in *) &listen_id->cma_id.route.src_addr; + if (cma_verify_addr(addr, dst_ip)) + return NULL; cma_id_priv = cma_alloc_id(listen_id->cma_id.device, listen_id->cma_id.context, @@ -135,14 +161,17 @@ static struct cma_id_private* cma_req_re return NULL; route = &cma_id_priv->cma_id.route; + route->src_addr = 
listen_id->cma_id.route.src_addr; + route->dst_addr.sa_family = dst_ip->sin_family; + ((struct sockaddr_in *) &route->dst_addr)->sin_addr.s_addr = + be32_to_cpu(addr->src_addr.ver.ip4.addr); + route->num_paths = 1 + (ib_event->param.req_rcvd.alternate_path != NULL); route->path_rec = kmalloc(sizeof *route->path_rec * route->num_paths, GFP_KERNEL); if (!route->path_rec) goto err; - /* TODO: get route information from private data */ - addr = ib_event->private_data; ib_event->private_data += sizeof *addr; route->path_rec[0] = *ib_event->param.req_rcvd.primary_path; @@ -305,6 +334,9 @@ int rdma_cma_listen(struct rdma_cma_id * struct cma_id_private *cma_id_priv; int ret; + if (addr->sa_family != AF_INET) + return -EINVAL; + cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); cma_id->route.src_addr = *addr; @@ -402,7 +434,7 @@ error: int cma_resolve_ib_route(struct cma_id_private *cma_id_priv, struct sockaddr *src_addr, - struct sockaddr *dest_addr) + struct sockaddr *dst_addr) { /* TODO: Get remote GID from ARP table, query for path record */ return -ENOSYS; @@ -410,7 +442,7 @@ int cma_resolve_ib_route(struct cma_id_p int rdma_cma_resolve_route(struct rdma_cma_id *cma_id, struct sockaddr *src_addr, - struct sockaddr *dest_addr) + struct sockaddr *dst_addr) { struct cma_id_private *cma_id_priv; int ret; @@ -419,7 +451,7 @@ int rdma_cma_resolve_route(struct rdma_c switch (cma_id->device->node_type) { case IB_NODE_CA: - ret = cma_resolve_ib_route(cma_id_priv, src_addr, dest_addr); + ret = cma_resolve_ib_route(cma_id_priv, src_addr, dst_addr); break; default: ret = -ENOSYS; @@ -430,6 +462,21 @@ int rdma_cma_resolve_route(struct rdma_c } EXPORT_SYMBOL(rdma_cma_resolve_route); +static void cma_format_addr(struct cma_addr *addr, struct rdma_route *route) +{ + struct sockaddr_in *ip_addr; + + memset(addr, 0, sizeof *addr); + cma_set_version(addr, 4); + + ip_addr = (struct sockaddr_in *) &route->src_addr; + addr->src_addr.ver.ip4.addr = 
cpu_to_be32(ip_addr->sin_addr.s_addr); + + ip_addr = (struct sockaddr_in *) &route->dst_addr; + addr->dst_addr.ver.ip4.addr = cpu_to_be32(ip_addr->sin_addr.s_addr); + addr->port = cpu_to_be16(ip_addr->sin_port); +} + static int cma_connect_ib(struct cma_id_private *cma_id_priv, struct rdma_cma_conn_param *conn_param) { @@ -445,19 +492,20 @@ static int cma_connect_ib(struct cma_id_ if (!private_data) return -ENOMEM; - /* TODO: set address info in private data */ addr = private_data; + route = &cma_id_priv->cma_id.route; + cma_format_addr(addr, route); + if (conn_param->private_data && conn_param->private_data_len) memcpy(addr + 1, conn_param->private_data, conn_param->private_data_len); req.private_data = private_data; - route = &cma_id_priv->cma_id.route; req.primary_path = &route->path_rec[0]; if (route->num_paths == 2) req.alternate_path = &route->path_rec[1]; - req.service_id = cma_get_service_id(&route->dest_addr); + req.service_id = cma_get_service_id(&route->dst_addr); req.qp_num = conn_param->qp->qp_num; req.qp_type = IB_QPT_RC; req.starting_psn = req.qp_num; Index: include/rdma/rdma_cma.h =================================================================== --- include/rdma/rdma_cma.h (revision 3523) +++ include/rdma/rdma_cma.h (working copy) @@ -32,6 +32,7 @@ #include #include +#include #include #include @@ -47,7 +48,7 @@ enum rdma_cma_event_type { struct rdma_route { struct sockaddr src_addr; - struct sockaddr dest_addr; + struct sockaddr dst_addr; struct ib_sa_path_rec *path_rec; int num_paths; }; @@ -83,7 +84,7 @@ int rdma_cma_listen(struct rdma_cma_id * int rdma_cma_resolve_route(struct rdma_cma_id *cma_id, struct sockaddr *src_addr, - struct sockaddr *dest_addr); + struct sockaddr *dst_addr); struct rdma_cma_conn_param { struct ib_qp *qp; From viswa.krish at gmail.com Thu Sep 22 18:37:57 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Thu, 22 Sep 2005 18:37:57 -0700 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: 
<1127429082.15613.6073.camel@hal.voltaire.com> References: <4df28be4050921172370bad964@mail.gmail.com> <1127350851.15613.390.camel@hal.voltaire.com> <1127402265.15613.3424.camel@hal.voltaire.com> <4df28be405092211377f0d4dac@mail.gmail.com> <1127414463.15613.4541.camel@hal.voltaire.com> <4df28be405092212065a7a1329@mail.gmail.com> <1127416082.15613.4718.camel@hal.voltaire.com> <4df28be405092212553174677@mail.gmail.com> <1127429082.15613.6073.camel@hal.voltaire.com> Message-ID: <4df28be40509221837660642db@mail.gmail.com> On 22 Sep 2005 18:44:44 -0400, Hal Rosenstock wrote: > > Hi Viswa, > > On Thu, 2005-09-22 at 15:55, Viswanath Krishnamurthy wrote: > > Here is the log of osmtest failure. This was seen 150 times out of > > 2500 iterations. The opensm SUBNET UP failure is tough to reproduce. > > Saw it once in 2500 iterations. Unfortunately I did not collect the > > log on that error. > > I understand but it is hard to know whether this is a known issue or > something else without a log of the failure. > > > The patch worked as expected and did not see any issues with ctrl-C. > > When I tried apply the patch, I got a failure. (I used the patch > > command). I manually added those 2 lines. > > Not sure why the patch wouldn't apply. > > > Command Line Arguments > > Done with args > > Flow = All Validations > > Sep 21 17:50:56 684254 [B7F026C0] -> osm_vendor_get_all_port_attr: > > assign CA mthca0 port 1 guid (0x2c90200400cfd) as the def > > ault port. > > using default guid 0x2c90200400cfd > > Sep 21 17:50:56 686301 [B7F026C0] -> osm_vendor_get_all_port_attr: > > assign CA mthca0 port 1 guid (0x2c90200400cfd) as the def > > ault port. > > Sep 21 17:50:56 686347 [B7F026C0] -> osm_vendor_bind: Binding to port > > 0x2c90200400cfd. > > Sep 21 17:50:56 689963 [B7F026C0] -> osm_vendor_get_all_port_attr: > > assign CA mthca0 port 1 guid (0x2c90200400cfd) as the def > > ault port. 
> > Sep 21 17:50:56 691969 [B7F026C0] -> osm_vendor_get_all_port_attr: > > assign CA mthca0 port 1 guid (0x2c90200400cfd) as the def > > ault port. > > Sep 21 17:50:56 693187 [B7F026C0] -> > > osmtest_validate_sa_class_port_info: > > ----------------------------- > > SA Class Port Info: > > base_ver:1 > > class_ver:2 > > cap_mask:0x202 > > resp_time_val:0x64 > > ----------------------------- > > Sep 21 17:50:56 775383 [B7F026C0] -> osmtest_wrong_sm_key_ignored: Try > > PortRecord for port with LID 0x0 Num:0x1. > > Sep 21 17:51:00 775320 [B76FFBB0] -> umad_receiver: ERR 5409: send > > completed with error (method=1 attr=12 trans_id=0x34) -- > > dropping. > > Sep 21 17:51:00 775389 [B76FFBB0] -> umad_receiver: ERR 5410: class > > 0x3 LID 0x0 > > Sep 21 17:51:00 775418 [B76FFBB0] -> osmtest_query_res_cb: ERR 0003: > > Error on query (IB_TIMEOUT). > > Sep 21 17:51:00 775465 [B7F026C0] -> osmtest_wrong_sm_key_ignored: ERR > > 0011: Did not get a timeout but got (IB_SUCCESS). > > Sep 21 17:51:00 775581 [B7F026C0] -> osmt_register_service: > > Registering Service: name:osmt.srvc.1804289383.7793 id:0x6b8b26f > > 6. > > Sep 21 17:51:00 777143 [B7F026C0] -> osmt_register_service: > > Registering Service: name:osmt.srvc.846930885.7793 id:0x327b0554 > > Sep 21 17:51:00 777143 [B7F026C0] -> osmt_register_service: > > Registering Service: name:osmt.srvc.846930885.7793 id:0x327b0554 > > . > > Sep 21 17:51:04 779578 [B76FFBB0] -> umad_receiver: ERR 5409: send > > completed with error (method=2 attr=31 trans_id=0x36) --dropping. > > Sep 21 17:51:04 779604 [B76FFBB0] -> umad_receiver: ERR 5410: class > > 0x3 LID 0x0 > > Sep 21 17:51:04 779631 [B76FFBB0] -> osmtest_query_res_cb: ERR 0003: > > Error on query (IB_TIMEOUT). > > Sep 21 17:51:04 779674 [B7F026C0] -> osmt_register_service: ERR 0364: > > ib_query failed (IB_TIMEOUT). 
> > Sep 21 17:51:04 779740 [B7F026C0] -> osmtest_run: ERR 00148: Service > > Flow failed (IB_TIMEOUT) > > OSMTEST: TEST "All Validations" FAIL > > The final FAIL/PASS is definitive so there are real failures here. Is > this consistent or intermittent ? Does this work sometimes or always > fail ? Intermittent. As I said, 150 out of 2500 iterations failed. Is there any log you want me to collect? > > -- Hal From halr at voltaire.com Thu Sep 22 20:44:04 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 22 Sep 2005 23:44:04 -0400 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <4df28be40509221837660642db@mail.gmail.com> References: <4df28be4050921172370bad964@mail.gmail.com> <1127350851.15613.390.camel@hal.voltaire.com> <1127402265.15613.3424.camel@hal.voltaire.com> <4df28be405092211377f0d4dac@mail.gmail.com> <1127414463.15613.4541.camel@hal.voltaire.com> <4df28be405092212065a7a1329@mail.gmail.com> <1127416082.15613.4718.camel@hal.voltaire.com> <4df28be405092212553174677@mail.gmail.com> <1127429082.15613.6073.camel@hal.voltaire.com> <4df28be40509221837660642db@mail.gmail.com> Message-ID: <1127447043.15613.8258.camel@hal.voltaire.com> On Thu, 2005-09-22 at 21:37, Viswanath Krishnamurthy wrote: > > On 22 Sep 2005 18:44:44 -0400, Hal Rosenstock > wrote: > Hi Viswa, > > On Thu, 2005-09-22 at 15:55, Viswanath Krishnamurthy wrote: > > Here is the log of osmtest failure. This was seen 150 times > out of > > 2500 iterations. The opensm SUBNET UP failure is tough to > reproduce. > > Saw it once in 2500 iterations. Unfortunately I did not > collect the > > log on that error. > > I understand but it is hard to know whether this is a known > issue or > something else without a log of the failure. > > > The patch worked as expected and did not see any issues with > ctrl-C. > > When I tried apply the patch, I got a failure. (I used the > patch > > command).
I manually added those 2 lines. > > Not sure why the patch wouldn't apply. > > > Command Line Arguments > > Done with args > > Flow = All Validations > > Sep 21 17:50:56 684254 [B7F026C0] -> > osm_vendor_get_all_port_attr: > > assign CA mthca0 port 1 guid (0x2c90200400cfd) as the def > > ault port. > > using default guid 0x2c90200400cfd > > Sep 21 17:50:56 686301 [B7F026C0] -> > osm_vendor_get_all_port_attr: > > assign CA mthca0 port 1 guid (0x2c90200400cfd) as the def > > ault port. > > Sep 21 17:50:56 686347 [B7F026C0] -> osm_vendor_bind: > Binding to port > > 0x2c90200400cfd. > > Sep 21 17:50:56 689963 [B7F026C0] -> > osm_vendor_get_all_port_attr: > > assign CA mthca0 port 1 guid (0x2c90200400cfd) as the def > > ault port. > > Sep 21 17:50:56 691969 [B7F026C0] -> > osm_vendor_get_all_port_attr: > > assign CA mthca0 port 1 guid (0x2c90200400cfd) as the def > > ault port. > > Sep 21 17:50:56 693187 [B7F026C0] -> > > osmtest_validate_sa_class_port_info: > > ----------------------------- > > SA Class Port Info: > > base_ver:1 > > class_ver:2 > > cap_mask:0x202 > > resp_time_val:0x64 > > ----------------------------- > > Sep 21 17:50:56 775383 [B7F026C0] -> > osmtest_wrong_sm_key_ignored: Try > > PortRecord for port with LID 0x0 Num:0x1. > > Sep 21 17:51:00 775320 [B76FFBB0] -> umad_receiver: ERR > 5409: send > > completed with error (method=1 attr=12 trans_id=0x34) -- > > dropping. > > Sep 21 17:51:00 775389 [B76FFBB0] -> umad_receiver: ERR > 5410: class > > 0x3 LID 0x0 > > Sep 21 17:51:00 775418 [B76FFBB0] -> osmtest_query_res_cb: > ERR 0003: > > Error on query (IB_TIMEOUT). > > Sep 21 17:51:00 775465 [B7F026C0] -> > osmtest_wrong_sm_key_ignored: ERR > > 0011: Did not get a timeout but got (IB_SUCCESS). > > Sep 21 17:51:00 775581 [B7F026C0] -> osmt_register_service: > > Registering Service: name: osmt.srvc.1804289383.7793 > id:0x6b8b26f > > 6. 
> > Sep 21 17:51:00 777143 [B7F026C0] -> osmt_register_service: > > Registering Service: name:osmt.srvc.846930885.7793 > id:0x327b0554 > > Sep 21 17:51:00 777143 [B7F026C0] -> osmt_register_service: > > Registering Service: name:osmt.srvc.846930885.7793 > id:0x327b0554 > > . > > Sep 21 17:51:04 779578 [B76FFBB0] -> umad_receiver: ERR > 5409: send > > completed with error (method=2 attr=31 trans_id=0x36) > --dropping. > > Sep 21 17:51:04 779604 [B76FFBB0] -> umad_receiver: ERR > 5410: class > > 0x3 LID 0x0 > > Sep 21 17:51:04 779631 [B76FFBB0] -> osmtest_query_res_cb: > ERR 0003: > > Error on query (IB_TIMEOUT). > > Sep 21 17:51:04 779674 [B7F026C0] -> osmt_register_service: > ERR 0364: > > ib_query failed (IB_TIMEOUT). > > Sep 21 17:51:04 779740 [B7F026C0] -> osmtest_run: ERR 00148: > Service > > Flow failed (IB_TIMEOUT) > > OSMTEST: TEST "All Validations" FAIL > > The final FAIL/PASS is definitive so there are real failures > here. Is > this consistent or intermittent ? Does this work sometimes or > always > > > Intermittent.. As I said 150 out of 2500 iterations failed. You did say that :-) Sorry. > Is there any log you want me to collect ? Can you capture a fresh log for this on the OpenSM side (opensm -V) ? Also, are there port state LEDs on the switch(es) in your subnet ? Can you correlate these failures with the LEDs changing ? Thanks. -- Hal
From halr at voltaire.com Fri Sep 23 04:01:53 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Sep 2005 07:01:53 -0400 Subject: [openib-general] [RFC] Add setting SM_Key as command line option to OpenSM Message-ID: <1127473112.15613.11311.camel@hal.voltaire.com> Hi, Currently, OpenSM runs with its SMKey set to 0. This means there are no trusted requests. In order to allow trusted requests, where more information can be obtained from the SA, this proposal is to add the ability to set the SM_Key as a command-line option when starting the SM. The syntax would be something like: -k[=] --key[=] (-s option is already taken) [Note that SM_Key is a 64-bit quantity.] Comments ? -- Hal From halr at voltaire.com Fri Sep 23 07:10:33 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Sep 2005 10:10:33 -0400 Subject: [openib-general] Re: [IBAT] interface numbering assumption In-Reply-To: References: Message-ID: <1127484628.15613.12768.camel@hal.voltaire.com> On Thu, 2005-09-22 at 12:29, James Lentini wrote: > Hal, > > IBAT's resolve_ip function assumes that network interfaces are consecutively numbered, see at.c line 1691. Yes, I see that code (though not at line 1691), and your point is valid. > One of my machines ended up with the following configuration: > > # ls /sys/class/net/ > eth0 eth1 ib0 ib1 lo sit0 > # cat /sys/class/net/lo/ifindex > 1 > # cat /sys/class/net/eth0/ifindex > 2 > # cat /sys/class/net/eth1/ifindex > 3 > # cat /sys/class/net/sit0/ifindex > 4 > # cat /sys/class/net/ib0/ifindex > 9 > # cat /sys/class/net/ib1/ifindex > 10 > > I'm not sure how this happened. Yes, holes in the interface numbering are possible due to interface removal and addition. > As a result, the for loop on line 1691 exited before finding an IPoIB device.
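The failure James hit (a loop that probes ifindex 1, 2, 3, ... and stops at the first gap, so it never reaches ib0 at 9 or ib1 at 10) can be shown with a small userspace analog. This is only a sketch built on the POSIX `if_nameindex()` family, not the IBAT kernel code; inside the kernel, Hal's suggestion is to walk the `dev_base` net_device list instead. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <net/if.h>
#include <string.h>

/* Fragile: probe ifindex 1, 2, 3, ... and stop at the first index
 * with no interface. With indices 1-4 followed by 9 and 10, the
 * later interfaces are never reached. */
static unsigned count_ifaces_until_gap(void)
{
    char name[IF_NAMESIZE];
    unsigned n = 0;

    for (unsigned idx = 1; if_indextoname(idx, name) != NULL; idx++)
        n++;
    return n;
}

/* Robust: ask for the complete list; holes in the numbering do not matter. */
static unsigned count_ifaces(void)
{
    struct if_nameindex *list = if_nameindex();
    unsigned n = 0;

    if (list == NULL)
        return 0;
    for (const struct if_nameindex *p = list; p->if_index != 0; p++)
        n++;
    if_freenameindex(list);
    return n;
}

/* Return the ifindex of the first interface in the list whose name
 * starts with prefix (e.g. "ib"), or 0 if there is none. */
static unsigned find_iface_by_prefix(const char *prefix)
{
    struct if_nameindex *list = if_nameindex();
    unsigned found = 0;
    size_t len;

    assert(prefix != NULL);
    if (list == NULL)
        return 0;
    len = strlen(prefix);
    for (const struct if_nameindex *p = list; p->if_index != 0; p++) {
        if (strncmp(p->if_name, prefix, len) == 0) {
            found = p->if_index;
            break;
        }
    }
    if_freenameindex(list);
    return found;
}
```

On a machine laid out like James's, `count_ifaces_until_gap()` would stop at 4 while `count_ifaces()` would report 6, and `find_iface_by_prefix("ib")` would still locate ib0.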
> > A quick reboot fixed the problem. > > Is there a better way to enumerate all of the network interfaces? I > believe that is what this for loop is attempting to accomplish. Yes. I think that the net_device list from dev_base could be walked instead, and that would resolve this issue. -- Hal From jerome.pioux at bull.com Fri Sep 23 08:28:36 2005 From: jerome.pioux at bull.com (Jerome Pioux) Date: Fri, 23 Sep 2005 08:28:36 -0700 Subject: [openib-general] FW: SDP problems with 64K page size References: <52ll1pdjm8.fsf@cisco.com> <52fyrxdj4o.fsf@cisco.com> Message-ID: <015c01c5c053$78c656d0$0211708d@gpv.az05.bull.com> Roland, This patch seems to work pretty well in GEN2. I can now run applications over GEN2/SDP with a 64K page size (using 3.3.3rc7 firmware). Thank you! However, I have some concerns about performance (I am using a multithreaded application, if that matters). I don't know why yet, but the throughput is almost half of what it is supposed to be (345MB/s) when openib is in receiving mode. I will investigate more and let you know. Jerome ----- Original Message ----- From: "Roland Dreier" To: "Jerome Pioux" ; ; Sent: Thursday, September 22, 2005 11:58 AM Subject: Re: [openib-general] FW: SDP problems with 64K page size > Jerome, > > You could try this braindead port of Libor's patch and see if it works > for you. I've not done anything beyond compile testing. > > - R.
> > Index: linux-kernel/infiniband/ulp/sdp/sdp_buff.c > =================================================================== > --- linux-kernel/infiniband/ulp/sdp/sdp_buff.c (revision 3522) > +++ linux-kernel/infiniband/ulp/sdp/sdp_buff.c (working copy) > @@ -330,7 +330,7 @@ static struct sdpc_buff *sdp_buff_pool_a > return NULL; > } > > - buff->end = buff->head + PAGE_SIZE; > + buff->end = buff->head + sdp_buff_pool_buff_size(); > buff->data = buff->head; > buff->tail = buff->head; > buff->sge.lkey = 0; > @@ -350,7 +350,7 @@ int sdp_buff_pool_init(void) > int result; > > main_pool.pool_cache = kmem_cache_create("sdp_buff_pool", > - PAGE_SIZE, > + sdp_buff_pool_buff_size(), > 0, 0, > NULL, NULL); > if (!main_pool.pool_cache) { > Index: linux-kernel/infiniband/ulp/sdp/sdp_buff.h > =================================================================== > --- linux-kernel/infiniband/ulp/sdp/sdp_buff.h (revision 3522) > +++ linux-kernel/infiniband/ulp/sdp/sdp_buff.h (working copy) > @@ -101,6 +101,6 @@ struct sdpc_buff_root { > */ > #define sdp_buff_q_size(pool) ((pool)->size) > > -#define sdp_buff_pool_buff_size() PAGE_SIZE > +#define sdp_buff_pool_buff_size() min(PAGE_SIZE, 16384) > > #endif /* _SDP_BUFF_H */ From eitan at mellanox.co.il Fri Sep 23 08:46:33 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 23 Sep 2005 18:46:33 +0300 Subject: [openib-general] [RFC] Add setting SM_Key as command line option to OpenSM In-Reply-To: <1127473112.15613.11311.camel@hal.voltaire.com> References: <1127473112.15613.11311.camel@hal.voltaire.com> Message-ID: <43342359.2060707@mellanox.co.il> Hal Rosenstock wrote: > Hi, > > Currently, OpenSM runs with its SMKey set to 0. This means there are no > trusted requests. In order to allow trusted requests, where more > information can be obtained from the SA, this proposal is to add the > ability to set the SM_Key as a command line option to starting SM. > > The syntax would be something like: > > -k[=] > --key[=] > Fine with me. 
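Hal's RFC notes that SM_Key is a 64-bit quantity, so whatever value is handed to the proposed -k/--key option has to be validated as such. Below is a minimal sketch of that parsing step, assuming the key is given in decimal or 0x-prefixed hex. `parse_sm_key` is a hypothetical helper, not part of OpenSM, and storing the result in the subnet options is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a 64-bit SM_Key. Returns false on empty
 * input, a sign or leading whitespace, trailing garbage, or a value
 * that does not fit in 64 bits. */
static bool parse_sm_key(const char *s, uint64_t *key)
{
    char *end;
    unsigned long long v;

    assert(key != NULL);
    /* strtoull silently accepts "-1" (wrapping to all ones) and leading
     * whitespace, so require the string to start with a digit. */
    if (s == NULL || !isdigit((unsigned char)s[0]))
        return false;
    errno = 0;
    /* Base 0: decimal, 0x-prefixed hex, or leading-0 octal. */
    v = strtoull(s, &end, 0);
    if (errno == ERANGE || *end != '\0')
        return false;
    *key = (uint64_t)v;
    return true;
}
```

In an option loop this would be applied to `optarg` when -k/--key is seen, with a parse failure reported as a usage error rather than silently falling back to a key of 0.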
Need to make sure this is added properly to the subnet options structure and its persistent storage and loading. Eitan > (-s option is already taken) > > [Note that SM_Key is 64 bit quantity.] > > Comments ? > > -- Hal > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From halr at voltaire.com Fri Sep 23 09:00:46 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Sep 2005 12:00:46 -0400 Subject: [openib-general] [RFC] Add setting SM_Key as command line option to OpenSM In-Reply-To: <43342359.2060707@mellanox.co.il> References: <1127473112.15613.11311.camel@hal.voltaire.com> <43342359.2060707@mellanox.co.il> Message-ID: <1127491245.15613.13802.camel@hal.voltaire.com> On Fri, 2005-09-23 at 11:46, Eitan Zahavi wrote: > > The syntax would be something like: > > > > -k[=] > > --key[=] > > > Fine with me. Need to make sure this is added properly to the subnet options > structure and its persistent storage and loading. Thanks. I plan on posting this patch to the list first for review. -- Hal From eitan at mellanox.co.il Fri Sep 23 09:19:29 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 23 Sep 2005 19:19:29 +0300 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <1127402265.15613.3424.camel@hal.voltaire.com> References: <1127402265.15613.3424.camel@hal.voltaire.com> Message-ID: <43342B11.2010504@mellanox.co.il> Hi Hal, Viswa, Sorry I'm joining late on this thread due to the weekend (which starts here on Friday ending Saturday night). Is there any conclusion on this one? The only log I have seen was from osmtest failing to send a MAD. Looks like a umad issue? 
Eitan Hal Rosenstock wrote: > Hi again Viswa, > > On Wed, 2005-09-21 at 21:00, Hal Rosenstock wrote: > >>Hi Viswa, >> >>On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote: >> >>>Currently opensm traps SIGINT. There was some discussion to remove > > it. > >>>I have currently running some tests on opensm >>>by killing (SIGKILL) and restarting opensm. So far I ahve not found >>>any resource leak issues. Is ther a plan to remove that >>>signal handler. Ideally it should not exist. >> >>Eitan stated that this was historical in nature for gen1 drivers which >>had resource tracking problems: "if OpenSM left without cleaning up > > all > >>used resources (like MAD buffers and UD-AVs), the driver oops'ed." >> >>I think that (eliminating the handler for SIGINT) can at least be done >>for OSM_VENDOR_INTF_OPENIB and leave it there for the other vendor >>layers for starters. I will experiment with gen2 and let you know. > > > Does the patch below do what you want ? Can you try it ? > > -- Hal > > Index: opensm/osm_opensm.c > =================================================================== > --- opensm/osm_opensm.c (revision 3513) > +++ opensm/osm_opensm.c (working copy) > @@ -182,7 +182,9 @@ osm_reg_sig_handler( > IN osm_opensm_t * const p_osm ) > { > __p_osm_to_signal = p_osm; > +#ifndef OSM_VENDOR_INTF_OPENIB > cl_reg_sig_hdl( SIGINT, __sig_handler ); > +#endif > cl_reg_sig_hdl( SIGTERM, __sig_handler ); > cl_reg_sig_hdl( SIGHUP, __sig_handler ); > osm_exit_flag = 0; > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From halr at voltaire.com Fri Sep 23 09:54:56 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Sep 2005 12:54:56 -0400 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <43342B11.2010504@mellanox.co.il> References: 
<1127402265.15613.3424.camel@hal.voltaire.com> <43342B11.2010504@mellanox.co.il> Message-ID: <1127494494.15613.14406.camel@hal.voltaire.com> Hi Eitan, On Fri, 2005-09-23 at 12:19, Eitan Zahavi wrote: > Hi Hal, Viswa, > > Sorry I'm joining late on this thread due to the weekend (which starts > here on Friday ending Saturday night). > Is there any conclusion on this one? No. > The only log I have seen was from osmtest failing to send a MAD. True. > Looks like a umad issue? Not sure why you say that. There are other possibilities I'm aware of here: Note that that failed sent MAD is one which has a response expected so this means that the response was not received. It also goes through the transmit retry strategy (I could see this on the SA side). So the only thing I can say at this point is that for some reason, the response does not make it back from the SA to the SA client (osmtest). That's where this one is right now. -- Hal > Eitan > > Hal Rosenstock wrote: > > Hi again Viswa, > > > > On Wed, 2005-09-21 at 21:00, Hal Rosenstock wrote: > > > >>Hi Viswa, > >> > >>On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote: > >> > >>>Currently opensm traps SIGINT. There was some discussion to remove > > > > it. > > > >>>I have currently running some tests on opensm > >>>by killing (SIGKILL) and restarting opensm. So far I ahve not found > >>>any resource leak issues. Is ther a plan to remove that > >>>signal handler. Ideally it should not exist. > >> > >>Eitan stated that this was historical in nature for gen1 drivers which > >>had resource tracking problems: "if OpenSM left without cleaning up > > > > all > > > >>used resources (like MAD buffers and UD-AVs), the driver oops'ed." > >> > >>I think that (eliminating the handler for SIGINT) can at least be done > >>for OSM_VENDOR_INTF_OPENIB and leave it there for the other vendor > >>layers for starters. I will experiment with gen2 and let you know. > > > > > > Does the patch below do what you want ? Can you try it ? 
> > > > -- Hal > > > > Index: opensm/osm_opensm.c > > =================================================================== > > --- opensm/osm_opensm.c (revision 3513) > > +++ opensm/osm_opensm.c (working copy) > > @@ -182,7 +182,9 @@ osm_reg_sig_handler( > > IN osm_opensm_t * const p_osm ) > > { > > __p_osm_to_signal = p_osm; > > +#ifndef OSM_VENDOR_INTF_OPENIB > > cl_reg_sig_hdl( SIGINT, __sig_handler ); > > +#endif > > cl_reg_sig_hdl( SIGTERM, __sig_handler ); > > cl_reg_sig_hdl( SIGHUP, __sig_handler ); > > osm_exit_flag = 0; > > > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > From eitan at mellanox.co.il Fri Sep 23 10:12:36 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 23 Sep 2005 20:12:36 +0300 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <1127494494.15613.14406.camel@hal.voltaire.com> References: <1127494494.15613.14406.camel@hal.voltaire.com> Message-ID: <43343784.1000802@mellanox.co.il> Hal Rosenstock wrote: > Hi Eitan, > > On Fri, 2005-09-23 at 12:19, Eitan Zahavi wrote: > >>Hi Hal, Viswa, >> >>Sorry I'm joining late on this thread due to the weekend (which starts > > >>here on Friday ending Saturday night). >>Is there any conclusion on this one? > > > No. > > >>The only log I have seen was from osmtest failing to send a MAD. > > > True. > > >>Looks like a umad issue? > > > Not sure why you say that. There are other possibilities I'm aware of > here: > > Note that that failed sent MAD is one which has a response expected so > this means that the response was not received. It also goes through the > transmit retry strategy (I could see this on the SA side). So the only > thing I can say at this point is that for some reason, the response does > not make it back from the SA to the SA client (osmtest). 
That's where > this one is right now. Thanks for the update. > > -- Hal > > >>Eitan >> >>Hal Rosenstock wrote: >> >>>Hi again Viswa, >>> >>>On Wed, 2005-09-21 at 21:00, Hal Rosenstock wrote: >>> >>> >>>>Hi Viswa, >>>> >>>>On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote: >>>> >>>> >>>>>Currently opensm traps SIGINT. There was some discussion to remove >>> >>>it. >>> >>> >>>>>I have currently running some tests on opensm >>>>>by killing (SIGKILL) and restarting opensm. So far I ahve not found >>>>>any resource leak issues. Is ther a plan to remove that >>>>>signal handler. Ideally it should not exist. >>>> >>>>Eitan stated that this was historical in nature for gen1 drivers > > which > >>>>had resource tracking problems: "if OpenSM left without cleaning up >>> >>>all >>> >>> >>>>used resources (like MAD buffers and UD-AVs), the driver oops'ed." >>>> >>>>I think that (eliminating the handler for SIGINT) can at least be > > done > >>>>for OSM_VENDOR_INTF_OPENIB and leave it there for the other vendor >>>>layers for starters. I will experiment with gen2 and let you know. >>> >>> >>>Does the patch below do what you want ? Can you try it ? 
>>> >>>-- Hal >>> >>>Index: opensm/osm_opensm.c >>>=================================================================== >>>--- opensm/osm_opensm.c (revision 3513) >>>+++ opensm/osm_opensm.c (working copy) >>>@@ -182,7 +182,9 @@ osm_reg_sig_handler( >>> IN osm_opensm_t * const p_osm ) >>> { >>> __p_osm_to_signal = p_osm; >>>+#ifndef OSM_VENDOR_INTF_OPENIB >>> cl_reg_sig_hdl( SIGINT, __sig_handler ); >>>+#endif >>> cl_reg_sig_hdl( SIGTERM, __sig_handler ); >>> cl_reg_sig_hdl( SIGHUP, __sig_handler ); >>> osm_exit_flag = 0; >>> >>> >>>_______________________________________________ >>>openib-general mailing list >>>openib-general at openib.org >>>http://openib.org/mailman/listinfo/openib-general >>> >>>To unsubscribe, please visit >>>http://openib.org/mailman/listinfo/openib-general >>> >> > From csegura at hpti.com Fri Sep 23 10:25:02 2005 From: csegura at hpti.com (cynthia segura) Date: Fri, 23 Sep 2005 11:25:02 -0600 Subject: [openib-general] DHCP over infiniband Message-ID: <43343A6E.3040706@hpti.com> I'm new to InfiniBand and I'm trying to assign an IP address to an InfiniBand interface using DHCP. Currently, I have a small test cluster (two nodes only), each with an InfiniBand HCA cabled back to back (no switch). I am running a 2.6.12.4 kernel with InfiniBand support compiled into the kernel. When the nodes boot, I can see both of the InfiniBand interfaces and can assign them IP addresses using ifconfig. However, there appears to be no communication between the interfaces (I can't ping them). I consulted the FAQ (http://www.openib.org/docs/ipoib_faq.txt), and it appears that I need a subnet manager running on at least one of the nodes, so I downloaded and installed opensm.
However, when I try to run it, I receive the following error:

-------------------------------------------------
OpenSM Rev:openib-1.1.0
Command Line Arguments:
Log File: /var/log/osm.log
-------------------------------------------------
OpenSM Rev:openib-1.1.0

warn: [21042] umad_init: wrong ABI version: /sys/class/infiniband_mad/abi_version is 2 but library ABI is 5
Using default guid 0x5ad000004b4f1
SM port is down.

So then I assumed that I needed to recompile the infiniband modules with the latest OpenIB stack, so I downloaded it and followed these instructions:

https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet

Unfortunately, when I try to compile the 2.6.12.4 kernel with infiniband support:

#
# InfiniBand support
#
CONFIG_INFINIBAND=y
CONFIG_INFINIBAND_USER_MAD=y
CONFIG_INFINIBAND_USER_ACCESS=y
CONFIG_INFINIBAND_MTHCA=y
# CONFIG_INFINIBAND_MTHCA_DEBUG is not set
CONFIG_INFINIBAND_IPOIB=y
# CONFIG_INFINIBAND_IPOIB_DEBUG is not set
# CONFIG_INFINIBAND_SDP is not set
# CONFIG_INFINIBAND_SRP is not set

I receive the following errors:

drivers/built-in.o(.init.text+0x533c): In function `ib_uat_init':
: undefined reference to `class_create'
drivers/built-in.o(.init.text+0x5385): In function `ib_uat_init':
: undefined reference to `class_device_create'
drivers/built-in.o(.exit.text+0x802): In function `ib_uat_cleanup':
: undefined reference to `class_device_destroy'
drivers/built-in.o(.exit.text+0x810): In function `ib_uat_cleanup':
: undefined reference to `class_destroy'
make: *** [.tmp_vmlinux1] Error 1

I would appreciate any help you can offer.
Thank you, Cynthia From viswa.krish at gmail.com Fri Sep 23 10:43:00 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Fri, 23 Sep 2005 10:43:00 -0700 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <1127494494.15613.14406.camel@hal.voltaire.com> References: <1127402265.15613.3424.camel@hal.voltaire.com> <43342B11.2010504@mellanox.co.il> <1127494494.15613.14406.camel@hal.voltaire.com> Message-ID: <4df28be4050923104346677b04@mail.gmail.com>

More information. The test case is as follows:

1. Start opensm in verbose mode (-V)
2. Ping the remote node
3. osmtest -f c
4. osmtest -f a
5. pkill -9 opensm
6. Repeat

Out of about 2500 iterations, osmtest failed 143 times. Keep in mind that only step 4 failed. Step 3, which is inventory file creation, *never* failed. (I think inventory file creation also talks to the SA, right?)

-Viswa

On 23 Sep 2005 12:54:56 -0400, Hal Rosenstock wrote: > > Hi Eitan, > > On Fri, 2005-09-23 at 12:19, Eitan Zahavi wrote: > > Hi Hal, Viswa, > > > > Sorry I'm joining late on this thread due to the weekend (which starts > > here on Friday ending Saturday night). > > Is there any conclusion on this one? > > No. > > > The only log I have seen was from osmtest failing to send a MAD. > > True. > > > Looks like a umad issue? > > Not sure why you say that. There are other possibilities I'm aware of > here: > > Note that that failed sent MAD is one which has a response expected so > this means that the response was not received. It also goes through the > transmit retry strategy (I could see this on the SA side). So the only > thing I can say at this point is that for some reason, the response does > not make it back from the SA to the SA client (osmtest). That's where
> > -- Hal > > > Eitan > > > > Hal Rosenstock wrote: > > > Hi again Viswa, > > > > > > On Wed, 2005-09-21 at 21:00, Hal Rosenstock wrote: > > > > > >>Hi Viswa, > > >> > > >>On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote: > > >> > > >>>Currently opensm traps SIGINT. There was some discussion to remove > > > > > > it. > > > > > >>>I have currently running some tests on opensm > > >>>by killing (SIGKILL) and restarting opensm. So far I ahve not found > > >>>any resource leak issues. Is ther a plan to remove that > > >>>signal handler. Ideally it should not exist. > > >> > > >>Eitan stated that this was historical in nature for gen1 drivers which > > >>had resource tracking problems: "if OpenSM left without cleaning up > > > > > > all > > > > > >>used resources (like MAD buffers and UD-AVs), the driver oops'ed." > > >> > > >>I think that (eliminating the handler for SIGINT) can at least be done > > >>for OSM_VENDOR_INTF_OPENIB and leave it there for the other vendor > > >>layers for starters. I will experiment with gen2 and let you know. > > > > > > > > > Does the patch below do what you want ? Can you try it ? 
> > > > > > -- Hal > > > > > > Index: opensm/osm_opensm.c > > > =================================================================== > > > --- opensm/osm_opensm.c (revision 3513) > > > +++ opensm/osm_opensm.c (working copy) > > > @@ -182,7 +182,9 @@ osm_reg_sig_handler( > > > IN osm_opensm_t * const p_osm ) > > > { > > > __p_osm_to_signal = p_osm; > > > +#ifndef OSM_VENDOR_INTF_OPENIB > > > cl_reg_sig_hdl( SIGINT, __sig_handler ); > > > +#endif > > > cl_reg_sig_hdl( SIGTERM, __sig_handler ); > > > cl_reg_sig_hdl( SIGHUP, __sig_handler ); > > > osm_exit_flag = 0; > > > > > > > > > _______________________________________________ > > > openib-general mailing list > > > openib-general at openib.org > > > http://openib.org/mailman/listinfo/openib-general > > > > > > To unsubscribe, please visit > > > http://openib.org/mailman/listinfo/openib-general > > > > > > > From nacc at us.ibm.com Fri Sep 23 10:48:05 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Fri, 23 Sep 2005 10:48:05 -0700 Subject: [openib-general] DHCP over infiniband In-Reply-To: <43343A6E.3040706@hpti.com> References: <43343A6E.3040706@hpti.com> Message-ID: <20050923174805.GK5910@us.ibm.com> On 23.09.2005 [11:25:02 -0600], cynthia segura wrote: > I'm new to infiniband and I'm trying to assign an IP address to an > infiniband interface using DHCP. Currently, I have a small test cluster > (two nodes only) each with an infiniband HCA cabled back to back (no > switch). I am running a 2.6.12.4 kernel with infiniband support > compiled in the kernel. When the nodes boot, I can see both of the > infiniband interfaces and can assign them IP addresses using ifconfig. > However, it appears that there is not communication between the > interfaces (I can't ping them).
I consulted the FAQ > (http://www.openib.org/docs/ipoib_faq.txt) and it appear that I need a > subnet manager running on at least one of the nodes, so I downloaded and > installed opensm. However, when I try to run it, I receive the following > error: > ------------------------------------------------- > OpenSM Rev:openib-1.1.0 > Command Line Arguments: > Log File: /var/log/osm.log > ------------------------------------------------- > OpenSM Rev:openib-1.1.0 > > warn: [21042] umad_init: wrong ABI version: > /sys/class/infiniband_mad/abi_version is 2 but library ABI is 5 > Using default guid 0x5ad000004b4f1 > SM port is down. > > So, then I assumed that I need to recompile the infiniband modules with > the latest OpenIB stack, so I download it and followed these instructions: > > https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet > > Unfortunately, I when I try to compile the 2.6.12.4 kernel with > infiniband support: > > # > # InfiniBand support > # > CONFIG_INFINIBAND=y > CONFIG_INFINIBAND_USER_MAD=y > CONFIG_INFINIBAND_USER_ACCESS=y > CONFIG_INFINIBAND_MTHCA=y > # CONFIG_INFINIBAND_MTHCA_DEBUG is not set > CONFIG_INFINIBAND_IPOIB=y > # CONFIG_INFINIBAND_IPOIB_DEBUG is not set > # CONFIG_INFINIBAND_SDP is not set > # CONFIG_INFINIBAND_SRP is not set > > I receive the following errors: > > drivers/built-in.o(.init.text+0x533c): In function `ib_uat_init': > : undefined reference to `class_create' > drivers/built-in.o(.init.text+0x5385): In function `ib_uat_init': > : undefined reference to `class_device_create' > drivers/built-in.o(.exit.text+0x802): In function `ib_uat_cleanup': > : undefined reference to `class_device_destroy' > drivers/built-in.o(.exit.text+0x810): In function `ib_uat_cleanup': > : undefined reference to `class_destroy' > make: *** [.tmp_vmlinux1] Error 1 > > > I would appreciate any help you can offer. I believe this is the common problem people are experiencing now with an older kernel and the svn tree. 
The openib svn tree (as I understand it), as far as the linux-kernel related components are concerned, is tied to whatever is the latest release on kernel.org (2.6.13{,.2} right now). So when you apply the code to 2.6.12 (or any other tree for that matter), you will probably run into build errors like this. Your best bet is to upgrade to 2.6.13. If that is not possible, I'm not sure what the recommended course of action is off the top of my head, but I know others have asked this question recently, so you could scan the archives. Thanks, Nish From halr at voltaire.com Fri Sep 23 10:41:16 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Sep 2005 13:41:16 -0400 Subject: [openib-general] DHCP over infiniband In-Reply-To: <43343A6E.3040706@hpti.com> References: <43343A6E.3040706@hpti.com> Message-ID: <1127497081.15613.14859.camel@hal.voltaire.com> Hi Cynthia, On Fri, 2005-09-23 at 13:25, cynthia segura wrote: > I'm new to infiniband and I'm trying to assign an IP address to an > infiniband interface using DHCP. Currently, I have a small test cluster > (two nodes only) each with an infiniband HCA cabled back to back (no > switch). I am running a 2.6.12.4 kernel with infiniband support > compiled in the kernel. When the nodes boot, I can see both of the > infiniband interfaces and can assign them IP addresses using ifconfig. The DHCP client in the kernel does not support IPoIB. You need to ifconfig the interfaces as you are doing. Modifications to the ISC DHCP client and server for IPoIB were posted on the list a long time ago, but they would need dusting off to work again. I'm not sure if that is of interest. > However, it appears that there is not communication between the > interfaces (I can't ping them). It sounds like you are running in "back to back" HCA mode. > I consulted the FAQ > (http://www.openib.org/docs/ipoib_faq.txt) and it appear that I need a > subnet manager running on at least one of the nodes, Yes.
> so I downloaded and > installed opensm. However, when I try to run it, I receive the following > error: ------------------------------------------------- > OpenSM Rev:openib-1.1.0 > Command Line Arguments: > Log File: /var/log/osm.log > ------------------------------------------------- > OpenSM Rev:openib-1.1.0 > > warn: [21042] umad_init: wrong ABI version: > /sys/class/infiniband_mad/abi_version is 2 but library ABI is 5 Do you have an old user_mad.c? I forget what was in 2.6.12.4, but maybe there were some ABI changes. I can confirm if needed. > Using default guid 0x5ad000004b4f1 > SM port is down. Is the SM port cabled to anything? > So, then I assumed that I need to recompile the infiniband modules with > the latest OpenIB stack, so I download it and followed these instructions: > https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet > > Unfortunately, I when I try to compile the 2.6.12.4 kernel with > infiniband support: > > # > # InfiniBand support > # > CONFIG_INFINIBAND=y > CONFIG_INFINIBAND_USER_MAD=y > CONFIG_INFINIBAND_USER_ACCESS=y > CONFIG_INFINIBAND_MTHCA=y > # CONFIG_INFINIBAND_MTHCA_DEBUG is not set > CONFIG_INFINIBAND_IPOIB=y > # CONFIG_INFINIBAND_IPOIB_DEBUG is not set > # CONFIG_INFINIBAND_SDP is not set > # CONFIG_INFINIBAND_SRP is not set > > I receive the following errors: > > drivers/built-in.o(.init.text+0x533c): In function `ib_uat_init': > : undefined reference to `class_create' > drivers/built-in.o(.init.text+0x5385): In function `ib_uat_init': > : undefined reference to `class_device_create' > drivers/built-in.o(.exit.text+0x802): In function `ib_uat_cleanup': > : undefined reference to `class_device_destroy' > drivers/built-in.o(.exit.text+0x810): In function `ib_uat_cleanup': > : undefined reference to `class_destroy' > make: *** [.tmp_vmlinux1] Error 1 There is a backport patch available for this: https://openib.org/svn/gen2/branches/backport/2.6.12/uat_3465_to_2_6_12.patch Hope this helps to get you up and
running. -- Hal From viswa.krish at gmail.com Fri Sep 23 10:50:31 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Fri, 23 Sep 2005 10:50:31 -0700 Subject: [openib-general] Another opensm problem ? Message-ID: <4df28be40509231050351e9ba2@mail.gmail.com>

Hal,

I ran a different set of tests on opensm with yesterday's build (from the repository).

- 2 machines with a switch in between. One machine runs opensm.
- Keep running ping from one machine to the other over IB.
- Unplug the cable (non-opensm machine), wait for a few seconds, and plug the cable back into a different switch port.
- After 7-8 iterations, I ran into a weird problem where opensm was showing the HCA as UNKNOWN. The port never came up to the ACTIVE state. I unplugged and replugged it into different slots; the port remained in the INIT state.
- I rebooted the non-opensm machine; the problem still remained.
- Next I killed and restarted opensm, and the problem went away.

Attached is the log:

Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 ======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor Port (Port #) Mellanox : SW : 00 : : 0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x :
2.5 : 0002c9010d26e780 : Mellanox : SW : 09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0C : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0D : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0E : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0F : ACT : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : 0002c90200400cfd (01) Mellanox : SW : 10 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 11 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 12 : INI : : : 2048 : 1x : 2.5 : 0002c9010d26e780 : UNKNOWN Mellanox : SW : 13 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 14 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 15 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 16 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 17 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 18 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : ------------------------------------------------------------------------------------------------------ Mellanox : CA : 01 : ACT : 0001 : 0 : 2048 : 4x : 2.5 * 0002c90200400cfd * 0002c9010d26e780 (0F) ------------------------------------------------------------------------------------------------------ osm_ucast_mgr_dump_path_distribution: Switch 0x2c9010d26e780 Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 ======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor Port (Port #) Mellanox : SW : 00 : : 
0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0C : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0D : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0E : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0F : ACT : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : 0002c90200400cfd (01) Mellanox : SW : 10 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 11 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 12 : INI : : : 2048 : 1x : 2.5 : 0002c9010d26e780 : UNKNOWN Mellanox : SW : 13 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 14 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 15 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 16 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 17 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 18 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : ------------------------------------------------------------------------------------------------------ Mellanox : CA : 01 : ACT : 0001 : 0 : 2048 : 4x : 2.5 * 0002c90200400cfd * 0002c9010d26e780 (0F) ------------------------------------------------------------------------------------------------------ 
osm_ucast_mgr_dump_path_distribution: Switch 0x2c9010d26e780 Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 ======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor Port (Port #) Mellanox : SW : 00 : : 0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0C : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0D : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0E : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0F : ACT : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : 0002c90200400cfd (01) Mellanox : SW : 10 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 11 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 12 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 13 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 14 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 15 : DWN : : : 
256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 16 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 17 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 18 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : ------------------------------------------------------------------------------------------------------ Mellanox : CA : 01 : ACT : 0001 : 0 : 2048 : 4x : 2.5 * 0002c90200400cfd * 0002c9010d26e780 (0F) ------------------------------------------------------------------------------------------------------ osm_ucast_mgr_dump_path_distribution: Switch 0x2c9010d26e780 Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 ======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor Port (Port #) Mellanox : SW : 00 : : 0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0C : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0D : DWN : : : 2048 : 4x : 2.5 : 
0002c9010d26e780 : Mellanox : SW : 0E : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0F : ACT : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : 0002c90200400cfd (01) Mellanox : SW : 10 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 11 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 12 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 13 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 14 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 15 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 16 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 17 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 18 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : ------------------------------------------------------------------------------------------------------ Mellanox : CA : 01 : ACT : 0001 : 0 : 2048 : 4x : 2.5 * 0002c90200400cfd * 0002c9010d26e780 (0F) ------------------------------------------------------------------------------------------------------ osm_ucast_mgr_dump_path_distribution: Switch 0x2c9010d26e780 Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 ======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor Port (Port #) Mellanox : SW : 00 : : 0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 
0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0C : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0D : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0E : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0F : ACT : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : 0002c90200400cfd (01) Mellanox : SW : 10 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 11 : INI : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : UNKNOWN Mellanox : SW : 12 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 13 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 14 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 15 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 16 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 17 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 18 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : ------------------------------------------------------------------------------------------------------ Mellanox : CA : 01 : ACT : 0001 : 0 : 2048 : 4x : 2.5 * 0002c90200400cfd * 0002c9010d26e780 (0F) ------------------------------------------------------------------------------------------------------ osm_ucast_mgr_dump_path_distribution: Switch 0x2c9010d26e780 Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 
======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor Port (Port #) Mellanox : SW : 00 : : 0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0C : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0D : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0E : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0F : ACT : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : 0002c90200400cfd (01) Mellanox : SW : 10 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 11 : INI : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : UNKNOWN Mellanox : SW : 12 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 13 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 14 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 15 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 16 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 17 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 18 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : ------------------------------------------------------------------------------------------------------ 
Mellanox : CA : 01 : ACT : 0001 : 0 : 2048 : 4x : 2.5 * 0002c90200400cfd * 0002c9010d26e780 (0F) ------------------------------------------------------------------------------------------------------ osm_ucast_mgr_dump_path_distribution: Switch 0x2c9010d26e780 Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 ======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor Port (Port #) Mellanox : SW : 00 : : 0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 ======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor 
Port (Port #) Mellanox : SW : 00 : : 0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0C : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0D : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0E : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0F : ACT : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : 0002c90200400cfd (01) Mellanox : SW : 10 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 11 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 12 : INI : : : 2048 : 1x : 2.5 : 0002c9010d26e780 : UNKNOWN Mellanox : SW : 13 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 14 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 15 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 16 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 17 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 18 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : ------------------------------------------------------------------------------------------------------ Mellanox : CA : 01 : ACT : 0001 : 0 : 2048 : 4x : 2.5 * 0002c90200400cfd * 0002c9010d26e780 (0F) 
------------------------------------------------------------------------------------------------------ osm_ucast_mgr_dump_path_distribution: Switch 0x2c9010d26e780 Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 ======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor Port (Port #) Mellanox : SW : 00 : : 0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0C : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0D : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0E : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0F : ACT : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : 0002c90200400cfd (01) Mellanox : SW : 10 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 11 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 12 : INI : : : 2048 : 1x : 2.5 : 0002c9010d26e780 : UNKNOWN Mellanox : SW : 13 : DWN : : : 256 : 4x : 2.5 : 
0002c9010d26e780 : Mellanox : SW : 14 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 15 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 16 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 17 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 18 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : ------------------------------------------------------------------------------------------------------ Mellanox : CA : 01 : ACT : 0001 : 0 : 2048 : 4x : 2.5 * 0002c90200400cfd * 0002c9010d26e780 (0F) ------------------------------------------------------------------------------------------------------ osm_ucast_mgr_dump_path_distribution: Switch 0x2c9010d26e780 Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 ======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor Port (Port #) Mellanox : SW : 00 : : 0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : 
Mellanox : SW : 0C : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0D : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0E : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0F : ACT : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : 0002c90200400cfd (01) Mellanox : SW : 10 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 11 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 12 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 13 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 14 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 15 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 16 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 17 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 18 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : ------------------------------------------------------------------------------------------------------ Mellanox : CA : 01 : ACT : 0001 : 0 : 2048 : 4x : 2.5 * 0002c90200400cfd * 0002c9010d26e780 (0F) ------------------------------------------------------------------------------------------------------ osm_ucast_mgr_dump_path_distribution: Switch 0x2c9010d26e780 Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 ======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor Port (Port #) Mellanox : SW : 00 : : 0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 
: Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0C : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0D : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0E : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0F : ACT : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : 0002c90200400cfd (01) Mellanox : SW : 10 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 11 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 12 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 13 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 14 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 15 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 16 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 17 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 18 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : ------------------------------------------------------------------------------------------------------ Mellanox : CA : 01 : ACT : 0001 : 0 : 2048 : 4x : 2.5 * 0002c90200400cfd * 0002c9010d26e780 (0F) ------------------------------------------------------------------------------------------------------ osm_ucast_mgr_dump_path_distribution: Switch 0x2c9010d26e780 Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 
0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 ======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor Port (Port #) Mellanox : SW : 00 : : 0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0C : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0D : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0E : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0F : ACT : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : 0002c90200400cfd (01) Mellanox : SW : 10 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 11 : INI : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : UNKNOWN Mellanox : SW : 12 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 13 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 14 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 15 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 16 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 17 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 18 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : 
------------------------------------------------------------------------------------------------------ Mellanox : CA : 01 : ACT : 0001 : 0 : 2048 : 4x : 2.5 * 0002c90200400cfd * 0002c9010d26e780 (0F) ------------------------------------------------------------------------------------------------------ osm_ucast_mgr_dump_path_distribution: Switch 0x2c9010d26e780 Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 ======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor Port (Port #) Mellanox : SW : 00 : : 0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0C : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0D : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0E : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0F : ACT : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : 0002c90200400cfd (01) Mellanox : SW : 10 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : 
Mellanox : SW : 11 : INI : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : UNKNOWN Mellanox : SW : 12 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 13 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 14 : DWN : : : 2048 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 15 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 16 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 17 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 18 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : ------------------------------------------------------------------------------------------------------ Mellanox : CA : 01 : ACT : 0001 : 0 : 2048 : 4x : 2.5 * 0002c90200400cfd * 0002c9010d26e780 (0F) ------------------------------------------------------------------------------------------------------ osm_ucast_mgr_dump_path_distribution: Switch 0x2c9010d26e780 Port : Path Count Through Port 000 : 0 (switch management port) 001 : 0 002 : 0 003 : 0 004 : 0 005 : 0 006 : 0 007 : 0 008 : 0 009 : 0 010 : 0 011 : 0 012 : 0 013 : 0 014 : 0 015 : 1 (link to CA 0x2c90200400cfc) 016 : 0 017 : 0 018 : 0 019 : 0 020 : 0 021 : 0 022 : 0 023 : 0 024 : 0 ======================================================================================================= Vendor : Ty : # : Sta : LID : LMC : MTU : LWA : LSA : Port GUID : Neighbor Port (Port #) Mellanox : SW : 00 : : 0003 : 0 : : : : 0002c9010d26e780 : Mellanox : SW : 01 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 02 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 03 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 04 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 05 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 06 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 07 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 08 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 
09 : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0A : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : Mellanox : SW : 0B : DWN : : : 256 : 4x : 2.5 : 0002c9010d26e780 : From viswa.krish at gmail.com Fri Sep 23 10:55:11 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Fri, 23 Sep 2005 10:55:11 -0700 Subject: [openib-general] Forcing IB link state down Message-ID: <4df28be405092310552f0d998f@mail.gmail.com> Is there an API or command to force an IB link to go down? This will be helpful in running tests on opensm. -Viswa From halr at voltaire.com Fri Sep 23 10:49:31 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Sep 2005 13:49:31 -0400 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <4df28be4050923104346677b04@mail.gmail.com> References: <1127402265.15613.3424.camel@hal.voltaire.com> <43342B11.2010504@mellanox.co.il> <1127494494.15613.14406.camel@hal.voltaire.com> <4df28be4050923104346677b04@mail.gmail.com> Message-ID: <1127497769.15613.14978.camel@hal.voltaire.com> Hi Viswa, On Fri, 2005-09-23 at 13:43, Viswanath Krishnamurthy wrote: > More information, > > The test case is as follows > > 1. Start opensm in verbose mode (-V) > 2. Ping remote node > 3. osmtest -f c > 4. osmtest -f a > 5. pkill -9 opensm > 6. Repeat over > > Out of about 2500 iterations, 143 osmtest runs failed. Keep in mind, > only Step 4 failed. Yes. Do you see any port LEDs on the switch blink, indicating the port went down from active and back up while running this? > Step 3, which is inventory file creation, *never* failed. (I think > inventory file creation also talks to SA, right?) Right.
-- Hal From halr at voltaire.com Fri Sep 23 10:59:28 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Sep 2005 13:59:28 -0400 Subject: [openib-general] Forcing IB link state down In-Reply-To: <4df28be405092310552f0d998f@mail.gmail.com> References: <4df28be405092310552f0d998f@mail.gmail.com> Message-ID: <1127498367.15613.15097.camel@hal.voltaire.com> Hi Viswa, On Fri, 2005-09-23 at 13:55, Viswanath Krishnamurthy wrote: > Is there an API or command to force an IB link to go down? Not currently. > This will be helpful in running tests on opensm. Yes, I can understand that. Technically (per the IBA spec), the SM is the only one allowed to do Sets. I think it would be possible to have a diag command do this as long as the MKey protection is weak (which it is now). A better way might be to have a CLI on the OpenSM and be able to issue a down command to a port. -- Hal From halr at voltaire.com Fri Sep 23 11:04:00 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Sep 2005 14:04:00 -0400 Subject: [openib-general] Re: Another opensm problem ? In-Reply-To: <4df28be40509231050351e9ba2@mail.gmail.com> References: <4df28be40509231050351e9ba2@mail.gmail.com> Message-ID: <1127498640.15613.15147.camel@hal.voltaire.com> Hi again Viswa, On Fri, 2005-09-23 at 13:50, Viswanath Krishnamurthy wrote: Good test. Hadn't tried this. I will try to recreate this. > - 2 machines with a switch in between. One machine running opensm. How was opensm started? > Attached is the log The default log is in /var/log/osm.log -- Hal From service at openib.org Fri Sep 23 11:12:48 2005 From: service at openib.org (service at openib.org) Date: Sat, 24 Sep 2005 00:12:48 +0600 Subject: [openib-general] *DETECTED* ONLINE USER VIOLATION Message-ID: <0INB0034R61HI3@mail.interblocks.com> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: document.zip Type: application/octet-stream Size: 53518 bytes Desc: not available URL: From Administrator at openib.org Fri Sep 23 11:12:12 2005 From: Administrator at openib.org (Administrator at openib.org) Date: Fri, 23 Sep 2005 13:12:12 -0500 Subject: [openib-general] [MailServer Notification]To Recipient virus found and action taken. Message-ID: <000501c5c06a$51f9d100$020ca8c0@banderacom.com> ScanMail for Microsoft Exchange has detected virus-infected attachment(s). Sender = openib-general-bounces at openib.org Recipient(s) = openib-general at openib.org Subject = [openib-general] *DETECTED* ONLINE USER VIOLATION Scanning time = 9/23/2005 1:12:12 PM Engine/Pattern = 7.510-1002/2.855.00 Action on virus found: The attachment document.zip contains WORM_MYTOB.EI virus. ScanMail has Deleted it. Warning to recipient. ScanMail has detected a virus. 9/23/2005 document.zip/Deleted openib-general at openib.org openib-general-bounces at openib.org [openib-general] *DETECTED* ONLINE USER VIOLATION From Administrator at openib.org Fri Sep 23 11:12:32 2005 From: Administrator at openib.org (Administrator at openib.org) Date: Fri, 23 Sep 2005 11:12:32 -0700 Subject: [openib-general] [MailServer Notification]To Recipient virus found and action taken. Message-ID: <017801c5c06a$5e202b00$faf9a8c0@qlogic.org> ScanMail for Microsoft Exchange has detected virus-infected attachment(s). Sender = openib-general-bounces at openib.org Recipient(s) = openib-general at openib.org Subject = [openib-general] *DETECTED* ONLINE USER VIOLATION Scanning time = 9/23/2005 11:12:32 AM Engine/Pattern = 7.510-1002/2.855.00 Action on virus found: The attachment document.zip contains WORM_MYTOB.EI virus. ScanMail has Deleted it. Warning to recipient. ScanMail has detected a virus. From viswa.krish at gmail.com Fri Sep 23 11:20:12 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Fri, 23 Sep 2005 11:20:12 -0700 Subject: [openib-general] Re: Another opensm problem ? 
In-Reply-To: <1127498640.15613.15147.camel@hal.voltaire.com> References: <4df28be40509231050351e9ba2@mail.gmail.com> <1127498640.15613.15147.camel@hal.voltaire.com> Message-ID: <4df28be405092311202cfd72ee@mail.gmail.com> Hal, On 23 Sep 2005 14:04:00 -0400, Hal Rosenstock wrote: > > Hi again Viswa, > > On Fri, 2005-09-23 at 13:50, Viswanath Krishnamurthy wrote: > > Good test. Hadn't tried this. I will try to recreate this. > > > - 2 machines with a switch in between. One machine running opensm. > > How was opensm started? Manually: # opensm -V > Attached is the log > > The default log is in /var/log/osm.log I captured what appeared on the screen. I will send the osm.log file too. It is a big one and had accumulated over a period of time. -- Hal From viswa.krish at gmail.com Fri Sep 23 11:23:16 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Fri, 23 Sep 2005 11:23:16 -0700 Subject: [openib-general] Forcing IB link state down In-Reply-To: <1127498367.15613.15097.camel@hal.voltaire.com> References: <4df28be405092310552f0d998f@mail.gmail.com> <1127498367.15613.15097.camel@hal.voltaire.com> Message-ID: <4df28be405092311235f051594@mail.gmail.com> On 23 Sep 2005 13:59:28 -0400, Hal Rosenstock wrote: > > Hi Viswa, > > On Fri, 2005-09-23 at 13:55, Viswanath Krishnamurthy wrote: > > Is there an API or command to force an IB link to go down? > > Not currently. > > > This will be helpful in running tests on opensm. > > Yes, I can understand that. Technically (per the IBA spec), the SM is > the only one allowed to do Sets. I think it would be possible to have a > diag command do this as long as the MKey protection is weak (which it is > now). A better way might be to have a CLI on the OpenSM and be able to > issue a down command to a port. I was looking to see whether the mthca driver has any API/ioctl to disable/enable the link.
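Hal's suggestion that a diag command (or an OpenSM CLI) could take a port down amounts to sending a SubnSet(PortInfo) to that port. Below is a minimal sketch of preparing the 64-byte PortInfo payload for such a Set. It assumes the field offsets from the IBA PortInfo attribute table (PortState in the low nibble of byte 32, PortPhysicalState in the high nibble of byte 33); check them against the spec before relying on them. The helper names are hypothetical and are not part of OpenSM or any OpenIB library. Building and sending the SMP itself (LID-routed or directed-route, carrying an acceptable M_Key) is left out. As we understand it, PortState = Down on its own lets the link retrain and come back to Init, whereas PortPhysicalState = Disabled keeps it down until it is re-enabled.

```c
#include <stdint.h>
#include <string.h>

/* Assumed PortInfo layout (IBA PortInfo attribute, 64 bytes). */
#define PI_LEN              64
#define PI_OFF_SPEED_STATE  32  /* LinkSpeedSupported:4 | PortState:4 */
#define PI_OFF_PHYS_DOWNDEF 33  /* PortPhysicalState:4 | LinkDownDefaultState:4 */

enum port_state { PS_NOP = 0, PS_DOWN = 1, PS_INIT = 2, PS_ARMED = 3, PS_ACTIVE = 4 };
enum phys_state { PPS_NOP = 0, PPS_SLEEP = 1, PPS_POLLING = 2, PPS_DISABLED = 3 };

/* Hypothetical helper: build the payload for a SubnSet(PortInfo) that takes
 * the port down. 'current' is the PortInfo previously read with SubnGet, so
 * every other field is written back unchanged. */
void portinfo_request_down(uint8_t out[PI_LEN], const uint8_t current[PI_LEN],
                           int disable_phys)
{
    memcpy(out, current, PI_LEN);

    /* Keep LinkSpeedSupported (high nibble), set PortState = Down. */
    out[PI_OFF_SPEED_STATE] =
        (uint8_t)((out[PI_OFF_SPEED_STATE] & 0xF0) | PS_DOWN);

    /* Keep LinkDownDefaultState (low nibble); either request Disabled or
     * leave the physical state alone (0 = no state change). */
    out[PI_OFF_PHYS_DOWNDEF] =
        (uint8_t)((out[PI_OFF_PHYS_DOWNDEF] & 0x0F) |
                  ((disable_phys ? PPS_DISABLED : PPS_NOP) << 4));
}

/* Extract PortState from a PortInfo payload. */
int portinfo_port_state(const uint8_t pi[PI_LEN])
{
    return pi[PI_OFF_SPEED_STATE] & 0x0F;
}
```

Starting from a copy of the current PortInfo keeps this sketch from accidentally rewriting LID, M_Key or MTU fields with zeros when only the state nibbles are meant to change.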
-Viswa -- Hal From tternes at gmail.com Fri Sep 23 11:23:42 2005 From: tternes at gmail.com (Thaddeus Ternes) Date: Fri, 23 Sep 2005 13:23:42 -0500 Subject: [openib-general] EEH: MMIO Failure on Power5 In-Reply-To: References: Message-ID: I've tried a few things, but still seem to get the same error. My testing has been on 2.6.13.1, with SVN IB code (as of Monday). The ib_mthca module reports my HCA FW version to be 3.2.0 (which is admittedly old). Updating this old firmware will likely be my next step. Originally, I had installed the card in slot 1. I've since poked around in a PDF file I found on IBM's site and concluded that I should have installed the card in slot 3, though I'm still not overly confident about that. I/O Adapter Large Capacity is also now enabled (it wasn't previously, and changing it while the card was in slot 1 didn't seem to affect anything). Is anybody aware of a clear way to identify which of the slots in the 720 are "superslots"? I've had no luck so far in my hunt through the documentation. Most likely, I've mistakenly skipped over it. Thanks. Thaddeus On 9/22/05, Pradeep Satyanarayana wrote: > > > I have filed a bug against the kernel (for p-series) as a starting point. > Could you please fill me in on some of the other specifics: a) which kernel were > you using, b) firmware level (presumably it is up to date). > > One other issue that I failed to mention previously: is the HCA in one of > the superslots (I know on my p570 slots 2 and 6 are superslots by default), > and is this superslot enabled? > > Here is a quote of how to enable superslots: > > One issue with the Mellanox cards in pSeries systems is to ensure that the > card is installed in a superslot, and that the "I/O Adapter Enlarged > Capacity" setting has been enabled for the system. For a p570, slots C6 and > C2 are the available super slots.
> To enable the "Enlarged Capacity" feature, > go to ASM and select the following screens: > > System Configuration->I/O Adapter Enlarged Capacity > Set the setting to Enabled and save it. > > If this does not help, I have already filed the bug. Please let me know > either way. > > Pradeep > pradeep at us.ibm.com From halr at voltaire.com Fri Sep 23 11:24:21 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Sep 2005 14:24:21 -0400 Subject: [openib-general] Forcing IB link state down In-Reply-To: <4df28be405092311235f051594@mail.gmail.com> References: <4df28be405092310552f0d998f@mail.gmail.com> <1127498367.15613.15097.camel@hal.voltaire.com> <4df28be405092311235f051594@mail.gmail.com> Message-ID: <1127499860.15613.15368.camel@hal.voltaire.com> On Fri, 2005-09-23 at 14:23, Viswanath Krishnamurthy wrote: > I was looking to see whether the mthca driver has any API/ioctl to disable/enable the > link. I don't know, Roland? I think this would be done via SM MADs. -- Hal From peter at pantasys.com Fri Sep 23 11:38:00 2005 From: peter at pantasys.com (Peter Buckingham) Date: Fri, 23 Sep 2005 11:38:00 -0700 Subject: [openib-general] DHCP over infiniband In-Reply-To: <1127497081.15613.14859.camel@hal.voltaire.com> References: <43343A6E.3040706@hpti.com> <1127497081.15613.14859.camel@hal.voltaire.com> Message-ID: <43344B88.7090100@pantasys.com> Hal Rosenstock wrote: > The DHCP client in the kernel does not support IPoIB. You need to > ifconfig the interfaces as you are doing. There were modifications to > the ISC DHCP client and server for IPoIB posted a long time ago on the > list, but they would need dusting off to make them work again. I'm > not sure if that is of interest. We tested these patches recently and they still seem to work fine (we didn't port them to the latest DHCP though). It should be possible to use this client in an initrd or initramfs.
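Some context on what those ISC DHCP patches have to deal with: an IPoIB link-layer address is 20 bytes, which does not fit in the 16-byte chaddr field of the BOOTP/DHCP header. The IPoIB DHCP specification (later published as RFC 4390) therefore has the client send htype 32 (InfiniBand), hlen 0 and an all-zero chaddr, and identify itself with the client-identifier option instead. The sketch below fills in only those fixed header fields. The struct and function names are our own, and setting the BROADCAST flag is an assumption based on the server having no usable hardware address to unicast a reply to. The client-identifier option (61) and the rest of the options area are not built here.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl, htons */

#define BOOTP_OP_REQUEST     1
#define HTYPE_INFINIBAND     32      /* IANA hardware type for InfiniBand */
#define BOOTP_FLAG_BROADCAST 0x8000

/* Fixed part of a BOOTP/DHCP message (RFC 2131); options follow 'file'. */
struct bootp_hdr {
    uint8_t  op;
    uint8_t  htype;
    uint8_t  hlen;
    uint8_t  hops;
    uint32_t xid;
    uint16_t secs;
    uint16_t flags;
    uint32_t ciaddr, yiaddr, siaddr, giaddr;
    uint8_t  chaddr[16];
    char     sname[64];
    char     file[128];
};

/* Hypothetical helper: initialise a request header for an IPoIB interface. */
void ipoib_bootp_init(struct bootp_hdr *h, uint32_t xid)
{
    memset(h, 0, sizeof(*h));   /* also leaves chaddr all zero */
    h->op    = BOOTP_OP_REQUEST;
    h->htype = HTYPE_INFINIBAND;
    h->hlen  = 0;               /* the 20-byte IPoIB address is not carried in chaddr */
    h->xid   = htonl(xid);
    h->flags = htons(BOOTP_FLAG_BROADCAST);  /* assumption: ask for broadcast replies */
}
```

Every field falls on its natural alignment, so the struct occupies exactly the 236 bytes of the fixed BOOTP header on common ABIs.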
peter From halr at voltaire.com Fri Sep 23 11:41:11 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Sep 2005 14:41:11 -0400 Subject: [openib-general] DHCP over infiniband In-Reply-To: <43344B88.7090100@pantasys.com> References: <43343A6E.3040706@hpti.com> <1127497081.15613.14859.camel@hal.voltaire.com> <43344B88.7090100@pantasys.com> Message-ID: <1127500871.15613.15583.camel@hal.voltaire.com> Hi Peter, On Fri, 2005-09-23 at 14:38, Peter Buckingham wrote: > Hal Rosenstock wrote: > > The DHCP client in the kernel does not support IPoIB. You need to > > ifconfig the interfaces as you are doing. There were modifications to > > the ISC DHCP client and server for IPoIB posted a long time ago on the > > list, but they would need dusting off to make them work again. I'm > > not sure if that is of interest. > > We tested these patches recently and they still seem to work fine (we > didn't port them to the latest DHCP though). It should be possible to > use this client in an initrd or initramfs. Thanks. Is it worth dusting them off, updating to the latest ISC version, and getting them into the ISC release? -- Hal From halr at voltaire.com Fri Sep 23 11:57:39 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Sep 2005 14:57:39 -0400 Subject: [openib-general] Re: Another opensm problem ? In-Reply-To: <4df28be40509231050351e9ba2@mail.gmail.com> References: <4df28be40509231050351e9ba2@mail.gmail.com> Message-ID: <1127501858.15613.15781.camel@hal.voltaire.com> On Fri, 2005-09-23 at 13:50, Viswanath Krishnamurthy wrote: > - After 7-8 iterations, I ran into a weird problem, where opensm was > showing the HCA as UNKNOWN. The port > never came up to ACTIVE state. I unplugged and replugged it into > different slots, but the port remained in INIT > state. Mellanox : SW : 12 : INI : : : 2048 : 1x : 2.5 : 0002c9010d26e780 : UNKNOWN OpenSM thinks that either there is no physical port on the other end of the link or that it is not "valid" (valid meaning a non-zero GUID).
Obviously it is there: the port state is INIT, so the physical link came up, which requires the remote end to be present. One other note is that it appears to have come up as 1x. Is that what should happen? -- Hal From roel at yottayotta.com Fri Sep 23 12:26:24 2005 From: roel at yottayotta.com (Roel van der Goot) Date: Fri, 23 Sep 2005 13:26:24 -0600 (MDT) Subject: [openib-general] Another opensm problem ? In-Reply-To: <4df28be40509231050351e9ba2@mail.gmail.com> References: <4df28be40509231050351e9ba2@mail.gmail.com> Message-ID: > Hal, Hi Viswanath, > - After 7-8 iterations, I ran into a weird problem, where opensm was showing > the HCA as UNKNOWN. The port > never came up to ACTIVE state. I unplugged and replugged it into different > slots, but the port remained in INIT > state. Are you by any chance running with an Anafa-based switch? You will have to upgrade its firmware to 5.3.0. That made this problem go away for me. Cheers :-), Roel. From jlentini at netapp.com Fri Sep 23 12:33:18 2005 From: jlentini at netapp.com (James Lentini) Date: Fri, 23 Sep 2005 15:33:18 -0400 (EDT) Subject: [openib-general] Re: [IBAT] interface numbering assumption In-Reply-To: <1127484628.15613.12768.camel@hal.voltaire.com> References: <1127484628.15613.12768.camel@hal.voltaire.com> Message-ID: On Fri, 23 Sep 2005, Hal Rosenstock wrote: > > Is there a better way to enumerate all of the network interfaces? I > > believe that is what this for loop is attempting to accomplish. > > Yes. I think that the net_device list from dev_base could be walked > instead and that would resolve this issue. Can you help me understand the logic in at.c:resolve_ip()? Here is my assumption of what this function does: 1) consults the IP routing table for an interface device using ip_route_output_key 2) if the device does not meet certain criteria, returns an error 3) if the device is a loopback device, searches for another device that is an INFINIBAND device and is UP. 4) ...
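Step 3 is the part that needs a complete list of interfaces: the old loop probed ifindex 1, 2, 3, ... and stopped at the first index for which dev_get_by_index() returned nothing, so a gap in the numbering hid every interface after it. Inside the kernel the fix is to walk dev_base under dev_base_lock, as Hal suggests. As a userspace illustration of the same selection rule only (this is not IBAT code), the sketch below uses getifaddrs() to find the first interface that is UP and whose link-layer type is ARPHRD_INFINIBAND. It assumes Linux, where getifaddrs() reports one AF_PACKET entry per interface carrying a struct sockaddr_ll; the function name is hypothetical.

```c
#define _GNU_SOURCE
#include <ifaddrs.h>
#include <net/if.h>            /* IFF_UP */
#include <net/if_arp.h>        /* ARPHRD_INFINIBAND */
#include <netpacket/packet.h>  /* struct sockaddr_ll */
#include <sys/socket.h>        /* AF_PACKET */
#include <stdio.h>

#ifndef ARPHRD_INFINIBAND
#define ARPHRD_INFINIBAND 32
#endif

/* Copy the name of the first interface that is UP and of InfiniBand
 * link-layer type into buf. Returns 0 if one was found, -1 otherwise. */
int find_up_ib_interface(char *buf, size_t len)
{
    struct ifaddrs *ifas, *ifa;
    int rc = -1;

    if (getifaddrs(&ifas) != 0)
        return -1;

    /* Walk the whole list; no assumption that indices are contiguous. */
    for (ifa = ifas; ifa != NULL; ifa = ifa->ifa_next) {
        const struct sockaddr_ll *sll;

        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_PACKET)
            continue;
        sll = (const struct sockaddr_ll *)ifa->ifa_addr;
        if (sll->sll_hatype == ARPHRD_INFINIBAND && (ifa->ifa_flags & IFF_UP)) {
            snprintf(buf, len, "%s", ifa->ifa_name);
            rc = 0;
            break;
        }
    }
    freeifaddrs(ifas);
    return rc;
}
```

On a machine without IPoIB interfaces the function simply returns -1.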
I've included a small patch below to fix the problem I observed in #3. It walks the dev_base list as you described. However, I don't understand why the device returned in step #1 isn't always used, since I assume this is the interface the routing table says to use. That makes me think I've misinterpreted the purpose of ip_route_output_key. What am I missing? -- Signed-off-by: James Lentini Index: core/at.c =================================================================== --- core/at.c (revision 3528) +++ core/at.c (working copy) @@ -430,7 +430,7 @@ } }, }; - int i, r; + int r; DEBUG("dst ip %08x src ip %08x tos %d", dst_ip, src_ip, tos); @@ -490,13 +490,15 @@ src->gw = rt->rt_gateway; src->ip = rt->rt_src; /* true source IP address */ - if (ipoib_dev->flags & IFF_LOOPBACK) - for (i = 1; (ipoib_dev = dev_get_by_index(i)); i++) { - dev_put(ipoib_dev); + if (ipoib_dev->flags & IFF_LOOPBACK) { + read_lock(&dev_base_lock); + for(ipoib_dev = dev_base; ipoib_dev; + ipoib_dev = ipoib_dev->next) if (ARPHRD_INFINIBAND == ipoib_dev->type && (ipoib_dev->flags & IFF_UP)) break; - } + read_unlock(&dev_base_lock); + } if (!ipoib_dev) { WARN("No device for IB comm <%s:%08x:%08x>", From viswa.krish at gmail.com Fri Sep 23 13:22:34 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Fri, 23 Sep 2005 13:22:34 -0700 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <1127497769.15613.14978.camel@hal.voltaire.com> References: <1127402265.15613.3424.camel@hal.voltaire.com> <43342B11.2010504@mellanox.co.il> <1127494494.15613.14406.camel@hal.voltaire.com> <4df28be4050923104346677b04@mail.gmail.com> <1127497769.15613.14978.camel@hal.voltaire.com> Message-ID: <4df28be405092313225529cfe4@mail.gmail.com> On 23 Sep 2005 13:49:31 -0400, Hal Rosenstock wrote: > > Hi Viswa, > > On Fri, 2005-09-23 at 13:43, Viswanath Krishnamurthy wrote: > > More information, > > > > The test case is as follows > > > > 1. Start opensm in verbose mode (-V) > > 2. Ping remote node > > 3.
osmtest -f c > > 4. osmtest -f a > > 5. pkill -9 opensm > > 6. Repeat over > > > > Out of about 2500 iterations, 143 osmtest runs failed. Keep in mind, > > only Step 4 failed. > > Yes. > > Do you see any port LEDs on the switch blink, indicating the port went > down from active and back up while running this? No, I ran this test overnight and logged the results. I will try it next week and let you know. > Step 3, which is inventory file creation, *never* failed. (I think > > inventory file creation also talks to SA, right?) > > Right. > > -- Hal From peter at pantasys.com Fri Sep 23 13:29:27 2005 From: peter at pantasys.com (Peter Buckingham) Date: Fri, 23 Sep 2005 13:29:27 -0700 Subject: [openib-general] DHCP over infiniband In-Reply-To: <1127500871.15613.15583.camel@hal.voltaire.com> References: <43343A6E.3040706@hpti.com> <1127497081.15613.14859.camel@hal.voltaire.com> <43344B88.7090100@pantasys.com> <1127500871.15613.15583.camel@hal.voltaire.com> Message-ID: <433465A7.9070006@pantasys.com> Hal Rosenstock wrote: > Hi Peter, > > On Fri, 2005-09-23 at 14:38, Peter Buckingham wrote: > >>Hal Rosenstock wrote: >> >>>The DHCP client in the kernel does not support IPoIB. You need to >>>ifconfig the interfaces as you are doing. There were modifications to >>>the ISC DHCP client and server for IPoIB posted a long time ago on the >>>list, but they would need dusting off to make them work again. I'm >>>not sure if that is of interest. >> >>We tested these patches recently and they still seem to work fine (we >>didn't port them to the latest DHCP though). It should be possible to >>use this client in an initrd or initramfs. > > > Thanks. > > Is it worth dusting them off, updating to the latest ISC version, and > getting them into the ISC release? It'd be good to get it into an ISC release. At the moment we're doing the configuration in a different way (passing parameters on the command line).
For booting I'd eventually like to get support into the klibc tools, but maybe we'll tackle that at some later point in time. peter From halr at voltaire.com Fri Sep 23 15:54:25 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 23 Sep 2005 18:54:25 -0400 Subject: [openib-general] Re: Another opensm problem ? In-Reply-To: <1127501858.15613.15781.camel@hal.voltaire.com> References: <4df28be40509231050351e9ba2@mail.gmail.com> <1127501858.15613.15781.camel@hal.voltaire.com> Message-ID: <1127516065.4398.1937.camel@hal.voltaire.com> On Fri, 2005-09-23 at 14:57, Hal Rosenstock wrote: > On Fri, 2005-09-23 at 13:50, Viswanath Krishnamurthy wrote: > > - After 7-8 iterations, I ran into a weird problem, where opensm was > > showing the HCA as UNKNOWN. The port > > never came up to ACTIVE state. The unplugged and replugged into > > different slots, the port remained in INIT > > state. > > Mellanox : SW : 12 : INI : : : 2048 : 1x : 2.5 : 0002c9010d26e780 : UNKNOWN > > OpenSM thinks that either there is no physical port on the other end of > the link or it is not "valid" (GUID non 0). Obviously it is there as the > port state is INIT so the physical link came up which requires the > remote end to be there. From the log you sent, this is exactly what is happening. Sep 23 10:07:23 451191 [B7751BB0] -> osm_drop_mgr_process: Checking port 0x0002c9010d26e780. Sep 23 10:07:23 451209 [B7751BB0] -> osm_drop_mgr_process: Checking port 0x0002c90200400cfd. Sep 23 10:07:23 451226 [B7751BB0] -> osm_drop_mgr_process: ERR 0108: Unknown remote side for node 0x0002c9010d26e780 port 20. Adding to light sweep sampling list. Sep 23 10:07:23 451251 [B7751BB0] -> Directed Path Dump of 1 hop path: Path = [0][1] Sep 23 10:07:23 451267 [B7751BB0] -> osm_drop_mgr_process: ] So look in osm_drop_mgr.c line 707: Can you enhance the log display to see which is failing: osm_physp_is_valid(p_physp) or osm_physp_get_remote(p_physp) ?
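A minimal sketch of the logging change Hal suggests, splitting the combined test so that the message names which check failed. It is written under stated assumptions: struct physp, physp_is_valid(), physp_get_remote() and check_port() are hypothetical stand-ins invented for this illustration, not OpenSM's real osm_physp_t API or the code at osm_drop_mgr.c line 707, and a zero GUID is treated as "not valid" following the description earlier in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for an OpenSM physical port; NOT the real osm_physp_t. */
struct physp {
	uint64_t guid;		/* port GUID; 0 is treated as "not valid" in this sketch */
	struct physp *remote;	/* port at the far end of the link, or NULL if unknown */
};

/* Hypothetical helper modelled on the first check named above. */
static int physp_is_valid(const struct physp *p)
{
	return p != NULL && p->guid != 0;
}

/* Hypothetical helper modelled on the second check; caller must pass a port. */
static const struct physp *physp_get_remote(const struct physp *p)
{
	assert(p != NULL);
	return p->remote;
}

/*
 * Test each condition separately so the log says which one failed.
 * Returns 0 if both checks pass, 1 if the port itself is not valid,
 * and 2 if the port is valid but has no known remote side.
 */
static int check_port(const struct physp *p, unsigned int port_num)
{
	if (!physp_is_valid(p)) {
		fprintf(stderr, "port %u: physp not valid\n", port_num);
		return 1;
	}
	if (physp_get_remote(p) == NULL) {
		fprintf(stderr, "port %u: GUID 0x%016llx has no remote port\n",
			port_num, (unsigned long long)p->guid);
		return 2;
	}
	return 0;
}
```

For the ERR 0108 case in the log above, a port with a nonzero GUID but no remote pointer would take the second branch and return 2, which separates the two possibilities Hal lists.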
Also, it appears to keep light sweeping this port but whichever switch port it is on, it does not respond. Not sure where the problem is. It could be on the outgoing side of the switch (we could run diags against the switch and various ports; I would be curious what they return when the subnet is in this broken state) or on the HCA. However, the fact that restarting opensm made it go away without touching anything else makes this appear otherwise. > One other note is that it appears to have come up as 1x. Is that what > should happen ? -- Hal From QiWang.Chen at Clustars.CN Fri Sep 23 23:58:10 2005 From: QiWang.Chen at Clustars.CN (QiWang, Chen) Date: Sat, 24 Sep 2005 14:58:10 +0800 Subject: [openib-general] NOP command failed to generate interrupt (IRQ 201) In-Reply-To: <20050922182908.GB1235@esmail.cup.hp.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEDDD@mtlexch01.mtl.com> <1127058546.7204.2.camel@QiWang> <20050921054700.GI24837@esmail.cup.hp.com> <1127291718.11849.7.camel@QiWang> <20050921161043.GB28198@esmail.cup.hp.com> <1127380978.31097.1.camel@QiWang> <20050922182908.GB1235@esmail.cup.hp.com> Message-ID: <1127545090.31332.2.camel@QiWang> Hi, grant On node c01-14, I installed openib-gen2, kernel 2.6.13.2 but I have IRQ problem. ----------------------------------- ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005) ib_mthca: Initializing (0000:03:00.0) ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 51 (level, low) -> IRQ 201 ib_mthca 0000:03:00.0: Found bridge: (0000:02:01.0) ib_mthca 0000:03:00.0: FW version 000300030003, max commands 64 ib_mthca 0000:03:00.0: FW size 6143 KB (start e7a00000, end e7ffffff) ib_mthca 0000:03:00.0: HCA memory size 131071 KB (start e0000000, end e7ffffff) ib_mthca 0000:03:00.0: Max QPs: 16777216, reserved QPs: 1024, entry size: 256 ib_mthca 0000:03:00.0: Max SRQs: 1024, reserved SRQs: 16, entry size: 32 ib_mthca 0000:03:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 64 ib_mthca 0000:03:00.0: Max EQs: 64, reserved EQs: 1, entry size: 64 ib_mthca 0000:03:00.0: reserved MPTs: 16, reserved MTTs: 16 ib_mthca 0000:03:00.0: Max PDs: 16777216, reserved PDs: 0, reserved UARs: 1 ib_mthca 0000:03:00.0: Max QP/MCG: 16777216, reserved MGMs: 0 ib_mthca 0000:03:00.0: Flags: 00370347 ib_mthca 0000:03:00.0: profile[ 0]--10/20 @ 0x e0000000 (size 0x 4000000) ib_mthca 0000:03:00.0: profile[ 1]-- 0/16 @
0x e4000000 (size 0x 1000000) ib_mthca 0000:03:00.0: profile[ 2]-- 7/18 @ 0x e5000000 (size 0x 800000) ib_mthca 0000:03:00.0: profile[ 3]-- 9/17 @ 0x e5800000 (size 0x 800000) ib_mthca 0000:03:00.0: profile[ 4]-- 3/16 @ 0x e6000000 (size 0x 400000) ib_mthca 0000:03:00.0: profile[ 5]-- 4/16 @ 0x e6400000 (size 0x 200000) ib_mthca 0000:03:00.0: profile[ 6]--12/15 @ 0x e6600000 (size 0x 100000) ib_mthca 0000:03:00.0: profile[ 7]-- 8/13 @ 0x e6700000 (size 0x 80000) ib_mthca 0000:03:00.0: profile[ 8]--11/11 @ 0x e6780000 (size 0x 10000) ib_mthca 0000:03:00.0: profile[ 9]-- 2/10 @ 0x e6790000 (size 0x 8000) ib_mthca 0000:03:00.0: profile[10]-- 6/ 5 @ 0x e6798000 (size 0x 800) ib_mthca 0000:03:00.0: HCA memory: allocated 106082 KB/124928 KB (18846 KB free) ib_mthca 0000:03:00.0: Allocated EQ 1 with 65536 entries ib_mthca 0000:03:00.0: Allocated EQ 2 with 128 entries ib_mthca 0000:03:00.0: Allocated EQ 3 with 128 entries ib_mthca 0000:03:00.0: Setting mask 00000000000f43fe for eqn 2 ib_mthca 0000:03:00.0: Setting mask 0000000000000400 for eqn 3 ib_mthca 0000:03:00.0: NOP command failed to generate interrupt (IRQ 201), aborting. ib_mthca 0000:03:00.0: BIOS or ACPI interrupt routing problem? ib_mthca 0000:03:00.0: Clearing mask 00000000000f43fe for eqn 2 ib_mthca 0000:03:00.0: Clearing mask 0000000000000400 for eqn 3 ACPI: PCI interrupt for device 0000:03:00.0 disabled ib_mthca: probe of 0000:03:00.0 failed with error -16 ----------------------------------- On Thu, 2005-09-22 at 11:29 -0700, Grant Grundler wrote: > On Thu, Sep 22, 2005 at 05:22:58PM +0800, QiWang, Chen wrote: > > Hi , > > I install openib gen2 on c01-14, kernel=2.6.13 > > But I do not know how it works. > > See the openib-gen2/README.kernel-build for directions on > how to build+install the kernel drivers. > > See the openib.org wiki for userspace instructions: > https://openib.org/tiki/tiki-index.php > > If something is wrong or not clear, please ask on the openib-general > mailing list. 
> > > grant -- QiWang, Chen Clustars Supercomputing Technology corp. http://www.Clustars.CN TEL:+86-0816-2546345-815 FAX:+86-0816-2546370 Mobile:+86-13096497499 -------------- next part -------------- An HTML attachment was scrubbed... URL: From nacc at us.ibm.com Sat Sep 24 00:46:11 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Sat, 24 Sep 2005 00:46:11 -0700 Subject: [openib-general] InfiniBand compilation testing Message-ID: <20050924074611.GD3950@us.ibm.com> Roland, OpenIB developers, As many people who follow LKML may have seen, Martin Bligh recently set up test.kernel.org to host automated machine testing results. I have a prototype of something similar running right now to help test InfiniBand, both in mainline and in the svn repo. Basically, every night (this part hasn't been set up yet, but should be nothing more than a crontab entry), I can spawn a build job for InfiniBand. Currently, it only covers compile-testing in the following sense: build current -git with the IB options set to =y and to =m on x86 and ppc64; and build current -git with the current svn code linked in and the IB options set to =y and to =m on x86 and ppc64. This last part required a bit of trickery, but I think I've got it automated. I have attached below my results from 2.6.14-rc2-git3. The only build failure was the gen2 kernel code under ppc64 with everything set to y. I haven't yet integrated these results with test.kernel.org (I will talk to Martin about it on Monday), but hopefully that will be the end result. So, if you would like to see other build logs (for now), please just e-mail me. I will continue to post any failures nightly. Hopefully these results will prove useful. Thanks, Nish
x86      gen2       mainline
----------------------------------------
ib=y     [1]        OK
ib=m     [1]        OK
++++++++++++++++++++++++++++++++++++++++
ppc64    gen2       mainline
----------------------------------------
ib=y     FAIL [2]   OK
ib=m     OK         OK
[1] Something went nuts with gcc on these builds.
I will try to get updated results by Monday. [2] Failure build log +CONFIG_INFINIBAND=y +CONFIG_INFINIBAND_USER_MAD=y +CONFIG_INFINIBAND_USER_ACCESS=y +CONFIG_INFINIBAND_MTHCA=y +CONFIG_INFINIBAND_IPOIB=y +CONFIG_INFINIBAND_SDP=y +CONFIG_INFINIBAND_SRP=y +CONFIG_KDAPL=y +CONFIG_KDAPL_INFINIBAND=y +CONFIG_INFINIBAND_ISER=y 09/24/05-00:02:41 building kernel - make -j zImage CHK include/linux/version.h SYMLINK include/asm -> include/asm-ppc64 SPLIT include/linux/autoconf.h -> include/config/* UPD include/linux/version.h HOSTCC scripts/kallsyms HOSTCC scripts/pnmtologo CC arch/ppc64/kernel/asm-offsets.s CC scripts/mod/empty.o HOSTCC scripts/conmakehash HOSTCC scripts/bin2c HOSTCC scripts/mod/mk_elfconfig MKELF scripts/mod/elfconfig.h HOSTCC scripts/mod/file2alias.o HOSTCC scripts/mod/modpost.o HOSTCC scripts/mod/sumversion.o GEN include/asm-ppc64/asm-offsets.h HOSTLD scripts/mod/modpost CC arch/ppc64/kernel/setup.o CC mm/bootmem.o CC arch/ppc64/mm/fault.o CC init/main.o CC crypto/api.o AS arch/ppc64/kernel/entry.o CHK include/linux/compile.h CC ipc/compat.o CC init/do_mounts.o CC fs/read_write.o CC crypto/scatterwalk.o CC init/calibrate.o CC arch/ppc64/oprofile/../../../drivers/oprofile/oprof.o CC fs/open.o CHK usr/initramfs_list CC mm/mempool.o HOSTCC usr/gen_init_cpio CC kernel/fork.o CC ipc/msgutil.o CC net/socket.o CC kernel/exec_domain.o CC ipc/util.o CC mm/filemap.o CC init/do_mounts_rd.o CC fs/file_table.o CC security/commoncap.o LD drivers/firmware/built-in.o AS arch/ppc64/mm/hash_low.o CC drivers/hwmon/hwmon.o LD arch/ppc64/lib/built-in.o CC arch/ppc64/mm/tlb.o AS arch/ppc64/mm/slb_low.o CC crypto/proc.o LD drivers/mfd/built-in.o LD drivers/misc/built-in.o CC ipc/msg.o CC net/802/p8023.o CC kernel/panic.o CC arch/ppc64/mm/hash_utils.o CC crypto/md5.o CC arch/ppc64/oprofile/../../../drivers/oprofile/cpu_buffer.o CC kernel/printk.o CC mm/pdflush.o LD drivers/media/common/built-in.o CC net/sched/sch_generic.o CC mm/readahead.o CC kernel/sched.o CC 
drivers/pci/access.o CC arch/ppc64/kernel/irq.o CC mm/page-writeback.o CC net/sunrpc/clnt.o CC drivers/scsi/scsi.o CC arch/ppc64/mm/init.o CC net/packet/af_packet.o CC init/initramfs.o CC mm/page_alloc.o LD sound/built-in.o CC mm/oom_kill.o CC drivers/block/elevator.o CC arch/ppc64/kernel/dma.o CC drivers/cdrom/cdrom.o CC lib/sort.o CC arch/ppc64/oprofile/../../../drivers/oprofile/buffer_sync.o CC drivers/net/e1000/e1000_main.o AS arch/ppc64/lib/checksum.o CC arch/ppc64/kernel/traps.o CC crypto/cipher.o CC crypto/des.o CC drivers/md/linear.o AS arch/ppc64/lib/copypage.o CC arch/ppc64/kernel/idle.o CC fs/block_dev.o CC crypto/digest.o CC drivers/net/3c59x.o CC net/netlink/af_netlink.o CC arch/ppc64/kernel/time.o CC drivers/net/pcnet32.o CC arch/ppc64/kernel/process.o CC drivers/pci/bus.o CC drivers/scsi/hosts.o CC drivers/md/raid0.o CC crypto/compress.o CC fs/char_dev.o CC arch/ppc64/mm/numa.o CC drivers/input/input.o CC drivers/pci/probe.o CC drivers/md/raid1.o CC lib/parser.o CC net/ethernet/eth.o CC fs/stat.o CC lib/halfmd4.o UPD usr/initramfs_list CC net/ipv4/route.o CC ipc/sem.o CC drivers/infiniband/core/at.o CC arch/ppc64/mm/slb.o CC net/core/sock.o CC drivers/block/ll_rw_blk.o CC arch/ppc64/mm/mmap.o CC drivers/net/e1000/e1000_hw.o CC drivers/infiniband/ulp/kdapl/api.o CC drivers/scsi/scsi_ioctl.o CC init/do_mounts_md.o CC drivers/block/ioctl.o
CC drivers/infiniband/ulp/sdp/sdp_advt.o CC drivers/base/core.o CC init/do_mounts_initrd.o CC drivers/char/mem.o CC arch/ppc64/mm/hash_native.o CC drivers/infiniband/hw/mthca/mthca_main.o CC drivers/net/acenic.o CC net/core/request_sock.o CC drivers/net/e100.o CC fs/buffer.o CC fs/bio.o CC drivers/infiniband/ulp/ipoib/ipoib_main.o CC net/core/skbuff.o CC arch/ppc64/mm/stab.o CC arch/ppc64/mm/imalloc.o LD drivers/crypto/built-in.o CC kernel/softirq.o CC drivers/macintosh/macio_asic.o CC kernel/resource.o CC mm/fadvise.o CC kernel/time.o LD drivers/video/backlight/built-in.o CC drivers/input/mousedev.o CC fs/super.o CC net/ethernet/sysctl_net_ether.o CC drivers/input/serio/serio.o CC net/ipv4/inetpeer.o CC drivers/base/bus.o CC drivers/serial/serial_core.o CC net/core/iovec.o CC drivers/block/as-iosched.o CC drivers/base/dd.o CC net/core/datagram.o CC ipc/shm.o CC drivers/block/deadline-iosched.o CC drivers/base/driver.o CC kernel/profile.o CC drivers/block/cfq-iosched.o CC kernel/exit.o CC mm/slab.o CC drivers/net/e1000/e1000_ethtool.o LD drivers/media/built-in.o CC kernel/itimer.o CC arch/ppc64/oprofile/../../../drivers/oprofile/oprofile_stats.o CC arch/ppc64/lib/sstep.o CC arch/ppc64/oprofile/../../../drivers/oprofile/timer_int.o CC drivers/md/bitmap.o CC fs/ioctl.o CC arch/ppc64/oprofile/common.o CC drivers/infiniband/core/cm.o CC fs/readdir.o CC drivers/char/random.o CC fs/fcntl.o CC drivers/char/tty_io.o CC drivers/macintosh/macio_sysfs.o CC net/802/sysctl_net_802.o CC drivers/input/serio/i8042.o CC drivers/serial/8250.o CC drivers/infiniband/hw/mthca/mthca_cmd.o CC drivers/input/serio/libps2.o CC drivers/infiniband/ulp/ipoib/ipoib_ib.o CC drivers/serial/8250_pci.o CC drivers/infiniband/core/packer.o CC drivers/infiniband/ulp/ipoib/ipoib_multicast.o CC arch/ppc64/kernel/ptrace.o CC arch/ppc64/kernel/align.o CC arch/ppc64/kernel/semaphore.o CC mm/swap.o AS arch/ppc64/lib/copyuser.o CC drivers/net/mii.o CC
arch/ppc64/lib/locks.o CC arch/ppc64/kernel/binfmt_elf32.o AS arch/ppc64/lib/memcpy.o CC arch/ppc64/oprofile/../../../drivers/oprofile/event_buffer.o CC arch/ppc64/oprofile/../../../drivers/oprofile/oprofile_files.o CC mm/vmscan.o CC mm/prio_tree.o CC mm/fremap.o CC net/sunrpc/xprt.o CC drivers/md/raid5.o CC net/compat.o CC fs/pipe.o CC drivers/net/Space.o CC net/sysctl_net.o CC fs/namei.o CC drivers/net/loopback.o CC net/unix/af_unix.o CC drivers/infiniband/ulp/sdp/sdp_queue.o CC drivers/infiniband/hw/mthca/mthca_profile.o CC net/sunrpc/auth_null.o CC drivers/infiniband/ulp/sdp/sdp_sent.o CC net/sunrpc/auth_unix.o CC drivers/infiniband/ulp/sdp/sdp_read.o CC net/sunrpc/svc.o CC drivers/infiniband/ulp/sdp/sdp_write.o CC drivers/base/firmware.o CC lib/bust_spinlocks.o UPD include/linux/compile.h CC net/unix/garbage.o CC init/version.o CC drivers/block/floppy.o CC drivers/pci/remove.o CC fs/exec.o CC arch/ppc64/kernel/signal.o CC arch/ppc64/kernel/bitops.o CC drivers/pci/pci.o CC arch/ppc64/kernel/syscalls.o CC drivers/infiniband/ulp/sdp/sdp_buff.o CC lib/kernel_lock.o CC drivers/md/xor.o HOSTCC lib/gen_crc32table CC drivers/base/class.o CC drivers/base/platform.o CC drivers/block/rd.o CC fs/select.o CC arch/ppc64/oprofile/op_model_rs64.o CC drivers/block/genhd.o CC mm/highmem.o CC arch/ppc64/oprofile/op_model_power4.o CC drivers/block/scsi_ioctl.o CC drivers/infiniband/ulp/srp/ib_srp.o CC fs/fifo.o CC kernel/sysctl.o CC arch/ppc64/kernel/pacaData.o CC drivers/serial/8250_early.o CC drivers/video/logo/logo.o CC drivers/infiniband/ulp/kdapl/ib/dapl_openib_qp.o CC drivers/video/fbmem.o CC net/sunrpc/sched.o CC kernel/capability.o CC mm/madvise.o CC net/core/stream.o CC drivers/scsi/constants.o CC drivers/infiniband/ulp/sdp/sdp_proc.o CC drivers/net/e1000/e1000_param.o CC net/core/scm.o CC drivers/video/console/dummycon.o CC drivers/char/n_tty.o CC drivers/input/keyboard/atkbd.o AS arch/ppc64/kernel/misc.o CC drivers/infiniband/ulp/ipoib/ipoib_verbs.o CC
drivers/scsi/scsicam.o CC net/sunrpc/svcsock.o CC net/sunrpc/svcauth.o CC drivers/infiniband/core/ud_header.o CC drivers/infiniband/hw/mthca/mthca_reset.o CC drivers/infiniband/ulp/ipoib/ipoib_vlan.o CC drivers/infiniband/hw/mthca/mthca_allocator.o CC lib/bitmap.o CC arch/ppc64/lib/strcase.o CC drivers/video/fbmon.o AS arch/ppc64/lib/string.o CC arch/ppc64/kernel/udbg.o CC drivers/block/nbd.o CC arch/ppc64/kernel/sys_ppc32.o CC lib/cmdline.o CC drivers/input/misc/pcspkr.o CC drivers/base/cpu.o CC mm/memory.o CC net/unix/sysctl_net_unix.o CC drivers/infiniband/ulp/sdp/sdp_inet.o CC drivers/block/noop-iosched.o CC kernel/ptrace.o CC drivers/block/loop.o LOGO drivers/video/logo/logo_linux_clut224.c LOGO drivers/video/logo/logo_superh_mono.c LOGO drivers/video/logo/clut_vga16.c LOGO drivers/video/logo/logo_superh_vga16.c CC arch/ppc64/kernel/ioctl32.o CC net/sunrpc/svcauth_unix.o CC drivers/base/map.o CC drivers/base/init.o CC drivers/pci/quirks.o CC kernel/user.o CC kernel/signal.o CC kernel/timer.o CC net/core/gen_estimator.o CC net/sunrpc/pmap_clnt.o CC fs/dcache.o CC net/sunrpc/timer.o CC net/sunrpc/auth.o CC mm/truncate.o CC drivers/infiniband/ulp/sdp/sdp_rcvd.o LOGO drivers/video/logo/logo_dec_clut224.c CC fs/locks.o CC drivers/video/matrox/matroxfb_base.o CC drivers/infiniband/ulp/iser/iser_mod.o CC drivers/md/md.o CC net/ipv4/protocol.o LOGO drivers/video/logo/logo_linux_mono.c CC drivers/pci/pci-driver.o CC drivers/input/mouse/psmouse-base.o CC net/core/gen_stats.o CC drivers/scsi/scsi_error.o CC arch/ppc64/lib/usercopy.o CC drivers/video/console/fbcon.o CC drivers/infiniband/core/verbs.o CC drivers/video/fbcmap.o CC drivers/base/dmapool.o CC kernel/sys.o CC kernel/kmod.o CC drivers/scsi/scsi_lib.o LOGO drivers/video/logo/logo_linux_vga16.c CC drivers/infiniband/hw/mthca/mthca_eq.o CC net/core/sysctl_net_core.o CC lib/ctype.o CC fs/inode.o CC kernel/workqueue.o CC net/sunrpc/xdr.o CC arch/ppc64/oprofile/../../../drivers/oprofile/oprofilefs.o CC
drivers/char/tty_ioctl.o CC drivers/video/matrox/matroxfb_accel.o CC drivers/infiniband/core/sysfs.o CC drivers/infiniband/ulp/sdp/sdp_conn.o CC drivers/infiniband/ulp/iser/iser_conn.o CC drivers/base/sys.o CC drivers/infiniband/hw/mthca/mthca_pd.o CC drivers/base/attribute_container.o LOGO drivers/video/logo/logo_m32r_clut224.c CC drivers/video/fbsysfs.o CC arch/ppc64/kernel/ptrace32.o CC drivers/infiniband/ulp/kdapl/ib/dapl_openib_util.o CC drivers/infiniband/ulp/kdapl/ib/dapl_openib_cm.o CC drivers/infiniband/ulp/sdp/sdp_iocb.o CC drivers/video/console/bitblit.o CC drivers/char/pty.o CC drivers/input/mouse/alps.o CC drivers/infiniband/ulp/kdapl/ib/dapl_cookie.o CC net/ipv4/ip_input.o CC drivers/infiniband/hw/mthca/mthca_mr.o CC drivers/infiniband/hw/mthca/mthca_cq.o CC fs/attr.o CC drivers/scsi/scsi_scan.o CC drivers/scsi/scsi_sysfs.o CC drivers/input/mouse/logips2pp.o CC drivers/scsi/scsi_devinfo.o CC drivers/infiniband/ulp/sdp/sdp_send.o CC drivers/video/console/font_8x8.o CC drivers/video/console/fonts.o CC drivers/infiniband/ulp/iser/iser_initiator.o LD drivers/input/misc/built-in.o CC net/ipv4/ip_fragment.o CC drivers/infiniband/ulp/iser/iser_memory.o CC drivers/infiniband/hw/mthca/mthca_qp.o CC drivers/infiniband/hw/mthca/mthca_av.o CC drivers/infiniband/ulp/iser/iser_task.o CC drivers/input/mouse/synaptics.o CC lib/dec_and_lock.o CC drivers/input/mouse/lifebook.o CC fs/bad_inode.o CC drivers/video/matrox/matroxfb_DAC1064.o CC drivers/infiniband/core/device.o CC drivers/char/misc.o drivers/infiniband/core/at.c:1551: warning: initialization from incompatible pointer type drivers/infiniband/hw/mthca/mthca_main.c: In function `mthca_init_one': drivers/infiniband/hw/mthca/mthca_main.c:941: warning: implicit declaration of function `pci_pretty_name' drivers/infiniband/hw/mthca/mthca_main.c:941: warning: format argument is not a pointer (arg 2) drivers/infiniband/hw/mthca/mthca_main.c:945: warning: format argument is not a pointer (arg 2) CC
drivers/video/console/font_8x16.o CC drivers/infiniband/ulp/kdapl/ib/dapl_cr.o CC fs/file.o CC arch/ppc64/kernel/signal32.o CC drivers/video/console/tileblit.o CC lib/div64.o CC net/ipv4/ip_forward.o CC drivers/infiniband/ulp/sdp/sdp_recv.o CC net/ipv4/ip_options.o CC net/ipv4/ip_output.o CC drivers/base/transport_class.o CC net/ipv4/ip_sockglue.o CC drivers/input/mouse/trackpoint.o CC drivers/pci/search.o CC kernel/pid.o CC drivers/infiniband/core/fmr_pool.o CC drivers/char/vt_ioctl.o CC drivers/infiniband/ulp/iser/iser_dto.o CC drivers/infiniband/ulp/iser/iser_lkdapl.o CC net/core/dev.o CC drivers/video/modedb.o CC mm/mincore.o CC mm/mlock.o CC mm/mmap.o CC mm/mremap.o CC drivers/char/vc_screen.o CC mm/msync.o CC drivers/pci/pci-sysfs.o drivers/infiniband/hw/mthca/mthca_reset.c: In function `mthca_reset': drivers/infiniband/hw/mthca/mthca_reset.c:86: warning: implicit declaration of function `pci_pretty_name' drivers/infiniband/hw/mthca/mthca_reset.c:86: warning: format argument is not a pointer (arg 4) CC kernel/rcupdate.o CC drivers/infiniband/ulp/kdapl/ib/dapl_ep.o LOGO drivers/video/logo/logo_mac_clut224.c CC drivers/video/matrox/matroxfb_Ti3026.o CC net/sunrpc/sunrpc_syms.o CC net/ipv4/inet_hashtables.o CC mm/mprotect.o CC net/ipv4/inet_timewait_sock.o LOGO drivers/video/logo/logo_parisc_clut224.c CC arch/ppc64/kernel/rtc.o CC drivers/video/matrox/matroxfb_misc.o CC drivers/infiniband/core/cache.o CC arch/ppc64/kernel/init_task.o LD drivers/hwmon/built-in.o CC arch/ppc64/kernel/lmb.o CC drivers/pci/rom.o CC arch/ppc64/kernel/cputable.o CC mm/rmap.o CC mm/vmalloc.o CC drivers/char/consolemap.o CC drivers/pci/setup-res.o CC net/sunrpc/rpc_pipe.o CONMK drivers/char/consolemap_deftbl.c CC drivers/scsi/scsi_proc.o CC net/sunrpc/cache.o CC drivers/scsi/scsi_sysctl.o CC fs/filesystems.o CC drivers/pci/setup-bus.o AS arch/ppc64/kernel/cpu_setup_power4.o CC drivers/pci/proc.o AS arch/ppc64/kernel/idle_power4.o CC mm/page_io.o CC drivers/char/selection.o CC
net/ipv4/tcp.o CC mm/swap_state.o CC net/sunrpc/stats.o CC net/sunrpc/sysctl.o CC drivers/infiniband/core/smi.o CC drivers/char/keyboard.o CC net/ipv4/inet_connection_sock.o LD net/802/built-in.o LOGO drivers/video/logo/logo_sgi_clut224.c CC fs/namespace.o CC drivers/infiniband/hw/mthca/mthca_mad.o CC mm/swapfile.o CC drivers/infiniband/ulp/kdapl/ib/dapl_lmr.o CC drivers/infiniband/ulp/kdapl/ib/dapl_ia.o CC drivers/video/fbcvt.o CC net/sunrpc/auth_gss/auth_gss.o CC drivers/infiniband/ulp/iser/iser_socket.o CC drivers/infiniband/hw/mthca/mthca_mcg.o CC drivers/infiniband/ulp/kdapl/ib/dapl_evd.o CC net/ipv4/tcp_input.o CC drivers/scsi/sr.o CC drivers/scsi/sd.o CC mm/thrash.o CC drivers/infiniband/hw/mthca/mthca_provider.o CC drivers/infiniband/hw/mthca/mthca_memfree.o CC drivers/infiniband/ulp/kdapl/ib/dapl_provider.o CC drivers/infiniband/ulp/kdapl/ib/dapl_pz.o LOGO drivers/video/logo/logo_superh_clut224.c LOGO drivers/video/logo/logo_sun_clut224.c CC drivers/video/cfbfillrect.o LD drivers/video/console/font.o CC drivers/infiniband/core/mad.o CC drivers/base/power/shutdown.o CC kernel/intermodule.o CC net/ipv4/tcp_output.o CC drivers/scsi/sr_ioctl.o CC drivers/video/logo/logo_linux_mono.o CC drivers/video/cfbcopyarea.o CC kernel/extable.o CC drivers/infiniband/core/agent.o CC drivers/char/vt.o SHIPPED drivers/char/defkeymap.c CC drivers/pci/syscall.o CC drivers/char/hvc_console.o CC drivers/infiniband/ulp/kdapl/ib/dapl_ring_buffer_util.o CC drivers/infiniband/hw/mthca/mthca_uar.o CC drivers/infiniband/hw/mthca/mthca_srq.o CC fs/aio.o CC fs/seq_file.o CC net/core/ethtool.o CC drivers/char/sysrq.o CC net/core/dev_mcast.o CC lib/dump_stack.o CC drivers/infiniband/ulp/kdapl/ib/dapl_rmr.o CC mm/shmem.o CC mm/mempolicy.o CC drivers/infiniband/core/mad_rmpp.o LD arch/ppc64/mm/built-in.o CC lib/errno.o kernel/intermodule.c:178: warning: `inter_module_register' is deprecated (declared at kernel/intermodule.c:38) kernel/intermodule.c:179: warning: `inter_module_unregister'
is deprecated (declared at kernel/intermodule.c:78) kernel/intermodule.c:181: warning: `inter_module_put' is deprecated (declared at kernel/intermodule.c:159) CC drivers/video/logo/logo_linux_vga16.o CC net/core/dst.o LD drivers/input/keyboard/built-in.o CC net/ipv4/tcp_timer.o CC fs/xattr.o CC drivers/scsi/sr_vendor.o CC drivers/base/node.o CC drivers/infiniband/ulp/sdp/sdp_event.o CC drivers/infiniband/ulp/kdapl/ib/dapl_srq.o CC drivers/infiniband/ulp/kdapl/ib/dapl_sp.o CC kernel/posix-timers.o CC drivers/char/hvc_vio.o CC net/core/neighbour.o LD arch/ppc64/oprofile/oprofile.o CC drivers/infiniband/core/ping.o CC net/sunrpc/auth_gss/gss_generic_token.o CC drivers/video/logo/logo_linux_clut224.o CC drivers/infiniband/ulp/sdp/sdp_pass.o CC drivers/video/softcursor.o CC kernel/params.o CC drivers/video/cfbimgblt.o CC drivers/scsi/scsi_transport_spi.o CC kernel/kthread.o CC kernel/wait.o CC drivers/char/hvsi.o drivers/net/e1000/e1000_main.c:3645: warning: `e1000_suspend' defined but not used CC lib/extable.o LD drivers/scsi/qla2xxx/built-in.o LD arch/ppc64/oprofile/built-in.o CC drivers/scsi/sym53c8xx_2/sym_glue.o CC fs/libfs.o LD drivers/macintosh/built-in.o CC arch/ppc64/kernel/iommu.o CC drivers/video/offb.o CC kernel/kfifo.o CC net/core/rtnetlink.o CC kernel/sys_ni.o CC lib/idr.o CC drivers/video/macmodes.o CC fs/fs-writeback.o CC arch/ppc64/kernel/sysfs.o LD security/built-in.o CC net/core/utils.o CC net/sunrpc/auth_gss/gss_mech_switch.o CC kernel/posix-cpu-timers.o CC net/ipv4/tcp_ipv4.o CC drivers/infiniband/ulp/sdp/sdp_link.o CC drivers/scsi/sg.o CC drivers/infiniband/ulp/sdp/sdp_actv.o CC drivers/scsi/st.o CC drivers/infiniband/core/sa_query.o LD net/packet/built-in.o LD drivers/base/power/built-in.o LD net/ethernet/built-in.o LD drivers/input/serio/built-in.o CC fs/mpage.o CC fs/direct-io.o LD init/mounts.o LD net/unix/unix.o LD net/unix/built-in.o
CC drivers/scsi/sym53c8xx_2/sym_fw.o CPIO usr/initramfs_data.cpio GZIP usr/initramfs_data.cpio.gz CC net/sunrpc/auth_gss/gss_krb5_crypto.o CC net/sunrpc/auth_gss/gss_krb5_mech.o CC net/sunrpc/auth_gss/gss_krb5_seal.o LD drivers/video/fb.o LD net/netlink/built-in.o LD drivers/cdrom/built-in.o LD net/sched/built-in.o CC net/core/link_watch.o CC net/core/filter.o CC fs/ioprio.o CC fs/inotify.o CC net/ipv4/tcp_minisocks.o CC net/sunrpc/auth_gss/gss_krb5_unseal.o CC net/sunrpc/auth_gss/gss_krb5_seqnum.o LD ipc/built-in.o CC arch/ppc64/kernel/vdso.o CC lib/klist.o CC net/core/net-sysfs.o CC lib/int_sqrt.o CC fs/eventpoll.o CC fs/compat.o LD drivers/infiniband/ulp/srp/built-in.o CC net/ipv4/tcp_cong.o CC net/sunrpc/auth_gss/svcauth_gss.o CC kernel/futex.o CC net/ipv4/datagram.o CC drivers/char/raw.o CC drivers/char/consolemap_deftbl.o CC drivers/scsi/sym53c8xx_2/sym_hipd.o CC fs/nfsctl.o CC lib/kobject_uevent.o CC drivers/infiniband/core/uat.o CC lib/kobject.o CC drivers/infiniband/core/user_mad.o CC drivers/infiniband/core/uverbs_main.o CC drivers/infiniband/core/ucm.o LD init/built-in.o AR arch/ppc64/lib/lib.a LD drivers/infiniband/ulp/kdapl/kdapl.o CC net/ipv4/raw.o CC kernel/cpu.o CC lib/kref.o CC net/ipv4/udp.o CC net/ipv4/arp.o CC lib/prio_tree.o CC lib/rwsem.o CC drivers/infiniband/core/uverbs_cmd.o CC lib/sha1.o CC lib/string.o LD drivers/md/md-mod.o CC lib/radix-tree.o LD drivers/video/logo/built-in.o LD drivers/infiniband/ulp/iser/ib_iser.o CC fs/binfmt_script.o LD net/sunrpc/sunrpc.o CC fs/binfmt_elf.o CC drivers/infiniband/core/uverbs_mem.o CC drivers/char/defkeymap.o LD crypto/built-in.o CC drivers/scsi/sym53c8xx_2/sym_malloc.o CC kernel/dma.o drivers/infiniband/ulp/sdp/sdp_link.c:752: warning: initialization from incompatible pointer type CC arch/ppc64/kernel/pmc.o CC drivers/scsi/sym53c8xx_2/sym_nvram.o GEN lib/crc32table.h CC kernel/spinlock.o CC kernel/compat.o CC lib/vsprintf.o AS usr/initramfs_data.o LD
drivers/infiniband/ulp/iser/built-in.o CC kernel/module.o CC kernel/kallsyms.o GZIP kernel/config_data.gz CC net/ipv4/igmp.o CC net/ipv4/icmp.o CC arch/ppc64/kernel/firmware.o LD drivers/video/matrox/built-in.o CC net/ipv4/devinet.o LD drivers/infiniband/ulp/ipoib/ib_ipoib.o LD usr/built-in.o LD drivers/pci/built-in.o CC fs/ext2/balloc.o CC lib/rbtree.o CC fs/ext2/bitmap.o CC fs/ext3/balloc.o CC fs/autofs/dirhash.o CC fs/isofs/namei.o CC fs/autofs/init.o CC fs/isofs/inode.o LDS arch/ppc64/kernel/vdso32/vdso32.lds LDS arch/ppc64/kernel/vdso64/vdso64.lds CC fs/exportfs/expfs.o LD drivers/infiniband/ulp/sdp/ib_sdp.o CC net/ipv4/fib_frontend.o CC net/ipv4/sysctl_net_ipv4.o LD drivers/md/built-in.o CC fs/isofs/dir.o CC net/ipv4/proc.o CC fs/jbd/transaction.o CC net/ipv4/ipip.o CC net/ipv4/syncookies.o CC fs/jfs/super.o LD drivers/base/built-in.o CC lib/crc32.o CC arch/ppc64/kernel/of_device.o CC net/ipv4/af_inet.o CC fs/cifs/cifsfs.o CC fs/devpts/inode.o CC fs/jbd/commit.o CC fs/jfs/file.o CC fs/jbd/recovery.o CC fs/jfs/inode.o CC fs/cifs/cifssmb.o CC fs/cifs/cifs_debug.o CC fs/ext3/bitmap.o CC fs/ext3/dir.o VDSO32A arch/ppc64/kernel/vdso32/sigtramp.o VDSO64A arch/ppc64/kernel/vdso64/sigtramp.o CC fs/ext3/fsync.o VDSO64A arch/ppc64/kernel/vdso64/gettimeofday.o CC fs/ext3/ialloc.o VDSO32A arch/ppc64/kernel/vdso32/gettimeofday.o VDSO32A arch/ppc64/kernel/vdso32/datapage.o
LD drivers/input/mouse/psmouse.o LD drivers/serial/built-in.o CC fs/ext2/dir.o CC net/ipv4/fib_semantics.o CC arch/ppc64/kernel/pci.o CC kernel/ksysfs.o CC kernel/stop_machine.o LD drivers/infiniband/ulp/kdapl/ib/kdapl_ib.o CC fs/ext2/file.o LD drivers/input/mouse/built-in.o CC net/ipv4/netfilter.o CC fs/ext2/ialloc.o CC fs/fat/cache.o LD drivers/infiniband/ulp/ipoib/built-in.o CC fs/autofs/inode.o CC fs/fat/dir.o CC fs/fat/fatent.o CC net/ipv4/fib_hash.o CC fs/ext2/fsync.o CC fs/fat/file.o CC fs/ext2/inode.o CC fs/ext2/ioctl.o CC fs/jfs/jfs_xtree.o LD drivers/infiniband/ulp/kdapl/ib/built-in.o CC fs/fat/inode.o CC arch/ppc64/kernel/pci_iommu.o CC fs/jbd/checkpoint.o CC fs/jbd/revoke.o CC fs/isofs/util.o CC net/ipv4/inet_diag.o VDSO64A arch/ppc64/kernel/vdso64/datapage.o VDSO64A arch/ppc64/kernel/vdso64/cacheflush.o CC fs/jfs/namei.o CC fs/jfs/jfs_mount.o LD drivers/infiniband/hw/mthca/ib_mthca.o LD drivers/video/console/built-in.o CC fs/jfs/jfs_umount.o LD drivers/video/built-in.o CC fs/jfs/jfs_imap.o CC fs/jfs/jfs_debug.o LD drivers/input/built-in.o VDSO32A arch/ppc64/kernel/vdso32/note.o CC fs/autofs/root.o LD drivers/net/e1000/e1000.o LD drivers/net/e1000/built-in.o LD drivers/net/built-in.o CC fs/autofs/waitq.o CC fs/ext3/file.o CC fs/cifs/dir.o LD drivers/infiniband/hw/mthca/built-in.o CC fs/isofs/rock.o CC fs/isofs/export.o LD drivers/infiniband/ulp/sdp/built-in.o CC fs/cifs/file.o CC fs/cifs/inode.o CC fs/cifs/link.o CC fs/cifs/misc.o CC fs/autofs/symlink.o CC fs/msdos/namei.o CC fs/lockd/clntlock.o CC kernel/softlockup.o CC arch/ppc64/kernel/nvram.o CC fs/lockd/clntproc.o CC net/ipv4/tcp_diag.o CC arch/ppc64/kernel/iomap.o CC arch/ppc64/kernel/pci_dn.o CC fs/fat/misc.o CC fs/ext2/super.o VDSO32A arch/ppc64/kernel/vdso32/cacheflush.o CC fs/ext2/namei.o CC fs/cifs/connect.o CC fs/jfs/jfs_dmap.o CC fs/jfs/jfs_unicode.o CC fs/cifs/netmisc.o CC kernel/seccomp.o VDSO64A arch/ppc64/kernel/vdso64/note.o CC
arch/ppc64/kernel/pci_direct_iommu.o CC kernel/irq/handle.o CC fs/nfs/dir.o CC fs/jfs/jfs_dtree.o CC fs/ext3/inode.o CC fs/ext2/symlink.o CC fs/jbd/journal.o CC fs/lockd/host.o IKCFG kernel/config_data.h CC fs/cifs/smbdes.o LD mm/built-in.o LD drivers/infiniband/ulp/kdapl/built-in.o CC net/ipv4/tcp_bic.o CC kernel/configs.o CC kernel/irq/manage.o CC kernel/irq/spurious.o LD fs/devpts/devpts.o CC fs/nfs/file.o CC kernel/irq/proc.o CC fs/ext3/ioctl.o CC fs/ext3/namei.o CC fs/lockd/svc.o CC fs/ext3/symlink.o CC fs/ext3/super.o CC fs/nfs/inode.o LD drivers/scsi/scsi_mod.o CC fs/lockd/svclock.o LD fs/nfs_common/built-in.o LD drivers/scsi/sd_mod.o CC fs/nfsd/nfssvc.o CC fs/cifs/smbencrypt.o CC fs/nfs/pagelist.o CC fs/nfs/nfs2xdr.o VDSO64L arch/ppc64/kernel/vdso64/vdso64.so CC arch/ppc64/kernel/i8259.o CC fs/jfs/jfs_inode.o CC fs/nfsd/nfsctl.o CC fs/nfs/proc.o CC fs/nfsd/nfsproc.o LD fs/exportfs/exportfs.o CC fs/nfs/read.o CC fs/lockd/svcshare.o LD net/sunrpc/auth_gss/auth_rpcgss.o CC fs/lockd/svcproc.o LD fs/autofs/autofs.o LD fs/autofs/built-in.o LD net/sunrpc/auth_gss/rpcsec_gss_krb5.o CC fs/lockd/svcsubs.o LD drivers/scsi/sr_mod.o CC arch/ppc64/kernel/prom_init.o CC fs/lockd/mon.o CC fs/reiserfs/bitmap.o CC fs/jfs/jfs_extent.o CC fs/sysfs/inode.o CC fs/reiserfs/do_balan.o CC fs/partitions/check.o CC fs/proc/mmu.o CC fs/ext3/hash.o CC fs/nfsd/nfsfh.o LD fs/exportfs/built-in.o CC fs/nfsd/auth.o LD net/sunrpc/auth_gss/built-in.o LD net/sunrpc/built-in.o CC arch/ppc64/kernel/prom.o AS arch/ppc64/kernel/vdso64/vdso64_wrapper.o CC fs/nfsd/vfs.o CC fs/nfsd/export.o CC fs/partitions/msdos.o CC fs/proc/task_mmu.o LD lib/built-in.o CC fs/nfsd/lockd.o CC fs/proc/inode.o CC fs/nfsd/nfscache.o CC fs/jfs/symlink.o CC fs/nls/nls_base.o CC fs/ext3/resize.o CC fs/xfs/linux-2.6/xfs_stats.o CC arch/ppc64/kernel/pSeries_pci.o CC fs/dnotify.o LD net/core/built-in.o CC arch/ppc64/kernel/pSeries_lpar.o CC fs/dcookies.o AS arch/ppc64/kernel/pSeries_hvCall.o CC fs/cifs/asn1.o CC fs/vfat/namei.o CC arch/ppc64/kernel/pSeries_nvram.o CC fs/reiserfs/namei.o CC fs/reiserfs/dir.o CC arch/ppc64/kernel/rtasd.o CC fs/sysfs/file.o CC fs/lockd/xdr.o LD fs/devpts/built-in.o CC fs/reiserfs/inode.o CC fs/cifs/md4.o CC fs/lockd/xdr4.o CC fs/cifs/md5.o CC fs/reiserfs/file.o CC fs/sysfs/mount.o CC fs/sysfs/bin.o CC fs/cifs/transport.o CC fs/sysfs/dir.o LD arch/ppc64/kernel/vdso64/built-in.o CC fs/nfsd/stats.o CC fs/proc/base.o AR lib/lib.a CC fs/proc/root.o CC fs/nfsd/nfsxdr.o CC fs/jfs/jfs_logmgr.o CC fs/jfs/jfs_metapage.o CC fs/sysfs/group.o CC fs/xfs/linux-2.6/xfs_sysctl.o CC fs/reiserfs/fix_node.o CC fs/proc/generic.o CC fs/proc/array.o CC fs/nfsd/nfs3xdr.o CC fs/nfsd/nfs3proc.o CC fs/jfs/jfs_uniupr.o CC arch/ppc64/kernel/ras.o CC fs/cifs/cifs_unicode.o CC fs/jfs/jfs_txnmgr.o CC fs/sysfs/symlink.o CC fs/xfs/linux-2.6/xfs_ioctl32.o CC fs/proc/kmsg.o fs/ext3/super.c: In function `ext3_show_options': fs/ext3/super.c:516: warning: unused variable `sbi' CC fs/proc/proc_tty.o CC fs/jfs/resize.o CC fs/nfs/symlink.o CC fs/ramfs/inode.o CC arch/ppc64/kernel/pSeries_reconfig.o CC fs/cifs/nterr.o CC fs/jfs/xattr.o LD fs/nls/built-in.o CC fs/proc/proc_misc.o CC fs/proc/proc_devtree.o CC fs/lockd/svc4proc.o CC fs/reiserfs/super.o LD drivers/char/built-in.o LD drivers/block/built-in.o CC
fs/xfs/linux-2.6/xfs_export.o CC fs/xfs/xfs_alloc.o LD fs/isofs/isofs.o CC fs/reiserfs/prints.o CC fs/nfs/unlink.o CC fs/cifs/xattr.o CC fs/nfs/write.o CC fs/cifs/cifsencrypt.o CC arch/ppc64/kernel/pSeries_setup.o CC fs/reiserfs/objectid.o CC fs/nfs/nfs3proc.o CC arch/ppc64/kernel/pSeries_iommu.o CC fs/xfs/xfs_attr.o CC fs/xfs/xfs_alloc_btree.o CC fs/nfs/nfs3xdr.o CC fs/xfs/xfs_attr_leaf.o CC fs/xfs/xfs_behavior.o CC arch/ppc64/kernel/udbg_16550.o CC fs/reiserfs/lbalance.o CC fs/reiserfs/ibalance.o CC fs/reiserfs/stree.o CC fs/reiserfs/hashes.o CC fs/reiserfs/tail_conversion.o LD fs/ramfs/ramfs.o CC arch/ppc64/kernel/proc_ppc64.o CC arch/ppc64/kernel/smp.o CC arch/ppc64/kernel/module.o CC arch/ppc64/kernel/ppc_ksyms.o CC fs/nfs/nfs4proc.o CC fs/nfs/nfs4state.o CC fs/cifs/fcntl.o CC fs/nfs/nfs4xdr.o LD fs/fat/fat.o CC fs/xfs/xfs_bit.o CC arch/ppc64/kernel/eeh.o LD net/ipv4/built-in.o CC fs/xfs/xfs_bmap.o CC fs/xfs/xfs_btree.o CC arch/ppc64/kernel/rtas.o CC fs/xfs/xfs_bmap_btree.o LD fs/ramfs/built-in.o CC fs/xfs/xfs_buf_item.o LD fs/isofs/built-in.o CC fs/xfs/xfs_da_btree.o LD fs/vfat/vfat.o LD kernel/irq/built-in.o CC fs/reiserfs/ioctl.o CC fs/reiserfs/procfs.o LD fs/fat/built-in.o LD drivers/infiniband/core/ib_core.o CC fs/nfs/nfs4renewd.o LD net/built-in.o CC fs/nfs/delegation.o CC fs/nfs/callback.o CC fs/nfs/idmap.o LD fs/jbd/jbd.o VDSO32L arch/ppc64/kernel/vdso32/vdso32.so CC fs/nfs/callback_xdr.o CC fs/nfs/callback_proc.o AS arch/ppc64/kernel/vdso32/vdso32_wrapper.o LD drivers/infiniband/core/ib_mad.o LD drivers/infiniband/core/ib_ping.o CC fs/reiserfs/journal.o CC fs/reiserfs/resize.o LD fs/vfat/built-in.o LD arch/ppc64/kernel/vdso32/built-in.o LD kernel/built-in.o CC fs/xfs/xfs_dir2_data.o CC arch/ppc64/kernel/rtas_pci.o LD drivers/infiniband/core/ib_cm.o LD fs/jbd/built-in.o CC fs/xfs/xfs_dir.o CC fs/xfs/xfs_dir2.o CC arch/ppc64/kernel/rtas-proc.o CC arch/ppc64/kernel/scanlog.o CC fs/xfs/xfs_dir2_block.o CC fs/xfs/xfs_dir2_leaf.o CC
arch/ppc64/kernel/hvconsole.o LD drivers/infiniband/core/ib_sa.o LD fs/partitions/built-in.o CC fs/cifs/readdir.o CC fs/reiserfs/item_ops.o CC fs/xfs/xfs_dir2_node.o LD drivers/infiniband/core/ib_at.o CC fs/cifs/ioctl.o LD fs/ext2/ext2.o CC arch/ppc64/kernel/vio.o CC arch/ppc64/kernel/pSeries_vio.o LD drivers/infiniband/core/ib_umad.o LD drivers/infiniband/core/ib_uverbs.o CC arch/ppc64/kernel/xics.o LD fs/msdos/msdos.o LD fs/ext2/built-in.o LD fs/proc/proc.o LD fs/sysfs/built-in.o CC fs/xfs/xfs_dir_leaf.o CC fs/xfs/xfs_error.o CC fs/xfs/xfs_extfree_item.o CC fs/xfs/xfs_fsops.o CC fs/xfs/xfs_ialloc.o CC arch/ppc64/kernel/mpic.o CC arch/ppc64/kernel/pmac_setup.o CC arch/ppc64/kernel/pmac_pci.o CC arch/ppc64/kernel/pmac_time.o CC fs/xfs/xfs_dir2_sf.o CC arch/ppc64/kernel/pmac_nvram.o CC fs/xfs/xfs_ialloc_btree.o LD drivers/infiniband/core/ib_ucm.o LD drivers/infiniband/core/ib_uat.o LD drivers/scsi/sym53c8xx_2/sym53c8xx.o LD drivers/scsi/sym53c8xx_2/built-in.o LD drivers/scsi/built-in.o LD fs/proc/built-in.o CC fs/xfs/xfs_inode_item.o CC fs/xfs/xfs_iocore.o CC fs/xfs/xfs_iomap.o LD drivers/infiniband/core/built-in.o CC arch/ppc64/kernel/pmac_low_i2c.o CC fs/xfs/xfs_itable.o CC arch/ppc64/kernel/pmac_feature.o LD fs/nfsd/nfsd.o CC arch/ppc64/kernel/pmac_smp.o CC fs/xfs/xfs_iget.o CC arch/ppc64/kernel/u3_iommu.o LD fs/msdos/built-in.o CC arch/ppc64/kernel/smp-tbsync.o LD fs/lockd/lockd.o CC arch/ppc64/kernel/udbg_scc.o CC fs/xfs/xfs_inode.o CC fs/xfs/xfs_dfrag.o CC fs/xfs/xfs_log_recover.o CC fs/xfs/xfs_macros.o CC fs/xfs/xfs_mount.o LD drivers/infiniband/built-in.o CC arch/ppc64/kernel/pSeries_smp.o LD drivers/built-in.o LD fs/nfsd/built-in.o LD fs/ext3/ext3.o CC fs/xfs/xfs_trans_ail.o CC fs/xfs/xfs_trans_buf.o CC fs/xfs/xfs_trans_extfree.o CC fs/xfs/xfs_trans_inode.o CC fs/xfs/xfs_rename.o CC fs/xfs/xfs_log.o LD fs/ext3/built-in.o CC arch/ppc64/kernel/vecemu.o AS arch/ppc64/kernel/vector.o AS arch/ppc64/kernel/head.o LDS arch/ppc64/kernel/vmlinux.lds LD
fs/lockd/built-in.o CC fs/xfs/xfs_trans.o CC fs/xfs/xfs_trans_item.o CC fs/xfs/xfs_utils.o CC fs/xfs/xfs_vnodeops.o CC fs/xfs/xfs_rw.o CC fs/xfs/xfs_dmops.o CC fs/xfs/xfs_qmops.o CC fs/xfs/xfs_vfsops.o CC fs/xfs/linux-2.6/kmem.o LD fs/jfs/jfs.o CC fs/xfs/linux-2.6/xfs_buf.o CC fs/xfs/linux-2.6/xfs_aops.o LD fs/cifs/cifs.o CC fs/xfs/linux-2.6/xfs_ioctl.o CC fs/xfs/linux-2.6/xfs_iops.o CC fs/xfs/linux-2.6/xfs_lrw.o LD fs/jfs/built-in.o CC fs/xfs/support/move.o CC fs/xfs/support/uuid.o CC fs/xfs/linux-2.6/xfs_file.o LD fs/cifs/built-in.o CC fs/xfs/linux-2.6/xfs_fs_subr.o CC fs/xfs/linux-2.6/xfs_vnode.o CC fs/xfs/linux-2.6/xfs_vfs.o CC fs/xfs/linux-2.6/xfs_globals.o CC fs/xfs/support/debug.o CC fs/xfs/linux-2.6/xfs_super.o LD arch/ppc64/kernel/built-in.o LD fs/nfs/nfs.o LD fs/nfs/built-in.o LD fs/reiserfs/reiserfs.o LD fs/reiserfs/built-in.o LD fs/xfs/xfs.o LD fs/xfs/built-in.o LD fs/built-in.o GEN .version CHK include/linux/compile.h UPD include/linux/compile.h CC init/version.o LD init/built-in.o LD .tmp_vmlinux1 drivers/built-in.o(.text+0x130df8): In function `.mthca_reset': : undefined reference to `.pci_pretty_name' drivers/built-in.o(.init.text+0x17be0): In function `.mthca_init_one': : undefined reference to `.pci_pretty_name' drivers/built-in.o(.init.text+0x189e0): In function `.mthca_init_one': : undefined reference to `.pci_pretty_name' make: *** [.tmp_vmlinux1] Error 1 09/24/05-00:04:48 Build the kernel. Failed rc = 2 09/24/05-00:04:48 build: kernel build Failed rc = 1 09/24/05-00:04:48 command complete: (2) rc=126 Failed and terminated the run Fatal error, aborting autorun From QiWang.Chen at Clustars.CN Sat Sep 24 01:46:50 2005 From: QiWang.Chen at Clustars.CN (QiWang, Chen) Date: Sat, 24 Sep 2005 16:46:50 +0800 Subject: [openib-general] [Fwd: NOP command failed to generate interrupt (IRQ 201)] Message-ID: <1127551610.4675.3.camel@QiWang> Thank you, Grant. I solved the problem.
On the node with openib-gen2 installed, I turned off ACPI with the kernel command-line parameter "pci=noacpi". On the nodes with the Mellanox driver installed, I turned off ACPI as well, and all nodes work fine now. Thank you very much. -------- Forwarded Message -------- > From: QiWang, Chen > To: Grant Grundler > Cc: openib-general > Subject: NOP command failed to generate interrupt (IRQ 201) > Date: Sat, 24 Sep 2005 14:58:10 +0800 > Hi, grant > > On node c01-14, I installed openib-gen2, kernel 2.6.13.2 > but I have IRQ problem. > ----------------------------------- > > ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005) > ib_mthca: Initializing (0000:03:00.0) > ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 51 (level, low) -> IRQ 201 > ib_mthca 0000:03:00.0: Found bridge: (0000:02:01.0) > ib_mthca 0000:03:00.0: FW version 000300030003, max commands 64 > ib_mthca 0000:03:00.0: FW size 6143 KB (start e7a00000, end e7ffffff) > ib_mthca 0000:03:00.0: HCA memory size 131071 KB (start e0000000, end > e7ffffff) > ib_mthca 0000:03:00.0: Max QPs: 16777216, reserved QPs: 1024, entry > size: 256 > ib_mthca 0000:03:00.0: Max SRQs: 1024, reserved SRQs: 16, entry size: > 32 > ib_mthca 0000:03:00.0: Max CQs: 16777216, reserved CQs: 128, entry > size: 64 > ib_mthca 0000:03:00.0: Max EQs: 64, reserved EQs: 1, entry size: 64 > ib_mthca 0000:03:00.0: reserved MPTs: 16, reserved MTTs: 16 > ib_mthca 0000:03:00.0: Max PDs: 16777216, reserved PDs: 0, reserved > UARs: 1 > ib_mthca 0000:03:00.0: Max QP/MCG: 16777216, reserved MGMs: 0 > ib_mthca 0000:03:00.0: Flags: 00370347 > ib_mthca 0000:03:00.0: profile[ 0]--10/20 @ 0x e0000000 (size > 0x 4000000) > ib_mthca 0000:03:00.0: profile[ 1]-- 0/16 @ 0x e4000000 (size > 0x 1000000) > ib_mthca 0000:03:00.0: profile[ 2]-- 7/18 @ 0x e5000000 (size > 0x 800000) > ib_mthca 0000:03:00.0: profile[ 3]-- 9/17 @ 0x e5800000 (size > 0x 800000) > ib_mthca 0000:03:00.0: profile[ 4]-- 3/16 @ 0x e6000000 (size > 0x 400000) > ib_mthca 0000:03:00.0: profile[ 5]-- 4/16 @ 0x
e6400000 (size > 0x 200000) > ib_mthca 0000:03:00.0: profile[ 6]--12/15 @ 0x e6600000 (size > 0x 100000) > ib_mthca 0000:03:00.0: profile[ 7]-- 8/13 @ 0x e6700000 (size > 0x 80000) > ib_mthca 0000:03:00.0: profile[ 8]--11/11 @ 0x e6780000 (size > 0x 10000) > ib_mthca 0000:03:00.0: profile[ 9]-- 2/10 @ 0x e6790000 (size > 0x 8000) > ib_mthca 0000:03:00.0: profile[10]-- 6/ 5 @ 0x e6798000 (size > 0x 800) > ib_mthca 0000:03:00.0: HCA memory: allocated 106082 KB/124928 KB > (18846 KB free) > ib_mthca 0000:03:00.0: Allocated EQ 1 with 65536 entries > ib_mthca 0000:03:00.0: Allocated EQ 2 with 128 entries > ib_mthca 0000:03:00.0: Allocated EQ 3 with 128 entries > ib_mthca 0000:03:00.0: Setting mask 00000000000f43fe for eqn 2 > ib_mthca 0000:03:00.0: Setting mask 0000000000000400 for eqn 3 > ib_mthca 0000:03:00.0: NOP command failed to generate interrupt (IRQ > 201), aborting. > ib_mthca 0000:03:00.0: BIOS or ACPI interrupt routing problem? > ib_mthca 0000:03:00.0: Clearing mask 00000000000f43fe for eqn 2 > ib_mthca 0000:03:00.0: Clearing mask 0000000000000400 for eqn 3 > ACPI: PCI interrupt for device 0000:03:00.0 disabled > ib_mthca: probe of 0000:03:00.0 failed with error -16 > ----------------------------------- > > > On Thu, 2005-09-22 at 11:29 -0700, Grant Grundler wrote: > > > On Thu, Sep 22, 2005 at 05:22:58PM +0800, QiWang, Chen wrote: > > > Hi , > > > I install openib gen2 on c01-14, kernel=2.6.13 > > > But I do not know how it works. > > > > See the openib-gen2/README.kernel-build for directions on > > how to build+install the kernel drivers. > > > > See the openib.org wiki for userspace instructions: > > https://openib.org/tiki/tiki-index.php > > > > If something is wrong or not clear, please ask on the openib-general > > mailing list. > > > > > > grant > > -- > QiWang, Chen > Clustars Supercomputing Technology corp. 
> http://www.Clustars.CN > TEL:+86-0816-2546345-815 > FAX:+86-0816-2546370 > Mobile:+86-13096497499 -- QiWang, Chen Clustars Supercomputing Technology corp. http://www.Clustars.CN TEL:+86-0816-2546345-815 FAX:+86-0816-2546370 Mobile:+86-13096497499 From rolandd at cisco.com Sat Sep 24 10:19:53 2005 From: rolandd at cisco.com (Roland Dreier) Date: Sat, 24 Sep 2005 10:19:53 -0700 Subject: [openib-general] InfiniBand compilation testing In-Reply-To: <20050924074611.GD3950@us.ibm.com> (Nishanth Aravamudan's message of "Sat, 24 Sep 2005 00:46:11 -0700") References: <20050924074611.GD3950@us.ibm.com> Message-ID: <52psqy8jt2.fsf@cisco.com> Nish> I have a prototype of something similar running right now, Nish> to help test InfiniBand, both in mainline and in the svn Nish> repo. Basically, every night (this part hasn't been set up Nish> yet, but should be nothing more than a crontab entry), I can Nish> spawn a build job for InfiniBand. Currently, it will only Nish> cover compile-testing in the following sense: build current Nish> -git with IB options set to =y and =m in x86 and ppc64; and Nish> build current -git with the current svn code linked and IB Nish> options set to =y and =m in x86 and ppc64. This is great, thanks! The build of latest git + latest svn might not always succeed, because we try to keep svn working with the latest full kernel release, but it's still very helpful to get advance warning of API changes that will break our tree. Nish> I have attached below my results from 2.6.14-rc2-git3. Only Nish> build failure was the gen2 kernel code under ppc64 with Nish> everything set to y. I just checked in a fix for this -- the pci_pretty_name() API has gone away, so I removed our use of it in svn. I don't understand how your other builds of git + svn succeeded though, since pci_pretty_name is completely gone.
Oh, I guess you'll miss link failures when building modules, so functions that disappear won't break the build. Still, how did the x86 =y build succeed? Also, is there any way to set this up to spam me when a build fails? Of course, the next step is to add some real IB hardware to the test farm and start running regression tests. (And yes I know that's a lot of work ;) But even just seeing if the drivers load successfully would be quite useful. Thanks, Roland From nacc at us.ibm.com Sat Sep 24 11:08:50 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Sat, 24 Sep 2005 11:08:50 -0700 Subject: [openib-general] InfiniBand compilation testing In-Reply-To: <52psqy8jt2.fsf@cisco.com> References: <20050924074611.GD3950@us.ibm.com> <52psqy8jt2.fsf@cisco.com> Message-ID: <20050924180850.GA28695@us.ibm.com> On 24.09.2005 [10:19:53 -0700], Roland Dreier wrote: > Nish> I have a prototype of something similar running right now, > Nish> to help test InfiniBand, both in mainline and in the svn > Nish> repo. Basically, every night (this part hasn't been set up > Nish> yet, but should be nothing more than a crontab entry), I can > Nish> spawn a build job for InfiniBand. Currently, it will only > Nish> cover compile-testing in the following sense: build current > Nish> -git with IB options set to =y and =m in x86 and ppc64; and > Nish> build current -git with the current svn code linked and IB > Nish> options set to =y and =m in x86 and ppc64. > > This is great, thanks! The build of latest git + latest svn might not > always succeed, because we try to keep svn working with the latest > full kernel release, but it's still very helpful to get advance > warning of API changes that will break our tree. Right, I figured the more unstable compile-test (git + svn) may be slightly less useful, but it was far more interesting to get working in the framework we have :) And now that I have it consistently running, it's no extra work to add those compile tests. 
> Nish> I have attached below my results from 2.6.14-rc2-git3. Only > Nish> build failure was the gen2 kernel code under ppc64 with > Nish> everything set to y. > > I just checked in a fix for this -- the pci_pretty_name() API has gone > away, so I removed our use of it in svn. I don't understand how your > other builds of git + svn succeeded though, since pci_pretty_name is > completely gone. Oh, I guess you'll miss link failures when building > modules, so functions that disappear won't break the build. Still, > how did the x86 =y build succeed? The git + svn builds on x86 with =y did not succeed due to gcc errors on those machines. I am rerunning everything 2.6.13-rc2-git4 on machines I know are capable of building and will send out a new mail in a little bit (only two more jobs left to complete). > Also, is there any way to set this up to spam me when a build fails? Eventually, probably. Initially, I will keep bouncing mails to you and openib-general, when things go bad. I will try to figure out a good way to automagic this part, but it's a little lower priority ... > Of course, the next step is to add some real IB hardware to the test > farm and start running regression tests. (And yes I know that's a lot > of work ;) But even just seeing if the drivers load successfully would > be quite useful. Indeed! That in fact is the next *planned* step. In fact, we do have h/w that probably could be used, but it's a matter of getting the machines in the harness and doing some testing to make sure it's working ok. We also would like to do userspace builds off the svn tree and run whatever tests are available. That will be more work, of course, and requires some autobuilding capability across all openib userspace tools (that functionality really makes the testing simpler). One of our team-members is working on that part, but is currently occupied elsewhere. 
We'll keep the community posted on any progress we make (including contributing back any fancy Makefiles we eventually come up with to build *everything* :) Hopefully, once I'm sure my build-tests are successful from cron, I can take a look at minimally booting the kernels I'm building. Finally, the way I have things set up right now, I generate .config sections in the following way:

    # For every non-debug "config FOO" entry in the InfiniBand Kconfig files,
    # append CONFIG_FOO=y to one fragment and CONFIG_FOO=m to another.
    for option in $(find $DEV_TREE_PREFIX/drivers/infiniband -type f -name Kconfig \
            -exec grep '^config' {} \; | grep -iv debug | cut -f 2 -d " "); do
        echo CONFIG_$option=y >> $OPENIB_YES_CONFIG_FILE
        echo CONFIG_$option=m >> $OPENIB_MOD_CONFIG_FILE
    done

Basically, I turn all options (except debug ones) either =y or =m. Now, I could try to generate combinations of those, but I'm not sure I can always guarantee the combinations will be consistent with the Kconfig dependencies. Would such combinations be useful? Thanks, Nish From nacc at us.ibm.com Sat Sep 24 11:11:08 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Sat, 24 Sep 2005 11:11:08 -0700 Subject: [openib-general] InfiniBand compilation testing In-Reply-To: <52psqy8jt2.fsf@cisco.com> References: <20050924074611.GD3950@us.ibm.com> <52psqy8jt2.fsf@cisco.com> Message-ID: <20050924181108.GB28695@us.ibm.com> On 24.09.2005 [10:19:53 -0700], Roland Dreier wrote: > Nish> I have a prototype of something similar running right now, > Nish> to help test InfiniBand, both in mainline and in the svn > Nish> repo. Basically, every night (this part hasn't been set up > Nish> yet, but should be nothing more than a crontab entry), I can > Nish> spawn a build job for InfiniBand. Currently, it will only > Nish> cover compile-testing in the following sense: build current > Nish> -git with IB options set to =y and =m in x86 and ppc64; and > Nish> build current -git with the current svn code linked and IB > Nish> options set to =y and =m in x86 and ppc64. > > This is great, thanks!
The build of latest git + latest svn might not > always succeed, because we try to keep svn working with the latest > full kernel release, but it's still very helpful to get advance > warning of API changes that will break our tree. > > Nish> I have attached below my results from 2.6.14-rc2-git3. Only > Nish> build failure was the gen2 kernel code under ppc64 with > Nish> everything set to y. > > I just checked in a fix for this -- the pci_pretty_name() API has gone > away, so I removed our use of it in svn. I don't understand how your > other builds of git + svn succeeded though, since pci_pretty_name is > completely gone. Oh, I guess you'll miss link failures when building > modules, so functions that disappear won't break the build. Still, > how did the x86 =y build succeed? And, in fact, the x86 =y build also fails with the same issue (now that I've found a consistently working machine, I shouldn't run into the gcc problems again; we tend not to update the test machines). Thanks, Nish From eitan at mellanox.co.il Sat Sep 24 13:01:26 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sat, 24 Sep 2005 23:01:26 +0300 Subject: [openib-general] Re: opensm and SIGINT In-Reply-To: <4df28be405092313225529cfe4@mail.gmail.com> References: <4df28be405092313225529cfe4@mail.gmail.com> Message-ID: <4335B096.2060901@mellanox.co.il> Hi Viswa, Please run step 4 in verbose mode: osmtest -f a -V -l /tmp/osmtest.log. If it fails, please send us a copy of /tmp/osmtest.log. This is just a guess, but I think the "bug" lies in the fact that the SM did not have a chance to completely clean up between the tests, and the tests are picky about the SM state (like the number of services, multicast groups, etc.). We will try to reproduce it here too.
Thanks Eitan Viswanath Krishnamurthy wrote: > > On 23 Sep 2005 13:49:31 -0400, Hal Rosenstock < halr at voltaire.com> wrote: > > Hi Viswa, > > On Fri, 2005-09-23 at 13:43, Viswanath Krishnamurthy wrote: > >>More information, >> >>The test case is as follows >> >>1. Start opensm in verbose mode (-V) >>2. Ping remote node >>3. osmtest -f c >>4. osmtest -f a >>5. pkill -9 opensm >>6. Repeat over >> >>Out of about 2500 iterations, 143 osmtest failed. Keep in mind, >>only Step 4 failed. > > > Yes. > > Do you see any port LEDs on the switch blink indicating the port went > down from active and back while running this ? > > > > No, I ran this test overnight and logged the results. I will try it next week and let you know. > > > > >>Step 3 which is inventory file creation *never* failed. (I think >>inventory file creation also talks to SA right ?) > > > Right. > > -- Hal > > > > > From eitan at mellanox.co.il Sat Sep 24 13:43:30 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Sat, 24 Sep 2005 23:43:30 +0300 Subject: [openib-general] Re: Another opensm problem ? In-Reply-To: <1127516065.4398.1937.camel@hal.voltaire.com> References: <1127516065.4398.1937.camel@hal.voltaire.com> Message-ID: <4335BA72.9080203@mellanox.co.il> Hi Viswa and Hal, I have read through the thread and have a few comments. But first let me check that I understand the test run correctly. The test is as follows: 1. OpenSM starts up and configures the subnet. 2. The user then tears out a cable and reconnects it to another switch port. 3. The SM is supposed to bring up the new connection. 4. Step 2 is repeated until the SM stops responding. Well, if this is the case, then OpenSM might stop responding due to the following features: 1. In the past we had cases where bad hardware continuously flooded the SM with traps.
To protect against this kind of DoS attack, we implemented an adaptive filter in the SM trap receiver: if the exact same trap is received continuously from the same source more than 10 times (with no more than 5 seconds between traps), the traps are considered a DoS and are ignored. Please see osm_trap_rcv.c for details. 2. Each time one of its ports changes state, an IB switch: a. sets the "change bit" in SwitchInfo; b. sends trap 128 to the SM. But trap 128 does not carry the number of the port that changed. So under a test case like the one you describe, the following can happen: 1. The SM decides to ignore trap 128 from the switch, because more than 5 connect/reconnect sequences happen without enough "quiet" time to recover. 2. The SwitchInfo change bit is sampled during the OSM light sweep, and there is a race between reading the change bit and clearing it. If the disconnect and reconnect happen very fast, the change bit set by the reconnect can be wiped out by the clear triggered by the disconnect. It is easy to see in the log file whether the SM ignored traps. Run with -V and look for: grep "Continuously received this trap" /var/log/osm.log (for some reason I did not get any log attachments with this thread; otherwise I would analyze them too). Anyway, if the SM does not perform a heavy sweep (due to the above), it will very likely keep polling, without success, the non-existent node that was previously attached to the switch port. So cable tear-off and reconnect tests should allow at least 10 seconds of recovery time. You could also try sending kill -HUP to the OpenSM process and see whether the resulting full sweep is able to bring all ports up. Viswa, with all that said, it is quite possible you are hitting a bug in OpenSM, and we want to encourage your efforts to find such bugs. With your help, and that of others, we will be able to flush them out.
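The trap flood filter in point 1 is described only in words. The following is a minimal C sketch of that rule, assuming one history record per (source LID, trap number) pair and using the thresholds quoted in the mail (more than 10 repeats, at most 5 seconds apart). The struct and function names are hypothetical; this is not the actual code in osm_trap_rcv.c.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Thresholds as described in the mail; OpenSM's real values and data
 * structures live in osm_trap_rcv.c and may differ. */
#define TRAP_FLOOD_COUNT   10 /* more than this many repeats ...   */
#define TRAP_FLOOD_GAP_SEC 5  /* ... each within this many seconds */

/* Hypothetical history for one (source, trap number) pair. */
struct trap_history {
	uint16_t src_lid;
	uint16_t trap_num;
	time_t last_seen;
	unsigned repeats; /* 0 means "nothing seen yet" */
};

/* Returns true if the trap should be processed, false if it is treated
 * as part of a flood and ignored. */
static bool trap_should_process(struct trap_history *h, uint16_t src_lid,
				uint16_t trap_num, time_t now)
{
	if (h->repeats == 0 || h->src_lid != src_lid ||
	    h->trap_num != trap_num ||
	    now - h->last_seen > TRAP_FLOOD_GAP_SEC) {
		/* New trap, different source, or a long enough quiet gap:
		 * start counting again. */
		h->src_lid = src_lid;
		h->trap_num = trap_num;
		h->repeats = 1;
	} else {
		h->repeats++;
	}
	h->last_seen = now;
	return h->repeats <= TRAP_FLOOD_COUNT;
}
```

Under these assumptions, five rapid disconnect/reconnect cycles (two trap 128s each) use up the allowance. Traps from a sixth cycle are dropped until the switch stays quiet for more than 5 seconds. This is consistent with the advice above to leave at least 10 seconds of recovery time between cable pulls.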
Thanks Eitan Hal Rosenstock wrote: > On Fri, 2005-09-23 at 14:57, Hal Rosenstock wrote: > >>On Fri, 2005-09-23 at 13:50, Viswanath Krishnamurthy wrote: >> >>>- After 7-8 iterations, I ran into a weird problem, where opensm was >>>showing the HCA as UNKNOWN. The port >>>never came up to ACTIVE state. The unplugged and replugged into >>>different slots, the port remained in INIT >>>state. >> >>Mellanox : SW : 12 : INI : : : 2048 : 1x : 2.5 : > > 0002c9010d26e780 : UNKNOWN > >>OpenSM thinks that either there is no physical port on the other end > > of > >>the link or it is not "valid" (GUID non 0). Obviously it is there as > > the > >>port state is INIT so the physical link came up which requires the >>remote end to be there. > > >>From the log you sent, this is exactly what is happening. > Sep 23 10:07:23 451191 [B7751BB0] -> osm_drop_mgr_process: Checking port > 0x0002c9010d26e780. > Sep 23 10:07:23 451209 [B7751BB0] -> osm_drop_mgr_process: Checking port > 0x0002c90200400cfd. > Sep 23 10:07:23 451226 [B7751BB0] -> osm_drop_mgr_process: ERR 0108: > Unknown remote side for node 0x0002c9010d26e780 port 20. Adding to light > sweep sampling list. > Sep 23 10:07:23 451251 [B7751BB0] -> Directed Path Dump of 1 hop path: > Path = [0][1] > Sep 23 10:07:23 451267 [B7751BB0] -> osm_drop_mgr_process: ] > > So look in osm_drop_mgr.c line 707: > Can you enhance the log display to see which is failing: > osm_physp_is_valid(p_physp) or osm_physp_get_remote(p_physp) ? > > Also, it appears to keep light sweeping this port but whichever switch > port it is on, it does not respond. Not sure where the problem is. It > could be on the outgoing side of the switch (we could run diags against > the switch and various ports; I would be curious what they return when > the subnet is in this broken state) or on the HCA. However, the fact > that restarting opensm made it go away without touching anything else > makes this appear otherwise. 
> > >>One other note is that it appears to have come up as 1x. Is that what >>should happen ? > > > -- Hal > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From rolandd at cisco.com Sat Sep 24 14:10:38 2005 From: rolandd at cisco.com (Roland Dreier) Date: Sat, 24 Sep 2005 14:10:38 -0700 Subject: [openib-general] Re: [PATCH] check for valid MGID in user space In-Reply-To: <20050922124511.GA9109@mellanox.co.il> (Jack Morgenstein's message of "Thu, 22 Sep 2005 15:45:11 +0300") References: <20050922124511.GA9109@mellanox.co.il> Message-ID: <52ll1m894h.fsf@cisco.com> Jack> The following patch checks validity of MGID when Jack> attaching/detaching a QP to/from a multicast group (for Jack> user-space only). IB spec demands that multicast gids start Jack> with 0xFF in 0-th byte. (IB Spec v1.2,section 4.1.1 (page Jack> 144)). This seems like a strange place to check the condition. Why do we want to allow kernel callers to use invalid MGIDs? In other words it would seem more appropriate to me to put the check in verbs.c. Also no Signed-off-by: line for your patch... - R. From rolandd at cisco.com Sat Sep 24 14:15:40 2005 From: rolandd at cisco.com (Roland Dreier) Date: Sat, 24 Sep 2005 14:15:40 -0700 Subject: [openib-general] Re: [PATCH] add cq error events In-Reply-To: <20050922103159.GE31820@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 22 Sep 2005 13:31:59 +0300") References: <20050922103159.GE31820@mellanox.co.il> Message-ID: <52hdca88w3.fsf@cisco.com> Michael> The following implements reporting error events in mthca. Michael> (I've renamed mthca_cq_event to mthca_cq_completion, for Michael> consistency with qp events). Looks good, I've checked it in to svn. 
To be a good kernel citizen I'll wait for 2.6.15 to merge it upstream (since we're in bugfix-only time right now). Michael> As a side note, the spec says: "Two types of CQ errors Michael> can occur: the CQ can overrun or it can become Michael> inaccessible": I wander whether this should be Michael> interpreted in a sense that that there should be two Michael> types of events: IB_EVENT_CQ_OVERRUN and Michael> IB_EVENT_CQ_ACCESS, rather than just a generic Michael> IB_EVENT_CQ_ERR Yes, this seems useful to me. The reason is that a CQ overrun indicates a bug in the consumer, and a CQ access error indicates a bug in the verbs implementation. So it's useful to be able to tell whose fault a CQ error is. - R. From yaronh at voltaire.com Sat Sep 24 21:53:03 2005 From: yaronh at voltaire.com (Yaron Haviv) Date: Sun, 25 Sep 2005 07:53:03 +0300 Subject: [openib-general][PATCH][RFC]: CMA IB implementation Message-ID: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Sean Hefty > Sent: Thursday, September 22, 2005 12:28 PM > To: Guy German > Cc: Openib > Subject: Re: [openib-general][PATCH][RFC]: CMA IB implementation > > Guy German wrote: > > I don't think this layer should replace ib_at. If you think there are > > things to be fixed in the ib_at, I suggest we fix them. I do believe > > that the original purpose of this generic cm was to serve ulps that > > don't want to be transport oriented (e.g. iSER). > > Based on discussions from last month, the general agreement was to use CM > private data in place of ATS. Once that's done, I don't see a need for > ib_at. > (Also, put simply, I don't believe that ATS can work.) I think that a > combination of what Roland, including his original API design, and Yaron > proposed is the right direction to go. 
> Sean, my response is somewhat late. Anyway, ib_at doesn't depend on or directly connect to ATS; ATS was just one way to translate an IP address to a GID. IB_AT provides a way to translate src/dst IP + QoS attributes into a set of layer 2 attributes and QP parameters in one place for several ULPs, with potential enhancements to implement a central address cache and a central QoS & partitioning configuration mechanism. Basically, it's the IB equivalent of TCP/IP's IP & Ethernet resolution and routing layers. Having said that, it doesn't really matter whether it's part of the CM or external, as long as we keep the functionality and implementation. To address partitioning, IB_AT suggests using the P_Key value derived from the IPoIB interface, while also allowing a consumer/ULP to override that value with its own. This yields exactly the behavior you would expect from Ethernet or iWARP, which map RDMA sessions to the VLAN used by the interface. To address QoS, the IB_AT model suggests taking the SL value by default from the IPoIB interface of that subnet, which took it from the SA MCRecord (the ULP can override it). This allows a user to create two subnets over the fabric, each mapped to a different SL/VL with its own BW/priority reservation. On the ULP side, the user just needs to configure each ULP to run over a different subnet according to its BW requirements. Many people already do this today, since they use separate fabrics, e.g. one for NFS and one for MPI. The API was also designed to let users override the default values derived from IPoIB, so a sophisticated user/ULP can always get the finest granularity.
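The "default from IPoIB, override per ULP" rule for P_Key and SL fits in a small resolution step. The C sketch below is illustrative only. The structures and resolve_conn_attrs() are hypothetical and are not the ib_at API. The sketch assumes the IPoIB interface already knows its P_Key and the SL it learned from the SA, and that a ULP may override either field independently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of what an IPoIB interface already knows:
 * its partition key and the SL it learned from the SA. */
struct ipoib_if_attrs {
	uint16_t pkey;
	uint8_t sl;
};

/* Hypothetical per-connection overrides supplied by a ULP. */
struct ulp_overrides {
	bool has_pkey;
	uint16_t pkey;
	bool has_sl;
	uint8_t sl;
};

/* Attributes actually used for the RDMA connection. */
struct conn_attrs {
	uint16_t pkey;
	uint8_t sl;
};

/* Start from the interface defaults and apply only the fields the ULP
 * explicitly set. A NULL override means "use the defaults". */
static struct conn_attrs resolve_conn_attrs(const struct ipoib_if_attrs *ifa,
					    const struct ulp_overrides *ov)
{
	struct conn_attrs a = { .pkey = ifa->pkey, .sl = ifa->sl };

	if (ov && ov->has_pkey)
		a.pkey = ov->pkey;
	if (ov && ov->has_sl)
		a.sl = ov->sl;
	return a;
}
```

Keeping the override per field lets a ULP change only its SL (for example, to use a different SL/VL reservation) while still inheriting the partition of the interface it runs over. This mirrors how a socket application inherits the VLAN of the interface it uses.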
Yaron

> - Sean
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-
> general

From eitan at mellanox.co.il  Sat Sep 24 22:36:51 2005
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Sun, 25 Sep 2005 08:36:51 +0300
Subject: [openib-general] Re: opensm and SIGINT
In-Reply-To: <4335B096.2060901@mellanox.co.il>
References: <4335B096.2060901@mellanox.co.il>
Message-ID: <43363773.6090605@mellanox.co.il>

Hi Hal,

Seems I was able to reproduce the osmtest failure (hopefully the same one Viswa sees).
I left it running for a while on a machine, and after 736 iterations it failed. Once it did, I stopped the loop.

From osm.log I see:
Sep 25 02:50:56 463143 [8003] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
...
Sep 25 02:50:57 463991 [C004] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
...
Sep 25 02:50:58 463751 [8003] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).

Sep 25 02:50:59 462938 [C004] -> __osm_sr_rcv_respond: [
Sep 25 02:50:59 462955 [C004] -> __osm_sr_rcv_respond: Generating response with 744 records.
...
Sep 25 02:50:59 463489 [C004] -> osm_vendor_send: RMPP 1 length 131000
Sep 25 02:50:59 463518 [C004] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
Sep 25 02:50:59 463549 [C004] -> __osm_sa_mad_ctrl_send_err_callback: [
Sep 25 02:50:59 463566 [C004] -> __osm_sa_mad_ctrl_send_err_callback: ERR 1A06: MAD transaction completed in error.
From osmtest I get:
Sep 25 02:50:56 461412 [4000] -> osmt_get_all_services_and_check_names: Getting All Service Records
Sep 25 02:50:56 461429 [4000] -> osmv_query_sa: [
Sep 25 02:50:56 461445 [4000] -> osmv_query_sa DBG:001 SVC_REC_BY_NAME
Sep 25 02:50:56 461462 [4000] -> __osmv_send_sa_req: [
Sep 25 02:50:56 461478 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: [
Sep 25 02:50:56 461498 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: Using previously stored lid:0x0001 sm_lid:0x0001
Sep 25 02:50:56 461515 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: ]
Sep 25 02:50:56 461555 [4000] -> osm_mad_pool_get: [
...
Sep 25 02:51:00 461961 [8003] -> umad_receiver: ERR 5409: send completed with error (method=12 attr=31) -- dropping.
Sep 25 02:51:00 461979 [8003] -> umad_receiver: ERR 5410: class 0x3 LID 0x0

Is it possible there is a max limit on MAD size in umad?
It seems the SM fails to allocate the size of the MAD required for answering the "get all service records" query.

Another interesting message is the last one, "umad_receiver: ERR 5410: class 0x3 LID 0x0". Why is the reported LID 0?

Will you be able to handle the MAD allocation? Please advise.

Eitan

Eitan Zahavi wrote:
> Hi Viswa,
>
> Please run step 4 in verbose mode: osmtest -f a -V -l /tmp/osmtest.log
> If it fails, please send us a copy of /tmp/osmtest.log.
>
> This is just a guess, but I think the "bug" is that the SM
> did not have a chance to completely clean up between the tests, and the tests are
> picky about the SM state (like the number of services, multicast groups, etc.).
>
> We will try to reproduce it here too.
>
> Thanks
>
> Eitan
>
> Viswanath Krishnamurthy wrote:
>
>>On 23 Sep 2005 13:49:31 -0400, Hal Rosenstock < halr at voltaire.com> wrote:
>>
>>Hi Viswa,
>>
>>On Fri, 2005-09-23 at 13:43, Viswanath Krishnamurthy wrote:
>>
>>
>>>More information,
>>>
>>>The test case is as follows
>>>
>>>1. Start opensm in verbose mode (-V)
>>>2. Ping remote node
>>>3. osmtest -f c
>>>4.
osmtest -f a
>>>5. pkill -9 opensm
>>>6. Repeat over
>>>
>>>Out of about 2500 iterations, 143 osmtest runs failed. Keep in mind,
>>>only Step 4 failed.
>>
>>
>>Yes.
>>
>>Do you see any port LEDs on the switch blink indicating the port went
>>down from active and back while running this?
>>
>>
>>No, I ran this test overnight and logged the results. I will try it next week and let you know.
>>
>>
>>>Step 3 which is inventory file creation *never* failed. (I think
>>>inventory file creation also talks to SA right?)
>>
>>
>>Right.
>>
>>-- Hal

From eitan at mellanox.co.il  Sat Sep 24 22:52:53 2005
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Sun, 25 Sep 2005 08:52:53 +0300
Subject: [openib-general] Forcing IB link state down
In-Reply-To: <1127499860.15613.15368.camel@hal.voltaire.com>
References: <1127499860.15613.15368.camel@hal.voltaire.com>
Message-ID: <43363B35.108@mellanox.co.il>

Hi Viswa,

A Tcl API for sending and receiving MADs is available under https://openib.org/svn/gen2/utils/src/linux-user/ibis (IBIS == IB Inband Services).

If you follow the usual autogen.sh && configure && make && make install sequence, you should get a /usr/local/bin/ibis executable. The following script should then do the trick.

NOTE: when you force a port down, remember that if that port is used to return the response to the Set command, no response will be possible, and the script will say:
"You probably turned off the port you came in through as you got no response!"

I hope this meets your requirements.
Eitan

#!/bin/sh
# This file invokes the IBIS \
export TCLLIBPATH="/usr/local/lib"; \
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH; \
exec /usr/local/bin/ibis "$0" "$@"

proc usage {} {
    global argv0
    puts "Usage: $argv0 "
    exit 1
}

# parse command line
if {[llength $argv] != 3} {
    usage
}
set lid   [lindex $argv 0]
set port  [lindex $argv 1]
set state [lindex $argv 2]
if {[lsearch {UP DOWN} $state] < 0} {
    usage
}

# init the IBIS
ibis_init

# attach to the default port (as selected by UMAD); IBIS will abort on failure
ibis_set_port [lindex [lindex [ibis_get_local_ports_info] 0] 0]

# Query the PortInfo to get all its parameters and verify it is there...
set r [smPortInfoMad getByLid $lid $port]
if {$r != 0} {
    puts "Could not query the target port. Got code:$r"
    exit 1
}

# Do we set the physical port state to polling or disabled?
if {$state == "UP"} {
    smPortInfoMad configure -state_info2 0x22 -state_info1 0x0
} else {
    smPortInfoMad configure -state_info2 0x32 -state_info1 0x0
}

# send the Set and check whether we got a response
set r [smPortInfoMad setByLid $lid $port]
if {$r == 6} {
    puts "You probably turned off the port you came in through as you got no response!"
    exit 0
}
if {$r != 0} {
    puts "set failed with code $r"
    exit 1
}
exit 0

Hal Rosenstock wrote:
> On Fri, 2005-09-23 at 14:23, Viswanath Krishnamurthy wrote:
>
>>I was looking if mthca driver has any API/ioctl to disable/enable the
>>link..
>
>
> I don't know, Roland? I think this would be done via SM MADs.
> > -- Hal > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From yael at mellanox.co.il Sun Sep 25 00:12:28 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 25 Sep 2005 10:12:28 +0300 Subject: [openib-general] [PATCH] Opensm - ignore strict-aliasing warning Message-ID: <5z3bnty61v.fsf@mtl066.yok.mtl.com> Hi Hal, This patch resolves the ERROR numbering issue. As you mentioned - there was a problem with the cl_event_wheel.c. The patch fixes the error there to match the rest of the opensm code. Also, there was double use of numbering in osm_db_files.c and osm_vendor_mlx_ts_anafa.c, so I changed the error numbers in osm_db_files.c Thanks, Yael Signed-off-by: Yael Kalka Index: complib/cl_event_wheel.c =================================================================== --- complib/cl_event_wheel.c (revision 3532) +++ complib/cl_event_wheel.c (working copy) @@ -187,7 +187,7 @@ __cl_event_wheel_callback( IN void* cont if (cl_status != CL_SUCCESS) { osm_log (p_event_wheel->p_log, OSM_LOG_ERROR, - "__cl_event_wheel_callback : ERROR 1000: " + "__cl_event_wheel_callback : ERR 6100: " "Failed to start timer\n" ); } } @@ -231,7 +231,7 @@ cl_event_wheel_init( if (cl_status != CL_SUCCESS) { osm_log (p_event_wheel->p_log, OSM_LOG_ERROR, - "cl_event_wheel_init : ERROR 1000: " + "cl_event_wheel_init : ERR 6101: " "Failed to initialize cl_spinlock\n" ); goto Exit; } @@ -246,7 +246,7 @@ cl_event_wheel_init( if (cl_status != CL_SUCCESS) { osm_log (p_event_wheel->p_log, OSM_LOG_ERROR, - "cl_event_wheel_init : ERROR 1000: " + "cl_event_wheel_init : ERR 6102: " "Failed to initialize cl_timer\n" ); goto Exit; } @@ -432,7 +432,7 @@ cl_event_wheel_reg( if (cl_status != CL_SUCCESS) { osm_log (p_event_wheel->p_log, OSM_LOG_ERROR, - "cl_event_wheel_reg : ERROR 1000: " + "cl_event_wheel_reg : 
ERR 6103: " "Failed to start timer\n" ); goto Exit; } Index: opensm/osm_db_files.c =================================================================== --- opensm/osm_db_files.c (revision 3532) +++ opensm/osm_db_files.c (working copy) @@ -177,7 +177,7 @@ osm_db_init( if (mkdir(p_db_imp->db_dir_name, 777)) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_init: ERR 6901: " + "osm_db_init: ERR 6101: " " Fail to create the db directory:%s\n", p_db_imp->db_dir_name); OSM_LOG_EXIT( p_log ); @@ -233,7 +233,7 @@ osm_db_domain_init( if (! p_file) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_domain_init: ERR 6902: " + "osm_db_domain_init: ERR 6102: " " Fail to open the db file:%s\n", p_domain_imp->file_name); cl_free(p_domain_imp); @@ -288,7 +288,7 @@ osm_db_restore( if (! p_file) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_restore: ERR 6903: " + "osm_db_restore: ERR 6103: " " Fail to open the db file:%s\n", p_domain_imp->file_name); status = 1; @@ -327,7 +327,7 @@ osm_db_restore( if (! p_first_word) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_restore: ERR 6904: " + "osm_db_restore: ERR 6104: " " Fail to get key from line:%u : %s\n", line_num, sLine); status = 1; @@ -353,7 +353,7 @@ osm_db_restore( else if (sLine[0] != '\n') { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_restore: ERR 6905: " + "osm_db_restore: ERR 6105: " " How did we get here? line:%u : %s\n", line_num, sLine); status = 1; @@ -375,7 +375,7 @@ osm_db_restore( (st_data_t*)&p_prev_val)) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_restore: ERR 6906: " + "osm_db_restore: ERR 6106: " " Key:%s already exists in:%s with value:%s." " Removing it.\n", p_key, @@ -454,7 +454,7 @@ osm_db_store( if (! 
p_file) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_store: ERR 6907: " + "osm_db_store: ERR 6107: " " Fail to open the db file:%s for writing\n", p_domain_imp->file_name); status = 1; @@ -469,7 +469,7 @@ osm_db_store( if (status) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_store: ERR 6908: " + "osm_db_store: ERR 6108: " " Fail to rename the db file to:%s (err:%u)\n", p_domain_imp->file_name, status); } From yael at mellanox.co.il Sun Sep 25 00:14:37 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 25 Sep 2005 10:14:37 +0300 Subject: [openib-general] [PATCH] Opensm - error numbering Message-ID: <5z1x3dy5ya.fsf@mtl066.yok.mtl.com> Hi Hal, Sorry, this mail was sent with wrong subject.... This patch resolves the ERROR numbering issue. As you mentioned - there was a problem with the cl_event_wheel.c. The patch fixes the error there to match the rest of the opensm code. Also, there was double use of numbering in osm_db_files.c and osm_vendor_mlx_ts_anafa.c, so I changed the error numbers in osm_db_files.c Thanks, Yael Signed-off-by: Yael Kalka Index: complib/cl_event_wheel.c =================================================================== --- complib/cl_event_wheel.c (revision 3532) +++ complib/cl_event_wheel.c (working copy) @@ -187,7 +187,7 @@ __cl_event_wheel_callback( IN void* cont if (cl_status != CL_SUCCESS) { osm_log (p_event_wheel->p_log, OSM_LOG_ERROR, - "__cl_event_wheel_callback : ERROR 1000: " + "__cl_event_wheel_callback : ERR 6100: " "Failed to start timer\n" ); } } @@ -231,7 +231,7 @@ cl_event_wheel_init( if (cl_status != CL_SUCCESS) { osm_log (p_event_wheel->p_log, OSM_LOG_ERROR, - "cl_event_wheel_init : ERROR 1000: " + "cl_event_wheel_init : ERR 6101: " "Failed to initialize cl_spinlock\n" ); goto Exit; } @@ -246,7 +246,7 @@ cl_event_wheel_init( if (cl_status != CL_SUCCESS) { osm_log (p_event_wheel->p_log, OSM_LOG_ERROR, - "cl_event_wheel_init : ERROR 1000: " + "cl_event_wheel_init : ERR 6102: " "Failed to initialize cl_timer\n" ); goto Exit; } @@ 
-432,7 +432,7 @@ cl_event_wheel_reg( if (cl_status != CL_SUCCESS) { osm_log (p_event_wheel->p_log, OSM_LOG_ERROR, - "cl_event_wheel_reg : ERROR 1000: " + "cl_event_wheel_reg : ERR 6103: " "Failed to start timer\n" ); goto Exit; } Index: opensm/osm_db_files.c =================================================================== --- opensm/osm_db_files.c (revision 3532) +++ opensm/osm_db_files.c (working copy) @@ -177,7 +177,7 @@ osm_db_init( if (mkdir(p_db_imp->db_dir_name, 777)) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_init: ERR 6901: " + "osm_db_init: ERR 6101: " " Fail to create the db directory:%s\n", p_db_imp->db_dir_name); OSM_LOG_EXIT( p_log ); @@ -233,7 +233,7 @@ osm_db_domain_init( if (! p_file) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_domain_init: ERR 6902: " + "osm_db_domain_init: ERR 6102: " " Fail to open the db file:%s\n", p_domain_imp->file_name); cl_free(p_domain_imp); @@ -288,7 +288,7 @@ osm_db_restore( if (! p_file) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_restore: ERR 6903: " + "osm_db_restore: ERR 6103: " " Fail to open the db file:%s\n", p_domain_imp->file_name); status = 1; @@ -327,7 +327,7 @@ osm_db_restore( if (! p_first_word) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_restore: ERR 6904: " + "osm_db_restore: ERR 6104: " " Fail to get key from line:%u : %s\n", line_num, sLine); status = 1; @@ -353,7 +353,7 @@ osm_db_restore( else if (sLine[0] != '\n') { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_restore: ERR 6905: " + "osm_db_restore: ERR 6105: " " How did we get here? line:%u : %s\n", line_num, sLine); status = 1; @@ -375,7 +375,7 @@ osm_db_restore( (st_data_t*)&p_prev_val)) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_restore: ERR 6906: " + "osm_db_restore: ERR 6106: " " Key:%s already exists in:%s with value:%s." " Removing it.\n", p_key, @@ -454,7 +454,7 @@ osm_db_store( if (! 
p_file) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_store: ERR 6907: " + "osm_db_store: ERR 6107: " " Fail to open the db file:%s for writing\n", p_domain_imp->file_name); status = 1; @@ -469,7 +469,7 @@ osm_db_store( if (status) { osm_log( p_log, OSM_LOG_ERROR, - "osm_db_store: ERR 6908: " + "osm_db_store: ERR 6108: " " Fail to rename the db file to:%s (err:%u)\n", p_domain_imp->file_name, status); }

From yael at mellanox.co.il  Sun Sep 25 00:20:19 2005
From: yael at mellanox.co.il (Yael Kalka)
Date: 25 Sep 2005 10:20:19 +0300
Subject: [openib-general] [PATCH] Opensm - discovered lids issue
Message-ID: <5zzmq1wr4c.fsf@mtl066.yok.mtl.com>

Hi Hal,

During our Windows checks we noticed an issue in the __osm_lid_mgr_init_sweep function in osm_lid_mgr.c. The initialization of max_persistent_lid and max_discovered_lid is correct only if the corresponding vector is not empty. Attached is a patch resolving this issue.

Thanks,
Yael

Signed-off-by: Yael Kalka

Index: opensm/osm_lid_mgr.c
===================================================================
--- opensm/osm_lid_mgr.c	(revision 3395)
+++ opensm/osm_lid_mgr.c	(working copy)
@@ -418,8 +418,12 @@ __osm_lid_mgr_init_sweep(
    */
 
   /* find the range of lids to scan */
-  max_discovered_lid = cl_ptr_vector_get_size(p_discovered_vec) - 1;
-  max_persistent_lid = cl_ptr_vector_get_size(p_persistent_vec) - 1;
+  max_discovered_lid = (uint16_t)cl_ptr_vector_get_size(p_discovered_vec);
+  max_persistent_lid = (uint16_t)cl_ptr_vector_get_size(p_persistent_vec);
+
+  /* but the vectors have one extra entry for lid=0 */
+  if (max_discovered_lid) max_discovered_lid--;
+  if (max_persistent_lid) max_persistent_lid--;
 
   if (max_persistent_lid > max_discovered_lid)
     max_defined_lid = max_persistent_lid;
From halr at voltaire.com  Sun Sep 25 02:22:52 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 25 Sep 2005 05:22:52 -0400
Subject: [openib-general] Re: Another opensm problem ?
In-Reply-To: <4335BA72.9080203@mellanox.co.il>
References: <1127516065.4398.1937.camel@hal.voltaire.com> <4335BA72.9080203@mellanox.co.il>
Message-ID: <1127640172.4398.27454.camel@hal.voltaire.com>

On Sat, 2005-09-24 at 16:43, Eitan Zahavi wrote:
> Well, if this is the case, then OpenSM might stop responding due to the following features:
> 1. We have had cases in the past where bad hardware continuously flooded the SM with traps.
> To protect against this kind of DoS attack, we implemented an adaptive filter in
> the SM trap receiver:
> If the exact same trap is received continuously from the same source more than 10 times
> (with no more than 5 sec between the traps), the traps are considered a DoS and are ignored.
> Please see osm_trap_rcv.c for details.
> 2. The way IB switches work is that each time one of their ports changes state, they:
> a. Set the "change bit" in the SwitchInfo
> b. Send a trap 128 to the SM. But trap 128 does not carry the changed port number.
>
> So under a test case like you describe, what can happen is:
> 1. The SM decides to ignore trap 128 from the switch, as more than 5 connect/reconnect sequences
> happen without enough "quiet" time to recover.
> 2. The SwitchInfo ChangeBit is sampled during the OSM light sweep. There is a race between the
> reading of the change bit and the clearing of it. If the connect/disconnect happens very fast,
> the change bit set by the reconnect can be cleared by the clear triggered by the disconnect.
>
> It is easy to see in the log file whether the SM ignored traps. Run with -V and look for:
> grep "Continuously received this trap" /var/log/osm.log

This is what is happening. So the policy is 5 reconnect sequences without coming up? What counts as not enough time for recovery? Is this settable?

> (for some reason I did not get any log attachments with this thread - otherwise I would
> do some analysis on it too).

I will forward them separately. They were too big for the list.

-- Hal

From halr at voltaire.com  Sun Sep 25 03:07:49 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 25 Sep 2005 06:07:49 -0400
Subject: [openib-general] Re: opensm and SIGINT
In-Reply-To: <43363773.6090605@mellanox.co.il>
References: <4335B096.2060901@mellanox.co.il> <43363773.6090605@mellanox.co.il>
Message-ID: <1127642868.4398.27995.camel@hal.voltaire.com>

Hi Eitan,

On Sun, 2005-09-25 at 01:36, Eitan Zahavi wrote:
> Hi Hal,
>
> Seems I was able to reproduce the osmtest failure (hope same one Viswa see).
                                      ^^^ an osmtest failure

I don't think it's the same one. This looks quite different.

> I have left it running for a while on a machine and after 736
> iterations it failed. Once it did - I stopped the loop.
>
> From osm.log I see:
> Sep 25 02:50:56 463143 [8003] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
> ...
> Sep 25 02:50:57 463991 [C004] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
> ...
> Sep 25 02:50:58 463751 [8003] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
>
> Sep 25 02:50:59 462938 [C004] -> __osm_sr_rcv_respond: [
> Sep 25 02:50:59 462955 [C004] -> __osm_sr_rcv_respond: Generating response with 744 records.
> ...
> Sep 25 02:50:59 463489 [C004] -> osm_vendor_send: RMPP 1 length 131000

That sounds right for 744 service records.

> Sep 25 02:50:59 463518 [C004] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
> Sep 25 02:50:59 463549 [C004] -> __osm_sa_mad_ctrl_send_err_callback: [
> Sep 25 02:50:59 463566 [C004] -> __osm_sa_mad_ctrl_send_err_callback: ERR 1A06: MAD transaction completed in error.
>
> From osmtest I get:
> Sep 25 02:50:56 461412 [4000] -> osmt_get_all_services_and_check_names: Getting All Service Records
> Sep 25 02:50:56 461429 [4000] -> osmv_query_sa: [
> Sep 25 02:50:56 461445 [4000] -> osmv_query_sa DBG:001 SVC_REC_BY_NAME
> Sep 25 02:50:56 461462 [4000] -> __osmv_send_sa_req: [
> Sep 25 02:50:56 461478 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: [
> Sep 25 02:50:56 461498 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: Using previously stored lid:0x0001 sm_lid:0x0001
> Sep 25 02:50:56 461515 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: ]
> Sep 25 02:50:56 461555 [4000] -> osm_mad_pool_get: [
> ...
> Sep 25 02:51:00 461961 [8003] -> umad_receiver: ERR 5409: send completed with error (method=12 attr=31) -- dropping.
> Sep 25 02:51:00 461979 [8003] -> umad_receiver: ERR 5410: class 0x3 LID 0x0
>
> Is it possible there is a max limit on MAD size in umad?

The memory allocation is just using calloc.

> It seems the SM fails to allocate the size of the MAD required
> for answering the "get all service records" query.

It looks like it may have run out of memory just before this.

> Another interesting message is the last message saying
> "umad_receiver: ERR 5410: class 0x3 LID 0x0" Why is the reported LID 0 ?

Not sure.
I'll look into it. This is only "cosmetic" (i.e. informational).

> Will you be able to handle the mad allocation?

Not sure what you mean by this question. I think this must be a memory leak situation. What was your osmtest invocation for this? I may have some questions about this as I investigate further.

-- Hal

From mst at mellanox.co.il  Sun Sep 25 03:45:59 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Sun, 25 Sep 2005 13:45:59 +0300
Subject: [openib-general] Re: FW: SDP problems with 64K page size
In-Reply-To: <52ll1pdjm8.fsf@cisco.com>
References: <52ll1pdjm8.fsf@cisco.com>
Message-ID: <20050925104559.GT31820@mellanox.co.il>

Quoting r. Roland Dreier :
> Subject: FW: SDP problems with 64K page size
>
> Hi, Jerome asked me to forward this on, since for some reason his
> email didn't appear when he sent it.
>
> In any case there seem to be some PAGE_SIZE dependencies in SDP.
> Libor provided a patch that fixed this up a while ago, but I don't
> know if this is the right way to handle this.
>
>  - R.

Roland, thanks very much for forwarding this, and for providing a patch to Jerome.

The problem is with the recv_size/send_size counters in SDP, which are u16, so assigning a value of 64K overflows them. The best way to fix this appears to be to bump the counters up to u32 or s32. The patch that Roland posted fixes this by reducing the buffer size to 16K, so that should work, too.

Roland, I might check in the patch that you posted to work around this problem for 64K page users, until I have a final fix ready. Is that OK with everyone?

Thanks,

--
MST

From yipeeyipeeyipeeyipee at yahoo.com  Sun Sep 25 04:16:45 2005
From: yipeeyipeeyipeeyipee at yahoo.com (yipee)
Date: Sun, 25 Sep 2005 11:16:45 +0000 (UTC)
Subject: [openib-general] OpenSM & pkeys
Message-ID:

Hi,

How do I tell the current OpenSM to set specific pkeys on the HCA ports of some of my host nodes?
Can OpenSM tell me what the pkeys are for all the host nodes in the fabric?
Thanks,
s

From eitan at mellanox.co.il  Sun Sep 25 06:02:54 2005
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Sun, 25 Sep 2005 16:02:54 +0300
Subject: [openib-general] OpenSM & pkeys
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3069237@mtlexch01.mtl.com>

Hi Yipee,

Current OpenSM implementation does not support PKey assignment. There is no programmatic way for you to get the PKey tables on the HCAs from within the OpenSM. However, if you turn on verbose mode (-V) you should be able to see the PKeyTable dump in the log file.
We had PKey manager support optionally planned for implementation this quarter. However, the pace of stabilizing OpenSM on the OpenIB stack is slower than expected, and we might not be able to make it this quarter.

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

> -----Original Message-----
> From: yipee [mailto:yipeeyipeeyipeeyipee at yahoo.com]
> Sent: Sunday, September 25, 2005 2:17 PM
> To: openib-general at openib.org
> Subject: [openib-general] OpenSM & pkeys
>
> Hi,
>
> How do I tell the current OpenSM to set specific pkeys to hca ports of some of
> my host nodes?
> Can OpenSM tell me what are the pkeys for all the host nodes in the fabric?
>
>
> Thanks,
> s

From rolandd at cisco.com  Sun Sep 25 09:39:34 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Sun, 25 Sep 2005 09:39:34 -0700
Subject: [openib-general] Re: FW: SDP problems with 64K page size
In-Reply-To: <20050925104559.GT31820@mellanox.co.il> (Michael S. Tsirkin's message of "Sun, 25 Sep 2005 13:45:59 +0300")
References: <52ll1pdjm8.fsf@cisco.com> <20050925104559.GT31820@mellanox.co.il>
Message-ID: <5264sp85kp.fsf@cisco.com>

    Michael> Roland, I might check in the patch that you posted to
    Michael> work around this problem for 64K page users, until I have
    Michael> a final fix ready. Is that OK with everyone?

Fine with me... you might want to use this patch instead (change from min() to min_t() to avoid some compile warnings):

Limit SDP buffers to 16K to avoid overflows with 64K pages.
Signed-off-by: Roland Dreier

Index: infiniband/ulp/sdp/sdp_buff.c
===================================================================
--- infiniband/ulp/sdp/sdp_buff.c	(revision 3531)
+++ infiniband/ulp/sdp/sdp_buff.c	(working copy)
@@ -330,7 +330,7 @@ static struct sdpc_buff *sdp_buff_pool_a
                 return NULL;
         }
 
-        buff->end = buff->head + PAGE_SIZE;
+        buff->end = buff->head + sdp_buff_pool_buff_size();
         buff->data = buff->head;
         buff->tail = buff->head;
         buff->sge.lkey = 0;
@@ -350,7 +350,7 @@ int sdp_buff_pool_init(void)
         int result;
 
         main_pool.pool_cache = kmem_cache_create("sdp_buff_pool",
-                                                 PAGE_SIZE,
+                                                 sdp_buff_pool_buff_size(),
                                                  0, 0, NULL, NULL);
 
         if (!main_pool.pool_cache) {
Index: infiniband/ulp/sdp/sdp_buff.h
===================================================================
--- infiniband/ulp/sdp/sdp_buff.h	(revision 3531)
+++ infiniband/ulp/sdp/sdp_buff.h	(working copy)
@@ -101,6 +101,6 @@ struct sdpc_buff_root {
  */
 #define sdp_buff_q_size(pool) ((pool)->size)
 
-#define sdp_buff_pool_buff_size() PAGE_SIZE
+#define sdp_buff_pool_buff_size() min_t(int, PAGE_SIZE, 16384)
 
 #endif /* _SDP_BUFF_H */

From nacc at us.ibm.com  Sun Sep 25 12:14:26 2005
From: nacc at us.ibm.com (Nishanth Aravamudan)
Date: Sun, 25 Sep 2005 12:14:26 -0700
Subject: [openib-general] InfiniBand compilation testing
In-Reply-To: <20050924181108.GB28695@us.ibm.com>
References: <20050924074611.GD3950@us.ibm.com> <52psqy8jt2.fsf@cisco.com> <20050924181108.GB28695@us.ibm.com>
Message-ID: <20050925191426.GA5079@us.ibm.com>

On 24.09.2005 [11:11:08 -0700], Nishanth Aravamudan wrote:
> On 24.09.2005 [10:19:53 -0700], Roland Dreier wrote:
> >     Nish> I have a prototype of something similar running right now,
> >     Nish> to help test InfiniBand, both in mainline and in the svn
> >     Nish> repo. Basically, every night (this part hasn't been set up
> >     Nish> yet, but should be nothing more than a crontab entry), I can
> >     Nish> spawn a build job for InfiniBand.
> >     Nish> Currently, it will only
> >     Nish> cover compile-testing in the following sense: build current
> >     Nish> -git with IB options set to =y and =m in x86 and ppc64; and
> >     Nish> build current -git with the current svn code linked and IB
> >     Nish> options set to =y and =m in x86 and ppc64.
> >
> > This is great, thanks! The build of latest git + latest svn might not
> > always succeed, because we try to keep svn working with the latest
> > full kernel release, but it's still very helpful to get advance
> > warning of API changes that will break our tree.
> >
> >     Nish> I have attached below my results from 2.6.14-rc2-git3. Only
> >     Nish> build failure was the gen2 kernel code under ppc64 with
> >     Nish> everything set to y.
> >
> > I just checked in a fix for this -- the pci_pretty_name() API has gone
> > away, so I removed our use of it in svn. I don't understand how your
> > other builds of git + svn succeeded though, since pci_pretty_name is
> > completely gone. Oh, I guess you'll miss link failures when building
> > modules, so functions that disappear won't break the build. Still,
> > how did the x86 =y build succeed?
>
> And, in fact, the x86 =y build also fails, same issue (now that I've
> found a consistently working machine, shouldn't run into the gcc
> problems again; we tend not to update the test machines).

2.6.14-rc2-git5 with svn 3534 builds fine on x86 and ppc64, as does mainline alone.

Thanks,
Nish
From jackm at mellanox.co.il  Sun Sep 25 23:36:41 2005
From: jackm at mellanox.co.il (Jack Morgenstein)
Date: Mon, 26 Sep 2005 09:36:41 +0300
Subject: [openib-general] Re: [PATCH] check for valid MGID in user space
In-Reply-To: <52ll1m894h.fsf@cisco.com>
References: <52ll1m894h.fsf@cisco.com>
Message-ID: <20050926063641.GA15117@mellanox.co.il>
Signed-off-by: Jack Morgenstein Index: linux-kernel/infiniband/core/verbs.c =================================================================== --- linux-kernel/infiniband/core/verbs.c (revision 3532) +++ linux-kernel/infiniband/core/verbs.c (working copy) @@ -523,16 +523,16 @@ int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid) { - return qp->device->attach_mcast ? - qp->device->attach_mcast(qp, gid, lid) : - -ENOSYS; + return !qp->device->attach_mcast ? -ENOSYS : + (gid->raw[0] != 0xFF || qp->qp_type != IB_QPT_UD) ? -EINVAL : + qp->device->attach_mcast(qp, gid, lid); } EXPORT_SYMBOL(ib_attach_mcast); int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid) { - return qp->device->detach_mcast ? - qp->device->detach_mcast(qp, gid, lid) : - -ENOSYS; + return !qp->device->detach_mcast ? -ENOSYS : + (gid->raw[0] != 0xFF || qp->qp_type != IB_QPT_UD) ? -EINVAL : + qp->device->detach_mcast(qp, gid, lid); } EXPORT_SYMBOL(ib_detach_mcast); On Sun, Sep 25, 2005 at 12:10:38AM +0300, Roland Dreier wrote: > Jack> The following patch checks validity of MGID when > Jack> attaching/detaching a QP to/from a multicast group (for > Jack> user-space only). IB spec demands that multicast gids start > Jack> with 0xFF in 0-th byte. (IB Spec v1.2,section 4.1.1 (page > Jack> 144)). > > This seems like a strange place to check the condition. Why do we > want to allow kernel callers to use invalid MGIDs? In other words it > would seem more appropriate to me to put the check in verbs.c. > > Also no Signed-off-by: line for your patch... > > - R. From yipeeyipeeyipeeyipee at yahoo.com Mon Sep 26 01:12:15 2005 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Mon, 26 Sep 2005 08:12:15 +0000 (UTC) Subject: [openib-general] Re: OpenSM & pkeys References: <6AB138A2AB8C8E4A98B9C0C3D52670E3069237@mtlexch01.mtl.com> Message-ID: Eitan Zahavi mellanox.co.il> writes: [cut] > We had PKey manager support optionally planned for implementation this quarter. 
> However, the pace of stabilizing OpenSM on OpenIB stack is slower than expected and we might not be able to make it this quarter. Thanks for the response, Can I count on PKey support to be added in Q4? Thanks, n From eitan at mellanox.co.il Mon Sep 26 04:08:19 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 26 Sep 2005 14:08:19 +0300 Subject: [openib-general] Re: OpenSM & pkeys Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E306924B@mtlexch01.mtl.com> Hi Yipee, I am sorry to disappoint you again. The level of interest in PKey support is not very high, so I am not sure the schedule will not shift again, but I hope it will not. Are there any other people interested in a PKey management feature in OpenSM? Please speak up. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: yipee [mailto:yipeeyipeeyipeeyipee at yahoo.com] > Sent: Monday, September 26, 2005 11:12 AM > To: openib-general at openib.org > Subject: [openib-general] Re: OpenSM & pkeys > > Eitan Zahavi mellanox.co.il> writes: > [cut] > > > We had PKey manager support optionally planned for implementation this > quarter. > > However, the pace of stabilizing OpenSM on OpenIB stack is slower than > expected and we might not be able to make it this quarter. > > Thanks for the response, > Can I count on PKey support to be added in Q4? > > > Thanks, > n > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed...
URL: From halr at voltaire.com Mon Sep 26 05:05:43 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 08:05:43 -0400 Subject: [openib-general] OpenSM & pkeys In-Reply-To: References: Message-ID: <1127736342.4398.49770.camel@hal.voltaire.com> On Sun, 2005-09-25 at 07:16, yipee wrote: > How do I tell the current OpenSM to set specific pkeys to hca ports of some of > my host nodes? As Eitan indicated, OpenSM does not currently support this. > Can OpenSM tell me what are the pkeys for all the host nodes in the fabric? There is diagnostic support for this. smpdump can retrieve this and other SM attribute information. -- Hal From halr at voltaire.com Mon Sep 26 05:15:25 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 08:15:25 -0400 Subject: [openib-general] Re: OpenSM & pkeys In-Reply-To: References: <6AB138A2AB8C8E4A98B9C0C3D52670E3069237@mtlexch01.mtl.com> Message-ID: <1127736353.4398.49774.camel@hal.voltaire.com> Hi, On Mon, 2005-09-26 at 04:12, yipee wrote: > Eitan Zahavi mellanox.co.il> writes: > [cut] > > > We had PKey manager support optionally planned for implementation this > quarter. > > However, the pace of stabilizing OpenSM on OpenIB stack is slower than > expected and we might not be able to make it this quarter. > > Thanks for the response, > Can I count on PKey support to be added in Q4? Can you share any requirements relative to this (or alternatively review a proposal)? The policy on PKey management (partition management) is beyond the scope of the IB spec. Thanks.
-- Hal From halr at voltaire.com Mon Sep 26 05:21:52 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 08:21:52 -0400 Subject: [openib-general] OpenSM & pkeys In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3069237@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E3069237@mtlexch01.mtl.com> Message-ID: <1127736436.4398.49792.camel@hal.voltaire.com> On Sun, 2005-09-25 at 09:02, Eitan Zahavi wrote: > Current OpenSM implementation does not support PKey assignment. > There is no programmatic way for you to get the PKey tables on the > HCAs from within the OpenSM. However, if you turn on verbose mode (-V) > you should be able to see the > > PKeyTable dump in the log file. The PKey tables can also be obtained by the diagnostic tools: smpdump in particular. -- Hal From halr at voltaire.com Mon Sep 26 05:25:05 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 08:25:05 -0400 Subject: [openib-general] Re: [PATCH] Opensm - error numbering In-Reply-To: <5z1x3dy5ya.fsf@mtl066.yok.mtl.com> References: <5z1x3dy5ya.fsf@mtl066.yok.mtl.com> Message-ID: <1127736728.4398.49855.camel@hal.voltaire.com> On Sun, 2005-09-25 at 03:14, Yael Kalka wrote: > Sorry, this mail was sent with wrong subject.... The previous one. I could see :-) No big deal... > This patch resolves the ERROR numbering issue. > As you mentioned - there was a problem with the cl_event_wheel.c. The > patch fixes the error there to match the rest of the opensm code. > Also, there was double use of numbering in osm_db_files.c and > osm_vendor_mlx_ts_anafa.c, so I changed the error numbers in osm_db_files.c Thanks. Applied. 
-- Hal From halr at voltaire.com Mon Sep 26 05:33:33 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 08:33:33 -0400 Subject: [openib-general] Re: [PATCH] Opensm - discovered lids issue In-Reply-To: <5zzmq1wr4c.fsf@mtl066.yok.mtl.com> References: <5zzmq1wr4c.fsf@mtl066.yok.mtl.com> Message-ID: <1127736901.4398.49894.camel@hal.voltaire.com> On Sun, 2005-09-25 at 03:20, Yael Kalka wrote: > During our Windows checks we noticed an issue in > the __osm_lid_mgr_init_sweep function under osm_lid_mgr.c. > The initialization of max_persistent_lid and max_discovered_lid is > correct only if the vector is not empty. > Attached is a patch resolving this issue. Thanks. Applied. -- Hal From mst at mellanox.co.il Mon Sep 26 05:49:43 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 26 Sep 2005 15:49:43 +0300 Subject: [openib-general] core and ipoib questions and oops Message-ID: <20050926124943.GF12818@mellanox.co.il> Two questions: 1. Roland, looking at ipoib_multicast, I see if (mcast->query) { ib_sa_cancel_query(mcast->query_id, mcast->query); mcast->query = NULL; ipoib_dbg_mcast(priv, "waiting for MGID " IPOIB_GID_FMT "\n", IPOIB_GID_ARG(mcast->mcmember.mgid)); wait_for_completion(&mcast->done); } What prevents ipoib_mcast_join_complete from running at the same time and changing mcast->query after we've tested it? 2. All, what happens in the core if I call ib_sa_cancel_query while the completion is running, or has already run? Is it possible that there's a bug that makes it possible for a completion callback to run twice in this case? Thanks, MST --- The following oops happens on svn rev 3535.
#ifconfig ib0 down Unable to handle kernel NULL pointer dereference at 0000000000000388 RIP: {:ib_ipoib:ipoib_mcast_join_finish+100} PGD 172cd4067 PUD 172d16067 PMD 0 Oops: 0000 [1] SMP CPU 0 Modules linked in: ib_sdp ib_cm ib_ipoib ib_sa ib_umad ib_mthca ib_mad ib_core Pid: 2399, comm: ib_mad1 Not tainted 2.6.13 RIP: 0010:[] {:ib_ipoib:ipoib_mcast_join_finish+100} RSP: 0018:ffff81017348dc58 EFLAGS: 00010282 RAX: 0000000074010000 RBX: 0000000000000000 RCX: 0000000000000010 RDX: ffff810177d93380 RSI: ffff810177d93380 RDI: ffff810177d93380 RBP: ffff810177d93380 R08: 0000000000000000 R09: ffff81017348dd38 R10: ffff81017348ddf8 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000380 R14: 0000000000000000 R15: ffff810173484898 FS: 0000000000000000(0000) GS:ffffffff8064b800(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000388 CR3: 0000000174725000 CR4: 00000000000006e0 Process ib_mad1 (pid: 2399, threadinfo ffff81017348c000, task ffff8101773e07f0) Stack: 0000000100000092 0000000000000000 0000000000000096 0000000000000296 0000000000000296 ffffffff8028b8b0 0000000000000096 ffff81017d1343c0 ffff810172fd50c0 ffff810172daba10 Call Trace:{dma_pool_free+272} {:ib_ipoib:ipoib_mcast_join_complete+43} {:ib_core:ib_unpack+198} {:ib_sa:ib_sa_mcmember_rec_callback+64} {:ib_sa:recv_handler+117} {:ib_mad:ib_mad_completion_handler+949} {:ib_mad:ib_mad_completion_handler+0} {worker_thread+478} {default_wake_function+0} {__wake_up_common+64} {default_wake_function+0} {keventd_create_kthread+0} {worker_thread+0} {keventd_create_kthread+0} {kthread+204} {child_rip+8} {keventd_create_kthread+0} {kthread+0} {child_rip+0} Code: 49 8b 7d 08 48 81 c7 b4 00 00 00 f3 a6 75 17 49 8b 45 70 8b RIP {:ib_ipoib:ipoib_mcast_join_finish+100} RSP CR2: 0000000000000388 Seems to oops at 0xda4 here: 0000000000000d40 : ipoib_mcast_join_finish(): drivers/infiniband/ulp/ipoib/ipoib_multicast.c:215 d40: 41 56 push %r14 
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:223 d42: b9 10 00 00 00 mov $0x10,%ecx d47: fc cld drivers/infiniband/ulp/ipoib/ipoib_multicast.c:215 d48: 41 55 push %r13 d4a: 41 54 push %r12 d4c: 55 push %rbp d4d: 48 89 fd mov %rdi,%rbp d50: 53 push %rbx d51: 48 83 ec 60 sub $0x60,%rsp drivers/infiniband/ulp/ipoib/ipoib_multicast.c:220 d55: 48 8b 06 mov (%rsi),%rax Seems most likely that dev is NULL in the following: static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast, struct ib_sa_mcmember_rec *mcmember) { struct net_device *dev = mcast->dev; struct ipoib_dev_priv *priv = netdev_priv(dev); -- MST From mst at mellanox.co.il Mon Sep 26 06:06:25 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 26 Sep 2005 16:06:25 +0300 Subject: [openib-general] Re: core and ipoib questions and oops In-Reply-To: <20050926124943.GF12818@mellanox.co.il> References: <20050926124943.GF12818@mellanox.co.il> Message-ID: <20050926130625.GH12818@mellanox.co.il> Quoting Michael S. Tsirkin : > #ifconfig ib0 down > > Unable to handle kernel NULL pointer dereference at 0000000000000388 > RIP: > Code: 49 8b 7d 08 48 81 c7 b4 00 00 00 f3 a6 75 17 49 8b 45 70 8b > RIP {:ib_ipoib:ipoib_mcast_join_finish+100} RSP > > CR2: 0000000000000388 Here's a bit more objdump output, and a corrected analysis: > Seems to oops at 0xda4 here: > 0000000000000d40 : > ipoib_mcast_join_finish(): [...] 
drivers/infiniband/ulp/ipoib/ipoib_multicast.c:223 d9d: 48 89 ee mov %rbp,%rsi drivers/infiniband/ulp/ipoib/ipoib_multicast.c:220 da0: 48 89 47 38 mov %rax,0x38(%rdi) drivers/infiniband/ulp/ipoib/ipoib_multicast.c:223 da4: 49 8b 7d 08 mov 0x8(%r13),%rdi da8: 48 81 c7 b4 00 00 00 add $0xb4,%rdi daf: f3 a6 repz cmpsb %es:(%rdi),%ds:(%rsi) db1: 75 17 jne dca and ipoib_multicast.c:223 is this line: if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4, sizeof (union ib_gid))) { -- MST From jackm at mellanox.co.il Mon Sep 26 06:43:02 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Mon, 26 Sep 2005 16:43:02 +0300 Subject: [openib-general] [PATCH] incorrect atomic attribute returned by ib/v_query_device Message-ID: <20050926134302.GA18503@mellanox.co.il> I'm starting to fix ib_query_device/ibv_query_device -- adding missing fields, correcting values in current fields. Enclosed is a patch for the atomic_cap field. Please review. Thanks. Jack Signed-off-by: Jack Morgenstein Index: linux-kernel/infiniband/core/uverbs_cmd.c =================================================================== --- linux-kernel/infiniband/core/uverbs_cmd.c (revision 3532) +++ linux-kernel/infiniband/core/uverbs_cmd.c (working copy) @@ -199,6 +199,7 @@ resp.max_pkeys = attr.max_pkeys; resp.local_ca_ack_delay = attr.local_ca_ack_delay; resp.phys_port_cnt = file->device->ib_dev->phys_port_cnt; + resp.atomic_cap = attr.atomic_cap; if (copy_to_user((void __user *) (unsigned long) cmd.response, &resp, sizeof resp)) Index: linux-kernel/infiniband/hw/mthca/mthca_dev.h =================================================================== --- linux-kernel/infiniband/hw/mthca/mthca_dev.h (revision 3532) +++ linux-kernel/infiniband/hw/mthca/mthca_dev.h (working copy) @@ -148,6 +148,7 @@ int reserved_mcgs; int num_pds; int reserved_pds; + u32 flags; u8 port_width_cap; }; Index: linux-kernel/infiniband/hw/mthca/mthca_main.c =================================================================== 
--- linux-kernel/infiniband/hw/mthca/mthca_main.c (revision 3532) +++ linux-kernel/infiniband/hw/mthca/mthca_main.c (working copy) @@ -172,6 +172,7 @@ mdev->limits.reserved_uars = dev_lim->reserved_uars; mdev->limits.reserved_pds = dev_lim->reserved_pds; mdev->limits.port_width_cap = dev_lim->max_port_width; + mdev->limits.flags = dev_lim->flags; /* IB_DEVICE_RESIZE_MAX_WR not supported by driver. May be doable since hardware supports it for SRQ. Index: linux-kernel/infiniband/hw/mthca/mthca_provider.c =================================================================== --- linux-kernel/infiniband/hw/mthca/mthca_provider.c (revision 3532) +++ linux-kernel/infiniband/hw/mthca/mthca_provider.c (working copy) @@ -99,6 +99,8 @@ props->max_qp_rd_atom = 1 << mdev->qp_table.rdb_shift; props->max_qp_init_rd_atom = 1 << mdev->qp_table.rdb_shift; props->local_ca_ack_delay = mdev->limits.local_ca_ack_delay; + props->atomic_cap = (mdev->limits.flags & DEV_LIM_FLAG_ATOMIC) ? + IB_ATOMIC_HCA : IB_ATOMIC_NONE; err = 0; out: From rolandd at cisco.com Mon Sep 26 07:17:42 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 26 Sep 2005 07:17:42 -0700 Subject: [openib-general] 2.6.14 heads up: ip_dev_find() not exported Message-ID: <52slvr7w1l.fsf@cisco.com> I noticed while compiling against an up-to-date kernel tree that SDP and IBAT both use the function ip_dev_find(). The EXPORT_SYMBOL for this function was removed during the 2.6.14 devel cycle. I haven't looked yet at what this function does, how SDP and IBAT use it or what it could be replaced by. But now would be a good time to figure out whether we need to ask for it to be re-exported, or if there's a better alternative to do whatever it does for us. - R. From mst at mellanox.co.il Mon Sep 26 07:41:25 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Mon, 26 Sep 2005 17:41:25 +0300 Subject: [openib-general] Re: 2.6.14 heads up: ip_dev_find() not exported In-Reply-To: <52slvr7w1l.fsf@cisco.com> References: <52slvr7w1l.fsf@cisco.com> Message-ID: <20050926144125.GN12818@mellanox.co.il> Quoting r. Roland Dreier : > Subject: 2.6.14 heads up: ip_dev_find() not exported > > I noticed while compiling against an up-to-date kernel tree that SDP > and IBAT both use the function ip_dev_find(). The EXPORT_SYMBOL for > this function was removed during the 2.6.14 devel cycle. > > I haven't looked yet at what this function does, how SDP and IBAT use > it or what it could be replaced by. But now would be a good time to > figure out whether we need to ask for it to be re-exported, or if > there's a better alternative to do whatever it does for us. > > - R. If ip_route_output_key resolves to a loopback device, sdp uses ip_dev_find to try and locate the actual hardware device that the source ip address is for. Do you know of a better way to do this? I think we could get by with just dev_get_by_index, I'll have to investigate this. -- MST From mst at mellanox.co.il Mon Sep 26 07:43:29 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 26 Sep 2005 17:43:29 +0300 Subject: [openib-general] [PATCH] mthca: fix clr_int calculation Message-ID: <20050926144329.GP12818@mellanox.co.il> ----- Forwarded message from Leonid Keller ----- Subject: a bug ? Date: Wed, 21 Sep 2005 21:22:05 +0300 From: "Leonid Keller" in mthca_init_eq_table() there is code: dev->eq_table.clr_int = dev->clr_base + (dev->eq_table.inta_pin < 31 ? 4 : 0); In VAPI i saw: dev->eq_table.clr_int = dev->clr_base + (dev->eq_table.inta_pin < 32 ? 4 : 0); It's a bug or i'm wrong somewhere ? ----- End forwarded message ----- Roland, the following makes more sense, does it not? --- Fix clr_int calculation. Signed-off-by: Michael S. 
Tsirkin Index: linux-kernel/drivers/infiniband/hw/mthca/mthca_eq.c =================================================================== --- linux-kernel.orig/drivers/infiniband/hw/mthca/mthca_eq.c 2005-09-26 17:17:01.000000000 +0300 +++ linux-kernel/drivers/infiniband/hw/mthca/mthca_eq.c 2005-09-26 17:17:08.000000000 +0300 @@ -838,7 +838,7 @@ int __devinit mthca_init_eq_table(struct dev->eq_table.clr_mask = swab32(1 << (dev->eq_table.inta_pin & 31)); dev->eq_table.clr_int = dev->clr_base + - (dev->eq_table.inta_pin < 31 ? 4 : 0); + (dev->eq_table.inta_pin < 32 ? 4 : 0); } dev->eq_table.arm_mask = 0; -- MST From mst at mellanox.co.il Mon Sep 26 07:55:52 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 26 Sep 2005 17:55:52 +0300 Subject: [openib-general] [PATCH] mthca: mthca_map_cmd off by one (was Fwd: a bug ?) Message-ID: <20050926145552.GQ12818@mellanox.co.il> ----- Forwarded message from Leonid Keller ----- Subject: a bug ? Date: Mon, 26 Sep 2005 17:17:05 +0300 From: "Leonid Keller" look at the following code in mthca_map_cmd(): ----------------------------------------------------------------------- pages[nent * 2 + 1] = cpu_to_be64((mthca_icm_addr(&iter) + (i << lg)) | (lg - 12)); ... if (nent == MTHCA_MAILBOX_SIZE / 16) { err = mthca_cmd(dev, mailbox->dma, nent, 0, op, CMD_TIME_CLASS_B, status); ... nent = 0; } ----------------------------------------------------------------------------- On nent = 256 'pages[nent * 2 + 1]' overruns mailbox ! It is to be if (nent == MTHCA_MAILBOX_SIZE / 16 - 1) Right ? ----- End forwarded message ----- Looks like a bug. The following patch is untested: I dont have memfree hardware at the moment. Roland, does this make sense to you? --- Fix off by one bug in mthca_map_cmd. Signed-off-by: Michael S. 
Tsirkin Index: linux-kernel/drivers/infiniband/hw/mthca/mthca_cmd.c =================================================================== --- linux-kernel.orig/drivers/infiniband/hw/mthca/mthca_cmd.c 2005-08-29 10:22:10.000000000 +0300 +++ linux-kernel/drivers/infiniband/hw/mthca/mthca_cmd.c 2005-09-26 17:50:54.000000000 +0300 @@ -616,7 +616,7 @@ static int mthca_map_cmd(struct mthca_de ts += 1 << (lg - 10); ++tc; - if (nent == MTHCA_MAILBOX_SIZE / 16) { + if (nent == MTHCA_MAILBOX_SIZE / 16 - 1) { err = mthca_cmd(dev, mailbox->dma, nent, 0, op, CMD_TIME_CLASS_B, status); if (err || *status) -- MST From jackm at mellanox.co.il Mon Sep 26 08:06:22 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Mon, 26 Sep 2005 18:06:22 +0300 Subject: [openib-general] RE: core and ipoib questions and oops Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E319AD02@mtlexch01.mtl.com> Problem is at ipoib_multicast.c:223 : if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4, a. r14 contains mcast->dev: drivers/infiniband/ulp/ipoib/ipoib_multicast.c:216 41a8: 4c 8b b7 f0 00 00 00 mov 0xf0(%rdi),%r14 NOTE THAT r14 is ZERO. This implies that we still have a pointer to the mcast structure, but the entire structure has been zeroed out (the code only sets mcast->dev at mcast struct allocation time -- it never zeroes out mcast->dev). This could happen, for example, if mcast was freed, then re-allocated and zeroed. b. r13 contains priv (which is obtained via the netdev_priv macro) -- this explains the 0x380 offset from NULL in r13: include/linux/netdevice.h:488 41b2: 4d 8d ae 80 03 00 00 lea 0x380(%r14),%r13 We can conclude that the mcast group was deleted, but an mcast completion still got delivered from below by ib_mad (the thread which failed was ib_mad1). BTW -- this is the same sort of kernel oops that I sent to Roland on 20.9.05 (we also saw 0x0000000000380 in r13 there). 
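The faulting address in the oops can be reproduced from this reading with simple pointer arithmetic. A minimal sketch, taking the two offsets from the disassembly quoted in this thread as given for this particular build (0x380 from `lea 0x380(%r14),%r13`, which computes `netdev_priv(dev)`, and 0x8 from `mov 0x8(%r13),%rdi`, which loads `priv->dev` at ipoib_multicast.c:223). The offsets are build-specific, not constants of the kernel API, and the helper name `faulting_address` is hypothetical.

```c
#include <stdint.h>

/* Offsets as they appear in the disassembly for this build only. */
#define NETDEV_PRIV_OFFSET 0x380u /* lea 0x380(%r14),%r13 */
#define PRIV_DEV_OFFSET    0x8u   /* mov 0x8(%r13),%rdi   */

/* Address read when loading priv->dev, given the value of mcast->dev
 * (held in r14) at the time ipoib_mcast_join_finish() runs. */
static uintptr_t faulting_address(uintptr_t mcast_dev)
{
	uintptr_t priv = mcast_dev + NETDEV_PRIV_OFFSET; /* value in r13 */

	return priv + PRIV_DEV_OFFSET;                   /* reported as CR2 */
}
```

With `mcast_dev == 0` this gives 0x388, matching both R13 = 0x380 and CR2 = 0x388 in the oops, which supports the conclusion that `mcast->dev` itself was NULL rather than `priv` being corrupted.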
This might happen, for example, if when invoking the restart task, wait_for_completion() incorrectly terminated in ipoib_mcast_stop_thread() (ipoib_multicast.c:836), then the multicast group was freed ( ipoib_mcast_free() at ipoib_multicast.c:915), and finally a callback was invoked after the free. Jack > > -----Original Message----- > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > Sent: Monday, September 26, 2005 4:08 PM > To: Tziporet Koren; Jack Morgenstein > Subject: Fwd: core and ipoib questions and oops > > > Here's an oops I got recently. > You might want to look into it. > > ----- Forwarded message from "Michael S. Tsirkin" > ----- > > Subject: core and ipoib questions and oops > Date: Mon, 26 Sep 2005 15:49:43 +0300 > From: "Michael S. Tsirkin" > > Two questions: > > 1. Roland, looking at ipoib_multicast, I see > if (mcast->query) { > ib_sa_cancel_query(mcast->query_id, > mcast->query); > mcast->query = NULL; > ipoib_dbg_mcast(priv, "waiting for MGID " > IPOIB_GID_FMT "\n", > > IPOIB_GID_ARG(mcast->mcmember.mgid)); > wait_for_completion(&mcast->done); > } > > what prevents ipoib_mcast_join_complete from running > at the same time and changing mcast->query after we've tested it? > > 2. All, what happends in the core if I call ib_sa_cancel_query > while the completion is running, or has already run? > Is it possible that there's a bug that makes it possible for > a completion callback to run twice in this case? > > Thanks, > MST > > --- > > The following oops happends on svn rev 3535. 
> > #ifconfig ib0 down > > Unable to handle kernel NULL pointer dereference at 0000000000000388 > RIP: > {:ib_ipoib:ipoib_mcast_join_finish+100} > PGD 172cd4067 PUD 172d16067 PMD 0 > Oops: 0000 [1] SMP > CPU 0 > Modules linked in: ib_sdp ib_cm ib_ipoib ib_sa ib_umad ib_mthca ib_mad > ib_core > Pid: 2399, comm: ib_mad1 Not tainted 2.6.13 > RIP: 0010:[] > {:ib_ipoib:ipoib_mcast_join_finish+100} > RSP: 0018:ffff81017348dc58 EFLAGS: 00010282 > RAX: 0000000074010000 RBX: 0000000000000000 RCX: 0000000000000010 > RDX: ffff810177d93380 RSI: ffff810177d93380 RDI: ffff810177d93380 > RBP: ffff810177d93380 R08: 0000000000000000 R09: ffff81017348dd38 > R10: ffff81017348ddf8 R11: 0000000000000001 R12: 0000000000000000 > R13: 0000000000000380 R14: 0000000000000000 R15: ffff810173484898 > FS: 0000000000000000(0000) GS:ffffffff8064b800(0000) > knlGS:0000000000000000 > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > CR2: 0000000000000388 CR3: 0000000174725000 CR4: 00000000000006e0 > Process ib_mad1 (pid: 2399, threadinfo ffff81017348c000, task > ffff8101773e07f0) > Stack: 0000000100000092 0000000000000000 0000000000000096 > 0000000000000296 > 0000000000000296 ffffffff8028b8b0 0000000000000096 > ffff81017d1343c0 > ffff810172fd50c0 ffff810172daba10 > Call Trace:{dma_pool_free+272} > {:ib_ipoib:ipoib_mcast_join_complete+43} > {:ib_core:ib_unpack+198} > {:ib_sa:ib_sa_mcmember_rec_callback+64} > {:ib_sa:recv_handler+117} > {:ib_mad:ib_mad_completion_handler+949} > {:ib_mad:ib_mad_completion_handler+0} > {worker_thread+478} > {default_wake_function+0} > {__wake_up_common+64} > {default_wake_function+0} > {keventd_create_kthread+0} > {worker_thread+0} > {keventd_create_kthread+0} > {kthread+204} > {child_rip+8} > {keventd_create_kthread+0} > {kthread+0} {child_rip+0} > > > Code: 49 8b 7d 08 48 81 c7 b4 00 00 00 f3 a6 75 17 49 8b 45 70 8b > RIP {:ib_ipoib:ipoib_mcast_join_finish+100} RSP > > CR2: 0000000000000388 > > > -- > MST > _______________________________________________ > 
openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > ----- End forwarded message ----- -------------- next part -------------- An HTML attachment was scrubbed... URL: From yipeeyipeeyipeeyipee at yahoo.com Mon Sep 26 08:00:52 2005 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Mon, 26 Sep 2005 15:00:52 +0000 (UTC) Subject: [openib-general] Re: OpenSM & pkeys References: <6AB138A2AB8C8E4A98B9C0C3D52670E3069237@mtlexch01.mtl.com> <1127736436.4398.49792.camel@hal.voltaire.com> Message-ID: Hal Rosenstock voltaire.com> writes: [cut] > The PKey tables can also be obtained by the diagnostic tools: smpdump in > particular. OK, But currently there's no way (at least that I know of) to set PKeys for the hosts ports. y From jlentini at netapp.com Mon Sep 26 08:16:58 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 26 Sep 2005 11:16:58 -0400 (EDT) Subject: [openib-general] Re: 2.6.14 heads up: ip_dev_find() not exported In-Reply-To: <20050926144125.GN12818@mellanox.co.il> References: <52slvr7w1l.fsf@cisco.com> <20050926144125.GN12818@mellanox.co.il> Message-ID: On Mon, 26 Sep 2005, Michael S. Tsirkin wrote: > Quoting r. Roland Dreier : > > Subject: 2.6.14 heads up: ip_dev_find() not exported > > > > I noticed while compiling against an up-to-date kernel tree that SDP > > and IBAT both use the function ip_dev_find(). The EXPORT_SYMBOL for > > this function was removed during the 2.6.14 devel cycle. > > > > I haven't looked yet at what this function does, how SDP and IBAT use > > it or what it could be replaced by. But now would be a good time to > > figure out whether we need to ask for it to be re-exported, or if > > there's a better alternative to do whatever it does for us. > > > > - R. 
> > If ip_route_output_key resolves to a loopback device, > sdp uses ip_dev_find to try and locate the actual hardware device > that the source ip address is for. > > Do you know of a better way to do this? > > I think we could get by with just dev_get_by_index, I'll have to > investigate this. FYI: I found a bug in IBAT's use of dev_get_by_index(). See: http://openib.org/pipermail/openib-general/2005-September/011668.html From yipeeyipeeyipeeyipee at yahoo.com Mon Sep 26 08:16:24 2005 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Mon, 26 Sep 2005 15:16:24 +0000 (UTC) Subject: [openib-general] Re: OpenSM & pkeys References: <6AB138A2AB8C8E4A98B9C0C3D52670E3069237@mtlexch01.mtl.com> <1127736353.4398.49774.camel@hal.voltaire.com> Message-ID: Hal Rosenstock voltaire.com> writes: [cut] > Can you share any requirements relative to this (or alternatively review > a proposal) ? The policy on PKey management (partition management) is > beyond the IB spec. I'm looking for a way to set the PKey tables for some hosts ports in my fabric. The other hosts' (the ones I'm not interested in) PKey tables shouldn't be changed/touched. The reason I'm looking into this partitioning issue is that I want to partition my fabric into several disjointed groups of hosts that can't/don't interact with each other. Is there any proposal you want me to review? Where can I read it? thanks, y From Administrator at openib.org Mon Sep 26 08:30:58 2005 From: Administrator at openib.org (Administrator at openib.org) Date: Mon, 26 Sep 2005 08:30:58 -0700 Subject: [openib-general] [MailServer Notification]To Recipient virus found and action taken. Message-ID: <019501c5c2af$4adcb210$faf9a8c0@qlogic.org> ScanMail for Microsoft Exchange has detected virus-infected attachment(s). 
Sender = openib-general-bounces at openib.org Recipient(s) = openib-general at openib.org Subject = [openib-general] *DETECTED* ONLINE USER VIOLATION Scanning time = 9/26/2005 8:30:57 AM Engine/Pattern = 7.510-1002/2.857.00 Action on virus found: The attachment account-details.zip contains WORM_MYTOB.EI virus. ScanMail has Deleted it. Warning to recipient. ScanMail has detected a virus. From Administrator at openib.org Mon Sep 26 08:31:06 2005 From: Administrator at openib.org (Administrator at openib.org) Date: Mon, 26 Sep 2005 08:31:06 -0700 Subject: [openib-general] [MailServer Notification]To Recipient virus found and action taken. Message-ID: <019801c5c2af$501647f0$faf9a8c0@qlogic.org> ScanMail for Microsoft Exchange has detected virus-infected attachment(s). Sender = openib-general-bounces at openib.org Recipient(s) = openib-general at openib.org Subject = [openib-general] *DETECTED* Online User Violation Scanning time = 9/26/2005 8:31:06 AM Engine/Pattern = 7.510-1002/2.857.00 Action on virus found: The attachment account-details.zip contains WORM_MYTOB.EI virus. ScanMail has Deleted it. Warning to recipient. ScanMail has detected a virus. From eitan at mellanox.co.il Mon Sep 26 08:37:35 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Mon, 26 Sep 2005 18:37:35 +0300 Subject: [openib-general] Re: OpenSM & pkeys Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3069256@mtlexch01.mtl.com> Hi Yipee, A Tcl API for sending and receiving MADs is available under https://openib.org/svn/gen2/utils/src/linux-user/ibis (IBIS == IB Inband Services) If you follow on the autogen.sh && configure && make && make install you should get /usr/local/bin/ibis executable. Then you need to basically: Run ibis from shell. And get the Tcl prompt. Then run: ibis_init ibis_get_local_ports_info # select one of the guids above that are ACTIVE and use in next command ibis_set_port # set the PKey you want to use smPkeyTableMad configure -pkey_entry {0xffff 0x1234 0x8555 .... 
the set of pkeys } # Then send it over using directed route or lid smPkeyTableMad setByDr smPkeyTableMad setByLid # Where: # dr-path = list of ports the packets has to exit through # 1,2 will leave the first node through port 1 and the second node through port 2 # lid = the target node lid # port num = the target port number to set pkey on # block num = the block of pkeys to set (you should probably use always 0 ) # you can figure out what other capabilities are there by using Tcl command: # info commands Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: yipee [mailto:yipeeyipeeyipeeyipee at yahoo.com] > Sent: Monday, September 26, 2005 6:01 PM > To: openib-general at openib.org > Subject: [openib-general] Re: OpenSM & pkeys > > Hal Rosenstock voltaire.com> writes: > > [cut] > > The PKey tables can also be obtained by the diagnostic tools: smpdump in > > particular. > > OK, But currently there's no way (at least that I know of) to set PKeys for the > hosts ports. > > > y > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Mon Sep 26 08:28:59 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 11:28:59 -0400 Subject: [openib-general] Re: OpenSM & pkeys In-Reply-To: References: <6AB138A2AB8C8E4A98B9C0C3D52670E3069237@mtlexch01.mtl.com> <1127736353.4398.49774.camel@hal.voltaire.com> Message-ID: <1127748353.4398.527.camel@hal.voltaire.com> On Mon, 2005-09-26 at 11:16, yipee wrote: > Is there any proposal you want me to review? Where can I read it? It hasn't been written yet. 
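For context on what such partitioning enforces: in the IBA partitioning model, bit 15 of a P_Key is the membership type (1 = full, 0 = limited) and the low 15 bits name the partition. Two P_Keys match only if they name the same partition and at least one side is a full member, so two limited members cannot talk to each other. A minimal sketch of that rule, with hypothetical function names (this is not OpenSM code), and with the assumption that unused table entries are stored as 0 and skipped:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u /* bit 15: 1 = full, 0 = limited */
#define PKEY_BASE_MASK   0x7FFFu /* low 15 bits: partition number */

/* Same partition, and at least one side is a full member. */
static bool pkeys_match(uint16_t a, uint16_t b)
{
	if ((a & PKEY_BASE_MASK) != (b & PKEY_BASE_MASK))
		return false;
	return ((a | b) & PKEY_FULL_MEMBER) != 0;
}

/* True if the two P_Key tables have at least one matching pair, i.e. the
 * ports share a partition. Entries equal to 0 are treated as unused
 * (an assumption of this sketch). */
static bool ports_share_partition(const uint16_t *ta, size_t na,
				  const uint16_t *tb, size_t nb)
{
	for (size_t i = 0; i < na; i++) {
		if (ta[i] == 0)
			continue;
		for (size_t j = 0; j < nb; j++)
			if (tb[j] != 0 && pkeys_match(ta[i], tb[j]))
				return true;
	}
	return false;
}
```

Under this rule, disjoint groups of hosts fall out of giving each group its own partition number: a port holding only 0x8001 cannot reach a port holding only 0x8002, while a limited member 0x0001 can still reach the full member 0x8001.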
-- Hal From ardavis at ichips.intel.com Mon Sep 26 08:36:56 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Mon, 26 Sep 2005 08:36:56 -0700 Subject: [openib-general] 3513 DAPL is Broken In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F00059AF893@orsmsx408> References: <1AC79F16F5C5284499BB9591B33D6F00059AF893@orsmsx408> Message-ID: <43381598.4090506@ichips.intel.com> Woodruff, Robert J wrote: >Seems to hang around the time of the modify QP. > > I just pulled the latest (3541) and ran on my 2.6.13 systems and saw no problems with MPI. -arlin >ibv_rc_pingpong seems to work OK and also your >DAPL-socket CM version that you gave me yesterday seems >to work, but the DAPL I pulled from SVN that uses the IB AT/CM >has the following problem. > >I am starting to think that pushing out your socket CM >version until things stabilize with the IBAT/IBCM version >might be worth considering, so that people that >want to use DAPL now have something that is reliable. > >woody > > > > > > From mst at mellanox.co.il Mon Sep 26 08:45:43 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 26 Sep 2005 18:45:43 +0300 Subject: [openib-general] Re: Re: 2.6.14 heads up: ip_dev_find() not exported In-Reply-To: References: Message-ID: <20050926154543.GU12818@mellanox.co.il> Quoting James Lentini : > > I think we could get by with just dev_get_by_index, I'll have to > > investigate this. > > FYI: I found a bug in IBAT's use of dev_get_by_index(). See: > > http://openib.org/pipermail/openib-general/2005-September/011668.html > Right. We should just use dev_getfirstbyhwtype and ignore the fact that it could be down, or manually scan the dev_base list instead. -- MST From halr at voltaire.com Mon Sep 26 08:47:25 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 11:47:25 -0400 Subject: [openib-general] Page allocation failures & kdapltest oops Message-ID: <1127749644.4398.878.camel@hal.voltaire.com> Hi James, I keep getting the following when running kdapltest. 
This is similar to what I saw before and reported a couple of times but now seems more consistent in occurring. -- Hal Sep 26 10:29:29 hal kernel: DT_Mdep_Thread_: page allocation failure. order:0, mode:0x20 Sep 26 10:29:29 hal kernel: [] __alloc_pages+0x2f2/0x490 Sep 26 10:29:29 hal kernel: [] kmem_getpages+0x31/0xb0 Sep 26 10:29:29 hal kernel: [] cache_grow+0x139/0x360 Sep 26 10:29:29 hal kernel: [] cache_alloc_refill+0x151/0x340 Sep 26 10:29:29 hal kernel: [] DT_handle_send_op+0x2fa/0x400 [kdapltest] Sep 26 10:29:29 hal kernel: [] __kmalloc+0xb4/0xf0 Sep 26 10:29:29 hal kernel: [] DT_Mdep_Malloc+0x25/0x60 [kdapltest] Sep 26 10:29:29 hal kernel: [] DT_Tdep_PT_Printf+0x16/0x1b0 [kdapltest] Sep 26 10:29:29 hal kernel: [] DT_Transaction_Run+0x607/0xb60 [kdapltest] Sep 26 10:29:29 hal kernel: [] DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest] Sep 26 10:29:29 hal kernel: [] DT_Transaction_Main+0x1388/0x21a0 [kdapltest] Sep 26 10:29:29 hal kernel: [] kernel_map_pages+0x28/0x60 Sep 26 10:29:30 hal kernel: [] cache_free_debugcheck+0x196/0x2d0 Sep 26 10:29:30 hal kernel: [] DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest] Sep 26 10:29:30 hal kernel: [] DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest] Sep 26 10:29:30 hal kernel: [] kernel_thread_helper+0x5/0x10 Sep 26 10:29:30 hal kernel: Mem-info: Sep 26 10:29:30 hal kernel: DMA per-cpu: Sep 26 10:29:30 hal kernel: cpu 0 hot: low 2, high 6, batch 1 used:2 Sep 26 10:29:30 hal kernel: cpu 0 cold: low 0, high 2, batch 1 used:1 Sep 26 10:29:30 hal kernel: Normal per-cpu: Sep 26 10:29:30 hal kernel: cpu 0 hot: low 62, high 186, batch 31 used:92 Sep 26 10:29:30 hal kernel: cpu 0 cold: low 0, high 62, batch 31 used:44 Sep 26 10:29:30 hal kernel: HighMem per-cpu: empty Sep 26 10:29:30 hal kernel: Free pages: 1636kB (0kB HighMem) Sep 26 10:29:30 hal kernel: Active:28634 inactive:3039 dirty:41 writeback:0 unstable:0 free:409 slab:28426 mapped:31411 pagetables:543 Sep 26 10:29:30 hal kernel: DMA free:1008kB min:128kB low:160kB 
high:192kB active:2232kB inactive:4kB present:16384kB pages_scanned:6 all_unreclaimable? no Sep 26 10:29:30 hal kernel: lowmem_reserve[]: 0 240 240 Sep 26 10:29:30 hal kernel: Normal free:628kB min:1920kB low:2400kB high:2880kB active:112304kB inactive:12152kB present:245760kB pages_scanned:0 all_unreclaimable? no Sep 26 10:29:30 hal kernel: lowmem_reserve[]: 0 0 0 Sep 26 10:29:30 hal kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no Sep 26 10:29:30 hal kernel: lowmem_reserve[]: 0 0 0 Sep 26 10:29:30 hal kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1008kB Sep 26 10:29:30 hal kernel: Normal: 1*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 628kB Sep 26 10:29:30 hal kernel: HighMem: empty Sep 26 10:29:30 hal kernel: Swap cache: add 135242, delete 121268, find 84138/98778, race 0+0 Sep 26 10:29:30 hal kernel: Free swap = 341804kB Sep 26 10:29:30 hal kernel: Total swap = 522104kB Sep 26 10:29:30 hal kernel: Free swap: 341804kB Sep 26 10:29:30 hal kernel: 65536 pages of RAM Sep 26 10:29:30 hal kernel: 0 pages of HIGHMEM Sep 26 10:29:30 hal kernel: 1533 reserved pages Sep 26 10:29:30 hal kernel: 30650 pages shared Sep 26 10:29:30 hal kernel: 13974 pages swap cached Sep 26 10:29:30 hal kernel: 41 pages dirty Sep 26 10:29:30 hal kernel: 0 pages writeback Sep 26 10:29:30 hal kernel: 31411 pages mapped Sep 26 10:29:30 hal kernel: 28426 pages slab Sep 26 10:29:30 hal kernel: 543 pages pagetables Sep 26 10:29:30 hal kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004 Sep 26 10:29:30 hal kernel: printing eip: Sep 26 10:29:30 hal kernel: c022934b Sep 26 10:29:30 hal kernel: *pde = 07efd067 Sep 26 10:29:30 hal kernel: *pte = 00000000 Sep 26 10:29:30 hal kernel: Oops: 0002 [#1] Sep 26 10:29:30 hal kernel: DEBUG_PAGEALLOC Sep 26 10:29:30 hal kernel: Modules linked in: kdapltest 
kdapl_ib ib_cm ib_at kdapl ib_ipoib ib_sa ib_umad ide_cd cdrom lp ipv6 autofs parport_pc parport uhci_hcd ehci_hcd ib_mthca ib_mad ib_core ohci_hcd eepro100 mii usbcore evdev Sep 26 10:29:30 hal kernel: CPU: 0 Sep 26 10:29:30 hal kernel: EIP: 0060:[] Not tainted VLI Sep 26 10:29:30 hal kernel: EFLAGS: 00010283 (2.6.13) Sep 26 10:29:30 hal kernel: EIP is at vsnprintf+0x4b/0x4f0 Sep 26 10:29:30 hal kernel: eax: 00000054 ebx: cd240f78 ecx: 00000000 edx: d0bc70c0 Sep 26 10:29:30 hal kernel: esi: 00000004 edi: 00000001 ebp: 00000103 esp: c006dd78 Sep 26 10:29:30 hal kernel: ds: 007b es: 007b ss: 0068 Sep 26 10:29:30 hal kernel: Process DT_Mdep_Thread_ (pid: 2230, threadinfo=c006c000 task=c052cac0) Sep 26 10:29:30 hal kernel: Stack: cffff740 00000020 00000000 d0bc16d5 00000104 00000000 00000001 d0bc16d5 Sep 26 10:29:30 hal kernel: 00000104 00000020 cd240f78 00000000 00000001 c2538000 d0bc258b 00000004 Sep 26 10:29:30 hal kernel: 00000100 d0bc70c0 c006dde8 00000000 0000007b ca254f78 cd240f78 cef6e060 Sep 26 10:29:30 hal kernel: Call Trace: Sep 26 10:29:30 hal kernel: [] DT_Mdep_Malloc+0x25/0x60 [kdapltest] Sep 26 10:29:30 hal kernel: [] DT_Mdep_Malloc+0x25/0x60 [kdapltest] Sep 26 10:29:30 hal kernel: [] DT_Tdep_PT_Printf+0x3b/0x1b0 [kdapltest] Sep 26 10:29:30 hal kernel: [] DT_Transaction_Run+0x607/0xb60 [kdapltest] Sep 26 10:29:30 hal kernel: [] DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest] Sep 26 10:29:30 hal kernel: [] DT_Transaction_Main+0x1388/0x21a0 [kdapltest] Sep 26 10:29:30 hal kernel: [] kernel_map_pages+0x28/0x60 Sep 26 10:29:30 hal kernel: [] cache_free_debugcheck+0x196/0x2d0 Sep 26 10:29:30 hal kernel: [] DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest] Sep 26 10:29:30 hal kernel: [] DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest] Sep 26 10:29:30 hal kernel: [] kernel_thread_helper+0x5/0x10 Sep 26 10:29:30 hal kernel: Code: f0 48 39 c5 73 0d 89 f2 f7 da bd ff ff ff ff 89 54 24 40 8b 54 24 44 80 3a 00 74 23 8d 74 26 00 0f b6 02 3c 25 74 3d 39 ee 77 06 
<88> 06 8b 54 24 44 46 89 d0 42 89 54 24 44 80 78 01 00 75 e1 39 From halr at voltaire.com Mon Sep 26 08:52:23 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 11:52:23 -0400 Subject: [openib-general] Re: OpenSM Routing Algorithms Scalability and Enhancements In-Reply-To: <431DEB78.4040800@mellanox.co.il> References: <431DEB78.4040800@mellanox.co.il> Message-ID: <1127749941.4398.957.camel@hal.voltaire.com> Hi Eitan, I finally got a chance to read this over. Here are some comments: On Tue, 2005-09-06 at 15:18, Eitan Zahavi wrote: > Hi All, > > As we are about to start working on the fast routing algorithms, > here is the writeup of the proposed algorithms for your review. This appears to be an update to what was published last May. > The plan is to start development once the merge of 1.8.0 into the > trunk is done. > > ______________________________________________________________________ > OpenSM Unicast Routing Enhancements for Scalability > ===================================================== > > Authors: Eitan Zahavi, Yael Kalka > Date: Aug 2005. > > Table of contents: > 1. Overview > 2. Notation > 3. Current Algorithms > 4. Proposed Routing Algorithms > 5. Min Hop Tables Implementation > 6. Incremental Routing > 7. Routing Persistence > > 1. Overview: > -- > > OpenSM currently uses a two-stage routing algorithm for unicast > forwarding table calculation. As shown later, these algorithms are > O(N^3). The observed run time of the OpenSM routing stage was ~2.5 min on an > 1100-node cluster. On what processor architecture (and what speed was the CPU) ? Also how much memory was in that machine (memory consumption of OpenSM) ? > The purpose of this memo is to present to the community > the proposed work for enhancing the OpenSM routing engine. > > 2.
Notation: > -- > The following notations are used throughout this document: > S = Number of switch devices in the system > P = Number of ports each switch node has > H = Number of HCA ports connected to the fabric > L = Number of HCAs connected to each leaf switch device. > Normal values are 1/2P to 3/4P Is L the proportion of leaf switch ports connected to HCAs rather than to other switches ? That's what it looks like from 1/2P -> 3/4P. > D = Fat Tree depth > > 3. Current Algorithms: > -- > OpenSM provides two routing algorithms: Minimal Hop and Up/Down. Neither > of them scales with cluster size, and both can consume large run > time (minutes) and memory (GB). This section provides meta code for > these algorithms and their order calculation. > > 3.1 Min Hop algorithm analysis: > The Min Hop algorithm is divided into two stages: computation of > min-hop tables on every switch and LFT output port assignment. > > Step 1: Computation of Min-Hop Tables on each switch > > The memory consumed is S*(S+H)*(P+2) bytes. On a 10K-node cluster with > 2500 switch devices this ends up as 812 MByte (using LMC=0). > > Meta algorithm: > For each HCA mark its remote neighbor switch port with hop 1. > For each switch mark its own port 0 as hop 0 > While changes > For each switch > For each port > For each LID > Propagate remote port hop as hop +1 if smaller or undefined > > The order of this step: O(S*P*(S+H)*(D+1)) > > Step 2: Assigning output port: > For each switch > For each LID > For each Port > Is it the one with min count > > The order of this step: O(S*(S+H)*P) > > 3.2 Up/Down algorithm analysis > The Up/Down algorithm depends on the ability to rank the fabric nodes > from root to leaf of the tree. To get that ranking it runs a > heuristic that is based on the Min Hop tables. So the memory and > complexity are identical to the Min-Hop first step to start with. > > Once ranking is performed, the algorithm runs a BFS from every HCA and fills > in the Min Hop tables again.
The Up/Down traversal rule is enforced during > the BFS such that only valid turns are allowed. > > Meta algorithm: > For each HCA > Get connected Switch > For each Switch in NextSwitches > For each Port > Check if direction is OK. Check if not visited > > The order is O(H*S*P) > > To finalize the output port assignment, the second step of the Min Hop > algorithm is invoked. > > > 4. Proposed Algorithm: > -- > > Inspecting the routing problem, we have noticed the following > attributes: > a. Using Min-Hop tables for keeping intermediate routing information > has a disadvantage in terms of memory consumption. However, any > incremental routing algorithm (for handling fabric changes after > first setup) or routing persistence solution could use this > information and gain speed. > b. Since we need to fill in LFT tables that are of the order S*(S+H), > the algorithm is lower bounded by O(H^2). Is the lower bound O(S^2) ? > c. A persistence-based solution which uses previously routed fabric > data and is able to handle simple incremental changes will provide > a much faster runtime, as a topology match will require O(S*P) > (traversing all links once). > d. Since the minimum hops information is identical for a switch and > all the HCAs connected to it, there is no point in building "min > hop" tables for HCAs. During the "output port" assignment stage, > the HCAs connected to each switch are considered and routed. Yes, but the actual min hops value is different for the HCA and the switch, though I don't think that matters. It is the output port for min hops that matters here. I presume that wherever HCA appears in this, the same would also apply to router ports. Also, what about switch port 0 routing ? > The result of "a" is that several algorithms that are superior in > memory footprint and skip any "hop table" stage are not considered for > implementation. > > To support "d" we needed to provide an index to each switch such that > the "min hop" tables are dense (previously they were indexed by LID).
> The new index is stored on the switch object and thus allows lookup of > a switch "min hop" by its index. An array of switch pointers by index > supports the reverse lookup. It seems like array matching might be inefficient in a large switch network. I'm sure this is in more places in OpenSM. Should these be replaced by some other data structure and search ? [This is a general OpenSM question but comes up in this context.] > The suggested algorithm is broken into the following 3 stages: > * Root node identification heuristics > * Min Hop tables computation > * Output port assignment > > 4.1 Root node identification heuristics: > This step is only required when both of the following > conditions hold: > * Up/Down routing is required > * The user does not provide a file with guids of the tree "root nodes" What is the impact of poor root node selection/choice ? How many root nodes are there ? > This heuristic for recognizing the tree roots is based on histograms > of HCA distances from every switch, > i.e. how many HCAs are 1 hop, 2 hops, etc. from the switch. In order to fill > in these histograms on all switches we need to run a BFS from every leaf > switch and propagate the number of HCAs connected to it: > > Meta algorithm: > For each switch > For each Port > If connected to HCAs count them > If any HCA > Init BFS to start with current switch > set hop count to 0 > While there are switches in BFS list > increment hop count > For each switch in BFS List > Add the number of HCAs to the histogram at the current hop count > For each port > If remote port switch not visited > Add the switch to the BFS Next Step List > Once all switches in this step's list are done, continue with the next step list I didn't follow the meta algorithm but I'm not sure it matters right now. > The order of this step is: O(S*P + H/L*S*P) = O(*H*S) ^^^^^? This looks to me like O((1+H/L)*S*P). Not sure how that reduces to O(*H*S) but it looks like something might be missing there.
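The 4.1 histogram pass can be sketched in C as below. This is a minimal sketch under stated assumptions, not OpenSM code: `struct fabric` (an adjacency matrix plus a per-switch HCA count), `hca_distance_histogram()` and the `MAX_SW`/`MAX_HOPS` limits are hypothetical. It follows the quoted meta algorithm: one BFS per leaf switch (a switch with at least one HCA), with the leaf's own HCAs counted as 1 hop away. That is H/L BFS runs, matching the H/L factor above; with a per-port adjacency list each BFS would cost O(S*P), whereas the matrix used here to keep the sketch short costs O(S^2) per BFS.

```c
#include <assert.h>
#include <string.h>

#define MAX_SW   16   /* illustrative limits, not OpenSM's */
#define MAX_HOPS 16

/* Hypothetical fabric model: adj[a][b] != 0 when switches a and b share
 * at least one active link; hcas[s] is the number of HCA ports on switch s. */
struct fabric {
    int n_sw;
    unsigned char adj[MAX_SW][MAX_SW];
    int hcas[MAX_SW];
};

/* hist[s][h] = number of HCAs that are h hops away from switch s. */
static void hca_distance_histogram(const struct fabric *f,
                                   int hist[MAX_SW][MAX_HOPS])
{
    assert(f->n_sw >= 0 && f->n_sw <= MAX_SW);
    memset(hist, 0, sizeof(int) * MAX_SW * MAX_HOPS);

    for (int leaf = 0; leaf < f->n_sw; leaf++) {
        if (f->hcas[leaf] == 0)
            continue;                      /* BFS only from leaf switches */

        int visited[MAX_SW] = {0};
        int cur[MAX_SW], next[MAX_SW];
        int n_cur = 1, hop = 0;

        cur[0] = leaf;
        visited[leaf] = 1;

        while (n_cur > 0 && hop + 1 < MAX_HOPS) {
            int n_next = 0;

            hop++;                         /* this leaf's HCAs are 'hop' away */
            for (int i = 0; i < n_cur; i++) {
                int s = cur[i];

                hist[s][hop] += f->hcas[leaf];
                for (int r = 0; r < f->n_sw; r++) {
                    if (f->adj[s][r] && !visited[r]) {
                        visited[r] = 1;
                        next[n_next++] = r;
                    }
                }
            }
            /* The next-step list becomes the current list. */
            memcpy(cur, next, sizeof(int) * (size_t)n_next);
            n_cur = n_next;
        }
    }
}
```

For a two-level tree (one spine, two leaves with 4 HCAs each), the spine sees all 8 HCAs at hop 2, while each leaf sees 4 at hop 1 and 4 at hop 3; that kind of asymmetry is what a root-selection heuristic could key on.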
> > 4.2 Min Hop tables computation: > This step is mandatory and has a slightly different flavor for the > case of Up/Down routing. > > The algorithm starts from every leaf switch and traverses breadth-first > through the fabric. > > Meta algorithm: > foreach switch in the fabric > clear the "Rank" vector for all switches. > start BFS with the given switch. > set rank to 0 > while any switch in BFS list > | foreach switch in BFS switch list > | |foreach port (valid, active, not unhealthy) > | | if remote side is a switch: > | | if rank of remote side 0 or = rank + 1 > | | set the remote port entry MinHopPort for this switch > | | if rank of remote side 0 i.e. never visited > | | set the remote switch rank to rank + 1 > | | add the remote switch to next BFS switches > | |-- > | swap the current and next switch lists > | increment rank > |-- > > The order of this algorithm: O(P*S^2) > > An algorithm that merges the Up/Down step criteria into this stage has > not been written yet. The idea is to make each step keep track of any previous > step down, so that a subsequent step up is prohibited. > > 4.3 Output port assignment: > This step provides the actual LFT value assignment for each switch. > To do that we access the built "min hop" tables and track port usage. > > Meta algorithm: > foreach switch in the fabric > clear the port subscription vector (track number of paths subscribed) > foreach target switch in the "min hop" table > get the list of min-hop ports > foreach end-node attached (HCA connected to it and itself) > if lmc > 0 init tracking of remote system and node for selecting > disjoint paths for same end-node different LID LSBs What are the policies for disjoint paths ? I can envision several. > get min-subscribed (and disjoint) port marked min-hop to the target switch > track port usage in port-subscription (opt.
if target LID is not a switch) > > Order of this step: > Currently the selection of the output port by min-subscription is > trivial and requires O(P), so the overall order is > O(S*S*(1+L)*P) <= O(16*P*S^2) > > One could obtain the list of marked min-hop ports and then use a > modified cyclic list to avoid the search for min subscription in > the case of LMC > 0. In that case the order could be reduced to: > O(S*S*(1+L)) ~= O(S*H) > > 5. Min Hop Table Implementation: > -- > The proposed algorithm does not require storing the number of hops > arriving at the switch port, but only the fact that a port is on the min > hop path. This allows a further memory usage reduction if the min > hop table holds boolean values. > > The issue then is an efficient iterator over the boolean (bit) > array. The tradeoff is thus the usual memory versus runtime. > > (Does anybody know of a fast boolean array lookup implementation?) > > 6. Incremental Routing: > -- > Once the fabric is routed we can define an algorithm for performing > incremental routing changes. An obvious case is when a link is > declared unhealthy or one of the ports is dropped. Assuming the > change is recognized by some other algorithm, the > following cases apply: > * If the link connects an HCA and a switch, the HCA is unreachable. No > routing change is required. > * If the link is between switches: > * If there is at least one other link between these switches: > o Spread all routes going through the failing port to the other > ports connecting to the same switch. Right, but LMC needs to be "honored". > * If there is no other link between these switches > o Go back to all switches that feed into each one of the switches > (feed in means they route some target lids through the switch) > but only those that route lids that go through the failing port. Check > to see if there is another port that goes to a different switch > to route that lid to. If there is no other way, do nothing.
When a multipath route is no longer multipath, it would be nice to inform NM. > How do we support topology changes like moving an HCA from one switch > to another? Also, what about addition of new switches and HCAs ? What about subnet merge ? > 7. Routing Persistence: > -- > To make the subnet initialization faster, one could store the existing > routing solution and use it without any calculation. > > The issue is of course what conditions make the stored routing > obsolete. To maximize the usefulness of the stored information we > propose to store the Min Hop tables rather than the final port > assignment. It is assumed that after restart there might be a need for > modifications to LMC and routing which will invalidate the LFT > anyway. To enable "cache invalidation criteria" the persistent > database should include information that could be used to easily check > whether the fabric was altered in a way that invalidates the MinHop tables. > The stored information should hold for each switch in the fabric (by guid) > the list of ports and the guids and port numbers on the remote side. > To validate that there are no significant changes, the discovered set > of switches is checked to match the stored information. Right, it's only the switches that matter (and their connectivity: that links between switches are active and healthy). A related implementation issue is the format of the stored information. > Table 1 > describes the possible changes and their effect on the validity of > the MinHop tables.
> > Table 1 - Effect of Connectivity Changes on Routing Info Validity > > Change | Effect on MinHop Tables | Effect on LFT and MFT | > -- > New Switch found | Invalidates (might connect | Invalidates | > | more HCAs and carries more | | > | routing resources) | | > -- > Missing Switch | Invalidates (MinHops might | Invalidates | > | be broken a few steps away)| | > -- > New cable found | Does not invalidate | Does not invalidate | > -- Assuming these are interswitch links, they could be ignored (which is what I think you are saying by "does not invalidate") even if the new link(s) could improve the routing. If so, "does not need to invalidate" might be more accurate here. > Missing Cable | Invalidates only if there | Invalidates all LIDs | > (SW to SW) | is no other cable | going through that port | > | connecting the switches | | > -- Might affect multipathing. > Missing Cable | Does not invalidate | Does not invalidate | > (SW to HCA) | | | > -- > New HCA | Does not invalidate | Does not invalidate | > -- > Missing HCA | Does not invalidate | Does not invalidate | > -- > LID Changes | Does not invalidate | Invalidates the modified| > | | LIDs | > -- > > Special marking for "root nodes" should cache the results of the first > step for Up/Down routing. These nodes should be invalidated on any > missing or additional switch conditions. It appears that (SM) failover is totally independent of all this. Is that true ? Possibly more on this later...
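One way to apply the MinHop column of Table 1 when reloading persisted routing data is sketched below. It assumes a hypothetical stored record per switch-to-switch link (`struct sw_link`, keyed by node GUID and port, in line with what section 7 proposes to store) and a hypothetical `minhop_cache_valid()` check; HCA links are simply not stored, since Table 1 says they never invalidate the MinHop tables. The LFT column is not modeled: LIDs routed through a lost port would still need rerouting even when the tables survive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical persisted record: one entry per switch-to-switch link. */
struct sw_link {
    uint64_t guid;          /* local switch node GUID */
    uint8_t  port;          /* local port number */
    uint64_t remote_guid;   /* remote switch node GUID */
    uint8_t  remote_port;   /* remote port number */
};

static bool guid_in(const uint64_t *set, size_t n, uint64_t g)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == g)
            return true;
    return false;
}

/* True if at least one link joins switches a and b
 * (a link may have been recorded from either end). */
static bool switches_connected(const struct sw_link *links, size_t n,
                               uint64_t a, uint64_t b)
{
    for (size_t i = 0; i < n; i++) {
        if ((links[i].guid == a && links[i].remote_guid == b) ||
            (links[i].guid == b && links[i].remote_guid == a))
            return true;
    }
    return false;
}

/* MinHop column of Table 1:
 *  - a new or missing switch invalidates the tables;
 *  - a missing SW-to-SW cable invalidates only if no other cable
 *    still connects the same two switches;
 *  - new cables are ignored. */
static bool minhop_cache_valid(const uint64_t *stored_sw, size_t n_stored_sw,
                               const struct sw_link *stored, size_t n_stored,
                               const uint64_t *found_sw, size_t n_found_sw,
                               const struct sw_link *found, size_t n_found)
{
    assert(stored_sw != NULL || n_stored_sw == 0);

    /* Same switch set (assumes the GUID lists contain no duplicates). */
    if (n_stored_sw != n_found_sw)
        return false;
    for (size_t i = 0; i < n_stored_sw; i++)
        if (!guid_in(found_sw, n_found_sw, stored_sw[i]))
            return false;

    /* Every stored switch pair must still be connected by some cable. */
    for (size_t i = 0; i < n_stored; i++)
        if (!switches_connected(found, n_found,
                                stored[i].guid, stored[i].remote_guid))
            return false;

    return true;
}
```

With two parallel cables between switches A and B, losing one keeps the tables valid, losing both does not, and an extra switch always forces a recompute.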
-- Hal From guyg at voltaire.com Mon Sep 26 09:08:28 2005 From: guyg at voltaire.com (Guy German) Date: Mon, 26 Sep 2005 19:08:28 +0300 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> Message-ID: <43381CFC.7070508@voltaire.com> Sean, I've committed a modified version of the CMA to the svn repository, under gen2/users/guyg/. This version takes most of your changes to my original posting, except for the "connect" method: it still uses ib_at and does not discover the path to the consumer. I also added a small test, _for bring-up purposes only_, which is not for real use (it has memory leaks, uses global vars, etc.), but it does connect and passes private data from side to side. I would like to port cmpost as a real test module, but it may also be possible to fix this test. I had some problems with the private data manipulations in the cr, so I removed the src_ip from the private data until I fix it. I believe that ib_at is still a valuable module even if ATS reverse ARP is broken, and I think we should discuss this. Thanks, Guy From mshefty at ichips.intel.com Mon Sep 26 09:19:50 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 26 Sep 2005 09:19:50 -0700 Subject: [openib-general] 2.6.14 heads up: ip_dev_find() not exported In-Reply-To: <52slvr7w1l.fsf@cisco.com> References: <52slvr7w1l.fsf@cisco.com> Message-ID: <43381FA6.4040609@ichips.intel.com> Roland Dreier wrote: > I noticed while compiling against an up-to-date kernel tree that SDP > and IBAT both use the function ip_dev_find(). The EXPORT_SYMBOL for > this function was removed during the 2.6.14 devel cycle. I'm calling using this routine in the rdam_cma code as well. I call it to get the local net_device before calling neigh_lookup(). I could call neigh_lookup_nodev() instead.
- Sean From thomas.duffy.99 at alumni.brown.edu Mon Sep 26 09:20:06 2005 From: thomas.duffy.99 at alumni.brown.edu (Tom Duffy) Date: Mon, 26 Sep 2005 09:20:06 -0700 Subject: [openib-general] Re: SDP problems with 64K page size In-Reply-To: <20050925104559.GT31820@mellanox.co.il> References: <52ll1pdjm8.fsf@cisco.com> <20050925104559.GT31820@mellanox.co.il> Message-ID: On Sep 25, 2005, at 3:45 AM, Michael S. Tsirkin wrote: > Roland, I might check in the patch that you posted to work around > this problem for 64K page users, until I have a final fix ready. > Is that OK with everyone? > It looks harmless in the 4K or 16K case. Go for it. -tduffy From mst at mellanox.co.il Mon Sep 26 09:27:43 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 26 Sep 2005 19:27:43 +0300 Subject: [openib-general] Re: 2.6.14 heads up: ip_dev_find() not exported In-Reply-To: <52slvr7w1l.fsf@cisco.com> References: <52slvr7w1l.fsf@cisco.com> Message-ID: <20050926162742.GW12818@mellanox.co.il> Quoting Roland Dreier : > Subject: 2.6.14 heads up: ip_dev_find() not exported > > I noticed while compiling against an up-to-date kernel tree that SDP > and IBAT both use the function ip_dev_find(). The EXPORT_SYMBOL for > this function was removed during the 2.6.14 devel cycle. > > I haven't looked yet at what this function does, how SDP and IBAT use > it or what it could be replaced by. But now would be a good time to > figure out whether we need to ask for it to be re-exported, or if > there's a better alternative to do whatever it does for us. > > - R. Yes, this seems to have been removed as part of a general "cleanups" patch. http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0742fd53a3774781255bd1e471e7aa2e4a82d5f7 So the question is: given a source and destination address pair that resolves to a loopback device, how to map it to a specific IPoIB device that serves the source? 
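For reference, ip_dev_find() returns the net_device that owns a given local IPv4 address. As a rough userspace analogue (not a kernel replacement, and not what SDP or IBAT actually do), the same source-address-to-interface lookup can be done with getifaddrs(3), which is available on Linux and the BSDs; `find_dev_by_src()` is a hypothetical helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes getifaddrs() under strict C modes */
#endif
#include <arpa/inet.h>
#include <assert.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: copy into 'name' the interface that owns 'src'
 * (network byte order) as a local IPv4 address.
 * Returns 0 on success, -1 if no interface has that address. */
static int find_dev_by_src(struct in_addr src, char name[IF_NAMESIZE])
{
    struct ifaddrs *list, *ifa;
    int ret = -1;

    assert(name != NULL);
    if (getifaddrs(&list) != 0)
        return -1;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        /* Skip entries without an address and non-IPv4 families. */
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;

        const struct sockaddr_in *sin =
            (const struct sockaddr_in *)ifa->ifa_addr;
        if (sin->sin_addr.s_addr == src.s_addr) {
            strncpy(name, ifa->ifa_name, IF_NAMESIZE - 1);
            name[IF_NAMESIZE - 1] = '\0';
            ret = 0;
            break;
        }
    }

    freeifaddrs(list);
    return ret;
}
```

Like ip_dev_find(), this answers only "which device owns this local address"; picking the IPoIB device for a loopback-resolved pair, as asked above, is a separate policy decision.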
-- MST From rolandd at cisco.com Mon Sep 26 09:29:44 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 26 Sep 2005 09:29:44 -0700 Subject: [openib-general] Re: [PATCH] mthca: fix clr_int calculation In-Reply-To: <20050926144329.GP12818@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 26 Sep 2005 17:43:29 +0300") References: <20050926144329.GP12818@mellanox.co.il> Message-ID: <52br2f7pxj.fsf@cisco.com> Looks good, applied and queued for 2.6.14. - R. From mst at mellanox.co.il Mon Sep 26 09:35:47 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 26 Sep 2005 19:35:47 +0300 Subject: [openib-general] Re: 2.6.14 heads up: ip_dev_find() not exported In-Reply-To: <43381FA6.4040609@ichips.intel.com> References: <43381FA6.4040609@ichips.intel.com> Message-ID: <20050926163547.GX12818@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: 2.6.14 heads up: ip_dev_find() not exported > > Roland Dreier wrote: > > I noticed while compiling against an up-to-date kernel tree that SDP > > and IBAT both use the function ip_dev_find(). The EXPORT_SYMBOL for > > this function was removed during the 2.6.14 devel cycle. > > I'm calling using this routine in the rdam_cma code as well. I call it to get > the local net_device before calling neigh_lookup(). I could call > neigh_lookup_nodev() instead. > > - Sean And pass it arp_tbl? But I think this does lookup by destination address, while what we are trying to do here is a device lookup by source address. Am I mistaken? -- MST From rolandd at cisco.com Mon Sep 26 09:34:37 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 26 Sep 2005 09:34:37 -0700 Subject: [openib-general] [PATCH] mthca: mthca_map_cmd off by one In-Reply-To: <20050926145552.GQ12818@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 26 Sep 2005 17:55:52 +0300") References: <20050926145552.GQ12818@mellanox.co.il> Message-ID: <523bnr7ppe.fsf@cisco.com> Yes, good catch, but I think the fix is not quite right. 
When we fill up the table, we'll set nent = 0, and then do ++nent at the end of the loop. So the second time around we'll start with nent = 1. Something like this is better, right? --- linux-kernel/infiniband/hw/mthca/mthca_cmd.c (revision 3544) +++ linux-kernel/infiniband/hw/mthca/mthca_cmd.c (working copy) @@ -605,7 +605,7 @@ static int mthca_map_cmd(struct mthca_de err = -EINVAL; goto out; } - for (i = 0; i < mthca_icm_size(&iter) / (1 << lg); ++i, ++nent) { + for (i = 0; i < mthca_icm_size(&iter) / (1 << lg); ++i) { if (virt != -1) { pages[nent * 2] = cpu_to_be64(virt); virt += 1 << lg; @@ -616,7 +616,7 @@ static int mthca_map_cmd(struct mthca_de ts += 1 << (lg - 10); ++tc; - if (nent == MTHCA_MAILBOX_SIZE / 16) { + if (++nent == MTHCA_MAILBOX_SIZE / 16) { err = mthca_cmd(dev, mailbox->dma, nent, 0, op, CMD_TIME_CLASS_B, status); if (err || *status) From mshefty at ichips.intel.com Mon Sep 26 09:38:23 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 26 Sep 2005 09:38:23 -0700 Subject: [openib-general] Re: 2.6.14 heads up: ip_dev_find() not exported In-Reply-To: <20050926163547.GX12818@mellanox.co.il> References: <43381FA6.4040609@ichips.intel.com> <20050926163547.GX12818@mellanox.co.il> Message-ID: <433823FF.1070902@ichips.intel.com> Michael S. Tsirkin wrote: >>>I noticed while compiling against an up-to-date kernel tree that SDP >>>and IBAT both use the function ip_dev_find(). The EXPORT_SYMBOL for >>>this function was removed during the 2.6.14 devel cycle. >> >>I'm calling using this routine in the rdam_cma code as well. I call it to get >>the local net_device before calling neigh_lookup(). I could call >>neigh_lookup_nodev() instead. >> >>- Sean > > And pass it arp_tbl? Yes. > But I think this does lookup by destination address, while > what we are trying to do here is a device lookup by source address. > Am I mistaken? For the code that I'm referring to, which hasn't been committed yet, I'm trying to lookup by remote address. 
I was handling the code differently based on whether or not I was given a local address. - Sean From rolandd at cisco.com Mon Sep 26 09:46:34 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 26 Sep 2005 09:46:34 -0700 Subject: [openib-general] [PATCH] mthca: mthca_map_cmd off by one In-Reply-To: <20050926145552.GQ12818@mellanox.co.il> (Michael S. Tsirkin's message of "Mon, 26 Sep 2005 17:55:52 +0300") References: <20050926145552.GQ12818@mellanox.co.il> Message-ID: <52y85j6al1.fsf@cisco.com> I applied my version of the patch -- let me know if you think it's buggy. - R. From mst at mellanox.co.il Mon Sep 26 09:52:21 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 26 Sep 2005 19:52:21 +0300 Subject: [openib-general] Re: 2.6.14 heads up: ip_dev_find() not exported In-Reply-To: <433823FF.1070902@ichips.intel.com> References: <433823FF.1070902@ichips.intel.com> Message-ID: <20050926165221.GZ12818@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: 2.6.14 heads up: ip_dev_find() not exported > > Michael S. Tsirkin wrote: > >>>I noticed while compiling against an up-to-date kernel tree that SDP > >>>and IBAT both use the function ip_dev_find(). The EXPORT_SYMBOL for > >>>this function was removed during the 2.6.14 devel cycle. > >> > >>I'm calling using this routine in the rdam_cma code as well. I call it to get > >>the local net_device before calling neigh_lookup(). I could call > >>neigh_lookup_nodev() instead. > >> > >>- Sean > > > > And pass it arp_tbl? > > Yes. > > > But I think this does lookup by destination address, while > > what we are trying to do here is a device lookup by source address. > > Am I mistaken? > > For the code that I'm referring to, which hasn't been committed yet, I'm trying > to lookup by remote address. I was handling the code differently based on > whether or not I was given a local address. > > - Sean > Hmm. I do need the source address for the path record query, do I not? 
-- MST From mst at mellanox.co.il Mon Sep 26 09:54:27 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Mon, 26 Sep 2005 19:54:27 +0300 Subject: [openib-general] [PATCH] mthca: mthca_map_cmd off by one In-Reply-To: <52y85j6al1.fsf@cisco.com> References: <52y85j6al1.fsf@cisco.com> Message-ID: <20050926165427.GA12818@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: [openib-general] [PATCH] mthca: mthca_map_cmd off by one > > I applied my version of the patch -- let me know if you think it's buggy. > > - R. > Looks good to me. -- MST From mshefty at ichips.intel.com Mon Sep 26 09:55:27 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 26 Sep 2005 09:55:27 -0700 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <43381CFC.7070508@voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> <43381CFC.7070508@voltaire.com> Message-ID: <433827FF.3010601@ichips.intel.com> Guy German wrote: > I believe that ib_at is still a valuable module even if ATS reverse ARP > is broken, and I think we should discuss this. Here's my thinking on this. ATS is broken as you mentioned for reverse lookups. However, if we want to keep ATS, I think that ATS registration/deregistration should be integrated with IPoIB. To keep it separate, we will need to patch net_device to provide an rdma_ptr as suggested by Roland. For ARP, we can extract ARP code from ib_at and use it with the CMA. The CMA can then use the "ib_arp" module to map network addresses to GIDs, then issue path record queries using ib_sa. Any caching should be done in other places. For example, ARP already does its own caching, and I believe that a generic SA caching module should be a part of ib_sa or exist separately. This seems the best approach to getting a higher level RDMA connection interface into the kernel. 
- Sean

From viswa.krish at gmail.com Mon Sep 26 10:00:45 2005
From: viswa.krish at gmail.com (Viswanath Krishnamurthy)
Date: Mon, 26 Sep 2005 10:00:45 -0700
Subject: [openib-general] Re: Another opensm problem ?
In-Reply-To: <4335BA72.9080203@mellanox.co.il>
References: <1127516065.4398.1937.camel@hal.voltaire.com> <4335BA72.9080203@mellanox.co.il>
Message-ID: <4df28be405092610004722455f@mail.gmail.com>

Hi Eitan,

I see that message in the log.

-Viswa

On 9/24/05, Eitan Zahavi wrote:
>
> Hi Viswa and Hal,
>
> I have read through the thread and have a few comments.
>
> But first let me see if I understand the test run correctly. The test is
> as follows:
> 1. OpenSM starts up, configuring the subnet.
> 2. Then the user tears out a cable and plugs it into a different port of
> a switch.
> 3. The SM is supposed to bring up the new connection.
> 4. Step 2 is repeated until the SM stops responding.
>
> Well, if this is the case, then OpenSM might stop responding due to the
> following features:
> 1. In the past we had cases where bad hardware continuously flooded the SM
> with traps. To protect against this kind of DOS attack we have implemented
> an adaptive filter in the SM trap receiver: if the exact same trap is
> received continuously from the same source more than 10 times (with no
> more than 5 sec between the traps), the traps are considered a DOS and are
> ignored. Please see osm_trap_rcv.c for details.
> 2. The way IB switches work is that each time one of their ports changes
> state, they:
> a. Set the "change bit" in the SwitchInfo
> b. Send a trap 128 to the SM. But trap 128 does not carry the changed port
> number.
>
> So under a test case like the one you describe, this can happen:
> 1. The SM decides to ignore trap 128 from the switch, because more than 5
> connect/reconnect sequences happen without enough "quiet" time to recover.
> 2. The SwitchInfo ChangeBit is sampled during the OSM light sweep.
There > is a race between the > reading of the change bit and the clearing of it. If the connect > disconnect happen very fast > the change bit set by the re-connect can be cleaned by the clear starting > by the disconnect. > > It is easy to see in the log file if the SM did ignore traps. Run with -V > and look for: > grep "Continuously received this trap" /var/log/osm.log > > (for some reason I did not get any log attachments with this thread - > otherwise I would > do some analysis on it too). > > Anyway, if the SM does not heavy sweep (due to the above) it is very > likely it will continue to > poll the non existing node that was previously attached to a switch port > with no success. > > So testing of cable tear off and reconnect should be done with at least 10 > seconds recovery time. > Also you could try sending kill -HUP to the OpenSM process and see if the > full sweep you start > is able to bring all ports up. > > Viswa, with all that said, it is very possible you are experiencing a bug > in OpenSM and we > want to encourage your effort finding those. With your, and others, help > we will be able to > flush them out. > > Thanks > > Eitan > > Hal Rosenstock wrote: > > On Fri, 2005-09-23 at 14:57, Hal Rosenstock wrote: > > > >>On Fri, 2005-09-23 at 13:50, Viswanath Krishnamurthy wrote: > >> > >>>- After 7-8 iterations, I ran into a weird problem, where opensm was > >>>showing the HCA as UNKNOWN. The port > >>>never came up to ACTIVE state. The unplugged and replugged into > >>>different slots, the port remained in INIT > >>>state. > >> > >>Mellanox : SW : 12 : INI : : : 2048 : 1x : 2.5 : > > > > 0002c9010d26e780 : UNKNOWN > > > >>OpenSM thinks that either there is no physical port on the other end > > > > of > > > >>the link or it is not "valid" (GUID non 0). Obviously it is there as > > > > the > > > >>port state is INIT so the physical link came up which requires the > >>remote end to be there. 
> > > > > >>From the log you sent, this is exactly what is happening. > > Sep 23 10:07:23 451191 [B7751BB0] -> osm_drop_mgr_process: Checking port > > 0x0002c9010d26e780. > > Sep 23 10:07:23 451209 [B7751BB0] -> osm_drop_mgr_process: Checking port > > 0x0002c90200400cfd. > > Sep 23 10:07:23 451226 [B7751BB0] -> osm_drop_mgr_process: ERR 0108: > > Unknown remote side for node 0x0002c9010d26e780 port 20. Adding to light > > sweep sampling list. > > Sep 23 10:07:23 451251 [B7751BB0] -> Directed Path Dump of 1 hop path: > > Path = [0][1] > > Sep 23 10:07:23 451267 [B7751BB0] -> osm_drop_mgr_process: ] > > > > So look in osm_drop_mgr.c line 707: > > Can you enhance the log display to see which is failing: > > osm_physp_is_valid(p_physp) or osm_physp_get_remote(p_physp) ? > > > > Also, it appears to keep light sweeping this port but whichever switch > > port it is on, it does not respond. Not sure where the problem is. It > > could be on the outgoing side of the switch (we could run diags against > > the switch and various ports; I would be curious what they return when > > the subnet is in this broken state) or on the HCA. However, the fact > > that restarting opensm made it go away without touching anything else > > makes this appear otherwise. > > > > > >>One other note is that it appears to have come up as 1x. Is that what > >>should happen ? > > > > > > -- Hal > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
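Eitan's description of the OpenSM trap-flood filter, quoted above, can be sketched as follows. This is not the code in osm_trap_rcv.c; it is a minimal illustration assuming that a single "last trap" is remembered, that a trap is identified by a hypothetical key built from the source LID and trap number, and that the thresholds are the ones he gives (more than 10 repeats, at most 5 seconds apart). All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Thresholds from the description; the names are hypothetical. */
#define TRAP_REPEAT_LIMIT  10  /* ignore once repeats exceed this count */
#define TRAP_MAX_GAP_SEC    5  /* repeats must arrive within this gap */

struct trap_filter {
	uint64_t last_key;   /* identifies "the exact same trap from the same source" */
	time_t   last_seen;  /* arrival time of the previous trap */
	unsigned count;      /* consecutive matching arrivals */
	bool     valid;      /* false until the first trap is seen */
};

/* Hypothetical key: source LID in the high bits, trap number in the low bits. */
static uint64_t trap_key(uint16_t src_lid, uint16_t trap_num)
{
	return ((uint64_t) src_lid << 16) | trap_num;
}

/* Returns true if this trap should be treated as a flood and ignored. */
static bool trap_filter_ignore(struct trap_filter *f, uint64_t key, time_t now)
{
	if (f->valid && key == f->last_key && now - f->last_seen <= TRAP_MAX_GAP_SEC)
		f->count++;
	else
		f->count = 1;  /* a different trap, or a quiet period has passed */

	f->last_key = key;
	f->last_seen = now;
	f->valid = true;
	return f->count > TRAP_REPEAT_LIMIT;
}
```

Under this rule a gap of more than 5 seconds between identical traps resets the count, which fits the advice in this thread to leave at least 10 seconds of recovery time between cable pulls.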
URL: From halr at voltaire.com Mon Sep 26 09:54:50 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 12:54:50 -0400 Subject: [openib-general] Loading kdapl on 2.6.11.6 Message-ID: <1127753690.4397.25.camel@hal.voltaire.com> Hi James, When loading kdapl built on 2.6.11.6, I keep getting the following: kdapl_/sbin/modprobe kdapl_ib FATAL: Error inserting kdapl_ib (/lib/modules/2.6.11.6/kernel/drivers/infiniband/ulp/kdapl/ib/kdapl_ib.ko): Unknown symbol in module, or unknown parameter (see dmesg) ib: disagrees about version of symbol dat_registry_remove_provider kdapl_ib: Unknown symbol dat_registry_remove_provider kdapl_ib: disagrees about version of symbol dat_registry_add_provider kdapl_ib: Unknown symbol dat_registry_add_provider kdapl_ib: disagrees about version of symbol dat_registry_remove_provider kdapl_ib: Unknown symbol dat_registry_remove_provider kdapl_ib: disagrees about version of symbol dat_registry_add_provider kdapl_ib: Unknown symbol dat_registry_add_provider kdapl_ib: disagrees about version of symbol dat_registry_remove_provider kdapl_ib: Unknown symbol dat_registry_remove_provider kdapl_ib: disagrees about version of symbol dat_registry_add_provider kdapl_ib: Unknown symbol dat_registry_add_provider I've rebuilt the kdapl, kdapl_ib, and dat modules and rebooted but this still occurs. Any idea what's wrong ? Thanks. 
-- Hal From iod00d at hp.com Mon Sep 26 10:11:32 2005 From: iod00d at hp.com (Grant Grundler) Date: Mon, 26 Sep 2005 10:11:32 -0700 Subject: [openib-general] Re: NOP command failed to generate interrupt (IRQ 201) In-Reply-To: <1127545090.31332.2.camel@QiWang> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30FEDDD@mtlexch01.mtl.com> <1127058546.7204.2.camel@QiWang> <20050921054700.GI24837@esmail.cup.hp.com> <1127291718.11849.7.camel@QiWang> <20050921161043.GB28198@esmail.cup.hp.com> <1127380978.31097.1.camel@QiWang> <20050922182908.GB1235@esmail.cup.hp.com> <1127545090.31332.2.camel@QiWang> Message-ID: <20050926171132.GA16113@esmail.cup.hp.com> On Sat, Sep 24, 2005 at 02:58:10PM +0800, QiWang, Chen wrote: > Hi, grant > > On node c01-14, I installed openib-gen2, kernel 2.6.13.2 > but I have IRQ problem. Yes, I'm not surprised. > ----------------------------------- > > ib_mthca: Mellanox InfiniBand HCA driver v0.06 (June 23, 2005) > ib_mthca: Initializing (0000:03:00.0) > ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 51 (level, low) -> IRQ 201 The firmware (ACPI) in c01-14 says to use "201". The other blades that work (e.g. c01-01) use IRQ 193. Ask your HW/FW supplier why the difference in IRQ assignment. 
grant > ib_mthca 0000:03:00.0: Found bridge: (0000:02:01.0) > ib_mthca 0000:03:00.0: FW version 000300030003, max commands 64 > ib_mthca 0000:03:00.0: FW size 6143 KB (start e7a00000, end e7ffffff) > ib_mthca 0000:03:00.0: HCA memory size 131071 KB (start e0000000, end > e7ffffff) > ib_mthca 0000:03:00.0: Max QPs: 16777216, reserved QPs: 1024, entry > size: 256 > ib_mthca 0000:03:00.0: Max SRQs: 1024, reserved SRQs: 16, entry size: 32 > ib_mthca 0000:03:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: > 64 > ib_mthca 0000:03:00.0: Max EQs: 64, reserved EQs: 1, entry size: 64 > ib_mthca 0000:03:00.0: reserved MPTs: 16, reserved MTTs: 16 > ib_mthca 0000:03:00.0: Max PDs: 16777216, reserved PDs: 0, reserved > UARs: 1 > ib_mthca 0000:03:00.0: Max QP/MCG: 16777216, reserved MGMs: 0 > ib_mthca 0000:03:00.0: Flags: 00370347 > ib_mthca 0000:03:00.0: profile[ 0]--10/20 @ 0x e0000000 (size 0x > 4000000) > ib_mthca 0000:03:00.0: profile[ 1]-- 0/16 @ 0x e4000000 (size 0x > 1000000) > ib_mthca 0000:03:00.0: profile[ 2]-- 7/18 @ 0x e5000000 (size 0x > 800000) > ib_mthca 0000:03:00.0: profile[ 3]-- 9/17 @ 0x e5800000 (size 0x > 800000) > ib_mthca 0000:03:00.0: profile[ 4]-- 3/16 @ 0x e6000000 (size 0x > 400000) > ib_mthca 0000:03:00.0: profile[ 5]-- 4/16 @ 0x e6400000 (size 0x > 200000) > ib_mthca 0000:03:00.0: profile[ 6]--12/15 @ 0x e6600000 (size 0x > 100000) > ib_mthca 0000:03:00.0: profile[ 7]-- 8/13 @ 0x e6700000 (size 0x > 80000) > ib_mthca 0000:03:00.0: profile[ 8]--11/11 @ 0x e6780000 (size 0x > 10000) > ib_mthca 0000:03:00.0: profile[ 9]-- 2/10 @ 0x e6790000 (size 0x > 8000) > ib_mthca 0000:03:00.0: profile[10]-- 6/ 5 @ 0x e6798000 (size 0x > 800) > ib_mthca 0000:03:00.0: HCA memory: allocated 106082 KB/124928 KB (18846 > KB free) > ib_mthca 0000:03:00.0: Allocated EQ 1 with 65536 entries > ib_mthca 0000:03:00.0: Allocated EQ 2 with 128 entries > ib_mthca 0000:03:00.0: Allocated EQ 3 with 128 entries > ib_mthca 0000:03:00.0: Setting mask 00000000000f43fe for eqn 
2 > ib_mthca 0000:03:00.0: Setting mask 0000000000000400 for eqn 3 > ib_mthca 0000:03:00.0: NOP command failed to generate interrupt (IRQ > 201), aborting. > ib_mthca 0000:03:00.0: BIOS or ACPI interrupt routing problem? > ib_mthca 0000:03:00.0: Clearing mask 00000000000f43fe for eqn 2 > ib_mthca 0000:03:00.0: Clearing mask 0000000000000400 for eqn 3 > ACPI: PCI interrupt for device 0000:03:00.0 disabled > ib_mthca: probe of 0000:03:00.0 failed with error -16 > ----------------------------------- > > > On Thu, 2005-09-22 at 11:29 -0700, Grant Grundler wrote: > > > On Thu, Sep 22, 2005 at 05:22:58PM +0800, QiWang, Chen wrote: > > > Hi , > > > I install openib gen2 on c01-14, kernel=2.6.13 > > > But I do not know how it works. > > > > See the openib-gen2/README.kernel-build for directions on > > how to build+install the kernel drivers. > > > > See the openib.org wiki for userspace instructions: > > https://openib.org/tiki/tiki-index.php > > > > If something is wrong or not clear, please ask on the openib-general > > mailing list. > > > > > > grant > > -- > QiWang, Chen > Clustars Supercomputing Technology corp. > http://www.Clustars.CN > TEL:+86-0816-2546345-815 > FAX:+86-0816-2546370 > Mobile:+86-13096497499 From halr at voltaire.com Mon Sep 26 10:06:21 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 13:06:21 -0400 Subject: [openib-general] Re: 2.6.14 heads up: ip_dev_find() not exported In-Reply-To: <20050926165221.GZ12818@mellanox.co.il> References: <433823FF.1070902@ichips.intel.com> <20050926165221.GZ12818@mellanox.co.il> Message-ID: <1127753850.4397.29.camel@hal.voltaire.com> On Mon, 2005-09-26 at 12:52, Michael S. Tsirkin wrote: > Hmm. I do need the source address for the path record query, do I not? Yes, SGID is a required component for a SA GetTable request of PathRecord. 
-- Hal

From jlentini at netapp.com Mon Sep 26 10:13:10 2005
From: jlentini at netapp.com (James Lentini)
Date: Mon, 26 Sep 2005 13:13:10 -0400 (EDT)
Subject: [openib-general] Re: Loading kdapl on 2.6.11.6
In-Reply-To: <1127753690.4397.25.camel@hal.voltaire.com>
References: <1127753690.4397.25.camel@hal.voltaire.com>
Message-ID:

On Mon, 26 Sep 2005, Hal Rosenstock wrote:

> Hi James,
>
> When loading kdapl built on 2.6.11.6, I keep getting the following:
>
> kdapl_/sbin/modprobe kdapl_ib
> FATAL: Error inserting kdapl_ib
> (/lib/modules/2.6.11.6/kernel/drivers/infiniband/ulp/kdapl/ib/kdapl_ib.ko): Unknown symbol in module, or unknown parameter (see dmesg)
> ib: disagrees about version of symbol dat_registry_remove_provider
> kdapl_ib: Unknown symbol dat_registry_remove_provider
> kdapl_ib: disagrees about version of symbol dat_registry_add_provider
> kdapl_ib: Unknown symbol dat_registry_add_provider
> kdapl_ib: disagrees about version of symbol dat_registry_remove_provider
> kdapl_ib: Unknown symbol dat_registry_remove_provider
> kdapl_ib: disagrees about version of symbol dat_registry_add_provider
> kdapl_ib: Unknown symbol dat_registry_add_provider
> kdapl_ib: disagrees about version of symbol dat_registry_remove_provider
> kdapl_ib: Unknown symbol dat_registry_remove_provider
> kdapl_ib: disagrees about version of symbol dat_registry_add_provider
> kdapl_ib: Unknown symbol dat_registry_add_provider
>
> I've rebuilt the kdapl, kdapl_ib, and dat modules and rebooted but this
> still occurs. Any idea what's wrong ? Thanks.

What is the lsmod output before you modprobe kdapl_ib? If an old module that
defines dat_registry_remove_provider, etc. is being loaded automatically, this
is the type of error I would expect.
From jlentini at netapp.com Mon Sep 26 10:15:50 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 26 Sep 2005 13:15:50 -0400 (EDT) Subject: [openib-general] Re: Page allocation failures & kdapltest oops In-Reply-To: <1127749644.4398.878.camel@hal.voltaire.com> References: <1127749644.4398.878.camel@hal.voltaire.com> Message-ID: What is the kdapltest command you are using? On Mon, 26 Sep 2005, Hal Rosenstock wrote: > Hi James, > > I keep getting the following when running kdapltest. This is similar to > what I saw before and reported a couple of times but now seems more > consistent in occurring. > > -- Hal > > Sep 26 10:29:29 hal kernel: DT_Mdep_Thread_: page allocation failure. order:0, mode:0x20 > Sep 26 10:29:29 hal kernel: [] __alloc_pages+0x2f2/0x490 > Sep 26 10:29:29 hal kernel: [] kmem_getpages+0x31/0xb0 > Sep 26 10:29:29 hal kernel: [] cache_grow+0x139/0x360 > Sep 26 10:29:29 hal kernel: [] cache_alloc_refill+0x151/0x340 > Sep 26 10:29:29 hal kernel: [] DT_handle_send_op+0x2fa/0x400 [kdapltest] > Sep 26 10:29:29 hal kernel: [] __kmalloc+0xb4/0xf0 > Sep 26 10:29:29 hal kernel: [] DT_Mdep_Malloc+0x25/0x60 [kdapltest] > Sep 26 10:29:29 hal kernel: [] DT_Tdep_PT_Printf+0x16/0x1b0 [kdapltest] > Sep 26 10:29:29 hal kernel: [] DT_Transaction_Run+0x607/0xb60 [kdapltest] > Sep 26 10:29:29 hal kernel: [] DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest] > Sep 26 10:29:29 hal kernel: [] DT_Transaction_Main+0x1388/0x21a0 [kdapltest] > Sep 26 10:29:29 hal kernel: [] kernel_map_pages+0x28/0x60 > Sep 26 10:29:30 hal kernel: [] cache_free_debugcheck+0x196/0x2d0 > Sep 26 10:29:30 hal kernel: [] DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest] > Sep 26 10:29:30 hal kernel: [] DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest] > Sep 26 10:29:30 hal kernel: [] kernel_thread_helper+0x5/0x10 > Sep 26 10:29:30 hal kernel: Mem-info: > Sep 26 10:29:30 hal kernel: DMA per-cpu: > Sep 26 10:29:30 hal kernel: cpu 0 hot: low 2, high 6, batch 1 used:2 > Sep 26 10:29:30 hal 
kernel: cpu 0 cold: low 0, high 2, batch 1 used:1 > Sep 26 10:29:30 hal kernel: Normal per-cpu: > Sep 26 10:29:30 hal kernel: cpu 0 hot: low 62, high 186, batch 31 used:92 > Sep 26 10:29:30 hal kernel: cpu 0 cold: low 0, high 62, batch 31 used:44 > Sep 26 10:29:30 hal kernel: HighMem per-cpu: empty > Sep 26 10:29:30 hal kernel: Free pages: 1636kB (0kB HighMem) > Sep 26 10:29:30 hal kernel: Active:28634 inactive:3039 dirty:41 writeback:0 unstable:0 free:409 slab:28426 mapped:31411 pagetables:543 > Sep 26 10:29:30 hal kernel: DMA free:1008kB min:128kB low:160kB high:192kB active:2232kB inactive:4kB present:16384kB pages_scanned:6 all_unreclaimable? no > Sep 26 10:29:30 hal kernel: lowmem_reserve[]: 0 240 240 > Sep 26 10:29:30 hal kernel: Normal free:628kB min:1920kB low:2400kB high:2880kB active:112304kB inactive:12152kB present:245760kB pages_scanned:0 all_unreclaimable? no > Sep 26 10:29:30 hal kernel: lowmem_reserve[]: 0 0 0 > Sep 26 10:29:30 hal kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? 
no > Sep 26 10:29:30 hal kernel: lowmem_reserve[]: 0 0 0 > Sep 26 10:29:30 hal kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1008kB > Sep 26 10:29:30 hal kernel: Normal: 1*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 628kB > Sep 26 10:29:30 hal kernel: HighMem: empty > Sep 26 10:29:30 hal kernel: Swap cache: add 135242, delete 121268, find 84138/98778, race 0+0 > Sep 26 10:29:30 hal kernel: Free swap = 341804kB > Sep 26 10:29:30 hal kernel: Total swap = 522104kB > Sep 26 10:29:30 hal kernel: Free swap: 341804kB > Sep 26 10:29:30 hal kernel: 65536 pages of RAM > Sep 26 10:29:30 hal kernel: 0 pages of HIGHMEM > Sep 26 10:29:30 hal kernel: 1533 reserved pages > Sep 26 10:29:30 hal kernel: 30650 pages shared > Sep 26 10:29:30 hal kernel: 13974 pages swap cached > Sep 26 10:29:30 hal kernel: 41 pages dirty > Sep 26 10:29:30 hal kernel: 0 pages writeback > Sep 26 10:29:30 hal kernel: 31411 pages mapped > Sep 26 10:29:30 hal kernel: 28426 pages slab > Sep 26 10:29:30 hal kernel: 543 pages pagetables > Sep 26 10:29:30 hal kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004 > Sep 26 10:29:30 hal kernel: printing eip: > Sep 26 10:29:30 hal kernel: c022934b > Sep 26 10:29:30 hal kernel: *pde = 07efd067 > Sep 26 10:29:30 hal kernel: *pte = 00000000 > Sep 26 10:29:30 hal kernel: Oops: 0002 [#1] > Sep 26 10:29:30 hal kernel: DEBUG_PAGEALLOC > Sep 26 10:29:30 hal kernel: Modules linked in: kdapltest kdapl_ib ib_cm ib_at kdapl ib_ipoib ib_sa ib_umad ide_cd cdrom lp ipv6 autofs parport_pc parport uhci_hcd ehci_hcd ib_mthca ib_mad ib_core ohci_hcd eepro100 mii usbcore evdev > Sep 26 10:29:30 hal kernel: CPU: 0 > Sep 26 10:29:30 hal kernel: EIP: 0060:[] Not tainted VLI > Sep 26 10:29:30 hal kernel: EFLAGS: 00010283 (2.6.13) > Sep 26 10:29:30 hal kernel: EIP is at vsnprintf+0x4b/0x4f0 > Sep 26 10:29:30 hal kernel: eax: 00000054 ebx: cd240f78 ecx: 00000000 edx: 
d0bc70c0 > Sep 26 10:29:30 hal kernel: esi: 00000004 edi: 00000001 ebp: 00000103 esp: c006dd78 > Sep 26 10:29:30 hal kernel: ds: 007b es: 007b ss: 0068 > Sep 26 10:29:30 hal kernel: Process DT_Mdep_Thread_ (pid: 2230, threadinfo=c006c000 task=c052cac0) > Sep 26 10:29:30 hal kernel: Stack: cffff740 00000020 00000000 d0bc16d5 00000104 00000000 00000001 d0bc16d5 > Sep 26 10:29:30 hal kernel: 00000104 00000020 cd240f78 00000000 00000001 c2538000 d0bc258b 00000004 > Sep 26 10:29:30 hal kernel: 00000100 d0bc70c0 c006dde8 00000000 0000007b ca254f78 cd240f78 cef6e060 > Sep 26 10:29:30 hal kernel: Call Trace: > Sep 26 10:29:30 hal kernel: [] DT_Mdep_Malloc+0x25/0x60 [kdapltest] > Sep 26 10:29:30 hal kernel: [] DT_Mdep_Malloc+0x25/0x60 [kdapltest] > Sep 26 10:29:30 hal kernel: [] DT_Tdep_PT_Printf+0x3b/0x1b0 [kdapltest] > Sep 26 10:29:30 hal kernel: [] DT_Transaction_Run+0x607/0xb60 [kdapltest] > Sep 26 10:29:30 hal kernel: [] DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest] > Sep 26 10:29:30 hal kernel: [] DT_Transaction_Main+0x1388/0x21a0 [kdapltest] > Sep 26 10:29:30 hal kernel: [] kernel_map_pages+0x28/0x60 > Sep 26 10:29:30 hal kernel: [] cache_free_debugcheck+0x196/0x2d0 > Sep 26 10:29:30 hal kernel: [] DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest] > Sep 26 10:29:30 hal kernel: [] DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest] > Sep 26 10:29:30 hal kernel: [] kernel_thread_helper+0x5/0x10 > Sep 26 10:29:30 hal kernel: Code: f0 48 39 c5 73 0d 89 f2 f7 da bd ff ff ff ff 89 54 24 40 8b 54 24 44 80 3a 00 74 23 8d 74 26 00 0f b6 02 3c 25 74 3d 39 ee 77 06 <88> 06 8b 54 24 44 46 89 d0 42 89 54 24 44 80 78 01 00 75 e1 39 > > From iod00d at hp.com Mon Sep 26 10:15:21 2005 From: iod00d at hp.com (Grant Grundler) Date: Mon, 26 Sep 2005 10:15:21 -0700 Subject: [openib-general] InfiniBand compilation testing In-Reply-To: <52psqy8jt2.fsf@cisco.com> References: <20050924074611.GD3950@us.ibm.com> <52psqy8jt2.fsf@cisco.com> Message-ID: 
<20050926171521.GB16113@esmail.cup.hp.com>

On Sat, Sep 24, 2005 at 10:19:53AM -0700, Roland Dreier wrote:
...
> I just checked in a fix for this -- the pci_pretty_name() API has gone
> away, so I removed our use of it in svn. I don't understand how your
> other builds of git + svn succeeded though, since pci_pretty_name is
> completely gone. Oh, I guess you'll miss link failures when building
> modules, so functions that disappear won't break the build.

I think he can run "make modules_install INSTALL_MOD_PATH=/tmp" to verify
that the modules link properly.

grant

From halr at voltaire.com Mon Sep 26 10:12:49 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 26 Sep 2005 13:12:49 -0400
Subject: [openib-general] Re: [IBAT] interface numbering assumption
In-Reply-To:
References: <1127484628.15613.12768.camel@hal.voltaire.com>
Message-ID: <1127754754.4397.39.camel@hal.voltaire.com>

On Fri, 2005-09-23 at 15:33, James Lentini wrote:
> On Fri, 23 Sep 2005, Hal Rosenstock wrote:
>
> > > Is there a better way to enumerate all of the network interfaces? I
> > > believe that is what this for loop is attempting to accomplish.
> >
> > Yes. I think that the net_device list from dev_base could be walked
> > instead and that would resolve this issue.
>
> Can you help me understand the logic in at.c:resolve_ip()? Here is my
> assumption of what this function does:
>
> 1) consults the IP routing table for an interface
> device using ip_route_output_key
>
> 2) if the device does not meet certain criteria, return an error
>
> 3) if the device is a loopback device, search for another device
> that is an INFINIBAND device and is UP.
>
> 4) ...
>
> I've included a small patch below to fix the problem I observed in #3.
> It walks the dev_base list as you described.

Thanks. Applied.

> However I don't understand why the device returned in step #1 isn't
> always used, as I assume this is the interface the routing table says
> to use.
> That makes me think I've misinterpreted the purpose of
> ip_route_output_key. What am I missing?

Because the loopback device is sometimes the answer, and that doesn't work
for address resolution. It needs to be an IPoIB interface. I don't think you
can expect ATS to work over a loopback interface.

-- Hal

From viswa.krish at gmail.com Mon Sep 26 10:20:18 2005
From: viswa.krish at gmail.com (Viswanath Krishnamurthy)
Date: Mon, 26 Sep 2005 10:20:18 -0700
Subject: [openib-general] Another opensm bug ?
Message-ID: <4df28be405092610206c1e8d52@mail.gmail.com>

I ran into another opensm bug which caused opensm to stop functioning.
This happened only once. Here is the test case:

1. Run opensm on machine A.
2. Run the following script on machine B:
   a. Check ibstatus
   b. Ping machine A
   c. Run osmtest
   d. Reboot

The test case is to make sure opensm configures the machine correctly. Out of
850 iterations, I saw this error once. opensm started receiving subnet traps
continuously. (I did not see any of the DOS-protection messages in the log.)
The traps all had the same transaction ID (0x224 in this case), and the opensm
MAD receive thread was being called continuously.

Initially I suspected the situation Eitan described (bad hardware causing
traps, etc.). But when I stopped and restarted opensm, the problem went away.

Log attached off-list.

-Viswa
-------------- next part --------------
An HTML attachment was scrubbed...
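The interface choice Hal and James discuss for at.c:resolve_ip() (use the device the route lookup returned, but if that is loopback, walk the dev_base list for an InfiniBand device that is up) can be modeled in user space as below. struct fake_netdev and pick_resolve_dev() are hypothetical; the constants mirror the values of ARPHRD_LOOPBACK, ARPHRD_INFINIBAND and IFF_UP, and requiring a non-loopback routed device to be InfiniBand itself is an assumption standing in for the unspecified "certain criteria". Kernel code walking dev_base would also need to hold dev_base_lock.

```c
#include <stddef.h>

/* Hypothetical stand-ins that mirror kernel constant values. */
#define DEV_TYPE_LOOPBACK    772  /* ARPHRD_LOOPBACK */
#define DEV_TYPE_INFINIBAND   32  /* ARPHRD_INFINIBAND */
#define DEV_FLAG_UP          0x1  /* IFF_UP */

/* Simplified model of a net_device on the dev_base chain. */
struct fake_netdev {
	const char *name;
	unsigned short type;
	unsigned int flags;
	struct fake_netdev *next;
};

/*
 * Pick the device to use for address resolution.  'routed' is what the
 * route lookup returned (may be NULL); 'dev_base' is the head of the list.
 */
static struct fake_netdev *pick_resolve_dev(struct fake_netdev *routed,
					    struct fake_netdev *dev_base)
{
	struct fake_netdev *dev;

	/* Normal case: use the routed device, but only if it is InfiniBand. */
	if (routed && routed->type != DEV_TYPE_LOOPBACK)
		return routed->type == DEV_TYPE_INFINIBAND ? routed : NULL;

	/* Loopback (or no route): fall back to the first IB device that is up. */
	for (dev = dev_base; dev; dev = dev->next)
		if (dev->type == DEV_TYPE_INFINIBAND && (dev->flags & DEV_FLAG_UP))
			return dev;

	return NULL;  /* no usable IPoIB interface */
}
```

Walking the list this way finds the IPoIB interface by type and state rather than by assuming a particular interface number.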
URL: From halr at voltaire.com Mon Sep 26 10:16:54 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 13:16:54 -0400 Subject: [openib-general] Re: Loading kdapl on 2.6.11.6 In-Reply-To: References: <1127753690.4397.25.camel@hal.voltaire.com> Message-ID: <1127755013.4397.43.camel@hal.voltaire.com> On Mon, 2005-09-26 at 13:13, James Lentini wrote: > On Mon, 26 Sep 2005, Hal Rosenstock wrote: > > > Hi James, > > > > When loading kdapl built on 2.6.11.6, I keep getting the following: > > > > kdapl_/sbin/modprobe kdapl_ib > > FATAL: Error inserting kdapl_ib > > (/lib/modules/2.6.11.6/kernel/drivers/infiniband/ulp/kdapl/ib/kdapl_ib.ko): Unknown symbol in module, or unknown parameter (see dmesg) > > ib: disagrees about version of symbol dat_registry_remove_provider > > kdapl_ib: Unknown symbol dat_registry_remove_provider > > kdapl_ib: disagrees about version of symbol dat_registry_add_provider > > kdapl_ib: Unknown symbol dat_registry_add_provider > > kdapl_ib: disagrees about version of symbol dat_registry_remove_provider > > kdapl_ib: Unknown symbol dat_registry_remove_provider > > kdapl_ib: disagrees about version of symbol dat_registry_add_provider > > kdapl_ib: Unknown symbol dat_registry_add_provider > > kdapl_ib: disagrees about version of symbol dat_registry_remove_provider > > kdapl_ib: Unknown symbol dat_registry_remove_provider > > kdapl_ib: disagrees about version of symbol dat_registry_add_provider > > kdapl_ib: Unknown symbol dat_registry_add_provider > > > > I've rebuilt the kdapl, kdapl_ib, and dat modules and rebooted but this > > still occurs. Any idea what's wrong ? Thanks. > > What is the lsmod output before you modprobe kdapl_ib? 
[root at localhost hal]# /sbin/modprobe kdapl [root at localhost hal]# /sbin/modprobe kdapl_ib FATAL: Error inserting kdapl_ib (/lib/modules/2.6.11.6/kernel/drivers/infiniband/ulp/kdapl/ib/kdapl_ib.ko): Unknown symbol in module, or unknown parameter (see dmesg) > If there is an > old module being autmatically loaded that defines > dat_registry_remove_provider, etc. this would be the type of error I > would expect. What are the potential old modules other than dat, kdapl, and kdapl_ib here ? dat must somehow be old (or newer)... -- Hal From jlentini at netapp.com Mon Sep 26 10:27:21 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 26 Sep 2005 13:27:21 -0400 (EDT) Subject: [openib-general] Re: Loading kdapl on 2.6.11.6 In-Reply-To: <1127755013.4397.43.camel@hal.voltaire.com> References: <1127753690.4397.25.camel@hal.voltaire.com> <1127755013.4397.43.camel@hal.voltaire.com> Message-ID: On Mon, 26 Sep 2005, Hal Rosenstock wrote: > On Mon, 2005-09-26 at 13:13, James Lentini wrote: > > On Mon, 26 Sep 2005, Hal Rosenstock wrote: > > > > > Hi James, > > > > > > When loading kdapl built on 2.6.11.6, I keep getting the following: > > > > > > kdapl_/sbin/modprobe kdapl_ib > > > FATAL: Error inserting kdapl_ib > > > (/lib/modules/2.6.11.6/kernel/drivers/infiniband/ulp/kdapl/ib/kdapl_ib.ko): Unknown symbol in module, or unknown parameter (see dmesg) > > > ib: disagrees about version of symbol dat_registry_remove_provider > > > kdapl_ib: Unknown symbol dat_registry_remove_provider > > > kdapl_ib: disagrees about version of symbol dat_registry_add_provider > > > kdapl_ib: Unknown symbol dat_registry_add_provider > > > kdapl_ib: disagrees about version of symbol dat_registry_remove_provider > > > kdapl_ib: Unknown symbol dat_registry_remove_provider > > > kdapl_ib: disagrees about version of symbol dat_registry_add_provider > > > kdapl_ib: Unknown symbol dat_registry_add_provider > > > kdapl_ib: disagrees about version of symbol dat_registry_remove_provider > > > 
kdapl_ib: Unknown symbol dat_registry_remove_provider > > > kdapl_ib: disagrees about version of symbol dat_registry_add_provider > > > kdapl_ib: Unknown symbol dat_registry_add_provider > > > > > > I've rebuilt the kdapl, kdapl_ib, and dat modules and rebooted but this > > > still occurs. Any idea what's wrong ? Thanks. > > > > What is the lsmod output before you modprobe kdapl_ib? > > [root at localhost hal]# /sbin/modprobe kdapl > [root at localhost hal]# /sbin/modprobe kdapl_ib > FATAL: Error inserting kdapl_ib (/lib/modules/2.6.11.6/kernel/drivers/infiniband/ulp/kdapl/ib/kdapl_ib.ko): Unknown symbol in module, or unknown parameter (see dmesg) > > > If there is an > > old module being autmatically loaded that defines > > dat_registry_remove_provider, etc. this would be the type of error I > > would expect. > > What are the potential old modules other than dat, kdapl, and kdapl_ib > here ? dat must somehow be old (or newer)... The dat module is the problem. When we moved the code to the trunk, we called the dat registry module kdapl. There shouldn't be a "dat" module. If you rmmod dat, I expect that this will fix your problem. james From halr at voltaire.com Mon Sep 26 10:35:15 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 13:35:15 -0400 Subject: [openib-general] Re: Loading kdapl on 2.6.11.6 In-Reply-To: References: <1127753690.4397.25.camel@hal.voltaire.com> <1127755013.4397.43.camel@hal.voltaire.com> Message-ID: <1127756114.4397.62.camel@hal.voltaire.com> On Mon, 2005-09-26 at 13:27, James Lentini wrote: > The dat module is the problem. When we moved the code to the trunk, we > called the dat registry module kdapl. There shouldn't be a "dat" > module. > > If you rmmod dat, I expect that this will fix your problem. Thanks. That pointed me in the right direction. I'm now able to load and run kdapl on 2.6.11.6 again. 
-- Hal From nacc at us.ibm.com Mon Sep 26 10:47:22 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Mon, 26 Sep 2005 10:47:22 -0700 Subject: [openib-general] InfiniBand compilation testing In-Reply-To: <20050926171521.GB16113@esmail.cup.hp.com> References: <20050924074611.GD3950@us.ibm.com> <52psqy8jt2.fsf@cisco.com> <20050926171521.GB16113@esmail.cup.hp.com> Message-ID: <20050926174722.GE7532@us.ibm.com> On 26.09.2005 [10:15:21 -0700], Grant Grundler wrote: > On Sat, Sep 24, 2005 at 10:19:53AM -0700, Roland Dreier wrote: > ... > > I just checked in a fix for this -- the pci_pretty_name() API has gone > > away, so I removed our use of it in svn. I don't understand how your > > other builds of git + svn succeeded though, since pci_pretty_name is > > completely gone. Oh, I guess you'll miss link failures when building > > modules, so functions that disappear won't break the build. > > I think he can run "make modules_install INSTALL_MOD_PATH=/tmp" to verify > modules link in properly. True true. I will work on adding that to my test script and see what happens. Thanks, Nish From halr at voltaire.com Mon Sep 26 10:50:10 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 13:50:10 -0400 Subject: [openib-general] Re: Page allocation failures & kdapltest oops In-Reply-To: References: <1127749644.4398.878.camel@hal.voltaire.com> Message-ID: <1127757010.4400.1.camel@hal.voltaire.com> On Mon, 2005-09-26 at 13:15, James Lentini wrote: > What is the kdapltest command you are using? 
kdapltest -T T -s -D mthca0a -d -i 10000 -w 8 client SR server SR seems to fail but kdapltest -T T -s -D mthca0a -d -t 2 -w 8 -i 20 client SR server SR works on that machine From halr at voltaire.com Mon Sep 26 10:53:29 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 13:53:29 -0400 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <433827FF.3010601@ichips.intel.com> References: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> <43381CFC.7070508@voltaire.com> <433827FF.3010601@ichips.intel.com> Message-ID: <1127757208.4400.5.camel@hal.voltaire.com> On Mon, 2005-09-26 at 12:55, Sean Hefty wrote: > Guy German wrote: > > I believe that ib_at is still a valuable module even if ATS reverse ARP > > is broken, and I think we should discuss this. > > Here's my thinking on this. ATS is broken as you mentioned for reverse lookups. > However, if we want to keep ATS, I think that ATS registration/deregistration > should be integrated with IPoIB. There was a desire expressed a long time ago to keep these separate. > To keep it separate, we will need to patch > net_device to provide an rdma_ptr as suggested by Roland. This is also needed by SDP currently. -- Hal From jlentini at netapp.com Mon Sep 26 11:20:38 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 26 Sep 2005 14:20:38 -0400 (EDT) Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <1127757208.4400.5.camel@hal.voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> <43381CFC.7070508@voltaire.com> <433827FF.3010601@ichips.intel.com> <1127757208.4400.5.camel@hal.voltaire.com> Message-ID: On Mon, 26 Sep 2005, Hal Rosenstock wrote: > On Mon, 2005-09-26 at 12:55, Sean Hefty wrote: > > Guy German wrote: > > > I believe that ib_at is still a valuable module even if ATS > > > reverse ARP is broken, and I think we should discuss this. > > > > Here's my thinking on this. ATS is broken as you mentioned for > > reverse lookups. 
> > However, if we want to keep ATS, I think that ATS > > registration/deregistration should be integrated with IPoIB. > > There was a desire expressed a long time ago to keep these separate. I agree. Keeping it separate is best. From suri at baymicrosystems.com Mon Sep 26 11:42:06 2005 From: suri at baymicrosystems.com (Suresh Shelvapille) Date: Mon, 26 Sep 2005 14:42:06 -0400 Subject: [openib-general] drivers.diff patch In-Reply-To: <20050926154543.GU12818@mellanox.co.il> Message-ID: <200509261842.j8QIg6T0007073@ns1.baymicrosystems.com> Folks: I am trying to add Infiniband drivers to a 2.6.10 kernel, and the docs/readme-kernel.txt says to apply the drivers.diff patch so that the Infiniband support can be enabled in configuration. Where do I find this patch? Thanks a lot. Suri From rolandd at cisco.com Mon Sep 26 11:45:19 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 26 Sep 2005 11:45:19 -0700 Subject: [openib-general] Re: [PATCH] check for valid MGID in user space In-Reply-To: <20050926063641.GA15117@mellanox.co.il> (Jack Morgenstein's message of "Mon, 26 Sep 2005 09:36:41 +0300") References: <52ll1m894h.fsf@cisco.com> <20050926063641.GA15117@mellanox.co.il> Message-ID: <52oe6fu0qo.fsf@cisco.com> I think that's going too far to stick everything into one return statement. I committed the change below, which I think is a lot clearer, and queued it for 2.6.15. - R. --- linux-kernel/infiniband/core/verbs.c (revision 3544) +++ linux-kernel/infiniband/core/verbs.c (working copy) @@ -523,16 +523,22 @@ EXPORT_SYMBOL(ib_dealloc_fmr); int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid) { - return qp->device->attach_mcast ?
- qp->device->attach_mcast(qp, gid, lid) : - -ENOSYS; + if (!qp->device->attach_mcast) + return -ENOSYS; + if (gid->raw[0] != 0xff || qp->qp_type != IB_QPT_UD) + return -EINVAL; + + return qp->device->attach_mcast(qp, gid, lid); } EXPORT_SYMBOL(ib_attach_mcast); int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid) { - return qp->device->detach_mcast ? - qp->device->detach_mcast(qp, gid, lid) : - -ENOSYS; + if (!qp->device->detach_mcast) + return -ENOSYS; + if (gid->raw[0] != 0xff || qp->qp_type != IB_QPT_UD) + return -EINVAL; + + return qp->device->detach_mcast(qp, gid, lid); } EXPORT_SYMBOL(ib_detach_mcast); From iod00d at hp.com Mon Sep 26 11:53:33 2005 From: iod00d at hp.com (Grant Grundler) Date: Mon, 26 Sep 2005 11:53:33 -0700 Subject: [openib-general] Re: Loading kdapl on 2.6.11.6 In-Reply-To: References: <1127753690.4397.25.camel@hal.voltaire.com> Message-ID: <20050926185333.GC16113@esmail.cup.hp.com> On Mon, Sep 26, 2005 at 01:13:10PM -0400, James Lentini wrote: ... > > kdapl_ib: disagrees about version of symbol dat_registry_add_provider > > kdapl_ib: Unknown symbol dat_registry_add_provider > > > > I've rebuilt the kdapl, kdapl_ib, and dat modules and rebooted but this > > still occurs. Any idea what's wrong ? Thanks. > > What is the lsmod output before you modprobe kdapl_ib? If there is an > old module being autmatically loaded that defines > dat_registry_remove_provider, etc. this would be the type of error I > would expect. yes, I'm seeing something similar caused by ib_sdp (SVN 3485): ota:~# lsmod Module Size Used by ib_sdp 296816 59 ib_cm 92936 1 ib_sdp ib_sa 26212 1 ib_sdp ib_mad 89328 2 ib_cm,ib_sa ib_core 92200 4 ib_sdp,ib_cm,ib_sa,ib_mad I'll reboot to SVN 3547 and see if it's something obviously wrong with the reference counting. 
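The order of checks in the patched ib_attach_mcast() can be exercised outside the kernel. What follows is a minimal userspace sketch, not kernel code: demo_gid, demo_qp, demo_attach_mcast() and demo_driver_attach() are hypothetical stand-ins for union ib_gid, struct ib_qp, ib_attach_mcast() and a driver's attach_mcast hook. It shows the same sequence: -ENOSYS when the device has no multicast hook, then -EINVAL unless the GID starts with 0xff and the QP is UD.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for union ib_gid. */
union demo_gid {
	uint8_t raw[16];
};

/* Hypothetical QP types; only UD QPs may join multicast groups. */
enum demo_qp_type { DEMO_QPT_RC, DEMO_QPT_UC, DEMO_QPT_UD };

/* Hypothetical stand-in for struct ib_qp plus its device's method. */
struct demo_qp {
	enum demo_qp_type qp_type;
	/* NULL models a device without multicast support. */
	int (*attach_mcast)(struct demo_qp *qp, const union demo_gid *gid,
			    uint16_t lid);
};

/* Same order of checks as the patched ib_attach_mcast(). */
static int demo_attach_mcast(struct demo_qp *qp, const union demo_gid *gid,
			     uint16_t lid)
{
	if (!qp->attach_mcast)
		return -ENOSYS;
	/* Multicast GIDs have 0xff in their first byte. */
	if (gid->raw[0] != 0xff || qp->qp_type != DEMO_QPT_UD)
		return -EINVAL;
	return qp->attach_mcast(qp, gid, lid);
}

/* Hypothetical driver hook that always succeeds. */
static int demo_driver_attach(struct demo_qp *qp, const union demo_gid *gid,
			      uint16_t lid)
{
	(void)qp;
	(void)gid;
	(void)lid;
	return 0;
}
```

Checking for a missing method before validating arguments means a device without multicast support reports -ENOSYS regardless of what the caller passed.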
thanks, grant From halr at voltaire.com Mon Sep 26 11:57:24 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 14:57:24 -0400 Subject: [openib-general] drivers.diff patch In-Reply-To: <200509261842.j8QIg6T0007073@ns1.baymicrosystems.com> References: <200509261842.j8QIg6T0007073@ns1.baymicrosystems.com> Message-ID: <1127761043.4400.330.camel@hal.voltaire.com> On Mon, 2005-09-26 at 14:42, Suresh Shelvapille wrote: > Folks: > > I am trying to add Infiniband drivers to a 2.6.10 kernel, and the > docs/readme-kernel.txt says to apply drivers.diff patch so that the > Infiniband support can be enabled in configuration. Where do I find this > patch? That README may be a little dated. The backpatches are available in either: https://openib.org/svn/gen2/branches/backport-to-2.6.9/ or https://openib.org/svn/gen2/branches/backport/ Not sure exactly what is needed for 2.6.10. -- Hal > Thanks a lot. > > Suri > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From halr at voltaire.com Mon Sep 26 12:19:16 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 15:19:16 -0400 Subject: [openib-general] netdev reference counting problem with ib_at In-Reply-To: <528xxwk0rp.fsf@cisco.com> References: <52hdctg7jv.fsf@cisco.com> <432B5F1E.7090909@ichips.intel.com> <528xxwk0rp.fsf@cisco.com> Message-ID: <1127762094.4400.445.camel@hal.voltaire.com> On Fri, 2005-09-16 at 20:15, Roland Dreier wrote: > Sean> I continue to hit this same issue, so I've started looking > Sean> at the ib_at code. Ib_at accesses struct ipoib_dev_priv to > Sean> get information about the related port that IPoIB is using. > Sean> Is there some other way for AT to get to the same > Sean> information? It seems wrong for AT to poke into the priv > Sean> data of a net_device. 
Should IPoIB expose a function that AT > Sean> can call to map IP addresses (or net_device) to IB ports? > Sean> How do we want to handle this long term? > > It probably makes sense to add an ib_ptr (or rdma_ptr) to struct > net_device (along with all the other ones like ip_ptr, dn_ptr, > ax25_ptr, etc). Is this ib_ptr or ipoib_ptr ? I would think iWARP devices would need this. -- Hal From jlentini at netapp.com Mon Sep 26 12:28:28 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 26 Sep 2005 15:28:28 -0400 (EDT) Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: References: Message-ID: On Wed, 21 Sep 2005, Sean Hefty wrote: > Here's the updated implementation. It compiles, but that's it. Sean, Overall this looks very good. A few comments: Why would this module be a ULP and not part of the core? Especially since the rdma_cma.h include file is intended for the core include area, include/rdma. I expect that the IB_CM_REQ_RECEIVED callback will be confusing to ULPs. The ULP will receive a new cma_id with an old context value. If the ULP wanted to make an adjustments to the cma_id that received the request, it would need to store a reference to it in the old cma_id's context value. I suggest you make the new cma_id part of the event data (see patch below). In cma_get_service_id: I assume that the IB_OPENIB_OUI will be replaced by an IBTA OUI when this address resolution mechanism is standardized. How will the IP address and port number in the connection request be delivered to the ULP? Would the design be cleaner if instead of sprinkling the code with "switch (cma_id->device->node_type)" statements, we used function pointers? 
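One way to read the function-pointer question: switch on the node type once, when the id is bound to a device, to pick a per-transport operations table. After that, every call dispatches through the table. The sketch below is only an illustration. The enum values, demo_cma_ops and every other name are hypothetical and are not taken from the CMA code. The iWARP entry is a placeholder, since the iWARP call signatures were not yet known.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical transport types standing in for device->node_type. */
enum demo_node_type { DEMO_NODE_CA = 1, DEMO_NODE_RNIC = 4 };

struct demo_cma_id;

/* Hypothetical per-transport operations table. */
struct demo_cma_ops {
	const char *name;
	int (*connect)(struct demo_cma_id *id);
};

struct demo_cma_id {
	enum demo_node_type node_type;
	const struct demo_cma_ops *ops; /* chosen once, when bound */
};

static int demo_ib_connect(struct demo_cma_id *id)
{
	(void)id;
	return 0; /* an IB version would send a CM REQ here */
}

static int demo_iw_connect(struct demo_cma_id *id)
{
	(void)id;
	return 0; /* placeholder: iWARP signatures were still unknown */
}

static const struct demo_cma_ops demo_ib_ops = { "ib", demo_ib_connect };
static const struct demo_cma_ops demo_iw_ops = { "iwarp", demo_iw_connect };

/* The only switch on node type: bind the ops table. */
static int demo_bind_ops(struct demo_cma_id *id)
{
	switch (id->node_type) {
	case DEMO_NODE_CA:
		id->ops = &demo_ib_ops;
		return 0;
	case DEMO_NODE_RNIC:
		id->ops = &demo_iw_ops;
		return 0;
	default:
		id->ops = NULL;
		return -ENOSYS;
	}
}

/* Later calls dispatch through the table instead of switching again. */
static int demo_connect(struct demo_cma_id *id)
{
	return id->ops ? id->ops->connect(id) : -ENODEV;
}
```

The trade-off is that each operation needs a table slot with a signature fixed up front. That is the cost of the function-pointer approach when one transport's interface is still undefined.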
Here's a patch that: - move listen declaration closer to accept and reject - add private data and new cma_id fields to event structure - record need to address information in the event structure - implement private data handling for IB_CM_REQ_RECEIVED and IB_CM_REP_RECEIVED - on white space fix for rdma_cma_reject Signed-off-by: James Lentini Index: ulp/cma/cma.c =================================================================== --- ulp/cma/cma.c (revision 3541) +++ ulp/cma/cma.c (working copy) @@ -177,8 +177,6 @@ if (!route->path_rec) goto err; - ib_event->private_data += sizeof *addr; - route->path_rec[0] = *ib_event->param.req_rcvd.primary_path; if (route->num_paths == 2) route->path_rec[1] = *ib_event->param.req_rcvd.alternate_path; @@ -229,7 +227,7 @@ static int cma_ib_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) { - struct cma_id_private *cma_id_priv; + struct cma_id_private *cma_id_priv, new_cma_id_priv; struct rdma_cma_event event; cma_id_priv = cm_id->context; @@ -240,13 +238,22 @@ event.event = RDMA_CMA_EVENT_UNREACHABLE; break; case IB_CM_REQ_RECEIVED: - cma_id_priv = cma_req_recv(cma_id_priv, ib_event); - if (!cma_id_priv) + new_cma_id_priv = cma_req_recv(cma_id_priv, ib_event); + if (!new_cma_id_priv) return -ENOMEM; event.event = RDMA_CMA_EVENT_CONNECT_REQUEST; + event.private_data = ib_event->private_data + + sizeof struct cma_addr; + event.private_data_len = IB_CM_REQ_PRIVATE_DATA_SIZE - + sizeof struct cma_addr; + event.new_cma_id = new_cma_id_priv->cma_id; break; case IB_CM_REP_RECEIVED: event.event = cma_rep_recv(cma_id_priv); + event.private_data = ib_event->private_data + + sizeof struct cma_addr; + event.private_data_len = IB_CM_REQ_PRIVATE_DATA_SIZE - + sizeof struct cma_addr; break; case IB_CM_RTU_RECEIVED: event.event = cma_rtu_recv(cma_id_priv); @@ -606,8 +613,8 @@ } EXPORT_SYMBOL(rdma_cma_accept); -int rdma_cma_reject(struct rdma_cma_id *cma_id, - const void *private_data, u8 private_data_len) +int rdma_cma_reject(struct 
rdma_cma_id *cma_id, const void *private_data, + u8 private_data_len) { struct cma_id_private *cma_id_priv; int ret; Index: include/rdma/rdma_cma.h =================================================================== --- include/rdma/rdma_cma.h (revision 3541) +++ include/rdma/rdma_cma.h (working copy) @@ -53,13 +53,20 @@ int num_paths; }; +struct rdma_cma_id; + struct rdma_cma_event { enum rdma_cma_event_type event; + /* for RDMA_CMA_EVENT_CONNECT_REQUEST and */ + /* active side RDMA_CMA_EVENT_ESTABLISHED */ void *private_data; + u8 private_data_len; + /* for RDMA_CMA_EVENT_CONNECT_REQUEST */ + rdma_cma_id *new_cma_id; + + /* TODO need to add the RDMA_CMA_EVENT_CONNECT_REQUEST's IP and port */ }; -struct rdma_cma_id; - typedef void (*rdma_cma_event_handler)(struct rdma_cma_id *cma_id, struct rdma_cma_event *event); @@ -76,12 +83,6 @@ void rdma_cma_destroy_id(struct rdma_cma_id *cma_id); -/** - * rdma_cma_listen - this function is called by the passive side. It is - * listening on a the specified port for incomming connection requests. - */ -int rdma_cma_listen(struct rdma_cma_id *cma_id, struct sockaddr *addr); - int rdma_cma_resolve_route(struct rdma_cma_id *cma_id, struct sockaddr *src_addr, struct sockaddr *dst_addr); @@ -109,6 +110,13 @@ struct rdma_cma_conn_param *conn_param); /** + * rdma_cma_listen - this function is called by the passive side. It is + * listening on a the specified port for incoming connection requests. + */ +int rdma_cma_listen(struct rdma_cma_id *cma_id, struct sockaddr *addr); + + +/** * rdma_cma_accept - call on the passive side to accept a connection request * note that if the function returned with error - a reject message was * sent to the remote side and the cma_id was destroyed. 
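In the patch, the consumer's part of the CM REQ private data begins after the address header the CMA prepends. Its length is the REQ private data size minus that header. The sketch below shows the arithmetic. The 92-byte REQ private data size comes from the InfiniBand CM REQ format. The layout of demo_cma_addr is a hypothetical placeholder and may not match the real struct cma_addr. C requires parentheses when sizeof is applied to a type name, so the arithmetic is written sizeof(struct ...).

```c
#include <stddef.h>
#include <stdint.h>

/* Private data size of an InfiniBand CM REQ message. */
#define DEMO_REQ_PRIVATE_DATA_SIZE 92

/* Hypothetical address header prepended by the CMA; the real
 * struct cma_addr layout may differ. */
struct demo_cma_addr {
	uint8_t version;
	uint8_t reserved;
	uint16_t port;
	uint32_t src_ip;
	uint32_t dst_ip;
};

/* The part of the private data that belongs to the consumer. */
struct demo_private_view {
	const void *data;
	uint8_t len;
};

/* Skip the CMA header at the front of the REQ private data. */
static struct demo_private_view
demo_req_consumer_data(const void *private_data)
{
	struct demo_private_view v;

	v.data = (const uint8_t *)private_data + sizeof(struct demo_cma_addr);
	v.len = DEMO_REQ_PRIVATE_DATA_SIZE - sizeof(struct demo_cma_addr);
	return v;
}
```

A REP has a different limit: the IB CM REP format carries 196 bytes of private data. A REP-side helper therefore needs its own constant instead of reusing the REQ size.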
From mshefty at ichips.intel.com Mon Sep 26 12:29:05 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 26 Sep 2005 12:29:05 -0700 Subject: [openib-general] netdev reference counting problem with ib_at In-Reply-To: <1127762094.4400.445.camel@hal.voltaire.com> References: <52hdctg7jv.fsf@cisco.com> <432B5F1E.7090909@ichips.intel.com> <528xxwk0rp.fsf@cisco.com> <1127762094.4400.445.camel@hal.voltaire.com> Message-ID: <43384C01.7030409@ichips.intel.com> Hal Rosenstock wrote: >> Sean> I continue to hit this same issue, so I've started looking >> Sean> at the ib_at code. Ib_at accesses struct ipoib_dev_priv to >> Sean> get information about the related port that IPoIB is using. >> Sean> Is there some other way for AT to get to the same >> Sean> information? It seems wrong for AT to poke into the priv >> Sean> data of a net_device. Should IPoIB expose a function that AT >> Sean> can call to map IP addresses (or net_device) to IB ports? >> Sean> How do we want to handle this long term? >> >>It probably makes sense to add an ib_ptr (or rdma_ptr) to struct >>net_device (along with all the other ones like ip_ptr, dn_ptr, >>ax25_ptr, etc). > > > Is this ib_ptr or ipoib_ptr ? I would think iWARP devices would need > this. I think that we can implement the CMA interface without adding this pointer by using ARP and the private data in the CM REQ. Is it needed for any other purpose? 
- Sean From halr at voltaire.com Mon Sep 26 12:29:58 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 15:29:58 -0400 Subject: [openib-general] netdev reference counting problem with ib_at In-Reply-To: <1127762094.4400.445.camel@hal.voltaire.com> References: <52hdctg7jv.fsf@cisco.com> <432B5F1E.7090909@ichips.intel.com> <528xxwk0rp.fsf@cisco.com> <1127762094.4400.445.camel@hal.voltaire.com> Message-ID: <1127762997.4400.535.camel@hal.voltaire.com> On Mon, 2005-09-26 at 15:19, Hal Rosenstock wrote: > On Fri, 2005-09-16 at 20:15, Roland Dreier wrote: > > Sean> I continue to hit this same issue, so I've started looking > > Sean> at the ib_at code. Ib_at accesses struct ipoib_dev_priv to > > Sean> get information about the related port that IPoIB is using. > > Sean> Is there some other way for AT to get to the same > > Sean> information? It seems wrong for AT to poke into the priv > > Sean> data of a net_device. Should IPoIB expose a function that AT > > Sean> can call to map IP addresses (or net_device) to IB ports? > > Sean> How do we want to handle this long term? > > > > It probably makes sense to add an ib_ptr (or rdma_ptr) to struct > > net_device (along with all the other ones like ip_ptr, dn_ptr, > > ax25_ptr, etc). > > Is this ib_ptr or ipoib_ptr ? I would think iWARP devices would need > this. Oops. I meant to write "iWARP devices wouldn't need this". -- Hal From rolandd at cisco.com Mon Sep 26 12:36:14 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 26 Sep 2005 12:36:14 -0700 Subject: [openib-general] Re: [PATCH] incorrect atomic attribute returned by ib/v_query_device In-Reply-To: <20050926134302.GA18503@mellanox.co.il> (Jack Morgenstein's message of "Mon, 26 Sep 2005 16:43:02 +0300") References: <20050926134302.GA18503@mellanox.co.il> Message-ID: <52irwntydt.fsf@cisco.com> Jack> I'm starting to fix ib_query_device/ibv_query_device -- Jack> adding missing fields, correcting values in current fields. Great! 
Jack> Enclosed is a patch for the atomic_cap field. Please Jack> review. Thanks. Thanks, applied and queued for 2.6.15, with the following fix > --- linux-kernel/infiniband/core/uverbs_cmd.c (revision 3532) > +++ linux-kernel/infiniband/core/uverbs_cmd.c (working copy) > @@ -199,6 +199,7 @@ > resp.max_pkeys = attr.max_pkeys; > resp.local_ca_ack_delay = attr.local_ca_ack_delay; > resp.phys_port_cnt = file->device->ib_dev->phys_port_cnt; > + resp.atomic_cap = attr.atomic_cap; > > if (copy_to_user((void __user *) (unsigned long) cmd.response, > &resp, sizeof resp)) This chunk seems to introduce a duplicate line -- we already set atomic_cap a few lines earlier. I just left this part out. - R. From rolandd at cisco.com Mon Sep 26 12:41:35 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 26 Sep 2005 12:41:35 -0700 Subject: [openib-general] netdev reference counting problem with ib_at In-Reply-To: <1127762997.4400.535.camel@hal.voltaire.com> (Hal Rosenstock's message of "26 Sep 2005 15:29:58 -0400") References: <52hdctg7jv.fsf@cisco.com> <432B5F1E.7090909@ichips.intel.com> <528xxwk0rp.fsf@cisco.com> <1127762094.4400.445.camel@hal.voltaire.com> <1127762997.4400.535.camel@hal.voltaire.com> Message-ID: <52ek7bty4w.fsf@cisco.com> Hal> Oops. I meant to write "iWARP devices wouldn't need this". I think they do. For example, an iWARP device driver would want to get from a struct net_device to a struct rdma_device when using the route tables. - R. 
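The rdma_ptr idea mirrors the existing per-protocol pointers in struct net_device, such as ip_ptr. The driver that owns both the network device and the RDMA device records the association. Consumers such as AT, SDP or an iWARP driver then read the pointer instead of poking into the driver's private data. The sketch below uses trimmed-down hypothetical structs, not the kernel's struct net_device. It also omits the reference counting that the netdev problem in this thread is about.

```c
#include <stddef.h>

/* Hypothetical RDMA device handle. */
struct demo_rdma_device {
	const char *name;
	int port;
};

/* Hypothetical, trimmed-down net_device with per-protocol pointers;
 * rdma_ptr is the proposed addition next to fields like ip_ptr. */
struct demo_net_device {
	const char *name;
	void *ip_ptr;
	void *rdma_ptr;
};

/* Called by the driver that owns both devices (e.g. IPoIB). */
static void demo_set_rdma_dev(struct demo_net_device *ndev,
			      struct demo_rdma_device *rdev)
{
	ndev->rdma_ptr = rdev;
}

/* Consumers look up the RDMA device without knowing the layout of the
 * driver's private data. No reference is taken here. */
static struct demo_rdma_device *
demo_get_rdma_dev(const struct demo_net_device *ndev)
{
	return ndev->rdma_ptr;
}
```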
From halr at voltaire.com Mon Sep 26 12:36:35 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 15:36:35 -0400 Subject: [openib-general] netdev reference counting problem with ib_at In-Reply-To: <43384C01.7030409@ichips.intel.com> References: <52hdctg7jv.fsf@cisco.com> <432B5F1E.7090909@ichips.intel.com> <528xxwk0rp.fsf@cisco.com> <1127762094.4400.445.camel@hal.voltaire.com> <43384C01.7030409@ichips.intel.com> Message-ID: <1127763141.4400.557.camel@hal.voltaire.com> On Mon, 2005-09-26 at 15:29, Sean Hefty wrote: > > Is this ib_ptr or ipoib_ptr ? I would think iWARP devices would need > > this. > > I think that we can implement the CMA interface without adding this pointer by > using ARP and the private data in the CM REQ. True if that is the approach taken. > Is it needed for any other purpose? SDP does this too (as well as AT). -- Hal From mshefty at ichips.intel.com Mon Sep 26 12:52:18 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 26 Sep 2005 12:52:18 -0700 Subject: [openib-general] netdev reference counting problem with ib_at In-Reply-To: <1127763141.4400.557.camel@hal.voltaire.com> References: <52hdctg7jv.fsf@cisco.com> <432B5F1E.7090909@ichips.intel.com> <528xxwk0rp.fsf@cisco.com> <1127762094.4400.445.camel@hal.voltaire.com> <43384C01.7030409@ichips.intel.com> <1127763141.4400.557.camel@hal.voltaire.com> Message-ID: <43385172.60604@ichips.intel.com> Hal Rosenstock wrote: > On Mon, 2005-09-26 at 15:29, Sean Hefty wrote: > >>>Is this ib_ptr or ipoib_ptr ? I would think iWARP devices would need >>>this. >> >>I think that we can implement the CMA interface without adding this pointer by >>using ARP and the private data in the CM REQ. > > True if that is the approach taken. I thought that there was agreement to use the private data in the CM REQ. Using ARP is just taking advantage of existing functionality. >> Is it needed for any other purpose? > > SDP does this too (as well as AT). What does SDP use this for? 
Can it get to the same data another way? - Sean From halr at voltaire.com Mon Sep 26 13:00:52 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 16:00:52 -0400 Subject: [openib-general] netdev reference counting problem with ib_at In-Reply-To: <43385172.60604@ichips.intel.com> References: <52hdctg7jv.fsf@cisco.com> <432B5F1E.7090909@ichips.intel.com> <528xxwk0rp.fsf@cisco.com> <1127762094.4400.445.camel@hal.voltaire.com> <43384C01.7030409@ichips.intel.com> <1127763141.4400.557.camel@hal.voltaire.com> <43385172.60604@ichips.intel.com> Message-ID: <1127764851.4400.729.camel@hal.voltaire.com> On Mon, 2005-09-26 at 15:52, Sean Hefty wrote: > Hal Rosenstock wrote: > > On Mon, 2005-09-26 at 15:29, Sean Hefty wrote: > > > >>>Is this ib_ptr or ipoib_ptr ? I would think iWARP devices would need > >>>this. > >> > >>I think that we can implement the CMA interface without adding this pointer by > >>using ARP and the private data in the CM REQ. > > > > True if that is the approach taken. > > I thought that there was agreement to use the private data in the CM REQ. Using > ARP is just taking advantage of existing functionality. Is there any harm in exposing this ? > >> Is it needed for any other purpose? > > > > SDP does this too (as well as AT). > > What does SDP use this for? Same thing as AT right now. > Can it get to the same data another way? Probably. -- Hal From mshefty at ichips.intel.com Mon Sep 26 13:13:25 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 26 Sep 2005 13:13:25 -0700 Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: References: Message-ID: <43385665.5000104@ichips.intel.com> James Lentini wrote: > Why would this module be a ULP and not part of the core? Especially > since the rdma_cma.h include file is intended for the core include > area, include/rdma. It can be a separately loaded module, so a ULP from the viewpoint of verbs, SA query, IB CM, etc. 
> I expect that the IB_CM_REQ_RECEIVED callback will be confusing to > ULPs. The ULP will receive a new cma_id with an old context value. If > the ULP wanted to make an adjustments to the cma_id that received the > request, it would need to store a reference to it in the old cma_id's > context value. I suggest you make the new cma_id part of the event > data (see patch below). The new cma_id must be used after the REQ is received, so I wanted to make that clear. There's not much that a user can do with the listening cma_id from within the callback; since it cannot be destroyed from the callback itself. This is a fairly minor issue that we can discuss more. > How will the IP address and port number in the connection request be > delivered to the ULP? The cma_id contains the source and destination address. See cma_req_recv(). This was added in after the initial posting of the code. > Would the design be cleaner if instead of sprinkling the code with "switch > (cma_id->device->node_type)" statements, we used function pointers? I wasn't sure what the iWarp portion of the code would look like, but I wanted to leave place for an iWarp implementation to be easily inserted. It seems that it would take a fair deal of additional code to use function pointers. > Here's a patch that: > > - move listen declaration closer to accept and reject > - add private data and new cma_id fields to event structure > - record need to address information in the event structure > - implement private data handling for IB_CM_REQ_RECEIVED and > IB_CM_REP_RECEIVED > - on white space fix for rdma_cma_reject Thanks - I'll look it over and merge in the changes. 
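Suppose the connect-request event carried the new id, as James's patch proposes. A ULP handler could then attach its per-connection state to that id and leave the listening id's context alone. The sketch below illustrates that pattern. demo_cma_event, its new_cma_id field and demo_on_connect_request() are hypothetical and do not reproduce rdma_cma.h.

```c
#include <stdlib.h>

enum demo_event_type {
	DEMO_EVENT_CONNECT_REQUEST,
	DEMO_EVENT_ESTABLISHED
};

struct demo_cma_id {
	void *context; /* owned by the ULP */
};

/* Hypothetical event with the proposed new_cma_id field. */
struct demo_cma_event {
	enum demo_event_type event;
	const void *private_data;
	unsigned char private_data_len;
	struct demo_cma_id *new_cma_id; /* set only for CONNECT_REQUEST */
};

/* Hypothetical per-connection ULP state. */
struct demo_conn {
	struct demo_cma_id *id;
};

static struct demo_conn *
demo_on_connect_request(struct demo_cma_id *listen_id,
			struct demo_cma_event *ev)
{
	struct demo_conn *conn;

	(void)listen_id; /* the listener's context stays with the listener */
	if (ev->event != DEMO_EVENT_CONNECT_REQUEST || !ev->new_cma_id)
		return NULL;

	conn = calloc(1, sizeof(*conn));
	if (!conn)
		return NULL;
	conn->id = ev->new_cma_id;
	ev->new_cma_id->context = conn; /* later events on the new id find conn */
	return conn;
}
```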
- Sean From halr at voltaire.com Mon Sep 26 13:08:15 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 16:08:15 -0400 Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: <4332EDF8.7090700@ichips.intel.com> References: <4332EDF8.7090700@ichips.intel.com> Message-ID: <1127765295.4400.778.camel@hal.voltaire.com> On Thu, 2005-09-22 at 13:46, Sean Hefty wrote: > I've checked this into svn under svn/gen2/users/mshefty/linux-kernel/infiniband, > so that changes can be tracked easier. I haven't had a chance to look at this as yet but have a couple of questions: What would be done for uDAPL ? Would there be uCMA ? Also, would IPv6 be extensions to the current API parameters or additional APIs ? --- Hal From mshefty at ichips.intel.com Mon Sep 26 13:24:23 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 26 Sep 2005 13:24:23 -0700 Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: <1127765295.4400.778.camel@hal.voltaire.com> References: <4332EDF8.7090700@ichips.intel.com> <1127765295.4400.778.camel@hal.voltaire.com> Message-ID: <433858F7.9090006@ichips.intel.com> Hal Rosenstock wrote: > On Thu, 2005-09-22 at 13:46, Sean Hefty wrote: > >>I've checked this into svn under svn/gen2/users/mshefty/linux-kernel/infiniband, >>so that changes can be tracked easier. > > What would be done for uDAPL ? Would there be uCMA ? I've considered uDAPL, but haven't thought through all of the details yet. My hope is that the uCMA will interface to the uCM, userspace SA query code, and userspace address resolution code, and will not need to communicate directly with the kernel CMA. > Also, would IPv6 be extensions to the current API parameters or > additional APIs ? I'm not sure about this. I was trying to get this working on IPv4 first, then worry about IPv6 later. However, I tried to avoid assuming that addresses were IPv4 where possible.
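Passing addresses as struct sockaddr keeps the API family-neutral. Code that needs the concrete address switches on sa_family only where it matters. The userspace sketch below uses the standard POSIX socket types. The helper names demo_addr_size() and demo_addr_port() are hypothetical and are not part of the CMA.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Size of the concrete address behind a generic struct sockaddr,
 * or 0 if the family is not supported. */
static size_t demo_addr_size(const struct sockaddr *sa)
{
	switch (sa->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}

/* Port in host byte order for either family; -1 if unsupported. */
static int demo_addr_port(const struct sockaddr *sa, uint16_t *port)
{
	switch (sa->sa_family) {
	case AF_INET:
		*port = ntohs(((const struct sockaddr_in *)sa)->sin_port);
		return 0;
	case AF_INET6:
		*port = ntohs(((const struct sockaddr_in6 *)sa)->sin6_port);
		return 0;
	default:
		return -1;
	}
}
```

Supporting IPv6 then means adding a case to these helpers. The signatures that take struct sockaddr stay the same.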
- Sean From rolandd at cisco.com Mon Sep 26 13:25:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 26 Sep 2005 13:25:55 -0700 Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: <1127765295.4400.778.camel@hal.voltaire.com> (Hal Rosenstock's message of "26 Sep 2005 16:08:15 -0400") References: <4332EDF8.7090700@ichips.intel.com> <1127765295.4400.778.camel@hal.voltaire.com> Message-ID: <52achztw30.fsf@cisco.com> Hal> Also, would IPv6 be extensions to the current API parameters Hal> or additional APIs ? I think the API works unchanged for IPv6, since addresses are specified using struct sockaddr. - R. From mshefty at ichips.intel.com Mon Sep 26 14:03:38 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 26 Sep 2005 14:03:38 -0700 Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: References: Message-ID: <4338622A.4010600@ichips.intel.com> James Lentini wrote: > - move listen declaration closer to accept and reject Accepted - will be pushed in with next version. > - add private data and new cma_id fields to event structure Added private_data_len field to cma_id structure. Would like to get some additional feedback before adding the new_cma_id field. If there are no objections, I'll add this. > - record need to address information in the event structure The address information is only sent in the CM REQ. There shouldn't be a need to carry it back in the CM REP. > - implement private data handling for IB_CM_REQ_RECEIVED and > IB_CM_REP_RECEIVED See below. > @@ -177,8 +177,6 @@ > if (!route->path_rec) > goto err; > > - ib_event->private_data += sizeof *addr; Used to skip address information sent in CM REQ. 
> case IB_CM_REQ_RECEIVED: > - cma_id_priv = cma_req_recv(cma_id_priv, ib_event); > - if (!cma_id_priv) > + new_cma_id_priv = cma_req_recv(cma_id_priv, ib_event); > + if (!new_cma_id_priv) > return -ENOMEM; > event.event = RDMA_CMA_EVENT_CONNECT_REQUEST; > + event.private_data = ib_event->private_data + > + sizeof struct cma_addr; private_data pointer is set at the end of this routine. > + event.private_data_len = IB_CM_REQ_PRIVATE_DATA_SIZE - > + sizeof struct cma_addr; added this. > + event.private_data = ib_event->private_data + > + sizeof struct cma_addr; > + event.private_data_len = IB_CM_REQ_PRIVATE_DATA_SIZE - > + sizeof struct cma_addr; Set private_data_len = IB_CM_ *REP* _PRIVATE_DATA_SIZE. > -int rdma_cma_reject(struct rdma_cma_id *cma_id, > - const void *private_data, u8 private_data_len) > +int rdma_cma_reject(struct rdma_cma_id *cma_id, const void *private_data, > + u8 private_data_len) I prefer that the private data variables appear together... - Sean From jerome.pioux at bull.com Mon Sep 26 14:08:08 2005 From: jerome.pioux at bull.com (Jerome Pioux) Date: Mon, 26 Sep 2005 14:08:08 -0700 Subject: [openib-general] Re: FW: SDP problems with 64K page size References: <52ll1pdjm8.fsf@cisco.com> <20050925104559.GT31820@mellanox.co.il> Message-ID: <017901c5c2de$67905e50$0211708d@gpv.az05.bull.com> > The best way to fix this appears to be to bump the counters up to u32 or > s32. Just an open question: Do you think that we could get better performance if we would go with u32 instead of reducing the buffer to 16K? Jerome ----- Original Message ----- From: "Michael S. Tsirkin" To: "Roland Dreier" ; "Tom Duffy" ; "Jerome Pioux" Cc: Sent: Sunday, September 25, 2005 3:45 AM Subject: Re: FW: SDP problems with 64K page size > Quoting r. Roland Dreier : >> Subject: FW: SDP problems with 64K page size >> >> Hi, Jerome asked me to forward this on, since for some reason his >> email didn't appear when he sent it. 
>> >> In any case there seem to be some PAGE_SIZE dependencies in SDP. >> Libor provided a patch that fixed this up a while ago, but I don't >> know if this is the right way to handle this. >> >> - R. > > Roland, thanks very much for forwarding this, and for providing > a patch to Jerome. > > The problem is with recv_size/send_size counters in SDP, which are u16, > so that assigning a value of 64K overflows them. > The best way to fix this appears to be to bump the counters up to u32 or > s32. > The patch that Roland posted fixes this by reducing the buffer > size to 16K, so that should work, too. > > Roland, I might check in the patch that you posted to work around > this problem for 64K page users, until I have a final fix ready. > Is that OK with everyone? > > Thanks, > > -- > MST From rolandd at cisco.com Mon Sep 26 14:09:45 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 26 Sep 2005 14:09:45 -0700 Subject: [openib-general] New uverbs ABI version Message-ID: <521x3btu1y.fsf@cisco.com> I just checked a change into subversion to implement the completion channel API I described last week. This also cleans up some of the problems in the kernel error paths. These changes break both the kernel ABI and the userspace library API, so to use the new code, you will have to update your kernel, libibverbs, libmthca, and whatever application you are running on top of libibverbs. The new libibverbs will work with all old kernels, so it should be fine to update. I'll post patches to update MVAPICH and Open MPI to work with the new libibverbs. I didn't try to fix uDAPL, because some thought probably needs to go into how to use completion channels most efficiently. I've done some testing, but I undoubtedly introduced some new bugs, so please let me know the results of your testing. 
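MST's diagnosis in the SDP thread can be checked in isolation: 64K does not fit in u16 recv_size/send_size counters. Converting 65536 to a 16-bit unsigned type wraps it modulo 65536 to 0, while 16384 survives. A 32-bit counter holds the full value. The structs below only borrow the field names. They are hypothetical and are not the declarations in sdp_conn.h.

```c
#include <stdint.h>

/* One buffer of 64 KiB, e.g. a single page with 64K pages. */
#define DEMO_BUF_SIZE 65536u

/* Hypothetical u16 counters, as the fields were declared. */
struct demo_counters_u16 {
	uint16_t recv_size;
	uint16_t send_size;
};

/* Hypothetical widened counters (the s32 option). */
struct demo_counters_32 {
	int32_t recv_size;
	int32_t send_size;
};

static void demo_init_u16(struct demo_counters_u16 *c, uint32_t size)
{
	c->recv_size = (uint16_t)size; /* wraps modulo 65536 */
	c->send_size = (uint16_t)size;
}

static void demo_init_32(struct demo_counters_32 *c, uint32_t size)
{
	c->recv_size = (int32_t)size;
	c->send_size = (int32_t)size;
}
```

This is why capping the buffer at 16K works around the problem, while widening the counters fixes the underlying limit.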
Thanks, Roland From rolandd at cisco.com Mon Sep 26 14:13:39 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 26 Sep 2005 14:13:39 -0700 Subject: [openib-general] Re: FW: SDP problems with 64K page size In-Reply-To: <017901c5c2de$67905e50$0211708d@gpv.az05.bull.com> (Jerome Pioux's message of "Mon, 26 Sep 2005 14:08:08 -0700") References: <52ll1pdjm8.fsf@cisco.com> <20050925104559.GT31820@mellanox.co.il> <017901c5c2de$67905e50$0211708d@gpv.az05.bull.com> Message-ID: <52wtl3sfb0.fsf@cisco.com> Jerome> Just an open question: Do you think that we could get Jerome> better performance if we would go with u32 instead of Jerome> reducing the buffer to 16K? Not sure. One easy test you could try would be increasing 16384 to 32768 in my patch. If that works and improves performance, then further increases would probably be worthwhile also. You could also try changing recv_size and send_size from u16 to s32 in the declaration in sdp_conn.h. BTW, I say all of this with only the vaguest understanding of the SDP code base, so it might be complete nonsense. - R. From jlentini at netapp.com Mon Sep 26 14:14:28 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 26 Sep 2005 17:14:28 -0400 (EDT) Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: <43385665.5000104@ichips.intel.com> References: <43385665.5000104@ichips.intel.com> Message-ID: On Mon, 26 Sep 2005, Sean Hefty wrote: > James Lentini wrote: > > Why would this module be a ULP and not part of the core? Especially since > > the rdma_cma.h include file is intended for the core include area, > > include/rdma. > > It can be a separately loaded module, so a ULP from the viewpoint of > verbs, SA query, IB CM, etc. The distinction between a core component and a ULP is still fuzzy. The core consists of separately loaded modules (e.g. ib_core, ib_sa, ib_mad, ib_ping, ib_cm, etc.). > > I expect that the IB_CM_REQ_RECEIVED callback will be confusing to > > ULPs.
The ULP will receive a new cma_id with an old context value. > > If the ULP wanted to make any adjustments to the cma_id that > > received the request, it would need to store a reference to it in > > the old cma_id's context value. I suggest you make the new cma_id > > part of the event data (see patch below). > > The new cma_id must be used after the REQ is received, so I wanted to make > that clear. There's not much that a user can do with the listening cma_id > from within the callback, since it cannot be destroyed from the callback > itself. This is a fairly minor issue that we can discuss more. Fair enough. A comment that says the callback context is not strictly tied to the cma_id would help here. > > How will the IP address and port number in the connection request be > > delivered to the ULP? > > The cma_id contains the source and destination address. See cma_req_recv(). > This was added in after the initial posting of the code. Thanks. I see it now. > > Would the design be cleaner if instead of sprinkling the code with "switch > > (cma_id->device->node_type)" statements, we used function pointers? > > I wasn't sure what the iWarp portion of the code would look like, but I wanted > to leave room for an iWarp implementation to be easily inserted. It seems > that it would take a good deal of additional code to use function pointers. I see what you mean. Without knowing the function for the iWARP calls, it isn't possible to know what each function pointer's signature should be. james From rolandd at cisco.com Mon Sep 26 14:15:27 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 26 Sep 2005 14:15:27 -0700 Subject: [openib-general] [PATCH] Update MVAPICH for new libibverbs API In-Reply-To: <521x3btu1y.fsf@cisco.com> (Roland Dreier's message of "Mon, 26 Sep 2005 14:09:45 -0700") References: <521x3btu1y.fsf@cisco.com> Message-ID: <52slvrsf80.fsf@cisco.com> This patch updates MVAPICH for the new ibv_create_cq() API.
Signed-off-by: Roland Dreier --- mvapich-gen2/mpid/ch_gen2/viainit.c (revision 3534) +++ mvapich-gen2/mpid/ch_gen2/viainit.c (working copy) @@ -115,7 +115,7 @@ static void get_lid(void) static void create_cq(void) { ibv_dev.cq_hndl = ibv_create_cq(ibv_dev.context, - viadev_cq_size, NULL); + viadev_cq_size, NULL, NULL, 0); if(!ibv_dev.cq_hndl) { error_abort_all(GEN_EXIT_ERR, "Error creating CQ\n"); From rolandd at cisco.com Mon Sep 26 14:19:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 26 Sep 2005 14:19:11 -0700 Subject: [openib-general] [PATCH] Fix MVAPICH compile with gcc4 In-Reply-To: <52slvrsf80.fsf@cisco.com> (Roland Dreier's message of "Mon, 26 Sep 2005 14:15:27 -0700") References: <521x3btu1y.fsf@cisco.com> <52slvrsf80.fsf@cisco.com> Message-ID: <52ll1jsf1s.fsf_-_@cisco.com> gcc version 4 doesn't like the extern declaration of free_vbuf_head to be followed by a static declaration in vbuf.c. To fix this, we can just get rid of the declaration in vbuf.h, since free_vbuf_head is not used outside of vbuf.c. Signed-off-by: Roland Dreier --- mpid/ch_gen2/vbuf.h (revision 3549) +++ mpid/ch_gen2/vbuf.h (working copy) @@ -188,8 +188,6 @@ void allocate_vbufs(void); void deallocate_vbufs(void); -extern vbuf *free_vbuf_head; - vbuf *get_vbuf(void); void release_vbuf(vbuf * v); void vbuf_init_send(vbuf * v, unsigned long len); From rolandd at cisco.com Mon Sep 26 14:20:20 2005 From: rolandd at cisco.com (Roland Dreier) Date: Mon, 26 Sep 2005 14:20:20 -0700 Subject: [openib-general] [PATCH] Update Open MPI for new libibverbs API In-Reply-To: <521x3btu1y.fsf@cisco.com> (Roland Dreier's message of "Mon, 26 Sep 2005 14:09:45 -0700") References: <521x3btu1y.fsf@cisco.com> Message-ID: <52fyrrsezv.fsf@cisco.com> [It's somewhat annoying to have to subscribe to devel at open-mpi.org just to be able to send patches, but oh well...] This patch updates Open MPI for the new ibv_create_cq() API.
Signed-off-by: Roland Dreier --- ompi/mca/btl/openib/btl_openib.c (revision 7507) +++ ompi/mca/btl/openib/btl_openib.c (working copy) @@ -656,7 +656,8 @@ int mca_btl_openib_module_init(mca_btl_o } /* Create the low and high priority queue pairs */ - openib_btl->ib_cq_low = ibv_create_cq(ctx, mca_btl_openib_component.ib_cq_size, NULL); + openib_btl->ib_cq_low = ibv_create_cq(ctx, mca_btl_openib_component.ib_cq_size, + NULL, NULL, 0); if(NULL == openib_btl->ib_cq_low) { BTL_ERROR(("error creating low priority cq for %s errno says %s\n", @@ -665,7 +666,8 @@ int mca_btl_openib_module_init(mca_btl_o return OMPI_ERROR; } - openib_btl->ib_cq_high = ibv_create_cq(ctx, mca_btl_openib_component.ib_cq_size, NULL); + openib_btl->ib_cq_high = ibv_create_cq(ctx, mca_btl_openib_component.ib_cq_size, + NULL, NULL, 0); if(NULL == openib_btl->ib_cq_high) { BTL_ERROR(("error creating high priority cq for %s errno says %s\n", From jlentini at netapp.com Mon Sep 26 14:20:45 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 26 Sep 2005 17:20:45 -0400 (EDT) Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: <1127765295.4400.778.camel@hal.voltaire.com> References: <4332EDF8.7090700@ichips.intel.com> <1127765295.4400.778.camel@hal.voltaire.com> Message-ID: On Mon, 26 Sep 2005, Hal Rosenstock wrote: > On Thu, 2005-09-22 at 13:46, Sean Hefty wrote: > > I've checked this into svn under svn/gen2/users/mshefty/linux-kernel/infiniband, > > so that changes can be tracked easier. > > I haven't had a chance to look at this as yet but have a couple of > questions: > > What would be done for uDAPL ? Would there be uCMA ? That is a discussion for the DAT collaborative. When layered on IB, the uDAPL and kDAPL APIs may or may not want to use the new "IP-addressing" protocol that the CMA module will provide. 
From mshefty at ichips.intel.com Mon Sep 26 14:24:54 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 26 Sep 2005 14:24:54 -0700 Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: References: <43385665.5000104@ichips.intel.com> Message-ID: <43386726.3070400@ichips.intel.com> James Lentini wrote: >>>I expect that the IB_CM_REQ_RECEIVED callback will be confusing to >>>ULPs. The ULP will receive a new cma_id with an old context value. >>>If the ULP wanted to make any adjustments to the cma_id that >>>received the request, it would need to store a reference to it in >>>the old cma_id's context value. I suggest you make the new cma_id >>>part of the event data (see patch below). >> >>The new cma_id must be used after the REQ is received, so I wanted to make >>that clear. There's not much that a user can do with the listening cma_id >>from within the callback, since it cannot be destroyed from the callback >>itself. This is a fairly minor issue that we can discuss more. > > Fair enough. A comment that says the callback context is not strictly > tied to the cma_id would help here. The IB CM does this a little differently than your proposal. It returns the new cm_id in the callback, with a reference to the listen_id in the event. For consistency, we may want to match this model. With the CMA, I took out the listen_id, assuming that the user could get to it using the returned context. I see your point about the context being confusing. Another approach is to return the context directly in the callback. Note that nothing prevents the user from changing the context stored with a cma_id. It's only by default that a new cma_id is created with a context value equal to that of the listening request.
- Sean From jlentini at netapp.com Mon Sep 26 14:28:04 2005 From: jlentini at netapp.com (James Lentini) Date: Mon, 26 Sep 2005 17:28:04 -0400 (EDT) Subject: [openib-general] Re: Page allocation failures & kdapltest oops In-Reply-To: <1127757010.4400.1.camel@hal.voltaire.com> References: <1127749644.4398.878.camel@hal.voltaire.com> <1127757010.4400.1.camel@hal.voltaire.com> Message-ID: On Mon, 26 Sep 2005, Hal Rosenstock wrote: > On Mon, 2005-09-26 at 13:15, James Lentini wrote: > > What is the kdapltest command you are using? > > > kdapltest -T T -s -D mthca0a -d -i 10000 -w 8 client SR server SR > > seems to fail > > but > > kdapltest -T T -s -D mthca0a -d -t 2 -w 8 -i 20 client SR server SR > > works on that machine I've been running the first command in a loop 100s (if not 1000s) of times on my setup and still haven't seen any errors. I'll look over the stack trace again and see if I can see what is going on. From ardavis at ichips.intel.com Mon Sep 26 14:37:19 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Mon, 26 Sep 2005 14:37:19 -0700 Subject: [openib-general] New uverbs ABI version In-Reply-To: <521x3btu1y.fsf@cisco.com> References: <521x3btu1y.fsf@cisco.com> Message-ID: <43386A0F.9060404@ichips.intel.com> Roland Dreier wrote: >I just checked a change into subversion to implement the completion >channel API I described last week. This also cleans up some of the >problems in the kernel error paths. > >These changes break both the kernel ABI and the userspace library >API, so to use the new code, you will have to update your kernel, >libibverbs, libmthca, and whatever application you are running on top >of libibverbs. The new libibverbs will work with all old kernels, so >it should be fine to update. > >I'll post patches to update MVAPICH and Open MPI to work with the new >libibverbs. I didn't try to fix uDAPL, because some thought probably >needs to go into how to use completion channels most efficiently. > > I will take care of uDAPL. 
-arlin From twbowman at gmail.com Mon Sep 26 15:05:03 2005 From: twbowman at gmail.com (Todd Bowman) Date: Mon, 26 Sep 2005 16:05:03 -0600 Subject: [openib-general] uDAPL problem Message-ID: I am having a problem with uDAPL accessing /dev/infiniband/{uat,ucm0}. I am running 3549, 2.6.12 kernel with backport. Here is a snippet of the uDAPL debug messages running dtest. The dat.conf file seems to be correct; the correctly named providers are being loaded. 26248 Running as server DAT Registry: dat_ia_openv (OpenIB-ib0,1:2,0) called DAT Registry: IA OpenIB-ib0, trying to load library /usr/local/lib/libdapl.so libuat: Error <-1:6> couldn't open IB at device libibcm: error <-1:6> opening device DAPL: NOT Setting Loopback dapl_ib_init: DAT Registry: dat_registry_add_provider (OpenIB-ib0,1:2,0) dapl_ia_open (OpenIB-ib0, 8, 0x10019d40, 0x10019cc0) open_hca: mthca0 - 0x1001fdb0 open_hca: Found dev mthca0 f422000002c90200 open_hca: GID subnet 00000000000080fe id f522000002c90200 ips_by_gid: ERR ips_by_gid -1 Bad file descriptor open_hca: ERR ib_at_ips_by_gid for mthca0 dapls_ib_open_hca failed 40000 dapl_ia_open () returns 0x40000 26248: Error Adaptor open: DAT_INTERNAL_ERROR DAT Registry: Stopped (dat_fini) DAPL: Stopped (dapl_fini) dapl_ib_release: I am not running udev but manually create uat and ucm.
Here is the list of /dev/infiniband: ls -l /dev/infiniband/ total 0 crw-rw-rw- 1 root root 231, 64 Sep 22 15:18 issm0 crw-rw-rw- 1 root root 231, 65 Sep 22 15:18 issm1 crw-rw-rw- 1 root root 231, 254 Sep 22 22:47 uat crw-rw-rw- 1 root root 231, 255 Sep 20 22:31 ucm crw-rw-rw- 1 root root 231, 255 Sep 26 20:01 ucm0 crw-rw-rw- 1 root root 231, 0 Sep 22 15:18 umad0 crw-rw-rw- 1 root root 231, 1 Sep 22 15:18 umad1 crw-rw-rw- 1 root root 231, 192 Sep 20 22:30 uverbs0 crw-rw-rw- 1 root root 231, 193 Sep 20 22:30 uverbs1 And the loaded modules: kdapl_ib 82000 0 kdapl 14888 1 kdapl_ib ib_uverbs 52064 0 ib_ipoib 65480 0 ib_ucm 32624 0 ib_cm 51944 2 kdapl_ib,ib_ucm ib_uat 22168 0 ib_at 34840 2 kdapl_ib,ib_uat ib_sa 25328 2 ib_ipoib,ib_at ib_mthca 160376 0 ib_mad 61108 3 ib_cm,ib_sa,ib_mthca ib_core 73888 8 kdapl_ib,ib_uverbs,ib_ipoib,ib_ucm,ib_cm,ib_sa,ib_mthca,ib_mad I am sure that I am missing something simple. Can someone point me in the right direction. Thanks, Todd From nacc at us.ibm.com Mon Sep 26 15:21:57 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Mon, 26 Sep 2005 15:21:57 -0700 Subject: [openib-general] InfiniBand compilation testing In-Reply-To: <20050926171521.GB16113@esmail.cup.hp.com> References: <20050924074611.GD3950@us.ibm.com> <52psqy8jt2.fsf@cisco.com> <20050926171521.GB16113@esmail.cup.hp.com> Message-ID: <20050926222157.GI7532@us.ibm.com> On 26.09.2005 [10:15:21 -0700], Grant Grundler wrote: > On Sat, Sep 24, 2005 at 10:19:53AM -0700, Roland Dreier wrote: > ... > > I just checked in a fix for this -- the pci_pretty_name() API has gone > > away, so I removed our use of it in svn. I don't understand how your > > other builds of git + svn succeeded though, since pci_pretty_name is > > completely gone. Oh, I guess you'll miss link failures when building > > modules, so functions that disappear won't break the build.
> > I think he can run "make modules_install INSTALL_MOD_PATH=/tmp" to verify > modules link in properly. It turns out the build system already does the modules_install, but none of the default .configs have CONFIG_MODULES=y. I've updated my scripts to do this, and beyond the ip_dev_find() issues which Roland has already posted about, I am not seeing any other link issues. Thanks, Nish From halr at voltaire.com Mon Sep 26 15:44:35 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 18:44:35 -0400 Subject: [openib-general] uDAPL problem In-Reply-To: References: Message-ID: <1127774675.7959.852.camel@hal.voltaire.com> On Mon, 2005-09-26 at 18:05, Todd Bowman wrote: > I am having a problem with uDAPL accessing > /dev/infiniband/{uat,ucm0}. I am running 3549, 2.6.12 kernel with > backport. Here is a snippet of the uDAPL debug messages running > dtest. The dat.conf file seems to be correct, the correctly named > providers are being loaded. > > 26248 Running as server > DAT Registry: dat_ia_openv (OpenIB-ib0,1:2,0) called > DAT Registry: IA OpenIB-ib0, trying to load library > /usr/local/lib/libdapl.so > libuat: Error <-1:6> couldn't open IB at device > libibcm: error <-1:6> opening device > DAPL: NOT Setting Loopback > dapl_ib_init: > DAT Registry: dat_registry_add_provider (OpenIB-ib0,1:2,0) > dapl_ia_open (OpenIB-ib0, 8, 0x10019d40, 0x10019cc0) > open_hca: mthca0 - 0x1001fdb0 > open_hca: Found dev mthca0 f422000002c90200 > open_hca: GID subnet 00000000000080fe id f522000002c90200 These look like they need to be endianized to me. > ips_by_gid: ERR ips_by_gid -1 Bad file descriptor > open_hca: ERR ib_at_ips_by_gid for mthca0 > dapls_ib_open_hca failed 40000 > dapl_ia_open () returns 0x40000 > 26248: Error Adaptor open: DAT_INTERNAL_ERROR > DAT Registry: Stopped (dat_fini) > DAPL: Stopped (dapl_fini) > dapl_ib_release: > > > I am not running udev but manually create uat and ucm.
Here is the > list of /dev/infiniband: > > ls -l /dev/infiniband/ > total 0 > crw-rw-rw- 1 root root 231, 64 Sep 22 15:18 issm0 > crw-rw-rw- 1 root root 231, 65 Sep 22 15:18 issm1 > crw-rw-rw- 1 root root 231, 254 Sep 22 22:47 uat uat is at 231/191. > crw-rw-rw- 1 root root 231, 255 Sep 20 22:31 ucm I don't think you need this. > crw-rw-rw- 1 root root 231, 255 Sep 26 20:01 ucm0 ucm devices start at 231/224. -- Hal > crw-rw-rw- 1 root root 231, 0 Sep 22 15:18 umad0 > crw-rw-rw- 1 root root 231, 1 Sep 22 15:18 umad1 > crw-rw-rw- 1 root root 231, 192 Sep 20 22:30 uverbs0 > crw-rw-rw- 1 root root 231, 193 Sep 20 22:30 uverbs1 > > > And the loaded modules: > > kdapl_ib 82000 0 > kdapl 14888 1 kdapl_ib > ib_uverbs 52064 0 > ib_ipoib 65480 0 > ib_ucm 32624 0 > ib_cm 51944 2 kdapl_ib,ib_ucm > ib_uat 22168 0 > ib_at 34840 2 kdapl_ib,ib_uat > ib_sa 25328 2 ib_ipoib,ib_at > ib_mthca 160376 0 > ib_mad 61108 3 ib_cm,ib_sa,ib_mthca > ib_core 73888 8 > kdapl_ib,ib_uverbs,ib_ipoib,ib_ucm,ib_cm,ib_sa,ib_mthca,ib_mad > > > I am sure that I am missing something simple. Can someone point me in > the right direction. 
> > Thanks, > Todd From halr at voltaire.com Mon Sep 26 16:31:23 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 19:31:23 -0400 Subject: [openib-general] [PATCH] examples/cmpost.c: Update to new ibv_create_cq API Message-ID: <1127777483.4379.2.camel@hal.voltaire.com> examples/cmpost.c: Update to new ibv_create_cq API Signed-off-by: Hal Rosenstock Index: examples/cmpost.c =================================================================== --- examples/cmpost.c (revision 3552) +++ examples/cmpost.c (working copy) @@ -315,7 +315,7 @@ static int init_node(struct cmtest_node } cqe = message_count ? message_count * 2 : 2; - node->cq = ibv_create_cq(test.verbs, cqe, node); + node->cq = ibv_create_cq(test.verbs, cqe, node, NULL, 0); if (!node->cq) { printf("unable to create CQ\n"); goto error1; From pradeep at us.ibm.com Mon Sep 26 16:39:53 2005 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Mon, 26 Sep 2005 16:39:53 -0700 Subject: [openib-general] EEH: MMIO Failure on Power5 In-Reply-To: Message-ID: I tried to find out the "default superslots" for an OpenPower 720. Please try either slot 2 or 5. I delayed my response to make certain that these were indeed the superslots. I am still not 100% certain, but there is no point waiting beyond a certain stage. If you can, please go ahead and try these and let us see what happens. Also, can you provide the output of "lspci -v" before you load ib_mthca? The firmware I was referring to was the OpenPower firmware, not that of the HCA.
Pradeep pradeep at us.ibm.com Thaddeus Ternes, 09/23/2005 11:23 AM, To: Pradeep Satyanarayana/Beaverton/IBM at IBMUS, cc: openib-general at openib.org, Roland Dreier, Please respond to: Thaddeus Ternes, Subject: Re: [openib-general] EEH: MMIO Failure on Power5 I've tried a few things, but I still seem to get the same error. My testing has been on 2.6.13.1, with SVN IB code (as of Monday). The ib_mthca module reports my HCA FW version to be 3.2.0 (which is admittedly old). Updating this old firmware will likely be my next step. Originally, I had installed the card in slot 1. I've since poked around in a PDF file I found on IBM's site and concluded that I should have installed the card in slot 3, though I'm still not overly confident about that. I/O Adapter Large Capacity is also now enabled (it wasn't previously, and changing it while the card was in slot 1 didn't seem to affect anything). Is anybody aware of a clear way to identify which of the slots in the 720 are "superslots"? I've had no luck so far in my hunt through the documentation. Most likely, I've mistakenly skipped over it. Thanks. Thaddeus On 9/22/05, Pradeep Satyanarayana wrote: > > > I have filed a bug against the kernel (for p-series) as a starting point. > Could you please fill me in on some of the other specifics: (a) which kernel > you were using, and (b) the firmware level (presumably it is up to date). > > One other issue that I failed to mention previously: is the HCA in one of > the superslots (I know on my p570 slots 2 and 6 are superslots by default), > and is this superslot enabled? > > Here is a quote on how to enable superslots: > > One issue with the Mellanox cards in pSeries systems is to ensure that the > card is installed in a superslot, and that the "I/O Adapter Enlarged > Capacity" setting has been enabled for the system. For a p570, slots C6 and > C2 are the available super slots.
To enable the "Enlarged Capacity" feature, > go to ASM and select the following screens: > > System Configuration->I/O Adapter Enlarged Capacity > Set the setting to Enabled and save it. > > If this does not help, I have already filed the bug. Please let me know > either way. > > Pradeep > pradeep at us.ibm.com From mshefty at ichips.intel.com Mon Sep 26 16:47:57 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 26 Sep 2005 16:47:57 -0700 Subject: [openib-general] Re: [PATCH] examples/cmpost.c: Update to new ibv_create_cq API In-Reply-To: <1127777483.4379.2.camel@hal.voltaire.com> References: <1127777483.4379.2.camel@hal.voltaire.com> Message-ID: <433888AD.2070700@ichips.intel.com> Hal Rosenstock wrote: > examples/cmpost.c: Update to new ibv_create_cq API > > Signed-off-by: Hal Rosenstock Thanks - committed. - Sean From mshefty at ichips.intel.com Mon Sep 26 16:50:53 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Mon, 26 Sep 2005 16:50:53 -0700 Subject: [openib-general][PATCH][RFC]: CMA IB implementation In-Reply-To: <43381CFC.7070508@voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> <43381CFC.7070508@voltaire.com> Message-ID: <4338895D.9070508@ichips.intel.com> Guy German wrote: > I also added a small test, _for bring up purposes only_, which is not > for real use (has memory leaks and uses global vars etc), but it does > connect and passes private data from side to side.
> I would like to port the cmpost, for a real test module, but it is maybe > possible to fix this test too. FYI - I've ported a version of cmpost to the new API, and I'm in the process of testing it now. I did make one more change to the API: the event_handler now returns an int, to allow the user to destroy a cma_id. I will commit the changes after completing some basic testing. - Sean From viswa.krish at gmail.com Mon Sep 26 16:57:58 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Mon, 26 Sep 2005 16:57:58 -0700 Subject: [openib-general] opensm and faulty hardware Message-ID: <4df28be4050926165756759bb3@mail.gmail.com> I have an exerciser in the IB network. The exerciser seems to be faulty/buggy. When opensm starts, I do not see the "SUBNET UP" message. It says "Entering MASTER" and waits there. Any new node inserted in this state is not assigned a LID. Has anybody seen such behavior? -Viswa From mst at mellanox.co.il Mon Sep 26 20:05:42 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Sep 2005 06:05:42 +0300 Subject: [openib-general] Re: FW: SDP problems with 64K page size In-Reply-To: <52wtl3sfb0.fsf@cisco.com> References: <52wtl3sfb0.fsf@cisco.com> Message-ID: <20050927030542.GB25823@mellanox.co.il> Quoting r. Roland Dreier : > Subject: Re: FW: SDP problems with 64K page size > > Jerome> Just an open question: Do you think that we could get > Jerome> better performance if we would go with u32 instead of > Jerome> reducing the buffer to 16K? > > Not sure. One easy test you could try would be increasing 16384 to > 32768 in my patch. I would replace 16384 with 4096 instead, and see whether the performance decreases. Other approaches require changes in more places. > If that works and improves performance, then > further increases would probably be worthwhile also.
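To make the 64K problem concrete: recv_size and send_size are u16, which holds values up to 65535, so a 65536-byte buffer size stored in one wraps to 0, while 16384 and 32768 still fit. The following is a minimal C model, not SDP code; `store_u16()`, `fits_u16()` and `store_u16_checked()` are hypothetical helpers, and the only fact taken from this thread is that the counters are 16 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit size counter such as SDP's
 * recv_size/send_size; this is not the sdp_conn.h declaration. */
typedef uint16_t size_u16;

/* Plain assignment, as in the current code: the value is silently
 * reduced modulo 2^16, so 65536 becomes 0. */
static size_u16 store_u16(uint32_t bytes)
{
    return (size_u16)bytes;
}

/* Returns 1 if the buffer size is representable in 16 bits. */
static int fits_u16(uint32_t bytes)
{
    return bytes <= UINT16_MAX;
}

/* Defensive variant: trap the overflow (when NDEBUG is not defined)
 * instead of storing a wrong size. */
static size_u16 store_u16_checked(uint32_t bytes)
{
    assert(fits_u16(bytes) && "buffer size does not fit in a 16-bit counter");
    return (size_u16)bytes;
}
```

Widening the counters to u32 or s32, as suggested in the thread, removes the wrap, but any arithmetic already done on the 16-bit values would need to be re-checked, which appears to be the concern raised in the reply.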
My worry would be sign extension kicking in at this point, since we are doing math on these numbers. > You could also try changing recv_size and send_size from u16 to s32 in > the declaration in sdp_conn.h. Some other places need cleaning up for this to work. > BTW, I say all of this with only the vaguest understanding of the SDP > code base, so it might be complete nonsense. > > - R. > -- MST From mst at mellanox.co.il Mon Sep 26 20:06:24 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Sep 2005 06:06:24 +0300 Subject: [openib-general] Re: FW: SDP problems with 64K page size In-Reply-To: <017901c5c2de$67905e50$0211708d@gpv.az05.bull.com> References: <017901c5c2de$67905e50$0211708d@gpv.az05.bull.com> Message-ID: <20050927030624.GC25823@mellanox.co.il> Quoting r. Jerome Pioux : > Subject: Re: FW: SDP problems with 64K page size > > > The best way to fix this appears to be to bump the counters up to u32 or > > s32. > > Just an open question: Do you think that we could get better performance if > we would go with u32 instead of reducing the buffer to 16K? > > Jerome My guess would be yes, but this depends on the application, obviously. -- MST From mst at mellanox.co.il Mon Sep 26 20:16:59 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Sep 2005 06:16:59 +0300 Subject: [openib-general] Re: netdev reference counting problem with ib_at In-Reply-To: <1127764851.4400.729.camel@hal.voltaire.com> References: <1127764851.4400.729.camel@hal.voltaire.com> Message-ID: <20050927031659.GE25823@mellanox.co.il> Quoting Hal Rosenstock : > > What does SDP use this for? > > Same thing as AT right now. Except SDP drops netdevice and route reference counts after sending an arp :) -- MST From mst at mellanox.co.il Mon Sep 26 20:25:45 2005 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 27 Sep 2005 06:25:45 +0300 Subject: [openib-general] Re: [PATCH] add cq error events In-Reply-To: <52hdca88w3.fsf@cisco.com> References: <52hdca88w3.fsf@cisco.com> Message-ID: <20050927032545.GF25823@mellanox.co.il> Quoting Roland Dreier : > Michael> As a side note, the spec says: "Two types of CQ errors > Michael> can occur: the CQ can overrun or it can become > Michael> inaccessible": I wonder whether this should be > Michael> interpreted in the sense that there should be two > Michael> types of events: IB_EVENT_CQ_OVERRUN and > Michael> IB_EVENT_CQ_ACCESS, rather than just a generic > Michael> IB_EVENT_CQ_ERR > > Yes, this seems useful to me. The reason is that a CQ overrun > indicates a bug in the consumer, and a CQ access error indicates a bug > in the verbs implementation. So it's useful to be able to tell whose > fault a CQ error is. > > - R. > Okay... one problem I've run into while adding this is that IB_EVENT_CQ_ERR is the first item in the ib_event_type enum. And since uverbs seems to just copy the event over to userspace, changing all the enum values would break the ABI. Given that IB_EVENT_CQ_ERR wasn't actually produced by any hardware provider yet, I'm thinking about working around this by simply giving specific values to enum items, like this: enum ib_event_type { IB_EVENT_QP_FATAL = 1, IB_EVENT_QP_REQ_ERR = 2, IB_EVENT_QP_ACCESS_ERR = 3, IB_EVENT_COMM_EST = 4, IB_EVENT_SQ_DRAINED = 5, IB_EVENT_PATH_MIG = 6, IB_EVENT_PATH_MIG_ERR = 7, IB_EVENT_DEVICE_FATAL = 8, IB_EVENT_PORT_ACTIVE = 9, IB_EVENT_PORT_ERR = 10, IB_EVENT_LID_CHANGE = 11, IB_EVENT_PKEY_CHANGE = 12, IB_EVENT_SM_CHANGE = 13, IB_EVENT_SRQ_ERR = 14, IB_EVENT_SRQ_LIMIT_REACHED = 15, IB_EVENT_QP_LAST_WQE_REACHED = 16, IB_EVENT_CQ_OVERRUN = 17, IB_EVENT_CQ_ACCESS = 18 }; Is that acceptable? An alternative would be to add a switch statement to uverbs or to libibverbs.
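A compilable sketch of the explicit-value proposal above, assuming (as the message implies) that the existing enum begins with IB_EVENT_CQ_ERR = 0 and lists the remaining events in this order. The static_assert guards and the `cq_error_fault()` helper are illustrative additions, not kernel code; the helper encodes the reasoning quoted above that an overrun points at the consumer and an access error at the verbs implementation.

```c
#include <assert.h>

/* Existing events keep the numbers they already have on the wire;
 * the two new CQ events are appended at the end. */
enum ib_event_type {
    IB_EVENT_QP_FATAL            = 1,
    IB_EVENT_QP_REQ_ERR          = 2,
    IB_EVENT_QP_ACCESS_ERR       = 3,
    IB_EVENT_COMM_EST            = 4,
    IB_EVENT_SQ_DRAINED          = 5,
    IB_EVENT_PATH_MIG            = 6,
    IB_EVENT_PATH_MIG_ERR        = 7,
    IB_EVENT_DEVICE_FATAL        = 8,
    IB_EVENT_PORT_ACTIVE         = 9,
    IB_EVENT_PORT_ERR            = 10,
    IB_EVENT_LID_CHANGE          = 11,
    IB_EVENT_PKEY_CHANGE         = 12,
    IB_EVENT_SM_CHANGE           = 13,
    IB_EVENT_SRQ_ERR             = 14,
    IB_EVENT_SRQ_LIMIT_REACHED   = 15,
    IB_EVENT_QP_LAST_WQE_REACHED = 16,
    IB_EVENT_CQ_OVERRUN          = 17,
    IB_EVENT_CQ_ACCESS           = 18
};

/* Compile-time guards (C11): reordering the enum breaks the build
 * instead of silently changing the values copied out to userspace. */
static_assert(IB_EVENT_QP_FATAL == 1, "ABI value of IB_EVENT_QP_FATAL changed");
static_assert(IB_EVENT_QP_LAST_WQE_REACHED == 16,
              "ABI value of IB_EVENT_QP_LAST_WQE_REACHED changed");

/* Hypothetical classification: whose bug does a CQ error indicate? */
enum cq_fault { CQ_FAULT_NONE, CQ_FAULT_CONSUMER, CQ_FAULT_PROVIDER };

static enum cq_fault cq_error_fault(enum ib_event_type ev)
{
    switch (ev) {
    case IB_EVENT_CQ_OVERRUN:
        return CQ_FAULT_CONSUMER;   /* consumer let the CQ overflow */
    case IB_EVENT_CQ_ACCESS:
        return CQ_FAULT_PROVIDER;   /* verbs implementation bug */
    default:
        return CQ_FAULT_NONE;       /* not a CQ error event */
    }
}
```

The switch-statement alternative mentioned in the message would instead leave the kernel enum free to change and translate each value to a fixed ABI number at the uverbs or libibverbs boundary.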
-- MST From halr at voltaire.com Mon Sep 26 20:34:46 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 23:34:46 -0400 Subject: [openib-general] opensm and faulty hardware In-Reply-To: <4df28be4050926165756759bb3@mail.gmail.com> References: <4df28be4050926165756759bb3@mail.gmail.com> Message-ID: <1127792086.4379.290.camel@hal.voltaire.com> On Mon, 2005-09-26 at 19:57, Viswanath Krishnamurthy wrote: > I have an exerciser in the IB network. The exerciser seems to be > faulty/buggy. When opensm starts I do not > see 'SUBNET UP" message. It says "Entering MASTER" and waits there. > Any new node inserted in this state is not assigned any LID. Anybody > seen such behavior ? Any idea on how the IB exerciser misbehaves on the network ? Do you have an analyzer too ? What does the OSM log show ? -- Hal From halr at voltaire.com Mon Sep 26 20:46:08 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 26 Sep 2005 23:46:08 -0400 Subject: [openib-general] Re: netdev reference counting problem with ib_at In-Reply-To: <20050927031659.GE25823@mellanox.co.il> References: <1127764851.4400.729.camel@hal.voltaire.com> <20050927031659.GE25823@mellanox.co.il> Message-ID: <1127792767.4379.329.camel@hal.voltaire.com> On Mon, 2005-09-26 at 23:16, Michael S. Tsirkin wrote: > Quoting Hal Rosenstock : > > > What does SDP use this for? > > > > Same thing as AT right now. > > Except SDP drops netdevice and route reference counts after sending an arp :) So SDP wouldn't take advantage of a pointer to access IPoIB private data if this were available ? From mst at mellanox.co.il Mon Sep 26 21:08:22 2005 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Tue, 27 Sep 2005 07:08:22 +0300 Subject: [openib-general] Re: netdev reference counting problem with ib_at In-Reply-To: <1127792767.4379.329.camel@hal.voltaire.com> References: <1127792767.4379.329.camel@hal.voltaire.com> Message-ID: <20050927040822.GL25823@mellanox.co.il> Quoting Hal Rosenstock : > > > > What does SDP use this for? > > > > > > Same thing as AT right now. > > > > Except SDP drops netdevice and route reference counts after sending an > > arp :) > > So SDP wouldn't take advantage of a pointer to access IPoIB private data > if this were available ? I think it would; that would be cleaner than what I do now. However, I don't think SDP needs this pointer after the point where it sends an ARP request. Why does AT need to keep the netdev reference for longer? -- MST From rjwalsh at pathscale.com Tue Sep 27 00:09:40 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 27 Sep 2005 00:09:40 -0700 Subject: [openib-general] svn:ignore property stuff Message-ID: <1127804980.24349.3.camel@phosphene.durables.org> Hi all, I've added an svn:ignore property to a bunch of directories in the subversion repository. This causes subversion to ignore certain files (like .o files and other generated files) in a particular directory. This is useful if you've modified and built a bunch of stuff and only want 'svn status' to print out information on your modifications or additions, and not show the generated files with a '?' status field. Is it OK if I go ahead and check these properties in? Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043.
From eitan at mellanox.co.il Tue Sep 27 00:16:59 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 27 Sep 2005 10:16:59 +0300 Subject: [openib-general] opensm and faulty hardware In-Reply-To: <1127792086.4379.290.camel@hal.voltaire.com> References: <1127792086.4379.290.camel@hal.voltaire.com> Message-ID: <4338F1EB.3070909@mellanox.co.il> Hi Viswa, Please send a full /var/log/osm.log file of opensm -V . You can send us a copy off the list if it is too big: yael and eitan in @mellanox.co.il EZ Hal Rosenstock wrote: > On Mon, 2005-09-26 at 19:57, Viswanath Krishnamurthy wrote: > >>I have an exerciser in the IB network. The exerciser seems to be >>faulty/buggy. When opensm starts I do not >>see "SUBNET UP" message. It says "Entering MASTER" and waits there. >>Any new node inserted in this state is not assigned any LID. Anybody >>seen such behavior ? > > > Any idea on how the IB exerciser misbehaves on the network ? Do you have > an analyzer too ? > > What does the OSM log show ? > > -- Hal From gabhijit at pantasys.com Tue Sep 27 01:11:55 2005 From: gabhijit at pantasys.com (Abhijit Gadgil) Date: Tue, 27 Sep 2005 13:41:55 +0530 Subject: [openib-general] IPoIB question Message-ID: <1127808715.6627.15.camel@psmith.ind.pantasys.com> Hi All, I am new to IPoIB. I have a query: as per the IPoIB Architecture document, whenever an IPoIB interface is brought up, it needs to do a Full Member Join to the "broadcast" Multicast group. Where exactly in the code is this taking place? I have been able to trace a little bit, e.g. in ipoib_add_port() there is a call to ipoib_intf_alloc() which in turn creates a Work Queue for the ipoib_mcast_restart_task().
In this task, subsequently there is _ipoib_mcast_join() and so on where it finally reaches ipoib_mcast_attach() (in ipoib_verbs.c). What is not clear at this point is, Why is it looking for cached PKey? Is it not something that needs to be sent by the SM? Further, I am putting SM in testability 'debug' mode (DEBUG=10 in /etc/opensm.conf), however I am still not seeing any dump of messages about FullMember join whenever I try restarting the IB interfaces. What should be log-level to put SM to dump those messages? Thanks in advance. Regards. -abhijit From eitan at mellanox.co.il Tue Sep 27 01:15:51 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Tue, 27 Sep 2005 11:15:51 +0300 Subject: [openib-general] IPoIB question Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3069265@mtlexch01.mtl.com> > Further, I am putting SM in testability 'debug' mode (DEBUG=10 in > /etc/opensm.conf), however I am still not seeing any dump of messages > about FullMember join whenever I try restarting the IB interfaces. What > should be log-level to put SM to dump those messages? [EZ] Seems you are using a Gen1 Mellanox distribution (IBGD). You can use OpenIB OpenSM too. If you still want to use IBGD OpenSM you need to use DEBUG="-V" instead of DEBUG=10. If you want to use OpenIB OpenSM you will need to run: opensm -V EZ From halr at voltaire.com Tue Sep 27 02:21:08 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 05:21:08 -0400 Subject: [openib-general] svn:ignore property stuff In-Reply-To: <1127804980.24349.3.camel@phosphene.durables.org> References: <1127804980.24349.3.camel@phosphene.durables.org> Message-ID: <1127811848.4379.1593.camel@hal.voltaire.com> On Tue, 2005-09-27 at 03:09, Robert Walsh wrote: > Hi all, > > I've added an svn:ignore property to a bunch of directories in the > subversion repository.
This causes subversion to ignore certain files > (like .o files and other generated files) in a particular directory. > This is useful if you've modified and built a bunch of stuff and only > want 'svn status' to print out information on your modifications or > additions, and not show the generated files with a '?' status field. > > Is it OK if I go ahead and check these properties in? This seems like the right thing to do to me. -- Hal From halr at voltaire.com Tue Sep 27 02:24:25 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 05:24:25 -0400 Subject: [openib-general] IPoIB question In-Reply-To: <1127808715.6627.15.camel@psmith.ind.pantasys.com> References: <1127808715.6627.15.camel@psmith.ind.pantasys.com> Message-ID: <1127811868.4379.1602.camel@hal.voltaire.com> On Tue, 2005-09-27 at 04:11, Abhijit Gadgil wrote: > Hi All, > > I am new to IPoIB. I have a query, as per the IPoIB Architecture > document, whenever an IPoIB interface is brought up, it needs to do a > Full Member Join to the "broadcast" Multicast group. Where exactly in > the code, is this taking place? I have been able to trace a little bit - > eg. in ipoib_add_port() there is a call to ipoib_intf_alloc() which in > turn creates a Work Queue for the ipoib_mcast_restart_task(). In this > task, subsequently there is _ipoib_mcast_join() and so on where it > finally reaches ipoib_mcast_attach() (in ipoib_verbs.c). What is not > clear at this point is, Why is it looking for cached PKey? It needs the PKey as this is part of the MGID to join (for the broadcast group and other multicast groups). > Is it not > something that needs to be sent by the SM? Yes. > Further, I am putting SM in testability 'debug' mode (DEBUG=10 in > /etc/opensm.conf), however I am still not seeing any dump of messages > about FullMember join whenever I try restarting the IB interfaces. What > should be log-level to put SM to dump those messages? Already answered by Eitan.
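To make the answer about the PKey concrete: the P_Key is embedded in the multicast GID itself, so the broadcast group's MGID cannot be formed without it. A minimal sketch, assuming the IPv4 broadcast-group layout given in the IPoIB specification (RFC 4391), ff1<scope>:401b:<P_Key>:0000:0000:0000:ffff:ffff; the helpers `ipoib_bcast_mgid()` and `format_gid()` are hypothetical illustrations, not code from the ipoib driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the IPv4 broadcast MGID for a partition.
 * Layout assumed from RFC 4391: ff1<scope>:401b:<P_Key>:0:0:0:ffff:ffff
 */
static void ipoib_bcast_mgid(uint16_t pkey, uint8_t scope, uint8_t mgid[16])
{
    assert(scope <= 0x0f);                 /* scope is a 4-bit field */

    memset(mgid, 0, 16);
    mgid[0] = 0xff;                        /* multicast prefix */
    mgid[1] = (uint8_t)(0x10 | scope);     /* flags nibble 1, then scope */
    mgid[2] = 0x40;                        /* IPv4 signature 0x401b */
    mgid[3] = 0x1b;
    mgid[4] = (uint8_t)(pkey >> 8);        /* P_Key, network byte order */
    mgid[5] = (uint8_t)(pkey & 0xff);
    /* bytes 6..11 stay zero */
    mgid[12] = mgid[13] = mgid[14] = mgid[15] = 0xff;  /* broadcast */
}

/* Hypothetical helper: render a GID as eight colon-separated 16-bit groups. */
static void format_gid(const uint8_t gid[16], char out[40])
{
    for (int i = 0; i < 8; i++)
        snprintf(out + i * 5, (size_t)(40 - i * 5), "%02x%02x%s",
                 gid[2 * i], gid[2 * i + 1], i < 7 ? ":" : "");
}
```

With the default P_Key 0xffff and link-local scope 2, `ipoib_bcast_mgid(0xffff, 0x2, mgid)` followed by `format_gid()` yields ff12:401b:ffff:0000:0000:0000:ffff:ffff; a different partition key changes only bytes 4 and 5, which is why the driver has to look up the port's P_Key before joining.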
-- Hal From halr at voltaire.com Tue Sep 27 03:33:24 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 06:33:24 -0400 Subject: [openib-general] Re: 2.6.14 heads up: ip_dev_find() not exported In-Reply-To: <20050926144125.GN12818@mellanox.co.il> References: <52slvr7w1l.fsf@cisco.com> <20050926144125.GN12818@mellanox.co.il> Message-ID: <1127817204.4431.76.camel@hal.voltaire.com> On Mon, 2005-09-26 at 10:41, Michael S. Tsirkin wrote: > If ip_route_output_key resolves to a loopback device, > sdp uses ip_dev_find to try and locate the actual hardware device > that the source ip address is for. Right, and if that device (returned by ip_dev_find) is the loopback device, it just finds the first IPoIB device which is available. That's right for when actual IP addresses are used, but what about a local IPoIB address? Shouldn't the actual IPoIB device be used for that? Or doesn't this matter? I also think there may be another case broken here which might be more serious: that's if an IP address of a local ethernet address is used (other IP interface types too). I think that the hardware type of the interface selected by the route lookup needs to be verified in the non-loopback case and, if not IPoIB, some IPoIB interface (see above) needs to be selected. If you agree, I will cook up a patch for this. > Do you know of a better way to do this? Not sure it's better, but one way might be to walk the netdevices and match on the IP address of the interface (but I don't see any easy way to do that). I think we might want to tell them to add the export of this back in. Was it only removed because no one outside that module used this, or because the use of it outside the module is bad? > I think we could get by with just dev_get_by_index, I'll have to investigate this. How would that work? Don't we want to move away from using this routine unless we already have the index in hand?
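The device-selection logic discussed here can be modelled in userspace. This is a minimal sketch, not the SDP or AT code: `struct fake_dev` and `select_ib_dev()` are hypothetical, a plain array stands in for the kernel's device list and its `dev_get_by_index()` walk, and the `ARPHRD_*`/`IFF_*` constants come from the glibc headers `<net/if_arp.h>` and `<net/if.h>`. It validates the hardware type of the device finally chosen, and follows the "error out on a non-IPoIB, non-loopback device" approach rather than substituting an IPoIB interface.

```c
#define _DEFAULT_SOURCE          /* expose the IFF_* flags in glibc's <net/if.h> */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <net/if.h>              /* IFF_LOOPBACK */
#include <net/if_arp.h>          /* ARPHRD_INFINIBAND, ARPHRD_ETHER, ARPHRD_LOOPBACK */

/* Hypothetical stand-in for struct net_device: only the fields used here. */
struct fake_dev {
    const char *name;
    unsigned short type;         /* ARPHRD_* hardware type */
    unsigned int flags;          /* IFF_* flags */
};

/*
 * Model of the route-device selection: if the route resolved to loopback,
 * fall back to the first IPoIB device. Then validate the device actually
 * chosen and reject anything that is neither IPoIB nor loopback.
 */
static int select_ib_dev(const struct fake_dev *devs, size_t ndevs,
                         size_t route_idx, size_t *out_idx)
{
    size_t idx = route_idx;

    assert(devs != NULL && route_idx < ndevs && out_idx != NULL);

    if (devs[idx].flags & IFF_LOOPBACK) {
        for (size_t i = 0; i < ndevs; i++) {
            if (devs[i].type == ARPHRD_INFINIBAND) {
                idx = i;
                break;
            }
        }
    }

    /* Check the substituted device, not the one the route returned. */
    if (devs[idx].type != ARPHRD_INFINIBAND &&
        !(devs[idx].flags & IFF_LOOPBACK))
        return -ENETUNREACH;

    *out_idx = idx;
    return 0;
}
```

With devices lo, eth0 and ib0, a route through lo resolves to ib0, a route through eth0 fails with -ENETUNREACH, and a route through ib0 is used as-is. Placing the type check after the loopback fallback, rather than before it, is what closes the ethernet-address case.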
-- Hal From halr at voltaire.com Tue Sep 27 03:44:29 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 06:44:29 -0400 Subject: [openib-general] Re: netdev reference counting problem with ib_at In-Reply-To: <20050927040822.GL25823@mellanox.co.il> References: <1127792767.4379.329.camel@hal.voltaire.com> <20050927040822.GL25823@mellanox.co.il> Message-ID: <1127817869.4431.94.camel@hal.voltaire.com> On Tue, 2005-09-27 at 00:08, Michael S. Tsirkin wrote: > I think it would, that would be cleaner than what I do now. > However, I dont think SDP needs this pointer after the point where it > sends an arp request. Why does AT need to keep netdev reference for > longer? For AT, the routine this code appears in is just the outgoing interface selection (via DGID) routine. The actual request occurs later (there is more than ARP possible here) and also there is an automatic retry mechanism as well. Clearly (as has been mentioned numerous times on the list), there are issues here which I am looking into. 
-- Hal From halr at voltaire.com Tue Sep 27 04:22:19 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 07:22:19 -0400 Subject: [openib-general] [PATCH] SDP: In do_link_path_lookup, make sure device is IPoIB hardware type Message-ID: <1127820139.4400.7.camel@hal.voltaire.com> In do_link_path_lookup, make sure device selected is IPoIB hardware type before accessing its local private data Signed-off-by: Hal Rosenstock Index: sdp_link.c =================================================================== --- sdp_link.c (revision 3552) +++ sdp_link.c (working copy) @@ -433,7 +433,8 @@ static void do_link_path_lookup(struct s info->gw = rt->rt_gateway; info->src = rt->rt_src; /* true source IP address */ - if (dev->flags & IFF_LOOPBACK) { + if (dev->flags & IFF_LOOPBACK || + dev->type != ARPHRD_INFINIBAND) { dev_put(dev); while ((dev = dev_get_by_index(++counter))) { if (dev->type == ARPHRD_INFINIBAND && From mst at mellanox.co.il Tue Sep 27 04:35:08 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Sep 2005 14:35:08 +0300 Subject: [openib-general] Re: [PATCH] SDP: In do_link_path_lookup, make sure device is IPoIB hardware type In-Reply-To: <1127820139.4400.7.camel@hal.voltaire.com> References: <1127820139.4400.7.camel@hal.voltaire.com> Message-ID: <20050927113508.GI31820@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: [PATCH] SDP: In do_link_path_lookup, make sure device is IPoIB hardware type > > In do_link_path_lookup, make sure device selected is IPoIB hardware > type before accessing its local private data Hal, does this fix some actual problem you are seeing? sdp_link.c has if (rt->u.dst.neighbour->dev->type != ARPHRD_INFINIBAND && !(rt->u.dst.neighbour->dev->flags & IFF_LOOPBACK)) { result = -ENETUNREACH; goto error; } and this seems to check the device type. 
-- MST From halr at voltaire.com Tue Sep 27 04:40:46 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 07:40:46 -0400 Subject: [openib-general] Re: [PATCH] SDP: In do_link_path_lookup, make sure device is IPoIB hardware type In-Reply-To: <20050927113508.GI31820@mellanox.co.il> References: <1127820139.4400.7.camel@hal.voltaire.com> <20050927113508.GI31820@mellanox.co.il> Message-ID: <1127820912.4400.13.camel@hal.voltaire.com> On Tue, 2005-09-27 at 07:35, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: [PATCH] SDP: In do_link_path_lookup, make sure device is IPoIB hardware type > > > > In do_link_path_lookup, make sure device selected is IPoIB hardware > > type before accessing its local private data > > > Hal, does this fix some actual problem you are seeing? Yes, I saw it with AT. It has the same exact code in terms of this. Try an ethernet IP address as a destination and see what happens. > sdp_link.c has > > if (rt->u.dst.neighbour->dev->type != ARPHRD_INFINIBAND && > !(rt->u.dst.neighbour->dev->flags & IFF_LOOPBACK)) { > result = -ENETUNREACH; > goto error; > } > > and this seems to check the device type. but that's before the device might be changed. -- Hal From mst at mellanox.co.il Tue Sep 27 04:59:42 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Sep 2005 14:59:42 +0300 Subject: [openib-general] Re: [PATCH] SDP: In do_link_path_lookup, make sure device is IPoIB hardware type In-Reply-To: <1127820912.4400.13.camel@hal.voltaire.com> References: <1127820912.4400.13.camel@hal.voltaire.com> Message-ID: <20050927115942.GJ31820@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: [PATCH] SDP: In do_link_path_lookup, make sure device is IPoIB hardware type > > On Tue, 2005-09-27 at 07:35, Michael S. Tsirkin wrote: > > Quoting r. 
Hal Rosenstock : > > > Subject: [PATCH] SDP: In do_link_path_lookup, make sure device is > IPoIB hardware type > > > > > > In do_link_path_lookup, make sure device selected is IPoIB hardware > > > type before accessing its local private data > > > > > > Hal, does this fix some actual problem you are seeing? > > Yes, I saw it with AT. It has the same exact code in terms of this. > Try an ethernet IP address as a destination and see what happens. > > > sdp_link.c has > > > > if (rt->u.dst.neighbour->dev->type != ARPHRD_INFINIBAND && > > !(rt->u.dst.neighbour->dev->flags & IFF_LOOPBACK)) { > > result = -ENETUNREACH; > > goto error; > > } > > > > and this seems to check the device type. > > but that's before the device might be changed. > > -- Hal > Hmm, I see the problem. So the fix is just to move it later: I think the right thing to do is to error out on non-ipoib non-loopback device. Something like the following (untested). Index: drivers/infiniband/ulp/sdp/sdp_link.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_link.c (revision 3535) +++ drivers/infiniband/ulp/sdp/sdp_link.c (working copy) @@ -394,9 +394,7 @@ static void do_link_path_lookup(struct s result = -ENETUNREACH; goto error; } - /* - * check that device is IPoIB - */ + if (!rt->u.dst.neighbour || !rt->u.dst.neighbour->dev) { sdp_dbg_warn(NULL, "No neighbour found for <%08x:%08x>", rt->rt_src, rt->rt_dst); @@ -404,15 +402,6 @@ static void do_link_path_lookup(struct s result = -ENETUNREACH; goto error; } - /* - * check for IB device or loopback, the later requires extra - * handling. - */ - if (rt->u.dst.neighbour->dev->type != ARPHRD_INFINIBAND && - !(rt->u.dst.neighbour->dev->flags & IFF_LOOPBACK)) { - result = -ENETUNREACH; - goto error; - } sdp_dbg_data(NULL, "Found dev <%s>. 
<%08x:%08x:%08x> state <%02x>", rt->u.dst.neighbour->dev->name, @@ -430,6 +419,15 @@ static void do_link_path_lookup(struct s dev_hold(dev); } + /* + * check for IB device or loopback, the later requires extra + * handling. + */ + if (dev->type != ARPHRD_INFINIBAND && !(dev->flags & IFF_LOOPBACK)) { + result = -ENETUNREACH; + goto error; + } + info->gw = rt->rt_gateway; info->src = rt->rt_src; /* true source IP address */ -- MST From yael at mellanox.co.il Tue Sep 27 05:06:38 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 27 Sep 2005 15:06:38 +0300 Subject: [openib-general] [PATCH] Opensm - fix PathRecord dump Message-ID: <5zy85iww8h.fsf@mtl066.yok.mtl.com> Hi Hal, There is a multiple line in the osm_helper.c in osm_dump_path_record. Attached is a patch resolving this. Thanks, Yael Signed-off-by: Yael Kalka Index: osm/opensm/osm_helper.c =================================================================== --- osm/opensm/osm_helper.c (revision 3532) +++ osm/opensm/osm_helper.c (working copy) @@ -897,7 +897,6 @@ osm_dump_path_record( p_pr->num_path, cl_ntoh16( p_pr->pkey ), cl_ntoh16( p_pr->sl ), - cl_ntoh16( p_pr->sl ), p_pr->mtu, p_pr->rate, p_pr->pkt_life, From halr at voltaire.com Tue Sep 27 05:11:45 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 08:11:45 -0400 Subject: [openib-general] Re: [PATCH] SDP: In do_link_path_lookup, make sure device is IPoIB hardware type In-Reply-To: <20050927115942.GJ31820@mellanox.co.il> References: <1127820912.4400.13.camel@hal.voltaire.com> <20050927115942.GJ31820@mellanox.co.il> Message-ID: <1127823104.4378.14.camel@hal.voltaire.com> On Tue, 2005-09-27 at 07:59, Michael S. Tsirkin wrote: > Quoting r. Hal Rosenstock : > > Subject: Re: [PATCH] SDP: In do_link_path_lookup, make sure device is IPoIB hardware type > > > > On Tue, 2005-09-27 at 07:35, Michael S. Tsirkin wrote: > > > Quoting r. 
Hal Rosenstock : > > > > Subject: [PATCH] SDP: In do_link_path_lookup, make sure device is > > IPoIB hardware type > > > > > > > > In do_link_path_lookup, make sure device selected is IPoIB hardware > > > > type before accessing its local private data > > > > > > > > > Hal, does this fix some actual problem you are seeing? > > > > Yes, I saw it with AT. It has the same exact code in terms of this. > > Try an ethernet IP address as a destination and see what happens. > > > > > sdp_link.c has > > > > > > if (rt->u.dst.neighbour->dev->type != ARPHRD_INFINIBAND && > > > !(rt->u.dst.neighbour->dev->flags & IFF_LOOPBACK)) { > > > result = -ENETUNREACH; > > > goto error; > > > } > > > > > > and this seems to check the device type. > > > > but that's before the device might be changed. > > > > -- Hal > > > > Hmm, I see the problem. So the fix is just to move it later: > I think the right thing to do is to error out on non-ipoib non-loopback device. > Something like the following (untested). Yes, that looks right except for the comment (no more extra handling) > + /* > + * check for IB device or loopback, the later requires extra > + * handling. > + */ > + if (dev->type != ARPHRD_INFINIBAND && !(dev->flags & IFF_LOOPBACK)) { > + result = -ENETUNREACH; > + goto error; > + } -- Hal From halr at voltaire.com Tue Sep 27 05:25:46 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 08:25:46 -0400 Subject: [openib-general] Re: [PATCH] Opensm - fix PathRecord dump In-Reply-To: <5zy85iww8h.fsf@mtl066.yok.mtl.com> References: <5zy85iww8h.fsf@mtl066.yok.mtl.com> Message-ID: <1127823946.4378.23.camel@hal.voltaire.com> On Tue, 2005-09-27 at 08:06, Yael Kalka wrote: > There is a multiple line in the osm_helper.c in osm_dump_path_record. > Attached is a patch resolving this. Thanks. Applied. 
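Here is why one duplicated argument garbles the dump. printf-style formatting pairs conversions with arguments strictly by position, so a repeated argument shifts every later field by one; the C standard says surplus trailing arguments are evaluated but otherwise ignored, so nothing fails loudly. A minimal sketch with a hypothetical format string and struct (`struct pr_fields`, `dump_fixed()`, `dump_buggy()`), since the real osm_dump_path_record() format string is not shown in the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>              /* strcmp(), for checking the output */

/* Hypothetical, trimmed-down path-record fields (not the opensm type). */
struct pr_fields {
    unsigned int pkey, sl, mtu, rate;
};

/* Format string shared by both variants. */
static const char pr_fmt[] = "pkey 0x%04x sl 0x%04x mtu 0x%02x rate 0x%02x";

/* One argument per conversion, as after the fix. */
static int dump_fixed(char *buf, size_t len, const struct pr_fields *p)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, pr_fmt, p->pkey, p->sl, p->mtu, p->rate);
}

/* sl passed twice, as before the fix: each later conversion consumes the
 * argument meant for the previous field; the surplus last one is ignored. */
static int dump_buggy(char *buf, size_t len, const struct pr_fields *p)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, pr_fmt, p->pkey, p->sl, p->sl, p->mtu, p->rate);
}
```

For example, with pkey 0xffff, sl 1, mtu 4 and rate 3, the fixed variant prints `pkey 0xffff sl 0x0001 mtu 0x04 rate 0x03` while the buggy one prints `pkey 0xffff sl 0x0001 mtu 0x01 rate 0x04`. GCC's -Wformat may flag the extra argument in dump_buggy(), which is the same kind of warning that can catch such a line in a real tree.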
-- Hal From jackm at mellanox.co.il Tue Sep 27 06:09:10 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Tue, 27 Sep 2005 16:09:10 +0300 Subject: [openib-general] [PATCH] fix hw_ver value in ib/v_query_device Message-ID: <20050927130910.GA20836@mellanox.co.il> The patch below fixes the incorrect value returned for the hw_ver field in ib/v_query_device. Jack Signed-off-by: Jack Morgenstein Index: linux-kernel/infiniband/hw/mthca/mthca_provider.c =================================================================== --- linux-kernel/infiniband/hw/mthca/mthca_provider.c (revision 3560) +++ linux-kernel/infiniband/hw/mthca/mthca_provider.c (working copy) @@ -84,7 +84,7 @@ props->vendor_id = be32_to_cpup((__be32 *) (out_mad->data + 36)) & 0xffffff; props->vendor_part_id = be16_to_cpup((__be16 *) (out_mad->data + 30)); - props->hw_ver = be16_to_cpup((__be16 *) (out_mad->data + 32)); + props->hw_ver = be32_to_cpup((__be32 *) (out_mad->data + 32)); memcpy(&props->sys_image_guid, out_mad->data + 4, 8); memcpy(&props->node_guid, out_mad->data + 12, 8); From jlentini at netapp.com Tue Sep 27 06:33:21 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 27 Sep 2005 09:33:21 -0400 (EDT) Subject: [openib-general] svn:ignore property stuff In-Reply-To: <1127804980.24349.3.camel@phosphene.durables.org> References: <1127804980.24349.3.camel@phosphene.durables.org> Message-ID: On Tue, 27 Sep 2005, Robert Walsh wrote: > Hi all, > > I've added an svn:ignore property to a bunch of directories in the > subversion repository. This causes subversion to ignore certain files > (like .o files and other generated files) in a particular directory. > This is useful if you've modified and built a bunch of stuff and only > want 'svn status' to print out information on your modifications or > additions, and not show the generated files with a '?' status field. > > Is it OK if I go ahead and check these properties in? 
It sounds ok, but svn diff will produce essentially the same results (svn diff --diff-cmd diff -x --brief). To see the files you've added you need to do an svn add (don't worry they won't be committed until you do an svn commit). james From jlentini at netapp.com Tue Sep 27 06:51:02 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 27 Sep 2005 09:51:02 -0400 (EDT) Subject: [openib-general] uDAPL problem In-Reply-To: <1127774675.7959.852.camel@hal.voltaire.com> References: <1127774675.7959.852.camel@hal.voltaire.com> Message-ID: On Mon, 26 Sep 2005, Hal Rosenstock wrote: > On Mon, 2005-09-26 at 18:05, Todd Bowman wrote: > > I am having a problem with uDAPL accessing > > /dev/infiniband/{uat,ucm0}. I am running 3549, 2.6.12 kernel with > > backport. Here is a snippet of the uDAPL debug messages running > > dtest. The dat.conf file seems to be correct, the correclty named > > providers are being loaded. > > > > 26248 Running as server > > DAT Registry: dat_ia_openv (OpenIB-ib0,1:2,0) called > > DAT Registry: IA OpenIB-ib0, trying to load library > > /usr/local/lib/libdapl.so > > libuat: Error <-1:6> couldn't open IB at device > > libibcm: error <-1:6> opening device This means that the /dev entries are not set up correctly. > > DAPL: NOT Setting Loopback > > dapl_ib_init: > > DAT Registry: dat_registry_add_provider (OpenIB-ib0,1:2,0) > > dapl_ia_open (OpenIB-ib0, 8, 0x10019d40, 0x10019cc0) > > open_hca: mthca0 - 0x1001fdb0 > > open_hca: Found dev mthca0 f422000002c90200 > > open_hca: GID subnet 00000000000080fe id f522000002c90200 > > These look like they need to be endianized to me. This looks like a bug in the way we print these values out, but I don't think it is the real problem. What architecture are you using?
> > ips_by_gid: ERR ips_by_gid -1 Bad file descriptor > > open_hca: ERR ib_at_ips_by_gid for mthca0 > > dapls_ib_open_hca failed 40000 > > dapl_ia_open () returns 0x40000 > > 26248: Error Adaptor open: DAT_INTERNAL_ERROR > > DAT Registry: Stopped (dat_fini) > > DAPL: Stopped (dapl_fini) > > dapl_ib_release: > > > > > > I am not running udev but manually create uat and ucm. Here is the > > list of /dev/infiniband: > > > > ls -l /dev/infiniband/ > > total 0 > > crw-rw-rw- 1 root root 231, 64 Sep 22 15:18 issm0 > > crw-rw-rw- 1 root root 231, 65 Sep 22 15:18 issm1 > > crw-rw-rw- 1 root root 231, 254 Sep 22 22:47 uat > > uat is at 231/191. > > > crw-rw-rw- 1 root root 231, 255 Sep 20 22:31 ucm > > I don't think you need this. > > > crw-rw-rw- 1 root root 231, 255 Sep 26 20:01 ucm0 > > ucm devices start at 231/224. If these changes do not fix your problem, please let us know. > -- Hal > > > crw-rw-rw- 1 root root 231, 0 Sep 22 15:18 umad0 > > crw-rw-rw- 1 root root 231, 1 Sep 22 15:18 umad1 > > crw-rw-rw- 1 root root 231, 192 Sep 20 22:30 uverbs0 > > crw-rw-rw- 1 root root 231, 193 Sep 20 22:30 uverbs1 > > > > > > And the loaded modules: > > > > kdapl_ib 82000 0 > > kdapl 14888 1 kdapl_ib > > ib_uverbs 52064 0 > > ib_ipoib 65480 0 > > ib_ucm 32624 0 > > ib_cm 51944 2 kdapl_ib,ib_ucm > > ib_uat 22168 0 > > ib_at 34840 2 kdapl_ib,ib_uat > > ib_sa 25328 2 ib_ipoib,ib_at > > ib_mthca 160376 0 > > ib_mad 61108 3 ib_cm,ib_sa,ib_mthca > > ib_core 73888 8 > > kdapl_ib,ib_uverbs,ib_ipoib,ib_ucm,ib_cm,ib_sa,ib_mthca,ib_mad > > > > > > I am sure that I am missing something simple. Can someone point me in > > the right direction.
> > > > Thanks, > > Todd From brbarret at open-mpi.org Tue Sep 27 06:54:55 2005 From: brbarret at open-mpi.org (Brian Barrett) Date: Tue, 27 Sep 2005 08:54:55 -0500 Subject: [openib-general] Re: [O-MPI devel] [PATCH] Update Open MPI for new libibverbs API In-Reply-To: <52fyrrsezv.fsf@cisco.com> References: <521x3btu1y.fsf@cisco.com> <52fyrrsezv.fsf@cisco.com> Message-ID: <358FA31B-AF4D-4CF1-ACAB-70788AA813CC@open-mpi.org> On Sep 26, 2005, at 4:20 PM, Roland Dreier wrote: > [It's somewhat annoying to have to subscribe to devel at open-mpi.org > just to be able to send patches, but oh well...] It's even more annoying to be deluged with SPAM ;). We (the LAM developers) used to try to keep our mailing lists as open as possible. In the end, SPAM pushed the signal to noise ratio way too high and something had to be done. Requiring subscriptions to post was the best we could do. > This patch updates Open MPI for the new ibv_create_cq() API. > Signed-off-by: Roland Dreier I'll admit my ignorance - is this part of a particular release of OpenIB, or is this something that has happened recently in SVN? I ask because we already have people using OpenIB and Open MPI together, and it would be bad to suddenly break things for them. Testing for number of arguments in a function is horribly unreliable - is there some version number or other key in the Open IB headers we can use to determine which version of the function to use? 
Brian > --- ompi/mca/btl/openib/btl_openib.c (revision 7507) > +++ ompi/mca/btl/openib/btl_openib.c (working copy) > @@ -656,7 +656,8 @@ int mca_btl_openib_module_init(mca_btl_o > } > > /* Create the low and high priority queue pairs */ > - openib_btl->ib_cq_low = ibv_create_cq(ctx, > mca_btl_openib_component.ib_cq_size, NULL); > + openib_btl->ib_cq_low = ibv_create_cq(ctx, > mca_btl_openib_component.ib_cq_size, > + NULL, NULL, 0); > > if(NULL == openib_btl->ib_cq_low) { > BTL_ERROR(("error creating low priority cq for %s errno > says %s\n", > @@ -665,7 +666,8 @@ int mca_btl_openib_module_init(mca_btl_o > return OMPI_ERROR; > } > > - openib_btl->ib_cq_high = ibv_create_cq(ctx, > mca_btl_openib_component.ib_cq_size, NULL); > + openib_btl->ib_cq_high = ibv_create_cq(ctx, > mca_btl_openib_component.ib_cq_size, > + NULL, NULL, 0); > > if(NULL == openib_btl->ib_cq_high) { > BTL_ERROR(("error creating high priority cq for %s errno > says %s\n", > _______________________________________________ > devel mailing list > devel at open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel > From halr at voltaire.com Tue Sep 27 06:55:02 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 09:55:02 -0400 Subject: [openib-general] uDAPL problem In-Reply-To: References: <1127774675.7959.852.camel@hal.voltaire.com> Message-ID: <1127829302.4376.28.camel@hal.voltaire.com> On Tue, 2005-09-27 at 09:51, James Lentini wrote: > On Mon, 26 Sep 2005, Hal Rosenstock wrote: > > > On Mon, 2005-09-26 at 18:05, Todd Bowman wrote: > > > I am having a problem with uDAPL accessing > > > /dev/infiniband/{uat,ucm0}. I am running 3549, 2.6.12 kernel with > > > backport. Here is a snippet of the uDAPL debug messages running > > > dtest. The dat.conf file seems to be correct, the correclty named > > > providers are being loaded. 
> > > > > > 26248 Running as server > > > DAT Registry: dat_ia_openv (OpenIB-ib0,1:2,0) called > > > DAT Registry: IA OpenIB-ib0, trying to load library > > > /usr/local/lib/libdapl.so > > > libuat: Error <-1:6> couldn't open IB at device > > > libibcm: error <-1:6> opening device > > This means that the /dev entried are not setup correctly. Correct. He set this up manually. Todd wrote: "I am not running udev but manually create uat and ucm." > > > DAPL: NOT Setting Loopback > > > dapl_ib_init: > > > DAT Registry: dat_registry_add_provider (OpenIB-ib0,1:2,0) > > > dapl_ia_open (OpenIB-ib0, 8, 0x10019d40, 0x10019cc0) > > > open_hca: mthca0 - 0x1001fdb0 > > > open_hca: Found dev mthca0 f422000002c90200 > > > open_hca: GID subnet 00000000000080fe id f522000002c90200 > > > > These look like they need to be endianized to me. > > This looks like a bug in the way we print these values out, but I > don't think it is the real problem. Right, it's just a cosmetic with the display. -- Hal > What architecture are you using? > > > > ips_by_gid: ERR ips_by_gid -1 Bad file descriptor > > > open_hca: ERR ib_at_ips_by_gid for mthca0 > > > dapls_ib_open_hca failed 40000 > > > dapl_ia_open () returns 0x40000 > > > 26248: Error Adaptor open: DAT_INTERNAL_ERROR > > > DAT Registry: Stopped (dat_fini) > > > DAPL: Stopped (dapl_fini) > > > dapl_ib_release: > > > > > > > > > I am not running udev but manually create uat and ucm. Here is the > > > list of /dev/infiniband: > > > > > > ls -l /dev/infiniband/ > > > total 0 > > > crw-rw-rw- 1 root root 231, 64 Sep 22 15:18 issm0 > > > crw-rw-rw- 1 root root 231, 65 Sep 22 15:18 issm1 > > > crw-rw-rw- 1 root root 231, 254 Sep 22 22:47 uat > > > > uat is at 231/191. > > > > > crw-rw-rw- 1 root root 231, 255 Sep 20 22:31 ucm > > > > I don't think you need this. > > > > > crw-rw-rw- 1 root root 231, 255 Sep 26 20:01 ucm0 > > > > ucm devices start at 231/224. > > If these changes do not fix you problem, please let us know. 
> > > -- Hal > > > > > crw-rw-rw- 1 root root 231, 0 Sep 22 15:18 umad0 > > > crw-rw-rw- 1 root root 231, 1 Sep 22 15:18 umad1 > > > crw-rw-rw- 1 root root 231, 192 Sep 20 22:30 uverbs0 > > > crw-rw-rw- 1 root root 231, 193 Sep 20 22:30 uverbs1 > > > > > > > > > And the loaded modules: > > > > > > kdapl_ib 82000 0 > > > kdapl 14888 1 kdapl_ib > > > ib_uverbs 52064 0 > > > ib_ipoib 65480 0 > > > ib_ucm 32624 0 > > > ib_cm 51944 2 kdapl_ib,ib_ucm > > > ib_uat 22168 0 > > > ib_at 34840 2 kdapl_ib,ib_uat > > > ib_sa 25328 2 ib_ipoib,ib_at > > > ib_mthca 160376 0 > > > ib_mad 61108 3 ib_cm,ib_sa,ib_mthca > > > ib_core 73888 8 > > > kdapl_ib,ib_uverbs,ib_ipoib,ib_ucm,ib_cm,ib_sa,ib_mthca,ib_mad > > > > > > > > > I am sure that I am missing something simple. Can someone point me in > > > the right direction. > > > > > > Thanks, > > > Todd From jerome.pioux at bull.com Tue Sep 27 07:27:12 2005 From: jerome.pioux at bull.com (Jerome Pioux) Date: Tue, 27 Sep 2005 07:27:12 -0700 Subject: [openib-general] Re: FW: SDP problems with 64K page size References: <52wtl3sfb0.fsf@cisco.com> <20050927030542.GB25823@mellanox.co.il> Message-ID: <006901c5c36f$8f3ad1b0$0211708d@gpv.az05.bull.com> Okay I will try that too and let you know. Thank you. Jerome ----- Original Message ----- From: "Michael S. Tsirkin" To: "Roland Dreier" Cc: "Jerome Pioux" ; "Tom Duffy" ; Sent: Monday, September 26, 2005 8:05 PM Subject: Re: FW: SDP problems with 64K page size > Quoting r. Roland Dreier : >> Subject: Re: FW: SDP problems with 64K page size >> >> Jerome> Just an open question: Do you think that we could get >> Jerome> better performance if we would go with u32 instead of >> Jerome> reducing the buffer to 16K? >> >> Not sure. One easy test you could try would be increasing 16384 to >> 32768 in my patch. > > I would replace 16384 with 4096 instead, and see whether the performance > decreases. Other approaches need changing more places. 
> >> If that works and improves performance, then >> further increases would probably be worthwhile also. > > My worry would be sign extention kicking in at this point, > since we are doing math on these numbers. > >> You could also try changing recv_size and send_size from u16 to s32 in >> the declaration in sdp_conn.h. > > Some other places need cleaning up for this to work. > >> BTW, I say all of this with only the vaguest understanding of the SDP >> code base, so it might be complete nonsense. >> >> - R. >> > > -- > MST From jlentini at netapp.com Tue Sep 27 07:43:52 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 27 Sep 2005 10:43:52 -0400 (EDT) Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: <4338622A.4010600@ichips.intel.com> References: <4338622A.4010600@ichips.intel.com> Message-ID: On Mon, 26 Sep 2005, Sean Hefty wrote: > James Lentini wrote: > > - move listen declaration closer to accept and reject > > Accepted - will be pushed in with next version. > > > - add private data and new cma_id fields to event structure > > Added private_data_len field to cma_id structure. Would like to get some > additional feedback before adding the new_cma_id field. If there are no > objections, I'll add this. > > > - record need to address information in the event structure > > The address information is only sent in the CM REQ. There shouldn't > be a need to carry it back in the CM REP. ok > > - implement private data handling for IB_CM_REQ_RECEIVED and > > IB_CM_REP_RECEIVED > > See below. > > > @@ -177,8 +177,6 @@ > > if (!route->path_rec) > > goto err; > > - ib_event->private_data += sizeof *addr; > > Used to skip address information sent in CM REQ. 
> > > case IB_CM_REQ_RECEIVED: > > - cma_id_priv = cma_req_recv(cma_id_priv, ib_event); > > - if (!cma_id_priv) > > + new_cma_id_priv = cma_req_recv(cma_id_priv, ib_event); > > + if (!new_cma_id_priv) > > return -ENOMEM; > > event.event = RDMA_CMA_EVENT_CONNECT_REQUEST; > > + event.private_data = ib_event->private_data + + > > sizeof struct cma_addr; > private_data pointer is set at the end of this routine. I see it now. > > > + event.private_data_len = IB_CM_REQ_PRIVATE_DATA_SIZE - + > > sizeof struct cma_addr; > added this. > > > + event.private_data = ib_event->private_data + + > > sizeof struct cma_addr; > > + event.private_data_len = IB_CM_REQ_PRIVATE_DATA_SIZE - + > > sizeof struct cma_addr; > Set private_data_len = IB_CM_ *REP* _PRIVATE_DATA_SIZE. right > > -int rdma_cma_reject(struct rdma_cma_id *cma_id, > > - const void *private_data, u8 private_data_len) > > +int rdma_cma_reject(struct rdma_cma_id *cma_id, const void *private_data, + > > u8 private_data_len) > > I prefer that the private data variables appear together... ok From twbowman at gmail.com Tue Sep 27 07:48:49 2005 From: twbowman at gmail.com (Todd Bowman) Date: Tue, 27 Sep 2005 08:48:49 -0600 Subject: [openib-general] uDAPL problem In-Reply-To: <1127829302.4376.28.camel@hal.voltaire.com> References: <1127774675.7959.852.camel@hal.voltaire.com> <1127829302.4376.28.camel@hal.voltaire.com> Message-ID: On 27 Sep 2005 09:55:02 -0400, Hal Rosenstock wrote: > > On Tue, 2005-09-27 at 09:51, James Lentini wrote: > > On Mon, 26 Sep 2005, Hal Rosenstock wrote: > > > > > On Mon, 2005-09-26 at 18:05, Todd Bowman wrote: > > > > I am having a problem with uDAPL accessing > > > > /dev/infiniband/{uat,ucm0}. I am running 3549, 2.6.12 kernel with > > > > backport. Here is a snippet of the uDAPL debug messages running > > > > dtest. The dat.conf file seems to be correct, the correclty named > > > > providers are being loaded. 
> > > > > > > > 26248 Running as server > > > > DAT Registry: dat_ia_openv (OpenIB-ib0,1:2,0) called > > > > DAT Registry: IA OpenIB-ib0, trying to load library > > > > /usr/local/lib/libdapl.so > > > > libuat: Error <-1:6> couldn't open IB at device > > > > > libibcm: error <-1:6> opening device > > > > This means that the /dev entried are not setup correctly. > > Correct. He set this up manually. Todd wrote: > "I am not running udev but manually create uat and ucm." The correct major/minor #s fixed that problem. > > > DAPL: NOT Setting Loopback > > > > dapl_ib_init: > > > > DAT Registry: dat_registry_add_provider (OpenIB-ib0,1:2,0) > > > > dapl_ia_open (OpenIB-ib0, 8, 0x10019d40, 0x10019cc0) > > > > open_hca: mthca0 - 0x1001fdb0 > > > > open_hca: Found dev mthca0 f422000002c90200 > > > > open_hca: GID subnet 00000000000080fe id f522000002c90200 > > > > > > These look like they need to be endianized to me. > > > > This looks like a bug in the way we print these values out, but I > > don't think it is the real problem. > > Right, it's just a cosmetic with the display. > > -- Hal > > > What architecture are you using? Apple G5. > > > > > ips_by_gid: ERR ips_by_gid -1 Bad file descriptor > > > > open_hca: ERR ib_at_ips_by_gid for mthca0 > > > > dapls_ib_open_hca failed 40000 > > > > dapl_ia_open () returns 0x40000 > > > > 26248: Error Adaptor open: DAT_INTERNAL_ERROR > > > > DAT Registry: Stopped (dat_fini) > > > > DAPL: Stopped (dapl_fini) > > > > dapl_ib_release: > > > > > > > > > > > I am not running udev but manually create uat and ucm. Here is the > > > > list of /dev/infiniband: > > > > > > > > ls -l /dev/infiniband/ > > > > total 0 > > > > crw-rw-rw- 1 root root 231, 64 Sep 22 15:18 issm0 > > > > crw-rw-rw- 1 root root 231, 65 Sep 22 15:18 issm1 > > > > crw-rw-rw- 1 root root 231, 254 Sep 22 22:47 uat > > > > > > uat is at 231/191. > > > > > > > crw-rw-rw- 1 root root 231, 255 Sep 20 22:31 ucm > > > > > > I don't think you need this. 
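The following lookup sketch covers the character-device numbers discussed here. Every node uses major 231. The minor-number bases come from Todd's listing (umad at 0, issm at 64, uverbs at 192) and from Hal's corrections (uat at 191, ucm devices starting at 224). The table and function are hypothetical helpers. The numbers apply to this snapshot of the stack and are not a guaranteed ABI.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define IB_CHAR_MAJOR 231  /* major number shared by all nodes in the listing */

/* Hypothetical helper: expected minor number for a /dev/infiniband node
 * name, or -1 if the name is not recognised (e.g. a bare "ucm"). */
static int ib_minor(const char *name)
{
    static const struct { const char *prefix; int base; } bases[] = {
        { "umad",   0   },
        { "issm",   64  },
        { "uverbs", 192 },
        { "ucm",    224 },
    };
    size_t i;

    if (strcmp(name, "uat") == 0)
        return 191;                       /* single node, no index */
    for (i = 0; i < sizeof bases / sizeof bases[0]; i++) {
        size_t n = strlen(bases[i].prefix);
        if (strncmp(name, bases[i].prefix, n) == 0 &&
            isdigit((unsigned char)name[n]))
            return bases[i].base + atoi(name + n);
    }
    return -1;
}
```

With `ib_minor("ucm0")` returning 224, the node can be created manually with `mknod /dev/infiniband/ucm0 c 231 224`. This is the step Todd was performing by hand without udev.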
> > > > > > > crw-rw-rw- 1 root root 231, 255 Sep 26 20:01 ucm0 > > > > > > ucm devices start at 231/224. > > > > If these changes do not fix you problem, please let us know. > > > > > -- Hal > > > > > > > crw-rw-rw- 1 root root 231, 0 Sep 22 15:18 umad0 > > > > crw-rw-rw- 1 root root 231, 1 Sep 22 15:18 umad1 > > > > crw-rw-rw- 1 root root 231, 192 Sep 20 22:30 uverbs0 > > > > crw-rw-rw- 1 root root 231, 193 Sep 20 22:30 uverbs1 > > > > > > > > > > > > And the loaded modules: > > > > > > > > kdapl_ib 82000 0 > > > > kdapl 14888 1 kdapl_ib > > > > ib_uverbs 52064 0 > > > > ib_ipoib 65480 0 > > > > ib_ucm 32624 0 > > > > ib_cm 51944 2 kdapl_ib,ib_ucm > > > > ib_uat 22168 0 > > > > ib_at 34840 2 kdapl_ib,ib_uat > > > > ib_sa 25328 2 ib_ipoib,ib_at > > > > ib_mthca 160376 0 > > > > ib_mad 61108 3 ib_cm,ib_sa,ib_mthca > > > > ib_core 73888 8 > > > > kdapl_ib,ib_uverbs,ib_ipoib,ib_ucm,ib_cm,ib_sa,ib_mthca,ib_mad > > > > > > > > > > > > I am sure that I am missing something simple. Can someone point me > in > > > > the right direction. > > > > > > > > Thanks, > > > > Todd > > I am having a different problem in ips_by_gid: open_hca: Found dev mthca0 f422000002c90200 open_hca: GID subnet 00000000000080fe id f522000002c90200 ips_by_gid: ERR ips_by_gid -1 No such device open_hca: ERR ib_at_ips_by_gid for mthca0 dapls_ib_open_hca failed 40000 dapl_ia_open () returns 0x40000 DT_cs_Server: Could not open OpenIB-ib0 (DAT_INTERNAL_ERROR ) Thanks, Todd -------------- next part -------------- An HTML attachment was scrubbed... 
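The line `GID subnet 00000000000080fe` looks like a byte-reversed 64-bit value. Reversing it gives `fe80000000000000`, the InfiniBand default (link-local) subnet prefix. Reversing the port GUID `f522000002c90200` gives `0002c902000022f5`. Both results fit James's view that only the debug print is affected. Here is a portable byte-reversal helper (hypothetical name) for checking such values:

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a 64-bit value, independent of host endianness. */
static uint64_t swap64(uint64_t v)
{
    uint64_t r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);  /* move the lowest remaining byte in */
        v >>= 8;
    }
    return r;
}
```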
URL: From jlentini at netapp.com Tue Sep 27 07:56:44 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 27 Sep 2005 10:56:44 -0400 (EDT) Subject: [openib-general] uDAPL problem In-Reply-To: References: <1127774675.7959.852.camel@hal.voltaire.com> <1127829302.4376.28.camel@hal.voltaire.com> Message-ID: On Tue, 27 Sep 2005, Todd Bowman wrote: > > I am having a different problem in ips_by_gid: > > open_hca: Found dev mthca0 f422000002c90200 > open_hca: GID subnet 00000000000080fe id f522000002c90200 > ips_by_gid: ERR ips_by_gid -1 No such device > open_hca: ERR ib_at_ips_by_gid for mthca0 > dapls_ib_open_hca failed 40000 > dapl_ia_open () returns 0x40000 > DT_cs_Server: Could not open OpenIB-ib0 (DAT_INTERNAL_ERROR ) What is the output of ifconfig? Is the IPoIB interface configured? What is the output of cat /sys/class/net/*/ifindex? From dotanb at mellanox.co.il Tue Sep 27 08:35:38 2005 From: dotanb at mellanox.co.il (Dotan Barak) Date: Tue, 27 Sep 2005 18:35:38 +0300 Subject: [openib-general] Mellanox verification team is starting to check in to the SVN tes ts Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E319AF06@mtlexch01.mtl.com> Hi everyone. Mellanox is starting to develop tests to the openIB (over the gen2 stack and the winib stack). We will start to check in the tests that we will write. We already checked in 2 components to: https://openib.org/svn/trunk/contrib/mellanox/ibtp/ : 1) VL (Verification Library) is a library that contains an abstraction to the operating system calls (for example: sleep, gettimeofday ..) to use in tests (something like the MOSAL library in gen1). In order to use the VL, one should execute "make" and "make install" 2) gen2_basic is a test that checks the API of the verbs and looks for immediate errors bug. All the tests that we will check in will use the VL. some of the test cases of the gen2_basic are failing because of bugs in the gen2 driver ... you are welcome to give me (or amitk) feedback about the test(s) and the VL. 
thanx Dotan Barak Software Verification Engineer Mellanox Technologies LTD Tel: +972-4-9097200 Ext: 231 Fax: +972-4-9593245 P.O. Box 86 Yokneam 20692 ISRAEL. Home: +972-77-8841095 Cell: 052-4222383 [ May the fork be with you ] -------------- next part -------------- An HTML attachment was scrubbed... URL: From guyg at voltaire.com Tue Sep 27 08:39:03 2005 From: guyg at voltaire.com (Guy German) Date: Tue, 27 Sep 2005 18:39:03 +0300 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <433827FF.3010601@ichips.intel.com> References: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> <43381CFC.7070508@voltaire.com> <433827FF.3010601@ichips.intel.com> Message-ID: <43396797.6030804@voltaire.com> Hi Sean, Basically I think that we can definitely agree that if the cma can implement ib_at intended functionality it should replace it - no need to have 2 modules doing the same thing. So the points that need to be considered are: 1. Caching sean> generic SA caching module should be a part of ib_sa or exist separately. What about specific path records caching with event driven invalidate logic ? 2. Partitioning The cma should be able to use ib p_keys or tcp vlans according to the ipoib interface ip address (subnet). 3. Qos As Yaron H mentioned: ib_at model suggest taking by default the SL value from the ipoib interface of that subnet which took it from the SA MCRecord 4. ATS registration sean> I think that ATS registration/deregistration should be integrated with sean> IPoIB. I don't think there is a consensus around that, but I don't know all details. 5. retries retries could be centralized in the ib_at approach. 6. ULP override ULP's that are aware of the transport layer can override default values derived from ipoib. I think it is fair to say that if the cma can handle these issues, than ib_at is no longer needed. 
Guy From Federico.Sacerdoti at deshaw.com Tue Sep 27 08:54:11 2005 From: Federico.Sacerdoti at deshaw.com (Sacerdoti, Federico) Date: Tue, 27 Sep 2005 11:54:11 -0400 Subject: [openib-general] mvapich and mpd Message-ID: Hi, I am building mvapich from the gen2 source in svn from 9-22. I must use the mpd job launcher in my cluster but although I see the mpid/mpd dir and compile mvapich with -DUSE_MPD_BASIC I do not have any mpd binaries after the build completes. Here is my configure command: CFLAGS="-D_X86_64_ -D_SMP_ -DLAZY_MEM_UNREGISTER -DRDMA_FAST_PATH \ -D_LARGE_CLUSTER -DUSE_MPD_BASIC -DUSE_RSH -O3" MPICH_CONFIGURE=--with-device=ch_gen2 \ --with-arch=LINUX \ --without-mpe \ --with-mpd \ --without-romio \ --enable-sharedlib \ -enable-f77 \ -enable-cxx IBLIBS=-L$(PKGROOT)/lib -libverbs (cd userspace/mpi/mvapich-gen2; \ MPIINSTALL_OPTS= \ RSHCOMMAND=ssh \ LIBS= \ CC=gcc \ FC=g77 \ F90= \ CXX=g++ \ CFLAGS=$(CFLAGS) \ FFLAGS=-L$(PKGROOT)/lib \ F90FLAGS=$(FFLAGS) \ ./configure -prefix=$(PKGROOT) \ $(MPICH_CONFIGURE) -lib="$(IBLIBS)"; \ (cd mpid/mpd; ./configure); \ make; \ make install) Thanks, Federico From xma at us.ibm.com Tue Sep 27 09:05:48 2005 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 27 Sep 2005 09:05:48 -0700 Subject: [openib-general] Mellanox verification team is starting to check in to the SVN tes ts In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E319AF06@mtlexch01.mtl.com> Message-ID: That's a great news. We have started posting pure nightly build results for both mainline and openIB Gen2 stacks. We would like to integrate these test results into our automation framework and test Gen2 stack along with the nightly build. 
Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 Dotan Barak Sent by: openib-general-bounces at openib.org 09/27/2005 08:35 AM To openib-general at openib.org cc Subject [openib-general] Mellanox verification team is starting to check in to the SVN tes ts Hi everyone. Mellanox is starting to develop tests to the openIB (over the gen2 stack and the winib stack). We will start to check in the tests that we will write. We already checked in 2 components to: https://openib.org/svn/trunk/contrib/mellanox/ibtp/ : 1) VL (Verification Library) is a library that contains an abstraction to the operating system calls (for example: sleep, gettimeofday ..) to use in tests (something like the MOSAL library in gen1). In order to use the VL, one should execute "make" and "make install" 2) gen2_basic is a test that checks the API of the verbs and looks for immediate errors bug. All the tests that we will check in will use the VL. some of the test cases of the gen2_basic are failing because of bugs in the gen2 driver ... you are welcome to give me (or amitk) feedback about the test(s) and the VL. thanx Dotan Barak Software Verification Engineer Mellanox Technologies LTD Tel: +972-4-9097200 Ext: 231 Fax: +972-4-9593245 P.O. Box 86 Yokneam 20692 ISRAEL. Home: +972-77-8841095 Cell: 052-4222383 [ May the fork be with you ] _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: From webmaster at openib.org Tue Sep 27 09:20:37 2005 From: webmaster at openib.org (webmaster at openib.org) Date: Tue, 27 Sep 2005 22:20:37 +0600 Subject: [openib-general] MEMBERS SUPPORT Message-ID: <0INI00AHDFIQEN@mail.interblocks.com> An HTML attachment was scrubbed... 
URL: From rjwalsh at pathscale.com Tue Sep 27 09:29:31 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Tue, 27 Sep 2005 09:29:31 -0700 Subject: [openib-general] svn:ignore property stuff In-Reply-To: References: <1127804980.24349.3.camel@phosphene.durables.org> Message-ID: <1127838571.24489.0.camel@phosphene.durables.org> > It sounds ok, but svn diff will produce essentially the same results > (svn diff --diff-cmd diff -x --brief). To see the files you've added > you need to do an svn add (don't worry they won't be committed until > you do an svn commit). Sure, but 'svn status' is a lot easier to type :-) -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043.
From jlentini at netapp.com Tue Sep 27 09:40:01 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 27 Sep 2005 12:40:01 -0400 (EDT) Subject: [openib-general] Re: Page allocation failures & kdapltest oops In-Reply-To: <1127749644.4398.878.camel@hal.voltaire.com> References: <1127749644.4398.878.camel@hal.voltaire.com> Message-ID: On Mon, 26 Sep 2005, Hal Rosenstock wrote: > Hi James, > > I keep getting the following when running kdapltest. This is similar to > what I saw before and reported a couple of times but now seems more > consistent in occurring. > > -- Hal Until the point at which the page allocation fails, I don't see anything too out of the ordinary. kdapltest is posting a send and is attempting to log a message reporting that to the user. To log the message, DT_Tdep_PT_Printf needs to allocate memory (256 bytes + a pointer). This is when the memory allocation fails. It looks like you have DEBUG_PAGEALLOC turned on. Is that correct? My test systems don't have this turned on. I'll turn it on and see what happens. Since we don't check for a kmalloc failure in DT_Tdep_PT_Printf, this oops occurs: > Sep 26 10:29:30 hal kernel: Unable to handle kernel NULL pointer > dereference at virtual address 00000004 I've checked in the patch below to fix that, but this is not the root of the problem. 
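The core of the fix is to check the allocation before writing into it. The user-space sketch below follows the same pattern, with `malloc` and `fprintf` standing in for the kernel's `kmalloc` and `printk`. `struct print_entry` is a hypothetical stand-in for `Tdep_Print_Entry`, which James describes as a 256-byte buffer plus a pointer. The allocator hook exists only so that a test can force the failure path.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PRINT_MAX 256  /* buffer size described in the thread */

/* Hypothetical stand-in for Tdep_Print_Entry: a pointer plus a buffer. */
struct print_entry {
    struct print_entry *next;
    char buffer[PRINT_MAX];
};

/* Allocation hook (hypothetical) so the out-of-memory path can be tested. */
static void *(*entry_alloc)(size_t) = malloc;

static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;  /* simulate allocation failure */
}

/* Format a message into a freshly allocated entry. Returns NULL, after
 * logging, if the allocation fails; nothing is written through a NULL. */
static struct print_entry *format_entry(const char *fmt, ...)
{
    struct print_entry *entry = entry_alloc(sizeof *entry);
    va_list args;

    if (!entry) {
        fprintf(stderr, "%s: out of memory\n", __func__);
        return NULL;
    }
    va_start(args, fmt);
    vsnprintf(entry->buffer, PRINT_MAX, fmt, args);
    va_end(args);
    entry->next = NULL;
    return entry;
}
```

As James notes, this makes the failure graceful but does not address the underlying memory shortage.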
Index: kdapl/kdapl_tdep_print.c =================================================================== --- kdapl/kdapl_tdep_print.c (revision 3564) +++ kdapl/kdapl_tdep_print.c (working copy) @@ -106,7 +106,13 @@ int len; Tdep_Print_Entry *entry; unsigned long flags; + entry = DT_Mdep_Malloc (sizeof (Tdep_Print_Entry)); + if (!entry) + { + printk(KERN_ERR "%s: out of memory\n", __func__); + return; + } va_start (args, fmt); len = vsnprintf (entry->buffer, PRINT_MAX, fmt, args); From mshefty at ichips.intel.com Tue Sep 27 09:41:00 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 27 Sep 2005 09:41:00 -0700 Subject: [openib-general] Re: [PATCH] add cq error events In-Reply-To: <20050927032545.GF25823@mellanox.co.il> References: <52hdca88w3.fsf@cisco.com> <20050927032545.GF25823@mellanox.co.il> Message-ID: <4339761C.7070001@ichips.intel.com> Michael S. Tsirkin wrote: > Okay ... one problem that I've run into adding this, is > that IB_EVENT_CQ_ERR is the first item in the ib_event_type enum. > And since uverbs seems to just copy the event over to userspace, > changing all the enum values would break the ABI. > > Given that IB_EVENT_CQ_ERR wasnt actually produced by any hardware > provider yet, I'm thinking about working around this by simply giving > specific values to enum items, like this Why not keep CQ_ERR as the generic access error, and then add CQ_OVERRUN to the end? - Sean From halr at voltaire.com Tue Sep 27 09:40:45 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 12:40:45 -0400 Subject: [openib-general] Re: Page allocation failures & kdapltest oops In-Reply-To: References: <1127749644.4398.878.camel@hal.voltaire.com> Message-ID: <1127839244.4436.7.camel@localhost.localdomain> On Tue, 2005-09-27 at 12:40, James Lentini wrote: > It looks like you have DEBUG_PAGEALLOC turned on. Is that correct? Yes. > My > test systems don't have this turned on. I'll turn it on and see what > happens. 
> > Since we don't check for a kmalloc failure in DT_Tdep_PT_Printf, this > oops occurs: > > > Sep 26 10:29:30 hal kernel: Unable to handle kernel NULL pointer > > dereference at virtual address 00000004 > > I've checked in the patch below to fix that, but this is not the root > of the problem. I'll try it with the patch and let you know how it behaves. When it still runs out of memory will it fail more gracefully ? I understand it won't fix the root cause of running out of memory. -- Hal From rolandd at cisco.com Tue Sep 27 09:49:18 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 27 Sep 2005 09:49:18 -0700 Subject: [openib-general] Re: [O-MPI devel] [PATCH] Update Open MPI for new libibverbs API In-Reply-To: <358FA31B-AF4D-4CF1-ACAB-70788AA813CC@open-mpi.org> (Brian Barrett's message of "Tue, 27 Sep 2005 08:54:55 -0500") References: <521x3btu1y.fsf@cisco.com> <52fyrrsezv.fsf@cisco.com> <358FA31B-AF4D-4CF1-ACAB-70788AA813CC@open-mpi.org> Message-ID: <528xxiqwvl.fsf@cisco.com> Brian> It's even more annoying to be deluged with SPAM ;). We Brian> (the LAM developers) used to try to keep our mailing lists Brian> as open as possible. In the end, SPAM pushed the signal to Brian> noise ratio way too high and something had to be done. Brian> Requiring subscriptions to post was the best we could do. I understand that you have limited resources to administer your mailing list, but certainly lists like openib-general and linux-kernel show that it is possible to run lists with low levels of spam and still allow posting by anyone. In general, if I have to subscribe to a list just to send a bug fix to a project, I'm quite likely to forget about it. So you are definitely missing out on contributions by closing your lists. Brian> I'll admit my ignorance - is this part of a particular Brian> release of OpenIB, or is this something that has happened Brian> recently in SVN? 
I ask because we already have people Brian> using OpenIB and Open MPI together, and it would be bad to Brian> suddenly break things for them. Testing for number of Brian> arguments in a function is horribly unreliable - is there Brian> some version number or other key in the Open IB headers we Brian> can use to determine which version of the function to use? OpenIB has not done an "official" release of any userspace components, so this falls into the category of prerelease API breakage. New kernels will require a new libibverbs, so the number of obsolete old development versions should decrease fairly quickly. - R. From jlentini at netapp.com Tue Sep 27 09:53:05 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 27 Sep 2005 12:53:05 -0400 (EDT) Subject: [openib-general] Re: Page allocation failures & kdapltest oops In-Reply-To: <1127839244.4436.7.camel@localhost.localdomain> References: <1127749644.4398.878.camel@hal.voltaire.com> <1127839244.4436.7.camel@localhost.localdomain> Message-ID: On Tue, 27 Sep 2005, Hal Rosenstock wrote: > > Since we don't check for a kmalloc failure in DT_Tdep_PT_Printf, this > > oops occurs: > > > > > Sep 26 10:29:30 hal kernel: Unable to handle kernel NULL pointer > > > dereference at virtual address 00000004 > > > > I've checked in the patch below to fix that, but this is not the root > > of the problem. > > I'll try it with the patch and let you know how it behaves. When it > still runs out of memory will it fail more gracefully ? I understand it > won't fix the root cause of running out of memory. It should behave more gracefully. Thanks for testing. 
From rolandd at cisco.com Tue Sep 27 09:53:39 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 27 Sep 2005 09:53:39 -0700 Subject: [openib-general] Mellanox verification team is starting to check in to the SVN tes ts In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E319AF06@mtlexch01.mtl.com> (Dotan Barak's message of "Tue, 27 Sep 2005 18:35:38 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E319AF06@mtlexch01.mtl.com> Message-ID: <524q86qwoc.fsf@cisco.com> Dotan> 1) VL (Verification Library) is a library that contains an Dotan> abstraction to the operating system calls (for example: Dotan> sleep, gettimeofday ..) to use in tests (something like the Dotan> MOSAL library in gen1). In order to use the VL, one should Dotan> execute "make" and "make install" Do we really need an OS abstraction library? Can't we just use libc? Dotan> 2) gen2_basic is a test that checks the API of the verbs Dotan> and looks for immediate errors bug. Dotan> All the tests that we will check in will use the VL. Dotan> some of the test cases of the gen2_basic are failing Dotan> because of bugs in the gen2 driver ... Can you post more details or better still fixes for the bugs? Would it be possible to update the tests for the new CQ API? 
Thanks, Roland From amitk at mellanox.co.il Tue Sep 27 10:07:06 2005 From: amitk at mellanox.co.il (Amit Krig) Date: Tue, 27 Sep 2005 20:07:06 +0300 Subject: [openib-general] Mellanox verification team is starting to ch eck in to the SVN tes ts Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E311AFE4@mtlexch01.mtl.com> Hi Shirley We are also running a nightly regression system on different platforms/Oss, I think that we should have a sync meeting to come up with the best way of sharing/developing the tests/regression tools _____ From: Shirley Ma [mailto:xma at us.ibm.com] Sent: Tuesday, September 27, 2005 7:06 PM To: Dotan Barak Cc: openib-general-bounces at openib.org; openib-general at openib.org Subject: Re: [openib-general] Mellanox verification team is starting to check in to the SVN tes ts That's a great news. We have started posting pure nightly build results for both mainline and openIB Gen2 stacks. We would like to integrate these test results into our automation framework and test Gen2 stack along with the nightly build. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 Dotan Barak Sent by: openib-general-bounces at openib.org 09/27/2005 08:35 AM To openib-general at openib.org cc Subject [openib-general] Mellanox verification team is starting to check in to the SVN tes ts Hi everyone. Mellanox is starting to develop tests to the openIB (over the gen2 stack and the winib stack). We will start to check in the tests that we will write. We already checked in 2 components to: https://openib.org/svn/trunk/contrib/mellanox/ibtp/ : 1) VL (Verification Library) is a library that contains an abstraction to the operating system calls (for example: sleep, gettimeofday ..) to use in tests (something like the MOSAL library in gen1). In order to use the VL, one should execute "make" and "make install" 2) gen2_basic is a test that checks the API of the verbs and looks for immediate errors bug. 
All the tests that we will check in will use the VL. some of the test cases of the gen2_basic are failing because of bugs in the gen2 driver ... you are welcome to give me (or amitk) feedback about the test(s) and the VL. thanx Dotan Barak Software Verification Engineer Mellanox Technologies LTD Tel: +972-4-9097200 Ext: 231 Fax: +972-4-9593245 P.O. Box 86 Yokneam 20692 ISRAEL. Home: +972-77-8841095 Cell: 052-4222383 [ May the fork be with you ] _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... URL: From twbowman at gmail.com Tue Sep 27 10:09:47 2005 From: twbowman at gmail.com (Todd Bowman) Date: Tue, 27 Sep 2005 11:09:47 -0600 Subject: [openib-general] uDAPL problem In-Reply-To: References: <1127774675.7959.852.camel@hal.voltaire.com> <1127829302.4376.28.camel@hal.voltaire.com> Message-ID: On 9/27/05, James Lentini wrote: > > > > On Tue, 27 Sep 2005, Todd Bowman wrote: > > > On 9/27/05, James Lentini wrote: > > > > What is the output of cat /sys/class/net/*/ifindex? > > > > > > cat /sys/class/net/*/ifindex > > 1 #eth0 > > 10 #ib0 > > 11 #ib1 > > 2 #lo > > 3 #tunl0 > > This looks like the problem I reported last week: > > http://openib.org/pipermail/openib-general/2005-September/011668.html > > If so, this is fixed in the current subversion sources. Could you > update your sources, specifically infiniband/core/at.c, and try again? > > james That worked. Thanks. It is printing the addresses backwards. But I assume this is also a bug in printing. 
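The address in Todd's log (`b0f0add`, printed as `221.10.15.11`) is consistent with a formatter that emits the lowest-order byte first. On a little-endian host, a network-order `s_addr` read as an integer has the first octet in its low byte, so that order prints correctly. On a big-endian machine such as the G5, the same address reads as `0x0b0f0add` and comes out reversed. The sketch below shows both orders; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format an IPv4 address whose first octet is in the most significant
 * byte (network order loaded on a big-endian host): 0x0b0f0add -> 11.15.10.221 */
static void ipv4_high_first(uint32_t addr, char out[16])
{
    snprintf(out, 16, "%u.%u.%u.%u",
             (unsigned)(addr >> 24) & 0xff, (unsigned)(addr >> 16) & 0xff,
             (unsigned)(addr >> 8) & 0xff, (unsigned)addr & 0xff);
}

/* Low byte first: correct only where network-order bytes load with the
 * first octet in the low byte (little-endian), reversed on big-endian. */
static void ipv4_low_first(uint32_t addr, char out[16])
{
    snprintf(out, 16, "%u.%u.%u.%u",
             (unsigned)addr & 0xff, (unsigned)(addr >> 8) & 0xff,
             (unsigned)(addr >> 16) & 0xff, (unsigned)(addr >> 24) & 0xff);
}
```

In real code, passing the `struct in_addr` to POSIX `inet_ntop(AF_INET, ...)` sidesteps host byte order entirely.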
ips_by_gid: RET 0 at_rec 0x1ffff9e5f40 -> id 1 dapli_at_event_cb() ip_comp_handler: rec 0x1ffff9e5f40 ->id 1 id 1 num 1 b0f0add open_hca: mthca0, port 1, AF_INET 221.10.15.11 INLINE_MAX=128 query_hca: mthca0 AF_INET 221.10.15.11 query_hca: (0.30002) ep 64512 ep_q 65535 evd 65408 evd_q 65535 query_hca: msg 2147483648 rdma 2147483648 iov 28 lmr 131056 rmr 0 Bus error The bus error is probably due to changes in ibv_context. I didn't recompile libdapl. The changes in ibv_context that broke udapl are: num_comp is now num_comp_vectors I don't know what is used in place of cq_fd[1]. Any pointers? Todd -------------- next part -------------- An HTML attachment was scrubbed... URL: From mshefty at ichips.intel.com Tue Sep 27 10:10:24 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 27 Sep 2005 10:10:24 -0700 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <43396797.6030804@voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> <43381CFC.7070508@voltaire.com> <433827FF.3010601@ichips.intel.com> <43396797.6030804@voltaire.com> Message-ID: <43397D00.6080505@ichips.intel.com> Guy German wrote: > Basically I think that we can definitely agree that if the cma can > implement ib_at intended functionality it should replace it - no need to > have 2 modules doing the same thing. I think that there will still be a need for a separate address translation module(s). I have a call like the following in my latest version of the CMA: int ib_resolve_addr(struct sockaddr *src_addr, struct sockaddr *dst_addr, void (*callback)(int status, struct ib_addr *addr, void *context), void *context); It uses ARP to convert a dst_addr and an optional src_addr to an sgid/dgid. My intent is that this routine be moved to a separate module that deals only with resolving IP addresses to hardware addresses using ARP. This essentially separates the ARP handling from ib_at into its own module. 
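The following sketch shows how a consumer might use the callback-style call Sean proposes. It is written against a mock: `mock_resolve_addr`, the `struct ib_addr` fields, and the neighbour table are all hypothetical, and the lookup is a synchronous table search rather than ARP. Only the calling convention carries over: the status and resolved address reach the callback together with the caller's `context` pointer.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in; the thread does not show struct ib_addr. */
struct ib_addr { uint8_t sgid[16]; uint8_t dgid[16]; };

typedef void (*resolve_cb)(int status, struct ib_addr *addr, void *context);

/* Hypothetical neighbour table standing in for ARP results. */
struct neigh { uint32_t ip; uint8_t gid[16]; };
static const struct neigh neigh_table[] = {
    { 0x0b0f0addu, { 0xfe, 0x80, 0, 0, 0, 0, 0, 0,
                     0x00, 0x02, 0xc9, 0x02, 0x00, 0x00, 0x22, 0xf5 } },
};

/* Mock resolver: src_addr is optional and ignored; sgid is left zeroed. */
static int mock_resolve_addr(const struct sockaddr *src_addr,
                             const struct sockaddr *dst_addr,
                             resolve_cb callback, void *context)
{
    struct ib_addr addr;
    const struct sockaddr_in *dst;
    size_t i;

    (void)src_addr;
    if (dst_addr->sa_family != AF_INET)
        return -EAFNOSUPPORT;
    dst = (const struct sockaddr_in *)dst_addr;
    memset(&addr, 0, sizeof addr);
    for (i = 0; i < sizeof neigh_table / sizeof neigh_table[0]; i++) {
        if (neigh_table[i].ip == ntohl(dst->sin_addr.s_addr)) {
            memcpy(addr.dgid, neigh_table[i].gid, sizeof addr.dgid);
            callback(0, &addr, context);
            return 0;
        }
    }
    callback(-EHOSTUNREACH, NULL, context);  /* report failure via callback */
    return 0;
}

/* Example consumer callback: records the result in caller-owned state. */
struct resolve_result { int called; int status; struct ib_addr addr; };

static void record_result(int status, struct ib_addr *addr, void *context)
{
    struct resolve_result *r = context;

    r->called = 1;
    r->status = status;
    if (addr)
        r->addr = *addr;
}
```

In the real module the callback would presumably fire later, from ARP completion, so the consumer must keep `context` valid until the callback has run.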
I would rather start with some basic functionality that can be built upon rather than jumping directly to an ARP/ATS/QoS/caching handling interface. > 1. Caching > sean> generic SA caching module should be a part of ib_sa or exist > separately. > > What about specific path records caching with event driven invalidate > logic ? Caching will be complex, which is why I think that it needs to have its own module. I'm envisioning a cache that can be saved to disk for faster system startup. > 4. ATS registration > sean> I think that ATS registration/deregistration should be integrated > with > sean> IPoIB. > > I don't think there is a consensus around that, but I don't know all > details. This makes more sense to me than having the ATS code dereference IPoIB private data structures. However, if adding an rdma_ptr to the net_device can avoid this, then that will work. And to be clear, I was referring to only registration/deregistration, not ATS queries. It looks like the ATS code periodically scans all network devices in the system looking for changes in order to update the ATS records. If this is the case, I would think that IPoIB should be able to do this more efficiently. (I'm not that familiar with the code, so I could be off here.) > 5. retries > retries could be centralized in the ib_at approach. For at least the ARP resolution module that I mentioned, I would expect all retries to be handled by that module. For SA queries, since the MAD layer already performs retries, we just need to enable access through the API. - Sean From viswa.krish at gmail.com Tue Sep 27 10:13:55 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Tue, 27 Sep 2005 10:13:55 -0700 Subject: [openib-general] opensm and faulty hardware In-Reply-To: <4338F1EB.3070909@mellanox.co.il> References: <1127792086.4379.290.camel@hal.voltaire.com> <4338F1EB.3070909@mellanox.co.il> Message-ID: <4df28be405092710132622298a@mail.gmail.com> Log sent off-list...
-Viswa On 9/27/05, Eitan Zahavi wrote: > > Hi Viswa, > > Please send a full /var/log/osm.log file of opensm -V . > You can send us a copy off the list if it is too big: > > yael and eitan in @mellanox.co.il > > EZ > > Hal Rosenstock wrote: > > On Mon, 2005-09-26 at 19:57, Viswanath Krishnamurthy wrote: > > > >>I have an exerciser in the IB network. The exerciser seems to be > >>faulty/buggy. When opensm starts I do not > >>see 'SUBNET UP" message. It says "Entering MASTER" and waits there. > >>Any new node inserted in this state is not assigned any LID. Anybody > >>seen such behavior ? > > > > > > Any idea on how the IB exerciser misbehaves on the network ? Do you have > > an analyzer too ? > > > > What does the OSM log show ? > > > > -- Hal > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > > http://openib.org/mailman/listinfo/openib-general > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From twbowman at gmail.com Tue Sep 27 10:15:06 2005 From: twbowman at gmail.com (Todd Bowman) Date: Tue, 27 Sep 2005 11:15:06 -0600 Subject: [openib-general] uDAPL problem In-Reply-To: References: <1127774675.7959.852.camel@hal.voltaire.com> <1127829302.4376.28.camel@hal.voltaire.com> Message-ID: On 9/27/05, Todd Bowman wrote: > > > > On 9/27/05, James Lentini wrote: > > > > > > > > On Tue, 27 Sep 2005, Todd Bowman wrote: > > > > > On 9/27/05, James Lentini wrote: > > > > > > What is the output of cat /sys/class/net/*/ifindex? > > > > > > > > cat /sys/class/net/*/ifindex > > > 1 #eth0 > > > 10 #ib0 > > > 11 #ib1 > > > 2 #lo > > > 3 #tunl0 > > > > This looks like the problem I reported last week: > > > > http://openib.org/pipermail/openib-general/2005-September/011668.html > > > > If so, this is fixed in the current subversion sources. 
Could you > > update your sources, specifically infiniband/core/at.c, and try again? > > > > james > > > That worked. Thanks. > > It is printing the addresses backwards. But I assume this is also a bug in > printing. > > ips_by_gid: RET 0 at_rec 0x1ffff9e5f40 -> id 1 > dapli_at_event_cb() > ip_comp_handler: rec 0x1ffff9e5f40 ->id 1 id 1 num 1 b0f0add > open_hca: mthca0, port 1, AF_INET 221.10.15.11 INLINE_MAX=128 > query_hca: mthca0 AF_INET 221.10.15.11 > query_hca: (0.30002) ep 64512 ep_q 65535 evd 65408 evd_q 65535 > query_hca: msg 2147483648 rdma 2147483648 iov 28 lmr 131056 rmr 0 > Bus error > > > The bus error is probably due to changes in ibv_context. I didn't > recompile libdapl. > > The changes in ibv_context that broke udapl are: > num_comp is now num_comp_vectors > I don't know what is used in place of cq_fd[1]. Any pointers? > > Todd > Never mind, I found the post from Roland discussing this issue: Roland wrote: I didn't try to fix uDAPL, because some thought probably needs to go into how to use completion channels most efficiently. -------------- next part -------------- An HTML attachment was scrubbed... URL: From suri at baymicrosystems.com Tue Sep 27 10:18:36 2005 From: suri at baymicrosystems.com (Suresh Shelvapille) Date: Tue, 27 Sep 2005 13:18:36 -0400 Subject: FW: [openib-general] drivers.diff patch Message-ID: <200509271718.j8RHIdT0001746@ns1.baymicrosystems.com> Just for the benefit of the community, I am forwarding a mail in which Hal already answered my question (I forgot to hit reply all...). -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Tuesday, September 27, 2005 12:27 PM To: Suresh Shelvapille Subject: RE: [openib-general] drivers.diff patch Hi Suresh, On Tue, 2005-09-27 at 12:14, Suresh Shelvapille wrote: > Thanks for your help. It is really painful to get this going. If I applied > any of the patches from branches/backport-to-2.6.9 the kernel would not > built. 
> So I managed to look at the changes needed for drivers/Kconfig and > drivers/Makefile in the kernel-patch diff file and applied it, so that the > menu showed the infiniband. And then applied a few patches from the > branches/backport/2.6.9. Now the Kernel builds with the CONFIG_INFINIBAND=y > and CONFIG_INFINIBAND_USER_MAD=y. > > If I have anything else set in the CONFIG the build fails. As I said before > I am afraid to take any of the patches from the branches/backport-to-2.6.9 > as even the core modules won't build. Not sure what is broken. > Question is: > Our system (at least in the initial release) will act like a bare bones ib > switch, That configuration (run as a switch; most everyone has run as HCA) is untested so this may need a tweak or two. What I'm referring to is not build related though. > and I am hoping that only MAD packet support would be needed within > the kernel. Are the above two CONFIG_INFINIBAND and > CONFIG_INFINIBAND_USER_MAD sufficient? If you want to run OpenSM and/or the management diagnostics, yes. Otherwise, you won't need user MAD configured. You will need some switch driver which I presume you would be adding locally. -- Hal From mst at mellanox.co.il Tue Sep 27 10:34:33 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Tue, 27 Sep 2005 20:34:33 +0300 Subject: [openib-general] Re: [PATCH] add cq error events In-Reply-To: <4339761C.7070001@ichips.intel.com> References: <4339761C.7070001@ichips.intel.com> Message-ID: <20050927173433.GA30506@mellanox.co.il> Quoting r. Sean Hefty : > Subject: Re: [openib-general] Re: [PATCH] add cq error events > > Michael S. Tsirkin wrote: > > Okay ... one problem that I've run into adding this, is > > that IB_EVENT_CQ_ERR is the first item in the ib_event_type enum. > > And since uverbs seems to just copy the event over to userspace, > > changing all the enum values would break the ABI.
> > > > Given that IB_EVENT_CQ_ERR wasn't actually produced by any hardware > > provider yet, I'm thinking about working around this by simply giving > > specific values to enum items, like this > > Why not keep CQ_ERR as the generic access error, and then add > CQ_OVERRUN to the end? > > - Sean > Fine with me. Roland? -- MST From jlentini at netapp.com Tue Sep 27 10:37:55 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 27 Sep 2005 13:37:55 -0400 (EDT) Subject: [openib-general] uDAPL problem In-Reply-To: References: <1127774675.7959.852.camel@hal.voltaire.com> <1127829302.4376.28.camel@hal.voltaire.com> Message-ID: On Tue, 27 Sep 2005, Todd Bowman wrote: > Never mind, I found the post from Roland discussing this issue: > > Roland wrote: > I didn't try to fix uDAPL, because some thought probably needs to go > into how to use completion channels most efficiently. I apologize. I forgot that the current tree has an ABI change. Arlin is working on a fix for this. The kernel and userspace code at revision 3547 is what you need. Please use svn co -r 3547 https://openib.org/svn/gen2/trunk/src/ to obtain these sources. From alexisgr at rogers.com Tue Sep 27 10:38:24 2005 From: alexisgr at rogers.com (ALEXIS GARCIA RUIZ) Date: Tue, 27 Sep 2005 13:38:24 -0400 (EDT) Subject: [openib-general] New on IB (Help) Message-ID: <20050927173824.22094.qmail@web88108.mail.re2.yahoo.com> Hello, all: I just got the task to get into the IB technology, I made some reading and installed some packages on a Host. But still I am lost. Could you lead me to some reading? Regards Alex
From jlentini at netapp.com Tue Sep 27 10:59:01 2005 From: jlentini at netapp.com (James Lentini) Date: Tue, 27 Sep 2005 13:59:01 -0400 (EDT) Subject: [openib-general] New on IB (Help) In-Reply-To: <20050927173824.22094.qmail@web88108.mail.re2.yahoo.com> References: <20050927173824.22094.qmail@web88108.mail.re2.yahoo.com> Message-ID: On Tue, 27 Sep 2005, ALEXIS GARCIA RUIZ wrote: > Hello, all: > I just got the task to get into the IB technology, I made some > reading and installed some packages on a Host. > But still I am lost. > Could you lead me to some reading? > > Regards > > Alex The Wiki is a good place for installation and configuration help: https://openib.org/tiki/tiki-index.php From xma at us.ibm.com Tue Sep 27 11:06:25 2005 From: xma at us.ibm.com (Shirley Ma) Date: Tue, 27 Sep 2005 11:06:25 -0700 Subject: [openib-general] Mellanox verification team is starting to check in to the SVN tests In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E311AFE4@mtlexch01.mtl.com> Message-ID: >I think that we should have a sync meeting to come up with the best way of sharing/developing the tests/regression tools. That's a good idea. Thanks Shirley Ma IBM Linux Technology Center 15300 SW Koll Parkway Beaverton, OR 97006-6063 Phone(Fax): (503) 578-7638 From viswa.krish at gmail.com Tue Sep 27 11:13:31 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Tue, 27 Sep 2005 11:13:31 -0700 Subject: [openib-general] opensm and faulty hardware In-Reply-To: <4df28be405092710132622298a@mail.gmail.com> References: <1127792086.4379.290.camel@hal.voltaire.com> <4338F1EB.3070909@mellanox.co.il> <4df28be405092710132622298a@mail.gmail.com> Message-ID: <4df28be4050927111336b0861e@mail.gmail.com> I tracked down the issue to a bug in osm_lid_mgr.c function: __osm_lid_mgr_init_sweep(...) The bad hardware was returning an assigned LID of 0xFFFF.
In this function there is a loop as follows where opensm is getting stuck (with line numbers): 392 p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl; 393 394 for( p_port = (osm_port_t*)cl_qmap_head( p_port_guid_tbl ); 395 p_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl ); 396 p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ) ) 397 { 398 osm_port_get_lid_range_ho(p_port, &disc_min_lid, &disc_max_lid); 399 for (lid = disc_min_lid; lid <= disc_max_lid; lid++) <===== Bug here 400 cl_ptr_vector_set(p_discovered_vec, lid, p_port ); 401 } Since disc_max_lid and disc_min_lid are both 0xFFFF, and these are unsigned 16-bit numbers, a 16-bit lid wraps from 0xFFFF back to 0 on lid++, so the condition in the for loop never becomes false and opensm is stuck in the loop. There are a couple of other places in that function that need fixing too. -Viswa On 9/27/05, Viswanath Krishnamurthy wrote: > > Log sent off-list... > > -Viswa > > > On 9/27/05, Eitan Zahavi wrote: > > > > Hi Viswa, > > > > Please send a full /var/log/osm.log file of opensm -V . > > You can send us a copy off the list if it is too big: > > > > yael and eitan in @mellanox.co.il > > > > EZ > > > > Hal Rosenstock wrote: > > > On Mon, 2005-09-26 at 19:57, Viswanath Krishnamurthy wrote: > > > > > >>I have an exerciser in the IB network. The exerciser seems to be > > >>faulty/buggy. When opensm starts I do not > > >>see 'SUBNET UP" message. It says "Entering MASTER" and waits there. > > >>Any new node inserted in this state is not assigned any LID. Anybody > > >>seen such behavior ? > > > > > > > > > Any idea on how the IB exerciser misbehaves on the network ? Do you > > have > > > an analyzer too ? > > > > > > What does the OSM log show ?
> > > > > > -- Hal From halr at voltaire.com Tue Sep 27 11:21:06 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 14:21:06 -0400 Subject: [openib-general] opensm and faulty hardware In-Reply-To: <4df28be4050927111336b0861e@mail.gmail.com> References: <1127792086.4379.290.camel@hal.voltaire.com> <4338F1EB.3070909@mellanox.co.il> <4df28be405092710132622298a@mail.gmail.com> <4df28be4050927111336b0861e@mail.gmail.com> Message-ID: <1127845266.4403.8.camel@hal.voltaire.com> Hi Viswa, On Tue, 2005-09-27 at 14:13, Viswanath Krishnamurthy wrote: > I tracked down the issue to a bug in osm_lid_mgr.c > > function: __osm_lid_mgr_init_sweep(...) > > The bad hardware was retutning an assigned LID of 0xFFFF. In this > function there is a loop > as follows where opensm is getting stuck.. (with line number) > > 392 p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl; > 393 > 394 for( p_port = (osm_port_t*)cl_qmap_head( p_port_guid_tbl ); > 395 p_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl ); > 396 p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ) > ) > 397 { > 398 osm_port_get_lid_range_ho(p_port, &disc_min_lid, > &disc_max_lid); > 399 for (lid = disc_min_lid; lid <= disc_max_lid; > lid++) <===== Bug here > 400 cl_ptr_vector_set(p_discovered_vec, lid, p_port ); > 401 } > > Since the disc_max_lid and disc_min_lid are 0xFFFF, and these are > unsigned 16 bit numbers, the condition 0xFFFF is the permissive LID and not LID routed. In fact, unicast LIDs should be between 0x0001 and 0xbfff. So I think the fix involves not allowing min/max to be set that way.
> in the for loop never becomes false, and opensm is stuck in the loop. > There are couple of other places in that > function that needs fixing too. What are the other places you see ? Thanks. -- Hal From halr at voltaire.com Tue Sep 27 12:11:05 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 15:11:05 -0400 Subject: [openib-general] opensm and faulty hardware In-Reply-To: <4df28be4050927111336b0861e@mail.gmail.com> References: <1127792086.4379.290.camel@hal.voltaire.com> <4338F1EB.3070909@mellanox.co.il> <4df28be405092710132622298a@mail.gmail.com> <4df28be4050927111336b0861e@mail.gmail.com> Message-ID: <1127848264.4829.44.camel@hal.voltaire.com> On Tue, 2005-09-27 at 14:13, Viswanath Krishnamurthy wrote: > I tracked down the issue to a bug in osm_lid_mgr.c > > function: __osm_lid_mgr_init_sweep(...) > > The bad hardware was retutning an assigned LID of 0xFFFF. In this > function there is a loop > as follows where opensm is getting stuck.. (with line number) > > 392 p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl; > 393 > 394 for( p_port = (osm_port_t*)cl_qmap_head( p_port_guid_tbl ); > 395 p_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl ); > 396 p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ) > ) > 397 { > 398 osm_port_get_lid_range_ho(p_port, &disc_min_lid, > &disc_max_lid); > 399 for (lid = disc_min_lid; lid <= disc_max_lid; > lid++) <===== Bug here > 400 cl_ptr_vector_set(p_discovered_vec, lid, p_port ); > 401 } > > Since the disc_max_lid and disc_min_lid are 0xFFFF, and these are > unsigned 16 bit numbers, the condition > in the for loop never becomes false, and opensm is stuck in the loop. > There are couple of other places in that > function that needs fixing too. 
Sep 26 15:26:03 424135 [B66CFBB0] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x2 trans_id................0x1274 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x1 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Sep 26 15:26:03 424407 [B6ED0BB0] -> __osm_nd_rcv_process_nd: Node 0x30d300002c7234 Description = Agilent E2954A 4x Generator for InfiniBand. Sep 26 15:26:03 424426 [B6ED0BB0] -> __osm_nd_rcv_process_nd: ] Sep 26 15:26:03 679882 [B56CDBB0] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x81 (SubnGetResp) D bit...................0x1 status..................0x0 hop_ptr.................0x0 hop_count...............0x2 trans_id................0x1274 attr_id.................0x15 (PortInfo) resv....................0x0 attr_mod................0x1 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][12] Return path: [0][E][0] Sep 26 15:26:03 680291 [B76D1BB0] -> osm_pi_rcv_process: [ Sep 26 15:26:03 680323 [B56CDBB0] -> __osm_sm_mad_ctrl_rcv_callback: ] Sep 26 15:26:03 680343 [B76D1BB0] -> PortInfo dump: port number.............0x1 node_guid...............0x0030d300002c7234 port_guid...............0x0030d300002c7234 m_key...................0x0000000000000000 subnet_prefix...........0xfe80000000000000 base_lid................0xFFFF Yes, it appears the Agilent exerciser returned good status to a SM Get PortInfo with a base_lid of 0xffff. The base_lid should be validated by OpenSM. 
-- Hal From tternes at gmail.com Tue Sep 27 12:40:47 2005 From: tternes at gmail.com (Thaddeus Ternes) Date: Tue, 27 Sep 2005 14:40:47 -0500 Subject: [openib-general] EEH: MMIO Failure on Power5 In-Reply-To: References: Message-ID: Well, I moved the card to slot 5 and things seem to be working... I have another Power5 and Mellanox card available, so I decided to retest with them to see what the solution was. I dropped the card into slot 5 on the second Power5, and it came right up, even without the firmware upgrade (though the module did inform me that I had old firmware). Apparently it was just an issue with the card not being in the right slot. Is there some place where this is documented and I've just missed it? Some folks that I work with (who are more familiar with the Power series than I am) didn't seem to know much about it either. Thankfully, you knew about it, Pradeep. Still... it might be nice if this were a little more obvious. Thanks for helping track this down. Thaddeus On 9/26/05, Pradeep Satyanarayana wrote: > > > Tried to find out the "default superslots" for an OpenPower 720. Please try either slot 2 or 5. I delayed my response to make certain that these were indeed the superslots. I am still not 100% certain, but there is no point waiting beyond a certain stage. > > If you can, please go ahead and try these and let us see what happens. Also, can you provide the output of "lspci -v" before you load ib_mthca? > > The firmware I was referring to was the OpenPower firmware, not that of the HCA. > > Pradeep > pradeep at us.ibm.com > > Thaddeus Ternes > > > > > > > > Thaddeus Ternes > > 09/23/2005 11:23 AM > > Please respond to > Thaddeus Ternes > > > To > Pradeep Satyanarayana/Beaverton/IBM at IBMUS > > > cc > openib-general at openib.org, Roland Dreier > > > Subject > Re: [openib-general] EEH: MMIO Failure on Power5 > > > I've tried a few things, but still seem to get the same error. My > testing has been on 2.6.13.1, with SVN IB code (as of Monday).
The > ib_mthca module reports my HCA FW version to be 3.2.0 (which is > admittedly old). Updating this old firmware will likely be my next > step. > > Originally, I had installed the card in slot 1. I've since poked > around in a PDF file I found on IBM's site and concluded that I should > have installed the card in slot 3, though I'm still not overly > confident about that. I/O Adapter Large Capacity is also now enabled > (it wasn't previously, and changing it while the card was in slot 1 > didn't seem to affect anything). > > Is somebody aware of a clear way to identify which of the slots in the > 720 are "superslots," as I've had no luck so far in my hunt in the > documentation. Most likely, I've mistakenly skipped over it. > > Thanks. > > Thaddeus > > On 9/22/05, Pradeep Satyanarayana wrote: > > > > > > I have filed a bug against the kernel (for p-series) as a starting point. > > Could you please flll me on some of the other specifics a) which kernel were > > you using b) firmware level (presumably it is uptodate). > > > > One other issue that I failed to mention previously - is the HCA in one of > > the superslots (I know on my p570 slots 2 and 6 are superslots by default) > > and, is this superslot enabled? > > > > Here is a quote of how to enable superslots- > > > > One issue with the Mellanox cards in pSeries systems is to ensure that the > > card is installed in a superslot, and that the "I/O Adapter Enlarged > > Capacity" setting has been enabled for the system. For a p570, slots C6 and > > C2 are the available super slots. To enable the "Enlarged Capacity" feature, > > go to ASM and select the following screens: > > > > System Configuration->I/O Adapter Enlarged Capacity > > Set the setting to Enabled and save it. > > > > If this does not help, I have already filed the bug. Please let me know > > either way. 
> > > > Pradeep > > pradeep at us.ibm.com > > > > > From viswa.krish at gmail.com Tue Sep 27 13:00:38 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Tue, 27 Sep 2005 13:00:38 -0700 Subject: [openib-general] opensm and faulty hardware In-Reply-To: <1127848264.4829.44.camel@hal.voltaire.com> References: <1127792086.4379.290.camel@hal.voltaire.com> <4338F1EB.3070909@mellanox.co.il> <4df28be405092710132622298a@mail.gmail.com> <4df28be4050927111336b0861e@mail.gmail.com> <1127848264.4829.44.camel@hal.voltaire.com> Message-ID: <4df28be40509271300b39548b@mail.gmail.com> Hal, I added a hack now to get around the problem. There needs to be a proper fix later.. [root at ibstg1 opensm]# svn diff osm_port.h Index: osm_port.h =================================================================== --- osm_port.h (revision 3549) +++ osm_port.h (working copy) @@ -1049,6 +1049,8 @@ { CL_ASSERT( p_physp ); CL_ASSERT( osm_physp_is_valid( p_physp ) ); + if (p_physp->port_info.base_lid == 0xFFFF) + return (0); return( p_physp->port_info.base_lid ); } /* On 27 Sep 2005 15:11:05 -0400, Hal Rosenstock wrote: > > On Tue, 2005-09-27 at 14:13, Viswanath Krishnamurthy wrote: > > I tracked down the issue to a bug in osm_lid_mgr.c > > > > function: __osm_lid_mgr_init_sweep(...) > > > > The bad hardware was retutning an assigned LID of 0xFFFF. In this > > function there is a loop > > as follows where opensm is getting stuck.. 
(with line number) > > > > 392 p_port_guid_tbl = &p_mgr->p_subn->port_guid_tbl; > > 393 > > 394 for( p_port = (osm_port_t*)cl_qmap_head( p_port_guid_tbl ); > > 395 p_port != (osm_port_t*)cl_qmap_end( p_port_guid_tbl ); > > 396 p_port = (osm_port_t*)cl_qmap_next( &p_port->map_item ) > > ) > > 397 { > > 398 osm_port_get_lid_range_ho(p_port, &disc_min_lid, > > &disc_max_lid); > > 399 for (lid = disc_min_lid; lid <= disc_max_lid; > > lid++) <===== Bug here > > 400 cl_ptr_vector_set(p_discovered_vec, lid, p_port ); > > 401 } > > > > Since the disc_max_lid and disc_min_lid are 0xFFFF, and these are > > unsigned 16 bit numbers, the condition > > in the for loop never becomes false, and opensm is stuck in the loop. > > There are couple of other places in that > > function that needs fixing too. > > Sep 26 15:26:03 424135 [B66CFBB0] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x1 (SubnGet) > D bit...................0x0 > status..................0x0 > hop_ptr.................0x0 > hop_count...............0x2 > trans_id................0x1274 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x1 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > > Sep 26 15:26:03 424407 [B6ED0BB0] -> __osm_nd_rcv_process_nd: Node > 0x30d300002c7234 > Description = Agilent E2954A 4x Generator for InfiniBand. 
> Sep 26 15:26:03 424426 [B6ED0BB0] -> __osm_nd_rcv_process_nd: ] > > Sep 26 15:26:03 679882 [B56CDBB0] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x81 (SubnGetResp) > D bit...................0x1 > status..................0x0 > hop_ptr.................0x0 > hop_count...............0x2 > trans_id................0x1274 > attr_id.................0x15 (PortInfo) > resv....................0x0 > attr_mod................0x1 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][12] > Return path: [0][E][0] > > > Sep 26 15:26:03 680291 [B76D1BB0] -> osm_pi_rcv_process: [ > Sep 26 15:26:03 680323 [B56CDBB0] -> __osm_sm_mad_ctrl_rcv_callback: ] > Sep 26 15:26:03 680343 [B76D1BB0] -> PortInfo dump: > port number.............0x1 > node_guid...............0x0030d300002c7234 > port_guid...............0x0030d300002c7234 > m_key...................0x0000000000000000 > subnet_prefix...........0xfe80000000000000 > base_lid................0xFFFF > > Yes, it appears the Agilent exerciser returned good status to a SM Get > PortInfo with a base_lid of 0xffff. The base_lid should be validated by > OpenSM. > > -- Hal From halr at voltaire.com Tue Sep 27 12:57:05 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 15:57:05 -0400 Subject: [openib-general] osm_port_info_rcv.c Error Numbering Message-ID: <1127851024.4829.208.camel@hal.voltaire.com> Hi Yael, It looks to me like the two 4A0x error messages below in osm_port_info_rcv.c should be changed to something unused in the 0x0Fxx range. osm_log( p_log, OSM_LOG_ERROR, "osm_physp_has_pkey: ERR 4A02: " osm_log( p_log, OSM_LOG_ERROR, "osm_physp_has_pkey: ERR 4A03: " If that is acceptable, I will take care of it.
-- Hal From Administrator at openib.org Tue Sep 27 13:05:57 2005 From: Administrator at openib.org (Administrator at openib.org) Date: Tue, 27 Sep 2005 13:05:57 -0700 Subject: [openib-general] [MailServer Notification]To Recipient file blocking settings matched and action taken. Message-ID: <01f901c5c39e$dfc86af0$faf9a8c0@qlogic.org> ScanMail for Microsoft Exchange has blocked an attachment. Sender = openib-general-bounces at openib.org Recipient(s) = openib-general at openib.org Subject = [openib-general] MEMBERS SUPPORT Scanning time = 9/27/2005 1:05:57 PM Action on file blocking: The attachment email-details.zip matches the file blocking settings. ScanMail has Quarantined it. The attachment was quarantined to C:\Program Files\Trend\Smex\Alert\email-details4339a62518.zip_. Warning to Recipient: Action taken by attachment blocking. From rolandd at cisco.com Tue Sep 27 13:10:14 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 27 Sep 2005 13:10:14 -0700 Subject: [openib-general] Re: [PATCH] add cq error events In-Reply-To: <20050927173433.GA30506@mellanox.co.il> (Michael S. Tsirkin's message of "Tue, 27 Sep 2005 20:34:33 +0300") References: <4339761C.7070001@ichips.intel.com> <20050927173433.GA30506@mellanox.co.il> Message-ID: <52zmpyp909.fsf@cisco.com> Michael> Fine with me. Roland? Yes, seems like a good solution. - R. 
From halr at voltaire.com Tue Sep 27 13:13:01 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 16:13:01 -0400 Subject: [openib-general] opensm and faulty hardware In-Reply-To: <4df28be40509271300b39548b@mail.gmail.com> References: <1127792086.4379.290.camel@hal.voltaire.com> <4338F1EB.3070909@mellanox.co.il> <4df28be405092710132622298a@mail.gmail.com> <4df28be4050927111336b0861e@mail.gmail.com> <1127848264.4829.44.camel@hal.voltaire.com> <4df28be40509271300b39548b@mail.gmail.com> Message-ID: <1127851980.4829.270.camel@hal.voltaire.com> On Tue, 2005-09-27 at 16:00, Viswanath Krishnamurthy wrote: > Hal, > > I added a hack now to get around the problem. There needs to be a > proper fix later.. Can you try this instead ? Thanks. -- Hal Index: include/opensm/osm_port.h =================================================================== --- include/opensm/osm_port.h (revision 3567) +++ include/opensm/osm_port.h (working copy) @@ -346,7 +346,7 @@ osm_physp_is_healthy( * Returns TRUE if the Physical Port has been maked as healthy * FALSE otherwise. * All physical ports are initialized as "healthy" but may be marked -* otherwise if a received trap claims otherwise. +* otherwise if a received trap claims otherwise. * * NOTES * @@ -456,6 +456,42 @@ osm_physp_set_port_info( * Port, Physical Port *********/ +/****f* OpenSM: Physical Port/osm_physp_validate_base_lid +* NAME +* osm_physp_validate_base_lid +* +* DESCRIPTION +* Validates the base LID in the Physical Port object. +* +* SYNOPSIS +*/ +static inline boolean_t +osm_physp_validate_base_lid( + IN osm_physp_t* const p_physp ) +{ + CL_ASSERT( osm_physp_is_valid( p_physp ) ); + if ( cl_ntoh16( p_physp->port_info.base_lid ) > IB_LID_UCAST_END_HO ) + { + p_physp->port_info.base_lid = 0; + return FALSE; + } + return TRUE; +} +/* +* PARAMETERS +* p_physp +* [in] Pointer to an osm_physp_t object. +* +* RETURN VALUES +* Returns TRUE if the base LID in the Physical port object is valid. +* FALSE otherwise. 
+* +* NOTES +* +* SEE ALSO +* Port, Physical Port +*********/ + /****f* OpenSM: Physical Port/osm_physp_set_pkey_tbl * NAME * osm_physp_set_pkey_tbl Index: opensm/osm_port_info_rcv.c =================================================================== --- opensm/osm_port_info_rcv.c (revision 3579) +++ opensm/osm_port_info_rcv.c (working copy) @@ -346,8 +346,12 @@ __osm_pi_rcv_process_switch_port( if (port_num == 0) { - /* This is a management port 0 */ - __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi); + /* This is switch management port 0 */ + if ( !osm_physp_validate_base_lid( p_physp ) ) + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_pi_rcv_process_switch_port: ERR 0F04: " + "Invalid base LID corrected.\n" ); + __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi); } OSM_LOG_EXIT( p_rcv->p_log ); @@ -367,6 +371,10 @@ __osm_pi_rcv_process_ca_port( UNUSED_PARAM( p_node ); osm_physp_set_port_info( p_physp, p_pi ); + if ( !osm_physp_validate_base_lid( p_physp ) ) + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_pi_rcv_process_ca_port: ERR 0F08: " + "Invalid base LID corrected.\n" ); __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi); @@ -390,6 +398,10 @@ __osm_pi_rcv_process_router_port( Update the PortInfo attribute. */ osm_physp_set_port_info( p_physp, p_pi ); + if ( !osm_physp_validate_base_lid( p_physp ) ) + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_pi_rcv_process_router_port: ERR 0F09: " + "Invalid base LID corrected.\n" ); OSM_LOG_EXIT( p_rcv->p_log ); } From sean.hefty at intel.com Tue Sep 27 12:55:56 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Tue, 27 Sep 2005 12:55:56 -0700 Subject: [openib-general] [PATCH] [CMA] [RFC] add routine to transition a QP to INIT state Message-ID: This patch will transition a QP to the INIT state and bind the QP to the cma_id. It is called after a route has been resolved and should assist with transport independent code. 
Signed-off-by: Sean Hefty Index: ulp/cma/cma.c =================================================================== --- ulp/cma/cma.c (revision 3568) +++ ulp/cma/cma.c (working copy) @@ -116,6 +116,55 @@ struct cma_id_private* cma_alloc_id(stru return cma_id_priv; } +static int cma_modify_ib_qp_init(struct cma_id_private *cma_id_priv, + struct ib_qp *qp, int qp_access_flags) +{ + struct ib_qp_attr qp_attr; + struct ib_sa_path_rec *path_rec; + int ret; + + qp_attr.qp_state = IB_QPS_INIT; + qp_attr.qp_access_flags = qp_access_flags; + + path_rec = cma_id_priv->cma_id.route.path_rec; + ret = ib_find_cached_pkey(cma_id_priv->cma_id.device, qp_attr.port_num, + be16_to_cpu(path_rec->pkey), + &qp_attr.pkey_index); + if (ret) + return ret; + + ret = ib_find_cached_gid(cma_id_priv->cma_id.device, &path_rec->sgid, + &qp_attr.port_num, NULL); + if (ret) + return ret; + + return ib_modify_qp(qp, &qp_attr, IB_QP_STATE | IB_QP_ACCESS_FLAGS | + IB_QP_PKEY_INDEX | IB_QP_PORT); +} + +int rdma_cma_init_qp(struct rdma_cma_id *cma_id, struct ib_qp *qp, + int qp_access_flags) +{ + struct cma_id_private *cma_id_priv; + int ret; + + cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); + + switch (cma_id->device->node_type) { + case IB_NODE_CA: + ret = cma_modify_ib_qp_init(cma_id_priv, qp, qp_access_flags); + break; + default: + ret = -ENOSYS; + break; + } + + if (!ret) + cma_id->qp = qp; + return ret; +} +EXPORT_SYMBOL(rdma_cma_init_qp); + static int cma_modify_ib_qp_rtr(struct cma_id_private *cma_id_priv) { struct ib_qp_attr qp_attr; @@ -552,7 +601,7 @@ static int cma_connect_ib(struct cma_id_ req.alternate_path = &route->path_rec[1]; req.service_id = cma_get_service_id(&route->dst_addr); - req.qp_num = conn_param->qp->qp_num; + req.qp_num = cma_id_priv->cma_id.qp->qp_num; req.qp_type = IB_QPT_RC; req.starting_psn = req.qp_num; req.responder_resources = conn_param->responder_resources; @@ -563,7 +612,7 @@ static int cma_connect_ib(struct cma_id_ 
req.remote_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT; req.local_cm_response_timeout = CMA_CM_RESPONSE_TIMEOUT; req.max_cm_retries = CMA_MAX_CM_RETRIES; - req.srq = conn_param->qp->srq ? 1 : 0; + req.srq = cma_id_priv->cma_id.qp->srq ? 1 : 0; return ib_send_cm_req(cma_id_priv->cm_id, &req); } @@ -576,8 +625,6 @@ int rdma_cma_connect(struct rdma_cma_id cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); - cma_id->qp = conn_param->qp; - switch (cma_id->device->node_type) { case IB_NODE_CA: ret = cma_connect_ib(cma_id_priv, conn_param); @@ -602,7 +649,7 @@ static int cma_accept_ib(struct cma_id_p return ret; memset(&rep, 0, sizeof rep); - rep.qp_num = conn_param->qp->qp_num; + rep.qp_num = cma_id_priv->cma_id.qp->qp_num; rep.starting_psn = rep.qp_num; rep.private_data = conn_param->private_data; rep.private_data_len = conn_param->private_data_len; @@ -612,7 +659,7 @@ static int cma_accept_ib(struct cma_id_p rep.failover_accepted = 0; rep.flow_control = conn_param->flow_control; rep.rnr_retry_count = conn_param->rnr_retry_count; - rep.srq = conn_param->qp->srq ? 1 : 0; + rep.srq = cma_id_priv->cma_id.qp->srq ? 1 : 0; return ib_send_cm_rep(cma_id_priv->cm_id, &rep); } @@ -625,8 +672,6 @@ int rdma_cma_accept(struct rdma_cma_id * cma_id_priv = container_of(cma_id, struct cma_id_private, cma_id); - cma_id->qp = conn_param->qp; - switch (cma_id->device->node_type) { case IB_NODE_CA: ret = cma_accept_ib(cma_id_priv, conn_param); Index: include/rdma/rdma_cma.h =================================================================== --- include/rdma/rdma_cma.h (revision 3568) +++ include/rdma/rdma_cma.h (working copy) @@ -93,8 +93,14 @@ int rdma_cma_resolve_route(struct rdma_c struct sockaddr *src_addr, struct sockaddr *dst_addr, int timeout_ms); +/** + * rdma_cma_init_qp - Associates a QP with a CMA identifier and initializes the + * QP for use in establishing a connection. 
+ */ +int rdma_cma_init_qp(struct rdma_cma_id *cma_id, struct ib_qp *qp, + int qp_access_flags); + struct rdma_cma_conn_param { - struct ib_qp *qp; const void *private_data; u8 private_data_len; u8 responder_resources; From arlin.r.davis at intel.com Tue Sep 27 12:51:06 2005 From: arlin.r.davis at intel.com (Arlin Davis) Date: Tue, 27 Sep 2005 12:51:06 -0700 Subject: [openib-general] [PATCH] uDAPL build fix for OS vendor variations of IA64_FETCHADD Message-ID: James, Please review the following uDAPL patch which fixes some ia64 build problems (atomics) with the latest Redhat EL4.0 update and adds support for SuSe. Feel free to come up with a better solution. Thanks, -arlin Signed-off by: Arlin Davis Index: dapl/udapl/linux/dapl_osd.h =================================================================== --- dapl/udapl/linux/dapl_osd.h (revision 3541) +++ dapl/udapl/linux/dapl_osd.h (working copy) @@ -83,7 +83,6 @@ #include #endif - /* Useful debug definitions */ #ifndef STATIC #define STATIC static @@ -156,13 +155,17 @@ #ifdef __ia64__ DAT_COUNT old_value; -#if OS_VERSION >= LINUX_VERSION(2,6) - IA64_FETCHADD (old_value,v,1,4,rel); +#ifndef REDHAT_EL4 +# if OS_RELEASE >= LINUX_VERSION(2,6) + IA64_FETCHADD(old_value,v,1,4,rel); +# else + IA64_FETCHADD(old_value,v,1,4); +# endif #else - IA64_FETCHADD (old_value,v,1,4); + IA64_FETCHADD(old_value,v,1,4); #endif -#else /* !__ia64__ */ +#else __asm__ __volatile__ ( "lock;" "incl %0" :"=m" (*v) @@ -184,13 +187,17 @@ #ifdef __ia64__ DAT_COUNT old_value; -#if OS_VERSION >= LINUX_VERSION(2,6) - IA64_FETCHADD (old_value,v,-1,4,rel); +#ifndef REDHAT_EL4 +# if OS_RELEASE >= LINUX_VERSION(2,6) + IA64_FETCHADD(old_value,v,-1,4,rel); +# else + IA64_FETCHADD(old_value,v,-1,4); +# endif #else - IA64_FETCHADD (old_value,v,-1,4); + IA64_FETCHADD(old_value,v,-1,4); #endif -#else /* !__ia64__ */ +#else __asm__ __volatile__ ( "lock;" "decl %0" :"=m" (*v) @@ -227,9 +234,11 @@ */ #ifdef __ia64__ - -current_value = 
ia64_cmpxchg("acq",v,match_value,new_value,4); - +#ifdef REDHAT_EL4 + current_value = ia64_cmpxchg("acq",v,match_value,new_value,4); +#else + current_value = ia64_cmpxchg(acq,v,match_value,new_value,4); +#endif #else __asm__ __volatile__ ( "lock; cmpxchgl %1, %2" Index: dapl/udapl/dapl_cno_free.c =================================================================== --- dapl/udapl/dapl_cno_free.c (revision 3541) +++ dapl/udapl/dapl_cno_free.c (working copy) @@ -74,7 +74,7 @@ goto bail; } - if (cno_ptr->cno_ref_count != 0 + if (dapl_os_atomic_read(&cno_ptr->cno_ref_count) != 0 || cno_ptr->cno_waiters != 0) { dat_status = DAT_ERROR (DAT_INVALID_STATE,DAT_INVALID_STATE_CNO_IN_USE); Index: dapl/udapl/Makefile =================================================================== --- dapl/udapl/Makefile (revision 3565) +++ dapl/udapl/Makefile (working copy) @@ -57,6 +57,13 @@ endif # +# Set up the default OS Vendor +# +ifndef OS_VENDOR +OS_VENDOR = REDHAT_EL4 +endif + +# # CFLAGS definition # # The makefile will build for multiple providers, but each provider @@ -67,7 +74,7 @@ # CFLAGS Compile time flags for build # -CFLAGS = -O2 $(CPPFLAGS) -DOS_VERSION=$(OSRELEASE) -DDAPL_DBG +CFLAGS = -O2 $(CPPFLAGS) -D$(OS_VENDOR) -DOS_VERSION=$(OSRELEASE) -DDAPL_DBG # # dummy provider @@ -152,15 +159,12 @@ CFLAGS += --no-strict-aliasing CFLAGS += -Werror CFLAGS += -g3 +CFLAGS += -fPIC ifdef GPROF CFLAGS += -pg endif -ifeq (${MACH},x86_64) -CFLAGS += -fPIC -endif - LD = ld # @@ -170,6 +174,10 @@ LDFLAGS += -lpthread LDFLAGS += -init dapl_init LDFLAGS += -fini dapl_fini +ifeq ($(OS_VENDOR),SuSE) +LDFLAGS += -lgcc_s +endif + AR = ar # Index: dapl/common/dapl_evd_free.c =================================================================== --- dapl/common/dapl_evd_free.c (revision 3541) +++ dapl/common/dapl_evd_free.c (working copy) @@ -113,7 +113,7 @@ #else if (cno_ptr != NULL) { - if (cno_ptr->cno_ref_count == 0 && cno_ptr->cno_waiters > 0) + if (dapl_os_atomic_read(&cno_ptr->cno_ref_count) 
== 0 && cno_ptr->cno_waiters > 0)
     {
         /*
          * Last reference on the CNO, trigger a notice. See
-------------- next part --------------
A non-text attachment was scrubbed...
Name: osvendor.patch
Type: application/octet-stream
Size: 3581 bytes
Desc: not available
URL:

From rolandd at cisco.com  Tue Sep 27 13:56:52 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Tue, 27 Sep 2005 13:56:52 -0700
Subject: [openib-general] Re: [PATCH] fix hw_ver value in ib/v_query_device
In-Reply-To: <20050927130910.GA20836@mellanox.co.il> (Jack Morgenstein's message of "Tue, 27 Sep 2005 16:09:10 +0300")
References: <20050927130910.GA20836@mellanox.co.il>
Message-ID: <52vf0mp6uj.fsf@cisco.com>

Thanks, committed to svn and queued for 2.6.14.

 - R.

From rolandd at cisco.com  Tue Sep 27 14:08:10 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Tue, 27 Sep 2005 14:08:10 -0700
Subject: [openib-general] [PATCH] trivial ucm sparse cleanup
Message-ID: <52mzlyp6bp.fsf@cisco.com>

Just an FYI, I already checked this in: it just makes two variables that
are only used in a single file static, to appease sparse.

--- infiniband/core/ucm.c	(revision 3579)
+++ infiniband/core/ucm.c	(working copy)
@@ -113,8 +113,8 @@ static struct ib_client ucm_client = {
 	.remove = ib_ucm_remove_one
 };
 
-DECLARE_MUTEX(ctx_id_mutex);
-DEFINE_IDR(ctx_id_table);
+static DECLARE_MUTEX(ctx_id_mutex);
+static DEFINE_IDR(ctx_id_table);
 static DECLARE_BITMAP(dev_map, IB_UCM_MAX_DEVICES);
 
 static struct ib_ucm_context *ib_ucm_ctx_get(struct ib_ucm_file *file, int id)

From rolandd at cisco.com  Tue Sep 27 15:08:58 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Tue, 27 Sep 2005 15:08:58 -0700
Subject: [openib-general] [PATCH]
Message-ID: <52irwmp3id.fsf@cisco.com>

Robert, I just committed the first step towards merging your branch:
pure simplification/fixing, based on your uverbs changes.

 - R.
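The uverbs_cmd.c patch below reorders the create paths. ib_uverbs_idr_mutex is now taken before idr_pre_get()/idr_get_new(), the object is added to the per-file list only after copy_to_user() has succeeded, and failures unwind through err_idr and err_up in reverse order. The following is a toy user-space model of that ordering, not kernel code. The table, the lock flag and create_obj() are hypothetical stand-ins for the idr, the mutex and ib_uverbs_alloc_pd()/ib_uverbs_create_cq(). The asserts in lock()/unlock() check that every path releases the lock exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_IDS 8

/* Toy, hypothetical stand-ins for the idr table and ib_uverbs_idr_mutex. */
struct table {
    void *slot[MAX_IDS];
    bool  locked;                 /* models holding ib_uverbs_idr_mutex */
};

/* Toy uobject: 'on_list' models list_add_tail() onto the ucontext list. */
struct obj {
    int  id;
    bool on_list;
};

static void lock(struct table *t)   { assert(!t->locked); t->locked = true; }
static void unlock(struct table *t) { assert(t->locked);  t->locked = false; }

/* Models idr_get_new(): hand out the lowest free id, or fail if full. */
static int table_get_new(struct table *t, void *p, int *id)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (t->slot[i] == NULL) {
            t->slot[i] = p;
            *id = i;
            return 0;
        }
    }
    return -1;
}

/*
 * Mirrors the patched ordering: lock before allocating the id, publish the
 * object on the list only after the response was copied out, and unwind in
 * reverse order on failure. 'copy_ok' simulates copy_to_user() succeeding.
 */
static int create_obj(struct table *t, bool copy_ok, struct obj **out)
{
    struct obj *o = calloc(1, sizeof *o);
    int ret;

    if (o == NULL)
        return -1;

    lock(t);

    ret = table_get_new(t, o, &o->id);
    if (ret)
        goto err_up;

    if (!copy_ok) {
        ret = -2;                 /* stands in for -EFAULT */
        goto err_idr;
    }

    o->on_list = true;            /* reached only once nothing can fail */
    unlock(t);
    *out = o;
    return 0;

err_idr:
    t->slot[o->id] = NULL;        /* models idr_remove() */
err_up:
    unlock(t);
    free(o);                      /* models ib_dealloc_pd() / ib_destroy_cq() */
    return ret;
}
```

Because the object is published on the list last, no error path ever has to take it back off the list. This is why the old err_list labels disappear in the patch.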
--- infiniband/core/uverbs_cmd.c (revision 3579) +++ infiniband/core/uverbs_cmd.c (working copy) @@ -1,6 +1,7 @@ /* * Copyright (c) 2005 Topspin Communications. All rights reserved. * Copyright (c) 2005 Cisco Systems. All rights reserved. + * Copyright (c) 2005 PathScale, Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. You may choose to be licensed under the terms of the GNU @@ -288,24 +289,20 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uve pd->uobject = uobj; atomic_set(&pd->usecnt, 0); + down(&ib_uverbs_idr_mutex); + retry: if (!idr_pre_get(&ib_uverbs_pd_idr, GFP_KERNEL)) { ret = -ENOMEM; - goto err_pd; + goto err_up; } - down(&ib_uverbs_idr_mutex); ret = idr_get_new(&ib_uverbs_pd_idr, pd, &uobj->id); - up(&ib_uverbs_idr_mutex); if (ret == -EAGAIN) goto retry; if (ret) - goto err_pd; - - down(&file->mutex); - list_add_tail(&uobj->list, &file->ucontext->pd_list); - up(&file->mutex); + goto err_up; memset(&resp, 0, sizeof resp); resp.pd_handle = uobj->id; @@ -313,21 +310,22 @@ retry: if (copy_to_user((void __user *) (unsigned long) cmd.response, &resp, sizeof resp)) { ret = -EFAULT; - goto err_list; + goto err_idr; } - return in_len; - -err_list: - down(&file->mutex); - list_del(&uobj->list); + down(&file->mutex); + list_add_tail(&uobj->list, &file->ucontext->pd_list); up(&file->mutex); - down(&ib_uverbs_idr_mutex); - idr_remove(&ib_uverbs_pd_idr, uobj->id); up(&ib_uverbs_idr_mutex); -err_pd: + return in_len; + +err_idr: + idr_remove(&ib_uverbs_pd_idr, uobj->id); + +err_up: + up(&ib_uverbs_idr_mutex); ib_dealloc_pd(pd); err: @@ -463,24 +461,22 @@ retry: resp.mr_handle = obj->uobject.id; - down(&file->mutex); - list_add_tail(&obj->uobject.list, &file->ucontext->mr_list); - up(&file->mutex); - if (copy_to_user((void __user *) (unsigned long) cmd.response, &resp, sizeof resp)) { ret = -EFAULT; - goto err_list; + goto err_idr; } + down(&file->mutex); + list_add_tail(&obj->uobject.list, &file->ucontext->mr_list); + 
up(&file->mutex); + up(&ib_uverbs_idr_mutex); return in_len; -err_list: - down(&file->mutex); - list_del(&obj->uobject.list); - up(&file->mutex); +err_idr: + idr_remove(&ib_uverbs_mr_idr, obj->uobject.id); err_unreg: ib_dereg_mr(mr); @@ -616,24 +612,20 @@ ssize_t ib_uverbs_create_cq(struct ib_uv cq->cq_context = ev_file; atomic_set(&cq->usecnt, 0); + down(&ib_uverbs_idr_mutex); + retry: if (!idr_pre_get(&ib_uverbs_cq_idr, GFP_KERNEL)) { ret = -ENOMEM; - goto err_cq; + goto err_up; } - down(&ib_uverbs_idr_mutex); ret = idr_get_new(&ib_uverbs_cq_idr, cq, &uobj->uobject.id); - up(&ib_uverbs_idr_mutex); if (ret == -EAGAIN) goto retry; if (ret) - goto err_cq; - - down(&file->mutex); - list_add_tail(&uobj->uobject.list, &file->ucontext->cq_list); - up(&file->mutex); + goto err_up; memset(&resp, 0, sizeof resp); resp.cq_handle = uobj->uobject.id; @@ -642,21 +634,22 @@ retry: if (copy_to_user((void __user *) (unsigned long) cmd.response, &resp, sizeof resp)) { ret = -EFAULT; - goto err_list; + goto err_idr; } - return in_len; - -err_list: - down(&file->mutex); - list_del(&uobj->uobject.list); + down(&file->mutex); + list_add_tail(&uobj->uobject.list, &file->ucontext->cq_list); up(&file->mutex); - down(&ib_uverbs_idr_mutex); - idr_remove(&ib_uverbs_cq_idr, uobj->uobject.id); up(&ib_uverbs_idr_mutex); -err_cq: + return in_len; + +err_idr: + idr_remove(&ib_uverbs_cq_idr, uobj->uobject.id); + +err_up: + up(&ib_uverbs_idr_mutex); ib_destroy_cq(cq); err: @@ -837,24 +830,22 @@ retry: resp.qp_handle = uobj->uobject.id; - down(&file->mutex); - list_add_tail(&uobj->uobject.list, &file->ucontext->qp_list); - up(&file->mutex); - if (copy_to_user((void __user *) (unsigned long) cmd.response, &resp, sizeof resp)) { ret = -EFAULT; - goto err_list; + goto err_idr; } + down(&file->mutex); + list_add_tail(&uobj->uobject.list, &file->ucontext->qp_list); + up(&file->mutex); + up(&ib_uverbs_idr_mutex); return in_len; -err_list: - down(&file->mutex); - list_del(&uobj->uobject.list); - 
up(&file->mutex); +err_idr: + idr_remove(&ib_uverbs_qp_idr, uobj->uobject.id); err_destroy: ib_destroy_qp(qp); @@ -1126,24 +1117,22 @@ retry: resp.srq_handle = uobj->uobject.id; - down(&file->mutex); - list_add_tail(&uobj->uobject.list, &file->ucontext->srq_list); - up(&file->mutex); - if (copy_to_user((void __user *) (unsigned long) cmd.response, &resp, sizeof resp)) { ret = -EFAULT; - goto err_list; + goto err_idr; } + down(&file->mutex); + list_add_tail(&uobj->uobject.list, &file->ucontext->srq_list); + up(&file->mutex); + up(&ib_uverbs_idr_mutex); return in_len; -err_list: - down(&file->mutex); - list_del(&uobj->uobject.list); - up(&file->mutex); +err_idr: + idr_remove(&ib_uverbs_srq_idr, uobj->uobject.id); err_destroy: ib_destroy_srq(srq); From Federico.Sacerdoti at deshaw.com Tue Sep 27 15:53:50 2005 From: Federico.Sacerdoti at deshaw.com (Sacerdoti, Federico) Date: Tue, 27 Sep 2005 18:53:50 -0400 Subject: [openib-general] segfault on openib mvapich Message-ID: I had such high hopes for using openib gen2 when I got ibv_uc_pingpong to pass packets on our infiniband cluster. However, I cannot get mvapich to work, even with Pete Wyckoff's patches. A simple program run on two hosts always segfaults. I might have done something wrong, but tried to build using a plain source from the openib gen2 svn tree and Pete's patches (those that were not rejected). Adding the -debug flag to mpirun_rsh does not help (the xterms flash on then dissapear). The ssh connections are started fine, but the segfault happens early on. I was hoping Pete's patch: +++ mvapich-0.9.5-112/mpid/vapi/process/mpirun_rsh.c 2005-05-26 17:35:58.000000000 -0400 @@ -744,7 +744,8 @@ int id = getpid(); int str_len; - str_len = strlen(command_name) + strlen(env) + strlen(wd) + 512; + str_len = strlen(command_name) + strlen(env) + strlen(wd) + + strlen(mpirun_processes) + 512 would solve the segfault, but it still persists. 
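Pete's patch widens a fixed estimate (strlen() of each piece plus 512 bytes of slack) by adding strlen(mpirun_processes). One way to avoid estimating at all is to let vsnprintf() measure the formatted result first and then allocate exactly that much. The sketch below assumes a simplified command format that only imitates the -show output further down. format_alloc() and build_remote_command() are hypothetical and do not come from mpirun_rsh.c. The dmesg lines further down report the faults in the MPI programs themselves (mpi-ring, mp), so this only illustrates the string-sizing pattern that Pete's patch touches.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: format into a freshly malloc'd buffer that is
 * exactly as large as needed. Caller frees; NULL on failure.
 */
static char *format_alloc(const char *fmt, ...)
{
    va_list ap;
    int len;
    char *buf;

    assert(fmt != NULL);

    /* First pass: C99 vsnprintf with size 0 only reports the length. */
    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (len < 0)
        return NULL;

    buf = malloc((size_t)len + 1);      /* +1 for the terminating NUL */
    if (buf == NULL)
        return NULL;

    /* Second pass: format for real into a buffer of the right size. */
    va_start(ap, fmt);
    vsnprintf(buf, (size_t)len + 1, fmt, ap);
    va_end(ap);
    return buf;
}

/* Hypothetical stand-in for the per-host command mpirun_rsh assembles. */
static char *build_remote_command(const char *wd, const char *processes,
                                  int rank, int nprocs, const char *prog)
{
    return format_alloc("cd %s; /usr/bin/env MPIRUN_PROCESSES='%s' "
                        "MPIRUN_RANK=%d MPIRUN_NPROCS=%d %s",
                        wd, processes, rank, nprocs, prog);
}
```

No host-list length can overflow the buffer here, because the buffer size is derived from the formatted result instead of from a guess.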
-Federico

Output from my mpirun_rsh -show command:

command: /usr/bin/ssh drda1054 cd /u/fds/run/gen2/simple; /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=drda1054.nyc.deshaw.com MPIRUN_PORT=32884 MPIRUN_PROCESSES='drda1054:drda1055:' MPIRUN_RANK=0 MPIRUN_NPROCS=2 MPIRUN_ID=2425 DISPLAY=desrad2.nyc.deshaw.com:8.0 /u/fds/run/gen2/simple/mp
command: /usr/bin/ssh drda1055 cd /u/fds/run/gen2/simple; /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=drda1054.nyc.deshaw.com MPIRUN_PORT=32884 MPIRUN_PROCESSES='drda1054:drda1055:' MPIRUN_RANK=1 MPIRUN_NPROCS=2 MPIRUN_ID=2425 DISPLAY=desrad2.nyc.deshaw.com:8.0 /u/fds/run/gen2/simple/mp
bash: line 1: 31553 Segmentation fault      /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=drda1054.nyc.deshaw.com MPIRUN_PORT=32885 MPIRUN_PROCESSES='drda1054:drda1055:' MPIRUN_RANK=1 MPIRUN_NPROCS=2 MPIRUN_ID=2428 DISPLAY=desrad2.nyc.deshaw.com:8.0 /u/fds/run/gen2/simple/mp
bash: line 1:  2565 Segmentation fault      /usr/bin/env MPIRUN_MPD=0 MPIRUN_HOST=drda1054.nyc.deshaw.com MPIRUN_PORT=32885 MPIRUN_PROCESSES='drda1054:drda1055:' MPIRUN_RANK=0 MPIRUN_NPROCS=2 MPIRUN_ID=2428 DISPLAY=desrad2.nyc.deshaw.com:8.0 /u/fds/run/gen2/simple/mp

From dmesg (tried two programs, mpi-ring and a hello-world mp):

mpi-ring[30116]: segfault at 0000000000000000 rip 00000036a2b711c0 rsp 00007fffffdad598 error 6
mpi-ring[30386]: segfault at 0000000000000000 rip 00000036a2b711c0 rsp 00007fffffcb2868 error 6
mp[31283]: segfault at 0000000000000000 rip 00000036a2b711c0 rsp 00007fffffc4a838 error 6
mp[31553]: segfault at 0000000000000000 rip 00000036a2b711c0 rsp 00007fffff869738 error 6

From rolandd at cisco.com  Tue Sep 27 16:01:36 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Tue, 27 Sep 2005 16:01:36 -0700
Subject: [openib-general] segfault on openib mvapich
In-Reply-To: (Federico Sacerdoti's message of "Tue, 27 Sep 2005 18:53:50 -0400")
References:
Message-ID: <521x3ap12n.fsf@cisco.com>

    Federico> I might have done something wrong, but tried to build
    Federico> using a plain source from the
openib gen2 svn tree and
    Federico> Pete's patches (those that were not rejected).

For whatever it's worth, basic MVAPICH tests like osu_bw work fine for
me with two and even four processes on two x86_64 machines.

    Federico> Adding the -debug flag to mpirun_rsh does not help (the
    Federico> xterms flash on then dissapear). The ssh connections are
    Federico> started fine, but the segfault happens early on.

Without more data like a traceback from a core file or something like
that, it's going to be very difficult for anyone to debug this.

Also, it might be worth contacting the MVAPICH developers by emailing
mvapich_request -- they are much more likely to be able to help than
the openib-general community.

 - R.

From panda at cse.ohio-state.edu  Tue Sep 27 16:18:49 2005
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Tue, 27 Sep 2005 19:18:49 -0400 (EDT)
Subject: [openib-general] segfault on openib mvapich
In-Reply-To: <521x3ap12n.fsf@cisco.com> from "Roland Dreier" at Sep 27, 2005 04:01:36 PM
Message-ID: <200509272318.j8RNInxt010823@xi.cse.ohio-state.edu>

Federico,

> Federico> I might have done something wrong, but tried to build
> Federico> using a plain source from the openib gen2 svn tree and
> Federico> Pete's patches (those that were not rejected).
>
> For whatever it's worth, basic MVAPICH tests like osu_bw work fine for
> me with two and even four processes on two x86_64 machines.

FYI, we are also running the latest version successfully on multiple
platforms (IA32, Opteron and EM64T) of different sizes. We are also able
to run applications successfully. To the best of our knowledge, many
other organizations are also running mvapich-gen2 successfully on their
platforms.

> Federico> Adding the -debug flag to mpirun_rsh does not help (the
> Federico> xterms flash on then dissapear). The ssh connections are
> Federico> started fine, but the segfault happens early on.
>
> Without more data like a traceback from a core file or something like
> that, it's going to be very difficult for anyone to debug this.

As Roland indicates, could you please provide more details on the
platform, the OpenIB version (kernel, userlib), and the errors you are
getting? This will help us debug the problem further and faster.

> Also, it might be worth contacting the MVAPICH developers by emailing
> mvapich_request -- they are much more likely to be able to help than
> the openib-general community.

We at OSU are monitoring the OpenIB list for mvapich-gen2 related
questions and are answering them. In addition, if you can send a copy
to mvapich-help at cse.ohio-state.edu (not mvapich_request), we will be
able to respond even faster.

Thanks,

DK

> - R.

From mshefty at ichips.intel.com  Tue Sep 27 16:30:05 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Tue, 27 Sep 2005 16:30:05 -0700
Subject: [openib-general] 2.6.14 heads up: ip_dev_find() not exported
In-Reply-To: <43381FA6.4040609@ichips.intel.com>
References: <52slvr7w1l.fsf@cisco.com> <43381FA6.4040609@ichips.intel.com>
Message-ID: <4339D5FD.60607@ichips.intel.com>

Sean Hefty wrote:
>> I noticed while compiling against an up-to-date kernel tree that SDP
>> and IBAT both use the function ip_dev_find(). The EXPORT_SYMBOL for
>> this function was removed during the 2.6.14 devel cycle.
>
> I'm calling this routine in the rdma_cma code as well. I call it
> to get the local net_device before calling neigh_lookup(). I could call
> neigh_lookup_nodev() instead.

FYI - calling neigh_lookup_nodev() and passing it the arp table will
successfully crash the system.
- Sean From viswa.krish at gmail.com Tue Sep 27 16:44:04 2005 From: viswa.krish at gmail.com (Viswanath Krishnamurthy) Date: Tue, 27 Sep 2005 16:44:04 -0700 Subject: [openib-general] opensm and faulty hardware In-Reply-To: <1127851980.4829.270.camel@hal.voltaire.com> References: <1127792086.4379.290.camel@hal.voltaire.com> <4338F1EB.3070909@mellanox.co.il> <4df28be405092710132622298a@mail.gmail.com> <4df28be4050927111336b0861e@mail.gmail.com> <1127848264.4829.44.camel@hal.voltaire.com> <4df28be40509271300b39548b@mail.gmail.com> <1127851980.4829.270.camel@hal.voltaire.com> Message-ID: <4df28be40509271644209b6d5@mail.gmail.com> Hal, Thanks.. works like a charm... -Viswa On 27 Sep 2005 16:13:01 -0400, Hal Rosenstock wrote: > > On Tue, 2005-09-27 at 16:00, Viswanath Krishnamurthy wrote: > > Hal, > > > > I added a hack now to get around the problem. There needs to be a > > proper fix later.. > > Can you try this instead ? Thanks. > > -- Hal > > Index: include/opensm/osm_port.h > =================================================================== > --- include/opensm/osm_port.h (revision 3567) > +++ include/opensm/osm_port.h (working copy) > @@ -346,7 +346,7 @@ osm_physp_is_healthy( > * Returns TRUE if the Physical Port has been maked as healthy > * FALSE otherwise. > * All physical ports are initialized as "healthy" but may be marked > -* otherwise if a received trap claims otherwise. > +* otherwise if a received trap claims otherwise. > * > * NOTES > * > @@ -456,6 +456,42 @@ osm_physp_set_port_info( > * Port, Physical Port > *********/ > > +/****f* OpenSM: Physical Port/osm_physp_validate_base_lid > +* NAME > +* osm_physp_validate_base_lid > +* > +* DESCRIPTION > +* Validates the base LID in the Physical Port object. 
> +* > +* SYNOPSIS > +*/ > +static inline boolean_t > +osm_physp_validate_base_lid( > + IN osm_physp_t* const p_physp ) > +{ > + CL_ASSERT( osm_physp_is_valid( p_physp ) ); > + if ( cl_ntoh16( p_physp->port_info.base_lid ) > IB_LID_UCAST_END_HO ) > + { > + p_physp->port_info.base_lid = 0; > + return FALSE; > + } > + return TRUE; > +} > +/* > +* PARAMETERS > +* p_physp > +* [in] Pointer to an osm_physp_t object. > +* > +* RETURN VALUES > +* Returns TRUE if the base LID in the Physical port object is valid. > +* FALSE otherwise. > +* > +* NOTES > +* > +* SEE ALSO > +* Port, Physical Port > +*********/ > + > /****f* OpenSM: Physical Port/osm_physp_set_pkey_tbl > * NAME > * osm_physp_set_pkey_tbl > Index: opensm/osm_port_info_rcv.c > =================================================================== > --- opensm/osm_port_info_rcv.c (revision 3579) > +++ opensm/osm_port_info_rcv.c (working copy) > @@ -346,8 +346,12 @@ __osm_pi_rcv_process_switch_port( > > if (port_num == 0) > { > - /* This is a management port 0 */ > - __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi); > + /* This is switch management port 0 */ > + if ( !osm_physp_validate_base_lid( p_physp ) ) > + osm_log( p_rcv->p_log, OSM_LOG_ERROR, > + "__osm_pi_rcv_process_switch_port: ERR 0F04: " > + "Invalid base LID corrected.\n" ); > + __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi); > } > > OSM_LOG_EXIT( p_rcv->p_log ); > @@ -367,6 +371,10 @@ __osm_pi_rcv_process_ca_port( > UNUSED_PARAM( p_node ); > > osm_physp_set_port_info( p_physp, p_pi ); > + if ( !osm_physp_validate_base_lid( p_physp ) ) > + osm_log( p_rcv->p_log, OSM_LOG_ERROR, > + "__osm_pi_rcv_process_ca_port: ERR 0F08: " > + "Invalid base LID corrected.\n" ); > > __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi); > > @@ -390,6 +398,10 @@ __osm_pi_rcv_process_router_port( > Update the PortInfo attribute. 
> */ > osm_physp_set_port_info( p_physp, p_pi ); > + if ( !osm_physp_validate_base_lid( p_physp ) ) > + osm_log( p_rcv->p_log, OSM_LOG_ERROR, > + "__osm_pi_rcv_process_router_port: ERR 0F09: " > + "Invalid base LID corrected.\n" ); > > OSM_LOG_EXIT( p_rcv->p_log ); > } > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From iod00d at hp.com Tue Sep 27 18:17:00 2005 From: iod00d at hp.com (Grant Grundler) Date: Tue, 27 Sep 2005 18:17:00 -0700 Subject: [openib-general] netperf over SDP bug Message-ID: <20050928011700.GA22427@esmail.cup.hp.com> Hi Michael, I'm trying to collect a full set of netperf TCP_STREAM over SDP for SVN r3547 on 2.6.13 kernel. But some netperf runs get no throughput. Usually when sending 1k to 4k messages. The same netperf parameters sing IPoIB seem to be working fine - just alot slower of course. Summary of all netperf over SDP runs is appended. Sample commandline that got < 1Mb/s throughput is: LD_PRELOAD=/usr/local/lib/libsdp.so /usr/local/bin/netperf -p 12866 -l 60 -H 10.0.0.30 -t TCP_STREAM -T 1 -- -m 1024 -s 16384 -S 16384 I tried with some smaller -m parameters: 512 -> ~270-280 Mb/s 640 -> ~200-2100 Mb/s 768 -> ~30-50 Mb/s 896 -> ~2-6 Mb/s CPU is essentially idle in the above 512-896 byte cases. Anything obvious I should be looking for or worth trying? thanks, grant #Recv Send Send #Socket Socket Message Elapsed #Size Size Size Time Throughput #bytes bytes bytes secs. 10^6bits/sec grundler <505>grep -h ' 60\.0. 
' sdp*.out | sort -n -k 3
131072  131072       1  60.01     12.79
196608  196608       1  60.01     13.16
245760  245760       1  60.02     12.81
 32768   32768       1  60.02     13.63
 65536   65536       1  60.00     14.00
 98304   98304       1  60.02     13.46
131072  131072      16  60.03    202.12
196608  196608      16  60.01    210.89
245760  245760      16  60.01    210.49
 32768   32768      16  60.01    203.87
 65536   65536      16  60.02    205.01
 98304   98304      16  60.02    207.76
131072  131072      64  60.02    707.19
196608  196608      64  60.01    822.16
245760  245760      64  60.01     68.63
 32768   32768      64  60.01    129.12
 65536   65536      64  60.03    563.39
 98304   98304      64  60.02    833.90
131072  131072    1024  60.01      0.06
196608  196608    1024  60.02      0.21
245760  245760    1024  60.01      0.28
 32768   32768    1024  60.02      7.24
 65536   65536    1024  60.01      0.06
 98304   98304    1024  60.01      0.20
131072  131072    2048  60.02      0.03
196608  196608    2048  60.02      0.03
245760  245760    2048  60.01      0.03
 32768   32768    2048  60.02      0.03
 65536   65536    2048  60.01      0.03
 98304   98304    2048  60.02      0.11
131072  131072    4000  60.02      0.04
196608  196608    4000  60.02      0.04
245760  245760    4000  60.01      0.04
 32768   32768    4000  60.02      0.04
 65536   65536    4000  60.02      0.04
 98304   98304    4000  60.01      0.04
131072  131072    4096  60.02      0.03
196608  196608    4096  60.02      0.03
245760  245760    4096  60.02      0.03
 32768   32768    4096  60.02      0.03
 65536   65536    4096  60.02      0.03
 98304   98304    4096  60.02      0.03
131072  131072    8192  60.02   5340.88
196608  196608    8192  60.02   5484.05
245760  245760    8192  60.02   5328.48
 32768   32768    8192  60.02   5480.58
 65536   65536    8192  60.02   5319.36
 98304   98304    8192  60.01   5433.01
131072  131072    8197  60.01   5412.53
196608  196608    8197  60.01   5392.23
245760  245760    8197  60.02   5391.18
 32768   32768    8197  60.01   5467.68
 65536   65536    8197  60.01   5465.49
 98304   98304    8197  60.01   5295.13
131072  131072   16384  60.01   5492.78
196608  196608   16384  60.02      0.05
245760  245760   16384  60.01   5487.60
 32768   32768   16384  60.01   5507.06
 65536   65536   16384  60.02   5497.21
 98304   98304   16384  60.02   5494.97
131072  131072   49152  60.01   5711.59
196608  196608   49152  60.02   5708.24
245760  245760   49152  60.01   5710.23
 32768   32768   49152  60.02   5657.29
 65536   65536   49152  60.01   5710.62
 98304   98304   49152  60.02   5711.56
131072  131072   49157  60.01   5693.26
196608  196608   49157  60.03   5685.55
245760  245760   49157  60.01   5691.51
 32768   32768   49157  60.01   5653.52
 65536   65536   49157  60.01   5690.49
 98304   98304   49157  60.01   5693.11
131072  131072   57344  60.01   5528.27
196608  196608   57344  60.01   5629.15
245760  245760   57344  60.01   5626.02
 32768   32768   57344  60.02   5544.92
 65536   65536   57344  60.01   5627.32
 98304   98304   57344  60.01   5628.52
131072  131072   65536  60.01   5674.38
196608  196608   65536  60.01   5674.19
245760  245760   65536  60.01   5672.49
 32768   32768   65536  60.02   5619.26
 65536   65536   65536  60.01   5673.84
 98304   98304   65536  60.01   5675.05
131072  131072  131070  60.01   5550.89
196608  196608  131070  60.02   5549.08
245760  245760  131070  60.01   5548.43
 32768   32768  131070  60.01   5550.42
 65536   65536  131070  60.02   5489.22
 98304   98304  131070  60.01   5550.11
131072  131072  131072  60.01   5554.28
196608  196608  131072  60.02   5553.87
245760  245760  131072  60.02   5554.01
 32768   32768  131072  60.01   5567.14
 65536   65536  131072  60.01   5458.83
 98304   98304  131072  60.01   5554.56
131072  131072  131079  60.03   5428.02
196608  196608  131079  60.02   5361.43
245760  245760  131079  60.01   5437.48
 32768   32768  131079  60.01   5426.85
 65536   65536  131079  60.01   5432.93
 98304   98304  131079  60.02   5344.43

From Venkatesh.Babu at 3leafnetworks.com  Tue Sep 27 18:17:17 2005
From: Venkatesh.Babu at 3leafnetworks.com (Venkatesh Babu)
Date: Tue, 27 Sep 2005 18:17:17 -0700
Subject: [openib-general] Local QP operation err while sending packet over UD transport
Message-ID: <7C1D552561AF0544ACC7CF6F10E4966E0252CA@chronus.3leafnetworks.corp>

Hi,

I am using verbs layer in OpenIB of linux kernel 2.6.13.1 and Mellanox HCA
(CA type: MT25208, Number of ports: 2, Firmware version: 5.0.1, Hardware
version: a0). I have created a QP with Unreliable Datagram transport to
communicate with another similar system through a Mellanox IB switch.
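Before a UD send can be posted, the QP goes through three ib_modify_qp() transitions, and each one must carry the attributes the InfiniBand spec requires for that step. The model below encodes those requirements for a UD QP:

- RESET to INIT: P_Key index, port and Q_Key.
- INIT to RTR: nothing beyond the state. A UD QP has no fixed remote destination; each send names it through its address handle.
- RTR to RTS: the send queue PSN.

This is a sketch only. The enum values are made up and are not the kernel's IB_QP_* bit values.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for QP states and attribute-mask bits. The names mirror
 * the verbs ones, but the numeric values are invented for this sketch. */
enum qp_state { QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS };

enum {
    QP_STATE      = 1 << 0,
    QP_PKEY_INDEX = 1 << 1,
    QP_PORT       = 1 << 2,
    QP_QKEY       = 1 << 3,
    QP_SQ_PSN     = 1 << 4,
};

/* Attributes an Unreliable Datagram QP must supply for each step of the
 * RESET -> INIT -> RTR -> RTS bring-up; -1 for transitions not modeled. */
static int ud_required_mask(enum qp_state from, enum qp_state to)
{
    if (from == QPS_RESET && to == QPS_INIT)
        return QP_STATE | QP_PKEY_INDEX | QP_PORT | QP_QKEY;
    if (from == QPS_INIT && to == QPS_RTR)
        return QP_STATE;
    if (from == QPS_RTR && to == QPS_RTS)
        return QP_STATE | QP_SQ_PSN;
    return -1;
}

/* True if 'mask' contains every attribute the transition requires. */
static bool ud_mask_ok(enum qp_state from, enum qp_state to, int mask)
{
    int req = ud_required_mask(from, to);

    return req >= 0 && (mask & req) == req;
}
```

For example, forgetting the Q_Key at INIT or the send PSN at RTS fails the check, and so does trying to skip states entirely.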

I created a QP, transitioned it to the RTS state, created the first
packet and posted it to the send queue successfully, and later got a
completion in the completion queue with the error status
IB_WC_LOC_QP_OP_ERR. Opcode field of ib_wc structure was showing some
unrealistic value 18177 and vendor_err as 469a0400. I also got the
following output in /var/log/messages:

Oct 1 18:29:56 york kernel: ib_mthca 0000:07:00.0: local QP operation err (QPN 00040c, WQE @ 00002005, CQN 00008a, index 0)
Oct 1 18:29:56 york kernel: ib_mthca 0000:07:00.0: CQE contents 0000040c b3000000 fd000000 11000000 026f0000 00000010 00002005 ff100000

I am not sure what this error means. Could somebody please help me
debug this problem and send my first packet over IB?

Thanks,
VBabu

From rolandd at cisco.com  Tue Sep 27 18:27:41 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Tue, 27 Sep 2005 18:27:41 -0700
Subject: [openib-general] Local QP operation err while sending packet over UD transport
In-Reply-To: <7C1D552561AF0544ACC7CF6F10E4966E0252CA@chronus.3leafnetworks.corp> (Venkatesh Babu's message of "Tue, 27 Sep 2005 18:17:17 -0700")
References: <7C1D552561AF0544ACC7CF6F10E4966E0252CA@chronus.3leafnetworks.corp>
Message-ID: <52k6h2nfqq.fsf@cisco.com>

    VBabu> I am using verbs layer in OpenIB of linux kernel 2.6.13.1
    VBabu> and Mellanox HCA (CA type: MT25208, Number of ports: 2,
    VBabu> Firmware version: 5.0.1, Hardware version: a0)

I'm not sure if it matters, but firmware version 5.0.1 is old -- you
might want to upgrade to 5.1.0.

Also, are you using the IB drivers shipped with kernel 2.6.13.1, or
have you updated to the OpenIB subversion code? There were several
bugfixes to the mem-free support since 2.6.13 was released.

    VBabu> Opcode field of ib_wc structure was showing some
    VBabu> unrealistic value 18177 and vendor_err as 469a0400.

Per the IB spec, the opcode field is not valid if the status is not
success, so there's no point in looking at that -- it's just an
uninitialized memory location. Also, the current driver doesn't set
the vendor_err field, so that's not useful either.

    VBabu> CQE contents 0000040c b3000000 fd000000 11000000 026f0000 00000010 00002005 ff100000

Perhaps someone from Mellanox can decode the undocumented fields of
this CQE.

One possibility that comes to mind is that you are not creating your
UD address handle correctly. I think that could possibly cause the
error you're seeing. Can you post the code you're getting this error
with?

Thanks,
  Roland

From Venkatesh.Babu at 3leafnetworks.com  Tue Sep 27 19:43:10 2005
From: Venkatesh.Babu at 3leafnetworks.com (Venkatesh Babu)
Date: Tue, 27 Sep 2005 19:43:10 -0700
Subject: [openib-general] Local QP operation err while sending packet over UD transport
Message-ID: <7C1D552561AF0544ACC7CF6F10E4966E0252CB@chronus.3leafnetworks.corp>

VBabu> Thanks for your quick response.

> I'm not sure if it matters, but firmware version 5.0.1 is old -- you
> might want to upgrade to 5.1.0.

VBabu> I will check with the vendor and find out how to upgrade the
VBabu> firmware.

> Also, are you using the IB drivers shipped with kernel 2.6.13.1, or
> have you updated to the OpenIB subversion code? There were several
> bugfixes to the mem-free support since 2.6.13 was released.

VBabu> Currently I am just using the IB drivers shipped with kernel
VBabu> 2.6.13.1. I will also try to upgrade the IB drivers to the latest
VBabu> OpenIB subversion code.

> Per the IB spec, the opcode field is not valid if the status is not
> success, so there's no point in looking at that -- it's just an
> uninitialized memory location. Also, the current driver doesn't set
> the vendor_err field, so that's not useful either.

VBabu> Then I will ignore the opcode and vendor_err fields.

> Perhaps someone from Mellanox can decode the undocumented fields of
> this CQE.

VBabu> I will contact the Mellanox FAE to decode these fields.
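The work request posted in the code that follows never sets its opcode or its next pointer, and Roland's reply later in this thread names both as likely problems. ib_post_send() treats next as a linked list and processes every request on it. An unset opcode can therefore post an invalid request, and a stale next pointer can post whatever memory it happens to point at. The sketch below is a self-contained model, not the kernel API: struct send_wr, WR_SEND and mock_post_send() are hypothetical simplifications of struct ib_send_wr, IB_WR_SEND and ib_post_send().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a verbs send work request. */
enum wr_opcode { WR_INVALID = 0, WR_SEND = 1 };

struct send_wr {
    struct send_wr *next;     /* chain of requests; must end in NULL */
    uint64_t        wr_id;
    enum wr_opcode  opcode;   /* must be set explicitly, e.g. WR_SEND */
    int             num_sge;
};

/*
 * Mock post_send: walks the chain the way a provider does, stopping at the
 * first bad request and reporting it through *bad_wr.
 * Returns the number of requests accepted, or -1 on a bad request.
 */
static int mock_post_send(struct send_wr *wr, struct send_wr **bad_wr)
{
    int posted = 0;

    for (; wr != NULL; wr = wr->next) {
        if (wr->opcode != WR_SEND) {
            *bad_wr = wr;
            return -1;
        }
        posted++;
    }
    return posted;
}

/* Build a single send request with every field initialized: members not
 * named in the compound literal, including 'next', are zeroed. */
static struct send_wr make_ud_send(uint64_t wr_id, int num_sge)
{
    return (struct send_wr){ .wr_id = wr_id, .opcode = WR_SEND,
                             .num_sge = num_sge };
}
```

A designated initializer zeroes every field that is not named, which rules out this class of bug. Calling memset(&wr, 0, sizeof wr) before filling in the fields achieves the same thing.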

> One possibility that comes to mind is that you are not creating your
> UD address handle correctly. I think that could possibly cause the
> error you're seeing. Can you post the code you're getting this error
> with?

VBabu> I am not sure I can post the whole code, but here is part of it.
VBabu> I am getting "dlid" and "dqpn" of the remote end and initializing
VBabu> them here.

#define TLEM_SERVICE_LEVEL 0 /* TODO: Choose right QoS level */
#define TLEM_SOURCE_PATH_BITS 0 /* Offset from the source base LID */
#define TLEM_STATIC_RATE 4 /* 4x can transmit @ 10Gb/s */

...

ah_attr.dlid = dlid;
ah_attr.sl = TLEM_SERVICE_LEVEL;
ah_attr.src_path_bits = TLEM_SOURCE_PATH_BITS;
ah_attr.static_rate = TLEM_STATIC_RATE;
ah_attr.ah_flags = 0;
ah_attr.port_num = tlem -> port;

if (IS_ERR(tlem -> address = ib_create_ah(tlem -> hca_info -> pd, &ah_attr))) {
        printk(KERN_ERR PFX "%s: Can't create address handle"
               "ib_create_ah() returned %p", __func__, tlem -> address);
        ret = -1;
}

...

/* Describe the scatter/gather list */
elem -> u.txwr.num_sge = n;
elem -> u.txwr.sg_list = sglist;
elem -> u.txwr.wr_id = (u64) elem;
elem -> u.txwr.imm_data = 0;
elem -> u.txwr.send_flags = IB_SEND_SIGNALED;
elem -> u.txwr.wr.ud.ah = tlem -> address;
elem -> u.txwr.wr.ud.remote_qpn = dqpn;
elem -> u.txwr.wr.ud.remote_qkey = tlem -> qkey;

if ((ret = ib_post_send (tlem -> qp, & elem -> u.txwr, & bad_wr)) != 0)
        printk (KERN_ERR "%s: ib_post_send failed: %d\n", __func__, ret);
else
        printk (KERN_INFO "%s: ib_post_send okay!\n", __func__);
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From rolandd at cisco.com Tue Sep 27 20:12:31 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 27 Sep 2005 20:12:31 -0700 Subject: [openib-general] Local QP operation err while sending packet over UD transport In-Reply-To: <7C1D552561AF0544ACC7CF6F10E4966E0252CB@chronus.3leafnetworks.corp> (Venkatesh Babu's message of "Tue, 27 Sep 2005 19:43:10 -0700") References: <7C1D552561AF0544ACC7CF6F10E4966E0252CB@chronus.3leafnetworks.corp> Message-ID: <528xxhopgg.fsf@cisco.com> VBabu> I am not sure I can post the whole code. But here is the VBabu> part of it. It's a little hard to debug without being able to run your code and reproduce the error. The only things I see obviously wrong are that you never seem to set elem -> u.txwr.opcode to IB_WR_SEND, so you may be posting an invalid work request, and also you never set elem -> u.txwr.next to NULL, so ib_post_send() could follow the next pointer into some other memory and post a random work request. (BTW, if your coding style is to put spaces around every operator, shouldn't you write things like 'elem -> u . txwr . next'? ;) - R. From halr at voltaire.com Tue Sep 27 20:41:27 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 27 Sep 2005 23:41:27 -0400 Subject: [openib-general] [RFC] [PATCH] OpenSM: Protect against bad LIDs returned from faulty hardware/SMA Message-ID: <1127878662.4829.2629.camel@hal.voltaire.com> OpenSM: Protect against bad LIDs returned from faulty hardware/SMA Signed-off-by: Hal Rosenstock Index: include/opensm/osm_port.h =================================================================== --- include/opensm/osm_port.h (revision 3590) +++ include/opensm/osm_port.h (working copy) @@ -346,7 +346,7 @@ osm_physp_is_healthy( * Returns TRUE if the Physical Port has been maked as healthy * FALSE otherwise. * All physical ports are initialized as "healthy" but may be marked -* otherwise if a received trap claims otherwise. +* otherwise if a received trap claims otherwise. 
* * NOTES * @@ -456,6 +456,44 @@ osm_physp_set_port_info( * Port, Physical Port *********/ +/****f* OpenSM: Physical Port/osm_physp_validate_base_lid +* NAME +* osm_physp_validate_base_lid +* +* DESCRIPTION +* Validates the base LID in the Physical Port object. +* +* SYNOPSIS +*/ +static inline ib_net16_t +osm_physp_validate_base_lid( + IN osm_physp_t* const p_physp ) +{ + ib_net16_t orig_lid = 0; + + CL_ASSERT( osm_physp_is_valid( p_physp ) ); + if ( cl_ntoh16( p_physp->port_info.base_lid ) > IB_LID_UCAST_END_HO ) + { + orig_lid = p_physp->port_info.base_lid; + p_physp->port_info.base_lid = 0; + } + return orig_lid; +} +/* +* PARAMETERS +* p_physp +* [in] Pointer to an osm_physp_t object. +* +* RETURN VALUES +* Returns 0 if the base LID in the Physical port object is valid. +* Returns original invalid LID otherwise. +* +* NOTES +* +* SEE ALSO +* Port, Physical Port +*********/ + /****f* OpenSM: Physical Port/osm_physp_set_pkey_tbl * NAME * osm_physp_set_pkey_tbl Index: opensm/osm_port_info_rcv.c =================================================================== --- opensm/osm_port_info_rcv.c (revision 3590) +++ opensm/osm_port_info_rcv.c (working copy) @@ -235,6 +235,7 @@ __osm_pi_rcv_process_switch_port( osm_madw_context_t context; osm_physp_t *p_remote_physp; osm_node_t *p_remote_node; + ib_net16_t orig_lid; uint8_t port_num; uint8_t remote_port_num; osm_dr_path_t path; @@ -346,8 +347,13 @@ __osm_pi_rcv_process_switch_port( if (port_num == 0) { - /* This is a management port 0 */ - __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi); + /* This is switch management port 0 */ + if ( ( orig_lid = osm_physp_validate_base_lid( p_physp ) ) ) + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_pi_rcv_process_switch_port: ERR 0F04: " + "Invalid base LID 0x%x corrected.\n", + cl_ntoh16( orig_lid ) ); + __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi); } OSM_LOG_EXIT( p_rcv->p_log ); @@ -362,11 +368,18 @@ __osm_pi_rcv_process_ca_port( IN osm_physp_t* const p_physp, IN 
const ib_port_info_t* const p_pi ) { + ib_net16_t orig_lid; + OSM_LOG_ENTER( p_rcv->p_log, __osm_pi_rcv_process_ca_port ); UNUSED_PARAM( p_node ); osm_physp_set_port_info( p_physp, p_pi ); + if ( (orig_lid = osm_physp_validate_base_lid( p_physp ) ) ) + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_pi_rcv_process_ca_port: ERR 0F08: " + "Invalid base LID 0x%x corrected.\n", + cl_ntoh16 ( orig_lid ) ); __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi); @@ -382,6 +395,8 @@ __osm_pi_rcv_process_router_port( IN osm_physp_t* const p_physp, IN const ib_port_info_t* const p_pi ) { + ib_net16_t orig_lid; + OSM_LOG_ENTER( p_rcv->p_log, __osm_pi_rcv_process_router_port ); UNUSED_PARAM( p_node ); @@ -390,6 +405,11 @@ __osm_pi_rcv_process_router_port( Update the PortInfo attribute. */ osm_physp_set_port_info( p_physp, p_pi ); + if ( (orig_lid = osm_physp_validate_base_lid( p_physp ) ) ) + osm_log( p_rcv->p_log, OSM_LOG_ERROR, + "__osm_pi_rcv_process_router_port: ERR 0F09: " + "Invalid base LID 0x%x corrected.\n", + cl_ntoh16 ( orig_lid) ); OSM_LOG_EXIT( p_rcv->p_log ); } From mst at mellanox.co.il Tue Sep 27 20:52:03 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Sep 2005 06:52:03 +0300 Subject: [openib-general] Re: netperf over SDP bug In-Reply-To: <20050928011700.GA22427@esmail.cup.hp.com> References: <20050928011700.GA22427@esmail.cup.hp.com> Message-ID: <20050928035203.GB6765@mellanox.co.il> Quoting r. Grant Grundler : > Subject: netperf over SDP bug > > Hi Michael, > I'm trying to collect a full set of netperf TCP_STREAM over SDP for > SVN r3547 on 2.6.13 kernel. But some netperf runs get no throughput. > Usually when sending 1k to 4k messages. The same netperf parameters > using IPoIB seem to be working fine - just a lot slower of course. > Summary of all netperf over SDP runs is appended.
> > Sample commandline that got < 1Mb/s throughput is: > LD_PRELOAD=/usr/local/lib/libsdp.so /usr/local/bin/netperf -p 12866 -l > 60 -H 10.0.0.30 -t TCP_STREAM -T 1 -- -m 1024 -s 16384 -S 16384 > > I tried with some smaller -m parameters: > 512 -> ~270-280 Mb/s > 640 -> ~200-2100 Mb/s > 768 -> ~30-50 Mb/s > 896 -> ~2-6 Mb/s > > CPU is essentially idle in the above 512-896 byte cases. > Anything obvious I should be looking for or worth trying? > > thanks, > grant > > #Recv Send Send > #Socket Socket Message Elapsed > #Size Size Size Time Throughput > #bytes bytes bytes secs. 10^6bits/sec > > grundler <505>grep -h ' 60\.0. ' sdp*.out | sort -n -k 3 Interesting. Things to try would be - enable SDP debug, or even data path debug - try running with oprofile, see what the CPU is doing -- MST From rolandd at cisco.com Tue Sep 27 21:01:45 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 27 Sep 2005 21:01:45 -0700 Subject: [openib-general] [git pull] InfiniBand fixes for 2.6.14 Message-ID: <524q85on6e.fsf@cisco.com> Linus, please pull from master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This tree is also available from kernel.org mirrors at: rsync://rsync.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus This will pull the following changes (full patch below): Jack Morgenstein: [IB] mthca: fix hw_ver value returned from mthca_query_device Michael S.
Tsirkin: [IB] mthca: fix off by one in clr_int calculation [IB] mthca: Fix off by one bug in mthca_map_cmd [IB] mthca: Round up number of slots in HCA context memory table Roland Dreier: [IB] mthca: Fix doorbell record resource leak [IB] uverbs: Close some exploitable races drivers/infiniband/core/uverbs.h | 1 drivers/infiniband/core/uverbs_cmd.c | 122 ++++++++++++++------------ drivers/infiniband/core/uverbs_main.c | 27 ++++-- drivers/infiniband/hw/mthca/mthca_cmd.c | 4 - drivers/infiniband/hw/mthca/mthca_eq.c | 2 drivers/infiniband/hw/mthca/mthca_memfree.c | 19 +++- drivers/infiniband/hw/mthca/mthca_provider.c | 2 include/rdma/ib_verbs.h | 1 8 files changed, 105 insertions(+), 73 deletions(-) diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h --- a/drivers/infiniband/core/uverbs.h +++ b/drivers/infiniband/core/uverbs.h @@ -69,6 +69,7 @@ struct ib_uverbs_event_file { struct ib_uverbs_file { struct kref ref; + struct semaphore mutex; struct ib_uverbs_device *device; struct ib_ucontext *ucontext; struct ib_event_handler event_handler; diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -76,8 +76,9 @@ ssize_t ib_uverbs_get_context(struct ib_ struct ib_uverbs_get_context_resp resp; struct ib_udata udata; struct ib_device *ibdev = file->device->ib_dev; + struct ib_ucontext *ucontext; int i; - int ret = in_len; + int ret; if (out_len < sizeof resp) return -ENOSPC; @@ -85,45 +86,56 @@ ssize_t ib_uverbs_get_context(struct ib_ if (copy_from_user(&cmd, buf, sizeof cmd)) return -EFAULT; + down(&file->mutex); + + if (file->ucontext) { + ret = -EINVAL; + goto err; + } + INIT_UDATA(&udata, buf + sizeof cmd, (unsigned long) cmd.response + sizeof resp, in_len - sizeof cmd, out_len - sizeof resp); - file->ucontext = ibdev->alloc_ucontext(ibdev, &udata); - if (IS_ERR(file->ucontext)) { - ret = PTR_ERR(file->ucontext); - 
file->ucontext = NULL; - return ret; - } - - file->ucontext->device = ibdev; - INIT_LIST_HEAD(&file->ucontext->pd_list); - INIT_LIST_HEAD(&file->ucontext->mr_list); - INIT_LIST_HEAD(&file->ucontext->mw_list); - INIT_LIST_HEAD(&file->ucontext->cq_list); - INIT_LIST_HEAD(&file->ucontext->qp_list); - INIT_LIST_HEAD(&file->ucontext->srq_list); - INIT_LIST_HEAD(&file->ucontext->ah_list); - spin_lock_init(&file->ucontext->lock); + ucontext = ibdev->alloc_ucontext(ibdev, &udata); + if (IS_ERR(ucontext)) + return PTR_ERR(file->ucontext); + + ucontext->device = ibdev; + INIT_LIST_HEAD(&ucontext->pd_list); + INIT_LIST_HEAD(&ucontext->mr_list); + INIT_LIST_HEAD(&ucontext->mw_list); + INIT_LIST_HEAD(&ucontext->cq_list); + INIT_LIST_HEAD(&ucontext->qp_list); + INIT_LIST_HEAD(&ucontext->srq_list); + INIT_LIST_HEAD(&ucontext->ah_list); resp.async_fd = file->async_file.fd; for (i = 0; i < file->device->num_comp; ++i) if (copy_to_user((void __user *) (unsigned long) cmd.cq_fd_tab + i * sizeof (__u32), - &file->comp_file[i].fd, sizeof (__u32))) - goto err; + &file->comp_file[i].fd, sizeof (__u32))) { + ret = -EFAULT; + goto err_free; + } if (copy_to_user((void __user *) (unsigned long) cmd.response, - &resp, sizeof resp)) - goto err; + &resp, sizeof resp)) { + ret = -EFAULT; + goto err_free; + } + + file->ucontext = ucontext; + up(&file->mutex); return in_len; -err: - ibdev->dealloc_ucontext(file->ucontext); - file->ucontext = NULL; +err_free: + ibdev->dealloc_ucontext(ucontext); - return -EFAULT; +err: + up(&file->mutex); + return ret; } ssize_t ib_uverbs_query_device(struct ib_uverbs_file *file, @@ -352,9 +364,9 @@ retry: if (ret) goto err_pd; - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_add_tail(&uobj->list, &file->ucontext->pd_list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); memset(&resp, 0, sizeof resp); resp.pd_handle = uobj->id; @@ -368,9 +380,9 @@ retry: return in_len; err_list: - spin_lock_irq(&file->ucontext->lock); + 
down(&file->mutex); list_del(&uobj->list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); down(&ib_uverbs_idr_mutex); idr_remove(&ib_uverbs_pd_idr, uobj->id); @@ -410,9 +422,9 @@ ssize_t ib_uverbs_dealloc_pd(struct ib_u idr_remove(&ib_uverbs_pd_idr, cmd.pd_handle); - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); kfree(uobj); @@ -512,9 +524,9 @@ retry: resp.mr_handle = obj->uobject.id; - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_add_tail(&obj->uobject.list, &file->ucontext->mr_list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); if (copy_to_user((void __user *) (unsigned long) cmd.response, &resp, sizeof resp)) { @@ -527,9 +539,9 @@ retry: return in_len; err_list: - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&obj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); err_unreg: ib_dereg_mr(mr); @@ -570,9 +582,9 @@ ssize_t ib_uverbs_dereg_mr(struct ib_uve idr_remove(&ib_uverbs_mr_idr, cmd.mr_handle); - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&memobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); ib_umem_release(file->device->ib_dev, &memobj->umem); kfree(memobj); @@ -647,9 +659,9 @@ retry: if (ret) goto err_cq; - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_add_tail(&uobj->uobject.list, &file->ucontext->cq_list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); memset(&resp, 0, sizeof resp); resp.cq_handle = uobj->uobject.id; @@ -664,9 +676,9 @@ retry: return in_len; err_list: - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); down(&ib_uverbs_idr_mutex); idr_remove(&ib_uverbs_cq_idr, uobj->uobject.id); @@ -712,9 +724,9 @@ ssize_t ib_uverbs_destroy_cq(struct ib_u 
idr_remove(&ib_uverbs_cq_idr, cmd.cq_handle); - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); spin_lock_irq(&file->comp_file[0].lock); list_for_each_entry_safe(evt, tmp, &uobj->comp_list, obj_list) { @@ -847,9 +859,9 @@ retry: resp.qp_handle = uobj->uobject.id; - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_add_tail(&uobj->uobject.list, &file->ucontext->qp_list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); if (copy_to_user((void __user *) (unsigned long) cmd.response, &resp, sizeof resp)) { @@ -862,9 +874,9 @@ retry: return in_len; err_list: - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); err_destroy: ib_destroy_qp(qp); @@ -989,9 +1001,9 @@ ssize_t ib_uverbs_destroy_qp(struct ib_u idr_remove(&ib_uverbs_qp_idr, cmd.qp_handle); - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); spin_lock_irq(&file->async_file.lock); list_for_each_entry_safe(evt, tmp, &uobj->event_list, obj_list) { @@ -1136,9 +1148,9 @@ retry: resp.srq_handle = uobj->uobject.id; - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_add_tail(&uobj->uobject.list, &file->ucontext->srq_list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); if (copy_to_user((void __user *) (unsigned long) cmd.response, &resp, sizeof resp)) { @@ -1151,9 +1163,9 @@ retry: return in_len; err_list: - spin_lock_irq(&file->ucontext->lock); + down(&file->mutex); list_del(&uobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); err_destroy: ib_destroy_srq(srq); @@ -1227,9 +1239,9 @@ ssize_t ib_uverbs_destroy_srq(struct ib_ idr_remove(&ib_uverbs_srq_idr, cmd.srq_handle); - spin_lock_irq(&file->ucontext->lock); + 
down(&file->mutex); list_del(&uobj->uobject.list); - spin_unlock_irq(&file->ucontext->lock); + up(&file->mutex); spin_lock_irq(&file->async_file.lock); list_for_each_entry_safe(evt, tmp, &uobj->event_list, obj_list) { diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -448,7 +448,9 @@ static ssize_t ib_uverbs_write(struct fi if (hdr.in_words * 4 != count) return -EINVAL; - if (hdr.command < 0 || hdr.command >= ARRAY_SIZE(uverbs_cmd_table)) + if (hdr.command < 0 || + hdr.command >= ARRAY_SIZE(uverbs_cmd_table) || + !uverbs_cmd_table[hdr.command]) return -EINVAL; if (!file->ucontext && @@ -484,27 +486,29 @@ static int ib_uverbs_open(struct inode * file = kmalloc(sizeof *file + (dev->num_comp - 1) * sizeof (struct ib_uverbs_event_file), GFP_KERNEL); - if (!file) - return -ENOMEM; + if (!file) { + ret = -ENOMEM; + goto err; + } file->device = dev; kref_init(&file->ref); + init_MUTEX(&file->mutex); file->ucontext = NULL; + kref_get(&file->ref); ret = ib_uverbs_event_init(&file->async_file, file); if (ret) - goto err; + goto err_kref; file->async_file.is_async = 1; - kref_get(&file->ref); - for (i = 0; i < dev->num_comp; ++i) { + kref_get(&file->ref); ret = ib_uverbs_event_init(&file->comp_file[i], file); if (ret) goto err_async; - kref_get(&file->ref); file->comp_file[i].is_async = 0; } @@ -524,9 +528,16 @@ err_async: ib_uverbs_event_release(&file->async_file); -err: +err_kref: + /* + * One extra kref_put() because we took a reference before the + * event file creation that failed and got us here. 
+ */ + kref_put(&file->ref, ib_uverbs_release_file); kref_put(&file->ref, ib_uverbs_release_file); +err: + module_put(dev->ib_dev->owner); return ret; } diff --git a/drivers/infiniband/hw/mthca/mthca_cmd.c b/drivers/infiniband/hw/mthca/mthca_cmd.c --- a/drivers/infiniband/hw/mthca/mthca_cmd.c +++ b/drivers/infiniband/hw/mthca/mthca_cmd.c @@ -605,7 +605,7 @@ static int mthca_map_cmd(struct mthca_de err = -EINVAL; goto out; } - for (i = 0; i < mthca_icm_size(&iter) / (1 << lg); ++i, ++nent) { + for (i = 0; i < mthca_icm_size(&iter) / (1 << lg); ++i) { if (virt != -1) { pages[nent * 2] = cpu_to_be64(virt); virt += 1 << lg; @@ -616,7 +616,7 @@ static int mthca_map_cmd(struct mthca_de ts += 1 << (lg - 10); ++tc; - if (nent == MTHCA_MAILBOX_SIZE / 16) { + if (++nent == MTHCA_MAILBOX_SIZE / 16) { err = mthca_cmd(dev, mailbox->dma, nent, 0, op, CMD_TIME_CLASS_B, status); if (err || *status) diff --git a/drivers/infiniband/hw/mthca/mthca_eq.c b/drivers/infiniband/hw/mthca/mthca_eq.c --- a/drivers/infiniband/hw/mthca/mthca_eq.c +++ b/drivers/infiniband/hw/mthca/mthca_eq.c @@ -836,7 +836,7 @@ int __devinit mthca_init_eq_table(struct dev->eq_table.clr_mask = swab32(1 << (dev->eq_table.inta_pin & 31)); dev->eq_table.clr_int = dev->clr_base + - (dev->eq_table.inta_pin < 31 ? 4 : 0); + (dev->eq_table.inta_pin < 32 ? 
4 : 0); } dev->eq_table.arm_mask = 0; diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c --- a/drivers/infiniband/hw/mthca/mthca_memfree.c +++ b/drivers/infiniband/hw/mthca/mthca_memfree.c @@ -290,7 +290,7 @@ struct mthca_icm_table *mthca_alloc_icm_ int i; u8 status; - num_icm = obj_size * nobj / MTHCA_TABLE_CHUNK_SIZE; + num_icm = (obj_size * nobj + MTHCA_TABLE_CHUNK_SIZE - 1) / MTHCA_TABLE_CHUNK_SIZE; table = kmalloc(sizeof *table + num_icm * sizeof *table->icm, GFP_KERNEL); if (!table) @@ -529,12 +529,25 @@ int mthca_alloc_db(struct mthca_dev *dev goto found; } + for (i = start; i != end; i += dir) + if (!dev->db_tab->page[i].db_rec) { + page = dev->db_tab->page + i; + goto alloc; + } + if (dev->db_tab->max_group1 >= dev->db_tab->min_group2 - 1) { ret = -ENOMEM; goto out; } + if (group == 0) + ++dev->db_tab->max_group1; + else + --dev->db_tab->min_group2; + page = dev->db_tab->page + end; + +alloc: page->db_rec = dma_alloc_coherent(&dev->pdev->dev, 4096, &page->mapping, GFP_KERNEL); if (!page->db_rec) { @@ -554,10 +567,6 @@ int mthca_alloc_db(struct mthca_dev *dev } bitmap_zero(page->used, MTHCA_DB_REC_PER_PAGE); - if (group == 0) - ++dev->db_tab->max_group1; - else - --dev->db_tab->min_group2; found: j = find_first_zero_bit(page->used, MTHCA_DB_REC_PER_PAGE); diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c --- a/drivers/infiniband/hw/mthca/mthca_provider.c +++ b/drivers/infiniband/hw/mthca/mthca_provider.c @@ -84,7 +84,7 @@ static int mthca_query_device(struct ib_ props->vendor_id = be32_to_cpup((__be32 *) (out_mad->data + 36)) & 0xffffff; props->vendor_part_id = be16_to_cpup((__be16 *) (out_mad->data + 30)); - props->hw_ver = be16_to_cpup((__be16 *) (out_mad->data + 32)); + props->hw_ver = be32_to_cpup((__be32 *) (out_mad->data + 32)); memcpy(&props->sys_image_guid, out_mad->data + 4, 8); memcpy(&props->node_guid, out_mad->data + 12, 8); diff --git 
a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -665,7 +665,6 @@ struct ib_ucontext { struct list_head qp_list; struct list_head srq_list; struct list_head ah_list; - spinlock_t lock; }; struct ib_uobject { From yael at mellanox.co.il Tue Sep 27 22:27:20 2005 From: yael at mellanox.co.il (Yael Kalka) Date: Wed, 28 Sep 2005 08:27:20 +0300 Subject: [openib-general] RE: osm_port_info_rcv.c Error Numbering Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30E2321@mtlexch01.mtl.com> No problem. -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Tuesday, September 27, 2005 10:57 PM To: Yael Kalka Cc: Eitan Zahavi; openib-general at openib.org Subject: osm_port_info_rcv.c Error Numbering Hi Yael, It looks to me like the two 4A0x error messages below in osm_port_info_rcv.c should be changed to something unused in the 0x0Fxx range. osm_log( p_log, OSM_LOG_ERROR, "osm_physp_has_pkey: ERR 4A02: " osm_log( p_log, OSM_LOG_ERROR, "osm_physp_has_pkey: ERR 4A03: " If that is acceptable, I will take care of it. -- Hal From eitan at mellanox.co.il Tue Sep 27 23:13:40 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 28 Sep 2005 08:13:40 +0200 Subject: [openib-general] Re: [RFC] [PATCH] OpenSM: Protect against bad LIDs returned from faulty hardware/SMA In-Reply-To: <1127878662.4829.2629.camel@hal.voltaire.com> References: <1127878662.4829.2629.camel@hal.voltaire.com> Message-ID: <433A3494.4080400@mellanox.co.il> Hi Hal, Good catch! So I guess the osm_lid_mgr is broken when the given LID is out of range. My comment is nitpicking, but I think that a function that validates and modifies the LID it got from the HW should have a more meaningful name to clarify the "modify".
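For readers following the naming discussion: the helper in Hal's patch does two things. It checks whether the base LID lies above the unicast range, and if so it overwrites the LID with 0. A name that only says "validate" hides the second step. Below is a minimal sketch of that logic in plain C, not OpenSM code. It assumes a simplified, hypothetical `struct port_info_sketch` in place of `osm_physp_t`, a LID held in host byte order (OpenSM keeps it in network order and converts with `cl_ntoh16()`), and a local `UCAST_LID_END` constant standing in for `IB_LID_UCAST_END_HO`, assumed here to be 0xBFFF, the top of the InfiniBand unicast LID range. The function name is borrowed from the suggestions that follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value: top of the InfiniBand unicast LID range. LIDs from
 * 0xC000 to 0xFFFE are multicast and 0xFFFF is the permissive LID. */
#define UCAST_LID_END 0xBFFF

/* Hypothetical, simplified stand-in for the PortInfo data in osm_physp_t.
 * The LID is kept in host byte order to keep the sketch short. */
struct port_info_sketch {
    uint16_t base_lid;
};

/* Returns 0 if the base LID is not above the unicast range and leaves it
 * untouched. Otherwise clears it to 0 and returns the original value so
 * the caller can log what was corrected. */
static uint16_t trim_base_lid_to_valid_range(struct port_info_sketch *pi)
{
    uint16_t orig_lid = 0;

    assert(pi != NULL); /* plays the role of CL_ASSERT() in the patch */
    if (pi->base_lid > UCAST_LID_END) {
        orig_lid = pi->base_lid;
        pi->base_lid = 0;
    }
    return orig_lid;
}
```

A caller would follow the patch's shape: `if ((orig = trim_base_lid_to_valid_range(&pi)) != 0)`, log "Invalid base LID 0x%x corrected." with `orig`. Like the patch, the sketch does not flag a base LID of 0. It only catches values above the unicast range.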
I would propose: osm_physp_fix_out_of_range_base_lid osm_physp_validate_and_fix_base_lid osm_physp_trim_base_lid_to_valid_range Eitan Hal Rosenstock wrote: > OpenSM: Protect against bad LIDs returned from faulty hardware/SMA > > Signed-off-by: Hal Rosenstock > > Index: include/opensm/osm_port.h > =================================================================== > --- include/opensm/osm_port.h (revision 3590) > +++ include/opensm/osm_port.h (working copy) > @@ -346,7 +346,7 @@ osm_physp_is_healthy( > * Returns TRUE if the Physical Port has been maked as healthy > * FALSE otherwise. > * All physical ports are initialized as "healthy" but may be marked > -* otherwise if a received trap claims otherwise. > +* otherwise if a received trap claims otherwise. > * > * NOTES > * > @@ -456,6 +456,44 @@ osm_physp_set_port_info( > * Port, Physical Port > *********/ > > +/****f* OpenSM: Physical Port/osm_physp_validate_base_lid > +* NAME > +* osm_physp_validate_base_lid > +* > +* DESCRIPTION > +* Validates the base LID in the Physical Port object. > +* > +* SYNOPSIS > +*/ > +static inline ib_net16_t > +osm_physp_validate_base_lid( > + IN osm_physp_t* const p_physp ) > +{ > + ib_net16_t orig_lid = 0; > + > + CL_ASSERT( osm_physp_is_valid( p_physp ) ); > + if ( cl_ntoh16( p_physp->port_info.base_lid ) > > IB_LID_UCAST_END_HO ) > + { > + orig_lid = p_physp->port_info.base_lid; > + p_physp->port_info.base_lid = 0; > + } > + return orig_lid; > +} > +/* > +* PARAMETERS > +* p_physp > +* [in] Pointer to an osm_physp_t object. > +* > +* RETURN VALUES > +* Returns 0 if the base LID in the Physical port object is valid. > +* Returns original invalid LID otherwise. 
> +* > +* NOTES > +* > +* SEE ALSO > +* Port, Physical Port > +*********/ > + > /****f* OpenSM: Physical Port/osm_physp_set_pkey_tbl > * NAME > * osm_physp_set_pkey_tbl > Index: opensm/osm_port_info_rcv.c > =================================================================== > --- opensm/osm_port_info_rcv.c (revision 3590) > +++ opensm/osm_port_info_rcv.c (working copy) > @@ -235,6 +235,7 @@ __osm_pi_rcv_process_switch_port( > osm_madw_context_t context; > osm_physp_t *p_remote_physp; > osm_node_t *p_remote_node; > + ib_net16_t orig_lid; > uint8_t port_num; > uint8_t remote_port_num; > osm_dr_path_t path; > @@ -346,8 +347,13 @@ __osm_pi_rcv_process_switch_port( > > if (port_num == 0) > { > - /* This is a management port 0 */ > - __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi); > + /* This is switch management port 0 */ > + if ( ( orig_lid = osm_physp_validate_base_lid( p_physp ) ) ) > + osm_log( p_rcv->p_log, OSM_LOG_ERROR, > + "__osm_pi_rcv_process_switch_port: ERR 0F04: " > + "Invalid base LID 0x%x corrected.\n", > + cl_ntoh16( orig_lid ) ); > + __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi); > } > > OSM_LOG_EXIT( p_rcv->p_log ); > @@ -362,11 +368,18 @@ __osm_pi_rcv_process_ca_port( > IN osm_physp_t* const p_physp, > IN const ib_port_info_t* const p_pi ) > { > + ib_net16_t orig_lid; > + > OSM_LOG_ENTER( p_rcv->p_log, __osm_pi_rcv_process_ca_port ); > > UNUSED_PARAM( p_node ); > > osm_physp_set_port_info( p_physp, p_pi ); > + if ( (orig_lid = osm_physp_validate_base_lid( p_physp ) ) ) > + osm_log( p_rcv->p_log, OSM_LOG_ERROR, > + "__osm_pi_rcv_process_ca_port: ERR 0F08: " > + "Invalid base LID 0x%x corrected.\n", > + cl_ntoh16 ( orig_lid ) ); > > __osm_pi_rcv_process_endport(p_rcv, p_physp, p_pi); > > @@ -382,6 +395,8 @@ __osm_pi_rcv_process_router_port( > IN osm_physp_t* const p_physp, > IN const ib_port_info_t* const p_pi ) > { > + ib_net16_t orig_lid; > + > OSM_LOG_ENTER( p_rcv->p_log, __osm_pi_rcv_process_router_port ); > > UNUSED_PARAM( p_node ); > @@ 
-390,6 +405,11 @@ __osm_pi_rcv_process_router_port( > Update the PortInfo attribute. > */ > osm_physp_set_port_info( p_physp, p_pi ); > + if ( (orig_lid = osm_physp_validate_base_lid( p_physp ) ) ) > + osm_log( p_rcv->p_log, OSM_LOG_ERROR, > + "__osm_pi_rcv_process_router_port: ERR 0F09: " > + "Invalid base LID 0x%x corrected.\n", > + cl_ntoh16 ( orig_lid) ); > > OSM_LOG_EXIT( p_rcv->p_log ); > } > > From yael at mellanox.co.il Wed Sep 28 00:45:56 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 28 Sep 2005 10:45:56 +0300 Subject: [openib-general] [PATCH] Opensm - locking order issue Message-ID: <5zwtl1ws7f.fsf@mtl066.yok.mtl.com> Hi Hal, During one of our Windows runs we encountered a deadlock in opensm. This caused us to review the different locks in the opensm code, and we found 2 places that might cause a deadlock. In osm_state_mgr_process the order of the locks is: 1. osm_state_mgr_t.state_lock 2. p_lock (same pointer for the different sm managers) We noticed 2 places where this order wasn't kept, and a deadlock might occur. This happened where osm_state_mgr_process was called while holding p_lock. Attached is a patch resolving this issue. Thanks, Yael Signed-off-by: Yael Kalka Index: opensm/osm_sminfo_rcv.c =================================================================== --- opensm/osm_sminfo_rcv.c (revision 3590) +++ opensm/osm_sminfo_rcv.c (working copy) @@ -425,13 +425,18 @@ __osm_sminfo_rcv_process_set_request( /********************************************************************** + * Return a signal with which to call the osm_state_mgr_process. + * This is done since we are locked by p_rcv->p_lock in this function, + * and thus cannot call osm_state_mgr_process (that locks the state_lock). + * If return OSM_SIGNAL_NONE - do not call osm_state_mgr_process.
**********************************************************************/ -void +osm_signal_t __osm_sminfo_rcv_process_get_sm( IN const osm_sminfo_rcv_t* const p_rcv, IN const osm_remote_sm_t* const p_sm ) { const ib_sm_info_t* p_smi; + osm_signal_t ret_val = OSM_SIGNAL_NONE; OSM_LOG_ENTER( p_rcv->p_log, __osm_sminfo_rcv_process_get_sm ); @@ -459,8 +464,7 @@ __osm_sminfo_rcv_process_get_sm( case IB_SMINFO_STATE_NOTACTIVE: break; case IB_SMINFO_STATE_MASTER: - osm_state_mgr_process( p_rcv->p_state_mgr, - OSM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED ); + ret_val = OSM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED; /* save on the p_sm_state_mgr the guid of the current master. */ osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, "__osm_sminfo_rcv_process_get_sm: " @@ -473,8 +477,7 @@ __osm_sminfo_rcv_process_get_sm( if ( __osm_sminfo_rcv_remote_sm_is_higher(p_rcv, p_smi) == TRUE ) { /* the remote is a higher sm - need to stop sweeping */ - osm_state_mgr_process( p_rcv->p_state_mgr, - OSM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED ); + ret_val = OSM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED; /* save on the p_sm_state_mgr the guid of the higher SM we found - */ /* we will poll it - as long as it lives - we should be in Standby. */ osm_log( p_rcv->p_log, OSM_LOG_VERBOSE, @@ -548,6 +551,7 @@ __osm_sminfo_rcv_process_get_sm( } OSM_LOG_EXIT( p_rcv->p_log ); + return ret_val; } @@ -566,6 +570,7 @@ __osm_sminfo_rcv_process_get_response( ib_net64_t port_guid; osm_remote_sm_t* p_sm; ib_api_status_t status; + osm_signal_t process_get_sm_ret_val = OSM_SIGNAL_NONE; OSM_LOG_ENTER( p_rcv->p_log, __osm_sminfo_rcv_process_get_response ); @@ -664,10 +669,16 @@ __osm_sminfo_rcv_process_get_response( p_sm->smi = *p_smi; } - __osm_sminfo_rcv_process_get_sm( p_rcv, p_sm ); + process_get_sm_ret_val = __osm_sminfo_rcv_process_get_sm( p_rcv, p_sm ); Exit: CL_PLOCK_RELEASE( p_rcv->p_lock ); + + /* If process_get_sm_ret_val != OSM_SIGNAL_NONE then we have to signal + * to the state_mgr with that signal. 
*/ + if (process_get_sm_ret_val != OSM_SIGNAL_NONE) + osm_state_mgr_process( p_rcv->p_state_mgr, + process_get_sm_ret_val ); OSM_LOG_EXIT( p_rcv->p_log ); } Index: opensm/osm_node_info_rcv.c =================================================================== --- opensm/osm_node_info_rcv.c (revision 3590) +++ opensm/osm_node_info_rcv.c (working copy) @@ -726,8 +726,6 @@ __osm_ni_rcv_process_new( cl_ntoh64( p_ni->node_guid ), cl_ntoh64( p_smp->trans_id ) ); - osm_state_mgr_process( p_rcv->p_state_mgr, OSM_SIGNAL_CHANGE_DETECTED ); - p_node = osm_node_new( p_madw ); if( p_node == NULL ) { @@ -985,6 +983,7 @@ osm_ni_rcv_process( ib_node_info_t *p_ni; ib_smp_t *p_smp; osm_node_t *p_node; + boolean_t process_new_flag = FALSE; OSM_LOG_ENTER( p_rcv->p_log, osm_ni_rcv_process ); @@ -1026,11 +1025,23 @@ osm_ni_rcv_process( osm_dump_node_info( p_rcv->p_log, p_ni, OSM_LOG_DEBUG ); if( p_node == (osm_node_t*)cl_qmap_end(p_guid_tbl) ) + { __osm_ni_rcv_process_new( p_rcv, p_madw ); + process_new_flag = TRUE; + } else __osm_ni_rcv_process_existing( p_rcv, p_node, p_madw ); CL_PLOCK_RELEASE( p_rcv->p_lock ); + + /* + * If we processed a new node - need to signal to the state_mgr that + * change detected. BUT - we cannot call the osm_state_mgr_process + * from within the lock of p_rcv->p_lock (can cause a deadlock). 
+ */ + if ( process_new_flag ) + osm_state_mgr_process( p_rcv->p_state_mgr, OSM_SIGNAL_CHANGE_DETECTED ); + Exit: OSM_LOG_EXIT( p_rcv->p_log ); } From guyg at voltaire.com Wed Sep 28 02:21:46 2005 From: guyg at voltaire.com (Guy German) Date: Wed, 28 Sep 2005 12:21:46 +0300 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <43397D00.6080505@ichips.intel.com> References: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> <43381CFC.7070508@voltaire.com> <433827FF.3010601@ichips.intel.com> <43396797.6030804@voltaire.com> <43397D00.6080505@ichips.intel.com> Message-ID: <433A60AA.2040704@voltaire.com> Sean Hefty wrote: > I think that there will still be a need for a separate address > translation module(s) I don't understand. You think there should be an address translation module, but you object to the *name* ib_at? (ib_at stands for "InfiniBand address translation") I suggested before that if ib_at should be fixed, let's fix it. If the API should be improved or other functionality should be added (or removed), why not do it in the existing ib_at? If there will indeed be a separate address translation module (or modules), then why wouldn't transport-aware modules use it along with the cm? Doesn't that leave the cma with only an abstraction purpose? > Caching will be complex, which is why I think that it needs to have its > own module. I'm envisioning a cache that can be saved to disk for > faster system startup. I admit I'm not aware of all the complexity of caching, so I fail to see why it can't be implemented in the ib_at module. The way I see it: if the cma can replace ib_at functionality and also serve as an at module, that's fine. But if we decide that there should be an ib_at module (which centralizes address translation for all the ULPs), the cma should use this module and the cma consumer needn't be aware of it.
Guy From halr at voltaire.com Wed Sep 28 02:24:24 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Sep 2005 05:24:24 -0400 Subject: [openib-general] Re: [RFC] [PATCH] OpenSM: Protect against bad LIDs returned from faulty hardware/SMA In-Reply-To: <433A3494.4080400@mellanox.co.il> References: <1127878662.4829.2629.camel@hal.voltaire.com> <433A3494.4080400@mellanox.co.il> Message-ID: <1127899464.4829.5030.camel@hal.voltaire.com> On Wed, 2005-09-28 at 02:13, Eitan Zahavi wrote: > My comment is nit picking but I think that a function that validate and modify > LID it got from the HW should have a more meaningful name to clarify the "modify". > > I would propose: > osm_physp_fix_out_of_range_base_lid > osm_physp_validate_and_fix_base_lid > osm_physp_trim_base_lid_to_valid_range OK. Of those, I chose the last option. Thanks. -- Hal From greg at kroah.com Wed Sep 28 02:36:33 2005 From: greg at kroah.com (Greg KH) Date: Wed, 28 Sep 2005 02:36:33 -0700 Subject: [openib-general] Re: [git pull] InfiniBand fixes for 2.6.14 In-Reply-To: <524q85on6e.fsf@cisco.com> References: <524q85on6e.fsf@cisco.com> Message-ID: <20050928093633.GA12757@kroah.com> On Tue, Sep 27, 2005 at 09:01:45PM -0700, Roland Dreier wrote: > Linus, please pull from > > master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus Hm, I complained about this last time, with no response... I didn't think that git pulls were going to be allowed from subsystem maintainers after -rc1 came out. After that, patches by email were required to be sent, not git pulls. This does cause a bit more work for the maintainer, but it ensures that they only send the patches they really want to get in. At least that was what I thought we decided on at the kernel summit... 
thanks, greg k-h From halr at voltaire.com Wed Sep 28 03:04:05 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Sep 2005 06:04:05 -0400 Subject: [openib-general] Re: [PATCH] Opensm - locking order issue In-Reply-To: <5zwtl1ws7f.fsf@mtl066.yok.mtl.com> References: <5zwtl1ws7f.fsf@mtl066.yok.mtl.com> Message-ID: <1127901845.4829.5284.camel@hal.voltaire.com> Hi Yael, On Wed, 2005-09-28 at 03:45, Yael Kalka wrote: > During one of our Windows runs we encountered a deadlock in opensm. > This caused us to review the different locks in the opensm code, and > we found 2 places that might cause deadlock. > In the osm_state_mgr_process the order of the locks is: > 1. osm_state_mgr_t.state_lock > 2. p_lock (same pointer for the different sm managers) > > We noticed 2 places where this order wasn't kept, and a deadlock might > occure. This happened where inside the p_lock osm_state_mgr_process > function was called. > > Attached is a patch resolving this issue. Excellent catch. Thanks. Applied. I wonder whether this is related to any of the other issues (not on SMInfo but on NodeInfo perhaps). Also, I am seeing the following which should be a valid self transition (DISCOVER -> DISCOVER): Sep 28 05:51:59 129290 [B76F8C40] -> __osm_sm_state_mgr_signal_error: ERR 3207: Invalid signal OSM_SM_SIGNAL_HANDOVER in state IB_SMINFO_STATE_DISCOVERING. Sep 28 05:51:59 129310 [B76F8C40] -> osm_sm_state_mgr_check_legality: ] Sep 28 05:51:59 129329 [B76F8C40] -> __osm_sminfo_rcv_process_set_request: ERR 2F07: Check legality of SM needed transition. AttributeModifier:0x1000000 RemoteState:IB_SMINFO_STATE_MASTER If you agree, do you want to fix this or should I ? -- Hal From mst at mellanox.co.il Wed Sep 28 03:29:39 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 28 Sep 2005 13:29:39 +0300 Subject: [openib-general] Re: Local QP operation err while sending packet over UD transport In-Reply-To: <52k6h2nfqq.fsf@cisco.com> References: <52k6h2nfqq.fsf@cisco.com> Message-ID: <20050928102939.GP28251@mellanox.co.il> Quoting Roland Dreier : > VBabu> CQE contents 0000040c b3000000 fd000000 11000000 026f0000 00000010 00002005 ff100000 > > Perhaps someone from Mellanox can decode the undocumented fields of > this CQE. inline segment exceeds WQE size -- MST From aia21 at cam.ac.uk Wed Sep 28 03:58:04 2005 From: aia21 at cam.ac.uk (Anton Altaparmakov) Date: Wed, 28 Sep 2005 11:58:04 +0100 (BST) Subject: [openib-general] Re: [git pull] InfiniBand fixes for 2.6.14 In-Reply-To: <20050928093633.GA12757@kroah.com> References: <524q85on6e.fsf@cisco.com> <20050928093633.GA12757@kroah.com> Message-ID: On Wed, 28 Sep 2005, Greg KH wrote: > On Tue, Sep 27, 2005 at 09:01:45PM -0700, Roland Dreier wrote: > > Linus, please pull from > > > > master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus > > Hm, I complained about this last time, with no response... > > I didn't think that git pulls were going to be allowed from subsystem > maintainers after -rc1 came out. After that, patches by email were > required to be sent, not git pulls. This does cause a bit more work > for the maintainer, but it ensures that they only send the patches they > really want to get in. > > At least that was what I thought we decided on at the kernel summit... That is not what Linus said on LKML only a week or two ago. He said git pulls are just fine. It is the content of the git pull that matters. It has to be bug fixes only. As Linus said, and I couldn't agree more, it would be silly to limit the method of patch submission given only the content is meant to be limited. 
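MST decodes the CQE in the "Local QP operation err" thread above as "inline segment exceeds WQE size", that is, the inline data requested did not fit in the send WQE. A minimal sketch of a pre-post guard, assuming the caller kept the max_inline_data value it requested when creating the QP; the struct sge type and the fits_inline() helper are hypothetical stand-ins written for illustration, not libibverbs API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for a scatter/gather entry (libibverbs' struct ibv_sge
 * carries the same addr/length/lkey fields). */
struct sge {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

/*
 * Hypothetical guard: return true if the whole gather list fits in
 * max_inline bytes, so requesting an inline send is safe. If it returns
 * false, post the request without the inline flag instead.
 */
static bool fits_inline(const struct sge *sg_list, size_t num_sge,
                        uint32_t max_inline)
{
    uint64_t total = 0;

    for (size_t i = 0; i < num_sge; i++)
        total += sg_list[i].length;
    return total <= max_inline;
}
```

The sum is accumulated in 64 bits so that several large entries cannot wrap around and slip past the limit.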
Best regards, Anton -- Anton Altaparmakov (replace at with @) Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ From yael at mellanox.co.il Wed Sep 28 04:06:51 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 28 Sep 2005 14:06:51 +0300 Subject: [openib-general] [PATCH] Opensm - accept HANDOVER during DISOVERY Message-ID: <5zvf0lwiwk.fsf@mtl066.yok.mtl.com> Hi Hal, Here is a patch for the valid signaling you found. If we receive a HANDOVER signal during DISCOVERY it can just be ignored (continue the discovering). Thanks, Yael Signed-off-by: Yael Kalka Index: opensm/osm_sm_state_mgr.c =================================================================== --- opensm/osm_sm_state_mgr.c (revision 3595) +++ opensm/osm_sm_state_mgr.c (working copy) @@ -587,6 +587,13 @@ osm_sm_state_mgr_process( __osm_sm_state_mgr_start_polling( p_sm_mgr ); break; + case OSM_SM_SIGNAL_HANDOVER: + /* + * Do nothing. We will discover it later on. If we already discovered + * this SM, and got the HANDOVER - this means the remote SM is of + * lower priority. In this case we will stop polling it (since it is + * a lower priority SM in STANDBY state). + */ default: __osm_sm_state_mgr_signal_error( p_sm_mgr, signal ); status = IB_INVALID_PARAMETER; From tziporet at mellanox.co.il Wed Sep 28 04:14:54 2005 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 28 Sep 2005 14:14:54 +0300 Subject: [openib-general] Re: [O-MPI devel] [PATCH] Update Open MPI fo r new libibverbs API Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30A9549@mtlexch01.mtl.com> >From Roland: > OpenIB has not done an "official" release of any userspace components, > so this falls into the category of prerelease API breakage. > > New kernels will require a new libibverbs, so the number of obsolete > old development versions should decrease fairly quickly. 
Hi Roland, When do you expect openib to have an official userspace release so we can count on a stable API? Tziporet From mst at mellanox.co.il Wed Sep 28 04:37:18 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 28 Sep 2005 14:37:18 +0300 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <433827FF.3010601@ichips.intel.com> References: <433827FF.3010601@ichips.intel.com> Message-ID: <20050928113718.GB8114@mellanox.co.il> Quoting Sean Hefty : > Subject: Re: [openib-general][RFC]: CMA IB implementation > > Guy German wrote: > > I believe that ib_at is still a valuable module even if ATS reverse ARP > > is broken, and I think we should discuss this. > > Here's my thinking on this. ATS is broken as you mentioned for > reverse lookups. However, if we want to keep ATS, I think that ATS > registration/deregistration should be integrated with IPoIB. To keep > it separate, we will need to patch net_device to provide an rdma_ptr > as suggested by Roland. I *think* that having rdma_ptr might be useful in its own right, as a nicer way to get at the ipoib private data, which we need anyway, don't we? I am not, however, sure what rdma_ptr should point to? Some kind of structure including ca, port and pkey? On a side note, I wonder whether ATS can be split into a separate module so that people that don't need it can avoid loading it. MST -- MST From tziporet at mellanox.co.il Wed Sep 28 04:43:29 2005 From: tziporet at mellanox.co.il (Tziporet Koren) Date: Wed, 28 Sep 2005 14:43:29 +0300 Subject: [openib-general] Mellanox verification team is starting to ch eck in to the SVN tests Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30A954A@mtlexch01.mtl.com> Roland> Can you post more details or better still fixes for the bugs? Jack is working to fix the bugs and he send patches Tziporet
From dotanb at mellanox.co.il Wed Sep 28 04:52:46 2005 From: dotanb at mellanox.co.il (Dotan Barak) Date: Wed, 28 Sep 2005 14:52:46 +0300 Subject: [openib-general] Mellanox verification team is starting to ch eck in to the SVN tests Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E319B0EB@mtlexch01.mtl.com> Hi Roland. Thanx for the feedback, we thought about the VL one more time after your email ... > > Do we really need an OS abstraction library? Can't we just use libc? We need the VL for several main reasons: * we write tests for both Linux (gen2 stack) and Windows (IBAL stack), and we want a common API for OS calls * we want to have functionality that you don't want to add to the driver (for example: enum values to string) * we want to have functionality that cannot be found in other place (for example: random in kernel level) > Can you post more details or better still fixes for the bugs? Later I will send an email with the test command lines + the bugs they reveal. > Would it be possible to update the tests for the new CQ API? Fixed. Thanx Dotan From glebn at voltaire.com Wed Sep 28 05:06:28 2005 From: glebn at voltaire.com (Gleb Natapov) Date: Wed, 28 Sep 2005 15:06:28 +0300 Subject: [openib-general] Mellanox verification team is starting to ch eck in to the SVN tests In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E319B0EB@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E319B0EB@mtlexch01.mtl.com> Message-ID: <20050928120628.GJ31949@minantech.com> On Wed, Sep 28, 2005 at 02:52:46PM +0300, Dotan Barak wrote: > * we want to have functionality that cannot be found in other place (for > example: random in kernel level) You have random in linux kernel. Look for get_random_bytes(). -- Gleb.
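Dotan lists enum-to-string conversion as one reason for keeping the VL. A table-driven helper of that kind needs nothing beyond standard C. The sketch below uses a hypothetical enum written for illustration; C99 designated initializers tie each string to its enumerator, and out-of-range values map to "UNKNOWN" instead of indexing past the end of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum used only for illustration. */
enum sm_state {
    SM_STATE_NOTACTIVE,
    SM_STATE_DISCOVERING,
    SM_STATE_STANDBY,
    SM_STATE_MASTER,
};

/* Designated initializers keep each string next to its enumerator, so
 * reordering the enum cannot silently misalign the table. */
static const char *const sm_state_names[] = {
    [SM_STATE_NOTACTIVE]   = "NOTACTIVE",
    [SM_STATE_DISCOVERING] = "DISCOVERING",
    [SM_STATE_STANDBY]     = "STANDBY",
    [SM_STATE_MASTER]      = "MASTER",
};

static const char *sm_state_str(enum sm_state s)
{
    size_t n = sizeof(sm_state_names) / sizeof(sm_state_names[0]);

    /* Guard against values outside the table and unnamed gaps. */
    if ((size_t)s >= n || sm_state_names[s] == NULL)
        return "UNKNOWN";
    return sm_state_names[s];
}
```

For example, sm_state_str(SM_STATE_MASTER) yields "MASTER".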
From halr at voltaire.com Wed Sep 28 05:01:17 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Sep 2005 08:01:17 -0400 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <20050928113718.GB8114@mellanox.co.il> References: <433827FF.3010601@ichips.intel.com> <20050928113718.GB8114@mellanox.co.il> Message-ID: <1127908877.4384.22.camel@hal.voltaire.com> On Wed, 2005-09-28 at 07:37, Michael S. Tsirkin wrote: > Quoting Sean Hefty : > > Subject: Re: [openib-general][RFC]: CMA IB implementation > > > > Guy German wrote: > > > I believe that ib_at is still a valuable module even if ATS reverse ARP > > > is broken, and I think we should discuss this. > > > > Here's my thinking on this. ATS is broken as you mentioned for > > reverse lookups. However, if we want to keep ATS, I think that ATS > > registration/deregistration should be integrated with IPoIB. To keep > > it separate, we will need to patch net_device to provide an rdma_ptr > > as suggested by Roland. > > I *think* that having rdma_ptr might be useful in its own right, > as a nicer way to get at the ipoib private data, which we need anyway, > don't we? > I am not, however, sure what rdma_ptr should point to? > Some kind of structure including ca, port and pkey? In the case of IPoIB, it could point to the IPoIB netdevice struct and have an IPoIB exported function to return these parameters (based on rdma_ptr passed in)? > On a side note, I wonder whether ATS can be split into a separate module > so that people that don't need it can avoid loading it. It is a separate module so I don't understand what you are saying. The only people needing this are running kDAPL, uDAPL, or iSER currently. -- Hal From mst at mellanox.co.il Wed Sep 28 05:45:02 2005 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 28 Sep 2005 15:45:02 +0300 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <1127908877.4384.22.camel@hal.voltaire.com> References: <1127908877.4384.22.camel@hal.voltaire.com> Message-ID: <20050928124502.GE8114@mellanox.co.il> Quoting Hal Rosenstock : > > On a side note, I wonder whether ATS can be split into a separate > > module so that people that don't need it can avoid loading it. > > It is a separate module so I don't understand what you are saying. > The only people needing this are running kDAPL, uDAPL, or iSER > currently. > > -- Hal > I mean, make ATS a separate module from AT. -- MST From halr at voltaire.com Wed Sep 28 05:39:18 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Sep 2005 08:39:18 -0400 Subject: [openib-general] Re: [PATCH] Opensm - accept HANDOVER during DISOVERY In-Reply-To: <5zvf0lwiwk.fsf@mtl066.yok.mtl.com> References: <5zvf0lwiwk.fsf@mtl066.yok.mtl.com> Message-ID: <1127911156.4380.8.camel@hal.voltaire.com> On Wed, 2005-09-28 at 07:06, Yael Kalka wrote: > Here is a patch for the valid signaling you found. > If we receive a HANDOVER signal during DISCOVERY it can just be > ignored (continue the discovering). Thanks. That made it easy but is a partial patch. In the future, please indicate whether patches are tested or not. Below is an updated patch for this. -- Hal Index: osm_sm_state_mgr.c =================================================================== --- osm_sm_state_mgr.c (revision 3590) +++ osm_sm_state_mgr.c (working copy) @@ -587,6 +587,14 @@ osm_sm_state_mgr_process( __osm_sm_state_mgr_start_polling( p_sm_mgr ); break; + case OSM_SM_SIGNAL_HANDOVER: + /* + * Do nothing. We will discover it later on. If we already discovered + * this SM, and got the HANDOVER - this means the remote SM is of + * lower priority. In this case we will stop polling it (since it is + * a lower priority SM in STANDBY state).
+ */ + break; default: __osm_sm_state_mgr_signal_error( p_sm_mgr, signal ); status = IB_INVALID_PARAMETER; @@ -798,6 +806,7 @@ osm_sm_state_mgr_check_legality( case OSM_SM_SIGNAL_DISCOVERY_COMPLETED: case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED: case OSM_SM_SIGNAL_MASTER_OR_HIGHER_SM_DETECTED_DONE: + case OSM_SM_SIGNAL_HANDOVER: status = IB_SUCCESS; break; default: From guyg at voltaire.com Wed Sep 28 06:24:47 2005 From: guyg at voltaire.com (Guy German) Date: Wed, 28 Sep 2005 16:24:47 +0300 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <20050928124502.GE8114@mellanox.co.il> References: <1127908877.4384.22.camel@hal.voltaire.com> <20050928124502.GE8114@mellanox.co.il> Message-ID: <433A999F.9020000@voltaire.com> Michael S. Tsirkin wrote: > I mean, make ATS a separate module from AT. This is how it is done in the Voltaire stack. However, I think there are already a lot of modules in openib. But the main question is: does openib want to support ATS ARP? Do we also want to support ATS registration/deregistration? openib can support, for example, only the ATS arp and rely on the openib-less targets to do their own registrations. Guy From jackm at mellanox.co.il Wed Sep 28 06:41:07 2005 From: jackm at mellanox.co.il (Jack Morgenstein) Date: Wed, 28 Sep 2005 16:41:07 +0300 Subject: [openib-general] [PATCH] [mthca]: fixed fields in query_port Message-ID: <20050928134107.GA23849@mellanox.co.il> Still need to fix up max_vl_num (with this fix, the encoded value is returned). Should change this field to an enumerated type in struct ib_port_attr. My next patch will do this (in user level as well).
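Jack notes that max_vl_num still carries the encoded PortInfo value, and in the patch that follows max_mtu and active_mtu are likewise stored as encoded values. Until the field becomes an enumerated type, a consumer has to decode it. A minimal sketch, assuming the standard PortInfo encodings (VLCap: 1 = VL0, 2 = VL0-1, 3 = VL0-3, 4 = VL0-7, 5 = VL0-14; MTU: 1 = 256 through 5 = 4096 bytes); both helper names are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical decoder for the PortInfo VLCap encoding: returns the number
 * of data VLs (1, 2, 4, 8 or 15), or 0 for reserved encodings.
 */
static int vl_cap_to_num(unsigned int enc)
{
    static const int num_vls[] = { 0, 1, 2, 4, 8, 15 };

    return enc < sizeof(num_vls) / sizeof(num_vls[0]) ? num_vls[enc] : 0;
}

/*
 * Hypothetical decoder for the MTU encoding: 1 -> 256 ... 5 -> 4096 bytes,
 * i.e. 128 << enc; returns 0 for reserved encodings.
 */
static int mtu_enum_to_bytes(unsigned int enc)
{
    return (enc >= 1 && enc <= 5) ? 128 << enc : 0;
}
```

A lookup table suits VLCap because its values (1, 2, 4, 8, 15) do not follow a single shift, while the MTU encoding is a clean power of two and can be computed directly.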
Jack Signed-off-by: Jack Morgenstein Index: linux-kernel/infiniband/hw/mthca/mthca_provider.c =================================================================== --- linux-kernel/infiniband/hw/mthca/mthca_provider.c (revision 3590) +++ linux-kernel/infiniband/hw/mthca/mthca_provider.c (working copy) @@ -152,9 +152,14 @@ props->gid_tbl_len = to_mdev(ibdev)->limits.gid_table_len; props->max_msg_sz = 0x80000000; props->pkey_tbl_len = to_mdev(ibdev)->limits.pkey_table_len; + props->bad_pkey_cntr = be16_to_cpup((__be16 *) (out_mad->data + 46)); props->qkey_viol_cntr = be16_to_cpup((__be16 *) (out_mad->data + 48)); props->active_width = out_mad->data[31] & 0xf; props->active_speed = out_mad->data[35] >> 4; + props->max_mtu = out_mad->data[41] & 0xf; + props->active_mtu = out_mad->data[36] >> 4; + props->max_vl_num = out_mad->data[37] >> 4; + props->subnet_timeout = out_mad->data[51] & 0x1f; out: kfree(in_mad); From Federico.Sacerdoti at deshaw.com Wed Sep 28 06:40:10 2005 From: Federico.Sacerdoti at deshaw.com (Sacerdoti, Federico) Date: Wed, 28 Sep 2005 09:40:10 -0400 Subject: [openib-general] segfault on openib mvapich Message-ID: Thank you for your replies. It is helpful to know that you see no problems. I will continue playing with my config. For what it's worth, the error happens in process/pmgr_client_mpirun_rsh.c. Here is a traceback from gdb: # Command: # mpirun_rsh -ssh -debug -np 2 -hostfile ../../machines.txt # /u/fds/run/gen2/simple/mp This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/tls/libthread_db.so.1". (gdb) run Starting program: /u/fds/run/gen2/simple/mp Program received signal SIGSEGV, Segmentation fault.
0x0000003347d711c0 in bzero () from /lib64/tls/libc.so.6 (gdb) bt #0 0x0000003347d711c0 in bzero () from /lib64/tls/libc.so.6 #1 0x0000000000419d6c in pmgr_client_init () #2 0x000000000041ff36 in MPID_VIA_Init () #3 0x0000000000415962 in MPID_Init () #4 0x0000000000402059 in MPIR_Init () #5 0x0000000000401ea4 in main (argc=1, argv=0x7fffff819e38) at mp.c:8 (gdb) I will try to turn on -g on mpirun_rsh to get better debugging info. -Federico -----Original Message----- From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu] Sent: Tuesday, September 27, 2005 7:19 PM To: Roland Dreier Cc: Sacerdoti, Federico; openib-general at openib.org Subject: Re: [openib-general] segfault on openib mvapich Federico, > Federico> I might have done something wrong, but tried to build > Federico> using a plain source from the openib gen2 svn tree and > Federico> Pete's patches (those that were not rejected). > > For whatever it's worth, basic MVAPICH tests like osu_bw work fine for > me with two and even four processes on two x86_64 machines. FYI, we are also running the latest version successfully on multiple platforms (IA32, Opetron and EM64T) of different sizes. We are also able to run applications successfully. To the best of our knowledge, many other organizations are also running mvapich-gen2 successfully on their platforms. > Federico> Adding the -debug flag to mpirun_rsh does not help (the > Federico> xterms flash on then dissapear). The ssh connections are > Federico> started fine, but the segfault happens early on. > > Without more data like a traceback from a core file or something like > that, it's going to be very difficult for anyone to debug this. As Roland indicates, could you please provide more details on the platform, OpenIB version (kernel, userlib), and the errors you are getting. This will help to debug the problem further and faster. 
> Also, it might be worth contacting the MVAPICH developers by emailing > mvapich_request -- they are much more likely to be able to help than > the openib-general community. We at OSU are monitoring the OpenIB list for mvapich-gen2 related questions and are answering them. In addition, if you can send a copy to mvapich-help at cse.ohio-state.edu (not mvapich_request), we will be able to respond even faster. Thanks, DK > - R. > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From dotanb at mellanox.co.il Wed Sep 28 06:43:01 2005 From: dotanb at mellanox.co.il (Dotan Barak) Date: Wed, 28 Sep 2005 16:43:01 +0300 Subject: [openib-general] some bugs that can be found using the gen2_basic in the contrib/m ellanox folder Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E319B157@mtlexch01.mtl.com> 1) ibv_query_gid does not check port number and gid table index % ./gen2_basic -c=HCA -t=5 2) create ah with illegal port number in struct ibv_ah_attr % ./gen2_basic -c=AV -t=1 3) register mr with illegal permission (only remote read / write / atom is enabled without local write) % ./gen2_basic -c=MR -t=1 Dotan Barak Software Verification Engineer Mellanox Technologies LTD Tel: +972-4-9097200 Ext: 231 Fax: +972-4-9593245 P.O. Box 86 Yokneam 20692 ISRAEL. Home: +972-77-8841095 Cell: 052-4222383 [ May the fork be with you ] From dotanb at mellanox.co.il Wed Sep 28 06:44:19 2005 From: dotanb at mellanox.co.il (Dotan Barak) Date: Wed, 28 Sep 2005 16:44:19 +0300 Subject: [openib-general] Mellanox verification team is starting to ch eck in to the SVN tests Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E319B159@mtlexch01.mtl.com> > You have random in Linux kernel. Look for get_random_bytes().
> this was only an example ... Dotan From rolandd at cisco.com Wed Sep 28 06:44:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 28 Sep 2005 06:44:55 -0700 Subject: [openib-general] Re: [git pull] InfiniBand fixes for 2.6.14 In-Reply-To: <20050928093633.GA12757@kroah.com> (Greg KH's message of "Wed, 28 Sep 2005 02:36:33 -0700") References: <524q85on6e.fsf@cisco.com> <20050928093633.GA12757@kroah.com> Message-ID: <52zmpxmhm0.fsf@cisco.com> Greg> I didn't think that git pulls were going to be allowed from Greg> subsystem maintainers after -rc1 came out. After that, Greg> patches by email were required to be sent, not git pulls. Greg> This does cause a bit more work for the maintainer, but it Greg> ensures that they only send the patches they really want to Greg> get in. I specifically asked Linus about this a couple of weeks ago, and he said that bug-fix-only git merges are fine. See http://lkml.org/lkml/2005/9/13/277 - R. From halr at voltaire.com Wed Sep 28 06:39:13 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Sep 2005 09:39:13 -0400 Subject: [openib-general] Re: Page allocation failures & kdapltest oops In-Reply-To: References: <1127749644.4398.878.camel@hal.voltaire.com> <1127839244.4436.7.camel@localhost.localdomain> Message-ID: <1127914752.4380.120.camel@hal.voltaire.com> On Tue, 2005-09-27 at 12:53, James Lentini wrote: > On Tue, 27 Sep 2005, Hal Rosenstock wrote: > > > > Since we don't check for a kmalloc failure in DT_Tdep_PT_Printf, this > > > oops occurs: > > > > > > > Sep 26 10:29:30 hal kernel: Unable to handle kernel NULL pointer > > > > dereference at virtual address 00000004 > > > > > > I've checked in the patch below to fix that, but this is not the root > > > of the problem. > > > > I'll try it with the patch and let you know how it behaves. When it > > still runs out of memory will it fail more gracefully ? I understand it
I understand it > > won't fix the root cause of running out of memory. > > It should behave more gracefully. Thanks for testing. That seems better but I still see the following: Sep 28 09:33:07 hal kernel: teback:0 unstable:0 free:420 slab:29838 mapped:28019 pagetables:487 Sep 28 09:33:07 hal kernel: DMA free:1008kB min:128kB low:160kB high:192kB active:3560kB inactive:1596kB present:16384kB pages_scanned:0 all_unreclaimable? no Sep 28 09:33:07 hal kernel: lowmem_reserve[]: 0 240 240 Sep 28 09:33:07 hal kernel: Normal free:672kB min:1920kB low:2400kB high:2880kB active:90152kB inactive:25992kB present:245760kB pages_scanned:91 all_unreclaimable? no Sep 28 09:33:07 hal kernel: lowmem_reserve[]: 0 0 0 Sep 28 09:33:07 hal kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no Sep 28 09:33:07 hal kernel: lowmem_reserve[]: 0 0 0 Sep 28 09:33:07 hal kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1008kB Sep 28 09:33:07 hal kernel: Normal: 0*4kB 0*8kB 0*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 672kB Sep 28 09:33:07 hal kernel: HighMem: empty Sep 28 09:33:07 hal kernel: Swap cache: add 19875, delete 15824, find 4973/5982, race 0+0 Sep 28 09:33:07 hal kernel: Free swap = 483892kB Sep 28 09:33:07 hal kernel: Total swap = 522104kB Sep 28 09:33:07 hal kernel: Free swap: 483892kB Sep 28 09:33:09 hal kernel: 65536 pages of RAM Sep 28 09:33:10 hal kernel: 0 pages of HIGHMEM Sep 28 09:33:10 hal kernel: 1533 reserved pages Sep 28 09:33:11 hal kernel: 47248 pages shared Sep 28 09:33:11 hal kernel: 4051 pages swap cached Sep 28 09:33:11 hal kernel: 0 pages dirty Sep 28 09:33:11 hal kernel: 0 pages writeback Sep 28 09:33:11 hal kernel: 28019 pages mapped Sep 28 09:33:11 hal kernel: 29838 pages slab Sep 28 09:33:11 hal kernel: 487 pages pagetables Sep 28 09:33:11 hal kernel: DT_Tdep_PT_Printf: out of memory Sep 28 09:33:11 hal 
kernel: DT_Mdep_Thread_: page allocation failure. order:0, mode:0x20 Sep 28 09:33:11 hal kernel: [] __alloc_pages+0x2f2/0x490 Sep 28 09:33:11 hal kernel: [] kmem_getpages+0x31/0xb0 Sep 28 09:33:11 hal kernel: [] cache_grow+0x139/0x360 Sep 28 09:33:11 hal kernel: [] cache_alloc_refill+0x151/0x340 Sep 28 09:33:11 hal kernel: [] DT_handle_send_op+0x2fa/0x400 [kdapltest] Sep 28 09:33:11 hal kernel: [] __kmalloc+0xb4/0xf0 Sep 28 09:33:11 hal kernel: [] DT_Mdep_Malloc+0x25/0x60 [kdapltest] Sep 28 09:33:11 hal kernel: [] DT_Tdep_PT_Printf+0x16/0x1d0 [kdapltest] Sep 28 09:33:11 hal kernel: [] DT_Transaction_Run+0x2c8/0xb60 [kdapltest] Sep 28 09:33:11 hal kernel: [] DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest] Sep 28 09:33:11 hal kernel: [] DT_Mdep_wait_object_wakeup+0x1d/0x30 [kdapltest] Sep 28 09:33:11 hal kernel: [] DT_Transaction_Main+0x1388/0x21a0 [kdapltest] Sep 28 09:33:11 hal kernel: [] __change_page_attr+0x2d/0x170 Sep 28 09:33:11 hal kernel: [] cache_free_debugcheck+0x196/0x2d0 Sep 28 09:33:11 hal kernel: [] DT_Mdep_Thread_Start_Routine+0x1f/0x30 [kdapltest] Sep 28 09:33:11 hal kernel: [] DT_Mdep_Thread_Start_Routine+0x0/0x30 [kdapltest] Sep 28 09:33:11 hal kernel: [] kernel_thread_helper+0x5/0x10 Sep 28 09:33:11 hal kernel: DMA per-cpu: Sep 28 09:33:11 hal kernel: cpu 0 hot: low 2, high 6, batch 1 used:2 Sep 28 09:33:11 hal kernel: cpu 0 cold: low 0, high 2, batch 1 used:1 Sep 28 09:33:11 hal kernel: Normal per-cpu: Sep 28 09:33:11 hal kernel: cpu 0 hot: low 62, high 186, batch 31 used:92 Sep 28 09:33:11 hal kernel: cpu 0 cold: low 0, high 62, batch 31 used:34 Sep 28 09:33:11 hal kernel: HighMem per-cpu: empty Sep 28 09:33:11 hal kernel: Free pages: 1680kB (0kB HighMem) Sep 28 09:33:11 hal kernel: Active:23428 inactive:6897 dirty:0 writeback:0 unstable:0 free:420 slab:29838 mapped:28019 pagetables:487 Sep 28 09:33:11 hal kernel: DMA free:1008kB min:128kB low:160kB high:192kB active:3560kB inactive:1596kB present:16384kB pages_scanned:0 all_unreclaimable? 
no Sep 28 09:33:11 hal kernel: lowmem_reserve[]: 0 240 240 Sep 28 09:33:11 hal kernel: Normal free:672kB min:1920kB low:2400kB high:2880kB active:90152kB inactive:25992kB present:245760kB pages_scanned:91 all_unreclaimable? no Sep 28 09:33:11 hal kernel: lowmem_reserve[]: 0 0 0 Sep 28 09:33:11 hal kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no Sep 28 09:33:11 hal kernel: lowmem_reserve[]: 0 0 0 Sep 28 09:33:11 hal kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1008kB Sep 28 09:33:11 hal kernel: Normal: 0*4kB 0*8kB 0*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 672kB Sep 28 09:33:11 hal kernel: HighMem: empty Sep 28 09:33:11 hal kernel: Swap cache: add 19875, delete 15824, find 4973/5982, race 0+0 Sep 28 09:33:11 hal kernel: Free swap = 483892kB Sep 28 09:33:11 hal kernel: Total swap = 522104kB Sep 28 09:33:11 hal kernel: Free swap: 483892kB Sep 28 09:33:11 hal kernel: 65536 pages of RAM Sep 28 09:33:11 hal kernel: 0 pages of HIGHMEM Sep 28 09:33:12 hal kernel: 1533 reserved pages Sep 28 09:33:12 hal kernel: 47248 pages shared Sep 28 09:33:12 hal kernel: 4051 pages swap cached Sep 28 09:33:12 hal kernel: 0 pages dirty Sep 28 09:33:12 hal kernel: 0 pages writeback Sep 28 09:33:12 hal kernel: 28019 pages mapped Sep 28 09:33:12 hal kernel: 29838 pages slab Sep 28 09:33:12 hal kernel: 487 pages pagetables Sep 28 09:33:12 hal kernel: DT_Tdep_PT_Printf: out of memory Sep 28 09:33:12 hal kernel: DT_Mdep_Thread_: page allocation failure. 
order:0, mode:0x20 Sep 28 09:33:12 hal kernel: [] __alloc_pages+0x2f2/0x490 Sep 28 09:33:12 hal kernel: [] kmem_getpages+0x31/0xb0 Sep 28 09:33:12 hal kernel: [] cache_grow+0x139/0x360 Sep 28 09:33:12 hal kernel: [] vscnprintf+0x2b/0x40 Sep 28 09:33:12 hal kernel: [] cache_alloc_refill+0x151/0x340 Sep 28 09:33:12 hal kernel: [] __kmalloc+0xb4/0xf0 Sep 28 09:33:12 hal kernel: [] DT_Mdep_Malloc+0x25/0x60 [kdapltest] Sep 28 09:33:12 hal kernel: [] DT_Mdep_Malloc+0x25/0x60 [kdapltest] Sep 28 09:33:12 hal kernel: ... Also, I don't understand why: kdapltest -T T -s -D mthca0a -d -t 2 -w 8 -i 20 client SR server SR would work and kdapltest -T T -s -D mthca0a -d -i 10000 -w 8 client SR server SR would fail. It seems the former is more strenuous (everything same but 2 threads and less iterations). -- Hal From nacc at us.ibm.com Wed Sep 28 07:12:26 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Wed, 28 Sep 2005 07:12:26 -0700 Subject: [openib-general] InfiniBand compilation testing In-Reply-To: <20050924181108.GB28695@us.ibm.com> References: <20050924074611.GD3950@us.ibm.com> <52psqy8jt2.fsf@cisco.com> <20050924181108.GB28695@us.ibm.com> Message-ID: <20050928141226.GC5791@us.ibm.com> On 24.09.2005 [11:11:08 -0700], Nishanth Aravamudan wrote: > On 24.09.2005 [10:19:53 -0700], Roland Dreier wrote: > > Nish> I have a prototype of something similar running right now, > > Nish> to help test InfiniBand, both in mainline and in the svn > > Nish> repo. Basically, every night (this part hasn't been set up > > Nish> yet, but should be nothing more than a crontab entry), I can > > Nish> spawn a build job for InfiniBand. Currently, it will only > > Nish> cover compile-testing in the following sense: build current > > Nish> -git with IB options set to =y and =m in x86 and ppc64; and > > Nish> build current -git with the current svn code linked and IB > > Nish> options set to =y and =m in x86 and ppc64. > > > > This is great, thanks!
The build of latest git + latest svn might not > > always succeed, because we try to keep svn working with the latest > > full kernel release, but it's still very helpful to get advance > > warning of API changes that will break our tree. > > > > Nish> I have attached below my results from 2.6.14-rc2-git3. Only > > Nish> build failure was the gen2 kernel code under ppc64 with > > Nish> everything set to y. > > > > I just checked in a fix for this -- the pci_pretty_name() API has gone > > away, so I removed our use of it in svn. I don't understand how your > > other builds of git + svn succeeded though, since pci_pretty_name is > > completely gone. Oh, I guess you'll miss link failures when building > > modules, so functions that disappear won't break the build. Still, > > how did the x86 =y build succeed? > > And, in fact, the x86 =y build also fails, same issue (now that I've > found a consistently working machine, shouldn't run into the gcc > problems again; we tend not to update the test machines). Just an FYI to everyone, I haven't run my tests for the past two days, as it seems Linus' tree is stuck at -git6. I guess I can just run the svn components. I'll reply later with the results. 
Thanks, Nish From jlentini at netapp.com Wed Sep 28 08:13:29 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 28 Sep 2005 11:13:29 -0400 (EDT) Subject: [openib-general] Re: [PATCH] uDAPL build fix for OS vendor variations of IA64_FETCHADD In-Reply-To: References: Message-ID: Hi Arlin, A couple of questions: > Index: dapl/udapl/linux/dapl_osd.h > =================================================================== > --- dapl/udapl/linux/dapl_osd.h (revision 3541) > +++ dapl/udapl/linux/dapl_osd.h (working copy) > @@ -83,7 +83,6 @@ > #include > #endif > > - > /* Useful debug definitions */ > #ifndef STATIC > #define STATIC static > @@ -156,13 +155,17 @@ > #ifdef __ia64__ > DAT_COUNT old_value; > > -#if OS_VERSION >= LINUX_VERSION(2,6) > - IA64_FETCHADD (old_value,v,1,4,rel); > +#ifndef REDHAT_EL4 > +# if OS_RELEASE >= LINUX_VERSION(2,6) > + IA64_FETCHADD(old_value,v,1,4,rel); > +# else > + IA64_FETCHADD(old_value,v,1,4); > +# endif > #else > - IA64_FETCHADD (old_value,v,1,4); > + IA64_FETCHADD(old_value,v,1,4); > #endif Previously, if we were on Linux => 2.6, we used the 5 parameter version, otherwise we used the 4 parameter version. Why don't we continue to use the 5 parameter version if we are on Linux => 2.6 and not REHHAT_EL4? > > -#else /* !__ia64__ */ > +#else Why remove /* !__ia64__ */? > __asm__ __volatile__ ( > "lock;" "incl %0" > :"=m" (*v) > @@ -184,13 +187,17 @@ > #ifdef __ia64__ > DAT_COUNT old_value; > > -#if OS_VERSION >= LINUX_VERSION(2,6) > - IA64_FETCHADD (old_value,v,-1,4,rel); > +#ifndef REDHAT_EL4 > +# if OS_RELEASE >= LINUX_VERSION(2,6) > + IA64_FETCHADD(old_value,v,-1,4,rel); > +# else > + IA64_FETCHADD(old_value,v,-1,4); > +# endif > #else > - IA64_FETCHADD (old_value,v,-1,4); > + IA64_FETCHADD(old_value,v,-1,4); Why not continue to use the 5 parameter version if we are on Linux => 2.6 and not REHHAT_EL4? > #endif > > -#else /* !__ia64__ */ > +#else Why remove /* !__ia64__ */? 
> __asm__ __volatile__ ( > "lock;" "decl %0" > :"=m" (*v) > @@ -227,9 +234,11 @@ > */ > > #ifdef __ia64__ > - > -current_value = ia64_cmpxchg("acq",v,match_value,new_value,4); > - > +#ifdef REDHAT_EL4 > + current_value = ia64_cmpxchg("acq",v,match_value,new_value,4); > +#else > + current_value = ia64_cmpxchg(acq,v,match_value,new_value,4); > +#endif > #else > __asm__ __volatile__ ( > "lock; cmpxchgl %1, %2" Everything else looks good. From jlentini at netapp.com Wed Sep 28 08:37:39 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 28 Sep 2005 11:37:39 -0400 (EDT) Subject: [openib-general] Re: Page allocation failures & kdapltest oops In-Reply-To: <1127914752.4380.120.camel@hal.voltaire.com> References: <1127749644.4398.878.camel@hal.voltaire.com> <1127839244.4436.7.camel@localhost.localdomain> <1127914752.4380.120.camel@hal.voltaire.com> Message-ID: On Wed, 28 Sep 2005, Hal Rosenstock wrote: halr> That seems better but I still see the following: The failure is as expected, but at least it was graceful. halr> Also, I don't understand why: halr> kdapltest -T T -s -D mthca0a -d -t 2 -w 8 -i 20 client SR server SR halr> would work and halr> kdapltest -T T -s -D mthca0a -d -i 10000 -w 8 client SR server SR halr> would fail. It seems the former is more strenuous (everything same but 2 halr> threads and less iterations). My hypothesis is that each iteration allocates memory that isn't free'd until the test is over. I haven't been able to reproduce this yet, even when I turn on DEBUG_PAGEALLOC. What are the contents of /etc/cpuinfo and /etc/meminfo on this system? 
james From rolandd at cisco.com Wed Sep 28 08:52:12 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 28 Sep 2005 08:52:12 -0700 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <433A999F.9020000@voltaire.com> (Guy German's message of "Wed, 28 Sep 2005 16:24:47 +0300") References: <1127908877.4384.22.camel@hal.voltaire.com> <20050928124502.GE8114@mellanox.co.il> <433A999F.9020000@voltaire.com> Message-ID: <52mzlxmbpv.fsf@cisco.com> Guy> But the main question is: does openib wants to support ATS Guy> arp ? do we also want to support ATS Guy> registration/deregistration ? openib can support, for Guy> example, only the ATS arp and rely on the openib-less targets Guy> to do their own registrations. I would certainly prefer to forget all about ATS in every form. - R. From jlentini at netapp.com Wed Sep 28 08:59:12 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 28 Sep 2005 11:59:12 -0400 (EDT) Subject: [openib-general] [CMA][PATCH] comment fix Message-ID: Signed-off-by: James Lentini Index: include/rdma/rdma_cma.h =================================================================== --- include/rdma/rdma_cma.h (revision 3600) +++ include/rdma/rdma_cma.h (working copy) @@ -117,7 +117,7 @@ /** * rdma_cma_listen - this function is called by the passive side to - * listen on a the specified address for incoming connection requests. + * listen on the specified address for incoming connection requests. */ int rdma_cma_listen(struct rdma_cma_id *cma_id, struct sockaddr *addr);
From jlentini at netapp.com Wed Sep 28 09:10:30 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 28 Sep 2005 12:10:30 -0400 (EDT) Subject: [openib-general] [PATCH] [CMA] [RFC] add routine to transition a QP to INIT state In-Reply-To: References: Message-ID: On Tue, 27 Sep 2005, Sean Hefty wrote: > Index: include/rdma/rdma_cma.h > =================================================================== > --- include/rdma/rdma_cma.h (revision 3568) > +++ include/rdma/rdma_cma.h (working copy) > @@ -93,8 +93,14 @@ int rdma_cma_resolve_route(struct rdma_c > struct sockaddr *src_addr, struct sockaddr *dst_addr, > int timeout_ms); > > +/** > + * rdma_cma_init_qp - Associates a QP with a CMA identifier and initializes the > + * QP for use in establishing a connection. > + */ > +int rdma_cma_init_qp(struct rdma_cma_id *cma_id, struct ib_qp *qp, > + int qp_access_flags); > + How will the qp_access_flags be implemented in a transport neutral way? If iWARP doesn't support the ib_access_flags values, a transport neutral consumer can't pass them here. Can we remove the need for a consumer to call rdma_cma_init_qp altogether? Can we create a QP and move it to the init state when the consumer creates their rdma_cma_id? From mamidala at cse.ohio-state.edu Wed Sep 28 09:15:09 2005 From: mamidala at cse.ohio-state.edu (amith rajith mamidala) Date: Wed, 28 Sep 2005 12:15:09 -0400 (EDT) Subject: [openib-general] [PATCH] Fix MVAPICH compile with gcc4 In-Reply-To: <52ll1jsf1s.fsf_-_@cisco.com> Message-ID: Hi, We have incorporated the gcc version 4 patch into the openib svn. Thanks, Amith,Weikuan On Mon, 26 Sep 2005, Roland Dreier wrote: > gcc version 4 doesn't like the extern declaration of free_vbuf_head to > be followed by a static declaration in vbuf.c. To fix this, we can just > get rid of the declaration in vbuf.h, since free_vbuf_head is not used > outside of vbuf.c.
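To illustrate the gcc 4 problem described above: gcc 4 rejects a file-scope `static` definition that follows an `extern` declaration of the same name (the diagnostic reads along the lines of "static declaration of 'free_vbuf_head' follows non-static declaration"). The sketch below shows the shape after the fix, with the free-list head private to one file and reached only through accessor functions. The `vbuf` layout and the function bodies are hypothetical, not MVAPICH's actual vbuf.c.

```c
#include <stddef.h>

/* Hypothetical stand-in for MVAPICH's vbuf; the real struct differs. */
typedef struct vbuf {
    struct vbuf *next;  /* free-list link */
    char data[64];      /* payload area (size chosen arbitrarily) */
} vbuf;

/* File-local free list. With no "extern vbuf *free_vbuf_head;" left in
 * the header, this static definition no longer conflicts with anything. */
static vbuf *free_vbuf_head = NULL;

/* Pop a buffer from the free list, or return NULL if it is empty. */
vbuf *get_vbuf(void)
{
    vbuf *v = free_vbuf_head;
    if (v != NULL)
        free_vbuf_head = v->next;
    return v;
}

/* Push a buffer back onto the free list (LIFO order). */
void release_vbuf(vbuf *v)
{
    v->next = free_vbuf_head;
    free_vbuf_head = v;
}
```

Other files only ever call `get_vbuf()` and `release_vbuf()`, which is why the header no longer needs to mention the list head at all.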
> > Signed-off-by: Roland Dreier > > --- mpid/ch_gen2/vbuf.h (revision 3549) > +++ mpid/ch_gen2/vbuf.h (working copy) > @@ -188,8 +188,6 @@ void allocate_vbufs(void); > > void deallocate_vbufs(void); > > -extern vbuf *free_vbuf_head; > - > vbuf *get_vbuf(void); > void release_vbuf(vbuf * v); > void vbuf_init_send(vbuf * v, unsigned long len); > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From halr at voltaire.com Wed Sep 28 09:19:17 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Sep 2005 12:19:17 -0400 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <43397D00.6080505@ichips.intel.com> References: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> <43381CFC.7070508@voltaire.com> <433827FF.3010601@ichips.intel.com> <43396797.6030804@voltaire.com> <43397D00.6080505@ichips.intel.com> Message-ID: <1127924356.4380.587.camel@hal.voltaire.com> On Tue, 2005-09-27 at 13:10, Sean Hefty wrote: > > 4. ATS registration > > sean> I think that ATS registration/deregistration should be integrated > > with > > sean> IPoIB. > > > > I don't think there is a consensus around that, but I don't know all > > details. > > This makes more sense to me than having the ATS code deference IPoIB private > data structures. That's just the way it is implemented today. There could be a public way to get the device, port, and pkey from IPoIB. > However, if adding an rdma_ptr to the net_device can avoid > this, then that will work. Good. Then there's agreement on adding this at leaast amongst the OpenIBers. > And to be clear, I was referring to only > registration/deregistration, not ATS queries. It looks like the ATS code > periodically scans all network devices in the system looking for changes in > order to update the ATS records. Where do you see that ? 
I think it only does that on notification events from netdev or inet. -- Hal From mshefty at ichips.intel.com Wed Sep 28 09:37:27 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 28 Sep 2005 09:37:27 -0700 Subject: [openib-general] Re: [CMA][PATCH] comment fix In-Reply-To: References: Message-ID: <433AC6C7.1080808@ichips.intel.com> James Lentini wrote: > Signed-off-by: James Lentini > > Index: include/rdma/rdma_cma.h > =================================================================== Thanks! Applied. From mshefty at ichips.intel.com Wed Sep 28 09:44:17 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 28 Sep 2005 09:44:17 -0700 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <52mzlxmbpv.fsf@cisco.com> References: <1127908877.4384.22.camel@hal.voltaire.com> <20050928124502.GE8114@mellanox.co.il> <433A999F.9020000@voltaire.com> <52mzlxmbpv.fsf@cisco.com> Message-ID: <433AC861.4060809@ichips.intel.com> Roland Dreier wrote: > Guy> But the main question is: does openib wants to support ATS > Guy> arp ? do we also want to support ATS > Guy> registration/deregistration ? openib can support, for > Guy> example, only the ATS arp and rely on the openib-less targets > Guy> to do their own registrations. > > I would certainly prefer to forget all about ATS in every form. That is my preference as well. I believe that we want to support address translation using ARP, but not ATS as defined by SA service registration and queries. 
- Sean From Venkatesh.Babu at 3leafnetworks.com Wed Sep 28 09:57:13 2005 From: Venkatesh.Babu at 3leafnetworks.com (Venkatesh Babu) Date: Wed, 28 Sep 2005 09:57:13 -0700 Subject: [openib-general] Local QP operation err while sending packet over UD transport Message-ID: <7C1D552561AF0544ACC7CF6F10E4966E0252CC@chronus.3leafnetworks.corp> I missed the following lines when I did cut and paste of the code - elem -> u.txwr.opcode = IB_WR_SEND; elem -> u.txwr.next = NULL; /* Send only one */ So it is not the obvious error, it may be something else. (Yes, I need to correct the coding style errors.) VBabu -----Original Message----- From: Roland Dreier [mailto:rolandd at cisco.com] Sent: Tue 9/27/2005 8:12 PM To: Venkatesh Babu Cc: Roland Dreier; openib-general at openib.org Subject: Re: [openib-general] Local QP operation err while sending packet over UD transport VBabu> I am not sure I can post the whole code. But here is the VBabu> part of it. It's a little hard to debug without being able to run your code and reproduce the error. The only things I see obviously wrong are that you never seem to set elem -> u.txwr.opcode to IB_WR_SEND, so you may be posting an invalid work request, and also you never set elem -> u.txwr.next to NULL, so ib_post_send() could follow the next pointer into some other memory and post a random work request. (BTW, if your coding style is to put spaces around every operator, shouldn't you write things like 'elem -> u . txwr . next'? ;) - R.
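Roland's point about `next` can be illustrated with a small self-contained sketch. The structure below is a hypothetical stand-in for a verbs send work request (only the `next` and `opcode` fields matter here), and `post_send_chain` imitates how a post-send routine walks a chain: it keeps following `next` until it reaches NULL, so an uninitialized `next` sends it into whatever memory the stale pointer happens to reference.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mock of a send WR; real verbs WRs carry many more fields. */
enum wr_opcode { WR_SEND = 1 };

struct send_wr {
    struct send_wr *next;   /* next WR in the chain; NULL on the last one */
    enum wr_opcode opcode;  /* must be set to a valid opcode */
};

/* Walk a WR chain the way a post-send routine does and count the
 * requests it would post. Returns -1 if a WR has an invalid opcode. */
int post_send_chain(const struct send_wr *wr)
{
    int posted = 0;

    for (; wr != NULL; wr = wr->next) {
        if (wr->opcode != WR_SEND)
            return -1;
        posted++;
    }
    return posted;
}

/* Build a single WR defensively: zero everything, then fill in fields. */
void init_single_send(struct send_wr *wr)
{
    memset(wr, 0, sizeof(*wr));  /* clears next and every other field */
    wr->opcode = WR_SEND;
    wr->next = NULL;             /* explicit: post only this one */
}
```

Zeroing the whole request first is a cheap habit: in a real work request it also clears fields that are easy to forget, so a missed assignment shows up as an obviously invalid value rather than stack garbage.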
From mshefty at ichips.intel.com Wed Sep 28 09:58:04 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 28 Sep 2005 09:58:04 -0700 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <433A60AA.2040704@voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> <43381CFC.7070508@voltaire.com> <433827FF.3010601@ichips.intel.com> <43396797.6030804@voltaire.com> <43397D00.6080505@ichips.intel.com> <433A60AA.2040704@voltaire.com> Message-ID: <433ACB9C.80003@ichips.intel.com> Guy German wrote: > Sean Hefty wrote: > >> I think that there will still be a need for a separate address >> translation module(s) > > I don't understand. You think there should be an address translation > module, but you object to the *name* ib_at ? (ib_at stands for > "infiniband address translation") I don't object to the name, just combining the current functionality that ib_at tries to provide into a single abstraction. I think that the disagreement is what functionality a core address translation module should provide. > I suggested before that if ib_at should be fixed lets fix it. If API > should be improved or other functionality should be added (or removed) > why not do it in the existing ib_at ? My preference/current course of action is to extract and improve the ARP based address translation code from ib_at. My guess is that the ARP based address translation code will be enough functionality to stand alone in its own module. If other functionality from ib_at is needed, I'm hoping that it can be built on top of this service.
- Sean From ardavis at ichips.intel.com Wed Sep 28 10:31:26 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 28 Sep 2005 10:31:26 -0700 Subject: [openib-general] Re: [PATCH] uDAPL build fix for OS vendor variations of IA64_FETCHADD In-Reply-To: References: Message-ID: <433AD36E.9040100@ichips.intel.com> James Lentini wrote: >Hi Arlin, > >A couple of questions: > > > >>Index: dapl/udapl/linux/dapl_osd.h >>=================================================================== >>--- dapl/udapl/linux/dapl_osd.h (revision 3541) >>+++ dapl/udapl/linux/dapl_osd.h (working copy) >>@@ -83,7 +83,6 @@ >> #include >> #endif >> >>- >> /* Useful debug definitions */ >> #ifndef STATIC >> #define STATIC static >>@@ -156,13 +155,17 @@ >> #ifdef __ia64__ >> DAT_COUNT old_value; >> >>-#if OS_VERSION >= LINUX_VERSION(2,6) >>- IA64_FETCHADD (old_value,v,1,4,rel); >>+#ifndef REDHAT_EL4 >>+# if OS_RELEASE >= LINUX_VERSION(2,6) >>+ IA64_FETCHADD(old_value,v,1,4,rel); >>+# else >>+ IA64_FETCHADD(old_value,v,1,4); >>+# endif >> #else >>- IA64_FETCHADD (old_value,v,1,4); >>+ IA64_FETCHADD(old_value,v,1,4); >> #endif >> >> > >Previously, if we were on Linux => 2.6, we used the 5 parameter >version, otherwise we used the 4 parameter version. > >Why don't we continue to use the 5 parameter version if we are on >Linux => 2.6 and not REHHAT_EL4? > > good point. something like this ? #if !defined(REDHAT_EL4) && (OS_RELEASE >= LINUX_VERSION(2,6)) IA64_FETCHADD(old_value,v,-1,4,rel); # else IA64_FETCHADD(old_value,v,-1,4); #endif > > >> >>-#else /* !__ia64__ */ >>+#else >> >> > >Why remove /* !__ia64__ */? > > > no reason. comment should stay. 
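Arlin's combined condition can be exercised on its own. In this sketch `LINUX_VERSION` and `OS_RELEASE` are hypothetical local stand-ins (in uDAPL they come from the Makefile and the dapl headers), and the real `IA64_FETCHADD` is replaced by a function that simply reports which arity the preprocessor selected.

```c
/* Hypothetical stand-ins: in uDAPL, OS_RELEASE and OS_VENDOR come from
 * the Makefile and LINUX_VERSION from the dapl headers. */
#ifndef LINUX_VERSION
#define LINUX_VERSION(a, b) (((a) << 16) + ((b) << 8))
#endif
#ifndef OS_RELEASE
#define OS_RELEASE LINUX_VERSION(2, 6)
#endif

/* Report which IA64_FETCHADD form the condition selects:
 * 5 = five-argument form with the "rel" completer,
 * 4 = older four-argument form. */
int fetchadd_arity(void)
{
#if !defined(REDHAT_EL4) && (OS_RELEASE >= LINUX_VERSION(2, 6))
    return 5;
#else
    return 4;
#endif
}
```

Building with `-DREDHAT_EL4`, or with `-DOS_RELEASE='LINUX_VERSION(2,4)'`, makes it return 4. One caution relevant to the OS_VERSION to OS_RELEASE rename in this patch: an identifier that is not defined as a macro evaluates to 0 inside `#if`, so a misspelled or missing version macro silently selects the `#else` branch instead of failing the build.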
>> __asm__ __volatile__ ( >> "lock;" "incl %0" >> :"=m" (*v) >>@@ -184,13 +187,17 @@ >> #ifdef __ia64__ >> DAT_COUNT old_value; >> >>-#if OS_VERSION >= LINUX_VERSION(2,6) >>- IA64_FETCHADD (old_value,v,-1,4,rel); >>+#ifndef REDHAT_EL4 >>+# if OS_RELEASE >= LINUX_VERSION(2,6) >>+ IA64_FETCHADD(old_value,v,-1,4,rel); >>+# else >>+ IA64_FETCHADD(old_value,v,-1,4); >>+# endif >> #else >>- IA64_FETCHADD (old_value,v,-1,4); >>+ IA64_FETCHADD(old_value,v,-1,4); >> >> > >Why not continue to use the 5 parameter version if we are on >Linux => 2.6 and not REHHAT_EL4? > > same as above From mshefty at ichips.intel.com Wed Sep 28 10:32:07 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 28 Sep 2005 10:32:07 -0700 Subject: [openib-general] [PATCH] [CMA] [RFC] add routine to transition a QP to INIT state In-Reply-To: References: Message-ID: <433AD397.7080403@ichips.intel.com> James Lentini wrote: >>+int rdma_cma_init_qp(struct rdma_cma_id *cma_id, struct ib_qp *qp, >>+ int qp_access_flags); >>+ > > How will the qp_accesss_flags be implemented in a transport neutral > way? If iWARP doesn't support the ib_access_flags values, a transport > nuetral consumer can't pass them here. Bah... my bad. I assumed that iWarp had these same flags. The issue is that IB needs these flags when transitioning the QP to the INIT state, but doesn't know how to set them until it knows the initiator_depth and responder_resources. These values aren't supplied until connect() or accept() is called. One possibility is to provide a CONNECT_PENDING callback to notify the user that a connection request is ready to complete. Users may post receives to the QP from this callback. With this callback, the CMA would be responsible for transitioning the QP to the INIT state. The disadvantage is that the user would receive two callbacks trying to establish a connection, rather than one. > Can we remove the need for a consumer to call rdma_cma_init_qp all > together? 
Can we create a QP and move it to the init state when the > consumer creates their rdma_cma_id? This is a possibility. I was wanting to allow the user to manage their own list of QPs, but I guess this is still possible with this change. There's still the issue that the access flags aren't known however. I need to think about this more. What do others think? - Sean From rolandd at cisco.com Wed Sep 28 10:32:30 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 28 Sep 2005 10:32:30 -0700 Subject: [openib-general] Re: [O-MPI devel] [PATCH] Update Open MPI fo r new libibverbs API In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30A9549@mtlexch01.mtl.com> (Tziporet Koren's message of "Wed, 28 Sep 2005 14:14:54 +0300") References: <6AB138A2AB8C8E4A98B9C0C3D52670E30A9549@mtlexch01.mtl.com> Message-ID: <52y85hksi9.fsf@cisco.com> Tziporet> Hi Roland, When do you expect openib to have an official Tziporet> userspace release so we can count on a stable API? I've heard hints that people are working on a release process but I don't know if there's anything concrete actually happening. I think we're definitely getting close to being able to freeze the libibverbs API, although I'm not sure we're quite there yet. - R. From mshefty at ichips.intel.com Wed Sep 28 10:40:55 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 28 Sep 2005 10:40:55 -0700 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <1127924356.4380.587.camel@hal.voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> <43381CFC.7070508@voltaire.com> <433827FF.3010601@ichips.intel.com> <43396797.6030804@voltaire.com> <43397D00.6080505@ichips.intel.com> <1127924356.4380.587.camel@hal.voltaire.com> Message-ID: <433AD5A7.9060203@ichips.intel.com> Hal Rosenstock wrote: >>And to be clear, I was referring to only >>registration/deregistration, not ATS queries. 
It looks like the ATS code >>periodically scans all network devices in the system looking for changes in >>order to update the ATS records. > > Where do you see that ? I think it only does that on notification events > from netdev or inet. You're correct. The sweep is only done in response to a notification event. I was seeing "cancel_delayed_work" along with the sweeping and thinking that it was a periodic sweep. - Sean From mshefty at ichips.intel.com Wed Sep 28 10:46:57 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 28 Sep 2005 10:46:57 -0700 Subject: [openib-general] QP with large starting sequence adds latency to RDMA READ??? In-Reply-To: <433099C0.1070408@ichips.intel.com> References: <433099C0.1070408@ichips.intel.com> Message-ID: <433AD711.2000602@ichips.intel.com> Arlin Davis wrote: > I just noticed some RDMA read performance issues that seem to be related > to the QP starting sequence number. If I set the starting sequence to 1 > then all is fine but if I set it to 0x10000 then it seems to add ~40us > to my 32KB RDMA read operation (polling for completions). Has anyone > seen anything like this? Has anyone else noticed this issue? You could try to reproduce this by using the rdma_bw test and changing the PSN. - Sean From halr at voltaire.com Wed Sep 28 10:58:55 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Sep 2005 13:58:55 -0400 Subject: [openib-general] Re: Page allocation failures & kdapltest oops In-Reply-To: References: <1127749644.4398.878.camel@hal.voltaire.com> <1127839244.4436.7.camel@localhost.localdomain> <1127914752.4380.120.camel@hal.voltaire.com> Message-ID: <1127930176.4380.826.camel@hal.voltaire.com> On Wed, 2005-09-28 at 11:37, James Lentini wrote: > On Wed, 28 Sep 2005, Hal Rosenstock wrote: > > halr> That seems better but I still see the following: > > > > The failure is as expected, but at least it was graceful. 
> > halr> Also, I don't understand why: > halr> kdapltest -T T -s -D mthca0a -d -t 2 -w 8 -i 20 client SR server SR > halr> would work and > halr> kdapltest -T T -s -D mthca0a -d -i 10000 -w 8 client SR server SR > halr> would fail. It seems the former is more strenuous (everything same but 2 > halr> threads and less iterations). > > My hypothesis is that each iteration allocates memory that isn't > free'd until the test is over. > > I haven't been able to reproduce this yet, even when I turn on > DEBUG_PAGEALLOC. I am also running in "loopback" where both server and client are on same machine. > What are the contents of /etc/cpuinfo and > /etc/meminfo on this system? Those files don't exist on my machine. -- Hal From halr at voltaire.com Wed Sep 28 11:02:09 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 28 Sep 2005 14:02:09 -0400 Subject: [openib-general] InfiniBand compilation testing In-Reply-To: <20050924074611.GD3950@us.ibm.com> References: <20050924074611.GD3950@us.ibm.com> Message-ID: <1127930217.4380.828.camel@hal.voltaire.com> On Sat, 2005-09-24 at 03:46, Nishanth Aravamudan wrote: On PPC64, there was also the following warning in SDP: drivers/infiniband/ulp/sdp/sdp_link.c:752: warning: initialization from incompatible pointer type as well as a similar one in AT drivers/infiniband/core/at.c:1551: warning: initialization from incompatible pointer type Not sure what it doesn't like. Is it the static ? 
-- Hal From rolandd at cisco.com Wed Sep 28 11:41:33 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 28 Sep 2005 11:41:33 -0700 Subject: [openib-general] InfiniBand compilation testing In-Reply-To: <1127930217.4380.828.camel@hal.voltaire.com> (Hal Rosenstock's message of "28 Sep 2005 14:02:09 -0400") References: <20050924074611.GD3950@us.ibm.com> <1127930217.4380.828.camel@hal.voltaire.com> Message-ID: <52hdc5kpb6.fsf@cisco.com> Hal> drivers/infiniband/ulp/sdp/sdp_link.c:752: warning: Hal> initialization from incompatible pointer type It looks like the prototype of struct packet_type.func has changed since 2.6.13. in 2.6.13 has: struct packet_type { __be16 type; /* This is really htons(ether_type). */ struct net_device *dev; /* NULL is wildcarded here */ int (*func) (struct sk_buff *, struct net_device *, struct packet_type *); void *af_packet_priv; struct list_head list; }; while the latest git tree has: struct packet_type { __be16 type; /* This is really htons(ether_type). */ struct net_device *dev; /* NULL is wildcarded here */ int (*func) (struct sk_buff *, struct net_device *, struct packet_type *, struct net_device *); void *af_packet_priv; struct list_head list; }; Unfortunately git has pretty bad support for per-file history so I'm not sure when this change was made. - R. 
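The warnings Hal reported come from initializing the `func` member with a handler whose prototype no longer matches it. Below is a self-contained sketch of the newer layout, using hypothetical mock types in place of `struct sk_buff`, `struct net_device` and `struct packet_type`; the parameter name `orig_dev` is illustrative. A handler written with the four-argument prototype initializes cleanly, whereas the old three-argument handler is what gcc flags as "initialization from incompatible pointer type".

```c
#include <stddef.h>

/* Hypothetical mock types; the real ones live in <linux/skbuff.h> and
 * <linux/netdevice.h> and have many more fields. */
struct sk_buff { int len; };
struct net_device { const char *name; };

struct packet_type {
    unsigned short type;
    struct net_device *dev;
    /* Newer layout: a fourth struct net_device * argument was added. */
    int (*func)(struct sk_buff *, struct net_device *,
                struct packet_type *, struct net_device *);
};

/* Handler matching the four-argument prototype; code that does not
 * need the extra device argument can simply ignore it. */
static int arp_like_rcv(struct sk_buff *skb, struct net_device *dev,
                        struct packet_type *pt, struct net_device *orig_dev)
{
    (void)dev;
    (void)pt;
    (void)orig_dev;
    return skb->len;
}

/* The signatures agree, so this initializer compiles without warnings. */
static struct packet_type demo_packet_type = {
    .type = 0x0806,  /* ETH_P_ARP */
    .dev  = NULL,
    .func = arp_like_rcv,
};

/* Dispatch through the function pointer, as the stack would. */
int deliver(struct sk_buff *skb, struct net_device *dev)
{
    return demo_packet_type.func(skb, dev, &demo_packet_type, dev);
}
```

In tree, the fix would amount to giving the handlers registered in sdp_link.c and at.c the extra parameter.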
From nacc at us.ibm.com Wed Sep 28 12:24:07 2005 From: nacc at us.ibm.com (Nishanth Aravamudan) Date: Wed, 28 Sep 2005 12:24:07 -0700 Subject: [openib-general] InfiniBand compilation testing In-Reply-To: <1127930217.4380.828.camel@hal.voltaire.com> References: <20050924074611.GD3950@us.ibm.com> <1127930217.4380.828.camel@hal.voltaire.com> Message-ID: <20050928192407.GD5791@us.ibm.com> On 28.09.2005 [14:02:09 -0400], Hal Rosenstock wrote: > On Sat, 2005-09-24 at 03:46, Nishanth Aravamudan wrote: > > On PPC64, there was also the following warning in SDP: > > drivers/infiniband/ulp/sdp/sdp_link.c:752: warning: initialization from incompatible pointer type > > as well as a similar one in AT > > drivers/infiniband/core/at.c:1551: warning: initialization from incompatible pointer type > > Not sure what it doesn't like. Is it the static ? Eep, sorry for not reporting that. I haven't yet implemented checks for compilation warnings or link warnings (which don't technically cause the build/link to fail) to send me mail. I have to think it over, but hopefully will have something soon. Thanks, Nish From rjwalsh at pathscale.com Wed Sep 28 12:50:07 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Wed, 28 Sep 2005 12:50:07 -0700 Subject: [openib-general] InfiniPath driver announcement Message-ID: <1127937007.6858.7.camel@hematite.internal.keyresearch.com> Hi all, PathScale is pleased to announce the availability of OpenIB drivers for the InfiniPath HCA. The complete code for this first release of the driver has been checked into the OpenIB repository under the svn/gen2/branches/ipath directory. Comments and feedback are welcome. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. 
Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043 From jlentini at netapp.com Wed Sep 28 13:01:10 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 28 Sep 2005 16:01:10 -0400 (EDT) Subject: [openib-general] Re: Page allocation failures & kdapltest oops In-Reply-To: <1127930176.4380.826.camel@hal.voltaire.com> References: <1127749644.4398.878.camel@hal.voltaire.com> <1127839244.4436.7.camel@localhost.localdomain> <1127914752.4380.120.camel@hal.voltaire.com> <1127930176.4380.826.camel@hal.voltaire.com> Message-ID: On Wed, 28 Sep 2005, Hal Rosenstock wrote: > > What are the contents of /etc/cpuinfo and > > /etc/meminfo on this system? > > Those files don't exist on my machine. Sorry, I meant /proc/cpuinfo and /proc/meminfo. I'd like to know what hardware you are using, especially how much memory you have installed. I still can't reproduce the error. Are you running anything else besides kdapltest? Martin J. Bligh wrote a script called vmtop that displays vm information in a nice way: ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/tools/vmtop It's output might help us determine if this is a kdapltest problem. james From rolandd at cisco.com Wed Sep 28 13:27:29 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 28 Sep 2005 13:27:29 -0700 Subject: [openib-general] InfiniPath driver announcement In-Reply-To: <1127937007.6858.7.camel@hematite.internal.keyresearch.com> (Robert Walsh's message of "Wed, 28 Sep 2005 12:50:07 -0700") References: <1127937007.6858.7.camel@hematite.internal.keyresearch.com> Message-ID: <52zmpxj5u6.fsf@cisco.com> Robert> PathScale is pleased to announce the availability of Robert> OpenIB drivers for the InfiniPath HCA. The complete code Robert> for this first release of the driver has been checked into Robert> the OpenIB repository under the svn/gen2/branches/ipath Robert> directory. 
This isn't exactly a surpise to me ;) In any case congratulations to you and everyone at PathScale! Having the OpenIB stack support devices from multiple vendors is a huge step forward for the whole community. - Roland From jlentini at netapp.com Wed Sep 28 14:16:00 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 28 Sep 2005 17:16:00 -0400 (EDT) Subject: [openib-general] Re: [PATCH] uDAPL build fix for OS vendor variations of IA64_FETCHADD In-Reply-To: <433AD36E.9040100@ichips.intel.com> References: <433AD36E.9040100@ichips.intel.com> Message-ID: On Wed, 28 Sep 2005, Arlin Davis wrote: > James Lentini wrote: > > > Hi Arlin, > > A couple of questions: > > > > > > > Index: dapl/udapl/linux/dapl_osd.h > > > =================================================================== > > > --- dapl/udapl/linux/dapl_osd.h (revision 3541) > > > +++ dapl/udapl/linux/dapl_osd.h (working copy) > > > @@ -83,7 +83,6 @@ > > > #include > > > #endif > > > > > > - > > > /* Useful debug definitions */ > > > #ifndef STATIC > > > #define STATIC static > > > @@ -156,13 +155,17 @@ > > > #ifdef __ia64__ > > > DAT_COUNT old_value; > > > > > > -#if OS_VERSION >= LINUX_VERSION(2,6) > > > - IA64_FETCHADD (old_value,v,1,4,rel); > > > +#ifndef REDHAT_EL4 > > > +# if OS_RELEASE >= LINUX_VERSION(2,6) > > > + IA64_FETCHADD(old_value,v,1,4,rel); > > > +# else > > > + IA64_FETCHADD(old_value,v,1,4); > > > +# endif > > > #else > > > - IA64_FETCHADD (old_value,v,1,4); > > > + IA64_FETCHADD(old_value,v,1,4); > > > #endif > > > > > > > Previously, if we were on Linux => 2.6, we used the 5 parameter version, > > otherwise we used the 4 parameter version. > > Why don't we continue to use the 5 parameter version if we are on Linux => > > 2.6 and not REHHAT_EL4? > > > good point. something like this ? > > #if !defined(REDHAT_EL4) && (OS_RELEASE >= LINUX_VERSION(2,6)) > IA64_FETCHADD(old_value,v,-1,4,rel); > # else > IA64_FETCHADD(old_value,v,-1,4); > #endif Looks good. 
I'll update the patch and commit. From jlentini at netapp.com Wed Sep 28 14:42:58 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 28 Sep 2005 17:42:58 -0400 (EDT) Subject: [openib-general] Re: [PATCH] uDAPL build fix for OS vendor variations of IA64_FETCHADD In-Reply-To: References: Message-ID: On Tue, 27 Sep 2005, Arlin Davis wrote: > James, > > Please review the following uDAPL patch which fixes some ia64 > build problems (atomics) with the latest Redhat EL4.0 update and > adds support for SuSe. Feel free to come up with a better solution. Committed in revision 3606 except for this: > Index: dapl/udapl/Makefile > =================================================================== > --- dapl/udapl/Makefile (revision 3565) > +++ dapl/udapl/Makefile (working copy) > @@ -57,6 +57,13 @@ > endif > > # > +# Set up the default OS Vendor > +# > +ifndef OS_VENDOR > +OS_VENDOR = REDHAT_EL4 > +endif I wasn't comfortable changing the compilation behavior to default to REDHAT_EL4. I did this instead: Index: dapl/udapl/Makefile =================================================================== --- dapl/udapl/Makefile (revision 3601) +++ dapl/udapl/Makefile (working copy) @@ -57,6 +57,13 @@ endif # +# Set an OS Vendor +# +# OS_VENDOR = REDHAT_EL4 +# OS_VENDOR = SuSE +# which I'm not totally happy with, but at least compilation will remain the same unless the user makes a change. Is it time to move to using autogen and configure? From Venkatesh.Babu at 3leafnetworks.com Wed Sep 28 15:50:48 2005 From: Venkatesh.Babu at 3leafnetworks.com (Venkatesh Babu) Date: Wed, 28 Sep 2005 15:50:48 -0700 Subject: [openib-general] RE: Local QP operation err while sending packet over UD transport Message-ID: <7C1D552561AF0544ACC7CF6F10E4966E0252CE@chronus.3leafnetworks.corp> I am not using inline data. Setting imm_data to 0 and IB_SEND_INLINE flag is turned off. 
elem -> u.txwr.imm_data = 0; elem -> u.txwr.send_flags = IB_SEND_SIGNALED; And also I have set the max_inline_data to 0 while creating the QP. So I am not sure how this error indicating that inline segment exceeding WQE size. VBabu -----Original Message----- From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] Sent: Wed 9/28/2005 3:29 AM To: Roland Dreier Cc: Venkatesh Babu; openib-general at openib.org Subject: Re: Local QP operation err while sending packet over UD transport Quoting Roland Dreier : > VBabu> CQE contents 0000040c b3000000 fd000000 11000000 026f0000 00000010 00002005 ff100000 > > Perhaps someone from Mellanox can decode the undocumented fields of > this CQE. inline segment exceeds WQE size -- MST From rjwalsh at pathscale.com Wed Sep 28 15:59:29 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Wed, 28 Sep 2005 15:59:29 -0700 Subject: [openib-general] Re: IPoIB stuff In-Reply-To: <52r7b8kdri.fsf@cisco.com> References: <1127946412.6858.40.camel@hematite.internal.keyresearch.com> <52r7b8kdri.fsf@cisco.com> Message-ID: <1127948369.6858.50.camel@hematite.internal.keyresearch.com> > Robert> Hi Roland, One of my co-workers has a patch to IPoIB to > Robert> resolve a symbol conflict when working with a > Robert> Lustre-patched 2.6.13 kernel. Do you know who I should > Robert> send this along to? > > Me + openib-general I guess. > > The issue is path_lookup, right? Ah, you know about that, then ;-) Oddly enough, it doesn't affect me when I try to build 2.6.13: must be something either to do with the Lustre patch or something else in his .config that I don't have. Anyway, the patch he sent me is attached. It's pretty simple. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc.
Phone: +1 650 934 8117
2071 Stierlin Court, Suite 200               Fax: +1 650 428 1969
Mountain View, CA 94043
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ipoib_fix.patch
Type: text/x-patch
Size: 703 bytes
Desc: not available
URL: 

From iod00d at hp.com  Wed Sep 28 16:17:41 2005
From: iod00d at hp.com (Grant Grundler)
Date: Wed, 28 Sep 2005 16:17:41 -0700
Subject: [openib-general] InfiniPath driver announcement
In-Reply-To: <1127937007.6858.7.camel@hematite.internal.keyresearch.com>
References: <1127937007.6858.7.camel@hematite.internal.keyresearch.com>
Message-ID: <20050928231741.GB26104@esmail.cup.hp.com>

On Wed, Sep 28, 2005 at 12:50:07PM -0700, Robert Walsh wrote:
> Hi all,
> 
> PathScale is pleased to announce the availability of OpenIB drivers for
> the InfiniPath HCA. The complete code for this first release of the
> driver has been checked into the OpenIB repository under the
> svn/gen2/branches/ipath directory.
> 
> Comments and feedback are welcome.

Cool! Congrats!

Here is some initial feedback on infiniband/hw/ipath/ib_ipath/ipath_openib.c.
I pulled the source using
	svn co https://www.openib.org/svn/gen2/branches/ipath

o Please remove #ifdef LINUX_VERSION_CODE checks. Those are welcome in a
  separate patch for backports.
o use of cmp24() seems hokey in several cases:
	o "<<8" and then compare against 0x100?
	o "cmp24(psn, qp->s_next_psn) >= 0" where psn is u32?
	  I think the "int" cast guarantees the shift left is an arithmetic
	  operation and not a logical one. This seems correct but non-intuitive
	  given the operands and the fact that the subtraction is done starting
	  with u32 in many cases.
	o ditto for cmp24(credit,...) (credit is u32)
o Why all the "#pragma weak" statements? AFAICT, this doesn't exist in
  any other part of the kernel.
o lots of things are named ib_openib_*. Please rename those ib_ipath or
  something like that.
o in ipath_multicast_detach():
	/* Find the GID in the mcast table. */
	for (n = mcast_tree.rb_node;;) {
  would be better as:
	n = mcast_tree.rb_node;
	while (1) {

sorry, out of time...but something to start with.

hth,
grant

From viswa.krish at gmail.com  Wed Sep 28 16:56:59 2005
From: viswa.krish at gmail.com (Viswanath Krishnamurthy)
Date: Wed, 28 Sep 2005 16:56:59 -0700
Subject: [openib-general] mthca error ?
Message-ID: <4df28be405092816561e6fe9bf@mail.gmail.com>

Roland,

I see the following when I use the latest mthca driver on a different HCA card:

[ 193.882759] ib_mthca: Initializing 0000:03:00.0
[ 193.887546] ib_mthca 0000:03:00.0: Found bridge: 0000:02:0c.0
[ 194.894937] ib_mthca 0000:03:00.0: SYS_EN DDR error: syn=4, sock=0, sladdr=0, SPD source=DIMM
[ 194.903781] ib_mthca 0000:03:00.0: SYS_EN returned status 0x07, aborting.
[ 194.910823] ib_mthca: probe of 0000:03:00.0 failed with error -22

lspci output:

0000:03:00.0 InfiniBand: Mellanox Technology MT23108 InfiniHost (rev a1)

Any idea what the error is?

Thanks,
Viswa

From rolandd at cisco.com  Wed Sep 28 17:02:37 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 28 Sep 2005 17:02:37 -0700
Subject: [openib-general] Re: mthca error ?
In-Reply-To: <4df28be405092816561e6fe9bf@mail.gmail.com> (Viswanath Krishnamurthy's message of "Wed, 28 Sep 2005 16:56:59 -0700")
References: <4df28be405092816561e6fe9bf@mail.gmail.com>
Message-ID: <52achwkag2.fsf@cisco.com>

 > [ 194.894937] ib_mthca 0000:03:00.0: SYS_EN DDR error: syn=4, sock=0, sladdr=0, SPD source=DIMM

According to the Mellanox documentation, this is indicating a calibration error when trying to initialize the HCA-attached memory. I'm not sure exactly what this means, but bad HCA hardware seems like a pretty good guess.

 - R.

From iod00d at hp.com  Wed Sep 28 18:16:46 2005
From: iod00d at hp.com (Grant Grundler)
Date: Wed, 28 Sep 2005 18:16:46 -0700
Subject: [openib-general] Re: netperf over SDP bug
In-Reply-To: <20050928035203.GB6765@mellanox.co.il>
References: <20050928011700.GA22427@esmail.cup.hp.com> <20050928035203.GB6765@mellanox.co.il>
Message-ID: <20050929011646.GA27337@esmail.cup.hp.com>

On Wed, Sep 28, 2005 at 06:52:03AM +0300, Michael S. Tsirkin wrote:
...
> > grundler <505>grep -h ' 60\.0. ' sdp*.out | sort -n -k 3

BTW, I should also mention I'm running with HZ set at the new "default":
gsyprf3:/home/grundler/openib-perf-2005/r3547# fgrep HZ /boot/config-2.6.13
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250

If there is a workqueue/task scheduling problem, it's likely to be related to that setting.

> Interesting.
> Things to try would be
> - enable SDP debug, or event data path debug

Can you be more specific?
I have the CONFIG_SDP_DEBUG enabled and it's dumping setup/teardown info.
Did you want to see that?

> - try running with oprofile, see what is CPU doing

After looking at this, I'm not convinced this is a useful approach. I.e., we are only going to see the "idle" loop a lot...
I need to rebuild my kernel to enable CONFIG_PROFILING.
Being a lazy slob, I first tried qprof and here's what it said:
Being a lazy slob, I first tried qprof and here's what it said: gsyprf3:~# LD_PRELOAD=/usr/local/lib/libsdp.so qprof /usr/local/bin/netperf -p 12866 -l 60 -H 10.0.0.30 -t TCP_STREAM -T 1 -- -m 512 -s 16384 -S 16384 bind_to_specific_processor: sched_setaffinity TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.30 (10.0.0.30) port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 32768 32768 512 60.01 375.49 qprof: /usr/local/bin/netperf: 285 samples, 285 counts libc.so.6.1(send) 3 ( 1%) netperf 1 ( 0%) send_tcp_stream:nettest_bsd.c:1018 1 ( 0%) send_tcp_stream:nettest_bsd.c:1449 2 ( 1%) send_tcp_stream:nettest_bsd.c:1451 5 ( 2%) [0xa000000000010620] 2 ( 1%) [0xa000000000010621] 271 ( 95%) Shared libs are at 0x2... 0xa... sounds like a kernel address: a000000000010620 A __kernel_syscall_via_break Conclusion - the qprof doesn't seem very helpful in this case. q-syscollect seems more useful but spews a ton of files that ideally need to be merged: gsyprf3:~/.q# ls apache2-pid1893-cpu1.edge#0 klogd-pid1577-cpu1.info#0 apache2-pid1893-cpu1.hist#0 master-pid1834-cpu1.edge#0 apache2-pid1893-cpu1.info#0 master-pid1834-cpu1.hist#0 bash-pid1906-cpu1.edge#0 master-pid1834-cpu1.info#0 bash-pid1906-cpu1.hist#0 migration_0-pid2-cpu0.edge#0 ... Care to see q-view any of those? 
e.g.:
# q-view unknown-cpu1.info#0
Flat profile of CPU_CYCLES in unknown-cpu1.hist#0:
 Each histogram sample counts as 1.00034m seconds
% time   self  cumul  calls  self/call  tot/call  name
 99.97  59.70  59.70    642    93.0m      93.0m   default_idle
  0.02   0.01  59.71    430    27.9u      27.9u   mdio_ctrl
  0.00   0.00  59.71  55.7k    18.0n      18.0n   __sched_text_end
  0.00   0.00  59.71   51.0    19.6u      22.8u   ip_local_deliver
  0.00   0.00  59.72    894    1.12u      1.12u   __copy_user
  0.00   0.00  59.72  65.2k    15.3n      29.5n   handle_IRQ_event
  0.00   0.00  59.72   25.0    40.0u      40.0u   copy_page
  0.00   0.00  59.72    198    5.05u      12.2u   tg3_poll
  0.00   0.00  59.72   27.0    37.0u      37.0u   proc_lookup
  0.00   0.00  59.72   233k     0.00       0.00   __udivdi3
  0.00   0.00  59.72  82.7k     0.00       0.00   pfm_interrupt_handler
  0.00   0.00  59.72  66.2k     0.00       247n   ia64_handle_irq
  0.00   0.00  59.72  54.8k     0.00       0.00   __find_next_bit
  0.00   0.00  59.72  50.7k     0.00       0.00   lsapic_noop
  0.00   0.00  59.72  45.6k     0.00      42.4n   __do_IRQ
  0.00   0.00  59.72  43.7k     0.00       330n   irq_exit
  0.00   0.00  59.72  40.4k     0.00       0.00   sched_clock
  0.00   0.00  59.72  19.6k     0.00       0.00   run_local_timers
  0.00   0.00  59.72  5.38k     0.00       0.00   _spin_unlock
...

not more helpful either. That's why I'm skeptical oprofile output would be better.

thanks,
grant

From sean.hefty at intel.com  Wed Sep 28 18:26:36 2005
From: sean.hefty at intel.com (Sean Hefty)
Date: Wed, 28 Sep 2005 18:26:36 -0700
Subject: [openib-general] [RFC] IB address translation using ARP
Message-ID: 

Here's a first attempt at an API / implementation (that compiles only) for an address translation module for IB using ARP. The code should check the ARP cache for information, but is missing the actual ARP processing. (We should be able to pull that from ib_at.)

The API is similar to the route portion of ib_at, but corrects issues with canceling requests. Only the destination IP address is required for input.

The intent is that the CMA will use this service to locate the proper RDMA device GUID and port to use in establishing a connection. Hopefully, this makes it clearer how I envision address translation wrt the CMA.
Signed-off-by: Sean Hefty

/*
 * Copyright (c) 2005 Voltaire Inc. All rights reserved.
 * Copyright (c) 2005 Intel Corporation. All rights reserved.
 *
 * This Software is licensed under one of the following licenses:
 *
 * 1) under the terms of the "Common Public License 1.0" a copy of which is
 *    available from the Open Source Initiative, see
 *    http://www.opensource.org/licenses/cpl.php.
 *
 * 2) under the terms of the "The BSD License" a copy of which is
 *    available from the Open Source Initiative, see
 *    http://www.opensource.org/licenses/bsd-license.php.
 *
 * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
 *    copy of which is available from the Open Source Initiative, see
 *    http://www.opensource.org/licenses/gpl-license.php.
 *
 * Licensee has the right to choose one of the above licenses.
 *
 * Redistributions of source code must retain the above copyright
 * notice and one of the license notices.
 *
 * Redistributions in binary form must reproduce both the above copyright
 * notice, one of the license notices in the documentation
 * and/or other materials provided with the distribution.
 *
 */

#if !defined(IB_ADDR_H)
#define IB_ADDR_H

#include
#include

struct ib_addr {
	struct sockaddr	src_addr;
	struct sockaddr	dst_addr;
	union ib_gid	sgid;
	union ib_gid	dgid;
};

struct ib_addr_svc;

typedef void (*ib_addr_handler)(struct ib_addr_svc *svc, int status,
				struct ib_addr *addr);

struct ib_addr_svc {
	void		*context;
	ib_addr_handler	handler;
};

struct ib_addr_svc* ib_addr_create_svc(void *context, ib_addr_handler handler);

void ib_addr_destroy_svc(struct ib_addr_svc *svc);

int ib_addr_resolve(struct ib_addr_svc *svc, struct ib_addr *addr,
		    int timeout_ms);

void ib_addr_cancel(struct ib_addr_svc *svc, struct ib_addr *addr);

#endif /* IB_ADDR_H */

/*
 * Copyright (c) 2005 Voltaire Inc. All rights reserved.
 * Copyright (c) 2002-2005, Network Appliance, Inc. All rights reserved.
 * Copyright (c) 1999-2005, Mellanox Technologies, Inc. All rights reserved.
 * Copyright (c) 2005 Intel Corporation. All rights reserved.
 *
 * This Software is licensed under one of the following licenses:
 *
 * 1) under the terms of the "Common Public License 1.0" a copy of which is
 *    available from the Open Source Initiative, see
 *    http://www.opensource.org/licenses/cpl.php.
 *
 * 2) under the terms of the "The BSD License" a copy of which is
 *    available from the Open Source Initiative, see
 *    http://www.opensource.org/licenses/bsd-license.php.
 *
 * 3) under the terms of the "GNU General Public License (GPL) Version 2" a
 *    copy of which is available from the Open Source Initiative, see
 *    http://www.opensource.org/licenses/gpl-license.php.
 *
 * Licensee has the right to choose one of the above licenses.
 *
 * Redistributions of source code must retain the above copyright
 * notice and one of the license notices.
 *
 * Redistributions in binary form must reproduce both the above copyright
 * notice, one of the license notices in the documentation
 * and/or other materials provided with the distribution.
 */

#include
#include
#include
#include
#include
#include
#include
#include

MODULE_AUTHOR("Sean Hefty");
MODULE_DESCRIPTION("IB Address Translation");
MODULE_LICENSE("Dual BSD/GPL");

struct addr_svc {
	struct ib_addr_svc	svc;
	wait_queue_head_t	wait;
	atomic_t		refcount;
};

struct addr_req {
	struct list_head	list;
	struct addr_svc		*add_svc;
	struct ib_addr		*addr;
	unsigned long		timeout;
	int			status;
};

static void process_req(void *data);

static DEFINE_SPINLOCK(lock);
static LIST_HEAD(req_list);
static DECLARE_WORK(work, process_req, NULL);
static struct workqueue_struct *wq;
static unsigned long timeout;

struct ib_addr_svc* ib_addr_create_svc(void *context, ib_addr_handler handler)
{
	struct addr_svc *add_svc;

	add_svc = kmalloc(sizeof *add_svc, GFP_KERNEL);
	if (!add_svc)
		return ERR_PTR(-ENOMEM);

	add_svc->svc.context = context;
	add_svc->svc.handler = handler;
	init_waitqueue_head(&add_svc->wait);
	atomic_set(&add_svc->refcount, 1);
	return &add_svc->svc;
}
EXPORT_SYMBOL(ib_addr_create_svc);

void ib_addr_destroy_svc(struct ib_addr_svc *svc)
{
	struct addr_svc *add_svc = container_of(svc, struct addr_svc, svc);

	atomic_dec(&add_svc->refcount);
	wait_event(add_svc->wait, !atomic_read(&add_svc->refcount));
	kfree(add_svc);
}
EXPORT_SYMBOL(ib_addr_destroy_svc);

static void set_timeout(unsigned long time)
{
	unsigned long delay;

	timeout = time;
	cancel_delayed_work(&work);

	delay = time - jiffies;
	if ((long)delay <= 0)
		delay = 1;

	queue_delayed_work(wq, &work, delay);
}

static void process_req(void *data)
{
	struct addr_req *req, *temp_req;
	struct list_head done_list;
	unsigned long flags;

	INIT_LIST_HEAD(&done_list);

	spin_lock_irqsave(&lock, flags);
	list_for_each_entry_safe(req, temp_req, &req_list, list) {
		if (time_after(req->timeout, jiffies)) {
			set_timeout(req->timeout);
			break;
		}
		list_del(&req->list);
		list_add_tail(&req->list, &done_list);
	}
	spin_unlock_irqrestore(&lock, flags);

	list_for_each_entry_safe(req, temp_req, &done_list, list) {
		list_del(&req->list);
		req->add_svc->svc.handler(&req->add_svc->svc, req->status,
					  req->addr);
		if (atomic_dec_and_test(&req->add_svc->refcount))
			wake_up(&req->add_svc->wait);
		kfree(req);
	}
}

static void queue_req(struct addr_req *req)
{
	struct addr_req *temp_req;

	list_for_each_entry_reverse(temp_req, &req_list, list) {
		if (time_after(req->timeout, temp_req->timeout))
			break;
	}

	atomic_inc(&req->add_svc->refcount);
	list_add(&req->list, &temp_req->list);

	if (req_list.next == &req->list)
		set_timeout(req->timeout);
}

static int addr_resolve(struct ib_addr *addr)
{
	struct sockaddr_in *src_addr;
	u32 src_ip, dst_ip;
	struct flowi flow;
	struct rtable *rt_tbl;
	struct neighbour *neigh;
	int ret;

	src_addr = (struct sockaddr_in *) &addr->src_addr;
	src_ip = src_addr->sin_addr.s_addr;
	dst_ip = ((struct sockaddr_in *) &addr->dst_addr)->sin_addr.s_addr;

	memset(&flow, 0, sizeof flow);
	flow.nl_u.ip4_u.daddr = dst_ip;
	flow.nl_u.ip4_u.saddr = src_ip;
	ret = ip_route_output_key(&rt_tbl, &flow);
	if (ret)
		return ret;

	neigh = neigh_lookup(&arp_tbl, &dst_ip, rt_tbl->u.dst.dev);
	ip_rt_put(rt_tbl);
	if (!neigh)
		return -ENODATA;

	if (!src_ip) {
		src_ip = inet_select_addr(neigh->dev, dst_ip, 0);
		if (!src_ip)
			return -EADDRNOTAVAIL;
		src_addr->sin_family = addr->dst_addr.sa_family;
		src_addr->sin_addr.s_addr = src_ip;
	}

	addr->sgid = *(union ib_gid *) (neigh->dev->dev_addr + 4);
	addr->dgid = *(union ib_gid *) (neigh->ha + 4);

	neigh_release(neigh);
	return 0;
}

int ib_addr_resolve(struct ib_addr_svc *svc, struct ib_addr *addr,
		    int timeout_ms)
{
	struct addr_svc *add_svc = container_of(svc, struct addr_svc, svc);
	struct addr_req *req;
	unsigned long flags;
	int ret;

	req = kmalloc(sizeof *req, GFP_KERNEL);
	if (!req)
		return -ENOMEM;

	req->addr = addr;
	req->add_svc = add_svc;
	req->status = addr_resolve(addr);
	req->timeout = msecs_to_jiffies(timeout_ms) + jiffies;

	switch (req->status) {
	case -ENODATA:
		/* TODO: initiate ARP request */
	case 0:
		break;
	default:
		ret = req->status;
		goto err;
	}

	spin_lock_irqsave(&lock, flags);
	queue_req(req);
	spin_unlock_irqrestore(&lock, flags);
	return 0;
err:
	kfree(req);
	return ret;
}
EXPORT_SYMBOL(ib_addr_resolve);

void ib_addr_cancel(struct ib_addr_svc *svc, struct ib_addr *addr)
{
	struct addr_req *req, *temp_req;
	unsigned long flags;

	spin_lock_irqsave(&lock, flags);
	list_for_each_entry_safe(req, temp_req, &req_list, list) {
		if (&req->add_svc->svc == svc && req->addr == addr) {
			req->status = -ECANCELED;
			req->timeout = jiffies;
			list_del(&req->list);
			list_add(&req->list, &req_list);
			set_timeout(req->timeout);
			break;
		}
	}
	spin_unlock_irqrestore(&lock, flags);
}
EXPORT_SYMBOL(ib_addr_cancel);

static int addr_init(void)
{
	wq = create_singlethread_workqueue("ib_addr");
	if (!wq)
		return -ENOMEM;

	return 0;
}

static void addr_cleanup(void)
{
	destroy_workqueue(wq);
}

module_init(addr_init);
module_exit(addr_cleanup);

From rolandd at cisco.com  Wed Sep 28 19:57:21 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 28 Sep 2005 19:57:21 -0700
Subject: [openib-general] Re: IPoIB stuff
In-Reply-To: <1127948369.6858.50.camel@hematite.internal.keyresearch.com> (Robert Walsh's message of "Wed, 28 Sep 2005 15:59:29 -0700")
References: <1127946412.6858.40.camel@hematite.internal.keyresearch.com> <52r7b8kdri.fsf@cisco.com> <1127948369.6858.50.camel@hematite.internal.keyresearch.com>
Message-ID: <5264skk2cu.fsf@cisco.com>

Thanks, I've committed this and queued it for 2.6.14.

 - R.

From mst at mellanox.co.il  Wed Sep 28 21:58:04 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 29 Sep 2005 07:58:04 +0300
Subject: [openib-general] Re: netperf over SDP bug
In-Reply-To: <20050929011646.GA27337@esmail.cup.hp.com>
References: <20050929011646.GA27337@esmail.cup.hp.com>
Message-ID: <20050929045804.GB19024@mellanox.co.il>

Quoting r. Grant Grundler :
> Subject: Re: netperf over SDP bug
> 
> On Wed, Sep 28, 2005 at 06:52:03AM +0300, Michael S. Tsirkin wrote:
> ...
> > > grundler <505>grep -h ' 60\.0. ' sdp*.out | sort -n -k 3
> 
> BTW, I should also mention I'm running with HZ set at the new "default":
> gsyprf3:/home/grundler/openib-perf-2005/r3547# fgrep HZ
> /boot/config-2.6.13
> # CONFIG_HZ_100 is not set
> CONFIG_HZ_250=y
> # CONFIG_HZ_1000 is not set
> CONFIG_HZ=250
> 
> If there is a workqueue/task scheduling problem, it's likely
> to be related to that setting.
> 
> > Interesting.
> > Things to try would be
> > - enable SDP debug, or event data path debug
> 
> Can you be more specific?
> I have the CONFIG_SDP_DEBUG enabled and it's dumping setup/teardown
> info.
> Did you want to see that?

No, I mean data path debugging.

-- 
MST

From rolandd at cisco.com  Wed Sep 28 21:59:50 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Wed, 28 Sep 2005 21:59:50 -0700
Subject: [openib-general] [PATCH] SRP: don't use TX IU after freeing it
Message-ID: <52vf0kii49.fsf@cisco.com>

Vu, you pointed out that the current SRP code might look at an IU that it sent after that IU has been reused for a different command. I realized that a simple fix for this is just to keep the DMA address (the only thing we look at in the IU) in the request structure.

To add FMR support, we can just put all the FMR stuff in the request structure instead of the IU structure. This saves bloating the IUs we use for receives and task management, so it seems like a win anyway.

Does this patch seem OK and work for you? It works for me in my setup.

 - R.
--- linux-kernel/infiniband/ulp/srp/ib_srp.c	(revision 3613)
+++ linux-kernel/infiniband/ulp/srp/ib_srp.c	(working copy)
@@ -523,9 +523,9 @@ err:
 }
 
 static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
-			struct srp_iu *iu)
+			struct srp_request *req)
 {
-	struct srp_cmd *cmd = iu->buf;
+	struct srp_cmd *cmd = req->cmd->buf;
 	int len;
 	u8 fmt;
 
@@ -569,7 +569,7 @@ static int srp_map_data(struct scsi_cmnd
 		else
 			cmd->data_in_desc_cnt = n;
 
-		buf->table_desc.va  = cpu_to_be64(iu->dma +
+		buf->table_desc.va  = cpu_to_be64(req->cmd->dma +
 						  sizeof *cmd +
 						  sizeof *buf);
 		buf->table_desc.key =
@@ -606,6 +606,8 @@ static int srp_map_data(struct scsi_cmnd
 			return -EINVAL;
 		}
 
+		pci_unmap_addr_set(req, direct_mapping, dma);
+
 		buf->va  = cpu_to_be64(dma);
 		buf->key = cpu_to_be32(target->srp_host->mr->rkey);
 		buf->len = cpu_to_be32(scmnd->request_bufflen);
@@ -626,7 +628,7 @@ static int srp_map_data(struct scsi_cmnd
 
 static void srp_unmap_data(struct scsi_cmnd *scmnd,
 			   struct srp_target_port *target,
-			   struct srp_cmd *cmd)
+			   struct srp_request *req)
 {
 	if (!scmnd->request_buffer ||
 	    (scmnd->sc_data_direction != DMA_TO_DEVICE &&
@@ -639,7 +641,7 @@ static void srp_unmap_data(struct scsi_c
 			     scmnd->use_sg, scmnd->sc_data_direction);
 	else
 		dma_unmap_single(target->srp_host->dev->dma_device,
-				 be64_to_cpu(((struct srp_direct_buf *) cmd->add_data)->va),
+				 pci_unmap_addr(req, direct_mapping),
 				 scmnd->request_bufflen,
 				 scmnd->sc_data_direction);
 }
@@ -648,7 +650,6 @@ static void srp_process_rsp(struct srp_t
 {
 	struct srp_request *req;
 	struct scsi_cmnd *scmnd;
-	struct srp_iu *iu;
 	unsigned long flags;
 	s32 delta;
 
@@ -667,7 +668,6 @@ static void srp_process_rsp(struct srp_t
 		req->tsk_status = rsp->data[3];
 		complete(&req->done);
 	} else {
-		iu    = req->cmd;
 		scmnd = req->scmnd;
 
 		scmnd->result = rsp->status;
@@ -683,7 +683,7 @@ static void srp_process_rsp(struct srp_t
 		else if (rsp->flags & (SRP_RSP_FLAG_DIOVER | SRP_RSP_FLAG_DIUNDER))
 			scmnd->resid = be32_to_cpu(rsp->data_in_res_cnt);
 
-		srp_unmap_data(scmnd, target, iu->buf);
+		srp_unmap_data(scmnd, target, req);
 
 		if (!req->tsk_mgmt) {
 			req->scmnd = NULL;
@@ -919,7 +919,7 @@ static int srp_queuecommand(struct scsi_
 	req->cmd_done = 0;
 	req->tsk_mgmt = NULL;
 
-	len = srp_map_data(scmnd, target, iu);
+	len = srp_map_data(scmnd, target, req);
 	if (len < 0) {
 		printk(KERN_ERR PFX "Failed to map data\n");
 		goto err;
@@ -944,7 +944,7 @@ static int srp_queuecommand(struct scsi_
 	return 0;
 
 err_unmap:
-	srp_unmap_data(scmnd, target, cmd);
+	srp_unmap_data(scmnd, target, req);
 
 err:
 	return SCSI_MLQUEUE_HOST_BUSY;
--- linux-kernel/infiniband/ulp/srp/ib_srp.h	(revision 3613)
+++ linux-kernel/infiniband/ulp/srp/ib_srp.h	(working copy)
@@ -94,6 +94,7 @@ struct srp_request {
 	struct scsi_cmnd       *scmnd;
 	struct srp_iu	       *cmd;
 	struct srp_iu	       *tsk_mgmt;
+	DECLARE_PCI_UNMAP_ADDR(direct_mapping)
 	struct completion	done;
 	short			next;
 	u8			cmd_done;

From mst at mellanox.co.il  Wed Sep 28 23:27:05 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 29 Sep 2005 09:27:05 +0300
Subject: [openib-general][RFC]: CMA IB implementation
In-Reply-To: <433AC861.4060809@ichips.intel.com>
References: <433AC861.4060809@ichips.intel.com>
Message-ID: <20050929062704.GD19024@mellanox.co.il>

Quoting Sean Hefty :
> Subject: Re: [openib-general][RFC]: CMA IB implementation
> 
> Roland Dreier wrote:
> >     Guy> But the main question is: does openib wants to support ATS
> >     Guy> arp ? do we also want to support ATS
> >     Guy> registration/deregistration ? openib can support, for
> >     Guy> example, only the ATS arp and rely on the openib-less targets
> >     Guy> to do their own registrations.
> > 
> > I would certainly prefer to forget all about ATS in every form.
> 
> That is my preference as well. I believe that we want to support address
> translation using ARP, but not ATS as defined by SA service registration and
> queries.
> 
> - Sean

BTW, one idea of mine that some people apparently were interested in is adding an option to do ARP resolution on the passive side after the REQ is received, as a weak security measure. Is there still interest in that?

-- 
MST

From mst at mellanox.co.il  Thu Sep 29 00:19:29 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Thu, 29 Sep 2005 10:19:29 +0300
Subject: [openib-general] Re: [RFC] IB address translation using ARP
In-Reply-To: 
References: 
Message-ID: <20050929071929.GJ8114@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: [RFC] IB address translation using ARP
> The code should check the
> ARP cache for information, but is missing the actual ARP processing.
> (We should be able to pull that from ib_at.)

I'd suggest you also take a look at sdp_link.c - I hear that's where the ib_at code came from, and I think it does some things in a better way, such as keeping the device reference around only for a short while.

-- 
MST

From eitan at mellanox.co.il  Thu Sep 29 03:40:57 2005
From: eitan at mellanox.co.il (Eitan Zahavi)
Date: Thu, 29 Sep 2005 13:40:57 +0300
Subject: [openib-general] IBIS modification
Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3069299@mtlexch01.mtl.com>

Hi Hal,

I have updated IBIS:

1. Support sm_key, v_key and m_key (yes, the last one is temporary until we will need to support m_key(lid), m_key(dr path), m_key(guid)).
   To set them (anytime during the flow) you can use ibis_opts configure -sm_key
   Example:
   % ibis_opts conf -sm_key 0x1234567812345678
   0x1234567812345678
   % ibis_opts dump
   -single_thread TRUE -force_log_flush TRUE -log_flags 1 -log_file /tmp/ibis.log -sm_key 0x1234567812345678 -m_key 0x0000000000000000 -v_key 0x0000000000000000
   % sacNodeQuery getTable 0
   % ibis_opts conf -sm_key 0
   0x0000000000000000
   % sacNodeQuery getTable 0
   nr:21 nr:23 nr:25 nr:27
2. Support new ibis flag "-port_num <1..N>" to do the trivial init sequence.
   Note this is not the way scripts should be written, as they should modify the log file using ibis_opts conf -log_file before calling init.
3. Wrapper updated accordingly.

You did not get back to me with feedback about moving the ibis and ibdm to the trunk.

EZ

Eitan Zahavi
Design Technology Director
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

From halr at voltaire.com  Thu Sep 29 05:35:06 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 29 Sep 2005 08:35:06 -0400
Subject: [openib-general] Re: IBIS modification
In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E3069299@mtlexch01.mtl.com>
References: <6AB138A2AB8C8E4A98B9C0C3D52670E3069299@mtlexch01.mtl.com>
Message-ID: <1127997144.4380.4942.camel@hal.voltaire.com>

Hi Eitan,

On Thu, 2005-09-29 at 06:40, Eitan Zahavi wrote:
> Hi Hal,
> 
> I have updated IBIS:
> 
> 1. Support sm_key, v_key

What's v_key?

> and m_key (yes, the last one is temporary until we
> will need to support m_key(lid), m_key(dr path), m_key(guid)).
> To set them (anytime during the flow) you can use ibis_opts configure
> -sm_key
> 
> Example:
> % ibis_opts conf -sm_key 0x1234567812345678
> 0x1234567812345678
> % ibis_opts dump
> -single_thread TRUE -force_log_flush TRUE -log_flags 1 -log_file
> /tmp/ibis.log -sm_key 0x1234567812345678 -m_key 0x0000000000000000
> -v_key 0x0000000000000000
> % sacNodeQuery getTable 0
> % ibis_opts conf -sm_key 0
> 0x0000000000000000
> % sacNodeQuery getTable 0
> nr:21 nr:23 nr:25 nr:27
> 
> 2. Support new ibis flag "-port_num <1..N>" to do the trivial init
> sequence.
> Note this is not the way scripts should be written, as they should
> modify the log file using ibis_opts conf -log_file before calling
> init.
> 
> 3. Wrapper updated accordingly.

Thanks for the heads up.

> You did not get back to me with feedback about moving the ibis and
> ibdm to the trunk.
Don't forget the simulator too :-)

So here's my take:

1. In terms of ibdm and the simulator (ibmgtsim), I believe these are properly located in the OpenIB tree. I also don't think I personally have sufficient experience with them as yet, so it is too soon, at least for me. I also haven't seen anyone else post any experiences with this or opinion on it on the list.

2. In terms of ibis, I have superficially played with this tool. Although I think there are different approaches to the problem this tool addresses, and I don't see the urgency in moving it to the trunk (it is every bit as useful where it is located in the tree, perhaps just a hair less convenient), I am open to moving it to the trunk as I think it is useful. This would be conditional on your/Mellanox support for this. What is the commitment to this?

Also, what do others on the list think? Has anyone used ibis aside from me and Mellanox?

Also, a detail: where would it be moved? Under userspace/management/ibis or somewhere else?

-- Hal

From halr at voltaire.com  Thu Sep 29 05:45:58 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 29 Sep 2005 08:45:58 -0400
Subject: [openib-general] Re: Page allocation failures & kdapltest oops
In-Reply-To: 
References: <1127749644.4398.878.camel@hal.voltaire.com> <1127839244.4436.7.camel@localhost.localdomain> <1127914752.4380.120.camel@hal.voltaire.com> <1127930176.4380.826.camel@hal.voltaire.com>
Message-ID: <1127997683.4380.5000.camel@hal.voltaire.com>

On Wed, 2005-09-28 at 16:01, James Lentini wrote:
> On Wed, 28 Sep 2005, Hal Rosenstock wrote:
> 
> > > What are the contents of /etc/cpuinfo and
> > > /etc/meminfo on this system?
> > 
> > Those files don't exist on my machine.
> 
> Sorry, I meant /proc/cpuinfo

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 11
model name      : Intel(R) Pentium(R) III CPU 1133MHz
stepping        : 1
cpu MHz         : 1129.960
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 2261.48

> and /proc/meminfo.

MemTotal:       256012 kB
MemFree:          8304 kB
Buffers:          5952 kB
Cached:          34956 kB
SwapCached:      73892 kB
Active:         140628 kB
Inactive:        10516 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       256012 kB
LowFree:          8304 kB
SwapTotal:      522104 kB
SwapFree:       367148 kB
Dirty:              16 kB
Writeback:           0 kB
Mapped:         137808 kB
Slab:            87452 kB
CommitLimit:    650108 kB
Committed_AS:   397904 kB
PageTables:       2220 kB
VmallocTotal:   778220 kB
VmallocUsed:     36556 kB
VmallocChunk:   739816 kB

> I'd like to know what
> hardware you are using, especially how much memory you have installed.
> 
> I still can't reproduce the error. Are you running anything else
> besides kdapltest?

Nope.

> Martin J. Bligh wrote a script called vmtop that displays vm
> information in a nice way:
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/mbligh/tools/vmtop
> 
> Its output might help us determine if this is a kdapltest problem.
Memory: 250.0 Mb Free: 2.1% Buffers: 14.3% Cached: 2.4% Active: 55.2% Inactive: 4.8% Lowmem: 250.0 Mb Free: 2.1% Slab: 34.4% Memmap: 0.0% Stacks: 0.4% PMDs: 0.0% PTEs: 0.9% Top slabs: size-131072 0.1 Mb (Active: 0.0 Mb, 100.0% full) size-131072(dma) 0.1 Mb (Active: 0.0 Mb, 0.0% full) size-65536(dma) 0.1 Mb (Active: 0.0 Mb, 0.0% full) size-65536 0.1 Mb (Active: 0.0 Mb, 100.0% full) size-32768(dma) 0.0 Mb (Active: 0.0 Mb, 0.0% full) > james From moschny at ipd.uni-karlsruhe.de Thu Sep 29 06:12:05 2005 From: moschny at ipd.uni-karlsruhe.de (Thomas Moschny) Date: Thu, 29 Sep 2005 15:12:05 +0200 Subject: [openib-general] building the userspace rpm on ia64 Message-ID: <200509291512.13467.moschny@ipd.uni-karlsruhe.de> Hi! In order to build the openib-userspace rpm on rhel4/ia64, I had to slightly modify the specfile. Here's the patch: --- openib-userspace.spec.orig 2005-09-29 15:02:42.961550300 +0200 +++ openib-userspace.spec 2005-09-29 15:03:09.500216310 +0200 @@ -124,7 +124,7 @@ %config /etc/modprobe-openib.conf %config(noreplace) /etc/udev/rules.d/90-ib.rules %config(noreplace) /etc/profile.d/openib.* -%attr(0755, root, root) %dir /usr/lib64/infiniband +%attr(0755, root, root) %dir /usr/lib*/infiniband %attr(0755, root, root) %dir /usr/include/dat %attr(0755, root, root) %dir /usr/include/infiniband %attr(0755, root, root) %dir /usr/include/infiniband/complib Regards, Thomas From Federico.Sacerdoti at deshaw.com Thu Sep 29 06:41:04 2005 From: Federico.Sacerdoti at deshaw.com (Sacerdoti, Federico) Date: Thu, 29 Sep 2005 09:41:04 -0400 Subject: [openib-general] segfault on openib mvapich Message-ID: I found my problem, which had to do with incorrect library loading (LD_LIBRARY_PATH). There was a different mvapich (0.9.5) being loaded instead of the new one.
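The segfault turned out to come from the dynamic loader picking up an older MVAPICH (0.9.5) through LD_LIBRARY_PATH. One way to turn that kind of mismatch into a clear error instead of a crash is a startup version comparison. The sketch below assumes two version strings are available: one compiled into the application and one reported at run time by the library that was actually loaded. Neither source is an existing MVAPICH interface, and `check_library_version` is a hypothetical name.

```c
#include <stdio.h>

/*
 * Hypothetical startup check (not an MVAPICH API):
 *   built  - version string the application was compiled against
 *   loaded - version string reported by the shared library at run time
 * Returns 0 if the major and minor numbers agree, -1 otherwise, printing
 * a message that points at the likely cause.
 */
static int check_library_version(const char *built, const char *loaded)
{
    int bmaj, bmin, lmaj, lmin;

    if (sscanf(built, "%d.%d", &bmaj, &bmin) != 2 ||
        sscanf(loaded, "%d.%d", &lmaj, &lmin) != 2) {
        fprintf(stderr, "version check: cannot parse \"%s\" or \"%s\"\n",
                built, loaded);
        return -1;
    }
    if (bmaj != lmaj || bmin != lmin) {
        fprintf(stderr,
                "library version mismatch: built against %s, loaded %s "
                "(check LD_LIBRARY_PATH)\n", built, loaded);
        return -1;
    }
    return 0;  /* patch-level differences are tolerated */
}
```

Comparing only major.minor is a design choice here. It lets bug-fix releases load without complaint and still rejects an older or newer series.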
Perhaps a version check with a nice error message could help in the future. However, mvapich gen2 works just fine according to my preliminary tests. Thanks for your help, -Federico -----Original Message----- From: Dhabaleswar Panda [mailto:panda at cse.ohio-state.edu] Sent: Tuesday, September 27, 2005 7:19 PM To: Roland Dreier Cc: Sacerdoti, Federico; openib-general at openib.org Subject: Re: [openib-general] segfault on openib mvapich Federico, > Federico> I might have done something wrong, but tried to build > Federico> using a plain source from the openib gen2 svn tree and > Federico> Pete's patches (those that were not rejected). > > For whatever it's worth, basic MVAPICH tests like osu_bw work fine for > me with two and even four processes on two x86_64 machines. FYI, we are also running the latest version successfully on multiple platforms (IA32, Opetron and EM64T) of different sizes. We are also able to run applications successfully. To the best of our knowledge, many other organizations are also running mvapich-gen2 successfully on their platforms. > Federico> Adding the -debug flag to mpirun_rsh does not help (the > Federico> xterms flash on then dissapear). The ssh connections are > Federico> started fine, but the segfault happens early on. > > Without more data like a traceback from a core file or something like > that, it's going to be very difficult for anyone to debug this. As Roland indicates, could you please provide more details on the platform, OpenIB version (kernel, userlib), and the errors you are getting. This will help to debug the problem further and faster. > Also, it might be worth contacting the MVAPICH developers by emailing > mvapich_request -- they are much more likely to be able to help than > the openib-general community. We at OSU are monitoring the OpenIB list for mvapich-gen2 related questions and are answering them. 
In addition, if you can send a copy to mvapich-help at cse.ohio-state.edu (not mvapich_request), we will be able to respond even faster. Thanks, DK > - R. > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From christoph.mordasini at phim.unibe.ch Thu Sep 29 06:58:53 2005 From: christoph.mordasini at phim.unibe.ch (Christoph A. Mordasini) Date: Thu, 29 Sep 2005 15:58:53 +0200 Subject: [openib-general] segfault on openib mvapich In-Reply-To: References: Message-ID: <1128002333.1471.49.camel@daphne.unibe.ch> Hi We are running here mvapich gen 2 downloaded from osu about Sept. 12., with 2.6.12.6 from kernel.org, Fedora core 4 (gcc 4.0.0) and the IB tree from openib.org downloaded about 3 weeks ago, without any subsequent patches added. The hardware of the cluster is somewhat special: We use AMD dual core Athlons on a ASUS A8N-E board, with Mellanox MHEL-CF256-T HCA (PCIe x8) in the PCIe x16 ("graphics") slot. The idea to use standard customer boards (not server) with a pcie x16 "graphics" slot for IB comes from Don Holmgreen at Fermilab and is a great way to build inexpensive clusters with dual core nodes.
We had a number of problems before we could make mvapich work, but with the help of osu, it now works perfectly. We also had inexplicable segfaults with different, very simple mpi programs. We finally found out that these went away after changing the following things for the CFLAGS in the mvapich make file (e.g. mvapich.make.gcc) 1) delete -DLAZY_MEM_UNREGISTER 2) use -O2 instead of -O3 (not sure if the second point also matters) This will probably have some negative performance impact, which I haven't tried to quantify. I just saw that your problem was due to LD_LIBRARY_PATH (and not to the compilation options), but maybe this will help someone else. By the way, I have the following question: Is there a more mvapich related newsgroup? Thanks and kind regards Chris -- ************************************************ * * * Christoph A. Mordasini * * * * Theoretical Astrophysics Research Group * * * * Physikalisches Institut * * University of Bern * * * * Phone: +41316314409 * * e-Mail: christoph.mordasini at phim.unibe.ch * * * ************************************************ From halr at voltaire.com Thu Sep 29 06:59:23 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 09:59:23 -0400 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: References: Message-ID: <1128002363.4381.8.camel@hal.voltaire.com> On Wed, 2005-09-28 at 21:26, Sean Hefty wrote: > Here's a first attempt at an API / implementation (that compiles only) for > an address translation module for IB using ARP. The code should check the > ARP cache for information, but is missing the actual ARP processing. Where would the path record lookup subsequent to the ARP go ? It would be here as well prior to the connect, right ? > (We should be able to pull that from ib_at.) or sdp_link which has the more temporal netdev references currently :-) > The API is similar to the route > portion of ib_at, but corrects issues with canceling requests. 
What are you referring to here ? > Only the destination IP address is required for input. > > The intent is that the CMA will use this service to locate the > proper RDMA device GUID This is the outgoing device, right ? > and port to use in establishing a connection. > Hopefully, this makes it clearer how I envision address translation wrt > the CMA. When/if there are multiple paths, how is the selection performed ? Also, on the passive side, would a rdma_resolve_route also be done or something else or wouldn't just a path lookup suffice here ? If it is the latter, is that hidden under the rdma_accept or handled otherwise ? -- Hal From guyg at voltaire.com Thu Sep 29 07:17:23 2005 From: guyg at voltaire.com (Guy German) Date: Thu, 29 Sep 2005 17:17:23 +0300 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <433ACB9C.80003@ichips.intel.com> References: <35EA21F54A45CB47B879F21A91F4862F7F977E@taurus.voltaire.com> <43381CFC.7070508@voltaire.com> <433827FF.3010601@ichips.intel.com> <43396797.6030804@voltaire.com> <43397D00.6080505@ichips.intel.com> <433A60AA.2040704@voltaire.com> <433ACB9C.80003@ichips.intel.com> Message-ID: <433BF773.7030909@voltaire.com> Sean Hefty wrote: > I don't object to the name, just combining the current functionality > that ib_at tries to provide into a single abstraction. I think that the > disagreement is what functionality a core address translation module > should provide. ... > If other functionality from ib_at is needed, > I'm hoping that it can be build on top of this service. Ok. My personal taste is to have fewer modules but I can see the reason behind adding functionality gradually. If this process will get the new (minimized ib_at) and cma into the kernel faster, hence allowing transport neutral ULP's (e.g. iSER) to be written over an upstream code, then I think it's a good course of action. 
Guy From halr at voltaire.com Thu Sep 29 07:14:12 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 10:14:12 -0400 Subject: [openib-general] Same IP address on multiple net devices and routing Message-ID: <1128003252.4381.19.camel@hal.voltaire.com> Hi, Does anyone know what the Linux routing code does in terms of selecting an outgoing interface when multiple net devices have been configured with the same IP address ? Does it always select the first one or is there some other algorithm in play here ? Thanks. -- Hal From halr at voltaire.com Thu Sep 29 08:18:37 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 11:18:37 -0400 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <1128002363.4381.8.camel@hal.voltaire.com> References: <1128002363.4381.8.camel@hal.voltaire.com> Message-ID: <1128006876.4381.164.camel@hal.voltaire.com> On Thu, 2005-09-29 at 09:59, Hal Rosenstock wrote: > On Wed, 2005-09-28 at 21:26, Sean Hefty wrote: > > Here's a first attempt at an API / implementation (that compiles only) for > > an address translation module for IB using ARP. The code should check the > > ARP cache for information, but is missing the actual ARP processing. > > Where would the path record lookup subsequent to the ARP go ? It would > be here as well prior to the connect, right ? > > > (We should be able to pull that from ib_at.) > > or sdp_link which has the more temporal netdev references currently :-) > > > The API is similar to the route > > portion of ib_at, but corrects issues with canceling requests. > > What are you referring to here ? > > > Only the destination IP address is required for input. > > > > The intent is that the CMA will use this service to locate the > > proper RDMA device GUID > > This is the outgoing device, right ? > > > and port to use in establishing a connection. > > Hopefully, this makes it clearer how I envision address translation wrt > > the CMA. 
> > When/if there are multiple paths, how is the selection performed ? > > Also, on the passive side, would a rdma_resolve_route also be done or > something else or wouldn't just a path lookup suffice here ? If it is > the latter, is that hidden under the rdma_accept or handled otherwise ? A couple more comments about the emerging implementation for address translation: What happens if the destination IP address is a local one ? I think there is some missing code here. Also, shouldn't non subnet local destination IP addresses be handled ? -- Hal From jlentini at netapp.com Thu Sep 29 08:27:05 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 29 Sep 2005 11:27:05 -0400 (EDT) Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <20050929062704.GD19024@mellanox.co.il> References: <433AC861.4060809@ichips.intel.com> <20050929062704.GD19024@mellanox.co.il> Message-ID: On Thu, 29 Sep 2005, Michael S. Tsirkin wrote: > BTW, one idea of mine that some people apparently were interested in, > is adding an option to do ARP resolution on the passive side after REQ is > received, as a weak security measure. > > Is there still interest in that? Given that the ARP response can be faked, I don't see a benefit to adding such a check. From rolandd at cisco.com Thu Sep 29 08:48:11 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 29 Sep 2005 08:48:11 -0700 Subject: [openib-general] Same IP address on multiple net devices and routing In-Reply-To: <1128003252.4381.19.camel@hal.voltaire.com> (Hal Rosenstock's message of "29 Sep 2005 10:14:12 -0400") References: <1128003252.4381.19.camel@hal.voltaire.com> Message-ID: <52psqrj2o4.fsf@cisco.com> Hal> Hi, Does anyone know what the Linux routing code does in Hal> terms of selecting an outgoing interface when multiple net Hal> devices have been configured with the same IP address ? It follows the routing table. 
If you assign the same local IP address to multiple interfaces then you have to be careful to set up routing so that your traffic goes to the right place. In fact you can think of Linux networking as attaching IP addresses to the system as a whole instead of a particular interface, because for example by default Linux will respond to ARP queries on any interface. Assigning addresses to a particular interface is just a hint for how the kernel sets up routes by default. - R. From mshefty at ichips.intel.com Thu Sep 29 09:05:49 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 09:05:49 -0700 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: References: Message-ID: <433C10DD.1080307@ichips.intel.com> Sean Hefty wrote: > struct ib_addr_svc* ib_addr_create_svc(void *context, ib_addr_handler handler); > > void ib_addr_destroy_svc(struct ib_addr_svc *svc); On second thought, I think this can be done without the need to create/destroy a service without changing the functionality. > void ib_addr_cancel(struct ib_addr_svc *svc, struct ib_addr *addr); If we make cancel a blocking call, I think that we could also ensure that a callback will not occur after cancel returns. Not sure if we want this restriction, or that it really helps a ULP that's following a call to resolve with a path record query. - Sean From mshefty at ichips.intel.com Thu Sep 29 09:05:44 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 09:05:44 -0700 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <1128006876.4381.164.camel@hal.voltaire.com> References: <1128002363.4381.8.camel@hal.voltaire.com> <1128006876.4381.164.camel@hal.voltaire.com> Message-ID: <433C10D8.2090301@ichips.intel.com> Hal Rosenstock wrote: >>>(We should be able to pull that from ib_at.) >> >>or sdp_link which has the more temporal netdev references currently :-) I will look at both. Thanks. 
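Sean's idea of a blocking cancel, under which no callback can fire after cancel returns, can be sketched with a small per-request state machine. This is an illustration under stated assumptions, not the proposed ib_addr code. The type and function names (`addr_req`, `addr_req_complete`, `addr_req_cancel`, `count_calls`) are hypothetical, and user-space pthreads stand in for whatever kernel locking a real module would use.

```c
#include <pthread.h>

/* Hypothetical request states; not part of any OpenIB API. */
enum req_state { REQ_PENDING, REQ_IN_CALLBACK, REQ_DONE, REQ_CANCELED };

struct addr_req;
typedef void (*addr_req_handler)(struct addr_req *req, int status, void *context);

struct addr_req {
    pthread_mutex_t  lock;
    pthread_cond_t   idle;      /* signalled when a running callback finishes */
    enum req_state   state;
    addr_req_handler handler;
    void            *context;
};

static void addr_req_init(struct addr_req *req, addr_req_handler handler,
                          void *context)
{
    pthread_mutex_init(&req->lock, NULL);
    pthread_cond_init(&req->idle, NULL);
    req->state = REQ_PENDING;
    req->handler = handler;
    req->context = context;
}

/* Called by the resolver when the lookup completes (e.g. an ARP reply). */
static void addr_req_complete(struct addr_req *req, int status)
{
    pthread_mutex_lock(&req->lock);
    if (req->state != REQ_PENDING) {          /* canceled: report nothing */
        pthread_mutex_unlock(&req->lock);
        return;
    }
    req->state = REQ_IN_CALLBACK;
    pthread_mutex_unlock(&req->lock);

    req->handler(req, status, req->context);  /* run without the lock held */

    pthread_mutex_lock(&req->lock);
    req->state = REQ_DONE;
    pthread_cond_broadcast(&req->idle);
    pthread_mutex_unlock(&req->lock);
}

/*
 * Blocking cancel: returns 0 if the request was canceled before its callback
 * ran, -1 otherwise (callback already ran, or request already canceled).
 * If the callback is running, waits for it to finish before returning.
 * Must not be called from inside the callback, or it waits forever.
 */
static int addr_req_cancel(struct addr_req *req)
{
    int ret;

    pthread_mutex_lock(&req->lock);
    while (req->state == REQ_IN_CALLBACK)
        pthread_cond_wait(&req->idle, &req->lock);
    if (req->state == REQ_PENDING) {
        req->state = REQ_CANCELED;
        ret = 0;
    } else {
        ret = -1;
    }
    pthread_mutex_unlock(&req->lock);
    return ret;
}

/* Example handler: counts invocations through the context pointer. */
static void count_calls(struct addr_req *req, int status, void *context)
{
    (void)req;
    (void)status;
    ++*(int *)context;
}
```

When cancel returns 0 in this shape, a ULP that chains a path record query off the address callback knows the callback will never run. The cost is the restriction Sean mentions: cancel may sleep, and it cannot be called from within the callback.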
>>>The API is similar to the route >>>portion of ib_at, but corrects issues with canceling requests. >> >>What are you referring to here ? It looks like the user can get a callback after cancel returns. This is a similar problem that we discussed wrt canceling MADs many moons ago. The return code can be misleading. >>>The intent is that the CMA will use this service to locate the >>>proper RDMA device GUID >> >>This is the outgoing device, right ? correct >>> and port to use in establishing a connection. >>>Hopefully, this makes it clearer how I envision address translation wrt >>>the CMA. >> >>When/if there are multiple paths, how is the selection performed ? >> >>Also, on the passive side, would a rdma_resolve_route also be done or >>something else or wouldn't just a path lookup suffice here ? If it is >>the latter, is that hidden under the rdma_accept or handled otherwise ? On the passive side, the source/destination IP addresses are carried in the CM REQ. The CM provides path records as part of the REQ callback. > What happens if the destination IP address is a local one ? I think > there is some missing code here. I think there's code in at.c to handle that case that could be re-used. > Also, shouldn't non subnet local destination IP addresses be handled ? How does that map to the IB subnet? Would it require global routing, or are non-subnet local addresses a valid configuration on a local IB subnet? - Sean From mshefty at ichips.intel.com Thu Sep 29 09:09:10 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 09:09:10 -0700 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <20050929062704.GD19024@mellanox.co.il> References: <433AC861.4060809@ichips.intel.com> <20050929062704.GD19024@mellanox.co.il> Message-ID: <433C11A6.1060002@ichips.intel.com> Michael S. 
Tsirkin wrote: > BTW, one idea of mine that some people apparently were interested in, > is adding an option to do ARP resolution on the passive side after REQ is > received, as a weak security measure. > > Is there still interest in that? I don't think that there's anything to restrict this. The ULP (CMA in this case?) could do this using the address translation API before processing a REQ. Assuming that address translation were used on the remote side before sending the REQ, the IP addresses are likely to be in the ARP cache already. - Sean From moschny at ipd.uni-karlsruhe.de Thu Sep 29 09:13:33 2005 From: moschny at ipd.uni-karlsruhe.de (Thomas Moschny) Date: Thu, 29 Sep 2005 18:13:33 +0200 Subject: [openib-general] 2.6.9 backport Message-ID: <200509291813.40699.moschny@ipd.uni-karlsruhe.de> Hi, while building a 2.6.9-11.EL kernel with the patches from gen2/branches/backport-to-2.6.9 applied, I found a minor glitch: CONFIG_INFINIBAND_IPOIB_DEBUG should not be there. The debug part cannot be build, because debugfs is missing. It seems to be introduced in 2.6.11. - Thomas From greg at kroah.com Thu Sep 29 09:12:36 2005 From: greg at kroah.com (Greg KH) Date: Thu, 29 Sep 2005 09:12:36 -0700 Subject: [openib-general] Re: [git pull] InfiniBand fixes for 2.6.14 In-Reply-To: <52zmpxmhm0.fsf@cisco.com> References: <524q85on6e.fsf@cisco.com> <20050928093633.GA12757@kroah.com> <52zmpxmhm0.fsf@cisco.com> Message-ID: <20050929161236.GB19770@kroah.com> On Wed, Sep 28, 2005 at 06:44:55AM -0700, Roland Dreier wrote: > Greg> I didn't think that git pulls were going to be allowed from > Greg> subsystem maintainers after -rc1 came out. After that, > Greg> patches by email were required to be sent, not git pulls.
> Greg> This does cause a bit more work for the maintainer, but it > Greg> ensures that they only send the patches they really want to > Greg> get in. > > I specifically asked Linus about this a couple of weeks ago, and he > said that bug-fix-only git merges are fine. See http://lkml.org/lkml/2005/9/13/277 Ah, thanks for pointing me to that, I missed that. greg k-h From robert.j.woodruff at intel.com Thu Sep 29 09:30:46 2005 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Thu, 29 Sep 2005 09:30:46 -0700 Subject: [openib-general] 2.6.9 backport Message-ID: <1AC79F16F5C5284499BB9591B33D6F0005AB60B3@orsmsx408> Thomas wrote, >Hi, >while building a 2.6.9-11.EL kernel with the patches from >gen2/branches/backport-to-2.6.9 applied, I found a minor glitch: >CONFIG_INFINIBAND_IPOIB_DEBUG should not be there. The debug part cannot be >build, because debugfs is missing. It seems to be introduced in 2.6.11. >- Thomas Ok, thanks. I'll take a look at it on the next set of backport patches. woody From halr at voltaire.com Thu Sep 29 09:26:07 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 12:26:07 -0400 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <1128011095.4381.358.camel@hal.voltaire.com> References: <1128002363.4381.8.camel@hal.voltaire.com> <1128006876.4381.164.camel@hal.voltaire.com> <433C10D8.2090301@ichips.intel.com> Message-ID: <1128011095.4381.358.camel@hal.voltaire.com> On Thu, 2005-09-29 at 12:05, Sean Hefty wrote: > > What happens if the destination IP address is a local one ? I think > > there is some missing code here. > > I think there's code in at.c to handle that case that could be re-used. Yes. This is the code related to ip_dev_find which has been discussed on the list. > > Also, shouldn't non subnet local destination IP addresses be handled ? > > How does that map to the IB subnet? or IP subnet in the case of iWARP, right ? It's still an outgoing interface just more than 1 IP hop away.
> Would it require global routing, Yes. > or are > non-subnet local addresses a valid configuration on a local IB subnet? You need to end up ARPing for the next hop router. -- Hal From vuhuong at mellanox.com Thu Sep 29 09:36:49 2005 From: vuhuong at mellanox.com (Vu Pham) Date: Thu, 29 Sep 2005 09:36:49 -0700 Subject: [openib-general] Re: [PATCH] SRP: don't use TX IU after freeing it In-Reply-To: <52vf0kii49.fsf@cisco.com> References: <52vf0kii49.fsf@cisco.com> Message-ID: <433C1821.6000809@mellanox.com> Roland Dreier wrote: > Vu, you pointed out that the current SRP code might look at an IU that > it sent after that IU has been reused for a different command. I > realized that a simple fix for this is just to keep the DMA address > (the only thing we look at in the IU) in the request structure. > Just thinking about it and you beat me with the patch > To add FMR support, we can just put all the FMR stuff in the request > structure instead of the IU structure. This saves bloating the IUs we > use for receives and task management, so it seems like a win anyway. > Since all the tuned parameter are target-centralized (passing in when add new target) I think about moving FMR resources (size, max_page...) ie. fmr_pool into srp_target_port struct. Each newly added target will have their own customized FMR pool. Have you reviewed the FMR? What your take on Christoph's point about the high bit of dma_address_ts are used by some platforms IOMMU - I think that it's OK since FMR code only touch the lower bit of dma_address_ts > Does this patch seem OK and work for you? It works for me in my setup. 
> Yes the patch without FMR integration works for me - I'll integrate FMR in, test and let you know later vu From mshefty at ichips.intel.com Thu Sep 29 09:40:10 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 09:40:10 -0700 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <1128011095.4381.358.camel@hal.voltaire.com> References: <1128002363.4381.8.camel@hal.voltaire.com> <1128006876.4381.164.camel@hal.voltaire.com> <433C10D8.2090301@ichips.intel.com> <1128011095.4381.358.camel@hal.voltaire.com> Message-ID: <433C18EA.6020906@ichips.intel.com> Hal Rosenstock wrote: >>How does that map to the IB subnet? > > or IP subnet in the case of iWARP, right ? It's still an outgoing > interface just more than 1 IP hop away. The intent of the module is only to deal with IB. Although, it seems generic enough that it could return hardware addresses for anything. I just don't know if there's a need for this functionality outside of IB. >> Would it require global routing, > > Yes. If it requires global routing of IB, then I think that we should defer it until global routing is available. At least this was my original thinking. - Sean From rolandd at cisco.com Thu Sep 29 09:42:27 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 29 Sep 2005 09:42:27 -0700 Subject: [openib-general] Re: [PATCH] SRP: don't use TX IU after freeing it In-Reply-To: <433C1821.6000809@mellanox.com> (Vu Pham's message of "Thu, 29 Sep 2005 09:36:49 -0700") References: <52vf0kii49.fsf@cisco.com> <433C1821.6000809@mellanox.com> Message-ID: <52zmpvhll8.fsf@cisco.com> Vu> Since all the tuned parameter are target-centralized (passing Vu> in when add new target) I think about moving FMR resources Vu> (size, max_page...) ie. fmr_pool into srp_target_port Vu> struct. Each newly added target will have their own customized Vu> FMR pool. That makes some sense. 
An issue is that FMRs are a fairly limited resource, and a system with many SRP targets where each target doesn't get much traffic could tie up a lot of FMRs. Vu> Have you reviewed the FMR? What your take on Christoph's point Vu> about the high bit of dma_address_ts are used by some Vu> platforms IOMMU - I think that it's OK since FMR code only Vu> touch the lower bit of dma_address_ts I'm just getting back to looking at SRP. But I think that it's fine to manipulate dma_addr_t the way that your code does -- they are bus addresses. - R. From halr at voltaire.com Thu Sep 29 09:45:38 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 12:45:38 -0400 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <433C18EA.6020906@ichips.intel.com> References: <1128002363.4381.8.camel@hal.voltaire.com> <1128006876.4381.164.camel@hal.voltaire.com> <433C10D8.2090301@ichips.intel.com> <1128011095.4381.358.camel@hal.voltaire.com> <433C18EA.6020906@ichips.intel.com> Message-ID: <1128012338.4381.410.camel@hal.voltaire.com> On Thu, 2005-09-29 at 12:40, Sean Hefty wrote: > Hal Rosenstock wrote: > >>How does that map to the IB subnet? > > > > or IP subnet in the case of iWARP, right ? It's still an outgoing > > interface just more than 1 IP hop away. > > The intent of the module is only to deal with IB. Although, it seems generic > enough that it could return hardware addresses for anything. I just don't know > if there's a need for this functionality outside of IB. > > >> Would it require global routing, > > > > Yes. > > If it requires global routing of IB, then I think that we should defer it until > global routing is available. At least this was my original thinking. I was referring to IP not IB routing. 
-- Hal From halr at voltaire.com Thu Sep 29 09:49:09 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 12:49:09 -0400 Subject: [openib-general] Same IP address on multiple net devices and routing In-Reply-To: <52psqrj2o4.fsf@cisco.com> References: <1128003252.4381.19.camel@hal.voltaire.com> <52psqrj2o4.fsf@cisco.com> Message-ID: <1128012477.4381.418.camel@hal.voltaire.com> On Thu, 2005-09-29 at 11:48, Roland Dreier wrote: > Hal> Hi, Does anyone know what the Linux routing code does in > Hal> terms of selecting an outgoing interface when multiple net > Hal> devices have been configured with the same IP address ? > > It follows the routing table. If you assign the same local IP address > to multiple interfaces then you have to be careful to set up routing > so that your traffic goes to the right place. > > In fact you can think of Linux networking as attaching IP addresses to > the system as a whole instead of a particular interface, because for > example by default Linux will respond to ARP queries on any > interface. Assigning addresses to a particular interface is just a > hint for how the kernel sets up routes by default. So the routing lookup (ip_route_output_key) is just fine then for selecting the outgoing interface. Just sometimes if the default is all that is available, some outgoing interfaces may be less used than others. 
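Roland's point is that the kernel chooses the outgoing interface from the routing table, not from which device holds a given address. Hal's earlier remark applies to off-subnet destinations: there you end up ARPing for the next-hop router. Both can be shown with a toy longest-prefix-match lookup. This is only a sketch. The real kernel lookup behind `ip_route_output_key` also weighs metrics, policy rules, scope and source address, and none of that is modeled here. The table layout and the `route_lookup` name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order from dotted-quad parts. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Toy routing entry; gateway == 0 means the destination is on-link. */
struct route {
    uint32_t    dest;        /* network address, host byte order */
    int         prefix_len;  /* 0..32 */
    uint32_t    gateway;
    const char *dev;         /* outgoing interface name */
};

static uint32_t prefix_mask(int len)
{
    return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);  /* avoid shifting by 32 */
}

/*
 * Longest-prefix match: pick the most specific route covering daddr and
 * store in *next_hop the address that would be ARPed for (the gateway for
 * off-link destinations, daddr itself for on-link ones). Ties go to the
 * earlier entry. Returns NULL if no route matches.
 */
static const struct route *route_lookup(const struct route *table, size_t n,
                                        uint32_t daddr, uint32_t *next_hop)
{
    const struct route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(table[i].prefix_len);

        if ((daddr & mask) == (table[i].dest & mask) &&
            (best == NULL || table[i].prefix_len > best->prefix_len))
            best = &table[i];
    }
    if (best != NULL && next_hop != NULL)
        *next_hop = best->gateway ? best->gateway : daddr;
    return best;
}
```

Two devices configured with the same local address but reached through different table entries therefore carry different traffic. If only a default route exists, every off-subnet destination goes out the single device that route names.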
-- Hal From mshefty at ichips.intel.com Thu Sep 29 09:57:40 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 09:57:40 -0700 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <1128012338.4381.410.camel@hal.voltaire.com> References: <1128002363.4381.8.camel@hal.voltaire.com> <1128006876.4381.164.camel@hal.voltaire.com> <433C10D8.2090301@ichips.intel.com> <1128011095.4381.358.camel@hal.voltaire.com> <433C18EA.6020906@ichips.intel.com> <1128012338.4381.410.camel@hal.voltaire.com> Message-ID: <433C1D04.2000404@ichips.intel.com> Hal Rosenstock wrote: >>>> Would it require global routing, >>> >>>Yes. >> >>If it requires global routing of IB, then I think that we should defer it until >>global routing is available. At least this was my original thinking. > > > I was referring to IP not IB routing. If we restrict IB to a single subnet, do we need to worry about IP routing? My assumption was no. Is this an invalid assumption? - Sean From twbowman at gmail.com Thu Sep 29 10:13:15 2005 From: twbowman at gmail.com (Todd Bowman) Date: Thu, 29 Sep 2005 11:13:15 -0600 Subject: [openib-general] ib_cm_listen failure Message-ID: I am running udapl on 32bit intel and running into this error: setup_listener(conn=0x8060008 cm_id=134611368) destroy_cm_id: conn 0x8060008 id 134611368 --> dapl_psp_create setup_conn_listener failed: 30000 20664 Error dat_psp_create: DAT_INSUFFICIENT_RESOURCES 20664 Error connect_ep: DAT_INSUFFICIENT_RESOURCES I've tracked the error to ib_cm_listen: result = write(cm_id->device->fd, msg, size); if (result != size) return (result > 0) ?
-ENODATA : result; result = -1 size = 28 device->fd = 4 These are the modules I have loaded: ib_sdp 93792 0 ib_ipoib 45572 0 ib_uat 15884 0 ib_at 29248 1 ib_uat ib_sa 16916 3 ib_sdp,ib_ipoib,ib_at ib_ucm 21764 0 ib_cm 39628 2 ib_sdp,ib_ucm ib_uverbs 33936 0 ib_umad 18712 0 ib_mthca 118300 0 ib_mad 43424 5 ib_ping,ib_sa,ib_cm,ib_umad,ib_mthca ib_core 48128 10 ib_ping,ib_sdp,ib_ipoib,ib_sa,ib_ucm,ib_cm,ib_uverbs,ib_umad,ib_mthca,ib_mad This is /dev/infiniband: crw-rw-rw- 1 root root 231, 191 Sep 29 08:10 uat crw-rw-rw- 1 root root 231, 224 Sep 29 08:10 ucm0 crw-rw-rw- 1 root root 231, 0 Sep 29 08:10 umad0 crw-rw-rw- 1 root root 231, 1 Sep 29 08:10 umad1 crw-rw-rw- 1 root root 231, 192 Sep 29 08:09 uverbs0 crw-rw-rw- 1 root root 231, 193 Sep 29 08:09 uverbs1 I have run ulimit -l unlimited I'm at a loss here. Can someone point me in the right direction. Thanks, Todd From mshefty at ichips.intel.com Thu Sep 29 10:17:16 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 10:17:16 -0700 Subject: [openib-general] Re: [RFC] IB address translation using ARP In-Reply-To: <20050929071929.GJ8114@mellanox.co.il> References: <20050929071929.GJ8114@mellanox.co.il> Message-ID: <433C219C.3090306@ichips.intel.com> Michael S. Tsirkin wrote: > I'd suggest you also take a look at sdp_link.c - I hear that's where > ib_at code came from, and I think it does some things in a better way - > such as only keeping device reference around for a short while only. Thanks for pointing this out. I wasn't aware that SDP did a lot of the same things. The code in ib_at was copied from sdp_link in several places. We should make sure that a final solution works well for SDP, as well as the CMA.
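In Todd's ib_cm_listen report, write() returned -1. With the quoted `return (result > 0) ? -ENODATA : result;`, a failed write is passed up as a bare -1, and the actual cause, which write() stores in errno, is lost. On Linux EPERM is 1, so a caller that decodes the return as a negative errno would read every failure as -EPERM. The sketch below keeps the real error. `write_cmd` is a hypothetical helper written for illustration, not the libibcm source.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the libibcm source): issue one command write and
 * return 0 on success or a negative errno. On failure write() returns -1
 * and sets errno, so returning `result` unchanged would discard the cause.
 */
static int write_cmd(int fd, const void *msg, size_t size)
{
    ssize_t result = write(fd, msg, size);

    if (result < 0)
        return -errno;           /* preserve the kernel's actual error code */
    if ((size_t)result != size)
        return -ENODATA;         /* short write, as in the quoted snippet */
    return 0;
}
```

When debugging a failure like this one, printing `strerror(errno)` right after the failing write() shows why the kernel rejected the request.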
- Sean From jlentini at netapp.com Thu Sep 29 10:46:27 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 29 Sep 2005 13:46:27 -0400 (EDT) Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <433AC861.4060809@ichips.intel.com> References: <1127908877.4384.22.camel@hal.voltaire.com> <20050928124502.GE8114@mellanox.co.il> <433A999F.9020000@voltaire.com> <52mzlxmbpv.fsf@cisco.com> <433AC861.4060809@ichips.intel.com> Message-ID: On Wed, 28 Sep 2005, Sean Hefty wrote: > Roland Dreier wrote: > > Guy> But the main question is: does openib wants to support ATS > > Guy> arp ? do we also want to support ATS > > Guy> registration/deregistration ? openib can support, for > > Guy> example, only the ATS arp and rely on the openib-less targets > > Guy> to do their own registrations. > > > > I would certainly prefer to forget all about ATS in every form. > > That is my preference as well. I believe that we want to support address > translation using ARP, but not ATS as defined by SA service registration and > queries. I think creating a new address resolution module to address the needs of the CMA API is fine. However, ATS is needed for interoperability with existing kDAPL and uDAPL installations. From arlin.r.davis at intel.com Thu Sep 29 10:46:59 2005 From: arlin.r.davis at intel.com (Arlin Davis) Date: Thu, 29 Sep 2005 10:46:59 -0700 Subject: [openib-general] RE: [PATCH] uDAPL build fix for OS vendor variations of IA64_FETCHADD In-Reply-To: Message-ID: > >Committed in revision 3606 except for this: > >> Index: dapl/udapl/Makefile >> =================================================================== >> --- dapl/udapl/Makefile (revision 3565) >> +++ dapl/udapl/Makefile (working copy) >> @@ -57,6 +57,13 @@ >> endif >> >> # >> +# Set up the default OS Vendor >> +# >> +ifndef OS_VENDOR >> +OS_VENDOR = REDHAT_EL4 >> +endif > >I wasn't comfortable changing the compilation behavior to default to >REDHAT_EL4. 
I did this instead: > >Index: dapl/udapl/Makefile >=================================================================== >--- dapl/udapl/Makefile (revision 3601) >+++ dapl/udapl/Makefile (working copy) >@@ -57,6 +57,13 @@ > endif > > # >+# Set an OS Vendor >+# >+# OS_VENDOR = REDHAT_EL4 >+# OS_VENDOR = SuSE >+# > >which I'm not totally happy with, but at least compilation will remain >the same unless the user makes a change. > >Is it time to move to using autogen and configure? Yes. I am not an autogen/configure expert, so I will defer. In the meantime, we need to change things around so that the latest changes build when OS_VENDOR is not set. The following will build: Index: dapl/dapl/udapl/Makefile =================================================================== --- dapl/dapl/udapl/Makefile (revision 3617) +++ dapl/dapl/udapl/Makefile (working copy) @@ -74,7 +74,11 @@ # CFLAGS Compile time flags for build # -CFLAGS = -O2 $(CPPFLAGS) -D$(OS_VENDOR) -DOS_VERSION=$(OSRELEASE) -DDAPL_DBG +CFLAGS = -O2 $(CPPFLAGS) -DOS_VERSION=$(OSRELEASE) -DDAPL_DBG + +ifdef OS_VENDOR +CFLAGS += -D$(OS_VENDOR) +endif # # dummy provider From mshefty at ichips.intel.com Thu Sep 29 10:49:28 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 10:49:28 -0700 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: References: <1127908877.4384.22.camel@hal.voltaire.com> <20050928124502.GE8114@mellanox.co.il> <433A999F.9020000@voltaire.com> <52mzlxmbpv.fsf@cisco.com> <433AC861.4060809@ichips.intel.com> Message-ID: <433C2928.8050906@ichips.intel.com> James Lentini wrote: > of the CMA API is fine. > > However, ATS is needed for interoperability with existing kDAPL and > uDAPL installations. Note that we lose interoperability between the CMA and kDAPL.
- Sean From mshefty at ichips.intel.com Thu Sep 29 10:52:23 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 10:52:23 -0700 Subject: [openib-general] ib_cm_listen failure In-Reply-To: References: Message-ID: <433C29D7.3020301@ichips.intel.com> Todd Bowman wrote: > I've tracked the error to ib_cm_listen: ... > These are the modules I have loaded: > > ib_sdp 93792 0 Can you unload SDP and see if you see the same issue? (SDP reserves a range of listen addresses that can cause other listen calls to fail. Not sure it's the issue, but it's an easy test.) - Sean From ardavis at ichips.intel.com Thu Sep 29 10:56:47 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Thu, 29 Sep 2005 10:56:47 -0700 Subject: [openib-general] ib_cm_listen failure In-Reply-To: References: Message-ID: <433C2ADF.4010402@ichips.intel.com> Todd Bowman wrote: > I am runing udapl on 32bit intel and running into this error: > > setup_listener(conn=0x8060008 cm_id=134611368) > destroy_cm_id: conn 0x8060008 id 134611368 > --> dapl_psp_create setup_conn_listener failed: 30000 > 20664 Error dat_psp_create: DAT_INSUFFICIENT_RESOURCES > 20664 Error connect_ep: DAT_INSUFFICIENT_RESOURCES What SID are you listening on? sdp is listening on a range from 0x10000 - 0x1fffff so you may be colliding with their SID/port space. > > I've tracked the error to ib_cm_listen: > > result = write(cm_id->device->fd, msg, size); > if (result != size) > return (result > 0) ? 
-ENODATA : result; > > result = -1 > size = 28 > device->fd = 4 > > > These are the modules I have loaded: > > ib_sdp 93792 0 > ib_ipoib 45572 0 > ib_uat 15884 0 > ib_at 29248 1 ib_uat > ib_sa 16916 3 ib_sdp,ib_ipoib,ib_at > ib_ucm 21764 0 > ib_cm 39628 2 ib_sdp,ib_ucm > ib_uverbs 33936 0 > ib_umad 18712 0 > ib_mthca 118300 0 > ib_mad 43424 5 ib_ping,ib_sa,ib_cm,ib_umad,ib_mthca > ib_core 48128 10 > ib_ping,ib_sdp,ib_ipoib,ib_sa,ib_ucm,ib_cm,ib_uverbs,ib_umad,ib_mthca,ib_mad > > > This is /dev/infiniband: > crw-rw-rw- 1 root root 231, 191 Sep 29 08:10 uat > crw-rw-rw- 1 root root 231, 224 Sep 29 08:10 ucm0 > crw-rw-rw- 1 root root 231, 0 Sep 29 08:10 umad0 > crw-rw-rw- 1 root root 231, 1 Sep 29 08:10 umad1 > crw-rw-rw- 1 root root 231, 192 Sep 29 08:09 uverbs0 > crw-rw-rw- 1 root root 231, 193 Sep 29 08:09 uverbs1 > > I have run ulimit -l unlimited > > I'm at a loss here. Can someone point me in the rigt direction. > > Thanks, > Todd > >------------------------------------------------------------------------ > >_______________________________________________ >openib-general mailing list >openib-general at openib.org >http://openib.org/mailman/listinfo/openib-general > >To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From jlentini at netapp.com Thu Sep 29 10:59:14 2005 From: jlentini at netapp.com (James Lentini) Date: Thu, 29 Sep 2005 13:59:14 -0400 (EDT) Subject: [openib-general] RE: [PATCH] uDAPL build fix for OS vendor variations of IA64_FETCHADD In-Reply-To: References: Message-ID: On Thu, 29 Sep 2005, Arlin Davis wrote: > Yes. I am not an autogen/configure expert so I will defer. > > In the meantime we need to change things around to build with the latest > changes with no OS_VENDER. Committed in revision 3619. 
From moschny at ipd.uni-karlsruhe.de Thu Sep 29 11:00:52 2005 From: moschny at ipd.uni-karlsruhe.de (Thomas Moschny) Date: Thu, 29 Sep 2005 20:00:52 +0200 Subject: [openib-general] IPoIB configuration Message-ID: <200509292000.52337.moschny@ipd.uni-karlsruhe.de> Hi, Do I have to do anything special to configure IP over IB besides loading the ib_ipoib kernel module (and its dependencies) and calling ifconfig ib0 up? On our machines, the modules load fine, opensm runs, the ports are in the active state, and ifconfig reports no errors. However, there seems to be no communication over ib0. Pinging other machines does not work, for example. The system log shows "ib0: no IPv6 routers present" (harmless?) and "divert: not allocating divert_blk for non-ethernet device ib0". Am I missing anything? We are using RHEL4 on IA64, kernel 2.6.9-11.EL with the backport-to-2.6.9 patches v3513 applied. - Thomas From shubbell at dbresearch.net Thu Sep 29 11:03:04 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Thu, 29 Sep 2005 13:03:04 -0500 Subject: [Fwd: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features] In-Reply-To: <1127844895.4403.3.camel@hal.voltaire.com> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> Message-ID: <433C2C58.9050201@dbresearch.net> Hal, We are having problems loading the mthca module at boot, running a Linux 2.6.13 kernel with the svn repository pulled yesterday afternoon.
Once we boot, we get past this, but we have problems when attempting to run ibping. The log files are 1.4M. Would you like me to send them to you directly? Sean H. From twbowman at gmail.com Thu Sep 29 11:06:04 2005 From: twbowman at gmail.com (Todd Bowman) Date: Thu, 29 Sep 2005 12:06:04 -0600 Subject: [openib-general] ib_cm_listen failure In-Reply-To: <433C29D7.3020301@ichips.intel.com> References: <433C29D7.3020301@ichips.intel.com> Message-ID: This worked. On 9/29/05, Sean Hefty wrote: > > Todd Bowman wrote: > > I've tracked the error to ib_cm_listen: > ... > > These are the modules I have loaded: > > > > ib_sdp 93792 0 > > Can you unload SDP and see if you see the same issue? (SDP reserves a > range of > listen addresses that can cause other listen calls to fail. Not sure it's > the > issue, but it's an easy test.) > > - Sean From halr at voltaire.com Thu Sep 29 11:16:17 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 14:16:17 -0400 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <433C1D04.2000404@ichips.intel.com> References: <1128002363.4381.8.camel@hal.voltaire.com> <1128006876.4381.164.camel@hal.voltaire.com> <433C10D8.2090301@ichips.intel.com> <1128011095.4381.358.camel@hal.voltaire.com> <433C18EA.6020906@ichips.intel.com> <1128012338.4381.410.camel@hal.voltaire.com> <433C1D04.2000404@ichips.intel.com> Message-ID: <1128017777.4398.3.camel@hal.voltaire.com> On Thu, 2005-09-29 at 12:57, Sean Hefty wrote: > Hal Rosenstock wrote: > >>>> Would it require global routing, > >>> > >>>Yes. > >> > >>If it requires global routing of IB, then I think that we should defer it until > >>global routing is available. At least this was my original thinking. > > > > > > I was referring to IP not IB routing. > > If we restrict IB to a single subnet, do we need to worry about IP routing? My > assumption was no. Is this an invalid assumption?
I think so. There is nothing that precludes having multiple IPoIB subnets on the same IB subnet. -- Hal From halr at voltaire.com Thu Sep 29 11:29:00 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 14:29:00 -0400 Subject: [Fwd: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features] In-Reply-To: <433C2C58.9050201@dbresearch.net> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> Message-ID: <1128018539.4398.48.camel@hal.voltaire.com> Hi Sean, On Thu, 2005-09-29 at 14:03, Sean Hubbell wrote: > We are having problems with loading the mthca module running Linux > 2.6.13 Kernel with the svn repository pulled yesterday afternoon when we > are booting. What problem is occurring with loading mthca during boot ? > Once we boot, we get passed this but we have problems when > attempting to run ibping. ibping or ping ? If ibping, what is the server ? What is the ibping invocation ? Is this before or after an OpenSM is running in the subnet ? > The log files are 1.4M. OpenSM log or something else ? > Would you like me to send them to you directly? Can you gzip or bzip it down ?
-- Hal From halr at voltaire.com Thu Sep 29 11:32:14 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 14:32:14 -0400 Subject: [openib-general] IPoIB configuration In-Reply-To: <200509292000.52337.moschny@ipd.uni-karlsruhe.de> References: <200509292000.52337.moschny@ipd.uni-karlsruhe.de> Message-ID: <1128018730.4398.59.camel@hal.voltaire.com> On Thu, 2005-09-29 at 14:00, Thomas Moschny wrote: > Hi, > > Do I have to do something special in order to configure IPoverIB besides > from loading the ib_ipoib kernel module (and it's dependencies), and calling > ifconfig ib0 up? No, that should be sufficient. > On our machines, the modules load fine, opensm runs, ports are in active > state, no error messages from ifconfig. However, there seems not to be any > communication over ib0. Ping'ing other machines does not work, for example. Can you ping the subnet broadcast address (e.g. ping -b 192.168.0.255 if the ib0 is 192.168.0.x) ? Do /sys/class/net/ib0/statistics/rx_packets and/or "tcpdump -i ib0" show anything on the other nodes when you try to ping or something ? Are there any messages in /var/log/messages pertaining to ib_ ? Also, are there any errors in the OpenSM log ? Can you look there ? Perhaps rerun OpenSM with -V and send the log. Please consult http://www.openib.org/docs/ipoib_faq.txt for more info. -- Hal > The system log shows "ib0: no IPv6 routers present" (harmless?) and "divert: > not allocating divert_blk for non-ethernet device ib0". > > Am I missing anything? > > We are using RHEL4 on IA64, kernel 2.6.9-11.EL with the backport-to-2.6.9 > patches v3513 applied. 
From mshefty at ichips.intel.com Thu Sep 29 11:38:30 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 11:38:30 -0700 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <1128017777.4398.3.camel@hal.voltaire.com> References: <1128002363.4381.8.camel@hal.voltaire.com> <1128006876.4381.164.camel@hal.voltaire.com> <433C10D8.2090301@ichips.intel.com> <1128011095.4381.358.camel@hal.voltaire.com> <433C18EA.6020906@ichips.intel.com> <1128012338.4381.410.camel@hal.voltaire.com> <433C1D04.2000404@ichips.intel.com> <1128017777.4398.3.camel@hal.voltaire.com> Message-ID: <433C34A6.5050801@ichips.intel.com> Hal Rosenstock wrote: >>If we restrict IB to a single subnet, do we need to worry about IP routing? My >>assumption was no. Is this an invalid assumption? > > I think so. There is nothing that precludes having multiple IPoIB > subnets on the same IB subnet. This seems similar to having multiple IP subnets on the same Ethernet subnet. I'm struggling with understanding how translation can even occur in this case. What DGID is used when querying for the path record, and how is it obtained? If this is a valid configuration, then it seems that we're still without a solution. What does SDP do in this case? 
- Sean From halr at voltaire.com Thu Sep 29 11:44:18 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 14:44:18 -0400 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <433C34A6.5050801@ichips.intel.com> References: <1128002363.4381.8.camel@hal.voltaire.com> <1128006876.4381.164.camel@hal.voltaire.com> <433C10D8.2090301@ichips.intel.com> <1128011095.4381.358.camel@hal.voltaire.com> <433C18EA.6020906@ichips.intel.com> <1128012338.4381.410.camel@hal.voltaire.com> <433C1D04.2000404@ichips.intel.com> <1128017777.4398.3.camel@hal.voltaire.com> <433C34A6.5050801@ichips.intel.com> Message-ID: <1128019458.4398.95.camel@hal.voltaire.com> On Thu, 2005-09-29 at 14:38, Sean Hefty wrote: > Hal Rosenstock wrote: > >>If we restrict IB to a single subnet, do we need to worry about IP routing? My > >>assumption was no. Is this an invalid assumption? > > > > I think so. There is nothing that precludes having multiple IPoIB > > subnets on the same IB subnet. > > This seems similar to having multiple IP subnets on the same Ethernet subnet. > > I'm struggling with understanding how translation can even occur in this case. > What DGID is used when querying for the path record, and how is it obtained? Isn't it the DGID of the next hop IP router ? (I suppose in the case of multiple IPoIB subnets on the same IB subnet, it could shortcut somehow like NHRP does in terms of ATM v. CLIP (Classic IP over ATM). > If this is a valid configuration, then it seems that we're still > without a solution. I'm not following you on this. > What does SDP do in this case? Same as AT. It does the route lookup and ARPs for and then asks for the PathRecord of the next hop IP router. 
-- Hal From mshefty at ichips.intel.com Thu Sep 29 11:58:16 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 11:58:16 -0700 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <1128019458.4398.95.camel@hal.voltaire.com> References: <1128002363.4381.8.camel@hal.voltaire.com> <1128006876.4381.164.camel@hal.voltaire.com> <433C10D8.2090301@ichips.intel.com> <1128011095.4381.358.camel@hal.voltaire.com> <433C18EA.6020906@ichips.intel.com> <1128012338.4381.410.camel@hal.voltaire.com> <433C1D04.2000404@ichips.intel.com> <1128017777.4398.3.camel@hal.voltaire.com> <433C34A6.5050801@ichips.intel.com> <1128019458.4398.95.camel@hal.voltaire.com> Message-ID: <433C3948.2010205@ichips.intel.com> Hal Rosenstock wrote: >>I'm struggling with understanding how translation can even occur in this case. >>What DGID is used when querying for the path record, and how is it obtained? > > Isn't it the DGID of the next hop IP router ? (I suppose in the case of > multiple IPoIB subnets on the same IB subnet, it could shortcut somehow > like NHRP does in terms of ATM v. CLIP (Classic IP over ATM). How is the DGID of the next hop IP router used when connecting? As an aside, do the IPoIB subnets all fall into the same broadcast domain? >>What does SDP do in this case? > > Same as AT. It does the route lookup and ARPs for and then asks for the > PathRecord of the next hop IP router. I guess I'm confused here. This gives a path record between the host system and the IP router. How is that used to establish a connection to the actual destination? What values (DLID, DGID, pkey, etc.) go in the CM REQ message, and how are those values obtained? 
- Sean From pradeep at us.ibm.com Thu Sep 29 12:06:57 2005 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Thu, 29 Sep 2005 12:06:57 -0700 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: <433C11A6.1060002@ichips.intel.com> Message-ID: I am not very clear about this: what happens when the ARP cache is empty on the active side and the IPoIB module has also been unloaded? Is the IP translation information stored somewhere by the CMA, permitting the application to continue? Pradeep pradeep at us.ibm.com openib-general-bounces at openib.org wrote on 09/29/2005 09:09:10 AM: > Michael S. Tsirkin wrote: > > BTW, one idea of mine that some people apparently were interested in, > > is adding an option to do ARP resolution on the passive side after REQ is > > received, as a weak security measure. > > > > Is there still interest in that? > > I don't think that there's anything to restrict this. The ULP (CMA in this > case?) could do this using the address translation API before > processing a REQ. > Assuming that address translation were used on the remote side > before sending > the REQ, the IP addresses are likely to be in the ARP cache already. > > - Sean
From moschny at ipd.uni-karlsruhe.de Thu Sep 29 12:11:14 2005 From: moschny at ipd.uni-karlsruhe.de (Thomas Moschny) Date: Thu, 29 Sep 2005 21:11:14 +0200 Subject: [openib-general] IPoIB configuration In-Reply-To: <1128018730.4398.59.camel@hal.voltaire.com> References: <200509292000.52337.moschny@ipd.uni-karlsruhe.de> <1128018730.4398.59.camel@hal.voltaire.com> Message-ID: <200509292111.22577.moschny@ipd.uni-karlsruhe.de> On Thursday 29 September 2005 20:32, you wrote: > Can you ping the subnet broadcast address (e.g. ping -b 192.168.0.255 if > the ib0 is 192.168.0.x) ? The only answer I get is from the sender itself: $ ping -b 192.168.204.255 WARNING: pinging broadcast address PING 192.168.204.255 (192.168.204.255) 56(84) bytes of data. 64 bytes from 192.168.204.1: icmp_seq=0 ttl=64 time=0.029 ms 64 bytes from 192.168.204.1: icmp_seq=1 ttl=64 time=0.012 ms 64 bytes from 192.168.204.1: icmp_seq=2 ttl=64 time=0.012 ms --- 192.168.204.255 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2000ms rtt min/avg/max/mdev = 0.012/0.017/0.029/0.009 ms, pipe 2 > Do /sys/class/net/ib0/statistics/rx_packets and/or "tcpdump -i ib0" show > anything on the other nodes when you try to ping or something ? No, rx_packets contains '0' on all nodes. And tcpdump doesn't work: $ tcpdump -i ib0 tcpdump: ioctl: Value too large for defined data type > Are there any messages in /var/log/messages pertaining to ib_ ? Only those mentioned earlier: - divert: not allocating divert_blk for non-ethernet device ib0 [and ib1] - ib0: no IPv6 routers present > Also, are there any errors in the OpenSM log ? Can you look there ? > Perhaps rerun OpenSM with -V and send the log. OK, I did. The log is rather big, and there are some warnings, but no errors. What should I look for? Anyway, I have attached a compressed version to this mail. > Please consult http://www.openib.org/docs/ipoib_faq.txt for more info. I already did, but I still have no clue what the problem is.
The only 'anomaly' I noticed was that ifconfig doesn't show the hardware address *at all*, not even the higher bytes: $ ifconfig ib0 ib0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 inet addr:192.168.204.2 Bcast:192.168.204.255 Mask:255.255.255.0 inet6 addr: fe80::202:c902:0:1575/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:6 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:0 (0.0 b) TX bytes:360 (360.0 b) (but ip does, so I didn't worry: $ ip ib0 ip addr show dev ib0 5: ib0: mtu 2044 qdisc pfifo_fast qlen 128 link/[32] 00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:02:00:00:15:75 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff inet 192.168.204.2/24 brd 192.168.204.255 scope global ib0 inet6 fe80::202:c902:0:1575/64 scope link valid_lft forever preferred_lft forever) More seriously, the ARP table is empty: $ ip neigh show dev ib0 doesn't show anything. Thanks, Thomas -------------- next part -------------- A non-text attachment was scrubbed... Name: osm.log.bz2 Type: application/x-bzip2 Size: 50143 bytes Desc: not available URL: From mshefty at ichips.intel.com Thu Sep 29 12:15:52 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 12:15:52 -0700 Subject: [openib-general][RFC]: CMA IB implementation In-Reply-To: References: Message-ID: <433C3D68.5040505@ichips.intel.com> Pradeep Satyanarayana wrote: > I am not very clear about this -what happens when the ARP cache is empty > on the active side and IPoIB module has also been unloaded? Is the IP > translation information stored somewhere by the CMA, permitting the > application to continue? If the route is not found in the cache, then an ARP will be sent.
(That code is missing, but can be pulled from SDP or AT.) If IPoIB has been unloaded, then there's no remote IP address, and the request will fail. I don't think that it's a good idea to remember IP addresses that are no longer valid. An application that has already established a connection is unaffected by unloading IPoIB or a change in its address. - Sean From halr at voltaire.com Thu Sep 29 12:25:43 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 15:25:43 -0400 Subject: [openib-general] IPoIB configuration In-Reply-To: <200509292111.22577.moschny@ipd.uni-karlsruhe.de> References: <200509292000.52337.moschny@ipd.uni-karlsruhe.de> <1128018730.4398.59.camel@hal.voltaire.com> <200509292111.22577.moschny@ipd.uni-karlsruhe.de> Message-ID: <1128021943.4398.237.camel@hal.voltaire.com> On Thu, 2005-09-29 at 15:11, Thomas Moschny wrote: > On Thursday 29 September 2005 20:32, you wrote: > > Can you ping the subnet broadcast address (e.g. ping -b 192.168.0.255 if > > the ib0 is 192.168.0.x) ? > > The only answer I get is from the sender itself: > > $ ping -b 192.168.204.255 > WARNING: pinging broadcast address > PING 192.168.204.255 (192.168.204.255) 56(84) bytes of data. > 64 bytes from 192.168.204.1: icmp_seq=0 ttl=64 time=0.029 ms > 64 bytes from 192.168.204.1: icmp_seq=1 ttl=64 time=0.012 ms > 64 bytes from 192.168.204.1: icmp_seq=2 ttl=64 time=0.012 ms > > --- 192.168.204.255 ping statistics --- > 3 packets transmitted, 3 received, 0% packet loss, time 2000ms > rtt min/avg/max/mdev = 0.012/0.017/0.029/0.009 ms, pipe 2 > > > Do /sys/class/net/ib0/statistics/rx_packets and/or "tcpdump -i ib0" show > > anything on the other nodes when you try to ping or something ? > > No, rx_packages contains '0' on all nodes. And tcpdump doesn't work: > > $ tcpdump -i ib0 > tcpdump: ioctl: Value too large for defined data type > > > Are there any messages in /var/log/messages pertaining to ib_ ? 
> > Only those mentioned earlier: > - divert: not allocating divert_blk for non-ethernet device ib0 [and ib1] > - ib0: no IPv6 routers present > > > Also, are there any errors in the OpenSM log ? Can you look there ? > > Perhaps rerun OpenSM with -V and send the log. > > Ok, I did. The log is rather big, and there are some warnings, but no errors. > For what should I look? Anyway, I attach a compressed version to this mail. > > > Please consult http://www.openib.org/docs/ipoib_faq.txt for more info. > > I already did, but nevertheless have currently no clue what's the problem. > > The only 'anomaly' I noticed was that ifconfig doesn't show the hardware > address *at all*, even not the higher bytes: > > $ ifconfig ib0 > ib0 Link encap:UNSPEC HWaddr > 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 > inet addr:192.168.204.2 Bcast:192.168.204.255 Mask:255.255.255.0 > inet6 addr: fe80::202:c902:0:1575/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 > RX packets:0 errors:0 dropped:0 overruns:0 frame:0 > TX packets:6 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:128 > RX bytes:0 (0.0 b) TX bytes:360 (360.0 b) > > (but ip does, so I didn't worry: > $ ip ib0 > ip addr show dev ib0 > 5: ib0: mtu 2044 qdisc pfifo_fast qlen 128 > link/[32] 00:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:02:00:00:15:75 brd > 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff > inet 192.168.204.2/24 brd 192.168.204.255 scope global ib0 > inet6 fe80::202:c902:0:1575/64 scope link > valid_lft forever preferred_lft forever) > > More seriously, the arp table is empty: > $ ip neigh show dev ib0 > doesn't show anything. because no other nodes are seen with ping. In the log, I do see several nodes successfully join the IPoIB broadcast group and the multicast tree for this got setup (I didn't actually validate the tree itself). 
PortGid.................0xfe80000000000000 : 0x0002c90200001575 PortGid.................0xfe80000000000000 : 0x0002c90200001581 What is your switch configuration ? It looks like a single switch. Is that right ? Bottom line, I still can't see anything wrong. You may need to turn on IPoIB multicast debug in the host stack to see what is not working correctly. --- Hal From rolandd at cisco.com Thu Sep 29 12:44:38 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 29 Sep 2005 12:44:38 -0700 Subject: [openib-general] InfiniPath driver announcement In-Reply-To: <1127937007.6858.7.camel@hematite.internal.keyresearch.com> (Robert Walsh's message of "Wed, 28 Sep 2005 12:50:07 -0700") References: <1127937007.6858.7.camel@hematite.internal.keyresearch.com> Message-ID: <523bnnhd5l.fsf@cisco.com> Just so we don't forget, we need to add something like this to the uverbs code on your branch. (I didn't even compile test but you get the idea...) --- src/linux-kernel/infiniband/core/uverbs_main.c (revision 3620) +++ src/linux-kernel/infiniband/core/uverbs_main.c (working copy) @@ -119,7 +119,13 @@ static int ib_dealloc_ucontext(struct ib down(&ib_uverbs_idr_mutex); - /* XXX Free AHs */ + list_for_each_entry_safe(uobj, tmp, &context->ah_list, list) { + struct ib_ah *ah = idr_find(&ib_uverbs_ah_idr, uobj->id); + idr_remove(&ib_uverbs_ah_idr, uobj->id); + ib_destroy_ah(ah); + list_del(&uobj->list); + kfree(uobj); + } list_for_each_entry_safe(uobj, tmp, &context->qp_list, list) { struct ib_qp *qp = idr_find(&ib_uverbs_qp_idr, uobj->id); From rjwalsh at pathscale.com Thu Sep 29 12:47:42 2005 From: rjwalsh at pathscale.com (Robert Walsh) Date: Thu, 29 Sep 2005 12:47:42 -0700 Subject: [openib-general] InfiniPath driver announcement In-Reply-To: <523bnnhd5l.fsf@cisco.com> References: <1127937007.6858.7.camel@hematite.internal.keyresearch.com> <523bnnhd5l.fsf@cisco.com> Message-ID: <1128023262.4211.2.camel@phosphene.durables.org> On Thu, 2005-09-29 at 12:44 -0700, Roland Dreier 
wrote: > Just so we don't forget, we need to add something like this to the > uverbs code on your branch. Yup. It's on my TODO list. I need to take care of some unrelated stuff today, but I'll get back to working on all the feedback I've received tonight or tomorrow. Regards, Robert. -- Robert Walsh Email: rjwalsh at pathscale.com PathScale, Inc. Phone: +1 650 934 8117 2071 Stierlin Court, Suite 200 Fax: +1 650 428 1969 Mountain View, CA 94043. From moschny at ipd.uni-karlsruhe.de Thu Sep 29 13:01:51 2005 From: moschny at ipd.uni-karlsruhe.de (Thomas Moschny) Date: Thu, 29 Sep 2005 22:01:51 +0200 Subject: [openib-general] IPoIB configuration In-Reply-To: <1128021943.4398.237.camel@hal.voltaire.com> References: <200509292000.52337.moschny@ipd.uni-karlsruhe.de> <200509292111.22577.moschny@ipd.uni-karlsruhe.de> <1128021943.4398.237.camel@hal.voltaire.com> Message-ID: <200509292201.51832.moschny@ipd.uni-karlsruhe.de> On Thursday 29 September 2005 21:25, Hal Rosenstock wrote: > In the log, I do see several nodes successfully join the IPoIB broadcast > group and the multicast tree for this got setup (I didn't actually > validate the tree itself). > > PortGid.................0xfe80000000000000 : 0x0002c90200001575 > PortGid.................0xfe80000000000000 : 0x0002c90200001581 > > What is your switch configuration ? It looks like a single switch. Is > that right ? Yes, it's a single MTS-2400 with 24 ports. Maybe a switch firmware problem? We once observed a complete switch lockup that shut down all communication. > Bottom line, I still can't see anything wrong. You may need to turn on > IPoIB multicast debug in the host stack to see what is not working > correctly. Which means building yet another kernel, because there is no debugfs in 2.6.9, so IPoIB debugging doesn't work there... - Thomas
From halr at voltaire.com Thu Sep 29 13:08:09 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 16:08:09 -0400 Subject: [openib-general] IPoIB configuration In-Reply-To: <200509292201.51832.moschny@ipd.uni-karlsruhe.de> References: <200509292000.52337.moschny@ipd.uni-karlsruhe.de> <200509292111.22577.moschny@ipd.uni-karlsruhe.de> <1128021943.4398.237.camel@hal.voltaire.com> <200509292201.51832.moschny@ipd.uni-karlsruhe.de> Message-ID: <1128024488.4398.414.camel@hal.voltaire.com> On Thu, 2005-09-29 at 16:01, Thomas Moschny wrote: > Maybe a switch firmware problem? We once observed a complete switch lockup > that shut down all communication. Could be. Do you know what rev of firmware you are running ? Is it 0.7.0 ? (MTS-2400 is Anafa-2 based). Also, what is your HCA firmware version ? -- Hal From shubbell at dbresearch.net Thu Sep 29 13:18:38 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Thu, 29 Sep 2005 15:18:38 -0500 Subject: [Fwd: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features] In-Reply-To: <1128018539.4398.48.camel@hal.voltaire.com> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> Message-ID: <433C4C1E.3070705@dbresearch.net> Hal Rosenstock wrote: >Hi Sean, > >On Thu, 2005-09-29 at 14:03, Sean Hubbell wrote: > > >> We are having problems with loading the mthca module running Linux >>2.6.13 Kernel with the svn repository pulled yesterday afternoon when we >>are booting.
>> >> > >What problem is occuring with loading mthca during boot ? > > The module does not load, and I do not get an error message. This occurs intermittently. > > >>Once we boot, we get passed this but we have problems when >>attempting to run ibping. >> >> > >ibping or ping ? If ibping, what is the server ? What is the ibping >invocation ? Is this before or after an OpenSM is running in the subnet >? > > > ibping 0x1 ibping 0xB Neither of these works. Of course, this is after I start opensm. >> The log files are 1.4M. >> >> > >OpenSM log or something else ? > > This includes osmtest, osm.log, and the results of ibnetdiscover. > > >> Would you like me to send them to you directly? >> >> > >Can you gzip or bzip it down ? > > > This is the size after tarring and gzipping. Would you like me to send them? Sean From robert.j.woodruff at intel.com Thu Sep 29 13:32:03 2005 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Thu, 29 Sep 2005 13:32:03 -0700 Subject: [openib-general] IPoIB configuration Message-ID: <1AC79F16F5C5284499BB9591B33D6F0005AEF3D9@orsmsx408> Thomas wrote, >Yes, it's a single MTS-2400 with 24 ports. >Maybe a switch firmware problem? We once observed a complete switch lockup >that shut down all communication. If you suspect a bad switch, do you have another one you could try? Alternatively, you can try directly connecting a couple of nodes. woody From robert.j.woodruff at intel.com Thu Sep 29 13:49:13 2005 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Thu, 29 Sep 2005 13:49:13 -0700 Subject: [openib-general] IPoIB configuration Message-ID: <1AC79F16F5C5284499BB9591B33D6F0005AEF450@orsmsx408> >Also, what is your HCA firmware version ? >-- Hal Good point. I have seen IPoIB connectivity issues in the past when dealing with down-rev FW. I just re-tested IPoIB on my IPF machines, and it seems to work OK for me. I suspect either the HCA FW rev or the switch.
[root at iclust-tiger1 SPECS]# cat /sys/class/infiniband/mthca0/fw_ver 3.3.2 uname -a Linux iclust-tiger1 2.6.9-11.OpenIB.3513.EL.root #1 SMP Mon Sep 26 14:29:22 PDT 2005 ia64 ia64 ia64 GNU/Linux [root at iclust-tiger1 ~]# ping iclust-2-ib0 PING iclust-2-ib0 (192.168.0.2) 56(84) bytes of data. 64 bytes from iclust-2-ib0 (192.168.0.2): icmp_seq=0 ttl=64 time=0.146 ms 64 bytes from iclust-2-ib0 (192.168.0.2): icmp_seq=1 ttl=64 time=0.116 ms _______________________________________________ openib-general mailing list openib-general at openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From moschny at ipd.uni-karlsruhe.de Thu Sep 29 14:01:00 2005 From: moschny at ipd.uni-karlsruhe.de (Thomas Moschny) Date: Thu, 29 Sep 2005 23:01:00 +0200 Subject: [openib-general] IPoIB configuration In-Reply-To: <1128024488.4398.414.camel@hal.voltaire.com> References: <200509292000.52337.moschny@ipd.uni-karlsruhe.de> <200509292201.51832.moschny@ipd.uni-karlsruhe.de> <1128024488.4398.414.camel@hal.voltaire.com> Message-ID: <200509292301.01213.moschny@ipd.uni-karlsruhe.de> On Thursday 29 September 2005 22:08, you wrote: > On Thu, 2005-09-29 at 16:01, Thomas Moschny wrote: > > Maybe a switch firmware problem? We once observed a complete switch > > lockup that shut down all communication. > > Could be. Do you know what rev of firmware you are running ? Is it 0.7.0 > ? (MTS-2400 is Anafa-2 based). To be honest, I don't know. How can I find out? I tried talking to the switch using a serial cable, but without success :( > Also, what is your HCA firmware version ? $ cat /sys/class/infiniband/mthca0/fw_ver 3.3.3 - Thomas -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From yaronh at voltaire.com Thu Sep 29 14:09:23 2005 From: yaronh at voltaire.com (Yaron Haviv) Date: Thu, 29 Sep 2005 23:09:23 +0200 Subject: [openib-general] [RFC] IB address translation using ARP Message-ID: <35EA21F54A45CB47B879F21A91F4862F7F9EF2@taurus.voltaire.com> > -----Original Message----- > From: openib-general-bounces at openib.org [mailto:openib-general- > bounces at openib.org] On Behalf Of Sean Hefty > Sent: Thursday, September 29, 2005 2:58 PM > To: Hal Rosenstock > Cc: Openib > Subject: Re: [openib-general] [RFC] IB address translation using ARP > > Hal Rosenstock wrote: > >>I'm struggling with understanding how translation can even occur in this > case. > >>What DGID is used when querying for the path record, and how is it > obtained? > > > > Isn't it the DGID of the next hop IP router ? (I suppose in the case of > > multiple IPoIB subnets on the same IB subnet, it could shortcut somehow > > like NHRP does in terms of ATM v. CLIP (Classic IP over ATM). > > How is the DGID of the next hop IP router used when connecting? As an > aside, do > the IPoIB subnets all fall into the same broadcast domain? > > >>What does SDP do in this case? > > > > Same as AT. It does the route lookup and ARPs for and then asks for the > > PathRecord of the next hop IP router. > > I guess I'm confused here. This gives a path record between the host > system and > the IP router. How is that used to establish a connection to the actual > destination? What values (DLID, DGID, pkey, etc.) go in the CM REQ > message, and > how are those values obtained? > > - Sean The idea, as Hal was describing, follows the common IP model: 1. per destination IP (and TOS, in the IP case), find the outgoing route entry 2. if it's a subnet covered by an adapter (IPoIB in our case, which can have multiple per port, each with its own P_Key), find the net device to use 3.
if it's not in one of my subnets, then what is the IP of the router covering that destination (e.g. the default gateway), and what is the net device I need to use (a device/port/partition combination). 4. send an ARP on the net device to find the destination MAC. Note that the destination IP in the ARP phase is either the REAL destination IP in the case of a local subnet, or the router's IP address in the case of a gateway/router. 5. issue a path record query between the source/dest GIDs (the DGID is taken from the IPoIB MAC in the ARP result). That's how it's done in SDP & ib_at, I believe. The generalization beyond a local subnet is very important if we want to address all sorts of applications and configurations not related to IB routing, e.g. a proxy/LB application that sits between two IP subnets (both over IB), future mapping from IB to external iWarp subnets, IP routers, etc. It also follows the exact flow used in GbE/IP. Yaron From mshefty at ichips.intel.com Thu Sep 29 14:16:04 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 14:16:04 -0700 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F7F9EF2@taurus.voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F7F9EF2@taurus.voltaire.com> Message-ID: <433C5994.5040300@ichips.intel.com> Yaron Haviv wrote: > 4. send an arp on the net device find destination MAC > > Note the destination IP in the ARP phase is either the REAL destination > IP in case of a local subnet, or the IP router IP address in case of a > gateway/router. > > 5. issue a path record between the source/dest GIDs (DGID taken from ARP > Result IPoIB MAC) In the case of gateway/router, isn't the returned GID for the router?
How is this used to establish a connection with the real destination? - Sean From halr at voltaire.com Thu Sep 29 14:15:12 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 17:15:12 -0400 Subject: [openib-general] IPoIB configuration In-Reply-To: <200509292301.01213.moschny@ipd.uni-karlsruhe.de> References: <200509292000.52337.moschny@ipd.uni-karlsruhe.de> <200509292201.51832.moschny@ipd.uni-karlsruhe.de> <1128024488.4398.414.camel@hal.voltaire.com> <200509292301.01213.moschny@ipd.uni-karlsruhe.de> Message-ID: <1128028512.4398.23.camel@hal.voltaire.com> On Thu, 2005-09-29 at 17:01, Thomas Moschny wrote: > On Thursday 29 September 2005 22:08, you wrote: > > On Thu, 2005-09-29 at 16:01, Thomas Moschny wrote: > > > Maybe a switch firmware problem? We once observed a complete switch > > > lockup that shut down all communication. > > > > Could be. Do you know what rev of firmware you are running ? Is it 0.7.0 > > ? (MTS-2400 is Anafa-2 based). > > To be honest, I don't know. How can I find out? I tried talking to the switch > using a serial cable, but without success :( You need to contact your switch vendor. > > Also, what is your HCA firmware version ? > > $ cat /sys/class/infiniband/mthca0/fw_ver > 3.3.3 That's the most recent. -- Hal From robert.j.woodruff at intel.com Thu Sep 29 14:44:24 2005 From: robert.j.woodruff at intel.com (Woodruff, Robert J) Date: Thu, 29 Sep 2005 14:44:24 -0700 Subject: [openib-general] IPoIB configuration Message-ID: <1AC79F16F5C5284499BB9591B33D6F0005AEF589@orsmsx408> Hal wrote, >> > Also, what is your HCA firmware version ? >> >> $ cat /sys/class/infiniband/mthca0/fw_ver >> 3.3.3 >That's the most recent. >-- Hal I would try 2 nodes point to point. If that works, then I suspect the switch. I did see an issue with one of our MT2400 switches with IPoIB connectivity. We replaced the switch and it seemed to fix the problem, so we did not investigate further, but perhaps down rev. F/W could have been the problem. 
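Since this thread narrows the problem down to HCA firmware (3.3.2 on one node, 3.3.3 on another, which Hal calls the most recent) or the switch, a small helper for checking firmware across nodes may be useful. The following is a minimal sketch, not an OpenIB tool. It assumes the fw_ver sysfs attribute holds a plain "major.minor.subminor" string, as in the cat output quoted above, and the function and struct names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Firmware version as printed by the fw_ver sysfs attribute, e.g. "3.3.3". */
struct fw_version {
	int major, minor, subminor;
};

/* Parse "X.Y.Z" (a trailing newline is fine); returns 0, or -1 if malformed. */
static int parse_fw_ver(const char *s, struct fw_version *v)
{
	return sscanf(s, "%d.%d.%d", &v->major, &v->minor, &v->subminor) == 3 ? 0 : -1;
}

/* Compare two versions; the result is <0, 0 or >0, like strcmp(). */
static int fw_ver_cmp(const struct fw_version *a, const struct fw_version *b)
{
	if (a->major != b->major)
		return a->major - b->major;
	if (a->minor != b->minor)
		return a->minor - b->minor;
	return a->subminor - b->subminor;
}

/* Read /sys/class/infiniband/<ibdev>/fw_ver; returns 0 on success, -1 on error. */
static int read_fw_ver(const char *ibdev, struct fw_version *v)
{
	char path[256], buf[64];
	FILE *f;
	int ok;

	assert(ibdev != NULL);
	snprintf(path, sizeof path, "/sys/class/infiniband/%s/fw_ver", ibdev);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* no such HCA, or sysfs not mounted */
	ok = fgets(buf, sizeof buf, f) != NULL;
	fclose(f);
	return ok ? parse_fw_ver(buf, v) : -1;
}
```

For example, calling read_fw_ver("mthca0", &v) and comparing v with a parsed "3.3.3" would flag a node running older firmware. The reference version is simply the one mentioned in this thread, not an authoritative minimum.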
woody From moschny at ipd.uni-karlsruhe.de Thu Sep 29 14:50:32 2005 From: moschny at ipd.uni-karlsruhe.de (Thomas Moschny) Date: Thu, 29 Sep 2005 23:50:32 +0200 Subject: [openib-general] IPoIB configuration In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F0005AEF589@orsmsx408> References: <1AC79F16F5C5284499BB9591B33D6F0005AEF589@orsmsx408> Message-ID: <200509292350.39453.moschny@ipd.uni-karlsruhe.de> On Thursday 29 September 2005 23:44, you wrote: > I would try 2 nodes point to point. If that works, then > I suspect the switch. I did see an issue with one of our MT2400 switches > with IPoIB connectivity. We replaced the switch and it > seemed to fix the problem, so we did not investigate further, > but perhaps down rev. F/W could have been the problem. Will try that tomorrow. Thanks for the help, Thomas -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From yaronh at voltaire.com Thu Sep 29 15:03:56 2005 From: yaronh at voltaire.com (Yaron Haviv) Date: Fri, 30 Sep 2005 00:03:56 +0200 Subject: [openib-general] [RFC] IB address translation using ARP Message-ID: <35EA21F54A45CB47B879F21A91F4862F7F9EFC@taurus.voltaire.com> > -----Original Message----- > From: Sean Hefty [mailto:mshefty at ichips.intel.com] > Sent: Thursday, September 29, 2005 5:16 PM > To: Yaron Haviv > Cc: Hal Rosenstock; Openib > Subject: Re: [openib-general] [RFC] IB address translation using ARP > > Yaron Haviv wrote: > > 4. send an arp on the net device find destination MAC > > > > Note the destination IP in the ARP phase is either the REAL destination > > IP in case of a local subnet, or the IP router IP address in case of a > > gateway/router. > > > > 5. issue a path record between the source/dest GIDs (DGID taken from ARP > > Result IPoIB MAC) > > In the case of gateway/router, isn't the returned GID for the router? 
How > is > this used to establish a connection with the real destination? > > - Sean The RC connection is established with the DGID of the router (it's the equivalent of a MAC address, and that's OK); the ServiceID + private data in the case of SDP or iSER (or NFS-R, assuming the IBTA proposal passes) also contains info on the REAL destination IP that can be used by the proxy. By the way, there is a section on that in the IETF iSER draft about iSER-to-iSCSI routing, but it's a general solution, just as applicable to someone doing HTTP proxy to SDP, or NFS/TCP to NFS/RDMA, or SDP to SDP, etc. to route From halr at voltaire.com Thu Sep 29 15:03:39 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 18:03:39 -0400 Subject: [Fwd: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features] In-Reply-To: <433C4C1E.3070705@dbresearch.net> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> Message-ID: <1128031418.4398.138.camel@hal.voltaire.com> On Thu, 2005-09-29 at 16:18, Sean Hubbell wrote: > Hal Rosenstock wrote: > > >Hi Sean, > > > >On Thu, 2005-09-29 at 14:03, Sean Hubbell wrote: > > > > > >> We are having problems with loading the mthca module running Linux > >>2.6.13 Kernel with the svn repository pulled yesterday afternoon when we > >>are booting. > >> > >> > > > >What problem is occuring with loading mthca during boot ? > > > > > > The modules does not load and I do not have an error message. This > occures intermitantly. Does this mean there is no error message upon loading mthca ?
> >>Once we boot, we get passed this but we have problems when > >>attempting to run ibping. > >> > >> > > > >ibping or ping ? If ibping, what is the server ? What is the ibping > >invocation ? Is this before or after an OpenSM is running in the subnet > >? > > > > > > > ibping 0x1 > ibping 0xB Did you build and start the ib_ping module on those machines ? I also assume those are valid LIDs in your subnet. > Neither of which work. Of course this is after I start opensm. > > >> The log files are 1.4M. > >> > >> > > > >OpenSM log or something else ? > This includes osmtest, osm.log, and the results of ibnetdiscover. Not sure this will be relevant. > >> Would you like me to send them to you directly? Perhaps, but not yet. > >Can you gzip or bzip it down ? > > > > > > > This is the size after taring and gzipping. > > Would you like me to send them? Not yet. -- Hal From halr at voltaire.com Thu Sep 29 15:07:16 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 29 Sep 2005 18:07:16 -0400 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <433C3948.2010205@ichips.intel.com> References: <1128002363.4381.8.camel@hal.voltaire.com> <1128006876.4381.164.camel@hal.voltaire.com> <433C10D8.2090301@ichips.intel.com> <1128011095.4381.358.camel@hal.voltaire.com> <433C18EA.6020906@ichips.intel.com> <1128012338.4381.410.camel@hal.voltaire.com> <433C1D04.2000404@ichips.intel.com> <1128017777.4398.3.camel@hal.voltaire.com> <433C34A6.5050801@ichips.intel.com> <1128019458.4398.95.camel@hal.voltaire.com> <433C3948.2010205@ichips.intel.com> Message-ID: <1128031635.4398.146.camel@hal.voltaire.com> On Thu, 2005-09-29 at 14:58, Sean Hefty wrote: > As an aside, do the IPoIB subnets all fall into the same broadcast domain? That would depend on the PKey, right ?
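Hal's point that the broadcast domain depends on the P_Key can be made concrete. Each IPoIB interface joins a broadcast multicast group whose MGID embeds the partition key, so interfaces on different P_Keys end up in different groups and do not see each other's broadcasts (for example, ARP). The following is a minimal sketch. It assumes the broadcast-GID layout ff1Y:401b:PPPP:0000:0000:0000:ffff:ffff from the IETF IPoIB specification (later published as RFC 4391), where Y is the scope nibble and PPPP is the P_Key with its full-membership bit set. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the IPv4 broadcast MGID for an IPoIB interface, assuming the
 * layout ff1Y:401b:PPPP:0000:0000:0000:ffff:ffff (Y = scope nibble,
 * PPPP = P_Key with the full-membership bit 0x8000 set).
 */
static void ipoib_bcast_mgid(uint16_t pkey, uint8_t scope, uint8_t mgid[16])
{
	static const uint8_t tmpl[16] = {
		0xff, 0x10, 0x40, 0x1b, 0x00, 0x00, 0x00, 0x00,
		0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff
	};

	assert(scope <= 0xf);		/* scope is a 4-bit field */
	memcpy(mgid, tmpl, sizeof tmpl);
	mgid[1] |= scope;		/* e.g. 0x2 = link-local */
	pkey |= 0x8000;			/* full-membership form of the key */
	mgid[4] = pkey >> 8;
	mgid[5] = pkey & 0xff;
}

/* Format an MGID as eight colon-separated 16-bit groups (buf >= 40 bytes). */
static void mgid_to_str(const uint8_t mgid[16], char *buf)
{
	int i;

	for (i = 0; i < 8; i++)
		sprintf(buf + 5 * i, "%02x%02x%s", mgid[2 * i], mgid[2 * i + 1],
			i < 7 ? ":" : "");
}
```

With the default P_Key 0xffff and link-local scope, this yields ff12:401b:ffff:0000:0000:0000:ffff:ffff. A limited-member key such as 0x0001 maps to the same group as its full-member form, 0x8001.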
-- Hal From mshefty at ichips.intel.com Thu Sep 29 15:23:50 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 15:23:50 -0700 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <35EA21F54A45CB47B879F21A91F4862F7F9EFC@taurus.voltaire.com> References: <35EA21F54A45CB47B879F21A91F4862F7F9EFC@taurus.voltaire.com> Message-ID: <433C6976.10405@ichips.intel.com> Yaron Haviv wrote: > The RC connection is established with the DGID of the router (it's the > equivalent of a MAC address and its ok), the ServiceID + private data in > the case of SDP or iSER (or NFS-R assuming the IBTA proposal will pass) > also contains info on the REAL destination IP that can be used by the > proxy. I think I'm missing some fairly important concepts here. Can you explain how RDMA works in this case? This is simply performing IP routing, and not IB routing, correct? Are you referring to a protocol running on top of IP or IB directly? Is the router establishing a second reliable connection on the backend? Does it simply translate headers as packets pass through in this case? My focus so far has been trying to connect directly over IB, but using IP addresses. - Sean
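The resolution flow Yaron describes in this thread (route lookup, an ARP for either the real destination or the router, then a path record query using the GID from the ARP reply) can be sketched as a toy resolver. Everything here is an illustrative assumption. The routing table and ARP cache are simplified stand-ins for the kernel's tables, all names are hypothetical, and the 20-byte IPoIB hardware address is assumed to be one reserved byte, a 3-byte QPN and a 16-byte GID. No real ARP request or SA query is issued.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified routing entry (addresses in host byte order). */
struct route {
	uint32_t dest, mask;	/* destination prefix */
	uint32_t gateway;	/* 0 means the subnet is directly attached */
	const char *dev;	/* IPoIB net device, e.g. "ib0" */
};

/* Hypothetical ARP cache entry holding a 20-byte IPoIB hardware address. */
struct arp_entry {
	uint32_t ip;
	uint8_t hwaddr[20];	/* reserved byte, 3-byte QPN, 16-byte GID */
};

/* Steps 1-3: longest-prefix match (assumes contiguous netmasks). */
static const struct route *route_lookup(const struct route *rt, int n,
					uint32_t dst)
{
	const struct route *best = NULL;
	int i;

	for (i = 0; i < n; i++)
		if ((dst & rt[i].mask) == rt[i].dest &&
		    (!best || rt[i].mask > best->mask))
			best = &rt[i];
	return best;
}

/*
 * Steps 3-5: the next hop is the destination itself when on-link,
 * otherwise the router; its ARP entry supplies the DGID that would be
 * used in the path record query. Returns 0 on success, -1 on failure.
 */
static int resolve_dgid(const struct route *rt, int nrt,
			const struct arp_entry *arp, int narp,
			uint32_t dst, uint8_t dgid[16])
{
	const struct route *r;
	uint32_t next_hop;
	int i;

	assert(dgid != NULL);
	r = route_lookup(rt, nrt, dst);
	if (!r)
		return -1;		/* no route to host */
	next_hop = r->gateway ? r->gateway : dst;
	for (i = 0; i < narp; i++) {
		if (arp[i].ip == next_hop) {
			memcpy(dgid, arp[i].hwaddr + 4, 16);	/* skip flags + QPN */
			return 0;
		}
	}
	return -1;	/* a real stack would send an ARP request on r->dev here */
}
```

For an off-subnet destination, the returned DGID belongs to the router. This is the case Sean asks about: the connection goes to the router (or a gateway like the one Roland describes), and, as Yaron notes, the real destination IP would then have to be carried in the ServiceID/private data.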
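From vuhuong at mellanox.com Thu Sep 29 16:28:33 2005 From: vuhuong at mellanox.com (Vu Pham) Date: Thu, 29 Sep 2005 16:28:33 -0700 Subject: [openib-general] Re: [PATCH] SRP: don't use TX IU after freeing it In-Reply-To: <52zmpvhll8.fsf@cisco.com> References: <52vf0kii49.fsf@cisco.com> <433C1821.6000809@mellanox.com> <52zmpvhll8.fsf@cisco.com> Message-ID: <433C78A1.30207@mellanox.com> Roland, Vu> Since all the tuned parameter are target-centralized (passing Vu> in when add new target) I think about moving FMR resources Vu> (size, max_page...) ie. fmr_pool into srp_target_port Vu> struct. Each newly added target will have their own customized Vu> FMR pool. That makes some sense. An issue is that FMRs are a fairly limited resource, and a system with many SRP targets where each target doesn't get much traffic could tie up a lot of FMRs. You're right. For the same reason (unused ports, i.e. srp_host), I create the FMR resources per device and keep them in the srp_device_data struct. I put back FMR + your patch, and it works well with my setup. Signed-off-by: Vu Pham -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: srp.patch URL: From rolandd at cisco.com Thu Sep 29 18:49:53 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 29 Sep 2005 18:49:53 -0700 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <433C6976.10405@ichips.intel.com> (Sean Hefty's message of "Thu, 29 Sep 2005 15:23:50 -0700") References: <35EA21F54A45CB47B879F21A91F4862F7F9EFC@taurus.voltaire.com> <433C6976.10405@ichips.intel.com> Message-ID: <52vf0jfhoe.fsf@cisco.com> Sean> Can you explain how RDMA works in this case? This is simply Sean> performing IP routing, and not IB routing, correct? Are you Sean> referring to a protocol running on top of IP or IB directly? Sean> Is the router establishing a second reliable connection on Sean> the backend? Does it simply translate headers as packets Sean> pass through in this case? I think the usage model is the following: you have some magic device that has an IB port on one side and "something else" on the other side. Think of something like a gateway that talks SDP on the IB side and TCP/IP on the other side.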
Name: srp.patch URL: From info at sedfjx.com Thu Sep 29 17:29:15 2005 From: info at sedfjx.com (info at sedfjx.com) Date: 30 Sep 2005 09:29:15 +0900 Subject: [openib-general] $B=w@-M-L>7]G=?M$4MQC#$G$9(B Message-ID: <20050930002915.3326.qmail@mail.sedfjx.com> $B!!!!(B $B(#(!(!($(#(!(!($(B $B!!!!(B $B("40A4("("L5NA("(B $B!!!!(B $B(&(!(!(%(&(!(!(%(B $B(#(!(!($(#(!(!($(#(!(!($(B $B("M-L>("("?M!*("(">R2p("(B $B(&(!(!(%(&(!(!(%(&(!(!(%(B http://www.00-love5.com/?vipgirl $B"((BFree$B%a!<%k$G$N40A4L5NA$,=PMh$^$9!#(B $B!V5.J}$K%T%C%?%7$N=w at -$r6KHk$K>R2pCW$7$^$9!W(B $B"(Cm0U"((B $BM-L>$JJ}$bB?!9:_ at R$7$F$$$^$9$N$GI,$:HkL)87EPO?8e$K5$$KF~$i$J$1$l$PB(!"B`2q$7$F$b9=$$$^$;$s!*!*(B $B!yCN$k?M$,CN$k$+$J$jM-L>$J>R2p=j$G$9!#(B *_________________________________* $B"((BI don't veceive your mail$B"-(B sweet_baby_sweet_12 at yahoo.it $B"(%a!<%kITMW"-(B sweet_baby_sweet_12 at yahoo.it *__________________________________* 18$B:PL$K~$O$4MxMQ$G$-$^$;$s!*(B From rolandd at cisco.com Thu Sep 29 18:49:53 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 29 Sep 2005 18:49:53 -0700 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <433C6976.10405@ichips.intel.com> (Sean Hefty's message of "Thu, 29 Sep 2005 15:23:50 -0700") References: <35EA21F54A45CB47B879F21A91F4862F7F9EFC@taurus.voltaire.com> <433C6976.10405@ichips.intel.com> Message-ID: <52vf0jfhoe.fsf@cisco.com> Sean> Can you explain how RDMA works in this case? This is simply Sean> performing IP routing, and not IB routing, correct? Are you Sean> referring to a protocol running on top of IP or IB directly? Sean> Is the router establishing a second reliable connection on Sean> the backend? Does it simply translate headers as packets Sean> pass through in this case? I think the usage model is the following: you have some magic device that has an IB port on one side and "something else" on the other side. Think of something like a gateway that talks SDP on the IB side and TCP/IP on the other side. 
You configure your IPoIB routing so that this magic device is the next hop for talking to hosts on the IP network on the other side. Now someone tries to make an SDP connection to an IP address on the other side of the magic device. Routing tables + ARP give it the GID of the IB port of this magic device. It connects to the magic device and run SDP to talk to the magic device, and the magic device magically splices this into a TCP connection to the real destination. Or the same idea for an NFS/RDMA <-> NFS/UDP gateway, etc. - R. From sean.hefty at intel.com Thu Sep 29 22:08:05 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 22:08:05 -0700 Subject: [openib-general] [RFC] IB address translation using ARP In-Reply-To: <52vf0jfhoe.fsf@cisco.com> Message-ID: >I think the usage model is the following: you have some magic device >that has an IB port on one side and "something else" on the other >side. Think of something like a gateway that talks SDP on the IB side >and TCP/IP on the other side. > >You configure your IPoIB routing so that this magic device is the next >hop for talking to hosts on the IP network on the other side. > >Now someone tries to make an SDP connection to an IP address on the >other side of the magic device. Routing tables + ARP give it the GID >of the IB port of this magic device. It connects to the magic device >and run SDP to talk to the magic device, and the magic device >magically splices this into a TCP connection to the real destination. Thanks for the clarification. - Sean From sean.hefty at intel.com Thu Sep 29 22:41:40 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 29 Sep 2005 22:41:40 -0700 Subject: [openib-general] CMA and device removal Message-ID: I'd like to get some feedback about the following design change to the CMA. Currently, a user receives a device GUID and port number as part of address resolution. 
The user matches the GUID to an existing device and creates a cma_id associated with that device. I'm considering the following alternative: On creation, a cma_id is not bound to a particular device. It gets bound to a device as part of address resolution. The CMA tracks device references. When a device is removed, the CMA will generate an event on all associated cma_id's. Users must destroy cma_id's and associated device resources after receiving the event. A couple of notes: the CMA already performs reference counting on devices. Also, with some additional work, this would permit a single listen request to span multiple devices. Comments? - Sean From gabhijit at pantasys.com Thu Sep 29 23:31:28 2005 From: gabhijit at pantasys.com (Abhijit Gadgil) Date: Fri, 30 Sep 2005 12:01:28 +0530 Subject: [openib-general] Sending/Receiving MAD packets from user-space. Message-ID: <1128061888.5764.8.camel@psmith.ind.pantasys.com> Hi All, I am trying to use MAD services from a user-land application. Basically I want to do few things like registering to traps/generating Multicast Send requests from user-space using Userspace VAPI. After reading through some code in osmtest.c, I couldn't figure out whether it is using the Userspace VAPI or through a kernel module Kernel space VAPI. Can someone point to me how I should get going about it? The library libopensm.a or libosmvendor.a can be used for this? However, I am not too sure whether it is using the user mode Verbs API or not. Thanks in advance Regards. 
-abhijit From iod00d at hp.com Thu Sep 29 23:30:03 2005 From: iod00d at hp.com (Grant Grundler) Date: Thu, 29 Sep 2005 23:30:03 -0700 Subject: [openib-general] Re: [PATCH] SRP: don't use TX IU after freeing it In-Reply-To: <52zmpvhll8.fsf@cisco.com> References: <52vf0kii49.fsf@cisco.com> <433C1821.6000809@mellanox.com> <52zmpvhll8.fsf@cisco.com> Message-ID: <20050930063003.GM29921@esmail.cup.hp.com> On Thu, Sep 29, 2005 at 09:42:27AM -0700, Roland Dreier wrote: > Vu> Have you reviewed the FMR? What your take on Christoph's point > Vu> about the high bit of dma_address_ts are used by some > Vu> platforms IOMMU - I think that it's OK since FMR code only > Vu> touch the lower bit of dma_address_ts > > But I think that it's fine to manipulate dma_addr_t the way that your > code does -- they are bus addresses. Sorry - I intended to look more closely at this but have been swamped. Christoph is right. Even if the code works, it's risky to muck with the dma_addr_t contents. I'll try to look at this tomorrow and if I have a better idea, propose it. (I'm pessimistic that I'll have a better idea though) grant From mst at mellanox.co.il Fri Sep 30 01:05:10 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 30 Sep 2005 11:05:10 +0300 Subject: [openib-general] [PATCH] core+mthca: questions and proposal: kill ib_(de)alloc_device Message-ID: <20050930080510.GA31930@mellanox.co.il> Guys, looking with Leonid Keller at device.c raised several questions with regard to what ib_alloc_device/ib_dealloc_device do: 1. Why is ib_device_register_sysfs called from ib_register_device, but ib_device_unregister_sysfs from ib_dealloc_device? 2. Who is supposed to set reg_state back to IB_DEV_UNINITIALIZED? Without it ib_dealloc_device does not seem to free the device structure. Is this a memory leak? 3. ib_alloc_device does not set reg_state, it seems to rely on the fact that IB_DEV_UNINITIALIZED = 0. Is that intentional? 4. 
For ib_alloc_device/ib_dealloc_device to work properly, it seems that the device structure must have ib_device as the first member. Is this limitation documented anywhere? 5. Why do we need reg_state in the device, at all? I thought we can trust providers to call register/unregister in proper order? What do you say we simply let providers allocate the structure? Can you really go wrong reducing the line count by more than 50 :) ? # diffstat patches/alloc.patch core/device.c | 51 +----------------------------------------------- hw/mthca/mthca_main.c | 8 ++++--- include/rdma/ib_verbs.h | 9 -------- 3 files changed, 7 insertions(+), 61 deletions(-) --- Kill ib_alloc_device/ib_dealloc_device and the reg_state field. Also solves what looks like a memory leak. Index: linux-2.6.13/drivers/infiniband/core/device.c =================================================================== --- linux-2.6.13.orig/drivers/infiniband/core/device.c 2005-09-30 12:31:22.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/core/device.c 2005-09-30 12:31:39.000000000 +0300 @@ -149,51 +149,6 @@ static int alloc_name(char *name) return 0; } -/** - * ib_alloc_device - allocate an IB device struct - * @size:size of structure to allocate - * - * Low-level drivers should use ib_alloc_device() to allocate &struct - * ib_device. @size is the size of the structure to be allocated, - * including any private data used by the low-level driver. - * ib_dealloc_device() must be used to free structures allocated with - * ib_alloc_device(). - */ -struct ib_device *ib_alloc_device(size_t size) -{ - void *dev; - - BUG_ON(size < sizeof (struct ib_device)); - - dev = kmalloc(size, GFP_KERNEL); - if (!dev) - return NULL; - - memset(dev, 0, size); - - return dev; -} -EXPORT_SYMBOL(ib_alloc_device); - -/** - * ib_dealloc_device - free an IB device struct - * @device:structure to free - * - * Free a structure allocated with ib_alloc_device(). 
- */ -void ib_dealloc_device(struct ib_device *device) -{ - if (device->reg_state == IB_DEV_UNINITIALIZED) { - kfree(device); - return; - } - - BUG_ON(device->reg_state != IB_DEV_UNREGISTERED); - - ib_device_unregister_sysfs(device); -} -EXPORT_SYMBOL(ib_dealloc_device); - static int add_client_context(struct ib_device *device, struct ib_client *client) { struct ib_client_data *context; @@ -256,8 +211,6 @@ int ib_register_device(struct ib_device list_add_tail(&device->core_list, &device_list); - device->reg_state = IB_DEV_REGISTERED; - { struct ib_client *client; @@ -292,14 +245,14 @@ void ib_unregister_device(struct ib_devi list_del(&device->core_list); + ib_device_unregister_sysfs(device); + up(&device_sem); spin_lock_irqsave(&device->client_data_lock, flags); list_for_each_entry_safe(context, tmp, &device->client_data_list, list) kfree(context); spin_unlock_irqrestore(&device->client_data_lock, flags); - - device->reg_state = IB_DEV_UNREGISTERED; } EXPORT_SYMBOL(ib_unregister_device); Index: linux-2.6.13/drivers/infiniband/hw/mthca/mthca_main.c =================================================================== --- linux-2.6.13.orig/drivers/infiniband/hw/mthca/mthca_main.c 2005-09-30 12:31:22.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/hw/mthca/mthca_main.c 2005-09-30 12:31:39.000000000 +0300 @@ -1003,7 +1003,7 @@ static int __devinit mthca_init_one(stru } } - mdev = (struct mthca_dev *) ib_alloc_device(sizeof *mdev); + mdev = kmalloc(sizeof *mdev, GFP_KERNEL); if (!mdev) { dev_err(&pdev->dev, "Device struct alloc failed, " "aborting.\n"); @@ -1011,6 +1011,8 @@ static int __devinit mthca_init_one(stru goto err_free_res; } + memset(mdev, 0, sizeof *mdev); + mdev->pdev = pdev; if (ddr_hidden) @@ -1106,7 +1108,7 @@ err_free_dev: if (mdev->mthca_flags & MTHCA_FLAG_MSI) pci_disable_msi(pdev); - ib_dealloc_device(&mdev->ib_dev); + kfree(mdev); err_free_res: mthca_release_regions(pdev, ddr_hidden); @@ -1154,7 +1156,7 @@ static void __devexit mthca_remove_one(s 
if (mdev->mthca_flags & MTHCA_FLAG_MSI) pci_disable_msi(pdev); - ib_dealloc_device(&mdev->ib_dev); + kfree(mdev); mthca_release_regions(pdev, mdev->mthca_flags & MTHCA_FLAG_DDR_HIDDEN); pci_disable_device(pdev); Index: linux-2.6.13/drivers/infiniband/include/rdma/ib_verbs.h =================================================================== --- linux-2.6.13.orig/drivers/infiniband/include/rdma/ib_verbs.h 2005-09-30 12:31:30.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/include/rdma/ib_verbs.h 2005-09-30 12:31:52.000000000 +0300 @@ -945,12 +945,6 @@ struct ib_device { struct kobject ports_parent; struct list_head port_list; - enum { - IB_DEV_UNINITIALIZED, - IB_DEV_REGISTERED, - IB_DEV_UNREGISTERED - } reg_state; - int uverbs_abi_ver; u8 node_type; @@ -965,9 +959,6 @@ struct ib_client { struct list_head list; }; -struct ib_device *ib_alloc_device(size_t size); -void ib_dealloc_device(struct ib_device *device); - int ib_register_device (struct ib_device *device); void ib_unregister_device(struct ib_device *device); -- MST From mst at mellanox.co.il Fri Sep 30 01:13:46 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 30 Sep 2005 11:13:46 +0300 Subject: [openib-general] Re: [RFC] IB address translation using ARP In-Reply-To: <433C219C.3090306@ichips.intel.com> References: <433C219C.3090306@ichips.intel.com> Message-ID: <20050930081346.GB31930@mellanox.co.il> Quoting Sean Hefty : > Subject: Re: [openib-general] Re: [RFC] IB address translation using ARP > > Michael S. Tsirkin wrote: > > I'd suggest you also take a look at sdp_link.c - I hear that's where > > ib_at code came from, and I think it does some things in a better way - > > such as only keeping device reference around for a short while only. > > Thanks for pointing this out. I wasn't aware that SDP did a lot of the same > things. The code in ib_at was copied from sdp_link in several places. > > We should make sure that a final solution works well for SDP, as well > as the CMA. 
I agree it might make sense to reuse the ARP and SA query part in SDP. I suspect the CM-related part can't easily be shared between SDP and CMA, since the CM REQ format and the service record format for SDP are already set in stone and are very SDP-specific. -- MST From mst at mellanox.co.il Fri Sep 30 01:29:20 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 30 Sep 2005 11:29:20 +0300 Subject: [openib-general] [PATCH] fix memory leak on device close Message-ID: <20050930082920.GA32179@mellanox.co.il> ----- Forwarded message from Leonid Keller ----- Look at the end of mthca_init_icm(): mdev->mcg_table.table = mthca_alloc_icm_table(...) It is never released! (at least in my snapshot). One has to add mthca_free_icm_table(mdev, mdev->mcg_table.table); to mthca_close_hca(). ----- End forwarded message ----- Looks like a memory leak. Rather than fix it in two places, I've factored the common code out into a function. Roland, does the following (compile-tested only - I don't have memfree hardware at the moment) make sense to you? --- Fix memory leak on device close. Signed-off-by: Michael S.
Tsirkin Index: linux-2.6.13/drivers/infiniband/hw/mthca/mthca_main.c =================================================================== --- linux-2.6.13.orig/drivers/infiniband/hw/mthca/mthca_main.c 2005-09-30 13:14:20.000000000 +0300 +++ linux-2.6.13/drivers/infiniband/hw/mthca/mthca_main.c 2005-09-30 13:15:49.000000000 +0300 @@ -504,6 +504,24 @@ err_free_aux: return err; } +static void mthca_free_icms(struct mthca_dev *mdev) +{ + u8 status; + mthca_free_icm_table(mdev, mdev->mcg_table.table); + if (mdev->mthca_flags & MTHCA_FLAG_SRQ) + mthca_free_icm_table(mdev, mdev->srq_table.table); + mthca_free_icm_table(mdev, mdev->cq_table.table); + mthca_free_icm_table(mdev, mdev->qp_table.rdb_table); + mthca_free_icm_table(mdev, mdev->qp_table.eqp_table); + mthca_free_icm_table(mdev, mdev->qp_table.qp_table); + mthca_free_icm_table(mdev, mdev->mr_table.mpt_table); + mthca_free_icm_table(mdev, mdev->mr_table.mtt_table); + mthca_unmap_eq_icm(mdev); + + mthca_UNMAP_ICM_AUX(mdev, &status); + mthca_free_icm(mdev, mdev->fw.arbel.aux_icm); +} + static int __devinit mthca_init_arbel(struct mthca_dev *mdev) { struct mthca_dev_lim dev_lim; @@ -581,18 +599,7 @@ static int __devinit mthca_init_arbel(st return 0; err_free_icm: - if (mdev->mthca_flags & MTHCA_FLAG_SRQ) - mthca_free_icm_table(mdev, mdev->srq_table.table); - mthca_free_icm_table(mdev, mdev->cq_table.table); - mthca_free_icm_table(mdev, mdev->qp_table.rdb_table); - mthca_free_icm_table(mdev, mdev->qp_table.eqp_table); - mthca_free_icm_table(mdev, mdev->qp_table.qp_table); - mthca_free_icm_table(mdev, mdev->mr_table.mpt_table); - mthca_free_icm_table(mdev, mdev->mr_table.mtt_table); - mthca_unmap_eq_icm(mdev); - - mthca_UNMAP_ICM_AUX(mdev, &status); - mthca_free_icm(mdev, mdev->fw.arbel.aux_icm); + mthca_free_icms(mdev); err_stop_fw: mthca_UNMAP_FA(mdev, &status); @@ -612,18 +619,7 @@ static void mthca_close_hca(struct mthca mthca_CLOSE_HCA(mdev, 0, &status); if (mthca_is_memfree(mdev)) { - if (mdev->mthca_flags & 
MTHCA_FLAG_SRQ) - mthca_free_icm_table(mdev, mdev->srq_table.table); - mthca_free_icm_table(mdev, mdev->cq_table.table); - mthca_free_icm_table(mdev, mdev->qp_table.rdb_table); - mthca_free_icm_table(mdev, mdev->qp_table.eqp_table); - mthca_free_icm_table(mdev, mdev->qp_table.qp_table); - mthca_free_icm_table(mdev, mdev->mr_table.mpt_table); - mthca_free_icm_table(mdev, mdev->mr_table.mtt_table); - mthca_unmap_eq_icm(mdev); - - mthca_UNMAP_ICM_AUX(mdev, &status); - mthca_free_icm(mdev, mdev->fw.arbel.aux_icm); + mthca_free_icms(mdev); mthca_UNMAP_FA(mdev, &status); mthca_free_icm(mdev, mdev->fw.arbel.fw_icm); -- MST From eitan at mellanox.co.il Fri Sep 30 02:10:46 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Fri, 30 Sep 2005 12:10:46 +0300 Subject: [openib-general] Sending/Receiving MAD packets from user-space. Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30692AF@mtlexch01.mtl.com> Please see the file management/osm/libvendor/osm_vendor_ibumad.c Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Abhijit Gadgil [mailto:gabhijit at pantasys.com] > Sent: Friday, September 30, 2005 9:31 AM > To: openib mailing list > Subject: [openib-general] Sending/Receiving MAD packets from user-space. > > Hi All, > > I am trying to use MAD services from a user-land application. Basically > I want to do few things like registering to traps/generating Multicast > Send requests from user-space using Userspace VAPI. > > After reading through some code in osmtest.c, I couldn't figure out > whether it is using the Userspace VAPI or through a kernel module Kernel > space VAPI. Can someone point to me how I should get going about it? > > The library libopensm.a or libosmvendor.a can be used for this? However, > I am not too sure whether it is using the user mode Verbs API or not. > > Thanks in advance > > Regards.
> > -abhijit > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From yaronh at voltaire.com Fri Sep 30 06:24:28 2005 From: yaronh at voltaire.com (Yaron Haviv) Date: Fri, 30 Sep 2005 15:24:28 +0200 Subject: [openib-general] [RFC] IB address translation using ARP Message-ID: <35EA21F54A45CB47B879F21A91F4862F7F9F14@taurus.voltaire.com> > -----Original Message----- > From: Roland Dreier [mailto:rolandd at cisco.com] > Sent: Thursday, September 29, 2005 9:50 PM > To: Sean Hefty > Cc: Yaron Haviv; Openib > Subject: Re: [openib-general] [RFC] IB address translation using ARP > > I think the usage model is the following: you have some magic device > that has an IB port on one side and "something else" on the other > side. Think of something like a gateway that talks SDP on the IB side > and TCP/IP on the other side. > It is also applicable to two IB ports, e.g. forwarding SDP traffic from one IB partition to SDP on another partition (possibly even the same port with two P_Keys) and doing some load balancing or traffic management in between. Overall, there are many use cases for that. Yaron From moschny at ipd.uni-karlsruhe.de Fri Sep 30 06:35:11 2005 From: moschny at ipd.uni-karlsruhe.de (Thomas Moschny) Date: Fri, 30 Sep 2005 15:35:11 +0200 Subject: [openib-general] IPoIB configuration In-Reply-To: <1AC79F16F5C5284499BB9591B33D6F0005AEF589@orsmsx408> References: <1AC79F16F5C5284499BB9591B33D6F0005AEF589@orsmsx408> Message-ID: <200509301535.18821.moschny@ipd.uni-karlsruhe.de> On Thursday 29 September 2005 23:44, Woodruff, Robert J wrote: > I would try 2 nodes point to point. If that works, then > I suspect the switch. I did see an issue with one of our MT2400 switches > with IPoIB connectivity.
We replaced the switch and it > seemed to fix the problem, so we did not investigate further, > but perhaps down rev. F/W could have been the problem. Just for the record: Meanwhile, I updated the MTS2400 switch firmware from 0.0.1 to 0.7.0. Now I see IPoIB connectivity, albeit using the Mellanox Gold Distribution driver, but I expect the OpenIB drivers to work now, too (will try that). Thanks for your comments. Regards, Thomas From caitlinb at broadcom.com Fri Sep 30 06:38:27 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 30 Sep 2005 06:38:27 -0700 Subject: [openib-general] [RFC] IB address translation using ARP Message-ID: <54AD0F12E08D1541B826BE97C98F99F1020912@NT-SJCA-0751.brcm.ad.broadcom.com> > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Roland Dreier > Sent: Thursday, September 29, 2005 6:50 PM > To: Sean Hefty > Cc: Openib > Subject: Re: [openib-general] [RFC] IB address translation using ARP > > Sean> Can you explain how RDMA works in this case? This is simply > Sean> performing IP routing, and not IB routing, correct? Are you > Sean> referring to a protocol running on top of IP or IB directly? > Sean> Is the router establishing a second reliable connection on > Sean> the backend? Does it simply translate headers as packets > Sean> pass through in this case? > > I think the usage model is the following: you have some magic > device that has an IB port on one side and "something else" > on the other side. Think of something like a gateway that > talks SDP on the IB side and TCP/IP on the other side. > > You configure your IPoIB routing so that this magic device is > the next hop for talking to hosts on the IP network on the other side.
> > Now someone tries to make an SDP connection to an IP address > on the other side of the magic device. Routing tables + ARP > give it the GID of the IB port of this magic device. It > connects to the magic device and run SDP to talk to the magic > device, and the magic device magically splices this into a > TCP connection to the real destination. > > Or the same idea for an NFS/RDMA <-> NFS/UDP gateway, etc. > Those examples are all basically application level gateways. As such they would have no transport or connection setup implications. The application level gateway simply offers a service on network X that it fulfills on network Y. But as far as network X is concerned the gateway IS the server. I do not believe it is possible to construct a transport layer gateway that bridges RDMA between IB and iWARP while appearing to be a normal RDMA endpoint on both networks. Higher level gateways will be possible for many applications, but I don't see how that relates to connection establishment. That would require having an end-to-end reliable connection, complete with flow control semantics, that bridged the two networks by some method other than encapsulation or tunneling. From halr at voltaire.com Fri Sep 30 06:39:36 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Sep 2005 09:39:36 -0400 Subject: [openib-general] Sending/Receiving MAD packets from user-space. In-Reply-To: <1128061888.5764.8.camel@psmith.ind.pantasys.com> References: <1128061888.5764.8.camel@psmith.ind.pantasys.com> Message-ID: <1128087331.5270.877.camel@hal.voltaire.com> On Fri, 2005-09-30 at 02:31, Abhijit Gadgil wrote: > I am trying to use MAD services from a user-land application. Basically > I want to do few things like registering to traps/generating Multicast > Send requests from user-space using Userspace VAPI. Is this OpenIB or gen1 ? 
> After reading through some code in osmtest.c, I couldn't figure out > whether it is using the Userspace VAPI or through a kernel module Kernel > space VAPI. Can someone point to me how I should get going about it? It uses the API in osm_vendor_sa_api. > The library libopensm.a or libosmvendor.a can be used for this? However, > I am not too sure whether it is using the user mode Verbs API or not. For OpenIB, the implementation is osm_vendor_ibumad_sa.c. This goes through libibumad (and the user_mad kernel module). Note that there may be a more OpenIB style user SA client in the future but this is usable now. -- Hal From shubbell at dbresearch.net Fri Sep 30 07:09:23 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 30 Sep 2005 09:09:23 -0500 Subject: [Fwd: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features] In-Reply-To: <1128087305.5270.871.camel@hal.voltaire.com> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> Message-ID: <433D4713.6050109@dbresearch.net> Hal Rosenstock wrote: >Hi Sean, > >On Fri, 2005-09-30 at 09:02, Sean Hubbell wrote: > > >>The error message when rebooting and attempting to start opensm on is a >>failure when attempting to make a call to osm_opensm_bind. >> >> > >Is ib_umad loaded ? > >-- Hal > > > > Yes. 
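Hal's question about whether ib_umad is loaded can be answered without lsmod by reading /proc/modules, which lists one loaded module per line with the module name as the first field. The following is a minimal C sketch assuming a Linux host; module_loaded is a hypothetical helper, not part of OpenIB, and the file path is a parameter only so the function can be exercised against a sample file. A module built into the kernel rather than loaded as a module will not appear in /proc/modules.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` is listed as a loaded module
 * in `path` (normally "/proc/modules"), 0 if it is not, and -1 if the
 * file cannot be opened. */
int module_loaded(const char *path, const char *name)
{
    char line[4096];
    size_t len = strlen(name);
    int found = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        /* Compare the whole first field (name followed by a space),
         * so that "ib_ma" does not match the "ib_mad" line. */
        if (strncmp(line, name, len) == 0 && line[len] == ' ') {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A startup check could call module_loaded("/proc/modules", "ib_umad") and refuse to launch opensm if the result is anything other than 1.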
From gabhijit at pantasys.com Fri Sep 30 07:23:51 2005 From: gabhijit at pantasys.com (Abhijit Gadgil) Date: Fri, 30 Sep 2005 19:53:51 +0530 Subject: [openib-general] Sending/Receiving MAD packets from user-space. In-Reply-To: <1128087331.5270.877.camel@hal.voltaire.com> References: <1128061888.5764.8.camel@psmith.ind.pantasys.com> <1128087331.5270.877.camel@hal.voltaire.com> Message-ID: <1128090231.5764.26.camel@psmith.ind.pantasys.com> On Fri, 2005-09-30 at 19:09, Hal Rosenstock wrote: > On Fri, 2005-09-30 at 02:31, Abhijit Gadgil wrote: > > I am trying to use MAD services from a user-land application. Basically > > I want to do few things like registering to traps/generating Multicast > > Send requests from user-space using Userspace VAPI. > > Is this OpenIB or gen1 ? I am sorry, the code I was referring to is Mellanox IB-Gold 1.8.0 (which appears to be based on Gen1 drivers.) Thanks -abhijit > > After reading through some code in osmtest.c, I couldn't figure out > > whether it is using the Userspace VAPI or through a kernel module Kernel > > space VAPI. Can someone point to me how I should get going about it? > > It uses the API in osm_vendor_sa_api. > > > The library libopensm.a or libosmvendor.a can be used for this? However, > > I am not too sure whether it is using the user mode Verbs API or not. > > For OpenIB, the implementation is osm_vendor_ibumad_sa.c. This goes > through libibumad (and the user_mad kernel module). > > Note that there may be a more OpenIB style user SA client in the future > but this is usable now. 
> > -- Hal > From halr at voltaire.com Fri Sep 30 07:13:43 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Sep 2005 10:13:43 -0400 Subject: [Fwd: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features] In-Reply-To: <433D4713.6050109@dbresearch.net> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> Message-ID: <1128089623.5270.995.camel@hal.voltaire.com> On Fri, 2005-09-30 at 10:09, Sean Hubbell wrote: > Hal Rosenstock wrote: > > >Hi Sean, > > > >On Fri, 2005-09-30 at 09:02, Sean Hubbell wrote: > > > > > >>The error message when rebooting and attempting to start opensm on is a > >>failure when attempting to make a call to osm_opensm_bind. > >> > >> > > > >Is ib_umad loaded ? > > > >-- Hal > > > > > > > > > Yes. Is opensm started with sufficient permission to umad ? Are you using udev ? If so, what are the rules ? -- Hal From halr at voltaire.com Fri Sep 30 07:27:32 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Sep 2005 10:27:32 -0400 Subject: [openib-general] Re: netdev reference counting problem with ib_at In-Reply-To: <20050927040822.GL25823@mellanox.co.il> References: <1127792767.4379.329.camel@hal.voltaire.com> <20050927040822.GL25823@mellanox.co.il> Message-ID: <1128090452.5270.1045.camel@hal.voltaire.com> On Tue, 2005-09-27 at 00:08, Michael S. Tsirkin wrote: > Why does AT need to keep netdev reference for longer? 
I don't think it really does and could be changed. I think (but am not sure) it was a convenience of implementation to try to make the netdev reference counting simpler. It only needs to hold the netdev for sending the ARP (like SDP). It needs the underlying ib_device and port for ATS and path queries as well as reregistration if the interface address changes and deregistration if the IPoIB interface is removed. (SDP doesn't need to worry about these aspects (only path queries).) -- Hal From halr at voltaire.com Fri Sep 30 07:38:31 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Sep 2005 10:38:31 -0400 Subject: [openib-general] [PATCH] SDP: In sdp_link.c::do_link_path_lookup, handle interface table numbering holes Message-ID: <1128091110.5270.1072.camel@hal.voltaire.com> SDP: In sdp_link.c::do_link_path_lookup, handle interface table numbering holes (similar to James Lentini's patch to at.c) (this is untested) Signed-off-by: Hal Rosenstock Index: sdp_link.c =================================================================== --- sdp_link.c (revision 3623) +++ sdp_link.c (working copy) @@ -354,7 +354,6 @@ static void do_link_path_lookup(struct s struct ipoib_dev_priv *priv; struct net_device *dev = NULL; struct rtable *rt; - int counter = 0; int result = 0; struct flowi fl = { .oif = info->dif, /* oif */ @@ -435,7 +434,7 @@ static void do_link_path_lookup(struct s if (dev->flags & IFF_LOOPBACK) { dev_put(dev); - while ((dev = dev_get_by_index(++counter))) { + for (dev = dev_base; dev; dev = dev->next) { if (dev->type == ARPHRD_INFINIBAND && (dev->flags & IFF_UP)) break; From halr at voltaire.com Fri Sep 30 07:43:54 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Sep 2005 10:43:54 -0400 Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: References: <43385665.5000104@ichips.intel.com> Message-ID: <1128091434.5270.1079.camel@hal.voltaire.com> On Mon, 2005-09-26 at 17:14, James Lentini wrote: > On Mon, 26 Sep 2005, Sean 
Hefty wrote: > > > James Lentini wrote: > > > Why would this module be a ULP and not part of the core? Especially since > > > the rdma_cma.h include file is intended for the core include area, > > > include/rdma. > > > > It can be a separately loaded module, so a ULP from the viewpoint of > > verbs, SA query, IB CM, etc. > > The distinction between a core component and a ULP is still fuzzy. The > core is comprised of seperately loaded modules (e.g. ib_core, ib_sa, > ib_mad, ib_ping, ib_cm, etc.). Although CMA is layered on top of CM and some other Linux modules, I too think it appears core in nature although perhaps optionally loaded. Also, in terms of the current implementation, the headers (ib_cma.h and ib_addr.h) are under include/rdma which is core, right ? -- Hal From twbowman at gmail.com Fri Sep 30 08:43:08 2005 From: twbowman at gmail.com (Todd Bowman) Date: Fri, 30 Sep 2005 09:43:08 -0600 Subject: [openib-general] ib_cm_listen failure In-Reply-To: <433C2ADF.4010402@ichips.intel.com> References: <433C2ADF.4010402@ichips.intel.com> Message-ID: udapl is using 0x115d3. How is this set and what value should it be? Todd On 9/29/05, Arlin Davis wrote: > > Todd Bowman wrote: > > > I am runing udapl on 32bit intel and running into this error: > > > > setup_listener(conn=0x8060008 cm_id=134611368) > > destroy_cm_id: conn 0x8060008 id 134611368 > > --> dapl_psp_create setup_conn_listener failed: 30000 > > 20664 Error dat_psp_create: DAT_INSUFFICIENT_RESOURCES > > 20664 Error connect_ep: DAT_INSUFFICIENT_RESOURCES > > > What SID are you listening on? sdp is listening on a range from 0x10000 > - 0x1fffff > so you may be colliding with their SID/port space. > > > > > I've tracked the error to ib_cm_listen: > > > > result = write(cm_id->device->fd, msg, size); > > if (result != size) > > return (result > 0) ? 
-ENODATA : result; > > > > result = -1 > > size = 28 > > device->fd = 4 > > > > > > These are the modules I have loaded: > > > > ib_sdp 93792 0 > > ib_ipoib 45572 0 > > ib_uat 15884 0 > > ib_at 29248 1 ib_uat > > ib_sa 16916 3 ib_sdp,ib_ipoib,ib_at > > ib_ucm 21764 0 > > ib_cm 39628 2 ib_sdp,ib_ucm > > ib_uverbs 33936 0 > > ib_umad 18712 0 > > ib_mthca 118300 0 > > ib_mad 43424 5 ib_ping,ib_sa,ib_cm,ib_umad,ib_mthca > > ib_core 48128 10 > > > ib_ping,ib_sdp,ib_ipoib,ib_sa,ib_ucm,ib_cm,ib_uverbs,ib_umad,ib_mthca,ib_mad > > > > > > This is /dev/infiniband: > > crw-rw-rw- 1 root root 231, 191 Sep 29 08:10 uat > > crw-rw-rw- 1 root root 231, 224 Sep 29 08:10 ucm0 > > crw-rw-rw- 1 root root 231, 0 Sep 29 08:10 umad0 > > crw-rw-rw- 1 root root 231, 1 Sep 29 08:10 umad1 > > crw-rw-rw- 1 root root 231, 192 Sep 29 08:09 uverbs0 > > crw-rw-rw- 1 root root 231, 193 Sep 29 08:09 uverbs1 > > > > I have run ulimit -l unlimited > > > > I'm at a loss here. Can someone point me in the rigt direction. > > > > Thanks, > > Todd > > > >------------------------------------------------------------------------ > > > >_______________________________________________ > >openib-general mailing list > >openib-general at openib.org > >http://openib.org/mailman/listinfo/openib-general > > > >To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shubbell at dbresearch.net Fri Sep 30 08:48:29 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 30 Sep 2005 10:48:29 -0500 Subject: [Fwd: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features] In-Reply-To: <1128089623.5270.995.camel@hal.voltaire.com> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> <1128089623.5270.995.camel@hal.voltaire.com> Message-ID: <433D5E4D.70405@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-30 at 10:09, Sean Hubbell wrote: > > >>Hal Rosenstock wrote: >> >> >> >>>Hi Sean, >>> >>>On Fri, 2005-09-30 at 09:02, Sean Hubbell wrote: >>> >>> >>> >>> >>>>The error message when rebooting and attempting to start opensm on is a >>>>failure when attempting to make a call to osm_opensm_bind. >>>> >>>> >>>> >>>> >>>Is ib_umad loaded ? >>> >>>-- Hal >>> >>> >>> >>> >>> >>> >>Yes. >> >> > >Is opensm started with sufficient permission to umad ? Are you using >udev ? If so, what are the rules ? > >-- Hal > > > > Yes, I am using udev. 
[root at riba ~]# cd /etc/udev/rules.d/ [root at riba rules.d]# cat 90-ib.rules KERNEL="umad*", NAME="infiniband/%k" KERNEL="issm*", NAME="infiniband/%k" KERNEL="uverbs*", NAME="infiniband/%k", MODE="0666" Sean From jlentini at netapp.com Fri Sep 30 08:53:03 2005 From: jlentini at netapp.com (James Lentini) Date: Fri, 30 Sep 2005 11:53:03 -0400 (EDT) Subject: [openib-general] ib_cm_listen failure In-Reply-To: References: <433C2ADF.4010402@ichips.intel.com> Message-ID: On Fri, 30 Sep 2005, Todd Bowman wrote: > udapl is using 0x115d3. How is this set and what value should it be? > > Todd On InfiniBand, uDAPL maps connection qualifiers onto service IDs (SIDs). The connection qualifier is chosen by the uDAPL application when it creates a Public Service Point (PSP) or Reserved Service Point (RSP). As Arlin noted, 0x115d3 is in the SDP range. The dapltest test tool uses 0xB0de. I would try any value except those in the range 0x10000-0x1fffff and 0xB0de. james From halr at voltaire.com Fri Sep 30 08:47:29 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Sep 2005 11:47:29 -0400 Subject: [Fwd: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features] In-Reply-To: <433D5E4D.70405@dbresearch.net> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> <1128089623.5270.995.camel@hal.voltaire.com> <433D5E4D.70405@dbresearch.net> Message-ID:
<1128095249.5270.1092.camel@hal.voltaire.com> On Fri, 2005-09-30 at 11:48, Sean Hubbell wrote: > Yes, I am using udev. > > [root at riba ~]# cd /etc/udev/rules.d/ > [root at riba rules.d]# cat 90-ib.rules > KERNEL="umad*", NAME="infiniband/%k" > KERNEL="issm*", NAME="infiniband/%k" > KERNEL="uverbs*", NAME="infiniband/%k", MODE="0666" Then is opensm started as root ? Also, what is the opensm command invocation ? -- Hal From shubbell at dbresearch.net Fri Sep 30 09:00:58 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 30 Sep 2005 11:00:58 -0500 Subject: [Fwd: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features] In-Reply-To: <1128095249.5270.1092.camel@hal.voltaire.com> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> <1128089623.5270.995.camel@hal.voltaire.com> <433D5E4D.70405@dbresearch.net> <1128095249.5270.1092.camel@hal.voltaire.com> Message-ID: <433D613A.5000000@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-30 at 11:48, Sean Hubbell wrote: > > > >>Yes, I am using udev. >> >>[root at riba ~]# cd /etc/udev/rules.d/ >>[root at riba rules.d]# cat 90-ib.rules >>KERNEL="umad*", NAME="infiniband/%k" >>KERNEL="issm*", NAME="infiniband/%k" >>KERNEL="uverbs*", NAME="infiniband/%k", MODE="0666" >> >> > >Then is opensm started as root ? Also, what is the opensm command >invocation ? 
> >-- Hal > > > > Yes, opensm -V Sean From halr at voltaire.com Fri Sep 30 09:07:34 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Sep 2005 12:07:34 -0400 Subject: [Fwd: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features] In-Reply-To: <433D613A.5000000@dbresearch.net> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> <1128089623.5270.995.camel@hal.voltaire.com> <433D5E4D.70405@dbresearch.net> <1128095249.5270.1092.camel@hal.voltaire.com> <433D613A.5000000@dbresearch.net> Message-ID: <1128096319.5270.1101.camel@hal.voltaire.com> On Fri, 2005-09-30 at 12:00, Sean Hubbell wrote: > Hal Rosenstock wrote: > > >On Fri, 2005-09-30 at 11:48, Sean Hubbell wrote: > > > > > > > >>Yes, I am using udev. > >> > >>[root at riba ~]# cd /etc/udev/rules.d/ > >>[root at riba rules.d]# cat 90-ib.rules > >>KERNEL="umad*", NAME="infiniband/%k" > >>KERNEL="issm*", NAME="infiniband/%k" > >>KERNEL="uverbs*", NAME="infiniband/%k", MODE="0666" > >> > >> > > > >Then is opensm started as root ? Also, what is the opensm command > >invocation ? > > > >-- Hal > > > > > > > > > Yes, > > opensm -V OK. I'm game for the log then. 
-- Hal From sean.hefty at intel.com Fri Sep 30 09:33:33 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 30 Sep 2005 09:33:33 -0700 Subject: [openib-general] [PATCH] [RFC] RDMA generic CMA updates In-Reply-To: <1128091434.5270.1079.camel@hal.voltaire.com> Message-ID: >Although CMA is layered on top of CM and some other Linux modules, I too >think it appears core in nature although perhaps optionally loaded. > >Also, in terms of the current implementation, the headers (ib_cma.h and >ib_addr.h) are under include/rdma which is core, right ? If accepted back into the trunk, we can decide where's best to put it. I placed my copy under the ULP directory to match where kDAPL is, but I'm leaning more towards core now. - Sean From shubbell at dbresearch.net Fri Sep 30 10:07:28 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 30 Sep 2005 12:07:28 -0500 Subject: [Fwd: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features] In-Reply-To: <1128095249.5270.1092.camel@hal.voltaire.com> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> <1128089623.5270.995.camel@hal.voltaire.com> <433D5E4D.70405@dbresearch.net> <1128095249.5270.1092.camel@hal.voltaire.com> Message-ID: <433D70D0.7020305@dbresearch.net> Hal Rosenstock wrote: >On Fri, 2005-09-30 at 11:48, Sean Hubbell wrote: > > > >>Yes, I am using udev. 
>> >>[root at riba ~]# cd /etc/udev/rules.d/ >>[root at riba rules.d]# cat 90-ib.rules >>KERNEL="umad*", NAME="infiniband/%k" >>KERNEL="issm*", NAME="infiniband/%k" >>KERNEL="uverbs*", NAME="infiniband/%k", MODE="0666" >> >> > >Then is opensm started as root ? Also, what is the opensm command >invocation ? > >-- Hal > > > > Hal, With this script (which manually loads each IB module), this works (finally, sigh...). Sean -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: openib URL: From shubbell at dbresearch.net Fri Sep 30 10:14:35 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 30 Sep 2005 12:14:35 -0500 Subject: [openib-general] Updating firmware In-Reply-To: <433D70D0.7020305@dbresearch.net> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> <1128089623.5270.995.camel@hal.voltaire.com> <433D5E4D.70405@dbresearch.net> <1128095249.5270.1092.camel@hal.voltaire.com> <433D70D0.7020305@dbresearch.net> Message-ID: <433D727B.9050604@dbresearch.net> Hello, What is the best way to update the firmware on the HCA card and the InfiniBand switches? Is this mstflint or tvflash?
Sean From halr at voltaire.com Fri Sep 30 10:15:01 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Sep 2005 13:15:01 -0400 Subject: [openib-general] Updating firmware In-Reply-To: <433D727B.9050604@dbresearch.net> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> <1128089623.5270.995.camel@hal.voltaire.com> <433D5E4D.70405@dbresearch.net> <1128095249.5270.1092.camel@hal.voltaire.com> <433D70D0.7020305@dbresearch.net> <433D727B.9050604@dbresearch.net> Message-ID: <1128100500.5270.1143.camel@hal.voltaire.com> Hi Sean, On Fri, 2005-09-30 at 13:14, Sean Hubbell wrote: > What is the best way to update the firmware on the HCA card and the > Infiniband switches? Switch firmware updates are not supported by OpenIB; they are specific to the switch vendor. > Is this mstflint or tvflash? For the HCA, mstflint is officially supported; tvflash isn't. Which you use may also be a matter of preference.
-- Hal From halr at voltaire.com Fri Sep 30 10:19:46 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Sep 2005 13:19:46 -0400 Subject: [Fwd: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features] In-Reply-To: <433D70D0.7020305@dbresearch.net> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> <1128089623.5270.995.camel@hal.voltaire.com> <433D5E4D.70405@dbresearch.net> <1128095249.5270.1092.camel@hal.voltaire.com> <433D70D0.7020305@dbresearch.net> Message-ID: <1128100594.5270.1148.camel@hal.voltaire.com> On Fri, 2005-09-30 at 13:07, Sean Hubbell wrote: > Hal Rosenstock wrote: > > >On Fri, 2005-09-30 at 11:48, Sean Hubbell wrote: > > > > > > > >>Yes, I am using udev. > >> > >>[root at riba ~]# cd /etc/udev/rules.d/ > >>[root at riba rules.d]# cat 90-ib.rules > >>KERNEL="umad*", NAME="infiniband/%k" > >>KERNEL="issm*", NAME="infiniband/%k" > >>KERNEL="uverbs*", NAME="infiniband/%k", MODE="0666" > >> > >> > > > >Then is opensm started as root ? Also, what is the opensm command > >invocation ? > > > >-- Hal > > > > > > > > > Hal, > > With this script (which manualy loads each ib module) this works > (finally, sigh...). > > Sean > > > > > > > ______________________________________________________________________ > > #! 
/bin/bash > # > # openib Bring up/down opensm > # > # chkconfig: 2345 9 91 > # description: Activates/Deactivates open infiniband management. > # > ### BEGIN INIT INFO > # Provides: $network > ### END INIT INFO > > # Source function library. > . /etc/init.d/functions > > CWD=`pwd` > > # See how we were called. > case "$1" in > start) > action $"Loading core module: " /sbin/modprobe ib_core > action $"Loading mad module: " /sbin/modprobe ib_mad > action $"Loading sa module: " /sbin/modprobe ib_sa > action $"Loading umad module: " /sbin/modprobe ib_umad > action $"Loading mthca module: " /sbin/modprobe ib_mthca > action $"Loading ping module: " /sbin/modprobe ib_ping > action $"Loading ipoib module: " /sbin/modprobe ib_ipoib > action $"Loading cm module: " /sbin/modprobe ib_cm > action $"Loading ucm module: " /sbin/modprobe ib_ucm > action $"Loading sdp module: " /sbin/modprobe ib_sdp > action $"Loading at module: " /sbin/modprobe ib_at > sleep 2; > /usr/local/bin/opensm -V & > touch /var/lock/subsys/openib > ;; > stop) > echo -n $"Stopping opensm: " > > killproc opensm -TERM > RETVAL=$? > echo > if [ $RETVAL -eq 0 ]; then > rm -f /var/lock/subsys/openib > fi > ;; > status) > PORT1_STATUS=`cat /sys/class/infiniband/mthca0/ports/1/state` > echo "Infiniband Port 1 is : $PORT1_STATUS" > > PORT2_STATUS=`cat /sys/class/infiniband/mthca0/ports/2/state` > echo "Infiniband Port 2 is : $PORT2_STATUS" > ;; > restart|reload) > cd "$CWD" > $0 stop > sleep 10; > $0 start > ;; > *) > echo $"Usage: $0 {start|stop|restart|reload|status}" > exit 1 > esac > > exit 0 Yes, it could be the sleep which makes it work. It takes some time before the umad module gets far enough into initialization and makes things available to user space so that the "bind" can work. 
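A fixed `sleep 2` only works if umad happens to finish initializing within that window. One alternative is to wait for the device node itself to appear before starting opensm. The C sketch below polls with `stat()` until a path exists or a timeout expires. The node name `/dev/infiniband/umad0` is an assumption based on the `umad*` udev rule quoted above, and the helper names are hypothetical. The node existing does not strictly guarantee that opensm's bind will succeed, so a retry inside opensm may still be needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <time.h>

/* Sleep for ms milliseconds, resuming the sleep if a signal interrupts it. */
static void sleep_ms(long ms)
{
    struct timespec req = { .tv_sec = ms / 1000,
                            .tv_nsec = (ms % 1000) * 1000000L };

    while (nanosleep(&req, &req) == -1 && errno == EINTR)
        ; /* req now holds the remaining time; keep sleeping */
}

/*
 * Poll until `path` exists or roughly `timeout_ms` milliseconds have passed.
 * Returns 0 once the path exists, -1 on timeout.
 *
 * Hypothetical usage in place of "sleep 2":
 *     wait_for_path("/dev/infiniband/umad0", 10000, 100);
 */
int wait_for_path(const char *path, long timeout_ms, long interval_ms)
{
    struct stat st;
    long waited = 0;

    assert(path != NULL && interval_ms > 0);
    for (;;) {
        if (stat(path, &st) == 0)
            return 0;
        if (waited >= timeout_ms)
            return -1;
        sleep_ms(interval_ms);
        waited += interval_ms;
    }
}
```

Polling with a short interval keeps startup fast when the module is ready early, while the timeout bounds how long a boot can hang if the HCA never shows up.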
-- Hal From halr at voltaire.com Fri Sep 30 10:23:01 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Sep 2005 13:23:01 -0400 Subject: [Fwd: [openib-general] [ANNOUCE] OpenIB OpenSM: trunk now supports 1.8.0 features] In-Reply-To: <433D70D0.7020305@dbresearch.net> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> <1128089623.5270.995.camel@hal.voltaire.com> <433D5E4D.70405@dbresearch.net> <1128095249.5270.1092.camel@hal.voltaire.com> <433D70D0.7020305@dbresearch.net> Message-ID: <1128100785.5270.1156.camel@hal.voltaire.com> On Fri, 2005-09-30 at 13:07, Sean Hubbell wrote: > Hal Rosenstock wrote: > > >On Fri, 2005-09-30 at 11:48, Sean Hubbell wrote: > > > > > > > >>Yes, I am using udev. > >> > >>[root at riba ~]# cd /etc/udev/rules.d/ > >>[root at riba rules.d]# cat 90-ib.rules > >>KERNEL="umad*", NAME="infiniband/%k" > >>KERNEL="issm*", NAME="infiniband/%k" > >>KERNEL="uverbs*", NAME="infiniband/%k", MODE="0666" > >> > >> > > > >Then is opensm started as root ? Also, what is the opensm command > >invocation ? > > > >-- Hal > > > > > > > > > Hal, > > With this script (which manualy loads each ib module) this works > (finally, sigh...). Glad to hear it. > > Sean > > > > > > > ______________________________________________________________________ > > #! 
/bin/bash > # > # openib Bring up/down opensm > # > # chkconfig: 2345 9 91 > # description: Activates/Deactivates open infiniband management. > # > ### BEGIN INIT INFO > # Provides: $network > ### END INIT INFO > > # Source function library. > . /etc/init.d/functions > > CWD=`pwd` > > # See how we were called. > case "$1" in > start) > action $"Loading core module: " /sbin/modprobe ib_core > action $"Loading mad module: " /sbin/modprobe ib_mad > action $"Loading sa module: " /sbin/modprobe ib_sa > action $"Loading umad module: " /sbin/modprobe ib_umad > action $"Loading mthca module: " /sbin/modprobe ib_mthca > action $"Loading ping module: " /sbin/modprobe ib_ping > action $"Loading ipoib module: " /sbin/modprobe ib_ipoib > action $"Loading cm module: " /sbin/modprobe ib_cm > action $"Loading ucm module: " /sbin/modprobe ib_ucm > action $"Loading sdp module: " /sbin/modprobe ib_sdp > action $"Loading at module: " /sbin/modprobe ib_at > sleep 2; > /usr/local/bin/opensm -V & > touch /var/lock/subsys/openib > ;; > stop) > echo -n $"Stopping opensm: " > > killproc opensm -TERM > RETVAL=$? > echo > if [ $RETVAL -eq 0 ]; then > rm -f /var/lock/subsys/openib > fi > ;; > status) > PORT1_STATUS=`cat /sys/class/infiniband/mthca0/ports/1/state` > echo "Infiniband Port 1 is : $PORT1_STATUS" > > PORT2_STATUS=`cat /sys/class/infiniband/mthca0/ports/2/state` > echo "Infiniband Port 2 is : $PORT2_STATUS" > ;; > restart|reload) > cd "$CWD" > $0 stop > sleep 10; > $0 start > ;; > *) > echo $"Usage: $0 {start|stop|restart|reload|status}" > exit 1 > esac > > exit 0 Yes, it could be the sleep which makes it work. It takes some time before the umad module gets far enough into initialization and makes things available to user space so that the "bind" can work. 
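The script's `status` branch reads each port's state from sysfs with `cat`. A minimal C equivalent is sketched below, assuming the same `/sys/class/infiniband/<dev>/ports/<port>/state` layout the script uses. On kernels of this era the file typically holds a single line such as `4: ACTIVE`. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the sysfs path of an IB port's state file, following the layout
 * used by the init script: /sys/class/infiniband/<dev>/ports/<port>/state
 * Returns 0 on success, -1 if buf is too small.
 */
int port_state_path(char *buf, size_t len, const char *dev, int port)
{
    int n = snprintf(buf, len, "/sys/class/infiniband/%s/ports/%d/state",
                     dev, port);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read the first line of a state file into buf, dropping the trailing
 * newline. Returns 0 on success, -1 if the file cannot be opened or read.
 */
int read_port_state(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(path != NULL && buf != NULL && len > 0);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

A caller would loop over ports 1 and 2 of `mthca0`, as the script does, and print whatever `read_port_state()` returns.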
-- Hal From jlentini at netapp.com Fri Sep 30 10:42:49 2005 From: jlentini at netapp.com (James Lentini) Date: Fri, 30 Sep 2005 13:42:49 -0400 (EDT) Subject: [openib-general] Updating firmware In-Reply-To: <1128100500.5270.1143.camel@hal.voltaire.com> References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> <1128089623.5270.995.camel@hal.voltaire.com> <433D5E4D.70405@dbresearch.net> <1128095249.5270.1092.camel@hal.voltaire.com> <433D70D0.7020305@dbresearch.net> <433D727B.9050604@dbresearch.net> <1128100500.5270.1143.camel@hal.voltaire.com> Message-ID: On Fri, 30 Sep 2005, Hal Rosenstock wrote: > Hi Sean, > > On Fri, 2005-09-30 at 13:14, Sean Hubbell wrote: > > What is the best way to update the firmware on the HCA card and the > > Infiniband switches? > > Switch update of firmware is not supported by OpenIB. This is specific > to the vendor switch. > > > Is this mstflint or tvflash? > > For the HCA, mstflint is officially supported; tvlash isn't. Which you > use may also be a matter of preference. 
The Wiki's Installation Cheat Sheet has an overview of how to flash a Mellanox HCA: https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet From shubbell at dbresearch.net Fri Sep 30 10:50:33 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 30 Sep 2005 12:50:33 -0500 Subject: [openib-general] Updating firmware In-Reply-To: References: <1126565088.4382.36298.camel@hal.voltaire.com> <4326CCC7.1030903@dbresearch.net> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> <1128089623.5270.995.camel@hal.voltaire.com> <433D5E4D.70405@dbresearch.net> <1128095249.5270.1092.camel@hal.voltaire.com> <433D70D0.7020305@dbresearch.net> <433D727B.9050604@dbresearch.net> <1128100500.5270.1143.camel@hal.voltaire.com> Message-ID: <433D7AE9.90308@dbresearch.net> James Lentini wrote: >On Fri, 30 Sep 2005, Hal Rosenstock wrote: > > > >>Hi Sean, >> >>On Fri, 2005-09-30 at 13:14, Sean Hubbell wrote: >> >> >>> What is the best way to update the firmware on the HCA card and the >>>Infiniband switches? >>> >>> >>Switch update of firmware is not supported by OpenIB. This is specific >>to the vendor switch. >> >> >> >>>Is this mstflint or tvflash? >>> >>> >>For the HCA, mstflint is officially supported; tvlash isn't. Which you >>use may also be a matter of preference. 
>> >> > >The Wiki's Installation Cheat Sheet has an overview of how to flash a >Mellanox HCA: > >https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet > > > > Great, but mstflint does not compile on my system. Would you like me to fix this (using the same autogen.sh and configure like files)? Or is this for a specific reason? Sean From jlentini at netapp.com Fri Sep 30 11:13:03 2005 From: jlentini at netapp.com (James Lentini) Date: Fri, 30 Sep 2005 14:13:03 -0400 (EDT) Subject: [openib-general] Updating firmware In-Reply-To: <433D7AE9.90308@dbresearch.net> References: <1126565088.4382.36298.camel@hal.voltaire.com> <1126616513.4382.43405.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> <1128089623.5270.995.camel@hal.voltaire.com> <433D5E4D.70405@dbresearch.net> <1128095249.5270.1092.camel@hal.voltaire.com> <433D70D0.7020305@dbresearch.net> <433D727B.9050604@dbresearch.net> <1128100500.5270.1143.camel@hal.voltaire.com> <433D7AE9.90308@dbresearch.net> Message-ID: On Fri, 30 Sep 2005, Sean Hubbell wrote: > James Lentini wrote: > > > On Fri, 30 Sep 2005, Hal Rosenstock wrote: > > > > > > > Hi Sean, > > > > > > On Fri, 2005-09-30 at 13:14, Sean Hubbell wrote: > > > > > > > What is the best way to update the firmware on the HCA card and the > > > > Infiniband switches? > > > Switch update of firmware is not supported by OpenIB. This is specific > > > to the vendor switch. > > > > > > > > > > Is this mstflint or tvflash? 
> > > > > > > For the HCA, mstflint is officially supported; tvlash isn't. Which you > > > use may also be a matter of preference. > > > > > > > The Wiki's Installation Cheat Sheet has an overview of how to flash a > > Mellanox HCA: > > > > https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet > > > > > > > Great, but mstflint does not compile on my system. Would you like me to fix > this (using the same autogen.sh and configure like files)? > Or is this for a specific reason? Michael S. Tsirkin, mst at mellanox.co.il, is the maintainer of the mstflint tool. I believe Michael lives in Israel, so he may not see your message until Sunday. Bug reports/patches would be appreciated, but Michael would be the one to review them. FYI: The procedure for submitting patches is outlined in the OpenIB FAQ: https://openib.org/tiki/tiki-index.php?page=OpenIBFAQ In addition to openib-general, I'd suggest including Michael in the to field. From shubbell at dbresearch.net Fri Sep 30 11:20:59 2005 From: shubbell at dbresearch.net (Sean Hubbell) Date: Fri, 30 Sep 2005 13:20:59 -0500 Subject: [openib-general] Updating firmware In-Reply-To: References: <1126565088.4382.36298.camel@hal.voltaire.com> <43270A46.50401@dbresearch.net> <1126632083.4514.682.camel@hal.voltaire.com> <43287709.5000801@dbresearch.net> <1127840724.4436.26.camel@localhost.localdomain> <43397EE5.4050105@dbresearch.net> <1127844895.4403.3.camel@hal.voltaire.com> <433C2C58.9050201@dbresearch.net> <1128018539.4398.48.camel@hal.voltaire.com> <433C4C1E.3070705@dbresearch.net> <1128031418.4398.138.camel@hal.voltaire.com> <433D374B.9080806@dbresearch.net> <1128087305.5270.871.camel@hal.voltaire.com> <433D4713.6050109@dbresearch.net> <1128089623.5270.995.camel@hal.voltaire.com> <433D5E4D.70405@dbresearch.net> <1128095249.5270.1092.camel@hal.voltaire.com> <433D70D0.7020305@dbresearch.net> <433D727B.9050604@dbresearch.net> <1128100500.5270.1143.camel@hal.voltaire.com> <433D7AE9.90308@dbresearch.net> Message-ID: 
<433D820B.10100@dbresearch.net> James Lentini wrote: >On Fri, 30 Sep 2005, Sean Hubbell wrote: > > > >>James Lentini wrote: >> >> >> >>>On Fri, 30 Sep 2005, Hal Rosenstock wrote: >>> >>> >>> >>> >>>>Hi Sean, >>>> >>>>On Fri, 2005-09-30 at 13:14, Sean Hubbell wrote: >>>> >>>> >>>> >>>>> What is the best way to update the firmware on the HCA card and the >>>>>Infiniband switches? >>>>> >>>>> >>>>Switch update of firmware is not supported by OpenIB. This is specific >>>>to the vendor switch. >>>> >>>> >>>> >>>> >>>>>Is this mstflint or tvflash? >>>>> >>>>> >>>>> >>>>For the HCA, mstflint is officially supported; tvlash isn't. Which you >>>>use may also be a matter of preference. >>>> >>>> >>>> >>>The Wiki's Installation Cheat Sheet has an overview of how to flash a >>>Mellanox HCA: >>> >>>https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet >>> >>> >>> >>> >>> >>Great, but mstflint does not compile on my system. Would you like me to fix >>this (using the same autogen.sh and configure like files)? >>Or is this for a specific reason? >> >> > >Michael S. Tsirkin, mst at mellanox.co.il, is the maintainer of the >mstflint tool. I believe Michael lives in Israel, so he may not see >your message until Sunday. > >Bug reports/patches would be appreciated, but Michael would be the one >to review them. > >FYI: The procedure for submitting patches is outlined in the OpenIB >FAQ: > >https://openib.org/tiki/tiki-index.php?page=OpenIBFAQ > >In addition to openib-general, I'd suggest including Michael in the to >field. > > > > Michael, Would you like me to add autogen.sh and configure scripts to build mstflint? The reason is that when I compile it on my system (a Dell PowerEdge 2850 with two 3.2 GHz processors running cAos 2.0 with patches), some of the required include paths are not resolved.
Sean From pradeep at us.ibm.com Fri Sep 30 11:28:43 2005 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Fri, 30 Sep 2005 11:28:43 -0700 Subject: [openib-general] Re: netdev reference counting problem with ib_at In-Reply-To: <1128090452.5270.1045.camel@hal.voltaire.com> Message-ID: I have been following this and the other thread on CMA. There appear to be some opinions in favor of removing the ib_at module and introducing CMA. Is that correct? True, CMA will need some form of address translation. Can we not use some incarnation of ib_at for that? I realize that ib_at has a net_device refcnt problem. Is this refcnt problem a usage issue rather than just a bug in the implementation? How would CMA solve the refcnt issue? What am I missing? Pradeep pradeep at us.ibm.com openib-general-bounces at openib.org wrote on 09/30/2005 07:27:32 AM: > On Tue, 2005-09-27 at 00:08, Michael S. Tsirkin wrote: > > Why does AT need to keep netdev reference for longer? > > I don't think it really does and could be changed. I think (but am not > sure) it was a convenience of implementation to try to make the netdev > reference counting simpler. > > It only needs to hold the netdev for sending the ARP (like SDP). > > It needs the underlying ib_device and port for ATS and path queries as > well as reregistration if the interface address changes and > deregistration if the IPoIB interface is removed. (SDP doesn't need to > worry about these aspects (only path queries).) > > -- Hal > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
From sean.hefty at intel.com Fri Sep 30 11:40:36 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 30 Sep 2005 11:40:36 -0700 Subject: [openib-general] Re: netdev reference counting problem with ib_at In-Reply-To: Message-ID: I have been following this and the other thread on CMA. There appears to be some opinions to removing the ib_at module and introduce CMA. Is that correct? The proposal is to remove ATS, which is only a portion of ib_at, and instead use ARP to resolve addresses. True, CMA will need some form of address translation. Can we not use some incarnation of ib_at for that? I realize that ib_at has a net_device refcnt problem. Is this refcnt problem a usage issue rather than just a bug in the implementation? How would CMA solve the refcnt issue? What am I missing? Some incarnation of ib_at will be needed, and portions of either the ib_at or SDP code will likely be used. - Sean From halr at voltaire.com Fri Sep 30 11:38:30 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 30 Sep 2005 14:38:30 -0400 Subject: [openib-general] Re: netdev reference counting problem with ib_at In-Reply-To: References: Message-ID: <1128105510.4401.82.camel@hal.voltaire.com> On Fri, 2005-09-30 at 14:28, Pradeep Satyanarayana wrote: > I have been following this and the other thread on CMA. There appears > to be some opinions to removing the ib_at module and introduce CMA. > Is that correct? > > True, CMA will need some form of address translation. Can we not use > some incarnation of ib_at for that? I realize that ib_at has a > net_device > refcnt problem. Is this refcnt problem a usage issue rather than just > a bug in the implementation? It's an implementation issue which can be fixed. The address translation portion of CMA will need to do something similar (as SDP does or a fixed AT would). -- Hal > How would CMA solve the refcnt issue? > What am I missing?
> > Pradeep > pradeep at us.ibm.com > > > openib-general-bounces at openib.org wrote on 09/30/2005 07:27:32 AM: > > > On Tue, 2005-09-27 at 00:08, Michael S. Tsirkin wrote: > > > Why does AT need to keep netdev reference for longer? > > > > I don't think it really does and could be changed. I think (but am > not > > sure) it was a convenience of implementation to try to make the > netdev > > reference counting simpler. > > > > It only needs to hold the netdev for sending the ARP (like SDP). > > > > It needs the underlying ib_device and port for ATS and path queries > as > > well as reregistration if the interface address changes and > > deregistration if the IPoIB interface is removed. (SDP doesn't need > to > > worry about these aspects (only path queries).) > > > > -- Hal > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general From rolandd at cisco.com Fri Sep 30 13:27:12 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 30 Sep 2005 13:27:12 -0700 Subject: [openib-general] CMA and device removal In-Reply-To: (Sean Hefty's message of "Thu, 29 Sep 2005 22:41:40 -0700") References: Message-ID: <5264sifgin.fsf@cisco.com> Sean> On creation, a cma_id is not bound to a particular device. Sean> It gets bound to a device as part of address resolution. Sean> The CMA tracks device references. When a device is removed, Sean> the CMA will generate an event on all associated cma_id's. Sean> Users must destroy cma_id's and associated device resources Sean> after receiving the event. I think this would require some modification of our current device removal handling. Currently we issue removal notifications in reverse order of client registration -- in other words, the client that registered last gets notified first. 
This means that we don't notify CMA of a device removal until after everyone using CMA has already supposedly cleaned up. - R. From rolandd at cisco.com Fri Sep 30 13:38:05 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 30 Sep 2005 13:38:05 -0700 Subject: [openib-general] Re: [PATCH] SRP: don't use TX IU after freeing it In-Reply-To: <20050930063003.GM29921@esmail.cup.hp.com> (Grant Grundler's message of "Thu, 29 Sep 2005 23:30:03 -0700") References: <52vf0kii49.fsf@cisco.com> <433C1821.6000809@mellanox.com> <52zmpvhll8.fsf@cisco.com> <20050930063003.GM29921@esmail.cup.hp.com> Message-ID: <521x36fg0i.fsf@cisco.com> Grant> Christoph is right. Even if the code works, it's risky to Grant> muck with the dma_addr_t contents. I'll try to look at this Grant> tomorrow and if I have a better idea, propose it. (I'm Grant> pessimistic that I'll have a better idea though) I don't think I buy this. The DMA mapping API is giving us a gather/scatter list where each entry is a dma address X and a length Y. The only manipulation we're doing is feeding this to the hardware as a list of chunks at addresses X, X + PAGE_SIZE, X + 2 * PAGE_SIZE, on up to whatever is required to cover the length Y. This is exactly what any bus master would do when performing DMA, so I don't see how it is incorrect. - R. From rolandd at cisco.com Fri Sep 30 13:46:50 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 30 Sep 2005 13:46:50 -0700 Subject: [openib-general] Re: [PATCH] core+mthca: questions and proposal: kill ib_(de)alloc_device In-Reply-To: <20050930080510.GA31930@mellanox.co.il> (Michael S. 
Tsirkin's message of "Fri, 30 Sep 2005 11:05:10 +0300") References: <20050930080510.GA31930@mellanox.co.il> Message-ID: <52wtkye11h.fsf@cisco.com> >>>>> "Michael" == Michael S Tsirkin writes: Michael> Guys, looking with Leonid Keller at device.c raised Michael> several questions with regard to what Michael> ib_alloc_device/ib_dealloc_device do: I agree this code is probably wrong and needs to be fixed up, but I don't think it's quite as simple as your patch unfortunately. See below. Michael> 1. Why is ib_device_register_sysfs called from Michael> ib_register_device, but ib_device_unregister_sysfs from Michael> ib_dealloc_device? It probably isn't the best design, but ib_device_unregister_sysfs() is what triggers the final free of the device structure. Michael> 2. Who is supposed to set reg_state back to Michael> IB_DEV_UNINITIALIZED? Without it ib_dealloc_device does Michael> not seem to free the device structure. Is this a memory Michael> leak? No, it's not a leak. ib_dealloc_device() doesn't actually perform the freeing unless the device has never been registered at all. If it has been registered and then unregistered, ib_device_unregister_sysfs() unregisters the class_device, which ends up calling ib_device_release() when all references are gone. Michael> 3. ib_alloc_device does not set reg_state, it seems to Michael> rely on the fact that IB_DEV_UNINITIALIZED = 0. Is that Michael> intentional? No, that should be made explicit. Michael> 4. For ib_alloc_device/ib_dealloc_device to work Michael> properly, it seems that the device structure must have Michael> ib_device as the first member. Is this limitation Michael> documented anywhere? Hmm, not explicitly that I can see. Michael> 5. Why do we need reg_state in the device, at all? I Michael> thought we can trust providers to call Michael> register/unregister in proper order? The only thing reg_state is really used for is to tell if a device has ever been registered. 
If it hasn't, then there's no sysfs stuff, so ib_dealloc_device() can just free it. This could be reworked so that struct ib_device has an embedded kref, and each sysfs registration just takes a reference on the ib_device. Michael> What do you say we simply let providers allocate the Michael> structure? Can you really go wrong reducing the line Michael> count by more than 50 :) ? Unfortunately I think so. The current code isn't safe and has a bunch of holes, but the goal is correct: keep some context around until the last sysfs reference has been released. - R. From rolandd at cisco.com Fri Sep 30 13:56:20 2005 From: rolandd at cisco.com (Roland Dreier) Date: Fri, 30 Sep 2005 13:56:20 -0700 Subject: [openib-general] Re: [PATCH] fix memory leak on device close In-Reply-To: <20050930082920.GA32179@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 30 Sep 2005 11:29:20 +0300") References: <20050930082920.GA32179@mellanox.co.il> Message-ID: <52mzlue0ln.fsf@cisco.com> Yes, good catch. Applied to svn and queued for 2.6.14. - R. From sean.hefty at intel.com Fri Sep 30 14:05:35 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Fri, 30 Sep 2005 14:05:35 -0700 Subject: [openib-general] CMA and device removal In-Reply-To: <5264sifgin.fsf@cisco.com> Message-ID: >I think this would require some modification of our current device >removal handling. Currently we issue removal notifications in reverse >order of client registration -- in other words, the client that >registered last gets notified first. This means that we don't notify >CMA of a device removal until after everyone using CMA has already >supposedly cleaned up. The idea with this is that a user of the CMA does not need to register for device addition/removal, and track devices themselves. 
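The ordering Roland describes can be illustrated with a small, self-contained model: clients are recorded in registration order and removal is delivered last-registered first, so a consumer that registered after CMA hears about the removal before CMA does. This is a hypothetical sketch of the ordering only; the names are invented for illustration and it is not the `ib_client` code in `device.c`.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_CLIENTS 8

/* Hypothetical, simplified client record; not the kernel's struct ib_client. */
struct sketch_client {
    const char *name;
    void (*remove_one)(void *device);   /* may be NULL in this sketch */
};

static const struct sketch_client *sketch_clients[SKETCH_MAX_CLIENTS];
static size_t sketch_num_clients;

/* Record a client in registration order; returns -1 if the table is full. */
int sketch_register_client(const struct sketch_client *client)
{
    assert(client != NULL);
    if (sketch_num_clients == SKETCH_MAX_CLIENTS)
        return -1;
    sketch_clients[sketch_num_clients++] = client;
    return 0;
}

/*
 * Tell every client that `device` is going away, last-registered first.
 * Client names are stored in `order` (up to `max` entries) so the caller
 * can inspect the notification order; returns the number of clients notified.
 */
size_t sketch_remove_device(void *device, const char **order, size_t max)
{
    size_t n = 0;

    for (size_t i = sketch_num_clients; i > 0; --i) {
        const struct sketch_client *c = sketch_clients[i - 1];

        if (c->remove_one)
            c->remove_one(device);
        if (n < max)
            order[n] = c->name;
        n++;
    }
    return n;
}
```

If CMA registers before its consumers, this LIFO order means the consumers are expected to have cleaned up before CMA's own removal callback runs, which is the situation Roland points out.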
What I have right now is something similar to this: rdma_create_id(); rdma_bind_addr(id, optional src addr, dst addr); rdma_resolve_route(id); /* optional - done by connect if not called */ rdma_connect(id); and on the passive side: rdma_create_id(); rdma_bind_addr(id, src addr); rdma_listen(id); Bind completes asynchronously. On the active side, bind associates an id with a local RDMA device (available through id->device). On the passive side, bind may or may not associate the listen id with a device, depending on whether a source address or only a port was specified. If the device associated with an id is removed, a device removal event is delivered to the user. - Sean From iod00d at hp.com Fri Sep 30 14:56:19 2005 From: iod00d at hp.com (Grant Grundler) Date: Fri, 30 Sep 2005 14:56:19 -0700 Subject: [openib-general] Re: [PATCH] SRP: don't use TX IU after freeing it In-Reply-To: <521x36fg0i.fsf@cisco.com> References: <52vf0kii49.fsf@cisco.com> <433C1821.6000809@mellanox.com> <52zmpvhll8.fsf@cisco.com> <20050930063003.GM29921@esmail.cup.hp.com> <521x36fg0i.fsf@cisco.com> Message-ID: <20050930215619.GF1770@esmail.cup.hp.com> On Fri, Sep 30, 2005 at 01:38:05PM -0700, Roland Dreier wrote: > Grant> Christoph is right. Even if the code works, it's risky to > Grant> muck with the dma_addr_t contents. I'll try to look at this > Grant> tomorrow and if I have a better idea, propose it. (I'm > Grant> pessimistic that I'll have a better idea though) > > I don't think I buy this. The DMA mapping API is giving us a > gather/scatter list where each entry is a dma address X and a length > Y. The only manipulation we're doing is feeding this to the hardware > as a list of chunks at addresses X, X + PAGE_SIZE, X + 2 * PAGE_SIZE, > on up to whatever is required to cover the length Y. This is exactly > what any bus master would do when performing DMA, so I don't see how > it is incorrect. Yeah, you are right. I'm being too paranoid and didn't think about how a bus master operates.
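Roland's description of the S/G handling can be sketched as follows: one mapped entry (DMA address X, length Y) is expanded into the chunk addresses X, X + PAGE_SIZE, X + 2 * PAGE_SIZE, and so on until Y bytes are covered. The sketch assumes 4 KiB pages and a page-aligned X (as hardware page lists generally require), and uses a hypothetical `sketch_dma_addr_t` in place of the kernel's `dma_addr_t`. It only does address arithmetic on values the mapping API already returned; it never alters the mapping itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the kernel's dma_addr_t. */
typedef uint64_t sketch_dma_addr_t;

/*
 * Expand one scatter/gather entry (dma address addr, length len) into the
 * page-sized chunk addresses addr, addr + PAGE_SIZE, addr + 2 * PAGE_SIZE, ...
 * needed to cover len bytes. Writes at most max_chunks addresses into out[]
 * and returns the number of chunks required (which may exceed max_chunks).
 */
size_t split_sg_entry(sketch_dma_addr_t addr, uint64_t len,
                      sketch_dma_addr_t *out, size_t max_chunks)
{
    size_t n = (size_t)((len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);

    assert(out != NULL || max_chunks == 0);
    for (size_t i = 0; i < n && i < max_chunks; ++i)
        out[i] = addr + (sketch_dma_addr_t)i * SKETCH_PAGE_SIZE;
    return n;
}
```

Returning the required count even when it exceeds `max_chunks` lets a caller detect that its page list is too small instead of silently truncating the transfer.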
thanks, grant From arlin.r.davis at intel.com Fri Sep 30 17:24:49 2005 From: arlin.r.davis at intel.com (Arlin Davis) Date: Fri, 30 Sep 2005 17:24:49 -0700 Subject: [openib-general] [PATCH] uDAPL cq channel support, sync with latest verbs Message-ID: James, Here is a patch to support CQ_WAIT_OBJECT with channels and sync with latest verbs. Tested with dapltest, dtest, netpipe, and Intel-MPI. -arlin Signed-off by: Arlin Davis Index: dapl/udapl/Makefile =================================================================== --- dapl/udapl/Makefile (revision 3629) +++ dapl/udapl/Makefile (working copy) @@ -134,7 +134,7 @@ ifeq ($(VERBS),openib) PROVIDER = $(TOPDIR)/../openib CFLAGS += -DOPENIB -#CFLAGS += -DCQ_WAIT_OBJECT uncomment when fixed +CFLAGS += -DCQ_WAIT_OBJECT CFLAGS += -I/usr/local/include/infiniband endif Index: dapl/openib/dapl_ib_util.c =================================================================== --- dapl/openib/dapl_ib_util.c (revision 3629) +++ dapl/openib/dapl_ib_util.c (working copy) @@ -208,8 +208,6 @@ { struct dlist *dev_list; long opts; - int i; - dapl_dbg_log (DAPL_DBG_TYPE_UTIL, " open_hca: %s - %p\n", hca_name, hca_ptr ); @@ -278,16 +276,18 @@ " open_hca: ERR with async FD\n" ); goto bail; } - for (i=0;iib_hca_handle->num_comp;i++) { /* uCQ */ - opts = fcntl(hca_ptr->ib_hca_handle->cq_fd[i], F_GETFL); - if (opts < 0 || fcntl(hca_ptr->ib_hca_handle->async_fd, - F_SETFL, opts | O_NONBLOCK) < 0) { - dapl_dbg_log(DAPL_DBG_TYPE_ERR, - " open_hca: ERR with CQ FD\n"); - goto bail; - } - } - + + /* EVD events without direct CQ channels, non-blocking */ + hca_ptr->ib_trans.ib_cq = + ibv_create_comp_channel(hca_ptr->ib_hca_handle); + opts = fcntl(hca_ptr->ib_trans.ib_cq->fd, F_GETFL); /* uCQ */ + if (opts < 0 || fcntl(hca_ptr->ib_trans.ib_cq->fd, + F_SETFL, opts | O_NONBLOCK) < 0) { + dapl_dbg_log (DAPL_DBG_TYPE_ERR, + " open_hca: ERR with CQ FD\n" ); + goto bail; + } + /* Get CM device handle for events, and set to non-blocking */ 
 	hca_ptr->ib_trans.ib_cm = ib_cm_get_device(hca_ptr->ib_hca_handle);
 	opts = fcntl(hca_ptr->ib_trans.ib_cm->fd, F_GETFL); /* uCM */
@@ -320,6 +320,7 @@
 		((struct sockaddr_in *)&hca_ptr->hca_address)->sin_addr.s_addr >> 24 & 0xff,
 		hca_ptr->ib_trans.max_inline_send );
 
+	hca_ptr->ib_trans.d_hca = hca_ptr;
 	return DAT_SUCCESS;
 
 bail:
@@ -704,7 +705,6 @@
 	}
 }
 
-
 /* work thread for uAT, uCM, CQ, and async events */
 void dapli_thread(void *arg)
 {
@@ -741,7 +741,6 @@
 			hca = NULL;
 
 		while(hca) {
-			int i;
 			ufds[++idx].fd = hca->ib_cm->fd; /* uCM */
 			ufds[idx].events = POLLIN;
 			ufds[idx].revents = 0;
@@ -750,15 +749,17 @@
 			ufds[idx].events = POLLIN;
 			ufds[idx].revents = 0;
 			uhca[idx] = hca;
-			for (i=0;i<hca->ib_ctx->num_comp;i++) { /* uCQ */
-				ufds[++idx].fd = hca->ib_ctx->cq_fd[i];
+
+			if (hca->ib_cq != NULL) {
+				ufds[++idx].fd = hca->ib_cq->fd; /* uCQ */
 				ufds[idx].events = POLLIN;
 				ufds[idx].revents = 0;
 				uhca[idx] = hca;
 			}
+
 			hca = dapl_llist_next_entry(
-				&g_hca_list,
-				(DAPL_LLIST_ENTRY*)&hca->entry);
+					&g_hca_list,
+					(DAPL_LLIST_ENTRY*)&hca->entry);
 		}
 
 		/* unlock, and setup poll */
@@ -810,6 +811,5 @@
 	dapl_dbg_log(DAPL_DBG_TYPE_UTIL," ib_thread(%d) EXIT\n",getpid());
 	g_ib_destroy = 2;
 	dapl_os_unlock(&g_hca_lock);
-	pthread_exit(NULL);
 }
Index: dapl/openib/dapl_ib_cm.c
===================================================================
--- dapl/openib/dapl_ib_cm.c	(revision 3629)
+++ dapl/openib/dapl_ib_cm.c	(working copy)
@@ -291,7 +291,6 @@
 	}
 
 	/* move QP state to RTR and RTS */
-	/* TODO: could use a ib_cm_init_qp_attr() call here */
 	dapl_dbg_log(DAPL_DBG_TYPE_CM,
 		" rep_recv: RTR_RTS: id %d rqp %x rlid %x rSID %d\n",
 		conn->cm_id,event->param.rep_rcvd.remote_qpn,
@@ -621,8 +620,8 @@
 	dapl_dbg_log(DAPL_DBG_TYPE_CM,
 		" connect: at_route requested(ret=%d,id=%d): SRC %x DST %x\n",
 		status, conn->dapl_comp.req_id,
-		((struct sockaddr_in *)&conn->hca->hca_address)->sin_addr.s_addr,
-		((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr);
+		ntohl(((struct sockaddr_in *)&conn->hca->hca_address)->sin_addr.s_addr),
+		ntohl(((struct sockaddr_in *)&conn->r_addr)->sin_addr.s_addr));
 
 	if (status < 0) {
 		dat_status = dapl_convert_errno(errno,"ib_at_route_by_ip");
Index: dapl/openib/dapl_ib_qp.c
===================================================================
--- dapl/openib/dapl_ib_qp.c	(revision 3629)
+++ dapl/openib/dapl_ib_qp.c	(working copy)
@@ -82,13 +82,21 @@
 	 * Create a CQ with zero entries under the covers to support and
 	 * catch any invalid posting.
 	 */
-	if ( rcv_evd != DAT_HANDLE_NULL )
+	if (rcv_evd != DAT_HANDLE_NULL)
 		rcv_cq = rcv_evd->ib_cq_handle;
 	else if (!ia_ptr->hca_ptr->ib_trans.ib_cq_empty)
 		rcv_cq = ia_ptr->hca_ptr->ib_trans.ib_cq_empty;
 	else {
-		rcv_cq = ibv_create_cq(ia_ptr->hca_ptr->ib_hca_handle,
-			0, NULL);
+		struct ibv_comp_channel *channel =
+			ia_ptr->hca_ptr->ib_trans.ib_cq;
+#ifdef CQ_WAIT_OBJECT
+		if (rcv_evd->cq_wait_obj_handle)
+			channel = rcv_evd->cq_wait_obj_handle;
+#endif
+		/* Call IB verbs to create CQ */
+		rcv_cq = ibv_create_cq(ia_ptr->hca_ptr->ib_hca_handle,
+			0, NULL, channel, 0);
+
 		if (rcv_cq == IB_INVALID_HANDLE)
 			return(dapl_convert_errno(ENOMEM, "create_cq"));
Index: dapl/openib/dapl_ib_util.h
===================================================================
--- dapl/openib/dapl_ib_util.h	(revision 3629)
+++ dapl/openib/dapl_ib_util.h	(working copy)
@@ -159,7 +159,7 @@
 typedef uint32_t ib_comp_handle_t;
 
 #ifdef CQ_WAIT_OBJECT
-typedef struct dapl_evd *ib_wait_obj_handle_t;
+typedef struct ibv_comp_channel *ib_wait_obj_handle_t;
 #endif
 
 /* Definitions */
@@ -233,9 +233,11 @@
 {
 	struct ib_llist_entry entry;
 	int destroy;
+	struct dapl_hca *d_hca;
 	struct ibv_device *ib_dev;
 	struct ibv_context *ib_ctx;
 	struct ib_cm_device *ib_cm;
+	struct ibv_comp_channel *ib_cq;
 	ib_cq_handle_t ib_cq_empty;
 	DAPL_OS_WAIT_OBJECT wait_object;
 	int max_inline_send;
Index: dapl/openib/dapl_ib_cq.c
===================================================================
--- dapl/openib/dapl_ib_cq.c	(revision 3629)
+++ dapl/openib/dapl_ib_cq.c	(working copy)
@@ -53,37 +53,36 @@
 #include "dapl_ring_buffer_util.h"
 #include 
 
+/* One CQ event channel per HCA */
 void dapli_cq_event_cb(struct _ib_hca_transport *hca)
 {
-	int i;
-	dapl_dbg_log(DAPL_DBG_TYPE_UTIL," dapli_cq_event_cb(%p)\n", hca);
-	/* check all comp events on this device */
-	for(i=0;i<hca->ib_ctx->num_comp;i++) {
-		struct dapl_evd *evd_ptr = NULL;
-		struct ibv_cq *ibv_cq = NULL;
-		struct pollfd cq_fd = {
-			.fd = hca->ib_ctx->cq_fd[i],
-			.events = POLLIN,
-			.revents = 0
-		};
-		if ((poll(&cq_fd, 1, 0) == 1) &&
-			(!ibv_get_cq_event(hca->ib_ctx, i,
-				&ibv_cq, (void*)&evd_ptr))) {
-
-			if (DAPL_BAD_HANDLE(evd_ptr, DAPL_MAGIC_EVD)) {
-				ibv_ack_cq_events(ibv_cq, 1);
-				continue;
-			}
+	struct dapl_evd *evd_ptr = NULL;
+	struct ibv_cq *ibv_cq = NULL;
+	struct pollfd cq_fd = {
+		.fd = hca->ib_cq->fd,
+		.events = POLLIN,
+		.revents = 0
+	};
+
+	dapl_dbg_log(DAPL_DBG_TYPE_UTIL," dapli_cq_event_cb(%p)\n", hca);
 
-			/* process DTO event via callback */
-			dapl_evd_dto_callback ( hca->ib_ctx,
-				evd_ptr->ib_cq_handle,
-				(void*)evd_ptr );
+	if ((poll(&cq_fd, 1, 0) == 1) &&
+		(!ibv_get_cq_event(hca->ib_cq,
+			&ibv_cq, (void*)&evd_ptr))) {
 
+		if (DAPL_BAD_HANDLE(evd_ptr, DAPL_MAGIC_EVD)) {
 			ibv_ack_cq_events(ibv_cq, 1);
-		}
-	}
+			return;
+		}
+
+		/* process DTO event via callback */
+		dapl_evd_dto_callback ( hca->ib_ctx,
+			evd_ptr->ib_cq_handle,
+			(void*)evd_ptr );
+
+		ibv_ack_cq_events(ibv_cq, 1);
+	}
 }
 
 /*
@@ -241,16 +240,24 @@
 	dapl_dbg_log ( DAPL_DBG_TYPE_UTIL,
 		"dapls_ib_cq_alloc: evd %p cqlen=%d \n", evd_ptr, *cqlen );
 
+	struct ibv_comp_channel *channel = ia_ptr->hca_ptr->ib_trans.ib_cq;
+
+#ifdef CQ_WAIT_OBJECT
+	if (evd_ptr->cq_wait_obj_handle)
+		channel = evd_ptr->cq_wait_obj_handle;
+#endif
+
 	/* Call IB verbs to create CQ */
 	evd_ptr->ib_cq_handle = ibv_create_cq(ia_ptr->hca_ptr->ib_hca_handle,
 		*cqlen,
-		evd_ptr);
+		evd_ptr,
+		channel, 0);
 
 	if (evd_ptr->ib_cq_handle == IB_INVALID_HANDLE)
 		return DAT_INSUFFICIENT_RESOURCES;
 
 	/* arm cq for events */
-	dapls_set_cq_notify (ia_ptr, evd_ptr);
+	dapls_set_cq_notify(ia_ptr, evd_ptr);
 
 	/* update with returned cq entry size */
 	*cqlen = evd_ptr->ib_cq_handle->cqe;
@@ -288,16 +295,21 @@
 	IN DAT_COUNT *cqlen )
 {
 	ib_cq_handle_t new_cq;
+	struct ibv_comp_channel *channel = ia_ptr->hca_ptr->ib_trans.ib_cq;
 
 	/* IB verbs doe not support resize. Try to re-create CQ
 	 * with new size. Can only be done if QP is not attached.
 	 * destroy EBUSY == QP still attached.
 	 */
 
-	/* create a new size before destroying original */
-	new_cq = ibv_create_cq( ia_ptr->hca_ptr->ib_hca_handle,
-		*cqlen,
-		evd_ptr);
+#ifdef CQ_WAIT_OBJECT
+	if (evd_ptr->cq_wait_obj_handle)
+		channel = evd_ptr->cq_wait_obj_handle;
+#endif
+
+	/* Call IB verbs to create CQ */
+	new_cq = ibv_create_cq(ia_ptr->hca_ptr->ib_hca_handle, *cqlen,
+		evd_ptr, channel, 0);
 
 	if (new_cq == IB_INVALID_HANDLE)
 		return DAT_INSUFFICIENT_RESOURCES;
@@ -444,12 +456,13 @@
 	IN ib_wait_obj_handle_t *p_cq_wait_obj_handle )
 {
 	dapl_dbg_log ( DAPL_DBG_TYPE_CM,
-		" cq_object_create: (%p)=%p\n",
-		p_cq_wait_obj_handle, evd_ptr );
+		" cq_object_create: (%p,%p)\n",
+		evd_ptr, p_cq_wait_obj_handle );
 
 	/* set cq_wait object to evd_ptr */
-	*p_cq_wait_obj_handle = evd_ptr;
-
+	*p_cq_wait_obj_handle =
+		ibv_create_comp_channel(evd_ptr->header.owner_ia->hca_ptr->ib_hca_handle);
+
 	return DAT_SUCCESS;
 }
@@ -460,6 +473,9 @@
 	dapl_dbg_log ( DAPL_DBG_TYPE_UTIL,
 		" cq_object_destroy: wait_obj=%p\n",
 		p_cq_wait_obj_handle );
+
+	ibv_destroy_comp_channel(p_cq_wait_obj_handle);
+
 	return DAT_SUCCESS;
 }
@@ -470,6 +486,8 @@
 	dapl_dbg_log ( DAPL_DBG_TYPE_UTIL,
 		" cq_object_wakeup: wait_obj=%p\n",
 		p_cq_wait_obj_handle );
+
+	/* no wake up mechanism */
 	return DAT_SUCCESS;
 }
@@ -478,88 +496,42 @@
 	IN ib_wait_obj_handle_t p_cq_wait_obj_handle,
 	IN u_int32_t timeout)
 {
-	DAPL_EVD *evd_ptr = p_cq_wait_obj_handle;
-	ib_cq_handle_t cq = evd_ptr->ib_cq_handle;
-	struct ibv_cq *ibv_cq = NULL;
-	void *ibv_ctx = NULL;
-	int status = 0;
-
-	dapl_dbg_log ( DAPL_DBG_TYPE_CM,
-		" cq_object_wait: dev %p evd %p cq %p, time %d\n",
-		cq->context, evd_ptr, cq, timeout );
-
-	/* Multiple EVD's sharing one event handle for now until uverbs supports more */
-
-	/*
-	 * This makes it very inefficient and tricky to manage multiple CQ per device open
-	 * For example: 4 threads waiting on separate CQ events will all be woke when
-	 * a CQ event fires. So the poll wakes up and the first thread to get to the
-	 * the get_cq_event wins and the other 3 will block. The dapl_evd_wait code
-	 * above will immediately do a poll_cq after returning from CQ wait and if
-	 * nothing on the queue will call this wait again and go back to sleep. So
-	 * as long as they all wake up, a mutex is held around the get_cq_event
-	 * so no blocking occurs and they all return then everything should work.
-	 * Of course, the timeout needs adjusted on the threads that go back to sleep.
-	 */
-	while (cq) {
-		struct pollfd cq_poll = {
-			.fd = cq->context->cq_fd[0],
+	struct dapl_evd *evd_ptr;
+	struct ibv_cq *ibv_cq = NULL;
+	void *ibv_ctx = NULL;
+	int status = 0;
+	int timeout_ms = -1;
+	struct pollfd cq_fd = {
+		.fd = p_cq_wait_obj_handle->fd,
 		.events = POLLIN,
 		.revents = 0
 	};
-		int timeout_ms = -1;
-		if (timeout != DAT_TIMEOUT_INFINITE)
-			timeout_ms = timeout/1000;
-
-		/* check if another thread processed the event already, pending queue > 0 */
-		dapl_os_lock( &evd_ptr->header.owner_ia->hca_ptr->ib_trans.cq_lock );
-		if (dapls_rbuf_count(&evd_ptr->pending_event_queue)) {
-			dapl_os_unlock( &evd_ptr->header.owner_ia->hca_ptr->ib_trans.cq_lock );
-			break;
+	dapl_dbg_log ( DAPL_DBG_TYPE_CM,
+		" cq_object_wait: CQ channel %p time %d\n",
+		p_cq_wait_obj_handle, timeout );
+
+	/* uDAPL timeout values in usecs */
+	if (timeout != DAT_TIMEOUT_INFINITE)
+		timeout_ms = timeout/1000;
+
+	status = poll(&cq_fd, 1, timeout_ms);
+
+	/* returned event */
+	if (status > 0) {
+		if (!ibv_get_cq_event(p_cq_wait_obj_handle,
+			&ibv_cq, (void*)&evd_ptr)) {
+			ibv_ack_cq_events(ibv_cq, 1);
 		}
-		dapl_os_unlock( &evd_ptr->header.owner_ia->hca_ptr->ib_trans.cq_lock );
+		status = 0;
-		dapl_dbg_log ( DAPL_DBG_TYPE_CM," cq_object_wait: polling\n");
-		status = poll(&cq_poll, 1, timeout_ms);
-		dapl_dbg_log ( DAPL_DBG_TYPE_CM," cq_object_wait: poll returned status=%d\n",status);
-
-		/*
-		 * If poll with timeout wakes then hold mutex around a poll with no timeout
-		 * so subsequent get_cq_events will be guaranteed not to block
-		 * If the event does not belong to this EVD then put it on proper EVD pending
-		 * queue under the mutex.
-		 */
-		if (status == 1) {
-			dapl_os_lock( &evd_ptr->header.owner_ia->hca_ptr->ib_trans.cq_lock );
-			status = poll(&cq_poll, 1, 0);
-			if (status == 1) {
-				status = ibv_get_cq_event(cq->context,
-					0, &ibv_cq, &ibv_ctx);
-
-				/* if event is not ours, put on proper evd pending queue */
-				/* force another wakeup */
-				if ((ibv_ctx != evd_ptr ) &&
-					(!DAPL_BAD_HANDLE(ibv_ctx, DAPL_MAGIC_EVD))) {
-					dapl_dbg_log (DAPL_DBG_TYPE_CM,
-						" cq_object_wait: ibv_ctx %p != evd %p\n",
-						ibv_ctx, evd_ptr);
-					dapls_evd_copy_cq((struct evd_ptr*)ibv_ctx);
-					dapl_os_unlock(&evd_ptr->header.owner_ia->hca_ptr->ib_trans.cq_lock );
-					continue;
-				}
-			}
-			dapl_os_unlock( &evd_ptr->header.owner_ia->hca_ptr->ib_trans.cq_lock );
-			break;
-
-		} else if (status == 0) {
-			status = ETIMEDOUT;
-			break;
-		}
-	}
+	/* timeout */
+	} else if (status == 0)
+		status = ETIMEDOUT;
+
 	dapl_dbg_log (DAPL_DBG_TYPE_CM,
-		" cq_object_wait: RET evd %p cq %p ibv_cq %p ibv_ctx %p %s\n",
-		evd_ptr, cq,ibv_cq,ibv_ctx,strerror(errno));
+		" cq_object_wait: RET evd %p ibv_cq %p ibv_ctx %p %s\n",
+		evd_ptr, ibv_cq,ibv_ctx,strerror(errno));
 	return(dapl_convert_errno(status,"cq_wait_object_wait"));
-------------- next part --------------
A non-text attachment was scrubbed...
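Stripped of the verbs calls, the CQ channel handling in this patch rests on two plain POSIX idioms. The open_hca code puts the channel descriptor into non-blocking mode with fcntl() (the removed loop read the flags from cq_fd[i] but wrote them to async_fd; the new code reads and writes the same descriptor), and the cq_object_wait code converts uDAPL's microsecond timeout to milliseconds for poll() and maps a zero return to ETIMEDOUT. The following is a minimal sketch of just those two steps, assuming an ordinary pipe as a stand-in for the ibv_comp_channel descriptor. set_nonblocking(), wait_readable_us() and WAIT_INFINITE are hypothetical names, not uDAPL or libibverbs API; in the real code, once the descriptor is readable the event is fetched with ibv_get_cq_event() and acknowledged with ibv_ack_cq_events() rather than read directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdint.h>

/* Hypothetical stand-in for DAT_TIMEOUT_INFINITE; the real value comes
 * from the DAT headers. */
#define WAIT_INFINITE UINT32_MAX

/* Switch fd to non-blocking mode: read its flags with F_GETFL and write
 * them back, plus O_NONBLOCK, on the SAME descriptor.
 * Returns 0 on success, -1 on failure (errno set by fcntl). */
static int set_nonblocking(int fd)
{
	int opts = fcntl(fd, F_GETFL);

	if (opts < 0)
		return -1;
	return fcntl(fd, F_SETFL, opts | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Wait until fd is readable. timeout_us is in microseconds, as uDAPL
 * passes it; poll() takes milliseconds, with -1 meaning "block forever".
 * Integer division truncates, so a timeout below 1000 us becomes a
 * zero-millisecond (non-blocking) check, as in the patch.
 * Returns 0 when poll() reports the fd ready, ETIMEDOUT on timeout,
 * or the errno of a failed poll(). */
static int wait_readable_us(int fd, uint32_t timeout_us)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
	int timeout_ms = -1;
	int status;

	if (timeout_us != WAIT_INFINITE)
		timeout_ms = (int)(timeout_us / 1000);

	status = poll(&pfd, 1, timeout_ms);
	if (status > 0)
		return 0;
	if (status == 0)
		return ETIMEDOUT;
	return errno;
}
```

With the descriptor non-blocking, a read attempted when no event is pending fails with EAGAIN instead of stalling the caller, which is a cheap safety net for a descriptor that a service thread multiplexes with poll().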
Name: cq_channel.patch
Type: application/octet-stream
Size: 14204 bytes
Desc: not available
URL:

From info at xcdfvg.com Fri Sep 30 19:02:09 2005
From: info at xcdfvg.com (info at xcdfvg.com)
Date: 1 Oct 2005 11:02:09 +0900
Subject: [openib-general] ★Important★
Message-ID: <20051001020209.15697.qmail@mail.xcdfvg.com>

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
In answer to your enthusiastic requests, our free points are finally going way UP!

The sweltering days go on; aren't you worn out?
"I want to be soothed."
"I haven't had a naughty encounter yet this summer."
"I have no money."
Are things like these troubling you?
Right now, because we are running a large volume of ads in women's magazines, we can give strong support to encounters that satisfy your desires!
※"Sakura" (shill accounts)
"[...] we provide a [...] place.
※Everything is kept open, so please relax and enjoy.
Anyone who registers right now gets 10,000 yen worth of points free to start with♪
Register here → http://www.00-love5.com/?0yen
★*・*★*・*★*・*★*・*★*・*★*・*★*・*★
○You can have plenty of fun on the free points alone, so please give it a try♪
○Most people GET a woman while still on their free points!!
○Only if you try it and think "This is it!" should you move on to the paid service!
Try it! → http://www.00-love5.com/?0yen
－・－・－・－・－・－・－・－・－・－・
If you are not looking for a wonderful encounter, we apologize.
$B$*