From mst at mellanox.co.il Tue Nov 1 01:09:02 2005
From: mst at mellanox.co.il (Michael S. Tsirkin)
Date: Tue, 1 Nov 2005 11:09:02 +0200
Subject: [openib-general] Re: 2.6.14 patches
In-Reply-To:
References:
Message-ID: <20051101090902.GG31134@mellanox.co.il>

Quoting r. Sean Hefty :
> Subject: RE: 2.6.14 patches
>
> >Sean, Hal, now that 2.6.14 is out, do you plan to apply
> >the patches in
> https://openib.org/svn/gen2/trunk/src/linux-kernel/patches/?
> >Once you do, I'll put reverted patches in the backport directory.
>
> I'll apply the patch to addr.c shortly. Thanks for the reminder.

OK, I have removed linux-2.6.14-rc3-addr.diff since trunk does not
need it anymore.

Hal, could you please apply the patch linux-2.6.14-rc3-at.diff to at.c?

Thanks,

--
MST

From mirko.benz at xiranet.com Tue Nov 1 01:45:17 2005
From: mirko.benz at xiranet.com (Mirko Benz)
Date: Tue, 01 Nov 2005 10:45:17 +0100
Subject: [openib-general] [PATCH/RFC] IB: Add SCSI RDMA Protocol (SRP) initiator
Message-ID: <4367392D.7080804@xiranet.com>

Hello,

We (Xiranet) are developing SRP targets / routers, too.
We are testing against:
- OpenIB Linux SRP initiator
- OpenIB Windows SRP initiator
- Mellanox Linux SRP initiator

The OpenIB Linux initiator already works well with our system.
We would like to see it in mainline.

Some suggestions (not necessarily kernel related):
- More status information: when parameter parsing for configuration fails, no message is printed (e.g. when a parameter name is misspelled, like ioc_gguid instead of ioc_guid, when following your email).
- The OpenIB SRP Wiki page should be updated.
- Maybe auto discovery/connection of targets should be integrated as an option (as the Windows and Mellanox SRP initiators do).
- We had problems with the configuration tool (dmcli) on 64 bit Linux.
- An explanation for the pkey and service_id parameters should be added.
- backport for Linux enterprise distributions with prebuild RPMs for drivers, config and documentation would be helpful Regards, Mirko From liran at mellanox.co.il Tue Nov 1 03:14:43 2005 From: liran at mellanox.co.il (Liran Sorani) Date: Tue, 1 Nov 2005 13:14:43 +0200 Subject: [openib-general] [PATCH] Osmtest - update command options + vapi fix Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E35AC09B@mtlexch01.mtl.com> Hi , Hal . We've decided to keep and maintain Osmtest in the main trunk , since it is not only a test but a tool to validate SA/SM. The following is a small patch for the follwoing : 1. Support old form of running osmtest , i.e instead of -g= , use -g and add '-p' option to display current available port guids. 2. Support Vapi stack. 3. Update Service flow (Update one of the service lease checks from 1 sec to 4 sec). 4. Ident switch-case) issues in main.c Thanks , Liran . Signed-off-by: Liran Sorani Index: osmt_mtl_regular_qp.c =================================================================== --- osmt_mtl_regular_qp.c (revision 3928) +++ osmt_mtl_regular_qp.c (working copy) @@ -73,7 +73,7 @@ #include #include #include - +#include /* * Initialize the QP etc. * Given in res: port_num, max_outs_sq, max_outs_rq Index: osmt_service.c =================================================================== --- osmt_service.c (revision 3928) +++ osmt_service.c (working copy) @@ -1266,7 +1266,7 @@ p_osmt, cl_ntoh64(id[1]), /* IN ib_net64_t service_id, */ IB_DEFAULT_PKEY,/* IN ib_net16_t service_pkey, */ - cl_hton32(0x00000001), /* IN ib_net32_t service_lease, */ + cl_hton32(0x00000004), /* IN ib_net32_t service_lease, */ 11, /* IN uint8_t service_key_lsb, */ (char*)service_name[1] /* IN char *service_name */ ); Index: main.c =================================================================== --- main.c (revision 3928) +++ main.c (working copy) @@ -128,9 +128,11 @@ "--guid \n" " This option specifies the local port GUID value\n" " with which osmtest should bind. 
osmtest may be\n" - " bound to 1 port at a time.\n" - " Without -g, osmtest displays a menu of possible\n" - " port GUIDs and waits for user input.\n\n" ); + " bound to 1 port at a time.\n\n"); + printf( "-p \n" + "--port\n" + " This option display menu of possible local port GUID values\n" + " with which osmtest could bind.\n\n"); printf( "-h\n" "--help\n" " Display this usage info then exit.\n\n" ); printf( "-i \n" @@ -160,9 +162,9 @@ " --- -----------------\n" " -M1 - Short Multicast Flow (default) - single mode.\n" " -M2 - Short Multicast Flow - multiple mode.\n" - " -M3 - Long Multicast Flow - single mode.\n" - " -M4 - Long Multicast Flow - mutiple mode.\n" - " Single mode - Osmtest is tested alone, with no other\n" + " -M3 - Long MultiCast Flow - single mode.\n" + " -M4 - Long MultiCast Flow - mutiple mode.\n" + " Single mode - Osmtest is tested alone , with no other \n" " apps that interact vs. OpenSM MC.\n" " Multiple mode - Could be run with other apps using MC vs.\n" " OpenSM." @@ -305,7 +307,7 @@ char flow_name[64]; boolean_t mem_track = FALSE; uint32_t next_option; - const char *const short_option = "f:l:m:M:d:g::s:t:i:cvVh"; + const char *const short_option = "f:l:m:M:d:g:s:t:i:pcvVh"; /* * In the array below, the 2nd parameter specified the number @@ -322,9 +324,10 @@ {"inventory", 1, NULL, 'i'}, {"max_lid", 1, NULL, 'm'}, {"guid", 2, NULL, 'g'}, + {"port", 0, NULL, 'p'}, {"help", 0, NULL, 'h'}, {"stress", 1, NULL, 's'}, - {"Multicast_Mode", 1, NULL, 'M'}, + {"MultiCast_Mode", 1, NULL, 'M'}, {"timeout", 1, NULL, 't'}, {"verbose", 0, NULL, 'v'}, {"log_file", 1, NULL, 'l'}, @@ -363,7 +366,6 @@ { next_option = getopt_long_only( argc, argv, short_option, long_option, NULL ); - switch ( next_option ) { case 'c': @@ -446,28 +448,30 @@ break; case 'g': - /* - Specifies port guid with which to bind. 
- */ - if (optarg) { - guid = cl_hton64( strtoull( optarg, NULL, 16 )); - printf(" Guid <0x%"PRIx64">\n", cl_hton64( guid )); - } else - guid = INVALID_GUID; - break; - + /* + * Specifies port guid with which to bind. + */ + guid = cl_hton64( strtoull( optarg, NULL, 16 )); + printf(" Guid <0x%"PRIx64">\n", cl_hton64( guid )); + break; + case 'p': + /* + * Display current port guids + */ + guid = INVALID_GUID; + break; case 't': - /* + /* * Specifies transaction timeout. - */ - opt.transaction_timeout = strtol( optarg, NULL, 0 ); - printf( "\tTransaction timeout = %d\n", opt.transaction_timeout ); - break; + */ + opt.transaction_timeout = strtol( optarg, NULL, 0 ); + printf( "\tTransaction timeout = %d\n", opt.transaction_timeout ); + break; case 'l': - opt.log_file = optarg; - printf("\tLog File:%s\n", opt.log_file ); - break; + opt.log_file = optarg; + printf("\tLog File:%s\n", opt.log_file ); + break; case 'v': /* @@ -510,32 +514,32 @@ } break; - case 'M': - /* - * Perform stress test. - */ - opt.mmode = strtol( optarg, NULL, 0 ); - printf( "\tMulticast test enabled: " ); - switch ( opt.mmode ) - { - case 1: - printf( "Short MC Flow - single mode (default)\n" ); - break; - case 2: - printf( "Short MC Flow - mutiple mode\n" ); - break; - case 3: - printf( "Long MC Flow - single mode\n" ); - break; - case 4: - printf( "Long MC Flow - mutiple mode\n" ); - break; - default: - printf( "Unknown value %u (ignored)\n", opt.stress ); - opt.mmode = 0; - break; - } - break; + case 'M': + /* + * Perform stress test. 
+ */ + opt.mmode = strtol( optarg, NULL, 0 ); + printf( "\tMultiCast test enabled: " ); + switch ( opt.mmode ) + { + case 1: + printf( "Short MC Flow - single mode (default)\n" ); + break; + case 2: + printf( "Short MC Flow - mutiple mode\n" ); + break; + case 3: + printf( "Long MC Flow - single mode\n" ); + break; + case 4: + printf( "Long MC Flow - mutiple mode\n" ); + break; + default: + printf( "Unknown value %u (ignored)\n", opt.stress ); + opt.mmode = 0; + break; + } + break; case 'd': /* Index: Makefile.am =================================================================== --- Makefile.am (revision 3928) +++ Makefile.am (working copy) @@ -13,9 +13,11 @@ bin_PROGRAMS = osmtest osmtest_SOURCES = main.c osmtest.c osmt_service.c osmt_slvl_vl_arb.c \ osmt_multicast.c osmt_inform.c - +if OSMV_VAPI +osmtest_SOURCES = osmt_mtl_regular_qp.c +endif osmtest_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT $(DBGFLAGS) -osmtest_LDADD = -L../complib -L../libvendor -L../opensm -L$(libdir) \ +osmtest_LDADD = -L../complib -L../libvendor -L../opensm -L$(libdir) -L. \ $(OSMV_LDADD) -lopensm -losmcomp -losmvendor osmtest_LDFLAGS = -Wl,--rpath -Wl,$(libdir) -lpthread -L../opensm > Liran Sorani > Mellanox Technologies LTD. > mailto:liran at mellanox.co.il > Phone: +972(4)9097200 Ext: 214 > Israel, Yokneam P.O.B 586 ZIP 20692 > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:
-----Forwarded Message----- From: Hal Rosenstock To: Itamar Rabenstein Cc: openib-general at openib.org, Eitan Zahavi Subject: Re: opensm problem ??? Date: 31 Oct 2005 16:49:58 -0500 Hi Itamar, On Wed, 2005-10-26 at 11:25, Itamar Rabenstein wrote: > Hi All, > I am running openib gen2 svn rev 3872 (kernel + user). > my system is EM64T (x86_64) + SUSE9.3 + k2.6.13.4 I've run Opterons with 2.6.13 and not quite as recent svn 3850. I'm in the process of updating to the latest now that I'm back. Do you still have this problem ? > I have arbel in memfree mode (fw 5.1.132) . I don't have a memfree HCA (arbel or otherwise). It also appears you are using more recent firmware than is generally available. Are you sure it's unrelated to that ? > my 2 ports are connected in loopback. Loopback configuration works in general. > I am running opensm but the links are not getting into ACTIVE. > in the osm.log i see > > Oct 26 16:59:25 366150 [43005960] -> __osm_vl15_poller: 1 QP0 MADs on > wire, 1 outstanding, 0 unicasts sent, 1 total sent.
> > Oct 26 16:59:33 937993 [44007960] -> umad_receiver: ERR 5404: recv > error on MAD sized umad (Interrupted system call) It looks to me like the code in osm_vendor_ibumad.c::umad_receiver() should handle this (just indicates this occured) and reissue the umad_recv. It appears that the GetResp for NodeInfo is never received yet this transaction doesn't timeout either which would have been what I expected. -- Hal > > Does it works for others ? > > Itamar From halr at voltaire.com Tue Nov 1 05:13:03 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Nov 2005 08:13:03 -0500 Subject: [openib-general] Re: [PATCH] Osmtest - update command options + vapi fix Message-ID: <1130850782.15904.6103.camel@hal.voltaire.com> Hi Liran, On Tue, 2005-11-01 at 06:14, Liran Sorani wrote: > Hi , Hal . > We've decided to keep and maintain Osmtest in the main trunk , since > it is not only a test but a tool to validate SA/SM. I'm not sure I see the difference. It is a test tool which validates SA/SM. Other tools validate other components. Anyhow, I will apply the patch. > The following is a small patch for the follwoing : > 1. Support old form of running osmtest , i.e instead of -g= guid> , use -g and add '-p' option to display current > available port guids. > > 2. Support Vapi stack. > 3. Update Service flow (Update one of the service lease checks from 1 > sec to 4 sec). > 4. Ident switch-case) issues in main.c > > Thanks , Liran . The patch for main.c appears to be line wrapped: patching file main.c patch: **** malformed patch at line 34: value\n" I only need that part of the patch. > Signed-off-by: Liran Sorani [snip...] 
> Index: Makefile.am > =================================================================== > --- Makefile.am (revision 3928) > +++ Makefile.am (working copy) > @@ -13,9 +13,11 @@ > bin_PROGRAMS = osmtest > osmtest_SOURCES = main.c osmtest.c osmt_service.c osmt_slvl_vl_arb.c > \ > osmt_multicast.c osmt_inform.c > - > +if OSMV_VAPI > +osmtest_SOURCES = osmt_mtl_regular_qp.c Shouldn't this be: osmtest_SOURCES += osmt_mtl_regular_qp.c -- Hal > +endif > osmtest_CFLAGS = -Wall $(OSMV_CFLAGS) -DVENDOR_RMPP_SUPPORT > $(DBGFLAGS) > -osmtest_LDADD = -L../complib -L../libvendor -L../opensm -L$(libdir) \ > +osmtest_LDADD = -L../complib -L../libvendor -L../opensm -L$(libdir) > -L. \ > $(OSMV_LDADD) -lopensm -losmcomp -losmvendor > > osmtest_LDFLAGS = -Wl,--rpath -Wl,$(libdir) -lpthread -L../opensm > > > > > Liran Sorani > > Mellanox Technologies LTD. > > mailto:liran at mellanox.co.il > > Phone: +972(4)9097200 Ext: 214 > > Israel, Yokneam P.O.B 586 ZIP 20692 > > > > > From bardov at gmail.com Tue Nov 1 05:17:21 2005 From: bardov at gmail.com (Dan Bar Dov) Date: Tue, 1 Nov 2005 15:17:21 +0200 Subject: [openib-general] ppc64 compilation failure In-Reply-To: <20051031190340.GE6246@us.ibm.com> References: <20051031184924.GD6246@us.ibm.com> <20051031190340.GE6246@us.ibm.com> Message-ID: I fixed the iser compile warning r3929. 
Dan On 10/31/05, Nishanth Aravamudan wrote: > On 31.10.2005 [10:49:24 -0800], Nishanth Aravamudan wrote: > > Hi Roland, > > > > Looks like ppc64 build with 2.6.14-git3 and svn 3918 is busted: > > Only the ppc64 build had finished when I sent this mail, but the same > happens on x86, with an additional: > > drivers/infiniband/ulp/iser/iser_mod.c:59: warning: large integer implicitly truncated to unsigned type > > Thanks, > Nish > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > From liran at mtl075.yok.mtl.com Tue Nov 1 05:38:51 2005 From: liran at mtl075.yok.mtl.com (Liran Sorani) Date: 01 Nov 2005 15:38:51 +0200 Subject: [openib-general] Re:[PATCH] Osmtest - update command option + vapi fix Message-ID: <30u0ewv66s.fsf@mtl066.yok.mtl.com> Hi Hal, 1. Regarding the osmtest_SOURCES , it works both ways (i.e compile all files required) , still the correct one is += 2. Following is the patch for main.c : Index: main.c =================================================================== --- main.c (revision 3928) +++ main.c (working copy) @@ -128,9 +128,11 @@ "--guid \n" " This option specifies the local port GUID value\n" " with which osmtest should bind. 
osmtest may be\n" - " bound to 1 port at a time.\n" - " Without -g, osmtest displays a menu of possible\n" - " port GUIDs and waits for user input.\n\n" ); + " bound to 1 port at a time.\n\n"); + printf( "-p \n" + "--port\n" + " This option display menu of possible local port GUID values\n" + " with which osmtest could bind.\n\n"); printf( "-h\n" "--help\n" " Display this usage info then exit.\n\n" ); printf( "-i \n" @@ -160,9 +162,9 @@ " --- -----------------\n" " -M1 - Short Multicast Flow (default) - single mode.\n" " -M2 - Short Multicast Flow - multiple mode.\n" - " -M3 - Long Multicast Flow - single mode.\n" - " -M4 - Long Multicast Flow - mutiple mode.\n" - " Single mode - Osmtest is tested alone, with no other\n" + " -M3 - Long MultiCast Flow - single mode.\n" + " -M4 - Long MultiCast Flow - mutiple mode.\n" + " Single mode - Osmtest is tested alone , with no other \n" " apps that interact vs. OpenSM MC.\n" " Multiple mode - Could be run with other apps using MC vs.\n" " OpenSM." @@ -305,7 +307,7 @@ char flow_name[64]; boolean_t mem_track = FALSE; uint32_t next_option; - const char *const short_option = "f:l:m:M:d:g::s:t:i:cvVh"; + const char *const short_option = "f:l:m:M:d:g:s:t:i:pcvVh"; /* * In the array below, the 2nd parameter specified the number @@ -322,9 +324,10 @@ {"inventory", 1, NULL, 'i'}, {"max_lid", 1, NULL, 'm'}, {"guid", 2, NULL, 'g'}, + {"port", 0, NULL, 'p'}, {"help", 0, NULL, 'h'}, {"stress", 1, NULL, 's'}, - {"Multicast_Mode", 1, NULL, 'M'}, + {"MultiCast_Mode", 1, NULL, 'M'}, {"timeout", 1, NULL, 't'}, {"verbose", 0, NULL, 'v'}, {"log_file", 1, NULL, 'l'}, @@ -363,7 +366,6 @@ { next_option = getopt_long_only( argc, argv, short_option, long_option, NULL ); - switch ( next_option ) { case 'c': @@ -446,28 +448,30 @@ break; case 'g': - /* - Specifies port guid with which to bind. 
- */ - if (optarg) { - guid = cl_hton64( strtoull( optarg, NULL, 16 )); - printf(" Guid <0x%"PRIx64">\n", cl_hton64( guid )); - } else - guid = INVALID_GUID; - break; - + /* + * Specifies port guid with which to bind. + */ + guid = cl_hton64( strtoull( optarg, NULL, 16 )); + printf(" Guid <0x%"PRIx64">\n", cl_hton64( guid )); + break; + case 'p': + /* + * Display current port guids + */ + guid = INVALID_GUID; + break; case 't': - /* + /* * Specifies transaction timeout. - */ - opt.transaction_timeout = strtol( optarg, NULL, 0 ); - printf( "\tTransaction timeout = %d\n", opt.transaction_timeout ); - break; + */ + opt.transaction_timeout = strtol( optarg, NULL, 0 ); + printf( "\tTransaction timeout = %d\n", opt.transaction_timeout ); + break; case 'l': - opt.log_file = optarg; - printf("\tLog File:%s\n", opt.log_file ); - break; + opt.log_file = optarg; + printf("\tLog File:%s\n", opt.log_file ); + break; case 'v': /* @@ -510,32 +514,32 @@ } break; - case 'M': - /* - * Perform stress test. - */ - opt.mmode = strtol( optarg, NULL, 0 ); - printf( "\tMulticast test enabled: " ); - switch ( opt.mmode ) - { - case 1: - printf( "Short MC Flow - single mode (default)\n" ); - break; - case 2: - printf( "Short MC Flow - mutiple mode\n" ); - break; - case 3: - printf( "Long MC Flow - single mode\n" ); - break; - case 4: - printf( "Long MC Flow - mutiple mode\n" ); - break; - default: - printf( "Unknown value %u (ignored)\n", opt.stress ); - opt.mmode = 0; - break; - } - break; + case 'M': + /* + * Perform stress test. 
+ */ + opt.mmode = strtol( optarg, NULL, 0 ); + printf( "\tMultiCast test enabled: " ); + switch ( opt.mmode ) + { + case 1: + printf( "Short MC Flow - single mode (default)\n" ); + break; + case 2: + printf( "Short MC Flow - mutiple mode\n" ); + break; + case 3: + printf( "Long MC Flow - single mode\n" ); + break; + case 4: + printf( "Long MC Flow - mutiple mode\n" ); + break; + default: + printf( "Unknown value %u (ignored)\n", opt.stress ); + opt.mmode = 0; + break; + } + break; case 'd': /*
From rolandd at cisco.com Tue Nov 1 07:08:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 01 Nov 2005 07:08:55 -0800 Subject: [openib-general] [PATCH/RFC] IB: Add SCSI RDMA Protocol (SRP) initiator In-Reply-To: <4367392D.7080804@xiranet.com> (Mirko Benz's message of "Tue, 01 Nov 2005 10:45:17 +0100") References: <4367392D.7080804@xiranet.com> Message-ID: <52oe54flrs.fsf@cisco.com> > We (Xiranet) are developing SRP targets / routers, too. Great, welcome to the club! > - More status information: > When parameter parsing for configuration fails no message is printed > out (e.g. when a parameter name is misspelled like ioc_gguid instead > of ioc_guid when following your Email). Good point, I'll add an error message to the configuration parsing code. > - OpenIB SRP Wiki Page should be updated. It's easy to create an account and edit Wiki pages... > - Maybe auto discovery/connection of targets should be integrated as > on option (as the Windows and Mellanox SRP initiators do).
There's lots of scope for userspace tools for discovery, connection, health monitoring and so on for SRP targets. It would be great if someone started working on something like that. Do you have any interest in contributing to this? > - We had problems with the configuration tool (dmcli) on 64 bit Linux. Let's not call dmcli "the configuration tool" -- it's a quick hack that I put together that needs to be replaced by real device management (see above). > - An explanation for the pkey and service_id parameters should be added. Care to supply some suggested text? Thanks, Roland From kingman at storagegear.com Tue Nov 1 07:25:42 2005 From: kingman at storagegear.com (John Kingman) Date: Tue, 1 Nov 2005 09:25:42 -0600 (CST) Subject: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation In-Reply-To: <52ek61gdx7.fsf@cisco.com> References: <52ek61gdx7.fsf@cisco.com> Message-ID: On Mon, 31 Oct 2005, Roland Dreier wrote: >With that said I don't think I like this patch. I don't think it's a >win to allocate 1 KB IUs when we'll almost never have gather/scatter >lists that big. Even the 256 byte IUs that the current driver uses >seem on the borderline of being too big. > >Also, is it really a win to have the target fetch a large indirect >buffer list? It seems like it would be better for performance to give >the SCSI layer a limit on the size of the gather/scatter list we >support so that our indirect buffer lists always fit in the IUs we send. Without knowing what the optimal values should be, perhaps we should make some of these module parameters. 
John From mshefty at ichips.intel.com Tue Nov 1 09:23:21 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 01 Nov 2005 09:23:21 -0800 Subject: [openib-general] Re: [PATCH] fix umad object lifetime stuff In-Reply-To: <528xwdqn4x.fsf@cisco.com> References: <528xwdqn4x.fsf@cisco.com> Message-ID: <4367A489.5070803@ichips.intel.com> Roland Dreier wrote: > Something like this is probably required for ucm and anything else > that exports a character device, since everyone seems to have copied > my bad user_mad code. But I haven't had a chance to do anything > beyond user_mad and uverbs so far... Thanks for the info. I'll take a look at ucm. - Sean From ftillier at silverstorm.com Tue Nov 1 09:29:49 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Tue, 1 Nov 2005 09:29:49 -0800 Subject: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation In-Reply-To: Message-ID: <001301c5df09$dd0529d0$9e5aa8c0@infiniconsys.com> > From: John Kingman [mailto:kingman at storagegear.com] > Sent: Tuesday, November 01, 2005 7:26 AM > > On Mon, 31 Oct 2005, Roland Dreier wrote: > > >With that said I don't think I like this patch. I don't think it's a > >win to allocate 1 KB IUs when we'll almost never have gather/scatter > >lists that big. Even the 256 byte IUs that the current driver uses > >seem on the borderline of being too big. > > > >Also, is it really a win to have the target fetch a large indirect > >buffer list? It seems like it would be better for performance to give > >the SCSI layer a limit on the size of the gather/scatter list we > >support so that our indirect buffer lists always fit in the IUs we send. > > Without knowing what the optimal values should be, perhaps we should > make some of these module parameters. The Windows SRP initiator sizes the IU to be capable of performing a 64KB I/O with all SGEs specified in the IU. It takes 350 bytes to be able to put the full SGL into an IDBD IU, assuming 4K pages. 
An alternative is to always force a RDMA read of the SGL, and just go with the minimum size IU. I don't know how this would affect performance, though - likely increased latencies. In fact, in environments where each I/O buffer can be registered (via regular or fast MR) on the fly, DDBD should be used and the IU would be a constant 64 bytes. This should yield the best performance. - Fab
From halr at voltaire.com Tue Nov 1 09:38:34 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Nov 2005 12:38:34 -0500 Subject: [openib-general] Re: [PATCH] fix umad object lifetime stuff In-Reply-To: <4367A489.5070803@ichips.intel.com> References: <528xwdqn4x.fsf@cisco.com>
 <4367A489.5070803@ichips.intel.com>
Message-ID: <1130866714.4381.505.camel@hal.voltaire.com>

On Tue, 2005-11-01 at 12:23, Sean Hefty wrote:
> Roland Dreier wrote:
> > Something like this is probably required for ucm and anything else
> > that exports a character device, since everyone seems to have copied
> > my bad user_mad code. But I haven't had a chance to do anything
> > beyond user_mad and uverbs so far...
>
> Thanks for the info. I'll take a look at ucm.

Should this be done for uat too or doesn't it matter ?

-- Hal

From yipeeyipeeyipeeyipee at yahoo.com Tue Nov 1 09:39:37 2005
From: yipeeyipeeyipeeyipee at yahoo.com (yipee)
Date: Tue, 1 Nov 2005 17:39:37 +0000 (UTC)
Subject: [openib-general] compilation platform dependencies
Message-ID:

Hi,

I think that I've noticed a problem when compiling user applications with a
different compiler than the one used for the running kernel's modules (x86
32bit vs. 64bit), for example compiling an openib application on a 32bit x86
and running it on a 64bit AMD Opteron.
When compiling a program with a 32bit gcc, the sizeof(struct cm_abi_event_resp)
was 184 bytes (written to the kernel from ib_cm_get_event()) vs. the 192 bytes
resulting from a x86_64 compiler.
When ucm's ib_ucm_event() compares the sizeof() of the received cmd/buffer to
sizeof(struct ib_ucm_event_resp) it finds a mismatch and returns -ENOSPC.

Note that 32bit applications are allowed to run on a x86_64.
I can see two fixes to this issue:
1. Disallow 32bit applications from using 64bit kernel modules and warn about
it at run-time.
2. Specify gcc packing pragmas for user/kernel communication structures in
header files.

Any comments?
Thanks, y From rolandd at cisco.com Tue Nov 1 09:51:59 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 01 Nov 2005 09:51:59 -0800 Subject: [openib-general] Re: [PATCH] fix umad object lifetime stuff In-Reply-To: <1130866714.4381.505.camel@hal.voltaire.com> (Hal Rosenstock's message of "01 Nov 2005 12:38:34 -0500") References: <528xwdqn4x.fsf@cisco.com> <4367A489.5070803@ichips.intel.com> <1130866714.4381.505.camel@hal.voltaire.com> Message-ID: <52br14fe80.fsf@cisco.com> >> Roland Dreier wrote: > Something like this is probably required >> for ucm and anything else > that exports a character device, >> since everyone seems to have copied > my bad user_mad code. >> But I haven't had a chance to do anything > beyond user_mad and >> uverbs so far... Hal> Should this be done for uat too or doesn't it matter ? The bugs definitely exist in uat. However, fixing things like passing kernel pointers to userspace would seem like a higher priority to me. Or we could just deprecate uat in favor of Sean's work... - R. From halr at voltaire.com Tue Nov 1 09:51:28 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Nov 2005 12:51:28 -0500 Subject: [openib-general] compilation platform dependencies In-Reply-To: References: Message-ID: <1130867263.4381.537.camel@hal.voltaire.com> On Tue, 2005-11-01 at 12:39, yipee wrote: > Hi, > > > I think that I've noticed a problem in compiling user applications with a > different compiler than the running-kernel modules compiler (x86 32bit vs. > 64bit). For compiling an openib application on a 32bit x86 and running it on a > 64bit AMD Opteron. > When compiling a program with a 32bit gcc, the sizeof(struct cm_abi_event_resp) > was 184 bytes (written to the kernel from ib_cm_get_event()) vs. the 192 bytes > resulting from a x86_64 compiler. > When ucm's ib_ucm_event() compares the sizeof() of the received cmd/buffer to > sizeof(struct ib_ucm_event_resp) it finds a mismatch and returns -ENOSPC. 
> > Notice that 32bit applications are allowed to run on a x86_64. > I can see two fixes to this issue: > 1. Disallow 32bit applications to use 64bit kernel modules and warn about it at > run-time. It is a requirement to work in this mode so this would not be acceptable. -- Hal > 2. Specifiy gcc packing pragmas for user/kernel communication structures in > header files. > > Any comments? > Thanks, > y > > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From mshefty at ichips.intel.com Tue Nov 1 09:55:40 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 01 Nov 2005 09:55:40 -0800 Subject: [openib-general] compilation platform dependencies In-Reply-To: References: Message-ID: <4367AC1C.5030600@ichips.intel.com> yipee wrote: > I think that I've noticed a problem in compiling user applications with a > different compiler than the running-kernel modules compiler (x86 32bit vs. > 64bit). For compiling an openib application on a 32bit x86 and running it on a > 64bit AMD Opteron. > When compiling a program with a 32bit gcc, the sizeof(struct cm_abi_event_resp) > was 184 bytes (written to the kernel from ib_cm_get_event()) vs. the 192 bytes > resulting from a x86_64 compiler. > When ucm's ib_ucm_event() compares the sizeof() of the received cmd/buffer to > sizeof(struct ib_ucm_event_resp) it finds a mismatch and returns -ENOSPC. I think that we can fix this by adding padding to the end of these structures to align them to a 64-bit boundary. Did you notice if any other data structures had this issue? 
- Sean From halr at voltaire.com Tue Nov 1 09:56:52 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Nov 2005 12:56:52 -0500 Subject: [openib-general] Re: [PATCH] fix umad object lifetime stuff In-Reply-To: <52br14fe80.fsf@cisco.com> References: <528xwdqn4x.fsf@cisco.com> <4367A489.5070803@ichips.intel.com> <1130866714.4381.505.camel@hal.voltaire.com> <52br14fe80.fsf@cisco.com> Message-ID: <1130867811.4381.571.camel@hal.voltaire.com> On Tue, 2005-11-01 at 12:51, Roland Dreier wrote: > >> Roland Dreier wrote: > Something like this is probably required > >> for ucm and anything else > that exports a character device, > >> since everyone seems to have copied > my bad user_mad code. > >> But I haven't had a chance to do anything > beyond user_mad and > >> uverbs so far... > > Hal> Should this be done for uat too or doesn't it matter ? > > The bugs definitely exist in uat. However, fixing things like passing > kernel pointers to userspace would seem like a higher priority to me. > > Or we could just deprecate uat in favor of Sean's work... Isn't that in process as AT is being deprecated in favor of CMA ? That's part of why I was asking: given that, are these issues needed to be fixed in the short term ? (I would prefer not to unless this is really needed by someone). -- Hal From rolandd at cisco.com Tue Nov 1 10:07:17 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 01 Nov 2005 10:07:17 -0800 Subject: [openib-general] compilation platform dependencies In-Reply-To: (yipee's message of "Tue, 1 Nov 2005 17:39:37 +0000 (UTC)") References: Message-ID: <527jbsfdii.fsf@cisco.com> yipee> Hi, I think that I've noticed a problem in compiling user yipee> applications with a different compiler than the yipee> running-kernel modules compiler (x86 32bit vs. 64bit). For yipee> compiling an openib application on a 32bit x86 and running yipee> it on a 64bit AMD Opteron. 
When compiling a program with a yipee> 32bit gcc, the sizeof(struct cm_abi_event_resp) was 184 yipee> bytes (written to the kernel from ib_cm_get_event()) yipee> vs. the 192 bytes resulting from a x86_64 compiler. When yipee> ucm's ib_ucm_event() compares the sizeof() of the received yipee> cmd/buffer to sizeof(struct ib_ucm_event_resp) it finds a yipee> mismatch and returns -ENOSPC. Yes, this looks like a real bug. yipee> Notice that 32bit applications are allowed to run on a yipee> x86_64. I can see two fixes to this issue: 1. Disallow yipee> 32bit applications to use 64bit kernel modules and warn yipee> about it at run-time. 2. Specifiy gcc packing pragmas for yipee> user/kernel communication structures in header files. I think the real fix is just to fix the declaration so that the structure is laid out the same for all architectures, and bump the ABI version yet again. All structs more than 4 bytes in size have to be padded to a multiple of 8 bytes, or else they end up with a different size between 32-bit and 64-bit architectures. I think something like the patch below along with the corresponding userspace libibcm change is required. - R. 
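To make the padding rule concrete, here is a minimal userspace sketch. The struct and function names (demo_unpadded, demo_padded, demo_padded_is_abi_stable) are made up for illustration and are not the real ib_user_cm.h definitions; fixed-width types from <stdint.h> stand in for the kernel's __u64/__u32. The assumption is the usual i386 ABI, where a 64-bit integer inside a struct only needs 4-byte alignment, so the first struct is typically 12 bytes on i386 and 16 bytes on x86_64, while the explicit reserved field makes the second one 16 bytes on both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command struct, shaped like the ABI structs before padding. */
struct demo_unpadded {
	uint64_t response;
	uint32_t id;
	/* x86_64 adds 4 bytes of tail padding here; i386 typically adds none. */
};

/* Same struct with the tail padding spelled out as a reserved field. */
struct demo_padded {
	uint64_t response;
	uint32_t id;
	uint32_t reserved;
};

/* Sum of the member sizes, i.e. the size with no compiler-inserted padding. */
size_t demo_padded_member_bytes(void)
{
	return sizeof(uint64_t) + sizeof(uint32_t) + sizeof(uint32_t);
}

/*
 * Returns 1 when the padded struct contains no implicit padding and its
 * size is a multiple of 8, so 4-byte and 8-byte alignment of uint64_t
 * both produce the same layout.
 */
int demo_padded_is_abi_stable(void)
{
	return sizeof(struct demo_padded) == demo_padded_member_bytes() &&
	       sizeof(struct demo_padded) % 8 == 0;
}
```

Printing sizeof(struct demo_unpadded) from a 32-bit and a 64-bit build of the same file is a quick way to see the mismatch; the padded variant should report the same size in both.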
--- infiniband/include/rdma/ib_user_cm.h (revision 3932) +++ infiniband/include/rdma/ib_user_cm.h (working copy) @@ -38,7 +38,7 @@ #include -#define IB_USER_CM_ABI_VERSION 3 +#define IB_USER_CM_ABI_VERSION 4 enum { IB_USER_CM_CMD_CREATE_ID, @@ -84,6 +84,7 @@ struct ib_ucm_create_id_resp { struct ib_ucm_destroy_id { __u64 response; __u32 id; + __u32 reserved; }; struct ib_ucm_destroy_id_resp { @@ -93,6 +94,7 @@ struct ib_ucm_destroy_id_resp { struct ib_ucm_attr_id { __u64 response; __u32 id; + __u32 reserved; }; struct ib_ucm_attr_id_resp { @@ -164,6 +166,7 @@ struct ib_ucm_listen { __be64 service_id; __be64 service_mask; __u32 id; + __u32 reserved; }; struct ib_ucm_establish { @@ -219,7 +222,7 @@ struct ib_ucm_req { __u8 rnr_retry_count; __u8 max_cm_retries; __u8 srq; - __u8 reserved[1]; + __u8 reserved[5]; }; struct ib_ucm_rep { @@ -236,6 +239,7 @@ struct ib_ucm_rep { __u8 flow_control; __u8 rnr_retry_count; __u8 srq; + __u32 reserved; }; struct ib_ucm_info { @@ -245,7 +249,7 @@ struct ib_ucm_info { __u64 data; __u8 info_len; __u8 data_len; - __u8 reserved[2]; + __u8 reserved[6]; }; struct ib_ucm_mra { @@ -273,6 +277,7 @@ struct ib_ucm_sidr_req { __u16 pkey; __u8 len; __u8 max_cm_retries; + __u32 reserved; }; struct ib_ucm_sidr_rep { @@ -284,7 +289,7 @@ struct ib_ucm_sidr_rep { __u64 data; __u8 info_len; __u8 data_len; - __u8 reserved[2]; + __u8 reserved[6]; }; /* * event notification ABI structures. 
@@ -295,7 +300,7 @@ struct ib_ucm_event_get { __u64 info; __u8 data_len; __u8 info_len; - __u8 reserved[2]; + __u8 reserved[6]; }; struct ib_ucm_req_event_resp { @@ -315,6 +320,7 @@ struct ib_ucm_req_event_resp { __u8 rnr_retry_count; __u8 srq; __u8 port; + __u8 reserved[7]; }; struct ib_ucm_rep_event_resp { @@ -329,7 +335,7 @@ struct ib_ucm_rep_event_resp { __u8 flow_control; __u8 rnr_retry_count; __u8 srq; - __u8 reserved[1]; + __u8 reserved[5]; }; struct ib_ucm_rej_event_resp { @@ -374,6 +380,7 @@ struct ib_ucm_event_resp { __u32 id; __u32 event; __u32 present; + __u32 reserved; union { struct ib_ucm_req_event_resp req_resp; struct ib_ucm_rep_event_resp rep_resp; From rolandd at cisco.com Tue Nov 1 10:09:26 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 01 Nov 2005 10:09:26 -0800 Subject: [openib-general] compilation platform dependencies In-Reply-To: <4367AC1C.5030600@ichips.intel.com> (Sean Hefty's message of "Tue, 01 Nov 2005 09:55:40 -0800") References: <4367AC1C.5030600@ichips.intel.com> Message-ID: <523bmgfdex.fsf@cisco.com> Sean> Did you notice if any other data structures had this issue? I use the perl script below to check this. You feed it a header file, and it prints out a C program that prints the size of every struct. Then compile and run the program on both 32-bit and 64-bit architectures and diff the output. With the patch I just sent, ib_user_cm.h is clean. - R. 
#!/usr/bin/env perl

use English;
use strict;

my @structs;

while (<>) {
    if (m/(struct [^\s]+) \{/) {
        push @structs, $1;
    }
    s/__be/__u/;
    print;
}

print <<'EOT';
#include <stdio.h>

int main(int argc, char *argv[])
{
	printf("Word size: %zd\n", sizeof (long));
EOT

for my $s (@structs) {
    print <<"EOT";
	printf("%-40s:\\t%zd\\n", "$s", sizeof ($s));
EOT
}

print <<"EOT";
	return 0;
}
EOT

From mshefty at ichips.intel.com Tue Nov 1 10:53:04 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 01 Nov 2005 10:53:04 -0800 Subject: [openib-general] compilation platform dependencies In-Reply-To: <527jbsfdii.fsf@cisco.com> References: <527jbsfdii.fsf@cisco.com> Message-ID: <4367B990.5040205@ichips.intel.com> Roland Dreier wrote: > I think the real fix is just to fix the declaration so that the > structure is laid out the same for all architectures, and bump the ABI > version yet again. I think that we'll need a similar update to cm_abi.h too. - Sean From jbarker at lanl.gov Tue Nov 1 10:55:01 2005 From: jbarker at lanl.gov (James W.
Barker) Date: Tue, 01 Nov 2005 11:55:01 -0700 Subject: [openib-general] build fails on revision 3930 Message-ID: <6.2.3.4.2.20051101114151.0227ba00@cic-mail.lanl.gov> All, Following the instructions posted in your "installation cheetsheet" after I issue the command "make modules modules_install" the build fails with the error message below (this is revision 3930), the same procedure (not sure which revision number) was successful last week:

CC [M] drivers/infiniband/core/addr.o
drivers/infiniband/core/addr.c:330: warning: initialization from incompatible pointer type
CC [M] drivers/infiniband/core/at.o
In file included from drivers/infiniband/include/rdma/ib_sa.h:42,
                 from drivers/infiniband/core/at.c:53:
drivers/infiniband/include/rdma/ib_mad.h:601: error: syntax error before ‘gfp_t’
drivers/infiniband/include/rdma/ib_mad.h:601: warning: function declaration isn’t a prototype
In file included from drivers/infiniband/core/at.c:53:
drivers/infiniband/include/rdma/ib_sa.h:288: error: syntax error before ‘gfp_t’
drivers/infiniband/include/rdma/ib_sa.h:291: error: ‘ib_sa_path_rec_get’ declared as function returning a function
drivers/infiniband/include/rdma/ib_sa.h:291: warning: function declaration isn’t a prototype
drivers/infiniband/include/rdma/ib_sa.h:292: error: syntax error before ‘void’
drivers/infiniband/include/rdma/ib_sa.h:299: error: syntax error before ‘gfp_t’
drivers/infiniband/include/rdma/ib_sa.h:302: error: ‘ib_sa_mcmember_rec_query’ declared as function returning a function
drivers/infiniband/include/rdma/ib_sa.h:302: warning: function declaration isn’t a prototype
drivers/infiniband/include/rdma/ib_sa.h:303: error: syntax error before ‘void’
drivers/infiniband/include/rdma/ib_sa.h:310: error: syntax error before ‘gfp_t’
drivers/infiniband/include/rdma/ib_sa.h:313: error: ‘ib_sa_service_rec_query’ declared as function returning a function
drivers/infiniband/include/rdma/ib_sa.h:313: warning: function declaration isn’t a prototype
drivers/infiniband/include/rdma/ib_sa.h:314: error: syntax error before ‘void’
drivers/infiniband/include/rdma/ib_sa.h:345: error: syntax error before ‘gfp_t’
drivers/infiniband/include/rdma/ib_sa.h:348: error: ‘ib_sa_mcmember_rec_set’ declared as function returning a function
drivers/infiniband/include/rdma/ib_sa.h:348: warning: function declaration isn’t a prototype
drivers/infiniband/include/rdma/ib_sa.h:349: error: syntax error before ‘void’
drivers/infiniband/include/rdma/ib_sa.h:387: error: syntax error before ‘gfp_t’
drivers/infiniband/include/rdma/ib_sa.h:390: error: ‘ib_sa_mcmember_rec_delete’ declared as function returning a function
drivers/infiniband/include/rdma/ib_sa.h:390: warning: function declaration isn’t a prototype
drivers/infiniband/include/rdma/ib_sa.h:391: error: syntax error before ‘void’
make[3]: *** [drivers/infiniband/core/at.o] Error 1
make[2]: *** [drivers/infiniband/core] Error 2
make[1]: *** [drivers/infiniband] Error 2
make: *** [drivers] Error 2

James W. Barker, Ph.D. Los Alamos National Laboratory Computer and Computational Sciences Division Advanced Computing Laboratory - Resilient Technologies Team 505-665-9558 From kenjeffries at storagegear.com Tue Nov 1 11:19:31 2005 From: kenjeffries at storagegear.com (Ken Jeffries) Date: Tue, 1 Nov 2005 13:19:31 -0600 Subject: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation References: <52ek61gdx7.fsf@cisco.com> Message-ID: <019501c5df19$30305ee0$0a97a8c0@blacktip> Roland, It's not clear to me which part(s) of the patch you don't like so I apologize if some of this is not relevant. The SRP-1 spec calls for iu size negotiation during login so not allowing iu size negotiation would be a bug in terms of spec compliance. I think there are valid reasons why iu size negotiation should be in the spec. I am sure that for a particular application and network that there is an optimum iu size (a Goldilocks size, neither too small nor too large).
I suspect that a small iu will be better for small file or maybe small record database i/o and a large iu will be better for video serving. A server that has lots of cheap memory and perhaps an aversion to implementing the full indirect memory descriptor capability may be happy with very large iu's. An embedded system server with only modest memory may really need to not waste memory in permanently allocated big iu's that go largely unused. While I'm sure that there will be an optimum size iu for any particular application' and network I'm equally sure I don't know what that size is right now and I won't know what it is before we do considerable performance testing. Any particular srp client may connect to more than one srp server and those servers (and applications) may have different needs. One might be a video server and another might be a db server. Having the iu size set in a compile time variable in the srp client is less flexible than what we, at least, would like to see. When we were considering how to get both smaller iu's and to implement the real indirect memory descriptor capability it occurred to us that allowing the Linux side iu's to be sized by the existing compile time variable but making the on-the-wire iu size set by negotiation was an almost trivial extension to the existing code. By doing that applications can see a potentially large scatter/gather list length (a function of the client internal iu size) but the srp target also gets only what it wants. Since the indirect table memory descriptor just points to the descriptor list in the client side iu and since the "partial" list of descriptors in the on-the-wire iu is just a copy of the first descriptors in the client side iu, indirect descriptor setup and operation is easy. 
Regards, Ken Jeffries ----- Original Message ----- From: "Roland Dreier" To: "John Kingman" Cc: Sent: Monday, October 31, 2005 11:00 PM Subject: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation > Thanks for the patch. However, I would like to hold off on new > features for the SRP driver to get it merged into 2.6.15. There's > about another week in the 2.6.15 merge window, so either way the delay > shouldn't be too long. > > With that said I don't think I like this patch. I don't think it's a > win to allocate 1 KB IUs when we'll almost never have gather/scatter > lists that big. Even the 256 byte IUs that the current driver uses > seem on the borderline of being too big. > > Also, is it really a win to have the target fetch a large indirect > buffer list? It seems like it would be better for performance to give > the SCSI layer a limit on the size of the gather/scatter list we > support so that our indirect buffer lists always fit in the IUs we send. > > - R. > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From iod00d at hp.com Tue Nov 1 11:19:05 2005 From: iod00d at hp.com (Grant Grundler) Date: Tue, 1 Nov 2005 11:19:05 -0800 Subject: [openib-general] compilation platform dependencies In-Reply-To: <527jbsfdii.fsf@cisco.com> References: <527jbsfdii.fsf@cisco.com> Message-ID: <20051101191905.GE6815@esmail.cup.hp.com> On Tue, Nov 01, 2005 at 10:07:17AM -0800, Roland Dreier wrote: > --- infiniband/include/rdma/ib_user_cm.h (revision 3932) > +++ infiniband/include/rdma/ib_user_cm.h (working copy) ... > @@ -84,6 +84,7 @@ struct ib_ucm_create_id_resp { > struct ib_ucm_destroy_id { > __u64 response; > __u32 id; > + __u32 reserved; I've seen use of this use of "data[0]": include/rdma/ib_user_verbs.h: __u64 driver_data[0]; isn't that for the same purpose? 
Apologies if I'm mixing things up... thanks, grant From mshefty at ichips.intel.com Tue Nov 1 11:26:49 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 01 Nov 2005 11:26:49 -0800 Subject: [openib-general] compilation platform dependencies In-Reply-To: References: Message-ID: <4367C179.5050102@ichips.intel.com> yipee wrote: > I think that I've noticed a problem in compiling user applications with a > different compiler than the running-kernel modules compiler (x86 32bit vs. > 64bit). For compiling an openib application on a 32bit x86 and running it on a > 64bit AMD Opteron. I've checked in Roland's patch, along with a similar one for userspace. Can you please pull the latest kernel and userspace and verify that your problem has been fixed? - Sean From mshefty at ichips.intel.com Tue Nov 1 11:30:57 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 01 Nov 2005 11:30:57 -0800 Subject: [openib-general] compilation platform dependencies In-Reply-To: <527jbsfdii.fsf@cisco.com> References: <527jbsfdii.fsf@cisco.com> Message-ID: <4367C271.3080208@ichips.intel.com> Roland Dreier wrote: > All structs more than 4 bytes in size have to be padded to a multiple > of 8 bytes, or else they end up with a different size between 32-bit > and 64-bit architectures. I think something like the patch below > along with the corresponding userspace libibcm change is required. Thanks - I committed this with a minor change to use __u8 reserved[4] in a couple places where __u32 were used. 
- Sean From mshefty at ichips.intel.com Tue Nov 1 12:06:10 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 01 Nov 2005 12:06:10 -0800 Subject: [openib-general] Re: [PATCH] fix umad object lifetime stuff In-Reply-To: <528xwdqn4x.fsf@cisco.com> References: <528xwdqn4x.fsf@cisco.com> Message-ID: <4367CAB2.6030600@ichips.intel.com> Roland Dreier wrote: > I just committed the following patch for user_mad.c, which fixes > various issues with possibly freeing various data structures before > the last reference is gone. For example, cdev_del() might return > before the last reference to the cdev is gone, so freeing a structure > containing the cdev is wrong at that point. (Side note: it's > essentially impossible to use cdev_init() safely unless the cdev in > question is statically allocated as part of the module). I can't say that I grasp the relationship between the cdev_* and class_* calls yet, but should umad and ucm create their own classes? I'm trying to add the ucma, and I'm wondering if we should add another infiniband_blah class, versus adding an rdma_cm entry somewhere else. - Sean From rolandd at cisco.com Tue Nov 1 12:10:56 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 01 Nov 2005 12:10:56 -0800 Subject: [openib-general] compilation platform dependencies In-Reply-To: <20051101191905.GE6815@esmail.cup.hp.com> (Grant Grundler's message of "Tue, 1 Nov 2005 11:19:05 -0800") References: <527jbsfdii.fsf@cisco.com> <20051101191905.GE6815@esmail.cup.hp.com> Message-ID: <52pspkdt7z.fsf@cisco.com> > I've seen use of this use of "data[0]": > include/rdma/ib_user_verbs.h: __u64 driver_data[0]; > > isn't that for the same purpose? > Apologies if I'm mixing things up... The driver_data[] in ib_user_verbs.h is really there to give a hint that extra device-dependent data could follow. Reserved members of structs are used to pad it up to a 64-bit boundary. I'm not sure if __u64 driver_data[0]; forces alignment to an 8-byte boundary on i386... 
does it? - R. From rolandd at cisco.com Tue Nov 1 12:27:19 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 01 Nov 2005 12:27:19 -0800 Subject: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation In-Reply-To: <019501c5df19$30305ee0$0a97a8c0@blacktip> (Ken Jeffries's message of "Tue, 1 Nov 2005 13:19:31 -0600") References: <52ek61gdx7.fsf@cisco.com> <019501c5df19$30305ee0$0a97a8c0@blacktip> Message-ID: <52d5lkdsgo.fsf@cisco.com> Ken> The SRP-1 spec calls for iu size negotiation during login so Ken> not allowing iu size negotiation would be a bug in terms of Ken> spec compliance. I think there are valid reasons why iu size Ken> negotiation should be in the spec. Sure, no objection here. My objections are the following (as I said in my previous mail): - I don't like allocating a 1 KB IU for every send IU, since most of that memory will probably never be used. - I'm not convinced that it's _ever_ a win to have the target do another RDMA to fetch the indirect buffer list. You need to convince me that it's not better to simply tell the upper layers what the limit on s/g list length is to fit in the current IU size. - R. From rolandd at cisco.com Tue Nov 1 12:27:46 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 01 Nov 2005 12:27:46 -0800 Subject: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation In-Reply-To: (John Kingman's message of "Tue, 1 Nov 2005 09:25:42 -0600 (CST)") References: <52ek61gdx7.fsf@cisco.com> Message-ID: <528xw8dsfx.fsf@cisco.com> John> Without knowing what the optimal values should be, perhaps John> we should make some of these module parameters. Yes, or make them per-target-port tunables. - R. 
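For a rough feel of the numbers in this thread, the relation between IU length and the number of s/g descriptors that fit inside it can be sketched as below. The constants are assumptions about the SRP command layout (a 48-byte fixed SRP_CMD part including a 16-byte CDB, a 20-byte indirect table header, 16 bytes per direct descriptor, one data direction, no additional CDB), and the macro and function names are invented for this sketch; none of it is taken from the ib_srp source.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes in bytes; illustrative only, not copied from ib_srp. */
#define DEMO_SRP_CMD_HDR_LEN       48 /* fixed part of SRP_CMD, 16-byte CDB included */
#define DEMO_SRP_INDIRECT_HDR_LEN  20 /* table descriptor (16) + total length (4) */
#define DEMO_SRP_DIRECT_DESC_LEN   16 /* address (8) + key (4) + length (4) */

/* IU bytes needed to carry n_desc descriptors in the in-IU descriptor list. */
size_t demo_iu_len_for_descs(size_t n_desc)
{
	return DEMO_SRP_CMD_HDR_LEN + DEMO_SRP_INDIRECT_HDR_LEN +
	       n_desc * DEMO_SRP_DIRECT_DESC_LEN;
}

/* Largest s/g list whose descriptors all fit inside an IU of iu_len bytes. */
size_t demo_max_descs_for_iu(size_t iu_len)
{
	size_t hdr = DEMO_SRP_CMD_HDR_LEN + DEMO_SRP_INDIRECT_HDR_LEN;

	if (iu_len < hdr)
		return 0;
	return (iu_len - hdr) / DEMO_SRP_DIRECT_DESC_LEN;
}
```

Under these assumptions a 256-byte IU holds 11 descriptors and a 1 KB IU holds 59. A per-target tunable would change the IU length fed into demo_max_descs_for_iu(), and the result is the kind of value that could be handed to the SCSI layer as the s/g list limit.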
From rolandd at cisco.com Tue Nov 1 12:29:46 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 01 Nov 2005 12:29:46 -0800 Subject: [openib-general] Re: [PATCH] fix umad object lifetime stuff In-Reply-To: <4367CAB2.6030600@ichips.intel.com> (Sean Hefty's message of "Tue, 01 Nov 2005 12:06:10 -0800") References: <528xwdqn4x.fsf@cisco.com> <4367CAB2.6030600@ichips.intel.com> Message-ID: <524q6wdscl.fsf@cisco.com> Sean> I can't say that I grasp the relationship between the cdev_* Sean> and class_* calls yet, but should umad and ucm create their Sean> own classes? I'm trying to add the ucma, and I'm wondering Sean> if we should add another infiniband_blah class, versus Sean> adding an rdma_cm entry somewhere else. ucma is not really attached to a single device, is it? How may character devices are you going to create? - R. From yipeeyipeeyipeeyipee at yahoo.com Tue Nov 1 12:43:44 2005 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Tue, 1 Nov 2005 20:43:44 +0000 (UTC) Subject: [openib-general] Re: compilation platform dependencies References: <4367C179.5050102@ichips.intel.com> Message-ID: Sean Hefty ichips.intel.com> writes: [snip] > I've checked in Roland's patch, along with a similar one for userspace. > Can you please pull the latest kernel and userspace and verify that > your problem has been fixed? I've already left work for today and I'll be back only on Thursday. I'll test these fixes first thing in the morning. Thanks for the quick response (from everyone). y From yipeeyipeeyipeeyipee at yahoo.com Tue Nov 1 12:55:14 2005 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Tue, 1 Nov 2005 20:55:14 +0000 (UTC) Subject: [openib-general] Re: compilation platform dependencies References: <4367AC1C.5030600@ichips.intel.com> Message-ID: Sean Hefty ichips.intel.com> writes: [snip] > Did you notice if any other data structures had this issue? Nope, that's the first one that bit me. It took some time to verify this problem. 
I had to update the kernel to 2.6.14 on two platforms, install everything and recheck. If I'll bump into more problems I'll post them here. Thanks, y From kingman at storagegear.com Tue Nov 1 13:21:40 2005 From: kingman at storagegear.com (John Kingman) Date: Tue, 1 Nov 2005 15:21:40 -0600 (CST) Subject: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation In-Reply-To: <52d5lkdsgo.fsf@cisco.com> References: <52ek61gdx7.fsf@cisco.com> <019501c5df19$30305ee0$0a97a8c0@blacktip> <52d5lkdsgo.fsf@cisco.com> Message-ID: On Tue, 1 Nov 2005, Roland Dreier wrote: >My objections are the following (as I said in my previous mail): > - I don't like allocating a 1 KB IU for every send IU, since most of > that memory will probably never be used. I have no problem with changing the 1K IU to some other value. I would rather see this max IU size as a module parameter, however, so that it may be changed without having to rebuild the module. > - I'm not convinced that it's _ever_ a win to have the target do > another RDMA to fetch the indirect buffer list. You need to > convince me that it's not better to simply tell the upper layers > what the limit on s/g list length is to fit in the current IU size. If you agree that it_iu size negotiation is OK, then the case where you connect to a target with a smaller it_iu size than ib_srp was built with leaves some number of indirect descriptors in the position of only being available to the target via RDMA. This would probably be considered a win compared to not talking to the target at all. 
:-) John From rolandd at cisco.com Tue Nov 1 13:28:59 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 01 Nov 2005 13:28:59 -0800 Subject: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation In-Reply-To: (John Kingman's message of "Tue, 1 Nov 2005 15:21:40 -0600 (CST)") References: <52ek61gdx7.fsf@cisco.com> <019501c5df19$30305ee0$0a97a8c0@blacktip> <52d5lkdsgo.fsf@cisco.com> Message-ID: <52br14cb1g.fsf@cisco.com> John> I have no problem with changing the 1K IU to some other John> value. I would rather see this max IU size as a module John> parameter, however, so that it may be changed without having John> to rebuild the module. I guess that's OK for development but I'm not convinced we need to make this tunable for general end users. John> If you agree that it_iu size negotiation is OK, then the John> case where you connect to a target with a smaller it_iu size John> than ib_srp was built with leaves some number of indirect John> descriptors in the position of only being available to the John> target via RDMA. This would probably be considered a win John> compared to not talking to the target at all. :-) I'm missing something here. The SRP initiator registers itself with the SCSI midlayer after it has successfully connected to the target port, so I don't see why it can't pass exactly the right sg_tablesize value to the SCSI midlayer. - R. From mshefty at ichips.intel.com Tue Nov 1 13:40:18 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 01 Nov 2005 13:40:18 -0800 Subject: [openib-general] Re: [PATCH] fix umad object lifetime stuff In-Reply-To: <524q6wdscl.fsf@cisco.com> References: <528xwdqn4x.fsf@cisco.com> <4367CAB2.6030600@ichips.intel.com> <524q6wdscl.fsf@cisco.com> Message-ID: <4367E0C2.90704@ichips.intel.com> Roland Dreier wrote: > Sean> I can't say that I grasp the relationship between the cdev_* > Sean> and class_* calls yet, but should umad and ucm create their > Sean> own classes? 
I'm trying to add the ucma, and I'm wondering > Sean> if we should add another infiniband_blah class, versus > Sean> adding an rdma_cm entry somewhere else. > > ucma is not really attached to a single device, is it? How may > character devices are you going to create? No - it's not attached to a device. I was going to create just one character device, which is why I was wondering if creating a new class was the right approach. - Sean From brad.benton at us.ibm.com Tue Nov 1 14:27:32 2005 From: brad.benton at us.ibm.com (Brad Benton) Date: Tue, 1 Nov 2005 16:27:32 -0600 Subject: [openib-general] opensm errors with ehca In-Reply-To: <20051030235504.GT3275@kalmia.hozed.org> Message-ID: Troy Benjegerdes wrote on 10/30/2005 05:55:04 PM: > The firmware on the IBM eHCA causes opensm to spit out these kinds of > errors all the time.. > > Is there a way we can either not send P_KeyTable requests to any eHCA > guids, or figure out what (if anything) is broken in their firmware? > > Is this a spec violation, or just ambiguities in implementation? ... > Oct 30 17:49:46 053861 [43005960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x1 (SubnGet) > D bit...................0x0 > status..................0x0 > hop_ptr.................0x0 > hop_count...............0x2 > trans_id................0x158c > attr_id.................0x16 (P_KeyTable) > resv....................0x0 > attr_mod................0x260000 Here is what is happening: The attribute modifier for the P_KeyTable attribute is divided into two, 16-bit halves. The most significant 16 bits is information that is only valid for switches. The problem here is that this SubnGet is for an HCA. The firmware currently sees that the upper bits are non-zero and since it is not a switch, throws the packet away. The proper response would be for it to ignore the upper bits and process the MAD. 
However, this is in firmware that won't be able to be changed quickly. So, in the meantime as a work around, would it be possible to have the opensm clear out the upper 16 bits of the attribute modifier when making a P_KeyTable request of an HCA? Thanks, --Brad From halr at voltaire.com Tue Nov 1 14:44:43 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Nov 2005 17:44:43 -0500 Subject: [openib-general] opensm errors with ehca In-Reply-To: References: Message-ID: <1130885082.4381.1202.camel@hal.voltaire.com> Hi Brad, On Tue, 2005-11-01 at 17:27, Brad Benton wrote: > > Troy Benjegerdes wrote on 10/30/2005 05:55:04 PM: > > > The firmware on the IBM eHCA causes opensm to spit out these kinds > of > > errors all the time.. > > > > Is there a way we can either not send P_KeyTable requests to any > eHCA > > guids, or figure out what (if anything) is broken in their firmware? > > > > Is this a spec violation, or just ambiguities in implementation? > ...
> > Oct 30 17:49:46 053861 [43005960] -> SMP dump: > > base_ver................0x1 > > mgmt_class..............0x81 > > class_ver...............0x1 > > method..................0x1 > (SubnGet) > > D bit...................0x0 > > status..................0x0 > > hop_ptr.................0x0 > > hop_count...............0x2 > > trans_id................0x158c > > attr_id.................0x16 > (P_KeyTable) > > resv....................0x0 > > attr_mod................0x260000 > > Here is what is happening: The attribute modifier for the P_KeyTable > attribute is divided into two, 16-bit halves. The most significant 16 > bits > is information that is only valid for switches. The problem here is > that > this SubnGet is for an HCA. The firmware currently sees that the > upper > bits are non-zero and since it is not a switch, throws the packet > away. > The proper response would be for it to ignore the upper bits and > process > the MAD. However, this is in firmware that won't be able to be > changed > quickly. So, in the meantime as a work around, would it be possible > to > have the opensm clear out the upper 16 bits of the attribute modifier > when > making a P_KeyTable request of an HCA? I thought the IBM eHCA identified itself as both a switch and some number of HCAs behind it. Are you sure this is a SubnSet P_KeyTable to a HCA port ? If so, I will look at this and fix it so that even though this should be ignored for HCA and router ports, it will be set to 0. Troy, is there more of this log that can be sent ? 
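The workaround being discussed amounts to masking the attribute modifier before the request goes out. A minimal sketch in host byte order follows; the macro and function names are invented here and are not OpenSM's API, and real code would also have to handle the network byte order used on the wire.

```c
#include <assert.h>
#include <stdint.h>

/* Low 16 bits of the P_KeyTable attribute modifier, kept for every node type. */
#define DEMO_PKEY_ATTR_MOD_LOW_MASK 0x0000FFFFu

/*
 * Return the attribute modifier to use for a P_KeyTable request.
 * The upper 16 bits only carry meaning for switches, so they are
 * cleared for any other node type (HCA or router).
 */
uint32_t demo_pkey_attr_mod(uint32_t attr_mod, int is_switch)
{
	if (is_switch)
		return attr_mod;
	return attr_mod & DEMO_PKEY_ATTR_MOD_LOW_MASK;
}
```

With the value from the SMP dump, demo_pkey_attr_mod(0x260000, 0) yields 0 for an HCA, while the same modifier is passed through unchanged when the destination is a switch.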
-- Hal > Thanks, > --Brad > > ______________________________________________________________________ > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From kingman at storagegear.com Tue Nov 1 15:43:07 2005 From: kingman at storagegear.com (John Kingman) Date: Tue, 1 Nov 2005 17:43:07 -0600 (CST) Subject: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation In-Reply-To: <52br14cb1g.fsf@cisco.com> References: <52ek61gdx7.fsf@cisco.com> <019501c5df19$30305ee0$0a97a8c0@blacktip> <52d5lkdsgo.fsf@cisco.com> <52br14cb1g.fsf@cisco.com> Message-ID: On Tue, 1 Nov 2005, Roland Dreier wrote: >I'm missing something here. The SRP initiator registers itself with >the SCSI midlayer after it has successfully connected to the target >port, so I don't see why it can't pass exactly the right sg_tablesize >value to the SCSI midlayer. The current code sets the sg_tablesize prior to the call to scsi_host_alloc() which is done at the time the target is added, not at the time of the connection to the target. If sg_tablesize can be modified/supplied with the scsi_add_host() call, then the SRP initiator could pass it at connect time. 
John From rolandd at cisco.com Tue Nov 1 15:50:01 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 01 Nov 2005 15:50:01 -0800 Subject: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation In-Reply-To: (John Kingman's message of "Tue, 1 Nov 2005 17:43:07 -0600 (CST)") References: <52ek61gdx7.fsf@cisco.com> <019501c5df19$30305ee0$0a97a8c0@blacktip> <52d5lkdsgo.fsf@cisco.com> <52br14cb1g.fsf@cisco.com> Message-ID: <52y848apxy.fsf@cisco.com> John> The current code sets the sg_tablesize prior to the call to John> scsi_host_alloc() which is done at the time the target is John> added, not at the time of the connection to the target. There is an .sg_tablesize value in the host template we use, yes. But there's no reason that I know of that the sg_tablesize value in the actual SCSI host structure can't be modified after it's allocated, in exactly the same way as the existing code can modify the max_sectors value. - R. From rolandd at cisco.com Tue Nov 1 15:50:34 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 01 Nov 2005 15:50:34 -0800 Subject: [openib-general] Re: [PATCH] fix umad object lifetime stuff In-Reply-To: <4367E0C2.90704@ichips.intel.com> (Sean Hefty's message of "Tue, 01 Nov 2005 13:40:18 -0800") References: <528xwdqn4x.fsf@cisco.com> <4367CAB2.6030600@ichips.intel.com> <524q6wdscl.fsf@cisco.com> <4367E0C2.90704@ichips.intel.com> Message-ID: <52u0ewapx1.fsf@cisco.com> Sean> No - it's not attached to a device. I was going to create Sean> just one character device, which is why I was wondering if Sean> creating a new class was the right approach. It might just make sense to use the existing misc class I guess. - R. 
From mshefty at ichips.intel.com Tue Nov 1 16:15:16 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Tue, 01 Nov 2005 16:15:16 -0800 Subject: [openib-general] Re: [PATCH] fix umad object lifetime stuff In-Reply-To: <52u0ewapx1.fsf@cisco.com> References: <528xwdqn4x.fsf@cisco.com> <4367CAB2.6030600@ichips.intel.com> <524q6wdscl.fsf@cisco.com> <4367E0C2.90704@ichips.intel.com> <52u0ewapx1.fsf@cisco.com> Message-ID: <43680514.9090208@ichips.intel.com> Roland Dreier wrote: > Sean> No - it's not attached to a device. I was going to create > Sean> just one character device, which is why I was wondering if > Sean> creating a new class was the right approach. > > It might just make sense to use the existing misc class I guess. It appears that doing this requires using the misc MAJOR number. Do we want to do that or use the IB MAJOR? - Sean From rolandd at cisco.com Tue Nov 1 16:20:09 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 01 Nov 2005 16:20:09 -0800 Subject: [openib-general] Re: [PATCH] fix umad object lifetime stuff In-Reply-To: <43680514.9090208@ichips.intel.com> (Sean Hefty's message of "Tue, 01 Nov 2005 16:15:16 -0800") References: <528xwdqn4x.fsf@cisco.com> <4367CAB2.6030600@ichips.intel.com> <524q6wdscl.fsf@cisco.com> <4367E0C2.90704@ichips.intel.com> <52u0ewapx1.fsf@cisco.com> <43680514.9090208@ichips.intel.com> Message-ID: <52mzknc346.fsf@cisco.com> Sean> It appears that doing this requires using the misc MAJOR Sean> number. Do we want to do that or use the IB MAJOR? I don't think it really matters either way. Using misc is probably easier. - R. 
From kenjeffries at austin.rr.com Tue Nov 1 16:44:55 2005 From: kenjeffries at austin.rr.com (Kenneth L Jeffries) Date: Tue, 1 Nov 2005 18:44:55 -0600 Subject: [openib-general] Re: [PATCH] [SRP] support for it_iu lengthnegotiation References: <52ek61gdx7.fsf@cisco.com> <019501c5df19$30305ee0$0a97a8c0@blacktip> <52d5lkdsgo.fsf@cisco.com> Message-ID: <020501c5df46$a58ec870$0a97a8c0@blacktip> > My objections are the following (as I said in my previous mail): > - I don't like allocating a 1 KB IU for every send IU, since most of > that memory will probably never be used. > - I'm not convinced that it's _ever_ a win to have the target do > another RDMA to fetch the indirect buffer list. You need to > convince me that it's not better to simply tell the upper layers > what the limit on s/g list length is to fit in the current IU size. I also don't want to allocate 1KB IU's. If IU's were fixed size, I'd want (probably, depending on performance testing) a fixed size of 350 bytes (from Fab Tillier's 64KB i/o, 4KB pages, Windows) or possibly even the minimum DDBD (as Fab Tillier also says). 1KB IU's with thousands of RC's cause me a lot of wasted space heartburn. [as an aside, it sure would be nice if we could do an SRP-3 (since SRP-2 is dead) where multiple direct descriptors would be allowed. The only way to get multiple descriptors now is with indirect descriptors.] I am pretty sure that someone doing a video server might want to do, say, 1MB i/o's. 1MB with 4KB pages means 256 descriptors and an iu of something over 4096 bytes. I definitely don't want to be told by the srp initiator that I need to use 4KB iu's. (So we agree there.) Your second point has a couple of parts. On the srp target side, if rdma reads of additional indirect buffer descriptors are done only 1% of the time and the trade off is much better memory utilization (i.e. smaller iu's) then from the target's point of view there probably is a big win in doing the extra descriptor fetches.
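As a rough check of the sizes quoted above, here is a minimal sketch of the arithmetic. It assumes the SRP wire layout as commonly described: a 48-byte SRP_CMD IU header (16-byte CDB, no additional CDB), 16 bytes per memory descriptor, and a 20-byte header in front of an indirect descriptor list. Those three constants and all helper names are illustrative assumptions, not taken from the messages in this thread or from the initiator code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed SRP wire sizes; these come from a reading of the SRP
 * layout, not from the messages in this thread. */
#define SRP_CMD_HDR_LEN   48u /* SRP_CMD IU with 16-byte CDB, no additional CDB */
#define SRP_DESC_LEN      16u /* memory descriptor: 8B address, 4B key, 4B length */
#define SRP_INDIRECT_HDR  20u /* table descriptor (16B) + total length (4B) */

/* Hypothetical helper: page-sized descriptors needed for one I/O
 * (a page-aligned buffer is assumed). */
static size_t srp_descs_for_io(size_t io_len, size_t page_len)
{
    assert(page_len > 0);
    return (io_len + page_len - 1) / page_len;
}

/* Hypothetical helper: IU length when ndesc descriptors travel in the IU.
 * One descriptor uses the direct format; more need the indirect header. */
static size_t srp_iu_len(size_t ndesc)
{
    if (ndesc <= 1)
        return SRP_CMD_HDR_LEN + ndesc * SRP_DESC_LEN;
    return SRP_CMD_HDR_LEN + SRP_INDIRECT_HDR + ndesc * SRP_DESC_LEN;
}

/* Hypothetical helper: the s/g limit an initiator could report to the
 * upper layers so that every descriptor fits in an IU of iu_len bytes. */
static size_t srp_max_descs(size_t iu_len)
{
    size_t direct = SRP_CMD_HDR_LEN + SRP_DESC_LEN;
    size_t indirect = SRP_CMD_HDR_LEN + SRP_INDIRECT_HDR;

    if (iu_len < direct)
        return 0;
    if (iu_len < indirect + 2 * SRP_DESC_LEN)
        return 1;
    return (iu_len - indirect) / SRP_DESC_LEN;
}
```

Under these assumptions, srp_descs_for_io(1024 * 1024, 4096) gives 256 descriptors and srp_iu_len(256) gives 4164 bytes, which agrees with "something over 4096 bytes". A 64KB I/O with 4KB pages needs 16 descriptors, or 324 bytes, in the neighborhood of the 350-byte figure. Going the other way, srp_max_descs(1024) is 59, which is the kind of limit that could be passed up as an s/g list length for a 1KB IU.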
The other side is the number of trips from the application thru the scsi and srp layers per i/o. But again, if extra trips are made only 1% of the time, then my guess is that smaller iu's would be better. I do find some appeal in having the internal initiator iu size be able to be larger (easily) than the on-the-wire iu size. If it were hard to do then the appeal would not outweigh the cost. As long as the target is able to set the iu size and the target can set the iu size to be fairly small, then I'm ok with just passing that size on to the scsi upper layer. I'm also ok with a per-connection internal initiator iu size if someone wants to code that. Ken Jeffries From rolandd at cisco.com Tue Nov 1 17:26:54 2005 From: rolandd at cisco.com (Roland Dreier) Date: Tue, 01 Nov 2005 17:26:54 -0800 Subject: [openib-general] [PATCH] kmalloc + memset(, 0, ) -> kzalloc conversions Message-ID: <524q6vc00x.fsf@cisco.com> Anyone have any objection to me committing the following patch? It has the following effect on an x86_64 build: text data bss dec hex filename 220354 7416 1336 229106 37ef2 drivers/infiniband/built-in.o-before 219826 7416 1336 228578 37ce2 drivers/infiniband/built-in.o-after Index: infiniband/ulp/ipoib/ipoib_main.c =================================================================== --- infiniband/ulp/ipoib/ipoib_main.c (revision 3935) +++ infiniband/ulp/ipoib/ipoib_main.c (working copy) @@ -729,25 +729,21 @@ int ipoib_dev_init(struct net_device *de /* Allocate RX/TX "rings" to hold queued skbs */ - priv->rx_ring = kmalloc(IPOIB_RX_RING_SIZE * sizeof (struct ipoib_rx_buf), + priv->rx_ring = kzalloc(IPOIB_RX_RING_SIZE * sizeof (struct ipoib_rx_buf), GFP_KERNEL); if (!priv->rx_ring) { printk(KERN_WARNING "%s: failed to allocate RX ring (%d entries)\n", ca->name, IPOIB_RX_RING_SIZE); goto out; } - memset(priv->rx_ring, 0, - IPOIB_RX_RING_SIZE * sizeof (struct ipoib_rx_buf)); - priv->tx_ring = kmalloc(IPOIB_TX_RING_SIZE * sizeof (struct ipoib_tx_buf), +
priv->tx_ring = kzalloc(IPOIB_TX_RING_SIZE * sizeof (struct ipoib_tx_buf), GFP_KERNEL); if (!priv->tx_ring) { printk(KERN_WARNING "%s: failed to allocate TX ring (%d entries)\n", ca->name, IPOIB_TX_RING_SIZE); goto out_rx_ring_cleanup; } - memset(priv->tx_ring, 0, - IPOIB_TX_RING_SIZE * sizeof (struct ipoib_tx_buf)); /* priv->tx_head & tx_tail are already 0 */ Index: infiniband/ulp/ipoib/ipoib_multicast.c =================================================================== --- infiniband/ulp/ipoib/ipoib_multicast.c (revision 3935) +++ infiniband/ulp/ipoib/ipoib_multicast.c (working copy) @@ -135,12 +135,10 @@ static struct ipoib_mcast *ipoib_mcast_a { struct ipoib_mcast *mcast; - mcast = kmalloc(sizeof (*mcast), can_sleep ? GFP_KERNEL : GFP_ATOMIC); + mcast = kzalloc(sizeof *mcast, can_sleep ? GFP_KERNEL : GFP_ATOMIC); if (!mcast) return NULL; - memset(mcast, 0, sizeof (*mcast)); - init_completion(&mcast->done); mcast->dev = dev; Index: infiniband/core/agent.c =================================================================== --- infiniband/core/agent.c (revision 3935) +++ infiniband/core/agent.c (working copy) @@ -155,13 +155,12 @@ int ib_agent_port_open(struct ib_device int ret; /* Create new device info */ - port_priv = kmalloc(sizeof *port_priv, GFP_KERNEL); + port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL); if (!port_priv) { printk(KERN_ERR SPFX "No memory for ib_agent_port_private\n"); ret = -ENOMEM; goto error1; } - memset(port_priv, 0, sizeof *port_priv); /* Obtain send only MAD agent for SMI QP */ port_priv->agent[0] = ib_register_mad_agent(device, port_num, Index: infiniband/core/cm.c =================================================================== --- infiniband/core/cm.c (revision 3935) +++ infiniband/core/cm.c (working copy) @@ -544,11 +544,10 @@ struct ib_cm_id *ib_create_cm_id(struct struct cm_id_private *cm_id_priv; int ret; - cm_id_priv = kmalloc(sizeof *cm_id_priv, GFP_KERNEL); + cm_id_priv = kzalloc(sizeof *cm_id_priv, GFP_KERNEL); if 
(!cm_id_priv) return ERR_PTR(-ENOMEM); - memset(cm_id_priv, 0, sizeof *cm_id_priv); cm_id_priv->id.state = IB_CM_IDLE; cm_id_priv->id.device = device; cm_id_priv->id.cm_handler = cm_handler; @@ -621,10 +620,9 @@ static struct cm_timewait_info * cm_crea { struct cm_timewait_info *timewait_info; - timewait_info = kmalloc(sizeof *timewait_info, GFP_KERNEL); + timewait_info = kzalloc(sizeof *timewait_info, GFP_KERNEL); if (!timewait_info) return ERR_PTR(-ENOMEM); - memset(timewait_info, 0, sizeof *timewait_info); timewait_info->work.local_id = local_id; INIT_WORK(&timewait_info->work.work, cm_work_handler, Index: infiniband/core/uverbs_main.c =================================================================== --- infiniband/core/uverbs_main.c (revision 3935) +++ infiniband/core/uverbs_main.c (working copy) @@ -725,12 +725,10 @@ static void ib_uverbs_add_one(struct ib_ if (!device->alloc_ucontext) return; - uverbs_dev = kmalloc(sizeof *uverbs_dev, GFP_KERNEL); + uverbs_dev = kzalloc(sizeof *uverbs_dev, GFP_KERNEL); if (!uverbs_dev) return; - memset(uverbs_dev, 0, sizeof *uverbs_dev); - kref_init(&uverbs_dev->ref); spin_lock(&map_lock); Index: infiniband/core/device.c =================================================================== --- infiniband/core/device.c (revision 3935) +++ infiniband/core/device.c (working copy) @@ -161,17 +161,9 @@ static int alloc_name(char *name) */ struct ib_device *ib_alloc_device(size_t size) { - void *dev; - BUG_ON(size < sizeof (struct ib_device)); - dev = kmalloc(size, GFP_KERNEL); - if (!dev) - return NULL; - - memset(dev, 0, size); - - return dev; + return kzalloc(size, GFP_KERNEL); } EXPORT_SYMBOL(ib_alloc_device); Index: infiniband/core/mad.c =================================================================== --- infiniband/core/mad.c (revision 3935) +++ infiniband/core/mad.c (working copy) @@ -255,12 +255,11 @@ struct ib_mad_agent *ib_register_mad_age } /* Allocate structures */ - mad_agent_priv = kmalloc(sizeof *mad_agent_priv, 
GFP_KERNEL); + mad_agent_priv = kzalloc(sizeof *mad_agent_priv, GFP_KERNEL); if (!mad_agent_priv) { ret = ERR_PTR(-ENOMEM); goto error1; } - memset(mad_agent_priv, 0, sizeof *mad_agent_priv); mad_agent_priv->agent.mr = ib_get_dma_mr(port_priv->qp_info[qpn].qp->pd, IB_ACCESS_LOCAL_WRITE); @@ -448,14 +447,13 @@ struct ib_mad_agent *ib_register_mad_sno goto error1; } /* Allocate structures */ - mad_snoop_priv = kmalloc(sizeof *mad_snoop_priv, GFP_KERNEL); + mad_snoop_priv = kzalloc(sizeof *mad_snoop_priv, GFP_KERNEL); if (!mad_snoop_priv) { ret = ERR_PTR(-ENOMEM); goto error1; } /* Now, fill in the various structures */ - memset(mad_snoop_priv, 0, sizeof *mad_snoop_priv); mad_snoop_priv->qp_info = &port_priv->qp_info[qpn]; mad_snoop_priv->agent.device = device; mad_snoop_priv->agent.recv_handler = recv_handler; @@ -794,10 +792,9 @@ struct ib_mad_send_buf * ib_create_send_ (!rmpp_active && buf_size > sizeof(struct ib_mad))) return ERR_PTR(-EINVAL); - buf = kmalloc(sizeof *mad_send_wr + buf_size, gfp_mask); + buf = kzalloc(sizeof *mad_send_wr + buf_size, gfp_mask); if (!buf) return ERR_PTR(-ENOMEM); - memset(buf, 0, sizeof *mad_send_wr + buf_size); mad_send_wr = buf + buf_size; mad_send_wr->send_buf.mad = buf; @@ -1039,14 +1036,12 @@ static int method_in_use(struct ib_mad_m static int allocate_method_table(struct ib_mad_mgmt_method_table **method) { /* Allocate management method table */ - *method = kmalloc(sizeof **method, GFP_ATOMIC); + *method = kzalloc(sizeof **method, GFP_ATOMIC); if (!*method) { printk(KERN_ERR PFX "No memory for " "ib_mad_mgmt_method_table\n"); return -ENOMEM; } - /* Clear management method table */ - memset(*method, 0, sizeof **method); return 0; } @@ -1137,15 +1132,14 @@ static int add_nonoui_reg_req(struct ib_ class = &port_priv->version[mad_reg_req->mgmt_class_version].class; if (!*class) { /* Allocate management class table for "new" class version */ - *class = kmalloc(sizeof **class, GFP_ATOMIC); + *class = kzalloc(sizeof **class, 
GFP_ATOMIC); if (!*class) { printk(KERN_ERR PFX "No memory for " "ib_mad_mgmt_class_table\n"); ret = -ENOMEM; goto error1; } - /* Clear management class table */ - memset(*class, 0, sizeof(**class)); + /* Allocate method table for this management class */ method = &(*class)->method_table[mgmt_class]; if ((ret = allocate_method_table(method))) @@ -1209,25 +1203,24 @@ static int add_oui_reg_req(struct ib_mad mad_reg_req->mgmt_class_version].vendor; if (!*vendor_table) { /* Allocate mgmt vendor class table for "new" class version */ - vendor = kmalloc(sizeof *vendor, GFP_ATOMIC); + vendor = kzalloc(sizeof *vendor, GFP_ATOMIC); if (!vendor) { printk(KERN_ERR PFX "No memory for " "ib_mad_mgmt_vendor_class_table\n"); goto error1; } - /* Clear management vendor class table */ - memset(vendor, 0, sizeof(*vendor)); + *vendor_table = vendor; } if (!(*vendor_table)->vendor_class[vclass]) { /* Allocate table for this management vendor class */ - vendor_class = kmalloc(sizeof *vendor_class, GFP_ATOMIC); + vendor_class = kzalloc(sizeof *vendor_class, GFP_ATOMIC); if (!vendor_class) { printk(KERN_ERR PFX "No memory for " "ib_mad_mgmt_vendor_class\n"); goto error2; } - memset(vendor_class, 0, sizeof(*vendor_class)); + (*vendor_table)->vendor_class[vclass] = vendor_class; } for (i = 0; i < MAX_MGMT_OUI; i++) { @@ -2524,12 +2517,12 @@ static int ib_mad_port_open(struct ib_de char name[sizeof "ib_mad123"]; /* Create new device info */ - port_priv = kmalloc(sizeof *port_priv, GFP_KERNEL); + port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL); if (!port_priv) { printk(KERN_ERR PFX "No memory for ib_mad_port_private\n"); return -ENOMEM; } - memset(port_priv, 0, sizeof *port_priv); + port_priv->device = device; port_priv->port_num = port_num; spin_lock_init(&port_priv->reg_lock); Index: infiniband/core/sysfs.c =================================================================== --- infiniband/core/sysfs.c (revision 3935) +++ infiniband/core/sysfs.c (working copy) @@ -307,14 +307,13 @@ 
static ssize_t show_pma_counter(struct i if (!p->ibdev->process_mad) return sprintf(buf, "N/A (no PMA)\n"); - in_mad = kmalloc(sizeof *in_mad, GFP_KERNEL); + in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL); out_mad = kmalloc(sizeof *in_mad, GFP_KERNEL); if (!in_mad || !out_mad) { ret = -ENOMEM; goto out; } - memset(in_mad, 0, sizeof *in_mad); in_mad->mad_hdr.base_version = 1; in_mad->mad_hdr.mgmt_class = IB_MGMT_CLASS_PERF_MGMT; in_mad->mad_hdr.class_version = 1; @@ -508,10 +507,9 @@ static int add_port(struct ib_device *de if (ret) return ret; - p = kmalloc(sizeof *p, GFP_KERNEL); + p = kzalloc(sizeof *p, GFP_KERNEL); if (!p) return -ENOMEM; - memset(p, 0, sizeof *p); p->ibdev = device; p->port_num = port_num; Index: infiniband/core/ucm.c =================================================================== --- infiniband/core/ucm.c (revision 3935) +++ infiniband/core/ucm.c (working copy) @@ -172,11 +172,10 @@ static struct ib_ucm_context *ib_ucm_ctx struct ib_ucm_context *ctx; int result; - ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); + ctx = kzalloc(sizeof *ctx, GFP_KERNEL); if (!ctx) return NULL; - memset(ctx, 0, sizeof *ctx); atomic_set(&ctx->ref, 1); init_waitqueue_head(&ctx->wait); ctx->file = file; @@ -386,11 +385,10 @@ static int ib_ucm_event_handler(struct i ctx = cm_id->context; - uevent = kmalloc(sizeof(*uevent), GFP_KERNEL); + uevent = kzalloc(sizeof *uevent, GFP_KERNEL); if (!uevent) goto err1; - memset(uevent, 0, sizeof(*uevent)); uevent->ctx = ctx; uevent->cm_id = cm_id; uevent->resp.uid = ctx->uid; @@ -1345,11 +1343,10 @@ static void ib_ucm_add_one(struct ib_dev if (!device->alloc_ucontext) return; - ucm_dev = kmalloc(sizeof *ucm_dev, GFP_KERNEL); + ucm_dev = kzalloc(sizeof *ucm_dev, GFP_KERNEL); if (!ucm_dev) return; - memset(ucm_dev, 0, sizeof *ucm_dev); ucm_dev->ib_dev = device; ucm_dev->devnum = find_first_zero_bit(dev_map, IB_UCM_MAX_DEVICES); Index: infiniband/hw/mthca/mthca_profile.c =================================================================== 
--- infiniband/hw/mthca/mthca_profile.c (revision 3935) +++ infiniband/hw/mthca/mthca_profile.c (working copy) @@ -80,12 +80,10 @@ u64 mthca_make_profile(struct mthca_dev struct mthca_resource tmp; int i, j; - profile = kmalloc(MTHCA_RES_NUM * sizeof *profile, GFP_KERNEL); + profile = kzalloc(MTHCA_RES_NUM * sizeof *profile, GFP_KERNEL); if (!profile) return -ENOMEM; - memset(profile, 0, MTHCA_RES_NUM * sizeof *profile); - profile[MTHCA_RES_QP].size = dev_lim->qpc_entry_sz; profile[MTHCA_RES_EEC].size = dev_lim->eec_entry_sz; profile[MTHCA_RES_SRQ].size = dev_lim->srq_entry_sz; Index: infiniband/hw/mthca/mthca_mr.c =================================================================== --- infiniband/hw/mthca/mthca_mr.c (revision 3935) +++ infiniband/hw/mthca/mthca_mr.c (working copy) @@ -140,13 +140,11 @@ static int __devinit mthca_buddy_init(st buddy->max_order = max_order; spin_lock_init(&buddy->lock); - buddy->bits = kmalloc((buddy->max_order + 1) * sizeof (long *), + buddy->bits = kzalloc((buddy->max_order + 1) * sizeof (long *), GFP_KERNEL); if (!buddy->bits) goto err_out; - memset(buddy->bits, 0, (buddy->max_order + 1) * sizeof (long *)); - for (i = 0; i <= buddy->max_order; ++i) { s = BITS_TO_LONGS(1 << (buddy->max_order - i)); buddy->bits[i] = kmalloc(s * sizeof (long), GFP_KERNEL); From halr at voltaire.com Tue Nov 1 18:26:56 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 01 Nov 2005 21:26:56 -0500 Subject: [openib-general] opensm errors with ehca In-Reply-To: <20051030235504.GT3275@kalmia.hozed.org> References: <20051030235504.GT3275@kalmia.hozed.org> Message-ID: <1130898415.4381.1784.camel@hal.voltaire.com> On Sun, 2005-10-30 at 18:55, Troy Benjegerdes wrote: > The firmware on the IBM eHCA causes opensm to spit out these kinds of > errors all the time.. > > Is there a way we can either not send P_KeyTable requests to any eHCA > guids, or figure out what (if anything) is broken in their firmware? 
> > Is this a spec violation, or just ambiguities in implementation? > > Oct 30 17:49:46 053820 [43005960] -> umad_receiver: ERR 5409: send > completed wit > h error (method=0x1 attr=0x16 trans_id=0x158c) -- dropping. > Oct 30 17:49:46 053830 [43005960] -> umad_receiver: ERR 5411: DR SMP hop > ptr 0 h > op count 2 DR SLID 0x0 DR DLID 0x0 > Oct 30 17:49:46 053839 [43005960] -> __osm_sm_mad_ctrl_send_err_cb: ERR > 3113: MA > D completed in error (IB_TIMEOUT). > Oct 30 17:49:46 053861 [43005960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x1 (SubnGet) > D bit...................0x0 > status..................0x0 > hop_ptr.................0x0 > hop_count...............0x2 > trans_id................0x158c > attr_id.................0x16 (P_KeyTable) > resv....................0x0 > attr_mod................0x260000 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16] > Return path: [0][0][0] > Reserved: [0][0][0][0][0][0][0] Can you try the following opensm patch and see if this eliminates those timeout messages ? This patch clears the high part of the attribute modifier when not a switch (when obtaining the PKeyTable). -- Hal Index: osm_port_info_rcv.c =================================================================== --- osm_port_info_rcv.c (revision 3906) +++ osm_port_info_rcv.c (working copy) @@ -430,6 +430,7 @@ void osm_pkey_get_tables( osm_dr_path_t path; uint8_t port_num; uint16_t block_num, max_blocks; + uint32_t attr_mod_ho; osm_switch_t* p_switch; OSM_LOG_ENTER( p_log, osm_physp_has_pkey ); @@ -455,7 +456,7 @@ void osm_pkey_get_tables( else { /* This is a switch, and not a management port. The maximum blocks is defined - on the switch info partition enforcement cap. */ + in the switch info partition enforcement cap. 
*/ p_switch = osm_get_switch_by_guid(p_subn, p_node->node_info.node_guid); if (! p_switch) @@ -472,10 +473,14 @@ void osm_pkey_get_tables( for (block_num = 0 ; block_num < max_blocks ; block_num++) { + if (osm_node_get_type( p_node ) != IB_NODE_TYPE_SWITCH) + attr_mod_ho = block_num; + else + attr_mod_ho = block_num | (port_num << 16); status = osm_req_get( p_req, &path, IB_MAD_ATTR_P_KEY_TABLE, - cl_hton32(block_num | (port_num << 16) ), + cl_hton32(attr_mod_ho), CL_DISP_MSGID_NONE, &context ); From ftillier at silverstorm.com Tue Nov 1 20:22:14 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Tue, 1 Nov 2005 20:22:14 -0800 Subject: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation In-Reply-To: <020501c5df46$a58ec870$0a97a8c0@blacktip> Message-ID: <000101c5df65$01a94a90$9e5aa8c0@infiniconsys.com> > From: Kenneth L Jeffries [mailto:kenjeffries at austin.rr.com] > Sent: Tuesday, November 01, 2005 4:45 PM > > > My objections are the following (as I said in my previous mail): > > - I don't like allocating a 1 KB IU for every send IU, since most of > > that memory will probably never be used. > > - I'm not convinced that it's _ever_ a win to have the target do > > another RDMA to fetch the indirect buffer list. You need to > > convince me that it's not better to simply tell the upper layers > > what the limit on s/g list length is to fit in the current IU size. > > I also don't want to allocate 1KB IU's. If IU's were fixed size, I'd want > (probably, depending on performance testing) a fixed size of 350 bytes > (from Fab Tillier's 64KB i/o, 4KB pages, Windows) or possibly even > the minimum DDBD (as Fab Tillier also says). 1KB IU's with thousands > of RC's cause me a lot of wasted space heartburn. Even 350 bytes is a burden - imagine a target that supports a queue depth of 1000 I/Os from a few dozen initiators.
Ideally, I'd like to see us use just DDBDs and the 64-byte IU, along with registering the data buffers on a per-I/O basis, either via FMR or regular MRs. > [as an aside, it sure would be nice if we could do an SRP-3 (since SRP-2 > is dead) where multiple direct descriptors would be allowed. The only > way to get multiple descriptors now is with indirect descriptors.] That saves you 20 bytes - not a huge gain. > I am pretty sure that someone doing a video server might want to do, say, > 1MB i/o's. 1MB with 4KB pages means 256 descriptors and an iu of > something over 4096 bytes. I definitely don't want to be told by the srp > initiator that I need to use 4KB iu's. (So we agree there.) For large I/O, doing a registration of the buffer and sending a DDBD with a single descriptor might well provide the best performance. If you look at the traffic on the wire, having the target do multiple page-sized RDMA operations is far less efficient than creating a virtual contiguous (to the target) region that a single RDMA operation can service. - Fab From info at dswrench.com Tue Nov 1 20:33:30 2005 From: info at dswrench.com (info at dswrench.com) Date: Tue, 01 Nov 2005 20:33:30 -0800 Subject: [openib-general] DS4000 Storage Server (Engenio based) Management Protocol Analyzer Message-ID: Dswrench.com announces a new debugging tool for the "DS4000 storage server (Engenio based disk storage array)." We are proud to introduce the DS4000 Management Protocol Analyzer, or DSMPA. DSMPA is freeware built by engineers for engineers, so we encourage you to use often and spread the word! What it does: DSPMA captures management network traffic between SANtricity or other management software and disk storage arrays. It assembles network packets into DS4000 management objects. The captured objects can be viewed graphically and played back. 
It is a powerful tool for DS4000 storage server support center, storage administrators, storage management software developers and quality assurance engineers. To view sample shots of DSMPA capabilities, click the following links: -To access the User guide, please click http://www.dswrench.com/documents/dsmpa.pdf. -To access the DSMPA main screen, please click http://www.dswrench.com/documents/CaptureAndView.html -DSMPA can save a communication session to html file. To view a sample session output, please click http://www.dswrench.com/documents/session_sample.html. -DSMPA also captures SANtricity generated network traffic. This page (http://www.dswrench.com/documents/snmp_sample.html) shows captured SNMP trap fired by the SANtricity management software. How to get it: DSMPA is now available for free download from http://www.dswrench.com For more information, please visit http://www.dswrench.com. Thank you for taking a few minutes to improve the operation of your DS4000 storage server. Please visit dswrench.com often for updates and new software. Use often and spread the word! Best Regards, The Dswrench.com Team From hozer at hozed.org Tue Nov 1 20:49:18 2005 From: hozer at hozed.org (Troy Benjegerdes) Date: Tue, 1 Nov 2005 22:49:18 -0600 Subject: [openib-general] opensm errors with ehca In-Reply-To: <1130898415.4381.1784.camel@hal.voltaire.com> References: <20051030235504.GT3275@kalmia.hozed.org> <1130898415.4381.1784.camel@hal.voltaire.com> Message-ID: <20051102044918.GZ3275@kalmia.hozed.org> > Can you try the following opensm patch and see if this eliminates those > timeout messages ? > > This patch clears the high part of the attribute modifier when not a > switch (when obtaining the PKeyTable). 
> > -- Hal > > Index: osm_port_info_rcv.c > =================================================================== > --- osm_port_info_rcv.c (revision 3906) > +++ osm_port_info_rcv.c (working copy) > @@ -430,6 +430,7 @@ void osm_pkey_get_tables( > osm_dr_path_t path; > uint8_t port_num; > uint16_t block_num, max_blocks; > + uint32_t attr_mod_ho; > osm_switch_t* p_switch; > > OSM_LOG_ENTER( p_log, osm_physp_has_pkey ); > @@ -455,7 +456,7 @@ void osm_pkey_get_tables( > else > { > /* This is a switch, and not a management port. The maximum blocks is defined > - on the switch info partition enforcement cap. */ > + in the switch info partition enforcement cap. */ > p_switch = osm_get_switch_by_guid(p_subn, p_node->node_info.node_guid); > > if (! p_switch) > @@ -472,10 +473,14 @@ void osm_pkey_get_tables( > > for (block_num = 0 ; block_num < max_blocks ; block_num++) > { > + if (osm_node_get_type( p_node ) != IB_NODE_TYPE_SWITCH) > + attr_mod_ho = block_num; > + else > + attr_mod_ho = block_num | (port_num << 16); > status = osm_req_get( p_req, > &path, > IB_MAD_ATTR_P_KEY_TABLE, > - cl_hton32(block_num | (port_num << 16) ), > + cl_hton32(attr_mod_ho), > CL_DISP_MSGID_NONE, > &context ); > This seems to ignore the IBM logical HCA, but gives the same thing on the IBM Logical switch. Is there a way to ignore this as well? switchguids=0x2550000038580 Switch 63 "S-0002550000038580" # IBM Logical Switch 1 port 0 lid 21 [2] "H-0002550000038500"[1] [1] "S-0002c90200402917"[22] I still get: Nov 01 22:34:08 660205 [43005960] -> umad_receiver: ERR 5409: send completed wit h error (method=0x1 attr=0x16 trans_id=0x13c9) -- dropping. Nov 01 22:34:08 660213 [43005960] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 h op count 2 DR SLID 0x0 DR DLID 0x0 Nov 01 22:34:08 660221 [43005960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MA D completed in error (IB_TIMEOUT). 
Nov 01 22:34:08 660243 [43005960] -> SMP dump: base_ver................0x1 mgmt_class..............0x81 class_ver...............0x1 method..................0x1 (SubnGet) D bit...................0x0 status..................0x0 hop_ptr.................0x0 hop_count...............0x2 trans_id................0x13c9 attr_id.................0x16 (P_KeyTable) resv....................0x0 attr_mod................0x10000 m_key...................0x0000000000000000 dr_slid.................0xFFFF dr_dlid.................0xFFFF Initial path: [0][1][16] Return path: [0][0][0] Reserved: [0][0][0][0][0][0][0] From panda at cse.ohio-state.edu Tue Nov 1 20:54:49 2005 From: panda at cse.ohio-state.edu (Dhabaleswar Panda) Date: Tue, 1 Nov 2005 23:54:49 -0500 (EST) Subject: [openib-general] Announcing the release of MVAPICH2 0.9.0 (MPI-2 over InfiniBand and other RDMA Interconnects) Message-ID: <200511020454.jA24snOm029539@xi.cse.ohio-state.edu> The MVAPICH team is pleased to announce the release of MVAPICH2 0.9.0 for the following platforms, OS, compilers, and InfiniBand adapters: - Platforms: EM64T, Opteron, IA-32, and Mac G5 - Operating Systems: Linux, Solaris, and Mac OSX - Compilers: gcc, intel, and pgi - InfiniBand Adapters: Mellanox adapters with PCI-X and PCI-Express (SDR and DDR with mem-full and mem-free cards) In addition to delivering high performance with VAPI interface, MVAPICH2 0.9.0 also provides uDAPL support for portability across networks and platforms with highest performance. The uDAPL interface of this release has been tested with InfiniBand (OpenIB/Gen2 uDAPL, IBGD/uDAPL, and Solaris IBTL/uDAPL), Ammasso GigE (Ammasso uDAPL), and Myrinet (DAPL-GM beta). Starting with this release, MVAPICH2 enables InfiniBand support for Solaris environment through uDAPL support. MVAPICH2 0.9.0 is being distributed as a single integrated package (with MPICH2 1.0.2p1 and MVICH). It is available under BSD license. 
This new release has the following features: - MPI-2 functionalities (one-sided, collectives, datatype) - all MPI-1 functionalities - high performance and optimized support for all one-sided operations (Get, Put, and Accumulate) - support for active and passive synchronization - optimized two-sided operations with RDMA support - efficient memory registration/de-registration schemes for RDMA operations - optimized intra-node shared memory support (bus-based and NUMA) - shared library support - ROMIO support - uDAPL support (tested for InfiniBand on Linux and Solaris, Myrinet, and Ammasso GigE) - scalable job start-up - optimized and tuned for the above platforms and different network interfaces (PCI-X and PCI-Express with SDR and DDR) - support for multiple compilers (gcc, icc, and pgi) - single code base for all of the above platforms and OS - memory efficient scaling modes for medium and large clusters Other features of this release include: - Excellent performance: Sample performance numbers include: Two-sided operations on EM64T, PCI-Ex: - 3.47 microsec one-way latency with IBA-SDR - 1502 MB/sec unidirectional bandwidth with IBA-DDR - 2752 MB/sec bidirectional bandwidth with IBA-DDR One-sided operations on EM64T, PCI-Ex, IBA-DDR: - 5.96 microsec Put latency - 1503 MB/sec unidirectional PUT bandwidth - 2759 MB/sec bidirectional PUT bandwidth Two-sided operations with Solaris uDAPL/IBTL on Opteron, PCI-X, IBA-SDR: - 5.58 microsec one-way latency - 655 MB/sec unidirectional bandwidth - 799 MB/sec bidirectional bandwidth Two-sided operations with OpenIB/Gen2 uDAPL on Opteron, PCI-Ex IBA-SDR: - 3.63 microsec one-way latency - 962 MB/sec unidirectional bandwidth - 1869 MB/sec bidirectional bandwidth Performance numbers for all other platforms, system configurations and operations can be viewed by visiting `Performance Results' section of the project's web page. 
- Similar performance with MVAPICH: With the new ADI-3-level design, MVAPICH2 0.9.0 delivers similar performance for two-sided operations compared to MVAPICH 0.9.5. Organizations and users interested in getting the best performance for both two-sided and one-sided operations may migrate from the MVAPICH code base to the MVAPICH2 code base. - A set of benchmarks to evaluate both two-sided and one-sided operations (Put, Get, and Accumulate) - An enhanced and detailed `User Guide' to assist users: - to install this package on different platforms with both interfaces (VAPI and uDAPL) and different options - to vary different parameters of the MPI installation to extract maximum performance and achieve scalability, especially on large-scale systems. You are welcome to download the MVAPICH2 0.9.0 package and access relevant information from the following URL: http://nowlab.cse.ohio-state.edu/projects/mpi-iba/ A subsequent version with support for OpenIB/Gen2 will be available soon. All feedback, including bug reports and hints for performance tuning, is welcome. Please send an e-mail to mvapich-help at cse.ohio-state.edu. Thanks, MVAPICH Team at OSU/NBCL ---------- PS: If you would like to be removed from this mailing list, please send an e-mail to mvapich_request at cse.ohio-state.edu. ====================================================================== The MVAPICH/MVAPICH2 project is currently supported with funding from the U.S. National Science Foundation, the U.S. DOE Office of Science, Mellanox, Intel, Cisco Systems, Sun Microsystems, and Linux Networx; and with equipment support from AMD, Ammasso, Apple, IBM, Intel, Mellanox, Microway, PathScale, SilverStorm and Sun Microsystems. Other technology partners include Etnus.
====================================================================== From pradeep at us.ibm.com Tue Nov 1 22:02:20 2005 From: pradeep at us.ibm.com (Pradeep Satyanarayana) Date: Tue, 1 Nov 2005 22:02:20 -0800 Subject: [openib-general] Re: Questions about libibat, ib_uat, and ib_a In-Reply-To: Message-ID: openib-general-bounces at openib.org wrote on 10/18/2005 03:40:47 PM: > > > > > On Mon, 2005-10-18 at 10:07, Kevin Reilly wrote: > >On Mon, 2005-10-17 at 10:07, Hal Rosenstock wrote: > >> > Should this code work, because it seems that out_dev is a kernel > >> > address (platform: PPC64) which cannot accessed by a userspace > >> > program. Via GDB I can see that rt has the following content: > >> > > >> > The address is rt->out_dev = 0xc0000000cffaa800 which looks like a > >> > kernel address. > >> > >> Yes, this is a bug which has been previously pointed out on the list and > >> not fixed. > > Can some one point me to the previous discussions on this list (search did not yield any results)? The problem is because of a copy_to_user (in uat.c) between struct ib_at_ib_route which are different between user and kernel space causing this crash. What was the rationale of putting a pointer to struct ibv_device in the user space version of ib_at_ib_route? The out_dev field in user space is not really used as far as I could see. > >The fix for this involves an ABI change: it should return the GID of the > >outgoing IB device. > > Would a simple solution like adding a device_name field to both the ib_at_ib_route structures be acceptable? The out_dev field could be used as a "reserved" field in user space and not be used. That should not break anything as far as I can see. > >-- Hal Pradeep pradeep at us.ibm.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mst at mellanox.co.il Wed Nov 2 00:45:01 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 2 Nov 2005 10:45:01 +0200 Subject: [openib-general] scsi/srp.h Message-ID: <20051102084501.GR31134@mellanox.co.il> Roland, would you mind moving scsi/srp.h from ulp/srp to infiniband/include in subversion, please? The fact that its under ulp/srp breaks build of a tree linked to under drivers/infiniband drivers/infiniband/ulp/srp/ib_srp.c:49:22: scsi/srp.h: No such file or directory And I think it makes sense to keep includes in one place, right? -- MST From mst at mellanox.co.il Wed Nov 2 00:54:29 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 2 Nov 2005 10:54:29 +0200 Subject: [openib-general] Re: 2.6.14 patches In-Reply-To: <20051030123622.GD4769@mellanox.co.il> References: <20051030123622.GD4769@mellanox.co.il> Message-ID: <20051102085429.GS31134@mellanox.co.il> Quoting Michael S. Tsirkin : > Sean, Hal, now that 2.6.14 is out, do you plan to apply > the patches in > https://openib.org/svn/gen2/trunk/src/linux-kernel/patches/?
> Once you do, I'll put reverted patches in the backport directory. Guys, I know there are plans for removing at.c, but since it is, for now, included in the makefile, I plan to apply linux-2.6.14-rc3-at.diff and check in, to avoid the warning for 2.6.14 builds. Does anyone have a problem with this? -- MST From mst at mellanox.co.il Wed Nov 2 01:03:10 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 2 Nov 2005 11:03:10 +0200 Subject: [openib-general] Re: build fails on revision 3930 In-Reply-To: <6.2.3.4.2.20051101114151.0227ba00@cic-mail.lanl.gov> References: <6.2.3.4.2.20051101114151.0227ba00@cic-mail.lanl.gov> Message-ID: <20051102090310.GT31134@mellanox.co.il> Quoting James W. Barker : > Subject: build fails on revision 3930 > > All, > > Following the instructions posted in your > "installation cheetsheet" after I issue the > command "make modules modules_install" the build > fails with the error message below (this is > revision 3930), the same procedure (not sure > which revision number) was successful last week: > > CC [M] drivers/infiniband/core/addr.o > drivers/infiniband/core/addr.c:330: warning: > initialization from incompatible pointer type Looks like you are using kernel 2.6.13 or older. The subversion trunk is for the latest kernel.org kernel only, which is 2.6.14 as of this writing. Please note that to load the sdp, at and addr modules in kernel 2.6.14, you have to apply the following kernel patch: https://openib.org/svn/gen2/trunk/src/linux-kernel/patches/linux-2.6.14-fib-frontend.diff If you want to work on older kernels, you need to apply the backport patches to the subversion repository. Find them here: https://openib.org/svn/gen2/branches/backport/ -- MST From mst at mellanox.co.il Wed Nov 2 02:30:08 2005 From: mst at mellanox.co.il (Michael S.
Tsirkin) Date: Wed, 2 Nov 2005 12:30:08 +0200 Subject: [openib-general] [PATCH applied] remove side effects from kunmap_atomic Message-ID: <20051102103007.GU31134@mellanox.co.il> The following is already applied. --- On some platforms kunmap_atomic is an empty macro. Therefore it is unsafe for calls to kunmap_atomic to have side effects, such as incrementing a counter. Signed-off-by: Michael S. Tsirkin Index: linux-kernel/drivers/infiniband/ulp/sdp/sdp_send.c =================================================================== --- linux-kernel/drivers/infiniband/ulp/sdp/sdp_send.c (revision 3926) +++ linux-kernel/drivers/infiniband/ulp/sdp/sdp_send.c (working copy) @@ -727,7 +727,8 @@ static int sdp_send_iocb_buff_write(stru offset += copy; offset &= (~PAGE_MASK); - kunmap_atomic(iocb->page_array[counter++], KM_IRQ0); + kunmap_atomic(iocb->page_array[counter], KM_IRQ0); + ++counter; local_irq_restore(flags); } Index: linux-kernel/drivers/infiniband/ulp/sdp/sdp_recv.c =================================================================== --- linux-kernel/drivers/infiniband/ulp/sdp/sdp_recv.c (revision 3926) +++ linux-kernel/drivers/infiniband/ulp/sdp/sdp_recv.c (working copy) @@ -610,7 +610,8 @@ static int sdp_read_buff_iocb(struct sdp iocb->io_addr += copy; - kunmap_atomic(iocb->page_array[counter++], KM_IRQ0); + kunmap_atomic(iocb->page_array[counter], KM_IRQ0); + ++counter; local_irq_restore(flags); } -- MST From mst at mellanox.co.il Wed Nov 2 03:36:58 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 2 Nov 2005 13:36:58 +0200 Subject: [openib-general] Re: 2.6.14 patches In-Reply-To: <1130929425.4381.3327.camel@hal.voltaire.com> References: <1130929425.4381.3327.camel@hal.voltaire.com> Message-ID: <20051102113658.GV31134@mellanox.co.il> > > Guys, I know there are plans for removing at.c, but since it is, for > now, > > included in the makefile, I plan to apply linux-2.6.14-rc3-at.diff > > and check in, to avoid the warning for 2.6.14 builds. 
> > Does anyone have a problem with this? > > It's fine to do this. I haven't be able to upgrade to 2.6.14 yet. I was > going to do this during that process. > > -- Hal OK, I did this, removed the patch, and updated the backport directory appropriately. -- MST From halr at voltaire.com Wed Nov 2 03:40:39 2005 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 2 Nov 2005 13:40:39 +0200 Subject: [openib-general] Re: 2.6.14 patches Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F5175CD8@taurus.voltaire.com> On Wed, 2005-11-02 at 03:54, Michael S. Tsirkin wrote: > Quoting Michael S. Tsirkin : > > Sean, Hal, now that 2.6.14 is out, do you plan to apply > > the patches in > > https://openib.org/svn/gen2/trunk/src/linux-kernel/patches/? > > Once you do, I'll put reverted patches in the backport directory. > > Guys, I know there are plans for removing at.c, but since it is, for now, > included in the makefile, I plan to apply linux-2.6.14-rc3-at.diff > and check in, to avoid the warning for 2.6.14 builds. > Does anyone have a problem with this? If you can't wait for me to do it, it's fine to go ahead. I haven't found the time to upgrade to 2.6.14 yet. I was going to take care of this during that process.
-- Hal From halr at voltaire.com Wed Nov 2 03:44:55 2005 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 2 Nov 2005 13:44:55 +0200 Subject: [openib-general] opensm errors with ehca Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F5175CD9@taurus.voltaire.com> On Tue, 2005-11-01 at 23:49, Troy Benjegerdes wrote: > > Can you try the following opensm patch and see if this eliminates those > > timeout messages ? > > > > This patch clears the high part of the attribute modifier when not a > > switch (when obtaining the PKeyTable). > > > > -- Hal > > > > Index: osm_port_info_rcv.c > > =================================================================== > > --- osm_port_info_rcv.c (revision 3906) > > +++ osm_port_info_rcv.c (working copy) > > @@ -430,6 +430,7 @@ void osm_pkey_get_tables( > > osm_dr_path_t path; > > uint8_t port_num; > > uint16_t block_num, max_blocks; > > + uint32_t attr_mod_ho; > > osm_switch_t* p_switch; > > > > OSM_LOG_ENTER( p_log, osm_physp_has_pkey ); > > @@ -455,7 +456,7 @@ void osm_pkey_get_tables( > > else > > { > > /* This is a switch, and not a management port. The maximum blocks is defined > > - on the switch info partition enforcement cap. */ > > + in the switch info partition enforcement cap. */ > > p_switch = osm_get_switch_by_guid(p_subn, p_node->node_info.node_guid); > > > > if (! p_switch) > > @@ -472,10 +473,14 @@ void osm_pkey_get_tables( > > > > for (block_num = 0 ; block_num < max_blocks ; block_num++) > > { > > + if (osm_node_get_type( p_node ) != IB_NODE_TYPE_SWITCH) > > + attr_mod_ho = block_num; > > + else > > + attr_mod_ho = block_num | (port_num << 16); > > status = osm_req_get( p_req, > > &path, > > IB_MAD_ATTR_P_KEY_TABLE, > > - cl_hton32(block_num | (port_num << 16) ), > > + cl_hton32(attr_mod_ho), > > CL_DISP_MSGID_NONE, > > &context ); > > > > This seems to ignore the IBM logical HCA, but gives the same thing > on the IBM Logical switch. Is there a way to ignore this as well? It is correct for the logical switch. 
It needs to be handled there per the spec. The high 16 bits are required to be the port number, whereas for HCAs and routers this field is ignored. This _will_ require a firmware change. I'm unaware of a workaround for this unless we want to do it for the IBM OUI only, temporarily. Will they all have this OUI 000255 ? BTW, getting this error does not appear to cause any bad effects. Does this agree with what you are seeing ? -- Hal > switchguids=0x2550000038580 > Switch 63 "S-0002550000038580" # IBM Logical Switch 1 port 0 > lid 21 > [2] "H-0002550000038500"[1] > [1] "S-0002c90200402917"[22] > > > I still get: > > Nov 01 22:34:08 660205 [43005960] -> umad_receiver: ERR 5409: send completed with error (method=0x1 attr=0x16 trans_id=0x13c9) -- dropping. > Nov 01 22:34:08 660213 [43005960] -> umad_receiver: ERR 5411: DR SMP hop ptr 0 hop count 2 DR SLID 0x0 DR DLID 0x0 > Nov 01 22:34:08 660221 [43005960] -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_TIMEOUT).
> Nov 01 22:34:08 660243 [43005960] -> SMP dump: > base_ver................0x1 > mgmt_class..............0x81 > class_ver...............0x1 > method..................0x1 (SubnGet) > D bit...................0x0 > status..................0x0 > hop_ptr.................0x0 > hop_count...............0x2 > trans_id................0x13c9 > attr_id.................0x16 > (P_KeyTable) > resv....................0x0 > attr_mod................0x10000 > m_key...................0x0000000000000000 > dr_slid.................0xFFFF > dr_dlid.................0xFFFF > > Initial path: [0][1][16] > Return path: [0][0][0] > Reserved: [0][0][0][0][0][0][0] > > > From halr at voltaire.com Wed Nov 2 03:58:02 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Nov 2005 06:58:02 -0500 Subject: [openib-general] Re: Questions about libibat, ib_uat, and ib_a In-Reply-To: References: Message-ID: <1130932682.4381.3532.camel@hal.voltaire.com> On Wed, 2005-11-02 at 01:02, Pradeep Satyanarayana wrote: > openib-general-bounces at openib.org wrote on 10/18/2005 03:40:47 PM: > > > > > > > > > > > On Mon, 2005-10-18 at 10:07, Kevin Reilly wrote: > > >On Mon, 2005-10-17 at 10:07, Hal Rosenstock wrote: > > >> > Should this code work, because it seems that out_dev is a > kernel > > >> > address (platform: PPC64) which cannot accessed by a userspace > > >> > program. Via GDB I can see that rt has the following content: > > >> > > > >> > The address is rt->out_dev = 0xc0000000cffaa800 which looks > like a > > >> > kernel address. > > >> > > >> Yes, this is a bug which has been previously pointed out on the > list and > > >> not fixed. > > > > > Can some one point me to the previous discussions on this list (search > did not yield any results)? There were various posts from Heiko J Schick on 10/17 and subsequent ones from Kevin. > The problem is because of a copy_to_user (in uat.c) between struct > ib_at_ib_route > which are different between user and kernel space causing this crash. 
> What was the rationale of putting a pointer to struct ibv_device in > the user space version of > ib_at_ib_route? The out_dev field in user space is not really used as > far as I could see. > > > >The fix for this involves an ABI change: it should return the GID > of the > > >outgoing IB device. > > > > > Would a simple solution like adding a device_name field to both the > ib_at_ib_route structures > be acceptable? The out_dev field could be used as a "reserved" field > in user space and not be used. > That should not break anything as far as I can see. Ideally, the current out_dev field should be removed and all consumers should be converted over to the new structures/interfaces. Guess the question also is whether people want this by name, GID, or both ? -- Hal > > >-- Hal > > Pradeep > pradeep at us.ibm.com > > ______________________________________________________________________ > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From kenjeffries at austin.rr.com Wed Nov 2 04:49:09 2005 From: kenjeffries at austin.rr.com (Kenneth L Jeffries) Date: Wed, 2 Nov 2005 06:49:09 -0600 Subject: [openib-general] Re: [PATCH] [SRP] support for it_iu length negotiation References: <000101c5df65$01a94a90$9e5aa8c0@infiniconsys.com> Message-ID: <025401c5dfab$d22c7de0$0a97a8c0@blacktip> From: "Fab Tillier" Sent: Tuesday, November 01, 2005 10:22 PM >Even 350 bytes is a burden - imagine a target that supports a queue depth of >1000 I/Os from a few dozen initiators. Ideally, I'd like to see us use just >DDBDs and the 64-byte IU, along with registering the data buffers on a per-I/O >basis, either via FMR or regular MRs. Wouldn't registering an MR per I/O kill performance? Right now, I believe, the srp initiator registers all memory as one region.
>> [as an aside, it sure would be nice if we could do an SRP-3 (since SRP-2 >> is dead) where multiple direct descriptors would be allowed. The only >> way to get multiple descriptors now is with indirect descriptors.] >That saves you 20 bytes - not a huge gain. Yes but I wasn't clear. Allowing multiple direct descriptors would make it reasonable for a target to not implement indirect descriptors at all. Presently target implementers may be tempted to only partially implement indirect descriptors by implementing partial descriptor list processing but not the actual indirect list. There is an argument that says that making iu's really big will eliminate real indirect descriptors ( that is, indirect descriptors beyond the partial list delivered in the iu) and make complete implementation (ie fetching the rest of the list) of indirect descriptors unnecessary. >> I am pretty sure that someone doing a video server might want to do, say, >> 1MB i/o's. 1MB with 4KB pages means 256 descriptors and an iu of >> something over 4096 bytes. I definitely don't want to be told by the srp >> initiator that I need to use 4KB iu's. (So we agree there.) >For large I/O, doing a registration of the buffer and sending a DDBD with a >single descriptor might well provide the best performance. If you look at the >traffic on the wire, having the target do multiple page-sized RDMA operations is >far less efficient than creating a virtual contiguous (to the target) region >that a single RDMA operation can service. Agreed. But I'm missing something (no doubt because I'm working on the embedded target side, not the Linux side). It looks like the srp initiator registers all of kernel memory and does i/o from there. I'm not sure that an application can cause an arbitrarily large address-contiguous payload to appear on the wire. Probably I just don't understand all of what is happening there. 
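The descriptor arithmetic in this exchange (1MB of I/O in 4KB pages, the 64-byte IU, and the 20 bytes saved by multiple direct descriptors) can be sketched in C. The sizes below are assumptions taken from a reading of the SRP information unit layout, not from this thread: a 48-byte SRP_CMD with a 16-byte CDB, 16 bytes per memory descriptor, and a 20-byte indirect descriptor header (16-byte table descriptor plus 4-byte total length). All function and constant names are hypothetical.

```c
#include <stddef.h>

/* Assumed sizes, in bytes (see the note above; not taken from the thread). */
enum {
    SRP_CMD_BASE_LEN     = 48, /* fixed SRP_CMD fields including a 16-byte CDB */
    SRP_DESC_LEN         = 16, /* 8-byte address + 4-byte key + 4-byte length */
    SRP_INDIRECT_HDR_LEN = 20  /* table descriptor (16) + total length (4) */
};

/* Number of page-sized descriptors needed to cover one I/O (rounds up). */
static size_t srp_desc_count(size_t io_bytes, size_t page_bytes)
{
    return (io_bytes + page_bytes - 1) / page_bytes;
}

/* SRP_CMD length with a single direct data buffer descriptor (DDBD). */
static size_t srp_iu_len_direct(void)
{
    return SRP_CMD_BASE_LEN + SRP_DESC_LEN;
}

/* SRP_CMD length when all ndesc descriptors travel inside the IU
 * as the partial list of an indirect descriptor. */
static size_t srp_iu_len_indirect(size_t ndesc)
{
    return SRP_CMD_BASE_LEN + SRP_INDIRECT_HDR_LEN + ndesc * SRP_DESC_LEN;
}
```

Under those assumptions, a 1MB request in 4KB pages needs 256 descriptors and an SRP_CMD of 4164 bytes when the whole list is carried in the IU, against 64 bytes for a single direct descriptor over a registered contiguous region.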
Ken Jeffries From halr at voltaire.com Wed Nov 2 04:43:22 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Nov 2005 07:43:22 -0500 Subject: [openib-general] [PATCH] OpenSM: Clear port number in attribute modifier for P_KeyTable when not switch Message-ID: <1130935402.4381.3627.camel@hal.voltaire.com> Hi, Any objections to committing the patch below ? -- Hal When obtaining the P_KeyTable, clear the high 16 bits of the attribute modifier when node is not a switch. This is supposed to be an ignore field but not all implementations are conformant with this. Signed-off-by: Hal Rosenstock Index: osm_port_info_rcv.c =================================================================== --- osm_port_info_rcv.c (revision 3906) +++ osm_port_info_rcv.c (working copy) @@ -430,6 +430,7 @@ void osm_pkey_get_tables( osm_dr_path_t path; uint8_t port_num; uint16_t block_num, max_blocks; + uint32_t attr_mod_ho; osm_switch_t* p_switch; OSM_LOG_ENTER( p_log, osm_physp_has_pkey ); @@ -455,7 +456,7 @@ void osm_pkey_get_tables( else { /* This is a switch, and not a management port. The maximum blocks is defined - on the switch info partition enforcement cap. */ + in the switch info partition enforcement cap. */ p_switch = osm_get_switch_by_guid(p_subn, p_node->node_info.node_guid); if (! p_switch) @@ -472,10 +473,14 @@ void osm_pkey_get_tables( for (block_num = 0 ; block_num < max_blocks ; block_num++) { + if (osm_node_get_type( p_node ) != IB_NODE_TYPE_SWITCH) + attr_mod_ho = block_num; + else + attr_mod_ho = block_num | (port_num << 16); status = osm_req_get( p_req, &path, IB_MAD_ATTR_P_KEY_TABLE, - cl_hton32(block_num | (port_num << 16) ), + cl_hton32(attr_mod_ho), CL_DISP_MSGID_NONE, &context ); From mst at mellanox.co.il Wed Nov 2 05:25:35 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Wed, 2 Nov 2005 15:25:35 +0200 Subject: [openib-general] Re: build fails on revision 3930 In-Reply-To: <20051102090310.GT31134@mellanox.co.il> References: <6.2.3.4.2.20051101114151.0227ba00@cic-mail.lanl.gov> <20051102090310.GT31134@mellanox.co.il> Message-ID: <20051102132535.GB31134@mellanox.co.il> Quoting r. Michael S. Tsirkin : > Subject: Re: build fails on revision 3930 > > Quoting James W. Barker : > > Subject: build fails on revision 3930 > > > > All, > > > > Following the instructions posted in your > > "installation cheetsheet" after I issue the > > command "make modules modules_install" the build > > fails with the error message below (this is > > revision 3930), the same procedure (not sure > > which revision number) was successful last week: > > Looks like you are using kernel 2.6.13 or older. > The subversion trunk is for the latest kernels.org kernel only, which > is 2.6.14 as of this writing. I have now updated the cheat sheet with this information. -- MST
From halr at voltaire.com Wed Nov 2 05:35:38 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Nov 2005 08:35:38 -0500 Subject: [openib-general] Re: [PATCH] Osmtest - update command options + vapi fix In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E35AC09B@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E35AC09B@mtlexch01.mtl.com> Message-ID: <1130938538.4381.3739.camel@hal.voltaire.com> On Tue, 2005-11-01 at 06:14, Liran Sorani wrote: > Hi , Hal . > We've decided to keep and maintain Osmtest in the main trunk , since > it is not only a test but a tool to validate SA/SM. > > The following is a small patch for the follwoing : > 1. Support old form of running osmtest , i.e instead of -g= guid> , use -g and add '-p' option to display current > available port guids. > > 2. Support Vapi stack. > 3. Update Service flow (Update one of the service lease checks from 1 > sec to 4 sec). > 4. Ident switch-case) issues in main.c I just applied the changes for 2 and 3 so far. I am working on the updated change to main.c (1 and 4). -- Hal From eitan at mellanox.co.il Wed Nov 2 05:46:49 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 2 Nov 2005 15:46:49 +0200 Subject: [openib-general] [PATCH] OpenSM: Clear port number in attribu te modifier for P_KeyTable when not switch Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618891@mtlexch01.mtl.com> It's ok. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, November 02, 2005 2:43 PM > To: openib-general at openib.org > Subject: [openib-general] [PATCH] OpenSM: Clear port number in attribute modifier > for P_KeyTable when not switch > > Hi, > > Any objections to committing the patch below ? > > -- Hal > > When obtaining the P_KeyTable, clear the high 16 bits of the attribute > modifier when node is not a switch. 
This is supposed to be an ignore > field but not all implementations are conformant with this. > > Signed-off-by: Hal Rosenstock > > Index: osm_port_info_rcv.c > =================================================================== > --- osm_port_info_rcv.c (revision 3906) > +++ osm_port_info_rcv.c (working copy) > @@ -430,6 +430,7 @@ void osm_pkey_get_tables( > osm_dr_path_t path; > uint8_t port_num; > uint16_t block_num, max_blocks; > + uint32_t attr_mod_ho; > osm_switch_t* p_switch; > > OSM_LOG_ENTER( p_log, osm_physp_has_pkey ); > @@ -455,7 +456,7 @@ void osm_pkey_get_tables( > else > { > /* This is a switch, and not a management port. The maximum blocks is defined > - on the switch info partition enforcement cap. */ > + in the switch info partition enforcement cap. */ > p_switch = osm_get_switch_by_guid(p_subn, p_node->node_info.node_guid); > > if (! p_switch) > @@ -472,10 +473,14 @@ void osm_pkey_get_tables( > > for (block_num = 0 ; block_num < max_blocks ; block_num++) > { > + if (osm_node_get_type( p_node ) != IB_NODE_TYPE_SWITCH) > + attr_mod_ho = block_num; > + else > + attr_mod_ho = block_num | (port_num << 16); > status = osm_req_get( p_req, > &path, > IB_MAD_ATTR_P_KEY_TABLE, > - cl_hton32(block_num | (port_num << 16) ), > + cl_hton32(attr_mod_ho), > CL_DISP_MSGID_NONE, > &context ); > > > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From halr at voltaire.com Wed Nov 2 05:46:26 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Nov 2005 08:46:26 -0500 Subject: [openib-general] Re: Re:[PATCH] Osmtest - update command option + vapi fix In-Reply-To: <30u0ewv66s.fsf@mtl066.yok.mtl.com> References: <30u0ewv66s.fsf@mtl066.yok.mtl.com> Message-ID: <1130939185.4381.3763.camel@hal.voltaire.com> Hi Liran, On Tue, 2005-11-01 at 08:38, Liran Sorani wrote: > Hi Hal, > 1. Regarding the osmtest_SOURCES , it works both ways (i.e compile all files required) , > still the correct one is += I understand. You only had = not += in your patch for this. I changed it so that it works and doesn't override osmtest_SOURCES but adds to it when VAPI is being built. > 2. Following is the patch for main.c : > > Index: main.c > =================================================================== > --- main.c (revision 3928) > +++ main.c (working copy) > @@ -128,9 +128,11 @@ > "--guid \n" > " This option specifies the local port GUID value\n" > " with which osmtest should bind. osmtest may be\n" > - " bound to 1 port at a time.\n" > - " Without -g, osmtest displays a menu of possible\n" > - " port GUIDs and waits for user input.\n\n" ); > + " bound to 1 port at a time.\n\n"); > + printf( "-p \n" > + "--port\n" > + " This option display menu of possible local port GUID values\n" > + " with which osmtest could bind.\n\n"); > printf( "-h\n" > "--help\n" " Display this usage info then exit.\n\n" ); > printf( "-i \n" > @@ -160,9 +162,9 @@ > " --- -----------------\n" > " -M1 - Short Multicast Flow (default) - single mode.\n" > " -M2 - Short Multicast Flow - multiple mode.\n" > - " -M3 - Long Multicast Flow - single mode.\n" > - " -M4 - Long Multicast Flow - mutiple mode.\n" > - " Single mode - Osmtest is tested alone, with no other\n" > + " -M3 - Long MultiCast Flow - single mode.\n" > + " -M4 - Long MultiCast Flow - mutiple mode.\n" Should it be Multicast or MultiCast ? 
-- Hal > + " Single mode - Osmtest is tested alone , with no other \n" > " apps that interact vs. OpenSM MC.\n" > " Multiple mode - Could be run with other apps using MC vs.\n" > " OpenSM." > @@ -305,7 +307,7 @@ > char flow_name[64]; > boolean_t mem_track = FALSE; > uint32_t next_option; > - const char *const short_option = "f:l:m:M:d:g::s:t:i:cvVh"; > + const char *const short_option = "f:l:m:M:d:g:s:t:i:pcvVh"; > > /* > * In the array below, the 2nd parameter specified the number > @@ -322,9 +324,10 @@ > {"inventory", 1, NULL, 'i'}, > {"max_lid", 1, NULL, 'm'}, > {"guid", 2, NULL, 'g'}, > + {"port", 0, NULL, 'p'}, > {"help", 0, NULL, 'h'}, > {"stress", 1, NULL, 's'}, > - {"Multicast_Mode", 1, NULL, 'M'}, > + {"MultiCast_Mode", 1, NULL, 'M'}, > {"timeout", 1, NULL, 't'}, > {"verbose", 0, NULL, 'v'}, > {"log_file", 1, NULL, 'l'}, > @@ -363,7 +366,6 @@ > { > next_option = getopt_long_only( argc, argv, short_option, > long_option, NULL ); > - > switch ( next_option ) > { > case 'c': > @@ -446,28 +448,30 @@ > break; > > case 'g': > - /* > - Specifies port guid with which to bind. > - */ > - if (optarg) { > - guid = cl_hton64( strtoull( optarg, NULL, 16 )); > - printf(" Guid <0x%"PRIx64">\n", cl_hton64( guid )); > - } else > - guid = INVALID_GUID; > - break; > - > + /* > + * Specifies port guid with which to bind. > + */ > + guid = cl_hton64( strtoull( optarg, NULL, 16 )); > + printf(" Guid <0x%"PRIx64">\n", cl_hton64( guid )); > + break; > + case 'p': > + /* > + * Display current port guids > + */ > + guid = INVALID_GUID; > + break; > case 't': > - /* > + /* > * Specifies transaction timeout. 
> - */ > - opt.transaction_timeout = strtol( optarg, NULL, 0 ); > - printf( "\tTransaction timeout = %d\n", opt.transaction_timeout ); > - break; > + */ > + opt.transaction_timeout = strtol( optarg, NULL, 0 ); > + printf( "\tTransaction timeout = %d\n", opt.transaction_timeout ); > + break; > > case 'l': > - opt.log_file = optarg; > - printf("\tLog File:%s\n", opt.log_file ); > - break; > + opt.log_file = optarg; > + printf("\tLog File:%s\n", opt.log_file ); > + break; > > case 'v': > /* > @@ -510,32 +514,32 @@ > } > break; > > - case 'M': > - /* > - * Perform stress test. > - */ > - opt.mmode = strtol( optarg, NULL, 0 ); > - printf( "\tMulticast test enabled: " ); > - switch ( opt.mmode ) > - { > - case 1: > - printf( "Short MC Flow - single mode (default)\n" ); > - break; > - case 2: > - printf( "Short MC Flow - mutiple mode\n" ); > - break; > - case 3: > - printf( "Long MC Flow - single mode\n" ); > - break; > - case 4: > - printf( "Long MC Flow - mutiple mode\n" ); > - break; > - default: > - printf( "Unknown value %u (ignored)\n", opt.stress ); > - opt.mmode = 0; > - break; > - } > - break; > + case 'M': > + /* > + * Perform stress test. 
> + */ > + opt.mmode = strtol( optarg, NULL, 0 ); > + printf( "\tMultiCast test enabled: " ); > + switch ( opt.mmode ) > + { > + case 1: > + printf( "Short MC Flow - single mode (default)\n" ); > + break; > + case 2: > + printf( "Short MC Flow - mutiple mode\n" ); > + break; > + case 3: > + printf( "Long MC Flow - single mode\n" ); > + break; > + case 4: > + printf( "Long MC Flow - mutiple mode\n" ); > + break; > + default: > + printf( "Unknown value %u (ignored)\n", opt.stress ); > + opt.mmode = 0; > + break; > + } > + break; > > case 'd': > /* > From mst at mellanox.co.il Wed Nov 2 06:14:28 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 2 Nov 2005 16:14:28 +0200 Subject: [openib-general] openib segfaults when openib is not loaded Message-ID: <20051102141428.GE31134@mellanox.co.il> Hi! If I try to load opensm without loading any of openib modules, opensm crashes on exit. Has anyone else seen this? # /usr/local/bin/opensm ------------------------------------------------- OpenSM Rev:openib-1.1.0 Command Line Arguments: Log File: /var/log/osm.log ------------------------------------------------- OpenSM Rev:openib-1.1.0 ibwarn: [8954] umad_init: can't read ABI version from /sys/class/infiniband_mad/abi_version (No such file or directory): is ib_umad module loaded? Error from osm_vendor_get_all_port_attr (ffffffff) Error: Could not get port guid Exiting SM Segmentation fault (core dumped) -- MST From halr at voltaire.com Wed Nov 2 06:20:18 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Nov 2005 09:20:18 -0500 Subject: [openib-general] Re: openib segfaults when openib is not loaded In-Reply-To: <20051102141428.GE31134@mellanox.co.il> References: <20051102141428.GE31134@mellanox.co.il> Message-ID: <1130941218.4381.3821.camel@hal.voltaire.com> On Wed, 2005-11-02 at 09:14, Michael S. Tsirkin wrote: > Hi! > If I try to load opensm without loading any of openib modules, > opensm crashes on exit. > Has anyone else seen this?
> > # /usr/local/bin/opensm > ------------------------------------------------- > OpenSM Rev:openib-1.1.0 > Command Line Arguments: > Log File: /var/log/osm.log > ------------------------------------------------- > OpenSM Rev:openib-1.1.0 > > ibwarn: [8954] umad_init: can't read ABI version from /sys/class/infiniband_mad/abi_version (No such file or directory): is ib_umad module loaded? > > Error from osm_vendor_get_all_port_attr (ffffffff) > Error: Could not get port guid > Exiting SM > > Segmentation fault (core dumped) Yes, this seg fault is caused due to the following: osm_opensm_destroy shutdowns the dispatcher and subsequent to this osm_vl15_destroy attempts to unregister with the dispatcher (although this has already been done). osm_opensm.c::osm_opensm_destroy /* shut down the dispatcher - so no new messages cross */ cl_disp_shutdown( &p_osm->disp ); /* cleanup all messages on VL15 fifo that were not sent yet */ osm_vl15_shutdown( &p_osm->vl15, &p_osm->mad_pool ); /* lock the whole thing so we do not get any requests etc */ cl_plock_excl_acquire( &p_osm->lock ); /* do the destruction in reverse order as init */ updn_destroy( p_osm->p_updn_ucast_routing ); osm_sa_destroy( &p_osm->sa ); osm_sm_destroy( &p_osm->sm ); osm_db_destroy( &p_osm->db ); osm_vl15_destroy( &p_osm->vl15, &p_osm->mad_pool ); My workaround has been to remove this from osm_vl15intf.c::osm_vl15_destroy but I'm not sure this is the best long term fix as yet. I hadn't searched out whether there were other paths that were different from this flow. This seems lower priority to me than some other issues I'm still sorting through but I will get back to this unless someone else gets to it first or thinks that the workaround I have should be made permanent. 
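The ordering problem described above can be illustrated with a small stand-alone sketch. This is not OpenSM or complib code: `toy_disp_t` and the `toy_disp_*` functions are hypothetical stand-ins, and the guard shown is only one possible approach (an alternative to removing the unregister call from the destroy path), assuming the dispatcher can remember that it has already been shut down.

```c
#include <stdlib.h>

/* Hypothetical toy dispatcher; not the complib dispatcher type. */
typedef struct toy_disp {
    int *slots;    /* registration table; released by shutdown */
    int  n_slots;  /* number of entries in slots */
    int  is_shut;  /* nonzero once shutdown has run */
} toy_disp_t;

/* Allocate the registration table. Returns 0 on success, -1 on failure. */
static int toy_disp_init(toy_disp_t *d, int n_slots)
{
    d->slots = calloc((size_t)n_slots, sizeof *d->slots);
    if (d->slots == NULL)
        return -1;
    d->n_slots = n_slots;
    d->is_shut = 0;
    return 0;
}

/* Claim the first free slot. Returns its index, or -1 if none is available. */
static int toy_disp_register(toy_disp_t *d)
{
    int i;

    if (d->is_shut)
        return -1;
    for (i = 0; i < d->n_slots; i++) {
        if (!d->slots[i]) {
            d->slots[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Release the table. Safe to call more than once. */
static void toy_disp_shutdown(toy_disp_t *d)
{
    if (d->is_shut)
        return;
    free(d->slots);
    d->slots = NULL;
    d->n_slots = 0;
    d->is_shut = 1;
}

/*
 * Drop a registration. Returns 0 if the slot was released and 1 if the
 * call was skipped because the dispatcher is already shut down or the
 * handle is out of range, so a late caller never touches freed memory.
 */
static int toy_disp_unregister(toy_disp_t *d, int handle)
{
    if (d->is_shut || handle < 0 || handle >= d->n_slots)
        return 1;
    d->slots[handle] = 0;
    return 0;
}
```

With such a guard, calling toy_disp_shutdown() first and toy_disp_unregister() afterwards (the order in the flow above) returns 1 instead of touching the freed table.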
-- Hal From liran at mellanox.co.il Wed Nov 2 06:35:22 2005 From: liran at mellanox.co.il (Liran Sorani) Date: Wed, 2 Nov 2005 16:35:22 +0200 Subject: [openib-general] RE: Re:[PATCH] Osmtest - update command option + vapi fix Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E372A1E1@mtlexch01.mtl.com> Hi , Hal . PLS see below , search for [LS] -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Wednesday, November 02, 2005 3:55 PM To: Liran Sorani Cc: openib-general at openib.org Subject: Re: Re:[PATCH] Osmtest - update command option + vapi fix Hi Liran, On Tue, 2005-11-01 at 08:38, Liran Sorani wrote: > Hi Hal, > 1. Regarding the osmtest_SOURCES , it works both ways (i.e compile all files required) , > still the correct one is += I understand. You only had = not += in your patch for this. I changed it so that it works and doesn't override osmtest_SOURCES but adds to it when VAPI is being built. > 2. Following is the patch for main.c : > > Index: main.c > =================================================================== > --- main.c (revision 3928) > +++ main.c (working copy) > @@ -128,9 +128,11 @@ > "--guid \n" > " This option specifies the local port GUID value\n" > " with which osmtest should bind. 
osmtest may be\n" > - " bound to 1 port at a time.\n" > - " Without -g, osmtest displays a menu of possible\n" > - " port GUIDs and waits for user input.\n\n" ); > + " bound to 1 port at a time.\n\n"); > + printf( "-p \n" > + "--port\n" > + " This option display menu of possible local port GUID values\n" > + " with which osmtest could bind.\n\n"); > printf( "-h\n" > "--help\n" " Display this usage info then exit.\n\n" ); > printf( "-i \n" > @@ -160,9 +162,9 @@ > " --- -----------------\n" > " -M1 - Short Multicast Flow (default) - single mode.\n" > " -M2 - Short Multicast Flow - multiple mode.\n" > - " -M3 - Long Multicast Flow - single mode.\n" > - " -M4 - Long Multicast Flow - mutiple mode.\n" > - " Single mode - Osmtest is tested alone, with no other\n" > + " -M3 - Long MultiCast Flow - single mode.\n" > + " -M4 - Long MultiCast Flow - mutiple mode.\n" Should it be MultiCast or Multicast ? [LS] Lets set it to Multicast. -- Hal > + " Single mode - Osmtest is tested alone , with no other \n" > " apps that interact vs. OpenSM MC.\n" > " Multiple mode - Could be run with other apps using MC vs.\n" > " OpenSM." 
> @@ -305,7 +307,7 @@ > char flow_name[64]; > boolean_t mem_track = FALSE; > uint32_t next_option; > - const char *const short_option = "f:l:m:M:d:g::s:t:i:cvVh"; > + const char *const short_option = "f:l:m:M:d:g:s:t:i:pcvVh"; > > /* > * In the array below, the 2nd parameter specified the number > @@ -322,9 +324,10 @@ > {"inventory", 1, NULL, 'i'}, > {"max_lid", 1, NULL, 'm'}, > {"guid", 2, NULL, 'g'}, > + {"port", 0, NULL, 'p'}, > {"help", 0, NULL, 'h'}, > {"stress", 1, NULL, 's'}, > - {"Multicast_Mode", 1, NULL, 'M'}, > + {"MultiCast_Mode", 1, NULL, 'M'}, > {"timeout", 1, NULL, 't'}, > {"verbose", 0, NULL, 'v'}, > {"log_file", 1, NULL, 'l'}, > @@ -363,7 +366,6 @@ > { > next_option = getopt_long_only( argc, argv, short_option, > long_option, NULL ); > - > switch ( next_option ) > { > case 'c': > @@ -446,28 +448,30 @@ > break; > > case 'g': > - /* > - Specifies port guid with which to bind. > - */ > - if (optarg) { > - guid = cl_hton64( strtoull( optarg, NULL, 16 )); > - printf(" Guid <0x%"PRIx64">\n", cl_hton64( guid )); > - } else > - guid = INVALID_GUID; > - break; > - > + /* > + * Specifies port guid with which to bind. > + */ > + guid = cl_hton64( strtoull( optarg, NULL, 16 )); > + printf(" Guid <0x%"PRIx64">\n", cl_hton64( guid )); > + break; > + case 'p': > + /* > + * Display current port guids > + */ > + guid = INVALID_GUID; > + break; > case 't': > - /* > + /* > * Specifies transaction timeout. 
> - */ > - opt.transaction_timeout = strtol( optarg, NULL, 0 ); > - printf( "\tTransaction timeout = %d\n", opt.transaction_timeout ); > - break; > + */ > + opt.transaction_timeout = strtol( optarg, NULL, 0 ); > + printf( "\tTransaction timeout = %d\n", opt.transaction_timeout ); > + break; > > case 'l': > - opt.log_file = optarg; > - printf("\tLog File:%s\n", opt.log_file ); > - break; > + opt.log_file = optarg; > + printf("\tLog File:%s\n", opt.log_file ); > + break; > > case 'v': > /* > @@ -510,32 +514,32 @@ > } > break; > > - case 'M': > - /* > - * Perform stress test. > - */ > - opt.mmode = strtol( optarg, NULL, 0 ); > - printf( "\tMulticast test enabled: " ); > - switch ( opt.mmode ) > - { > - case 1: > - printf( "Short MC Flow - single mode (default)\n" ); > - break; > - case 2: > - printf( "Short MC Flow - mutiple mode\n" ); > - break; > - case 3: > - printf( "Long MC Flow - single mode\n" ); > - break; > - case 4: > - printf( "Long MC Flow - mutiple mode\n" ); > - break; > - default: > - printf( "Unknown value %u (ignored)\n", opt.stress ); > - opt.mmode = 0; > - break; > - } > - break; > + case 'M': > + /* > + * Perform stress test. > + */ > + opt.mmode = strtol( optarg, NULL, 0 ); > + printf( "\tMultiCast test enabled: " ); > + switch ( opt.mmode ) > + { > + case 1: > + printf( "Short MC Flow - single mode (default)\n" ); > + break; > + case 2: > + printf( "Short MC Flow - mutiple mode\n" ); > + break; > + case 3: > + printf( "Long MC Flow - single mode\n" ); > + break; > + case 4: > + printf( "Long MC Flow - mutiple mode\n" ); > + break; > + default: > + printf( "Unknown value %u (ignored)\n", opt.stress ); > + opt.mmode = 0; > + break; > + } > + break; > > case 'd': > /* > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From halr at voltaire.com Wed Nov 2 06:40:28 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Nov 2005 09:40:28 -0500 Subject: [openib-general] RE: Re:[PATCH] Osmtest - update command option + vapi fix In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E372A1E1@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E372A1E1@mtlexch01.mtl.com> Message-ID: <1130942427.4381.3866.camel@hal.voltaire.com> On Wed, 2005-11-02 at 09:35, Liran Sorani wrote: > Hi , Hal . > PLS see below , search for [LS] > > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, November 02, 2005 3:55 PM > To: Liran Sorani > Cc: openib-general at openib.org > Subject: Re: Re:[PATCH] Osmtest - update command option + vapi fix > > > Hi Liran, > > On Tue, 2005-11-01 at 08:38, Liran Sorani wrote: > > Hi Hal, > > 1. Regarding the osmtest_SOURCES , it works both ways (i.e compile > all files required) , > > still the correct one is += > > I understand. You only had = not += in your patch for this. I changed > it > so that it works and doesn't override osmtest_SOURCES but adds to it > when VAPI is being built. > > > 2. Following is the patch for main.c : > > Thanks. Applied with some minor format changes. -- Hal From eitan at mellanox.co.il Wed Nov 2 06:50:57 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 2 Nov 2005 16:50:57 +0200 Subject: [openib-general] Re: openib segfaults when openib is not load ed Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618893@mtlexch01.mtl.com> Hi Hal, Yael is working on the exact same problem. She is probably going to complete it tomorrow. The issue was both the vl15 cl_unregister but we are also facing some issues as the umad receiver never exists. When MADs are arriving after the dispatcher is destroyed they cause a segfault. Hope it will be all fixed by the weekend. EZ Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. 
Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, November 02, 2005 4:20 PM > To: Michael S. Tsirkin > Cc: openib-general at openib.org > Subject: [openib-general] Re: openib segfaults when openib is not loaded > > On Wed, 2005-11-02 at 09:14, Michael S. Tsirkin wrote: > > Hi! > > If I try to load opensm without loading any of openib modules, > > opensm crashes on exit. > > Has anyone else seen this? > > > > # /usr/local/bin/opensm > > ------------------------------------------------- > > OpenSM Rev:openib-1.1.0 > > Command Line Arguments: > > Log File: /var/log/osm.log > > ------------------------------------------------- > > OpenSM Rev:openib-1.1.0 > > > > ibwarn: [8954] umad_init: can't read ABI version from > /sys/class/infiniband_mad/abi_version (No such file or directory): is ib_umad module > loaded? > > > > Error from osm_vendor_get_all_port_attr (ffffffff) > > Error: Could not get port guid > > Exiting SM > > > > Segmentation fault (core dumped) > > Yes, this seg fault is caused due to the following: > osm_opensm_destroy shutdowns the dispatcher and subsequent to this > osm_vl15_destroy attempts to unregister with the dispatcher (although > this has already been done). 
> > osm_opensm.c::osm_opensm_destroy > > /* shut down the dispatcher - so no new messages cross */ > cl_disp_shutdown( &p_osm->disp ); > > /* cleanup all messages on VL15 fifo that were not sent yet */ > osm_vl15_shutdown( &p_osm->vl15, &p_osm->mad_pool ); > > /* lock the whole thing so we do not get any requests etc */ > cl_plock_excl_acquire( &p_osm->lock ); > > /* do the destruction in reverse order as init */ > updn_destroy( p_osm->p_updn_ucast_routing ); > osm_sa_destroy( &p_osm->sa ); > osm_sm_destroy( &p_osm->sm ); > osm_db_destroy( &p_osm->db ); > osm_vl15_destroy( &p_osm->vl15, &p_osm->mad_pool ); > > > My workaround has been to remove this from > osm_vl15intf.c::osm_vl15_destroy but I'm not sure this is the best long > term fix as yet. I hadn't searched out whether there were other paths > that were different from this flow. > > This seems lower priority to me than some other issues I'm still sorting > through but I will get back to this unless someone else gets to it first > or thinks that the workaround I have should be made permanent. > > -- Hal > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From rolandd at cisco.com Wed Nov 2 06:55:30 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 02 Nov 2005 06:55:30 -0800 Subject: [openib-general] Re: scsi/srp.h In-Reply-To: <20051102084501.GR31134@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 2 Nov 2005 10:45:01 +0200") References: <20051102084501.GR31134@mellanox.co.il> Message-ID: <52ek5z9k0t.fsf@cisco.com> Michael> Roland, would you mind moving scsi/srp.h from ulp/srp to Michael> infiniband/include in subversion, please?
Michael> The fact that its under ulp/srp breaks build of a tree Michael> linked to under drivers/infiniband Michael> drivers/infiniband/ulp/srp/ib_srp.c:49:22: scsi/srp.h: No Michael> such file or directory It's not a big deal to move it but I don't understand why your build is breaking. I thought the kernel passed a "-I" option with the current source directory to gcc. I have lots of kernel trees with the svn linux-kernel/infiniband directory symlinked to drivers/infiniband, and the builds all work fine. What does make V=1 drivers/infiniband/ulp/srp/ib_srp.o show for you? - R. From mst at mellanox.co.il Wed Nov 2 07:07:49 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 2 Nov 2005 17:07:49 +0200 Subject: [openib-general] Re: scsi/srp.h In-Reply-To: <52ek5z9k0t.fsf@cisco.com> References: <52ek5z9k0t.fsf@cisco.com> Message-ID: <20051102150749.GG31134@mellanox.co.il> Quoting Roland Dreier : > Michael> Roland, would you mind moving scsi/srp.h from ulp/srp to > Michael> infiniband/include in subversion, please? > > Michael> The fact that its under ulp/srp breaks build of a tree > Michael> linked to under drivers/infiniband > Michael> drivers/infiniband/ulp/srp/ib_srp.c:49:22: scsi/srp.h: No > Michael> such file or directory > > It's not a big deal to move it but I don't understand why your build > is breaking. I thought the kernel passed a "-I" option with the > current source directory to gcc. I have lots of kernel trees with the > svn linux-kernel/infiniband directory symlinked to drivers/infiniband, > and the builds all work fine. What does > > make V=1 drivers/infiniband/ulp/srp/ib_srp.o > > show for you? > > - R. 
> # make V=1 drivers/infiniband/ulp/srp/ib_srp.o make -f scripts/Makefile.build obj=scripts/basic SPLIT include/linux/autoconf.h -> include/config/* make -f scripts/Makefile.build obj=scripts make -f scripts/Makefile.build obj=scripts/mod gcc -Wp,-MD,scripts/mod/.empty.o.d -nostdinc -isystem /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.5/include -D__KERNEL__ -Iinclude -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -ffreestanding -O2 -fomit-frame-pointer -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -DKBUILD_BASENAME=empty -DKBUILD_MODNAME=empty -c -o scripts/mod/empty.o scripts/mod/empty.c scripts/mod/mk_elfconfig x86_64 < scripts/mod/empty.o > scripts/mod/elfconfig.h gcc -Wp,-MD,scripts/mod/.file2alias.o.d -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -c -o scripts/mod/file2alias.o scripts/mod/file2alias.c gcc -Wp,-MD,scripts/mod/.modpost.o.d -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -c -o scripts/mod/modpost.o scripts/mod/modpost.c gcc -Wp,-MD,scripts/mod/.sumversion.o.d -Wall -Wstrict-prototypes -O2 -fomit-frame-pointer -c -o scripts/mod/sumversion.o scripts/mod/sumversion.c gcc -o scripts/mod/modpost scripts/mod/modpost.o scripts/mod/file2alias.o scripts/mod/sumversion.o make -f scripts/Makefile.build obj=drivers/infiniband/ulp/srp drivers/infiniband/ulp/srp/ib_srp.o gcc -Wp,-MD,drivers/infiniband/ulp/srp/.ib_srp.o.d -nostdinc -isystem /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.5/include -D__KERNEL__ -Iinclude -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -ffreestanding -O2 -fomit-frame-pointer -mno-red-zone -mcmodel=kernel -pipe -fno-reorder-blocks -Wno-sign-compare -fno-asynchronous-unwind-tables -funit-at-a-time -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -Idrivers/infiniband/include -DMODULE -DKBUILD_BASENAME=ib_srp -DKBUILD_MODNAME=ib_srp -c -o 
drivers/infiniband/ulp/srp/ib_srp.o drivers/infiniband/ulp/srp/ib_srp.c drivers/infiniband/ulp/srp/ib_srp.c:49:22: scsi/srp.h: No such file or directory -- MST From rolandd at cisco.com Wed Nov 2 07:11:56 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 02 Nov 2005 07:11:56 -0800 Subject: [openib-general] Re: scsi/srp.h In-Reply-To: <20051102150749.GG31134@mellanox.co.il> (Michael S. Tsirkin's message of "Wed, 2 Nov 2005 17:07:49 +0200") References: <52ek5z9k0t.fsf@cisco.com> <20051102150749.GG31134@mellanox.co.il> Message-ID: <52acgn9j9f.fsf@cisco.com> I see -- the build fails if you build directly in your kernel tree. I always build with O=xxx to keep my source tree clean (so I can build multiple targets from the same svn checkout). Anyway, I just moved the include in svn. - R. From halr at voltaire.com Wed Nov 2 07:29:15 2005 From: halr at voltaire.com (Hal Rosenstock) Date: Wed, 2 Nov 2005 17:29:15 +0200 Subject: [openib-general] Re: openib segfaults when openib is not load ed Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F5175CDC@taurus.voltaire.com> On Wed, 2005-11-02 at 09:50, Eitan Zahavi wrote: > Hi Hal, > > Yael is working on the exact same problem. She is probably going to > complete it tomorrow. > > The issue was both the vl15 cl_unregister but we are also facing some > issues as the umad receiver never exists. Yes, I've also been working on making the umad receiver exit. This has also been a lower priority and I don't have a completed solution yet. -- Hal > When MADs are arriving after the dispatcher is destroyed they cause a > segfault. > > Hope it will be all fixed by the weekend. > > EZ > > Eitan Zahavi > Design Technology Director > Mellanox Technologies LTD > Tel:+972-4-9097208 > Fax:+972-4-9593245 > P.O. Box 586 Yokneam 20692 ISRAEL > > > > -----Original Message----- > > From: Hal Rosenstock [mailto:halr at voltaire.com] > > Sent: Wednesday, November 02, 2005 4:20 PM > > To: Michael S. 
Tsirkin > > Cc: openib-general at openib.org > > Subject: [openib-general] Re: openib segfaults when openib is not > loaded > > > > On Wed, 2005-11-02 at 09:14, Michael S. Tsirkin wrote: > > > Hi! > > > If I try to load opensm without loading any of openib modules, > > > opensm crashes on exit. > > > Has anyone else seen this? > > > > > > # /usr/local/bin/opensm > > > ------------------------------------------------- > > > OpenSM Rev:openib-1.1.0 > > > Command Line Arguments: > > > Log File: /var/log/osm.log > > > ------------------------------------------------- > > > OpenSM Rev:openib-1.1.0 > > > > > > ibwarn: [8954] umad_init: can't read ABI version from > > /sys/class/infiniband_mad/abi_version (No such file or directory): > is ib_umad module > > loaded? > > > > > > Error from osm_vendor_get_all_port_attr (ffffffff) > > > Error: Could not get port guid > > > Exiting SM > > > > > > Segmentation fault (core dumped) > > > > Yes, this seg fault is caused due to the following: > > osm_opensm_destroy shutdowns the dispatcher and subsequent to this > > osm_vl15_destroy attempts to unregister with the dispatcher > (although > > this has already been done). > > > > osm_opensm.c::osm_opensm_destroy > > > > /* shut down the dispatcher - so no new messages cross */ > > cl_disp_shutdown( &p_osm->disp ); > > > > /* cleanup all messages on VL15 fifo that were not sent yet */ > > osm_vl15_shutdown( &p_osm->vl15, &p_osm->mad_pool ); > > > > /* lock the whole thing so we do not get any requests etc */ > > cl_plock_excl_acquire( &p_osm->lock ); > > > > /* do the destruction in reverse order as init */ > > updn_destroy( p_osm->p_updn_ucast_routing ); > > osm_sa_destroy( &p_osm->sa ); > > osm_sm_destroy( &p_osm->sm ); > > osm_db_destroy( &p_osm->db ); > > osm_vl15_destroy( &p_osm->vl15, &p_osm->mad_pool ); > > > > > > My workaround has been to remove this from > > osm_vl15intf.c::osm_vl15_destroy but I'm not sure this is the best > long > > term fix as yet. 
I hadn't searched out whether there were other > paths > > that were different from this flow. > > > > This seems lower priority to me than some other issues I'm still > sorting > > through but I will get back to this unless someone else gets to it > first > > or thinks that the workaround I have should be made permanent. > > > > -- Hal > > > > _______________________________________________ > > openib-general mailing list > > openib-general at openib.org > > http://openib.org/mailman/listinfo/openib-general > > > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From mshefty at ichips.intel.com Wed Nov 2 08:40:45 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 02 Nov 2005 08:40:45 -0800 Subject: [openib-general] [PATCH] kmalloc + memset(, 0, ) -> kzalloc conversions In-Reply-To: <524q6vc00x.fsf@cisco.com> References: <524q6vc00x.fsf@cisco.com> Message-ID: <4368EC0D.1000009@ichips.intel.com> Roland Dreier wrote: > Anyone have any objection to me committing the following patch? I have no objection. - Sean From robert.j.woodruff at intel.com Wed Nov 2 09:27:26 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Wed, 2 Nov 2005 09:27:26 -0800 Subject: [openib-general] Problems with SDP on Itanium In-Reply-To: <4368EC0D.1000009@ichips.intel.com> Message-ID: Has anyone tried using SDP on Itanium ? I was trying to run a NetPIPE over SDP (svn Rev 3882). It seems to run fine for small transfers, but the applications hangs when it gets to > 1 Megabyte transfers. woody From bohra at cs.rutgers.edu Wed Nov 2 10:35:28 2005 From: bohra at cs.rutgers.edu (Aniruddha Bohra) Date: Wed, 02 Nov 2005 13:35:28 -0500 Subject: [openib-general] uDAPL again Message-ID: <436906F0.3050803@cs.rutgers.edu> Hello, The following is the log for a request I am sending, The number of IOVs for req is 2. 
And the iov is shown below :

REQ[0] = (0xb5f3f100, 48, 0xca88003b)
REQ[1] = (0xb5f3f2b8, 152, 0xca88003b)
dapl_ep_post_send (0x8087110, 2, 0x808b300, 0xb5f3f6b4, 0)
dapl_ep_post_send : LOCALIOV[0] = (0xb5f3f100, 48, 0xca88003b)
dapl_ep_post_send : LOCALIOV[1] = (0xb5f3f2b8, 152, 0xca88003b)
post_snd: ep 0x8087110 op 2 ck 0x8087374 sgs 2 l_iov 0x808b300 r_iov 0xbf964290 f 0
post_snd: ep 0x8087110 cookie 0x8087374 segs 2 l_iov 0x808b300
post_snd_localiov: lkey 0xca88003b va 0xb5f3f100 len 48
post_snd: lkey 0xca88003b va 0xb5f3f100 len 48
post_snd_localiov: lkey 0xca88003b va 0xb5f3f2b8 len 152
post_snd: lkey 0xca88003b va 0xb5f3f2b8 len 152
post_snd: op 0x2 flags 0x2 sglist 0xbf9641b0, 2
post_snd: returned
dapl_ep_post_send () returns 0x0
dapl_evd_wait (0x8083ca0, -1, 1, 0xbf9642d0, 0xbf9642cc)
dapl_evd_wait: EVD 0x8083ca0, CQ 0x8083da0
cq_object_wait: CQ channel 0x8081290 time -1
cq_object_wait: RET evd 0x8083ca0 ibv_cq 0x8083da0 ibv_ctx (nil) Success
>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<
dapl_evd_dto_callback : CQE
work_req_id 134771572
status 12
>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<
DTO completion ERROR: 12: op 0xff
disconnect(ep 0x8087110, conn 0x808a008, id 134774528 flags 0)
destroy_cm_id: conn 0x808a008 id 134774528
dapli_evd_post_event: Called with event # 4006

Any ideas how to proceed to even debug this ? Thanks Aniruddha From iod00d at hp.com Wed Nov 2 10:42:40 2005 From: iod00d at hp.com (Grant Grundler) Date: Wed, 2 Nov 2005 10:42:40 -0800 Subject: [openib-general] Problems with SDP on Itanium In-Reply-To: References: <4368EC0D.1000009@ichips.intel.com> Message-ID: <20051102184240.GJ28222@esmail.cup.hp.com> On Wed, Nov 02, 2005 at 09:27:26AM -0800, Bob Woodruff wrote: > Has anyone tried using SDP on Itanium ? Yes - but it's been 5-6 weeks since I have tried it (SVN r3547). > I was trying to run a NetPIPE over SDP (svn Rev 3882).
> It seems to run fine for small transfers, but > the applications hangs when it gets to > 1 Megabyte > transfers. I haven't tested message sizes > 128KB. I'll include 256/512/1024/2048 KB message sizes in the next round. And I still owe Michael some investigation results from the last round were perf dropped off to near zero for medium sized messages. grant From jlentini at netapp.com Wed Nov 2 10:44:05 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 2 Nov 2005 13:44:05 -0500 (EST) Subject: [openib-general] Re: uDAPL again In-Reply-To: <436906F0.3050803@cs.rutgers.edu> References: <436906F0.3050803@cs.rutgers.edu> Message-ID: On Wed, 2 Nov 2005, Aniruddha Bohra wrote: > Hello, > The following is the log for a request I am sending, > > The number of IOVs for req is 2. And the iov is shown below :
>
> REQ[0] = (0xb5f3f100, 48, 0xca88003b)
> REQ[1] = (0xb5f3f2b8, 152, 0xca88003b)
>
> dapl_ep_post_send (0x8087110, 2, 0x808b300, 0xb5f3f6b4, 0)
> dapl_ep_post_send : LOCALIOV[0] = (0xb5f3f100, 48, 0xca88003b)
> dapl_ep_post_send : LOCALIOV[1] = (0xb5f3f2b8, 152, 0xca88003b)
> post_snd: ep 0x8087110 op 2 ck 0x8087374 sgs 2 l_iov 0x808b300 r_iov
> 0xbf964290 f 0
> post_snd: ep 0x8087110 cookie 0x8087374 segs 2 l_iov 0x808b300
> post_snd_localiov: lkey 0xca88003b va 0xb5f3f100 len 48
> post_snd: lkey 0xca88003b va 0xb5f3f100 len 48
> post_snd_localiov: lkey 0xca88003b va 0xb5f3f2b8 len 152
> post_snd: lkey 0xca88003b va 0xb5f3f2b8 len 152
> post_snd: op 0x2 flags 0x2 sglist 0xbf9641b0, 2
> post_snd: returned
> dapl_ep_post_send () returns 0x0
> dapl_evd_wait (0x8083ca0, -1, 1, 0xbf9642d0, 0xbf9642cc)
> dapl_evd_wait: EVD 0x8083ca0, CQ 0x8083da0
> cq_object_wait: CQ channel 0x8081290 time -1
> cq_object_wait: RET evd 0x8083ca0 ibv_cq 0x8083da0 ibv_ctx (nil) Success
> >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<
> dapl_evd_dto_callback : CQE
> work_req_id 134771572
> status 12

Status 12 is IBV_WC_RETRY_EXC_ERR.
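For reference, a small stand-alone lookup that turns such a numeric completion status into a name. It is only a sketch: the table is a hand copy of the first entries of `enum ibv_wc_status`, on the assumption that the numbering matches the installed libibverbs `verbs.h`, and `toy_wc_status_name` is a hypothetical helper, not a libibverbs or uDAPL call.

```c
#include <stddef.h>

/*
 * Hand-copied names for the first values of enum ibv_wc_status.
 * Assumption: the order matches the installed libibverbs verbs.h.
 */
static const char *const toy_wc_status_names[] = {
    "IBV_WC_SUCCESS",            /*  0 */
    "IBV_WC_LOC_LEN_ERR",        /*  1 */
    "IBV_WC_LOC_QP_OP_ERR",      /*  2 */
    "IBV_WC_LOC_EEC_OP_ERR",     /*  3 */
    "IBV_WC_LOC_PROT_ERR",       /*  4 */
    "IBV_WC_WR_FLUSH_ERR",       /*  5 */
    "IBV_WC_MW_BIND_ERR",        /*  6 */
    "IBV_WC_BAD_RESP_ERR",       /*  7 */
    "IBV_WC_LOC_ACCESS_ERR",     /*  8 */
    "IBV_WC_REM_INV_REQ_ERR",    /*  9 */
    "IBV_WC_REM_ACCESS_ERR",     /* 10 */
    "IBV_WC_REM_OP_ERR",         /* 11 */
    "IBV_WC_RETRY_EXC_ERR",      /* 12 */
    "IBV_WC_RNR_RETRY_EXC_ERR"   /* 13 */
};

/* Map a numeric status to its name; values outside the table give "unknown". */
static const char *toy_wc_status_name(int status)
{
    size_t n = sizeof toy_wc_status_names / sizeof toy_wc_status_names[0];

    if (status < 0 || (size_t)status >= n)
        return "unknown";
    return toy_wc_status_names[status];
}
```

Here toy_wc_status_name(12) gives "IBV_WC_RETRY_EXC_ERR", i.e. the transport retry counter was exceeded, which usually points at the remote side not responding rather than at the posted work request itself.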
Are you sure you can communicate over IB? Do pings over IPoIB work, etc.?

> >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<
> DTO completion ERROR: 12: op 0xff
> disconnect(ep 0x8087110, conn 0x808a008, id 134774528 flags 0)
> destroy_cm_id: conn 0x808a008 id 134774528
> dapli_evd_post_event: Called with event # 4006
> > > Any ideas how to proceed to even debug this ? > > Thanks > Aniruddha > From jlentini at netapp.com Wed Nov 2 10:51:52 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 2 Nov 2005 13:51:52 -0500 (EST) Subject: [openib-general] [OpenSM] SA database query tool Message-ID: Hal, Is there an existing OpenIB tool that can query an SA's database using MADs? Specifically, I want to retrieve all of the SA's service records. If such a tool doesn't exist, where would you start writing one? Would you layer it on top of libibmad? james From bohra at cs.rutgers.edu Wed Nov 2 10:59:33 2005 From: bohra at cs.rutgers.edu (Aniruddha Bohra) Date: Wed, 02 Nov 2005 13:59:33 -0500 Subject: [openib-general] Re: uDAPL again In-Reply-To: References: <436906F0.3050803@cs.rutgers.edu> Message-ID: <43690C95.3050009@cs.rutgers.edu> James Lentini wrote: >On Wed, 2 Nov 2005, Aniruddha Bohra wrote: > > > >>Hello, >> The following is the log for a request I am sending, >> >>The number of IOVs for req is 2.
And the iov is shown below :
>>
>>REQ[0] = (0xb5f3f100, 48, 0xca88003b)
>>REQ[1] = (0xb5f3f2b8, 152, 0xca88003b)
>>
>>dapl_ep_post_send (0x8087110, 2, 0x808b300, 0xb5f3f6b4, 0)
>>dapl_ep_post_send : LOCALIOV[0] = (0xb5f3f100, 48, 0xca88003b)
>>dapl_ep_post_send : LOCALIOV[1] = (0xb5f3f2b8, 152, 0xca88003b)
>>post_snd: ep 0x8087110 op 2 ck 0x8087374 sgs 2 l_iov 0x808b300 r_iov
>>0xbf964290 f 0
>>post_snd: ep 0x8087110 cookie 0x8087374 segs 2 l_iov 0x808b300
>>post_snd_localiov: lkey 0xca88003b va 0xb5f3f100 len 48
>>post_snd: lkey 0xca88003b va 0xb5f3f100 len 48
>>post_snd_localiov: lkey 0xca88003b va 0xb5f3f2b8 len 152
>>post_snd: lkey 0xca88003b va 0xb5f3f2b8 len 152
>>post_snd: op 0x2 flags 0x2 sglist 0xbf9641b0, 2
>>post_snd: returned
>>dapl_ep_post_send () returns 0x0
>>dapl_evd_wait (0x8083ca0, -1, 1, 0xbf9642d0, 0xbf9642cc)
>>dapl_evd_wait: EVD 0x8083ca0, CQ 0x8083da0
>>cq_object_wait: CQ channel 0x8081290 time -1
>>cq_object_wait: RET evd 0x8083ca0 ibv_cq 0x8083da0 ibv_ctx (nil) Success
>>
>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<
>> dapl_evd_dto_callback : CQE
>> work_req_id 134771572
>> status 12
>> >> > >Status 12 is IBV_WC_RETRY_EXC_ERR. > >Are you sure you can communicate over IB? Do pings over IPoIB work, >etc.? > > > bohra at hora-3 ~]$ ping -b 10.10.10.255 WARNING: pinging broadcast address PING 10.10.10.255 (10.10.10.255) 56(84) bytes of data. 64 bytes from 10.10.10.12: icmp_seq=0 ttl=64 time=0.034 ms 64 bytes from 10.10.10.13: icmp_seq=0 ttl=64 time=8.98 ms (DUP!) 64 bytes from 10.10.10.12: icmp_seq=1 ttl=64 time=0.033 ms 64 bytes from 10.10.10.13: icmp_seq=1 ttl=64 time=0.095 ms (DUP!) 64 bytes from 10.10.10.12: icmp_seq=2 ttl=64 time=0.025 ms 64 bytes from 10.10.10.13: icmp_seq=2 ttl=64 time=0.096 ms (DUP!)
--- 10.10.10.255 ping statistics --- 3 packets transmitted, 3 received, +3 duplicates, 0% packet loss, time 2020ms rtt min/avg/max/mdev = 0.025/1.544/8.986/3.328 ms, pipe 2 [bohra at hora-3 ~]$ ifconfig ib0 ib0 Link encap:UNSPEC HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00 inet addr:10.10.10.12 Bcast:10.255.255.255 Mask:255.255.255.0 inet6 addr: fe80::202:c901:81e:7471/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 RX packets:4 errors:0 dropped:0 overruns:0 frame:0 TX packets:77 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:128 RX bytes:308 (308.0 b) TX bytes:4788 (4.6 KiB) My target is the filer, which does not respond to pings (10.10.10.11). Aniruddha From mst at mellanox.co.il Wed Nov 2 10:58:39 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Wed, 2 Nov 2005 20:58:39 +0200 Subject: [openib-general] Re: Problems with SDP on Itanium In-Reply-To: References: Message-ID: <20051102185838.GA26005@mellanox.co.il> Quoting r. Bob Woodruff : > Subject: Problems with SDP on Itanium > > Has anyone tried using SDP on Itanium ? > I was trying to run a NetPIPE over SDP (svn Rev 3882). > It seems to run fine for small transfers, but > the applications hangs when it gets to > 1 Megabyte > transfers. > > > woody > No, I don't think I've seen that one, but it's been a while since I last ran anything on Itanium. Can you try to debug it a little? What does it mean that an application "hangs"? Is some data sent from one side not received by the other?
-- MST From jlentini at netapp.com Wed Nov 2 11:01:11 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 2 Nov 2005 14:01:11 -0500 (EST) Subject: [openib-general] Re: uDAPL again In-Reply-To: <43690C95.3050009@cs.rutgers.edu> References: <436906F0.3050803@cs.rutgers.edu> <43690C95.3050009@cs.rutgers.edu> Message-ID: On Wed, 2 Nov 2005, Aniruddha Bohra wrote: > James Lentini wrote: > > > On Wed, 2 Nov 2005, Aniruddha Bohra wrote: > > > > > > > Hello, > > > The following is the log for a request I am sending, > > > > > > The number of IOVs for req is 2. And the iov is shown below : > > > > > > REQ[0] = (0xb5f3f100, 48, 0xca88003b)^M > > > REQ[1] = (0xb5f3f2b8, 152, 0xca88003b)^M > > > > > > dapl_ep_post_send (0x8087110, 2, 0x808b300, 0xb5f3f6b4, 0)^M > > > dapl_ep_post_send : LOCALIOV[0] = (0xb5f3f100, 48, 0xca88003b)^M > > > dapl_ep_post_send : LOCALIOV[1] = (0xb5f3f2b8, 152, 0xca88003b)^M > > > post_snd: ep 0x8087110 op 2 ck 0x8087374 sgs 2 l_iov 0x808b300 r_iov > > > 0xbf964290 f 0^M > > > post_snd: ep 0x8087110 cookie 0x8087374 segs 2 l_iov 0x808b300^M > > > post_snd_localiov: lkey 0xca88003b va 0xb5f3f100 len 48 ^M > > > post_snd: lkey 0xca88003b va 0xb5f3f100 len 48 ^M > > > post_snd_localiov: lkey 0xca88003b va 0xb5f3f2b8 len 152 ^M > > > post_snd: lkey 0xca88003b va 0xb5f3f2b8 len 152 ^M > > > post_snd: op 0x2 flags 0x2 sglist 0xbf9641b0, 2^M > > > post_snd: returned^M > > > dapl_ep_post_send () returns 0x0^M > > > dapl_evd_wait (0x8083ca0, -1, 1, 0xbf9642d0, 0xbf9642cc)^M > > > dapl_evd_wait: EVD 0x8083ca0, CQ 0x8083da0^M > > > cq_object_wait: CQ channel 0x8081290 time -1^M > > > cq_object_wait: RET evd 0x8083ca0 ibv_cq 0x8083da0 ibv_ctx (nil) Success^M > > > >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<^M > > > dapl_evd_dto_callback : CQE ^M > > > work_req_id 134771572^M > > > status 12^M > > > > > > > Status 12 is IBV_WC_RETRY_EXC_ERR. > > > > Are you sure you can communicate over IB? Do pings over IPoIB work, etc.? 
> > > > > bohra at hora-3 ~]$ ping -b 10.10.10.255 > WARNING: pinging broadcast address > PING 10.10.10.255 (10.10.10.255) 56(84) bytes of data. > 64 bytes from 10.10.10.12: icmp_seq=0 ttl=64 time=0.034 ms > 64 bytes from 10.10.10.13: icmp_seq=0 ttl=64 time=8.98 ms (DUP!) > 64 bytes from 10.10.10.12: icmp_seq=1 ttl=64 time=0.033 ms > 64 bytes from 10.10.10.13: icmp_seq=1 ttl=64 time=0.095 ms (DUP!) > 64 bytes from 10.10.10.12: icmp_seq=2 ttl=64 time=0.025 ms > 64 bytes from 10.10.10.13: icmp_seq=2 ttl=64 time=0.096 ms (DUP!) > > --- 10.10.10.255 ping statistics --- I don't see DUPs when I ping the broadcast address. Is it possible another machine is configured with the same IP address? Do you only have the one OpenIB node? > 3 packets transmitted, 3 received, +3 duplicates, 0% packet loss, time 2020ms > rtt min/avg/max/mdev = 0.025/1.544/8.986/3.328 ms, pipe 2 > [bohra at hora-3 ~]$ ifconfig ib0 > ib0 Link encap:UNSPEC HWaddr > 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00 > inet addr:10.10.10.12 Bcast:10.255.255.255 Mask:255.255.255.0 > inet6 addr: fe80::202:c901:81e:7471/64 Scope:Link > UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 > RX packets:4 errors:0 dropped:0 overruns:0 frame:0 > TX packets:77 errors:0 dropped:0 overruns:0 carrier:0 > collisions:0 txqueuelen:128 > RX bytes:308 (308.0 b) TX bytes:4788 (4.6 KiB) > > My target is the filer, which does not respond to pings (10.10.10.11). > > Aniruddha > From halr at voltaire.com Wed Nov 2 11:02:06 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Nov 2005 14:02:06 -0500 Subject: [openib-general] Re: [OpenSM] SA database query tool In-Reply-To: References: Message-ID: <1130958126.4381.4109.camel@hal.voltaire.com> On Wed, 2005-11-02 at 13:51, James Lentini wrote: > Hal, > > Is there an existing OpenIB tool that can query an SA's database using > MADs? Specifically, I want to retrieve all of the SA's service > records. The only current way is via ibis. 
> If such a tool doesn't exist, where would you start writing one? > Would you layer it on top of libibmad? There are two approaches I can think of off the top of my head: 1. Support for this and other SA searches in the SM console 2. Be able to obtain these remotely via building on top of umad in some form (a real userspace SA client will likely be done in the not too distant future). What is the timeframe for this need ? How formal do you need something in the interim ? -- Hal From halr at voltaire.com Wed Nov 2 11:06:59 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Nov 2005 14:06:59 -0500 Subject: [openib-general] Re: uDAPL again In-Reply-To: References: <436906F0.3050803@cs.rutgers.edu> <43690C95.3050009@cs.rutgers.edu> Message-ID: <1130958417.4381.4118.camel@hal.voltaire.com> On Wed, 2005-11-02 at 14:01, James Lentini wrote: > On Wed, 2 Nov 2005, Aniruddha Bohra wrote: > > > James Lentini wrote: > > > > > On Wed, 2 Nov 2005, Aniruddha Bohra wrote: > > > > > > > > > > Hello, > > > > The following is the log for a request I am sending, > > > > > > > > The number of IOVs for req is 2. 
And the iov is shown below : > > > > > > > > REQ[0] = (0xb5f3f100, 48, 0xca88003b)^M > > > > REQ[1] = (0xb5f3f2b8, 152, 0xca88003b)^M > > > > > > > > dapl_ep_post_send (0x8087110, 2, 0x808b300, 0xb5f3f6b4, 0)^M > > > > dapl_ep_post_send : LOCALIOV[0] = (0xb5f3f100, 48, 0xca88003b)^M > > > > dapl_ep_post_send : LOCALIOV[1] = (0xb5f3f2b8, 152, 0xca88003b)^M > > > > post_snd: ep 0x8087110 op 2 ck 0x8087374 sgs 2 l_iov 0x808b300 r_iov > > > > 0xbf964290 f 0^M > > > > post_snd: ep 0x8087110 cookie 0x8087374 segs 2 l_iov 0x808b300^M > > > > post_snd_localiov: lkey 0xca88003b va 0xb5f3f100 len 48 ^M > > > > post_snd: lkey 0xca88003b va 0xb5f3f100 len 48 ^M > > > > post_snd_localiov: lkey 0xca88003b va 0xb5f3f2b8 len 152 ^M > > > > post_snd: lkey 0xca88003b va 0xb5f3f2b8 len 152 ^M > > > > post_snd: op 0x2 flags 0x2 sglist 0xbf9641b0, 2^M > > > > post_snd: returned^M > > > > dapl_ep_post_send () returns 0x0^M > > > > dapl_evd_wait (0x8083ca0, -1, 1, 0xbf9642d0, 0xbf9642cc)^M > > > > dapl_evd_wait: EVD 0x8083ca0, CQ 0x8083da0^M > > > > cq_object_wait: CQ channel 0x8081290 time -1^M > > > > cq_object_wait: RET evd 0x8083ca0 ibv_cq 0x8083da0 ibv_ctx (nil) Success^M > > > > >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<^M > > > > dapl_evd_dto_callback : CQE ^M > > > > work_req_id 134771572^M > > > > status 12^M > > > > > > > > > > Status 12 is IBV_WC_RETRY_EXC_ERR. > > > > > > Are you sure you can communicate over IB? Do pings over IPoIB work, etc.? > > > > > > > > bohra at hora-3 ~]$ ping -b 10.10.10.255 > > WARNING: pinging broadcast address > > PING 10.10.10.255 (10.10.10.255) 56(84) bytes of data. > > 64 bytes from 10.10.10.12: icmp_seq=0 ttl=64 time=0.034 ms > > 64 bytes from 10.10.10.13: icmp_seq=0 ttl=64 time=8.98 ms (DUP!) > > 64 bytes from 10.10.10.12: icmp_seq=1 ttl=64 time=0.033 ms > > 64 bytes from 10.10.10.13: icmp_seq=1 ttl=64 time=0.095 ms (DUP!) 
> > 64 bytes from 10.10.10.12: icmp_seq=2 ttl=64 time=0.025 ms > > 64 bytes from 10.10.10.13: icmp_seq=2 ttl=64 time=0.096 ms (DUP!) > > > > --- 10.10.10.255 ping statistics --- > > > I don't see DUPs when I ping the broadcast address. I get dups. This is how ping behaves: when using the subnet broadcast address, it does not distinguish that the replies come from different hosts; it only sees that it got multiple replies for a single request. It may depend on the version of ping. -- Hal > Is it possible > another machine is configured with the same IP address? > > Do you only have the one OpenIB node? > > > 3 packets transmitted, 3 received, +3 duplicates, 0% packet loss, time 2020ms > > rtt min/avg/max/mdev = 0.025/1.544/8.986/3.328 ms, pipe 2 > > [bohra at hora-3 ~]$ ifconfig ib0 > > ib0 Link encap:UNSPEC HWaddr > > 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00 > > inet addr:10.10.10.12 Bcast:10.255.255.255 Mask:255.255.255.0 > > inet6 addr: fe80::202:c901:81e:7471/64 Scope:Link > > UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1 > > RX packets:4 errors:0 dropped:0 overruns:0 frame:0 > > TX packets:77 errors:0 dropped:0 overruns:0 carrier:0 > > collisions:0 txqueuelen:128 > > RX bytes:308 (308.0 b) TX bytes:4788 (4.6 KiB) > > > > My target is the filer, which does not respond to pings (10.10.10.11).
> > > > Aniruddha > > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From bohra at cs.rutgers.edu Wed Nov 2 11:16:12 2005 From: bohra at cs.rutgers.edu (Aniruddha Bohra) Date: Wed, 02 Nov 2005 14:16:12 -0500 Subject: [openib-general] Re: uDAPL again In-Reply-To: References: <436906F0.3050803@cs.rutgers.edu> <43690C95.3050009@cs.rutgers.edu> Message-ID: <4369107C.2070203@cs.rutgers.edu> James Lentini wrote: >On Wed, 2 Nov 2005, Aniruddha Bohra wrote: > > > >>James Lentini wrote: >> >> >> >>>On Wed, 2 Nov 2005, Aniruddha Bohra wrote: >>> >>> >>> >>> >>>>Hello, >>>> The following is the log for a request I am sending, >>>> >>>>The number of IOVs for req is 2. And the iov is shown below : >>>> >>>>REQ[0] = (0xb5f3f100, 48, 0xca88003b)^M >>>>REQ[1] = (0xb5f3f2b8, 152, 0xca88003b)^M >>>> >>>>dapl_ep_post_send (0x8087110, 2, 0x808b300, 0xb5f3f6b4, 0)^M >>>>dapl_ep_post_send : LOCALIOV[0] = (0xb5f3f100, 48, 0xca88003b)^M >>>>dapl_ep_post_send : LOCALIOV[1] = (0xb5f3f2b8, 152, 0xca88003b)^M >>>>post_snd: ep 0x8087110 op 2 ck 0x8087374 sgs 2 l_iov 0x808b300 r_iov >>>>0xbf964290 f 0^M >>>>post_snd: ep 0x8087110 cookie 0x8087374 segs 2 l_iov 0x808b300^M >>>>post_snd_localiov: lkey 0xca88003b va 0xb5f3f100 len 48 ^M >>>>post_snd: lkey 0xca88003b va 0xb5f3f100 len 48 ^M >>>>post_snd_localiov: lkey 0xca88003b va 0xb5f3f2b8 len 152 ^M >>>>post_snd: lkey 0xca88003b va 0xb5f3f2b8 len 152 ^M >>>>post_snd: op 0x2 flags 0x2 sglist 0xbf9641b0, 2^M >>>>post_snd: returned^M >>>>dapl_ep_post_send () returns 0x0^M >>>>dapl_evd_wait (0x8083ca0, -1, 1, 0xbf9642d0, 0xbf9642cc)^M >>>>dapl_evd_wait: EVD 0x8083ca0, CQ 0x8083da0^M >>>>cq_object_wait: CQ channel 0x8081290 time -1^M >>>>cq_object_wait: RET evd 0x8083ca0 ibv_cq 0x8083da0 ibv_ctx (nil) Success^M >>>> 
>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<^M >>>> dapl_evd_dto_callback : CQE ^M >>>> work_req_id 134771572^M >>>> status 12^M >>>> >>>> >>>> >>>Status 12 is IBV_WC_RETRY_EXC_ERR. >>> >>>Are you sure you can communicate over IB? Do pings over IPoIB work, etc.? >>> >>> >>> >>> >>bohra at hora-3 ~]$ ping -b 10.10.10.255 >>WARNING: pinging broadcast address >>PING 10.10.10.255 (10.10.10.255) 56(84) bytes of data. >>64 bytes from 10.10.10.12: icmp_seq=0 ttl=64 time=0.034 ms >>64 bytes from 10.10.10.13: icmp_seq=0 ttl=64 time=8.98 ms (DUP!) >>64 bytes from 10.10.10.12: icmp_seq=1 ttl=64 time=0.033 ms >>64 bytes from 10.10.10.13: icmp_seq=1 ttl=64 time=0.095 ms (DUP!) >>64 bytes from 10.10.10.12: icmp_seq=2 ttl=64 time=0.025 ms >>64 bytes from 10.10.10.13: icmp_seq=2 ttl=64 time=0.096 ms (DUP!) >> >>--- 10.10.10.255 ping statistics --- >> >> > > >I don't see DUPs when I ping the broadcast address. Is it possible >another machine is configured with the same IP address? > > > No. There are 3 nodes on the switch. Two openib nodes. I can login to the .13 node through the 10.10.10 subnet using IPoIB. bohra at hora-3 ~]$ ping 10.10.10.13 PING 10.10.10.13 (10.10.10.13) 56(84) bytes of data. 
64 bytes from 10.10.10.13: icmp_seq=0 ttl=64 time=0.112 ms 64 bytes from 10.10.10.13: icmp_seq=1 ttl=64 time=0.063 ms 64 bytes from 10.10.10.13: icmp_seq=2 ttl=64 time=0.073 ms 64 bytes from 10.10.10.13: icmp_seq=3 ttl=64 time=0.052 ms 64 bytes from 10.10.10.13: icmp_seq=4 ttl=64 time=0.069 ms 64 bytes from 10.10.10.13: icmp_seq=5 ttl=64 time=0.055 ms 64 bytes from 10.10.10.13: icmp_seq=6 ttl=64 time=0.076 ms 64 bytes from 10.10.10.13: icmp_seq=7 ttl=64 time=0.052 ms 64 bytes from 10.10.10.13: icmp_seq=8 ttl=64 time=0.070 ms Aniruddha From ardavis at ichips.intel.com Wed Nov 2 12:02:57 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 02 Nov 2005 12:02:57 -0800 Subject: [openib-general] Re: uDAPL again In-Reply-To: <436906F0.3050803@cs.rutgers.edu> References: <436906F0.3050803@cs.rutgers.edu> Message-ID: <43691B71.2040500@ichips.intel.com> Aniruddha Bohra wrote: > cq_object_wait: RET evd 0x8083ca0 ibv_cq 0x8083da0 ibv_ctx (nil) > Success > >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<< > dapl_evd_dto_callback : CQE > work_req_id 134771572 > status 12 > >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<< > DTO completion ERROR: 12: op 0xff > disconnect(ep 0x8087110, conn 0x808a008, id 134774528 flags 0) > destroy_cm_id: conn 0x808a008 id 134774528 > dapli_evd_post_event: Called with event # 4006 > > > Any ideas how to proceed to even debug this ? Are you using the uDAPL provider with socket CM (VERBS=openib_scm) or the default one that uses uCM and uAT? For the socket_CM version the timeout is set to 14 (~67ms) and the retries are set to 7, so the receiving node would have to be delayed beyond ~469ms to get this failure. For the default uCM/uAT version the retries are set to 7 and the timeout is set to pktlifetime+1, so you would have to look at the path-record for the timeout value for the connection. Can you successfully run the IB verbs ibv_rc_pingpong test suite?
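The ~67ms and ~469ms figures for the socket CM provider can be reproduced from the InfiniBand local ACK timeout encoding, in which a field value n stands for 4.096 us * 2^n. The following is only a minimal sketch of that arithmetic: it assumes this encoding and multiplies by the retry count the same way the estimate above does. The function names are hypothetical and are not part of uDAPL or libibverbs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a local ACK timeout field value (1..31)
 * to nanoseconds, assuming the encoding 4.096 us * 2^value.
 * A value of 0 (timeout disabled) is deliberately not handled here.
 */
uint64_t ack_timeout_ns(unsigned int value)
{
	assert(value >= 1 && value <= 31);
	return (uint64_t)4096 << value;	/* 4096 ns == 4.096 us */
}

/*
 * Hypothetical helper: delay budget estimated as the per-attempt
 * timeout multiplied by the retry count.
 */
uint64_t retry_budget_ns(unsigned int value, unsigned int retries)
{
	return ack_timeout_ns(value) * retries;
}
```

With a timeout value of 14 this gives 67,108,864 ns (about 67 ms) per attempt, and 7 retries give 469,762,048 ns (about 469 ms). For the default uCM/uAT provider the same calculation would apply to whatever value pktlifetime+1 works out to in the path record.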
Anything special about your fabric configuration that could induce this kind of latencies? Something on the fabric or in your remote system is delaying ACK's beyond your total timeout/retry times. If you had no buffers posted or attempted to send to unregistered memory you would get different errors. -arlin > > Thanks > Aniruddha > From jerome.pioux at bull.com Wed Nov 2 12:31:16 2005 From: jerome.pioux at bull.com (Jerome Pioux) Date: Wed, 2 Nov 2005 13:31:16 -0700 Subject: [openib-general] Problems with SDP on Itanium References: <4368EC0D.1000009@ichips.intel.com> <20051102184240.GJ28222@esmail.cup.hp.com> Message-ID: <000f01c5dfec$619ad4a0$0211708d@gpv.az05.bull.com> I am running SDP on rev 3882 on ia64 (modified RHEL4 - 2.6.12 kernel). I do not run NetPIPE but TTCP with options "-l 1048576 -b 1048576" which I think means 1M. I just tried a run with "-l 2097152 -b 2097152" which then would mean 2M and it seems to run okay: ttcp-t: buflen=2097152, nbuf=45000, align=16384/0, port=5001, sockbufsize=2097152 tcp -> 192.168.0.100 ttcp-t: 94371840000 bytes in 148.11 real seconds = 607.65 MB/sec +++ ttcp-t: 45000 I/O calls, msec/call = 3.37, calls/sec = 303.82 ttcp-t: user: 41016 sys: 36393573 total: 36434589 real: 148112374 But maybe, TTCP does not run message size > 1M even with these options set?... Jerome ----- Original Message ----- From: "Grant Grundler" To: "Bob Woodruff" Cc: Sent: Wednesday, November 02, 2005 11:42 AM Subject: Re: [openib-general] Problems with SDP on Itanium > On Wed, Nov 02, 2005 at 09:27:26AM -0800, Bob Woodruff wrote: >> Has anyone tried using SDP on Itanium ? > > Yes - but it's been 5-6 weeks since I have tried it (SVN r3547). > >> I was trying to run a NetPIPE over SDP (svn Rev 3882). >> It seems to run fine for small transfers, but >> the applications hangs when it gets to > 1 Megabyte >> transfers. > > I haven't tested message sizes > 128KB. > I'll include 256/512/1024/2048 KB message sizes in the next round. 
> > And I still owe Michael some investigation results from the > last round where perf dropped off to near zero for medium > sized messages. > > grant From bohra at cs.rutgers.edu Wed Nov 2 12:44:22 2005 From: bohra at cs.rutgers.edu (Aniruddha Bohra) Date: Wed, 02 Nov 2005 15:44:22 -0500 Subject: [openib-general] Re: uDAPL again In-Reply-To: <43691B71.2040500@ichips.intel.com> References: <436906F0.3050803@cs.rutgers.edu> <43691B71.2040500@ichips.intel.com> Message-ID: <43692526.3030003@cs.rutgers.edu> Arlin Davis wrote: > Aniruddha Bohra wrote: > >> cq_object_wait: RET evd 0x8083ca0 ibv_cq 0x8083da0 ibv_ctx (nil) >> Success >> >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<< >> dapl_evd_dto_callback : CQE >> work_req_id 134771572 >> status 12 >> >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<< >> DTO completion ERROR: 12: op 0xff >> disconnect(ep 0x8087110, conn 0x808a008, id 134774528 flags 0) >> destroy_cm_id: conn 0x808a008 id 134774528 >> dapli_evd_post_event: Called with event # 4006 >> >> >> Any ideas how to proceed to even debug this ? > > > > Are you using the uDAPL provider with socket CM (VERBS=openib_scm) or > the default one that uses uCM and uAT? For the socket_CM version the > timeout is set to 14 (~67ms) and the retries are set to 7 so the > receiving node would have to be delayed beyond ~469ms to get this > failure. For the default uCM/uAT version the retries are set to 7 and > the timeout is set to pktlifetime+1 so you would have to look at the > path-record for the timeout value for the connection. > I am using the default one. Actually, even the dapl_ep_connect() takes a long time. I am not sure, but aren't uCM and uAT simply for connection establishment?
> Can you successfully run the IB verbs ibv_rc_pingpong test suite? Between the two OpenIB nodes, I can run the ibv_rc_pingpong. > Anything special about your fabric configuration that could induce > this kind of latencies? Something on the fabric or in your remote > system is delaying ACK's beyond your total timeout/retry times. It has 3 machines on the switch : one is a netapp filer, which might be the source of the problem :( > > If you had no buffers posted or attempted to send to unregistered > memory you would get different errors. This is good, as it seems my code is trying to DTRT :) Thanks Aniruddha From jlentini at netapp.com Wed Nov 2 12:57:45 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 2 Nov 2005 15:57:45 -0500 (EST) Subject: [openib-general] Re: [OpenSM] SA database query tool In-Reply-To: <1130958126.4381.4109.camel@hal.voltaire.com> References: <1130958126.4381.4109.camel@hal.voltaire.com> Message-ID: On Wed, 2 Nov 2005, Hal Rosenstock wrote: > On Wed, 2005-11-02 at 13:51, James Lentini wrote: > > Hal, > > > > Is there an existing OpenIB tool that can query an SA's database using > > MADs? Specifically, I want to retrieve all of the SA's service > > records. > > The only current way is via ibis. Is ibis in the OpenIB tree? I've seen it somewhere, but I can't remember where. > > If such a tool doesn't exist, where would you start writing one? > > Would you layer it on top of libibmad? > > There are two approaches I can think of off the top of my head: > > 1. Support for this and other SA searches in the SM console > 2. Be able to obtain these remotely via building on top of umad in some > form (a real userspace SA client will likely be done in the not too > distant future). This sounds like a good approach since the tool could be used with SM's other than OpenSM. Is there an application in the OpenIB tree that is a good example of how to use the umad library? > What is the timeframe for this need ? 
I'm thinking of debugging tools that would be useful for me at SC05. > How formal do you need something in the interim ? I don't need anything formal. From eitan at mellanox.co.il Wed Nov 2 13:05:01 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 2 Nov 2005 23:05:01 +0200 Subject: [openib-general] Re: [OpenSM] SA database query tool Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618896@mtlexch01.mtl.com> Hi James, Assuming you are asking for remote access to the data through the IB network (in-band; this will actually also work in loopback if run on the same IB port): if you need the query tool for scripting, ibis is your best choice. A quick example of what you need to do to get all NodeInfoRecords: ibis -port_num 1 sacNodeQuery getTable 0 # the zero is the comp mask. You could also read ibis.c and osm_vendor_sa_api.h to see how one can implement that in C. You could use umad directly, or decide to rewrite the existing software too. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, November 02, 2005 9:02 PM > To: James Lentini > Cc: openib-general > Subject: [openib-general] Re: [OpenSM] SA database query tool > > On Wed, 2005-11-02 at 13:51, James Lentini wrote: > > Hal, > > > > Is there an existing OpenIB tool that can query an SA's database using > > MADs? Specifically, I want to retrieve all of the SA's service > > records. > > The only current way is via ibis. > > > If such a tool doesn't exist, where would you start writing one? > > Would you layer it on top of libibmad? > > There are two approaches I can think of off the top of my head: > > 1. Support for this and other SA searches in the SM console > 2.
Be able to obtain these remotely via building on top of umad in some > form (a real userspace SA client will likely be done in the not too > distant future). > > What is the timeframe for this need ? How formal do you need something > in the interim ? > > -- Hal From eitan at mellanox.co.il Wed Nov 2 13:08:53 2005 From: eitan at mellanox.co.il (Eitan Zahavi) Date: Wed, 2 Nov 2005 23:08:53 +0200 Subject: [openib-general] Re: [OpenSM] SA database query tool Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E3618897@mtlexch01.mtl.com> Hi Again, Ibis is currently under: https://openib.org/svn/gen2/utils/src/linux-user/ibis A doc regarding how to write SA client queries is available in the file: doc/ibis_wrap.html If you will need more info or examples I will be happy to provide them. Eitan Zahavi Design Technology Director Mellanox Technologies LTD Tel:+972-4-9097208 Fax:+972-4-9593245 P.O. Box 586 Yokneam 20692 ISRAEL > -----Original Message----- > From: James Lentini [mailto:jlentini at netapp.com] > Sent: Wednesday, November 02, 2005 10:58 PM > To: Hal Rosenstock > Cc: openib-general > Subject: [openib-general] Re: [OpenSM] SA database query tool > > > > On Wed, 2 Nov 2005, Hal Rosenstock wrote: > > > On Wed, 2005-11-02 at 13:51, James Lentini wrote: > > > Hal, > > > > > > Is there an existing OpenIB tool that can query an SA's database using > > > MADs? Specifically, I want to retrieve all of the SA's service > > > records. > > > > The only current way is via ibis. > > Is ibis in the OpenIB tree? I've seen it somewhere, but I can't > remember where. > > > > If such a tool doesn't exist, where would you start writing one?
> > > Would you layer it on top of libibmad? > > > > There are two approaches I can think of off the top of my head: > > > > 1. Support for this and other SA searches in the SM console > > 2. Be able to obtain these remotely via building on top of umad in some > > form (a real userspace SA client will likely be done in the not too > > distant future). > > This sounds like a good approach since the tool could be used with > SM's other than OpenSM. > > Is there an application in the OpenIB tree that is a good example > of how to use the umad library? > > > What is the timeframe for this need ? > > I'm thinking of debugging tools that would be useful for me at SC05. > > > How formal do you need something in the interim ? > > I don't need anything formal. From halr at voltaire.com Wed Nov 2 13:11:15 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 02 Nov 2005 16:11:15 -0500 Subject: [openib-general] Re: [OpenSM] SA database query tool In-Reply-To: References: <1130958126.4381.4109.camel@hal.voltaire.com> Message-ID: <1130965874.4381.4249.camel@hal.voltaire.com> On Wed, 2005-11-02 at 15:57, James Lentini wrote: > On Wed, 2 Nov 2005, Hal Rosenstock wrote: > > > On Wed, 2005-11-02 at 13:51, James Lentini wrote: > > > Hal, > > > > > > Is there an existing OpenIB tool that can query an SA's database using > > > MADs? Specifically, I want to retrieve all of the SA's service > > > records. > > > > The only current way is via ibis. > > Is ibis in the OpenIB tree? I've seen it somewhere, but I can't > remember where. > > > > If such a tool doesn't exist, where would you start writing one? > > > Would you layer it on top of libibmad?
> > > > There are two approaches I can think of off the top of my head: > > > > 1. Support for this and other SA searches in the SM console > > 2. Be able to obtain these remotely via building on top of umad in some > > form (a real userspace SA client will likely be done in the not too > > distant future). > > This sounds like a good approach since the tool could be used with > SM's other than OpenSM. > > Is there an application in the OpenIB tree that is a good example > of how to use the umad library? As I said there is likely a real SA client that will be developed. In the short term, you can use some diag as an example but these are SMP rather than GMP based (except for perfquery). There is some SA infrastructure in place but I'm not sure how well it works. Would you be using RMPP too as little has exercised it to date ? There's sa_call and just an ib_path_query right now (in libibmad/src/sa.c). A service query could be easily added. RMPP is not supported yet at this level. > > What is the timeframe for this need ? > > I'm thinking of debugging tools that would be useful for me at SC05. I was planning on using ibis at SC05 if this was needed. -- Hal > > How formal do you need something in the interim ? > > I don't need anything formal. From rolandd at cisco.com Wed Nov 2 13:34:24 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 02 Nov 2005 13:34:24 -0800 Subject: [openib-general] [PATCH/RFC v2] IB: Add SCSI RDMA Protocol (SRP) initiator In-Reply-To: <20051101002811.GD3107@esmail.cup.hp.com> (Grant Grundler's message of "Mon, 31 Oct 2005 16:28:11 -0800") References: <52wtjtk3d1.fsf@cisco.com> <20051101002811.GD3107@esmail.cup.hp.com> Message-ID: <52r79y91jz.fsf_-_@cisco.com> Here is an updated version of the patch to add an IB SRP initiator. I've incorporated Grant's suggestions, and also split off the SRP structures and constants into so that we can move ibmvscsi to sharing the same header file. 
Are there any more suggestions before I ask Linus to pull this code? Positive votes and/or vetoes are also appreciated. Thanks, Roland Subject: [PATCH] IB: Add SCSI RDMA Protocol (SRP) initiator Add an InfiniBand SCSI RDMA Protocol (SRP) initiator. This driver is used to talk to InfiniBand SRP targets (storage devices). Signed-off-by: Roland Dreier --- drivers/infiniband/Kconfig | 2 drivers/infiniband/Makefile | 1 drivers/infiniband/ulp/srp/Kbuild | 1 drivers/infiniband/ulp/srp/Kconfig | 11 drivers/infiniband/ulp/srp/ib_srp.c | 1696 +++++++++++++++++++++++++++++++++++ drivers/infiniband/ulp/srp/ib_srp.h | 150 +++ include/scsi/srp.h | 226 +++++ 7 files changed, 2087 insertions(+), 0 deletions(-) create mode 100644 drivers/infiniband/ulp/srp/Kbuild create mode 100644 drivers/infiniband/ulp/srp/Kconfig create mode 100644 drivers/infiniband/ulp/srp/ib_srp.c create mode 100644 drivers/infiniband/ulp/srp/ib_srp.h create mode 100644 include/scsi/srp.h applies-to: d918cd1ba0ef9afa692cef281afee2f6d6634a1e c449fd3cc9e2194757c866cb1973fb98975331c8 diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index 325d502..bdf0891 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -33,4 +33,6 @@ source "drivers/infiniband/hw/mthca/Kcon source "drivers/infiniband/ulp/ipoib/Kconfig" +source "drivers/infiniband/ulp/srp/Kconfig" + endmenu diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile index d256cf7..a43fb34 100644 --- a/drivers/infiniband/Makefile +++ b/drivers/infiniband/Makefile @@ -1,3 +1,4 @@ obj-$(CONFIG_INFINIBAND) += core/ obj-$(CONFIG_INFINIBAND_MTHCA) += hw/mthca/ obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/ +obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/ diff --git a/drivers/infiniband/ulp/srp/Kbuild b/drivers/infiniband/ulp/srp/Kbuild new file mode 100644 index 0000000..a16c73c --- /dev/null +++ b/drivers/infiniband/ulp/srp/Kbuild @@ -0,0 +1 @@ +obj-$(CONFIG_INFINIBAND_SRP) += ib_srp.o diff --git
a/drivers/infiniband/ulp/srp/Kconfig b/drivers/infiniband/ulp/srp/Kconfig new file mode 100644 index 0000000..8fe3be4 --- /dev/null +++ b/drivers/infiniband/ulp/srp/Kconfig @@ -0,0 +1,11 @@ +config INFINIBAND_SRP + tristate "InfiniBand SCSI RDMA Protocol" + depends on INFINIBAND && SCSI + ---help--- + Support for the SCSI RDMA Protocol over InfiniBand. This + allows you to access storage devices that speak SRP over + InfiniBand. + + The SRP protocol is defined by the INCITS T10 technical + committee. See . + diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c new file mode 100644 index 0000000..502635a --- /dev/null +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -0,0 +1,1696 @@ +/* + * Copyright (c) 2005 Cisco Systems. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: ib_srp.c 3932 2005-11-01 17:19:29Z roland $ + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include +#include +#include +#include + +#include + +#include "ib_srp.h" + +#define DRV_NAME "ib_srp" +#define PFX DRV_NAME ": " +#define DRV_VERSION "0.2" +#define DRV_RELDATE "November 1, 2005" + +MODULE_AUTHOR("Roland Dreier"); +MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol initiator " + "v" DRV_VERSION " (" DRV_RELDATE ")"); +MODULE_LICENSE("Dual BSD/GPL"); + +static int topspin_workarounds = 1; + +module_param(topspin_workarounds, int, 0444); +MODULE_PARM_DESC(topspin_workarounds, + "Enable workarounds for Topspin/Cisco SRP target bugs if != 0"); + +static const u8 topspin_oui[3] = { 0x00, 0x05, 0xad }; + +static void srp_add_one(struct ib_device *device); +static void srp_remove_one(struct ib_device *device); +static void srp_completion(struct ib_cq *cq, void *target_ptr); +static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event); + +static struct ib_client srp_client = { + .name = "srp", + .add = srp_add_one, + .remove = srp_remove_one +}; + +static inline struct srp_target_port *host_to_target(struct Scsi_Host *host) +{ + return (struct srp_target_port *) host->hostdata; +} + +static const char *srp_target_info(struct Scsi_Host *host) +{ + return host_to_target(host)->target_name; +} + +static struct srp_iu *srp_alloc_iu(struct srp_host *host, size_t size, + gfp_t gfp_mask, + enum dma_data_direction direction) +{ + struct srp_iu *iu; + + iu = kmalloc(sizeof *iu, gfp_mask); + if (!iu) + goto out; + + iu->buf = kzalloc(size, gfp_mask); + if (!iu->buf) + goto out_free_iu; + + iu->dma = 
dma_map_single(host->dev->dma_device, iu->buf, size, direction); + if (dma_mapping_error(iu->dma)) + goto out_free_buf; + + iu->size = size; + iu->direction = direction; + + return iu; + +out_free_buf: + kfree(iu->buf); +out_free_iu: + kfree(iu); +out: + return NULL; +} + +static void srp_free_iu(struct srp_host *host, struct srp_iu *iu) +{ + if (!iu) + return; + + dma_unmap_single(host->dev->dma_device, iu->dma, iu->size, iu->direction); + kfree(iu->buf); + kfree(iu); +} + +static void srp_qp_event(struct ib_event *event, void *context) +{ + printk(KERN_ERR PFX "QP event %d\n", event->event); +} + +static int srp_init_qp(struct srp_target_port *target, + struct ib_qp *qp) +{ + struct ib_qp_attr *attr; + int ret; + + attr = kmalloc(sizeof *attr, GFP_KERNEL); + if (!attr) + return -ENOMEM; + + ret = ib_find_cached_pkey(target->srp_host->dev, + target->srp_host->port, + be16_to_cpu(target->path.pkey), + &attr->pkey_index); + if (ret) + return ret; + + attr->qp_state = IB_QPS_INIT; + attr->qp_access_flags = (IB_ACCESS_REMOTE_READ | + IB_ACCESS_REMOTE_WRITE); + attr->port_num = target->srp_host->port; + + return ib_modify_qp(qp, attr, + IB_QP_STATE | + IB_QP_PKEY_INDEX | + IB_QP_ACCESS_FLAGS | + IB_QP_PORT); +} + +static int srp_create_target_ib(struct srp_target_port *target) +{ + struct ib_qp_init_attr *init_attr; + int ret; + + init_attr = kzalloc(sizeof *init_attr, GFP_KERNEL); + if (!init_attr) + return -ENOMEM; + + target->cq = ib_create_cq(target->srp_host->dev, srp_completion, + NULL, target, SRP_CQ_SIZE); + if (IS_ERR(target->cq)) { + ret = PTR_ERR(target->cq); + goto out; + } + + ib_req_notify_cq(target->cq, IB_CQ_NEXT_COMP); + + init_attr->event_handler = srp_qp_event; + init_attr->cap.max_send_wr = SRP_SQ_SIZE; + init_attr->cap.max_recv_wr = SRP_RQ_SIZE; + init_attr->cap.max_recv_sge = 1; + init_attr->cap.max_send_sge = 1; + init_attr->sq_sig_type = IB_SIGNAL_ALL_WR; + init_attr->qp_type = IB_QPT_RC; + init_attr->send_cq = target->cq; + init_attr->recv_cq = 
target->cq; + + target->qp = ib_create_qp(target->srp_host->pd, init_attr); + if (IS_ERR(target->qp)) { + ret = PTR_ERR(target->qp); + ib_destroy_cq(target->cq); + goto out; + } + + ret = srp_init_qp(target, target->qp); + if (ret) { + ib_destroy_qp(target->qp); + ib_destroy_cq(target->cq); + goto out; + } + +out: + kfree(init_attr); + return ret; +} + +static void srp_free_target_ib(struct srp_target_port *target) +{ + int i; + + ib_destroy_qp(target->qp); + ib_destroy_cq(target->cq); + + for (i = 0; i < SRP_RQ_SIZE; ++i) + srp_free_iu(target->srp_host, target->rx_ring[i]); + for (i = 0; i < SRP_SQ_SIZE + 1; ++i) + srp_free_iu(target->srp_host, target->tx_ring[i]); +} + +static void srp_path_rec_completion(int status, + struct ib_sa_path_rec *pathrec, + void *target_ptr) +{ + struct srp_target_port *target = target_ptr; + + target->status = status; + if (status) + printk(KERN_ERR PFX "Got failed path rec status %d\n", status); + else + target->path = *pathrec; + complete(&target->done); +} + +static int srp_lookup_path(struct srp_target_port *target) +{ + target->path.numb_path = 1; + + init_completion(&target->done); + + target->path_query_id = ib_sa_path_rec_get(target->srp_host->dev, + target->srp_host->port, + &target->path, + IB_SA_PATH_REC_DGID | + IB_SA_PATH_REC_SGID | + IB_SA_PATH_REC_NUMB_PATH | + IB_SA_PATH_REC_PKEY, + SRP_PATH_REC_TIMEOUT_MS, + GFP_KERNEL, + srp_path_rec_completion, + target, &target->path_query); + if (target->path_query_id < 0) + return target->path_query_id; + + wait_for_completion(&target->done); + + if (target->status < 0) + printk(KERN_WARNING PFX "Path record query failed\n"); + + return target->status; +} + +static int srp_send_req(struct srp_target_port *target) +{ + struct { + struct ib_cm_req_param param; + struct srp_login_req priv; + } *req = NULL; + int status; + + req = kzalloc(sizeof *req, GFP_KERNEL); + if (!req) + return -ENOMEM; + + req->param.primary_path = &target->path; + req->param.alternate_path = NULL; + 
req->param.service_id = target->service_id; + req->param.qp_num = target->qp->qp_num; + req->param.qp_type = target->qp->qp_type; + req->param.private_data = &req->priv; + req->param.private_data_len = sizeof req->priv; + req->param.flow_control = 1; + + get_random_bytes(&req->param.starting_psn, 4); + req->param.starting_psn &= 0xffffff; + + /* + * Pick some arbitrary defaults here; we could make these + * module parameters if anyone cared about setting them. + */ + req->param.responder_resources = 4; + req->param.remote_cm_response_timeout = 20; + req->param.local_cm_response_timeout = 20; + req->param.retry_count = 7; + req->param.rnr_retry_count = 7; + req->param.max_cm_retries = 15; + + req->priv.opcode = SRP_LOGIN_REQ; + req->priv.tag = 0; + req->priv.req_it_iu_len = cpu_to_be32(SRP_MAX_IU_LEN); + req->priv.req_buf_fmt = cpu_to_be16(SRP_BUF_FORMAT_DIRECT | + SRP_BUF_FORMAT_INDIRECT); + memcpy(req->priv.initiator_port_id, target->srp_host->initiator_port_id, 16); + /* + * Topspin/Cisco SRP targets will reject our login unless we + * zero out the first 8 bytes of our initiator port ID. The + * second 8 bytes must be our local node GUID, but we always + * use that anyway. 
+ */ + if (topspin_workarounds && !memcmp(&target->ioc_guid, topspin_oui, 3)) { + printk(KERN_DEBUG PFX "Topspin/Cisco initiator port ID workaround " + "activated for target GUID %016llx\n", + (unsigned long long) be64_to_cpu(target->ioc_guid)); + memset(req->priv.initiator_port_id, 0, 8); + } + memcpy(req->priv.target_port_id, &target->id_ext, 8); + memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8); + + status = ib_send_cm_req(target->cm_id, &req->param); + + kfree(req); + + return status; +} + +static void srp_disconnect_target(struct srp_target_port *target) +{ + /* XXX should send SRP_I_LOGOUT request */ + + init_completion(&target->done); + ib_send_cm_dreq(target->cm_id, NULL, 0); + wait_for_completion(&target->done); +} + +static void srp_remove_work(void *target_ptr) +{ + struct srp_target_port *target = target_ptr; + + spin_lock_irq(target->scsi_host->host_lock); + if (target->state != SRP_TARGET_DEAD) { + spin_unlock_irq(target->scsi_host->host_lock); + scsi_host_put(target->scsi_host); + return; + } + target->state = SRP_TARGET_REMOVED; + spin_unlock_irq(target->scsi_host->host_lock); + + down(&target->srp_host->target_mutex); + list_del(&target->list); + up(&target->srp_host->target_mutex); + + scsi_remove_host(target->scsi_host); + ib_destroy_cm_id(target->cm_id); + srp_free_target_ib(target); + scsi_host_put(target->scsi_host); + /* And another put to really free the target port... */ + scsi_host_put(target->scsi_host); +} + +static int srp_connect_target(struct srp_target_port *target) +{ + int ret; + + ret = srp_lookup_path(target); + if (ret) + return ret; + + while (1) { + init_completion(&target->done); + ret = srp_send_req(target); + if (ret) + return ret; + wait_for_completion(&target->done); + + /* + * The CM event handling code will set status to + * SRP_PORT_REDIRECT if we get a port redirect REJ + * back, or SRP_DLID_REDIRECT if we get a lid/qp + * redirect REJ back. 
+ */ + switch (target->status) { + case 0: + return 0; + + case SRP_PORT_REDIRECT: + ret = srp_lookup_path(target); + if (ret) + return ret; + break; + + case SRP_DLID_REDIRECT: + break; + + default: + return target->status; + } + } +} + +static int srp_reconnect_target(struct srp_target_port *target) +{ + struct ib_cm_id *new_cm_id; + struct ib_qp_attr qp_attr; + struct srp_request *req; + struct ib_wc wc; + int ret; + int i; + + spin_lock_irq(target->scsi_host->host_lock); + if (target->state != SRP_TARGET_LIVE) { + spin_unlock_irq(target->scsi_host->host_lock); + return -EAGAIN; + } + target->state = SRP_TARGET_CONNECTING; + spin_unlock_irq(target->scsi_host->host_lock); + + srp_disconnect_target(target); + /* + * Now get a new local CM ID so that we avoid confusing the + * target in case things are really fouled up. + */ + new_cm_id = ib_create_cm_id(target->srp_host->dev, + srp_cm_handler, target); + if (IS_ERR(new_cm_id)) { + ret = PTR_ERR(new_cm_id); + goto err; + } + ib_destroy_cm_id(target->cm_id); + target->cm_id = new_cm_id; + + qp_attr.qp_state = IB_QPS_RESET; + ret = ib_modify_qp(target->qp, &qp_attr, IB_QP_STATE); + if (ret) + goto err; + + ret = srp_init_qp(target, target->qp); + if (ret) + goto err; + + while (ib_poll_cq(target->cq, 1, &wc) > 0) + ; /* nothing */ + + list_for_each_entry(req, &target->req_queue, list) { + req->scmnd->result = DID_RESET << 16; + req->scmnd->scsi_done(req->scmnd); + } + + target->rx_head = 0; + target->tx_head = 0; + target->tx_tail = 0; + target->req_head = 0; + for (i = 0; i < SRP_SQ_SIZE - 1; ++i) + target->req_ring[i].next = i + 1; + target->req_ring[SRP_SQ_SIZE - 1].next = -1; + INIT_LIST_HEAD(&target->req_queue); + + ret = srp_connect_target(target); + if (ret) + goto err; + + spin_lock_irq(target->scsi_host->host_lock); + if (target->state == SRP_TARGET_CONNECTING) { + ret = 0; + target->state = SRP_TARGET_LIVE; + } else + ret = -EAGAIN; + spin_unlock_irq(target->scsi_host->host_lock); + + return ret; + +err: + 
printk(KERN_ERR PFX "reconnect failed (%d), removing target port.\n", ret); + + /* + * We couldn't reconnect, so kill our target port off. + * However, we have to defer the real removal because we might + * be in the context of the SCSI error handler now, which + * would deadlock if we call scsi_remove_host(). + */ + spin_lock_irq(target->scsi_host->host_lock); + if (target->state == SRP_TARGET_CONNECTING) { + target->state = SRP_TARGET_DEAD; + INIT_WORK(&target->work, srp_remove_work, target); + schedule_work(&target->work); + } + spin_unlock_irq(target->scsi_host->host_lock); + + return ret; +} + +static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target, + struct srp_request *req) +{ + struct srp_cmd *cmd = req->cmd->buf; + int len; + u8 fmt; + + if (!scmnd->request_buffer || scmnd->sc_data_direction == DMA_NONE) + return sizeof (struct srp_cmd); + + if (scmnd->sc_data_direction != DMA_FROM_DEVICE && + scmnd->sc_data_direction != DMA_TO_DEVICE) { + printk(KERN_WARNING PFX "Unhandled data direction %d\n", + scmnd->sc_data_direction); + return -EINVAL; + } + + if (scmnd->use_sg) { + struct scatterlist *scat = scmnd->request_buffer; + int n; + int i; + + n = dma_map_sg(target->srp_host->dev->dma_device, + scat, scmnd->use_sg, scmnd->sc_data_direction); + + if (n == 1) { + struct srp_direct_buf *buf = (void *) cmd->add_data; + + fmt = SRP_DATA_DESC_DIRECT; + + buf->va = cpu_to_be64(sg_dma_address(scat)); + buf->key = cpu_to_be32(target->srp_host->mr->rkey); + buf->len = cpu_to_be32(sg_dma_len(scat)); + + len = sizeof (struct srp_cmd) + + sizeof (struct srp_direct_buf); + } else { + struct srp_indirect_buf *buf = (void *) cmd->add_data; + u32 datalen = 0; + + fmt = SRP_DATA_DESC_INDIRECT; + + if (scmnd->sc_data_direction == DMA_TO_DEVICE) + cmd->data_out_desc_cnt = n; + else + cmd->data_in_desc_cnt = n; + + buf->table_desc.va = cpu_to_be64(req->cmd->dma + + sizeof *cmd + + sizeof *buf); + buf->table_desc.key = + 
cpu_to_be32(target->srp_host->mr->rkey); + buf->table_desc.len = + cpu_to_be32(n * sizeof (struct srp_direct_buf)); + + for (i = 0; i < n; ++i) { + buf->desc_list[i].va = cpu_to_be64(sg_dma_address(&scat[i])); + buf->desc_list[i].key = + cpu_to_be32(target->srp_host->mr->rkey); + buf->desc_list[i].len = cpu_to_be32(sg_dma_len(&scat[i])); + + datalen += sg_dma_len(&scat[i]); + } + + buf->len = cpu_to_be32(datalen); + + len = sizeof (struct srp_cmd) + + sizeof (struct srp_indirect_buf) + + n * sizeof (struct srp_direct_buf); + } + } else { + struct srp_direct_buf *buf = (void *) cmd->add_data; + dma_addr_t dma; + + dma = dma_map_single(target->srp_host->dev->dma_device, + scmnd->request_buffer, scmnd->request_bufflen, + scmnd->sc_data_direction); + if (dma_mapping_error(dma)) { + printk(KERN_WARNING PFX "unable to map %p/%d (dir %d)\n", + scmnd->request_buffer, (int) scmnd->request_bufflen, + scmnd->sc_data_direction); + return -EINVAL; + } + + pci_unmap_addr_set(req, direct_mapping, dma); + + buf->va = cpu_to_be64(dma); + buf->key = cpu_to_be32(target->srp_host->mr->rkey); + buf->len = cpu_to_be32(scmnd->request_bufflen); + + fmt = SRP_DATA_DESC_DIRECT; + + len = sizeof (struct srp_cmd) + sizeof (struct srp_direct_buf); + } + + if (scmnd->sc_data_direction == DMA_TO_DEVICE) + cmd->buf_fmt = fmt << 4; + else + cmd->buf_fmt = fmt; + + + return len; +} + +static void srp_unmap_data(struct scsi_cmnd *scmnd, + struct srp_target_port *target, + struct srp_request *req) +{ + if (!scmnd->request_buffer || + (scmnd->sc_data_direction != DMA_TO_DEVICE && + scmnd->sc_data_direction != DMA_FROM_DEVICE)) + return; + + if (scmnd->use_sg) + dma_unmap_sg(target->srp_host->dev->dma_device, + (struct scatterlist *) scmnd->request_buffer, + scmnd->use_sg, scmnd->sc_data_direction); + else + dma_unmap_single(target->srp_host->dev->dma_device, + pci_unmap_addr(req, direct_mapping), + scmnd->request_bufflen, + scmnd->sc_data_direction); +} + +static void srp_process_rsp(struct 
srp_target_port *target, struct srp_rsp *rsp) +{ + struct srp_request *req; + struct scsi_cmnd *scmnd; + unsigned long flags; + s32 delta; + + delta = (s32) be32_to_cpu(rsp->req_lim_delta); + + spin_lock_irqsave(target->scsi_host->host_lock, flags); + + target->req_lim += delta; + + req = &target->req_ring[rsp->tag & ~SRP_TAG_TSK_MGMT]; + + if (unlikely(rsp->tag & SRP_TAG_TSK_MGMT)) { + if (be32_to_cpu(rsp->resp_data_len) < 4) + req->tsk_status = -1; + else + req->tsk_status = rsp->data[3]; + complete(&req->done); + } else { + scmnd = req->scmnd; + if (!scmnd) + printk(KERN_ERR "Null scmnd for RSP w/tag %016llx\n", + (unsigned long long) rsp->tag); + scmnd->result = rsp->status; + + if (rsp->flags & SRP_RSP_FLAG_SNSVALID) { + memcpy(scmnd->sense_buffer, rsp->data + + be32_to_cpu(rsp->resp_data_len), + min_t(int, be32_to_cpu(rsp->sense_data_len), + SCSI_SENSE_BUFFERSIZE)); + } + + if (rsp->flags & (SRP_RSP_FLAG_DOOVER | SRP_RSP_FLAG_DOUNDER)) + scmnd->resid = be32_to_cpu(rsp->data_out_res_cnt); + else if (rsp->flags & (SRP_RSP_FLAG_DIOVER | SRP_RSP_FLAG_DIUNDER)) + scmnd->resid = be32_to_cpu(rsp->data_in_res_cnt); + + srp_unmap_data(scmnd, target, req); + + if (!req->tsk_mgmt) { + req->scmnd = NULL; + scmnd->host_scribble = (void *) -1L; + scmnd->scsi_done(scmnd); + + list_del(&req->list); + req->next = target->req_head; + target->req_head = rsp->tag & ~SRP_TAG_TSK_MGMT; + } else + req->cmd_done = 1; + } + + spin_unlock_irqrestore(target->scsi_host->host_lock, flags); +} + +static void srp_reconnect_work(void *target_ptr) +{ + struct srp_target_port *target = target_ptr; + + srp_reconnect_target(target); +} + +static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc) +{ + struct srp_iu *iu; + u8 opcode; + + iu = target->rx_ring[wc->wr_id & ~SRP_OP_RECV]; + + dma_sync_single_for_cpu(target->srp_host->dev->dma_device, iu->dma, + target->max_ti_iu_len, DMA_FROM_DEVICE); + + opcode = *(u8 *) iu->buf; + + if (0) { + int i; + + printk(KERN_ERR PFX 
"recv completion, opcode 0x%02x\n", opcode); + + for (i = 0; i < wc->byte_len; ++i) { + if (i % 8 == 0) + printk(KERN_ERR " [%02x] ", i); + printk(" %02x", ((u8 *) iu->buf)[i]); + if ((i + 1) % 8 == 0) + printk("\n"); + } + + if (wc->byte_len % 8) + printk("\n"); + } + + switch (opcode) { + case SRP_RSP: + srp_process_rsp(target, iu->buf); + break; + + case SRP_T_LOGOUT: + /* XXX Handle target logout */ + printk(KERN_WARNING PFX "Got target logout request\n"); + break; + + default: + printk(KERN_WARNING PFX "Unhandled SRP opcode 0x%02x\n", opcode); + break; + } + + dma_sync_single_for_device(target->srp_host->dev->dma_device, iu->dma, + target->max_ti_iu_len, DMA_FROM_DEVICE); +} + +static void srp_completion(struct ib_cq *cq, void *target_ptr) +{ + struct srp_target_port *target = target_ptr; + struct ib_wc wc; + unsigned long flags; + + ib_req_notify_cq(cq, IB_CQ_NEXT_COMP); + while (ib_poll_cq(cq, 1, &wc) > 0) { + if (wc.status) { + printk(KERN_ERR PFX "failed %s status %d\n", + wc.wr_id & SRP_OP_RECV ? 
"receive" : "send", + wc.status); + spin_lock_irqsave(target->scsi_host->host_lock, flags); + if (target->state == SRP_TARGET_LIVE) + schedule_work(&target->work); + spin_unlock_irqrestore(target->scsi_host->host_lock, flags); + break; + } + + if (wc.wr_id & SRP_OP_RECV) + srp_handle_recv(target, &wc); + else + ++target->tx_tail; + } +} + +static int __srp_post_recv(struct srp_target_port *target) +{ + struct srp_iu *iu; + struct ib_sge list; + struct ib_recv_wr wr, *bad_wr; + unsigned int next; + int ret; + + next = target->rx_head & (SRP_RQ_SIZE - 1); + wr.wr_id = next | SRP_OP_RECV; + iu = target->rx_ring[next]; + + list.addr = iu->dma; + list.length = iu->size; + list.lkey = target->srp_host->mr->lkey; + + wr.next = NULL; + wr.sg_list = &list; + wr.num_sge = 1; + + ret = ib_post_recv(target->qp, &wr, &bad_wr); + if (!ret) + ++target->rx_head; + + return ret; +} + +static int srp_post_recv(struct srp_target_port *target) +{ + unsigned long flags; + int ret; + + spin_lock_irqsave(target->scsi_host->host_lock, flags); + ret = __srp_post_recv(target); + spin_unlock_irqrestore(target->scsi_host->host_lock, flags); + + return ret; +} + +/* + * Must be called with target->scsi_host->host_lock held to protect + * req_lim and tx_head. + */ +static struct srp_iu *__srp_get_tx_iu(struct srp_target_port *target) +{ + if (target->tx_head - target->tx_tail >= SRP_SQ_SIZE) + return NULL; + + return target->tx_ring[target->tx_head & SRP_SQ_SIZE]; +} + +/* + * Must be called with target->scsi_host->host_lock held to protect + * req_lim and tx_head. 
+ */ +static int __srp_post_send(struct srp_target_port *target, + struct srp_iu *iu, int len) +{ + struct ib_sge list; + struct ib_send_wr wr, *bad_wr; + int ret = 0; + + if (target->req_lim < 1) { + printk(KERN_ERR PFX "Target has req_lim %d\n", target->req_lim); + return -EAGAIN; + } + + list.addr = iu->dma; + list.length = len; + list.lkey = target->srp_host->mr->lkey; + + wr.next = NULL; + wr.wr_id = target->tx_head & SRP_SQ_SIZE; + wr.sg_list = &list; + wr.num_sge = 1; + wr.opcode = IB_WR_SEND; + wr.send_flags = IB_SEND_SIGNALED; + + ret = ib_post_send(target->qp, &wr, &bad_wr); + + if (!ret) { + ++target->tx_head; + --target->req_lim; + } + + return ret; +} + +static int srp_queuecommand(struct scsi_cmnd *scmnd, + void (*done)(struct scsi_cmnd *)) +{ + struct srp_target_port *target = host_to_target(scmnd->device->host); + struct srp_request *req; + struct srp_iu *iu; + struct srp_cmd *cmd; + long req_index; + int len; + + if (target->state == SRP_TARGET_CONNECTING) + goto err; + + if (target->state == SRP_TARGET_DEAD || + target->state == SRP_TARGET_REMOVED) { + scmnd->result = DID_BAD_TARGET << 16; + done(scmnd); + return 0; + } + + iu = __srp_get_tx_iu(target); + if (!iu) + goto err; + + dma_sync_single_for_cpu(target->srp_host->dev->dma_device, iu->dma, + SRP_MAX_IU_LEN, DMA_TO_DEVICE); + + req_index = target->req_head; + + scmnd->scsi_done = done; + scmnd->result = 0; + scmnd->host_scribble = (void *) req_index; + + cmd = iu->buf; + memset(cmd, 0, sizeof *cmd); + + cmd->opcode = SRP_CMD; + cmd->lun = cpu_to_be64((u64) scmnd->device->lun << 48); + cmd->tag = req_index; + memcpy(cmd->cdb, scmnd->cmnd, scmnd->cmd_len); + + req = &target->req_ring[req_index]; + + req->scmnd = scmnd; + req->cmd = iu; + req->cmd_done = 0; + req->tsk_mgmt = NULL; + + len = srp_map_data(scmnd, target, req); + if (len < 0) { + printk(KERN_ERR PFX "Failed to map data\n"); + goto err; + } + + if (__srp_post_recv(target)) { + printk(KERN_ERR PFX "Recv failed\n"); + goto err_unmap; 
+ } + + dma_sync_single_for_device(target->srp_host->dev->dma_device, iu->dma, + SRP_MAX_IU_LEN, DMA_TO_DEVICE); + + if (__srp_post_send(target, iu, len)) { + printk(KERN_ERR PFX "Send failed\n"); + goto err_unmap; + } + + target->req_head = req->next; + list_add_tail(&req->list, &target->req_queue); + + return 0; + +err_unmap: + srp_unmap_data(scmnd, target, req); + +err: + return SCSI_MLQUEUE_HOST_BUSY; +} + +static int srp_alloc_iu_bufs(struct srp_target_port *target) +{ + int i; + + for (i = 0; i < SRP_RQ_SIZE; ++i) { + target->rx_ring[i] = srp_alloc_iu(target->srp_host, + target->max_ti_iu_len, + GFP_KERNEL, DMA_FROM_DEVICE); + if (!target->rx_ring[i]) + goto err; + } + + for (i = 0; i < SRP_SQ_SIZE + 1; ++i) { + target->tx_ring[i] = srp_alloc_iu(target->srp_host, + SRP_MAX_IU_LEN, + GFP_KERNEL, DMA_TO_DEVICE); + if (!target->tx_ring[i]) + goto err; + } + + return 0; + +err: + for (i = 0; i < SRP_RQ_SIZE; ++i) { + srp_free_iu(target->srp_host, target->rx_ring[i]); + target->rx_ring[i] = NULL; + } + + for (i = 0; i < SRP_SQ_SIZE + 1; ++i) { + srp_free_iu(target->srp_host, target->tx_ring[i]); + target->tx_ring[i] = NULL; + } + + return -ENOMEM; +} + +static void srp_cm_rej_handler(struct ib_cm_id *cm_id, + struct ib_cm_event *event, + struct srp_target_port *target) +{ + struct ib_class_port_info *cpi; + int opcode; + + switch (event->param.rej_rcvd.reason) { + case IB_CM_REJ_PORT_CM_REDIRECT: + cpi = event->param.rej_rcvd.ari; + target->path.dlid = cpi->redirect_lid; + target->path.pkey = cpi->redirect_pkey; + cm_id->remote_cm_qpn = be32_to_cpu(cpi->redirect_qp) & 0x00ffffff; + memcpy(target->path.dgid.raw, cpi->redirect_gid, 16); + + target->status = target->path.dlid ? + SRP_DLID_REDIRECT : SRP_PORT_REDIRECT; + break; + + case IB_CM_REJ_PORT_REDIRECT: + if (topspin_workarounds && + !memcmp(&target->ioc_guid, topspin_oui, 3)) { + /* + * Topspin/Cisco SRP gateways incorrectly send + * reject reason code 25 when they mean 24 + * (port redirect). 
+ */ + memcpy(target->path.dgid.raw, + event->param.rej_rcvd.ari, 16); + + printk(KERN_DEBUG PFX "Topspin/Cisco redirect to target port GID %016llx%016llx\n", + (unsigned long long) be64_to_cpu(target->path.dgid.global.subnet_prefix), + (unsigned long long) be64_to_cpu(target->path.dgid.global.interface_id)); + + target->status = SRP_PORT_REDIRECT; + } else { + printk(KERN_WARNING " REJ reason: IB_CM_REJ_PORT_REDIRECT\n"); + target->status = -ECONNRESET; + } + break; + + case IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID: + printk(KERN_WARNING " REJ reason: IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID\n"); + target->status = -ECONNRESET; + break; + + case IB_CM_REJ_CONSUMER_DEFINED: + opcode = *(u8 *) event->private_data; + if (opcode == SRP_LOGIN_REJ) { + struct srp_login_rej *rej = event->private_data; + u32 reason = be32_to_cpu(rej->reason); + + if (reason == SRP_LOGIN_REJ_REQ_IT_IU_LENGTH_TOO_LARGE) + printk(KERN_WARNING PFX + "SRP_LOGIN_REJ: requested max_it_iu_len too large\n"); + else + printk(KERN_WARNING PFX + "SRP LOGIN REJECTED, reason 0x%08x\n", reason); + } else + printk(KERN_WARNING " REJ reason: IB_CM_REJ_CONSUMER_DEFINED," + " opcode 0x%02x\n", opcode); + target->status = -ECONNRESET; + break; + + default: + printk(KERN_WARNING " REJ reason 0x%x\n", + event->param.rej_rcvd.reason); + target->status = -ECONNRESET; + } +} + +static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event) +{ + struct srp_target_port *target = cm_id->context; + struct ib_qp_attr *qp_attr = NULL; + int attr_mask = 0; + int comp = 0; + int opcode = 0; + + switch (event->event) { + case IB_CM_REQ_ERROR: + printk(KERN_DEBUG PFX "Sending CM REQ failed\n"); + comp = 1; + target->status = -ECONNRESET; + break; + + case IB_CM_REP_RECEIVED: + comp = 1; + opcode = *(u8 *) event->private_data; + + if (opcode == SRP_LOGIN_RSP) { + struct srp_login_rsp *rsp = event->private_data; + + target->max_ti_iu_len = be32_to_cpu(rsp->max_ti_iu_len); + target->req_lim = be32_to_cpu(rsp->req_lim_delta); 
+ + target->scsi_host->can_queue = min(target->req_lim, + target->scsi_host->can_queue); + } else { + printk(KERN_WARNING PFX "Unhandled RSP opcode %#x\n", opcode); + target->status = -ECONNRESET; + break; + } + + target->status = srp_alloc_iu_bufs(target); + if (target->status) + break; + + qp_attr = kmalloc(sizeof *qp_attr, GFP_KERNEL); + if (!qp_attr) { + target->status = -ENOMEM; + break; + } + + qp_attr->qp_state = IB_QPS_RTR; + target->status = ib_cm_init_qp_attr(cm_id, qp_attr, &attr_mask); + if (target->status) + break; + + target->status = ib_modify_qp(target->qp, qp_attr, attr_mask); + if (target->status) + break; + + target->status = srp_post_recv(target); + if (target->status) + break; + + qp_attr->qp_state = IB_QPS_RTS; + target->status = ib_cm_init_qp_attr(cm_id, qp_attr, &attr_mask); + if (target->status) + break; + + target->status = ib_modify_qp(target->qp, qp_attr, attr_mask); + if (target->status) + break; + + target->status = ib_send_cm_rtu(cm_id, NULL, 0); + if (target->status) + break; + + break; + + case IB_CM_REJ_RECEIVED: + printk(KERN_DEBUG PFX "REJ received\n"); + comp = 1; + + srp_cm_rej_handler(cm_id, event, target); + break; + + case IB_CM_MRA_RECEIVED: + printk(KERN_ERR PFX "MRA received\n"); + break; + + case IB_CM_DREP_RECEIVED: + break; + + case IB_CM_TIMEWAIT_EXIT: + printk(KERN_ERR PFX "connection closed\n"); + + comp = 1; + target->status = 0; + break; + + default: + printk(KERN_WARNING PFX "Unhandled CM event %d\n", event->event); + break; + } + + if (comp) + complete(&target->done); + + kfree(qp_attr); + + return 0; +} + +static int srp_send_tsk_mgmt(struct scsi_cmnd *scmnd, u8 func) +{ + struct srp_target_port *target = host_to_target(scmnd->device->host); + struct srp_request *req; + struct srp_iu *iu; + struct srp_tsk_mgmt *tsk_mgmt; + int req_index; + int ret = FAILED; + + spin_lock_irq(target->scsi_host->host_lock); + + if (scmnd->host_scribble == (void *) -1L) + goto out; + + req_index = (long) scmnd->host_scribble; + 
printk(KERN_ERR "Abort for req_index %d\n", req_index); + + req = &target->req_ring[req_index]; + init_completion(&req->done); + + iu = __srp_get_tx_iu(target); + if (!iu) + goto out; + + tsk_mgmt = iu->buf; + memset(tsk_mgmt, 0, sizeof *tsk_mgmt); + + tsk_mgmt->opcode = SRP_TSK_MGMT; + tsk_mgmt->lun = cpu_to_be64((u64) scmnd->device->lun << 48); + tsk_mgmt->tag = req_index | SRP_TAG_TSK_MGMT; + tsk_mgmt->tsk_mgmt_func = func; + tsk_mgmt->task_tag = req_index; + + if (__srp_post_send(target, iu, sizeof *tsk_mgmt)) + goto out; + + req->tsk_mgmt = iu; + + spin_unlock_irq(target->scsi_host->host_lock); + if (!wait_for_completion_timeout(&req->done, + msecs_to_jiffies(SRP_ABORT_TIMEOUT_MS))) + return FAILED; + spin_lock_irq(target->scsi_host->host_lock); + + if (req->cmd_done) { + list_del(&req->list); + req->next = target->req_head; + target->req_head = req_index; + + scmnd->scsi_done(scmnd); + } else if (!req->tsk_status) { + scmnd->result = DID_ABORT << 16; + ret = SUCCESS; + } + +out: + spin_unlock_irq(target->scsi_host->host_lock); + return ret; +} + +static int srp_abort(struct scsi_cmnd *scmnd) +{ + printk(KERN_ERR "SRP abort called\n"); + + return srp_send_tsk_mgmt(scmnd, SRP_TSK_ABORT_TASK); +} + +static int srp_reset_device(struct scsi_cmnd *scmnd) +{ + printk(KERN_ERR "SRP reset_device called\n"); + + return srp_send_tsk_mgmt(scmnd, SRP_TSK_LUN_RESET); +} + +static int srp_reset_host(struct scsi_cmnd *scmnd) +{ + struct srp_target_port *target = host_to_target(scmnd->device->host); + int ret = FAILED; + + printk(KERN_ERR PFX "SRP reset_host called\n"); + + if (!srp_reconnect_target(target)) + ret = SUCCESS; + + return ret; +} + +static struct scsi_host_template srp_template = { + .module = THIS_MODULE, + .name = DRV_NAME, + .info = srp_target_info, + .queuecommand = srp_queuecommand, + .eh_abort_handler = srp_abort, + .eh_device_reset_handler = srp_reset_device, + .eh_host_reset_handler = srp_reset_host, + .can_queue = SRP_SQ_SIZE, + .this_id = -1, + 
.sg_tablesize = SRP_MAX_INDIRECT, + .cmd_per_lun = SRP_SQ_SIZE, + .use_clustering = ENABLE_CLUSTERING +}; + +static int srp_add_target(struct srp_host *host, struct srp_target_port *target) +{ + sprintf(target->target_name, "SRP.T10:%016llX", + (unsigned long long) be64_to_cpu(target->id_ext)); + + if (scsi_add_host(target->scsi_host, host->dev->dma_device)) + return -ENODEV; + + down(&host->target_mutex); + list_add_tail(&target->list, &host->target_list); + up(&host->target_mutex); + + target->state = SRP_TARGET_LIVE; + + /* XXX: are we supposed to have a definition of SCAN_WILD_CARD ?? */ + scsi_scan_target(&target->scsi_host->shost_gendev, + 0, target->scsi_id, ~0, 0); + + return 0; +} + +static void srp_release_class_dev(struct class_device *class_dev) +{ + struct srp_host *host = + container_of(class_dev, struct srp_host, class_dev); + + complete(&host->released); +} + +static struct class srp_class = { + .name = "infiniband_srp", + .release = srp_release_class_dev +}; + +/* + * Target ports are added by writing + * + * id_ext=,ioc_guid=,dgid=, + * pkey=,service_id= + * + * to the add_target sysfs attribute. 
+ */ +enum { + SRP_OPT_ERR = 0, + SRP_OPT_ID_EXT = 1 << 0, + SRP_OPT_IOC_GUID = 1 << 1, + SRP_OPT_DGID = 1 << 2, + SRP_OPT_PKEY = 1 << 3, + SRP_OPT_SERVICE_ID = 1 << 4, + SRP_OPT_MAX_SECT = 1 << 5, + SRP_OPT_ALL = (SRP_OPT_ID_EXT | + SRP_OPT_IOC_GUID | + SRP_OPT_DGID | + SRP_OPT_PKEY | + SRP_OPT_SERVICE_ID), +}; + +static match_table_t srp_opt_tokens = { + { SRP_OPT_ID_EXT, "id_ext=%s" }, + { SRP_OPT_IOC_GUID, "ioc_guid=%s" }, + { SRP_OPT_DGID, "dgid=%s" }, + { SRP_OPT_PKEY, "pkey=%x" }, + { SRP_OPT_SERVICE_ID, "service_id=%s" }, + { SRP_OPT_MAX_SECT, "max_sect=%d" }, + { SRP_OPT_ERR, NULL } +}; + +static int srp_parse_options(const char *buf, struct srp_target_port *target) +{ + char *options, *sep_opt; + char *p; + char dgid[3]; + substring_t args[MAX_OPT_ARGS]; + int opt_mask = 0; + int token; + int ret = -EINVAL; + int i; + + options = kstrdup(buf, GFP_KERNEL); + if (!options) + return -ENOMEM; + + sep_opt = options; + while ((p = strsep(&sep_opt, ",")) != NULL) { + if (!*p) + continue; + + token = match_token(p, srp_opt_tokens, args); + opt_mask |= token; + + switch (token) { + case SRP_OPT_ID_EXT: + p = match_strdup(args); + target->id_ext = cpu_to_be64(simple_strtoull(p, NULL, 16)); + kfree(p); + break; + + case SRP_OPT_IOC_GUID: + p = match_strdup(args); + target->ioc_guid = cpu_to_be64(simple_strtoull(p, NULL, 16)); + kfree(p); + break; + + case SRP_OPT_DGID: + p = match_strdup(args); + if (strlen(p) != 32) { + printk(KERN_WARNING PFX "bad dest GID parameter '%s'\n", p); + goto out; + } + + for (i = 0; i < 16; ++i) { + strlcpy(dgid, p + i * 2, 3); + target->path.dgid.raw[i] = simple_strtoul(dgid, NULL, 16); + } + break; + + case SRP_OPT_PKEY: + if (match_hex(args, &token)) { + printk(KERN_WARNING PFX "bad P_Key parameter '%s'\n", p); + goto out; + } + target->path.pkey = cpu_to_be16(token); + break; + + case SRP_OPT_SERVICE_ID: + p = match_strdup(args); + target->service_id = cpu_to_be64(simple_strtoull(p, NULL, 16)); + kfree(p); + break; + + case 
SRP_OPT_MAX_SECT: + if (match_int(args, &token)) { + printk(KERN_WARNING PFX "bad max sect parameter '%s'\n", p); + goto out; + } + target->scsi_host->max_sectors = token; + break; + + default: + printk(KERN_WARNING PFX "unknown parameter or missing value " + "'%s' in target creation request\n", p); + goto out; + } + } + + if ((opt_mask & SRP_OPT_ALL) == SRP_OPT_ALL) + ret = 0; + else + for (i = 0; i < ARRAY_SIZE(srp_opt_tokens); ++i) + if ((srp_opt_tokens[i].token & SRP_OPT_ALL) && + !(srp_opt_tokens[i].token & opt_mask)) + printk(KERN_WARNING PFX "target creation request is " + "missing parameter '%s'\n", + srp_opt_tokens[i].pattern); + +out: + kfree(options); + return ret; +} + +static ssize_t srp_create_target(struct class_device *class_dev, + const char *buf, size_t count) +{ + struct srp_host *host = + container_of(class_dev, struct srp_host, class_dev); + struct Scsi_Host *target_host; + struct srp_target_port *target; + int ret; + int i; + + target_host = scsi_host_alloc(&srp_template, + sizeof (struct srp_target_port)); + if (!target_host) + return -ENOMEM; + + target = host_to_target(target_host); + memset(target, 0, sizeof *target); + + target->scsi_host = target_host; + target->srp_host = host; + + INIT_WORK(&target->work, srp_reconnect_work, target); + + for (i = 0; i < SRP_SQ_SIZE - 1; ++i) + target->req_ring[i].next = i + 1; + target->req_ring[SRP_SQ_SIZE - 1].next = -1; + INIT_LIST_HEAD(&target->req_queue); + + ret = srp_parse_options(buf, target); + if (ret) + goto err; + + ib_get_cached_gid(host->dev, host->port, 0, &target->path.sgid); + + printk(KERN_DEBUG PFX "new target: id_ext %016llx ioc_guid %016llx pkey %04x " + "service_id %016llx dgid %04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x\n", + (unsigned long long) be64_to_cpu(target->id_ext), + (unsigned long long) be64_to_cpu(target->ioc_guid), + be16_to_cpu(target->path.pkey), + (unsigned long long) be64_to_cpu(target->service_id), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[0]), + (int) 
be16_to_cpu(*(__be16 *) &target->path.dgid.raw[2]), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[4]), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[6]), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[8]), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[10]), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[12]), + (int) be16_to_cpu(*(__be16 *) &target->path.dgid.raw[14])); + + ret = srp_create_target_ib(target); + if (ret) + goto err; + + target->cm_id = ib_create_cm_id(host->dev, srp_cm_handler, target); + if (IS_ERR(target->cm_id)) { + ret = PTR_ERR(target->cm_id); + goto err_free; + } + + ret = srp_connect_target(target); + if (ret) { + printk(KERN_ERR PFX "Connection failed\n"); + goto err_cm_id; + } + + ret = srp_add_target(host, target); + if (ret) + goto err_disconnect; + + return count; + +err_disconnect: + srp_disconnect_target(target); + +err_cm_id: + ib_destroy_cm_id(target->cm_id); + +err_free: + srp_free_target_ib(target); + +err: + scsi_host_put(target_host); + + return ret; +} + +static CLASS_DEVICE_ATTR(add_target, S_IWUSR, NULL, srp_create_target); + +static ssize_t show_ibdev(struct class_device *class_dev, char *buf) +{ + struct srp_host *host = + container_of(class_dev, struct srp_host, class_dev); + + return sprintf(buf, "%s\n", host->dev->name); +} + +static CLASS_DEVICE_ATTR(ibdev, S_IRUGO, show_ibdev, NULL); + +static ssize_t show_port(struct class_device *class_dev, char *buf) +{ + struct srp_host *host = + container_of(class_dev, struct srp_host, class_dev); + + return sprintf(buf, "%d\n", host->port); +} + +static CLASS_DEVICE_ATTR(port, S_IRUGO, show_port, NULL); + +static struct srp_host *srp_add_port(struct ib_device *device, + __be64 node_guid, u8 port) +{ + struct srp_host *host; + + host = kzalloc(sizeof *host, GFP_KERNEL); + if (!host) + return NULL; + + INIT_LIST_HEAD(&host->target_list); + init_MUTEX(&host->target_mutex); + init_completion(&host->released); + host->dev = device; + 
host->port = port; + + host->initiator_port_id[7] = port; + memcpy(host->initiator_port_id + 8, &node_guid, 8); + + host->pd = ib_alloc_pd(device); + if (IS_ERR(host->pd)) + goto err_free; + + host->mr = ib_get_dma_mr(host->pd, + IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_READ | + IB_ACCESS_REMOTE_WRITE); + if (IS_ERR(host->mr)) + goto err_pd; + + host->class_dev.class = &srp_class; + host->class_dev.dev = device->dma_device; + snprintf(host->class_dev.class_id, BUS_ID_SIZE, "srp-%s-%d", + device->name, port); + + if (class_device_register(&host->class_dev)) + goto err_mr; + if (class_device_create_file(&host->class_dev, &class_device_attr_add_target)) + goto err_class; + if (class_device_create_file(&host->class_dev, &class_device_attr_ibdev)) + goto err_class; + if (class_device_create_file(&host->class_dev, &class_device_attr_port)) + goto err_class; + + return host; + +err_class: + class_device_unregister(&host->class_dev); + +err_mr: + ib_dereg_mr(host->mr); + +err_pd: + ib_dealloc_pd(host->pd); + +err_free: + kfree(host); + + return NULL; +} + +static void srp_add_one(struct ib_device *device) +{ + struct list_head *dev_list; + struct srp_host *host; + struct ib_device_attr *dev_attr; + int s, e, p; + + dev_attr = kmalloc(sizeof *dev_attr, GFP_KERNEL); + if (!dev_attr) + return; + + if (ib_query_device(device, dev_attr)) { + printk(KERN_WARNING PFX "Couldn't query node GUID for %s.\n", + device->name); + goto out; + } + + dev_list = kmalloc(sizeof *dev_list, GFP_KERNEL); + if (!dev_list) + goto out; + + INIT_LIST_HEAD(dev_list); + + if (device->node_type == IB_NODE_SWITCH) { + s = 0; + e = 0; + } else { + s = 1; + e = device->phys_port_cnt; + } + + for (p = s; p <= e; ++p) { + host = srp_add_port(device, dev_attr->node_guid, p); + if (host) + list_add_tail(&host->list, dev_list); + } + + ib_set_client_data(device, &srp_client, dev_list); + +out: + kfree(dev_attr); +} + +static void srp_remove_one(struct ib_device *device) +{ + struct list_head *dev_list; + 
struct srp_host *host, *tmp_host; + LIST_HEAD(target_list); + struct srp_target_port *target, *tmp_target; + unsigned long flags; + + dev_list = ib_get_client_data(device, &srp_client); + + list_for_each_entry_safe(host, tmp_host, dev_list, list) { + class_device_unregister(&host->class_dev); + /* + * Wait for the sysfs entry to go away, so that no new + * target ports can be created. + */ + wait_for_completion(&host->released); + + /* + * Mark all target ports as removed, so we stop queueing + * commands and don't try to reconnect. + */ + down(&host->target_mutex); + list_for_each_entry_safe(target, tmp_target, + &host->target_list, list) { + spin_lock_irqsave(target->scsi_host->host_lock, flags); + if (target->state != SRP_TARGET_REMOVED) + target->state = SRP_TARGET_REMOVED; + spin_unlock_irqrestore(target->scsi_host->host_lock, flags); + } + up(&host->target_mutex); + + /* + * Wait for any reconnection tasks that may have + * started before we marked our target ports as + * removed, and any target port removal tasks. 
+ */ + flush_scheduled_work(); + + list_for_each_entry_safe(target, tmp_target, + &host->target_list, list) { + scsi_remove_host(target->scsi_host); + srp_disconnect_target(target); + ib_destroy_cm_id(target->cm_id); + srp_free_target_ib(target); + scsi_host_put(target->scsi_host); + } + + ib_dereg_mr(host->mr); + ib_dealloc_pd(host->pd); + kfree(host); + } + + kfree(dev_list); +} + +static int __init srp_init_module(void) +{ + int ret; + + ret = class_register(&srp_class); + if (ret) { + printk(KERN_ERR PFX "couldn't register class infiniband_srp\n"); + return ret; + } + + ret = ib_register_client(&srp_client); + if (ret) { + printk(KERN_ERR PFX "couldn't register IB client\n"); + class_unregister(&srp_class); + return ret; + } + + return 0; +} + +static void __exit srp_cleanup_module(void) +{ + ib_unregister_client(&srp_client); + class_unregister(&srp_class); +} + +module_init(srp_init_module); +module_exit(srp_cleanup_module); diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h new file mode 100644 index 0000000..4fec28a --- /dev/null +++ b/drivers/infiniband/ulp/srp/ib_srp.h @@ -0,0 +1,150 @@ +/* + * Copyright (c) 2005 Cisco Systems. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. 
+ * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id: ib_srp.h 3932 2005-11-01 17:19:29Z roland $ + */ + +#ifndef IB_SRP_H +#define IB_SRP_H + +#include +#include + +#include + +#include +#include + +#include +#include +#include + +enum { + SRP_PATH_REC_TIMEOUT_MS = 1000, + SRP_ABORT_TIMEOUT_MS = 5000, + + SRP_PORT_REDIRECT = 1, + SRP_DLID_REDIRECT = 2, + + SRP_MAX_IU_LEN = 256, + + SRP_RQ_SHIFT = 6, + SRP_RQ_SIZE = 1 << SRP_RQ_SHIFT, + SRP_SQ_SIZE = SRP_RQ_SIZE - 1, + SRP_CQ_SIZE = SRP_SQ_SIZE + SRP_RQ_SIZE, + + SRP_TAG_TSK_MGMT = 1 << (SRP_RQ_SHIFT + 1) +}; + +#define SRP_OP_RECV (1 << 31) +#define SRP_MAX_INDIRECT ((SRP_MAX_IU_LEN - \ + sizeof (struct srp_cmd) - \ + sizeof (struct srp_indirect_buf)) / 16) + +enum srp_target_state { + SRP_TARGET_LIVE, + SRP_TARGET_CONNECTING, + SRP_TARGET_DEAD, + SRP_TARGET_REMOVED +}; + +struct srp_host { + u8 initiator_port_id[16]; + struct ib_device *dev; + u8 port; + struct ib_pd *pd; + struct ib_mr *mr; + struct class_device class_dev; + struct list_head target_list; + struct semaphore target_mutex; + struct completion released; + struct list_head list; +}; + +struct srp_request { + struct list_head list; + struct scsi_cmnd *scmnd; + struct srp_iu *cmd; + struct srp_iu *tsk_mgmt; + DECLARE_PCI_UNMAP_ADDR(direct_mapping) + struct completion done; + short 
next; + u8 cmd_done; + u8 tsk_status; +}; + +struct srp_target_port { + __be64 id_ext; + __be64 ioc_guid; + __be64 service_id; + struct srp_host *srp_host; + struct Scsi_Host *scsi_host; + char target_name[32]; + unsigned int scsi_id; + + struct ib_sa_path_rec path; + struct ib_sa_query *path_query; + int path_query_id; + + struct ib_cm_id *cm_id; + struct ib_cq *cq; + struct ib_qp *qp; + + int max_ti_iu_len; + s32 req_lim; + + unsigned rx_head; + struct srp_iu *rx_ring[SRP_RQ_SIZE]; + + unsigned tx_head; + unsigned tx_tail; + struct srp_iu *tx_ring[SRP_SQ_SIZE + 1]; + + int req_head; + struct list_head req_queue; + struct srp_request req_ring[SRP_SQ_SIZE]; + + struct work_struct work; + + struct list_head list; + struct completion done; + int status; + enum srp_target_state state; +}; + +struct srp_iu { + dma_addr_t dma; + void *buf; + size_t size; + enum dma_data_direction direction; +}; + +#endif /* IB_SRP_H */ diff --git a/include/scsi/srp.h b/include/scsi/srp.h new file mode 100644 index 0000000..6c2681d --- /dev/null +++ b/include/scsi/srp.h @@ -0,0 +1,226 @@ +/* + * Copyright (c) 2005 Cisco Systems. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + * + * $Id$ + */ + +#ifndef SCSI_SRP_H +#define SCSI_SRP_H + +/* + * Structures and constants for the SCSI RDMA Protocol (SRP) as + * defined by the INCITS T10 committee. This file was written using + * draft Revision 16a of the SRP standard. + */ + +#include + +enum { + SRP_LOGIN_REQ = 0x00, + SRP_TSK_MGMT = 0x01, + SRP_CMD = 0x02, + SRP_I_LOGOUT = 0x03, + SRP_LOGIN_RSP = 0xc0, + SRP_RSP = 0xc1, + SRP_LOGIN_REJ = 0xc2, + SRP_T_LOGOUT = 0x80, + SRP_CRED_REQ = 0x81, + SRP_AER_REQ = 0x82, + SRP_CRED_RSP = 0x41, + SRP_AER_RSP = 0x42 +}; + +enum { + SRP_BUF_FORMAT_DIRECT = 1 << 1, + SRP_BUF_FORMAT_INDIRECT = 1 << 2 +}; + +enum { + SRP_NO_DATA_DESC = 0, + SRP_DATA_DESC_DIRECT = 1, + SRP_DATA_DESC_INDIRECT = 2 +}; + +enum { + SRP_TSK_ABORT_TASK = 0x01, + SRP_TSK_ABORT_TASK_SET = 0x02, + SRP_TSK_CLEAR_TASK_SET = 0x04, + SRP_TSK_LUN_RESET = 0x08, + SRP_TSK_CLEAR_ACA = 0x40 +}; + +enum srp_login_rej_reason { + SRP_LOGIN_REJ_UNABLE_ESTABLISH_CHANNEL = 0x00010000, + SRP_LOGIN_REJ_INSUFFICIENT_RESOURCES = 0x00010001, + SRP_LOGIN_REJ_REQ_IT_IU_LENGTH_TOO_LARGE = 0x00010002, + SRP_LOGIN_REJ_UNABLE_ASSOCIATE_CHANNEL = 0x00010003, + SRP_LOGIN_REJ_UNSUPPORTED_DESCRIPTOR_FMT = 0x00010004, + SRP_LOGIN_REJ_MULTI_CHANNEL_UNSUPPORTED = 0x00010005, + SRP_LOGIN_REJ_CHANNEL_LIMIT_REACHED = 0x00010006 +}; + +struct srp_direct_buf { + __be64 va; + __be32 key; + __be32 len; +}; + +/* + * We need the packed attribute because the SRP spec puts the list of + * descriptors at an offset of 20, which is not 
aligned to the size + * of struct srp_direct_buf. + */ +struct srp_indirect_buf { + struct srp_direct_buf table_desc; + __be32 len; + struct srp_direct_buf desc_list[0] __attribute__((packed)); +}; + +enum { + SRP_MULTICHAN_SINGLE = 0, + SRP_MULTICHAN_MULTI = 1 +}; + +struct srp_login_req { + u8 opcode; + u8 reserved1[7]; + u64 tag; + __be32 req_it_iu_len; + u8 reserved2[4]; + __be16 req_buf_fmt; + u8 req_flags; + u8 reserved3[5]; + u8 initiator_port_id[16]; + u8 target_port_id[16]; +}; + +struct srp_login_rsp { + u8 opcode; + u8 reserved1[3]; + __be32 req_lim_delta; + u64 tag; + __be32 max_it_iu_len; + __be32 max_ti_iu_len; + __be16 buf_fmt; + u8 rsp_flags; + u8 reserved2[25]; +}; + +struct srp_login_rej { + u8 opcode; + u8 reserved1[3]; + __be32 reason; + u64 tag; + u8 reserved2[8]; + __be16 buf_fmt; + u8 reserved3[6]; +}; + +struct srp_i_logout { + u8 opcode; + u8 reserved[7]; + u64 tag; +}; + +struct srp_t_logout { + u8 opcode; + u8 sol_not; + u8 reserved[2]; + __be32 reason; + u64 tag; +}; + +/* + * We need the packed attribute because the SRP spec only aligns the + * 8-byte LUN field to 4 bytes. + */ +struct srp_tsk_mgmt { + u8 opcode; + u8 sol_not; + u8 reserved1[6]; + u64 tag; + u8 reserved2[4]; + __be64 lun __attribute__((packed)); + u8 reserved3[2]; + u8 tsk_mgmt_func; + u8 reserved4; + u64 task_tag; + u8 reserved5[8]; +}; + +/* + * We need the packed attribute because the SRP spec only aligns the + * 8-byte LUN field to 4 bytes. 
+ */ +struct srp_cmd { + u8 opcode; + u8 sol_not; + u8 reserved1[3]; + u8 buf_fmt; + u8 data_out_desc_cnt; + u8 data_in_desc_cnt; + u64 tag; + u8 reserved2[4]; + __be64 lun __attribute__((packed)); + u8 reserved3; + u8 task_attr; + u8 reserved4; + u8 add_cdb_len; + u8 cdb[16]; + u8 add_data[0]; +}; + +enum { + SRP_RSP_FLAG_RSPVALID = 1 << 0, + SRP_RSP_FLAG_SNSVALID = 1 << 1, + SRP_RSP_FLAG_DOOVER = 1 << 2, + SRP_RSP_FLAG_DOUNDER = 1 << 3, + SRP_RSP_FLAG_DIOVER = 1 << 4, + SRP_RSP_FLAG_DIUNDER = 1 << 5 +}; + +struct srp_rsp { + u8 opcode; + u8 sol_not; + u8 reserved1[2]; + __be32 req_lim_delta; + u64 tag; + u8 reserved2[2]; + u8 flags; + u8 status; + __be32 data_out_res_cnt; + __be32 data_in_res_cnt; + __be32 sense_data_len; + __be32 resp_data_len; + u8 data[0]; +}; + +#endif /* SCSI_SRP_H */ --- 0.99.9 From ftillier at silverstorm.com Wed Nov 2 13:48:28 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Wed, 2 Nov 2005 13:48:28 -0800 Subject: [openib-general] Re: [OpenSM] SA database query tool In-Reply-To: <1130965874.4381.4249.camel@hal.voltaire.com> Message-ID: <000301c5dff7$29c5eb40$9e5aa8c0@infiniconsys.com> > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Wednesday, November 02, 2005 1:11 PM > > As I said there is likely a real SA client that will be developed. In > the short term, you can use some diag as an example but these are SMP > rather than GMP based (except for perfquery). There is some SA > infrastructure in place but I'm not sure how well it works. Would you be > using RMPP too as little has exercised it to date ? RMPP would be required for a query of all service registrations. > There's sa_call and just an ib_path_query right now (in > libibmad/src/sa.c). A service query could be easily added. RMPP is not > supported yet at this level. > > > > What is the timeframe for this need ? > > > > I'm thinking of debugging tools that would be useful for me at SC05. > > I was planning on using ibis at SC05 if this was needed. 
If there are Windows boxes on the same IB fabric, you could pretty easily write a program to do the query for you. Windows supports user-mode SA queries including RMPP. I don't know if this is practical for your SC05 needs. - Fab From mshefty at ichips.intel.com Wed Nov 2 13:48:11 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 02 Nov 2005 13:48:11 -0800 Subject: [openib-general] common userspace support Message-ID: <4369341B.8080004@ichips.intel.com> I'm implementing the userspace CMA and noticed that there are a couple of areas where userspace support overlaps. For example, both the CMA and IB CM need to copy path records between userspace and the kernel. They also copy QP attributes, which would also be needed by verbs at some point to support query QP. In these cases, the data structures passed between userspace and the kernel are the same, as is the code to copy them. Does anyone have a preference for how to deal with this issue on both the kernel and userspace sides? My thinking is that for the kernel, the kernel structures would be defined in a common header, with functions exported to copy to/from them. This results in additional dependencies between modules. (E.g. rdma_ucm would require ib_uverbs and ib_usa modules. ib_user_verbs.h would define the QP attribute structure and uverbs_?.c would export copy routines.) For userspace, we can do something similar, which would build dependencies between the different libraries. - Sean From robert.j.woodruff at intel.com Wed Nov 2 13:47:34 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Wed, 2 Nov 2005 13:47:34 -0800 Subject: [openib-general] RE: Problems with SDP on Itanium In-Reply-To: <20051102185838.GA26005@mellanox.co.il> Message-ID: Michael wrote, >No, dont think I've seen that one, but its been a while >since I last run anything on Itanium. >Can you try to debug it a little? What does it mean that >an application "hangs"? 
Is some data sent from one side not received >by another one? >-- >MST Looks like it is stuck in the write()system call. 103: 1048573 bytes 21 times --> 3853.24 Mbps in 2076.17 usec 104: 1048576 bytes 24 times --> 3854.65 Mbps in 2075.42 usec 105: 1048579 bytes 24 times --> 3847.86 Mbps in 2079.08 usec 106: 1572861 bytes 24 times --> Program received signal SIGINT, Interrupt. 0xa000000000010641 in ?? () (gdb) bt #0 0xa000000000010641 in ?? () #1 0x20000000001bf9c0 in write () from /lib/tls/libc.so.6.1 #2 0x4000000000004920 in SendData () #3 0x40000000000036e0 in main () Here is the gdb traceback from the other side after it hangs. It is blocked in a read() system call. (gdb) run Starting program: /home/exports/NetPIPE_3.5-SDP/NPtcp Failed to read a valid object file image from memory. (no debugging symbols found) (no debugging symbols found) (no debugging symbols found) Send and receive buffers are 135168 and 135168 bytes (A bug in Linux doubles the requested buffer sizes) Program received signal SIGINT, Interrupt. 0xa000000000010641 in ?? () (gdb) bt #0 0xa000000000010641 in ?? () #1 0x20000000001bf8c0 in read () from /lib/tls/libc.so.6.1 #2 0x4000000000004a50 in RecvData () #3 0x4000000000003aa0 in main () From mst at mellanox.co.il Wed Nov 2 14:03:58 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 3 Nov 2005 00:03:58 +0200 Subject: [openib-general] Re: [PATCH/RFC v2] IB: Add SCSI RDMA Protocol (SRP) initiator In-Reply-To: <52r79y91jz.fsf_-_@cisco.com> References: <52r79y91jz.fsf_-_@cisco.com> Message-ID: <20051102220358.GA27132@mellanox.co.il> Hello, Roland! 
Quoting Roland Dreier : > +static int srp_init_qp(struct srp_target_port *target, > + struct ib_qp *qp) > +{ > + struct ib_qp_attr *attr; > + int ret; > + > + attr = kmalloc(sizeof *attr, GFP_KERNEL); > + if (!attr) > + return -ENOMEM; > + > + ret = ib_find_cached_pkey(target->srp_host->dev, > + target->srp_host->port, > + be16_to_cpu(target->path.pkey), > + &attr->pkey_index); > + if (ret) > + return ret; > + > + attr->qp_state = IB_QPS_INIT; > + attr->qp_access_flags = (IB_ACCESS_REMOTE_READ | > + IB_ACCESS_REMOTE_WRITE); > + attr->port_num = target->srp_host->port; > + > + return ib_modify_qp(qp, attr, > + IB_QP_STATE | > + IB_QP_PKEY_INDEX | > + IB_QP_ACCESS_FLAGS | > + IB_QP_PORT); > +} This seems to leak sizeof *attr bytes if ib_find_cached_pkey returns an error. -- MST From trimmer at silverstorm.com Wed Nov 2 14:04:07 2005 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Wed, 2 Nov 2005 17:04:07 -0500 Subject: [openib-general] Re: [PATCH/RFC v2] IB: Add SCSI RDMA Protocol(SRP) initiator Message-ID: <5D78D28F88822E4D8702BB9EEF1A436773E947@mercury.infiniconsys.com> also leaks it on success > -----Original Message----- > From: Michael S. Tsirkin [mailto:mst at mellanox.co.il] > Sent: Wednesday, November 02, 2005 5:04 PM > To: Roland Dreier > Cc: openib-general at openib.org; linux-kernel at vger.kernel.org; > linux-scsi at vger.kernel.org > Subject: [openib-general] Re: [PATCH/RFC v2] IB: Add SCSI RDMA > Protocol(SRP) initiator > > > Hello, Roland! 
> Quoting Roland Dreier : > > +static int srp_init_qp(struct srp_target_port *target, > > + struct ib_qp *qp) > > +{ > > + struct ib_qp_attr *attr; > > + int ret; > > + > > + attr = kmalloc(sizeof *attr, GFP_KERNEL); > > + if (!attr) > > + return -ENOMEM; > > + > > + ret = ib_find_cached_pkey(target->srp_host->dev, > > + target->srp_host->port, > > + be16_to_cpu(target->path.pkey), > > + &attr->pkey_index); > > + if (ret) > > + return ret; > > + > > + attr->qp_state = IB_QPS_INIT; > > + attr->qp_access_flags = (IB_ACCESS_REMOTE_READ | > > + IB_ACCESS_REMOTE_WRITE); > > + attr->port_num = target->srp_host->port; > > + > > + return ib_modify_qp(qp, attr, > > + IB_QP_STATE | > > + IB_QP_PKEY_INDEX | > > + IB_QP_ACCESS_FLAGS | > > + IB_QP_PORT); > > +} > > This seems to leak sizeof *attr bytes if ib_find_cached_pkey > returns an error. > > -- > MST > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From rolandd at cisco.com Wed Nov 2 14:04:41 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 02 Nov 2005 14:04:41 -0800 Subject: [openib-general] Re: [PATCH/RFC v2] IB: Add SCSI RDMA Protocol (SRP) initiator In-Reply-To: <20051102220358.GA27132@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 3 Nov 2005 00:03:58 +0200") References: <52r79y91jz.fsf_-_@cisco.com> <20051102220358.GA27132@mellanox.co.il> Message-ID: <52mzkm905i.fsf@cisco.com> Michael> This seems to leak sizeof *attr bytes if Michael> ib_find_cached_pkey returns an error. Good catch. It actually seems to leak attr unconditionally... I'll fix it up now. - R. 
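The leak discussed above comes from returning directly after the allocation has succeeded. One common remedy is to route every later return through a single exit label that frees the buffer. The following is a minimal userspace sketch of that pattern, not the driver code: `demo_attr`, `demo_find_pkey`, `tracked_alloc`, `tracked_free` and `live_allocs` are all hypothetical names invented for illustration, and `malloc`/`free` stand in for `kmalloc`/`kfree`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ib_qp_attr; not the real layout. */
struct demo_attr {
	int state;
	unsigned short pkey_index;
};

/* Number of live allocations; a leak shows up as a nonzero value. */
static int live_allocs;

static void *tracked_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		++live_allocs;
	return p;
}

static void tracked_free(void *p)
{
	if (!p)
		return;
	assert(live_allocs > 0);
	--live_allocs;
	free(p);
}

/* Hypothetical lookup that fails on request; stands in for the P_Key lookup. */
static int demo_find_pkey(int fail, unsigned short *index)
{
	if (fail)
		return -ENOENT;
	*index = 0;
	return 0;
}

/* Every return after the allocation passes through "out", so attr is always freed. */
static int demo_init(int fail_lookup)
{
	struct demo_attr *attr;
	int ret;

	attr = tracked_alloc(sizeof *attr);
	if (!attr)
		return -ENOMEM;

	ret = demo_find_pkey(fail_lookup, &attr->pkey_index);
	if (ret)
		goto out;

	attr->state = 1;
	ret = 0;

out:
	tracked_free(attr);
	return ret;
}
```

Calling `demo_init(0)` and `demo_init(1)` should both leave `live_allocs` at zero, which is the property the original function lacked on both its error and success paths.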
From rolandd at cisco.com Wed Nov 2 14:08:35 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 02 Nov 2005 14:08:35 -0800 Subject: [openib-general] Re: [PATCH/RFC v2] IB: Add SCSI RDMA Protocol (SRP) initiator In-Reply-To: <20051102220358.GA27132@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 3 Nov 2005 00:03:58 +0200") References: <52r79y91jz.fsf_-_@cisco.com> <20051102220358.GA27132@mellanox.co.il> Message-ID: <52irva8zz0.fsf@cisco.com> OK, I replaced the function with: +static int srp_init_qp(struct srp_target_port *target, + struct ib_qp *qp) +{ + struct ib_qp_attr *attr; + int ret; + + attr = kmalloc(sizeof *attr, GFP_KERNEL); + if (!attr) + return -ENOMEM; + + ret = ib_find_cached_pkey(target->srp_host->dev, + target->srp_host->port, + be16_to_cpu(target->path.pkey), + &attr->pkey_index); + if (ret) + goto out; + + attr->qp_state = IB_QPS_INIT; + attr->qp_access_flags = (IB_ACCESS_REMOTE_READ | + IB_ACCESS_REMOTE_WRITE); + attr->port_num = target->srp_host->port; + + ret = ib_modify_qp(qp, attr, + IB_QP_STATE | + IB_QP_PKEY_INDEX | + IB_QP_ACCESS_FLAGS | + IB_QP_PORT); + +out: + kfree(attr); + return ret; +} From ardavis at ichips.intel.com Wed Nov 2 14:08:21 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Wed, 02 Nov 2005 14:08:21 -0800 Subject: [openib-general] Re: uDAPL again In-Reply-To: <43692526.3030003@cs.rutgers.edu> References: <436906F0.3050803@cs.rutgers.edu> <43691B71.2040500@ichips.intel.com> <43692526.3030003@cs.rutgers.edu> Message-ID: <436938D5.6030403@ichips.intel.com> Aniruddha Bohra wrote: > Arlin Davis wrote: > >> Aniruddha Bohra wrote: >> >>> cq_object_wait: RET evd 0x8083ca0 ibv_cq 0x8083da0 ibv_ctx (nil) >>> Success^M >>> >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<^M >>> dapl_evd_dto_callback : CQE ^M >>> work_req_id 134771572^M >>> status 12^M >>> >>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<^M >>> DTO completion ERROR: 12: op 0xff^M >>> disconnect(ep 0x8087110, conn 0x808a008, id 134774528 flags 0)^M >>> 
destroy_cm_id: conn 0x808a008 id 134774528
>>> dapli_evd_post_event: Called with event # 4006
>>>
>>>
>>> Any ideas how to proceed to even debug this ?
>>
>> Are you using the uDAPL provider with socket CM (VERBS=openib_scm) or
>> the default one that uses uCM and uAT? For the socket_CM version
>> the timeout is set to 14 (~67ms) and the retries are set to 7 so the
>> receiving node would have to be delayed beyond ~469ms to get this
>> failure. For the default uCM/uAT version the retries are set to 7 and
>> the timeout is set to pktlifetime+1 so you would have to look at the
>> path-record for the timeout value for the connection.
>>
> I am using the default one. Actually, even the dapl_ep_connect() takes
> a long time.

How long does it typically take to process your dapl_ep_connect? Your time is most likely being spent resolving the remote IP address to a GID and then resolving the path record. Both require SA queries.

> I am not sure, but aren't uCM and uAT simply for connection establishment?
>

Yes, but they also set up many of the transfer attributes of the connected QP. The uCM/uAT version uses path_records from the SA query but the socket_CM version just builds them by hand, similar to the way ibv_rc_pingpong does. You would have to look at the pathrecord->pktlifetime to see the actual timeout value being used.

>
>> Can you successfully run the IB verbs ibv_rc_pingpong test suite?
>
>
> Between the two OpenIB nodes, I can run the ibv_rc_pingpong.

I would suggest that you try the socket CM version and see if you get different results. Just build with "make VERBS=openib_scm".
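The ~67ms and ~469ms figures quoted above are consistent with the InfiniBand local ACK timeout encoding, in which a nonzero field value t means 4.096 us * 2^t. A minimal sketch of that arithmetic follows; the helper names `ib_timeout_ns` and `ib_retry_budget_ns` are hypothetical, and treating the total delay as retries times timeout simply mirrors the estimate in the message rather than any exact HCA behavior.

```c
#include <stdint.h>

/*
 * Local ACK timeout for a nonzero encoded value: 4.096 us * 2^timeout,
 * returned in nanoseconds. The value 0 (no timeout) is not modeled here.
 */
static uint64_t ib_timeout_ns(unsigned int timeout)
{
	return 4096ULL << timeout;
}

/*
 * Hypothetical helper: rough delay a peer must exceed before the
 * sender gives up, estimated as retries * timeout.
 */
static uint64_t ib_retry_budget_ns(unsigned int timeout, unsigned int retries)
{
	return ib_timeout_ns(timeout) * retries;
}
```

With timeout 14 this gives 67,108,864 ns (about 67 ms), and with 7 retries 469,762,048 ns (about 469 ms).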
-arlin From jlentini at netapp.com Wed Nov 2 14:09:56 2005 From: jlentini at netapp.com (James Lentini) Date: Wed, 2 Nov 2005 17:09:56 -0500 (EST) Subject: [openib-general] Re: [OpenSM] SA database query tool In-Reply-To: <1130965874.4381.4249.camel@hal.voltaire.com> References: <1130958126.4381.4109.camel@hal.voltaire.com> <1130965874.4381.4249.camel@hal.voltaire.com> Message-ID: halr> > I'm thinking of debugging tools that would be useful for me at SC05. halr> halr> I was planning on using ibis at SC05 if this was needed. I'll check out ibis. Based on Eitan mail, it sounds perfect. james From mst at mellanox.co.il Wed Nov 2 14:15:22 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 3 Nov 2005 00:15:22 +0200 Subject: [openib-general] Re: common userspace support In-Reply-To: <4369341B.8080004@ichips.intel.com> References: <4369341B.8080004@ichips.intel.com> Message-ID: <20051102221522.GA27731@mellanox.co.il> Quoting r. Sean Hefty : > Subject: common userspace support > > I'm implementing the userspace CMA and noticed that there are a couple of areas > where userspace support overlaps. > > For example, both the CMA and IB CM need to copy path records between userspace > and the kernel. They also copy QP attributes, which would also be needed by > verbs at some point to support query QP. In these cases, the data structures > passed between userspace and the kernel are the same, as is the code to copy them. > > Does anyone have a preference for how to deal with this issue on both the kernel > and userspace sides? > > My thinking is that for the kernel, the kernel structures would be defined in a > common header, with functions exported to copy to/from them. This results in > additional dependencies between modules. (E.g. rdma_ucm would require ib_uverbs > and ib_usa modules. ib_user_verbs.h would define the QP attribute structure and > uverbs_?.c would export copy routines.) 
> > For userspace, we can do something similar, which would build dependencies > between the different libraries. > > - Sean Common header files/structures might make some sense, but what would the routines do, besides copy to/from user? Could you give an example? -- MST From mst at mellanox.co.il Wed Nov 2 14:18:50 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 3 Nov 2005 00:18:50 +0200 Subject: [openib-general] Re: Problems with SDP on Itanium In-Reply-To: References: Message-ID: <20051102221850.GB27132@mellanox.co.il> Quoting r. Bob Woodruff : > Subject: RE: Problems with SDP on Itanium > > Michael wrote, > >No, dont think I've seen that one, but its been a while > >since I last run anything on Itanium. > >Can you try to debug it a little? What does it mean that > >an application "hangs"? Is some data sent from one side not received > >by another one? > > >-- > >MST > > Looks like it is stuck in the write()system call. > > 103: 1048573 bytes 21 times --> 3853.24 Mbps in 2076.17 usec > 104: 1048576 bytes 24 times --> 3854.65 Mbps in 2075.42 usec > 105: 1048579 bytes 24 times --> 3847.86 Mbps in 2079.08 usec > 106: 1572861 bytes 24 times --> > Program received signal SIGINT, Interrupt. > 0xa000000000010641 in ?? () > (gdb) bt > #0 0xa000000000010641 in ?? () > #1 0x20000000001bf9c0 in write () from /lib/tls/libc.so.6.1 > #2 0x4000000000004920 in SendData () > #3 0x40000000000036e0 in main () > > Here is the gdb traceback from the other side after it hangs. > It is blocked in a read() system call. > > (gdb) run > Starting program: /home/exports/NetPIPE_3.5-SDP/NPtcp > Failed to read a valid object file image from memory. > (no debugging symbols found) > (no debugging symbols found) > (no debugging symbols found) > Send and receive buffers are 135168 and 135168 bytes > (A bug in Linux doubles the requested buffer sizes) > > Program received signal SIGINT, Interrupt. > 0xa000000000010641 in ?? () > (gdb) bt > #0 0xa000000000010641 in ?? 
() > #1 0x20000000001bf8c0 in read () from /lib/tls/libc.so.6.1 > #2 0x4000000000004a50 in RecvData () > #3 0x4000000000003aa0 in main () > Interesting. I'll try to look at this next week - shouldnt be too hard to debug if I manage to reproduce it here. Meanwhile, could you please try to enable sdp data debugging, and post the resulting log if the problem reproduces there? -- MST From mshefty at ichips.intel.com Wed Nov 2 14:15:44 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Wed, 02 Nov 2005 14:15:44 -0800 Subject: [openib-general] Re: common userspace support In-Reply-To: <20051102221522.GA27731@mellanox.co.il> References: <4369341B.8080004@ichips.intel.com> <20051102221522.GA27731@mellanox.co.il> Message-ID: <43693A90.80400@ichips.intel.com> Michael S. Tsirkin wrote: > Common header files/structures might make some sense, but what would > the routines do, besides copy to/from user? > Could you give an example? The "copies" aren't memory copies, but field by field assignments. The function below is used by ib_ucm to copy QP attributes from the kernel to the userspace app. The same functionality is needed by rdma_ucm. 
- Sean static void ib_ucm_copy_qp_attr(struct ib_ucm_init_qp_attr_resp *dest_attr, struct ib_qp_attr *src_attr) { dest_attr->cur_qp_state = src_attr->cur_qp_state; dest_attr->path_mtu = src_attr->path_mtu; dest_attr->path_mig_state = src_attr->path_mig_state; dest_attr->qkey = src_attr->qkey; dest_attr->rq_psn = src_attr->rq_psn; dest_attr->sq_psn = src_attr->sq_psn; dest_attr->dest_qp_num = src_attr->dest_qp_num; dest_attr->qp_access_flags = src_attr->qp_access_flags; dest_attr->max_send_wr = src_attr->cap.max_send_wr; dest_attr->max_recv_wr = src_attr->cap.max_recv_wr; dest_attr->max_send_sge = src_attr->cap.max_send_sge; dest_attr->max_recv_sge = src_attr->cap.max_recv_sge; dest_attr->max_inline_data = src_attr->cap.max_inline_data; ib_ucm_copy_ah_attr(&dest_attr->ah_attr, &src_attr->ah_attr); ib_ucm_copy_ah_attr(&dest_attr->alt_ah_attr, &src_attr->alt_ah_attr); dest_attr->pkey_index = src_attr->pkey_index; dest_attr->alt_pkey_index = src_attr->alt_pkey_index; dest_attr->en_sqd_async_notify = src_attr->en_sqd_async_notify; dest_attr->sq_draining = src_attr->sq_draining; dest_attr->max_rd_atomic = src_attr->max_rd_atomic; dest_attr->max_dest_rd_atomic = src_attr->max_dest_rd_atomic; dest_attr->min_rnr_timer = src_attr->min_rnr_timer; dest_attr->port_num = src_attr->port_num; dest_attr->timeout = src_attr->timeout; dest_attr->retry_cnt = src_attr->retry_cnt; dest_attr->rnr_retry = src_attr->rnr_retry; dest_attr->alt_port_num = src_attr->alt_port_num; dest_attr->alt_timeout = src_attr->alt_timeout; } From mst at mellanox.co.il Wed Nov 2 14:28:24 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 3 Nov 2005 00:28:24 +0200 Subject: [openib-general] Re: common userspace support In-Reply-To: <4369341B.8080004@ichips.intel.com> References: <4369341B.8080004@ichips.intel.com> Message-ID: <20051102222824.GB27731@mellanox.co.il> Quoting r. 
Sean Hefty : > Subject: common userspace support > > I'm implementing the userspace CMA and noticed that there are a couple of areas > where userspace support overlaps. > > For example, both the CMA and IB CM need to copy path records between userspace > and the kernel. They also copy QP attributes, which would also be needed by > verbs at some point to support query QP. In these cases, the data structures > passed between userspace and the kernel are the same, as is the code to copy them. > > Does anyone have a preference for how to deal with this issue on both the kernel > and userspace sides? > > My thinking is that for the kernel, the kernel structures would be defined in a > common header, with functions exported to copy to/from them. This results in > additional dependencies between modules. (E.g. rdma_ucm would require ib_uverbs > and ib_usa modules. ib_user_verbs.h would define the QP attribute structure and > uverbs_?.c would export copy routines.) > > For userspace, we can do something similar, which would build dependencies > between the different libraries. > > - Sean I see what you mean now. In my opinion, given that cma is going to be used with uverbs anyway, it shouldnt be a problem to make cma depend on uverbs. -- MST From robert.j.woodruff at intel.com Wed Nov 2 15:16:30 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Wed, 2 Nov 2005 15:16:30 -0800 Subject: [openib-general] RE: Problems with SDP on Itanium In-Reply-To: <20051102221850.GB27132@mellanox.co.il> Message-ID: Michael wrote, >Interesting. I'll try to look at this next week - shouldnt be too hard >to debug if I manage to reproduce it here. >Meanwhile, could you please try to enable sdp data debugging, and post the >resulting log if the problem reproduces there? >-- >MST Yes, when I get some time, I will rebuild my kernel with debug and re-run it. 
woody From rolandd at cisco.com Wed Nov 2 15:27:51 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 02 Nov 2005 15:27:51 -0800 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB Message-ID: <52d5li8waw.fsf@cisco.com> What are your plans for porting the RDS code so that it works with the upstream Linux IB stack? I've only seen a couple of checkins, and the code that you've dropped so far doesn't look usable and needs a lot of cleanup. There's not even a Makefile there. Someone uncharitable might believe that the whole purpose of this exercise was just to be able to issue your press release (http://silverstorm.com/news/rel/092005.asp). - R. From robert.j.woodruff at intel.com Wed Nov 2 17:37:52 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Wed, 2 Nov 2005 17:37:52 -0800 Subject: [openib-general] Problems with SDP on Itanium In-Reply-To: <014f01c5e00b$e35fb6d0$0211708d@gpv.az05.bull.com> Message-ID: Jerome wrote, >I tried your package and I have the same "hang" that you have at test 106; >sender in write call and receiver in read call. Not sure why ttcp would not >have this problem also? I will rebuild my kernel tomorrow with debug turned on and see if that provides any clues. woody From troy at scl.ameslab.gov Wed Nov 2 19:24:00 2005 From: troy at scl.ameslab.gov (Troy Benjegerdes) Date: Wed, 2 Nov 2005 21:24:00 -0600 Subject: [openib-general] OpenSM errors question.. Message-ID: <20051103032400.GF8748@minbar.scl.ameslab.gov> What does the following mean? (the ERR 1B11, in particular) Nov 02 16:18:33 656702 [41001960] -> osm_report_notice: Reporting Generic Notice type:4 num:144 from LID:0x001B GID:0xfe80000000000000,0x0002c90200402789 Nov 02 16:18:33 674607 [41802960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches. Nov 02 16:18:34 197522 [41802960] -> osm_ucast_mgr_process: Min Hop Tables configured on all switches.
Nov 02 16:19:59 917207 [41802960] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x001B GID:0xfe80000000000000,0x0002c90200402789 Nov 02 16:19:59 917610 [41001960] -> osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x001B GID:0xfe80000000000000,0x0002c90200402789 Nov 02 16:19:59 926829 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: method =SubnAdmSet,scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID: 0xff12601bffff0000 : 0x0000000000000016 Nov 02 16:20:01 670893 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: method =SubnAdmSet,scope_state = 0x1, component mask = 0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID: 0xff12601bffff0000 : 0x0000000000000002 (I got this on an isolated subnet with 3 machines.. two opterons with mellanox cards and an IBM with the eHCA card) From halr at voltaire.com Wed Nov 2 19:58:51 2005 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 3 Nov 2005 05:58:51 +0200 Subject: [openib-general] OpenSM errors question.. Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F589A9DF@taurus.voltaire.com> On Wed, 2005-11-02 at 22:24, Troy Benjegerdes wrote: > What does the following mean? (the ERR 1B11, in particular) > > Nov 02 16:18:33 656702 [41001960] -> osm_report_notice: Reporting > Generic Notice type:4 num:144 from LID:0x001B > GID:0xfe80000000000000,0x0002c90200402789 > Nov 02 16:18:33 674607 [41802960] -> osm_ucast_mgr_process: Min Hop > Tables configured on all switches. > Nov 02 16:18:34 197522 [41802960] -> osm_ucast_mgr_process: Min Hop > Tables configured on all switches. 
> Nov 02 16:19:59 917207 [41802960] -> osm_report_notice: Reporting > Generic Notice type:3 num:66 from LID:0x001B > GID:0xfe80000000000000,0x0002c90200402789 > Nov 02 16:19:59 917610 [41001960] -> osm_report_notice: Reporting > Generic Notice type:3 num:66 from LID:0x001B > GID:0xfe80000000000000,0x0002c90200402789 > Nov 02 16:19:59 926829 [41001960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: > method =SubnAdmSet,scope_state = 0x1, component mask = > 0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID: > 0xff12601bffff0000 : 0x0000000000000016 > Nov 02 16:20:01 670893 [41802960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: > method =SubnAdmSet,scope_state = 0x1, component mask = > 0x0000000000010083, expected comp mask = 0x00000000000130c7, MGID: > 0xff12601bffff0000 : 0x0000000000000002 > > > (I got this on an isolated subnet with 3 machines.. two opterons with > mellanox cards and an IBM with the eHCA card) That means a join is being attempted to a multicast group which is not yet created. These are typically groups that you can ignore. They are benign. The two above are both IPv6 multicast groups. The first one ends in 0x16 and the second one 0x2. I think those are IGMP and all routers multicast groups. -- Hal From rolandd at cisco.com Wed Nov 2 20:18:07 2005 From: rolandd at cisco.com (Roland Dreier) Date: Wed, 02 Nov 2005 20:18:07 -0800 Subject: [openib-general] [PATCH] umad: fix hotplug Message-ID: <52veza74ao.fsf@cisco.com> I just committed the patch below, which should fix hotplug handling in umad. The practical effect of this is that you can do "modprobe -r ib_mthca" with opensm running and not get an oops. Comments and test results solicited....
Thanks, Roland --- infiniband/core/user_mad.c (revision 3945) +++ infiniband/core/user_mad.c (working copy) @@ -94,6 +94,9 @@ struct ib_umad_port { struct class_device *sm_class_dev; struct semaphore sm_sem; + struct rw_semaphore mutex; + struct list_head file_list; + struct ib_device *ib_dev; struct ib_umad_device *umad_dev; int dev_num; @@ -108,10 +111,10 @@ struct ib_umad_device { struct ib_umad_file { struct ib_umad_port *port; - spinlock_t recv_lock; struct list_head recv_list; + struct list_head port_list; + spinlock_t recv_lock; wait_queue_head_t recv_wait; - struct rw_semaphore agent_mutex; struct ib_mad_agent *agent[IB_UMAD_MAX_AGENTS]; struct ib_mr *mr[IB_UMAD_MAX_AGENTS]; }; @@ -148,7 +151,7 @@ static int queue_packet(struct ib_umad_f { int ret = 1; - down_read(&file->agent_mutex); + down_read(&file->port->mutex); for (packet->mad.hdr.id = 0; packet->mad.hdr.id < IB_UMAD_MAX_AGENTS; packet->mad.hdr.id++) @@ -161,7 +164,7 @@ static int queue_packet(struct ib_umad_f break; } - up_read(&file->agent_mutex); + up_read(&file->port->mutex); return ret; } @@ -322,7 +325,7 @@ static ssize_t ib_umad_write(struct file goto err; } - down_read(&file->agent_mutex); + down_read(&file->port->mutex); agent = file->agent[packet->mad.hdr.id]; if (!agent) { @@ -419,7 +422,7 @@ static ssize_t ib_umad_write(struct file if (ret) goto err_msg; - up_read(&file->agent_mutex); + up_read(&file->port->mutex); return count; @@ -430,7 +433,7 @@ err_ah: ib_destroy_ah(ah); err_up: - up_read(&file->agent_mutex); + up_read(&file->port->mutex); err: kfree(packet); @@ -460,7 +463,12 @@ static int ib_umad_reg_agent(struct ib_u int agent_id; int ret; - down_write(&file->agent_mutex); + down_write(&file->port->mutex); + + if (!file->port->ib_dev) { + ret = -EPIPE; + goto out; + } if (copy_from_user(&ureq, (void __user *) arg, sizeof ureq)) { ret = -EFAULT; @@ -522,7 +530,7 @@ err: ib_unregister_mad_agent(agent); out: - up_write(&file->agent_mutex); + up_write(&file->port->mutex); return ret; } 
@@ -531,7 +539,7 @@ static int ib_umad_unreg_agent(struct ib u32 id; int ret = 0; - down_write(&file->agent_mutex); + down_write(&file->port->mutex); if (get_user(id, (u32 __user *) arg)) { ret = -EFAULT; @@ -548,7 +556,7 @@ static int ib_umad_unreg_agent(struct ib file->agent[id] = NULL; out: - up_write(&file->agent_mutex); + up_write(&file->port->mutex); return ret; } @@ -569,6 +577,7 @@ static int ib_umad_open(struct inode *in { struct ib_umad_port *port; struct ib_umad_file *file; + int ret = 0; spin_lock(&port_lock); port = umad_port[iminor(inode) - IB_UMAD_MINOR_BASE]; @@ -579,21 +588,32 @@ static int ib_umad_open(struct inode *in if (!port) return -ENXIO; + down_write(&port->mutex); + + if (!port->ib_dev) { + ret = -ENXIO; + goto out; + } + file = kzalloc(sizeof *file, GFP_KERNEL); if (!file) { kref_put(&port->umad_dev->ref, ib_umad_release_dev); - return -ENOMEM; + ret = -ENOMEM; + goto out; } spin_lock_init(&file->recv_lock); - init_rwsem(&file->agent_mutex); INIT_LIST_HEAD(&file->recv_list); init_waitqueue_head(&file->recv_wait); file->port = port; filp->private_data = file; - return 0; + list_add_tail(&file->port_list, &port->file_list); + +out: + up_write(&port->mutex); + return ret; } static int ib_umad_close(struct inode *inode, struct file *filp) @@ -680,9 +700,13 @@ static int ib_umad_sm_close(struct inode struct ib_port_modify props = { .clr_port_cap_mask = IB_PORT_SM }; - int ret; + int ret = 0; + + down_write(&port->mutex); + if (port->ib_dev) + ret = ib_modify_port(port->ib_dev, port->port_num, 0, &props); + up_write(&port->mutex); - ret = ib_modify_port(port->ib_dev, port->port_num, 0, &props); up(&port->sm_sem); kref_put(&port->umad_dev->ref, ib_umad_release_dev); @@ -745,6 +769,8 @@ static int ib_umad_init_port(struct ib_d port->ib_dev = device; port->port_num = port_num; init_MUTEX(&port->sm_sem); + init_rwsem(&port->mutex); + INIT_LIST_HEAD(&port->file_list); port->dev = cdev_alloc(); if (!port->dev) @@ -813,6 +839,9 @@ err_cdev: static 
void ib_umad_kill_port(struct ib_umad_port *port) { + struct ib_umad_file *file; + int id; + class_set_devdata(port->class_dev, NULL); class_set_devdata(port->sm_class_dev, NULL); @@ -826,6 +855,21 @@ static void ib_umad_kill_port(struct ib_ umad_port[port->dev_num] = NULL; spin_unlock(&port_lock); + down_write(&port->mutex); + + port->ib_dev = NULL; + + list_for_each_entry(file, &port->file_list, port_list) + for (id = 0; id < IB_UMAD_MAX_AGENTS; ++id) { + if (!file->agent[id]) + continue; + ib_dereg_mr(file->mr[id]); + ib_unregister_mad_agent(file->agent[id]); + file->agent[id] = NULL; + } + + up_write(&port->mutex); + clear_bit(port->dev_num, dev_map); } From mst at mellanox.co.il Thu Nov 3 00:30:44 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 3 Nov 2005 10:30:44 +0200 Subject: [openib-general] Re: [PATCH] umad: fix hotplug In-Reply-To: <52veza74ao.fsf@cisco.com> References: <52veza74ao.fsf@cisco.com> Message-ID: <20051103083044.GJ31134@mellanox.co.il> Quoting Roland Dreier : > Subject: [PATCH] umad: fix hotplug > > I just committed the patch below, which should fix hotplug handling in > umad. The practical effect of this that you can do "modprobe -r ib_mthca" > with opensm running and not get an oops. > > Comments and test results solicited.... > > Thanks, > Roland I've just checked in the following obvious fix. Hope thats OK with everyone. Committed revision 3956. --- Protect file->mr changes by port->mutex, and remove it from port_list on close. Signed-off-by: Michael S. 
Tsirkin Index: linux-kernel/drivers/infiniband/core/user_mad.c =================================================================== --- linux-kernel/drivers/infiniband/core/user_mad.c (revision 3955) +++ linux-kernel/drivers/infiniband/core/user_mad.c (working copy) @@ -623,6 +623,7 @@ static int ib_umad_close(struct inode *i struct ib_umad_packet *packet, *tmp; int i; + down_write(&file->port->mutex); for (i = 0; i < IB_UMAD_MAX_AGENTS; ++i) if (file->agent[i]) { ib_dereg_mr(file->mr[i]); @@ -632,6 +633,9 @@ static int ib_umad_close(struct inode *i list_for_each_entry_safe(packet, tmp, &file->recv_list, list) kfree(packet); + list_del(&file->port_list); + up_write(&file->port->mutex); + kfree(file); kref_put(&dev->ref, ib_umad_release_dev); -- MST From yipeeyipeeyipeeyipee at yahoo.com Thu Nov 3 01:12:33 2005 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Thu, 3 Nov 2005 09:12:33 +0000 (UTC) Subject: [openib-general] Re: compilation platform dependencies References: <4367C179.5050102@ichips.intel.com> Message-ID: yipee yahoo.com> writes: Hi again, I've updated my openib sources from the main trunk and verified that your fixes fixed the error I got. Now the call to ib_cm_get_event() returns a correct value. Thanks, y From halr at voltaire.com Thu Nov 3 03:22:31 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Nov 2005 06:22:31 -0500 Subject: [openib-general] [PATCH] OpenSM: Don't obtain PKeyTables on switch when option not supported Message-ID: <1131016951.4338.45.camel@hal.voltaire.com> OpenSM: Don't obtain PKeyTables on switch when partition enforcement option not supported.
Part of patch supplied by Brad Benton Signed-off-by: Hal Rosenstock Index: osm_port_info_rcv.c =================================================================== --- osm_port_info_rcv.c (revision 3942) +++ osm_port_info_rcv.c (working copy) @@ -467,6 +467,11 @@ void osm_pkey_get_tables( cl_ntoh64(p_node->node_info.node_guid) ); goto Exit; } + + /* bail out if this is a switch with no partition enforcement capability */ + if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0) + goto Exit; + max_blocks = (cl_ntoh16(p_switch->switch_info.enforce_cap)+IB_NUM_PKEY_ELEMENTS_IN_BLOCK -1) / IB_NUM_PKEY_ELEMENTS_IN_BLOCK ; } From yipeeyipeeyipeeyipee at yahoo.com Thu Nov 3 04:15:50 2005 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Thu, 3 Nov 2005 12:15:50 +0000 (UTC) Subject: [openib-general] netstat Message-ID: Hi, Is there some way to view the list of current CM end points in their various states (listen,connection)? I'm looking for some utility that would provide me with information similar to what netstat provides about kernel sockets. For example: [yipee at yipee new_mini_host]$ netstat -nat Active Internet connections (servers and established) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN tcp 0 0 10.100.0.95:798 10.100.0.93:2049 ESTABLISHED tcp 0 0 10.100.0.95:800 10.100.0.93:2049 ESTABLISHED tcp 0 0 10.100.0.95:35148 10.100.0.93:111 TIME_WAIT The "Local Address" & "Foreign Address" fields can display the pair, the "State" field is meaningful too. thanks, y From yael at mellanox.co.il Thu Nov 3 05:07:42 2005 From: yael at mellanox.co.il (Yael Kalka) Date: 03 Nov 2005 15:07:42 +0200 Subject: [openib-general] [PATCH] Opensm - bug in osm_sa_path_record with 0 records Message-ID: <5zacglyj4x.fsf@mtl066.yok.mtl.com> Hi Hal, During some testing of path record we found a bug in the code. If the number of records returned is zero, then there is clearing of non allocated memory.
I've added some changes to the __osm_pr_rcv_respond function, to match other sa responses. Attached is a patch to fix it. Thanks, Yael Thanks, Yael Signed-off-by: Yael Kalka Index: opensm/osm_sa_path_record.c =================================================================== --- opensm/osm_sa_path_record.c (revision 3955) +++ opensm/osm_sa_path_record.c (working copy) @@ -1448,7 +1448,7 @@ __osm_pr_rcv_respond( osm_madw_t* p_resp_madw; const ib_sa_mad_t* p_sa_mad; ib_sa_mad_t* p_resp_sa_mad; - size_t num_rec, num_copied; + size_t num_rec, num_copied, pre_trim_num_rec; #ifndef VENDOR_RMPP_SUPPORT size_t trim_num_rec; #endif @@ -1456,6 +1456,7 @@ __osm_pr_rcv_respond( ib_api_status_t status; const ib_sa_mad_t* p_rcvd_mad = osm_madw_get_sa_mad_ptr( p_madw ); osm_pr_item_t* p_pr_item; + uint32_t i; OSM_LOG_ENTER( p_rcv->p_log, __osm_pr_rcv_respond ); @@ -1483,6 +1484,7 @@ __osm_pr_rcv_respond( goto Exit; } + pre_trim_num_rec = num_rec; #ifndef VENDOR_RMPP_SUPPORT trim_num_rec = (MAD_BLOCK_SIZE - IB_SA_MAD_HDR_SIZE) / sizeof(ib_path_rec_t); if (trim_num_rec < num_rec) @@ -1495,11 +1497,15 @@ __osm_pr_rcv_respond( } #endif - if( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) - { osm_log( p_rcv->p_log, OSM_LOG_DEBUG, "__osm_pr_rcv_respond: " "Generating response with %u records.\n", num_rec ); + + if ((p_rcvd_mad->method == IB_MAD_METHOD_GET) && (num_rec == 0)) + { + osm_sa_send_error( p_rcv->p_resp, p_madw, + IB_SA_MAD_STATUS_NO_RECORDS ); + goto Exit; } /* @@ -1514,6 +1520,16 @@ __osm_pr_rcv_respond( osm_log( p_rcv->p_log, OSM_LOG_ERROR, "__osm_pr_rcv_respond: ERR 1F14: " "Unable to allocate MAD.\n" ); + + for( i = 0; i < num_rec; i++ ) + { + p_pr_item = (osm_pr_item_t*)cl_qlist_remove_head( p_list ); + cl_qlock_pool_put( &p_rcv->pr_pool, &p_pr_item->pool_item ); + } + + osm_sa_send_error( p_rcv->p_resp, p_madw, + IB_SA_MAD_STATUS_NO_RESOURCES ); + goto Exit; } @@ -1528,6 +1544,8 @@ __osm_pr_rcv_respond( p_resp_sa_mad->attr_offset = ib_get_attr_offset( 
sizeof(ib_path_rec_t) ); + p_resp_pr = (ib_path_rec_t*)ib_sa_mad_get_payload_ptr( p_resp_sa_mad ); + #ifndef VENDOR_RMPP_SUPPORT /* we support only one packet RMPP - so we will set the first and last flags for gettable */ @@ -1542,37 +1560,19 @@ __osm_pr_rcv_respond( p_resp_sa_mad->rmpp_flags = IB_RMPP_FLAG_ACTIVE; #endif - p_resp_pr = (ib_path_rec_t*)ib_sa_mad_get_payload_ptr( p_resp_sa_mad ); - - if ( num_rec == 0 ) - { - if (p_resp_sa_mad->method == IB_MAD_METHOD_GET_RESP) - p_resp_sa_mad->status = IB_SA_MAD_STATUS_NO_RECORDS; - cl_memclr( p_resp_pr, sizeof(*p_resp_pr) ); - } - else + for ( i = 0; i < pre_trim_num_rec; i++ ) { p_pr_item = (osm_pr_item_t*)cl_qlist_remove_head( p_list ); - - /* we need to track the number of copied items so we can - * stop the copy - but clear them all - */ - num_copied = 0; - - while( p_pr_item != (osm_pr_item_t*)cl_qlist_end( p_list ) ) - { - /* Copy the Path Records from the list into the MAD */ - if (num_copied < num_rec) - { + /* copy only if not trimmed */ + if (i < num_rec) *p_resp_pr = p_pr_item->path_rec; - num_copied++; - } + cl_qlock_pool_put( &p_rcv->pr_pool, &p_pr_item->pool_item ); p_resp_pr++; - p_pr_item = (osm_pr_item_t*)cl_qlist_remove_head( p_list ); - } } + CL_ASSERT( cl_is_qlist_empty( p_list ) ); + status = osm_vendor_send( p_resp_madw->h_bind, p_resp_madw, FALSE ); if( status != IB_SUCCESS ) From halr at voltaire.com Thu Nov 3 05:47:40 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Nov 2005 08:47:40 -0500 Subject: [openib-general] [PATCH] OpenSM: Workaround for IBM eHCA logical switch partition enforcement Message-ID: <1131024668.4338.178.camel@hal.voltaire.com> OpenSM: Workaround for IBM eHCA logical switch partition enforcement The problem is that the eHCA logical switches do not support partition enforcement. This *should* be reflected by a zero value in the PartitionEnforcementCap component of the switchinfo attribute. 
The IBM firmware bug is that it returns a one rather than a zero in this field. However, when subsequent requests to the switch port are received for the P_KeyTable, the firmware drops them on the floor and opensm thrashes timing out all the get P_KeyTable MADs it issues for all of the ports on the two logical switches. Remainder of patch supplied by Brad Benton Signed-off-by: Hal Rosenstock Index: osm_port_info_rcv.c =================================================================== --- osm_port_info_rcv.c (revision 3959) +++ osm_port_info_rcv.c (working copy) @@ -416,6 +416,7 @@ __osm_pi_rcv_process_router_port( OSM_LOG_EXIT( p_rcv->p_log ); } +#define IBM_VENDOR_ID (0x5076) /********************************************************************** **********************************************************************/ void osm_pkey_get_tables( @@ -431,6 +432,7 @@ void osm_pkey_get_tables( uint8_t port_num; uint16_t block_num, max_blocks; uint32_t attr_mod_ho; + uint32_t vendor_id; osm_switch_t* p_switch; OSM_LOG_ENTER( p_log, osm_physp_has_pkey ); @@ -468,7 +470,12 @@ void osm_pkey_get_tables( goto Exit; } - /* bail out if this is a switch with no partition enforcement capability */ + /* Check for IBM eHCA firmware defect in reporting partition enforcement cap */ + vendor_id = cl_ntoh32(ib_node_info_get_vendor_id( &p_node->node_info)); + if (vendor_id == IBM_VENDOR_ID && cl_ntoh16(p_switch->switch_info.enforce_cap) == 1) + p_switch->switch_info.enforce_cap = 0; + + /* Bail out if this is a switch with no partition enforcement capability */ if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0) goto Exit; From mst at mellanox.co.il Thu Nov 3 06:00:12 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 3 Nov 2005 16:00:12 +0200 Subject: [openib-general] [PATCH] support kernel-level sockets in sdp Message-ID: <20051103140011.GA31134@mellanox.co.il> Hi! I plan to commit the following. Comments? 
--- The following patch adds support for kernel-level sockets in SDP Zcopy (currently used with AIO). Signed-off-by: Michael S. Tsirkin Index: drivers/infiniband/ulp/sdp/sdp_iocb.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_iocb.c (revision 3958) +++ drivers/infiniband/ulp/sdp/sdp_iocb.c (working copy) @@ -176,30 +176,40 @@ if (!iocb->page_array) goto err_page; - down_read(&current->mm->mmap_sem); - - result = get_user_pages(current, current->mm, - iocb->addr, iocb->page_count, - !!(iocb->flags & SDP_IOCB_F_RECV), 0, - iocb->page_array, NULL); + if (segment_eq(get_fs(), get_ds())) { + /* Kernel request */ + for (i = 0; i< iocb->page_count; ++i) { + iocb->page_array[i] = virt_to_page(addr); + iocb->addr_array[i] = page_to_phys(iocb->page_array[i]); + addr += PAGE_SIZE; + } + } else { + /* User-level request */ + down_read(&current->mm->mmap_sem); - up_read(&current->mm->mmap_sem); + result = get_user_pages(current, current->mm, + iocb->addr, iocb->page_count, + !!(iocb->flags & SDP_IOCB_F_RECV), 0, + iocb->page_array, NULL); - if (result != iocb->page_count) { - sdp_dbg_err("unable to lock <%lx:%Zu> error <%d> <%d>", - iocb->addr, iocb->size, result, iocb->page_count); - goto err_get; + up_read(&current->mm->mmap_sem); + + if (result != iocb->page_count) { + sdp_dbg_err("unable to lock <%lx:%Zu> error <%d> <%d>", + iocb->addr, iocb->size, result, + iocb->page_count); + goto err_get; + } + + iocb->flags |= SDP_IOCB_F_LOCKED; + iocb->mm = current->mm; + iocb->tsk = current; + + + for (i = 0; i< iocb->page_count; ++i) { + iocb->addr_array[i] = page_to_phys(iocb->page_array[i]); + } } - - iocb->flags |= SDP_IOCB_F_LOCKED; - iocb->mm = current->mm; - iocb->tsk = current; - - - for (i = 0; i< iocb->page_count; ++i) { - iocb->addr_array[i] = page_to_phys(iocb->page_array[i]); - } - return 0; err_get: -- MST From halr at voltaire.com Thu Nov 3 06:11:35 2005 From: halr at voltaire.com (Hal Rosenstock) Date: Thu, 3 Nov 2005 16:11:35 +0200
Subject: [openib-general] Re: [PATCH] Opensm - bug in osm_sa_path_record with 0 records Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F589A9E1@taurus.voltaire.com> Hi Yael, On Thu, 2005-11-03 at 08:07, Yael Kalka wrote: > Hi Hal, > > During some testing of path record we found a bug in the code. > If the number of records return is zero, then there is clearing of > non allocated memory. > I've added some changes to the __osm_pr_rcv_respond function, to match > other sa responses. > Attached is a patch to fix it. A couple of minor comments below. -- Hal > Thanks, > Yael > > Thanks, > Yael > > Signed-off-by: Yael Kalka > > Index: opensm/osm_sa_path_record.c > =================================================================== > --- opensm/osm_sa_path_record.c (revision 3955) > +++ opensm/osm_sa_path_record.c (working copy) > @@ -1448,7 +1448,7 @@ __osm_pr_rcv_respond( > osm_madw_t* p_resp_madw; > const ib_sa_mad_t* p_sa_mad; > ib_sa_mad_t* p_resp_sa_mad; > - size_t num_rec, num_copied; > + size_t num_rec, num_copied, pre_trim_num_rec; > #ifndef VENDOR_RMPP_SUPPORT > size_t trim_num_rec; > #endif > @@ -1456,6 +1456,7 @@ __osm_pr_rcv_respond( > ib_api_status_t status; > const ib_sa_mad_t* p_rcvd_mad = osm_madw_get_sa_mad_ptr( p_madw ); > osm_pr_item_t* p_pr_item; > + uint32_t i; > > OSM_LOG_ENTER( p_rcv->p_log, __osm_pr_rcv_respond ); > > @@ -1483,6 +1484,7 @@ __osm_pr_rcv_respond( > goto Exit; > } > > + pre_trim_num_rec = num_rec; > #ifndef VENDOR_RMPP_SUPPORT > trim_num_rec = (MAD_BLOCK_SIZE - IB_SA_MAD_HDR_SIZE) / sizeof(ib_path_rec_t); > if (trim_num_rec < num_rec) > @@ -1495,11 +1497,15 @@ __osm_pr_rcv_respond( > } > #endif > > - if( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) > - { > osm_log( p_rcv->p_log, OSM_LOG_DEBUG, > "__osm_pr_rcv_respond: " > "Generating response with %u records.\n", num_rec ); > + > + if ((p_rcvd_mad->method == IB_MAD_METHOD_GET) && (num_rec == 0)) > + { > + osm_sa_send_error( p_rcv->p_resp, p_madw, > + IB_SA_MAD_STATUS_NO_RECORDS ); 
> + goto Exit; > } This can be moved up immediately after the C15-0.1.30 clause, OK ? > /* > @@ -1514,6 +1520,16 @@ __osm_pr_rcv_respond( > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > "__osm_pr_rcv_respond: ERR 1F14: " > "Unable to allocate MAD.\n" ); > + > + for( i = 0; i < num_rec; i++ ) > + { > + p_pr_item = (osm_pr_item_t*)cl_qlist_remove_head( p_list ); > + cl_qlock_pool_put( &p_rcv->pr_pool, &p_pr_item->pool_item ); > + } > + > + osm_sa_send_error( p_rcv->p_resp, p_madw, > + IB_SA_MAD_STATUS_NO_RESOURCES ); > + osm_sa_send_error also attempts to get a MAD from the pool. Is there a chance this succeeds after the one in this routine fails ? (Should this be eliminated ?) > goto Exit; > } > > @@ -1528,6 +1544,8 @@ __osm_pr_rcv_respond( > p_resp_sa_mad->attr_offset = > ib_get_attr_offset( sizeof(ib_path_rec_t) ); > > + p_resp_pr = (ib_path_rec_t*)ib_sa_mad_get_payload_ptr( p_resp_sa_mad ); > + > #ifndef VENDOR_RMPP_SUPPORT > /* we support only one packet RMPP - so we will set the first and > last flags for gettable */ > @@ -1542,37 +1560,19 @@ __osm_pr_rcv_respond( > p_resp_sa_mad->rmpp_flags = IB_RMPP_FLAG_ACTIVE; > #endif > > - p_resp_pr = (ib_path_rec_t*)ib_sa_mad_get_payload_ptr( p_resp_sa_mad ); > - > - if ( num_rec == 0 ) > - { > - if (p_resp_sa_mad->method == IB_MAD_METHOD_GET_RESP) > - p_resp_sa_mad->status = IB_SA_MAD_STATUS_NO_RECORDS; > - cl_memclr( p_resp_pr, sizeof(*p_resp_pr) ); > - } > - else > + for ( i = 0; i < pre_trim_num_rec; i++ ) > { > p_pr_item = (osm_pr_item_t*)cl_qlist_remove_head( p_list ); > - > - /* we need to track the number of copied items so we can > - * stop the copy - but clear them all > - */ > - num_copied = 0; > - > - while( p_pr_item != (osm_pr_item_t*)cl_qlist_end( p_list ) ) > - { > - /* Copy the Path Records from the list into the MAD */ > - if (num_copied < num_rec) > - { > + /* copy only if not trimmed */ > + if (i < num_rec) > *p_resp_pr = p_pr_item->path_rec; > - num_copied++; > - } > + > cl_qlock_pool_put( &p_rcv->pr_pool, 
&p_pr_item->pool_item ); > p_resp_pr++; > - p_pr_item = (osm_pr_item_t*)cl_qlist_remove_head( p_list ); > - } > } > > + CL_ASSERT( cl_is_qlist_empty( p_list ) ); > + > status = osm_vendor_send( p_resp_madw->h_bind, p_resp_madw, FALSE ); > > if( status != IB_SUCCESS ) > From glebn at voltaire.com Thu Nov 3 06:19:24 2005 From: glebn at voltaire.com (Gleb Natapov) Date: Thu, 3 Nov 2005 16:19:24 +0200 Subject: [openib-general] [hugh@veritas.com: Re: Nick's core remove PageReserved broke vmware...] Message-ID: <20051103141924.GE22185@minantech.com> Hello Michael, It seems that it is time to resurrect your DONTCOPY patch. Can you do it? If you have no time now I can handle it. ----- Forwarded message from Hugh Dickins ----- From: Hugh Dickins To: Gleb Natapov Cc: Benjamin Herrenschmidt , Petr Vandrovec , Nick Piggin , "Michael S. Tsirkin" , Badari Pulavarty , Linux Kernel Mailing List Subject: Re: Nick's core remove PageReserved broke vmware... Date: Thu, 3 Nov 2005 14:11:46 +0000 (GMT) On Thu, 3 Nov 2005, Gleb Natapov wrote: > On Wed, Nov 02, 2005 at 10:02:49PM +0000, Hugh Dickins wrote: > > On Thu, 3 Nov 2005, Benjamin Herrenschmidt wrote: > > > On Wed, 2005-11-02 at 21:41 +0000, Hugh Dickins wrote: > > > > > > > The only extant problem here is if the pages are private, and you > > > > fork while this is going on, and the parent user process writes to the > > > > area before completion: then COW leaves the child with the page being > > > > DMAed into, giving the parent a copied page which may be incomplete. > > > > > > Won't happen, and if it does, it's a user error to rely on that working, > > > so it doesn't matter. > > > > I wish everyone else would see it that way! (But some people do > > have valid scenarios where it can't just be ruled out completely.) 
> > > I am one of those people :) > > Last discussion about this issue ended without resolution, but I remember > you mentioned the possibility to leave ptes writable in parent during fork > for private pages mapped for DMA. Is this approach acceptable? I was toying with that idea back then, but it leaves the pages in a peculiar limbo between being shared and private, such that it's hard to think through the consequences. We do already have a case rather like that (ptrace writing to a write-protected area), but some of us are a bit worried by that one, so I'd be foolish now to recommend another such subversion of the rules. In the time since we discussed before, I've rather come full circle round to my original position: abandoning such ideas of trying to handle it from get_user_pages itself, appreciating the simplicity of the original PROT_DONTCOPY idea from you guys; but sticking to my initial reaction that this is better done by madvise(MADV_DONTCOPY), not by the mmap/mprotect route in Michael's patch. (I never bought the "racy" argument advanced in favour of the mmap flag.) One of the factors which has swayed me to the DONTCOPY approach, is Nick's 2.6.14 optimization in fork's copy_page_range, where areas which can be safely faulted later are not copied pte by pte. But that doesn't apply to all areas, and in particular cannot apply to VM_NONLINEAR shared areas. It should be of benefit to apps which use large such areas, and also do a lot of forking children who don't need those areas, to be able to mark them VM_DONTCOPY. Or any other vmas the children won't need. (But there's one big distinction between the optimization and VM_DONTCOPY: the optimization copies vma but doesn't fill in its ptes, VM_DONTCOPY doesn't even copy the vma.) Two warnings if someone would like to post a MADV_DONTCOPY patch. 
It should include a matching MADV_DOCOPY to clear the condition, but that must not be allowed to clear VM_DONTCOPY set originally by driver: perhaps you'll end up with a VM_UDONTCOPY or something like that. And Badari has a MADV_REMOVE patch in the works, taking the next slot (just after MADV_DONTNEED in most of the arches): probably best for you to base yours on top of his (though yours is simpler and might jump ahead). Hugh ----- End forwarded message ----- -- Gleb. From schihei at de.ibm.com Thu Nov 3 06:28:38 2005 From: schihei at de.ibm.com (Heiko J Schick) Date: Thu, 03 Nov 2005 15:28:38 +0100 Subject: [openib-general] libehca causes segfault when not physically present.. In-Reply-To: <20051031071703.GU3275@kalmia.hozed.org> References: <20051031071703.GU3275@kalmia.hozed.org> Message-ID: <436A1E96.4050003@de.ibm.com> Hello Troy, this bug should be fixed in OpenIB trunk 3960. Many thanks for pointing out this problem. Regards, Heiko Troy Benjegerdes wrote: > On an Openpower720 system with a mellanox HCA (and no IBM ehca > installed), I get the following when trying to run ibv_rc_pingpong: > > Starting program: > /usr/src/openib-src/userspace/libibverbs/examples/.libs/ibv_rc_pingpong > [Thread debugging using libthread_db enabled] > [New Thread 4398046660640 (LWP 6167)] > > Program received signal SIGSEGV, Segmentation fault. > [Switching to Thread 4398046660640 (LWP 6167)] > hipz_galpa_store (galpa={fw_handle = 0}, offset=48, value=0) > at src/hcp_phyp.c:72 > 72 *(u64 *) addr = value; > (gdb) bt > #0 hipz_galpa_store (galpa={fw_handle = 0}, offset=48, value=0) > at src/hcp_phyp.c:72 > #1 0x0000000010001b7c in pp_post_recv (ctx=0x100177d0, n=-3807848) > at verbs.h:844 > #2 0x0000000010002364 in main (argc=Variable "argc" is not available. > ) at examples/rc_pingpong.c:566 > > > I assume this means something somewhere is not actually checking sysfs > to see if the driver is actually there and active. 
> _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > -- Mit freundlichen Gruessen / Kind Regards Heiko Joerg Schick ---------------------------------------------------------------------- Heiko J Schick I/O Firmware Development II Linux InfiniBand Device Drivers IBM Deutschland Entwicklung GmbH external: 49-07031-16-0 x4219 Schoenaicher Str. 220 t/l: 120-4129 71032 Boeblingen email: schickhj at de.ibm.com ---------------------------------------------------------------------- From mst at mellanox.co.il Thu Nov 3 06:39:15 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 3 Nov 2005 16:39:15 +0200 Subject: [openib-general] Re: [hugh@veritas.com: Re: Nick's core remove PageReserved broke vmware...] In-Reply-To: <20051103141924.GE22185@minantech.com> References: <20051103141924.GE22185@minantech.com> Message-ID: <20051103143915.GC31134@mellanox.co.il> Hello Geb, I expect so, unless more fires spring up. I'll let you know if I need help. Thanks for the offer, MST Quoting glebn at voltaire.com : > Subject: [hugh at veritas.com: Re: Nick's core remove PageReserved broke vmware...] > > Hello Michael, > > It seems that it is time to resurrect your DONTCOPY patch. Can you do > it? > If you have no time now I can handle it. > > ----- Forwarded message from Hugh Dickins ----- > > From: Hugh Dickins > To: Gleb Natapov > Cc: Benjamin Herrenschmidt , > Petr Vandrovec , > Nick Piggin , > "Michael S. Tsirkin" , > Badari Pulavarty , > Linux Kernel Mailing List > Subject: Re: Nick's core remove PageReserved broke vmware...
> Date: Thu, 3 Nov 2005 14:11:46 +0000 (GMT) > > On Thu, 3 Nov 2005, Gleb Natapov wrote: > > On Wed, Nov 02, 2005 at 10:02:49PM +0000, Hugh Dickins wrote: > > > On Thu, 3 Nov 2005, Benjamin Herrenschmidt wrote: > > > > On Wed, 2005-11-02 at 21:41 +0000, Hugh Dickins wrote: > > > > > > > > > The only extant problem here is if the pages are private, and > you > > > > > fork while this is going on, and the parent user process writes > to the > > > > > area before completion: then COW leaves the child with the page > being > > > > > DMAed into, giving the parent a copied page which may be > incomplete. > > > > > > > > Won't happen, and if it does, it's a user error to rely on that > working, > > > > so it doesn't matter. > > > > > > I wish everyone else would see it that way! (But some people do > > > have valid scenarios where it can't just be ruled out completely.) > > > > > I am one of those people :) > > > > Last discussion about this issue ended without resolution, but I > remember > > you mentioned the possibility to leave ptes writable in parent during > fork > > for private pages mapped for DMA. Is this approach acceptable? > > I was toying with that idea back then, but it leaves the pages in a > peculiar limbo between being shared and private, such that it's hard > to think through the consequences. We do already have a case rather > like that (ptrace writing to a write-protected area), but some of us > are a bit worried by that one, so I'd be foolish now to recommend > another such subversion of the rules. > > In the time since we discussed before, I've rather come full circle > round to my original position: abandoning such ideas of trying to > handle it from get_user_pages itself, appreciating the simplicity > of the original PROT_DONTCOPY idea from you guys; but sticking to my > initial reaction that this is better done by madvise(MADV_DONTCOPY), > not by the mmap/mprotect route in Michael's patch. 
(I never bought > the "racy" argument advanced in favour of the mmap flag.) > > One of the factors which has swayed me to the DONTCOPY approach, is > Nick's 2.6.14 optimization in fork's copy_page_range, where areas > which can be safely faulted later are not copied pte by pte. But > that doesn't apply to all areas, and in particular cannot apply to > VM_NONLINEAR shared areas. It should be of benefit to apps which > use large such areas, and also do a lot of forking children who don't > need those areas, to be able to mark them VM_DONTCOPY. Or any other > vmas the children won't need. (But there's one big distinction between > the optimization and VM_DONTCOPY: the optimization copies vma but > doesn't fill in its ptes, VM_DONTCOPY doesn't even copy the vma.) > > Two warnings if someone would like to post a MADV_DONTCOPY patch. > It should include a matching MADV_DOCOPY to clear the condition, but > that must not be allowed to clear VM_DONTCOPY set originally by driver: > perhaps you'll end up with a VM_UDONTCOPY or something like that. > > And Badari has a MADV_REMOVE patch in the works, taking the next > slot (just after MADV_DONTNEED in most of the arches): probably > best for you to base yours on top of his (though yours is simpler > and might jump ahead). > > Hugh > > ----- End forwarded message ----- > > -- > Gleb. > -- MST From rolandd at cisco.com Thu Nov 3 07:09:55 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 07:09:55 -0800 Subject: [openib-general] Re: [PATCH] umad: fix hotplug In-Reply-To: <20051103083044.GJ31134@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 3 Nov 2005 10:30:44 +0200") References: <52veza74ao.fsf@cisco.com> <20051103083044.GJ31134@mellanox.co.il> Message-ID: <527jbp7oos.fsf@cisco.com> Thanks, good catch. - R. From mst at mellanox.co.il Thu Nov 3 07:19:39 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin) Date: Thu, 3 Nov 2005 17:19:39 +0200 Subject: [openib-general] Re: [hugh@veritas.com: Re: Nick's core remove PageReserved broke vmware...] In-Reply-To: <20051103143915.GC31134@mellanox.co.il> References: <20051103143915.GC31134@mellanox.co.il> Message-ID: <20051103151939.GF31134@mellanox.co.il> Quoting Michael S. Tsirkin : > Hello Geb, Gleb :) Sorry about a typo. -- MST From halr at voltaire.com Thu Nov 3 07:32:46 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Nov 2005 10:32:46 -0500 Subject: [openib-general] [PATCH] umad: fix hotplug In-Reply-To: <52veza74ao.fsf@cisco.com> References: <52veza74ao.fsf@cisco.com> Message-ID: <1131031773.4338.332.camel@hal.voltaire.com> On Wed, 2005-11-02 at 23:18, Roland Dreier wrote: > I just committed the patch below, which should fix hotplug handling in > umad. The practical effect of this that you can do "modprobe -r ib_mthca" > with opensm running and not get an oops. > > Comments and test results solicited.... How do umad based application get notified of removals and insertions ? Are the devices ready for use then ? -- Hal From mst at mellanox.co.il Thu Nov 3 07:57:28 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Thu, 3 Nov 2005 17:57:28 +0200 Subject: [openib-general] Re: [PATCH] umad: fix hotplug In-Reply-To: <1131031773.4338.332.camel@hal.voltaire.com> References: <1131031773.4338.332.camel@hal.voltaire.com> Message-ID: <20051103155728.GI31134@mellanox.co.il> Quoting r. Hal Rosenstock : > Subject: Re: [PATCH] umad: fix hotplug > > On Wed, 2005-11-02 at 23:18, Roland Dreier wrote: > > I just committed the patch below, which should fix hotplug handling in > > umad. The practical effect of this that you can do "modprobe -r > ib_mthca" > > with opensm running and not get an oops. > > > > Comments and test results solicited.... > > How do umad based application get notified of removals and insertions ? > Are the devices ready for use then ? 
> > -- Hal Would not a regular hotplug event for umad work? -- MST From eli at mellanox.co.il Thu Nov 3 08:01:12 2005 From: eli at mellanox.co.il (Eli Cohen) Date: Thu, 3 Nov 2005 18:01:12 +0200 Subject: [openib-general] DHCP over Infiniband Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E306629F@mtlexch01.mtl.com> Hi, has anyone had the chance to run a DHCP server on an Infiniband interface? I checked this on Suse 10 kernel 2.6.13-15-bigsmp and I do not get responses from the server to DHCP requests. When running tcpdump on ib0 interface I can see the requests but the server does not respond. The server's version is isc-dhcpd-V3.0.3. I also tried version dhcp-3.0.4b1 but with no luck. I checked on Suse SLES 9 with Mellanox's IBGD1.8 and the server responds to requests. I still had a problem that the server does not set the client identifier option in its responses although the client does set this option. If you have any experience with this please let me know. Thanks Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From halr at voltaire.com Thu Nov 3 07:53:39 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Nov 2005 10:53:39 -0500 Subject: [openib-general] [PATCHv2] OpenSM: Workaround for IBM eHCA logical switch partition enforcement Message-ID: <1131032959.4338.369.camel@hal.voltaire.com> OpenSM: Workaround for IBM eHCA logical switch partition enforcement The problem is that the eHCA logical switches do not support partition enforcement. This *should* be reflected by a zero value in the PartitionEnforcementCap component of the switchinfo attribute. The IBM firmware bug is that it returns a one rather than a zero in this field. However, when subsequent requests to the switch port are received for the P_KeyTable, the firmware drops them on the floor and opensm thrashes timing out all the get P_KeyTable MADs it issues for all of the ports on the two logical switches. 
Remainder of patch supplied by Brad Benton Signed-off-by: Hal Rosenstock Index: osm_port_info_rcv.c =================================================================== --- osm_port_info_rcv.c (revision 3959) +++ osm_port_info_rcv.c (working copy) @@ -416,6 +416,7 @@ __osm_pi_rcv_process_router_port( OSM_LOG_EXIT( p_rcv->p_log ); } +#define IBM_VENDOR_ID (0x5076) /********************************************************************** **********************************************************************/ void osm_pkey_get_tables( @@ -468,7 +469,11 @@ void osm_pkey_get_tables( goto Exit; } - /* bail out if this is a switch with no partition enforcement capability */ + /* Check for IBM eHCA firmware defect in reporting partition enforcement cap */ + if (cl_ntoh32(ib_node_info_get_vendor_id(&p_node->node_info)) == IBM_VENDOR_ID) + p_switch->switch_info.enforce_cap = 0; + + /* Bail out if this is a switch with no partition enforcement capability */ if (cl_ntoh16(p_switch->switch_info.enforce_cap) == 0) goto Exit; From halr at voltaire.com Thu Nov 3 08:06:49 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Nov 2005 11:06:49 -0500 Subject: [openib-general] DHCP over Infiniband In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E306629F@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E306629F@mtlexch01.mtl.com> Message-ID: <1131034009.4338.389.camel@hal.voltaire.com> On Thu, 2005-11-03 at 11:01, Eli Cohen wrote: > Hi, > has anyone had the chance to run a DHCP server on an Infiniband > interface? I checked this on Suse 10 kernel 2.6.13-15-bigsmp and I do > not get responses from the server to DHCP requests. When running > tcpdump on ib0 interface I can see the requests but the server does > not respond. The server's version is isc-dhcpd-V3.0.3. I also tried > version dhcp-3.0.4b1 but with no luck. I checked on Suse SLES 9 with > Mellanox's IBGD1.8 and the server responds to requests. 
I still had a > problem that the server does not set the client identifier option in > its responses although the client does set this option. If you have > any experience with this please let me know. What DHCP server and what client are you using ? This has been done with the ISC ones. It requires modifications due to the difference in hardware addresses (and there is the QPN issue). -- Hal From eli at mellanox.co.il Thu Nov 3 08:47:07 2005 From: eli at mellanox.co.il (Eli Cohen) Date: Thu, 3 Nov 2005 18:47:07 +0200 Subject: [openib-general] DHCP over Infiniband Message-ID: <6AB138A2AB8C8E4A98B9C0C3D52670E30662A1@mtlexch01.mtl.com> The client is Etherboot's client for configuring a client at boot time. The server is ISC. Can you explain what you mean by "the QPN issue"? -----Original Message----- From: Hal Rosenstock [mailto:halr at voltaire.com] Sent: Thursday, November 03, 2005 6:07 PM To: Eli Cohen Cc: 'openib-general at openib.org' Subject: Re: [openib-general] DHCP over Infiniband On Thu, 2005-11-03 at 11:01, Eli Cohen wrote: > Hi, > has anyone had the chance to run a DHCP server on an Infiniband > interface? I checked this on Suse 10 kernel 2.6.13-15-bigsmp and I do > not get responses from the server to DHCP requests. When running > tcpdump on ib0 interface I can see the requests but the server does > not respond. The server's version is isc-dhcpd-V3.0.3. I also tried > version dhcp-3.0.4b1 but with no luck. I checked on Suse SLES 9 with > Mellanox's IBGD1.8 and the server responds to requests. I still had a > problem that the server does not set the client identifier option in > its responses although the client does set this option. If you have > any experience with this please let me know. What DHCP server and what client are you using ? This has been done with the ISC ones. It requires modifications due to the difference in hardware addresses (and there is the QPN issue). 
-- Hal From rolandd at cisco.com Thu Nov 3 08:44:10 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 08:44:10 -0800 Subject: [openib-general] Re: [PATCH] umad: fix hotplug In-Reply-To: <20051103155728.GI31134@mellanox.co.il> (Michael S. Tsirkin's message of "Thu, 3 Nov 2005 17:57:28 +0200") References: <1131031773.4338.332.camel@hal.voltaire.com> <20051103155728.GI31134@mellanox.co.il> Message-ID: <52r79x65r9.fsf@cisco.com> Michael> Would not a regular hotplug event for umad work? Yes, and in fact they are generated -- that's how udev knows to create/destroy the device nodes for example. - R. From rolandd at cisco.com Thu Nov 3 08:44:39 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 08:44:39 -0800 Subject: [openib-general] [PATCH] umad: fix hotplug In-Reply-To: <1131031773.4338.332.camel@hal.voltaire.com> (Hal Rosenstock's message of "03 Nov 2005 10:32:46 -0500") References: <52veza74ao.fsf@cisco.com> <1131031773.4338.332.camel@hal.voltaire.com> Message-ID: <52mzkl65qg.fsf@cisco.com> Hal> How do umad based application get notified of removals and Hal> insertions ? Are the devices ready for use then ? There is no notification beyond the usual hotplug events that the kernel generates for all character devices. - R.
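As an illustration of the hotplug-event approach discussed in this thread, the sketch below shows how a umad-based application might pick the relevant fields out of a kernel uevent payload. It is not taken from any OpenIB code: `parse_uevent`, `is_umad_event` and `struct uevent_info` are hypothetical names, and the class name `infiniband_mad` and the `DEVNAME` value are assumptions that should be checked against the running kernel. The payload layout assumed here is a run of NUL-terminated strings: an `action@devpath` header followed by `KEY=VALUE` pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fields of interest pulled out of one uevent payload (hypothetical type). */
struct uevent_info {
    const char *action;    /* "add" or "remove"; NULL if not present */
    const char *subsystem; /* e.g. "infiniband_mad" (assumed class name) */
    const char *devname;   /* e.g. "umad0" (illustrative value) */
};

/*
 * Walk a buffer of NUL-terminated strings and record the ACTION,
 * SUBSYSTEM and DEVNAME values. The stored pointers alias buf, so buf
 * must outlive info. A trailing string without a terminating NUL is
 * ignored rather than read past the end of the buffer.
 */
static void parse_uevent(const char *buf, size_t len, struct uevent_info *info)
{
    size_t pos = 0;

    assert(buf != NULL && info != NULL);
    memset(info, 0, sizeof(*info));

    while (pos < len) {
        const char *s = buf + pos;
        const char *end = memchr(s, '\0', len - pos);

        if (end == NULL)
            break; /* unterminated tail: stop parsing */

        if (strncmp(s, "ACTION=", 7) == 0)
            info->action = s + 7;
        else if (strncmp(s, "SUBSYSTEM=", 10) == 0)
            info->subsystem = s + 10;
        else if (strncmp(s, "DEVNAME=", 8) == 0)
            info->devname = s + 8;
        /* anything else, including the "action@devpath" header, is skipped */

        pos += (size_t)(end - s) + 1;
    }
}

/* Non-zero if the event has an action and belongs to the assumed umad class. */
static int is_umad_event(const struct uevent_info *info)
{
    return info->action != NULL && info->subsystem != NULL &&
           strcmp(info->subsystem, "infiniband_mad") == 0;
}
```

In practice the buffer would come from a read on a socket opened with `socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT)`, or the application could leave this to udev and simply react to the device nodes appearing and disappearing. Parsing is kept separate from the socket code so that it can be exercised with a hand-built buffer.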
From halr at voltaire.com Thu Nov 3 08:47:50 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Nov 2005 11:47:50 -0500 Subject: [openib-general] DHCP over Infiniband In-Reply-To: <6AB138A2AB8C8E4A98B9C0C3D52670E30662A1@mtlexch01.mtl.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30662A1@mtlexch01.mtl.com> Message-ID: <1131036320.4338.441.camel@hal.voltaire.com> On Thu, 2005-11-03 at 11:47, Eli Cohen wrote: > The client is Etherboot's client for configuring a client at boot > time. The server is ISC. I think that client needs modifications. > Can you explain what you mean by "the QPN issue"? The QPN is part of the hardware address and is not fixed. -- Hal > -----Original Message----- > From: Hal Rosenstock [mailto:halr at voltaire.com] > Sent: Thursday, November 03, 2005 6:07 PM > To: Eli Cohen > Cc: 'openib-general at openib.org' > Subject: Re: [openib-general] DHCP over Infiniband > > > On Thu, 2005-11-03 at 11:01, Eli Cohen wrote: > > Hi, > > has anyone had the chance to run a DHCP server on an Infiniband > > interface? I checked this on Suse 10 kernel 2.6.13-15-bigsmp and I > do > > not get responses from the server to DHCP requests. When running > > tcpdump on ib0 interface I can see the requests but the server does > > not respond. The server's version is isc-dhcpd-V3.0.3. I also tried > > version dhcp-3.0.4b1 but with no luck. I checked on Suse SLES 9 with > > Mellanox's IBGD1.8 and the server responds to requests. I still had > a > > problem that the server does not set the client identifier option in > > its responses although the client does set this option. If you have > > any experience with this please let me know. > > What DHCP server and what client are you using ? This has been done > with > the ISC ones. It requires modifications due to the difference in > hardware addresses (and there is the QPN issue). 
> > -- Hal > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From iod00d at hp.com Thu Nov 3 08:52:52 2005 From: iod00d at hp.com (Grant Grundler) Date: Thu, 3 Nov 2005 08:52:52 -0800 Subject: [openib-general] compilation platform dependencies In-Reply-To: <52pspkdt7z.fsf@cisco.com> References: <527jbsfdii.fsf@cisco.com> <20051101191905.GE6815@esmail.cup.hp.com> <52pspkdt7z.fsf@cisco.com> Message-ID: <20051103165252.GA32699@esmail.cup.hp.com> Hi Roland, since no one smarter touched this.... On Tue, Nov 01, 2005 at 12:10:56PM -0800, Roland Dreier wrote: > > I've seen use of this use of "data[0]": > > include/rdma/ib_user_verbs.h: __u64 driver_data[0]; > > > > isn't that for the same purpose? > > Apologies if I'm mixing things up... > > The driver_data[] in ib_user_verbs.h is really there to give a hint > that extra device-dependent data could follow. Reserved members of > structs are used to pad it up to a 64-bit boundary. Yeah, this is the right way to do it. I just wasn't sure. > I'm not sure if __u64 driver_data[0]; forces alignment to an 8-byte > boundary on i386... does it? I'm now convinced it doesn't on x86. See output below. 
thanks, grant

grundler <481>uname -a
Linux ob500 2.6.13 #6 Sat Oct 1 23:58:35 PDT 2005 i686 GNU/Linux
grundler <482>cat alignment_test.c
#include <stdio.h>
#include <stddef.h>

struct foo {
	int y;
	unsigned long long x;
};

int main(void)
{
	return printf("offset of x is %d\n", offsetof(struct foo, x));
}
grundler <483>make alignment_test
cc alignment_test.c -o alignment_test
grundler <484>./alignment_test
offset of x is 4

From halr at voltaire.com Thu Nov 3 08:51:04 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 03 Nov 2005 11:51:04 -0500 Subject: [openib-general] Re: [PATCH] Opensm - bug in osm_sa_path_record with 0 records In-Reply-To: <1131025659.4338.206.camel@hal.voltaire.com> References: <5zacglyj4x.fsf@mtl066.yok.mtl.com> <1131025659.4338.206.camel@hal.voltaire.com> Message-ID: <1131036469.4338.446.camel@hal.voltaire.com> One additional comment on this: On Thu, 2005-11-03 at 08:50, Hal Rosenstock wrote: > On Thu, 2005-11-03 at 08:07, Yael Kalka wrote: > > Hi Hal, > > > > During some testing of path record we found a bug in the code. > > If the number of records return is zero, then there is clearing of > > non allocated memory. > > I've added some changes to the __osm_pr_rcv_respond function, to match > > other sa responses. > > Attached is a patch to fix it. > > A couple of minor comments below.
> > -- Hal > > > Thanks, > > Yael > > > > Thanks, > > Yael > > > > Signed-off-by: Yael Kalka > > > > Index: opensm/osm_sa_path_record.c > > =================================================================== > > --- opensm/osm_sa_path_record.c (revision 3955) > > +++ opensm/osm_sa_path_record.c (working copy) > > @@ -1448,7 +1448,7 @@ __osm_pr_rcv_respond( > > osm_madw_t* p_resp_madw; > > const ib_sa_mad_t* p_sa_mad; > > ib_sa_mad_t* p_resp_sa_mad; > > - size_t num_rec, num_copied; > > + size_t num_rec, num_copied, pre_trim_num_rec; > > #ifndef VENDOR_RMPP_SUPPORT > > size_t trim_num_rec; > > #endif > > @@ -1456,6 +1456,7 @@ __osm_pr_rcv_respond( > > ib_api_status_t status; > > const ib_sa_mad_t* p_rcvd_mad = osm_madw_get_sa_mad_ptr( p_madw ); > > osm_pr_item_t* p_pr_item; > > + uint32_t i; > > > > OSM_LOG_ENTER( p_rcv->p_log, __osm_pr_rcv_respond ); > > > > @@ -1483,6 +1484,7 @@ __osm_pr_rcv_respond( > > goto Exit; > > } > > > > + pre_trim_num_rec = num_rec; > > #ifndef VENDOR_RMPP_SUPPORT > > trim_num_rec = (MAD_BLOCK_SIZE - IB_SA_MAD_HDR_SIZE) / sizeof(ib_path_rec_t); > > if (trim_num_rec < num_rec) > > @@ -1495,11 +1497,15 @@ __osm_pr_rcv_respond( > > } > > #endif > > > > - if( osm_log_is_active( p_rcv->p_log, OSM_LOG_DEBUG ) ) > > - { > > osm_log( p_rcv->p_log, OSM_LOG_DEBUG, > > "__osm_pr_rcv_respond: " > > "Generating response with %u records.\n", num_rec ); > > + > > + if ((p_rcvd_mad->method == IB_MAD_METHOD_GET) && (num_rec == 0)) > > + { > > + osm_sa_send_error( p_rcv->p_resp, p_madw, > > + IB_SA_MAD_STATUS_NO_RECORDS ); > > + goto Exit; > > } > > This can be moved up immediately after the C15-0.1.30 clause, OK ? 
> > > /* > > @@ -1514,6 +1520,16 @@ __osm_pr_rcv_respond( > > osm_log( p_rcv->p_log, OSM_LOG_ERROR, > > "__osm_pr_rcv_respond: ERR 1F14: " > > "Unable to allocate MAD.\n" ); > > + > > + for( i = 0; i < num_rec; i++ ) > > + { > > + p_pr_item = (osm_pr_item_t*)cl_qlist_remove_head( p_list ); > > + cl_qlock_pool_put( &p_rcv->pr_pool, &p_pr_item->pool_item ); > > + } > > + > > + osm_sa_send_error( p_rcv->p_resp, p_madw, > > + IB_SA_MAD_STATUS_NO_RESOURCES ); > > + > > osm_sa_send_error also attempts to get a MAD from the pool. Is there a > chance this succeeds after the one in this routine fails ? (Should this > be eliminated ?) > > > goto Exit; > > } > > > > @@ -1528,6 +1544,8 @@ __osm_pr_rcv_respond( > > p_resp_sa_mad->attr_offset = > > ib_get_attr_offset( sizeof(ib_path_rec_t) ); > > > > + p_resp_pr = (ib_path_rec_t*)ib_sa_mad_get_payload_ptr( p_resp_sa_mad ); > > + > > #ifndef VENDOR_RMPP_SUPPORT > > /* we support only one packet RMPP - so we will set the first and > > last flags for gettable */ > > @@ -1542,37 +1560,19 @@ __osm_pr_rcv_respond( > > p_resp_sa_mad->rmpp_flags = IB_RMPP_FLAG_ACTIVE; > > #endif > > > > - p_resp_pr = (ib_path_rec_t*)ib_sa_mad_get_payload_ptr( p_resp_sa_mad ); > > - > > - if ( num_rec == 0 ) > > - { > > - if (p_resp_sa_mad->method == IB_MAD_METHOD_GET_RESP) > > - p_resp_sa_mad->status = IB_SA_MAD_STATUS_NO_RECORDS; > > - cl_memclr( p_resp_pr, sizeof(*p_resp_pr) ); > > - } > > - else > > + for ( i = 0; i < pre_trim_num_rec; i++ ) > > { > > p_pr_item = (osm_pr_item_t*)cl_qlist_remove_head( p_list ); > > - > > - /* we need to track the number of copied items so we can > > - * stop the copy - but clear them all > > - */ > > - num_copied = 0; > > - > > - while( p_pr_item != (osm_pr_item_t*)cl_qlist_end( p_list ) ) > > - { > > - /* Copy the Path Records from the list into the MAD */ > > - if (num_copied < num_rec) > > - { > > + /* copy only if not trimmed */ > > + if (i < num_rec) > > *p_resp_pr = p_pr_item->path_rec; > > - num_copied++; > > - 
} > > + > > cl_qlock_pool_put( &p_rcv->pr_pool, &p_pr_item->pool_item ); > > p_resp_pr++; > > - p_pr_item = (osm_pr_item_t*)cl_qlist_remove_head( p_list ); > > - } > > } Should p_resp_pr only be incremented if i < num_recs ? Also, these comments apply to all the other SA record code as well. -- Hal > > + CL_ASSERT( cl_is_qlist_empty( p_list ) ); > > + > > status = osm_vendor_send( p_resp_madw->h_bind, p_resp_madw, FALSE ); > > > > if( status != IB_SUCCESS ) > > From robert.j.woodruff at intel.com Thu Nov 3 08:54:55 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Thu, 3 Nov 2005 08:54:55 -0800 Subject: [openib-general] RE: Problems with SDP on Itanium In-Reply-To: Message-ID: Woody wrote, >Yes, when I get some time, I will rebuild my kernel with debug >and re-run it. >woody Here are the dmesg logs when it hangs. woody Client side log: es. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240316> of <16384> bytes. ib_sdp DATA: <0> <1171> state <00001171> size <819939> pending <0> falgs <00000000> ib_sdp DATA: <0> <1171> read IOCB <-1> addr <200000000115bd20> users <1> flags <00000000> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aaae:0003ab3d> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aaaf:0003ab3d> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aab0:0003ab3d> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aab1:0003ab3d> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aab2:0003ab3d> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240317> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240318> of <16384> bytes. 
ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240319> of <16384> bytes. ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240320> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240321> of <16384> bytes. ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aab3:0003ab3d> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aab4:0003ab3d> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aab5:0003ab3d> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> state <00001171> size <738099> pending <49104> falgs <00000000> ib_sdp DATA: <0> <1171> read IOCB <-1> addr <200000000116fcd0> users <1> flags <00000000> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aab6:0003ab3d> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aab7:0003ab3e> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aab8:0003ab3e> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aab9:0003ab3e> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aaba:0003ab3e> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aabb:0003ab3e> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240322> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240323> of <16384> bytes. ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240324> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240325> of <16384> bytes. 
ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240326> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240327> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240328> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240329> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240330> of <16384> bytes. ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aabc:0003ab3e> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aabd:0003ab3e> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240331> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240332> of <16384> bytes. ib_sdp DATA: <0> <1171> state <00001171> size <558051> pending <0> falgs <00000000> ib_sdp DATA: <0> <1171> read IOCB <-1> addr <200000000119bc20> users <1> flags <00000000> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aabe:0003ab3f> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aabf:0003ab3f> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aac0:0003ab3f> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aac1:0003ab3f> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aac2:0003ab3f> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240333> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240334> of <16384> bytes. ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240335> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240336> of <16384> bytes. 
ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240337> of <16384> bytes. ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aac3:0003ab3f> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aac4:0003ab3f> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aac5:0003ab3f> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aac6:0003ab3f> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> state <00001171> size <476211> pending <65472> falgs <00000000> ib_sdp DATA: <0> <1171> read IOCB <-1> addr <20000000011afbd0> users <1> flags <00000000> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aac7:0003ab3f> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aac8:0003ab3f> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aac9:0003ab3f> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240338> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240339> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240340> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240341> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240342> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240343> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240344> of <16384> bytes. ib_sdp DATA: <0> <1171> state <00001171> size <361635> pending <0> falgs <00000000> ib_sdp DATA: <0> <1171> read IOCB <-1> addr <20000000011cbb60> users <1> flags <00000000> ib_sdp DATA: CQ event. 
hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aaca:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aacb:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aacc:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aacd:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240345> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240346> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240347> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240348> of <16384> bytes. ib_sdp DATA: <0> <1171> state <00001171> size <296163> pending <0> falgs <00000000> ib_sdp DATA: <0> <1171> read IOCB <-1> addr <20000000011dbb20> users <1> flags <00000000> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aace:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aacf:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aad0:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240349> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240350> of <16384> bytes. ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240351> of <16384> bytes. 
ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aad1:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aad2:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> state <00001171> size <247059> pending <32736> falgs <00000000> ib_sdp DATA: <0> <1171> read IOCB <-1> addr <20000000011e7af0> users <1> flags <00000000> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aad3:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aad4:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aad5:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aad6:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aad7:0003ab40> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240352> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240353> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240354> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240355> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240356> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240357> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240358> of <16384> bytes. ib_sdp DATA: <0> <1171> state <00001171> size <132483> pending <0> falgs <00000000> ib_sdp DATA: <0> <1171> read IOCB <-1> addr <2000000001203a80> users <1> flags <00000000> ib_sdp DATA: CQ event. 
hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aad8:0003ab41> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aad9:0003ab41> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aada:0003ab41> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240359> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240360> of <16384> bytes. ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240361> of <16384> bytes. ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aadb:0003ab41> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aadc:0003ab41> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aadd:0003ab41> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> state <00001171> size <83379> pending <49104> falgs <00000000> ib_sdp DATA: <0> <1171> read IOCB <-1> addr <200000000120fa50> users <1> flags <00000000> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240362> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240363> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240364> of <16384> bytes. ib_sdp DATA: <0> <1171> state <00001171> size <34275> pending <0> falgs <00000000> ib_sdp DATA: <0> <1171> read IOCB <-1> addr <200000000121ba20> users <1> flags <00000000> ib_sdp DATA: CQ event. 
hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aade:0003ab41> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00004000:0003aadf:0003ab41> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00000613:0003aae0:0003ab41> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <1539> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240365> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240366> of <16384> bytes. ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240367> of <16384> bytes. ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00000016:0003aae1:0003ab41> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <6> ib_sdp DATA: <0> <1171> send state <1171> size <6> flags <00000000> ib_sdp DATA: <0> <1171> write IOCB <-1> addr <60000ffffffbcaf0> user <1> flag <00000000> ib_sdp DATA: <0> <1171> state <00001171> size <6> pending <6> falgs <00000000> ib_sdp DATA: <0> <1171> read IOCB <-1> addr <60000ffffffbcb00> users <1> flags <00000000> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240368> of <16384> bytes. ib_sdp DATA: <0> <1171> send state <1171> size <4> flags <00000000> ib_sdp DATA: <0> <1171> write IOCB <-1> addr <60000ffffffbcb00> user <1> flag <00000000> ib_sdp DATA: <0> <1171> send state <1171> size <6> flags <00000000> ib_sdp DATA: <0> <1171> write IOCB <-1> addr <60000ffffffbcaf0> user <1> flag <00000000> ib_sdp DATA: <0> <1171> state <00001171> size <6> pending <0> falgs <00000000> ib_sdp DATA: <0> <1171> read IOCB <-1> addr <60000ffffffbcb00> users <1> flags <00000000> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <000f:00:ff:00000016:0003aae2:0003ab45> ib_sdp DATA: <0> <1171> RECV BUFF, bytes <6> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240369> of <16384> bytes. 
ib_sdp DATA: <0> <1171> send state <1171> size <2097149> flags <00000000> ib_sdp DATA: <0> <1171> write IOCB <-1> addr <20000000016c4000> user <1> flag <00000000> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:10000000:3bab0300:8caa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:10000000:3cab0300:9aaa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:10000000:3dab0300:a8aa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:10000000:3eab0300:b2aa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:10000000:3fab0300:bbaa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:10000000:40ab0300:c9aa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:10000000:41ab0300:d7aa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:10000000:42ab0300:e0aa0300> ib_sdp DATA: <0> <1171> SENT BSDH <0f00:00:ff:16000000:43ab0300:e1aa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:14000000:44ab0300:e1aa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:16000000:45ab0300:e1aa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:00400000:46ab0300:e2aa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:00400000:47ab0300:e2aa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:00400000:48ab0300:e2aa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:00400000:49ab0300:e2aa0300> ib_sdp DATA: <0> <1171> SENT BSDH <1000:00:ff:00400000:4aab0300:e2aa0300> ib_sdp DATA: CQ event. hashent <0> ib_sdp DATA: <0> <1171> RECV BSDH <0010:00:ff:00000010:0003aae3:0003ab4e> ib_sdp DATA: <0> <1171> POST RECV BUFF wrid <240370> of <16384> bytes. ib_sdp CRTL: info delete <192.168.0.21> <4295552652:4295242151> Server side log: DATA: <2> <1171> read IOCB <-1> addr <20000000011f7d70> users <1> flags <00000000> ib_sdp DATA: CQ event. 
hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab34:0003aa7e> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab35:0003aa7e> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab36:0003aa7e> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab37:0003aa7e> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab38:0003aa7e> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab39:0003aa7e> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00000313:0003ab3a:0003aa7e> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <771> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240450> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240451> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240452> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240453> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240454> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240455> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240456> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240457> of <16384> bytes. ib_sdp DATA: <2> <1171> send state <1171> size <1572867> flags <00000000> ib_sdp DATA: <2> <1171> write IOCB <-1> addr <2000000001094000> user <1> flag <00000000> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00000010:0003ab3b:0003aa8c> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240458> of <16384> bytes. ib_sdp DATA: CQ event. 
hashent <2> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:10000000:7eaa0300:30ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:10000000:7faa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:80aa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:81aa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:82aa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:83aa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:84aa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:85aa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:86aa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:87aa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:88aa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:89aa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:8aaa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:8baa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:8caa0300:3aab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:8daa0300:3bab0300> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00000010:0003ab3c:0003aa9a> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240459> of <16384> bytes. ib_sdp DATA: CQ event. 
hashent <2> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:8eaa0300:3bab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:8faa0300:3bab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:90aa0300:3bab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:91aa0300:3bab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:92aa0300:3bab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:93aa0300:3bab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:94aa0300:3bab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:95aa0300:3bab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:96aa0300:3bab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:97aa0300:3bab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:98aa0300:3bab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:99aa0300:3bab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:9aaa0300:3bab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:9baa0300:3cab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:9caa0300:3cab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:9daa0300:3cab0300> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00000010:0003ab3d:0003aaa8> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240460> of <16384> bytes. ib_sdp DATA: CQ event. 
hashent <2> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:9eaa0300:3cab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:9faa0300:3cab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:a0aa0300:3cab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:a1aa0300:3cab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:a2aa0300:3cab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:a3aa0300:3cab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:a4aa0300:3cab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:a5aa0300:3cab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:a6aa0300:3cab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:a7aa0300:3cab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:a8aa0300:3cab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:a9aa0300:3dab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:aaaa0300:3dab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:abaa0300:3dab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:acaa0300:3dab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:adaa0300:3dab0300> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00000010:0003ab3e:0003aab2> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240461> of <16384> bytes. ib_sdp DATA: CQ event. 
hashent <2> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:aeaa0300:3dab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:afaa0300:3dab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:b0aa0300:3dab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:b1aa0300:3dab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:b2aa0300:3dab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:b3aa0300:3dab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:b4aa0300:3dab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:b5aa0300:3dab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:b6aa0300:3dab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:b7aa0300:3eab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:b8aa0300:3eab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:b9aa0300:3eab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:baaa0300:3eab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:bbaa0300:3eab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:bcaa0300:3eab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:bdaa0300:3eab0300> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00000010:0003ab3f:0003aabb> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240462> of <16384> bytes. ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00000010:0003ab40:0003aac9> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240463> of <16384> bytes. ib_sdp DATA: CQ event. 
hashent <2> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:beaa0300:3fab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:bfaa0300:3fab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:c0aa0300:3fab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:c1aa0300:3fab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:c2aa0300:3fab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:c3aa0300:3fab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:c4aa0300:3fab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:c5aa0300:3fab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:c6aa0300:3fab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:c7aa0300:3fab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:c8aa0300:3fab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:c9aa0300:3fab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:caaa0300:40ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:cbaa0300:40ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:ccaa0300:40ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:cdaa0300:40ab0300> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00000010:0003ab41:0003aad7> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240464> of <16384> bytes. ib_sdp DATA: CQ event. 
hashent <2> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:ceaa0300:40ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:cfaa0300:40ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:d0aa0300:40ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:d1aa0300:40ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:d2aa0300:40ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:d3aa0300:40ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:d4aa0300:40ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:d5aa0300:40ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:d6aa0300:40ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:d7aa0300:40ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:d8aa0300:41ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:d9aa0300:41ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:daaa0300:41ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:dbaa0300:41ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:dcaa0300:41ab0300> ib_sdp DATA: <2> <1171> SENT BSDH <1000:00:ff:00400000:ddaa0300:41ab0300> ib_sdp DATA: <2> <1171> send state <1171> size <6> flags <00000000> ib_sdp DATA: <2> <1171> write IOCB <-1> addr <60000ffffffbcc40> user <1> flag <00000000> ib_sdp DATA: <2> <1171> state <00001171> size <6> pending <0> falgs <00000000> ib_sdp DATA: <2> <1171> read IOCB <-1> addr <60000ffffffbcc50> users <1> flags <00000000> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00000010:0003ab42:0003aae0> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240465> of <16384> bytes. ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <000f:00:ff:00000016:0003ab43:0003aae1> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <6> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240466> of <16384> bytes. 
ib_sdp DATA: <2> <1171> state <00001171> size <4> pending <0> falgs <00000000> ib_sdp DATA: <2> <1171> read IOCB <-1> addr <60000ffffffbcc50> users <1> flags <00000000> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00000014:0003ab44:0003aae1> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <4> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240467> of <16384> bytes. ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00000016:0003ab45:0003aae1> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <6> ib_sdp DATA: <2> <1171> send state <1171> size <6> flags <00000000> ib_sdp DATA: <2> <1171> write IOCB <-1> addr <60000ffffffbcc40> user <1> flag <00000000> ib_sdp DATA: <2> <1171> state <00001171> size <6> pending <6> falgs <00000000> ib_sdp DATA: <2> <1171> read IOCB <-1> addr <60000ffffffbcc50> users <1> flags <00000000> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240468> of <16384> bytes. ib_sdp DATA: <2> <1171> state <00001171> size <2097149> pending <0> falgs <00000000> ib_sdp DATA: <2> <1171> read IOCB <-1> addr <20000000016b4000> users <1> flags <00000000> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab46:0003aae2> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab47:0003aae2> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab48:0003aae2> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab49:0003aae2> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab4a:0003aae2> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. 
hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab4b:0003aae2> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab4c:0003aae2> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab4d:0003aae2> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab4e:0003aae2> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. hashent <2> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240469> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240470> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240471> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240472> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240473> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240474> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240475> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240476> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240477> of <16384> bytes. ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab4f:0003aae2> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab50:0003aae2> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab51:0003aae2> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab52:0003aae2> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: CQ event. 
hashent <2> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab53:0003aae3> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab54:0003aae3> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> RECV BSDH <0010:00:ff:00004000:0003ab55:0003aae3> ib_sdp DATA: <2> <1171> RECV BUFF, bytes <16368> ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240478> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240479> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240480> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240481> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240482> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240483> of <16384> bytes. ib_sdp DATA: <2> <1171> POST RECV BUFF wrid <240484> of <16384> bytes. ib_sdp DATA: <2> <1171> state <00001171> size <1835261> pending <0> falgs <00000000> ib_sdp DATA: <2> <1171> read IOCB <-1> addr <20000000016f3f00> users <1> flags <00000000>

From robert.j.woodruff at intel.com Thu Nov 3 09:06:44 2005
From: robert.j.woodruff at intel.com (Bob Woodruff)
Date: Thu, 3 Nov 2005 09:06:44 -0800
Subject: [openib-general] RE: Problems with SDP on Itanium
In-Reply-To:
Message-ID:

Woody wrote,
>Here are the dmesg logs when it hangs.

This might have something to do with it.

>ib_sdp CRTL: info delete <192.168.0.21> <4295552652:4295242151>

From the code, it looks like this message gets printed when SDP calls sdp_path_info_destroy(). The question is, why is it tearing down the connection?
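For reference, the two numbers in the "info delete" message are the current jiffies value and the entry's last-use stamp, in that order, going by the sdp_dbg_ctrl() format string in the sweep routine quoted next. A minimal sketch of the arithmetic follows. It is plain user-space C, not driver code; the helper names and the hz parameter are made up for illustration, and neither HZ nor SDP_LINK_INFO_TIMEOUT for this build is given in the thread.

```c
#include <stdint.h>

/*
 * Hypothetical helper: ticks elapsed since the path entry was last used.
 * Unsigned subtraction stays correct even if the tick counter has wrapped.
 */
static uint64_t path_info_age_ticks(uint64_t now, uint64_t last_use)
{
	return now - last_use;
}

/*
 * Hypothetical helper: the same age in whole seconds for an assumed tick
 * rate. The real HZ of the machine in the thread is not known.
 */
static uint64_t path_info_age_secs(uint64_t now, uint64_t last_use,
				   uint64_t hz)
{
	return path_info_age_ticks(now, last_use) / hz;
}

/*
 * Hypothetical helper: expiry test expressed on the age rather than on
 * absolute tick values, so a counter wrap does not flip the result.
 */
static int path_info_expired(uint64_t now, uint64_t last_use,
			     uint64_t timeout)
{
	return path_info_age_ticks(now, last_use) > timeout;
}
```

With the values from the log line, path_info_age_ticks(4295552652, 4295242151) is 310501 ticks, so the entry had been unused for that long when the sweep removed it. How many seconds that is depends on HZ. In the kernel itself, time_after() from linux/jiffies.h is the usual way to write this kind of comparison.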
/*
 * sdp_link_sweep - periodic path information cleanup function
 */
static void sdp_link_sweep(void *data)
{
	struct sdp_path_info *info;
	struct sdp_path_info *sweep;

	sweep = info_list;
	while (sweep) {
		info = sweep;
		sweep = sweep->next;

		if (jiffies > (info->use + SDP_LINK_INFO_TIMEOUT)) {
			sdp_dbg_ctrl(NULL,
				     "info delete <%d.%d.%d.%d> <%lu:%lu>",
				     info->dst & 0x000000ff,
				     (info->dst & 0x0000ff00) >> 8,
				     (info->dst & 0x00ff0000) >> 16,
				     (info->dst & 0xff000000) >> 24,
				     jiffies, info->use);

			sdp_path_info_destroy(info, -ETIMEDOUT);
		}
	}

From hch at lst.de Thu Nov 3 09:12:10 2005
From: hch at lst.de (Christoph Hellwig)
Date: Thu, 3 Nov 2005 18:12:10 +0100
Subject: [openib-general] compilation platform dependencies
In-Reply-To: <20051103165252.GA32699@esmail.cup.hp.com>
References: <527jbsfdii.fsf@cisco.com> <20051101191905.GE6815@esmail.cup.hp.com> <52pspkdt7z.fsf@cisco.com> <20051103165252.GA32699@esmail.cup.hp.com>
Message-ID: <20051103171210.GA12783@lst.de>

On Thu, Nov 03, 2005 at 08:52:52AM -0800, Grant Grundler wrote:
> > I'm not sure if __u64 driver_data[0]; forces alignment to an 8-byte
> > boundary on i386... does it?
>
> I'm now convinced it doesn't on x86.
> See output below.

Yes, alignment rules for x86 are different from every other architecture in that respect. It causes a lot of problems with ioctl translations for x86 binaries on ia64/x86_64.

For the private data I'd suggest you copy the network driver layer approach, see alloc_netdev and netdev_priv for details.

From johnip at sgi.com Thu Nov 3 09:39:03 2005
From: johnip at sgi.com (John Partridge)
Date: Thu, 03 Nov 2005 11:39:03 -0600
Subject: [openib-general] mvapich-gen2 on 2 x 16 CPU SGI Altix 1330 cluster
Message-ID: <436A4B37.1080801@sgi.com>

Hi DK,

I just thought you would like to know that I have now tested the Pallas benchmark on a two node SGI Altix 1330 cluster using OpenIB and mvapich-gen2. Each node had 16 CPUs.
To do this I had to change SMPI_MAX_NUMLOCALNODES to be defined as 16 instead of the normal 4 for the test. I ran a 2x16 (32 total) CPU Pallas benchmark several times with no hangs or errors.

I'm wondering if there would be any more changes I would need to make for scaling to much larger systems. I do plan at some point in the near future to test this on a much larger system with a LOT more CPUs.

The test was conducted using a "kernel.org" 2.6.14 kernel and an OpenIB svn gen2 release of 3926 using Voltaire HCAs and switch.

We will be demonstrating OpenIB and mvapich-gen2 MPI at Supercomputing 05 (running smaller jobs, though, because the 32-way jobs take so long to complete). We will also demo rdma_lat, rdma_bw and IPoIB.

I can send you the Pallas results if you are interested.

Regards
John

--
John Partridge

Silicon Graphics Inc
Tel: 651-683-3428
Vnet: 233-3428
E-Mail: johnip at sgi.com

From mshefty at ichips.intel.com Thu Nov 3 09:44:18 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 03 Nov 2005 09:44:18 -0800
Subject: [openib-general] netstat
In-Reply-To:
References:
Message-ID: <436A4C71.3020007@ichips.intel.com>

yipee wrote:
> Is there some way to view the list of current CM end points in their various
> states (listen,connection)?

Nothing like this is available today. I can record this as something to add in the future, but it's unlikely to be a high priority for at least a few weeks.
- Sean

From ardavis at ichips.intel.com Thu Nov 3 09:51:14 2005
From: ardavis at ichips.intel.com (Arlin Davis)
Date: Thu, 03 Nov 2005 09:51:14 -0800
Subject: [openib-general] [PATCH] Re: uDAPL again
In-Reply-To: <436938D5.6030403@ichips.intel.com>
References: <436906F0.3050803@cs.rutgers.edu> <43691B71.2040500@ichips.intel.com> <43692526.3030003@cs.rutgers.edu> <436938D5.6030403@ichips.intel.com>
Message-ID: <436A4E12.1060005@ichips.intel.com>

Arlin Davis wrote:

> Aniruddha Bohra wrote:
>
>> I am not sure, but aren't uCM and uAT simply for connection
>> establishment?
>>
> Yes, but they also set up many of the transfer attributes of the
> connected QP. The uCM/uAT version uses path_records from the SA query
> but the socket_CM version just builds them by hand similar to the way
> ibv_rc_pingpong does. You would have to look at the
> pathrecord->pktlifetime to see the actual timeout value being used.
>

Ok, I added some debug and the path record returned from uAT looks suspect. Here are the results from tuAT and opensm running on my cluster. Path record pktlife is 0 (uCM adds 1) so the ACK timeout value for this connection will be very short.

path_comp_handler: ctxt 0x525fa0, req_id 90 rec_num 1
path_comp_handler: SRC GID subnet fe80000000000000 id 0002c9020000409d
path_comp_handler: DST GID subnet fe80000000000000 id 0002c90200004071
path_comp_handler: slid 5 dlid 2 mtu 120203(2) pktlife 0(0) <<< ???
path_comp_handler: hops 0 npaths 0 pkey ffff tclass 0 rate 0(0) <<< ???

Hal, can you take a look at uAT and see if the copy to user space is working correctly?

Aniruddha, can you apply the following patch and send us the output from your run?
-arlin

Signed-off-by: Arlin Davis

Index: dapl/openib/dapl_ib_cm.c
===================================================================
--- dapl/openib/dapl_ib_cm.c	(revision 3951)
+++ dapl/openib/dapl_ib_cm.c	(working copy)
@@ -136,14 +136,27 @@
 
 	dapl_dbg_log(DAPL_DBG_TYPE_CM,
 		     " path_comp_handler: SRC GID subnet %016llx id %016llx\n",
-		     (unsigned long long)cpu_to_be64(conn->dapl_rt.sgid.global.subnet_prefix),
-		     (unsigned long long)cpu_to_be64(conn->dapl_rt.sgid.global.interface_id) );
+		     (unsigned long long)cpu_to_be64(conn->dapl_path.sgid.global.subnet_prefix),
+		     (unsigned long long)cpu_to_be64(conn->dapl_path.sgid.global.interface_id) );
 
 	dapl_dbg_log(DAPL_DBG_TYPE_CM,
 		     " path_comp_handler: DST GID subnet %016llx id %016llx\n",
-		     (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.subnet_prefix),
-		     (unsigned long long)cpu_to_be64(conn->dapl_rt.dgid.global.interface_id) );
+		     (unsigned long long)cpu_to_be64(conn->dapl_path.dgid.global.subnet_prefix),
+		     (unsigned long long)cpu_to_be64(conn->dapl_path.dgid.global.interface_id) );
 
+	dapl_dbg_log(DAPL_DBG_TYPE_CM,
+		     " path_comp_handler: slid %x dlid %x mtu %x(%x) pktlife %x(%x)\n",
+		     ntohs(conn->dapl_path.slid), ntohs(conn->dapl_path.dlid),
+		     conn->dapl_path.mtu, conn->dapl_path.mtu_selector,
+		     conn->dapl_path.packet_life_time,
+		     conn->dapl_path.packet_life_time_selector );
+
+	dapl_dbg_log(DAPL_DBG_TYPE_CM,
+		     " path_comp_handler: hops %x npaths %x pkey %x tclass %x rate %x(%x)\n",
+		     conn->dapl_path.hop_limit, conn->dapl_path.numb_path,
+		     conn->dapl_path.pkey, conn->dapl_path.traffic_class,
+		     conn->dapl_path.rate, conn->dapl_path.rate_selector);
+
 	if (rec_num <= 0) {
 		dapl_dbg_log(DAPL_DBG_TYPE_CM,
 			     " path_comp_handler: ERR %d retry %d\n",

From halr at voltaire.com Thu Nov 3 09:57:46 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 03 Nov 2005 12:57:46 -0500
Subject: [openib-general] [PATCH] Re: uDAPL again
In-Reply-To: <436A4E12.1060005@ichips.intel.com>
References: <436906F0.3050803@cs.rutgers.edu>
<43691B71.2040500@ichips.intel.com> <43692526.3030003@cs.rutgers.edu> <436938D5.6030403@ichips.intel.com> <436A4E12.1060005@ichips.intel.com> Message-ID: <1131040666.4340.12.camel@hal.voltaire.com> Hi Arlin, On Thu, 2005-11-03 at 12:51, Arlin Davis wrote: > Arlin Davis wrote: > > > Aniruddha Bohra wrote: > > > >> I am not sure, but arent uCM and uAT simply for connection > >> establishment? > >> > > Yes, but they also set up many of the transfer attributes of the > > connected QP. The uCM/uAT version uses path_records from the SA query > > but the socket_CM version just builds them by hand similiar to the way > > ibv_rc_pingpong does. You would have to look at the > > pathrecord->pktlifetime to see the actual timeout value being used. > > > Ok, I added some debug and it looks like the path record returned from > uAT looks suspect. Here are the results from tuAT and opensm running on > my cluster. Path record pktlife is 0 (uCM adds 1) so the ACK timeout > value for this connection will be very short. > > path_comp_handler: ctxt 0x525fa0, req_id 90 rec_num 1 > path_comp_handler: SRC GID subnet fe80000000000000 id 0002c9020000409d > path_comp_handler: DST GID subnet fe80000000000000 id 0002c90200004071 > path_comp_handler: slid 5 dlid 2 mtu 120203(2) pktlife > 0(0) <<< ??? > path_comp_handler: hops 0 npaths 0 pkey ffff tclass 0 rate > 0(0) <<< ??? > > Hal, can you take a look at uAT and see if the copy to user space is > working correctly. Just want to clarify what I should be looking for: So you suspect pktlife and rate being bad (and the rest of the SA PR look OK) ? Is OpenSM being used in Aniruddha's subnet ? 
-- Hal From sean.hefty at intel.com Thu Nov 3 10:01:12 2005 From: sean.hefty at intel.com (Sean Hefty) Date: Thu, 3 Nov 2005 10:01:12 -0800 Subject: [openib-general] [RFC] patch to export userspace to kernel QP attribute structure Message-ID: The following patch would expose support in uverbs to define a QP attribute structure that could be used by other kernel modules (e.g. the IB CM and CMA) needing to exchange QP attribute information with a userspace library. I've only compile tested this patch, since I'm only looking for feedback on whether this approach is acceptable. Similar functionality would be added to libibverbs. As a side note, uverbs defines two structures that are almost the same: ib_uverbs_qp_dest and ib_uverbs_ah_attr. The only difference is that the reserved fields in each are in different locations, so eliminating one of the structures would result in an abi change. - Sean Index: core/uverbs_cmd.c =================================================================== --- core/uverbs_cmd.c (revision 3947) +++ core/uverbs_cmd.c (working copy) @@ -808,6 +808,75 @@ return ret ? ret : in_len; } +static void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst, + struct ib_ah_attr *src) +{ + memcpy(dst->grh.dgid, src->grh.dgid.raw, sizeof dst->grh.dgid); + dst->grh.flow_label = src->grh.flow_label; + dst->grh.sgid_index = src->grh.sgid_index; + dst->grh.hop_limit = src->grh.hop_limit; + dst->grh.traffic_class = src->grh.traffic_class; + dst->dlid = src->dlid; + dst->sl = src->sl; + dst->src_path_bits = src->src_path_bits; + dst->static_rate = src->static_rate; + dst->is_global = src->ah_flags & IB_AH_GRH ? 
1 : 0; + dst->port_num = src->port_num; +} + +static void ib_copy_ah_attr_from_user(struct ib_ah_attr *dst, + struct ib_uverbs_ah_attr *src) +{ + memcpy(dst->grh.dgid.raw, src->grh.dgid, sizeof dst->grh.dgid); + dst->grh.flow_label = src->grh.flow_label; + dst->grh.sgid_index = src->grh.sgid_index; + dst->grh.hop_limit = src->grh.hop_limit; + dst->grh.traffic_class = src->grh.traffic_class; + dst->dlid = src->dlid; + dst->sl = src->sl; + dst->src_path_bits = src->src_path_bits; + dst->static_rate = src->static_rate; + dst->ah_flags = src->is_global ? IB_AH_GRH : 0; + dst->port_num = src->port_num; +} + +void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst, + struct ib_qp_attr *src) +{ + dst->cur_qp_state = src->cur_qp_state; + dst->path_mtu = src->path_mtu; + dst->path_mig_state = src->path_mig_state; + dst->qkey = src->qkey; + dst->rq_psn = src->rq_psn; + dst->sq_psn = src->sq_psn; + dst->dest_qp_num = src->dest_qp_num; + dst->qp_access_flags = src->qp_access_flags; + + dst->max_send_wr = src->cap.max_send_wr; + dst->max_recv_wr = src->cap.max_recv_wr; + dst->max_send_sge = src->cap.max_send_sge; + dst->max_recv_sge = src->cap.max_recv_sge; + dst->max_inline_data = src->cap.max_inline_data; + + ib_copy_ah_attr_to_user(&dst->ah_attr, &src->ah_attr); + ib_copy_ah_attr_to_user(&dst->alt_ah_attr, &src->alt_ah_attr); + + dst->pkey_index = src->pkey_index; + dst->alt_pkey_index = src->alt_pkey_index; + dst->en_sqd_async_notify = src->en_sqd_async_notify; + dst->sq_draining = src->sq_draining; + dst->max_rd_atomic = src->max_rd_atomic; + dst->max_dest_rd_atomic = src->max_dest_rd_atomic; + dst->min_rnr_timer = src->min_rnr_timer; + dst->port_num = src->port_num; + dst->timeout = src->timeout; + dst->retry_cnt = src->retry_cnt; + dst->rnr_retry = src->rnr_retry; + dst->alt_port_num = src->alt_port_num; + dst->alt_timeout = src->alt_timeout; +} +EXPORT_SYMBOL(ib_copy_qp_attr_to_user); + ssize_t ib_uverbs_create_qp(struct ib_uverbs_file *file, const char __user *buf, 
int in_len, int out_len) @@ -1433,16 +1502,7 @@ uobj->user_handle = cmd.user_handle; uobj->context = file->ucontext; - attr.dlid = cmd.attr.dlid; - attr.sl = cmd.attr.sl; - attr.src_path_bits = cmd.attr.src_path_bits; - attr.static_rate = cmd.attr.static_rate; - attr.port_num = cmd.attr.port_num; - attr.grh.flow_label = cmd.attr.grh.flow_label; - attr.grh.sgid_index = cmd.attr.grh.sgid_index; - attr.grh.hop_limit = cmd.attr.grh.hop_limit; - attr.grh.traffic_class = cmd.attr.grh.traffic_class; - memcpy(attr.grh.dgid.raw, cmd.attr.grh.dgid, 16); + ib_copy_ah_attr_from_user(&attr, &cmd.attr); ah = ib_create_ah(pd, &attr); if (IS_ERR(ah)) { Index: include/rdma/ib_user_verbs.h =================================================================== --- include/rdma/ib_user_verbs.h (revision 3947) +++ include/rdma/ib_user_verbs.h (working copy) @@ -38,6 +38,7 @@ #define IB_USER_VERBS_H #include +#include /* * Increment this value if any changes that break userspace ABI @@ -311,6 +312,64 @@ __u32 async_events_reported; }; +struct ib_uverbs_global_route { + __u8 dgid[16]; + __u32 flow_label; + __u8 sgid_index; + __u8 hop_limit; + __u8 traffic_class; + __u8 reserved; +}; + +struct ib_uverbs_ah_attr { + struct ib_uverbs_global_route grh; + __u16 dlid; + __u8 sl; + __u8 src_path_bits; + __u8 static_rate; + __u8 is_global; + __u8 port_num; + __u8 reserved; +}; + +struct ib_uverbs_qp_attr { + __u32 qp_attr_mask; + __u32 qp_state; + __u32 cur_qp_state; + __u32 path_mtu; + __u32 path_mig_state; + __u32 qkey; + __u32 rq_psn; + __u32 sq_psn; + __u32 dest_qp_num; + __u32 qp_access_flags; + + struct ib_uverbs_ah_attr ah_attr; + struct ib_uverbs_ah_attr alt_ah_attr; + + /* ib_qp_cap */ + __u32 max_send_wr; + __u32 max_recv_wr; + __u32 max_send_sge; + __u32 max_recv_sge; + __u32 max_inline_data; + + __u16 pkey_index; + __u16 alt_pkey_index; + __u8 en_sqd_async_notify; + __u8 sq_draining; + __u8 max_rd_atomic; + __u8 max_dest_rd_atomic; + __u8 min_rnr_timer; + __u8 port_num; + __u8 timeout; 
+ __u8 retry_cnt; + __u8 rnr_retry; + __u8 alt_port_num; + __u8 alt_timeout; + __u8 reserved[5]; +}; + struct ib_uverbs_create_qp { __u64 response; __u64 user_handle; @@ -482,26 +541,6 @@ __u32 bad_wr; }; -struct ib_uverbs_global_route { - __u8 dgid[16]; - __u32 flow_label; - __u8 sgid_index; - __u8 hop_limit; - __u8 traffic_class; - __u8 reserved; -}; - -struct ib_uverbs_ah_attr { - struct ib_uverbs_global_route grh; - __u16 dlid; - __u8 sl; - __u8 src_path_bits; - __u8 static_rate; - __u8 is_global; - __u8 port_num; - __u8 reserved; -}; - struct ib_uverbs_create_ah { __u64 response; __u64 user_handle; @@ -568,4 +607,7 @@ __u32 events_reported; }; +void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst, + struct ib_qp_attr *src); + #endif /* IB_USER_VERBS_H */ From rolandd at cisco.com Thu Nov 3 10:12:21 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 10:12:21 -0800 Subject: [openib-general] Re: [RFC] patch to export userspace to kernel QP attribute structure In-Reply-To: (Sean Hefty's message of "Thu, 3 Nov 2005 10:01:12 -0800") References: Message-ID: <524q6t61oa.fsf@cisco.com> Seems OK but maybe we should create a new file (uverbs_marshall.c?) rather than dumping more stuff into uverbs_cmd.c. That file is big enough as it is. Also: > --- include/rdma/ib_user_verbs.h (revision 3947) > +++ include/rdma/ib_user_verbs.h (working copy) > @@ -38,6 +38,7 @@ > #define IB_USER_VERBS_H > > #include > +#include [snip] > +void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst, > + struct ib_qp_attr *src); I very carefully made ib_user_verbs.h a file that did not include any kernel internals and could be safely included from userspace. So this needs to go in a different file (probably just ib_verbs.h is fine). - R. 
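An illustrative aside on the thread above: the RFC patch copies fields one by one between the in-kernel ib_ah_attr and the fixed-layout ib_uverbs_ah_attr. The sketch below mimics that pattern in plain userspace C so it can be compiled on its own. Everything prefixed with demo_ or DEMO_ is a stand-in invented for this example (including the flag value), not the kernel definition; only the field names follow the patch. The compile-time size check is there to show why the user-visible struct carries explicit reserved bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's IB_AH_GRH flag value. */
#define DEMO_AH_GRH 1

/* Stand-ins for the kernel-internal structs; their layout is free to change. */
struct demo_global_route {
	uint8_t  dgid[16];
	uint32_t flow_label;
	uint8_t  sgid_index;
	uint8_t  hop_limit;
	uint8_t  traffic_class;
};

struct demo_ah_attr {
	struct demo_global_route grh;
	uint16_t dlid;
	uint8_t  sl, src_path_bits, static_rate, ah_flags, port_num;
};

/* Stand-ins for the user ABI structs: fixed-width fields, explicit padding. */
struct demo_uverbs_global_route {
	uint8_t  dgid[16];
	uint32_t flow_label;
	uint8_t  sgid_index;
	uint8_t  hop_limit;
	uint8_t  traffic_class;
	uint8_t  reserved;
};

struct demo_uverbs_ah_attr {
	struct demo_uverbs_global_route grh;
	uint16_t dlid;
	uint8_t  sl, src_path_bits, static_rate, is_global, port_num;
	uint8_t  reserved;
};

/* The ABI struct has no implicit padding, so its size is the same everywhere. */
_Static_assert(sizeof(struct demo_uverbs_ah_attr) == 32,
	       "user ABI struct must keep a fixed size");

void demo_copy_ah_attr_to_user(struct demo_uverbs_ah_attr *dst,
			       const struct demo_ah_attr *src)
{
	memset(dst, 0, sizeof *dst);	/* also clears the reserved bytes */
	memcpy(dst->grh.dgid, src->grh.dgid, sizeof dst->grh.dgid);
	dst->grh.flow_label    = src->grh.flow_label;
	dst->grh.sgid_index    = src->grh.sgid_index;
	dst->grh.hop_limit     = src->grh.hop_limit;
	dst->grh.traffic_class = src->grh.traffic_class;
	dst->dlid          = src->dlid;
	dst->sl            = src->sl;
	dst->src_path_bits = src->src_path_bits;
	dst->static_rate   = src->static_rate;
	dst->is_global     = (src->ah_flags & DEMO_AH_GRH) ? 1 : 0;
	dst->port_num      = src->port_num;
}

void demo_copy_ah_attr_from_user(struct demo_ah_attr *dst,
				 const struct demo_uverbs_ah_attr *src)
{
	memset(dst, 0, sizeof *dst);
	memcpy(dst->grh.dgid, src->grh.dgid, sizeof dst->grh.dgid);
	dst->grh.flow_label    = src->grh.flow_label;
	dst->grh.sgid_index    = src->grh.sgid_index;
	dst->grh.hop_limit     = src->grh.hop_limit;
	dst->grh.traffic_class = src->grh.traffic_class;
	dst->dlid          = src->dlid;
	dst->sl            = src->sl;
	dst->src_path_bits = src->src_path_bits;
	dst->static_rate   = src->static_rate;
	dst->ah_flags      = src->is_global ? DEMO_AH_GRH : 0;
	dst->port_num      = src->port_num;
}

/* Round-trips one attribute through the ABI struct; returns 1 on a match. */
int demo_round_trip_ok(void)
{
	struct demo_ah_attr in, out;
	struct demo_uverbs_ah_attr wire;

	memset(&in, 0, sizeof in);
	memset(in.grh.dgid, 0xab, sizeof in.grh.dgid);
	in.grh.flow_label = 7;
	in.grh.hop_limit = 64;
	in.dlid = 2;
	in.sl = 1;
	in.ah_flags = DEMO_AH_GRH;
	in.port_num = 1;

	demo_copy_ah_attr_to_user(&wire, &in);
	demo_copy_ah_attr_from_user(&out, &wire);

	return wire.is_global == 1 &&
	       memcmp(in.grh.dgid, out.grh.dgid, sizeof in.grh.dgid) == 0 &&
	       in.grh.flow_label == out.grh.flow_label &&
	       in.grh.hop_limit == out.grh.hop_limit &&
	       in.dlid == out.dlid && in.sl == out.sl &&
	       in.ah_flags == out.ah_flags && in.port_num == out.port_num;
}
```

Zeroing dst before copying is an addition in this sketch rather than something the posted patch does; it keeps the reserved bytes from carrying stale data across the boundary. Keeping the copy helpers apart from the struct definitions also matches the point made in the reply: the ABI header can stay free of kernel-internal types.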
From rpandit at silverstorm.com Thu Nov 3 10:15:57 2005 From: rpandit at silverstorm.com (Pandit, Ranjit) Date: Thu, 3 Nov 2005 13:15:57 -0500 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB Message-ID: <5D78D28F88822E4D8702BB9EEF1A4367C2DD62@mercury.infiniconsys.com> I'm hoping to start the port soon. It should have started earlier, but unfortunately I got side tracked by some unforeseen issues. Ranjit > -----Original Message----- > From: Roland Dreier [mailto:rolandd at cisco.com] > Sent: Wednesday, November 02, 2005 3:28 PM > To: Pandit, Ranjit; openib-general at openib.org > Subject: Re: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram > Sockets) to OpenIB > > What are your plans for porting the RDS code so that it works with the > upstream Linux IB stack? I've only seen a couple of checkins, and the > code that you've dropped so far doesn't look usable and needs a lot of > cleanup. There's not even a Makefile there. > > Someone uncharitable might believe that the whole purpose of this > exercise was just to be able to issue your press release > (http://silverstorm.com/news/rel/092005.asp). > > - R. From mshefty at ichips.intel.com Thu Nov 3 10:29:20 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 03 Nov 2005 10:29:20 -0800 Subject: [openib-general] Re: [RFC] patch to export userspace to kernel QP attribute structure In-Reply-To: <524q6t61oa.fsf@cisco.com> References: <524q6t61oa.fsf@cisco.com> Message-ID: <436A5700.8090102@ichips.intel.com> Roland Dreier wrote: > Seems OK but maybe we should create a new file (uverbs_marshall.c?) Sounds fine, but see below... > I very carefully made ib_user_verbs.h a file that did not include any > kernel internals and could be safely included from userspace. So this > needs to go in a different file (probably just ib_verbs.h is fine). I have no problem moving this. 
If the declaration goes in ib_verbs.h, should the function just go in verbs.c (or a new file) and be included as part of ib_core? It just seems a little off to me to implement functions declared in a single header file in separate modules. Also, I'll need to do this with a path record as well. I can either include the routine as part of ib_sa, add a new module, ib_user_sa, or drop that call into one of the existing ib_user_* modules. - Sean From bohra at cs.rutgers.edu Thu Nov 3 10:38:15 2005 From: bohra at cs.rutgers.edu (Aniruddha Bohra) Date: Thu, 03 Nov 2005 13:38:15 -0500 Subject: [openib-general] [PATCH] Re: uDAPL again In-Reply-To: <436A4E12.1060005@ichips.intel.com> References: <436906F0.3050803@cs.rutgers.edu> <43691B71.2040500@ichips.intel.com> <43692526.3030003@cs.rutgers.edu> <436938D5.6030403@ichips.intel.com> <436A4E12.1060005@ichips.intel.com> Message-ID: <436A5917.1080306@cs.rutgers.edu> Aniruddha, can you apply the following patch and send us the output from your run? Hi Arlin The log is at http://www.cs.rutgers.edu/~bohra/dapl.log. Hal, OpenSM is running on our subnet. 
Aniruddha > > -arlin > > Signed-off by: Arlin Davis > > Index: dapl/openib/dapl_ib_cm.c > =================================================================== > --- dapl/openib/dapl_ib_cm.c (revision 3951) > +++ dapl/openib/dapl_ib_cm.c (working copy) > @@ -136,14 +136,27 @@ > > dapl_dbg_log(DAPL_DBG_TYPE_CM, > " path_comp_handler: SRC GID subnet %016llx id %016llx\n", > - (unsigned long > long)cpu_to_be64(conn->dapl_rt.sgid.global.subnet_prefix), > - (unsigned long > long)cpu_to_be64(conn->dapl_rt.sgid.global.interface_id) ); > + (unsigned long > long)cpu_to_be64(conn->dapl_path.sgid.global.subnet_prefix), > + (unsigned long > long)cpu_to_be64(conn->dapl_path.sgid.global.interface_id) ); > > dapl_dbg_log(DAPL_DBG_TYPE_CM, > " path_comp_handler: DST GID subnet %016llx id %016llx\n", > - (unsigned long > long)cpu_to_be64(conn->dapl_rt.dgid.global.subnet_prefix), > - (unsigned long > long)cpu_to_be64(conn->dapl_rt.dgid.global.interface_id) ); > + (unsigned long > long)cpu_to_be64(conn->dapl_path.dgid.global.subnet_prefix), > + (unsigned long > long)cpu_to_be64(conn->dapl_path.dgid.global.interface_id) ); > > + dapl_dbg_log(DAPL_DBG_TYPE_CM, > + " path_comp_handler: slid %x dlid %x mtu %x(%x) > pktlife %x(%x)\n", > + ntohs(conn->dapl_path.slid), ntohs(conn->dapl_path.dlid), > + conn->dapl_path.mtu, conn->dapl_path.mtu_selector, > + conn->dapl_path.packet_life_time, > + conn->dapl_path.packet_life_time_selector ); > + > + dapl_dbg_log(DAPL_DBG_TYPE_CM, > + " path_comp_handler: hops %x npaths %x pkey %x tclass > %x rate %x(%x)\n", > + conn->dapl_path.hop_limit, conn->dapl_path.numb_path, > + conn->dapl_path.pkey, conn->dapl_path.traffic_class, > + conn->dapl_path.rate, conn->dapl_path.rate_selector); > + > if (rec_num <= 0) { > dapl_dbg_log(DAPL_DBG_TYPE_CM, > " path_comp_handler: ERR %d retry %d\n", > > > From ardavis at ichips.intel.com Thu Nov 3 10:49:19 2005 From: ardavis at ichips.intel.com (Arlin Davis) Date: Thu, 03 Nov 2005 10:49:19 -0800 Subject: 
[openib-general] [PATCH] Re: uDAPL again In-Reply-To: <1131040666.4340.12.camel@hal.voltaire.com> References: <436906F0.3050803@cs.rutgers.edu> <43691B71.2040500@ichips.intel.com> <43692526.3030003@cs.rutgers.edu> <436938D5.6030403@ichips.intel.com> <436A4E12.1060005@ichips.intel.com> <1131040666.4340.12.camel@hal.voltaire.com> Message-ID: <436A5BAF.7060202@ichips.intel.com> Hal Rosenstock wrote: >Hi Arlin, > > >> >>Hal, can you take a look at uAT and see if the copy to user space is >>working correctly. >> >> > >Just want to clarify what I should be looking for: > >So you suspect pktlife and rate being bad (and the rest of the SA PR >look OK) ? > > Yes, you can see from the debug print that GIDs, LIDs, pkey, mtu look ok. Here is Aniruddha's latest output from a run with opensm: path_comp_handler: ctxt 0x808a008, req_id 292 rec_num 1 path_comp_handler: SRC GID subnet fe80000000000000 id 0002c901081e7471 path_comp_handler: DST GID subnet fe80000000000000 id 0001730000008461 path_comp_handler: slid 1 dlid 3 mtu 120203(2) pktlife 0(0) path_comp_handler: hops 0 npaths 0 pkey ffff tclass 0 rate 0(0) >Is OpenSM being used in Aniruddha's subnet ? > >-- Hal > > > From rolandd at cisco.com Thu Nov 3 11:07:14 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 11:07:14 -0800 Subject: [openib-general] Re: [RFC] patch to export userspace to kernel QP attribute structure In-Reply-To: <436A5700.8090102@ichips.intel.com> (Sean Hefty's message of "Thu, 03 Nov 2005 10:29:20 -0800") References: <524q6t61oa.fsf@cisco.com> <436A5700.8090102@ichips.intel.com> Message-ID: <52vez94kkd.fsf@cisco.com> Sean> I have no problem moving this. If the declaration goes in Sean> ib_verbs.h, should the function just go in verbs.c (or a new Sean> file) and be included as part of ib_core? It just seems a Sean> little off to me to implement functions declared in a single Sean> header file in separate modules. We can easily create a new header for these declaration. 
That's probably the cleanest thing to do, although I can't come up with a very good name for it right now, though.... Sean> Also, I'll need to do this with a path record as well. I Sean> can either include the routine as part of ib_sa, add a new Sean> module, ib_user_sa, or drop that call into one of the Sean> existing ib_user_* modules. If it's just marshalling between user and kernel formats, I'd stick it in uverbs_marshall.c. But if there's going to be something substantial then maybe it make sense to create a user SA module. - R. From rolandd at cisco.com Thu Nov 3 11:13:58 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 11:13:58 -0800 Subject: [openib-general] libehca causes segfault when not physically present.. In-Reply-To: <436A1E96.4050003@de.ibm.com> (Heiko J. Schick's message of "Thu, 03 Nov 2005 15:28:38 +0100") References: <20051031071703.GU3275@kalmia.hozed.org> <436A1E96.4050003@de.ibm.com> Message-ID: <52r79x4k95.fsf@cisco.com> Heiko> this bug should be fixed in OpenIB trunk 3960. It's good to see this fixed and all the other cleanups in this checkin. I'll have to go back to my ehca code reviewing.... However, when this code moves upstream, you'll have to make your changes in smaller digestible chunks. The diff between r3959 and r3960 is rather gigantic: 33 files changed, 945 insertions(+), 1163 deletions(-) And this piece: > -MODULE_VERSION("EHCA2_0035"); > +MODULE_VERSION("EHCA2_0037"); indicates that there was a 0036 that you never let anyone see. I would suggest you try to use the openib.org svn tree as your real development repository. This will be the way you will have to work once your driver is in the upstream kernel, and even now you will get benefit from getting better patch review and having users better able to pin down when a regression might have been introduced. 
For your latest checkin, it would have been better to see a series of changesets with commit logs like: - remove asm_sync_mem() and mftb(), which duplicate existing definitions in include/asm-ppc64 - make sure device is an eHCA in libehca's openib_driver_init() - update Kconfig help text and so on... Thanks, Roland From ladros at gmail.com Thu Nov 3 11:24:15 2005 From: ladros at gmail.com (Josh Aune) Date: Thu, 3 Nov 2005 14:24:15 -0500 Subject: [openib-general] DHCP over Infiniband In-Reply-To: <1131036320.4338.441.camel@hal.voltaire.com> References: <6AB138A2AB8C8E4A98B9C0C3D52670E30662A1@mtlexch01.mtl.com> <1131036320.4338.441.camel@hal.voltaire.com> Message-ID: <98a233180511031124v50347fat521cbf85091e2bf1@mail.gmail.com> On 03 Nov 2005 11:47:50 -0500, Hal Rosenstock wrote: > > On Thu, 2005-11-03 at 11:47, Eli Cohen wrote: > > The client is Etherboot's client for configuring a client at boot > > time. The server is ISC. > > I think that client needs modifications. No version of etherboot that I have used supports IB interfaces..
From mshefty at ichips.intel.com Thu Nov 3 12:45:04 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 03 Nov 2005 12:45:04 -0800 Subject: [openib-general] Re: [RFC] patch to export userspace to kernel QP attribute structure In-Reply-To: <52vez94kkd.fsf@cisco.com> References: <524q6t61oa.fsf@cisco.com> <436A5700.8090102@ichips.intel.com> <52vez94kkd.fsf@cisco.com> Message-ID: <436A76D0.5010506@ichips.intel.com> Roland Dreier wrote: > If it's just marshalling between user and kernel formats, I'd stick it > in uverbs_marshall.c. But if there's going to be something > substantial then maybe it make sense to create a user SA module. I added a three new files: ib_marshall.h - defines the copy functions (kernel only) ib_user_sa.h - defines the user path record (user/kernel) uverbs_marshall.c - implements the copy functions Any objection to doing something similar for libibverbs? This would move sa.h from libibat to libibverbs, which would allow libibcm and librdmacm to both depend only on libibverbs. - Sean From yipeeyipeeyipeeyipee at yahoo.com Thu Nov 3 14:19:10 2005 From: yipeeyipeeyipeeyipee at yahoo.com (yipee) Date: Thu, 3 Nov 2005 22:19:10 +0000 (UTC) Subject: [openib-general] Re: netstat References: <436A4C71.3020007@ichips.intel.com> Message-ID: Sean Hefty ichips.intel.com> writes: > > Is there some way to view the list of current CM end points in their various > > states (listen,connection)? > > Nothing like this is available today. I can record this as something to add in > the future, but it's unlikely to be a high priority for at least a few weeks. Maybe I can write some initial implementation. How do you think the data flow should be? libibcm.so issues a write(ucm_fd, buf, buf_len) with the request in buf (and enough extra space in buf for the reply). ib_ucm copies the information from ib_cm into buf. Maybe another command is needed inorder to know how large the reply buffer should be? any comments?
y From rolandd at cisco.com Thu Nov 3 14:27:03 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 14:27:03 -0800 Subject: [openib-general] Re: netstat In-Reply-To: (yipee's message of "Thu, 3 Nov 2005 22:19:10 +0000 (UTC)") References: <436A4C71.3020007@ichips.intel.com> Message-ID: <52irv94bbc.fsf@cisco.com> yipee> Maybe I can write some initial implementation. How do you yipee> think the data flow should be? libibcm.so issues a yipee> write(ucm_fd, buf, buf_len) with the request in buf (and yipee> enough extra space in buf for the reply). ib_ucm copies yipee> the information from ib_cm into buf. Probably easier just to use debugfs and seq_file to get this sort of thing. - R. From mst at mellanox.co.il Thu Nov 3 14:33:14 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 4 Nov 2005 00:33:14 +0200 Subject: [openib-general] Re: netstat In-Reply-To: References: Message-ID: <20051103223314.GA6498@mellanox.co.il> Quoting r. yipee : > Subject: Re: netstat > > Sean Hefty ichips.intel.com> writes: > > > > Is there some way to view the list of current CM end points in their > various > > > states (listen,connection)? > > > > Nothing like this is available today. I can record this as something > to add in > > the future, but it's unlikely to be a high priority for at least a few > weeks. > > Maybe I can write some initial implementation. > How do you think the data flow should be? > libibcm.so issues a write(ucm_fd, buf, buf_len) with the request in > buf (and enough extra space in buf for the reply). > ib_ucm copies the information from ib_cm into buf. > > Maybe another command is needed inorder to know how large the reply > buffer > should be? > > > any comments? > > y I would imagine ib_cm would need to implement a file in /proc or sysfs. One place to start would be to look at how /proc/net/tcp is implemented (that's what netstat uses, I think).
-- MST From mshefty at ichips.intel.com Thu Nov 3 14:34:14 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Thu, 03 Nov 2005 14:34:14 -0800 Subject: [openib-general] Re: netstat In-Reply-To: References: <436A4C71.3020007@ichips.intel.com> Message-ID: <436A9066.5060200@ichips.intel.com> yipee wrote: > Maybe I can write some initial implementation. If you could do that, it'd be great. I think something like this would be useful. I just don't know if I'll have time to add it soon. - Sean From Richard.Frank at oracle.com Thu Nov 3 14:34:23 2005 From: Richard.Frank at oracle.com (Rick Frank) Date: Thu, 3 Nov 2005 17:34:23 -0500 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB References: <52d5li8waw.fsf@cisco.com> Message-ID: <000101c5e0c6$f7f54540$6401a8c0@YOURA11C73D0FD> It is very important to Oracle for RDS to be available in OpenIB in as many Linux distributions as possible. Is this going to happen and in what timeframe / what are the plans for Linux distributions to pick up OpenIB with RDS support ? How can we (Oracle) help ? ----- Original Message ----- From: "Roland Dreier" To: ; Sent: Wednesday, November 02, 2005 6:27 PM Subject: Re: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB > What are your plans for porting the RDS code so that it works with the > upstream Linux IB stack? I've only seen a couple of checkins, and the > code that you've dropped so far doesn't look usable and needs a lot of > cleanup. There's not even a Makefile there. > > Someone uncharitable might believe that the whole purpose of this > exercise was just to be able to issue your press release > (http://silverstorm.com/news/rel/092005.asp). > > - R. 
> _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit > http://openib.org/mailman/listinfo/openib-general > From Richard.Frank at oracle.com Thu Nov 3 14:37:47 2005 From: Richard.Frank at oracle.com (Rick Frank) Date: Thu, 3 Nov 2005 17:37:47 -0500 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB References: <5D78D28F88822E4D8702BB9EEF1A4367C2DD62@mercury.infiniconsys.com> Message-ID: <001701c5e0c7$37d1b090$6401a8c0@YOURA11C73D0FD> It is very important to Oracle for RDS to be available in OpenIB in as many Linux distributions as possible. Is this going to happen and in what timeframe / what are the plans for Linux distributions to pick up OpenIB with RDS support ? How can we (Oracle) help ? ----- Original Message ----- From: "Pandit, Ranjit" To: "Roland Dreier" ; Sent: Thursday, November 03, 2005 1:15 PM Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB I'm hoping to start the port soon. It should have started earlier, but unfortunately I got side tracked by some unforeseen issues. Ranjit > -----Original Message----- > From: Roland Dreier [mailto:rolandd at cisco.com] > Sent: Wednesday, November 02, 2005 3:28 PM > To: Pandit, Ranjit; openib-general at openib.org > Subject: Re: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram > Sockets) to OpenIB > > What are your plans for porting the RDS code so that it works with the > upstream Linux IB stack? I've only seen a couple of checkins, and the > code that you've dropped so far doesn't look usable and needs a lot of > cleanup. There's not even a Makefile there. > > Someone uncharitable might believe that the whole purpose of this > exercise was just to be able to issue your press release > (http://silverstorm.com/news/rel/092005.asp). > > - R. 
From rolandd at cisco.com Thu Nov 3 14:58:50 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 14:58:50 -0800 Subject: [openib-general] Re: [PATCH] mthca: fixes pkey_ix processing in mthca_modify_qp In-Reply-To: <20051002141043.GD9873@mellanox.co.il> (Jack Morgenstein's message of "Sun, 2 Oct 2005 16:10:44 +0200") References: <20051002141043.GD9873@mellanox.co.il> Message-ID: <52ek5x49ud.fsf@cisco.com> Thanks, applied. - R. From rolandd at cisco.com Thu Nov 3 15:05:37 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 15:05:37 -0800 Subject: [openib-general] Re: [PATCH] fix page_size_cap value in ib_query_device for mellanox provider In-Reply-To: <20051020110443.GA7198@mellanox.co.il> (Jack Morgenstein's message of "Thu, 20 Oct 2005 13:04:44 +0200") References: <20051020110443.GA7198@mellanox.co.il> Message-ID: <527jbp49j2.fsf@cisco.com> Can we just use something like this instead? I don't think we need the comments talking about the semantics of page_size_cap, since we don't say what any other field means. And I don't see what casting mdev->limits.page_size_cap to u64 accomplishes -- it will get promoted to u64 anyway, since props->page_size_cap is a u64. - R.
--- infiniband/hw/mthca/mthca_provider.c (revision 3965) +++ infiniband/hw/mthca/mthca_provider.c (working copy) @@ -94,6 +94,7 @@ static int mthca_query_device(struct ib_ memcpy(&props->node_guid, out_mad->data + 12, 8); props->max_mr_size = ~0ull; + props->page_size_cap = mdev->limits.page_size_cap; props->max_qp = mdev->limits.num_qps - mdev->limits.reserved_qps; props->max_qp_wr = mdev->limits.max_wqes; props->max_sge = mdev->limits.max_sg; --- infiniband/hw/mthca/mthca_main.c (revision 3965) +++ infiniband/hw/mthca/mthca_main.c (working copy) @@ -181,6 +181,7 @@ static int __devinit mthca_dev_lim(struc mdev->limits.reserved_uars = dev_lim->reserved_uars; mdev->limits.reserved_pds = dev_lim->reserved_pds; mdev->limits.port_width_cap = dev_lim->max_port_width; + mdev->limits.page_size_cap = ~(u32) (dev_lim->min_page_sz - 1); mdev->limits.flags = dev_lim->flags; /* IB_DEVICE_RESIZE_MAX_WR not supported by driver. From rolandd at cisco.com Thu Nov 3 15:10:59 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 23:10:59 +0000 Subject: [openib-general] [git patch review 4/7] [IPoIB] don't compile debug code if debugging isn't enabled In-Reply-To: <1131059459423-3dc7f03665037bf0@cisco.com> Message-ID: <1131059459423-c39565dcb8db8aaa@cisco.com> Don't build ipoib_mcast_iter_ functions if CONFIG_INFINIBAND_IPOIB_DEBUG is not enabled -- their only callers will not be built either. Also move the prototype for ipoib_open() to ipoib.h to fix a sparse warning. 
Signed-off-by: Roland Dreier --- drivers/infiniband/ulp/ipoib/ipoib.h | 3 +++ drivers/infiniband/ulp/ipoib/ipoib_ib.c | 1 - drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 4 ++++ 3 files changed, 7 insertions(+), 1 deletions(-) applies-to: 3179960b8e0f3ccb4feff19eb5582298d48324a0 8ae5a8a24f7fe797027d481f88c1464b0e47eede diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h index c994a91..0095acc 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib.h +++ b/drivers/infiniband/ulp/ipoib/ipoib.h @@ -235,6 +235,7 @@ static inline void ipoib_put_ah(struct i kref_put(&ah->ref, ipoib_free_ah); } +int ipoib_open(struct net_device *dev); int ipoib_add_pkey_attr(struct net_device *dev); void ipoib_send(struct net_device *dev, struct sk_buff *skb, @@ -267,6 +268,7 @@ int ipoib_mcast_stop_thread(struct net_d void ipoib_mcast_dev_down(struct net_device *dev); void ipoib_mcast_dev_flush(struct net_device *dev); +#ifdef CONFIG_INFINIBAND_IPOIB_DEBUG struct ipoib_mcast_iter *ipoib_mcast_iter_init(struct net_device *dev); void ipoib_mcast_iter_free(struct ipoib_mcast_iter *iter); int ipoib_mcast_iter_next(struct ipoib_mcast_iter *iter); @@ -276,6 +278,7 @@ void ipoib_mcast_iter_read(struct ipoib_ unsigned int *queuelen, unsigned int *complete, unsigned int *send_only); +#endif int ipoib_mcast_attach(struct net_device *dev, u16 mlid, union ib_gid *mgid); diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 0a6f578..54ef2fe 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -636,7 +636,6 @@ void ipoib_ib_dev_cleanup(struct net_dev * Bug #2507. This implementation will probably be removed when the P_Key * change async notification is available. 
*/ -int ipoib_open(struct net_device *dev); static void ipoib_pkey_dev_check_presence(struct net_device *dev) { diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index 022eec7..3ecf78a 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -917,6 +917,8 @@ void ipoib_mcast_restart_task(void *dev_ ipoib_mcast_start_thread(dev); } +#ifdef CONFIG_INFINIBAND_IPOIB_DEBUG + struct ipoib_mcast_iter *ipoib_mcast_iter_init(struct net_device *dev) { struct ipoib_mcast_iter *iter; @@ -989,3 +991,5 @@ void ipoib_mcast_iter_read(struct ipoib_ *complete = iter->complete; *send_only = iter->send_only; } + +#endif /* CONFIG_INFINIBAND_IPOIB_DEBUG */ --- 0.99.9 From rolandd at cisco.com Thu Nov 3 15:10:59 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 23:10:59 +0000 Subject: [openib-general] [git patch review 1/7] [IB] ucm: 32/64 compatibility fixes Message-ID: <1131059459422-6013455baf532b88@cisco.com> Fix structure layouts to ensure same size on 32-bit and 64-bit architectures. This permits 32-bit userspace apps on a 64-bit kernel. 
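The layout problem described above can be illustrated in plain userspace C. This is only a sketch under stated assumptions: the struct names below are hypothetical stand-ins (not the real ib_user_cm.h definitions), and uint64_t/uint32_t stand in for __u64/__u32. A struct holding a 64-bit field followed by a single 32-bit field is typically 12 bytes on i386 (where 64-bit integers need only 4-byte alignment) but 16 bytes on x86-64; adding an explicit 32-bit reserved field makes the members sum to 16 bytes with no implicit tail padding on either ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the unpadded layout: 12 bytes of members,
 * so the total size depends on how the ABI aligns 64-bit integers. */
struct ucm_destroy_id_unpadded {
	uint64_t response;
	uint32_t id;
};

/* Hypothetical stand-in for the padded layout: the explicit reserved
 * field fills the slot a 64-bit ABI would otherwise pad implicitly. */
struct ucm_destroy_id_padded {
	uint64_t response;
	uint32_t id;
	uint32_t reserved;
};

/* Compile-time checks: the padded struct has the same size and field
 * offsets whether it is built for a 32-bit or a 64-bit target. */
_Static_assert(sizeof(struct ucm_destroy_id_padded) == 16,
	       "padded struct must be 16 bytes on every ABI");
_Static_assert(offsetof(struct ucm_destroy_id_padded, reserved) == 12,
	       "reserved must sit directly after id");
```

Checks of this kind are one way to catch a layout mismatch at build time rather than at run time, when a 32-bit process talks to a 64-bit kernel.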
Signed-off-by: Sean Hefty Signed-off-by: Roland Dreier --- include/rdma/ib_user_cm.h | 19 +++++++++++++------ 1 files changed, 13 insertions(+), 6 deletions(-) applies-to: ecb02f68e1055343bb45fc38350a8e33c827efc9 7b28b0d000eeb62d77add636f5d6eb0da04e48aa diff --git a/include/rdma/ib_user_cm.h b/include/rdma/ib_user_cm.h index 3037588..19be116 100644 --- a/include/rdma/ib_user_cm.h +++ b/include/rdma/ib_user_cm.h @@ -38,7 +38,7 @@ #include -#define IB_USER_CM_ABI_VERSION 3 +#define IB_USER_CM_ABI_VERSION 4 enum { IB_USER_CM_CMD_CREATE_ID, @@ -84,6 +84,7 @@ struct ib_ucm_create_id_resp { struct ib_ucm_destroy_id { __u64 response; __u32 id; + __u32 reserved; }; struct ib_ucm_destroy_id_resp { @@ -93,6 +94,7 @@ struct ib_ucm_destroy_id_resp { struct ib_ucm_attr_id { __u64 response; __u32 id; + __u32 reserved; }; struct ib_ucm_attr_id_resp { @@ -164,6 +166,7 @@ struct ib_ucm_listen { __be64 service_id; __be64 service_mask; __u32 id; + __u32 reserved; }; struct ib_ucm_establish { @@ -219,7 +222,7 @@ struct ib_ucm_req { __u8 rnr_retry_count; __u8 max_cm_retries; __u8 srq; - __u8 reserved[1]; + __u8 reserved[5]; }; struct ib_ucm_rep { @@ -236,6 +239,7 @@ struct ib_ucm_rep { __u8 flow_control; __u8 rnr_retry_count; __u8 srq; + __u8 reserved[4]; }; struct ib_ucm_info { @@ -245,7 +249,7 @@ struct ib_ucm_info { __u64 data; __u8 info_len; __u8 data_len; - __u8 reserved[2]; + __u8 reserved[6]; }; struct ib_ucm_mra { @@ -273,6 +277,7 @@ struct ib_ucm_sidr_req { __u16 pkey; __u8 len; __u8 max_cm_retries; + __u8 reserved[4]; }; struct ib_ucm_sidr_rep { @@ -284,7 +289,7 @@ struct ib_ucm_sidr_rep { __u64 data; __u8 info_len; __u8 data_len; - __u8 reserved[2]; + __u8 reserved[6]; }; /* * event notification ABI structures. 
@@ -295,7 +300,7 @@ struct ib_ucm_event_get { __u64 info; __u8 data_len; __u8 info_len; - __u8 reserved[2]; + __u8 reserved[6]; }; struct ib_ucm_req_event_resp { @@ -315,6 +320,7 @@ struct ib_ucm_req_event_resp { __u8 rnr_retry_count; __u8 srq; __u8 port; + __u8 reserved[7]; }; struct ib_ucm_rep_event_resp { @@ -329,7 +335,7 @@ struct ib_ucm_rep_event_resp { __u8 flow_control; __u8 rnr_retry_count; __u8 srq; - __u8 reserved[1]; + __u8 reserved[5]; }; struct ib_ucm_rej_event_resp { @@ -374,6 +380,7 @@ struct ib_ucm_event_resp { __u32 id; __u32 event; __u32 present; + __u32 reserved; union { struct ib_ucm_req_event_resp req_resp; struct ib_ucm_rep_event_resp rep_resp; --- 0.99.9 From rolandd at cisco.com Thu Nov 3 15:10:59 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 23:10:59 +0000 Subject: [openib-general] [git patch review 3/7] [IPoIB] remove unneeded initializations to 0 In-Reply-To: <1131059459423-f6e7ac335ed94eef@cisco.com> Message-ID: <1131059459423-3dc7f03665037bf0@cisco.com> Shrink our source and .text a little by removing a few assignments of NULL and 0 to memory that is already cleared as part of the allocation. 
Signed-off-by: Roland Dreier --- drivers/infiniband/ulp/ipoib/ipoib_main.c | 11 ++--------- 1 files changed, 2 insertions(+), 9 deletions(-) applies-to: 7463446a05b5e9a5d2fc400da0be8d4a6c2ff6f1 21a384897d48c116b879924c3dd9e96f6f1e764b diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 8b67db8..ce02962 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -356,18 +356,15 @@ static struct ipoib_path *path_rec_creat struct ipoib_dev_priv *priv = netdev_priv(dev); struct ipoib_path *path; - path = kmalloc(sizeof *path, GFP_ATOMIC); + path = kzalloc(sizeof *path, GFP_ATOMIC); if (!path) return NULL; - path->dev = dev; - path->pathrec.dlid = 0; - path->ah = NULL; + path->dev = dev; skb_queue_head_init(&path->queue); INIT_LIST_HEAD(&path->neigh_list); - path->query = NULL; init_completion(&path->done); memcpy(path->pathrec.dgid.raw, gid->raw, sizeof (union ib_gid)); @@ -800,10 +797,6 @@ static void ipoib_setup(struct net_devic dev->watchdog_timeo = HZ; - dev->rebuild_header = NULL; - dev->set_mac_address = NULL; - dev->header_cache_update = NULL; - dev->flags |= IFF_BROADCAST | IFF_MULTICAST; /* --- 0.99.9 From rolandd at cisco.com Thu Nov 3 15:10:59 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 23:10:59 +0000 Subject: [openib-general] [git patch review 2/7] [IB] kzalloc() conversions In-Reply-To: <1131059459422-6013455baf532b88@cisco.com> Message-ID: <1131059459423-f6e7ac335ed94eef@cisco.com> Replace kmalloc()+memset(,0,) with kzalloc(), for a net savings of 35 source lines and about 500 bytes of text. 
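For readers outside the kernel, the shape of this conversion can be sketched with the userspace analogue: malloc() followed by memset() versus calloc(). This is an illustration only; struct path_rec and both function names are hypothetical, and calloc() merely plays the role that kzalloc() plays in the patch below.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure, used only for this illustration. */
struct path_rec {
	void *dev;
	int   dlid;
	void *ah;
};

/* Before: allocate, check for failure, then clear the memory by hand. */
struct path_rec *path_alloc_old(void)
{
	struct path_rec *p = malloc(sizeof *p);

	if (!p)
		return NULL;
	memset(p, 0, sizeof *p);
	return p;
}

/* After: a zeroing allocator does both steps in one call, so the
 * separate memset() and any later "field = 0" assignments can go. */
struct path_rec *path_alloc_new(void)
{
	return calloc(1, sizeof(struct path_rec));
}
```

Both functions hand back equivalent zero-filled objects; the second simply has fewer lines and fewer places to get the size argument wrong.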
Signed-off-by: Roland Dreier --- drivers/infiniband/core/agent.c | 3 +- drivers/infiniband/core/cm.c | 6 ++--- drivers/infiniband/core/device.c | 10 +------- drivers/infiniband/core/mad.c | 31 +++++++++--------------- drivers/infiniband/core/sysfs.c | 6 ++--- drivers/infiniband/core/ucm.c | 9 ++----- drivers/infiniband/core/uverbs_main.c | 4 +-- drivers/infiniband/hw/mthca/mthca_mr.c | 4 +-- drivers/infiniband/hw/mthca/mthca_profile.c | 4 +-- drivers/infiniband/ulp/ipoib/ipoib_main.c | 8 ++---- drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 4 +-- 11 files changed, 27 insertions(+), 62 deletions(-) applies-to: 184c63c9358b790f4dd3288ea24b8d0c7973247f de6eb66b56d9df5ce6bd254994f05e065214e8cd diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c index 0c3c695..7545775 100644 --- a/drivers/infiniband/core/agent.c +++ b/drivers/infiniband/core/agent.c @@ -155,13 +155,12 @@ int ib_agent_port_open(struct ib_device int ret; /* Create new device info */ - port_priv = kmalloc(sizeof *port_priv, GFP_KERNEL); + port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL); if (!port_priv) { printk(KERN_ERR SPFX "No memory for ib_agent_port_private\n"); ret = -ENOMEM; goto error1; } - memset(port_priv, 0, sizeof *port_priv); /* Obtain send only MAD agent for SMI QP */ port_priv->agent[0] = ib_register_mad_agent(device, port_num, diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 580c3a2..02110e0 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -544,11 +544,10 @@ struct ib_cm_id *ib_create_cm_id(struct struct cm_id_private *cm_id_priv; int ret; - cm_id_priv = kmalloc(sizeof *cm_id_priv, GFP_KERNEL); + cm_id_priv = kzalloc(sizeof *cm_id_priv, GFP_KERNEL); if (!cm_id_priv) return ERR_PTR(-ENOMEM); - memset(cm_id_priv, 0, sizeof *cm_id_priv); cm_id_priv->id.state = IB_CM_IDLE; cm_id_priv->id.device = device; cm_id_priv->id.cm_handler = cm_handler; @@ -621,10 +620,9 @@ static struct cm_timewait_info * cm_crea 
{ struct cm_timewait_info *timewait_info; - timewait_info = kmalloc(sizeof *timewait_info, GFP_KERNEL); + timewait_info = kzalloc(sizeof *timewait_info, GFP_KERNEL); if (!timewait_info) return ERR_PTR(-ENOMEM); - memset(timewait_info, 0, sizeof *timewait_info); timewait_info->work.local_id = local_id; INIT_WORK(&timewait_info->work.work, cm_work_handler, diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c index 5a6e449..e169e79 100644 --- a/drivers/infiniband/core/device.c +++ b/drivers/infiniband/core/device.c @@ -161,17 +161,9 @@ static int alloc_name(char *name) */ struct ib_device *ib_alloc_device(size_t size) { - void *dev; - BUG_ON(size < sizeof (struct ib_device)); - dev = kmalloc(size, GFP_KERNEL); - if (!dev) - return NULL; - - memset(dev, 0, size); - - return dev; + return kzalloc(size, GFP_KERNEL); } EXPORT_SYMBOL(ib_alloc_device); diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 88f9f8c..3d8175e 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -255,12 +255,11 @@ struct ib_mad_agent *ib_register_mad_age } /* Allocate structures */ - mad_agent_priv = kmalloc(sizeof *mad_agent_priv, GFP_KERNEL); + mad_agent_priv = kzalloc(sizeof *mad_agent_priv, GFP_KERNEL); if (!mad_agent_priv) { ret = ERR_PTR(-ENOMEM); goto error1; } - memset(mad_agent_priv, 0, sizeof *mad_agent_priv); mad_agent_priv->agent.mr = ib_get_dma_mr(port_priv->qp_info[qpn].qp->pd, IB_ACCESS_LOCAL_WRITE); @@ -448,14 +447,13 @@ struct ib_mad_agent *ib_register_mad_sno goto error1; } /* Allocate structures */ - mad_snoop_priv = kmalloc(sizeof *mad_snoop_priv, GFP_KERNEL); + mad_snoop_priv = kzalloc(sizeof *mad_snoop_priv, GFP_KERNEL); if (!mad_snoop_priv) { ret = ERR_PTR(-ENOMEM); goto error1; } /* Now, fill in the various structures */ - memset(mad_snoop_priv, 0, sizeof *mad_snoop_priv); mad_snoop_priv->qp_info = &port_priv->qp_info[qpn]; mad_snoop_priv->agent.device = device; 
mad_snoop_priv->agent.recv_handler = recv_handler; @@ -794,10 +792,9 @@ struct ib_mad_send_buf * ib_create_send_ (!rmpp_active && buf_size > sizeof(struct ib_mad))) return ERR_PTR(-EINVAL); - buf = kmalloc(sizeof *mad_send_wr + buf_size, gfp_mask); + buf = kzalloc(sizeof *mad_send_wr + buf_size, gfp_mask); if (!buf) return ERR_PTR(-ENOMEM); - memset(buf, 0, sizeof *mad_send_wr + buf_size); mad_send_wr = buf + buf_size; mad_send_wr->send_buf.mad = buf; @@ -1039,14 +1036,12 @@ static int method_in_use(struct ib_mad_m static int allocate_method_table(struct ib_mad_mgmt_method_table **method) { /* Allocate management method table */ - *method = kmalloc(sizeof **method, GFP_ATOMIC); + *method = kzalloc(sizeof **method, GFP_ATOMIC); if (!*method) { printk(KERN_ERR PFX "No memory for " "ib_mad_mgmt_method_table\n"); return -ENOMEM; } - /* Clear management method table */ - memset(*method, 0, sizeof **method); return 0; } @@ -1137,15 +1132,14 @@ static int add_nonoui_reg_req(struct ib_ class = &port_priv->version[mad_reg_req->mgmt_class_version].class; if (!*class) { /* Allocate management class table for "new" class version */ - *class = kmalloc(sizeof **class, GFP_ATOMIC); + *class = kzalloc(sizeof **class, GFP_ATOMIC); if (!*class) { printk(KERN_ERR PFX "No memory for " "ib_mad_mgmt_class_table\n"); ret = -ENOMEM; goto error1; } - /* Clear management class table */ - memset(*class, 0, sizeof(**class)); + /* Allocate method table for this management class */ method = &(*class)->method_table[mgmt_class]; if ((ret = allocate_method_table(method))) @@ -1209,25 +1203,24 @@ static int add_oui_reg_req(struct ib_mad mad_reg_req->mgmt_class_version].vendor; if (!*vendor_table) { /* Allocate mgmt vendor class table for "new" class version */ - vendor = kmalloc(sizeof *vendor, GFP_ATOMIC); + vendor = kzalloc(sizeof *vendor, GFP_ATOMIC); if (!vendor) { printk(KERN_ERR PFX "No memory for " "ib_mad_mgmt_vendor_class_table\n"); goto error1; } - /* Clear management vendor class table 
*/ - memset(vendor, 0, sizeof(*vendor)); + *vendor_table = vendor; } if (!(*vendor_table)->vendor_class[vclass]) { /* Allocate table for this management vendor class */ - vendor_class = kmalloc(sizeof *vendor_class, GFP_ATOMIC); + vendor_class = kzalloc(sizeof *vendor_class, GFP_ATOMIC); if (!vendor_class) { printk(KERN_ERR PFX "No memory for " "ib_mad_mgmt_vendor_class\n"); goto error2; } - memset(vendor_class, 0, sizeof(*vendor_class)); + (*vendor_table)->vendor_class[vclass] = vendor_class; } for (i = 0; i < MAX_MGMT_OUI; i++) { @@ -2524,12 +2517,12 @@ static int ib_mad_port_open(struct ib_de char name[sizeof "ib_mad123"]; /* Create new device info */ - port_priv = kmalloc(sizeof *port_priv, GFP_KERNEL); + port_priv = kzalloc(sizeof *port_priv, GFP_KERNEL); if (!port_priv) { printk(KERN_ERR PFX "No memory for ib_mad_port_private\n"); return -ENOMEM; } - memset(port_priv, 0, sizeof *port_priv); + port_priv->device = device; port_priv->port_num = port_num; spin_lock_init(&port_priv->reg_lock); diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c index 7ce7a6c..b812065 100644 --- a/drivers/infiniband/core/sysfs.c +++ b/drivers/infiniband/core/sysfs.c @@ -307,14 +307,13 @@ static ssize_t show_pma_counter(struct i if (!p->ibdev->process_mad) return sprintf(buf, "N/A (no PMA)\n"); - in_mad = kmalloc(sizeof *in_mad, GFP_KERNEL); + in_mad = kzalloc(sizeof *in_mad, GFP_KERNEL); out_mad = kmalloc(sizeof *in_mad, GFP_KERNEL); if (!in_mad || !out_mad) { ret = -ENOMEM; goto out; } - memset(in_mad, 0, sizeof *in_mad); in_mad->mad_hdr.base_version = 1; in_mad->mad_hdr.mgmt_class = IB_MGMT_CLASS_PERF_MGMT; in_mad->mad_hdr.class_version = 1; @@ -508,10 +507,9 @@ static int add_port(struct ib_device *de if (ret) return ret; - p = kmalloc(sizeof *p, GFP_KERNEL); + p = kzalloc(sizeof *p, GFP_KERNEL); if (!p) return -ENOMEM; - memset(p, 0, sizeof *p); p->ibdev = device; p->port_num = port_num; diff --git a/drivers/infiniband/core/ucm.c 
b/drivers/infiniband/core/ucm.c index 2847756..6e15787 100644 --- a/drivers/infiniband/core/ucm.c +++ b/drivers/infiniband/core/ucm.c @@ -172,11 +172,10 @@ static struct ib_ucm_context *ib_ucm_ctx struct ib_ucm_context *ctx; int result; - ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); + ctx = kzalloc(sizeof *ctx, GFP_KERNEL); if (!ctx) return NULL; - memset(ctx, 0, sizeof *ctx); atomic_set(&ctx->ref, 1); init_waitqueue_head(&ctx->wait); ctx->file = file; @@ -386,11 +385,10 @@ static int ib_ucm_event_handler(struct i ctx = cm_id->context; - uevent = kmalloc(sizeof(*uevent), GFP_KERNEL); + uevent = kzalloc(sizeof *uevent, GFP_KERNEL); if (!uevent) goto err1; - memset(uevent, 0, sizeof(*uevent)); uevent->ctx = ctx; uevent->cm_id = cm_id; uevent->resp.uid = ctx->uid; @@ -1345,11 +1343,10 @@ static void ib_ucm_add_one(struct ib_dev if (!device->alloc_ucontext) return; - ucm_dev = kmalloc(sizeof *ucm_dev, GFP_KERNEL); + ucm_dev = kzalloc(sizeof *ucm_dev, GFP_KERNEL); if (!ucm_dev) return; - memset(ucm_dev, 0, sizeof *ucm_dev); ucm_dev->ib_dev = device; ucm_dev->devnum = find_first_zero_bit(dev_map, IB_UCM_MAX_DEVICES); diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index e58a7b2..de6581d 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -725,12 +725,10 @@ static void ib_uverbs_add_one(struct ib_ if (!device->alloc_ucontext) return; - uverbs_dev = kmalloc(sizeof *uverbs_dev, GFP_KERNEL); + uverbs_dev = kzalloc(sizeof *uverbs_dev, GFP_KERNEL); if (!uverbs_dev) return; - memset(uverbs_dev, 0, sizeof *uverbs_dev); - kref_init(&uverbs_dev->ref); spin_lock(&map_lock); diff --git a/drivers/infiniband/hw/mthca/mthca_mr.c b/drivers/infiniband/hw/mthca/mthca_mr.c index 1f97a44..e995e2a 100644 --- a/drivers/infiniband/hw/mthca/mthca_mr.c +++ b/drivers/infiniband/hw/mthca/mthca_mr.c @@ -140,13 +140,11 @@ static int __devinit mthca_buddy_init(st buddy->max_order = max_order; 
spin_lock_init(&buddy->lock); - buddy->bits = kmalloc((buddy->max_order + 1) * sizeof (long *), + buddy->bits = kzalloc((buddy->max_order + 1) * sizeof (long *), GFP_KERNEL); if (!buddy->bits) goto err_out; - memset(buddy->bits, 0, (buddy->max_order + 1) * sizeof (long *)); - for (i = 0; i <= buddy->max_order; ++i) { s = BITS_TO_LONGS(1 << (buddy->max_order - i)); buddy->bits[i] = kmalloc(s * sizeof (long), GFP_KERNEL); diff --git a/drivers/infiniband/hw/mthca/mthca_profile.c b/drivers/infiniband/hw/mthca/mthca_profile.c index 0576056..408cd55 100644 --- a/drivers/infiniband/hw/mthca/mthca_profile.c +++ b/drivers/infiniband/hw/mthca/mthca_profile.c @@ -80,12 +80,10 @@ u64 mthca_make_profile(struct mthca_dev struct mthca_resource tmp; int i, j; - profile = kmalloc(MTHCA_RES_NUM * sizeof *profile, GFP_KERNEL); + profile = kzalloc(MTHCA_RES_NUM * sizeof *profile, GFP_KERNEL); if (!profile) return -ENOMEM; - memset(profile, 0, MTHCA_RES_NUM * sizeof *profile); - profile[MTHCA_RES_QP].size = dev_lim->qpc_entry_sz; profile[MTHCA_RES_EEC].size = dev_lim->eec_entry_sz; profile[MTHCA_RES_SRQ].size = dev_lim->srq_entry_sz; diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index 273d5f4..8b67db8 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -729,25 +729,21 @@ int ipoib_dev_init(struct net_device *de /* Allocate RX/TX "rings" to hold queued skbs */ - priv->rx_ring = kmalloc(IPOIB_RX_RING_SIZE * sizeof (struct ipoib_rx_buf), + priv->rx_ring = kzalloc(IPOIB_RX_RING_SIZE * sizeof (struct ipoib_rx_buf), GFP_KERNEL); if (!priv->rx_ring) { printk(KERN_WARNING "%s: failed to allocate RX ring (%d entries)\n", ca->name, IPOIB_RX_RING_SIZE); goto out; } - memset(priv->rx_ring, 0, - IPOIB_RX_RING_SIZE * sizeof (struct ipoib_rx_buf)); - priv->tx_ring = kmalloc(IPOIB_TX_RING_SIZE * sizeof (struct ipoib_tx_buf), + priv->tx_ring = kzalloc(IPOIB_TX_RING_SIZE * sizeof (struct 
ipoib_tx_buf), GFP_KERNEL); if (!priv->tx_ring) { printk(KERN_WARNING "%s: failed to allocate TX ring (%d entries)\n", ca->name, IPOIB_TX_RING_SIZE); goto out_rx_ring_cleanup; } - memset(priv->tx_ring, 0, - IPOIB_TX_RING_SIZE * sizeof (struct ipoib_tx_buf)); /* priv->tx_head & tx_tail are already 0 */ diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c index 36ce298..022eec7 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c @@ -135,12 +135,10 @@ static struct ipoib_mcast *ipoib_mcast_a { struct ipoib_mcast *mcast; - mcast = kmalloc(sizeof (*mcast), can_sleep ? GFP_KERNEL : GFP_ATOMIC); + mcast = kzalloc(sizeof *mcast, can_sleep ? GFP_KERNEL : GFP_ATOMIC); if (!mcast) return NULL; - memset(mcast, 0, sizeof (*mcast)); - init_completion(&mcast->done); mcast->dev = dev; --- 0.99.9 From rolandd at cisco.com Thu Nov 3 15:10:59 2005 From: rolandd at cisco.com (Roland Dreier) Date: Thu, 03 Nov 2005 23:10:59 +0000 Subject: [openib-general] [git patch review 5/7] [IB] mthca: fix format of FW version In-Reply-To: <1131059459423-c39565dcb8db8aaa@cisco.com> Message-ID: <1131059459423-9ff5e95fb47caab0@cisco.com> Mellanox has decided that the components of the firmware version are really meant to be displayed in decimal, e.g. 0x000400070190 is version 4.7.400. Change the format we use from "%x.%x.%x" to "%d.%d.%d" to match this convention. 
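The decoding in the example above can be checked with a small userspace sketch. The helper name format_fw_ver is hypothetical; the shifts and masks mirror the ones used in the patch below. With "%x" the same value would print as 4.7.190, which is why the format string matters.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: split a packed firmware version into its
 * major (bits 32 and up), minor (bits 16..31) and sub-minor
 * (bits 0..15) components and print each one in decimal. */
int format_fw_ver(char *buf, size_t len, uint64_t fw_ver)
{
	return snprintf(buf, len, "%d.%d.%d",
			(int) (fw_ver >> 32),
			(int) ((fw_ver >> 16) & 0xffff),
			(int) (fw_ver & 0xffff));
}
```

Calling format_fw_ver(buf, sizeof buf, 0x000400070190ULL) fills buf with "4.7.400", since 0x0004 is 4, 0x0007 is 7 and 0x0190 is 400.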
Signed-off-by: Roland Dreier

---

 drivers/infiniband/hw/mthca/mthca_main.c     |    2 +-
 drivers/infiniband/hw/mthca/mthca_provider.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

applies-to: 389cecdfb0769cdddd0e901c1d60b9549b0a6322
87cfe32375e0b69b999b59bf8287f501df3e43f7
diff --git a/drivers/infiniband/hw/mthca/mthca_main.c b/drivers/infiniband/hw/mthca/mthca_main.c
index 883d1e5..45c6328 100644
--- a/drivers/infiniband/hw/mthca/mthca_main.c
+++ b/drivers/infiniband/hw/mthca/mthca_main.c
@@ -1057,7 +1057,7 @@ static int __devinit mthca_init_one(stru
 		goto err_cmd;
 
 	if (mdev->fw_ver < mthca_hca_table[id->driver_data].latest_fw) {
-		mthca_warn(mdev, "HCA FW version %x.%x.%x is old (%x.%x.%x is current).\n",
+		mthca_warn(mdev, "HCA FW version %d.%d.%d is old (%d.%d.%d is current).\n",
 			   (int) (mdev->fw_ver >> 32), (int) (mdev->fw_ver >> 16) & 0xffff,
 			   (int) (mdev->fw_ver & 0xffff),
 			   (int) (mthca_hca_table[id->driver_data].latest_fw >> 32),
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 1b9477e..6b01666 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -1028,7 +1028,7 @@ static ssize_t show_rev(struct class_dev
 static ssize_t show_fw_ver(struct class_device *cdev, char *buf)
 {
 	struct mthca_dev *dev = container_of(cdev, struct mthca_dev, ib_dev.class_dev);
-	return sprintf(buf, "%x.%x.%x\n", (int) (dev->fw_ver >> 32),
+	return sprintf(buf, "%d.%d.%d\n", (int) (dev->fw_ver >> 32),
 		       (int) (dev->fw_ver >> 16) & 0xffff,
 		       (int) dev->fw_ver & 0xffff);
 }
---
0.99.9

From rolandd at cisco.com Thu Nov 3 15:10:59 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Thu, 03 Nov 2005 23:10:59 +0000
Subject: [openib-general] [git patch review 7/7] [IB] mthca: check P_Key index in modify QP
In-Reply-To: <1131059459423-5367bfddb028b876@cisco.com>
Message-ID: <1131059459423-4e378ef68b019b7e@cisco.com>

Make sure that the P_Key index passed into mthca_modify_qp() is
within the device's P_Key table.

Signed-off-by: Jack Morgenstein
Signed-off-by: Roland Dreier

---

 drivers/infiniband/hw/mthca/mthca_qp.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

applies-to: b974a31452cb645f063589262bde09b6c5b05701
d09e32764176b61c4afee9fd5e7fe04713bfa56f
diff --git a/drivers/infiniband/hw/mthca/mthca_qp.c b/drivers/infiniband/hw/mthca/mthca_qp.c
index 62ff091..8b0b935 100644
--- a/drivers/infiniband/hw/mthca/mthca_qp.c
+++ b/drivers/infiniband/hw/mthca/mthca_qp.c
@@ -582,6 +582,13 @@ int mthca_modify_qp(struct ib_qp *ibqp,
 		return -EINVAL;
 	}
 
+	if ((attr_mask & IB_QP_PKEY_INDEX) &&
+	     attr->pkey_index >= dev->limits.pkey_table_len) {
+		mthca_dbg(dev, "PKey index (%u) too large. max is %d\n",
+			  attr->pkey_index,dev->limits.pkey_table_len-1);
+		return -EINVAL;
+	}
+
 	mailbox = mthca_alloc_mailbox(dev, GFP_KERNEL);
 	if (IS_ERR(mailbox))
 		return PTR_ERR(mailbox);
---
0.99.9

From rolandd at cisco.com Thu Nov 3 15:10:59 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Thu, 03 Nov 2005 23:10:59 +0000
Subject: [openib-general] [git patch review 6/7] [IB] umad: fix hot remove of IB devices
In-Reply-To: <1131059459423-9ff5e95fb47caab0@cisco.com>
Message-ID: <1131059459423-5367bfddb028b876@cisco.com>

Fix hotplug of devices for ib_umad module: when a device goes away,
kill off all MAD agents for open files associated with that device,
and make sure that the device is not touched again after ib_umad
returns from its remove_one function.
Signed-off-by: Roland Dreier --- drivers/infiniband/core/user_mad.c | 80 +++++++++++++++++++++++++++++------- 1 files changed, 64 insertions(+), 16 deletions(-) applies-to: 2cbc1b1e7bb230afcf4903b6527e3238f689de89 0c99cb6d5fe77872c5a32cff837c05f70158ce15 diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index 97128e2..aed5ca2 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -94,6 +94,9 @@ struct ib_umad_port { struct class_device *sm_class_dev; struct semaphore sm_sem; + struct rw_semaphore mutex; + struct list_head file_list; + struct ib_device *ib_dev; struct ib_umad_device *umad_dev; int dev_num; @@ -108,10 +111,10 @@ struct ib_umad_device { struct ib_umad_file { struct ib_umad_port *port; - spinlock_t recv_lock; struct list_head recv_list; + struct list_head port_list; + spinlock_t recv_lock; wait_queue_head_t recv_wait; - struct rw_semaphore agent_mutex; struct ib_mad_agent *agent[IB_UMAD_MAX_AGENTS]; struct ib_mr *mr[IB_UMAD_MAX_AGENTS]; }; @@ -148,7 +151,7 @@ static int queue_packet(struct ib_umad_f { int ret = 1; - down_read(&file->agent_mutex); + down_read(&file->port->mutex); for (packet->mad.hdr.id = 0; packet->mad.hdr.id < IB_UMAD_MAX_AGENTS; packet->mad.hdr.id++) @@ -161,7 +164,7 @@ static int queue_packet(struct ib_umad_f break; } - up_read(&file->agent_mutex); + up_read(&file->port->mutex); return ret; } @@ -322,7 +325,7 @@ static ssize_t ib_umad_write(struct file goto err; } - down_read(&file->agent_mutex); + down_read(&file->port->mutex); agent = file->agent[packet->mad.hdr.id]; if (!agent) { @@ -419,7 +422,7 @@ static ssize_t ib_umad_write(struct file if (ret) goto err_msg; - up_read(&file->agent_mutex); + up_read(&file->port->mutex); return count; @@ -430,7 +433,7 @@ err_ah: ib_destroy_ah(ah); err_up: - up_read(&file->agent_mutex); + up_read(&file->port->mutex); err: kfree(packet); @@ -460,7 +463,12 @@ static int ib_umad_reg_agent(struct ib_u int agent_id; int ret; - 
down_write(&file->agent_mutex); + down_write(&file->port->mutex); + + if (!file->port->ib_dev) { + ret = -EPIPE; + goto out; + } if (copy_from_user(&ureq, (void __user *) arg, sizeof ureq)) { ret = -EFAULT; @@ -522,7 +530,7 @@ err: ib_unregister_mad_agent(agent); out: - up_write(&file->agent_mutex); + up_write(&file->port->mutex); return ret; } @@ -531,7 +539,7 @@ static int ib_umad_unreg_agent(struct ib u32 id; int ret = 0; - down_write(&file->agent_mutex); + down_write(&file->port->mutex); if (get_user(id, (u32 __user *) arg)) { ret = -EFAULT; @@ -548,7 +556,7 @@ static int ib_umad_unreg_agent(struct ib file->agent[id] = NULL; out: - up_write(&file->agent_mutex); + up_write(&file->port->mutex); return ret; } @@ -569,6 +577,7 @@ static int ib_umad_open(struct inode *in { struct ib_umad_port *port; struct ib_umad_file *file; + int ret = 0; spin_lock(&port_lock); port = umad_port[iminor(inode) - IB_UMAD_MINOR_BASE]; @@ -579,21 +588,32 @@ static int ib_umad_open(struct inode *in if (!port) return -ENXIO; + down_write(&port->mutex); + + if (!port->ib_dev) { + ret = -ENXIO; + goto out; + } + file = kzalloc(sizeof *file, GFP_KERNEL); if (!file) { kref_put(&port->umad_dev->ref, ib_umad_release_dev); - return -ENOMEM; + ret = -ENOMEM; + goto out; } spin_lock_init(&file->recv_lock); - init_rwsem(&file->agent_mutex); INIT_LIST_HEAD(&file->recv_list); init_waitqueue_head(&file->recv_wait); file->port = port; filp->private_data = file; - return 0; + list_add_tail(&file->port_list, &port->file_list); + +out: + up_write(&port->mutex); + return ret; } static int ib_umad_close(struct inode *inode, struct file *filp) @@ -603,6 +623,7 @@ static int ib_umad_close(struct inode *i struct ib_umad_packet *packet, *tmp; int i; + down_write(&file->port->mutex); for (i = 0; i < IB_UMAD_MAX_AGENTS; ++i) if (file->agent[i]) { ib_dereg_mr(file->mr[i]); @@ -612,6 +633,9 @@ static int ib_umad_close(struct inode *i list_for_each_entry_safe(packet, tmp, &file->recv_list, list) kfree(packet); + 
list_del(&file->port_list); + up_write(&file->port->mutex); + kfree(file); kref_put(&dev->ref, ib_umad_release_dev); @@ -680,9 +704,13 @@ static int ib_umad_sm_close(struct inode struct ib_port_modify props = { .clr_port_cap_mask = IB_PORT_SM }; - int ret; + int ret = 0; + + down_write(&port->mutex); + if (port->ib_dev) + ret = ib_modify_port(port->ib_dev, port->port_num, 0, &props); + up_write(&port->mutex); - ret = ib_modify_port(port->ib_dev, port->port_num, 0, &props); up(&port->sm_sem); kref_put(&port->umad_dev->ref, ib_umad_release_dev); @@ -745,6 +773,8 @@ static int ib_umad_init_port(struct ib_d port->ib_dev = device; port->port_num = port_num; init_MUTEX(&port->sm_sem); + init_rwsem(&port->mutex); + INIT_LIST_HEAD(&port->file_list); port->dev = cdev_alloc(); if (!port->dev) @@ -813,6 +843,9 @@ err_cdev: static void ib_umad_kill_port(struct ib_umad_port *port) { + struct ib_umad_file *file; + int id; + class_set_devdata(port->class_dev, NULL); class_set_devdata(port->sm_class_dev, NULL); @@ -826,6 +859,21 @@ static void ib_umad_kill_port(struct ib_ umad_port[port->dev_num] = NULL; spin_unlock(&port_lock); + down_write(&port->mutex); + + port->ib_dev = NULL; + + list_for_each_entry(file, &port->file_list, port_list) + for (id = 0; id < IB_UMAD_MAX_AGENTS; ++id) { + if (!file->agent[id]) + continue; + ib_dereg_mr(file->mr[id]); + ib_unregister_mad_agent(file->agent[id]); + file->agent[id] = NULL; + } + + up_write(&port->mutex); + clear_bit(port->dev_num, dev_map); } --- 0.99.9 From mst at mellanox.co.il Thu Nov 3 15:23:18 2005 From: mst at mellanox.co.il (Michael S. 
Tsirkin)
Date: Fri, 4 Nov 2005 01:23:18 +0200
Subject: [openib-general] Re: [PATCH] fix page_size_cap value in ib_query_device for mellanox provider
In-Reply-To: <527jbp49j2.fsf@cisco.com>
References: <527jbp49j2.fsf@cisco.com>
Message-ID: <20051103232318.GC6498@mellanox.co.il>

Quoting Roland Dreier :
> Subject: Re: [PATCH] fix page_size_cap value in ib_query_device for mellanox provider
>
> Can we just use something like this instead? I don't think we need
> the comments talking about the semantics of page_size_cap, since we
> don't say what any other field means.

This was intended more as a clarification for you.
I think it's fine to remove this comment if you think it's clear that
the _cap name means that it's a bit mask.

> And I don't see what casting mdev->limits.page_size_cap to u64
> accomplishes -- it will get promoted to u64 anyway, since
> props->page_size_cap is a u64.
>
> - R.

Makes sense to me.

-- MST

From halr at voltaire.com Thu Nov 3 15:25:32 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: Fri, 4 Nov 2005 01:25:32 +0200
Subject: [openib-general] [PATCH] Re: uDAPL again
Message-ID: <5CE025EE7D88BA4599A2C8FEFCF226F589A9E7@taurus.voltaire.com>

On Thu, 2005-11-03 at 13:49, Arlin Davis wrote:
> Yes, you can see from the debug print that GIDs, LIDs, pkey, mtu look ok.
>
> Here is Aniruddha's latest output from a run with opensm:
>
> path_comp_handler: ctxt 0x808a008, req_id 292 rec_num 1
> path_comp_handler: SRC GID subnet fe80000000000000 id 0002c901081e7471
> path_comp_handler: DST GID subnet fe80000000000000 id 0001730000008461
> path_comp_handler: slid 1 dlid 3 mtu 120203(2) pktlife 0(0)
> path_comp_handler: hops 0 npaths 0 pkey ffff tclass 0 rate 0(0)

The problem was in the uat library.

Aniruddha,
Can you update userspace/libibat, rebuild, and test?
You should get real rates and packet lifetimes now.

-- Hal

From iod00d at hp.com Thu Nov 3 16:21:01 2005
From: iod00d at hp.com (Grant Grundler)
Date: Thu, 3 Nov 2005 16:21:01 -0800
Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB
In-Reply-To: <001701c5e0c7$37d1b090$6401a8c0@YOURA11C73D0FD>
References: <5D78D28F88822E4D8702BB9EEF1A4367C2DD62@mercury.infiniconsys.com> <001701c5e0c7$37d1b090$6401a8c0@YOURA11C73D0FD>
Message-ID: <20051104002101.GC1478@esmail.cup.hp.com>

On Thu, Nov 03, 2005 at 05:37:47PM -0500, Rick Frank wrote:
> It is very important to Oracle for RDS to be available in OpenIB in as many
> Linux distributions as possible.
>
> Is this going to happen and in what timeframe / what are the plans for Linux
> distributions to pick up OpenIB with RDS support ?

OpenIB doesn't have RDS support yet AFAICT. Some code is in
contrib/silverstorm/rds/
but not in the trunk where Roland can ship it to kernel.org
and where every distro will look for it.

But as Roland said, RDS doesn't even have a Makefile.
I've reviewed some of it shortly after it got dropped in but
still need to go through a lot more of the code.

> How can we (Oracle) help ?

1) Port contrib/silverstorm/rds/ to linux-kernel/infiniband/ulp/rds/
2) include some docs on its use and why RDS is better than SDP.
3) nag people to review the ported code
4) post functional test results

That's a prioritized list if that helps.

Regarding (2), I need to re-read your last OpenIB conf RDS presentation.
ISTR there was a reason but the details escape me. You guys once
explained it but the details didn't stick.

cheers,
grant

From mshefty at ichips.intel.com Thu Nov 3 16:50:26 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Thu, 03 Nov 2005 16:50:26 -0800
Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB
In-Reply-To: <20051104002101.GC1478@esmail.cup.hp.com>
References: <5D78D28F88822E4D8702BB9EEF1A4367C2DD62@mercury.infiniconsys.com> <001701c5e0c7$37d1b090$6401a8c0@YOURA11C73D0FD> <20051104002101.GC1478@esmail.cup.hp.com>
Message-ID: <436AB052.9070507@ichips.intel.com>

Grant Grundler wrote:
> But as Roland said, RDS doesn't even have a Makefile.
> I've reviewed some of it shortly after it got dropped in but
> still need to go through a lot more of the code.

The code that I've looked at doesn't appear to be written to any of the
openib code. A Makefile wouldn't help, since I don't think that any of it
would compile anyway. Porting it to openib will be a major rewrite.

> 2) include some docs on its use and why RDS is better than SDP.

Does someone have a link to a doc or presentation on why RDS is better
than SDP? Or better yet, some actual data showing how it provides better
performance or scalability?

- Sean

From robert.j.woodruff at intel.com Thu Nov 3 16:53:26 2005
From: robert.j.woodruff at intel.com (Bob Woodruff)
Date: Thu, 3 Nov 2005 16:53:26 -0800
Subject: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB
In-Reply-To: <20051104002101.GC1478@esmail.cup.hp.com>
Message-ID:

Grant wrote,
>2) include some docs on its use and why RDS is better than SDP.
>3) nag people to review the ported code
>4) post functional test results

Looking at the code that is in the contrib branch, it looks like RDS
uses connected channels. Is that correct?
If so, I do not see that it provides any value over SDP.
If it indeed were using datagrams over IB, then I see that it might provide for better scaling than SDP, since with very large numbers of connections, memory usage becomes an issue, but as it is currently coded, I don't see the point. I was unable to attend the RDS talk at OpenIB workshop, so perhaps Rick can provide some reason why this protocol is better than SDP. woody From Richard.Frank at oracle.com Thu Nov 3 16:55:56 2005 From: Richard.Frank at oracle.com (Rick Frank) Date: Thu, 3 Nov 2005 19:55:56 -0500 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB References: <5D78D28F88822E4D8702BB9EEF1A4367C2DD62@mercury.infiniconsys.com> <001701c5e0c7$37d1b090$6401a8c0@YOURA11C73D0FD> <20051104002101.GC1478@esmail.cup.hp.com> <436AB052.9070507@ichips.intel.com> Message-ID: <00ec01c5e0da$846dd510$6401a8c0@YOURA11C73D0FD> RDS is UDP protocol with reliability added - it remains connectionless from the consumer perspective. SDP is connection based - at least currently.
From rpandit at silverstorm.com Thu Nov 3 17:01:59 2005 From: rpandit at silverstorm.com (Ranjit Pandit) Date: Thu, 3 Nov 2005 17:01:59 -0800 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB In-Reply-To: <20051104002101.GC1478@esmail.cup.hp.com> References: <5D78D28F88822E4D8702BB9EEF1A4367C2DD62@mercury.infiniconsys.com> <001701c5e0c7$37d1b090$6401a8c0@YOURA11C73D0FD> <20051104002101.GC1478@esmail.cup.hp.com> Message-ID: <96f8e60e0511031701i7b9ce5a0gbdade306735695e6@mail.gmail.com> On 11/3/05, Grant Grundler wrote: > On Thu, Nov 03, 2005 at 05:37:47PM -0500, Rick Frank wrote: > > It is very important to Oracle for RDS to be available in OpenIB in as many > > Linux distributions as possible. > > > > Is this going to happen and in what timeframe / what are the plans for Linux > > distributions to pick up OpenIB with RDS support ? > > OpenIB doesn't have RDS support yet AFAICT. Some code is in > contrib/silverstorm/rds/ The code in contrib/silverstorm/rds is up-to-date but is on SilverStorm IbAccess layer. As previously mentioned, this code is for reference only and needs to be ported to OpenIB verbs and moved into trunk. > but not in the trunk where Roland can ship it to kernel.org > and where every distro will look for it. > > But as Roland said, RDS doesn't even have a Makefile. > I've reviewed some of it shortly after it got dropped in but > still need to go through alot more of the code. > I will go ahead and post the Makefile...but it's currently specific to SST Access layer. > > How can we (Oracle) help ? > > 1) Port contrib/silverstorm/rds/ to linux-kernel/infiniband/ulp/rds/ Grant, at the last OpenIB conference, you had volunteered to help port the code or, at the coding style. :) > 2) include some docs on it's use and why RDS is better than SDP. I will checkin the RDS presentations shortly.
> 3) nag people to review the ported code > 4) post functional test results > > That's a prioritized list if that helps. > > Regarding (2), I need to re-read your last OpenIB conf RDS presentation. > ISTR there was a reason but the details escape me. You guys once explained > it but the details didn't stick. > > cheers, > grant > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general > Ranjit From Richard.Frank at oracle.com Thu Nov 3 17:06:21 2005 From: Richard.Frank at oracle.com (Rick Frank) Date: Thu, 3 Nov 2005 20:06:21 -0500 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB References: <5D78D28F88822E4D8702BB9EEF1A4367C2DD62@mercury.infiniconsys.com> <001701c5e0c7$37d1b090$6401a8c0@YOURA11C73D0FD> <20051104002101.GC1478@esmail.cup.hp.com> <96f8e60e0511031701i7b9ce5a0gbdade306735695e6@mail.gmail.com> Message-ID: <010201c5e0db$f9ed3820$6401a8c0@YOURA11C73D0FD> I've attached a draft proposal for RDS from Oracle which discusses some of the motivation for RDS.
-------------- next part -------------- A non-text attachment was scrubbed... Name: Proposal for a Reliable Datagram Socket Interface.doc Type: application/msword Size: 51200 bytes Desc: not available URL: From hozer at hozed.org Thu Nov 3 21:39:15 2005 From: hozer at hozed.org (Troy Benjegerdes) Date: Thu, 3 Nov 2005 23:39:15 -0600 Subject: [openib-general] libehca causes segfault when not physically present..
In-Reply-To: <52r79x4k95.fsf@cisco.com> References: <20051031071703.GU3275@kalmia.hozed.org> <436A1E96.4050003@de.ibm.com> <52r79x4k95.fsf@cisco.com> Message-ID: <20051104053915.GJ3275@kalmia.hozed.org> On Thu, Nov 03, 2005 at 11:13:58AM -0800, Roland Dreier wrote: > Heiko> this bug should be fixed in OpenIB trunk 3960. > > It's good to see this fixed and all the other cleanups in this > checkin. I'll have to go back to my ehca code reviewing.... > > However, when this code moves upstream, you'll have to make your > changes in smaller digestible chunks. The diff between r3959 and > r3960 is rather gigantic: > > 33 files changed, 945 insertions(+), 1163 deletions(-) > > And this piece: > > > -MODULE_VERSION("EHCA2_0035"); > > +MODULE_VERSION("EHCA2_0037"); > > indicates that there was a 0036 that you never let anyone see. I'll second the comment about smaller digestible chunks. A second thing I don't completely understand is the vast size difference between the ehca and mthca drivers. Is the ehca really that much more complex? I also want to comment that EHCA is the only thing that's versioned that is easy to tell what version of the module is actually loaded at the moment. I'd rather have versions I don't see float by than see every file in mthca get updated, but no version rev. I tried adding some code to generate a version string from the output of svnversion but it didn't work too well. The same goes for OpenSM as well.. the only version string you get when starting it is 'Opensm-1.1.0', which isn't very useful. And once that's figured out, maybe we can start thinking about how to make sure kernel module versions match userspace versions. Personally, I'd like to see the ehca functions exported as a VDSO. From halr at voltaire.com Thu Nov 3 21:46:05 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 04 Nov 2005 00:46:05 -0500 Subject: [openib-general] libehca causes segfault when not physically present..
In-Reply-To: <20051104053915.GJ3275@kalmia.hozed.org> References: <20051031071703.GU3275@kalmia.hozed.org> <436A1E96.4050003@de.ibm.com> <52r79x4k95.fsf@cisco.com> <20051104053915.GJ3275@kalmia.hozed.org> Message-ID: <1131083165.4340.1631.camel@hal.voltaire.com> On Fri, 2005-11-04 at 00:39, Troy Benjegerdes wrote: > On Thu, Nov 03, 2005 at 11:13:58AM -0800, Roland Dreier wrote: > > Heiko> this bug should be fixed in OpenIB trunk 3960. > > > > It's good to see this fixed and all the other cleanups in this > > checkin. I'll have to go back to my ehca code reviewing.... > > > > However, when this code moves upstream, you'll have to make your > > changes in smaller digestible chunks. The diff between r3959 and > > r3960 is rather gigantic: > > > > 33 files changed, 945 insertions(+), 1163 deletions(-) > > > > And this piece: > > > > > -MODULE_VERSION("EHCA2_0035"); > > > +MODULE_VERSION("EHCA2_0037"); > > > > indicates that there was a 0036 that you never let anyone see. > > I'll second the comment about smaller digestible chunks. A second thing > I don't completely understand is the vast size difference between the > ehca and mthca drivers. Is the ehca really that much more complex? > > I also want to comment that EHCA is the only thing that's versioned that > is easy to tell what version of the module is actually loaded at the > moment. I'd rather have versions I don't see float by than see every > file in mthca get updated, but no version rev. > > I tried adding some code to generate a version string from the output of > svnversion but it didn't work too well. > > The same goes for OpenSM as well.. the only version string you get when > starting it is 'Opensm-1.1.0', which isn't very useful. What version would you propose for OpenSM ? Should I change the last something with every checkin ? Or perhaps append it with 1.1.0-svn version until we hit rc1 for this ?
-- Hal > And once that's figured out, maybe we can start thinking about how to > make sure kernel module versions match userspace versions. Personally, > I'd like to see the ehca functions exported as a VDSO. From RAISCH at de.ibm.com Fri Nov 4 02:32:27 2005 From: RAISCH at de.ibm.com (Christoph Raisch) Date: Fri, 4 Nov 2005 11:32:27 +0100 Subject: [openib-general] libehca causes segfault when not physically present.. In-Reply-To: <52r79x4k95.fsf@cisco.com> Message-ID: The secret of 0036 was that we managed to build a driver which didn't work in all cases. I would guess in these changes > 33 files changed, 945 insertions(+), 1163 deletions(-) there are about 80 lines of new code; the rest were modifications which don't change any algorithm but are desperately needed to be kernel coding style compliant. This is the mostly complete list of what we've changed: removed and renamed already existing assembly macros in ehca_asm.h; changed the ehca_module pointer to an ehca_module struct; removed EHCA_MEMPAGESIZE; replaced quite a lot of typedef struct by struct; capitalized DEFINES and changed most struct members to small letters; replaced all ehca_retcode_t by u64; replaced ehca_sleep() by the appropriate kernel function; replaced assert() by BUG_ON(); replaced ntohd(); some naming and comment cleanup on struct hcp_modify_qp_control_block. Roland, in case you're missing some changes in there, we'll add these to one of the next releases to separate the coding style cleanups from the functional changes. Gruss / Regards . . . Christoph R. Roland Dreier <> wrote on 03.11.2005 20:13:58: > Heiko> this bug should be fixed in OpenIB trunk 3960. > > It's good to see this fixed and all the other cleanups in this > checkin.
I'll have to go back to my ehca code reviewing.... > > However, when this code moves upstream, you'll have to make your > changes in smaller digestible chunks. The diff between r3959 and > r3960 is rather gigantic: > > 33 files changed, 945 insertions(+), 1163 deletions(-) > > And this piece: > > > -MODULE_VERSION("EHCA2_0035"); > > +MODULE_VERSION("EHCA2_0037"); > > indicates that there was a 0036 that you never let anyone see. > > I would suggest you try to use the openib.org svn tree as your real > development repository. This will be the way you will have to work > once your driver is in the upstream kernel, and even now you will get > benefit from getting better patch review and having users better able > to pin down when a regression might have been introduced. > > For your latest checkin, it would have been better to see a series of > changesets with commit logs like: > > - remove asm_sync_mem() and mftb(), which duplicate existing > definitions in include/asm-ppc64 > - make sure device is an eHCA in libehca's openib_driver_init() > - update Kconfig help text > > and so on... > > Thanks, > Roland From mst at mellanox.co.il Fri Nov 4 04:23:31 2005 From: mst at mellanox.co.il (Michael S. Tsirkin) Date: Fri, 4 Nov 2005 14:23:31 +0200 Subject: [openib-general] [PATCH] sdp zero copy support Message-ID: <20051104122331.GB15158@mellanox.co.il> Please review the following. I won't have performance numbers for this code until next week. --- Add zero copy support to synchronous socket operations (send_msg/recv_msg). This patch also includes a couple of fixes for aio, which I'll split and commit separately. Signed-off-by: Michael S.
Tsirkin Index: drivers/infiniband/ulp/sdp/Kconfig =================================================================== --- drivers/infiniband/ulp/sdp/Kconfig (revision 3958) +++ drivers/infiniband/ulp/sdp/Kconfig (working copy) @@ -8,6 +8,20 @@ libsdp library from to have standard sockets applications use SDP. +config INFINIBAND_SDP_SEND_ZCOPY + bool "Sockets Direct Protocol Zero Copy Send support" + depends on INFINIBAND_SDP + default y + ---help--- + This option enables Zero Copy support for send_msg transactions. + +config INFINIBAND_SDP_RECV_ZCOPY + bool "Sockets Direct Protocol Zero Copy Receive support" + depends on INFINIBAND_SDP && INFINIBAND_SDP_SEND_ZCOPY + default y + ---help--- + This option enables Zero Copy support for recv_msg transactions. + config INFINIBAND_SDP_DEBUG bool "Sockets Direct Protocol debugging" depends on INFINIBAND_SDP Index: drivers/infiniband/ulp/sdp/sdp_rcvd.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_rcvd.c (revision 3958) +++ drivers/infiniband/ulp/sdp/sdp_rcvd.c (working copy) @@ -439,6 +439,11 @@ sdp_advt_destroy(advt); } + +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + /* There are no more src_pend, wake any waiting thread */ + sdp_iocb_wake(&conn->src_wait_list); +#endif /* * If there are active reads, mark the connection as being in * source cancel. 
Otherwise Index: drivers/infiniband/ulp/sdp/sdp_sock.h =================================================================== --- drivers/infiniband/ulp/sdp/sdp_sock.h (revision 3958) +++ drivers/infiniband/ulp/sdp/sdp_sock.h (working copy) @@ -61,7 +61,9 @@ #define SDP_ZCOPY_THRSH_SRC 257 /* Threshold for AIO write advertisments */ #define SDP_ZCOPY_THRSH_SNK 258 /* Threshold for AIO read advertisments */ #define SDP_ZCOPY_THRSH 256 /* Convenience for read and write */ - +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY +#define SDP_ZCOPY_CANCEL_TIMEOUT (HZ * 60) /* Time before abortive close */ +#endif /* * Default values for SDP specific socket options. (for reference) */ Index: drivers/infiniband/ulp/sdp/sdp_proto.h =================================================================== --- drivers/infiniband/ulp/sdp/sdp_proto.h (revision 3958) +++ drivers/infiniband/ulp/sdp/sdp_proto.h (working copy) @@ -152,7 +152,10 @@ void sdp_iocb_q_put_tail(struct sdpc_iocb_q *table, struct sdpc_iocb *iocb); struct sdpc_iocb *sdp_iocb_q_lookup(struct sdpc_iocb_q *table, u32 key); +struct sdpc_iocb *sdp_iocb_q_lookup_req(struct sdpc_iocb_q *table, struct kiocb *req); +void sdp_iocb_q_mark_cancel(struct sdpc_iocb_q *table, struct kiocb *req); + void sdp_iocb_q_cancel(struct sdpc_iocb_q *table, u32 mask, ssize_t comp); void sdp_iocb_q_remove(struct sdpc_iocb *iocb); @@ -197,6 +200,8 @@ void *arg), void *arg); +int sdp_iocb_find_req(struct sdpc_desc *element, void *arg); + int sdp_desc_q_types_size(struct sdpc_desc_q *table, enum sdp_desc_type type); Index: drivers/infiniband/ulp/sdp/sdp_read.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_read.c (revision 3958) +++ drivers/infiniband/ulp/sdp/sdp_read.c (working copy) @@ -93,6 +93,12 @@ } } +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + /* If there are no more src_pend, wake any waiting thread */ + if (!sdp_advt_q_size(&conn->src_pend)) + sdp_iocb_wake(&conn->src_wait_list); + +#endif 
done: return 0; error: Index: drivers/infiniband/ulp/sdp/sdp_send.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_send.c (revision 3958) +++ drivers/infiniband/ulp/sdp/sdp_send.c (working copy) @@ -122,6 +122,10 @@ send_param.send_flags |= IB_SEND_SIGNALED; conn->send_cons = 0; } + + if (buff->bsdh_hdr->mid == SDP_MID_SRC_CANCEL) + sdp_dbg_ctrl(conn, "SRC_CANCEL bsdh_hdr->seq_num = %d conn->send_seq=%d\n", + buff->bsdh_hdr->seq_num, conn->send_seq); /* * post send */ @@ -1680,8 +1684,8 @@ static int sdp_inet_write_cancel(struct kiocb *req, struct io_event *ev) { struct sock_iocb *si = kiocb_to_siocb(req); - struct sdp_sock *conn; struct sdpc_iocb *iocb; + struct sdp_sock *conn; int result = 0; sdp_dbg_ctrl(NULL, "Cancel Write IOCB user <%d> key <%d> flag <%08lx>", @@ -1738,7 +1742,7 @@ /* * completion reference */ - aio_put_req(req); + aio_put_req(iocb->req); result = 0; } @@ -1797,9 +1801,8 @@ * no IOCB found. The cancel is probably in a race with a completion. * Assume the IOCB will be completed, return appropriate value. */ - sdp_warn("Cancel write with no IOCB. <%d:%d:%08lx>", - req->ki_users, req->ki_key, req->ki_flags); - + sdp_dbg_warn(conn, "Cancel write with no IOCB. 
<%d:%d:%08lx>", + req->ki_users, req->ki_key, req->ki_flags); result = -EAGAIN; unlock: @@ -1810,7 +1813,151 @@ return result; } +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY +static int sdp_write_src_cancel(struct sdpc_desc *element, void *arg) +{ + struct sdpc_iocb *iocb = (struct sdpc_iocb *) element; + struct kiocb *req = (struct kiocb *)arg; + + if (element->type == SDP_DESC_TYPE_IOCB && iocb->req == req) + iocb->flags |= SDP_IOCB_F_CANCEL; + return -ERANGE; +} + +static int sdp_req_busy(struct sdp_sock *conn, struct sdpc_iocb_wait *wait) +{ + unsigned long flags; + int result = -EAGAIN; + + sdp_conn_lock(conn); + sdp_conn_unlock(conn); + + spin_lock_irqsave(&wait->lock, flags); + if (!wait->outstanding) + result = 0; + spin_unlock_irqrestore(&wait->lock, flags); + return result; +} /* + * sdp_write_cancel - cancel a synchronous IO operation + */ +static int sdp_write_cancel(struct kiocb *req, struct sdp_sock *conn, + struct sdpc_iocb_wait *wait) +{ + struct sdpc_iocb *iocb; + int result = 0; + + sdp_dbg_ctrl(NULL, "Cancel Write IOCB user <%d> key <%d> flag <%08lx>", + req->ki_users, req->ki_key, req->ki_flags); + + sdp_conn_lock(conn); + + sdp_dbg_ctrl(conn, "Cancel Write IOCB. <%08x:%04x> <%08x:%04x>", + conn->src_addr, conn->src_port, + conn->dst_addr, conn->dst_port); + /* + * attempt to find the IOCB for this key. we don't have an indication + * whether this is a read or write. + */ + + while ((iocb = (struct sdpc_iocb *) + sdp_desc_q_lookup(&conn->send_queue, sdp_iocb_find_req, req))) { + iocb->flags |= SDP_IOCB_F_CANCEL; + + /* + * always remove the IOCB. 
+ * If active, then place it into the correct active queue + */ + sdp_desc_q_remove((struct sdpc_desc *)iocb); + + if (iocb->flags & SDP_IOCB_F_ACTIVE) { + if (iocb->flags & SDP_IOCB_F_RDMA_W) + sdp_desc_q_put_tail(&conn->w_snk, + (struct sdpc_desc *)iocb); + else { + SDP_EXPECT((iocb->flags & SDP_IOCB_F_RDMA_R)); + + sdp_iocb_q_put_tail(&conn->w_src, iocb); + } + } else { + /* + * empty IOCBs can be deleted, while partials + * needs to be compelted. + */ + if (iocb->post > 0) { + sdp_iocb_complete(iocb, 0); + result = -EAGAIN; + } else { + sdp_iocb_destroy(iocb); + + /* + * completion reference + */ + if (!iocb->wait) + aio_put_req(iocb->req); + else { + unsigned long flags; + spin_lock_irqsave(&iocb->wait->lock, flags); + --iocb->wait->outstanding; + /* No need to wake up, + since we call sdp_req_busy + directly below */ + + spin_unlock_irqrestore(&iocb->wait->lock, flags); + } + } + } + } + + /* + * check the sink queue, not much to do, since the operation is + * already in flight. + */ + sdp_desc_q_lookup(&conn->w_snk, sdp_write_src_cancel, req); + + iocb = (struct sdpc_iocb *)sdp_desc_q_lookup(&conn->w_snk, + sdp_iocb_find_req, + req); + if (iocb) { + sdp_dbg_ctrl(conn, "Sink Queue busy\n"); + result = -EAGAIN; + } + + /* + * check source queue. If we're in the source queue, then a cancel + * needs to be issued. + */ + sdp_iocb_q_mark_cancel(&conn->w_src, req); + + iocb = sdp_iocb_q_lookup_req(&conn->w_src, req); + if (iocb) { + sdp_dbg_ctrl(conn, "Sending Src Cancel\n"); + + if (! (conn->flags & SDP_CONN_F_SRC_CANCEL_L)) { + sdp_desc_q_lookup(&conn->w_snk, sdp_write_src_cancel, req); + conn->flags |= SDP_CONN_F_SRC_CANCEL_L; + result = sdp_send_ctrl_src_cancel(conn); + SDP_EXPECT(result >= 0); + } + + result = -EAGAIN; + } + + if (!result) { + /* + * no IOCB found. Assume the IOCB will be completed. + */ + sdp_dbg_ctrl(conn, "Cancel IOCB done. 
<%d:%d:%08lx>", + req->ki_users, req->ki_key, req->ki_flags); + } + + sdp_conn_unlock(conn); + + return sdp_req_busy(conn, wait); +} +#endif + +/* * sdp_send_flush_advt - Flush passive sink advertisments */ static int sdp_send_flush_advt(struct sdp_sock *conn) @@ -1987,7 +2134,7 @@ return timeout; } -static inline int sdp_queue_iocb(struct kiocb *req, struct sdp_sock *conn, +static inline int sdp_queue_aio(struct kiocb *req, struct sdp_sock *conn, struct msghdr *msg, size_t size, size_t *copied) { @@ -2038,14 +2185,79 @@ return -EIOCBQUEUED; } +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY +static inline int sdp_queue_sync(struct kiocb *req, struct sdp_sock *conn, + struct msghdr *msg, size_t size, + size_t *copied, + struct sdpc_iocb_wait *wait) +{ + struct sdpc_iocb *iocb; + struct iovec *msg_iov; + unsigned long flags; + size_t len; + int result; + /* + * create IOCB with remaining space + */ + iocb = sdp_iocb_create(); + if (!iocb) { + sdp_dbg_warn(conn, "Failed to allocate IOCB <%Zu:%ld>", + size, (long)*copied); + return -ENOMEM; + } + + for (msg_iov = msg->msg_iov; !msg->msg_iov->iov_len; ++msg_iov); + + /* FMR alignment can add an extra page. 
*/ + len = min(msg_iov->iov_len, (size_t)SDP_IOCB_SIZE_MAX - 4096); + iocb->len = len; + iocb->post = 0; + iocb->size = len; + iocb->req = req; + iocb->key = req->ki_key; + iocb->addr = (unsigned long)msg_iov->iov_base; + iocb->wait = wait; + + result = sdp_iocb_lock(iocb); + if (result < 0) { + sdp_dbg_warn(conn, "Error <%d> locking IOCB <%Zu:%ld>", + result, size, (long)copied); + + sdp_iocb_destroy(iocb); + return result; + } + + SDP_CONN_STAT_WQ_INC(conn, iocb->size); + + result = sdp_send_data_queue(conn, (struct sdpc_desc *)iocb); + if (result < 0) { + sdp_dbg_warn(conn, "Error <%d> queueing write IOCB", result); + sdp_iocb_destroy(iocb); + return result; + } + + spin_lock_irqsave(&wait->lock, flags); + ++wait->outstanding; + spin_unlock_irqrestore(&wait->lock, flags); + + conn->send_pipe += len; + *copied += len; /* copied amount was saved in IOCB. */ + msg_iov->iov_len -= len; + msg_iov->iov_base += len; + return 0; +} +#endif /* * sdp_inet_send - send data from user space to the network */ int sdp_inet_send(struct kiocb *req, struct socket *sock, struct msghdr *msg, size_t size) { - struct sock *sk; - struct sdp_sock *conn; +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + struct sdpc_iocb_wait wait; +#endif + struct sock *sk; + struct sdp_sock *conn; int result = 0; size_t copied = 0; int oob, zcopy; @@ -2074,6 +2286,7 @@ if (conn->state == SDP_CONN_ST_LISTEN || conn->state == SDP_CONN_ST_CLOSED) { result = -ENOTCONN; + sdp_conn_unlock(conn); goto done; } /* @@ -2082,13 +2295,24 @@ * they are smaller then the zopy threshold, but only if there is * no buffer write space. */ - zcopy = (size >= conn->src_zthresh && !is_sync_kiocb(req)); + zcopy = (size >= conn->src_zthresh && (!is_sync_kiocb(req) +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + || (!(msg->msg_flags & MSG_DONTWAIT) && !oob) +#endif + )); /* * clear ASYN space bit, it'll be reset if there is no space. 
*/ if (!zcopy) clear_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags); +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + else if (is_sync_kiocb(req)) { + init_waitqueue_head(&wait.wait); + spin_lock_init(&wait.lock); + wait.outstanding = 0; + } +#endif /* * process data first if window is open, next check conditions, then * wait if there is more work to be done. The absolute window size is @@ -2143,14 +2367,45 @@ * completion. Wait on sync IO call create IOCB for async * call. */ +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + if (is_sync_kiocb(req) && zcopy) + result = sdp_queue_sync(req, conn, msg, size, &copied, + &wait); + /* TODO: limit the # of outstanding reqs */ + /* TODO: sleep on recoverable errors */ + else +#endif if (is_sync_kiocb(req)) timeout = sdp_wait_till_space(sk, conn, oob, timeout); else - result = sdp_queue_iocb(req, conn, msg, size, &copied); + result = sdp_queue_aio(req, conn, msg, size, &copied); } + sdp_conn_unlock(conn); + +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + if (!result && is_sync_kiocb(req) && zcopy) { + timeout = wait_event_interruptible_timeout(wait.wait, + !sdp_req_busy(conn, &wait), timeout); + if (!timeout) + result = -EAGAIN; + } + + if (signal_pending(current) && is_sync_kiocb(req) && zcopy) { + result = (timeout > 0) ? sock_intr_errno(timeout) : -EAGAIN; + + timeout = wait_event_timeout(wait.wait, + !sdp_write_cancel(req, conn, &wait), + SDP_ZCOPY_CANCEL_TIMEOUT); + if (!timeout) { + sdp_warn("sdp_write_cancel timed out. Abort.\n"); + sdp_conn_lock(conn); + sdp_conn_abort(conn); + sdp_conn_unlock(conn); + } + } +#endif done: - sdp_conn_unlock(conn); result = ((copied > 0) ? 
copied : result); if (result == -EPIPE && !(msg->msg_flags & MSG_NOSIGNAL)) Index: drivers/infiniband/ulp/sdp/sdp_conn.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_conn.c (revision 3958) +++ drivers/infiniband/ulp/sdp/sdp_conn.c (working copy) @@ -1279,7 +1279,15 @@ * connection lock */ sdp_conn_lock_init(conn); + +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY /* + * Tasks to wake up when we finish all src avail + */ + INIT_LIST_HEAD(&conn->src_wait_list); + +#endif + /* * insert connection into lookup table */ result = sdp_conn_table_insert(conn); Index: drivers/infiniband/ulp/sdp/sdp_recv.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_recv.c (revision 3958) +++ drivers/infiniband/ulp/sdp/sdp_recv.c (working copy) @@ -327,6 +327,10 @@ iocb = sdp_iocb_q_look(&conn->r_pend); if (!iocb) return ENODEV; +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + if (iocb->flags & SDP_IOCB_F_WAITALL) + return ENODEV; +#endif /* * check zcopy threshold */ @@ -708,6 +712,9 @@ */ if (!iocb->len || (!conn->src_recv && +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + !(iocb->flags & SDP_IOCB_F_WAITALL) && +#endif !(sk_sdp(conn)->sk_rcvlowat > iocb->post))) { /* * complete IOCB @@ -1055,7 +1062,178 @@ return result; } +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY /* + * sdp_read_cancel - cancel a synchronous IO operation + */ +static int sdp_read_cancel(struct kiocb *req, struct sdp_sock *conn, + struct sdpc_iocb_wait *wait, size_t *copied) +{ + struct sdpc_iocb *iocb; + int result = 0; + + sdp_dbg_ctrl(NULL, "Cancel Read IOCB. user <%d> key <%d> flag <%08lx>", + req->ki_users, req->ki_key, req->ki_flags); + + sdp_dbg_ctrl(conn, "Cancel Read IOCB. <%08x:%04x> <%08x:%04x>", + conn->src_addr, conn->src_port, + conn->dst_addr, conn->dst_port); + /* + * attempt to find the IOCB for this key. we don't have an indication + * whether this is a read or write. 
+ */ + while ((iocb = sdp_iocb_q_lookup_req(&conn->r_pend, req))) { + /* + * always remove the IOCB. If active, then place it into + * the correct active queue. Inactive empty IOCBs can be + * deleted, while inactive partials needs to be compelted. + */ + sdp_iocb_q_remove(iocb); + + if (!(iocb->flags & SDP_IOCB_F_ACTIVE)) { + *copied -= iocb->len; + if (iocb->post > 0) { + /* + * callback to complete IOCB, or drop reference + */ + sdp_iocb_complete(iocb, 0); + result = -EAGAIN; + } + else { + sdp_iocb_destroy(iocb); + /* + * completion reference + */ + if (iocb->wait) { + unsigned long flags; + spin_lock_irqsave(&iocb->wait->lock, flags); + if (!--iocb->wait->outstanding) { + wake_up(&iocb->wait->wait); + } + spin_unlock_irqrestore(&iocb->wait->lock, flags); + } else + aio_put_req(req); + + result = 0; + } + + goto out; + } + + if (iocb->flags & SDP_IOCB_F_RDMA_W) + sdp_iocb_q_put_tail(&conn->r_snk, iocb); + else { + SDP_EXPECT((iocb->flags & SDP_IOCB_F_RDMA_R)); + + sdp_desc_q_put_tail(&conn->r_src, + (struct sdpc_desc *)iocb); + } + } + /* + * check the source queue, not much to do, since the operation is + * already in flight. + */ + iocb = (struct sdpc_iocb *)sdp_desc_q_lookup(&conn->r_src, + sdp_iocb_find_req, req); + if (iocb) { + iocb->flags |= SDP_IOCB_F_CANCEL; + result = -EAGAIN; + + goto out; + } + /* + * check sink queue. If we're in the sink queue, then a cancel + * needs to be issued. + */ + iocb = sdp_iocb_q_lookup_req(&conn->r_snk, req); + if (iocb) { + /* + * Unfortunetly there is only a course grain cancel in SDP, so + * we have to cancel everything. + */ + if (!(conn->flags & SDP_CONN_F_SNK_CANCEL)) { + + result = sdp_send_ctrl_snk_cancel(conn); + SDP_EXPECT(result >= 0); + + conn->flags |= SDP_CONN_F_SNK_CANCEL; + } + + iocb->flags |= SDP_IOCB_F_CANCEL; + result = -EAGAIN; + + goto out; + } + /* + * no IOCB found. The cancel is probably in a race with a completion. + * Assume the IOCB will be completed, return appropriate value. 
+ */ + sdp_dbg_ctrl(NULL, "Cancel read with no IOCB. <%d:%d:%08lx>", + req->ki_users, req->ki_key, req->ki_flags); + + result = -EAGAIN; + +out: + return result; +} + +static int sdp_req_busy(struct kiocb *req, struct sdp_sock *conn, + struct sdpc_iocb_wait *wait, int waitall, + size_t *copied) +{ + struct sdpc_iocb *iocb; + unsigned long flags; + int result = -EAGAIN; + + for (;;) { + spin_lock_irqsave(&wait->lock, flags); + iocb = sdp_iocb_q_get_head(&wait->q); + if (!iocb) + break; + --wait->outstanding; + spin_unlock_irqrestore(&wait->lock, flags); + + sdp_iocb_release(iocb); + sdp_iocb_unlock(iocb); + sdp_iocb_destroy(iocb); + } + + if (!wait->outstanding) + result = 0; + + spin_unlock_irqrestore(&wait->lock, flags); + + sdp_conn_lock(conn); + + /* If WAITALL is clear, and there are no more src_pend, + remove all pending iocbs */ + if (!waitall && !sdp_advt_q_size(&conn->src_pend)) { + sdp_read_cancel(req, conn, wait, copied); + result = 0; + } + + if (!result) + list_del_init(&wait->src_wait_list); + + sdp_conn_unlock(conn); + + return result; +} +/* + * sdp_inet_read_cancel - cancel an IO operation + */ +static int sdp_cancel_read(struct kiocb *req, struct sdp_sock *conn, + struct sdpc_iocb_wait *wait, size_t *copied) +{ + sdp_conn_lock(conn); + sdp_read_cancel(req, conn, wait, copied); + sdp_conn_unlock(conn); + + return sdp_req_busy(req, conn, wait, 1, copied); +} +#endif + +/* * sdp_inet_recv - recv data from the network to user space */ int sdp_inet_recv(struct kiocb *req, struct socket *sock, struct msghdr *msg, @@ -1065,17 +1243,22 @@ struct sdp_sock *conn; struct sdpc_iocb *iocb; struct sdpc_buff *buff; - long timeout; + long timeout = 0 /*Turn off compiler warning */; size_t length; int result = 0; int expect; int low_water; - int copied = 0; + size_t copied = 0; int copy; int update; s8 oob = 0; s8 ack = 0; struct sdpc_buff_q peek_queue; +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + int zcopy = 0; + struct sdpc_iocb_wait wait; + unsigned long f; 
+#endif sk = sock->sk; conn = sdp_sk(sk); @@ -1293,6 +1476,76 @@ /* * Either wait or create IOCB for defered completion. */ +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + if (is_sync_kiocb(req) && !(flags & MSG_PEEK) && + (zcopy || size - copied >= conn->snk_zthresh) && + (conn->src_recv || + (low_water - copied >= conn->snk_zthresh))) { + struct iovec *msg_iov; + size_t len; + /* + * create IOCB with remaining space + */ + iocb = sdp_iocb_create(); + if (!iocb) { + sdp_dbg_warn(conn, + "Error allocating IOCB <%Zu:%Zd>", + size, copied); + result = -ENOMEM; + break; + } + + for (msg_iov = msg->msg_iov; !msg->msg_iov->iov_len; ++msg_iov); + + /* FMR alignment can add an extra page. */ + len = min(msg_iov->iov_len, (size_t)SDP_IOCB_SIZE_MAX - 4096); + iocb->len = len; + iocb->post = 0; + iocb->size = len; + iocb->req = req; + iocb->key = req->ki_key; + iocb->addr = (unsigned long)msg_iov->iov_base; + iocb->wait = &wait; + + iocb->flags |= SDP_IOCB_F_RECV | SDP_IOCB_F_WAITALL; + + req->ki_cancel = sdp_inet_read_cancel; + + result = sdp_iocb_lock(iocb); + if (result < 0) { + sdp_dbg_warn(conn, + "Error <%d> IOCB lock <%Zu:%Zd>", + result, size, copied); + + sdp_iocb_destroy(iocb); + break; + } + + SDP_CONN_STAT_RQ_INC(conn, iocb->size); + + if (!zcopy) { + init_waitqueue_head(&wait.wait); + INIT_LIST_HEAD(&wait.src_wait_list); + spin_lock_init(&wait.lock); + sdp_iocb_q_init(&wait.q); + wait.outstanding = 0; + zcopy = 1; + } + + sdp_iocb_q_put_tail(&conn->r_pend, iocb); + + spin_lock_irqsave(&wait.lock, f); + ++wait.outstanding; + spin_unlock_irqrestore(&wait.lock, f); + + /* TODO: set it? 
*/ + ack = 1; + copied += len; + msg_iov->iov_len -= len; + msg_iov->iov_base += len; + break; + } else +#endif if (is_sync_kiocb(req)) { DECLARE_WAITQUEUE(wait, current); @@ -1325,7 +1578,7 @@ iocb = sdp_iocb_create(); if (!iocb) { sdp_dbg_warn(conn, - "Error allocating IOCB <%Zu:%d>", + "Error allocating IOCB <%Zu:%Zd>", size, copied); result = -ENOMEM; break; @@ -1346,7 +1599,7 @@ result = sdp_iocb_lock(iocb); if (result < 0) { sdp_dbg_warn(conn, - "Error <%d> IOCB lock <%Zu:%d>", + "Error <%d> IOCB lock <%Zu:%Zd>", result, size, copied); sdp_iocb_destroy(iocb); @@ -1382,6 +1635,43 @@ while ((buff = sdp_buff_q_get_tail(&peek_queue))) sdp_buff_q_put_head(&conn->recv_pool, buff); +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + /* If WAITALL is clear, wake up also when we run out of src avail */ + if (!result && is_sync_kiocb(req) && zcopy && !(flags & MSG_WAITALL)) { + list_add_tail(&conn->src_wait_list, &wait.src_wait_list); + } +#endif sdp_conn_unlock(conn); +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + if (!result && is_sync_kiocb(req) && zcopy) { + timeout = wait_event_interruptible_timeout(wait.wait, + !sdp_req_busy(req, conn, &wait, + (flags & MSG_WAITALL), &copied), + timeout); + if (!timeout) { + result = -EAGAIN; + if (!(flags & MSG_WAITALL)) { + sdp_conn_lock(conn); + list_del_init(&wait.src_wait_list); + sdp_conn_unlock(conn); + } + } + } + + if (signal_pending(current) && is_sync_kiocb(req) && zcopy) { + result = (timeout > 0) ? sock_intr_errno(timeout) : -EAGAIN; + + timeout = wait_event_timeout(wait.wait, + !sdp_cancel_read(req, conn, &wait, &copied), + SDP_ZCOPY_CANCEL_TIMEOUT); + if (!timeout) { + sdp_warn("sdp_read_cancel timed out. Abort.\n"); + sdp_conn_lock(conn); + sdp_conn_abort(conn); + sdp_conn_unlock(conn); + } + } +#endif + return ((copied > 0) ? 
copied : result); } Index: drivers/infiniband/ulp/sdp/sdp_conn.h =================================================================== --- drivers/infiniband/ulp/sdp/sdp_conn.h (revision 3958) +++ drivers/infiniband/ulp/sdp/sdp_conn.h (working copy) @@ -377,6 +377,9 @@ #ifdef _SDP_CONN_STATE_REC struct sdp_conn_state state_rec; #endif +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + struct list_head src_wait_list; +#endif }; #define SDP_WRAP_GT(x, y) ((signed int)((x) - (y)) > 0) Index: drivers/infiniband/ulp/sdp/sdp_iocb.c =================================================================== --- drivers/infiniband/ulp/sdp/sdp_iocb.c (revision 3958) +++ drivers/infiniband/ulp/sdp/sdp_iocb.c (working copy) @@ -307,12 +307,23 @@ sdp_dbg_data(NULL, "IOCB complete. <%d:%d:%08lx> value <%ld>", iocb->req->ki_users, iocb->req->ki_key, iocb->req->ki_flags, value); + +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + if (iocb->wait) { + unsigned long flags; + spin_lock_irqsave(&iocb->wait->lock, flags); + if (!--iocb->wait->outstanding) { + wake_up(&iocb->wait->wait); + } + spin_unlock_irqrestore(&iocb->wait->lock, flags); + } else +#endif + /* + * valid result can be 0 or 1 for complete so + * we ignore the value. + */ + (void)aio_complete(iocb->req, value, 0); /* - * valid result can be 0 or 1 for complete so - * we ignore the value. 
- */ - (void)aio_complete(iocb->req, value, 0); - /* * delete IOCB */ sdp_iocb_destroy(iocb); @@ -325,7 +336,17 @@ { iocb->status = status; - if (in_atomic() || irqs_disabled()) { +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + if ((iocb->flags & SDP_IOCB_F_RECV) && iocb->wait) { + unsigned long flags; + spin_lock_irqsave(&iocb->wait->lock, flags); + sdp_iocb_q_put_tail(&iocb->wait->q, iocb); + wake_up(&iocb->wait->wait); + spin_unlock_irqrestore(&iocb->wait->lock, flags); + } else +#endif + if ((iocb->flags & SDP_IOCB_F_RECV) && + (in_atomic() || irqs_disabled())) { INIT_WORK(&iocb->completion, do_iocb_complete, (void *)iocb); schedule_work(&iocb->completion); } else @@ -382,6 +403,43 @@ return NULL; } +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY +struct sdpc_iocb *sdp_iocb_q_lookup_req(struct sdpc_iocb_q *table, struct kiocb *req) +{ + struct sdpc_iocb *iocb = NULL; + int counter; + + for (counter = 0, iocb = table->head; counter < table->size; + counter++, iocb = iocb->next) + if (iocb->req == req) + return iocb; + + return NULL; +} + +void sdp_iocb_q_mark_cancel(struct sdpc_iocb_q *table, struct kiocb *req) +{ + struct sdpc_iocb *iocb = NULL; + int counter; + + for (counter = 0, iocb = table->head; counter < table->size; + counter++, iocb = iocb->next) + if (iocb->req == req) + iocb->flags |= SDP_IOCB_F_CANCEL; + +} + +int sdp_iocb_find_req(struct sdpc_desc *element, void *arg) +{ + struct sdpc_iocb *iocb = (struct sdpc_iocb *) element; + struct kiocb *req = (struct kiocb *)arg; + + if (element->type == SDP_DESC_TYPE_IOCB && iocb->req == req) + return 0; + return -ERANGE; +} +#endif + /* * sdp_iocb_create - create an IOCB object */ Index: drivers/infiniband/ulp/sdp/sdp_iocb.h =================================================================== --- drivers/infiniband/ulp/sdp/sdp_iocb.h (revision 3958) +++ drivers/infiniband/ulp/sdp/sdp_iocb.h (working copy) @@ -55,6 +55,9 @@ #define SDP_IOCB_F_LOCKED 0x00000040 /* IOCB is locked in memory */ #define SDP_IOCB_F_REG 
0x00000080 /* IOCB memory is registered */ #define SDP_IOCB_F_RECV 0x00000100 /* IOCB is for a receive request */ +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY +#define SDP_IOCB_F_WAITALL 0x00000200 /* IOCB is for WAITALL request */ +#endif #define SDP_IOCB_F_ALL 0xFFFFFFFF /* IOCB all mask */ /* * zcopy constants. @@ -66,10 +69,12 @@ */ #define sdp_iocb_q_size(table) ((table)->size) +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY +struct sdpc_iocb_wait; +#endif /* * INET read/write IOCBs */ - /* * save a kvec read/write for processing once data shows up. */ @@ -80,7 +85,7 @@ struct sdpc_iocb_q *table; /* table to which this iocb belongs */ void (*release)(struct sdpc_iocb *iocb); /* release the object */ /* - * iocb sepcific + * iocb specific */ int flags; /* usage flags */ /* @@ -112,6 +117,9 @@ int page_offset; /* offset into first page. */ struct work_struct completion; /* task for defered completion. */ +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY + struct sdpc_iocb_wait *wait; +#endif /* * kernel iocb structure */ @@ -127,4 +135,26 @@ int size; /* current number of IOCBs in table */ }; +#ifdef CONFIG_INFINIBAND_SDP_SEND_ZCOPY +/* Report completions here */ +struct sdpc_iocb_wait { + spinlock_t lock; + int outstanding; + wait_queue_head_t wait; +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY + struct sdpc_iocb_q q; /* Receive iocbs only */ + struct list_head src_wait_list; /* Receive only */ +#endif +}; + +#ifdef CONFIG_INFINIBAND_SDP_RECV_ZCOPY +static inline void sdp_iocb_wake(struct list_head *head) +{ + struct sdpc_iocb_wait *wait; + list_for_each_entry(wait, head, src_wait_list) + wake_up(&wait->wait); +} +#endif + +#endif #endif /* _SDP_IOCB_H */ -- MST From tom at opengridcomputing.com Fri Nov 4 09:02:41 2005 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 04 Nov 2005 11:02:41 -0600 Subject: [openib-general][PATCH] local device search with source address wildcard Message-ID: <1131123761.3839.14.camel@trinity.austin.ammasso.com> Sean: I was looking through 
ip_resolve_local and it looks to me like
if the source address is 0, it will end up getting set to the
destination IP instead of the IP address of the local interface.

Also if ip_dev_find can't find a local interface with connectivity
to the specified remote peer, shouldn't the error be EHOSTUNREACH?

Finally, if the user specifies a bogus source address, we don't
compare it against the source address configured on the local
interface found in the route. It will probably still fail later,
but in some bizarre fashion.

Here's a patch to show you what I mean. BTW, I think this brings up
another issue: which locally configured IP address do we use if more
than one is configured on the device (aliasing)? This patch just
arbitrarily uses the first one. You could look for a keyword in the
ifname, for example eth0:rnic0 or something.

I only compiled this in my branch and did not test it. It is just a
conversation piece at this point.

Signed-off-by: Tom Tucker

Index: addr.c
===================================================================
--- addr.c	(revision 3860)
+++ addr.c	(working copy)
@@ -216,17 +216,20 @@
 			      struct ib_addr *addr)
 {
 	struct net_device *dev;
+	struct in_device* indev;
 	u32 src_ip = src_in->sin_addr.s_addr;
 	u32 dst_ip = dst_in->sin_addr.s_addr;
 	int ret = 0;
 
 	dev = ip_dev_find(dst_ip);
 	if (!dev)
-		return -EADDRNOTAVAIL;
+		return -EHOSTUNREACH;
 
+	indev = __in_dev_get(dev);
+
 	if (!src_ip) {
-		src_in->sin_family = dst_in->sin_family;
-		src_in->sin_addr.s_addr = dst_ip;
+		src_in->sin_family = AF_INET;
+		src_in->sin_addr.s_addr = indev->ifa_list->ifa_address;
 		addr->sgid = *(union ib_gid *) (dev->dev_addr + 4);
 		addr->pkey = addr_get_pkey(dev);
 	} else {
@@ -234,6 +237,11 @@
 					&addr->sgid, &addr->pkey);
 		if (ret)
 			goto out;
+
+		if (src_in->sin_addr.s_addr != indev->ifa_list->ifa_address) {
+			ret = -EINVAL;
+			goto out;
+		}
 	}
 
 	addr->dgid = *(union ib_gid *) (dev->dev_addr + 4);

From iod00d at hp.com Fri Nov 4 08:32:51 2005
From: iod00d at hp.com (Grant Grundler)
Date: Fri, 4 Nov 2005 08:32:51 -0800 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB In-Reply-To: <96f8e60e0511031701i7b9ce5a0gbdade306735695e6@mail.gmail.com> References: <5D78D28F88822E4D8702BB9EEF1A4367C2DD62@mercury.infiniconsys.com> <001701c5e0c7$37d1b090$6401a8c0@YOURA11C73D0FD> <20051104002101.GC1478@esmail.cup.hp.com> <96f8e60e0511031701i7b9ce5a0gbdade306735695e6@mail.gmail.com> Message-ID: <20051104163251.GA4463@esmail.cup.hp.com> On Thu, Nov 03, 2005 at 05:01:59PM -0800, Ranjit Pandit wrote: > I will go ahead and post the Makefile...but it's currently specific to > SST Access layer. I wouldn't bother. > > 1) Port contrib/silverstorm/rds/ to linux-kernel/infiniband/ulp/rds/ > > Grant, at the last OpenIB conference, you had volunteered to help port > the code or, at the coding style. :) Yes, I offered to help port. But I can't do the "heavy lifting". I can test, code review, and fix up coding style nits. > > 2) include some docs on it's use and why RDS is better than SDP. > > I will checkin the RDS presentations shortly. That would be good. Just as a reminder, your slideset for the OpenIB "Data Center Workshop" is posted here: http://openib.org/docs/oib_wkshp_082205/Reliable_Datagram_Sockets.ppt thanks, grant From robert.j.woodruff at intel.com Fri Nov 4 08:35:10 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Fri, 4 Nov 2005 08:35:10 -0800 Subject: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB In-Reply-To: <010201c5e0db$f9ed3820$6401a8c0@YOURA11C73D0FD> Message-ID: Rick wrote, >I've atttached a draft proposal for RDS from Oracle which discusses some of >the motivation for RDS. I assume that you have a driver that uses TCP sockets, Correct ? If so, have you compared the performance of RDS to SDP ? 
woody

From panda at cse.ohio-state.edu Fri Nov 4 09:00:34 2005
From: panda at cse.ohio-state.edu (Dhabaleswar Panda)
Date: Fri, 4 Nov 2005 12:00:34 -0500 (EST)
Subject: [openib-general] mvapich-gen2 on 2 x 16 CPU SGI Altix 1330 cluster
In-Reply-To: <436A4B37.1080801@sgi.com> from "John Partridge" at Nov 03, 2005 11:39:03 AM
Message-ID: <200511041700.jA4H0YKp025723@xi.cse.ohio-state.edu>

Hi John,

> I just thought you would like to know that I have now tested the
> Pallas benchmark on a two node SGI Altix 1330 cluster using OpenIB
> and mvapich-gen2. Each node had 16 CPU's. To do this I had to change
> SMPI_MAX_NUMLOCALNODES to be defined as 16 instead of the normal 4
> for the test. I ran a 2x16 (32 total) CPU Pallas benchmark several
> times with no hang ups or errors.

Very glad to know that you are able to run Pallas with the above
parameter change. We added that parameter with such multi-way SMP
systems (Altix) in mind.

> I'm wondering if there would be any more changes I would need to
> make for scaling to much larger systems. I do plan at some point in
> the near future to test this on a much larger system with a LOT more
> CPU's

We have some ideas on scaling mvapich on multi-way SMP systems (like
Altix). Unfortunately, we have neither information on the details of
the memory hierarchy/organization of these systems nor access to such
systems to try out these ideas.

> The test was conducted using a "kernel.org" 2.6.14 kernel and an
> OpenIB svn gen2 release of 3926 using Voltaire HCA's and switch
>
> We will be demonstrating OpenIB and mvapich-gen2 mpi at
> Supercomputing 05 (running smaller jobs though because the 32 way
> jobs take so long to complete). We will also demo rdma_lat, rdma_bw
> and IPoIB.

Good to know that you will be having the demo at SC '05. I will stop
by your booth. If you have some time, we can discuss the memory
hierarchy/organization of such systems and how to optimize mvapich
on them.

> I can send you the pallas results if you are interested. Yes, please send it to me. We will be happy to look at the results. Best Regards, DK > Regards > John > > -- > John Partridge > > Silicon Graphics Inc > Tel: 651-683-3428 > Vnet: 233-3428 > E-Mail: johnip at sgi.com > From mshefty at ichips.intel.com Fri Nov 4 09:36:16 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 04 Nov 2005 09:36:16 -0800 Subject: [openib-general][PATCH] local device search with source address wildcard In-Reply-To: <1131123761.3839.14.camel@trinity.austin.ammasso.com> References: <1131123761.3839.14.camel@trinity.austin.ammasso.com> Message-ID: <436B9C10.7050301@ichips.intel.com> Tom Tucker wrote: > Sean: > > I was looking through ip_resolve_local and it looks to me like > if the source address is 0, it will end up getting set to the > destination IP instead of the IP address of the local interface. The intent of ip_resolve_local() is to check if a given destination address is on the local system. If it is and no source address is specified, then the source address is set to the same address as the destination. > Also if ip_dev_find can't find a local interface with connectivity > to the specified remote peer, shouldn't the error be EHOSTUNREACH? If the address is not a local address, then a check is made to find a route to that address assuming that it exists somewhere remotely. See ib_resolve_addr() which calls ip_resolve_local() and ip_resolve_remote(). So, the return code from ip_resolve_local() returns that the given address is not available on the local system. The address may still be reachable as a remote address. > Finally, if the user specifies a bogus source address, we don't > compare it against the source address configured on the local > interface found in the route. It will probably still fail later, > but in some bizarre fashion. If the source address is bogus, then the call to ib_translate_addr() will fail with EADDRNOTAVAIL. 
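
As an illustration of the order described above (local check first, then the remote lookup), here is a minimal user-space sketch. It is not the addr.c code: the address table and the names resolve_local_sketch(), resolve_remote_sketch() and resolve_addr_sketch() are hypothetical, and the remote step simply assumes that every non-local destination routes through the first local address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical addresses configured on this host (10.0.0.1, 10.0.0.2). */
static const uint32_t local_addrs[] = { 0x0a000001u, 0x0a000002u };

static int is_local(uint32_t ip)
{
	size_t i;

	for (i = 0; i < sizeof(local_addrs) / sizeof(local_addrs[0]); i++)
		if (local_addrs[i] == ip)
			return 1;
	return 0;
}

/* Local step: succeeds only if dst is one of this host's addresses. */
static int resolve_local_sketch(uint32_t *src, uint32_t dst)
{
	if (!is_local(dst))
		return -EADDRNOTAVAIL;	/* not local, may still be remote */
	if (*src == 0)
		*src = dst;		/* loopback: wildcard source takes dst */
	else if (!is_local(*src))
		return -EADDRNOTAVAIL;	/* bogus source address */
	return 0;
}

/* Remote step: stand-in for a route lookup (an assumption of this sketch). */
static int resolve_remote_sketch(uint32_t *src, uint32_t dst)
{
	(void)dst;
	if (*src == 0)
		*src = local_addrs[0];
	else if (!is_local(*src))
		return -EADDRNOTAVAIL;
	return 0;
}

/* Try the local step first; fall back to the remote step. */
int resolve_addr_sketch(uint32_t *src, uint32_t dst)
{
	int ret;

	assert(src != NULL);
	ret = resolve_local_sketch(src, dst);
	if (ret == -EADDRNOTAVAIL)
		ret = resolve_remote_sketch(src, dst);
	return ret;
}
```

In this sketch a wildcard (zero) source ends up equal to the destination only when the destination itself is a local address, which is the loopback case discussed in this thread.
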
I'm not tied to the return codes, so if one of them works better than the other, I can change it. - Sean From iod00d at hp.com Fri Nov 4 10:19:52 2005 From: iod00d at hp.com (Grant Grundler) Date: Fri, 4 Nov 2005 10:19:52 -0800 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB In-Reply-To: <010201c5e0db$f9ed3820$6401a8c0@YOURA11C73D0FD> References: <5D78D28F88822E4D8702BB9EEF1A4367C2DD62@mercury.infiniconsys.com> <001701c5e0c7$37d1b090$6401a8c0@YOURA11C73D0FD> <20051104002101.GC1478@esmail.cup.hp.com> <96f8e60e0511031701i7b9ce5a0gbdade306735695e6@mail.gmail.com> <010201c5e0db$f9ed3820$6401a8c0@YOURA11C73D0FD> Message-ID: <20051104181952.GB4463@esmail.cup.hp.com> On Thu, Nov 03, 2005 at 08:06:21PM -0500, Rick Frank wrote: > I've atttached a draft proposal for RDS from Oracle which discusses some of > the motivation for RDS. Thanks! Some questions/comments... o What is "GE" acronym for? o I'm seeing about 1/5th CPU load for SDP (vs IPoIB). The "50% less" number doesn't seem that impressive for RDP (vs IPoIB). Maybe this is a difference in the benchmark (I'm running netperf). o RDP wants to provide AF_INET_OFFLOAD. This doesn't exist in my source tree. I don't know who assigns these but it isn't lanana.org. Oracle would be wise to stick with what's in include/linux/sockets.h in order to avoid long term maintenance issues. ISTR OpenIB got flamed for wanting to use AF_INET_OFFLOAD name. If RDP is accepted, I would expect RDP to get AF_INET_RDP. And then use "LD_PRELOAD" and clone libsdp.so to take over AF_INET. ie follow a similar trajectory that SDP had. o Is access control to the RDP protocol something that applies to all protocols? I'm looking item #2 of "Additional Features". o Doesn't SDP meet the following requirement as well? | A goal of RDP should be to support all existing socket | functionality relevant to UDP with no changes to any | existing socket application - other than specifying | AF_INET_OFFLOAD. 
However, an RDP aware socket application | can take advantage of the RDP features. o I'm struggling with the "RDP is connectionless" comments made earlier. Later in this proposal, "RDP Interface" says packets will be delivered "in order". Doesn't that conflict with "connectionless"? Does UDP guarantee order? o The "crossover" value for zero copy vs inlining data is chipset specific. Ie even within the same architecture, different combinations of CPUs and chipsets will give wide variance. Things like cache size, cache replacement algorithm, available memory bandwidth, memory latency, et al, affect the choice. This value is normally define by/for each architecture since that's practical and lets each arch decide what the right tradeoff is. o The comments in "Recv operations" talk about "backpressure". Is this another way of saying the driver should drop packets once the "fairness threshold" is exceeded? o Does detecting the "death of a remote node" still fall within the "connectionless" definition? o I didn't look through the "config" and "statistics". o "RDP Information" section reminds me of the previous email thread about "netstat" support. Those probably want to be aligned so Oracle can leverage the same command as other users. ie reduce long term maintenance. And while researching the above, I found some nits with SDP: o I was expecting AF_INET_SDP to be in 2.6.14 and it's not. I hope it's part of 2.6.15-rc*. o The ulp/sdp/Kconfig comments say "AF_INET_SDP (address family 26)". AF_LLC uses 26 and sdp_sock.h defines 27. Michael - need a patch or is this trivial enough to fix by hand? 
thanks,
grant

From Richard.Frank at oracle.com Fri Nov 4 10:22:41 2005
From: Richard.Frank at oracle.com (Rick Frank)
Date: Fri, 4 Nov 2005 13:22:41 -0500
Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB
References:
Message-ID: <001401c5e16c$eefe06b0$6401a8c0@YOURA11C73D0FD>

No, we do not use TCP sockets - we use too many connections for this (100k+).

----- Original Message -----
From: "Bob Woodruff"
To: "'Rick Frank'" ; "Ranjit Pandit" ; "Grant Grundler"
Cc:
Sent: Friday, November 04, 2005 11:35 AM
Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB

> Rick wrote,
>>I've attached a draft proposal for RDS from Oracle which discusses some of
>>the motivation for RDS.
>
> I assume that you have a driver that uses TCP sockets, Correct ?
> If so, have you compared the performance of RDS to SDP ?
>
> woody
>

From swise at opengridcomputing.com Fri Nov 4 10:22:34 2005
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 4 Nov 2005 12:22:34 -0600
Subject: [openib-general][PATCH] local device search with source address wildcard
References: <1131123761.3839.14.camel@trinity.austin.ammasso.com> <436B9C10.7050301@ichips.intel.com>
Message-ID: <00fb01c5e16c$ba78be80$d5000a0a@STEVO>

----- Original Message -----
From: "Sean Hefty"
To: "Tom Tucker"
Cc:
Sent: Friday, November 04, 2005 11:36 AM
Subject: Re: [openib-general][PATCH] local device search with source address wildcard

> Tom Tucker wrote:
>> Sean:
>>
>> I was looking through ip_resolve_local and it looks to me like
>> if the source address is 0, it will end up getting set to the
>> destination IP instead of the IP address of the local interface.
>
> The intent of ip_resolve_local() is to check if a given destination
> address is on the local system. If it is and no source address is
> specified, then the source address is set to the same address as the
> destination.
>

This doesn't sound correct to me.
The src ip address is supposed to be the local ip address to be used for
establishing the connection. If you set it to the destination address,
then you'd end up passing that address to the peer in the private data,
and that is incorrect...

>> Also if ip_dev_find can't find a local interface with connectivity to
>> the specified remote peer, shouldn't the error be EHOSTUNREACH?
>
> If the address is not a local address, then a check is made to find a
> route to that address assuming that it exists somewhere remotely. See
> ib_resolve_addr() which calls ip_resolve_local() and
> ip_resolve_remote(). So, the return code from ip_resolve_local()
> returns that the given address is not available on the local system.
> The address may still be reachable as a remote address.
>
>> Finally, if the user specifies a bogus source address, we don't
>> compare it against the source address configured on the local
>> interface found in the route. It will probably still fail later, but
>> in some bizarre fashion.
>
> If the source address is bogus, then the call to ib_translate_addr()
> will fail with EADDRNOTAVAIL.
>
> I'm not tied to the return codes, so if one of them works better than
> the other, I can change it.
>
> - Sean
>

From mshefty at ichips.intel.com Fri Nov 4 10:30:03 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 04 Nov 2005 10:30:03 -0800
Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB
In-Reply-To: <001401c5e16c$eefe06b0$6401a8c0@YOURA11C73D0FD>
References: <001401c5e16c$eefe06b0$6401a8c0@YOURA11C73D0FD>
Message-ID: <436BA8AB.1050602@ichips.intel.com>

Rick Frank wrote:
> No, we do not use TCP sockets - we use too many connections for this (100k+).

Isn't RDS implemented on top of reliable IB/RDMA connections anyway?

- Sean

From mshefty at ichips.intel.com Fri Nov 4 10:32:08 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 04 Nov 2005 10:32:08 -0800
Subject: [openib-general][PATCH] local device search with source address wildcard
In-Reply-To: <00fb01c5e16c$ba78be80$d5000a0a@STEVO>
References: <1131123761.3839.14.camel@trinity.austin.ammasso.com> <436B9C10.7050301@ichips.intel.com> <00fb01c5e16c$ba78be80$d5000a0a@STEVO>
Message-ID: <436BA928.9020403@ichips.intel.com>

Steve Wise wrote:
>> The intent of ip_resolve_local() is to check if a given destination
>> address is on the local system. If it is and no source address is
>> specified, then the source address is set to the same address as the
>> destination.
>
> This doesn't sound correct to me. The src ip address is supposed to be
> the local ip address to be used for establishing the connection. If you
> set it to the destination address, then you'd end up passing that
> address to the peer in the private data, and that is incorrect...

If ip_resolve_local() succeeds, then the destination address is a local address.

- Sean

From ftillier at silverstorm.com Fri Nov 4 10:33:19 2005
From: ftillier at silverstorm.com (Fab Tillier)
Date: Fri, 4 Nov 2005 10:33:19 -0800
Subject: [openib-general][PATCH] local device search with source address wildcard
In-Reply-To: <00fb01c5e16c$ba78be80$d5000a0a@STEVO>
Message-ID: <000801c5e16e$3b4a1490$9e5aa8c0@infiniconsys.com>

> From: Steve Wise [mailto:swise at opengridcomputing.com]
> Sent: Friday, November 04, 2005 10:23 AM
>
> ----- Original Message -----
> From: "Sean Hefty"
> To: "Tom Tucker"
> Cc:
>
> > Tom Tucker wrote:
> >> Sean:
> >>
> >> I was looking through ip_resolve_local and it looks to me like
> >> if the source address is 0, it will end up getting set to the
> >> destination IP instead of the IP address of the local interface.
> > > > The intent of ip_resolve_local() is to check if a given destination > > address is on the local system. If it is and no source address is > > specified, then the source address is set to the same address as the > > destination. > > > > This doesn't sound correct to me. The src ip address is supposed to be > the local ip address to be used for establishing the connection. If you > set it to the destination address, then you'd end up passing that > address to the peer in the private data, and that is incorrect... If the destination address is on the local system, then the user is establishing a loopback connection. I think that if the user didn't specify a source address, returning the same address as the destination should give the proper results. For loopback connections, source and destination can (and will likely) be the same. - Fab From swise at opengridcomputing.com Fri Nov 4 10:33:29 2005 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 4 Nov 2005 12:33:29 -0600 Subject: [openib-general][PATCH] local device search with sourceaddresswildcard References: <000801c5e16e$3b4a1490$9e5aa8c0@infiniconsys.com> Message-ID: <011e01c5e16e$409f21b0$d5000a0a@STEVO> i misunderstood. I didn't realize ip_resolve_local() will fail if the address it a remote address. nevermind... :-\ ----- Original Message ----- From: "Fab Tillier" To: "'Steve Wise'" ; "Sean Hefty" ; "Tom Tucker" Cc: Sent: Friday, November 04, 2005 12:33 PM Subject: RE: [openib-general][PATCH] local device search with sourceaddresswildcard > From: Steve Wise [mailto:swise at opengridcomputing.com] > Sent: Friday, November 04, 2005 10:23 AM > > ----- Original Message ----- > From: "Sean Hefty" > To: "Tom Tucker" > Cc: > > > Tom Tucker wrote: > >> Sean: > >> > >> I was looking through ip_resolve_local and it looks to me like > >> if the source address is 0, it will end up getting set to the > >> destination IP instead of the IP address of the local interface. 
> > > > The intent of ip_resolve_local() is to check if a given destination > > address is on the local system. If it is and no source address is > > specified, then the source address is set to the same address as the > > destination. > > > > This doesn't sound correct to me. The src ip address is supposed to > be > the local ip address to be used for establishing the connection. If > you > set it to the destination address, then you'd end up passing that > address to the peer in the private data, and that is incorrect... If the destination address is on the local system, then the user is establishing a loopback connection. I think that if the user didn't specify a source address, returning the same address as the destination should give the proper results. For loopback connections, source and destination can (and will likely) be the same. - Fab From tom at opengridcomputing.com Fri Nov 4 11:40:27 2005 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 04 Nov 2005 13:40:27 -0600 Subject: [openib-general][PATCH] local device search with sourceaddresswildcard In-Reply-To: <011e01c5e16e$409f21b0$d5000a0a@STEVO> References: <000801c5e16e$3b4a1490$9e5aa8c0@infiniconsys.com> <011e01c5e16e$409f21b0$d5000a0a@STEVO> Message-ID: <1131133227.3839.44.camel@trinity.austin.ammasso.com> Well.... It won't necessarily fail will it? If you specified the source address as another port on the same machine, but NOT the one with connectivity to the remote peer, the routine will succeed, but the results are not what you expect...and you will fail further down the line (looking up the path record). This is one of the "bizarre" failures I was originally referring to. By the way, the function name is addr_resolve_local, not ip_xxx ... sorry. On Fri, 2005-11-04 at 12:33 -0600, Steve Wise wrote: > i misunderstood. > > I didn't realize ip_resolve_local() will fail if the address it a remote > address. > > nevermind... 
> > :-\ > > > > ----- Original Message ----- > From: "Fab Tillier" > To: "'Steve Wise'" ; "Sean Hefty" > ; "Tom Tucker" > Cc: > Sent: Friday, November 04, 2005 12:33 PM > Subject: RE: [openib-general][PATCH] local device search with > sourceaddresswildcard > > > > From: Steve Wise [mailto:swise at opengridcomputing.com] > > Sent: Friday, November 04, 2005 10:23 AM > > > > ----- Original Message ----- > > From: "Sean Hefty" > > To: "Tom Tucker" > > Cc: > > > > > Tom Tucker wrote: > > >> Sean: > > >> > > >> I was looking through ip_resolve_local and it looks to me like > > >> if the source address is 0, it will end up getting set to the > > >> destination IP instead of the IP address of the local interface. > > > > > > The intent of ip_resolve_local() is to check if a given destination > > > address is on the local system. If it is and no source address is > > > specified, then the source address is set to the same address as the > > > destination. > > > > > > > This doesn't sound correct to me. The src ip address is supposed to > > be > > the local ip address to be used for establishing the connection. If > > you > > set it to the destination address, then you'd end up passing that > > address to the peer in the private data, and that is incorrect... > > If the destination address is on the local system, then the user is > establishing > a loopback connection. I think that if the user didn't specify a source > address, returning the same address as the destination should give the > proper > results. > > For loopback connections, source and destination can (and will likely) > be the > same. 
>
> - Fab

From ftillier at silverstorm.com Fri Nov 4 10:42:27 2005
From: ftillier at silverstorm.com (Fab Tillier)
Date: Fri, 4 Nov 2005 10:42:27 -0800
Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB
In-Reply-To: <436BA8AB.1050602@ichips.intel.com>
Message-ID: <000901c5e16f$827a26b0$9e5aa8c0@infiniconsys.com>

> From: Sean Hefty [mailto:mshefty at ichips.intel.com]
> Sent: Friday, November 04, 2005 10:30 AM
>
> Rick Frank wrote:
> > No, we do not use TCP sockets - we use too many connections for this (100k+).
>
> Isn't RDS implemented on top of reliable IB/RDMA connections anyway?

There is not a 1:1 relationship between a UDP application socket and an IB QP;
rather, there is a single IB connection between systems over which traffic from
multiple UDP sockets flows.

- Fab

From mshefty at ichips.intel.com Fri Nov 4 10:44:13 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 04 Nov 2005 10:44:13 -0800
Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable Datagram Sockets) to OpenIB
In-Reply-To: <000901c5e16f$827a26b0$9e5aa8c0@infiniconsys.com>
References: <000901c5e16f$827a26b0$9e5aa8c0@infiniconsys.com>
Message-ID: <436BABFD.2010704@ichips.intel.com>

Fab Tillier wrote:
> There is not a 1:1 relationship between a UDP application socket and an IB QP;
> rather, there is a single IB connection between systems over which traffic from
> multiple UDP sockets flows.

Sounds like software-based IB RDD.

- Sean

From mshefty at ichips.intel.com Fri Nov 4 10:54:26 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 04 Nov 2005 10:54:26 -0800
Subject: [openib-general][PATCH] local device search with source address wildcard
In-Reply-To: <1131133227.3839.44.camel@trinity.austin.ammasso.com>
References: <000801c5e16e$3b4a1490$9e5aa8c0@infiniconsys.com> <011e01c5e16e$409f21b0$d5000a0a@STEVO> <1131133227.3839.44.camel@trinity.austin.ammasso.com>
Message-ID: <436BAE62.20205@ichips.intel.com>

Tom Tucker wrote:
> Well....
It won't necessarily fail will it? If you specified the source > address as another port on the same machine, but NOT the one with > connectivity to the remote peer, the routine will succeed, but the > results are not what you expect...and you will fail further down the > line (looking up the path record). This is one of the "bizarre" failures > I was originally referring to. If a source address is NOT specified, then I don't think that there's any issue. If a source address is specified, a failure can occur during rdma_resolve_route() with an error that the source and destination addresses are not reachable. This seems reasonable to me. - Sean From robert.j.woodruff at intel.com Fri Nov 4 10:57:40 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Fri, 4 Nov 2005 10:57:40 -0800 Subject: [openib-general] [ANNOUNCE] ContributeRDS(ReliableDatagramSockets) to OpenIB In-Reply-To: <000901c5e16f$827a26b0$9e5aa8c0@infiniconsys.com> Message-ID: Fab wrote, >There is not a 1:1 relationship between a UDP application socket and an IB QP, >rather there is a single IB connection between systems over which traffic from >multiple UDP sockets flows. >- Fab That would probably provide better scalability, since there would not be a 1:1 mapping between UDP sockets and IB connections, however for large clusters there may still be a scalability issue if every node needs to have a connection to every other node. If you implemented it on top of datagrams instead, then each node would only need one QP, rather than one for every node in the cluster. 
woody From tom at opengridcomputing.com Fri Nov 4 12:00:58 2005 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 04 Nov 2005 14:00:58 -0600 Subject: [openib-general][PATCH] local device search with sourceaddresswildcard In-Reply-To: <436BAE62.20205@ichips.intel.com> References: <000801c5e16e$3b4a1490$9e5aa8c0@infiniconsys.com> <011e01c5e16e$409f21b0$d5000a0a@STEVO> <1131133227.3839.44.camel@trinity.austin.ammasso.com> <436BAE62.20205@ichips.intel.com> Message-ID: <1131134458.3839.52.camel@trinity.austin.ammasso.com> Ok, so lets assume that all the other stuff is background noise about error codes... If you specify a 0 as a source address, won't the private data contain the destination address as the source address, or did I miss something? On Fri, 2005-11-04 at 10:54 -0800, Sean Hefty wrote: > Tom Tucker wrote: > > Well.... It won't necessarily fail will it? If you specified the source > > address as another port on the same machine, but NOT the one with > > connectivity to the remote peer, the routine will succeed, but the > > results are not what you expect...and you will fail further down the > > line (looking up the path record). This is one of the "bizarre" failures > > I was originally referring to. > > If a source address is NOT specified, then I don't think that there's any issue. > > If a source address is specified, a failure can occur during > rdma_resolve_route() with an error that the source and destination addresses are > not reachable. This seems reasonable to me. 
> > - Sean From mshefty at ichips.intel.com Fri Nov 4 11:05:44 2005 From: mshefty at ichips.intel.com (Sean Hefty) Date: Fri, 04 Nov 2005 11:05:44 -0800 Subject: [openib-general][PATCH] local device search with sourceaddresswildcard In-Reply-To: <1131134458.3839.52.camel@trinity.austin.ammasso.com> References: <000801c5e16e$3b4a1490$9e5aa8c0@infiniconsys.com> <011e01c5e16e$409f21b0$d5000a0a@STEVO> <1131133227.3839.44.camel@trinity.austin.ammasso.com> <436BAE62.20205@ichips.intel.com> <1131134458.3839.52.camel@trinity.austin.ammasso.com> Message-ID: <436BB108.7090009@ichips.intel.com> Tom Tucker wrote: > If you specify a 0 as a source address, won't the private data contain > the destination address as the source address, or did I miss something? The source and destination addresses will be the same if the destination is on the local system. E.g. The local system has address 192.168.0.101. The destination address is 192.168.0.101. The source address will also be set to 192.168.0.101. - Sean From tom at opengridcomputing.com Fri Nov 4 12:09:35 2005 From: tom at opengridcomputing.com (Tom Tucker) Date: Fri, 04 Nov 2005 14:09:35 -0600 Subject: [openib-general][PATCH] local device search with sourceaddresswildcard In-Reply-To: <436BB108.7090009@ichips.intel.com> References: <000801c5e16e$3b4a1490$9e5aa8c0@infiniconsys.com> <011e01c5e16e$409f21b0$d5000a0a@STEVO> <1131133227.3839.44.camel@trinity.austin.ammasso.com> <436BAE62.20205@ichips.intel.com> <1131134458.3839.52.camel@trinity.austin.ammasso.com> <436BB108.7090009@ichips.intel.com> Message-ID: <1131134975.3839.59.camel@trinity.austin.ammasso.com> Sean: I think I'm convinced I'm confused... which is another way of saying "you're right". Thanks, Tom On Fri, 2005-11-04 at 11:05 -0800, Sean Hefty wrote: > Tom Tucker wrote: > > If you specify a 0 as a source address, won't the private data contain > > the destination address as the source address, or did I miss something? 
> > The source and destination addresses will be the same if the destination is on > the local system. > > E.g. The local system has address 192.168.0.101. The destination address is > 192.168.0.101. The source address will also be set to 192.168.0.101. > > - Sean From ftillier at silverstorm.com Fri Nov 4 11:16:20 2005 From: ftillier at silverstorm.com (Fab Tillier) Date: Fri, 4 Nov 2005 11:16:20 -0800 Subject: [openib-general] [ANNOUNCE] ContributeRDS(ReliableDatagramSockets) to OpenIB In-Reply-To: Message-ID: <000a01c5e174$3f3549c0$9e5aa8c0@infiniconsys.com> > From: Bob Woodruff [mailto:robert.j.woodruff at intel.com] > Sent: Friday, November 04, 2005 10:58 AM > > Fab wrote, > >There is not a 1:1 relationship between a UDP application socket > >and an IB QP, rather there is a single IB connection between systems > >over which traffic from multiple UDP sockets flows. > > That would probably provide better scalability, since there > would not be a 1:1 mapping between UDP sockets and IB connections, > however for large clusters there may still be a scalability issue > if every node needs to have a connection to every other node. > If you implemented it on top of datagrams instead, then each node > would only need one QP, rather than one for every node in the cluster. Doing a UDP to IB-UD protocol is unlikely to buy you anything over just using IPoIB. I don't know about doing UDP to IB-RDD, but the complexity of supporting end to end contexts and RDD QPs seems to me to outweigh the complexity of doing SW multiplexing over multiple IB-RC QPs. I don't think software multiplexing over IB-RC costs much from both a system/HCA resource and performance perspective, especially compared to doing something like uDAPL or SDP where there's a 1:1 relationship between EP or socket to QP, respectively. 
- Fab

From caitlinb at broadcom.com Fri Nov 4 11:15:09 2005
From: caitlinb at broadcom.com (Caitlin Bestler)
Date: Fri, 4 Nov 2005 11:15:09 -0800
Subject: [openib-general] [ANNOUNCE] ContributeRDS(ReliableDatagramSockets) to OpenIB
Message-ID: <54AD0F12E08D1541B826BE97C98F99F1020C40@NT-SJCA-0751.brcm.ad.broadcom.com>

> -----Original Message-----
> From: openib-general-bounces at openib.org
> [mailto:openib-general-bounces at openib.org] On Behalf Of Bob Woodruff
> Sent: Friday, November 04, 2005 10:58 AM
> To: 'Fab Tillier'; 'Sean Hefty'; Rick Frank
> Cc: openib-general at openib.org
> Subject: RE: [openib-general] [ANNOUNCE] ContributeRDS(ReliableDatagramSockets) to OpenIB
>
> Fab wrote,
> >There is not a 1:1 relationship between a UDP application socket and an
> >IB QP, rather there is a single IB connection between systems over which
> >traffic from multiple UDP sockets flows.
>
> >- Fab
>
> That would probably provide better scalability, since there
> would not be a 1:1 mapping between UDP sockets and IB
> connections, however for large clusters there may still be a
> scalability issue if every node needs to have a connection to
> every other node.
> If you implemented it on top of datagrams instead, then each
> node would only need one QP, rather than one for every node
> in the cluster.
>

But then the application would have to take responsibility for
congestion control and retries after network packet losses.

RDS allows an application all the benefits of a reliable connection
without the overhead, except for per-connection back-pressure.
Many applications do not need per-connection back-pressure since
they already have session-wide flow control policies in place.

Going from one connection for each pair of application endpoints
to one connection for each pair of hosts is a major improvement.
For most applications, going down to a single QP after that is
not sufficiently valuable to add the complexity of working over
a totally unreliable protocol.
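
The scaling argument in this exchange can be made concrete with a small counting sketch. It is only an illustration under stated assumptions: a full mesh, `hosts` nodes with `endpoints_per_host` communicating endpoints each, and no IB connection used for traffic inside a node. The function names are hypothetical and do not come from any OpenIB or RDS code.

```c
#include <stdint.h>

/*
 * Hypothetical helper (assumed model, not RDS code): number of
 * connections one host terminates when every local endpoint holds its
 * own connection to every endpoint on every other host.
 */
uint64_t conns_per_endpoint_pair(uint64_t hosts, uint64_t endpoints_per_host)
{
    if (hosts < 2)
        return 0;
    return endpoints_per_host * endpoints_per_host * (hosts - 1);
}

/*
 * Hypothetical helper (assumed model, not RDS code): number of
 * connections one host terminates when all local endpoints share a
 * single connection to each remote host.
 */
uint64_t conns_per_host_pair(uint64_t hosts)
{
    return hosts < 2 ? 0 : hosts - 1;
}
```

Under this model a 4-host cluster with 2 endpoints per host needs 12 connections per host in the first scheme and 3 in the second. The second count does not depend on the number of endpoints per host, which is the improvement described in the message above.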
From swise at opengridcomputing.com Fri Nov 4 11:14:58 2005
From: swise at opengridcomputing.com (Steve Wise)
Date: Fri, 4 Nov 2005 13:14:58 -0600
Subject: [openib-general] machine check stop
Message-ID: <015801c5e174$0bfea1a0$d5000a0a@STEVO>

All,

I'm running on X86_64 platforms with 2.6.13.3+kdb, the openib iwarp
branch. I'm running with 2 dual port mellanox cards hooked up point to
point between the two systems. So I have 4 IB subnets.

I'm running a kernel module that sets up connections and pounds them
with rdma writes. Intermittently, when I kick off these tests I get a
machine check stop on the client system. Something like this:

CPU 0: Machine Check Stop 4 Bank 4: b200000000070f0
TSC blah blah blah.

Does this ring a bell with anyone?

Thanks,

Steve.

From robert.j.woodruff at intel.com Fri Nov 4 11:31:04 2005
From: robert.j.woodruff at intel.com (Bob Woodruff)
Date: Fri, 4 Nov 2005 11:31:04 -0800
Subject: [openib-general] [ANNOUNCE] ContributeRDS(ReliableDatagramSockets) to OpenIB
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F1020C40@NT-SJCA-0751.brcm.ad.broadcom.com>
Message-ID:

Caitlin wrote,
>Going from one connection for each pair of application endpoints
>to one connection for each pair of hosts is a major improvement.
>For most applications going down to a single QP after that is
>not sufficiently valuable to add the complexity of working over
>a totally unreliable protocol.

I agree that there is some improvement in going from one QP per
UDP socket to one per node, but it still will likely not
scale to 10,000 node clusters, which is something that Oracle
probably does not care about, but others in HPC do.

If we are going to invent a Reliable Datagram Service, shouldn't
it be made to scale so that MPIs that currently use datagrams
could also benefit?
woody From rpandit at silverstorm.com Fri Nov 4 11:33:29 2005 From: rpandit at silverstorm.com (Ranjit Pandit) Date: Fri, 4 Nov 2005 11:33:29 -0800 Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB In-Reply-To: <20051104181952.GB4463@esmail.cup.hp.com> References: <5D78D28F88822E4D8702BB9EEF1A4367C2DD62@mercury.infiniconsys.com> <001701c5e0c7$37d1b090$6401a8c0@YOURA11C73D0FD> <20051104002101.GC1478@esmail.cup.hp.com> <96f8e60e0511031701i7b9ce5a0gbdade306735695e6@mail.gmail.com> <010201c5e0db$f9ed3820$6401a8c0@YOURA11C73D0FD> <20051104181952.GB4463@esmail.cup.hp.com> Message-ID: <96f8e60e0511041133i222fc7edt172b7b5b1e7ba9fa@mail.gmail.com> On 11/4/05, Grant Grundler wrote: > On Thu, Nov 03, 2005 at 08:06:21PM -0500, Rick Frank wrote: > > I've atttached a draft proposal for RDS from Oracle which discusses some of > > the motivation for RDS. > > Thanks! > > Some questions/comments... > > o What is "GE" acronym for? Gigabit Ethernet > > o I'm seeing about 1/5th CPU load for SDP (vs IPoIB). > The "50% less" number doesn't seem that impressive for RDP (vs IPoIB). > Maybe this is a difference in the benchmark (I'm running netperf). 1/5th CPU load with SDP is with zero copy or without? > > o RDP wants to provide AF_INET_OFFLOAD. This doesn't exist in my source tree. > I don't know who assigns these but it isn't lanana.org. > Oracle would be wise to stick with what's in include/linux/sockets.h > in order to avoid long term maintenance issues. > > ISTR OpenIB got flamed for wanting to use AF_INET_OFFLOAD name. > If RDP is accepted, I would expect RDP to get AF_INET_RDP. > And then use "LD_PRELOAD" and clone libsdp.so to take over AF_INET. > ie follow a similar trajectory that SDP had. > On SST stack, AF_INET_OFFLOAD is used for both SDP as well as RDS. The difference is in the socket_type, SOCK_STREAM Vs SOCK_DGRAM. Can something similar be done on OpenIB? 
> o Is access control to the RDP protocol something that applies to > all protocols? > I'm looking item #2 of "Additional Features". > In this particular case, Oracle had a specific requirement for access control on RDP. I don't know if other users will have similar requirement on other ULPs or not. > > o Doesn't SDP meet the following requirement as well? > > | A goal of RDP should be to support all existing socket > | functionality relevant to UDP with no changes to any > | existing socket application - other than specifying > | AF_INET_OFFLOAD. However, an RDP aware socket application > | can take advantage of the RDP features. > SDP does not support SOCK_DGRAM... ulp/sdp/sdp_inet.c if (SOCK_STREAM != sock->type || (IPPROTO_IP != protocol && IPPROTO_TCP != protocol)) { sdp_dbg_warn(NULL, "SOCKET: unsupported type/proto. <%d:%d>", sock->type, protocol); return -EPROTONOSUPPORT; } > > o I'm struggling with the "RDP is connectionless" comments made earlier. > Later in this proposal, "RDP Interface" says packets will be > delivered "in order". Doesn't that conflict with "connectionless"? > Does UDP guarantee order? > As Fab and Rick mentioned, RDS provides UDP like connectionless model to applications but it uses IB/RC to communicate to the remote node. So the application doesn't have to maintain a connection state, which is a problem with TCP/SDP when there are 100K odd connections involved. > o The "crossover" value for zero copy vs inlining data is chipset specific. > Ie even within the same architecture, different combinations of CPUs > and chipsets will give wide variance. Things like cache size, cache > replacement algorithm, available memory bandwidth, memory latency, > et al, affect the choice. This value is normally define by/for each > architecture since that's practical and lets each arch decide > what the right tradeoff is. > Agreed. > o The comments in "Recv operations" talk about "backpressure". 
> Is this another way of saying the driver should drop packets once > the "fairness threshold" is exceeded? > The driver cannot drop packets. When backpressure'd, RDS returns EWOULDBLOCK to the application and then the application can retry. > o Does detecting the "death of a remote node" still fall > within the "connectionless" definition? When a particular socket is "backpressure'd/stalled" the application gets EWOULDBLOCK. Meanwhile, if the destination node dies, that socket needs to be unblocked. Any subsequent sends will return an error so the application can take corrective measures or cleanup. > > o I didn't look through the "config" and "statistics". > > o "RDP Information" section reminds me of the previous email thread > about "netstat" support. Those probably want to be aligned so > Oracle can leverage the same command as other users. > ie reduce long term maintenance. > > > And while researching the above, I found some nits with SDP: > > o I was expecting AF_INET_SDP to be in 2.6.14 and it's not. > I hope it's part of 2.6.15-rc*. > > o The ulp/sdp/Kconfig comments say "AF_INET_SDP (address family 26)". > AF_LLC uses 26 and sdp_sock.h defines 27. > Michael - need a patch or is this trivial enough to fix by hand? 
>
> thanks,
> grant
> _______________________________________________
> openib-general mailing list
> openib-general at openib.org
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>

From iod00d at hp.com Fri Nov 4 11:38:23 2005
From: iod00d at hp.com (Grant Grundler)
Date: Fri, 4 Nov 2005 11:38:23 -0800
Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB
In-Reply-To: <20051104181952.GB4463@esmail.cup.hp.com>
References: <5D78D28F88822E4D8702BB9EEF1A4367C2DD62@mercury.infiniconsys.com> <001701c5e0c7$37d1b090$6401a8c0@YOURA11C73D0FD> <20051104002101.GC1478@esmail.cup.hp.com> <96f8e60e0511031701i7b9ce5a0gbdade306735695e6@mail.gmail.com> <010201c5e0db$f9ed3820$6401a8c0@YOURA11C73D0FD> <20051104181952.GB4463@esmail.cup.hp.com>
Message-ID: <20051104193823.GD4463@esmail.cup.hp.com>

On Fri, Nov 04, 2005 at 10:19:52AM -0800, Grant Grundler wrote:
...
> o The comments in "Recv operations" talk about "backpressure".
>   Is this another way of saying the driver should drop packets once
>   the "fairness threshold" is exceeded?

Ranjit's slideset answered this question (I think):
| o Slow receiver ports are stalled at sender side
|   - combination of activity (LRU) and memory utilization used
|     to detect slow receivers
|   - sendmsg() to stalled destination port returns
|     EWOULDBLOCK, application can retry
|   - recvmsg() on a stalled port un-stalls it

I'm having trouble reconciling previous "connectionless" and
"transparent to user space" comments with this slide.
Especially the "EWOULDBLOCK" return code.

If a receiver can cause a sender to stall, it implies the packets
will get dropped on the send side. This is a subtle change
in behavior that I don't think any UDP application can assume.
But I'm no networking protocol expert...

thanks,
grant

From ranjit.pandit.ib at gmail.com Fri Nov 4 11:54:27 2005
From: ranjit.pandit.ib at gmail.com (pandit ib)
Date: Fri, 4 Nov 2005 11:54:27 -0800
Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB
In-Reply-To: <20051104193823.GD4463@esmail.cup.hp.com>
References: <5D78D28F88822E4D8702BB9EEF1A4367C2DD62@mercury.infiniconsys.com> <001701c5e0c7$37d1b090$6401a8c0@YOURA11C73D0FD> <20051104002101.GC1478@esmail.cup.hp.com> <96f8e60e0511031701i7b9ce5a0gbdade306735695e6@mail.gmail.com> <010201c5e0db$f9ed3820$6401a8c0@YOURA11C73D0FD> <20051104181952.GB4463@esmail.cup.hp.com> <20051104193823.GD4463@esmail.cup.hp.com>
Message-ID: <96f8e60e0511041154r305d89a0iac7942c0006b5cbb@mail.gmail.com>

On 11/4/05, Grant Grundler wrote:
> On Fri, Nov 04, 2005 at 10:19:52AM -0800, Grant Grundler wrote:
> ...
> > o The comments in "Recv operations" talk about "backpressure".
> >   Is this another way of saying the driver should drop packets once
> >   the "fairness threshold" is exceeded?
>
> Ranjit's slideset answered this question (I think):
> | o Slow receiver ports are stalled at sender side
> |   - combination of activity (LRU) and memory utilization used
> |     to detect slow receivers
> |   - sendmsg() to stalled destination port returns
> |     EWOULDBLOCK, application can retry
> |   - recvmsg() on a stalled port un-stalls it
>
> I'm having trouble reconciling previous "connectionless" and
> "transparent to user space" comments with this slide.
> Especially the "EWOULDBLOCK" return code.
>
> If a receiver can cause a sender to stall, it implies the packets
> will get dropped on the send side. This is a subtle change
> in behavior that I don't think any UDP application can assume.
> But I'm no networking protocol expert...

When the sender is stalled, the driver will backpressure the
application. No packets will be dropped.
Since a UDP application assumes the underlying transport is
unreliable, it should not have any problems running on RDS.
On getting EWOULDBLOCK it will simply retry.

>
> thanks,
> grant

From rolandd at cisco.com Fri Nov 4 12:05:31 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Fri, 04 Nov 2005 12:05:31 -0800
Subject: [openib-general] machine check stop
In-Reply-To: <015801c5e174$0bfea1a0$d5000a0a@STEVO> (Steve Wise's message of "Fri, 4 Nov 2005 13:14:58 -0600")
References: <015801c5e174$0bfea1a0$d5000a0a@STEVO>
Message-ID: <52wtjo2n78.fsf@cisco.com>

    Steve> CPU 0: Machine Check Stop 4 Bank 4: b200000000070f0 TSC
    Steve> blah blah blah.

The chipset and/or CPU detected "something bad" like a parity error or
something like that.

You'll need to know all the details of your system and find some
low-level documentation to decode the machine check output.

 - R.

From rolandd at cisco.com Fri Nov 4 12:10:04 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Fri, 04 Nov 2005 12:10:04 -0800
Subject: [openib-general] [PATCH] sdp zero copy support
In-Reply-To: <20051104122331.GB15158@mellanox.co.il> (Michael S. Tsirkin's message of "Fri, 4 Nov 2005 14:23:31 +0200")
References: <20051104122331.GB15158@mellanox.co.il>
Message-ID: <52oe502mzn.fsf@cisco.com>

I haven't read the code yet, but:

 > +config INFINIBAND_SDP_SEND_ZCOPY
 > +	bool "Sockets Direct Protocol Zero Copy Send support"
 > +	depends on INFINIBAND_SDP
 > +	default y
 > +	---help---
 > +	  This option enables Zero Copy support for send_msg transactions.
 > +
 > +config INFINIBAND_SDP_RECV_ZCOPY
 > +	bool "Sockets Direct Protocol Zero Copy Receive support"
 > +	depends on INFINIBAND_SDP && INFINIBAND_SDP_SEND_ZCOPY
 > +	default y
 > +	---help---
 > +	  This option enables Zero Copy support for recv_msg transactions.

Why would I ever say 'n'?
I think we should either get rid of these config options, or if there is a reason for them, explain it better in the help text. - R. From swise at opengridcomputing.com Fri Nov 4 12:20:38 2005 From: swise at opengridcomputing.com (Steve Wise) Date: Fri, 4 Nov 2005 14:20:38 -0600 Subject: [openib-general] machine check stop References: <015801c5e174$0bfea1a0$d5000a0a@STEVO> <52wtjo2n78.fsf@cisco.com> Message-ID: <01a201c5e17d$38b9a7e0$d5000a0a@STEVO> Searching on the web shows this particular "4 bank 4" check when there are memory problems. I was just wondering if bad kernel/module code could cause this, or if really indicates a HW system issue, and if anyone else has seen this running openib stress tests on X86_64... Thanx, Stevo. ----- Original Message ----- From: "Roland Dreier" To: "Steve Wise" Cc: Sent: Friday, November 04, 2005 2:05 PM Subject: Re: [openib-general] machine check stop > Steve> CPU 0: Machine Check Stop 4 Bank 4: b200000000070f0 TSC > Steve> blah blah blah. > > The chipset and/or CPU detected "something bad" like a parity error or > something like that. > > You'll need to know all the details of your system and find some > low-level documentation to decode the machine check output. > > - R. 
>

From halr at voltaire.com Fri Nov 4 12:43:07 2005
From: halr at voltaire.com (Hal Rosenstock)
Date: 04 Nov 2005 15:43:07 -0500
Subject: [openib-general] [PATCH] user_mad.c: Only allow ib_umad_open to succeed on IB node types
Message-ID: <1131136986.4340.4689.camel@hal.voltaire.com>

user_mad.c: Only allow ib_umad_open to succeed on IB node types

Signed-off-by: Hal Rosenstock

Index: user_mad.c
===================================================================
--- user_mad.c	(revision 3968)
+++ user_mad.c	(working copy)
@@ -595,6 +595,12 @@ static int ib_umad_open(struct inode *in
 		goto out;
 	}
 
+	if (port->ib_dev->node_type < IB_NODE_CA ||
+	    port->ib_dev->node_type > IB_NODE_ROUTER) {
+		ret = -ENODEV;
+		goto out;
+	}
+
 	file = kzalloc(sizeof *file, GFP_KERNEL);
 	if (!file) {
 		kref_put(&port->umad_dev->ref, ib_umad_release_dev);

From rpandit at silverstorm.com Fri Nov 4 12:59:13 2005
From: rpandit at silverstorm.com (Ranjit Pandit)
Date: Fri, 4 Nov 2005 12:59:13 -0800
Subject: [openib-general] [ANNOUNCE] ContributeRDS(ReliableDatagramSockets) to OpenIB
In-Reply-To:
References: <54AD0F12E08D1541B826BE97C98F99F1020C40@NT-SJCA-0751.brcm.ad.broadcom.com>
Message-ID: <96f8e60e0511041259v655a217anba925ae53f5c3dee@mail.gmail.com>

> I agree that there is some improvement in going from one QP per
> UDP socket to one per node, but it still will likely not
> scale to 10,000 node clusters, which is something that Oracle
> probably does not care about, but others in HPC do.
>

To put the improvement in perspective:

For Mpi running on a 10,000 node cluster with 2 or 4 way nodes, here
are the QP/ CM connection requirements:
(assuming intra node communication doesn't use IB)

Procs per node    uDapl/Sdp    Rds
2                 19996        9999
4                 39984        9999

Clearly, there is tradeoff in performance as we go from uDapl/Sdp to Rds.
The choice will have to depend on the requirements of performance Vs
Scalability.

Btw, for this large a cluster, there is a huge overhead in just
setting up the connections.
Rds connections are set up only once.

> If we are going to invent a Reliable Datagram Service, shouldn't
> it be made to scale so that MPIs that currently use datagrams
> could also benefit ?
>
> woody

From robert.j.woodruff at intel.com Fri Nov 4 13:09:59 2005
From: robert.j.woodruff at intel.com (Woodruff, Robert J)
Date: Fri, 4 Nov 2005 13:09:59 -0800
Subject: [openib-general] [ANNOUNCE] ContributeRDS(ReliableDatagramSockets) to OpenIB
Message-ID: <1AC79F16F5C5284499BB9591B33D6F00060828EB@orsmsx408>

>To put the improvement in perspective:

And if RDS were implemented over a connectionless QP,
the QP savings are even more...

Procs per node    uDapl/Sdp    Rds/connection oriented    RDS/connectionless
2                 19996        9999                       1
4                 39984        9999                       1

Of course, connectionless would require the reliability to
be done in S/W in addition to the demuxing of packets.

woody

From dpilon at gmail.com Fri Nov 4 13:11:41 2005
From: dpilon at gmail.com (Denis Pilon)
Date: Fri, 4 Nov 2005 16:11:41 -0500
Subject: [openib-general] error compiling kernel...
Message-ID:

I am trying to compile but keep getting errors...
linux-2.6.14(vanilla) plus latest svn release 3972.

LD drivers/infiniband/built-in.o
LD drivers/infiniband/core/built-in.o
CC [M] drivers/infiniband/core/addr.o
CC [M] drivers/infiniband/core/at.o
CC [M] drivers/infiniband/core/cm.o
drivers/infiniband/core/cm.c: In function `cm_alloc_msg':
drivers/infiniband/core/cm.c:179: error: `IB_MGMT_MAD_HDR' undeclared (first use in this function)
drivers/infiniband/core/cm.c:179: error: (Each undeclared identifier is reported only once
drivers/infiniband/core/cm.c:179: error: for each function it appears in.)
drivers/infiniband/core/cm.c:180: error: too few arguments to function `ib_create_send_mad' drivers/infiniband/core/cm.c:187: error: structure has no member named `ah' drivers/infiniband/core/cm.c:188: error: structure has no member named `retries' drivers/infiniband/core/cm.c: In function `cm_alloc_response_msg': drivers/infiniband/core/cm.c:209: error: `IB_MGMT_MAD_HDR' undeclared (first use in this function) drivers/infiniband/core/cm.c:210: error: too few arguments to function `ib_create_send_mad' drivers/infiniband/core/cm.c:215: error: structure has no member named `ah' drivers/infiniband/core/cm.c: In function `cm_free_msg': drivers/infiniband/core/cm.c:222: error: structure has no member named `ah' drivers/infiniband/core/cm.c: In function `cm_insert_listen': drivers/infiniband/core/cm.c:371: error: structure has no member named `device' drivers/infiniband/core/cm.c:371: error: structure has no member named `device' drivers/infiniband/core/cm.c:374: error: structure has no member named `device' drivers/infiniband/core/cm.c:374: error: structure has no member named `device' drivers/infiniband/core/cm.c:376: error: structure has no member named `device' drivers/infiniband/core/cm.c:376: error: structure has no member named `device' drivers/infiniband/core/cm.c: In function `cm_find_listen': drivers/infiniband/core/cm.c:398: error: structure has no member named `device' drivers/infiniband/core/cm.c:401: error: structure has no member named `device' drivers/infiniband/core/cm.c:403: error: structure has no member named `device' drivers/infiniband/core/cm.c: At top level: drivers/infiniband/core/cm.c:543: error: conflicting types for 'ib_create_cm_id' include/rdma/ib_cm.h:306: error: previous declaration of 'ib_create_cm_id' was here drivers/infiniband/core/cm.c:543: error: conflicting types for 'ib_create_cm_id' include/rdma/ib_cm.h:306: error: previous declaration of 'ib_create_cm_id' was here drivers/infiniband/core/cm.c: In function `ib_create_cm_id': 
drivers/infiniband/core/cm.c:552: error: structure has no member named `device' drivers/infiniband/core/cm.c: In function `ib_destroy_cm_id': drivers/infiniband/core/cm.c:679: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c:690: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c:707: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `ib_send_cm_req': drivers/infiniband/core/cm.c:933: error: structure has no member named `timeout_ms' drivers/infiniband/core/cm.c:942: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:942: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_issue_rej': drivers/infiniband/core/cm.c:987: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:987: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_dup_req_handler': drivers/infiniband/core/cm.c:1195: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1195: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_match_req': drivers/infiniband/core/cm.c:1235: error: structure has no member named `device' drivers/infiniband/core/cm.c: In function `ib_send_cm_rep': drivers/infiniband/core/cm.c:1381: error: structure has no member named `timeout_ms' drivers/infiniband/core/cm.c:1384: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1384: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `ib_send_cm_rtu': drivers/infiniband/core/cm.c:1448: warning: passing arg 1 of `ib_post_send_mad' from 
incompatible pointer type drivers/infiniband/core/cm.c:1448: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_dup_rep_handler': drivers/infiniband/core/cm.c:1520: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1520: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_rep_handler': drivers/infiniband/core/cm.c:1588: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `cm_establish_handler': drivers/infiniband/core/cm.c:1622: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `cm_rtu_handler': drivers/infiniband/core/cm.c:1661: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `ib_send_cm_dreq': drivers/infiniband/core/cm.c:1719: error: structure has no member named `timeout_ms' drivers/infiniband/core/cm.c:1722: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1722: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `ib_send_cm_drep': drivers/infiniband/core/cm.c:1785: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1785: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_dreq_handler': drivers/infiniband/core/cm.c:1820: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c:1834: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1834: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_drep_handler': 
drivers/infiniband/core/cm.c:1881: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `ib_send_cm_rej': drivers/infiniband/core/cm.c:1949: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:1949: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_rej_handler': drivers/infiniband/core/cm.c:2025: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c:2035: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `ib_send_cm_mra': drivers/infiniband/core/cm.c:2093: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2093: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c:2106: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2106: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c:2119: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2119: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_mra_handler': drivers/infiniband/core/cm.c:2181: warning: passing arg 2 of `ib_modify_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c:2188: warning: passing arg 2 of `ib_modify_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c:2196: warning: passing arg 2 of `ib_modify_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `ib_send_cm_lap': drivers/infiniband/core/cm.c:2279: error: structure has no member named `timeout_ms' drivers/infiniband/core/cm.c:2282: warning: passing arg 1 of `ib_post_send_mad' from 
incompatible pointer type drivers/infiniband/core/cm.c:2282: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_lap_handler': drivers/infiniband/core/cm.c:2359: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2359: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `ib_send_cm_apr': drivers/infiniband/core/cm.c:2437: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2437: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_apr_handler': drivers/infiniband/core/cm.c:2476: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `ib_send_cm_sidr_req': drivers/infiniband/core/cm.c:2573: error: structure has no member named `timeout_ms' drivers/infiniband/core/cm.c:2578: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2578: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_sidr_req_handler': drivers/infiniband/core/cm.c:2642: error: structure has no member named `device' drivers/infiniband/core/cm.c: In function `ib_send_cm_sidr_rep': drivers/infiniband/core/cm.c:2713: warning: passing arg 1 of `ib_post_send_mad' from incompatible pointer type drivers/infiniband/core/cm.c:2713: error: too few arguments to function `ib_post_send_mad' drivers/infiniband/core/cm.c: In function `cm_sidr_rep_handler': drivers/infiniband/core/cm.c:2766: warning: passing arg 2 of `ib_cancel_mad' makes integer from pointer without a cast drivers/infiniband/core/cm.c: In function `cm_send_handler': drivers/infiniband/core/cm.c:2834: error: structure has no member named `send_buf' make[3]: *** [drivers/infiniband/core/cm.o] Error 1 make[2]: *** 
[drivers/infiniband/core] Error 2
make[1]: *** [drivers/infiniband] Error 2
make: *** [drivers] Error 2

Am I missing something?

DP

From caitlinb at broadcom.com Fri Nov 4 13:17:30 2005
From: caitlinb at broadcom.com (Caitlin Bestler)
Date: Fri, 4 Nov 2005 13:17:30 -0800
Subject: [openib-general] [ANNOUNCE] ContributeRDS(ReliableDatagramSockets) to OpenIB
Message-ID: <54AD0F12E08D1541B826BE97C98F99F10414A1@NT-SJCA-0751.brcm.ad.broadcom.com>

> -----Original Message-----
> From: Woodruff, Robert J [mailto:robert.j.woodruff at intel.com]
> Sent: Friday, November 04, 2005 1:10 PM
> To: Ranjit Pandit
> Cc: Caitlin Bestler; Fab Tillier; Sean Hefty; Rick Frank;
> Matt L. Leininger; openib-general at openib.org
> Subject: RE: [openib-general] [ANNOUNCE]
> ContributeRDS(ReliableDatagramSockets) to OpenIB
>
> >To put the improvement in perspective:
>
> And if RDS were implemented over a connectionless QP, the QP
> savings are even more...
>
> Procs per node   uDapl/Sdp   Rds/connection oriented   RDS/connectionless
> 2                19996       9999                      1
> 4                39984       9999                      1
>
> Of course, connectionless would require the reliability to be
> done in S/W in addition to the demuxing of packets.
>
> woody
>

It would also require the application to do SAR for all packets that are larger than the PMTU. One of the benefits of trying to ride on the SOCK_DGRAM interface is that it already defines a larger guaranteed message size.

From robert.j.woodruff at intel.com Fri Nov 4 13:24:48 2005
From: robert.j.woodruff at intel.com (Bob Woodruff)
Date: Fri, 4 Nov 2005 13:24:48 -0800
Subject: [openib-general] [ANNOUNCE] ContributeRDS(ReliableDatagramSockets) to OpenIB
In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F10414A1@NT-SJCA-0751.brcm.ad.broadcom.com>
Message-ID:

Caitlin wrote,
>>
>> And if RDS were implemented over a connectionless QP, the QP
>> savings are even more...
>>
>> Procs per node   uDapl/Sdp   Rds/connection oriented   connectionless
>> 2                19996       9999                      1
>> 4                39984       9999                      1
>>
>> Of course, connectionless would require the reliability to be
>> done in S/W in addition to the demuxing of packets.
>>
>> woody
>>
>
>It would also require the application to do SAR for all
>packets that are larger than the PMTU. One of the benefits
>of trying to ride on the SOCK_DGRAM interface is that it
>already defines a larger guaranteed message size.

I suppose the RDS driver could also handle the SAR (if it were needed) in addition to any retries of lost packets.

woody

From mshefty at ichips.intel.com Fri Nov 4 13:34:35 2005
From: mshefty at ichips.intel.com (Sean Hefty)
Date: Fri, 04 Nov 2005 13:34:35 -0800
Subject: [openib-general] userverbs device node_type
Message-ID: <436BD3EB.9020407@ichips.intel.com>

Is there a way to get the node_type for an ibv_device?

- Sean

From lindahl at pathscale.com Fri Nov 4 13:46:53 2005
From: lindahl at pathscale.com (Greg Lindahl)
Date: Fri, 4 Nov 2005 13:46:53 -0800
Subject: [openib-general] [ANNOUNCE] Contribute RDS (Reliable DatagramSockets) to OpenIB
In-Reply-To: <96f8e60e0511041154r305d89a0iac7942c0006b5cbb@mail.gmail.com>
References: <5D78D28F88822E4D8702BB9EEF1A4367C2DD62@mercury.infiniconsys.com> <001701c5e0c7$37d1b090$6401a8c0@YOURA11C73D0FD> <20051104002101.GC1478@esmail.cup.hp.com> <96f8e60e0511031701i7b9ce5a0gbdade306735695e6@mail.gmail.com> <010201c5e0db$f9ed3820$6401a8c0@YOURA11C73D0FD> <20051104181952.GB4463@esmail.cup.hp.com> <20051104193823.GD4463@esmail.cup.hp.com> <96f8e60e0511041154r305d89a0iac7942c0006b5cbb@mail.gmail.com>
Message-ID: <20051104214653.GC2013@greglaptop.internal.keyresearch.com>

On Fri, Nov 04, 2005 at 11:54:27AM -0800, pandit ib wrote:
> Since a UDP application assumes the underlying transport is
> unreliable it should not have any problems running on RDS.
> On getting EWOULDBLOCK it will simply retry.
Most existing UDP applications do not expect a return error code of EWOULDBLOCK. To begin with, the Linux manpages say that you have to specify non-blocking to get this error in the first place. Another possibility is ENOBUFS, for which the manpage gives the advice "Normally, this does not occur in Linux. Packets are silently dropped when a device queue overflows."

There was a somewhat famous case showing lack of error handling in UDP applications under Linux, where Alan Cox decided to read the RFCs differently from everyone else, and caused an ICMP 'port unreach' to later cause the same sending socket to return an error for a send to some unrelated host. Many UDP-using apps considered this a fatal error. This was ~ 7 years ago, and this misfeature caused enough anger that it was corrected soon after Alan stopped owning the TCP/UDP stack.

In short, I'm not sure there would be much benefit in giving existing UDP-expecting apps a reliable, ordered stream of datagrams. The only apps which would see a benefit are those that know they can turn off their reliability and ordering code, and handle backpressure explicitly. Those folks would benefit from a simpler programming interface than verbs.

-- greg

From pradeep at us.ibm.com Fri Nov 4 14:06:32 2005
From: pradeep at us.ibm.com (Pradeep Satyanarayana)
Date: Fri, 4 Nov 2005 14:06:32 -0800
Subject: [openib-general] Data structure size mismatch
Message-ID:

I realize that address translation will be replaced shortly. However, here are a few things that I observed which I believe are important. I recently saw an e-mail thread about compilation problems and data structure padding; this is in line with that.

So that the new incarnation does not face the same pitfalls as address translation, I will describe them here.

When I tried running uatt it failed with -EFAULT. Debugging revealed that the following copy_from_user() fails:

	ib_route = kmalloc(sizeof *ib_route, GFP_KERNEL);
	if (!ib_route) {
		result = -ENOMEM;
		goto err1;
	}

	if (copy_from_user(ib_route, cmd.ib_route, sizeof(ib_route))) {
		result = -EFAULT;
		goto err2;
	}

In fact I believe this copy_from_user() is unnecessary, since the structure will actually be filled in by "address translation" and passed back to user space later on.

So, if I eliminate this copy_from_user(), uatt again fails with EFAULT in:

	if (copy_to_user((void __user *)(unsigned long)cmd.response,
			 &resp, sizeof(resp))) {
		result = -EFAULT;
		goto err4;
	}

The environment I was using was a 32-bit app and a 64-bit kernel on Power. The reason is that struct ib_uat_route_by_ip_req has pointers in it (LP64 vs ILP32). I am told a 64-bit app succeeded on a 64-bit kernel, which confirmed my suspicions.

Given that, I took a quick look at all the places where copy_from_user() is used (I did not do this exercise for copy_to_user(), which would be the complete thing to do) and found that this data structure size mismatch potentially also occurs in user_mad.c. I did not see any anomalies in ucm and uverbs.

Comments from people who are more familiar with the code?

Pradeep
pradeep at us.ibm.com

From robert.j.woodruff at intel.com Fri Nov 4 14:14:58 2005
From: robert.j.woodruff at intel.com (Bob Woodruff)
Date: Fri, 4 Nov 2005 14:14:58 -0800
Subject: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB
In-Reply-To: <010201c5e0db$f9ed3820$6401a8c0@YOURA11C73D0FD>
Message-ID:

Rick wrote,
>I've attached a draft proposal for RDS from Oracle which discusses some of
>the motivation for RDS.

Couple of questions/comments on the spec.

AF_INET_OFFLOAD should be renamed to something like AF_INET_RDS.

Would something like SCTP provide the same type of capabilities (reliable datagrams) that you are suggesting to add with RDP ?
http://www.networksorcery.com/enp/protocol/sctp.htm http://www.faqs.org/rfcs/rfc2960.html From caitlinb at broadcom.com Fri Nov 4 14:30:34 2005 From: caitlinb at broadcom.com (Caitlin Bestler) Date: Fri, 4 Nov 2005 14:30:34 -0800 Subject: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB Message-ID: <54AD0F12E08D1541B826BE97C98F99F10414A7@NT-SJCA-0751.brcm.ad.broadcom.com> > -----Original Message----- > From: openib-general-bounces at openib.org > [mailto:openib-general-bounces at openib.org] On Behalf Of Bob Woodruff > Sent: Friday, November 04, 2005 2:15 PM > To: 'Rick Frank'; Ranjit Pandit; Grant Grundler > Cc: openib-general at openib.org > Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS ( > ReliableDatagramSockets) to OpenIB > > Rick wrote, > >I've atttached a draft proposal for RDS from Oracle which discusses > >some of > > >the motivation for RDS. > > Couple of questions/comments on the spec. > > > AF_INET_OFFLOAD should be renamed to something like AF_INET_RDS. > > Would something like SCTP provide the same type of > capabilities (relaible datagrams) that you are suggesting to > add with RDP ? > Each stream within an SCTP association provides a reliable, ordered service. There would be two primary constraints in using SCTP for this usage profile: 1) The Stream ID is 16 bits, and the natural mapping would be to have each stream represent a source/destination pairing. That would imply fewer than 256 endpoints per host. If the source were encoded by hand then the limitation would be 64K, but that's an awkard mix of application and transport layer encoding. 2) The network has to be composed of SCTP friendly equipment. When IP network equipment operated exclusively at L2/L3, and L4 was left to the endpoints, SCTP would have had no problem being deployed. But because of security and IPV4 address shortages there are a lot of middleboxes that are L4 aware, and generally that L4 awareness is limited to TCP and UDP. 
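As a rough illustration of constraint 1 above: the SCTP Stream Identifier is a 16-bit field, so if both the source and the destination endpoint were packed into it with an even 8/8 split, each side would get at most 256 values. The sketch below only shows that arithmetic. The helper names and the 8/8 split are assumptions made for illustration, not part of SCTP or of any RDS proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding: high byte = source endpoint, low byte = destination
 * endpoint. With 8 bits per side, each host can name at most 256 endpoints. */
#define ENDPOINT_BITS 8
#define ENDPOINTS_PER_SIDE (1u << ENDPOINT_BITS)

/* Pack a source/destination pair into one 16-bit stream identifier. */
static inline uint16_t stream_id_pack(uint8_t src, uint8_t dst)
{
	return (uint16_t)(((uint16_t)src << ENDPOINT_BITS) | dst);
}

/* Recover the source endpoint from a packed stream identifier. */
static inline uint8_t stream_id_src(uint16_t sid)
{
	return (uint8_t)(sid >> ENDPOINT_BITS);
}

/* Recover the destination endpoint from a packed stream identifier. */
static inline uint8_t stream_id_dst(uint16_t sid)
{
	return (uint8_t)(sid & (ENDPOINTS_PER_SIDE - 1));
}
```

If the source were instead carried in the application payload, all 16 bits could name the destination, which is where the 64K figure comes from.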
SCTP support would also have to be part of the offload device. RDS enables reliable datagrams using existing offloaded RC services (IB RC, iWARP, TOE). No NIC enhancements are required. From trimmer at silverstorm.com Fri Nov 4 14:34:37 2005 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Fri, 4 Nov 2005 17:34:37 -0500 Subject: [openib-general] [ANNOUNCE]ContributeRDS(ReliableDatagramSockets) to OpenIB Message-ID: <5D78D28F88822E4D8702BB9EEF1A43670A08BD@mercury.infiniconsys.com> > Fab wrote, > >There is not a 1:1 relationship between a UDP application > socket and an IB > QP, > >rather there is a single IB connection between systems over > which traffic > from > >multiple UDP sockets flows. > > >- Fab > > Bob wrote, > That would probably provide better scalability, since there > would not be a 1:1 mapping between UDP sockets and IB connections, > however for large clusters there may still be a scalability issue > if every node needs to have a connection to every other node. > If you implemented it on top of datagrams instead, then each node > would only need one QP, rather than one for every node in the cluster. That is essentially what Oracle previously did when using UDP over IPoIB. Significant performance gains were realized with RDS (as compared to IPoIB) for a number of reasons: 1. use of RC connections allows for messages larger than IB MTU, which allows for more efficiency and better performance. 2. By using RC connections and flow control in the RDS socket mux, Oracle was able to remove the need for timeouts and retries in application space. Such algorithms in application space can get expensive, especially due to error handling which is inevitable when congestion and stress force the loss of packets by any unreliable datagram protocol (IB/UD or IPoIB/UDP or Ethernet/UDP). 
Todd Rimmer From halr at voltaire.com Fri Nov 4 14:30:49 2005 From: halr at voltaire.com (Hal Rosenstock) Date: 04 Nov 2005 17:30:49 -0500 Subject: [openib-general] Data structure size mismatch In-Reply-To: References: Message-ID: <1131143449.4340.5204.camel@hal.voltaire.com> On Fri, 2005-11-04 at 17:06, Pradeep Satyanarayana wrote: > I realize that address translation will be replaced shortly. However, > here are a few things that > I observed which I believe are important. Important to fix in what time frame ? > I recently saw an e-mail thread about compilation problems and > data structure padding; this is in line with that. > > So that new incarnation does not face the same pitfalls of address > translation, I will describe them here. > > When I tried running uatt it fails with -EFAULT. Debug revealed that > it fails. The following > copy_from_user() fails. > > ib_route = kmalloc(sizeof *ib_route, GFP_KERNEL); > if (!ib_route) { > result = -ENOMEM; > goto err1; > } > > if (copy_from_user(ib_route, cmd.ib_route, sizeof(ib_route))) { > result = -EFAULT; > goto err2; > } > > In fact I believe this copy_from_user() is unnecessary since this will > be actually filled in by "address translation" and > passed back to user space later on. Not always. If I recall correctly, there is a case where this copy is needed. It is not in the mode that uatt uses AT right now though. > So, if I eliminate this copy_from_user(), uatt again fails with > EFAULT in: > > if (copy_to_user((void __user *)(unsigned long)cmd.response, > &resp, sizeof(resp))) { > result = -EFAULT; > goto err4; > } > > The environment I was using a 32-bit app and 64-bit kernel on Power. > The reason is > struct ib_uat_route_by_ip_req has pointers in them (LP64 vs ILP32). This needs to be replaced by the port GID. Another alternative is the name. This has been discussed before on the list. -- Hal > I am told a 64-bit app succeeded on a 64-bit kernel which confirmed my > suspicions. 
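The LP64 vs ILP32 mismatch described above can be reproduced in user space. The sketch below uses made-up structures (they are not the real layout of struct ib_uat_route_by_ip_req): the first embeds pointers, so its size and field offsets depend on the ABI of whoever compiled it; the second carries user addresses as fixed-width 64-bit integers, so a 32-bit application and a 64-bit kernel would agree on it. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command with embedded pointers. Typically 12 bytes when built
 * for an ILP32 ABI and 24 bytes when built for LP64, so the two sides
 * disagree about where each field lives. */
struct cmd_with_ptrs {
	uint32_t dst_ip;
	void *ib_route;
	void *response;
};

/* Hypothetical ABI-stable variant: user addresses travel as 64-bit integers,
 * the 64-bit fields come first, and the padding is explicit. */
struct cmd_fixed {
	uint64_t ib_route;
	uint64_t response;
	uint32_t dst_ip;
	uint32_t reserved;
};

/* These hold regardless of whether the compiler targets 32 or 64 bits. */
_Static_assert(sizeof(struct cmd_fixed) == 24, "layout must not depend on ABI");
_Static_assert(offsetof(struct cmd_fixed, response) == 8, "fixed offset");
_Static_assert(offsetof(struct cmd_fixed, dst_ip) == 16, "fixed offset");

/* User-space side: widen a pointer to the fixed 64-bit representation. */
static inline uint64_t user_ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}
```

On the kernel side the 64-bit value would be turned back into a user pointer with a cast, in the same way the quoted code already does for cmd.response with (void __user *)(unsigned long).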
> > Given that I took a quick look at all the places that copy_from_user() > is used (I did not > do this exercise for copy_to_user(), which would be the complete thing > to do) and found > that this (data structure size mismatch) potentially also occurs in > user_mad,c. I did not see any anomalies > in ucm and uverbs. > > Comments from people who are more familair with the code? > > Pradeep > pradeep at us.ibm.com > > ______________________________________________________________________ > > _______________________________________________ > openib-general mailing list > openib-general at openib.org > http://openib.org/mailman/listinfo/openib-general > > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general From robert.j.woodruff at intel.com Fri Nov 4 14:53:04 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Fri, 4 Nov 2005 14:53:04 -0800 Subject: [openib-general] [ANNOUNCE] Contribute RDS ( ReliableDatagramSockets) to OpenIB In-Reply-To: <54AD0F12E08D1541B826BE97C98F99F10414A7@NT-SJCA-0751.brcm.ad.broadcom.com> Message-ID: Catlin wrote, >SCTP support would also have to be part of the offload device. >RDS enables reliable datagrams using existing offloaded RC >services (IB RC, iWARP, TOE). No NIC enhancements are required. BTW. SCTP runs in Linux today without any NIC enhancements or offload support. Perhaps if tunneling udp packets over RC connections rather than UD connections provides better performance, as was seen in the RDS experiment, then why not just convert IPoIB to use a connected model (rather than datagrams) and then all existing IP upper level protocols would could benefit, TCP, UDP, SCTP, .... 
woody
From trimmer at silverstorm.com Fri Nov 4 15:02:04 2005 From: trimmer at silverstorm.com (Rimmer, Todd) Date: Fri, 4 Nov 2005 18:02:04 -0500 Subject: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB Message-ID: <5D78D28F88822E4D8702BB9EEF1A436773E971@mercury.infiniconsys.com> > Bob wrote, > Perhaps if tunneling udp packets over RC connections rather than > UD connections provides better performance, as was seen in the RDS > experiment, then why not just convert > IPoIB to use a connected model (rather than datagrams) > and then all existing IP upper level > protocols would could benefit, TCP, UDP, SCTP, .... This would miss the second major improvement of RDS, namely removing the need for the application to perform timeouts and retries on datagram packets. If Oracle ran over UDP/IP/IPoIB it would not be guaranteed a loss-less reliable interface. If UDP/IP/IPoIB provided a loss-less reliable interface it would likely break or affect other UDP applications which are expecting a flow controlled interface. Todd Rimmer
From robert.j.woodruff at intel.com Fri Nov 4 15:03:17 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Fri, 4 Nov 2005 15:03:17 -0800 Subject: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB In-Reply-To: Message-ID: Woody wrote, >Perhaps if tunneling udp packets over RC connections rather than >UD connections provides better performance, as was seen in the RDS >experiment, then why not just convert >IPoIB to use a connected model (rather than datagrams) >and then all existing IP upper level >protocols would could benefit, TCP, UDP, SCTP, .... Saying this another way. Make the hardware run the existing protocols better, don't design a new protocol to work around the problems with a specific hardware transport.
woody
From robert.j.woodruff at intel.com Fri Nov 4 15:10:54 2005 From: robert.j.woodruff at intel.com (Bob Woodruff) Date: Fri, 4 Nov 2005 15:10:54 -0800 Subject: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB In-Reply-To: <5D78D28F88822E4D8702BB9EEF1A436773E971@mercury.infiniconsys.com> Message-ID: Todd wrote, >This would miss the second major improvement of RDS, namely removing the need for >the application to perform timeouts and retries on datagram packets. If Oracle >ran over UDP/IP/IPoIB it would not be guaranteed a loss-less reliable interface. >If UDP/IP/IPoIB provided a loss-less reliable interface it would likely break or >affect other UDP applications which are expecting a flow controlled interface. >Todd Rimmer Then use SCTP instead of UDP, which already provides a loss-less reliable interface. If SCTP has problems with the number of endpoints it can currently support, why not just fix that problem and fix IpoIB to use a connected model to increase performance, rather than inventing a completly new protocol and/or address family. Just a thought.
woody
From rpandit at silverstorm.com Fri Nov 4 15:16:52 2005 From: rpandit at silverstorm.com (Ranjit Pandit) Date: Fri, 4 Nov 2005 15:16:52 -0800 Subject: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB In-Reply-To: References: Message-ID: <96f8e60e0511041516v48a115a3m5025a2ffc3026f5a@mail.gmail.com> On 11/4/05, Bob Woodruff wrote: > Woody wrote, > >Perhaps if tunneling udp packets over RC connections rather than > >UD connections provides better performance, as was seen in the RDS > >experiment, then why not just convert > >IPoIB to use a connected model (rather than datagrams) > >and then all existing IP upper level > >protocols would could benefit, TCP, UDP, SCTP, .... > > Saying this another way. > Make the hardware run the existing protocols better, don't > design a new protocol to work around the problems with a > specific hardware transport. > What about SDP? Isn't SDP bypassing the existing TCP protocol stack to take advantage of a specific harware transport - IB? RDS is somewhat like SDP in that it offloads/accelerates SOCK_DGRAM instead of SOCK_STREAM. > woody
From Richard.Frank at oracle.com Fri Nov 4 15:27:42 2005 From: Richard.Frank at oracle.com (Rick Frank) Date: Fri, 4 Nov 2005 18:27:42 -0500 Subject: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB References: <5D78D28F88822E4D8702BB9EEF1A436773E971@mercury.infiniconsys.com> Message-ID: <004201c5e197$99ecc730$6401a8c0@YOURA11C73D0FD> Folks, I just realized that the RDS proposal doc I posted was not the latest - I have attached a newer doc. BTW - Our goal at Oracle is to eventually replace our use of UDP with RDS - we want to get out of the business of making UDP work for us (from user mode) - each time we create new internal database clients with corresponding new IPC requirements. I'm not proposing that we change our use of the connectionless datagram model - we have to many dependencies on this. For example, very shortly we will need to support reliably (and efficiently) moving 1meg msgs in our IPC - which will further complicate the UDP implementation - and further reduce its performance compared to RDP - which can support the 1meg MTU naturally for some interconnects - and or rely on a driver level implementation RDS / transport for those that do not. Basically it is very hard to do this stuff from user mode. Note that we will still be using our existing IPC module for RDS - just removing the remaining UDP vestages.
Of course for this to work - we will need RDS to be ubiquitous - supported on all interconnects - to include simple Ethernet NICs. ----- Original Message ----- From: "Rimmer, Todd" To: "Bob Woodruff" ; "Caitlin Bestler" ; "Rick Frank" ; "Pandit, Ranjit" ; "Grant Grundler" Cc: Sent: Friday, November 04, 2005 6:02 PM Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB > Bob wrote, > Perhaps if tunneling udp packets over RC connections rather than > UD connections provides better performance, as was seen in the RDS > experiment, then why not just convert > IPoIB to use a connected model (rather than datagrams) > and then all existing IP upper level > protocols would could benefit, TCP, UDP, SCTP, .... This would miss the second major improvement of RDS, namely removing the need for the application to perform timeouts and retries on datagram packets. If Oracle ran over UDP/IP/IPoIB it would not be guaranteed a loss-less reliable interface. If UDP/IP/IPoIB provided a loss-less reliable interface it would likely break or affect other UDP applications which are expecting a flow controlled interface. Todd Rimmer -------------- next part -------------- A non-text attachment was scrubbed... Name: Proposal_for_a_Reliable_Datagram_Socket_Interface.doc Type: application/msword Size: 51712 bytes Desc: not available URL: From Richard.Frank at oracle.com Fri Nov 4 15:29:57 2005 From: Richard.Frank at oracle.com (Rick Frank) Date: Fri, 4 Nov 2005 18:29:57 -0500 Subject: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB References: Message-ID: <004701c5e197$ac1d2670$6401a8c0@YOURA11C73D0FD> SCTP is connection based - we have many dependencies on our connectionless datagram model. 

----- Original Message -----
From: "Bob Woodruff"
To: "'Rimmer, Todd'" ; "Caitlin Bestler" ; "Rick Frank" ; "Pandit, Ranjit" ; "Grant Grundler"
Cc:
Sent: Friday, November 04, 2005 6:10 PM
Subject: RE: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB

> Todd wrote,
> >This would miss the second major improvement of RDS, namely removing the need for
> >the application to perform timeouts and retries on datagram packets. If Oracle
> >ran over UDP/IP/IPoIB it would not be guaranteed a loss-less reliable interface.
> >If UDP/IP/IPoIB provided a loss-less reliable interface it would likely break or
> >affect other UDP applications which are expecting a flow controlled interface.
>
> >Todd Rimmer
>
> Then use SCTP instead of UDP, which already provides a loss-less reliable
> interface.
> If SCTP has problems with the number of endpoints it can currently
> support,
> why not just fix that problem and fix IPoIB to use a connected model to
> increase performance, rather than inventing a completely new protocol
> and/or
> address family.
>
> Just a thought.
>
> woody

From robert.j.woodruff at intel.com Fri Nov 4 15:49:33 2005
From: robert.j.woodruff at intel.com (Bob Woodruff)
Date: Fri, 4 Nov 2005 15:49:33 -0800
Subject: [openib-general] [ANNOUNCE] Contribute RDS(ReliableDatagramSockets) to OpenIB
In-Reply-To: <004701c5e197$ac1d2670$6401a8c0@YOURA11C73D0FD>
Message-ID:

Rick wrote,
>SCTP is connection based - we have many dependencies on our connectionless
>datagram model.

I think I get it now.
I was just talking with Roy about SCTP, and he said the same thing, SCTP is a connected rather than datagram model, so SCTP does not seem to solve the problem since it has the same FD scaling problems as TCP.

>Of course for this to work - we will need RDS to be ubiquitous - supported
>on all interconnects - to include simple Ethernet NICs.

Makes sense.

woody

From robert.j.woodruff at intel.com Fri Nov 4 15:58:13 2005
From: robert.j.woodruff at intel.com (Bob Woodruff)
Date: Fri, 4 Nov 2005 15:58:13 -0800
Subject: [openib-general] [ANNOUNCE] Contribute RDS (ReliableDatagramSockets) to OpenIB
In-Reply-To: <96f8e60e0511041516v48a115a3m5025a2ffc3026f5a@mail.gmail.com>
Message-ID:

Ranjit wrote,
>RDS is somewhat like SDP in that it offloads/accelerates SOCK_DGRAM
>instead of SOCK_STREAM.

So back to the question from Roland that started this thread. When do you plan to re-work the code to use the OpenIB verbs and make it suitable for the kernel ? And do you plan to develop the code, or at least the infrastructure to allow multiple RDS providers to plug in so that it is ubiquitous - supported on all interconnects - to include simple Ethernet NICs ?

woody

From pradeep at us.ibm.com Fri Nov 4 15:59:38 2005
From: pradeep at us.ibm.com (Pradeep Satyanarayana)
Date: Fri, 4 Nov 2005 15:59:38 -0800
Subject: [openib-general] Data structure size mismatch
In-Reply-To: <1131143449.4340.5204.camel@hal.voltaire.com>
Message-ID:

Hal Rosenstock wrote on 11/04/2005 02:30:49 PM:

> On Fri, 2005-11-04 at 17:06, Pradeep Satyanarayana wrote:
> > I realize that address translation will be replaced shortly. However,
> > here are a few things that
> > I observed which I believe are important.
>
> Important to fix in what time frame ?
>
> > I recently saw an e-mail thread about compilation problems and
> > data structure padding; this is in line with that.
> >
> > So that new incarnation does not face the same pitfalls of address
> > translation, I will describe them here.
> >
> > When I tried running uatt it fails with -EFAULT. Debug revealed that
> > it fails. The following
> > copy_from_user() fails.
> >
> >         ib_route = kmalloc(sizeof *ib_route, GFP_KERNEL);
> >         if (!ib_route) {
> >                 result = -ENOMEM;
> >                 goto err1;
> >         }
> >
> >         if (copy_from_user(ib_route, cmd.ib_route, sizeof(ib_route))) {
> >                 result = -EFAULT;
> >                 goto err2;
> >         }
> >
> > In fact I believe this copy_from_user() is unnecessary since this will
> > be actually filled in by "address translation" and
> > passed back to user space later on.
>
> Not always. If I recall correctly, there is a case where this copy is
> needed. It is not in the mode that uatt uses AT right now though.

Maybe true, but there is still a 32-bit app 64-bit kernel issue that needs to be fixed, unless we agree to change the data structure to say incorporate a device_name as you suggest below.

> > So, if I eliminate this copy_from_user(), uatt again fails with
> > EFAULT in:
> >
> >         if (copy_to_user((void __user *)(unsigned long)cmd.response,
> >                          &resp, sizeof(resp))) {
> >                 result = -EFAULT;
> >                 goto err4;
> >         }
> >
> > The environment I was using a 32-bit app and 64-bit kernel on Power.
> > The reason is
> > struct ib_uat_route_by_ip_req has pointers in them (LP64 vs ILP32).
>
> This needs to be replaced by the port GID. Another alternative is the
> name. This has been discussed before on the list.
>
> -- Hal
>
> > I am told a 64-bit app succeeded on a 64-bit kernel which confirmed my
> > suspicions.
> >
> > Given that I took a quick look at all the places that copy_from_user()
> > is used (I did not
> > do this exercise for copy_to_user(), which would be the complete thing
> > to do) and found
> > that this (data structure size mismatch) potentially also occurs in
> > user_mad.c. I did not see any anomalies

Even if we change struct ib_uat_route_by_ip_req, there still is user_mad.c that needs to be looked into.

Pradeep
pradeep at us.ibm.com

From rolandd at cisco.com Fri Nov 4 16:00:02 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Fri, 04 Nov 2005 16:00:02 -0800
Subject: [openib-general] userverbs device node_type
In-Reply-To: <436BD3EB.9020407@ichips.intel.com> (Sean Hefty's message of "Fri, 04 Nov 2005 13:34:35 -0800")
References: <436BD3EB.9020407@ichips.intel.com>
Message-ID: <52hdas2ccd.fsf@cisco.com>

    Sean> Is there a way to get the node_type for an ibv_device?

Not at the moment... it's probably safe to assume it's a CA for now.

 - R.

From rolandd at cisco.com Fri Nov 4 16:09:39 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Fri, 04 Nov 2005 16:09:39 -0800
Subject: [openib-general] Data structure size mismatch
In-Reply-To: (Pradeep Satyanarayana's message of "Fri, 4 Nov 2005 15:59:38 -0800")
References:
Message-ID: <52d5lg2bwc.fsf@cisco.com>

>>>>> "Pradeep" == Pradeep Satyanarayana writes:

    Pradeep> Even if we change struct ib_uat_route_by_ip_req, there
    Pradeep> still is user_mad.c that needs to be looked into.

Could you be specific? As far as I can tell, all of the structures copied to and from userspace in user_mad.c are laid out identically for 32-bit and 64-bit architectures.

Thanks,
  Roland

From rolandd at cisco.com Fri Nov 4 16:20:03 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Fri, 04 Nov 2005 16:20:03 -0800
Subject: [openib-general] [git pull] IB updates
Message-ID: <523bmc2bf0.fsf@cisco.com>

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

This tree is also available from kernel.org mirrors at:

    rsync://rsync.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git for-linus

The pull will get the following changes:

Jack Morgenstein:
  [IB] mthca: check P_Key index in modify QP

Michael S.
Tsirkin:
  [IB] mthca: report asynchronous CQ events

Roland Dreier:
  [IPoIB] use spin_trylock_irqsave()
  [IB] uverbs: Avoid NULL pointer deref on CQ async event
  [IB] mthca: Avoid SRQ free WQE list corruption
  [IPoIB] cleanups: fix comment, remove useless variables
  [IB] kzalloc() conversions
  [IPoIB] remove unneeded initializations to 0
  [IPoIB] don't compile debug code if debugging isn't enabled
  [IB] mthca: fix format of FW version
  [IB] umad: fix hot remove of IB devices

Sean Hefty:
  [IB] ucm: 32/64 compatibility fixes

 drivers/infiniband/core/agent.c                |  3 -
 drivers/infiniband/core/cm.c                   |  6 +-
 drivers/infiniband/core/device.c               | 10 ---
 drivers/infiniband/core/mad.c                  | 31 ++++-----
 drivers/infiniband/core/sysfs.c                |  6 +-
 drivers/infiniband/core/ucm.c                  |  9 +--
 drivers/infiniband/core/user_mad.c             | 80 +++++++++++++++++++-----
 drivers/infiniband/core/uverbs.h               |  1
 drivers/infiniband/core/uverbs_cmd.c           |  1
 drivers/infiniband/core/uverbs_main.c          | 13 +---
 drivers/infiniband/hw/mthca/mthca_cq.c         | 31 +++++++++
 drivers/infiniband/hw/mthca/mthca_dev.h        |  4 +
 drivers/infiniband/hw/mthca/mthca_eq.c         |  4 +
 drivers/infiniband/hw/mthca/mthca_main.c       |  2 -
 drivers/infiniband/hw/mthca/mthca_mr.c         |  4 -
 drivers/infiniband/hw/mthca/mthca_profile.c    |  4 -
 drivers/infiniband/hw/mthca/mthca_provider.c   |  2 -
 drivers/infiniband/hw/mthca/mthca_qp.c         |  7 ++
 drivers/infiniband/hw/mthca/mthca_srq.c        | 13 ++--
 drivers/infiniband/ulp/ipoib/ipoib.h           |  3 +
 drivers/infiniband/ulp/ipoib/ipoib_ib.c        | 13 ++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c      | 24 ++-----
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |  8 ++
 drivers/infiniband/ulp/ipoib/ipoib_verbs.c     |  4 -
 include/rdma/ib_user_cm.h                      | 19 ++++--
 25 files changed, 178 insertions(+), 124 deletions(-)

From rolandd at cisco.com Fri Nov 4 16:21:10 2005
From: rolandd at cisco.com (Roland Dreier)
Date: Fri, 04 Nov 2005 16:21:10 -0800
Subject: [openib-general] [git pull] Add IB SCSI RDMA Protocol (storage) initiator
Message-ID: <52y8440wsp.fsf@cisco.com>

Linus, please pull from

    master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git srp

This tree is also available from kernel.org mirrors at:

    rsync://rsync.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git srp

The pull will get the following change, which adds an InfiniBand SCSI RDMA Protocol initiator (used to talk to IB storage devices).

Thanks,
  Roland

IB: Add SCSI RDMA Protocol (SRP) initiator

Add an InfiniBand SCSI RDMA Protocol (SRP) initiator. This driver is
used to talk to InfiniBand SRP targets (storage devices).

Signed-off-by: Roland Dreier

---

 drivers/infiniband/Kconfig          |    2
 drivers/infiniband/Makefile         |    1
 drivers/infiniband/ulp/srp/Kbuild   |    1
 drivers/infiniband/ulp/srp/Kconfig  |   11
 drivers/infiniband/ulp/srp/ib_srp.c | 1700 +++++++++++++++++++++++++++++++++++
 drivers/infiniband/ulp/srp/ib_srp.h |  150 +++
 include/scsi/srp.h                  |  226 +++++
 7 files changed, 2091 insertions(+), 0 deletions(-)
 create mode 100644 drivers/infiniband/ulp/srp/Kbuild
 create mode 100644 drivers/infiniband/ulp/srp/Kconfig
 create mode 100644 drivers/infiniband/ulp/srp/ib_srp.c
 create mode 100644 drivers/infiniband/ulp/srp/ib_srp.h
 create mode 100644 include/scsi/srp.h

applies-to: d918cd1ba0ef9afa692cef281afee2f6d6634a1e
aef9ec39c47f0cece886ddd6b53c440321e0b2a6
diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 325d502..bdf0891 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -33,4 +33,6 @@ source "drivers/infiniband/hw/mthca/Kcon
 
 source "drivers/infiniband/ulp/ipoib/Kconfig"
 
+source "drivers/infiniband/ulp/srp/Kconfig"
+
 endmenu
diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
index d256cf7..a43fb34 100644
--- a/drivers/infiniband/Makefile
+++ b/drivers/infiniband/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_INFINIBAND) += core/
 obj-$(CONFIG_INFINIBAND_MTHCA) += hw/mthca/
 obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/
+obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/
diff --git a/drivers/infiniband/ulp/srp/Kbuild
b/drivers/infiniband/ulp/srp/Kbuild
new file mode 100644
index 0000000..a16c73c
--- /dev/null
+++ b/drivers/infiniband/ulp/srp/Kbuild
@@ -0,0 +1 @@
+obj-$(CONFIG_INFINIBAND_SRP) += ib_srp.o
diff --git a/drivers/infiniband/ulp/srp/Kconfig b/drivers/infiniband/ulp/srp/Kconfig
new file mode 100644
index 0000000..8fe3be4
--- /dev/null
+++ b/drivers/infiniband/ulp/srp/Kconfig
@@ -0,0 +1,11 @@
+config INFINIBAND_SRP
+	tristate "InfiniBand SCSI RDMA Protocol"
+	depends on INFINIBAND && SCSI
+	---help---
+	  Support for the SCSI RDMA Protocol over InfiniBand. This
+	  allows you to access storage devices that speak SRP over
+	  InfiniBand.
+
+	  The SRP protocol is defined by the INCITS T10 technical
+	  committee. See .
+
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
new file mode 100644
index 0000000..2687e34
--- /dev/null
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -0,0 +1,1700 @@
+/*
+ * Copyright (c) 2005 Cisco Systems. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id: ib_srp.c 3932 2005-11-01 17:19:29Z roland $
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+
+#include