[ofw] opensm 3.3.11 crash

Hal Rosenstock hal.rosenstock at gmail.com
Thu Oct 27 09:01:04 PDT 2011


On Thu, Oct 27, 2011 at 11:49 AM, Smith, Stan <stan.smith at intel.com> wrote:

>  Hello,****
>
> PSB.****
>
> ** **
>
> *From:* Leonid Keller [mailto:leonid at mellanox.com]
> *Sent:* Thursday, October 27, 2011 5:52 AM
>
> *To:* Smith, Stan; Hefty, Sean
> *Cc:* ofw_list
> *Subject:* RE: opensm 3.3.11 crash****
>
>  ** **
>
> What about Vista ?****
>
> (I’m not sure whether it’s relevant for you)****
>
> We are compiling code for Server 2008  with 6001.18001 compiler.****
>
> We got two warnings.****
>
> ** **
>
> [stan]****
>
> I use the compiler from the Windows Driver Kit version 7.1.0 release
>  ‘C:\WinDDK\7600.16385.1’ and compile for win7, Svr2008-R2, Svr2008(wlh) and
> Vista on x86, x64 & ia64.****
>
> None of the compiles of osm_link_mgr.c show any compilation
> errors/warnings?****
>
> ** **
>
> Googling around I see the 6001.1801 compiler is from a much older DDK,
> around 2009.****
>
> You might want to consider updating your compilation environment (WDK).***
> *
>
> ** **
>
> Stan.****
>
> ** **
>
> ** **
>
> In link_mgr_set_physp_pi() on zero-size memset:****
>
> ** **
>
>                 memset(payload + sizeof(ib_port_info_t), 0,****
>
>                        IB_SMP_DATA_SIZE - sizeof(ib_port_info_t));****
>
> ** **
>
> ( Because after the last changes IB_SMP_DATA_SIZE - sizeof(ib_port_info_t)
> = 0; )****
>
> ** **
>
> I’ve just removed the memset.
>
>
>

I know I removed other similar (size 0) memset but somehow missed this one
:-( I'll send an official patch for this shortly so it'll be fixed in the
next version.

-- Hal


>
>
> ** **
>
> Index:
> B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_link_mgr.c****
>
> ===================================================================****
>
> ---
> B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_link_mgr.c
> (revision 9050)****
>
> +++
> B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_link_mgr.c
> (revision 9051)****
>
> @@ -157,8 +157,6 @@****
>
>                 }****
>
>  ****
>
>                 memcpy(payload, p_old_pi, sizeof(ib_port_info_t));****
>
> -              memset(payload + sizeof(ib_port_info_t), 0,****
>
> -                     IB_SMP_DATA_SIZE - sizeof(ib_port_info_t));****
>
>  ****
>
>                 /*****
>
>                    Should never write back a value that is bigger then 3 in
> ****
>
> ** **
>
> ** **
>
> ** **
>
> Another warnings are in  osm_torus.c:****
>
> ** **
>
> The warnings I witnessed in compilation osm_torus.c required the cast
> (bool) in order to get a clean (warnings free compile).****
>
> The (bool) cast is not in the OFED for Linux code base.****
>
> ** **
>
> 13>b:\users\leonid\svn\winib\trunk\ulp\opensm\user\opensm\osm_torus.c(1191)
> : warning C4311: 'type cast' : pointer truncation from  'f_switch *' to
> 'unsigned int'****
>
> And some more like that.****
>
> I excluded them with #pragma – didn’t want to change Linux code.****
>
> ** **
>
> Index: B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_torus.c*
> ***
>
> ===================================================================****
>
> --- B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_torus.c
> (revision 9078)****
>
> +++
> B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_torus.c
> (revision 9079)****
>
> @@ -1188,7 +1188,9 @@****
>
>                                 port_cnt =
> osm_node_get_num_physp(osm_node);****
>
>                                 sw_guid = osm_node_get_node_guid(osm_node);
> ****
>
>  ****
>
> +#pragma warning( disable:4311 )****
>
>                                 success = (bool)alloc_fswitch(fabric,
> sw_guid, port_cnt);****
>
> +#pragma warning( default:4311 )****
>
>                                 if (!success)****
>
>                                                 goto out;****
>
>                 }****
>
> @@ -1457,6 +1459,7 @@****
>
>                 jp1 = canonicalize(j + 1, t->y_sz);****
>
>                 kp1 = canonicalize(k + 1, t->z_sz);****
>
>  ****
>
> +#pragma warning( disable:4311 )****
>
>                 fp  = set_fp_bit((bool)t->sw[i][j][k], 0, 0, 0);****
>
>                 fp |= set_fp_bit((bool)t->sw[ip1][j][k], x_sz_gt1, 0, 0);*
> ***
>
>                 fp |= set_fp_bit((bool)t->sw[i][jp1][k], 0, y_sz_gt1, 0);*
> ***
>
> @@ -1465,6 +1468,7 @@****
>
>                 fp |= set_fp_bit((bool)t->sw[ip1][j][kp1], x_sz_gt1, 0,
> z_sz_gt1);****
>
>                 fp |= set_fp_bit((bool)t->sw[i][jp1][kp1], 0, y_sz_gt1,
> z_sz_gt1);****
>
>                 fp |= set_fp_bit((bool)t->sw[ip1][jp1][kp1], x_sz_gt1,
> y_sz_gt1, z_sz_gt1);****
>
> +#pragma warning( default:4311 )****
>
>  ****
>
>                 fp |= x_sz_gt1 << 8;****
>
>                 fp |= y_sz_gt1 << 9;****
>
> @@ -8789,6 +8793,7 @@****
>
>                 return IB_SUCCESS;****
>
>  }****
>
>  ****
>
> +#pragma warning( disable:4311 )****
>
>  static****
>
>  bool good_xy_ring(struct torus *t, const int x, const int y, const int z)
> ****
>
>  {****
>
> @@ -8804,6 +8809,7 @@****
>
>  ****
>
>                 return good_ring;****
>
>  }****
>
> +#pragma warning( default:4311 )****
>
>  ****
>
>  static****
>
>  struct t_switch *find_plane_mid(struct torus *t, const int z)****
>
> @@ -8830,6 +8836,7 @@****
>
>                 return NULL;****
>
>  }****
>
>  ****
>
> +#pragma warning( disable:4311 )****
>
>  static****
>
>  struct t_switch *find_stree_root(struct torus *t)****
>
>  {****
>
> @@ -8892,6 +8899,7 @@****
>
>  out:****
>
>                 return root;****
>
>  }****
>
> +#pragma warning( default:4311 )****
>
>  ****
>
>  static****
>
>  bool sw_in_master_stree(struct t_switch *sw)****
>
> ** **
>
> ** **
>
> ** **
>
> ** **
>
> *From:* Smith, Stan [mailto:stan.smith at intel.com]
> *Sent:* Monday, October 17, 2011 8:20 PM
>
> *To:* Leonid Keller; Hefty, Sean
> *Cc:* ofw_list
> *Subject:* RE: opensm 3.3.11 crash****
>
>  ** **
>
> No problems Leo; please feel to criticize, I value your insights and
> experience – all comments are welcome; that’s the way the code gets better!
> ****
>
> I just wanted you to understand you are way ahead of us in win8 testing…**
> **
>
> ** **
>
> Stan.****
>
> ** **
>
> *From:* Leonid Keller [mailto:leonid at mellanox.co.il]
> *Sent:* Monday, October 17, 2011 9:56 AM
> *To:* Smith, Stan; Hefty, Sean
> *Cc:* ofw_list
> *Subject:* RE: opensm 3.3.11 crash****
>
> ** **
>
> Stan, I absolutely didn’t mean to criticize or so. L****
>
> I know you didn’t work with win8, but even if you did, so what ?****
>
> It was just a bug  report with mentioning where it happened.****
>
> Because I’m not sure you can reproduce it in win7, for example.****
>
> ** **
>
> *From:* Smith, Stan [mailto:stan.smith at intel.com]
> *Sent:* Monday, October 17, 2011 6:17 PM
> *To:* Leonid Keller; Hefty, Sean
> *Cc:* ofw_list
> *Subject:* RE: opensm 3.3.11 crash****
>
> ** **
>
> Hi Leo,****
>
>   Just so everyone understands, neither Sean nor myself have done any
> testing on Win8; we are still trying to get winOFED 3.0 out the door…  J**
> **
>
> Please continue the win8 reports as it only helps, just understand no one
> here has walked the win8 path.****
>
> ** **
>
> Stan.****
>
> ** **
>
> *From:* ofw-bounces at lists.openfabrics.org [mailto:
> ofw-bounces at lists.openfabrics.org] *On Behalf Of *Leonid Keller
> *Sent:* Monday, October 17, 2011 8:26 AM
> *To:* Hefty, Sean
> *Cc:* ofw_list
> *Subject:* [ofw] opensm 3.3.11 crash****
>
> ** **
>
> Hi Sean,****
>
> ** **
>
> We experienced a crash of opensm in win8.****
>
> The core reason for the crash was that WinMad driver has not started for
> some reason.****
>
> So libibumad couldn’t work.****
>
> But it had a bug in error flow which caused an access violation.****
>
> ** **
>
> I bring here two quotes of umad_open_port() function with my comments:****
>
> ** **
>
> // umad_open_port****
>
> ** **
>
> …****
>
>                 hr = WmGetObject(IID_IWMProvider, (LPVOID*)
> &ports[portid].prov);****
>
>                 if (FAILED(hr)) {****
>
>                                 CloseHandle(ports[portid].overlap.hEvent);
> ****
>
>                                 portid = GetLastError() & 0x80000000;****
>
>                                 goto out;****
>
>                 }****
>
>                 // if there is no providers WmGetObject will return error,
> but portid wil be set to 0****
>
>                 // We saw it when GetLastError() returned 6. Maybe you mean
> it to be HRESULT_FROM_WIN32(GetLastError()) ?****
>
> ** **
>
>                 // As a result umad_open_port() returns 0, i.e. SUCCESS
> which causes later an access violation.****
>
> ** **
>
> ** **
>
> _______________________________________________
> ofw mailing list
> ofw at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ofw
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20111027/0d591346/attachment.html>


More information about the ofw mailing list