[ofw] opensm 3.3.11 crash

Smith, Stan stan.smith at intel.com
Thu Oct 27 08:49:16 PDT 2011


Hello,
PSB.

From: Leonid Keller [mailto:leonid at mellanox.com]
Sent: Thursday, October 27, 2011 5:52 AM
To: Smith, Stan; Hefty, Sean
Cc: ofw_list
Subject: RE: opensm 3.3.11 crash

What about Vista ?
(I'm not sure whether it's relevant for you)
We are compiling code for Server 2008  with 6001.18001 compiler.
We got two warnings.

[stan]
I use the compiler from the Windows Driver Kit version 7.1.0 release  'C:\WinDDK\7600.16385.1' and compile for win7, Svr2008-R2, Svr2008(wlh) and Vista on x86, x64 & ia64.
None of the compiles of osm_link_mgr.c show any compilation errors/warnings?

Googling around I see the 6001.1801 compiler is from a much older DDK, around 2009.
You might want to consider updating your compilation environment (WDK).

Stan.


In link_mgr_set_physp_pi() on zero-size memset:

                memset(payload + sizeof(ib_port_info_t), 0,
                       IB_SMP_DATA_SIZE - sizeof(ib_port_info_t));

( Because after the last changes IB_SMP_DATA_SIZE - sizeof(ib_port_info_t) = 0; )

I've just removed the memset.

Index: B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_link_mgr.c
===================================================================
--- B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_link_mgr.c            (revision 9050)
+++ B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_link_mgr.c         (revision 9051)
@@ -157,8 +157,6 @@
                }

                memcpy(payload, p_old_pi, sizeof(ib_port_info_t));
-              memset(payload + sizeof(ib_port_info_t), 0,
-                     IB_SMP_DATA_SIZE - sizeof(ib_port_info_t));

                /*
                   Should never write back a value that is bigger then 3 in



Another warnings are in  osm_torus.c:

The warnings I witnessed in compilation osm_torus.c required the cast (bool) in order to get a clean (warnings free compile).
The (bool) cast is not in the OFED for Linux code base.

13>b:\users\leonid\svn\winib\trunk\ulp\opensm\user\opensm\osm_torus.c(1191) : warning C4311: 'type cast' : pointer truncation from  'f_switch *' to 'unsigned int'
And some more like that.
I excluded them with #pragma - didn't want to change Linux code.

Index: B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_torus.c
===================================================================
--- B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_torus.c   (revision 9078)
+++ B:/users/leonid/svn/winib/trunk/ulp/opensm/user/opensm/osm_torus.c                (revision 9079)
@@ -1188,7 +1188,9 @@
                                port_cnt = osm_node_get_num_physp(osm_node);
                                sw_guid = osm_node_get_node_guid(osm_node);

+#pragma warning( disable:4311 )
                                success = (bool)alloc_fswitch(fabric, sw_guid, port_cnt);
+#pragma warning( default:4311 )
                                if (!success)
                                                goto out;
                }
@@ -1457,6 +1459,7 @@
                jp1 = canonicalize(j + 1, t->y_sz);
                kp1 = canonicalize(k + 1, t->z_sz);

+#pragma warning( disable:4311 )
                fp  = set_fp_bit((bool)t->sw[i][j][k], 0, 0, 0);
                fp |= set_fp_bit((bool)t->sw[ip1][j][k], x_sz_gt1, 0, 0);
                fp |= set_fp_bit((bool)t->sw[i][jp1][k], 0, y_sz_gt1, 0);
@@ -1465,6 +1468,7 @@
                fp |= set_fp_bit((bool)t->sw[ip1][j][kp1], x_sz_gt1, 0, z_sz_gt1);
                fp |= set_fp_bit((bool)t->sw[i][jp1][kp1], 0, y_sz_gt1, z_sz_gt1);
                fp |= set_fp_bit((bool)t->sw[ip1][jp1][kp1], x_sz_gt1, y_sz_gt1, z_sz_gt1);
+#pragma warning( default:4311 )

                fp |= x_sz_gt1 << 8;
                fp |= y_sz_gt1 << 9;
@@ -8789,6 +8793,7 @@
                return IB_SUCCESS;
 }

+#pragma warning( disable:4311 )
 static
 bool good_xy_ring(struct torus *t, const int x, const int y, const int z)
 {
@@ -8804,6 +8809,7 @@

                return good_ring;
 }
+#pragma warning( default:4311 )

 static
 struct t_switch *find_plane_mid(struct torus *t, const int z)
@@ -8830,6 +8836,7 @@
                return NULL;
 }

+#pragma warning( disable:4311 )
 static
 struct t_switch *find_stree_root(struct torus *t)
 {
@@ -8892,6 +8899,7 @@
 out:
                return root;
 }
+#pragma warning( default:4311 )

 static
 bool sw_in_master_stree(struct t_switch *sw)




From: Smith, Stan [mailto:stan.smith at intel.com]
Sent: Monday, October 17, 2011 8:20 PM
To: Leonid Keller; Hefty, Sean
Cc: ofw_list
Subject: RE: opensm 3.3.11 crash

No problems Leo; please feel to criticize, I value your insights and experience - all comments are welcome; that's the way the code gets better!
I just wanted you to understand you are way ahead of us in win8 testing...

Stan.

From: Leonid Keller [mailto:leonid at mellanox.co.il]
Sent: Monday, October 17, 2011 9:56 AM
To: Smith, Stan; Hefty, Sean
Cc: ofw_list
Subject: RE: opensm 3.3.11 crash

Stan, I absolutely didn't mean to criticize or so. :(
I know you didn't work with win8, but even if you did, so what ?
It was just a bug  report with mentioning where it happened.
Because I'm not sure you can reproduce it in win7, for example.

From: Smith, Stan [mailto:stan.smith at intel.com]
Sent: Monday, October 17, 2011 6:17 PM
To: Leonid Keller; Hefty, Sean
Cc: ofw_list
Subject: RE: opensm 3.3.11 crash

Hi Leo,
  Just so everyone understands, neither Sean nor myself have done any testing on Win8; we are still trying to get winOFED 3.0 out the door...  :)
Please continue the win8 reports as it only helps, just understand no one here has walked the win8 path.

Stan.

From: ofw-bounces at lists.openfabrics.org [mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Leonid Keller
Sent: Monday, October 17, 2011 8:26 AM
To: Hefty, Sean
Cc: ofw_list
Subject: [ofw] opensm 3.3.11 crash

Hi Sean,

We experienced a crash of opensm in win8.
The core reason for the crash was that WinMad driver has not started for some reason.
So libibumad couldn't work.
But it had a bug in error flow which caused an access violation.

I bring here two quotes of umad_open_port() function with my comments:

// umad_open_port

...
                hr = WmGetObject(IID_IWMProvider, (LPVOID*) &ports[portid].prov);
                if (FAILED(hr)) {
                                CloseHandle(ports[portid].overlap.hEvent);
                                portid = GetLastError() & 0x80000000;
                                goto out;
                }
                // if there is no providers WmGetObject will return error, but portid wil be set to 0
                // We saw it when GetLastError() returned 6. Maybe you mean it to be HRESULT_FROM_WIN32(GetLastError()) ?

                // As a result umad_open_port() returns 0, i.e. SUCCESS which causes later an access violation.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20111027/d548ecc5/attachment.html>


More information about the ofw mailing list