<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=US-ASCII">
<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2654.45">
<TITLE>RE: [Openib-windows] [PATCH v2] Handle RMPP send payload < MAD buffer length</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2>Hi Fab,</FONT>
</P>
<P><FONT SIZE=2>Please also add the patch as an attachment to your mail, since Exchange seems to break the text no matter what we do.</FONT>
</P>
<P><FONT SIZE=2>In any case, I applied the patch manually and got a blue screen after a few hours. (Actually, I was seeing this from time to time even without your change.)</FONT></P>
<P><FONT SIZE=2>I'm not very familiar with RMPP, but I think I know where the problem is:</FONT>
<BR><FONT SIZE=2>Replace the line:</FONT>
<BR> <FONT SIZE=2>max_len = h_send->p_send_mad->size - offset;</FONT>
<BR><FONT SIZE=2>with </FONT>
<BR> <FONT SIZE=2>max_len = h_send->p_send_mad->size - offset - hdr_len;</FONT>
<BR><FONT SIZE=2>(of course I might be wrong here).</FONT>
</P>
<P><FONT SIZE=2>I'll try to run with this change and let you know if there are still problems. Please note that since the problem is very hard to reproduce, it will be hard to tell whether this is indeed a fix.</FONT></P>
<P><FONT SIZE=2>Thanks</FONT>
<BR><FONT SIZE=2>Tzachi</FONT>
</P>
<BR>
<P><FONT SIZE=2>More info:</FONT>
</P>
<P><FONT SIZE=2>The line that caused the problem was</FONT>
</P>
<P><FONT SIZE=2> cl_memcpy(</FONT>
<BR><FONT SIZE=2> p_rmpp_dst + hdr_len, p_rmpp_src + hdr_len + offset, max_len );</FONT>
<BR> <FONT SIZE=2>}</FONT>
</P>
<P><FONT SIZE=2>The immediate problem was in the source part - that is:</FONT>
<BR><FONT SIZE=2>p_rmpp_src + hdr_len + offset + max_len crossed a page boundary, and the next page was not present. This immediately caused a blue screen.</FONT></P>
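<P><FONT SIZE=2>To illustrate the overrun with the values from the dump below, here is a minimal C sketch. The helper names and the 4 KB page size are assumptions made for illustration only; they are not part of al_mad.c.</FONT></P>

```c
/* Assumption: 4 KB pages (as on 32-bit x86). Not taken from al_mad.c. */
#define PAGE_SIZE_ASSUMED 0x1000UL

/* Hypothetical helper: number of bytes by which a read of read_len bytes
 * starting at read_start runs past the end of a buffer of buf_size bytes
 * that starts at buf_start. Returns 0 if the read stays inside the buffer. */
unsigned long bytes_past_end(unsigned long buf_start, unsigned long buf_size,
                             unsigned long read_start, unsigned long read_len)
{
    unsigned long buf_end = buf_start + buf_size;
    unsigned long read_end = read_start + read_len;

    return (read_end > buf_end) ? (read_end - buf_end) : 0;
}

/* Hypothetical helper: returns 1 if the first and last bytes of the read
 * lie on different pages, 0 otherwise. */
int crosses_page(unsigned long read_start, unsigned long read_len)
{
    unsigned long first_page;
    unsigned long last_page;

    if (read_len == 0)
        return 0;

    first_page = read_start / PAGE_SIZE_ASSUMED;
    last_page = (read_start + read_len - 1) / PAGE_SIZE_ASSUMED;

    return first_page != last_page;
}
```

<P><FONT SIZE=2>With p_rmpp_src = 0xfecf3ee8 and size = 0x118, the buffer ends at 0xfecf4000. The read starts at p_rmpp_src + hdr_len + offset = 0xfecf3f58, so with max_len = 0xc8 it runs 0x20 bytes past the end of the buffer and into the next page.</FONT></P>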
<BR>
<P><FONT SIZE=2>Some more info:</FONT>
<BR><FONT SIZE=2>p_rmpp_hdr->seg_num == 1</FONT>
</P>
<P><FONT SIZE=2>I believe that if you change max_len to be h_send->p_send_mad->size - offset - hdr_len</FONT>
<BR><FONT SIZE=2>it will solve the problem (at least in this example):</FONT>
<BR><FONT SIZE=2>=> max_len = 0x118 - 0x38 - 0x38 = 0xa8, and the copy operation will not read beyond this page.</FONT>
</P>
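<P><FONT SIZE=2>The proposed length calculation can be sketched roughly as follows. segment_copy_len is a hypothetical helper, not a function in al_mad.c, and it assumes that the read starts at p_rmpp_src + hdr_len + offset, as in the cl_memcpy call above.</FONT></P>

```c
/* Hypothetical helper (not part of al_mad.c): number of payload bytes that
 * can safely be copied for one segment.
 *
 * mad_size     - total size of the source MAD buffer (p_send_mad->size)
 * hdr_len      - header length
 * offset       - offset as computed in the patch
 * full_payload - payload size of a full segment (MAD_BLOCK_SIZE - hdr_len)
 *
 * Assumption: the source read starts at hdr_len + offset bytes into the
 * buffer, matching the cl_memcpy source pointer in the patch. */
unsigned long segment_copy_len(unsigned long mad_size, unsigned long hdr_len,
                               unsigned long offset, unsigned long full_payload)
{
    unsigned long read_start = hdr_len + offset;
    unsigned long remaining;

    /* Nothing left to copy if the read would start at or past the end. */
    if (read_start >= mad_size)
        return 0;

    remaining = mad_size - read_start;

    /* Never copy more than what is left in the source buffer. */
    return (full_payload > remaining) ? remaining : full_payload;
}
```

<P><FONT SIZE=2>With size = 0x118, hdr_len = 0x38, offset = 0x38 and a full payload of 0xc8, this returns 0xa8, so the read ends exactly at the end of the source buffer.</FONT></P>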
<P><FONT SIZE=2>The local variables are:</FONT>
</P>
<P><FONT SIZE=2> h_mad_reg = 0x813e733c</FONT>
<BR><FONT SIZE=2> h_send = 0xfef74570</FONT>
<BR><FONT SIZE=2> hdr_len = 0x38</FONT>
<BR><FONT SIZE=2> offset = 0x38</FONT>
<BR><FONT SIZE=2> p_rmpp_hdr = 0xfecf3ee8</FONT>
<BR><FONT SIZE=2> max_len = 0xc8</FONT>
<BR><FONT SIZE=2> p_rmpp_dst = 0x813beabc "???"</FONT>
<BR><FONT SIZE=2> p_send_wr = 0xfef745a8</FONT>
<BR><FONT SIZE=2> p_al_element = 0x813be9c0</FONT>
<BR><FONT SIZE=2> p_rmpp_src = 0xfecf3ee8 "???"</FONT>
</P>
<BR>
<P><FONT SIZE=2>1: kd> dt -r2 h_send</FONT>
<BR><FONT SIZE=2>Local var @ 0xf50f0b2c Type _al_mad_send*</FONT>
<BR><FONT SIZE=2>0xfef74570 </FONT>
<BR><FONT SIZE=2> +0x000 pool_item : _cl_pool_item</FONT>
<BR><FONT SIZE=2> +0x000 list_item : _cl_list_item</FONT>
<BR><FONT SIZE=2> +0x000 p_next : 0xfef3af10 </FONT>
<BR><FONT SIZE=2> +0x004 p_prev : 0xfef3af10 </FONT>
<BR><FONT SIZE=2> +0x008 p_list : 0xfef3af10 </FONT>
<BR><FONT SIZE=2> +0x00c pad : 0x46200000 </FONT>
<BR><FONT SIZE=2> +0x010 p_pool : 0x45484644 </FONT>
<BR><FONT SIZE=2> +0x000 num_components : ??</FONT>
<BR><FONT SIZE=2> +0x004 component_sizes : ???? </FONT>
<BR><FONT SIZE=2> +0x008 p_components : ???? </FONT>
<BR><FONT SIZE=2> +0x00c num_objects : ??</FONT>
<BR><FONT SIZE=2> +0x010 max_objects : ??</FONT>
<BR><FONT SIZE=2> +0x014 grow_size : ??</FONT>
<BR><FONT SIZE=2> +0x018 pfn_init : ???? </FONT>
<BR><FONT SIZE=2> +0x01c pfn_dtor : ???? </FONT>
<BR><FONT SIZE=2> +0x020 context : ???? </FONT>
<BR><FONT SIZE=2> +0x024 free_list : _cl_qlist</FONT>
<BR><FONT SIZE=2> +0x038 alloc_list : _cl_qlist</FONT>
<BR><FONT SIZE=2> +0x04c state : ??</FONT>
<BR><FONT SIZE=2> (No matching name)</FONT>
<BR><FONT SIZE=2> +0x014 p_send_mad : 0x813be9e8 </FONT>
<BR><FONT SIZE=2> +0x000 p_next : (null) </FONT>
<BR><FONT SIZE=2> +0x008 context1 : 0x001505e8 </FONT>
<BR><FONT SIZE=2> +0x010 context2 : (null) </FONT>
<BR><FONT SIZE=2> +0x018 p_mad_buf : 0xfecf3ee8 </FONT>
<BR><FONT SIZE=2> +0x000 base_ver : 0x1 ''</FONT>
<BR><FONT SIZE=2> +0x001 mgmt_class : 0x3 ''</FONT>
<BR><FONT SIZE=2> +0x002 class_ver : 0x2 ''</FONT>
<BR><FONT SIZE=2> +0x003 method : 0x92 ''</FONT>
<BR><FONT SIZE=2> +0x004 status : 0</FONT>
<BR><FONT SIZE=2> +0x006 class_spec : 0</FONT>
<BR><FONT SIZE=2> +0x008 trans_id : 0x9b000000`01000000</FONT>
<BR><FONT SIZE=2> +0x010 attr_id : 0x1100</FONT>
<BR><FONT SIZE=2> +0x012 resv : 0</FONT>
<BR><FONT SIZE=2> +0x014 attr_mod : 0</FONT>
<BR><FONT SIZE=2> +0x020 size : 0x118</FONT>
<BR><FONT SIZE=2> +0x024 immediate_data : 0</FONT>
<BR><FONT SIZE=2> +0x028 remote_qp : 0x1000000</FONT>
<BR><FONT SIZE=2> +0x030 h_av : 0x820f0854 </FONT>
<BR><FONT SIZE=2> +0x000 obj : _al_obj</FONT>
<BR><FONT SIZE=2> +0x0b8 h_ci_av : 0x813bf248 </FONT>
<BR><FONT SIZE=2> +0x0c0 av_attr : _ib_av_attr</FONT>
<BR><FONT SIZE=2> +0x0f8 list_item : _cl_list_item</FONT>
<BR><FONT SIZE=2> +0x038 send_opt : 4</FONT>
<BR><FONT SIZE=2> +0x03c remote_qkey : 0x180</FONT>
<BR><FONT SIZE=2> +0x040 resp_expected : 0</FONT>
<BR><FONT SIZE=2> +0x044 timeout_ms : 0x64</FONT>
<BR><FONT SIZE=2> +0x048 retry_cnt : 3</FONT>
<BR><FONT SIZE=2> +0x04c rmpp_version : 0 ''</FONT>
<BR><FONT SIZE=2> +0x050 status : d ( IB_WCS_UNKNOWN )</FONT>
<BR><FONT SIZE=2> +0x054 grh_valid : 0</FONT>
<BR><FONT SIZE=2> +0x058 p_grh : 0x813bea94 </FONT>
<BR><FONT SIZE=2> +0x000 ver_class_flow : 0</FONT>
<BR><FONT SIZE=2> +0x004 resv1 : 0</FONT>
<BR><FONT SIZE=2> +0x006 resv2 : 0 ''</FONT>
<BR><FONT SIZE=2> +0x007 hop_limit : 0 ''</FONT>
<BR><FONT SIZE=2> +0x008 src_gid : _ib_gid</FONT>
<BR><FONT SIZE=2> +0x018 dest_gid : _ib_gid</FONT>
<BR><FONT SIZE=2> +0x060 recv_opt : 0</FONT>
<BR><FONT SIZE=2> +0x064 remote_lid : 0</FONT>
<BR><FONT SIZE=2> +0x066 remote_sl : 0 ''</FONT>
<BR><FONT SIZE=2> +0x068 pkey_index : 0</FONT>
<BR><FONT SIZE=2> +0x06a path_bits : 0 ''</FONT>
<BR><FONT SIZE=2> +0x070 send_context1 : (null) </FONT>
<BR><FONT SIZE=2> +0x078 send_context2 : (null) </FONT>
<BR><FONT SIZE=2> +0x018 h_av : (null) </FONT>
<BR><FONT SIZE=2> +0x020 mad_wr : _al_mad_wr</FONT>
<BR><FONT SIZE=2> +0x000 list_item : _cl_list_item</FONT>
<BR><FONT SIZE=2> +0x000 p_next : (null) </FONT>
<BR><FONT SIZE=2> +0x004 p_prev : (null) </FONT>
<BR><FONT SIZE=2> +0x008 p_list : (null) </FONT>
<BR><FONT SIZE=2> +0x00c client_id : 1</FONT>
<BR><FONT SIZE=2> +0x010 client_tid : 0xd3510000`00000000</FONT>
<BR><FONT SIZE=2> +0x018 send_wr : _ib_send_wr</FONT>
<BR><FONT SIZE=2> +0x000 p_next : (null) </FONT>
<BR><FONT SIZE=2> +0x008 wr_id : 0</FONT>
<BR><FONT SIZE=2> +0x010 wr_type : 1 ( WR_SEND )</FONT>
<BR><FONT SIZE=2> +0x014 send_opt : 4</FONT>
<BR><FONT SIZE=2> +0x018 num_ds : 0</FONT>
<BR><FONT SIZE=2> +0x020 ds_array : (null) </FONT>
<BR><FONT SIZE=2> +0x028 immediate_data : 0</FONT>
<BR><FONT SIZE=2> +0x030 dgrm : _send_dgrm</FONT>
<BR><FONT SIZE=2> +0x050 remote_ops : _send_remote_ops</FONT>
<BR><FONT SIZE=2> +0x0a8 p_resp_mad : (null) </FONT>
<BR><FONT SIZE=2> +0x0b0 retry_time : 0xffffffff`ffffffff</FONT>
<BR><FONT SIZE=2> +0x0b8 delay : 0</FONT>
<BR><FONT SIZE=2> +0x0bc retry_cnt : 3</FONT>
<BR><FONT SIZE=2> +0x0c0 canceled : 0</FONT>
<BR><FONT SIZE=2> +0x0c4 uses_rmpp : 1</FONT>
<BR><FONT SIZE=2> +0x0c8 ack_seg : 0</FONT>
<BR><FONT SIZE=2> +0x0cc cur_seg : 2</FONT>
<BR><FONT SIZE=2> +0x0d0 seg_limit : 1</FONT>
<BR><FONT SIZE=2> +0x0d4 total_seg : 2</FONT>
<BR><FONT SIZE=2>Memory read error 45484690</FONT>
</P>
<BR>
<BR>
<BR>
<BR>
<P><FONT SIZE=2>>-----Original Message-----</FONT>
<BR><FONT SIZE=2>>From: Fab Tillier [<A HREF="mailto:ftillier@silverstorm.com">mailto:ftillier@silverstorm.com</A>]</FONT>
<BR><FONT SIZE=2>>Sent: Tuesday, October 04, 2005 2:42 AM</FONT>
<BR><FONT SIZE=2>>To: openib-windows@openib.org</FONT>
<BR><FONT SIZE=2>>Subject: RE: [Openib-windows] [PATCH v2] Handle RMPP send payload < MAD buffer length</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>>> From: Tillier, Fabian</FONT>
<BR><FONT SIZE=2>>> Sent: Monday, October 03, 2005 4:12 PM</FONT>
<BR><FONT SIZE=2>>></FONT>
<BR><FONT SIZE=2>>> Folks,</FONT>
<BR><FONT SIZE=2>>></FONT>
<BR><FONT SIZE=2>>> I found a bug in sending MADs where the last segment of an RMPP send would try</FONT>
<BR><FONT SIZE=2>>> to send a full payload's worth of data in the MAD, which could result in</FONT>
<BR><FONT SIZE=2>>> copying data beyond the end of the source buffer and the corresponding BSOD.</FONT>
<BR><FONT SIZE=2>>></FONT>
<BR><FONT SIZE=2>>> Here's a patch that corrects this. It does not clear the remaining bytes of</FONT>
<BR><FONT SIZE=2>>> the MAD, as I wasn't sure it was needed. Please take a look and confirm that</FONT>
<BR><FONT SIZE=2>>> what I'm doing is sane, and I'll check it in.</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>>Here's an updated version that adds a line to clear the unused portion of the</FONT>
<BR><FONT SIZE=2>>MAD payload. This is probably being overly cautious, but I didn't want to let a</FONT>
<BR><FONT SIZE=2>>previous MAD's potentially sensitive information be partially retransmitted.</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>>- Fab</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>>Signed-off-by: Fab Tillier (ftillier@silverstorm.com)</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>>Index: core/al/al_mad.c</FONT>
<BR><FONT SIZE=2>>===================================================================</FONT>
<BR><FONT SIZE=2>>--- core/al/al_mad.c (revision 100)</FONT>
<BR><FONT SIZE=2>>+++ core/al/al_mad.c (working copy)</FONT>
<BR><FONT SIZE=2>>@@ -1681,6 +1681,7 @@</FONT>
<BR><FONT SIZE=2>> al_mad_element_t *p_al_element;</FONT>
<BR><FONT SIZE=2>> ib_rmpp_mad_t *p_rmpp_hdr;</FONT>
<BR><FONT SIZE=2>> uint8_t *p_rmpp_src, *p_rmpp_dst;</FONT>
<BR><FONT SIZE=2>>+ uintn_t hdr_len, offset, max_len;</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> CL_ENTER( AL_DBG_MAD_SVC, g_al_dbg_lvl );</FONT>
<BR><FONT SIZE=2>> p_send_wr = &h_send->mad_wr.send_wr;</FONT>
<BR><FONT SIZE=2>>@@ -1702,28 +1703,32 @@</FONT>
<BR><FONT SIZE=2>> p_rmpp_dst = (uint8_t*)(uintn_t)p_al_element->mad_ds.vaddr;</FONT>
<BR><FONT SIZE=2>> #endif</FONT>
<BR><FONT SIZE=2>> p_rmpp_src = (uint8_t* __ptr64)h_send->p_send_mad->p_mad_buf;</FONT>
<BR><FONT SIZE=2>>- p_rmpp_hdr = (ib_rmpp_mad_t* __ptr64)h_send->p_send_mad->p_mad_buf;</FONT>
<BR><FONT SIZE=2>>+ p_rmpp_hdr = (ib_rmpp_mad_t*)p_rmpp_src;</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> if( h_send->p_send_mad->p_mad_buf->mgmt_class == IB_MCLASS_SUBN_ADM )</FONT>
<BR><FONT SIZE=2>>- {</FONT>
<BR><FONT SIZE=2>>- /* Copy the header into the registered send buffer. */</FONT>
<BR><FONT SIZE=2>>- cl_memcpy( p_rmpp_dst, p_rmpp_src, IB_SA_MAD_HDR_SIZE );</FONT>
<BR><FONT SIZE=2>>- /* Copy this segment's payload into the registered send buffer. */</FONT>
<BR><FONT SIZE=2>>- p_rmpp_dst = p_rmpp_dst + IB_SA_MAD_HDR_SIZE;</FONT>
<BR><FONT SIZE=2>>- p_rmpp_src = p_rmpp_src + IB_SA_MAD_HDR_SIZE +</FONT>
<BR><FONT SIZE=2>>- ( (cl_ntoh32( p_rmpp_hdr->seg_num ) - 1) * IB_SA_DATA_SIZE );</FONT>
<BR><FONT SIZE=2>>- cl_memcpy( p_rmpp_dst, p_rmpp_src, IB_SA_DATA_SIZE );</FONT>
<BR><FONT SIZE=2>>- }</FONT>
<BR><FONT SIZE=2>>+ hdr_len = IB_SA_MAD_HDR_SIZE;</FONT>
<BR><FONT SIZE=2>> else</FONT>
<BR><FONT SIZE=2>>+ hdr_len = MAD_RMPP_HDR_SIZE;</FONT>
<BR><FONT SIZE=2>>+</FONT>
<BR><FONT SIZE=2>>+ max_len = MAD_BLOCK_SIZE - hdr_len;</FONT>
<BR><FONT SIZE=2>>+</FONT>
<BR><FONT SIZE=2>>+ offset = hdr_len + (max_len * (cl_ntoh32( p_rmpp_hdr->seg_num ) - 1));</FONT>
<BR><FONT SIZE=2>>+</FONT>
<BR><FONT SIZE=2>>+ /* Copy the header into the registered send buffer. */</FONT>
<BR><FONT SIZE=2>>+ cl_memcpy( p_rmpp_dst, p_rmpp_src, hdr_len );</FONT>
<BR><FONT SIZE=2>>+</FONT>
<BR><FONT SIZE=2>>+ /* Copy this segment's payload into the registered send buffer. */</FONT>
<BR><FONT SIZE=2>>+ CL_ASSERT( h_send->p_send_mad->size != offset );</FONT>
<BR><FONT SIZE=2>>+ if( (h_send->p_send_mad->size - offset) < max_len )</FONT>
<BR><FONT SIZE=2>> {</FONT>
<BR><FONT SIZE=2>>- /* Copy the header into the registered send buffer. */</FONT>
<BR><FONT SIZE=2>>- cl_memcpy( p_rmpp_dst, p_rmpp_src, MAD_RMPP_HDR_SIZE );</FONT>
<BR><FONT SIZE=2>>- /* Copy this segment's payload into the registered send buffer. */</FONT>
<BR><FONT SIZE=2>>- p_rmpp_dst = p_rmpp_dst + MAD_RMPP_HDR_SIZE;</FONT>
<BR><FONT SIZE=2>>- p_rmpp_src = p_rmpp_src + MAD_RMPP_HDR_SIZE +</FONT>
<BR><FONT SIZE=2>>- ( (cl_ntoh32( p_rmpp_hdr->seg_num ) - 1) * MAD_RMPP_DATA_SIZE );</FONT>
<BR><FONT SIZE=2>>- cl_memcpy( p_rmpp_dst, p_rmpp_src, MAD_RMPP_DATA_SIZE );</FONT>
<BR><FONT SIZE=2>>+ max_len = h_send->p_send_mad->size - offset;</FONT>
<BR><FONT SIZE=2>>+ /* Clear unused payload. */</FONT>
<BR><FONT SIZE=2>>+ cl_memclr( p_rmpp_dst + hdr_len + max_len,</FONT>
<BR><FONT SIZE=2>>+ MAD_BLOCK_SIZE - hdr_len - max_len );</FONT>
<BR><FONT SIZE=2>> }</FONT>
<BR><FONT SIZE=2>>+</FONT>
<BR><FONT SIZE=2>>+ cl_memcpy(</FONT>
<BR><FONT SIZE=2>>+ p_rmpp_dst + hdr_len, p_rmpp_src + hdr_len + offset, max_len );</FONT>
<BR><FONT SIZE=2>> }</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>> p_send_wr->num_ds = 1;</FONT>
<BR><FONT SIZE=2>></FONT>
<BR><FONT SIZE=2>>_______________________________________________</FONT>
<BR><FONT SIZE=2>>openib-windows mailing list</FONT>
<BR><FONT SIZE=2>>openib-windows@openib.org</FONT>
<BR><FONT SIZE=2>><A HREF="http://openib.org/mailman/listinfo/openib-windows" TARGET="_blank">http://openib.org/mailman/listinfo/openib-windows</A></FONT>
</P>
</BODY>
</HTML>