[openib-general] Re: [Fwd: Re: [Fwd: Re: OpenSM Bugs]]

Tom Duffy tduffy at sun.com
Mon Jan 17 16:12:22 PST 2005


[ putting back on openib-general list ]

On Mon, 2005-01-17 at 15:27 -0500, Hal Rosenstock wrote:
> On Mon, 2005-01-17 at 14:47, Tom Duffy wrote:
> > On Sat, 2005-01-15 at 07:30 -0500, Hal Rosenstock wrote:
> > > I will have another patch later today that may actually get this
> > > working. I forgot (hopefully) one last thing.
> > 
> > After using the latest OpenSM, I am getting a hang on Solaris when
> > running devfsadm -C.  This is new behavior.  There is no debug output
> > at debug level 2, so I bumped it up to 3 and got this:
> > 
> > [root at dongon ~]# devfsadm -C
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_sa_session_open: opening session, guid = 0002c901097651d1, prefix = 0000000000000003
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_sa_session_open(): port exists
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_add_client: num_registered_clients 2
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_sa_session_open: clientp = 30001e97068, subnetp = 300024b0c50
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_add_event_subscriber: Adding client to event subscriber list, client = 0x1e97068
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_sa_access_start() enter. attr_id = 0x35, access_type = 0x0, comp_mask = 0000000000001808
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_check_sa_support: cap_mask = 0x202, attr_id = 0x35
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_check_sa_support() exiting, attr_supported = 1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_populate_ud_dest_list(): Count not below low water mark
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_init_msg: Sending MAD, class = 0x3, method = 0x12, attr_id = 0x35
> That's an SA GetTable (method 0x12) for PathRecords (attr_id 0x35) of some sort.
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_get_attr_id_length(): attr_id: 0x35 size 64
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_init_msg: Packed payload successfully, attr_id = 0x35, length = 64
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_saa_impl_init_msg() exiting ibmf_status = 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): Added message, msgp = 0x30003968200, class = 0x3, method = 0x12, attributeID = 0x35
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): msgp = 0x30003968200, TID = 0x97651d100000005, transp_op_flags = 0x2
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): msgp = 0x30003968200, local_lid = 0x2, remote_lid = 0x1, remote_qpn = 0x1, block = 1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): unsetting timer 30003968200 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_msg_transport(): blocking for completion, msgp = 0x30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg_client(): Found message. Inc ref count, msgp = 0x30003968200, ref_cnt = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_send_compl(): Sequenced transaction, setting response timer msgp = 30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_set_timer: setting response timer, interval = 1073745 resp_time 4 round trip time 10624d
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_send_cb(): Send callback done.  Dec ref count, msg = 30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): Received MAD, tid = 097651d100000005, class = 0x3, attrID = 0x35, lid = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg(): Comparing to msg, msgp = 0x30003968200, tid = 0x97651d100000005, remote_lid = 0x1, mgmt_class = 0x3
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg(): Found message. Inc ref count, msgp = 0x30003968200, ref_cnt = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): Handling rmpp MAD, tid = 097651d100000005,flags = 0x7 rmpp_type = 1, rmpp_segnum = 0
> This is the SA response: a DATA packet (rmpp_type 1) with the First and Last (and Active) flags set.
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): first RMPP pkt received, msgimplp = 30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb: new resp time received, resp_time 0
> Oops. I forgot about setting RRespTime in the RMPP header too.
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_rmpp_recvr_active_flow(): DATA packet received, processing packet
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_rmpp_recvr_flow_main(): segnum = 0, es = 1, wl = 1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_rmpp_recvr_flow_main(): Unexpected segment number, discarding packet
> I also need to set SegmentNumber (to 1, as this is the First packet) and PayloadLength in the RMPP header of the DATA packet.
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_send_rmpp(): msgp = 0x30003968200, next_seg = 0x0, num_pkts = 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_init_send_wqe: msgimplp = 30003968200, rmpp_type = 2, next_seg = 0, num_pkts = 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_init_send_wqe: msgimplp = 30003968200, rmpp_type = 2, rmpp_flags = 0x1, rmpp_segnum = 0, pyld_nwl = 5
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_set_timer: setting response timer, interval = 1073742 resp_time 1 round trip time 10624d
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg_client(): Found message. Inc ref count, msgp = 0x30003968200, ref_cnt = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_send_compl(): Received send callback for RMPP trans msgp = 30003968200, rmpp_state = 0x3
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_send_cb(): Send callback done.  Dec ref count, msg = 30003968200
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): Received MAD, tid = 097651d100000005, class = 0x3, attrID = 0x35, lid = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg(): Comparing to msg, msgp = 0x30003968200, tid = 0x97651d100000005, remote_lid = 0x1, mgmt_class = 0x3
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_find_msg(): Found message. Inc ref count, msgp = 0x30003968200, ref_cnt = 0x1
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb(): Handling rmpp MAD, tid = 097651d100000005,flags = 0x1 rmpp_type = 2, rmpp_segnum = 0
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_do_recv_cb: new resp time received, resp_time 14
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_rmpp_recvr_active_flow(): ACK packet received, discarding packet
> > Jan 17 11:29:25 dongon.SFBay.Sun.COM ibmf: ibmf_i_set_timer: setting response timer, interval = 1090125 resp_time 4000 round trip time 10624d
> > Jan 17 11:29:26 dongon.SFBay.Sun.COM ibmf: ibmf_i_send_timeout(): resetting id - 893736
> > Jan 17 11:29:26 dongon.SFBay.Sun.COM ibmf: ibmf_i_send_timeout(): Message not in undefined state, return without processing send timeout, msgp = 0x30003968200
> > 
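The identifiers in the trace above can be decoded against the InfiniBand Architecture specification: management class 0x3 is Subnet Administration, method 0x12 is GetTable and attribute 0x35 is PathRecord; in the RMPP header, rmpp_type 1 is DATA and 2 is ACK, and flag bits 0, 1 and 2 are Active, First and Last. The following is a minimal C sketch of that decoding. The helper names (`describe_sa_mad`, `rmpp_flags_str`, etc.) are hypothetical, not part of ibmf or OpenSM, and only a handful of codes are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Management class for Subnet Administration (IBA spec). */
#define MGMT_CLASS_SUBN_ADM 0x03

/* A few SA methods; 0x12 is the GetTable seen in the trace. */
static const char *sa_method_name(uint8_t method)
{
    switch (method) {
    case 0x01: return "Get";
    case 0x02: return "Set";
    case 0x12: return "GetTable";
    case 0x81: return "GetResp";
    case 0x92: return "GetTableResp";
    default:   return "other";
    }
}

/* A few SA attribute IDs; 0x35 is PathRecord. */
static const char *sa_attr_name(uint16_t attr_id)
{
    switch (attr_id) {
    case 0x0011: return "NodeRecord";
    case 0x0012: return "PortInfoRecord";
    case 0x0035: return "PathRecord";
    case 0x0038: return "MCMemberRecord";
    default:     return "other";
    }
}

/* RMPPType values: 1 DATA, 2 ACK, 3 STOP, 4 ABORT. */
static const char *rmpp_type_name(uint8_t type)
{
    switch (type) {
    case 1:  return "DATA";
    case 2:  return "ACK";
    case 3:  return "STOP";
    case 4:  return "ABORT";
    default: return "unknown";
    }
}

/* Renders the 3-bit RMPPFlags field, e.g. 0x7 -> "Active|First|Last". */
static void rmpp_flags_str(uint8_t flags, char *buf, size_t len)
{
    static const char *const names[3] = { "Active", "First", "Last" };

    assert(len > 0);
    buf[0] = '\0';
    for (int bit = 0; bit < 3; bit++) {
        if (!(flags & (1u << bit)))
            continue;
        size_t used = strlen(buf);
        /* snprintf truncates safely if buf is too small */
        snprintf(buf + used, len - used, "%s%s", used ? "|" : "", names[bit]);
    }
}

/* Formats e.g. "SA GetTable(PathRecord)"; returns -1 for non-SA classes. */
static int describe_sa_mad(uint8_t mgmt_class, uint8_t method,
                           uint16_t attr_id, char *buf, size_t len)
{
    if (mgmt_class != MGMT_CLASS_SUBN_ADM)
        return -1;
    return snprintf(buf, len, "SA %s(%s)",
                    sa_method_name(method), sa_attr_name(attr_id));
}
```

For example, `describe_sa_mad(0x3, 0x12, 0x35, ...)` yields `SA GetTable(PathRecord)`, and `rmpp_flags_str(0x7, ...)` yields `Active|First|Last`, which matches Hal's reading of the first response.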
> > This now hangs and is unkillable.  It never returns.
> > 
> > So, setting the rmpp_version presumably makes Solaris even more confused.
> 
> I forgot about the other fields in the packet that need setting.
> 
> I am not sure yet whether we are getting deeper into a rat hole. Are you
> willing to keep going?

Yeah, sure.  I'll test any patches you send my way...

-tduffy
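Hal's list of fields (RRespTime, SegmentNumber, PayloadLength, on top of the RMPP version) amounts to filling in the whole RMPP header of a response that fits in one DATA segment. The sketch below assumes the 12-byte RMPP header layout from the IBA spec: RMPPVersion, RMPPType, a byte holding RRespTime in its upper 5 bits and RMPPFlags in its lower 3, RMPPStatus, then big-endian SegmentNumber and PayloadLength. The function names are hypothetical, and this is not the actual OpenSM patch. PayloadLength is passed in rather than computed, because its exact value depends on how the SA response is segmented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* RMPP header constants (IBA spec); helper names below are hypothetical. */
#define RMPP_VERSION      1
#define RMPP_TYPE_DATA    1
#define RMPP_FLAG_ACTIVE  0x1
#define RMPP_FLAG_FIRST   0x2
#define RMPP_FLAG_LAST    0x4
#define RMPP_HDR_LEN      12   /* follows the 24-byte common MAD header */

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* RRespTime occupies the upper 5 bits of byte 2, RMPPFlags the lower 3. */
static uint8_t rmpp_rtime_flags(uint8_t resp_time, uint8_t flags)
{
    assert(resp_time <= 0x1F && flags <= 0x7);
    return (uint8_t)((resp_time << 3) | flags);
}

/* Fills the RMPP header for a response that fits in a single DATA segment. */
static void rmpp_fill_single_data(uint8_t hdr[RMPP_HDR_LEN],
                                  uint8_t resp_time, uint32_t payload_len)
{
    memset(hdr, 0, RMPP_HDR_LEN);
    hdr[0] = RMPP_VERSION;
    hdr[1] = RMPP_TYPE_DATA;
    hdr[2] = rmpp_rtime_flags(resp_time,
                              RMPP_FLAG_ACTIVE | RMPP_FLAG_FIRST | RMPP_FLAG_LAST);
    hdr[3] = 0;                     /* RMPPStatus: normal */
    put_be32(&hdr[4], 1);           /* SegmentNumber: the first segment is 1, not 0 */
    put_be32(&hdr[8], payload_len); /* PayloadLength (Data2) */
}

/* Receiver side: a DATA segment is accepted only if it is the expected one. */
static int rmpp_segment_expected(const uint8_t hdr[RMPP_HDR_LEN],
                                 uint32_t expected_seg)
{
    return hdr[1] == RMPP_TYPE_DATA && get_be32(&hdr[4]) == expected_seg;
}
```

The receiver check mirrors what the ibmf trace appears to do. When a DATA packet arrives with `segnum = 0` while segment 1 is expected (`es = 1`), the packet is discarded as an unexpected segment number. With SegmentNumber set to 1, the same check passes.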

