[openib-general] RMPP Message Format Errors
Eitan Zahavi
eitan at mellanox.co.il
Sat Sep 17 13:48:16 PDT 2005
Hi Hal and Sean
I wrote:
> I wonder how come the received MAD is only of 256 bytes. I expected it
> to be of headers + data = 56 + 336 = 392 bytes.
I have rerun the test on a fresh build from the main trunk.
What I see now is the following:
The last record in each RMPP GetTableResp is only partly filled.
I have traced that to the actual data sent on the wire, so I guess there
is another bug in the sender. I attach the text dump of the analyzer trace.
You can see how the "node description" field is cut short in the NodeInfoRec query and how
the last PortInfoRec is mostly zeros in the second MAD.
To reproduce the situation you need a switch and a two-port HCA.
Then run opensm and osmtest -f c (to create the inventory file). What you should see is
a broken node description for the last node and a port record full of zeros for the last port:
DEFINE_NODE
lid 0x3
base_version 0x1
class_version 0x1
node_type 0x2 # (Switch)
num_ports 0x18
sys_guid 0x0000000000000000
node_guid 0x0002c900deadbeaf
port_guid 0x0002c900deadbeaf
partition_cap 0x8
device_id 0xB924
revision 0xA0
# port_num 0x5
# vendor_id 0x2C9
# node_desc MT47396 Infi
END
snip
DEFINE_PORT
lid 0x3
port_num 0x18
m_key 0x0000000000000000
subnet_prefix 0x0000000000000000
base_lid 0x0
master_sm_base_lid 0x0
capability_mask 0x0
diag_code 0x0
m_key_lease_period 0x0
local_port_num 0x0
link_width_enabled 0x0
link_width_supported 0x0
link_width_active 0x0
link_speed_supported 0x0
port_state No State Change (NOP)
state_info2 0x0
mpb 0x0
lmc 0x0
link_speed 0x0
mtu_smsl 0x0
vl_cap 0x0
vl_high_limit 0x0
vl_arb_high_cap 0x0
vl_arb_low_cap 0x0
mtu_cap 0x0
vl_stall_life 0x0
vl_enforce 0x0
m_key_violations 0x0
p_key_violations 0x0
q_key_violations 0x0
guid_cap 0x0
subnet_timeout 0x0
resp_time_value 0x0
error_threshold 0x0
END
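A quick way to spot such records in a larger inventory file is to scan for DEFINE_PORT blocks whose numeric fields are all zero. The following is only a rough Python sketch, not part of osmtest: it assumes the file looks like the dump above ("name value" lines between DEFINE_PORT and END, with # starting a comment line), and the function name find_zeroed_ports is made up for this example.

```python
def find_zeroed_ports(text, key_fields=("lid", "port_num")):
    """Yield (lid, port_num) for DEFINE_PORT blocks whose other numeric fields are all zero.

    Assumes the layout shown in the dump above; this is not an osmtest API.
    """
    block = None  # dict of fields while inside a DEFINE_PORT block, else None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "DEFINE_PORT":
            block = {}
        elif line == "END":
            if block is not None:
                values = []
                for name, value in block.items():
                    if name in key_fields:
                        continue  # lid/port_num identify the record, skip them
                    try:
                        values.append(int(value, 16))
                    except ValueError:
                        pass  # non-numeric field such as port_state
                if values and not any(values):
                    yield block.get("lid"), block.get("port_num")
            block = None
        elif block is not None and line and not line.startswith("#"):
            name, _, value = line.partition(" ")
            block[name] = value.strip()


# Shortened sample in the same layout as the dump above.
sample = """DEFINE_PORT
lid 0x3
port_num 0x18
m_key 0x0000000000000000
base_lid 0x0
port_state No State Change (NOP)
END
DEFINE_PORT
lid 0x3
port_num 0x1
base_lid 0x3
END
"""

print(list(find_zeroed_ports(sample)))  # [('0x3', '0x18')]
```

Only the identifying fields (lid and port_num) are excluded from the check, so a record with even one non-zero attribute is not reported.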
Also, for some reason, after the receiver gets the reassembled data it always has an extra 20 bytes:
Sep 17 23:23:13 807391 [8003] -> osm_vendor_get: [
Sep 17 23:23:13 807408 [8003] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x8085cf8, size = 412.
Sep 17 23:23:13 807429 [8003] -> osm_vendor_get: Acquired UMAD 0x8087868, size = 412.
Sep 17 23:23:13 807447 [8003] -> osm_vendor_get: ]
Sep 17 23:23:13 807464 [8003] -> osm_mad_pool_get: Acquired p_madw = 0x8085ce4, p_mad = 0x80878a0, size = 412.
Sep 17 23:23:13 807481 [8003] -> osm_mad_pool_get: ]
Sep 17 23:23:13 807498 [8003] -> __osmv_sa_mad_rcv_cb: [
Sep 17 23:23:13 807515 [8003] -> __osmv_sa_mad_rcv_cb: Count = 3 = 356 / 112 (20)
Sep 17 23:23:13 807534 [8003] -> osmtest_query_res_cb: [
Sep 17 23:23:13 807551 [8003] -> osmtest_query_res_cb: ]
Sep 17 23:23:13 807583 [8003] -> __osmv_sa_mad_rcv_cb: ]
Sep 17 23:23:13 807591 [4000] -> __osmv_send_sa_req: ]
The expected size of the first GetTable(NodeInfoRecord) MAD with 3 records is
392 bytes.
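The arithmetic behind the "Count = 3 = 356 / 112 (20)" log line can be checked with a few lines of Python. This is only an illustration, not the osm_vendor code: the constants come from the figures in this thread (56 header bytes, 112 bytes per record as printed in the log), and the function name split_sa_payload is made up for this example.

```python
SA_HDR_SIZE = 56     # header bytes in front of the data, per "56 + 336" above
NODE_REC_SIZE = 112  # record size as printed in the "356 / 112" log line


def split_sa_payload(mad_size, hdr_size=SA_HDR_SIZE, rec_size=NODE_REC_SIZE):
    """Return (payload_bytes, whole_records, leftover_bytes) for a reassembled MAD."""
    payload = mad_size - hdr_size
    count, leftover = divmod(payload, rec_size)
    return payload, count, leftover


expected = SA_HDR_SIZE + 3 * NODE_REC_SIZE
print(expected)                    # 392
print(split_sa_payload(412))       # (356, 3, 20)
print(split_sa_payload(expected))  # (336, 3, 0)
```

With the 412 bytes reported by osm_vendor_get, the three records fit and 20 bytes are left over; with the expected 392 bytes nothing would remain.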
On top of that, I no longer see the issue with the 256 bytes. So I guess we are left with the sender bug
and an extra SA MAD header in the reassembly.
Thanks
Eitan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Bad RMPP gen2 17 Sep 2005.txt
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20050917/6f7e2c40/attachment.txt>