[openib-general] RMPP Message Format Errors

Eitan Zahavi eitan at mellanox.co.il
Sat Sep 17 13:48:16 PDT 2005


Hi Hal and Sean

I wrote:
> I wonder how come the received MAD is only of 256 bytes. I expected it
> to be of headers + data = 56 + 336 = 392 bytes.

I have rerun the test on a fresh build from the main trunk.
What I see now is the following:

I have noticed that the last record in each RMPP GetTableResp is only partly filled.
I have traced this to the data actually sent on the wire, so I guess there
is another bug in the sender. I attach the text dump of the analyzer trace.
You can see how the "node description" field is cut off in the NodeInfoRec query and how
the last PortInfoRec is mostly zeros in the second MAD.

To reproduce the situation you need a switch and a two-port HCA.
Then run opensm and osmtest -f c (to create the inventory file). What you should see is
a truncated node description for the last node and a port record full of zeros for the last port:
DEFINE_NODE
lid                     0x3
base_version            0x1
class_version           0x1
node_type               0x2 # (Switch)
num_ports               0x18
sys_guid                0x0000000000000000
node_guid               0x0002c900deadbeaf
port_guid               0x0002c900deadbeaf
partition_cap           0x8
device_id               0xB924
revision                0xA0
# port_num              0x5
# vendor_id             0x2C9
# node_desc             MT47396 Infi
END
snip
DEFINE_PORT
lid                     0x3
port_num                0x18
m_key                   0x0000000000000000
subnet_prefix           0x0000000000000000
base_lid                0x0
master_sm_base_lid      0x0
capability_mask         0x0
diag_code               0x0
m_key_lease_period      0x0
local_port_num          0x0
link_width_enabled      0x0
link_width_supported    0x0
link_width_active       0x0
link_speed_supported    0x0
port_state              No State Change (NOP)
state_info2             0x0
mpb                     0x0
lmc                     0x0
link_speed              0x0
mtu_smsl                0x0
vl_cap                  0x0
vl_high_limit           0x0
vl_arb_high_cap         0x0
vl_arb_low_cap          0x0
mtu_cap                 0x0
vl_stall_life           0x0
vl_enforce              0x0
m_key_violations        0x0
p_key_violations        0x0
q_key_violations        0x0
guid_cap                0x0
subnet_timeout          0x0
resp_time_value         0x0
error_threshold         0x0
END
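
One way to spot this symptom in a reassembled SA payload is to split it into fixed-size records and flag any record that ends in a long run of zero bytes. The following is only a minimal sketch: the helper names, the record size argument, and the zero-run threshold are assumptions made for illustration and are not part of opensm or osmtest. A legitimately short node description is also zero padded, so the threshold needs tuning per record type.

```python
def zero_tail_length(record: bytes) -> int:
    """Return the number of consecutive zero bytes at the end of record."""
    return len(record) - len(record.rstrip(b"\x00"))


def suspicious_records(payload: bytes, record_size: int, min_zero_tail: int):
    """Return the indices of whole records whose zero tail is suspiciously long.

    payload       -- SA data with the MAD/RMPP/SA headers already removed
    record_size   -- size of one record in bytes (hypothetical parameter)
    min_zero_tail -- zero-run length treated as suspicious (arbitrary choice)
    """
    flagged = []
    # Walk only over complete records; trailing leftover bytes are ignored.
    for idx in range(len(payload) // record_size):
        rec = payload[idx * record_size:(idx + 1) * record_size]
        if zero_tail_length(rec) >= min_zero_tail:
            flagged.append(idx)
    return flagged


if __name__ == "__main__":
    # Synthetic data: two fully populated records, then one cut short.
    good = b"\x01" * 112
    cut = b"\x01" * 60 + b"\x00" * 52
    print(suspicious_records(good + good + cut, 112, 32))
```

With real data, the payload would come from the analyzer dump or from the buffer handed to the receive callback.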



Also, for some reason, after the receiver gets the reassembled data it always gets an extra 20 bytes:
Sep 17 23:23:13 807391 [8003] -> osm_vendor_get: [
Sep 17 23:23:13 807408 [8003] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x8085cf8, size = 412.
Sep 17 23:23:13 807429 [8003] -> osm_vendor_get: Acquired UMAD 0x8087868, size = 412.
Sep 17 23:23:13 807447 [8003] -> osm_vendor_get: ]
Sep 17 23:23:13 807464 [8003] -> osm_mad_pool_get: Acquired p_madw = 0x8085ce4, p_mad = 0x80878a0, size = 412.
Sep 17 23:23:13 807481 [8003] -> osm_mad_pool_get: ]
Sep 17 23:23:13 807498 [8003] -> __osmv_sa_mad_rcv_cb: [
Sep 17 23:23:13 807515 [8003] -> __osmv_sa_mad_rcv_cb: Count = 3 = 356 / 112 (20)
Sep 17 23:23:13 807534 [8003] -> osmtest_query_res_cb: [
Sep 17 23:23:13 807551 [8003] -> osmtest_query_res_cb: ]
Sep 17 23:23:13 807583 [8003] -> __osmv_sa_mad_rcv_cb: ]
Sep 17 23:23:13 807591 [4000] -> __osmv_send_sa_req: ]

The first GetTable(NodeInfoRecord) MAD with 3 records should have been 392 bytes long.
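
The size arithmetic can be checked with a short sketch. It assumes the usual IBA layout in which the SA data starts at offset 56 (24-byte MAD header, 12-byte RMPP header, 20-byte SA header) and takes the 112-byte record size from the log line above. Reading "Count = 3 = 356 / 112 (20)" as (size - 56) / 112 with a remainder of 20 is an interpretation of the log, not something confirmed from the source code.

```python
# Sizes in bytes. Header sizes follow the IBA MAD layout; the record size
# is the 112 printed in the __osmv_sa_mad_rcv_cb log line.
MAD_HDR = 24
RMPP_HDR = 12
SA_HDR = 20
SA_DATA_OFFSET = MAD_HDR + RMPP_HDR + SA_HDR  # 56

RECORD_SIZE = 112


def expected_size(num_records, record_size=RECORD_SIZE):
    """Total size of a reassembled response carrying num_records records."""
    return SA_DATA_OFFSET + num_records * record_size


def split_payload(total_size, record_size=RECORD_SIZE):
    """Return (payload bytes, whole records, leftover bytes) for a given size."""
    payload = total_size - SA_DATA_OFFSET
    count, leftover = divmod(payload, record_size)
    return payload, count, leftover


if __name__ == "__main__":
    print(expected_size(3))    # 392
    print(split_payload(412))  # (356, 3, 20)
```

The leftover of 20 bytes equals the assumed SA header size, which fits the suspicion that one extra SA header ends up in the reassembled data.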

On top of that, I no longer see the issue with the 256 bytes. So I guess we are left with the sender bug
and the extra SA MAD header in the reassembly.

Thanks

Eitan



-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Bad RMPP gen2 17 Sep 2005.txt
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20050917/6f7e2c40/attachment.txt>

