Dear all,<div><br></div><div>I have installed OFED-1.5.2-rxe on our Linux host, which has three network interfaces: a Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet NIC, a Mellanox Technologies MT26478 [ConnectX EN 40GigE, PCIe 2.0 5GT/s], and a NetEffect NE020 10Gb Accelerated Ethernet Adapter (iWARP RNIC). Soft-RoCE works on the first two cards, but when I apply Soft-RoCE to the NetEffect card and run "ibv_devinfo" to view the RDMA devices, I get the following errors on some of the hosts in our cluster, and the whole OFED stack does not work.</div>
<div><br></div><div><div><b>[root@netqos14 ~]# rxe_cfg status</b></div><div><b>Name Link Driver Speed MTU IPv4_addr S-RoCE RMTU</b></div><div><b>eth0 yes bnx2 1500 198.124.220.155</b></div>
<div><b>eth1 no bnx2 1500</b></div><div><b>eth2 no bnx2 1500</b></div><div><b>eth3 no bnx2 1500</b></div><div><b>eth4 yes iw_nes 1500 198.124.220.207 rxe0</b></div>
<div><b>rxe eth_proto_id: 0x8915</b></div><div><b><br></b></div><div><b><br></b></div><div><b>[root@netqos14 ~]# ibv_devinfo</b></div><div><b>hca_id: mlx4_0</b></div><div><b> transport: InfiniBand (0)</b></div>
<div><b> fw_ver: 2.7.626</b></div><div><b> node_guid: 0002:c903:000b:f306</b></div><div><b> sys_image_guid: 0002:c903:000b:f309</b></div><div>
<b> vendor_id: 0x02c9</b></div><div><b> vendor_part_id: 26428</b></div><div><b> hw_ver: 0xB0</b></div><div><b> board_id: MT_0D90110009</b></div>
<div><b> phys_port_cnt: 1</b></div><div><b> port: 1</b></div><div><b> state: PORT_ACTIVE (4)</b></div><div><b> max_mtu: 2048 (4)</b></div>
<div><b> active_mtu: 2048 (4)</b></div><div><b> sm_lid: 6</b></div><div><b> port_lid: 4</b></div><div><b> port_lmc: 0x00</b></div>
<div><b> link_layer: IB</b></div><div><b><br></b></div><div><b>hca_id: nes0</b></div><div><b> transport: iWARP (1)</b></div><div><b> fw_ver: 3.16</b></div>
<div><b> node_guid: 0012:5502:f6ac:0000</b></div><div><b> sys_image_guid: 0012:5502:f6ac:0000</b></div><div><b> vendor_id: 0x1255</b></div><div>
<b> vendor_part_id: 256</b></div><div><b> hw_ver: 0x5</b></div><div><b> board_id: NES020 Board ID</b></div><div><b> phys_port_cnt: 1</b></div>
<div><b> port: 1</b></div><div><b> state: PORT_ACTIVE (4)</b></div><div><b> max_mtu: 4096 (5)</b></div><div><b> active_mtu: 1024 (3)</b></div>
<div><b> sm_lid: 0</b></div><div><b> port_lid: 1</b></div><div><b> port_lmc: 0x00</b></div><div><b> link_layer: Ethernet</b></div>
<div><br></div><div><font class="Apple-style-span" color="#FF0000"><b>libnes: nes_ualloc_context: Invalid kernel driver version detected. Detected 0, should be 1</b></font></div><div><font class="Apple-style-span" color="#FF0000"><b>libnes: nes_ualloc_context: Failed to allocate context for device.</b></font></div>
<div><font class="Apple-style-span" color="#FF0000"><b>Failed to open device</b></font></div></div><div><br></div><div>However, other hosts in our cluster can run Soft-RoCE on the iWARP RNIC with the same configuration. Their output is as follows:</div>
<div><br></div><div><div>[root@netqos13 rftp]# rxe_cfg status</div><div>Name Link Driver Speed MTU IPv4_addr S-RoCE RMTU</div><div>eth0 yes bnx2 1500 198.124.220.154</div><div>
eth1 no bnx2 1500</div><div>eth2 no bnx2 1500</div><div>eth3 no bnx2 1500</div><div>eth4 yes iw_nes 1500 198.124.220.206 rxe0 1024 (3)</div>
<div>rxe eth_proto_id: 0x8915</div><div><br></div><div>[root@netqos13 rftp]# ibv_devinfo</div><div>hca_id: mlx4_0</div><div> transport: InfiniBand (0)</div><div> fw_ver: 2.7.626</div>
<div> node_guid: 0002:c903:000b:f31e</div><div> sys_image_guid: 0002:c903:000b:f321</div><div> vendor_id: 0x02c9</div><div> vendor_part_id: 26428</div>
<div> hw_ver: 0xB0</div><div> board_id: MT_0D90110009</div><div> phys_port_cnt: 1</div><div> port: 1</div><div> state: PORT_ACTIVE (4)</div>
<div> max_mtu: 2048 (4)</div><div> active_mtu: 2048 (4)</div><div> sm_lid: 6</div><div> port_lid: 1</div>
<div> port_lmc: 0x00</div><div> link_layer: IB</div><div><br></div><div>hca_id: nes0</div><div> transport: iWARP (1)</div>
<div> fw_ver: 3.16</div><div> node_guid: 0012:5502:f208:0000</div><div> sys_image_guid: 0012:5502:f208:0000</div><div> vendor_id: 0x1255</div>
<div> vendor_part_id: 256</div><div> hw_ver: 0x5</div><div> board_id: NES020 Board ID</div><div> phys_port_cnt: 1</div>
<div> port: 1</div><div> state: PORT_ACTIVE (4)</div><div> max_mtu: 4096 (5)</div><div> active_mtu: 1024 (3)</div>
<div> sm_lid: 0</div><div> port_lid: 1</div><div> port_lmc: 0x00</div><div> link_layer: Ethernet</div>
<div><br></div><div>hca_id: rxe0</div><div> transport: InfiniBand (0)</div><div> fw_ver: 0.0.0</div><div> node_guid: 0212:55ff:fe02:f208</div>
<div> sys_image_guid: 0000:0000:0000:0000</div><div> vendor_id: 0x0000</div><div> vendor_part_id: 0</div><div> hw_ver: 0x0</div>
<div> phys_port_cnt: 1</div><div> port: 1</div><div> state: PORT_ACTIVE (4)</div><div> max_mtu: 4096 (5)</div>
<div> active_mtu: 1024 (3)</div><div> sm_lid: 0</div><div> port_lid: 0</div><div> port_lmc: 0x00</div>
<div> link_layer: Ethernet</div></div><div><br></div><div>The two hosts are identical. The system information is as follows:</div><div><br></div><div><div>[root@netqos13 rftp]# uname -a</div><div>
Linux netqos13 2.6.18-164.11.1.el5_lustre.1.8.3 #1 SMP Fri Apr 9 18:00:39 MDT 2010 x86_64 x86_64 x86_64 GNU/Linux</div></div><div><br></div><div><div>[root@netqos13 rftp]# ifconfig</div><div>eth0 Link encap:Ethernet HWaddr A4:BA:DB:1E:CC:8D</div>
<div> inet addr:198.124.220.154 Bcast:198.124.220.63 Mask:255.255.255.192</div><div> inet6 addr: fe80::a6ba:dbff:fe1e:cc8d/64 Scope:Link</div><div> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1</div>
<div> RX packets:105558250 errors:0 dropped:0 overruns:0 frame:0</div><div> TX packets:137816731 errors:0 dropped:0 overruns:0 carrier:0</div><div> collisions:0 txqueuelen:1000</div><div> RX bytes:95088704022 (88.5 GiB) TX bytes:156759141516 (145.9 GiB)</div>
<div> Interrupt:98 Memory:d2000000-d2012800</div><div><br></div><div>eth4 Link encap:Ethernet HWaddr 00:12:55:02:F2:08</div><div> inet addr:198.124.220.206 Bcast:198.124.220.255 Mask:255.255.255.192</div>
<div> inet6 addr: fe80::212:55ff:fe02:f208/64 Scope:Link</div><div> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1</div><div> RX packets:59487544 errors:0 dropped:0 overruns:0 frame:0</div><div>
TX packets:55691374 errors:0 dropped:0 overruns:0 carrier:0</div><div> collisions:0 txqueuelen:1000</div><div> RX bytes:82372790409 (76.7 GiB) TX bytes:34462883454 (32.0 GiB)</div><div> Interrupt:130</div>
<div><br></div><div>ib0 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00</div><div> inet addr:192.168.1.13 Bcast:192.168.1.255 Mask:255.255.255.0</div><div> inet6 addr: fe80::202:c903:b:f31f/64 Scope:Link</div>
<div> UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1</div><div> RX packets:4461 errors:0 dropped:0 overruns:0 frame:0</div><div> TX packets:17 errors:0 dropped:9 overruns:0 carrier:0</div><div>
collisions:0 txqueuelen:256</div><div> RX bytes:264959 (258.7 KiB) TX bytes:3267 (3.1 KiB)</div><div><br></div><div>lo Link encap:Local Loopback</div><div> inet addr:127.0.0.1 Mask:255.0.0.0</div>
<div> inet6 addr: ::1/128 Scope:Host</div><div> UP LOOPBACK RUNNING MTU:16436 Metric:1</div><div> RX packets:19040792 errors:0 dropped:0 overruns:0 frame:0</div><div> TX packets:19040792 errors:0 dropped:0 overruns:0 carrier:0</div>
<div> collisions:0 txqueuelen:0</div><div> RX bytes:147810608491 (137.6 GiB) TX bytes:147810608491 (137.6 GiB)</div></div><div><br></div><div>So my question is: why does Soft-RoCE fail on some of the NetEffect iWARP RNICs but work on others? All of the iWARP RNICs are on different hosts in the same cluster and are connected via a Juniper EX 2500 switch.</div>
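<div><br></div><div>To make the difference between the two hosts explicit, here is a minimal sketch that lists the hca_id entries in two "ibv_devinfo" outputs and reports which devices are missing from the failing host. It is not part of the OFED tools; the function names hca_ids and missing_devices are hypothetical helpers, and the sample strings are abbreviated excerpts of the output shown above.</div>

```python
def hca_ids(devinfo_text):
    """Return the hca_id names in the order ibv_devinfo printed them."""
    ids = []
    for line in devinfo_text.splitlines():
        line = line.strip()
        if line.startswith("hca_id:"):
            # Keep only the device name after the first colon.
            ids.append(line.split(":", 1)[1].strip())
    return ids


def missing_devices(good_text, bad_text):
    """Devices listed on the working host but absent on the failing one."""
    bad = set(hca_ids(bad_text))
    return [dev for dev in hca_ids(good_text) if dev not in bad]


# Abbreviated excerpts of the outputs from this post (assumed inputs).
netqos13 = """hca_id: mlx4_0
 transport: InfiniBand (0)
hca_id: nes0
 transport: iWARP (1)
hca_id: rxe0
 transport: InfiniBand (0)
"""

netqos14 = """hca_id: mlx4_0
 transport: InfiniBand (0)
hca_id: nes0
 transport: iWARP (1)
libnes: nes_ualloc_context: Invalid kernel driver version detected. Detected 0, should be 1
libnes: nes_ualloc_context: Failed to allocate context for device.
Failed to open device
"""

print(missing_devices(netqos13, netqos14))
```

<div>With these excerpts, the only device reported as missing on netqos14 is rxe0, which matches the point where "ibv_devinfo" stops with the libnes error.</div>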
<div><br></div><div>Any help will be greatly appreciated.</div><div><br></div><div>-- <br><div>Best regards,</div>
<div> </div>
<div>-----------------------------------------------------------------------------------------------<br>Li, Tan<br>PhD Candidate & Research Assistant, <br>Electrical Engineering, <br>Stony Brook University, NY<br><br>
Personal Web Site: <a href="https://sites.google.com/site/homepagelitan/Home" target="_blank">https://sites.google.com/site/homepagelitan/Home</a><br><br>Email: <a href="mailto:fanqielee@gmail.com" target="_blank">fanqielee@gmail.com</a></div>
<br>
</div>