[ofw] Setting up Infiniband over WinXp- Help Needed

Leonid Keller leonid at mellanox.co.il
Sun Jun 14 02:25:28 PDT 2009


Please send the output from both machines so we can compare.
I'd also suggest configuring the interfaces statically to rule out any
influence from DHCP.
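
A minimal sketch of why DHCP may matter here, assuming the two addresses
quoted further down in this thread: the 169.254.53.191 autoconfiguration
address that ipconfig reports for the IPoIB adapter, and the 192.168.0.1
address used in the ping tests. The helper name and the /24 prefix are
assumptions made for illustration; only the standard ipaddress module is used.

```python
import ipaddress


def describe(addr: str, expected_net: str = "192.168.0.0/24") -> str:
    """Classify an address relative to the subnet the IPoIB link should use.

    expected_net is an assumed value; the thread only mentions 192.168.0.x.
    """
    ip = ipaddress.ip_address(addr)
    net = ipaddress.ip_network(expected_net)
    if ip.is_link_local:
        # 169.254.0.0/16: Windows assigns these when no DHCP server answers.
        return "link-local (autoconfigured, no DHCP lease)"
    if ip in net:
        return "inside " + expected_net
    return "outside " + expected_net


print(describe("169.254.53.191"))
print(describe("192.168.0.1"))
```

An adapter that reports a link-local address is not using the static
192.168.0.x address at that moment, which is one reason to compare
'ipconfig /all' from both machines.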


________________________________

	From: Tzachi Dar 
	Sent: Sunday, June 14, 2009 12:03 PM
	To: Ashwath Narasimhan; Leonid Keller
	Cc: Fab Tillier; ofw at lists.openfabrics.org
	Subject: RE: [ofw] Setting up Infiniband over WinXp- Help Needed
	
	
	Hi,
	 
	It seems that your problem comes from some kind of
firewall/filter/antivirus that is installed on one machine but not the
other.
	 
	In order for us to get more information, please do the
following:
	1) Delete the ARP tables on both machines (run "arp -d"), then
start Wireshark on both and ping from machine a to b.
	2) Delete the ARP tables on both machines (run "arp -d"), then
start Wireshark on both and ping from machine b to a.
	Please send me the captures from these two experiments (you
should have 4 files).
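
The two experiments above can be written out as a checklist. This is only
a sketch: it prints the steps and does not run anything, and the capture
file names are hypothetical.

```python
from itertools import permutations


def capture_plan(machines=("a", "b")):
    """Return one experiment per ping direction, with steps and capture files."""
    plan = []
    for src, dst in permutations(machines, 2):
        # Clear ARP caches first so the captures include ARP resolution.
        steps = [f"on {m}: arp -d" for m in machines]
        steps += [f"on {m}: start Wireshark capture" for m in machines]
        steps.append(f"on {src}: ping {dst}")
        # One capture per machine per experiment (file names are made up).
        files = [f"ping_{src}_to_{dst}_on_{m}.pcap" for m in machines]
        plan.append({"steps": steps, "files": files})
    return plan


plan = capture_plan()
for experiment in plan:
    print("\n".join(experiment["steps"]))
    print("save:", ", ".join(experiment["files"]))
```

Two directions times two machines gives the four capture files requested.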
	 
	Can you try using a 3rd computer and see how it works?
	 
	Thanks
	Tzachi


________________________________

		From: nashwath at gmail.com [mailto:nashwath at gmail.com] On
Behalf Of Ashwath Narasimhan
		Sent: Friday, June 12, 2009 4:15 AM
		To: Leonid Keller
		Cc: Tzachi Dar; Fab Tillier; ofw at lists.openfabrics.org
		Subject: Re: [ofw] Setting up Infiniband over WinXp-
Help Needed
		
		
		Hi Everyone,
		 
		Thank you so much for your replies. I still have the same
problem: I am able to ping successfully from one side but not from the other.
		 
		Hi Tzachi and Leonid,
		a. I followed your steps. I am able to see the
InfiniBand data when I run the server (ib_send_bw -a) on computer 2 and
connect to it from computer 1 (ib_send_bw -a <ip>). However, I do
not see this data when I run the server on computer 1 and connect from
computer 2. In the latter case I get a pp_connect_sock<ip,port> failed error.
		 
		b. I disabled and enabled the network interfaces on both
ports, but no luck. It still doesn't work.
		 
		c. I know that it's not a hardware issue because the same
problem persists when I swap the InfiniBand cards, i.e. the card
that was plugged into computer 2 is now plugged into computer 1
and vice versa. I get the same issue in this case too.
		 
		d. I then installed UltraVNC and ran one end as server
and the other as client, and it worked perfectly in both directions,
from computer 1 to computer 2 and from computer 2 to computer 1. I then
installed Wireshark on both computers. In Computer 2's Wireshark window I
could see Computer 2 sending the ping requests to computer 1, but for
some bizarre reason computer 1 was not replying to these ping requests.
When I checked the connection status on Computer 1, I could see the number
of received packets increasing, but Computer 1 did not send back any
packets.
		 
		e. I suspect this issue is caused by some Windows XP
setting on Computer 1. There is otherwise no difference between the two PCs;
both are brand-new PCs running XP. The only difference is that I have a
wifi driver on computer 1. All my firewall settings are disabled. I even
uninstalled the wifi driver, but the same problem persists.
		 
		f. ipconfig and vstat return the correct values. Here is
the output of these commands on computer 1:

		Windows IP Configuration
		        Host Name . . . . . . . . . . . . : LENOVO-CF61BEED
		        Primary Dns Suffix  . . . . . . . :
		        Node Type . . . . . . . . . . . . : Unknown
		        IP Routing Enabled. . . . . . . . : No
		        WINS Proxy Enabled. . . . . . . . : No
		        DNS Suffix Search List. . . . . . : ee.columbia.edu
		Ethernet adapter Local Area Connection:
		        Connection-specific DNS Suffix  . : ee.columbia.edu
		        Description . . . . . . . . . . . : Marvell Yukon 88E8056 PCI-E Gigabit Ethernet Controller
		        Physical Address. . . . . . . . . : 00-21-97-CB-64-97
		        Dhcp Enabled. . . . . . . . . . . : Yes
		        Autoconfiguration Enabled . . . . : Yes
		        IP Address. . . . . . . . . . . . : 128.59.65.132
		        Subnet Mask . . . . . . . . . . . : 255.255.252.0
		        Default Gateway . . . . . . . . . : 128.59.64.1
		        DHCP Server . . . . . . . . . . . : 128.59.64.59
		        DNS Servers . . . . . . . . . . . : 128.59.64.59
		                                            128.59.16.20
		        Lease Obtained. . . . . . . . . . : Thursday, June 11, 2009 5:39:25 PM
		        Lease Expires . . . . . . . . . . : Saturday, June 13, 2009 12:39:25 PM
		Ethernet adapter Local Area Connection 7:
		        Media State . . . . . . . . . . . : Media disconnected
		        Description . . . . . . . . . . . : Mellanox IPoIB Adapter #4
		        Physical Address. . . . . . . . . : 00-05-AD-04-E7-C6
		Ethernet adapter Local Area Connection 6:
		        Connection-specific DNS Suffix  . :
		        Description . . . . . . . . . . . : Mellanox IPoIB Adapter #3
		        Physical Address. . . . . . . . . : 00-05-AD-04-E7-C5
		        Dhcp Enabled. . . . . . . . . . . : Yes
		        Autoconfiguration Enabled . . . . : Yes
		        Autoconfiguration IP Address. . . : 169.254.53.191
		        Subnet Mask . . . . . . . . . . . : 255.255.0.0
		        Default Gateway . . . . . . . . . :
		
		C:\<my directory>vstat
		        hca_idx=0
		        uplink={BUS=PCI_E, SPEED=2.5 Gbps,
		        vendor_id=0x05ad
		        vendor_part_id=0x6278
		        hw_ver=0xa0
		        fw_ver=0x400080395
		        node_guid=0005:ad00:0004:e7c4
		        num_phys_ports=2
		                port=1
		                port_state=PORT_ACTIVE (4)
		                link_speed=2.5 Gbps (1)
		                link_width=4x (2)
		                rate=10 Gbps
		                port_phys_state=LINK_UP (5)
		                active_speed=2.5 Gbps (1)
		                sm_lid=0x0001
		                port_lid=0x0002
		                port_lmc=0x0
		                max_mtu=2048 (4)
		                port=2
		                port_state=PORT_DOWN (1)
		                link_speed=NA
		                link_width=NA
		                rate=NA
		                port_phys_state=POLLING (2)
		                active_speed=2.5 Gbps (1)
		                sm_lid=0x0000
		                port_lid=0x0000
		                port_lmc=0x0
		                max_mtu=2048 (4)
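
As a reading aid for the port 1 values above: the reported rate is the
per-lane link speed multiplied by the link width. The sketch below is
illustrative only; the function names are hypothetical, and the data-rate
figure assumes the 8b/10b encoding used on 2.5 Gbps (SDR) InfiniBand links.

```python
def ib_signalling_rate_gbps(lane_speed_gbps: float, width: int) -> float:
    """Signalling rate of a link: per-lane speed times number of lanes."""
    return lane_speed_gbps * width


def ib_data_rate_gbps(signalling_gbps: float) -> float:
    """Usable data rate assuming 8b/10b encoding (8 data bits per 10 line bits)."""
    return signalling_gbps * 8 / 10


rate = ib_signalling_rate_gbps(2.5, 4)   # link_speed=2.5 Gbps, link_width=4x
print(rate)                              # matches rate=10 Gbps in vstat
print(ib_data_rate_gbps(rate))
```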
		 
		P.S. I am using the first port.
		 
		regards,
		Ashwath
		 
		 

		 
		On Thu, Jun 11, 2009 at 7:00 AM, Leonid Keller
<leonid at mellanox.co.il> wrote:
		

			Hi Ashwath,
			 
			If you still have problems, please send us the
output of 'vstat -v' and 'ipconfig /all' on both machines.
			 
			TIA
			Leonid


________________________________

				From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Tzachi Dar
				Sent: Thursday, June 11, 2009 4:43 PM
				To: Ashwath Narasimhan; Fab Tillier 

				Cc: ofw at lists.openfabrics.org
				
				Subject: RE: [ofw] Setting up Infiniband
over WinXp- Help Needed
				
				
				Hi Ashwath,

				There are a few things that I would like
you to try:

				1) Please run a low-level IB test to
check that traffic is indeed OK. On one computer please run

				ib_send_bw -a

				and on the other computer please run

				ib_send_bw -a 192.168.0.x        (where
x is the IP of the remote side. Please start this test with the Ethernet
addresses of the ports).
				 
				2) Assuming all works well, please try to
disable and enable the network interfaces (IPoIB) on both ports. Please
see if this helps.
				 
				3) If this doesn't help, you will
probably need to change the "Guid bitwise mask" parameter to e7. To
do this, open the Device Manager, then go to "Network adapters",
select the IPoIB interfaces, then right-click and choose Properties. Select
"Guid bitwise mask" and change it to e7.
				 
				If none of this helps, can you give me
remote access to these stations?
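
A sketch of a connectivity check that can accompany step 1. It rests on
the assumption that ib_send_bw first exchanges its connection parameters
over an ordinary TCP socket before any InfiniBand traffic flows, so a
blocked TCP connection would stop the test at that point. The function
name is hypothetical, and port 18515 in the usage comment is an assumed
default that should be checked against the installed tool.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example use while 'ib_send_bw -a' is waiting on the other machine
# (address and port are assumptions, adjust to your setup):
#   tcp_reachable("192.168.0.1", 18515)
```

If this returns False in one direction only, the cause is more likely a
filter or address problem on the listening machine than the IB link itself.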
				 
				Thanks
				Tzachi


________________________________

				From: ofw-bounces at lists.openfabrics.org
[mailto:ofw-bounces at lists.openfabrics.org] On Behalf Of Ashwath
Narasimhan
				Sent: Thursday, June 11, 2009 5:04 AM
				To: Fab Tillier
				Cc: ofw at lists.openfabrics.org
				Subject: Re: [ofw] Setting up Infiniband
over WinXp- Help Needed
				
				
				Hi Fab,
				
				I restarted opensm on the
other node. I ran both opensm and ibdiagnet on the other node (not on
the node where opensm is running). The logs are similar to the ones I
attached in my previous mail (the Computer 1, 192.168.0.1, logs in my
previous mail). I have disabled the firewall settings on both nodes.
However, I still cannot get it to work. I cannot access the shared
folder of each node from the other. Is there something else I can try?
				
				p.s. There is a typo in my previous
mail. I had opensm running on computer 2 and not computer 1.
				
				regards,
				Ashwath
				
				
				On Wed, Jun 10, 2009 at 9:42 PM, Fab
Tillier <ftillier at windows.microsoft.com> wrote:
				

				Hi Ashwath,
				

				>I am new to the world of infiniband and I am trying to set up an
				>infiniband network between two Lenovo x86 Desktops (Windows Xp).
				
				
				Welcome!
				

				>Problem:-
				>I am able to ping 192.168.0.2 from 192.168.0.1 however ping does not
				>work the other way around i.e. from 192.168.0.2 to 192.168.0.1. I don't
				>understand why this is not happening. I see that the "bind" fails but I
				>don't understand why. Shouldn't it be two way? (I am using one cable to
				>connect the two adaptors) Please help me. Thanks.
				
				
				Check your firewall settings on the
192.168.0.1 box.  Can you access the administrative share on each node
from the other (\\192.168.0.1\c$ and \\192.168.0.2\c$)?
				

	
				>========================================================================
				>Computer No2: 192.168.0.2
				>when I ran osmtest here:
				>
				> C:\<mydirectory>osmtest -f -a
				>Command Line Arguments
				>Done with args
				>        Flow = All Validations
				>Using default guid 0x5ad000004e7c6
				>[17:59:17:437][0388] -> osm_vendor_bind: Binding to port 0x5ad000004e7c6.
				>[17:59:17:437][0388] -> osm_vendor_bind: ERR 3B21: Unable to register QP0 MAD service (IB_INSUFFICIENT_MEMORY).
				>[17:59:17:437][0388] -> osmv_bind_sa: ERR 0506: Fail to bind to vendor SMI.
				>[17:59:17:437][0388] -> osmtest_bind: ERR 0137: Unable to bind to SA
				
				
				You probably have OpenSM running on this
node, yes?  You can't run osmtest on the same port where OpenSM is
running.
				

				>when I ran ibdiagnet here
				>
				>C:\<my directory>ibdiagnet
				>Loading IBIS from: C:/Program Files/Mellanox/MLNX_WinOF/Tools/ibdiagnet.exe/lib/ibis1.0
				>Loading IBDM from: C:/Program Files/Mellanox/MLNX_WinOF/Tools/ibdiagnet.exe/lib/ibdm1.0
				>-W- Topology file is not specified.
				>    Reports regarding cluster links will use direct routes.
				>-I- Using port 2 as the local port.
				>-E- Fail to ibsac_bind.
				
				
				I don't know the details of this tool;
maybe it's running into the same problem as osmtest?  Try running
OpenSM on the other node and see if the problem follows the SM or not.
				
				-Fab
				




				-- 
				regards,
				Ashwath
				




		-- 
		regards,
		Ashwath
		
