[Openib-windows] Receive Queue

Guy Corem guyc at voltaire.com
Sun Mar 26 07:10:53 PST 2006


 

Hi Fabian and all,

 

While testing the Pallas reduce_scatter benchmark over IPoIB with 3 hosts,
using the following command line:

 

mpiexec.exe -env MPICH_NETMASK 10.0.0.0/255.255.255.0 -hosts 3 10.0.0.1
10.0.0.2 10.0.0.3 PMB_MPI1.exe reduce_scatter

 

I've discovered that the receive queue (which defaults to 128 packets) is
being exhausted on the second host (10.0.0.2).

 

The TCP/IP stack is holding all 128 packets, and the only way to
regain connectivity with this host (even ping does not work, of
course) is to kill the smpd application (or to wait long enough for the
TCP timeouts to expire, although I didn't actually test that).

 

When setting the receive queue to 1024 packets, the problem didn't
occur.
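
The exhaustion described above can be illustrated with a toy model. This is a
minimal sketch, not the IPoIB driver's code: the class, the function names and
the burst size of 500 packets are all hypothetical, and the model assumes the
upper layer returns no buffers during the burst. It only shows why a pool of
128 runs dry where a pool of 1024 does not; it does not explain why the stack
holds the packets.

```python
class ReceivePool:
    """Toy model of a fixed-size receive buffer pool (hypothetical, for illustration)."""

    def __init__(self, size):
        self.free = size      # buffers available for incoming packets
        self.held = 0         # buffers lent to the upper layer, not yet returned
        self.dropped = 0      # packets that arrived when no buffer was free

    def receive(self):
        """A packet arrives; lend a buffer to the upper layer if one is free."""
        if self.free == 0:
            self.dropped += 1
            return False
        self.free -= 1
        self.held += 1
        return True

    def give_back(self, count=1):
        """The upper layer returns up to `count` buffers to the pool."""
        count = min(count, self.held)
        self.held -= count
        self.free += count


def run_burst(pool_size, burst):
    """Feed `burst` packets into a pool while the upper layer returns nothing."""
    pool = ReceivePool(pool_size)
    for _ in range(burst):
        pool.receive()
    return pool


small = run_burst(128, 500)    # default queue size from the report
large = run_burst(1024, 500)   # enlarged queue size from the report
print(small.free, small.held, small.dropped)
print(large.free, large.held, large.dropped)
```

In this model, once `free` reaches zero every later packet is dropped,
whatever its type, which matches the observation that even ping stops working
until buffers are returned.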

 

All my machines are 2-way SMPs. When running with the /ONECPU boot.ini
parameter, the problem still occurred, but less frequently.

 

My questions:

1.	Have you encountered similar situations?
2.	I've noticed the "Receive Pool Growth" parameter, but it
doesn't seem to be "connected" to anything. Why? If I would like to
"connect" it (i.e. write the appropriate code to handle queue growth),
what should be done, and where?
3.	And I really don't know if anyone can answer this: why does the
Windows TCP/IP stack behave this way? Why doesn't it copy the
packets in extreme situations like the one above?
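
Questions 2 and 3 can be made concrete with a small model of the two policies
they ask about: growing the pool, and falling back to copying when buffers run
low. This is only a sketch under stated assumptions. The class, the parameter
names (`low_watermark`, `growth`, `max_size`) and the numbers are hypothetical
and are not taken from the IPoIB driver or its "Receive Pool Growth" setting.
For reference, NDIS 5.x lets a miniport indicate a packet with
NDIS_STATUS_RESOURCES, in which case the protocol must copy the data and the
miniport gets the buffer back as soon as the indication returns; whether the
IPoIB driver does this is not something the text above establishes.

```python
class GrowablePool:
    """Toy receive pool with a growth step and a copy fallback (hypothetical)."""

    def __init__(self, size, low_watermark, growth, max_size):
        self.size = size                    # total buffers currently allocated
        self.free = size                    # buffers available for new packets
        self.held = 0                       # buffers lent to the upper layer
        self.low_watermark = low_watermark  # threshold that triggers growth or copy
        self.growth = growth                # buffers added per growth step
        self.max_size = max_size            # hard cap on the pool size
        self.copied = 0                     # packets the upper layer had to copy
        self.dropped = 0                    # packets lost with no buffer free

    def receive(self):
        """Return 'lent', 'copied' or 'dropped' for one incoming packet."""
        if self.free == 0:
            self.dropped += 1
            return "dropped"
        if self.free <= self.low_watermark:
            # Running low: grow if the cap allows it, otherwise force a copy.
            step = min(self.growth, self.max_size - self.size)
            if step > 0:
                self.size += step
                self.free += step
            else:
                # The upper layer copies the data; the buffer stays in the pool.
                self.copied += 1
                return "copied"
        self.free -= 1
        self.held += 1
        return "lent"


pool = GrowablePool(size=128, low_watermark=16, growth=32, max_size=192)
results = [pool.receive() for _ in range(500)]  # upper layer returns nothing
print(pool.size, pool.free, pool.held, pool.copied, pool.dropped)
```

With these made-up numbers the pool grows twice, then every further packet is
copied rather than lent, so a reserve of buffers always stays with the driver
and nothing is dropped, at the cost of one copy per packet while the shortage
lasts.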

 

Thanks,

 

Guy Corem
Windows Developer
Voltaire Ltd.
Mobile: +972-50-7321946
Tel: +972-9-9717672
Fax: +972-9-9717660
Operator: +972-9-9717666
email: guyc at voltaire.com
web: http://www.voltaire.com

 
