[Openib-windows] A few issues running a program that was doing many connects.

Tzachi Dar tzachid at mellanox.co.il
Thu Jul 6 12:48:42 PDT 2006


Hi Fab,
 
Over the past couple of days I have been studying a small program.
This program uses WSD (Winsock Direct) and makes many connections to
the remote side (its source is at the end of this mail).
 
The program seems to run correctly, but Task Manager shows that its
memory usage, thread count and handle count keep going up.
 
I investigated the problem, and here are my findings:
 
1) While the program keeps connecting, the number of WSD sockets that
have been opened but not yet closed increases all the time. This seems
to be one reason for the leak. When I stopped the main thread from
connecting, the sockets did seem to get closed. That mitigates the
problem somewhat, but if there is something we can do about it, we
should probably do it.
 
2) The mechanism that allocates a new CQ, together with a thread to
service it, every time the existing CQs reach their maximum size seems
to be broken: there can be empty CQs while new ones are still being
opened. This problem will probably become smaller once CQ resizing is
implemented.
 
3) The next problem I saw was that the handle count does not come down,
even when the program pauses from time to time. After some debugging I
understood that the main reason is that these handles represent
allocations: when they are freed, they return to a pool that can only
grow. This is probably fine.
 
4) The last issue was memory demand that kept increasing. Using gflags
and umdh.exe I was not able to find any real leak, even though the
program was still leaking about 50 MB a minute. After some
investigation, I came to the conclusion that the memory that was
registered in the cache, and on which we later called
MmSecureVirtualMemory, was leaking. (I'll probably have to investigate
further to fully understand this, but this is what I currently see.)
When I removed this call from the driver, the problem seemed to be gone,
although the program got stuck very soon, so it is hard to be sure.
 
So, my questions are: 1) Have you seen this before? 2) One of the WSD
goals is to run everything that Ethernet can. This simple program seems
to run forever over Ethernet, but only for a few seconds over WSD. Can
we do anything to fix this?
 
Thanks
Tzachi
 
 
 
The program's source code:
 
#include <winsock2.h>
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* config, struct VL_random_t, set_buffer_data(), do_regular_send() and the
 * CHECK_VALUE / CHECK_NOT_EQUAL macros are defined elsewhere in the test
 * program and are not shown here. */

int client_main(
    IN struct VL_random_t *rand_t)
{
    struct sockaddr_in  serv_addr;
    struct in_addr      remote_addr;
    SOCKET              *socket_array = NULL;
    char                *buf = NULL;
    int                 num_sends_per_port = (config.num_connections /
                                              config.num_outs_connections) *
                                             config.num_outs_connections;
    int                 i;
    int                 j;
    int                 result = -1;
    int                 rc;
    int                 count = 0;

    /* One socket slot and one message buffer per outstanding connection
     * per port. */
    socket_array = malloc(config.num_outs_connections * config.num_ports *
                          sizeof(SOCKET));
    CHECK_NOT_EQUAL("malloc", socket_array, NULL, goto cleanup);

    memset(socket_array, 0, config.num_outs_connections * config.num_ports *
           sizeof(SOCKET));

    buf = malloc(config.num_outs_connections * config.num_ports *
                 config.message_size);
    CHECK_NOT_EQUAL("malloc", buf, NULL, goto cleanup);

    remote_addr.S_un.S_addr = inet_addr(config.ip);
    CHECK_NOT_EQUAL("inet_addr", remote_addr.S_un.S_addr, INADDR_NONE,
                    goto cleanup);

    serv_addr.sin_addr = remote_addr;
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons((u_short)config.portnum);

    for (i = 0; i < config.num_outs_connections * config.num_ports; ++i)
        set_buffer_data(buf + i * config.message_size, config.message_size,
                        i % config.num_ports);

    /* Each pass closes every slot's previous socket, then opens a new one,
     * connects it and sends one message on it. */
    for (j = 0; j < num_sends_per_port; j += config.num_outs_connections) {
        for (i = 0; i < config.num_outs_connections * config.num_ports; ++i) {
            if (socket_array[i]) {
                rc = closesocket(socket_array[i]);
                CHECK_VALUE("closesocket", rc, 0, goto cleanup);

                socket_array[i] = 0;
            }

            socket_array[i] = socket(AF_INET, SOCK_STREAM, 0);
            CHECK_NOT_EQUAL("socket", socket_array[i], INVALID_SOCKET,
                            goto cleanup);

            serv_addr.sin_port = htons((u_short)(config.portnum +
                                                 (i % config.num_ports)));

            rc = connect(socket_array[i], (struct sockaddr *)&serv_addr,
                         sizeof(serv_addr));
            CHECK_NOT_EQUAL("connect", rc, SOCKET_ERROR,
                            printf("i %d j %d gle=%d\n", i, j, WSAGetLastError());
                            goto cleanup);
            OutputDebugStringA("Connect finished\n");
            printf("Connect finished\n");

            /* Pause for 20 seconds after every 1000 connects. */
            count++;
            if (count % 1000 == 0) {
                printf("sleeping \n");
                Sleep(20000);
            }

            rc = do_regular_send(socket_array[i], buf + i * config.message_size,
                                 config.message_size, 0);
            CHECK_VALUE("do_regular_send", rc, 0, goto cleanup);
        }
    }

    result = 0;
cleanup:

    if (socket_array) {
        for (i = 0; i < config.num_outs_connections * config.num_ports; ++i) {
            if (socket_array[i]) {
                rc = closesocket(socket_array[i]);
                CHECK_NOT_EQUAL("closesocket", rc, SOCKET_ERROR,
                                result = -1; break);
            }
        }
        free(socket_array);
    }

    if (buf)
        free(buf);

    return result;
}


More information about the ofw mailing list