[openfabrics-ewg] OFED 1.0 - error while running ib_rdma_bw

Or Gerlitz ogerlitz at voltaire.com
Sun Jun 18 05:13:06 PDT 2006


Running ib_rdma_bw (e.g. from the trunk, but also with OFED) occasionally prints the following message:

	server read: Success 
	0/45: Couldn't read remote address 

Looking at the code, line 148 (and actually line 142 as well) seems to be buggy:

   133  struct pingpong_dest * pp_client_exch_dest(int sockfd,
   134                                             const struct pingpong_dest *my_dest)
   135  {
   136          struct pingpong_dest *rem_dest = NULL;
   137          char msg[sizeof "0000:000000:000000:00000000:0000000000000000"];
   138          int parsed;
   139
   140          sprintf(msg, "%04x:%06x:%06x:%08x:%016Lx", my_dest->lid, my_dest->qpn,
   141                          my_dest->psn,my_dest->rkey,my_dest->vaddr);
   142          if (write(sockfd, msg, sizeof msg) != sizeof msg) {
   143                  perror("client write");
   144                  fprintf(stderr, "Couldn't send local address\n");
   145                  goto out;
   146          }
   147
   148          if (read(sockfd, msg, sizeof msg) != sizeof msg) {
   149                  perror("client read");
   150                  fprintf(stderr, "Couldn't read remote address\n");
   151                  goto out;
   152          }

as read(2) can return fewer bytes than the maximum (expected) count, and indeed errno is 0
(no error, hence the "Success" from perror) when the message is printed.
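
A minimal sketch of one way to handle the short read/write, assuming two helpers named read_full and write_full (names made up for illustration, they are not in the ib_rdma_bw sources). Each one loops until the requested byte count has been transferred, the peer closes the socket, or a real error occurs:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes unless EOF or an error occurs.
 * Returns the number of bytes read (less than len only on EOF), or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
        char *p = buf;
        size_t done = 0;

        while (done < len) {
                ssize_t n = read(fd, p + done, len - done);

                if (n < 0) {
                        if (errno == EINTR)
                                continue;       /* interrupted, retry */
                        return -1;              /* real error, errno is set */
                }
                if (n == 0)
                        break;                  /* peer closed the connection */
                done += (size_t)n;
        }
        return (ssize_t)done;
}

/* Hypothetical helper: write exactly len bytes unless an error occurs.
 * Returns len on success, or -1 on error. */
static ssize_t write_full(int fd, const void *buf, size_t len)
{
        const char *p = buf;
        size_t done = 0;

        while (done < len) {
                ssize_t n = write(fd, p + done, len - done);

                if (n < 0) {
                        if (errno == EINTR)
                                continue;       /* interrupted, retry */
                        return -1;              /* real error, errno is set */
                }
                done += (size_t)n;
        }
        return (ssize_t)done;
}
```

With helpers like these, the checks on lines 142 and 148 could call write_full(sockfd, msg, sizeof msg) and read_full(sockfd, msg, sizeof msg) instead of write and read, keeping the same comparison against sizeof msg.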

The script below should let you reproduce it easily.

At some point an IB completion with error is also printed, but that might be related to the socket handling bug.

Or.

SERVER=dill
echo ""
# Sweep message sizes; -n is chosen so that size * count = 512 MiB per run.
for i in 16384 32768 65536 131072 262144 524288 1048576 2097152
do
    for k in 4
    do
        # Start the server side in the background and give it time to come up.
        ssh $SERVER "/usr/local/ofed/bin/ib_rdma_bw" &
        sleep 5
        echo $(date) -s = $i -n = $((512*1024*1024/$i)) -t = $k start
        /usr/local/ofed/bin/ib_rdma_bw $SERVER -s $i -n $((512*1024*1024/$i)) -t $k
        echo $(date) -s = $i -n = $((512*1024*1024/$i)) sleeping 3 seconds.....
        sleep 3
        echo $(date) -s = $i -n = $((512*1024*1024/$i)) end
        echo ""
        # Wait for the backgrounded ssh (server side) to exit before the next run.
        wait
    done
done