[openfabrics-ewg] OFED 1.0 - error while running ib_rdma_bw
Or Gerlitz
ogerlitz at voltaire.com
Sun Jun 18 05:13:06 PDT 2006
Running ib_rdma_bw (e.g. from the trunk, but also with OFED) occasionally prints the following message:
server read: Success
0/45: Couldn't read remote address
Looking in the code, line 148 (and actually 142 as well) seems to be buggy:
133 struct pingpong_dest * pp_client_exch_dest(int sockfd,
134 const struct pingpong_dest *my_dest)
135 {
136 struct pingpong_dest *rem_dest = NULL;
137 char msg[sizeof "0000:000000:000000:00000000:0000000000000000"];
138 int parsed;
139
140 sprintf(msg, "%04x:%06x:%06x:%08x:%016Lx", my_dest->lid, my_dest->qpn,
141 my_dest->psn,my_dest->rkey,my_dest->vaddr);
142 if (write(sockfd, msg, sizeof msg) != sizeof msg) {
143 perror("client write");
144 fprintf(stderr, "Couldn't send local address\n");
145 goto out;
146 }
147
148 if (read(sockfd, msg, sizeof msg) != sizeof msg) {
149 perror("client read");
150 fprintf(stderr, "Couldn't read remote address\n");
151 goto out;
152 }
as read(2) can return fewer bytes than the maximum (expected) count; indeed, errno is 0 (no error,
hence the "Success" from perror) when the message is printed.
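One possible way to handle this is to loop until the whole message has been transferred. The following is only a minimal sketch, not a patch against the tree: read_full and write_full are hypothetical helper names that do not exist in the ib_rdma_bw source, and the sketch assumes a blocking socket.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of ib_rdma_bw): call read(2) repeatedly
 * until exactly len bytes have arrived.
 * Returns 0 on success. Returns -1 on a read error (errno set by read),
 * or -1 with errno set to 0 if the peer closed the connection early.
 */
static int read_full(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal, retry */
			return -1;
		}
		if (n == 0) {
			errno = 0;	/* end of file, not an errno failure */
			return -1;
		}
		p += n;			/* short read: advance and ask for the rest */
		len -= (size_t)n;
	}
	return 0;
}

/* Hypothetical counterpart for write(2), which may also transfer less. */
static int write_full(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}
```

With helpers like these, the checks at lines 142 and 148 could become "if (write_full(sockfd, msg, sizeof msg) != 0)" and "if (read_full(sockfd, msg, sizeof msg) != 0)", so a short read is no longer reported as a failure.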
The script below should let you reproduce it easily.
At some point an IB completion with error is also printed, but it might be related to the socket handling bug.
Or.
SERVER=dill
echo ""
for i in 16384 32768 65536 131072 262144 524288 1048576 2097152
do
    for k in 4
    do
        ssh $SERVER "/usr/local/ofed/bin/ib_rdma_bw" &
        sleep 5
        echo $(date) -s = $i -n = $((512*1024*1024/$i)) -t = $k start
        /usr/local/ofed/bin/ib_rdma_bw $SERVER -s $i -n $((512*1024*1024/$i)) -t $k
        echo $(date) -s = $i -n = $((512*1024*1024/$i)) sleeping 3 seconds.....
        sleep 3
        echo $(date) -s = $i -n = $((512*1024*1024/$i)) end
        echo ""
        wait
    done
done