[ewg] Program Fails to Send/Receive Successfully

Greg I Kerr kerr.g at husky.neu.edu
Thu Mar 17 14:20:26 PDT 2011


Hi,

I'm working on an InfiniBand program. When I run ibv_post_send and
ibv_post_recv on both nodes, no error is returned, but ibv_poll_cq
never finds any completions on the queue. I was wondering what could
be the cause of this, since I've spent a few days looking now.

The dest_qp_num seems fine on both nodes, as do the rq_psn and sq_psn.
I'm not sure where else there could be a problem.
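
A minimal sketch of the cross-check meant by "seems fine", assuming each
node records the values it passes to ibv_modify_qp. The struct
conn_params and the function params_consistent are hypothetical names
used only for illustration; they are not part of libibverbs. For an RC
connection, each side's dest_qp_num should equal the peer's QP number,
and each side's rq_psn should equal the peer's sq_psn (PSNs are 24 bits
wide).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the values one node uses for its QP.
 * The field names mirror the ibv_qp_attr members they are copied into. */
struct conn_params {
    uint32_t qp_num;      /* this node's own QP number */
    uint32_t dest_qp_num; /* what this node set as dest_qp_num */
    uint32_t sq_psn;      /* what this node set as sq_psn */
    uint32_t rq_psn;      /* what this node set as rq_psn */
};

/* PSNs occupy 24 bits, so compare only the low 24 bits. */
#define PSN_MASK 0xFFFFFFu

/* Returns true when the two nodes' settings agree with each other. */
static bool params_consistent(const struct conn_params *a,
                              const struct conn_params *b)
{
    /* Each side must target the other side's QP number. */
    if (a->dest_qp_num != b->qp_num || b->dest_qp_num != a->qp_num)
        return false;

    /* The PSN a node expects to receive must be the PSN the peer sends. */
    if ((a->rq_psn & PSN_MASK) != (b->sq_psn & PSN_MASK))
        return false;
    if ((b->rq_psn & PSN_MASK) != (a->sq_psn & PSN_MASK))
        return false;

    return true;
}
```

Passing this check only shows that the exchanged numbers agree; it says
nothing about the QP state transitions, LIDs, or port numbers.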

To provide more background information, I ran ibdump on my program (on
both nodes) and then analyzed the output in Wireshark. Basically, node 1
shows nothing but RC Acknowledge packets and node 2 shows nothing but
RC Send First packets. Does that reveal anything about where the
problem likely lies?
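
One detail that could be read from the capture is the AETH syndrome
byte carried in each RC Acknowledge packet, since a packet with that
opcode may be a positive ACK, an RNR NAK, or a NAK. The sketch below
decodes that byte following my understanding of the InfiniBand AETH
layout (bits 6:5 select the type, bits 4:0 carry the credit count,
RNR timer, or NAK code). The names aeth_classify and aeth_describe are
hypothetical helpers, not part of libibverbs or Wireshark.

```c
#include <stdint.h>

/* The kind of response encoded in bits 6:5 of the AETH syndrome. */
enum aeth_kind {
    AETH_KIND_ACK,
    AETH_KIND_RNR_NAK,
    AETH_KIND_RESERVED,
    AETH_KIND_NAK
};

/* Classify an AETH syndrome byte by its two type bits. */
static enum aeth_kind aeth_classify(uint8_t syndrome)
{
    switch ((syndrome >> 5) & 0x3) {
    case 0:  return AETH_KIND_ACK;      /* low 5 bits: credit count */
    case 1:  return AETH_KIND_RNR_NAK;  /* low 5 bits: RNR timer */
    case 3:  return AETH_KIND_NAK;      /* low 5 bits: NAK code */
    default: return AETH_KIND_RESERVED;
    }
}

/* Return a short human-readable description of the syndrome. */
static const char *aeth_describe(uint8_t syndrome)
{
    switch (aeth_classify(syndrome)) {
    case AETH_KIND_ACK:
        return "ACK";
    case AETH_KIND_RNR_NAK:
        return "RNR NAK: receiver not ready";
    case AETH_KIND_NAK:
        switch (syndrome & 0x1F) {
        case 0:  return "NAK: PSN sequence error";
        case 1:  return "NAK: invalid request";
        case 2:  return "NAK: remote access error";
        case 3:  return "NAK: remote operational error";
        case 4:  return "NAK: invalid RD request";
        default: return "NAK: unknown code";
        }
    default:
        return "reserved";
    }
}
```

For example, a syndrome of 0x20 would be reported as an RNR NAK, and
0x60 as a NAK with a PSN sequence error.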

Of course if I look at the output of, say, ibv_rc_pingpong in
Wireshark both nodes show RC Send First, RC Send Middle, and RC Send
Last packets, among others.

I know this is too vague to really pinpoint my problem but I am hoping
someone can nudge me in the right direction of where I might try
looking (since nothing I've looked at so far has identified any clear
problems).

Thanks,

Greg Kerr


