[ofa-general] SDP and iWARP
Craig Prescott
prescott at hpc.ufl.edu
Thu Jan 10 07:07:17 PST 2008
Steve Wise wrote:
>
> First make sure the sdp kernel module uses the rdma cma. Then I'd add
> printk hooks in cma.c, addr.c, and iwcm.c to see what's going on and
> where things are failing. Also a wire trace is good if we're getting
> that far (like at least doing arp resolution).
>
Small update - a little progress. I sprinkled printk's liberally and
turned on the ib_sdp debug options. The initial problem was on the
listener during an IW_CM_EVENT_CONNECT_REQUEST event; the SDP hello
header was rejected in sdp_cma.c:sdp_connect_handler() because its
max_adverts field was zero, which is not permissible. In fact, all
of the sdp_hh fields were zero.
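To make the failure mode concrete, here is a minimal user-space sketch of
the kind of check the listener applies to the hello header carried in the
connect request's private data. It is an illustration only: struct
demo_sdp_hello, its version and recv_size fields, demo_check_hello() and
the -EINVAL return value are all hypothetical stand-ins, not the real
struct sdp_hh layout or the actual code in sdp_connect_handler(). Only the
max_adverts field and the "zero is rejected" rule come from the above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the SDP hello header.
 * Only max_adverts is taken from the discussion above; the other
 * fields are placeholders and do not reflect the real sdp_hh layout. */
struct demo_sdp_hello {
    uint8_t  version;      /* placeholder field */
    uint8_t  max_adverts;  /* must be non-zero to be accepted */
    uint32_t recv_size;    /* placeholder field */
};

/* Validate the private data delivered with a connect request.
 * Returns 0 if the header is acceptable, -EINVAL otherwise
 * (the error code is an assumption made for this sketch). */
int demo_check_hello(const void *private_data, size_t len)
{
    struct demo_sdp_hello hh;

    /* Too short to contain a hello header at all. */
    if (private_data == NULL || len < sizeof(hh))
        return -EINVAL;

    /* Copy out so we never rely on the caller's buffer alignment. */
    memcpy(&hh, private_data, sizeof(hh));

    /* An all-zero header, as seen on the iWARP path, fails here. */
    if (hh.max_adverts == 0)
        return -EINVAL;

    return 0;
}
```

With a zero-filled buffer, which is what the listener was seeing, the
check fails at the max_adverts test before anything else is looked at.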
Comparing with the RDMA_TRANSPORT_IB case, I saw that
cma.c:cma_connect_ib() does some work to create the SDP header
via cma_format_hdr(), but cma_connect_iw() does not.
I patched cma_connect_iw() to create the SDP header as
cma_connect_ib() does. This gets us farther: the SDP header now
looks right on the listener side, and the listener at least
enters rdma_accept(), but iw_cm_accept() fails because
cm_id->device->iwcm->accept(cm_id, iw_param) returns -104.
That call also emits a couple of messages into the listener's
syslog:
Jan 9 21:53:54 tebow2 kernel: iwch_ev_dispatch - CQE Err qpid 0x20
opcode 14 status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
Jan 9 21:53:54 tebow2 kernel: post_qp_event - AE qpid 0x20 opcode 14
status 0x6 type 1 wrid.hi 0x0 wrid.lo 0x80000000
In the end, we still end up in rdma_reject(). Will keep digging.
Cheers,
Craig