<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=US-ASCII">
<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2654.45">
<TITLE>RE: calling to ibv_create_qp with big number in qp_init_attr.cap.max_inline_data never return</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2>Here is a test that reproduces the problem:</FONT>
</P>
<BR>
<P><FONT SIZE=2>#include &lt;stdlib.h&gt;</FONT>
<BR><FONT SIZE=2>#include &lt;stdio.h&gt;</FONT>
<BR><FONT SIZE=2>#include &lt;string.h&gt; /* strcmp */</FONT>
<BR><FONT SIZE=2>#include &lt;infiniband/verbs.h&gt;</FONT>
</P>
<P><FONT SIZE=2>#define PORT_NUM 1 /* IB port number to work with */</FONT>
<BR><FONT SIZE=2>#define MR_SIZE (1024) /* size of the MR */</FONT>
<BR><FONT SIZE=2>#define CQ_SIZE 10 /* size of the CQ */</FONT>
<BR><FONT SIZE=2>#define QP_CAP_SG_WR 1 /* s/g and w/r of the QP */</FONT>
</P>
<BR>
<P><FONT SIZE=2>/* structure of test parameters */</FONT>
<BR><FONT SIZE=2>struct config_t {</FONT>
<BR><FONT SIZE=2> uint32_t tcp_port;</FONT>
<BR><FONT SIZE=2> const char* ip;</FONT>
<BR><FONT SIZE=2> const char *dev_name;</FONT>
<BR><FONT SIZE=2> uint8_t ib_port;</FONT>
<BR><FONT SIZE=2> uint8_t is_daemon;</FONT>
<BR><FONT SIZE=2>};</FONT>
</P>
<P><FONT SIZE=2>struct config_t config = {</FONT>
<BR><FONT SIZE=2> 20000, /* tcp_port */</FONT>
<BR><FONT SIZE=2> "0", /* ip */</FONT>
<BR><FONT SIZE=2> "mthca0", /* dev_name */</FONT>
<BR><FONT SIZE=2> PORT_NUM, /* ib_port */</FONT>
<BR><FONT SIZE=2>};</FONT>
</P>
<BR>
<P><FONT SIZE=2>int main(</FONT>
<BR><FONT SIZE=2> int argc,</FONT>
<BR><FONT SIZE=2> char *argv[])</FONT>
<BR><FONT SIZE=2>{</FONT>
<BR><FONT SIZE=2> int test_result = 1;</FONT>
<BR><FONT SIZE=2> struct dlist *dev_list;</FONT>
<BR><FONT SIZE=2> struct ibv_device *ib_dev = NULL;</FONT>
<BR><FONT SIZE=2> struct ibv_context *ctx = NULL;</FONT>
<BR><FONT SIZE=2> struct ibv_pd *pd = NULL;</FONT>
<BR><FONT SIZE=2> struct ibv_cq *rcq = NULL;</FONT>
<BR><FONT SIZE=2> struct ibv_cq *scq = NULL;</FONT>
<BR><FONT SIZE=2> struct ibv_qp *qp = NULL;</FONT>
<BR><FONT SIZE=2> </FONT>
<BR><FONT SIZE=2> printf("Finding IB devices\n");</FONT>
<BR><FONT SIZE=2> /* get device names in the system */</FONT>
<BR><FONT SIZE=2> dev_list = ibv_get_devices();</FONT>
<BR><FONT SIZE=2> if (dev_list == NULL) {</FONT>
<BR><FONT SIZE=2> perror("Error, failed to get IB devices list");</FONT>
<BR><FONT SIZE=2> goto cleanup;</FONT>
<BR><FONT SIZE=2> }</FONT>
<BR><FONT SIZE=2> </FONT>
<BR><FONT SIZE=2> dlist_for_each_data(dev_list, ib_dev, struct ibv_device)</FONT>
<BR><FONT SIZE=2> if (!strcmp(ibv_get_device_name(ib_dev), config.dev_name)) {</FONT>
<BR><FONT SIZE=2> break;</FONT>
<BR><FONT SIZE=2> }</FONT>
</P>
<P><FONT SIZE=2> if (ib_dev == NULL) {</FONT>
<BR><FONT SIZE=2> printf("Error, IB device %s wasn't found\n", config.dev_name);</FONT>
<BR><FONT SIZE=2> goto cleanup;</FONT>
<BR><FONT SIZE=2> }</FONT>
<BR><FONT SIZE=2> printf("Device %s was found\n", config.dev_name);</FONT>
<BR><FONT SIZE=2> </FONT>
<BR><FONT SIZE=2> ctx = ibv_open_device(ib_dev);</FONT>
<BR><FONT SIZE=2> if (ctx == NULL) {</FONT>
<BR><FONT SIZE=2> perror("Error, failed to open device");</FONT>
<BR><FONT SIZE=2> goto cleanup;</FONT>
<BR><FONT SIZE=2> }</FONT>
<BR><FONT SIZE=2> </FONT>
<BR><FONT SIZE=2> pd = ibv_alloc_pd(ctx);</FONT>
<BR><FONT SIZE=2> if (pd == NULL) {</FONT>
<BR><FONT SIZE=2> printf("Error, failed to allocate PD\n");</FONT>
<BR><FONT SIZE=2> goto cleanup;</FONT>
<BR><FONT SIZE=2> }</FONT>
<BR><FONT SIZE=2> printf("PD was allocated\n");</FONT>
</P>
<P><FONT SIZE=2> rcq = ibv_create_cq(ctx, CQ_SIZE, NULL);</FONT>
<BR><FONT SIZE=2> if (rcq == NULL) {</FONT>
<BR><FONT SIZE=2> perror("Error, failed to create receive CQ");</FONT>
<BR><FONT SIZE=2> goto cleanup;</FONT>
<BR><FONT SIZE=2> }</FONT>
<BR><FONT SIZE=2> printf("Receive CQ was created with %d entries\n", rcq->cqe);</FONT>
</P>
<P><FONT SIZE=2> scq = ibv_create_cq(ctx, CQ_SIZE, NULL);</FONT>
<BR><FONT SIZE=2> if (scq == NULL) {</FONT>
<BR><FONT SIZE=2> perror("Error, failed to create send CQ");</FONT>
<BR><FONT SIZE=2> goto cleanup;</FONT>
<BR><FONT SIZE=2> }</FONT>
<BR><FONT SIZE=2> printf("Send CQ was created with %d entries\n", scq->cqe);</FONT>
</P>
<P><FONT SIZE=2> {</FONT>
<BR><FONT SIZE=2> struct ibv_qp_init_attr qp_init_attr = {</FONT>
<BR><FONT SIZE=2> .qp_type = IBV_QPT_RC,</FONT>
<BR><FONT SIZE=2> .recv_cq = rcq,</FONT>
<BR><FONT SIZE=2> .send_cq = scq,</FONT>
<BR><FONT SIZE=2> .sq_sig_all = 0,</FONT>
<BR><FONT SIZE=2> .cap.max_send_wr = QP_CAP_SG_WR,</FONT>
<BR><FONT SIZE=2> .cap.max_send_sge = QP_CAP_SG_WR,</FONT>
<BR><FONT SIZE=2> .cap.max_recv_wr = QP_CAP_SG_WR,</FONT>
<BR><FONT SIZE=2> .cap.max_recv_sge = QP_CAP_SG_WR</FONT>
<BR><FONT SIZE=2> };</FONT>
</P>
<P><FONT SIZE=2> qp_init_attr.cap.max_inline_data = 1075060724; /* 0x40141FF4, just above 0x40000000 */</FONT>
<BR><FONT SIZE=2> </FONT>
<BR><FONT SIZE=2> printf("before calling create QP\n");</FONT>
<BR><FONT SIZE=2> qp = ibv_create_qp(pd, &qp_init_attr);</FONT>
<BR><FONT SIZE=2> printf("after calling create QP\n");</FONT>
<BR><FONT SIZE=2> if (qp == NULL) {</FONT>
<BR><FONT SIZE=2> perror("Error, failed to create QP");</FONT>
<BR><FONT SIZE=2> goto cleanup;</FONT>
<BR><FONT SIZE=2> }</FONT>
<BR><FONT SIZE=2> }</FONT>
<BR><FONT SIZE=2> printf("QP with number 0x%x was created\n", qp->qp_num);</FONT>
</P>
<P><FONT SIZE=2> test_result = 0;</FONT>
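<BR><FONT SIZE=2>cleanup:</FONT>
<BR><FONT SIZE=2> /* release resources in reverse order of creation */</FONT>
<BR><FONT SIZE=2> if (qp)</FONT>
<BR><FONT SIZE=2> ibv_destroy_qp(qp);</FONT>
<BR><FONT SIZE=2> if (scq)</FONT>
<BR><FONT SIZE=2> ibv_destroy_cq(scq);</FONT>
<BR><FONT SIZE=2> if (rcq)</FONT>
<BR><FONT SIZE=2> ibv_destroy_cq(rcq);</FONT>
<BR><FONT SIZE=2> if (pd)</FONT>
<BR><FONT SIZE=2> ibv_dealloc_pd(pd);</FONT>
<BR><FONT SIZE=2> if (ctx)</FONT>
<BR><FONT SIZE=2> ibv_close_device(ctx);</FONT>
<BR><FONT SIZE=2> return test_result;</FONT>
<BR><FONT SIZE=2>}</FONT>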
</P>
<BR>
<BR>
<P><FONT SIZE=2>Dotan</FONT>
</P>
<P><FONT SIZE=2>-----Original Message-----</FONT>
<BR><FONT SIZE=2>From: Michael S. Tsirkin [<A HREF="mailto:mst@mellanox.co.il">mailto:mst@mellanox.co.il</A>]</FONT>
<BR><FONT SIZE=2>Sent: Monday, July 18, 2005 3:36 PM</FONT>
<BR><FONT SIZE=2>To: Roland Dreier</FONT>
<BR><FONT SIZE=2>Cc: Dotan Barak; openib-general@openib.org</FONT>
<BR><FONT SIZE=2>Subject: Re: calling to ibv_create_qp with big number in qp_init_attr.cap.max_inline_data never return</FONT>
</P>
<BR>
<P><FONT SIZE=2>Hi, Roland!</FONT>
<BR><FONT SIZE=2>Quoting r. Roland Dreier &lt;rolandd@cisco.com&gt;:</FONT>
<BR><FONT SIZE=2>> Subject: Re: calling to ibv_create_qp with big number in qp_init_attr.cap.max_inline_data never return</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Dotan> the create_qp function never ends.</FONT>
<BR><FONT SIZE=2>> </FONT>
<BR><FONT SIZE=2>> Where does it hang? Can you do strace on the process? If it's stuck</FONT>
<BR><FONT SIZE=2>> sleeping, what does /proc/&lt;pid&gt;/wchan say?</FONT>
</P>
<P><FONT SIZE=2>It hangs here, in the code that computes the send queue WQE size:</FONT>
</P>
<P><FONT SIZE=2> size = sizeof (struct mthca_next_seg) +</FONT>
<BR><FONT SIZE=2> qp->sq.max_gs * sizeof (struct mthca_data_seg);</FONT>
<BR><FONT SIZE=2> switch (qp->qpt) {</FONT>
<BR><FONT SIZE=2> case IBV_QPT_UD:</FONT>
<BR><FONT SIZE=2> if (mthca_is_memfree(pd->context))</FONT>
<BR><FONT SIZE=2> size += sizeof (struct mthca_arbel_ud_seg);</FONT>
<BR><FONT SIZE=2> else</FONT>
<BR><FONT SIZE=2> size += sizeof (struct mthca_tavor_ud_seg);</FONT>
<BR><FONT SIZE=2> break;</FONT>
<BR><FONT SIZE=2> default:</FONT>
<BR><FONT SIZE=2> /* bind seg is as big as atomic + raddr segs */</FONT>
<BR><FONT SIZE=2> size += sizeof (struct mthca_bind_seg);</FONT>
<BR><FONT SIZE=2> }</FONT>
</P>
<P><FONT SIZE=2>----></FONT>
</P>
<P><FONT SIZE=2> for (qp->sq.wqe_shift = 6; 1 << qp->sq.wqe_shift < size;</FONT>
<BR><FONT SIZE=2> qp->sq.wqe_shift++)</FONT>
<BR><FONT SIZE=2> ; /* nothing */</FONT>
</P>
<BR>
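<P><FONT SIZE=2>The problem here is that size is bigger than 0x40000000.</FONT>
<BR><FONT SIZE=2>As a result, 1 &lt;&lt; qp->sq.wqe_shift reaches 0x80000000, which is negative as a signed int,</FONT>
<BR><FONT SIZE=2>so it is still less than size. Once the shift count reaches 32 the result wraps back to small values</FONT>
<BR><FONT SIZE=2>(shifting by the full width of the type is undefined in C, and x86 simply masks the count),</FONT>
<BR><FONT SIZE=2>so everything starts all over again and the loop never terminates.</FONT>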
</P>
<P><FONT SIZE=2>Looking at the code, passing insanely huge values in the QP parameters</FONT>
<BR><FONT SIZE=2>will cause all kinds of overflows (e.g. size could become negative).</FONT>
</P>
<P><FONT SIZE=2>I think the best fix is to sanity-check the QP parameters</FONT>
<BR><FONT SIZE=2>in mthca_create_qp.</FONT>
</P>
<P><FONT SIZE=2>-- </FONT>
<BR><FONT SIZE=2>MST</FONT>
</P>
</BODY>
</HTML>