[openib-general] kdapltest regression? failing now...
James Lentini
jlentini at netapp.com
Thu May 19 18:43:44 PDT 2005
I committed a fix for this in revision 2420. The problem turned out to
be that DAPL wasn't initializing the max_inline_data value of the QP
attr's cap structure.
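
As a rough illustration of this class of bug (not the actual DAPL or mthca code), the sketch below models a QP capability block whose max_inline_data field is left unset by the caller. Every toy_ name, the 16-byte segment size, and the rule that inline data counts toward the send gather count are assumptions made for the example. The point is only that a stray value in one field can push a derived count past the device limit and yield -EINVAL (-22), and that zeroing the whole structure before filling it in avoids this.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the capability block of a QP init attribute. */
struct toy_qp_cap {
    uint32_t max_send_wr;
    uint32_t max_recv_wr;
    uint32_t max_send_sge;
    uint32_t max_recv_sge;
    uint32_t max_inline_data;   /* the field that was left unset */
};

/* Hypothetical device limits; only the scatter/gather limit matters here. */
struct toy_dev_limits {
    uint32_t max_sg;
};

/* Assumed size of one data segment, used to convert inline bytes to entries. */
#define TOY_DATA_SEG_SIZE 16u

/*
 * Toy version of a driver-side sanity check. It assumes the effective send
 * gather count is the larger of max_send_sge and the number of segments
 * needed to hold max_inline_data bytes.
 */
static int toy_create_qp(const struct toy_dev_limits *lim,
                         const struct toy_qp_cap *cap)
{
    /* Round up without risking 32-bit overflow on very large values. */
    uint32_t inline_segs = cap->max_inline_data / TOY_DATA_SEG_SIZE +
                           (cap->max_inline_data % TOY_DATA_SEG_SIZE ? 1 : 0);
    uint32_t sq_gs = cap->max_send_sge > inline_segs ? cap->max_send_sge
                                                     : inline_segs;

    if (sq_gs > lim->max_sg || cap->max_recv_sge > lim->max_sg)
        return -EINVAL;
    return 0;
}

/* Caller-side fix: zero every field first, then set the ones we care about. */
static void toy_fill_cap(struct toy_qp_cap *cap, uint32_t sge)
{
    memset(cap, 0, sizeof(*cap));   /* max_inline_data becomes 0 */
    cap->max_send_wr = 16;
    cap->max_recv_wr = 16;
    cap->max_send_sge = sge;
    cap->max_recv_sge = sge;
}
```

With a limit of 4 entries, a capability block prepared by toy_fill_cap(&cap, 4) passes the check, while the same block with a leftover non-zero max_inline_data (say, stack garbage) can fail it even though both SGE counts are within the limit.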
Let me know if you still have any problems.
There is a patch in the pipeline that will remove the IBAT printout
you mentioned.
james
On Thu, 19 May 2005, James Lentini wrote:
>
> I think I figured this out. DAPL was assuming a particular maximum
> scatter/gather list size. I'm going to change it to query for this value.
> Hopefully I'll have a fix shortly.
>
> james
>
> On Thu, 19 May 2005, James Lentini wrote:
>
>>
>> For what it's worth, this is the check that we are "failing":
>>
>> qp->sq.max_gs > dev->limits.max_sg
>>
>> ( qp->sq.max_gs + 2 > dev->limits.max_sg is also true but
>> qp->transport == MLX is not).
>>
>> On Thu, 19 May 2005, James Lentini wrote:
>>
>>>
>>> I'm looking into this, Tom.
>>>
>>> The following code was added to hw/mthca/mthca_qp.c on Friday
>>> (starting on line 1233):
>>>
>>>
>>> if ((qp->transport == MLX && qp->sq.max_gs + 2 > dev->limits.max_sg) ||
>>>     qp->sq.max_gs > dev->limits.max_sg ||
>>>     qp->rq.max_gs > dev->limits.max_sg)
>>>         return -EINVAL;
>>>
>>> If anyone knows what we have set incorrectly, please let me know.
>>>
>>> Thanks,
>>> james
>>>
>>> On Thu, 19 May 2005, Tom Duffy wrote:
>>>
>>> tduffy> I am not sure when this started, but after updating to top of trunk*, I
>>> tduffy> can no longer get kdapltest to work properly. Both ipoib and sdp are
>>> tduffy> working.
>>> tduffy>
>>> tduffy> Both server and client are returning an error: DAT_INVALID_HANDLE. This
>>> tduffy> is coming from ib_create_qp(). With debugging turned on:
>>> tduffy>
>>> tduffy> [root@flopteron2 ~]# ./kdapltest -T S -D mthca0a -d
>>> tduffy> kDAPL: dapl_ia_open (mthca0a, 8, ffff81000b806308, ffff81000b8062d8)
>>> tduffy> kDAPL: dapl_ia_open () returns 0x0
>>> tduffy> kDAPL: dapl_pz_create (ffff81001ba165c8, ffff81000b8062e0)
>>> tduffy> kDAPL: dapl_evd_kcreate (ffff81001ba165c8, 8, 1, upcall, 0x20, ffff81000b8062e8)
>>> tduffy> kDAPL: dapl_evd_kcreate (ffff81001ba165c8, 8, 1, upcall, 0xa0, ffff81000b8062f0)
>>> tduffy> kDAPL: dapl_evd_kcreate (ffff81001ba165c8, 8, 1, upcall, 0x10, ffff81000b806300)
>>> tduffy> kDAPL: dapl_evd_kcreate (ffff81001ba165c8, 8, 1, upcall, 0x40, ffff81000b8062f8)
>>> tduffy> kDAPL: dapl_ep_create (ffff81001ba165c8, ffff81001b9442c8, ffff81001ba164b0, ffff81001ba166e0, ffff81001ba22050, 0000000000000000, ffff81000b806318)
>>> tduffy> kDAPL: dapl_ib_qp_alloc: ib_create_qp failed = -22
>>> tduffy> kDAPL: dapl_evd_free (ffff81001ba22050)
>>> tduffy> kDAPL: dapl_evd_free () returns 0x0
>>> tduffy> kDAPL: dapl_evd_free (ffff81001ba22168)
>>> tduffy> kDAPL: dapl_evd_free () returns 0x0
>>> tduffy> kDAPL: dapl_evd_free (ffff81001ba166e0)
>>> tduffy> kDAPL: dapl_evd_free () returns 0x0
>>> tduffy> kDAPL: dapl_evd_free (ffff81001ba164b0)
>>> tduffy> kDAPL: dapl_evd_free () returns 0x0
>>> tduffy> kDAPL: dapl_pz_free (ffff81001b9442c8)
>>> tduffy> kDAPL: dapl_ia_query (ffff81001ba165c8, 0000000000000000, 0000000000000000, ffff81001bba7b28)
>>> tduffy> kDAPL: dapl_ia_query () returns 0x0
>>> tduffy> kDAPL: dapl_ia_close (ffff81001ba165c8, 1)
>>> tduffy> kDAPL: dapl_evd_free (ffff81001ba167f8)
>>> tduffy> kDAPL: dapl_evd_free () returns 0x0
>>> tduffy> Server_Cmd.debug: 1
>>> tduffy> Server_Cmd.dapl_name: mthca0a
>>> tduffy> DT_cs_Server: IA mthca0a opened
>>> tduffy> DT_cs_Server: PZ created
>>> tduffy> DT_cs_Server: dat_ep_create error: DAT_INVALID_HANDLE
>>> tduffy> DT_cs_Server: Waiting for clients to all go away...
>>> tduffy> DT_cs_Server: Cleaning up ...
>>> tduffy> DT_cs_Server: IA mthca0a closed
>>> tduffy> DT_cs_Server (mthca0a): Exiting.
>>> tduffy> TEST INSTANCE 0
>>> tduffy> TEST return code = 1
>>> tduffy>
>>> tduffy> Also, the ib_at module prints this out now when you ping (after running
>>> tduffy> kdapltest)...
>>> tduffy>
>>> tduffy> ib_at: ib_at_arp_work: Process IB ARP ip <192.168.0.26> gid <0xfe800000000000000002c9010a99e031>
>>> tduffy>
>>> tduffy> -tduffy
>>> tduffy>
>>> tduffy> * running x86_64 SMP, 2.6.12-rc4, gcc 4.0.0-6, OpenIB r2414, opensm r2414 2 machines back-2-back
>>> tduffy>
>>>
>>
>