[ofa-general] uDAPL libdat2.so version [PATCH] udapl v1 and v2 - dat_create_psp_any() seed value wrong

Tang, Changqing changquing.tang at hp.com
Sat Feb 9 08:38:28 PST 2008


I am testing OFED 1.3 uDAPL v1 on three nodes: n1, n2, and n3.
If I run two ranks between n1 and n2, it works; between n2 and n3, it
works as well. But if I run between n1 and n3, it fails with:

dat_cr_accept() failed: DAT_INTERNAL_ERROR

What could be the reason? I did not change anything except the
nodes used for the run. Thanks for any help.


--CQ


> -----Original Message-----
> From: Arlin Davis [mailto:ardavis at ichips.intel.com]
> Sent: Friday, February 08, 2008 5:19 PM
> To: Tang, Changqing
> Cc: OpenFabrics General; James Lentini
> Subject: Re: [ofa-general] uDAPL libdat2.so version [PATCH] udapl v1 and v2 - dat_create_psp_any() seed value wrong
>
> Tang, Changqing wrote:
> > Arlin:
> >         I am running the uDAPL v1 version from today's OFED tarball.
> > Pure RDMA works, but if I switch to SRQ mode, I get a segfault in
> > dat_srq_create(). I checked the parameters to dat_srq_create() and
> > don't see anything wrong:
> >
> > Core was generated by `/mpiscratch/ctang/test/pp.x'.
> > Program terminated with signal 11, Segmentation fault.
> > #0  0x00002aaaabda5c3b in dat_srq_create () from /usr/lib64/libdat.so
> >
> > (gdb) print hpmp_udapl->ia_handle
> > $7 = (DAT_IA_HANDLE) 0x1
> > (gdb) print hpmp_udapl->pz_handle
> > $8 = (DAT_PZ_HANDLE) 0xc4540e0
> > (gdb) print srq_attr
> > $9 = {max_recv_dtos = 16, max_recv_iov = 1, low_watermark = 0}
> > (gdb) print &srq_attr
> > $10 = (DAT_SRQ_ATTR *) 0x7fffe64fb760
> > (gdb) print &hpmp_udapl->srq_handle
> > $11 = (DAT_SRQ_HANDLE *) 0xc448bb8
> >
> >
> > Do you have any idea?
>
> Did you have SRQ working on previous versions?
>
> I am not certain that the v1.2 SRQ implementation has ever
> been fully tested.
>
> James, can you shed some light on SRQ DAPL code status?
>
> -arlin
>


