[openib-general] uDAPL open HCA problem
LEI CHAI
chai.15 at osu.edu
Fri Oct 21 18:14:16 PDT 2005
Hi,
Thank you very much for your reply. Now the open HCA problem has come back :-(
Here is the log message:
[chail at ro0] mpiexec -n 2 ./a.out
DAPL: NOT Setting Loopback
dapl_ib_init:
ib_thread_init(12016)
dapl_ia_open (ib0, 8, 0x7ffffff28668, 0xd9da48)
open_hca: mthca0 - 0xdb3390
ib_thread(12016,0x40200960): ENTER: pipe 8 at 4
open_hca: Found dev mthca0 0002c902004002e8
open_hca: GID subnet fe80000000000000 id 0002c902004002e9
ips_by_gid: RET 0 at_rec 0x7ffffff283d0 -> id 2861
dapli_at_event_cb()
ip_comp_handler: rec 0x7ffffff283d0 ->id 2861 id 2861 num -22 3afa6000
ip_comp_handler: resolution err -22 retry 1
ip_comp_handler: ips_by_gid 0 rec 0x7ffffff283d0->id 2862
dapli_at_event_cb()
ip_comp_handler: rec 0x7ffffff283d0 ->id 2862 id 2862 num -22 0
ip_comp_handler: resolution err -22 retry 2
ip_comp_handler: ips_by_gid 0 rec 0x7ffffff283d0->id 2863
dapli_at_event_cb()
ip_comp_handler: rec 0x7ffffff283d0 ->id 2863 id 2863 num -22 0
ip_comp_handler: resolution err -22 retry 3
ip_comp_handler: ips_by_gid 0 rec 0x7ffffff283d0->id 2864
dapli_at_event_cb()
ip_comp_handler: rec 0x7ffffff283d0 ->id 2864 id 2864 num -22 0
ip_comp_handler: resolution err -22 retry 4
ip_comp_handler: ERR: at_rec 0x7ffffff283d0, id 2864 num -22
open_hca: ERR ib_at_ips_by_gid for mthca0
dapls_ib_open_hca failed 40000
dapl_ia_open () returns 0x40000
DAPL: Stopped (dapl_fini)
dapl_ib_release:
ib_thread_destroy(12016)
ib_thread_destroy: waiting for ib_thread
ib_thread(12016) EXIT
[rdma_udapl_priv.c:640] error(262144): Cannot open IA
DAPL: NOT Setting Loopback
dapl_ib_init:
ib_thread_init(11337)
dapl_ia_open (ib0, 8, 0x7fffffa8d618, 0xd9da48)
open_hca: mthca0 - 0xdb3390
ib_thread(11337,0x40800960): ENTER: pipe 8 at 4
open_hca: Found dev mthca0 0002c90200400314
open_hca: GID subnet fe80000000000000 id 0002c90200400315
ips_by_gid: RET 0 at_rec 0x7fffffa8d380 -> id 4627
dapli_at_event_cb()
ip_comp_handler: rec 0x7fffffa8d380 ->id 4627 id 4627 num -22 3c66c000
ip_comp_handler: resolution err -22 retry 1
ip_comp_handler: ips_by_gid 0 rec 0x7fffffa8d380->id 4628
dapli_at_event_cb()
ip_comp_handler: rec 0x7fffffa8d380 ->id 4628 id 4628 num -22 0
ip_comp_handler: resolution err -22 retry 2
[rdma_udapl_priv.c:640] error(262144): Cannot open IA
ip_comp_handler: ips_by_gid 0 rec 0x7fffffa8d380->id 4629
dapli_at_event_cb()
ip_comp_handler: rec 0x7fffffa8d380 ->id 4629 id 4629 num -22 0
ip_comp_handler: resolution err -22 retry 3
ip_comp_handler: ips_by_gid 0 rec 0x7fffffa8d380->id 4630
dapli_at_event_cb()
ip_comp_handler: rec 0x7fffffa8d380 ->id 4630 id 4630 num -22 0
ip_comp_handler: resolution err -22 retry 4
ip_comp_handler: ERR: at_rec 0x7fffffa8d380, id 4630 num -22
open_hca: ERR ib_at_ips_by_gid for mthca0
dapls_ib_open_hca failed 40000
dapl_ia_open () returns 0x40000
DAPL: Stopped (dapl_fini)
dapl_ib_release:
ib_thread_destroy(11337)
ib_thread_destroy: waiting for ib_thread
ib_thread(11337) EXIT
ib_thread_destroy(12016) exit
rank 0 in job 421 ro0_33361 caused collective abort of all ranks
exit status of rank 0: return code 1
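A note on reading the numbers in the log above, as a minimal sketch. It assumes the "-22" reported by ip_comp_handler is a negated Linux errno value (the log does not say so explicitly), and it only checks that the decimal error printed by rdma_udapl_priv.c is the same value that dapl_ia_open returned in hex. The variable names are made up for illustration.

```python
import errno
import os

# Values copied from the log above (names are hypothetical).
at_status = -22        # "ip_comp_handler: resolution err -22"
dat_return = 0x40000   # "dapl_ia_open () returns 0x40000"
mpi_error = 262144     # "error(262144): Cannot open IA"

# Assumption: the status is a negated errno, so negate it before lookup.
name = errno.errorcode[-at_status]
print(name, "-", os.strerror(-at_status))

# The decimal error from the MPI layer is the DAT return value in hex.
print(hex(mpi_error), mpi_error == dat_return)
```

If that assumption holds, the address resolution is failing with EINVAL on every retry, and the "Cannot open IA" error is just the failed dapl_ia_open return value passed up unchanged.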
Any idea what is going on?
Thanks.
Lei
----- Original Message -----
From: Roland Dreier <rolandd at cisco.com>
Date: Friday, October 21, 2005 7:48 pm
Subject: Re: [openib-general] uDAPL open HCA problem
> LEI> Hi, I'm from the same lab as Sayantan. Thanks for your
> LEI> suggestion. Currently we could not reproduce the problem,
> LEI> however, we meet another problem. When I try to tear down a
> LEI> connection between two nodes I often get some messages like
> LEI> this:
>
> LEI> [ 0] 005e0406 [ 4] 00000000 [ 8] 00000000 [ c] 00000000
> LEI> [10] 05f90000 [14] 00000000 [18] 00000008 [1c] fe100000
>
> That's OK, it's just showing that you polled a "work request flushed"
> status from a completion queue. The latest version of libmthca should
> no longer print these messages.
>
> - R.
>