[openib-general] uDAPL open HCA problem

LEI CHAI chai.15 at osu.edu
Fri Oct 21 18:14:16 PDT 2005


Hi,
Thank you very much for your reply. Now the open HCA problem has come back :-(

Here is the log output:

[chail at ro0] mpiexec -n 2 ./a.out
DAPL: NOT Setting Loopback
 dapl_ib_init:
 ib_thread_init(12016)
dapl_ia_open (ib0, 8, 0x7ffffff28668, 0xd9da48)
 open_hca: mthca0 - 0xdb3390
 ib_thread(12016,0x40200960): ENTER: pipe 8 at 4
 open_hca: Found dev mthca0 0002c902004002e8
 open_hca: GID subnet fe80000000000000 id 0002c902004002e9
 ips_by_gid: RET 0 at_rec 0x7ffffff283d0 -> id 2861
 dapli_at_event_cb()
 ip_comp_handler: rec 0x7ffffff283d0 ->id 2861 id 2861 num -22 3afa6000
 ip_comp_handler: resolution err -22 retry 1
 ip_comp_handler: ips_by_gid 0 rec 0x7ffffff283d0->id 2862
 dapli_at_event_cb()
 ip_comp_handler: rec 0x7ffffff283d0 ->id 2862 id 2862 num -22 0
 ip_comp_handler: resolution err -22 retry 2
 ip_comp_handler: ips_by_gid 0 rec 0x7ffffff283d0->id 2863
 dapli_at_event_cb()
 ip_comp_handler: rec 0x7ffffff283d0 ->id 2863 id 2863 num -22 0
 ip_comp_handler: resolution err -22 retry 3
 ip_comp_handler: ips_by_gid 0 rec 0x7ffffff283d0->id 2864
 dapli_at_event_cb()
 ip_comp_handler: rec 0x7ffffff283d0 ->id 2864 id 2864 num -22 0
 ip_comp_handler: resolution err -22 retry 4
 ip_comp_handler: ERR: at_rec 0x7ffffff283d0, id 2864 num -22
 open_hca: ERR ib_at_ips_by_gid for mthca0
dapls_ib_open_hca failed 40000
dapl_ia_open () returns 0x40000
DAPL: Stopped (dapl_fini)
 dapl_ib_release:
 ib_thread_destroy(12016)
 ib_thread_destroy: waiting for ib_thread
 ib_thread(12016) EXIT
[rdma_udapl_priv.c:640] error(262144): Cannot open IA
DAPL: NOT Setting Loopback
 dapl_ib_init:
 ib_thread_init(11337)
dapl_ia_open (ib0, 8, 0x7fffffa8d618, 0xd9da48)
 open_hca: mthca0 - 0xdb3390
 ib_thread(11337,0x40800960): ENTER: pipe 8 at 4
 open_hca: Found dev mthca0 0002c90200400314
 open_hca: GID subnet fe80000000000000 id 0002c90200400315
 ips_by_gid: RET 0 at_rec 0x7fffffa8d380 -> id 4627
 dapli_at_event_cb()
 ip_comp_handler: rec 0x7fffffa8d380 ->id 4627 id 4627 num -22 3c66c000
 ip_comp_handler: resolution err -22 retry 1
 ip_comp_handler: ips_by_gid 0 rec 0x7fffffa8d380->id 4628
 dapli_at_event_cb()
 ip_comp_handler: rec 0x7fffffa8d380 ->id 4628 id 4628 num -22 0
 ip_comp_handler: resolution err -22 retry 2
[rdma_udapl_priv.c:640] error(262144): Cannot open IA
 ip_comp_handler: ips_by_gid 0 rec 0x7fffffa8d380->id 4629
 dapli_at_event_cb()
 ip_comp_handler: rec 0x7fffffa8d380 ->id 4629 id 4629 num -22 0
 ip_comp_handler: resolution err -22 retry 3
 ip_comp_handler: ips_by_gid 0 rec 0x7fffffa8d380->id 4630
 dapli_at_event_cb()
 ip_comp_handler: rec 0x7fffffa8d380 ->id 4630 id 4630 num -22 0
 ip_comp_handler: resolution err -22 retry 4
 ip_comp_handler: ERR: at_rec 0x7fffffa8d380, id 4630 num -22
 open_hca: ERR ib_at_ips_by_gid for mthca0
dapls_ib_open_hca failed 40000
dapl_ia_open () returns 0x40000
DAPL: Stopped (dapl_fini)
 dapl_ib_release:
 ib_thread_destroy(11337)
 ib_thread_destroy: waiting for ib_thread
 ib_thread(11337) EXIT
 ib_thread_destroy(12016) exit
rank 0 in job 421  ro0_33361   caused collective abort of all ranks
  exit status of rank 0: return code 1
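
A quick cross-check of the numeric codes in the log above. This is only a
sketch in Python and was not part of the original run. It assumes the negative
"num" value printed by ip_comp_handler is a negated Linux errno, which the log
itself does not confirm. The names decode_errno, dapl_status and mpi_error are
made up for illustration.

```python
import errno
import os

# Values copied from the log: "dapl_ia_open () returns 0x40000" and
# "error(262144): Cannot open IA".
dapl_status = 0x40000
mpi_error = 262144

# "num -22" from ip_comp_handler. ASSUMPTION: this is a negated errno.
at_num = -22


def decode_errno(num):
    """Return (symbolic name, message) for a negated errno value."""
    code = -num
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


# Both lines report the same status, once in hex and once in decimal.
same_status = dapl_status == mpi_error
name, message = decode_errno(at_num)
print(hex(mpi_error), same_status, name, message)
```

If the errno assumption holds, -22 corresponds to EINVAL (invalid argument),
and 262144 is the same value as the 0x40000 returned by dapl_ia_open.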

Any idea what is going on?

Thanks.
Lei




----- Original Message -----
From: Roland Dreier <rolandd at cisco.com>
Date: Friday, October 21, 2005 7:48 pm
Subject: Re: [openib-general] uDAPL open HCA problem

>    LEI> Hi, I'm from the same lab as Sayantan. Thanks for your
>    LEI> suggestion. Currently we could not reproduce the problem,
>    LEI> however, we meet another problem.  When I try to tear down a
>    LEI> connection between two nodes I often get some messages like
>    LEI> this:
> 
>    LEI>   [ 0] 005e0406 [ 4] 00000000 [ 8] 00000000 [ c] 00000000
>    LEI> [10] 05f90000 [14] 00000000 [18] 00000008 [1c] fe100000
> 
> That's OK, it's just showing that you polled a "work request flushed"
> status from a completion queue.  The latest version of libmthca should
> no longer print these messages.
> 
> - R.
> 



