[ewg] ib_acme fails for requests with IPv4 addresses (ofed 3.5)

Jens Domke jens.domke at tu-dresden.de
Fri Mar 22 10:07:54 PDT 2013


Hello Sean,

That hint was the missing piece of the puzzle. Somehow the IP addresses were missing from some of the configuration files, and after deleting ibacm_addr.cfg I was able to run ibacm properly.
Thank you very much for the help.

Now I have another problem with 3 of the 18 nodes. When I run ib_acme, all 3 get the correct information for the other 15 nodes, and the other 15 also obtain the correct information for the 3. But if I run ib_acme among those 3 nodes, I get "Connection timed out".
On all three nodes, the command also works for 'localhost'.
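Sean's suggestion (quoted further down) of setting the source and destination to the same address is a per-node version of the 'localhost' check. The sketch below wraps it in two hypothetical helpers, node_ip and acm_self_test. It assumes node rcNNN owns 10.1.4.(100+NNN), which is what the rc006/10.1.4.106 and rc011/10.1.4.111 pairs in this thread suggest, and it uses only the ib_acme options that appear in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: per-node ibacm self-test with source == destination.
# ASSUMPTION (hypothetical mapping): host rcNNN owns 10.1.4.(100+NNN).

# Map a short hostname such as "rc011" to its assumed IPoIB address.
node_ip() {
    local num=${1#rc}              # strip the "rc" prefix, e.g. "011"
    # Force base 10 so "008" or "011" is not parsed as octal.
    echo "10.1.4.$((100 + 10#$num))"
}

# Resolve an address against itself and report ok/FAIL.
# In this thread ib_acme reported failures as text ("... failed: ...") and a
# -P run still printed "return status 0x0", so the output is inspected
# instead of relying on the exit code.
acm_self_test() {
    local addr=$1
    if ib_acme -f i -s "$addr" -d "$addr" -v 2>&1 | grep -q failed; then
        echo "$addr FAIL"
        return 1
    fi
    echo "$addr ok"
}
```

On a node, saving this as a script and running `acm_self_test "$(node_ip "$(hostname -s)")"` (for example through pdsh) would show whether each of rc002, rc006 and rc011 can at least resolve its own address.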

Here is the output:
====================================================================================
rc001 ~ $ pdsh -w rc0[00-17] 'for x in `seq 100 117`; do ib_acme -f i -d 10.1.4.${x} -v; done' | grep failed -B 1
rc002: Destination: 10.1.4.106
rc002: ib_acm_resolve_ip failed: Connection timed out
rc002: SA verification: failed Cannot assign requested address
--
rc011: Destination: 10.1.4.102
rc011: ib_acm_resolve_ip failed: Connection timed out
rc011: SA verification: failed Cannot assign requested address
--
rc006: Destination: 10.1.4.102
rc006: ib_acm_resolve_ip failed: Connection timed out
rc006: SA verification: failed Cannot assign requested address
--
rc002: Destination: 10.1.4.111
rc002: ib_acm_resolve_ip failed: Connection timed out
rc002: SA verification: failed Cannot assign requested address
--
rc011: Destination: 10.1.4.106
rc011: ib_acm_resolve_ip failed: Connection timed out
rc011: SA verification: failed Cannot assign requested address
--
rc006: Destination: 10.1.4.111
rc006: ib_acm_resolve_ip failed: Connection timed out
rc006: SA verification: failed Cannot assign requested address
====================================================================================
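The grep output above is easier to read as a list of failing source/destination pairs. The following sketch (failed_pairs is a hypothetical helper) assumes the pdsh "host: message" prefix shown above and pairs each "Destination:" line with the next line from the same host that contains "failed", so interleaved output from different nodes is handled.

```shell
# Sketch only: summarize failing "host -> destination" pairs from pdsh output.
failed_pairs() {
    awk -F': ' '
        # "rc002: Destination: 10.1.4.106" -> remember the destination per host
        $2 == "Destination" { dest[$1] = $3; next }
        # first "failed" line for that host -> print the pair once
        /failed/ && ($1 in dest) { print $1 " -> " dest[$1]; delete dest[$1] }
    ' | LC_ALL=C sort -u
}
```

Piping the full pdsh output through `failed_pairs` (instead of the grep) on the run above yields six pairs: rc002, rc006 and rc011 each fail against the other two and against no one else, so the problem looks pairwise among those three nodes rather than tied to a single node.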

Have you seen this type of problem before? In this case it should not be related to ibacm_addr.cfg, right?
Maybe it's a problem with the switch or the links; I will try some other switch ports tomorrow.

Attached is the log file of rc011 (10.1.4.111) trying to get the information for rc006 (10.1.4.106), in case you want to take a look.
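In the earlier rc003 problem (quoted below), the telling log line was "acm_get_ep: notice - could not find 10.0.0.52", which Sean read as a sign that the ibacm address information was wrong. A quick check for that signature in a log such as the attached one is sketched below; acm_missing_eps is a hypothetical helper that relies only on the line format visible in the quoted log excerpt.

```shell
# Sketch only: list addresses for which ibacm logged
# "acm_get_ep: notice - could not find <addr>".
acm_missing_eps() {
    local log=${1:-/var/log/ibacm.log}
    # Print only the captured address from matching lines, de-duplicated.
    sed -n 's/.*acm_get_ep: notice - could not find \(.*\)$/\1/p' "$log" | LC_ALL=C sort -u
}
```

An empty result for rc011 would be consistent with the guess that this timeout is not an ibacm_addr.cfg problem, though it does not rule out other causes.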

Regards,
Jens

PS: I have a second rail running on the second port of the HCAs with a similar setup, and I can run ib_acme for all 18 nodes on the second rail without trouble.


On Mar 22, 2013, at 6:37 AM, Hefty, Sean wrote:

> Note that you can test each node separately by making the source/destination addresses the same.  This may show that your first system, rc002, is working, but rc003 is not.
> 
>> On the second node, the ib_acme command fails only for IPs, too. But it returns
>> with a different message ('Cannot assign requested address'):
>> =================================================================================
>> rc003 ~/tmp/ibacm-1.0.7 $ ib_acme -f i -s 10.0.0.52 -d 10.0.0.51 -v -P -V
>> Service: localhost
>> Destination: 10.0.0.51
>> Source: 10.0.0.52
>> ib_acm_resolve_ip failed: Cannot assign requested address
>> SA verification: failed Cannot assign requested address
>> 
>> Error Count,Resolve Count,No Data,Addr Query Count,Addr Cache Count,Route Query Count,Route Cache Count
>> localhost,1,2,0,0,0,0,0
>> return status 0x0
>> 
>> rc003 ~/ $ cat /var/log/ibacm.log
>> ...
>> 1363872021.460: acm_svr_accept:
>> 1363872021.460: acm_svr_accept: assigned client 0
>> 1363872021.460: acm_server: receiving from client 0
>> 1363872021.460: acm_svr_receive: client 0
>> 1363872021.460: acm_svr_resolve_dest: client 0
>> 1363872021.460: acm_svr_resolve_dest: src  10.0.0.52
>> 1363872021.460: acm_get_ep: 10.0.0.52
>> 1363872021.460: acm_get_ep: notice - could not find 10.0.0.52
> 
> It doesn't appear that the ibacm address information is correct.  Having the complete log file may help.  The assigned address configuration would end up being near the top of the log file.
> 
> ibacm uses an address file, ibacm_addr.cfg, to assign address information to ports.  If this file is not present, it will be created.  It's a text file, and the format is hopefully straightforward to follow.  A couple of places to look for the file are:
> 
> /etc/rdma/ibacm_addr.cfg
> /usr/local/etc/rdma/ibacm_addr.cfg
> 
> If you find the file, the simplest thing to do may be to just remove it.  You can look at the existing file to see that the correct IP address has been assigned to the right port.
> 
> - Sean
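Sean's advice to check that the correct IP address has been assigned to the right port can be scripted. The sketch below (cfg_lookup is a hypothetical helper) assumes ibacm_addr.cfg holds whitespace-separated "address device port pkey" entries with "#" comment lines; check the header of your own file before relying on that.

```shell
# Sketch only: print the "device port pkey" fields assigned to one address.
# ASSUMPTION: entries look like "address device port pkey"; "#" starts a comment.
cfg_lookup() {
    local addr=$1 cfg=$2
    awk -v a="$addr" '
        /^[ \t]*#/ || NF == 0 { next }       # skip comments and blank lines
        $1 == a { print $2, $3, $4; found = 1 }
        END { exit found ? 0 : 1 }           # non-zero if the address is absent
    ' "$cfg"
}
```

For example, `for f in /etc/rdma/ibacm_addr.cfg /usr/local/etc/rdma/ibacm_addr.cfg; do [ -r "$f" ] && cfg_lookup 10.1.4.111 "$f"; done` shows which device and port each readable copy maps the address to, or returns non-zero if the address is missing.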
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ibacm.log
Type: application/octet-stream
Size: 8981 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/ewg/attachments/20130323/8b1feb52/attachment.obj>

