[ofa-general] multi-rail subnet_prefix and MPI issue

Frank Leers Frank.Leers at Sun.COM
Wed Jun 3 14:35:54 PDT 2009


Thanks for the reply Hal,


On Jun 3, 2009, at 1:52 PM, Hal Rosenstock wrote:

> On Wed, Jun 3, 2009 at 4:10 PM, Frank Leers <Frank.Leers at sun.com>  
> wrote:
>> Hi,
>>
>> I have a question regarding unique subnet prefixes for a 2-rail QDR
>> configuration.  I'm using connectX on CentOS 5.3 with the distro- 
>> supplied
>> OFED 1.3.2, kernel 2.6.18-128.el5, opensm-3.2.2-3.el5.
>>
>> Any insight appreciated.
>>
>> -frank
>>
>>
>>
>>
>> # ibstat
>> CA 'mlx4_0'
>>        CA type: MT26428
>>        Number of ports: 2
>>        Firmware version: 2.6.0
>>        Hardware version: a0
>>        Node GUID: 0x0003ba000100fc04
>>        System image GUID: 0x0003ba000100fc07
>>        Port 1:
>>                State: Active
>>                Physical state: LinkUp
>>                Rate: 40
>>                Base lid: 1
>>                LMC: 0
>>                SM lid: 1
>>                Capability mask: 0x0251086a
>>                Port GUID: 0x0003ba000100fc05
>>        Port 2:
>>                State: Active
>>                Physical state: LinkUp
>>                Rate: 10
>>                Base lid: 18
>>                LMC: 0
>>                SM lid: 18
>>                Capability mask: 0x0251086a
>>                Port GUID: 0x0003ba000100fc06
>>
>> ...and specify the unique subnet prefixes in each SM instance's  
>> config file
>> like this:
>>
>> # grep prefix /etc/ofed/opensm-ib0.conf
>> OPTIONS="$OPTIONS subnet_prefix 0xfe80000000000010"
>>
>> # grep prefix /etc/ofed/opensm-ib1.conf
>> OPTIONS="$OPTIONS subnet_prefix 0xfe80000000000020"
>>
>>
>> Which results in these instances running:
>>
>> # ps -ef|grep opensm
>> root      1498     1  0 05:16 ?        00:00:00 /usr/sbin/opensm -B  
>> -maxsmps
>> 0 -f /var/log/opensm-ib0.log -p 15 -g 0x0003ba000100fc05  
>> subnet_prefix
>> 0xfe80000000000010
>> root      2450     1  0 05:37 ?        00:00:00 /usr/sbin/opensm -B  
>> -maxsmps
>> 0 -f /var/log/opensm-ib1.log -p 15 -g 0x0003ba000100fc06  
>> subnet_prefix
>> 0xfe80000000000020
>
> I'm unaware of being able to set the subnet prefix as you indicate. It
> needs to be set in the opensm config file.

That's what I thought I was doing.  The only reference I can find for  
this option is in the Open MPI FAQ; there is no default or uncommented  
value in opensm.conf

Do you happen to know what the correct value is?  I have:

OPTIONS="$OPTIONS subnet_prefix 0xfe80000000000010"

...and there isn't an uncommented option already there to learn from.
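For what it's worth, my understanding (hedged -- I haven't confirmed this on opensm 3.2.2) is that opensm expects the prefix as a bare key/value line in its own options file (the cached options file, e.g. /var/cache/opensm/opensm.opts on some versions; the exact path varies), not appended to the init script's OPTIONS string. Something along these lines per SM instance:

```
# opensm options file for the rail-0 SM instance
# (path is version-dependent; this is a sketch, not a verified config)
subnet_prefix 0xfe80000000000010
```

The second instance would carry subnet_prefix 0xfe80000000000020 in its own file.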



>
>
> You can double check the subnet prefix on the two ports by using
> smpquery portinfo if you have infiniband diags installed.

Thanks for that clue...it reveals that what I am doing is not  
affecting anything, yet opensm happily runs with this as an option.


# smpquery portinfo 1 1
# Port info: Lid 1 port 1
Mkey:............................0x0000000000000000
GidPrefix:.......................0xfe80000000000000

<snip>



>
>
> -- Hal
>
>> Yet we are seeing complaints from Open MPI as such (single rail works  
>> fine):
>>
>> $ mpirun -np 2 -machinefile ~/myduals ./IMB-MPI1
>> --------------------------------------------------------------------------
>> WARNING: There are more than one active ports on host 'n0051', but  
>> the
>> default subnet GID prefix was detected on more than one of these
>> ports.  If these ports are connected to different physical IB
>> networks, this configuration will fail in Open MPI.  This version of
>> Open MPI requires that every physically separate IB subnet that is
>> used between connected MPI processes must have different subnet ID
>> values.
>>
>> Please see this FAQ entry for more details:
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
>>
>> NOTE: You can turn off this warning by setting the MCA parameter
>>    btl_openib_warn_default_gid_prefix to 0.
>> --------------------------------------------------------------------------
>> [[1372,1],0][btl_openib_component.c:2827:handle_wc] from n0051 to:  
>> n0052
>> error polling LP CQ with status RETRY EXCEEDED ERROR status number  
>> 12 for
>> wr_id 197710080 opcode 0 qp_idx 0
>> --------------------------------------------------------------------------
>> The InfiniBand retry count between two MPI processes has been
>> exceeded.  "Retry count" is defined in the InfiniBand spec 1.2
>> (section 12.7.38):
>>
>>  The total number of times that the sender wishes the receiver to
>>  retry timeout, packet sequence, etc. errors before posting a
>>  completion error.
>>
>> This error typically means that there is something awry within the
>> InfiniBand fabric itself.  You should note the hosts on which this
>> error has occurred; it has been observed that rebooting or removing a
>> particular host from the job can sometimes resolve this issue.
>> Two MCA parameters can be used to control Open MPI's behavior with
>> respect to the retry count:
>>
>> * btl_openib_ib_retry_count - The number of times the sender will
>> attempt to retry (defaulted to 7, the maximum value).
>> * btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
>> to 10).  The actual timeout value used is calculated as:
>>
>>   4.096 microseconds * (2^btl_openib_ib_timeout)
>>
>> See the InfiniBand spec 1.2 (section 12.7.34) for more details.
>>
>> Below is some information about the host that raised the error and  
>> the
>> peer to which it was connected:
>>
>> Local host:   n0051
>> Local device: mlx4_0
>> Peer host:    n0052
>>
>> You may need to consult with your system administrator to get this
>> problem fixed.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 0 with PID 9242 on
>> node n0051 exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> [admin1:03239] 1 more process has sent help message
>> help-mpi-btl-openib.txt / default subnet prefix
>> [admin1:03239] Set MCA parameter "orte_base_help_aggregate" to 0 to  
>> see all
>> help / error messages
>>
>>
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> general mailing list
>> general at lists.openfabrics.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>>
>> To unsubscribe, please visit
>> http://openib.org/mailman/listinfo/openib-general
>>
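As an aside, the local ACK timeout formula quoted in the Open MPI message above (4.096 microseconds * 2^btl_openib_ib_timeout) is easy to sanity-check; a quick sketch in Python, using only the defaults named in the quoted text:

```python
def ib_ack_timeout_us(exponent: int) -> float:
    """Local ACK timeout per IB spec 1.2 section 12.7.34:
    4.096 microseconds * 2**exponent."""
    return 4.096 * (2 ** exponent)

# Open MPI's default btl_openib_ib_timeout of 10 gives ~4.2 ms per attempt;
# with btl_openib_ib_retry_count at its maximum of 7, several such waits
# elapse before RETRY EXCEEDED ERROR is reported.
print(ib_ack_timeout_us(10))  # 4194.304 microseconds
```

So the errors above surface only after multiple multi-millisecond retry windows have expired, which points at the fabric/addressing rather than a transient drop.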



