[ofa-general] multi-rail subnet_prefix and MPI issue
Frank Leers
Frank.Leers at Sun.COM
Wed Jun 3 17:59:40 PDT 2009
On Jun 3, 2009, at 2:42 PM, Hal Rosenstock wrote:
> On Wed, Jun 3, 2009 at 5:35 PM, Frank Leers <Frank.Leers at sun.com>
> wrote:
>> Thanks for the reply Hal,
>>
>>
>> On Jun 3, 2009, at 1:52 PM, Hal Rosenstock wrote:
>>
>>> On Wed, Jun 3, 2009 at 4:10 PM, Frank Leers <Frank.Leers at sun.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I have a question regarding unique subnet prefixes for a 2-rail QDR
>>>> configuration. I'm using ConnectX on CentOS 5.3 with the
>>>> distro-supplied OFED 1.3.2, kernel 2.6.18-128.el5, opensm-3.2.2-3.el5.
>>>>
>>>> Any insight appreciated.
>>>>
>>>> -frank
>>>>
>>>>
>>>>
>>>>
>>>> # ibstat
>>>> CA 'mlx4_0'
>>>>         CA type: MT26428
>>>>         Number of ports: 2
>>>>         Firmware version: 2.6.0
>>>>         Hardware version: a0
>>>>         Node GUID: 0x0003ba000100fc04
>>>>         System image GUID: 0x0003ba000100fc07
>>>>         Port 1:
>>>>                 State: Active
>>>>                 Physical state: LinkUp
>>>>                 Rate: 40
>>>>                 Base lid: 1
>>>>                 LMC: 0
>>>>                 SM lid: 1
>>>>                 Capability mask: 0x0251086a
>>>>                 Port GUID: 0x0003ba000100fc05
>>>>         Port 2:
>>>>                 State: Active
>>>>                 Physical state: LinkUp
>>>>                 Rate: 10
>>>>                 Base lid: 18
>>>>                 LMC: 0
>>>>                 SM lid: 18
>>>>                 Capability mask: 0x0251086a
>>>>                 Port GUID: 0x0003ba000100fc06
>>>>
>>>> ...and specify the unique subnet prefixes in each SM instance's
>>>> config file like this:
>>>>
>>>> # grep prefix /etc/ofed/opensm-ib0.conf
>>>> OPTIONS="$OPTIONS subnet_prefix 0xfe80000000000010"
>>>>
>>>> # grep prefix /etc/ofed/opensm-ib1.conf
>>>> OPTIONS="$OPTIONS subnet_prefix 0xfe80000000000020"
>>>>
>>>>
>>>> Which results in these instances running:
>>>>
>>>> # ps -ef|grep opensm
>>>> root      1498     1  0 05:16 ?        00:00:00 /usr/sbin/opensm -B -maxsmps 0 -f /var/log/opensm-ib0.log -p 15 -g 0x0003ba000100fc05 subnet_prefix 0xfe80000000000010
>>>> root      2450     1  0 05:37 ?        00:00:00 /usr/sbin/opensm -B -maxsmps 0 -f /var/log/opensm-ib1.log -p 15 -g 0x0003ba000100fc06 subnet_prefix 0xfe80000000000020
>>>
>>> I'm unaware of being able to set the subnet prefix as you
>>> indicate. It
>>> needs to be set in the opensm config file.
>>
>> That's what I thought I was doing. The only reference I can find for
>> this option is in the Open MPI FAQ; there is no default or uncommented
>> value in opensm.conf.
>>
>> Do you happen to know what the correct value is? I have:
>>
>> OPTIONS="$OPTIONS subnet_prefix 0xfe80000000000010"
>>
>> ...and there isn't an uncommented option already there to learn from.
>
> Remove those subnet_prefix options from there.
>
> Look at the opensm man page. Generate a config file using opensm -c,
> then copy it to two different config files, opensm.config.ib0 and
> opensm.config.ib1. Edit those files with the appropriate subnet
> prefixes, and then add --config <path to those config files>/opensm.config.ib0
> (or .ib1), depending on the port.
>
> If it's done correctly, you should see the results in smpquery
> portinfo and not see that error message from MPI.
This did get me past the MPI issue, thanks!
>
>
> -- Hal
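
Spelled out, that suggestion might look roughly like the following, reusing
the prefixes and port GUIDs from the ibstat output above (the paths are
illustrative and the exact flag syntax can vary between opensm releases;
check the man page):

# generate a default config file, then make one copy per rail
opensm -c /etc/ofed/opensm.config        # flag/path per your opensm release
cp /etc/ofed/opensm.config /etc/ofed/opensm.config.ib0
cp /etc/ofed/opensm.config /etc/ofed/opensm.config.ib1

# edit each copy so the two rails get distinct prefixes, e.g.
#   opensm.config.ib0:  subnet_prefix 0xfe80000000000010
#   opensm.config.ib1:  subnet_prefix 0xfe80000000000020

# start one SM per port, each bound to its port GUID and its own config file
/usr/sbin/opensm -B -g 0x0003ba000100fc05 --config /etc/ofed/opensm.config.ib0 -f /var/log/opensm-ib0.log
/usr/sbin/opensm -B -g 0x0003ba000100fc06 --config /etc/ofed/opensm.config.ib1 -f /var/log/opensm-ib1.log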
>>
>>>
>>>
>>> You can double-check the subnet prefix on the two ports by using
>>> smpquery portinfo if you have the InfiniBand diags installed.
>>
>> Thanks for that clue... it reveals that what I am doing is not
>> affecting anything, yet opensm happily runs with this as an option.
>>
>>
>> # smpquery portinfo 1 1
>> # Port info: Lid 1 port 1
>> Mkey:............................0x0000000000000000
>> GidPrefix:.......................0xfe80000000000000
>>
>> <snip>
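
With the per-port config files described above in place, the same check
should report the configured prefixes instead of the default. Roughly,
using the LIDs and prefixes from this thread (the diags' -P option selects
the local HCA port for the second rail):

# smpquery portinfo 1 1
GidPrefix:.......................0xfe80000000000010
# smpquery -P 2 portinfo 18 2
GidPrefix:.......................0xfe80000000000020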
>>
>>
>>
>>>
>>>
>>> -- Hal
>>>
>>>> Yet we are seeing complaints from Open MPI as follows (single rail
>>>> works fine):
>>>>
>>>> $ mpirun -np 2 -machinefile ~/myduals ./IMB-MPI1
>>>>
>>>> --------------------------------------------------------------------------
>>>> WARNING: There are more than one active ports on host 'n0051',
>>>> but the
>>>> default subnet GID prefix was detected on more than one of these
>>>> ports. If these ports are connected to different physical IB
>>>> networks, this configuration will fail in Open MPI. This version
>>>> of
>>>> Open MPI requires that every physically separate IB subnet that is
>>>> used between connected MPI processes must have different subnet ID
>>>> values.
>>>>
>>>> Please see this FAQ entry for more details:
>>>>
>>>> http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
>>>>
>>>> NOTE: You can turn off this warning by setting the MCA parameter
>>>> btl_openib_warn_default_gid_prefix to 0.
>>>>
>>>> --------------------------------------------------------------------------
>>>> [[1372,1],0][btl_openib_component.c:2827:handle_wc] from n0051 to: n0052
>>>> error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for
>>>> wr_id 197710080 opcode 0 qp_idx 0
>>>>
>>>> --------------------------------------------------------------------------
>>>> The InfiniBand retry count between two MPI processes has been
>>>> exceeded. "Retry count" is defined in the InfiniBand spec 1.2
>>>> (section 12.7.38):
>>>>
>>>> The total number of times that the sender wishes the receiver to
>>>> retry timeout, packet sequence, etc. errors before posting a
>>>> completion error.
>>>>
>>>> This error typically means that there is something awry within the
>>>> InfiniBand fabric itself. You should note the hosts on which this
>>>> error has occurred; it has been observed that rebooting or
>>>> removing a
>>>> particular host from the job can sometimes resolve this issue.
>>>> Two MCA parameters can be used to control Open MPI's behavior with
>>>> respect to the retry count:
>>>>
>>>> * btl_openib_ib_retry_count - The number of times the sender will
>>>> attempt to retry (defaulted to 7, the maximum value).
>>>> * btl_openib_ib_timeout - The local ACK timeout parameter
>>>> (defaulted
>>>> to 10). The actual timeout value used is calculated as:
>>>>
>>>> 4.096 microseconds * (2^btl_openib_ib_timeout)
>>>>
>>>> See the InfiniBand spec 1.2 (section 12.7.34) for more details.
>>>>
>>>> Below is some information about the host that raised the error
>>>> and the
>>>> peer to which it was connected:
>>>>
>>>> Local host: n0051
>>>> Local device: mlx4_0
>>>> Peer host: n0052
>>>>
>>>> You may need to consult with your system administrator to get this
>>>> problem fixed.
>>>>
>>>> --------------------------------------------------------------------------
>>>>
>>>> --------------------------------------------------------------------------
>>>> mpirun has exited due to process rank 0 with PID 9242 on
>>>> node n0051 exiting without calling "finalize". This may
>>>> have caused other processes in the application to be
>>>> terminated by signals sent by mpirun (as reported here).
>>>>
>>>> --------------------------------------------------------------------------
>>>> [admin1:03239] 1 more process has sent help message help-mpi-btl-openib.txt / default subnet prefix
>>>> [admin1:03239] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
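
In this thread the retry-exceeded error was a symptom of both rails sharing
the default subnet prefix rather than a genuine fabric fault, but for
reference the timeout formula above works out as below, and the MCA
parameters can be passed on the mpirun command line (the values shown are
only illustrative):

  btl_openib_ib_timeout = 10 (default):  4.096 us * 2^10  =  ~4.2 ms
  btl_openib_ib_timeout = 16:            4.096 us * 2^16  =  ~268 ms

$ mpirun --mca btl_openib_ib_timeout 16 --mca btl_openib_ib_retry_count 7 \
      -np 2 -machinefile ~/myduals ./IMB-MPI1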
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general