[ofa-general] multi-rail subnet_prefix and MPI issue

Hal Rosenstock hal.rosenstock at gmail.com
Wed Jun 3 14:42:40 PDT 2009


On Wed, Jun 3, 2009 at 5:35 PM, Frank Leers <Frank.Leers at sun.com> wrote:
> Thanks for the reply Hal,
>
>
> On Jun 3, 2009, at 1:52 PM, Hal Rosenstock wrote:
>
>> On Wed, Jun 3, 2009 at 4:10 PM, Frank Leers <Frank.Leers at sun.com> wrote:
>>>
>>> Hi,
>>>
>>> I have a question regarding unique subnet prefixes for a 2-rail QDR
>>> configuration.  I'm using connectX on CentOS 5.3 with the distro-supplied
>>> OFED 1.3.2, kernel 2.6.18-128.el5, opensm-3.2.2-3.el5.
>>>
>>> Any insight appreciated.
>>>
>>> -frank
>>>
>>>
>>>
>>>
>>> # ibstat
>>> CA 'mlx4_0'
>>>       CA type: MT26428
>>>       Number of ports: 2
>>>       Firmware version: 2.6.0
>>>       Hardware version: a0
>>>       Node GUID: 0x0003ba000100fc04
>>>       System image GUID: 0x0003ba000100fc07
>>>       Port 1:
>>>               State: Active
>>>               Physical state: LinkUp
>>>               Rate: 40
>>>               Base lid: 1
>>>               LMC: 0
>>>               SM lid: 1
>>>               Capability mask: 0x0251086a
>>>               Port GUID: 0x0003ba000100fc05
>>>       Port 2:
>>>               State: Active
>>>               Physical state: LinkUp
>>>               Rate: 10
>>>               Base lid: 18
>>>               LMC: 0
>>>               SM lid: 18
>>>               Capability mask: 0x0251086a
>>>               Port GUID: 0x0003ba000100fc06
>>>
>>> ...and specify the unique subnet prefixes in each SM instance's config
>>> file
>>> like this:
>>>
>>> # grep prefix /etc/ofed/opensm-ib0.conf
>>> OPTIONS="$OPTIONS subnet_prefix 0xfe80000000000010"
>>>
>>> # grep prefix /etc/ofed/opensm-ib1.conf
>>> OPTIONS="$OPTIONS subnet_prefix 0xfe80000000000020"
>>>
>>>
>>> Which results in these instances running:
>>>
>>> # ps -ef|grep opensm
>>> root      1498     1  0 05:16 ?        00:00:00 /usr/sbin/opensm -B -maxsmps 0 -f /var/log/opensm-ib0.log -p 15 -g 0x0003ba000100fc05 subnet_prefix 0xfe80000000000010
>>> root      2450     1  0 05:37 ?        00:00:00 /usr/sbin/opensm -B -maxsmps 0 -f /var/log/opensm-ib1.log -p 15 -g 0x0003ba000100fc06 subnet_prefix 0xfe80000000000020
>>
>> I'm not aware of any way to set the subnet prefix the way you
>> indicate. It needs to be set in the opensm config file.
>
> That's what I thought I was doing.  The only reference that I can find
> for this option is in the Open MPI FAQ; there is no default or
> uncommented value in opensm.conf.
>
> Do you happen to know what the correct value is?  I have:
>
> OPTIONS="$OPTIONS subnet_prefix 0xfe80000000000010"
>
> ...and there isn't an uncommented option already there to learn from.

Remove those subnet_prefix entries from the OPTIONS lines.

Look at the opensm man page. Generate a config file using opensm -c,
then copy it to two different config files, opensm.config.ib0 and
opensm.config.ib1. Edit each file to set the appropriate subnet prefix,
and then add --config <path to those config files>/opensm.config.ib0
(or .ib1, depending on the port) to the invocation.
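
For example (just a sketch reusing the port GUIDs from your ps output;
the /etc/ofed path is only a guess at where you keep these files), each
edited config file would carry one line like:

    # opensm.config.ib0
    subnet_prefix 0xfe80000000000010

    # opensm.config.ib1
    subnet_prefix 0xfe80000000000020

and the two instances would then be started along the lines of (keeping
whatever other flags you already pass, minus the subnet_prefix args):

    /usr/sbin/opensm -B -g 0x0003ba000100fc05 --config /etc/ofed/opensm.config.ib0
    /usr/sbin/opensm -B -g 0x0003ba000100fc06 --config /etc/ofed/opensm.config.ib1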

If it's done correctly, you should see the new prefixes reflected in
smpquery portinfo, and the warning from MPI should go away.
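e.g., something like this for the first port once the ib0 prefix has
taken effect (same output format as your smpquery run below):

    # smpquery portinfo 1 1 | grep GidPrefix
    GidPrefix:.......................0xfe80000000000010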

-- Hal
>
>>
>>
>> You can double-check the subnet prefix on the two ports by using
>> smpquery portinfo if you have the InfiniBand diags installed.
>
> Thanks for that clue... it reveals that what I am doing is not
> affecting anything, yet opensm happily runs with this as an option.
>
>
> # smpquery portinfo 1 1
> # Port info: Lid 1 port 1
> Mkey:............................0x0000000000000000
> GidPrefix:.......................0xfe80000000000000
>
> <snip>
>
>
>
>>
>>
>> -- Hal
>>
>>> Yet we are seeing complaints from Open MPI like this (single-rail
>>> runs work fine):
>>>
>>> $ mpirun -np 2 -machinefile ~/myduals ./IMB-MPI1
>>>
>>> --------------------------------------------------------------------------
>>> WARNING: There are more than one active ports on host 'n0051', but the
>>> default subnet GID prefix was detected on more than one of these
>>> ports.  If these ports are connected to different physical IB
>>> networks, this configuration will fail in Open MPI.  This version of
>>> Open MPI requires that every physically separate IB subnet that is
>>> used between connected MPI processes must have different subnet ID
>>> values.
>>>
>>> Please see this FAQ entry for more details:
>>>
>>> http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
>>>
>>> NOTE: You can turn off this warning by setting the MCA parameter
>>>   btl_openib_warn_default_gid_prefix to 0.
>>>
>>> --------------------------------------------------------------------------
>>> [[1372,1],0][btl_openib_component.c:2827:handle_wc] from n0051 to: n0052 error polling LP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 197710080 opcode 0 qp_idx 0
>>>
>>> --------------------------------------------------------------------------
>>> The InfiniBand retry count between two MPI processes has been
>>> exceeded.  "Retry count" is defined in the InfiniBand spec 1.2
>>> (section 12.7.38):
>>>
>>>  The total number of times that the sender wishes the receiver to
>>>  retry timeout, packet sequence, etc. errors before posting a
>>>  completion error.
>>>
>>> This error typically means that there is something awry within the
>>> InfiniBand fabric itself.  You should note the hosts on which this
>>> error has occurred; it has been observed that rebooting or removing a
>>> particular host from the job can sometimes resolve this issue.
>>> Two MCA parameters can be used to control Open MPI's behavior with
>>> respect to the retry count:
>>>
>>> * btl_openib_ib_retry_count - The number of times the sender will
>>> attempt to retry (defaulted to 7, the maximum value).
>>> * btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
>>> to 10).  The actual timeout value used is calculated as:
>>>
>>>  4.096 microseconds * (2^btl_openib_ib_timeout)
>>>
>>> See the InfiniBand spec 1.2 (section 12.7.34) for more details.
>>>
>>> Below is some information about the host that raised the error and the
>>> peer to which it was connected:
>>>
>>> Local host:   n0051
>>> Local device: mlx4_0
>>> Peer host:    n0052
>>>
>>> You may need to consult with your system administrator to get this
>>> problem fixed.
>>>
>>> --------------------------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>> mpirun has exited due to process rank 0 with PID 9242 on
>>> node n0051 exiting without calling "finalize". This may
>>> have caused other processes in the application to be
>>> terminated by signals sent by mpirun (as reported here).
>>>
>>> --------------------------------------------------------------------------
>>> [admin1:03239] 1 more process has sent help message help-mpi-btl-openib.txt / default subnet prefix
>>> [admin1:03239] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>>>
>>>
>
>


