[ofa-general] multi-rail subnet_prefix and MPI issue
Frank Leers
Frank.Leers at Sun.COM
Wed Jun 3 14:35:54 PDT 2009
Thanks for the reply, Hal.
On Jun 3, 2009, at 1:52 PM, Hal Rosenstock wrote:
> On Wed, Jun 3, 2009 at 4:10 PM, Frank Leers <Frank.Leers at sun.com>
> wrote:
>> Hi,
>>
>> I have a question regarding unique subnet prefixes for a 2-rail QDR
>> configuration. I'm using ConnectX on CentOS 5.3 with the distro-supplied
>> OFED 1.3.2, kernel 2.6.18-128.el5, opensm-3.2.2-3.el5.
>>
>> Any insight appreciated.
>>
>> -frank
>>
>>
>>
>>
>> # ibstat
>> CA 'mlx4_0'
>> CA type: MT26428
>> Number of ports: 2
>> Firmware version: 2.6.0
>> Hardware version: a0
>> Node GUID: 0x0003ba000100fc04
>> System image GUID: 0x0003ba000100fc07
>> Port 1:
>> State: Active
>> Physical state: LinkUp
>> Rate: 40
>> Base lid: 1
>> LMC: 0
>> SM lid: 1
>> Capability mask: 0x0251086a
>> Port GUID: 0x0003ba000100fc05
>> Port 2:
>> State: Active
>> Physical state: LinkUp
>> Rate: 10
>> Base lid: 18
>> LMC: 0
>> SM lid: 18
>> Capability mask: 0x0251086a
>> Port GUID: 0x0003ba000100fc06
>>
>> ...and specify the unique subnet prefixes in each SM instance's config
>> file like this:
>>
>> # grep prefix /etc/ofed/opensm-ib0.conf
>> OPTIONS="$OPTIONS subnet_prefix 0xfe80000000000010"
>>
>> # grep prefix /etc/ofed/opensm-ib1.conf
>> OPTIONS="$OPTIONS subnet_prefix 0xfe80000000000020"
>>
>>
>> Which results in these instances running:
>>
>> # ps -ef|grep opensm
>> root 1498 1 0 05:16 ? 00:00:00 /usr/sbin/opensm -B -maxsmps 0
>> -f /var/log/opensm-ib0.log -p 15 -g 0x0003ba000100fc05
>> subnet_prefix 0xfe80000000000010
>> root 2450 1 0 05:37 ? 00:00:00 /usr/sbin/opensm -B -maxsmps 0
>> -f /var/log/opensm-ib1.log -p 15 -g 0x0003ba000100fc06
>> subnet_prefix 0xfe80000000000020
>
> I'm unaware of being able to set the subnet prefix as you indicate. It
> needs to be set in the opensm config file.
That's what I thought I was doing. The only reference I can find for
this option is in the Open MPI FAQ; there is no default or uncommented
value in opensm.conf.

Do you happen to know what the correct syntax is? I have:

OPTIONS="$OPTIONS subnet_prefix 0xfe80000000000010"

...and there isn't an uncommented option already in the file to learn from.
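For reference, opensm's own options file takes bare "name value" lines
rather than a shell-style OPTIONS= wrapper, so my untested guess at the
intended form, assuming the cached options file lives at
/var/cache/opensm/opensm.opts on this build (the path may differ), is a
line like:

subnet_prefix 0xfe80000000000010

...in each instance's own options file (each instance would presumably
need its own cache directory, e.g. via OSM_CACHE_DIR -- again, untested).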
>
>
> You can double check the subnet prefix on the two ports by using
> smpquery portinfo if you have infiniband diags installed.
Thanks for that clue... it reveals that what I am doing is not
affecting anything, yet opensm happily runs with this as an option.
# smpquery portinfo 1 1
# Port info: Lid 1 port 1
Mkey:............................0x0000000000000000
GidPrefix:.......................0xfe80000000000000
<snip>
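(The second rail can be checked the same way using its LID from the
ibstat output above, e.g.:

# smpquery portinfo 18 2

...to see what GidPrefix that port is actually carrying.)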
>
>
> -- Hal
>
>> Yet we are seeing complaints from Open MPI like the following (single
>> rail works fine):
>>
>> $ mpirun -np 2 -machinefile ~/myduals ./IMB-MPI1
>> --------------------------------------------------------------------------
>> WARNING: There are more than one active ports on host 'n0051', but the
>> default subnet GID prefix was detected on more than one of these
>> ports. If these ports are connected to different physical IB
>> networks, this configuration will fail in Open MPI. This version of
>> Open MPI requires that every physically separate IB subnet that is
>> used between connected MPI processes must have different subnet ID
>> values.
>>
>> Please see this FAQ entry for more details:
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
>>
>> NOTE: You can turn off this warning by setting the MCA parameter
>> btl_openib_warn_default_gid_prefix to 0.
>> --------------------------------------------------------------------------
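(As the warning text above says, the message by itself can be silenced
at launch time, e.g.:

$ mpirun --mca btl_openib_warn_default_gid_prefix 0 -np 2 -machinefile ~/myduals ./IMB-MPI1

...but that only hides the message; it doesn't give the two rails
distinct subnet IDs.)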
>> [[1372,1],0][btl_openib_component.c:2827:handle_wc] from n0051 to: n0052
>> error polling LP CQ with status RETRY EXCEEDED ERROR status number 12
>> for wr_id 197710080 opcode 0 qp_idx 0
>> --------------------------------------------------------------------------
>> The InfiniBand retry count between two MPI processes has been
>> exceeded. "Retry count" is defined in the InfiniBand spec 1.2
>> (section 12.7.38):
>>
>> The total number of times that the sender wishes the receiver to
>> retry timeout, packet sequence, etc. errors before posting a
>> completion error.
>>
>> This error typically means that there is something awry within the
>> InfiniBand fabric itself. You should note the hosts on which this
>> error has occurred; it has been observed that rebooting or removing a
>> particular host from the job can sometimes resolve this issue.
>>
>> Two MCA parameters can be used to control Open MPI's behavior with
>> respect to the retry count:
>>
>> * btl_openib_ib_retry_count - The number of times the sender will
>> attempt to retry (defaulted to 7, the maximum value).
>> * btl_openib_ib_timeout - The local ACK timeout parameter (defaulted
>> to 10). The actual timeout value used is calculated as:
>>
>> 4.096 microseconds * (2^btl_openib_ib_timeout)
>>
>> See the InfiniBand spec 1.2 (section 12.7.34) for more details.
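(A worked example of that timeout formula: the default
btl_openib_ib_timeout of 10 gives 4.096 us * 2^10, roughly 4.2 ms per
attempt, and a value of 14 would give roughly 67 ms. Both knobs can be
set at launch with something like:

$ mpirun --mca btl_openib_ib_retry_count 7 --mca btl_openib_ib_timeout 14 ...

...though here the retries look like a symptom of the subnet-prefix
problem above rather than of a slow fabric.)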
>>
>> Below is some information about the host that raised the error and the
>> peer to which it was connected:
>>
>> Local host: n0051
>> Local device: mlx4_0
>> Peer host: n0052
>>
>> You may need to consult with your system administrator to get this
>> problem fixed.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 0 with PID 9242 on
>> node n0051 exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> [admin1:03239] 1 more process has sent help message
>> help-mpi-btl-openib.txt / default subnet prefix
>> [admin1:03239] Set MCA parameter "orte_base_help_aggregate" to 0 to
>> see all help / error messages
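(That is, re-running with something like:

$ mpirun --mca orte_base_help_aggregate 0 -np 2 -machinefile ~/myduals ./IMB-MPI1

...would show each rank's copy of the help message instead of the
aggregated one.)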