[ofa-general] Re: [PATCH] opensm/osm_node_info_rcv.c: create physp for the newly discovered port of the known node
Yevgeny Kliteynik
kliteyn at dev.mellanox.co.il
Wed Feb 18 13:26:52 PST 2009
Hi Sasha,
Sasha Khapyorsky wrote:
>>> On 14:41 Tue 17 Feb , Yevgeny Kliteynik wrote:
>>>> This patch fixes bugzilla issue #1515:
>>>>
>>>> Topology:
>>>> |---------------|
>>>> | SW2 |
>>>> |---------------|
>>>> |x |y |z |v
>>>> |----| | | |----|
>>>> | | | |
>>>> | |----| |----| |
>>>> | | | |
>>>> a| b| c| d|
>>>> |---------------| |---------------|
>>>> | SW1 | | SW3 |
>>>> |---------------| |---------------|
>>>> | |
>>>> | |
>>>> HCA with SM HCA
>>>>
>>>> During the discovery:
>>>>
>>>> SM sends NodeInfo request to SW1
>>>> SM sends NodeInfo request to SW2 through link a->x
>>>> SM discovers new node SW2:
>>>> - updates DR to SW2 to go through link a->x
>>>> - creates physp x
>>> And requests SwitchInfo from SW2, and on response sends PortInfo to all
>>> switch ports. PortInfo receiver will initialize all switch ports. Isn't
>>> it?
>> Links are created only by getting NodeInfo response. W/o the
>> fix, when SW1 gets NodeInfo from SW2 through link b->y, it
>> doesn't initialize physp for y, hence the link can't be created.
>> So the only chance for the link to be created is when
>> SW2 will send NodeInfo request to SW1 through link y->b.
>> But this isn't happening, because DR for SW2 is updated
>> to contain this link, so SM doesn't probe the remote side
>> of y to avoid loop.
>
> Ok, so whole story should be caused by race between SW2 SwitchInfo
> receiving (using a->x) and SW2 NodeInfo (using b->y). As far as I can
> see only in this case SW2 port 0 path will be altered (and PortInfo will
> be requested using new path). Right?
Right.
>> BTW, thing happens with every other link that connects
>> same nodes. In the example above, link v<->d will be
>> missing as well.
>
> Hmm, I was not able to reproduce this using two switch setup. But if it
> is resulted by race it also should not be 100% reproducible.
Right again. Discovery shouldn't rely on the order of packets
that it receives. I guess that on real hardware the packets
are handled serially, so we need some more complex example
for higher probability of this race.
I see the problem on the simple example using the simulator
(ibmgtsim), which has several threads handling the packets,
so the chances for OOO packets are much higher.
-- Yevgeny
> Basically I'm not against proposed physp initialization, but want to
> understand the problem better.
>
> Sasha
>
>> -- Yevgeny
>>
>>> Sasha
>>>> SM sends NodeInfo request to SW2 through link b->y
>>>> SM discovers a known node SW2
>>>> - DOES NOT create physp y
>>>> - updates DR to SW2 to go through link b->y
>>>>
>>>> From now on, the DR to SW2 is going through port y, so OpenSM won't deal
>>>> with
>>>> port y any more, leaving it uninitialized (no physp object for this
>>>> port).
>>>>
>>>> The fix is to create physp for the newly discovered port of the known
>>>> switch node, same way as it is done for HCAs.
>>>> I also added one log message for the case that showed the problem - when
>>>> one of the link sides is uninitialized (no valid ports check). Perhaps
>>>> this log message should be an error message instead?
>>>>
>>>> Signed-off-by: Yevgeny Kliteynik <kliteyn at dev.mellanox.co.il>
>>>> ---
>>>> opensm/opensm/osm_node_info_rcv.c | 24 +++++++++++++++++++++++-
>>>> 1 files changed, 23 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/opensm/opensm/osm_node_info_rcv.c
>>>> b/opensm/opensm/osm_node_info_rcv.c
>>>> index c52c0d5..7da3103 100644
>>>> --- a/opensm/opensm/osm_node_info_rcv.c
>>>> +++ b/opensm/opensm/osm_node_info_rcv.c
>>>> @@ -164,8 +164,12 @@ __osm_ni_rcv_set_links(IN osm_sm_t * sm,
>>>> */
>>>> if (!osm_node_link_has_valid_ports(p_node, port_num,
>>>> p_neighbor_node,
>>>> - p_ni_context->port_num))
>>>> + p_ni_context->port_num)) {
>>>> + OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
>>>> + "Link at node 0x%" PRIx64 ", port %u - no valid ports\n",
>>>> + cl_ntoh64(osm_node_get_node_guid(p_node)), port_num);
>>>> goto _exit;
>>>> + }
>>>>
>>>> if (osm_node_link_exists(p_node, port_num,
>>>> p_neighbor_node, p_ni_context->port_num)) {
>>>> @@ -537,8 +541,26 @@ __osm_ni_rcv_process_existing_switch(IN osm_sm_t *
>>>> sm,
>>>> IN osm_node_t * const p_node,
>>>> IN const osm_madw_t * const p_madw)
>>>> {
>>>> +
>>>> + ib_smp_t *p_smp;
>>>> + ib_node_info_t *p_ni;
>>>> + uint8_t port_num;
>>>> +
>>>> OSM_LOG_ENTER(sm->p_log);
>>>>
>>>> + p_smp = osm_madw_get_smp_ptr(p_madw);
>>>> + p_ni = (ib_node_info_t *) ib_smp_get_payload_ptr(p_smp);
>>>> + port_num = ib_node_info_get_local_port_num(p_ni);
>>>> +
>>>> + if (!osm_node_get_physp_ptr(p_node, port_num)) {
>>>> + OSM_LOG(sm->p_log, OSM_LOG_DEBUG,
>>>> + "Creating physp for node GUID:0x%"
>>>> + PRIx64 ", port %u\n",
>>>> + cl_ntoh64(osm_node_get_node_guid(p_node)),
>>>> + port_num);
>>>> + osm_node_init_physp(p_node, p_madw);
>>>> + }
>>>> +
>>>> /*
>>>> If this switch has already been probed during this sweep,
>>>> then don't bother reprobing it.
>>>> --
>>>> 1.5.1.4
>>>>
>
More information about the general
mailing list