[openib-general] [PATCH] Opensm - lid assignment issues

Troy Benjegerdes troy at scl.ameslab.gov
Sun Nov 13 13:45:44 PST 2005


Yael Kalka wrote:

>Hi Hal,
>
>During some Windows tests we've discovered that there is still another
>problem in the lid_mgr. The problem happened when 2 HCAs had the same
>lid - opensm entered an infinite loop.
>The following patch fixes this.
>
>Thanks,
>Yael
>
>Signed-off-by:  Yael Kalka <yael at mellanox.co.il>
>
>Index: opensm/osm_lid_mgr.c
>===================================================================
>--- opensm/osm_lid_mgr.c	(revision 4032)
>+++ opensm/osm_lid_mgr.c	(working copy)
>@@ -550,6 +550,9 @@ __osm_lid_mgr_init_sweep(
>       {
>               /* This port will use its local lid, and consume the entire required lid range.
>                  Thus we can skip that range. */
>+              /* If the disc_max_lid is greater than lid - we can skip right to it,
>+                 since we've done all necessary checks on the lids in between. */
>+              if (disc_max_lid > lid)
>         lid = disc_max_lid;
>       }
>     }
>@@ -593,7 +596,14 @@ __osm_lid_mgr_init_sweep(
>   {
>     p_range =
>       (osm_lid_mgr_range_t *)cl_malloc(sizeof(osm_lid_mgr_range_t));
>-    p_range->min_lid = 1;
>+    /* 
>+       The p_range can be NULL in one of 2 cases:
>+       1. If max_defined_lid == 0. In this case, we want the entire range.
>+       2. If all lids discovered in the loop were mapped. In this case
>+          no free range exists, and we want to define it after the last 
>+          mapped lid.
>+    */
>+    p_range->min_lid = lid;
>   }
>   p_range->max_lid = p_mgr->p_subn->max_unicast_lid_ho - 1;
>   cl_qlist_insert_tail( &p_mgr->free_ranges, &p_range->item );
>
The opensm on the show floor is showing the following in oprofile:

with a unit mask of 0x01 (mandatory) count 100000
samples  %        app name                 symbol name
5970354  51.7020  libpthread-2.3.4.so      pthread_cond_timedwait@@GLIBC_2.3.2
5037621  43.6247  libosmcomp.so.1.0.0      __cl_timer_prov_cb
66241     0.5736  libosmcomp.so.1.0.0      anonymous symbol from section .plt
55929     0.4843  oprofiled                (no symbols)
49918     0.4323  opensm                   __osm_ucast_mgr_process_neighbors
39585     0.3428  vmlinux                  hpet_readl
25333     0.2194  oprofile                 (no symbols)
22734     0.1969  opreport                 (no symbols)
14724     0.1275  libcrypto.so.0.9.7a      (no symbols)
14296     0.1238  libc-2.3.4.so            __tzfile_compute
13901     0.1204  vmlinux                  __copy_to_user_ll


Is this the same infinite loop that the patch above fixes?
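
For reference, the first hunk of the quoted patch only lets lid jump
forward. Below is a minimal toy sketch of why that guard matters. It
assumes the hang came from disc_max_lid being smaller than the current lid
when two HCAs shared a LID; the patch description does not spell this out.
The table, sweep() and step_limit are hypothetical and do not mirror the
real data structures in osm_lid_mgr.c.

```c
#include <stdint.h>

/*
 * Toy model of the sweep over lids 1..max_defined_lid.
 * disc_max[lid] is the highest LID claimed by whatever was discovered
 * at that lid, or 0 if nothing was discovered there (hypothetical table).
 * Returns the number of loop iterations, or -1 if step_limit is
 * exceeded, which stands in for "the loop never terminates".
 */
static long sweep(const uint16_t *disc_max, uint16_t max_defined_lid,
                  int use_guard, long step_limit)
{
    long steps = 0;
    uint16_t lid;

    for (lid = 1; lid <= max_defined_lid; lid++) {
        uint16_t disc_max_lid;

        if (++steps > step_limit)
            return -1;

        disc_max_lid = disc_max[lid];
        if (disc_max_lid != 0) {
            /* With the guard, lid may only move forward. Without it,
               a smaller disc_max_lid moves lid backward. */
            if (!use_guard || disc_max_lid > lid)
                lid = disc_max_lid;
        }
    }
    return steps;
}
```

With a table such as {0, 0, 4, 0, 0, 3, 0, 0, 0} and max_defined_lid = 8,
the entry at lid 5 points back to 3. The unguarded sweep bounces between
3 and 5 until it hits step_limit, while the guarded sweep ignores the
backward jump and finishes.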