[openib-general] OpenSM 1.8.0 merge status

Hal Rosenstock halr at voltaire.com
Sat Sep 3 12:43:42 PDT 2005


Hi Yael,

I've made my way through the rest of the OpenSM changes.

First, I'd like to thank you for the great job you did merging the
OpenIB OpenSM up to 1.8.0. I hope that once I check things back in to
the trunk that this can be the basis for future OpenSM development
rather than some other code base.

Aside from the issue with the 4x port operating at 1x which I will get
back to shortly, I need also to do some testing with Solaris 10. With
that and the answers, I should be ready to check things back into the
trunk early week (Monday is a holiday) so I am shooting for by
Wednesday.

I do have some additional comments (including nits) and questions based
on inspection of the changes:

1. include/opensm/osm_sm.h
typedef struct _osm_sm
{
  osm_thread_state_t       thread_state;
// osm_sm_state_t          state;

Should state be removed ?

2. osm_port.c has the following:
  /* allocate enough SL2VL tables */
  if (osm_node_get_type( p_node ) == IB_NODE_TYPE_SWITCH)
  {
    /* we need node num ports + 1 SL2VL tables */
    num_slvl = osm_node_get_num_physp( p_node ) + 1;
  }
  else
  {
    /* An end node - we need only one SL2VL */
    num_slvl = osm_node_get_num_physp( p_node ) + 1;
  }
Seems like it should just be either:
num_slvl = osm_node_get_num_physp( p_node ) + 1;
or the end node case should be different per the comment.

3. osm_subnet.c
  /* by default we will consider waiting for 20x transaction timeout normal */
  p_opt->max_msg_fifo_timeout = 50*OSM_DEFAULT_TRANS_TIMEOUT_MILLISEC;
The comment and code are inconsistent.

Also,
p_opt->single_thread = TRUE; /* HACK : Modified until SMP bug is resolved */
Is the comment accurate ? What's the SMP bug ? Can this run multithreaded ?

4. osm_db_files.c:
  cl_list_init( &p_db->domains, 5 );
Should 5 be defined ?

Should the test code at the bottom of this file be separated out somewhere else ?

5. osm_helper.c
Should Infinicon now be listed as SilverStorm ?

6. osm_multicast,c:
In osm_mcast_mgr_process_mgrp
  status = osm_mcast_mgr_process_tree( p_mgr, p_mgrp, req_type, port_guid );
  if( status != IB_SUCCESS )
  {
    CL_PLOCK_RELEASE( p_mgr->p_lock );

Should this lock release be removed as the other one was ?

7. osm_mad_pool.c
osm_mad_pool_destroy(
  IN osm_mad_pool_t* const p_pool )
{
  CL_ASSERT( p_pool );

  /* HACK: we still rarely see some mads leaking - so ignore this */
  /* cl_qlock_pool_destroy( &p_pool->madw_pool ); */
Should this be removed ? 

But do MADs leak ?

8. osm_ucast_updn.c
#if 0
/* This function insert a new element by guid index with rank value into the
   qmap list */
int
__updn_insert_rank (
  OUT cl_qmap_t *p_guid_rank_tbl,
  IN ib_net64_t guid_index,
  IN uint8_t rank)
{
  cl_status_t status;
  updn_rank_t *p_rank,*p_check;
  p_rank = (updn_rank_t*) cl_malloc(sizeof(updn_rank_t));
  CL_ASSERT (p_rank != NULL);

  p_rank->rank = rank;
  p_check = (updn_rank_t*) cl_qmap_insert(p_guid_rank_tbl, guid_index , &p_rank->map_item);
  /* No check for same key required since we support mutiple guids ranking */
  return 0;
}
#endif
Should this be removed ?

Also in __updn_bfs_by_node
    /*
      make sure that all five of the following occur:
      1. The port isn't NULL
      2. The port is a valid port
    */
It looks like it is 2 rather than 5 (comment).

9. osm_sa_mad_ctrl.c
__osm_sa_mad_ctrl_rcv_callback
  switch( p_sa_mad->method )
  {
  case IB_MAD_METHOD_REPORT_RESP:
...
    if (p_req_madw)
      osm_mad_pool_put( p_ctrl->p_mad_pool, p_req_madw );
    osm_mad_pool_put( p_ctrl->p_mad_pool, p_madw );
Should the second duplicate call be eliminated ?

10. osm_sm_mad_ctrl.c
__osm_sm_mad_ctrl_send_err_cb
  /* For now - do not add the alternate dr path to the release */
  if (0)
Should this code be removed ?

11. osm_sa_service_record.c
osm_sr_rcv_process_set_method
  if( (comp_mask & ( IB_SR_COMPMASK_SID | IB_SR_COMPMASK_SGID )) !=
      (IB_SR_COMPMASK_SID | IB_SR_COMPMASK_SGID ))

What about SGID and PKEY as well ? Looks like there is some code for
PKEY mask not set for the response.

12. I also have a question about files/database and file/database
formats:

For those files which are input (and perhaps output ones) as well, is
there some version number kept ? How is compatibility going forward
dealt with (especially if it is a generated file whose format may
change) ?

Others:
You might want to change the permission on st.c in merge branch (no +x)

Please be careful about adding extra whitespace to files.

The questions about the osmtest changes are still pending.

Thanks again for your efforts.

-- Hal




More information about the general mailing list