[openib-general] Re: opensm and SIGINT

Hal Rosenstock halr at voltaire.com
Thu Sep 22 20:44:04 PDT 2005


On Thu, 2005-09-22 at 21:37, Viswanath Krishnamurthy wrote:
> 
> On 22 Sep 2005 18:44:44 -0400, Hal Rosenstock <halr at voltaire.com>
> wrote:
>         Hi Viswa,
>         
>         On Thu, 2005-09-22 at 15:55, Viswanath Krishnamurthy wrote:
>         > Here is the log of osmtest failure. This was seen 150 times out of
>         > 2500 iterations. The opensm SUBNET UP failure is tough to reproduce.
>         > Saw it once in 2500 iterations. Unfortunately I did not collect the
>         > log on that error.
>         
>         I understand but it is hard to know whether this is a known issue or
>         something else without a log of the failure.
>         
>         > The patch worked as expected and did not see any issues with ctrl-C.
>         > When I tried to apply the patch, I got a failure.  (I used the patch
>         > command). I manually added those 2 lines.
>         
>         Not sure why the patch wouldn't apply. 
>         
>         > Command Line Arguments
>         > Done with args
>         >         Flow = All Validations
>         > Sep 21 17:50:56 684254 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port.
>         > using default guid 0x2c90200400cfd
>         > Sep 21 17:50:56 686301 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port.
>         > Sep 21 17:50:56 686347 [B7F026C0] -> osm_vendor_bind: Binding to port 0x2c90200400cfd.
>         > Sep 21 17:50:56 689963 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port.
>         > Sep 21 17:50:56 691969 [B7F026C0] -> osm_vendor_get_all_port_attr: assign CA mthca0 port 1 guid (0x2c90200400cfd) as the default port.
>         > Sep 21 17:50:56 693187 [B7F026C0] -> osmtest_validate_sa_class_port_info:
>         > -----------------------------
>         > SA Class Port Info:
>         >  base_ver:1
>         >  class_ver:2
>         >  cap_mask:0x202
>         >  resp_time_val:0x64
>         > -----------------------------
>         > Sep 21 17:50:56 775383 [B7F026C0] -> osmtest_wrong_sm_key_ignored: Try PortRecord for port with LID 0x0 Num:0x1.
>         > Sep 21 17:51:00 775320 [B76FFBB0] -> umad_receiver: ERR 5409: send completed with error (method=1 attr=12 trans_id=0x34) -- dropping.
>         > Sep 21 17:51:00 775389 [B76FFBB0] -> umad_receiver: ERR 5410: class 0x3 LID 0x0
>         > Sep 21 17:51:00 775418 [B76FFBB0] -> osmtest_query_res_cb: ERR 0003: Error on query (IB_TIMEOUT).
>         > Sep 21 17:51:00 775465 [B7F026C0] -> osmtest_wrong_sm_key_ignored: ERR 0011: Did not get a timeout but got (IB_SUCCESS).
>         > Sep 21 17:51:00 775581 [B7F026C0] -> osmt_register_service: Registering Service: name: osmt.srvc.1804289383.7793 id:0x6b8b26f6.
>         > Sep 21 17:51:00 777143 [B7F026C0] -> osmt_register_service: Registering Service: name:osmt.srvc.846930885.7793 id:0x327b0554.
>         > Sep 21 17:51:04 779578 [B76FFBB0] -> umad_receiver: ERR 5409: send completed with error (method=2 attr=31 trans_id=0x36) --dropping.
>         > Sep 21 17:51:04 779604 [B76FFBB0] -> umad_receiver: ERR 5410: class 0x3 LID 0x0
>         > Sep 21 17:51:04 779631 [B76FFBB0] -> osmtest_query_res_cb: ERR 0003: Error on query (IB_TIMEOUT).
>         > Sep 21 17:51:04 779674 [B7F026C0] -> osmt_register_service: ERR 0364: ib_query failed (IB_TIMEOUT).
>         > Sep 21 17:51:04 779740 [B7F026C0] -> osmtest_run: ERR 00148: Service Flow failed (IB_TIMEOUT)
>         > OSMTEST: TEST "All Validations" FAIL
>         
>         The final FAIL/PASS is definitive so there are real failures here. Is
>         this consistent or intermittent ? Does this work sometimes or always
> 
> 
> Intermittent. As I said, 150 out of 2500 iterations failed.

You did say that :-) Sorry.

>  Is there any log you want me to collect ?

Can you capture a fresh log for this on the OpenSM side (opensm -V) ? 
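
Since the failure is intermittent, it may be easier to keep only the
output of the failing runs. Below is a rough sketch of a loop that does
that, not a tested tool. It keys on the final OSMTEST: TEST "..." FAIL
line from your log; the PASS spelling is assumed to mirror it. The
helper names (verdict, error_codes, run_iterations) and the log file
naming are made up for illustration, and the osmtest command line is
whatever you have been running so far.

```python
import re
import subprocess
from pathlib import Path

# Final verdict line as seen in the posted log, e.g.
#   OSMTEST: TEST "All Validations" FAIL
# Assumption: a passing run prints the same line ending in PASS.
FINAL_RE = re.compile(r'^OSMTEST: TEST "([^"]*)" (PASS|FAIL)\s*$', re.MULTILINE)

# Error codes such as "ERR 5409:" or "ERR 0003:" in the log lines.
ERR_RE = re.compile(r"ERR\s+([0-9A-Fa-f]+):")


def verdict(log_text):
    """Return 'PASS' or 'FAIL' from the last verdict line, or None if absent."""
    matches = FINAL_RE.findall(log_text)
    return matches[-1][1] if matches else None


def error_codes(log_text):
    """Return the ERR codes in the order they appear in the log."""
    return ERR_RE.findall(log_text)


def run_iterations(cmd, iterations, log_dir):
    """Run cmd repeatedly; save the output of every run that did not PASS.

    cmd is an argument list for subprocess (hypothetical example:
    ["osmtest"] plus your usual arguments). Returns the list of failing
    iteration numbers.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for i in range(iterations):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        if verdict(proc.stdout) != "PASS":
            failed.append(i)
            # One file per failing run, so the timestamps can later be
            # lined up with the OpenSM log from the same period.
            (log_dir / f"osmtest-fail-{i:04d}.log").write_text(proc.stdout)
    return failed
```

Called as run_iterations(cmd, 2500, "osmtest-fail-logs"), it would leave
one file per failing iteration. A run that prints no verdict line at all
is also counted as a failure.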

Also, are there port state LEDs on the switch(es) in your subnet ? Can
you correlate these failures with the LEDs changing ?

Thanks.

-- Hal



