[openib-general] Re: opensm and SIGINT

Eitan Zahavi eitan at mellanox.co.il
Sat Sep 24 22:36:51 PDT 2005


Hi Hal,

It seems I was able to reproduce the osmtest failure (hopefully the same one Viswa sees).
I left the test running in a loop on one machine, and it failed after 736
iterations. Once it did, I stopped the loop.
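
For anyone who wants to repeat this, here is a minimal sketch of such a loop, following the test-case steps quoted further down in this thread. It is not the script actually used. The opensm and osmtest command lines are the ones given in the thread; the remote host name, the startup delay, the use of plain ping, and the assumption that osmtest exits non-zero on failure are all hypothetical and need adjusting.

```python
import subprocess
import time

# Hypothetical placeholders, not taken from this thread.
REMOTE_NODE = "remote-node"   # assumed name of the node to ping
SM_STARTUP_WAIT = 10          # assumed seconds to let opensm come up


def one_iteration():
    """Run steps 1-5 of the quoted test case once.

    Returns True if 'osmtest -f a' passed. This assumes osmtest
    reports failure through a non-zero exit status.
    """
    sm = subprocess.Popen(["opensm", "-V"])               # step 1
    try:
        time.sleep(SM_STARTUP_WAIT)
        subprocess.call(["ping", "-c", "1", REMOTE_NODE])  # step 2 (assumed plain ping)
        subprocess.call(["osmtest", "-f", "c"])           # step 3
        status = subprocess.call(                          # step 4, verbose with log
            ["osmtest", "-f", "a", "-V", "-l", "/tmp/osmtest.log"])
        return status == 0
    finally:
        subprocess.call(["pkill", "-9", "opensm"])        # step 5
        sm.wait()


def run_until_failure(step=one_iteration, max_iterations=10000):
    """Repeat 'step' and return the 1-based iteration that first failed.

    Returns None if every iteration passed. Stopping at the first
    failure keeps osm.log and /tmp/osmtest.log from being overwritten.
    """
    for i in range(1, max_iterations + 1):
        if not step():
            return i
    return None
```

Calling run_until_failure() with no arguments runs the real commands; passing a different step function allows the loop logic to be tried without an SM.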

From osm.log I see:
Sep 25 02:50:56 463143 [8003] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
...
Sep 25 02:50:57 463991 [C004] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
...
Sep 25 02:50:58 463751 [8003] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).

Sep 25 02:50:59 462938 [C004] -> __osm_sr_rcv_respond: [
Sep 25 02:50:59 462955 [C004] -> __osm_sr_rcv_respond: Generating response with 744 records.
...
Sep 25 02:50:59 463489 [C004] -> osm_vendor_send: RMPP 1 length 131000
Sep 25 02:50:59 463518 [C004] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
Sep 25 02:50:59 463549 [C004] -> __osm_sa_mad_ctrl_send_err_callback: [
Sep 25 02:50:59 463566 [C004] -> __osm_sa_mad_ctrl_send_err_callback: ERR 1A06: MAD transaction completed in error.
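
To see how often each error shows up over a long run, the log can be tallied with a few lines of Python. This is only a sketch: the pattern is inferred from the lines pasted above, not from the OpenSM logging code, so it may need adjusting for other message layouts.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt above, for example:
#   Sep 25 02:50:59 463518 [C004] -> osm_vendor_send: ERR 5430: Send ...
# The layout is inferred from the pasted lines only (an assumption).
ERR_RE = re.compile(
    r"\[(?P<tid>[0-9A-Fa-f]+)\] -> (?P<func>\w+): ERR (?P<code>[0-9A-Fa-f]+):")


def count_errors(lines):
    """Count ERR messages per (function name, error code) pair."""
    counts = Counter()
    for line in lines:
        match = ERR_RE.search(line)
        if match:
            counts[(match.group("func"), match.group("code"))] += 1
    return counts
```

For example, count_errors(open("osm.log")) gives one counter entry per distinct function and error code.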

From osmtest I get:
Sep 25 02:50:56 461412 [4000] -> osmt_get_all_services_and_check_names: Getting All Service Records
Sep 25 02:50:56 461429 [4000] -> osmv_query_sa: [
Sep 25 02:50:56 461445 [4000] -> osmv_query_sa DBG:001 SVC_REC_BY_NAME
Sep 25 02:50:56 461462 [4000] -> __osmv_send_sa_req: [
Sep 25 02:50:56 461478 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: [
Sep 25 02:50:56 461498 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: Using previously stored lid:0x0001 sm_lid:0x0001
Sep 25 02:50:56 461515 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: ]
Sep 25 02:50:56 461555 [4000] -> osm_mad_pool_get: [
...
Sep 25 02:51:00 461961 [8003] -> umad_receiver: ERR 5409: send completed with error (method=12 attr=31) -- dropping.
Sep 25 02:51:00 461979 [8003] -> umad_receiver: ERR 5410: class 0x3 LID 0x0

Is it possible that there is a maximum limit on MAD size in umad? It seems the SM
fails to allocate a MAD of the size required to answer the "get all service records" query.
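
As a sanity check on the size involved, the RMPP length in the log can be reproduced from the record count. The sketch below assumes a 56-byte SA MAD header (24-byte MAD header, 12-byte RMPP header, 20-byte SA header), 256-byte MADs, and a 176-byte ServiceRecord. These sizes are quoted from memory of the IBA specification, not from this thread, but they do give exactly the 131000 reported for 744 records.

```python
# Sizes below are assumptions recalled from the IBA specification.
MAD_SIZE = 256             # one MAD / RMPP segment on the wire
SA_MAD_HDR_SIZE = 56       # MAD header (24) + RMPP header (12) + SA header (20)
SERVICE_RECORD_SIZE = 176  # size of one ServiceRecord attribute


def sa_response_length(num_records, record_size=SERVICE_RECORD_SIZE):
    """Total buffer length for an SA response: one header plus all records."""
    return SA_MAD_HDR_SIZE + num_records * record_size


def rmpp_segments(num_records, record_size=SERVICE_RECORD_SIZE):
    """Number of RMPP segments needed to carry the record payload."""
    payload = num_records * record_size
    per_segment = MAD_SIZE - SA_MAD_HDR_SIZE   # 200 data bytes per segment
    return -(-payload // per_segment)          # ceiling division


print(sa_response_length(744))  # 131000, matching "RMPP 1 length 131000"
print(rmpp_segments(744))       # 655
```

So the failing send is a single buffer of about 128 KB for the whole response, which would be consistent with hitting a size limit somewhere in the send path.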

Another interesting message is the last one:
"umad_receiver: ERR 5410: class 0x3 LID 0x0". Why is the reported LID 0?

Will you be able to handle the MAD allocation?

Please advise.

Eitan

Eitan Zahavi wrote:
> Hi Viswa,
> 
> Please run step 4 in verbose mode: osmtest -f a -V -l /tmp/osmtest.log
> If it fails, please send us a copy of /tmp/osmtest.log
> 
> This is just a guess, but I think the "bug" is that the SM
> did not have a chance to clean up completely between the tests, and the tests are
> picky about the SM state (number of services, multicast groups, etc.).
> 
> We will try to reproduce it here too.
> 
> Thanks
> 
> Eitan
> 
> Viswanath Krishnamurthy wrote:
> 
>>On 23 Sep 2005 13:49:31 -0400, Hal Rosenstock < halr at voltaire.com> wrote:
>>
>>Hi Viswa,
>>
>>On Fri, 2005-09-23 at 13:43, Viswanath Krishnamurthy wrote:
>>
>>
>>>More information,
>>>
>>>The test case is as follows
>>>
>>>1. Start opensm in verbose mode (-V)
>>>2. Ping remote node 
>>>3. osmtest -f c
>>>4. osmtest -f a
>>>5. pkill -9 opensm
>>>6. Repeat over
>>>
>>>Out of about 2500 iterations, 143 osmtest runs failed. Keep in mind,
>>>only Step 4 failed.
>>
>>
>>Yes.
>>
>>Do you see any port LEDs on the switch blink, indicating that the port went
>>down from active and back while this was running?
>>
>>
>>
>>No, I ran this test overnight and logged the results.  I will try it next week and let you know.
>>
>>
>>
>>
>>
>>>Step 3, which is inventory file creation, *never* failed. (I think
>>>inventory file creation also talks to the SA, right?)
>>
>>
>>Right.
>>
>>-- Hal
>>
>>
>>
>>
>>



