[openib-general] Re: opensm and SIGINT

Hal Rosenstock halr at voltaire.com
Sun Sep 25 03:07:49 PDT 2005


Hi Eitan,

On Sun, 2005-09-25 at 01:36, Eitan Zahavi wrote:
> Hi Hal,
> 
> Seems I was able to reproduce the osmtest failure (hope same one Viswa see).
                                ^^^
                                an osmtest failure


I don't think it's the same one. This looks quite different.

> I have left it running for a while on a machine and after 736
> iterations it failed. Once it did - I stopped the loop.
> 
>  From osm.log I see:
> Sep 25 02:50:56 463143 [8003] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
> ...
> Sep 25 02:50:57 463991 [C004] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
> ...
> Sep 25 02:50:58 463751 [8003] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
> 
> Sep 25 02:50:59 462938 [C004] -> __osm_sr_rcv_respond: [
> Sep 25 02:50:59 462955 [C004] -> __osm_sr_rcv_respond: Generating response with 744 records.
> ...
> Sep 25 02:50:59 463489 [C004] -> osm_vendor_send: RMPP 1 length 131000

That sounds right for 744 service records.
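
As a sanity check on that figure, here is a rough sketch of the arithmetic. It assumes the usual SA MAD layout (24-byte MAD header, 12-byte RMPP header, 20-byte SA header) and a 176-byte ServiceRecord. Those sizes are my assumption and are not taken from the log; the function name is only illustrative.

```python
# Sizes in bytes. These are assumed from the InfiniBand SA MAD layout,
# not read from osm.log or from the opensm sources.
MAD_HEADER = 24
RMPP_HEADER = 12
SA_HEADER = 20
SERVICE_RECORD = 176


def sa_response_length(num_records, record_size=SERVICE_RECORD):
    """Total RMPP payload length for an SA response carrying num_records."""
    headers = MAD_HEADER + RMPP_HEADER + SA_HEADER
    return headers + num_records * record_size


if __name__ == "__main__":
    # 744 records is the count reported by __osm_sr_rcv_respond above.
    print(sa_response_length(744))
```

Under those assumptions, 744 records come to 131000 bytes, which matches the "RMPP 1 length 131000" line.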

> Sep 25 02:50:59 463518 [C004] -> osm_vendor_send: ERR 5430: Send p_madw = 0x80a49f8 failed -5 (Cannot allocate memory).
> Sep 25 02:50:59 463549 [C004] -> __osm_sa_mad_ctrl_send_err_callback: [
> Sep 25 02:50:59 463566 [C004] -> __osm_sa_mad_ctrl_send_err_callback: ERR 1A06: MAD transaction completed in error.
> 
>  From osmtest I get:
> Sep 25 02:50:56 461412 [4000] -> osmt_get_all_services_and_check_names: Getting All Service Records
> Sep 25 02:50:56 461429 [4000] -> osmv_query_sa: [
> Sep 25 02:50:56 461445 [4000] -> osmv_query_sa DBG:001 SVC_REC_BY_NAME
> Sep 25 02:50:56 461462 [4000] -> __osmv_send_sa_req: [
> Sep 25 02:50:56 461478 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: [
> Sep 25 02:50:56 461498 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: Using previously stored lid:0x0001 sm_lid:0x0001
> Sep 25 02:50:56 461515 [4000] -> __osmv_get_lid_and_sm_lid_by_port_guid: ]
> Sep 25 02:50:56 461555 [4000] -> osm_mad_pool_get: [
> ...
> Sep 25 02:51:00 461961 [8003] -> umad_receiver: ERR 5409: send completed with error (method=12 attr=31) -- dropping.
> Sep 25 02:51:00 461979 [8003] -> umad_receiver: ERR 5410: class 0x3 LID 0x0
> 
> Is it possible there is a max limit on MAD size in umad? 

The memory allocation is just using calloc.

> It seems the SM fails to allocate the size of the MAD required
> for answering the "get all service records" query.

It looks like it may have run out of memory just before this.

> Another interesting message is the last message saying
> "umad_receiver: ERR 5410: class 0x3 LID 0x0" Why is the reported LID 0 ?

Not sure. I'll look into it. This is only "cosmetic" (i.e.
informational).

> Will you be able to handle the mad allocation?

Not sure what you mean by this question. I think this must be a memory
leak situation.
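
One way to test the leak theory would be to sample the resident set size of the opensm process while the osmtest loop runs and see whether it only ever grows. A minimal sketch, assuming a Linux /proc filesystem; the helper names and the 1 MB threshold are hypothetical and not part of opensm or osmtest:

```python
import re


def vm_rss_kb(status_text):
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current VmRSS of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return vm_rss_kb(status_file.read())


def looks_like_leak(samples_kb, min_growth_kb=1024):
    """Heuristic: memory never shrinks and grows by at least min_growth_kb."""
    if len(samples_kb) < 2:
        return False
    never_shrinks = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return never_shrinks and samples_kb[-1] - samples_kb[0] >= min_growth_kb
```

Calling read_rss_kb once per osmtest iteration and passing the collected values to looks_like_leak gives a crude signal only; a steady climb over several hundred iterations would be consistent with a leak but would not say where it is.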

What was your osmtest invocation for this?

I may have some questions about this as I investigate further. 

-- Hal



