[openib-general] opensm
Batwara, Ashish
Ashish.Batwara at lsi.com
Thu Dec 21 13:39:00 PST 2006
Thanks Vu,
This seems to be working.
Thanks
Ashish
-----Original Message-----
From: Vu Pham [mailto:vuhuong at mellanox.com]
Sent: Wednesday, December 20, 2006 3:23 PM
To: Batwara, Ashish
Cc: Hal Rosenstock; ishai at mellanox.co.il; openib-general at openib.org
Subject: Re: [openib-general] opensm
Hi Ashish,
> Hi,
> Please see the information below
>
> This is what I did:
> /etc/init.d/openibd start
> /etc/init.d/opensmd start
> modprobe ib_srp
>
> Issued the command /usr/local/ofed/sbin/ibsrpdm -c to get the
> information about target and used them in
>
By default, without the -d option, ibsrpdm uses
/dev/infiniband/umad0, which corresponds to port 1 of mthca0.
> echo id_ext=200400A0B81146A1,ioc_guid=0002c90200402bd4,dgid=fe800000000000000002c90200402bd5,pkey=ffff,service_id=200400a0b81146a1 > /sys/class/infiniband_srp/srp-mthca0-1/add_target
Using srp-mthca0-1 here is correct. However, in your previous
email, where you reported *I am seeing the error "Got failed path
rec status -110" on Linux console*, you ran:
echo id_ext=200300A0B811C847,ioc_guid=00a0b8020022cd27,dgid=fe800000000000000002c9020022cd26,pkey=ffff,service_id=200300a0b811c847 > /sys/class/infiniband_srp/srp-mthca0-2/add_target
You used port 2 of mthca0 here, i.e. srp-mthca0-2; therefore,
you got the path record failure.
Please retry:
0. Make sure port 1 of the host HCA is connected to the target (since
you connect them directly). Port 2 works as well, but then you have
to use umad1 and srp-mthca0-2 for steps 1 and 2 below.
1. ibsrpdm -c -d /dev/infiniband/umad0
2. echo each target string that ibsrpdm reports to
/sys/class/infiniband_srp/srp-mthca0-1/add_target
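
As a rough sketch of steps 1 and 2, the helper functions below keep the
umad device and the SRP host on the same port. The function names are
made up for illustration, and the umad(N-1) numbering is an assumption
generalized from the umad0/port 1 and umad1/port 2 pairs above; it only
covers mthca0.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and the port-to-umad
# numbering is assumed from the umad0/port 1, umad1/port 2 pairs above.

umad_for_port() {
    # port 1 -> /dev/infiniband/umad0, port 2 -> /dev/infiniband/umad1
    echo "/dev/infiniband/umad$(( $1 - 1 ))"
}

add_target_path_for_port() {
    # port 1 -> .../srp-mthca0-1/add_target, port 2 -> .../srp-mthca0-2/add_target
    echo "/sys/class/infiniband_srp/srp-mthca0-$1/add_target"
}

add_targets_on_port() {
    port=$1
    # ibsrpdm -c prints one add_target-ready line per discovered target;
    # feed each line to the SRP host that belongs to the same port.
    ibsrpdm -c -d "$(umad_for_port "$port")" | while read -r target; do
        echo "$target" > "$(add_target_path_for_port "$port")"
    done
}
```

With the cable on port 1, running `add_targets_on_port 1` as root would
perform both steps; with the cable on port 2, pass 2 instead.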
-vu
>
> Yes, earlier I had a SilverStorm switch that was running an SM, but I
> have now taken it out and am connecting the target and host directly.
>
> I have only one port connected between the host and the target.
> The reason the link is not stable is that I keep stopping and
> restarting, as this does not seem to be working. I did not know what
> the issue was until I looked at the console log, which showed
> "Got failed path rec status -110". After seeing that, I searched on
> Google and found
> https://lists.scl.ameslab.gov/pipermail/sc05-ib/2005-November/000383.html
> which suggests it is a bug on 64-bit machines.
> BTW, my Linux server is 64-bit.
> When I hooked up a 32-bit server running OFED-1.1, my target was
> discovered with the same procedure.
>
> So the question is: what is the fix for the "Got failed path rec
> status -110" issue on a 64-bit machine?
>
> Thanks
> Ashish
>
> -----Original Message-----
> From: Hal Rosenstock [mailto:halr at voltaire.com]
> Sent: Tuesday, December 19, 2006 10:35 PM
> To: Batwara, Ashish
> Cc: Eitan Zahavi; ishai at mellanox.co.il; openib-general at openib.org
> Subject: RE: [openib-general] opensm
>
> On Tue, 2006-12-19 at 18:22, Batwara, Ashish wrote:
>> Hi,
>> Please look towards the end of the attached file.
>
> What options are you starting opensm with ? What is the command line ?
>
> Also, it looks like (at least at one point) you have another SM on the
> subnet. What is the make (vendor) for your switch ?
>
> I see many SM port is DOWN. What is going on with this port ? Why is the
> physical link not LinkUp and stable ? That is the main issue and is
> likely why the SubnGet of NodeInfo is not being responded to.
>
> -- Hal
>
>> Thanks
>> Ashish
>>
>> -----Original Message-----
>> From: Hal Rosenstock [mailto:halr at voltaire.com]
>> Sent: Tuesday, December 19, 2006 5:06 PM
>> To: Batwara, Ashish
>> Cc: Eitan Zahavi; ishai at mellanox.co.il; openib-general at openib.org
>> Subject: Re: [openib-general] opensm
>>
>> Ashish,
>>
>> On Tue, 2006-12-19 at 17:43, Batwara, Ashish wrote:
>>> Hi,
>>>
>>> Here is the info that you asked for. I see that the subnet manager
>>> is up now, with the port active, but the server is not able to
>>> discover the target. I am seeing the error "Got failed path rec
>>> status -110" on the Linux console.
>> That means the request for an SA PathRecord from the initiator to the
>> target failed (-110 is ETIMEDOUT). Are you sure the target is up
>> (ACTIVE) on the subnet ? If it is, can you send the opensm log ?
>>
>> -- Hal
>>
>>> Below is the output of different commands. I am using the following
>>> to discover the target:
>>>
>>> /etc/init.d/opensmd start
>>> /etc/init.d/openibd start
>>> modprobe ib_srp
>>> echo id_ext=200300A0B811C847,ioc_guid=00a0b8020022cd27,dgid=fe800000000000000002c9020022cd26,pkey=ffff,service_id=200300a0b811c847 > /sys/class/infiniband_srp/srp-mthca0-2/add_target
>>>
>>> [root at p49 ~]# ibv_devinfo
>>> hca_id: mthca0
>>>     fw_ver:             5.1.400
>>>     node_guid:          0002:c902:0022:cce0
>>>     sys_image_guid:     0002:c902:0022:cce3
>>>     vendor_id:          0x02c9
>>>     vendor_part_id:     25218
>>>     hw_ver:             0xA0
>>>     board_id:           MT_0370130002
>>>     phys_port_cnt:      2
>>>         port: 1
>>>             state:          PORT_DOWN (1)
>>>             max_mtu:        2048 (4)
>>>             active_mtu:     512 (2)
>>>             sm_lid:         0
>>>             port_lid:       0
>>>             port_lmc:       0x00
>>>
>>>         port: 2
>>>             state:          PORT_ACTIVE (4)
>>>             max_mtu:        2048 (4)
>>>             active_mtu:     2048 (4)
>>>             sm_lid:         1
>>>             port_lid:       1
>>>             port_lmc:       0x00
>>>
>>> hca_id: mthca1
>>>     fw_ver:             5.1.400
>>>     node_guid:          0002:c902:0022:cd2c
>>>     sys_image_guid:     0002:c902:0022:cd2f
>>>     vendor_id:          0x02c9
>>>     vendor_part_id:     25218
>>>     hw_ver:             0xA0
>>>     board_id:           MT_0370130002
>>>     phys_port_cnt:      2
>>>         port: 1
>>>             state:          PORT_DOWN (1)
>>>             max_mtu:        2048 (4)
>>>             active_mtu:     512 (2)
>>>             sm_lid:         0
>>>             port_lid:       0
>>>             port_lmc:       0x00
>>>
>>>         port: 2
>>>             state:          PORT_DOWN (1)
>>>             max_mtu:        2048 (4)
>>>             active_mtu:     512 (2)
>>>             sm_lid:         0
>>>             port_lid:       0
>>>             port_lmc:       0x00
>>>
>>> [root at p49 ~]# uname -a
>>> Linux p49.ks.lsil.com 2.6.9-42.0.3.ELsmp #1 SMP Mon Sep 25 17:24:31
>>> EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> [root at p49 ~]# cat /etc/infiniband/info
>>> #!/bin/bash
>>>
>>> echo prefix=/usr/local/ofed
>>> echo Kernel=2.6.9-42.0.3.ELsmp
>>> echo
>>> echo "Configure options: --with-dapl --with-ipoibtools --with-libibcm
>>> --with-libibcommon --with-libibmad --with-libibumad --with-libibverbs
>>> --with-libipathverbs --with-libmthca --with-opensm --with-librdmacm
>>> --with-libsdp --with-openib-diags --with-srptools --with-mstflint
>>> --with-perftest --with-tvflash --with-ipath_inf-mod --with-ipoib-mod
>>> --with-mthca-mod --with-sdp-mod --with-srp-mod --with-core-mod
>>> --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod"
>>> echo
>>>
>>> OFED Version: OFED-1.1
>>
>>
>>> Thanks
>>>
>>> Ashish
>>>
>>> -----Original Message-----
>>> From: Eitan Zahavi [mailto:eitan at mellanox.co.il]
>>> Sent: Tuesday, December 19, 2006 5:18 AM
>>> To: Batwara, Ashish
>>> Cc: ishai at mellanox.co.il; openib-general at openib.org
>>> Subject: Re: [openib-general] opensm
>>>
>>> Hi Ashish,
>>>
>>> SRP people say they have no such error message.
>>> OpenSM does. So I take it back.
>>>
>>> Ashish,
>>> Please provide more info:
>>>
>>> 1. ibv_devinfo
>>> 2. Version of code you are using
>>> 3. Command line you use for starting opensm
>>> 4. /var/log/osm.log
>>>
>>> Thanks and sorry for the confusion.
>>>
>>> EZ
>>>
>>> Eitan Zahavi wrote:
>>>
>>>> This is not an OpenSM issue.
>>>> Forwarded to the SRP people.
>>>> EZ
>>>> Batwara, Ashish wrote:
>>>>
>>>>> Hi,
>>>>> I am trying to run opensm on a Linux server. It has two HCAs
>>>>> (4 ports) and is connected to an IB switch. The ibnodes command
>>>>> displays the information about the switch ports and HCA ports.
>>>>> When I start opensm, I see in /var/log/messages "Starting
>>>>> srp_daemon" for all 4 ports, and immediately afterwards I see
>>>>> "failed srp_daemon" for all the ports, and then it displays
>>>>> "SM Port is down".
>>>>> I tried several times and even rebooted the server a few times,
>>>>> but no luck.
>>>>> Does anybody know what this problem is?
>>>>> Thanks
>>>>> Ashish
>>>>> _______________________________________________
>>>>> openib-general mailing list
>>>>> openib-general at openib.org
>>>>> http://openib.org/mailman/listinfo/openib-general
>>>>> To unsubscribe, please visit
>>> http://openib.org/mailman/listinfo/openib-general
>>>