[openib-general] Automatically connect to SRP target

Vu Pham vuhuong at mellanox.com
Tue Dec 12 00:58:01 PST 2006


PN wrote:
> Hi Vu,
>  
> I have two more questions.
> I now have 3 SRP targets and use LVM to form a GFS system.
>  
> After setting SRPHA_ENABLE=yes, I found that it sometimes (~30% of
> reboots) misses a target.
> I need to manually run "srp_daemon -e -o" to discover the missing target.
> Is there any way to make srp_daemon keep retrying to ensure that all
> targets are found?
>  

You probably didn't have a clean shutdown, so the SRP target still had 
the previous connection around (it has no mechanism to clean up dead 
connections itself), and it then rejected the next login request.

However, srp_daemon rescans the fabric every 60 seconds and should 
pick up the target that was missed in the previous scan.
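
If you would rather force the issue at boot than wait for the next 
periodic scan, something along these lines could go in an init script 
or rc.local after openibd starts. This is only an untested sketch: the 
count of 3 targets matches your setup, and the check that ib_srp SCSI 
hosts report "ib_srp" in /sys/class/scsi_host/host*/proc_name is an 
assumption you should verify on your system first.

  #!/bin/sh
  # Untested sketch: re-run one-shot SRP discovery at boot until all
  # expected targets have logged in, instead of waiting for the next
  # periodic scan. EXPECTED=3 matches this particular setup; the
  # proc_name check for "ib_srp" is an assumption -- verify it first.
  EXPECTED=3
  TRIES=10
  i=0
  while [ "$i" -lt "$TRIES" ]; do
      found=$(grep -l ib_srp /sys/class/scsi_host/host*/proc_name \
              2>/dev/null | wc -l)
      [ "$found" -ge "$EXPECTED" ] && exit 0
      srp_daemon -e -o      # one-shot scan; adds targets not yet connected
      sleep 10
      i=$((i + 1))
  done
  echo "warning: only $found of $EXPECTED SRP targets found" >&2
  exit 1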


> Also, currently there is only one cable connected to each dual-port
> client. Is it normal to see the following messages?
> Dec 12 10:18:10 storage02 run_srp_daemon[5471]: starting srp_daemon: 
> [HCA=mthca0] [port=2]
> Dec 12 10:18:13 storage02 run_srp_daemon[5483]: failed srp_daemon: 
> [HCA=mthca0] [port=2] [exit status=0]
> Dec 12 10:18:43 storage02 run_srp_daemon[5489]: starting srp_daemon: 
> [HCA=mthca0] [port=2]
> Dec 12 10:18:46 storage02 run_srp_daemon[5501]: failed srp_daemon: 
> [HCA=mthca0] [port=2] [exit status=0]
> .....[repeats indefinitely]


This is fine. The srp_daemon for port 2 keeps running and will detect 
any target on the fabric if you plug the cable in; otherwise there is 
no ill effect other than these annoying error messages.
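
If the noise bothers you, one option is to skip starting srp_daemon on 
ports that are not ACTIVE. A rough, untested sketch follows: the sysfs 
path is the usual one for mthca, and the -c/-n/-i/-p/-R flags mirror 
what run_srp_daemon typically passes, but double-check both against 
your install before relying on it.

  #!/bin/sh
  # Untested sketch: start a persistent srp_daemon only on ACTIVE ports.
  # Path and flags are assumptions -- compare with your run_srp_daemon.
  HCA=mthca0
  for port in 1 2; do
      state=$(cat /sys/class/infiniband/$HCA/ports/$port/state \
              2>/dev/null)
      case "$state" in
          *ACTIVE*)
              # rescan every 60 seconds, as in the stock setup
              srp_daemon -e -c -n -i $HCA -p $port -R 60 &
              ;;
          *)
              echo "skipping $HCA port $port (state: ${state:-unknown})"
              ;;
      esac
  done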

-vu

> 
>  
> Thanks a lot,
> PN
>  
> 
> Below is the log:
>  
> Dec 12 10:17:18 storage02 network: Setting network parameters:  succeeded
> Dec 12 10:17:18 storage02 network: Bringing up loopback interface:  
> succeeded
> Dec 12 10:17:23 storage02 network: Bringing up interface eth0:  succeeded
> Dec 12 10:17:23 storage02 network: Bringing up interface ib0:  succeeded
> Dec 12 10:17:26 storage02 kernel:   REJ reason 0xa
> Dec 12 10:17:26 storage02 kernel: ib_srp: Connection failed
> Dec 12 10:17:26 storage02 kernel: scsi3 : SRP.T10:00D0680000000578
> Dec 12 10:17:26 storage02 kernel:   Vendor: Mellanox  Model: 
> IBSRP10-TGT       Rev: 1.46
> Dec 12 10:17:26 storage02 kernel:   Type:   
> Direct-Access                      ANSI SCSI revision: 03
> Dec 12 10:17:26 storage02 kernel: SCSI device sdb: 160086528 512-byte 
> hdwr sectors (81964 MB)
> Dec 12 10:17:26 storage02 kernel: SCSI device sdb: drive cache: write back
> Dec 12 10:17:26 storage02 kernel: SCSI device sdb: 160086528 512-byte 
> hdwr sectors (81964 MB)
> Dec 12 10:17:26 storage02 kernel: SCSI device sdb: drive cache: write back
> Dec 12 10:17:26 storage02 rpcidmapd: rpc.idmapd startup succeeded
> Dec 12 10:17:26 storage02 kernel:  sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6 
> sdb7 >
> Dec 12 10:17:26 storage02 kernel: Attached scsi disk sdb at scsi3, 
> channel 0, id 0, lun 0
> Dec 12 10:17:26 storage02 kernel: scsi4 : SRP.T10:00D06800000007B2
> Dec 12 10:17:26 storage02 kernel:   Vendor: Mellanox  Model: IBSRP10-TGT 
> hy-b  Rev: 1.46
> Dec 12 10:17:26 storage02 kernel:   Type:   
> Direct-Access                      ANSI SCSI revision: 03
> Dec 12 10:17:26 storage02 kernel: SCSI device sdc: 160086528 512-byte 
> hdwr sectors (81964 MB)
> Dec 12 10:17:26 storage02 kernel: SCSI device sdc: drive cache: write back
> Dec 12 10:17:26 storage02 kernel: SCSI device sdc: 160086528 512-byte 
> hdwr sectors (81964 MB)
> Dec 12 10:17:26 storage02 kernel: SCSI device sdc: drive cache: write back
> Dec 12 10:17:26 storage02 kernel:  sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 >
> Dec 12 10:17:26 storage02 kernel: Attached scsi disk sdc at scsi4, 
> channel 0, id 0, lun 0
> Dec 12 10:17:26 storage02 scsi.agent[3668]: disk at 
> /devices/pci0000:00/0000:00:02.0/0000:01:00.0/host3/target3:0:0/3:0:0:0
> Dec 12 10:17:26 storage02 scsi.agent[3705]: disk at 
> /devices/pci0000:00/0000:00:02.0/0000:01:00.0/host4/target4:0:0/4:0:0:0
> Dec 12 10:17:26 storage02 ccsd[3769]: Starting ccsd 1.0.7:
> Dec 12 10:17:26 storage02 ccsd[3769]:  Built: Aug 26 2006 15:01:49
> Dec 12 10:17:26 storage02 ccsd[3769]:  Copyright (C) Red Hat, Inc.  
> 2004  All rights reserved.
> Dec 12 10:17:26 storage02 kernel: NET: Registered protocol family 10
> Dec 12 10:17:26 storage02 kernel: Disabled Privacy Extensions on device 
> ffffffff80405540(lo)
> Dec 12 10:17:26 storage02 kernel: IPv6 over IPv4 tunneling driver
> Dec 12 10:17:26 storage02 ccsd:  succeeded
> Dec 12 10:17:26 storage02 kernel: CMAN 2.6.9-45.4.centos4 (built Aug 26 
> 2006 14:55:55) installed
> Dec 12 10:17:26 storage02 kernel: NET: Registered protocol family 30
> Dec 12 10:17:26 storage02 kernel: DLM 2.6.9-42.12.centos4 (built Aug 27 
> 2006 05:25:40) installed
> Dec 12 10:17:27 storage02 ccsd[3769]: cluster.conf (cluster name = 
> GFS_Cluster, version = 21) found.
> Dec 12 10:17:27 storage02 ccsd[3769]: Unable to perform sendto: Cannot 
> assign requested address
> Dec 12 10:17:27 storage02 run_srp_daemon[3845]: failed srp_daemon: 
> [HCA=mthca0] [port=2] [exit status=0]
> Dec 12 10:17:28 storage02 run_srp_daemon[3851]: starting srp_daemon: 
> [HCA=mthca0] [port=2]
> Dec 12 10:17:29 storage02 ccsd[3769]: Remote copy of cluster.conf is 
> from quorate node.
> Dec 12 10:17:29 storage02 ccsd[3769]:  Local version # : 21
> Dec 12 10:17:29 storage02 ccsd[3769]:  Remote version #: 21
> Dec 12 10:17:29 storage02 kernel: CMAN: Waiting to join or form a 
> Linux-cluster
> Dec 12 10:17:29 storage02 kernel: CMAN: sending membership request
> Dec 12 10:17:29 storage02 ccsd[3769]: Connected to cluster infrastruture 
> via: CMAN/SM Plugin v1.1.7.1
> Dec 12 10:17:29 storage02 ccsd[3769]: Initial status:: Inquorate
> Dec 12 10:17:30 storage02 kernel: CMAN: got node storage01
> Dec 12 10:17:30 storage02 kernel: CMAN: got node storage03
> Dec 12 10:17:30 storage02 kernel: CMAN: quorum regained, resuming activity
> Dec 12 10:17:30 storage02 ccsd[3769]: Cluster is quorate.  Allowing 
> connections.
> Dec 12 10:17:30 storage02 cman: startup succeeded
> Dec 12 10:17:30 storage02 lock_gulmd: no <gulm> section detected in 
> /etc/cluster/cluster.conf succeeded
> Dec 12 10:17:31 storage02 fenced: startup succeeded
> Dec 12 10:17:31 storage02 run_srp_daemon[4196]: failed srp_daemon: 
> [HCA=mthca0] [port=2] [exit status=0]
> Dec 12 10:17:33 storage02 run_srp_daemon[4224]: starting srp_daemon: 
> [HCA=mthca0] [port=2]
> Dec 12 10:17:36 storage02 run_srp_daemon[4236]: failed srp_daemon: 
> [HCA=mthca0] [port=2] [exit status=0]
> Dec 12 10:17:40 storage02 run_srp_daemon[4242]: starting srp_daemon: 
> [HCA=mthca0] [port=2]
> Dec 12 10:17:42 storage02 clvmd: Cluster LVM daemon started - connected 
> to CMAN
> Dec 12 10:17:42 storage02 kernel: CMAN: WARNING no listener for port 11 
> on node storage01
> Dec 12 10:17:42 storage02 kernel: CMAN: WARNING no listener for port 11 
> on node storage03
> Dec 12 10:17:42 storage02 clvmd: clvmd startup succeeded
> Dec 12 10:17:42 storage02 vgchange:   Couldn't find device with uuid 
> 'U8viRP-K6Ev-0HlZ-5pwK-09co-tXgh-sJJKXT'.
> Dec 12 10:17:42 storage02 vgchange:   Couldn't find all physical volumes 
> for volume group gfsvg.
> Dec 12 10:17:42 storage02 vgchange:
> Dec 12 10:17:42 storage02 vgchange: Couldn't find device with uuid 
> 'U8viRP-K6Ev-0HlZ-5pwK-09co-tXgh-sJJKXT'.
> Dec 12 10:17:42 storage02 vgchange:   Couldn't find all physical volumes 
> for volume group gfsvg.
> Dec 12 10:17:42 storage02 vgchange:   Couldn't find device with uuid 
> 'U8viRP-K6Ev-0HlZ-5pwK-09co-tXgh-sJJKXT'.
> Dec 12 10:17:42 storage02 vgchange:   Couldn't find all physical volumes 
> for volume group gfsvg.
> Dec 12 10:17:42 storage02 vgchange:   Couldn't find device with uuid 
> 'U8viRP-K6Ev-0HlZ-5pwK-09co-tXgh-sJJKXT'.
> Dec 12 10:17:42 storage02 vgchange:   Couldn't find all physical volumes 
> for volume group gfsvg.
> Dec 12 10:17:42 storage02 vgchange:   Volume group "gfsvg" not found
> Dec 12 10:17:42 storage02 clvmd: Activating VGs: failed
> Dec 12 10:17:42 storage02 netfs: Mounting other filesystems:  succeeded
> Dec 12 10:17:42 storage02 kernel: Lock_Harness 2.6.9-58.2.centos4 (built 
> Aug 27 2006 05:27:43) installed
> Dec 12 10:17:42 storage02 kernel: GFS 2.6.9-58.2.centos4 (built Aug 27 
> 2006 05:28:00) installed
> Dec 12 10:17:42 storage02 mount: mount: special device /dev/gfsvg/gfslv 
> does not exist
> Dec 12 10:17:42 storage02 gfs: Mounting GFS filesystems:  failed
> Dec 12 10:17:42 storage02 kernel: i2c /dev entries driver
> .....
>  
>  
>  
>  
>  
>  
> 2006/12/12, Vu Pham <vuhuong at mellanox.com>:
> 
>     PN,
>       Edit file /etc/infiniband/openib.conf and set
> 
>     SRPHA_ENABLE=yes
> 
>     this will start srp_daemon by default
> 
>     -vu
> 
>      > No one can help me? :(
>      >
>      > PN
>      >
>      >
>      > 2006/12/7, Lai Dragonfly <poknam at gmail.com>:
>      >
>      >     Hi all,
>      >
>      >     I'm using CentOS 4.4 (kernel 2.6.9-42.ELsmp) with OFED-1.1
>      >     on the clients and IBGD-1.8.2-srpt on the targets.
>      >     I found that even if I run "modprobe ib_srp" or set
>      >     SRP_LOAD=yes in openib.conf, I cannot find the SRP targets.
>      >     Only after I execute "srp_daemon -e -o" do all the targets
>      >     appear as /dev/sdX.
>      >
>      >     Since I want to export the targets to other nodes, is there
>      >     any way to connect to the targets automatically on each
>      >     reboot, without typing "srp_daemon -e -o" each time?
>      >
>      >     Thanks in advance.
>      >
>      >     PN
>      >
>      >
>      >
>      >
>     ------------------------------------------------------------------------
>      >
>      > _______________________________________________
>      > openib-general mailing list
>      > openib-general at openib.org
>      > http://openib.org/mailman/listinfo/openib-general
>      >
>      > To unsubscribe, please visit
>     http://openib.org/mailman/listinfo/openib-general
> 
> 




