[openib-general] Automatically connect to SRP target
Vu Pham
vuhuong at mellanox.com
Tue Dec 12 00:58:01 PST 2006
PN wrote:
> Hi Vu,
>
> i have 2 more questions,
> now i have 3 srp targets and use LVM to form a GFS system.
>
> after setting SRPHA_ENABLE=yes, i found that sometimes (~30%) it will
> miss a target during reboot.
> i need to manually type "srp_daemon -e -o" to discover the missing target.
> is there any method such that the srp_daemon will repeat to try to
> ensure all targets were found?
>
You probably didn't have a clean shutdown, so the srp target
still had the previous connection around (it has no mechanism
to clean up dead connections on its own), and it rejected the
next login request.
However, srp_daemon rescans the fabric every 60 seconds, so it
should eventually pick up the target missed in the previous scan.
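If you want boot to block until all targets are visible rather than wait for the next periodic scan, one workaround is to retry the one-shot scan in a loop. This is only a sketch: the "srp_daemon -e -o" invocation comes from this thread, while the count_targets helper, the sysfs path it greps, and the retry numbers are my assumptions:

```shell
#!/bin/sh
# Retry a one-shot SRP fabric scan until all expected targets have
# logged in, or give up after a fixed number of tries.

count_targets() {
    # Count ib_srp-backed SCSI hosts that completed login
    # (the sysfs path is an assumption)
    grep -l ib_srp /sys/class/scsi_host/host*/proc_name 2>/dev/null | wc -l
}

# retry_scan <scan command> <expected target count> <max tries>
retry_scan() {
    cmd=$1; expected=$2; max=$3; try=0
    while [ "$try" -lt "$max" ]; do
        [ "$(count_targets)" -ge "$expected" ] && return 0
        $cmd                # e.g. "srp_daemon -e -o": scan once, then exit
        try=$((try + 1))
        sleep 1             # in practice a longer pause is sensible
    done
    return 1
}
```

Called from an init script this would look like `retry_scan "srp_daemon -e -o" 3 10` for the three targets described above.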
> also, currently there is only 1 cable connect to each dual ports client.
> is it normal to have the following messages?
> Dec 12 10:18:10 storage02 run_srp_daemon[5471]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Dec 12 10:18:13 storage02 run_srp_daemon[5483]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Dec 12 10:18:43 storage02 run_srp_daemon[5489]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Dec 12 10:18:46 storage02 run_srp_daemon[5501]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> .....[repeat infinitely]
This is fine. The srp_daemon for port 2 keeps running and will
detect any target on the fabric once you plug a cable in; until
then there is no ill effect beyond these annoying error messages.
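If the messages bother you, a wrapper that checks link state before starting the daemon would avoid them. A rough sketch, assuming ibstat prints a "State: Active" line for a port that is up, and that srp_daemon accepts -i/-p to select HCA and port as in OFED 1.1:

```shell
#!/bin/sh
# Start srp_daemon only on ports whose IB link is up, so an
# uncabled port does not log "failed srp_daemon" forever.

port_state() {
    # Parse the "State:" line from ibstat (output format assumed)
    ibstat "$1" "$2" 2>/dev/null | awk -F': ' '/State:/ {print $2; exit}'
}

maybe_start() {
    if [ "$(port_state "$1" "$2")" = "Active" ]; then
        echo "starting srp_daemon on $1 port $2"
        srp_daemon -e -c -n -i "$1" -p "$2" &
    else
        echo "skipping $1 port $2 (link down)"
    fi
}

for p in 1 2; do
    maybe_start mthca0 "$p"
done
```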
-vu
>
>
> Thanks a lot,
> PN
>
>
> Below is the log:
>
> Dec 12 10:17:18 storage02 network: Setting network parameters: succeeded
> Dec 12 10:17:18 storage02 network: Bringing up loopback interface:
> succeeded
> Dec 12 10:17:23 storage02 network: Bringing up interface eth0: succeeded
> Dec 12 10:17:23 storage02 network: Bringing up interface ib0: succeeded
> Dec 12 10:17:26 storage02 kernel: REJ reason 0xa
> Dec 12 10:17:26 storage02 kernel: ib_srp: Connection failed
> Dec 12 10:17:26 storage02 kernel: scsi3 : SRP.T10:00D0680000000578
> Dec 12 10:17:26 storage02 kernel: Vendor: Mellanox Model:
> IBSRP10-TGT Rev: 1.46
> Dec 12 10:17:26 storage02 kernel: Type:
> Direct-Access ANSI SCSI revision: 03
> Dec 12 10:17:26 storage02 kernel: SCSI device sdb: 160086528 512-byte
> hdwr sectors (81964 MB)
> Dec 12 10:17:26 storage02 kernel: SCSI device sdb: drive cache: write back
> Dec 12 10:17:26 storage02 kernel: SCSI device sdb: 160086528 512-byte
> hdwr sectors (81964 MB)
> Dec 12 10:17:26 storage02 kernel: SCSI device sdb: drive cache: write back
> Dec 12 10:17:26 storage02 rpcidmapd: rpc.idmapd startup succeeded
> Dec 12 10:17:26 storage02 kernel: sdb: sdb1 sdb2 sdb3 sdb4 < sdb5 sdb6
> sdb7 >
> Dec 12 10:17:26 storage02 kernel: Attached scsi disk sdb at scsi3,
> channel 0, id 0, lun 0
> Dec 12 10:17:26 storage02 kernel: scsi4 : SRP.T10:00D06800000007B2
> Dec 12 10:17:26 storage02 kernel: Vendor: Mellanox Model: IBSRP10-TGT
> hy-b Rev: 1.46
> Dec 12 10:17:26 storage02 kernel: Type:
> Direct-Access ANSI SCSI revision: 03
> Dec 12 10:17:26 storage02 kernel: SCSI device sdc: 160086528 512-byte
> hdwr sectors (81964 MB)
> Dec 12 10:17:26 storage02 kernel: SCSI device sdc: drive cache: write back
> Dec 12 10:17:26 storage02 kernel: SCSI device sdc: 160086528 512-byte
> hdwr sectors (81964 MB)
> Dec 12 10:17:26 storage02 kernel: SCSI device sdc: drive cache: write back
> Dec 12 10:17:26 storage02 kernel: sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 >
> Dec 12 10:17:26 storage02 kernel: Attached scsi disk sdc at scsi4,
> channel 0, id 0, lun 0
> Dec 12 10:17:26 storage02 scsi.agent[3668]: disk at
> /devices/pci0000:00/0000:00:02.0/0000:01:00.0/host3/target3:0:0/3:0:0:0
> Dec 12 10:17:26 storage02 scsi.agent[3705]: disk at
> /devices/pci0000:00/0000:00:02.0/0000:01:00.0/host4/target4:0:0/4:0:0:0
> Dec 12 10:17:26 storage02 ccsd[3769]: Starting ccsd 1.0.7:
> Dec 12 10:17:26 storage02 ccsd[3769]: Built: Aug 26 2006 15:01:49
> Dec 12 10:17:26 storage02 ccsd[3769]: Copyright (C) Red Hat, Inc.
> 2004 All rights reserved.
> Dec 12 10:17:26 storage02 kernel: NET: Registered protocol family 10
> Dec 12 10:17:26 storage02 kernel: Disabled Privacy Extensions on device
> ffffffff80405540(lo)
> Dec 12 10:17:26 storage02 kernel: IPv6 over IPv4 tunneling driver
> Dec 12 10:17:26 storage02 ccsd: succeeded
> Dec 12 10:17:26 storage02 kernel: CMAN 2.6.9-45.4.centos4 (built Aug 26
> 2006 14:55:55) installed
> Dec 12 10:17:26 storage02 kernel: NET: Registered protocol family 30
> Dec 12 10:17:26 storage02 kernel: DLM 2.6.9-42.12.centos4 (built Aug 27
> 2006 05:25:40) installed
> Dec 12 10:17:27 storage02 ccsd[3769]: cluster.conf (cluster name =
> GFS_Cluster, version = 21) found.
> Dec 12 10:17:27 storage02 ccsd[3769]: Unable to perform sendto: Cannot
> assign requested address
> Dec 12 10:17:27 storage02 run_srp_daemon[3845]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Dec 12 10:17:28 storage02 run_srp_daemon[3851]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Dec 12 10:17:29 storage02 ccsd[3769]: Remote copy of cluster.conf is
> from quorate node.
> Dec 12 10:17:29 storage02 ccsd[3769]: Local version # : 21
> Dec 12 10:17:29 storage02 ccsd[3769]: Remote version #: 21
> Dec 12 10:17:29 storage02 kernel: CMAN: Waiting to join or form a
> Linux-cluster
> Dec 12 10:17:29 storage02 kernel: CMAN: sending membership request
> Dec 12 10:17:29 storage02 ccsd[3769]: Connected to cluster infrastruture
> via: CMAN/SM Plugin v1.1.7.1
> Dec 12 10:17:29 storage02 ccsd[3769]: Initial status:: Inquorate
> Dec 12 10:17:30 storage02 kernel: CMAN: got node storage01
> Dec 12 10:17:30 storage02 kernel: CMAN: got node storage03
> Dec 12 10:17:30 storage02 kernel: CMAN: quorum regained, resuming activity
> Dec 12 10:17:30 storage02 ccsd[3769]: Cluster is quorate. Allowing
> connections.
> Dec 12 10:17:30 storage02 cman: startup succeeded
> Dec 12 10:17:30 storage02 lock_gulmd: no <gulm> section detected in
> /etc/cluster/cluster.conf succeeded
> Dec 12 10:17:31 storage02 fenced: startup succeeded
> Dec 12 10:17:31 storage02 run_srp_daemon[4196]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Dec 12 10:17:33 storage02 run_srp_daemon[4224]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Dec 12 10:17:36 storage02 run_srp_daemon[4236]: failed srp_daemon:
> [HCA=mthca0] [port=2] [exit status=0]
> Dec 12 10:17:40 storage02 run_srp_daemon[4242]: starting srp_daemon:
> [HCA=mthca0] [port=2]
> Dec 12 10:17:42 storage02 clvmd: Cluster LVM daemon started - connected
> to CMAN
> Dec 12 10:17:42 storage02 kernel: CMAN: WARNING no listener for port 11
> on node storage01
> Dec 12 10:17:42 storage02 kernel: CMAN: WARNING no listener for port 11
> on node storage03
> Dec 12 10:17:42 storage02 clvmd: clvmd startup succeeded
> Dec 12 10:17:42 storage02 vgchange: Couldn't find device with uuid
> 'U8viRP-K6Ev-0HlZ-5pwK-09co-tXgh-sJJKXT'.
> Dec 12 10:17:42 storage02 vgchange: Couldn't find all physical volumes
> for volume group gfsvg.
> Dec 12 10:17:42 storage02 vgchange:
> Dec 12 10:17:42 storage02 vgchange: Couldn't find device with uuid
> 'U8viRP-K6Ev-0HlZ-5pwK-09co-tXgh-sJJKXT'.
> Dec 12 10:17:42 storage02 vgchange: Couldn't find all physical volumes
> for volume group gfsvg.
> Dec 12 10:17:42 storage02 vgchange: Couldn't find device with uuid
> 'U8viRP-K6Ev-0HlZ-5pwK-09co-tXgh-sJJKXT'.
> Dec 12 10:17:42 storage02 vgchange: Couldn't find all physical volumes
> for volume group gfsvg.
> Dec 12 10:17:42 storage02 vgchange: Couldn't find device with uuid
> 'U8viRP-K6Ev-0HlZ-5pwK-09co-tXgh-sJJKXT'.
> Dec 12 10:17:42 storage02 vgchange: Couldn't find all physical volumes
> for volume group gfsvg.
> Dec 12 10:17:42 storage02 vgchange: Volume group "gfsvg" not found
> Dec 12 10:17:42 storage02 clvmd: Activating VGs: failed
> Dec 12 10:17:42 storage02 netfs: Mounting other filesystems: succeeded
> Dec 12 10:17:42 storage02 kernel: Lock_Harness 2.6.9-58.2.centos4 (built
> Aug 27 2006 05:27:43) installed
> Dec 12 10:17:42 storage02 kernel: GFS 2.6.9-58.2.centos4 (built Aug 27
> 2006 05:28:00) installed
> Dec 12 10:17:42 storage02 mount: mount: special device /dev/gfsvg/gfslv
> does not exist
> Dec 12 10:17:42 storage02 gfs: Mounting GFS filesystems: failed
> Dec 12 10:17:42 storage02 kernel: i2c /dev entries driver
> .....
>
>
>
>
>
>
> 2006/12/12, Vu Pham <vuhuong at mellanox.com <mailto:vuhuong at mellanox.com>>:
>
> PN,
> Edit file /etc/infiniband/openib.conf and set
>
> SRPHA_ENABLE=yes
>
> this will start srp_daemon by default
>
> -vu
>
> > No one can help me? :(
> >
> > PN
> >
> >
> > 2006/12/7, Lai Dragonfly <poknam at gmail.com
> <mailto:poknam at gmail.com> <mailto:poknam at gmail.com
> <mailto:poknam at gmail.com>>>:
> >
> > Hi all,
> >
> > i'm using CentOS 4.4 (kernel 2.6.9-42.ELsmp) with OFED-1.1 in
> > clients and
> > IBGD-1.8.2-srpt in targets.
> > i found that even i use "modprobe ib_srp" or set SRP_LOAD=yes in
> > openib.conf,
> > i could not found the SRP target.
> > until i execute "srp_daemon -e -o", i can see all the targets
> appear
> > in /dev/sdX.
> >
> > since i want to export the targets to other nodes,
> > any idea so that i can connect to the targets automatically
> in each
> > reboot.
> > without typing "srp_daemon -e -o" each time?
> >
> > thanks in advance.
> >
> > PN
> >
> >
> >
> >
> ------------------------------------------------------------------------
> >
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org <mailto:openib-general at openib.org>
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
>
>