[ewg] /dev/infiniband/rdma_cm not created
Jeff Squyres
jsquyres at cisco.com
Wed May 13 11:54:55 PDT 2009
On May 13, 2009, at 2:39 PM, Woodruff, Robert J wrote:
> Is the driver loaded? I.e., run /sbin/lsmod to see.
>
Ah ha -- no, it is not:
[11:51] svbu-mpi005:/etc/udev/rules.d % /sbin/lsmod | grep rdma
[11:51] svbu-mpi005:/etc/udev/rules.d %
What would cause it to not be loaded? I *assumed* (but didn't check)
that it is loaded as part of OFED's /etc/init.d/openibd. Is that
correct?
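
For anyone retracing this check, here is a minimal shell sketch that reports whether the rdma_cm and rdma_ucm modules are loaded and whether the device node exists. It rests on assumptions not confirmed in this thread: that the node is provided by the rdma_ucm module, and that openibd reads an RDMA_UCM_LOAD setting from /etc/infiniband/openib.conf. The function names are hypothetical, chosen for illustration only.

```shell
#!/bin/sh
# Sketch only: the default paths and the RDMA_UCM_LOAD key are assumptions,
# not something confirmed in this thread.

# Succeeds if module $1 appears in the first column of the module list
# read from stdin (/proc/modules or lsmod output both work).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Prints the value of RDMA_UCM_LOAD found in config file $1, or "unset"
# if the key (or the file) is missing.
ucm_load_setting() {
    val=$(sed -n 's/^RDMA_UCM_LOAD=//p' "$1" 2>/dev/null | tail -n 1)
    echo "${val:-unset}"
}

# Arguments (all optional): module list file, device node, config file.
check_rdma_cm() {
    modules_file=${1:-/proc/modules}
    dev_node=${2:-/dev/infiniband/rdma_cm}
    conf_file=${3:-/etc/infiniband/openib.conf}

    for mod in rdma_cm rdma_ucm; do
        if module_loaded "$mod" < "$modules_file"; then
            echo "$mod: loaded"
        else
            echo "$mod: NOT loaded"
        fi
    done

    # -c tests for a character device, which is what the node should be.
    if [ -c "$dev_node" ]; then
        echo "$dev_node: present"
    else
        echo "$dev_node: missing"
    fi

    echo "RDMA_UCM_LOAD: $(ucm_load_setting "$conf_file")"
}
```

Calling check_rdma_cm with no arguments uses the default paths above. If the modules show as not loaded, running "modprobe rdma_ucm" as root and listing /dev/infiniband/ again would be one way to test the assumption about which module provides the node.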
> Also are there any messages that would indicate a
> problem when you do a dmesg.
>
As I indicated in my first mail :-), no.
>
> -----Original Message-----
> From: ewg-bounces at lists.openfabrics.org [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Jeff Squyres
> Sent: Wednesday, May 13, 2009 11:34 AM
> To: OpenFabrics General; OpenFabrics EWG
> Subject: [ewg] /dev/infiniband/rdma_cm not created
>
> I'm running on rhel4u6 with the 1.4.1 nightly from last night and
> sometimes /dev/infiniband/rdma_cm is not created. I can see its entry
> in /etc/udev/rules.d/90-ib.rules:
>
> KERNEL="umad*", NAME="infiniband/%k"
> KERNEL="issm*", NAME="infiniband/%k"
> KERNEL="ucm*", NAME="infiniband/%k", MODE="0666"
> KERNEL="uverbs*", NAME="infiniband/%k", MODE="0666"
> KERNEL="ucma", NAME="infiniband/%k", MODE="0666"
> KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666"
>
> But only some of these are created:
>
> [11:29] svbu-mpi005:/etc/udev/rules.d % l /dev/infiniband/
> total 0
> drwxr-xr-x 2 root root 120 May 13 02:39 ./
> drwxr-xr-x 10 root root 5740 May 13 09:39 ../
> crw------- 1 root root 231, 64 May 13 02:39 issm0
> crw------- 1 root root 231, 0 May 13 02:39 umad0
> crw-rw-rw- 1 root root 231, 192 May 13 02:39 uverbs0
> crw-rw-rw- 1 root root 231, 193 May 13 02:39 uverbs1
> [11:29] svbu-mpi005:/etc/udev/rules.d %
>
> I have both an IB HCA and an iWARP RNIC in this server:
>
> hca_id: mthca0
>     fw_ver:             1.2.917
>     node_guid:          0005:ad00:0008:bd60
>     sys_image_guid:     0005:ad00:0100:d050
>     vendor_id:          0x05ad
>     vendor_part_id:     25204
>     hw_ver:             0xA0
>     board_id:           MT_03B0120002
>     phys_port_cnt:      1
>         port: 1
>             state:      PORT_ACTIVE (4)
>             max_mtu:    2048 (4)
>             active_mtu: 2048 (4)
>             sm_lid:     2
>             port_lid:   34
>             port_lmc:   0x00
>
> hca_id: nes0
>     node_guid:          0012:5502:b58c:0000
>     sys_image_guid:     0012:5502:b58c:0000
>     vendor_id:          0x1255
>     vendor_part_id:     256
>     hw_ver:             0x5
>     board_id:           NES020 Board ID
>     phys_port_cnt:      1
>         port: 1
>             state:      PORT_ACTIVE (4)
>             max_mtu:    2048 (4)
>             active_mtu: 2048 (4)
>             sm_lid:     0
>             port_lid:   1
>             port_lmc:   0x00
>
> I don't see any obvious errors occurring in syslog or dmesg.
>
> What could cause this failure?
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
--
Jeff Squyres
Cisco Systems