[ewg] /dev/infiniband/rdma_cm not created

Jeff Squyres jsquyres at cisco.com
Wed May 13 11:54:55 PDT 2009


On May 13, 2009, at 2:39 PM, Woodruff, Robert J wrote:

> Is the driver loaded? I.e., do an /sbin/lsmod to see.
>

Ah ha -- no, it is not:

[11:51] svbu-mpi005:/etc/udev/rules.d % /sbin/lsmod | grep rdma
[11:51] svbu-mpi005:/etc/udev/rules.d %

What would cause it to not be loaded?  I *assumed* (but didn't check)  
that it is loaded as part of OFED's /etc/init.d/openibd.  Is that  
correct?
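
For anyone following along, here is a rough sketch of a more precise check
than a bare grep.  As far as I know, /dev/infiniband/rdma_cm is the device
node registered by the rdma_ucm module (which in turn depends on rdma_cm),
so those are the two names to look for.  The function names and the
MODULES_FILE override are made up for illustration; they are not part of
OFED.

```shell
#!/bin/sh
# Sketch only: report whether the RDMA CM modules appear in the kernel's
# module list. MODULES_FILE is a hypothetical override so the check can be
# pointed at a saved listing; it defaults to /proc/modules.

module_loaded() {
    # $1 = module name. The first column of /proc/modules is the name.
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${MODULES_FILE:-/proc/modules}"
}

check_rdma_modules() {
    # Assumption: rdma_ucm is the module that provides the rdma_cm node.
    for mod in rdma_cm rdma_ucm; do
        if module_loaded "$mod"; then
            echo "$mod: loaded"
        else
            echo "$mod: NOT loaded"
        fi
    done
}
```

Given the empty lsmod output above, I'd expect check_rdma_modules to report
both as NOT loaded on this node.  If so, "modprobe rdma_ucm" as root would
be the obvious next thing to try, to see whether the device node shows up.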

> Also are there any messages that would indicate a
> problem when you do a dmesg.
>

As I indicated in my first mail :-), no.

>
>
>
> -----Original Message-----
> From: ewg-bounces at lists.openfabrics.org [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Jeff Squyres
> Sent: Wednesday, May 13, 2009 11:34 AM
> To: OpenFabrics General; OpenFabrics EWG
> Subject: [ewg] /dev/infiniband/rdma_cm not created
>
> I'm running on rhel4u6 with the 1.4.1 nightly from last night and
> sometimes /dev/infiniband/rdma_cm is not created.  I can see its entry
> in /etc/udev/rules.d/90-ib.rules:
>
> KERNEL="umad*", NAME="infiniband/%k"
> KERNEL="issm*", NAME="infiniband/%k"
> KERNEL="ucm*", NAME="infiniband/%k", MODE="0666"
> KERNEL="uverbs*", NAME="infiniband/%k", MODE="0666"
> KERNEL="ucma", NAME="infiniband/%k", MODE="0666"
> KERNEL="rdma_cm", NAME="infiniband/%k", MODE="0666"
>
> But only some of these are created:
>
> [11:29] svbu-mpi005:/etc/udev/rules.d % l /dev/infiniband/
> total 0
> drwxr-xr-x   2 root root      120 May 13 02:39 ./
> drwxr-xr-x  10 root root     5740 May 13 09:39 ../
> crw-------   1 root root 231,  64 May 13 02:39 issm0
> crw-------   1 root root 231,   0 May 13 02:39 umad0
> crw-rw-rw-   1 root root 231, 192 May 13 02:39 uverbs0
> crw-rw-rw-   1 root root 231, 193 May 13 02:39 uverbs1
> [11:29] svbu-mpi005:/etc/udev/rules.d %
>
> I have both an IB HCA and an iWARP RNIC in this server:
>
> hca_id: mthca0
>         fw_ver:                         1.2.917
>         node_guid:                      0005:ad00:0008:bd60
>         sys_image_guid:                 0005:ad00:0100:d050
>         vendor_id:                      0x05ad
>         vendor_part_id:                 25204
>         hw_ver:                         0xA0
>         board_id:                       MT_03B0120002
>         phys_port_cnt:                  1
>                 port:   1
>                         state:                  PORT_ACTIVE (4)
>                         max_mtu:                2048 (4)
>                         active_mtu:             2048 (4)
>                         sm_lid:                 2
>                         port_lid:               34
>                         port_lmc:               0x00
>
> hca_id: nes0
>         node_guid:                      0012:5502:b58c:0000
>         sys_image_guid:                 0012:5502:b58c:0000
>         vendor_id:                      0x1255
>         vendor_part_id:                 256
>         hw_ver:                         0x5
>         board_id:                       NES020 Board ID
>         phys_port_cnt:                  1
>                 port:   1
>                         state:                  PORT_ACTIVE (4)
>                         max_mtu:                2048 (4)
>                         active_mtu:             2048 (4)
>                         sm_lid:                 0
>                         port_lid:               1
>                         port_lmc:               0x00
>
> I don't see any obvious errors occurring in syslog or dmesg.
>
> What could cause this failure?
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

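To compare the rules quoted above against what actually exists, something
like the following sketch could help.  It takes the KERNEL patterns from
my 90-ib.rules and prints each one that has no matching node.  The function
name and the DEV_DIR override are hypothetical, just for illustration.

```shell
#!/bin/sh
# Sketch only: list the udev KERNEL patterns (copied from the 90-ib.rules
# excerpt in this thread) that have no matching device node.
# DEV_DIR is a hypothetical override; it defaults to /dev/infiniband.

missing_ib_nodes() {
    dir="${DEV_DIR:-/dev/infiniband}"
    for pat in 'umad*' 'issm*' 'ucm*' 'uverbs*' 'ucma' 'rdma_cm'; do
        found=no
        # $pat is left unquoted on purpose so the shell expands the glob.
        for node in "$dir"/$pat; do
            # An unmatched glob stays literal, so test for existence.
            if [ -e "$node" ]; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || echo "$pat"
    done
}
```

Against the directory listing quoted above, this should print ucm*, ucma,
and rdma_cm, i.e. every node that belongs to a CM module rather than to
umad or uverbs.
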

-- 
Jeff Squyres
Cisco Systems



