[openib-general] ib_send_cm_req fails with error -22

susan ssbyrn at yahoo.com
Wed Apr 19 11:27:21 PDT 2006


hello,

i'm writing a sample kernel ulp driver to get acquainted
with the openib stack on linux kernel 2.6.16.2 (fedora 5), using
the openib gen 2 stack checked out from the openib.org website.

the setup is two nodes, primary and secondary, with a
point-to-point connection. the secondary node starts in
listen mode and waits until the primary node makes a connection;
then the two start exchanging messages.

the problem i am running into is that the ib_send_cm_req()
api fails with errno 22 (EINVAL). i'm using the lids (local ids)
to make the connection on port 1. ib_send_cm_req() calls
cm_init_av_by_path(), which calls ib_find_cached_gid().
ib_find_cached_gid() fails because it can't locate the
gid in the device's cache table. below is the full control
flow from both the primary and secondary nodes.

  primary node                         secondary node
 --------------                       -----------------
						ib_register_client()
						using active port = 1
						ib_create_cm_id()
						ib_cm_listen()
						listening .... (waiting)

ib_register_client()
using active port = 1
ib_create_cm_id()
ib_alloc_pd()
ib_create_cq()
ib_req_notify_cq()
ib_get_dma_mr()
ib_create_qp()
ib_query_gid()
source.lid = 0x1
dest.lid = 0x2
ib_sa_path_rec_get()
sa_path_rec handler returned success
ib_send_cm_req()
ib_send_cm_req() failed with error -22
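
for reference, the -22 above is a negated errno value, which is how kernel apis usually report errors; 22 is EINVAL ("invalid argument") on linux. the small user-space sketch below shows the decoding. the helper names errno_from_kernel_ret() and errno_name() are made up for this illustration and are not part of the openib stack.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* kernel-style return codes are 0 on success or -errno on failure;
 * flip the sign to recover the errno (hypothetical helper). */
static int errno_from_kernel_ret(int ret)
{
    return ret < 0 ? -ret : 0;
}

/* symbolic name for a few common codes; "?" for anything else
 * (hypothetical helper, deliberately not exhaustive). */
static const char *errno_name(int err)
{
    assert(err >= 0);
    switch (err) {
    case 0:      return "OK";
    case EINVAL: return "EINVAL";
    case ENOMEM: return "ENOMEM";
    case EAGAIN: return "EAGAIN";
    default:     return "?";
    }
}
```

calling errno_name(errno_from_kernel_ret(-22)) gives "EINVAL", and strerror(22) gives the human-readable text for the same code.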


how would i update the device's cache so that the gid can be
found? am i missing any steps?
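
to make the failing step concrete, here is a rough user-space model of a cached-gid lookup. it is only a sketch under assumptions, not the kernel code: struct gid, make_gid() and find_cached_gid() are hypothetical stand-ins, and i'm assuming that the lookup is a byte-wise compare against the local port's gid table and that the gid being looked up is the source gid carried in the path record.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* a 128-bit gid as raw bytes (hypothetical stand-in, not the kernel type) */
struct gid {
    uint8_t raw[16];
};

/* build a gid from a 64-bit subnet prefix and a 64-bit port guid,
 * both stored most-significant byte first */
static struct gid make_gid(uint64_t prefix, uint64_t guid)
{
    struct gid g;
    for (int i = 0; i < 8; i++) {
        g.raw[i]     = (uint8_t)(prefix >> (56 - 8 * i));
        g.raw[8 + i] = (uint8_t)(guid   >> (56 - 8 * i));
    }
    return g;
}

/* model of the lookup: scan the local gid table and return the index
 * of the matching entry, or -EINVAL (-22) if no entry matches */
static int find_cached_gid(const struct gid *table, size_t n,
                           const struct gid *want)
{
    assert(table != NULL && want != NULL);
    for (size_t i = 0; i < n; i++) {
        if (memcmp(table[i].raw, want->raw, sizeof(want->raw)) == 0)
            return (int)i;
    }
    return -EINVAL;
}
```

in this model, a table holding the primary node's two port gids (the values from the ibstatus output below) matches the local port 1 gid at index 0, while an all-zero gid or the secondary node's gid returns -EINVAL. if the real lookup behaves the same way, a source gid in the path record that is unset or not local would produce exactly the -22 seen here.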

here is output from ib* commands:

from primary node:
root@copa:~:23> sminfo
sminfo: sm lid 0x2 sm guid 0x5ad0000030655, activity count 1707
priority 1 state SMINFO_MASTER 3

root@copa:~:26> ibhosts
Ca      : 0x0005ad0000030654 ports 2 "Topspin HCA"
Ca      : 0x0005ad0000030860 ports 2 "Topspin HCA"

root@copa:~:28> ibstatus
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0005:ad00:0003:0861
        base lid:        0x1
        sm lid:          0x2
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X)

Infiniband device 'mthca0' port 2 status:
        default gid:     fe80:0000:0000:0000:0005:ad00:0003:0862
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            2.5 Gb/sec (1X)

root@copa:~:29> ibnetdiscover
#
# Topology file: generated on Tue Apr 18 12:36:39 2006
#
# Max of 1 hops discovered
# Initiated from node 0005ad0000030860 port 0005ad0000030861

vendid=0x5ad
devid=0x5a44
sysimgguid=0x5ad000100d050
caguid=0x5ad0000030654
Ca      2 "H-0005ad0000030654"          # Topspin HCA
[1]     "H-0005ad0000030860"[1]         # lid 2 lmc 0

vendid=0x5ad
devid=0x5a44
sysimgguid=0x5ad000100d050
caguid=0x5ad0000030860
Ca      2 "H-0005ad0000030860"          # Topspin HCA
[1]     "H-0005ad0000030654"[1]         # lid 1 lmc

from secondary node:
root@bana:~:8> sminfo
sminfo: sm lid 0x2 sm guid 0x5ad0000030655, activity count 1738
priority 1 state SMINFO_MASTER 3

root@bana:~:15> ibhosts
Ca      : 0x0005ad0000030860 ports 2 "Topspin HCA"
Ca      : 0x0005ad0000030654 ports 2 "Topspin HCA"

root@bana:~:16> ibstatus
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0005:ad00:0003:0655
        base lid:        0x2
        sm lid:          0x2
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X)

Infiniband device 'mthca0' port 2 status:
        default gid:     fe80:0000:0000:0000:0005:ad00:0003:0656
        base lid:        0x0
        sm lid:          0x0
        state:           1: DOWN
        phys state:      2: Polling
        rate:            2.5 Gb/sec (1X)

root@bana:~:17> ibnetdiscover
#
# Topology file: generated on Tue Apr 18 12:36:15 2006
#
# Max of 1 hops discovered
# Initiated from node 0005ad0000030654 port 0005ad0000030655

vendid=0x5ad
devid=0x5a44
sysimgguid=0x5ad000100d050
caguid=0x5ad0000030860
Ca      2 "H-0005ad0000030860"          # Topspin HCA
[1]     "H-0005ad0000030654"[1]         # lid 1 lmc 0

vendid=0x5ad
devid=0x5a44
sysimgguid=0x5ad000100d050
caguid=0x5ad0000030654
Ca      2 "H-0005ad0000030654"          # Topspin HCA
[1]     "H-0005ad0000030860"[1]         # lid 2 lmc 0


do you know what's wrong?
thanks,
susan




