[ofa-general] Infiniband data transfer across different IB drivers

Dukle, Kapil (GE Healthcare) Kapil.Dukle at med.ge.com
Fri Jun 15 09:21:16 PDT 2007


Hi,
I am currently experimenting with InfiniBand data transfers between two
servers that run different operating systems and IB drivers.
 
Server A runs VxWorks 5.5 and uses the Mellanox IB driver modules and
the VAPI interface.

Server B runs Linux 2.6.x and uses the OFED 1.0 drivers and the OFED
Verbs API.

Problem:
I have written code that makes the respective Verbs calls to set up a
queue pair on each server and initialize it with the destination queue
pair number (QPN) and LID. The connection type is IBV_QPT_RC (Reliable
Connection). The traces seem to confirm that the destination QPN and LID
values are correct. Next, I post send requests on Server A and receive
requests on Server B, and then check the respective completion queues
for any events. The problem is that I do NOT see any completion events
on the receive completion queue on Server B.
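
For an RC connection the cross-check goes a little beyond the destination
QPN and LID: each side's expected receive PSN has to equal the peer's
initial send PSN, and both sides should agree on the path MTU. Below is a
minimal Python sketch of that cross-check. The QpEndpoint type, its field
names, and the QPN, PSN, and MTU values in the example are hypothetical
placeholders for whatever the VAPI and OFED traces print; only the LIDs
(2 for Server A, 1 for Server B) are taken from the diagnostics in this
mail.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QpEndpoint:
    """One side's view of an RC connection (hypothetical field names)."""
    qp_num: int       # local queue pair number
    lid: int          # LID of the local port
    sq_psn: int       # initial send PSN (set on the RTS transition)
    dest_qp_num: int  # remote QPN (set on the RTR transition)
    dlid: int         # remote LID (set on the RTR transition)
    rq_psn: int       # expected receive PSN (set on the RTR transition)
    path_mtu: int     # path MTU in bytes


def find_mismatches(a: QpEndpoint, b: QpEndpoint) -> List[str]:
    """Describe every attribute on which the two endpoints disagree."""
    problems = []
    for name, local, peer in (("A", a, b), ("B", b, a)):
        if local.dest_qp_num != peer.qp_num:
            problems.append(
                f"{name}: dest_qp_num {local.dest_qp_num:#x} "
                f"!= peer qp_num {peer.qp_num:#x}")
        if local.dlid != peer.lid:
            problems.append(
                f"{name}: dlid {local.dlid:#x} != peer lid {peer.lid:#x}")
        if local.rq_psn != peer.sq_psn:
            problems.append(
                f"{name}: rq_psn {local.rq_psn} != peer sq_psn {peer.sq_psn}")
    if a.path_mtu != b.path_mtu:
        problems.append(f"path_mtu differs: {a.path_mtu} vs {b.path_mtu}")
    return problems


if __name__ == "__main__":
    # QPNs, PSNs, and MTU are made-up values; LIDs are from this mail.
    server_a = QpEndpoint(qp_num=0x406, lid=2, sq_psn=100,
                          dest_qp_num=0x11, dlid=1, rq_psn=200, path_mtu=1024)
    server_b = QpEndpoint(qp_num=0x11, lid=1, sq_psn=200,
                          dest_qp_num=0x406, dlid=2, rq_psn=100, path_mtu=1024)
    for line in find_mismatches(server_a, server_b) or ["no mismatches"]:
        print(line)
```

An empty result only says the two sides were programmed consistently; it
does not prove that packets are actually flowing.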

Questions:
- Are these two drivers (Mellanox VAPI and OFED) compatible with each
other in the first place?

- Is it possible to verify the two queue pairs are indeed "connected" to
each other?

- Can I enable some debug mechanism at the driver level to see what the
send/receive requests translate to, and what the underlying
errors could be (if any)?
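
On the OFED side, one way to approach the second question is to read the
QP state back (ibv_query_qp() with IBV_QP_STATE in the attribute mask)
and interpret it. The Python sketch below models only the interpretation
step. The numeric values follow the order of enum ibv_qp_state in
<infiniband/verbs.h> and should be checked against the installed header;
the helper names are hypothetical, and the VAPI equivalent on Server A is
not shown.

```python
from enum import IntEnum


class QpState(IntEnum):
    """Mirrors enum ibv_qp_state (verify against <infiniband/verbs.h>)."""
    RESET = 0
    INIT = 1
    RTR = 2   # ready to receive
    RTS = 3   # ready to send
    SQD = 4   # send queue drained
    SQE = 5   # send queue error
    ERR = 6


def can_receive(state: QpState) -> bool:
    """Incoming packets are processed in RTR, RTS, or SQD."""
    return state in (QpState.RTR, QpState.RTS, QpState.SQD)


def can_send(state: QpState) -> bool:
    """Posted send requests are only processed in RTS."""
    return state == QpState.RTS


def diagnose(sender_state: int, receiver_state: int) -> str:
    """Give a first hint based on the two queried QP states."""
    sender = QpState(sender_state)
    receiver = QpState(receiver_state)
    if not can_send(sender):
        return (f"sender QP is in {sender.name}; "
                "it has to reach RTS before sends go out")
    if not can_receive(receiver):
        return (f"receiver QP is in {receiver.name}; "
                "it has to reach at least RTR to accept packets")
    return "states look consistent; check QPN, LID, PSN, and MTU next"


if __name__ == "__main__":
    print(diagnose(QpState.RTS, QpState.INIT))
```

Matching states are necessary but not sufficient: two QPs can both be in
RTS and still point at the wrong peer.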


Here is some information about the network that may help:

[root@ServerB ~]# ps -elf | grep opensm
4 S root      2695     1  0  32   - - 14738 stext  Jun14 ?        00:00:00 /usr/local/ofed/bin/opensm -t 200 -g 0
0 S root     12030 11992  0  76   0 - 13981 pipe_w 11:18 pts/1    00:00:00 grep opensm

[root@ServerB ~]# sminfo
sminfo: sm lid 0x1 sm guid 0x2c90200212251, activity count 40926 priority 1 state SMINFO_MASTER 3


[root@ServerB ~]# ibnetdiscover -v
        [1] {0002c90200212250}
DR path [0][1] -> new remote ca {00d01c000001010a} portnum 2 lid 0x2-0x2 "ServerA HCA-1 (Topspin HCA)"
        [2] {00d01c000001010a}
#
# Topology file: generated on Fri Jun 15 11:05:52 2007
#
# Max of 1 hops discovered
# Initiated from node 0002c90200212250 port 0002c90200212251

vendid=0xd01c
devid=0x5a44
sysimgguid=0xd01c000001010a
caguid=0xd01c000001010a
Ca      2 "H-00d01c000001010a"          # ServerA HCA-1 (Topspin HCA)
[2]     "H-0002c90200212250"[1]         # lid 2 lmc 0

vendid=0x2c9
devid=0x5a44
sysimgguid=0x2c90200212253
caguid=0x2c90200212250
Ca      2 "H-0002c90200212250"          # ServerB HCA-1
[1]     "H-00d01c000001010a"[2]         # lid 1 lmc 0


[root@ServerB ~]# ibcheckstate -v

# Checking Ca: nodeguid 0x00d01c000001010a
Node check lid 2:  OK
Port check lid 2 port 2:  OK

# Checking Ca: nodeguid 0x0002c90200212250
Node check lid 1:  OK
Port check lid 1 port 1:  OK

## Summary: 2 nodes checked, 0 bad nodes found
##          2 ports checked, 0 ports with bad state found


[root@ServerB ~]# ibnodes -v
Ca      : 0x00d01c000001010a ports 2 "ServerA HCA-1 (Topspin HCA)"
Ca      : 0x0002c90200212250 ports 2 "ServerB HCA-1"
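
The LIDs programmed into the queue pairs can be compared against the
fabric's view by pulling them out of the ibnetdiscover topology above. The
following is a minimal Python sketch, assuming the layout shown in this
mail: a `Ca` line followed by port lines whose `# lid N` comment is read
as the LID of the local port (which agrees with the ibcheckstate output)
and is printed in decimal. The function name and regular expressions are
illustrative, not part of any OFED tool.

```python
import re
from typing import Dict, Tuple

# Topology excerpt copied from the ibnetdiscover output in this mail.
SAMPLE = '''
vendid=0xd01c
devid=0x5a44
Ca      2 "H-00d01c000001010a"          # ServerA HCA-1 (Topspin HCA)
[2]     "H-0002c90200212250"[1]         # lid 2 lmc 0

vendid=0x2c9
devid=0x5a44
Ca      2 "H-0002c90200212250"          # ServerB HCA-1
[1]     "H-00d01c000001010a"[2]         # lid 1 lmc 0
'''

CA_RE = re.compile(r'^Ca\s+\d+\s+"[^"]+"\s*#\s*(?P<desc>.*)$')
PORT_RE = re.compile(
    r'^\[(?P<port>\d+)\]\s+"[^"]+"\[\d+\]\s*#\s*lid\s+(?P<lid>\d+)\s+lmc\s+\d+')


def parse_ca_lids(text: str) -> Dict[Tuple[str, int], int]:
    """Map (CA description, local port number) to that port's LID."""
    lids: Dict[Tuple[str, int], int] = {}
    current = None  # description of the CA block being read
    for raw in text.splitlines():
        line = raw.strip()
        ca = CA_RE.match(line)
        if ca:
            current = ca.group("desc").strip()
            continue
        port = PORT_RE.match(line)
        if port and current is not None:
            lids[(current, int(port.group("port")))] = int(port.group("lid"))
    return lids


if __name__ == "__main__":
    for (desc, port), lid in sorted(parse_ca_lids(SAMPLE).items()):
        print(f"{desc} port {port}: lid {lid}")
```

The LID that Server A uses as its destination should be the one reported
for Server B's port, and the other way around.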




Please let me know if you need any other information. 


Thanks in advance,

Kapil
