[ofa-general] Infiniband data transfer across different IB drivers
Dukle, Kapil (GE Healthcare)
Kapil.Dukle at med.ge.com
Fri Jun 15 09:21:16 PDT 2007
Hi,
I am currently experimenting with InfiniBand data transfers between two
servers that run different operating systems and IB drivers.
Server A runs VxWorks 5.5 and uses the Mellanox IB driver modules and
the VAPI interface.
Server B runs Linux 2.6.x and uses the OFED 1.0 drivers and the OFED
Verbs API.
Problem:
I have written code (making the respective Verbs calls) to set up
queue pairs and initialize them with the destination queue pair number
(QPN) and LID. The connection type is IBV_QPT_RC (Reliable Connection).
The traces seem to confirm that the destination QPN and LID values are
correct. Next, I post send requests on Server A and receive requests on
Server B, and then check the respective completion queues for events.
The problem is that I do NOT see any completion events on the receive
completion queue for Server B.
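
A minimal sketch of the cross-check described above, for comparing the
values each side traces for its RC queue pair. It does not call either
driver; the class, field, and function names are hypothetical, and the QPN,
PSN, and MTU numbers are made up for illustration (only the LIDs 2 and 1 are
taken from the network output further down). It assumes the usual RC
requirement that each side's destination QPN, destination LID, and expected
receive PSN mirror the peer's local QPN, LID, and send PSN, and that both
sides use the same path MTU.

```python
from dataclasses import dataclass


@dataclass
class RcEndpoint:
    """Values one side logs for its RC queue pair (hypothetical field names)."""
    qpn: int       # local queue pair number
    lid: int       # LID of the local port
    sq_psn: int    # first PSN this side will send
    dest_qpn: int  # remote QPN programmed into the local QP
    dlid: int      # remote LID programmed into the local QP
    rq_psn: int    # first PSN this side expects from the peer
    path_mtu: int  # path MTU in bytes


def rc_mismatches(a, b):
    """Return a list of human-readable mismatches between two endpoints."""
    problems = []
    for name, local, peer_name, peer in (("A", a, "B", b), ("B", b, "A", a)):
        if local.dest_qpn != peer.qpn:
            problems.append(
                f"{name}.dest_qpn={local.dest_qpn:#x} but {peer_name}.qpn={peer.qpn:#x}")
        if local.dlid != peer.lid:
            problems.append(
                f"{name}.dlid={local.dlid} but {peer_name}.lid={peer.lid}")
        if local.rq_psn != peer.sq_psn:
            problems.append(
                f"{name}.rq_psn={local.rq_psn} but {peer_name}.sq_psn={peer.sq_psn}")
    if a.path_mtu != b.path_mtu:
        problems.append(f"path_mtu differs: A={a.path_mtu}, B={b.path_mtu}")
    return problems


# Illustrative values only: the QPNs, PSNs, and MTU are invented.
server_a = RcEndpoint(qpn=0x10, lid=2, sq_psn=0,
                      dest_qpn=0x20, dlid=1, rq_psn=0, path_mtu=1024)
server_b = RcEndpoint(qpn=0x20, lid=1, sq_psn=0,
                      dest_qpn=0x10, dlid=2, rq_psn=0, path_mtu=1024)

print(rc_mismatches(server_a, server_b))  # an empty list means the values mirror each other
```

An empty result only shows that the traced parameters are consistent; it
does not prove that both queue pairs actually reached the ready-to-receive
and ready-to-send states.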
Questions:
- Are these two drivers (Mellanox VAPI and OFED) compatible with each
other in the first place?
- Is it possible to verify the two queue pairs are indeed "connected" to
each other?
- Can I enable some debug mechanism at the driver level to see what the
send/receive requests translate to, and what the underlying
errors could be (if any)?
Here is some information about the network that may help:
[root at ServerB ~]# ps -elf | grep opensm
4 S root 2695 1 0 32 - - 14738 stext Jun14 ? 00:00:00 /usr/local/ofed/bin/opensm -t 200 -g 0
0 S root 12030 11992 0 76 0 - 13981 pipe_w 11:18 pts/1 00:00:00 grep opensm
[root at ServerB ~]# sminfo
sminfo: sm lid 0x1 sm guid 0x2c90200212251, activity count 40926 priority 1 state SMINFO_MASTER 3
[root at ServerB ~]# ibnetdiscover -v
[1] {0002c90200212250}
DR path [0][1] -> new remote ca {00d01c000001010a} portnum 2 lid 0x2-0x2 "ServerA HCA-1 (Topspin HCA)"
[2] {00d01c000001010a}
#
# Topology file: generated on Fri Jun 15 11:05:52 2007
#
# Max of 1 hops discovered
# Initiated from node 0002c90200212250 port 0002c90200212251
vendid=0xd01c
devid=0x5a44
sysimgguid=0xd01c000001010a
caguid=0xd01c000001010a
Ca 2 "H-00d01c000001010a" # ServerA HCA-1 (Topspin HCA)
[2] "H-0002c90200212250"[1] # lid 2 lmc 0
vendid=0x2c9
devid=0x5a44
sysimgguid=0x2c90200212253
caguid=0x2c90200212250
Ca 2 "H-0002c90200212250" # ServerB HCA-1
[1] "H-00d01c000001010a"[2] # lid 1 lmc 0
[root at ServerB ~]# ibcheckstate -v
# Checking Ca: nodeguid 0x00d01c000001010a
Node check lid 2: OK
Port check lid 2 port 2: OK
# Checking Ca: nodeguid 0x0002c90200212250
Node check lid 1: OK
Port check lid 1 port 1: OK
## Summary: 2 nodes checked, 0 bad nodes found
## 2 ports checked, 0 ports with bad state found
[root at ServerB ~]# ibnodes -v
Ca : 0x00d01c000001010a ports 2 "ServerA HCA-1 (Topspin HCA)"
Ca : 0x0002c90200212250 ports 2 "ServerB HCA-1"
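
A rough sketch for pulling the LID and port number of each node out of the
ibcheckstate output above, so they can be compared with the LID and port
values used when setting up the queue pairs. The function name is
hypothetical, the parsing is based only on the lines shown in this message,
and it assumes the LID and port are printed in decimal.

```python
import re


def lids_by_guid(text):
    """Map each node GUID to a list of (lid, port) pairs found after it."""
    result = {}
    guid = None
    for line in text.splitlines():
        node = re.search(r"nodeguid (0x[0-9a-fA-F]+)", line)
        if node:
            guid = node.group(1)
            continue
        port = re.search(r"Port check lid (\d+) port (\d+)", line)
        if port and guid is not None:
            pair = (int(port.group(1)), int(port.group(2)))
            result.setdefault(guid, []).append(pair)
    return result


# Sample copied from the ibcheckstate -v output in this message.
sample = """\
# Checking Ca: nodeguid 0x00d01c000001010a
Node check lid 2: OK
Port check lid 2 port 2: OK
# Checking Ca: nodeguid 0x0002c90200212250
Node check lid 1: OK
Port check lid 1 port 1: OK
"""

for node_guid, ports in lids_by_guid(sample).items():
    print(node_guid, ports)
```

For this fabric the result pairs ServerA (GUID 0x00d01c000001010a) with LID
2 on port 2 and ServerB (GUID 0x0002c90200212250) with LID 1 on port 1.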
Please let me know if you need any other information.
Thanks in advance,
Kapil