[ofa-general] synchronize commands issued to MTHCA

Yicheng Jia YJia at tmriusa.com
Mon Dec 31 17:02:15 PST 2007


Hi Roland,

Thanks for your reply!

Actually I'm working on porting the IB driver to the QNX platform. I have
resumed the work started by my former colleague, and I just found that the
synchronization code (dev->cmd.poll_sem and dev->cmd.hcr_mutex) had been
deleted for an unknown reason. After adding this code back, the driver runs
much more smoothly.

However, I still get a command execution error which I believe is related to
command synchronization. The problem is that when the code that prints
"Created UDAV" runs while the SW2HW_MPT command is being executed, the
SW2HW_MPT command returns with a bad parameter error. Here is my debug
trace output:

139903841835 HCR CMD: op_code:          LE: d
139903861104 TRACE: mad.c:639/ib_mad_recv_done_handler
139903890876 HCR CMD: in_param_h:       LE: 0
139903942869 TRACE: mad.c:644/ib_mad_recv_done_handler
139903993296 HCR CMD: in_param_l:       LE: cf616000
139904038413 TRACE: verbs.c:182/ib_create_ah_from_wc
139904094753 HCR CMD: input_modifier:   LE: 1e
139904139150 TRACE: mthca_provider.c:447/mthca_ah_create
MTHCA DBG: <mthca_av.c:229> Created UDAV at 8075220/00000000:
139904197065 HCR CMD: out_pram_h:       LE: 0
139904333343   [ 0] 01000005
139904384499 HCR CMD: out_pram_l:       LE: 0
139904428086   [ 4] 0000ffff
139904478675 HCR CMD: token:            LE: ffff0000
139904520156   [ 8] 00003000
139904572059 HCR CMD: op_code_modifier: LE: 0
139904612802   [ c] 00000000
139904667693 HCR CMD: event:            LE: 0
139904708526   [10] 00000000
139904758422 HCR CMD 0x18h:             LE=80000d, BE=d008000
139904799210   [14] 00000000
139904904204   [18] 00000000
139904946792MTHCA DBG: <mthca_cmd.c:195> HCR_STATUS 40100698= d008000 ? 8000
   [1c] 00000002
139905076860 TRACE: mthca_av.c:235/mthca_create_ah
139905112329 TRACE: mthca_av.c:243/mthca_create_ah
139905147672 TRACE: mthca_provider.c:460/mthca_ah_create
....
139906793007 HCR CMD: Status Return:              : 3
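
To make the 0x18 word in the trace easier to read, here is a small decoder
sketch. The field positions (go bit 23, event bit 22, opcode modifier at
bit 12, status in the top byte) are assumptions taken from my reading of the
Linux mthca driver, and the names hcr_fields, decode_hcr and hcr_swab32 are
made up for this example, so please check them against your own mthca_cmd.c.

```c
#include <stdint.h>

/* Assumed field positions; verify against the driver source you are porting. */
#define HCR_STATUS_SHIFT 24
#define HCR_GO_BIT       23
#define HCR_E_BIT        22
#define HCR_OPMOD_SHIFT  12

/* Hypothetical helper type: the fields of the last HCR word, unpacked. */
struct hcr_fields {
    unsigned status;  /* completion status written by the firmware */
    unsigned go;      /* 1 while the command is owned by the HCA */
    unsigned event;   /* 1 if completion is reported by event */
    unsigned opmod;   /* opcode modifier (assumed 4 bits wide) */
    unsigned opcode;  /* command opcode (assumed 12 bits wide) */
};

/* Byte-swap a 32-bit value (little-endian host view <-> big-endian HCR view). */
static uint32_t hcr_swab32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Split a host-order HCR word into its fields. */
static struct hcr_fields decode_hcr(uint32_t w)
{
    struct hcr_fields f;

    f.status = w >> HCR_STATUS_SHIFT;
    f.go     = (w >> HCR_GO_BIT) & 1u;
    f.event  = (w >> HCR_E_BIT) & 1u;
    f.opmod  = (w >> HCR_OPMOD_SHIFT) & 0xfu;
    f.opcode = w & 0xfffu;
    return f;
}
```

Under these assumptions, decode_hcr(0x80000d) gives go = 1, event = 0 and
opcode = 0xd, which agrees with the op_code and event lines in the trace,
and hcr_swab32(0x80000d) gives 0xd008000, the BE value printed next to it.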

Do you have any idea?

Thanks and have a good new year!
Yicheng




From: Roland Dreier <rdreier at cisco.com>
Date: 12/28/2007 11:39 PM
To: Yicheng Jia <YJia at tmriusa.com>
cc: general at lists.openfabrics.org
Subject: Re: [ofa-general] synchronize commands issued to MTHCA

 > I'm using OFED-1.0 and the problem I believe is related to command 
 > synchronization of HCA. The host issues a MAD_INF command at first and 
 > then a SW2HW_MTP command without waiting for the completion of the first 
 > command. Both of commands return with bad parameters error.

I guess you mean the MAD_IFC and SW2HW_MPT commands?  I've never heard
of a problem like that -- more details about your hardware/software
config and the exact symptoms you see would be helpful in debugging.

Anyway OFED 1.0 is ancient by now -- you are much better off just
using drivers from the standard kernel.  If you must use OFED, then
OFED 1.2 or even a 1.3 prerelease would be better.

 > My question is why there's no synchronization mechanism for the command 
 > execution on HCA, can I use "spin_lock" or "sem_wait" to synchronize 
 > between every command?

The HCA firmware allows multiple commands to be queued.  The
dev->cmd.event_sem semaphore is used to limit the number of
outstanding commands to the HCA's capabilities, and the
dev->cmd.hcr_mutex mutex is used to serialize the actual writing of
commands to the HCA.
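
As a rough user-space illustration of that two-level scheme (not the actual
driver code), the sketch below uses a POSIX counting semaphore to cap the
number of commands in flight and a pthread mutex to serialize the "write"
step. MAX_OUTSTANDING, post_command(), issue_command() and run_commands()
are hypothetical names invented for this example; in the real driver the
limit comes from the HCA and the write step programs the HCR registers.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define MAX_OUTSTANDING 4   /* hypothetical limit on queued commands */
#define MAX_THREADS     16

static sem_t event_sem;     /* counts free command slots */
static pthread_mutex_t hcr_mutex  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int outstanding, max_seen, posted;

/* Stand-in for writing one command to the HCA; writers must not interleave. */
static void post_command(void)
{
    pthread_mutex_lock(&hcr_mutex);
    posted++;
    pthread_mutex_unlock(&hcr_mutex);
}

static void *issue_command(void *unused)
{
    (void)unused;
    sem_wait(&event_sem);               /* wait for a free slot */

    pthread_mutex_lock(&stats_lock);
    if (++outstanding > max_seen)
        max_seen = outstanding;         /* record peak commands in flight */
    pthread_mutex_unlock(&stats_lock);

    post_command();                     /* a driver would then wait for completion */

    pthread_mutex_lock(&stats_lock);
    outstanding--;
    pthread_mutex_unlock(&stats_lock);

    sem_post(&event_sem);               /* release the slot */
    return NULL;
}

/* Issue one command per thread; returns commands posted, or -1 on error. */
static int run_commands(int nthreads)
{
    pthread_t tid[MAX_THREADS];
    int i, started;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    outstanding = max_seen = posted = 0;
    if (sem_init(&event_sem, 0, MAX_OUTSTANDING) != 0)
        return -1;

    for (started = 0; started < nthreads; started++)
        if (pthread_create(&tid[started], NULL, issue_command, NULL) != 0)
            break;
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    sem_destroy(&event_sem);
    assert(max_seen <= MAX_OUTSTANDING); /* the semaphore enforces the cap */
    return posted;
}
```

Calling run_commands(8) starts eight issuers, but the semaphore never lets
more than MAX_OUTSTANDING of them hold a slot at once, and the mutex keeps
the individual writes from overlapping. Neither lock makes one command wait
for the previous one to complete.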

There was a mmiowb() added to mthca_cmd_post() fairly recently that
might fix your problems if you are running on a large SGI Altix system.

 - R.
