[ofa-general] synchronize commands issued to MTHCA
Yicheng Jia
YJia at tmriusa.com
Mon Dec 31 17:02:15 PST 2007
Hi Roland,
Thanks for your reply!
Actually, I'm working on porting the IB driver to the QNX platform. I have
resumed the work started by a former colleague, and I just found that the
synchronization code (dev->cmd.poll_sem and dev->cmd.hcr_mutex) had been
deleted for an unknown reason. After adding it back, the driver runs much
more smoothly.
However, I still get a command execution error that I believe is related
to command synchronization. When the address-handle creation path that logs
"Created UDAV" runs while a SW2HW_MPT command is still being executed, the
SW2HW_MPT command returns with a bad parameter error. Here is my debug
trace output:
139903841835 HCR CMD: op_code: LE: d
139903861104 TRACE: mad.c:639/ib_mad_recv_done_handler
139903890876 HCR CMD: in_param_h: LE: 0
139903942869 TRACE: mad.c:644/ib_mad_recv_done_handler
139903993296 HCR CMD: in_param_l: LE: cf616000
139904038413 TRACE: verbs.c:182/ib_create_ah_from_wc
139904094753 HCR CMD: input_modifier: LE: 1e
139904139150 TRACE: mthca_provider.c:447/mthca_ah_create
MTHCA DBG: <mthca_av.c:229> Created UDAV at 8075220/00000000:
139904197065 HCR CMD: out_pram_h: LE: 0
139904333343 [ 0] 01000005
139904384499 HCR CMD: out_pram_l: LE: 0
139904428086 [ 4] 0000ffff
139904478675 HCR CMD: token: LE: ffff0000
139904520156 [ 8] 00003000
139904572059 HCR CMD: op_code_modifier: LE: 0
139904612802 [ c] 00000000
139904667693 HCR CMD: event: LE: 0
139904708526 [10] 00000000
139904758422 HCR CMD 0x18h: LE=80000d, BE=d008000
139904799210 [14] 00000000
139904904204 [18] 00000000
139904946792MTHCA DBG: <mthca_cmd.c:195> HCR_STATUS 40100698= d008000 ? 8000
[1c] 00000002
139905076860 TRACE: mthca_av.c:235/mthca_create_ah
139905112329 TRACE: mthca_av.c:243/mthca_create_ah
139905147672 TRACE: mthca_provider.c:460/mthca_ah_create
....
139906793007 HCR CMD: Status Return: : 3
Do you have any idea what might be causing this?
Thanks and have a good new year!
Yicheng
Roland Dreier <rdreier at cisco.com>
12/28/2007 11:39 PM
To: Yicheng Jia <YJia at tmriusa.com>
cc: general at lists.openfabrics.org
Subject: Re: [ofa-general] synchronize commands issued to MTHCA
> I'm using OFED-1.0 and the problem I believe is related to command
> synchronization of HCA. The host issues a MAD_INF command at first and
> then a SW2HW_MTP command without waiting for the completion of the first
> command. Both of commands return with bad parameters error.
I guess you mean the MAD_IFC and SW2HW_MPT commands? I've never heard
of a problem like that -- more details about your hardware/software
config and the exact symptoms you see would be helpful in debugging.
Anyway OFED 1.0 is ancient by now -- you are much better off just
using drivers from the standard kernel. If you must use OFED, then
OFED 1.2 or even a 1.3 prerelease would be better.
> My question is why there's no synchronization mechanism for the command
> execution on HCA, can I use "spin_lock" or "sem_wait" to synchronize
> between every command?
The HCA firmware allows multiple commands to be queued. The
dev->cmd.event_sem semaphore is used to limit the number of
outstanding commands to the HCA's capabilities, and the
dev->cmd.hcr_mutex mutex is used to serialize the actual writing of
commands to the HCA.
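
For anyone porting this to another OS, the two-level scheme described above can be sketched in user space with POSIX primitives. This is only an illustration, not the mthca code: the struct layout, MAX_OUTSTANDING, HCR_WORDS, post_cmd() and run_demo() are all hypothetical names, and the "register block" is just an array in memory. The counting semaphore bounds how many commands are outstanding, and the mutex keeps one command's register writes from interleaving with another's.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OUTSTANDING 4   /* hypothetical limit; a real one comes from the firmware */
#define HCR_WORDS       7   /* stand-in for the command register block */
#define MAX_THREADS     16

struct cmd_iface {
    sem_t             event_sem;  /* counts free command slots */
    pthread_mutex_t   hcr_mutex;  /* serializes writes to the register block */
    pthread_mutex_t   stat_lock;  /* protects the demo counters below */
    volatile uint32_t hcr[HCR_WORDS];
    int in_flight, max_in_flight, torn;
};

struct worker { struct cmd_iface *ci; uint32_t id; int ncmds; };

static void post_cmd(struct cmd_iface *ci, uint32_t tag)
{
    int i, torn = 0;

    /* Block until a command slot is free (retry if interrupted). */
    while (sem_wait(&ci->event_sem) == -1 && errno == EINTR)
        ;

    /* Only one thread at a time may fill in the register block. */
    pthread_mutex_lock(&ci->hcr_mutex);
    for (i = 0; i < HCR_WORDS; i++)
        ci->hcr[i] = tag;
    for (i = 0; i < HCR_WORDS; i++)   /* an interleaved writer would show up here */
        if (ci->hcr[i] != tag)
            torn = 1;
    pthread_mutex_unlock(&ci->hcr_mutex);

    pthread_mutex_lock(&ci->stat_lock);
    ci->torn += torn;
    if (++ci->in_flight > ci->max_in_flight)
        ci->max_in_flight = ci->in_flight;
    pthread_mutex_unlock(&ci->stat_lock);

    /* A real driver would sleep here until the completion event arrives. */

    pthread_mutex_lock(&ci->stat_lock);
    ci->in_flight--;
    pthread_mutex_unlock(&ci->stat_lock);

    sem_post(&ci->event_sem);         /* give the slot back */
}

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    int n;

    for (n = 0; n < w->ncmds; n++)
        post_cmd(w->ci, (w->id << 16) | (uint32_t)(n & 0xffff));
    return NULL;
}

/* Returns the number of interleaved (torn) register writes observed,
 * or -1 on bad arguments; *max_in_flight gets the peak outstanding count. */
int run_demo(int nthreads, int ncmds, int *max_in_flight)
{
    struct cmd_iface ci;
    pthread_t tid[MAX_THREADS];
    struct worker w[MAX_THREADS];
    int i;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    sem_init(&ci.event_sem, 0, MAX_OUTSTANDING);
    pthread_mutex_init(&ci.hcr_mutex, NULL);
    pthread_mutex_init(&ci.stat_lock, NULL);
    ci.in_flight = ci.max_in_flight = ci.torn = 0;

    for (i = 0; i < nthreads; i++) {
        w[i].ci = &ci;
        w[i].id = (uint32_t)i + 1;
        w[i].ncmds = ncmds;
        pthread_create(&tid[i], NULL, worker_main, &w[i]);
    }
    for (i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);

    *max_in_flight = ci.max_in_flight;
    pthread_mutex_destroy(&ci.stat_lock);
    pthread_mutex_destroy(&ci.hcr_mutex);
    sem_destroy(&ci.event_sem);
    return ci.torn;
}
```

Calling run_demo(8, 1000, &peak) from a test program should report no torn writes and a peak that never exceeds MAX_OUTSTANDING. If a port drops the mutex, threads can overwrite each other's parameters in the register block, which is one plausible way to end up with a bad parameter status.
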
There was a mmiowb() added to mthca_cmd_post() fairly recently that
might fix your problems if you are running on a large SGI Altix system.
- R.