[ofa-general] synchronize commands issued to MTHCA

Yicheng Jia YJia at tmriusa.com
Wed Jan 2 13:51:26 PST 2008


> What is the call chain that calls SW2HW_MPT in this case?
SW2HW_MPT is issued from the mthca_mr_alloc function. That function 
first calls "mthca_alloc" to get an MR key, then "mthca_table_get" to get 
an MR ICM entry, then "mthca_alloc_mailbox" to allocate a mailbox for 
the command. While this is happening, the MAD completion handler 
"ib_mad_recv_done_handler" is also running; it processes the MAD_IFC 
command and sends the response, and all of this completes without any 
error being reported. Also, for your information, I'm running the driver 
on two dual-core Xeon CPUs.
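
To make the ordering concrete, here is a rough user-space sketch of the sequence described above. It is only a model: every sketch_* function is a hypothetical stand-in that records when it was called, not the real mthca function or its signature, and the mailbox size is an arbitrary placeholder.

```c
#include <assert.h>
#include <stdlib.h>

/* Steps of the call chain, in the order described in the message. */
enum step { STEP_ALLOC_KEY, STEP_TABLE_GET, STEP_ALLOC_MAILBOX, STEP_SW2HW_MPT };

#define MAX_STEPS 8
static enum step trace[MAX_STEPS];
static int trace_len;

static void record(enum step s)
{
    assert(trace_len < MAX_STEPS);
    trace[trace_len++] = s;
}

/* Hypothetical stand-ins: each one only records that it ran. */
static int sketch_alloc_key(unsigned *key)
{
    record(STEP_ALLOC_KEY);
    *key = 1;                       /* placeholder key value */
    return 0;
}

static int sketch_table_get(unsigned key)
{
    (void)key;
    record(STEP_TABLE_GET);
    return 0;
}

static void *sketch_alloc_mailbox(size_t size)
{
    record(STEP_ALLOC_MAILBOX);
    return calloc(1, size);
}

static int sketch_sw2hw_mpt(const void *mailbox, unsigned key)
{
    (void)mailbox;
    (void)key;
    record(STEP_SW2HW_MPT);
    return 0;                       /* the real command may fail here */
}

static void sketch_table_put(unsigned key) { (void)key; }
static void sketch_free_key(unsigned key)  { (void)key; }

/* Model of the sequence: key, ICM entry, mailbox, then the command. */
int sketch_mr_alloc(void)
{
    unsigned key;
    void *mailbox;
    int err;

    trace_len = 0;

    err = sketch_alloc_key(&key);
    if (err)
        return err;

    err = sketch_table_get(key);
    if (err)
        goto err_free_key;

    mailbox = sketch_alloc_mailbox(256);   /* size is an assumption */
    if (!mailbox) {
        err = -1;
        goto err_table_put;
    }

    err = sketch_sw2hw_mpt(mailbox, key);
    free(mailbox);
    if (err)
        goto err_table_put;

    return 0;

err_table_put:
    sketch_table_put(key);
err_free_key:
    sketch_free_key(key);
    return err;
}
```

After a call to sketch_mr_alloc(), the trace array holds the four steps in the order they ran, which is the order in which the real calls would need to be instrumented to see where the MAD handler interleaves with them.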

> Also are you going through the mthca_cmd_post_dbell() or 
> mthca_cmd_post_hcr() code to write the command params to the HCA?
Yes. I found there's a small difference between these two functions: 
there are two "wmb()" calls in mthca_cmd_post_dbell() but only one 
"wmb()" in mthca_cmd_post_hcr(). Is there any particular reason for it? 

> I think the best way to debug this would be to work directly with 
> Mellanox to get a debug build of the HCA firmware and get definite info on 
> why the SW2HW_MPT command is failing.
Do you know whom I should contact?

Thanks!
Yicheng




Roland Dreier <rdreier at cisco.com> 
01/02/2008 02:55 PM

To
Yicheng Jia <YJia at tmriusa.com>
cc
Jack Morgenstein <jackm at dev.mellanox.co.il>, general at lists.openfabrics.org
Subject
Re: [ofa-general] synchronize commands issued to MTHCA






 > The SW2HW_MPT command is issued while the UDAV table is being created. 
 > During the time that the driver is waiting for the completion of the 
 > command, it does many other things: creating a send MAD packet, posting 
 > a send MAD request to the SQ and posting another receive MAD request to 
 > the RQ. No error is reported for any of these actions. However, 
 > afterwards the HCA reports a command parameter error for the SW2HW_MPT.

I doubt the problem is creating the UD address vector -- that is just
shuffling some things around in the CPU's memory.  It seems more
likely that posting a send or receive request is messing things up
somehow.  What is the call chain that calls SW2HW_MPT in this case?
Also are you going through the mthca_cmd_post_dbell() or
mthca_cmd_post_hcr() code to write the command params to the HCA?

I think the best way to debug this would be to work directly with
Mellanox to get a debug build of the HCA firmware and get definite
info on why the SW2HW_MPT command is failing.

 - R.

