[ofa-general] synchronize commands issued to MTHCA

Jack Morgenstein jackm at dev.mellanox.co.il
Mon Dec 31 23:03:38 PST 2007


On Tuesday 01 January 2008 03:02, Yicheng Jia wrote:

Does your HCA use on-board memory?
(Run: "lspci" and look at "Mellanox" lines.  You have on-board memory
 if you see either:
	PCI bridge: Mellanox Technologies MT23108 InfiniHost HCA bridge (rev a1)
	InfiniBand: Mellanox Technologies MT23108 InfiniHost HCA (rev a1)
 OR:
   InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode)
)

In that case, when you create an AH in kernel space
(file mthca_av.c, function mthca_create_ah()), you take the following path:
	if (ah->type == MTHCA_AH_ON_HCA) {
		memcpy_toio(dev->av_table.av_map + index * MTHCA_AV_SIZE,
			    av, MTHCA_AV_SIZE);
		kfree(av);
	}

Roland, do you think that the memcpy_toio() call might mess things up?

Maybe we need "wmb()" or "mmiowb()" here as well?
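
To make the question concrete, here is a minimal user-space sketch of where
such a barrier would sit. It is an analogy, not driver code: av_map is plain
memory standing in for dev->av_table.av_map, copy_to_device() and
write_barrier() are hypothetical stand-ins for memcpy_toio() and wmb(), and
published_index is an invented stand-in for whatever later step lets the
hardware use the entry. AV_SIZE is taken as 32 bytes, matching the eight
dwords ([0] to [1c]) in the UDAV dump quoted below. A C11 release fence only
orders CPU stores to normal memory; whether wmb() or mmiowb() is really
needed for the MMIO write depends on the platform.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

enum { AV_SIZE = 32, AV_ENTRIES = 4 };  /* AV_ENTRIES is arbitrary for this sketch */

/* Stand-in for dev->av_table.av_map: ordinary memory, not a real MMIO mapping. */
static volatile uint8_t av_map[AV_ENTRIES * AV_SIZE];

/* Index of the most recently published entry; -1 means none yet (invented). */
static atomic_int published_index = -1;

/* Hypothetical stand-in for memcpy_toio(): byte-wise volatile stores. */
static void copy_to_device(volatile uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical stand-in for wmb(): earlier stores are ordered before later ones. */
static void write_barrier(void)
{
    atomic_thread_fence(memory_order_release);
}

/* Copy one AV into the table, then publish it. Returns 0, or -1 on a bad index. */
int publish_av(int index, const uint8_t av[AV_SIZE])
{
    if (index < 0 || index >= AV_ENTRIES)
        return -1;
    copy_to_device(av_map + (size_t)index * AV_SIZE, av, AV_SIZE);
    write_barrier();    /* the step the question above is about */
    atomic_store_explicit(&published_index, index, memory_order_relaxed);
    return 0;
}

/* Read one byte of the published entry; -1 if that entry is not the published one. */
int read_av_byte(int index, int offset)
{
    if (offset < 0 || offset >= AV_SIZE)
        return -1;
    if (index != atomic_load_explicit(&published_index, memory_order_acquire))
        return -1;
    return av_map[(size_t)index * AV_SIZE + offset];
}
```

Without the fence, nothing in this sketch would stop the store to
published_index from becoming visible before the copied bytes, which is the
user-space counterpart of the ordering concern raised above.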

- Jack

> Hi Roland,
> 
> Thanks for your reply!
> 
> Actually I'm working on porting the IB driver to the QNX platform. I have
> resumed the work started by my former colleague, and I just found that the
> synchronization code (dev->cmd.poll_sem and dev->cmd.hcr_mutex) had been
> deleted for an unknown reason. After adding this code back, the driver runs
> much more smoothly.
> 
> However, I still get a command execution error which I believe is related
> to command synchronization. The problem is that when an AH is created (the
> "Created UDAV" message) while a SW2HW_MPT command is executing, the
> SW2HW_MPT command returns with a bad parameter error. Here is my debug
> trace output:
> 
> 139903841835 HCR CMD: op_code:          LE: d
> 139903861104 TRACE: mad.c:639/ib_mad_recv_done_handler
> 139903890876 HCR CMD: in_param_h:       LE: 0
> 139903942869 TRACE: mad.c:644/ib_mad_recv_done_handler
> 139903993296 HCR CMD: in_param_l:       LE: cf616000
> 139904038413 TRACE: verbs.c:182/ib_create_ah_from_wc
> 139904094753 HCR CMD: input_modifier:   LE: 1e
> 139904139150 TRACE: mthca_provider.c:447/mthca_ah_create
> MTHCA DBG: <mthca_av.c:229> Created UDAV at 8075220/00000000:
> 139904197065 HCR CMD: out_pram_h:       LE: 0
> 139904333343   [ 0] 01000005
> 139904384499 HCR CMD: out_pram_l:       LE: 0
> 139904428086   [ 4] 0000ffff
> 139904478675 HCR CMD: token:            LE: ffff0000
> 139904520156   [ 8] 00003000
> 139904572059 HCR CMD: op_code_modifier: LE: 0
> 139904612802   [ c] 00000000
> 139904667693 HCR CMD: event:            LE: 0
> 139904708526   [10] 00000000
> 139904758422 HCR CMD 0x18h:             LE=80000d, BE=d008000
> 139904799210   [14] 00000000
> 139904904204   [18] 00000000
> 139904946792MTHCA DBG: <mthca_cmd.c:195> HCR_STATUS 40100698= d008000 ? 8000
>    [1c] 00000002
> 139905076860 TRACE: mthca_av.c:235/mthca_create_ah
> 139905112329 TRACE: mthca_av.c:243/mthca_create_ah
> 139905147672 TRACE: mthca_provider.c:460/mthca_ah_create
> ....
> 139906793007 HCR CMD: Status Return:              : 3
> 
> Do you have any idea?
> 
> Thanks and have a good new year!
> Yicheng
> 
> 
> 
> 
> Roland Dreier <rdreier at cisco.com>
> 12/28/2007 11:39 PM
>
> To: Yicheng Jia <YJia at tmriusa.com>
> cc: general at lists.openfabrics.org
> Subject: Re: [ofa-general] synchronize commands issued to MTHCA
>
>  > I'm using OFED-1.0 and the problem I believe is related to command
>  > synchronization of HCA. The host issues a MAD_INF command at first and
>  > then a SW2HW_MTP command without waiting for the completion of the first
>  > command. Both of commands return with bad parameters error.
> 
> I guess you mean the MAD_IFC and SW2HW_MPT commands?  I've never heard
> of a problem like that -- more details about your hardware/software
> config and the exact symptoms you see would be helpful in debugging.
> 
> Anyway OFED 1.0 is ancient by now -- you are much better off just
> using drivers from the standard kernel.  If you must use OFED, then
> OFED 1.2 or even a 1.3 prerelease would be better.
> 
>  > My question is why there's no synchronization mechanism for the command
>  > execution on HCA, can I use "spin_lock" or "sem_wait" to synchronize
>  > between every command?
> 
> The HCA firmware allows multiple commands to be queued.  The
> dev->cmd.event_sem semaphore is used to limit the number of
> outstanding commands to the HCA's capabilities, and the
> dev->cmd.hcr_mutex mutex is used to serialize the actual writing of
> commands to the HCA.
> 
> There was a mmiowb() added to mthca_cmd_post() fairly recently that
> might fix your problems if you are running on a large SGI Altix system.
> 
>  - R.
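
The two-level scheme Roland describes (a counting semaphore that caps the
number of outstanding commands, plus a mutex that serializes the write of
each command) can be sketched in user space with POSIX primitives. This is
only an illustration under stated assumptions, not the mthca code: struct
cmd_channel, its functions, and MAX_OUTSTANDING = 4 are invented for the
sketch, and the "write" is just a counter increment where the driver would
write the HCR.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

enum { MAX_OUTSTANDING = 4 };   /* assumed; a real limit comes from the HCA */

struct cmd_channel {
    sem_t           event_sem;  /* counts free command slots */
    pthread_mutex_t hcr_mutex;  /* serializes the write of each command */
    int             posted;     /* total commands written so far */
    int             in_flight;  /* commands posted but not yet completed */
};

int cmd_channel_init(struct cmd_channel *ch)
{
    ch->posted = 0;
    ch->in_flight = 0;
    if (sem_init(&ch->event_sem, 0, MAX_OUTSTANDING) != 0)
        return -1;
    if (pthread_mutex_init(&ch->hcr_mutex, NULL) != 0) {
        sem_destroy(&ch->event_sem);
        return -1;
    }
    return 0;
}

/* Block until a slot is free, then "write" the command under the mutex.
 * Returns the command's sequence number, or -1 on error. */
int cmd_post(struct cmd_channel *ch)
{
    int seq;

    while (sem_wait(&ch->event_sem) != 0) {
        if (errno != EINTR)     /* retry only when interrupted by a signal */
            return -1;
    }
    pthread_mutex_lock(&ch->hcr_mutex);
    seq = ch->posted++;         /* stands in for writing the command register */
    ch->in_flight++;
    pthread_mutex_unlock(&ch->hcr_mutex);
    return seq;
}

/* Called when a command finishes: release its slot. */
void cmd_complete(struct cmd_channel *ch)
{
    pthread_mutex_lock(&ch->hcr_mutex);
    ch->in_flight--;
    pthread_mutex_unlock(&ch->hcr_mutex);
    sem_post(&ch->event_sem);
}

/* Number of free slots right now, or -1 on error. */
int cmd_free_slots(struct cmd_channel *ch)
{
    int value;

    if (sem_getvalue(&ch->event_sem, &value) != 0)
        return -1;
    return value;
}
```

The point of splitting the two primitives is that several commands may be
outstanding at once, while only one thread at a time touches the command
register. If either one is removed, as happened in the QNX port described
above, commands can be overwritten while being posted or exceed what the
firmware can queue.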


