[Users] mthca lockup

Bart Van Assche bvanassche at acm.org
Mon Sep 30 11:33:53 PDT 2013


On 09/23/13 20:37, Orion Poplawski wrote:
> On 09/23/2013 12:00 PM, Rupert Dance wrote:
>> When OFA software is installed from the OFED distribution, a utility is
>> included called "ofed_info" which will spit out a lot of data about what
>> was installed. A simpler command is available using "ofed_info -s" which
>> gives just the version. Things may be slightly different in the packaging
>> from various Distros.
>>
>> The reason I asked about the version is that OFED 3.5-2 includes an
>> updated version of the mthca module and so I was curious if this could be
>> related. If you want to try the latest build from the OFA you can find it
>> here but be aware that you can get conflicts between the Distro version
>> of OFA software and OFED itself. So try to remove all support for OFED
>> before you installed the 3.5-2 package. If this is a production cluster,
>> you may be best to try it on a test cluster first.
>>
>> http://www.openfabrics.org/downloads/OFED/ofed-3.5-2/OFED-3.5-2-rc1.tgz
>>
>
> Thanks, but I don't see any evidence that 3.5-2 actually has an updated
> libmthca.  It seems to have libmthca-1.0.6-1.src.rpm which seems to be
> the same version I have via the distro.
>
> The release notes indicates an updated libmthca compared to 3.5-1, but
> this appears to be a mistake.  It is updated compared to 3.5 though.
>
> Also, apparently err -16 indicates EBUSY so perhaps the hardware had
> locked up somehow.

It might be a good idea to log queue pair numbers just after a queue
pair has been created and just before a queue pair is destroyed. That
would make it possible to figure out whether or not queue pair numbers
are reused too quickly. A patch that resolved a similar issue for the
mlx4 driver (but that is not in RHEL 6.4 AFAIK) can be found here:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=f4ec9e9531ac79ee2521faf7ad3d98978f747e42
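
Once such logging is in place, checking for quick reuse could be
scripted. The following is only a rough sketch: the log line format
("<seconds> qp_create qpn=0x..." and "<seconds> qp_destroy qpn=0x...")
and the names LINE_RE and find_quick_reuse are made up for illustration
and are not something the mthca or mlx4 drivers emit today. It reports
every queue pair number that was created again less than min_gap seconds
after it was destroyed.

```python
import re

# Hypothetical log format (an assumption, not real driver output):
#   "<timestamp-seconds> qp_create qpn=0x<hex>"
#   "<timestamp-seconds> qp_destroy qpn=0x<hex>"
LINE_RE = re.compile(
    r"^(?P<ts>\d+(?:\.\d+)?)\s+qp_(?P<op>create|destroy)\s+"
    r"qpn=(?P<qpn>0x[0-9a-fA-F]+)$"
)


def find_quick_reuse(lines, min_gap):
    """Return (qpn, destroy_ts, create_ts) for every queue pair number
    that was re-created less than min_gap seconds after its destruction."""
    last_destroy = {}  # qpn -> timestamp of the most recent destroy
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that are not QP create/destroy events
        ts = float(m.group("ts"))
        qpn = int(m.group("qpn"), 16)
        if m.group("op") == "destroy":
            last_destroy[qpn] = ts
        elif qpn in last_destroy and ts - last_destroy[qpn] < min_gap:
            hits.append((qpn, last_destroy[qpn], ts))
    return hits


if __name__ == "__main__":
    sample = [
        "1.000 qp_create qpn=0x40a",
        "2.000 qp_destroy qpn=0x40a",
        "2.010 qp_create qpn=0x40a",
    ]
    for qpn, destroyed, created in find_quick_reuse(sample, min_gap=1.0):
        print(f"QPN {qpn:#x} destroyed at {destroyed}, re-created at {created}")
```

What counts as "too quickly" depends on the hardware and the workload,
so min_gap is a threshold to experiment with rather than a known value.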

Bart.
