[openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit
Pavel Shamis (Pasha)
pasha at mellanox.co.il
Wed May 10 10:27:02 PDT 2006
Hi Amit,
Thank you for the patches.
I will apply all of them to the current version.
Thanks,
Pasha.
Amit Mehrotra (amehrotr) wrote:
> Hi Pavel,
>
> Please find a zip containing the SRQ patch. As there were a lot of
> indentation changes, I have attached another zip containing diffs created
> with -bu. The only issue is that the diffs are relative to the rc3
> code base.
>
> I have tested the code on IA32(3.6 GHz)/PCI_X systems with osu_latency
> and Pallas. The code was compiled with PCI_EX_ and the performance in
> both cases is quite similar.
>
> thanks
> -Amit
>
> -----Original Message-----
> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il]
> Sent: Tuesday, May 09, 2006 5:29 PM
> To: Amit Mehrotra (amehrotr)
> Cc: Openfabrics-ewg at openib.org
> Subject: Re: [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort:
> Couldn't modify SRQ limit
>
> Hi Amit,
> I agree that SRQ_ENABLE/DISABLE should be run-time options; that would
> be the best solution.
>
> We want to freeze the mvapich code till Wednesday (US time), so if you
> are able to prepare the patch before the code freeze, I will integrate it
> into the current OFED release.
>
> Regards,
> Pasha.
>
> Amit Mehrotra (amehrotr) wrote:
>> Hi Pavel,
>>
>> Having the default as PCI_EX is a good idea. The only issue is that
>> Sayantan (OSU) reported lower latency numbers with SRQ on PCI_X. I
>> propose that we add a new tunable parameter, e.g.
>> VIADEV_SRQ_ENABLE, which will allow the user to select whether he wants
>> to run with or without SRQ. I looked through the code and
>> implementing it won't be difficult. I can generate a patch for it if
>> required.
>> -Amit
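
The run-time selection proposed above can be illustrated with a small
shell sketch. Only the name VIADEV_SRQ_ENABLE comes from the proposal;
the accepted values (1/0), the default of "enabled", and the srq_mode
helper are assumptions made for illustration, and the real tunable would
be read inside the MVAPICH C code rather than in a script.

```shell
# srq_mode: report whether SRQ would be used, based on the proposed
# VIADEV_SRQ_ENABLE tunable. Accepted values and the default are
# assumptions, not taken from MVAPICH.
srq_mode() {
    case "${VIADEV_SRQ_ENABLE:-1}" in
        1) echo "srq" ;;
        0) echo "no-srq" ;;
        *) echo "invalid VIADEV_SRQ_ENABLE: $VIADEV_SRQ_ENABLE" >&2
           return 1 ;;
    esac
}

unset VIADEV_SRQ_ENABLE
default_mode=$(srq_mode)                     # tunable not set: assumed default
pcix_mode=$(VIADEV_SRQ_ENABLE=0 srq_mode)    # user opts out, e.g. on PCI-X
```

With a switch of this kind the same binary could serve both PCI-X and
PCI-Express hosts, leaving the choice to the user at launch time.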
>>
>> -----Original Message-----
>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il]
>> Sent: Monday, May 08, 2006 2:52 PM
>> To: Amit Mehrotra (amehrotr)
>> Cc: Amit Krig; Openfabrics-ewg at openib.org
>> Subject: Re: [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort:
>> Couldn't modify SRQ limit
>>
>> Hi Amit,
>> Please see my answers below:
>>
>> Amit Mehrotra (amehrotr) wrote:
>>> Hi Amit,
>>>
>>> A manual firmware upgrade fixed the problem. I was incorrectly
>>> expecting that the ibed installer would have automatically flashed
>>> the correct firmware.
>>>
>>> The code is still being compiled with PCI_EX, but the only impact
>>> is that the vbuf size (sort of a packet size) calculation in mvapich
>>> becomes incorrect (mvapich calculates the vbuf size based on whether
>>> the card is PCI_X or PCI_EX).
>>>
>>> I believe we would need to get MVAPICH changed so that when it
>>> compiles for PCI_X it compiles with SRQ support, and the
>>> mvapich.make script needs to be fixed so that it correctly identifies
>>> the IB card as PCI_X.
>>
>> I want to remove the HCA auto-detection from mvapich. By default I will
>> use PCI_EX and will add a default param file that includes vbuf
>> size tunings for different platforms. All lines will be commented out by
>> default, and the user will be responsible for uncommenting the tuning he
>> wants to use.
>> Regards,
>> Pasha.
>>
>>>
>>> -Amit
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Amit Krig [mailto:amitk at mellanox.co.il]
>>> *Sent:* Thursday, May 04, 2006 5:48 PM
>>> *To:* Amit Mehrotra (amehrotr); Openfabrics-ewg at openib.org
>>> *Subject:* RE: [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort:
>>> Couldn't modify SRQ limit
>>>
>>> Hi Amit,
>>>
>>> Please note that the SRQ limit event is supported from FW version
>>> 3.4.0 onward.
>>>
>>> Can you specify the FW version that you use, the SRQ size, the number of
>>> outstanding WRs in the SRQ when the modify SRQ command was executed,
>>> the requested SRQ limit value, and the QP transport type?
>>>
>>>
>>>
>>> Amit
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>> *From:* openfabrics-ewg-bounces at openib.org
>>> [mailto:openfabrics-ewg-bounces at openib.org] *On Behalf Of *Amit
>>> Mehrotra (amehrotr)
>>> *Sent:* Tuesday, May 02, 2006 2:36 PM
>>> *To:* Openfabrics-ewg at openib.org
>>> *Subject:* [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort:
>>> Couldn't modify SRQ limit
>>>
>>> Configuration: RHEL4U3, ia32, rc3, PCI-X
>>>
>>> I have been seeing the following error when I try to run MVAPICH test
>>> programs:
>>> ----
>>> [0] Abort: Couldn't modify SRQ limit
>>> at line 999 in file viainit.c
>>> -----
>>>
>>> On debugging the issue, it seems that MVAPICH is being incorrectly
>>> compiled for PCI_EX cards rather than PCI_X cards. From the MPI
>>> code it seems that PCI-X cards do not support modification of SRQs.
>>> The source of the problem lies in a bug in the mvapich.make
>>> script (a new IBED addition), which incorrectly always reports the card
>>> as PCI_EX. I have appended a diff with the fix. I am not sure how the
>>> patch can be correctly generated, as the whole of MVAPICH is in the form
>>> of a zipped tarball.
>>>
>>> There seems to be one more issue in the script where it deviates from
>>> the MVAPICH build. The script is treating the older PCI_EX cards (cards
>>> with the lspci signature of 15b3:6278) as PCI-X. Was this done because
>>> these cards also don't support resizing SRQs?
>>>
>>> -Amit
>>>
>>> -------------------
>>> diff -u mvapich.make.old mvapich.make
>>> --- mvapich.make.old 2006-05-02 15:32:11.000000000 +0530
>>> +++ mvapich.make 2006-05-02 15:33:37.000000000 +0530
>>> @@ -251,13 +251,13 @@
>>> DEF_BUILDID="$DEF_BUILDID"
>>> fi
>>> fi
>>> -if (/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'); then
>>> +if (test `/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'` -gt 0); then
>>> # Arbel
>>> CFLAGS="$CFLAGS -D_PCI_EX_"
>>> -elif (/sbin/lspci -n | grep -E "15b3:5e8c|15b3:6274" | wc -l | tr -d '[:space:]'); then
>>> +elif (test `/sbin/lspci -n | grep -E "15b3:5e8c|15b3:6274" | wc -l | tr -d '[:space:]'` -gt 0); then
>>> # Sinai
>>> CFLAGS="$CFLAGS -D_PCI_EX_"
>>> -elif (/sbin/lspci -n | grep -E "15b3:5a44|15b3:6278" | wc -l | tr -d '[:space:]'); then
>>> +elif (test `/sbin/lspci -n | grep -E "15b3:5a44|15b3:6278" | wc -l | tr -d '[:space:]'` -gt 0); then
>>> # Tavor
>>> CFLAGS="$CFLAGS -D_PCI_X_"
>>> fi
>>> -----------
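
The original conditions in the diff above test the exit status of a
pipeline, which in sh is the status of its last command (tr), so the
first branch is taken whether or not grep matched anything. Below is a
minimal sketch of that failure and of an alternative fix that relies on
grep -q directly. The sample lspci line and the detect_bus helper are
hypothetical and are not part of mvapich.make; the PCI IDs and their
mapping are copied from the diff.

```shell
# Hypothetical `lspci -n` style line for a Tavor (PCI-X) HCA.
sample='02:00.0 0c06: 15b3:5a44 (rev a1)'

# Buggy form: the exit status of the pipeline is that of `tr`, which is 0
# even when grep matched nothing, so this branch is always taken.
if (printf '%s\n' "$sample" | grep "15b3:6282" | wc -l | tr -d '[:space:]') >/dev/null; then
    buggy=PCI_EX
else
    buggy=none
fi

# detect_bus: classify lspci output by PCI ID using grep's own exit status.
detect_bus() {
    if printf '%s\n' "$1" | grep -q "15b3:6282"; then
        echo PCI_EX        # Arbel
    elif printf '%s\n' "$1" | grep -q -E "15b3:5e8c|15b3:6274"; then
        echo PCI_EX        # Sinai
    elif printf '%s\n' "$1" | grep -q -E "15b3:5a44|15b3:6278"; then
        echo PCI_X         # Tavor
    else
        echo unknown
    fi
}

fixed=$(detect_bus "$sample")
```

For the Tavor sample, the buggy form still selects PCI_EX while
detect_bus reports PCI_X. The test ... -gt 0 form in the posted diff
achieves the same result by comparing the match count explicitly.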
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> openfabrics-ewg mailing list
>>> openfabrics-ewg at openib.org
>>> http://openib.org/mailman/listinfo/openfabrics-ewg