[openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit

Pavel Shamis (Pasha) pasha at mellanox.co.il
Wed May 10 10:27:02 PDT 2006


Hi Amit,
Thank you for the patches.
I will apply all of them to the current version.

Thanks,
Pasha.

Amit Mehrotra (amehrotr) wrote:
> Hi Pavel,
> 
> Please find attached a zip containing the SRQ patch. As there were a lot
> of indentation changes, I have also attached a zip containing diffs
> created with -bu. The only issue is that the diffs are against the rc3
> code base.
> 
> I have tested the code on IA32 (3.6 GHz)/PCI_X systems with osu_latency
> and Pallas. The code was compiled with _PCI_EX_, and the performance is
> quite similar in both cases.
> 
> thanks
> -Amit
> 
> -----Original Message-----
> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] 
> Sent: Tuesday, May 09, 2006 5:29 PM
> To: Amit Mehrotra (amehrotr)
> Cc: Openfabrics-ewg at openib.org
> Subject: Re: [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit
> 
> Hi Amit,
> I agree that SRQ_ENABLE/DISABLE should be a run-time option; that will
> be the best solution.
> 
> We want to freeze the mvapich code by Wednesday (US time), so if you
> are able to prepare the patch before the code freeze, I will integrate
> it into the current OFED release.
> 
> Regards,
> Pasha.
> 
> Amit Mehrotra (amehrotr) wrote:
>>  Hi Pavel,
>>
>> Having PCI_EX as the default is a good idea. The only issue is that
>> Sayantan (OSU) reported lower latency numbers with SRQ on PCI_X. I
>> propose adding a new tunable parameter, e.g. VIADEV_SRQ_ENABLE, which
>> will allow the user to select whether to run with or without SRQ. I
>> looked through the code, and implementing it won't be difficult. I can
>> generate a patch for it if required.
>> -Amit
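A minimal sketch of the proposed run-time switch, assuming VIADEV_SRQ_ENABLE is read from the environment with SRQ on by default. The variable name is the one proposed above; how MVAPICH would actually read it (its code is C) is not shown in this thread, so the shell function srq_enabled is purely illustrative.

```shell
#!/bin/sh
# Sketch only: VIADEV_SRQ_ENABLE is the name proposed in this thread,
# not a confirmed MVAPICH parameter. SRQ stays on unless the variable
# is explicitly set to 0.
srq_enabled() {
    case "${VIADEV_SRQ_ENABLE:-1}" in
        0) return 1 ;;   # user explicitly disabled SRQ
        *) return 0 ;;   # unset or any other value: use SRQ
    esac
}

if srq_enabled; then
    echo "SRQ: enabled"
else
    echo "SRQ: disabled"
fi
```

For example, running with VIADEV_SRQ_ENABLE=0 in the environment prints "SRQ: disabled"; defaulting to "on" keeps current behaviour for users who never set the variable.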
>>
>> -----Original Message-----
>> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il]
>> Sent: Monday, May 08, 2006 2:52 PM
>> To: Amit Mehrotra (amehrotr)
>> Cc: Amit Krig; Openfabrics-ewg at openib.org
>> Subject: Re: [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit
>>
>> Hi Amit,
>> Please see my answers below:
>>
>> Amit Mehrotra (amehrotr) wrote:
>>> Hi Amit,
>>>  
>>> A manual firmware upgrade fixed the problem. I had incorrectly expected
>>> the IBED installer to flash the correct firmware automatically.
>>>  
>>> The code is still being compiled with PCI_EX; the only impact is that
>>> the vbuf size (roughly, the packet size) calculation in mvapich becomes
>>> incorrect, since mvapich calculates the vbuf size based on whether the
>>> card is PCI_X or PCI_EX.
>>>  
>>> I believe MVAPICH needs to be changed so that a PCI_X build is compiled
>>> with SRQ support, and the mvapich.make script needs to be fixed so that
>>> it correctly identifies the IB card as PCI_X.
>>
>> I want to remove the HCA auto-detection from mvapich. By default I will
>> use PCI_EX and will add a default param file that includes vbuf size
>> tunings for different platforms. All lines will be commented out by
>> default, and the user will be responsible for uncommenting the tuning
>> he wants to use.
>> Regards,
>> Pasha.
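The param-file scheme described above can be sketched as follows, assuming a simple NAME=value format in which every tuning line ships commented out. VBUF_SIZE_EXAMPLE, its values, and the helper active_params are hypothetical placeholders, not real MVAPICH parameter names or tuned numbers.

```shell
#!/bin/sh
# Hypothetical param-file layout: NAME=value lines, all commented out by
# default. VBUF_SIZE_EXAMPLE and its values are placeholders only.
PARAM_FILE=$(mktemp)

cat > "$PARAM_FILE" <<'EOF'
# PCI-X platforms (uncomment to use):
# VBUF_SIZE_EXAMPLE=8192
# PCI-Express platforms (uncomment to use):
# VBUF_SIZE_EXAMPLE=12288
EOF

# Print the active (uncommented, non-blank) lines of a param file.
active_params() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# With the shipped file untouched, no tuning line is active.
active_params "$PARAM_FILE" || echo "no tuning lines active; built-in defaults apply"
```

Because nothing is active until a line is uncommented, the compiled-in PCI_EX defaults stay in force unless the user explicitly opts in to a platform tuning.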
>>
>>>  
>>> -Amit
>>>  
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Amit Krig [mailto:amitk at mellanox.co.il]
>>> *Sent:* Thursday, May 04, 2006 5:48 PM
>>> *To:* Amit Mehrotra (amehrotr); Openfabrics-ewg at openib.org
>>> *Subject:* RE: [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit
>>>
>>> Hi Amit,
>>>  
>>> Please note that the SRQ limit event is supported starting with firmware
>>> version 3.4.0.
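A quick way to check a firmware version against that 3.4.0 minimum is a dotted-version comparison. This is a sketch: version_ge and fw_supports_srq_limit are hypothetical helpers, and only purely numeric fields are handled.

```shell
#!/bin/sh
# version_ge A B: succeed if dotted version A >= B. Assumes purely
# numeric fields such as "3.4.0"; missing fields count as 0.
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        ai=${a%%.*} bi=${b%%.*}
        ai=${ai:-0} bi=${bi:-0}
        [ "$ai" -gt "$bi" ] && return 0
        [ "$ai" -lt "$bi" ] && return 1
        # Equal so far: drop the field just compared and continue.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Minimum firmware for the SRQ limit event, as stated in the message.
MIN_FW_SRQ_LIMIT=3.4.0

fw_supports_srq_limit() {
    version_ge "$1" "$MIN_FW_SRQ_LIMIT"
}
```

On a live system the version string could be taken from the fw_ver: line of ibv_devinfo output, e.g. fw_supports_srq_limit "$(ibv_devinfo | awk '/fw_ver:/ {print $2}')"; that invocation assumes ibv_devinfo's output format and is not exercised here.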
>>>
>>> Can you specify the firmware version you are using, the SRQ size, the
>>> number of outstanding WRs in the SRQ when the modify-SRQ command was
>>> executed, the requested SRQ limit value, and the QP transport type?
>>>
>>>  
>>>
>>> Amit
>>>
>>>  
>>>
>>> ------------------------------------------------------------------------
>>> *From:* openfabrics-ewg-bounces at openib.org
>>> [mailto:openfabrics-ewg-bounces at openib.org] *On Behalf Of *Amit Mehrotra (amehrotr)
>>> *Sent:* Tuesday, May 02, 2006 2:36 PM
>>> *To:* Openfabrics-ewg at openib.org
>>> *Subject:* [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit
>>>
>>> Configuration: RHEL4U3, ia32, rc3, PCI-X
>>>  
>>> I have been seeing the following error when I try to run MVAPICH test
>>> programs:
>>> ----
>>> [0] Abort: Couldn't modify SRQ limit
>>>  at line 999 in file viainit.c
>>> -----
>>>  
>>> On debugging the issue, it seems that MVAPICH is being incorrectly
>>> compiled for PCI_EX cards rather than PCI_X cards. From the MPI code,
>>> it seems that PCI-X cards do not support modifying SRQs. The source of
>>> the problem is a bug in the mvapich.make script (a new IBED addition),
>>> which always detects the card as PCI_EX: each if-test checks the exit
>>> status of the last command in the pipeline (tr), which succeeds whether
>>> the match count is 0 or not, so the first (Arbel) branch is always
>>> taken. I have appended a diff with the fix. I am not sure how the patch
>>> can be generated correctly, as the whole of MVAPICH is shipped as a
>>> zipped tarball.
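The always-true condition can be reproduced without any hardware. In this sketch the hypothetical helper count_matches runs the original grep | wc -l | tr pipeline on a given string instead of on /sbin/lspci -n output; it assumes a POSIX shell without pipefail, which is the default.

```shell
#!/bin/sh
# count_matches TEXT: hypothetical helper that runs the original
# pipeline on TEXT instead of on /sbin/lspci -n output.
count_matches() {
    printf '%s\n' "$1" | grep "15b3:6282" | wc -l | tr -d '[:space:]'
}

NO_HCA="00:00.0 0600: 8086:2578 (rev 02)"

# Original style: `if` sees tr's exit status (0), not the printed count.
if (count_matches "$NO_HCA" >/dev/null); then
    echo "original check: Arbel branch taken even with 0 matches"
fi

# Patched style: compare the printed count numerically.
if test "$(count_matches "$NO_HCA")" -gt 0; then
    echo "patched check: Arbel branch taken"
else
    echo "patched check: Arbel branch skipped"
fi
```

The count itself is correct ("0" here); only the way the original script consumed it was wrong.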
>>>  
>>> There seems to be one more issue in the script where it deviates from
>>> the MVAPICH build: the script treats the older PCI_EX cards (those with
>>> the lspci signature 15b3:6278) as PCI-X. Was this done because these
>>> cards also don't support resizing SRQs?
>>>  
>>> -Amit
>>>  
>>> -------------------
>>> diff -u mvapich.make.old mvapich.make
>>> --- mvapich.make.old    2006-05-02 15:32:11.000000000 +0530
>>> +++ mvapich.make        2006-05-02 15:33:37.000000000 +0530
>>> @@ -251,13 +251,13 @@
>>>                 DEF_BUILDID="$DEF_BUILDID"
>>>         fi
>>>  fi
>>> -if (/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'); then
>>> +if (test `/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'` -gt 0); then
>>>         # Arbel
>>>         CFLAGS="$CFLAGS -D_PCI_EX_"
>>> -elif (/sbin/lspci -n | grep -E "15b3:5e8c|15b3:6274" | wc -l | tr -d '[:space:]'); then
>>> +elif (test `/sbin/lspci -n | grep -E "15b3:5e8c|15b3:6274" | wc -l | tr -d '[:space:]'` -gt 0); then
>>>         # Sinai
>>>         CFLAGS="$CFLAGS -D_PCI_EX_"
>>> -elif (/sbin/lspci -n | grep -E "15b3:5a44|15b3:6278" | wc -l | tr -d '[:space:]'); then
>>> +elif (test `/sbin/lspci -n | grep -E "15b3:5a44|15b3:6278" | wc -l | tr -d '[:space:]'` -gt 0); then
>>>         # Tavor
>>>         CFLAGS="$CFLAGS -D_PCI_X_"
>>>  fi
>>> -----------
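The same detection can also be written with grep -q, whose exit status is the match result itself, so no counting or numeric test is needed. This is a sketch, not what mvapich.make ships: pci_flag_from_lspci is a hypothetical helper that reads lspci -n output on stdin, and the device IDs and their grouping are copied unchanged from the patch (including 15b3:6278, whose classification is the open question in this message).

```shell
#!/bin/sh
# pci_flag_from_lspci: hypothetical helper, not part of mvapich.make.
# Reads `lspci -n` output on stdin and prints the matching define.
# Device IDs and grouping are copied from the patch in this message.
pci_flag_from_lspci() {
    devices=$(cat)
    if printf '%s\n' "$devices" | grep -q "15b3:6282"; then
        printf '%s\n' "-D_PCI_EX_"        # Arbel
    elif printf '%s\n' "$devices" | grep -qE "15b3:5e8c|15b3:6274"; then
        printf '%s\n' "-D_PCI_EX_"        # Sinai
    elif printf '%s\n' "$devices" | grep -qE "15b3:5a44|15b3:6278"; then
        printf '%s\n' "-D_PCI_X_"         # Tavor (6278 grouping as in the patch)
    fi
}
```

In mvapich.make it would be used along the lines of CFLAGS="$CFLAGS $(/sbin/lspci -n | pci_flag_from_lspci)"; when no Mellanox HCA is found, nothing is appended.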
>>>  
>>>  
>>>
>>>
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> openfabrics-ewg mailing list
>>> openfabrics-ewg at openib.org
>>> http://openib.org/mailman/listinfo/openfabrics-ewg
>>



