[openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit

Pavel Shamis (Pasha) pasha at mellanox.co.il
Tue May 9 04:58:58 PDT 2006


Hi Amit,
I agree that SRQ_ENABLE/DISABLE should be a run-time option; that will be
the best solution.

We want to freeze the MVAPICH code by Wednesday (US time), so if you are
able to prepare the patch before the code freeze, I will integrate it into
the current OFED release.

Regards,
Pasha.

Amit Mehrotra (amehrotr) wrote:
>  Hi Pavel,
> 
> Having PCI_EX as the default is a good idea. The only issue is that
> Sayantan (OSU) reported lower latency numbers with SRQ on PCI_X. I
> propose that we add a new tunable parameter, e.g. VIADEV_SRQ_ENABLE,
> which will allow the user to choose whether to run with or without SRQ.
> I looked through the code, and implementing it won't be difficult. I can
> generate a patch for it if required.
> 
> -Amit
> 
> -----Original Message-----
> From: Pavel Shamis (Pasha) [mailto:pasha at mellanox.co.il] 
> Sent: Monday, May 08, 2006 2:52 PM
> To: Amit Mehrotra (amehrotr)
> Cc: Amit Krig; Openfabrics-ewg at openib.org
> Subject: Re: [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit
> 
> Hi Amit,
> Please see my answers below:
> 
> Amit Mehrotra (amehrotr) wrote:
>> Hi Amit,
>>  
>> A manual firmware upgrade fixed the problem. I had incorrectly
>> expected the IBED installer to flash the correct firmware
>> automatically.
>>  
>> The code is still being compiled with PCI_EX; the only impact is that
>> the vbuf size (roughly, the packet size) calculation in MVAPICH becomes
>> incorrect, because MVAPICH calculates the vbuf size based on whether
>> the card is PCI_X or PCI_EX.
>>  
>> I believe MVAPICH needs to be changed so that it is compiled with SRQ
>> support when it is compiled for PCI_X, and the mvapich.make script
>> needs to be fixed so that it correctly identifies the IB card as PCI_X.
> 
> I want to remove the HCA auto-detection from MVAPICH. By default I will
> use PCI_EX, and I will add a default parameter file that includes vbuf
> size tunings for different platforms. All lines will be commented out
> by default, and the user will be responsible for uncommenting the
> tunings he wants to use.
> 
> Regards,
> Pasha.
> 
>>  
>> -Amit
>>  
>>
>> ------------------------------------------------------------------------
>> *From:* Amit Krig [mailto:amitk at mellanox.co.il]
>> *Sent:* Thursday, May 04, 2006 5:48 PM
>> *To:* Amit Mehrotra (amehrotr); Openfabrics-ewg at openib.org
>> *Subject:* RE: [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit
>>
>> Hi Amit,
>>  
>> Please note that the SRQ limit event is supported starting from FW
>> version 3.4.0.
>>
>> Can you specify the FW version you are using, the SRQ size, the number
>> of outstanding WRs in the SRQ when the modify SRQ command was executed,
>> the requested SRQ limit value, and the QP transport type?
>>
>>  
>>
>> Amit
>>
>>  
>>
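Because the SRQ limit event needs firmware 3.4.0 or later, a build or install script could check the firmware version before enabling SRQ. The sketch below assumes a dotted `major.minor.patch` version string and assumes the 3.4.0 threshold applies to the firmware line of the PCI-X card discussed in this thread (version numbering may differ between HCA families). The function name `fw_supports_srq_limit` is hypothetical and not part of MVAPICH or OFED.

```shell
#!/bin/sh
# Hypothetical helper (not part of MVAPICH or OFED): exits 0 if the dotted
# firmware version in $1 is at least 3.4.0, the version from which the
# SRQ limit event is said to be supported. Missing fields count as 0.
fw_supports_srq_limit() {
    printf '%s\n' "$1" | awk -F. '{
        req[1] = 3; req[2] = 4; req[3] = 0   # required major, minor, patch
        for (i = 1; i <= 3; i++) {
            v = ($i == "") ? 0 : $i + 0      # treat missing fields as 0
            if (v > req[i]) exit 0           # newer than required
            if (v < req[i]) exit 1           # older than required
        }
        exit 0                               # exactly 3.4.0
    }'
}

# Example: in practice the version string might come from the fw_ver field
# printed by ibv_devinfo; here it is hard-coded for illustration.
if fw_supports_srq_limit "3.3.3"; then
    echo "firmware supports the SRQ limit event"
else
    echo "firmware older than 3.4.0: upgrade it before enabling SRQ"
fi
```

The comparison is done field by field as integers rather than as strings, so a version such as 3.10.0 is correctly treated as newer than 3.4.0.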
>> ------------------------------------------------------------------------
>> *From:* openfabrics-ewg-bounces at openib.org
>> [mailto:openfabrics-ewg-bounces at openib.org] *On Behalf Of* Amit
>> Mehrotra (amehrotr)
>> *Sent:* Tuesday, May 02, 2006 2:36 PM
>> *To:* Openfabrics-ewg at openib.org
>> *Subject:* [openfabrics-ewg] MVAPICH on PCI-X fails with [0] Abort: Couldn't modify SRQ limit
>>
>> Configuration: RHEL4U3, ia32, rc3, PCI-X
>>  
>> I have been seeing the following error when I try to run the MVAPICH
>> test programs:
>> ----
>> [0] Abort: Couldn't modify SRQ limit
>>  at line 999 in file viainit.c
>> -----
>>  
>> On debugging the issue, it seems that MVAPICH is being incorrectly
>> compiled for PCI_EX cards rather than PCI_X cards. From the MPI code it
>> seems that PCI-X cards do not support modification of SRQs. The source
>> of the problem is a bug in the mvapich.make script (a new IBED
>> addition), which always detects the card as PCI_EX. I have appended a
>> diff with the fix. I am not sure how to generate a proper patch, since
>> the whole of MVAPICH is shipped as a zipped tarball.
>>  
>> There seems to be one more issue in the script where it deviates from
>> the MVAPICH build: the script treats the older PCI_EX cards (cards with
>> the lspci signature 15b3:6278) as PCI-X. Was this done because these
>> cards also don't support resizing SRQs?
>>  
>> -Amit
>>  
>> -------------------
>> diff -u mvapich.make.old mvapich.make
>> --- mvapich.make.old    2006-05-02 15:32:11.000000000 +0530
>> +++ mvapich.make        2006-05-02 15:33:37.000000000 +0530
>> @@ -251,13 +251,13 @@
>>                 DEF_BUILDID="$DEF_BUILDID"
>>         fi
>>  fi
>> -if (/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'); then
>> +if (test `/sbin/lspci -n | grep "15b3:6282" | wc -l | tr -d '[:space:]'` -gt 0); then
>>         # Arbel
>>         CFLAGS="$CFLAGS -D_PCI_EX_"
>> -elif (/sbin/lspci -n | grep -E "15b3:5e8c|15b3:6274" | wc -l | tr -d '[:space:]'); then
>> +elif (test `/sbin/lspci -n | grep -E "15b3:5e8c|15b3:6274" | wc -l | tr -d '[:space:]'` -gt 0); then
>>         # Sinai
>>         CFLAGS="$CFLAGS -D_PCI_EX_"
>> -elif (/sbin/lspci -n | grep -E "15b3:5a44|15b3:6278" | wc -l | tr -d '[:space:]'); then
>> +elif (test `/sbin/lspci -n | grep -E "15b3:5a44|15b3:6278" | wc -l | tr -d '[:space:]'` -gt 0); then
>>         # Tavor
>>         CFLAGS="$CFLAGS -D_PCI_X_"
>>  fi
>> -----------
>>  
>>  
>>
>>
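Why the original test always picked PCI_EX: `if (pipeline)` branches on the pipeline's exit status, which (without `pipefail`) is the exit status of its last command, `tr`. `tr` succeeds whether or not `grep` matched anything, so the Arbel branch was always taken. The patch fixes this by comparing the match count with `test ... -gt 0`. A somewhat simpler alternative, sketched here under the assumption that the script runs under a POSIX `sh`, is to branch on `grep -q` directly, since `grep` exits 0 only when it finds a match. The function name `detect_hca_flag` is hypothetical. It reads `lspci -n` output on standard input so it can be tried without the hardware, and it keeps the diff's device-ID-to-flag mapping unchanged (including 15b3:6278 as PCI-X, which the message questions).

```shell
#!/bin/sh
# Hypothetical rewrite of the HCA detection in mvapich.make using grep -q.
# Reads "lspci -n" output on stdin and prints the CFLAGS define to use;
# prints nothing if no known Mellanox HCA ID is present.
detect_hca_flag() {
    # Buffer stdin once so it can be searched several times.
    devices=$(cat)
    if printf '%s\n' "$devices" | grep -q "15b3:6282"; then
        # Arbel
        printf '%s\n' "-D_PCI_EX_"
    elif printf '%s\n' "$devices" | grep -qE "15b3:5e8c|15b3:6274"; then
        # Sinai
        printf '%s\n' "-D_PCI_EX_"
    elif printf '%s\n' "$devices" | grep -qE "15b3:5a44|15b3:6278"; then
        # Tavor
        printf '%s\n' "-D_PCI_X_"
    fi
}

# Real use (requires lspci):
#   CFLAGS="$CFLAGS $(/sbin/lspci -n | detect_hca_flag)"
# Example with an illustrative lspci -n line for a PCI-X (Tavor) HCA:
printf '02:01.0 0c06: 15b3:5a44 (rev a1)\n' | detect_hca_flag
```

Fed the sample Tavor line, the example prints `-D_PCI_X_`. If none of the listed IDs is present, nothing is printed and no flag is added, matching the fall-through of the original script.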
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> openfabrics-ewg mailing list
>> openfabrics-ewg at openib.org
>> http://openib.org/mailman/listinfo/openfabrics-ewg
> 