[openib-general] [DAPL] ran kdapl test, got slab corruption
Hal Rosenstock
halr at voltaire.com
Fri Apr 29 03:31:04 PDT 2005
On Thu, 2005-04-28 at 18:31, Tom Duffy wrote:
> On Fri, 2005-04-29 at 00:06 +0300, Itamar Rabenstein wrote:
> > I think the problem is related to the use of double in the kernel.
> > On the i386 arch we need to add this to the makefile:
> > ifeq (${IS_i686},i686)
> > # Override -msoft-float in arch/i386/Makefile
> > EXTRA_CFLAGS += -mhard-float
> > endif
> >
> > I am not sure that you have a flag like this.
> > I am working now on a new version of kdapltest without any use of doubles
> > in the kernel.
> > I think it will be ready early next week.
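One common way to drop the doubles is scaled integer arithmetic. A minimal
userspace sketch follows, assuming the values in question are throughput
figures (what kdapltest actually computes has not been checked here, and the
function names are hypothetical). In the kernel, a 64-bit divide on a 32-bit
arch would normally go through do_div() rather than the plain / operator.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: throughput in units of 0.01 MB/s (1 MB = 10^6 bytes),
 * using integer arithmetic only. bytes/usecs is bytes per microsecond, which
 * equals MB/s; scaling by 100 first keeps two decimal places.
 * Note: bytes * 100 can overflow for bytes > UINT64_MAX / 100.
 */
static uint64_t rate_centi_mbps(uint64_t bytes, uint64_t usecs)
{
    if (usecs == 0)
        return 0;               /* avoid dividing by zero */
    return (bytes * 100) / usecs;
}

/* Format the scaled value as "N.NN MB/s" without any floating point. */
static int format_rate(char *buf, size_t len, uint64_t bytes, uint64_t usecs)
{
    uint64_t r = rate_centi_mbps(bytes, usecs);

    assert(buf != NULL);
    return snprintf(buf, len, "%" PRIu64 ".%02" PRIu64 " MB/s",
                    r / 100, r % 100);
}
```

The result is truncated rather than rounded, which is usually acceptable for
test statistics.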
>
> If I add:
>
> EXTRA_CFLAGS += -msse
>
> I can compile on x86_64.
I don't need to do this on x86_64. Not sure why. Could this be due to a
compiler difference? I am using gcc version 3.4.2 20041017 (Red Hat
3.4.2-6.fc3). I also build "in tree" rather than out of tree.
> Now that I run the test, I get the following:
>
> [root@flopteron2 ~]# ./kdapltest -T S -D mthca0a -d
> Server_Cmd.debug: 1
> Server_Cmd.dapl_name: mthca0a
> DT_cs_Server: IA mthca0a opened
> DT_cs_Server: PZ created
> DT_cs_Server: EP created
> DT_cs_Server: PSP created
> ***** DAPL Characteristics *****
> Provider: mthca0a Version 1.0 DAPL 1.2
> Adapter: Generic InfiniBand HCA by DAPL Reference Implementation Version 0.0
> Supporting:
> 64512 EPs with 65535 DTOs and 0 RDMA/RDs each
> 65408 EVDs of up to 65535 entries (default S/R size is 16/16)
> IOVs of up to 59 elements
> 131056 LMRs (and 131056 RMRs) of up to 0xffffffffffffffff bytes
> Maximum MTU 0x80000000 bytes, RDMA 0x80000000 bytes
> Maximum Private data size 92 bytes
> ***** ***** ***** ***** ***** *****
> DT_cs_Server: Posting 2 recvs
> Dapltest: Service Point Ready - mthca0a
> DT_cs_Server: Waiting for Connection Request
> DT_cs_Server: Accepting Connection Request
> DT_cs_Server: Awaiting connection ...
> DT_cs_Server: Connected!
> DAT_STATE: DAT_EP_STATE_CONNECTED
> DAT_STATE: Inbound DTO Status: Active
> DAT_STATE: Outbound DTO Status: Idle
> DT_cs_Server: Waiting for Client_Info
> DT_cs_Server: Got Client_Info
> DT_cs_Server: Waiting for Client_Cmd_Info
> DT_cs_Server: Send Server_Info
> Client Requests Server to Quit
> DT_cs_Server: Waiting for clients to all go away...
> DT_cs_Server: Cleaning up ...
> DT_cs_Server: IA mthca0a closed
> DT_cs_Server (mthca0a): Exiting.
> TEST INSTANCE 1
>
> [root@sins-stinger-10 ~]# ./kdapltest -T Q -s 192.168.0.26 -D mthca0a -d
> Server Name: 192.168.0.26
> Server Net Address: 192.168.0.26
> DT_cs_Client: Starting Test ...
> DT_cs_Client: IA mthca0a opened
> DT_cs_Client: EP created
> ***** DAPL Characteristics *****
> Provider: mthca0a Version 1.0 DAPL 1.2
> Adapter: Generic InfiniBand HCA by DAPL Reference Implementation Version 0.0
> Supporting:
> 64512 EPs with 65535 DTOs and 0 RDMA/RDs each
> 65408 EVDs of up to 65535 entries (default S/R size is 16/16)
> IOVs of up to 28 elements
> 131056 LMRs (and 131056 RMRs) of up to 0xffffffffffffffff bytes
> Maximum MTU 0x80000000 bytes, RDMA 0x80000000 bytes
> Maximum Private data size 92 bytes
> ***** ***** ***** ***** ***** *****
> DT_cs_Client: Posting 1 recv buffer
> DT_cs_Client: Connect Endpoint
> DT_cs_Client: Await connection ...
> DT_cs_Client: Connected!
> DAT_STATE: DAT_EP_STATE_CONNECTED
> DAT_STATE: Inbound DTO Status: Active
> DAT_STATE: Outbound DTO Status: Idle
> DT_cs_Client: Sending Client_Info
> DT_cs_Client: Sent Client_Info - awaiting completion
> DT_cs_Client: Sending Command
> DT_cs_Client: Sent Command - awaiting completion
> DT_cs_Client: Waiting for Server_Info
> DT_cs_Client: Server_Info Received
> DT_cs_Client: Version OK!
> -------------------------------------
> Server_Info.dapltest_version : 6
> Server_Info.is_little_endian : 1
> -------------------------------------
> Client_Info.dapltest_version : 6
> Client_Info.is_little_endian : 1
> Client_Info.test_type : 4
> Quit_Cmd.server_name: 192.168.0.26
> Quit_Cmd.device_name: mthca0a
> DT_cs_Client: Cleaning Up ...
> DT_cs_Client: IA mthca0a closed
> DT_cs_Client: ========== End of Work -- Client Exiting
> TEST INSTANCE 1
>
> Unfortunately, I get this error in dmesg:
>
> dapl_ib_disconnect_clean: ep_ptr 0xffff81003f085320 has invalid CM handle
Do you know whether the DREQ was actually sent on IB?
In any case, that message is really a debug message and can be eliminated.
> Also, this is bad: slab corruption
>
> Slab corruption: start=ffff810077455eb8, len=312
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<ffffffff882c8252>](req_comp_work+0x42/0x90 [ib_at])
> 050: 6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 00 00 00 00
> 120: 6b 6b 6b 6b 6b 6b 6b 6b 00 00 00 00 00 00 00 00
> Prev obj: start=ffff810077455d68, len=312
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<0000000000000000>](0x0)
> 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> Slab corruption: start=ffff81003aecbdb0, len=288
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<ffffffff882c8264>](req_comp_work+0x54/0x90 [ib_at])
> 040: 00 00 00 00 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b
> 110: 00 00 00 00 00 00 00 00 6b 6b 6b 6b 6b 6b 6b a5
> Prev obj: start=ffff81003aecbc78, len=288
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<0000000000000000>](0x0)
> 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Is this associated with the above (quit test)? Can this be reproduced?
I will inspect the code to see how this could occur. Is it between 2
x86_64 machines?
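For reading the dump: with slab debugging enabled, a freed object is filled
with 0x6b and its last byte is set to 0xa5 (POISON_FREE and POISON_END in the
kernel source). The 00 bytes in the dumps above therefore look like writes to
objects after they were freed. A minimal userspace sketch of that check
follows; the function names are hypothetical and this is not the kernel's own
checker.

```c
#include <assert.h>
#include <stddef.h>

#define POISON_FREE 0x6b   /* fill byte for a freed object */
#define POISON_END  0xa5   /* last byte of a freed object */

/* Fill a buffer the way a freed, poisoned object is expected to look. */
static void fill_poison(unsigned char *obj, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        obj[i] = (i == len - 1) ? POISON_END : POISON_FREE;
}

/*
 * Count bytes that no longer hold the poison pattern. If any are found and
 * 'first' is non-NULL, the offset of the first clobbered byte is stored there.
 */
static size_t count_clobbered(const unsigned char *obj, size_t len,
                              size_t *first)
{
    size_t i, bad = 0;

    for (i = 0; i < len; i++) {
        unsigned char want = (i == len - 1) ? POISON_END : POISON_FREE;

        if (obj[i] != want) {
            if (bad == 0 && first != NULL)
                *first = i;
            bad++;
        }
    }
    return bad;
}
```

Applied to the first object above (len=312), the zeroed ranges are offsets
0x58-0x5f and 0x128-0x12f, that is, two 8-byte stores into freed memory.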
-- Hal
> -tduffy