[ofa-general] using SDP for block device traffic: several problems
Lars Ellenberg
lars.ellenberg at linbit.com
Wed Jul 1 06:36:52 PDT 2009
On Wed, Jul 01, 2009 at 04:02:17PM +0300, Amir Vadai wrote:
Subject: Re: [patch] fix SDP page leak in sdp_bz_cleanup
In-Reply-To: <4A4B5E59.2030001 at mellanox.co.il>
> Hi Lars,
>
> This is the right place for posting patches.
>
> I will commit it ASAP into both branches.
Thanks for that one.
Now, let me summarize some other findings.
== off-by-one error, data corruption ==
I think that "sometimes" you lose the last byte of a fragment.
Situation: multi-core, mlx4_ib driver,
IPoIB configured, SDP configured, more details on request ;)
Run large-message traffic on several streaming sockets
at the same time, using as much bandwidth as possible,
some on IPoIB, some on SDP.
"sometimes" (typically within a couple of minutes), when receiving the
stream, the last byte of some fragment is missing, or replaced by the
first byte of the next fragment (if any).
This has been noticed when using SDP from kernel space (for DRBD),
and reproduced in userland.
I will provide two simple perl scripts (server and client) today or
tomorrow, so you should be able to reproduce this yourself in userland.
It does not occur (within my patience time span) if there is not much
load, or if I only use one stream, or even if I only use SDP (and not
simultaneously also IPoIB streams). It only happens on SDP streams.
I'm not sure if this off-by-one happens during send or recv.
I'm open to suggestions to aid in tracking it down.
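To tell the two symptoms apart (byte missing vs. byte replaced by the
next one), a position-dependent payload helps. The following is only a
minimal Python sketch of such a check, not the perl scripts mentioned
above; the names pattern, first_mismatch and classify, and the period
of 251, are made up for illustration. It assumes the sender writes
pattern(0, N) onto the stream and the receiver collects everything it
reads into one bytes object.

```python
# Hypothetical helper names; nothing here is part of SDP or DRBD.
PERIOD = 251  # prime, so the pattern does not line up with power-of-two buffer sizes


def pattern(offset, length):
    """Return `length` payload bytes for stream positions offset..offset+length-1."""
    return bytes(i % PERIOD for i in range(offset, offset + length))


def first_mismatch(received):
    """Return the first stream position whose byte is wrong, or None if all match."""
    expected = pattern(0, len(received))
    for pos, (got, want) in enumerate(zip(received, expected)):
        if got != want:
            return pos
    return None


def classify(received):
    """Classify the first corruption in `received` as missing/replaced/other."""
    pos = first_mismatch(received)
    if pos is None:
        return ("ok", None)
    # Byte dropped: everything from `pos` on is shifted forward by one position.
    if received[pos:] == pattern(pos + 1, len(received) - pos):
        return ("missing", pos)
    # Byte overwritten: only position `pos` is wrong, the rest is still aligned.
    if received[pos + 1:] == pattern(pos + 1, len(received) - pos - 1):
        return ("replaced", pos)
    return ("other", pos)
```

If the very last received byte is the wrong one, the two cases cannot be
distinguished and the sketch reports "missing". Logging the reported
position together with the sizes of the individual send and recv calls
might help to see whether it lines up with a fragment boundary.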
== module count imbalance ==
After modprobe, the module usage count of ib_sdp is 0, as it should be.
Starting to use it with some streaming sockets, the module count goes up.
Once the streams start disconnecting, being interrupted from the other
side, reconnecting and similar stuff, the module count quickly drops below
zero, manifesting in lsmod showing a module count of 4.2 million ;)
I'm still trying to track this down, I'm not yet sure if it is a double
module_put, or a missing (try_)module_get ...
more when I find more.
Cheers,
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.