[openib-general] Re: RFC: ipath ioctls and their replacements

Bryan O'Sullivan bos at pathscale.com
Thu Jan 19 08:29:18 PST 2006


On Thu, 2006-01-19 at 01:25 -0700, Eric W. Biederman wrote:

> Do the infiniband verbs not allow dealing with a unreliable datagram
> protocol?

Eric, I think you are misunderstanding what we are actually trying to
do.  We already implement IB verbs and the various IB networking
protocols in our drivers, at a layer that is not at all related to the
one that is currently festooned with ioctls.

The ioctl discussion pertains to lower-level direct user access to the
hardware, for a protocol that bypasses the entire IB stack and just
happens to send UD-compliant datagrams over the wire.

I'm actually pretty satisfied with the feedback I've already gotten from
Greg K-H and davem.

> We need some generic subsystem support to do this.

I am more than happy to put together generic support, provided I see
other drivers that could take advantage of it being considered for
submission.  Right now, I do not - in general - see this happening.

I know that some other drivers need to do user page pinning, and I'm
happy to try to find a generic solution that is common to IB and drivers
unrelated to IB.

> >         RCVCTRL enables/disables receipt of packets.
> 
> Again this is a generic problem, and the generic interfaces are broken
> if you can't do this.

The SIOCSIFFLAGS ioctl, which I assume is the generic interface you
refer to (it's the one used by iproute, at any rate), has poor overlap
with what we need (it supports a pile of stuff that we don't care about,
and we require a pile of stuff it doesn't support), and I don't feel
inclined to try using it in any case.
      
> >         SET_PKEY sets a partition key, essentially telling hardware
> >         which packets are interesting to userspace.
> 
> I'm pretty certain this should be something that should be set
> at open time.

It might be possible to make it fit into whatever replaces USERINIT, or
else we can use a netlink message of its own.

> >         UPDM_TID and FREE_TID are used for RDMA context management.
> >
> >         WAIT waits for incoming packets, and can clearly be replaced by
> >         file_ops->poll.
> >         
> >         GETCOUNTERS, GETUNITCOUNTERS and GETSTATS can all be replaced by
> >         files in sysfs.
> 
> This whole section just cries out for a network/rdma/ib/kernel-by-pass
> layer that is that any interesting network driver can use.

No, it doesn't.  Our chip's approach to remote memory access doesn't
even slightly resemble that of other comparable chips.  In addition, our
counters are entirely device-specific, and I'm already planning to move
them to sysfs.  The sysfs move gets them out of ioctl-land, and there's
no point in trying to do anything beyond that.

> Infiniband stack, it's there use it.

No.  If you're running a full IB stack, we provide the usual IB subnet
management facilities, and you can run OpenSM to manage your subnet.  If
you're *not*, which is the case I'm concerned with here, it makes no
sense to replicate the byzantine IB management interfaces in order to do
a handful of simple things that aren't even tied to the higher-level IB
protocols.

> There are a couple of choices here.

Yes, we'll use the firmware interface, as Greg suggested.

> There is also an interface in /proc or /sys I forget which
> that let's you select the individual bar for a pci device.

Yes, we'll use that.

Thanks for your comments.

	<b




More information about the general mailing list