[openib-general] [RFC] determining which changes in svn to merge upstream or remove

Sean Hefty sean.hefty at intel.com
Tue Sep 26 01:12:17 PDT 2006


>BTW, there was a set of bugfix patches for CMA posted that didn't get acked or
>nacked yet.  They looked sane and I took them into ofed - could you take the
>time to review please? Should I repost?  It might make sense to put stability
>fixes in before adding more features.

I've actually been on vacation for 2 of the last 3 weeks, so haven't had a
chance to review recent patches.  I should get to them by the end of this week.

I want to ensure that whatever ends up being submitted upstream makes the most
sense, including pushing fixes before other changes.  This adds more work, which
is why I'd like to get svn more in sync with the kernel.

>>         - userspace support
>
>I think we agreed that this will use timewait support in
>core/low level drivers to handle timewait/stale packets right.
>Is that right? If so, I really need to find the time to do this.

I think so.  I don't see a clean alternative.  I would also propose that we
discourage connecting QPs outside of the IB CM to allow for detecting duplicate
connections.  We don't necessarily need to enforce this in code, but changing
test programs to at least comment that connecting over sockets is discouraged
may help.

>>         - multicast support
>>         - UD QP support (required for multicast)
>>         - IB specific options (set paths, CM timeouts)
>
>I think that at some point we agreed that at least the option to set
>retry count can be made generic (with a limit of 15 retries).
>This kind of makes sense since TCP sockets have SYN retry option.
>
>Wrt CM timeouts, asking the ULP to guess the timeout
>does not make much sense to me - how does the ULP know?
>IMO we need to implement a smarter heuristic that will set them
>automatically somehow. Is RDMA CM using all data from the path record
>query already? How about implementing exponential backoff? Other ideas?

My thoughts on the options are to try to hold off merging them upstream.  The
option to get/set path records needs to be reworked.  Getting paths should be
done outside of the RDMA CM, such as through a userspace SA, and the
user-to-kernel interface should pass the attribute values as defined by the spec
(i.e. in network order) to avoid marshalling issues.

The CM timeout/retry options are used by uDAPL, but the fix to increase the
default retry count to the maximum may help.  The RDMA CM uses the data from the
path record, but the ULP has the most data about how long the remote side might
take to respond to a CM REQ message (remote_cm_response_timeout).  We might be
able to have the RDMA CM make clever use of the MRA to avoid the issue, and even
in the short term, dumb use of the MRA may help.  (The issue is that as
connections are formed, they begin being used, which can greatly affect how
quickly new connections can be created.  We've seen them take up to 60 seconds.)

>
>> * Local SA cache
>
>This is supposed to reduce the load on the SM, but personally, I am still not
>convinced this is actually necessary - we are seeing gen2-based clusters
>running just fine without these tricks.
>
>What is more, this seems to break the model of an IB network as a centrally
>managed fabric, and a look at the code gives me the feeling no one thought
>through how this will interact with SM features such as QoS, balancing, Tavor
>MTU optimizations, etc.

This is a module that I'm not inclined to submit upstream.  It was requested as
part of the Path Forward work, but I haven't seen any feedback on its use or
performance.

>> * IB multicast module
>
>Last time I tested this, there still were crashes with the IPoIB.
>If there's a patch that adds just this change, I might be able to test it.
>OTOH, I'm still not sure why we are touching IPoIB at all since
>it seems unlikely any other ULP will want to share in the IPoIB mcast group.

Personally, I think ipoib should use it to reduce code duplication.  Declining
the change because there _might_ be a bug in the new module doesn't seem like
the right approach to take.  (Why accept changes in the HCA driver then?)  Plus
we continue to find bugs in the ipoib multicast code.  The main reason I
modified ipoib was so that ib_multicast had a real user that I could test with,
but there's nothing architecturally that prevents a process from joining the
ipoib multicast group (maybe to snoop traffic for some reason...).  I haven't
seen any crashes with ipoib using ib_multicast, and since my last fix, I haven't
seen any bug reports.

The patches for ipoib need to be regenerated because of changes since the svn
checkin.

- Sean
