[ofw] RoCE branch merge into the trunk?

Smith, Stan stan.smith at intel.com
Tue Oct 25 11:09:53 PDT 2011


Leo,
  Here are the facts:

1) winOFED has been waiting 6 months for the RoCE code drop in order to make a winOFED release. In April the WWG agreed to hold the winOFED release in order to include RoCE support.
   In April Mellanox claimed a code drop in May, in May the claim was June, in June the claim was August, maybe September; the drop arrived in October.

2) Because Mellanox is so busy, the WWG gave you a special no-patches-required pass, since you had so many changes that you could not spare the time to make patches for all of them.  Mellanox then dropped an SVN branch of RoCE mods not based on the current SVN trunk head.

3) Three weeks later, after reviews by Sean, Fab and myself, you get around to testing your branch for RoCE and discover it does not work.

4) Instead of working from the existing common RoCE branch (updated to svn trunk head) and root-causing the problem, because you're so busy, you dump another RoCE branch (mlx4_full) in which 203 files differ from the existing common RoCE branch, including additional source changes in areas which have nothing to do with RoCE (IPoIB).

Because you created another new branch, all the time Sean, Fab and I spent on code review and testing is now wasted; we have to restart the review/test process with the latest RoCE branch (mlx4_full).

5) Did I mention the new RoCE drop does NOT work for InfiniHost cards?  For ConnectX-2 cards, mlx4_full does install, with initial testing showing base IB functionality.
   According to the System Event log entries, the mthca driver claims to have loaded correctly; Windows, on the other hand, claims the device never started.

I'm not trying to diminish all the work you did; it's just that the WWG trusted Mellanox to deliver on what they claimed they would do, and here we are 6 months later flopping around like a fish out of water.
No complete RoCE support and no winOFED release.

How would you react if you were in our position?

Stan.

>-----Original Message-----
>From: Leonid Keller [mailto:leonid at mellanox.com]
>Sent: Tuesday, October 25, 2011 3:07 AM
>To: Smith, Stan; Tzachi Dar; Hefty, Sean; 'Fab Tillier'
>Cc: Uri Habusha; Leonid Keller
>Subject: RE: RoCE branch merge into the trunk?
>
>Hi Stan,
>
>I'm sorry, but I can't agree with some of your statements below.
>I never said that I *had* finished the merge with the \mlx4 branch.
>I never said that it contains all the RoCE code and is ready to be merged into trunk.
>I could have worked in a local branch and published all the work after it was done (which is what I did with \mlx4_full).
>But *WE* decided on another way, we - the community and Mellanox.
>We decided that I would publish the changes step by step to make it possible for you to review them.
>And I *strongly disagree* with your statement that you wasted your time.
>No! All your comments are very valuable, and they are not about some irrelevant code.
>The whole mlx4 branch is inside mlx4_full; mlx4_full just contains additional changes.
>
>I've got a feeling that I failed to explain myself well.
>I want to try once more, briefly.
>
>I did the full merge in *another* branch to make it possible for OFED to provide a new version as fast as possible.
>It comes at the price of adding some non-agreed-upon but working code, which is going to be reviewed, discussed and changed after our and
>your releases.
>That's one possible way.
>
>Another one is to postpone the OFED release and continue the work on the \mlx4 branch,
>i.e., to add the missing stuff, discuss all the patches, change them after reaching consensus, then debug and test.
>This way is better in that it leaves only agreed-upon code, but it is more time-consuming.
>
>We want to discuss these ways, and any others you are ready to suggest, with you guys today or ASAP.
>
>We *want* to see RoCE in OFED and we *want* to have the same code base as Open Source; the question is how we can reach these
>goals with our existing resources and our other obligations.
>
>
>-----Original Message-----
>From: Smith, Stan [mailto:stan.smith at intel.com]
>Sent: Tuesday, October 25, 2011 7:01 AM
>To: Leonid Keller; Tzachi Dar; Hefty, Sean; 'Fab Tillier'
>Cc: Uri Habusha
>Subject: RE: RoCE branch merge into the trunk?
>
>Hello Mellanox,
>  It's very disappointing that the code you placed in branches\mlx4 was never tested for RoCE, the entire point of the effort!
>Bottom line: Sean and I wasted a large amount of time reviewing and testing what you claimed to be valid RoCE code.
>I understand mistakes happen, but this is unacceptable.
>
>So it's back to the beginning on the RoCE changes review plus testing.
>
>Consequently I see no reason to meet Tuesday to discuss the merge as Sean, Fab and I have had no time to review this code base.
>
>I will be in touch when we have something constructive to say about the situation.
>
>Stan.
>
>
>
>>-----Original Message-----
>>From: Leonid Keller [mailto:leonid at mellanox.com]
>>Sent: Monday, October 24, 2011 5:55 PM
>>To: Smith, Stan; Tzachi Dar
>>Cc: Uri Habusha; Leonid Keller
>>Subject: RE: RoCE branch merge into the trunk?
>>
>>Hi,
>>
>>Before answering your question I have to make some explanations.
>>
>>My first intention was to include in the mlx4 branch the new \hw drivers and a small number of patches relevant to RoCE and FDR.
>>It turned out that these patches are not so "small-numbered" and, what is much worse, they are buried in a sea of other changes
>>made for the sake of new chips, the Ethernet driver and debugging instrumentation.
>>The patches that I managed to add to mlx4 were not enough for RoCE to work; it seems I missed something.
>>I compared the mlx4 branch with our tree and saw 200+ differing files.
>>It looked very time-consuming either to look for the missed patches or to debug the \mlx4 branch.
>>On the other hand, even the several patches to \core that I had entered met with a lot of comments and disagreements.
>>
>>That's why I made a decision that meets the following requirements:
>>- provide OFED, as fast as possible, with a workable version of the stack that supports RoCE and FDR;
>>- sync the Mellanox and OFED trees as much as possible in order to keep them in sync from now on.
>>
>>As for the latter, it's clear to me that a lot of the changes will meet with questions, disagreements, etc.
>>But we have to prepare a release in a month, and so does OFED.
>>My idea was to postpone all the discussions till after the release.
>>*We are going to discuss this whole issue with you tomorrow.*
>>
>>Now - to your question.
>>I didn't touch the mlx4 branch, as you can see. It supports IB, but not RoCE.
>>
>>Instead, I've created a new branch, mlx4_full, based on mlx4:3311.
>>Then I merged most of the Mellanox tree into it in one large commit, 3316.
>>The result is that you have a version (3316) that supports RoCE and FDR.
>>It is not tested enough yet, but I saw all the perf tests (ib_xxx_bw, ibv_xxx_bw) and ND tests (over both ND providers) running over RoCE.
>>
>>To make it possible to check RoCE ASAP, I've included in this revision a temporary folder with Ethernet driver installation files
>>(see hw\eth) and even an MSI built from this revision.
>>(So if you have a Win7/x64 setup, you can install RoCE in 5 minutes. :) )
>>
>>BTW, you'll notice that my MSI has revision 3313.
>>That's because I hadn't noticed your patches 3313-3315 in \mlx4.
>>They are not included in 3316 yet.
>>
>>
>>
>>
>>-----Original Message-----
>>From: Smith, Stan [mailto:stan.smith at intel.com]
>>Sent: Friday, October 21, 2011 10:45 PM
>>To: Leonid Keller; Tzachi Dar
>>Cc: Uri Habusha
>>Subject: RoCE branch merge into the trunk?
>>
>>Hello,
>>  When do you plan on merging branches\mlx4 --> Trunk\ ?
>>
>>Thanks,
>>
>>Stan.


