[ofa-general] Re: [PATCH 2.6.30] xprtrdma: The frmr iova_start values are truncated by the nfs rdma client.

Steve Wise swise at opengridcomputing.com
Mon May 11 20:06:26 PDT 2009


Trond Myklebust wrote:
> On Mon, 2009-05-11 at 21:14 -0400, Tom Talpey wrote:
>   
>> At 08:44 PM 5/11/2009, Trond Myklebust wrote:
>>     
>>> On Mon, 2009-05-11 at 19:13 -0500, Steve Wise wrote:
>>>       
>>>> Trond Myklebust wrote:
>>>>         
>>>>> On Mon, 2009-05-11 at 17:25 -0500, Steve Wise wrote:
>>>>>   
>>>>>           
>>>>>> Hey Trond,
>>>>>>
>>>>>> Will this bug fix make 2.6.30?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Steve.
>>>>>>     
>>>>>>             
>>>>> Not in the form it is in now. As I've said earlier, I'm not happy about
>>>>> the sunrpc layer having to circumvent ordinary type checking on
>>>>> non-sunrpc structures.
>>>>>
>>>>> Cheers
>>>>>     Trond
>>>>>           
>>>> How is it circumventing?  It's currently incorrectly casting a pointer 
>>>> into a u64.  That seems just broken to me.  Also, its really the sunrpc 
>>>> rdma transport layer.  It deals specifically with rdma.  It _should_ 
>>>> know about rdma interfaces and types.
>>>>         
>>> The fact is that I'm simply not interested enough in rdma to tolerate
>>> hacks. If it isn't done cleanly, in a manner that I can maintain, then
>>> the whole transport layer comes out...
>>>       
>> I know exactly what you want - it's not what the code does now and
>> it's not an accessor function to set the hardware's u64 field. What's
>> needed is a new function to manage the entire RDMA triplet, and the
>> memory registration behind it, in the OFA code side. Put the hardware
>> goop below the line, IOW. I'll dust up Steve on this.
>>     
>
> This does indeed sound like what I'd looking for.
>
> There is a huge difference between having code that depends on well
> defined rdma interfaces, and code that depends on rdma hacks. A piece of
> code that requires casts from a non-local opaque type into another
> protocol-dependent non-local type will definitely fall in the latter
> category. I really don't care what the current code does, but a fix for
> that code is something that does it _correctly_; it is not yet another
> hack, whether or not it fixes a bug in the short term.
>
>    Trond
>
>   

Trond, I get your point, and we can certainly work on improving this 
with the rdma developer community.  But removing the one-line-broken 
cast will resolve a current crash situation for 2.6.30.  Can't we get 
this fix in 2.6.30 and work on the API improvements for 2.6.31?  I've 
CCed Roland and the ofa general list to get everyone involved in this 
thread so we can get this API design change going.

I agree we can clean this up moving forward, but lets fix the broken 
2.6.30 code.  

Will this work?

Steve.




More information about the general mailing list