[openib-general] [PATCH] IB/core - ib_umad can cause address alignment fault on ia64

John W. Marland jwm at prairieinet.net
Thu Jan 18 09:53:35 PST 2007


Michael S. Tsirkin wrote:

>>Quoting John W. Marland <jwm at prairieinet.net>:
>>Subject: Re: [PATCH] IB/core - ib_umad can cause address alignment fault on ia64
>>
>>Michael S. Tsirkin wrote:
>>
>>    
>>
>>>>Quoting Ralph Campbell <ralph.campbell at qlogic.com>:
>>>>Subject: [PATCH] IB/core - ib_umad can cause address alignment fault on ia64
>>>>
>>>>IB/core - ib_umad can cause address alignment fault
>>>>
>>>>In user_mad.c, the definition for struct ib_umad_packet includes
>>>>struct ib_user_mad at an odd 32-bit offset.  When ib_umad_write()
>>>>tries to assign rmpp_mad->mad_hdr.tid, there is an alignment fault on
>>>>architectures which have strict alignment for load/stores.
>>>>This patch fixes the problem by changing the offset on which
>>>>struct ib_user_mad is defined within struct ib_umad_packet.
>>>>
>>>>Thanks go to John W. Marland <jwm at prairieinet.net> for finding this.
>>>>
>>>>Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
>>>>
>>>>diff -r b1128b48dc99 drivers/infiniband/core/user_mad.c
>>>>--- a/drivers/infiniband/core/user_mad.c	Fri Jan 12 20:00:03 2007 +0000
>>>>+++ b/drivers/infiniband/core/user_mad.c	Wed Jan 17 14:09:37 2007 -0800
>>>>@@ -125,7 +125,7 @@ struct ib_umad_packet {
>>>>	struct ib_mad_send_buf *msg;
>>>>	struct ib_mad_recv_wc  *recv_wc;
>>>>	struct list_head   list;
>>>>-	int		   length;
>>>>+	long		   length;
>>>>	struct ib_user_mad mad;
>>>>};
>>>>   
>>>>
>>>>        
>>>>
>>>This does not make sense to me - do we have to replace all int fields with long
>>>now? Looks like a compiler or makefile bug in your setup - struct fields should
>>>be naturally aligned.
>>>
>>> 
>>>
>>>      
>>>
>>    We should probably have given a more complete explanation. The 
>>unaligned access hits in two places, that I've tracked down so far.
>>    The one where it's easiest to see what's happening is in ib_umad_write.
>>______________________________________________________________________________________
>>       if (!ib_response_mad(packet->msg->mad)) {
>>                tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid;
>>                *tid = cpu_to_be64(((u64) agent->hi_tid) << 32 |
>>                                   (be64_to_cpup(tid) & 0xffffffff));
>>
>>---> this line causes the access problem               
>>rmpp_mad->mad_hdr.tid = *tid;
>>        }
>>________________________________________________________________________________________
>>    The rmpp_mad variable is an ib_rmpp_mad pointer that is initialized 
>>    from the packet->mad.data early in the function.
>>    Because the ib_umad_packet structure has a as it's last element an 
>>    ib_user_mad structure, not a pointer to one, but the structure.
>>    This means that the Data[0] declaration at the end of the ib_umad 
>>    structure is forced onto a 4 byte boundary.
>>    
>>
>
>So the issue is that we are casting char *data which has no alignment guarantees
>to 64 bit number. We really must find a way to force 64 bit alignment for
>struct ib_user_mad all over. Would not something like the following simple trick work?
>
>struct ib_user_mad_hdr {
>	.............
>} __attribute__((aligned (8)));
>  
>
    In this case I don't think that will solve it. The memory area where 
the structure area is copied is one of those open ended declarations .i.e.
    ...
    __u8 Data[0]
    };
    The allocation allows one of two different sizes of structures to be 
placed on that Data[0]. Which would STILL work fine if the 
ib_umad_packet and ib_user_mad and varied sized data area are all 
allocated in one lump.
    I've never cared for these open ended structures. This is a good 
reason how/why it will eventually bite you.
    ....JW

>
>  
>
   




More information about the general mailing list