[openib-general] [PATCH] IB/core - ib_umad can cause address alignment fault on ia64

John W. Marland jwm at prairieinet.net
Thu Jan 18 07:28:16 PST 2007


Michael S. Tsirkin wrote:

>>Quoting Ralph Campbell <ralph.campbell at qlogic.com>:
>>Subject: [PATCH] IB/core - ib_umad can cause address alignment fault on ia64
>>
>>IB/core - ib_umad can cause address alignment fault
>>
>>In user_mad.c, the definition for struct ib_umad_packet includes
>>struct ib_user_mad at an odd 32-bit offset.  When ib_umad_write()
>>tries to assign rmpp_mad->mad_hdr.tid, there is an alignment fault on
>>architectures which have strict alignment for load/stores.
>>This patch fixes the problem by changing the offset on which
>>struct ib_user_mad is defined within struct ib_umad_packet.
>>
>>Thanks go to John W. Marland <jwm at prairieinet.net> for finding this.
>>
>>Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
>>
>>diff -r b1128b48dc99 drivers/infiniband/core/user_mad.c
>>--- a/drivers/infiniband/core/user_mad.c	Fri Jan 12 20:00:03 2007 +0000
>>+++ b/drivers/infiniband/core/user_mad.c	Wed Jan 17 14:09:37 2007 -0800
>>@@ -125,7 +125,7 @@ struct ib_umad_packet {
>> 	struct ib_mad_send_buf *msg;
>> 	struct ib_mad_recv_wc  *recv_wc;
>> 	struct list_head   list;
>>-	int		   length;
>>+	long		   length;
>> 	struct ib_user_mad mad;
>> };
>>    
>>
>
>This does not make sense to me - do we have to replace all int fields with long
>now? Looks like a compiler or makefile bug in your setup - struct fields should
>be naturally aligned.
>
>  
>
    We should probably have given a more complete explanation. The 
unaligned access hits in two places, that I've tracked down so far.
    The one where it's easiest to see what's happening is in ib_umad_write.
______________________________________________________________________________________
       if (!ib_response_mad(packet->msg->mad)) {
                tid = &((struct ib_mad_hdr *) packet->msg->mad)->tid;
                *tid = cpu_to_be64(((u64) agent->hi_tid) << 32 |
                                   (be64_to_cpup(tid) & 0xffffffff));

---> this line causes the access problem               
rmpp_mad->mad_hdr.tid = *tid;
        }
________________________________________________________________________________________
    The rmpp_mad variable is an ib_rmpp_mad pointer that is initialized 
from the packet->mad.data early in the function.
    Because the ib_umad_packet structure has a as it's last element an 
ib_user_mad structure, not a pointer to one, but the structure.
    This means that the Data[0] declaration at the end of the ib_umad 
structure is forced onto a 4 byte boundary.
    A chunk-o-memory is allocated to encapsulate the ib_umad_packet with 
the ib_user_mad AND enough space for a user created rmpp header.
    So the incoming structure is copied into the Data[0] area - and tid 
ends up on a 4 byte boundary even thought it's an 8 byte wide variable.
    Nothing to do with the compiler - more like pilot error :-)
    ---------
    Other possible solutions would be to exchange the position of the 
tid with the 16 bit variable before it or the one after it. That would 
put it on a correct alignment, but what about when someone else attaches 
the ib_mad_hdr to some other random place?
    Another solution would be to make the ib_user_mad structure in the 
ib_umad_packet into a pointer, but that would mean another kzalloc.
    The best solution general rule is probably to make sure that if you 
are going to embed a structure in another, that it's embedded on an 
$(ARCH_SIZE) boundary.
    I could have used an int PADD; variable but since there was a length 
field I figured that no one would mind if a longer length was possible.
    ....JW

    John W. Marland
    System Fabric Works





More information about the general mailing list