[ofa-general] ib_umad method mask problems on big-endian 64-bit archs
Roland Dreier
rdreier at cisco.com
Mon Aug 20 14:32:22 PDT 2007
It turns out there is a problem with how user_mad copies the
method_mask from the userspace agent registration request, which has
the member
__u32 method_mask[4];
into the kernel agent registration request, which has the member
DECLARE_BITMAP(method_mask, IB_MGMT_MAX_METHODS);
which defines method_mask to be an array of longs.
<asm-powerpc/bitops.h> has a good pictorial representation of the problem:
* The bitop functions are defined to work on unsigned longs, so for a
* ppc64 system the bits end up numbered:
* |63..............0|127............64|191...........128|255...........196|
* and on ppc32:
* |31.....0|63....31|95....64|127...96|159..128|191..160|223..192|255..224|
ie if userspace fills in the method_mask in the natural way then
things end up being wrong when the kernel memcpys the userspace
request into the kernel registration request.
Userspace could try and work around this and match the kernel's
layout, but unfortunately 32-bit userspace can't do that in any sane
way: on a 32-bit kernel there is one layout and on a 64-bit kernel
there is another and so userspace is stuck.
The patch below fixes the kernel to fix the bit layour when copying
the method_mask.
I'd like to merge this, but unfortunately there is the complication that
opensm/libvendor/osm_vendor_ibumad.c has this:
static int set_bit(int nr, void *method_mask)
{
int mask, retval;
long *addr = method_mask;
addr += nr >> 5;
mask = 1 << (nr & 0x1f);
retval = (mask & *addr) != 0;
*addr |= mask;
return retval;
}
which is actually wrong in general (if sizeof long is 8, then
offsetting by nr>>5 ends up making the subscript twice what it should
be), but all defined IB MAD methods are less than 32 so it ends up
being OK, and actually works for 64-bit big-endian userspace on a
64-bit kernel. So the patch below would mean that 64-bit osm binaries
on ppc64 would break (32-bit osm binaries on powerpc are of course
broken with a 64-bit kernel). Still, perhaps the right thing to do is
to just fix the opensm code at the same time?
Opinions?
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index d97ded2..c2c58c9 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -596,6 +596,7 @@ static int ib_umad_reg_agent(struct ib_umad_file *file, unsigned long arg)
struct ib_mad_agent *agent;
int agent_id;
int ret;
+ int i, j;
down_write(&file->port->mutex);
@@ -625,8 +626,16 @@ found:
if (ureq.mgmt_class) {
req.mgmt_class = ureq.mgmt_class;
req.mgmt_class_version = ureq.mgmt_class_version;
- memcpy(req.method_mask, ureq.method_mask, sizeof req.method_mask);
- memcpy(req.oui, ureq.oui, sizeof req.oui);
+
+ for (i = 0; i < BITS_TO_LONGS(IB_MGMT_MAX_METHODS); ++i) {
+ req.method_mask[i] = 0;
+ for (j = 0; j < sizeof (long) / 4; ++j)
+ req.method_mask[i] |=
+ ureq.method_mask[i * sizeof (long) / 4 +
+ j] << (j * 32);
+ }
+
+ memcpy(req.oui, ureq.oui, sizeof req.oui);
}
agent = ib_register_mad_agent(file->port->ib_dev, file->port->port_num,
More information about the general
mailing list