[ofa-general] mlx4: problem with resource limits > 2^20

Jack Morgenstein jackm at dev.mellanox.co.il
Tue Nov 20 04:42:35 PST 2007


Roland,

We're encountering a problem with resource profiles in which an element exceeds 1M
(e.g., log_num_qp=21 or log_num_mtt=21 as module options for mlx4_core).

Many kernels allow kmalloc of only up to 128KB, which is enough for a bitmap of 1M bits.  If the
resource max is greater than 1M, the kmalloc will fail.
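
The arithmetic behind that limit can be checked with a small userspace sketch. The helper
names and the 128KB constant below are illustrative assumptions taken from the figures in
this mail, not kernel definitions:

```c
#include <stddef.h>

/* Assumed kmalloc ceiling from the text above; real limits vary by kernel. */
#define ASSUMED_KMALLOC_MAX (128u * 1024u)

/* Bytes needed for a bitmap holding one bit per resource (2^log_num resources).
 * Assumes log_num >= 3 so the division by 8 is exact. */
static size_t bitmap_bytes(unsigned log_num)
{
	size_t bits = (size_t)1 << log_num;

	return bits / 8;
}

/* Nonzero if the whole bitmap fits in one allocation of the assumed size. */
static int fits_in_kmalloc(unsigned log_num)
{
	return bitmap_bytes(log_num) <= ASSUMED_KMALLOC_MAX;
}
```

With log_num=20 the bitmap is exactly 128KB and still fits; with log_num=21 it is 256KB
and does not.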

This occurred for MTTs when allocating the buddy table  -- file net/mlx4/mr.c,
procedure mlx4_buddy_init():
	for (i = 0; i <= buddy->max_order; ++i) {
		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
		buddy->bits[i] = kmalloc(s * sizeof (long), GFP_KERNEL);
		if (!buddy->bits[i])
			goto err_out_free;
	}

The kmalloc here fails for max_order > 20.
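
To see which buddy levels are affected, here is a rough userspace sketch that mirrors the
size computation in the loop above. The macros are local stand-ins for the kernel's
BITS_TO_LONGS, and the function names are hypothetical:

```c
#include <limits.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's BITS_PER_LONG / BITS_TO_LONGS. */
#define BITS_PER_LONG_U    (sizeof(long) * CHAR_BIT)
#define BITS_TO_LONGS_U(n) (((n) + BITS_PER_LONG_U - 1) / BITS_PER_LONG_U)

/* Bytes requested for level i of a buddy table with the given max_order. */
static size_t buddy_level_bytes(unsigned max_order, unsigned i)
{
	size_t bits = (size_t)1 << (max_order - i);

	return BITS_TO_LONGS_U(bits) * sizeof(long);
}

/* Number of levels whose allocation would exceed `limit` bytes. */
static unsigned levels_over_limit(unsigned max_order, size_t limit)
{
	unsigned i, count = 0;

	for (i = 0; i <= max_order; ++i)
		if (buddy_level_bytes(max_order, i) > limit)
			++count;
	return count;
}
```

Under the 128KB assumption, only level 0 is too large when max_order is 21; every further
increase of max_order pushes one more level over the limit.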

Additionally, kmalloc will fail in net/mlx4/alloc.c, procedure mlx4_bitmap_init():

	/* num must be a power of 2 */
	if (num != roundup_pow_of_two(num))
		return -EINVAL;

	bitmap->last = 0;
	bitmap->top  = 0;
	bitmap->max  = num;
	bitmap->mask = mask;
	spin_lock_init(&bitmap->lock);
	bitmap->table = kzalloc(BITS_TO_LONGS(num) * sizeof (long), GFP_KERNEL);

Here, num is the resource max.  Thus, if we set the profile to allow log_num_qp=21,
the above kzalloc will fail as well, because the required bitmap (256KB) is greater than 128KB.

I know that we can use vmalloc here and succeed -- however, this will present a severe problem
on 32-bit x86 systems, where the kernel virtual-memory space is really small.

We have 3 options, as I see it:

1. Change the bitmap allocator and buddy systems to use a 2-level scheme.
2. Use vmalloc for allocations greater than 128K, and note that for x86 systems you cannot specify more
   than 1M for any resource in the profile.
3. Do nothing, and just note that one cannot allocate more than 1M of any resource on ANY system.
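
To make option 1 a bit more concrete, here is a minimal userspace sketch of one possible
2-level bitmap: a top-level array of per-chunk free counts, with each leaf bitmap allocated
separately so that no single allocation grows with the resource max. All names (the tlb_*
functions, CHUNK_BYTES) are hypothetical, malloc/calloc stand in for the kernel allocators,
and locking and the round-robin behavior of the existing allocator are left out:

```c
#include <limits.h>
#include <stdlib.h>

#define CHUNK_BYTES 4096u                     /* hypothetical leaf size */
#define CHUNK_BITS  (CHUNK_BYTES * CHAR_BIT)

struct two_level_bitmap {
	unsigned long   num;        /* total number of resources */
	unsigned long   nchunks;    /* number of leaf bitmaps */
	unsigned       *free_cnt;   /* level 1: free bits left in each leaf */
	unsigned char **chunk;      /* level 2: the leaf bitmaps */
};

static void tlb_destroy(struct two_level_bitmap *b)
{
	unsigned long i;

	if (b->chunk)
		for (i = 0; i < b->nchunks; ++i)
			free(b->chunk[i]);
	free(b->chunk);
	free(b->free_cnt);
	b->chunk = NULL;
	b->free_cnt = NULL;
}

/* Returns 0 on success, -1 on allocation failure or num == 0. */
static int tlb_init(struct two_level_bitmap *b, unsigned long num)
{
	unsigned long i;

	if (!num)
		return -1;
	b->num = num;
	b->nchunks = (num + CHUNK_BITS - 1) / CHUNK_BITS;
	b->free_cnt = calloc(b->nchunks, sizeof(*b->free_cnt));
	b->chunk = calloc(b->nchunks, sizeof(*b->chunk));
	if (!b->free_cnt || !b->chunk)
		goto err;
	for (i = 0; i < b->nchunks; ++i) {
		unsigned long left = num - i * CHUNK_BITS;

		b->chunk[i] = calloc(1, CHUNK_BYTES);
		if (!b->chunk[i])
			goto err;
		/* The last leaf may cover fewer than CHUNK_BITS valid bits. */
		b->free_cnt[i] = (unsigned)(left < CHUNK_BITS ? left : CHUNK_BITS);
	}
	return 0;
err:
	tlb_destroy(b);
	return -1;
}

/* Returns the lowest free index, or -1 if every resource is in use. */
static long tlb_alloc(struct two_level_bitmap *b)
{
	unsigned long i, bit;

	for (i = 0; i < b->nchunks; ++i) {
		if (!b->free_cnt[i])
			continue;   /* level 1 lets us skip full leaves */
		for (bit = 0; bit < CHUNK_BITS; ++bit) {
			unsigned char m = (unsigned char)(1u << (bit % CHAR_BIT));

			if (!(b->chunk[i][bit / CHAR_BIT] & m)) {
				b->chunk[i][bit / CHAR_BIT] |= m;
				--b->free_cnt[i];
				return (long)(i * CHUNK_BITS + bit);
			}
		}
	}
	return -1;
}

static void tlb_free(struct two_level_bitmap *b, unsigned long idx)
{
	unsigned long i = idx / CHUNK_BITS, bit = idx % CHUNK_BITS;
	unsigned char m = (unsigned char)(1u << (bit % CHAR_BIT));

	if (idx < b->num && (b->chunk[i][bit / CHAR_BIT] & m)) {
		b->chunk[i][bit / CHAR_BIT] &= (unsigned char)~m;
		++b->free_cnt[i];
	}
}
```

With a 4KB leaf, a resource max of 2^21 needs 64 leaves plus two small top-level arrays,
so every individual allocation stays far below 128KB.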

My own preference is 2 (with maybe some test to determine just what the crossover point is, rather than
just having 128K as a defined constant) -- or, given some time, 1 (which is a more general and scalable solution).
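
One way to avoid a hard-coded crossover for option 2 would be to simply try the primary
allocator first and fall back only when it fails. The following is a userspace sketch of that
idea, not driver code: alloc_table, struct table_mem and capped_malloc are hypothetical
names, and capped_malloc merely imitates a kmalloc that refuses requests above an assumed
128KB limit. The flag records which allocator succeeded, since in the kernel the matching
free routine (kfree or vfree) would have to be used:

```c
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t);

struct table_mem {
	void *ptr;
	int   used_fallback;   /* tells the caller which free routine to use */
};

/* Try `primary` first; use `fallback` only if the primary allocation fails. */
static struct table_mem alloc_table(size_t bytes, alloc_fn primary,
				    alloc_fn fallback)
{
	struct table_mem m = { NULL, 0 };

	m.ptr = primary(bytes);
	if (!m.ptr) {
		m.ptr = fallback(bytes);
		if (m.ptr)
			m.used_fallback = 1;
	}
	return m;
}

/* Stand-in for a kmalloc with an assumed 128KB ceiling. */
static void *capped_malloc(size_t bytes)
{
	return bytes <= 128u * 1024u ? malloc(bytes) : NULL;
}
```

On 32-bit x86 the fallback would still consume scarce vmalloc space, so the profile limit
noted in option 2 would remain necessary there.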

Any suggestions?

- Jack


