[openib-general] [PATCH] fmr support in mthca

Roland Dreier roland at topspin.com
Fri Mar 18 11:42:29 PST 2005


    Michael> Good, glad to help. I will try to address your comments
    Michael> next week (it's already the weekend here).

No problem.  Libor won't be back until Monday so I won't even try SDP
until then.

    Roland> What if we just reserve something like 64K MPTs and MTTs
    Roland> for FMRs and ioremap everything at driver startup?  That
    Roland> would only use a few MB of vmalloc space and probably
    Roland> simplify the code too.

    Michael> I don't like these pre-allocations - if someone is only
    Michael> using SDP and IP over IB, it seems he will hardly need
    Michael> any regular regions.  64K MTTs with a 4K page size cover
    Michael> up to 200MByte of memory.

We can bump up the numbers if you want.  Right now the default
allocation is 1 << 20 MTT segments (8 << 20 MTT entries).  I see no
problem with having 64K MPTs and 256K MTT segments reserved for FMRs by
default.  That should be more than enough for a single HCA -- 256K MTT
segments (at 8 entries each) means that 2 million pages or 8 GB of IO
could be in flight at a time, which doesn't seem like a harsh limit to me.
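
Just to show where those numbers come from (a back-of-the-envelope
sketch; none of these constants are spelled this way in the driver):

/* Proposed default reservation for FMRs */
#define FMR_RESERVED_MPTS	(64 * 1024)	/* MPT entries           */
#define FMR_RESERVED_MTT_SEGS	(256 * 1024)	/* MTT segments          */
#define MTT_ENTRIES_PER_SEG	8		/* current mthca default */

/* 256K segments * 8 entries = 2M MTT entries, one 4K page each ...    */
#define FMR_MAX_PAGES	(FMR_RESERVED_MTT_SEGS * MTT_ENTRIES_PER_SEG)
/* ... so at most 2M * 4K = 8 GB of IO mapped through FMRs at once.    */
#define FMR_MAX_BYTES	((unsigned long long) FMR_MAX_PAGES * 4096)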

Ultimately we can make the allocations tunable at device init time,
along with the rest of the parameters (number of QPs, number of CQs,
etc).  I haven't seen much pressure to do that so far but it is
definitely in my plans.

    Michael> My other problem with this approach was one of
    Michael> implementation: the existing allocator and table code
    Michael> can take a reserved parameter, but they don't have the
    Michael> ability to allocate out of that reserved pool.  So we'd
    Michael> have to allocate out of a separate allocator, and take
    Michael> care that the keys don't conflict.  This gets a bit
    Michael> complicated.

I think this is the way to go.  Keys are easy to deal with -- in
mthca_init_mr_table, we could just pass dev->limits.num_fmrs instead
of dev->limits.reserved_mrws when initializing dev->mr_table.mpt_alloc,
and then create a new table of size dev->limits.num_fmrs and reserve
dev->limits.reserved_mrws out of that table.
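
Roughly what I have in mind, as a sketch only -- dev->limits.num_fmrs
and mr_table.fmr_mpt_alloc don't exist yet, and I'm quoting the
existing mthca_alloc_init() usage from memory:

	/* Regular MRs hand out MPT indices above the FMR region ...      */
	err = mthca_alloc_init(&dev->mr_table.mpt_alloc,
			       dev->limits.num_mpts,
			       ~0, dev->limits.num_fmrs);
	if (err)
		return err;

	/* ... while FMRs get their own allocator over the low indices,
	 * skipping the MPTs the firmware reserves for itself.            */
	err = mthca_alloc_init(&dev->mr_table.fmr_mpt_alloc,
			       dev->limits.num_fmrs,
			       ~0, dev->limits.reserved_mrws);
	if (err) {
		mthca_alloc_cleanup(&dev->mr_table.mpt_alloc);
		return err;
	}

Since the two allocators cover disjoint index ranges, the keys they
hand out can never conflict.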

The buddy allocator is a little more work but it needs to be cleaned
up and encapsulated better anyway.  Once that's done we'd just have
two buddy allocators.  The first one would cover all the MTT segments,
and we'd first take out a chunk of that one to cover the reserved MTTs
and then allocate another chunk that can hold whatever number of MTT
segments we decide to use for FMRs.
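
Something along these lines once the buddy code is split out -- the
mthca_buddy_*() interface and num_fmr_mtt_segs below are made-up names
for the sake of the example, not anything in the tree today:

	struct mthca_buddy mtt_buddy;	/* covers every MTT segment      */
	struct mthca_buddy fmr_buddy;	/* covers only the FMR sub-range */
	u32 fmr_base;

	err = mthca_buddy_init(&mtt_buddy, fls(dev->limits.num_mtt_segs - 1));
	if (err)
		return err;

	/* Take the firmware-reserved segments out of the bottom first.   */
	mthca_buddy_alloc(&mtt_buddy, fls(dev->limits.reserved_mtts - 1));

	/* Then carve out one naturally aligned power-of-two chunk for
	 * FMRs; fmr_base is where it starts in the MTT table, and
	 * fmr_buddy hands out segments relative to that base.            */
	fmr_base = mthca_buddy_alloc(&mtt_buddy, fls(num_fmr_mtt_segs - 1));
	err = mthca_buddy_init(&fmr_buddy, fls(num_fmr_mtt_segs - 1));

Regular memory regions keep allocating from mtt_buddy exactly as they
do today.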

    Michael> Maybe do something separate for 32 bit kernels (like -
    Michael> disable FMR support)?

Disabling FMRs on 32-bit kernels isn't going to fly.  It doesn't seem
that hard to make things work on i386, so why not do it?

    Michael> Yes but for MTTs the addresses may not be physically
    Michael> contiguous, unless we want to limit FMRs to PAGE_SIZE/8
    Michael> MTTs, which means 512 MTTs, that is 2MByte with a 4K FMR
    Michael> page size.  And it seems possible that even with this
    Michael> limitation the MTTs for a specific FMR start at a
    Michael> non-page-aligned boundary.

I think it's fine to limit an FMR to 512 MTT entries.  I'd have to
look at the source to be sure of the exact numbers, but I know that
for the Topspin stack, neither SDP nor SRP is using more than 32
entries per FMR.  A limit of 512 pages (2 MB) mapped per FMR seems
fine.  I don't know of anyone using FMRs even close to that big.

Even if it turns out to be too small, I see no problem with adding a
small array of something on the order of 2 or 4 MTT pages.

If we use the buddy allocator for MTT entries for FMRs, then alignment
is OK.  The buddy allocator guarantees that objects will be aligned to
their size, which means that the MTT segments will never cross a page
boundary.
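
To spell it out: one MTT entry is 8 bytes and one segment is 8 entries,
so a buddy chunk of 2^order segments is (1 << order) * 64 bytes and
starts at a multiple of that size.  Up to order 6 -- 64 segments, i.e.
the 512-entry limit above -- the chunk is at most 4 KB and therefore
never straddles a 4K page, so the MTTs for one FMR can always be
written through a single mapped page.  A sketch (mthca_fmr_mtt_offset()
is just illustrative, not something in the driver):

	/* Byte offset of MTT segment 'seg' within the MTT table.  Because
	 * 'seg' came from a buddy allocator at order <= 6, this offset is
	 * aligned to the chunk size and the whole chunk stays inside one
	 * 4K page of the ioremap()ed MTT table.                           */
	static unsigned long mthca_fmr_mtt_offset(u32 seg)
	{
		return (unsigned long) seg * 8 * sizeof(u64);
	}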

 - R.


