[ofa-general] fmr pool free_list empty

Pete Wyckoff pw at osc.edu
Mon Feb 25 14:53:30 PST 2008


I have test code that reliably breaks iser, making it say this:

    iser: iser_reg_page_vec:ib_fmr_pool_map_phys failed: -11

in 2.6.25-rc1 plus varlen, bidi patches.

The trick is to require it to use FMR and to keep a large number of
operations in flight.  Building an sglist with a bunch of pages that
are not contiguous does the job.  Increasing the pool size and/or
decreasing the dirty watermark seem to have no effect.

Looking at the FMR dirty list unmapping code in
ib_fmr_batch_release(), there is a section that pulls all the dirty
entries onto a list that it will later unmap and put back on the
free list.

But it also pulls in every entry on the free list that has been
mapped since its last unmap (remap_count != 0):

        /*
         * The free_list may hold FMRs that have been put there
         * because they haven't reached the max_remap count.
         * Invalidate their mapping as well.
         */
        list_for_each_entry_safe(fmr, next, &pool->free_list, list) {
                if (fmr->remap_count == 0)
                        continue;
                hlist_del_init(&fmr->cache_node);
                fmr->remap_count = 0;
                list_add_tail(&fmr->fmr->list, &fmr_list);
                list_move(&fmr->list, &unmap_list);
        }

Deleting that block of code makes the problem go away.

The issue seems to be that the thread doing this batch_release()
holds the spinlock while gathering up the unmap victims, then it
drops it to go off and do the actual unmaps.  Meanwhile, the thread
from iser that wants to do ib_fmr_pool_map_phys() finds that the
free list is now empty and complains.
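
To make the window concrete, here is a userspace toy model of it.
This is only a sketch, not the kernel code: every toy_* name and
POOL_SIZE are made up for illustration, and the "lock dropped" period
is modeled as the gap between toy_gather() and toy_finish().  I am
assuming the -11 in the log is -EAGAIN, which is 11 on most Linux
architectures.

```c
#include <assert.h>
#include <errno.h>

#define POOL_SIZE 4   /* hypothetical pool size for the model */

enum toy_state { TOY_FREE, TOY_IN_USE, TOY_UNMAPPING };

struct toy_fmr {
    int remap_count;        /* maps since the last real unmap */
    enum toy_state state;
};

struct toy_pool {
    struct toy_fmr fmr[POOL_SIZE];
};

/* Every entry starts on the free list, already mapped `remaps` times. */
static void toy_pool_init(struct toy_pool *p, int remaps)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        p->fmr[i].remap_count = remaps;
        p->fmr[i].state = TOY_FREE;
    }
}

/* Gather step (done under the lock): take every free entry that has
 * been mapped before.  Returns how many entries left the free list. */
static int toy_gather(struct toy_pool *p)
{
    int taken = 0;

    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->fmr[i].state != TOY_FREE || p->fmr[i].remap_count == 0)
            continue;
        p->fmr[i].remap_count = 0;
        p->fmr[i].state = TOY_UNMAPPING;
        taken++;
    }
    return taken;
}

/* End of the batch release: unmapped entries rejoin the free list. */
static void toy_finish(struct toy_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++)
        if (p->fmr[i].state == TOY_UNMAPPING)
            p->fmr[i].state = TOY_FREE;
}

/* Mapper: grab a free entry, or fail with -EAGAIN if there is none. */
static int toy_map(struct toy_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->fmr[i].state == TOY_FREE) {
            p->fmr[i].state = TOY_IN_USE;
            p->fmr[i].remap_count++;
            return i;
        }
    }
    return -EAGAIN;
}

/* One pass through the window; returns what the mapper saw while the
 * release thread was off doing the unmaps. */
static int toy_demo(void)
{
    struct toy_pool p;

    toy_pool_init(&p, 1);
    int taken = toy_gather(&p);
    assert(taken == POOL_SIZE);     /* free list fully drained */

    int rc = toy_map(&p);           /* runs while the lock is dropped */

    toy_finish(&p);
    int after = toy_map(&p);
    assert(after >= 0);             /* works again once entries return */
    return rc;
}
```

In this model toy_demo() returns -EAGAIN: with every free entry
already mapped at least once, the gather takes them all and the
mapper finds nothing.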

Presumably this optimization of unmapping the aging free list
entries helps in some workloads.  But emptying the free list is not
good for iser.  Any ideas on this fix or suggestions for a better
one?

Maybe if ib_fmr_pool_unmap() put returning FMRs on the front of the
list, it would help keep the remap_count more bimodal, and the unmap
code above would not eagerly grab all of the free ones at once.
Might keep the cache a bit hotter too.
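
A rough way to see the head-versus-tail effect is another userspace
toy, again not the kernel code.  It assumes only one operation in
flight at a time and ignores max_remaps and the dirty list entirely;
NFMR and the sim_* names are invented for the sketch.  It counts how
many free entries still have remap_count == 0, i.e. how many the
gather loop above would skip.

```c
#include <assert.h>
#include <string.h>

#define NFMR 8   /* hypothetical pool size for the model */

struct sim {
    int order[NFMR];        /* free list; order[0] is the head */
    int len;
    int remap_count[NFMR];
};

static void sim_init(struct sim *s)
{
    for (int i = 0; i < NFMR; i++) {
        s->order[i] = i;
        s->remap_count[i] = 0;
    }
    s->len = NFMR;
}

/* Map: always take the entry at the head of the free list. */
static int sim_map(struct sim *s)
{
    assert(s->len > 0);
    int id = s->order[0];

    memmove(&s->order[0], &s->order[1],
            (size_t)(s->len - 1) * sizeof(s->order[0]));
    s->len--;
    s->remap_count[id]++;
    return id;
}

/* Unmap: return the entry to the head or the tail of the free list. */
static void sim_unmap(struct sim *s, int id, int at_head)
{
    assert(s->len < NFMR);
    if (at_head) {
        memmove(&s->order[1], &s->order[0],
                (size_t)s->len * sizeof(s->order[0]));
        s->order[0] = id;
    } else {
        s->order[s->len] = id;
    }
    s->len++;
}

/* Run `cycles` map/unmap pairs, then count the free entries that the
 * gather loop would leave alone (remap_count == 0). */
static int sim_untouched(int cycles, int at_head)
{
    struct sim s;
    int untouched = 0;

    sim_init(&s);
    for (int i = 0; i < cycles; i++)
        sim_unmap(&s, sim_map(&s), at_head);

    for (int i = 0; i < s.len; i++)
        if (s.remap_count[s.order[i]] == 0)
            untouched++;
    return untouched;
}
```

With NFMR cycles, tail insertion touches every entry once, so
sim_untouched(NFMR, 0) is 0 and the gather would drain the list.
Head insertion keeps reusing the same entry, so sim_untouched(NFMR, 1)
is NFMR - 1.  A real workload with many operations in flight will be
less clean than this.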

		-- Pete


