[openib-general] Re: testing amso1100

Pete Wyckoff pw at osc.edu
Mon Mar 20 12:13:55 PST 2006


swise at opengridcomputing.com wrote on Mon, 20 Mar 2006 11:46 -0600:
> So were you able to get krping running on the amso adapters?

Yes, that works fine now.  Took a bit of futzing to work around some
odd situations that we may have brought upon ourselves.

1.  Interfaces iw2 and eth2 (rdma data path and normal data path) do not
want to be in the same subnet.

2.  On FC4, after "ifup iw2" it seems necessary to do "ip l s iw2
down; ip l s iw2 up" to get the driver to call c2_add_addr(); otherwise
the NIC apparently does not respond to the client's ARP requests.
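
The down/up cycle from item 2 can be wrapped in a small shell helper. This is a minimal sketch, not part of the original setup: the function name bounce_iface and the IP_CMD override (for a dry run) are made up for illustration, and it assumes the iproute2 "ip" tool, where "ip l s" is the abbreviated form of "ip link set".

```shell
#!/bin/sh
# Sketch only: bounce_iface and IP_CMD are hypothetical names, not part of
# any Ammasso or OpenIB tooling.

# Command used to change link state; set IP_CMD="echo ip" to print the
# commands instead of running them.
IP_CMD="${IP_CMD:-ip}"

# Take an interface down and bring it back up.
bounce_iface() {
    iface="$1"
    if [ -z "$iface" ]; then
        echo "usage: bounce_iface <interface>" >&2
        return 2
    fi
    # $IP_CMD is intentionally unquoted so "echo ip" splits into two words.
    $IP_CMD link set "$iface" down &&
    $IP_CMD link set "$iface" up
}
```

Run as root after "ifup iw2", "bounce_iface iw2" performs the same two steps as the commands quoted above.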

I wouldn't have complained except that you prodded me just now.

> > And pulling out iw_c2 takes a looong time:
> > 
> >     am30# time rmmod iw_c2
> >     c2: drivers/infiniband/hw/amso1100/c2_provider.c:c2_unregister_device:862
> >     ACPI: PCI interrupt for device 0000:09:08.0 disabled
> >     0.000 user  0.008 sys  60.085 real
> > 
> 
> I've never seen this.  I'm wondering how old these amso cards are?  Are
> they running the latest fpga image from 1.2u1?  IE:  did you use the
> Ammasso 1.2u1 package and ccflash2 to bring the device up to the latest
> HW image?  I'm not talking about firmware.  1.2u1 released a new FPGA
> image that needs to be applied with the 1.2u1 ccflash2.  

I used ccflash2 to update to this hardware image:
C2L_H23_B58_F61_080507.bit, from ogc_amso_kit_20060308.tgz.  Not the
one in the 1.2u1 package, from Amso1100-1.2u1-ga.tgz, as it appeared
by name to be older: C2L_H22_B58_F61_040814.bit.  Let me know if I
should try the H22 one instead.

(Before loading the iw_c2 module, every time, I install the
boot_image from ogc_amso_kit_20060308, too.)

To work around the one minute delay, I hacked the timeout in
vq_wait_for_reply down to 5 sec.  No idea why the NIC isn't
responding or if it is a bad thing.  I've not seen a minute-long
hang in any other circumstances yet.  A traceback during the 60
second hang shows this (de-uglified from x86-64 sysrq-T):

    <ffffffff802cc9ea>{schedule_timeout+154}
    <ffffffff8013a9d0>{process_timeout+0}
    <ffffffff880fc3fa>{:iw_c2:vq_wait_for_reply+106}
    <ffffffff8012b430>{default_wake_function+0}
    <ffffffff880fa942>{:iw_c2:c2_rnic_close+146}
    <ffffffff880faa1d>{:iw_c2:c2_rnic_term+13}
    <ffffffff880ed921>{:ib_core:ib_unregister_device+193}
    <ffffffff880f70e0>{:iw_c2:c2_remove+112}
    <ffffffff802caa90>{klist_release+0}
    <ffffffff801dcf6c>{pci_device_remove+44}
    <ffffffff80231905>{__device_release_driver+133}
    <ffffffff80231ce8>{driver_detach+184}
    <ffffffff802312ea>{bus_remove_driver+122}

It is in the context of "modprobe -r iw_c2".  Not a big deal, but
let me know if you'd like me to test something.
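
A trace like the one above can also be requested from a second shell while the unload is blocked. This is a minimal sketch, assuming a kernel that exposes /proc/sysrq-trigger, where writing "t" asks for the same task dump as the SysRq-T key. The function name dump_tasks and the SYSRQ_TRIGGER override are hypothetical; the override exists only so the helper can be tried without root.

```shell
#!/bin/sh
# Sketch only: dump_tasks and SYSRQ_TRIGGER are hypothetical names.

# File that receives the SysRq command character.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# Ask the kernel to log the state and stack of every task.
dump_tasks() {
    echo t > "$SYSRQ_TRIGGER"
}
```

The dump goes to the kernel log, so after calling "dump_tasks" as root the stacks can be read with "dmesg".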

		-- Pete


