[openib-general] Kernel assertion

Woodruff, Robert J robert.j.woodruff at intel.com
Fri Dec 17 17:45:19 PST 2004


 
>Roland wrote, 
>OK, I checked in a fix for this.  I'm actually not sure if it was
>really a bug in IPoIB or a glitch in the network stack (see my recent
>post to netdev, cc'ed to openib-general, about 'LLTX and
>netif_stop_queue') but in any case I made the messages go away in my
>setup.

>By the way, it might be interesting to see if increasing
>IPOIB_RX_RING_SIZE and/or IPOIB_TX_RING_SIZE in ipoib.h had any effect
>on your performance (make sure to keep them powers of 2).

>Thanks,
>  Roland

OK, this seems to fix the assertion, and I am now able to run a 4-node
cluster. I have 2 PCI-E HCA nodes and 2 PCI-X HCA nodes. I still had
to modify

	IPOIB_NUM_WC 		  = 4,
to
	IPOIB_NUM_WC 		  = 1,

and with that change it seems to run fine with the 4.6.0-rc4 firmware.
I still cannot seem to get it to run reliably with the 4.3.5 firmware,
so that is why I am running PCI-X cards in the other 2 nodes.
I guess I need to investigate how to use tvflash to upgrade all the
nodes to 4.6.0-rc4.

I will let it run over the weekend running MPI jobs to test the
stability, but it has been running for about 1 hour without any
problems.
 
I also played around with the ring buffer sizes.
I tried the default 64-128, as well as 128-256 and 256-512.
It really did not seem to make much difference in performance (on the
MPI Pallas benchmark).
See the attached logs, mpi-64-128.log, mpi-128-256.log, and
mpi-256-512.log, for the numbers running across the PCI-E HCAs.
I also made a run between the 2 PCI-X cards. The PCI-E cards look a lot
better in performance, but I am not sure if the PCI slots are PCI-X 133
or not, so that could explain why it is so much slower on the PCI-X
cards.
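
For anyone repeating the ring-size experiment, here is a small sketch of
the power-of-2 requirement Roland mentions. It assumes the rings are
indexed by masking a free-running counter with (size - 1), which is the
usual reason for such a constraint; that is an assumption, not something
checked against ipoib.h here. The EXAMPLE_* constants and both helper
functions are hypothetical names for illustration only, not driver code.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>

/* Hypothetical sizes for illustration; the real constants are in ipoib.h. */
enum {
	EXAMPLE_RX_RING_SIZE = 128,
	EXAMPLE_TX_RING_SIZE = 256,
};

/* Reject a non-power-of-2 size at compile time. */
static_assert((EXAMPLE_RX_RING_SIZE & (EXAMPLE_RX_RING_SIZE - 1)) == 0,
	      "RX ring size must be a power of 2");
static_assert((EXAMPLE_TX_RING_SIZE & (EXAMPLE_TX_RING_SIZE - 1)) == 0,
	      "TX ring size must be a power of 2");

/* True if n is nonzero and has exactly one bit set. */
static bool is_power_of_2(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Map a free-running counter onto a ring slot.
 * The mask only wraps correctly when ring_size is a power of 2.
 */
static unsigned int ring_slot(unsigned int counter, unsigned int ring_size)
{
	return counter & (ring_size - 1);
}
```

With a size such as 96 the mask would skip some slots and reuse others,
so a check like is_power_of_2() is worth running before rebuilding with
new values.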

woody




-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpi-256-512.log
Type: application/octet-stream
Size: 21692 bytes
Desc: mpi-256-512.log
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20041217/eac8bb61/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpi-64-128.log
Type: application/octet-stream
Size: 21692 bytes
Desc: mpi-64-128.log
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20041217/eac8bb61/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpi-64-128-pci.log
Type: application/octet-stream
Size: 21691 bytes
Desc: mpi-64-128-pci.log
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20041217/eac8bb61/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpi-128-256.log
Type: application/octet-stream
Size: 21692 bytes
Desc: mpi-128-256.log
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20041217/eac8bb61/attachment-0003.obj>

