<html><body>
<p>Hello Eli,<br>
<br>
FYI, in case you didn't receive these emails in time. You are welcome to create a patch on top of it, for example using __skb_put() in place of the open-coded size accounting in ipoib_ud_skb_put_frags().<br>
<br>
-------------------------------------------------------------------------------------------<br>
<br>
Nam and Stefan have helped out with the backporting while I concentrate on stress testing against the 2.6.24 kernel (20 duplex streams over one port against mthca at 2K MTU; it has been running for over 8 hours). We have validated this patch (build, sniff test, flood ping) on 2.6.16 - 2.6.24 kernels, RHEL4.5, RHEL4.6, RHEL5.1, SLES10 SP1, and the derivative version of SLES10 SP1.<br>
<br>
The attachment below is the backport patch; I am reattaching the patch file here for your convenience. The backport patch file ipoib_0100_to_2.6.21.patch needs to be copied into each of the following directories:<br>
<table border="0" cellspacing="0" cellpadding="0">
<tr valign="top"><td width="498">./kernel_patches/attic/backport/2.6.9_U2/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/attic/backport/2.6.9_U3/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.18-EL5.1/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.16_sles10/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.9_U4/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.9_U5/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.9_U6/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.18_suse10_2/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.16_sles10_sp1/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.11/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.12/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.13/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.14/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.15/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.16/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.17/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.18/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.19/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.20/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.21/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.13_suse10_0_u/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.15_ubuntu606/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.11_FC4/ipoib_0100_to_2.6.21.patch<br>
./kernel_patches/backport/2.6.18_FC6/ipoib_0100_to_2.6.21.patch</td></tr>
</table>
<br>
Shirley<br>
<br>
<font size="2" color="#800080">----- Forwarded by Shirley Ma/Beaverton/IBM</font><font size="2" color="#800080"> on 02/06/08 12:28 AM</font><font size="2" color="#800080"> -----</font><br>
<br>
<table border="0" cellspacing="0" cellpadding="0">
<tr valign="top"><td><b><font size="2">Shirley Ma/Beaverton/IBM</font></b><br>
<font size="2">02/05/08 01:33 PM</font><br>
<br>
<font size="2">To: tziporet@dev.mellanox.co.il, "Vladimir Sokolovsky (Mellanox)" &lt;vlad@lists.openfabrics.org&gt;</font><br>
<font size="2">cc: eli@mellanox.co.il</font><br>
<font size="2">Subject: [Final][PATCH] IPoIB-4K MTU patch</font></td></tr>
</table>
Hello, below is the final patch, incorporating Eli's review comments. Thanks, Eli, for all of your work.<br>
<br>
This patch has been validated on the 2.6.24 kernel and SLES10, on both intel/mthca and ppc/mthca. I am working on RHEL5 testing. The backport patch will be provided tonight; hopefully Nam can help me with this. I will keep the stress test running on different subnets. Hopefully nothing has changed in the ofed-1.3 git tree today, so the patch can be applied cleanly; if not, let me know. Please use the attachment when applying the patch, since my Notes client has problems.<br>
<br>
Thanks<br>
Shirley<br>
------------<br>
<br>
This patch enables IPoIB 4K MTU support. When PAGE_SIZE is greater than the<br>
IB MTU size + GRH + IPoIB header, there is no need for RX S/G. When it is smaller,<br>
two buffers are allocated: one for the GRH + IPoIB header, and one for the<br>
IPoIB payload.<br>
<br>
Signed-off-by: Shirley Ma &lt;xma@us.ibm.com&gt;<br>
---<br>
<br>
diff -urpN ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib.h ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib.h<br>
--- ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib.h 2008-02-04 20:09:18.000000000 -0800<br>
+++ ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib.h 2008-02-05 12:20:46.000000000 -0800<br>
@@ -56,11 +56,11 @@<br>
/* constants */<br>
<br>
enum {<br>
- IPOIB_PACKET_SIZE = 2048,<br>
- IPOIB_BUF_SIZE = IPOIB_PACKET_SIZE + IB_GRH_BYTES,<br>
-<br>
IPOIB_ENCAP_LEN = 4,<br>
<br>
+ IPOIB_UD_HEAD_SIZE = IB_GRH_BYTES + IPOIB_ENCAP_LEN,<br>
+ IPOIB_UD_RX_SG = 2, /* for 4K MTU */ <br>
+<br>
IPOIB_CM_MTU = 0x10000 - 0x10, /* padding to align header to 16 */<br>
IPOIB_CM_BUF_SIZE = IPOIB_CM_MTU + IPOIB_ENCAP_LEN,<br>
IPOIB_CM_HEAD_SIZE = IPOIB_CM_BUF_SIZE % PAGE_SIZE,<br>
@@ -141,9 +141,9 @@ struct ipoib_mcast {<br>
struct net_device *dev;<br>
};<br>
<br>
-struct ipoib_rx_buf {<br>
+struct ipoib_sg_rx_buf {<br>
struct sk_buff *skb;<br>
- u64 mapping;<br>
+ u64 mapping[IPOIB_UD_RX_SG];<br>
};<br>
<br>
struct ipoib_tx_buf {<br>
@@ -337,7 +337,7 @@ struct ipoib_dev_priv {<br>
<br>
struct net_device *dev;<br>
struct ib_recv_wr rx_wr_draft[UD_POST_RCV_COUNT];<br>
- struct ib_sge sglist_draft[UD_POST_RCV_COUNT];<br>
+ struct ib_sge sglist_draft[UD_POST_RCV_COUNT][IPOIB_UD_RX_SG];<br>
unsigned int rx_outst;<br>
<br>
struct napi_struct napi;<br>
@@ -378,7 +378,7 @@ struct ipoib_dev_priv {<br>
unsigned int admin_mtu;<br>
unsigned int mcast_mtu;<br>
<br>
- struct ipoib_rx_buf *rx_ring;<br>
+ struct ipoib_sg_rx_buf *rx_ring;<br>
<br>
spinlock_t tx_lock;<br>
struct ipoib_tx_buf *tx_ring;<br>
@@ -412,6 +412,7 @@ struct ipoib_dev_priv {<br>
struct ipoib_ethtool_st etool;<br>
struct timer_list poll_timer;<br>
struct ib_ah *own_ah;<br>
+ int max_ib_mtu;<br>
};<br>
<br>
struct ipoib_ah {<br>
@@ -452,6 +453,22 @@ struct ipoib_neigh {<br>
struct list_head list;<br>
};<br>
<br>
+#define IPOIB_UD_MTU(ib_mtu) (ib_mtu - IPOIB_ENCAP_LEN)<br>
+#define IPOIB_UD_BUF_SIZE(ib_mtu) (ib_mtu + IB_GRH_BYTES)<br>
+static inline int ipoib_ud_need_sg(int ib_mtu)<br>
+{<br>
+ return (IPOIB_UD_BUF_SIZE(ib_mtu) > PAGE_SIZE) ? 1 : 0;<br>
+}<br>
+static inline void ipoib_sg_dma_unmap_rx(struct ipoib_dev_priv *priv,<br>
+ u64 mapping[IPOIB_UD_RX_SG])<br>
+{<br>
+ if (ipoib_ud_need_sg(priv->max_ib_mtu)) {<br>
+ ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_HEAD_SIZE, DMA_FROM_DEVICE);<br>
+ ib_dma_unmap_page(priv->ca, mapping[1], PAGE_SIZE, DMA_FROM_DEVICE);<br>
+ } else <br>
+ ib_dma_unmap_single(priv->ca, mapping[0], IPOIB_UD_BUF_SIZE(priv->max_ib_mtu), DMA_FROM_DEVICE);<br>
+}<br>
+<br>
/*<br>
* We stash a pointer to our private neighbour information after our<br>
* hardware address in neigh->ha. The ALIGN() expression here makes<br>
diff -urpN ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib_ib.c ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib_ib.c<br>
--- ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2008-02-04 20:09:18.000000000 -0800<br>
+++ ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2008-02-05 12:20:40.000000000 -0800<br>
@@ -96,14 +96,37 @@ static void clean_pending_receives(struc<br>
<br>
for (i = 0; i < priv->rx_outst; ++i) {<br>
id = priv->rx_wr_draft[i].wr_id & ~IPOIB_OP_RECV;<br>
- ib_dma_unmap_single(priv->ca, priv->rx_ring[id].mapping,<br>
- IPOIB_BUF_SIZE, DMA_FROM_DEVICE);<br>
+ ipoib_sg_dma_unmap_rx(priv,<br>
+ priv->rx_ring[id].mapping);<br>
dev_kfree_skb_any(priv->rx_ring[id].skb);<br>
priv->rx_ring[id].skb = NULL;<br>
}<br>
priv->rx_outst = 0;<br>
}<br>
<br>
+static void ipoib_ud_skb_put_frags(struct ipoib_dev_priv *priv, struct sk_buff *skb, <br>
+ unsigned int length)<br>
+{<br>
+ if (ipoib_ud_need_sg(priv->max_ib_mtu)) {<br>
+ unsigned int size;<br>
+ skb_frag_t *frag = &skb_shinfo(skb)->frags[0];<br>
+ <br>
+ /* put header into skb */<br>
+ size = min(length, (unsigned)IPOIB_UD_HEAD_SIZE);<br>
+ skb->tail += size;<br>
+ skb->len += size;<br>
+ length -= size;<br>
+ <br>
+ size = min(length, (unsigned) PAGE_SIZE);<br>
+ frag->size = size;<br>
+ skb->data_len += size;<br>
+ skb->truesize += size;<br>
+ skb->len += size;<br>
+ length -= size;<br>
+ } else<br>
+ skb_put(skb, length);<br>
+}<br>
+ <br>
static int ipoib_ib_post_receive(struct net_device *dev, int id)<br>
{<br>
struct ipoib_dev_priv *priv = netdev_priv(dev);<br>
@@ -111,8 +134,11 @@ static int ipoib_ib_post_receive(struct <br>
int ret = 0;<br>
int i = priv->rx_outst;<br>
<br>
- priv->sglist_draft[i].addr = priv->rx_ring[id].mapping;<br>
+ priv->sglist_draft[i][0].addr = priv->rx_ring[id].mapping[0];<br>
+ priv->sglist_draft[i][1].addr = priv->rx_ring[id].mapping[1];<br>
+ <br>
priv->rx_wr_draft[i].wr_id = id | IPOIB_OP_RECV;<br>
+ <br>
if (++priv->rx_outst == UD_POST_RCV_COUNT) {<br>
ret = ib_post_recv(priv->qp, priv->rx_wr_draft, &bad_wr);<br>
<br>
@@ -120,8 +146,8 @@ static int ipoib_ib_post_receive(struct <br>
ipoib_warn(priv, "receive failed for buf %d (%d)\n", id, ret);<br>
while (bad_wr) {<br>
id = bad_wr->wr_id & ~IPOIB_OP_RECV;<br>
- ib_dma_unmap_single(priv->ca, priv->rx_ring[id].mapping,<br>
- IPOIB_BUF_SIZE, DMA_FROM_DEVICE);<br>
+ ipoib_sg_dma_unmap_rx(priv,<br>
+ priv->rx_ring[id].mapping);<br>
dev_kfree_skb_any(priv->rx_ring[id].skb);<br>
priv->rx_ring[id].skb = NULL;<br>
}<br>
@@ -132,16 +158,23 @@ static int ipoib_ib_post_receive(struct <br>
return ret;<br>
}<br>
<br>
-static int ipoib_alloc_rx_skb(struct net_device *dev, int id)<br>
+static struct sk_buff *ipoib_alloc_rx_skb(struct net_device *dev, int id,<br>
+ u64 mapping[IPOIB_UD_RX_SG])<br>
{<br>
struct ipoib_dev_priv *priv = netdev_priv(dev);<br>
struct sk_buff *skb;<br>
- u64 addr;<br>
+ int buf_size;<br>
<br>
- skb = dev_alloc_skb(IPOIB_BUF_SIZE + 4);<br>
- if (!skb)<br>
- return -ENOMEM;<br>
+ if (ipoib_ud_need_sg(priv->max_ib_mtu)) <br>
+ buf_size = IPOIB_UD_HEAD_SIZE;<br>
+ else<br>
+ buf_size = IPOIB_UD_BUF_SIZE(priv->max_ib_mtu);<br>
<br>
+ skb = dev_alloc_skb(buf_size + 4);<br>
+ <br>
+ if (unlikely(!skb)) <br>
+ return NULL;<br>
+ <br>
/*<br>
* IB will leave a 40 byte gap for a GRH and IPoIB adds a 4 byte<br>
* header. So we need 4 more bytes to get to 48 and align the<br>
@@ -149,17 +182,32 @@ static int ipoib_alloc_rx_skb(struct net<br>
*/<br>
skb_reserve(skb, 4);<br>
<br>
- addr = ib_dma_map_single(priv->ca, skb->data, IPOIB_BUF_SIZE,<br>
- DMA_FROM_DEVICE);<br>
- if (unlikely(ib_dma_mapping_error(priv->ca, addr))) {<br>
- dev_kfree_skb_any(skb);<br>
- return -EIO;<br>
- }<br>
-<br>
- priv->rx_ring[id].skb = skb;<br>
- priv->rx_ring[id].mapping = addr;<br>
-<br>
- return 0;<br>
+ mapping[0] = ib_dma_map_single(priv->ca, skb->data, buf_size,<br>
+ DMA_FROM_DEVICE);<br>
+ if (unlikely(ib_dma_mapping_error(priv->ca, mapping[0]))) {<br>
+ dev_kfree_skb_any(skb);<br>
+ return NULL;<br>
+ }<br>
+ <br>
+ if (ipoib_ud_need_sg(priv->max_ib_mtu)) {<br>
+ struct page *page = alloc_page(GFP_ATOMIC);<br>
+ if (!page)<br>
+ goto partial_error;<br>
+ <br>
+ skb_fill_page_desc(skb, 0, page, 0, PAGE_SIZE);<br>
+ mapping[1] = ib_dma_map_page(priv->ca, skb_shinfo(skb)->frags[0].page,<br>
+ 0, PAGE_SIZE, DMA_FROM_DEVICE);<br>
+ if (unlikely(ib_dma_mapping_error(priv->ca, mapping[1])))<br>
+ goto partial_error;<br>
+ } <br>
+<br>
+ priv->rx_ring[id].skb = skb;<br>
+ return skb;<br>
+ <br>
+partial_error:<br>
+ ib_dma_unmap_single(priv->ca, mapping[0], buf_size, DMA_FROM_DEVICE);<br>
+ dev_kfree_skb_any(skb);<br>
+ return NULL;<br>
}<br>
<br>
static int ipoib_ib_post_receives(struct net_device *dev)<br>
@@ -168,7 +216,7 @@ static int ipoib_ib_post_receives(struct<br>
int i;<br>
<br>
for (i = 0; i < ipoib_recvq_size; ++i) {<br>
- if (ipoib_alloc_rx_skb(dev, i)) {<br>
+ if (!ipoib_alloc_rx_skb(dev, i, priv->rx_ring[i].mapping)) {<br>
ipoib_warn(priv, "failed to allocate receive buffer %d\n", i);<br>
return -ENOMEM;<br>
}<br>
@@ -186,7 +234,7 @@ static void ipoib_ib_handle_rx_wc(struct<br>
struct ipoib_dev_priv *priv = netdev_priv(dev);<br>
unsigned int wr_id = wc->wr_id & ~IPOIB_OP_RECV;<br>
struct sk_buff *skb;<br>
- u64 addr;<br>
+ u64 mapping[IPOIB_UD_RX_SG];<br>
<br>
ipoib_dbg_data(priv, "recv completion: id %d, status: %d\n",<br>
wr_id, wc->status);<br>
@@ -198,42 +246,38 @@ static void ipoib_ib_handle_rx_wc(struct<br>
}<br>
<br>
skb = priv->rx_ring[wr_id].skb;<br>
- addr = priv->rx_ring[wr_id].mapping;<br>
<br>
+ /* code duplicated here so the fast path avoids the need-sg check */<br>
if (unlikely(wc->status != IB_WC_SUCCESS)) {<br>
if (wc->status != IB_WC_WR_FLUSH_ERR)<br>
ipoib_warn(priv, "failed recv event "<br>
"(status=%d, wrid=%d vend_err %x)\n",<br>
wc->status, wr_id, wc->vendor_err);<br>
- ib_dma_unmap_single(priv->ca, addr,<br>
- IPOIB_BUF_SIZE, DMA_FROM_DEVICE);<br>
+ ipoib_sg_dma_unmap_rx(priv, priv->rx_ring[wr_id].mapping);<br>
dev_kfree_skb_any(skb);<br>
priv->rx_ring[wr_id].skb = NULL;<br>
return;<br>
}<br>
-<br>
/*<br>
* Drop packets that this interface sent, ie multicast packets<br>
* that the HCA has replicated.<br>
*/<br>
- if (unlikely(wc->slid == priv->local_lid && wc->src_qp == priv->qp->qp_num))<br>
+ if (wc->slid == priv->local_lid && wc->src_qp == priv->qp->qp_num)<br>
goto repost;<br>
-<br>
/*<br>
* If we can't allocate a new RX buffer, dump<br>
* this packet and reuse the old buffer.<br>
*/<br>
- if (unlikely(ipoib_alloc_rx_skb(dev, wr_id))) {<br>
+ if (unlikely(!ipoib_alloc_rx_skb(dev, wr_id, mapping))) {<br>
++dev->stats.rx_dropped;<br>
goto repost;<br>
}<br>
-<br>
ipoib_dbg_data(priv, "received %d bytes, SLID 0x%04x\n",<br>
wc->byte_len, wc->slid);<br>
-<br>
- ib_dma_unmap_single(priv->ca, addr, IPOIB_BUF_SIZE, DMA_FROM_DEVICE);<br>
-<br>
- skb_put(skb, wc->byte_len);<br>
+ ipoib_sg_dma_unmap_rx(priv, priv->rx_ring[wr_id].mapping);<br>
+ ipoib_ud_skb_put_frags(priv, skb, wc->byte_len);<br>
+ memcpy(priv->rx_ring[wr_id].mapping, mapping,<br>
+ IPOIB_UD_RX_SG * sizeof *mapping);<br>
skb_pull(skb, IB_GRH_BYTES);<br>
<br>
skb->protocol = ((struct ipoib_header *) skb->data)->proto;<br>
@@ -827,18 +871,15 @@ int ipoib_ib_dev_stop(struct net_device <br>
* all our pending work requests.<br>
*/<br>
for (i = 0; i < ipoib_recvq_size; ++i) {<br>
- struct ipoib_rx_buf *rx_req;<br>
+ struct ipoib_sg_rx_buf *rx_req;<br>
<br>
rx_req = &priv->rx_ring[i];<br>
-<br>
- if (rx_req->skb) {<br>
- ib_dma_unmap_single(priv->ca,<br>
- rx_req->mapping,<br>
- IPOIB_BUF_SIZE,<br>
- DMA_FROM_DEVICE);<br>
- dev_kfree_skb_any(rx_req->skb);<br>
- rx_req->skb = NULL;<br>
- }<br>
+ if (!rx_req->skb)<br>
+ continue;<br>
+ ipoib_sg_dma_unmap_rx(priv,<br>
+ priv->rx_ring[i].mapping);<br>
+ dev_kfree_skb_any(rx_req->skb);<br>
+ rx_req->skb = NULL;<br>
}<br>
<br>
goto timeout;<br>
diff -urpN ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib_main.c ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib_main.c<br>
--- ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib_main.c 2008-02-04 20:09:18.000000000 -0800<br>
+++ ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib_main.c 2008-02-05 12:20:40.000000000 -0800<br>
@@ -193,7 +193,7 @@ static int ipoib_change_mtu(struct net_d<br>
return 0;<br>
}<br>
<br>
- if (new_mtu > IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN)<br>
+ if (new_mtu > IPOIB_UD_MTU(priv->max_ib_mtu))<br>
return -EINVAL;<br>
<br>
priv->admin_mtu = new_mtu;<br>
@@ -1007,10 +1007,6 @@ static void ipoib_setup(struct net_devic<br>
dev->tx_queue_len = ipoib_sendq_size * 2;<br>
dev->features = NETIF_F_VLAN_CHALLENGED | NETIF_F_LLTX;<br>
<br>
- /* MTU will be reset when mcast join happens */<br>
- dev->mtu = IPOIB_PACKET_SIZE - IPOIB_ENCAP_LEN;<br>
- priv->mcast_mtu = priv->admin_mtu = dev->mtu;<br>
-<br>
memcpy(dev->broadcast, ipv4_bcast_addr, INFINIBAND_ALEN);<br>
<br>
netif_carrier_off(dev);<br>
@@ -1156,6 +1152,7 @@ static struct net_device *ipoib_add_port<br>
struct ib_device *hca, u8 port)<br>
{<br>
struct ipoib_dev_priv *priv;<br>
+ struct ib_port_attr attr;<br>
int result = -ENOMEM;<br>
<br>
priv = ipoib_intf_alloc(format);<br>
@@ -1166,6 +1163,18 @@ static struct net_device *ipoib_add_port<br>
<br>
priv->dev->features |= NETIF_F_HIGHDMA;<br>
<br>
+ if (!ib_query_port(hca, port, &attr))<br>
+ priv->max_ib_mtu = ib_mtu_enum_to_int(attr.max_mtu);<br>
+ else {<br>
+ printk(KERN_WARNING "%s: ib_query_port %d failed\n",<br>
+ hca->name, port);<br>
+ goto device_init_failed;<br>
+ }<br>
+<br>
+ /* MTU will be reset when mcast join happens */<br>
+ priv->dev->mtu = IPOIB_UD_MTU(priv->max_ib_mtu);<br>
+ priv->mcast_mtu = priv->admin_mtu = priv->dev->mtu;<br>
+<br>
result = ib_query_pkey(hca, port, 0, &priv->pkey);<br>
if (result) {<br>
printk(KERN_WARNING "%s: ib_query_pkey port %d failed (ret = %d)\n",<br>
diff -urpN ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c<br>
--- ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-02-04 15:31:14.000000000 -0800<br>
+++ ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-02-05 12:20:40.000000000 -0800<br>
@@ -567,8 +567,7 @@ void ipoib_mcast_join_task(struct work_s<br>
return;<br>
}<br>
<br>
- priv->mcast_mtu = ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu) -<br>
- IPOIB_ENCAP_LEN;<br>
+ priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));<br>
<br>
if (!ipoib_cm_admin_enabled(dev))<br>
dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);<br>
diff -urpN ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c<br>
--- ofed_kernel_a/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2008-02-04 20:09:18.000000000 -0800<br>
+++ ofed_kernel_b/drivers/infiniband/ulp/ipoib/ipoib_verbs.c 2008-02-05 12:20:40.000000000 -0800<br>
@@ -151,7 +151,7 @@ int ipoib_transport_dev_init(struct net_<br>
.max_send_wr = ipoib_sendq_size,<br>
.max_recv_wr = ipoib_recvq_size,<br>
.max_send_sge = dev->features & NETIF_F_SG ? MAX_SKB_FRAGS + 1 : 1,<br>
- .max_recv_sge = 1<br>
+ .max_recv_sge = IPOIB_UD_RX_SG <br>
},<br>
.sq_sig_type = IB_SIGNAL_REQ_WR,<br>
.qp_type = IB_QPT_UD,<br>
@@ -225,18 +225,29 @@ int ipoib_transport_dev_init(struct net_<br>
priv->tx_wr.opcode = IB_WR_SEND;<br>
priv->tx_wr.sg_list = priv->tx_sge;<br>
priv->tx_wr.send_flags = IB_SEND_SIGNALED;<br>
-<br>
+ <br>
for (i = 0; i < UD_POST_RCV_COUNT; ++i) {<br>
- priv->sglist_draft[i].length = IPOIB_BUF_SIZE;<br>
- priv->sglist_draft[i].lkey = priv->mr->lkey;<br>
-<br>
- priv->rx_wr_draft[i].sg_list = &priv->sglist_draft[i];<br>
- priv->rx_wr_draft[i].num_sge = 1;<br>
+ priv->sglist_draft[i][0].lkey = priv->mr->lkey;<br>
+ priv->sglist_draft[i][1].lkey = priv->mr->lkey;<br>
+ priv->rx_wr_draft[i].sg_list = &priv->sglist_draft[i][0];<br>
if (i < UD_POST_RCV_COUNT - 1)<br>
priv->rx_wr_draft[i].next = &priv->rx_wr_draft[i + 1];<br>
}<br>
priv->rx_wr_draft[i].next = NULL;<br>
<br>
+ if (ipoib_ud_need_sg(priv->max_ib_mtu)) {<br>
+ for (i = 0; i < UD_POST_RCV_COUNT; ++i) {<br>
+ priv->sglist_draft[i][0].length = IPOIB_UD_HEAD_SIZE;<br>
+ priv->sglist_draft[i][1].length = PAGE_SIZE;<br>
+ priv->rx_wr_draft[i].num_sge = IPOIB_UD_RX_SG;<br>
+ }<br>
+ } else {<br>
+ for (i = 0; i < UD_POST_RCV_COUNT; ++i) {<br>
+ priv->sglist_draft[i][0].length = IPOIB_UD_BUF_SIZE(priv->max_ib_mtu);<br>
+ priv->rx_wr_draft[i].num_sge = 1;<br>
+ }<br>
+ }<br>
+<br>
return 0;<br>
<br>
out_free_scq:<br>
<br>
<br>
<i>(See attached file: ipoib-new-4kmtu.patch)</i><br>
<br>
<br>
Shirley </body></html>