[ofa-general] SDP memory allocation policy problem?
Ken Phillips
phillips.ken at gmail.com
Tue Dec 18 12:02:49 PST 2007
Greetings,
We would like to confirm that this patch actually helps out quite a
bit when system is writing out to block device over sdp. We believe
that this should be included in 1.3.
Regards
KP
On 9/26/07, Jim Mott <jimmott at austin.rr.com> wrote:
> I have reworked your patch slightly and run my simple unit tests on it. No correctness problems detected in latency or bandwidth
> paths. No performance regressions either.
>
> If your proposed patch worked for you, then this one ought to work too. Could you please give it a go and let me know?
>
> Index: ofa_1_3_dev_kernel/drivers/infiniband/ulp/sdp/sdp_bcopy.c
> ===================================================================
> --- ofa_1_3_dev_kernel.orig/drivers/infiniband/ulp/sdp/sdp_bcopy.c 2007-09-26 13:27:43.000000000 -0500
> +++ ofa_1_3_dev_kernel/drivers/infiniband/ulp/sdp/sdp_bcopy.c 2007-09-26 17:52:12.000000000 -0500
> @@ -221,16 +221,26 @@ static void sdp_post_recv(struct sdp_soc
> skb_frag_t *frag;
> struct sdp_bsdh *h;
> int id = ssk->rx_head;
> + unsigned int gfp_page;
>
> /* Now, allocate and repost recv */
> /* TODO: allocate from cache */
> - skb = sk_stream_alloc_skb(&ssk->isk.sk, SDP_HEAD_SIZE,
> - GFP_KERNEL);
> +
> + if (unlikely(ssk->isk.sk.sk_allocation)) {
> + skb = sk_stream_alloc_skb(&ssk->isk.sk, SDP_HEAD_SIZE,
> + ssk->isk.sk.sk_allocation);
> + gfp_page = ssk->isk.sk.sk_allocation | __GFP_HIGHMEM;
> + } else {
> + skb = sk_stream_alloc_skb(&ssk->isk.sk, SDP_HEAD_SIZE,
> + GFP_KERNEL);
> + gfp_page = GFP_HIGHUSER;
> + }
> +
> /* FIXME */
> BUG_ON(!skb);
> h = (struct sdp_bsdh *)skb->head;
> for (i = 0; i < ssk->recv_frags; ++i) {
> - page = alloc_pages(GFP_HIGHUSER, 0);
> + page = alloc_pages(gfp_page, 0);
> BUG_ON(!page);
> frag = &skb_shinfo(skb)->frags[i];
> frag->page = page;
> @@ -404,6 +414,7 @@ void sdp_post_sends(struct sdp_sock *ssk
> /* TODO: nonagle? */
> struct sk_buff *skb;
> int c;
> + int gfp_page;
>
> if (unlikely(!ssk->id)) {
> if (ssk->isk.sk.sk_send_head) {
> @@ -415,6 +426,11 @@ void sdp_post_sends(struct sdp_sock *ssk
> return;
> }
>
> + if (unlikely(ssk->isk.sk.sk_allocation))
> + gfp_page = ssk->isk.sk.sk_allocation;
> + else
> + gfp_page = GFP_KERNEL;
> +
> if (ssk->recv_request &&
> ssk->rx_tail >= ssk->recv_request_head &&
> ssk->bufs >= SDP_MIN_BUFS &&
> @@ -424,7 +440,7 @@ void sdp_post_sends(struct sdp_sock *ssk
> skb = sk_stream_alloc_skb(&ssk->isk.sk,
> sizeof(struct sdp_bsdh) +
> sizeof(*resp_size),
> - GFP_KERNEL);
> + gfp_page);
> /* FIXME */
> BUG_ON(!skb);
> resp_size = (struct sdp_chrecvbuf *)skb_put(skb, sizeof *resp_size);
> @@ -449,7 +465,7 @@ void sdp_post_sends(struct sdp_sock *ssk
> skb = sk_stream_alloc_skb(&ssk->isk.sk,
> sizeof(struct sdp_bsdh) +
> sizeof(*req_size),
> - GFP_KERNEL);
> + gfp_page);
> /* FIXME */
> BUG_ON(!skb);
> ssk->sent_request = SDP_MAX_SEND_SKB_FRAGS * PAGE_SIZE;
> @@ -480,7 +496,7 @@ void sdp_post_sends(struct sdp_sock *ssk
> ssk->bufs) {
> skb = sk_stream_alloc_skb(&ssk->isk.sk,
> sizeof(struct sdp_bsdh),
> - GFP_KERNEL);
> + gfp_page);
> /* FIXME */
> BUG_ON(!skb);
> sdp_post_send(ssk, skb, SDP_MID_DISCONN);
>
> -----Original Message-----
> From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Nathan Dauchy
> Sent: Tuesday, September 25, 2007 5:50 PM
> To: general at lists.openfabrics.org
> Subject: Re: [ofa-general] SDP memory allocation policy problem?
>
> Is there anyone among the OFED development team that is looking into
> this issue? I believe that it is causing nodes to hang at our site. We
> are running ofed-1.2 and the 2.6.9-55.ELsmp kernel.
>
> Workarounds or even untested patches would be appreciated.
>
> Thanks!
>
> -Nathan
>
>
> Ken Phillips wrote:
> > Greetings,
> >
> > Teammates here report the following:
> >
> > Problem
> >
> > The method SDP uses to allocate socket buffers may cause the
> > node to hang under memory pressure.
> >
> > Details
> >
> > Each kernel level socket has an allocation flag to specify the
> > memory allocation policy for socket buffers, the default is GFP_ATOMIC
> > (or GFP_KERNEL for SDP). If the caller creates a socket with the
> > policy set to GFP_NOFS or GFP_NOIO this should be the allocation
> > policy used by the SDP layer.
> >
> > The problem we are seeing is that if a node is under load, and
> > a memory allocation fails (say in sock_sendmsg()), the kernel will
> > use the allocation policy to decide how to proceed with the allocation.
> > If GFP_KERNEL is specified, then the kernel may attempt to free pages
> > through the iSCSI block device that is making the socket call, which
> > would result in a deadlock. Use of GFP_NOIO should prevent the kernel
> > from using the IO backend to free memory resources.
> >
> > here is a sample stack trace from Alt-Sysrq during one of these
> > lockups,
> >
> >> tx_worker D ffffff0014d14000 0 10195 1 10196 10194
> >> (L-TLB)
> >> 00000100707e98d8 0000000000000046 0000000000000004 0000000000000212
> >> 0000000000000212 ffffffffa018ccae 0000000000000246 0000000000000246
> >> 000001007873c7f0 0000000000000320
> >> Call Trace:<ffffffffa018ccae>{:ib_mthca:mthca_poll_cq+2258}
> >> <ffffffff8030ad5c>{schedule_timeout+224}
> >> <ffffffff802a9db7>{lock_sock+152}
> >> <ffffffff80135756>{autoremove_wake_function+0}
> >> <ffffffffa0538b13>{:ib_sdp:sdp_poll_cq+58}
> >> <ffffffff80135756>{autoremove_wake_function+0}
> >> <ffffffff802a9dfd>{release_sock+16}
> >> <ffffffffa0534b18>{:ib_sdp:sdp_sendmsg+33}
> >> <ffffffff802a730f>{sock_sendmsg+271}
> >> <ffffffffa05386ad>{:ib_sdp:sdp_post_sends+619}
> >> <ffffffff802a9dfd>{release_sock+16}
> >> <ffffffffa05353a5>{:ib_sdp:sdp_sendmsg+2222}
> >> <ffffffff80135756>{autoremove_wake_function+0}
> >> <ffffffffa057708f>{:rs_iscsi:iscsi_sock_msg+1265}
> >> <ffffffffa057708b>{:rs_iscsi:iscsi_sock_msg+1261}
> >> <ffffffff80132159>{recalc_task_prio+337}
> >> <ffffffffa055bfdb>{:rs_iscsi:scsi_command_i+5283}
> >> <ffffffff8030a2c9>{thread_return+0}
> >> <ffffffff8030a321>{thread_return+88}
> >> <ffffffff8013fdf7>{del_timer+107}
> >> <ffffffff8013feb4>{del_singleshot_timer_sync+9}
> >> <ffffffff8030adf3>{schedule_timeout+375}
> >> <ffffffffa056829e>{:rs_iscsi:tx_worker_proc_i+6819}
> >> <ffffffff80110f47>{child_rip+8}
> >> <ffffffffa05667fb>{:rs_iscsi:tx_worker_proc_i+0}
> >> <ffffffff80110f3f>{child_rip+0}
> >>
> >>
> >
> > We still don't know the scope of changes to be made, but we think,
> > at minimum that some of the memory allocation in SDP should be changed,
> > for example.
> >
> > diff -Naur old/drivers/infiniband/ulp/sdp/sdp_bcopy.c
> > new/drivers/infiniband/ulp/sdp/sdp_bcopy.c
> > --- old/drivers/infiniband/ulp/sdp/sdp_bcopy.c 2007-06-21
> > 10:38:47.000000000 -0400
> > +++ new/drivers/infiniband/ulp/sdp/sdp_bcopy.c 2007-08-31
> > 12:25:58.000000000 -0400
> > @@ -224,13 +224,27 @@
> >
> > /* Now, allocate and repost recv */
> > /* TODO: allocate from cache */
> > +
> > +#if (PROPOSED_SDP_FIX == 1)
> > + skb = sk_stream_alloc_skb(&ssk->isk.sk, SDP_HEAD_SIZE,
> > + (ssk->isk.sk.sk_allocation == 0) ? (GFP_KERNEL) :
> > + ssk->isk.sk.sk_allocation);
> > +#else
> > skb = sk_stream_alloc_skb(&ssk->isk.sk, SDP_HEAD_SIZE,
> > GFP_KERNEL);
> > +#endif
> > /* FIXME */
> > BUG_ON(!skb);
> > h = (struct sdp_bsdh *)skb->head;
> > for (i = 0; i < ssk->recv_frags; ++i) {
> > +#if (PROPOSED_SDP_FIX == 1)
> > + page = alloc_pages((ssk->isk.sk.sk_allocation == 0)
> > + ? (GFP_HIGHUSER) :
> > + (ssk->isk.sk.sk_allocation | (__GFP_HIGHMEM)),
> > + 0);
> > +#else
> > page = alloc_pages(GFP_HIGHUSER, 0);
> > +#endif
> > BUG_ON(!page);
> > frag = &skb_shinfo(skb)->frags[i];
> > frag->page = page;
> > @@ -406,10 +420,18 @@
> > ssk->tx_head - ssk->tx_tail < SDP_TX_SIZE) {
> > struct sdp_chrecvbuf *resp_size;
> > ssk->recv_request = 0;
> > +#if (PROPOSED_SDP_FIX == 1)
> > + skb = sk_stream_alloc_skb(&ssk->isk.sk,
> > + sizeof(struct sdp_bsdh) +
> > + sizeof(*resp_size),
> > + (ssk->isk.sk.sk_allocation == 0) ? (GFP_KERNEL) :
> > + ssk->isk.sk.sk_allocation);
> > +#else
> > skb = sk_stream_alloc_skb(&ssk->isk.sk,
> > sizeof(struct sdp_bsdh) +
> > sizeof(*resp_size),
> > GFP_KERNEL);
> > +#endif
> > /* FIXME */
> > BUG_ON(!skb);
> > resp_size = (struct sdp_chrecvbuf *)skb_put(skb, sizeof *resp_size);
> > @@ -431,10 +453,18 @@
> > ssk->tx_head > ssk->sent_request_head + SDP_RESIZE_WAIT &&
> > ssk->tx_head - ssk->tx_tail < SDP_TX_SIZE) {
> > struct sdp_chrecvbuf *req_size;
> > +#if (PROPOSED_SDP_FIX == 1)
> > + skb = sk_stream_alloc_skb(&ssk->isk.sk,
> > + sizeof(struct sdp_bsdh) +
> > + sizeof(*req_size),
> > + (ssk->isk.sk.sk_allocation == 0) ? (GFP_KERNEL) :
> > + ssk->isk.sk.sk_allocation);
> > +#else
> > skb = sk_stream_alloc_skb(&ssk->isk.sk,
> > sizeof(struct sdp_bsdh) +
> > sizeof(*req_size),
> > GFP_KERNEL);
> > +#endif
> > /* FIXME */
> > BUG_ON(!skb);
> > ssk->sent_request = SDP_MAX_SEND_SKB_FRAGS * PAGE_SIZE;
> > @@ -463,9 +493,16 @@
> > (TCPF_FIN_WAIT1 | TCPF_LAST_ACK)) &&
> > !ssk->isk.sk.sk_send_head &&
> > ssk->bufs) {
> > +#if (PROPOSED_SDP_FIX == 1)
> > + skb = sk_stream_alloc_skb(&ssk->isk.sk,
> > + sizeof(struct sdp_bsdh),
> > + (ssk->isk.sk.sk_allocation == 0) ? (GFP_KERNEL) :
> > + ssk->isk.sk.sk_allocation);
> > +#else
> > skb = sk_stream_alloc_skb(&ssk->isk.sk,
> > sizeof(struct sdp_bsdh),
> > GFP_KERNEL);
> > +#endif
> > /* FIXME */
> > BUG_ON(!skb);
> > sdp_post_send(ssk, skb, SDP_MID_DISCONN);
> > diff -Naur old/drivers/infiniband/ulp/sdp/sdp.h
> > new/drivers/infiniband/ulp/sdp/sdp.h
> > --- old/drivers/infiniband/ulp/sdp/sdp.h 2007-06-21 10:38:47.000000000 -0400
> > +++ new/drivers/infiniband/ulp/sdp/sdp.h 2007-08-31 12:25:45.000000000 -0400
> > @@ -7,6 +7,8 @@
> > #include <net/tcp.h> /* For urgent data flags */
> > #include <rdma/ib_verbs.h>
> >
> > +#define PROPOSED_SDP_FIX 1
> > +
> > #define sdp_printk(level, sk, format, arg...) \
> > printk(level "sdp_sock(%d:%d): " format, \
> > (sk) ? inet_sk(sk)->num : -1, \
> >
> >
> >
> >
> > ---------------------
> > Best Regards
> > K Phillips
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>
More information about the general
mailing list