[ofa-general] SDP memory allocation policy problem?
Jim Mott
jimmott at austin.rr.com
Wed Sep 26 04:39:03 PDT 2007
This would be on my plate. I was travelling and have not gotten a chance to test the fix. On inspection, I see no problems with
this approach and do not expect to see any testing issues.
If you want to rework the patch to remove the PROPOSED_SDP_FIX and submit it, I will test it today. Otherwise, I will do the patch
and testing by tomorrow.
Sorry for taking so long.
JIm
-----Original Message-----
From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Nathan Dauchy
Sent: Tuesday, September 25, 2007 5:50 PM
To: general at lists.openfabrics.org
Subject: Re: [ofa-general] SDP memory allocation policy problem?
Is there anyone among the OFED development team that is looking into
this issue? I believe that it is causing nodes to hang at our site. We
are running ofed-1.2 and the 2.6.9-55.ELsmp kernel.
Workarounds or even untested patches would be appreciated.
Thanks!
-Nathan
Ken Phillips wrote:
> Greetings,
>
> Teammates here report the following:
>
> Problem
>
> The method SDP uses to allocate socket buffers may cause the
> node to hang under memory pressure.
>
> Details
>
> Each kernel level socket has an allocation flag to specify the
> memory allocation policy for socket buffers, the default is GFP_ATOMIC
> (or GFP_KERNEL for SDP). If the caller creates a socket with the
> policy set to GFP_NOFS or GFP_NOIO this should be the allocation
> policy used by the SDP layer.
>
> The problem we are seeing is that if a node is under load, and
> a memory allocation fails (say in sock_sendmsg()), the kernel will
> use the allocation policy to decide how to proceed with the allocation.
> If GFP_KERNEL is specified, then the kernel may attempt to free pages
> through the iSCSI block device that is making the socket call, which
> would result in a deadlock. Use of GFP_NOIO should prevent the kernel
> from using the IO backend to free memory resources.
>
> here is a sample stack trace from Alt-Sysrq during one of these
> lockups,
>
>> tx_worker D ffffff0014d14000 0 10195 1 10196 10194
>> (L-TLB)
>> 00000100707e98d8 0000000000000046 0000000000000004 0000000000000212
>> 0000000000000212 ffffffffa018ccae 0000000000000246 0000000000000246
>> 000001007873c7f0 0000000000000320
>> Call Trace:<ffffffffa018ccae>{:ib_mthca:mthca_poll_cq+2258}
>> <ffffffff8030ad5c>{schedule_timeout+224}
>> <ffffffff802a9db7>{lock_sock+152}
>> <ffffffff80135756>{autoremove_wake_function+0}
>> <ffffffffa0538b13>{:ib_sdp:sdp_poll_cq+58}
>> <ffffffff80135756>{autoremove_wake_function+0}
>> <ffffffff802a9dfd>{release_sock+16}
>> <ffffffffa0534b18>{:ib_sdp:sdp_sendmsg+33}
>> <ffffffff802a730f>{sock_sendmsg+271}
>> <ffffffffa05386ad>{:ib_sdp:sdp_post_sends+619}
>> <ffffffff802a9dfd>{release_sock+16}
>> <ffffffffa05353a5>{:ib_sdp:sdp_sendmsg+2222}
>> <ffffffff80135756>{autoremove_wake_function+0}
>> <ffffffffa057708f>{:rs_iscsi:iscsi_sock_msg+1265}
>> <ffffffffa057708b>{:rs_iscsi:iscsi_sock_msg+1261}
>> <ffffffff80132159>{recalc_task_prio+337}
>> <ffffffffa055bfdb>{:rs_iscsi:scsi_command_i+5283}
>> <ffffffff8030a2c9>{thread_return+0}
>> <ffffffff8030a321>{thread_return+88}
>> <ffffffff8013fdf7>{del_timer+107}
>> <ffffffff8013feb4>{del_singleshot_timer_sync+9}
>> <ffffffff8030adf3>{schedule_timeout+375}
>> <ffffffffa056829e>{:rs_iscsi:tx_worker_proc_i+6819}
>> <ffffffff80110f47>{child_rip+8}
>> <ffffffffa05667fb>{:rs_iscsi:tx_worker_proc_i+0}
>> <ffffffff80110f3f>{child_rip+0}
>>
>>
>
> We still don't know the scope of changes to be made, but we think,
> at minimum that some of the memory allocation in SDP should be changed,
> for example.
>
> diff -Naur old/drivers/infiniband/ulp/sdp/sdp_bcopy.c
> new/drivers/infiniband/ulp/sdp/sdp_bcopy.c
> --- old/drivers/infiniband/ulp/sdp/sdp_bcopy.c 2007-06-21
> 10:38:47.000000000 -0400
> +++ new/drivers/infiniband/ulp/sdp/sdp_bcopy.c 2007-08-31
> 12:25:58.000000000 -0400
> @@ -224,13 +224,27 @@
>
> /* Now, allocate and repost recv */
> /* TODO: allocate from cache */
> +
> +#if (PROPOSED_SDP_FIX == 1)
> + skb = sk_stream_alloc_skb(&ssk->isk.sk, SDP_HEAD_SIZE,
> + (ssk->isk.sk.sk_allocation == 0) ? (GFP_KERNEL) :
> + ssk->isk.sk.sk_allocation);
> +#else
> skb = sk_stream_alloc_skb(&ssk->isk.sk, SDP_HEAD_SIZE,
> GFP_KERNEL);
> +#endif
> /* FIXME */
> BUG_ON(!skb);
> h = (struct sdp_bsdh *)skb->head;
> for (i = 0; i < ssk->recv_frags; ++i) {
> +#if (PROPOSED_SDP_FIX == 1)
> + page = alloc_pages((ssk->isk.sk.sk_allocation == 0)
> + ? (GFP_HIGHUSER) :
> + (ssk->isk.sk.sk_allocation | (__GFP_HIGHMEM)),
> + 0);
> +#else
> page = alloc_pages(GFP_HIGHUSER, 0);
> +#endif
> BUG_ON(!page);
> frag = &skb_shinfo(skb)->frags[i];
> frag->page = page;
> @@ -406,10 +420,18 @@
> ssk->tx_head - ssk->tx_tail < SDP_TX_SIZE) {
> struct sdp_chrecvbuf *resp_size;
> ssk->recv_request = 0;
> +#if (PROPOSED_SDP_FIX == 1)
> + skb = sk_stream_alloc_skb(&ssk->isk.sk,
> + sizeof(struct sdp_bsdh) +
> + sizeof(*resp_size),
> + (ssk->isk.sk.sk_allocation == 0) ? (GFP_KERNEL) :
> + ssk->isk.sk.sk_allocation);
> +#else
> skb = sk_stream_alloc_skb(&ssk->isk.sk,
> sizeof(struct sdp_bsdh) +
> sizeof(*resp_size),
> GFP_KERNEL);
> +#endif
> /* FIXME */
> BUG_ON(!skb);
> resp_size = (struct sdp_chrecvbuf *)skb_put(skb, sizeof *resp_size);
> @@ -431,10 +453,18 @@
> ssk->tx_head > ssk->sent_request_head + SDP_RESIZE_WAIT &&
> ssk->tx_head - ssk->tx_tail < SDP_TX_SIZE) {
> struct sdp_chrecvbuf *req_size;
> +#if (PROPOSED_SDP_FIX == 1)
> + skb = sk_stream_alloc_skb(&ssk->isk.sk,
> + sizeof(struct sdp_bsdh) +
> + sizeof(*req_size),
> + (ssk->isk.sk.sk_allocation == 0) ? (GFP_KERNEL) :
> + ssk->isk.sk.sk_allocation);
> +#else
> skb = sk_stream_alloc_skb(&ssk->isk.sk,
> sizeof(struct sdp_bsdh) +
> sizeof(*req_size),
> GFP_KERNEL);
> +#endif
> /* FIXME */
> BUG_ON(!skb);
> ssk->sent_request = SDP_MAX_SEND_SKB_FRAGS * PAGE_SIZE;
> @@ -463,9 +493,16 @@
> (TCPF_FIN_WAIT1 | TCPF_LAST_ACK)) &&
> !ssk->isk.sk.sk_send_head &&
> ssk->bufs) {
> +#if (PROPOSED_SDP_FIX == 1)
> + skb = sk_stream_alloc_skb(&ssk->isk.sk,
> + sizeof(struct sdp_bsdh),
> + (ssk->isk.sk.sk_allocation == 0) ? (GFP_KERNEL) :
> + ssk->isk.sk.sk_allocation);
> +#else
> skb = sk_stream_alloc_skb(&ssk->isk.sk,
> sizeof(struct sdp_bsdh),
> GFP_KERNEL);
> +#endif
> /* FIXME */
> BUG_ON(!skb);
> sdp_post_send(ssk, skb, SDP_MID_DISCONN);
> diff -Naur old/drivers/infiniband/ulp/sdp/sdp.h
> new/drivers/infiniband/ulp/sdp/sdp.h
> --- old/drivers/infiniband/ulp/sdp/sdp.h 2007-06-21 10:38:47.000000000 -0400
> +++ new/drivers/infiniband/ulp/sdp/sdp.h 2007-08-31 12:25:45.000000000 -0400
> @@ -7,6 +7,8 @@
> #include <net/tcp.h> /* For urgent data flags */
> #include <rdma/ib_verbs.h>
>
> +#define PROPOSED_SDP_FIX 1
> +
> #define sdp_printk(level, sk, format, arg...) \
> printk(level "sdp_sock(%d:%d): " format, \
> (sk) ? inet_sk(sk)->num : -1, \
>
>
>
>
> ---------------------
> Best Regards
> K Phillips
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
_______________________________________________
general mailing list
general at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
More information about the general
mailing list