<div dir="ltr"><div><div><div><div><div><div><div><div>Hi list.<br><br>We're running a TCP middleware over IPoIB-CM (OFED-3.5-2) on Red Hat 6.4. We intend to eventually run a multicast RDMA middleware on the stack.<br>
<br></div>The hardware stack is Dell R720s (some Westmere, mostly Sandy Bridge) with bonded Mellanox MT26428 ConnectX-2 HCAs on two QLogic 12300 managed switches. We're running the latest firmware on the HCAs and the switches.<br>
<br></div>We have been seeing the following messages in the kernel ring buffer, which also seem to coincide with page allocation failures:<br><br>ib0: dev_queue_xmit failed to requeue packet<br>ib0: dev_queue_xmit failed to requeue packet<br>
ib0: dev_queue_xmit failed to requeue packet<br>ib0: dev_queue_xmit failed to requeue packet<br>ib0: dev_queue_xmit failed to requeue packet<br>ib0: dev_queue_xmit failed to requeue packet<br>java: page allocation failure. order:1, mode:0x20<br>
Pid: 24410, comm: java Tainted: P --------------- 2.6.32-279.el6.x86_64 #1<br>Call Trace:<br> <IRQ> [<ffffffff8112759f>] ? __alloc_pages_nodemask+0x77f/0x940<br> [<ffffffff81489c00>] ? tcp_rcv_established+0x290/0x800<br>
[<ffffffff81161d62>] ? kmem_getpages+0x62/0x170<br> [<ffffffff8116297a>] ? fallback_alloc+0x1ba/0x270<br> [<ffffffff811623cf>] ? cache_grow+0x2cf/0x320<br> [<ffffffff811626f9>] ? ____cache_alloc_node+0x99/0x160<br>
[<ffffffff8143014d>] ? __alloc_skb+0x6d/0x190<br> [<ffffffff811635bf>] ? kmem_cache_alloc_node_notrace+0x6f/0x130<br> [<ffffffff811637fb>] ? __kmalloc_node+0x7b/0x100<br> [<ffffffff8143014d>] ? __alloc_skb+0x6d/0x190<br>
[<ffffffff8143028d>] ? dev_alloc_skb+0x1d/0x40<br> [<ffffffffa0673f90>] ? ipoib_cm_alloc_rx_skb+0x30/0x430 [ib_ipoib]<br> [<ffffffffa067523f>] ? ipoib_cm_handle_rx_wc+0x29f/0x770 [ib_ipoib]<br> [<ffffffffa018c828>] ? mlx4_ib_poll_cq+0xa8/0x890 [mlx4_ib]<br>
[<ffffffffa066c01d>] ? ipoib_ib_completion+0x2d/0x30 [ib_ipoib]<br> [<ffffffffa066d80b>] ? ipoib_poll+0xdb/0x190 [ib_ipoib]<br> [<ffffffff810600bc>] ? try_to_wake_up+0x24c/0x3e0<br> [<ffffffff8143f193>] ? net_rx_action+0x103/0x2f0<br>
[<ffffffff81073ec1>] ? __do_softirq+0xc1/0x1e0<br> [<ffffffff810db800>] ? handle_IRQ_event+0x60/0x170<br> [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30<br> [<ffffffff8100de85>] ? do_softirq+0x65/0xa0<br>
[<ffffffff81073ca5>] ? irq_exit+0x85/0x90<br> [<ffffffff81505af5>] ? do_IRQ+0x75/0xf0<br> [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11<br> <EOI> <br><br></div>These appear to be genuine drops, as we are seeing gaps in our middleware, which then goes on to re-cap.<br>
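As an aside: mode:0x20 is GFP_ATOMIC on this kernel, so the order:1 (two contiguous pages) failure points at memory fragmentation rather than outright exhaustion. A quick way to check whether order-1 blocks are running low is to watch /proc/buddyinfo; the awk field numbers below assume the usual "Node N, zone NAME c0 c1 ..." layout:

```shell
# /proc/buddyinfo lists free block counts per zone by order (0..10).
# Field 5 is the free order-0 count and field 6 the free order-1 count;
# a persistently low order-1 count means order:1 GFP_ATOMIC allocations
# (as in the trace above) can fail even with plenty of free memory.
awk '{printf "node %s %s zone: order-0=%s order-1=%s\n", $2, $4, $5, $6}' /proc/buddyinfo
```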
<br></div>We've just made a change to increase the page cache from ~90M to 128M - but what is the list's feeling on the dev_queue_xmit errors? Could they be caused by the same issue - perhaps an inability to allocate pages in a timely manner?<br>
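For reference, a common mitigation for order:1 GFP_ATOMIC failures is raising vm.min_free_kbytes (the reserve kswapd tries to keep free for atomic allocations). A minimal sketch of checking and raising it - the 131072 value here is purely illustrative, not a recommendation:

```shell
# Current reserve the kernel keeps free for atomic allocations (in KiB).
cat /proc/sys/vm/min_free_kbytes

# Raising it (e.g. 131072 KiB = 128M) gives kswapd more headroom,
# making order:1 GFP_ATOMIC failures less likely. Requires root;
# persist via vm.min_free_kbytes in /etc/sysctl.conf.
# echo 131072 > /proc/sys/vm/min_free_kbytes
```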
<br>We're not running at anywhere near high message rates (<1000 mps, ~450 bytes each).<br><br></div>I can see a thread started in 2012 where someone had triggered these dev_queue_xmit errors using netperf, and Roland suggested that at worst one packet was being dropped. The thread went silent after that.<br>
<br></div>Has anyone seen this behavior, or got any pointers to chase this down?<br><br></div>Cheers,<br>-Andrew<br><br></div>ibv_devinfo<br><div><br>hca_id: mlx4_1<br> transport: InfiniBand (0)<br> fw_ver: 2.9.1000<br>
node_guid: 0002:c903:0057:2250<br> sys_image_guid: 0002:c903:0057:2253<br> vendor_id: 0x02c9<br> vendor_part_id: 26428<br> hw_ver: 0xB0<br> board_id: MT_0D90110009<br>
phys_port_cnt: 1<br> max_mr_size: 0xffffffffffffffff<br> page_size_cap: 0xfffffe00<br> max_qp: 163776<br> max_qp_wr: 16351<br> device_cap_flags: 0x007c9c76<br>
max_sge: 32<br> max_sge_rd: 0<br> max_cq: 65408<br> max_cqe: 4194303<br> max_mr: 524272<br> max_pd: 32764<br> max_qp_rd_atom: 16<br>
max_ee_rd_atom: 0<br> max_res_rd_atom: 2620416<br> max_qp_init_rd_atom: 128<br> max_ee_init_rd_atom: 0<br> atomic_cap: ATOMIC_HCA (1)<br> max_ee: 0<br>
max_rdd: 0<br> max_mw: 0<br> max_raw_ipv6_qp: 0<br> max_raw_ethy_qp: 0<br> max_mcast_grp: 8192<br> max_mcast_qp_attach: 248<br> max_total_mcast_qp_attach: 2031616<br>
max_ah: 0<br> max_fmr: 0<br> max_srq: 65472<br> max_srq_wr: 16383<br> max_srq_sge: 31<br> max_pkeys: 128<br> local_ca_ack_delay: 15<br>
port: 1<br> state: PORT_ACTIVE (4)<br> max_mtu: 4096 (5)<br> active_mtu: 2048 (4)<br> sm_lid: 1<br> port_lid: 9<br>
port_lmc: 0x00<br> link_layer: InfiniBand<br> max_msg_sz: 0x40000000<br> port_cap_flags: 0x02510868<br> max_vl_num: 4 (3)<br> bad_pkey_cntr: 0x0<br>
qkey_viol_cntr: 0x0<br> sm_sl: 0<br> pkey_tbl_len: 128<br> gid_tbl_len: 128<br> subnet_timeout: 17<br> init_type_reply: 0<br>
active_width: 4X (2)<br> active_speed: 10.0 Gbps (4)<br> phys_state: LINK_UP (5)<br> GID[ 0]: fe80:0000:0000:0000:0002:c903:0057:2251<br><br>hca_id: mlx4_0<br>
transport: InfiniBand (0)<br> fw_ver: 2.9.1000<br> node_guid: 0002:c903:0057:2764<br> sys_image_guid: 0002:c903:0057:2767<br> vendor_id: 0x02c9<br>
vendor_part_id: 26428<br> hw_ver: 0xB0<br> board_id: MT_0D90110009<br> phys_port_cnt: 1<br> max_mr_size: 0xffffffffffffffff<br> page_size_cap: 0xfffffe00<br>
max_qp: 163776<br> max_qp_wr: 16351<br> device_cap_flags: 0x007c9c76<br> max_sge: 32<br> max_sge_rd: 0<br> max_cq: 65408<br> max_cqe: 4194303<br>
max_mr: 524272<br> max_pd: 32764<br> max_qp_rd_atom: 16<br> max_ee_rd_atom: 0<br> max_res_rd_atom: 2620416<br> max_qp_init_rd_atom: 128<br>
max_ee_init_rd_atom: 0<br> atomic_cap: ATOMIC_HCA (1)<br> max_ee: 0<br> max_rdd: 0<br> max_mw: 0<br> max_raw_ipv6_qp: 0<br> max_raw_ethy_qp: 0<br>
max_mcast_grp: 8192<br> max_mcast_qp_attach: 248<br> max_total_mcast_qp_attach: 2031616<br> max_ah: 0<br> max_fmr: 0<br> max_srq: 65472<br> max_srq_wr: 16383<br>
max_srq_sge: 31<br> max_pkeys: 128<br> local_ca_ack_delay: 15<br> port: 1<br> state: PORT_ACTIVE (4)<br> max_mtu: 4096 (5)<br> active_mtu: 2048 (4)<br>
sm_lid: 1<br> port_lid: 10<br> port_lmc: 0x00<br> link_layer: InfiniBand<br> max_msg_sz: 0x40000000<br> port_cap_flags: 0x02510868<br>
max_vl_num: 4 (3)<br> bad_pkey_cntr: 0x0<br> qkey_viol_cntr: 0x0<br> sm_sl: 0<br> pkey_tbl_len: 128<br> gid_tbl_len: 128<br>
subnet_timeout: 17<br> init_type_reply: 0<br> active_width: 4X (2)<br> active_speed: 10.0 Gbps (4)<br> phys_state: LINK_UP (5)<br> GID[ 0]: fe80:0000:0000:0000:0002:c903:0057:2765<br>
<br><br><div><div><div><div><div>slabtop<br><br> Active / Total Objects (% used) : 3436408 / 5925284 (58.0%)<br> Active / Total Slabs (% used) : 178659 / 178867 (99.9%)<br> Active / Total Caches (% used) : 117 / 193 (60.6%)<br>
Active / Total Size (% used) : 422516.74K / 692339.54K (61.0%)<br> Minimum / Average / Maximum Object : 0.02K / 0.12K / 4096.00K<br><br> OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME <br>
4461349 2084881 46% 0.10K 120577 37 482308K buffer_head <br>548064 547979 99% 0.02K 3806 144 15224K avtab_node <br>370496 368197 99% 0.03K 3308 112 13232K size-32 <br>
135534 105374 77% 0.55K 19362 7 77448K radix_tree_node <br> 67946 51531 75% 0.07K 1282 53 5128K selinux_inode_security <br> 57938 35717 61% 0.06K 982 59 3928K size-64 <br>
42620 42303 99% 0.19K 2131 20 8524K dentry <br> 25132 25129 99% 1.00K 6283 4 25132K ext4_inode_cache <br> 23600 23436 99% 0.19K 1180 20 4720K size-192 <br>
18225 18189 99% 0.14K 675 27 2700K sysfs_dir_cache <br> 17062 15025 88% 0.20K 898 19 3592K vm_area_struct <br> 16555 9899 59% 0.05K 215 77 860K anon_vma_chain <br>
15456 15143 97% 0.62K 2576 6 10304K proc_inode_cache <br> 14340 8881 61% 0.19K 717 20 2868K filp <br> 12090 7545 62% 0.12K 403 30 1612K size-128 <br>
10770 8748 81% 0.25K 718 15 2872K skbuff_head_cache <br> 10568 8365 79% 1.00K 2642 4 10568K size-1024 <br> 8924 5464 61% 0.04K 97 92 388K anon_vma <br>
7038 6943 98% 0.58K 1173 6 4692K inode_cache <br> 5192 4956 95% 2.00K 2596 2 10384K size-2048 <br> 3600 3427 95% 0.50K 450 8 1800K size-512 <br>
3498 3105 88% 0.07K 66 53 264K eventpoll_pwq <br> 3390 3105 91% 0.12K 113 30 452K eventpoll_epi <br> 3335 3239 97% 0.69K 667 5 2668K sock_inode_cache <br>
2636 2612 99% 1.62K 659 4 5272K TCP <br> 2380 1962 82% 0.11K 70 34 280K task_delay_info <br> 2310 1951 84% 0.12K 77 30 308K pid <br>
2136 2053 96% 0.44K 267 8 1068K ib_mad <br> 1992 1947 97% 2.59K 664 3 5312K task_struct <br> 1888 1506 79% 0.06K 32 59 128K tcp_bind_bucket <br>
1785 1685 94% 0.25K 119 15 476K size-256 <br> 1743 695 39% 0.50K 249 7 996K skbuff_fclone_cache <br> 1652 532 32% 0.06K 28 59 112K avc_node <br>
1640 1175 71% 0.19K 82 20 328K cred_jar <br> 1456 1264 86% 0.50K 182 8 728K task_xstate <br> 1378 781 56% 0.07K 26 53 104K Acpi-Operand <br>
1156 459 39% 0.11K 34 34 136K jbd2_journal_head <br> 1050 983 93% 0.78K 210 5 840K shmem_inode_cache <br> 1021 879 86% 4.00K 1021 1 4084K size-4096 <br>
1020 537 52% 0.19K 51 20 204K bio-0 <br> 1008 501 49% 0.02K 7 144 28K dm_target_io <br> 920 463 50% 0.04K 10 92 40K dm_io <br>
876 791 90% 1.00K 219 4 876K signal_cache <br> 840 792 94% 2.06K 280 3 2240K sighand_cache <br> 740 439 59% 0.10K 20 37 80K ext4_prealloc_space <br>
736 658 89% 0.04K 8 92 32K Acpi-Namespace <br> 720 283 39% 0.08K 15 48 60K blkdev_ioc <br> 720 294 40% 0.02K 5 144 20K jbd2_journal_handle <br>
708 131 18% 0.06K 12 59 48K fs_cache <br> 630 429 68% 0.38K 63 10 252K ip_dst_cache <br> 627 625 99% 8.00K 627 1 5016K size-8192 <br>
616 297 48% 0.13K 22 28 88K cfq_io_context <br> 480 249 51% 0.23K 30 16 120K cfq_queue <br> 370 330 89% 0.75K 74 5 296K UNIX <br>
368 31 8% 0.04K 4 92 16K khugepaged_mm_slot <br> 357 325 91% 0.53K 51 7 204K idr_layer_cache <br> 341 128 37% 0.69K 31 11 248K files_cache <br>
270 159 58% 0.12K 9 30 36K scsi_sense_cache <br> 246 244 99% 1.81K 123 2 492K TCPv6 <br> 231 131 56% 0.34K 21 11 84K blkdev_requests <br>
210 102 48% 1.38K 42 5 336K mm_struct <br> 210 116 55% 0.25K 14 15 56K sgpool-8 <br> 202 14 6% 0.02K 1 202 4K jbd2_revoke_table <br>
192 192 100% 32.12K 192 1 12288K kmem_cache <br> 180 121 67% 0.25K 12 15 48K scsi_cmd_cache <br> 170 113 66% 0.11K 5 34 20K inotify_inode_mark_entry<br>
144 121 84% 0.16K 6 24 24K sigqueue <br> 134 4 2% 0.05K 2 67 8K ext4_free_block_extents<br> 118 26 22% 0.06K 2 59 8K fib6_nodes <br>
112 2 1% 0.03K 1 112 4K ip_fib_alias <br> 112 1 0% 0.03K 1 112 4K dnotify_struct <br> 112 2 1% 0.03K 1 112 4K sd_ext_cdb <br>
<br></div></div></div></div></div></div></div>