  1. Jun 29, 2022
  2. Jun 28, 2022
  3. Jun 27, 2022
  4. Jun 22, 2022
  5. May 05, 2022
  6. Apr 14, 2022
    • device: check for metadata_dst with skb_valid_dst() · f9d9b4db
      Nikolay Aleksandrov authored

      When we try to transmit an skb with md_dst attached through wireguard
      we hit a null pointer dereference in wg_xmit() due to the use of
      dst_mtu() which calls into dst_blackhole_mtu() which in turn tries to
      dereference dst->dev.
      
      Since wireguard doesn't use md_dsts we should use skb_valid_dst(), which
      checks for DST_METADATA flag, and if it's set, then falls back to
      wireguard's device mtu. That gives us the best chance of transmitting
      the packet; otherwise if the blackhole netdev is used we'd get
      ETH_MIN_MTU.
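
      The check described above can be modeled in userspace as follows. This
      is a sketch, not the driver's code: struct model_dst, struct
      model_netdev and MODEL_DST_METADATA are hypothetical stand-ins for
      struct dst_entry, struct net_device and the kernel's DST_METADATA flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's DST_METADATA flag. */
#define MODEL_DST_METADATA 0x1u

struct model_netdev {
	unsigned int mtu;
};

struct model_dst {
	unsigned int flags;
	unsigned int mtu;                /* what dst_mtu() would report */
	const struct model_netdev *dev;  /* NULL for a metadata dst */
};

/* Same idea as skb_valid_dst(): a dst is usable only if it exists and is
 * not a metadata dst. */
static int model_valid_dst(const struct model_dst *dst)
{
	return dst != NULL && !(dst->flags & MODEL_DST_METADATA);
}

/* Trust the route's MTU only for a real dst; otherwise fall back to the
 * WireGuard device's own MTU. */
static unsigned int model_xmit_mtu(const struct model_dst *dst,
				   const struct model_netdev *wg_dev)
{
	return model_valid_dst(dst) ? dst->mtu : wg_dev->mtu;
}
```

      In the metadata case the model never reaches for the dst's device
      pointer, which is the dereference that faults in dst_blackhole_mtu().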
      
       [  263.693506] BUG: kernel NULL pointer dereference, address: 00000000000000e0
       [  263.693908] #PF: supervisor read access in kernel mode
       [  263.694174] #PF: error_code(0x0000) - not-present page
       [  263.694424] PGD 0 P4D 0
       [  263.694653] Oops: 0000 [#1] PREEMPT SMP NOPTI
       [  263.694876] CPU: 5 PID: 951 Comm: mausezahn Kdump: loaded Not tainted 5.18.0-rc1+ #522
       [  263.695190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
       [  263.695529] RIP: 0010:dst_blackhole_mtu+0x17/0x20
       [  263.695770] Code: 00 00 00 0f 1f 44 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 47 10 48 83 e0 fc 8b 40 04 85 c0 75 09 48 8b 07 <8b> 80 e0 00 00 00 c3 66 90 0f 1f 44 00 00 48 89 d7 be 01 00 00 00
       [  263.696339] RSP: 0018:ffffa4a4422fbb28 EFLAGS: 00010246
       [  263.696600] RAX: 0000000000000000 RBX: ffff8ac9c3553000 RCX: 0000000000000000
       [  263.696891] RDX: 0000000000000401 RSI: 00000000fffffe01 RDI: ffffc4a43fb48900
       [  263.697178] RBP: ffffa4a4422fbb90 R08: ffffffff9622635e R09: 0000000000000002
       [  263.697469] R10: ffffffff9b69a6c0 R11: ffffa4a4422fbd0c R12: ffff8ac9d18b1a00
       [  263.697766] R13: ffff8ac9d0ce1840 R14: ffff8ac9d18b1a00 R15: ffff8ac9c3553000
       [  263.698054] FS:  00007f3704c337c0(0000) GS:ffff8acaebf40000(0000) knlGS:0000000000000000
       [  263.698470] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [  263.698826] CR2: 00000000000000e0 CR3: 0000000117a5c000 CR4: 00000000000006e0
       [  263.699214] Call Trace:
       [  263.699505]  <TASK>
       [  263.699759]  wg_xmit+0x411/0x450
       [  263.700059]  ? bpf_skb_set_tunnel_key+0x46/0x2d0
       [   263.700382]  ? dev_queue_xmit_nit+0x31/0x2b0
       [  263.700719]  dev_hard_start_xmit+0xd9/0x220
       [  263.701047]  __dev_queue_xmit+0x8b9/0xd30
       [  263.701344]  __bpf_redirect+0x1a4/0x380
       [  263.701664]  __dev_queue_xmit+0x83b/0xd30
       [  263.701961]  ? packet_parse_headers+0xb4/0xf0
       [  263.702275]  packet_sendmsg+0x9a8/0x16a0
       [  263.702596]  ? _raw_spin_unlock_irqrestore+0x23/0x40
       [  263.702933]  sock_sendmsg+0x5e/0x60
       [  263.703239]  __sys_sendto+0xf0/0x160
       [  263.703549]  __x64_sys_sendto+0x20/0x30
       [  263.703853]  do_syscall_64+0x3b/0x90
       [  263.704162]  entry_SYSCALL_64_after_hwframe+0x44/0xae
       [  263.704494] RIP: 0033:0x7f3704d50506
       [  263.704789] Code: 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
       [  263.705652] RSP: 002b:00007ffe954b0b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
       [  263.706141] RAX: ffffffffffffffda RBX: 0000558bb259b490 RCX: 00007f3704d50506
       [  263.706544] RDX: 000000000000004a RSI: 0000558bb259b7b2 RDI: 0000000000000003
       [  263.706952] RBP: 0000000000000000 R08: 00007ffe954b0b90 R09: 0000000000000014
       [  263.707339] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe954b0b90
       [  263.707735] R13: 000000000000004a R14: 0000558bb259b7b2 R15: 0000000000000001
       [  263.708132]  </TASK>
       [  263.708398] Modules linked in: bridge netconsole bonding [last unloaded: bridge]
       [  263.708942] CR2: 00000000000000e0
      
      Link: https://github.com/cilium/cilium/issues/19428

      Reported-by: Martynas Pumputis <m@lambda.lt>
      Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      [Jason: polyfilled for < 4.3]
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  7. Apr 07, 2022
    • qemu: enable ACPI for SMP · f909532a
      Jason A. Donenfeld authored
      
      It turns out that by having CONFIG_ACPI=n, we've been failing to boot
      additional CPUs, and so these systems were functionally UP. The code
      bloat is unfortunate for build times, but I don't see an alternative. So
      this commit sets CONFIG_ACPI=y for x86_64 and i686 configs.
      
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • socket: ignore v6 endpoints when ipv6 is disabled · ec89ca64
      Jason A. Donenfeld authored
      
      The previous commit fixed a memory leak on the send path in the event
      that IPv6 is disabled at compile time, but how did a packet even arrive
      there to begin with? It turns out we have previously allowed IPv6
      endpoints even when IPv6 support is disabled at compile time. This is
      awkward and inconsistent. Instead, let's just ignore all things IPv6,
      the same way we do other malformed endpoints, in the case where IPv6 is
      disabled.
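
      A minimal sketch of such an endpoint filter, assuming a hypothetical
      MODEL_HAVE_IPV6 macro in place of IS_ENABLED(CONFIG_IPV6) and a
      hypothetical function name. It only illustrates rejecting AF_INET6
      endpoints exactly like malformed ones when IPv6 is compiled out.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_IPV6); 0 models a build
 * without IPv6 support. */
#ifndef MODEL_HAVE_IPV6
#define MODEL_HAVE_IPV6 0
#endif

/* Accept only well-formed endpoints of a family this build can send to.
 * With IPv6 compiled out, an AF_INET6 endpoint is treated like any other
 * malformed endpoint, so it never reaches the send path. */
static bool model_endpoint_ok(const struct sockaddr *addr, size_t len)
{
	if (len == sizeof(struct sockaddr_in) && addr->sa_family == AF_INET)
		return true;
	if (MODEL_HAVE_IPV6 && len == sizeof(struct sockaddr_in6) &&
	    addr->sa_family == AF_INET6)
		return true;
	return false;
}
```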
      
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  8. Apr 06, 2022
    • socket: free skb in send6 when ipv6 is disabled · fa32671b
      Wang Hai authored
      
      I got a memory leak report:
      
      unreferenced object 0xffff8881191fc040 (size 232):
        comm "kworker/u17:0", pid 23193, jiffies 4295238848 (age 3464.870s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff814c3ef4>] slab_post_alloc_hook+0x84/0x3b0
          [<ffffffff814c8977>] kmem_cache_alloc_node+0x167/0x340
          [<ffffffff832974fb>] __alloc_skb+0x1db/0x200
          [<ffffffff82612b5d>] wg_socket_send_buffer_to_peer+0x3d/0xc0
          [<ffffffff8260e94a>] wg_packet_send_handshake_initiation+0xfa/0x110
          [<ffffffff8260ec81>] wg_packet_handshake_send_worker+0x21/0x30
          [<ffffffff8119c558>] process_one_work+0x2e8/0x770
          [<ffffffff8119ca2a>] worker_thread+0x4a/0x4b0
          [<ffffffff811a88e0>] kthread+0x120/0x160
          [<ffffffff8100242f>] ret_from_fork+0x1f/0x30
      
      In wg_socket_send_buffer_as_reply_to_skb() and
      wg_socket_send_buffer_to_peer(), send6() is expected to free the skb.
      But when CONFIG_IPV6 is disabled, the kfree_skb() call is missing. This
      patch adds it to fix the bug.
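
      The ownership rule at stake can be shown with a small userspace model.
      struct model_skb, the live-buffer counter, the have_ipv6 parameter and
      the returned error code are hypothetical choices for illustration; the
      point is that a function which consumes its buffer must free it on
      every path, including the stub compiled when IPv6 is unavailable.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an sk_buff; live_buffers counts outstanding
 * allocations so a leak is easy to spot. */
struct model_skb {
	unsigned char data[64];
};

static int live_buffers;

static struct model_skb *model_alloc_skb(void)
{
	struct model_skb *skb = malloc(sizeof(*skb));

	if (skb)
		live_buffers++;
	return skb;
}

static void model_kfree_skb(struct model_skb *skb)
{
	if (!skb)
		return;
	free(skb);
	live_buffers--;
}

/* Contract: this function consumes skb on every path. */
static int model_send6(struct model_skb *skb, int have_ipv6)
{
	if (!have_ipv6) {
		model_kfree_skb(skb);  /* the free the fix adds to the stub */
		return -EAFNOSUPPORT;
	}
	/* A real sender would hand skb to the stack, which then owns it;
	 * the model simply releases it. */
	model_kfree_skb(skb);
	return 0;
}
```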
      
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  9. Mar 03, 2022
  10. Dec 14, 2021
  11. Dec 13, 2021
  12. Dec 08, 2021
  13. Dec 07, 2021
    • crypto: curve25519-x86_64: solve register constraints with reserved registers · 3c9f3b69
      Mathias Krause authored
      
      The register constraints for the inline assembly in fsqr() and fsqr2()
      are pretty tight on what the compiler may assign to the remaining three
      register variables. The clobber list only allows the following to be
      used: RDI, RSI, RBP and R12. With RAP reserving R12 and a kernel having
      CONFIG_FRAME_POINTER=y, claiming RBP, there are only two registers left
      so the compiler rightfully complains about impossible constraints.
      
      Provide alternatives that'll allow a memory reference for 'out' to solve
      the allocation constraint dilemma for this configuration.
      
      Also make 'out' an input-only operand as it is only used as such. This
      not only allows gcc to optimize its usage further, but also works around
      older gcc versions, apparently failing to handle multiple alternatives
      correctly, as in failing to initialize the 'out' operand with its input
      value.
      
      Signed-off-by: Mathias Krause <minipli@grsecurity.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • compat: udp_tunnel: don't take reference to non-init namespace · 8e40dd62
      Jason A. Donenfeld authored
      
      The comment to sk_change_net is instructive:
      
        Kernel sockets, f.e. rtnl or icmp_socket, are a part of a namespace.
        They should not hold a reference to a namespace in order to allow
        to stop it.
        Sockets after sk_change_net should be released using sk_release_kernel
      
      We weren't following these rules before, and were instead using
      __sock_create, which means we kept a reference to the namespace, which
      in turn meant that interfaces were not cleaned up on namespace
      exit.
      
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  14. Dec 04, 2021
    • compat: siphash: use _unaligned version by default · ea6b8e7b
      Arnd Bergmann authored

      On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
      because the ordinary load/store instructions (ldr, ldrh, ldrb) can
      tolerate any misalignment of the memory address. However, load/store
      double and load/store multiple instructions (ldrd, ldm) may still only
      be used on memory addresses that are 32-bit aligned, and so we have to
      use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we
      may end up with a severe performance hit due to alignment traps that
      require fixups by the kernel. Testing shows that this currently happens
      with clang-13 but not gcc-11. In theory, any compiler version can
      produce this bug or other problems, as we are dealing with undefined
      behavior in C99 even on architectures that support this in hardware,
      see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363.
      
      Fortunately, the get_unaligned() accessors do the right thing: when
      building for ARMv6 or later, the compiler will emit unaligned accesses
      using the ordinary load/store instructions (but avoid the ones that
      require 32-bit alignment). When building for older ARM, those accessors
      will emit the appropriate sequence of ldrb/mov/orr instructions. And on
      architectures that can truly tolerate any kind of misalignment, the
      get_unaligned() accessors resolve to the leXX_to_cpup accessors that
      operate on aligned addresses.
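
      In userspace C, the closest analogue of these get_unaligned() style
      accessors is a memcpy() into a local variable followed by a byte-order
      fix-up. The sketch below assumes GCC or Clang for the __BYTE_ORDER__
      macro and __builtin_bswap64(), uses a hypothetical function name, and
      is not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load a little-endian 64-bit value from a possibly misaligned address.
 * memcpy() promises the compiler nothing about alignment, so it cannot pick
 * instructions that require an aligned address, and no misaligned pointer
 * is ever dereferenced (which would be undefined behavior in C). */
static uint64_t model_get_unaligned_le64(const void *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	v = __builtin_bswap64(v);  /* convert from little-endian on BE hosts */
#endif
	return v;
}
```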
      
      Since the compiler will in fact emit ldrd or ldm instructions when
      building this code for ARM v6 or later, the solution is to use the
      unaligned accessors unconditionally on architectures where this is
      known to be fast. The _aligned version of the hash function is
      however still needed to get the best performance on architectures
      that cannot do any unaligned access in hardware.
      
      This new version avoids the undefined behavior and should produce
      the fastest hash on all architectures we support.
      
      Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • ratelimiter: use kvcalloc() instead of kvzalloc() · 5325bc82
      Gustavo A. R. Silva authored
      
      Use the 2-factor argument form, kvcalloc(), instead of kvzalloc().
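
      The benefit of the two-factor form is that the allocator, not the
      caller, multiplies the element count by the element size and can refuse
      a product that overflows. A userspace sketch of that check, assuming
      GCC or Clang for __builtin_mul_overflow() and using a hypothetical
      helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Zeroed array allocation in the spirit of kvcalloc(n, size, flags): the
 * multiplication happens here, where it can be overflow-checked, rather
 * than in the caller as with a single precomputed size. */
static void *model_zalloc_array(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return NULL;  /* n * size would wrap around: refuse */
	return calloc(1, bytes);
}
```

      In userspace, calloc(n, size) already performs an equivalent check in
      common C libraries; the explicit version just makes it visible.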
      
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • receive: drop handshakes if queue lock is contended · e44c78cb
      Jason A. Donenfeld authored
      
      If we're being delivered packets from multiple CPUs so quickly that the
      ring lock is contended when they try to take it, then it's safe to
      assume that the queue is near capacity anyway, so just drop the packet
      rather than spinning. This helps deal with multicore DoS that can interfere with
      data path performance. It _still_ does not completely fix the issue, but
      it again chips away at it.
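
      The drop-on-contention policy can be sketched with a C11 atomic_flag
      used as a tiny spinlock. This is a hypothetical model, not the driver's
      code: the lock stands in for the ring's lock, and the counters and
      function names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal test-and-set spinlock standing in for the ring's lock. */
struct model_lock {
	atomic_flag held;
};

static bool model_trylock(struct model_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void model_unlock(struct model_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static unsigned long enqueued, dropped;

/* Rather than spinning while another CPU holds the lock, give up and drop
 * the handshake: contention already implies the queue is close to full. */
static void model_queue_handshake(struct model_lock *l)
{
	if (!model_trylock(l)) {
		dropped++;
		return;
	}
	enqueued++;  /* a real producer would push onto the ring here */
	model_unlock(l);
}
```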
      
      Reported-by: Streun Fabio <fstreun@student.ethz.ch>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • receive: use ring buffer for incoming handshakes · 5707d38f
      Jason A. Donenfeld authored
      
      Apparently the spinlock on incoming_handshake's skb_queue is highly
      contended, and a torrent of handshake or cookie packets can bring the
      data plane to its knees, simply by virtue of enqueueing the handshake
      packets to be processed asynchronously. So, we try switching this to a
      ring buffer to hopefully have less lock contention. This alleviates the
      problem somewhat, though it still isn't perfect, so future patches will
      have to improve this further. However, it at least doesn't completely
      diminish the data plane.
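
      A fixed-capacity ring can be sketched as below. The kernel provides
      ptr_ring, which keeps separate producer and consumer locks; this sketch
      is not that API, leaves locking out entirely, and only shows the
      bounded FIFO behavior, including rejecting packets once the ring is
      full. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RING_SIZE 4

/* Bounded FIFO of packet pointers: capacity is fixed up front, so a flood
 * of handshakes cannot grow the queue without limit. */
struct model_ring {
	void *slots[MODEL_RING_SIZE];
	size_t head;   /* index of the next slot to consume */
	size_t count;  /* number of occupied slots */
};

static bool model_ring_produce(struct model_ring *r, void *pkt)
{
	if (r->count == MODEL_RING_SIZE)
		return false;  /* full: the caller drops the packet */
	r->slots[(r->head + r->count) % MODEL_RING_SIZE] = pkt;
	r->count++;
	return true;
}

static void *model_ring_consume(struct model_ring *r)
{
	void *pkt;

	if (r->count == 0)
		return NULL;
	pkt = r->slots[r->head];
	r->head = (r->head + 1) % MODEL_RING_SIZE;
	r->count--;
	return pkt;
}
```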
      
      Reported-by: Streun Fabio <fstreun@student.ethz.ch>
      Reported-by: Joel Wanner <joel.wanner@inf.ethz.ch>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • device: reset peer src endpoint when netns exits · 68abb1b9
      Jason A. Donenfeld authored
      
      Each peer's endpoint contains a dst_cache entry that takes a reference
      to another netdev. When the containing namespace exits, we take down the
      socket and prevent future sockets from being created (by setting
      creating_net to NULL), which removes that potential reference on the
      netns. However, it doesn't release references to the netns that a netdev
      cached in dst_cache might be taking, so the netns still might fail to
      exit. Since the socket is gimped anyway, we can simply clear all the
      dst_caches (by way of clearing the endpoint src), which will release all
      references.
      
      However, the current dst_cache_reset function only releases those
      references lazily. But it turns out that all of our usages of
      wg_socket_clear_peer_endpoint_src are called from contexts that are not
      exactly high-speed or bottlenecked: for example, when there's
      connection difficulty, when userspace is reconfiguring the interface,
      and, in particular for this patch, when the netns is exiting. So for
      those cases, it makes more sense to call dst_release immediately. For
      that, we add a small helper function to dst_cache.
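
      The difference between a lazy and an immediate reset can be modeled
      with a plain reference count. The names are hypothetical; in the
      kernel the cache holds a dst_entry reference, and the eager path
      corresponds to the small helper described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a cached dst that pins a
 * netdev (and, through it, a network namespace). */
struct model_ref {
	int refcnt;
};

struct model_cache {
	struct model_ref *cached;
	int stale;  /* set by a lazy reset; the reference is still held */
};

static void model_ref_put(struct model_ref *r)
{
	if (r)
		r->refcnt--;
}

/* Eager reset: drop the reference right away so teardown can proceed. */
static void model_cache_reset_now(struct model_cache *c)
{
	model_ref_put(c->cached);
	c->cached = NULL;
	c->stale = 0;
}

/* Lazy reset: only mark the entry stale; the namespace stays pinned until
 * the next lookup. */
static void model_cache_reset_lazy(struct model_cache *c)
{
	c->stale = 1;
}

/* Lookup path: this is where a lazy reset finally releases the old ref. */
static struct model_ref *model_cache_get(struct model_cache *c)
{
	if (c->stale) {
		model_cache_reset_now(c);
		return NULL;
	}
	return c->cached;
}
```

      If no lookup ever happens, as on a socket that is being torn down, the
      lazy path never drops its reference, which is why the eager variant is
      the one that lets the namespace exit.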
      
      This patch also adds a test to netns.sh from Hangbin Liu to ensure this
      doesn't regress.
      
      Tested-by: Hangbin Liu <liuhangbin@gmail.com>
      Reported-by: Xiumei Mu <xmu@redhat.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • main: rename 'mod_init' & 'mod_exit' functions to be module-specific · ea3f5fbe
      Randy Dunlap authored
      
      Rename module_init & module_exit functions that are named
      "mod_init" and "mod_exit" so that they are unique in both the
      System.map file and in initcall_debug output instead of showing
      up as almost anonymous "mod_init".
      
      This is helpful for debugging and in determining how long certain
      module_init calls take to execute.
      
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • netns: actually test for routing loops · cb001d45
      Jason A. Donenfeld authored
      
      We previously removed the restriction on looping to self, and then added
      a test to make sure the kernel didn't blow up during a routing loop. The
      kernel didn't blow up, thankfully, but on certain architectures where
      skb fragmentation is easier, such as ppc64, the skbs weren't actually
      being discarded after a few rounds through. But the test wasn't catching
      this. So actually test explicitly for massive increases in tx to see if
      we have a routing loop. Note that the actual loop problem will need to
      be addressed in a different commit.
      
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • compat: update for RHEL 8.5 · 2715e641
      Peter Georg authored
      
      RHEL 8.5 has been released. Replace all ISCENTOS8S checks with ISRHEL8.
      Increase RHEL_MINOR for CentOS 8 Stream detection to 6.
      
      Signed-off-by: Peter Georg <peter.georg@physik.uni-regensburg.de>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  15. Aug 09, 2021
    • compat: account for grsecurity backports and changes · 29747255
      Mathias Krause authored
      
      grsecurity kernels tend to carry additional backports and changes, like
      commit b60b87fc2996 ("netlink: add ethernet address policy types") or
      the SYM_FUNC_* changes. RAP nowadays hooks the latter, therefore no
      diversion to RAP_ENTRY is needed any more.
      
      Instead of relying on the kernel version test, also test for the macros
      we're about to define to not already be defined to account for these
      additional changes in the grsecurity patch without breaking
      compatibility to the older public ones.
      
      Also test for CONFIG_PAX instead of RAP_PLUGIN for the timer API related
      changes as these don't depend on the RAP plugin to be enabled but just a
      PaX/grsecurity patch to be applied. While there is no preprocessor knob
      for the latter, use CONFIG_PAX as this will likely be enabled in every
      kernel that uses the patch.
      
      Signed-off-by: Mathias Krause <minipli@grsecurity.net>
      [zx2c4: small changes to include a header nearby a macro def test]
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  16. Jun 15, 2021
  17. Jun 06, 2021
  18. Jun 04, 2021
    • allowedips: add missing __rcu annotation to satisfy sparse · fd7a4621
      Jason A. Donenfeld authored
      
      A __rcu annotation got lost during refactoring, which caused sparse to
      become enraged.
      
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • allowedips: free empty intermediate nodes when removing single node · 383461db
      Jason A. Donenfeld authored
      
      When removing single nodes, it's possible that that node's parent is an
      empty intermediate node, in which case, it too should be removed.
      Otherwise the trie fills up and is never fully emptied, leading to
      gradual memory leaks over time for tries that are modified often. There
      was originally code to do this, but it was removed during refactoring in
      2016 and never reworked. Now that we have proper parent pointers from
      the previous commits, we can implement this properly.
      
      In order to reduce branching and expensive comparisons, we want to keep
      the double pointer for parent assignment (which lets us easily chain up
      to the root), but we still need to actually get the parent's base
      address. So encode the bit number into the last two bits of the pointer,
      and pack and unpack it as needed. This is a little bit clumsy but is the
      fastest and least memory-wasteful of the compromises. Note that we align
      the root struct here to a minimum of 4, because it's embedded into a
      larger struct, and we're relying on having the bottom two bits for our
      flag, which would only be 16-bit aligned on m68k.
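
      The bit packing can be sketched in userspace C. The struct and function
      names are hypothetical, and the layout is simplified to two child slots
      plus the packed parent pointer; the kernel's allowedips node carries
      more fields and uses RCU accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trie node: two child slots, plus the address of the slot in
 * the parent that points at this node, with the child index (0 or 1)
 * packed into the low bits of that address. */
struct model_node {
	struct model_node *bit[2];
	uintptr_t parent_bit_packed;
};

/* Slot addresses are pointer-aligned (at least 4 bytes here), so the two
 * low bits are free to hold a small tag. */
static void model_connect(struct model_node **slot, unsigned int bit,
			  struct model_node *child)
{
	assert(((uintptr_t)slot & 3u) == 0 && bit < 2);
	child->parent_bit_packed = (uintptr_t)slot | bit;
	*slot = child;
}

static struct model_node **model_parent_slot(const struct model_node *n)
{
	return (struct model_node **)(n->parent_bit_packed & ~(uintptr_t)3);
}

/* Recover the parent's base address: step back from bit[i] to bit[0],
 * then back to the start of the struct. */
static struct model_node *model_parent(const struct model_node *n)
{
	unsigned int bit = (unsigned int)(n->parent_bit_packed & 1u);
	char *slot = (char *)model_parent_slot(n);

	return (struct model_node *)(slot - bit * sizeof(struct model_node *) -
				     offsetof(struct model_node, bit));
}
```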
      
      The existing macro-based helpers were a bit unwieldy for adding the bit
      packing to, so this commit replaces them with safer and clearer ordinary
      functions.
      
      We add a test to the randomized/fuzzer part of the selftests, to free
      the randomized tries by-peer, refuzz it, and repeat, until it's supposed
      to be empty, and then see if that actually resulted in the whole
      thing being emptied. That combined with kmemcheck should hopefully make
      sure this commit is doing what it should. Along the way this resulted in
      various other cleanups of the tests and fixes for recent graphviz.
      
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • allowedips: allocate nodes in kmem_cache · 03add828
      Jason A. Donenfeld authored
      
      The previous commit moved from O(n) to O(1) for removal, but in the
      process introduced an additional pointer member to a struct that
      increased the size from 60 to 68 bytes, putting nodes in the 128-byte
      slab. With deployed systems having as many as 2 million nodes, this
      represents a significant doubling in memory usage (128 MiB -> 256 MiB).
      Fix this by using our own kmem_cache, that's sized exactly right. This
      also makes wireguard's memory usage more transparent in tools like
      slabtop and /proc/slabinfo.
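
      The arithmetic above can be checked with a short sketch. It follows
      the power-of-two rounding model described in these commits, ignores
      slab metadata and alignment padding, and uses hypothetical helper
      names. Under that model, 2 million nodes need 128,000,000 bytes at 64
      bytes each and 256,000,000 bytes at 128 bytes each, versus 136,000,000
      bytes if each 68-byte node is stored without rounding.

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the next power of two. */
static size_t model_pow2_roundup(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Total bytes for `count` objects of `obj` bytes under power-of-two
 * rounding. */
static unsigned long long model_total_pow2(unsigned long long count, size_t obj)
{
	return count * model_pow2_roundup(obj);
}

/* Total bytes when a dedicated cache stores objects at their exact size. */
static unsigned long long model_total_exact(unsigned long long count, size_t obj)
{
	return count * obj;
}
```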
      
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • allowedips: remove nodes in O(1) · b56d48ce
      Jason A. Donenfeld authored
      
      Previously, deleting peers would require traversing the entire trie in
      order to rebalance nodes and safely free them. This meant that removing
      1000 peers from a trie with a half million nodes would take an extremely
      long time, during which we're holding the rtnl lock. Large-scale users
      were reporting 200ms latencies added to the networking stack as a whole
      every time their userspace software would queue up significant removals.
      That's a serious situation.
      
      This commit fixes that by maintaining a double pointer to the parent's
      bit pointer for each node, and then using the already existing node list
      belonging to each peer to go directly to the node, fix up its pointers,
      and free it with RCU. This means removal is O(1) instead of O(n), and we
      don't use gobs of stack.
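
      The removal step can be sketched as follows. The struct, field and
      function names are hypothetical, the parent pointer is kept unpacked
      for clarity, and free() stands in for the RCU-deferred free the kernel
      uses. The key point is that no traversal from the root is needed: each
      node already knows which slot points at it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node: two children plus a pointer to whichever slot (a
 * parent's bit[i], or the root pointer) currently points at this node. */
struct model_node {
	struct model_node *bit[2];
	struct model_node **parent_slot;
	int has_peer;  /* stand-in for the node's peer pointer */
};

static void model_link(struct model_node **slot, struct model_node *n)
{
	*slot = n;
	n->parent_slot = slot;
}

/* O(1) removal. A node with two children must stay as an (now empty)
 * intermediate node; otherwise its single child, or NULL, is spliced into
 * the parent's slot and the node is freed. Returns 1 if the node was freed. */
static int model_remove(struct model_node *n)
{
	struct model_node *child;

	n->has_peer = 0;
	if (n->bit[0] && n->bit[1])
		return 0;
	child = n->bit[0] ? n->bit[0] : n->bit[1];
	if (child)
		child->parent_slot = n->parent_slot;
	*n->parent_slot = child;
	free(n);
	return 1;
}
```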
      
      The removal algorithm has the same downside as the code that it fixes:
      it won't collapse needlessly long runs of fillers.  We can enhance that
      in the future if it ever becomes a problem. This commit documents that
      limitation with a TODO comment in code, a small but meaningful
      improvement over the prior situation.
      
      Currently the biggest flaw, which the next commit addresses, is that
      this increases the node size on 64-bit machines from 60 bytes to
      68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up
      using twice as much memory per node, because of power-of-two
      allocations, which is a big bummer. We'll need to figure something out
      there.
      
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • allowedips: initialize list head in selftest · 3c14c4bf
      Jason A. Donenfeld authored
      
      The randomized trie tests weren't initializing the dummy peer list head,
      resulting in a NULL pointer dereference when used. Fix this by
      initializing it in the randomized trie test, just like we do for the
      static unit test.
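
      Why an uninitialized list head crashes can be shown with a minimal
      circular list in the style of the kernel's struct list_head. The names
      are hypothetical; the kernel's INIT_LIST_HEAD() performs the same
      self-pointing initialization as model_init_list_head() below.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of struct list_head. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

/* An empty list points at itself in both directions. */
static void model_init_list_head(struct model_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert entry right after head. This dereferences head->next, so if head
 * was zero-filled and never initialized, head->next is NULL and the write
 * below faults. */
static void model_list_add(struct model_list_head *entry,
			   struct model_list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static int model_list_empty(const struct model_list_head *h)
{
	return h->next == h;
}
```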
      
      While we're at it, all of the other strings like this have the word
      "self-test", so add it to the missing place here.
      
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • peer: allocate in kmem_cache · 4d8b7edc
      Jason A. Donenfeld authored
      
      With deployments having upwards of 600k peers now, this somewhat heavy
      structure could benefit from more fine-grained allocations.
      Specifically, instead of using a 2048-byte slab for a 1544-byte object,
      we can now use 1544-byte objects directly, thus saving almost 25%
      per-peer, or with 600k peers, that's a savings of 303 MiB. This also
      makes wireguard's memory usage more transparent in tools like slabtop
      and /proc/slabinfo.
      
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  19. Jun 03, 2021
    • global: use synchronize_net rather than synchronize_rcu · 6fbc0e62
      Jason A. Donenfeld authored
      
      Many of the synchronization points are sometimes called under the rtnl
      lock, which means we should use synchronize_net rather than
      synchronize_rcu. Under the hood, this expands to using the expedited
      flavor of function in the event that rtnl is held, in order to not stall
      other concurrent changes.
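
      The dispatch described above can be modeled in userspace. In the
      kernel, synchronize_net() checks rtnl_is_locked() and calls
      synchronize_rcu_expedited() when the lock is held, synchronize_rcu()
      otherwise; the flag, counters and function names below are
      hypothetical stand-ins, and no real RCU grace period takes place.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins: a flag for "rtnl is held" and counters for each wait flavor. */
static bool model_rtnl_held;
static unsigned int normal_waits, expedited_waits;

static void model_synchronize_rcu(void)
{
	normal_waits++;
}

static void model_synchronize_rcu_expedited(void)
{
	expedited_waits++;
}

/* Pick the expedited wait while rtnl is held, so other rtnl users are not
 * stalled behind a full, slow grace period. */
static void model_synchronize_net(void)
{
	if (model_rtnl_held)
		model_synchronize_rcu_expedited();
	else
		model_synchronize_rcu();
}
```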
      
      This fixes some very, very long delays when removing multiple peers at
      once, which would cause some operations to take several minutes.
      
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>