- Jun 29, 2022
-
-
Authored by Jason A. Donenfeld
Nobody uses this and it's impossible to maintain given the current CI situation. RHEL 7 and 8 releases remain for now, though that might not always be the case. See the link for details.
Link: https://lists.zx2c4.com/pipermail/wireguard/2022-June/007664.html
Suggested-by: Philip J. Perry <phil@elrepo.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Jun 28, 2022
-
-
Authored by Jason A. Donenfeld
Also bump the c8s version stamp.
Reported-by: Vladimír Beneš <vbenes@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Jun 27, 2022
-
-
Authored by Jason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Jun 22, 2022
-
-
Authored by Jason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- May 05, 2022
-
-
Authored by Jason A. Donenfeld
They keep breaking their kernel and being difficult when I send patches to fix it, so just give up on trying to support this in the CI. It'll bitrot and people will complain and we'll see what happens at that point.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
Rather than setting this once init is running, set panic_on_warn from the kernel command line, so that it catches splats from WireGuard initialization code and the various crypto selftests.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
Rather than having to hack up QEMU, just use the virtio serial device.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
The parallel tests were added to catch queueing issues from multiple cores. But what happens in reality when testing tons of processes is that these separate threads wind up fighting with the scheduler, and we wind up with contention in places we don't care about, which decreases the chances of hitting a bug. So just do a test with the number of CPU cores, rather than trying to scale up arbitrarily.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
I hate to do this, but I still do not have a good solution to actually fix this bug across architectures. So just disable it for now, so that the CI can still deliver actionable results. This commit adds a large red warning, so that at least the failure isn't lost forever, and hopefully this can be revisited down the line.
Link: https://lore.kernel.org/netdev/CAHmME9pv1x6C4TNdL6648HydD8r+txpV4hTUXOBVkrapBXH4QQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/YmszSXueTxYOC41G@zx2c4.com/
Link: https://lore.kernel.org/wireguard/CAHmME9rNnBiNvBstb7MPwK-7AmAN0sOfnhdR=eeLrowWcKxaaQ@mail.gmail.com/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Apr 14, 2022
-
-
Authored by Nikolay Aleksandrov
When we try to transmit an skb with md_dst attached through wireguard we hit a null pointer dereference in wg_xmit() due to the use of dst_mtu() which calls into dst_blackhole_mtu() which in turn tries to dereference dst->dev.

Since wireguard doesn't use md_dsts we should use skb_valid_dst(), which checks for DST_METADATA flag, and if it's set, then falls back to wireguard's device mtu. That gives us the best chance of transmitting the packet; otherwise if the blackhole netdev is used we'd get ETH_MIN_MTU.

[ 263.693506] BUG: kernel NULL pointer dereference, address: 00000000000000e0
[ 263.693908] #PF: supervisor read access in kernel mode
[ 263.694174] #PF: error_code(0x0000) - not-present page
[ 263.694424] PGD 0 P4D 0
[ 263.694653] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 263.694876] CPU: 5 PID: 951 Comm: mausezahn Kdump: loaded Not tainted 5.18.0-rc1+ #522
[ 263.695190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
[ 263.695529] RIP: 0010:dst_blackhole_mtu+0x17/0x20
[ 263.695770] Code: 00 00 00 0f 1f 44 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 47 10 48 83 e0 fc 8b 40 04 85 c0 75 09 48 8b 07 <8b> 80 e0 00 00 00 c3 66 90 0f 1f 44 00 00 48 89 d7 be 01 00 00 00
[ 263.696339] RSP: 0018:ffffa4a4422fbb28 EFLAGS: 00010246
[ 263.696600] RAX: 0000000000000000 RBX: ffff8ac9c3553000 RCX: 0000000000000000
[ 263.696891] RDX: 0000000000000401 RSI: 00000000fffffe01 RDI: ffffc4a43fb48900
[ 263.697178] RBP: ffffa4a4422fbb90 R08: ffffffff9622635e R09: 0000000000000002
[ 263.697469] R10: ffffffff9b69a6c0 R11: ffffa4a4422fbd0c R12: ffff8ac9d18b1a00
[ 263.697766] R13: ffff8ac9d0ce1840 R14: ffff8ac9d18b1a00 R15: ffff8ac9c3553000
[ 263.698054] FS: 00007f3704c337c0(0000) GS:ffff8acaebf40000(0000) knlGS:0000000000000000
[ 263.698470] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 263.698826] CR2: 00000000000000e0 CR3: 0000000117a5c000 CR4: 00000000000006e0
[ 263.699214] Call Trace:
[ 263.699505] <TASK>
[ 263.699759] wg_xmit+0x411/0x450
[ 263.700059] ? bpf_skb_set_tunnel_key+0x46/0x2d0
[ 263.700382] ? dev_queue_xmit_nit+0x31/0x2b0
[ 263.700719] dev_hard_start_xmit+0xd9/0x220
[ 263.701047] __dev_queue_xmit+0x8b9/0xd30
[ 263.701344] __bpf_redirect+0x1a4/0x380
[ 263.701664] __dev_queue_xmit+0x83b/0xd30
[ 263.701961] ? packet_parse_headers+0xb4/0xf0
[ 263.702275] packet_sendmsg+0x9a8/0x16a0
[ 263.702596] ? _raw_spin_unlock_irqrestore+0x23/0x40
[ 263.702933] sock_sendmsg+0x5e/0x60
[ 263.703239] __sys_sendto+0xf0/0x160
[ 263.703549] __x64_sys_sendto+0x20/0x30
[ 263.703853] do_syscall_64+0x3b/0x90
[ 263.704162] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 263.704494] RIP: 0033:0x7f3704d50506
[ 263.704789] Code: 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
[ 263.705652] RSP: 002b:00007ffe954b0b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 263.706141] RAX: ffffffffffffffda RBX: 0000558bb259b490 RCX: 00007f3704d50506
[ 263.706544] RDX: 000000000000004a RSI: 0000558bb259b7b2 RDI: 0000000000000003
[ 263.706952] RBP: 0000000000000000 R08: 00007ffe954b0b90 R09: 0000000000000014
[ 263.707339] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe954b0b90
[ 263.707735] R13: 000000000000004a R14: 0000558bb259b7b2 R15: 0000000000000001
[ 263.708132] </TASK>
[ 263.708398] Modules linked in: bridge netconsole bonding [last unloaded: bridge]
[ 263.708942] CR2: 00000000000000e0

Link: https://github.com/cilium/cilium/issues/19428
Reported-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
[Jason: polyfilled for < 4.3]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Apr 07, 2022
-
-
Authored by Jason A. Donenfeld
It turns out that by having CONFIG_ACPI=n, we've been failing to boot additional CPUs, and so these systems were functionally UP. The code bloat is unfortunate for build times, but I don't see an alternative. So this commit sets CONFIG_ACPI=y for x86_64 and i686 configs.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
The previous commit fixed a memory leak on the send path in the event that IPv6 is disabled at compile time, but how did a packet even arrive there to begin with? It turns out we have previously allowed IPv6 endpoints even when IPv6 support is disabled at compile time. This is awkward and inconsistent. Instead, let's just ignore all things IPv6, the same way we do other malformed endpoints, in the case where IPv6 is disabled.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Apr 06, 2022
-
-
Authored by Wang Hai
I got a memory leak report:

unreferenced object 0xffff8881191fc040 (size 232):
  comm "kworker/u17:0", pid 23193, jiffies 4295238848 (age 3464.870s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
  backtrace:
    [<ffffffff814c3ef4>] slab_post_alloc_hook+0x84/0x3b0
    [<ffffffff814c8977>] kmem_cache_alloc_node+0x167/0x340
    [<ffffffff832974fb>] __alloc_skb+0x1db/0x200
    [<ffffffff82612b5d>] wg_socket_send_buffer_to_peer+0x3d/0xc0
    [<ffffffff8260e94a>] wg_packet_send_handshake_initiation+0xfa/0x110
    [<ffffffff8260ec81>] wg_packet_handshake_send_worker+0x21/0x30
    [<ffffffff8119c558>] process_one_work+0x2e8/0x770
    [<ffffffff8119ca2a>] worker_thread+0x4a/0x4b0
    [<ffffffff811a88e0>] kthread+0x120/0x160
    [<ffffffff8100242f>] ret_from_fork+0x1f/0x30

In function wg_socket_send_buffer_as_reply_to_skb() or wg_socket_send_buffer_to_peer(), the semantics of send6() is required to free skb. But when CONFIG_IPV6 is disabled, kfree_skb() is missing. This patch adds it to fix this bug.
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Mar 03, 2022
-
-
Authored by Jason A. Donenfeld
We don't actually need to write anything in the pool. Instead, we just force the total over 128, and we should be good to go for all old kernels. We also only need this on getrandom() kernels, which simplifies things too.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
We make too nuanced use of ptr_ring to entirely move to the skb_array wrappers, but we at least should avoid the naughty function pointer cast when cleaning up skbs. Otherwise RAP/CFI will honk at us. This patch uses the __skb_array_destroy_skb wrapper for the cleanup, rather than directly providing kfree_skb, which is what other drivers in the same situation do too.
Reported-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Dec 14, 2021
-
-
Authored by Jason A. Donenfeld
Rather than passing all variables as modified, pass ones that are only read into that parameter. This helps with old gcc versions when alternatives are additionally used, and lets gcc's codegen be a little bit more efficient. This also syncs up with the latest Vale/EverCrypt output. This also forward ports 3c9f3b69 ("crypto: curve25519-x86_64: solve register constraints with reserved registers").
Cc: Aymeric Fromherz <aymeric.fromherz@inria.fr>
Cc: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/wireguard/1554725710.1290070.1639240504281.JavaMail.zimbra@inria.fr/
Link: https://github.com/project-everest/hacl-star/pull/501
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Dec 13, 2021
-
-
Authored by Jason A. Donenfeld
It's been over a year since we announced sunsetting this.
Link: https://lore.kernel.org/wireguard/CAHmME9rckipsdZYW+LA=x6wCMybdFFA+VqoogFXnR=kHYiCteg@mail.gmail.com/T
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Dec 08, 2021
-
-
Authored by Jason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Dec 07, 2021
-
-
Authored by Mathias Krause
The register constraints for the inline assembly in fsqr() and fsqr2() are pretty tight on what the compiler may assign to the remaining three register variables. The clobber list only allows the following to be used: RDI, RSI, RBP and R12. With RAP reserving R12 and a kernel having CONFIG_FRAME_POINTER=y, claiming RBP, there are only two registers left so the compiler rightfully complains about impossible constraints.

Provide alternatives that'll allow a memory reference for 'out' to solve the allocation constraint dilemma for this configuration.

Also make 'out' an input-only operand as it is only used as such. This not only allows gcc to optimize its usage further, but also works around older gcc versions, apparently failing to handle multiple alternatives correctly, as in failing to initialize the 'out' operand with its input value.
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
The comment to sk_change_net is instructive:

  Kernel sockets, f.e. rtnl or icmp_socket, are a part of a namespace. They should not hold a reference to a namespace in order to allow to stop it. Sockets after sk_change_net should be released using sk_release_kernel

We weren't following these rules before, and were instead using __sock_create, which means we kept a reference to the namespace, which in turn meant that interfaces were not cleaned up on namespace exit.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Dec 04, 2021
-
-
Authored by Arnd Bergmann
On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS because the ordinary load/store instructions (ldr, ldrh, ldrb) can tolerate any misalignment of the memory address. However, load/store double and load/store multiple instructions (ldrd, ldm) may still only be used on memory addresses that are 32-bit aligned, and so we have to use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we may end up with a severe performance hit due to alignment traps that require fixups by the kernel. Testing shows that this currently happens with clang-13 but not gcc-11. In theory, any compiler version can produce this bug or other problems, as we are dealing with undefined behavior in C99 even on architectures that support this in hardware, see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363 .

Fortunately, the get_unaligned() accessors do the right thing: when building for ARMv6 or later, the compiler will emit unaligned accesses using the ordinary load/store instructions (but avoid the ones that require 32-bit alignment). When building for older ARM, those accessors will emit the appropriate sequence of ldrb/mov/orr instructions. And on architectures that can truly tolerate any kind of misalignment, the get_unaligned() accessors resolve to the leXX_to_cpup accessors that operate on aligned addresses.

Since the compiler will in fact emit ldrd or ldm instructions when building this code for ARM v6 or later, the solution is to use the unaligned accessors unconditionally on architectures where this is known to be fast. The _aligned version of the hash function is however still needed to get the best performance on architectures that cannot do any unaligned access in hardware.

This new version avoids the undefined behavior and should produce the fastest hash on all architectures we support.
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
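
The idea behind unaligned accessors can be illustrated in user space. The following is a minimal sketch, not the kernel's get_unaligned() implementation: `load_le32` and `load_u32` are hypothetical helper names, and the code only assumes a hosted C compiler. Both helpers read a 32-bit value without ever dereferencing a misaligned `uint32_t *`, which is the undefined behavior the commit message refers to.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a little-endian 32-bit value from an address
 * with no alignment guarantee by assembling it one byte at a time. */
static inline uint32_t load_le32(const void *p)
{
    const uint8_t *b = p;

    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}

/* Hypothetical helper: read a native-endian 32-bit value through memcpy.
 * Compilers typically turn this into a single load where the target
 * allows unaligned loads, and into byte loads where it does not. */
static inline uint32_t load_u32(const void *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}
```

Calling `load_le32(buf + 1)` on a byte buffer is well defined regardless of how `buf` is aligned, whereas casting `buf + 1` to `const uint32_t *` and dereferencing it is not.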
-
Authored by Gustavo A. R. Silva
Use 2-factor argument form kvcalloc() instead of kvzalloc().
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
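
The benefit of a two-factor allocation call is that the element count and element size are multiplied under an overflow check instead of in open-coded arithmetic at the call site. The following user-space sketch shows that pattern only; `zalloc_array` is a hypothetical name and this is not the kernel's kvcalloc() code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zeroed array of n elements of `size`
 * bytes each. If n * size would overflow size_t, refuse the request
 * instead of allocating a buffer that is silently too small. */
static void *zalloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;        /* the product does not fit in size_t */
    return calloc(n, size); /* calloc returns zero-filled memory */
}
```

With a single-argument allocator the caller would write `n * size` itself, and a wrapped product would go unnoticed until the buffer is overrun.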
-
Authored by Jason A. Donenfeld
If we're being delivered packets from multiple CPUs so quickly that the ring lock is contended for CPU tries, then it's safe to assume that the queue is near capacity anyway, so just drop the packet rather than spinning. This helps deal with multicore DoS that can interfere with data path performance. It _still_ does not completely fix the issue, but it again chips away at it.
Reported-by: Streun Fabio <fstreun@student.ethz.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
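
The "drop rather than spin" policy can be sketched in user space with C11 atomics. Everything here is an assumption made for illustration: `struct ring`, `RING_SIZE`, and `ring_try_produce` are hypothetical, the lock is a plain `atomic_flag` rather than a kernel spinlock, and the sketch makes a single lock attempt, so it does not reproduce the retry policy of the actual patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4 /* arbitrary capacity, chosen only for illustration */

/* Hypothetical bounded queue guarded by a producer lock. */
struct ring {
    atomic_flag lock;
    void *slots[RING_SIZE];
    size_t len;
};

/* Try to enqueue an item without ever waiting on the lock. Returns false
 * when the lock is already held or the ring is full; in both cases the
 * caller is expected to drop the packet. */
static bool ring_try_produce(struct ring *r, void *item)
{
    bool ok = false;

    /* test_and_set returns the previous value: true means someone else
     * holds the lock, so give up immediately instead of spinning. */
    if (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
        return false;

    if (r->len < RING_SIZE) {
        r->slots[r->len++] = item;
        ok = true;
    }

    atomic_flag_clear_explicit(&r->lock, memory_order_release);
    return ok;
}
```

The design trade-off is that a producer never burns CPU time waiting on a lock, at the cost of losing an item that might have fit a moment later.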
-
Authored by Jason A. Donenfeld
Apparently the spinlock on incoming_handshake's skb_queue is highly contended, and a torrent of handshake or cookie packets can bring the data plane to its knees, simply by virtue of enqueueing the handshake packets to be processed asynchronously. So, we try switching this to a ring buffer to hopefully have less lock contention. This alleviates the problem somewhat, though it still isn't perfect, so future patches will have to improve this further. However, it at least doesn't completely diminish the data plane.
Reported-by: Streun Fabio <fstreun@student.ethz.ch>
Reported-by: Joel Wanner <joel.wanner@inf.ethz.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
Each peer's endpoint contains a dst_cache entry that takes a reference to another netdev. When the containing namespace exits, we take down the socket and prevent future sockets from being created (by setting creating_net to NULL), which removes that potential reference on the netns. However, it doesn't release references to the netns that a netdev cached in dst_cache might be taking, so the netns still might fail to exit. Since the socket is gimped anyway, we can simply clear all the dst_caches (by way of clearing the endpoint src), which will release all references.

However, the current dst_cache_reset function only releases those references lazily. But it turns out that all of our usages of wg_socket_clear_peer_endpoint_src are called from contexts that are not exactly high-speed or bottle-necked. For example, when there's connection difficulty, or when userspace is reconfiguring the interface. And in particular for this patch, when the netns is exiting. So for those cases, it makes more sense to call dst_release immediately. For that, we add a small helper function to dst_cache.

This patch also adds a test to netns.sh from Hangbin Liu to ensure this doesn't regress.
Test-by: Hangbin Liu <liuhangbin@gmail.com>
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Randy Dunlap
Rename module_init & module_exit functions that are named "mod_init" and "mod_exit" so that they are unique in both the System.map file and in initcall_debug output instead of showing up as almost anonymous "mod_init". This is helpful for debugging and in determining how long certain module_init calls take to execute.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
We previously removed the restriction on looping to self, and then added a test to make sure the kernel didn't blow up during a routing loop. The kernel didn't blow up, thankfully, but on certain architectures where skb fragmentation is easier, such as ppc64, the skbs weren't actually being discarded after a few rounds through. But the test wasn't catching this. So actually test explicitly for massive increases in tx to see if we have a routing loop. Note that the actual loop problem will need to be addressed in a different commit.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Peter Georg
RHEL 8.5 has been released. Replace all ISCENTOS8S checks with ISRHEL8. Increase RHEL_MINOR for CentOS 8 Stream detection to 6.
Signed-off-by: Peter Georg <peter.georg@physik.uni-regensburg.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Aug 09, 2021
-
-
Authored by Mathias Krause
grsecurity kernels tend to carry additional backports and changes, like commit b60b87fc2996 ("netlink: add ethernet address policy types") or the SYM_FUNC_* changes. RAP nowadays hooks the latter, therefore no diversion to RAP_ENTRY is needed any more.

Instead of relying on the kernel version test, also test for the macros we're about to define to not already be defined to account for these additional changes in the grsecurity patch without breaking compatibility to the older public ones.

Also test for CONFIG_PAX instead of RAP_PLUGIN for the timer API related changes as these don't depend on the RAP plugin to be enabled but just a PaX/grsecurity patch to be applied. While there is no preprocessor knob for the latter, use CONFIG_PAX as this will likely be enabled in every kernel that uses the patch.
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
[zx2c4: small changes to include a header nearby a macro def test]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Jun 15, 2021
-
-
Authored by Jason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Jun 06, 2021
-
-
Authored by Jason A. Donenfeld
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
The selftests currently parse the kernel log at the end to track potential memory leaks. With these tests now reading off the end of the buffer, due to recent optimizations, some creation messages were lost, making the tests think that there was a free without an alloc. Fix this by increasing the kernel log size.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
Red Hat does awful things to their kernel for RHEL 8, such that it doesn't even compile in most configurations. This is utter craziness, and their response to me sending patches to fix this stuff has been to stonewall for months on end and then do nothing.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Jun 04, 2021
-
-
Authored by Jason A. Donenfeld
A __rcu annotation got lost during refactoring, which caused sparse to become enraged.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
When removing single nodes, it's possible that that node's parent is an empty intermediate node, in which case, it too should be removed. Otherwise the trie fills up and never is fully emptied, leading to gradual memory leaks over time for tries that are modified often. There was originally code to do this, but it was removed during refactoring in 2016 and never reworked. Now that we have proper parent pointers from the previous commits, we can implement this properly.

In order to reduce branching and expensive comparisons, we want to keep the double pointer for parent assignment (which lets us easily chain up to the root), but we still need to actually get the parent's base address. So encode the bit number into the last two bits of the pointer, and pack and unpack it as needed. This is a little bit clumsy but is the fastest and least memory-wasteful of the compromises. Note that we align the root struct here to a minimum of 4, because it's embedded into a larger struct, and we're relying on having the bottom two bits for our flag, which would only be 16-bit aligned on m68k.

The existing macro-based helpers were a bit unwieldy for adding the bit packing to, so this commit replaces them with safer and clearer ordinary functions.

We add a test to the randomized/fuzzer part of the selftests, to free the randomized tries by-peer, refuzz it, and repeat, until it's supposed to be empty, and then see if that actually resulted in the whole thing being emptied. That combined with kmemcheck should hopefully make sure this commit is doing what it should. Along the way this resulted in various other cleanups of the tests and fixes for recent graphviz.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
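
Packing a small number into the low bits of an aligned pointer, as described above, can be sketched in user space as follows. This is an illustration under stated assumptions, not the WireGuard source: `struct node`, `pack_parent`, `unpack_slot`, `unpack_tag`, and `parent_of` are hypothetical names, and the sketch assumes a platform where the struct is at least 4-byte aligned (checked at compile time).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trie node with two child slots. */
struct node {
    struct node *bit[2];
};

/* The scheme needs the two low address bits of every slot to be zero. */
_Static_assert(_Alignof(struct node) >= 4, "need two spare low bits");

/* Store a pointer to one of the parent's bit[] slots together with a
 * 2-bit tag (here: which slot it is) in a single word. */
static inline uintptr_t pack_parent(struct node **slot, unsigned int tag)
{
    return (uintptr_t)slot | (tag & 3u);
}

/* Recover the slot pointer by masking off the tag bits. */
static inline struct node **unpack_slot(uintptr_t packed)
{
    return (struct node **)(packed & ~(uintptr_t)3);
}

/* Recover the 2-bit tag. */
static inline unsigned int unpack_tag(uintptr_t packed)
{
    return (unsigned int)(packed & 3u);
}

/* Recover the parent's base address from the slot pointer and the slot
 * index. Only meaningful when the tag is 0 or 1, i.e. a real slot index. */
static inline struct node *parent_of(uintptr_t packed)
{
    char *slot = (char *)unpack_slot(packed);
    size_t off = offsetof(struct node, bit) +
                 unpack_tag(packed) * sizeof(struct node *);

    return (struct node *)(slot - off);
}
```

The packed word costs no extra memory compared with a plain pointer, which is why the alignment guarantee matters: without two guaranteed-zero low bits the tag would corrupt the address.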
-
Authored by Jason A. Donenfeld
The previous commit moved from O(n) to O(1) for removal, but in the process introduced an additional pointer member to a struct that increased the size from 60 to 68 bytes, putting nodes in the 128-byte slab. With deployed systems having as many as 2 million nodes, this represents a significant doubling in memory usage (128 MiB -> 256 MiB). Fix this by using our own kmem_cache, that's sized exactly right. This also makes wireguard's memory usage more transparent in tools like slabtop and /proc/slabinfo.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
Previously, deleting peers would require traversing the entire trie in order to rebalance nodes and safely free them. This meant that removing 1000 peers from a trie with a half million nodes would take an extremely long time, during which we're holding the rtnl lock. Large-scale users were reporting 200ms latencies added to the networking stack as a whole every time their userspace software would queue up significant removals. That's a serious situation.

This commit fixes that by maintaining a double pointer to the parent's bit pointer for each node, and then using the already existing node list belonging to each peer to go directly to the node, fix up its pointers, and free it with RCU. This means removal is O(1) instead of O(n), and we don't use gobs of stack.

The removal algorithm has the same downside as the code that it fixes: it won't collapse needlessly long runs of fillers. We can enhance that in the future if it ever becomes a problem. This commit documents that limitation with a TODO comment in code, a small but meaningful improvement over the prior situation.

Currently the biggest flaw, which the next commit addresses, is that this increases the node size on 64-bit machines from 60 bytes to 68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up using twice as much memory per node, because of power-of-two allocations, which is a big bummer. We'll need to figure something out there.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
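
The rounding arithmetic in the last paragraph can be checked with a few lines of C. This sketch assumes pure power-of-two size classes, as the commit message describes; a real allocator may offer additional classes, and `size_class` and `wasted` are hypothetical names used only here.

```c
#include <stddef.h>

/* Smallest power of two that is >= bytes, modelling an allocator whose
 * size classes are all powers of two (an assumption of this sketch). */
static size_t size_class(size_t bytes)
{
    size_t cls = 1;

    while (cls < bytes)
        cls <<= 1;
    return cls;
}

/* Bytes left unused per object when it is served from its size class. */
static size_t wasted(size_t bytes)
{
    return size_class(bytes) - bytes;
}
```

Under this model `size_class(60)` is 64 and `size_class(68)` is 128, so growing the node by 8 bytes doubles the memory consumed per node, and `wasted(68)` shows that 60 of those 128 bytes go unused.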
-
Authored by Jason A. Donenfeld
The randomized trie tests weren't initializing the dummy peer list head, resulting in a NULL pointer dereference when used. Fix this by initializing it in the randomized trie test, just like we do for the static unit test. While we're at it, all of the other strings like this have the word "self-test", so add it to the missing place here.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
Authored by Jason A. Donenfeld
With deployments having upwards of 600k peers now, this somewhat heavy structure could benefit from more fine-grained allocations. Specifically, instead of using a 2048-byte slab for a 1544-byte object, we can now use 1544-byte objects directly, thus saving almost 25% per-peer, or with 600k peers, that's a savings of 303 MiB. This also makes wireguard's memory usage more transparent in tools like slabtop and /proc/slabinfo.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-
- Jun 03, 2021
-
-
Authored by Jason A. Donenfeld
Many of the synchronization points are sometimes called under the rtnl lock, which means we should use synchronize_net rather than synchronize_rcu. Under the hood, this expands to using the expedited flavor of function in the event that rtnl is held, in order to not stall other concurrent changes. This fixes some very, very long delays when removing multiple peers at once, which would cause some operations to take several minutes.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
-