Commits · master · apsara2825 / wireguard-linux-compat

Jun 29, 2022

compat: drop CentOS 8 Stream support · 3d3c92b4

Nobody uses this and it's impossible to maintain given the current CI
situation.

RHEL 7 and 8 release remain for now, though that might not always be the
case. See the link for details.

Link: https://lists.zx2c4.com/pipermail/wireguard/2022-June/007664.html


Suggested-by: Philip J. Perry <phil@elrepo.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

3d3c92b4

Jun 28, 2022

compat: do not backport ktime_get_coarse_boottime_ns to c8s · 99935b07

Authored by Jason A. Donenfeld 2 years ago


Also bump the c8s version stamp.

Reported-by: Vladimír Beneš <vbenes@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

99935b07

Jun 27, 2022

version: bump · 18fbcd68

Authored by Jason A. Donenfeld 2 years ago

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

18fbcd68

Jun 22, 2022

compat: handle backported rng and blake2s · 3ec3e822

Authored by Jason A. Donenfeld 2 years ago

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

3ec3e822

May 05, 2022

qemu: give up on RHEL8 in CI · ba45dd6f

Authored by Jason A. Donenfeld 2 years ago


They keep breaking their kernel and being difficult when I send patches
to fix it, so just give up on trying to support this in the CI. It'll
bitrot and people will complain and we'll see what happens at that
point.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

ba45dd6f

qemu: set panic_on_warn=1 from cmdline · c7560fd0

Authored by Jason A. Donenfeld 2 years ago


Rather than setting this once init is running, set panic_on_warn from
the kernel command line, so that it catches splats from WireGuard
initialization code and the various crypto selftests.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

c7560fd0

qemu: use vports on arm · 33c87a11

Authored by Jason A. Donenfeld 2 years ago


Rather than having to hack up QEMU, just use the virtio serial device.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

33c87a11

netns: limit parallelism to $(nproc) tests at once · 894152a5

Authored by Jason A. Donenfeld 2 years ago

The parallel tests were added to catch queueing issues from multiple
cores. But what happens in reality when testing tons of processes is
that these separate threads wind up fighting with the scheduler, and we
wind up with contention in places we don't care about that decreases the
chances of hitting a bug. So just do a test with the number of CPU
cores, rather than trying to scale up arbitrarily.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

894152a5

netns: make routing loop test non-fatal · f8886735

Authored by Jason A. Donenfeld 2 years ago

I hate to do this, but I still do not have a good solution to actually
fix this bug across architectures. So just disable it for now, so that
the CI can still deliver actionable results. This commit adds a large
red warning, so that at least the failure isn't lost forever, and
hopefully this can be revisited down the line.

Link: https://lore.kernel.org/netdev/CAHmME9pv1x6C4TNdL6648HydD8r+txpV4hTUXOBVkrapBXH4QQ@mail.gmail.com/
Link: https://lore.kernel.org/netdev/YmszSXueTxYOC41G@zx2c4.com/
Link: https://lore.kernel.org/wireguard/CAHmME9rNnBiNvBstb7MPwK-7AmAN0sOfnhdR=eeLrowWcKxaaQ@mail.gmail.com/

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

f8886735

Apr 14, 2022

device: check for metadata_dst with skb_valid_dst() · f9d9b4db

Authored by Nikolay Aleksandrov 2 years ago

When we try to transmit an skb with md_dst attached through wireguard
we hit a null pointer dereference in wg_xmit() due to the use of
dst_mtu() which calls into dst_blackhole_mtu() which in turn tries to
dereference dst->dev.

Since wireguard doesn't use md_dsts we should use skb_valid_dst(), which
checks for DST_METADATA flag, and if it's set, then falls back to
wireguard's device mtu. That gives us the best chance of transmitting
the packet; otherwise if the blackhole netdev is used we'd get
ETH_MIN_MTU.
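
The fallback described above can be sketched in userspace C. This is a simplified model and not the driver code: the struct layouts, the `model_` function names, and the flag value are stand-ins assumed for illustration. Only `skb_valid_dst()` and `DST_METADATA` are names taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
#define DST_METADATA 0x0080 /* illustrative flag value */

struct net_device { unsigned int mtu; };
struct dst_entry { unsigned short flags; unsigned int mtu; };
struct sk_buff { struct dst_entry *dst; };

/* Models the idea behind skb_valid_dst(): a dst is only usable for
 * routing decisions if it exists and is not a metadata dst. */
static int model_skb_valid_dst(const struct sk_buff *skb)
{
	return skb->dst != NULL && !(skb->dst->flags & DST_METADATA);
}

/* Picks the MTU for transmission: the route's MTU when the dst is valid,
 * otherwise the tunnel device's own MTU. */
static unsigned int model_xmit_mtu(const struct sk_buff *skb,
				   const struct net_device *dev)
{
	assert(skb != NULL && dev != NULL);
	return model_skb_valid_dst(skb) ? skb->dst->mtu : dev->mtu;
}
```

In this model a packet carrying a metadata dst never has its dst consulted for the MTU, which is the property the fix relies on.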

 [  263.693506] BUG: kernel NULL pointer dereference, address: 00000000000000e0
 [  263.693908] #PF: supervisor read access in kernel mode
 [  263.694174] #PF: error_code(0x0000) - not-present page
 [  263.694424] PGD 0 P4D 0
 [  263.694653] Oops: 0000 [#1] PREEMPT SMP NOPTI
 [  263.694876] CPU: 5 PID: 951 Comm: mausezahn Kdump: loaded Not tainted 5.18.0-rc1+ #522
 [  263.695190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
 [  263.695529] RIP: 0010:dst_blackhole_mtu+0x17/0x20
 [  263.695770] Code: 00 00 00 0f 1f 44 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 47 10 48 83 e0 fc 8b 40 04 85 c0 75 09 48 8b 07 <8b> 80 e0 00 00 00 c3 66 90 0f 1f 44 00 00 48 89 d7 be 01 00 00 00
 [  263.696339] RSP: 0018:ffffa4a4422fbb28 EFLAGS: 00010246
 [  263.696600] RAX: 0000000000000000 RBX: ffff8ac9c3553000 RCX: 0000000000000000
 [  263.696891] RDX: 0000000000000401 RSI: 00000000fffffe01 RDI: ffffc4a43fb48900
 [  263.697178] RBP: ffffa4a4422fbb90 R08: ffffffff9622635e R09: 0000000000000002
 [  263.697469] R10: ffffffff9b69a6c0 R11: ffffa4a4422fbd0c R12: ffff8ac9d18b1a00
 [  263.697766] R13: ffff8ac9d0ce1840 R14: ffff8ac9d18b1a00 R15: ffff8ac9c3553000
 [  263.698054] FS:  00007f3704c337c0(0000) GS:ffff8acaebf40000(0000) knlGS:0000000000000000
 [  263.698470] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [  263.698826] CR2: 00000000000000e0 CR3: 0000000117a5c000 CR4: 00000000000006e0
 [  263.699214] Call Trace:
 [  263.699505]  <TASK>
 [  263.699759]  wg_xmit+0x411/0x450
 [  263.700059]  ? bpf_skb_set_tunnel_key+0x46/0x2d0
 [  263.700382]  ? dev_queue_xmit_nit+0x31/0x2b0
 [  263.700719]  dev_hard_start_xmit+0xd9/0x220
 [  263.701047]  __dev_queue_xmit+0x8b9/0xd30
 [  263.701344]  __bpf_redirect+0x1a4/0x380
 [  263.701664]  __dev_queue_xmit+0x83b/0xd30
 [  263.701961]  ? packet_parse_headers+0xb4/0xf0
 [  263.702275]  packet_sendmsg+0x9a8/0x16a0
 [  263.702596]  ? _raw_spin_unlock_irqrestore+0x23/0x40
 [  263.702933]  sock_sendmsg+0x5e/0x60
 [  263.703239]  __sys_sendto+0xf0/0x160
 [  263.703549]  __x64_sys_sendto+0x20/0x30
 [  263.703853]  do_syscall_64+0x3b/0x90
 [  263.704162]  entry_SYSCALL_64_after_hwframe+0x44/0xae
 [  263.704494] RIP: 0033:0x7f3704d50506
 [  263.704789] Code: 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
 [  263.705652] RSP: 002b:00007ffe954b0b88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 [  263.706141] RAX: ffffffffffffffda RBX: 0000558bb259b490 RCX: 00007f3704d50506
 [  263.706544] RDX: 000000000000004a RSI: 0000558bb259b7b2 RDI: 0000000000000003
 [  263.706952] RBP: 0000000000000000 R08: 00007ffe954b0b90 R09: 0000000000000014
 [  263.707339] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe954b0b90
 [  263.707735] R13: 000000000000004a R14: 0000558bb259b7b2 R15: 0000000000000001
 [  263.708132]  </TASK>
 [  263.708398] Modules linked in: bridge netconsole bonding [last unloaded: bridge]
 [  263.708942] CR2: 00000000000000e0

Link: https://github.com/cilium/cilium/issues/19428


Reported-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
[Jason: polyfilled for < 4.3]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

f9d9b4db

Apr 07, 2022

qemu: enable ACPI for SMP · f909532a

Authored by Jason A. Donenfeld 2 years ago

It turns out that by having CONFIG_ACPI=n, we've been failing to boot
additional CPUs, and so these systems were functionally UP. The code
bloat is unfortunate for build times, but I don't see an alternative. So
this commit sets CONFIG_ACPI=y for x86_64 and i686 configs.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

f909532a

socket: ignore v6 endpoints when ipv6 is disabled · ec89ca64

Authored by Jason A. Donenfeld 2 years ago


The previous commit fixed a memory leak on the send path in the event
that IPv6 is disabled at compile time, but how did a packet even arrive
there to begin with? It turns out we have previously allowed IPv6
endpoints even when IPv6 support is disabled at compile time. This is
awkward and inconsistent. Instead, let's just ignore all things IPv6,
the same way we do other malformed endpoints, in the case where IPv6 is
disabled.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

ec89ca64

Apr 06, 2022

socket: free skb in send6 when ipv6 is disabled · fa32671b

Authored by Wang Hai 2 years ago


I got a memory leak report:

unreferenced object 0xffff8881191fc040 (size 232):
  comm "kworker/u17:0", pid 23193, jiffies 4295238848 (age 3464.870s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff814c3ef4>] slab_post_alloc_hook+0x84/0x3b0
    [<ffffffff814c8977>] kmem_cache_alloc_node+0x167/0x340
    [<ffffffff832974fb>] __alloc_skb+0x1db/0x200
    [<ffffffff82612b5d>] wg_socket_send_buffer_to_peer+0x3d/0xc0
    [<ffffffff8260e94a>] wg_packet_send_handshake_initiation+0xfa/0x110
    [<ffffffff8260ec81>] wg_packet_handshake_send_worker+0x21/0x30
    [<ffffffff8119c558>] process_one_work+0x2e8/0x770
    [<ffffffff8119ca2a>] worker_thread+0x4a/0x4b0
    [<ffffffff811a88e0>] kthread+0x120/0x160
    [<ffffffff8100242f>] ret_from_fork+0x1f/0x30

In wg_socket_send_buffer_as_reply_to_skb() and
wg_socket_send_buffer_to_peer(), send6() is required to free the skb. But
when CONFIG_IPV6 is disabled, the kfree_skb() call is missing. This patch
adds it to fix the leak.

Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

fa32671b

Mar 03, 2022

qemu: simplify RNG seeding · ffb8cd62

Authored by Jason A. Donenfeld 3 years ago


We don't actually need to write anything in the pool. Instead, we just
force the total over 128, and we should be good to go for all old
kernels. We also only need this on getrandom() kernels, which simplifies
things too.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

ffb8cd62

queueing: use CFI-safe ptr_ring cleanup function · 4eff63d2

Authored by Jason A. Donenfeld 3 years ago


We make too nuanced use of ptr_ring to entirely move to the skb_array
wrappers, but we at least should avoid the naughty function pointer cast
when cleaning up skbs. Otherwise RAP/CFI will honk at us. This patch
uses the __skb_array_destroy_skb wrapper for the cleanup, rather than
directly providing kfree_skb, which is what other drivers in the same
situation do too.

Reported-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

4eff63d2

Dec 14, 2021

crypto: curve25519-x86_64: use in/out register constraints more precisely · 273018b7

Authored by Jason A. Donenfeld 3 years ago

Rather than passing all variables as modified, pass ones that are only
read into that parameter. This helps with old gcc versions when
alternatives are additionally used, and lets gcc's codegen be a little
bit more efficient. This also syncs up with the latest Vale/EverCrypt
output.

This also forward ports 3c9f3b69 ("crypto: curve25519-x86_64: solve
register constraints with reserved registers").

Cc: Aymeric Fromherz <aymeric.fromherz@inria.fr>
Cc: Mathias Krause <minipli@grsecurity.net>
Link: https://lore.kernel.org/wireguard/1554725710.1290070.1639240504281.JavaMail.zimbra@inria.fr/
Link: https://github.com/project-everest/hacl-star/pull/501


Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

273018b7

Dec 13, 2021

compat: drop Ubuntu 14.04 · 4f4c0198

Authored by Jason A. Donenfeld 3 years ago

It's been over a year since we announced sunsetting this.

Link: https://lore.kernel.org/wireguard/CAHmME9rckipsdZYW+LA=x6wCMybdFFA+VqoogFXnR=kHYiCteg@mail.gmail.com/T

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

4f4c0198

Dec 08, 2021

version: bump · 743eef23

Authored by Jason A. Donenfeld 3 years ago

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

743eef23

Dec 07, 2021

crypto: curve25519-x86_64: solve register constraints with reserved registers · 3c9f3b69

Authored by Mathias Krause 3 years ago

The register constraints for the inline assembly in fsqr() and fsqr2()
are pretty tight on what the compiler may assign to the remaining three
register variables. The clobber list only allows the following to be
used: RDI, RSI, RBP and R12. With RAP reserving R12 and a kernel having
CONFIG_FRAME_POINTER=y, claiming RBP, there are only two registers left
so the compiler rightfully complains about impossible constraints.

Provide alternatives that'll allow a memory reference for 'out' to solve
the allocation constraint dilemma for this configuration.

Also make 'out' an input-only operand as it is only used as such. This
not only allows gcc to optimize its usage further, but also works around
older gcc versions, apparently failing to handle multiple alternatives
correctly, as in failing to initialize the 'out' operand with its input
value.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

3c9f3b69

compat: udp_tunnel: don't take reference to non-init namespace · 8e40dd62

Authored by Jason A. Donenfeld 3 years ago


The comment to sk_change_net is instructive:

  Kernel sockets, f.e. rtnl or icmp_socket, are a part of a namespace.
  They should not hold a reference to a namespace in order to allow
  to stop it.
  Sockets after sk_change_net should be released using sk_release_kernel

We weren't following these rules before, and were instead using
__sock_create, which means we kept a reference to the namespace, which
in turn meant that interfaces were not cleaned up on namespace
exit.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

8e40dd62

Dec 04, 2021

compat: siphash: use _unaligned version by default · ea6b8e7b

Authored by Arnd Bergmann 3 years ago

On ARM v6 and later, we define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
because the ordinary load/store instructions (ldr, ldrh, ldrb) can
tolerate any misalignment of the memory address. However, load/store
double and load/store multiple instructions (ldrd, ldm) may still only
be used on memory addresses that are 32-bit aligned, and so we have to
use the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS macro with care, or we
may end up with a severe performance hit due to alignment traps that
require fixups by the kernel. Testing shows that this currently happens
with clang-13 but not gcc-11. In theory, any compiler version can
produce this bug or other problems, as we are dealing with undefined
behavior in C99 even on architectures that support this in hardware,
see also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363.

Fortunately, the get_unaligned() accessors do the right thing: when
building for ARMv6 or later, the compiler will emit unaligned accesses
using the ordinary load/store instructions (but avoid the ones that
require 32-bit alignment). When building for older ARM, those accessors
will emit the appropriate sequence of ldrb/mov/orr instructions. And on
architectures that can truly tolerate any kind of misalignment, the
get_unaligned() accessors resolve to the leXX_to_cpup accessors that
operate on aligned addresses.

Since the compiler will in fact emit ldrd or ldm instructions when
building this code for ARM v6 or later, the solution is to use the
unaligned accessors unconditionally on architectures where this is
known to be fast. The _aligned version of the hash function is
however still needed to get the best performance on architectures
that cannot do any unaligned access in hardware.

This new version avoids the undefined behavior and should produce
the fastest hash on all architectures we support.

Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

ea6b8e7b

ratelimiter: use kvcalloc() instead of kvzalloc() · 5325bc82

Authored by Gustavo A. R. Silva 3 years ago


Use 2-factor argument form kvcalloc() instead of kvzalloc().
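
The commit message does not spell out the motivation, but the usual benefit of a two-factor allocator is that the count-times-size multiplication is checked for overflow inside the allocator instead of at every call site. The userspace sketch below illustrates that under stated assumptions: `mul_overflows` and `zalloc_array` are hypothetical names, and `malloc` stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if n * size would overflow size_t; otherwise stores the
 * product in *bytes and returns 0. */
static int mul_overflows(size_t n, size_t size, size_t *bytes)
{
	assert(bytes != NULL);
	if (size != 0 && n > SIZE_MAX / size)
		return 1;
	*bytes = n * size;
	return 0;
}

/* Two-factor zeroing allocation: refuses requests whose total size
 * cannot be represented, instead of silently allocating too little. */
static void *zalloc_array(size_t n, size_t size)
{
	size_t bytes;
	void *p;

	if (mul_overflows(n, size, &bytes))
		return NULL;
	p = malloc(bytes ? bytes : 1);
	if (p)
		memset(p, 0, bytes);
	return p;
}
```

With a single-argument allocator, a caller that computes `n * size` itself could wrap around and receive a buffer far smaller than intended.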

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

5325bc82

receive: drop handshakes if queue lock is contended · e44c78cb

Authored by Jason A. Donenfeld 3 years ago


If we're being delivered packets from multiple CPUs so quickly that the
ring lock is contended for CPU tries, then it's safe to assume that the
queue is near capacity anyway, so just drop the packet rather than
spinning. This helps deal with multicore DoS that can interfere with
data path performance. It _still_ does not completely fix the issue, but
it again chips away at it.
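
A drop-on-contention producer can be sketched in userspace with a C11 `atomic_flag` used as a try-lock. This is an illustrative model only: the `handshake_ring` struct, its fixed size, and the function names are assumptions, and the driver itself uses the kernel's ptr_ring rather than this code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 4 /* deliberately tiny for illustration */

struct handshake_ring {
	atomic_flag lock;
	void *slots[RING_SIZE];
	size_t head;
	size_t count;
};

static void ring_init(struct handshake_ring *r)
{
	assert(r != NULL);
	atomic_flag_clear(&r->lock);
	r->head = 0;
	r->count = 0;
}

/* Returns 1 if the packet was queued, 0 if it was dropped. The packet is
 * dropped both when the ring is full and when the lock is already held:
 * the producer never spins waiting for the lock. */
static int ring_try_produce(struct handshake_ring *r, void *pkt)
{
	int queued = 0;

	assert(r != NULL);
	if (atomic_flag_test_and_set_explicit(&r->lock, memory_order_acquire))
		return 0; /* contended: drop rather than spin */
	if (r->count < RING_SIZE) {
		r->slots[(r->head + r->count) % RING_SIZE] = pkt;
		r->count++;
		queued = 1;
	}
	atomic_flag_clear_explicit(&r->lock, memory_order_release);
	return queued;
}
```

The trade-off is that a legitimate handshake can be dropped under load, which the commit message accepts on the grounds that a contended lock implies the queue is near capacity anyway.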

Reported-by: Streun Fabio <fstreun@student.ethz.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

e44c78cb

receive: use ring buffer for incoming handshakes · 5707d38f

Authored by Jason A. Donenfeld 3 years ago


Apparently the spinlock on incoming_handshake's skb_queue is highly
contended, and a torrent of handshake or cookie packets can bring the
data plane to its knees, simply by virtue of enqueueing the handshake
packets to be processed asynchronously. So, we try switching this to a
ring buffer to hopefully have less lock contention. This alleviates the
problem somewhat, though it still isn't perfect, so future patches will
have to improve this further. However, it at least doesn't completely
diminish the data plane.

Reported-by: Streun Fabio <fstreun@student.ethz.ch>
Reported-by: Joel Wanner <joel.wanner@inf.ethz.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

5707d38f

device: reset peer src endpoint when netns exits · 68abb1b9

Authored by Jason A. Donenfeld 3 years ago


Each peer's endpoint contains a dst_cache entry that takes a reference
to another netdev. When the containing namespace exits, we take down the
socket and prevent future sockets from being created (by setting
creating_net to NULL), which removes that potential reference on the
netns. However, it doesn't release references to the netns that a netdev
cached in dst_cache might be taking, so the netns still might fail to
exit. Since the socket is gimped anyway, we can simply clear all the
dst_caches (by way of clearing the endpoint src), which will release all
references.

However, the current dst_cache_reset function only releases those
references lazily. But it turns out that all of our usages of
wg_socket_clear_peer_endpoint_src are called from contexts that are not
exactly high-speed or bottle-necked. For example, when there's
connection difficulty, or when userspace is reconfiguring the interface.
And in particular for this patch, when the netns is exiting. So for
those cases, it makes more sense to call dst_release immediately. For
that, we add a small helper function to dst_cache.

This patch also adds a test to netns.sh from Hangbin Liu to ensure this
doesn't regress.

Test-by: Hangbin Liu <liuhangbin@gmail.com>
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

68abb1b9

main: rename 'mod_init' & 'mod_exit' functions to be module-specific · ea3f5fbe

Authored by Randy Dunlap 3 years ago


Rename module_init & module_exit functions that are named
"mod_init" and "mod_exit" so that they are unique in both the
System.map file and in initcall_debug output instead of showing
up as almost anonymous "mod_init".

This is helpful for debugging and in determining how long certain
module_init calls take to execute.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

ea3f5fbe

netns: actually test for routing loops · cb001d45

Authored by Jason A. Donenfeld 3 years ago

We previously removed the restriction on looping to self, and then added
a test to make sure the kernel didn't blow up during a routing loop. The
kernel didn't blow up, thankfully, but on certain architectures where
skb fragmentation is easier, such as ppc64, the skbs weren't actually
being discarded after a few rounds through. But the test wasn't catching
this. So actually test explicitly for massive increases in tx to see if
we have a routing loop. Note that the actual loop problem will need to
be addressed in a different commit.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

cb001d45

compat: update for RHEL 8.5 · 2715e641

Authored by Peter Georg 3 years ago


RHEL 8.5 has been released. Replace all ISCENTOS8S checks with ISRHEL8.
Increase RHEL_MINOR for CentOS 8 Stream detection to 6.

Signed-off-by: Peter Georg <peter.georg@physik.uni-regensburg.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

2715e641

Aug 09, 2021

compat: account for grsecurity backports and changes · 29747255

Authored by Mathias Krause 3 years ago


grsecurity kernels tend to carry additional backports and changes, like
commit b60b87fc2996 ("netlink: add ethernet address policy types") or
the SYM_FUNC_* changes. RAP nowadays hooks the latter, therefore no
diversion to RAP_ENTRY is needed any more.

Instead of relying on the kernel version test, also test for the macros
we're about to define to not already be defined to account for these
additional changes in the grsecurity patch without breaking
compatibility to the older public ones.

Also test for CONFIG_PAX instead of RAP_PLUGIN for the timer API related
changes as these don't depend on the RAP plugin to be enabled but just a
PaX/grsecurity patch to be applied. While there is no preprocessor knob
for the latter, use CONFIG_PAX as this will likely be enabled in every
kernel that uses the patch.

Signed-off-by: Mathias Krause <minipli@grsecurity.net>
[zx2c4: small changes to include a header nearby a macro def test]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

29747255

Jun 15, 2021

compat: account for latest c8s backports · 50dda8ce

Authored by Jason A. Donenfeld 3 years ago

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

50dda8ce

Jun 06, 2021

version: bump · d378f930

Authored by Jason A. Donenfeld 3 years ago

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

d378f930

qemu: increase default dmesg log size · fb4a0da6

Authored by Jason A. Donenfeld 3 years ago


The selftests currently parse the kernel log at the end to track
potential memory leaks. With these tests now reading off the end of the
buffer, due to recent optimizations, some creation messages were lost,
making the tests think that there was a free without an alloc. Fix this
by increasing the kernel log size.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

fb4a0da6

qemu: add disgusting hacks for RHEL 8 · 8f4414d3

Authored by Jason A. Donenfeld 3 years ago


Red Hat does awful things to their kernel for RHEL 8, such that it
doesn't even compile in most configurations. This is utter craziness,
and their response to me sending patches to fix this stuff has been to
stonewall for months on end and then do nothing.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

8f4414d3

Jun 04, 2021

allowedips: add missing __rcu annotation to satisfy sparse · fd7a4621

Authored by Jason A. Donenfeld 3 years ago


A __rcu annotation got lost during refactoring, which caused sparse to
become enraged.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

fd7a4621

allowedips: free empty intermediate nodes when removing single node · 383461db

Authored by Jason A. Donenfeld 3 years ago

When removing single nodes, it's possible that that node's parent is an
empty intermediate node, in which case, it too should be removed.
Otherwise the trie fills up and never is fully emptied, leading to
gradual memory leaks over time for tries that are modified often. There
was originally code to do this, but it was removed during refactoring in
2016 and never reworked. Now that we have proper parent pointers from
the previous commits, we can implement this properly.

In order to reduce branching and expensive comparisons, we want to keep
the double pointer for parent assignment (which lets us easily chain up
to the root), but we still need to actually get the parent's base
address. So encode the bit number into the last two bits of the pointer,
and pack and unpack it as needed. This is a little bit clumsy but is the
fastest and least memory-wasteful of the compromises. Note that we align
the root struct here to a minimum of 4, because it's embedded into a
larger struct, and we're relying on having the bottom two bits for our
flag, which would only be 16-bit aligned on m68k.
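
The pointer packing described above can be sketched in portable C. This is a simplified illustration, not the driver's code: the `node` struct and the helper names are hypothetical, and it assumes nodes are at least 4-byte aligned so the bottom two bits of their addresses are always zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical trie node: two children plus a packed parent reference. */
struct node {
	struct node *bit[2];
	uintptr_t parent_bit_packed;
};

/* Stores the parent's base address and the child index (0 or 1) in one
 * word by placing the index in the low two bits of the address. */
static uintptr_t pack_parent(const struct node *parent, unsigned int bit)
{
	assert(((uintptr_t)parent & 3u) == 0); /* needs 4-byte alignment */
	assert(bit <= 3u);
	return (uintptr_t)parent | bit;
}

/* Recovers the parent's base address by masking off the low two bits. */
static struct node *unpack_parent(uintptr_t packed)
{
	return (struct node *)(packed & ~(uintptr_t)3);
}

/* Recovers which child slot of the parent points at this node. */
static unsigned int unpack_bit(uintptr_t packed)
{
	return (unsigned int)(packed & 3u);
}
```

Given a child's packed word, `unpack_parent(p)->bit[unpack_bit(p)]` leads back to the slot that references the child, so both the slot and the parent's address are available from a single stored value.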

The existing macro-based helpers were a bit unwieldy for adding the bit
packing to, so this commit replaces them with safer and clearer ordinary
functions.

We add a test to the randomized/fuzzer part of the selftests, to free
the randomized tries by-peer, refuzz it, and repeat, until it's supposed
to be empty, and then see if that actually resulted in the whole
thing being emptied. That combined with kmemcheck should hopefully make
sure this commit is doing what it should. Along the way this resulted in
various other cleanups of the tests and fixes for recent graphviz.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

383461db

allowedips: allocate nodes in kmem_cache · 03add828

Authored by Jason A. Donenfeld 3 years ago


The previous commit moved from O(n) to O(1) for removal, but in the
process introduced an additional pointer member to a struct that
increased the size from 60 to 68 bytes, putting nodes in the 128-byte
slab. With deployed systems having as many as 2 million nodes, this
represents a significant doubling in memory usage (128 MiB -> 256 MiB).
Fix this by using our own kmem_cache, that's sized exactly right. This
also makes wireguard's memory usage more transparent in tools like
slabtop and /proc/slabinfo.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

03add828

allowedips: remove nodes in O(1) · b56d48ce

Authored by Jason A. Donenfeld 3 years ago

Previously, deleting peers would require traversing the entire trie in
order to rebalance nodes and safely free them. This meant that removing
1000 peers from a trie with a half million nodes would take an extremely
long time, during which we're holding the rtnl lock. Large-scale users
were reporting 200ms latencies added to the networking stack as a whole
every time their userspace software would queue up significant removals.
That's a serious situation.

This commit fixes that by maintaining a double pointer to the parent's
bit pointer for each node, and then using the already existing node list
belonging to each peer to go directly to the node, fix up its pointers,
and free it with RCU. This means removal is O(1) instead of O(n), and we
don't use gobs of stack.

The removal algorithm has the same downside as the code that it fixes:
it won't collapse needlessly long runs of fillers. We can enhance that
in the future if it ever becomes a problem. This commit documents that
limitation with a TODO comment in code, a small but meaningful
improvement over the prior situation.

Currently the biggest flaw, which the next commit addresses, is that
this increases the node size on 64-bit machines from 60 bytes to
68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up
using twice as much memory per node, because of power-of-two
allocations, which is a big bummer. We'll need to figure something out
there.
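
The rounding arithmetic can be checked with a small model. It assumes purely power-of-two size classes, as the message describes. Real allocators may offer additional size classes, and `pow2_bucket` and `total_bytes` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds an object size up to the next power-of-two bucket (minimum 8). */
static size_t pow2_bucket(size_t size)
{
	size_t bucket = 8;

	assert(size > 0);
	while (bucket < size)
		bucket <<= 1;
	return bucket;
}

/* Total memory consumed by a number of equally sized objects once each
 * one is rounded up to its bucket. */
static size_t total_bytes(size_t nodes, size_t object_size)
{
	return nodes * pow2_bucket(object_size);
}
```

Under this model a 60-byte node occupies a 64-byte bucket and a 68-byte node occupies a 128-byte bucket, so growing the struct by 8 bytes doubles the per-node footprint.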

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

b56d48ce

allowedips: initialize list head in selftest · 3c14c4bf

Authored by Jason A. Donenfeld 3 years ago


The randomized trie tests weren't initializing the dummy peer list head,
resulting in a NULL pointer dereference when used. Fix this by
initializing it in the randomized trie test, just like we do for the
static unit test.

While we're at it, all of the other strings like this have the word
"self-test", so add it to the missing place here.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

3c14c4bf

peer: allocate in kmem_cache · 4d8b7edc

Authored by Jason A. Donenfeld 3 years ago


With deployments having upwards of 600k peers now, this somewhat heavy
structure could benefit from more fine-grained allocations.
Specifically, instead of using a 2048-byte slab for a 1544-byte object,
we can now use 1544-byte objects directly, thus saving almost 25%
per-peer, or with 600k peers, that's a savings of 303 MiB. This also
makes wireguard's memory usage more transparent in tools like slabtop
and /proc/slabinfo.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

4d8b7edc

Jun 03, 2021

global: use synchronize_net rather than synchronize_rcu · 6fbc0e62

Authored by Jason A. Donenfeld 3 years ago


Many of the synchronization points are sometimes called under the rtnl
lock, which means we should use synchronize_net rather than
synchronize_rcu. Under the hood, this expands to using the expedited
flavor of function in the event that rtnl is held, in order to not stall
other concurrent changes.

This fixes some very, very long delays when removing multiple peers at
once, which would cause some operations to take several minutes.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

6fbc0e62