  1. Dec 31, 2023
  2. Dec 30, 2023
  3. Dec 29, 2023
  4. Dec 19, 2023
  5. Dec 18, 2023
  6. Dec 16, 2023
  7. Dec 14, 2023
  8. Nov 29, 2023
  9. Nov 28, 2023
  10. Nov 27, 2023
  11. Nov 25, 2023
  12. Nov 22, 2023
    • Modernize macros to use `do { } while (0)` · 81932506
      Authored by Nick Terrell
      This PR introduces no functional changes. It attempts to change all
      macros currently using `{ }` or some variant of that to
      `do { } while (0)`, and introduces trailing `;` where necessary.
      There were no bugs found during this migration.
      
      The Visual Studio bug that produced a warning on this construct has
      been fixed since VS2015. Additionally, we have several instances of
      `do { } while (0)` which have been present for several releases, so we
      don't have to worry about breaking people's builds.
      
      Fixes Issue #3830.
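
The motivation for `do { } while (0)` can be sketched with a small example. The macro and function below are hypothetical and do not come from the zstd source; they only illustrate why a multi-statement macro wrapped this way behaves like a single statement when followed by `;`.

```c
#include <assert.h>

/* Hypothetical macro for illustration only. Wrapped in do { } while (0),
 * the expansion plus the caller's trailing ';' forms exactly one statement. */
#define SWAP_INT(a, b)           \
    do {                         \
        int swap_tmp_ = (a);     \
        (a) = (b);               \
        (b) = swap_tmp_;         \
    } while (0)

/* Orders *lo and *hi so that *lo <= *hi. Returns 1 if a swap happened. */
static int sort_pair(int *lo, int *hi)
{
    assert(lo && hi);
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);  /* with a bare { } macro, this ';' would orphan the else */
    else
        return 0;
    return 1;
}
```

If `SWAP_INT` were defined with plain braces, the `;` after the macro call would end the `if` statement and the following `else` would fail to compile.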
    • Merge pull request #3820 from facebook/xxh082 · 6b3d12fe
      Authored by Yann Collet
      update xxhash library to v0.8.2
  13. Nov 21, 2023
    • [huf] Fix null pointer addition · dd4de1dd
      Authored by Nick Terrell
      `HUF_DecompressFastArgs_init()` was adding 0 to NULL. Fix it by exiting
      early for empty outputs. This does not change behavior, because the
      function was already returning 0 in this case, just slightly later.
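
A minimal sketch of the early-exit pattern described above, assuming a simplified argument struct. The names `demo_args` and `demo_args_init` are hypothetical stand-ins, not the real `HUF_DecompressFastArgs_init()` signature.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the decoder arguments; field names are illustrative. */
typedef struct {
    unsigned char *op;    /* current output position */
    unsigned char *oend;  /* one past the end of the output buffer */
} demo_args;

/* Returns 1 if the fast path may run, 0 if the caller should fall back. */
static int demo_args_init(demo_args *args, void *dst, size_t dstSize)
{
    assert(args != NULL);
    /* Exit before any pointer arithmetic: dst may be NULL when dstSize is 0,
     * and adding an offset (even 0) to a null pointer is undefined in C. */
    if (dstSize == 0)
        return 0;
    args->op = (unsigned char *)dst;
    args->oend = args->op + dstSize;
    return 1;
}
```

The return value is the same as it would be if the size check came after the pointer setup; only the order of the check and the arithmetic differs.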
    • [huf] Improve fast C & ASM performance on small data · 5ab78c04
      Authored by Nick Terrell
      * Rename `ilimit` to `ilowest` and set it equal to `src` instead of
        `src + 6 + 8`. This is safe because the fast decoding loops guarantee
        to never read below `ilowest` already. This allows the fast decoder to
        run for at least two more iterations, because it consumes at most 7
        bytes per iteration.
      * Continue the fast loop all the way until the number of safe iterations
        is 0. Initially, I thought that towards the end of the input, computing
        the number of safe iterations might become expensive. But it ends up
        being slower to have to decode each of the 4 streams individually,
        which makes sense.
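
The effect of lowering the limit can be sketched with the arithmetic from the first bullet. This is a simplified model under stated assumptions: it covers only the input side of a single stream, it assumes the stream is consumed from high addresses down toward `ilowest`, and the helper name `demo_safe_iters` is hypothetical. The real decoder tracks four streams and also has to bound iterations by the remaining output space.

```c
#include <assert.h>
#include <stddef.h>

/* "At most 7 bytes per iteration", as stated in the commit message. */
enum { DEMO_MAX_BYTES_PER_ITER = 7 };

/* Number of fast-loop iterations that cannot read below ilowest,
 * given the current input position ip (with ip >= ilowest). */
static size_t demo_safe_iters(const unsigned char *ip,
                              const unsigned char *ilowest)
{
    assert(ip >= ilowest);
    return (size_t)(ip - ilowest) / DEMO_MAX_BYTES_PER_ITER;
}
```

With `ip = src + 30`, the old limit `src + 6 + 8` leaves 16 bytes of headroom, which allows 2 iterations, while the new limit `src` leaves 30 bytes, which allows 4. The 14 extra bytes account for the "at least two more iterations".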
      
      This drastically speeds up the Huffman decoder on the `github` dataset
      for the issue raised in #3762, measured with `zstd -b1e1r github/`.
      
      | Decoder  | Speed before | Speed after |
      |----------|--------------|-------------|
      | Fallback | 477 MB/s     | 477 MB/s    |
      | Fast C   | 384 MB/s     | 492 MB/s    |
      | Assembly | 385 MB/s     | 501 MB/s    |
      
      We can also look at the speed delta for different block sizes of silesia
      using `zstd -b1e1r silesia.tar -B#`.
      
      | Decoder  | -B1K ∆ | -B2K ∆ | -B4K ∆ | -B8K ∆ | -B16K ∆ | -B32K ∆ | -B64K ∆ | -B128K ∆ |
      |----------|--------|--------|--------|--------|---------|---------|---------|----------|
      | Fast C   | +11.2% | +8.2%  | +6.1%  | +4.4%  | +2.7%   | +1.5%   | +0.6%   | +0.2%    |
      | Assembly | +12.5% | +9.0%  | +6.2%  | +3.6%  | +1.5%   | +0.7%   | +0.2%   | +0.03%   |
    • [huf] Improve fast huffman decoding speed in linux kernel · c7269add
      Authored by Nick Terrell
      gcc in the linux kernel was not unrolling the inner loops of the Huffman
      decoder, which was destroying decoding performance. The compiler was
      generating crazy code with all sorts of branches. I suspect because of
      Spectre mitigations, but I'm not certain. Once the loops were manually
      unrolled, performance was restored.
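
As an illustration of what manual unrolling means here, the sketch below shows one decode step over four streams written as a loop and then written out by hand. It is a toy table lookup with hypothetical names, not the actual Huffman inner loop, and it makes no claim about what gcc generates for either form.

```c
#include <assert.h>

enum { DEMO_STREAMS = 4 };

/* Rolled form: relies on the compiler to unroll the 4-iteration loop. */
static void demo_step_rolled(unsigned char *op[DEMO_STREAMS],
                             const unsigned char *ip[DEMO_STREAMS],
                             const unsigned char *table)
{
    int s;
    assert(op && ip && table);
    for (s = 0; s < DEMO_STREAMS; ++s)
        *op[s]++ = table[*ip[s]++];
}

/* Manually unrolled form: one explicit statement per stream, so the
 * generated code does not depend on the compiler's unrolling heuristics. */
static void demo_step_unrolled(unsigned char *op[DEMO_STREAMS],
                               const unsigned char *ip[DEMO_STREAMS],
                               const unsigned char *table)
{
    assert(op && ip && table);
    *op[0]++ = table[*ip[0]++];
    *op[1]++ = table[*ip[1]++];
    *op[2]++ = table[*ip[2]++];
    *op[3]++ = table[*ip[3]++];
}
```

Both functions produce the same output and advance every pointer by one; the difference is only in how much is left to the optimizer.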
      
      Additionally, when gcc couldn't prove that the variable left shift in
      the 4X2 decode loop wasn't greater than 63, it inserted checks to verify
      it. To fix this, mask `entry.nbBits & 0x3F`, which allows gcc to elide
      this check. This is a no-op, because `entry.nbBits` is guaranteed to be
      less than 64.
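
The masking trick can be sketched as follows. The struct layout and the function name are hypothetical; only the `entry.nbBits & 0x3F` expression comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table entry; the real 4X2 entry layout may differ. */
typedef struct {
    uint8_t nbBits;  /* number of bits consumed by this entry, always < 64 */
    uint8_t symbol;
} demo_entry;

/* Shifts the consumed bits out of a 64-bit container. */
static uint64_t demo_consume(uint64_t bits, demo_entry entry)
{
    assert(entry.nbBits < 64);
    /* The mask never changes the value, since nbBits < 64, but it makes
     * the 0..63 range of the shift count visible to the compiler. */
    return bits << (entry.nbBits & 0x3F);
}
```

Because the shift count is provably in range after the mask, the compiler has no reason to guard against an out-of-range shift.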
      
      Lastly, introduce the `HUF_DISABLE_FAST_DECODE` macro to disable the
      fast C loops for Issue #3762. If a performance regression remains even
      after this change, users can opt out at compile time.
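
A compile-time opt-out of this kind might be wired up roughly as below. Only the `HUF_DISABLE_FAST_DECODE` macro name comes from the commit message; the `DEMO_` names and the selection function are assumptions made for illustration and do not reflect how zstd uses the macro internally.

```c
/* Build with -DHUF_DISABLE_FAST_DECODE to compile out the fast path. */
#ifdef HUF_DISABLE_FAST_DECODE
#  define DEMO_FAST_DECODE_ENABLED 0
#else
#  define DEMO_FAST_DECODE_ENABLED 1
#endif

typedef enum {
    DEMO_DECODER_FALLBACK = 0,
    DEMO_DECODER_FAST = 1
} demo_decoder;

/* Picks a decoder; the fast one is only eligible when it was compiled in. */
static demo_decoder demo_select_decoder(int inputLargeEnough)
{
    if (DEMO_FAST_DECODE_ENABLED && inputLargeEnough)
        return DEMO_DECODER_FAST;
    return DEMO_DECODER_FALLBACK;
}
```

Since the flag is a preprocessor constant, the disabled branch can be removed entirely by the compiler, so opting out costs nothing at run time.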
  14. Nov 18, 2023
  15. Nov 17, 2023
  16. Nov 15, 2023
  17. Nov 14, 2023
  18. Nov 13, 2023
  19. Nov 09, 2023
  20. Nov 08, 2023
  21. Nov 02, 2023
  22. Nov 01, 2023
  23. Oct 31, 2023