Releases: animetosho/ParPar

v0.4.4

10 Dec 10:37
  • Add AVX10/256 kernels for Affine and hashing
  • Add RISC-V Zbc CRC acceleration
  • Add non-Apple version of CLMul SHA3 kernel (mostly useful for Qualcomm chips)
  • Minor performance tweak to MD5 BMI1 kernel and ARM CLMul kernels
  • RISC-V compile fixes
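
For readers unfamiliar with the carry-less multiply (CLMul) instructions that the Zbc and CLMul kernels above rely on, here is a minimal Python sketch of the operation itself, alongside a bit-at-a-time CRC-32 reference. It is illustrative only and is not ParPar's code: the function names `clmul` and `crc32_reference` are hypothetical, and real kernels fold many bytes per instruction rather than looping over bits.

```python
import zlib


def clmul(a: int, b: int) -> int:
    """Carry-less product: multiply a and b as polynomials over GF(2).

    Same shift-and-add shape as ordinary multiplication, but partial
    products are combined with XOR, so no carries propagate.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def crc32_reference(data: bytes) -> int:
    """Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320), as used by zlib."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; subtract (XOR) the polynomial if that bit was set.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    # (x + 1) * (x + 1) = x^2 + 1 over GF(2), i.e. 0b11 * 0b11 = 0b101
    print(bin(clmul(0b11, 0b11)))
    print(crc32_reference(b"123456789") == zlib.crc32(b"123456789"))
```

A CRC is the remainder of a polynomial division over GF(2), which is why hardware carry-less multiplication can replace the inner bit loop.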

v0.4.3

18 Oct 02:24
  • Various fixes
  • Display progress during the initial file scan and enable some concurrency
  • Add support for complex computed slice size options
  • Display short help text by default

v0.4.2

30 Aug 12:45
  • Various fixes
  • Add Apple M1 optimised implementation of the CLMul GF16 kernel
  • Add RISC-V Vector implementation of the Shuffle128 GF16 kernel
  • Add advanced option to specify exact recovery exponents to use (via --recovery-exponents)
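
As background for what a "recovery exponent" means, the sketch below follows the PAR2 specification as I understand it, not ParPar's source. Each recovery block has an exponent e, and input slice i contributes with coefficient c_i^e in GF(2^16), where the c_i are powers of 2 whose exponents are not divisible by 3, 5, 17 or 257. The helper names are hypothetical, and the arithmetic is deliberately naive compared with the optimised GF16 kernels listed in these notes.

```python
from typing import List

PAR2_POLY = 0x1100B  # GF(2^16) reduction polynomial given in the PAR2 spec


def gf16_mul(a: int, b: int) -> int:
    """Multiply two GF(2^16) elements, reducing by PAR2_POLY as we go."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10000:
            a ^= PAR2_POLY
    return result


def gf16_pow(base: int, exponent: int) -> int:
    """Raise a GF(2^16) element to a non-negative power by square-and-multiply."""
    result = 1
    while exponent:
        if exponent & 1:
            result = gf16_mul(result, base)
        base = gf16_mul(base, base)
        exponent >>= 1
    return result


def input_constants(count: int) -> List[int]:
    """First `count` per-slice base constants: 2^n, skipping n divisible by 3, 5, 17, 257."""
    constants = []
    n = 0
    while len(constants) < count:
        n += 1
        if n % 3 and n % 5 and n % 17 and n % 257:
            constants.append(gf16_pow(2, n))
    return constants


def recovery_coefficients(exponent: int, num_slices: int) -> List[int]:
    """Coefficient applied to each input slice for one recovery block."""
    return [gf16_pow(c, exponent) for c in input_constants(num_slices)]


if __name__ == "__main__":
    print(input_constants(7))
    print(recovery_coefficients(1, 3))
```

Under this model, choosing exact exponents amounts to choosing which rows of the coefficient matrix are generated.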

Note: linux-glibc builds are linked to glibc 2.35 (Ubuntu 22.04). linux-static builds don't support OpenCL.

v0.4.1

25 May 21:29

Bug fixes for v0.4.0.

Note: linux-glibc builds are linked to glibc 2.35 (Ubuntu 22.04). linux-static builds don't support OpenCL.

v0.4.0

24 May 11:39

Note: OpenCL support is not possible on static Linux builds.

Release Highlights

  • Overhaul GF16 processing backend
    • Remove GF-Complete components, rewrite framework and improve general region handling
    • Fully separate ISA compilation units to support dynamic dispatch (also enables static Linux builds)
    • Add dot-product, region interleaving, chunk packing and prefetching optimisations
    • Add new calculation kernels: CLMul for NEON, Affine AVX variant for x86 (for Alder Lake and later CPUs) and experimental Shuffle2x/Affine2x variants
    • Add ARM SVE and SVE2 support
    • More optimisations during initialisation for various kernels, coefficient computation, and tweaked loop-tiling parameters
    • Improve transposition performance for Xor-Jit kernel, plus add single-use JIT optimisations
    • Rework multi-threading and remove OpenMP dependency; threading now manually managed via libuv
    • Add experimental OpenCL backend for GPGPU acceleration
      • Disabled by default - must be manually enabled
      • It has been observed to generate incorrect output, particularly on non-Windows hosts; use with caution!
  • Add internal checksumming support to help detect memory errors during GF16 computation
  • Improve concurrency when transferring to/from GF backend and hashing
  • Redo MD5/CRC32 implementation for better optimisation
    • Input hashing now uses a stitched 2xMD5+CRC32 implementation
    • Add ASM MD5 implementation for x64/ARMv6/AArch64 (unsupported in MSVC)
    • Add ARM NEON and SVE2 MD5 implementations
    • Full SIMD width multi-buffer implementations
    • Remove node-yencode dependency
  • Add support for concurrently processing multiple files to work around bottlenecks with single threaded input hashing
  • Support concurrent I/O requests with chunked reading
  • Support for compiling under MSVC/Clang-CL for Windows ARM/64 targets
  • Separate GUI frontend available
  • Improve progress display accuracy
  • Various bug fixes
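
To illustrate the idea behind the combined MD5+CRC32 input hashing mentioned above, here is a rough single-pass sketch using Python's standard `hashlib` and `zlib`. It assumes the two MD5 streams are the whole-file hash and the per-slice hash, and that the final slice is zero-padded before hashing, as I understand the PAR2 convention; neither detail is stated in these notes. The function `hash_slices` is hypothetical. A true "stitched" implementation interleaves the instruction streams of the hashes at the machine level, which Python cannot express; this only shows the data flow of reading each chunk once.

```python
import hashlib
import io
import zlib
from typing import BinaryIO, List, Tuple


def hash_slices(
    stream: BinaryIO, slice_size: int, chunk_size: int = 65536
) -> Tuple[bytes, List[Tuple[bytes, int]]]:
    """Return (file MD5, [(slice MD5, slice CRC32), ...]) from one pass over stream."""
    file_md5 = hashlib.md5()
    slices: List[Tuple[bytes, int]] = []
    while True:
        slice_md5 = hashlib.md5()
        slice_crc = 0
        remaining = slice_size
        while remaining:
            chunk = stream.read(min(chunk_size, remaining))
            if not chunk:
                break  # end of input
            # Every chunk is fed to all three hashes while it is in memory.
            file_md5.update(chunk)
            slice_md5.update(chunk)
            slice_crc = zlib.crc32(chunk, slice_crc)
            remaining -= len(chunk)
        if remaining == slice_size:
            break  # nothing was read for this slice
        if remaining:
            # Assumption: a short final slice is zero-padded for the slice hashes only.
            padding = bytes(remaining)
            slice_md5.update(padding)
            slice_crc = zlib.crc32(padding, slice_crc)
        slices.append((slice_md5.digest(), slice_crc))
        if remaining:
            break  # the short slice was the last one
    return file_md5.digest(), slices


if __name__ == "__main__":
    digest, per_slice = hash_slices(io.BytesIO(b"a" * 10), slice_size=4)
    print(digest.hex(), len(per_slice))
```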

v0.3.2

25 Jan 10:20

This release mostly fixes build issues on ARM platforms.
If you're looking for Windows binaries, use v0.3.1 below as it's the same as v0.3.2.

v0.3.1

22 Jan 13:26
Update test arguments, add tweaks, and support caching par2cmdline results.

v0.3.0

17 May 09:17
Mark v0.3.0

v0.2.1

26 Jan 03:23

Fixes & tweaks from previous version.

v0.2.0

15 Nov 05:41

Most features available, can be considered an alpha release.