From 51d8118b43b15c3bf9941d4d5aef05ee8a9a0535 Mon Sep 17 00:00:00 2001
From: apainintheneck
Date: Sun, 20 Oct 2024 19:51:11 -0700
Subject: [PATCH] Fix encoding bug in tests on Ruby 3

We need to guess the HTML encoding here; otherwise some tests fail.

```
Failures:

  1) Readability images should show one image, but outside of the best candidate
     Failure/Error: @input = @input.gsub(REGEXES[:replaceBrsRe], '

')
     ArgumentError:
       invalid byte sequence in UTF-8
     # ./lib/readability.rb:51:in `gsub'
     # ./lib/readability.rb:51:in `initialize'
     # ./spec/readability_spec.rb:80:in `new'
     # ./spec/readability_spec.rb:80:in `block (3 levels) in '

  2) Readability the cant_read.html fixture should work on the cant_read.html fixture with some allowed tags
     Failure/Error: @input = @input.gsub(REGEXES[:replaceBrsRe], '

')
     ArgumentError:
       invalid byte sequence in UTF-8
     # ./lib/readability.rb:51:in `gsub'
     # ./lib/readability.rb:51:in `initialize'
     # ./spec/readability_spec.rb:555:in `new'
     # ./spec/readability_spec.rb:555:in `block (3 levels) in '
```

Fixes https://github.com/cantino/ruby-readability/issues/87

It also adds the latest Ruby 3 version to CI to test for this sort of bug regularly.
---
 .github/workflows/ruby.yml | 2 +-
 lib/readability.rb         | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/ruby.yml b/.github/workflows/ruby.yml
index 1d00469..16c65f8 100644
--- a/.github/workflows/ruby.yml
+++ b/.github/workflows/ruby.yml
@@ -12,7 +12,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        ruby-version: ['2.7']
+        ruby-version: ['2.7', '3.3']
 
     steps:
     - uses: actions/checkout@v2
diff --git a/lib/readability.rb b/lib/readability.rb
index 4e4309f..2279517 100644
--- a/lib/readability.rb
+++ b/lib/readability.rb
@@ -43,7 +43,7 @@ def initialize(input, options = {})
       @options = DEFAULT_OPTIONS.merge(options)
       @input = input
 
-      if RUBY_VERSION =~ /^(1\.9|2)/ && !@options[:encoding]
+      if RUBY_VERSION =~ /^(1\.9|2|3)/ && !@options[:encoding]
         @input = GuessHtmlEncoding.encode(@input, @options[:html_headers]) unless @options[:do_not_guess_encoding]
         @options[:encoding] = @input.encoding.to_s
       end
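
For reviewers who want to reproduce the failure mode outside the spec suite, here is a minimal sketch in plain Ruby. It does not use `GuessHtmlEncoding`; the `to_utf8` helper, the `br_re` regex, and the ISO-8859-1 assumption are hypothetical stand-ins chosen only for illustration. It shows that `gsub` raises on a string tagged as UTF-8 that contains invalid bytes, that transcoding first avoids the error, and which version strings the widened `RUBY_VERSION` check accepts.

```ruby
# Hypothetical stand-in for the guessing step. The real code calls
# GuessHtmlEncoding.encode; here we simply ASSUME the bytes are ISO-8859-1.
def to_utf8(raw, assumed_source = Encoding::ISO_8859_1)
  raw.dup.force_encoding(assumed_source).encode(Encoding::UTF_8)
end

# Illustrative regex only; not the library's REGEXES[:replaceBrsRe].
br_re = /<br\s*\/?>/i

# 0xE9 is "é" in ISO-8859-1, but a lone 0xE9 byte is not valid UTF-8.
raw = "caf\xE9<br/><br/>ok"

# Without transcoding, regex matching on the broken string raises.
begin
  raw.gsub(br_re, "\n")
  unguarded = :no_error
rescue ArgumentError => e
  unguarded = e.message
end

# With transcoding first, the same substitution succeeds.
fixed = to_utf8(raw).gsub(br_re, "\n")

# The version check from the patch, applied to a few sample version strings.
version_re = /^(1\.9|2|3)/
matches = %w[1.8.7 1.9.3 2.7.8 3.3.5].map { |v| v.match?(version_re) }

puts unguarded
puts fixed.inspect
puts matches.inspect
```

The sketch hard-codes the source encoding, whereas the patched branch relies on `GuessHtmlEncoding.encode` and the optional `:html_headers` to pick it.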