From 51d8118b43b15c3bf9941d4d5aef05ee8a9a0535 Mon Sep 17 00:00:00 2001
From: apainintheneck
Date: Sun, 20 Oct 2024 19:51:11 -0700
Subject: [PATCH] Fix encoding bug in tests on Ruby 3
We need to guess the HTML encoding on Ruby 3 as well; otherwise some tests fail:
```
Failures:
1) Readability images should show one image, but outside of the best candidate
Failure/Error: @input = @input.gsub(REGEXES[:replaceBrsRe], '
')
ArgumentError:
invalid byte sequence in UTF-8
# ./lib/readability.rb:51:in `gsub'
# ./lib/readability.rb:51:in `initialize'
# ./spec/readability_spec.rb:80:in `new'
# ./spec/readability_spec.rb:80:in `block (3 levels) in <top (required)>'
2) Readability the cant_read.html fixture should work on the cant_read.html fixture with some allowed tags
Failure/Error: @input = @input.gsub(REGEXES[:replaceBrsRe], '
')
ArgumentError:
invalid byte sequence in UTF-8
# ./lib/readability.rb:51:in `gsub'
# ./lib/readability.rb:51:in `initialize'
# ./spec/readability_spec.rb:555:in `new'
# ./spec/readability_spec.rb:555:in `block (3 levels) in <top (required)>'
```
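For reviewers who want to see the failure mode in isolation, here is a minimal sketch that does not use the library. It makes several assumptions: the sample bytes are invented, the regex is a simplified stand-in for `REGEXES[:replaceBrsRe]`, and the real encoding is hard-coded as ISO-8859-1 instead of being detected by `GuessHtmlEncoding.encode` as the patched code does.

```ruby
# Invented sample: "caf\xE9" is "café" in ISO-8859-1, but the bytes are tagged
# as UTF-8, which is what happens when the encoding guess is skipped.
raw = "caf\xE9<br/><br/>au lait".b.force_encoding(Encoding::UTF_8)

# Simplified stand-in for REGEXES[:replaceBrsRe] (assumption, not the real regex).
double_br = /<br\s*\/?>\s*<br\s*\/?>/i

# Matching a regex against a string with invalid bytes raises ArgumentError.
error_message = nil
begin
  raw.gsub(double_br, "</p><p>")
rescue ArgumentError => e
  error_message = e.message
end

# Re-tag the bytes with their real encoding and transcode to UTF-8.
# The library delegates this step to GuessHtmlEncoding.encode; the encoding
# is hard-coded here only to keep the sketch dependency-free.
fixed = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
result = fixed.gsub(double_br, "</p><p>")

puts "valid before fix: #{raw.valid_encoding?}"
puts "error: #{error_message}"
puts "valid after fix: #{fixed.valid_encoding?}"
puts result
```

Once the string carries a valid encoding, the same `gsub` call succeeds, which is why running the guess on Ruby 3 clears both failures.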
Fixes https://github.com/cantino/ruby-readability/issues/87
It also adds the latest Ruby 3 release (3.3) to the CI matrix so that this sort of bug is caught regularly.
---
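Note on the version guard: the sketch below shows which version strings the patched pattern `/^(1\.9|2|3)/` accepts. The helper names and the sample version list are hypothetical. The `Gem::Version` comparison at the end is only one possible alternative that would not need updating for each new major version; it is not what this patch does.

```ruby
require "rubygems"

# The pattern from the patched line in lib/readability.rb.
VERSION_PATTERN = /^(1\.9|2|3)/

# Hypothetical helper: true when the guard would run the encoding guess
# for the given Ruby version string.
def guesses_encoding?(version)
  !!(version =~ VERSION_PATTERN)
end

# Hypothetical alternative (an assumption, not part of the patch):
# compare versions numerically instead of matching on the leading digit.
def guesses_encoding_by_comparison?(version)
  Gem::Version.new(version) >= Gem::Version.new("1.9")
end

# Sample version strings chosen for illustration only.
%w[1.8.7 1.9.3 2.7.8 3.3.5 4.0.0].each do |v|
  puts "#{v}: regex=#{guesses_encoding?(v)} comparison=#{guesses_encoding_by_comparison?(v)}"
end
```

The two helpers agree on every sample except `4.0.0`, where the regex no longer matches.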
.github/workflows/ruby.yml | 2 +-
lib/readability.rb | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/.github/workflows/ruby.yml b/.github/workflows/ruby.yml
index 1d00469..16c65f8 100644
--- a/.github/workflows/ruby.yml
+++ b/.github/workflows/ruby.yml
@@ -12,7 +12,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
- ruby-version: ['2.7']
+ ruby-version: ['2.7', '3.3']
steps:
- uses: actions/checkout@v2
diff --git a/lib/readability.rb b/lib/readability.rb
index 4e4309f..2279517 100644
--- a/lib/readability.rb
+++ b/lib/readability.rb
@@ -43,7 +43,7 @@ def initialize(input, options = {})
@options = DEFAULT_OPTIONS.merge(options)
@input = input
- if RUBY_VERSION =~ /^(1\.9|2)/ && !@options[:encoding]
+ if RUBY_VERSION =~ /^(1\.9|2|3)/ && !@options[:encoding]
@input = GuessHtmlEncoding.encode(@input, @options[:html_headers]) unless @options[:do_not_guess_encoding]
@options[:encoding] = @input.encoding.to_s
end