diff --git a/README.md b/README.md
index f41b139b..e5ce0054 100644
--- a/README.md
+++ b/README.md
@@ -198,8 +198,18 @@ My Title,Hello World!
 ```
 
 The files uploaded to your Active Storage service provider will be renamed to
-include an ISO 8601 timestamp and the Task name in snake case format. The CSV is
-expected to have a trailing newline at the end of the file.
+include an ISO 8601 timestamp and the Task name in snake case format.
+
+The implicit `#count` method loads and parses the entire file to determine the
+accurate number of rows. With files with millions of rows, it takes several
+seconds to process. Consider skipping the count (defining a `count` that returns
+`nil`) or use an approximation, eg: count the number of new lines:
+
+```ruby
+def count(task)
+  task.csv_content.count("\n") - 1
+end
+```
 
 #### Batch CSV Tasks
 
diff --git a/app/models/maintenance_tasks/csv_collection_builder.rb b/app/models/maintenance_tasks/csv_collection_builder.rb
index 5f08e1cc..a38d6a75 100644
--- a/app/models/maintenance_tasks/csv_collection_builder.rb
+++ b/app/models/maintenance_tasks/csv_collection_builder.rb
@@ -15,14 +15,13 @@ def collection(task)
       CSV.new(task.csv_content, headers: true)
     end
 
-    # The number of rows to be processed. Excludes the header row from the
-    # count and assumes a trailing newline is at the end of the CSV file.
-    # Note that this number is an approximation based on the number of
-    # newlines.
+    # The number of rows to be processed.
+    # It uses the CSV library for an accurate row count.
+    # Note that the entire file is loaded. It will take several seconds with files with millions of rows.
     #
     # @return [Integer] the approximate number of rows to process.
     def count(task)
-      task.csv_content.count("\n") - 1
+      CSV.new(task.csv_content, headers: true).count
     end
 
     # Return that the Task processes CSV content.
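
Review note (not part of the patch): a minimal sketch of why the newline approximation and the CSV-parsed count can disagree, which is the motivation for this change. The helper names `newline_count` and `parsed_count` and the sample strings are hypothetical and exist only for illustration; the two method bodies mirror the removed and added lines of `count` in the diff.

```ruby
require "csv"

# Hypothetical helper mirroring the removed implementation:
# count newlines, then subtract one for the header row.
def newline_count(csv_content)
  csv_content.count("\n") - 1
end

# Hypothetical helper mirroring the added implementation:
# parse the whole content and count the data rows.
def parsed_count(csv_content)
  CSV.new(csv_content, headers: true).count
end

# Two data rows with a trailing newline: both approaches agree.
simple = "title,body\nA,one\nB,two\n"

# Same rows without a trailing newline: the approximation undercounts by one.
no_trailing_newline = "title,body\nA,one\nB,two"

# A quoted field containing a newline: the approximation overcounts by one.
embedded_newline = "title,body\nA,\"line one\nline two\"\nB,two\n"

[simple, no_trailing_newline, embedded_newline].each do |content|
  puts "newlines: #{newline_count(content)}, parsed: #{parsed_count(content)}"
end
```

The parsed count is 2 for all three inputs, while the newline approximation yields 2, 1, and 3 respectively. The trade-off described in the README text still applies: parsing is exact but has to read the entire content.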