Switch csv and tsv method 'sv' from ReadAll() to stream each record with Read() (#355)

* switch csv and tsv method 'sv' from ReadAll() to stream each record with Read(), to improve memory usage. related to issue #354

* add testcases for one/two line csv/tsv

* Remove some CSV and TSV test files

These files were introduced when changing csv detection from allocating
all results at once to streaming each line. This change improves
performance but the functionality remains exactly the same. Since the
functionality is unchanged, existing test cases should suffice.

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: guoguangwu <[email protected]>
Co-authored-by: robkau <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Gabriel Vasile <[email protected]>
Co-authored-by: guangwu <[email protected]>
5 people authored Oct 10, 2023
1 parent 85b2cdc commit 9df6903
Showing 1 changed file with 14 additions and 2 deletions.
16 changes: 14 additions & 2 deletions internal/magic/text_csv.go
@@ -3,6 +3,7 @@ package magic
 import (
 	"bytes"
 	"encoding/csv"
+	"errors"
 	"io"
 )

@@ -23,8 +24,19 @@ func sv(in []byte, comma rune, limit uint32) bool {
 	r.LazyQuotes = true
 	r.Comment = '#'
 
-	lines, err := r.ReadAll()
-	return err == nil && r.FieldsPerRecord > 1 && len(lines) > 1
+	lines := 0
+	for {
+		_, err := r.Read()
+		if errors.Is(err, io.EOF) {
+			break
+		}
+		if err != nil {
+			return false
+		}
+		lines++
+	}
+
+	return r.FieldsPerRecord > 1 && lines > 1
 }
 
 // dropLastLine drops the last incomplete line from b.

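For readers who want to try the streaming check outside the package, here is a minimal standalone sketch. It is not the library's code: the name looksLikeSV is hypothetical, the limit parameter and the dropLastLine step are left out, and the ReuseRecord setting is an assumption of this sketch rather than something visible in the hunk. The loop body mirrors the added lines: count records one at a time with Read(), stop at io.EOF, and reject the input on any other error.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"errors"
	"fmt"
	"io"
)

// looksLikeSV is a hypothetical stand-in for sv in the diff. It reports
// whether in parses as separated values with more than one field per
// record and more than one record.
func looksLikeSV(in []byte, comma rune) bool {
	r := csv.NewReader(bytes.NewReader(in))
	r.Comma = comma
	r.LazyQuotes = true
	r.Comment = '#'
	// Assumption of this sketch: reuse the record slice between reads,
	// since only the record count is needed, not the field contents.
	r.ReuseRecord = true

	lines := 0
	for {
		_, err := r.Read()
		if errors.Is(err, io.EOF) {
			break // end of input, stop counting
		}
		if err != nil {
			return false // malformed record, e.g. a field count mismatch
		}
		lines++
	}

	// FieldsPerRecord starts at 0, so the reader sets it from the first record.
	return r.FieldsPerRecord > 1 && lines > 1
}

func main() {
	fmt.Println(looksLikeSV([]byte("a,b\n1,2\n"), ','))     // true: two records, two fields
	fmt.Println(looksLikeSV([]byte("a,b\n"), ','))          // false: only one record
	fmt.Println(looksLikeSV([]byte("a\tb\n1\t2\n"), '\t')) // true: TSV with two records
	fmt.Println(looksLikeSV([]byte("a,b\n1,2,3\n"), ','))   // false: field count mismatch
}
```

Because only a counter survives each iteration, the function never holds every parsed record at once, which is the memory saving the commit message describes.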