Demultiplexer scales badly with Hamming distance #103

tp2750 · 2020-06-01T18:06:03Z

Background

I'm using Demultiplexer() to demultiplex nanopore reads.
This works well, but as more errors are allowed in the barcodes, the time needed to construct the demultiplexer grows very quickly.

Current Behavior

Each additional allowed error increases construction time and allocations by roughly a factor of 10 or more.

Desired Behavior

It would be great if construction were faster.

Steps to reproduce

julia> @time Demultiplexer(LongDNASeq.(["GGAGAAGAAGAAGAA"]), n_max_errors=1, distance=:hamming)
  0.000388 seconds (1.56 k allocations: 162.047 KiB)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: hamming
  number of barcodes: 1
  number of correctable errors: 1

julia> @time Demultiplexer(LongDNASeq.(["GGAGAAGAAGAAGAA"]), n_max_errors=2, distance=:hamming)
  0.010063 seconds (50.47 k allocations: 3.590 MiB)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: hamming
  number of barcodes: 1
  number of correctable errors: 2

julia> @time Demultiplexer(LongDNASeq.(["GGAGAAGAAGAAGAA"]), n_max_errors=3, distance=:hamming)
  0.193055 seconds (1.08 M allocations: 58.884 MiB)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: hamming
  number of barcodes: 1
  number of correctable errors: 3

julia> @time Demultiplexer(LongDNASeq.(["GGAGAAGAAGAAGAA"]), n_max_errors=4, distance=:hamming)
  3.394650 seconds (15.94 M allocations: 734.229 MiB, 10.49% gc time)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: hamming
  number of barcodes: 1
  number of correctable errors: 4

julia> @time Demultiplexer(LongDNASeq.(["GGAGAAGAAGAAGAA"]), n_max_errors=5, distance=:hamming)
 39.984839 seconds (169.53 M allocations: 7.118 GiB, 9.05% gc time)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: hamming
  number of barcodes: 1
  number of correctable errors: 5
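
For scale, here is a rough count of how many barcode variants lie within each Hamming distance of a single 15-mer. This is only a sketch under an assumption that has not been checked against the BioSequences source: that the constructor enumerates every sequence within `n_max_errors` of each barcode. `hamming_ball_size` is a hypothetical helper written for this illustration, not part of BioSequences.

```julia
# Number of length-n sequences over a 4-letter alphabet that differ from a
# given sequence in at most d positions: choose k positions to change, and
# each changed position has 3 alternative nucleotides.
# (Hypothetical helper for illustration; not a BioSequences function.)
hamming_ball_size(n, d) = sum(binomial(n, k) * 3^k for k in 0:d)

for d in 1:5
    println("n_max_errors=", d, " => ", hamming_ball_size(15, d), " sequences")
end
# n_max_errors=1 => 46 sequences
# n_max_errors=2 => 991 sequences
# n_max_errors=3 => 13276 sequences
# n_max_errors=4 => 123841 sequences
# n_max_errors=5 => 853570 sequences
```

If that assumption holds, the count grows by about 21x, 13x, 9x and 7x per step, which is in the same range as the timings above. The measured slowdown per step (about 26x, 19x, 18x and 12x) is somewhat larger, so enumeration alone may not explain all of it.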

My Environment

julia> versioninfo()
Julia Version 1.4.0
Commit b8e9a9ecc6 (2020-03-21 16:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, sandybridge)

julia> Pkg.status("BioSequences")
Status `~/.julia/environments/v1.4/Project.toml`
  [7e6ae17a] BioSequences v2.0.1
