Demultiplexer scales badly with Hamming distance #103

Open
tp2750 opened this issue Jun 1, 2020 · 0 comments

tp2750 commented Jun 1, 2020

Background

I'm using Demultiplexer() to demultiplex nanopore reads.
This works well, but the time to construct the demultiplexer grows very quickly as n_max_errors increases.

Current Behavior

Each additional allowed error makes construction more than 10 times more expensive, in both time and allocations.
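
One possible explanation, offered as an assumption and not verified against the BioSequences source: if construction enumerates every sequence within the allowed Hamming distance of each barcode, the work grows with the size of the Hamming ball around the barcode. The sketch below counts those sequences for the 15-nt barcode used in this report. `hamming_ball_size` is a hypothetical helper written for illustration, not part of BioSequences.

```julia
# Hypothetical helper (not a BioSequences function).
# Counts the sequences within Hamming distance d of one barcode of length n
# over an alphabet of q symbols: choose k mismatching positions, and each
# of them can take any of the q - 1 other symbols.
hamming_ball_size(n, d; q = 4) = sum(binomial(n, k) * (q - 1)^k for k in 0:d)

# Barcode length 15 matches "GGAGAAGAAGAAGAA"; q = 4 for DNA.
for d in 1:5
    println("n_max_errors = ", d, ": ", hamming_ball_size(15, d), " sequences")
end
```

Under that assumption the count rises from 46 sequences at one error to 853570 at five, growing by roughly 7 to 22 times per step, which is the same order of magnitude as the measured slowdown.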

Desired Behavior

It would be great if constructing the demultiplexer were faster.

Steps to reproduce

julia> @time Demultiplexer(LongDNASeq.(["GGAGAAGAAGAAGAA"]), n_max_errors=1, distance=:hamming)
  0.000388 seconds (1.56 k allocations: 162.047 KiB)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: hamming
  number of barcodes: 1
  number of correctable errors: 1

julia> @time Demultiplexer(LongDNASeq.(["GGAGAAGAAGAAGAA"]), n_max_errors=2, distance=:hamming)
  0.010063 seconds (50.47 k allocations: 3.590 MiB)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: hamming
  number of barcodes: 1
  number of correctable errors: 2

julia> @time Demultiplexer(LongDNASeq.(["GGAGAAGAAGAAGAA"]), n_max_errors=3, distance=:hamming)
  0.193055 seconds (1.08 M allocations: 58.884 MiB)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: hamming
  number of barcodes: 1
  number of correctable errors: 3

julia> @time Demultiplexer(LongDNASeq.(["GGAGAAGAAGAAGAA"]), n_max_errors=4, distance=:hamming)
  3.394650 seconds (15.94 M allocations: 734.229 MiB, 10.49% gc time)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: hamming
  number of barcodes: 1
  number of correctable errors: 4

julia> @time Demultiplexer(LongDNASeq.(["GGAGAAGAAGAAGAA"]), n_max_errors=5, distance=:hamming)
 39.984839 seconds (169.53 M allocations: 7.118 GiB, 9.05% gc time)
Demultiplexer{LongSequence{DNAAlphabet{4}}}:
  distance: hamming
  number of barcodes: 1
  number of correctable errors: 5
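
To make the growth explicit, the step-to-step factors can be computed from the numbers reported above. This is only a small sketch over the single-run `@time` figures copied from this report, so the factors are approximate.

```julia
# Wall-clock times (seconds) and allocation counts copied from the @time
# output above, for n_max_errors = 1 through 5.
times  = [0.000388, 0.010063, 0.193055, 3.394650, 39.984839]
allocs = [1.56e3, 50.47e3, 1.08e6, 15.94e6, 169.53e6]

# Ratio of each measurement to the previous one.
growth(v) = v[2:end] ./ v[1:end-1]

time_growth  = growth(times)
alloc_growth = growth(allocs)

println("time growth per extra error:       ", round.(time_growth, digits = 1))
println("allocation growth per extra error: ", round.(alloc_growth, digits = 1))
```

Every factor is above 10, for both time and allocations.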

My Environment

julia> versioninfo()
Julia Version 1.4.0
Commit b8e9a9ecc6 (2020-03-21 16:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, sandybridge)

julia> Pkg.status("BioSequences")
Status `~/.julia/environments/v1.4/Project.toml`
  [7e6ae17a] BioSequences v2.0.1