-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
97a6f64
commit 6531cd1
Showing
1 changed file
with
110 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
#+title: [WIP] Comments on GreedyMini | ||
#+filetags: @paper-review minimizers wip | ||
#+OPTIONS: ^:{} num: num: | ||
#+hugo_front_matter_key_replace: author>authors | ||
#+toc: headlines 3 | ||
#+date: <2024-11-04 Mon> | ||
|
||
These are some (biased) comments on [cite/title/b:@greedymini] | ||
[cite:@greedymini], which introduces the =GreedyMini= and =GreedyMini+= | ||
minimizer schemes. | ||
|
||
TODO: Also write some comments on [cite/title/b:@random-mini-density] [cite:@random-mini-density]. | ||
|
||
* Overview | ||
|
||
* Detailed comments | ||
** Terminology | ||
A number of terms in the preprint deviate from established terminology (mostly by the | ||
papers of Zheng and Marçais): | ||
- Usually =particular density= is used for specific sequences, and =density= is | ||
the expected particular density on a random string. Thus, to me, =expected | ||
density= parses as the expected value of the expected particular density, | ||
which is redundant. Prefer just =density=. | ||
- In [cite/t:@random-mini-density], /gamechanger/ was introduced instead of the more | ||
common /charged window/. But now it's /charged window/ again. | ||
- In [cite/t:@random-mini-density], /marked positions/ and /markup of $S$/ are | ||
introduced, but in the current work those are dropped again. | ||
|
||
|
||
|
||
|
||
** Abstract | ||
(I'm in a pedantic mood --- I'm sorry.) | ||
|
||
#+begin_quote | ||
Minimizers is [...] . It is [...] | ||
#+end_quote | ||
~Minimizers~ is plural? | ||
|
||
|
||
#+begin_quote | ||
Minimizers that achieve the smallest number of selected k-mers over a random | ||
DNA sequence, termed expected density, are desired [...]. | ||
#+end_quote | ||
- =(expected) density= is not the /smallest/ number of selected k-mers. | ||
- 'the number of selected k-mers over a random sequence' is missing an | ||
'expected' somewhere. | ||
- (nit) density is not only defined for DNA sequences. | ||
|
||
#+begin_quote | ||
Yet, no method to date exists to generate minimizers that achieve minimum | ||
expected density. | ||
#+end_quote | ||
What does =minimum= mean? Some existing scheme is best. Probably you mean | ||
=optimal= (equal to the lower bound) instead? But then you should qualify it | ||
with =for all $k, w, \sigma$= or so, since the mod-minimizer is near-optimal | ||
when e.g.~$k=w+1$ and $\sigma\to\infty$. | ||
|
||
#+begin_quote | ||
Moreover, existing selection schemes fail to achieve low density for values of | ||
$k$ and $w$ that are most practical for high-throughput sequencing algorithms | ||
and datastructures. | ||
#+end_quote | ||
You seem to focus mostly on $k \sim w$. In this regime, (double) decycling minimizers, | ||
are pretty good? And for $k>w$, the mod-minimizer is pretty good. | ||
(In general, I'd argue all existing schemes are already pretty close to the | ||
density lower bound, and hence that they all achieve 'low density'.) | ||
|
||
|
||
#+begin_quote | ||
Moreover, we present innovative techniques to transform minimizers [...] to | ||
larger $k$ values, [...]. | ||
#+end_quote | ||
This was done before by [cite/t:@asymptotic-optimal-minimizers], | ||
called the /naive extension/ (see 3.1.1). | ||
|
||
|
||
#+begin_quote | ||
practical values of $k$ and $w$ | ||
#+end_quote | ||
Can you be more specific? Since you are hiding exponential running times, what | ||
about e.g.~$(k, w) \sim (22, 42)$? | ||
|
||
Generally, it seems this method cannot go much beyond $k=15$ since it needs | ||
$\sigma^k$ memory? | ||
|
||
|
||
#+begin_quote | ||
both expected and particular densities | ||
#+end_quote | ||
So far you were using =density= to mean what previous work calls =particular | ||
density=, but now you also use =particular density=. Be consistent. | ||
|
||
#+begin_quote | ||
densities much lower compared to existing selection schemes | ||
#+end_quote | ||
Please quantify. | ||
|
||
#+begin_quote | ||
We expect =GreedyMini+= to improve the performance of many high-throughput | ||
sequencing algorithms and data structures | ||
#+end_quote | ||
One drawback of =GreedyMini= seems that it uses memory exponential in $k$. For | ||
$k>11$ or so, the order will likely not fit in cache, and each processed k-mer | ||
requires a read from main memory. Even in the best case, this will limit | ||
(single threaded) throughput to around $10ns$ per kmer, over $10\times$ slower | ||
than my fast minimizer implementation. | ||
|
||
|
||
#+print_bibliography: |