Intronic/'upstream' last exons - expand quantification regions to upstream penultimate exon/ shared region & add decoy transcripts #36

Open
SamBryce-Smith opened this issue Oct 13, 2022 · 1 comment
Labels
enhancement New feature or request high priority This should be worked asap

Comments

@SamBryce-Smith
Member

For gene-body internal last exons, currently only the uniquely exonic region of each last exon is passed to Salmon for quantification. E.g. for a bleedthrough event, the annotated internal exon is subtracted from the last exon, and only the region unique to the last exon is used to quantify it.

image
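To make the current behaviour concrete, here is a minimal sketch of the subtraction step. The function name, the half-open coordinate convention and the example coordinates are all assumptions for illustration, not the pipeline's actual code.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumed convention


def subtract_interval(last_exon: Interval, internal_exon: Interval) -> List[Interval]:
    """Return the parts of last_exon that are not covered by internal_exon."""
    le_start, le_end = last_exon
    ie_start, ie_end = internal_exon

    # No overlap: the whole last exon is already unique
    if ie_end <= le_start or ie_start >= le_end:
        return [last_exon]

    unique = []
    # Portion of the last exon lying before the internal exon
    if le_start < ie_start:
        unique.append((le_start, ie_start))
    # Portion of the last exon lying after the internal exon
    if ie_end < le_end:
        unique.append((ie_end, le_end))
    return unique


# Hypothetical + strand bleedthrough event: the last exon starts at the same
# position as the annotated internal exon and extends past its 3' end.
print(subtract_interval((100, 500), (100, 200)))  # [(200, 500)]
```

Only the 300 nt after the internal exon would be quantified here, which is the shortening described in the drawbacks that follow.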

Whilst simple, it has a few drawbacks:

  • It shortens the region over which reads are counted. This will reduce power to detect DU of short last exons.
  • It doesn't take into account other processing decisions that can happen at that region, e.g. intron retention. All reads aligned to that region will be assigned to the last exon, which in some cases will be erroneous.
  • Salmon requires both reads of a fragment/pair to be compatible with the transcript in order for them to be used for quantification. This means rightmost (+ strand) / leftmost (- strand) mates that align at the start of the last exon will not be considered for quantification, because their mate will not align to the last exon. See the screenshot below: the first few blue alignments are the rightmost in their pairs and just overlap the start of the last exon.

cnpy3_example_2nd_pair_alignments
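A toy illustration of the paired-end problem, assuming a fragment only counts when both mates sit entirely inside the blocks offered for quantification. This is a deliberate simplification and not how Salmon actually scores mappings; the function names and coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumed convention


def mate_in_blocks(mate: Interval, blocks: List[Interval]) -> bool:
    """True if the mate lies entirely within one block (spliced reads ignored)."""
    return any(start <= mate[0] and mate[1] <= end for start, end in blocks)


def fragment_is_compatible(mate1: Interval, mate2: Interval, blocks: List[Interval]) -> bool:
    """Simplified rule: both mates must be contained in the quantified blocks."""
    return mate_in_blocks(mate1, blocks) and mate_in_blocks(mate2, blocks)


# Hypothetical + strand example
upstream_exon = (1000, 1200)
last_exon_unique = (2000, 2400)

# Leftmost mate sits in the upstream exon, rightmost mate at the last exon start
mate1, mate2 = (1100, 1175), (2000, 2075)

# Last exon region only: the upstream mate has nowhere to go
print(fragment_is_compatible(mate1, mate2, [last_exon_unique]))  # False

# Region expanded to include the upstream exon: both mates fit
print(fragment_is_compatible(mate1, mate2, [upstream_exon, last_exon_unique]))  # True
```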

One way to get around this (and to avoid intron retention being mis-assigned to the last exon) is to:

  • Expand the last exons to include the upstream annotated exon (allows more fragments to align to the transcript)
  • Add 'decoy transcripts' that cover all permutations of processing decisions that do not use the last exon. E.g. for the CNPY3 example above, that would be splicing out from the shared last exon to the exon downstream, and retention of the intron
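A rough sketch of what the proposed models could look like for a single spliced, intronic last exon on the + strand. The function, the model names and the coordinates are made up for illustration, and the real set of decoys would depend on the annotation at each locus.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), an assumed convention


def build_transcript_models(
    upstream: Interval, last_exon: Interval, downstream: Interval
) -> Dict[str, List[Interval]]:
    """Build exon block structures for the last exon transcript and two decoys.

    Assumes the + strand and upstream < last_exon < downstream with no overlaps.
    """
    if not (upstream[1] <= last_exon[0] and last_exon[1] <= downstream[0]):
        raise ValueError("expected non-overlapping exons ordered 5' to 3' on the + strand")

    return {
        # Last exon expanded to include the upstream annotated exon
        "last_exon_tx": [upstream, last_exon],
        # Decoy 1: last exon skipped, upstream exon spliced to the downstream exon
        "decoy_spliced": [upstream, downstream],
        # Decoy 2: intron retained, one continuous block across the locus
        "decoy_intron_retention": [(upstream[0], downstream[1])],
    }


models = build_transcript_models((1000, 1200), (2000, 2400), (5000, 5300))
for name, blocks in models.items():
    print(name, blocks)
```

Note that the intron retention decoy overlaps the last exon, so reads falling inside the last exon would be compatible with both models and it would be left to Salmon to apportion them.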
@SamBryce-Smith SamBryce-Smith added enhancement New feature or request high priority This should be worked asap labels Oct 13, 2022
@SamBryce-Smith
Member Author

Take a look at the --recoverOrphans option: if it uses the genome decoys to search upstream of the last exon, it may be able to recover some of these alignments?

Even if so, it would still be useful to implement this feature because of the point about alternative processing decisions, especially intron retention.
