Commit

Merge branch 'dev' into 'master'
release 0.6.7

See merge request tron/bnt_neoants/splice2neo!122
ibn-salem committed Jun 7, 2024
2 parents 2fe810d + 0bdd5be commit 835ca7d
Showing 79 changed files with 2,534 additions and 180 deletions.
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -9,3 +9,5 @@
^pkgdown$
^\.github$
^codecov\.yml$
^doc$
^Meta$
2 changes: 2 additions & 0 deletions .gitignore
@@ -6,3 +6,5 @@ docs
fix_issue_*.R
vignettes/*.html
vignettes/*.R
/doc/
/Meta/
18 changes: 11 additions & 7 deletions .gitlab-ci.yml
@@ -1,25 +1,29 @@
image: bioconductor/bioconductor_docker:RELEASE_3_17
image: bioconductor/bioconductor_docker:RELEASE_3_19

stages:
- check
- deploy

testing:
artifacts:
when: on_failure
paths:
- testing.log
stage: check
script:
- R -e 'install.packages("devtools")'
- R -e 'devtools::install()'
- R -e 'install.packages("devtools", verbose = FALSE, quiet = TRUE)'
- R -e 'devtools::install(quiet=TRUE)'
- R -e 'devtools::check()'

pages:
stage: deploy
dependencies:
- testing
script:
- R -e 'install.packages("devtools")'
- R -e 'devtools::install()'
- R -e 'install.packages("pkgdown")'
- R -e "pkgdown::build_site()"
- R -e 'install.packages("devtools", verbose = FALSE, quiet = TRUE)'
- R -e 'devtools::install(quiet=TRUE)'
- R -e 'install.packages("pkgdown", verbose = FALSE, quiet = TRUE)'
- R -e 'pkgdown::build_site()'
- mkdir -p public
- cp -r docs/* public
artifacts:
35 changes: 22 additions & 13 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: splice2neo
Title: Aberrant splice junction analysis with associated mutations
Version: 0.6.6
Version: 0.6.7
Authors@R:
c(
person(given = "Jonas",
@@ -11,26 +11,36 @@ Authors@R:
person(given = "Franziska",
family = "Lang",
role = c("aut"),
email = "[email protected]")
email = "[email protected]"),
person(given = "Michelle",
family = "Cervante",
role = c("aut")),
person(given = "Johannes",
family = "Hausmann",
role = c("aut")),
person(given = "Patrick",
family = "Sorn",
role = c("aut"))
)
Description: This package provides functions for the analysis of alternative or
aberrant splicing junctions and their creation from or association with
somatic mutations. It integrates the output of several tools which predict
splicing effects from mutation or RNA-seq data into a common splice junction
format based on genomic coordinates. Splice junctions can be annotated with
affected transcript sequences, CDS, and resulting peptide sequences.
Description: This package provides functions for analyzing alternative splicing
and aberrant splice junctions and their creation from or association with
somatic mutations. It integrates the output of several tools that predict
splicing effects from mutation or RNA-seq data into a unified splice junction
format. Splice junctions can be annotated with affected transcript sequences,
CDS, and resulting peptide sequences.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
RoxygenNote: 7.3.1
Suggests:
spelling,
knitr,
testthat (>= 3.0.0),
covr,
ggbio,
rmarkdown
rmarkdown,
BSgenome,
Config/testthat/edition: 3
Language: en-US
Imports:
@@ -41,7 +51,7 @@ Imports:
magrittr,
tibble,
stringr (>= 1.5.0),
dplyr,
dplyr (>= 1.1.1),
readr,
tidyr,
vcfR,
@@ -52,8 +62,7 @@ Imports:
XVector,
rlang,
rtracklayer,
BSgenome.Hsapiens.UCSC.hg19,
BSgenome
BSgenome.Hsapiens.UCSC.hg19
URL: https://github.com/TRON-Bioinformatics/splice2neo
Depends:
R (>= 2.10)
18 changes: 18 additions & 0 deletions NAMESPACE
@@ -13,8 +13,11 @@ export(canonical_junctions)
export(choose_tx)
export(combine_mut_junc)
export(exon_in_intron)
export(filter_irfinder_txt)
export(format_cispliceai_thresh)
export(format_pangolin)
export(format_spliceai)
export(format_spliceai_thresh)
export(generate_combined_dataset)
export(generate_junction_id)
export(get_exon_inclusion_junction)
@@ -23,27 +26,42 @@ export(get_intronretention_alt_pos)
export(get_intronretention_genomic_alt_pos)
export(get_junc_pos)
export(import_leafcutter_counts)
export(import_regtools_junc)
export(import_spladder)
export(import_stringtie_gtf)
export(is_canonical)
export(is_in_rnaseq)
export(junc2breakpoint)
export(junc_to_gr)
export(leafcutter_transform)
export(map_requant)
export(modify_tx)
export(parse_cispliceai_thresh)
export(parse_gtf)
export(parse_irfinder_txt)
export(parse_mmsplice)
export(parse_pangolin)
export(parse_spliceai)
export(parse_spliceai_thresh)
export(parse_star_sj)
export(read_requant)
export(read_suppa_ioe)
export(regtools_transform)
export(seq_truncate_nonstop)
export(spladder_transform)
export(spladder_transform_format)
export(stringtie_transform)
export(stringtie_transform_format)
export(suppa_import)
export(suppa_transform)
export(suppa_transform_format)
export(transform_for_requant)
export(transform_leafcutter_counts)
export(transform_regtools_junc)
export(unique_junc_mmsplice)
export(unique_mut_junc)
import(dplyr)
import(purrr)
import(readr)
import(stringr)
import(tidyr)
36 changes: 34 additions & 2 deletions NEWS.md
@@ -1,3 +1,35 @@
# splice2neo 0.6.7

## Added:

- Functionality from splice2neo_neoants:
- CI-SpliceAI parsing and formatting functions with example data and tests
- IRFinder parsing and filtering functions with example data and tests
- StringTie parsing and formatting functions
- SUPPA2 parsing and formatting functions
- STAR parsing and formatting functions with example data and tests
- RegTools parsing and formatting functions with example data and tests
- add test for empty CI-SpliceAI input file
- add more global variables to avoid notes in R CMD check

## Changed:

- Update CI/CD pipeline for less verbosity during R pkg installations
- Return single-row data.frame for empty input in `annotate_mut_effect()`
- include `number_of_supporting_reads` when parsing SplAdder output
- generalize `generate_combined_dataset()` to multiple inputs
- Update README with list of supported tools
- add option consider_intron_retention to `annotate_mut_effect()`
- adjust syntax for dplyr >= 1.1.1
- remove rarely used pkgs from Imports
- simplify code in parse_spliceai_thresh()
- minor spell check fixes and doc updates

## Fixed:

- fix warnings due to NA's in CI-SpliceAI parsing function
- fix use of `generate_combined_dataset()` in vignette

# splice2neo 0.6.6

* add vignette with example workflow
@@ -68,7 +100,7 @@

# splice2neo 0.5.0

* intron rentetion events are now supported by `add_context_seq()`. the resulting context sequence covers the complete intron instead of the exon/intron boundary only. Instead of a the junction position in the cts_seq, the positions are given in form of an interval in the `cts_junc_pos` column for intron retentions. (0,start_IR, end_IR, end_cts)
* intron retention events are now supported by `add_context_seq()`. the resulting context sequence covers the complete intron instead of the exon/intron boundary only. Instead of a the junction position in the cts_seq, the positions are given in form of an interval in the `cts_junc_pos` column for intron retentions. (0,start_IR, end_IR, end_cts)
* `add_peptide()` was adjusted for intron retentions
* more tests were added for several functions
* fix small bug in `annotate_spliceai_junction()` that led to annotation with same transcript
@@ -83,7 +115,7 @@

# splice2neo 0.4.0

* Leafcutterr: The strand information while leafcutter parsing is now retrieved from the cluster id in the count table.
* Leafcutter: The strand information while leafcutter parsing is now retrieved from the cluster id in the count table.
* Spladder: previous code changes missed to consider the event type in generation of the junc id for alternative splice sites. This is fixed now.
* More tests were added
* An old bug in spladder_transform_mutex_exon was fixed
34 changes: 17 additions & 17 deletions R/add_peptide.R
@@ -106,23 +106,23 @@ add_peptide <- function(df, cds, flanking_size = 14, bsg = NULL, keep_ranges = F
# extract context sequence from full peptide and cut before stop codon (*)
df_positions <- df_sub %>%
dplyr::mutate(
intron_retention = intron_retention,
strand = str_sub(df_sub$junc_id, -1),
protein = protein %>% as.character(),
protein_wt = protein_wt %>% as.character(),
frame_shift = frame_shift,
junc_in_cds = junc_in_cds,
cds_mod_id = stringr::str_c(tx_id, "|", junc_id),
cds_length_difference = cds_length_difference,
junc_pos_cds = junc_pos_cds,
junc_pos_cds_wt = junc_pos_cds_wt,
# Get context peptides around junction
protein_junc_pos = ceiling(junc_pos_cds / 3),
# end for IRs
protein_length_difference = ifelse(!frame_shift &
!intron_retention, cds_length_difference / 3, NA),
protein_len = as.numeric(BiocGenerics::width(protein))
)
intron_retention = intron_retention,
strand = str_sub(df_sub$junc_id, -1),
protein = protein %>% as.character(),
protein_wt = protein_wt %>% as.character(),
frame_shift = frame_shift,
junc_in_cds = junc_in_cds,
cds_mod_id = stringr::str_c(tx_id, "|", junc_id),
cds_length_difference = cds_length_difference,
junc_pos_cds = junc_pos_cds,
junc_pos_cds_wt = junc_pos_cds_wt,
# Get context peptides around junction
protein_junc_pos = ceiling(junc_pos_cds / 3),
# end for IRs
protein_length_difference = ifelse(!frame_shift &
!intron_retention, cds_length_difference / 3, NA),
protein_len = as.numeric(BiocGenerics::width(protein))
)

df_positions <- df_positions %>%
is_first_reading_frame() %>%
61 changes: 54 additions & 7 deletions R/annotate_mut_effect.R
@@ -25,6 +25,9 @@
#' the relevant genomic position are considered.
#' If `gene_mapping` is TRUE, potentially affected transcripts from the gene
#' provided in `effect_df` that cover the relevant genomic positions are considered.
#' @param consider_intron_retention Indicator whether intron retention events
#' should be considered as splicing event types when building the resulting
#' splice junctions.
#'
#' @return A data.frame with additional rows and columns including the
#' splice junction in the column `junc_id`.
@@ -38,7 +41,18 @@
#'
#'
#' @export
annotate_mut_effect <- function(effect_df, transcripts, transcripts_gr, gene_mapping = FALSE){
annotate_mut_effect <- function(effect_df,
transcripts,
transcripts_gr,
gene_mapping = FALSE,
consider_intron_retention = TRUE){

# choose rules based on intron retention consideration
if(consider_intron_retention){
effect_rule_table = effect_to_junction_rules
} else{
effect_rule_table = effect_to_junction_rules_wo_ir
}

effect_df <- effect_df %>%
mutate(
@@ -55,7 +69,8 @@ annotate_mut_effect <- function(effect_df, transcripts, transcripts_gr, gene_map
names(var_gr) <- effect_df$effect_index

message("INFO: calculate coordinates of upstream and downstream exons...")
# get all possible junctions of by start and end coordinates of upsteam and downstream exons

# get all possible junctions by start and end coordinates of upstream and downstream exons
next_junc_df <- next_junctions(var_gr, transcripts, transcripts_gr)

message("INFO: calculate junction coordinates from predicted effect...")
@@ -69,16 +84,34 @@ annotate_mut_effect <- function(effect_df, transcripts, transcripts_gr, gene_map
filter(
effect != "DL" | at_end,
effect != "AL" | at_start
) %>%
)

# add rules
# return empty tibble if none of the junctions fulfill the above filters (might be a problem in low mutation burden cases)
if(nrow(junc_df) == 0) {
junc_df <- junc_df %>%
tibble::add_column(
"class"= NA,
"rule_left"= NA,
"rule_right"= NA,
"strand_offset"= NA,
"coord_1"= NA ,
"coord_2"= NA,
"left"= NA,
"right" = NA,
"junc_id"= NA,
"tx_junc_id"= NA)
return(junc_df)
}

# add rules
junc_df <- junc_df %>%
mutate(
effect = as.character(effect),
# pos = as.integer(POS) + pos_rel
) %>%
left_join(
effect_to_junction_rules,
by = c("effect")
effect_rule_table,
by = c("effect"),
relationship = "many-to-many"
) %>%

# apply rules
@@ -250,6 +283,7 @@ next_junctions <- function(var_gr, transcripts, transcripts_gr){
}

#' Rules on how a splicing affecting variant creates a junction
#' @keywords internal
effect_to_junction_rules <- tribble(
~effect, ~event_type, ~rule_left, ~rule_right,
"DL", "intron retention", "pos", "pos + strand_offset",
@@ -262,3 +296,16 @@ effect_to_junction_rules <- tribble(

"AG", "alternative 3prim", "upstream_end", "pos",
)


#' Rules on how a splicing affecting variant creates a junction without intron
#' retention
#' @keywords internal
effect_to_junction_rules_wo_ir <- tribble(
~effect, ~event_type, ~rule_left, ~rule_right,
"DL", "exon skipping", "upstream_end", "downstream_start",
"DG", "alternative 5prim", "pos", "downstream_start",
"AL", "exon skipping", "upstream_end", "downstream_start",
"AG", "alternative 3prim", "upstream_end", "pos",
)
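
The effect of `consider_intron_retention` on the rule join can be sketched with toy data. This is a minimal illustration, not the package's implementation: `rules_wo_ir` copies the four rows of `effect_to_junction_rules_wo_ir` from this diff, while `rules_with_ir` is an assumed stand-in that adds only the one intron-retention row visible above (the full `effect_to_junction_rules` table is not shown in this hunk). `pick_rules()` and the `effects` tibble are hypothetical names.

```r
library(dplyr)
library(tibble)

# Rules without intron retention, copied from the diff above.
rules_wo_ir <- tribble(
  ~effect, ~event_type,         ~rule_left,     ~rule_right,
  "DL",    "exon skipping",     "upstream_end", "downstream_start",
  "DG",    "alternative 5prim", "pos",          "downstream_start",
  "AL",    "exon skipping",     "upstream_end", "downstream_start",
  "AG",    "alternative 3prim", "upstream_end", "pos"
)

# ASSUMPTION: stand-in for the full rule table. Only the DL
# intron-retention row is visible in the diff; other rows are elided there.
rules_with_ir <- bind_rows(
  tribble(
    ~effect, ~event_type,        ~rule_left, ~rule_right,
    "DL",    "intron retention", "pos",      "pos + strand_offset"
  ),
  rules_wo_ir
)

# Hypothetical helper mirroring the if/else at the top of annotate_mut_effect().
pick_rules <- function(consider_intron_retention = TRUE) {
  if (consider_intron_retention) rules_with_ir else rules_wo_ir
}

# Toy predicted effects (made-up mutation ids).
effects <- tibble(mut_id = c("m1", "m2"), effect = c("DL", "AG"))

# One effect may match several rules, so the relationship is declared
# explicitly, as in the diff.
with_ir <- left_join(effects, pick_rules(TRUE),
                     by = "effect", relationship = "many-to-many")
without_ir <- left_join(effects, pick_rules(FALSE),
                        by = "effect", relationship = "many-to-many")
```

In this toy setup the `DL` effect matches two rules when intron retention is considered and one rule when it is not, so `with_ir` has one more row than `without_ir`. The `relationship` argument needs dplyr 1.1.0 or later, which is consistent with the `dplyr (>= 1.1.1)` requirement in DESCRIPTION.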

2 changes: 1 addition & 1 deletion R/choose_tx.R
@@ -16,7 +16,7 @@
#' This function selects transcripts that are more likely to be affected to reduce the amount of junction and transcript combinations.
#' The function excludes transcripts for which both junction positions are located in an intron. Junctions in a given transcript must either represent an
#' exon skipping, intron retention, exitron, or alternative splice site event or have both junction positions in an exon. Other junction-transcript combinations are also excluded.
#' This function may loose relevant or keep irrelevant junction-transcripts in particular in regions with mutliple isoforms with distinct splicing pattern.
#' This function may loose relevant or keep irrelevant junction-transcripts in particular in regions with multiple isoforms with distinct splicing pattern.
#'
#' @examples
#' junc_df <- tibble::tibble(
4 changes: 3 additions & 1 deletion R/combine_mut_junc.R
@@ -73,7 +73,9 @@ combine_mut_junc <- function(junc_data_list){

for (df in junc_data_list_names){
junc_df <- junc_df %>%
left_join(df, by = c("mut_id", "tx_id", "junc_id"))
left_join(df,
by = c("mut_id", "tx_id", "junc_id"),
relationship = "many-to-many")
}

return(junc_df)
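
The `relationship` argument added here can be illustrated in isolation. This is a minimal sketch, assuming dplyr 1.1.0 or later and toy tables whose names and columns (`toy_junc_df`, `tool_a`, `tool_b`, `src_row`, `score_a`, `score_b`) are made up for illustration. When the join keys are duplicated on both sides, `left_join()` warns about an unexpected many-to-many relationship unless the relationship is declared explicitly.

```r
library(dplyr)
library(tibble)

keys <- c("mut_id", "tx_id", "junc_id")

# Toy tables: the same key combination occurs twice on both sides of the
# first join.
toy_junc_df <- tibble(
  mut_id = c("m1", "m1"), tx_id = c("tx1", "tx1"),
  junc_id = c("j1", "j1"), src_row = 1:2
)
tool_a <- tibble(
  mut_id = c("m1", "m1"), tx_id = c("tx1", "tx1"),
  junc_id = c("j1", "j1"), score_a = c(0.9, 0.4)
)
tool_b <- tibble(mut_id = "m1", tx_id = "tx1", junc_id = "j1", score_b = 0.7)

# Join each tool's table in turn, as the loop in the diff does.
# Declaring the relationship skips dplyr's many-to-many check.
combined <- toy_junc_df
for (df in list(tool_a, tool_b)) {
  combined <- left_join(combined, df, by = keys,
                        relationship = "many-to-many")
}
```

Each duplicated key multiplies rows (two rows times two rows here), so declaring the relationship only documents that the expansion is expected. It does not deduplicate the result.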