bedops_parse_star_junctions

Pipeline for taking STAR's SJ.out files and parsing the counts for a given bed file of named splice junctions

Edit the top lines of the .smk file to run correctly

There are two independent parts to the analysis. The first turns STAR's splice junction tables into bed files, sorts them, and then uses bedtools plus some awk to give a final output file that looks like this (without the header):

chromosome	start	end	filename_this_count_comes_from	count	strand	name_of_junction_in_your_input
chr19	7168094	7170537	Cont-B_S2.SJ.out	49	-	INSR_annotated
chr19	7168094	7170537	Cont-C_S3.SJ.out	30	-	INSR_annotated
chr19	7168094	7170537	Cont-D_S4.SJ.out	35	-	INSR_annotated
chr19	7168094	7170537	control_fluorescent_2.SJ.out	9	-	INSR_annotated
chr19	7168094	7170537	control_fluorescent_3.SJ.out	5	-	INSR_annotated
chr19	7168094	7170537	control_none_1.SJ.out	20	-	INSR_annotated
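If you want to work with this output downstream, a minimal sketch along these lines could load it. It assumes the file is tab-separated with exactly the seven columns shown above and no header row; the function names and column keys are made up for illustration and are not part of the pipeline.

```python
import csv
import io
from collections import defaultdict

# Column names follow the header shown above; the real file has no header row.
COLUMNS = ["chromosome", "start", "end", "filename", "count", "strand", "junction_name"]


def read_junction_counts(handle):
    """Parse the tab-separated, header-less output into a list of dicts."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        row = dict(zip(COLUMNS, fields))
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        row["count"] = int(row["count"])
        rows.append(row)
    return rows


def counts_by_junction(rows):
    """Group counts as {junction_name: {filename: count}}."""
    table = defaultdict(dict)
    for row in rows:
        table[row["junction_name"]][row["filename"]] = row["count"]
    return dict(table)


# Two rows copied from the example output above.
example = io.StringIO(
    "chr19\t7168094\t7170537\tCont-B_S2.SJ.out\t49\t-\tINSR_annotated\n"
    "chr19\t7168094\t7170537\tCont-C_S3.SJ.out\t30\t-\tINSR_annotated\n"
)
print(counts_by_junction(read_junction_counts(example)))
```

For a real run, pass an open file handle for the final output file instead of the `io.StringIO` example.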

To use that part properly, edit parse_star_junctions.smk and tweak the following inputs:

project_dir - the top-level folder where the sorted beds and outputs are going to end up

out_spot - a folder underneath project_dir that will be created; the sorted beds and outputs are going to appear here

bam_spot - the pipeline is fairly lazy and globs wildcards from this folder, so make sure all the samples you want are in the same folder (symlinks are fine!)

bam_suffix - suffix of the bams for pattern matching to work

sj_suffix - suffix of your splice junction tables for pattern matching to work

bed_file - a bed file of junctions you want to compare against

final_output_name - a name for your output file. The final output file will be located at:

{project_dir}/{out_spot}/{final_output_name}.aggregated.clean.annotated.bed

This will contain only the junctions in bed_file, along with the name of each junction and the name of the file it was found in.
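As a rough illustration of how these settings fit together, the sketch below sets each variable to a made-up value, builds the final output path from the pattern above, and shows one way sample names could be recovered by globbing on sj_suffix. All paths and suffix values are hypothetical, and `discover_samples` is an illustrative helper, not a function from the .smk file.

```python
import glob
import os

# Hypothetical example values; replace every one of these with your own.
project_dir = "/path/to/project"
out_spot = "junction_counts"
bam_spot = "/path/to/star_output"
bam_suffix = ".Aligned.sorted.out.bam"
sj_suffix = ".SJ.out.tab"
bed_file = "/path/to/named_junctions.bed"
final_output_name = "my_experiment"

# {project_dir}/{out_spot}/{final_output_name}.aggregated.clean.annotated.bed
final_output = os.path.join(
    project_dir, out_spot, final_output_name + ".aggregated.clean.annotated.bed"
)


def discover_samples(folder, suffix):
    """Illustrative helper: list sample names by stripping `suffix` from matching files."""
    pattern = os.path.join(folder, "*" + suffix)
    return sorted(os.path.basename(path)[: -len(suffix)] for path in glob.glob(pattern))


print(final_output)
print(discover_samples(bam_spot, sj_suffix))
```

Because the samples are found by pattern matching, a file whose name does not end in the exact suffix is silently skipped, so it is worth checking the discovered sample list before running.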

You'll also have a file called {project_dir}/{out_spot}/{final_output_name}.aggregated.bed

This contains all the junctions that overlapped the ones in bed_file. It's useful for checking why junctions you expected weren't present, because you might have an off-by-one error in the coordinates.
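To make the difference between the two files concrete, here is a small sketch of exact-match annotation. It is not the pipeline's bedtools and awk code. It assumes both inputs are tab-separated BED6 lines and that a junction counts as "in bed_file" only when chromosome, start, end, and strand all agree; the function names are hypothetical.

```python
def load_named_junctions(bed_lines):
    """Map (chrom, start, end, strand) -> junction name from BED6 lines."""
    named = {}
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, start, end, name, _score, strand = fields[:6]
        named[(chrom, int(start), int(end), strand)] = name
    return named


def annotate_exact_matches(junction_lines, named):
    """Keep only junctions whose coordinates and strand match a named junction exactly."""
    annotated = []
    for line in junction_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        chrom, start, end, source, count, strand = fields[:6]
        name = named.get((chrom, int(start), int(end), strand))
        if name is not None:
            annotated.append("\t".join([chrom, start, end, source, count, strand, name]))
    return annotated


named = load_named_junctions(["chr19\t7168094\t7170537\tINSR_annotated\t0\t-"])
observed = [
    "chr19\t7168094\t7170537\tCont-B_S2.SJ.out\t49\t-",  # exact match, kept
    "chr19\t7168095\t7170537\tCont-B_S2.SJ.out\t3\t-",   # start shifted by one, dropped
]
for row in annotate_exact_matches(observed, named):
    print(row)
```

The second observed junction overlaps the named one but is shifted by a single base, which is the kind of case that would show up only in the overlap-based file.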

The basic workflow is that the first rule converts each SJ.out.tab to a bed file and sets the 'name' of each entry to the name of the file it came from.

For example, if I input a folder with samples called sample01.SJ.out.tab and sample02.SJ.out.tab, I'll get 2 beds that look like this:

chrY	57208979	57209532	sample01.SJ.out	0	+
chrY	57209059	57209219	sample01.SJ.out	0	+

chrY	57208979	57209532	sample02.SJ.out	0	+
chrY	57209059	57209219	sample02.SJ.out	0	+
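The conversion step can be sketched roughly as below. This is not the repository's splicejunction2bed.py; it is a minimal illustration based on STAR's documented SJ.out.tab columns (chromosome, first intron base, last intron base, strand code, motif, annotation flag, unique reads, multi-mapping reads, max overhang). Two things are assumptions here: that the unique-read count goes in the bed score column, and that the 1-based start is shifted to a 0-based bed start.

```python
import os

# STAR strand codes: 0 = undefined, 1 = +, 2 = -
STRAND_CODES = {"0": ".", "1": "+", "2": "-"}


def sj_line_to_bed(line, sj_path):
    """Convert one SJ.out.tab line into a BED6 line named after its source file."""
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    first_base = int(fields[1])  # 1-based first base of the intron
    last_base = int(fields[2])   # 1-based last base of the intron
    strand = STRAND_CODES[fields[3]]
    unique_reads = fields[6]     # assumption: unique-read count becomes the bed score

    # sample01.SJ.out.tab -> sample01.SJ.out
    name = os.path.splitext(os.path.basename(sj_path))[0]

    # Assumption: bed intervals are 0-based and half-open, so only the start shifts.
    start = first_base - 1
    return "\t".join([chrom, str(start), str(last_base), name, unique_reads, strand])


example_line = "chrY\t57208980\t57209532\t1\t1\t1\t0\t0\t0"
print(sj_line_to_bed(example_line, "/path/to/sample01.SJ.out.tab"))
```

If the coordinates in your bed_file follow a different convention from the one assumed here, exact matches will be missed, so it is worth comparing a known junction by eye.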

The second part uses Dasper to annotate junctions relative to a GTF and convert STAR's splice junction counts to percent spliced in (PSI).

Feel free to try linking the 2 together, but that part is still under heavy development, so buyer beware.
