TEAM Motif_Detectives

Motif Detectives project for the CSHL "Programming for Biology" course, 2022

AUTHORS

Brianda Lopez Aviña / Marleny García Lozano / Bana Abolibdeh / Chrissi Heil / Mina Peyton / Aparna Thomas

Collaboration with TA Cynthia Cardinault (Centro de Investigación Científica de Yucatán)

Description

The purpose of this project is to offer the user a program that identifies a specific motif or consensus sequence (i.e., a transcription factor binding site) in the genome of their organism of interest.

Transcription factors (TFs) are proteins that activate or repress gene expression by binding to consensus sequences located near the start of a gene (the promoter). Locating TF-binding sequences helps identify direct target genes in a genome, which is one of the most challenging problems in molecular biology and bioinformatics.

Files used

INPUTS (links below):

GENOME.fa
GENOME.gff
selected MOTIF

OUTPUT:

FILE.txt

To test the code, we use the C. elegans genome (specifically, chromosome I) and the Retinoic Acid Response Element motif (RARE-DR5).

INPUT FILES:

FASTA FILE Caenorhabditis_elegans.WBcel235.dna.chromosome.I.fa.gz
GFF FILE Caenorhabditis_elegans.WBcel235.108.chromosome.I.gff3.gz
MOTIF regular expression - RARE/DR5 ([A|G]G[G|T]T[C|G]A.....[A|G]G[G|T]T[C|G]A)
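
As a minimal sketch of how the RARE/DR5 pattern above can be applied with Python's `re` module (this is not the code in motif_finder_version2_BA.py): the function name `find_motif_hits`, the demo sequence, and the choice of 1-based inclusive coordinates are assumptions made for illustration. Inside a character class `|` is a literal character, so `[A|G]` and `[AG]` match the same bases on DNA text; the shorter form is used here.

```python
import re

# Same motif as in the README; [AG] is equivalent to [A|G] on DNA text
# because "|" inside a character class is a literal character.
RARE_DR5 = re.compile(r"[AG]G[GT]T[CG]A.{5}[AG]G[GT]T[CG]A")


def find_motif_hits(sequence, pattern=RARE_DR5):
    """Return (start, end, matched_text) tuples.

    Coordinates are 1-based and inclusive (an assumption chosen to line up
    with GFF3 coordinates). Only the given strand is searched, and
    finditer() reports non-overlapping matches only.
    """
    hits = []
    for match in pattern.finditer(sequence.upper()):
        hits.append((match.start() + 1, match.end(), match.group()))
    return hits


# Hypothetical 23-nt sequence: two AGGTCA half-sites separated by 5 nt.
demo = "TTTAGGTCAACGTAAGGTCATTT"
print(find_motif_hits(demo))
```

To also cover the reverse strand, the same function could be run on the reverse complement of the sequence, with the coordinates converted back to the forward strand.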

Steps

Figure 1. Motif finder pipeline

1. FASTA parser: extract data fields from the GENOME.fa.gz file -> included in motif_finder_version2_BA.py. To run this script, download md_fasta_parser.py, the source of the FASTA parser function, into the same directory.
2. Search for the motif sequence in the genome FASTA sequence and extract the motif coordinates (start nucleotide, end nucleotide) -> motif_finder_version2_BA.py (requires md_fasta_parser.py in the same directory, as in Step 1).
STDOUT from Step 2 = motif_hit_out.txt
3. GFF parser: extract the chromosome number, feature (exon, CDS, mRNA, etc.), feature coordinates (start nucleotide, end nucleotide) and description (gene_ID, protein_ID) -> step included in gff3_motif_analyzer.py
4. Determine which genes in the genome have the motif: pair the motif coordinates extracted in Step 2 with the feature coordinates extracted in Step 3 to determine where in the chromosome the motif is located. This returns a list of motifs associated with each gene_ID -> step included in gff3_motif_analyzer.py (STDOUT = mapped_motif_hits.out)
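
Steps 3 and 4 can be sketched as an interval-overlap check between motif hits and GFF3 features. This is an illustrative sketch, not the logic of gff3_motif_analyzer.py: the names `Feature`, `parse_gff3_lines` and `map_hits_to_features`, the demo gene IDs, and the decision to count any overlap with a `gene` feature as a hit are all assumptions. GFF3 coordinates are 1-based and inclusive, so the motif hits are assumed to use the same convention.

```python
from collections import namedtuple

Feature = namedtuple("Feature", "seqid type start end attributes")


def parse_gff3_lines(lines, wanted_types=("gene",)):
    """Parse GFF3 lines and keep features whose type is in wanted_types."""
    features = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in wanted_types:
            continue
        # Column 9 holds key=value pairs separated by ";".
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        features.append(
            Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), attrs)
        )
    return features


def map_hits_to_features(hits, features):
    """Return (hit_start, hit_end, feature_id) for every overlapping pair."""
    mapped = []
    for hit_start, hit_end in hits:
        for feat in features:
            # Two closed intervals overlap when each starts before the other ends.
            if hit_start <= feat.end and hit_end >= feat.start:
                mapped.append((hit_start, hit_end, feat.attributes.get("ID", "NA")))
    return mapped


# Hypothetical GFF3 records and motif hits, for illustration only.
demo_gff = [
    "##gff-version 3\n",
    "I\tWormBase\tgene\t100\t500\t.\t+\t.\tID=gene:demoA;Name=demoA\n",
    "I\tWormBase\tgene\t900\t1200\t.\t-\t.\tID=gene:demoB;Name=demoB\n",
]
demo_hits = [(120, 136), (600, 616), (1195, 1211)]

features = parse_gff3_lines(demo_gff)
print(map_hits_to_features(demo_hits, features))
```

The nested loop is quadratic in the number of hits and features. For a whole chromosome, sorting both lists by start coordinate or using an interval tree would scale better. Because TF-binding sites are expected near promoters, one possible variation is to extend each gene interval upstream by a chosen window before testing for overlap.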

Additional step: Application of our scripts to DEGs from RNA-seq data
Is your motif present in differentially expressed genes (DEGs)?
To answer that question, we proposed using RNA-seq data from two developmental stages of C. elegans, the oocyte and the one-cell-stage embryo, from the paper "Global characterization of the oocyte-to-embryo transition in Caenorhabditis elegans uncovers a novel mRNA clearance mechanism".

INPUTS for DEGs Analysis:

list of DEGs from the database (already prepared by the authors: up_genes_1cell_embryo.txt)
list of motif hits obtained in Step 4 (mapped_motif_hits.out)

This analysis runs with a single script -> SCRIPT NAME: [op_genes_motifs.py](https://github.com/cyntsc/Motif_Detectives/blob/main/op_genes_motifs.py)

OUTPUTS:
- list of DEGs that have the motif in their sequence (STDOUT= [up_genes_match_motif_out.txt](https://github.com/cyntsc/Motif_Detectives/blob/main/up_genes_match_motif_out.txt))
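
Conceptually, this step is a set intersection between the DEG list and the genes that carry a mapped motif hit. The sketch below illustrates the idea and is not the code in op_genes_motifs.py: the function name `degs_with_motif`, the demo gene IDs, and the `(gene_id, start, end)` record layout are assumptions, since the actual file formats are not described here.

```python
def degs_with_motif(deg_ids, mapped_hits):
    """Return the sorted DEG IDs that have at least one mapped motif hit.

    deg_ids: iterable of gene IDs (e.g. one per line of the DEG list).
    mapped_hits: iterable of (gene_id, start, end) records (assumed layout).
    """
    genes_with_hits = {gene_id for gene_id, _start, _end in mapped_hits}
    return sorted(set(deg_ids) & genes_with_hits)


# Hypothetical identifiers, for illustration only.
demo_degs = ["demoA", "demoC", "demoD"]
demo_mapped = [("demoA", 120, 136), ("demoB", 1195, 1211), ("demoA", 300, 316)]

print(degs_with_motif(demo_degs, demo_mapped))
```

Both inputs must use the same identifier scheme (for example, WormBase gene IDs in both files). Otherwise the intersection will be empty even when the motif is present.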

Development of GUI

To make this program a user-friendly platform, we implemented a graphical user interface (GUI) on top of the previous scripts. When the user runs the Python script (md_gui.py), a window pops up where they can type the motif sequence of interest, and the GUI displays the same output as Step 4.

SCRIPT = md_gui.py

