Motif detectives – final project for PFB2022

hitowie/Motif_Detectives

 
 

TEAM Motif_Detectives

Motif Detectives project for the CSHL "Programming for Biology" course, 2022

AUTHORS

Brianda Lopez Aviña / Marleny García Lozano / Bana Abolibdeh / Chrissi Heil / Mina Peyton / Aparna Thomas

Collaboration with TA Cynthia Cardinault (Centro de Investigación Científica de Yucatán)

Description

The purpose of this project is to offer the user a program that identifies a specific motif/consensus sequence, a.k.a. a transcription factor binding site, in the genome of their organism of interest.

Transcription factors (TFs) are proteins that activate or repress gene expression by binding to consensus sequences located at the start of the gene (the promoter). Locating TF-binding sequences helps identify the direct target genes of a TF in a genome, which is one of the most challenging problems in molecular biology and bioinformatics.

Files used

INPUTS (links below):

  • GENOME.fa
  • GENOME.gff
  • selected MOTIF

OUTPUT:

INPUT FILES:

Steps

Figure 1. Motif finder pipeline

MOTIF FINDER PROGRAM

1. FASTA parser: extract the data fields from the GENOME.fa.gz file -> included in motif_finder_version2_BA.py. To run this script, download md_fasta_parser.py, the source of the FASTA parser function, into the same directory.
2. Search for the motif sequence in the genome FASTA sequence and extract the motif coordinates (# start nucleotide, # end nucleotide) -> motif_finder_version2_BA.py (as in Step 1, md_fasta_parser.py must be in the same directory).
STDOUT from Step 2 = motif_hit_out.txt
3. GFF parser: extract the chromosome number, feature (exon, CDS, mRNA, etc.), feature coordinates (# start nucleotide, # end nucleotide) and description (gene_ID, protein_ID) -> step included in gff3_motif_analyzer.py
4. Determine which genes in the genome have the motif: pair the motif coordinates extracted in Step 2 with the feature coordinates extracted in Step 3 to determine where on the chromosome the motif is located. This returns a list of motifs associated with each gene_ID -> step included in gff3_motif_analyzer.py (STDOUT = mapped_motif_hits.out)
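Steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration, not the code in motif_finder_version2_BA.py or gff3_motif_analyzer.py: the function names, the `Feature` tuple, and the toy genome and GFF lines are all hypothetical. It assumes the motif is an exact string (no IUPAC ambiguity codes), searches the forward strand only, and counts a hit as belonging to a feature when the hit lies entirely inside the feature coordinates.

```python
import re
from typing import NamedTuple


class Feature(NamedTuple):
    """One GFF3 row (hypothetical container, not from the project scripts)."""
    seqid: str
    kind: str
    start: int        # 1-based, inclusive (GFF3 convention)
    end: int          # 1-based, inclusive
    attributes: str   # raw column 9, e.g. "ID=gene1"


def find_motif_hits(sequences, motif):
    """Yield (seqid, start, end) for each exact forward-strand match."""
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile(f"(?=({re.escape(motif)}))", re.IGNORECASE)
    for seqid, seq in sequences.items():
        for match in pattern.finditer(seq):
            start = match.start() + 1  # convert 0-based index to 1-based
            yield seqid, start, start + len(motif) - 1


def parse_gff3(lines):
    """Yield a Feature for every 9-column, non-comment GFF3 line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows
        yield Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[8])


def map_hits_to_features(hits, features, kind="gene"):
    """Pair each hit with every feature of the given kind that contains it."""
    wanted = [f for f in features if f.kind == kind]
    mapped = []
    for seqid, start, end in hits:
        for feat in wanted:
            if feat.seqid == seqid and feat.start <= start and end <= feat.end:
                mapped.append((seqid, start, end, feat.attributes))
    return mapped


# Toy data, invented for this example only.
genome = {"chrI": "AAGATAAGGGATAAG"}
gff_lines = [
    "##gff-version 3",
    "chrI\t.\tgene\t1\t10\t.\t+\t.\tID=gene1",
    "chrI\t.\tgene\t12\t15\t.\t+\t.\tID=gene2",
]

hits = list(find_motif_hits(genome, "GATAAG"))
mapped = map_hits_to_features(hits, list(parse_gff3(gff_lines)))
```

The nested loop is quadratic in hits and features, which is fine for a sketch; for a whole genome, grouping features by chromosome and sorting them by start coordinate would likely be needed.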

Additional step: Application of our scripts to DEGs from RNA-seq data
Is your motif present in differentially expressed genes (DEGs)?
To answer that question, we propose using RNA-seq data from two developmental stages of C. elegans, the oocyte and the one-cell-stage embryo, from the paper: Global characterization of the oocyte-to-embryo transition in Caenorhabditis elegans uncovers a novel mRNA clearance mechanism

INPUTS for DEGs Analysis:


This analysis runs with a single script -> SCRIPT NAME: [Op_genes_motifs.py](https://github.com/cyntsc/Motif_Detectives/blob/main/op_genes_motifs.py)

OUTPUTS:
- list of DEGs that have the motif in their sequence (STDOUT= [up_genes_match_motif_out.txt](https://github.com/cyntsc/Motif_Detectives/blob/main/up_genes_match_motif_out.txt))
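
The idea behind this output can be sketched as a simple intersection between the DEG list and the genes that carry a motif hit. The sketch below is illustrative only and is not taken from op_genes_motifs.py: the function name, the tuple layout, and the gene IDs are made-up placeholders.

```python
def genes_with_motif(deg_ids, mapped_hits):
    """Return the DEG IDs, in input order, that have at least one motif hit.

    mapped_hits is assumed to be an iterable of (gene_id, start, end) tuples.
    """
    motif_genes = {gene_id for gene_id, _start, _end in mapped_hits}
    return [gene_id for gene_id in deg_ids if gene_id in motif_genes]


# Placeholder IDs and coordinates, invented for this example.
up_genes = ["geneA", "geneB", "geneC"]
mapped_hits = [("geneB", 120, 125), ("geneC", 40, 45), ("geneC", 300, 305)]

matches = genes_with_motif(up_genes, mapped_hits)
```

Collecting the motif-bearing genes into a set first keeps each membership check constant-time and removes duplicate hits per gene.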

Development of GUI

To present this program as a user-friendly platform, we implemented a graphical user interface (GUI) on top of the previous scripts. Running the GUI script (md_gui.py) opens a window where the user types the motif sequence of interest, and the program displays in the GUI the same output as Step 4.

SCRIPT = md_gui.py
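
A window of this kind could look roughly like the sketch below. It assumes the standard-library tkinter toolkit, which may not be what md_gui.py actually uses, and the names `format_hits`, `launch_gui`, and the `search` callback are hypothetical. The callback is expected to take a motif string and return (chromosome, start, end, gene) tuples.

```python
def format_hits(hits):
    """Render (seqid, start, end, gene) tuples as tab-separated lines."""
    if not hits:
        return "No motif hits found."
    return "\n".join(
        f"{seqid}\t{start}\t{end}\t{gene}" for seqid, start, end, gene in hits
    )


def launch_gui(search):
    """Open a window with a motif entry box, a Search button, and a results pane."""
    import tkinter as tk  # imported here so the module loads without a display

    root = tk.Tk()
    root.title("Motif finder")
    tk.Label(root, text="Motif sequence:").pack()
    entry = tk.Entry(root, width=30)
    entry.pack()
    output = tk.Text(root, width=60, height=15)
    output.pack()

    def on_search():
        # Clear the old results, then show the hits for the typed motif.
        motif = entry.get().strip().upper()
        output.delete("1.0", tk.END)
        output.insert(tk.END, format_hits(search(motif)))

    tk.Button(root, text="Search", command=on_search).pack()
    root.mainloop()


# Usage (opens a window, so it is left commented out here):
# launch_gui(lambda motif: [("chrI", 3, 8, "gene1")])
```

Keeping the formatting in its own function means the displayed text can be checked without opening a window.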

VISUAL_I
