prot_pep_scan

Scan protein sequences for permutations of amino acid subsequences

proteome_consec_pep_scan.py

Finds the protein sequences that contain the longest runs of consecutive sub-peptide matches, where each step in a run matches one of the specified sub-peptide sequences. You can run the script with a command like:

python3 proteome_consec_pep_scan.py /data/p3_ortho/seqs_metazoa/*.fasta.gz -p PEW PLP IRP GGP GPP -n 7 -o eukarya_hits.tsv

Command line help can be accessed via:

$> python3 proteome_consec_pep_scan.py -h

usage

python3 proteome_consec_pep_scan.py [-h] [-p PEP_SEQ [PEP_SEQ ...]] [-n PEP_COUNT] [-g GAP_COUNT] [-o OUT_TSV_FILE] FASTA_FILE [FASTA_FILE ...]
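
An interface of this shape maps naturally onto Python's standard argparse module. The following is a minimal sketch, not the script's actual source: the function name build_parser and the dest names (fasta_paths, pep_seqs, min_consec, max_gaps, out_path) are hypothetical, while the flags and defaults are taken from the option descriptions in this README.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command line (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog='proteome_consec_pep_scan.py',
        description='Scan protein sequences for runs of consecutive sub-peptides')

    # One or more input files; the shell expands wildcards before Python sees them.
    parser.add_argument('fasta_paths', nargs='+', metavar='FASTA_FILE',
                        help='One or more FASTA format sequence file paths')

    # Hypothetical dest names; only the flags themselves come from the README.
    parser.add_argument('-p', dest='pep_seqs', nargs='+', metavar='PEP_SEQ',
                        help='Peptide sub-sequences to search for; X matches any residue')
    parser.add_argument('-n', '--min-num-consec', dest='min_consec', type=int,
                        default=3, metavar='PEP_COUNT',
                        help='Minimum number of consecutive sub-peptide sequences')
    parser.add_argument('-g', '--max-num-gaps', dest='max_gaps', type=int,
                        default=0, metavar='GAP_COUNT',
                        help='Maximum number of internal gaps of sub-peptide length')
    parser.add_argument('-o', '--out-file', dest='out_path', default=None,
                        metavar='OUT_TSV_FILE',
                        help='Optional tab-separated output file')
    return parser


if __name__ == '__main__':
    # Parse an explicit argument list so the sketch runs without real input files.
    demo = build_parser().parse_args(
        ['toy.fasta.gz', '-p', 'PEW', 'PLP', '-n', '7', '-o', 'hits.tsv'])
    print(demo)
```

Because -p takes one or more values, the FASTA paths are easiest to give before -p, as in the example command above.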

positional arguments

FASTA_FILE One or more FASTA format sequence file paths (separated by spaces). Wildcards accepted.
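
The example command passes gzip-compressed files (*.fasta.gz). As a rough illustration of how such input could be read, here is a minimal sketch using only the standard library. The helper name iter_fasta is hypothetical, and choosing the opener from the .gz suffix is an assumption, not necessarily how the script itself reads its input.

```python
import gzip


def iter_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file."""
    # Assumption: compression is detected from the file name suffix alone.
    opener = gzip.open if path.endswith('.gz') else open

    header, chunks = None, []
    with opener(path, 'rt') as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith('>'):
                # A new record starts; emit the previous one, if any.
                if header is not None:
                    yield header, ''.join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                # Sequence lines may be wrapped, so collect and join them.
                chunks.append(line)

    if header is not None:
        yield header, ''.join(chunks)
```

Wildcards on the command line are expanded by the shell, so a loop over the resulting paths calling iter_fasta(path) would visit every record in every file.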

optional arguments

-p PEP_SEQ [PEP_SEQ ...] Peptide amino acid sub-sequences to search for. May include "X" to match any amino acid. Space-separated, without quotes. For example: RKL PGG STQ

-n PEP_COUNT, --min-num-consec PEP_COUNT Minimum number of consecutive sub-peptide sequences. Default: 3

-g GAP_COUNT, --max-num-gaps GAP_COUNT Maximum number of unspecified internal sub-peptides (gaps of sub-peptide length) allowed within a run. Default: 0

-o OUT_TSV_FILE, --out-file OUT_TSV_FILE Optional output file to write results as a tab-separated table
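
To make the -p, -n and -g options concrete, the sketch below shows one plausible reading of the search. It is not the script's implementation (which uses numba), and it rests on labeled assumptions: all sub-peptides have the same length, a run is counted by its matched blocks only, and gaps count only when they sit between two matches. The function names pep_matches and longest_consec_run are hypothetical.

```python
def pep_matches(pattern, segment):
    """True if segment matches pattern; 'X' in the pattern matches any residue."""
    return len(pattern) == len(segment) and all(
        p == 'X' or p == s for p, s in zip(pattern, segment))


def longest_consec_run(seq, peptides, max_gaps=0):
    """Return (start, n_matched) for the longest run of back-to-back matches.

    Assumptions of this sketch: every sub-peptide has the same length k, and
    up to max_gaps non-matching blocks of length k may occur inside a run.
    """
    k = len(peptides[0])
    if any(len(p) != k for p in peptides):
        raise ValueError('this sketch assumes equal-length sub-peptides')

    def is_hit(block):
        return any(pep_matches(p, block) for p in peptides)

    best = (0, 0)
    for start in range(len(seq) - k + 1):
        if not is_hit(seq[start:start + k]):
            continue  # a run must begin on a matching block

        matched = 0   # matching blocks in this run
        gaps = 0      # gap blocks already enclosed by matches
        pending = 0   # gap blocks seen since the last match
        pos = start
        while pos + k <= len(seq):
            if is_hit(seq[pos:pos + k]):
                gaps += pending
                pending = 0
                matched += 1
            else:
                if gaps + pending + 1 > max_gaps:
                    break
                pending += 1
            pos += k

        if matched > best[1]:
            best = (start, matched)
    return best


if __name__ == '__main__':
    peptides = ['PEW', 'PLP', 'IRP', 'GGP', 'GPP']
    toy_seq = 'MKPEWPLPGGPAAAGPPIRPQ'  # made-up sequence for illustration
    print(longest_consec_run(toy_seq, peptides))              # (2, 3)
    print(longest_consec_run(toy_seq, peptides, max_gaps=1))  # (2, 5)
```

Under these assumptions, a sequence would be reported only when n_matched is at least the -n threshold; the toy sequence passes the default of 3 without gaps, and reaches 5 when one gap is allowed.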

For speed, the script uses the numba Python module (https://numba.pydata.org/).

tjs23/prot_pep_scan

Folders and files

Latest commit

History

Repository files navigation

prot_pep_scan

proteome_consec_pep_scan.py

usage

positional arguments

optional arguments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages