Database Mirrors

We maintain local mirrors of several popular bioinformatics databases, which are accessible over the high-performance storage network from any node in the cluster.

All databases can be found at: /mnt/shared/datasets/databases/.

Important

The databases are updated and/or synced from their master copies at 1am on the first Sunday of each month. You may wish to avoid running jobs against them during this time, because files that are in use could change partway through a run.
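
As an illustration only, a job script could check for the sync day before reading from the mirrors. The sketch below assumes a POSIX shell and the standard `date` utility; the function name `is_first_sunday` is hypothetical and not part of any cluster tooling.

```shell
# Hypothetical helper: succeeds (exit status 0) when the given date is the
# first Sunday of its month.
# Arguments: day-of-week (1-7, 7 = Sunday, as printed by `date +%u`)
#            day-of-month (as printed by `date +%d`)
is_first_sunday() {
    dow=$1
    dom=${2#0}   # strip a leading zero, e.g. 07 -> 7
    [ "$dow" -eq 7 ] && [ "$dom" -le 7 ]
}

# Warn when today is the monthly sync day.
if is_first_sunday "$(date +%u)" "$(date +%d)"; then
    echo "Database sync day: consider delaying jobs that read the mirrors."
fi
```

The check covers the whole day rather than the 1am window only, which errs on the side of caution.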

NCBI BLAST

Copies of many NCBI BLAST databases are available at: /mnt/shared/datasets/databases/ncbi/. You can tell the command-line BLAST tools to search this directory by setting an environment variable:

$ export BLASTDB=/mnt/shared/datasets/databases/ncbi/
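
With the variable set, a database can be referred to by name alone. The following is a minimal sketch, assuming the BLAST+ tools are on your PATH; `query.fasta` and `results.tsv` are hypothetical file names used only for illustration.

```shell
# Point the BLAST+ tools at the shared mirror (path taken from this page).
export BLASTDB=/mnt/shared/datasets/databases/ncbi/

# query.fasta (input) and results.tsv (output) are hypothetical file names.
# The search only runs if blastp is installed and the query file exists.
if command -v blastp >/dev/null 2>&1 && [ -f query.fasta ]; then
    # -db nr is resolved through BLASTDB; -outfmt 6 writes tabular output.
    blastp -query query.fasta -db nr -outfmt 6 -out results.tsv
fi
```

Because the database is given as `nr` rather than a full path, the same command keeps working if the mirror location changes and only `BLASTDB` is updated.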

The following databases are currently available:

Source Name Description
NCBI Cdd.* Protein domain database (for RPS-BLAST etc). The Conserved Domain Database (CDD) is compiled by the NCBI from Pfam, SMART, etc.
NCBI cdd_delta.* Protein domain database based on the Conserved Domain Database (CDD), compiled specifically for the DELTA-BLAST tool.
NCBI Cog.* Protein domain database (for RPS-BLAST etc) using sequences classified in the COGs resource, which focuses primarily on prokaryotes.
NCBI Kog.* Protein domain database (for RPS-BLAST etc) using sequences classified in the KOGs resource, the eukaryotic counterpart to COGs, see http://www.ncbi.nlm.nih.gov/COG/new/
NCBI nr.* A collection of protein sequences with entries from GenPept, Swiss-Prot, PDB, PRF, PIR and the NCBI Reference Sequence (RefSeq) project.
NCBI nt.* The nucleotide sequence database contains entries from traditional divisions of GenBank, EMBL and DDBJ. Sequences from bulk divisions, like gss, sts, pat, est and htg, as well as environmental sequences and whole genome shotgun assemblies are excluded.
NCBI pdbaa.* An alias database file marking a subset of the nr database with entries from PDB protein structures. It requires the nr database.
NCBI pdbnt.* An alias database containing nucleotide sequences from PDB structures. It requires the nt database.
NCBI Pfam.* Protein domain database (for RPS-BLAST etc) using the Pfam-A seed alignment database, see http://pfam.sanger.ac.uk/
NCBI Prk.* Protein domain database (for RPS-BLAST etc) using sequences classified as stable clusters in the Protein Clusters database
NCBI Smart.* Protein domain database (for RPS-BLAST etc) using the Smart domain alignment database, see http://smart.embl-heidelberg.de/
NCBI swissprot.* An alias database file marking a subset of the nr database with entries from the Swiss-Prot sequence database (last major update). It requires the nr database.
NCBI Tigr.* Protein domain database (for RPS-BLAST etc) using models from the TIGRFAM database of protein families, see http://www.jcvi.org/cms/research/projects/tigrfams/overview/
NCBI taxdb.* A non-sequence database file containing taxonomic information for sequences in the pre-formatted databases providing common and scientific names for each entry.
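
Each name in the table corresponds to a set of files sharing that prefix (for example `nr.*`). As a rough sketch, a script could confirm that a database is present before starting a long job; the function name `has_blast_db` is hypothetical, and the check only looks for matching file names rather than validating the database itself.

```shell
# Hypothetical helper: succeeds if directory $1 contains at least one file
# whose name starts with "$2." (for example nr.00.phr for the name nr).
has_blast_db() {
    for f in "$1"/"$2".*; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Use BLASTDB if it is set, otherwise fall back to the path given on this page.
db_dir=${BLASTDB:-/mnt/shared/datasets/databases/ncbi/}
if has_blast_db "$db_dir" nr; then
    echo "nr found in $db_dir"
else
    echo "nr not found in $db_dir"
fi
```

Note that the alias databases (pdbaa, pdbnt, swissprot) also need the database they point to, so a fuller check would test for nr or nt as well.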

Pfam

Copies of several popular Pfam databases are available at: /mnt/shared/datasets/databases/pfam-31 and /mnt/shared/datasets/databases/pfam-35 (mirrored from http://ftp.ebi.ac.uk/pub/databases/Pfam/releases).

UniProt

A full copy of UniProt is available at: /mnt/shared/datasets/databases/uniprot (mirrored from ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/).