| | |
| --- | --- |
| Website | https://www.ebi.ac.uk/chembl/ |
| SPARQL endpoint | https://chemblmirror.rdf.bigcat-bioinformatics.org/sparql/ |
| License | CC BY-SA 3.0 |

ChEMBL is a manually curated database of bioactive molecules with drug-like properties [Q27144224,Q27062334]. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs. The European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI) produces an RDF representation of the ChEMBL database and provides it for download. The ChEMBL RDF model uses a basic internal ontology, the ChEMBL Core Ontology (CCO), to identify all of the primary entities (e.g., Documents, Assays, Substances, Targets) in the ChEMBL database.
The Department of Bioinformatics (BiGCaT) at Maastricht University took the initiative to host the RDF and expose it to the scientific community through a SPARQL endpoint where queries can be executed against the RDF to find answers to biological questions. The tool is available through https://chemblmirror.rdf.bigcat-bioinformatics.org/.
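
As a rough illustration of what querying the endpoint looks like from a script, the sketch below sends a query over the standard SPARQL protocol using only the Python standard library. The endpoint URL is the one listed at the top of this page; the helper names `build_request` and `run_query` are hypothetical, and it is an assumption that the server returns JSON results when asked for `application/sparql-results+json`.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as listed at the top of this page.
ENDPOINT = "https://chemblmirror.rdf.bigcat-bioinformatics.org/sparql/"


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query: str, endpoint: str = ENDPOINT) -> list[dict]:
    """Execute the query and return its result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=60) as resp:
        payload = json.load(resp)
    return payload["results"]["bindings"]


# A trivial query that fetches five arbitrary triples.
SMOKE_TEST = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"
```

Calling `run_query(SMOKE_TEST)` should return a list of dictionaries, one per result row, keyed by variable name. The same queries can also be pasted directly into the web interface.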
The main classes are:
- Protein: protein targets
- Metabolite: ligand, e.g. drug / drug-like compound
- Assay: measures some property of, for example, the protein-ligand binding
- Document: source of the data or knowledge
The simplest SPARQL query for exploring RDF retrieves the full list of subjects of a particular type. The type is usually stated with the predicate `rdf:type` or its shorthand `a`, which can be used interchangeably. A type can itself be part of a hierarchy, in which case we can match the subclasses of a given class using the predicate `rdfs:subClassOf`. The example below lists all molecules in the ChEMBL RDF whose type is a subclass of the `cco:Substance` class.
substances
chemblSources
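
A minimal sketch of such a type-based listing is given here as a Python string so it can be passed to any SPARQL client. The `cco:` namespace IRI is assumed to be `http://rdf.ebi.ac.uk/terms/chembl#`, and the function name `substances_query` is hypothetical; both should be checked against the deployed data.

```python
# Namespace IRI assumed for the ChEMBL Core Ontology (CCO).
PREFIXES = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cco:  <http://rdf.ebi.ac.uk/terms/chembl#>
"""


def substances_query(limit: int = 10) -> str:
    """Return a query listing resources whose type is a subclass of cco:Substance."""
    return PREFIXES + f"""\
SELECT ?molecule ?type WHERE {{
  ?molecule a ?type .                    # "a" abbreviates rdf:type
  ?type rdfs:subClassOf cco:Substance .  # keep only subclasses of Substance
}}
LIMIT {int(limit)}
"""


print(substances_query(limit=10))
```

The `LIMIT` clause keeps the result small while exploring; removing it would request every matching molecule, which can be slow on a public endpoint.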
This query counts the assays used to measure the activity of the molecule with ID CHEMBL294873:
countingChEMBLAssays
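
One way such a count could be written is sketched below. It assumes that activities link molecules to assays through `cco:hasMolecule` and `cco:hasAssay`, and that molecule IRIs live under `http://rdf.ebi.ac.uk/resource/chembl/molecule/`. These names are assumptions and the helper `assay_count_query` is hypothetical.

```python
import re

# Namespace IRIs assumed for the CCO terms and for ChEMBL molecule resources.
PREFIXES = """\
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX chembl_molecule: <http://rdf.ebi.ac.uk/resource/chembl/molecule/>
"""


def assay_count_query(molecule_id: str) -> str:
    """Return a query counting the distinct assays recorded for one molecule."""
    # Reject anything that is not a plain ChEMBL identifier before
    # splicing it into the query text.
    if not re.fullmatch(r"CHEMBL\d+", molecule_id):
        raise ValueError(f"not a ChEMBL identifier: {molecule_id!r}")
    return PREFIXES + f"""\
SELECT (COUNT(DISTINCT ?assay) AS ?assayCount) WHERE {{
  ?activity a cco:Activity ;
            cco:hasMolecule chembl_molecule:{molecule_id} ;
            cco:hasAssay ?assay .
}}
"""


print(assay_count_query("CHEMBL294873"))
```

`COUNT(DISTINCT ?assay)` is used so that an assay reporting several activity values for the same molecule is counted once.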
To get all assays, binding affinity types (Kd, Ki, IC50) and affinity values for all compounds targeting the Thrombin protein (CHEMBL204):
bindingAffinities
The above query gets assay and molecule information along with the binding affinity type and value, limited to the first 100 entries.
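
A query of this shape might look like the following sketch. The predicates `cco:hasTarget`, `cco:standardType` and `cco:standardValue`, as well as the `chembl_target:` namespace, are assumptions about the CCO model, and `binding_affinity_query` is a hypothetical helper.

```python
import re

# Namespace IRIs assumed for the CCO terms and for ChEMBL target resources.
PREFIXES = """\
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX chembl_target: <http://rdf.ebi.ac.uk/resource/chembl/target/>
"""


def binding_affinity_query(target_id: str, limit: int = 100) -> str:
    """Return a query for Kd, Ki and IC50 activities measured against one target."""
    if not re.fullmatch(r"CHEMBL\d+", target_id):
        raise ValueError(f"not a ChEMBL identifier: {target_id!r}")
    return PREFIXES + f"""\
SELECT ?assay ?molecule ?type ?value WHERE {{
  ?assay cco:hasTarget chembl_target:{target_id} .
  ?activity a cco:Activity ;
            cco:hasAssay ?assay ;
            cco:hasMolecule ?molecule ;
            cco:standardType ?type ;
            cco:standardValue ?value .
  FILTER (STR(?type) IN ("Kd", "Ki", "IC50"))  # affinity types named in the text
}}
LIMIT {int(limit)}
"""


print(binding_affinity_query("CHEMBL204"))
```

Comparing `STR(?type)` rather than `?type` makes the filter independent of whether the literal carries a datatype or language tag.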
To list all assays, target names and UniProt IDs for the drug Paracetamol (CHEBI:46195):
paracetamol
The above query gets the assay types, targets and UniProt identifiers for all the proteins tested for binding with a single molecule (chembl_molecule:CHEMBL46195):
paracetamol
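
The path from a molecule to its protein targets can be sketched as follows. The molecule identifier is taken verbatim from the text above. The predicates (`cco:assayType`, `cco:hasTargetComponent`, `cco:targetCmptXref`), the class `cco:UniprotRef` and the use of `rdfs:label` for the target name are assumptions about the CCO model, and `molecule_targets_query` is a hypothetical helper.

```python
import re

# Namespace IRIs assumed for the CCO terms and for ChEMBL molecule resources.
PREFIXES = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX chembl_molecule: <http://rdf.ebi.ac.uk/resource/chembl/molecule/>
"""


def molecule_targets_query(molecule_id: str) -> str:
    """Return a query for the assays, target names and UniProt links of one molecule."""
    if not re.fullmatch(r"CHEMBL\d+", molecule_id):
        raise ValueError(f"not a ChEMBL identifier: {molecule_id!r}")
    return PREFIXES + f"""\
SELECT DISTINCT ?assay ?assayType ?targetName ?uniprot WHERE {{
  ?activity a cco:Activity ;
            cco:hasMolecule chembl_molecule:{molecule_id} ;
            cco:hasAssay ?assay .
  ?assay cco:assayType ?assayType ;
         cco:hasTarget ?target .
  ?target rdfs:label ?targetName ;
          cco:hasTargetComponent ?component .
  ?component cco:targetCmptXref ?uniprot .
  ?uniprot a cco:UniprotRef .   # keep only UniProt cross-references
}}
"""


# Identifier as given in the text above.
print(molecule_targets_query("CHEMBL46195"))
```

Each triple pattern follows one hop: activity to assay, assay to target, target to its protein component, and component to the UniProt cross-reference.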