Positive examples of using microscopy image-based chemical screening #12

agitter · 2019-02-27T17:32:05Z

Our latest results in #9 and #7 have given no indication that the cell images are meaningful for predicting chemical effects. There seems to be very little signal in this type of data. We may need to find some positive success stories of how this type of imaging data has been used for chemical screening, drug discovery, etc. to convince ourselves there is a meaningful way to link ChEMBL assays to these images or the Sanger drug sensitivity to these images.

https://www.recursionpharma.com/ works specifically in this area, so reminding ourselves of their successes may be a good place to start.

agitter · 2019-03-10T21:25:30Z

Here is a paper Anne Carpenter shared recently that reports positive results:
Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery
https://doi.org/10.1016/j.chembiol.2018.01.015

agitter · 2019-03-19T18:20:44Z

My notes on Repurposing High-Throughput Image Assays Enables Biological Activity Prediction for Drug Discovery

They use a "three-channel glucocorticoid receptor (GCR) HTI assay" from which they extract 842 CellProfiler features
Normalize with mean and standard deviation of each feature, then compute median of normalized values for all cells in an image to get an image-derived fingerprint
Screen 524,371 proprietary compounds
They do not apply CNNs directly to the images; that is left as future work
Focus on multi-task feed forward neural networks and Bayesian matrix factorization for supervised learning
The NNs are 1-3 layers with 1024-4096 hidden units per layer.
Random forest and k-NN in supplement
Use AUC-ROC to find assays that produce reliable models, keep those for which 3-fold CV gives AUC-ROC > 0.9. This selects < 10% of the 535 assays in their dataset (Table 1).
Start with 1200 assays but require 25 actives and 25 inactives along with other criteria to get 535 assays
Careful design of cross validation folds using ECFP6 fingerprint clustering
Use the matrix factorization model predictions to do in vitro validation for an oncology project and a central nervous system (CNS) project. Not explained why they prioritize those two projects.
Their image-based prioritization gives more chemically diverse compounds than a random selection (Figure 5).
Oncology project screened 342 top-ranked compounds, 124 are hits. A 36.3% hit rate is excellent in my opinion, and much better than the full screen hit rate of 0.725%. They only considered 60k compounds.
They also train a matrix factorization model on the ECFP fingerprints but it is unclear whether they experimentally tested all of these. They look at where the top image-based compounds and hits lie in the ECFP-based ranked list. They claim "This shows that the image fingerprints clearly provided an additional source of information that is not encoded in the chemical fingerprints." but do the chemical fingerprints yield a better hit rate?
For the CNS project they consider all 500k compounds and select those predicted to be highly active and apply a PAINS filter and CNS filter.
Use a prioritization strategy that explicitly promotes chemical diversity. Select 141 compounds and find 36 hits (25.5% hit rate). Much better than initial screen with a 0.088% hit rate.
No discussion of ECFP based predictions in this part. "We leave for future work the head-to-head comparison of chemistry-based and image-based fingerprints... In the case of a well-covered chemical space, we would not expect image-based fingerprints to outperform a well-designed chemical fingerprint like ECFP".
They argue image-based features would be better for scaffold hopping, which is possibly true. Similar arguments have been made when using high-throughput assay activity as the chemical feature (HTS fingerprints).
Image-based fingerprinting is applicable to RNAi, antibodies, and other perturbations that are not small molecules.
Full AUC-ROC values are in Data S1. Their assays look better than what we have because many of them have thousands of compounds screened. We may be able to learn something by plotting their performance in the same style you use to evaluate your performance. Here's a quick look at the NN performance for all assays:

Comments:

We have intentionally avoided AUC-ROC in this domain because it is easy to get a good score even if the classifier or regression model performs poorly.
The idea to focus only on assays that can be predicted well from the images is very interesting. There is no need for the images to be useful for predicting activity on all assays.
Half a million compounds is a huge screen compared to the Cell Painting data
The things we are interested in - CNNs and benchmarking with ECFP fingerprints - are both areas of future work they explicitly call out. Hopefully they are not too far along in testing these things.
The direct pharma involvement shows. They are not descriptive about their compounds or assays and have a very large initial screen to start from. "Due to the proprietary nature of the drug development process, we are unable to disclose specific information related to the chemical compounds and specific protein targets."
Therefore, we could not replicate or build upon this study.
This paper is somewhat encouraging. We may not need to have good performance on all assays. If we spend more time building strong CellProfiler baselines and compare them to the ECFP baseline and something with a CNN, that would address a lot of important future work that they did not cover.
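
A minimal sketch of the AUC-ROC concern raised in the comments above, using synthetic data rather than anything from the paper. The function names and the 10-active, 990-inactive assay are assumptions chosen for illustration. Every active outranks about 95% of the inactives, so AUC-ROC looks strong, yet the top of the ranked list is still dominated by inactives.

```python
def auc_roc(labels, scores):
    """Probability that a random active outranks a random inactive (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each active."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Synthetic, heavily imbalanced assay (hypothetical numbers): 1% actives.
inactive_scores = [i / 1000 for i in range(990)]  # 0.000 ... 0.989
active_scores = [0.9395] * 10                     # 50 inactives score higher

labels = [0] * len(inactive_scores) + [1] * len(active_scores)
scores = inactive_scores + active_scores

auc = auc_roc(labels, scores)           # roughly 0.95
ap = average_precision(labels, scores)  # roughly 0.10

print(f"AUC-ROC = {auc:.3f}, average precision = {ap:.3f}")
```

In this toy case the 50 best-scoring compounds are all inactive, which a threshold such as AUC-ROC > 0.9 would not reveal. Average precision is used here only as one example of a metric that is more sensitive to class imbalance.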

xiaohk · 2019-03-20T12:48:05Z

Some additional comments:

Their image features are based on single cells. "For each compound, we compute a vector of feature medians across all cells in its image, producing a single image-based fingerprint". There are two fields for each well, so there are at least two images for each compound. Not sure if they aggregate across all images, or have multiple feature vectors for one compound.
They used negative control images to z-score normalize features within each plate before aggregating.
Cross-validation should not have random splits, since compounds have correlations. They used a stratified sampling method based on compound clusters.
Compound diversity is greatly valued for screening. They use ECFP fingerprints to compute compound similarities.
As you noted, one ECFP fingerprint model is implemented, but not fully validated.
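
The normalization and aggregation described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the population standard deviation is an arbitrary choice, and cells from all fields of a well are pooled into one list, which the paper may or may not do (see the open question above about multiple images per compound).

```python
from statistics import mean, median, pstdev


def control_stats(control_cells):
    """Per-feature mean and standard deviation over one plate's negative-control cells."""
    columns = list(zip(*control_cells))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def image_fingerprint(compound_cells, mu, sigma):
    """Z-score each cell against the plate controls, then take per-feature medians."""
    normalized = [
        # Constant control features (sigma == 0) are mapped to 0.0 to avoid dividing by zero.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(cell, mu, sigma)]
        for cell in compound_cells
    ]
    return [median(column) for column in zip(*normalized)]


# Toy plate with two features per cell (the real assay has 842).
controls = [[1.0, 10.0], [3.0, 14.0]]
treated = [[4.0, 12.0], [6.0, 16.0], [5.0, 20.0]]

mu, sigma = control_stats(controls)
fingerprint = image_fingerprint(treated, mu, sigma)
print(fingerprint)
```

The median makes the fingerprint robust to a few outlier cells, at the cost of discarding information about subpopulations within a well.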

agitter · 2019-03-20T16:31:59Z

Paper, supplement, and Data S1
1-s2.0-S2451945618300370-mmc3.pdf
1-s2.0-S2451945618300370-mmc2.xlsx

agitter · 2019-03-20T16:38:12Z

I can contact the authors to see if there is any chance they'd be willing to share the data. It is unlikely, but there is nothing to lose. Maybe they could anonymize the assay labels.

agitter · 2019-03-20T16:52:16Z

Their supplement describes the three channels

Hoechst 33258 (Invitrogen H3569, dilution 1/5000) to label the nucleus, CellMask Deep Red (Invitrogen H32721, dissolved in 100 ml DMSO, then diluted 1/4000) to delineate cell boundaries, and an Alexa-568 labeled goat anti-rabbit secondary antibody (Invitrogen A11011, 1/500) to detect the GCR.

For Hoechst, a 405-nm laser was used and a 445/45 bandpass emission filter; for Alexa 568 a 561-nm excitation and a 600/37 filter, and for CellMask Deep Red a 635-nm laser and a 676/29 filter.

Hoechst is for DNA staining
Alexa 568 is used to detect the GCR
CellMask Deep Red is for cell boundaries

We also noted that they are specifically targeting Glucocorticoid Receptor, a single protein target. This may mean that the "hit ratio" of the image-based screen is more like the hit ratio of traditional assays. In addition, it is likely that there is a stronger contrast between the hits and the controls.
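
For context on the hit ratios, the numbers quoted in the notes earlier in this thread can be turned into fold enrichment over the original screens. This is plain arithmetic on the reported values (124 of 342 versus 0.725% for oncology, 36 of 141 versus 0.088% for CNS); the helper names are illustrative.

```python
def hit_rate(hits, screened):
    """Fraction of screened compounds that were confirmed hits."""
    return hits / screened


def fold_enrichment(prioritized_rate, baseline_rate):
    """How many times higher the prioritized hit rate is than the baseline screen."""
    return prioritized_rate / baseline_rate


oncology_rate = hit_rate(124, 342)  # image-based selection, oncology project
cns_rate = hit_rate(36, 141)        # image-based selection, CNS project

oncology_fold = fold_enrichment(oncology_rate, 0.00725)  # baseline 0.725%
cns_fold = fold_enrichment(cns_rate, 0.00088)            # baseline 0.088%

print(f"Oncology: {oncology_rate:.1%} hit rate, {oncology_fold:.0f}x enrichment")
print(f"CNS: {cns_rate:.1%} hit rate, {cns_fold:.0f}x enrichment")
```

These ratios are about 50-fold and 290-fold respectively. They compare against the full-screen baseline only, not against an ECFP-based selection, which the paper leaves as future work.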

xiaohk · 2019-03-20T16:59:12Z

Here are the five channels used in our U2OS cell-painting dataset.

Dye	Alternative	Position
ERSyto	ER	Endoplasmic reticulum
ERSytoBleed	RNA	RNA
Hoechst	DNA	Nucleus
Mito	Mito	Mitochondria
Ph_golgi	AGP	plasma membrane

agitter · 2019-03-23T17:05:22Z

I contacted the last author of this paper to ask about data availability but received an out-of-office response. I can follow up in a week or two.

agitter mentioned this issue Mar 22, 2019

End-to-end learning of pharmacological assays from high-resolution microscopy images #7

Open

xiaohk mentioned this issue Jul 31, 2019

Project summary #14

Open

agitter added the "related work" label (Related manuscript) Aug 26, 2022
