
High-throughput run over all plates (continued) #9

Open
agitter opened this issue Nov 29, 2018 · 23 comments

agitter (Member) commented Nov 29, 2018

Continuing the discussion from #5.

We will select two batches that have two DMSO clusters and a subset of plates to explore normalization options. Three normalization ideas are:

  • Normalize the original images using the mean channel intensities
  • Extract features with the CNN and treat those as a new data matrix. Explore which standard batch effect correction methods, e.g. for gene expression data, may be applicable.
  • Train a new Siamese CNN to learn how to normalize the images by constraining images with the same labels to have the same latent representation. We will use the Carpenter lab approach initially if it is applicable.

For the extracted features, we may also try visualizing the data matrix as a heat map if it is feasible. We would keep the plate ordering and use hierarchical clustering of the features to group similar features.

xiaohk (Member) commented Dec 5, 2018

Heatmap for DMSO images

Because many raw images are corrupted, we only have features extracted from 317 plates. For each plate, I randomly sampled 5 images from random DMSO wells and random fields of view, giving 317 * 5 = 1585 rows.

Then we can visualize this 1585 * 4096 matrix.

heat_map_dmso_5

Heatmap for Non-DMSO images

Following the same procedure, I sampled 1585 non-DMSO images. We can visualize this 1585 * 4096 matrix the same way.

heat_map_no_dmso_5

Comments

  • The original feature matrix has a very large range (from 0 to about 30000), so I linearly scaled the values in each column into [0, 1].

  • Some regions of the matrix are consistently sparse; within each row, many columns share the same value.

  • Note that when we extract features from the raw images, we split the 5 channels into one 3-channel image and one 2-channel image. Each image gives us 2048 features, and we concatenate the two vectors to get 4096 features per 5-channel image. This is why we see repetitive patterns within each row.

  • I added a color bar on the left to indicate which batch each row comes from.

    The mapping is {1 => red, 2 => orange, 3 => yellow, 4 => green, 5 => skyblue, 6 => blue, 7 => purple, 8 => black}.

  • From the color bar, we can see that batch 2 (orange) and batch 3 (yellow) are picked up by the hierarchical clustering, while the others are less obvious.

  • Batches 1, 2, and 3 (red, orange, yellow) tend to be grouped together; perhaps we can merge them into one group? See the new plot below, where batches 1, 2, 3 are red; 4, 6, 8 are blue; and 5, 7 are green. Maybe the plate number does not necessarily follow the imaging order?

heat_map_dmso_5_3batch

Also see our batch grouping in the DMSO distribution.
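The per-column scaling described above can be sketched as follows (a toy matrix stands in for the real 1585 x 4096 one; the zero-range guard is my own addition):

```python
import numpy as np

def minmax_scale_columns(X):
    """Linearly scale each column of X into [0, 1]; columns with zero
    range are mapped to 0 to avoid division by zero."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns become all zeros
    return (X - col_min) / col_range

# Toy matrix standing in for the 1585 x 4096 feature matrix
X_scaled = minmax_scale_columns([[0.0, 10.0], [15000.0, 20.0], [30000.0, 30.0]])
```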

xiaohk (Member) commented Dec 6, 2018

I subsampled features from 5 plates in batch 3 and 5 plates in batch 4 to test batch effect adjustment algorithms. This gave me a (41410, 4096) feature matrix.

Then I used the ComBat() function from the R package sva to remove the batch effect in this feature matrix. sva is a popular package for gene expression/RNA sequencing data, and ComBat() uses an empirical Bayes method to adjust batch effects.

I visualized a (4096, 4096) heat map of the feature matrix. In each heatmap, the top 2048 rows are randomly sampled from batch 3, and the bottom 2048 rows are from batch 4. Both heatmaps use the same sampled rows.

Before adjustment

screen shot 2018-12-05 at 10 25 37 pm

full size image

After adjustment

screen shot 2018-12-05 at 10 25 49 pm

full size image

Comments

  • Each row in the heatmap looks symmetric because we concatenate two 2048-feature vectors (from channels 1-3 and channels 4-5).
  • ComBat() clearly removes the batch effect here. We could also plot the UMAP before/after normalization, but I am fairly confident the two batches would have similar distributions in the 2D space.
  • This normalization greatly modifies the feature values (especially rows from batch 3), and I am not sure it is the right approach. One way to validate it is to check whether it improves our drug response prediction. I think normalizing the original pixels makes more sense, because we can then check whether the images still look reasonable after normalization.
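For intuition, here is a simplified location/scale sketch of what a ComBat-style adjustment does. The real sva::ComBat additionally shrinks the per-batch estimates with an empirical Bayes prior, so this is an illustration, not the method we actually ran:

```python
import numpy as np

def location_scale_adjust(X, batches):
    """Simplified ComBat-style adjustment: for each feature, shift and
    rescale each batch to the pooled mean/std. Real ComBat additionally
    shrinks the per-batch estimates with an empirical Bayes prior."""
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    pooled_mean = X.mean(axis=0)
    pooled_std = X.std(axis=0)
    pooled_std[pooled_std == 0] = 1.0
    for b in np.unique(batches):
        idx = batches == b
        mu = X[idx].mean(axis=0)
        sd = X[idx].std(axis=0)
        sd[sd == 0] = 1.0
        out[idx] = (X[idx] - mu) / sd * pooled_std + pooled_mean
    return out

# Toy data: two batches with different offsets and spreads
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(5, 2, (50, 4))])
batches = np.array([0] * 50 + [1] * 50)
X_adj = location_scale_adjust(X, batches)
```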

Some other ideas for normalization

  • Design an optimization problem that finds a linear transformation mapping off-batch feature vectors so as to minimize the "batch distance". We would use only DMSO images in the optimization. It might be related to multi-response regression.
  • There is a lot of research on this problem; one keyword is "calibration transfer".

agitter (Member, Author) commented Dec 6, 2018

We can try the sva function in the sva package to do an unsupervised analysis on the same data used for the supervised ComBat analysis run above. Here supervised refers to having the batch identities.

We will need a way to evaluate the three current normalization strategies:

  • ComBat (as above) on the full dataset (or very large subsampled dataset)
  • sva
  • Intensity normalization directly on the input images before feature extraction

One approach would be to normalize, run UMAP on the feature matrix, and visualize the DMSO images in the new 2D space from UMAP. If the normalization worked well, we might hope to see a single DMSO cluster now instead of the dual clusters we saw for some batches in #5.

We will also continue thinking about the calibration transfer strategies, possibly including the CNN approach we discussed before.

We could also consider encoding the batch (or plate?) as a feature and using it when predicting drug responses. The batches have clear visual differences, but we defined them ourselves, whereas the plates are direct technical indices.
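Encoding the batch could be as simple as appending a one-hot indicator block to the feature matrix; a minimal sketch:

```python
import numpy as np

def append_batch_onehot(X, batches):
    """Append a one-hot batch indicator to each row of the feature matrix."""
    X = np.asarray(X, dtype=float)
    unique = np.unique(batches)
    onehot = (np.asarray(batches)[:, None] == unique[None, :]).astype(float)
    return np.hstack([X, onehot])

# Toy example: 4 images, 3 features, 3 batches
X = np.zeros((4, 3))
batches = np.array([1, 1, 2, 3])
X_aug = append_batch_onehot(X, batches)
```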

xiaohk (Member) commented Jan 10, 2019

Use UMAP to measure normalization performance

bc2547b adds UMAP plots for parametric ComBat normalization. Surprisingly, the two batches are still separated in the UMAP 2D space after normalization, even though we saw a consistent heat map in #9 (comment).

test

  • After a successful normalization, we would expect to see overlapping DMSO dots from the two batches.
  • Parametric ComBat has some effect, but it does not remove the batch effect.
  • Maybe heat maps overlook the small connections between images within one batch?

Other normalization

sva

I have been trying to use sva to infer batch labels instead of setting them manually. Since sva requires a model matrix and other parameters, it was hard to set up for our feature matrix, and sva is very slow.

Mean Normalization

I found a paper, Removing Batch Effects From Histopathological Images for Enhanced Cancer Diagnosis, which tried 5 different methods to remove batch effects from histopathological images (mean, rank, ratio, parametric ComBat, non-parametric ComBat). It concluded that non-parametric ComBat has the most positive impact on downstream prediction performance. Mean normalization also performs well.

nihms799862f9

Therefore, I applied their mean normalization method to our (41410, 4096) feature matrix.

image

The result is better than the parametric ComBat method:

  • It changes feature values less drastically than parametric ComBat
  • It is interesting that the two clusters have "perpendicular" orientations in the 2D space. I am not sure if that means anything
  • It makes sense that in the 2D space the two batches now have the same mean (same cluster center) and the same standard deviation (same spread)
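Mean normalization, as I understand it from the paper, amounts to batch-wise centering of each feature; a minimal sketch with toy data:

```python
import numpy as np

def mean_normalize(X, batches):
    """Center each feature within each batch (batch-wise mean removal)."""
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    for b in np.unique(batches):
        idx = batches == b
        out[idx] = X[idx] - X[idx].mean(axis=0)
    return out

# Toy data: two batches whose clusters start at different centers
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(3, 1, (30, 4))])
batches = np.array([0] * 30 + [1] * 30)
X_norm = mean_normalize(X, batches)
```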

combatN

Non-parametric ComBat is extremely slow: it ran for 5 hours on Condor for only 10 plates, and it failed to find an adjustment (the result matrix contains only NA values). I will rerun the job.

Other comments

  • I was worried that normalizing the extracted features directly might cause problems, but it seems okay based on the paper.
  • There is little research studying image batch effects.
  • We can look deeper into this and perhaps set up a design like that paper, which has known batches. We could compare network-based normalization against baselines like mean normalization. We can probably get more experimental information from the Broad Institute to create a more "official/correct" batch separation.

agitter (Member, Author) commented Jan 10, 2019

The standard batch effect correction methods are either too slow or unsatisfactory. The network-based normalization could still possibly work but will be hard to implement and execute.

We could also contact the Broad team again to point out what we perceive as a batch effect and ask for suggestions. Instead of going to the authors directly, we can post to the community image analysis forum that supports CellProfiler, ImageJ, etc.

xiaohk (Member) commented Jan 18, 2019

The imaging forum (image.sc) is very helpful. Someone has already asked questions about the Cell Painting data we are using.

Based on the replies, the Carpenter Lab has noticed the batch effects. For the provided CellProfiler profiling features, they applied per-plate normalization using the z-score method.

The Carpenter Lab has published a very nice paper discussing their cell image processing procedures. For batch effect detection, it suggests aggregating the negative-control features for each plate (the median is recommended) and then visualizing the correlation matrix.

For batch effect removal, it recommends using DMSO (the negative control) as the "normalizing population" and then applying z-score normalization (similar to the mean method we have used) within each plate, directly on every feature.

Relative normalization. This procedure consists of computing statistics (for example, median and median absolute deviation) in one population of samples, and then centering and scaling the rest with respect to that population. Ideally, features are normalized across an entire screen in which batch effects are absent; however, normalization within plates is generally performed to correct for batch effects (described in 'Batch-effect correction'). When choosing the normalizing population, we suggest the use of control samples (assuming that they are present in sufficient quantity), because the presence of dramatic phenotypes may confound results. This procedure is good practice regardless of the normalization being performed within plates or across the screen. Alternately, all samples on a plate can be used as the normalizing population when negative controls are unavailable, too few, or unsuitable for some reason, and when samples on each plate are expected to not be enriched in dramatic phenotypes.

We recommend applying normalization across all features. Normalization can be applied even if features are not transformed, and it is preferable to remove biases while simultaneously fixing range issues. z-score normalization is the most commonly used procedure in our laboratories. Normalization also aligns the range of different features, thus decreasing the effects of unbalanced scales when computing similarities (described in 'Measuring profile similarity') or applying analysis algorithms (described in 'Downstream analysis'). It is advisable to compare several transformation and normalization methods, because their performance can vary significantly among assays.

xiaohk (Member) commented Jan 23, 2019

Using the recommended batch effect detection method, we get the following correlation heatmap. I used the element-wise median of the DMSO CNN features (4096) in each plate to compute this 317*317 correlation matrix. The horizontal and vertical axes are the sorted 317 plate IDs.

feature_median_cor_317

  1. We can infer batch effects from the non-uniform distribution of squares.
  2. Depending on how we group the diagonal squares, there can be 6-10 batches. This matches what we saw in the mean intensity plot (~8 batches).
  3. We can also infer some outliers with this method. They match the outliers in our mean intensity plot.
  4. Compared to our previous detection method, this approach is more indirect (based on features instead of image pixels), but it is more lightweight (no need to load all 5 channels) and easier to compute once features are extracted. We can adopt it in future projects.
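A sketch of this detection procedure (toy dimensions stand in for the real 317 plates x 4096 features):

```python
import numpy as np

def plate_correlation(dmso_features_by_plate):
    """Given a dict mapping plate ID -> (n_images, n_features) DMSO feature
    array, compute the element-wise median profile per plate, then the
    plate-by-plate Pearson correlation matrix."""
    plate_ids = sorted(dmso_features_by_plate)
    medians = np.vstack([np.median(dmso_features_by_plate[p], axis=0)
                         for p in plate_ids])
    return plate_ids, np.corrcoef(medians)

# Toy data: 6 plates, 10 DMSO images each, 8 features per image
rng = np.random.default_rng(1)
plates = {p: rng.normal(p // 2, 1, (10, 8)) for p in range(6)}
ids, corr = plate_correlation(plates)
```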

Next step:

  1. Visualize the correlation matrix after normalization.
  2. I have re-downloaded the corrupted images, so we now have the complete 406 plates. I am extracting features from the images (10 jobs have been running for 2 days already).
  3. Use z-score normalization on all extracted features, and assess its performance.

xiaohk (Member) commented Jan 30, 2019

I have tried z-score normalization on all extracted features. The results do not look very good, so I tried some other methods.

One issue I encountered: some features have zero variance in the normalizing population (DMSO) but nonzero variance in the treated images. I discussed this with Shantanu on the image.sc forum. I believe he knows we are working on the Cell Painting data.

My solution: if a feature's DMSO variance is 0, use the variance over all images in that plate instead. If the variance over all images is also 0, stop normalizing that feature (treat mean = 0, std = 1) and report it. This fallback (using a superset to compute the std) applies to all three normalization methods below.

  • DMSO Within-plate Normalization: use in-plate DMSO features to compute the mean and sd for each plate
  • All-Feature Within-plate Normalization: use all in-plate features to compute the mean and sd for each plate
  • DMSO Across-plate Normalization: use all DMSO features across all plates to compute the mean and sd for all plates
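The fallback logic could look like this (a sketch for a single plate, where dmso_mask marks the DMSO rows; all names are mine):

```python
import numpy as np

def zscore_with_fallback(X, dmso_mask):
    """Per-plate z-score normalization using DMSO wells as the normalizing
    population. If a feature's DMSO std is 0, fall back to the std over all
    images in the plate; if that is also 0, leave the feature unnormalized
    (mean = 0, std = 1) and report its index."""
    X = np.asarray(X, dtype=float)
    mean = X[dmso_mask].mean(axis=0)
    std = X[dmso_mask].std(axis=0)
    all_std = X.std(axis=0)
    std = np.where(std == 0, all_std, std)   # fallback: superset std
    skipped = np.flatnonzero(std == 0)       # constant everywhere
    mean[skipped] = 0.0
    std[skipped] = 1.0
    return (X - mean) / std, skipped

# Toy plate: 3 images, 3 features; rows 0-1 are DMSO
X = np.array([[1.0, 5.0, 7.0],
              [2.0, 5.0, 7.0],
              [3.0, 5.0, 9.0]])
dmso_mask = np.array([True, True, False])
Xn, skipped = zscore_with_fallback(X, dmso_mask)
```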
406 Plates Before Normalization
before_correlation
406 Plates After DMSO Within-plate Normalization
after_correlation
406 Plates After All-Feature Within-plate Normalization
after_correlation_group
406 Plates After DMSO Across-plate Normalization
after_correlation

Comments

  1. The heatmap becomes harder to interpret.
  2. I would expect an ideal heatmap to have a light yellow diagonal and random dark off-diagonal entries.
  3. The correlation between different plates drops significantly after all normalization methods, but the "square patches" are still very obvious.
  4. "All-Feature Within-plate Normalization" seems to give the best result (though not perfect), but it is not the method suggested in the paper.

Next Steps

  1. Maybe ask Shantanu whether "All-Feature Within-plate Normalization" makes sense?
  2. Use an alternative visualization (UMAP) to assess batch effects
  3. Use the "All-Feature Within-plate" normalized features to predict assay activity with logistic regression

agitter (Member, Author) commented Jan 30, 2019

These next steps make sense. UMAP has been helpful in visualizing the batch effects in the past. We are comfortable sharing these heatmaps in the image.sc forum if it can help us determine the best normalization.

xiaohk (Member) commented Feb 6, 2019

I asked for suggestions here, but Shantanu didn't reply. I guess these questions are too detailed and not directly related to CellProfiler.

Then I tried to visualize the normalization results using UMAP with some slight visualization changes.

screen shot 2019-02-06 at 1 45 12 am

Workflow

  1. Plot the mean intensity distribution over the sorted plate numbers of all 406 plates
  2. Manually decide batches based on the plot (we now get 10 batches, up from 8)
  3. Compute UMAP 2D values for all four versions of the features (before normalization, after DMSO Within-plate Normalization, after All-Feature Within-plate Normalization, after DMSO Across-plate Normalization)
  4. For each version of the features, collect all DMSO 2D values (grouped by batch)
  5. For each version of the features, randomly sample 5 plates from each batch
  6. For each version of the features, make 10 plots, each showing 40 randomly chosen treated wells from those 5 plates together with the DMSO values from step 4
  7. For each version of the features, organize all plots for the 10 batches into one image so we can compare batch-to-batch similarity
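Steps 4-6 are essentially stratified sampling; a sketch of the plate sampling with a hypothetical plate-to-batch mapping:

```python
import random

# Hypothetical mapping from plate ID to manually assigned batch (1-10)
plate_to_batch = {plate: plate % 10 + 1 for plate in range(406)}

def sample_plates_per_batch(plate_to_batch, n=5, seed=0):
    """Randomly sample n plates from each batch."""
    rng = random.Random(seed)
    by_batch = {}
    for plate, batch in plate_to_batch.items():
        by_batch.setdefault(batch, []).append(plate)
    return {batch: rng.sample(sorted(plates), n)
            for batch, plates in by_batch.items()}

sampled = sample_plates_per_batch(plate_to_batch)
```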
406 Plates Before Normalization
original
406 Plates After DMSO Within-plate Normalization
dmso_within
406 Plates After All-Feature Within-plate Normalization
grouped_all
406 Plates After DMSO Across-plate Normalization
across_dmso

Comments

  1. Both the DMSO Within-plate and All-Feature Within-plate Normalization strategies appear to be working.
  2. The UMAP visualization is more intuitive than the correlation heatmap, but UMAP takes much longer to compute
  3. To make the visualization easier, I subsample both wells (~400 wells) and plates (5 plates). I think the samples are representative.
  4. Ideally, if there were no batch effect, the distribution of DMSO images would be similar across all batches. It is very interesting that the within-batch clusters are also unified after normalization.
  5. Since the Carpenter Lab suggested the DMSO Within-plate Normalization method, we can use this version of the features in subsequent training.

agitter (Member, Author) commented Feb 6, 2019

I agree DMSO Within-plate Normalization looks great. We can proceed with that.

Now that we are happy with the 2D UMAP representations after this normalization, we can spot check individual images to confirm the extracted features and UMAP 2D coordinates make sense with respect to the original images. If we fit a 2D Gaussian to the DMSO images in the 2D space, we can score each treated image by the likelihood it was generated from that Gaussian. Then we can look at 10-100 images with the highest likelihood and 10-100 with the lowest. Perhaps we do this for more than one batch.

One hypothesis is still that the outliers here are those where the chemicals have the greatest effect on the cancer cells. That may be a research question we can explore next. Before trying to systematically validate the outliers, we could look at some that are strong outliers and manually search ChEMBL or other databases to see what evidence exists that we can use to support our predictions.

xiaohk (Member) commented Feb 9, 2019

I was preparing for an admission interview this week. To show the professor I am capable of designing and implementing visual analytics systems, I designed a visualization for flexibly viewing a million UMAP dots.

The link above should be private. He gave us lots of suggestions in terms of this visualization. The plot is still sketchy. I can give you a demo in our meeting, and we can discuss if this plot is helpful.

agitter (Member, Author) commented Feb 10, 2019

Very nice! I'm curious to hear the suggestions. Did you do this directly in D3 or does it use some library?

xiaohk (Member) commented Feb 13, 2019

UMAP Viewer Visualization

Yes, I am using D3.js and HTML SVG directly. It might not be our main interest, but I got many fun suggestions from my interviewer and Michael Gleicher.

Functions and goals

  1. Slide a window of various sizes over control images to detect batch effects
  2. Find outliers in treated images
  3. Provide a more compact and flexible way to visualize UMAP dots than tons of static plots

Limitations

  1. Rendering becomes very slow when the window size gets large.
  2. I have not implemented the outlier detection interaction yet (it could further slow the rendering)
  3. Scatter plot overdraw is an issue

Feedback

  1. Draw rectangles instead of circles to make rendering faster
  2. Use a lower-level framework: D3.js + SVG → D3.js + Canvas → WebGL
  3. Replace the scatter plot with a binned scatter plot (paper) (I didn't know there is so much research in this area)

image

Gaussian Mixture 2D Model

I tried fitting a 2D Gaussian distribution to all normalized DMSO UMAP values so that I can automatically detect outliers among the non-DMSO UMAP points.

If I fit the mixture model with component=1, the model predicts that all non-DMSO dots have 100% probability of falling in the DMSO cluster.

If I fit the mixture model with component=2, it fits two 2D Gaussian distributions within the DMSO cluster (separated by a y threshold). This does not really help us detect outliers.

test

test2

I manually picked outliers from 3 regions: far top (y > 10), far right (x > 10), and the cluster's bottom edge (y < -5), to inspect the raw images. I am fixing a bug in my image inspection code.
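The fitting and outlier scoring can be sketched with scikit-learn's GaussianMixture (synthetic coordinates stand in for the real UMAP values):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
dmso_2d = rng.normal(0, 1, (500, 2))                 # stand-in for DMSO UMAP coords
treated_2d = np.vstack([rng.normal(0, 1, (95, 2)),
                        rng.normal(12, 1, (5, 2))])  # 5 planted outliers

# Fit a single 2D Gaussian to the DMSO cloud
gm = GaussianMixture(n_components=1, random_state=0).fit(dmso_2d)

# Score treated points by log-likelihood under the DMSO model;
# the lowest-scoring points are candidate outliers
log_lik = gm.score_samples(treated_2d)
outliers = np.argsort(log_lik)[:5]
```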

xiaohk (Member) commented Feb 13, 2019

Group Channel 123 Channel 45
y>10 25575_p06_s2_123_1 25575_p06_s2_45_1
y<-5 25931_o19_s6_123_2 25931_o19_s6_45_2
x>10 24732_k02_s3_123_3 24732_k02_s3_45_3

agitter (Member, Author) commented Feb 13, 2019

Even with these outliers, it ends up being difficult for us to manually evaluate whether they are reasonable or not. The x > 10 example could be an effective drug that killed all the cancer cells or a technical artifact with no cells in the viewing frame.

Another direction would be to look for positive controls, known effective drugs. If these are reported in the Gigascience paper, we could find the corresponding 2D coordinates and look at the images.

The Broad Institute and Sanger have projects that screen drugs against many types of cancer cell lines. If they have U2OS data, it could also give us positive controls.

agitter (Member, Author) commented Feb 13, 2019

Based on #7, we have concerns about over-normalization and removing the signal. We may be able to design more direct tests of the normalization. For instance, if we select some training batches and a different test batch, can we predict DMSO versus non-DMSO on the test batch successfully? We could directly compare results for the original features and normalized features. The normalization is within plate, so it should be fair to normalize even the test batch images.
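A sketch of that test with synthetic features and logistic regression (the batch offset and signal strength here are made up; with the real data we would compare original versus normalized features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

def make_batch(batch_offset, n=200):
    """Synthetic images: label 1 (treated) shifts feature 0;
    batch_offset models a batch effect shared by all features."""
    y = rng.integers(0, 2, n)            # 0 = DMSO, 1 = non-DMSO
    X = rng.normal(0, 1, (n, 16))
    X[:, 0] += 2.0 * y                   # treatment signal
    X += batch_offset                    # batch effect
    return X, y

X_train, y_train = make_batch(batch_offset=0.0)
X_test, y_test = make_batch(batch_offset=5.0)        # held-out batch

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc_other_batch = clf.score(X_test, y_test)          # likely degraded by the batch effect
acc_same_batch = clf.score(*make_batch(batch_offset=0.0))
```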

agitter (Member, Author) commented Feb 18, 2019

The Broad Institute and Sanger have projects that screen drugs against many types of cancer cell lines. If they have U2OS data, it could also give us positive controls.

Both of these include U2OS drug screening data and can serve as positive controls of drugs that kill the cancer cells if we need those. We should also check whether the Gigascience paper discusses these screening datasets and any overlapping chemicals.

The Broad data was originally reported in https://www.nature.com/articles/nature11003#supplementary-information. I checked the supplementary Excel file to confirm U2OS is there. It may be more direct to register for their data portal to access the data, though: https://portals.broadinstitute.org/ccle/about

The Sanger drug sensitivity data is https://www.cancerrxgene.org/translation/CellLine/909776 They used z-scores to assess sensitivity and resistance. From the plot, it looks like no drugs passed their -2.0 threshold for sensitivity. We could still potentially learn what the most sensitive versus the most resistant chemicals do to the cells.

xiaohk (Member) commented Feb 27, 2019

e10df9b adds analysis of these positive control compounds.

  • Only 23 compounds were tested on U2OS in the CCLE dataset, and I am not sure how to read drug sensitivity from this dataset
  • The drug names in the Sanger dataset are also more "standard" than in CCLE. I tried both the clue.io API and the PubChem API to convert compound names to SMILES/InChI; PubChem works better for me.
  • 34 compounds are not in PubChem. I can find 18 of them in the Harvard LINCS database; they are all labeled "Small Molecule".
  • 196 compounds were tested on U2OS in the Sanger dataset. If I use canonical SMILES to match them against our compounds, there are 7 overlapping drugs; if I use InChI, we get 11 overlapping compounds.
  • These 11 compounds are distributed across 84 wells in our dataset, 458 images in total.
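Once the identifiers are fetched, the matching itself is set intersection on identifier strings; a sketch with hypothetical identifiers (the real ones come from the PubChem API):

```python
def match_compounds(ours, theirs):
    """Match two {name: identifier} tables on a shared structure identifier
    (canonical SMILES or InChI), skipping entries with no identifier."""
    ours_by_id = {cid: name for name, cid in ours.items() if cid}
    return {name: ours_by_id[cid]
            for name, cid in theirs.items() if cid and cid in ours_by_id}

# Hypothetical identifier strings standing in for canonical InChI
our_compounds = {"compound-A": "InChI-a", "compound-B": "InChI-b"}
sanger_compounds = {"drug-1": "InChI-a", "drug-2": "InChI-z", "drug-3": None}

overlap = match_compounds(our_compounds, sanger_compounds)
```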

Below is their UMAP projection onto the pre-normalization DMSO space:

pre_norm_overlap

Below is their UMAP projection onto the normalized DMSO space:

post_norm_overlap

  • We would expect high-sensitivity and low-sensitivity compounds to form their own clusters, but they are distributed "uniformly" here.
  • Based on the Z-scores, the majority of the overlapping compounds do not exceed either threshold (sensitive or resistant); the min is -0.9554 and the max is 1.9195. That might be why we see a uniform distribution here.

agitter (Member, Author) commented Feb 27, 2019

The running jobs to look at the images from the max and min z-scores may still be informative. Otherwise, these results are somewhat confusing. We might expect that the images for the sensitive drugs with the most positive z-scores have fewer cells or show some sign that the cells are being affected or killed. In the 2D plots, however, the yellow and dark blue points are sometimes adjacent and the yellow points are distributed throughout the 2D space.

xiaohk (Member) commented Mar 6, 2019

We have 11 compounds with known effects that overlap with our image dataset, corresponding to 504 images. I randomly sampled 100 images from channel 123 and channel 45, respectively, and sorted these raw images by their drug sensitivity Z-scores.

Channel 123

test_low

Channel 45

test_low

Comment

  1. From the 100 raw images in channel 123, we can see that compounds with either low or high Z-scores can have few cells in this dataset. This might suggest that the Cell Painting data do not clearly encode these biological signals.
  2. Image color is distributed uniformly across increasing Z-scores, which also suggests there is no obvious correlation between sensitivity Z-scores and the cell images.

agitter (Member, Author) commented Mar 10, 2019

My first observations were the same as yours, which is not encouraging. However, there may be more subtle biological cues for which we both lack suitable expertise to pick up. For instance, the cell shape could be changing in a consistent way even though I can't see any trends.

xiaohk (Member) commented Mar 12, 2019

Cell number box plot for 1000 random compounds:

test
