
Report feature quality #5

Open
niranjchandrasekaran opened this issue Mar 8, 2023 · 6 comments

niranjchandrasekaran commented Mar 8, 2023

cc @shntnu @MarziehHaghighi

Here is a list of materials to generate and questions to explore. Please add anything else you think might be useful to check by directly editing this comment:

For each of the three ORF, CRISPR, and Compound datasets, I will check the following:

[figure: Blank diagram - Page 1 (1)]

UPDATE:

  • Basic tables as reference:

    • Ranking of perturbations by the replicability of their profiles
    • Ranking of features by their quality
    • Group-wise feature quality map
      • Is there any consistent pattern across datasets of some categories being high/low quality?
  • Can we trust the ranking?

    • Is feature quality (under the current definitions) variable from batch to batch?
    • Is feature quality variable across datasets? (For the same perturbations, compare each perturbation type with its corresponding available dataset, as shown in the figure.)
  • The current quality metric ranks features by their consistency within an experiment; do features replicate across experiments?

    • Rank features based on their replicability across different batches.
    • Rank features based on their replicability across different datasets.
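As a concrete starting point for "ranking of features by their quality", here is a hypothetical sketch of one such within-experiment consistency metric (the actual metric used in this project may differ): score each feature by the fraction of its variance explained by the perturbation label, so features that agree across replicates of the same perturbation rank highly. The column names (`Metadata_pert`, the `Metadata_` prefix convention) are assumptions.

```python
import pandas as pd


def feature_replicate_consistency(df, pert_col="Metadata_pert", feature_cols=None):
    """Rank features by replicate consistency within one experiment.

    For each feature, compute the fraction of total variance explained by
    the perturbation label (a one-way ANOVA-style R^2). Features whose
    replicates agree get scores near 1; noisy features score near 0.
    """
    if feature_cols is None:
        # Convention: non-feature columns carry a Metadata_ prefix.
        feature_cols = [c for c in df.columns if not c.startswith("Metadata_")]
    grand_mean = df[feature_cols].mean()
    total_ss = ((df[feature_cols] - grand_mean) ** 2).sum()
    # Per-row mean of that row's perturbation group, broadcast back.
    group_means = df.groupby(pert_col)[feature_cols].transform("mean")
    within_ss = ((df[feature_cols] - group_means) ** 2).sum()
    quality = 1.0 - within_ss / total_ss  # between-group variance fraction
    return quality.sort_values(ascending=False)
```

Sorting the resulting Series gives the feature ranking directly; the same scores, grouped by feature category, would feed the group-wise quality map above.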

MarziehHaghighi commented May 24, 2023

Perturbation replicate reproducibility:

Jump-compound

  • source_8: [figure jump_compound_corr_curves_source_8]
  • source_7: [figure jump_compound_corr_curves_source_7]
  • source_6: [figure jump_compound_corr_curves_source_6]
  • source_5: [figure jump_compound_corr_curves_source_5]
  • source_11: [figure jump_compound_corr_curves_source_11]
  • source_10: [figure jump_compound_corr_curves_source_10]
  • source_9: [figure jump_compound_corr_curves_source_2]
  • source_1: [figure jump_compound_corr_curves_source_1]
  • source_3: [figure jump_compound_corr_curves_source_3]

Jump-orf

[figure jump_orf_corr_curves]

Jump-crispr

[figure jump_crispr_corr_curves]

taorf

[figure taorf_corr_curves_broad 2]

lincs

[figure lincs_g_corr_curves_broad]


MarziehHaghighi commented May 24, 2023

Is feature quality consistent across sources or batches of each dataset?

Jump-compound

  • [feature-quality images for source 1, 11, 7, 8, 3, 2, 10, 6, 5]

Jump-orf

[image]

Jump-crispr

[image]

taorf
lincs
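One way to quantify the batch-to-batch consistency shown above, rather than eyeballing heatmaps: compute the feature-quality score separately per batch (or source) and take the Spearman rank correlation between every pair of batches. A sketch, assuming a features-by-batches table of quality scores (table layout is an assumption, not this repo's actual data structure):

```python
import pandas as pd


def quality_rank_agreement(quality_by_batch: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Spearman rank correlation of feature-quality scores.

    quality_by_batch: rows = features, columns = batches/sources.
    Values near 1 mean the feature-quality *ranking* is stable across
    batches, even if the absolute scores shift.
    """
    ranked = quality_by_batch.rank()
    return ranked.corr()  # Pearson on ranks == Spearman
```

A matrix of uniformly high values would support the claim that feature quality is consistent across batches within a dataset.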


MarziehHaghighi commented May 25, 2023

Is feature quality consistent across various datasets and sources?

[image]

Are there groups with overall high or low quality according to median scores across datasets?

[image]

Same analysis, but also including lincs:

  • All the datasets except lincs share the same set of features, so adding lincs will reduce the overlap.
    [image]
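The group-wise median view above (e.g. the low median for mito_radialdistribution) can be computed with a simple groupby. A sketch, assuming features-by-datasets quality scores and, purely for illustration, that the category is the first two underscore-separated tokens of the feature name (the real CellProfiler feature naming is richer than this):

```python
import pandas as pd


def group_quality_summary(quality_by_dataset: pd.DataFrame) -> pd.DataFrame:
    """Median feature quality per category, per dataset and overall.

    quality_by_dataset: rows = feature names, columns = datasets.
    Category = first two underscore-separated tokens of the feature name
    (an illustrative simplification of CellProfiler naming).
    """
    groups = quality_by_dataset.index.str.split("_").str[:2].str.join("_")
    per_dataset = quality_by_dataset.groupby(groups).median()
    return (
        per_dataset
        .assign(median_across_datasets=per_dataset.median(axis=1))
        .sort_values("median_across_datasets")
    )
```

The caveat raised later in the thread applies: a low median across datasets can hide high variance across datasets, so the per-dataset columns should be inspected alongside the overall median.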


MarziehHaghighi commented May 25, 2023

Feature replicability across datasets

  • For each dataset of genetic or chemical perturbations, we can take the overlapping perturbations and calculate the pairwise correlation coefficient of each feature's profile across datasets.

TA-ORF, Jump-ORF and Jump-CRISPR

[image]

Jump-Compound and LINCS

  • I need the mapping between perturbation IDs in jump-compound (Metadata_JCP2022) and lincs_g
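The cross-dataset feature replicability described in the bullet above can be sketched as follows: restrict two per-perturbation consensus tables to their shared perturbations and shared features, then correlate each feature's profile between the two datasets. Column/index names are assumptions for illustration:

```python
import pandas as pd


def feature_corr_across_datasets(df_a, df_b, pert_col="Metadata_pert"):
    """Per-feature correlation between two datasets' consensus profiles.

    df_a, df_b: rows = perturbations (one consensus profile each),
    columns = pert_col plus shared feature columns. Features with high
    correlation "replicate" across the two datasets.
    """
    a = df_a.set_index(pert_col)
    b = df_b.set_index(pert_col)
    common_perts = a.index.intersection(b.index)   # overlap of perturbations
    common_feats = a.columns.intersection(b.columns)
    a = a.loc[common_perts, common_feats]
    b = b.loc[common_perts, common_feats]
    # corrwith aligns on index and correlates column-by-column (per feature).
    return a.corrwith(b, axis=0).sort_values(ascending=False)
```

Ranking these correlations gives the "rank features by their replicability across different datasets" table from the first comment; the pending Metadata_JCP2022-to-lincs_g mapping would supply `pert_col` alignment for the Jump-Compound/LINCS comparison.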

@AnneCarpenter

We discussed in the check-in that most JUMP sources do not have replicates of a given compound, except on the Target2 plates, which most partners ran in many replicates. (source_1 did not run that plate, which explains why its result is quite different, though we are not sure which compounds are shown in its plot, since very few would have replicates at all.) So here we are probably looking at results for around 300 compounds.

Another exception is that the three wave 2 partners may have had 2 replicates per compound because they had a different swapping scheme.

Overall, Marzieh, if you're able to describe some conclusions here from each result, that would be great; it's hard to grasp from the plots alone what analysis is happening. Thx!

@MarziehHaghighi

@AnneCarpenter Sure, these results are not complete yet. In the check-in I just wanted to show that the mito_radialdistribution category is low quality according to the median over datasets (with the caveat of high variance across datasets). I will go through a complete interpretation once the tasks in this issue are complete. For now, I can say that feature quality seems to be consistent across batches within an experiment, but that doesn't hold across datasets. That means we can't say, for example, that a specific group of features is always low quality relative to the rest of the features in all Cell Painting experiments/datasets, but we can make such a statement for different batches within a dataset/source/experiment. Let's pause here and come back to it once I have everything I need for a conclusion.
