Reprocess profiles using jump-profiling-recipe #2

Open
shntnu opened this issue Jul 17, 2024 · 8 comments

Comments

@shntnu

shntnu commented Jul 17, 2024

We need to reprocess the profiles in this repo, starting from the augmented profiles.

  1. Use the upcoming v0.2.0 of https://github.com/broadinstitute/jump-profiling-recipe to do so.
  2. Process using both pipelines: the CRISPR/ORF pipelines as well as the Compound pipeline
  3. Evaluate using (mAP-based) phenotypic activity

@PaulaLlanos

I am starting to work on this using the new recipe version. Regarding the recipes to use, besides the compound recipe, should I use CRISPR or ORF? Do you have any preference?
Suganya and I discussed using her documentation for the recipes, and I can provide feedback on it if necessary.

@shntnu

shntnu commented Oct 4, 2024

Please go with ORF

@PaulaLlanos

I noticed that some of the plates have metadata corresponding to the control.txt platemap. This file only includes the well names and an empty column for the condition.

I’m wondering if I should consider this as DMSO, a negative control, or something else. Could you please clarify? Thank you!

I've attached a screenshot of the platemap for reference:

[Screenshot of the platemap, 2024-10-09, 11:37:33 AM]

@shntnu

shntnu commented Oct 9, 2024

> I noticed that some of the plates have metadata corresponding to the control.txt platemap. This file only includes the well names and an empty column for the condition.
>
> I’m wondering if I should consider this as DMSO, a negative control, or something else. Could you please clarify? Thank you!
>
> I've attached a screenshot of the platemap for reference:

@jump-cellpainting/broad-claussnitzer, can you help address the question above?

Additional info that might help: here are the platemaps for a single batch. Each batch has these control plates (they are different from the Target2 plates).

https://github.com/jump-cellpainting/cpg0014-jump-adipocyte-data/tree/master/metadata/platemaps/2022_11_28_Batch1/platemap

@PaulaLlanos

Update: after discussing it with Suganya and John, I was able to run the ORF JUMP recipe with some changes, which I will document soon.
I ran it treating the empty condition column as 'control' until I get the correct information from @jump-cellpainting/broad-claussnitzer.

@PaulaLlanos

Hi Shantanu, I obtained the mAP values for this batch using the new version of the pipeline. However, I am not sure how mAP values have been analyzed in the past, or what further analysis they are interested in performing to compare them with previous versions (which was the aim, right?).

@shntnu

shntnu commented Dec 4, 2024

That's great, @PaulaLlanos!

This is sufficient for others to take it forward.

Please be sure to update this repo to document:

  1. how the data was processed, with a URL to the version of https://github.com/broadinstitute/jump-profiling-recipe (or its fork) that you used, plus any parameter files that may not be available in the repo
  2. where the new data live
  3. the mAP analysis notebooks

Please also update the landing page README.md with any other relevant information.

@PaulaLlanos

I ran jump-profiling-recipe using the latest version of the ORF pipeline.

Here is the link to the files used (cloned from jump-profiling-recipe):

https://github.com/PaulaLlanos/jump-profiling-recipe/tree/cpg0014_adipocytes

Prepare Metadata

Code: get_allmetadata.py
Output: combined_metadata.csv

It was necessary to combine all CSVs into a single file that includes both metadata and features. This combined CSV should also include all the batches and plates that we want to preprocess.

    # Check and download metadata
    aws s3 sync --no-sign-request s3://cellpainting-gallery/cpg0014-jump-adipocyte/broad/workspace/metadata/platemaps/ metadata/platemaps/
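A minimal sketch of the combining step, assuming each plate's profiles sit in a CSV at `<root>/<batch>/<plate>/<plate>.csv` so that batch and plate names can be read off the path. The directory layout, the function name `combine_plate_csvs`, and the added `Metadata_Batch`/`Metadata_Plate` column names are assumptions for illustration; this is not the code in `get_allmetadata.py`.

```python
import csv
from pathlib import Path


def combine_plate_csvs(root: Path, out_path: Path, source: str = "broad") -> int:
    """Concatenate per-plate CSVs found at root/<batch>/<plate>/*.csv into one CSV.

    Hypothetical layout. Adds Metadata_Source, Metadata_Batch and Metadata_Plate
    columns taken from the directory names. Returns the number of rows written.
    """
    fieldnames = ["Metadata_Source", "Metadata_Batch", "Metadata_Plate"]
    rows = []
    for csv_path in sorted(root.glob("*/*/*.csv")):
        batch, plate = csv_path.parent.parent.name, csv_path.parent.name
        with csv_path.open(newline="") as fh:
            reader = csv.DictReader(fh)
            # Union of columns across plates, in first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                row.update(Metadata_Source=source, Metadata_Batch=batch, Metadata_Plate=plate)
                rows.append(row)
    with out_path.open("w", newline="") as fh:
        # restval="" leaves a blank cell when a plate lacks a column other plates have.
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

This keeps every row in memory, which is fine for a sketch; with many plates and thousands of feature columns, a chunked or Parquet-based approach would scale better.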

Also, the metadata_broad_sample column was NaN because the broad sample column in the platemap was empty, since it was a control plate. Based on Felipe Do Santo's answer, we should treat the control.txt plates as DMSO plates.
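A minimal sketch of that rule, assuming the combined rows carry a platemap-name column. The column names `Metadata_broad_sample` and `Metadata_plate_map_name`, the platemap name `control`, and the function `fill_control_dmso` are hypothetical. Only rows from the control platemap are filled, so a genuinely missing sample on a treatment plate still shows up as missing.

```python
import math

# Hypothetical column names; adjust to the actual headers in combined_metadata.csv.
SAMPLE_COL = "Metadata_broad_sample"
PLATEMAP_COL = "Metadata_plate_map_name"


def _is_missing(value) -> bool:
    """True for None, float NaN, empty strings, and the string 'nan' (any case)."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    text = str(value).strip()
    return text == "" or text.lower() == "nan"


def fill_control_dmso(rows: list[dict], control_map: str = "control") -> int:
    """Set empty sample values to 'DMSO' on control-platemap rows, in place.

    Returns the number of rows that were filled.
    """
    filled = 0
    for row in rows:
        if row.get(PLATEMAP_COL) == control_map and _is_missing(row.get(SAMPLE_COL)):
            row[SAMPLE_COL] = "DMSO"
            filled += 1
    return filled
```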

The CSV file also needs to contain the following information:

  1. Source (broad)
  2. Batch
  3. Plate
  4. Well
  5. Perturbation, as Metadata_JCP2022. Don't change the name of this column, because we don't want to modify the downstream code.
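A minimal sketch of a pre-flight check for these columns. Only `Metadata_JCP2022` comes from the thread; the other four column names follow the usual JUMP `Metadata_*` naming but are assumptions here, as is taking the perturbation from `Metadata_broad_sample`. If the downstream code expects actual JCP2022 identifiers rather than raw sample names, a mapping step would be needed on top of this.

```python
# Assumed column names; only Metadata_JCP2022 is fixed by the thread.
REQUIRED = ("Metadata_Source", "Metadata_Batch", "Metadata_Plate", "Metadata_Well", "Metadata_JCP2022")


def add_jcp2022_and_check(rows: list[dict], perturbation_col: str = "Metadata_broad_sample") -> list[dict]:
    """Copy the perturbation into Metadata_JCP2022 (if absent) and verify required columns.

    Raises ValueError naming the first offending row and its missing columns.
    """
    for i, row in enumerate(rows):
        # Keep an existing Metadata_JCP2022 value; otherwise copy the perturbation.
        row.setdefault("Metadata_JCP2022", row.get(perturbation_col))
        missing = [col for col in REQUIRED if row.get(col) in (None, "")]
        if missing:
            raise ValueError(f"row {i} is missing {missing}")
    return rows
```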

Convert profiles to parquet format

Code: convert_parquet_profiles.py
Output: inputs/broad/workspace/profiles/<Batch_name>/<plate_name>/<plate_name>.parquet

Once we have this, the first step is to convert the CSV into Parquet files with the load_Data function in preprocessing/io.py.
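A minimal sketch of writing one Parquet file per plate in the output layout above, assuming the combined CSV has `Metadata_Batch` and `Metadata_Plate` columns (hypothetical names). It uses pandas `groupby` and `DataFrame.to_parquet` rather than the recipe's `load_Data`, and the function names are illustrative only.

```python
from pathlib import Path

# Root of the output layout described above.
PROFILES_ROOT = Path("inputs/broad/workspace/profiles")


def plate_parquet_path(batch: str, plate: str, root: Path = PROFILES_ROOT) -> Path:
    """Return <root>/<batch_name>/<plate_name>/<plate_name>.parquet."""
    return root / batch / plate / f"{plate}.parquet"


def split_to_parquet(combined_csv: Path, root: Path = PROFILES_ROOT) -> list[Path]:
    """Write one Parquet file per (batch, plate) group of the combined CSV."""
    # Imported here so plate_parquet_path works without pandas; writing Parquet
    # also needs an engine such as pyarrow.
    import pandas as pd

    df = pd.read_csv(combined_csv)
    written = []
    for (batch, plate), plate_df in df.groupby(["Metadata_Batch", "Metadata_Plate"]):
        out = plate_parquet_path(str(batch), str(plate), root)
        out.parent.mkdir(parents=True, exist_ok=True)
        plate_df.to_parquet(out, index=False)
        written.append(out)
    return written
```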

Create the cell count file to run the ORF pipeline

Code: get_cell_counts.py
Output: orf_cell_counts_adipocytes.csv

In addition, it was necessary to create the file orf_cell_counts_adipocytes.csv, since the ORF pipeline requires the cell counts as a separate file.
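A minimal sketch of extracting per-well cell counts into a separate file. The thread does not give the schema the ORF pipeline expects, so the input count column `Metadata_Count_Cells`, the key columns, and the output header `Count_Cells` are all hypothetical placeholders to be matched against the recipe's own cell-count input.

```python
import csv
from pathlib import Path

# Hypothetical names: the per-well cell-count column in the combined CSV and the
# output schema. Match both to what the ORF pipeline actually reads.
COUNT_COL = "Metadata_Count_Cells"
KEYS = ("Metadata_Source", "Metadata_Plate", "Metadata_Well")


def write_cell_counts(combined_csv: Path, out_csv: Path) -> int:
    """Write one cell count per (source, plate, well), sorted by key. Returns the row count."""
    counts: dict[tuple, int] = {}
    with combined_csv.open(newline="") as fh:
        for row in csv.DictReader(fh):
            key = tuple(row[k] for k in KEYS)
            # If a well appears more than once, the last value wins.
            counts[key] = int(float(row[COUNT_COL]))
    with out_csv.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([*KEYS, "Count_Cells"])
        for key, n in sorted(counts.items()):
            writer.writerow([*key, n])
    return len(counts)
```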

Create the environment

I created the environment using Nix; the requirements are detailed in the flake files. I created the environment on the Moby server (the CS Lab server maintained by Alán).

    cd jump-profiling-recipe/
    nix develop . --impure --extra-experimental-features nix-command --extra-experimental-features flakes --show-trace

Check phenotypic activity by calculating mAP

Code: below
Output: map_scores.parquet

    from preprocessing import metrics

    # Get average precision (AP) against negative controls
    metrics.average_precision_negcon(
        parquet_path="outputs/orf/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet",
        ap_path="ap_scores.parquet",
        plate_types=["COMPOUND"],
    )
    # Get mean average precision (mAP) from the AP scores
    metrics.mean_average_precision("ap_scores.parquet", "map_scores.parquet")
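To make the metric concrete, here is a simplified, plain-Python sketch of mAP-based phenotypic activity. Each replicate well of a perturbation is used as a query, the other replicates are the positives, the negative-control wells are the negatives, and candidates are ranked by cosine similarity. This illustrates the general idea only; it is not the implementation of `average_precision_negcon` or `mean_average_precision`, which may differ in similarity metric, tie handling, and significance testing.

```python
import math
from statistics import mean


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def average_precision(query, positives, negatives) -> float:
    """AP of ranking `positives` above `negatives` by similarity to `query`."""
    scored = [(cosine(query, p), True) for p in positives] + [(cosine(query, n), False) for n in negatives]
    # sorted() is stable, so on exact ties positives (listed first) rank ahead.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            precisions.append(hits / rank)  # precision at each positive's rank
    return mean(precisions)


def mean_average_precision(replicates, negcons) -> float:
    """mAP for one perturbation; needs at least two replicate profiles."""
    return mean(
        average_precision(q, replicates[:i] + replicates[i + 1:], negcons)
        for i, q in enumerate(replicates)
    )
```

A perturbation whose replicates are consistently more similar to each other than to the negative controls scores close to 1; one indistinguishable from controls scores much lower.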
