Reprocess profiles using jump-profiling-recipe #2

shntnu · 2024-07-17T20:01:02Z

We need to reprocess the profiles in this repo, starting from the augmented profiles.

Use the upcoming v0.2.0 of https://github.com/broadinstitute/jump-profiling-recipe to do so.
Process using both pipelines – the CRISPR/ORF pipelines as well as the Compound pipeline
Evaluate using (mAP-based) phenotypic activity

PaulaLlanos · 2024-10-01T19:23:46Z

I am starting to work on this using the new recipe version. Regarding the recipes to use: besides the compound recipe, should I use CRISPR or ORF? Do you have any preference?
Suganya and I discussed using her documentation for the recipes; I can then provide feedback if necessary.

shntnu · 2024-10-04T16:30:23Z

Please go with ORF

PaulaLlanos · 2024-10-09T15:38:29Z

I noticed that some of the plates have metadata corresponding to the control.txt platemap. This file only includes the well names and an empty column for the condition.

I’m wondering if I should consider this as DMSO, a negative control, or something else. Could you please clarify? Thank you!

I've attached a screenshot of the platemap for reference:

shntnu · 2024-10-09T15:46:23Z

@jump-cellpainting/broad-claussnitzer can you help address the question above?

Additional info that might help: Here are the platemaps for a single batch; each batch has these control plates (they are different from the Target2 plates)

https://github.com/jump-cellpainting/cpg0014-jump-adipocyte-data/tree/master/metadata/platemaps/2022_11_28_Batch1/platemap

PaulaLlanos · 2024-10-18T19:52:45Z

Update: I was able to run the ORF jump recipe with some changes that I will document soon, after discussing them with Suganya and John.
I ran it treating the empty condition column as a 'control' until I get the correct information from @jump-cellpainting/broad-claussnitzer.

PaulaLlanos · 2024-12-04T16:43:48Z

Hi Shantanu, I obtained the mAP values for this batch using the new version of the pipeline. However, I am not sure how these mAP values have been analyzed in the past, or what kind of further analysis people are interested in performing to compare them with previous versions, which was the aim, right?

shntnu · 2024-12-04T16:50:32Z

That's great @PaulaLlanos

This is sufficient for others to take it forward.

Please be sure to update this repo to document:

how the data was processed, with a URL to the version of https://github.com/broadinstitute/jump-profiling-recipe (or its fork) that you used + any parameter files that may not be available in the repo
where the new data live
mAP analysis notebooks

Please also update the landing page README.md with any other relevant information

PaulaLlanos · 2024-12-04T22:57:43Z

I ran the jump profiling recipe using the latest version of the ORF pipeline.

Here is the link to the files used (cloned from jump-profiling-recipe):

https://github.com/PaulaLlanos/jump-profiling-recipe/tree/cpg0014_adipocytes

Prepare Metadata

Code: get_allmetadata.py
Output: combined_metadata.csv

All the CSVs had to be combined into a single file that includes both Metadata and Features columns. This combined CSV should also cover every batch and plate that we want to preprocess.

    # Check and download metadata
    aws s3 sync --no-sign-request s3://cellpainting-gallery/cpg0014-jump-adipocyte/broad/workspace/metadata/platemaps/ metadata/platemaps/

Also, the metadata_broad_sample column was NaN because the broad sample column in the platemap was empty, since it was a control plate. Based on the answer from Felipe Do Santo, we should treat those control.txt plates as DMSO plates.

We also need a CSV file that contains this information:

Source (broad)
Batch
Plate
Well
Perturbation, as Metadata_JCP2022. Don't change the names of these columns, because we don't want to modify the code downstream.
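
As an illustration of the columns listed above, here is a minimal sketch of how platemap rows could be turned into metadata rows, with empty wells on a control plate labelled as DMSO. This is not the code in get_allmetadata.py. The `Metadata_` prefix on the first four columns, the platemap keys `well_position` and `broad_sample`, the plate name `PLATE_X`, and the literal label `"DMSO"` are all assumptions; the thread does not say which identifier the recipe expects for DMSO in Metadata_JCP2022.

```python
import csv
import io

# Columns named in the thread; the "Metadata_" prefix on the first four is an assumption.
REQUIRED = ["Metadata_Source", "Metadata_Batch", "Metadata_Plate", "Metadata_Well", "Metadata_JCP2022"]


def build_metadata_rows(platemap_rows, source, batch, plate, control_label="DMSO"):
    """Turn platemap rows into metadata rows, labelling empty wells as controls.

    `platemap_rows` is an iterable of dicts with the hypothetical keys
    "well_position" and "broad_sample".
    """
    out = []
    for row in platemap_rows:
        sample = (row.get("broad_sample") or "").strip()
        out.append({
            "Metadata_Source": source,
            "Metadata_Batch": batch,
            "Metadata_Plate": plate,
            "Metadata_Well": row["well_position"],
            # An empty perturbation on a control plate is treated as DMSO.
            "Metadata_JCP2022": sample if sample else control_label,
        })
    return out


# Toy platemap standing in for control.txt (tab-separated, empty condition column).
platemap_text = "well_position\tbroad_sample\nA01\t\nA02\t\n"
rows = list(csv.DictReader(io.StringIO(platemap_text), delimiter="\t"))
metadata = build_metadata_rows(rows, source="broad", batch="2022_11_28_Batch1", plate="PLATE_X")
```

Keeping the column names in one place (`REQUIRED`) makes it easy to check that nothing downstream needs to be renamed.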

Convert profiles to parquet format

Code: convert_parquet_profiles.py
Output: inputs/broad/workspace/profiles/<Batch_name>/<plate_name>/<plate_name>.parquet

Once we have the combined CSV, we should convert it to parquet files with the function load_Data in io.py (in the preprocessing folder). This is the first step.
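
The output layout above can be sketched as follows. This is not the content of convert_parquet_profiles.py; it only shows one plausible way to build the per-plate path and write a CSV as parquet with pandas. The function names `parquet_path` and `convert_csv_to_parquet` and the plate name `PLATE_X` are hypothetical, and the conversion step assumes pandas and pyarrow are installed.

```python
from pathlib import Path


def parquet_path(root, batch, plate):
    """Build <root>/<batch>/<plate>/<plate>.parquet, the layout quoted above."""
    return Path(root) / batch / plate / f"{plate}.parquet"


def convert_csv_to_parquet(csv_path, root, batch, plate):
    """Read one profile CSV and write it as parquet (requires pandas and pyarrow)."""
    import pandas as pd  # imported lazily so parquet_path works without pandas

    out = parquet_path(root, batch, plate)
    out.parent.mkdir(parents=True, exist_ok=True)
    pd.read_csv(csv_path).to_parquet(out, index=False)
    return out


# Example path for a hypothetical plate in one batch.
target = parquet_path("inputs/broad/workspace/profiles", "2022_11_28_Batch1", "PLATE_X")
```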

Create cell count files to run ORF pipeline

Code: get_cell_counts.py
Output: orf_cell_counts_adipocytes.csv

In addition, it was necessary to create the file orf_cell_counts_adipocytes.csv, since the ORF pipeline requires the cell counts as a separate file.
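
A rough sketch of what such a cell count file could look like is given here. This is not get_cell_counts.py, and the column names (`Metadata_Plate`, `Metadata_Well`, `Metadata_Count_Cells`) are assumptions; check which columns the ORF pipeline actually reads before relying on them. The profile rows are toy values for illustration only.

```python
import csv
import io

# Hypothetical column names; verify against what the ORF pipeline expects.
COUNT_COLUMNS = ["Metadata_Plate", "Metadata_Well", "Metadata_Count_Cells"]


def extract_cell_counts(profile_rows, count_field="Metadata_Count_Cells"):
    """Keep only the plate, well, and cell count of each well-level profile row."""
    return [
        {
            "Metadata_Plate": row["Metadata_Plate"],
            "Metadata_Well": row["Metadata_Well"],
            "Metadata_Count_Cells": int(row[count_field]),
        }
        for row in profile_rows
    ]


def to_csv_text(rows):
    """Serialize the count rows as CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COUNT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy well-level profiles (made-up values, one feature column each).
profiles = [
    {"Metadata_Plate": "PLATE_X", "Metadata_Well": "A01", "Metadata_Count_Cells": "1520", "Cells_AreaShape_Area": "0.12"},
    {"Metadata_Plate": "PLATE_X", "Metadata_Well": "A02", "Metadata_Count_Cells": "1498", "Cells_AreaShape_Area": "-0.40"},
]
counts_csv = to_csv_text(extract_cell_counts(profiles))
```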

Create the environment

I created the environment using nix; you can check the flake files to see the requirements detailed there. I created the environment on the Moby server (CS Lab server maintained by Alán).

    cd jump-profiling-recipe/
    nix develop . --impure --extra-experimental-features nix-command --extra-experimental-features flakes --show-trace

Check phenotypic activity by calculating mAP

Output: map_scores.parquet
Code: below

    from preprocessing import metrics

    # Get average precision (AP) against negative controls
    metrics.average_precision_negcon(
        parquet_path="outputs/orf/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet",
        ap_path="ap_scores.parquet",
        plate_types=["COMPOUND"],
    )
    # Get mean average precision (mAP)
    metrics.mean_average_precision("ap_scores.parquet", "map_scores.parquet")
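
For readers unfamiliar with the metric, here is a small sketch of the textbook definitions of average precision and mean average precision. It is not the recipe's implementation (the internals of the `metrics` module are not shown in this thread) and it omits any significance testing. The function names and the toy ranking are illustrative assumptions.

```python
def average_precision(ranked_is_match):
    """AP for one query: mean of precision@k over the ranks k that hold a match."""
    hits = 0
    precisions = []
    for k, is_match in enumerate(ranked_is_match, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(ap_by_perturbation):
    """mAP per perturbation: the mean of the AP values of its replicate wells."""
    return {name: sum(aps) / len(aps) for name, aps in ap_by_perturbation.items()}


# Toy ranking for one replicate well: neighbours sorted by similarity,
# True = another replicate of the same perturbation, False = a negative control.
ap = average_precision([True, False, True, False])
scores = mean_average_precision({"perturbation_A": [ap, 1.0]})
```

A perturbation whose replicates rank each other above the negative controls gets an AP close to 1, which is what "phenotypic activity" refers to here.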
