feat: Report FP / FN negative variants in single table #107

Merged
merged 19 commits into main from feat/improve-fp-fn-report on Dec 13, 2024

Conversation

@BiancaStoecker (Collaborator) commented Nov 21, 2024

Change the folder structure for FP/FN results, and add collection of FP/FN results across all coverages of a callset and across all callsets of a benchmark.

Results for each benchmark are currently sorted by callset and then by coverage. There is no dependency sorting of the tables yet.

The new tables are not yet used in the datavzrd report and are not requested by any rules.
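In essence, the collection step reads each per-coverage (or per-callset) TSV, tags its rows with their source, and concatenates everything into one table. A minimal stdlib sketch of that pattern (the workflow scripts use pandas; the table contents and callset names below are hypothetical):

```python
import csv
import io

def load_table(text, callset):
    """Read a TSV and prepend a 'callset' column, mirroring the
    pandas-based collect scripts (load TSV, insert identifier, concat)."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    return [{"callset": callset, **row} for row in rows]

# Hypothetical per-callset tables standing in for the per-coverage TSVs.
tables = {
    "callset-a": "chrom\tpos\nchr1\t100\n",
    "callset-b": "chrom\tpos\nchr1\t200\n",
}

# Concatenate, sorted by callset (the PR sorts by callset, then coverage).
combined = [row for name, text in sorted(tables.items())
            for row in load_table(text, name)]
```

With real files, `pd.concat(load_data(path, callset) for ...)` plays the role of the list comprehension above.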

Summary by CodeRabbit

  • New Features

    • Introduced new rules for collecting and reporting false positive and false negative data.
    • Added scripts for processing benchmark and stratification data related to variant calling.
    • New configuration file for managing data visualization and reporting for callset classifications.
  • Bug Fixes

    • Updated output path formats for false positive and negative reporting to enhance clarity.

coderabbitai bot commented Nov 21, 2024

Walkthrough

The changes introduce enhancements to a Snakemake workflow for evaluating variant calling performance. Key modifications include the addition of new rules and functions to manage false positive and false negative data, improved handling of coverage data, and the introduction of new scripts for processing benchmark and stratification data. A new YAML configuration file is also created to structure data visualization and reporting. These updates refine the evaluation process and improve data organization within the workflow.

Changes

Changes by file:

  • workflow/Snakefile: added an input to the eval rule for false positive/negative reports using used_callsets.
  • workflow/rules/common.smk: added the get_coverages_of_callset(callset) function; modified get_collect_fp_fn_callsets(wildcards) and get_collect_fp_fn_input(wildcards); added new functions for stratifications and benchmarks.
  • workflow/rules/eval.smk: added rules collect_stratifications_fp_fn, collect_fp_fn_benchmark, and report_fp_fn_callset; updated the extract_fp_fn and report_fp_fn rules for output path changes.
  • workflow/scripts/collect-fp-fn-benchmarks.py: new script for processing benchmark data, including functions for loading and sorting data.
  • workflow/scripts/collect-stratifications-fp-fn.py: new script for processing coverage data, including functions for generating coverage labels and loading data.
  • workflow/resources/datavzrd/fp-fn-per-callset-config.yte.yaml: new configuration file for data visualization, including dataset and view definitions.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Snakemake
    participant CoverageData
    participant BenchmarkData
    participant Report

    User->>Snakemake: Trigger workflow
    Snakemake->>CoverageData: Retrieve coverage information
    Snakemake->>BenchmarkData: Collect FP/FN benchmarks
    Snakemake->>Report: Generate reports
    Report-->>User: Provide evaluation results

🐰 "In the meadow where data flows,
New rules and scripts, like flowers, grow.
False positives, negatives, all in sight,
With coverage and benchmarks, we take flight!
A hop, a jump, through logic we prance,
In the world of Snakemake, we dance!" 🌼



@famosab (Collaborator) commented Nov 29, 2024

@coderabbitai review


coderabbitai bot commented Nov 29, 2024

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (13)
workflow/scripts/collect-fp-fn-benchmarks.py (3)

1-5: Use context manager for file handling.

Consider using a context manager (with statement) for safer file handling.

-sys.stderr = open(snakemake.log[0], "w")
+with open(snakemake.log[0], "w") as log_file:
+    sys.stderr = log_file
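One caveat with the diff as written: once the with block exits, sys.stderr still points at a closed file. contextlib.redirect_stderr restores the original stream on exit. A small sketch of that alternative (the log path is a stand-in for snakemake.log[0], and the message is invented):

```python
import sys
from contextlib import redirect_stderr

log_path = "run.log"  # stand-in for snakemake.log[0]

# Everything inside the block writes its stderr output to the log file.
with open(log_path, "w") as log_file, redirect_stderr(log_file):
    print("processing input tables", file=sys.stderr)

# sys.stderr is restored automatically once the block exits.
with open(log_path) as fh:
    logged = fh.read()
```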

7-10: Add error handling and type hints.

The function would benefit from:

  1. Error handling for file reading
  2. Type hints for better code maintainability
  3. Docstring explaining the function's purpose
-def load_data(path, callset):
-    d = pd.read_csv(path, sep="\t")
-    d.insert(0, "callset", callset)
-    return d
+def load_data(path: str, callset: str) -> pd.DataFrame:
+    """Load benchmark data from TSV file and add callset identifier.
+
+    Args:
+        path: Path to the TSV file
+        callset: Identifier for the data source
+
+    Returns:
+        DataFrame with loaded data and callset column
+    """
+    try:
+        d = pd.read_csv(path, sep="\t")
+        d.insert(0, "callset", callset)
+        return d
+    except Exception as e:
+        raise RuntimeError(f"Failed to load data from {path}: {e}") from e

1-42: Consider adding logging and memory optimization.

For better operational visibility and scalability:

  1. Add logging of key operations (data loading, processing steps)
  2. Consider using chunks parameter in pd.read_csv for large files
  3. Use dtype specifications in pandas operations to optimize memory usage

Example logging implementation:

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[logging.FileHandler(snakemake.log[0]), logging.StreamHandler()]
)

# Then use throughout the code:
logging.info(f"Loading data from {len(snakemake.input.tables)} input files")
logging.info(f"Processed {len(results)} rows of data")

workflow/resources/datavzrd/fp-fn-per-benchmark-config.yte.yaml (3)

4-5: Consider extracting color codes to a theme configuration.

The hardcoded color values could be moved to a common theme configuration file to ensure consistency across the visualization configs and make it easier to maintain a unified color scheme.


27-30: Complete or remove the commented description block.

The commented description block appears to be incomplete. Either complete it with relevant benchmark information or remove it if not needed.


23-44: Add newline at end of file.

The view configuration looks good, but please add a newline at the end of the file to comply with YAML standards.

             predicted_genotype:
               display-mode: hidden
+

workflow/scripts/collect-stratifications-fp-fn.py (3)

8-20: Add input validation for coverage parameter

The function assumes the coverage parameter exists in snakemake.params.coverage_lower_bounds. Consider adding validation to handle missing or invalid values gracefully.

 def get_cov_label(coverage):
+    if coverage not in snakemake.params.coverage_lower_bounds:
+        raise ValueError(f"Invalid coverage value: {coverage}")
     lower = snakemake.params.coverage_lower_bounds[coverage]
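For context, here is a hypothetical reconstruction of what a get_cov_label of this shape does: derive a readable range label from the sorted lower bounds, with an open-ended label for the highest band. The band names and bounds below are invented, not taken from the workflow:

```python
def get_cov_label(coverage, lower_bounds):
    """Build a label like '30..49x' from per-coverage lower bounds;
    the highest band gets an open-ended '>=50x' style label."""
    bounds = sorted(lower_bounds.items(), key=lambda kv: kv[1])
    names = [name for name, _ in bounds]
    idx = names.index(coverage)  # raises ValueError for an unknown coverage
    lower = bounds[idx][1]
    if idx + 1 < len(bounds):
        return f"{lower}..{bounds[idx + 1][1] - 1}x"
    return f">={lower}x"

bounds = {"low": 1, "medium": 30, "high": 50}
labels = {c: get_cov_label(c, bounds) for c in bounds}
```

Passing the bounds explicitly (instead of reading snakemake.params inside the function) also gives the validation asked for above essentially for free, since an unknown coverage fails fast.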

33-39: Consider implementing the commented validation logic

The TODO comment suggests important validation for variant prediction. This could help catch technical issues early in the pipeline.

Would you like me to help implement this validation logic or create a GitHub issue to track this enhancement?


1-56: Consider improving modularity for testing

The script's main logic could be encapsulated in a main() function to facilitate unit testing and improve maintainability. This would also help separate the Snakemake-specific code from the core functionality.

Consider restructuring the script like this:

def main(input_files, output_file, coverages, coverage_lower_bounds):
    if input_files:
        report = pd.concat(
            load_data(f, cov) for cov, f in zip(coverages, input_files)
        )
        report.to_csv(output_file, sep="\t", index=False)
    else:
        # Create empty DataFrame...

if __name__ == "__main__":
    main(snakemake.input, snakemake.output[0], 
         snakemake.params.coverages, 
         snakemake.params.coverage_lower_bounds)

workflow/rules/common.smk (3)

406-413: Consider refactoring to reduce code duplication with get_coverages

The function implements similar logic to get_coverages. Consider extracting the common logic into a shared helper function to improve maintainability.

-def get_coverages_of_callset(callset):
-    benchmark = config["variant-calls"][callset]["benchmark"]
-    high_cov_status = benchmarks[benchmark].get("high-coverage", False)
-    if high_cov_status:
-        coverages = high_coverages
-    else:
-        coverages = low_coverages
-    return coverages
+def _get_coverage_dict(high_cov_status):
+    return high_coverages if high_cov_status else low_coverages
+
+def get_coverages_of_callset(callset):
+    benchmark = config["variant-calls"][callset]["benchmark"]
+    high_cov_status = benchmarks[benchmark].get("high-coverage", False)
+    return _get_coverage_dict(high_cov_status)
+
+def get_coverages(wildcards):
+    if hasattr(wildcards, "benchmark"):
+        high_cov_status = benchmarks[wildcards.benchmark].get("high-coverage", False)
+    else:
+        benchmark = config["variant-calls"][wildcards.callset]["benchmark"]
+        high_cov_status = benchmarks[benchmark].get("high-coverage", False)
+    return _get_coverage_dict(high_cov_status)

502-509: Consider simplifying the benchmark set comprehension and improving function name

The function works correctly but could be improved:

  1. The benchmark set comprehension could be simplified as used_benchmarks is already a set
  2. The function name could be more descriptive, e.g., get_benchmark_fp_fn_report_paths
-def get_fp_fn_reports_benchmarks(wildcards):
+def get_benchmark_fp_fn_report_paths(wildcards):
     for genome in used_genomes:
         yield from expand(
             "results/report/fp-fn/benchmarks/{benchmark}/{classification}",
-            benchmark={benchmark for benchmark in used_benchmarks},
+            benchmark=used_benchmarks,
             classification=["fp", "fn"],
         )

601-605: Consider reusing filtered callsets to avoid code duplication

The coverage filtering logic is duplicated from get_collect_fp_fn_callsets. Consider reusing that function to maintain DRY principle.

 def get_collect_fp_fn_labels(wildcards):
-    callsets = get_genome_callsets(wildcards.genome)
-    callsets = [
-        callset
-        for callset in callsets
-        if wildcards.cov in get_coverages_of_callset(callset)
-    ]
+    callsets = get_collect_fp_fn_callsets(wildcards)
     return get_callset_label_entries(callsets)
workflow/rules/eval.smk (1)

377-393: Consider explicitly specifying dependencies instead of using priority

In the collect_stratifications_fp_fn rule, priority: 1 is set so that it runs after the precision/recall computations. It is advisable to model such ordering explicitly through rule inputs rather than relying on priorities; explicit dependencies improve workflow clarity and maintainability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between db68793 and af469ef.

📒 Files selected for processing (6)
  • workflow/Snakefile (1 hunks)
  • workflow/resources/datavzrd/fp-fn-per-benchmark-config.yte.yaml (1 hunks)
  • workflow/rules/common.smk (6 hunks)
  • workflow/rules/eval.smk (3 hunks)
  • workflow/scripts/collect-fp-fn-benchmarks.py (1 hunks)
  • workflow/scripts/collect-stratifications-fp-fn.py (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
workflow/resources/datavzrd/fp-fn-per-benchmark-config.yte.yaml

[error] 44-44: no new line character at the end of file

(new-line-at-end-of-file)

🪛 Ruff (0.8.0)
workflow/scripts/collect-fp-fn-benchmarks.py

2-2: Use a context manager for opening files

(SIM115)


2-2: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


16-16: Undefined name snakemake

(F821)


42-42: Undefined name snakemake

(F821)

workflow/scripts/collect-stratifications-fp-fn.py

3-3: Use a context manager for opening files

(SIM115)


3-3: Undefined name snakemake

(F821)


9-9: Undefined name snakemake

(F821)


12-12: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


30-30: Undefined name snakemake

(F821)


41-41: Undefined name snakemake

(F821)


56-56: Undefined name snakemake

(F821)

🔇 Additional comments (11)
workflow/resources/datavzrd/fp-fn-per-benchmark-config.yte.yaml (2)

10-11: LGTM!

The view settings are appropriate for a data visualization interface.


18-21: Verify table linking configuration.

The table linking configuration needs verification to ensure it correctly connects the callset data between tables.

✅ Verification successful

Table linking configuration is correctly set up

The table linking configuration is properly implemented across both files. The link configuration in fp-fn-per-benchmark-config.yte.yaml correctly references:

  • The source column "callset" in the current table
  • The target table-row "results-table/callset" which exists in both configuration files

The linking structure is consistent with the datavzrd configuration format and matches between the files.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for the existence of referenced table and column
# Look for other datavzrd configs that might define the target table

# Search for other datavzrd configs
fd -e yaml -e yml . workflow/resources/datavzrd -x grep -l "results-table/callset"

Length of output: 207


Script:

#!/bin/bash
# Let's examine both files to understand the table structure and linking
cat workflow/resources/datavzrd/precision-recall-config.yte.yaml workflow/resources/datavzrd/fp-fn-per-benchmark-config.yte.yaml

Length of output: 8596

workflow/Snakefile (1)

26-26: Verify the implementation of get_fp_fn_reports_benchmarks

The addition of this input variable aligns well with the PR's objective to collect FP/FN results across benchmarks. However, let's verify its implementation.

✅ Verification successful

Function implementation is correct and properly used

The get_fp_fn_reports_benchmarks function is correctly implemented in workflow/rules/common.smk and yields paths for FP/FN reports across all benchmarks and genomes, following the expected path pattern results/report/fp-fn/benchmarks/{benchmark}/{classification}. The function is only used in the Snakefile as intended, making this a valid and well-structured change.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of get_fp_fn_reports_benchmarks function
# Expected: Function definition in common.smk or related files

# Search for the function definition
echo "Searching for function definition:"
rg -A 5 "def get_fp_fn_reports_benchmarks"

# Search for any usage of the function
echo -e "\nSearching for function usage:"
rg "get_fp_fn_reports_benchmarks"

Length of output: 809

workflow/rules/common.smk (4)

481-485: LGTM! Implementation follows established patterns

The function correctly generates input paths for FP/FN stratification results across all coverage levels, maintaining consistency with similar functions in the codebase.


526-531: LGTM! Well-structured implementation

The function correctly collects FP/FN inputs for benchmarks, following the established pattern of similar collection functions.


582-588: LGTM! Coverage filtering is correctly implemented

The function effectively filters callsets based on coverage levels, ensuring only relevant callsets are included.


593-596: LGTM! Path structure aligns with PR objectives

The function correctly implements the new path structure for FP/FN results while maintaining consistency with the existing pattern.

workflow/rules/eval.smk (4)

341-343: Update output and log paths for consistency with new folder structure

The updated output and log paths in the extract_fp_fn rule reflect the new folder structure for FP/FN results, enhancing organization by grouping files under their respective callsets.


396-409: New rule collect_fp_fn_benchmark added

The collect_fp_fn_benchmark rule effectively collects FP/FN benchmarks across all callsets within a benchmark. The inputs, outputs, parameters, and script are correctly specified.


418-425: Adjust output directory and labels in report_fp_fn rule

The changes to the output directory path and labels in the report_fp_fn rule enhance clarity and maintain consistency in the reporting structure by organizing reports under the genomes directory.


436-460: New rule report_fp_fn_benchmark added

The report_fp_fn_benchmark rule appropriately generates reports for FP/FN benchmarks per benchmark. The inputs, outputs, parameters, and wrapper are correctly defined, integrating well with the existing reporting workflow.

@famosab (Collaborator) commented Nov 29, 2024

I added two more possibilities for rendering the FP/FN variant tables in the report. I am unsure which is desirable:
[screenshot omitted]
Each folder contains one table per benchmark / genome (old view) / callset.
I would say we still need the per-genome view, since it also contains the calculations, plus a new view with one table per callset, sorted by benchmark.

Something like a subcategory
subcategory=lambda w: get_benchmark_callsets(w.benchmark),

-> I was not able to achieve that yet.

@famosab marked this pull request as ready for review on December 9, 2024 at 07:20
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
workflow/resources/datavzrd/fp-fn-per-callset-config.yte.yaml (1)

1-37: Add newline at end of file

The configuration looks well-structured and follows YAML best practices. However, please add a newline at the end of the file to comply with standard file formatting practices.


📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between af469ef and 313bad5.

📒 Files selected for processing (4)
  • workflow/Snakefile (1 hunks)
  • workflow/resources/datavzrd/fp-fn-per-callset-config.yte.yaml (1 hunks)
  • workflow/rules/common.smk (7 hunks)
  • workflow/rules/eval.smk (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • workflow/Snakefile
🧰 Additional context used
🪛 yamllint (1.35.1)
workflow/resources/datavzrd/fp-fn-per-callset-config.yte.yaml

[error] 37-37: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (7)
workflow/rules/eval.smk (4)

341-343: LGTM: Improved file path structure

The modified output paths now include coverage in the filename, which improves organization and makes it easier to track results across different coverage levels.


377-394: LGTM: Well-structured collection rule with memory considerations

The new collect_stratifications_fp_fn rule is well-implemented with:

  • Appropriate priority setting to manage memory usage
  • Clear parameter definitions for coverages
  • Proper error handling through conda environment

396-409: LGTM: Benchmark collection implementation

The collect_fp_fn_benchmark rule effectively aggregates benchmark-specific FP/FN data with appropriate input handling and conda environment setup.


436-462: Verify report configuration parameters

The report_fp_fn_callset rule looks good, but let's verify the configuration parameters are correctly passed to datavzrd.

✅ Verification successful

Configuration parameters are correctly passed and consistent

The datavzrd configuration files show that:

  • The fp-fn-per-callset-config.yte.yaml correctly handles the required parameters:
    • Uses params.genome and params.version for benchmark versioning
    • Properly handles params.somatic flag to conditionally hide genotype columns
    • Correctly configures coverage visualization with ordinal scale
    • Table settings (separator, excel export) match other configurations
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify datavzrd configuration parameters

# Check if the config file exists and validate its structure
cat workflow/resources/datavzrd/fp-fn-per-callset-config.yte.yaml

# Check for any other report configurations for consistency
fd -e yaml -e yml . workflow/resources/datavzrd/ -x cat {}

Length of output: 12741

workflow/rules/common.smk (3)

408-416: LGTM: Well-implemented coverage determination

The get_coverages_of_callset function effectively determines the appropriate coverage settings based on the benchmark configuration.


483-488: LGTM: Comprehensive input collection functions

The new input collection functions are well-structured and maintain consistency with the existing codebase:

  • get_collect_stratifications_fp_fn_input: Handles stratification data collection
  • get_fp_fn_reports_benchmarks: Manages benchmark-specific reporting
  • get_collect_fp_fn_benchmark_input: Handles benchmark input collection

Also applies to: 504-511, 528-533


584-590: LGTM: Improved callset filtering

The modifications to get_collect_fp_fn_callsets and get_collect_fp_fn_labels now properly filter callsets based on coverage settings, ensuring consistent data handling.

Also applies to: 603-607

A review thread on workflow/rules/common.smk (outdated) was marked resolved.
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
workflow/rules/common.smk (2)

408-416: Consider reusing get_coverages logic.

The function duplicates logic from get_coverages. Consider refactoring to reuse the existing function:

def get_coverages_of_callset(callset):
-    benchmark = config["variant-calls"][callset]["benchmark"]
-    high_cov_status = benchmarks[benchmark].get("high-coverage", False)
-    if high_cov_status:
-        coverages = high_coverages
-    else:
-        coverages = low_coverages
-    return coverages
+    class CallsetWildcards:
+        def __init__(self, callset):
+            self.callset = callset
+    
+    return get_coverages(CallsetWildcards(callset))

504-511: Simplify benchmark set creation.

The set comprehension is redundant since used_benchmarks is already a set.

def get_fp_fn_reports_benchmarks(wildcards):
    for genome in used_genomes:
        yield from expand(
            "results/report/fp-fn/benchmarks/{benchmark}/{classification}",
-           benchmark={benchmark for benchmark in used_benchmarks},
+           benchmark=used_benchmarks,
            classification=["fp", "fn"],
        )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 313bad5 and 6bb7766.

📒 Files selected for processing (1)
  • workflow/rules/common.smk (6 hunks)
🔇 Additional comments (6)
workflow/rules/common.smk (6)

22-23: LGTM: Clean implementation of used_callsets.

The set comprehension is an efficient way to collect all callset keys.


483-488: LGTM: Well-structured input path generation.

The function follows the established pattern and correctly handles empty coverage filtering.


528-533: LGTM: Clean implementation of benchmark-based FP/FN collection.

The function correctly uses get_benchmark_callsets and follows the established pattern for input path generation.


584-590: LGTM: Effective callset filtering based on coverage.

The function correctly filters callsets based on coverage compatibility using the new get_coverages_of_callset function.


595-598: LGTM: Well-structured input path generation.

The function correctly uses filtered callsets and follows the established pattern for input path generation.


602-603: LGTM: Consistent label handling.

The function correctly uses filtered callsets and reuses get_callset_label_entries for consistent label handling.

@BiancaStoecker merged commit 0464700 into main on Dec 13, 2024 (7 checks passed)
@BiancaStoecker deleted the feat/improve-fp-fn-report branch on December 13, 2024 at 09:34
Development

Successfully merging this pull request may close these issues:

  • Can only run either WGS or WES/FFPE callsets
  • Report FP / FN negative variants in single table

3 participants