Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RunLmpHDF5 #267

Merged
merged 5 commits into from
Oct 21, 2024
Merged

Add RunLmpHDF5 #267

merged 5 commits into from
Oct 21, 2024

Conversation

zjgemi
Copy link
Collaborator

@zjgemi zjgemi commented Oct 18, 2024

Summary by CodeRabbit

  • New Features
    • Introduced dynamic selection of exploration operations based on configuration settings, enhancing configurability.
    • Added a new RunLmpHDF5 option for improved data handling and output types.
  • Improvements
    • Enhanced the RunLmp class with new methods for better processing of model deviation files and output data storage options.
    • Improved type safety and error handling in the TrajRenderLammps class, including better management of temperature settings.
  • Dependency Updates
    • Updated the version requirement for the pydflow dependency to >=1.8.95.

Signed-off-by: zjgemi <[email protected]>
Copy link

coderabbitai bot commented Oct 18, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

The changes involve modifications to the make_concurrent_learning_op function in submit.py to improve the selection logic for operations based on explore_style. A new export, RunLmpHDF5, is introduced in the __init__.py file, and the RunLmp class in run_lmp.py is enhanced with a subclass that supports HDF5 output. The execute method in RunLmp is updated to include a new method for processing model deviation files, and the argument handling in lmp_args is adjusted to accommodate HDF5 usage.

Changes

File Change Summary
dpgen2/entrypoint/submit.py Modified make_concurrent_learning_op to enhance operation selection based on explore_style.
dpgen2/op/init.py Added new export RunLmpHDF5 to the module.
dpgen2/op/run_lmp.py Enhanced RunLmp class with RunLmpHDF5 subclass; added methods get_model_devi and get_output_sign; modified execute and lmp_args methods.
dpgen2/exploration/render/traj_render_lammps.py Updated TrajRenderLammps class for type safety and improved temperature handling; modified several method signatures.
pyproject.toml Updated pydflow dependency version from >=1.8.88 to >=1.8.95.

Possibly related PRs

  • Fix bugs related to lammps PIMD #266: The changes in dpgen2/op/run_lmp.py related to the RunLmp class may interact with the modifications made to the same class in the main PR, particularly regarding the execute method and overall control flow for LAMMPS operations.

Suggested reviewers

  • wanghan-iapcm

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between cae6577 and 5cf58f3.

📒 Files selected for processing (1)
  • dpgen2/exploration/render/traj_render_lammps.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • dpgen2/exploration/render/traj_render_lammps.py

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
dpgen2/entrypoint/submit.py (1)

Line range hint 1-1000: Overall assessment of HDF5 implementation changes.

The changes in this file are focused on implementing HDF5 support in the concurrent learning workflow. The modifications are well-contained and consistent, affecting only the relevant parts of the make_concurrent_learning_op function. These changes enhance the flexibility of the system by allowing users to opt for HDF5 output when desired.

To ensure the robustness of this implementation, consider the following:

  1. Verify that the use_hdf5 flag is properly propagated from the user configuration to this point in the code.
  2. Ensure that any downstream code that processes the output of these operations is capable of handling both HDF5 and non-HDF5 formats.
  3. Update relevant documentation to reflect this new option for users.

To further improve this implementation, consider:

  1. Creating a configuration validation step to ensure use_hdf5 is always a boolean value.
  2. Implementing unit tests to verify the correct selection of HDF5 vs. non-HDF5 operations based on the configuration.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 336d385 and 899aaba.

📒 Files selected for processing (3)
  • dpgen2/entrypoint/submit.py (2 hunks)
  • dpgen2/op/init.py (1 hunks)
  • dpgen2/op/run_lmp.py (6 hunks)
🧰 Additional context used
🪛 Ruff
dpgen2/op/__init__.py

39-39: .run_lmp.RunLmpHDF5 imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

🔇 Additional comments (9)
dpgen2/entrypoint/submit.py (2)

Line range hint 236-236: Consistent application of HDF5 conditional logic for diffcsp exploration.

The same conditional logic for HDF5 usage has been applied to the diffcsp exploration style, selecting between RunRelaxHDF5 and RunRelax. This change maintains consistency with the previous modification and enhances the configurability of the exploration operations.


191-191: Conditional selection of RunLmpHDF5 or RunLmp based on use_hdf5 flag.

The code now conditionally selects between RunLmpHDF5 and RunLmp based on the use_hdf5 flag in the explore_config. This change allows for more flexible handling of exploration configurations, particularly when HDF5 format is preferred.

To ensure this change is consistently applied throughout the codebase, let's verify the usage of RunLmpHDF5:

✅ Verification successful

To accurately verify the usage of RunLmpHDF5, please execute the following updated shell scripts:


To ensure consistent and appropriate usage of RunLmpHDF5 across the codebase, please execute the following shell scripts to inspect its context in other files:


Conditional selection of RunLmpHDF5 confirmed as unique within submit.py.

The conditional usage is properly encapsulated and does not affect other parts of the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for other occurrences of RunLmpHDF5 and its conditional usage

# Search for RunLmpHDF5 usage
echo "Occurrences of RunLmpHDF5:"
rg --type python "RunLmpHDF5"

# Search for conditional usage similar to the one in this file
echo "\nConditional usage of RunLmpHDF5:"
rg --type python "RunLmpHDF5 if.*use_hdf5.*else RunLmp"

Length of output: 291


Script:

#!/bin/bash
# Description: Check for other occurrences of RunLmpHDF5 and its conditional usage without specifying file type

# Search for RunLmpHDF5 usage
echo "Occurrences of RunLmpHDF5:"
rg "RunLmpHDF5"

# Search for conditional usage similar to the one in this file
echo "\nConditional usage of RunLmpHDF5:"
rg "RunLmpHDF5 if.*use_hdf5.*else RunLmp"

Length of output: 550


Script:

#!/bin/bash
# Description: Inspect the context of RunLmpHDF5 usage in dpgen2/op/__init__.py and dpgen2/op/run_lmp.py

# Check usage in dpgen2/op/__init__.py
echo "Context of RunLmpHDF5 in dpgen2/op/__init__.py:"
rg "RunLmpHDF5" dpgen2/op/__init__.py -A 5 -B 5

# Check usage in dpgen2/op/run_lmp.py
echo "\nContext of RunLmpHDF5 in dpgen2/op/run_lmp.py:"
rg "RunLmpHDF5" dpgen2/op/run_lmp.py -A 5 -B 5

Length of output: 854

dpgen2/op/run_lmp.py (7)

17-17: Add numpy to the project's dependencies.

The import of numpy introduces a new dependency. Ensure that numpy is added to the project's requirements (e.g., requirements.txt or setup.py) so that it is available in all environments where this code runs.


30-30: Verify the availability of HDF5Datasets.

The HDF5Datasets class is imported from dflow.python. Confirm that this class is available in the version of dflow.python used by the project and that it functions as expected.


205-205: Refactor to use the get_model_devi method.

The execute method now uses self.get_model_devi to process the model_devi output. This change promotes extensibility by allowing subclasses to customize how model_devi is handled.


218-220: Introduce get_model_devi method for extensibility.

The get_model_devi method is added to the RunLmp class, returning the model_devi_file by default. This allows subclasses to override this method to customize the processing of model_devi.


252-258: Add use_hdf5 argument to support HDF5 output.

A new argument use_hdf5 is introduced in the lmp_args method to toggle the usage of HDF5 for storing trajectories and model deviations. Ensure that this parameter is correctly integrated into the task configuration and that users are aware of its functionality.


396-397: Ensure correct handling of HDF5Datasets artifacts.

In the overridden get_output_sign method of RunLmpHDF5, the types for traj and model_devi outputs are changed to Artifact(HDF5Datasets). Confirm that downstream components expect this artifact type and can handle HDF5 data appropriately.


400-401: ⚠️ Potential issue

Use appropriate method to read HDF5 files.

The get_model_devi method in RunLmpHDF5 uses np.loadtxt to read model_devi_file. Since HDF5 files are in binary format, np.loadtxt is not suitable for reading them. Consider using the h5py library or other appropriate methods to read HDF5 datasets.

Apply the following changes:

  • Import the h5py library.
  • Update the get_model_devi method to use h5py for reading the HDF5 file.
+import h5py

class RunLmpHDF5(RunLmp):
    @classmethod
    def get_output_sign(cls):
        output_sign = super().get_output_sign()
        output_sign["traj"] = Artifact(HDF5Datasets)
        output_sign["model_devi"] = Artifact(HDF5Datasets)
        return output_sign

    def get_model_devi(self, model_devi_file):
-       return np.loadtxt(model_devi_file)
+       with h5py.File(model_devi_file, 'r') as f:
+           data = f['/path/to/dataset'][...]
+       return data

Ensure to replace '/path/to/dataset' with the actual dataset path within the HDF5 file.

@@ -36,6 +36,7 @@
)
from .run_lmp import (
RunLmp,
RunLmpHDF5,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

Define __all__ in dpgen2/op/__init__.py to include RunLmpHDF5

The __all__ variable is not defined in dpgen2/op/__init__.py. To make RunLmpHDF5 part of the public API and resolve the unused import warning, consider adding an __all__ list that includes RunLmpHDF5 along with other public classes and functions.

Example:

__all__ = [
    "CollectData",
    "CollRunCaly",
    "DiffCSPGen",
    "PrepCalyDPOptim",
    "PrepCalyInput",
    "PrepCalyModelDevi",
    "PrepDPTrain",
    "PrepLmp",
    "PrepRelax",
    "RunCalyDPOptim",
    "RunCalyModelDevi",
    "RunDPTrain",
    "RunLmp",
    "RunLmpHDF5",
    "RunRelax",
    "RunRelaxHDF5",
    "SelectConfs",
]
🔗 Analysis chain

Consider adding RunLmpHDF5 to all if it's part of the public API

The addition of RunLmpHDF5 to the imports is good, as it makes the new functionality available to other parts of the package. However, the static analysis tool has flagged this import as unused within this file.

If RunLmpHDF5 is intended to be part of the public API of this package, consider adding it to the __all__ list (if one exists) to explicitly define what should be imported when using from dpgen2.op import *.

To verify if __all__ is defined in this file and if so, what it contains, you can run the following script:

If __all__ is not defined, consider adding it to explicitly control what's exported:

__all__ = [
    "CollectData",
    "CollRunCaly",
    "DiffCSPGen",
    "PrepCalyDPOptim",
    "PrepCalyInput",
    "PrepCalyModelDevi",
    "PrepDPTrain",
    "PrepLmp",
    "PrepRelax",
    "RunCalyDPOptim",
    "RunCalyModelDevi",
    "RunDPTrain",
    "RunLmp",
    "RunLmpHDF5",
    "RunRelax",
    "RunRelaxHDF5",
    "SelectConfs",
]
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for __all__ definition in __init__.py

# Test: Search for __all__ definition
rg --type python '__all__\s*=' dpgen2/op/__init__.py

# If __all__ is not found, suggest adding it
if [ $? -ne 0 ]; then
    echo "__all__ not found. Consider adding it with RunLmpHDF5 included."
fi

Length of output: 240


Script:

#!/bin/bash
# Description: Check for __all__ definition in __init__.py without specifying file type

# Test: Search for __all__ definition
rg '__all__\s*=' dpgen2/op/__init__.py

# Capture the exit status
if [ $? -ne 0 ]; then
    echo "__all__ not found. Consider adding it with RunLmpHDF5 included."
else
    echo "__all__ is defined."
fi

Length of output: 195

🧰 Tools
🪛 Ruff

39-39: .run_lmp.RunLmpHDF5 imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

Comment on lines +252 to +258
Argument(
"use_hdf5",
bool,
optional=True,
default=False,
doc=doc_use_hdf5,
),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Implement logic for use_hdf5 in the execute method.

The use_hdf5 parameter is added, but the execute method does not currently utilize it to alter the behavior of the class. Consider implementing conditional logic in the execute method to handle cases when use_hdf5 is True, such as processing outputs differently or initializing RunLmpHDF5.

Copy link

codecov bot commented Oct 18, 2024

Codecov Report

Attention: Patch coverage is 71.42857% with 6 lines in your changes missing coverage. Please review.

Project coverage is 83.65%. Comparing base (336d385) to head (5cf58f3).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
dpgen2/op/run_lmp.py 61.53% 5 Missing ⚠️
dpgen2/exploration/render/traj_render_lammps.py 87.50% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #267      +/-   ##
==========================================
- Coverage   83.70%   83.65%   -0.05%     
==========================================
  Files         104      104              
  Lines        6015     6028      +13     
==========================================
+ Hits         5035     5043       +8     
- Misses        980      985       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
dpgen2/exploration/render/traj_render_lammps.py (3)

Line range hint 74-92: Enhance documentation and error handling in get_ele_temp method.

The get_ele_temp method introduces new functionality, but its purpose and behavior are not immediately clear from the implementation.

Consider the following improvements:

  1. Add a docstring explaining the method's purpose, parameters, and return value.
  2. Document the significance of different use_ele_temp values (0, 1, 2).
  3. Consider using an enum for use_ele_temp to make the code more self-documenting.
  4. Add error handling for the case where optional_outputs is empty or None.
  5. Clarify the purpose and impact of the setup_ele_temp function calls.

Example docstring:

def get_ele_temp(self, optional_outputs):
    """
    Process optional output files to extract electronic temperature data.

    Args:
        optional_outputs (List[Path]): List of paths to optional output files.

    Returns:
        List[float]: List of electronic temperatures extracted from the files.

    Raises:
        ValueError: If use_ele_temp is not 0, 1, or 2.
    """
🧰 Tools
🪛 Ruff

62-65: Use ternary operator dd = fname.get_data() if isinstance(fname, HDF5Dataset) else np.loadtxt(fname) instead of if-else-block

(SIM108)

🪛 GitHub Check: pyright

[failure] 66-66:
Argument of type "Unknown | NDArray[float64] | None" cannot be assigned to parameter "a" of type "ArrayLike" in function "shape" (reportGeneralTypeIssues)


Line range hint 94-102: Improve documentation and error handling in set_ele_temp method.

The set_ele_temp method modifies the system.data dictionary based on use_ele_temp, but its behavior and purpose are not immediately clear.

Consider the following improvements:

  1. Add a docstring explaining the method's purpose, parameters, and side effects.
  2. Document the significance of different use_ele_temp values and their impact on 'fparam' and 'aparam'.
  3. Add error handling or logging for cases where use_ele_temp is 0 or any unexpected value.
  4. Consider using an enum for use_ele_temp to make the code more self-documenting.

Example docstring:

def set_ele_temp(self, system, ele_temp):
    """
    Set electronic temperature data in the system object.

    Args:
        system (dpdata.System): The system object to modify.
        ele_temp (float): The electronic temperature value to set.

    Side effects:
        Modifies system.data['fparam'] or system.data['aparam'] based on use_ele_temp.

    Note:
        This method has no effect if use_ele_temp is 0 or any value other than 1 or 2.
    """
🧰 Tools
🪛 Ruff

62-65: Use ternary operator dd = fname.get_data() if isinstance(fname, HDF5Dataset) else np.loadtxt(fname) instead of if-else-block

(SIM108)

🪛 GitHub Check: pyright

[failure] 66-66:
Argument of type "Unknown | NDArray[float64] | None" cannot be assigned to parameter "a" of type "ArrayLike" in function "shape" (reportGeneralTypeIssues)


Line range hint 104-140: Address potential issues and improve error handling in get_confs method.

The get_confs method has been updated to incorporate electronic temperature functionality, but there are some potential issues and areas for improvement.

Consider the following changes:

  1. Replace the assertion assert ntraj == len(optional_outputs) with a more robust check:
    if optional_outputs is not None and ntraj != len(optional_outputs):
        raise ValueError("Number of trajectories must match the number of optional outputs")
  2. Investigate and address the root cause of the type ignore comment on line 127. If possible, resolve the type mismatch without suppressing the type checker.
  3. Add error handling for cases where get_ele_temp or set_ele_temp might fail.
  4. Consider adding a docstring to explain the method's purpose, parameters, and return value.

Example docstring:

def get_confs(
    self,
    trajs: Union[List[Path], List[HDF5Dataset]],
    id_selected: List[List[int]],
    type_map: Optional[List[str]] = None,
    conf_filters: Optional["ConfFilters"] = None,
    optional_outputs: Optional[List[Path]] = None,
) -> dpdata.MultiSystems:
    """
    Extract configurations from trajectories based on selected IDs and apply filters.

    Args:
        trajs: List of trajectory files or HDF5Datasets.
        id_selected: List of lists containing selected frame IDs for each trajectory.
        type_map: Optional mapping of atom types.
        conf_filters: Optional filters to apply to configurations.
        optional_outputs: Optional list of paths to additional output files.

    Returns:
        A MultiSystems object containing the selected and filtered configurations.

    Raises:
        ValueError: If the number of trajectories doesn't match the number of optional outputs.
    """
🧰 Tools
🪛 Ruff

62-65: Use ternary operator dd = fname.get_data() if isinstance(fname, HDF5Dataset) else np.loadtxt(fname) instead of if-else-block

(SIM108)

🪛 GitHub Check: pyright

[failure] 66-66:
Argument of type "Unknown | NDArray[float64] | None" cannot be assigned to parameter "a" of type "ArrayLike" in function "shape" (reportGeneralTypeIssues)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 899aaba and 47a551d.

📒 Files selected for processing (2)
  • dpgen2/exploration/render/traj_render_lammps.py (1 hunks)
  • pyproject.toml (1 hunks)
🧰 Additional context used
🔇 Additional comments (1)
pyproject.toml (1)

20-20: Verify the necessity of updating pydflow dependency.

The pydflow dependency has been updated from '>=1.8.88' to '>=1.8.95'. This change might be necessary due to new features or bug fixes in pydflow that are required for this project.

To ensure this update is intentional and necessary, please run the following script:

Can you confirm if this update is required for any specific features or bug fixes in the latest pydflow version?

@@ -60,18 +60,18 @@

def _load_one_model_devi(self, fname, model_devi):
if isinstance(fname, HDF5Dataset):
dd = fname.get_data()
dd = fname.get_data() # type: ignore
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Investigate and address the root cause of the type ignore comment.

The addition of # type: ignore suggests a potential type mismatch that's being suppressed. While this allows the code to pass type checking, it may hide underlying issues.

Consider investigating the root cause of this type mismatch. If possible, modify the code to resolve the type issue without needing to suppress the type checker. If the suppression is absolutely necessary, add a comment explaining why it's needed and any potential risks.

Signed-off-by: zjgemi <[email protected]>
@wanghan-iapcm wanghan-iapcm merged commit 08d8d6e into deepmodeling:master Oct 21, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants