KEP-2170: Add unit and E2E tests for model and dataset initializers #2323

Open: seanlaii wants to merge 1 commit into master from initializer-test

seanlaii (Contributor) commented Nov 9, 2024

What this PR does / why we need it:
This PR adds unit tests and E2E tests for the model and dataset initializers.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #2305

Checklist:

  • Docs included if any changes are user facing

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign johnugeorge for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines 59 to 70
# Private HuggingFace dataset test
# (
#     "HuggingFace - Private dataset",
#     "huggingface",
#     {
#         "storage_uri": "hf://username/private-dataset",
#         "use_real_token": True,
#         "expected_files": ["config.json", "dataset.safetensors"],
#         "expected_error": None
#     }
# ),
# Invalid HuggingFace dataset test

Contributor Author

Do we have an access token for testing login and downloading resources from a private repo?

Member

Not yet. Maybe we can track this in a separate issue: we should create a Kubeflow-owned HuggingFace account for the token.

Comment on lines 19 to 21
current_dir = os.path.dirname(os.path.abspath(__file__))
self.temp_dir = tempfile.mkdtemp(dir=current_dir)
os.environ[VOLUME_PATH_DATASET] = self.temp_dir

Contributor Author
@seanlaii Nov 9, 2024

I currently test the dataset/model download by downloading resources to a temp folder and removing the temp folder after the test.
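
A minimal sketch of the temp-folder approach described above, using only the standard library. The helper temp_volume and the variable name EXAMPLE_DATASET_PATH are hypothetical placeholders; the tests in this PR use the VOLUME_PATH_DATASET constant instead.

```python
import contextlib
import os
import shutil
import tempfile

# Hypothetical stand-in for the VOLUME_PATH_DATASET constant used in the PR.
VOLUME_ENV_NAME = "EXAMPLE_DATASET_PATH"


@contextlib.contextmanager
def temp_volume(env_name=VOLUME_ENV_NAME, parent=None):
    """Create a temp dir, expose it through env_name, and clean up afterwards."""
    previous = os.environ.get(env_name)
    path = tempfile.mkdtemp(dir=parent)
    os.environ[env_name] = path
    try:
        yield path
    finally:
        # Restore the environment and delete the folder even if the test fails.
        if previous is None:
            os.environ.pop(env_name, None)
        else:
            os.environ[env_name] = previous
        shutil.rmtree(path, ignore_errors=True)


if __name__ == "__main__":
    with temp_volume() as path:
        # A download under test would write into `path` here.
        print(os.path.isdir(path))
```

Wrapping the cleanup in a finally block means a failed download does not leave folders behind next to the test files.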

Comment on lines 46 to 52
@pytest.fixture
def real_hf_token():
    """Fixture to provide real HuggingFace token for E2E tests"""
    token = os.getenv("HUGGINGFACE_TOKEN")
    # if not token:
    #     pytest.skip("HUGGINGFACE_TOKEN environment variable not set")
    return token

Contributor Author

If we have a private token, I will use this fixture to inject the token. If we don't, I can remove this.
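
Until a shared token exists, one option is to keep the private-repo tests in place and skip them when no token is configured. The sketch below uses only the standard library (pytest also collects unittest-style test classes). The helper read_hf_token and the class name are hypothetical; only the HUGGINGFACE_TOKEN variable name comes from the fixture under review.

```python
import os
import unittest

# Variable name taken from the fixture under review.
TOKEN_ENV = "HUGGINGFACE_TOKEN"


def read_hf_token(environ=os.environ):
    """Return the token, or None when it is unset or blank."""
    token = environ.get(TOKEN_ENV, "").strip()
    return token or None


class PrivateDatasetTest(unittest.TestCase):
    """Hypothetical test class for private-repo downloads."""

    def setUp(self):
        self.token = read_hf_token()
        if self.token is None:
            # Reported as skipped rather than failed when no token is set.
            self.skipTest(f"{TOKEN_ENV} environment variable not set")

    def test_token_is_available(self):
        # Placeholder: the real private download check would go here.
        self.assertTrue(self.token)
```

Skipping keeps CI green for contributors without credentials while still exercising the private path wherever a token is provided.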

coveralls commented Nov 9, 2024

Pull Request Test Coverage Report for Build 12346097008

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall first build on initializer-test at 100.0%

Totals Coverage Status
Change from base Build 12345273877: 100.0%
Covered Lines: 85
Relevant Lines: 85

💛 - Coveralls

          python3 -m pip install -e sdk/python; pytest -s sdk/python/test --log-cli-level=debug --namespace=default
        env:
          GANG_SCHEDULER_NAME: ${{ matrix.gang-scheduler-name }}

      - name: Run specific tests for Python 3.10+

Contributor Author

Since the match statement was introduced in Python 3.10, I created a separate step for the E2E tests.

Member

Where do you use match in the tests?

Contributor Author
@seanlaii Nov 27, 2024

Member

Oh, good point.
Let's actually use the same Python version that we use in our initializer images: https://github.com/kubeflow/training-operator/blob/master/cmd/initializer_v2/dataset/Dockerfile#L1.
E.g. Python 3.11

"HuggingFace - Public dataset",
"huggingface",
{
"storage_uri": "hf://karpathy/tiny_shakespeare",

Contributor Author

Does anyone know which HuggingFace dataset/model is suitable for a connectivity test?

Member

@seanlaii Which connectivity test do you want to perform?

Contributor Author

I would like to test the actual download process and would like to know whether there is a recommended dataset/model for testing. I currently use a dataset that is only 1.11 MB.
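
Whichever dataset is chosen, the download check itself can stay the same: compare the files under the target folder against an expected list and look at the total size so the test stays small. A rough sketch with hypothetical helper names follows; the expected-files idea follows the parametrized cases quoted earlier in this PR.

```python
import os
import tempfile


def list_files(root):
    """Return all files under root as sorted paths relative to root, using '/'."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


def missing_files(root, expected_files):
    """Return the expected files that were not found under root."""
    present = set(list_files(root))
    return sorted(f for f in expected_files if f not in present)


def total_size_bytes(root):
    """Sum the sizes of all files under root."""
    return sum(os.path.getsize(os.path.join(root, f)) for f in list_files(root))


if __name__ == "__main__":
    # Stand-in for a real download: write one small file into a temp folder.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "README.md"), "w") as fh:
            fh.write("hello")
        print(missing_files(root, ["README.md", "data.txt"]))
        print(total_size_bytes(root))
```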

@seanlaii force-pushed the initializer-test branch 4 times, most recently from 8930b80 to c6e0a83, on November 9, 2024 18:17

seanlaii (Contributor Author) commented Nov 26, 2024

Hi @andreyvelich,

Could you help review this PR? I have some questions. Once the SDK's PR gets approved, I will update this PR accordingly.

Thank you!

andreyvelich (Member) commented

@seanlaii Sorry for the delay. Sure, I will review it today.

Member
@andreyvelich left a comment

Thank you for this effort @seanlaii!
I left my initial thoughts.
Please take a look @Electronic-Waste @deepanker13 @kubeflow/wg-training-leads @varshaprasad96 @akshaychitneni @saileshd1402

.github/workflows/test-python.yaml (outdated, resolved)
pkg/initializer_v2/test/unit/dataset/test_dataset.py (outdated, resolved)
pkg/initializer_v2/test/unit/model/test_model_config.py (outdated, resolved)
pkg/initializer_v2/test/unit/model/test_model.py (outdated, resolved)
pkg/initializer_v2/test/unit/test_utils.py (outdated, resolved)

from sdk.python.kubeflow.storage_initializer.constants import VOLUME_PATH_MODEL


class TestModelE2E:

Member
@andreyvelich Nov 27, 2024

@seanlaii @kubeflow/wg-training-leads @deepanker13 @Electronic-Waste @saileshd1402 What do you think about using Kubernetes itself to run the E2E tests for our initializers?
E.g. we can deploy a single Pod that runs two initContainers for the initializers and one container that just verifies that the model and dataset exist under the /workspace/model and /workspace/dataset dirs.

That way, our E2Es verify that our Docker containers actually work to initialize assets.

Do we see any value in the tests I propose compared to just running the initializers' Python scripts?
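
To make the proposal concrete, here is a rough sketch of such a Pod, built as a plain Python dict and printed as JSON (kubectl accepts JSON manifests as well as YAML). Everything except the /workspace/model and /workspace/dataset paths is an assumption: the image names are placeholders, the shared emptyDir volume is one possible way to pass files between containers, and the environment variables that the initializers need (storage URI and so on) are left out.

```python
import json

# Placeholder image names; substitute the images built from the initializer Dockerfiles.
DATASET_IMAGE = "example.invalid/dataset-initializer:test"
MODEL_IMAGE = "example.invalid/model-initializer:test"


def build_verification_pod(name="initializer-e2e"):
    """Build a Pod with two initializer initContainers and one verifying container."""
    workspace = {"name": "workspace", "mountPath": "/workspace"}
    # Exits non-zero if either directory is missing or empty.
    check = (
        'test -n "$(ls -A /workspace/model)" && '
        'test -n "$(ls -A /workspace/dataset)"'
    )
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "volumes": [{"name": "workspace", "emptyDir": {}}],
            "initContainers": [
                {
                    "name": "dataset-initializer",
                    "image": DATASET_IMAGE,
                    "volumeMounts": [workspace],
                },
                {
                    "name": "model-initializer",
                    "image": MODEL_IMAGE,
                    "volumeMounts": [workspace],
                },
            ],
            "containers": [
                {
                    "name": "verify",
                    "image": "busybox",
                    "command": ["sh", "-c", check],
                    "volumeMounts": [workspace],
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_verification_pod(), indent=2))
```

With restartPolicy set to Never, the test could then wait for the Pod to reach the Succeeded phase and treat Failed as a test failure.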

pkg/initializer_v2/test/unit/dataset/test_dataset.py (outdated, resolved)

@seanlaii force-pushed the initializer-test branch 5 times, most recently from 228c4b6 to 08fbd57, on December 16, 2024 05:39

seanlaii (Contributor Author) commented

Hi @andreyvelich, could you help review the PR? I have addressed the comments. Thank you!

Successfully merging this pull request may close these issues.

KEP-2170: Add unit and E2E tests for model and dataset initializers
4 participants