Transforms 1.0.0a0 refactored language transforms #879

Merged
merged 135 commits into from
Dec 17, 2024
5ca102f
html2parquet example
touma-I Nov 13, 2024
ee1b3c5
fix typo with package name
touma-I Nov 13, 2024
244a7b3
remove output folder from git
touma-I Nov 16, 2024
3b8576d
Merge branch 'dev' into html2parquet-simplify
touma-I Nov 16, 2024
402cfb9
first draft make test-src working
touma-I Nov 17, 2024
078179a
fix docker build for all 3 runtime
touma-I Nov 17, 2024
a222cb3
fix publish-image
touma-I Nov 17, 2024
d3df908
use module name to invoke entry point
touma-I Nov 17, 2024
00c2d70
added kind-load-image required for workflow test
touma-I Nov 17, 2024
c128893
fix setting for transform runtime being used. The global setting is n…
touma-I Nov 17, 2024
b44fff1
override DOCKER_REMOTE_IMAGE defined in .make.defaults
touma-I Nov 17, 2024
6631851
chasing after bug with DOCKER_REMOTE_IMAGE value
touma-I Nov 17, 2024
1c58a5c
chasing after DOCKER_REMOTE_IMAGE
touma-I Nov 17, 2024
981930c
bug fixes
touma-I Nov 18, 2024
3481667
Modify notebook with simplified API
touma-I Nov 18, 2024
890a9da
keep for comparison
touma-I Nov 18, 2024
e99e0bc
fix typo
touma-I Nov 18, 2024
590f5dd
added __init__
touma-I Nov 18, 2024
777fb9d
no longer needed
touma-I Nov 18, 2024
d7dfa3e
Add test-image-python/ray/spark
touma-I Nov 19, 2024
f7fc598
first cut at refactoring pdf2parquet
touma-I Nov 19, 2024
a2835b3
added notebook and bug fixes
touma-I Nov 20, 2024
6cda6fb
fix typo
Nov 24, 2024
3c0187c
merge with dev with latest update for readme.md and notebook
touma-I Nov 24, 2024
8c8f6e8
first cut at refactoring doc_chunk
touma-I Nov 24, 2024
63051cb
some cleanup
touma-I Nov 24, 2024
012e254
first cut refactoring dpk_text_encoder
touma-I Nov 24, 2024
7202347
fix type
touma-I Nov 25, 2024
3d7cb02
added runtime notebooks
touma-I Nov 25, 2024
38f7e25
merge with latest from dev
touma-I Nov 27, 2024
7b2381d
merge with dev
touma-I Dec 4, 2024
9a28f50
update notebook for new structure
touma-I Dec 4, 2024
07345e6
merge with dev
touma-I Dec 4, 2024
5052622
refactored doc quality transform as its own module
touma-I Dec 4, 2024
70cc5a7
use latest make targets
touma-I Dec 4, 2024
ae79ab5
Enhance make target
touma-I Dec 4, 2024
d3bd5ec
merge with dev
touma-I Dec 4, 2024
3c9b999
refactored code as its own module
touma-I Dec 5, 2024
49a22ae
added __init__
touma-I Dec 5, 2024
24b6d9d
merge with latest from dev
touma-I Dec 5, 2024
808521a
Fix typo
touma-I Dec 5, 2024
b8f3b6f
remove spark unit test for now
touma-I Dec 5, 2024
3baad3e
Show example for running ray runtime
touma-I Dec 5, 2024
37881ef
fixing issues with spark
touma-I Dec 5, 2024
9c869ee
remove BASE_IMAGE arg from dockerfile.spark
touma-I Dec 6, 2024
6fa0c04
added login to quay.io
touma-I Dec 6, 2024
bbe9a02
debug registry credential
touma-I Dec 6, 2024
b77aaa3
use dpk secrets
touma-I Dec 6, 2024
2c52fd5
testing registry user
touma-I Dec 6, 2024
5ffadb4
testing registry user
touma-I Dec 6, 2024
8b234a4
testing environment secrets
touma-I Dec 6, 2024
878fa42
testing environment secrets
touma-I Dec 6, 2024
5632c06
testing environment secrets
touma-I Dec 6, 2024
6da893f
testing environment secrets
touma-I Dec 6, 2024
d426439
testing environment secrets
touma-I Dec 6, 2024
e8bb04a
Delete .github/workflows/test-universal-doc_id.yml
touma-I Dec 6, 2024
3647ebc
restore workflow file
touma-I Dec 6, 2024
adaf78f
clear testing of docker login
touma-I Dec 6, 2024
fda7999
first cut at refactoring with own dpk_lang_id name space
touma-I Dec 9, 2024
b50988b
restore missing file
touma-I Dec 9, 2024
7836832
Added README.md by combining python and ray
touma-I Dec 9, 2024
22d844a
README changes
shahrokhDaijavad Dec 9, 2024
af0f89b
Refactor hap transform as its own module
touma-I Dec 10, 2024
59a4fba
update cicd make file
touma-I Dec 10, 2024
0c38a4c
fix Makefile failing targets
touma-I Dec 10, 2024
dd3a63e
Merge branch 'dev' into hap-simplify
touma-I Dec 10, 2024
64781b9
fix notebook
touma-I Dec 10, 2024
fd0b261
merged with dev
touma-I Dec 10, 2024
e4ad78e
README changes
shahrokhDaijavad Dec 10, 2024
55dcce1
README changes
shahrokhDaijavad Dec 10, 2024
07b91ec
README changes
shahrokhDaijavad Dec 10, 2024
87eecf0
README changes
shahrokhDaijavad Dec 10, 2024
3b2420b
fix notebook
touma-I Dec 10, 2024
30912d1
More changes to README
shahrokhDaijavad Dec 10, 2024
2e324d4
README changes
shahrokhDaijavad Dec 10, 2024
62166d9
More README changes
shahrokhDaijavad Dec 10, 2024
02bc21a
added python notebook
touma-I Dec 10, 2024
ca3efdb
README changes
shahrokhDaijavad Dec 10, 2024
8ac2d8c
fix Makefile cli target although not very useful
touma-I Dec 10, 2024
d1ad598
README changes
shahrokhDaijavad Dec 10, 2024
0a0e785
added notebook and fix makefile
touma-I Dec 10, 2024
411bc19
test notebook with new simplified API
touma-I Dec 11, 2024
ac9b954
added hap extra to pip install
touma-I Dec 11, 2024
83d7e7c
fix pip install for notebook
touma-I Dec 11, 2024
97bc8f2
fix notebooks
touma-I Dec 11, 2024
127b290
fix notebook
touma-I Dec 11, 2024
0cbe63f
fix notebook
touma-I Dec 11, 2024
6c54442
added notebooks
touma-I Dec 11, 2024
0722d63
first cut at refactoring tokenization
touma-I Dec 11, 2024
18d0c46
fix script string
touma-I Dec 11, 2024
cbb68bc
added notebook
touma-I Dec 11, 2024
6f0dfef
README changes
shahrokhDaijavad Dec 11, 2024
73a69be
README fix of the link to the notebook
shahrokhDaijavad Dec 11, 2024
c59e5bb
fix typo in script name
touma-I Dec 11, 2024
43d792a
Merge branch 'dev' into pdf2parquet-simplify
touma-I Dec 11, 2024
2d4e3b4
fix notebook:
touma-I Dec 12, 2024
cabd577
fix notebook
touma-I Dec 12, 2024
a26715b
fix notebook
touma-I Dec 12, 2024
f877db4
fix notebook
touma-I Dec 12, 2024
f5940d1
fix notebook
touma-I Dec 12, 2024
42d3587
fix notebook
touma-I Dec 12, 2024
2aff557
fix notebook
touma-I Dec 12, 2024
70e2506
fix notebook
touma-I Dec 12, 2024
c251033
fix notebook
touma-I Dec 12, 2024
4750dd7
added ray notebook
touma-I Dec 12, 2024
2eb47bd
Added the link to the Ray notebook in README
shahrokhDaijavad Dec 12, 2024
0c3ae86
Fix make=cli-sample
matouma Dec 13, 2024
5d7f901
fix sample target
matouma Dec 13, 2024
b2ce1b6
fix sample target
matouma Dec 13, 2024
aa97a17
fix sample targets
matouma Dec 13, 2024
07bc76b
fix script string
matouma Dec 13, 2024
7965b5e
fix wf exec script
matouma Dec 14, 2024
e144927
fix pdf2parquet script
matouma Dec 14, 2024
664082c
Merge branch 'html2parquet-simplify' into alpha-1.0
matouma Dec 14, 2024
8006919
merge doc_chunk
matouma Dec 14, 2024
0b56c8e
merge from dev after new release
matouma Dec 14, 2024
9feee2d
added text_encoder
matouma Dec 14, 2024
c5c1540
added doc_id
matouma Dec 14, 2024
f4773d4
merge doc_quality
matouma Dec 14, 2024
d571654
Merged hap
matouma Dec 14, 2024
327a5be
renamed kfp_ray.disabled
matouma Dec 14, 2024
4ff3ec7
removed kfp_ray.disable
matouma Dec 14, 2024
e9321a1
added hap to block list
matouma Dec 14, 2024
4873ed6
changed folder location for test-data
matouma Dec 14, 2024
98c8019
fix exec script for doc_quality
matouma Dec 14, 2024
45ac542
added lang_id
matouma Dec 14, 2024
352e267
fix dcc_quality kfp makefile
matouma Dec 15, 2024
b7c74f9
merge tokenization
matouma Dec 15, 2024
4014a3d
fix reference to test data
matouma Dec 15, 2024
0cae865
fix path to bad word filepath
touma-I Dec 16, 2024
7fdfc39
merge from dev
touma-I Dec 16, 2024
86e1789
merge with latest from dev
touma-I Dec 17, 2024
2748821
merge with dev and remove tokenization transform
touma-I Dec 17, 2024
9c2e3a4
added notebook
touma-I Dec 17, 2024
6a43092
update dependencies
touma-I Dec 17, 2024
10 changes: 5 additions & 5 deletions .make.versions
@@ -16,10 +16,10 @@ DPK_MAJOR_VERSION=0
# The minor version is incremented manually when significant features have been added that are backward compatible with the previous major.minor release.
DPK_MINOR_VERSION=2
# The minor version is incremented AUTOMATICALLY by the release.sh script when a new release is set.
DPK_MICRO_VERSION=3
DPK_MICRO_VERSION=4
# The suffix is generally always set in the main/development branch and only nulled out when creating release branches.
# It can be manually incremented, for example, to allow publishing a new intermediate version wheel to pypi.
DPK_VERSION_SUFFIX=
DPK_VERSION_SUFFIX=.dev0

DPK_VERSION=$(DPK_MAJOR_VERSION).$(DPK_MINOR_VERSION).$(DPK_MICRO_VERSION)$(DPK_VERSION_SUFFIX)

@@ -36,8 +36,8 @@ DPK_LIB_KFP_VERSION=$(DPK_VERSION)
DPK_LIB_KFP_VERSION_v2=$(DPK_VERSION)
DPK_LIB_KFP_SHARED=$(DPK_VERSION)

KFP_DOCKER_VERSION=$(DOCKER_IMAGE_VERSION)
KFP_DOCKER_VERSION_v2=$(DOCKER_IMAGE_VERSION)
KFP_DOCKER_VERSION=0.2.3
KFP_DOCKER_VERSION_v2=0.2.3

DPK_CONNECTOR_VERSION=0.2.4.dev0

@@ -66,4 +66,4 @@ endif
#
# If you change the versions numbers, be sure to run "make set-versions" to
# update version numbers across the transform (e.g., pyproject.toml).
TRANSFORMS_PKG_VERSION=0.2.3
TRANSFORMS_PKG_VERSION=1.0.0a0
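As a rough illustration of how the four fields in `.make.versions` combine into `DPK_VERSION`, the sketch below mirrors the `$(DPK_MAJOR_VERSION).$(DPK_MINOR_VERSION).$(DPK_MICRO_VERSION)$(DPK_VERSION_SUFFIX)` expression in Python. The function name `compose_version` is hypothetical and does not exist in the repository. Note that `TRANSFORMS_PKG_VERSION` is set separately and is not derived from these fields.

```python
def compose_version(major: int, minor: int, micro: int, suffix: str = "") -> str:
    """Join the version fields the same way the Makefile expression does."""
    return f"{major}.{minor}.{micro}{suffix}"


# Values after this change: micro bumped to 4, suffix set to .dev0.
dev_version = compose_version(0, 2, 4, ".dev0")
# A release branch nulls out the suffix, per the comment in .make.versions.
release_version = compose_version(0, 2, 4)
print(dev_version, release_version)
```

With the values in this diff, the development branch resolves to `0.2.4.dev0`, which matches the version strings written into the `pyproject.toml` files.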
2 changes: 1 addition & 1 deletion data-processing-lib/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "data_prep_toolkit"
version = "0.2.3"
version = "0.2.4.dev0"
keywords = ["data", "data preprocessing", "data preparation", "llm", "generative", "ai", "fine-tuning", "llmapps" ]
requires-python = ">=3.10,<3.13"
description = "Data Preparation Toolkit Library for Ray and Python"
2 changes: 1 addition & 1 deletion kfp/kfp_ray_components/createRayClusterComponent.yaml
@@ -11,7 +11,7 @@ inputs:

implementation:
container:
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
# command is a list of strings (command-line arguments).
# The YAML language has two syntaxes for lists and you can use either of them.
# Here we use the "flow syntax" - comma-separated strings inside square brackets.
2 changes: 1 addition & 1 deletion kfp/kfp_ray_components/deleteRayClusterComponent.yaml
@@ -9,7 +9,7 @@ inputs:

implementation:
container:
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
# command is a list of strings (command-line arguments).
# The YAML language has two syntaxes for lists and you can use either of them.
# Here we use the "flow syntax" - comma-separated strings inside square brackets.
2 changes: 1 addition & 1 deletion kfp/kfp_ray_components/executeRayJobComponent.yaml
@@ -12,7 +12,7 @@ inputs:

implementation:
container:
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
# command is a list of strings (command-line arguments).
# The YAML language has two syntaxes for lists and you can use either of them.
# Here we use the "flow syntax" - comma-separated strings inside square brackets.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ inputs:

implementation:
container:
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
# command is a list of strings (command-line arguments).
# The YAML language has two syntaxes for lists and you can use either of them.
# Here we use the "flow syntax" - comma-separated strings inside square brackets.
2 changes: 1 addition & 1 deletion kfp/kfp_ray_components/executeSubWorkflowComponent.yaml
@@ -27,7 +27,7 @@ outputs:

implementation:
container:
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"
image: "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
# command is a list of strings (command-line arguments).
# The YAML language has two syntaxes for lists, and you can use either of them.
# Here we use the "flow syntax" - comma-separated strings inside square brackets.
4 changes: 2 additions & 2 deletions kfp/kfp_support_lib/kfp_v1_workflow_support/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "data_prep_toolkit_kfp_v1"
version = "0.2.3"
version = "0.2.4.dev0"
requires-python = ">=3.10,<3.13"
description = "Data Preparation Kit Library. KFP support"
license = {text = "Apache-2.0"}
@@ -13,7 +13,7 @@ authors = [
]
dependencies = [
"kfp==1.8.22",
"data-prep-toolkit-kfp-shared==0.2.3",
"data-prep-toolkit-kfp-shared==0.2.4.dev0",
]

[build-system]
4 changes: 2 additions & 2 deletions kfp/kfp_support_lib/kfp_v2_workflow_support/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "data_prep_toolkit_kfp_v2"
version = "0.2.3"
version = "0.2.4.dev0"
requires-python = ">=3.10,<3.13"
description = "Data Preparation Kit Library. KFP support"
license = {text = "Apache-2.0"}
@@ -14,7 +14,7 @@ authors = [
dependencies = [
"kfp==2.8.0",
"kfp-kubernetes==1.2.0",
"data-prep-toolkit-kfp-shared==0.2.3",
"data-prep-toolkit-kfp-shared==0.2.4.dev0",
]

[build-system]
4 changes: 2 additions & 2 deletions kfp/kfp_support_lib/shared_workflow_support/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "data_prep_toolkit_kfp_shared"
version = "0.2.3"
version = "0.2.4.dev0"
requires-python = ">=3.10,<3.13"
description = "Data Preparation Kit Library. KFP support"
license = {text = "Apache-2.0"}
@@ -14,7 +14,7 @@ authors = [
dependencies = [
"requests",
"kubernetes",
"data-prep-toolkit[ray]>=0.2.3",
"data-prep-toolkit[ray]>=0.2.4.dev0",
]

[build-system]
2 changes: 1 addition & 1 deletion scripts/check-workflows.sh
@@ -17,7 +17,7 @@ if [ ! -d transforms ]; then
echo Please run this script from the top of the repository
exit 1
fi
KFP_BLACK_LIST="doc_chunk pdf2parquet pii_redactor text_encoder license_select repo_level_ordering header_cleanser fdedup"
KFP_BLACK_LIST="doc_chunk pdf2parquet pii_redactor text_encoder license_select repo_level_ordering header_cleanser fdedup hap"
while [ $# -ne 0 ]; do
case $1 in
-show-kfp-black-list) echo $KFP_BLACK_LIST; exit 0;
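The diff only shows the list itself and the `-show-kfp-black-list` option, not how the rest of the script consumes it. As a minimal sketch, assuming the list is matched by whole transform name, a membership check could look like the following. The helper `is_blocked` is hypothetical and is not part of `check-workflows.sh`.

```python
# Space-separated names, copied from the updated KFP_BLACK_LIST value.
KFP_BLACK_LIST = (
    "doc_chunk pdf2parquet pii_redactor text_encoder license_select "
    "repo_level_ordering header_cleanser fdedup hap"
)


def is_blocked(transform: str, black_list: str = KFP_BLACK_LIST) -> bool:
    """Return True when the name appears as a whole word in the list."""
    # Splitting first avoids substring hits such as "doc" matching "doc_chunk".
    return transform in black_list.split()


print(is_blocked("hap"))
print(is_blocked("noop"))
```

Splitting on whitespace before testing membership is the design choice worth noting here: a plain substring test on the raw string would wrongly match partial names.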
18 changes: 9 additions & 9 deletions scripts/k8s-setup/populate_minio.sh
@@ -30,19 +30,19 @@ mc cp --recursive ${REPOROOT}/transforms/code/repo_level_ordering/ray/test-data/
mc cp --recursive ${REPOROOT}/transforms/code/license_select/ray/test-data/input/ kfp/test/license_select/input
mc cp --recursive ${REPOROOT}/transforms/code/license_select/ray/test-data/sample_approved_licenses.json kfp/test/license_select/
# language
mc cp --recursive ${REPOROOT}/transforms/language/lang_id/ray/test-data/input/ kfp/test/lang_id/input
mc cp --recursive ${REPOROOT}/transforms/language/doc_quality/ray/test-data/input/ kfp/test/doc_quality/input
mc cp --recursive ${REPOROOT}/transforms/language/pdf2parquet/ray/test-data/input/2206.01062.pdf kfp/test/pdf2parquet/input
mc cp --recursive ${REPOROOT}/transforms/language/text_encoder/ray/test-data/input/ kfp/test/text_encoder/input
mc cp --recursive ${REPOROOT}/transforms/language/doc_chunk/ray/test-data/input/ kfp/test/doc_chunk/input
mc cp --recursive ${REPOROOT}/transforms/language/html2parquet/ray/test-data/input/test1.html kfp/test/html2parquet/input
mc cp --recursive ${REPOROOT}/transforms/language/lang_id/test-data/input/ kfp/test/lang_id/input
mc cp --recursive ${REPOROOT}/transforms/language/doc_quality/test-data/input/ kfp/test/doc_quality/input
mc cp --recursive ${REPOROOT}/transforms/language/pdf2parquet/test-data/input/2206.01062.pdf kfp/test/pdf2parquet/input
mc cp --recursive ${REPOROOT}/transforms/language/text_encoder/test-data/input/ kfp/test/text_encoder/input
mc cp --recursive ${REPOROOT}/transforms/language/doc_chunk/test-data/input/ kfp/test/doc_chunk/input
mc cp --recursive ${REPOROOT}/transforms/language/html2parquet/test-data/input/test1.html kfp/test/html2parquet/input
# universal
mc cp --recursive ${REPOROOT}/transforms/universal/doc_id/ray/test-data/input/ kfp/test/doc_id/input
mc cp --recursive ${REPOROOT}/transforms/universal/doc_id/test-data/input/ kfp/test/doc_id/input
mc cp --recursive ${REPOROOT}/transforms/universal/ededup/ray/test-data/input/ kfp/test/ededup/input
mc cp --recursive ${REPOROOT}/transforms/universal/fdedup/ray/test-data/input/ kfp/test/fdedup/input
mc cp --recursive ${REPOROOT}/transforms/universal/filter/ray/test-data/input/ kfp/test/filter/input
mc cp --recursive ${REPOROOT}/transforms/universal/noop/ray/test-data/input/ kfp/test/noop/input
mc cp --recursive ${REPOROOT}/transforms/universal/tokenization/ray/test-data/ds01/input/ kfp/test/tokenization/ds01/input
mc cp --recursive ${REPOROOT}/transforms/universal/tokenization/test-data/ds01/input/ kfp/test/tokenization/ds01/input
mc cp --recursive ${REPOROOT}/transforms/universal/profiler/ray/test-data/input/ kfp/test/profiler/input
mc cp --recursive ${REPOROOT}/transforms/universal/resize/ray/test-data/input/ kfp/test/resize/input
mc cp --recursive ${REPOROOT}/transforms/universal/hap/ray/test-data/input/ kfp/test/hap/input
mc cp --recursive ${REPOROOT}/transforms/universal/hap/test-data/input/ kfp/test/hap/input
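For the refactored transforms, the change in `populate_minio.sh` amounts to dropping the `ray` directory that used to sit directly above `test-data`. The sketch below expresses that path rewrite in Python. The function `drop_runtime_dir` is a hypothetical helper for illustration only; the script itself simply hard-codes the new paths.

```python
from pathlib import PurePosixPath


def drop_runtime_dir(path: str, runtime: str = "ray") -> str:
    """Remove the runtime directory that directly precedes 'test-data'."""
    parts = list(PurePosixPath(path).parts)
    for i in range(len(parts) - 1):
        if parts[i] == runtime and parts[i + 1] == "test-data":
            del parts[i]
            break
    return str(PurePosixPath(*parts))


old = "transforms/language/lang_id/ray/test-data/input"
print(drop_runtime_dir(old))
```

Paths that do not contain `ray/test-data` come back unchanged. In the diff, transforms that have not been refactored yet (for example `ededup`, `fdedup`, `filter`, `noop`, `profiler`, and `resize`) still keep the `ray/test-data` layout.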
89 changes: 52 additions & 37 deletions transforms/.make.cicd.targets
@@ -51,63 +51,78 @@ publish:

test-image-sequence:: .defaults.lib-whl-image .transforms.test-image-help .transforms.clean

test-image-python:
$(MAKE) BUILD_SPECIFIC_RUNTIME=python test-image

test-image-ray:
$(MAKE) BUILD_SPECIFIC_RUNTIME=ray test-image

test-image-spark:
$(MAKE) BUILD_SPECIFIC_RUNTIME=spark test-image

test-image:: .default.build-lib-wheel
@if [ -e Dockerfile.python ]; then \
$(MAKE) DOCKER_FILE=Dockerfile.python \
TRANSFORM_RUNTIME_SRC_FILE=$(TRANSFORM_PYTHON_SRC) \
DOCKER_IMAGE_NAME=$(TRANSFORM_NAME)-python \
test-image-sequence ; \
@if [ -z "$(BUILD_SPECIFIC_RUNTIME)" ] || [ "$(BUILD_SPECIFIC_RUNTIME)" == "python" ]; then \
if [ -e Dockerfile.python ]; then \
$(MAKE) DOCKER_FILE=Dockerfile.python \
TRANSFORM_RUNTIME_SRC_FILE=$(TRANSFORM_PYTHON_SRC) \
DOCKER_IMAGE_NAME=$(TRANSFORM_NAME)-python \
test-image-sequence ; \
fi ;\
fi
@if [ -e Dockerfile.ray ]; then \
$(MAKE) DOCKER_FILE=Dockerfile.ray \
TRANSFORM_RUNTIME_SRC_FILE=$(TRANSFORM_RAY_SRC) \
DOCKER_IMAGE_NAME=$(TRANSFORM_NAME)-ray \
BASE_IMAGE=$(RAY_BASE_IMAGE) \
test-image-sequence ; \
@if [ -z "$(BUILD_SPECIFIC_RUNTIME)" ] || [ "$(BUILD_SPECIFIC_RUNTIME)" == "ray" ]; then \
if [ -e Dockerfile.ray ]; then \
$(MAKE) DOCKER_FILE=Dockerfile.ray \
TRANSFORM_RUNTIME_SRC_FILE=$(TRANSFORM_RAY_SRC) \
DOCKER_IMAGE_NAME=$(TRANSFORM_NAME)-ray \
BASE_IMAGE=$(RAY_BASE_IMAGE) \
test-image-sequence ; \
fi ;\
fi
@if [ -e Dockerfile.spark ]; then \
$(MAKE) DOCKER_FILE=Dockerfile.spark \
TRANSFORM_RUNTIME_SRC_FILE=$(TRANSFORM_SPARK_SRC) \
DOCKER_IMAGE_NAME=$(TRANSFORM_NAME)-spark \
BASE_IMAGE=$(SPARK_BASE_IMAGE) \
test-image-sequence ; \
@if [ -z "$(BUILD_SPECIFIC_RUNTIME)" ] || [ "$(BUILD_SPECIFIC_RUNTIME)" == "spark" ]; then \
if [ -e Dockerfile.spark ]; then \
$(MAKE) DOCKER_FILE=Dockerfile.spark \
TRANSFORM_RUNTIME_SRC_FILE=$(TRANSFORM_SPARK_SRC) \
DOCKER_IMAGE_NAME=$(TRANSFORM_NAME)-spark \
BASE_IMAGE=$(SPARK_BASE_IMAGE) \
test-image-sequence ; \
fi ;\
fi
-rm -rf data-processing-dist


image-python:
@if [ -e Dockerfile.python ]; then \
$(MAKE) DOCKER_FILE=Dockerfile.python \
DOCKER_IMAGE_NAME=$(TRANSFORM_NAME)-python \
.defaults.lib-whl-image ; \
fi
$(MAKE) BUILD_SPECIFIC_RUNTIME=python image

image-ray:
@if [ -e Dockerfile.ray ]; then \
$(MAKE) DOCKER_FILE=Dockerfile.ray \
DOCKER_IMAGE_NAME=$(TRANSFORM_NAME)-ray \
BASE_IMAGE=$(RAY_BASE_IMAGE) \
.defaults.lib-whl-image ; \
fi
$(MAKE) BUILD_SPECIFIC_RUNTIME=ray image

image-spark:
@if [ -e Dockerfile.spark ]; then \
$(MAKE) DOCKER_FILE=Dockerfile.spark \
DOCKER_IMAGE_NAME=$(TRANSFORM_NAME)-spark \
BASE_IMAGE=$(SPARK_BASE_IMAGE) \
.defaults.lib-whl-image ; \
fi
$(MAKE) BUILD_SPECIFIC_RUNTIME=spark image

image:: .default.build-lib-wheel
## Build all possible images unless a specific runtime is specified
@if [ -z "$(BUILD_SPECIFIC_RUNTIME)" ] || [ "$(BUILD_SPECIFIC_RUNTIME)" == "python" ]; then \
$(MAKE) image-python ; \
if [ -e Dockerfile.python ]; then \
$(MAKE) DOCKER_FILE=Dockerfile.python \
DOCKER_IMAGE_NAME=$(TRANSFORM_NAME)-python \
.defaults.lib-whl-image ; \
fi ; \
fi
@if [ -z "$(BUILD_SPECIFIC_RUNTIME)" ] || [ "$(BUILD_SPECIFIC_RUNTIME)" == "ray" ]; then \
$(MAKE) image-ray ; \
if [ -e Dockerfile.ray ]; then \
$(MAKE) DOCKER_FILE=Dockerfile.ray \
DOCKER_IMAGE_NAME=$(TRANSFORM_NAME)-ray \
BASE_IMAGE=$(RAY_BASE_IMAGE) \
.defaults.lib-whl-image ; \
fi ; \
fi
@if [ -z "$(BUILD_SPECIFIC_RUNTIME)" ] || [ "$(BUILD_SPECIFIC_RUNTIME)" == "spark" ]; then \
$(MAKE) image-spark ; \
if [ -e Dockerfile.spark ]; then \
$(MAKE) DOCKER_FILE=Dockerfile.spark \
DOCKER_IMAGE_NAME=$(TRANSFORM_NAME)-spark \
BASE_IMAGE=$(SPARK_BASE_IMAGE) \
.defaults.lib-whl-image ; \
fi ; \
fi
-rm -rf data-processing-dist
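The selection logic shared by the new `image` and `test-image` targets can be summarized as: build every runtime when `BUILD_SPECIFIC_RUNTIME` is empty, otherwise only the named one, and in both cases skip a runtime whose `Dockerfile.<runtime>` is missing. The Python sketch below models that decision only. The function `runtimes_to_build` and its arguments are hypothetical names chosen for illustration, and no `make` or Docker invocation is performed.

```python
from typing import Iterable, List

RUNTIMES = ("python", "ray", "spark")


def runtimes_to_build(build_specific_runtime: str, dockerfiles: Iterable[str]) -> List[str]:
    """Return the runtimes whose images would be built.

    An empty build_specific_runtime selects every runtime; otherwise only
    the named one. A runtime is skipped when its Dockerfile is absent.
    """
    present = set(dockerfiles)
    selected = []
    for runtime in RUNTIMES:
        wanted = build_specific_runtime == "" or build_specific_runtime == runtime
        if wanted and f"Dockerfile.{runtime}" in present:
            selected.append(runtime)
    return selected


files = ["Dockerfile.python", "Dockerfile.ray"]
print(runtimes_to_build("", files))
print(runtimes_to_build("ray", files))
print(runtimes_to_build("spark", files))
```

Under this model, `make image-ray` corresponds to calling the function with `"ray"`, and a plain `make image` corresponds to the empty string.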

16 changes: 16 additions & 0 deletions transforms/Makefile.transform.template
@@ -0,0 +1,16 @@
REPOROOT=../../..
# Use make help, to see the available rules
include $(REPOROOT)/transforms/.make.cicd.targets

#
# This is intended to be included across the Makefiles provided within
# a given transform's directory tree, so must use compatible syntax.
#
################################################################################
# This defines the name of the transform and is used to match against
# expected files and is used to define the transform's image name.
TRANSFORM_NAME=$(shell basename `pwd`)

################################################################################
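The template derives `TRANSFORM_NAME` from the name of the directory in which `make` runs, via `basename` of `pwd`. A minimal Python equivalent is sketched below. The helper `transform_name` is hypothetical, and the example path is only an assumption about where a transform's Makefile might live.

```python
from pathlib import Path


def transform_name(directory: Path) -> str:
    """Return the last path component, like `basename` on a directory."""
    return directory.name


# In the Makefile the directory is the current working directory;
# a fixed path is used here so the example is reproducible.
print(transform_name(Path("transforms/language/html2parquet")))
```

One consequence of this design is that the directory name and the image name stay in sync automatically, so renaming a transform's directory also renames its image.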


2 changes: 1 addition & 1 deletion transforms/code/code2parquet/kfp_ray/code2parquet_wf.py
@@ -25,7 +25,7 @@


# components
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"

# path to kfp component specifications files
component_spec_path = "../../../../kfp/kfp_ray_components/"
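Several files in this change switch the KFP base image from the pinned `0.2.3` tag to `latest`. The sketch below shows one simple way to separate an image reference into its name and tag, which can help when auditing which workflows still pin a version. The helper `split_image_tag` is hypothetical; it handles only the `name:tag` form seen in this diff and does not handle digest references (`@sha256:...`).

```python
def split_image_tag(image: str) -> tuple[str, str]:
    """Split 'name:tag' into (name, tag); default to 'latest' when untagged."""
    name, sep, tag = image.rpartition(":")
    # No colon at all, or the colon belongs to a registry port (host:5000/img).
    if not sep or "/" in tag:
        return image, "latest"
    return name, tag


old_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"
new_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"
print(split_image_tag(old_image))
print(split_image_tag(new_image))
```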
2 changes: 1 addition & 1 deletion transforms/code/code2parquet/python/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_code2parquet_transform_python"
version = "0.2.3"
version = "0.2.4.dev0"
requires-python = ">=3.10,<3.13"
description = "code2parquet Python Transform"
license = {text = "Apache-2.0"}
6 changes: 3 additions & 3 deletions transforms/code/code2parquet/ray/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_code2parquet_transform_ray"
version = "0.2.3"
version = "0.2.4.dev0"
requires-python = ">=3.10,<3.13"
description = "code2parquet Ray Transform"
license = {text = "Apache-2.0"}
@@ -10,8 +10,8 @@ authors = [
{ name = "Boris Lublinsky", email = "[email protected]" },
]
dependencies = [
"data-prep-toolkit[ray]>=0.2.3",
"dpk-code2parquet-transform-python==0.2.3",
"data-prep-toolkit[ray]>=0.2.4.dev0",
"dpk-code2parquet-transform-python==0.2.4.dev0",
"parameterized",
"pandas",
]
2 changes: 1 addition & 1 deletion transforms/code/code_profiler/python/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_code_profiler_transform_python"
version = "0.2.3"
version = "0.2.4.dev0"
requires-python = ">=3.10,<3.13"
description = "Code Profiler Python Transform"
license = {text = "Apache-2.0"}
6 changes: 3 additions & 3 deletions transforms/code/code_profiler/ray/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_code_profiler_transform_ray"
version = "0.2.3"
version = "0.2.4.dev0"
requires-python = ">=3.10,<3.13"
description = "Code Profiler Ray Transform"
license = {text = "Apache-2.0"}
@@ -9,8 +9,8 @@ authors = [
{ name = "Pankaj Thorat", email = "[email protected]" },
]
dependencies = [
"dpk-code-profiler-transform-python==0.2.3",
"data-prep-toolkit[ray]>=0.2.3",
"dpk-code-profiler-transform-python==0.2.4.dev0",
"data-prep-toolkit[ray]>=0.2.4.dev0",
]

[build-system]
2 changes: 1 addition & 1 deletion transforms/code/code_quality/kfp_ray/code_quality_wf.py
@@ -24,7 +24,7 @@
task_image = "quay.io/dataprep1/data-prep-kit/code_quality-ray:latest"

# components
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:0.2.3"
base_kfp_image = "quay.io/dataprep1/data-prep-kit/kfp-data-processing:latest"

# path to kfp component specifications files
component_spec_path = "../../../../kfp/kfp_ray_components/"
2 changes: 1 addition & 1 deletion transforms/code/code_quality/python/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "dpk_code_quality_transform_python"
version = "0.2.3"
version = "0.2.4.dev0"
requires-python = ">=3.10,<3.13"
description = "Code Quality Python Transform"
license = {text = "Apache-2.0"}