-
Notifications
You must be signed in to change notification settings - Fork 344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make generic pipeline work with airflow2x #3167
Make generic pipeline work with airflow2x #3167
Conversation
One thing I forgot to note is that |
Regarding KubernetesPodOperator location: Yes, good point. I missed that in the release notes. Well, 2.5.x is already well past, plus there were some security bugs in 2.5.x that someone in our org mentioned that make it a good idea to go further than 2.5.x. Airflow 2.6.0 came out in May, so we can assume it is current. …@lresende what do you think in terms of which Airflow 2.x release we are targeting in our efforts here? greater or equal to 2.6.0? |
4cbf21d
to
4b8386f
Compare
I did approximatly the same thing in an air gapped environment so it would be a problem for me to commit my code but I think I can help.
In this case, the UI doesn't have an option for CPU and Memory limits, so the limits could probably be removed (In my k8s cluster a limit must be set so I just set it in some ratio of the request). I lack the knowledge to change the UI to include limits but I'm willing to help with anything else! |
@giladd123 agreed, setting limits is good practice, even when not enforced by LimitRange and max ratio request to limit.
For the template, setting limits to either equal to requests or minimally higher by a factor of x 1.2 would be good practice and make the cluster as a whole more stable. In the end, developers sets those requests in the GUI and they must be admonished to set realistic request sizes. With a limit set on more than 2 or 3 times the request, they'll realize soon they are wrong in their pipeline steps definition for resources whne competing resources lead to e.g. node OOM. |
Altough being a good practice, I don't think setting a hard ratio is a good idea as in the end this gives the user less flexability (for example, my cluster enforces a 1:4 ratio on cpu request and limit, with a 1.2x jobs will just not go up). I don't think we should force good practices on users, we should let them deal with this themselves. |
@giladd123 Agreed, however, not setting limits at all leads to pods being scheduled on a node that they really should not be scheduled on. In your example of 1 to 4 ratio, should all pods ever run on the same node at the same time, you're in for huge trouble, having a node outage and re-scheduling, leading to interruptions. We never went beyond 1 to 2 ratio in our production and test clusters. As you mentioned, it'd be best to at least have the option to set limits at the GUI level. |
Where we are with this? I am assuming still a WIP and we are waiting for more changes before a full review? |
Hi, is there any update? |
@lresende yes, it is WIP still, in large part due to the fact that I have not received any feedback, neither from ODH nor here, on why CPU and Memory limits are not in the variables and thus cannot be used at least optionally. In Kubeflow notebooks, limits can be used, why not in pipelines? @harshad16 My aim is to conceptually keep Airflow 2.x runtime code as much aligned as possible with the Redhat KFP side. |
I am leaving out the issue of resource limits for now. My aim is to test build this and integrate it into an open data hub notebook container, then test it with our Airflow 2.6.2 instance that is reading the Elyra-generated DAGs from our Gitlab company-internally. PIP_INDEX_URL or similar. You can probably tell I never installed from dev builds before ... |
Testing with a built .whl file
and integrated it into Open Data Hub Notebook Containers build process.
getting there. Muuch better. Just tested this and realized I forgot about what Gilad told me back then regarding changes to resources assembly, making the changes now and will test the locally-built wheel file once more with Airflow 2.6.x then, next week. i.e. in the airflow template file, for now only with that gpu limit, no cpu and memory limits as we do not have them yet in GUI.
I don't use GPUs, but I want to find out before I make the whl again whether there is also a property for the kind of gpu, don't wanna hard-code either
Another thing I noticed: CPU always needs to be an integer greater or equal than 1, memory also. This is not good. We should later fix that GUI and property part to more K8S and Openshift style resource units: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Ok, that worked nicely and my pipelines are getting picked up without error, GPU field, if specified, is picked up as well. I am having a minor issue with PodTemplate not allowing for Volume Creation from ConfigMap Content (only from PVCs), which would be super useful for mounting in custom CA-bundle file from a configmap to
is present in the airflow worker container, but not in the Elyra-defined dependent Pod Container unfortunately. Some stuff for a different story, volumes from ConfigMaps and volumeMounts based on that volume, similar to CPU limits ;-) |
1f21e5f
to
a6b7e82
Compare
squashed commits as well to be more readable. As mentioned, the built whl file together with Open Data Hub Jupyter Notebooks is working fine together with Airflow 2.6.2. |
It would be good to have a 2nd person to validate this is working before we look into merging it... |
@shalberd - thank you for this pull request. I am surprised at how little changes there are and don't see any kind of Airflow version checking happening. With these changes, is elyra agnostic to the Airflow version or do these changes replace the existing Airflow 1x support with Airflow 2x support? I guess I was expecting to see a new subclass (e.g., |
@kevin-bates it is a replacement for Airflow 2.x. I have already talked with @lresende in a Friday community call and we have the idea so far to make this part of Elyra 4.x, i.e. no more Airflow 1.x support soon. About lifecycle management: I only have contact with Astronomer and the Airflow community helm chart maintainer @thesuperzapper ... my judgment call would be: Airflow 1.x is long deprecated, no more security updates ... announce Airflow 1.x support derelease with Elyra 4.x |
d0c2050
to
81bf416
Compare
f8bc3ba
to
eba4211
Compare
Hi @lresende I had to fix some lint-server E226 missing whitespace around arithmetic operator code in elyra/tests, but all well now. |
4ef351e
to
d4f4d34
Compare
@lresende @fresende getting an error at step validate runtime images "=> ERROR: Container image pytorch/pytorch:2.5.1-cuda11.8-cudnn9-devel is not a suitable Elyra runtime image Looks like #3257 will fix this issue. Thank you, @caponetto |
@lresende seeing if the checks run through now, including the validate-runtime-image one. |
d4f4d34
to
a8722fa
Compare
@caponetto looking much better, your change worked for the validate images step. Nice job, thanks again. You mentioned that cypress component, do you know why this check fails in the step "run integration tests" it is the test itself or is it due to yarn? |
@shalberd From my observations over the past few months, I believe there is a flaky integration test. Sometimes it passes, sometimes it fails. You can try rerunning the job a few times to see if it passes. |
a8722fa
to
9112a32
Compare
Yes, I amended my commit with nothing and force-pushed again, leading to the tests to run through again (I do not have enough rights to re-run checks from the Github GUI). All tests, including that flaky yarn cypress integration test, ran through now. Wonder if we should have a separate issue filed for that flaky yarn cy integration test .... |
About other people running this Airflow 2 specific code to generate DAG python code: Another person from India, thanks, @MR-GOYAL, also verified earlier this year with an earlier Airflow version 2.7.2 that it worked fine. #3167 (comment) This code needs at least Airflow 2.7.0 to run because the Secret line import / location changed back then and I decided to use the new, non-deprecated one: #3167 (comment) |
123c1e6
to
380ec4e
Compare
I believe so. Also, if the failure is too frequent to the point of slowing work down, I suggest disabling it until its stability is restored. |
380ec4e
to
b0d89dd
Compare
Hi @lresende I rebased my feature branch with the latest changes to main, i.e. your refactor of pipeline implementation and namings It should not have any impact on my code changes here. |
d2c0924
to
be772ef
Compare
…-ai#3166. - Changed processor airflow and jinja2 template airflow to comply with Airflow >= 2.7.0. - Added new location of KubernetesPodOperator library in Airflow 2.x to test Pipeline for processor airflow. - Added cpu and memory limits fields in airflow 2.x fashion as well. - Added documentation related to Airflow 2 support and Elyra 4 for readthedocs. - Also mentioning what does not yet work in Elyra 4, e.g. custom components. - Changelog page in readthedocs appended with main important changes Elyra 4. - tests_kubernetes.py fixed lint-server E226 missing whitespace around arithmetic operator - tests_metadata.py fixed lint-server E226 missing whitespace around arithmetic operator Signed-off-by: Sven Thoms <[email protected]>
ff3a79e
to
afa3244
Compare
fixes #3166
@ianonavy @lresende NOT intended to work with Airflow 1.x
What changes were proposed in this pull request?
changes to airflow processor and airflow pipeline template with regards to Airflow 2.8.2 or higher support
also added pendulum instead of days_ago since deprecated
https://github.com/apache/airflow/pull/21653/files
Conceptual overlap with a fork, came to the same code in parallel
change to the Kubernetes client SDK in generic pipeline part of template since the Airflow abstractions
were all deprecated and removed except for Secret.
Finally, Airflow 2 adds logic that makes config_file mutually exclusive
with in_cluster, so we need to ensure that None is passed as None and
not string "None".
See also
https://github.com/kflow-ai/elyra/commit/f9d132954e008d30145f18794aa543d97f121a5f#diff-dc6c3f666aad9271fa5e9b8c31e3f0582cd39a7d2516cbc2240731fe4456e641
How was this pull request tested?
In contrast to kubeflow pipelines, even for Airflow 1.x and the different pipeline editors, there do not seem to be any tests.
I'd like to test the built wheel file in a docker image in conjunction with ODH.
Mostly seeing whether the generated DAG code works with Airflow 2.8.2 and higher.
Developer's Certificate of Origin 1.1