-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix birds not creating their wps output under each bird name #203
Fix birds not creating their wps output under each bird name #203
Conversation
Fix this kind of error: ``` [2021-10-28 02:39:05 +0000] [1] [INFO] Starting gunicorn 20.1.0 [2021-10-28 02:39:05 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1) [2021-10-28 02:39:05 +0000] [1] [INFO] Using worker: sync [2021-10-28 02:39:05 +0000] [7] [INFO] Booting worker with pid: 7 server->outputpath configuration value /data/wpsoutputs/finch is not directory ```
Otherwise, on disk the output files will be at the proper location but the advertized download url will be wrong. Wrong output url returned (missing "finch" after "/wpsoutputs/"): ``` print("Process status: ", resp.status) urls = resp.get() print("Link to process output: ", urls.output_netcdf) Process status: ProcessSucceeded Link to process output: https://lvupavics.ouranos.ca/wpsoutputs/fa9a7d15-379a-11ec-988f-d38848f08134/frost-days_SRES-A2-experiment_20460101-20650101.nc ```
@@ -202,6 +202,11 @@ do | |||
docker exec ${postgres_id} /postgres-setup.sh | |||
fi | |||
|
|||
# because server.outputpath in wps.cfg do not create the dir | |||
for bird in finch flyingpigeon raven; do | |||
docker exec $bird mkdir -p /data/wpsoutputs/$bird |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not so proud of this hack. Question for someone with more WPS config knowledge, is there an option in the config file to tell the server to create the outputpath dir if it does not exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we were to add this to the makefile for all birds instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised because the FileStorage code uses os.makedirs, which as far as I understand should create the missing path. https://github.com/geopython/pywps/blob/711219792be8b3d6a175a81152282dc5046d412b/pywps/inout/storage/file.py#L97
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know where PyWPS fails if you don't create this directory ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh indeed ! Maybe the dir will be created on first use, not on bird startup so it's fine that's it it not there.
I saw this warning during bird startup server->outputpath configuration value /data/wpsoutputs/finch is not directory
so I assumed too fast.
Will retest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my end, I'm ok to wait for a PyWPS fix if that avoids throw-away code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could then clean
pavics-compose.sh
further by moving the custom script section for geoserver and postgres to their own config folder. In the end the main compose script would be cleaner and each component would have the ability to run custom command in the context of pavics.
@dbyrns Unfortunately geoserver and postgres are not in a "component" layout. We have been adding new components using the new pluggable design but we have not migrate all the existing pieces to this pluggable design :(
On my end, I'm ok to wait for a PyWPS fix if that avoids throw-away code.
@huard no big deal with throw-away code, it's just this section of mkdir
that is throw-away. Waiting for PyWPS means not only waiting for a release of PyWPS but also the integration into our birds then a new build of all our birds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dbyrns Unfortunately geoserver and postgres are not in a "component" layout. We have been adding new components using the new pluggable design but we have not migrate all the existing pieces to this pluggable design :(
I know, I'm not suggesting to make them proper components, but only to extend the pre/post compose loop to include the "built-in" components in the config directory. This way any existing services could include custom scripts.
This time I agree that we can wait for PyWPS, but we should keep that in mind if component related stuff are to be included in the pavics-compose.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extend the pre/post compose loop to include the "built-in" components in the config directory. This way any existing services could include custom scripts.
Yeah that's a quick win. But I'd say also a potential throwaway code if we migrate all existing components to the new pluggable design. Potential usecase for a complete pluggable design is not all organisations deploying PAVICS will want all the current components activated. Maybe they just want Thredds and their birds, not our birds.
Anyways, back to this PR, it's a cheap throwaway and a quick win. I'd rather take it now than wait. I can make the pre/post for configs/
as well since we are into cheap throwaway for quick win. But I am fine if we prefer to wait.
Note the other issue bird-house/finch#160 probably also need this same mkdir
hack or the matching proper fix on PyWPS side. So either we solve both issues now or we wait for both issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huard Are you okay to merge this quick hack for this fix? Not sure how much time I'll need to debug and perform the proper fix in PyWPS. All the old birds (hummingbird, ...) using the old "Buildout way" are basically doing this same hack, ie mkdir themselve outside of PyWPS.
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/723/Result : failure BIRDHOUSE_DEPLOY_BRANCH : fix-birds-not-creating-their-wps_output-under-each-bird-name DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-8.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/640/NOTEBOOK TEST RESULTS |
@tlvu you should look at #11 (comment). If I'm correct this PR would not be required. |
I'm getting ready to release a new version of Finch with PyWPS 4.5.1. See bird-house/finch#218 |
No, all override should be able to be done here, theoretically. |
…der-each-bird-name
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/852/Result : failure BIRDHOUSE_DEPLOY_BRANCH : fix-birds-not-creating-their-wps_output-under-each-bird-name DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-69.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/737/NOTEBOOK TEST RESULTS |
@fmigneault I indent to go forward with this PR to move the output dir of each bird into their own name. Hopefully before Xmas as part of my end of year clean up. Would this impact Weaver or Magpie? |
@tlvu |
…ing-their-wps_output-under-each-bird-name
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1260/Result : failure BIRDHOUSE_DEPLOY_BRANCH : fix-birds-not-creating-their-wps_output-under-each-bird-name DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-216.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/967/NOTEBOOK TEST RESULTS |
run tests Previous test run broken because production machine was down and some tests use data from production machine. |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1261/Result : failure BIRDHOUSE_DEPLOY_BRANCH : fix-birds-not-creating-their-wps_output-under-each-bird-name DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-216.rdext.crim.ca Infrastructure deployment failed. Instance has not been destroyed. @matprov |
See PR bird-house/birdhouse-deploy#203 Makefile updated for future refresh as well. Jenkins failure due to wpsoutputs path change: ``` _________ finch-master/docs/source/notebooks/finch-usage.ipynb::Cell 3 _________ Notebook cell execution failed Cell 3: Cell outputs differ Input: print("Process status: ", resp.status) urls = resp.get() print("Link to process output: ", urls.output) Traceback: mismatch 'stdout' assert reference_output == test_output failed: 'Process stat...20650101.nc\n' == 'Process stat...20650101.nc\n' Skipping 77 identical leading characters in diff, use -v to show - psoutputs/finch/STATUS_FILE/frost_days_sres_a2_experiment_20460101_20650101.nc ? ------ + psoutputs/STATUS_FILE/frost_days_sres_a2_experiment_20460101_20650101.nc ```
See PR bird-house/birdhouse-deploy#203 Jenkins failure due to wpsoutputs path change: ``` ___ pavics-sdi-master/docs/source/notebooks/subset-user-input.ipynb::Cell 23 ___ Notebook cell execution failed Cell 23: Cell outputs differ Input: from urllib.parse import urlparse output_url = result.get().output print("output_url = ", output_url) parsed = urlparse(output_url) output_path = parsed.path.replace("wpsoutputs", "wps_outputs") print("output_path = ", output_path) output_thredds_url = ( f"https://{parsed.hostname}/twitcher/ows/proxy/thredds/dodsC/birdhouse{output_path}" ) print("output_thredds_url = ", output_thredds_url) Traceback: mismatch 'stdout' assert reference_output == test_output failed: 'output_url =...FILE/out.nc\n' == 'output_url =...FILE/out.nc\n' - output_url = https://WPS_HOST/wpsoutputs/finch/STATUS_FILE/out.nc ? ------ + output_url = https://WPS_HOST/wpsoutputs/STATUS_FILE/out.nc - output_path = /wps_outputs/finch/STATUS_FILE/out.nc ? ------ + output_path = /wps_outputs/STATUS_FILE/out.nc - output_thredds_url = https://PAVICS_FQDN/twitcher/ows/proxy/thredds/dodsC/birdhouse/wps_outputs/finch/STATUS_FILE/out.nc ? ------ + output_thredds_url = https://PAVICS_FQDN/twitcher/ows/proxy/thredds/dodsC/birdhouse/wps_outputs/STATUS_FILE/out.nc _______ pavics-sdi-master/docs/source/notebooks/subsetting.ipynb::Cell 4 _______ Notebook cell execution failed Cell 4: Cell outputs differ Input: res = resp.get() print("URL: ", res.output) res = resp.get(asobj=True) res.output Traceback: mismatch 'stdout' assert reference_output == test_output failed: 'URL: https:..._Africa.nc.\n' == 'URL: https:..._Africa.nc.\n' - URL: https://WPS_HOST/wpsoutputs/flyingpigeon/STATUS_FILE/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612_Africa.nc ? ------------- + URL: https://WPS_HOST/wpsoutputs/STATUS_FILE/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612_Africa.nc Downloading to /tmp/tmpRANDOM/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612_Africa.nc. ```
Maybe add a note to change log about the recent |
@fmigneault |
You don't "need" it necessarily, especially if WPS outputs are all public. I was considering only adding the references as advisory in the changelogs to help track errors if any. |
Makefile updated for future refresh. See PR bird-house/birdhouse-deploy#203 Not refreshing the notebooks because they do not all work in Jenkins yet and are not enabled in Jenkins by default.
@fmigneault |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1264/Result : failure BIRDHOUSE_DEPLOY_BRANCH : fix-birds-not-creating-their-wps_output-under-each-bird-name DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-92.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/969/NOTEBOOK TEST RESULTS |
See PR bird-house/birdhouse-deploy#203 `Makefile` updated for future refresh as well. Jenkins failure due to wpsoutputs path change: ``` _________ finch-master/docs/source/notebooks/finch-usage.ipynb::Cell 3 _________ Notebook cell execution failed Cell 3: Cell outputs differ Input: print("Process status: ", resp.status) urls = resp.get() print("Link to process output: ", urls.output) Traceback: mismatch 'stdout' assert reference_output == test_output failed: 'Process stat...20650101.nc\n' == 'Process stat...20650101.nc\n' Skipping 77 identical leading characters in diff, use -v to show - psoutputs/finch/STATUS_FILE/frost_days_sres_a2_experiment_20460101_20650101.nc ? ------ + psoutputs/STATUS_FILE/frost_days_sres_a2_experiment_20460101_20650101.nc ```
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1266/Result : failure BIRDHOUSE_DEPLOY_BRANCH : fix-birds-not-creating-their-wps_output-under-each-bird-name DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : false PAVICS_HOST : https://host-140-92.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/971/NOTEBOOK TEST RESULTS |
This one is not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick tests on finch/raven wpsoutputs on instance created #203 (comment) seem to work.
Good to go. 👍
@matprov |
@fmigneault |
Wow, looks like adding Raven nb to the mix exceeded the 1h timeout ! Normal runs without Raven nb should be less than 35 mins. |
@fmigneault |
See PR bird-house/birdhouse-deploy#203 Jenkins failure due to wpsoutputs path change: ``` ___ pavics-sdi-master/docs/source/notebooks/subset-user-input.ipynb::Cell 23 ___ Notebook cell execution failed Cell 23: Cell outputs differ Input: from urllib.parse import urlparse output_url = result.get().output print("output_url = ", output_url) parsed = urlparse(output_url) output_path = parsed.path.replace("wpsoutputs", "wps_outputs") print("output_path = ", output_path) output_thredds_url = ( f"https://{parsed.hostname}/twitcher/ows/proxy/thredds/dodsC/birdhouse{output_path}" ) print("output_thredds_url = ", output_thredds_url) Traceback: mismatch 'stdout' assert reference_output == test_output failed: 'output_url =...FILE/out.nc\n' == 'output_url =...FILE/out.nc\n' - output_url = https://WPS_HOST/wpsoutputs/finch/STATUS_FILE/out.nc ? ------ + output_url = https://WPS_HOST/wpsoutputs/STATUS_FILE/out.nc - output_path = /wps_outputs/finch/STATUS_FILE/out.nc ? ------ + output_path = /wps_outputs/STATUS_FILE/out.nc - output_thredds_url = https://PAVICS_FQDN/twitcher/ows/proxy/thredds/dodsC/birdhouse/wps_outputs/finch/STATUS_FILE/out.nc ? ------ + output_thredds_url = https://PAVICS_FQDN/twitcher/ows/proxy/thredds/dodsC/birdhouse/wps_outputs/STATUS_FILE/out.nc _______ pavics-sdi-master/docs/source/notebooks/subsetting.ipynb::Cell 4 _______ Notebook cell execution failed Cell 4: Cell outputs differ Input: res = resp.get() print("URL: ", res.output) res = resp.get(asobj=True) res.output Traceback: mismatch 'stdout' assert reference_output == test_output failed: 'URL: https:..._Africa.nc.\n' == 'URL: https:..._Africa.nc.\n' - URL: https://WPS_HOST/wpsoutputs/flyingpigeon/STATUS_FILE/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612_Africa.nc ? ------------- + URL: https://WPS_HOST/wpsoutputs/STATUS_FILE/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612_Africa.nc Downloading to /tmp/tmpRANDOM/tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612_Africa.nc. ```
Makefile updated for future refresh. See PR bird-house/birdhouse-deploy#203 Not refreshing the notebooks because they do not all work in Jenkins yet and are not enabled in Jenkins by default.
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/1273/Result : failure BIRDHOUSE_DEPLOY_BRANCH : fix-birds-not-creating-their-wps_output-under-each-bird-name DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-154.rdext.crim.ca Infrastructure deployment failed. Instance has not been destroyed. @matprov |
Because when Raven notebooks are added to the test run, the 1h timeout is busted. See comment bird-house/birdhouse-deploy#203 (comment)
|
Because when Raven notebooks are added to the test run, the 1h timeout is busted. See comment bird-house/birdhouse-deploy#203 (comment)
This overshadowing the default config means the recent changes to move birds to their separate wpsoutputs dir did not take effect, see bird-house/birdhouse-deploy#203. Keep raven/.gitignore to avoid existing instanciated wps.cfg file to become unknown and break subsequent autodeploy.
Makefile updated for future refresh. See PR bird-house/birdhouse-deploy#203 Not refreshing the notebooks because they do not all work in Jenkins yet and are not enabled in Jenkins by default.
Makefile updated for future refresh. See PR bird-house/birdhouse-deploy#203 Not refreshing the notebooks because they do not all work in Jenkins yet and are not enabled in Jenkins by default.
Before this fix, finch, raven, flyingpigeon were dumping their output directly under
https://PAVICS_HOST/wpsoutputs/
.With this fix, it will be under each bird name, ex:
https://PAVICS_HOST/wpsoutputs/finch/
which is cleaner and follows what malleefowl and hummingbird already does.Fixes #11.
Fixes https://crim-ca.atlassian.net/browse/DAC-398
Requires PR Ouranosinc/pavics-sdi#280, bird-house/finch#273, Ouranosinc/raven#459
If
optional-components/secure-data-proxy
is enabled, might need someadditional permissions for each bird in
https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/optional-components/secure-data-proxy/config/magpie/config.yml.template.