
Broken features in cloud instances when depending on temp or uploaded files #9441

Open
nucleogenesis opened this issue May 16, 2022 · 4 comments · May be fixed by #12590
Assignees: nucleogenesis
Labels: DEV: backend (Python, databases, networking, filesystem...), P0 - critical (Priority: Release blocker or regression), TAG: cloud (Issues specific to running Kolibri in a cloud environment)


@nucleogenesis (Member)

Observed behavior

On instances running in the cloud using BCK, Kolibri is unable to provide features that make use of temporary storage. Two examples were discovered by NCC testing on the Vodafone BCK pentesting instance.

  1. Cannot upload a CSV to import users
  2. When generating logs, the links to download the successfully generated logs return a 404

A path toward solving this will likely involve storing user-uploaded files and pod-generated files in a GCS bucket, and referencing that location rather than the local file system when generating or storing files.

Note that there may be more places where this is a problem, and it should be considered for all future Kolibri features that involve temporary file storage or user file uploads.

Expected behavior

All Kolibri features work in the cloud instances as expected.

User-facing consequences

Cloud Kolibri instances have broken features.

Steps to reproduce

On a BCK-deployed Kolibri instance, try to generate logs or to import users by CSV.

Context

Kolibri 0.15.2
BCK VF Pentesting instance

@nucleogenesis nucleogenesis added P0 - critical Priority: Release blocker or regression DEV: backend Python, databases, networking, filesystem... labels May 16, 2022
@rtibbles (Member)

Note that I think the most sustainable way to do this would be to use a DjangoStorage class to handle any file uploads in Kolibri - then it can be swapped out for a different class that supports the appropriate backend for the environment.

This is similar to #5698, except that this is for all non-content file operations - we have worked around content in remote settings by not having to import content at all, which seems better!
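
For illustration, a minimal sketch (not Kolibri's actual code) of what routing a generated report through Django's storage abstraction could look like; the helper name `save_generated_report` is hypothetical, while `default_storage` and `ContentFile` are standard Django APIs:

```python
# Hypothetical helper (not Kolibri code): write a generated CSV through the
# configured storage backend instead of directly onto the pod's local disk.
from django.core.files.base import ContentFile
from django.core.files.storage import default_storage


def save_generated_report(filename, csv_bytes):
    # default_storage resolves to whatever backend settings configure:
    # the local filesystem by default, a GCS-backed class when swapped out.
    saved_name = default_storage.save(filename, ContentFile(csv_bytes))
    # Hand back a backend-appropriate URL rather than a filesystem path,
    # so downloads work regardless of which pod serves the request.
    return default_storage.url(saved_name)
```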

@marcellamaki marcellamaki added this to the 0.16.0 milestone Jul 12, 2022
@rtibbles rtibbles added the TAG: cloud Issues specific to running Kolibri in a cloud environment label May 17, 2023
@nucleogenesis nucleogenesis self-assigned this Aug 8, 2024
@nucleogenesis (Member, Author)

@rtibbles some thoughts & questions on this

Looks like we'll need to set up a BCK env so that we can authenticate to it w/ the google-cloud-storage lib.

I found this gcloud backend in a lib called django-storages (which is BSD-3-Clause fwiw in case we want to try to vendor the single backend to avoid WHL bloat?)

If I'm reading the Django docs correctly and understanding well, the short list of things to do here is:

  • Add the two dependencies to Kolibri
  • Figure out how to set up the authentication business such that it will work w/ our BCK configuration (which, I believe, relies on the gcloud utility locally, but I imagine there are some automagical parts of this when connecting from one part of GCP to another?)
  • On BCK, set the env var DEFAULT_FILE_STORAGE to the django-storages gcloud backend (storages.backends.gcloud) - see the sketch below this list
  • Now BCK loads the GCloud file backend when that env var is set and uses the default file system otherwise
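
A minimal sketch of the settings side of that, assuming django-storages is installed; the KOLIBRI_* env var names here are hypothetical, while DEFAULT_FILE_STORAGE is the pre-Django-4.2 setting name and GS_BUCKET_NAME comes from the django-storages gcloud backend:

```python
# Sketch of Django settings wiring (hypothetical env var names); this is not
# Kolibri's actual options machinery.
import os

# e.g. KOLIBRI_DEFAULT_FILE_STORAGE=storages.backends.gcloud.GoogleCloudStorage
DEFAULT_FILE_STORAGE = os.environ.get(
    "KOLIBRI_DEFAULT_FILE_STORAGE",
    "django.core.files.storage.FileSystemStorage",  # local disk fallback
)

# Only needed when the gcloud backend is selected; setting name from django-storages.
GS_BUCKET_NAME = os.environ.get("KOLIBRI_GS_BUCKET_NAME", "")
```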

Are there any other things I should be considering here w/ regard to how Kolibri works on BCK? (cc @DXCanas @anguyen1234)

@rtibbles (Member)

rtibbles commented Aug 8, 2024

The main work is updating how we interact with files to use a Django Storage backend; currently we just deal with files on disk for the generated reports.

We don't need to add the Google Cloud backend to Kolibri's dependencies (I imagine that would cause a lot of bloat); instead, we just need to make the default storage backend configurable. We can check that the right things are installed in the same way that we verify our Redis configuration, by trying to import the appropriate package: https://github.com/learningequality/kolibri/blob/develop/kolibri/utils/options.py#L287

The env var that would be set on BCK would then be mediated via the options.py machinery - it would presumably need more options, much like the Redis cache does, to configure the bucket, permissions, etc.
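
A hedged sketch of what such an import check might look like (the function name and error message are illustrative, not the actual options.py code):

```python
# Illustrative only: verify the configured storage backend can actually be
# imported before Kolibri starts, in the spirit of the Redis check above.
import importlib


def validate_file_storage_option(storage_path):
    module_path, _, class_name = storage_path.rpartition(".")
    try:
        module = importlib.import_module(module_path)
        getattr(module, class_name)
    except (ImportError, AttributeError):
        raise ValueError(
            "File storage backend '{}' is configured but its package is not "
            "installed or the class does not exist".format(storage_path)
        )
```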

@DXCanas (Member)

DXCanas commented Aug 9, 2024

What Richard said. No gcloud utility. That’d be kinda insane. Because it’s running on Google “hardware”, it has ways of figuring out perms.

We typically rely on the default behavior to this point.

To learn more:
https://cloud.google.com/docs/authentication/provide-credentials-adc
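
For illustration, a minimal sketch of Application Default Credentials in use with the google-cloud-storage client; the bucket and object names are hypothetical:

```python
# On GCP, the client picks up credentials automatically (attached service
# account / workload identity) - no key file or gcloud CLI needed.
from google.cloud import storage

client = storage.Client()  # uses Application Default Credentials
bucket = client.bucket("example-kolibri-bucket")  # hypothetical bucket name
blob = bucket.blob("generated/summary_logs.csv")  # hypothetical object path
blob.upload_from_string("user_id,timestamp\n", content_type="text/csv")
```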
