Store test datasets in this repo #226

Open
TomNicholas opened this issue Aug 23, 2024 · 3 comments · May be fixed by #235

@TomNicholas
Member

TomNicholas commented Aug 23, 2024

Our current approach to testing involves a bunch of fixtures, each of which downloads a tutorial dataset from xarray (cached, because it uses pooch), saves it to a temporary directory, then opens that dataset from disk. This is not ideal for a few reasons:

  1. The datasets aren't minimal, so they contain more complexity than is really needed to test a single bug / feature. This can make debugging more complicated.
  2. We're using the network when we don't need to be.
  3. vz.open_virtual_dataset calls xr.open_dataset, but because of our test setup xr.open_dataset can potentially be called more than once in the same test invocation, even if the code we are testing only calls it once. This again can make debugging more confusing than it needs to be.

We do need to test our ability to read files from disk, but it might be better just to make some really tiny netCDF files and save them in this repo.

EDIT: Xarray actually does this and no-one seems to complain because the files are only ~1kB in size, which is smaller than the text files containing the actual code.
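As a rough illustration of the "really tiny netCDF files" idea, here is a minimal sketch of generating one with xarray. The helper name `write_tiny_netcdf`, the variable name `air`, and the 2x3 shape are all hypothetical choices for illustration, not anything that exists in this repo. The resulting file size depends on which netCDF backend xarray picks, so it is worth checking the size before committing.

```python
from pathlib import Path
import tempfile

import numpy as np
import xarray as xr


def write_tiny_netcdf(path: Path) -> xr.Dataset:
    """Write a 2x3 single-variable dataset to `path` and return it.

    Hypothetical helper: the variable and coordinate names are illustrative only.
    """
    ds = xr.Dataset(
        data_vars={
            "air": (("lat", "lon"), np.arange(6, dtype="float32").reshape(2, 3)),
        },
        coords={"lat": [10.0, 20.0], "lon": [0.0, 1.0, 2.0]},
    )
    # No engine is specified, so xarray uses whichever netCDF backend is installed.
    ds.to_netcdf(path)
    return ds


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tiny.nc"
    expected = write_tiny_netcdf(path)

    # Re-open from disk and load into memory before the file is closed.
    with xr.open_dataset(path) as reopened:
        roundtripped = reopened.load()

    # Record the on-disk size so it can be inspected before committing the file.
    size_bytes = path.stat().st_size
```

A file produced this way could be generated once and committed, so tests read it from disk without touching the network.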

@norlandrhagen norlandrhagen linked a pull request Aug 31, 2024 that will close this issue
@TomNicholas
Member Author

Note that the way we have been doing this so far is good in that we haven't committed any large files to git, so we don't have to do any cleaning of the git history (which is a PITA).

@maxrjones
Member

Just a note that @TomNicholas suggested in #365 that we store a smaller alternative to the NISAR file used in the failing test virtualizarr/tests/test_backend.py::TestReadFromURL::test_virtualizarr_vs_local_nisar as part of this issue.

On this and #235, should we consider moving the tests outside the source code directory or explicitly excluding the data files from the release manifest? Some people care a lot about having small release sizes when using lambda for example, though I'm not sure if this includes both the sdist and wheels or just wheels.

@TomNicholas
Member Author

> On this and #235, should we consider moving the tests outside the source code directory or explicitly excluding the data files from the release manifest?

I don't think there is any need, is there? Currently we don't ship large files with the release, and if we switch to using very small test files (~kB) then we still won't be shipping large files with the release. As long as we actually make sure the files are that small, we don't need to separate them out.

OTOH if this is a commonly done thing then sure let's split them apart.
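One way to "actually make sure the files are that small" would be a guard that fails when any committed data file exceeds a size budget. The sketch below uses only the standard library; the function name `oversized_files` and the 10 kB limit are assumptions for illustration, and the demo runs against a temporary directory rather than any real path in this repo.

```python
from pathlib import Path
import tempfile

# Hypothetical size budget per test data file, in bytes.
MAX_BYTES = 10_000


def oversized_files(data_dir: Path, max_bytes: int = MAX_BYTES) -> list[Path]:
    """Return every file under `data_dir` larger than `max_bytes`, sorted by path."""
    return sorted(
        p for p in data_dir.rglob("*") if p.is_file() and p.stat().st_size > max_bytes
    )


# Demo against a throwaway directory containing one small and one large file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "small.nc").write_bytes(b"\0" * 1_000)
    (root / "big.nc").write_bytes(b"\0" * 20_000)
    offenders = [p.name for p in oversized_files(root)]
```

In a test suite this could be a single test asserting that `oversized_files(...)` returns an empty list for the directory holding the committed files.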
