Store test datasets in this repo #226

TomNicholas · 2024-08-23T21:54:41Z

Our current approach to testing involves a bunch of fixtures, each of which downloads a tutorial dataset from xarray (cached, because xarray uses pooch), saves it to a temporary directory, then opens that dataset from disk. This is not ideal for a few reasons:

- The datasets aren't minimal, so they contain more complexity than is really needed to test a single bug / feature. This can make debugging more complicated.
- We're using the network when we don't need to be.
- vz.open_virtual_dataset calls xr.open_dataset, but because of our test setup xr.open_dataset can be called more than once in the same test invocation, even if the code under test only calls it once. This again makes debugging more confusing than it needs to be.

We do need to test our ability to read files from disk, but it might be better just to make some really tiny netCDF files and save them in this repo.

EDIT: Xarray actually does this and no-one seems to complain because the files are only ~1kB in size, which is smaller than the text files containing the actual code.
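As a rough illustration of what "really tiny netCDF files" could look like, here is a minimal sketch that writes a 2x3 dataset to disk with xarray and reads it back. The function name `write_tiny_netcdf`, the variable name `temperature`, and the coordinate values are all hypothetical, chosen only for this example; it also assumes a netCDF backend (netCDF4, h5netcdf, or scipy) is installed alongside xarray.

```python
from pathlib import Path

import numpy as np
import xarray as xr


def write_tiny_netcdf(path: Path) -> xr.Dataset:
    """Write a deliberately minimal dataset to `path` and return it.

    Hypothetical helper: one 2x3 float32 variable plus two coordinates,
    which is about as small as a dataset can get while still being useful.
    """
    ds = xr.Dataset(
        data_vars={
            "temperature": (
                ("lat", "lon"),
                np.arange(6, dtype="float32").reshape(2, 3),
            )
        },
        coords={"lat": [10.0, 20.0], "lon": [100.0, 110.0, 120.0]},
    )
    # Uses whichever netCDF engine xarray picks by default.
    ds.to_netcdf(path)
    return ds
```

A file like this could either be generated once and committed, or written into pytest's `tmp_path` by a fixture so that a test opens exactly one small local file and never touches the network.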

TomNicholas · 2024-12-30T20:52:24Z

Note that the way we have been doing this so far is good in that we haven't committed any large files to git, so we don't have to do any cleaning of the git history (which is a PITA).

maxrjones · 2024-12-30T21:59:32Z

Just a note that @TomNicholas suggested in #365 that, as part of this issue, we store a smaller alternative to the NISAR file used in the failing test virtualizarr/tests/test_backend.py::TestReadFromURL::test_virtualizarr_vs_local_nisar.

On this and #235, should we consider moving the tests outside the source code directory or explicitly excluding the data files from the release manifest? Some people care a lot about having small release sizes when using lambda for example, though I'm not sure if this includes both the sdist and wheels or just wheels.

TomNicholas · 2024-12-31T01:01:27Z

> On this and #235, should we consider moving the tests outside the source code directory or explicitly excluding the data files from the release manifest?

I don't think there is any need, is there? Currently we don't ship large files with the release, and if we switch to using very small test files (~kB) then we still won't be shipping large files with the release. As long as we actually make sure the files are that small, we don't need to separate them out.

OTOH if this is a commonly done thing then sure let's split them apart.
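One way to "actually make sure the files are that small" would be a check that fails when any committed data file exceeds a size budget. The sketch below uses only the standard library; the function name `oversized_files` and the 10 kB limit in `MAX_BYTES` are assumptions for illustration, not anything the project has agreed on.

```python
from pathlib import Path

# Hypothetical size budget for committed test data (10 kB).
MAX_BYTES = 10 * 1024


def oversized_files(
    data_dir: Path, max_bytes: int = MAX_BYTES
) -> list[tuple[Path, int]]:
    """Return (path, size in bytes) for every file under `data_dir`
    that is larger than `max_bytes`, sorted by path."""
    return sorted(
        (path, path.stat().st_size)
        for path in data_dir.rglob("*")
        if path.is_file() and path.stat().st_size > max_bytes
    )
```

A test could then assert that `oversized_files(...)` returns an empty list for the test data directory, so an accidentally large file is caught in CI before it lands in the git history.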

TomNicholas added the testing label Aug 23, 2024

norlandrhagen linked a pull request Aug 31, 2024 that will close this issue

Store test datasets in repo #235

Open

3 tasks

TomNicholas mentioned this issue Dec 30, 2024

Host our own test files #365

Closed