
CSV Files #108

Merged Jul 3, 2024 (43 commits)
Changes from 34 commits
Commits
43 commits
cb279db
Initial framework for flat csv files
micahjohnson150 Jun 10, 2024
e345cbf
Working download
micahjohnson150 Jun 10, 2024
5140a16
Working download of snowex data
micahjohnson150 Jun 10, 2024
7e28aef
Working download and radiation plot
micahjohnson150 Jun 10, 2024
5769c70
Added all variables of interest to m3 users
micahjohnson150 Jun 10, 2024
fb8a986
Cleaning up some
micahjohnson150 Jun 10, 2024
190fa16
Added in testing data for snowex data, all reduced to first 2 weeks i…
micahjohnson150 Jun 10, 2024
ae7559a
Started tests. Not quite working with a download mock
micahjohnson150 Jun 10, 2024
ec7069b
Built a test using a mocked download
micahjohnson150 Jun 11, 2024
88f4d5c
Test works without internet
micahjohnson150 Jun 11, 2024
0d3c6d9
Working points from geom in csv read with tests
micahjohnson150 Jun 11, 2024
d613699
Added in buffer testing in within geom
micahjohnson150 Jun 11, 2024
f7165f2
Working!
micahjohnson150 Jun 11, 2024
38f8462
First attempt at Senator beck data
micahjohnson150 Jun 11, 2024
27b2029
Added in datetime conversion
micahjohnson150 Jun 11, 2024
b295b72
Added in csv file specific datetime conversion
micahjohnson150 Jun 11, 2024
bdebde5
Added in exception handling for date ranges outside of the availability
micahjohnson150 Jun 11, 2024
0f1c320
Working url pivoting
micahjohnson150 Jun 11, 2024
ae9e74a
Added a test for checking valid links
micahjohnson150 Jun 11, 2024
e9df267
Added more variables for SBB. Added in test datasets
micahjohnson150 Jun 11, 2024
c8f213b
Updated snowex tests and files
micahjohnson150 Jun 12, 2024
5afb517
Added mock data for SASP and SBSP for 2010-2023
micahjohnson150 Jun 12, 2024
0615611
Added variables. Moved mock data to correct dir. Found small bug arou…
micahjohnson150 Jun 12, 2024
650abf9
Fixed headers. Worked through multi file copying in mocking
micahjohnson150 Jun 12, 2024
6e7ec0f
Minor jockeying around datetime id-ing in flat files
micahjohnson150 Jun 17, 2024
33992ca
FLAKED!
micahjohnson150 Jun 17, 2024
42ce4ba
Flaked!
micahjohnson150 Jun 17, 2024
4d28aad
Fixed typo in file reading. Eliminated the pandas copy warning. Added…
micahjohnson150 Jun 17, 2024
56d61fd
Added in documentation about snowex and csas. Attempting to remove set…
micahjohnson150 Jun 18, 2024
f74e6df
Flaked
micahjohnson150 Jun 18, 2024
fbdc4e8
Slight mod to tests on frequency
micahjohnson150 Jun 18, 2024
21f13b8
Migrated freq to lower case h since H is deprecated
micahjohnson150 Jun 18, 2024
d452ad1
Secured cache dir location
micahjohnson150 Jun 18, 2024
0a17f53
Coping with frequency changes.
micahjohnson150 Jun 18, 2024
4c30b2f
Added edits per PR review
micahjohnson150 Jun 25, 2024
b3fadc1
Added in feedback from CSAS on their pattern for adding data in.
micahjohnson150 Jul 2, 2024
f0e544e
Modified assertion for units to be an exception
micahjohnson150 Jul 2, 2024
93c39e5
added more variables
micahjohnson150 Jul 2, 2024
14bf9f7
Added in testing around streaming in csas
micahjohnson150 Jul 2, 2024
a640dbe
Working tests
micahjohnson150 Jul 2, 2024
2bbb84f
Rearranged the base validation to avoid complexity flake issues. Con…
micahjohnson150 Jul 2, 2024
46f513d
Broke out validation checks. Added associated tests
micahjohnson150 Jul 3, 2024
dcd220d
Removed redundant ref to validate columns
micahjohnson150 Jul 3, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -111,3 +111,4 @@ ENV/
 # scratch dir
 scratch/
 **/.ipynb_checkpoints/*
+**/cache/**
5 changes: 3 additions & 2 deletions README.rst
@@ -19,8 +19,7 @@ metloom
Location Oriented Observed Meteorology

 metloom is a python library created with the goal of consistent, simple sampling of
-meteorology and snow related point measurments from a variety of datasources across the
-Western US. metloom is developed by `M3 Works <https://m3works.io>`_ as a tool for validating
+meteorology and snow related point measurements from a variety of data sources, and is developed by `M3 Works <https://m3works.io>`_ as a tool for validating
 computational hydrology model results. Contributions welcome!

Warning - This software is provided as is (see the license), so use at your own risk.
@@ -45,6 +44,8 @@ Features
 * `GEOSPHERE AUSTRIA <https://data.hub.geosphere.at/dataset/>`_
 * `UCSB CUES <https://snow.ucsb.edu/#>`_
 * `MET NORWAY <https://frost.met.no/index.html>`_
+* `SNOWEX MET STATIONS <https://nsidc.org/data/snex_met/versions/1>`_
+* `CENTER FOR SNOW AND AVALANCHE STUDIES (CSAS) <https://snowstudies.org/csas-facilities/>`_

Requirements
------------
140 changes: 140 additions & 0 deletions docs/gallery/csas_example.ipynb

Large diffs are not rendered by default.

51 changes: 51 additions & 0 deletions docs/usage.rst
@@ -80,7 +80,58 @@ To pull stations using Mesowest::
)
print(df)

Center for Snow and Avalanche Studies (CSAS)
--------------------------------------------
There are four stations of interest maintained by the CSAS: the Senator Beck Study Plot,
Swamp Angel Study Plot, Senator Beck Stream Gauge, and Putney Study Plot. These stations
contain a wealth of data useful for studying and validating snow processes. The data exist
as flat csv files, so a request simply downloads the file, interprets the datetime index,
and crops it to your date range. Since it is a csv, the file is stored in a local cache in
the same directory you ran your code from, which reduces download times on repeat requests.

Additionally, the CSAS data is not available in real time (at least as of June 2024).
Data is updated annually and stored on the website. metloom will try to stay as up to date
as possible as the files are updated. Please feel free to submit a PR if you know the data
has been updated. Check out the `facilities page <https://snowstudies.org/csas-facilities/>`_ on CSAS to see more about the stations.

To pull stations using CSAS::

from metloom.pointdata import CSASMet
from metloom.variables import CSASVariables
from datetime import datetime

start = datetime(2023, 1, 1)
end = datetime(2023, 6, 1)
sbsp = CSASMet('SBSP')
df_sbsp = sbsp.get_daily_data(start, end, [CSASVariables.SNOWDEPTH])

If you use these data, please use the `appropriate citations <https://snowstudies.org/data-use-policy/>`_ and give credit to the
institution.

SnowEx
------
During the `NASA SnowEx campaign <https://snow.nasa.gov/campaigns/snowex>`_
there were a handful of met stations deployed, which are now published on the
`NSIDC <https://nsidc.org/data/snex_met/versions/1>`_. These stations have been
mapped into metloom to increase the utility/convenience of these data. The SnowEx
data is in a csv file format, so any query downloads the appropriate files to a
local cache to reduce download times. For this to work you need a `.netrc` file
and an account with the NSIDC. See the
`access guide <https://nsidc.org/data/user-resources/help-center/programmatic-data-access-guide>`_
for more help.
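
NSIDC downloads authenticate through NASA Earthdata, so the `.netrc` file (in your home
directory) needs an entry for the Earthdata login host. A minimal entry, with placeholder
credentials you must replace with your own, typically looks like::

    machine urs.earthdata.nasa.gov
        login your_earthdata_username
        password your_earthdata_password

On unix-like systems the file should also be readable only by you (``chmod 600 ~/.netrc``).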

To pull stations using SnowEx::

from metloom.pointdata import SnowExMet
from metloom.variables import SnowExVariables
from datetime import datetime

start = datetime(2020, 1, 1)
end = datetime(2020, 6, 1)

# Grand Mesa Study Plot
gmsp = SnowExMet('GMSP')
df_gmsp = gmsp.get_daily_data(start, end, [SnowExVariables.SNOWDEPTH])

My variables aren't here
------------------------
6 changes: 5 additions & 1 deletion metloom/pointdata/__init__.py
@@ -6,9 +6,13 @@
 from .geosphere_austria import GeoSphereHistPointData, GeoSphereCurrentPointData
 from .norway import MetNorwayPointData
 from .cues import CuesLevel1
+from .files import CSVPointData, StationInfo
+from .snowex import SnowExMet
+from .csas import CSASMet

 __all__ = [
     "PointData", "PointDataCollection", "CDECPointData", "SnotelPointData",
     "MesowestPointData", "USGSPointData", "GeoSphereHistPointData",
-    "GeoSphereCurrentPointData", "CuesLevel1", "MetNorwayPointData"
+    "GeoSphereCurrentPointData", "CuesLevel1", "MetNorwayPointData",
+    "CSVPointData", "StationInfo", "SnowExMet", "CSASMet"
 ]
76 changes: 76 additions & 0 deletions metloom/pointdata/csas.py
@@ -0,0 +1,76 @@
"""
Data reader for the Center for Snow and Avalanche Studies
"""
from metloom.pointdata import CSVPointData, StationInfo
from metloom.variables import CSASVariables
import os
from datetime import datetime, timedelta


class InvalidDateRange(Exception):
    """
    Exception to indicate there is no known data for the requested date range
    """


class CSASStationInfo(StationInfo):
    # Name, id, lat, long, elevation, http path
    SENATOR_BECK = ("Senator Beck Study Plot", "SBSP", 37.90688, -107.72627, 12186,
                    "2023/11/SBSP_1hr_2003-2009.csv")
    SWAMP_ANGEL = ("Swamp Angel Study Plot", "SASP", 37.90691, -107.71132, 11060,
                   "2023/11/SASP_1hr_2003-2009.csv")
    PUTNEY = ("Putney Study Plot", "PTSP", 37.89233, -107.69577, 12323,
              "2023/11/PTSP_1hr.csv")
    SENATOR_BECK_STREAM_GAUGE = ("Senator Beck Stream Gauge", "SBSG", 37.90678,
                                 -107.70943, 11030, "2023/11/SBSG_1hr.csv")


class CSASMet(CSVPointData):
    """
    Point data reader for the CSAS flat csv files (SASP, SBSP, PTSP, SBSG)
    """
    ALLOWED_VARIABLES = CSASVariables
    ALLOWED_STATIONS = CSASStationInfo

    # Data is in Mountain time
    UTC_OFFSET_HOURS = -7

    URL = "https://snowstudies.org/wp-content/uploads/"
    DATASOURCE = "CSAS"
    DOI = ""

    def _file_urls(self, station_id, start, end):
        """
        Navigate the file layout using dates. Data for SASP and SBSP is stored
        in two csvs, 2003-2009 and 2010-2023. It is unclear what happens when
        the next year is made available. This function grabs the necessary urls
        depending on the requested date range.
        """
        urls = []
        if station_id in ['SASP', 'SBSP']:
            if start.year <= 2009:
                urls.append(os.path.join(self.URL, self._station_info.path))

            # Account for use of the later file, or a range straddling both files
            if start.year > 2009 or end.year > 2009:  # TODO: add to the info enum?
                partial = str(self._station_info.path).replace("2003", "2010")
                # TODO: what happens in 2024?
                filename = partial.replace('2009', '2023')
                urls.append(os.path.join(self.URL, filename))

            if start.year < 2003 or end.year > 2023:
                raise InvalidDateRange("CSAS data is only available from 2003-2023")
        else:
            urls.append(os.path.join(self.URL, self._station_info.path))

        return urls

    @staticmethod
    def _parse_datetime(row):
        # Day of year (DOY) is one-based: Jan 1 == DOY 1
        dt = timedelta(days=int(row['DOY']) - 1, hours=int(row['Hour'] / 100))
        return datetime(int(row['Year']), 1, 1) + dt

    def _assign_datetime(self, resp_df):
        resp_df['datetime'] = resp_df.apply(lambda row: self._parse_datetime(row),
                                            axis=1)
        return resp_df.set_index('datetime')
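
The URL pivoting and datetime handling in this file can be sketched as standalone
functions. This is a simplified mirror of the `_file_urls` and `_parse_datetime` logic
above; the names `file_urls`, `parse_datetime`, and the module-level `BASE_URL` are
illustrative, not part of metloom's API.

```python
import os
from datetime import datetime, timedelta

BASE_URL = "https://snowstudies.org/wp-content/uploads/"


def file_urls(path, start, end):
    # SASP/SBSP data is split across a 2003-2009 file and a 2010-2023 file,
    # so a request straddling 2009/2010 needs both URLs.
    urls = []
    if start.year <= 2009:
        urls.append(os.path.join(BASE_URL, path))
    if start.year > 2009 or end.year > 2009:
        later = path.replace("2003", "2010").replace("2009", "2023")
        urls.append(os.path.join(BASE_URL, later))
    return urls


def parse_datetime(year, doy, hour):
    # DOY is one-based (Jan 1 == DOY 1); hour is HHMM-style (1300 == 1 PM)
    return datetime(year, 1, 1) + timedelta(days=doy - 1, hours=hour // 100)
```

For example, a request spanning 2008-2012 resolves to both the 2003-2009 and the
2010-2023 file URLs, and row `(2021, 32, 1300)` parses to 1 PM on Feb 1, 2021.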