This repository has been archived by the owner on Aug 16, 2024. It is now read-only.
BC Wildfires & Recreational Trails Data Engineering Project

As part of the Data Engineering Zoomcamp by DataTalks.Club, this project marks the final component of the course. Many thanks to the instructors that contributed to building this educational resource.

  • This project is no longer maintained, as it was built on a temporary free-trial account on Google Cloud Platform.

Table of Contents

Purpose

Web Map
  • Figure 1. Overview of the final report visualized as a web map application using Dekart.
  • Figure 2. Summary table of recreation trails and counts of nearby (within 20km) active and inactive wildfires viewed in the map application.
Data Stack
  • Figure 3. Diagram modelling the tools used in this project.
Data Sources
  • Figure 4. Example of visualization of wildfire perimeters data in the final report.
  • Figure 5. Example of visualization of recreation trails data in the final report.

Setup

Workflow Orchestration
  • Figure 6. Sample of shapefile contents from data source.
  • Figure 7. Comparison between geojson and newline-delimited geojson format, processed using the geojson2ndjson command-line tool.
  • Figure 8. Airflow DAG graph for processing the recreation trails dataset.
  • Figure 9. Airflow DAG graph for processing the wildfire perimeters dataset.
  • Figure 10. Airflow DAG graph for running the DBT models (staging and core).
Data Warehouse Transformations
  • Figure 11. SQL code in the DBT staging model that transforms raw data stored in BigQuery.
  • Figure 12. SQL code in the DBT core model that performs a spatial join on staging data in BigQuery.
  • Figure 13. Diagram of the staging and core DBT model dependencies.

Future Opportunities

Purpose

The purpose of this project was to design and develop a prototype of a modern data pipeline focused on wildfire activity and recreational trails in British Columbia (BC), Canada.

Background

Statistics from the 2023 wildfire season:

  • 2,245 wildfires burned more than 2.84 million hectares of forest and land.
  • 29,900 calls were made to the Provincial Wildfire Reporting Centre, generating 18,200 wildfire reports.
  • An estimated 208 evacuation orders affected approximately 24,000 properties and roughly 48,000 people.

Wildfire season in BC coincides with the season when outdoor recreation is most popular. The province is well known for its beautiful, mountainous, and coastal landscape, with ample opportunity for hiking, camping, and mountain biking.

The final deliverable of this project is a web map application visualizing provincial datasets of wildfire perimeters and recreational trails analyzed to be within 20 kilometers of any wildfire, active or inactive.

Web Map

Click here to view. Requires login with a Google account. (The map may initially take a few minutes to download data.)

Figure 1. Overview of the final report visualized as a web map application using Dekart.


Figure 2. Summary table of recreation trails and counts of nearby (within 20km) active and inactive wildfires viewed in the map application.

Data Stack

  • Development Platform: Docker
  • Infrastructure as Code (IAC): Terraform
  • Orchestration: Apache Airflow
  • Data Lake: Google Cloud Storage
  • Data Warehouse: Google BigQuery
  • Transformation Manager: Data Build Tool (DBT)
  • Data Visualization: Dekart (built on top of kepler.gl and mapbox)

Architecture


Figure 3. Diagram modelling the tools used in this project.

Data Sources

  • BC Wildfire Fire Perimeters - Current

    • This is a spatial layer (shapefile) of polygons showing both active and inactive fires for the current fire season. The data is refreshed from operational systems every 15 minutes, and perimeters are rolled over to Historical Fire Polygons on April 1st of each year. Because ingestion is automated in this project, this dataset is treated as dynamic.
    • Limitations:
      • Wildfire data may not reflect the most current fire situation as fires are dynamic and circumstances may change quickly. Wildfire data is refreshed when practicable and individual fire update frequency will vary.
  • Recreation Line

    • This is a spatial layer (shapefile) of polylines showing features such as recreation trails. In this project, this dataset is static.
    • Limitations:
      • These polylines represent recreational trails and may not include recreational features where polygon geometries would be more appropriate. Trails that are not registered with the BC Ministry of Forests are not included. The source data is not updated on a regular schedule, so this dataset was kept static for the purposes of this project.
  • All data was obtained through the BC Data Catalogue. Original sources include BC Wildfire Service and Recreational Sites & Trails BC. (Attribution: Contains information licensed under the Open Government Licence – British Columbia)



    Figure 4. Example of visualization of wildfire perimeters data in the final report.



    Figure 5. Example of visualization of recreation trails data in the final report.

Setup

  • Google Cloud Platform
    • Services account and project
    • IAM user permissions and APIs
    • Credentials keyfile and ssh client
    • VM instance
  • VM Instance
    • Anaconda, Docker, Terraform installation
    • GCP credentials retrieval
  • Docker
    • Docker build context and volume mapping
    • Forward port 8080 of VM to local device to access Airflow UI at localhost:8080
  • Terraform
    • Configure GCP provider with credentials
    • Resource configuration (i.e., storage bucket, dataset)
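As a rough sketch, the Terraform resource configuration described above might look something like the following. All names, locations, and variables here are illustrative placeholders, not the repository's actual values:

```hcl
variable "project_id" {}
variable "credentials_path" {}

provider "google" {
  project     = var.project_id
  credentials = file(var.credentials_path)
}

# Data lake bucket for the raw shapefile/geojson exports
resource "google_storage_bucket" "data_lake" {
  name          = "${var.project_id}-wildfires-data-lake"
  location      = "US"
  force_destroy = true
}

# BigQuery dataset that receives the newline-delimited geojson loads
resource "google_bigquery_dataset" "raw" {
  dataset_id = "wildfires_raw"
  location   = "US"
}
```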

Workflow Orchestration

  • Apache Airflow was used as a workflow orchestrator to manage the tasks of data ingestion, storage, and transformation. Tasks were configured using either Python or Bash operators and defined in Directed Acyclic Graphs (DAGs).
  • Data was ingested in shapefile format. Prior to loading data to BigQuery, necessary transformations included:
    • Conversion to geographic coordinate system EPSG:4326 (WGS84 datum, lat/lon coordinates)
    • File conversion to geojson format
    • File conversion to newline-delimited geojson format
  • More on loading spatial data to BigQuery here.
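BigQuery loads GeoJSON one feature per line rather than as a single FeatureCollection, which is why the newline-delimited conversion step exists. A minimal stdlib sketch of what a tool like geojson2ndjson does (filenames here are hypothetical) could look like:

```python
import json

def geojson_to_ndjson(src_path: str, dst_path: str) -> int:
    """Flatten a GeoJSON FeatureCollection into newline-delimited GeoJSON.

    BigQuery expects one standalone JSON feature per line, so each feature
    in the collection is written on its own line. Returns the number of
    features written.
    """
    with open(src_path) as src:
        collection = json.load(src)
    features = collection["features"]
    with open(dst_path, "w") as dst:
        for feature in features:
            dst.write(json.dumps(feature) + "\n")
    return len(features)
```

Reprojection to EPSG:4326 and the shapefile-to-geojson conversion would happen upstream of this step (for example with a tool such as ogr2ogr) before the newline-delimited output is loaded to BigQuery.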



    Figure 6. Sample of shapefile contents from data source.



    Figure 7. Comparison between geojson and newline-delimited geojson format, processed using the geojson2ndjson command-line tool.



    Figure 8. Airflow DAG graph for processing the recreation trails dataset.



    Figure 9. Airflow DAG graph for processing the wildfire perimeters dataset.



    Figure 10. Airflow DAG graph for running the DBT models (staging and core).

Data Warehouse Transformations

DBT manages the execution of SQL-based transformations in BigQuery. The DBT models store the dependencies and configurations for how the results are materialized. The staging model involves type casting and formatting while the core model includes queries for performing a spatial join and data aggregation for a summary table.
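The 20 km proximity logic behind the core model can be illustrated in miniature. In BigQuery, ST_DWITHIN evaluates this against the full perimeter and trail geometries; the sketch below only shows the underlying idea with a haversine distance check over hypothetical point data:

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def count_nearby_fires(trail_point, fires, radius_km=20.0):
    """Count fires within radius_km of a trail point, split by status.

    trail_point is (lon, lat); fires is an iterable of (lon, lat, status)
    where status is "active" or "inactive".
    """
    counts = {"active": 0, "inactive": 0}
    for lon, lat, status in fires:
        if haversine_km(trail_point[0], trail_point[1], lon, lat) <= radius_km:
            counts[status] += 1
    return counts
```

In the actual pipeline this aggregation is expressed in SQL over the staging tables, and the core model materializes the joined result as the summary table of trails with nearby active and inactive fire counts.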



Figure 11. SQL code in the DBT staging model that transforms raw data stored in BigQuery.




Figure 12. SQL code in the DBT core model that performs a spatial join on staging data in BigQuery.


Figure 13. Diagram of the staging and core DBT model dependencies.

Data Visualization

  • Post-transformation, the data was visualized in a web map application using Dekart which connects to BigQuery and works on top of kepler.gl and mapbox to display data layers.
  • Features:
    • Map legend with attribute-dependent categorization
    • Dynamic display of latitude and longitude coordinates
    • Location search
    • Table and map view of the whole wildfire dataset and the filtered recreation trails results
    • Table view of the aggregated summary table
    • Customizable basemap, layer filters, tool tips, export data and image options

Future Opportunities

  • DBT Cloud production jobs and CI/CD jobs
  • Additional data sources for recreational projects in proximity to wildfires
  • Alternative mapping visualization tools such as Folium & Dash with consideration to speed, UI/UX features, and security
  • Incorporating Apache Sedona (formerly GeoSpark) to explore distributed spatial analytics capabilities in combination with Google Dataproc
  • Integration with the web page for provincial trail closures

