Fix pipeline
HeNeos committed Aug 24, 2024
1 parent 4afc735 commit d4184e0
Showing 13 changed files with 248 additions and 4 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -13,7 +13,7 @@ jobs:
run: |
gem install bundler # Ensure bundler is installed
bundle install # Install dependencies from Gemfile
bundle exec appraisal install # Install appraisal gemfiles (if using appraisal)
# bundle exec appraisal install # Install appraisal gemfiles (if using appraisal)
- name: Build site
run: |
bundle exec jekyll build --future # Build the Jekyll site (without appraisal unless necessary)
2 changes: 1 addition & 1 deletion _layouts/page.html
@@ -6,7 +6,7 @@

<main class="{% if page.full-width %} container-fluid {% else %} container-md {% endif %}">
<div class="row">
<div class="{% if page.full-width %} col {% else %} col-xl-8 offset-xl-2 col-lg-10 offset-lg-1 {% endif %}">
<div class="{% if page.full-width %} col {% else %} col-xl-10 offset-xl-1 col-lg-11 offset-lg-1 {% endif %}">
{% if page.before-content %}
<div class="before-content">
{% for file in page.before-content %}
2 changes: 1 addition & 1 deletion _layouts/post.html
@@ -6,7 +6,7 @@

<main class="{% if page.full-width %} container-fluid {% else %} container-md {% endif %}">
<div class="row">
<div class="{% if page.full-width %} col {% else %} col-xl-8 offset-xl-2 col-lg-10 offset-lg-1 {% endif %}">
<div class="{% if page.full-width %} col {% else %} col-xl-10 offset-xl-1 col-lg-11 offset-lg-1 {% endif %}">

{% if page.gh-repo %}
{% assign gh_split = page.gh-repo | split:'/' %}
246 changes: 245 additions & 1 deletion _posts/2024-08-23-cloud-graphs-algorithms-in-maps.md
@@ -49,7 +49,7 @@ def get_lat_lon(address: str) -> Optional[Coordinates]:

### Validating positions

The biggest issue for this application is `downloading` and `plotting` the maps, the problem is that if you want to find the shortest path between Paris and Berlin, then you have to download a map which contains both and it's really heavy. So, for now we are also limiting it to positions in the same city.
The biggest issue for this application is `downloading` and `plotting` the maps, the problem is that if you want to find the shortest path between Paris and Berlin, then you have to download a map that contains both and it's really big/heavy. So, for now we are also limiting it to positions in the same city.

If we know the latitude and longitude for the `source` and `destination`, then we have to verify that both are in the same country and city:

@@ -73,3 +73,247 @@ def get_current_location(

```

### Getting node id

So far, we are still working with positions given as `latitude` and `longitude`. In `graphs`, however, we prefer to work with **nodes** and **edges**. Think for a moment: do we really need to know the latitude and longitude of every position? Isn't it enough to have `nodes` represented by some `id` and `edges` that connect those `nodes` and have some `length`?

That's actually all we need: we don't have to care about geographical positions, and can abstract each one to just an `id`. `OSMnx` (imported as `ox`, and built on top of `NetworkX`) allows us to download a graph knowing the city and country, which we already have from the previous step. This `graph` has `nodes` and `edges`, so our task is to map the positions of the source and destination to valid `id`s in the graph.

```py
def get_node_id(graph: Union[MultiDiGraph, NGraph], location: Coordinates) -> NodeId:
    # ox.nearest_nodes takes X (longitude) first, then Y (latitude).
    return cast(NodeId, ox.nearest_nodes(graph, location.longitude, location.latitude))

G = download_graph(country, city)
graph: Graph = generate_graph(G)
source = get_node_id(G, source_coordinates)
destination = get_node_id(G, destination_coordinates)
```

There are faster ways to compute this, such as *caching* the map and storing some of its data in *DynamoDB*, but that will be explained later.

## Graph algorithms

This is by far the most interesting section. It focuses heavily on the algorithm in an abstract way: here we don't have to think in terms of maps, but in terms of their abstract representation, the `graph`.

If you remember the introduction, our objective is to find the **fastest** path, not the **shortest** path. The two terms are easily confused because they are commonly used for the same thing, but for the fastest path we want to minimize the **time**, while for the shortest path the objective is to minimize the **length**.

Of course, both quantities are related. If you remember your physics class, or just follow your intuition, you get:

\begin{aligned}
\Delta t = \frac{\Delta \mathrm{length}}{\mathrm{speed}}
\end{aligned}

However, this is only true if the `speed` is constant along the complete length, which of course is something we can't guarantee: roads have different speed limits, so each edge has its own limit. To refine this idea, we can consider a discrete approach:

\begin{aligned}
\Delta t = \sum_{i} \frac{\Delta \text{length}_{i}}{\text{speed}_{i}}
\end{aligned}

where `length[i]` is the length of `edge[i]` and `speed[i]` is its maximum allowed speed.

It's the only modification we have to make, and the maximum allowed speed for each edge is provided by `OpenStreetMap`.
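As a small illustration of the sum above, here is a minimal Python sketch. The function name `travel_time_hours` and the sample route are hypothetical, and it assumes lengths in meters and speed limits in km/h (which is what the `/ 1000` in the Rust code further down suggests).

```python
from typing import Iterable, Tuple


def travel_time_hours(edges: Iterable[Tuple[float, float]]) -> float:
    """Sum length_i / speed_i over (length_m, maxspeed_kmh) pairs."""
    return sum((length_m / 1000.0) / maxspeed_kmh for length_m, maxspeed_kmh in edges)


# Hypothetical route: 500 m at 50 km/h, then 1500 m at 30 km/h.
route = [(500.0, 50.0), (1500.0, 30.0)]
total_hours = travel_time_hours(route)
total_minutes = total_hours * 60.0
```

A longer route over faster roads can produce a smaller sum than a shorter route over slow roads, which is exactly why the fastest and the shortest path can differ.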

If you remember well, Dijkstra is an algorithm that finds the shortest path from a source to every other node in the graph. That is more than this application needs, since we only care about a single destination. It will work, but there are *better* ways.
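For reference, a minimal Dijkstra sketch in Python could look like the following. It is not the project's implementation: the `Graph` alias, the `dijkstra` name and the toy graph are assumptions for illustration, with edge weights already expressed as travel times in hours.

```python
import heapq
from typing import Dict, List, Tuple

# node -> list of (neighbor, travel_time_hours); a hypothetical toy format
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Return the minimum travel time from source to every reachable node."""
    best: Dict[str, float] = {source: 0.0}
    queue: List[Tuple[float, str]] = [(0.0, source)]
    while queue:
        time_so_far, node = heapq.heappop(queue)
        if time_so_far > best.get(node, float("inf")):
            continue  # stale queue entry, a better time was already found
        for neighbor, edge_time in graph.get(node, []):
            candidate = time_so_far + edge_time
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


toy: Graph = {
    "A": [("B", 0.25), ("C", 1.0)],
    "B": [("C", 0.25), ("D", 1.0)],
    "C": [("D", 0.25)],
    "D": [],
}
times = dijkstra(toy, "A")
```

Note that the loop keeps going until the queue is empty, so it computes a time for every node even when only one destination matters.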

### A* algorithm

A* is a heuristic algorithm. It uses the fact that, given two nodes, if we can find a lower bound for the distance between them, then we get better insight into which node is the best one to expand next, which can reduce the number of iterations drastically.

If you remember well, every node represents a position on a real map, which means it has a latitude and a longitude, so the Euclidean distance between two nodes is just:

\begin{aligned}
d = \sqrt{(\Delta x)^2 + (\Delta y)^2}
\end{aligned}

Wait, the Earth is not flat! This means the minimum distance between two positions on the Earth's surface is not a straight line, which is also the reason why we are using `latitude` and `longitude` instead of `x, y, z` positions. Instead of the Euclidean distance, we have to use the `Haversine` distance:

\begin{aligned}
d = 2r\arcsin\bigg( \sqrt{\frac{1-\cos(\Delta \phi) + \cos\phi_{1}\cdot \cos\phi_{2}\cdot (1-\cos(\Delta \lambda))}{2}} \bigg)
\end{aligned}

where $\phi$ is the latitude, $\lambda$ is the longitude, and $r \approx 6371 \text{ km}$ is the radius of the Earth.
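The formula translates almost term by term into code. The following Python sketch is only an illustration (the name `haversine_km` and the sample coordinates are assumptions, not taken from the project); it takes latitudes and longitudes in degrees and returns kilometers.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    h = (
        1.0
        - math.cos(delta_phi)
        + math.cos(phi1) * math.cos(phi2) * (1.0 - math.cos(delta_lambda))
    ) / 2.0
    h = min(1.0, max(0.0, h))  # guard against rounding just outside [0, 1]
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Approximate city-center coordinates, used only as an example.
paris_to_berlin = haversine_km(48.8566, 2.3522, 52.52, 13.405)
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
```

A quick sanity check is that a quarter of the equator should come out as $\pi r / 2$.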

This is my implementation of A* in **Rust**. Don't ask me why I wrote it in Rust, I just wanted to try something new :D.

```rust
let destination_node = graph.nodes.get(&destination).unwrap().clone();
while let Some(State { weight: _, node_id }) = priority_queue.pop() {
    // Best known travel time from the source to this node.
    let weight_to_node = weight_from_source
        .get(&node_id)
        .copied()
        .unwrap_or(INFINITY);
    if node_id == destination {
        return Some((
            previous_node,
            visited_edges,
            Vec::from_iter(active_edges),
            weight_to_node,
            iteration,
        ));
    }
    let current_node: Node = graph.nodes.get(&node_id).unwrap().clone();
    if visited_nodes.contains(&node_id) {
        continue;
    }
    visited_nodes.insert(node_id);
    let next_nodes_id: Vec<NodeId> = current_node.next_nodes;
    for next_node_id in &next_nodes_id {
        iteration += 1;
        let next_node: Node = graph.nodes.get(&next_node_id).unwrap().clone();
        let current_edge_id: EdgeId = (node_id, *next_node_id);
        let current_edge: Edge = graph.edges.get(&current_edge_id).unwrap().clone();
        visited_edges.push(current_edge_id);
        active_edges.remove(&current_edge_id);
        // Edge travel time: (length / 1000) divided by the edge's speed limit.
        let edge_weight: f64 = (current_edge.length / 1000.) / (current_edge.maxspeed as f64);
        let destination_distance: f64 = find_distance_by_nodes(
            next_node.lat,
            next_node.lon,
            destination_node.lat,
            destination_node.lon,
        )
        .await;
        // Heuristic: remaining distance covered at the maximum allowed speed.
        let heuristic_weight: f64 = destination_distance / max_speed_allowed;
        let new_weight: f64 = weight_to_node + edge_weight;
        if weight_from_source
            .get(next_node_id)
            .copied()
            .unwrap_or(INFINITY)
            > new_weight
        {
            weight_from_source.insert(*next_node_id, new_weight);
            previous_node.insert(*next_node_id, node_id);
            priority_queue.push(State {
                weight: new_weight + heuristic_weight,
                node_id: *next_node_id,
            });
            let nodes_to_visit: Vec<NodeId> =
                graph.nodes.get(&next_node_id).unwrap().clone().next_nodes;
            for to_visit_node_id in &nodes_to_visit {
                active_edges.insert((*next_node_id, *to_visit_node_id));
            }
        }
    }
}
```
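For readers who prefer Python, the same idea can be sketched in a compact, self-contained form. This is not a port of the Rust code: the names `a_star` and `haversine_km`, the data layout and the toy graph are all assumptions, with lengths in km and speeds in km/h. The heuristic is the Haversine distance to the destination divided by the highest speed limit, which is a lower bound on the remaining travel time as long as every edge is at least as long as the straight-line distance between its endpoints.

```python
import heapq
import math
from typing import Dict, List, Optional, Set, Tuple

Coord = Tuple[float, float]  # (lat, lon) in degrees
# node -> list of (next_node, length_km, maxspeed_kmh); hypothetical format
Adjacency = Dict[str, List[Tuple[str, float, float]]]


def haversine_km(a: Coord, b: Coord, r: float = 6371.0) -> float:
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    d_phi = phi2 - phi1
    d_lam = math.radians(b[1] - a[1])
    h = (1 - math.cos(d_phi) + math.cos(phi1) * math.cos(phi2) * (1 - math.cos(d_lam))) / 2
    return 2 * r * math.asin(math.sqrt(min(1.0, max(0.0, h))))


def a_star(
    graph: Adjacency,
    coords: Dict[str, Coord],
    source: str,
    destination: str,
    max_speed_kmh: float,
) -> Optional[Tuple[List[str], float]]:
    """Return (path, travel_time_hours), or None if no path exists."""
    time_from_source: Dict[str, float] = {source: 0.0}
    previous: Dict[str, str] = {}
    visited: Set[str] = set()
    start_estimate = haversine_km(coords[source], coords[destination]) / max_speed_kmh
    queue: List[Tuple[float, str]] = [(start_estimate, source)]
    while queue:
        _, node = heapq.heappop(queue)
        if node == destination:
            path = [node]
            while path[-1] in previous:  # walk back to the source
                path.append(previous[path[-1]])
            return path[::-1], time_from_source[node]
        if node in visited:
            continue
        visited.add(node)
        for nxt, length_km, speed in graph.get(node, []):
            new_time = time_from_source[node] + length_km / speed
            if new_time < time_from_source.get(nxt, math.inf):
                time_from_source[nxt] = new_time
                previous[nxt] = node
                heuristic = haversine_km(coords[nxt], coords[destination]) / max_speed_kmh
                heapq.heappush(queue, (new_time + heuristic, nxt))
    return None


# Toy map: A -> B -> D is shorter, A -> C -> D is longer but faster.
coords = {"A": (0.0, 0.0), "B": (0.0, 0.01), "C": (0.01, 0.005), "D": (0.0, 0.02)}
graph: Adjacency = {
    "A": [("B", 1.2, 30.0), ("C", 1.5, 60.0)],
    "B": [("D", 1.2, 30.0)],
    "C": [("D", 1.5, 60.0)],
    "D": [],
}
result = a_star(graph, coords, "A", "D", max_speed_kmh=60.0)
```

In this toy map the route through `C` is 0.6 km longer, yet it is the one returned because its speed limit is twice as high.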

### A* enhanced algorithm

This is my modification of the A* algorithm. It can't guarantee that a path is found, but in most of the cases I've tested it was able to beat A* and find the same solution in fewer iterations.

My idea is to use a level max distance which is constantly updated for a node $u$. We know the distance between $u$ and the destination node is at least the Haversine distance. If the Haversine distance is greater than `2 * best distance / min(1, ln(1 + best distance))`, then the node is skipped. The best distance is calculated as the minimum of `Haversine(source, destination)` and `Haversine(u, destination)`.

```rust
if level_max_distance != INFINITY {
    level_max_distance = f64::max(level_max_distance, destination_distance);
} else {
    level_max_distance = destination_distance;
}
if best_node_distance != INFINITY {
    if destination_distance * f64::min(1.0, (1.0 + best_node_distance).ln())
        > 2.0 * best_node_distance
    {
        continue;
    } else {
        best_node_distance =
            f64::min(source_to_destination_min_distance, destination_distance);
    }
}
```

What does it mean? It means that nodes that take a step backwards are penalized more than nodes that move you forward towards the destination. It will be clearer when you see the outputs.
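To get a feeling for the rule, the skip condition from the Rust snippet can be isolated in a few lines of Python. This is only a sketch: the function name `should_skip` is hypothetical and the distances in the examples are made-up values in km.

```python
import math


def should_skip(destination_distance: float, best_node_distance: float) -> bool:
    """Mirror of the pruning test: skip nodes that are too far from the destination."""
    if math.isinf(best_node_distance):
        return False  # no reference distance yet, so nothing is pruned
    scale = min(1.0, math.log(1.0 + best_node_distance))
    return destination_distance * scale > 2.0 * best_node_distance


# With a best distance of 5 km, ln(6) > 1, so the threshold is simply 2 * 5 = 10 km.
far_node = should_skip(11.0, 5.0)
near_node = should_skip(9.0, 5.0)
# With a best distance of 0.5 km, ln(1.5) < 1 relaxes the threshold to about 2.47 km.
close_to_goal = should_skip(2.0, 0.5)
```

So when the search is still far from the destination, any node more than twice the best distance away is dropped, and the logarithm loosens the rule once the search gets close.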

## Plotting

`OSMnx` already allows us to plot a downloaded map and customize its nodes and edges, so the remaining step is to find the visited edges and the edges that belong to the fastest path.

```py
def save_graph(
    graph: MultiDiGraph,
    edges_in_path: Set[EdgeId],
    visited: Set[EdgeId],
    active: Set[EdgeId],
    source: NodeId,
    destination: NodeId,
    solution_key: str,
    dist: float,
    time: str,
) -> str:
    node_size: List[float] = []
    node_alpha: List[float] = []
    node_color: List[Color] = []
    for node in cast(List[NodeId], graph.nodes):
        if node in (source, destination):
            node_size.append(POINT_SIZE)
            node_alpha.append(POINT_ALPHA)
            if node == source:
                node_color.append("blue")
            else:
                node_color.append("red")
        else:
            node_size.append(NODE_SIZE)
            node_alpha.append(NODE_ALPHA)
            node_color.append("white")
    edge_alpha: List[float] = []
    edge_color: List[str | Tuple[float, float, float, float]] = []
    edge_linewidth: List[float] = []
    for edge in graph.edges:
        edge_id = (edge[0], edge[1])
        if edge_id in edges_in_path:
            edge_color.append(PathEdge.color)
            edge_alpha.append(PathEdge.alpha)
            edge_linewidth.append(PathEdge.linewidth)
        elif edge_id in visited:
            edge_color.append(VisitedEdge.color)
            edge_alpha.append(VisitedEdge.alpha)
            edge_linewidth.append(VisitedEdge.linewidth)
        elif edge_id in active:
            edge_color.append(ActiveEdge.color)
            edge_alpha.append(ActiveEdge.alpha)
            edge_linewidth.append(ActiveEdge.linewidth)
        else:
            edge_color.append(UnvisitedEdge.color)
            edge_alpha.append(UnvisitedEdge.alpha)
            edge_linewidth.append(UnvisitedEdge.linewidth)

    fig, ax = ox.plot_graph(
        graph,
        node_size=node_size,  # type: ignore
        node_alpha=node_alpha,  # type: ignore
        edge_color=edge_color,  # type: ignore
        edge_alpha=edge_alpha,
        edge_linewidth=edge_linewidth,  # type: ignore
        node_color=node_color,  # type: ignore
        bgcolor="#000000",
        show=False,
        close=False,
    )
    title: str = "\n".join([f"Distance: {dist} km", f"Time: {time}"])
    ax.set_title(title, color="#3b528b", fontsize=10)
```
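The set of path edges passed to a function like this can be rebuilt from the predecessor map returned by the search. A minimal sketch, assuming a `previous_node` mapping from each node to its predecessor (the helper name `path_edges` and the sample ids are hypothetical):

```python
from typing import Dict, Set, Tuple

NodeId = int
EdgeId = Tuple[NodeId, NodeId]


def path_edges(
    previous_node: Dict[NodeId, NodeId], source: NodeId, destination: NodeId
) -> Set[EdgeId]:
    """Walk back from destination to source and collect the (from, to) edges."""
    edges: Set[EdgeId] = set()
    node = destination
    while node != source:
        parent = previous_node[node]  # KeyError if destination was never reached
        edges.add((parent, node))
        node = parent
    return edges


# Hypothetical predecessor map: 1 -> 2 -> 3 is the path, 4 was only explored.
previous = {2: 1, 3: 2, 4: 2}
edges_on_path = path_edges(previous, source=1, destination=3)
```

Edges that were explored but do not lie on the way back from the destination, like `(2, 4)` here, stay out of the set and can be drawn with the visited style instead.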

## Final considerations

I've explained the big picture of how this project was built. There are a lot more details that were not explained, but I want to give them a quick mention.

1. Step Functions to orchestrate the Lambda functions.
2. Preload the most common graphs and upload them to S3 with a unique graph id, and store it in DynamoDB for quick queries.
3. If the map is already in S3, there is no need to download the complete graph, only a small radius around the position to calculate the node id.
4. Instead of using the heavy map from `NetworkX`, use a simplified graph that only contains the necessary information. Store it in S3 as well.
5. Deploy an API Gateway and a presigned URL to retrieve the plot.
6. Use a lifecycle configuration to delete old plots.

## Results

| Place | Dijkstra | A* |
|-------|----------|----|
| Milan | <img src="https://raw.githubusercontent.com/HeNeos/heneos.github.io/master/assets/img/cloud/graphs_algorithms_in_maps/dijkstra-path_Milan.png" width="900"> | <img src="https://raw.githubusercontent.com/HeNeos/heneos.github.io/master/assets/img/cloud/graphs_algorithms_in_maps/a_star-path_Milan.png" width="900"> |
| Munich | <img src="https://raw.githubusercontent.com/HeNeos/heneos.github.io/master/assets/img/cloud/graphs_algorithms_in_maps/dijkstra-path_Munich.png" width="900"> | <img src="https://raw.githubusercontent.com/HeNeos/heneos.github.io/master/assets/img/cloud/graphs_algorithms_in_maps/a_star-path_Munich.png" width="900"> |
| Paris | <img src="https://raw.githubusercontent.com/HeNeos/heneos.github.io/master/assets/img/cloud/graphs_algorithms_in_maps/dijkstra-path_Paris.png" width="900"> | <img src="https://raw.githubusercontent.com/HeNeos/heneos.github.io/master/assets/img/cloud/graphs_algorithms_in_maps/a_star-path_Paris.png" width="900"> |

Compare it with my modified version :)

| Place | A* enhanced |
|-------|-------------|
| Milan | <img src="https://raw.githubusercontent.com/HeNeos/heneos.github.io/master/assets/img/cloud/graphs_algorithms_in_maps/a_star_enhanced-path_Milan.png" width="500"> |
| Munich | <img src="https://raw.githubusercontent.com/HeNeos/heneos.github.io/master/assets/img/cloud/graphs_algorithms_in_maps/a_star_enhanced-path_Munich.png" width="500"> |
| Paris | <img src="https://raw.githubusercontent.com/HeNeos/heneos.github.io/master/assets/img/cloud/graphs_algorithms_in_maps/a_star_enhanced-path_Paris.png" width="500"> |