diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 56dee7320b81..886a753aca23 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -13,7 +13,7 @@ jobs: run: | gem install bundler # Ensure bundler is installed bundle install # Install dependencies from Gemfile - bundle exec appraisal install # Install appraisal gemfiles (if using appraisal) + # bundle exec appraisal install # Install appraisal gemfiles (if using appraisal) - name: Build site run: | bundle exec jekyll build --future # Build the Jekyll site (without appraisal unless necessary) diff --git a/_layouts/page.html b/_layouts/page.html index f66f3e42db1b..b149e4fc9ce2 100644 --- a/_layouts/page.html +++ b/_layouts/page.html @@ -6,7 +6,7 @@
-
+
{% if page.before-content %}
{% for file in page.before-content %} diff --git a/_layouts/post.html b/_layouts/post.html index 9f27cc3109c0..53e503a39175 100644 --- a/_layouts/post.html +++ b/_layouts/post.html @@ -6,7 +6,7 @@
-
+
{% if page.gh-repo %} {% assign gh_split = page.gh-repo | split:'/' %} diff --git a/_posts/2024-08-23-cloud-graphs-algorithms-in-maps.md b/_posts/2024-08-23-cloud-graphs-algorithms-in-maps.md index c5f00f1eb261..deba9422ce51 100644 --- a/_posts/2024-08-23-cloud-graphs-algorithms-in-maps.md +++ b/_posts/2024-08-23-cloud-graphs-algorithms-in-maps.md @@ -49,7 +49,7 @@ def get_lat_lon(address: str) -> Optional[Coordinates]: ### Validating positions -The biggest issue for this application is `downloading` and `plotting` the maps, the problem is that if you want to find the shortest path between Paris and Berlin, then you have to download a map which contains both and it's really heavy. So, for now we are also limiting it to positions in the same city. +The biggest issue for this application is `downloading` and `plotting` the maps, the problem is that if you want to find the shortest path between Paris and Berlin, then you have to download a map that contains both and it's really big/heavy. So, for now we are also limiting it to positions in the same city. If we know the latitude and longitude for the `source` and `destination`, then we have to verify know if both are in the same country and city: @@ -73,3 +73,247 @@ def get_current_location( ``` +### Getting node id + +So far, we are still working with positions, that is, `latitude` and `longitude`; in `graphs`, however, we prefer to work with **nodes** and **edges**. Think for a moment: do we really need to know the latitude and longitude of every position? Isn't it enough to have `nodes` represented by some `id` and `edges` that connect those `nodes` and have some `length`? + +That's actually all we need: we don't have to care about geographical positions, and can instead abstract each one to just an `id`. `OSMnx`, which builds on `NetworkX`, allows us to download a graph knowing the city and country, which we already have from the previous step. 
This `graph` has `nodes` and `edges`, so our task is to map the position of the source and of the destination to a valid `id` in the graph. + +```py +def get_node_id(graph: Union[MultiDiGraph, NGraph], location: Coordinates) -> NodeId: + return cast(NodeId, ox.nearest_nodes(graph, location.longitude, location.latitude)) + +G = download_graph(country, city) +graph: Graph = generate_graph(G) +source = get_node_id(G, source_coordinates) +destination = get_node_id(G, destination_coordinates) +``` + +There are faster ways to do this, like *caching* the map and storing some of its data in *dynamo*, but that will be explained later. + +## Graph algorithms + +This is by far the most interesting section. It focuses on the algorithm in an abstract way: here we don't have to think in terms of maps, only in terms of their abstract representation, the `graph`. + +If you remember the introduction, our objective is to find the **fastest** path, not the **shortest** path. The two terms are easily confused because they are often used interchangeably, but in the fastest path we want to minimize the **time**, while in the shortest path the objective is to minimize the **length**. + +Of course, both quantities are related. If you remember your physics class or follow your intuition, you get: + +\begin{aligned} +\Delta t = \frac{\Delta \mathrm{length}}{\mathrm{speed}} +\end{aligned} + +However, this is only true if the `speed` is constant across the complete length, which of course is something we can't guarantee: roads have different speed limits, so each edge has its own limit. To refine this idea, we can consider a discrete approach: + +\begin{aligned} +\Delta t = \sum_{i} \frac{\Delta \text{length}_{i}}{\text{speed}_{i}} +\end{aligned} + +where `length[i]` is the length of `edge[i]` and `speed[i]` is its maximum allowed speed. + +It's the only modification we have to make, and the data for the maximum allowed speed is provided by `OpenStreetMap`. 
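To make the discrete sum concrete, here is a minimal Python sketch of the travel time along a path. The `RoadEdge` record and its field names are hypothetical, introduced only for illustration, and the units (metres for length, km/h for speed) are an assumption rather than something taken from the project.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RoadEdge:
    """Hypothetical edge record; the field names are assumptions for this sketch."""

    length_m: float  # edge length in metres (assumed unit)
    maxspeed_kmh: float  # maximum allowed speed in km/h (assumed unit)


def travel_time_hours(edges: Iterable[RoadEdge]) -> float:
    """Sum length_i / speed_i over the edges of a path; the result is in hours."""
    # Convert metres to kilometres so that km / (km/h) gives hours.
    return sum((edge.length_m / 1000.0) / edge.maxspeed_kmh for edge in edges)


# Two edges: 1 km at 50 km/h and 3 km at 30 km/h.
path = [RoadEdge(1000.0, 50.0), RoadEdge(3000.0, 30.0)]
total = travel_time_hours(path)
print(f"{total:.2f} h")
```

With these numbers the path takes about 0.12 h, whereas the same 4 km at a constant 50 km/h would take 0.08 h, which is why the per-edge speed matters when minimizing time instead of length.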
+ +If you remember well, Dijkstra is an algorithm that finds the shortest path from a source to every other node in the graph. That is more than this application needs, since we only care about a single destination: it will work, but there are *better* ways. + +### A* algorithm + +A* is a heuristic algorithm. It uses the fact that, given two nodes, if we can find a lower bound for the distance between them, then we have better insight into which node is the best one to explore next, which can drastically reduce the number of iterations. + +If you remember well, every node represents a position on a real map, which means it has a latitude and a longitude, so the Euclidean distance between two nodes is just: + +\begin{aligned} +d = \sqrt{(\Delta x)^2 + (\Delta y)^2} +\end{aligned} + +Wait, the Earth is not flat! This means the minimum distance between two positions on the Earth's surface is not a straight line, which is also the reason why we are using `latitude` and `longitude` instead of `x, y, z` positions. Instead of the Euclidean distance, we have to use the `Haversine` distance: + +\begin{aligned} +d = 2r\arcsin\bigg( \sqrt{\frac{1-\cos(\Delta \phi) + \cos\phi_{1}\cdot \cos\phi_{2}\cdot (1-\cos(\Delta \lambda))}{2}} \bigg) +\end{aligned} + +where $\phi$ is the latitude, $\lambda$ is the longitude and $r \approx 6371 \text{km}$ is the Earth's radius. + +This is my implementation in **Rust**. Don't ask me why I wrote it in Rust, I just wanted to try something new :D. 
+ +```rust +let destination_node = graph.nodes.get(&destination).unwrap().clone(); +while let Some(State { weight: _, node_id }) = priority_queue.pop() { + let weight_to_node = weight_from_source + .get(&node_id) + .copied() + .unwrap_or(INFINITY); + if node_id == destination { + return Some(( + previous_node, + visited_edges, + Vec::from_iter(active_edges), + weight_to_node, + iteration, + )); + } + let current_node: Node = graph.nodes.get(&node_id).unwrap().clone(); + if visited_nodes.contains(&node_id) { + continue; + } + visited_nodes.insert(node_id); + let next_nodes_id: Vec<_> = current_node.next_nodes; + for next_node_id in &next_nodes_id { + iteration += 1; + let next_node: Node = graph.nodes.get(&next_node_id).unwrap().clone(); + let current_edge_id: EdgeId = (node_id, *next_node_id); + let current_edge: Edge = graph.edges.get(&current_edge_id).unwrap().clone(); + visited_edges.push(current_edge_id); + active_edges.remove(&current_edge_id); + let edge_weight: f64 = (current_edge.length / 1000.) 
/ (current_edge.maxspeed as f64); + let destination_distance: f64 = find_distance_by_nodes( + next_node.lat, + next_node.lon, + destination_node.lat, + destination_node.lon, + ) + .await; + let heuristic_weight: f64 = destination_distance / max_speed_allowed; + let new_weight: f64 = weight_to_node + edge_weight; + if weight_from_source + .get(next_node_id) + .copied() + .unwrap_or(INFINITY) + > new_weight + { + weight_from_source.insert(*next_node_id, new_weight); + previous_node.insert(*next_node_id, node_id); + priority_queue.push(State { + weight: new_weight + heuristic_weight, + node_id: *next_node_id, + }); + let nodes_to_visit: Vec<_> = + graph.nodes.get(&next_node_id).unwrap().clone().next_nodes; + for to_visit_node_id in &nodes_to_visit { + active_edges.insert((*next_node_id, *to_visit_node_id)); + } + } + } +} +``` + +### A* enhanced algorithm + +This is my modification of the A* algorithm. It can't guarantee you a path, but in most of the cases I've tested it was able to beat A* and find the same solution in fewer iterations. + +My idea is to use a level max distance which is constantly updated for a node $u$. We know the distance along the roads between $u$ and the destination node is at least the Haversine distance. If the Haversine distance is greater than `2 * best distance / min(1, ln(1 + best distance))`, then the node is skipped. The best distance is calculated as the minimum of `Haversine(source, destination)` and `Haversine(u, destination)`. + +```rust +if level_max_distance != INFINITY { + level_max_distance = f64::max(level_max_distance, destination_distance); +} else { + level_max_distance = destination_distance; +} +if best_node_distance != INFINITY { + if destination_distance * f64::min(1.0, (1.0 + best_node_distance).ln()) + > 2.0 * best_node_distance + { + continue; + } else { + best_node_distance = + f64::min(source_to_destination_min_distance, destination_distance); + } +} +``` + +What does it mean? 
It means that nodes that take a step backwards are penalized more than nodes that move you forward towards the destination. It will be clearer when you see the outputs. + +## Plotting + +`OSMnx` already allows us to plot a downloaded map and customize nodes and edges, so the remaining step is to find the visited edges and the edges that belong to the fastest path. + +```py +def save_graph( + graph: MultiDiGraph, + edges_in_path: Set[EdgeId], + visited: Set[EdgeId], + active: Set[EdgeId], + source: NodeId, + destination: NodeId, + solution_key: str, + dist: float, + time: str, +) -> str: + node_size: List[float] = [] + node_alpha: List[float] = [] + node_color: List[Color] = [] + for node in cast(List[NodeId], graph.nodes): + if node in (source, destination): + node_size.append(POINT_SIZE) + node_alpha.append(POINT_ALPHA) + if node == source: + node_color.append("blue") + else: + node_color.append("red") + else: + node_size.append(NODE_SIZE) + node_alpha.append(NODE_ALPHA) + node_color.append("white") + edge_alpha: List[float] = [] + edge_color: List[str | Tuple[float, float, float, float]] = [] + edge_linewidth: List[float] = [] + for edge in graph.edges: + edge_id = (edge[0], edge[1]) + if edge_id in edges_in_path: + edge_color.append(PathEdge.color) + edge_alpha.append(PathEdge.alpha) + edge_linewidth.append(PathEdge.linewidth) + elif edge_id in visited: + edge_color.append(VisitedEdge.color) + edge_alpha.append(VisitedEdge.alpha) + edge_linewidth.append(VisitedEdge.linewidth) + elif edge_id in active: + edge_color.append(ActiveEdge.color) + edge_alpha.append(ActiveEdge.alpha) + edge_linewidth.append(ActiveEdge.linewidth) + else: + edge_color.append(UnvisitedEdge.color) + edge_alpha.append(UnvisitedEdge.alpha) + edge_linewidth.append(UnvisitedEdge.linewidth) + + fig, ax = ox.plot_graph( + graph, + node_size=node_size, # type: ignore + node_alpha=node_alpha, # type: ignore + edge_color=edge_color, # type: ignore + edge_alpha=edge_alpha, + 
edge_linewidth=edge_linewidth, # type: ignore + node_color=node_color, # type: ignore + bgcolor="#000000", + show=False, + close=False, + ) + title: str = "\n".join([f"Distance: {dist} km", f"Time: {time}"]) + ax.set_title(title, color="#3b528b", fontsize=10) +``` + +## Final considerations + +I've explained the big picture of how this project was built. There are a lot more details that were not explained here, but I want to give them a quick mention. + +1. Step functions to orchestrate the lambda functions. +2. Preload the most common graphs and upload them to S3 with a unique graph id, and store that id in dynamo for quick queries. +3. If the map is already in S3, there is no need to download the complete graph, only a small radius around the position to calculate the node id. +4. Instead of using the heavy map from `NetworkX`, use a simplified graph which only contains the necessary information. Store it in S3 as well. +5. Deploy an API GW and a presigned url to retrieve the plot. +6. Use lifecycle configuration to delete old plots. 
+ +## Results + +| Place | Dijkstra | A* | +|-------|----------|----| +| Milan | | | +| Munich | | | +| Paris | | | + +Compare it with my modified version :) + +| Place | A* enhanced | +|-------|-------------| +| Milan | | +| Munich | | +| Paris | | \ No newline at end of file diff --git a/assets/img/cloud/graphs_algorithms_in_maps/a_star-path_Milan.png b/assets/img/cloud/graphs_algorithms_in_maps/a_star-path_Milan.png new file mode 100644 index 000000000000..06d961e30ce0 Binary files /dev/null and b/assets/img/cloud/graphs_algorithms_in_maps/a_star-path_Milan.png differ diff --git a/assets/img/cloud/graphs_algorithms_in_maps/a_star-path_Munich.png b/assets/img/cloud/graphs_algorithms_in_maps/a_star-path_Munich.png new file mode 100644 index 000000000000..736cd83f1d5d Binary files /dev/null and b/assets/img/cloud/graphs_algorithms_in_maps/a_star-path_Munich.png differ diff --git a/assets/img/cloud/graphs_algorithms_in_maps/a_star-path_Paris.png b/assets/img/cloud/graphs_algorithms_in_maps/a_star-path_Paris.png new file mode 100644 index 000000000000..439bcd12dcbb Binary files /dev/null and b/assets/img/cloud/graphs_algorithms_in_maps/a_star-path_Paris.png differ diff --git a/assets/img/cloud/graphs_algorithms_in_maps/a_star_enhanced-path_Milan.png b/assets/img/cloud/graphs_algorithms_in_maps/a_star_enhanced-path_Milan.png new file mode 100644 index 000000000000..582a3dd7a0a1 Binary files /dev/null and b/assets/img/cloud/graphs_algorithms_in_maps/a_star_enhanced-path_Milan.png differ diff --git a/assets/img/cloud/graphs_algorithms_in_maps/a_star_enhanced-path_Munich.png b/assets/img/cloud/graphs_algorithms_in_maps/a_star_enhanced-path_Munich.png new file mode 100644 index 000000000000..49eca8229198 Binary files /dev/null and b/assets/img/cloud/graphs_algorithms_in_maps/a_star_enhanced-path_Munich.png differ diff --git a/assets/img/cloud/graphs_algorithms_in_maps/a_star_enhanced-path_Paris.png 
b/assets/img/cloud/graphs_algorithms_in_maps/a_star_enhanced-path_Paris.png new file mode 100644 index 000000000000..c4a6c1755781 Binary files /dev/null and b/assets/img/cloud/graphs_algorithms_in_maps/a_star_enhanced-path_Paris.png differ diff --git a/assets/img/cloud/graphs_algorithms_in_maps/dijkstra-path_Milan.png b/assets/img/cloud/graphs_algorithms_in_maps/dijkstra-path_Milan.png new file mode 100644 index 000000000000..63920333e723 Binary files /dev/null and b/assets/img/cloud/graphs_algorithms_in_maps/dijkstra-path_Milan.png differ diff --git a/assets/img/cloud/graphs_algorithms_in_maps/dijkstra-path_Munich.png b/assets/img/cloud/graphs_algorithms_in_maps/dijkstra-path_Munich.png new file mode 100644 index 000000000000..f5f0943d911d Binary files /dev/null and b/assets/img/cloud/graphs_algorithms_in_maps/dijkstra-path_Munich.png differ diff --git a/assets/img/cloud/graphs_algorithms_in_maps/dijkstra-path_Paris.png b/assets/img/cloud/graphs_algorithms_in_maps/dijkstra-path_Paris.png new file mode 100644 index 000000000000..ce93c1958f93 Binary files /dev/null and b/assets/img/cloud/graphs_algorithms_in_maps/dijkstra-path_Paris.png differ