fixed issue in precision converting annotations with "force_mask=True" #1746

0xD4rky · 2024-12-16T17:56:34Z

Description

When supervision loads YOLO annotations with force_masks=True, it converts the normalized polygon coordinates from the YOLO text files into pixel coordinates (multiplying by image width/height), and converts them back into normalized coordinates when saving. During this round trip the coordinates are cast to integers, and casting too early (truncating instead of rounding) shifts the polygon vertices slightly. This leads to “crooked” or misaligned masks.
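The effect can be illustrated with a minimal sketch. It assumes the drift comes from truncating pixel coordinates with `astype(int)`; the helper `round_trip` is hypothetical and not part of supervision, and the sample point is chosen only for illustration.

```python
import numpy as np


def round_trip(normalized_xy, resolution_wh, quantize):
    """Scale normalized coordinates to pixels, quantize to integers, normalize again."""
    wh = np.array(resolution_wh, dtype=np.float64)
    pixels = quantize(np.asarray(normalized_xy, dtype=np.float64) * wh)
    return pixels / wh


point = np.array([0.333, 0.777])  # illustrative normalized (x, y)
resolution_wh = (640, 480)

# Truncation (early integer cast) versus rounding to the nearest pixel.
truncated = round_trip(point, resolution_wh, lambda p: p.astype(int))
rounded = round_trip(point, resolution_wh, lambda p: np.round(p).astype(int))

print("truncation error:", np.abs(truncated - point))
print("rounding error:  ", np.abs(rounded - point))
```

Rounding bounds the error at half a pixel per axis, while truncation can lose almost a full pixel, always in the same direction.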

Type of change

Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Minimal Reproducible Code:

import cv2
import numpy as np

resolution_wh = (640, 480)  # (width, height)
relative_polygon = np.array([
    [0.25, 0.4],
    [0.25, 0.6],
    [0.45, 0.6],
    [0.45, 0.4]
], dtype=np.float32)

def polygon_to_mask(polygon: np.ndarray, resolution_wh: tuple[int, int]) -> np.ndarray:
    """
    New approach: Convert to int at the last moment.
    """
    polygon_int = np.round(polygon).astype(np.int32)
    mask = np.zeros((resolution_wh[1], resolution_wh[0]), dtype=np.uint8)
    cv2.fillPoly(mask, [polygon_int], 1)
    return mask

def old_polygon_processing(relative_polygon: np.ndarray, resolution_wh: tuple[int,int]) -> np.ndarray:
    """
    Old (problematic) approach: Cast to int too early.
    """
    polygons = (relative_polygon * np.array(resolution_wh)).astype(int)
    return polygon_to_mask(polygons, resolution_wh)

def new_polygon_processing(relative_polygon: np.ndarray, resolution_wh: tuple[int,int]) -> np.ndarray:
    """
    New (improved) approach: Keep floats until mask creation.
    """
    polygons = relative_polygon * np.array(resolution_wh, dtype=np.float32)
    return polygon_to_mask(polygons, resolution_wh)

old_mask = old_polygon_processing(relative_polygon, resolution_wh)
cv2.imwrite("old_mask.png", old_mask.astype(np.uint8)*255)  

new_mask = new_polygon_processing(relative_polygon, resolution_wh)
cv2.imwrite("new_mask.png", new_mask.astype(np.uint8)*255) 

difference = np.bitwise_xor(old_mask, new_mask)
print("Number of differing pixels:", difference.sum())

# Instructions for Analysis:
# 1. Open old_mask.png and new_mask.png.
# 2. Check if the polygon edges appear more accurate in new_mask.png.
# 3. "Number of differing pixels" only counts where the two masks disagree with each other;
#    to judge which one is less distorted, compare each mask against a ground-truth mask.

Docs

The docs haven't been updated yet; I need to check the validity of the PR with the maintainers first.

CLAassistant · 2024-12-16T17:56:40Z

All committers have signed the CLA.

SkalskiP · 2024-12-17T16:25:26Z

Hi @0xD4rky 👋🏻 thanks a lot for your interest in our library. It's true that the YOLO format requires normalization of box coordinates and masks, and loading and re-saving the dataset can lead to distortions, and we would like to minimize the level of these distortions.

However, before we decide to introduce any changes to supervision datasets, I need to see that your proposed solution actually minimizes the distortions. The test you attached only shows that the masks processed in two different ways are different. However, there is no reference point to the source polygon. That is, we don't know if and by how much the output polygon differs from the input one.

I would like to see a test where we have the source .txt file with annotations. This file is loaded and then saved back to disk. We can then compare the level of distortion.

0xD4rky · 2024-12-17T18:03:05Z

Thanks @SkalskiP for pointing out the need to verify the change; I forgot to add the verification. I created a sample label file to see how the polygon's coordinates shifted before the change and how the change handles the polygon rounding.

Below is the code I used to analyze the changes in the polygon's coordinates.

import os

import cv2
import numpy as np
import supervision as sv

# Build a minimal YOLO dataset on disk: one blank image and one polygon label.
test_dir = "test_annotation"
os.makedirs(test_dir, exist_ok=True)
images_dir = os.path.join(test_dir, "images")
labels_dir = os.path.join(test_dir, "labels")
os.makedirs(images_dir, exist_ok=True)
os.makedirs(labels_dir, exist_ok=True)

data_yaml_path = os.path.join(test_dir, "data.yaml")
with open(data_yaml_path, "w") as f:
    f.write("train: ./\nval: ./\nnames: ['class0']\n")

image_name = "example.jpg"
image_path = os.path.join(images_dir, image_name)
dummy_img = np.zeros((480, 640, 3), dtype=np.uint8)  # 640x480 black image
cv2.imwrite(image_path, dummy_img)

original_polygon = [
    "0 0.25 0.4 0.25 0.6 0.45 0.6 0.45 0.4\n"
]

label_path = os.path.join(labels_dir, "example.txt")
with open(label_path, "w") as f:
    f.writelines(original_polygon)

ds = sv.DetectionDataset.from_yolo(
    images_directory_path=images_dir,
    annotations_directory_path=labels_dir,
    data_yaml_path=data_yaml_path,
    force_masks=True
)

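# Save back into the same labels directory (this overwrites example.txt),
# then read the round-tripped annotation.
ds.as_yolo(annotations_directory_path=labels_dir)
with open(label_path, "r") as f:
    processed_lines = f.readlines()
processed_polygon_line = processed_lines[0].strip()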

def parse_yolo_polygon(line):
    vals = line.split()
    cls = vals[0]
    coords = list(map(float, vals[1:]))
    return cls, np.array(coords, dtype=float).reshape(-1, 2)

orig_cls, orig_coords = parse_yolo_polygon(original_polygon[0])
proc_cls, proc_coords = parse_yolo_polygon(processed_polygon_line)

print("Original Polygon Coordinates (Normalized):")
print(orig_coords)
print("Processed Polygon Coordinates (Normalized):")
print(proc_coords)

differences = np.linalg.norm(orig_coords - proc_coords, axis=1)
avg_difference = np.mean(differences)
max_difference = np.max(differences)

print("Average per-point difference:", avg_difference)
print("Max per-point difference:", max_difference)

We start with a known polygon in normalized YOLO coordinates. After loading and saving via supervision, we compare the polygon coordinates before and after. By computing the numeric difference, we get a quantitative measure of how much the polygon has been distorted.
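Normalized differences are easier to judge in pixel units. The sketch below is one possible way to do the conversion; the helper name `distortion_in_pixels` is hypothetical, and the `processed` values are made up for illustration, not output from supervision. It also assumes the round trip returns the same number of vertices in the same order, which may not hold when polygons are rebuilt from masks.

```python
import numpy as np


def distortion_in_pixels(original, processed, resolution_wh):
    """Per-vertex Euclidean distance between two normalized polygons, in pixels."""
    original = np.asarray(original, dtype=np.float64)
    processed = np.asarray(processed, dtype=np.float64)
    if original.shape != processed.shape:
        raise ValueError("polygons must have the same number of vertices")
    diff = (processed - original) * np.array(resolution_wh, dtype=np.float64)
    return np.linalg.norm(diff, axis=1)


original = [[0.25, 0.4], [0.25, 0.6], [0.45, 0.6], [0.45, 0.4]]
# Illustrative values only: right edge shifted left by one pixel (1 / 640).
processed = [[0.25, 0.4], [0.25, 0.6], [0.4484375, 0.6], [0.4484375, 0.4]]

errors = distortion_in_pixels(original, processed, resolution_wh=(640, 480))
print("per-vertex error (px):", errors)
print("max error (px):", errors.max())
```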

The results before the changes are as follows:

The results after the changes are as follows:

With the changes applied, the processed polygon coordinates stay close to the original coordinates.

One extra point: I will make one more change in the _polygons_to_masks function, namely mask = mask[None, ...], so that the mask has shape (1, H, W) instead of (H, W).
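As a quick illustration of that indexing, here is a NumPy-only sketch. It assumes masks are meant to follow an (N, H, W) layout; the array contents are arbitrary.

```python
import numpy as np

# A single boolean mask with shape (H, W).
mask = np.zeros((480, 640), dtype=bool)
mask[192:289, 160:289] = True

# Indexing with None adds a leading axis, giving shape (1, H, W).
batched = mask[None, ...]
print(mask.shape, "->", batched.shape)

# Several (H, W) masks stacked along axis 0 give the general (N, H, W) layout.
stacked = np.stack([mask, ~mask], axis=0)
print(stacked.shape)
```

`mask[None, ...]` returns a view, so no pixel data is copied.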

0xD4rky · 2024-12-23T05:26:55Z

Hey @SkalskiP 👋, sorry to ping you, but could you review the changes?

0xD4rky added 2 commits December 16, 2024 23:16

changing polygon format conversion

4e669b6

reformating polygon type conversion

fd4e84e

0xD4rky requested review from SkalskiP and onuralpszr as code owners December 16, 2024 17:56

fix(pre_commit): 🎨 auto format pre-commit hooks

b593c1e

0xD4rky changed the title ~~Resolving Issue #368 ["force_mask = True"]~~ fixed issue in precision converting annotations with "force_mask=True" Dec 17, 2024

0xD4rky mentioned this pull request Dec 17, 2024

Possible issue in precision converting annotations with "force_mask=True" #369

Open

2 tasks

0xD4rky added 3 commits December 17, 2024 23:40

handling mask dimension

45e0259

handling mask dimension

d93e2e8

handling mask dimensions

373b661
