Haiyu Wu, Kevin W. Bowyer, What Should Be Balanced in a "Balanced" Face Recognition Dataset?, BMVC, 2023, arXiv:2304.09818.
The issue of demographic disparities in face recognition accuracy has attracted increasing attention in recent years. Various face image datasets have been proposed as 'fair' or 'balanced' to assess the accuracy of face recognition algorithms across demographics. These datasets typically balance the number of identities and images across demographics. However, the number of identities and images in an evaluation dataset is not a driving factor for 1-to-1 face matching accuracy. Moreover, balancing the number of identities and images does not ensure balance in other factors known to impact accuracy, such as head pose, brightness, and image quality. We demonstrate these issues using several recently proposed datasets. To enable less biased evaluations, we propose a bias-aware toolkit that facilitates the creation of cross-demographic evaluation datasets balanced on the factors discussed in this paper.
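The argument that equal identity and image counts do not imply balance on other factors can be checked directly. The minimal sketch below compares per-group means of a single image factor. The record layout (dicts with `group` and `brightness` keys), the helper name `factor_gap`, and the toy values are hypothetical illustrations. They are not part of the authors' toolkit or data.

```python
from collections import defaultdict
from statistics import mean


def factor_gap(records, group_key, factor_key):
    """Return per-group means of `factor_key` and the largest gap between groups.

    Hypothetical helper for illustration; not the authors' bias-aware toolkit.
    """
    by_group = defaultdict(list)
    for r in records:
        by_group[r[group_key]].append(r[factor_key])
    means = {g: mean(v) for g, v in by_group.items()}
    return means, max(means.values()) - min(means.values())


# Toy records: both groups have the same number of images,
# but their brightness distributions differ.
toy = [
    {"group": "A", "brightness": 100},
    {"group": "A", "brightness": 120},
    {"group": "B", "brightness": 60},
    {"group": "B", "brightness": 80},
]
means, gap = factor_gap(toy, "group", "brightness")
print(means, gap)
```

In this toy case the counts are identical (two images per group), but the mean brightness differs by 40. Count-based balancing would not reveal that gap.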
If you use any part of our code or dataset, please cite our paper and the VGGFace2 paper.
@article{wu2023real,
title={A Real Balanced Dataset For Understanding Bias? Factors That Impact Accuracy, Not Numbers of Identities and Images},
author={Wu, Haiyu and Bowyer, Kevin W},
journal={arXiv preprint arXiv:2304.09818},
year={2023}
}
@inproceedings{cao2018vggface2,
title={Vggface2: A dataset for recognising faces across pose and age},
author={Cao, Qiong and Shen, Li and Xie, Weidi and Parkhi, Omkar M and Zisserman, Andrew},
booktitle={2018 13th IEEE international conference on automatic face \& gesture recognition (FG 2018)},
pages={67--74},
year={2018},
organization={IEEE}
}
The dataset is available at BA-test. Extract the images with:
python3 get_images.py -zx .zx/file/path
You need the original version of the VGGFace2 dataset first. Then use file_path_extractor.py to collect the image paths:
python3 file_path_extractor.py -s folder/path/of/vggface2 -d ./ -sfn vggface2 -end_with jpg
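The path-extraction step walks the VGGFace2 folder and writes every `.jpg` path to a text file (here `./vggface2.txt`, matching `-d ./ -sfn vggface2`). The sketch below shows that step under stated assumptions. The function `extract_paths` is a hypothetical re-implementation for illustration, not the repository's `file_path_extractor.py`. For example, the real script may write relative rather than absolute paths.

```python
import os
from pathlib import Path


def extract_paths(src_dir: str, dest_dir: str, save_name: str, end_with: str) -> Path:
    """Write every file path under src_dir ending with `end_with` to dest_dir/save_name.txt.

    Hypothetical sketch; the repository's file_path_extractor.py may differ.
    """
    paths = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            if name.endswith(end_with):
                paths.append(os.path.join(root, name))
    paths.sort()  # deterministic order, independent of filesystem listing order
    out_file = Path(dest_dir) / f"{save_name}.txt"
    out_file.write_text("\n".join(paths) + ("\n" if paths else ""))
    return out_file


# Usage mirroring the command above (assumed mapping of the -s, -d, -sfn and -end_with flags):
# extract_paths("folder/path/of/vggface2", "./", "vggface2", "jpg")
```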
Next, run image_collection.py to collect the BA-test images and store their paths in the "BA-test.txt" file:
python3 image_collection.py -im_path vggface2.txt -label ./BA-test/labels.csv
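Conceptually, this step keeps only the paths in `vggface2.txt` that appear in the BA-test label file. The minimal sketch below shows one way to do that. It assumes `labels.csv` has a column (hypothetically named `path`) holding paths of the form `identity_folder/file.jpg`. The actual column names and matching rule in `image_collection.py` may differ, and `collect_images` is a hypothetical name.

```python
import csv
from pathlib import Path


def collect_images(im_path_file: str, label_csv: str, out_file: str, path_column: str = "path") -> list[str]:
    """Keep paths from im_path_file whose 'identity/file' suffix is listed in label_csv.

    Hypothetical sketch: assumes label_csv has a `path_column` with
    dataset-relative paths such as 'n000001/0001_01.jpg'.
    """
    with open(label_csv, newline="") as f:
        wanted = {row[path_column].strip().replace("\\", "/") for row in csv.DictReader(f)}
    collected = []
    for line in Path(im_path_file).read_text().splitlines():
        full = line.strip()
        if not full:
            continue
        rel = "/".join(Path(full).parts[-2:])  # identity folder + file name
        if rel in wanted:
            collected.append(full)
    Path(out_file).write_text("\n".join(collected) + ("\n" if collected else ""))
    return collected
```

Matching on the last two path components (identity folder and file name) keeps the lookup independent of where VGGFace2 is stored locally.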
The last step is to use run_face_alignment.py from the img2pose repo to crop and align the images.
Once you have the BA-test dataset, run image_collection.py to collect the benchmark images:
python3 image_collection.py -im_path BA-test.txt -label ./benchmark/benchmark_labels.csv -dest ./benchmark/images -n BA-test_benchmark
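The benchmark command also takes a destination folder (`-dest`) and a name (`-n`). The sketch below shows one plausible way to copy the selected images there while keeping each identity sub-folder, and to write a list file. The helper `copy_to_dest` and the `<name>.txt` list-file layout are hypothetical assumptions, not the repository's documented behavior.

```python
import shutil
from pathlib import Path


def copy_to_dest(image_paths, dest_dir, list_name):
    """Copy images into dest_dir/<identity>/<file> and write dest_dir/<list_name>.txt.

    Hypothetical sketch of the copy step; the real image_collection.py may
    use a different folder layout or list-file location.
    """
    dest = Path(dest_dir)
    copied = []
    for src in map(Path, image_paths):
        target = dest / src.parent.name / src.name  # keep the identity sub-folder
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also preserves file metadata
        copied.append(str(target))
    (dest / f"{list_name}.txt").write_text("\n".join(copied) + ("\n" if copied else ""))
    return copied
```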