diff --git a/README.md b/README.md
index 0e6b85e..f476e16 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@ Light-weight and Efficient Networks for Mobile Vision Applications
 ## :rocket: News
-* Training and evaluation code along with pre-trained models will be released soon. Stay tuned!
+* Training and evaluation code, along with pre-trained models, is now released.
@@ -10,5 +10,68 @@ Light-weight and Efficient Networks for Mobile Vision Applications
 > **Abstract:** *Designing lightweight general purpose networks for edge devices is a challenging task due to the compute constraints. In this domain, CNN-based light-weight architectures are considered the de-facto choice due to their efficiency in terms of parameters and complexity. However, they are based on spatially local operations and exhibit a limited receptive field. While vision transformers alleviate these issues and can learn global representations, they are typically compute intensive and difficult to optimize. Here, we investigate how to effectively encode both local and global information, while being efficient in terms of both parameters and MAdds on vision tasks. To this end, we propose EdgeNeXt, a hybrid CNN-Transformer architecture that strives to jointly optimize parameters and MAdds for efficient inference on edge devices. Within our EdgeNeXt, we introduce split depthwise transpose attention (SDTA) encoder that splits input tensors into multiple channel groups and utilizes depthwise convolution along with self-attention across channel dimensions to implicitly increase the receptive field and encode multi-scale features. Our extensive experiments on classification, detection and segmentation settings, reveal the merits of the proposed approach, outperforming state-of-the-art methods with comparatively lower compute requirements. Our EdgeNeXt model with 1.3M parameters achieves 71.2\% top-1 accuracy on ImageNet-1K, outperforming MobileViT with an absolute gain of 2.2\% with similar parameters and 28\% reduction in MAdds. Further, our EdgeNeXt model with 5.6M parameters achieves 79.4\% top-1 accuracy on ImageNet-1K.*
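+
+For intuition about the SDTA encoder described above: applying self-attention across channel dimensions means the attention map is roughly C × C rather than N × N, so its cost grows with the number of channels instead of the number of spatial tokens. The snippet below is only an illustrative sketch of such channel-wise ("transposed") attention, not the implementation in this repository; the channel-group split and depthwise convolutions are omitted, and all names are placeholders.
+```python
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+class ChannelAttention(nn.Module):
+    """Self-attention over channels: each head builds a (C/heads x C/heads)
+    attention map instead of an (N x N) one, so cost scales with channels,
+    not with the number of spatial tokens."""
+    def __init__(self, dim, num_heads=4):
+        super().__init__()
+        self.num_heads = num_heads
+        self.qkv = nn.Linear(dim, dim * 3, bias=False)
+        self.proj = nn.Linear(dim, dim)
+
+    def forward(self, x):                                    # x: (B, N, C) tokens
+        B, N, C = x.shape
+        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
+        q, k, v = qkv.permute(2, 0, 3, 4, 1).unbind(0)       # each: (B, heads, C/heads, N)
+        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
+        attn = (q @ k.transpose(-2, -1)).softmax(dim=-1)     # (B, heads, C/heads, C/heads)
+        out = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, C)
+        return self.proj(out)
+
+tokens = torch.randn(2, 196, 64)                             # 14x14 spatial positions, 64 channels
+print(ChannelAttention(dim=64)(tokens).shape)                # torch.Size([2, 196, 64])
+```
+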
-## Comparison with SOTA ViTs and Hybrid Designs
-![main figure](images/Figure_1.png)
\ No newline at end of file
+## Comparison with SOTA ViTs and Hybrid Architectures
+![results](images/Figure_1.png)
+
+## Comparison with Previous SOTA [MobileViT (ICLR-2022)](https://arxiv.org/abs/2110.02178)
+![results](images/table_2.png)
+
+## Installation
+1. Create conda environment
+```shell
+conda create --name edgenext python=3.8
+conda activate edgenext
+```
+2. Install PyTorch and torchvision
+```shell
+pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113
+```
+3. Install other dependencies
+```shell
+pip install -r requirements.txt
+```
+
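+The PyTorch wheel index in step 2 targets CUDA 11.3; adjust it to match your local CUDA setup. A quick sanity check that the install sees the GPU (plain PyTorch/torchvision calls, nothing specific to this repository):
+```python
+import torch
+import torchvision
+
+# Print the installed versions and confirm that the CUDA build of PyTorch detects a GPU.
+print("torch:", torch.__version__, "| torchvision:", torchvision.__version__)
+print("CUDA available:", torch.cuda.is_available())
+if torch.cuda.is_available():
+    print("device:", torch.cuda.get_device_name(0))
+```
+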
+## Dataset Preparation
+Download the [ImageNet-1K](http://image-net.org/) classification dataset and structure the data as follows:
+```
+/path/to/imagenet-1k/
+  train/
+    class1/
+      img1.jpeg
+    class2/
+      img2.jpeg
+  val/
+    class1/
+      img3.jpeg
+    class2/
+      img4.jpeg
+```
+
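+This class-per-subfolder layout is the standard `torchvision.datasets.ImageFolder` format. As a rough illustration of how such a tree is consumed (the path and transforms below are placeholders; the actual pipeline in `main.py` may build its loaders differently):
+```python
+from torch.utils.data import DataLoader
+from torchvision import datasets, transforms
+
+# Standard ImageNet-style evaluation preprocessing (illustrative only).
+eval_tf = transforms.Compose([
+    transforms.Resize(256),
+    transforms.CenterCrop(224),
+    transforms.ToTensor(),
+])
+
+# ImageFolder maps each class sub-directory (class1/, class2/, ...) to an integer label.
+val_set = datasets.ImageFolder("/path/to/imagenet-1k/val", transform=eval_tf)
+val_loader = DataLoader(val_set, batch_size=16, shuffle=False, num_workers=4)
+print(len(val_set), "images across", len(val_set.classes), "classes")
+```
+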
+## Evaluation
+Download the pretrained weights and run the following command for evaluation on the ImageNet-1K dataset.
+
+```shell
+python main.py --model edgenext_small --eval True --batch_size 16 --data_path <path/to/imagenet-1k> --output_dir <path/to/results> --resume <path/to/checkpoint>
+```
+
+## Training
+
+On a single machine with 8 GPUs, run the following command to train the EdgeNeXt-S model.
+
+```shell
+python -m torch.distributed.launch --nproc_per_node=8 main.py \
+--model edgenext_small --drop_path 0.1 \
+--batch_size 256 --lr 6e-3 --update_freq 2 \
+--model_ema true --model_ema_eval true \
+--data_path <path/to/imagenet-1k> \
+--output_dir <path/to/results> \
+--use_amp True --multi_scale_sampler
+```
diff --git a/images/Figure_1.png b/images/Figure_1.png
index 72ab204..a51944f 100644
Binary files a/images/Figure_1.png and b/images/Figure_1.png differ
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..f877b8f
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,4 @@
+timm==0.4.12
+tensorboardX==2.2
+six==1.16.0
+fvcore==0.1.5.post20220414
\ No newline at end of file
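
The new `requirements.txt` pins `fvcore`, which is commonly used for the kind of parameter and MAdd accounting quoted in the abstract. Below is a minimal sketch of that accounting with `fvcore.nn.FlopCountAnalysis`; the torchvision model is only a stand-in, since an actual EdgeNeXt model would be built from this repository's own model definitions.
```python
import torch
from fvcore.nn import FlopCountAnalysis, parameter_count
from torchvision.models import mobilenet_v2

# Stand-in network; swap in an EdgeNeXt model constructed from this repository.
model = mobilenet_v2().eval()
dummy = torch.randn(1, 3, 224, 224)   # a single 224x224 RGB image

flops = FlopCountAnalysis(model, dummy)
print(f"MAdds:  {flops.total() / 1e6:.1f} M")               # fvcore counts one multiply-add as one op
print(f"Params: {parameter_count(model)[''] / 1e6:.2f} M")  # the '' key holds the total parameter count
```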