Hyperparameter Sweep and NaNs #16

Open
JoakimHaurum opened this issue Dec 20, 2024 · 2 comments

@JoakimHaurum

I'm working on reproducing your results for EViT, ATS, DynamicViT, etc. However, I often run into NaNs about 1/3 to 1/2 of the way through training. It doesn't matter whether I preserve the prior features or scatter onto a zero matrix. I use the config from SViT with no adjustments to the optimizer.

Did you observe similar behavior, and what hyperparameters did you use to train the different models: just one fixed set (i.e. lr = 1e-5), or did you do a sweep per method?
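
For reference, a minimal sketch of the two variants mentioned above (the function and tensor names are illustrative, not taken from any of the repos):

import torch

def restore_token_map(x_prev, x_kept, keep_idx, keep_prior=True):
    # Re-expand a pruned token sequence back to its full length, either keeping
    # the pre-pruning features at dropped positions (keep_prior=True) or filling
    # those positions with zeros (keep_prior=False).
    #   x_prev:   (B, N, C) features before pruning
    #   x_kept:   (B, K, C) features of the kept tokens
    #   keep_idx: (B, K) indices of the kept tokens in the original sequence
    base = x_prev.clone() if keep_prior else torch.zeros_like(x_prev)
    idx = keep_idx.unsqueeze(-1).expand(-1, -1, x_kept.size(-1))
    return base.scatter(1, idx, x_kept)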

@kaikai23 commented Jan 2, 2025

Hi Joakim,
Thank you for your interest in reproducing our results! We did not encounter NaN issues during training. Below are some details that might help you debug the problem:

  1. We did not perform hyperparameter sweeps. For all experiments, we used a fixed learning rate (lr = 1e-5) and other default settings without adjustments.
  2. Here’s an example configuration we used for fine-tuning EViT:
# Copyright (c) Shanghai AI Lab. All rights reserved.
_base_ = [
    '../_base_/models/mask_rcnn_r50_fpn.py',
    '../_base_/datasets/coco_instance.py',
    '../_base_/schedules/schedule_0.5x.py',
    '../_base_/default_runtime.py'
]
# pretrained = 'https://dl.fbaipublicfiles.com/deit/deit_tiny_patch16_224-a1311bcf.pth'
# pretrained = 'pretrained/deit_tiny_patch16_224-a1311bcf.pth'
model = dict(
    backbone=dict(
        _delete_=True,
        type='EViTAdapter',
        patch_size=16,
        embed_dim=192,
        depth=12,
        num_heads=3,
        mlp_ratio=4,
        drop_path_rate=0.1,
        layer_scale=False,
        conv_inplane=64,
        n_points=4,
        deform_num_heads=6,
        cffn_ratio=0.25,
        deform_ratio=1.0,
        interaction_indexes=[[0, 2], [3, 5], [6, 8], [9, 11]],
        window_attn=[False] * 12,
        window_size=[None] * 12,
        pretrained=None,
        keep_rate=[1, 1, 1, 0.7, 1, 1, 0.7, 1, 1, 0.7, 1, 1],
        fuse_token=False
    ),
    neck=dict(
        type='FPN',
        in_channels=[192, 192, 192, 192],
        out_channels=256,
        num_outs=5))
# optimizer
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
# augmentation strategy originates from DETR / Sparse RCNN
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='AutoAugment',
         policies=[
             [
                 dict(type='Resize',
                      img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                                 (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                                 (736, 1333), (768, 1333), (800, 1333)],
                      multiscale_mode='value',
                      keep_ratio=True)
             ],
             [
                 dict(type='Resize',
                      img_scale=[(400, 1333), (500, 1333), (600, 1333)],
                      multiscale_mode='value',
                      keep_ratio=True),
                 dict(type='RandomCrop',
                      crop_type='absolute_range',
                      crop_size=(384, 600),
                      allow_negative_crop=True),
                 dict(type='Resize',
                      img_scale=[(480, 1333), (512, 1333), (544, 1333),
                                 (576, 1333), (608, 1333), (640, 1333),
                                 (672, 1333), (704, 1333), (736, 1333),
                                 (768, 1333), (800, 1333)],
                      multiscale_mode='value',
                      override=True,
                      keep_ratio=True)
             ]
         ]),
    dict(type='RandomCrop',
         crop_type='absolute_range',
         crop_size=(1024, 1024),
         allow_negative_crop=True),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
data = dict(samples_per_gpu=4,
            workers_per_gpu=2, #####
            train=dict(pipeline=train_pipeline))
optimizer = dict(
    _delete_=True, type='AdamW', lr=0.00001, weight_decay=0.000001,
    paramwise_cfg=dict(
        custom_keys={
            'level_embed': dict(decay_mult=0.),
            'pos_embed': dict(decay_mult=0.),
            'norm': dict(decay_mult=0.),
            'bias': dict(decay_mult=0.)
        }))
optimizer_config = dict(grad_clip=None)
fp16 = dict(loss_scale=dict(init_scale=512))
checkpoint_config = dict(
    interval=1,
    max_keep_ckpts=3,
    save_last=True,
)

# work_dir = '/data/storage/yifei/output/work_dir/debug'
work_dir = '/net/cephfs/shares/rpg.ifi.uzh/yifei/output/work_dir/mask_rcnn_evit_adapter_tiny_fpn_0.5x_coco'
exp_name = 'det-evit-tiny-0.5x'

# load_from = '/data/storage/yifei/output/work_dir/mask_rcnn_deit_adapter_tiny_fpn_3x_coco/latest.pth'
load_from = '/net/cephfs/shares/rpg.ifi.uzh/yifei/output/work_dir/mask_rcnn_deit_adapter_tiny_fpn_3x_coco/latest.pth'
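
If NaNs still appear with an otherwise identical setup, two standard mmcv/mmdetection knobs worth experimenting with are gradient clipping and dynamic loss scaling. The snippet below only illustrates those options; it is not part of the config above:

# Illustrative only -- not used in the configuration above.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
fp16 = dict(loss_scale='dynamic')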

Feel free to reach out with more details about your training setup if the issue persists!

Best regards,
Yifei

@JoakimHaurum

Thank you for the insights Yifei!

Comparing my config with yours, they are pretty much identical.
Could you share your EViTAdapter implementation? I assume you build on the original codebase (https://github.com/youweiliang/evit/blob/master/evit.py), and I think the major differences might just be in how the Adapter is set up.

Best
Joakim
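
For context on the keep_rate and fuse_token settings in the config above: EViT ranks image tokens by the [CLS] attention, keeps the top fraction, and can fuse the rest into one extra token. A simplified sketch (not the actual EViTAdapter code):

import torch

def evit_select_tokens(x, cls_attn, keep_rate, fuse_token=False):
    # Simplified EViT-style token pruning (illustrative only).
    #   x:        (B, N, C) tokens, with x[:, 0] being the [CLS] token
    #   cls_attn: (B, N-1)  attention of [CLS] to each image token, averaged over heads
    B, N, C = x.shape
    k = int((N - 1) * keep_rate)
    idx = cls_attn.topk(k, dim=1).indices                      # indices of kept tokens
    kept = x[:, 1:].gather(1, idx.unsqueeze(-1).expand(-1, -1, C))
    if fuse_token:
        # Fuse the pruned tokens into a single extra token, weighted by attention.
        mask = torch.ones_like(cls_attn, dtype=torch.bool).scatter(1, idx, False)
        w = (cls_attn * mask).unsqueeze(-1)
        fused = (x[:, 1:] * w).sum(1, keepdim=True) / w.sum(1, keepdim=True).clamp_min(1e-6)
        return torch.cat([x[:, :1], kept, fused], dim=1)
    return torch.cat([x[:, :1], kept], dim=1)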
