A selection of the best models is available for download from my Google Drive. After downloading, simply place the pre-trained model directories in either the `vision/experiments` or `captioning/experiments` directory.
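As a rough sketch, the resulting layout should look something like the following (the experiment IDs are the ones referenced below; which IDs you create depends on which models you download):

```python
import os

# Hypothetical layout after downloading: each pre-trained model directory is
# named after its experiment ID and placed under the matching experiments dir.
for path in [
    "vision/experiments/0006",       # framewise DenseNet-121
    "vision/experiments/0010",       # two-stream
    "captioning/experiments/0102",   # CNN-RNN captioning
]:
    os.makedirs(path, exist_ok=True)
```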
A summary of the models and their results is given below.
The first model (ID `0006`), the basis for many other experiments, is a framewise DenseNet-121 architecture. It can be evaluated with:

```shell
python evaluate.py --model_id 0006 --backbone DenseNet121
```
---
The two-stream model (ID `0010`) utilises two DenseNet-121 CNNs, one for optical flow and one for RGB. It can be evaluated with:

```shell
python evaluate.py --model_id 0010 --backbone DenseNet121 --flow twos
```
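The README does not say how the two streams are combined; a common choice (an assumption here, not necessarily what `evaluate.py` does) is late fusion, i.e. a weighted average of the per-class scores from the RGB and flow networks:

```python
# Toy late-fusion sketch: average per-class scores from the RGB and flow
# streams (an illustrative assumption, not necessarily this repo's scheme).
def late_fusion(rgb_scores, flow_scores, w_rgb=0.5):
    """Weighted average of two aligned per-class score lists."""
    assert len(rgb_scores) == len(flow_scores)
    return [w_rgb * r + (1 - w_rgb) * f
            for r, f in zip(rgb_scores, flow_scores)]

fused = late_fusion([0.7, 0.2, 0.1], [0.5, 0.4, 0.1])
# class 0 has the highest fused score
```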
---
The 3D CNN (ID `0031`) utilises an R(2+1)D architecture and can be evaluated with:

```shell
python evaluate.py --model_id 0031 --backbone rdnet --window 8 --data_shape 224
```

The CNN is fine-tuned from pre-training on Kinetics and only uses 224 × 224 input images due to memory constraints.
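For context, R(2+1)D factorises each t×d×d 3D convolution into a d×d spatial convolution followed by a t×1×1 temporal one, with the intermediate channel count chosen so the factorised block matches the 3D parameter budget. The arithmetic below is illustrative (the channel/kernel numbers are examples, not this model's actual configuration):

```python
# Parameter-count arithmetic for the R(2+1)D factorisation (illustrative).
def conv3d_params(c_in, c_out, t, d):
    # A full t x d x d 3D convolution.
    return c_in * c_out * t * d * d

def r2plus1d_params(c_in, c_out, t, d, c_mid):
    # d x d spatial conv into c_mid channels, then a t x 1 x 1 temporal conv.
    return c_in * c_mid * d * d + c_mid * c_out * t

def matched_mid(c_in, c_out, t, d):
    # Intermediate channels that keep the factorised parameter count
    # comparable to the 3D convolution's.
    return (t * d * d * c_in * c_out) // (d * d * c_in + t * c_out)
```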
---
The temporal pooling model (ID `0028`) builds on the pretrained framewise DenseNet-121 architecture (`0006`) and applies temporal mean pooling. It can be evaluated with:

```shell
python evaluate.py --model_id 0028 --backbone DenseNet121 --temp_pool mean --window 15 --backbone_from_id 0006 --feats_model 0006
```

By specifying `--feats_model 0006`, the model expects to read pre-extracted features from `\data\features\$model_id$\`. These features can be extracted by running something like:

```shell
python evaluate.py --model_id 0006 --backbone DenseNet121 --save_feats
```
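Conceptually, temporal mean pooling collapses a window of per-frame feature vectors into a single clip-level feature. A minimal pure-Python sketch of that step (the real pipeline operates on the saved DenseNet-121 features, not toy lists):

```python
def temporal_mean_pool(frame_feats):
    """Average a window of per-frame feature vectors into one clip feature.

    frame_feats: list of equal-length feature vectors (lists of floats),
    standing in for the features saved with --save_feats for one window.
    """
    n = len(frame_feats)
    dim = len(frame_feats[0])
    return [sum(f[i] for f in frame_feats) / n for i in range(dim)]
```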
---
The CNN-RNN model (ID `0042`) builds on the pretrained framewise DenseNet-121 architecture (`0006`). It can be evaluated with:

```shell
python evaluate.py --model_id 0042 --backbone DenseNet121 --temp_pool gru --window 30 --backbone_from_id 0006 --feats_model 0006 --freeze_backbone
```
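With `--temp_pool gru`, the per-frame features are presumably run through a GRU and the final hidden state summarises the window (an assumption about the implementation). A minimal scalar sketch of the standard GRU recurrence, with toy weights:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, w):
    # Standard GRU recurrence on scalars; w holds toy (untrained) weights.
    z = sigmoid(w["wz"] * x + w["uz"] * h)                 # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h)                 # reset gate
    h_tilde = math.tanh(w["wh"] * x + w["uh"] * (r * h))   # candidate state
    return (1 - z) * h + z * h_tilde

def gru_pool(xs, w):
    """Run the GRU over a window and return the final hidden state."""
    h = 0.0
    for x in xs:
        h = gru_step(x, h, w)
    return h

w = {"wz": 1.0, "uz": 1.0, "wr": 1.0, "ur": 1.0, "wh": 1.0, "uh": 1.0}
```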
The CNN-RNN captioning model (ID `0102`) utilises the pretrained framewise DenseNet-121 architecture (`0006`) and expects the features to be pre-extracted (see Temporal Pooling above). It can be evaluated with:

```shell
python evaluate_gnmt.py --model_id 0102 --num_hidden 256 --backbone_from_id 0006 --feats_model 0006
```

NOTE: The captioning scripts require the `nlg-eval` package. Please install it beforehand, as recommended by its README.
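For intuition, caption generation at evaluation time can be as simple as a greedy decoding loop: pick the highest-scoring word at each step until an end-of-sentence token. The sketch below uses hypothetical per-step score dictionaries in place of the decoder's real output (the actual model may decode differently, e.g. with beam search):

```python
def greedy_decode(step_scores, eos="<eos>"):
    """Pick the highest-scoring word at each step until <eos>.

    step_scores: list of {word: score} dicts, one per decoding step
    (a hypothetical stand-in for the decoder's per-step output).
    """
    words = []
    for scores in step_scores:
        word = max(scores, key=scores.get)
        if word == eos:
            break
        words.append(word)
    return " ".join(words)
```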