Added max_files parameter to `extract_features_and_train` #312

KobaKhit · 2020-09-09T19:55:57Z

Currently, `extract_features_and_train` needs a list of folder paths. It would be useful to be able to cap how many files are read per folder, so I added a `max_files` parameter with a default of 1000. Choosing those files at random could be a further addition.

I tested it in a Kaggle notebook and it worked fine.

The motivation was a Birdcall Kaggle competition with 264 classes (folders) and ~100 files per class (folder). Training a model took longer than 9 hours and the Kaggle notebook timed out, so I decided to train on a smaller number of files per folder, i.e. to undersample the classes.

import os

from pyAudioAnalysis import audioTrainTest as aT

# train classifier 
train_folder = '../input/birdsong-recognition/train_audio/'
mid_term_window_length = 2
mid_term_window_step = 1

# get audio file folders
class_paths = [train_folder + x for x in sorted(os.listdir(train_folder))]

model_name = "../input/bird-call-classification/svmSMtemp"
model_type = 'svm'

if not os.path.exists(model_name):
    model_name = "svmSMtemp"
    # train classifier using folders of audio files
    aT.extract_features_and_train(class_paths, 
                                  mid_term_window_length, 
                                  mid_term_window_step, 
                                  aT.shortTermWindow, 
                                  aT.shortTermStep, 
                                  model_type, 
                                  model_name, 
                                  False,
                                  max_files = 5)

Analyzing file 1 of 5: ../input/birdsong-recognition/train_audio/aldfly/XC134874.mp3
Analyzing file 2 of 5: ../input/birdsong-recognition/train_audio/aldfly/XC135454.mp3
Analyzing file 3 of 5: ../input/birdsong-recognition/train_audio/aldfly/XC135455.mp3
Analyzing file 4 of 5: ../input/birdsong-recognition/train_audio/aldfly/XC135456.mp3
Analyzing file 5 of 5: ../input/birdsong-recognition/train_audio/aldfly/XC135457.mp3
Feature extraction complexity ratio: 28.2 x realtime
Analyzing file 1 of 5: ../input/birdsong-recognition/train_audio/ameavo/XC133080.mp3
Analyzing file 2 of 5: ../input/birdsong-recognition/train_audio/ameavo/XC139829.mp3
Analyzing file 3 of 5: ../input/birdsong-recognition/train_audio/ameavo/XC139921.mp3
Analyzing file 4 of 5: ../input/birdsong-recognition/train_audio/ameavo/XC155039.mp3
Analyzing file 5 of 5: ../input/birdsong-recognition/train_audio/ameavo/XC166076.mp3
Feature extraction complexity ratio: 27.5 x realtime

tyiannak · 2020-11-19T20:21:16Z

Thanks for the PR, @KobaKhit.
It would be nice if:
(a) the default value were -1, indicating that no file limit is applied during feature extraction
(b) random shuffling were also parametrized (and not enabled by default, since in many cases we need feature extraction to follow the file path order)
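
The two suggestions above could be sketched roughly as follows. This is only an illustration, not the code in this PR or in pyAudioAnalysis: the helper name `select_files` and its `shuffle` and `seed` parameters are hypothetical. It assumes that `max_files=-1` means "no limit" and that files are taken in sorted path order unless shuffling is explicitly requested.

```python
import os
import random


def select_files(folder, max_files=-1, shuffle=False, seed=None):
    """Return at most max_files file paths from folder.

    Hypothetical helper for illustration; not part of pyAudioAnalysis.
    max_files=-1 (or any negative value) means no limit is applied.
    """
    # Sort so the default order is deterministic and follows the file path order.
    paths = sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
    if shuffle:
        # A local Random instance leaves the global random state untouched;
        # passing a seed makes the selection reproducible.
        random.Random(seed).shuffle(paths)
    if max_files >= 0:
        # Keep only the first max_files entries (of the possibly shuffled list).
        paths = paths[:max_files]
    return paths
```

With the defaults, every file is returned in path order, so existing behavior would be unchanged; something like `select_files(folder, max_files=5, shuffle=True, seed=0)` would draw a reproducible random subset of five files per class folder.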

KobaKhit added 5 commits September 9, 2020 14:55

added max files per folder parameter

2508b8e

added max_files per folder parameter

c58a4ff

random max file selection

c2ee9ea

added random choice of files in folder

4c20d4c

added random choice of files in folder

f4d7140
