Explanations of some of these scripts can be found on my weblog. Below is a quick guide to getting them running.
- Download the open bearing dataset.
- Move the `bearing_IMS` directory to the same level as the `bearing_snippets` directory, OR modify the first line of the script to point `basedir` to the `bearing_IMS/1st_test` directory.
- Run `basic_feature_extraction.R`! This writes the basic feature vectors to `b1.csv` through `b4.csv`.
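For orientation, the core of such an extraction loop might look like the sketch below. The features shown (RMS and kurtosis per channel) and the single output file are illustrative assumptions; the script itself defines the actual feature set and the per-bearing split into the four CSV files.

```r
# Minimal sketch of the extraction loop. The features shown (RMS and
# kurtosis per channel) are illustrative assumptions, not necessarily
# what basic_feature_extraction.R computes.
basedir <- "../bearing_IMS/1st_test/"            # as set on the script's first line

files    <- sort(list.files(basedir))
features <- NULL
for (f in files) {
  snapshot <- read.table(file.path(basedir, f), header = FALSE)
  rms  <- apply(snapshot, 2, function(x) sqrt(mean(x^2)))
  kurt <- apply(snapshot, 2, function(x) mean((x - mean(x))^4) / var(x)^2)
  features <- rbind(features, c(rms, kurt))
}
write.csv(features, "b1.csv", row.names = FALSE) # one file per bearing in practice
```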
- For the first time through, run `basic_feature_extraction.R` to generate the features. Thereafter, the features are written to files `b1.csv`, `b2.csv`, `b3.csv`, and `b4.csv`, and you can go straight to step 2.
- Run `basic_feature_graphing.R`!
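For a quick look at one of the stored features without running the full graphing script, something along these lines works; the column picked is only a placeholder, since the column names in `b1.csv` are set by the extraction script.

```r
# Quick look at one stored feature for bearing 1; the first column is used
# here purely as a placeholder.
b1 <- read.csv("b1.csv")
plot(b1[[1]], type = "l", xlab = "Snapshot number", ylab = names(b1)[1],
     main = "Bearing 1 feature trend")
```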
- Perform steps 1 and 2 of `basic_feature_extraction.R`.
- Run `more_features.R`! This writes the full feature vectors to `b1_all.csv` through `b4_all.csv`.
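The full feature set is defined inside `more_features.R`; purely as an illustration of the kind of frequency-domain quantity such a script might add, here is a dominant-frequency helper (the 20 kHz sampling rate is an assumption about the dataset, and this particular feature is not necessarily one the script computes).

```r
# Illustrative only: dominant vibration frequency of one snapshot column.
# The sampling rate fs is an assumed value, not taken from the script.
dominant_freq <- function(x, fs = 20000) {
  n    <- length(x)
  spec <- Mod(fft(x))[1:(n %/% 2)]
  freq <- (0:(n %/% 2 - 1)) * fs / n
  freq[which.max(spec[-1]) + 1]   # skip the DC bin
}
```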
- Run `more_features.R`, so the features are stored in files `b1_all.csv` through `b4_all.csv`.
- Run `feature_correlation.R` to output the sets of features with high correlation.
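A minimal version of the correlation check looks like this; the 0.9 cutoff is an assumed threshold, not necessarily the one used by `feature_correlation.R`.

```r
# List feature pairs whose absolute Pearson correlation exceeds a cutoff.
fv   <- read.csv("b1_all.csv")
num  <- fv[, sapply(fv, is.numeric)]
cm   <- cor(num)
high <- which(abs(cm) > 0.9 & upper.tri(cm), arr.ind = TRUE)
data.frame(feature1 = rownames(cm)[high[, 1]],
           feature2 = colnames(cm)[high[, 2]],
           r        = cm[high])
```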
- Run `optimise.rb` to select the minimal set of uncorrelated features.
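`optimise.rb` is a Ruby script; for readers who prefer to stay in R, the idea (greedily dropping one feature from each highly correlated group until none remain) can be sketched as below. This is an illustration of the approach, not the script's actual algorithm.

```r
# Greedy sketch: repeatedly drop the feature involved in the most
# above-cutoff correlations until no pair exceeds the (assumed) 0.9 cutoff.
prune_correlated <- function(fv, cutoff = 0.9) {
  keep <- names(fv)[sapply(fv, is.numeric)]
  repeat {
    cm <- abs(cor(fv[, keep, drop = FALSE]))
    diag(cm) <- 0
    if (max(cm) <= cutoff) break
    keep <- setdiff(keep, names(which.max(rowSums(cm > cutoff))))
  }
  keep
}

uncorrelated <- prune_correlated(read.csv("b1_all.csv"))
```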
- Run `more_features.R` (if not already done), so the features are stored in files `b1_all.csv` through `b4_all.csv`.
- If desired, modify line 25 of `feature_information.R` to include only the features you are interested in (e.g. after running `optimise.rb` and finding a different minimal set).
- Run `feature_information.R` to generate an interesting graph! It also writes the full feature vector plus state labels to `all_bearings.csv`, and the best 14 features plus state labels to `all_bearings_best_fv.csv`.
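Editing that line amounts to listing the feature columns you want kept; as a purely hypothetical illustration (the column names and the `State` label column below are placeholders, and the real list of 14 best features is determined by the script):

```r
# Hypothetical: restrict the feature vector to a chosen subset before
# writing all_bearings_best_fv.csv. Column names here are placeholders.
all.bearings <- read.csv("all_bearings.csv")
chosen  <- c("kurtosis.x", "rms.x", "skewness.x")
best.fv <- all.bearings[, c(chosen, "State")]
write.csv(best.fv, "all_bearings_best_fv.csv", row.names = FALSE)
```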
- Run `feature_information.R`, so the minimised set of features is written to `all_bearings_best_fv.csv`.
- Run `kmeans.R` to select the best k-means model! It also writes it to `kmeans.obj`.
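A condensed version of such a sweep, using mean silhouette width to pick the model (an assumed selection criterion and k range; `kmeans.R` may choose differently):

```r
library(cluster)   # for silhouette()

fv <- read.csv("all_bearings_best_fv.csv")
x  <- scale(fv[, sapply(fv, is.numeric)])
d  <- dist(x)

# Fit k-means over a range of k and keep the model with the best mean
# silhouette width; the range and the criterion are assumptions.
ks     <- 2:10
models <- lapply(ks, function(k) kmeans(x, centers = k, nstart = 25))
sil    <- sapply(models, function(m) mean(silhouette(m$cluster, d)[, 3]))
best.kmeans <- models[[which.max(sil)]]
save(best.kmeans, file = "kmeans.obj")
```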
- Run `feature_information.R`, so the minimised set of features is written to `all_bearings_best_fv.csv`.
- Run `kmeans.R`, so the best k-means model is written to `kmeans.obj`.
- Visualise the results using the graphs generated by `kmeans.R`. Alter the filename on line 7 to match the best k-means model. If needed, alter the cluster numbers or class labels in `relabel.R` to better match the data.
- Run `relabel.R` to modify the state labels. It also plots a state transition graph, and writes the new data to `all_bearings_relabelled.csv`.
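The relabelling itself boils down to a cluster-to-state lookup of roughly this shape; the object name loaded from `kmeans.obj`, the label names, and the `State` column are all placeholders, since the real mapping lives in `relabel.R` and depends on the chosen model.

```r
# Hypothetical cluster-to-state mapping; labels and names are placeholders.
load("kmeans.obj")                                   # loads the saved model object
state.map <- c("early", "normal", "suspect", "failure")
fv <- read.csv("all_bearings_best_fv.csv")
fv$State <- state.map[best.kmeans$cluster]           # object name as saved above
write.csv(fv, "all_bearings_relabelled.csv", row.names = FALSE)
```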
- Requires features and labels in `all_bearings_relabelled.csv`, which can be generated by `relabel.R`.
- Run `training_set.R` to randomly pick 70% of the data rows as a training set. The row numbers are written to `train.rows.csv`.
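The split itself is a one-liner; a sketch (the seed is only there for reproducibility and is not taken from the script):

```r
# Pick 70% of the rows at random as the training set and store the indices.
data <- read.csv("all_bearings_relabelled.csv")
set.seed(42)                                   # assumed, for reproducibility
train.rows <- sort(sample(nrow(data), floor(0.7 * nrow(data))))
write.csv(train.rows, "train.rows.csv", row.names = FALSE)
```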
- Requires `train.rows.csv` and `all_bearings_relabelled.csv` (which can be generated by earlier scripts).
- Run `ann_mlp.R` to train and test an array of MLP ANNs with varying parameters. Parameters include:
  - Hidden neurons in the range 2 to 30 inclusive
  - Different class weightings to handle uneven counts of class labels
  - Data normalisation, scaling to the neuron range, or neither, to handle wide disparities in feature ranges
- The table of results is written to `ann.results.csv`, all trained models are written to `ann.models.obj`, and the best (highest accuracy) model is written to `best.ann.obj`.
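One cell of that parameter sweep might look like the following sketch, which assumes the `nnet` package, a `State` label column, and an inverse-frequency case weighting; `ann_mlp.R` may use a different ANN library and weighting scheme.

```r
library(nnet)   # assumed MLP implementation

data  <- read.csv("all_bearings_relabelled.csv")
rows  <- read.csv("train.rows.csv")[[1]]
train <- data[rows, ]
test  <- data[-rows, ]
train$State <- as.factor(train$State)          # "State" column name is assumed

# One (hidden size, class weighting) combination; the script sweeps
# sizes 2..30 plus several weighting and normalisation options.
w   <- as.numeric(1 / table(train$State)[train$State])   # up-weight rare classes
fit <- nnet(State ~ ., data = train, size = 10, weights = w,
            maxit = 500, trace = FALSE)
accuracy <- mean(predict(fit, test, type = "class") == test$State)
```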
- Requires `train.rows.csv` and `all_bearings_relabelled.csv` (which can be generated by earlier scripts).
- Run `rpart.R` to train and test an array of RPART decision trees. Different class weightings are applied to handle uneven counts of class labels.
- The table of results is written to `rpart.results.csv`, all trained models are written to `rpart.models.obj`, and the best (highest accuracy) model is written to `best.rpart.obj`.
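A single cell of the decision-tree sweep, assuming the `rpart` package with class priors weighted towards the rarer classes (the script's actual weighting scheme may differ):

```r
library(rpart)

data  <- read.csv("all_bearings_relabelled.csv")
rows  <- read.csv("train.rows.csv")[[1]]
train <- data[rows, ]
test  <- data[-rows, ]
train$State <- as.factor(train$State)          # "State" column name is assumed

# One class-weighting from the sweep, expressed as priors that up-weight
# the rarer classes; this is an assumption about how weights are applied.
freq  <- table(train$State) / nrow(train)
prior <- as.numeric((1 / freq) / sum(1 / freq))
fit   <- rpart(State ~ ., data = train, method = "class",
               parms = list(prior = prior))
accuracy <- mean(predict(fit, test, type = "class") == test$State)
```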
- Requires `train.rows.csv` and `all_bearings_relabelled.csv` (which can be generated by earlier scripts).
- Run `knn.R` to train and test an array of weighted k-nearest-neighbour classifiers with varying parameters. Parameters include:
  - Different kernels on the weightings (all 10 in the `kknn` library)
  - All k values from {1, 3, 5, 10, 15, 20, 35, 50}
- The table of results is written to `knn.results.csv`, all trained models are written to `knn.models.obj`, and the best (highest accuracy) model is written to `best.knn.obj`.
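One (k, kernel) pair from that grid, using the `kknn` package the list above refers to (the `State` column name is an assumption):

```r
library(kknn)

data  <- read.csv("all_bearings_relabelled.csv")
rows  <- read.csv("train.rows.csv")[[1]]
train <- data[rows, ]
test  <- data[-rows, ]
train$State <- as.factor(train$State)          # "State" column name is assumed
test$State  <- factor(test$State, levels = levels(train$State))

# One (k, kernel) combination; knn.R loops over every kernel in kknn and
# k in {1, 3, 5, 10, 15, 20, 35, 50}.
fit <- kknn(State ~ ., train, test, k = 10, kernel = "triangular")
accuracy <- mean(fitted(fit) == test$State)
```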
- Requires `train.rows.csv` and `all_bearings_relabelled.csv` (which can be generated by earlier scripts).
- Run `svm.R` to train and test an array of Support Vector Machine classifiers with varying parameters. Parameters include:
  - Gamma from {10^-6, 10^-5, 10^-4, 10^-3, 10^-2, 10^-1}
  - Cost from {10^0, 10^1, 10^2, 10^3}
  - Different class weightings to handle uneven counts of class labels
- These gamma and cost values correspond to a rough grid search. A finer search should be performed in the region of the pair with the highest accuracy.
- The table of results is written to `svm.results.csv`, all trained models are written to `svm.models.obj`, and the best (highest accuracy) model is written to `best.svm.obj`.
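A condensed version of that grid search, assuming the `e1071` package and inverse-frequency class weights (only the gamma and cost grids come from the text above; the weighting values and the `State` column name are assumptions):

```r
library(e1071)   # assumed SVM package (libsvm interface)

data  <- read.csv("all_bearings_relabelled.csv")
rows  <- read.csv("train.rows.csv")[[1]]
train <- data[rows, ]
test  <- data[-rows, ]
train$State <- as.factor(train$State)          # "State" column name is assumed

# Coarse grid from the text; svm.R additionally varies the class weights.
w    <- c(nrow(train) / table(train$State))    # inverse-frequency class weights
grid <- expand.grid(gamma = 10^(-6:-1), cost = 10^(0:3))
acc  <- apply(grid, 1, function(p) {
  fit <- svm(State ~ ., data = train, gamma = p["gamma"], cost = p["cost"],
             class.weights = w)
  mean(predict(fit, test) == test$State)
})
grid[which.max(acc), ]   # refine the grid search around this pair
```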