Training documentation #11

Technerder · 2023-01-27T23:05:27Z

Are there any plans to release documentation relating to the training of custom models?

dscripka · 2023-01-29T00:32:37Z · Maintainer

Yes, I am planning on releasing more information on how to train new models soon (hopefully in the next few weeks).

Because there is so much data involved (easily hundreds of GBs), I won't be able to share everything in exactly the same form that is used to produce the models. But I plan to share example notebooks demonstrating data generation, preparation, and model training with a small set of example data, to at least demonstrate the concept.

In the meantime, I'm happy to answer any questions you have about training the models. You can also find some detail on the text-to-speech models used for training data generation here, and more detail about the data preparation in the model documentation here.

dscripka · 2023-02-15T14:09:08Z · Maintainer

@Technerder, sorry for the delay, but I'm now close to releasing more information on training new models. I'm still finishing the example notebooks and documentation, but in the meantime I've released the synthetic data generation code in a separate repository here: https://github.com/dscripka/synthetic_speech_dataset_generation

Feel free to experiment with that code, and if you run into any problems create an issue on that repo and I'll try to help resolve it.

dscripka · 2023-02-19T15:12:29Z · Maintainer

@Technerder there is now an example notebook in the repo that demonstrates training custom models: https://github.com/dscripka/openWakeWord/blob/main/notebooks/training_models.ipynb

Feel free to post questions/comments on the custom model training process in this thread.

3 replies

Technerder Feb 20, 2023
Author

Awesome! Thank you so much! I'll hopefully be able to try out the notebook in a few days when I have time, but at a quick glance it appears to do exactly what I was hoping for!

jerblack Jan 9, 2024

Is all the IPython/Jupyter stuff a hard dependency, or can this all be done with pure Python? I'm looking to run the training locally so I don't have Google shutting down the training in the middle of the session again. Is there already a .py version of this process you've created, or is that something I need to make on my own if I want to run this locally?

dscripka Jan 14, 2024
Maintainer

@jerblack, no, the Jupyter/Colab functionality in the notebooks is not a requirement, and you can certainly modify the code to run entirely locally.

This notebook is a good place to start, as you can see the individual steps (mostly data preparation) followed by the call to the train.py file, which does the actual data generation and model training. If you want to customize the process for your system, I would recommend reviewing train.py.
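A minimal sketch of driving train.py from a plain Python script instead of a notebook, assuming a local clone of openWakeWord and a YAML training config. The script path (`TRAIN_SCRIPT`), the config name `my_model.yaml`, the `--training_config` option, and the three stage flags are all assumptions based on one version of the repository; confirm them with `python train.py --help` in your checkout before relying on them.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical location of the training script inside a local clone of the
# openWakeWord repository; adjust to match your checkout.
TRAIN_SCRIPT = Path("openWakeWord") / "openwakeword" / "train.py"

# Assumed stage flags, run in order: synthetic clip generation, augmentation,
# then model training. Verify them against train.py's argparse options.
STAGES = ["--generate_clips", "--augment_clips", "--train_model"]


def build_command(config_path: str, stage_flag: str) -> list[str]:
    """Build the command line for one training stage (assumed flags)."""
    return [
        sys.executable,  # reuse the current interpreter / virtualenv
        str(TRAIN_SCRIPT),
        "--training_config",
        config_path,
        stage_flag,
    ]


def run_pipeline(config_path: str, dry_run: bool = False) -> list[list[str]]:
    """Run each stage in order; with dry_run=True, only print the commands."""
    commands = [build_command(config_path, flag) for flag in STAGES]
    for cmd in commands:
        print("Running:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a stage fails,
            # so later stages never run on incomplete data.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_pipeline("my_model.yaml", dry_run=True)` prints the three commands without executing anything, which is a cheap way to check paths before a long run; drop `dry_run` to actually train. Running the stages as separate processes means a failure or interruption in a later stage doesn't force you to regenerate the clips from the earlier ones.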
