Follow the sosoa model's move to PyTorch 2.4.1 (#29)

Hiroshiba · Dec 1, 2024 · 2433ab0
1 parent 4a243d4

Showing 12 changed files with 630 additions and 378 deletions.
diff --git a/.github/workflows/check.yml b/.github/workflows/check.yml
@@ -24,13 +24,13 @@ jobs:
       - name: Download model
         shell: bash
         run: |
-          wget https://github.com/Hiroshiba/vv_core_inference/releases/download/0.0.1/model.zip
+          wget https://github.com/Hiroshiba/vv_core_inference/releases/download/0.0.2/model.zip
           unzip model.zip
 
-      - name: Set up Python 3.9
+      - name: Set up Python 3.11
         uses: actions/setup-python@v4
         with:
-          python-version: 3.9
+          python-version: 3.11
           cache: "pip"
 
       - name: Install dependencies

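Review note: the workflow now pins Python 3.11, and the README hunk in this commit says the code was developed on 3.11.9 and should run on any 3.11 release. A minimal sketch of a runtime guard that makes that assumption explicit could look like the following. `SUPPORTED_MINOR` and `is_supported_python` are hypothetical names and are not part of this repository.

```python
import sys

# Assumption: only the major.minor pair matters, following the README's
# statement that any 3.11 release should work.
SUPPORTED_MINOR = (3, 11)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True when the major.minor version matches SUPPORTED_MINOR."""
    return tuple(version_info[:2]) == SUPPORTED_MINOR


if __name__ == "__main__":
    if not is_supported_python():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"warning: developed on Python 3.11, found {found}", file=sys.stderr)
```

The guard only warns rather than exits, since the README phrases compatibility as an expectation and not a hard requirement.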
diff --git a/README.md b/README.md
@@ -1,30 +1,30 @@
 # vv_core_inference
 
-VOICEVOX のコア内で用いられているディープラーニングモデルの推論コード。VOICEVOXコア用のonnxモデルを制作できる。
+VOICEVOX のコア内で用いられているディープラーニングモデルの推論コード。VOICEVOX コア用の onnx モデルを制作できる。
 
 サンプルとして配布しているモデルは実際の VOICEVOX のものではなく、ほとんどノイズと変わらない音が生成されます。
 含まれている config の値（層の数など）は仮の値で、VOICEVOX で使用されるモデルとは異なることがあります。
 
 ## 公開している意図
 
-VOICEVOXコアでの音声合成をより高速・軽量にするための手法の議論や提案を受けられるようにするためです。
+VOICEVOX コアでの音声合成をより高速・軽量にするための手法の議論や提案を受けられるようにするためです。
 
-VOICEVOXコアはこのリポジトリで作ったonnxモデルを用いて計算処理（推論）が行われています。
-onnxモデルをそのまま改良するのはかなり専門的な知識が必要なので、より多くの方に馴染みのあるpytorchのモデルとコードを公開しています。
+VOICEVOX コアはこのリポジトリで作った onnx モデルを用いて計算処理（推論）が行われています。
+onnx モデルをそのまま改良するのはかなり専門的な知識が必要なので、より多くの方に馴染みのある pytorch のモデルとコードを公開しています。
 
 ## 環境構築
 
-Python 3.9.19 で開発しました。 3.9 台なら動くと思います。
+Python 3.11.9 で開発しました。 3.11 台なら動くと思います。
 
 ```bash
 # ５分くらいかかります
 pip install -r requirements.txt
 ```
 
-## pytorchモデルのダウンロード
+## pytorch モデルのダウンロード
 
 ```bash
-wget https://github.com/Hiroshiba/vv_core_inference/releases/download/0.0.1/model.zip
+wget https://github.com/Hiroshiba/vv_core_inference/releases/download/0.0.2/model.zip
 unzip model.zip
 ```
 
@@ -41,13 +41,15 @@ python run.py \
   --texts "おはようございます、こんにちは、こんばんは、どうでしょうか"
 ```
 
-## モデルをonnxに変換
-* `python convert.py --yukarin_s_model_dir "model/yukarin_s" --yukarin_sa_model_dir "model/yukarin_sa" --yukarin_sosoa_model_dir "model/yukarin_sosoa" --hifigan_model_dir "model/hifigan"` でonnxへの変換が可能。modelフォルダ内のyukarin_s, yukarin_sa, yukarin_sosoaフォルダにonnxが保存される。
-  - `speaker_ids`オプションに指定する数値は自由。どの数値を指定しても生成されるonnxモデルは全ての`speaker_id`に対応しており、値を変えて実行しなおしたり、複数のidを指定したりする必要は無い。
-  - yukarin_sosoaフォルダにはhifi_ganと合わせた`decode.onnx`が保存される
-  - yukarin_sosfはオプショナルで、追加する場合は`--yukarin_sosf_model_dir "model/yukarin_sosf"`などを指定する
+## モデルを onnx に変換
 
-* onnxで実行したい場合は`run.py`を`--method=onnx`で実行する； `python run.py --yukarin_s_model_dir "model" --yukarin_sa_model_dir "model" --yukarin_sosoa_model_dir "model" --hifigan_model_dir "model"  --speaker_ids 5  --method=onnx`
+- `python convert.py --yukarin_s_model_dir "model/yukarin_s" --yukarin_sa_model_dir "model/yukarin_sa" --yukarin_sosoa_model_dir "model/yukarin_sosoa" --hifigan_model_dir "model/hifigan"` で onnx への変換が可能。model フォルダ内の yukarin_s, yukarin_sa, yukarin_sosoa フォルダに onnx が保存される。
+
+  - `speaker_ids`オプションに指定する数値は自由。どの数値を指定しても生成される onnx モデルは全ての`speaker_id`に対応しており、値を変えて実行しなおしたり、複数の id を指定したりする必要は無い。
+  - yukarin_sosoa フォルダには hifi_gan と合わせた`decode.onnx`が保存される
+  - yukarin_sosf はオプショナルで、追加する場合は`--yukarin_sosf_model_dir "model/yukarin_sosf"`などを指定する
+
+- onnx で実行したい場合は`run.py`を`--method=onnx`で実行する； `python run.py --yukarin_s_model_dir "model" --yukarin_sa_model_dir "model" --yukarin_sosoa_model_dir "model" --hifigan_model_dir "model"  --speaker_ids 5  --method=onnx`
   - `speaker_ids`に複数の数値を指定すれば、通常実行と同様に各話者の音声が保存される。
 
 ## ファイル構造
@@ -69,13 +71,13 @@ python run.py \
   - `make_decode_forwarder.py`
     - 音声波形生成用の`forwarder`を作る
   - `onnx_yukarin_s_forwarder.py`
-    - onnxruntimeで動作する`yukarin_s`用の`forwarder`を作る
+    - onnxruntime で動作する`yukarin_s`用の`forwarder`を作る
   - `onnx_yukarin_sa_forwarder.py`
-    - onnxruntimeで動作する`yukarin_sa`用の`forwarder`を作る
+    - onnxruntime で動作する`yukarin_sa`用の`forwarder`を作る
   - `onnx_yukarin_sosf_forwarder.py`
-    - onnxruntimeで動作する`yukarin_sosf`用の`forwarder`を作る
+    - onnxruntime で動作する`yukarin_sosf`用の`forwarder`を作る
   - `onnx_decode_forwarder.py`
-    - onnxruntimeで動作する音声波形生成用の`forwarder`を作る
+    - onnxruntime で動作する音声波形生成用の`forwarder`を作る
     - `yukarin_sosoa`も内部に組み込まれている
   - `acoustic_feature_extractor.py`
     - 音素情報やリサンプリング手法などが入っている。ディープラーニングとは関係ない。
@@ -87,7 +89,7 @@ python run.py \
 ## 自分で学習したモデルの onnx を作りたい場合
 
 VOICEVOX をビルドするには以下の 3 つの onnx が必要です。
-（predict_contourはオプショナルです。）
+（predict_contour はオプショナルです。）
 
 - predict_duration.onnx
   - 入力

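Review note: the `convert.py` hunk that follows deletes `example_outputs=output` from an export call. The diff does not say why; my assumption is that the exporter in the newer PyTorch no longer accepts that keyword. If a script ever needed to run against both older and newer versions, one option is to filter keyword arguments against the callee's signature before the call. The sketch below shows that technique with the standard library only. `filter_supported_kwargs` and `export_stub` are hypothetical names, and `export_stub` is a stand-in, not the real exporter.

```python
import inspect
from typing import Any, Callable, Dict


def filter_supported_kwargs(
    func: Callable[..., Any], kwargs: Dict[str, Any]
) -> Dict[str, Any]:
    """Drop keyword arguments that `func` does not declare.

    If `func` itself takes **kwargs, everything is kept, because no
    keyword can be ruled out from the signature alone.
    """
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


# Stand-in for an exporter whose newer version has no `example_outputs`.
def export_stub(model, args, f, output_names=None, dynamic_axes=None):
    return {"output_names": output_names, "dynamic_axes": dynamic_axes}


options = {"example_outputs": object(), "output_names": ["f0_list"]}
result = export_stub(
    None, (), "out.onnx", **filter_supported_kwargs(export_stub, options)
)
```

Simply deleting the argument, as this commit does, is the simpler choice when only one framework version has to be supported.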
diff --git a/convert.py b/convert.py
@@ -120,7 +120,6 @@ def convert_intonation(model_dir: Path, device: str, offset: int, working_dir: P
             "end_accent_phrase_list",
             "speaker_id",
         ],
-        example_outputs=output,
         output_names=["f0_list"],
         dynamic_axes={
             "vowel_phoneme_list": {0: "length"},