changes in trim_silence function regarding the aggressive trimming issue #46
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,28 +1,28 @@ | ||
"""audio processing, etc""" | ||
from pydub import AudioSegment | ||
import pandas as pd | ||
import numpy as np | ||
import soundfile as sf | ||
|
||
|
||
class Audio: | ||
silence_threshold = -50.0 | ||
Review comment: It seems this would no longer be used anywhere, is that right?
Reply: No, it will still be used when saving the audio.
||
chunk_size = 10 | ||
threshold_value = 0.0007 # Tweak the value of threshold to get the aggressive trimming | ||
Review comment: What does this threshold value represent, i.e. what are the units?
Reply: This is the mask threshold value. It is applied to the mean value obtained from the rolling windows, and values below the threshold are discarded.
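On the units question, a minimal sketch, assuming the samples are the normalised floating-point values that `soundfile.read` returns by default (full scale = 1.0). Under that assumption `threshold_value` is a linear amplitude, and it can be put on the same decibel scale as `silence_threshold`. The helper names `amplitude_to_dbfs` and `dbfs_to_amplitude` are hypothetical and not part of this PR.

```python
import math


def amplitude_to_dbfs(amplitude: float) -> float:
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    return 20.0 * math.log10(amplitude)


def dbfs_to_amplitude(dbfs: float) -> float:
    """Convert a dBFS level back to a linear amplitude (1.0 = full scale)."""
    return 10.0 ** (dbfs / 20.0)


threshold_value = 0.0007    # new mask threshold from the diff (assumed linear amplitude)
silence_threshold = -50.0   # existing threshold from the diff, in dBFS

threshold_dbfs = amplitude_to_dbfs(threshold_value)          # roughly -63.1 dBFS
silence_amplitude = dbfs_to_amplitude(silence_threshold)     # roughly 0.00316

print(f"threshold_value   = {threshold_dbfs:.1f} dBFS")
print(f"silence_threshold = {silence_amplitude:.5f} linear amplitude")
```

The two thresholds are not strictly comparable: the old code compares the level of a 10 ms chunk (pydub's `dBFS`, which as far as I know is RMS-based), while the new code compares a windowed peak of absolute sample values.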
||
|
||
@staticmethod | ||
def _detect_leading_silence(sound: AudioSegment) -> int: | ||
trim_ms = 0 | ||
assert Audio.chunk_size > 0 # to avoid infinite loop | ||
while sound[trim_ms:trim_ms + Audio.chunk_size].dBFS \ | ||
< Audio.silence_threshold and trim_ms < len(sound): | ||
trim_ms += Audio.chunk_size | ||
def _detect_leading_silence(sound: bytearray, sample_rate: int) -> list: | ||
Review comment: The name of this method should probably be changed to reflect its new nature. It is no longer "detecting leading silence"; it is generating a mask.
Reply: Yes, I am working on it.
||
y = pd.Series(sound).apply(np.abs) | ||
y_mean = y.rolling(window=int(sample_rate / 20), | ||
min_periods=1, | ||
center=True).max() | ||
Review comment: Can you break this down for me? What are we getting from this, and how do we know that it detects silence?
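To break the rolling-window step down, here is a small sketch on a synthetic signal. It is an illustration under stated assumptions, not the PR's final code: the function name `amplitude_mask`, the 1000 Hz sample rate, and the test signal are all made up for the example. Note that although the diff names the result `y_mean`, the call is `.max()`, so the value is the peak absolute amplitude within a centred window of `sample_rate / 20` samples (about 50 ms), not a mean.

```python
import numpy as np
import pandas as pd


def amplitude_mask(samples, sample_rate, threshold):
    """Return a boolean array that is True where the local peak amplitude
    (rolling max of |sample| over ~50 ms) exceeds `threshold`."""
    envelope = (
        pd.Series(samples)
        .abs()
        .rolling(window=int(sample_rate / 20), min_periods=1, center=True)
        .max()
    )
    # The comparison already yields booleans, one per sample.
    return (envelope > threshold).to_numpy()


# Hypothetical test signal: 200 silent samples, 200 samples of a 50 Hz tone,
# then 200 silent samples, at an assumed 1000 Hz sample rate.
sample_rate = 1000
silence = np.zeros(200)
tone = 0.5 * np.sin(2 * np.pi * 50 * np.arange(200) / sample_rate)
signal = np.concatenate([silence, tone, silence])

mask = amplitude_mask(signal, sample_rate, threshold=0.0007)
print(f"{mask.sum()} of {len(mask)} samples are kept")
```

The window smooths over zero crossings inside the tone, so the mask stays True across the whole voiced region and extends by up to half a window on either side of it. Whether "peak below threshold" is a good definition of silence for real recordings is a separate question that the sketch does not answer.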
||
|
||
return trim_ms | ||
return [True if mean > Audio.threshold_value else False for mean in y_mean] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The result of a comparison is already a Boolean, eg
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yup I think this is right |
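The point about comparisons can be sketched as follows. The `envelope` values are made up for illustration; only `threshold_value` comes from the diff.

```python
import numpy as np

threshold_value = 0.0007
envelope = np.array([0.0, 0.0005, 0.0007, 0.02, 0.3])  # hypothetical rolling-window values

# Form used in the diff: an explicit conditional around a comparison.
verbose = [True if value > threshold_value else False for value in envelope]

# Equivalent vectorised form: the comparison itself produces the boolean mask.
vectorized = envelope > threshold_value

print(verbose)
print(vectorized.tolist())
```

Both forms give the same mask. The vectorised one also returns a NumPy boolean array, which can be used directly for indexing.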
||
|
||
@staticmethod | ||
def trim_silence(path: str) -> AudioSegment: | ||
sound = AudioSegment.from_wav(path + ".wav") | ||
start_trim = Audio._detect_leading_silence(sound) | ||
end_trim = Audio._detect_leading_silence(sound.reverse()) | ||
duration = len(sound) | ||
trimmed_sound = sound[start_trim:duration - end_trim] | ||
sound, rate = sf.read(path + ".wav") | ||
mask = Audio._detect_leading_silence(sound, rate) | ||
trimmed_sound = sound[mask] | ||
Review comment: Is this going to remove all periods of silence, rather than only silence at the beginning and end of the clip?
Reply: It will remove all the silence from the signal.
Review comment: I may be misunderstanding, but won't this also remove any apparent "silence" during the speech? Any natural pauses would be cut out and the speech would sound very run together. I don't think this is the behaviour we are after for TTS recording. Is that your intended use case?
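If the goal is to trim only the start and end of a clip, one possible approach is to use the mask just to locate the first and last non-silent samples and keep everything in between. This is a sketch of that alternative, not what the PR currently does; `trim_edges` and the sample values are hypothetical.

```python
import numpy as np


def trim_edges(samples, mask):
    """Keep everything between the first and last non-silent sample,
    so pauses inside the speech are preserved."""
    voiced = np.flatnonzero(mask)
    if voiced.size == 0:
        return samples[:0]  # the whole clip is below the threshold
    return samples[voiced[0]:voiced[-1] + 1]


samples = np.array([0.0, 0.0, 0.4, 0.0, 0.0, -0.3, 0.0])  # made-up clip with an internal pause
mask = np.abs(samples) > 0.0007

edge_trimmed = trim_edges(samples, mask)  # internal pause kept
fully_masked = samples[mask]              # behaviour in the diff: every silent sample dropped

print(edge_trimmed.tolist())
print(fully_masked.tolist())
```

With `samples[mask]` the two zero samples between the voiced samples disappear, which is the "run together" effect raised in the review. The edge-only version keeps them.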
||
return trimmed_sound | ||
|
||
@staticmethod | ||
|
@@ -31,4 +31,4 @@ def save_audio(path: str, audio: AudioSegment): | |
|
||
@staticmethod | ||
def get_audio_len(audio: AudioSegment) -> float: | ||
return len(audio)/1000.0 | ||
return len(audio) / 1000.0 |
Review comment: We generally keep these as their full-length names. This makes the code throughout more readable, as you don't need to return and check what sf refers to, for example.