changes in trim_silence function regarding the aggressive trimming issue #46

Open
wants to merge 10 commits into base: master
28 changes: 14 additions & 14 deletions backend/app/audio.py
@@ -1,28 +1,28 @@
"""audio processing, etc"""
from pydub import AudioSegment
import pandas as pd
import numpy as np
import soundfile as sf
Collaborator:

We generally keep these as their full-length names. That makes the code throughout more readable, since you don't need to go back and check what sf refers to, for example.



class Audio:
silence_threshold = -50.0
Collaborator:

It seems this would no longer be used anywhere; is that right?

Author:

No, it will still be used when saving the audio.

chunk_size = 10
    threshold_value = 0.0007  # tweak this value to control how aggressively silence is trimmed
Collaborator:

What does this threshold value represent? I.e., what are the units?

Author:

This is the mask threshold value. The rolling windows produce a smoothed amplitude envelope of the signal, and any sample whose envelope value falls below the threshold is discarded.
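The masking step described above can be sketched on a toy signal. This is not code from the PR, just an illustration under the assumption that soundfile returns normalized float samples in [-1.0, 1.0], so threshold_value is in normalized amplitude units:

```python
# Sketch of the mask computation: absolute amplitude, smoothed with a
# centered rolling window, then thresholded. Samples whose envelope
# stays below threshold_value are treated as silence.
import numpy as np
import pandas as pd

sample_rate = 1000
threshold_value = 0.0007

# toy signal: 200 samples of silence, a 50 Hz tone, then silence again
signal = np.concatenate([
    np.zeros(200),
    0.5 * np.sin(2 * np.pi * 50 * np.arange(600) / sample_rate),
    np.zeros(200),
])

# rolling max over a 50 ms window approximates the amplitude envelope
envelope = pd.Series(signal).abs().rolling(
    window=int(sample_rate / 20), min_periods=1, center=True).max()
mask = (envelope > threshold_value).to_numpy()
```

The leading and trailing zeros produce False entries in the mask, while the tone region produces True entries, which is what `trimmed_sound = sound[mask]` then relies on.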


@staticmethod
def _detect_leading_silence(sound: AudioSegment) -> int:
trim_ms = 0
assert Audio.chunk_size > 0 # to avoid infinite loop
while sound[trim_ms:trim_ms + Audio.chunk_size].dBFS \
< Audio.silence_threshold and trim_ms < len(sound):
trim_ms += Audio.chunk_size
    def _detect_leading_silence(sound: np.ndarray, sample_rate: int) -> list:
Collaborator:

The name of this method should probably be changed to reflect its new nature. It is no longer "detecting leading silence"; it is generating a mask.

Author:

Yes, I am working on it.

y = pd.Series(sound).apply(np.abs)
y_mean = y.rolling(window=int(sample_rate / 20),
min_periods=1,
center=True).max()
Collaborator:

Can you break this down for me? What are we getting from this, and how do we know it detects silence?


return trim_ms
return [True if mean > Audio.threshold_value else False for mean in y_mean]
Collaborator:

The result of a comparison is already a Boolean, e.g. x > y evaluates to either True or False. So this can be simplified to something like:

return [mean > Audio.threshold_value for mean in y_mean]

Author:

Yup, I think that's right.


@staticmethod
def trim_silence(path: str) -> AudioSegment:
sound = AudioSegment.from_wav(path + ".wav")
start_trim = Audio._detect_leading_silence(sound)
end_trim = Audio._detect_leading_silence(sound.reverse())
duration = len(sound)
trimmed_sound = sound[start_trim:duration - end_trim]
sound, rate = sf.read(path + ".wav")
mask = Audio._detect_leading_silence(sound, rate)
trimmed_sound = sound[mask]
Collaborator:

Is this going to remove all periods of silence, rather than only silence at the beginning and end of the clip?

Author:

It will remove all the silence from the signal.

Collaborator:

I may be misunderstanding, but won't this also remove any apparent "silence" during the speech? Meaning any natural pauses will be cut out and the speech will sound very run together?

I don't think this is the behaviour we are after for TTS recording. Is that your intended use case?
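One way to address the concern above, sketched here as an illustration rather than as part of the PR, is to use the mask only to locate the first and last non-silent samples, so internal pauses survive. The helper name trim_edges is hypothetical:

```python
# Hypothetical alternative: keep everything between the first and last
# above-threshold samples, so natural pauses inside the speech remain.
import numpy as np

def trim_edges(sound: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Cut leading/trailing silence but preserve internal pauses."""
    voiced = np.flatnonzero(mask)      # indices of non-silent samples
    if voiced.size == 0:
        return sound[:0]               # all silence: return empty array
    return sound[voiced[0]:voiced[-1] + 1]

sound = np.array([0.0, 0.0, 0.3, 0.0, 0.4, 0.0, 0.0])
mask = np.abs(sound) > 0.1
trimmed = trim_edges(sound, mask)      # keeps the internal pause at index 3
```

Compared with `sound[mask]`, which concatenates every non-silent sample, this only removes the leading and trailing silent runs.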

return trimmed_sound

@staticmethod
@@ -31,4 +31,4 @@ def save_audio(path: str, audio: AudioSegment):

@staticmethod
def get_audio_len(audio: AudioSegment) -> float:
return len(audio)/1000.0
return len(audio) / 1000.0