Speech Detection

This script uses OpenCV, Azure Speech Recognition, gTTS, and VLC to detect faces in the webcam video feed and to recognize speech from the audio input. It generates a reply to the recognized speech with OpenAI's GPT-3 API, converts the reply to audio with gTTS, and plays it through VLC. A GUI shows a textbox along with buttons for switching the speech recognition language.

Requirements

OpenCV

pip install opencv-python

Azure Speech Recognition

pip install azure-cognitiveservices-speech

gTTS

pip install gTTS

VLC

Download and install VLC from https://www.videolan.org/vlc/download-windows.en-GB.html

tkinter

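tkinter ships with most Python installations, so it normally needs no pip install (the "tk" package on PyPI is not tkinter). On Debian/Ubuntu, if it is missing: sudo apt-get install python3-tk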

OpenAI API key

pip install openai
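
To confirm that the requirements above are installed before running the script, a small check along these lines may help. It is only a sketch: the helper name missing_modules is hypothetical, and the import name "vlc" assumes the script uses the python-vlc bindings, which this README does not state.

```python
import importlib.util

# Import names for the requirements listed above.
# Assumption: "vlc" is the module provided by the python-vlc bindings.
REQUIRED_MODULES = [
    "cv2",                             # opencv-python
    "azure.cognitiveservices.speech",  # azure-cognitiveservices-speech
    "gtts",                            # gTTS
    "vlc",                             # python-vlc (assumed)
    "tkinter",                         # bundled with most Python installs
    "openai",                          # openai
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package of a dotted name is absent,
            # or when an already-loaded module has no spec.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Calling missing_modules() returns a list of the import names that could not be found; an empty list suggests all Python dependencies are in place. It does not check that the VLC application itself is installed.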

Usage

Set the environment variable azure_api_key to your Azure API key.
Set the environment variable openai_api_key to your OpenAI API key.
Run the script: python main.py
The script will start the webcam and display the video feed in a window.
When a face is detected in the video, the script will draw a rectangle around it.
When speech is detected, it will transcribe the speech to text, generate a response, generate speech from the response text, and play the generated speech.
The transcribed speech and generated response will be displayed in the GUI textbox.
Click the "English" or "German" button in the GUI to change the language for speech recognition.
Press "Q" while the script is running to quit.
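
The two environment variables from the first steps can be read and validated roughly as follows. This is a minimal sketch, not the script's actual code: the function name read_api_keys is hypothetical, and only the variable names azure_api_key and openai_api_key come from this README.

```python
import os


def read_api_keys(environ=os.environ):
    """Return (azure_key, openai_key); raise if either is unset or empty.

    `environ` defaults to the process environment and can be replaced
    with a plain dict for testing.
    """
    names = ("azure_api_key", "openai_api_key")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variable(s): " + ", ".join(missing)
        )
    return tuple(environ[name] for name in names)
```

Checking both keys up front means a missing key is reported by name at startup rather than as an authentication error later. Note that environment variable names are case-sensitive on Linux and macOS, so the lowercase names must be used exactly as written.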
