This FastAPI backend handles speech-to-text, chat response generation, and text-to-speech using OpenAI's Whisper and GPT models together with Azure Speech Services.
```
.
├── main.py            # Main application file
├── test.py            # Test file generator
├── .env               # Environment variables
├── requirements.txt   # Required packages
└── README.md          # This file
```
Create a `.env` file in the root directory with the following:

```
OPENAI_API_KEY=your_openai_key_here
AZURE_SPEECH_KEY=your_azure_speech_key_here
AZURE_SPEECH_REGION=your_azure_region_here
```
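Inside `main.py`, these variables would typically be read from the environment at startup. The sketch below is illustrative, not the repository's actual code; the `load_settings` helper and its error message are assumptions:

```python
import os

def load_settings() -> dict:
    """Read the three required keys from the environment.

    Raises RuntimeError listing any missing keys, so the server fails
    fast at startup instead of failing mid-request.
    """
    required = ("OPENAI_API_KEY", "AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION")
    settings = {name: os.getenv(name) for name in required}
    missing = [name for name in required if not settings[name]]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return settings
```

If the project uses `python-dotenv`, calling `load_dotenv()` before `load_settings()` would populate the environment from the `.env` file.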
- Manages recording, transcription, chat, and playback
- Handles conversation state
- Coordinates API interactions
- Single endpoint: `/chat`
- Supports both streaming and non-streaming responses
- Swagger UI available at `/docs`
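One minimal way to handle the conversation state mentioned above is a capped list of role/content messages. The `Conversation` class below is a hedged sketch of that idea, not the actual implementation in `main.py`:

```python
class Conversation:
    """Holds chat history in the role/content format the OpenAI API expects."""

    def __init__(self, system_prompt: str, max_turns: int = 10):
        self.system_prompt = system_prompt
        self.max_turns = max_turns
        self.messages: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Keep only the most recent exchanges to bound token usage.
        self.messages = self.messages[-2 * self.max_turns:]

    def for_api(self) -> list[dict]:
        # The system prompt always comes first, followed by trimmed history.
        return [{"role": "system", "content": self.system_prompt}, *self.messages]
```

Trimming to the last `max_turns` exchanges keeps each request within the model's context window during long sessions.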
- `record_audio()`: Captures microphone input
- `process_audio()`: Converts speech to text using Whisper
- `get_chat_response()`: Generates response using ChatGPT
- `synthesize_speech()`: Converts text to speech
- `play_audio()`: Plays the response
- User initiates chat through endpoint
- System records audio
- Audio processed through Whisper
- Response generated via ChatGPT
- Response converted to speech
- Audio played back to user
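The flow above amounts to chaining the helper functions in order. With the real I/O injected as callables (stubbed in tests, wrapping the OpenAI and Azure SDKs in production), the orchestration might look like this sketch; the `run_chat_turn` wrapper is an assumption, not code from the repository:

```python
def run_chat_turn(record_audio, process_audio, get_chat_response,
                  synthesize_speech, play_audio):
    """One full turn: mic -> Whisper -> ChatGPT -> TTS -> speakers."""
    audio_in = record_audio()              # capture microphone input
    transcript = process_audio(audio_in)   # speech-to-text via Whisper
    reply = get_chat_response(transcript)  # chat completion via GPT
    audio_out = synthesize_speech(reply)   # text-to-speech via Azure
    play_audio(audio_out)                  # play the response to the user
    return transcript, reply
```

Passing the five functions in as parameters keeps the pipeline testable without a microphone or network access.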
- Run test file generator: `python3 test.py`
- Navigate to `/docs` to test the `/chat` endpoint
- Basic error handling tests included
- Core functionality tests for each component
- Install dependencies: `pip install -r requirements.txt`
- Set up your `.env` file with the required API keys
- Start the server: `uvicorn main:app --reload`
- Navigate to `http://localhost:8000/docs` to test the `/chat` endpoint
- Generate test WAV file: `python3 test.py`
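The test WAV generator can be as simple as writing a short sine tone with the standard-library `wave` module. This is a hedged sketch of what `test.py` might do, not its actual contents; the `write_test_wav` helper and its defaults are assumptions:

```python
import math
import struct
import wave

def write_test_wav(path: str, seconds: float = 1.0,
                   freq: float = 440.0, rate: int = 16000) -> None:
    """Write a mono 16-bit PCM sine tone, a format Whisper accepts."""
    n_frames = int(seconds * rate)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(rate)
        for i in range(n_frames):
            sample = int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate))
            wf.writeframes(struct.pack("<h", sample))

if __name__ == "__main__":
    write_test_wav("test.wav")
```

A fixed tone makes test runs reproducible, which is useful when checking the transcription path end to end.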