Gemini Multimodal Playground ✨

A Python application for having voice and video conversations with Google's new Gemini 2.0 model. It supports real-time voice and video input and replies with spoken audio. It is available in two versions: a full-stack web application and a standalone Python script.

Full-Stack Version

(Demo video: gemini.playground.full.stack.mp4)

Getting Your Gemini API Key

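1. Go to Google AI Studio.
2. Sign in with your Google account.
3. Click "Create API Key".
4. Copy the generated API key and paste it into the appropriate .env file (the repository's root directory for the full-stack version, the standalone directory for the standalone version).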

Prerequisites

- Python 3.12 or higher
- Node.js 18 or higher
- A Google Cloud account
- A Gemini API key
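The version requirements above can be checked before you start. The following is a minimal sketch, not a script shipped with the repository; it assumes `node` is on your PATH and only compares the major Node.js version.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions taken from the Prerequisites list.
MIN_PYTHON = (3, 12)
MIN_NODE_MAJOR = 18


def parse_node_version(text: str) -> tuple:
    """Turn output such as 'v18.19.0' into (18, 19, 0)."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised Node.js version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def python_ok(version_info=sys.version_info) -> bool:
    """True if the running (or given) Python is at least 3.12."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def node_ok() -> bool:
    """True if a Node.js 18+ binary is found on PATH."""
    node = shutil.which("node")
    if node is None:
        return False  # Node.js is not installed or not on PATH
    output = subprocess.run([node, "--version"], capture_output=True, text=True).stdout
    return parse_node_version(output)[0] >= MIN_NODE_MAJOR


if __name__ == "__main__":
    print(f"Python >= 3.12: {python_ok()}")
    print(f"Node.js >= 18:  {node_ok()}")
```

Run it with the same interpreter you plan to use for the virtual environment, since `python_ok()` inspects the interpreter that executes it.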

Backend Setup

1. Clone this repository.
2. Create a virtual environment and activate it:

   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate

3. Install the required packages:

   pip install -r requirements.txt

4. Create a .env file in the repository's root directory containing your API key:

   GEMINI_API_KEY=your_api_key_here

5. Start the backend server:

   python backend/main.py
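A .env file is simply a list of KEY=value lines that the application turns into configuration at startup. The repository's code may load it differently (for example with the python-dotenv package); the stdlib-only sketch below is illustrative, and `parse_env` and `load_api_key` are hypothetical names. It prefers an already exported GEMINI_API_KEY and falls back to the file.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, e.g. GEMINI_API_KEY="abc"
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_api_key(env_path: Path = Path(".env")) -> str:
    """Return GEMINI_API_KEY from the environment, else from the .env file."""
    if os.environ.get("GEMINI_API_KEY"):
        return os.environ["GEMINI_API_KEY"]
    if env_path.is_file():
        key = parse_env(env_path.read_text()).get("GEMINI_API_KEY")
        if key:
            return key
    raise RuntimeError("GEMINI_API_KEY not found; add GEMINI_API_KEY=... to your .env file")
```

Giving exported variables priority means you can temporarily try a different key without editing the file.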

Frontend Setup

1. Navigate to the frontend directory:

   cd frontend

2. Install dependencies:

   npm install

3. Start the development server:

   npm run dev

4. Open http://localhost:3000 in your browser.

Standalone Version

(Demo video: playground.demo.mp4)

Prerequisites

Same as above, except that only the Python-related requirements are needed (Node.js is not), plus Tkinter:

- On Ubuntu/Debian: sudo apt-get install python3-tk
- On Fedora: sudo dnf install python3-tkinter
- On macOS & Windows: Usually already included with Python
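To confirm that Tkinter is usable from the Python inside your virtual environment, a quick check like the one below can help. It is a sketch, not part of the repository. It attempts the real import rather than only looking for the module, because a Python build can ship the tkinter package without the compiled Tk bindings it needs; a successful import still does not guarantee that a display is available for the window.

```python
def tkinter_available() -> bool:
    """Return True if tkinter (and its compiled _tkinter bindings) can be imported."""
    try:
        import tkinter  # noqa: F401  importing also loads _tkinter
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if tkinter_available():
        print("tkinter found")
    else:
        print("tkinter missing: install python3-tk (Ubuntu/Debian) or python3-tkinter (Fedora)")
```

Running `python -m tkinter` is an alternative check: it opens a small Tk test window if everything is installed.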

Installation

1. Clone this repository or download the standalone folder.
2. Create a virtual environment and activate it:

   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install the required packages:

   pip install -r requirements.txt

4. Create a .env file in the standalone directory containing your API key:

   GEMINI_API_KEY=your_api_key_here

Running the Standalone Application

1. Make sure your virtual environment is activated.
2. Run the script:

   python standalone.py

Configuration Options

Both versions provide several configuration options:

- System Prompt: The initial instructions that tell Gemini its role and how to behave
- Voice: The voice Gemini uses for its spoken responses, one of:
  - Puck
  - Charon
  - Kore
  - Fenrir
  - Aoede
- Enable Google Search: Lets Gemini search the web for up-to-date information
- Allow Interruptions: Lets you interrupt Gemini while it is speaking
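The options above can be pictured as one settings object that is turned into the session setup sent to Gemini. The sketch below is an assumption-heavy illustration, not the repository's code: `PlaygroundConfig` and `build_setup_message` are hypothetical names, the default system prompt is made up, and the field layout and default model name `models/gemini-2.0-flash-exp` are assumed to follow the Multimodal Live API setup message as documented around the Gemini 2.0 launch, so check the current API reference before relying on them. Allow Interruptions is treated here as a client-side flag, which is also an assumption.

```python
from dataclasses import dataclass

# Voices listed under Configuration Options.
VOICES = ("Puck", "Charon", "Kore", "Fenrir", "Aoede")


@dataclass
class PlaygroundConfig:  # hypothetical name, not from the repository
    system_prompt: str = "You are a helpful assistant."  # made-up default
    voice: str = "Puck"
    google_search: bool = False
    allow_interruptions: bool = True  # assumed to be handled client-side

    def __post_init__(self) -> None:
        if self.voice not in VOICES:
            raise ValueError(f"voice must be one of {VOICES}, got {self.voice!r}")


def build_setup_message(cfg: PlaygroundConfig,
                        model: str = "models/gemini-2.0-flash-exp") -> dict:
    """Assemble a Live API style setup message (field names are assumptions)."""
    setup = {
        "model": model,
        "generation_config": {
            "response_modalities": ["AUDIO"],  # ask for spoken replies
            "speech_config": {
                "voice_config": {"prebuilt_voice_config": {"voice_name": cfg.voice}}
            },
        },
        "system_instruction": {"parts": [{"text": cfg.system_prompt}]},
    }
    if cfg.google_search:
        setup["tools"] = [{"google_search": {}}]  # enable the search tool
    return {"setup": setup}
```

Validating the voice in `__post_init__` catches a typo in the settings before any network connection is opened.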

Troubleshooting

Audio feedback loop: Gemini may interrupt itself when your microphone picks up its voice from your speakers. This happens because the application processes all incoming microphone audio, including Gemini's own responses as they are played back. To prevent this feedback loop, either:
1. Disable the "Allow Interruptions" option in the settings, or
2. Use headphones or earphones so your microphone cannot pick up Gemini's audio output.
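Option 1 amounts to not forwarding microphone audio while Gemini is talking. Below is a minimal sketch of that gating logic, assuming the playback duration of each Gemini audio chunk is known; `MicGate` and its methods are hypothetical, and the application's actual implementation may differ. A short tail after playback absorbs residual echo from the speakers.

```python
import time
from typing import Optional


class MicGate:  # hypothetical helper, not taken from the repository
    """Decide whether a microphone chunk should be sent to Gemini."""

    def __init__(self, allow_interruptions: bool, tail_seconds: float = 0.5):
        self.allow_interruptions = allow_interruptions
        self.tail_seconds = tail_seconds
        self._playback_ends_at = 0.0  # monotonic time when queued audio finishes

    def model_audio_played(self, duration_seconds: float, now: Optional[float] = None) -> None:
        """Record that a chunk of Gemini's audio was queued for playback."""
        now = time.monotonic() if now is None else now
        # Chunks queue back to back, so extend from whichever is later.
        start = max(now, self._playback_ends_at)
        self._playback_ends_at = start + duration_seconds

    def should_send(self, now: Optional[float] = None) -> bool:
        """True if a microphone chunk captured at `now` should be forwarded."""
        if self.allow_interruptions:
            return True  # everything goes through, including any echo
        now = time.monotonic() if now is None else now
        return now >= self._playback_ends_at + self.tail_seconds
```

The trade-off is the one described above: with gating active you cannot talk over Gemini at all, which is why headphones are the better fix when you want interruptions.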

saharmor/gemini-multimodal-playground
