Building Your First Multimodal Gen AI App 🚀

Introduction

Welcome to the tutorial on building your first multimodal generative AI (Gen AI) app! This repository contains all the resources and code you need to get started with creating an app that can generate text, audio, images, and videos using various AI models and APIs.

Prerequisites

Before you begin, make sure you have the following:

- A GitHub account
- GitHub Codespaces enabled (comes with your GitHub account)
- API keys for the following:
  - Groq or OpenAI (at least one is required; Groq has a free tier for all the models we need!)
  - Replicate (necessary for full functionality; Replicate has kindly provided credits for those taking this workshop at a conference.)
- Basic knowledge of Python and Bash

Note about GitHub Codespaces:

GitHub Codespaces is included with every GitHub account.
There's a substantial monthly free tier for personal accounts (120 core hours/month as of 2024).
If you exceed the free tier, you may need to purchase additional usage.
For the latest information on GitHub Codespaces pricing and usage limits, please check the official GitHub documentation.
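
Core hours are machine cores multiplied by wall-clock hours, so how long the free tier lasts depends on the machine size you pick. The small calculation below is only an illustration: it assumes the 120 core-hour figure quoted above, the core counts are example values, and `free_hours` is a hypothetical helper name. Check GitHub's documentation for the machine sizes actually on offer.

```python
FREE_CORE_HOURS = 120  # monthly figure quoted above (as of 2024)

def free_hours(cores: int, core_hours: float = FREE_CORE_HOURS) -> float:
    """Wall-clock hours of Codespace time covered for a machine with `cores` cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return core_hours / cores

# Example core counts; these are assumptions, not a list of available machines.
for cores in (2, 4, 8):
    print(f"{cores}-core machine: {free_hours(cores):.0f} hours/month")
```

On these assumptions, a 2-core machine is covered for 60 hours a month and a 4-core machine for 30.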

Note for Workshop Participants: If you are taking this workshop at a conference or other event, please check with your instructors or teachers to see if they are providing the API keys for you.

Setting Up the Environment

To get up and running, watch the video below, follow the written instructions, or both. In the video, the setup instructions start at around 1:50, after a motivating demo:

[Video: Building a Multi-Modal AI App with GitHub Code Spaces]

Creating a GitHub Codespace

Open the repository in GitHub.
Click on the Code button and select Create codespace on main.
Wait for the Codespace to spin up (this should take about 2 minutes).

Adding API Keys

In the Codespace, navigate to the .streamlit directory inside the multimodal_app folder.
Open the secrets.toml file.

Add your API keys as follows:

OPENAI_API_KEY = "your_openai_api_key"
GROQ_API_KEY = "your_groq_api_key"
REPLICATE_API_TOKEN = "your_replicate_api_token"

Save the file. Keep these keys private: do not share them or commit them to a public repository.
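
If you want to sanity-check the keys before launching the app, something like the following sketch could help. It is not part of the repository: `check_keys`, `is_set`, and `PLACEHOLDERS` are hypothetical names, and the function only checks that values are present and are not the placeholder strings shown above. It cannot tell whether a key is actually valid.

```python
from typing import Mapping

# Placeholder values from the example above; treated as "not filled in".
PLACEHOLDERS = {"your_openai_api_key", "your_groq_api_key", "your_replicate_api_token"}

def is_set(secrets: Mapping[str, str], name: str) -> bool:
    """True if `name` has a non-empty value that is not one of the placeholders."""
    value = str(secrets.get(name, "")).strip()
    return bool(value) and value not in PLACEHOLDERS

def check_keys(secrets: Mapping[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the keys look filled in."""
    problems = []
    if not (is_set(secrets, "OPENAI_API_KEY") or is_set(secrets, "GROQ_API_KEY")):
        problems.append("Set OPENAI_API_KEY or GROQ_API_KEY (at least one is required).")
    if not is_set(secrets, "REPLICATE_API_TOKEN"):
        problems.append("REPLICATE_API_TOKEN is missing (needed for full functionality).")
    return problems

# Made-up values for illustration: Groq is filled in, Replicate is still a placeholder.
example = {"GROQ_API_KEY": "example-value", "REPLICATE_API_TOKEN": "your_replicate_api_token"}
for problem in check_keys(example):
    print(problem)
```

The function takes any mapping, so inside a Streamlit app you could presumably pass it the values loaded from secrets.toml.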

API Keys

You'll need API keys for either OpenAI or Groq, and Replicate (for full functionality).

Setting Up the Poetry Environment

The Codespace installs Poetry automatically as part of its configuration, so wait for the configuration to finish before continuing.
In the Codespace terminal, activate the Poetry environment:
```
cd multimodal_app
poetry shell
```

Running the Application

To run the Streamlit app:

Ensure you're in the multimodal_app directory and have activated the Poetry shell.
Run the following command:
```
streamlit run main.py
```
Click "Open in browser" when prompted to view the app.

Using the Application

The multimodal Gen AI app allows you to:

Record speech or type text input.
Transcribe speech to text.
Generate text responses based on your input.
Create audio versions of the text.
Generate images based on the content.
Create videos incorporating the generated content.

To use the app:

Click the record button to speak, or type your input.
Click "Transcribe" to convert speech to text (if applicable).
Choose to run all tasks concurrently or step-by-step.
Explore the generated text, audio, images, and videos.
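
To illustrate the difference between the concurrent and step-by-step modes, here is a minimal sketch using Python's standard `concurrent.futures` module. It is not the app's actual code: `make_audio` and `make_image` are hypothetical stand-ins for the real model calls, and the sketch assumes those two steps depend only on the generated text, so they can safely run side by side.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real model calls; each just labels its input.
def make_audio(text: str) -> str:
    return f"audio({text})"

def make_image(text: str) -> str:
    return f"image({text})"

TASKS = {"audio": make_audio, "image": make_image}

def run_step_by_step(text: str) -> dict[str, str]:
    """Run each task in turn, waiting for one to finish before starting the next."""
    return {name: task(text) for name, task in TASKS.items()}

def run_concurrently(text: str) -> dict[str, str]:
    """Submit every task at once, then wait for all of the results."""
    with ThreadPoolExecutor(max_workers=len(TASKS)) as pool:
        futures = {name: pool.submit(task, text) for name, task in TASKS.items()}
        return {name: future.result() for name, future in futures.items()}

text = "a short generated reply"
# Both modes produce the same outputs; only the scheduling differs.
assert run_step_by_step(text) == run_concurrently(text)
print(run_concurrently(text))
```

Threads are a reasonable fit when each task spends most of its time waiting on a remote API, which is why the concurrent mode can finish sooner than running the same calls one after another.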

Troubleshooting

If you encounter any issues:

Ensure all API keys are correctly entered in the secrets.toml file.
Check that you're in the correct directory (multimodal_app) when running commands.
Verify that all dependencies are installed by running poetry install if needed.
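
The checks above can be partly automated. The sketch below is an assumption-laden helper, not something shipped with the repository: `diagnose` is a hypothetical name, and it only looks for main.py, .streamlit/secrets.toml, and a `poetry` executable on PATH, based on the layout described in this README.

```python
import shutil
from pathlib import Path

def diagnose(app_dir: Path) -> list[str]:
    """Return a list of likely setup problems for the app directory `app_dir`."""
    issues = []
    if not (app_dir / "main.py").is_file():
        issues.append(f"main.py not found in {app_dir}; are you in the multimodal_app directory?")
    if not (app_dir / ".streamlit" / "secrets.toml").is_file():
        issues.append("Missing .streamlit/secrets.toml; add your API keys there.")
    if shutil.which("poetry") is None:
        issues.append("poetry is not on PATH; the Codespace may still be configuring.")
    return issues

# Check the current working directory and print anything that looks wrong.
for issue in diagnose(Path.cwd()):
    print("-", issue)
```

Run it from the directory where you intend to start the app. No output means none of these three checks failed, though it does not confirm that dependencies are installed or that the keys are valid.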

Contributing

We welcome contributions to improve this project! Please feel free to submit issues or pull requests.

Happy building! We hope you enjoy creating your first multimodal Gen AI app. If you have any questions or feedback, please don't hesitate to reach out.

Repository: hugobowne/first-multimodal-genAI-app
