Automatic Depression Detection Using An Interpretable Audio-textual Multi-modal Transformer-based Model
This repository contains the implementation for a multi-modal depression detection model that combines audio and textual data using a Transformer-based architecture. The model is designed to detect depression levels in subjects based on their speech recordings and corresponding transcriptions. The approach leverages interpretability techniques to analyze attention mechanisms within the model.
- Multi-modal integration: Combines audio and text data for enhanced depression detection.
- Transformer-based architecture: Uses BERT embeddings for text and a custom Transformer encoder for audio.
- Interpretable design: Visualizes attention weights to provide insights into model decision-making.
A detailed report of the project, including the methodology, experiments, and results, is available here.
The model uses the DAIC-WOZ dataset, which includes:
- Audio embeddings for each sentence (256-dimensional vectors).
- Transcriptions of participant responses.
For privacy reasons we cannot redistribute the dataset, but access can be requested [here](https://dcapswoz.ict.usc.edu/).
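Transcripts and audio embeddings are paired at the sentence level. The snippet below is a minimal sketch of how one participant's data might be assembled; the file names, column layout, and embedding file format are illustrative assumptions rather than the repository's exact pipeline.

```python
# Sketch of assembling one participant's text + audio inputs.
# File names and columns are assumptions for illustration only.
import numpy as np
import pandas as pd
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# DAIC-WOZ transcripts are tab-separated, with speaker/value columns.
transcript = pd.read_csv("300_TRANSCRIPT.csv", sep="\t")
participant_turns = transcript[transcript["speaker"] == "Participant"]["value"].tolist()

# Hypothetical file holding one 256-dim audio embedding per participant sentence.
audio_embeddings = np.load("300_audio_embeddings.npy")  # shape: (num_sentences, 256)
assert audio_embeddings.shape[0] == len(participant_turns)

# Tokenize each sentence for the text branch; each row pairs with one audio vector.
encoded = tokenizer(participant_turns, padding=True, truncation=True,
                    max_length=64, return_tensors="pt")
```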
The repository includes:
- Data preprocessing:
  - Text preprocessing with the BERT tokenizer.
  - Audio embeddings processed for each sentence.
- Model architecture:
  - A Transformer-based multi-modal model integrating both audio and text features.
  - Separate encoders for audio and text, with a shared fully connected classification layer.
- Training and evaluation pipeline (a minimal sketch follows this list):
  - Cross-entropy loss and the AdamW optimizer.
  - Metrics: accuracy and loss tracking during training and evaluation.
- Interpretability:
  - Visualization of attention weights for insights into model focus during predictions.
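As a concrete reference for the training and evaluation item above, here is a minimal sketch of a combined training/evaluation pass using cross-entropy loss and AdamW. The batch layout and the assumption that the model returns a `(logits, attention_weights)` pair are illustrative, not the repository's exact interface.

```python
# Minimal train/eval loop sketch: cross-entropy loss, AdamW, accuracy tracking.
# The model interface and batch layout are assumptions for illustration.
import torch
from torch import nn
from torch.optim import AdamW

def run_epoch(model, loader, optimizer=None, device="cpu"):
    """One pass over the data; trains if an optimizer is given, else evaluates."""
    training = optimizer is not None
    model.train() if training else model.eval()
    criterion = nn.CrossEntropyLoss()
    total_loss, correct, seen = 0.0, 0, 0

    with torch.set_grad_enabled(training):
        for input_ids, attention_mask, audio_feats, labels in loader:
            input_ids, attention_mask = input_ids.to(device), attention_mask.to(device)
            audio_feats, labels = audio_feats.to(device), labels.to(device)

            logits, _ = model(input_ids, attention_mask, audio_feats)
            loss = criterion(logits, labels)

            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            total_loss += loss.item() * labels.size(0)
            correct += (logits.argmax(dim=-1) == labels).sum().item()
            seen += labels.size(0)

    return total_loss / seen, correct / seen

# Usage (placeholders for the actual model and dataloaders):
# optimizer = AdamW(model.parameters(), lr=2e-5)
# train_loss, train_acc = run_epoch(model, train_loader, optimizer)
# val_loss, val_acc = run_epoch(model, val_loader)
```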
Below is an overview of the model architecture used in this project:
The model integrates textual and audio modalities using separate Transformer-based encoders, followed by a shared classification layer. Attention mechanisms are leveraged for interpretability.
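As a rough illustration of this design, the sketch below wires a pretrained BERT encoder for text to a small `nn.TransformerEncoder` over the 256-dimensional sentence-level audio embeddings, and fuses the two with a shared fully connected classification head. The concatenation-based fusion, layer counts, and pooling choices are assumptions, not the exact published architecture.

```python
# Sketch of the multi-modal architecture: BERT text encoder + Transformer audio
# encoder + shared linear classifier. Hyperparameters are illustrative.
import torch
from torch import nn
from transformers import BertModel

class MultiModalDepressionModel(nn.Module):
    def __init__(self, audio_dim=256, num_classes=2, num_audio_layers=2, num_heads=4):
        super().__init__()
        # Text branch: pretrained BERT, [CLS] representation used as the text summary.
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        # Audio branch: Transformer encoder over the sequence of sentence embeddings.
        audio_layer = nn.TransformerEncoderLayer(
            d_model=audio_dim, nhead=num_heads, batch_first=True)
        self.audio_encoder = nn.TransformerEncoder(audio_layer, num_layers=num_audio_layers)
        # Shared classification head over the fused (concatenated) representation.
        self.classifier = nn.Linear(
            self.text_encoder.config.hidden_size + audio_dim, num_classes)

    def forward(self, input_ids, attention_mask, audio_feats):
        text_out = self.text_encoder(input_ids=input_ids,
                                     attention_mask=attention_mask,
                                     output_attentions=True)
        text_repr = text_out.last_hidden_state[:, 0]               # [CLS] token
        audio_repr = self.audio_encoder(audio_feats).mean(dim=1)   # mean-pool sentences
        fused = torch.cat([text_repr, audio_repr], dim=-1)
        logits = self.classifier(fused)
        # Attention maps are returned so they can be visualized for interpretability.
        return logits, text_out.attentions
```

Because BERT is called with `output_attentions=True`, the returned per-layer attention maps can be plotted directly (e.g., as heatmaps over the input tokens) to support the interpretability analysis described above.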