Unlearning

This folder contains implementations of machine unlearning methods for LLM360 models. Machine unlearning is a pre-deployment safety measure designed to remove hazardous knowledge from language models. The goal is a model that is safe by construction, because it lacks the knowledge that could be misused.

Overview

Here's a list of unlearning methods we have implemented so far.

Directory Structure

unlearn.py is the main entry point for running unlearning methods. It uses Python modules from the methods/ and utils/ folders.

The methods/ folder contains the implementations for unlearning methods:

  • training.py: All training loop implementations
  • utils.py: Loss functions and other method-related utils

The utils/ folder contains helper functions for model/dataset IO:

  • data_utils.py: Dataloader for text datasets
  • model_utils.py: Model IO utils

By default, unlearned models are saved to the models/ folder. Please store all training datasets in the data/ folder.
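As a convenience, the layout described above can be checked before a run. The snippet below is a minimal sketch, not part of the repo: the helper name prepare_workspace is hypothetical, and it only assumes the four folder names mentioned in this README.

```python
from pathlib import Path

# Folder names are taken from this README; the helper itself is hypothetical.
EXPECTED_DIRS = ("methods", "utils", "models", "data")


def prepare_workspace(root="."):
    """Create models/ and data/ if missing and report which expected folders exist."""
    root = Path(root)
    # models/ (outputs) and data/ (training sets) are safe to create on demand.
    for name in ("models", "data"):
        (root / name).mkdir(parents=True, exist_ok=True)
    # methods/ and utils/ ship with the repo, so they are only checked, never created.
    return {name: (root / name).is_dir() for name in EXPECTED_DIRS}


if __name__ == "__main__":
    for folder, present in prepare_workspace().items():
        print(f"{folder}/: {'ok' if present else 'missing'}")
```

Running it from the unlearning folder should report all four folders as present; a missing methods/ or utils/ usually means the script was started from the wrong directory.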

Note

This project uses the bio-forget-corpus from the WMDP Benchmark for unlearning training. Access to this dataset requires a separate request. Please follow the instructions provided here to obtain the necessary permissions. By default, the dataloader is configured to load the dataset from data/bio_forget.jsonl.
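Once access has been granted, a quick sanity check can confirm that the file is in the expected place and parses as JSON Lines. This is a minimal sketch under stated assumptions: load_jsonl is a hypothetical helper, not the dataloader in utils/data_utils.py, and it makes no assumption about the field names inside each record.

```python
import json
from pathlib import Path


def load_jsonl(path="data/bio_forget.jsonl"):
    """Read a JSON Lines file into a list of parsed records, skipping blank lines."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; request access to the dataset and place it under data/"
        )
    records = []
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate trailing or separating blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no} is not valid JSON") from err
    return records


if __name__ == "__main__":
    docs = load_jsonl()
    print(f"loaded {len(docs)} records")
```

Failing early with a clear message here is cheaper than discovering a missing or truncated dataset partway through a training run.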

Installation

  1. Clone and enter the repo:
    git clone https://github.com/LLM360/Analysis360.git
    cd Analysis360/analysis/unlearning
  2. Install dependencies:
    pip install -r requirements.txt
  3. To install lm-eval, please check the installation instructions in the metrics/harness folder.

Quick Start

Training and Evaluation

An example is provided in demo.ipynb, which can be run on a single A100 80GB GPU.