
license: apache-2.0
language: multilingual
library_name: transformers
pipeline_tag: text-generation
tags: mamba, pythia, rwkv
datasets: EleutherAI/pile, cerebras/SlimPajama-627B, oscar-corpus/oscar, togethercomputer/RedPajama-Data-V2, tiiuae/falcon-refinedweb, bigcode/the-stack-dedup, bigcode/the-stack-v2-dedup, OpenCoder-LLM/fineweb-code-corpus, codeparrot/github-code-clean, opencsg/chinese-fineweb-edu, opencsg/chinese-fineweb-edu-v2, HuggingFaceFW/fineweb, HuggingFaceFW/fineweb-edu, OpenCoder-LLM/fineweb-math-corpus, allenai/dolma
base_model: state-spaces/mamba-370m-hf, state-spaces/mamba-1.4b-hf, TRI-ML/mamba-7b-rw, RWKV/rwkv-4-430m-pile, RWKV/rwkv-4-1b5-pile, RWKV/rwkv-4-7b-pile, EleutherAI/pythia-410m-deduped, EleutherAI/pythia-1b-deduped, EleutherAI/pythia-1.4b-deduped, EleutherAI/pythia-6.9b-deduped, EleutherAI/pythia-14m, EleutherAI/pythia-31m

DELVE: Diminutive Experts Leverage Voluminous Expansion

Table of Contents

  • Model Details
  • Uses
  • Bias, Risks, and Limitations
  • Training Details
  • Evaluation
  • Model Examination
  • Technical Specifications
  • Citation
  • Glossary
  • More Information
  • Model Card Authors
  • Model Card Contact
  • How to Get Started with the Model

Model Details

Model Description

  • Developed by: 🧪🖌️🪄
  • Model type: Language model
  • Language(s) (NLP): Multilingual
  • License: Apache 2.0
  • Parent Model: Mamba v1, RWKV v4, Pythia
  • Resources for more information: More information needed

Uses

Direct Use

Downstream Use [Optional]

If you're fine-tuning with, e.g., LoRA, note that the linear modules from Mamba, RWKV v4, and Pythia have already been approximated as low-rank submodules (the biggest... well, factor... in making DELVE so small).
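
For intuition, here is a minimal sketch of that kind of factorization: a dense linear layer replaced by a pair of smaller linears via truncated SVD. The helper name and rank are illustrative only; this is not DELVE's actual pipeline.

```python
# Minimal sketch (not DELVE's actual pipeline): replace a dense nn.Linear with a
# rank-r pair of linears via truncated SVD, so W x ~= (U S)(Vh x).
import torch
import torch.nn as nn

def low_rank_factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    W = linear.weight.data                               # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)  # W = U diag(S) Vh
    U, S, Vh = U[:, :rank], S[:rank], Vh[:rank, :]       # keep the top-`rank` singular triplets

    down = nn.Linear(linear.in_features, rank, bias=False)                   # applies Vh
    up = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)  # applies U diag(S)
    down.weight.data.copy_(Vh)
    up.weight.data.copy_(U * S)                          # scale each column of U by its singular value
    if linear.bias is not None:
        up.bias.data.copy_(linear.bias.data)
    return nn.Sequential(down, up)

# e.g. shrink a 2048x2048 projection (~4.2M params) to rank 128 (~0.5M params):
# module.proj = low_rank_factorize(module.proj, rank=128)
```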

Out-of-Scope Use

I neither know nor care whether or not this model will make a good spicy waifu.

Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

Recommendations

Hopefully the model won't be a douche... It is trained on the internet though, so...

Training Details

Training Data

More information on training data needed

Training Procedure

Preprocessing

More information needed

Speeds, Sizes, Times

More information needed

Evaluation

Testing Data, Factors & Metrics

Testing Data

More information needed

Factors

More information needed

Metrics

More information needed

Results

More information needed

Model Examination

More information needed

Technical Specifications [optional]

Model Architecture and Objective

Autoregressive hybrid of the Mamba v1 SSM, the RWKV v4 RNN, and two decoder-only Transformer architectures (Pythia and RedPajama-INCITE), all of which use the GPT-NeoX-20B tokenizer.
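
Because every parent checkpoint shares that tokenizer, a single GPT-NeoX-20B tokenizer instance covers the whole hybrid. A small illustration; the repo id below is EleutherAI's tokenizer, not a DELVE release:

```python
# All parent models (Mamba, RWKV v4, Pythia, RedPajama-INCITE) use the
# GPT-NeoX-20B tokenizer, so one instance serves the combined model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
print(tokenizer("Diminutive experts leverage voluminous expansion.").input_ids)
```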

A single combined model, upcycled from these individual pretrained models after each has gone through SVD low-rank approximation for extreme parameter reduction.
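
For intuition only, a hypothetical sketch of that upcycling pass, reusing the low_rank_factorize helper sketched above: the parent repo ids are real checkpoints from the metadata, but the rank, size threshold, and merge step are illustrative assumptions, not the actual DELVE pipeline.

```python
# Hypothetical upcycling sketch, NOT the actual DELVE script.
# Assumes low_rank_factorize() from the earlier sketch is in scope.
import torch.nn as nn
from transformers import AutoModelForCausalLM

PARENTS = [
    "state-spaces/mamba-370m-hf",      # Mamba v1 SSM
    "RWKV/rwkv-4-430m-pile",           # RWKV v4 RNN
    "EleutherAI/pythia-410m-deduped",  # GPT-NeoX-style Transformer
]

def shrink_linears(model: nn.Module, rank: int = 128, min_dim: int = 512) -> nn.Module:
    """Replace every sufficiently large nn.Linear with its rank-`rank` SVD approximation."""
    for module in list(model.modules()):
        for child_name, child in list(module.named_children()):
            if isinstance(child, nn.Linear) and min(child.in_features, child.out_features) > min_dim:
                setattr(module, child_name, low_rank_factorize(child, rank))
    return model

experts = [shrink_linears(AutoModelForCausalLM.from_pretrained(repo)) for repo in PARENTS]
# Merging the shrunken experts into a single DELVE checkpoint is not shown here.
```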

Compute Infrastructure

More information needed

Hardware

More information needed

Software

More information needed

Citation

BibTeX:

More information needed

APA:

More information needed

Glossary [optional]

More information needed

More Information [optional]

More information needed

Model Card Authors [optional]

🤗: )))?!?(((

Model Card Contact

🤗: )))?!?(((

🤗: 🧪🖌️🪄

🦋: 🧪🖌️🪄

How to Get Started with the Model

Use the code below to get started with the model.


More information needed
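
In the meantime, a minimal, hypothetical sketch assuming the weights are published as a transformers-compatible checkpoint on the Hugging Face Hub. The repo id below simply mirrors the GitHub repo name and is a placeholder, not a confirmed release; a custom hybrid architecture may also require trust_remote_code=True.

```python
# Hypothetical usage sketch: "ScienceArtMagic/DELVE" is a placeholder repo id
# (it mirrors the GitHub repo name) and is NOT a confirmed Hugging Face release.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ScienceArtMagic/DELVE"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Diminutive experts", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```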
