
license: apache-2.0
language: multilingual
library_name: transformers
pipeline_tag: text-generation
tags: mamba, pythia, rwkv
datasets: EleutherAI/pile, cerebras/SlimPajama-627B, oscar-corpus/oscar, togethercomputer/RedPajama-Data-V2, tiiuae/falcon-refinedweb, bigcode/the-stack-dedup, bigcode/the-stack-v2-dedup, OpenCoder-LLM/fineweb-code-corpus, codeparrot/github-code-clean, opencsg/chinese-fineweb-edu, opencsg/chinese-fineweb-edu-v2, HuggingFaceFW/fineweb, HuggingFaceFW/fineweb-edu, OpenCoder-LLM/fineweb-math-corpus, allenai/dolma
base_model: state-spaces/mamba-370m-hf, state-spaces/mamba-1.4b-hf, TRI-ML/mamba-7b-rw, RWKV/rwkv-4-430m-pile, RWKV/rwkv-4-1b5-pile, RWKV/rwkv-4-7b-pile, EleutherAI/pythia-410m-deduped, EleutherAI/pythia-1b-deduped, EleutherAI/pythia-1.4b-deduped, EleutherAI/pythia-6.9b-deduped, EleutherAI/pythia-14m, EleutherAI/pythia-31m

DELVE: Diminutive Experts Leverage Voluminous Expansion

Table of Contents

  • Model Details
  • Uses
  • Bias, Risks, and Limitations
  • Training Details
  • Evaluation
  • Model Examination
  • Technical Specifications
  • Citation
  • Glossary
  • More Information
  • Model Card Authors
  • Model Card Contact
  • How to Get Started with the Model

Model Details

Model Description

  • Developed by: 🧪🖌️🪄
  • Model type: Language model
  • Language(s) (NLP): Multilingual
  • License: Apache 2.0
  • Parent Model: Mamba v1, RWKV v4, Pythia
  • Resources for more information: More information needed

Uses

Direct Use

Downstream Use [Optional]

If you're fine-tuning with, e.g., LoRA, note that the linear modules from Mamba, RWKV v4, and Pythia have already been approximated as low-rank submodules (the biggest... well, factor... in making DELVE so small).
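
For intuition, here is a minimal sketch of that kind of factorization: a dense linear layer replaced by a pair of smaller linears via truncated SVD. The helper name and rank are illustrative only; this is not DELVE's actual pipeline.

```python
# Minimal sketch (not DELVE's actual pipeline): replace a dense nn.Linear with a
# rank-r pair of linears via truncated SVD, so W x ~= (U S)(Vh x).
import torch
import torch.nn as nn

def low_rank_factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    W = linear.weight.data                               # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)  # W = U diag(S) Vh
    U, S, Vh = U[:, :rank], S[:rank], Vh[:rank, :]       # keep the top-`rank` singular triplets

    down = nn.Linear(linear.in_features, rank, bias=False)                   # applies Vh
    up = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)  # applies U diag(S)
    down.weight.data.copy_(Vh)
    up.weight.data.copy_(U * S)                          # scale each column of U by its singular value
    if linear.bias is not None:
        up.bias.data.copy_(linear.bias.data)
    return nn.Sequential(down, up)

# e.g. shrink a 2048x2048 projection (~4.2M params) to rank 128 (~0.5M params):
# module.proj = low_rank_factorize(module.proj, rank=128)
```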

Out-of-Scope Use

I neither know nor care whether or not this model will make a good spicy waifu.

Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

Recommendations

Hopefully the model won't be a douche... It is trained on the internet though, so...

Training Details

Training Data

More information on training data needed

Training Procedure

Preprocessing

More information needed

Speeds, Sizes, Times

More information needed

Evaluation

Testing Data, Factors & Metrics

Testing Data

More information needed

Factors

More information needed

Metrics

More information needed

Results

More information needed

Model Examination

More information needed

Technical Specifications [optional]

Model Architecture and Objective

Autoregressive hybrid of the Mamba v1 SSM, the RWKV v4 RNN, and two decoder-only Transformer architectures (Pythia and RedPajama-INCITE), all of which use the GPT-NeoX-20B tokenizer.
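
Because every parent checkpoint shares that tokenizer, a single GPT-NeoX-20B tokenizer instance covers the whole hybrid. A small illustration; the repo id below is EleutherAI's tokenizer, not a DELVE release:

```python
# All parent models (Mamba, RWKV v4, Pythia, RedPajama-INCITE) use the
# GPT-NeoX-20B tokenizer, so one instance serves the combined model.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
print(tokenizer("Diminutive experts leverage voluminous expansion.").input_ids)
```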

A single combined model, upcycled from these individual pretrained models after each has gone through SVD low-rank approximation for extreme parameter reduction.
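
For intuition only, a hypothetical sketch of that upcycling pass, reusing the low_rank_factorize helper sketched above: the parent repo ids are real checkpoints from the metadata, but the rank, size threshold, and merge step are illustrative assumptions, not the actual DELVE pipeline.

```python
# Hypothetical upcycling sketch, NOT the actual DELVE script.
# Assumes low_rank_factorize() from the earlier sketch is in scope.
import torch.nn as nn
from transformers import AutoModelForCausalLM

PARENTS = [
    "state-spaces/mamba-370m-hf",      # Mamba v1 SSM
    "RWKV/rwkv-4-430m-pile",           # RWKV v4 RNN
    "EleutherAI/pythia-410m-deduped",  # GPT-NeoX-style Transformer
]

def shrink_linears(model: nn.Module, rank: int = 128, min_dim: int = 512) -> nn.Module:
    """Replace every sufficiently large nn.Linear with its rank-`rank` SVD approximation."""
    for module in list(model.modules()):
        for child_name, child in list(module.named_children()):
            if isinstance(child, nn.Linear) and min(child.in_features, child.out_features) > min_dim:
                setattr(module, child_name, low_rank_factorize(child, rank))
    return model

experts = [shrink_linears(AutoModelForCausalLM.from_pretrained(repo)) for repo in PARENTS]
# Merging the shrunken experts into a single DELVE checkpoint is not shown here.
```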

Compute Infrastructure

More information needed

Hardware

More information needed

Software

More information needed

Citation

BibTeX:

More information needed

APA:

More information needed

Glossary [optional]

More information needed

More Information [optional]

More information needed

Model Card Authors [optional]

🤗: )))?!?(((

Model Card Contact

🤗: )))?!?(((

🤗: 🧪🖌️🪄

🦋: 🧪🖌️🪄

How to Get Started with the Model

Use the code below to get started with the model.


More information needed
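
In the meantime, a minimal, hypothetical sketch assuming the weights are published as a transformers-compatible checkpoint on the Hugging Face Hub. The repo id below simply mirrors the GitHub repo name and is a placeholder, not a confirmed release; a custom hybrid architecture may also require trust_remote_code=True.

```python
# Hypothetical usage sketch: "ScienceArtMagic/DELVE" is a placeholder repo id
# (it mirrors the GitHub repo name) and is NOT a confirmed Hugging Face release.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ScienceArtMagic/DELVE"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Diminutive experts", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```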
