Releases: huggingface/transformers.js
2.5.1
What's new?
- Add support for Llama/Llama2 models in #232
- Tokenization performance improvements in #234 (+ The Tokenizer Playground example app)
- Add support for DeBERTa/DeBERTa-v2 models in #244
- Documentation improvements for the zero-shot-classification pipeline (link); a short usage sketch is shown below
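For reference, a minimal zero-shot-classification sketch (the model id, labels, and output shown here are illustrative, not taken from these notes):

import { pipeline } from '@xenova/transformers';

// Load a zero-shot classification pipeline (example checkpoint)
let classifier = await pipeline('zero-shot-classification', 'Xenova/mobilebert-uncased-mnli');

// Classify a sentence against arbitrary candidate labels
let output = await classifier('I love making pizza at home', ['cooking', 'travel', 'dancing']);
// { sequence: 'I love making pizza at home', labels: [ 'cooking', ... ], scores: [ ... ] }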
Full Changelog: 2.5.0...2.5.1
2.5.0
What's new?
Support for computing CLIP image and text embeddings separately (#227)
You can now compute CLIP text and vision embeddings separately, allowing for faster inference when you only need to query one of the modalities. We've also released a demo application for semantic image search to showcase this functionality.
Example: Compute text embeddings with CLIPTextModelWithProjection.
import { AutoTokenizer, CLIPTextModelWithProjection } from '@xenova/transformers';
// Load tokenizer and text model
const tokenizer = await AutoTokenizer.from_pretrained('Xenova/clip-vit-base-patch16');
const text_model = await CLIPTextModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');
// Run tokenization
let texts = ['a photo of a car', 'a photo of a football match'];
let text_inputs = tokenizer(texts, { padding: true, truncation: true });
// Compute embeddings
const { text_embeds } = await text_model(text_inputs);
// Tensor {
// dims: [ 2, 512 ],
// type: 'float32',
// data: Float32Array(1024) [ ... ],
// size: 1024
// }
Example: Compute vision embeddings with CLIPVisionModelWithProjection.
import { AutoProcessor, CLIPVisionModelWithProjection, RawImage } from '@xenova/transformers';
// Load processor and vision model
const processor = await AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
const vision_model = await CLIPVisionModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');
// Read image and run processor
let image = await RawImage.read('https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/football-match.jpg');
let image_inputs = await processor(image);
// Compute embeddings
const { image_embeds } = await vision_model(image_inputs);
// Tensor {
// dims: [ 1, 512 ],
// type: 'float32',
// data: Float32Array(512) [ ... ],
// size: 512
// }
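With both sets of embeddings in hand, you can compare them directly, e.g. for semantic image search. A minimal sketch (the cosineSimilarity helper below is illustrative, not part of the library):

// text_embeds.data and image_embeds.data are flat Float32Arrays; each embedding is 512 values long.
function cosineSimilarity(a, b) {
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; ++i) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const dim = 512;
const carText = text_embeds.data.slice(0, dim);        // 'a photo of a car'
const footballText = text_embeds.data.slice(dim);      // 'a photo of a football match'
console.log(cosineSimilarity(carText, image_embeds.data));       // lower score expected
console.log(cosineSimilarity(footballText, image_embeds.data));  // higher score expected (the image is a football match)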
Improved browser extension example/template (#196)
We've updated the source code for our example browser extension, making the following improvements:
- Custom model caching - meaning you don't need to ship the weights of the model with the extension. In addition to a smaller bundle size, when the user updates, they won't need to redownload the weights!
- Use ES6 module syntax (vs. CommonJS) - much cleaner code! A rough sketch of what this looks like in the background service worker follows this list.
- Persistent service worker - fixed an issue where the service worker would go to sleep after a period of inactivity.
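For orientation, here is a rough sketch of an ES6-module background service worker (this is illustrative, not the actual extension source; the model, message shape, and bundling setup are placeholder assumptions):

// manifest.json must declare the background script as a module, e.g.
//   "background": { "service_worker": "background.js", "type": "module" }
// background.js (bundled before shipping, e.g. with webpack)
import { pipeline, env } from '@xenova/transformers';

// Always fetch models from the Hub (and cache them) rather than looking for local files
env.allowLocalModels = false;

// Lazily create the pipeline once and reuse it for every message
let classifier = null;
chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
    (async () => {
        classifier ??= await pipeline('text-classification', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english');
        sendResponse(await classifier(message.text));
    })();
    return true; // keep the message channel open for the async response
});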
Summary of updates since last minor release (2.4.0):
- (2.4.1) Improved documentation
- (2.4.2) Support for private/gated models (#202)
- (2.4.3) Example Next.js applications (#211) + MPNet model support (#221)
- (2.4.4) StarCoder models + example application (release; demo + source code)
Misc bug fixes and improvements
- Fixed floating-point-precision edge-case for resizing images
- Fixed RawImage.save()
- Fixed BPE tokenization for weird whitespace characters (#208)
2.4.4
What's new?
- New model: StarCoder (Xenova/starcoderbase-1b and Xenova/tiny_starcoder_py); a short text-generation sketch is shown below
- In-browser code completion example application (demo and source code)
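For reference, a minimal sketch of in-browser code completion with the smaller checkpoint via the text-generation pipeline (the prompt and generation options are illustrative):

import { pipeline } from '@xenova/transformers';

// Load the small Python-focused StarCoder checkpoint
let generator = await pipeline('text-generation', 'Xenova/tiny_starcoder_py');

// Ask the model to continue a Python snippet
let output = await generator('def fibonacci(n):', { max_new_tokens: 40 });
console.log(output[0].generated_text);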
Full Changelog: 2.4.3...2.4.4
2.4.3
What's new?
- Example Next.js applications in #211
  - Demo: client-side or server-side
  - Source code: client-side or server-side
Full Changelog: 2.4.2...2.4.3
2.4.2
What's new?
- Add support for private/gated model access by @xenova in #202
- Fix BPE tokenization for weird whitespace characters by @xenova in #208
- Thanks to @fozziethebeat for reporting and helping to debug
- Minor documentation improvements
Full Changelog: 2.4.1...2.4.2
2.4.1
What's new?
- Improved documentation
Full Changelog: 2.4.0...2.4.1
2.4.0
What's new?
Word-level timestamps for Whisper automatic-speech-recognition 🤯
This release adds the ability to predict word-level timestamps for our Whisper automatic-speech-recognition models by analyzing the cross-attentions and applying dynamic time warping. Our implementation is adapted from this PR, which added this functionality to the 🤗 transformers Python library.
Example usage: (see docs)
import { pipeline } from '@xenova/transformers';
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en', {
revision: 'output_attentions',
});
let output = await transcriber(url, { return_timestamps: 'word' });
// {
// "text": " And so my fellow Americans ask not what your country can do for you ask what you can do for your country.",
// "chunks": [
// { "text": " And", "timestamp": [0, 0.78] },
// { "text": " so", "timestamp": [0.78, 1.06] },
// { "text": " my", "timestamp": [1.06, 1.46] },
// ...
// { "text": " for", "timestamp": [9.72, 9.92] },
// { "text": " your", "timestamp": [9.92, 10.22] },
// { "text": " country.", "timestamp": [10.22, 13.5] }
// ]
// }
Note: For now, you need to choose the output_attentions revision (see above). In the future, we may merge these models into the main branch. Also, we currently do not have exports for the medium and large models, simply because I don't have enough RAM to do the export myself (>25GB needed) 😅 ... so, if you would like to use our conversion script to do the conversion yourself, please make a PR on the hub with these new models (under a new output_attentions branch)!
From our testing, the JS implementation exactly matches the output produced by the Python implementation (when using the same model of course)! 🥳
Python (left) vs. JavaScript (right)
I'm excited to see what you all build with this! Please tag me on Twitter if you use it in your project - I'd love to see! I'm also planning on adding this as an option to whisper-web, so stay tuned! 🚀
Misc bug fixes and improvements
- Fix loading of grayscale images in node.js (#178)
2.3.1
What's new?
New models and tokenizers
- Models:
  - MobileViT for image classification
  - Roberta for token classification (thanks @julien-c)
  - XLMRoberta for masked language modelling, sequence classification, token classification, and question answering (a short fill-mask sketch follows this list)
- Tokenizers: FalconTokenizer, GPTNeoXTokenizer
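As a quick sketch of masked language modelling with XLMRoberta (assuming a converted checkpoint such as Xenova/xlm-roberta-base is available on the hub; the prompt and output shape are illustrative):

import { pipeline } from '@xenova/transformers';

// Load a fill-mask pipeline with an XLM-RoBERTa checkpoint
let unmasker = await pipeline('fill-mask', 'Xenova/xlm-roberta-base');

// XLM-RoBERTa uses <mask> as its mask token
let output = await unmasker('The capital of France is <mask>.');
// [ { score: ..., token: ..., token_str: ..., sequence: ... }, ... ]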
Improved documentation
- Details on how to discover and share transformers.js models on the hub (link)
- Example text-generation code (link)
- Example image-classification code (link)
Misc bug fixes
- Fix conversion to grayscale (commit)
- Aligned .generate() function output with the original Python implementation
- Fixed issue with non-greedy samplers
- Use WASM SIMD on iOS != 16.4.x (thanks @lsb)
Full Changelog: 2.3.0...2.3.1
2.3.0
What's new?
Improved 🤗 Hub integration and model discoverability!
All Transformers.js-compatible models are now displayed with a super cool tag! To indicate your model is compatible with the library, simply add the "transformers.js" library tag in your README (example).
This also means you can now search for and filter these models by task!
For example,
- https://huggingface.co/models?library=transformers.js lists all Transformers.js models
- https://huggingface.co/models?library=transformers.js&pipeline_tag=feature-extraction lists all models which can be used in the feature-extraction pipeline!
And lastly, clicking the "Use in Transformers.js" button will show some sample code for how to use the model!
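For a feature-extraction model, that sample code looks roughly like this (the model id and pooling options below are typical defaults from the documentation, not specific to this release):

import { pipeline } from '@xenova/transformers';

// Load a feature-extraction pipeline
let extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Compute a mean-pooled, normalized sentence embedding
let output = await extractor('This is a simple test.', { pooling: 'mean', normalize: true });
// Tensor { dims: [ 1, 384 ], type: 'float32', data: Float32Array(384) [ ... ], size: 384 }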
Chroma 🤝 Transformers.js
You can now use all Transformers.js-compatible feature-extraction models for embeddings computation directly in Chroma! For example:
const { ChromaClient, TransformersEmbeddingFunction } = require('chromadb');
const client = new ChromaClient();

// Create the embedder. In this case, I just use the defaults, but you can change the model,
// quantization, revision, or add a progress callback, if desired.
const embedder = new TransformersEmbeddingFunction({ /* Configuration goes here */ });

const main = async () => {
    // Empties and completely resets the database.
    await client.reset();

    // Create the collection
    const collection = await client.createCollection({ name: "my_collection", embeddingFunction: embedder });

    // Add some data to the collection
    await collection.add({
        ids: ["id1", "id2", "id3"],
        metadatas: [{ "source": "my_source" }, { "source": "my_source" }, { "source": "my_source" }],
        documents: ["I love walking my dog", "This is another document", "This is a legal document"],
    });

    // Query the collection
    const results = await collection.query({
        nResults: 2,
        queryTexts: ["This is a query document"],
    });
    console.log(results);
    // {
    //     ids: [ [ 'id2', 'id3' ] ],
    //     embeddings: null,
    //     documents: [ [ 'This is another document', 'This is a legal document' ] ],
    //     metadatas: [ [ [Object], [Object] ] ],
    //     distances: [ [ 1.0109775066375732, 1.0756263732910156 ] ]
    // }
};
main();
Better alignment with the Python library for calling decoder-only models
You can now call decoder-only models loaded via AutoModel.from_pretrained(...):
import { AutoModel, AutoTokenizer } from '@xenova/transformers';
// Choose model to use
let model_id = "Xenova/gpt2";
// Load model and tokenizer
let tokenizer = await AutoTokenizer.from_pretrained(model_id);
let model = await AutoModel.from_pretrained(model_id);
// Tokenize text and call
let model_inputs = await tokenizer('Once upon a time');
let output = await model(model_inputs);
console.log(output);
// {
// logits: Tensor {
// dims: [ 1, 4, 50257 ],
// type: 'float32',
// data: Float32Array(201028) [
// -20.166624069213867, -19.662782669067383, -23.189680099487305,
// ...
// ],
// size: 201028
// },
// past_key_values: { ... }
// }
Examples for computing perplexity: #137 (comment)
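As a rough illustration of the idea (a minimal sketch, not the code from the linked comment; it assumes the tokenizer exposes the input ids as a BigInt64Array on the tensor's data field), perplexity is the exponential of the average negative log-likelihood the model assigns to each next token:

import { AutoModel, AutoTokenizer } from '@xenova/transformers';

let tokenizer = await AutoTokenizer.from_pretrained('Xenova/gpt2');
let model = await AutoModel.from_pretrained('Xenova/gpt2');

let model_inputs = await tokenizer('Once upon a time');
let { logits } = await model(model_inputs);     // dims: [ 1, seq_len, vocab_size ]
let input_ids = model_inputs.input_ids.data;    // token ids

const [, seq_len, vocab_size] = logits.dims;
let nll = 0;
for (let t = 0; t < seq_len - 1; ++t) {
    // Log-softmax over the logits at position t (numerically stable)
    const row = logits.data.subarray(t * vocab_size, (t + 1) * vocab_size);
    const max = row.reduce((a, b) => Math.max(a, b), -Infinity);
    let sum_exp = 0;
    for (const x of row) sum_exp += Math.exp(x - max);
    const log_sum_exp = max + Math.log(sum_exp);

    // Negative log-probability of the actual next token
    const next_token = Number(input_ids[t + 1]);
    nll += log_sum_exp - row[next_token];
}
const perplexity = Math.exp(nll / (seq_len - 1));
console.log(perplexity);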
More accurate quantization parameters for whisper models
We've updated the quantization parameters used for the pre-converted whisper models on the hub. You can test them out with whisper web! Thanks to @jozefchutka for reporting this issue.
Misc bug fixes and improvements
- Do not use spread operator to concatenate large arrays (#154)
- Set chunk timestamp to rounded time by @PushpenderSaini0 (#160)
2.2.0
What's new?
Multilingual speech recognition and translation w/ Whisper
You can now transcribe and translate speech for over 100 different languages, directly in your browser, with Whisper! Play around with our demo application here.
Example: Transcribe English.
import { pipeline } from '@xenova/transformers';

let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
let output = await transcriber(url);
// { text: " And so my fellow Americans ask not what your country can do for you, ask what you can do for your country." }
Example: Transcribe English w/ timestamps.
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/jfk.wav';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
let output = await transcriber(url, { return_timestamps: true });
// {
// text: " And so my fellow Americans ask not what your country can do for you, ask what you can do for your country."
// chunks: [
// { timestamp: [0, 8], text: " And so my fellow Americans ask not what your country can do for you" }
// { timestamp: [8, 11], text: " ask what you can do for your country." }
// ]
// }
Example: Transcribe French.
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/french-audio.mp3';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small');
let output = await transcriber(url, { language: 'french', task: 'transcribe' });
// { text: " J'adore, j'aime, je n'aime pas, je déteste." }
Example: Translate French to English.
let url = 'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/french-audio.mp3';
let transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-small');
let output = await transcriber(url, { language: 'french', task: 'translate' });
// { text: " I love, I like, I don't like, I hate." }
Misc
- Aligned .generate() function with the original Python implementation
- Minor improvements to documentation (+ some examples). More to come in the future.
Full Changelog: 2.1.1...2.2.0