Description from the site:
Mistral AI team is proud to release Mistral 7B, the most powerful language model for its size to date.
Mistral 7B in short
Mistral 7B is a 7.3B parameter model that:
Outperforms Llama 2 13B on all benchmarks
Outperforms Llama 1 34B on many benchmarks
Approaches CodeLlama 7B performance on code, while remaining good at English tasks
Uses Grouped-query attention (GQA) for faster inference
Uses Sliding Window Attention (SWA) to handle longer sequences at smaller cost
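To make the two attention tricks in that list a bit more concrete, here is a toy sketch (my own illustration, not Mistral's reference implementation). Sliding window attention is shown as a mask where each token can attend only to itself and the previous window - 1 tokens. Grouped-query attention is shown as a mapping where several query heads share one key/value head. The function names and the small sizes (5 tokens, window 3, 8 query heads, 2 KV heads) are hypothetical and chosen only for readability. They are not Mistral 7B's actual configuration.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True if query position i may attend to key position j.

    Causal (j <= i) and limited to the most recent `window` positions
    (i - j < window), so each row has at most `window` True entries.
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: consecutive groups of query heads share one K/V head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    # Hypothetical toy sizes, not the real model config.
    mask = sliding_window_mask(seq_len=5, window=3)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))

    # Attended (query, key) pairs grow roughly as seq_len * window
    # instead of seq_len**2 for full causal attention.
    print("attended pairs:", sum(map(sum, mask)))

    # 8 query heads sharing 2 K/V heads -> only 2 heads' worth of K/V to cache.
    print([kv_head_for_query_head(h, n_q_heads=8, n_kv_heads=2) for h in range(8)])
```

The point of both tricks is cost. With the window, each token looks at a bounded number of earlier tokens. With grouping, the key/value cache only has to hold n_kv_heads heads instead of one per query head, which is where the faster inference comes from.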
We’re releasing Mistral 7B under the Apache 2.0 license; it can be used without restrictions.
Download it and use it anywhere (including locally) with our reference implementation
Deploy it on any cloud (AWS/GCP/Azure), using vLLM inference server and skypilot
Use it on HuggingFace
Mistral 7B is easy to fine-tune on any task. As a demonstration, we’re providing a model fine-tuned for chat, which outperforms Llama 2 13B chat.
It’s not clear to me either exactly what hardware the reference implementation needs, but there’s a lot of discussion in the HN thread about getting it working with llama.cpp. So it may soon be possible (or maybe it already is) to run it on the CPU, if you’re willing to wait longer for it to process.
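As a rough back-of-the-envelope check on the hardware question, you can estimate how much memory the weights alone take from the 7.3B parameter count in the announcement. This is only a sketch under simple assumptions: it counts raw weights at a fixed number of bytes per parameter. It ignores the KV cache, activations, and runtime overhead. It also ignores the extra per-block scale data that quantized formats (like the ones llama.cpp uses) store, so real files and real RAM/VRAM usage will be somewhat higher.

```python
PARAMS = 7.3e9  # parameter count quoted in the announcement

# Assumed bytes per parameter for a few common weight formats (weights only).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, nbytes in BYTES_PER_PARAM.items():
        print(f"{fmt:>10}: ~{weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

By this estimate, half-precision weights need roughly 14.6 GB, and a 4-bit quantized version is in the 3 to 4 GB range before overhead. That is why CPU inference with a quantized model looks plausible on an ordinary machine with enough RAM, just slower.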
Let us know how it goes!