melvincarvalho on Nostr:
Probably the craziest week in Open Source AI (yet):
1. Mistral (in collaboration with Nvidia) dropped the Apache 2.0 licensed NeMo 12B LLM, which beats Llama 3 8B and Gemma 2 9B. The models are multilingual, have a 128K context window, and ship a highly efficient tokenizer, Tekken (a quick tokenizer sketch follows the list).
2. Apple released DCLM 7B - a truly open source LLM, built on the OpenLM framework and trained on 2.5T tokens, scoring 63.72 on MMLU (better than Mistral 7B).
3. HF shared SmolLM - 135M, 360M, and 1.7B parameter LMs capable of running directly in the browser; they beat Qwen 1.5B, Phi 1.5B, and more, despite being trained on just 650B tokens (a generation sketch follows the list).
4. Groq put out Llama 3 8B & 70B tool use & function calling model checkpoints, achieving up to 90.76% accuracy on the Berkeley Function Calling Leaderboard (BFCL). They excel at API usage and structured data manipulation (a tool-use sketch follows the list)!
5. Salesforce released xLAM 1.35B & 7B Large Action Models along with a 60K-example instruction fine-tuning dataset. The 7B model scores 88.24% on BFCL and the 1.35B scores 78.94%.
6. DeepSeek changed the game with V2 Chat 0628 - the best open LLM on the LMSYS arena right now - a 236B parameter MoE model with 21B active parameters per token. It also excels at coding (rank #3) and hard prompts (rank #3).
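A quick way to kick the tires on the Tekken efficiency claim from item 1 is to count tokens on a few multilingual strings. A minimal sketch with Hugging Face transformers; the repo id mistralai/Mistral-Nemo-Instruct-2407 is my assumption for the released checkpoint (verify on the Hub, the repo may be gated):

```python
from transformers import AutoTokenizer

# Assumption: this is the NeMo 12B instruct repo id on the Hub (may be gated).
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")

# Tekken is claimed to compress non-English text well; count tokens to check.
for text in ["Hello, world!", "Bonjour tout le monde !", "こんにちは、世界！"]:
    ids = tok(text)["input_ids"]
    print(f"{len(ids):3d} tokens <- {text!r}")
```

Fewer tokens per string means more text fits in the 128K window and fewer decode steps per reply.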
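The SmolLM checkpoints from item 3 are small enough to smoke-test on a CPU (in the browser they run via transformers.js; the Python below is just the quickest way to poke at them). A minimal sketch, assuming HuggingFaceTB/SmolLM-135M is the smallest released repo id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: repo id for the smallest SmolLM checkpoint.
ckpt = "HuggingFaceTB/SmolLM-135M"
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt)  # ~135M params, CPU is fine

inputs = tok("Open source AI is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```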
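For the Groq tool-use checkpoints in item 4, recent transformers versions can turn a plain Python function into a tool schema and inject it into the chat template. A hedged sketch, not Groq's documented recipe: the repo id Groq/Llama-3-Groq-8B-Tool-Use and the get_weather stub are assumptions; the exact tool-call output format is defined by the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: repo id from Groq's HF org; check the model card before use.
model_id = "Groq/Llama-3-Groq-8B-Tool-Use"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def get_weather(city: str):
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny, 24C in {city}"  # hypothetical stub tool for the demo

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
# transformers parses the signature and docstring into a JSON schema and
# renders it into the chat template; the model then emits a structured call.
inputs = tok.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=128)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

In a real loop you would parse the emitted tool call, run the function, append the result as a tool message, and generate again for the final answer.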
There's a lot more: Arcee (the team behind mergekit) released a series of LLMs, each better than the last; Numina and HF put out Numina 72B (based on Qwen 2) along with math datasets; Mixedbread shipped embedding models (English + German); and plenty besides!
It'll be fun to see even more releases next week, with Llama 3 405B on the way.
Published at 2024-07-22 06:20:59