Arcee.ai, a startup dedicated to creating small AI models for commercial and enterprise purposes, is releasing its AFM-4.5B model for limited free use by small companies — sharing the weights on Hugging Face and allowing businesses with annual revenues under $1.75 million to access it at no cost under a custom “Arcee Model License.”
Tailored for practical enterprise application, the 4.5-billion-parameter model — significantly smaller than the leading models with tens of billions to trillions of parameters — offers cost efficiency, regulatory compliance, and robust performance within a compact framework.
AFM-4.5B was one of two models Arcee released last month. It arrives already “instruction tuned” for chat, retrieval, and creative writing, making it immediately deployable for those enterprise use cases. The other model in the release was pre-trained only, leaving clients more room for customization. Until now, both models were available exclusively under commercial licensing terms.
Arcee’s chief technology officer (CTO) Lucas Atkins mentioned in a post on X that more “specialized models for reasoning and tool use are forthcoming.”
“Creating AFM-4.5B has been a significant team endeavor, and we’re immensely thankful to everyone who supported us. We can’t wait to see what you build with it,” he stated in another post. “We’re just beginning. If you have feedback or suggestions, please feel free to contact us anytime.”
The model is now available for deployment across various platforms — from cloud to smartphones to edge devices.
It also caters to Arcee’s expanding list of enterprise clients and their specific needs, in particular the demand for a model developed without infringing on intellectual property.
As Arcee mentioned in its initial AFM-4.5B announcement last month: “Substantial effort was made to exclude copyrighted books and materials with unclear licensing.”
Arcee collaborated with third-party data curation firm DatologyAI to apply methods like source mixing, embedding-based filtering, and quality control — all aimed at reducing hallucinations and IP risks.
Focused on Enterprise Customer Needs
AFM-4.5B is Arcee.ai’s answer to significant challenges in enterprise adoption of generative AI: high costs, limited customizability, and regulatory concerns surrounding proprietary large language models (LLMs).
Over the past year, the Arcee team engaged in discussions with over 150 organizations, from startups to Fortune 100 companies, to identify the limitations of existing LLMs and define their own model objectives.
According to the company, many businesses found mainstream LLMs — such as those from OpenAI, Anthropic, or DeepSeek — too costly and difficult to adapt to industry-specific needs. Meanwhile, while smaller open-weight models like Llama, Mistral, and Qwen offered more flexibility, they raised concerns regarding licensing, IP provenance, and geopolitical risk.
AFM-4.5B was developed as a “no-compromise” alternative: customizable, compliant, and cost-efficient without sacrificing model quality or usability.
AFM-4.5B is designed with deployment versatility in mind. It can run in cloud, on-premises, hybrid, or even edge environments, thanks to its efficiency and compatibility with open frameworks such as Hugging Face Transformers, llama.cpp, and (pending release) vLLM.
The model supports quantized formats, enabling it to run on lower-RAM GPUs or even CPUs, making it suitable for applications with limited resources.
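For teams evaluating the model, the snippet below is a minimal sketch of loading it through Hugging Face Transformers. The repo ID and generation settings are assumptions for illustration, not Arcee’s documented quickstart, so check the model page on Hugging Face for the exact identifier.

```python
# Minimal sketch: loading and prompting the model with Hugging Face Transformers.
# The repo ID below is an assumption based on this article; verify it on the
# Hugging Face model page before use.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/AFM-4.5B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place layers on GPU/CPU automatically (needs accelerate)
)

messages = [{"role": "user", "content": "Summarize our refund policy in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For tighter memory budgets, the same weights can be converted to quantized GGUF files and served through llama.cpp, which is how a 4.5-billion-parameter model ends up running on CPUs and edge hardware.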
Company Vision Secures Support
Arcee.ai’s broader strategy focuses on creating domain-adaptable small language models (SLMs) that can power numerous use cases within the same organization.
As CEO Mark McQuade explained in a VentureBeat interview last year, “You don’t need to go that big for business use cases.” The company emphasizes rapid iteration and model customization as core to its offerings.
This vision attracted investor support with a $24 million Series A round back in 2024.
Inside AFM-4.5B’s Architecture and Training Process
The AFM-4.5B model uses a decoder-only transformer architecture with several enhancements for performance and deployment flexibility.
It incorporates grouped query attention for faster inference and ReLU² activations instead of SwiGLU to support sparsification without reducing accuracy.
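For readers unfamiliar with ReLU², the sketch below shows what such a feed-forward block might look like in PyTorch. The layer names and dimensions are illustrative, not Arcee’s actual implementation; the point is that squaring a ReLU keeps activations exactly zero wherever the pre-activation is negative, which is what makes the block amenable to sparsification.

```python
# Illustrative PyTorch sketch of a ReLU^2 feed-forward block, not Arcee's code.
import torch
import torch.nn as nn

class ReLUSquaredMLP(nn.Module):
    """Transformer MLP using ReLU^2 in place of SwiGLU (hypothetical sizes)."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff, bias=False)    # expand
        self.down = nn.Linear(d_ff, d_model, bias=False)  # project back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.up(x))
        # Squaring preserves the exact zeros ReLU already produced,
        # so entire hidden units can be skipped at inference time.
        return self.down(h * h)

block = ReLUSquaredMLP(d_model=2048, d_ff=8192)  # made-up dimensions
print(block(torch.randn(1, 16, 2048)).shape)     # torch.Size([1, 16, 2048])
```

Unlike SwiGLU, which multiplies two dense branches and therefore yields few exact zeros, this formulation produces sparse hidden activations for free.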
Training followed a three-phase approach:
- Pretraining on 6.5 trillion tokens of general data
- Midtraining on 1.5 trillion tokens focusing on math and code
- Instruction tuning using high-quality instruction-following datasets and reinforcement learning with verifiable and preference-based feedback
To adhere to strict compliance and IP standards, the model was trained on roughly 8 trillion tokens of data curated for cleanliness and licensing safety.
A Competitive Model, But Not a Leader
Despite its smaller size, AFM-4.5B performs competitively across a wide range of benchmarks. The instruction-tuned version averages a score of 50.13 across evaluation suites such as MMLU, MixEval, TriviaQA, and AGIEval, matching or surpassing similar-sized models like Gemma-3 4B-it, Qwen3-4B, and SmolLM3-3B.
Multilingual testing indicates the model delivers strong performance across over 10 languages, including Arabic, Mandarin, German, and Portuguese.
According to Arcee, adding support for additional dialects is straightforward due to its modular architecture.
AFM-4.5B has also demonstrated strong early traction in public evaluation settings. In a leaderboard ranking conversational model quality by user votes and win rate, the model ranks third overall, behind only Claude Opus 4 and Gemini 2.5 Pro.
It boasts a win rate of 59.2% and the fastest latency of any top model at 0.2 seconds, paired with a generation speed of 179 tokens per second.
Built-in Support for Agents
In addition to general capabilities, AFM-4.5B includes built-in support for function calling and agentic reasoning.
These features aim to simplify the creation of AI agents and workflow automation tools, reducing the need for complex prompt engineering or orchestration layers.
This functionality aligns with Arcee’s broader strategy of enabling enterprises to build custom, production-ready models faster, with lower total cost of ownership (TCO) and easier integration into business operations.
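As a concrete (and hypothetical) illustration of what function calling looks like in practice, the sketch below sends a tool schema to an OpenAI-compatible endpoint of the kind vLLM can serve. The endpoint URL, served model name, and `get_invoice_status` tool are all invented for the example; the article does not specify AFM-4.5B’s exact function-calling format.

```python
# Hypothetical function-calling sketch against an OpenAI-compatible server
# (e.g., one hosted with vLLM). Endpoint, model name, and tool schema are
# assumptions for illustration only.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_invoice_status",  # invented business tool
        "description": "Look up the payment status of an invoice.",
        "parameters": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="afm-4.5b",  # assumed served model name
    messages=[{"role": "user", "content": "Has invoice INV-1042 been paid?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:  # the model chose to call a tool
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```

The promise of built-in tool use is that a small model emits well-formed calls like this directly, without an orchestration layer coaxing it into a schema.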
What’s Next for Arcee?
AFM-4.5B represents Arcee.ai’s effort to establish a new category of enterprise-ready language models: small, high-performing, and fully customizable, without the compromises that often accompany either proprietary LLMs or open-weight SLMs.
With competitive benchmarks, multilingual support, strong compliance standards, and flexible deployment options, the model aims to fulfill enterprise needs for speed, sovereignty, and scale.
Whether Arcee can secure a lasting position in the rapidly evolving generative AI landscape will depend on its ability to deliver on this promise. But with AFM-4.5B, the company has taken a confident first step.
Correction: This piece originally misspelled Arcee’s name in several places. We’ve since updated the article to correct it and regret the errors.
