Alibaba’s Qwen AI development team has released the Qwen3.5 Medium Model series, four new large language models (LLMs) with support for agentic tool calling. Three of the models (Qwen3.5-35B-A3B, Qwen3.5-122B-A10B, and Qwen3.5-27B) are available for commercial use by enterprises and independent developers under the permissive Apache 2.0 open source license. The fourth, Qwen3.5-Flash, is proprietary and available only through the Alibaba Cloud Model Studio API.

On third-party benchmarks, the open source models perform comparably to similarly sized proprietary models from major U.S. companies such as OpenAI and Anthropic, beating OpenAI’s GPT-5-mini and Anthropic’s Claude Sonnet 4.5 on some tests. The Qwen team has engineered these models to remain highly accurate even when “quantized,” a process that shrinks a model’s memory footprint by storing its parameters at lower numerical precision (for example, in 4-bit rather than 16-bit values). The flagship Qwen3.5-35B-A3B can now exceed a 1-million-token context length on consumer-grade GPUs with 32GB of VRAM. Its technical specifications reveal a highly efficient design: a hybrid architecture that combines Gated Delta Networks with a sparse Mixture-of-Experts (MoE) system.
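The idea behind quantization can be shown with a minimal sketch. The toy example below (pure Python, not the scheme the Qwen team actually uses) maps floating-point weights to 8-bit integers with a single scale factor and back, illustrating why model behavior can survive the reduced precision:

```python
# Toy symmetric int8 quantization -- an illustration of the concept,
# not Qwen's actual quantization pipeline.

def quantize(weights, bits=8):
    """Map floats to signed integers using one scale factor per tensor."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8
    scale = max(abs(w) for w in weights) / qmax      # step size between levels
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integer representation."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)                          # small integers in [-127, 127]
print(max_err <= scale / 2)       # rounding error bounded by half a step
```

Storing each weight as one byte plus a shared scale, rather than a 16- or 32-bit float, is what cuts the memory footprint; production schemes refine this with per-channel scales and lower bit widths.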

The Qwen3.5 Medium Models bring “frontier-level” context windows to the desktop PC, allowing developers to process massive datasets without server-grade infrastructure. The models are tailored to different hardware tiers: Qwen3.5-27B is optimized for high efficiency and supports a context length of over 800K tokens, while Qwen3.5-122B-A10B is designed for server-grade GPUs (80GB VRAM) and supports context lengths of 1M+ tokens. The Alibaba Cloud Model Studio API prices Qwen3.5-Flash competitively: $0.10 per 1M input tokens, $0.40 per 1M output tokens, and $0.125 per 1M tokens for cache creation. That pricing makes Qwen3.5-Flash one of the most affordable major LLMs to run over an API.
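At those rates, estimating an API bill is simple arithmetic. The helper below uses only the prices quoted above; the usage figures in the example are made up for illustration:

```python
# Estimate Qwen3.5-Flash API cost from the published per-million-token rates.
PRICES_PER_1M = {
    "input": 0.10,           # USD per 1M input tokens
    "output": 0.40,          # USD per 1M output tokens
    "cache_creation": 0.125, # USD per 1M tokens written to the cache
}

def estimate_cost(tokens: dict) -> float:
    """Sum cost over each token category, pro-rated from per-1M prices."""
    return sum(PRICES_PER_1M[k] * n / 1_000_000 for k, n in tokens.items())

# Hypothetical workload: 2M input tokens, 500K output, 1M cached.
usage = {"input": 2_000_000, "output": 500_000, "cache_creation": 1_000_000}
print(f"${estimate_cost(usage):.3f}")  # 0.20 + 0.20 + 0.125 = $0.525
```

Even a workload in the millions of tokens lands at well under a dollar, which is the basis of the affordability claim.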

The launch of the Qwen3.5 Medium Models has significant implications for enterprise technical leaders and decision-makers. With the ability to run these specialized “Mixture-of-Experts” models within a private firewall, organizations can maintain sovereign control over their data while utilizing native “thinking” modes and official tool-calling capabilities to build more reliable, autonomous agents. The shift toward architectural efficiency over raw scale ensures that AI integration remains cost-conscious, secure, and agile enough to keep pace with evolving operational needs. As a result, the rapid iteration and fine-tuning once reserved for well-funded labs is now accessible for on-premise development at many non-technical firms, effectively decoupling sophisticated AI from massive capital expenditure.
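The tool-calling capability mentioned above generally follows the JSON function-schema convention that most open models have adopted. The sketch below shows what a tool definition in that common format might look like; the tool itself (`get_inventory`) is hypothetical, and the exact schema Qwen expects should be confirmed against its documentation:

```python
import json

# Sketch of a tool definition in the OpenAI-style function-calling schema
# widely adopted by open models. "get_inventory" is a hypothetical tool.

def make_tool(name: str, description: str, params: dict, required: list) -> dict:
    """Build one tool entry in the common function-calling schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": params,
                "required": required,
            },
        },
    }

tools = [make_tool(
    "get_inventory",
    "Look up current stock for a product SKU.",
    {"sku": {"type": "string", "description": "Product SKU to query"}},
    ["sku"],
)]
print(json.dumps(tools, indent=2))  # passed as the `tools` field of a chat request
```

An agent loop then executes whatever tool call the model emits and feeds the result back, which is what makes the “autonomous agents” described above possible behind a private firewall.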
