Alibaba’s Qwen AI development team has released the Qwen3.5 Medium Model series, four new large language models (LLMs) with support for agentic tool calling. Three of the models (Qwen3.5-35B-A3B, Qwen3.5-122B-A10B, and Qwen3.5-27B) are available for commercial use by enterprises and independent developers under the permissive Apache 2.0 open source license. The fourth, Qwen3.5-Flash, is proprietary and available only through the Alibaba Cloud Model Studio API.

On third-party benchmarks, the open source models perform comparably to similarly sized proprietary models from major U.S. companies such as OpenAI and Anthropic, beating OpenAI’s GPT-5-mini and Anthropic’s Claude Sonnet 4.5 on some tests. The Qwen team has engineered these models to remain highly accurate even when “quantized,” a process that shrinks a model’s memory footprint by storing its parameters at lower numerical precision (for example, in 4-bit rather than 16-bit values). The flagship Qwen3.5-35B-A3B can now exceed a 1-million-token context length on consumer-grade GPUs with 32GB of VRAM. Its technical specifications reveal a highly efficient design: a hybrid architecture that combines Gated Delta Networks with a sparse Mixture-of-Experts (MoE) system.
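The idea behind quantization can be shown with a minimal sketch. The toy example below (pure Python, not the scheme the Qwen team actually uses) maps floating-point weights to 8-bit integers with a single scale factor and back, illustrating why model behavior can survive the reduced precision:

```python
# Toy symmetric int8 quantization -- an illustration of the concept,
# not Qwen's actual quantization pipeline.

def quantize(weights, bits=8):
    """Map floats to signed integers using one scale factor per tensor."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8
    scale = max(abs(w) for w in weights) / qmax      # step size between levels
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integer representation."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)                          # small integers in [-127, 127]
print(max_err <= scale / 2)       # rounding error bounded by half a step
```

Storing each weight as one byte plus a shared scale, rather than a 16- or 32-bit float, is what cuts the memory footprint; production schemes refine this with per-channel scales and lower bit widths.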

The Qwen3.5 Medium Models bring “frontier-level” context windows to the desktop PC, allowing developers to process massive datasets without server-grade infrastructure. The models are tailored to different hardware tiers: Qwen3.5-27B is optimized for high efficiency and supports a context length of over 800K tokens, while Qwen3.5-122B-A10B is designed for server-grade GPUs (80GB VRAM) and supports context lengths of 1M+ tokens. The Alibaba Cloud Model Studio API prices Qwen3.5-Flash competitively: $0.10 per 1M input tokens, $0.40 per 1M output tokens, and $0.125 per 1M tokens for cache creation. That pricing makes Qwen3.5-Flash one of the most affordable major LLMs to run over an API.
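At those rates, estimating an API bill is simple arithmetic. The helper below uses only the prices quoted above; the usage figures in the example are made up for illustration:

```python
# Estimate Qwen3.5-Flash API cost from the published per-million-token rates.
PRICES_PER_1M = {
    "input": 0.10,           # USD per 1M input tokens
    "output": 0.40,          # USD per 1M output tokens
    "cache_creation": 0.125, # USD per 1M tokens written to the cache
}

def estimate_cost(tokens: dict) -> float:
    """Sum cost over each token category, pro-rated from per-1M prices."""
    return sum(PRICES_PER_1M[k] * n / 1_000_000 for k, n in tokens.items())

# Hypothetical workload: 2M input tokens, 500K output, 1M cached.
usage = {"input": 2_000_000, "output": 500_000, "cache_creation": 1_000_000}
print(f"${estimate_cost(usage):.3f}")  # 0.20 + 0.20 + 0.125 = $0.525
```

Even a workload in the millions of tokens lands at well under a dollar, which is the basis of the affordability claim.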

The launch of the Qwen3.5 Medium Models has significant implications for enterprise technical leaders and decision-makers. With the ability to run these specialized “Mixture-of-Experts” models within a private firewall, organizations can maintain sovereign control over their data while utilizing native “thinking” modes and official tool-calling capabilities to build more reliable, autonomous agents. The shift toward architectural efficiency over raw scale ensures that AI integration remains cost-conscious, secure, and agile enough to keep pace with evolving operational needs. As a result, the rapid iteration and fine-tuning once reserved for well-funded labs is now accessible for on-premise development at many non-technical firms, effectively decoupling sophisticated AI from massive capital expenditure.
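The tool-calling capability mentioned above generally follows the JSON function-schema convention that most open models have adopted. The sketch below shows what a tool definition in that common format might look like; the tool itself (`get_inventory`) is hypothetical, and the exact schema Qwen expects should be confirmed against its documentation:

```python
import json

# Sketch of a tool definition in the OpenAI-style function-calling schema
# widely adopted by open models. "get_inventory" is a hypothetical tool.

def make_tool(name: str, description: str, params: dict, required: list) -> dict:
    """Build one tool entry in the common function-calling schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": params,
                "required": required,
            },
        },
    }

tools = [make_tool(
    "get_inventory",
    "Look up current stock for a product SKU.",
    {"sku": {"type": "string", "description": "Product SKU to query"}},
    ["sku"],
)]
print(json.dumps(tools, indent=2))  # passed as the `tools` field of a chat request
```

An agent loop then executes whatever tool call the model emits and feeds the result back, which is what makes the “autonomous agents” described above possible behind a private firewall.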
