Alibaba Cloud Open-Sources Its First AI Reasoning Model, QwQ: Reasoning Comparable to OpenAI o1, with Particular Strength in Mathematics and Programming - Alibaba Cloud

       Today, Alibaba Cloud's Tongyi team announced the launch and simultaneous open-source release of a new AI reasoning model, QwQ-32B-Preview. Evaluations show that the QwQ (Qwen with Questions) preview model demonstrates graduate-level scientific reasoning and is especially strong in mathematics and programming. Its overall reasoning capabilities are comparable to those of OpenAI's o1.
       QwQ is reported to be the latest experimental research model in the Tongyi Qianwen (Qwen) family of large models, and Alibaba Cloud's first open-source AI reasoning model.
       The Tongyi Qianwen team at Alibaba Cloud found that when a model is given enough time to think, question, and reflect, its understanding of mathematics and programming deepens. Building on this finding, QwQ has made breakthrough progress in solving complex problems.
       On the GPQA benchmark, which measures scientific problem-solving ability, QwQ achieved an accuracy of 65.2%, indicating graduate-level scientific reasoning; on the AIME benchmark, which tests mathematical problem solving, QwQ scored 50%.
       On the MATH-500 benchmark, QwQ scored 90.6%, outperforming both o1-preview and o1-mini. On LiveCodeBench, which assesses complex code generation, QwQ answered half of the questions correctly, and it also performed well on competitive-programming problems.
       Moreover, when faced with complex problems, QwQ is able to engage in deep self-analysis, question its own assumptions, and carefully examine every step of its reasoning process through thoughtful internal dialogue.
       For example, when solving the classic logic puzzle “Guess the Card,” QwQ followed the conversation in the puzzle, drew inferences step by step like a careful thinker, and eventually arrived at the correct answer.
       QwQ-32B-Preview is now available as open source on platforms such as the ModelScope (MoDa) community and Hugging Face. Within just a few hours of its release, it attracted rave reviews from developers around the world.
       Some developers called the model “a completely unexpected and crazy leap forward” and “the most significant breakthrough in open source this year,” and said that it “gives China an advantage in open source large models and AI reasoning.”
       However, the Tongyi team also stated that although QwQ has demonstrated strong analytical capabilities, it remains an experimental research model. Its limitations include mixing different languages in its output, occasional inappropriate bias, and limited understanding of specialized domains. The team expects these issues to be addressed gradually as research deepens and the model is iterated.