Exploring Early Indicators of AGI in Coding Agents
A Case Study on MCP-Powered Systems — Interactive Dashboard
By Dr. Tali Režun, COTRUGLI Business School Co-lab initiative (Q1-Q2 2025)
Key Findings Overview
This dashboard visualizes key insights from a 2025 study examining whether Model Context Protocol (MCP) servers enhance the intelligence and reasoning of large language models (LLMs) in coding tasks, potentially revealing early indicators of Artificial General Intelligence (AGI).
Application Completion Rate
MCP-powered Grok 3 Mini agent completed ~90% of the RAG SaaS application
Development Time
Approximately 100 hours for the MCP-powered agent to build the application
Cost Efficiency
About $30 in API call costs for the MCP-powered Grok 3 Mini
Success vs Failure: MCP Impact
Research Questions Analysis
Can MCP servers enhance LLMs to build reasoning coding agents?
The research finds that MCP servers substantially enhance LLMs' reasoning capabilities, allowing an entry-level model to overcome limitations that would otherwise block complex coding tasks.
Key MCP Contributions to LLM Reasoning
Context7 MCP Impact
- Provides real-time documentation access
- Reduces errors from outdated knowledge
- Enables accurate API implementation
- Bridges knowledge-cutoff limitations
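Conceptually, an MCP client reaches a server such as Context7 through JSON-RPC 2.0 messages. A minimal sketch of constructing such a request in Python follows; the tool name `get-library-docs` and its arguments are illustrative assumptions, not necessarily Context7's actual interface:

```python
import json

def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request, the message shape the
    Model Context Protocol uses for tool invocation. The payload is
    serialized for transport (stdio or HTTP, depending on the server)."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)

# Hypothetical documentation lookup; real Context7 tool names may differ.
msg = build_tool_call(1, "get-library-docs",
                      {"library": "streamlit", "topic": "file_uploader"})
print(msg)
```

This is how the agent sidesteps its knowledge cutoff: instead of recalling an API from stale training data, it requests the current documentation at generation time.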
Sequential Thinking MCP Impact
- Facilitates structured task decomposition
- Enhances logical step-by-step planning
- Improves problem-solving through reflection
- Enables complex workflow management
The study shows that MCP servers effectively bridge the gap between entry-level LLMs' inherent limitations and the demands of complex software development tasks.
How do intelligence and reasoning manifest in LLM-driven coding processes?
The research identified specific manifestations of intelligence and reasoning in the coding processes of MCP-enhanced LLMs:
Intelligence Manifestations
1. Adaptive Learning: Adjusting approach based on current documentation and integration requirements
2. Framework Integration: Assembling diverse frameworks (Python, Streamlit, n8n, Supabase) into cohesive applications
3. Performance Optimization: Implementing efficient webhook communication and database schema designs
4. Error Identification: Recognizing and addressing integration issues between components
Reasoning Manifestations
1. Logical Analysis: Breaking down complex problems into manageable steps
2. Contextual Awareness: Maintaining project scope understanding through Knowledge Graph Memory MCP
3. Evidence-Based Decision-Making: Referencing documentation for implementation choices
4. Iterative Problem-Solving: Progressive refinement of solutions through trial and error
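The contextual awareness that Knowledge Graph Memory provides can be pictured as a store of (entity, relation, entity) triples the agent writes to and queries across sessions. A toy illustration, not the server's actual storage format:

```python
from collections import defaultdict

class MemoryGraph:
    """Toy knowledge graph: entities linked by named relations."""

    def __init__(self):
        self.edges = defaultdict(set)  # (subject, relation) -> {objects}

    def remember(self, subject: str, relation: str, obj: str) -> None:
        self.edges[(subject, relation)].add(obj)

    def recall(self, subject: str, relation: str) -> set:
        return self.edges[(subject, relation)]

graph = MemoryGraph()
graph.remember("rag_app", "uses_framework", "Streamlit")
graph.remember("rag_app", "uses_framework", "Supabase")
graph.remember("rag_app", "exposes", "webhook_endpoint")
print(sorted(graph.recall("rag_app", "uses_framework")))
# → ['Streamlit', 'Supabase']
```

Persisting facts like these between sessions is what lets the agent keep project scope in view rather than rediscovering its own architecture on every run.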
These manifestations closely mirror human-like cognitive processes in software development, although they remain fundamentally probabilistic rather than reflecting true understanding.
Are there early signs of AGI in coding agents augmented by MCPs?
The research identified both signs supporting and contradicting early AGI capabilities in MCP-augmented coding agents:
Supporting AGI Indicators
- Human-like task decomposition capabilities
- Adaptive integration of diverse libraries/frameworks
- Real-time data access and incorporation
- Context maintenance across development sessions
- High (90%) task completion rate for complex applications
Contradicting AGI Indicators
- Inability to handle complex debugging scenarios
- Probabilistic rather than truly intentional processing
- Struggle with novel scenarios outside training data
- Dependency on external MCP infrastructure
- Reliance on flagship LLMs for complex debugging
While MCP-augmented coding agents demonstrate several AGI-like capabilities, particularly in task planning and adaptability, they still fall short of true AGI due to their probabilistic nature and dependency on external systems. These agents represent an important step toward AGI in specialized domains rather than full AGI itself.
MCP Server Performance & Capabilities
The research evaluated multiple MCP servers to determine their effectiveness in enhancing LLM capabilities for coding tasks.
| MCP Server | Primary Function | Capabilities | Performance Rating |
|---|---|---|---|
| Context7 | Real-time documentation access | Up-to-date API docs; version-specific information; code example retrieval | 95% |
| Sequential Thinking | Task planning & reasoning | Structured planning; task decomposition; logical reasoning | 90% |
| Knowledge Graph Memory | Contextual awareness | Project context maintenance; relationship tracking; limited integration depth | 85% |
| GitHub | Repository management | Code version tracking; repository integration; limited change reasoning | 80% |
| Supabase | Database operations | Schema management; query generation; complex join limitations | 78% |
Source: Based on research data from "Exploring Early Indicators of AGI in Coding Agents" (Dr. Tali Režun, May 2025) and additional MCP server performance metrics from Anthropic, Upstash, and the Model Context Protocol community (2025).
LLM Cost and Performance Comparison
The study compared several LLMs to determine cost-efficiency and performance in MCP-powered coding tasks.
API Cost Per 1,000 Tokens
Development Time and API Cost
Cost-Performance Matrix
Data Source: "Exploring Early Indicators of AGI in Coding Agents" (Dr. Tali Režun, May 2025) Table 1: Development Time and API Cost Comparison, and Table 2: API Cost Comparison per 1,000 Tokens.
Development Journey Timeline
This timeline illustrates the key stages in the MCP-powered coding agent's development process.
Research & Tool Evaluation
Systematic evaluation of Vibe Coding platforms, MCP servers, and coding agents to find the optimal components.
Coding Agent Development
Integration of five MCPs: Context7, Sequential Thinking, Knowledge Graph Memory, GitHub, and Supabase.
Prompt Engineering
Refinement of prompts to ensure clear instructions for the Cline agent, specifying application requirements.
MCP-Powered Development
Grok 3 Mini with MCP servers completed approximately 90% of the application, including webhook endpoints, database schemas, and UI.
Flagship LLM Completion
Claude Sonnet 3.7 resolved the remaining 10%, including complex authentication, animations, and security features.
Testing & Analysis
Evaluation of the agent's performance, reasoning capabilities, and AGI indicators across 300 hours of active testing.
Conclusions & Future Implications
Key Achievements
- 90% application completion with entry-level LLM + MCPs
- Cost-effective development ($30 for Grok 3 Mini, $50 for Claude Sonnet 3.7)
- Rapid development (100 hours total)
- Multiple MCP servers functioning as an integrated system
Remaining Challenges
- Complex debugging still requires flagship LLMs
- Probabilistic nature limits true reasoning
- Dependency on external MCP infrastructure
- Limited adaptability to novel scenarios
Final Assessment:
MCP-powered coding agents demonstrate early indicators of AGI-like capabilities in specific domains but fall short of true AGI. The research suggests that standardized protocols like MCP are critical for advancing AI-driven software development, offering a glimpse into the future of coding agents as potential precursors to more generalized artificial intelligence.
Future Research Directions
- Enhancing MCP servers for more complex debugging scenarios
- Developing more sophisticated reasoning MCPs with metacognitive capabilities
- Exploring multi-agent systems with specialized MCP-powered agents
- Investigating MCP applications beyond coding, such as scientific research or creative content generation
Primary Source
Režun, T. (2025, May). Exploring Early Indicators of AGI in Coding Agents: A Case Study on MCP-Powered Systems. International Leadership Journal, COTRUGLI Business School.
Dashboard created by Claude 3.7 Sonnet - May 2025
Based on research by Dr. Tali Režun, COTRUGLI Business School
Data visualizations powered by ECharts and Chart.js