Exploring Early Indicators of AGI in Coding Agents
A Case Study on MCP-Powered Systems — Interactive Dashboard
By Dr. Tali Režun, COTRUGLI Business School Co-lab initiative (Q1-Q2 2025)
Key Findings Overview
This dashboard visualizes key insights from a 2025 study examining whether Model Context Protocol (MCP) servers enhance the intelligence and reasoning of large language models (LLMs) in coding tasks, potentially revealing early indicators of Artificial General Intelligence (AGI).
Application Completion Rate
MCP-powered Grok 3 Mini agent completed ~90% of the RAG SaaS application
Development Time
Approximately 100 hours for the MCP-powered agent to build the application
Cost Efficiency
About $30 in API call costs for the MCP-powered Grok 3 Mini
Success vs Failure: MCP Impact
Research Questions Analysis
Can MCP servers enhance LLMs to build reasoning coding agents?
The research finds that MCP servers substantially enhance LLMs' reasoning capabilities, allowing an entry-level model to overcome limitations that would otherwise block complex coding tasks.
Key MCP Contributions to LLM Reasoning
Context7 MCP Impact
- Provides real-time documentation access
- Reduces errors from outdated knowledge
- Enables accurate API implementation
- Bridges knowledge-cutoff limitations
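Conceptually, an MCP client reaches a server such as Context7 through JSON-RPC 2.0 messages. A minimal sketch of constructing such a request in Python follows; the tool name `get-library-docs` and its arguments are illustrative assumptions, not necessarily Context7's actual interface:

```python
import json

def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request, the message shape the
    Model Context Protocol uses for tool invocation. The payload is
    serialized for transport (stdio or HTTP, depending on the server)."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)

# Hypothetical documentation lookup; real Context7 tool names may differ.
msg = build_tool_call(1, "get-library-docs",
                      {"library": "streamlit", "topic": "file_uploader"})
print(msg)
```

This is how the agent sidesteps its knowledge cutoff: instead of recalling an API from stale training data, it requests the current documentation at generation time.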
Sequential Thinking MCP Impact
- Facilitates structured task decomposition
- Enhances logical step-by-step planning
- Improves problem-solving through reflection
- Enables complex workflow management
The study shows that MCP servers effectively bridge the gap between entry-level LLMs' inherent limitations and the demands of complex software development tasks.
How do intelligence and reasoning manifest in LLM-driven coding processes?
The research identified specific manifestations of intelligence and reasoning in the coding processes of MCP-enhanced LLMs:
Intelligence Manifestations
1. Adaptive Learning: Adjusting approach based on current documentation and integration requirements
2. Framework Integration: Assembling diverse frameworks (Python, Streamlit, n8n, Supabase) into cohesive applications
3. Performance Optimization: Implementing efficient webhook communication and database schema designs
4. Error Identification: Recognizing and addressing integration issues between components
Reasoning Manifestations
1. Logical Analysis: Breaking down complex problems into manageable steps
2. Contextual Awareness: Maintaining project scope understanding through Knowledge Graph Memory MCP
3. Evidence-Based Decision-Making: Referencing documentation for implementation choices
4. Iterative Problem-Solving: Progressive refinement of solutions through trial and error
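The contextual awareness that Knowledge Graph Memory provides can be pictured as a store of (entity, relation, entity) triples the agent writes to and queries across sessions. A toy illustration, not the server's actual storage format:

```python
from collections import defaultdict

class MemoryGraph:
    """Toy knowledge graph: entities linked by named relations."""

    def __init__(self):
        self.edges = defaultdict(set)  # (subject, relation) -> {objects}

    def remember(self, subject: str, relation: str, obj: str) -> None:
        self.edges[(subject, relation)].add(obj)

    def recall(self, subject: str, relation: str) -> set:
        return self.edges[(subject, relation)]

graph = MemoryGraph()
graph.remember("rag_app", "uses_framework", "Streamlit")
graph.remember("rag_app", "uses_framework", "Supabase")
graph.remember("rag_app", "exposes", "webhook_endpoint")
print(sorted(graph.recall("rag_app", "uses_framework")))
# → ['Streamlit', 'Supabase']
```

Persisting facts like these between sessions is what lets the agent keep project scope in view rather than rediscovering its own architecture on every run.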
These manifestations closely mirror human-like cognitive processes in software development, although they remain fundamentally probabilistic rather than reflecting true understanding.
Are there early signs of AGI in coding agents augmented by MCPs?
The research identified both signs supporting and contradicting early AGI capabilities in MCP-augmented coding agents:
Supporting AGI Indicators
- Human-like task decomposition capabilities
- Adaptive integration of diverse libraries/frameworks
- Real-time data access and incorporation
- Context maintenance across development sessions
- High (90%) task completion rate for complex applications
Contradicting AGI Indicators
- Inability to handle complex debugging scenarios
- Probabilistic rather than truly intentional processing
- Struggle with novel scenarios outside training data
- Dependency on external MCP infrastructure
- Reliance on flagship LLMs for complex debugging
While MCP-augmented coding agents demonstrate several AGI-like capabilities, particularly in task planning and adaptability, they still fall short of true AGI due to their probabilistic nature and dependency on external systems. These agents represent an important step toward AGI in specialized domains rather than full AGI itself.
MCP Server Performance & Capabilities
The research evaluated multiple MCP servers to determine their effectiveness in enhancing LLM capabilities for coding tasks.
| MCP Server | Primary Function | Capabilities | Performance Rating |
|---|---|---|---|
| Context7 | Real-time documentation access | Up-to-date API docs; version-specific information; code example retrieval | 95% |
| Sequential Thinking | Task planning & reasoning | Structured planning; task decomposition; logical reasoning | 90% |
| Knowledge Graph Memory | Contextual awareness | Project context maintenance; relationship tracking; limited integration depth | 85% |
| GitHub | Repository management | Code version tracking; repository integration; limited change reasoning | 80% |
| Supabase | Database operations | Schema management; query generation; complex join limitations | 78% |
Source: Based on research data from "Exploring Early Indicators of AGI in Coding Agents" (Dr. Tali Režun, May 2025) and additional MCP server performance metrics from Anthropic, Upstash, and the Model Context Protocol community (2025).
LLM Cost and Performance Comparison
The study compared several LLMs to determine cost-efficiency and performance in MCP-powered coding tasks.
API Cost Per 1,000 Tokens
Development Time and API Cost
Cost-Performance Matrix
Data Source: "Exploring Early Indicators of AGI in Coding Agents" (Dr. Tali Režun, May 2025) Table 1: Development Time and API Cost Comparison, and Table 2: API Cost Comparison per 1,000 Tokens.
Development Journey Timeline
This timeline illustrates the key stages in the MCP-powered coding agent's development process.
Research & Tool Evaluation
Systematic evaluation of Vibe Coding platforms, MCP servers, and coding agents to find the optimal components.
Coding Agent Development
Integration of five MCPs: Context7, Sequential Thinking, Knowledge Graph Memory, GitHub, and Supabase.
Prompt Engineering
Refinement of prompts to ensure clear instructions for the Cline agent, specifying application requirements.
MCP-Powered Development
Grok 3 Mini with MCP servers completed approximately 90% of the application, including webhook endpoints, database schemas, and UI.
Flagship LLM Completion
Claude Sonnet 3.7 resolved the remaining 10%, including complex authentication, animations, and security features.
Testing & Analysis
Evaluation of the agent's performance, reasoning capabilities, and AGI indicators across 300 hours of active testing.
Conclusions & Future Implications
Key Achievements
- 90% application completion with entry-level LLM + MCPs
- Cost-effective development ($30 for Grok 3 Mini, $50 for Claude Sonnet 3.7)
- Rapid development (100 hours total)
- Multiple MCP servers functioning as an integrated system
Remaining Challenges
- Complex debugging still requires flagship LLMs
- Probabilistic nature limits true reasoning
- Dependency on external MCP infrastructure
- Limited adaptability to novel scenarios
Final Assessment:
MCP-powered coding agents demonstrate early indicators of AGI-like capabilities in specific domains but fall short of true AGI. The research suggests that standardized protocols like MCP are critical for advancing AI-driven software development, offering a glimpse into the future of coding agents as potential precursors to more generalized artificial intelligence.
Future Research Directions
- Enhancing MCP servers for more complex debugging scenarios
- Developing more sophisticated reasoning MCPs with metacognitive capabilities
- Exploring multi-agent systems with specialized MCP-powered agents
- Investigating MCP applications beyond coding, such as scientific research or creative content generation
Primary Source
Režun, T. (2025, May). Exploring Early Indicators of AGI in Coding Agents: A Case Study on MCP-Powered Systems. International Leadership Journal, COTRUGLI Business School.
Dashboard created by Claude 3.7 Sonnet - May 2025
Based on research by Dr. Tali Režun, COTRUGLI Business School
Data visualizations powered by ECharts and Chart.js