Blog | Marvin Zhang

Introducing LeanSpec: A Lightweight SDD Framework Built from First Principles

November 27, 2025 · 8 min read

Software Engineer & Open Source Enthusiast

Earlier this year, I was amazed by agentic AI coding with Claude Sonnet 3.7. The term "vibe coding" hadn't been coined yet, but that's exactly what I was doing—letting AI generate code while I steered the conversation. It felt magical. Until it didn't.

After a few weeks, I noticed patterns: code redundancy creeping in, intentions drifting from my original vision, and increasing rework as the AI forgot context between sessions. The honeymoon was over. I needed structure, but not the heavyweight processes that would kill the speed I'd gained.

Spec-Driven Development in 2025: Industrial Tools, Frameworks, and Best Practices

October 22, 2025 · 21 min read

Marvin Zhang

Software Engineer & Open Source Enthusiast

Introduction: The Industrial Revolution of AI-Assisted Development

25% of Y Combinator's 2025 cohort now ships codebases that are 95% AI-generated. The difference between those who succeed and those who drown in technical debt? Specifications. While "vibe coding"—the ad-hoc, prompt-driven approach to AI development—might produce impressive demos, it falls apart at production scale. Context loss, architectural drift, and maintainability nightmares plague teams that treat AI assistants like enhanced search engines.

2025 marks the tipping point. What started as experimental tooling has matured into production-ready frameworks backed by both open-source momentum and substantial enterprise investment. GitHub's Spec Kit has become the de facto standard for open-source SDD adoption. Amazon launched Kiro, an IDE with SDD built into its core. Tessl, founded by Snyk's creator, raised $125M at a $500M+ valuation to pioneer "spec-as-source" development. The industry signal is clear: systematic specification-driven development (SDD) isn't optional anymore—it's becoming table stakes for AI-augmented engineering.

If you're a technical lead evaluating how to harness AI development without sacrificing code quality, this comprehensive guide maps the entire SDD landscape. You'll understand the ecosystem of 6 major tools and frameworks, learn industry best practices from real production deployments, and get actionable frameworks for choosing and implementing the right approach for your team.

Leadership Skills in the AI Era: Beyond Traditional Management

October 9, 2025 · 15 min read

Marvin Zhang

Software Engineer & Open Source Enthusiast

The first time an AI system disagreed with my architectural decision and turned out to be right, I realized something fundamental had changed—not about AI, but about what leadership means. This wasn't a story about better technology; it was about how my role as a leader needed to evolve. The skills that made me effective in leading human teams weren't suddenly obsolete, but they required significant adaptation when AI became part of the equation.

If you're a tech leader today, you've likely felt this tension. As research shows, AI's impact on productivity is real but nuanced—it's not a silver bullet that solves all problems automatically. You know the traditional leadership skills that matter: technical depth, business domain knowledge, interpersonal skills, and political navigation. These haven't disappeared. But AI introduces a new dimension where these skills must expand and adapt. You're no longer just leading people or directing tools; you're orchestrating a hybrid environment where human judgment, traditional management wisdom, and AI capabilities need to work in harmony.

The Physics of Code: Understanding Fundamental Limits in Computing (Part 2)

October 5, 2025 · 16 min read

Marvin Zhang

Software Engineer & Open Source Enthusiast

Introduction: From Theory to Practice

In Part 1 of this series, we established the foundational concepts of computational limits: the distinction between fundamental and engineering limits, the four-tier computational hierarchy, formal complexity measures, and the intelligence-computability paradox. We explored why some problems that seem simple (like the halting problem) are mathematically impossible, while problems that seem to require sophisticated intelligence (like machine translation) are decidable.

Now, in Part 2, we move from abstract theory to practical application. This article explores how these fundamental limits manifest in daily engineering decisions, examines historical patterns showing that understanding constraints unleashes innovation, and connects computational limits to profound philosophical questions about logic, mathematics, and consciousness. We'll conclude with a practical framework you can use immediately to classify problems and make better engineering decisions.

Article Series

This is Part 2 of a two-part series. Part 1 covered the nature of limits, the computational hierarchy, complexity measures, and the intelligence-computability paradox. Part 2 explores practical applications, historical lessons, and philosophical foundations.

The Physics of Code: Understanding Fundamental Limits in Computing (Part 1)

October 4, 2025 · 25 min read

Marvin Zhang

Software Engineer & Open Source Enthusiast

Introduction: The Universal Speed Limit of Code

In 1905, Albert Einstein proved something revolutionary: nothing can travel faster than the speed of light. This isn't an engineering constraint that better technology might overcome—it's a fundamental property of spacetime itself, encoded in the structure of reality. Three decades later, in 1936, Alan Turing proved an equally profound result for computing: no algorithm can determine whether an arbitrary program will halt (known as the halting problem). Like Einstein's light speed barrier, this isn't a limitation of current computers or programming languages. It's a mathematical certainty that will remain true forever, regardless of how powerful our machines become or how clever our algorithms get.

Modern software engineering operates in the shadow of these fundamental limits, though most engineers encounter them as frustrating tool limitations rather than mathematical certainties. You've likely experienced this: a static analysis tool that misses obvious bugs, a testing framework that can't guarantee correctness despite 100% coverage, an AI assistant that generates code requiring careful human review. When marketing materials promise "complete automated verification" or "guaranteed bug detection," you might sense something's wrong—these claims feel too good to be true.

They are. The limitations you encounter aren't temporary engineering challenges awaiting better tools—they're manifestations of fundamental mathematical impossibilities, as immutable as the speed of light or absolute zero. Understanding these limits transforms from constraint into competitive advantage: knowing what's impossible focuses your energy on what's achievable, much as physicists leveraging relativity enabled GPS satellites and particle physics rather than wasting resources trying to exceed light speed.

If you're a developer who has wondered why certain problems persist despite decades of tool development, or a technical leader evaluating claims about revolutionary testing or verification technologies, this article offers crucial context. Understanding computational limits isn't defeatist—it's the foundation of engineering maturity. The best engineers don't ignore these boundaries; they understand them deeply and work brilliantly within them.

This journey explores how computational limits mirror physical laws, why "hard" problems differ fundamentally from "impossible" ones, and how this knowledge empowers better engineering decisions. We'll traverse from comfortable physical analogies to abstract computational theory, then back to practical frameworks you can apply tomorrow. Along the way, you'll discover why knowing the rules of the game makes you more effective at playing it, and how every breakthrough innovation in computing history emerged not by ignoring limits, but by deeply understanding them.

Article Series

This is Part 1 of a two-part series exploring fundamental limits in computing. Part 1 covers the nature of limits, the computational hierarchy, complexity measures, and the intelligence-computability paradox. Part 2 explores practical engineering implications, historical lessons, and philosophical foundations.

Sorry, AI Can't Save Testing: Rice's Theorem Explains Why

October 2, 2025 · 20 min read

Marvin Zhang

Software Engineer & Open Source Enthusiast

Introduction: The Impossible Dream of Perfect Testing

"Testing shows the presence, not the absence of bugs." When Dutch computer scientist Edsger Dijkstra made this observation in 1970, he was articulating a fundamental truth about software testing that remains relevant today. Yet despite this wisdom, the software industry continues to pursue an elusive goal: comprehensive automated testing that can guarantee software correctness.

If you're a developer who has ever wondered why achieving 100% test coverage still doesn't guarantee bug-free code, or why your carefully crafted test suite occasionally misses critical issues, you're confronting a deeper reality. The limitations of automated testing aren't merely engineering challenges to be overcome with better tools or techniques—they're rooted in fundamental mathematical impossibilities.

The current wave of AI-powered testing tools promises to revolutionize quality assurance. Marketing materials tout intelligent test generation, autonomous bug detection, and unprecedented coverage. While these tools offer genuine improvements, they cannot escape a theoretical constraint established over seventy years ago by mathematician Henry Gordon Rice. His theorem proves that certain questions about program behavior simply cannot be answered algorithmically, regardless of computational power or ingenuity.

This isn't a pessimistic view—it's a realistic one. Understanding why complete test automation is mathematically impossible helps us make better decisions about where to invest testing efforts and how to leverage modern tools effectively. Rather than chasing an unattainable goal of perfect automation, we can adopt pragmatic approaches that acknowledge these limits while maximizing practical effectiveness.

This article explores Rice's Theorem and its profound implications for software testing. We'll examine what this mathematical result actually proves, understand how it constrains automated testing, and discover how combining formal specifications with AI-driven test generation offers a practical path forward. You'll learn why knowing the boundaries of what's possible makes you a more effective engineer, not a defeated one.

The journey ahead takes us from theoretical computer science to everyday development practices, showing how deep principles inform better engineering. Whether you're writing unit tests, designing test strategies, or evaluating new testing tools, understanding these fundamentals will sharpen your judgment and improve your results.

From Chatbots to Agents: Building Enterprise-Grade LLM Applications

September 24, 2025 · 22 min read

Marvin Zhang

Software Engineer & Open Source Enthusiast

Picture this: It's Monday morning, and you're sitting in yet another meeting about why your company's LLM application can't seem to move beyond the demo stage. Your team has built a sophisticated GPT-4o-powered agent that handles complex customer inquiries, orchestrates with internal systems through function calls, and even manages multi-step workflows with impressive intelligence. Leadership is excited, budget approved. But six months later, you're still trapped in what industry veterans call "demo purgatory"—that endless cycle of promising LLM applications that never quite achieve reliable production deployment.

If this scenario sounds familiar, you're not alone. Whether organizations are building with hosted APIs like GPT-4o, Claude Sonnet 4, and Gemini 2.5 Pro, or deploying self-hosted models like DeepSeek-R1, QwQ, Gemma 3, and Phi 4, the vast majority struggle to move beyond experimental pilots. Recent research shows that AI's productivity benefits are highly contextual, with structured approaches significantly outperforming ad-hoc usage. The bottleneck isn't the sophistication of your LLM integration, the choice between hosted versus self-hosted models, or the talent of your AI development team. It's something more fundamental: the data foundation underlying your LLM applications.

The uncomfortable truth is this: Whether you're using GPT-4o APIs or self-hosted DeepSeek-R1, the real challenge isn't model selection—it's feeding these models the right data at the right time. Your sophisticated AI agent is only as intelligent as your data infrastructure allows it to be.

If you've ever tried to transform an impressive AI demo into a production system only to hit a wall of fragmented systems, inconsistent APIs, missing lineage, and unreliable retrieval—this article is for you. We argue that successful enterprise LLM applications are built on robust data infrastructure, not just clever prompting or agent frameworks.

Here's what we'll cover: how data accessibility challenges constrain even the most capable models, the infrastructure patterns that enable reliable tool use and context management, governance frameworks designed for LLM-specific risks, and concrete implementation strategies for building production-ready systems that scale.

The solution isn't better prompts or bigger models—it's better data foundations. Let's start with why.

Spec-Driven Development: A Systematic Approach to Complex Features

September 16, 2025 · 18 min read

Marvin Zhang

Software Engineer & Open Source Enthusiast

Introduction: The Challenge of Complex Feature Development

Every developer knows the feeling of staring at a complex requirement and wondering where to begin. Modern software development increasingly involves building systems that integrate multiple services, handle diverse data formats, and coordinate across different APIs. What appears straightforward in initial specifications often evolves into intricate webs of interdependent components, each with their own constraints and edge cases.

This complexity manifests in several common development challenges that teams face regardless of their experience level or technology stack. Projects frequently suffer from scope creep as requirements evolve during implementation. Developers spend significant time explaining context to AI assistants or team members, often repeating the same architectural constraints across multiple conversations. Technical debt accumulates as developers make hasty decisions under pressure, leading to systems that become increasingly difficult to maintain and extend.

Context Engineering: The Art of Information Selection in AI Systems

September 14, 2025 · 15 min read

Marvin Zhang

Software Engineer & Open Source Enthusiast

"Context engineering is building dynamic systems to provide the right information and tools in the right format such that the LLM can plausibly accomplish the task." — LangChain

If you've been building with AI for a while, you've probably hit the wall where simple prompts just aren't enough anymore. Your carefully crafted prompts fail on edge cases, your AI assistant gets confused with complex tasks, and your applications struggle to maintain coherent conversations. These frustrations aren't accidental—they reveal a fundamental shift happening in AI development.

Companies like OpenAI, Anthropic, Notion, and GitHub aren't just building better models; they're pioneering entirely new approaches to how information, tools, and structure flow into AI systems. This is the essence of context engineering.

Unattended AI Programming: My Experience Using GitHub Copilot Agent for Content Migration

September 10, 2025 · 7 min read

Marvin Zhang

Software Engineer & Open Source Enthusiast

Introduction

Recently, I successfully used GitHub Copilot Agent to migrate all my archived markdown articles to this Docusaurus-based blog, and the experience was surprisingly smooth and efficient. What impressed me most wasn't just the AI's ability to handle repetitive tasks, but also how I could guide it to work autonomously while I focused on higher-level decisions. Even more fascinating was that I could review and guide the AI agent's work using my phone during commutes or breaks. This experience fundamentally changed my perspective on AI-assisted development workflows.

Here's a showcase of the bilingual blog after migration completion:

Figure 1: Migration results overview (Chinese)

Figure 2: Migration results overview (English)

Introduction: The Industrial Revolution of AI-Assisted Development​

Introduction: From Theory to Practice​

Introduction: The Universal Speed Limit of Code​

Introduction: The Impossible Dream of Perfect Testing​

Introduction: The Challenge of Complex Feature Development​

Introduction​

Introduction: The Industrial Revolution of AI-Assisted Development

Introduction: From Theory to Practice

Introduction: The Universal Speed Limit of Code

Introduction: The Impossible Dream of Perfect Testing

Introduction: The Challenge of Complex Feature Development

Introduction