
From Production Crashes to Systematic Solutions: How AI-Human Collaboration Solves Complex Problems

My production analytics pipeline crashed with `TypeError: type NAType doesn't define __round__ method`. The AI agent immediately suggested: “Just add type handling for NaN values.” But the real question isn’t how to fix it—it’s why there are NaNs in the first place. Are there supposed to be NaNs at all?
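
For context, here is a minimal reproduction of the failure mode. This is a sketch with made-up values, assuming the per-user metrics end up as plain Python lists that can contain `pd.NA`:

```python
import numpy as np
import pandas as pd

# A per-user list of engagement scores where one day has no data.
scores = [0.82, pd.NA, 0.75]

avg = np.mean(scores)  # pd.NA propagates through the mean, so avg is <NA>, not a float
round(avg, 2)          # TypeError: type NAType doesn't define __round__ method
```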

This seemingly simple error revealed a multi-layered architectural issue that required systematic thinking to solve. More importantly, it demonstrated how structured AI-human collaboration can transform complex problem-solving when intuition fails.

How Debugging Usually Goes (And Why It’s Slow)

When something breaks in production, I usually:

  1. Pull up logs and stare at stack traces for way too long
  2. Try random stuff - maybe it’s a dependency issue? Maybe it’s a config problem?
  3. Deploy to dev and hope I can reproduce it
  4. Google the error message and try the first Stack Overflow answer
  5. Add more logging everywhere and deploy again
  6. Ask teammates who are probably dealing with their own fires
  7. Eventually stumble onto the right solution after hours of trial and error

Sound familiar? This approach works, but it’s slow, frustrating, and often misses the real problem. You end up fixing symptoms instead of understanding root causes.

The debugging session I’m about to describe was different. By using structured frameworks with AI agents, I went from crash to root cause understanding in about 30 minutes instead of the usual 3-4 hours. The AI agents helped with discovery, hypothesis generation, and systematic investigation in ways that would have taken me much longer to do alone.

The Problem with Single-Solution Thinking

When the NAType error first appeared, my initial instinct was to just add type handling and move on. The AI agent suggested the same thing: “Just add type handling for NaN values.” Technically correct, but this approach treats symptoms rather than understanding root causes.

The danger of accepting the first solution is that it often masks deeper issues. In my case, the real problem wasn’t the NAType error—it was a data flow mismatch where users without engagement data weren’t represented in the metrics dictionaries. The error was just the final symptom of a systematic data processing issue.

This is why systematic frameworks matter when intuition fails. Complex problems often have multiple layers that require structured investigation, not just quick fixes.

Framework 1: The Three-Hypothesis Approach

The key insight from this debugging session was the power of generating multiple hypotheses with likelihood assessments. Instead of jumping to the first solution, I applied the structured debugging framework I’ve been developing. Here are the 3 hypotheses that the AI agent proposed.

**Hypothesis 1**: Lookback logic insufficient (likelihood: 7/10)

  • I suspected incomplete label data due to insufficient lookback logic
  • Evidence: Some users had missing engagement metrics
  • Investigation: Implement 5-day lookback window

**Hypothesis 2**: NAType handling needed (likelihood: 6/10)

  • The AI suggested adding explicit filtering of `None`/`pd.NA` values
  • Evidence: Error occurred when `np.mean()` was called on lists containing `pd.NA`
  • Investigation: Add type checking before calculations

**Hypothesis 3**: Data flow mismatch (likelihood: 8/10)

  • Users without engagements weren’t represented in metrics dictionaries
  • Evidence: `KeyError` during subset testing revealed the issue
  • Investigation: Map data flow from `user_info` → `user_per_day_content_label_metrics`

I worked through each of these and ruled them out one by one until Hypothesis 3 emerged as the most likely explanation. The subset testing revealed a fundamental data flow issue that regular testing had missed. Having the AI agent point me toward candidate hypotheses, even when none of them turns out to be exactly right, at least gets me debugging in a systematic way. And for the AI model itself, generating multiple hypotheses with likelihood assessments works better than offering a single solution: it forces systematic thinking and prevents premature conclusions. We all know that AI models have sycophancy problems, and asking for several competing ideas helps temper that.
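
For concreteness, the subset test boiled down to a check like the one below. The data is a toy stand-in, and I’m assuming both structures are keyed by user id, which is roughly how my pipeline works:

```python
# Toy stand-ins for the real structures, keyed by user id in this sketch.
user_info = {"u1": "alice", "u2": "bob", "u3": "carol"}
user_per_day_content_label_metrics = {
    "u1": [0.82, 0.75],
    "u2": [0.64],
    # "u3" never engaged, so there is no entry for her at all
}

# Subset test: which users exist upstream but have no metrics entry downstream?
missing = set(user_info) - set(user_per_day_content_label_metrics)
print(missing)  # {'u3'} -- any metrics[user_id] lookup downstream raises KeyError here
```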

Framework 2: Critical Analysis for Deep Understanding

The critical analysis framework helped me ask the right questions:

  • “Are there supposed to be NaNs at all?” - This question revealed that the issue wasn’t just missing data, but incomplete data processing
  • “What assumptions am I making about data completeness?” - This led me to discover the user set mismatch
  • “Where could data flow mismatches occur?” - This helped me trace the problem to its architectural root

Normally, I’d sit and stare at the computer and think deeply to generate the initial hypotheses myself. Having the AI agent do that first pass helps me with the “creativity” component of analysis and gives me seeds to start exploring what the error could be.

The systematic questioning process revealed that the NAType error was actually a symptom of a deeper issue: users without engagements weren’t represented in the metrics dictionaries, leading to unsafe dictionary access in transformation functions.
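
In code, the unsafe pattern and one layer of the fix looked roughly like the sketch below. The function and variable names are placeholders, not my actual transformation code, and the more important completeness fix is covered in the discovery summary later on:

```python
import numpy as np
import pandas as pd

# Unsafe: assumes every user in user_info also has an entry in the metrics dict.
def transform_user(user_id, metrics_by_user):
    daily_scores = metrics_by_user[user_id]  # KeyError for users with no engagements
    return round(np.mean(daily_scores), 2)   # NAType error if pd.NA slipped in

# Safer: make the "no engagement" case explicit instead of letting it surface two stages later.
def transform_user_safe(user_id, metrics_by_user):
    daily_scores = metrics_by_user.get(user_id)
    if not daily_scores:
        return None
    clean = [s for s in daily_scores if s is not None and s is not pd.NA]
    return round(float(np.mean(clean)), 2) if clean else None
```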

Caveat: DO NOT TRUST THE AI BY ITSELF

I’ve experimented extensively with automating software engineering using AI, and I’ve learned a hard lesson: if you let AI agents run unsupervised for too long, they go off the rails. I’m bullish on these agents getting better over time, but at the moment they’re probably at the level of a supercharged new hire - you still need to be the manager. I think you need at least a senior engineer’s level of skill to get the most out of these models, since more of your time is spent on design and code review. Think of them as really overeager, ambitious, yet unrefined junior engineers. They make changes to your data without proper context, get caught in self-reinforcing loops where they approve their own work, and write tests that pass trivially but don’t actually test anything meaningful. The code they produce is often overcomplicated and overengineered for what you actually need.

My workflow has shifted significantly. I now spend most of my time on planning, ideation, and code review. AI agents can handle code development and writing by themselves, especially when given detailed style guides and examples of high-quality human-written code. They follow stylistic patterns well, but you need multiple layers of checks. What works best is a collaborative model where the AI and human co-create a plan together, but then the human closely supervises execution—especially for complex systems that need to work correctly the first time in production. AI can’t be trusted on its own without thorough code review.

I’ve found that using additional AI agents for code review with different personas like QA architect or ML engineer helps probe results more thoroughly. But even with these safeguards, a human still needs to handle the critical thinking components: verifying the code actually works, checking it’s engineered correctly according to specs, and ensuring the AI has properly encoded the required business logic.

Another caveat: much of this stems from underspecification. I’ve worked to evolve my systems to better specify constraints and requirements so AI agents can iterate more efficiently. But even after maximizing this, there’s still a degree of underperformance that requires human oversight for anything serious in production.

I’ve had AI agents delete my .env files. I’ve had them delete database rows. Do NOT let your AI models run unsupervised without human intervention.

AI-Human Collaboration: The Best of Both Worlds

This debugging session highlighted the power of structured AI-human collaboration. Here’s what I learned about how to work effectively with AI agents:

AI’s Strengths:

  • Fast iteration and pattern recognition
  • Consistent application of debugging frameworks
  • Surface-level technical solutions
  • Systematic hypothesis generation

My Strengths:

  • Contextual judgment and assumption challenging
  • Understanding of my business logic and data flow
  • Final decision-making on architectural choices
  • Ability to ask “why” before “how”

The Collaboration Pattern:

  1. AI generates multiple hypotheses with likelihood assessments
  2. I apply contextual reasoning to prioritize investigation
  3. AI provides systematic investigation steps
  4. I make final judgment on solutions and architectural changes

This pattern works because it leverages each party’s strengths while compensating for their weaknesses. The AI provides structure and speed, while I ensure quality and contextual correctness. The key insight is that I can’t rely on the AI to make final decisions. It often gives incorrect solutions, premature optimizations, or contradicts itself. But it’s incredibly valuable for discovery, hypothesis generation, and systematic investigation.

The Systematic Discovery Process

The key insight from this case study is that systematic frameworks give you structure when intuition fails. My debugging process followed a clear progression:

  1. Surface Error: `TypeError: type NAType doesn't define __round__ method`
  2. Initial Hypothesis: Add type handling (technically correct but incomplete)
  3. Critical Question: “Are there supposed to be NaNs at all?”
  4. Systematic Investigation: Three-hypothesis approach with likelihood assessments
  5. Root Cause Discovery: Data flow mismatch between user sets
  6. Multi-layered Solution: Type handling + data completeness + architectural fixes

Each step built on the previous one, with systematic frameworks providing the structure needed to navigate from symptoms to root causes. The AI agents were particularly helpful in steps 4 and 5—they could quickly generate hypotheses, assess likelihoods, and suggest investigation steps that would have taken me much longer to think through alone.
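
To make step 6 concrete, the data-completeness layer amounted to something like this sketch (placeholder names again; the real change also touched the upstream aggregation and its tests):

```python
def ensure_metrics_completeness(user_info, metrics_by_user):
    """Give every known user an entry, even an empty one, so downstream
    lookups never crash on missing keys or silently drop users."""
    for user_id in user_info:
        metrics_by_user.setdefault(user_id, [])
    return metrics_by_user
```

With completeness guaranteed upstream, “no data” becomes an explicit empty value rather than a missing key, and the type-handling layer only has to deal with stray `pd.NA` values.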

Key Insights: What This Case Study Teaches Me

1. Multiple hypotheses beat single solutions. The 3-hypothesis approach revealed complexity that a single solution would have missed. This mirrors how modern reasoning models work—generating multiple options with confidence scores before making decisions. I’ve found that having the AI generate 3 possible solutions with likelihoods works much better than asking for just one.

2. Context matters more than technical fixes. Understanding “why” before “how” is crucial. The technical fix (type handling) was correct, but the contextual understanding (data flow mismatch) was what actually solved the problem. This is where my domain knowledge becomes critical—the AI can suggest technical solutions, but I need to understand the business logic.

3. AI-human collaboration is powerful. Each party brings different strengths. AI provides speed and consistency, while I provide judgment and context. The combination is more effective than either alone. The AI agents can help with discovery and iteration in ways that would take me much longer to do manually.

4. Systematic frameworks provide structure. When intuition fails, process succeeds. The structured debugging and critical analysis frameworks I’ve been developing provide the foundation for effective problem-solving. They give me a clear path forward when I’m stuck.

5. Question assumptions systematically. The breakthrough came from asking “are there supposed to be NaNs at all?” This systematic questioning revealed hidden assumptions about data completeness and processing. The critical analysis framework helps me challenge my own assumptions in ways that feel natural.

Practical Implementation: Building Your Systematic Toolkit

Based on this case study, here’s how I’ve integrated systematic problem-solving into my AI collaboration workflow:

1. Always generate multiple hypotheses. I don’t accept the first solution anymore. I use the 3-hypothesis approach with likelihood assessments to force systematic thinking. The AI agents are great at this, since they can quickly generate multiple options with confidence scores. (A sketch of the kind of prompt I use for this appears after this list.)

2. Apply critical analysis frameworks. I ask “why” before “how.” I challenge assumptions about data completeness, processing logic, and architectural decisions. The critical analysis framework helps me do this systematically rather than randomly.

3. Use controlled experimentation. Subset testing often reveals issues that full-scale testing misses. I start small, understand the problem, then scale up. The AI agents can help me design these experiments and interpret the results.

4. Document data flow assumptions. Understanding how data moves between processing stages prevents dangerous assumptions about completeness and consistency. I’ve learned to be explicit about these assumptions in my code and documentation.

5. Build muscle memory for systematic thinking. I practice applying these frameworks until they become second nature. The goal is to think systematically even when under pressure. The AI agents help by providing consistent structure and reminders about the frameworks.
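
As a concrete example of point 1, the hypothesis-generation instruction I give the agent is shaped roughly like this. It’s a paraphrase of my debugging prompt, not the exact text:

```python
DEBUGGING_PROMPT = """\
Here is the error and the relevant context:

{error_and_context}

Do not propose a fix yet. Instead:
1. Propose exactly 3 distinct hypotheses for the root cause.
2. Give each a likelihood out of 10, with the evidence for and against it.
3. For each, name the cheapest investigation step that would confirm or rule it out.
"""
```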

Common Pitfalls and How to Avoid Them

Premature optimization: AI often suggests technical fixes before understanding context. I always ask “why” before implementing “how.” The AI agents are great at suggesting solutions, but I need to understand the business logic first.

Single-solution thinking: The danger of accepting the first answer. I generate multiple hypotheses and assess likelihoods systematically. The 3-hypothesis approach has become my default.

Over-reliance on AI: My judgment is still crucial for final decisions. AI provides options, I make choices. I’ve learned that AI agents often give incorrect solutions, premature optimizations, or contradict themselves, so I can’t rely on them for final decisions.

Missing the “why”: I focus on understanding root causes, not just fixing symptoms. Systematic questioning reveals hidden assumptions. The critical analysis framework helps me do this consistently.

Conclusion: The Value of Systematic Thinking

This debugging session transformed a simple NAType error into a comprehensive understanding of data flow architecture and systematic problem-solving. The key wasn’t the technical solution—it was the structured approach that revealed the real problem.

The broader value of systematic AI-human collaboration extends far beyond debugging. These same principles apply to architecture decisions, feature development, and any complex problem-solving scenario. When intuition fails, systematic frameworks provide the structure needed to navigate complexity effectively.

The compound benefits of building systematic thinking muscle memory are significant. Each debugging session becomes a learning opportunity that improves future problem-solving. The frameworks become second nature, enabling faster and more effective solutions to increasingly complex challenges.

Most importantly, this approach scales. As problems become more complex and AI capabilities grow, the ability to think systematically and collaborate effectively becomes increasingly valuable. The frameworks provide the foundation for navigating whatever challenges come next.

The NAType error was simple. The systematic thinking that solved it was not. And that’s exactly why it matters.

What’s Next

I’m continuing to develop and refine these systematic frameworks. The debugging and critical analysis prompts I’ve been working on are just the beginning. I’m exploring how to apply similar structured thinking to other areas of engineering—architecture decisions, feature development, and team collaboration.

The goal isn’t to replace intuition with process, but to have systematic frameworks ready when intuition fails. Because in my experience, the most complex problems are the ones where intuition leads you astray, and that’s exactly when you need structure the most.