Building a text-to-SQL agent solution

When state-of-the-art text-to-SQL models fail in production environments, what does it take to build a system that actually works? This case study documents the development of Woody, an agentic text-to-SQL system that transforms how lumber industry professionals interact with their data.

Tags: v1, ml, RAG, agent
Author

Jaekang Lee

Published

June 25, 2025

0. Introduction

I joined KP Logix, a lumber middleman ERP software company, with a mandate to answer a critical question: “How can AI deliver measurable value to our users?”

The users were:

  • Salespeople negotiating lumber contracts across North America and Europe
  • Logistics coordinators managing complex shipments across North America and Europe
  • Accountants tracking payments and invoicing across multiple currencies
  • Managers making strategic decisions about inventory and customer relationships

The answer to this question was ‘AI can empower users to make customers happier’. I delivered Woody, an agentic solution prototype that achieved 43.21% average weekly growth in user engagement.

0.1 Value-centric proposition

My proposition was the following: a sophisticated text-to-SQL agent wouldn’t just answer questions—it would unlock dynamic problem solving for users.

Figure 0: User problem solution by asking ‘why?’ repeatedly

Hence the heart of this project was text-to-SQL.

1. Start somewhere

While the vision was clear, the implementation challenge was daunting: an overwhelming amount of business context. The database had 81 tables, 588 columns, and an unknown amount of undocumented domain knowledge.

Figure 1: Woody Database Overview - Shows 81 tables with web of relationships (censored for privacy)

1.0.1 Why existing solutions fail

At the time, ‘XiYan-SQL’ topped the text-to-SQL leaderboard on paperswithcode.com; it attempts to solve text-to-SQL with smart, automatic extraction of table and column schemas.

However, traditional text-to-SQL approaches fail in real-world enterprise environments because they assume:

  • 📖 Documentation exists - Reality: critical business rules live in people’s heads
  • 🏢 Standard terminology - Reality: “Port” means “Port Pearlston”, “DLS” means specific lumber categories
  • 📊 Simple queries - Reality: “How much contract did we sell?” requires 200+ lines of SQL due to the unstructured nature of user data

Hence I started with one very simple question.

1.1 “How many 2x4 dls in Pearlston?”

This is a real question a user might ask Woody. The following is the correct SQL to answer it.

SELECT COUNT(package.id)
FROM package
JOIN product ON package.product_id = product.id
JOIN inventory ON package.inventory_id = inventory.id
WHERE package.status = 1
AND product.thickness = 2
AND product.width = 4
AND product.category = 'dls'
AND inventory.name LIKE 'pearlston%';
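To make the query’s shape concrete, here is a runnable sketch against an in-memory SQLite database with a few hypothetical rows (the production schema and data are, of course, far larger):

```python
# Minimal sketch of the query against hypothetical rows in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, thickness INTEGER,
                      width INTEGER, category TEXT);
CREATE TABLE package (id INTEGER PRIMARY KEY, product_id INTEGER,
                      inventory_id INTEGER, status INTEGER);
INSERT INTO inventory VALUES (1, 'pearlston yard'), (2, 'vancouver yard');
INSERT INTO product VALUES (1, 2, 4, 'dls'), (2, 2, 6, 'dls');
INSERT INTO package VALUES
  (1, 1, 1, 1),   -- 2x4 dls in Pearlston, available
  (2, 1, 1, 1),   -- 2x4 dls in Pearlston, available
  (3, 1, 1, 0),   -- 2x4 dls in Pearlston, but not available (status 0)
  (4, 2, 1, 1),   -- 2x6 dls in Pearlston (wrong width)
  (5, 1, 2, 1);   -- 2x4 dls, but in Vancouver
""")

count = conn.execute("""
SELECT COUNT(package.id)
FROM package
JOIN product   ON package.product_id = product.id
JOIN inventory ON package.inventory_id = inventory.id
WHERE package.status = 1
  AND product.thickness = 2
  AND product.width = 4
  AND product.category = 'dls'
  AND inventory.name LIKE 'pearlston%'
""").fetchone()[0]
print(count)  # only packages 1 and 2 match
```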

To answer this question correctly, the system must demonstrate four distinct types of intelligence:

1. Logic Recognition Intelligence

  • Recognize: This is an inventory availability query
  • Understand: COUNT operation needed on available packages
  • Determine: Multi-table JOIN requirements across package → product → inventory

2. Domain Terminology Intelligence

  • Translate: “2x4” → thickness = 2 AND width = 4
  • Map: “DLS” → category = 'dls' (lumber grade classification)
  • Resolve: “Pearlston” → inventory.name LIKE 'pearlston%' (location matching)

3. Business Logic Intelligence

  • Infer: User wants “available” inventory → package.status = 1
  • Apply: Industry convention that status codes have specific meanings
  • Context: Understand inventory vs. sold vs. shipped vs. reserved states

4. Data Structure Intelligence

  • Navigate: Complex foreign key relationships between tables
  • Execute: Proper JOIN syntax and WHERE clause construction
  • Optimize: Efficient query execution for production database performance

1.2 The four RAG

This analysis revealed that enterprise text-to-SQL requires four specialized knowledge systems, each solving a different aspect of the intelligence challenge:

Figure 2: 4 RAG with different top k returns

1. SQL Examples RAG Ground truth query patterns that demonstrate how similar questions have been successfully answered, providing templates for complex business logic implementation.

2. Business Logic RAG Domain-specific rules and conventions that aren’t documented anywhere—like how salespeople say “trucks sold” when they mean “revenue generated” (they don’t actually sell trucks), or how “Port” universally means “Port Canaveral” in company terminology.

3. Table Schema RAG Structural database knowledge including table relationships, column definitions, and JOIN requirements—but contextualized for business users rather than database administrators.

4. Unique Values RAG Exact entity matching using BM25 search to disambiguate terms, such as distinguishing “Tony the engineer” from “Tony the salesperson” when processing queries that contain proper names.

You may be wondering: isn’t this overkill for such a simple question? You are absolutely correct. This approach is good for a minimum viable product, but in the long run a simple semantic cache would answer repeated questions far more efficiently and faster.
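To illustrate the Unique Values RAG, here is a minimal BM25 scorer over a hypothetical list of unique column values (the production version indexed actual values pulled from the database):

```python
# Minimal BM25 sketch for the Unique Values RAG: rank hypothetical
# unique column values against a user's keywords.
import math
from collections import Counter

docs = [
    "tony smith engineer",
    "tony baker salesperson",
    "pearlston main yard",
    "vancouver yard",
]
tokenized = [d.split() for d in docs]
avgdl = sum(len(t) for t in tokenized) / len(tokenized)
N = len(docs)
df = Counter(term for t in tokenized for term in set(t))  # document frequency

def bm25(query: str, k1: float = 1.5, b: float = 0.75):
    """Return (score, doc) pairs, best match first."""
    scores = []
    for doc_tokens, doc in zip(tokenized, docs):
        tf = Counter(doc_tokens)
        score = 0.0
        for term in query.split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(doc_tokens) / avgdl)
            score += idf * num / den
        scores.append((score, doc))
    return sorted(scores, reverse=True)

print(bm25("tony salesperson")[0][1])  # 'tony baker salesperson'
```

The extra context token (“salesperson”) is what lets the scorer pick the right Tony; a bare “tony” query would tie the two.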

1.3 Quick technical details

It is worth pointing out that treating LLM responses as structured data components, rather than natural language text, is what makes the next steps possible.

Using Gemini 2.5 Flash’s structured output with Pydantic validation models, every system response became a perfectly parseable data structure—eliminating the typical integration problems that plague LLM-based systems. (For example, it almost guarantees that if I define a function calculator(num_1, num_2), the model won’t incorrectly call it as calculate(n1, n2), which would return an error.)
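As a sketch of the idea (the model class and fields here are illustrative, not the production schema), a Pydantic model validates the LLM’s JSON output before it ever reaches the tool layer:

```python
# Sketch: validate an LLM's structured output with Pydantic before
# dispatching it to a tool. Field and tool names are illustrative.
from typing import Literal
from pydantic import BaseModel, ValidationError

class ToolCall(BaseModel):
    tool: Literal["calculator"]   # only tools we actually expose
    num_1: float
    num_2: float

def calculator(num_1: float, num_2: float) -> float:
    return num_1 + num_2

# A well-formed model response parses cleanly...
call = ToolCall.model_validate_json(
    '{"tool": "calculator", "num_1": 2, "num_2": 3}'
)
result = calculator(call.num_1, call.num_2)

# ...while a hallucinated signature like calculate(n1, n2) is rejected
# at validation time instead of crashing deep inside the pipeline.
try:
    ToolCall.model_validate_json('{"tool": "calculate", "n1": 2, "n2": 3}')
    rejected = False
except ValidationError:
    rejected = True
```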

Additionally, using a library like LangGraph to abstract the model layer across providers (OpenAI, Google, Anthropic) makes it easier to switch between models as they update very frequently. LangGraph’s state memory was useful for keeping context organized throughout the pipeline.
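The state-memory pattern that LangGraph provides can be sketched in plain Python: each node reads the shared state, adds its contribution, and passes it on (node names and payloads here are illustrative):

```python
# Plain-Python sketch of the shared-state pattern LangGraph provides:
# each node receives the state, extends it, and passes it along.
from typing import Any, Callable

State = dict[str, Any]

def rag_node(state: State) -> State:
    # in production: retrieve from the four RAG stores
    return {**state, "rag_context": ["example sql", "business rule"]}

def sql_node(state: State) -> State:
    # in production: generate SQL from the query plus RAG context
    return {**state, "sql": "SELECT 1"}

def respond_node(state: State) -> State:
    return {**state, "answer": f"ran {state['sql']}"}

def run(query: str, nodes: list[Callable[[State], State]]) -> State:
    state: State = {"query": query}
    for node in nodes:
        state = node(state)   # context stays organized in one place
    return state

final = run("how much sold today?", [rag_node, sql_node, respond_node])
print(final["answer"])  # ran SELECT 1
```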

2. Agentic workflow

Building on our four-layer RAG foundation, we encountered a critical limitation: what happens when user queries require reasoning beyond simple pattern matching?

Consider these query evolution patterns:

  • RAG-Covered: “How much available 2x4 premium in Vancouver?” → Direct pattern match
  • Multi-Step Required: “How much 4x10x10 T2 in Texas, and how much was sold by Peter?” → Requires inventory lookup + sales attribution
  • Strategic Analysis: “Who is our top customer and what’s their purchase pattern?” → Requires 5+ coordinated queries with business logic
  • Validation Loop: The AI will sometimes make mistakes in the SQL, such as a syntax error or a misspelled column name. This required an intelligent loop that automatically retries or calls get_table_schema_tool to fetch the correct column names.

Core Innovation: An intelligent decision agent that orchestrates tool usage based on query complexity and available context.
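The validation loop can be sketched as follows, with SQLite standing in for the production database (the repaired query is hardcoded here for brevity; in production the error and schema are handed back to the LLM):

```python
# Sketch of the SQL validation loop: execute, and on failure feed the
# real schema back for a retry. SQLite stands in for production.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (id INTEGER, status INTEGER)")
conn.execute("INSERT INTO package VALUES (1, 1), (2, 1)")

def get_table_schema_tool(table: str) -> str:
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE name = ?", (table,)
    ).fetchone()
    return row[0]

def run_with_retry(sql: str, max_retries: int = 2):
    attempt = sql
    for _ in range(max_retries + 1):
        try:
            return conn.execute(attempt).fetchone()[0]
        except sqlite3.OperationalError:
            # in production: hand the error plus schema back to the LLM
            schema = get_table_schema_tool("package")
            assert "status" in schema  # the correct column name is visible
            attempt = "SELECT COUNT(id) FROM package WHERE status = 1"
    raise RuntimeError("query could not be repaired")

# Misspelled column 'staus' fails once; the repaired query then succeeds.
count = run_with_retry("SELECT COUNT(id) FROM package WHERE staus = 1")
print(count)  # 2
```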

Figure 3: Agentic workflow high level overview

2.3 Production Examples: Multi-Step Query Processing

Actual RAG context returned omitted due to privacy.

2.3.1 Simple one step query

📊 Click to view detailed execution trace
ORIGINAL QUERY: how much did we sell yesterday?
CONVERSATION UUID: ac592ca9-d3f4-4620-9454-fd8a9480a17b
TIMESTAMP: 2025-06-28T13:09:57.553192-07:00

Tool Results:
success=True summary='RAG found tables, 8 knowledge chunks, 5 SQL examples, 7 unique values.' execution_time=0.2920207977294922 error=None

TOOL EXECUTION: sql_execution_tool
----------------------------------------
Parameters:
special_sql_template: total_contract_volume_by_filters
date_from: 2025-06-27 00:00:00
date_to: 2025-06-28 00:00:00
group_by_date: True


FINAL RESPONSE:
----------------------------------------
The user is asking about the total contract/order volume in board feet (fbm) and sales revenue for yesterday, June 27, 2025.

┌─────────────────┬────────────────────┐
│ Total Sale      │ Total Volume (FBM) │
├─────────────────┼────────────────────┤
│ $657,171 USD    │ 1,341,439          │
└─────────────────┴────────────────────┘

DETAILED STEP TIMINGS:
================================================================================
Step #   | Operation                                          | Duration (s)
--------------------------------------------------------------------------------
1        | Decision/Action: agent_think_node                  |         0.00
1        | Action: use_tool (Tool context: rag_tool)          |         0.30
2        | Decision/Action: agent_think_node                  |         1.65
2        | Action: use_tool (Tool context: sql_execution_tool)|         1.34
3        | Decision/Action: agent_think_node                  |         1.40
3        | Action: use_tool (Tool context: final_response_format_tool) |2.22
4        | Decision/Action: prepare_final_response            |         0.00
--------------------------------------------------------------------------------
Total Execution Time: 6.91 seconds

In this simple example,

  1. The RAG tool is executed first (step #1)
  2. The decision agent decides to execute a predefined SQL template (step #2)
  3. The decision agent decides it is ready to respond to the user (step #3)
  4. The user sees the answer

2.3.2 Multi-step strategic analysis query

📊 Click to view detailed execution trace
ORIGINAL QUERY: top customers 2025 by sales volume and revenue
CONVERSATION UUID: 903403e0-7e64-41ea-b8f3-fc950df66fd7
TIMESTAMP: 2025-06-28T14:21:03.859822-07:00

Tool Results:
success=True summary='RAG found tables, 8 knowledge chunks, 5 SQL examples, 7 unique values.' execution_time=1.380218267440796 error=None

DECISION AGENT OUTPUT:
----------------------------------------
Decision: use_tool
Tool: sql_generation_tool
Parameters: {'query': 'Find the top customers for the year 2025 by sales volume (fbm) and total revenue. Use the invoice and invoice_row tables, joining with the client table to get customer names. Account for invoice types (1 as positive, others as negative) for accurate revenue and volume calculation. Group by client name and order by total revenue and volume in descending order.', 'context': rag_session_id} 
Reason: The user is asking for top customers by sales volume and revenue for the year 2025. Since this is a historical query (for the entire year 2025), it requires querying the invoice and invoice_row tables. The RAG context provides a relevant knowledge chunk on how to calculate sales by customer using these tables, including handling invoice types for accurate revenue and volume. Therefore, I need to generate SQL to perform this aggregation.

TOOL EXECUTION: sql_execution_tool
----------------------------------------
SQL QUERY:

SELECT
    c.name AS CustomerName,
    curr.name AS Currency,
    SUM(CASE WHEN i.type = 1 THEN ir.fbm ELSE -ir.fbm END) AS TotalVolumeFBM,
    SUM(CASE WHEN i.type = 1 THEN ir.value ELSE -ir.value END) AS TotalRevenue
FROM
    invoice_row AS ir
JOIN
    invoice AS i ON ir.invoice_id = i.id
JOIN
    client AS c ON c.id = i.client_id
JOIN
    currency AS curr ON curr.id = i.currency_id
WHERE
    c.type = 1
    AND YEAR(i.created_on) = 2025
GROUP BY
    c.name,
    curr.name
ORDER BY
    TotalRevenue DESC,
    TotalVolumeFBM DESC;

TOOL RESULT:

SQL RESULTS:
[{"CustomerName":"X1","Currency":"USD","TotalVolumeFBM":22426799.9800000004,"TotalRevenue":13611708.2500000037},{"CustomerName":"X2","Currency":"USD","TotalVolumeFBM":16732286.6799999997,"TotalRevenue":10504964.1800000127},{"CustomerName":"X3","Currency":"USD","TotalVolumeFBM":14947068.4299999978,"TotalRevenue":9093859.2699999921},{"CustomerName":"X4","Currency":"USD","TotalVolumeFBM":8856291.3100000005,"TotalRevenue":5503902.8599999994},{"CustomerName":"X5","Currency":"USD","TotalVolumeFBM":6786221.3300000019,"TotalRevenue":4607624.4299999978},{"CustomerName":"X6","Currency":"USD","TotalVolumeFBM":6378095.9800000004,"TotalRevenue":4082330.8199999989},{"CustomerName":"X7","Currency":"USD","TotalVolumeFBM":6445345.3599999994,"TotalRevenue":4049806.4000000022},{"CustomerName":"X8","Currency":"USD","TotalVolumeFBM":6060298.6199999992,"TotalRev...}]

RESULT SUMMARY:
SQL query executed successfully in 0.47s, returned 76 rows.
Success: True

FINAL RESPONSE:
----------------------------------------
The user is asking to identify the top customers based on their total invoice volume in board feet (fbm) and total sales revenue for the year 2025.

┌──────────────────────────────────────┬─────────────────┬─────────────────────┐
│ Customer Name                        │ Total Revenue   │ Total Volume (FBM)  │
├──────────────────────────────────────┼─────────────────┼─────────────────────┤
│ X1                                   │ $13,611,708 USD │ 22,426,800          │
│ X2                                   │ $10,504,964 USD │ 16,732,287          │
│ X3                                   │ $9,093,859 USD  │ 14,947,068          │
│ X4                                   │ $5,503,903 USD  │ 8,856,291           │
│ X5                                   │ $4,607,624 USD  │ 6,786,221           │
│ X6                                   │ $4,082,331 USD  │ 6,378,096           │
│ X7                                   │ $4,049,806 USD  │ 6,445,345           │
│ X8                                   │ $3,680,910 USD  │ 6,060,299           │
│ X9                                   │ $3,511,024 USD  │ 5,747,740           │
│ X10                                  │ $3,001,423 USD  │ 4,769,825           │
└──────────────────────────────────────┴─────────────────┴─────────────────────┘

DETAILED STEP TIMINGS:
================================================================================
Step #   | Operation                                          | Duration (s)
--------------------------------------------------------------------------------
1        | Decision/Action: agent_think_node                  |         0.00
1        | Action: use_tool (Tool context: rag_tool)          |         1.38
2        | Decision/Action: agent_think_node                  |         3.39
2        | Action: use_tool (Tool context: sql_generation_tool)|        12.93
3        | Decision/Action: agent_think_node                  |         0.00
3        | Action: use_tool (Tool context: sql_execution_tool) |         0.47
4        | Decision/Action: agent_think_node                  |         1.50
4        | Action: use_tool (Tool context: final_response_format_tool)|17.16
5        | Decision/Action: agent_decision_node               |         0.00
6        | Decision/Action: prepare_final_response            |         0.00
--------------------------------------------------------------------------------
Total Execution Time: 36.84 seconds

In this complex example,

  1. RAG retrieval identifies relevant tables and business logic patterns (step #1)
  2. Decision agent reasoning analyzes query complexity and determines SQL generation is needed (step #2)
  3. Dynamic SQL generation creates complex joins with business logic for invoice types and customer classification (step #2)
  4. SQL execution processes the generated query against production database (step #3)
  5. Response formatting transforms raw results into business-friendly presentation (step #4)
  6. Quality validation ensures response completeness before delivery (step #5)

3. Claude Desktop MCP for demo

This step is optional: we can quickly build a demo by leveraging Claude Desktop and exposing our Woody API via MCP (Model Context Protocol).

Figure 3.1: Leveraging Claude Desktop demo #1


Figure 3.2: Leveraging Claude Desktop demo #1 (continued)

4. Value created demos

4.1 Asking the bigger question

Let’s try asking the ‘bigger question’ that I promised in the introduction. Instead of asking ‘How much did we sell today?’, let’s think bigger and ask: “Who are our top customers in 2025, and how should we strategize around them in the upcoming months to increase sales?”

The answer is impressive, with easy-to-understand plots and actionable items such as “Address June Performance Decline: Investigate Customer C’s 90% June drop”, which would be a critical point for managers to know.

Figure 4.1: Leveraging Claude Desktop demo #2

Figure 4.2: Leveraging Claude Desktop demo #2 interactive dashboard

4.2 Real user example

One of the accountant chased people everyday for past due invoices.

The question is: “Can you list past due invoices?” Let’s simply ask Woody to tell us more.

Figure 4.3: Leveraging Claude Desktop demo #3 real user example

5. Impact

5.1 Woody June 2025 stats (limited data, as Woody is fairly new)

Real user query accuracy: 85.1% (177 total queries)

+43.21% average weekly growth in user engagement

  • week 1: 38 total queries 6 users

  • week 2: 63 total queries 7 users

  • week 3: 76 total queries 7 users
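The 43.21% figure is the mean of the two week-over-week growth rates in total queries:

```python
# Average week-over-week growth in total queries across the three weeks.
weekly_queries = [38, 63, 76]
growth_rates = [
    (curr / prev) - 1
    for prev, curr in zip(weekly_queries, weekly_queries[1:])
]  # [+65.79%, +20.63%]
avg_growth = sum(growth_rates) / len(growth_rates)
print(f"{avg_growth:.2%}")  # 43.21%
```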

6. Conclusion

6.1 The Fundamental Choice: Horizontal vs. Vertical Development

Building Woody taught me that enterprise AI success hinges on a critical strategic choice: going vertical (deep user problem solving) takes priority over going horizontal (broad feature coverage).

Initially, I chose horizontal development. Faced with 81 tables and 588 columns, I thought comprehensive database coverage would automatically solve user problems. It was a good approach for building an MVP and collecting user data initially, but once we analyzed that data, it became clear that we needed to switch to a vertical approach.

Consider this simple conversation I had with one of our salesperson users. I asked him: “If I build an AI, what would you ask it?”

He said: “I’d ask how much sales I made today.”

“Why?”

“Because I want to see if I’m doing better than yesterday.”

“Why?”

“Because I want to see the trend for this week.”

“Why?”

“I have to see if my strategy is working - am I selling too high, too low, that sort of stuff.”

“Why?”

“I have to make sure my manager is happy.”

“Why?”

“Because my manager has to make sure our customers are happy.”

And there it was—every question, no matter how technical it appears on the surface, ultimately traces back to customer satisfaction.

Woody could answer many basic questions such as ‘How much did we sell today?’ and ‘How much is in inventory?’, but ultimately Woody needed to answer a much bigger question.

6.2 The Vertical Approach: Solving Real Problems

This insight fundamentally changed our development philosophy. Instead of building horizontally across our database ocean, we went vertical—solving specific, deep problems for specific user types.

The accountant didn’t need complex multi-table analytics. She needed daily reassurance that customer payments were on track, with insights to prevent relationship damage from late payments.

The salesperson didn’t need sophisticated inventory queries. He needed confidence that he could maintain customer relationships while managing rapid-fire negotiations.

The director didn’t need raw performance numbers. He needed early warning systems to help his team deliver better customer service.

None of them were asking for complex SQL with 10 different JOINs. They were asking: “How can I make our customers happier?”

6.3 Finally, let’s ask “how can I make our customers happier?”

Woody does a great job listing concrete, actionable items with targeted, number-backed reasons, not just generic advice.

AI response to ‘how can I make our customers happier?’ full report

📊 Click to expand/collapse the detailed customer happiness analysis report

Customer Happiness Analysis Report

Executive Summary: Based on analysis of your lumber business data spanning 6-12 months, three major improvement areas have been identified that can significantly enhance customer happiness and retention.

Key Performance Indicators

  • Order Fulfillment Time: 7 days average (last 30 days)
  • Customer Retention Rate: 88.16% repeat customers vs 11.84% new customers
  • Average Order Value: $16,000-$17,000 USD range for major customers
  • Order Completion Rate: 894 orders completed on time in last 30 days

Top Customer Portfolio

Your top 5 customers represent significant volume concentration:

  1. Customer A - 23.1M FBM
  2. Customer B - 17.2M FBM
  3. Customer C - 15.3M FBM
  4. Customer D - 9.6M FBM
  5. Customer E - 6.8M FBM

🚀 THREE KEY RECOMMENDATIONS FOR CUSTOMER HAPPINESS

1. IMPLEMENT PREDICTIVE INVENTORY MANAGEMENT

Problem Identified: Seasonal volume fluctuations (30M FBM peak vs 12M FBM low) suggest inventory challenges

Solution: Develop AI-powered demand forecasting for top-selling products, create automated reorder points, implement real-time inventory visibility for sales team

Customer Impact: Reduced stockouts, faster quote turnaround times, more accurate delivery promises

2. ENHANCE DELIVERY RELIABILITY & COMMUNICATION

Problem Identified: 7-day average fulfillment time with no visibility into on-time delivery rates

Solution: Implement delivery tracking system with customer notifications, create delivery performance dashboards, establish delivery time commitments by product type and region

Customer Impact: Improved planning capability for customers, reduced anxiety about order status, better project scheduling

3. DEVELOP CUSTOMER-SPECIFIC PRICING & SERVICE TIERS

Problem Identified: High customer concentration risk with limited differentiation

Solution: Create tiered service levels based on volume and loyalty, implement dynamic pricing, develop exclusive product access for top-tier customers

📊 Expected Outcomes

Short-term (3-6 months):

  • 15% improvement in order fulfillment time
  • 90% customer satisfaction on delivery reliability
  • 10% increase in repeat order frequency

Medium-term (6-12 months):

  • 20% reduction in customer churn
  • 25% improvement in average order value
  • 95% on-time delivery rate achievement

Long-term (12+ months):

  • 30% increase in customer lifetime value
  • Market leadership in customer service
  • Premium pricing capability vs competitors

🎯 Implementation Priority

  1. Phase 1 (Month 1-2): Delivery tracking and communication system
  2. Phase 2 (Month 2-4): Customer tier segmentation and service levels
  3. Phase 3 (Month 4-6): Predictive inventory management system

This analysis is based on actual business data and industry best practices. Regular monitoring and adjustment of these initiatives will be crucial for maximizing customer happiness and business growth.

TL;DR: Key Action Items

Based on the analysis, here are your key actions with specific reasons:

Immediate Actions (Next 30 Days)

  • Set up delivery tracking notifications - Your 7-day fulfillment time lacks visibility, causing customer anxiety about order status

  • Create inventory alerts for top 5 products - 2x4 and 2x6 Premium grades have 6,000+ orders but no stockout prevention system

  • Segment customers into 3 tiers by volume - 88% repeat customers deserve differentiated service based on their loyalty and volume

Short-term Actions (Next 90 Days)

  • Implement same-day quote response for top 10 customers - Major customers like A (23M FBM) need priority treatment to prevent churn

  • Deploy SMS/email order milestone updates - Customer communication gaps create uncertainty in project planning

  • Create dedicated account manager for Platinum tier - High-value customers generating $16K+ average orders need personal attention

Medium-term Actions (Next 6 Months)

  • Build predictive inventory dashboard - Monthly volume swings from 12M to 30M FBM require forecasting to prevent stockouts

  • Establish 95% on-time delivery KPI - No current delivery performance tracking means you can’t manage what you don’t measure

  • Develop volume-based pricing tiers - Customer retention at 88% could improve with loyalty incentives and exclusive pricing

  • Create customer portal with real-time order tracking - Reduce sales team interruptions from “where’s my order” calls

  • Set up automated reorder points - Prevent stockouts on your 6,471 most popular 2x4 Premium orders