We've seen teams rush to connect AI agents directly to databases, only to discover the real costs: security risks, governance nightmares, and agents making expensive mistakes. Here's what we learned and why a structured layer matters.
The Promise vs. The Reality
When we first started building AI agents, the path seemed obvious: connect them directly to your database, give them SQL access, and watch them work magic. After all, agents are smart. They can write queries. They understand data structures. What could go wrong?
A lot, as it turns out.
I've watched teams spend weeks building agent workflows, only to discover that giving agents raw database access creates problems that compound over time. The initial setup feels fast—just a connection string and you're done. But the hidden costs start showing up in week two, month three, and they never stop.
The real issue isn't that agents can't query databases. They can. The problem is that agents don't understand your business logic, your compliance requirements, or your data relationships. They'll write queries that work, but they might expose sensitive data, violate regulations, or create performance nightmares.
What Happens When Agents Write Their Own Queries
Let me walk you through what we've seen happen in production.
The Incorrect Join Problem
Agents are great at pattern matching, but they're not great at understanding your specific data model. I've seen agents write queries like this:
SELECT
customers.email,
orders.total,
payments.card_number
FROM customers
JOIN orders ON customers.id = orders.customer_id
JOIN payments ON orders.id = payments.order_id
WHERE customers.status = 'active'
Looks fine, right? Except in this case, each order can have many rows in the payments table, and the join fanned out accordingly: the agent pulled every payment record for every order. The customer got back 47 payment records when they should have gotten one. The agent didn't understand that it needed to aggregate or filter.
Worse, it exposed card_number—data that should never leave your database in plain text, let alone be returned to an AI agent that might log it or include it in responses.
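For contrast, here's a sketch of what a safer version of that query might look like: aggregating payments per order and leaving card data out entirely. The payments.amount and payments.id columns are assumptions for illustration; your schema will differ.

SELECT
  customers.email,
  orders.total,
  SUM(payments.amount) AS amount_paid,  -- aggregate instead of fanning out one row per payment
  COUNT(payments.id) AS payment_count
FROM customers
JOIN orders ON customers.id = orders.customer_id
JOIN payments ON orders.id = payments.order_id
WHERE customers.status = 'active'
GROUP BY customers.email, orders.id, orders.total;
-- card_number is gone: it never needed to leave the database

The point isn't that the fix is hard. It's that the agent had no way to know the fix was needed.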
The Sensitive Data Leak
Here's another one we caught early: an agent was asked to "get customer information for support." The agent wrote:
SELECT * FROM customers WHERE email = 'user@example.com'
That SELECT * pulled everything: email, phone, address, internal notes, credit scores, account balances, and notes from the sales team that included competitive intelligence. The agent had no way to know that internal_notes shouldn't be exposed. It just queried what was available.
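The fix is boringly simple, which is the point: name the columns. A minimal sketch, assuming hypothetical column names:

SELECT
  customer_id,
  email,
  name,
  subscription_status  -- only what support actually needs; no internal_notes, no credit scores
FROM customers
WHERE email = 'user@example.com';

But an agent won't write this on its own, because nothing in the schema tells it which columns are safe.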
The Performance Killer
Agents don't think about query optimization. They think about getting results. I've seen agents write queries that:
- Scan entire tables without indexes
- Create Cartesian products with multiple joins
- Run aggregations on millions of rows without limits
- Execute N+1 query patterns without realizing it
One team we worked with had an agent that was querying their customer table with a full table scan on every request. It worked fine in development with 1,000 customers. In production with 2 million customers? The database started timing out. The agent kept retrying, creating a cascade of failures.
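The remedies here are unglamorous database hygiene, the kind a governed layer can enforce for you. A sketch, assuming a hypothetical customers table with an email column:

-- index the column agents actually filter on
CREATE INDEX idx_customers_email ON customers (email);

-- and bound every lookup so a bad filter can't return millions of rows
SELECT customer_id, email, subscription_status
FROM customers
WHERE email = 'user@example.com'
LIMIT 100;

An agent writing its own SQL applies neither of these by default.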
The Security Costs You Don't See Coming
When you give an agent direct database access, you're not just giving it read access. You're giving it the ability to:
Access Any Table
There's no way to tell an agent "you can query customers, but not the internal_audit_log table." Once it has database credentials, it can query anything. You might think you've locked it down with database user permissions, but:
- Database permissions are coarse-grained. You can't easily say "this agent can only see customers from the last 90 days" or "this agent can't see PII fields."
- Agents will explore. They'll query tables you didn't intend them to access, just to understand the schema.
- Mistakes happen. An agent might query the wrong table, or join to a table that contains sensitive data.
Expose Credentials
This one seems obvious, but it's worth saying: when you give an agent database credentials, those credentials are stored somewhere. In environment variables, in config files, in the agent's memory. Every place you store credentials is a potential leak point.
We've seen teams store database passwords in:
- Environment variables that get committed to Git
- Config files that get shared in Slack
- Agent builder platforms that log all inputs (including connection strings)
Create Audit Trail Gaps
When an agent queries your database directly, those queries might not show up in your application logs. They might not be tied to a user session. They might not include context about why the query was made.
If you need to answer "who accessed this customer's data and why?" you might not be able to. The agent made the query, but you don't have the context of what the user asked for, what the agent was trying to accomplish, or whether the query was appropriate.
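To make that concrete, here's the kind of context a useful audit record has to capture. The table below is a hypothetical sketch, not a prescribed schema:

CREATE TABLE agent_query_audit (
  logged_at    TIMESTAMP,     -- when the query ran
  end_user_id  VARCHAR(64),   -- the person whose request triggered it
  agent_id     VARCHAR(64),   -- which agent executed it
  view_name    VARCHAR(128),  -- which governed view it went through
  purpose      TEXT,          -- why the data was accessed
  query_text   TEXT           -- the query itself
);

With direct database access, most of these columns are simply unknowable after the fact.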
The Governance Nightmare
Compliance isn't optional. GDPR, HIPAA, SOC 2, PCI-DSS—they all require you to know who accessed what data, when, and why. That's why governed access matters for AI agents.
The "Who" Problem
When an agent queries your database, who is the user? Is it the person who asked the question? The agent itself? The system? Your audit logs need to show a real person, but agents blur that line.
The "What" Problem
You need to prove that agents only access data they're allowed to access. But with direct database access, how do you enforce that? Database-level permissions are too coarse. You can't say "this agent can only see customers in the US" or "this agent can't see credit card numbers."
The "Why" Problem
Compliance frameworks want to know why data was accessed. Was it for a legitimate business purpose? With agents making queries autonomously, it's hard to tie each query back to a specific user request or business need.
The "When" Problem
Some regulations require you to limit data retention. You might need to ensure agents can't query data older than 90 days. With direct database access, you'd need to modify every query the agent makes, or create complex database views for every possible use case.
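With a view layer, that retention rule becomes a one-line filter instead of per-query surgery. A sketch, using hypothetical table and column names:

CREATE VIEW recent_customers AS
SELECT customer_id, email, name, signup_date
FROM customers
WHERE signup_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY);
-- agents querying this view physically cannot reach older rows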
The Performance Tax
Agents don't think about your database's capacity. They think about getting answers.
The Unbounded Query
Agents will write queries without limits. "Get all customers" becomes a query that returns 2 million rows. "Get all orders" becomes a query that scans your entire orders table.
We've seen agents:
- Query entire tables to "understand the data"
- Run expensive aggregations without date filters
- Create queries that lock tables for minutes
- Generate so many queries that they exhaust connection pools
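Compare that with what a governed view can bake in: date filters and hard caps that hold no matter what the agent asks for. A hypothetical sketch:

-- revenue per customer, bounded by date and capped in size
SELECT customer_id, SUM(total) AS revenue_90d
FROM orders
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY customer_id
ORDER BY revenue_90d DESC
LIMIT 1000;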
The Cascade Failure
When an agent's query is slow, the agent might retry. If it retries 3 times, and 10 users are using the agent simultaneously, you've got 30 slow queries running at once. Your database starts to struggle. Other applications start timing out. The problem cascades.
The Cost Explosion
If you're using a cloud database like BigQuery or Snowflake, every query costs money. Agents that write inefficient queries can rack up bills fast. We've seen teams get surprised by 10x increases in database costs after giving agents direct access.
One team had an agent that was running a full table scan on a 500GB table every time it needed to "verify customer information." That query cost $50 each time. The agent was being called 100 times a day. You do the math: $5,000 a day.
Real Costs: Time, Money, and Trust
Let's talk about the actual costs we've seen teams pay.
Time Costs
Week 1-2: Setting up direct database access feels fast. Maybe 2 hours of work.
Week 3-4: You start noticing issues. Queries are slow. Agents are returning wrong data. You spend 10 hours debugging.
Month 2: You realize you need governance. You spend 40 hours building database views, setting up permissions, creating audit logs.
Month 3: Compliance asks for a data access audit. You spend 20 hours trying to figure out what agents accessed and why.
Month 6: A security review finds that agents have been accessing sensitive data. You spend 80 hours fixing the issue, updating all your queries, and documenting what happened.
Total: 150+ hours that could have been avoided.
Money Costs
- Database costs: Inefficient queries can 5-10x your database bills
- Engineering time: Debugging and fixing issues costs $50-200/hour
- Compliance fines: GDPR violations can be up to 4% of annual revenue
- Security incidents: Data breaches cost an average of $4.45 million (IBM, 2023)
Trust Costs
When agents return wrong data, users stop trusting them. When agents expose sensitive information, security teams stop trusting your approach. When compliance finds gaps, leadership stops trusting the project.
Rebuilding trust takes months, sometimes years.
Why Structured Endpoints Solve This
The solution isn't to avoid giving agents data access. It's to give them the right kind of access.
Instead of raw database access, agents should query through structured endpoints—governed views that you define, control, and monitor.
What Structured Endpoints Are
Think of a structured endpoint as a window into your data. You define exactly what the agent can see through that window. The agent can query through the window, but it can't see anything outside of it.
In practice, this means:
- You define SQL views that specify exactly what data agents can access
- You control the schema—what columns are included, what rows are filtered
- You govern access—who can query what, when, and why
- You monitor everything—every query is logged, every access is tracked
The Benefits
Security: Agents can only access data you've explicitly allowed. No accidental exposure of sensitive tables or columns.
Governance: Every query goes through your defined views, so you have complete audit trails. You know who accessed what, when, and through which view.
Performance: You can optimize your views for common query patterns. You can add indexes, set limits, and cache results.
Compliance: You can prove that agents only access appropriate data. You can show auditors exactly what data is available and how access is controlled.
Iteration: When you need to change what agents can access, you update the view. All agents automatically get the update. No code changes needed.
How Pylar Prevents These Costs
Pylar is built around the idea of structured endpoints. Here's how it prevents the costs we've been talking about.
Views Are the Only Access Level
In Pylar, agents never get raw database access. They can only query through SQL views that you create in Pylar's SQL IDE. This means:
- Agents can't query tables you haven't included in a view
- Agents can't access columns you've excluded from a view
- Agents can't join to tables that expose sensitive data
- Every query goes through your governance layer
For example, instead of giving an agent access to your entire customers table, you might create a view like this:
CREATE VIEW customer_support_view AS
SELECT
customer_id,
email,
name,
signup_date,
subscription_status,
last_login_date
FROM customers
WHERE is_active = true
AND signup_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 YEAR);
This view:
- Only includes columns the agent needs
- Filters out inactive customers
- Limits to customers from the last 2 years (for compliance)
- Excludes sensitive fields like credit_card_number or internal_notes
The agent can query this view, but it can't access anything outside of it.
Built-in Governance
Pylar views are governed by design. When you create a view, you're making an explicit decision about what data agents can access. This decision is:
- Documented: The view definition is the documentation
- Versioned: You can see how views have changed over time
- Auditable: Every query against a view is logged
- Testable: You can test views before agents use them
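That last point is worth making a habit: before pointing any agent at a view, eyeball its output yourself. Even something as simple as:

-- confirm the view exposes exactly the columns you intended, nothing more
SELECT * FROM customer_support_view LIMIT 5;

SELECT * is safe here precisely because the view, not the agent, decides which columns exist.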
MCP Tools with Controlled Parameters
Once you have a view, you create MCP tools on top of it. These tools define how agents interact with your data. You can use Pylar's AI to create tools from natural language, or configure them manually. Learn more in our MCP tools documentation.
For example, you might create a tool called get_customer_info that:
- Takes a customer email as input
- Queries your customer_support_view
- Returns only the fields the agent needs
- Includes error handling and validation
The tool acts as another layer of control. Even if an agent tries to query the view directly, it has to go through the tool, which validates inputs and controls outputs.
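Under the hood, you can think of such a tool as running a parameterized query like the sketch below. The :customer_email placeholder is illustrative, not Pylar's actual syntax:

SELECT customer_id, email, name, subscription_status
FROM customer_support_view
WHERE email = :customer_email  -- bound parameter: no string concatenation, no injection
LIMIT 1;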
Cross-Database Joins Without Exposure
One of Pylar's powerful features is the ability to join data across multiple databases in a single view. For example, you might join:
- Customer data from HubSpot (your CRM)
- Order data from Snowflake (your data warehouse)
- Support ticket data from Zendesk
All in one view, without exposing credentials or giving agents access to the underlying systems.
SELECT
h.customer_id,
h.customer_name,
h.email,
s.order_count,
s.total_revenue,
z.open_tickets,
z.last_ticket_date
FROM hubspot.customers h
LEFT JOIN snowflake.order_summary s
ON h.email = s.customer_email
LEFT JOIN zendesk.ticket_summary z
ON h.email = z.customer_email
WHERE h.is_active = true;
The agent gets a unified view of customer data, but it never has direct access to HubSpot, Snowflake, or Zendesk. You control what data is joined and how.
Evals: Visibility into Agent Behavior
Pylar's Evals system gives you complete visibility into how agents are using your data. You can see:
- Success rates: How often queries succeed vs. fail
- Error patterns: What errors occur and why
- Query shapes: What types of queries agents are making
- Raw logs: Every query with full context
This visibility lets you:
- Identify problems before they become incidents
- Optimize views based on actual usage patterns
- Prove compliance with audit trails
- Improve agent performance over time
For example, if you see that agents are frequently querying for customers by email, you might optimize your view to include an index on email. Or if you see errors when agents try to query deleted customers, you might update your view to handle that case.
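Both of those refinements are small, concrete changes. As a hypothetical sketch, assuming a deleted_at column exists on the base table:

-- speed up the lookup pattern Evals revealed (the index lives on the base table)
CREATE INDEX idx_customers_email ON customers (email);

-- and handle the deleted-customer errors by filtering them out of the view
CREATE OR REPLACE VIEW customer_support_view AS
SELECT customer_id, email, name, signup_date, subscription_status, last_login_date
FROM customers
WHERE is_active = true
  AND deleted_at IS NULL
  AND signup_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 YEAR);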
Publishing Once, Using Everywhere
When you publish a Pylar tool, you get:
- An MCP server URL
- An authorization token
You can use these credentials to connect the tool to any agent builder:
- Claude Desktop
- Cursor
- LangGraph
- OpenAI Agent Builder
- Zapier
- Make
- n8n
- Any MCP-compatible platform
The key benefit: you update your views and tools in Pylar, and all connected agents automatically get the updates. No need to redeploy code or update configurations in multiple places.
The Path Forward
If you're thinking about giving agents database access, here's what I'd recommend:
Start with Views, Not Raw Access
Before you connect an agent to your database, create a view that defines exactly what data the agent needs. Start narrow—include only the columns and rows the agent actually needs. You can always expand later.
Test with Non-Sensitive Data First
Create views that exclude sensitive data. Test your agents with these views. Once you're confident the agents work correctly, you can gradually add more data if needed.
Monitor Everything
Use Pylar's Evals (or similar observability) to watch how agents use your data. Look for:
- Queries that fail frequently
- Queries that are slow
- Queries that return unexpected results
- Patterns that suggest agents are confused about your data model
Iterate Based on Real Usage
Don't try to build the perfect view upfront. Start simple, deploy to agents, watch how they use it, and iterate. The Evals data will tell you what to optimize.
Think About Compliance from Day One
If you're in a regulated industry, think about compliance requirements when you create views. Can you prove agents only access appropriate data? Can you audit what was accessed? Can you limit data retention?
Pylar makes this easier, but you still need to think about it.
Frequently Asked Questions
Can't I just use database user permissions to restrict access?
Not effectively. Database permissions are coarse-grained: you can grant or deny whole tables, but you can't easily express rules like "only customers from the last 90 days" or "no PII fields for this agent." Views let you encode those rules directly.
What if I need to change what data agents can access?
Update the view. Every agent connected to the published tool picks up the change automatically; there's no code to redeploy and no configuration to touch.
How do I know if agents are using my data correctly?
Watch the Evals data. Success rates, error patterns, query shapes, and raw logs show you exactly how agents are querying and where they're struggling.
What if I need to join data from multiple sources?
Create a view that joins across databases. CRM, warehouse, and support data can live in one view without agents ever touching the underlying systems or their credentials.
How do I connect Pylar tools to my agent builder?
Publish the tool to get an MCP server URL and an authorization token, then add those to any MCP-compatible platform, from Claude Desktop and Cursor to LangGraph, Zapier, Make, and n8n.
What if an agent writes a bad query?
It can only be bad within the boundaries you set. The agent queries the view, not the database, so the blast radius is limited to data you've already approved, and the failure shows up in your Evals logs.
How do I handle compliance requirements?
Views are documented, versioned, and auditable by design. Every query is logged, so you can show auditors what data agents can access, who accessed it, when, and through which view.
Can I use Pylar with my existing agent infrastructure?
Yes. Published tools speak MCP, so they work with any MCP-compatible platform you already use.
The question isn't whether you'll need governance for AI agents—it's whether you'll build it before or after your first costly mistake. A structured data layer like Pylar gives you the control you need without sacrificing the speed you want.
If you're building AI agents that need data access, start with views, not raw database connections. Your future self will thank you.