Most teams have data scattered across multiple systems. Your customer transactions live in Snowflake, your product analytics are in BigQuery, and your application database is Postgres. Each system has its own connection requirements, query syntax quirks, and security models. Building AI agents that need data from all three? That's when things get complicated.
Pylar works with all the major data sources—BigQuery, Snowflake, Postgres, and more. This tutorial will walk you through connecting each one, and more importantly, what to consider when building views that span multiple systems. By the end, you'll understand how to unify your data sources so agents can query them seamlessly, regardless of where the data actually lives.
Why This Approach Works
Before we dive into the connection steps, let me explain why Pylar's approach to multi-source data works and how it fits into the bigger picture.
The Problem: Data Silos Create Complexity
Most teams I've worked with have data in multiple places:
- Data warehouses (Snowflake, BigQuery) for analytics
- Application databases (Postgres, MySQL) for operational data
- SaaS tools (HubSpot, Salesforce) for business data
Each system has different:
- Connection methods (service accounts, username/password, OAuth)
- Query syntax (some support certain functions, others don't)
- Security models (IP whitelisting, network policies, firewall rules)
- Performance characteristics (some are fast for analytics, others for transactions)
When you're building AI agents, you need data from all of them. The traditional approach? Write custom code for each system, handle authentication separately, manage connection pools, and hope the queries work the same way everywhere.
Why Pylar's Unified Approach Works
Pylar treats all data sources the same way in your SQL queries. Once connected, you reference them using the same syntax: connection_name.schema.table. Pylar handles the complexity of:
- Different connection protocols
- Different authentication methods
- Cross-database joins (joining BigQuery with Postgres with Snowflake)
- Query optimization across systems
This means:
- One SQL interface: Write queries the same way regardless of source
- Automatic joins: Join data across BigQuery, Snowflake, and Postgres in a single query
- Unified security: Manage access through Pylar's views, not individual database permissions
- Agent-friendly: Agents don't need to know about different systems—they just query views
Think of it like a universal adapter. You plug in different data sources, and Pylar makes them all work together seamlessly.
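To make the "one SQL interface" point concrete, here's roughly what a cross-source query looks like once two connections exist (the connection, schema, and table names below are placeholders for whatever you've actually connected):
-- Placeholder names, shown only to illustrate the connection_name.schema.table pattern
SELECT o.order_id, u.email
FROM postgres_production.orders o
JOIN bigquery_production.analytics.users u
  ON o.user_id = u.user_id
LIMIT 10;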
How Multi-Source Connections Fit into the Bigger Picture
Connecting multiple sources is the foundation of Pylar's "Data → Views → Tools → Agents" flow:
- Data: Connect BigQuery, Snowflake, Postgres, and other sources
- Views: Build unified views that join data across sources
- Tools: Create MCP tools that wrap your views
- Agents: Agents query tools, which query views, which query your connected sources
The key insight: agents don't need to know about BigQuery, Snowflake, or Postgres. They just query your views, and Pylar handles the complexity of joining data across systems.
Step-by-Step Guide
Let's connect BigQuery, Snowflake, and Postgres to Pylar. I'll walk you through each one, highlighting the differences and what to watch for.
Prerequisites
Before we start, make sure you have:
- ✅ A Pylar account with Analyst role or higher
- ✅ Access credentials for each data source you want to connect
- ✅ Ability to whitelist IP addresses (for databases/warehouses)
- ✅ Admin access or someone who can help with firewall/network configuration
Connecting BigQuery
BigQuery uses service account authentication, which is different from username/password. Here's how to set it up.
Step 1: Create a Service Account in Google Cloud
- Go to Google Cloud Console: Navigate to console.cloud.google.com
- Open IAM & Admin: Go to IAM & Admin → Service Accounts
- Create Service Account: Click "Create Service Account"
- Grant Permissions: Give it these roles:
  - BigQuery Data Viewer (to read data)
  - BigQuery Job User (to run queries)
- Create JSON Key:
  - Click on the service account
  - Go to the Keys tab
  - Click Add Key → Create new key
  - Select JSON format
  - Download the JSON file
Important: Keep this JSON file secure. It provides access to your BigQuery data.
Step 2: Connect in Pylar
- Navigate to Connections: In Pylar, go to the "Connections" tab in the sidebar
- Select BigQuery: Click on the BigQuery icon in the Databases section
- Enter Connection Details:
  - Name: bigquery_production (lowercase, no spaces)
  - Description: "Production BigQuery warehouse for analytics"
  - Project ID: Your Google Cloud Project ID (e.g., my-project-id-12345)
  - Service Account JSON: Paste the entire contents of the JSON key file you downloaded
- Whitelist Pylar IP: Add 34.122.205.142 to your Google Cloud firewall rules:
  - Go to VPC Network → Firewall Rules
  - Create a rule allowing 34.122.205.142
  - Or configure BigQuery dataset access to allow this IP
- Submit: Click "Submit" to establish the connection
Step 3: Wait for Indexing
After connecting, Pylar will index your BigQuery data. This may take a few minutes depending on data volume. You'll receive an email when it's complete.
What happens: Pylar indexes metadata (table names, column names, schemas) for autocompletion. Your data stays in BigQuery—queries execute on your BigQuery infrastructure.
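Once the indexing email arrives, it's worth running a quick sanity check from the Pylar editor (the dataset and table names here are placeholders; use any table in your project):
-- Placeholder dataset and table; swap in something that exists in your project
SELECT COUNT(*) AS row_count
FROM bigquery_production.analytics.events;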
Connecting Snowflake
Snowflake uses username/password authentication and requires a specific URL format. Here's how to set it up.
Step 1: Get Your Snowflake Account URL
- Open Snowflake UI: Log in to your Snowflake account
- Find Account URL: Look for your account URL in the browser address bar or Snowflake UI
- Format: It will look like https://abc12345.us-east-1.snowflakecomputing.com
- Trim the URL: Remove https:// and everything after .snowflakecomputing.com
  - Original: https://abc12345.us-east-1.snowflakecomputing.com
  - Trimmed: abc12345.us-east-1.snowflakecomputing.com
Important: Include the full account identifier including the region (e.g., .us-east-1).
Step 2: Whitelist Pylar IP in Snowflake
Critical: Do this before connecting, or the connection will fail.
- Log in as Admin: Access Snowflake as an administrator
- Go to Network Policies: Navigate to Admin → Security → Network Policies
- Create Policy: Create a new network policy or modify an existing one
- Add IP: Add 34.122.205.142 to the allowed IP addresses
- Apply Policy: Apply the network policy to your account or user
Alternative: If you don't have admin access, contact your Snowflake administrator to whitelist the IP.
Step 3: Connect in Pylar
- Navigate to Connections: In Pylar, go to the "Connections" tab
- Select Snowflake: Click on the Snowflake icon
- Enter Connection Details:
  - Account URL: The trimmed URL from Step 1 (e.g., abc12345.us-east-1.snowflakecomputing.com)
  - Database Name: The database you want to connect (e.g., PRODUCTION, ANALYTICS)
  - Username: Your Snowflake username
  - Password: Your Snowflake password
- Test Connection: Click "Test Connection" to verify it works
- Save: Once successful, click "Save" and give it a schema name (e.g., snowflake_production)
Step 4: Wait for Indexing
Pylar will index your Snowflake data. You'll receive an email when complete.
Best Practice: Create a dedicated Snowflake user for Pylar with read-only permissions. This enhances security and allows better access control.
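If you want a starting point, a dedicated read-only user might look roughly like this (the role, warehouse, and database names are illustrative; adjust them to your environment):
-- Illustrative sketch: a read-only role and user for Pylar
CREATE ROLE IF NOT EXISTS pylar_readonly;
GRANT USAGE ON WAREHOUSE analytics_wh TO ROLE pylar_readonly;
GRANT USAGE ON DATABASE production TO ROLE pylar_readonly;
GRANT USAGE ON ALL SCHEMAS IN DATABASE production TO ROLE pylar_readonly;
GRANT SELECT ON ALL TABLES IN DATABASE production TO ROLE pylar_readonly;
CREATE USER IF NOT EXISTS pylar_user PASSWORD = 'use-a-strong-password' DEFAULT_ROLE = pylar_readonly;
GRANT ROLE pylar_readonly TO USER pylar_user;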
Connecting Postgres
Postgres uses standard database credentials (host, port, username, password). Here's how to set it up.
Step 1: Prepare Your Postgres Database
- Get Connection Details: You'll need:
  - Host (e.g., db.example.com or 192.168.1.100)
  - Port (default: 5432)
  - Database name (e.g., production, analytics)
  - Username and password
- Create Dedicated User (recommended): Create a Postgres user specifically for Pylar:

CREATE USER pylar_user WITH PASSWORD 'secure_password';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO pylar_user;
Best Practice: Use a read-only user for Pylar. This limits what Pylar can access and improves security.
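If tables get added to that schema over time, you can also cover future tables. This is an optional sketch that builds on the pylar_user created above:
-- Optional: grant read access on future tables (created by the current role) in the public schema
GRANT USAGE ON SCHEMA public TO pylar_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO pylar_user;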
Step 2: Whitelist Pylar IP
Critical: Whitelist Pylar's IP before connecting.
Option 1: PostgreSQL pg_hba.conf (if you manage the server):
- Edit pg_hba.conf on your Postgres server
- Add: host all all 34.122.205.142/32 md5
- Reload: sudo systemctl reload postgresql
Option 2: Cloud Provider Firewall (if using cloud-hosted Postgres):
- AWS RDS: Add an inbound rule allowing PostgreSQL (port 5432) from 34.122.205.142
- Google Cloud SQL: Add 34.122.205.142/32 as an authorized network
- Azure Database: Add 34.122.205.142 as a client IP in the firewall settings
Step 3: Connect in Pylar
- Navigate to Connections: In Pylar, go to the "Connections" tab
- Select PostgreSQL: Click on the PostgreSQL icon
- Enter Connection Details:
  - Host: Your Postgres hostname or IP
  - Port: 5432 (or your custom port)
  - Database Name: The database you want to connect
  - Username: Your Postgres username
  - Password: Your Postgres password
- Submit: Click "Submit" to test and save the connection
- Schema Name: Give it a descriptive name (e.g., postgres_production)
Step 4: Start Querying
Once connected, you can immediately start querying. Postgres connections don't require indexing like warehouses do.
What happens: Pylar can query your Postgres database directly. Queries execute on your Postgres infrastructure.
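A quick smoke test right after connecting is a good habit (the table name is a placeholder; use any table pylar_user can read):
-- Placeholder table name
SELECT *
FROM postgres_production.users
LIMIT 5;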
Building Views Across Different Systems
Now that you have multiple sources connected, let's talk about building views that span them. This is where Pylar's unified approach really shines.
How Cross-Database Joins Work
Pylar handles cross-database joins automatically. You reference tables from different sources using: connection_name.schema.table.
Example: Join BigQuery analytics with Postgres user data:
SELECT
pg.user_id,
pg.email,
pg.signup_date,
bq.total_events,
bq.last_event_date
FROM postgres_production.users pg
JOIN bigquery_production.analytics.user_events bq
ON pg.user_id = bq.user_id
WHERE pg.is_active = true;
What happens: Pylar executes this query by:
- Querying Postgres for user data
- Querying BigQuery for event data
- Joining the results in memory
- Returning the unified result
The agent doesn't need to know about Postgres or BigQuery—it just queries the view.
Considerations When Building Multi-Source Views
I've seen teams make these mistakes when building views across systems. Here's what to watch for.
Consideration 1: Data Type Mismatches
Different systems use different data types. Postgres might use INTEGER, BigQuery might use INT64, Snowflake might use NUMBER. When joining, ensure types are compatible.
Solution: Use CAST or CONVERT to normalize types:
SELECT
pg.customer_id,
sf.account_id
FROM postgres_production.customers pg
JOIN snowflake_production.accounts sf
ON CAST(pg.customer_id AS VARCHAR) = CAST(sf.account_id AS VARCHAR);
Consideration 2: NULL Handling
Different systems handle NULLs differently. Some use empty strings, others use NULL. When joining, NULLs won't match.
Solution: Use COALESCE to handle NULLs:
SELECT
COALESCE(pg.email, bq.email, sf.email) AS unified_email,
pg.customer_id,
bq.total_revenue,
sf.opportunity_value
FROM postgres_production.customers pg
FULL OUTER JOIN bigquery_production.analytics.revenue bq
ON pg.email = bq.email
FULL OUTER JOIN snowflake_production.sales.opportunities sf
ON COALESCE(pg.email, bq.email) = sf.email;
Consideration 3: Performance Across Systems
Queries that join data across systems can be slower because:
- Data must be fetched from multiple sources
- Joins happen in memory (not in the database)
- Network latency between systems
Solution:
- Filter early: Add WHERE clauses in subqueries to reduce data volume
- Aggregate before joining: Pre-calculate metrics in each source
- Limit date ranges: Only query recent data when possible
Example:
SELECT
pg.customer_id,
pg.email,
rev.total_revenue,
pipe.total_opportunities
FROM postgres_production.customers pg
LEFT JOIN (
SELECT
customer_id,
SUM(amount) AS total_revenue
FROM bigquery_production.analytics.transactions
WHERE transaction_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY customer_id
) rev ON pg.customer_id = rev.customer_id
LEFT JOIN (
SELECT
account_id,
COUNT(*) AS total_opportunities
FROM snowflake_production.sales.opportunities
WHERE created_date >= DATEADD(day, -90, CURRENT_DATE())
GROUP BY account_id
) pipe ON pg.customer_id = pipe.account_id
WHERE pg.is_active = true;
Consideration 4: Schema Differences
Different systems use different naming conventions. Postgres might use snake_case, BigQuery might use camelCase, Snowflake might use UPPER_CASE.
Solution: Use aliases to normalize names:
SELECT
pg.customer_id,
pg.customer_name AS name,
bq.totalRevenue AS total_revenue,
sf.ACCOUNT_STATUS AS account_status
FROM postgres_production.customers pg
JOIN bigquery_production.analytics.revenue bq ON pg.customer_id = bq.customerId
JOIN snowflake_production.accounts sf ON pg.customer_id = sf.ACCOUNT_ID;
Consideration 5: Time Zone Handling
Different systems might store timestamps in different time zones or formats. This can cause issues when joining on time-based fields.
Solution: Normalize timestamps to UTC:
SELECT
pg.user_id,
pg.created_at AS signup_time_utc,
bq.first_event_time AS first_event_utc
FROM postgres_production.users pg
JOIN bigquery_production.analytics.events bq
ON pg.user_id = bq.user_id
AND DATE(pg.created_at) = DATE(bq.first_event_time);
Best Practices for Multi-Source Views
Here's a checklist I use when building views across systems:
- ✅ Start Simple: Begin with two sources, then add more
- ✅ Test Incrementally: Test each join separately before combining
- ✅ Use Descriptive Names: Name connections clearly (bigquery_prod, postgres_analytics)
- ✅ Filter Early: Add WHERE clauses in subqueries to reduce data volume
- ✅ Handle NULLs: Use COALESCE for unified identifiers
- ✅ Normalize Types: Cast types to ensure compatibility
- ✅ Document Assumptions: Add comments explaining join logic
- ✅ Monitor Performance: Use Pylar Evals to see query performance
Real-World Examples
Let me show you how different teams use multi-source connections in practice.
Example 1: Customer 360 View
A team connects Postgres (application data), BigQuery (analytics), and Snowflake (sales data) to create a unified customer view.
Setup:
- Connect Postgres (customer profiles, signups)
- Connect BigQuery (event analytics, product usage)
- Connect Snowflake (sales pipeline, opportunities)
View:
SELECT
-- Postgres: Customer profile
pg.customer_id,
pg.email,
pg.signup_date,
pg.plan_type,
-- BigQuery: Product usage
bq.total_events,
bq.last_active_date,
bq.feature_usage_count,
-- Snowflake: Sales pipeline
sf.total_opportunities,
sf.total_pipeline_value,
sf.last_opportunity_date,
-- Calculated segments
CASE
WHEN bq.total_events > 100 AND sf.total_pipeline_value > 10000 THEN 'High Value Active'
WHEN bq.total_events > 50 THEN 'Active User'
WHEN sf.total_opportunities > 0 THEN 'In Pipeline'
ELSE 'Low Engagement'
END AS customer_segment
FROM postgres_production.customers pg
LEFT JOIN (
SELECT
user_id,
COUNT(*) AS total_events,
MAX(event_date) AS last_active_date,
COUNT(DISTINCT feature_name) AS feature_usage_count
FROM bigquery_production.analytics.events
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY user_id
) bq ON pg.customer_id = bq.user_id
LEFT JOIN (
SELECT
account_id,
COUNT(*) AS total_opportunities,
SUM(amount) AS total_pipeline_value,
MAX(created_date) AS last_opportunity_date
FROM snowflake_production.sales.opportunities
WHERE stage != 'Closed Lost'
GROUP BY account_id
) sf ON pg.customer_id = sf.account_id
WHERE pg.is_active = true;
How agents use it: Agents can answer questions like "Show me high-value active customers who haven't been contacted recently" without knowing about Postgres, BigQuery, or Snowflake.
Example 2: Product Analytics Dashboard
A product team connects BigQuery (event data) and Postgres (user data) to analyze feature adoption.
Setup:
- Connect BigQuery (product events, feature usage)
- Connect Postgres (user segments, subscription data)
View:
SELECT
pg.user_segment,
pg.subscription_tier,
bq.feature_name,
COUNT(DISTINCT bq.user_id) AS users_using_feature,
COUNT(*) AS total_events,
AVG(bq.event_value) AS avg_event_value
FROM bigquery_production.analytics.feature_events bq
JOIN postgres_production.users pg
ON bq.user_id = pg.user_id
WHERE bq.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY pg.user_segment, pg.subscription_tier, bq.feature_name
ORDER BY users_using_feature DESC;
How agents use it: Agents can answer "Which features are most popular among enterprise users?" by querying this view.
Example 3: Revenue Operations View
A RevOps team connects Snowflake (sales data), BigQuery (marketing data), and Postgres (customer data) to track revenue attribution.
Setup:
- Connect Snowflake (sales opportunities, deals)
- Connect BigQuery (marketing campaigns, attribution)
- Connect Postgres (customer accounts, contracts)
View:
SELECT
pg.account_id,
pg.account_name,
pg.contract_value,
pg.contract_start_date,
sf.total_won_deals,
sf.total_revenue,
bq.attributed_campaigns,
bq.marketing_spend,
bq.attribution_revenue
FROM postgres_production.accounts pg
LEFT JOIN (
SELECT
account_id,
COUNT(*) AS total_won_deals,
SUM(amount) AS total_revenue
FROM snowflake_production.sales.deals
WHERE stage = 'Closed Won'
GROUP BY account_id
) sf ON pg.account_id = sf.account_id
LEFT JOIN (
SELECT
account_id,
STRING_AGG(campaign_name, ', ') AS attributed_campaigns,
SUM(spend) AS marketing_spend,
SUM(attributed_revenue) AS attribution_revenue
FROM bigquery_production.marketing.attribution
GROUP BY account_id
) bq ON pg.account_id = bq.account_id
WHERE pg.is_active = true;
How agents use it: Agents can answer "What's the ROI of our marketing campaigns by account?" by querying this unified view.
Notice how each example uses the same pattern: connect sources, build unified views, create tools, agents query tools. The complexity of different systems is hidden from the agents.
Common Pitfalls & Tips
I've seen teams make these mistakes when connecting multiple sources. Here's how to avoid them.
Pitfall 1: Not Whitelisting IP Addresses
This is the #1 reason connections fail. BigQuery, Snowflake, and Postgres all require IP whitelisting.
Why this matters: Without whitelisting, Pylar can't connect to your databases. The connection test will fail, and you'll spend time debugging credentials when the real issue is network access.
How to avoid it: Always whitelist 34.122.205.142 before connecting. Do it first, then test the connection.
Pitfall 2: Using Production Credentials
Don't use your personal database credentials or production admin accounts for Pylar connections.
Why this matters: If credentials are compromised, attackers have full access. Also, you can't revoke access without affecting other systems.
How to avoid it: Create dedicated users for Pylar:
- BigQuery: Service account with BigQuery Data Viewer and BigQuery Job User roles
- Snowflake: Dedicated user with read-only permissions
- Postgres: Dedicated user with SELECT permissions only
Pitfall 3: Not Testing Connections Incrementally
Don't connect all sources at once, then try to build a complex view.
Why this matters: If something breaks, you won't know which connection is the problem. Debugging becomes a nightmare.
How to avoid it: Connect one source at a time:
- Connect BigQuery → Test with a simple query
- Connect Snowflake → Test with a simple query
- Connect Postgres → Test with a simple query
- Build a view joining two sources → Test
- Add the third source → Test
Pitfall 4: Ignoring Performance Implications
Cross-database joins can be slow, especially with large datasets.
Why this matters: Slow queries mean slow agents. Users get frustrated, and you end up optimizing under pressure.
How to avoid it:
- Filter early: Add WHERE clauses in subqueries
- Aggregate before joining: Pre-calculate metrics
- Limit date ranges: Only query recent data
- Test query performance: Use Pylar's query execution time to identify slow queries
Pitfall 5: Not Handling Schema Differences
Different systems use different naming conventions and data types.
Why this matters: Joins fail silently or return incorrect results when types don't match or names are inconsistent.
How to avoid it:
- Normalize names: Use aliases to create consistent column names
- Cast types: Use CAST or CONVERT to ensure compatibility
- Test joins: Verify join keys match before building complex views (a quick check is sketched below)
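One cheap check is an anti-join that counts rows on one side with no match on the other; the connection and table names below mirror the earlier examples and are placeholders:
-- Placeholder names: count Postgres customers with no matching Snowflake account
SELECT COUNT(*) AS unmatched_customers
FROM postgres_production.customers pg
LEFT JOIN snowflake_production.accounts sf
  ON CAST(pg.customer_id AS VARCHAR) = CAST(sf.account_id AS VARCHAR)
WHERE sf.account_id IS NULL;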
Best Practices Summary
Here's a quick checklist:
- ✅ Whitelist IP first: Always whitelist 34.122.205.142 before connecting
- ✅ Use dedicated users: Create read-only users for Pylar
- ✅ Test incrementally: Connect and test one source at a time
- ✅ Filter early: Add WHERE clauses in subqueries
- ✅ Handle NULLs: Use COALESCE for unified identifiers
- ✅ Normalize types: Cast types to ensure compatibility
- ✅ Monitor performance: Use Evals to see query execution times
- ✅ Document assumptions: Add comments explaining join logic
Next Steps
You've connected BigQuery, Snowflake, and Postgres to Pylar. That's the foundation. Now you can:
- Build unified views: Create views that join data across all three sources, giving agents a single source of truth.
- Create MCP tools: Turn your views into MCP tools that agents can use. The previous tutorials covered this.
- Monitor usage: Use Pylar Evals to see how agents are querying your multi-source views and identify optimization opportunities.
- Iterate and improve: As you learn how agents use your views, refine them to better match their needs. Add filters, optimize joins, or add calculated fields.
The key is to start simple and iterate. Your first multi-source view doesn't need to be perfect—it just needs to work. You can always refine it later based on real usage.
If you want to keep going, the next step is building views that span multiple sources. That's where you'll see the real value—unifying your data so agents can answer questions that require data from multiple systems.
Frequently Asked Questions
Can I connect multiple instances of the same source?
Do I need to whitelist the IP for each connection?
How do cross-database joins work?
Will joining data across systems be slow?
Can I join BigQuery with Snowflake with Postgres in one query?
What if my connection credentials change?
Can I see which queries are hitting which sources?
What if I need to connect a source that's not listed?
Related Posts
How to Build Your First Data View in Pylar
Get started with Pylar in minutes. We'll walk you through creating your first SQL view, connecting a data source, and setting up basic access controls.
How to Build Your First MCP Tool on a Data View
Turn your data view into an MCP tool that agents can actually use. This step-by-step guide shows you how to publish a view as a tool in under 10 minutes.
How to Publish a Pylar Tool to the OpenAI Agent Builder
Deploy your Pylar MCP tool to OpenAI's Agent Builder so you can use it in custom GPTs and agent workflows. We'll show you exactly how to connect and test it.
