Using Pylar with BigQuery, Snowflake, and Postgres

by Hoshang Mehta

Most teams have data scattered across multiple systems. Your customer transactions live in Snowflake, your product analytics are in BigQuery, and your application database is Postgres. Each system has its own connection requirements, query syntax quirks, and security models. Building AI agents that need data from all three? That's when things get complicated.

Pylar works with all the major data sources—BigQuery, Snowflake, Postgres, and more. This tutorial will walk you through connecting each one, and more importantly, what to consider when building views that span multiple systems. By the end, you'll understand how to unify your data sources so agents can query them seamlessly, regardless of where the data actually lives.


Why This Approach Works

Before we dive into the connection steps, let me explain why Pylar's approach to multi-source data works and how it fits into the bigger picture.

The Problem: Data Silos Create Complexity

Most teams I've worked with have data in multiple places:

  • Data warehouses (Snowflake, BigQuery) for analytics
  • Application databases (Postgres, MySQL) for operational data
  • SaaS tools (HubSpot, Salesforce) for business data

Each system has different:

  • Connection methods (service accounts, username/password, OAuth)
  • Query syntax (some support certain functions, others don't)
  • Security models (IP whitelisting, network policies, firewall rules)
  • Performance characteristics (some are fast for analytics, others for transactions)

When you're building AI agents, you need data from all of them. The traditional approach? Write custom code for each system, handle authentication separately, manage connection pools, and hope the queries work the same way everywhere.

Why Pylar's Unified Approach Works

Pylar treats all data sources the same way in your SQL queries. Once connected, you reference them using the same syntax: connection_name.schema.table. Pylar handles the complexity of:

  • Different connection protocols
  • Different authentication methods
  • Cross-database joins (joining BigQuery with Postgres with Snowflake)
  • Query optimization across systems

This means:

  • One SQL interface: Write queries the same way regardless of source
  • Automatic joins: Join data across BigQuery, Snowflake, and Postgres in a single query
  • Unified security: Manage access through Pylar's views, not individual database permissions
  • Agent-friendly: Agents don't need to know about different systems—they just query views

Think of it like a universal adapter. You plug in different data sources, and Pylar makes them all work together seamlessly.
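
Concretely, a single query can pull from two different connections at once. Here's a quick sketch using the connection names from this tutorial (the table and column names are illustrative):

SELECT
  u.user_id,
  u.email,
  e.event_name
FROM postgres_production.users u                      -- application database
JOIN bigquery_production.analytics.user_events e      -- analytics warehouse
  ON u.user_id = e.user_id
LIMIT 100;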

How Multi-Source Connections Fit into the Bigger Picture

Connecting multiple sources is the foundation of Pylar's "Data → Views → Tools → Agents" flow:

  1. Data: Connect BigQuery, Snowflake, Postgres, and other sources
  2. Views: Build unified views that join data across sources
  3. Tools: Create MCP tools that wrap your views
  4. Agents: Agents query tools, which query views, which query your connected sources

The key insight: agents don't need to know about BigQuery, Snowflake, or Postgres. They just query your views, and Pylar handles the complexity of joining data across systems.


Step-by-Step Guide

Let's connect BigQuery, Snowflake, and Postgres to Pylar. I'll walk you through each one, highlighting the differences and what to watch for.

Prerequisites

Before we start, make sure you have:

  • ✅ A Pylar account with Analyst role or higher
  • ✅ Access credentials for each data source you want to connect
  • ✅ Ability to whitelist IP addresses (for databases/warehouses)
  • ✅ Admin access or someone who can help with firewall/network configuration

Connecting BigQuery

BigQuery uses service account authentication, which is different from username/password. Here's how to set it up.

Step 1: Create a Service Account in Google Cloud

  1. Go to Google Cloud Console: Navigate to console.cloud.google.com
  2. Open IAM & Admin: Go to IAM & Admin → Service Accounts
  3. Create Service Account: Click "Create Service Account"
  4. Grant Permissions: Give it these roles:
    • BigQuery Data Viewer (to read data)
    • BigQuery Job User (to run queries)
  5. Create JSON Key:
    • Click on the service account
    • Go to Keys tab
    • Click Add Key → Create new key
    • Select JSON format
    • Download the JSON file

Important: Keep this JSON file secure. It provides access to your BigQuery data.

Step 2: Connect in Pylar

  1. Navigate to Connections: In Pylar, go to the "Connections" tab in the sidebar
  2. Select BigQuery: Click on the BigQuery icon in the Databases section
  3. Enter Connection Details:
    • Name: bigquery_production (lowercase, no spaces)
    • Description: "Production BigQuery warehouse for analytics"
    • Project ID: Your Google Cloud Project ID (e.g., my-project-id-12345)
    • Service Account JSON: Paste the entire contents of the JSON key file you downloaded

BigQuery connection setup screen

  4. Whitelist Pylar IP: Add 34.122.205.142 to your Google Cloud firewall rules:

    • Go to VPC Network → Firewall Rules
    • Create a rule allowing 34.122.205.142
    • Or configure BigQuery dataset access to allow this IP
  5. Submit: Click "Submit" to establish the connection

Step 3: Wait for Indexing

After connecting, Pylar will index your BigQuery data. This may take a few minutes depending on data volume. You'll receive an email when it's complete.

What happens: Pylar indexes metadata (table names, column names, schemas) for autocompletion. Your data stays in BigQuery—queries execute on your BigQuery infrastructure.
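
Once the email arrives, it's worth running a quick smoke test before building anything bigger. A minimal query might look like this (the dataset and table names are examples; point it at any table in your project):

SELECT *
FROM bigquery_production.analytics.user_events   -- connection_name.dataset.table
LIMIT 10;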

Connecting Snowflake

Snowflake uses username/password authentication and requires a specific URL format. Here's how to set it up.

Step 1: Get Your Snowflake Account URL

  1. Open Snowflake UI: Log in to your Snowflake account
  2. Find Account URL: Look for your account URL in the browser address bar or Snowflake UI
  3. Format: It will look like https://abc12345.us-east-1.snowflakecomputing.com
  4. Trim the URL: Remove https:// and everything after .snowflakecomputing.com
    • Original: https://abc12345.us-east-1.snowflakecomputing.com
    • Trimmed: abc12345.us-east-1.snowflakecomputing.com

Important: Include the full account identifier including the region (e.g., .us-east-1).

Step 2: Whitelist Pylar IP in Snowflake

Critical: Do this before connecting, or the connection will fail.

  1. Log in as Admin: Access Snowflake as an administrator
  2. Go to Network Policies: Navigate to Admin → Security → Network Policies
  3. Create Policy: Create a new network policy or modify an existing one
  4. Add IP: Add 34.122.205.142 to allowed IP addresses
  5. Apply Policy: Apply the network policy to your account or user

Alternative: If you don't have admin access, contact your Snowflake administrator to whitelist the IP.
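
If you prefer SQL over the UI, a rough equivalent in a Snowflake worksheet looks like this (run it with a privileged role such as SECURITYADMIN; the policy and user names are examples):

-- Careful: a network policy applied at the account level affects every login,
-- so include your existing allowed IPs in the list or scope the policy to one user.
CREATE NETWORK POLICY pylar_access ALLOWED_IP_LIST = ('34.122.205.142');

-- Option A: apply it only to the user Pylar connects as
ALTER USER pylar_readonly SET NETWORK_POLICY = pylar_access;

-- Option B: apply it account-wide (only if the list also contains your other IPs)
-- ALTER ACCOUNT SET NETWORK_POLICY = pylar_access;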

Step 3: Connect in Pylar

  1. Navigate to Connections: In Pylar, go to the "Connections" tab
  2. Select Snowflake: Click on the Snowflake icon
  3. Enter Connection Details:
    • Account URL: The trimmed URL from Step 1 (e.g., abc12345.us-east-1.snowflakecomputing.com)
    • Database Name: The database you want to connect (e.g., PRODUCTION, ANALYTICS)
    • Username: Your Snowflake username
    • Password: Your Snowflake password

Snowflake connection setup screen

  4. Test Connection: Click "Test Connection" to verify it works
  5. Save: Once successful, click "Save" and give it a schema name (e.g., snowflake_production)

Step 4: Wait for Indexing

Pylar will index your Snowflake data. You'll receive an email when complete.

Best Practice: Create a dedicated Snowflake user for Pylar with read-only permissions. This enhances security and allows better access control.
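
Here's a sketch of what that might look like in Snowflake SQL (the role, user, warehouse, and database names are examples; adjust them to your environment):

-- Read-only role for Pylar
CREATE ROLE pylar_readonly_role;
GRANT USAGE ON WAREHOUSE compute_wh TO ROLE pylar_readonly_role;
GRANT USAGE ON DATABASE production TO ROLE pylar_readonly_role;
GRANT USAGE ON ALL SCHEMAS IN DATABASE production TO ROLE pylar_readonly_role;
GRANT SELECT ON ALL TABLES IN DATABASE production TO ROLE pylar_readonly_role;

-- Dedicated user that Pylar connects as
CREATE USER pylar_readonly PASSWORD = 'choose-a-strong-password'
  DEFAULT_ROLE = pylar_readonly_role DEFAULT_WAREHOUSE = compute_wh;
GRANT ROLE pylar_readonly_role TO USER pylar_readonly;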

Connecting Postgres

Postgres uses standard database credentials (host, port, username, password). Here's how to set it up.

Step 1: Prepare Your Postgres Database

  1. Get Connection Details: You'll need:

    • Host (e.g., db.example.com or 192.168.1.100)
    • Port (default: 5432)
    • Database name (e.g., production, analytics)
    • Username and password
  2. Create Dedicated User (recommended): Create a Postgres user specifically for Pylar:

    -- Dedicated read-only user for Pylar
    CREATE USER pylar_user WITH PASSWORD 'secure_password';
    GRANT USAGE ON SCHEMA public TO pylar_user;
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO pylar_user;
    

Best Practice: Use a read-only user for Pylar. This limits what Pylar can access and improves security.

Step 2: Whitelist Pylar IP

Critical: Whitelist Pylar's IP before connecting.

Option 1: PostgreSQL pg_hba.conf (if you manage the server):

  1. Edit pg_hba.conf on your Postgres server
  2. Add: host all all 34.122.205.142/32 md5
  3. Reload: sudo systemctl reload postgresql

Option 2: Cloud Provider Firewall (if using cloud-hosted Postgres):

  • AWS RDS: Add inbound rule allowing PostgreSQL (port 5432) from 34.122.205.142
  • Google Cloud SQL: Add authorized network 34.122.205.142/32
  • Azure Database: Add client IP 34.122.205.142 in firewall settings

Step 3: Connect in Pylar

  1. Navigate to Connections: In Pylar, go to the "Connections" tab
  2. Select PostgreSQL: Click on the PostgreSQL icon
  3. Enter Connection Details:
    • Host: Your Postgres hostname or IP
    • Port: 5432 (or your custom port)
    • Database Name: The database you want to connect
    • Username: Your Postgres username
    • Password: Your Postgres password

PostgreSQL connection setup screen

  4. Submit: Click "Submit" to test and save the connection
  5. Schema Name: Give it a descriptive name (e.g., postgres_production)

Step 4: Start Querying

Once connected, you can immediately start querying. Postgres connections don't require indexing like warehouses do.

What happens: Pylar can query your Postgres database directly. Queries execute on your Postgres infrastructure.
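
For example, a first query against the connection might look like this (the table and columns match the examples later in this tutorial; use any table your Pylar user can read):

SELECT user_id, email, signup_date
FROM postgres_production.users     -- the schema name you chose when saving the connection
WHERE is_active = true
LIMIT 10;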


Building Views Across Different Systems

Now that you have multiple sources connected, let's talk about building views that span them. This is where Pylar's unified approach really shines.

How Cross-Database Joins Work

Pylar handles cross-database joins automatically. You reference tables from different sources using: connection_name.schema.table.

Example: Join BigQuery analytics with Postgres user data:

SELECT 
  pg.user_id,
  pg.email,
  pg.signup_date,
  bq.total_events,
  bq.last_event_date
FROM postgres_production.users pg
JOIN bigquery_production.analytics.user_events bq
  ON pg.user_id = bq.user_id
WHERE pg.is_active = true;

What happens: Pylar executes this query by:

  1. Querying Postgres for user data
  2. Querying BigQuery for event data
  3. Joining the results in memory
  4. Returning the unified result

The agent doesn't need to know about Postgres or BigQuery—it just queries the view.

Considerations When Building Multi-Source Views

I've seen teams make these mistakes when building views across systems. Here's what to watch for.

Consideration 1: Data Type Mismatches

Different systems use different data types. Postgres might use INTEGER, BigQuery might use INT64, Snowflake might use NUMBER. When joining, ensure types are compatible.

Solution: Use CAST or CONVERT to normalize types:

SELECT 
  pg.customer_id,
  sf.account_id
FROM postgres_production.customers pg
JOIN snowflake_production.accounts sf
  ON CAST(pg.customer_id AS VARCHAR) = CAST(sf.account_id AS VARCHAR);

Consideration 2: NULL Handling

Different systems handle missing values differently: some sources store empty strings where others store NULL. And in a join condition, NULL never equals NULL, so rows with missing keys silently drop out of the result.

Solution: Use COALESCE to handle NULLs:

SELECT 
  COALESCE(pg.email, bq.email, sf.email) AS unified_email,
  pg.customer_id,
  bq.total_revenue,
  sf.opportunity_value
FROM postgres_production.customers pg
FULL OUTER JOIN bigquery_production.analytics.revenue bq
  ON pg.email = bq.email
FULL OUTER JOIN snowflake_production.sales.opportunities sf
  ON COALESCE(pg.email, bq.email) = sf.email;

Consideration 3: Performance Across Systems

Queries that join data across systems can be slower because:

  • Data must be fetched from multiple sources
  • Joins happen in memory (not pushed down to a single database)
  • Network latency between Pylar and each source adds overhead

Solution:

  • Filter early: Add WHERE clauses in subqueries to reduce data volume
  • Aggregate before joining: Pre-calculate metrics in each source
  • Limit date ranges: Only query recent data when possible

Example:

SELECT 
  pg.customer_id,
  pg.email,
  bq_revenue.total_revenue,
  sf_pipeline.total_opportunities
FROM postgres_production.customers pg
LEFT JOIN (
  SELECT 
    customer_id,
    SUM(amount) AS total_revenue
  FROM bigquery_production.analytics.transactions
  WHERE transaction_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
  GROUP BY customer_id
) bq_revenue ON pg.customer_id = bq_revenue.customer_id
LEFT JOIN (
  SELECT 
    account_id,
    COUNT(*) AS total_opportunities
  FROM snowflake_production.sales.opportunities
  WHERE created_date >= DATEADD(day, -90, CURRENT_DATE())
  GROUP BY account_id
) sf_pipeline ON pg.customer_id = sf_pipeline.account_id
WHERE pg.is_active = true;

Consideration 4: Schema Differences

Different systems use different naming conventions. Postgres might use snake_case, BigQuery might use camelCase, Snowflake might use UPPER_CASE.

Solution: Use aliases to normalize names:

SELECT 
  pg.customer_id,
  pg.customer_name AS name,
  bq.totalRevenue AS total_revenue,
  sf.ACCOUNT_STATUS AS account_status
FROM postgres_production.customers pg
JOIN bigquery_production.analytics.revenue bq ON pg.customer_id = bq.customerId
JOIN snowflake_production.accounts sf ON pg.customer_id = sf.ACCOUNT_ID;

Consideration 5: Time Zone Handling

Different systems might store timestamps in different time zones or formats. This can cause issues when joining on time-based fields.

Solution: Normalize timestamps to UTC where you can, or compare at a coarser granularity such as the date level, as in this example:

SELECT 
  pg.user_id,
  pg.created_at AS signup_time_utc,
  bq.first_event_time AS first_event_utc
FROM postgres_production.users pg
JOIN bigquery_production.analytics.events bq
  ON pg.user_id = bq.user_id
  AND DATE(pg.created_at) = DATE(bq.first_event_time);

Best Practices for Multi-Source Views

Here's a checklist I use when building views across systems:

  • Start Simple: Begin with two sources, then add more
  • Test Incrementally: Test each join separately before combining
  • Use Descriptive Names: Name connections clearly (bigquery_prod, postgres_analytics)
  • Filter Early: Add WHERE clauses in subqueries to reduce data volume
  • Handle NULLs: Use COALESCE for unified identifiers
  • Normalize Types: Cast types to ensure compatibility
  • Document Assumptions: Add comments explaining join logic
  • Monitor Performance: Use Pylar Evals to see query performance

Real-World Examples

Let me show you how different teams use multi-source connections in practice.

Example 1: Customer 360 View

A team connects Postgres (application data), BigQuery (analytics), and Snowflake (sales data) to create a unified customer view.

Setup:

  1. Connect Postgres (customer profiles, signups)
  2. Connect BigQuery (event analytics, product usage)
  3. Connect Snowflake (sales pipeline, opportunities)

View:

SELECT 
  -- Postgres: Customer profile
  pg.customer_id,
  pg.email,
  pg.signup_date,
  pg.plan_type,
  
  -- BigQuery: Product usage
  bq.total_events,
  bq.last_active_date,
  bq.feature_usage_count,
  
  -- Snowflake: Sales pipeline
  sf.total_opportunities,
  sf.total_pipeline_value,
  sf.last_opportunity_date,
  
  -- Calculated segments
  CASE
    WHEN bq.total_events > 100 AND sf.total_pipeline_value > 10000 THEN 'High Value Active'
    WHEN bq.total_events > 50 THEN 'Active User'
    WHEN sf.total_opportunities > 0 THEN 'In Pipeline'
    ELSE 'Low Engagement'
  END AS customer_segment

FROM postgres_production.customers pg
LEFT JOIN (
  SELECT 
    user_id,
    COUNT(*) AS total_events,
    MAX(event_date) AS last_active_date,
    COUNT(DISTINCT feature_name) AS feature_usage_count
  FROM bigquery_production.analytics.events
  WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
  GROUP BY user_id
) bq ON pg.customer_id = bq.user_id
LEFT JOIN (
  SELECT 
    account_id,
    COUNT(*) AS total_opportunities,
    SUM(amount) AS total_pipeline_value,
    MAX(created_date) AS last_opportunity_date
  FROM snowflake_production.sales.opportunities
  WHERE stage != 'Closed Lost'
  GROUP BY account_id
) sf ON pg.customer_id = sf.account_id
WHERE pg.is_active = true;

How agents use it: Agents can answer questions like "Show me high-value active customers who haven't been contacted recently" without knowing about Postgres, BigQuery, or Snowflake.

Example 2: Product Analytics Dashboard

A product team connects BigQuery (event data) and Postgres (user data) to analyze feature adoption.

Setup:

  1. Connect BigQuery (product events, feature usage)
  2. Connect Postgres (user segments, subscription data)

View:

SELECT 
  pg.user_segment,
  pg.subscription_tier,
  bq.feature_name,
  COUNT(DISTINCT bq.user_id) AS users_using_feature,
  COUNT(*) AS total_events,
  AVG(bq.event_value) AS avg_event_value
FROM bigquery_production.analytics.feature_events bq
JOIN postgres_production.users pg
  ON bq.user_id = pg.user_id
WHERE bq.event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY pg.user_segment, pg.subscription_tier, bq.feature_name
ORDER BY users_using_feature DESC;

How agents use it: Agents can answer "Which features are most popular among enterprise users?" by querying this view.

Example 3: Revenue Operations View

A RevOps team connects Snowflake (sales data), BigQuery (marketing data), and Postgres (customer data) to track revenue attribution.

Setup:

  1. Connect Snowflake (sales opportunities, deals)
  2. Connect BigQuery (marketing campaigns, attribution)
  3. Connect Postgres (customer accounts, contracts)

View:

SELECT 
  pg.account_id,
  pg.account_name,
  pg.contract_value,
  pg.contract_start_date,
  
  sf.total_won_deals,
  sf.total_revenue,
  
  bq.attributed_campaigns,
  bq.marketing_spend,
  bq.attribution_revenue

FROM postgres_production.accounts pg
LEFT JOIN (
  SELECT 
    account_id,
    COUNT(*) AS total_won_deals,
    SUM(amount) AS total_revenue
  FROM snowflake_production.sales.deals
  WHERE stage = 'Closed Won'
  GROUP BY account_id
) sf ON pg.account_id = sf.account_id
LEFT JOIN (
  SELECT 
    account_id,
    STRING_AGG(campaign_name, ', ') AS attributed_campaigns,
    SUM(spend) AS marketing_spend,
    SUM(attributed_revenue) AS attribution_revenue
  FROM bigquery_production.marketing.attribution
  GROUP BY account_id
) bq ON pg.account_id = bq.account_id
WHERE pg.is_active = true;

How agents use it: Agents can answer "What's the ROI of our marketing campaigns by account?" by querying this unified view.

Notice how each example uses the same pattern: connect sources, build unified views, create tools, agents query tools. The complexity of different systems is hidden from the agents.


Common Pitfalls & Tips

I've seen teams make these mistakes when connecting multiple sources. Here's how to avoid them.

Pitfall 1: Not Whitelisting IP Addresses

This is the #1 reason connections fail. BigQuery, Snowflake, and Postgres all require IP whitelisting.

Why this matters: Without whitelisting, Pylar can't connect to your databases. The connection test will fail, and you'll spend time debugging credentials when the real issue is network access.

How to avoid it: Always whitelist 34.122.205.142 before connecting. Do it first, then test the connection.

Pitfall 2: Using Production Credentials

Don't use your personal database credentials or production admin accounts for Pylar connections.

Why this matters: If credentials are compromised, attackers have full access. Also, you can't revoke access without affecting other systems.

How to avoid it: Create dedicated users for Pylar:

  • BigQuery: Service account with BigQuery Data Viewer and BigQuery Job User roles
  • Snowflake: Dedicated user with read-only permissions
  • Postgres: Dedicated user with SELECT permissions only

Pitfall 3: Not Testing Connections Incrementally

Don't connect all sources at once, then try to build a complex view.

Why this matters: If something breaks, you won't know which connection is the problem. Debugging becomes a nightmare.

How to avoid it: Connect one source at a time:

  1. Connect BigQuery → Test with a simple query
  2. Connect Snowflake → Test with a simple query
  3. Connect Postgres → Test with a simple query
  4. Build a view joining two sources → Test
  5. Add the third source → Test

Pitfall 4: Ignoring Performance Implications

Cross-database joins can be slow, especially with large datasets.

Why this matters: Slow queries mean slow agents. Users get frustrated, and you end up optimizing under pressure.

How to avoid it:

  • Filter early: Add WHERE clauses in subqueries
  • Aggregate before joining: Pre-calculate metrics
  • Limit date ranges: Only query recent data
  • Test query performance: Use Pylar's query execution time to identify slow queries

Pitfall 5: Not Handling Schema Differences

Different systems use different naming conventions and data types.

Why this matters: Joins fail silently or return incorrect results when types don't match or names are inconsistent.

How to avoid it:

  • Normalize names: Use aliases to create consistent column names
  • Cast types: Use CAST or CONVERT to ensure compatibility
  • Test joins: Verify join keys match before building complex views

Best Practices Summary

Here's a quick checklist:

  • Whitelist IP first: Always whitelist 34.122.205.142 before connecting
  • Use dedicated users: Create read-only users for Pylar
  • Test incrementally: Connect and test one source at a time
  • Filter early: Add WHERE clauses in subqueries
  • Handle NULLs: Use COALESCE for unified identifiers
  • Normalize types: Cast types to ensure compatibility
  • Monitor performance: Use Evals to see query execution times
  • Document assumptions: Add comments explaining join logic

Next Steps

You've connected BigQuery, Snowflake, and Postgres to Pylar. That's the foundation. Now you can:

  1. Build unified views: Create views that join data across all three sources, giving agents a single source of truth.

  2. Create MCP tools: Turn your views into MCP tools that agents can use. The previous tutorials covered this.

  3. Monitor usage: Use Pylar Evals to see how agents are querying your multi-source views and identify optimization opportunities.

  4. Iterate and improve: As you learn how agents use your views, refine them to better match their needs. Add filters, optimize joins, or add calculated fields.

The key is to start simple and iterate. Your first multi-source view doesn't need to be perfect—it just needs to work. You can always refine it later based on real usage.

If you want to keep going, the next step is building views that span multiple sources. That's where you'll see the real value—unifying your data so agents can answer questions that require data from multiple systems.


Frequently Asked Questions

Can I connect multiple instances of the same source?

Do I need to whitelist the IP for each connection?

How do cross-database joins work?

Will joining data across systems be slow?

Can I join BigQuery with Snowflake with Postgres in one query?

What if my connection credentials change?

Can I see which queries are hitting which sources?

What if I need to connect a source that's not listed?
