🚀

We've launched on Product Hunt

Check us out →
Tutorials

Building an E-Commerce Fraud Detection Pipeline: From Transactions to Risk Scores

H

Hoshang Mehta

Building an E-Commerce Fraud Detection Pipeline: From Transactions to Risk Scores

Fraud detection is reactive and rule-based. Teams can't identify sophisticated fraud patterns, false positives block legitimate customers, and manual review is slow. Fraud losses and customer friction increase. The result: fraudulent orders slip through, legitimate orders get blocked, and teams spend hours reviewing false positives.

The problem: A $500 order comes in from a new customer. The billing address doesn't match the shipping address. The IP is from a different country. The order uses a new payment method. Is it fraud or a legitimate international customer? The team doesn't know, so they manually review. By the time they approve it, the customer has canceled and bought from a competitor. Or worse—they approve a fraudulent order and lose $500.

The solution: An automated fraud detection pipeline that analyzes transaction patterns, customer behavior, device fingerprints, and historical fraud data—scoring risk in real-time and flagging suspicious transactions. This post walks through building a repeatable architecture that transforms reactive fraud detection into proactive risk management.


The Problem with Rule-Based Fraud Detection

Rule-based fraud detection breaks down at scale. Here's what e-commerce teams face:

The False Positive Problem

Rules block legitimate customers:

  • "Block orders with billing/shipping mismatch" blocks international customers
  • "Block orders from new IPs" blocks VPN users
  • "Block high-value orders" blocks VIP customers
  • "Block rapid orders" blocks gift buyers

The math: If 5% of orders trigger fraud rules and 80% are false positives, that's 4% of orders incorrectly flagged. For 10,000 orders per month, that's 400 false positives requiring manual review. At 5 minutes per review, that's 33 hours per month wasted.

The False Negative Problem

Rules miss sophisticated fraud:

  • Fraudsters learn to bypass simple rules
  • New fraud patterns aren't detected
  • Rule combinations miss complex fraud
  • Manual rules can't adapt quickly

The impact: A fraudster uses a stolen credit card with matching billing address, ships to a different address (using "gift" as reason), and orders during business hours. Simple rules don't catch it. The order ships, the card is charged back, and the company loses $300.

The Scale Problem

Manual review doesn't scale:

  • Teams can't review every suspicious transaction
  • Review queues grow during peak periods
  • Response times increase, customers wait
  • Fraudsters exploit delays

The cost: A team reviews 100 transactions per day manually. During peak season, 500 transactions need review. The team can't keep up. Fraudulent orders slip through, or legitimate orders are delayed, causing customer churn.

The Pattern Detection Problem

Teams can't identify fraud patterns:

  • Similar fraud attempts aren't connected
  • Fraud rings aren't detected
  • New attack vectors go unnoticed
  • Historical patterns aren't learned from

The missed opportunity: 20 fraudulent orders share the same device fingerprint, but each is reviewed individually. Pattern detection could identify the fraud ring and block all future attempts automatically.


A Repeatable Fraud Detection Architecture

The solution is a pipeline architecture that analyzes transactions, scores risk, detects patterns, and flags suspicious activity.

graph TD A[New Transaction] --> B[Transaction Enrichment] B --> C[Customer History Lookup] B --> D[Device Fingerprint Analysis] B --> E[Payment Method Analysis] C --> F[Risk Scoring Engine] D --> F E --> F F --> G[Pattern Detection] G --> H[Risk Score Calculation] H --> I{Risk Level} I -->|High| J[Block/Review] I -->|Medium| K[Additional Verification] I -->|Low| L[Approve] M[Fraud Feedback] --> N[Model Learning] N --> F style B fill:#e1f5ff style F fill:#fff4e1 style H fill:#e8f5e9

Architecture Components

1. Transaction Enrichment Layer

  • Captures transaction details (amount, items, customer)
  • Retrieves customer history (previous orders, behavior)
  • Analyzes device fingerprint (IP, browser, device)
  • Checks payment method (card, billing address, CVV)

2. Customer History View

  • Unified view of customer behavior
  • Previous order patterns
  • Payment history
  • Return/refund history

3. Device Fingerprint Analysis

  • Identifies device characteristics
  • Detects VPN/proxy usage
  • Tracks device history
  • Flags suspicious devices

4. Payment Method Analysis

  • Validates payment details
  • Checks card history
  • Analyzes billing address
  • Verifies CVV and AVS

5. Risk Scoring Engine

  • Combines multiple risk factors
  • Calculates risk score (0-100)
  • Applies ML models for pattern detection
  • Generates risk categories

6. Pattern Detection System

  • Identifies fraud rings
  • Detects coordinated attacks
  • Surfaces emerging patterns
  • Learns from historical fraud

7. Decision Engine

  • Applies business rules
  • Routes by risk level
  • Triggers additional verification
  • Blocks or approves transactions

Aggregating Transaction and Customer Data

The first stage is building unified views of transactions and customer behavior.

Building the Transaction Risk View

Create a comprehensive view that joins transaction data with customer and device context:

CREATE MATERIALIZED VIEW transaction_risk_comprehensive AS
SELECT 
    -- Transaction Info
    t.transaction_id,
    t.order_id,
    t.transaction_date,
    t.transaction_amount,
    t.currency,
    t.payment_method, -- 'credit_card', 'paypal', 'apple_pay', etc.
    
    -- Customer Info
    c.customer_id,
    c.email,
    c.account_age_days,
    c.total_orders,
    c.total_revenue,
    c.avg_order_value,
    c.last_order_date,
    c.days_since_last_order,
    c.return_rate,
    c.refund_rate,
    
    -- Billing Address
    t.billing_address,
    t.billing_city,
    t.billing_state,
    t.billing_zip,
    t.billing_country,
    
    -- Shipping Address
    t.shipping_address,
    t.shipping_city,
    t.shipping_state,
    t.shipping_zip,
    t.shipping_country,
    
    -- Address Match
    CASE 
        WHEN t.billing_address = t.shipping_address 
         AND t.billing_city = t.shipping_city 
         AND t.billing_zip = t.shipping_zip 
        THEN true
        ELSE false
    END as address_match,
    
    -- Device Fingerprint
    d.device_id,
    d.ip_address,
    d.ip_country,
    d.ip_city,
    d.is_vpn,
    d.is_proxy,
    d.browser,
    d.device_type,
    d.os,
    
    -- Device History
    d.device_first_seen_date,
    d.device_transaction_count,
    d.device_fraud_count,
    d.device_fraud_rate,
    
    -- Payment Details
    t.card_last_4,
    t.card_bin,
    t.card_type,
    t.avs_result, -- Address Verification System
    t.cvv_result,
    t.3ds_result, -- 3D Secure
    
    -- Calculated Risk Factors
    CASE 
        WHEN c.total_orders = 0 THEN 'new_customer'
        WHEN c.total_orders = 1 THEN 'first_repeat'
        WHEN c.total_orders > 10 THEN 'established'
        ELSE 'returning'
    END as customer_segment,
    
    CASE 
        WHEN t.transaction_amount > c.avg_order_value * 3 THEN 'high_value_anomaly'
        WHEN t.transaction_amount > c.avg_order_value * 2 THEN 'medium_value_anomaly'
        ELSE 'normal_value'
    END as value_anomaly
    
FROM transactions t
LEFT JOIN customers c ON t.customer_id = c.customer_id
LEFT JOIN device_fingerprints d ON t.device_id = d.device_id;

Customer Behavior Patterns

CREATE MATERIALIZED VIEW customer_behavior_patterns AS
SELECT 
    customer_id,
    email,
    
    -- Order Patterns
    COUNT(*) as total_orders,
    AVG(order_value) as avg_order_value,
    STDDEV(order_value) as order_value_stddev,
    MIN(order_date) as first_order_date,
    MAX(order_date) as last_order_date,
    
    -- Time Patterns
    MODE() WITHIN GROUP (ORDER BY DATE_PART('hour', order_date)) as preferred_order_hour,
    MODE() WITHIN GROUP (ORDER BY DATE_PART('dow', order_date)) as preferred_order_day,
    
    -- Payment Patterns
    COUNT(DISTINCT payment_method) as payment_methods_used,
    MODE() WITHIN GROUP (ORDER BY payment_method) as preferred_payment_method,
    
    -- Shipping Patterns
    COUNT(DISTINCT shipping_country) as shipping_countries,
    MODE() WITHIN GROUP (ORDER BY shipping_country) as preferred_shipping_country,
    
    -- Risk Indicators
    COUNT(*) FILTER (WHERE order_status = 'fraud') as fraud_count,
    COUNT(*) FILTER (WHERE order_status = 'refunded') as refund_count,
    COUNT(*) FILTER (WHERE order_status = 'chargeback') as chargeback_count,
    
    -- Behavior Score
    CASE 
        WHEN COUNT(*) > 10 AND COUNT(*) FILTER (WHERE order_status = 'fraud') = 0 THEN 'low_risk'
        WHEN COUNT(*) > 5 AND COUNT(*) FILTER (WHERE order_status = 'fraud') = 0 THEN 'medium_risk'
        WHEN COUNT(*) FILTER (WHERE order_status = 'fraud') > 0 THEN 'high_risk'
        ELSE 'unknown_risk'
    END as behavior_risk_category
    
FROM orders
GROUP BY customer_id, email;

Pattern Detection and Anomaly Identification

Identify fraud patterns and anomalies in transaction data.

Transaction Anomaly Detection

CREATE MATERIALIZED VIEW transaction_anomalies AS
SELECT 
    trc.transaction_id,
    trc.customer_id,
    trc.transaction_amount,
    trc.customer_segment,
    
    -- Value Anomaly
    CASE 
        WHEN trc.transaction_amount > cbp.avg_order_value + (cbp.order_value_stddev * 3) THEN 'high_value_outlier'
        WHEN trc.transaction_amount > cbp.avg_order_value + (cbp.order_value_stddev * 2) THEN 'medium_value_outlier'
        ELSE 'normal_value'
    END as value_anomaly_type,
    
    -- Time Anomaly
    CASE 
        WHEN DATE_PART('hour', trc.transaction_date) != cbp.preferred_order_hour 
         AND cbp.total_orders > 5 
        THEN 'time_anomaly'
        ELSE 'normal_time'
    END as time_anomaly,
    
    -- Payment Method Anomaly
    CASE 
        WHEN trc.payment_method != cbp.preferred_payment_method 
         AND cbp.total_orders > 3
        THEN 'payment_method_anomaly'
        ELSE 'normal_payment'
    END as payment_anomaly,
    
    -- Shipping Anomaly
    CASE 
        WHEN trc.shipping_country != cbp.preferred_shipping_country 
         AND cbp.total_orders > 2
        THEN 'shipping_anomaly'
        ELSE 'normal_shipping'
    END as shipping_anomaly,
    
    -- Combined Anomaly Score
    (
        CASE WHEN trc.transaction_amount > cbp.avg_order_value + (cbp.order_value_stddev * 3) THEN 3
             WHEN trc.transaction_amount > cbp.avg_order_value + (cbp.order_value_stddev * 2) THEN 2
             ELSE 0 END +
        CASE WHEN DATE_PART('hour', trc.transaction_date) != cbp.preferred_order_hour AND cbp.total_orders > 5 THEN 2 ELSE 0 END +
        CASE WHEN trc.payment_method != cbp.preferred_payment_method AND cbp.total_orders > 3 THEN 2 ELSE 0 END +
        CASE WHEN trc.shipping_country != cbp.preferred_shipping_country AND cbp.total_orders > 2 THEN 2 ELSE 0 END
    ) as anomaly_score
    
FROM transaction_risk_comprehensive trc
LEFT JOIN customer_behavior_patterns cbp ON trc.customer_id = cbp.customer_id;

Fraud Ring Detection

CREATE MATERIALIZED VIEW fraud_ring_indicators AS
SELECT 
    d.device_id,
    d.ip_address,
    COUNT(DISTINCT t.customer_id) as unique_customers,
    COUNT(DISTINCT t.card_last_4) as unique_cards,
    COUNT(*) as transaction_count,
    SUM(t.transaction_amount) as total_amount,
    COUNT(*) FILTER (WHERE t.order_status = 'fraud') as fraud_count,
    COUNT(*) FILTER (WHERE t.order_status = 'fraud')::float / COUNT(*) as fraud_rate,
    
    -- Fraud Ring Indicators
    CASE 
        WHEN COUNT(DISTINCT t.customer_id) > 5 
         AND COUNT(DISTINCT t.card_last_4) > 5 
         AND COUNT(*) FILTER (WHERE t.order_status = 'fraud') > 2 
        THEN 'high_risk_ring'
        WHEN COUNT(DISTINCT t.customer_id) > 3 
         AND COUNT(*) FILTER (WHERE t.order_status = 'fraud') > 1 
        THEN 'medium_risk_ring'
        ELSE 'low_risk'
    END as ring_risk_category
    
FROM device_fingerprints d
JOIN transactions t ON d.device_id = t.device_id
GROUP BY d.device_id, d.ip_address
HAVING COUNT(*) > 2; -- Minimum transactions for pattern detection

Risk Scoring with ML Models

Calculate risk scores by combining multiple factors and applying ML models.

Risk Score Calculation

CREATE MATERIALIZED VIEW transaction_risk_scores AS
SELECT 
    trc.transaction_id,
    trc.customer_id,
    trc.transaction_amount,
    trc.customer_segment,
    
    -- Risk Factor Scores (0-100 each)
    
    -- Customer Risk (0-30 points)
    CASE 
        WHEN trc.customer_segment = 'new_customer' THEN 20
        WHEN trc.customer_segment = 'first_repeat' THEN 10
        WHEN cbp.behavior_risk_category = 'high_risk' THEN 25
        WHEN cbp.behavior_risk_category = 'medium_risk' THEN 15
        ELSE 5
    END as customer_risk_score,
    
    -- Device Risk (0-25 points)
    CASE 
        WHEN trc.device_fraud_rate > 0.5 THEN 25
        WHEN trc.device_fraud_rate > 0.2 THEN 20
        WHEN trc.is_vpn = true THEN 15
        WHEN trc.is_proxy = true THEN 15
        WHEN trc.device_transaction_count = 0 THEN 10
        ELSE 5
    END as device_risk_score,
    
    -- Payment Risk (0-25 points)
    CASE 
        WHEN trc.avs_result = 'N' THEN 20 -- Address doesn't match
        WHEN trc.cvv_result = 'N' THEN 15 -- CVV doesn't match
        WHEN trc.3ds_result = 'N' THEN 15 -- 3D Secure failed
        WHEN trc.address_match = false THEN 10
        ELSE 5
    END as payment_risk_score,
    
    -- Value Risk (0-10 points)
    CASE 
        WHEN ta.value_anomaly_type = 'high_value_outlier' THEN 10
        WHEN ta.value_anomaly_type = 'medium_value_outlier' THEN 5
        ELSE 0
    END as value_risk_score,
    
    -- Anomaly Risk (0-10 points)
    CASE 
        WHEN ta.anomaly_score >= 6 THEN 10
        WHEN ta.anomaly_score >= 4 THEN 7
        WHEN ta.anomaly_score >= 2 THEN 5
        ELSE 0
    END as anomaly_risk_score,
    
    -- Total Risk Score
    (
        CASE WHEN trc.customer_segment = 'new_customer' THEN 20 WHEN trc.customer_segment = 'first_repeat' THEN 10 WHEN cbp.behavior_risk_category = 'high_risk' THEN 25 WHEN cbp.behavior_risk_category = 'medium_risk' THEN 15 ELSE 5 END +
        CASE WHEN trc.device_fraud_rate > 0.5 THEN 25 WHEN trc.device_fraud_rate > 0.2 THEN 20 WHEN trc.is_vpn = true THEN 15 WHEN trc.is_proxy = true THEN 15 WHEN trc.device_transaction_count = 0 THEN 10 ELSE 5 END +
        CASE WHEN trc.avs_result = 'N' THEN 20 WHEN trc.cvv_result = 'N' THEN 15 WHEN trc.3ds_result = 'N' THEN 15 WHEN trc.address_match = false THEN 10 ELSE 5 END +
        CASE WHEN ta.value_anomaly_type = 'high_value_outlier' THEN 10 WHEN ta.value_anomaly_type = 'medium_value_outlier' THEN 5 ELSE 0 END +
        CASE WHEN ta.anomaly_score >= 6 THEN 10 WHEN ta.anomaly_score >= 4 THEN 7 WHEN ta.anomaly_score >= 2 THEN 5 ELSE 0 END
    ) as total_risk_score,
    
    -- Risk Category
    CASE 
        WHEN (
            CASE WHEN trc.customer_segment = 'new_customer' THEN 20 WHEN trc.customer_segment = 'first_repeat' THEN 10 WHEN cbp.behavior_risk_category = 'high_risk' THEN 25 WHEN cbp.behavior_risk_category = 'medium_risk' THEN 15 ELSE 5 END +
            CASE WHEN trc.device_fraud_rate > 0.5 THEN 25 WHEN trc.device_fraud_rate > 0.2 THEN 20 WHEN trc.is_vpn = true THEN 15 WHEN trc.is_proxy = true THEN 15 WHEN trc.device_transaction_count = 0 THEN 10 ELSE 5 END +
            CASE WHEN trc.avs_result = 'N' THEN 20 WHEN trc.cvv_result = 'N' THEN 15 WHEN trc.3ds_result = 'N' THEN 15 WHEN trc.address_match = false THEN 10 ELSE 5 END +
            CASE WHEN ta.value_anomaly_type = 'high_value_outlier' THEN 10 WHEN ta.value_anomaly_type = 'medium_value_outlier' THEN 5 ELSE 0 END +
            CASE WHEN ta.anomaly_score >= 6 THEN 10 WHEN ta.anomaly_score >= 4 THEN 7 WHEN ta.anomaly_score >= 2 THEN 5 ELSE 0 END
        ) >= 60 THEN 'high_risk'
        WHEN (
            CASE WHEN trc.customer_segment = 'new_customer' THEN 20 WHEN trc.customer_segment = 'first_repeat' THEN 10 WHEN cbp.behavior_risk_category = 'high_risk' THEN 25 WHEN cbp.behavior_risk_category = 'medium_risk' THEN 15 ELSE 5 END +
            CASE WHEN trc.device_fraud_rate > 0.5 THEN 25 WHEN trc.device_fraud_rate > 0.2 THEN 20 WHEN trc.is_vpn = true THEN 15 WHEN trc.is_proxy = true THEN 15 WHEN trc.device_transaction_count = 0 THEN 10 ELSE 5 END +
            CASE WHEN trc.avs_result = 'N' THEN 20 WHEN trc.cvv_result = 'N' THEN 15 WHEN trc.3ds_result = 'N' THEN 15 WHEN trc.address_match = false THEN 10 ELSE 5 END +
            CASE WHEN ta.value_anomaly_type = 'high_value_outlier' THEN 10 WHEN ta.value_anomaly_type = 'medium_value_outlier' THEN 5 ELSE 0 END +
            CASE WHEN ta.anomaly_score >= 6 THEN 10 WHEN ta.anomaly_score >= 4 THEN 7 WHEN ta.anomaly_score >= 2 THEN 5 ELSE 0 END
        ) >= 40 THEN 'medium_risk'
        WHEN (
            CASE WHEN trc.customer_segment = 'new_customer' THEN 20 WHEN trc.customer_segment = 'first_repeat' THEN 10 WHEN cbp.behavior_risk_category = 'high_risk' THEN 25 WHEN cbp.behavior_risk_category = 'medium_risk' THEN 15 ELSE 5 END +
            CASE WHEN trc.device_fraud_rate > 0.5 THEN 25 WHEN trc.device_fraud_rate > 0.2 THEN 20 WHEN trc.is_vpn = true THEN 15 WHEN trc.is_proxy = true THEN 15 WHEN trc.device_transaction_count = 0 THEN 10 ELSE 5 END +
            CASE WHEN trc.avs_result = 'N' THEN 20 WHEN trc.cvv_result = 'N' THEN 15 WHEN trc.3ds_result = 'N' THEN 15 WHEN trc.address_match = false THEN 10 ELSE 5 END +
            CASE WHEN ta.value_anomaly_type = 'high_value_outlier' THEN 10 WHEN ta.value_anomaly_type = 'medium_value_outlier' THEN 5 ELSE 0 END +
            CASE WHEN ta.anomaly_score >= 6 THEN 10 WHEN ta.anomaly_score >= 4 THEN 7 WHEN ta.anomaly_score >= 2 THEN 5 ELSE 0 END
        ) >= 20 THEN 'low_risk'
        ELSE 'very_low_risk'
    END as risk_category
    
FROM transaction_risk_comprehensive trc
LEFT JOIN customer_behavior_patterns cbp ON trc.customer_id = cbp.customer_id
LEFT JOIN transaction_anomalies ta ON trc.transaction_id = ta.transaction_id;

Fraud Ring Risk Adjustment

Adjust risk scores for transactions from known fraud rings:

CREATE MATERIALIZED VIEW transaction_risk_scores_adjusted AS
SELECT 
    trs.*,
    fri.ring_risk_category,
    
    -- Adjusted Risk Score (increase if from fraud ring)
    CASE 
        WHEN fri.ring_risk_category = 'high_risk_ring' THEN LEAST(trs.total_risk_score + 30, 100)
        WHEN fri.ring_risk_category = 'medium_risk_ring' THEN LEAST(trs.total_risk_score + 15, 100)
        ELSE trs.total_risk_score
    END as adjusted_risk_score,
    
    -- Adjusted Risk Category
    CASE 
        WHEN (
            CASE 
                WHEN fri.ring_risk_category = 'high_risk_ring' THEN LEAST(trs.total_risk_score + 30, 100)
                WHEN fri.ring_risk_category = 'medium_risk_ring' THEN LEAST(trs.total_risk_score + 15, 100)
                ELSE trs.total_risk_score
            END
        ) >= 60 THEN 'high_risk'
        WHEN (
            CASE 
                WHEN fri.ring_risk_category = 'high_risk_ring' THEN LEAST(trs.total_risk_score + 30, 100)
                WHEN fri.ring_risk_category = 'medium_risk_ring' THEN LEAST(trs.total_risk_score + 15, 100)
                ELSE trs.total_risk_score
            END
        ) >= 40 THEN 'medium_risk'
        ELSE trs.risk_category
    END as adjusted_risk_category
    
FROM transaction_risk_scores trs
LEFT JOIN fraud_ring_indicators fri ON trs.device_id = fri.device_id;

Real-Time Fraud Alerts

Generate real-time alerts for high-risk transactions and route them for review or blocking.

Fraud Alert Generation

CREATE MATERIALIZED VIEW fraud_alerts AS
SELECT 
    trsa.transaction_id,
    trsa.customer_id,
    trsa.transaction_amount,
    trsa.adjusted_risk_score,
    trsa.adjusted_risk_category,
    trsa.ring_risk_category,
    
    -- Alert Priority
    CASE 
        WHEN trsa.adjusted_risk_score >= 70 THEN 'critical'
        WHEN trsa.adjusted_risk_score >= 60 THEN 'high'
        WHEN trsa.adjusted_risk_score >= 40 THEN 'medium'
        ELSE 'low'
    END as alert_priority,
    
    -- Recommended Action
    CASE 
        WHEN trsa.adjusted_risk_score >= 70 THEN 'block'
        WHEN trsa.adjusted_risk_score >= 60 THEN 'review_required'
        WHEN trsa.adjusted_risk_score >= 40 THEN 'additional_verification'
        ELSE 'approve'
    END as recommended_action,
    
    -- Risk Factors Summary
    ARRAY[
        CASE WHEN trsa.customer_risk_score > 15 THEN 'high_customer_risk' END,
        CASE WHEN trsa.device_risk_score > 15 THEN 'high_device_risk' END,
        CASE WHEN trsa.payment_risk_score > 15 THEN 'high_payment_risk' END,
        CASE WHEN trsa.ring_risk_category = 'high_risk_ring' THEN 'fraud_ring' END
    ]::text[] as risk_factors
    
FROM transaction_risk_scores_adjusted trsa
WHERE trsa.adjusted_risk_score >= 40 -- Only flag medium and high risk
ORDER BY trsa.adjusted_risk_score DESC;

MCP Tool for Fraud Detection

{
  "name": "check_transaction_fraud_risk",
  "description": "Checks the fraud risk score for a transaction based on customer history, device fingerprint, payment details, and behavior patterns. Use when processing new orders or payments to determine if additional verification is needed.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "transaction_id": {
        "type": "string",
        "description": "Transaction ID to check"
      },
      "customer_id": {
        "type": "string",
        "description": "Customer ID (if transaction_id not available)"
      },
      "transaction_amount": {
        "type": "number",
        "description": "Transaction amount"
      }
    },
    "required": ["transaction_id"]
  },
  "query": "SELECT * FROM fraud_alerts WHERE transaction_id = $1",
  "policies": ["authenticated", "role:fraud OR role:operations"]
}

Schema Overview

The fraud detection pipeline uses a hierarchical schema that traces from transactions to risk scores and alerts.

erDiagram TRANSACTIONS ||--o{ TRANSACTION_RISK_COMPREHENSIVE : enriches CUSTOMERS ||--o{ TRANSACTION_RISK_COMPREHENSIVE : joins DEVICE_FINGERPRINTS ||--o{ TRANSACTION_RISK_COMPREHENSIVE : joins TRANSACTION_RISK_COMPREHENSIVE ||--o{ CUSTOMER_BEHAVIOR_PATTERNS : analyzes TRANSACTION_RISK_COMPREHENSIVE ||--o{ TRANSACTION_ANOMALIES : detects DEVICE_FINGERPRINTS ||--o{ FRAUD_RING_INDICATORS : identifies TRANSACTION_ANOMALIES ||--o{ TRANSACTION_RISK_SCORES : calculates CUSTOMER_BEHAVIOR_PATTERNS ||--o{ TRANSACTION_RISK_SCORES : factors FRAUD_RING_INDICATORS ||--o{ TRANSACTION_RISK_SCORES_ADJUSTED : adjusts TRANSACTION_RISK_SCORES ||--o{ TRANSACTION_RISK_SCORES_ADJUSTED : adjusts TRANSACTION_RISK_SCORES_ADJUSTED ||--o{ FRAUD_ALERTS : generates TRANSACTIONS { string transaction_id PK string customer_id decimal transaction_amount timestamp transaction_date } TRANSACTION_RISK_COMPREHENSIVE { string transaction_id PK string customer_id string device_id boolean address_match string avs_result } TRANSACTION_RISK_SCORES { string transaction_id PK int customer_risk_score int device_risk_score int payment_risk_score int total_risk_score string risk_category } FRAUD_ALERTS { string transaction_id PK int adjusted_risk_score string recommended_action string alert_priority }

Key relationships:

  • transactionstransaction_risk_comprehensive (enriched with customer/device data)
  • transaction_risk_comprehensivecustomer_behavior_patterns (behavior analysis)
  • transaction_risk_comprehensivetransaction_anomalies (anomaly detection)
  • device_fingerprintsfraud_ring_indicators (pattern detection)
  • All factors → transaction_risk_scores (risk calculation)
  • transaction_risk_scores + fraud_ring_indicatorsfraud_alerts (alert generation)

Traceability: Every fraud alert can be traced back to transaction details, risk factors, and scoring logic.


Closing Thoughts

Building an e-commerce fraud detection pipeline transforms fraud management from reactive to proactive. The key is treating fraud detection as a data problem—not a rule-based process.

Key takeaways:

  1. Enrichment enables detection: Combine transaction, customer, and device data. Comprehensive context makes fraud detection possible.

  2. Multiple factors determine risk: Consider customer history, device fingerprint, payment details, and behavior patterns. No single factor determines fraud.

  3. Pattern detection surfaces rings: Identify fraud rings and coordinated attacks. Pattern detection prevents repeat fraud.

  4. Scoring enables automation: Calculate risk scores from multiple factors. Automated scoring enables real-time decisions.

  5. Feedback improves accuracy: Learn from fraud outcomes. Update models based on what actually was fraud vs false positives.

The architecture described here is repeatable. Start with basic risk scoring, add pattern detection, then implement ML models and feedback loops. Each stage builds on the previous one.

Most importantly: this isn't about blocking all transactions—it's about identifying risk. Teams still review and make decisions. The fraud detection pipeline provides risk scores, enabling teams to focus on high-risk transactions and reduce false positives.


This architecture pattern is implemented in Pylar, but the concepts apply to any system detecting fraud. The key is the transaction enrichment → risk scoring → pattern detection → alert generation pipeline.