sean@portfolio:~$
$ whoami
Security Data Architect & Analytics Lead
$ cat interests.txt
Security Decision Platforms, Governed Data Architectures,
Large-Scale Pipelines, Canonical Data Modeling, ML Risk Signals
$ ls skills/
GCP AWS Python PySpark Kafka Kubernetes SQL Airflow Tableau GenAI Hadoop Linux Git ETL
$ _
Sean Zhang

seanzhangd.com

Security Data Architect & Analytics Lead

Security data architect specializing in enterprise security decision platforms, governed data architectures, and large-scale ingestion pipelines. Experienced in building canonical security data models, audit-ready decision logic, and automation frameworks that translate metrics into controlled actions.

8 Team Members Led
$2M+ Saved via Automation
12K+ Hours Saved Annually

// Certifications

GCP Professional Data Engineer
GCP Associate Cloud Engineer
AWS Solutions Architect Associate
EPIC Certified

// Technical Expertise

☁️

Cloud Platforms

GCP (PDE, ACE) AWS (SAA) BigQuery GCP Dataflow Cloud Storage
🔧

Data Engineering & Databases

Python PySpark Kafka Kubernetes SQL Oracle SQL MSSQL Airflow Hadoop ETL Automation MySQL Teradata Linux Git EPIC (Cogito, Caboodle, Clarity)
🤖

Machine Learning & Security Analytics

Risk Scoring Anomaly Detection Predictive Modeling scikit-learn TensorFlow Pandas NumPy R (tidyverse, dplyr, caret)
📊

BI & Visualization

Tableau (Server Admin) Alteryx Power BI SAP
🚀

GenAI & Automation

MCP Claude Code Cursor ChatGPT Enterprise Microsoft Copilot Jenkins CI/CD
👥

Leadership & Collaboration

Stakeholder Alignment Executive Reporting Cross-functional Execution Team Leadership Server Management

// Professional Experience

Data Architect (Security Decision Platform)

PNC Bank Apr 2023 - Present
  • Architected and owned an enterprise Security Decision Platform (SPOG) spanning 14 security and risk domains within Enterprise Information Security, including WIAM, AppSec, ASM, Certificates, Data Protection, Third-Party Risk, and Physical Security
  • Unified dozens of heterogeneous upstream systems into a governed security data lake and canonical decision layer, enabling end-to-end workforce and application risk scoring through cross-domain analytics
  • Defined canonical security entities and cross-domain linkage (identity, assets, vulnerabilities, access, controls) enabling analytics not achievable inside individual systems
  • Authored platform standards: data contracts, schema/versioning rules, audit-grade retention, deterministic recomputation, and controlled backfills to support regulatory and audit requirements
  • Designed ingestion and serving patterns (batch + incremental + event-driven) balancing reliability, change control, and downstream compatibility
  • Led platform evolution through migration initiatives introducing S3 landing/curation patterns and governed consumption in Redshift while maintaining continuity for mission-critical downstream consumers
Solution Architect Python ETL Oracle Kafka Airflow AWS S3/Redshift Tableau
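
As an illustration of the "data contracts" and "schema/versioning rules" mentioned above, here is a minimal sketch of a versioned contract check. It is not the platform's actual code: the `DataContract` class, the `validate` helper, and the `identity` field names are all hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: a named, versioned set of required fields."""
    name: str
    version: int
    fields: dict  # field name -> expected Python type


def validate(record: dict, contract: DataContract) -> list:
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for field, expected in contract.fields.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


# Assumed example contract; the field names are illustrative only.
IDENTITY_V1 = DataContract("identity", 1, {"employee_id": str, "risk_score": float})

ok = validate({"employee_id": "E123", "risk_score": 0.42}, IDENTITY_V1)
bad = validate({"employee_id": 123}, IDENTITY_V1)
```

Pinning the version on the contract object lets a producer and a consumer agree on exactly which shape they are exchanging, so a schema change becomes a new contract version rather than a silent break.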

Advisor, Data Management and Governance

Cardinal Health Apr 2022 - Feb 2023
  • Built and optimized ETL pipelines integrating SQL Server, GCP, Workday, ServiceNow, and third-party APIs using SQL, Alteryx, Python, Dataflow, Cloud Storage, and BigQuery
  • Led migration of 50+ ETL workflows from SQL Server to BigQuery, reducing query latency by ~70% and supporting 2–3× larger HR data volumes without performance degradation
  • Administered Tableau Server for 900+ HR stakeholders and 10+ C-level leaders, optimizing dashboard performance, strengthening governance, and reducing reporting backlog by 35%
  • Developed Python/R predictive models for turnover risk, internal movement, and workforce planning, deployed into 10+ C-level reporting workflows
GCP BigQuery Python Alteryx

Data Specialist

St. Luke's University Health Network Jul 2021 - Mar 2022
  • Built and maintained 40+ SQL/Tableau assets for ED, OR, and Administration from hybrid clinical warehouse sources, supporting daily clinical operations and utilization analytics
  • Integrated EPIC Cogito/Caboodle/Clarity and SAP into unified Tableau pipelines, improving data consistency and reducing refresh failures by ~40%
  • Engineered Twilio-based call-center data pipelines and reporting for ~20K calls/day with 100+ configurations, improving QA workflows and operational oversight
SQL Tableau EPIC SAP

// Featured Projects

Enterprise Security Decision Platform

Architected enterprise Security Decision Platform (SPOG) across 14 security and risk domains, unifying heterogeneous upstream systems into a governed security data lake with canonical data models enabling cross-domain workforce and application risk scoring.

Python Airflow AWS S3/Redshift GCP BigQuery

GCP to BigQuery Migration Pipeline

Led migration of 50+ ETL workflows from GCP SQL instances to BigQuery, doubling read/write speed and enabling scalability for 300+ stakeholders across HR, Finance, and executive leadership.

GCP BigQuery Dataflow Alteryx

Predictive Risk Analytics Models

Developed machine learning models for individual risk scoring, insider fraud prediction, and turnover risk prediction, improving risk management accuracy by 80% and serving C-level executives.

Python scikit-learn TensorFlow R

// Education & Certifications

Master of Science in Business Analytics

Washington University in St. Louis January 2021

GPA: 3.97/4.0

Honors: Knight Scholar (Top 1%), Beta Gamma Sigma

Coursework: Machine Learning and Statistical Modeling

Bachelor of Marketing

Xiamen University June 2019

GPA: 3.61/4.0

Coursework: Software Engineering, Marketing

Exchange: McGill University, Montreal, Canada, on a full scholarship (Top 1%)

< Security Decision Platform />

Enterprise Security Analytics Lakehouse (SPOG)

Governance: Airflow, Contracts, Lineage, Compliance, SLA
Sources: Identity, Data Protection, ASM, ...
Ingestion: Batch (Python | Oracle), Stream (Kafka), Idempotence
Lakehouse:
  • 🥉 Bronze: Immutable, Schema Tolerant, Quarantine, Replayable
  • 🥈 Silver: Normalization, Quality Gates, Lineage
  • 🥇 Gold: Metrics as Code, Exec Reports, Traceability
Processing: ETL (CDC | Python | SQL), Quality Checks
Consumption: Dashboards, Reports, Automation
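
The "Idempotence", "Quarantine", and "Replayable" properties in the diagram above can be sketched in a few lines. This is an illustrative toy, not the platform's implementation: `BronzeStore` is a hypothetical in-memory stand-in for an append-only bronze layer, and the rule that a record must carry a `source` field is an assumption made for the example.

```python
import hashlib
import json


class BronzeStore:
    """Hypothetical in-memory stand-in for an append-only bronze layer."""

    def __init__(self):
        self.records = {}     # idempotency key -> raw record
        self.quarantine = {}  # idempotency key -> rejected record

    @staticmethod
    def key(record) -> str:
        # Deterministic key: SHA-256 of the canonical JSON form of the record.
        payload = json.dumps(record, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def ingest(self, batch) -> int:
        """Store new records, skip duplicates, quarantine malformed ones.

        Returns the number of records newly written on this call.
        """
        written = 0
        for record in batch:
            k = self.key(record)
            # Assumed minimal check: a record must be a dict with a "source".
            if not isinstance(record, dict) or "source" not in record:
                self.quarantine[k] = record
                continue
            if k not in self.records:
                self.records[k] = record
                written += 1
        return written


store = BronzeStore()
batch = [{"source": "identity", "id": 1}, {"source": "asm", "id": 2}, {"id": 3}]
first = store.ingest(batch)   # writes the two well-formed records
replay = store.ingest(batch)  # replaying the same batch writes nothing new
```

Because the key is derived from the record content, replaying a batch after a failure leaves the store unchanged, and malformed input is set aside for inspection instead of blocking the rest of the load.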