sean@portfolio:~$
$ whoami
Senior Data Engineer & Analytics Manager
$ cat interests.txt
Cloud Data Platforms, Security Analytics, GenAI Automation,
Large-Scale ETL, Machine Learning
$ ls skills/
GCP AWS Python PySpark SQL Airflow Tableau GenAI Hadoop Linux Git ETL
$ _

Sean Zhang

seanzhangd.com

Senior Data Engineer & Analytics Manager

Specializing in cloud data platforms (GCP, AWS), security analytics, large-scale ETL, and GenAI-driven automation. Built multi-domain security analytics, engineered pipelines that reduce manual effort, and delivered ML-driven detection improvements that enhance enterprise risk visibility.

8 Team Members Led
$1M+ Saved via Automation
12K+ Hours Saved Annually

// Certifications

GCP Professional Data Engineer
GCP Associate Cloud Engineer
AWS Certified Cloud Practitioner
EPIC Certified

// Technical Expertise

☁️

Cloud Platforms

GCP (PDE, ACE), AWS (CCP), BigQuery, GCP Dataflow, Cloud Storage
🔧

Data Engineering & Databases

Python, PySpark, SQL, Oracle SQL, MSSQL, Airflow, Hadoop, ETL Automation, MySQL, Teradata, Linux, Git, EPIC (Cogito, Caboodle, Clarity)
🤖

Machine Learning & Security Analytics

Risk Scoring, Anomaly Detection, Predictive Modeling, scikit-learn, TensorFlow, Pandas, NumPy, R (tidyverse, dplyr, caret)
📊

BI & Visualization

Tableau (Server Admin), Alteryx, Power BI, SAP
🚀

GenAI & Automation

MCP, Claude Code, Cursor, ChatGPT Enterprise, Microsoft Copilot, Jenkins CI/CD
👥

Leadership & Collaboration

Stakeholder Alignment, Executive Reporting, Cross-functional Execution, Team Leadership, Server Management

// Professional Experience

Business Analytics Manager

PNC Bank Feb 2025 - Present
  • Lead an 8-member engineering and analytics team supporting multiple security domains (WIAM, DLP, PGA, ASM, Digital Identity, ITH), delivering automated workflows and real-time reporting that increased operational visibility and reduced reporting cycles from days to minutes
  • Owned automated budgeting workflows for the Information Security organization, managing $200M+ across cost centers and reducing manual deployment and audit rework by ~20% through automated variance tracking and forecasting
  • Integrated Microsoft Copilot into Python ETL and BI workflows (Tableau, Power BI), automating code generation and anomaly detection while improving engineering velocity 2× and reducing manual coding effort by 40% for security analytics deliverables
Python Tableau PySpark SQL

Business Analytics Lead

PNC Bank Apr 2023 - Jan 2025
  • Owned the end-to-end workflow from data engineering to Tableau dashboards, delivering 30+ real-time KRI metrics, automated controls, and issue management for CIO and board-level reporting
  • Engineered ETL pipelines across 10+ security and IT systems (AD/OUD, ServiceNow, Tableau, Tenable, Archer) using Python/PySpark/SQL, processing 50M+ monthly records at 99.9% reliability to support real-time operations
  • Led automation initiatives that saved $1M+ and 10,000+ hours annually, increasing operational efficiency 3–10x on key programs
  • Supported ML models for individual risk scoring, insider fraud, and application risk, reducing false positives by 40%+ and shortening investigation cycles by 60%, which raised analyst throughput
  • Maintained data platform infrastructure including Jenkins CI/CD, server management, and sensitive data management (GLBA, PII, PCI, HIPAA)
Python PySpark Tableau Jenkins

Advisor, Data Management and Governance

Cardinal Health Apr 2022 - Feb 2023
  • Built and optimized ETL pipelines integrating SQL Server, GCP, Workday, ServiceNow, and third-party APIs using SQL, Alteryx, Python, Dataflow, Cloud Storage, and BigQuery
  • Led migration of 50+ ETL workflows from SQL Server to BigQuery, reducing query latency by ~70% and supporting 2–3× larger HR data volumes without performance degradation
  • Administered Tableau Server for 900+ HR stakeholders and 10+ C-level leaders, optimizing dashboard performance, strengthening governance, and reducing reporting backlog by 35%
  • Developed Python/R predictive models for turnover risk, internal movement, and workforce planning, deployed into 10+ C-level reporting workflows
GCP BigQuery Python Alteryx

Data Specialist

St. Luke's University Health Network Jul 2021 - Mar 2022
  • Built and maintained 40+ SQL/Tableau assets for ED, OR, and Administration from hybrid clinical warehouse sources, supporting daily clinical operations and utilization analytics
  • Integrated EPIC Cogito/Caboodle/Clarity and SAP into unified Tableau pipelines, improving data consistency and reducing refresh failures by ~40%
  • Engineered Twilio-based call-center data pipelines and reporting for ~20K calls/day with 100+ configurations, improving QA workflows and operational oversight
SQL Tableau EPIC SAP

// Featured Projects

Real-time Security Analytics Platform

Developed a comprehensive analytics platform providing 30+ real-time KRI metrics for information security, integrating data from 10+ APIs with automated ETL pipelines and executive dashboards.

Python PySpark Tableau Jenkins

GCP to BigQuery Migration Pipeline

Led the migration of 50+ ETL workflows from GCP SQL instances to BigQuery, doubling read/write speed and enabling scalability for 300+ stakeholders across HR, Finance, and executive leadership.

GCP BigQuery Dataflow Alteryx

Predictive Risk Analytics Models

Developed machine learning models for individual risk scoring, insider fraud prediction, and turnover risk prediction, improving risk management accuracy by 80% and serving C-level executives.

Python scikit-learn TensorFlow R

// Education & Certifications

Master of Science in Business Analytics

Washington University in St. Louis January 2021

GPA: 3.97/4.0

Honors: Knight Scholar (Top 1%), Beta Gamma Sigma

Coursework: Machine Learning and Statistical Modeling

Bachelor of Marketing

Xiamen University June 2019

GPA: 3.61/4.0

Coursework: Software Engineering, Marketing

Exchange: McGill University, Montreal, Canada, on a full scholarship (Top 1%)