> AKANKSH GATLA

Data Engineer AI/ML
📍 Plano, Texas
"I transform raw data into intelligent systems and build ML pipelines that scale. AI is my playground."
$ cat akanksh_profile.json
  {"experience": "4+ years",
  "ms_degree": "UB New York",
  "certifications": 9,
  "projects": "6+",
  "publications": 1,
  "coffee_consumed": "∞"}
> ready to scale your data.
🏦 Dec 2025 - Present: Senior Data Engineer @ Capital One (Plano, TX)
🔧 Aug 2025 - Nov 2025: Data Integration Engineer @ Genzeon (Exton, PA)
📊 Nov 2024 - Aug 2025: Data Engineer @ Capital One (West Creek, VA)
💼 Aug 2023 - Nov 2024: Data Engineer @ Unity Population Health (Columbus, OH)
🚀 Sep 2020 - Jul 2022: Associate Data Engineer @ Unity Population Health (Remote)
🎓 Apr 2020 - Aug 2020: Junior Data Scientist @ The Spark Foundation (Remote)

MY JOURNEY

Building AI-powered data systems at scale — from startups to Fortune 500

DEC 2025 - PRESENT

Senior Data Engineer

Capital One | Plano, Texas

  • Built region-agnostic automation enabling 200+ jobs to onboard without prod impact (85% efficiency gain)
  • Optimized Snowflake warehouse by 40%, cutting query latency from hours to seconds
  • Led AI Agent development for end-to-end automated pipeline orchestration
AWS Glue, Snowflake, PySpark, Databricks, Step Functions, Jenkins, Terraform
AUG 2025 - NOV 2025

Data Integration Engineer

Genzeon Corporation | Exton, PA

  • Architected HL7/FHIR ingestion pipelines for 350+ healthcare clients into Azure Data Lake
  • Built intelligent automation reducing manual data mapping by 40%, improving data quality by 30%
  • Designed Medallion Architecture (Bronze/Silver/Gold) delivering analytics-ready curated views
Azure, Databricks, PySpark, Delta Lake, ADLS Gen2, FHIR, HL7
NOV 2024 - AUG 2025

Data Engineer

Capital One | West Creek, Virginia

  • Orchestrated 20+ AWS Glue Spark ETL jobs through Step Functions for automated daily reporting
  • Optimized Snowflake by 40%, enhanced query performance by 30% via indexing and micro-partitioning
  • Delivered AWS Serverless Data Products (Lambda, Step Functions, S3, Glue) in 6-week sprint cycles
AWS Glue, Snowflake, Lambda, PySpark, Terraform, Jenkins
AUG 2023 - NOV 2024

Data Engineer

Unity Population Health | Columbus, Ohio

  • Architected HIPAA-compliant ELT platform automating EMR data extraction every 8 hours via OAuth 2.0
  • Built ML-powered automation for non-compliant patient cohorts in Value-Based Care programs
  • Deployed UDS reporting platform with Spark SQL and Power BI, reducing deployment time by 30%
Azure Data Factory, Databricks, PySpark, Delta Lake, Power BI, HL7/FHIR
SEP 2020 - JUL 2022

Associate Data Engineer

Unity Population Health | Remote

  • Orchestrated Airflow DAGs processing 3.8M patient records with PySpark at scale
  • Built AI-powered patient chatbot (NLP) handling 5K+ interactions with a 92% satisfaction rate
  • Enhanced risk stratification by 35% through feature engineering (PCA, K-Means, DBSCAN)
Apache Airflow, PySpark, Databricks, NLP, Azure Bot Service, Tableau
APR 2020 - AUG 2020

Junior Data Scientist

The Spark Foundation | Remote

  • Optimized data pipelines with Airflow DAGs and PostgreSQL, accelerating retrieval by 40%
  • Built ensemble ML models (Random Forest, Gradient Boosting) achieving 84% accuracy
  • Created Tableau dashboards for data visualization and storytelling
Python, PostgreSQL, scikit-learn, Apache Airflow, Tableau, ML

EDUCATION

🎓 UNIVERSITY AT BUFFALO
M.S. Computer Science & Engineering
  • Advanced coursework in Machine Learning and Data Computing
  • Specialized in Database Systems and Algorithm Design
  • Published research in IRJET (July 2023)
  • Built multiple data-intensive projects with 94.8% model accuracy
ML, Data Computing, Algorithms, Programming, Database
🎓 LOVELY PROFESSIONAL UNIVERSITY
B.Tech Computer Science & Engineering
  • Foundation in Data Structures, Algorithms, and DBMS
  • Object-Oriented Programming and Software Engineering
  • Computer Networks and Operating Systems
  • Started journey into data engineering and analytics
Data Structures, DBMS, OOP, Algorithms, Networks

PROJECTS

State Crime Analysis & Safety Prediction

Nov 2023
  • Random Forest model for crime prediction with R² = 94.8%
  • Analyzed 10+ years of state-level crime data
  • Built interactive visualizations with Python
  • Deployed as web dashboard for insights
Python, ML, Pandas
→ GITHUB

Wikipedia API Scraping + Solr

Oct 2023
  • Scraped 500+ Wikipedia docs via API
  • Implemented full-text search with Apache Solr
  • Deployed on Google Cloud Platform
  • Optimized query performance with indexing
Python, Solr, GCP
→ GITHUB

Boolean Query & Inverted Index

Aug 2023
  • Flask REST API for document retrieval
  • Implemented inverted index from scratch
  • Support for complex boolean queries
  • Efficient search on large document corpus
Python, Flask, IR
→ GITHUB

Global Super Store CRUD App

Apr 2023
  • PostgreSQL database with strategic indexing
  • Full CRUD operations web interface
  • Query optimization for fast retrieval
  • RESTful API design patterns
PostgreSQL, Python, REST
→ GITHUB

8-bit Computer Architecture Processor

Apr 2023
  • Designed using Verilog HDL
  • Synthesized with Xilinx Vivado
  • Non-pipelined architecture implementation
  • Complete instruction set design
Verilog, FPGA, Vivado
→ GITHUB

Healthcare ML Application

May 2022
  • Random Forest classifier with an 89% F1 score
  • Predictive modeling for patient outcomes
  • Feature engineering on medical data
  • Model deployment as Flask API
ML, Python, Flask
→ GITHUB

TECH STACK

PROGRAMMING LANGUAGES

Python
Java
C++
R
JavaScript
SQL
LaTeX

BIG DATA & FRAMEWORKS

Apache Spark
Hadoop
Flask
Django
Pandas
NumPy
Scikit-learn

DATABASES & TOOLS

PostgreSQL
MySQL
MongoDB
Git
Docker
Apache Solr
Linux

CLOUD & DEVOPS

Google Cloud
AWS
Tableau
Power BI

CERTIFICATIONS

Professional certification growth trajectory — from research to enterprise-level expertise

(Timeline chart: certifications by year, 2020 to 2024)
Introduction to Data Science | Lovely Professional University
Predictions & Analytics | IRJET Publication
Data Analysis with Python | Jovian
Introduction to Generative AI | Google Cloud Skills Boost
Lakehouse Fundamentals | Databricks Academy
Generative AI Fundamentals | Databricks Academy
Azure Data Engineer | Microsoft
Data Engineer Associate | Databricks Academy

LET'S CONNECT

Have a data engineering challenge or just want to chat about tech, pipelines, and coffee consumption? Drop me a message and let's build something amazing together.

Email
akankshgatla@gmail.com
Location
Plano, Texas 📍