Staff Data Engineer/Scientist
CACI International Inc
About the role
Below is a ready‑to‑use, fully‑customizable package you can submit with your application to CACI for the Staff Data Engineer/Scientist role.
It includes:
- Targeted cover‑letter template – highlights the exact language from the posting, shows how you meet every “required” and “desired” qualification, and demonstrates cultural fit.
- Resume‑bullet‑library – a set of concise, achievement‑focused bullet points you can copy‑paste (or adapt) into the “Professional Experience” section of your CV.
- Quick‑check checklist – a 10‑item sanity‑check to make sure you haven’t missed anything before you hit “Submit”.
Feel free to edit the placeholders (e.g., [Your Name], [Company X]) with your own details.
1️⃣ Cover‑Letter (PDF‑ready)
[Your Name]
[Street Address] • [City, State ZIP] • [Phone] • [Email] • [LinkedIn] • [GitHub]
[Date]
Hiring Manager
CACI International Inc.
1100 Wilson Blvd., Suite 3000
Arlington, VA 22209
Dear Hiring Manager,
I am excited to apply for the Staff Data Engineer/Scientist position (TS/SCI + Polygraph) advertised on CACI’s career portal. With 8+ years of experience designing and operating large‑scale ETL pipelines, deploying production‑grade LLM‑driven analytics, and leading interdisciplinary teams of researchers and software engineers, I am well positioned to accelerate CACI’s AI/ML mission while upholding the highest security standards.
Why I’m a perfect fit
• End‑to‑end data platforms – At [Current/Most Recent Employer], I architected a multi‑tenant data lake on AWS (S3, Glue, EMR, Athena) that ingested >30 TB/day from heterogeneous sources (satellite telemetry, SIGINT feeds, and open‑source text). The platform reduced raw‑to‑model latency from 12 h to < 30 min and supported downstream LLM fine‑tuning pipelines using HuggingFace Transformers.
• AI/ML production expertise – I built a PySpark‑based feature store that feeds a BERT‑style document classifier and a time‑series forecasting service (Prophet + DeepAR) used by senior analysts to predict equipment failure with 92 % F1. I also prototyped an LLM‑as‑a‑judge workflow (LangFuse + MLflow) that automatically scores analyst reports for compliance.
• Leadership & mentorship – As Lead Data Engineer, I managed a cross‑functional squad of 5 data scientists, 3 software engineers, and 2 domain researchers. I instituted Agile ceremonies, introduced GitLab CI/CD with test‑driven development, and mentored junior staff on Docker‑Compose, Kubernetes, and secure DevSecOps pipelines. Team velocity increased 35 % while defect rates dropped to < 2 %.
• Security‑cleared & mission‑focused – I hold an active TS/SCI with Full‑Scope Polygraph and have worked on multiple DoD contracts (e.g., AFWERX, NSA, DARPA) where data integrity, provenance, and compliance were non‑negotiable. My experience with FedRAMP‑authorized AWS services and Zero‑Trust networking aligns directly with CACI’s security posture.
What I bring to CACI
- Proven ability to assemble, curate, and operationalize massive, multi‑modal datasets for LLM fine‑tuning and downstream analytics.
- Deep hands‑on experience with Python (NumPy, Pandas, Polars), Spark/PySpark, Docker, Kubernetes, and AWS Bedrock.
- Familiarity with Transformer‑based vision models (ViT), reinforcement‑learning frameworks (Gymnasium, RLlib), and GPU‑accelerated CUDA kernels.
- A culture‑first mindset: integrity, collaboration, and continuous learning—values that echo CACI’s “culture of integrity” and “environment of trust.”
I would welcome the opportunity to discuss how my technical leadership and security‑cleared background can help CACI deliver next‑generation AI/ML solutions for our nation’s most critical missions. Thank you for your consideration.
Sincerely,
[Your Name]
Tip: Export the above as a PDF, name the file Lastname_Firstname_CACI_CoverLetter.pdf, and attach it alongside your resume.
2️⃣ Resume‑Bullet‑Library
Formatting tip: Use the [Action verb] + [what you built/optimized] + [technology stack] + [impact (quantified)] pattern. Keep each bullet ≤ 2 lines.
A. Current / Most Recent Role (Data Engineering Lead – DoD Contract)
| Bullet | When to Use |
|---|---|
| Architected a secure, multi‑tenant data lake on AWS (S3, Glue, EMR, Athena) ingesting >30 TB/day from heterogeneous classified sources, cutting raw‑to‑model latency from 12 h → < 30 min. | Show ETL scale & AWS expertise. |
| Designed & implemented a PySpark‑based feature store feeding BERT‑style classifiers and Prophet/DeepAR time‑series models, achieving 92 % F1 on anomaly detection. | Demonstrates ML pipeline & Spark skill. |
| Led a cross‑functional team of 10 (data scientists, software engineers, domain researchers) using Agile/Scrum, raising sprint velocity 35 % while maintaining <2 % defect rate. | Highlights leadership & Agile. |
| Built CI/CD pipelines in GitLab with test‑driven development, Docker‑Compose, and Kubernetes deployments, reducing release cycle from 2 weeks → 2 days. | Shows DevSecOps competence. |
| Implemented an LLM‑as‑a‑judge workflow (LangFuse + MLflow) to auto‑score analyst reports, decreasing manual review time by 70 %. | Directly maps to “GenAI Ops” desire. |
| Authored security‑hardening guidelines for data movement (SFTP, IAM policies, KMS) that passed DoD RMF assessment on first review. | Reinforces clearance & security focus. |
| Mentored 4 junior engineers on CUDA‑accelerated data transforms, resulting in 3× faster image‑preprocessing for vision‑LLM fine‑tuning. | Shows GPU/CUDA experience. |
B. Prior Role (Senior Data Engineer – Federal Agency)
| Bullet | When to Use |
|---|---|
| Developed end‑to‑end ETL pipelines using Python (Pandas, Polars) and Airflow, processing 5 TB/month of structured/unstructured data for downstream analytics. | Core ETL experience. |
| Integrated HuggingFace Transformers (BERT, RoBERTa) into a document‑classification service that reduced manual tagging effort by 85 %. | NLP/LLM relevance. |
| Automated data quality checks with Great Expectations, catching >99 % of schema violations before ingestion. | Process‑improvement focus. |
| Collaborated with a research team to prototype Vision‑Transformer (ViT) models on satellite imagery, improving target detection accuracy from 68 % → 81 %. | Shows cross‑domain Transformer work. |
| Managed cloud resources on AWS (EC2, S3, Lambda, Bedrock), optimizing cost by 30 % via spot‑instance scheduling and S3 lifecycle policies. | Cloud cost‑efficiency. |
| Presented quarterly technical briefings to senior leadership, translating complex ML concepts into actionable mission insights. | Communication skill. |
C. Early Career (Data Engineer – Commercial SaaS)
| Bullet | When to Use |
|---|---|
| Built a real‑time streaming pipeline with Kafka + Spark Structured Streaming, delivering sub‑second data freshness to a dashboard used by >10,000 customers. | Demonstrates high‑throughput streaming. |
| Containerized all services with Docker and orchestrated via Docker‑Compose, enabling reproducible dev environments for a distributed team. | Docker experience. |
| Implemented unit‑ and integration‑tests (pytest) achieving >90 % code coverage across the data‑processing codebase. | Test‑driven development. |
3️⃣ Quick‑Check Checklist (Before Submitting)
| ✅ | Item |
|---|---|
| 1 | Resume file name: Lastname_Firstname_CACI_Resume.pdf |
| 2 | Cover‑letter file name: Lastname_Firstname_CACI_CoverLetter.pdf |
| 3 | Clearance statement: “Active TS/SCI with Full‑Scope Polygraph (FSP) – Clearance verified as of [Month Year]”. |
| 4 | Keywords: Ensure every required skill (Python, Pandas/Polars, Git, TS/SCI) appears verbatim in your resume. |
| 5 | Quantified impact: All bullets include a metric (%, time saved, volume processed, accuracy). |
| 6 | Security compliance: No mention of classified details beyond “classified source” or “secure environment”. |
| 7 | Tailored summary (top of resume): 2‑3 sentence “Professional Summary” that mirrors the job title and highlights LLM, ETL, and leadership. |
| 8 | LinkedIn/GitHub: Public repos showing Dockerfiles, Airflow DAGs, Spark jobs, or a small LLM fine‑tuning demo (optional but impressive). |
| 9 | Application portal: Upload resume, cover letter, and any supplemental PDFs the posting requests (e.g., transcripts or certifications). Do not upload your SF‑86; clearance status is verified through official channels. |
| 10 | Follow‑up: Send a brief thank‑you email to the recruiter (if contact info provided) within 24 h of submission. |
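Checklist items 4 and 5 lend themselves to a quick automated pass before you submit. The following is a minimal Python sketch, not part of the original package. The keyword list comes from item 4; the function names, the whole‑word matching rule, and the "digit not attached to a letter" metric heuristic are all assumptions made for illustration.

```python
import re

# Keywords from checklist item 4; extend this list to match the posting.
REQUIRED_KEYWORDS = ["Python", "Pandas", "Polars", "Git", "TS/SCI"]

# Heuristic (an assumption, not a rule): a bullet "has a metric" if it contains
# a digit that is not glued to a preceding letter, so "92 %" and "2 days" count
# but service names such as "S3" or "EC2" do not.
METRIC_PATTERN = re.compile(r"(?<![A-Za-z])\d")


def missing_keywords(resume_text, keywords=REQUIRED_KEYWORDS):
    """Return the keywords that never appear as a whole word (case-insensitive)."""
    missing = []
    for kw in keywords:
        # Lookarounds instead of \b so "TS/SCI" works and "Git" does not
        # silently match inside "GitLab".
        pattern = r"(?<![A-Za-z0-9])" + re.escape(kw) + r"(?![A-Za-z0-9])"
        if not re.search(pattern, resume_text, flags=re.IGNORECASE):
            missing.append(kw)
    return missing


def bullets_without_metric(bullets):
    """Return the bullets that contain no standalone number."""
    return [b for b in bullets if not METRIC_PATTERN.search(b)]


if __name__ == "__main__":
    # Hypothetical resume excerpt, for demonstration only.
    text = "Python (Pandas, Polars), GitLab CI/CD, active TS/SCI clearance"
    bullets = [
        "Built CI/CD pipelines in GitLab, cutting release cycle from 2 weeks to 2 days.",
        "Managed cloud resources on AWS (EC2, S3, Lambda).",
    ]
    print("Missing keywords:", missing_keywords(text))
    print("Bullets lacking a metric:", bullets_without_metric(bullets))
```

Whole‑word matching is deliberately strict: a resume that mentions only "GitLab" is flagged as missing "Git", which mirrors the checklist's advice that required skills appear verbatim. Treat the output as a prompt for a manual read, since a number in a bullet is not necessarily a meaningful impact metric.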
How to Use This Package
- Copy the cover‑letter into a Word or Google Docs file, replace placeholders ([Your Name], [Current/Most Recent Employer], etc.), and export as PDF.
- Select the most relevant bullet points from the library (≈ 6‑8 per role) and paste them into your existing resume, adjusting dates/technologies to match your actual experience.
- Run the Quick‑Check Checklist to catch any missing pieces.
- Submit through CACI’s career portal, attach the PDFs, and keep a copy of the submission confirmation for your records.
Final Thought
CACI is looking for a technical leader who can bridge research breakthroughs with production‑grade data pipelines while safeguarding classified information. By explicitly aligning your experience with the language in the posting—and quantifying the impact you’ve delivered—you’ll demonstrate that you not only meet the baseline requirements but also bring the extra expertise (LLM‑Ops, GenAI, GPU‑accelerated pipelines) that will set you apart.
Requirements
- Minimum Clearance Required to Start: TS/SCI with Polygraph
- B.S. in data science, AI/ML, computer science, or related field
- Minimum six (6) years of relevant experience as a Data Engineer/Scientist
- Experience developing data pipelines and normalizing data with canonical Python packages (e.g. NumPy, Pandas, Polars)
- Experience contributing on a team using version control (e.g. git, GitLab, Bitbucket)
- Active TS/SCI U.S. Government Security Clearance with a recent Full-Scope Polygraph (FSP)
- Docker, Docker Compose), cloud services (e.g
- Experience leading an interdisciplinary team of researchers and software developers
- Large Language Models and experience identifying ways to incorporate them into new domains and applications
- Applying Transformer-based architectures to domains in other areas outside of Natural Language Processing (NLP) such as computer vision
- Natural Language Processing algorithms such as BERT
- Reinforcement learning and familiarity with Gymnasium Gym, OpenEnv, TorchRL, RLlib, and Stable Baselines
- Experience with GenAI Ops techniques (e.g. LLM-as-a-judge) and frameworks (e.g. LangFuse, MLFlow, Arize Phoenix)
- Experience with Machine Learning libraries and frameworks such as HuggingFace and LangChain
- Experience with Linux
- Familiarity with using AWS cloud computing resources such as EC2, S3, Lambda, Bedrock, etc
- Experience with any of the following additional languages: Java, C++, Rust, Go, and/or C#
- Experience implementing algorithms on the GPU in Python or C++ using CUDA and other CUDA libraries
- Experience in application deployment, virtualization, and containerization (e.g. Podman, Docker, Kubernetes, Rancher)
- Experience shaping and writing proposals
Responsibilities
- Percentage of Travel Required: Up to 10%
- You will support the development of AI/ML algorithms across a range of disciplines, including large language models, natural language processing, and time-series predictive analytics
- Lead and mentor an interdisciplinary team consisting of both developers and researchers
- The team's core focus is the implementation of ETL pipelines to support a variety of AI/ML and LLM solutions, which in turn address a broad range of customer challenges
- Assemble large, complex sets of data to support AI/ML algorithm implementation
- Build the infrastructure required for optimal extraction, transformation, and loading of data from various data sources
- Curate and maintain data that is stored in support of metrics and evaluation
- Implement Artificial Intelligence/Machine Learning algorithms
- Identify, design, and implement internal process improvements, including re-designing infrastructure for greater scalability, optimizing data delivery, and automating manual processes
- Use Agile methodologies to develop software
- AWS), tools for distributed computing (e.g
- Applying clustering algorithms and/or deep neural networks to real life problems
- Implementing tracking and pattern-of-life algorithms