
Staff Data Engineer/Scientist

CACI International Inc

Chantilly · On-site · Full-time · Lead · $113k – $238k/yr · Posted 1mo ago

About the role

Below is a ready‑to‑use, fully‑customizable package you can submit with your application to CACI for the Staff Data Engineer/Scientist role.

It includes:

  1. Targeted cover‑letter template – highlights the exact language from the posting, shows how you meet every “required” and “desired” qualification, and demonstrates cultural fit.
  2. Resume‑bullet‑library – a set of concise, achievement‑focused bullet points you can copy‑paste (or adapt) into the “Professional Experience” section of your CV.
  3. Quick‑check checklist – a 10‑item sanity‑check to make sure you haven’t missed anything before you hit “Submit”.

Feel free to edit the placeholders (e.g., [Your Name], [Company X]) with your own details.


1️⃣ Cover‑Letter (PDF‑ready)

[Your Name]
[Street Address] • [City, State ZIP] • [Phone] • [Email] • [LinkedIn] • [GitHub]

[Date]

Hiring Manager
CACI International Inc.
[CACI Office Street Address]
[City, State ZIP]

Dear Hiring Manager,

I am excited to apply for the Staff Data Engineer/Scientist position (TS/SCI + Polygraph) advertised on CACI’s career portal. With 8+ years designing and operating large‑scale ETL pipelines, deploying production‑grade LLM‑driven analytics, and leading interdisciplinary teams of researchers and software engineers, I am uniquely positioned to accelerate CACI’s AI/ML mission while upholding the highest security standards.

Why I’m a perfect fit

End‑to‑end data platforms – At [Current/Most Recent Employer], I architected a multi‑tenant data lake on AWS (S3, Glue, EMR, Athena) that ingested >30 TB/day from heterogeneous sources (satellite telemetry, SIGINT feeds, and open‑source text). The platform reduced raw‑to‑model latency from 12 h to < 30 min and supported downstream LLM fine‑tuning pipelines using HuggingFace Transformers.

AI/ML production expertise – I built a PySpark‑based feature store that feeds a BERT‑style document classifier and a time‑series forecasting service (Prophet + DeepAR) used by senior analysts to predict equipment failure with 92 % F1. I also prototyped an LLM‑as‑a‑judge workflow (LangFuse + MLflow) that automatically scores analyst reports for compliance.

Leadership & mentorship – As Lead Data Engineer, I managed a cross‑functional squad of 5 data scientists, 3 software engineers, and 2 domain researchers. I instituted Agile ceremonies, introduced GitLab CI/CD with test‑driven development, and mentored junior staff on Docker‑Compose, Kubernetes, and secure DevSecOps pipelines. Team velocity increased 35 % while defect rates dropped to < 2 %.

Security‑cleared & mission‑focused – I hold an active TS/SCI with Full‑Scope Polygraph and have worked on multiple DoD contracts (e.g., AFWERX, NSA, DARPA) where data integrity, provenance, and compliance were non‑negotiable. My experience with FedRAMP‑authorized AWS services and Zero‑Trust networking aligns directly with CACI’s security posture.

What I bring to CACI

  • Proven ability to assemble, curate, and operationalize massive, multi‑modal datasets for LLM fine‑tuning and downstream analytics.
  • Deep hands‑on experience with Python (NumPy, Pandas, Polars), Spark/PySpark, Docker, Kubernetes, and AWS Bedrock.
  • Familiarity with Transformer‑based vision models (ViT), reinforcement‑learning frameworks (Gymnasium, RLlib), and custom CUDA kernels for GPU acceleration.
  • A culture‑first mindset: integrity, collaboration, and continuous learning—values that echo CACI’s “culture of integrity” and “environment of trust.”

I would welcome the opportunity to discuss how my technical leadership and security‑cleared background can help CACI deliver next‑generation AI/ML solutions for our nation’s most critical missions. Thank you for your consideration.

Sincerely,

[Your Name]

Tip: Export the above as a PDF, name the file Lastname_Firstname_CACI_CoverLetter.pdf, and attach it alongside your resume.


2️⃣ Resume‑Bullet‑Library

Formatting tip: Use the [Action verb] + [what you built/optimized] + [technology stack] + [impact (quantified)] pattern. Keep each bullet ≤ 2 lines.

A. Current / Most Recent Role (Data Engineering Lead – DoD Contract)

Bullet | When to Use
Architected a secure, multi‑tenant data lake on AWS (S3, Glue, EMR, Athena) ingesting >30 TB/day from heterogeneous classified sources, cutting raw‑to‑model latency from 12 h → <30 min. | Show ETL scale & AWS expertise.
Designed & implemented a PySpark‑based feature store feeding BERT‑style classifiers and Prophet/DeepAR time‑series models, achieving 92 % F1 on anomaly detection. | Demonstrates ML pipeline & Spark skill.
Led a cross‑functional team of 10 (data scientists, software engineers, domain researchers) using Agile/Scrum, raising sprint velocity 35 % while maintaining <2 % defect rate. | Highlights leadership & Agile.
Built CI/CD pipelines in GitLab with test‑driven development, Docker‑Compose, and Kubernetes deployments, reducing release cycle from 2 weeks → 2 days. | Shows DevSecOps competence.
Implemented an LLM‑as‑a‑judge workflow (LangFuse + MLflow) to auto‑score analyst reports, decreasing manual review time by 70 %. | Directly maps to “GenAI Ops” desire.
Authored security‑hardening guidelines for data movement (SFTP, IAM policies, KMS) that passed DoD RMF assessment on first review. | Reinforces clearance & security focus.
Mentored 4 junior engineers on CUDA‑accelerated data transforms, resulting in faster image‑preprocessing for vision‑LLM fine‑tuning. | Shows GPU/CUDA experience.

B. Prior Role (Senior Data Engineer – Federal Agency)

Bullet | When to Use
Developed end‑to‑end ETL pipelines using Python (Pandas, Polars) and Airflow, processing 5 TB/month of structured/unstructured data for downstream analytics. | Core ETL experience.
Integrated HuggingFace Transformers (BERT, RoBERTa) into a document‑classification service that reduced manual tagging effort by 85 %. | NLP/LLM relevance.
Automated data quality checks with Great Expectations, catching >99 % of schema violations before ingestion. | Process‑improvement focus.
Collaborated with a research team to prototype Vision‑Transformer (ViT) models on satellite imagery, improving target detection accuracy from 68 % → 81 %. | Shows cross‑domain Transformer work.
Managed cloud resources on AWS (EC2, S3, Lambda, Bedrock), optimizing cost by 30 % via spot‑instance scheduling and S3 lifecycle policies. | Cloud cost‑efficiency.
Presented quarterly technical briefings to senior leadership, translating complex ML concepts into actionable mission insights. | Communication skill.

C. Early Career (Data Engineer – Commercial SaaS)

Bullet | When to Use
Built a real‑time streaming pipeline with Kafka + Spark Structured Streaming, delivering sub‑second data freshness to a dashboard used by >10,000 customers. | Demonstrates high‑throughput streaming.
Containerized all services with Docker and orchestrated via Docker‑Compose, enabling reproducible dev environments for a distributed team. | Docker experience.
Implemented unit‑ and integration‑tests (pytest) achieving >90 % code coverage across the data‑processing codebase. | Test‑driven development.

3️⃣ Quick‑Check Checklist (Before Submitting)

  1. Resume file name: Lastname_Firstname_CACI_Resume.pdf
  2. Cover‑letter file name: Lastname_Firstname_CACI_CoverLetter.pdf
  3. Clearance statement: “Active TS/SCI with Full‑Scope Polygraph (FSP) – Clearance verified as of [Month Year]”.
  4. Keywords: Ensure every required skill (Python, Pandas/Polars, Git, TS/SCI) appears verbatim in your resume.
  5. Quantified impact: All bullets include a metric (%, time saved, volume processed, accuracy).
  6. Security compliance: No mention of classified details beyond “classified source” or “secure environment”.
  7. Tailored summary (top of resume): 2‑3 sentence “Professional Summary” that mirrors the job title and highlights LLM, ETL, and leadership.
  8. LinkedIn/GitHub: Public repos showing Dockerfiles, Airflow DAGs, Spark jobs, or a small LLM fine‑tuning demo (optional but impressive).
  9. Application portal: Upload resume, cover letter, and any supplemental PDFs the posting explicitly requests.
  10. Follow‑up: Send a brief thank‑you email to the recruiter (if contact info provided) within 24 h of submission.
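
Items 4 and 5 can be partly automated. Below is a minimal Python sketch, assuming you have your resume as plain text and your bullets as a list of strings. The function names (contains_term, missing_skills, bullets_without_metric) are hypothetical, and treating “Pandas/Polars” as “either one is enough” is an assumption you should adjust to the posting. The “contains a digit” test is only a rough proxy for “includes a metric”.

```python
import re

# Skill groups from checklist item 4. Treating "Pandas/Polars" as
# "either one is enough" is an assumption; adjust to match the posting.
REQUIRED_SKILLS = [("Python",), ("Pandas", "Polars"), ("Git",), ("TS/SCI",)]


def contains_term(text, term):
    """True if term appears verbatim and is not part of a longer word."""
    # Lookarounds keep "Git" from matching inside "GitLab".
    pattern = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text) is not None


def missing_skills(resume_text, required=REQUIRED_SKILLS):
    """Return the skill groups with no alternative present in the resume."""
    return [
        "/".join(group)
        for group in required
        if not any(contains_term(resume_text, term) for term in group)
    ]


def bullets_without_metric(bullets):
    """Return bullets containing no digit (a rough proxy for 'no metric')."""
    return [b for b in bullets if not re.search(r"\d", b)]


if __name__ == "__main__":
    sample_resume = "Built ETL pipelines in Python and Polars; GitLab CI/CD; active TS/SCI."
    sample_bullets = [
        "Cut raw-to-model latency from 12 h to under 30 min.",
        "Presented quarterly technical briefings to senior leadership.",
    ]
    print(missing_skills(sample_resume))  # ['Git']
    print(bullets_without_metric(sample_bullets))
```

In the sample, “Git” is reported missing because it only occurs inside “GitLab”, which is exactly the kind of gap a verbatim keyword filter may penalize. A bullet that passes the digit check can still lack a real impact metric (a team size is not an outcome), so treat the output as a prompt for manual review.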

How to Use This Package

  1. Copy the cover‑letter into a Word or Google Docs file, replace placeholders ([Your Name], [Current/Most Recent Employer], etc.), and export as PDF.
  2. Select the most relevant bullet points from the library (up to 6‑8 per role) and paste them into your existing resume, adjusting dates/technologies to match your actual experience.
  3. Run the Quick‑Check Checklist to catch any missing pieces.
  4. Submit through CACI’s career portal, attach the PDFs, and keep a copy of the submission confirmation for your records.

Final Thought

CACI is looking for a technical leader who can bridge research breakthroughs with production‑grade data pipelines while safeguarding classified information. By explicitly aligning your experience with the language in the posting—and quantifying the impact you’ve delivered—you’ll demonstrate that you not only meet the baseline requirements but also bring the extra expertise (LLM‑Ops, GenAI, GPU‑accelerated pipelines) that will set you apart.

Good luck, and feel free to reach out if you’d like a deeper review of your final resume or a mock interview focused on the AI/ML topics listed above! 🚀

Skills

AWS · BERT · CUDA · Docker · Docker Compose · GenAI Ops · Git · GitLab · Go · Gymnasium Gym · HuggingFace · Java · Kubernetes · Lambda · LangChain · LangFuse · LLM-as-a-judge · Linux · MLFlow · NumPy · OpenEnv · Pandas · Podman · Polars · Python · Rancher · Reinforcement learning · RLlib · Rust · S3 · Spark · Stable Baselines · TorchRL · Transformer-based architectures · TS/SCI with Polygraph · Version control
