Christopher Millones.

AI/ML Engineer

I build AI systems that actually ship — LLM pipelines, RAG architectures, and the MLOps wiring around them, running at production scale in regulated environments. Four years of making models reliable enough to trust with real decisions.

Most AI projects fail in the gap between the prototype and production. I work in that gap.

Four years of it, specifically — in healthcare, where the margin for error is low and the data is messy. I work across the full stack of an LLM system: prompt design, retrieval architecture, evaluation pipelines, and the cloud infrastructure underneath.

The goal is always the same. Models that perform consistently when it counts, and a paper trail that explains why when they don't.

  • 4+ Years Experience
  • ~2M Annual LLM Invocations
  • 84% Cache Hit Rate Achieved
  • 70% → 90% Extraction Accuracy

The tools are the easy part. Here's what I reach for.

AI / ML

01
LLMs · Prompt Engineering · RAG (Modular) · Multimodal Extraction · Embeddings · Vector Stores · Reranking · Fine-tuning · Computer Vision · NLP

Cloud

02
AWS Lambda · AWS Step Functions · SageMaker · Bedrock · Aurora · Redshift · DynamoDB · EventBridge · Rekognition · GCP Vertex AI · Gemini

Languages

03
Python · JavaScript / TypeScript · SQL · Bash · Java

Frameworks

04
Hugging Face Transformers · ReactJS · Serverless Framework · CloudFormation

MLOps / DevOps

05
CI/CD · Docker · Model Monitoring · Batch & Real-time Inference · A/B Evaluation · GitLab

Compliance

06
HIPAA-aware Design · PHI Data Handling · GCP Vertex AI BAA

The roles, the rooms, and the systems that came out of them.

01 — Role

Software Engineer – MLOps & AI

Glidewell Dental Inc.

Nov 2021 – Present

  • Dental prescription forms are handwritten, inconsistent, and structurally chaotic, and we process about a million of them a year. I designed the pipeline that reads them: a hierarchical LLM fallback architecture that handles edge cases gracefully, paired with 8+ rounds of prompt iteration that got us to roughly 90% extraction accuracy and an 84% cache hit rate, and meaningfully reduced per-call costs at scale.

  • Call center agents were losing time hunting through internal docs for answers that should've been instant. I designed a RAG system that changed that — not a basic vector search, but a full retrieval stack with semantic chunking, query rewriting, and LLM-based reranking to make sure the right answer actually surfaces.

  • Doctors leave free-text comments on orders. Those comments contain routing decisions that the operations team needs to act on — fast, at scale. I built the pipeline that reads them structurally, extracts the intent, and routes accordingly. About a million orders a year flow through it.

  • Prompt engineering without measurement is just guessing. I built the system that made iteration data-driven: automated eval against ground truth across relational and warehouse databases, with A/B benchmarking baked in. Accuracy went from 70.2% to roughly 90% over successive rounds.

  • Beyond the headline projects: CV and OCR model pipelines, production monitoring with multi-channel alerting, a facial recognition access control system, and a full cost-and-performance analysis for migrating our LLM provider from Bedrock to Vertex AI.

02 — Role

Market Solutions Engineer

Star Micronics Inc.

Aug 2019 – Jan 2021

Before AI, I was the integration person — the one vendors called when something wasn't working, the one writing the SDK docs developers actually used. Hardware, retail systems, anything that had to plug into anything else. It's where I learned to talk technically across the stack, and it still shows up in how I work.

B.S. Computer Science · California State University, Fullerton · GPA 3.70 · Graduated Aug 2021

Let's build something that ships.

If you're shipping something hard with LLMs, agents, or the infrastructure underneath — I'd like to hear about it. Based in Irvine, CA. Remote or hybrid, both fine.