Joaquin Hui Gomez — Technical Program Manager

01 — selected work

Production AI work, cross-system program delivery, and a published eval benchmark.

amazon · public

Agentic AI for operations planning

Won Most Innovative Solution (2026) at an internal Amazon AI hackathon. Built an agentic system that compresses a multi-day analyst loop (previously 40+ hours) into under two hours: a tool-using LLM agent over schema-disciplined, deterministic preprocessing of multi-GB experiment bundles, with async AWS-native orchestration across Lambda, Step Functions, S3, and DynamoDB. Shipped from idea to production in two weeks. Live in production, it eliminates >1,400 hours/year of senior-PM time and delivers 7-figure annualized savings.

agentic ai · tool use · aws-native · in production

Production ML programs · cross-system delivery

Owner of BRDs, launches, and the simulation and eval loops behind production ML decision systems running on a continent-scale network of hundreds of nodes. The systems are modeled across 50+ dynamic inputs and 30+ operational metrics, make millions of automated decisions per week, and integrate across planning, audit, and instrumentation pipelines. Two production launches in 2025 produced record reliability gains, validated on live metrics rather than backtests. The resulting 9-figure annualized impact was earned through the business-review loop, then channeled into the next Kaizen-driven sprint on the same systems: the build → measure → review → iterate cycle that defines the work.

ml policy · simulation · multi-system · live metrics

Gen-AI eval & frontier-model migration framework

Built an internal LLM-as-judge eval pipeline combining qualitative rubric scoring (6 criteria: depth, rigor, actionability, accuracy, reasoning, conciseness), quantitative metrics (latency, tokens, cost-per-call), and system metrics (tool-call accuracy, citation coverage, hallucination rate, correct-refusal rate on adversarial probes). Used it to justify and execute successive frontier-model migrations across three model generations with explicit ROI and opportunity-cost framing: double-digit percentage-point gains in tool-call accuracy and citation coverage, hallucination rate cut by an order of magnitude, and ≥30% lower inference cost per call. Prompt, model, and eval artifacts are version-controlled; each cutover is atomic, with the predecessor model pinned for archival replay. Cost-per-call is instrumented as a first-class metric in the AWS pipeline (Bedrock + Lambda + DynamoDB telemetry) alongside latency and quality. Public companion: LLM Judge Calibrator, which open-sources the same methodology.
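
As a rough illustration of how the three metric families above could sit in one record and feed a migration comparison, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names `EvalRecord`, `summarize`, and `migration_deltas`, the 1-5 rubric scale, and the choice to model only a subset of the system metrics. It is not the internal pipeline or its schema.

```python
from dataclasses import dataclass
from statistics import mean

# The six rubric criteria named above; the 1-5 scoring scale is an assumption.
RUBRIC = ("depth", "rigor", "actionability", "accuracy", "reasoning", "conciseness")


@dataclass(frozen=True)
class EvalRecord:
    """One judged model response (hypothetical schema)."""
    rubric: dict[str, int]    # judge score per criterion, assumed 1-5
    latency_ms: float         # quantitative metrics
    tokens: int
    cost_usd: float
    tool_call_correct: bool   # system metrics (subset)
    hallucinated: bool

    def __post_init__(self) -> None:
        # Reject records that skip a criterion so averages stay comparable.
        missing = set(RUBRIC) - set(self.rubric)
        if missing:
            raise ValueError(f"missing rubric criteria: {sorted(missing)}")


def summarize(records: list[EvalRecord]) -> dict[str, float]:
    """Aggregate one eval run into headline numbers for a migration review."""
    n = len(records)
    return {
        "rubric_mean": mean(mean(r.rubric[c] for c in RUBRIC) for r in records),
        "latency_ms": mean(r.latency_ms for r in records),
        "cost_per_call_usd": mean(r.cost_usd for r in records),
        "tool_call_accuracy": sum(r.tool_call_correct for r in records) / n,
        "hallucination_rate": sum(r.hallucinated for r in records) / n,
    }


def migration_deltas(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Candidate minus incumbent per metric; rates are fractions (0.10 = 10 pp)."""
    return {key: new[key] - old[key] for key in old}
```

Running `summarize` once per model on the same prompt set and passing both results to `migration_deltas` gives the per-metric differences that an ROI argument would start from.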

model evaluation · model migration · post-training alignment · aws bedrock · cost monitoring

02 — open source

Open-source contributions across the AI infrastructure stack.

34 merged upstream PRs across the AI infrastructure stack — first-party (Anthropic Claude Agent SDK, OpenAI Agents SDK), LLM serving (HuggingFace Transformers, LiteLLM), compute & orchestration (Ray Serve, ZenML on Kubernetes), post-training & fine-tuning (axolotl), RAG & document AI (LlamaIndex, Docling), agents & eval (Continue, Mastra). 29 in flight including further first-party work at Anthropic (Agent SDK, Cookbooks, Go SDK) and Microsoft (Semantic Kernel), plus Chroma and the agent-tooling stack.

34 merged · 24 mo

29 in flight

1.1M+ stars across repos

huggingface/transformers

LLM serving — generation fix removing stale num_return_sequences warning on continuous-batching path. Keeps inference logs honest in the canonical LLM serving library.

merged · apr 24

★ 159.9k

ray-project/ray

Compute & orchestration — Ray Serve autoscaling timing fix; scale-up decisions happen on the intended wall-clock path instead of drifting under real traffic.

merged · apr 3

★ 42.3k

BerriAI/litellm

LLM gateway — TTFT capture for /v1/messages across Anthropic, Bedrock, and Vertex. Observability primitive for multi-provider gateways.
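
TTFT (time to first token) is the delay between sending a request and receiving the first streamed chunk. A provider-agnostic way to measure it can be sketched as below. The names `measure_ttft` and `fake_stream` are hypothetical, and this is an illustration of the metric only, not LiteLLM's implementation or API.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def measure_ttft(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> tuple[Optional[float], list[str]]:
    """Consume a token stream and return (seconds until first chunk, all chunks).

    TTFT is None when the stream yields nothing.
    """
    start = clock()
    ttft: Optional[float] = None
    chunks: list[str] = []
    for chunk in stream:
        if ttft is None:
            ttft = clock() - start  # stamp only the first chunk
        chunks.append(chunk)
    return ttft, chunks


def fake_stream(delay_s: float, tokens: list[str]) -> Iterator[str]:
    """Stand-in for a provider stream: waits once, then yields tokens."""
    time.sleep(delay_s)
    yield from tokens
```

Calling `measure_ttft(fake_stream(0.05, ["Hel", "lo"]))` returns a TTFT of roughly 0.05 seconds plus both chunks. The injectable `clock` argument is a design choice that keeps the measurement testable without real waiting.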

merged · apr 16

★ 44.7k

axolotl-ai-cloud/axolotl

Fine-tuning — two merges in the leading OSS fine-tuning framework: skip redundant evaluation on checkpoint-resume; chat-template handling for qwen3_5.

2× merged · apr

★ 11.7k

anthropics/claude-agent-sdk-python

First-party Anthropic — merged. Security-advisory remediation in the Claude Agent SDK: bumped the MCP dependency floor for GHSA-9h52-p55h-vw2f. Two more Anthropic PRs in flight (Agent SDK, Go SDK).

merged · may

★ 6.7k

03 — side projects

Other things I've built on the side.

public urls · in development

multilingual speech-to-draft

Verba

Workspace that preserves raw transcripts, surfaces ambiguity, and runs a multi-provider eval loop across English, 中文, español, français, and português. In daily personal use.

multilingual AI · telegram mini app

Tarot Truth Teller

Trilingual AI tarot in 繁體中文 / español / english. Telegram-native flows and shareable reading cards. Voice that's deadpan and a little funny.

04 — background

Three continents, six languages, one operating thesis.

education · work · footprint

I started in Hong Kong with dual LLB + BBA degrees at HKU and did a magna cum laude master's in Strategy & International Management at St. Gallen, with a GenAI thesis on LLMs, agents, alignment, and governance. Along the way I went on exchange in Bogotá and Santiago and led an NGO in Guatemala. I consulted out of Beijing with Accenture (market-entry strategy, Oracle ERP migration, ML revenue modeling) and out of Shanghai with PwC (SAP system integration), then joined Amazon as a Business Analyst in Luxembourg before rotating to the L5 PM II role in London in 2023.

My background compounds into how I build with AI. Legal training (HKU LLB) shapes the way I write specs and structure prompts — AI as systems operating inside explicit rules, authorities, and validation layers. Strategic + consulting training (St. Gallen Master's, Accenture, PwC) shapes the MECE problem decomposition, the customer-back planning under ambiguity, and the structured prompting habit that turns vibe-coding into rigor. Multilingual + cross-continent operating range (six natural languages, three continents) shapes how I read AI behavior across languages and locales — and is part of why I built Verba. PM craft shapes how I scaffold agentic teams, the knowledge bases they ground in, and the pipelines they run on. Together, this is how the in-production agentic system above went from idea to live in two weeks.

working thesis

The next decade of AI is won by the teams that make capability claims honest. Capabilities are the input. The eval loop is the product.

Claude Certified Architect – Professional · 2026
AWS Certified Generative AI Developer – Professional · 2026
AWS Certified AI Practitioner · 2025
AWS Certified Cloud Practitioner · 2022
4× AWS-certified stack

leadership · cross-functional programs · DEI

Cross-functional program ownership — coordinated production-ML rollouts across 5+ stakeholder teams (PM, SDE, ML research, operations) on EU-wide footprint · two-pizza-team structure. Owned BRDs, launch artifacts, and eval loops; coordinated through stakeholder leads on each launch.
Mentorship track record — every intern and direct mentee mentored at Amazon to date has received a return offer. Coached through both technical execution and PM-craft skill development.
Cross-cultural delivery — operated and shipped across 3 continents and 6 natural languages: consulting tenure in Beijing (Accenture) and Shanghai (PwC), master's-era programs in Latin America, current EU operations leadership.
ERG involvement — active in Glamazon and the Asian & Latino affinity groups at Amazon. Diversity is a leadership lens, not a checkbox.
NGO leadership — led Niños de Xela for one year. Sustainable agricultural-product diversification and long-term business-model design with underprivileged indigenous women in Quetzaltenango, Guatemala. Main speaker and point of contact during in-country fieldwork; organized fundraising events.

master's thesis · st. gallen · 2024

A systems view of generative AI in no-code / low-code development — built around LLMs, agents, alignment, and governance rather than prompt theater.

Supervisor: Dr. Edona Elshan · graded magna cum laude

50K+ Reddit entries · chained pipeline

3 pillars · LLMs · agents · governance

Pydantic agent coordination layer

footprint

🇬🇧 London · UK

2023 — present · pm II

Amazon PM II. Production ML & Gen AI programs across European operations. Active OSS contributor across the AI infrastructure stack — merged first-party contribution in Anthropic's Claude Agent SDK, with further Anthropic PRs in flight (Agent SDK, Cookbooks, Go SDK).

🇱🇺 Luxembourg · LU

amazon · business analyst

Amazon Business Analyst. Analytics, automation, and cloud-planning work across EU operations finance. Foundation for the later return to Amazon at PM II.

🇨🇭 St. Gallen · CH

master's · magna cum laude

University of St. Gallen — MA Strategy & International Management. GenAI thesis on LLMs, agents, alignment, and governance: chained pipeline over 50K+ Reddit entries with Pydantic-coordinated agents.

🇭🇰 Hong Kong · HK

hku · llb + bba

The University of Hong Kong — dual LLB + BBA. Scholarships and debating-society leadership. Legal training that shapes how I build AI systems: explicit rules, authorities, validation layers.

🇨🇳 Beijing · CN

accenture · consultant

Accenture Consultant. Three engagement areas: China market-entry strategy for global liquor clients; Oracle ERP integration and migration for a major electronics manufacturer; revenue-strategy modeling using ML. Cross-cultural delivery at large-org scale.

🇨🇳 Shanghai · CN

pwc · consultant · sap

PwC Consultant. SAP system integration and migration across Greater China clients. Trilingual delivery in English / 中文 / 粵語.

🇨🇴 Bogotá · CO

u. de los andes · master exchange

Universidad de los Andes — master's exchange. Latin American economy, accounting rules, and business context. Spanish + Portuguese operating range deepened.

🇬🇹 Quetzaltenango · GT

niños de xela · 1-yr ngo lead

Year-long lead of Niños de Xela — sustainable agricultural-product diversification and long-term business-model design with underprivileged indigenous women in Quetzaltenango. Main speaker and point of contact during in-country fieldwork; organized fundraising events.

🇨🇱 Santiago · CL

u. católica · undergrad exchange

Pontificia Universidad Católica de Chile — undergrad exchange. Sharpened cross-cultural operating range and Latin American context for early business-strategy work.

05 — contact

AI systems teams can trust in production.