technical program manager · ai infrastructure · london

AI systems
teams can trust
in production.

Public engineering record: 28 merged upstream PRs across the AI infrastructure stack (HuggingFace Transformers, Ray Serve, LlamaIndex, LiteLLM, axolotl, OpenAI Agents Python, Continue), plus in-flight contributions to first-party Anthropic and Microsoft repositories. Day job: PM II at Amazon, leading production ML and Gen AI programs across European operations: an agentic-AI system in production (Most Innovative, internal AI hackathon, 2026), an LLM-as-judge eval pipeline that drove a frontier-model migration with a ≥30% per-call inference-cost reduction, and 9-figure annualized impact (modeled) across multiple ML launches. The AI-infrastructure work (inference, agents, post-training & fine-tuning, evaluation, alignment) happens in the OSS contributions and the day job; the operations work is where I learned to run programs at network scale. HKU law + business, St. Gallen master's magna cum laude, three continents, six natural languages.

amazon · accenture · pwc · 3 continents · english · 中文 · 粵語 · español · français · português
01 — selected work

Production AI work, cross-system program delivery, and a published eval benchmark.

amazon · public
▲ MOST INNOVATIVE · internal AI hackathon · 2026 ▸ >1,400 hrs / year of manual work eliminated ▸ LLM intake + deterministic prep + async AWS orchestration

Agentic AI for operations planning

Won Most Innovative Solution at an internal Amazon AI hackathon (2026). Built an agentic system that compresses a multi-day analyst loop into under two hours: a tool-using LLM agent working over schema-disciplined, deterministic preprocessing of multi-GB experiment bundles, with async AWS-native orchestration across Lambda, Step Functions, S3, and DynamoDB. It eliminates >1,400 hours/year of recurring manual work and is live in production.

agentic ai · tool use · aws-native · in production
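The "schema-disciplined deterministic preprocessing" step can be illustrated with a minimal sketch. It assumes a bundle arrives as CSV with a hypothetical three-column schema (`node_id`, `metric`, `value`), validates every row before anything reaches the model, and hands the agent a compact, deterministic JSON summary instead of raw data. The schema, the CSV format, and the names `preprocess_bundle` and `summarize_for_agent` are illustrative assumptions, not the production system.

```python
import csv
import io
import json
from dataclasses import dataclass, field

# Hypothetical schema for one table in an experiment bundle (illustrative only).
SCHEMA: dict[str, type] = {"node_id": str, "metric": str, "value": float}


@dataclass
class PrepResult:
    rows: list[dict] = field(default_factory=list)  # rows that passed validation
    rejects: list[tuple[int, str]] = field(default_factory=list)  # (source line, reason)


def preprocess_bundle(raw_csv: str, schema: dict[str, type] = SCHEMA) -> PrepResult:
    """Validate and coerce every row against a fixed schema; reject rather than guess."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    missing = set(schema) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"bundle is missing columns: {sorted(missing)}")
    result = PrepResult()
    for raw in reader:
        line = reader.line_num  # physical line just read (header is line 1)
        if any(raw.get(col) in (None, "") for col in schema):
            result.rejects.append((line, "empty field"))
            continue
        try:
            row = {col: typ(raw[col]) for col, typ in schema.items()}
        except ValueError as exc:  # e.g. float("oops")
            result.rejects.append((line, str(exc)))
            continue
        result.rows.append(row)
    return result


def summarize_for_agent(result: PrepResult) -> str:
    """Compact JSON summary the agent reads as a tool result; sorted keys keep it stable."""
    per_metric: dict[str, list[float]] = {}
    for row in result.rows:
        per_metric.setdefault(row["metric"], []).append(row["value"])
    summary = {
        "accepted_rows": len(result.rows),
        "rejected_rows": len(result.rejects),
        "metrics": {
            name: {"count": len(vals), "mean": round(sum(vals) / len(vals), 4)}
            for name, vals in per_metric.items()
        },
    }
    return json.dumps(summary, sort_keys=True)


if __name__ == "__main__":
    raw = "node_id,metric,value\nA,latency,1.0\nB,latency,3.0\nC,latency,oops\n"
    print(summarize_for_agent(preprocess_bundle(raw)))
```

In the usage example, rows A and B are accepted and row C is rejected with its source line recorded. The design choice is that the agent only ever reasons over typed, counted inputs, and the reject list is surfaced as a signal rather than silently dropped.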
PRODUCTION ML · CROSS-SYSTEM DELIVERY ▸ 9-figure annualized · modeled, multi-year ▸ continent-scale network · 50+ inputs · 30+ metrics ▸ BRDs · launches · eval loops · live-metric validation ▸ multi-system integration · rollout · audit · instrumentation ▸ millions of automated decisions / week ▲ amazon since 2022 · pm II · london

Production ML programs · cross-system delivery

Owner of BRDs, launches, and the simulation and eval loops behind production ML decision systems running on a continent-scale network of hundreds of nodes. The systems are modeled across 50+ dynamic inputs and 30+ operational metrics, make millions of automated decisions per week, and integrate across planning, audit, and instrumentation pipelines. Two production launches in 2025 produced record reliability gains, validated on live metrics rather than backtests. 9-figure annualized impact (modeled, multi-year).

ml · policy simulation · multi-system · live metrics
▲ EVAL · MIGRATION · POST-TRAINING ▸ LLM-as-judge eval pipeline ▸ qualitative rubric · 6 criteria ▸ quantitative · latency · tokens · cost ▸ system · tool-call accuracy · refusal rate ▸ adversarial honesty probe ▲ amazon · frontier-model migration · 2026

Gen-AI eval & frontier-model migration framework

Built an internal LLM-as-judge eval pipeline that combines qualitative rubric scoring (6 criteria: depth, rigor, actionability, accuracy, reasoning, conciseness), quantitative metrics (latency, tokens, cost per call), and system metrics (tool-call accuracy, citation coverage, hallucination rate, correct-refusal rate on adversarial probes). Used it to justify and execute a frontier-model migration with explicit ROI and opportunity-cost framing: double-digit percentage-point gains in tool-call accuracy and citation coverage, an order-of-magnitude cut in hallucination rate, and a ≥30% inference-cost reduction per call. Prompt, model, and eval artifacts are version-controlled; the cutover was atomic, with the predecessor model pinned for archival replay. Cost per call is instrumented as a first-class metric in the AWS pipeline (Bedrock + Lambda + DynamoDB telemetry), alongside latency and quality. Public companion: LLM Judge Calibrator, which open-sources the same methodology.

model evaluation · model migration · post-training · alignment · aws bedrock · cost monitoring
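To make the migration logic concrete, here is a minimal sketch of how per-case judge scores, tool-call correctness, and token counts could roll up into a go/no-go gate. The six rubric criteria come from the description above; the 1-5 score scale, the per-million-token prices, the `max_rubric_drop` tolerance, and the gate rule itself are assumptions for illustration. Other tracked metrics (latency, citation coverage, hallucination and refusal rates) are omitted to keep it short.

```python
from dataclasses import dataclass
from statistics import mean

# The six rubric criteria named in the text; a 1-5 judge scale is assumed.
CRITERIA = ("depth", "rigor", "actionability", "accuracy", "reasoning", "conciseness")


@dataclass
class CallRecord:
    rubric: dict[str, float]  # judge score per criterion for one eval case
    tool_call_correct: bool   # right tool chosen with valid arguments
    input_tokens: int
    output_tokens: int


def cost_per_call(rec: CallRecord, usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Token-based cost; the per-million-token prices are caller-supplied placeholders."""
    return (rec.input_tokens * usd_per_m_in + rec.output_tokens * usd_per_m_out) / 1_000_000


def summarize(records: list[CallRecord], usd_per_m_in: float, usd_per_m_out: float) -> dict:
    """Average every metric over the same eval set so models are compared like for like."""
    return {
        "rubric": {c: mean(r.rubric[c] for r in records) for c in CRITERIA},
        "tool_call_accuracy": mean(1.0 if r.tool_call_correct else 0.0 for r in records),
        "cost_per_call": mean(cost_per_call(r, usd_per_m_in, usd_per_m_out) for r in records),
    }


def migration_gate(baseline: dict, candidate: dict,
                   max_rubric_drop: float = 0.1,
                   min_cost_cut: float = 0.30) -> tuple[bool, list[str]]:
    """Hypothetical go/no-go rule: no rubric drop beyond tolerance, no tool-call
    accuracy regression, and at least `min_cost_cut` cheaper per call."""
    reasons = []
    for c in CRITERIA:
        if candidate["rubric"][c] < baseline["rubric"][c] - max_rubric_drop:
            reasons.append(f"rubric regression on {c}")
    if candidate["tool_call_accuracy"] < baseline["tool_call_accuracy"]:
        reasons.append("tool-call accuracy regressed")
    cut = 1 - candidate["cost_per_call"] / baseline["cost_per_call"]
    if cut < min_cost_cut:
        reasons.append(f"cost cut {cut:.0%} below target {min_cost_cut:.0%}")
    return (not reasons, reasons)
```

Returning the list of failing reasons, not just a boolean, keeps the gate auditable: a blocked migration says exactly which quality or cost condition failed on the shared eval set.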
02 — open source

Open-source contributions across the AI infrastructure stack.

28 merged upstream PRs across the AI infrastructure stack: model libraries (HuggingFace Transformers), serving & LLM gateways (Ray Serve, LiteLLM), post-training & fine-tuning (axolotl), RAG & document AI (LlamaIndex, Docling), and agents & eval (OpenAI Agents Python, Continue, Mastra). 40 more are in flight, including contributions to first-party repositories from Anthropic (Cookbooks, Go SDK), Microsoft (Semantic Kernel), and Vercel (an AI SDK security disclosure), plus active work in CrewAI, Chroma, and the agent-tooling stack.

28 merged · 24 mo
40 in flight
1.11M stars across repos
03 — side projects

Other things I've built on the side.

public urls · in development
04 — background

Three continents, six languages, one operating thesis.

education · work · footprint

I started in Hong Kong with dual LLB + BBA degrees at HKU, then completed a master's in Strategy & International Management at St. Gallen, graduating magna cum laude with a GenAI thesis on LLMs, agents, alignment, and governance. Along the way I studied on exchange in Bogotá and Santiago, led an NGO in Guatemala, and consulted out of Beijing with Accenture (market-entry strategy, Oracle ERP migration, ML revenue modeling) and Shanghai with PwC (SAP system integration). I joined Amazon as a Business Analyst in Luxembourg before rotating into the L5 PM II role in London in 2024.

The legal training shapes how I build AI: not as unconstrained generation engines, but as systems operating inside explicit rules, authorities, and validation layers. The international footprint shapes the operator habit: most of a program is translation — of constraints between teams, of intent between locales, of model behavior between markets.

working thesis: The next decade of AI is won by the teams that make capability claims honest. Capabilities are the input. The eval loop is the product.
AWS Certified AI Practitioner · 2025 ▸ AWS Certified Cloud Practitioner · 2022 ▸ 3× AWS-certified stack

leadership · cross-functional programs · DEI

  • Cross-functional program ownership: coordinated production-ML rollouts across 5+ stakeholder teams (PM, SDE, ML research, operations) over an EU-wide footprint, in a two-pizza-team structure. Owned BRDs, launch artifacts, and eval loops, and coordinated through stakeholder leads on each launch.
  • Mentorship track record: every intern and direct mentee I have mentored at Amazon has received a return offer, with coaching across both technical execution and PM craft.
  • Cross-cultural delivery: operated and shipped across 3 continents and 6 natural languages, from consulting tenures in Beijing (Accenture) and Shanghai (PwC), to study exchanges and NGO work in Latin America, to current EU operations leadership.
  • ERG involvement: active in Glamazon and the Asian & Latino affinity groups at Amazon. Diversity is a leadership lens, not a checkbox.
  • NGO leadership: led Niños de Xela for one year, working with underprivileged indigenous women in Quetzaltenango, Guatemala, on sustainable agricultural-product diversification and long-term business-model design. Served as main speaker and point of contact during in-country fieldwork and organized fundraising events.

master's thesis · st. gallen · 2024

A systems view of generative AI in no-code / low-code development — built around LLMs, agents, alignment, and governance rather than prompt theater.
Supervisor: Dr. Edona Elshan · graded magna cum laude
50K+ Reddit entries · chained pipeline
3 pillars · LLMs · agents · governance
Pydantic · agent coordination layer
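The thesis pipeline's idea of validated handoffs between chained stages can be sketched as follows. Stdlib `dataclasses` with `__post_init__` checks stand in for the Pydantic models the thesis used, and a keyword rule stands in for the LLM agent call. The stage structure, field names, and the `other` catch-all label are hypothetical, not the thesis code.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical label set: the thesis's three pillars plus a catch-all.
PILLARS = ("llms", "agents", "governance", "other")


@dataclass(frozen=True)
class RedditEntry:
    """Stage-1 contract: a raw entry must carry an id and non-empty text."""
    entry_id: str
    text: str

    def __post_init__(self) -> None:
        if not self.entry_id or not isinstance(self.text, str) or not self.text.strip():
            raise ValueError("entry_id and non-empty text are required")


@dataclass(frozen=True)
class Labeled:
    """Stage-2 contract: whatever the agent returns must be a known pillar."""
    entry_id: str
    pillar: str

    def __post_init__(self) -> None:
        if self.pillar not in PILLARS:
            raise ValueError(f"unknown pillar: {self.pillar!r}")


def keyword_labeler(entry: RedditEntry) -> str:
    """Placeholder for an LLM agent call; returns an unvalidated label string."""
    text = entry.text.lower()
    if "agent" in text:
        return "agents"
    if "regulat" in text or "policy" in text:
        return "governance"
    if "llm" in text or "gpt" in text:
        return "llms"
    return "other"


def run_chain(raw: list[dict], labeler: Callable[[RedditEntry], str]):
    """Validate input, label, validate output, aggregate; failures are kept, not dropped."""
    counts = {p: 0 for p in PILLARS}
    failures: list[tuple[dict, str]] = []
    for item in raw:
        try:
            entry = RedditEntry(**item)                        # stage 1: validate raw input
            labeled = Labeled(entry.entry_id, labeler(entry))  # stage 2: validate agent output
        except (TypeError, ValueError) as exc:                 # missing fields or bad values
            failures.append((item, str(exc)))
            continue
        counts[labeled.pillar] += 1                            # stage 3: aggregate
    return counts, failures
```

Validating on both sides of the agent call means a malformed entry or an off-schema label fails loudly at the boundary where it occurs, instead of corrupting the aggregate counts downstream.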
🇬🇧 London, UK
2024 — present · pm II

Amazon PM II. Production ML & Gen AI programs across European operations. Active OSS contributor across the AI infrastructure stack — direct in-flight contributions to Anthropic Cookbooks and the Anthropic Go SDK.

🇱🇺 Luxembourg, LU
amazon · business analyst

Amazon Business Analyst. Analytics, automation, and cloud-planning work across EU operations finance. Foundation for the later move into the PM II role in London.

🇨🇭 St. Gallen, CH
master's · magna cum laude

University of St. Gallen — MA Strategy & International Management. GenAI thesis on LLMs, agents, alignment, and governance: chained pipeline over 50K+ Reddit entries with Pydantic-coordinated agents.

🇭🇰 Hong Kong, HK
hku · llb + bba

The University of Hong Kong — dual LLB + BBA. Scholarships and debating-society leadership. Legal training that shapes how I build AI systems: explicit rules, authorities, validation layers.

🇨🇳 Beijing, CN
accenture · consultant

Accenture Consultant. Three engagement areas: China market-entry strategy for global liquor clients; Oracle ERP integration and migration for a major electronics manufacturer; revenue-strategy modeling using ML. Cross-cultural delivery at large-org scale.

🇨🇳 Shanghai, CN
pwc · consultant · sap

PwC Consultant. SAP system integration and migration across Greater China clients. Trilingual delivery in English, Chinese (中文), and Cantonese (粵語).

🇨🇴 Bogotá, CO
u. de los andes · master's exchange

Universidad de los Andes — master's exchange. Latin American economy, accounting rules, and business context. Spanish + Portuguese operating range deepened.

🇬🇹 Quetzaltenango, GT
niños de xela · 1-yr ngo lead

Year-long lead of Niños de Xela — sustainable agricultural-product diversification and long-term business-model design with underprivileged indigenous women in Quetzaltenango. Main speaker and point of contact during in-country fieldwork; organized fundraising events.

🇨🇱 Santiago, CL
u. católica · undergrad exchange

Pontificia Universidad Católica de Chile — undergrad exchange. Sharpened cross-cultural operating range and Latin American context for early business-strategy work.

05 — contact