Platform engineering · Regulated environments

Governed AI infrastructure for regulated environments

I build the production platforms that run LLMs and agents where audit, security, and reliability are not optional - for banks, fintechs, and high-stakes operators.

Book an intro call · See what I build

llm-gateway — production governed · audited

$ curl -s https://llm.internal/v1/chat \
    -H "Authorization: Bearer $TEAM_TOKEN"

route: bedrock/claude ALLOWED

audit: request logged TRACED

budget: team quota WITHIN LIMIT

8+ yrs regulated infra · 4 bank platform deliveries · Banking · Fintech · Enterprise SaaS

What I build

The deployment, not the demo

Infrastructure that runs AI in production for organizations where governance and reliability are first-class requirements.

AI & LLM Platform Engineering

Governed platforms for running LLMs and agents in production: AWS Bedrock and LiteLLM gateways, multi-account architecture, access control, observability, and cost governance. The layer that turns an AI experiment into something a regulated org can actually run.

AgentOps & Production AI Reliability

Operating agentic systems safely once they are live: health monitoring, human-gated actions, and audit trails. The discipline of running AI in production rather than prototyping it.

Core DevOps & Cloud Engineering

The full foundation, in any environment: AWS and Azure architecture, Kubernetes, CI/CD, Infrastructure as Code with Terraform and Ansible, and production incident response. Eight-plus years across banking, fintech, and enterprise SaaS.

Production Incidents, Resolved

The hard production failures others have given up on - diagnosed, fixed, and turned into runbooks so the same failure does not recur. Networking, failover, and observability that reflects reality.

How it works

8+

Years of Experience

4

Bank Platform Deliveries

~30%

Cloud Cost Reduced

15+

Projects Delivered

Who you work with

You work with me directly - no bench, no hand-offs. I am currently the lead DevOps engineer delivering a core banking platform into production at four banks concurrently.

Bogdan Moldovan · Lead DevOps & Platform Engineer · Full background · LinkedIn

Products

Tools I've built

I build the tools I wish existed for this work - and ship them.

Apache-2.0 · Self-hosted · Local-first

Tendwell

Self-hostable, local-first AgentOps for production health. It observes metrics and reads runbooks, reasons with a local LLM, and explains what it finds; any action it proposes needs human approval and is recorded in a hash-chained audit log. Built for security-conscious and regulated teams.

Explore Tendwell · Read the story

tendwell — health analysis local-first · no egress

> is production healthy right now?

availability: error_rate=0.057 BREACHED

propose: restart_service AWAITING HUMAN

audit: proposal recorded HASH-CHAINED
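What "human-gated" and "hash-chained" mean mechanically fits in a few lines. The sketch below is a hypothetical Python illustration, not Tendwell's code: the `AuditLog` and `ActionGate` classes, SHA-256 over sorted-key JSON, and the all-zero genesis hash are assumptions chosen for clarity. Each audit entry stores the hash of the entry before it, so editing a past record makes `verify()` fail, and a proposed action only executes after a named person approves it.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable

GENESIS = "0" * 64  # assumed "previous hash" for the very first entry


def _digest(prev: str, event: dict) -> str:
    # Hash the previous entry's hash together with this event (stable key order).
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Append-only log: each entry commits to the hash of the entry before it."""
    entries: list = field(default_factory=list)

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = _digest(prev, event)
        self.entries.append({"prev": prev, "event": event, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute every hash in order; an edited entry, or one deleted
        # from the middle, no longer matches and breaks the chain.
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or _digest(prev, entry["event"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


class ActionGate:
    """Hypothetical propose/approve/execute gate: nothing runs without a human approval."""

    def __init__(self, log: AuditLog) -> None:
        self.log = log
        self.pending: dict[str, str] = {}  # proposal id -> action name
        self.approved: set[str] = set()

    def propose(self, pid: str, action: str, reason: str) -> None:
        self.pending[pid] = action
        self.log.append({"type": "proposed", "id": pid, "action": action, "reason": reason})

    def approve(self, pid: str, approver: str) -> None:
        if pid not in self.pending:
            raise KeyError(f"unknown proposal {pid}")
        self.approved.add(pid)
        self.log.append({"type": "approved", "id": pid, "by": approver})

    def execute(self, pid: str, run: Callable[[str], None]) -> None:
        if pid not in self.approved:
            # Refusals are audited too.
            self.log.append({"type": "blocked", "id": pid})
            raise PermissionError(f"proposal {pid} has no human approval")
        action = self.pending.pop(pid)
        self.approved.discard(pid)  # one approval authorizes one execution
        run(action)
        self.log.append({"type": "executed", "id": pid, "action": action})


log = AuditLog()
gate = ActionGate(log)
gate.propose("p1", "restart_service", "error_rate=0.057 breached")
try:
    gate.execute("p1", print)  # refused: nobody has approved yet
except PermissionError as err:
    print("blocked:", err)
gate.approve("p1", approver="oncall-engineer")
gate.execute("p1", print)  # runs the stand-in action
print(log.verify())  # True
log.entries[0]["event"]["reason"] = "rewritten later"
print(log.verify())  # False: the edit no longer matches the stored hash
```

Hash chaining makes tampering detectable rather than impossible: anyone able to rewrite the whole log, or drop its newest entries, can still produce a chain that verifies, so a real deployment would also anchor the latest hash somewhere the operator cannot edit.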

Commercial CLI · 105+ resource types · AWS · Azure · GCP

Terraback

Reverse-engineer your cloud infrastructure into Terraform code. Terraback imports live AWS, Azure, and GCP resources into clean, maintainable Terraform - so teams can retrofit Infrastructure as Code onto legacy ClickOps environments.

Visit Terraback.io · Read the story

terraback — import zero-diff plans

$ terraback scan aws --region eu-west-1

discovered: 214 resources across 12 services

$ terraback generate

wrote: modules/vpc, modules/eks, modules/rds HCL

Commercial SDK · 896 model-generations · On-device · API

CarVision

A production car-recognition engine that identifies make, model, and generation from a single photo. It covers 896 model-generations across 76 makes at 93.85% top-1 accuracy, with calibrated confidence and explicit rejection handling. It runs on-device (TensorFlow Lite, Core ML, ONNX) in ~50 ms or as a self-hosted API, and it powers Boby's Garage in production.

View on GitHub · Read the story

carvision — inference on-device · ~50 ms

$ carvision classify photo.jpg

BMW 3 Series (F30) · 2011–2019 TOP-1

confidence: 0.94 CALIBRATED

runtime: tflite-fp16 · 48 ms

Field notes

Latest from the blog

Practitioner write-ups from regulated banking work - running AI in production, compliance in practice, and the infrastructure underneath.

Jun 2026

Operating Agentic Systems in Production: Lessons from Building Tendwell

The hard part is not the model but the guardrails around it - the propose/approve/execute separation, local-first as a hard default, and audit as a feature.

May 2026

The four-hour clock starts before you understand the incident: notes on DORA Article 17 in practice

Why incident classification is harder than the regulation makes it look, and why the tooling you bring in is itself a compliance surface.

Building AI platforms where reliability and compliance matter?

If you are running - or planning to run - AI in a regulated or high-stakes environment, let's talk about the infrastructure underneath it.

Book an intro call

Replies within one business day · NDA-friendly · contact@reops.tech