“A scalable multilingual translation system leveraging Kafka, FastAPI, and Hugging Face — achieving real-time, low-latency LLM deployment with full observability.”
<aside> 📢
A Kafka-Driven Pipeline with MLOps Automation
Author: Wonha Shin | University of Rochester
Tools: Kafka • FastAPI • Hugging Face • Docker • Prometheus • Grafana • W&B • AWS EC2
</aside>
This project presents a real-time multilingual translation pipeline operationalized with LLMOps principles. The system leverages Kafka for distributed streaming, FastAPI for real-time translation serving, and Hugging Face’s Helsinki-NLP/opus-mt-es-en model for Spanish → English translation. The entire pipeline was deployed on AWS EC2, with containerized monitoring through Prometheus + Grafana, and experimental integration with Weights & Biases (W&B) for logging latency and translation quality.
🧩 The goal: achieve low-latency, high-throughput translation while maintaining observability, scalability, and reliability.

**Kafka streaming layer.** Backbone for message ingestion and routing: three brokers coordinated by ZooKeeper for high availability, sized to handle millions of multilingual messages per day.
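The core idea of the streaming layer — serialized messages routed to broker partitions by key, so each key's ordering is preserved — can be sketched without a live cluster. The topic names and the hash partitioner below are illustrative assumptions, not the project's actual configuration (Kafka's default partitioner uses murmur2, not MD5):

```python
import json
import hashlib

# Hypothetical topic names; the pipeline's real topic layout isn't shown here.
RAW_TOPIC = "es-raw-text"
OUT_TOPIC = "en-translated"

def encode_message(text: str, source_lang: str = "es") -> bytes:
    """Serialize a message the way a Kafka value_serializer would."""
    return json.dumps({"lang": source_lang, "text": text}).encode("utf-8")

def decode_message(raw: bytes) -> dict:
    """Inverse of encode_message, as used on the consumer side."""
    return json.loads(raw.decode("utf-8"))

def choose_partition(key: str, num_partitions: int = 3) -> int:
    """Stable key -> partition mapping across the three brokers.
    MD5 stands in for Kafka's murmur2 purely for illustration."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Keying by, say, a conversation ID keeps all messages from one source in order on one partition while still spreading load across brokers.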
**Translation model.** Spanish → English translation with Hugging Face's Helsinki-NLP/opus-mt-es-en, deployed as a FastAPI microservice with sub-second inference latency.
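A minimal sketch of the model layer, assuming the standard Hugging Face `transformers` pipeline API. The model import is lazy so the batching wrapper can be exercised with any stand-in callable; the batch size is an illustrative default, not the project's tuned value:

```python
from typing import Callable, List

def load_translator() -> Callable[[List[str]], List[str]]:
    """Build the es->en translator. Imported lazily because the first
    call downloads the opus-mt-es-en checkpoint."""
    from transformers import pipeline
    pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")
    return lambda texts: [out["translation_text"] for out in pipe(texts)]

def translate_batch(texts: List[str],
                    translator: Callable[[List[str]], List[str]],
                    batch_size: int = 8) -> List[str]:
    """Chunk inputs so no single request holds the model too long,
    keeping per-message latency bounded."""
    results: List[str] = []
    for i in range(0, len(texts), batch_size):
        results.extend(translator(texts[i:i + batch_size]))
    return results
```

Separating `load_translator` from `translate_batch` also makes the batching logic testable with a stub translator, without downloading the model.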
**FastAPI service.** REST endpoints for real-time translation, integrated with Kafka topics for streaming ingestion and output publishing.
**Monitoring.** Prometheus + Grafana dashboards for Kafka metrics, end-to-end latency, and resource utilization.
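Instrumenting the translation path for Prometheus can be sketched with the standard `prometheus_client` library; the metric names here are illustrative, not the project's actual dashboard queries:

```python
import time
from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# A dedicated registry keeps this sketch isolated from any process-wide one.
registry = CollectorRegistry()

TRANSLATIONS = Counter("translations_total", "Messages translated",
                       ["status"], registry=registry)
LATENCY = Histogram("translation_latency_seconds",
                    "End-to-end translation latency", registry=registry)

def observed_translate(text: str, translate) -> str:
    """Wrap any translate callable with success/error counts and latency."""
    start = time.perf_counter()
    try:
        result = translate(text)
        TRANSLATIONS.labels(status="ok").inc()
        return result
    except Exception:
        TRANSLATIONS.labels(status="error").inc()
        raise
    finally:
        LATENCY.observe(time.perf_counter() - start)

# generate_latest(registry) produces the text format Prometheus scrapes;
# Grafana then graphs these series.
```

A histogram (rather than a gauge) for latency lets Grafana plot percentiles, which matter more than averages for a low-latency SLO.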
**Experiment tracking.** Early-stage Weights & Biases integration for logging BLEU scores and drift metrics.
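To make the BLEU logging concrete, here is a simplified sentence-level BLEU (modified n-gram precision with a brevity penalty). It is an illustration of the metric, not the implementation used in the pipeline; production evaluation would typically use a library such as sacrebleu, and the resulting score would be sent to W&B via `wandb.log`:

```python
import math
from collections import Counter

def _ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified single-reference sentence BLEU with brevity penalty."""
    c_tok, r_tok = candidate.split(), reference.split()
    if not c_tok:
        return 0.0
    max_n = min(max_n, len(c_tok))  # short candidates: fewer n-gram orders
    log_p = 0.0
    for n in range(1, max_n + 1):
        c_counts = Counter(_ngrams(c_tok, n))
        r_counts = Counter(_ngrams(r_tok, n))
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = sum(c_counts.values())
        log_p += math.log(max(overlap, 1e-9) / total)  # floor avoids log(0)
    bp = 1.0 if len(c_tok) >= len(r_tok) else math.exp(1 - len(r_tok) / len(c_tok))
    return bp * math.exp(log_p / max_n)
```

Tracking this score per batch over time is what surfaces translation-quality drift: a sustained drop in BLEU against held-out references signals that incoming text has shifted away from the model's training distribution.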