LLM-Powered Real-Time Translation System

“A scalable multilingual translation system leveraging Kafka, FastAPI, and Hugging Face — achieving real-time, low-latency LLM deployment with full observability.”


<aside> 📢

A Kafka-Driven Pipeline with MLOps Automation

Author: Wonha Shin | University of Rochester

Tools: Kafka • FastAPI • Hugging Face • Docker • Prometheus • Grafana • W&B • AWS EC2

</aside>


This project presents a real-time multilingual translation pipeline operationalized with LLMOps principles. The system uses Kafka for distributed streaming, FastAPI for serving translations in real time, and Hugging Face’s Helsinki-NLP/opus-mt-es-en model for Spanish → English translation. The entire pipeline is deployed on AWS EC2, with containerized monitoring through Prometheus + Grafana and experimental integration with Weights & Biases (W&B) for logging latency and translation quality.

🧩 The goal: achieve low-latency, high-throughput translation while maintaining observability, scalability, and reliability.

(Figure: kafka_project_overview.png — pipeline overview)

🛰️ Kafka for Data Streaming

Backbone for message ingestion and routing — handles millions of multilingual messages per day with three brokers and ZooKeeper for high availability.
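As a minimal sketch of the ingestion side, the snippet below shows how a message could be serialized for Kafka and sent to the cluster. The broker addresses and topic names (`es-messages-raw`, `en-messages-out`) are hypothetical placeholders, not the project's actual configuration, and the producer wiring assumes the kafka-python client:

```python
import json

# Hypothetical broker addresses and topic names -- adjust to your deployment.
BROKERS = ["broker1:9092", "broker2:9092", "broker3:9092"]
RAW_TOPIC = "es-messages-raw"          # assumed input topic
TRANSLATED_TOPIC = "en-messages-out"   # assumed output topic

def encode_message(text: str, lang: str = "es") -> bytes:
    """Serialize a message payload to JSON bytes for a Kafka record value."""
    return json.dumps({"text": text, "lang": lang}).encode("utf-8")

def decode_message(raw: bytes) -> dict:
    """Deserialize a Kafka record value back into a dict."""
    return json.loads(raw.decode("utf-8"))

if __name__ == "__main__":
    # Requires kafka-python and a running cluster; sketch only.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=BROKERS, acks="all")
    producer.send(RAW_TOPIC, encode_message("Hola, mundo"))
    producer.flush()
```

With `acks="all"`, the producer waits for all in-sync replicas to acknowledge each write, which pairs naturally with the three-broker setup for durability.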

🧠 Translation Model (Hugging Face)

Spanish→English translation with Helsinki-NLP/opus-mt-es-en, deployed as a FastAPI microservice for sub-second inference latency.
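To keep per-request latency low, long inputs typically need to be split before hitting the model. This is a sketch under the assumption that the service chunks on sentence boundaries; the `chunk_text` helper and the 400-character limit are illustrative, not the project's actual preprocessing:

```python
from typing import List

MODEL_NAME = "Helsinki-NLP/opus-mt-es-en"  # model named in the project

def chunk_text(text: str, max_chars: int = 400) -> List[str]:
    """Split long input into chunks below max_chars, breaking on sentence
    boundaries. Note: a trailing sentence without a period gets one appended."""
    sentences = text.replace("\n", " ").split(". ")
    chunks, current = [], ""
    for s in sentences:
        s = s if s.endswith(".") else s + "."
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = (current + " " + s).strip()
    if current:
        chunks.append(current)
    return chunks

if __name__ == "__main__":
    # Requires the transformers library; downloads the model on first run.
    from transformers import pipeline

    translator = pipeline("translation", model=MODEL_NAME)
    for chunk in chunk_text("Hola. ¿Cómo estás?"):
        print(translator(chunk)[0]["translation_text"])
```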

⚙️ API Management (FastAPI)

REST endpoints for real-time translation, integrated with Kafka topics to consume incoming messages and publish translated output.
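A sketch of how such an endpoint could be structured: the handler logic is kept framework-free so it can be unit-tested, and the FastAPI wiring (route path `/translate`, request schema) is an assumption about the service layout, not the project's actual API:

```python
import time
from typing import Callable, Dict

def translate_request(text: str, translate_fn: Callable[[str], str]) -> Dict:
    """Core handler: run the translation and report per-request latency.
    Framework-independent, so it is easy to test without a server."""
    start = time.perf_counter()
    translation = translate_fn(text)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {"source": text, "translation": translation,
            "latency_ms": round(latency_ms, 2)}

if __name__ == "__main__":
    # Hypothetical wiring; requires fastapi, uvicorn, and transformers.
    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import pipeline

    app = FastAPI()
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")

    class TranslateIn(BaseModel):
        text: str

    @app.post("/translate")
    def translate(req: TranslateIn):
        return translate_request(
            req.text, lambda t: translator(t)[0]["translation_text"])

    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Injecting the translation function as a callable also makes it straightforward to swap in a Kafka-consuming worker that reuses the same handler logic.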

📈 Monitoring & Observability

Prometheus + Grafana dashboards for Kafka metrics, latency, and resource utilization.
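For context, a Prometheus histogram exposes latency as cumulative bucket counts (each `le="x"` bucket counts all observations ≤ x). The helper below reproduces that bucketing in plain Python; the bucket boundaries and metric name are illustrative choices, not the project's actual configuration:

```python
from typing import Dict, List

# Illustrative bucket boundaries in seconds for a latency histogram.
BUCKETS = [0.05, 0.1, 0.25, 0.5, 1.0, 2.5]

def bucket_counts(latencies: List[float],
                  buckets: List[float] = BUCKETS) -> Dict[str, int]:
    """Cumulative bucket counts, as a Prometheus histogram exposes them:
    each le=x bucket counts every observation <= x; +Inf counts all."""
    counts = {f"le={b}": sum(1 for v in latencies if v <= b) for b in buckets}
    counts["le=+Inf"] = len(latencies)
    return counts

if __name__ == "__main__":
    # With prometheus_client installed, the real equivalent is:
    from prometheus_client import Histogram, start_http_server

    latency_hist = Histogram("translation_latency_seconds",
                             "End-to-end translation latency",
                             buckets=BUCKETS)
    start_http_server(8001)       # exposes /metrics for Prometheus to scrape
    latency_hist.observe(0.12)    # record one request's latency
```

Grafana can then plot quantiles from these buckets with `histogram_quantile()` over the scraped series.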

Early-stage W&B integration for BLEU and drift metrics.
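As a sketch of what such logging could look like, the snippet below implements a simplified single-sentence BLEU (uniform weights, clipped n-gram precision, brevity penalty) and shows hypothetical W&B logging; the project name is a placeholder, and a production setup would use a library like sacrebleu instead:

```python
import math
from collections import Counter
from typing import List

def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Simplified BLEU for one sentence pair: uniform weights over
    1..max_n-gram clipped precision, with a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_ngrams, r_ngrams = ngrams(cand, n), ngrams(ref, n)
        total = max(sum(c_ngrams.values()), 1)
        clipped = sum(min(cnt, r_ngrams[g]) for g, cnt in c_ngrams.items())
        if clipped == 0:
            return 0.0
        log_prec += math.log(clipped / total) / max_n
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_prec)

if __name__ == "__main__":
    # Hypothetical W&B logging; requires wandb and a logged-in account.
    import wandb

    wandb.init(project="llm-translation")  # placeholder project name
    wandb.log({"bleu": simple_bleu("the cat sat", "the cat sat"),
               "latency_ms": 120.0})
```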


Deliverables

📌 Key Components of the Pipeline