This project aims to build a scalable language operations pipeline that uses AWS services to manage multilingual Twitter data. The pipeline collects tweets that match specific hashtags, saves them in batches, and then processes those batches through a series of AWS services, with Kafka replaying the stored data to simulate real-time streaming.
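The collect-and-batch step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the tweet dictionary shape (`id`, `text`, `hashtags`), the function names, and the default batch size are all assumptions made for the example, and the upload of each serialized batch to AWS storage is left out.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def normalize_tag(tag: str) -> str:
    """Lower-case a hashtag and drop a leading '#', so '#AI' and 'ai' compare equal."""
    return tag.lower().lstrip("#")


def matches_hashtags(tweet: dict, tracked: set[str]) -> bool:
    """Return True if the tweet carries at least one tracked hashtag."""
    # Assumption: each tweet stores its hashtags as a list of strings.
    tags = {normalize_tag(tag) for tag in tweet.get("hashtags", [])}
    return bool(tags & tracked)


def batch_tweets(
    tweets: Iterable[dict],
    hashtags: Iterable[str],
    batch_size: int = 100,  # hypothetical default, not taken from the project
) -> Iterator[list[dict]]:
    """Yield lists of at most batch_size tweets that match the tracked hashtags."""
    tracked = {normalize_tag(tag) for tag in hashtags}
    batch: list[dict] = []
    for tweet in tweets:
        if not matches_hashtags(tweet, tracked):
            continue
        batch.append(tweet)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def serialize_batch(batch: list[dict]) -> str:
    """Serialize a batch as JSON Lines, keeping non-ASCII text readable."""
    return "\n".join(json.dumps(tweet, ensure_ascii=False) for tweet in batch)
```

Writing each batch as JSON Lines keeps one tweet per line, which makes it straightforward to replay the stored records one at a time in a later streaming step. `ensure_ascii=False` is used so that multilingual text is stored as-is rather than as escape sequences.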

Data Preparation Step-By-Step (Option 1)

🌍 Project Pipeline in Detail

1. Data Collection and Storage


2. Simulated Real-Time Streaming (Kafka Setup)


3. AWS Lambda Trigger for Initial Processing


4. Real-Time Processing and Translation Pipeline (Kafka Consumers)