This step ensures that your dataset is formatted correctly for real-time simulation. Entries are spaced at fixed 30-second intervals to simulate the timing of real-world data arrival.
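As a minimal sketch of the fixed-interval idea, assuming timestamps are generated in Python with the standard datetime module, the helper below produces evenly spaced, unique timestamp strings in the YYYY-MM-DD HH:MM:SS format the dataset uses. The function name make_timestamps and the example start time are hypothetical, not part of the original tutorial.

```python
from datetime import datetime, timedelta

# Hypothetical constants: the 30-second spacing from the text and the
# YYYY-MM-DD HH:MM:SS layout expressed as a strftime pattern.
INTERVAL = timedelta(seconds=30)
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_timestamps(start: datetime, count: int) -> list[str]:
    """Return `count` timestamp strings, INTERVAL apart, starting at `start`.

    Each step adds the same positive offset, so the values are strictly
    increasing and therefore unique.
    """
    return [(start + i * INTERVAL).strftime(TIMESTAMP_FORMAT) for i in range(count)]


if __name__ == "__main__":
    # The start time is an arbitrary example value.
    for ts in make_timestamps(datetime(2024, 1, 1, 12, 0, 0), 3):
        print(ts)
    # 2024-01-01 12:00:00
    # 2024-01-01 12:00:30
    # 2024-01-01 12:01:00
```

Generating timestamps from a start time plus a multiple of the interval, rather than reading the system clock, keeps the spacing exact and makes the dataset reproducible.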
Each entry has the following fields:
text: The actual tweet content.
timestamp: A unique timestamp for each entry, used to simulate time-based data arrival.
language: The tweet's language code (e.g., 'en' for English, 'es' for Spanish, 'fr' for French).
user_info: Optional additional information, such as a user ID or username.

Timestamps use the YYYY-MM-DD HH:MM:SS format. The formatted dataset is saved as a CSV file (real_time_dataset.csv), which the Kafka producer script will read.
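A minimal sketch of writing and checking the CSV file, using Python's standard csv and datetime modules. It assumes the columns appear in the order listed above (text, timestamp, language, user_info) and that consecutive rows must be exactly 30 seconds apart. The function names write_dataset and validate_dataset and the sample tweets are hypothetical; the actual producer script may expect a different layout.

```python
import csv
from datetime import datetime, timedelta

# Assumed column order, taken from the field list above.
FIELDNAMES = ["text", "timestamp", "language", "user_info"]
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS
INTERVAL = timedelta(seconds=30)


def write_dataset(rows, path="real_time_dataset.csv"):
    """Write dict rows to `path` as CSV with a header line.

    Keys missing from a row (e.g. the optional user_info) are written as
    empty strings, which is DictWriter's default `restval`.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


def validate_dataset(path="real_time_dataset.csv"):
    """Check the header, timestamp format, and 30-second spacing; return the row count."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != FIELDNAMES:
            raise ValueError(f"unexpected columns: {reader.fieldnames}")
        previous = None
        count = 0
        for count, row in enumerate(reader, start=1):
            # strptime raises ValueError if the string does not match the format.
            current = datetime.strptime(row["timestamp"], TIMESTAMP_FORMAT)
            if previous is not None and current - previous != INTERVAL:
                raise ValueError(
                    f"row {count}: expected a {INTERVAL} gap, got {current - previous}"
                )
            previous = current
    return count


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, 0)  # arbitrary example start time
    sample = [  # hypothetical sample tweets
        {"text": "Hello world", "language": "en", "user_info": "user_1"},
        {"text": "Hola mundo", "language": "es"},
        {"text": "Bonjour le monde", "language": "fr"},
    ]
    rows = [
        {**row, "timestamp": (start + i * INTERVAL).strftime(TIMESTAMP_FORMAT)}
        for i, row in enumerate(sample)
    ]
    write_dataset(rows)
    print(validate_dataset())  # 3
```

Validating by reading the file back, rather than trusting the writer, also catches problems introduced if the CSV is edited by hand before the producer consumes it. Because the spacing check requires exactly one interval between rows, it rejects duplicate or out-of-order timestamps as well.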