This step ensures that your dataset is formatted correctly for real-time simulation. Entries are spaced at fixed 30-second intervals to mimic the timing of real-world data arrival.
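As a minimal sketch of the fixed-interval scheme, the Python snippet below generates evenly spaced timestamps as `YYYY-MM-DD HH:MM:SS` strings. The helper name `make_timestamps` and the start time are hypothetical; only the 30-second spacing and the timestamp layout come from this step.

```python
from datetime import datetime, timedelta

# Spacing between consecutive entries, as described in this step.
INTERVAL = timedelta(seconds=30)


def make_timestamps(start: datetime, count: int) -> list[str]:
    """Return `count` timestamps exactly 30 seconds apart (hypothetical helper)."""
    # Multiplying a timedelta by an int gives the offset of the i-th entry.
    return [(start + i * INTERVAL).strftime("%Y-%m-%d %H:%M:%S") for i in range(count)]


if __name__ == "__main__":
    # Assumed start time, chosen only for illustration.
    for ts in make_timestamps(datetime(2024, 1, 1, 12, 0, 0), 3):
        print(ts)
```

Because each timestamp is derived from the start time plus a multiple of the interval, the spacing stays exact and every timestamp is unique.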
Each entry contains the following fields:

- `text`: The actual tweet content.
- `timestamp`: A unique timestamp for each entry to simulate time-based data.
- `language`: The tweet's language (e.g., 'en' for English, 'es' for Spanish, 'fr' for French).
- `user_info`: Optional additional information, such as a user ID or username.

Timestamps use the `YYYY-MM-DD HH:MM:SS` format. The dataset is saved as a CSV file (`real_time_dataset.csv`), which the Kafka producer script will read.
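A minimal sketch of producing and checking `real_time_dataset.csv` in Python, assuming the four fields listed above form the CSV header in that order and that each row follows the previous one by exactly 30 seconds. The helper names `write_dataset` and `validate_dataset` and the sample tweets are hypothetical; adapt them to however your data is actually collected.

```python
import csv
from datetime import datetime, timedelta

# Assumed column order, following the field list in this step.
FIELDS = ["text", "timestamp", "language", "user_info"]
TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:MM:SS


def write_dataset(path: str, tweets: list[tuple[str, str, str]], start: datetime) -> None:
    """Write (text, language, user_info) tuples as CSV rows spaced 30 seconds apart."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for i, (text, language, user_info) in enumerate(tweets):
            ts = (start + timedelta(seconds=30 * i)).strftime(TS_FORMAT)
            writer.writerow({"text": text, "timestamp": ts,
                             "language": language, "user_info": user_info})


def validate_dataset(path: str) -> int:
    """Check header, timestamp format, uniqueness and 30-second spacing; return row count."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != FIELDS:
            raise ValueError(f"unexpected columns: {reader.fieldnames}")
        # strptime raises ValueError if a timestamp does not match TS_FORMAT.
        times = [datetime.strptime(row["timestamp"], TS_FORMAT) for row in reader]
    if len(set(times)) != len(times):
        raise ValueError("timestamps are not unique")
    for prev, cur in zip(times, times[1:]):
        if cur - prev != timedelta(seconds=30):
            raise ValueError(f"gap of {cur - prev} between {prev} and {cur}")
    return len(times)


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        ("Hello world", "en", "user_1"),
        ("Hola mundo", "es", "user_2"),
        ("Bonjour le monde", "fr", ""),  # user_info is optional
    ]
    write_dataset("real_time_dataset.csv", sample, datetime(2024, 1, 1, 12, 0, 0))
    print(validate_dataset("real_time_dataset.csv"), "rows OK")
```

Running a check like `validate_dataset` before starting the producer surfaces formatting problems up front, rather than partway through a simulated stream.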