<aside> 💡

All about Dataset Considerations

Professor Comment: The source of the dataset was not adequately specified in the proposal. Please include a clearer and more specific source.

</aside>

Dataset Considerations

📌 Option 1: Twitter API + Local Kafka Setup + AWS S3 (Batch Processing)

📌 Option 2: Simulate Real-Time Streaming from a Dataset

📌 Option 3: Create a News RSS Hub

Dataset Preparations 🔌

I have chosen to use real-time news data taken directly from the “El País” website, which keeps the project relevant and applicable. The options listed above were more complicated to set up and not well suited to a ‘real-time’ project. El País is a reputable source that is updated frequently, which makes it a reliable basis for the project's objectives.
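
As a rough illustration of what "real-time" collection could look like, here is a minimal sketch that assumes the articles are read from an RSS 2.0 feed. That is an assumption: these notes do not say whether the data comes from a feed, an API, or scraping. `SAMPLE_RSS`, `parse_items`, and `take_new` are hypothetical names, and the sample feed is made-up placeholder data, not real El País content.

```python
import xml.etree.ElementTree as ET

# Made-up placeholder feed in RSS 2.0 layout (not real El País data).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Headline A</title>
      <link>https://example.com/a</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Headline B</title>
      <link>https://example.com/b</link>
      <pubDate>Mon, 01 Jan 2024 10:05:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Extract title, link, and publication date from every <item> element."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return items


def take_new(items, seen_links):
    """Return only items whose link has not been seen yet, and remember them."""
    fresh = [it for it in items if it["link"] not in seen_links]
    seen_links.update(it["link"] for it in fresh)
    return fresh


if __name__ == "__main__":
    seen = set()
    for article in take_new(parse_items(SAMPLE_RSS), seen):
        print(article["published"], article["title"])
```

In a live setup, the sample string would be replaced by the body of an HTTP response (for example via `urllib.request.urlopen` on the chosen feed URL), and `take_new` would be called on a timer so that each poll only yields articles that have not been processed before.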

Data Preparation Step-By-Step (Option 1)

Using TweetEval Pre-Saved Dataset (Option 2)

How to Generate a Custom Dataset (For Test)

How to Create an RSS Hub (Option 3)