• Implemented an end-to-end data pipeline that collects a live stream of tweets from Twitter. A Twitter developer account and an application were created; using the Twitter API with this application's access tokens and keys, a user can collect tweets on a topic of interest in real time.
• The backend consists of 4 Java Maven modules illustrating:
1) An idempotent Kafka producer that gets data from the Twitter API into a Kafka topic.
2) An idempotent Kafka consumer that reads data from Kafka and stores it in Elasticsearch hosted on the bonsai.io cloud.
3) A custom Java class that filters tweets based on follower count and other features.
4) Performance improvements using batching with bulk request handling, exception handling for bad data, multithreading, and logging.
• Stored data locally in PostgreSQL with schema enforcement using Avro. Tested the REST proxy using the Insomnia client.
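The follower-count filter described in item 3 above could look roughly like the sketch below. This is a minimal illustration, not the repository's actual class: the names `TweetFilter`, `Tweet`, `accepts`, and `filter`, as well as the 10,000-follower threshold in the usage note, are assumptions made for the example. The tweet is modeled as a plain object so the sketch runs without any Kafka or JSON dependencies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical follower-count filter; all names here are illustrative assumptions. */
final class TweetFilter {

    /** Minimal stand-in for a parsed tweet (assumed shape, not the Twitter API model). */
    static final class Tweet {
        final String text;
        final int followersCount;

        Tweet(String text, int followersCount) {
            this.text = text;
            this.followersCount = followersCount;
        }
    }

    private final int minFollowers;

    TweetFilter(int minFollowers) {
        this.minFollowers = minFollowers;
    }

    /** Accepts a tweet only if it has non-blank text and enough followers. */
    boolean accepts(Tweet tweet) {
        if (tweet == null || tweet.text == null || tweet.text.trim().isEmpty()) {
            return false; // treat missing or empty payloads as bad data
        }
        return tweet.followersCount >= minFollowers;
    }

    /** Returns a new list containing only the accepted tweets, in input order. */
    List<Tweet> filter(List<Tweet> tweets) {
        List<Tweet> kept = new ArrayList<>();
        for (Tweet t : tweets) {
            if (accepts(t)) {
                kept.add(t);
            }
        }
        return kept;
    }
}
```

For example, `new TweetFilter(10000).accepts(new TweetFilter.Tweet("kafka", 15000))` returns `true`, while a tweet from an account with 10 followers is rejected. Rejecting null or empty payloads in the same predicate is one simple way to keep bad records out of the downstream bulk requests.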
Stars: 0
Forks: 0
Watchers: 0
Open Issues: 4
Commits: 1