Crypto Tweets Fetch using Flume & Hadoop (PRACTICAL)

Shubham Kumar Gupta
4 min read · Jan 23, 2022


Simran: Hey! I am new to investing in cryptocurrency.

Me: Nice! At least you started investing! That’s good!

Simran: But I feel the crypto market depends heavily on news. Since we come from technical backgrounds, can we do something with that?

Me: Yeah, sure! I guess we can do something. I have heard of Apache Flume, an awesome tool for collecting large volumes of log data. We can analyze the tweets of Elon Musk😏 and get something out of them.

Simran: That sounds interesting. Can you tell me in brief how I can also analyze them?

Me: Sure! So let’s start!

Me: So, basically we will stream data from Twitter. To get tweets from Twitter, we need to set up a Twitter application, pick keywords related to the cryptocurrency Doge 🪙, and then run Hadoop and Flume.

Simran: As far as I remember from your last Medium article on Hadoop (WordCounter in Hadoop! (Windows PRACTICAL)), Hadoop is used to handle big data. But what is this Flume now?

Me:

Flume

Apache Flume is a distributed, reliable, and available software for efficiently collecting, aggregating and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.

It can stream live data from many sources, including social platforms such as Facebook and Twitter. The streamed data can then be passed to Hive and Hadoop for further analysis.

Flume accepts data from a source and stores it in a channel, from which a sink writes it to its destination (here, HDFS). Data usually arrives faster than it can be written, so the channel acts as a buffer that matches the read and write pace.
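To make the source → channel → sink idea concrete, here is a toy Python simulation. It is not Flume code: `run_agent` and its rate parameters are made-up names for illustration. The source reads faster than the sink writes, and a bounded channel absorbs the difference. Unlike a real memory channel, which holds events inside transactions until the sink commits them, this sketch simply moves items in and out of a queue.

```python
from __future__ import annotations

from collections import deque


def run_agent(tweets: list[str], read_rate: int, write_rate: int,
              capacity: int) -> tuple[list[str], int]:
    """Toy model of one agent: source -> bounded channel -> sink.

    Each tick the source offers up to `read_rate` tweets and the sink
    writes up to `write_rate` of them. Tweets that do not fit into a full
    channel stay with the source and are retried on the next tick.
    Returns the written tweets and the peak number held in the channel.
    """
    if min(read_rate, write_rate, capacity) < 1:
        raise ValueError("rates and capacity must be positive")

    pending = deque(tweets)        # data not yet accepted by the channel
    channel: deque[str] = deque()  # the buffer (like a memory channel)
    written: list[str] = []        # stand-in for files in HDFS
    peak = 0

    while pending or channel:
        # Source: fill the channel, but never beyond its capacity.
        for _ in range(read_rate):
            if not pending or len(channel) >= capacity:
                break
            channel.append(pending.popleft())
        peak = max(peak, len(channel))

        # Sink: drain at the slower write rate, oldest events first.
        for _ in range(min(write_rate, len(channel))):
            written.append(channel.popleft())

    return written, peak


if __name__ == "__main__":
    out, peak = run_agent([f"tweet {i}" for i in range(10)],
                          read_rate=5, write_rate=2, capacity=4)
    print(len(out), peak)  # 10 4
```

Every tweet still arrives in order. The channel simply never holds more than `capacity` items at once, which is the "matching the read-write pace" role described above.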

Simran: Can you just show me how to do this practically? 🙄

Me: 😅 Sure! Let's start! First, let's create a Twitter application.

Twitter Application

i) First, we need to visit http://apps.twitter.com/

ii) Give the application a name and click Get Keys.

iii) Now you will get API_KEY, API_KEY_SECRET, & BEARER_ACCESS_TOKEN

iv) But you need a few more credentials, so click Set up OAuth, choose v2, and provide your description, T&C URL and privacy policy URL. Now you can click Generate to get ACCESS_TOKEN and ACCESS_TOKEN_SECRET.

v) The number of tweets you fetch may exceed the limits set by Twitter, so apply for Elevated access as a Twitter developer.
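Rather than pasting the keys into the config by hand, you could keep them in environment variables and generate the property lines. Below is a minimal Python sketch. The environment variable names reuse the labels above and are assumptions, as is the mapping onto the `consumerKey` / `consumerSecret` / `accessToken` / `accessTokenSecret` properties that Flume Twitter sources typically read. The bearer token is not written, because this sketch assumes the source authenticates with the four OAuth values only.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Assumed mapping: Twitter portal label -> Flume Twitter source property.
CREDENTIAL_KEYS = {
    "API_KEY": "consumerKey",
    "API_KEY_SECRET": "consumerSecret",
    "ACCESS_TOKEN": "accessToken",
    "ACCESS_TOKEN_SECRET": "accessTokenSecret",
}


def credential_lines(env: Mapping[str, str] = os.environ,
                     prefix: str = "TwitterAgent.sources.Twitter") -> list[str]:
    """Build flume-conf.properties lines from credentials held in `env`."""
    missing = [name for name in CREDENTIAL_KEYS if not env.get(name)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return [f"{prefix}.{key} = {env[name]}"
            for name, key in CREDENTIAL_KEYS.items()]


if __name__ == "__main__":
    # Expects the four values to be exported as environment variables.
    print("\n".join(credential_lines()))
```

The `TwitterAgent.sources.Twitter` prefix matches the agent and source names used in the keywords line later in this post.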

Me: Cool! Now, let's see how to set up Apache Flume.

Flume Setup

i) Download Apache Flume from the official Apache Flume website.

ii) Extract the archive with tar -xvf flume.tar.gz, or use WinRAR.

iii) Inside the conf folder, rename flume-conf.properties.template to flume-conf.properties.

iv) Write the agent configuration inside the flume-conf.properties file.

v) Set the environment variable FLUME_HOME = D:\apache\flume, and append D:\apache\flume\bin to the PATH.

vi) In the configuration, the source is Twitter, the channel is named MemChannel, the jar file to be used is specified, and all the tokens go here.

vii) Next, we named our sink, set its path in HDFS, and set the output stream's data type to text.

viii) We set the batch size (the number of tweets written per batch), the capacity (the maximum number of events stored in the channel), and the transaction capacity (the maximum number of events the channel accepts from the source, or hands to the sink, in one transaction).
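The three numbers in step viii have to fit together. Flume's memory channel rejects a transactionCapacity larger than its capacity. An HDFS sink whose batchSize exceeds the channel's transactionCapacity can fail at runtime with a "Take list ... full" ChannelException. Below is a small Python checker for those rules. The sample values and the sink name `HDFS` are hypothetical, since the article's exact config is not reproduced here. The parser is a simplified stand-in for Java's `.properties` format and does not handle line continuations or `:` separators.

```python
from __future__ import annotations

# Hypothetical excerpt of flume-conf.properties (values are illustrative).
SAMPLE = """
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
"""


def parse_properties(text: str) -> dict[str, str]:
    """Read simple `key = value` lines, skipping blanks and #/! comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_sizes(props: dict[str, str], agent: str = "TwitterAgent",
                channel: str = "MemChannel", sink: str = "HDFS") -> list[str]:
    """Check batchSize <= transactionCapacity <= capacity.

    Missing keys fall back to Flume's documented default of 100.
    """
    prefix = f"{agent}.channels.{channel}."
    capacity = int(props.get(prefix + "capacity", 100))
    txn = int(props.get(prefix + "transactionCapacity", 100))
    batch = int(props.get(f"{agent}.sinks.{sink}.hdfs.batchSize", 100))

    problems = []
    if txn > capacity:
        problems.append(f"transactionCapacity {txn} > capacity {capacity}")
    if batch > txn:
        problems.append(f"hdfs.batchSize {batch} > transactionCapacity {txn}")
    return problems


if __name__ == "__main__":
    print(check_sizes(parse_properties(SAMPLE)) or "sizes look consistent")
```

Pointing this at your own flume-conf.properties, for example via `open(...).read()`, before starting the agent is cheaper than discovering the mismatch from a stack trace.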

Now, to fetch all tweets related to the cryptocurrency Doge, we use keywords like:

TwitterAgent.sources.Twitter.keywords= elon musk, doge, doge coin, bitcoin, crypto, forex, tesla, coin, rocket, ether, mining
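Twitter's streaming filter (the `track` parameter of the v1.1 statuses/filter endpoint) treats commas as OR and spaces inside a phrase as AND. Under that rule, `elon musk` matches only tweets containing both words. The Python sketch below approximates how the keyword list above would match, assuming the Twitter source forwards it to that filter. The function names are hypothetical, and the tokenizer is a simplification of Twitter's real one.

```python
from __future__ import annotations

import re

# The keywords value from the config line above.
KEYWORDS = ("elon musk, doge, doge coin, bitcoin, crypto, forex, tesla, "
            "coin, rocket, ether, mining")


def parse_keywords(value: str) -> list[list[str]]:
    """Split the comma-separated value into phrases of lowercase terms."""
    phrases = []
    for phrase in value.split(","):
        terms = phrase.lower().split()
        if terms:
            phrases.append(terms)
    return phrases


def matching_phrases(tweet: str, phrases: list[list[str]]) -> list[str]:
    """Approximate the track rule: commas are OR, spaces are AND.

    Matching is case-insensitive and on whole tokens, so "doge" does not
    match "dogecoin" and "ether" does not match "ethereum".
    """
    tokens = set(re.findall(r"[a-z0-9]+", tweet.lower()))
    return [" ".join(terms) for terms in phrases
            if all(term in tokens for term in terms)]


if __name__ == "__main__":
    phrases = parse_keywords(KEYWORDS)
    print(matching_phrases("Elon Musk buys more Bitcoin", phrases))
    # ['elon musk', 'bitcoin']
```

Whole-token matching has a practical consequence. If you also want tweets that only mention "dogecoin" or "ethereum", those words probably need their own entries in the list.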

Simran: Blah blah! When will we get results? 😟

Me: 😅 Don't worry, now we just have to run it.

Steps to run

i) Start Hadoop by running start-all.cmd

ii) From the terminal, just run this command (I'm in D:\apache\flume):

>> bin\flume-ng agent --conf conf --conf-file conf/flume-conf.properties -property "flume.root.logger=INFO,console" -n TwitterAgent

iii) Now we can go to the path we set in flume-conf.properties, i.e. /flume_tweets, and run hdfs dfs -ls /flume_tweets to see which files are in this directory.

iv) Now we can read them with the cat command: hdfs dfs -cat /flume_tweets/FlumeData (use the full file name shown by -ls, since Flume appends a numeric suffix to each file).
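The output of `-cat` is easier to work with once each tweet is pulled apart. This minimal Python sketch assumes a Cloudera-style Twitter source that writes each tweet as one raw JSON object per line, with `text` and `user.screen_name` fields. Flume's built-in `org.apache.flume.source.twitter.TwitterSource` writes Avro instead, which this will not parse. The script name in the usage comment is hypothetical.

```python
from __future__ import annotations

import json


def extract_tweets(raw: str) -> tuple[list[dict], int]:
    """Pull the user and text out of newline-delimited tweet JSON.

    Returns (tweets, skipped). Lines that are not valid JSON (e.g. a
    truncated line) and non-tweet messages (e.g. delete notices) are
    counted as skipped.
    """
    tweets: list[dict] = []
    skipped = 0
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        if isinstance(obj, dict) and "text" in obj:
            user = obj.get("user") or {}
            tweets.append({"user": user.get("screen_name"),
                           "text": obj["text"]})
        else:
            skipped += 1
    return tweets, skipped


if __name__ == "__main__":
    import sys
    # e.g. hdfs dfs -cat /flume_tweets/FlumeData | python extract_tweets.py
    found, bad = extract_tweets(sys.stdin.read())
    for t in found:
        print(f"@{t['user']}: {t['text']}")
    print(f"{len(found)} tweets, {bad} lines skipped")
```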

Me: Tada! We got our tweets! Now, let's move on to analysing them properly!

Now that we have tweets related to crypto, we can analyse them using Google NLP to get further insights! That's a topic for another Medium blog, so until then,

Thank you for reading this!

Simran: Thank you for this! I'll wait for your next blog.
