Crawling Twitter Account Data using Rapid Miner

I have been using Rapid Miner for a few university projects and thought I would share some things I learned along the way. This post is just about getting data from specific Twitter accounts.

Installation:

The Interface:

When you are done setting up Rapid Miner, you will be greeted with this start page.

As you can see, there are a bunch of different options here. For the sake of this tutorial we will use the Blank option.

Our goal is to get Twitter data from the accounts of specific news publications.

As you can see from the screenshot, three Twitter operators are available out of the box in Rapid Miner. Since we want to crawl user tweets, we will select Get Twitter User Statuses for our case.

We will add three of them, because we want to crawl three accounts, and join their outputs together using the Append operator.
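If it helps to think of this in code, the Append step is roughly a concatenation of tables that share the same columns. Here is a minimal Python sketch of that idea. It assumes each operator's output is a list of rows with From-User and Text fields; the account names and statuses are made-up placeholders, not real crawled data.

```python
from itertools import chain


def append_tables(*tables):
    """Concatenate several tables (lists of row dicts) into one, keeping order."""
    return list(chain.from_iterable(tables))


# Hypothetical placeholder rows: one small table per crawled account.
account_a = [{"From-User": "account_a", "Text": "placeholder status 1"}]
account_b = [{"From-User": "account_b", "Text": "placeholder status 2"}]
account_c = [{"From-User": "account_c", "Text": "placeholder status 3"}]

combined = append_tables(account_a, account_b, account_c)
print(len(combined))  # one row per input row
```

The point is only that the three outputs end up stacked in a single table, which is what the later steps work on.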

On the right-hand side you will see a connection option. Click the Twitter logo beside it to set up a Twitter account to log in with. That is not the account we will take tweets from, but you need to be logged into an account in order to crawl data.

Now we need to set up the Get Twitter User Statuses operators. To do that, let's start by picking the news outlets we want to work with. For this tutorial we will use The New York Times, The Daily Star and The Star.

We will use the Twitter username of each of these accounts. For The New York Times it is nytimes, for The Star it is staronline and for The Daily Star it is dailystar.

If we select one of the operators, its parameters will show up on the right side of the Rapid Miner dashboard. For the connection, we select the one we added before. For the query type, we select name, because we will crawl using the username of the account. We then set the user to our desired account (nytimes, dailystar and staronline, one per operator) and set the limit of the query to 1000, so each operator will work with 1000 tweets.

Now we will use the filter operator to choose what type of news we want to find. For now we are trying to find the negative news posted by these accounts recently (out of 1000 tweets for each account).
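To make the filtering idea concrete, here is a rough Python sketch of a keyword filter over the Text field. The keyword list is hypothetical (pick your own negative words), the sample rows are invented, and matching whole words while ignoring case is my own assumption rather than what the Rapid Miner filter necessarily does.

```python
import re

# Hypothetical keyword list, chosen only for illustration.
NEGATIVE_WORDS = ["dead", "crash", "attack"]


def contains_negative(text, words=NEGATIVE_WORDS):
    """Return True if any keyword appears in text as a whole word, ignoring case."""
    pattern = r"\b(" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def filter_negative(rows, words=NEGATIVE_WORDS):
    """Keep only the rows whose Text field contains a negative keyword."""
    return [row for row in rows if contains_negative(row["Text"], words)]


# Invented sample rows standing in for crawled tweets.
rows = [
    {"From-User": "nytimes", "Text": "Markets crash after surprise announcement"},
    {"From-User": "dailystar", "Text": "Local bakery wins award"},
]
negative = filter_negative(rows)
print(len(negative))
```

Whole-word matching keeps a keyword like "dead" from matching "deadline"; a plain substring match would be simpler but noisier.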

We are almost done and close to getting our results. If we want to save the result in an Excel file, we can use the Write Excel operator, which writes the result to an Excel file.
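Outside Rapid Miner, a similar export can be sketched in Python. Note the swap: this writes a CSV file (which Excel can open) using only the standard library, not a real Excel workbook. The function name and the default column list are assumptions made for this example.

```python
import csv


def write_rows(path, rows, columns=("From-User", "Text")):
    """Write row dicts to a CSV file with a header line, ignoring extra fields."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_rows("negative_news.csv", rows)` with the filtered rows would then produce one line per tweet under a From-User,Text header; the file name here is just a placeholder.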

The finished process will look something like this. Now we run it using the blue play button, and it should produce something like this.

The From-User column holds the user accounts and the Text column contains the status. From the number of rows we can now find out how many of the 1000 tweets per account contain the negative words we used as our filter.

Easy, right?

Exercise: Compare the number of bad and good news items generated in the United States of America, using the Twitter accounts of four news outlets.

Visualization using Rapid Miner:

Now let's say we want to visualize the result as a pie chart that shows the tweet count for each account.

Let's select our plot type: pick the pie chart from these, then tick the Aggregate data option.

Since we will plot the chart using the number of tweets from the three accounts, we group by From-User and set Aggregation Function to count, as we are counting the number of tweets.
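The group-by-and-count step behind the pie chart can be sketched in a few lines of Python. This is only an illustration of the aggregation, not how Rapid Miner implements it; the sample rows are invented and the function names are my own.

```python
from collections import Counter


def count_by_user(rows):
    """Group rows by From-User and count how many rows each account has."""
    return Counter(row["From-User"] for row in rows)


def pie_shares(counts):
    """Turn per-account counts into percentage shares, one per pie slice."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {user: 100.0 * n / total for user, n in counts.items()}


# Invented sample rows standing in for the filtered result.
rows = [
    {"From-User": "nytimes", "Text": "status 1"},
    {"From-User": "nytimes", "Text": "status 2"},
    {"From-User": "dailystar", "Text": "status 3"},
    {"From-User": "staronline", "Text": "status 4"},
]
counts = count_by_user(rows)
shares = pie_shares(counts)
print(shares)
```

Each slice of the pie corresponds to one account, sized by its share of the rows.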

And we have our pie chart.

Exercise: From the Twitter pages of KFC and Pizza Hut, plot a pie chart based on the retweets for each account.
