Sunday 20 July 2014

Third Step: Popularity Comparison of Two Celebrities in Twitter Using R

Hey everyone...
It's time we appreciated the power of R.
In this post we will fetch tweets about two famous personalities, teams, etc. from Twitter and analyze which of the two is more popular with respect to a common comparison parameter.

Oh, one thing: I hope you have already done the handshake with Twitter using your credentials. If not, please refer to my post "Second Step" to do that.

OK let's proceed now...

********************* Code In R to Accomplish The Mission ********************
# Packages used below: twitteR for searchTwitter(), gplots for barplot2()
library(twitteR)
library(gplots)

# Get tweets for a searchTerm and return them as a data frame
TweetFrame <- function(searchTerm, maxTweets)
{
  twtlist <- searchTwitter(searchTerm, n = maxTweets, cainfo = "cacert.pem")
  return(do.call("rbind", lapply(twtlist, as.data.frame)))
}                                                    # End of function TweetFrame


# Function to do a popularity check
popularityCheck <- function(name1, name2, count)
{
  name1DF <- TweetFrame(name1, count)
  name2DF <- TweetFrame(name2, count)

  # Sort by arrival time; order() is ascending, so the earliest tweet comes first
  sortname1 <- name1DF[order(as.integer(name1DF$created)), ]
  sortname2 <- name2DF[order(as.integer(name2DF$created)), ]

  # Delays between consecutive tweets, always in seconds
  # (diff() on the raw timestamps may pick a different unit for each term)
  eventdelays1 <- diff(as.integer(sortname1$created))
  eventdelays2 <- diff(as.integer(sortname2$created))

  # The mean delay of name1 is the common ground of comparison
  meanof1 <- mean(eventdelays1)
  sumval1 <- sum(eventdelays1 <= round(meanof1, 1))
  res1 <- poisson.test(sumval1, count)$conf.int

  # Count the delays of name2 against the same threshold (meanof1)
  sumval2 <- sum(eventdelays2 <= round(meanof1, 1))
  res2 <- poisson.test(sumval2, count)$conf.int

  p1 <- sumval1 / count
  p2 <- sumval2 / count

  l1 <- res1[1]
  l2 <- res2[1]
  u1 <- res1[2]
  u2 <- res2[2]
  barplot2(c(p1, p2), ci.l = c(l1, l2), ci.u = c(u1, u2), plot.ci = TRUE,
           names.arg = c(name1, name2))
}                                                    # End of function popularityCheck


******************************** END**********************************

First, let's see what this code does, and then we will see how it does it...

In the past few days people were so engrossed in the 2014 FIFA World Cup that there was a flood of posts and tweets. So why not conduct a popularity check on the teams?

Input

Team 1: Argentina (#argentina)
Team 2: Germany (#germany)
Number of Tweets extracted from Twitter: 500 (for each team)


>popularityCheck("#argentina","#germany",500)



Output



So, this is what we got. The plot suggests that, by this measure (the share of tweets that arrive within the mean delay computed for the first search term), Argentina is more popular than Germany.

Now let us understand how we got this.

First, look at the TweetFrame(searchTerm,maxTweets) function. It takes a "searchTerm", say #germany, and a "maxTweets" value, say 500, as input and fetches up to 500 tweets as a list, which is stored in the variable twtlist. The content of twtlist is not in a convenient shape, so to give it a proper tabular format we convert the list into a data frame and return it.
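
The list-to-data-frame step can be tried without touching Twitter at all. Here is a small sketch that uses a hand-made list of records as a stand-in for what searchTwitter returns; the field names and values are made up for illustration, and real tweets carry many more columns.

```r
# Stand-in for the list returned by searchTwitter(): three made-up records.
fake_tweets <- list(
  list(text = "first tweet",  screenName = "user_a", retweetCount = 2),
  list(text = "second tweet", screenName = "user_b", retweetCount = 0),
  list(text = "third tweet",  screenName = "user_c", retweetCount = 5)
)

# Turn each record into a one-row data frame, then stack the rows with rbind.
tweet_df <- do.call("rbind",
                    lapply(fake_tweets, as.data.frame, stringsAsFactors = FALSE))

print(dim(tweet_df))          # rows and columns of the resulting table
print(tweet_df$screenName)    # one column, pulled out by name
```

Each list element becomes one row, and each field becomes one column.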

Now let us look at the function popularityCheck(name1,name2,count), which is of more concern. name1 and name2 are the two search terms and count is the number of tweets we want to extract for each of them.

The first two lines of the function take these terms and prepare two separate data frames.

The next two lines sort the respective data frames by the arrival time of the tweets. order() sorts in ascending order, so the earliest tweet comes first and the latest comes last.

The next two lines prepare two vectors, eventdelays1 and eventdelays2, which hold the differences between consecutive arrival times.
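
To see the sorting and the delays on something small, here is a sketch with four made-up timestamps (no Twitter needed). The ids and times are invented for illustration only.

```r
# Made-up arrival times (UTC), deliberately out of order.
created <- as.POSIXct(c("2014-07-13 20:00:30", "2014-07-13 20:00:00",
                        "2014-07-13 20:02:00", "2014-07-13 20:00:10"),
                      tz = "UTC")
tweets <- data.frame(id = 1:4, created = created)

# order() sorts ascending, so the earliest tweet ends up in the first row.
sorted <- tweets[order(as.integer(tweets$created)), ]

# Gaps between consecutive tweets, in seconds.
delays <- diff(as.integer(sorted$created))

print(sorted$id)
print(delays)
```

Four tweets give three delays, because a delay is always measured between two neighbours.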

Next we compute the mean of eventdelays1, named meanof1, and count the number of delays that are no longer than this mean. This becomes the ground for comparison: for the second search term we likewise count the delays that are no longer than meanof1. The two counts are kept in sumval1 and sumval2.
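
A tiny worked example of this counting step, with made-up delays (the variable names here are just for illustration):

```r
delays_a <- c(2, 3, 1, 10, 4)   # made-up gaps (seconds) for the first term
delays_b <- c(6, 1, 8, 2, 9)    # made-up gaps (seconds) for the second term

# The mean gap of the first term is the common threshold for both terms.
threshold <- round(mean(delays_a), 1)

# Comparing a vector with a number gives TRUE/FALSE values; sum() counts the TRUEs.
count_a <- sum(delays_a <= threshold)
count_b <- sum(delays_b <= threshold)

print(threshold)
print(c(count_a, count_b))
```

The term with more short gaps is the one whose tweets arrive in quicker succession.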

The next two lines compute the proportions of tweets arriving within meanof1; the values are stored in p1 and p2. l1 and u1 are the lower and upper limits of the 95% confidence interval that poisson.test reports for the rate sumval1/count, that is, the range of rates that is plausible given the observed count. The same goes for l2 and u2.
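
If you want to look at what poisson.test returns on its own, here is a small sketch. The count of 320 is made up; 500 matches the number of tweets used in this post.

```r
count  <- 500   # tweets fetched per search term
sumval <- 320   # made-up number of delays within the threshold

# poisson.test(x, T) treats x as an event count observed over a base of size T
# and reports an estimate and a confidence interval for the rate x / T.
res <- poisson.test(sumval, count)

p  <- sumval / count    # the proportion that goes into the bar plot
ci <- res$conf.int      # lower and upper limit, 95% level by default

print(p)
print(ci)
```

The proportion always lies inside its own interval, and a wider interval means less certainty about the rate.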

Finally the godfather line is executed, which is barplot2(...), with the arguments shown above, which are self-explanatory. [barplot2 comes from the gplots package: select gplots in the packages window, and if it is not there, install it by writing install.packages("gplots").] This command plots the graph, and an instance is shown above. The plot suggests that people are tweeting more frequently about Argentina and comparatively less about Germany. Well, this is it...
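
If gplots is not at hand, a similar picture can be sketched with base graphics only. Note that this is a swapped-in alternative and not the barplot2 call used in this post, and all the numbers are made up for illustration.

```r
p  <- c(0.64, 0.52)   # made-up proportions for the two terms
lo <- c(0.57, 0.46)   # made-up lower limits
hi <- c(0.71, 0.59)   # made-up upper limits

# barplot() draws the bars and returns the x positions of their midpoints.
mids <- barplot(p, names.arg = c("#argentina", "#germany"),
                ylim = c(0, max(hi) * 1.1))

# Draw each interval as a vertical segment with flat caps at both ends.
arrows(mids, lo, mids, hi, angle = 90, code = 3, length = 0.1)
```

When the two intervals overlap a lot, the difference between the bars should be read with caution.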

Well, I hope it was worth your time... Please feel free to make any suggestions.

The next thing I am going to do is build a word cloud in R using tweets, and I find that fun. Till my next post, as I say, Happy Learning...
 

Second Step: Configuring R with Twitter.


Hi...

Well, I hope you are done with the installation of the R console and RStudio...
Now we are interested in getting tweets from Twitter and doing some kind of analysis. But for that we need to configure R so that it can communicate with Twitter. Hard??? Trust me, it's not. Let's get started...

Step 1: You should have an account on Twitter. Create one if you don't have one.

Step 2: Go to https://dev.twitter.com/ and log in with your Twitter credentials. We are here because you need to create an app, so that you can prove your identity to Twitter and state your purpose for accessing tweets. Don't worry, it's not that terrifying. Once you have logged in, find your profile icon in the upper right corner of the screen and select "My Applications". Then select "Create a New Application". Fill in your details sensibly. It will ask you for a "website" address; if you don't have one, fill in the address of any decent site. This part is just a formality. The "Callback URL" can be left blank. Please select the check box so that your application can be used to sign in with Twitter.

Step 3: You will come across a screen showing you some data. Please don't rush to close it; it's the reason all this hard work was done. Save the data, as you will need it. Most importantly, the "Consumer Key" and the "Consumer Secret" are needed. These help R communicate with Twitter so that it can fetch data. Also save the "Request Token URL" and the "Authorize URL".

Step 4: Open RStudio.

Step 5: Write the following commands one by one.
            EnsurePackage("bitops")
            EnsurePackage("RCurl")
            EnsurePackage("RJSONIO")
            EnsurePackage("twitteR")
            EnsurePackage("ROAuth")
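            These are some packages which we need for our work. Note that EnsurePackage
            is not a built-in R function: it is a small helper that installs a package if it
            is missing and then loads it, so it has to be defined first (or you can use
            install.packages() and library() directly). Make sure you have selected the
            packages from the packages window on the bottom-right side after they are
            installed.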
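
For anyone who does not have EnsurePackage yet, here is a minimal sketch of such a helper. This is my assumption of what it needs to do (install when missing, then load), and the CRAN mirror URL is just one common choice.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
EnsurePackage <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  library(pkg, character.only = TRUE)
}

# Try it on a package that ships with R, so nothing gets downloaded here.
EnsurePackage("stats")
```

The character.only = TRUE argument is needed because the package name arrives as a string in a variable.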


Step 6: If you are using Windows, you also need a bundle of SSL CA certificates. It is
             needed to maintain secure communication with Twitter. You need to download
             a file. Below is the syntax.
        
            download.file(url="http://curl.haxx.se/ca/cacert.pem",destfile="cacert.pem")

Step 7: It's time to use the consumer key and secret, and the URLs.

            credential <- OAuthFactory$new(consumerKey="lettersAndNumbers",
                                           consumerSecret="lettersAndNumbers",
                                           requestURL="https://api.twitter.com/oauth/request_token",
                                           accessURL="https://api.twitter.com/oauth/access_token",
                                           authURL="https://api.twitter.com/oauth/authorize")

            Here credential stores the result of the statement written on the right
            side. Note that this is a single command. In place of "lettersAndNumbers" substitute
            your Key and Secret saved earlier. Press enter to run it.

Step 8: Write credential$handshake(cainfo="cacert.pem") on Windows. If you are
             running it on Linux you can leave the parentheses empty, i.e. credential$handshake().

Step 9:  You will get a response back that looks like this:

             To enable the connection, please direct your web browser to:
             https://api.twitter.com/oauth/authorize?oauth_token=...
             When complete, record the PIN given to you and provide it here:

             (Please copy the link you get and paste it into your browser address bar.)

            Please provide the PIN in order to complete the handshake process. You are
            almost done if you don't get any errors. You might be wondering whether you need to
            repeat these steps every time. The answer is no. The credential object, and all of the
            other active data, will be stored in the default workspace when you exit R or
            RStudio, provided you choose to save the workspace. Make sure you know which
            workspace it was saved in so you can get it back later.
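
If you would rather not depend on the default workspace, an object can also be saved to a file of its own and loaded back later. A small sketch of the idea, using a made-up list as a stand-in for the real credential object and a hypothetical file name:

```r
# Stand-in for the real credential object (made up for illustration).
credential <- list(consumerKey = "lettersAndNumbers", handshakeComplete = TRUE)

# Save it to a file of its own; a temporary folder is used here as an example.
cred_file <- file.path(tempdir(), "credential.RData")
save(credential, file = cred_file)

rm(credential)      # simulate starting a fresh session
load(cred_file)     # brings the object back under its original name

print(credential$handshakeComplete)
```

In real use you would pick a folder you can find again instead of tempdir(), and keep the file private since it holds your keys.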

Step 10: Yes, the hard work is over. Let's see whether everything went right. Write
                registerTwitterOAuth(credential) and if it gives TRUE then, buddy, you
                are ready to go. The hard work paid off. But if errors creep in, you need to
                revisit the commands and find the bugs.

Well, this is the end of my post. In my next post we are going to fetch data from Twitter...

Wednesday 9 July 2014

First Step: Introduction to R

Hi everyone...

Being a student of data analytics, the first baby step one takes is getting familiar with R. Let's begin by defining R (source: Wikipedia).


What is R???

R is a programming language and a software environment for statistical computing and graphics. It compiles and runs on a wide variety of OS like Linux, Windows and Mac OS. It is basically used for data analysis... 

 
How to install R on your machine???

The R console is like a command prompt, where you can write and run R commands. You can write and save your commands in a script file with the extension .R and later run the script in the console. This gives you the feel of writing programs and running them. But if you are a fan of IDEs (Integrated Development Environments) and are more inclined towards development and experimentation, then installing only the R console will not serve your purpose. In that case, you need RStudio.
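
To get a feel for the script idea, here is a small sketch that writes a two-line script to a .R file and then runs it from the console with source(). The file name and its contents are made up for illustration.

```r
# Write a two-line script to a temporary .R file (the path is illustrative).
script_path <- file.path(tempdir(), "hello.R")
writeLines(c("x <- c(4, 8, 15)",
             "avg <- mean(x)"), script_path)

# Run the whole script from the console in one go.
source(script_path)

print(avg)   # the variable created by the script is now available
```

Normally you would write the script in an editor and only call source() in the console.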

RStudio helps in many ways. You can think of RStudio as a "genie": you wish, it fulfills... We will see why RStudio is important.
  1. Here is the link to download the R console to your machine.
  2. Now that you have installed the R console, you need to install RStudio as well for a smooth and smart experience. Get RStudio
 
What after installing R???

The major part of this post is almost done.
Now you can learn basic R commands from tryr.codeschool.com.
This is a very beautiful website where you can learn basic R commands easily. It has 7 chapters, each focusing on a different concept in R.
Sign in with one of your existing accounts and you are ready to go...



Well, this brings me to the end of this post... My next post will be something related to the title of the blog. By that time I expect readers will have found some interest in completing the chapters on the site mentioned earlier.

Happy Learning...