Friday, July 31, 2015

Visualizing Data Science

Linked below is a presentation I put together for the "Advanced Topics in Data Visualization" session at TDWI Boston.  Enjoy!

https://www.slideshare.net/fullscreen/51147149/1

Monday, July 27, 2015

POS News Update: Automatically Tweeting Using R

Introduction

I am fond of automation.  Why?  Because if I can automate something, I don't have to do it manually, and that saves me time.  It also saves me emotional energy when the task is one I really don't want to do.

One unpleasant task (at least for me) is that of self-promotion.  Nowadays it is a wise career move to continually post links and comment through Twitter, LinkedIn, Yammer, and various other job-related social media.  It is to one's advantage to remain engaged in hot topics in one's field and to demonstrate knowledge by posting insightful comments.  By doing so, one demonstrates that one is a leader in the field (and worthy of pursuit in hiring and career advancement).

But we know that such self-promotion can be very time-consuming, and perhaps unpleasant, even though it is beneficial for our careers.  How can we have our cake and eat it too?  Enter automation.  I thought, if I could automate some of this "self-promotion", I could be more actively engaged in the community AND also free up some time for myself.  Plus, it would be a fun avenue of personal research for advancing my skills.

Automating Twitter "News" Updates

I am not very active on Twitter (at all), so I thought this would be a good place to start with an automation experiment.  What would I Tweet?  I wanted to explore Natural Language Processing a bit, so I eventually settled on doing the following:

1. Gather the latest Google News headlines.
2. Use part-of-speech (POS) tagging to break down each headline into its parts of speech.
3. Swap out words in each headline for other words with the same POS tag (e.g., swap a noun for a different noun).
4. Pick the headline that reads the most like a sentence.
5. Post the headline to Twitter automatically on a regular schedule.
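
Steps 3 and 4 can be sketched in base R without any packages.  This is only a rough illustration, not the code behind the Twitter account: the two headlines and their tags are made up by hand (in the real process the tags come from a POS tagger), and the "reads like a sentence" check (at least one noun tag and one verb tag) is an assumed stand-in, since the scoring used for step 4 is not spelled out here.

```r
# Hypothetical, hand-tagged headlines using Penn Treebank-style tags.
# In the real pipeline the tags would come from a POS tagger.
headlines <- list(
  data.frame(word = c("Senate", "passes", "budget", "bill"),
             tag  = c("NNP", "VBZ", "NN", "NN"),
             stringsAsFactors = FALSE),
  data.frame(word = c("Storm", "hits", "coast"),
             tag  = c("NN", "VBZ", "NN"),
             stringsAsFactors = FALSE)
)

# Pool of every word seen across all headlines, grouped by POS tag.
all_tokens <- do.call(rbind, headlines)
pool <- split(all_tokens$word, all_tokens$tag)

# Step 3: replace each word whose tag is in `swap_tags` with a different,
# randomly chosen word carrying the same tag. Words with no alternative
# in the pool are left unchanged.
swap_words <- function(tokens, pool, swap_tags = c("NN", "NNP")) {
  for (i in seq_len(nrow(tokens))) {
    tag <- tokens$tag[i]
    if (!tag %in% swap_tags) next
    candidates <- setdiff(pool[[tag]], tokens$word[i])
    if (length(candidates) > 0) {
      tokens$word[i] <- candidates[sample.int(length(candidates), 1)]
    }
  }
  tokens
}

# Step 4 (assumed heuristic): a headline "reads like a sentence" if it
# has at least one noun tag and at least one verb tag.
looks_like_sentence <- function(tokens) {
  any(grepl("^NN", tokens$tag)) && any(grepl("^VB", tokens$tag))
}

set.seed(42)
swapped <- lapply(headlines, swap_words, pool = pool)
sentence_like <- Filter(looks_like_sentence, swapped)
chosen <- if (length(sentence_like) > 0) sentence_like[[1]] else swapped[[1]]

# Format the result the way the posted updates look.
tweet <- paste0("POS News Update: '",
                toupper(paste(chosen$word, collapse = " ")), "'")
print(tweet)
```

Drawing replacements from the pool of words already seen in the day's headlines keeps the swaps topical, but a bad tag will still put the wrong kind of word in a slot.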

R Packages Used

For those who are interested, I used the following packages:
  • httr - used to authenticate with the Twitter API and create my Twitter token.  Follow the code here to replicate.  I used the POST() function to send my update to Twitter.  See here for code.
  • rvest - used to get the Google News headlines.
  • openNLP - used to do the POS tagging for the headlines and sentence evaluation.
I used Task Scheduler to run a batch file that runs the R file I created.  See here for the process to automate this.
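
For a rough idea of how the three packages fit together, here is a hedged sketch rather than my actual script.  The function names and their arguments (`url`, `css`, `status`, `token`) are made up for illustration.  The CSS selector depends on whatever markup the news page currently uses, so it is left as an argument.  The token is assumed to be an OAuth token already created with httr, and the `statuses/update.json` endpoint is the Twitter v1.1 one, which may have changed or been retired since.

```r
# Scrape headline text from a page (rvest).
# `url` and `css` are hypothetical arguments supplied by the caller.
get_headlines <- function(url, css) {
  page <- rvest::read_html(url)
  rvest::html_text(rvest::html_nodes(page, css))
}

# Tag one headline with openNLP's Maxent annotators and return a
# data frame with one row per word and its POS tag.
tag_headline <- function(text) {
  s <- NLP::as.String(text)
  # Sentence and word boundaries must be annotated before POS tagging.
  boundaries <- NLP::annotate(
    s,
    list(openNLP::Maxent_Sent_Token_Annotator(),
         openNLP::Maxent_Word_Token_Annotator())
  )
  tagged <- NLP::annotate(s, openNLP::Maxent_POS_Tag_Annotator(), boundaries)
  words <- subset(tagged, type == "word")
  data.frame(word = s[words],
             tag  = sapply(words$features, `[[`, "POS"),
             stringsAsFactors = FALSE)
}

# Send a status update with httr's POST().
# `token` is assumed to be an existing httr OAuth token for Twitter.
post_tweet <- function(status, token) {
  httr::POST(
    "https://api.twitter.com/1.1/statuses/update.json",
    query = list(status = status),
    httr::config(token = token)
  )
}
```

With functions like these saved in a script, the batch file that Task Scheduler runs would only need a single line calling Rscript on that file.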

Results

The results are mixed.  Sometimes the headlines make sense and sometimes they don't.  I suppose this isn't very surprising.  Headlines are difficult to analyze because they are not complete sentences, so the POS tags are often wrong.  Additional work is needed to produce better swaps and, consequently, more readable headlines.  However, my larger goal of being able to automatically post Tweets was met, and this may prove useful in the future for other projects of mine.

Here is my favorite post so far:

POS News Update: 'HUCKABEE MUSK, STEPHEN HAWKING WANT TO SAVE THE WEBSITE FROM LETTUCE CONCERNS'

You can see these for yourself by following me on Twitter at @philanalytics.