Titanic: Machine Learning from Disaster

January 4th, 2017 | kaggle, machine learning, R

This is a classic machine learning exercise popularized by Kaggle and often used as a 'machine learning 101' introduction. I'm currently working through the book 'Machine Learning with R Cookbook' by Yu-Wei, Chiu (David Chiu), and this is the first exercise in the book. To produce this post, I output the HTML from the R Notebook that [...]

Kaggle: Sentiment Analysis on Movie Reviews

November 17th, 2016 | kaggle, machine learning

Problem Statement (see competition on Kaggle here). This is an advanced sentiment analysis problem because it aims to capture signals that are not obvious, such as sarcasm or frustration. We do this by breaking the sentence down. For instance, if I say “The movie was OK but not that awesome”. What [...]

Cheatsheet – Python & R codes for common Machine Learning Algorithms

August 25th, 2016 | machine learning

I ran across this really cool cheatsheet on a great blog that I follow, Analytics Vidhya, and wanted to share it here. Admittedly, scikit-learn.org has pretty great quick-start documentation, but I still find this to be a useful piece of reference material. Enjoy! If you want to copy / paste code, download the [...]

HDFS & Pig Notes

March 1st, 2016 | Hadoop, Map Reduce

Some pretty scrappy notes that got me through my time learning Pig & HDFS.
Set up sshfs on the local machine <>
     sshfs -p 22 hsohn@4.26.4.XX:/ebs/user/hsohn /Users/hsohn/Documents/remoteHome/dev/ -o auto_cache,reconnect,defer_permissions,negative_vncache,volname=dev
Hadoop
For dev access, do
     ssh bi@4.26.4.XX
For prod access, do
     ssh bi@4.26.4.XX
Run a file in Pig
     pig -f query_20150917_test.pig
Run locally
     pig -x local script.pig
Retrieving results from HDFS
     hdfs dfs -text [path / directory]*.gz
Redirect [...]