In Part 2 of the “Getting Started Using Hadoop” series, I discussed how to build a Hadoop cluster on Amazon EC2 using Cloudera CDH. This post covers how to get your data into the Hadoop Distributed File System (HDFS), using the publicly available “Airline Dataset” as the example. There are multiple ways to upload data into HDFS, but this post covers only the easiest one: the Hue ‘File Browser’ interface.
Innovation Will Never Be At The Push Of A Button
May 17, 2013
“@randyzwitch @benjamingaines @usujason I am envisioning the data science equivalent of an autonomous vehicle pileup.” (Todd Belcher, @toddmetrics, May 16, 2013)

Recently, I've been getting my blood pressure up reading (marketing) … [Continue reading]
Getting Started Using Hadoop, Part 2: Building a Cluster
April 25, 2013
In Part 1 of this series, I discussed some of the basic concepts behind Hadoop, specifically when it's appropriate to use Hadoop to solve your data engineering problems and the terminology of the Hadoop ecosystem. This post will cover how to install … [Continue reading]



