Getting Started Using Hadoop, Part 3: Loading Data

In part 2 of the “Getting Started Using Hadoop” series, I discussed how to build a Hadoop cluster on Amazon EC2 using Cloudera CDH. This post will cover how to get your data into the Hadoop Distributed File System (HDFS) using the publicly available “Airline Dataset“. While there are multiple ways to upload data into HDFS, this post will only cover the easiest method, which is to use the Hue ‘File Browser’ interface.

[Continue reading]

Innovation Will Never Be At The Push Of A Button

@randyzwitch @benjamingaines @usujason I am envisioning the data science equivalent of an autonomous vehicle pileup. — Todd Belcher (@toddmetrics) May 16, 2013   Recently, I've been getting my blood pressure up reading (marketing) … [Continue reading]

Getting Started Using Hadoop, Part 2: Building a Cluster

In Part 1 of this series, I discussed some of the basic concepts around Hadoop, specifically when it's appropriate to use Hadoop to solve your data engineering problems and the terminology of the Hadoop eco-system. This post will cover how to install … [Continue reading]