In Part 2 of the “Getting Started Using Hadoop” series, I discussed how to build a Hadoop cluster on Amazon EC2 using Cloudera CDH. This post covers how to get your data into the Hadoop Distributed File System (HDFS), using the publicly available “Airline Dataset” as the example. While there are multiple ways to upload data into HDFS, this post covers only the easiest method: the Hue ‘File Browser’ interface.
@randyzwitch @benjamingaines @usujason I am envisioning the data science equivalent of an autonomous vehicle pileup.
— Todd Belcher (@toddmetrics) May 16, 2013

Recently, I've been getting my blood pressure up reading (marketing) … [Continue reading]
In Part 1 of this series, I discussed some of the basic concepts behind Hadoop, specifically when it's appropriate to use Hadoop to solve your data engineering problems and the terminology of the Hadoop ecosystem. This post will cover how to install … [Continue reading]