Big Data Course Descriptions
Apache Hadoop Fundamentals
Apache Hadoop is an open-source platform designed to query and analyze big data distributed across large clusters of servers with a very high degree of fault tolerance. The strength of the clusters comes from the software's ability to detect and handle failures at the application layer instead of relying on high-end hardware. This hands-on course is designed for programmers seeking to analyze datasets and for administrators who need to set up and run Hadoop clusters. Topics include the Hadoop Distributed File System (HDFS); MapReduce; Hadoop's data and I/O building blocks; how to design, build, and administer a Hadoop cluster; how to run Hadoop in the cloud; Sqoop; the Pig query language; Hive; HBase; and ZooKeeper.
Prerequisites: Prior programming experience is required; prior Java programming experience is recommended.
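For readers unfamiliar with the MapReduce model named in the topics above, the following is a minimal sketch of its three stages (map, shuffle, reduce) as a word count. It is plain, single-process Python written for illustration; it does not use Hadoop or its API, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as a framework would between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence in a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters"]))
```

On a real cluster the map and reduce steps would run in parallel on many machines and the framework would perform the shuffle; this sketch only shows how the data flows between the stages.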
Big Data: MongoDB, ACID, BigTable, CouchDB
Traditional relational databases struggle to keep up with the volume and speed that Big Data demands. In response, a new set of technologies has emerged, termed Big Data or NoSQL (meaning Not Only SQL). Although these Big Data databases differ, they share common characteristics: they are designed to be very fast and to scale to vast sets of data. In this course, students study Big Data concepts, technologies, and the new set of techniques. Topics include emerging tools and concepts such as MongoDB, ACID, BigTable, and CouchDB; building Big Data systems; and new tools designed for Big Data.
Prerequisites: There are no prerequisites for this class, but prior relational database experience is recommended.
Programming Apache Hive and Pig
Hive and Pig are higher-level abstractions that allow students to manage and manipulate data in a Hadoop cluster without Java programming experience. Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. Apache Hive is Hadoop's data warehouse infrastructure. In this hands-on course, students learn the Pig Latin scripting language, the Grunt shell, and Pig User Defined Functions (UDFs), as well as how to use Hive's SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop. Pig topics include Pig's data model; Pig Latin scripts to sort, group, join, project, and filter data; Grunt; and load and store functions. Hive topics include how to create, alter, and drop databases, tables, views, functions, and indexes; how to load and extract data from tables; and how to perform queries, grouping, filtering, joining, and other conventional query operations.
Prerequisites: There are no prerequisites for this class, but prior experience with SQL and a scripting language is recommended.
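To give a sense of the query operations listed above (filter, group, and sort), the sketch below applies them to a small in-memory list in plain Python. It is an illustration only: it is neither Pig Latin nor HiveQL, and the visits records, their field names, and the summarize function are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample records standing in for rows loaded from a table.
visits: List[Dict[str, object]] = [
    {"user": "ana", "page": "/home", "bytes": 1200},
    {"user": "ben", "page": "/docs", "bytes": 300},
    {"user": "ana", "page": "/docs", "bytes": 800},
    {"user": "ben", "page": "/home", "bytes": 2500},
]


def summarize(rows: List[Dict[str, object]], min_bytes: int) -> List[Tuple[str, int]]:
    """Filter rows, group them by user, sum bytes per user, and sort by total."""
    # Filter: keep only rows at or above the size threshold.
    filtered = [row for row in rows if int(row["bytes"]) >= min_bytes]

    # Group and aggregate: total bytes per user.
    totals: Dict[str, int] = defaultdict(int)
    for row in filtered:
        totals[str(row["user"])] += int(row["bytes"])

    # Sort: largest total first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for user, total in summarize(visits, min_bytes=500):
        print(user, total)
```

Pig Latin and HiveQL express the same steps declaratively and let the cluster decide how to execute them across machines, which is what makes them practical for datasets too large for a single process.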