Certificate in Big Data and Hadoop

Learn to use Apache Hadoop to build powerful applications to analyse Big Data

Course at a Glance

Mode of learning : Online - Instructor-led (Live Virtual Classroom, LVC)

Domain / Subject : Engineering & Technology

Function : Information Technology(IT)

Starts on : 24th Aug 2014

Difficulty : Medium

How will this Big Data and Hadoop Certificate benefit you?

  • Learn to use Apache Hadoop to build powerful applications to analyse Big Data
  • Understand the Hadoop Distributed File System (HDFS)
  • Learn to setup, manage and monitor Hadoop cluster
  • Learn about Apache Hive, how to install Hive, and how to run HiveQL queries to create tables, load data, etc.
  • Learn about Apache Pig and the Pig Latin scripting language
  • Learn about Apache Sqoop, and how to run scripts to transfer data between Hadoop and relational databases
  • Learn about Apache HBase, and how to perform real-time read/write access to your Big Data
  • Learn how to deploy Hadoop on the cloud

What Jobs does this Big Data and Hadoop Certificate prepare you for?

  • Hadoop/Big Data Developer/Solutions Engineer
  • Software Engineer - Real Time/ Big Data Distributed Systems
  • Big Data Platform Architect
  • Data Scientist to help analyse big data and generate reports

Course materials provided to Learn Big Data and Hadoop online

  • Apart from the Big Data and Hadoop course videos, we will courier you the book "Hadoop: The Definitive Guide" by Tom White. This textbook covers, in further detail, the Big Data and Hadoop topics taught in the videos

Kind Of Placement Assistance Provided

  • Help you rewrite your resume to showcase the skills you have learnt in the Big Data and Hadoop course
  • Give you mock interview practice
  • Give you Career guidance

Learn Big Data and Hadoop Online

  • 60 modules and each module contains a 30 minute video followed by an adaptive quiz


  • Days: Sunday - Friday
  • 06:00 PM - 08:00 PM US PDT

Kindly Note: Plus 1 Extra Project

  • Work on an end to end Hadoop project with a Mentor
  • Four 30-minute one-on-one Skype calls with a Mentor
  • Fees:  Rs.38,900

1. Module 1

What is Big Data
  • Why did Big Data suddenly become so prominent
  • Who are the main vendors in the space - Cloudera, Hortonworks
  • Companies using Hadoop and use cases in different domains
  • Limitations of traditional large-scale system architectures
  • How Hadoop overcomes the limitations of traditional large-scale system architectures

2. Module 2

Hadoop Architecture / Introduction to Hadoop Distributed File System (HDFS)
  • Introduction and history of Hadoop
  • Core components of Hadoop
  • Understanding Hadoop Master-Slave Architecture
  • Learn about NameNode, DataNode, Secondary NameNode
  • Learn about JobTracker, TaskTracker
  • Understanding HDFS Architecture
  • Anatomy of Read and Write data on HDFS
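
The read/write anatomy above starts from how HDFS splits a file into fixed-size blocks and replicates each block across DataNodes. The arithmetic can be sketched in Python (64 MB is the Hadoop 1 default block size; the figures are illustrative, not Hadoop API code):

```python
import math

def hdfs_block_layout(file_size_mb, block_size_mb=64, replication=3):
    """Sketch of how HDFS lays out a file.

    The file is split into fixed-size blocks (64 MB default in Hadoop 1),
    and each block is stored on `replication` DataNodes (default 3).
    """
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    total_stored_mb = file_size_mb * replication
    return num_blocks, total_stored_mb

# A 200 MB file becomes 4 blocks (64 + 64 + 64 + 8 MB),
# and 3x replication stores 600 MB in total across the cluster.
print(hdfs_block_layout(200))  # (4, 600)
```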

3. Module 3

Installing and setting up a Hadoop Cluster
  • Understanding MapReduce Framework Architecture
  • Hadoop deployment modes - Standalone, Pseudo-distributed (single node), Fully distributed (multinode)
  • Understand the important configuration files in a Hadoop Cluster
  • Important web URLs for Hadoop
  • Run HDFS and Linux commands
  • Execute some examples to get a high level understanding
  • Manuals for installation of Hadoop 1 and Hadoop 2 will be provided (Mac and Ubuntu)
  • Manual with Demo VM installation steps for Windows
  • Manual for Multinode Hadoop Cluster installation on AWS
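
As a taste of what the installation manuals cover, a pseudo-distributed (single-node) setup comes down to a couple of configuration properties. The sketch below shows minimal `core-site.xml` and `hdfs-site.xml` fragments (Hadoop 2 property names; the host, port and values are illustrative):

```xml
<!-- core-site.xml: point the default filesystem at the local NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: a single node can hold only one replica per block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```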

4. Module 4

Understanding Hadoop MapReduce Framework
  • Overview of the MapReduce Framework
  • Use cases of MapReduce
  • MapReduce Process
  • Anatomy of MapReduce Program
  • MapReduce Flow
  • Pseudo code to understand the concepts of Mappers, Reducers, Combiners
  • Splits and Blocks
  • Writing MapReduce Mappers, Reducers and combiners in Java using Eclipse
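
The map-shuffle-reduce flow described above can be sketched in plain Python. This is a conceptual word-count simulation of what the framework does, not Hadoop API code:

```python
from collections import defaultdict

# map phase: emit one (word, 1) pair per word, like a Mapper's map() call
def mapper(line):
    for word in line.split():
        yield (word, 1)

# shuffle phase: group values by key, as the framework does between phases
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# reduce phase: sum the grouped counts, like a Reducer's reduce() call
def reducer(key, values):
    return (key, sum(values))

lines = ["big data", "big hadoop"]
pairs = [p for line in lines for p in mapper(line)]
result = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(result)  # {'big': 2, 'data': 1, 'hadoop': 1}
```

A Combiner would apply the same summing step on each mapper's local output before the shuffle, reducing network traffic.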

5. Module 5

Advance MapReduce - Part 1
  • Writing Map and Reduce in other languages (not Java)
  • Understanding Partitioners, writing your own Partitioner
  • Distributed Cache
  • Joining Multiple datasets in MapReduce
  • MapSide Join
  • Reduce Side Join
  • Semi Join & Counters
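
A partitioner decides which reducer receives a given key. Hadoop's default HashPartitioner uses hash(key) mod numReduceTasks; the Python sketch below simulates that idea alongside a hypothetical custom partitioner (the function names are illustrative, not Hadoop API):

```python
# Default behaviour: hash(key) mod numReduceTasks.
# (Python's hash() for strings is salted per run, so the exact reducer
# chosen here varies between runs - the *rule* is what matters.)
def hash_partition(key, num_reducers):
    return hash(key) % num_reducers

# Hypothetical custom partitioner: route keys by first letter, so all
# words starting with the same letter meet in the same reducer.
def first_letter_partition(key, num_reducers):
    return ord(key[0].lower()) % num_reducers

for word in ["apple", "avocado", "banana"]:
    print(word, first_letter_partition(word, 4))
# "apple" and "avocado" land on the same reducer; "banana" on another
```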

6. Module 6

Advance MapReduce - Part 2

  • Understand different Input Output Formats
  • Hadoop Data Types
  • Using the Writable and WritableComparable interfaces
  • Custom Input Format
  • Sequence Files
  • JUnit and MRUnit testing frameworks, writing and running unit tests
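
The Writable/WritableComparable contract boils down to three things: serialise, deserialise, and compare for sorting. The Python sketch below simulates that contract; `IntPair` and its methods are illustrative stand-ins, not the Java API:

```python
import io
import struct

class IntPair:
    """Sketch of a WritableComparable-style composite key."""

    def __init__(self, first=0, second=0):
        self.first, self.second = first, second

    def write(self, out):
        # like Writable.write(DataOutput): fixed-width big-endian ints
        out.write(struct.pack(">ii", self.first, self.second))

    def read_fields(self, inp):
        # like Writable.readFields(DataInput)
        self.first, self.second = struct.unpack(">ii", inp.read(8))

    def __lt__(self, other):
        # plays the role of compareTo(): sort by first, then second
        return (self.first, self.second) < (other.first, other.second)

buf = io.BytesIO()
IntPair(3, 7).write(buf)   # serialise
buf.seek(0)
p = IntPair()
p.read_fields(buf)         # deserialise
print(p.first, p.second)   # 3 7
```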

7. Module 7

Apache Pig
  • Introduction to Pig and the need for Pig
  • Why Pig and not MapReduce
  • Pig components
  • Pig execution modes
  • Setting up and running Pig
  • Pig shell - Grunt
  • Pig Latin, writing Pig Latin scripts
  • Pig data types
  • Pig operators - arithmetic, relational
  • Storage types
  • Diagnosing Pig commands
  • UDFs and external scripts
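
To show what a Pig Latin dataflow expresses, the Python sketch below mimics a hypothetical LOAD / GROUP / FOREACH ... COUNT script (the Pig Latin in the comments is illustrative and is not executed here):

```python
from collections import defaultdict

# The hypothetical Pig Latin script this mimics:
#   logs   = LOAD 'visits.csv' AS (user, url);
#   groups = GROUP logs BY user;
#   counts = FOREACH groups GENERATE group, COUNT(logs);
visits = [("alice", "/home"), ("bob", "/cart"), ("alice", "/cart")]

groups = defaultdict(list)            # GROUP logs BY user
for user, url in visits:
    groups[user].append(url)

counts = {user: len(urls)             # FOREACH ... GENERATE group, COUNT
          for user, urls in groups.items()}
print(counts)  # {'alice': 2, 'bob': 1}
```

In Pig, each of these steps is compiled into MapReduce jobs for you, which is the appeal over hand-written Java.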

8. Module 8

Apache Hive and HiveQL
  • Introduction to Hive
  • History of Hive and Facebook
  • Pig vs. Hive
  • Understand how queries are converted into MapReduce jobs
  • Hive Data Types
  • Hive DDL
  • Hive DML commands
  • HiveQL - Importing data, sorting and aggregating
  • Understand the Hive architecture
  • Hive MetaStore
  • Writing join queries and inserting data back into Hive
  • Difference between traditional RDBMS and Hive
  • Choosing between PIG, Hive and MapReduce
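
Since HiveQL is closely modelled on SQL, a standard aggregate query gives a feel for it. The sketch below uses Python's built-in sqlite3 as a stand-in; Hive itself would compile the same kind of query into MapReduce jobs over data in HDFS:

```python
import sqlite3

# In-memory table standing in for a Hive table (schema is illustrative)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10), ("west", 5), ("east", 7)])

# A HiveQL-style aggregation: GROUP BY becomes a shuffle in Hive,
# SUM becomes the reduce step
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 17), ('west', 5)]
```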

9. Module 9

Advance HiveQL
  • Multi Table Inserts
  • Hive Complex Data Types
  • HiveQL - Joins, Map joins
  • Cubes, Rollups
  • Running Custom Map Reduce Scripts
  • Hive Tables and storage formats
  • UDF and UDAF

10. Module 10

Apache Flume
  • Overview of Flume
  • Where Flume is used - ingesting streaming/unstructured data
  • Flume Architecture
  • Using Flume to load data into HDFS
  • Using Flume to load data into Hive

11. Module 11

Apache Sqoop
  • Overview of Sqoop
  • Where is Sqoop used - import/export structured data
  • Using Sqoop to import data from RDBMS into HDFS
  • Using Sqoop to import data from RDBMS into Hive
  • Using Sqoop to export data from HDFS into RDBMS
  • Sqoop connectors

12. Module 12

Apache Oozie
  • Introduction to Oozie
  • Oozie workflow jobs
  • Oozie coordinator jobs
  • Creating Oozie Workflows
  • Using HUE UI for Oozie
  • Using CLI to run and track workflows

13. Module 13

NoSQL Databases
  • Introduction to NoSQL database
  • Types of NoSQL databases and their features
  • Brewer's CAP Theorem
  • Advantages of NoSQL over traditional RDBMS

14. Module 14

Introduction to MongoDB and Apache Cassandra

  • Introduction to MongoDB
  • MongoDB Architecture
  • MongoDB documents and CRUD Operations
  • Introduction to Apache Cassandra
  • Overview of Cassandra - data model, reading/writing data, CQL
  • MongoDB vs. Cassandra

15. Module 15

Apache HBase
  • Introduction to HBase
  • Why use HBase
  • HBase Architecture - read and write paths
  • HBase vs. RDBMS
  • Installation and Configuration
  • Schema design in HBase - column families, hotspotting
  • Accessing data with HBase Shell
  • Accessing data with the HBase API - reading, adding, and updating data from the shell and the Java API
  • SCAN and Advanced API
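
HBase stores a table as a sorted map from row key to column families to qualified cells. The Python sketch below simulates that data model plus a Scan over a row-key range ('info' is a hypothetical column family; this is a conceptual sketch, not the HBase API):

```python
# table: row key -> column family -> column qualifier -> value
table = {}

def put(row, family, qualifier, value):
    table.setdefault(row, {}).setdefault(family, {})[qualifier] = value

def get(row, family, qualifier):
    return table.get(row, {}).get(family, {}).get(qualifier)

def scan(start_row, stop_row):
    # like an HBase Scan: rows come back in sorted row-key order,
    # start row inclusive, stop row exclusive
    return [(r, table[r]) for r in sorted(table) if start_row <= r < stop_row]

put("user#001", "info", "name", "alice")
put("user#002", "info", "name", "bob")
print(get("user#001", "info", "name"))               # alice
print([r for r, _ in scan("user#000", "user#002")])  # ['user#001']
```

Because rows are kept sorted by key, row-key design (the "schema design" above) decides whether reads are cheap range scans or scattered lookups, and whether writes hotspot on one region.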

16. Module 16

Apache Zookeeper
  • Overview of Zookeeper
  • Uses of Zookeeper
  • Zookeeper Service
  • Zookeeper Data Model
  • Using Zookeeper with HBase
  • Building applications with Zookeeper

17. Module 17

Hadoop 2.0, YARN, MRv2

  • Understand new features in Hadoop 2.0
  • NameNode High Availability
  • Federation and Namespaces
  • Schedulers
  • Introduction to YARN
  • YARN architecture
  • Upgrading MRv1 to MRv2
  • Developing application using MapReduce version 2
  • Manuals for Hadoop 2 installation

18. Module 18

  • Demo of 2 sample projects.
  • A few openly available large datasets will be shared with all attendees.
  • Attendees will choose one of the datasets and perform an analysis on it using the various technologies learnt in the course: Flume and Sqoop to load data into HDFS; Hive, Pig and HBase to analyse the data; and optionally Oozie to schedule and chain the Hadoop jobs. The project will give you a complete understanding of the Hadoop Ecosystem and how all the pieces come together.


