Mode of learning : Online - Self Paced
Domain / Subject : Engineering & Technology
Function : Information Technology(IT)
Duration : 41 Hours
Difficulty : Basic
About the Course
The Hadoop Big Data online video certification Analyst course is designed for professionals working in data warehousing, business intelligence, databases, or mainframes, and for anyone comfortable with basic SQL who wants to pursue a career in designing, developing, and architecting Hadoop-based solutions.
It emphasizes understanding what Hadoop is, how data flows through it, and how it enables storage and large-scale processing, with a deep dive into Hive and Pig, an introduction to Impala and other Hadoop ecosystem projects, and basic administration such as installing a single-node cluster and a multi-node cluster on EC2. The Hadoop tutorial provided as part of the training contains an in-depth description of the topics mentioned.
As part of the Hadoop Data Warehousing / Analyst course, we also cover how ETL tools such as Pentaho or Talend can connect to the Hadoop ecosystem.
The key objectives of this online training are to:
• Set up Hadoop infrastructure with single-node and multi-node clusters on Amazon EC2 (CDH4).
• Connect ETL tools to Hadoop and work through real-time case studies.
• Gain detailed hands-on experience with Impala for real-time queries on Hadoop.
• Write Hive and Pig scripts and work with Sqoop.
• Understand YARN (MRv2), introduced in Hadoop Release 2.0.
• Implement HBase, MapReduce integration, advanced usage, and advanced indexing.
• Work on a real-life Big Data analytics project and gain hands-on project experience.
• Implement LinkedIn-style algorithms – identifying the shortest path for 1st-level or 2nd-level connections using MapReduce.
• Play with datasets – a Twitter dataset for sentiment analysis, a weather dataset, and a loan dataset.
• Receive guidance and quizzes to prepare for professional certification exams such as Cloudera's.
• Gain the ability to design and develop applications involving large data volumes using the Hadoop ecosystem.
• Receive 3 months of support for the latest version of the technology or product, shared in the form of recorded sessions.
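The LinkedIn-style shortest-path objective above can be pictured as iterative map/reduce rounds over a connection graph. Below is a minimal local sketch in plain Python; the graph, the names in it, and the two-level cutoff are illustrative assumptions, not course material:

```python
# Hypothetical connection graph (adjacency list) -- illustrative only.
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "eve"],
    "eve": ["dave"],
}

def connection_levels(graph, source, max_level=2):
    """One BFS round per level, mimicking iterative MapReduce passes:
    each pass 'maps' the current frontier to its neighbours, then
    'reduces' by keeping only the first (shortest) level seen."""
    levels = {source: 0}
    frontier = [source]
    for level in range(1, max_level + 1):
        next_frontier = []
        for person in frontier:
            for neighbour in graph[person]:
                if neighbour not in levels:  # first sighting = shortest path
                    levels[neighbour] = level
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return levels

# 1st-level and 2nd-level connections of "alice".
levels = connection_levels(GRAPH, "alice")
```

On a real cluster each round would be a MapReduce job whose mapper emits `(neighbour, level)` pairs and whose reducer keeps the minimum level per person; the loop above plays both roles locally.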
Module 1 – Introduction to Hadoop and its Ecosystem, Map Reduce and HDFS
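MapReduce, covered in this module, is easiest to grasp through the classic word-count example. The sketch below simulates the map, shuffle/sort, and reduce phases locally in Python; on a real cluster, Hadoop would run the mapper and reducer as separate processes over HDFS blocks, so this is a teaching sketch rather than cluster code:

```python
from itertools import groupby

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield word, 1

def reducer(word, counts):
    # Reduce phase: sum the counts emitted for one word.
    return word, sum(counts)

def word_count(lines):
    # Shuffle/sort: group mapper output by key, as Hadoop does
    # between the map and reduce phases.
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return dict(reducer(word, (count for _, count in group))
                for word, group in groupby(pairs, key=lambda kv: kv[0]))

counts = word_count(["hadoop stores data", "hadoop processes data"])
```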
Module 2 – Hands on Exercises
1. Introduction to Sqoop, use cases, and installation
2. Introduction to Hive, use cases, and installation
3. Introduction to Pig, use cases, and installation
4. Introduction to Oozie, use cases, and installation
5. Introduction to Flume, use cases, and installation
6. Introduction to YARN
Assignment – 1
Mini Project – Importing MySQL Data using Sqoop and Querying it using Hive
Module 3 – Deep Dive in Map Reduce
Module 4 – Hive
1. Introduction to Hive
2. Relational Data Analysis with Hive
3. Hive Data Management
4. Extending Hive
5. Hands-on Exercises – Working with huge datasets and querying extensively
6. User-Defined Functions, Optimizing Queries, Tips and Tricks for Performance Tuning
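User-defined logic in Hive is usually packaged as a Java UDF, but Hive can also stream rows through an external script with its TRANSFORM clause, which passes each row to the script as a tab-separated line on stdin. A minimal sketch of such a script in Python; the two-column layout (user_id, city) and the normalisation rule are assumed examples:

```python
import sys

def transform(line):
    # Hive sends each row as tab-separated columns; here we assume
    # two columns (user_id, city) and normalise the city name.
    user_id, city = line.rstrip("\n").split("\t")
    return f"{user_id}\t{city.strip().title()}"

if __name__ == "__main__" and not sys.stdin.isatty():
    # When invoked by Hive, rows arrive on stdin and results go to stdout.
    for row in sys.stdin:
        print(transform(row))
```

In HiveQL the script would be invoked along the lines of `SELECT TRANSFORM(user_id, city) USING 'python normalise.py' AS (user_id, city) FROM users;` (script name and table are assumptions for illustration).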
Module 5 – Pig
1. Introduction to Pig
2. Basic Data Analysis with Pig
3. Processing Complex Data with Pig
4. Multi-Dataset Operations with Pig
5. Extending Pig
6. Pig Jobs
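The multi-dataset operations in this module, such as Pig's JOIN, can be pictured in plain Python. The sketch below shows conceptually what a statement like `JOIN orders BY cust_id, customers BY id` computes via a hash join; the relations and field positions are illustrative assumptions, not Pig internals:

```python
from collections import defaultdict

# Two illustrative relations, as lists of tuples (like Pig bags of tuples).
customers = [(1, "alice"), (2, "bob")]
orders = [(101, 1, 25.0), (102, 1, 10.0), (103, 2, 7.5)]

def inner_join(left, right, left_key, right_key):
    """Hash join: index one relation by its key field, then probe the
    index with each row of the other relation."""
    index = defaultdict(list)
    for row in left:
        index[row[left_key]].append(row)
    # Each match concatenates the fields of both rows, as Pig's JOIN does.
    return [l + r for r in right for l in index[r[right_key]]]

# Join customers on field 0 (id) with orders on field 1 (cust_id).
joined = inner_join(customers, orders, left_key=0, right_key=1)
```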
Module 6 – Impala
1. Introduction to Impala
2. Choosing the Best Tool (Hive, Pig, Impala)
Module 7 – Cluster Planning
Module 8 – Hadoop Cluster Setup and Running Map Reduce Jobs – Multinode Setup
Module 9 – Major Project – Putting it all together and Connecting the Dots
Module 10 – ETL Connectivity with Hadoop Ecosystem
Module 11 – Job and certification support
Assignment – 3
Disclaimer: The contents of this course and institute listing are obtained from the institute's website by automated scraping or manual updates. For the latest information, please refer to the institute's website directly. For any discrepancies in the content, contact us at