

This course is designed to make participants proficient in Hadoop and Map-Reduce through hands-on practice. It also covers the various eco-system tools (Hive, Pig, Sqoop, Flume).



The following are the prerequisites for the Hadoop Development course:

Core Java with collections

Linux commands


Module 1: Big Data Concepts

  • Understand big data, its challenges, and the distributed environment.
  • Be aware of Hadoop and its sub-projects.
  • Introduction
  • Data
  • Storage
  • Big data
  • Distributed environment
  • Hadoop introduction
  • History
  • Environment
  • Benefits
  • Hadoop Components / Eco-Systems
  • Cluster Deployment
  • Pseudo vs. Fully Distributed
  • Arranging a cluster for practice
  • Cloudera cluster environment

Module 2: HDFS

Be aware of the HDFS components: NameNode and DataNode.

Understand how data is stored and maintained in the cluster, and how data is read from and written to the cluster.

  • Able to maintain files in HDFS
  • Able to access data in HDFS from a Java program
  • HDFS Architecture
  • NameNode
  • DataNode
  • Fault Tolerance
  • Read & Write operations
  • Interfaces (command-line interface, JSP, API)
  • HDFS Shell
  • FS Shell Commands
  • Uploading & Downloading
  • Directory Handling
  • File Handling
  • Use Cases
  • Using Hue for browsing data

Module 3: Map-Reduce (for Java Programmers)

  • Understand the Map-Reduce paradigm and the YARN architecture. Analyze a given problem in the map-reduce pattern. Able to implement map-reduce applications.
  • Map-Reduce Introduction
  • Map-Reduce Architecture
  • Work Flow of an MR Program
  • Placement of components on the cluster
  • MR on HDFS
  • YARN Architecture
  • Designing an application on MR
  • Implementation
  • Detailed description of M-R methods
  • Key/value pairs
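The map -> shuffle -> reduce flow over key/value pairs can be illustrated without a cluster. Below is a minimal, self-contained Java sketch (no Hadoop dependency; the class name `WordCountSketch` is ours, not Hadoop's) that runs the classic word-count example. In a real job, each phase runs in parallel across the cluster.

```java
import java.util.*;

// Self-contained sketch of the map -> shuffle -> reduce flow on
// key/value pairs, using word count. No Hadoop classes are used.
public class WordCountSketch {

    // Map phase: each input line is turned into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // Shuffle phase: group all values by key (sorted, as Hadoop does).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the grouped counts for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, ones) ->
                counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : List.of("big data is big", "hadoop handles big data")) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(shuffle(pairs)));
        // {big=3, data=2, hadoop=1, handles=1, is=1}
    }
}
```

The same three methods correspond to the Mapper, the framework's shuffle/sort, and the Reducer in a real Hadoop job.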

Module 4: Data Ingestion

  • Understand the Data ingestion and types
  • Recognize various Data Ingestion tools
  • Hive Architecture
  • Introduction
  • Types of Data Ingestion
  • Ingesting Batch Data
  • Ingesting Streaming Data
  • Use Cases

Module 5: Apache Sqoop

  • Understand the Sqoop architecture and its uses
  • Able to load real-time data from an RDBMS table/query onto HDFS
  • Able to write Sqoop scripts for exporting data from HDFS onto RDBMS tables
  • Introduction
  • Sqoop Architecture
  • Connect to MySQL database
  • Sqoop – Import
  • Importing to a specific location
  • Querying with import
  • Sqoop – Import-all
  • Integrating with Hive
  • Export
  • Eval
  • Joins
  • Use Cases

Module 6: Apache Flume

  • Understand Flume architecture and uses
  • Able to create flume configuration files to stream and ingest data onto HDFS
  • Introduction
  • Flume Architecture
  • Flume Master
  • Flume Agents
  • Flume Collectors
  • Creation of Flume configuration files
  • Streaming from local disk
  • Streaming from web / social networking sources
  • Examples
  • Use Cases
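A Flume agent is defined entirely in a properties file that wires a source to a sink through a channel. The fragment below is a hypothetical example (the agent name `agent`, the log path, and the NameNode address are placeholders) of streaming a local log file onto HDFS:

```properties
# Name the components of this agent
agent.sources  = src
agent.channels = ch
agent.sinks    = snk

# Source: tail a local log file (exec source)
agent.sources.src.type     = exec
agent.sources.src.command  = tail -F /var/log/app.log
agent.sources.src.channels = ch

# Channel: buffer events in memory between source and sink
agent.channels.ch.type = memory

# Sink: write events onto HDFS
agent.sinks.snk.type      = hdfs
agent.sinks.snk.hdfs.path = hdfs://namenode:8020/flume/events
agent.sinks.snk.channel   = ch
```

The agent is then started with `flume-ng agent --name agent --conf-file <this file>`, after which events flow source -> channel -> sink continuously.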

Module 7: Data Transformation (Pig)

  • Understand data types, the data model, and modes of execution.
  • Able to store the data from a Pig relation onto HDFS.
  • Able to load data into a Pig relation with or without a schema.
  • Able to split, join, filter, and transform the data using Pig operators.
  • Able to write Pig scripts and work with UDFs.
  • Introduction
  • Pig Data Flow Engine
  • Map Reduce vs. Pig
  • Data Types
  • Basic Pig Programming
  • Modes of execution in Pig
  • Loading
  • Storing
  • Group
  • Filter
  • Join
  • Order
  • Flatten
  • Cogroup
  • Illustrate
  • Explain
  • Parameter substitution
  • Creating simple UDFs in Pig
  • Use Cases
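Pig operators such as FILTER, GROUP, and FOREACH correspond to familiar collection operations from the Core Java prerequisite. The sketch below (our own in-memory illustration, not Pig itself; the class and field names are hypothetical) shows what a FILTER-then-GROUP pipeline computes:

```java
import java.util.*;
import java.util.stream.*;

// In-memory sketch of a Pig pipeline (illustration only, not Pig):
//   logs   = LOAD ...                              -> the 'rows' list
//   errors = FILTER logs BY level == 'ERROR';
//   byApp  = GROUP errors BY app;
//   counts = FOREACH byApp GENERATE group, COUNT(errors);
public class PigPipelineSketch {
    record LogRow(String app, String level) {}

    static Map<String, Long> errorCountsByApp(List<LogRow> rows) {
        return rows.stream()
                .filter(r -> r.level().equals("ERROR"))      // FILTER
                .collect(Collectors.groupingBy(LogRow::app,  // GROUP BY app
                        TreeMap::new,
                        Collectors.counting()));             // FOREACH ... COUNT
    }

    public static void main(String[] args) {
        List<LogRow> rows = List.of(
                new LogRow("web", "ERROR"), new LogRow("web", "INFO"),
                new LogRow("db", "ERROR"), new LogRow("web", "ERROR"));
        System.out.println(errorCountsByApp(rows)); // {db=1, web=2}
    }
}
```

The difference in practice is that Pig compiles the same dataflow into MapReduce jobs that run over HDFS-scale data rather than an in-memory list.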


Module 8: Hive & HCatalog

Understand the importance of Hive and the Hive architecture.

Able to create managed, external, partitioned, and bucketed tables. Able to query the data and perform joins between tables. Understand the storage formats of Hive. Understand vectorization in Hive.

  • Hive vs. RDBMS
  • HiveQL and Shell
  • Data Types
  • Hive Commands
  • Hive Tables
  • Managed Tables
  • External Tables
  • Inserting from other tables
  • Loading into partitions
  • Dynamic partitioning
  • Distribute by
  • Using HCatStorer
  • Use Cases
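With dynamic partitioning, Hive routes each row to a partition directory derived from the row's partition-column value, producing paths like `<table>/country=US/`. The plain-Java sketch below (our own illustration of the directory layout only; Hive performs this internally, and the table and column names are hypothetical) models that routing:

```java
import java.util.*;

// Sketch of how Hive lays out dynamically partitioned data:
// each distinct partition-column value becomes a directory such as
// sales/country=US/. This models only the path logic, nothing else.
public class HivePartitionSketch {
    static String partitionPath(String table, String partitionCol, String value) {
        return table + "/" + partitionCol + "=" + value + "/";
    }

    public static void main(String[] args) {
        // Rows are routed to a partition by their partition-column value:
        Map<String, List<String>> layout = new TreeMap<>();
        String[][] rows = {{"alice", "US"}, {"bob", "IN"}, {"carol", "US"}};
        for (String[] row : rows) {
            layout.computeIfAbsent(partitionPath("sales", "country", row[1]),
                    k -> new ArrayList<>()).add(row[0]);
        }
        System.out.println(layout);
        // {sales/country=IN/=[bob], sales/country=US/=[alice, carol]}
    }
}
```

A query that filters on the partition column can then skip every directory whose value does not match, which is what makes partition pruning effective.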




Module 9: Introduction to Other Eco-Systems

  • Able to migrate to other eco-systems
  • Use Cases

Course Description

Hadoop is an open-source, Java-based programming framework that supports the processing and storage of extremely large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.

Course Information

  • Duration: 30 hrs
  • Timings: Weekdays: 1-2 hours per day (or) Weekends: 2-3 hours per day
  • Method: Online/Classroom Training
  • Study Material: Soft Copy

Course Content



Understanding Big Data and Hadoop


  •   What Is Big Data?
  •   Limitations and Solutions of existing Data Analytics Architecture
  •   Hadoop Features
  •   Hadoop Ecosystem
  •   Hadoop 2.x core components
  •   Hadoop Storage: HDFS
  •   Hadoop Processing: MapReduce Framework


Hadoop Architecture and HDFS
  •   Hadoop 2.x Cluster Architecture – Federation and High Availability
  •   A Typical Production Hadoop Cluster
  •   Hadoop Cluster Modes
  •   Common Hadoop Shell Commands
  •   Hadoop 2.x Configuration Files
  •   Single-node and multi-node cluster set-up
  •   Hadoop Administration


Hadoop MapReduce Framework
  •   MapReduce Use Cases
  •   Traditional way Vs MapReduce way
  •   Why MapReduce
  •   Hadoop 2.x MapReduce Architecture
  •   Hadoop 2.x MapReduce Components
  •   YARN MR Application Execution Flow
  •   YARN Workflow
  •   Demo on MapReduce
  •   Input Splits
  •   Relation between Input Splits and HDFS Blocks
  •   MapReduce: Combiner & Partitioner, Demo on de-identifying Health Care Data set
  •   Demo on Weather Data set.
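The Partitioner decides which reducer receives each intermediate key. Hadoop's default, `HashPartitioner`, uses the key's hash code modulo the reducer count; the self-contained sketch below (our own class, reproducing that rule without Hadoop classes) shows the computation:

```java
// Sketch of Hadoop's default partitioning rule (as in HashPartitioner):
// reducer = (key.hashCode() & Integer.MAX_VALUE) % numReducers.
// Masking the sign bit keeps the result non-negative even for
// keys whose hashCode() is negative.
public class PartitionerSketch {
    static int getPartition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> reducer " + getPartition(key, reducers));
        }
    }
}
```

Because the mapping is deterministic, every occurrence of a given key, from every mapper, lands on the same reducer, which is what makes the grouped reduce phase possible.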


Advanced MapReduce
  •   Counters
  •   Distributed Cache
  •   MRUnit
  •   Reduce Join
  •   Custom Input Format
  •   Sequence Input Format
  •   XML file parsing using MapReduce


Pig
  •   About Pig
  •   MapReduce vs. Pig
  •   Pig Use Cases, Programming Structure in Pig, Pig Running Modes, Pig Components, Pig Execution, Pig Latin Program, Data Models in Pig, Pig Data Types, Shell and Utility Commands
  •   Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Specialized Joins in Pig, Built-in Functions (Eval Function, Load and Store Functions, Math Function, String Function, Date Function)
  •   Pig UDF, Piggybank, Parameter Substitution (Pig macros and Pig parameter substitution), Pig Streaming, Testing Pig Scripts with PigUnit, Aviation use case in Pig, Pig Demo on Healthcare Data set



Hive
  •   Hive Background, Hive Use Case, About Hive, Hive vs. Pig, Hive Architecture and Components, Metastore in Hive, Limitations of Hive, Comparison with Traditional Databases, Hive Data Types and Data Models, Partitions and Buckets, Hive Tables (Managed Tables and External Tables), Importing Data, Querying Data, Managing Outputs, Hive Script, Hive UDF, Retail use case in Hive, Hive Demo on Healthcare Data set

Chapter - 7

Advanced Hive and HBase
  •   HiveQL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts, Hive Indexes and Views, Hive Query Optimizers, Hive Thrift Server, User Defined Functions; HBase: Introduction to NoSQL Databases and HBase, HBase vs. RDBMS, HBase Components, HBase Architecture, Run Modes & Configuration, HBase Cluster Deployment.

Chapter - 8

Advanced HBase
  •   HBase Data Model, HBase Shell, HBase Client API, Data Loading Techniques, ZooKeeper Data Model, ZooKeeper Service, Demos on Bulk Loading, Getting and Inserting Data, Filters in HBase.
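The HBase data model is often described as a sorted map of maps: row key -> column family -> qualifier -> value. The self-contained sketch below (our own illustration using nested `TreeMap`s, not the HBase client API; real HBase also versions every cell by timestamp, which is omitted here) makes that structure concrete:

```java
import java.util.*;

// Sketch of the HBase logical data model as a sorted map of maps:
//   rowKey -> columnFamily -> qualifier -> value
// Cell timestamps/versions are omitted for simplicity.
public class HBaseModelSketch {
    static Map<String, Map<String, Map<String, String>>> table = new TreeMap<>();

    // Analogous to an HBase Put for a single cell.
    static void put(String row, String family, String qualifier, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeMap<>())
             .put(qualifier, value);
    }

    // Analogous to an HBase Get for a single cell (null if absent).
    static String get(String row, String family, String qualifier) {
        return table.getOrDefault(row, Map.of())
                    .getOrDefault(family, Map.of())
                    .get(qualifier);
    }

    public static void main(String[] args) {
        put("user1", "info", "name", "Alice");
        put("user1", "info", "city", "Hyderabad");
        System.out.println(get("user1", "info", "name")); // Alice
    }
}
```

Keeping rows sorted by key, as the outer `TreeMap` does here, is also what makes HBase range scans over contiguous row keys efficient.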

Chapter - 9

Processing Distributed Data with Apache Spark
  •   What is Apache Spark, Spark Ecosystem, Spark Components, History of Spark and Spark Versions/Releases, Spark as a Polyglot, What is Scala?, Why Scala?, SparkContext, RDD.

Chapter - 10

Oozie and Hadoop Project


  •   Flume and Sqoop Demo, Oozie, Oozie Components, Oozie Workflow, Scheduling with Oozie, Demo on Oozie Workflow, Oozie Coordinator, Oozie Commands, Oozie Web Console, Oozie for MapReduce, Pig, Hive, and Sqoop, Combined flow of MR, Pig, and Hive in Oozie, Hadoop Project Demo, Hadoop Integration with Talend.

Key Features

  • Career-oriented training.
  • One-to-one live interaction with the trainer.
  • End-to-end explanation of a demo project.
  • Interview guidance with resume preparation.
  • Support from the trainer through e-mail.
