



This course is designed to make participants proficient in Hadoop and Map-Reduce through hands-on practice. It also covers the major ecosystem tools (Hive, Pig, Sqoop, Flume).



The following are the prerequisites for the Hadoop Development course:

  • Core Java with collections
  • Linux commands


Module 1: Big Data Concepts

  • Understand big data, its challenges, and the distributed environment.
  • Be aware of Hadoop and its sub-projects.
  • Introduction
  • Data
  • Storage
  • Bigdata
  • Distributed environment
  • Hadoop introduction
  • History
  • Environment
  • Benefits
  • Hadoop Components / Eco-Systems
  • Cluster Deployment
  • Pseudo Vs Fully Distributed
  • Arranging cluster for practice
  • Cloudera cluster environment

Module 2: HDFS

Understand the HDFS components: NameNode and DataNode.

Understand how data is stored and maintained in the cluster, and how data is read from and written to it.

  • Able to maintain files in HDFS
  • Able to access data from HDFS through a Java program
  • HDFS Architecture
  • NameNode
  • DataNode
  • Fault Tolerance
  • Read & Write operations
  • Interfaces (command-line interface, JSP, API)
  • HDFS Shell
  • FS Shell Commands
  • Uploading & Downloading
  • Directory Handling
  • File Handling
  • Use Cases
  • Using Hue for browsing data
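The FS shell operations above (uploading, downloading, directory and file handling) map onto `hdfs dfs` commands. A minimal sketch, assuming a running cluster; the user directory and file names are chosen for illustration:

```
# Directory handling: create a working directory in HDFS
hdfs dfs -mkdir -p /user/student/input

# Uploading: copy a local file into the cluster
hdfs dfs -put sales.log /user/student/input/

# File handling: list and inspect the stored file
hdfs dfs -ls /user/student/input
hdfs dfs -cat /user/student/input/sales.log

# Downloading: copy data back to the local file system
hdfs dfs -get /user/student/input/sales.log ./sales-copy.log
```

Older documentation writes the same commands as `hadoop fs`; both forms invoke the FS shell.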

Module 3: Map-Reduce (for Java Programmers)

  • Understand the Map-Reduce paradigm and the YARN architecture.
  • Able to analyze a given problem in the Map-Reduce pattern.
  • Able to implement Map-Reduce applications.
  • Map-Reduce Introduction
  • Map-Reduce Architecture
  • Workflow of an MR Program
  • Placement of components on the cluster
  • MR on HDFS
  • YARN Architecture
  • Designing an application on MR
  • Program Implementation
  • Detailed description of M-R methods
  • Key/value pairs
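The map and reduce phases above can be sketched in plain Java, without the Hadoop dependency: map emits (word, 1) key/value pairs, the shuffle groups pairs by key, and reduce sums each group. The real Hadoop `Mapper`/`Reducer` classes follow the same key/value contract; this is only a minimal stand-alone illustration.

```java
import java.util.*;
import java.util.stream.*;

// A minimal word-count sketch of the map-reduce flow in plain Java,
// without the Hadoop API: map() emits (word, 1) pairs, the shuffle
// groups pairs by key, and reduce() sums each group's values.
public class WordCountSketch {

    // Map phase: split one input line into (word, 1) key/value pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Shuffle + reduce phase: group the pairs by key and sum the
    // values, as the framework does between the map and reduce tasks.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hadoop stores big data",
                                     "hadoop processes big data");
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));  // every mapper contributes pairs
        }
        System.out.println(reduce(pairs).get("hadoop"));  // prints 2
    }
}
```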

Module 4: Data Ingestion

  • Understand data ingestion and its types
  • Recognize various data ingestion tools
  • Hive Architecture
  • Introduction
  • Types of Data Ingestion
  • Ingesting Batch Data
  • Ingesting Streaming Data
  • Use Cases

Module 5: Apache Sqoop

  • Understand Sqoop architecture and its uses
  • Able to load data from an RDBMS table or query onto HDFS
  • Able to write Sqoop scripts for exporting data from HDFS onto RDBMS tables
  • Introduction
  • Sqoop Architecture
  • Connect to MySQL database
  • Sqoop – Import
  • Importing to a specific location
  • Querying with import
  • Sqoop import-all
  • Integrating with Hive
  • Export
  • Eval
  • Joins
  • Use Cases
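The import and export topics above can be sketched as Sqoop commands. A minimal sketch, assuming a reachable MySQL instance; the host, database, table names, and HDFS paths are illustrative:

```
# Import a MySQL table into a specific HDFS location
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username trainee --password-file /user/student/.dbpass \
  --table orders \
  --target-dir /user/student/sqoop/orders

# Querying with import: free-form query ($CONDITIONS is required)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username trainee --password-file /user/student/.dbpass \
  --query 'SELECT id, total FROM orders WHERE $CONDITIONS AND total > 100' \
  --split-by id \
  --target-dir /user/student/sqoop/big_orders

# Export: write HDFS data back into an RDBMS table
sqoop export \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username trainee --password-file /user/student/.dbpass \
  --table order_summary \
  --export-dir /user/student/sqoop/summary
```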

Module 6: Apache Flume

  • Understand Flume architecture and its uses
  • Able to create Flume configuration files to stream and ingest data onto HDFS
  • Introduction
  • Flume Architecture
  • Flume Master
  • Flume Agents
  • Flume Collectors
  • Creation of Flume configuration files
  • Streaming from local disk
  • Streaming web / social networking data
  • Examples
  • Use Cases
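A Flume configuration file like the ones created in this module wires a source, a channel, and a sink together. A minimal sketch using the standard exec source, memory channel, and HDFS sink; the agent, component, and path names are illustrative:

```properties
# agent1 streams a local log file into HDFS
agent1.sources  = tail-src
agent1.channels = mem-ch
agent1.sinks    = hdfs-sink

# Source: tail a file on local disk
agent1.sources.tail-src.type = exec
agent1.sources.tail-src.command = tail -F /var/log/app/access.log
agent1.sources.tail-src.channels = mem-ch

# Channel: buffer events in memory between source and sink
agent1.channels.mem-ch.type = memory
agent1.channels.mem-ch.capacity = 10000

# Sink: write the events onto HDFS
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.channel = mem-ch
agent1.sinks.hdfs-sink.hdfs.path = /user/student/flume/events
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
```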

Module 7: Data Transformation (Pig)

  • Understand data types, the data model, and modes of execution.
  • Able to store the data from a Pig relation onto HDFS.
  • Able to load data into a Pig relation with or without a schema.
  • Able to split, join, filter, and transform data using Pig operators.
  • Able to write Pig scripts and work with UDFs.
  • Introduction
  • Pig Data Flow Engine
  • Map-Reduce vs. Pig
  • Data Types
  • Basic Pig Programming
  • Modes of execution in Pig
  • Loading
  • Storing
  • Group
  • Filter
  • Join
  • Order
  • Flatten
  • Cogroup
  • Illustrate
  • Explain
  • Parameter substitution
  • Creating simple UDFs in Pig
  • Use Cases
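The load/filter/group/store operators above combine into short Pig Latin scripts. A minimal sketch; the HDFS paths and field names are illustrative:

```pig
-- Loading with a schema (paths and fields are illustrative)
orders = LOAD '/user/student/orders.csv' USING PigStorage(',')
         AS (id:int, customer:chararray, total:double);

-- Filter: keep only large orders
big = FILTER orders BY total > 100.0;

-- Group and aggregate per customer
by_cust = GROUP big BY customer;
totals = FOREACH by_cust GENERATE group AS customer, SUM(big.total) AS spend;

-- Storing: write the relation back onto HDFS
STORE totals INTO '/user/student/order_totals' USING PigStorage(',');
```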


Module 8: Hive & HCatalog

Understand the importance of Hive and the Hive architecture.

  • Able to create managed, external, partitioned, and bucketed tables
  • Able to query the data and perform joins between tables
  • Understand the storage formats of Hive
  • Understand vectorization in Hive


  • Hive vs. RDBMS
  • HiveQL and Shell
  • Data Types
  • Hive Commands
  • Hive Tables
  • Managed Tables
  • External Tables
  • Inserting from other tables
  • Loading into partitions
  • Dynamic partitioning
  • Distribute by
  • Using HCatStorer
  • Use Cases
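The table types and partitioning topics above can be sketched in HiveQL. A minimal sketch; the table names, columns, and paths are illustrative:

```sql
-- External table over existing HDFS data
CREATE EXTERNAL TABLE orders (id INT, customer STRING, total DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/student/orders';

-- Managed, partitioned table
CREATE TABLE orders_by_day (id INT, customer STRING, total DOUBLE)
PARTITIONED BY (order_date STRING);

-- Dynamic partitioning: insert from another table
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT INTO TABLE orders_by_day PARTITION (order_date)
SELECT id, customer, total, order_date FROM staging_orders;

-- Querying with a join between tables
SELECT o.customer, SUM(o.total)
FROM orders o JOIN customers c ON (o.customer = c.name)
GROUP BY o.customer;
```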




Module 9: Introduction to Other Eco-Systems

  • Able to migrate to other eco-systems
  • Use Cases
