Hadoop Administration

Course Overview

Hadoop Administration is a four-day training course for Apache Hadoop that provides participants with a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster using Cloudera Manager. From installation and configuration through load balancing and tuning, Cloudera’s training course is the best preparation for the real-world challenges faced by Hadoop administrators.

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:

Cloudera Manager features that make managing your clusters easier, such as aggregated logging, configuration management, resource management, reports, alerts, and service management

The internals of YARN, MapReduce, Spark, and HDFS

Determining the correct hardware and infrastructure for your cluster

Proper cluster configuration and deployment to integrate with the data center

How to load data into the cluster from dynamically generated files using Flume and from relational databases using Sqoop

Configuring the Fair Scheduler to provide service-level agreements for multiple users of a cluster

Best practices for preparing and maintaining Apache Hadoop in production

Troubleshooting, diagnosing, tuning, and solving Hadoop issues


Course Details

Introduction

The Case for Apache Hadoop

Why Hadoop?

Fundamental Concepts

Core Hadoop Components

Hadoop Cluster Installation

Rationale for a Cluster Management Solution

Cloudera Manager Features

Cloudera Manager Installation

Hadoop (CDH) Installation

The Hadoop Distributed File System (HDFS)

HDFS Features

Writing and Reading Files

NameNode Memory Considerations

Overview of HDFS Security

Web UIs for HDFS

Using the Hadoop File Shell

MapReduce and Spark on YARN

The Role of Computational Frameworks

YARN: The Cluster Resource Manager

MapReduce Concepts

Apache Spark Concepts

Running Computational Frameworks on YARN

Exploring YARN Applications Through the Web UIs and the Shell

YARN Application Logs

Hadoop Configuration and Daemon Logs

Cloudera Manager Constructs for Managing Configurations

Locating Configurations and Applying Configuration Changes

Managing Role Instances and Adding Services

Configuring the HDFS Service

Configuring Hadoop Daemon Logs

Configuring the YARN Service

Getting Data Into HDFS

Ingesting Data From External Sources With Flume

Ingesting Data From Relational Databases With Sqoop

REST Interfaces

Best Practices for Importing Data

Planning Your Hadoop Cluster

General Planning Considerations

Choosing the Right Hardware

Virtualization Options

Network Considerations

Configuring Nodes

Installing and Configuring Hive, Impala, and Pig

Hive

Impala

Pig

Hadoop Clients Including Hue

What Are Hadoop Clients?

Installing and Configuring Hadoop Clients

Installing and Configuring Hue

Hue Authentication and Authorization

Advanced Cluster Configuration

Advanced Configuration Parameters

Configuring Hadoop Ports

Configuring HDFS for Rack Awareness

Configuring HDFS High Availability

Hadoop Security

Why Hadoop Security Is Important

Hadoop’s Security System Concepts

What Kerberos Is and How It Works

Securing a Hadoop Cluster With Kerberos

Other Security Concepts

Managing Resources

Configuring cgroups With Static Service Pools

The Fair Scheduler

Configuring Dynamic Resource Pools

YARN Memory and CPU Settings

Impala Query Scheduling

Cluster Maintenance

Checking HDFS Status

Copying Data Between Clusters

Adding and Removing Cluster Nodes

Rebalancing the Cluster

Directory Snapshots

Cluster Upgrading

Cluster Monitoring and Troubleshooting

Cloudera Manager Monitoring Features

Monitoring Hadoop Clusters

Troubleshooting Hadoop Clusters

Common Misconfigurations


Prerequisites

This course is best suited to systems administrators and IT managers who have basic Linux experience. Prior knowledge of Apache Hadoop is not required.