Hadoop Administration
Course Overview
Hadoop Administration is a four-day training course for Apache Hadoop that provides participants with a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster using Cloudera Manager. From installation and configuration through load balancing and tuning, Cloudera’s training course is the best preparation for the real-world challenges faced by Hadoop administrators.
Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem, learning topics such as:
– Cloudera Manager features that make managing your clusters easier, such as aggregated logging, configuration management, resource management, reports, alerts, and service management
– The internals of YARN, MapReduce, Spark, and HDFS
– Determining the correct hardware and infrastructure for your cluster
– Proper cluster configuration and deployment to integrate with the data center
– How to load data into the cluster from dynamically generated files using Flume and from an RDBMS using Sqoop
– Configuring the Fair Scheduler to provide service-level agreements for multiple users of a cluster
– Best practices for preparing and maintaining Apache Hadoop in production
– Troubleshooting, diagnosing, tuning, and solving Hadoop issues
Course Details
Introduction
The Case for Apache Hadoop
– Why Hadoop?
– Fundamental Concepts
– Core Hadoop Components
Hadoop Cluster Installation
– Rationale for a Cluster Management Solution
– Cloudera Manager Features
– Cloudera Manager Installation
– Hadoop (CDH) Installation
The Hadoop Distributed File System (HDFS)
– HDFS Features
– Writing and Reading Files
– NameNode Memory Considerations
– Overview of HDFS Security
– Web UIs for HDFS
– Using the Hadoop File Shell
MapReduce and Spark on YARN
– The Role of Computational Frameworks
– YARN: The Cluster Resource Manager
– MapReduce Concepts
– Apache Spark Concepts
– Running Computational Frameworks on YARN
– Exploring YARN Applications Through the Web UIs and the Shell
– YARN Application Logs
Hadoop Configuration and Daemon Logs
– Cloudera Manager Constructs for Managing Configurations
– Locating Configurations and Applying Configuration Changes
– Managing Role Instances and Adding Services
– Configuring the HDFS Service
– Configuring Hadoop Daemon Logs
– Configuring the YARN Service
Getting Data Into HDFS
– Ingesting Data From External Sources With Flume
– Ingesting Data From Relational Databases With Sqoop
– REST Interfaces
– Best Practices for Importing Data
Planning Your Hadoop Cluster
– General Planning Considerations
– Choosing the Right Hardware
– Virtualization Options
– Network Considerations
– Configuring Nodes
Installing and Configuring Hive, Impala, and Pig
– Hive
– Impala
– Pig
Hadoop Clients Including Hue
– What Are Hadoop Clients?
– Installing and Configuring Hadoop Clients
– Installing and Configuring Hue
– Hue Authentication and Authorization
Advanced Cluster Configuration
– Advanced Configuration Parameters
– Configuring Hadoop Ports
– Configuring HDFS for Rack Awareness
– Configuring HDFS High Availability
Hadoop Security
– Why Hadoop Security Is Important
– Hadoop’s Security System Concepts
– What Kerberos Is and How It Works
– Securing a Hadoop Cluster With Kerberos
– Other Security Concepts
Managing Resources
– Configuring cgroups with Static Service Pools
– The Fair Scheduler
– Configuring Dynamic Resource Pools
– YARN Memory and CPU Settings
– Impala Query Scheduling
Cluster Maintenance
– Checking HDFS Status
– Copying Data Between Clusters
– Adding and Removing Cluster Nodes
– Rebalancing the Cluster
– Directory Snapshots
– Cluster Upgrading
Cluster Monitoring and Troubleshooting
– Cloudera Manager Monitoring Features
– Monitoring Hadoop Clusters
– Troubleshooting Hadoop Clusters
– Common Misconfigurations
Prerequisites
This course is best suited to systems administrators and IT managers who have basic Linux experience. Prior knowledge of Apache Hadoop is not required.