Learn By Example: Hadoop, MapReduce for Big Data problems
Introduction
You, this course and Us (1:52)
Why is Big Data a Big Deal
The Big Data Paradigm (14:20)
Serial vs Distributed Computing (8:37)
What is Hadoop? (7:25)
HDFS or the Hadoop Distributed File System (11:00)
MapReduce Introduced (11:39)
YARN or Yet Another Resource Negotiator (4:00)
Installing Hadoop in a Local Environment
Hadoop Install Modes (8:32)
Hadoop Standalone mode Install (15:46)
Hadoop Pseudo-Distributed mode Install (11:44)
The MapReduce "Hello World"
The basic philosophy underlying MapReduce (8:49)
MapReduce - Visualized And Explained (9:03)
MapReduce - Digging a little deeper at every step (10:21)
"Hello World" in MapReduce (10:29)
The Mapper (9:48)
The Reducer (7:46)
The Job (12:27)
Run a MapReduce Job
Get comfortable with HDFS (10:58)
Run your first MapReduce Job (14:30)
Juicing your MapReduce - Combiners, Shuffle and Sort and The Streaming API
Parallelize the reduce phase - use the Combiner (14:39)
Not all Reducers are Combiners (14:31)
How many mappers and reducers does your MapReduce have? (8:23)
Parallelizing reduce using Shuffle And Sort (14:55)
MapReduce is not limited to the Java language - Introducing the Streaming API (5:05)
Python for MapReduce (12:19)
HDFS and Yarn
HDFS - Protecting against data loss using replication (15:38)
HDFS - Name nodes and why they're critical (6:54)
HDFS - Checkpointing to backup name node information (11:16)
Yarn - Basic components (8:39)
Yarn - Submitting a job to Yarn (13:16)
Yarn - Plug in scheduling policies (14:27)
Yarn - Configure the scheduler (12:32)
MapReduce Customizations For Finer Grained Control
Setting up your MapReduce to accept command line arguments (13:47)
The Tool, ToolRunner and GenericOptionsParser (12:35)
Configuring properties of the Job object (10:41)
Customizing the Partitioner, Sort Comparator, and Group Comparator (15:16)
The Inverted Index, Custom Data Types for Keys, Bigram Counts and Unit Tests!
The heart of search engines - The Inverted Index (14:47)
Generating the inverted index using MapReduce (10:31)
Custom data types for keys - The Writable Interface (10:29)
Represent a Bigram using a WritableComparable (13:19)
MapReduce to count the Bigrams in input text (8:32)
Setting up your Hadoop project
Test your MapReduce job using MRUnit (13:47)
Input and Output Formats and Customized Partitioning
Introducing the File Input Format (12:48)
Text And Sequence File Formats (10:21)
Data partitioning using a custom partitioner (7:11)
Make the custom partitioner real in code (10:25)
Total Order Partitioning (10:10)
Input Sampling, Distribution, Partitioning and configuring these (9:04)
Secondary Sort (14:34)
Recommendation Systems using Collaborative Filtering
Introduction to Collaborative Filtering (7:25)
Friend recommendations using chained MR jobs (17:15)
Get common friends for every pair of users - the first MapReduce (14:50)
Top 10 friend recommendation for every user - the second MapReduce (13:46)
Hadoop as a Database
Structured data in Hadoop (14:08)
Running an SQL Select with MapReduce (15:31)
Running an SQL Group By with MapReduce (14:02)
A MapReduce Join - The Map Side (14:19)
A MapReduce Join - The Reduce Side (13:07)
A MapReduce Join - Sorting and Partitioning (8:49)
A MapReduce Join - Putting it all together (13:46)
K-Means Clustering
What is K-Means Clustering? (14:04)
A MapReduce job for K-Means Clustering (16:33)
K-Means Clustering - Measuring the distance between points (13:52)
K-Means Clustering - Custom Writables for Input/Output (8:26)
K-Means Clustering - Configuring the Job (10:49)
K-Means Clustering - The Mapper and Reducer (11:23)
K-Means Clustering - The Iterative MapReduce Job (3:39)
Setting up a Hadoop Cluster
Manually configuring a Hadoop cluster (Linux VMs) (13:50)
Getting started with Amazon Web Services (6:25)
Start a Hadoop Cluster with Cloudera Manager on AWS (13:04)
Appendix
Set up a Virtual Linux Instance (For Windows users) (15:58)
[For Linux/Mac OS Shell Newbies] Path and other Environment Variables (8:25)
HDFS - Checkpointing to backup name node information