Description
Apache Hadoop was a pioneer in the world of big data technologies, and it continues to be a leader in enterprise big data storage. Apache Spark is a leading big data processing engine and provides an impressive array of features and capabilities. When used together, the Hadoop Distributed File System (HDFS) and Spark can provide a truly scalable big data analytics setup. In ...
Course Content
The combined power of Spark and Hadoop Distributed File System (HDFS)
01:20
Apache Hadoop overview
01:50
Apache Spark overview
00:45
Integrating Hadoop and Spark
01:19
Setting up the environment
03:20
Using exercise files
04:06
Storage formats
02:20
Compression
02:05
Partitioning
02:02
Bucketing
01:17
Best practices for data storage
01:19
Reading external files into Spark
02:33
Writing to HDFS
02:00
Parallel writes with partitioning
01:11
Parallel writes with bucketing
01:27
Best practices for ingestion
00:55
How Spark works
02:59
Reading HDFS files with schema
01:44
Reading partitioned data
01:32
Reading bucketed data
00:55
Best practices for data extraction
01:08
Pushing down projections
01:45
Pushing down filters
01:50
Managing partitions
02:42
Managing shuffling
02:35
Improving joins
02:04
Storing intermediate results
02:39
Best practices for data processing
01:18
Problem definition
01:57
Data loading
01:37
Total score analytics
01:31
Average score analytics
01:18
Top student analytics
01:48
Next steps
00:44
About Educator
Kumaran Ponnambalam
Working with data for 20+ years
Kumaran Ponnambalam has been working with data for more than 20 years.
He has built enterprise and cloud applications that ingest data to produce meaningful insights for their consumers. Data has always intrigued Kumaran, and he has always searched for ways to mine, manage, and master it. Using analytics to solve business problems is his main area of interest. Of late, he has taken a keen interest in building quality courses that help people understand and use data. Big data analytics is growing fast, but quality education, especially in application areas, is lacking, and he wants to contribute to it.
Course Info
Course Duration
1h 1m
Course Language
English
Course Level
Intermediate
Certification
Yes
$12.32