data-analytics | TNPSC Fuhrer Notes

Amazon EMR is a managed cluster platform, with a serverless option, that simplifies running big data frameworks such as [[Apache Hadoop]] and [[Apache Spark]] on Amazon Web Services (AWS) to process and analyze vast amounts of data.
In this course, you will learn the benefits and technical concepts of Amazon EMR.
If you are new to the service, you will learn how to start using Amazon EMR through a demonstration using the AWS Management Console and AWS Command Line Interface (AWS CLI).
You will learn about the native architecture and how the built-in features can help you process data for analytics purposes and business intelligence workloads.

What does Amazon EMR do?

With Amazon EMR, you can process vast amounts of data efficiently using Apache Hadoop and services offered by Amazon Web Services (AWS).
Use Amazon EMR to run large-scale distributed data processing jobs, interactive queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.
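
To make "running a distributed data processing job" more concrete, here is a minimal sketch of the request you might send to launch a short-lived cluster that runs one Spark script and then terminates. It only builds the keyword arguments for boto3's EMR `run_job_flow` call and does not contact AWS. The bucket paths, release label, instance types, and the helper name `build_spark_cluster_request` are illustrative assumptions, and the role names assume the default EMR roles already exist in the account.

```python
import json
from typing import Any, Dict


def build_spark_cluster_request(
    name: str,
    script_s3_uri: str,
    log_s3_uri: str,
    core_nodes: int = 2,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR ``run_job_flow`` (hypothetical helper)."""
    if core_nodes < 1:
        raise ValueError("core_nodes must be at least 1")
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick one available in your Region
        "LogUri": log_s3_uri,
        "Applications": [{"Name": "Spark"}, {"Name": "Hive"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m5.xlarge",  # assumed instance type
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": core_nodes,
                },
            ],
            # Terminate the cluster once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "Run Spark script",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumes the default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = build_spark_cluster_request(
        name="demo-cluster",
        script_s3_uri="s3://example-bucket/jobs/wordcount.py",  # placeholder path
        log_s3_uri="s3://example-bucket/logs/",  # placeholder path
    )
    print(json.dumps(request, indent=2))
    # To actually launch the cluster (requires boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("emr").run_job_flow(**request)
```

Keeping the request as plain data makes it easy to review or version before anything is launched; the actual submission is left as a comment because it needs credentials and incurs cost.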

Amazon EMR automates time-consuming tasks like setup, tuning, monitoring, and capacity planning.

How Amazon EMR works

![[Product-Page-Diagram_Amazon-EMR.png]]

Amazon EMR is a cloud big data platform for petabyte-scale data processing, interactive analytics, and ML using open-source frameworks.

Amazon EMR Serverless is a serverless option for Amazon EMR that lets data analysts and engineers run open-source big data analytics frameworks without configuring, managing, or scaling clusters or servers.
You can use Amazon Simple Storage Service (Amazon S3) to store data in data lakes.
You can then run big data processing, real-time analytics, and machine learning on that data, and use dashboards and visualizations to derive insights that guide better decisions.
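
As a rough illustration of the serverless flow, the sketch below builds the request for a Spark job that reads its script from Amazon S3 and runs on an existing EMR Serverless application. It only assembles the keyword arguments for the boto3 `emr-serverless` client's `start_job_run` call and does not contact AWS. The application ID, role ARN, S3 paths, and the helper name `build_serverless_job_request` are placeholders, not values from the course.

```python
from typing import Any, Dict, List, Optional


def build_serverless_job_request(
    application_id: str,
    execution_role_arn: str,
    script_s3_uri: str,
    arguments: Optional[List[str]] = None,
    log_s3_uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for ``start_job_run`` (hypothetical helper)."""
    request: Dict[str, Any] = {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": script_s3_uri,
                "entryPointArguments": list(arguments or []),
            }
        },
    }
    if log_s3_uri:
        # Optional: ship job logs to an S3 prefix.
        request["configurationOverrides"] = {
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": log_s3_uri}
            }
        }
    return request


if __name__ == "__main__":
    request = build_serverless_job_request(
        application_id="00example123",  # placeholder application ID
        execution_role_arn="arn:aws:iam::123456789012:role/ExampleJobRole",  # placeholder
        script_s3_uri="s3://example-bucket/jobs/aggregate.py",  # placeholder path
        arguments=["s3://example-bucket/input/", "s3://example-bucket/output/"],
        log_s3_uri="s3://example-bucket/logs/",
    )
    print(request)
    # To submit the job (requires boto3 and AWS credentials):
    #   import boto3
    #   boto3.client("emr-serverless").start_job_run(**request)
```

Unlike the cluster-based flow, the request names no instance types or node counts; capacity is handled by the EMR Serverless application itself.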

![[Product-Page-Diagram_Amazon-EMR-Serverless.jpg]]