Description

Apache Hadoop, YARN, Hive, and Spark are popular big data tools that many organizations use to build big data analytics solutions. In this course, students develop big data applications with these tools to process data and derive valuable insights from it.

By the end of the course, students will be able to set up a personal big data development environment; master the fundamental concepts of Hadoop, YARN, Hive, and Spark; copy data into and out of a big data cluster; process data using the MapReduce paradigm; run MapReduce and Spark jobs on YARN; process big data in Spark using the Scala programming language; use RDDs and DataFrames to process big data; store data in the Parquet format; and use Spark's machine learning libraries to develop machine learning solutions such as decision trees, recommendation engines, linear regression, and anomaly detection.

This is a hands-on development course, and you will practice more than 50 activities during it. Java knowledge is assumed, but the fundamentals of Scala are taught so that you can write Scala code to process data in Spark. The course provides a foundation for developers to join big data development teams in their organizations.