Introduction to Apache Spark

What is Apache Spark?

Reasons behind the invention of Apache Spark:
• Exploding data
• Data manipulation speed

Shortcomings of Hadoop:
• Adherence to its MapReduce programming model
• Limited programming language API options
• Poor fit for iterative algorithms such as machine learning algorithms
• Pipelining of tasks is not easy

What is Spark

Apache Spark is an open-source data processing framework for performing big data analytics on a distributed computing cluster.

Spark Features

Spark has several advantages over other big data and MapReduce technologies such as Hadoop and Storm. Spark is faster than MapReduce and offers low latency because it performs fewer disk input and output operations. Its in-memory computation makes data processing much faster than with MapReduce. Unlike Hadoop, Spark maintains the intermediate results in memory rather than writing every intermediate outpu...