Showing posts from December, 2018

Scala vs Python vs Java :: Big Data processing with Apache Spark

I compared Python and Scala on several parameters: performance, ease of use, integration with existing libraries, support for streaming use cases, and support for Apache Spark’s core capabilities.

I did not evaluate Java, for the following reasons:

Java has no REPL (Read, Evaluate, Print, Loop) shell for Spark. JShell only arrived in Java 9, and Spark ships interactive shells for Scala and Python but not for Java. A REPL is used extensively to check that small code snippets work as expected.

Java is too verbose: it requires more lines of code, and more boilerplate, than the task needs.

Scala on the JVM is far more powerful and cleaner than Java.

SCALA PYTHON

De-facto language for Spark

Scala is the first-choice language for Spark because Spark itself is written in Scala, so developers can dig deep into the Spark source code whenever required. New features of Spark are first av