Spark Map and FlatMap Usages


Apache Spark supports a variety of transformation operations. In this blog, we will learn about the Spark map and flatMap transformations and compare the two.

Apache Spark Map and FlatMap Operation

Both map and flatMap are similar operations: each applies a transformation function to the input data.

map performs a one-to-one transformation. It operates on every element of the RDD and produces exactly one output element per input element in the new RDD.

flatMap, by contrast, returns a new RDD formed by flattening the collections of elements produced by the function, so one input element can yield zero, one, or many output elements.
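To make the contrast concrete, here is a minimal sketch using plain Scala collections, whose map and flatMap behave the same way as their RDD counterparts (the sample names mirror the dataset used below):

```scala
object MapVsFlatMap extends App {
  val authors = Seq("David Copperfield", "William Shakespeare", "J K Rowling")

  // map: one output element per input element (an Array[String] per name)
  val mapped = authors.map(_.split(" "))
  println(mapped.length) // 3 -- same count as the input

  // flatMap: splits each name, then flattens, so the count can grow
  val flattened = authors.flatMap(_.split(" "))
  println(flattened.length) // 7 -- 2 + 2 + 3 individual words
}
```

The same one-to-one versus one-to-many behavior carries over to RDDs and Datasets, as the DataFrame examples below show.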

Let’s discuss Spark map and flatMap in detail and see how they work on the same dataset.

Sample Data File

+-------------------+
|               data|
+-------------------+
|  David Copperfield|
|William Shakespeare|
|        J K Rowling|
|         Mark Twain|
|         Harper Lee|
+-------------------+

map() Transformation

The map method is a higher-order method that takes a function as input and applies it to each element in the source RDD to create a new RDD in Spark.

//Map Transformation
val mapDF = df.map(fun => {
  fun.getString(0).split(" ")
})
mapDF.show()

//Output
+----------------------+
|value                 |
+----------------------+
|[David, Copperfield]  |
|[William, Shakespeare]|
|[J, K, Rowling]       |
|[Mark, Twain]         |
|[Harper, Lee]         |
+----------------------+

Here, the Spark map() transformation applies a function to each row of the DataFrame/Dataset and returns the new transformed Dataset. The DataFrame column is converted from String to Array type.

flatMap() Transformation

The Spark flatMap() transformation applies the function to every element and then flattens the results into a single column, returning a new DataFrame.

The returned DataFrame can contain the same number of elements as the current DataFrame, or more. This is one of the major differences between flatMap() and map(): the map() transformation always returns exactly as many elements as the input.

//FlatMap Transformation
val flatMapDF = df.flatMap(fun => {
  fun.getString(0).split(" ")
})
flatMapDF.show()

//Output
+-----------+
|      value|
+-----------+
|      David|
|Copperfield|
|    William|
|Shakespeare|
|          J|
|          K|
|    Rowling|
|       Mark|
|      Twain|
|     Harper|
|        Lee|
+-----------+

map and flatMap – Conclusion

As a result, it is no surprise that map is a key building block in Apache Spark. Both transformations create new RDDs, but in different styles: map keeps a one-to-one correspondence between input and output, while flatMap flattens the results and can change the element count.

We have also seen that map() and flatMap() are among the most heavily used transformations in Apache Spark, and we have walked through a full comparison of the two operations.