Filtering out Nulls and Headers in Scala/Spark
id,fname,lname,age,designation
1, amarnath, jaiswal, 61, Businessman
2, prakash, yadav, 30, Developer
3, vishal, jaiswal, 32, Engineer
4, ravi, jaiswal,, Builder
Solution: use mapPartitionsWithIndex and drop the first element of the iterator for partition index 0 — that removes the header line from the input file. Then filter on the 4th field with != "" to drop the record whose age is missing (id 4).
scala> sc.textFile("/User/VJ/testfile").mapPartitionsWithIndex((idx, iter) => if (idx == 0) iter.drop(1) else iter).filter(line => line.split(",")(3) != "").take(5).foreach(println)
Output:
1, amarnath, jaiswal, 61, Businessman
2, prakash, yadav, 30, Developer
3, vishal, jaiswal, 32, Engineer
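The same logic can be tried without a Spark cluster: a plain Scala Iterator stands in for one partition of the RDD, so the header-drop and null-filter steps can be checked locally. This is only a sketch — the hard-coded lines and the FilterDemo object name are illustrative, and partitionIndex = 0 mimics the first (and here only) partition that mapPartitionsWithIndex would see.

```scala
// Plain-Scala sketch mirroring the Spark one-liner above.
// The Iterator plays the role of a single RDD partition.
object FilterDemo {
  def main(args: Array[String]): Unit = {
    val lines = Iterator(
      "id,fname,lname,age,designation",
      "1, amarnath, jaiswal, 61, Businessman",
      "2, prakash, yadav, 30, Developer",
      "3, vishal, jaiswal, 32, Engineer",
      "4, ravi, jaiswal,, Builder"
    )

    // Partition index 0 carries the header, so drop its first element,
    // just as the mapPartitionsWithIndex call does in the Spark snippet.
    val partitionIndex = 0
    val noHeader = if (partitionIndex == 0) lines.drop(1) else lines

    // Keep only rows whose 4th field (age) is non-empty; the double
    // comma in the last record makes split(",")(3) the empty string.
    val clean = noHeader.filter(line => line.split(",")(3) != "")

    clean.foreach(println)
  }
}
```

One caveat worth knowing: Java/Scala String.split drops trailing empty strings, so this index-based check only works because the missing field is not the last one on the line.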
Thanks,
Vishal.