All along, I have read that RDDs are immutable, but today, to my surprise, I got a different result. I would like to know the reason, with supporting documentation if possible.
scala> val m = Array.fill(2, 2)(5)
m: Array[Array[Int]] = Array(Array(5, 5), Array(5, 5))
scala> val rdd = sc.parallelize(m)
scala> rdd.collect()
res6: Array[Array[Int]] = Array(Array(5, 5), Array(5, 5))
// Interesting here.
scala> m(0)(1) = 99
scala> rdd.collect()
res8: Array[Array[Int]] = Array(Array(5, 99), Array(5, 5))
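A possible explanation, sketched in plain Scala so it runs without a SparkContext. It assumes (the thread does not confirm this) that parallelize keeps references to the driver's inner arrays instead of copying them. A copy of only the outer array still shares the inner arrays, so an edit like m(0)(1) = 99 shows up through it, while a deep copy does not. The names shallow and deep are illustrative only.

```scala
val m = Array.fill(2, 2)(5)

// Shallow copy: a new outer array that still points at the SAME inner arrays.
// Assumption: splitting the driver's collection into partitions behaves like
// this, i.e. the elements (the inner arrays) are not cloned.
val shallow: Array[Array[Int]] = m.clone()

// Deep copy: every inner array is cloned too, so later edits to m are not seen.
val deep: Array[Array[Int]] = m.map(_.clone())

m(0)(1) = 99

println(shallow(0).mkString(",")) // 5,99 (shares the inner array with m)
println(deep(0).mkString(","))    // 5,5  (independent copy)
```

If that assumption holds, calling sc.parallelize(m.map(_.clone())) instead of sc.parallelize(m) should keep later edits to m out of the RDD; only the deep copy differs from the question's code.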
Answer by Vinod Bonthu · Feb 22, 2016 at 06:03 PM
In a distributed environment, an RDD is spread across many nodes. When you call an operation, you tell each node what to do with the piece of the RDD that it holds. If you refer to any local variables (like myMap), they get serialized and sent to those machines so they can use them. But nothing comes back, so your original copy of myMap is unaffected. RDDs are not just immutable; they are deterministic functions of their inputs, so any part of an RDD can be recreated at any time. This also helps with caching. An RDD isn't really a collection of data, but a recipe for making data from other data.
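To make "nothing comes back" concrete, here is a plain-Scala sketch that uses Java serialization as a stand-in for shipping a captured variable to a worker. Spark's real serializer and transport are different; roundTrip, driverData, workerCopy and nextJobCopy are hypothetical names. It also hints at why the second collect() in the question saw 99: a serialized copy reflects the driver's data at the moment it is serialized, which (assuming serialization happens when a job runs rather than when the RDD is defined) comes after the edit.

```scala
import java.io._

// Serialize and deserialize a value: a simplified model of how a captured
// variable is copied before it reaches a worker.
def roundTrip[T <: AnyRef](value: T): T = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(value)
  out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  try in.readObject().asInstanceOf[T] finally in.close()
}

val driverData = Array.fill(2, 2)(5)

// "Ship" the data: the worker gets its own independent copy.
val workerCopy = roundTrip(driverData)
workerCopy(0)(0) = -1 // a change made on the worker side...
                      // ...is never written back to driverData

// A driver-side edit made before the next "job" is captured by that job's copy.
driverData(0)(1) = 99
val nextJobCopy = roundTrip(driverData)

println(driverData(0).mkString(","))  // 5,99
println(workerCopy(0).mkString(","))  // -1,5
println(nextJobCopy(0).mkString(",")) // 5,99
```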
Answer by Paul Hargis · Feb 22, 2016 at 10:49 PM
The UC Berkeley white paper on RDDs spells it out this way:
Although individual RDDs are immutable, it is possible to implement mutable state by having multiple RDDs to represent multiple versions of a dataset. We made RDDs immutable to make it easier to describe lineage graphs, but it would have been equivalent to have our abstraction be versioned datasets and track versions in lineage graphs.
Notice that in the output above, your first collect() result is named res6 and the second one res8. This is presumably a new version of the initial RDD, and so it gets a different reference name.
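The paper's idea of representing mutable state as a series of immutable versions can be sketched in plain Scala, with an immutable Vector standing in for an RDD (this is not Spark code). Each update builds a new version and leaves the previous one untouched, so all versions remain available, much like a lineage of datasets. setCell, v0, v1, v2 and history are hypothetical names.

```scala
// Build a new version with one cell changed; the input version is never modified.
def setCell(ds: Vector[Vector[Int]], row: Int, col: Int, value: Int): Vector[Vector[Int]] =
  ds.updated(row, ds(row).updated(col, value))

val v0: Vector[Vector[Int]] = Vector.fill(2, 2)(5)
val v1 = setCell(v0, 0, 1, 99) // version 1: same edit as m(0)(1) = 99
val v2 = setCell(v1, 1, 0, 7)  // version 2 builds on version 1

// Every version is still intact, like a chain of RDDs in a lineage graph.
val history = List(v0, v1, v2)
history.foreach(v => println(v.map(_.mkString(",")).mkString(" | ")))
```

In Spark, the analogous step would be a transformation such as rdd.map(...), which returns a new RDD and leaves the original one as it was.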