pyspark.RDD.subtractByKey
RDD.subtractByKey(other, numPartitions=None)
Return each (key, value) pair in self that has no pair with matching key in other.
Examples
>>> x = sc.parallelize([("a", 1), ("b", 4), ("b", 5), ("a", 2)])
>>> y = sc.parallelize([("a", 3), ("c", None)])
>>> sorted(x.subtractByKey(y).collect())
[('b', 4), ('b', 5)]
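The example above requires a running SparkContext bound to `sc`. To illustrate only the filtering semantics, here is a minimal plain-Python sketch that runs without Spark. The helper `subtract_by_key` is a hypothetical name used for illustration and is not part of PySpark. It works on ordinary lists and does not model partitioning or the `numPartitions` argument.

```python
def subtract_by_key(pairs, other_pairs):
    """Keep each (key, value) pair whose key never appears in other_pairs.

    Hypothetical stand-in for the filtering behaviour of
    RDD.subtractByKey; it is not PySpark's implementation.
    """
    # Only the keys of the second collection matter; its values are ignored.
    other_keys = {key for key, _ in other_pairs}
    return [(key, value) for key, value in pairs if key not in other_keys]


# Same data as the doctest above, held in plain lists instead of RDDs.
x = [("a", 1), ("b", 4), ("b", 5), ("a", 2)]
y = [("a", 3), ("c", None)]

result = sorted(subtract_by_key(x, y))
print(result)  # [('b', 4), ('b', 5)]
```

Both `("a", 1)` and `("a", 2)` are dropped because `"a"` is a key in `y`, regardless of the value paired with it there. The key `"c"` exists only in `y`, so it has no effect on the result.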