pyspark.RDD.sortBy

RDD.sortBy(keyfunc, ascending=True, numPartitions=None)

Sorts this RDD by the given keyfunc.

New in version 1.1.0.

Parameters
keyfunc : function

a function to compute the key

ascending : bool, optional, default True

sort the keys in ascending or descending order

numPartitions : int, optional

the number of partitions in the new RDD

Returns
RDD

a new RDD sorted by the given keyfunc

Examples

>>> tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[0]).collect()
[('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[1]).collect()
[('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
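
As an additional illustrative sketch (not part of the original docstring), ascending=False reverses the sort order and numPartitions sets the partition count of the result; this assumes the same sc and tmp as in the examples above:

>>> sc.parallelize(tmp).sortBy(lambda x: x[0], ascending=False).collect()  # descending by key
[('d', 4), ('b', 2), ('a', 1), ('2', 5), ('1', 3)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[1], numPartitions=2).getNumPartitions()  # result repartitioned into 2
2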