pyspark.RDD.sortBy

RDD.sortBy(keyfunc, ascending=True, numPartitions=None)

Sorts this RDD by the given keyfunc.

New in version 1.1.0.

Parameters
keyfunc : function

a function to compute the key

ascending : bool, optional, default True

sort the keys in ascending or descending order

numPartitions : int, optional

the number of partitions in the new RDD

Returns
RDD

a new RDD sorted by the given keyfunc

Examples

>>> tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[0]).collect()
[('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[1]).collect()
[('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
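
As an additional illustrative sketch (not part of the original docstring), ascending=False reverses the sort order and numPartitions sets the partition count of the result; this assumes the same sc and tmp as in the examples above:

>>> sc.parallelize(tmp).sortBy(lambda x: x[0], ascending=False).collect()  # descending by key
[('d', 4), ('b', 2), ('a', 1), ('2', 5), ('1', 3)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[1], numPartitions=2).getNumPartitions()  # result repartitioned into 2
2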