pyspark.RDD.sortBy#
- RDD.sortBy(keyfunc, ascending=True, numPartitions=None)[source]#
Sorts this RDD by the given keyfunc.
New in version 1.1.0.
- Parameters
- keyfunc : function
a function to compute the key
- ascending : bool, optional, default True
sort the keys in ascending or descending order
- numPartitions : int, optional
the number of partitions in the new RDD
- Returns
- RDD
a new RDD sorted by keyfunc
Examples
>>> tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[0]).collect()
[('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
>>> sc.parallelize(tmp).sortBy(lambda x: x[1]).collect()
[('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]
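The sort order produced by sortBy can be understood through Python's built-in sorted(): keyfunc plays the role of the key argument, and ascending=False corresponds to reverse=True. The sketch below mimics the examples above without a SparkContext (the variable names are illustrative, not part of the API):

```python
# Plain-Python analogue of RDD.sortBy: keyfunc -> key, ascending=False -> reverse=True.
tmp = [('a', 1), ('b', 2), ('1', 3), ('d', 4), ('2', 5)]

# Analogue of sc.parallelize(tmp).sortBy(lambda x: x[0]).collect()
by_key = sorted(tmp, key=lambda x: x[0])

# Analogue of sc.parallelize(tmp).sortBy(lambda x: x[1], ascending=False).collect()
by_value_desc = sorted(tmp, key=lambda x: x[1], reverse=True)

print(by_key)         # [('1', 3), ('2', 5), ('a', 1), ('b', 2), ('d', 4)]
print(by_value_desc)  # [('2', 5), ('d', 4), ('1', 3), ('b', 2), ('a', 1)]
```

On an actual RDD the sort is distributed across numPartitions partitions, but the resulting order after collect() matches this single-machine analogue.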