pyspark.sql.Catalog.recoverPartitions

Catalog.recoverPartitions(tableName)

Recovers all the partitions in the directory of a table and updates the catalog.

Only works with a partitioned table, not a view.

New in version 2.1.1.

Parameters
tableName : str

Table name; may be qualified with catalog and database (namespace). If no database identifier is provided, the name refers to a table in the current database.

Examples

The example below creates a partitioned table on top of an existing directory of partitioned data, then recovers the partitions. Note that the table appears empty until recoverPartitions is called, because the catalog has no partition entries for the files already on disk.

>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="recoverPartitions") as d:
...     _ = spark.sql("DROP TABLE IF EXISTS tbl1")
...     p = d.replace("'", "''")
...     spark.range(1).selectExpr(
...         "id as key", "id as value").write.partitionBy(
...             "key").mode("overwrite").format("csv").save(d)
...     _ = spark.sql(
...         "CREATE TABLE tbl1 (key LONG, value LONG) USING csv OPTIONS (header false, path '{}') "
...         "PARTITIONED BY (key)".format(p))
...     spark.table("tbl1").show()
...     spark.catalog.recoverPartitions("tbl1")
...     spark.table("tbl1").show()
+-----+---+
|value|key|
+-----+---+
+-----+---+
+-----+---+
|value|key|
+-----+---+
|    0|  0|
+-----+---+
>>> _ = spark.sql("DROP TABLE tbl1")
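Partition recovery works by scanning the table's directory for Hive-style subdirectories named column=value and registering each one as a partition in the catalog; it is the programmatic counterpart of the SQL statement MSCK REPAIR TABLE. The sketch below, which uses only the standard library (no Spark session), illustrates the on-disk layout that partitionBy("key") in the example above produces and how partition values can be parsed back out of directory names:

```python
import os
import tempfile

# Illustrative sketch: mimic the Hive-style "column=value" directory
# layout that Spark's partitionBy("key") writes, and which
# recoverPartitions() scans to re-register partitions in the catalog.
with tempfile.TemporaryDirectory() as d:
    # One subdirectory per partition value, each holding a data file.
    for key in (0, 1):
        part_dir = os.path.join(d, f"key={key}")
        os.makedirs(part_dir)
        with open(os.path.join(part_dir, "part-00000.csv"), "w") as f:
            f.write(f"{key}\n")

    # Parse partition values back out of the directory names, as
    # partition discovery does.
    partitions = [
        dict([name.split("=", 1)])
        for name in sorted(os.listdir(d))
        if "=" in name
    ]

print(partitions)  # [{'key': '0'}, {'key': '1'}]
```

This is only an illustration of the directory convention; the actual recovery (and any metastore update) is performed by Spark when recoverPartitions is invoked on the table.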