pyspark.sql.Catalog.recoverPartitions
Catalog.recoverPartitions(tableName)
Recovers all the partitions in the directory of a table and updates the catalog.
Only works with a partitioned table, and not a view.
New in version 2.1.1.
Parameters
tableName : str
    Table name; may be qualified with catalog and database (namespace). If no database identifier is provided, the name refers to a table in the current database.
Examples
The example below creates a partitioned table against the existing directory of the partitioned table. After that, it recovers the partitions.
>>> import tempfile
>>> with tempfile.TemporaryDirectory(prefix="recoverPartitions") as d:
...     _ = spark.sql("DROP TABLE IF EXISTS tbl1")
...     p = d.replace("'", "''")
...     spark.range(1).selectExpr(
...         "id as key", "id as value").write.partitionBy(
...         "key").mode("overwrite").format("csv").save(d)
...     _ = spark.sql(
...         "CREATE TABLE tbl1 (key LONG, value LONG) USING csv OPTIONS (header false, path '{}') "
...         "PARTITIONED BY (key)".format(p))
...     spark.table("tbl1").show()
...     spark.catalog.recoverPartitions("tbl1")
...     spark.table("tbl1").show()
+-----+---+
|value|key|
+-----+---+
+-----+---+
+-----+---+
|value|key|
+-----+---+
|    0|  0|
+-----+---+
>>> _ = spark.sql("DROP TABLE tbl1")
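Partition recovery works because partitionBy writes data in Hive-style directories named column=value; recoverPartitions scans the table's root directory for such subdirectories and registers each one in the catalog. The sketch below (standard library only, not Spark code, with a hypothetical discover_partitions helper) mimics just the directory-scan part to make that layout concrete.

```python
import os
import tempfile

def discover_partitions(root, column):
    """Return the partition values found under root for one column.

    Mimics the Hive-style "column=value" directory scan that partition
    recovery performs; this is an illustration, not Spark's implementation.
    """
    prefix = column + "="
    values = []
    for entry in sorted(os.listdir(root)):
        full = os.path.join(root, entry)
        if os.path.isdir(full) and entry.startswith(prefix):
            values.append(entry[len(prefix):])
    return values

with tempfile.TemporaryDirectory() as d:
    # Lay out the same structure that write.partitionBy("key") produces.
    for v in ("0", "1", "2"):
        os.makedirs(os.path.join(d, "key=" + v))
    print(discover_partitions(d, "key"))  # ['0', '1', '2']
```

In the doctest above, the first show() prints no rows because the CREATE TABLE statement registers the table without its partitions; only after recoverPartitions has scanned the key=0 directory does the row become visible.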