pyspark.sql.functions.tuple_intersection_double
- pyspark.sql.functions.tuple_intersection_double(col1, col2, mode=None)
Returns the intersection of two Datasketches TupleSketch objects with double summaries.
New in version 4.2.0.
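The intersection can be illustrated with a toy model. The code below is a simplified, hypothetical KMV ("k minimum values") model of a tuple sketch with double summaries. It is not the Apache DataSketches implementation. Each toy sketch keeps the k smallest key hashes, with one double summary per hash, plus a sampling threshold theta. The intersection keeps the hashes that both sketches retained, under the smaller theta. The estimate is the retained count divided by theta.

The following are all illustrative assumptions: the names `ToyDoubleTupleSketch`, `unit_hash` and `toy_intersection`, the SHA-1 hash, and the rule that the default `sum` mode adds the two summaries of a shared key.

```python
import hashlib
import operator


def unit_hash(key):
    """Map a key to a pseudo-uniform float in [0, 1).

    SHA-1 is an illustrative choice; DataSketches uses its own hash function.
    """
    digest = hashlib.sha1(repr(key).encode("utf-8")).digest()
    # Keep 53 bits so the division is exact and the result stays below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)


class ToyDoubleTupleSketch:
    """Hypothetical KMV-style tuple sketch: hash -> double summary, plus theta."""

    def __init__(self, k=4096):
        self.k = k          # maximum number of retained entries
        self.theta = 1.0    # only hashes below theta are retained
        self.entries = {}   # hash -> summary value

    def update(self, key, value):
        h = unit_hash(key)
        if h >= self.theta:
            return  # key falls outside the current sample
        # Repeated keys accumulate their values (sum semantics in this toy).
        self.entries[h] = self.entries.get(h, 0.0) + value
        if len(self.entries) > self.k:
            # Evict the largest hash and lower theta to it, keeping the k smallest.
            largest = max(self.entries)
            del self.entries[largest]
            self.theta = largest

    def estimate(self):
        # Retained key count scaled up by the sampling rate theta.
        return len(self.entries) / self.theta


def toy_intersection(a, b, combine=operator.add):
    """Intersect two toy sketches, merging summaries of shared keys with `combine`."""
    result = ToyDoubleTupleSketch(k=min(a.k, b.k))
    result.theta = min(a.theta, b.theta)
    # Each sketch keeps every seen hash below its own theta, so the hashes retained
    # by both are exactly the common keys whose hash is below min(theta_a, theta_b).
    for h in a.entries.keys() & b.entries.keys():
        result.entries[h] = combine(a.entries[h], b.entries[h])
    return result


# Usage with the same keys and values as the example on this page.
s1 = ToyDoubleTupleSketch()
s2 = ToyDoubleTupleSketch()
for key, value in [(1, 10.0), (2, 20.0), (3, 30.0)]:
    s1.update(key, value)
for key, value in [(2, 20.0), (3, 30.0), (4, 40.0)]:
    s2.update(key, value)
both = toy_intersection(s1, s2)
print(both.estimate())                # 2.0
print(sorted(both.entries.values()))  # [40.0, 60.0]
```

With only a few keys, theta stays at 1.0, so the estimate is an exact count. On large inputs, theta drops below 1.0 and the result becomes a scaled estimate.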
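- Parameters
col1 : Column or column name
The first TupleSketch with double summaries, in binary representation (for example, the output of tuple_sketch_agg_double).
col2 : Column or column name
The second TupleSketch with double summaries, in binary representation.
mode : str, optional
How the double summaries of keys present in both sketches are combined. If omitted, sum is used, as the output column name in the example shows.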
- Returns
Column
The binary representation of the intersected TupleSketch.
See also
Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, 10.0, 2, 20.0), (2, 20.0, 3, 30.0), (3, 30.0, 4, 40.0)], ["key1", "v1", "key2", "v2"]) # noqa
>>> df = df.agg(
...     sf.tuple_sketch_agg_double("key1", "v1").alias("sketch1"),
...     sf.tuple_sketch_agg_double("key2", "v2").alias("sketch2")
... )
>>> df.select(sf.tuple_sketch_estimate_double(sf.tuple_intersection_double(df.sketch1, "sketch2"))).show() # noqa
+------------------------------------------------------------------------------+
|tuple_sketch_estimate_double(tuple_intersection_double(sketch1, sketch2, sum))|
+------------------------------------------------------------------------------+
|                                                                           2.0|
+------------------------------------------------------------------------------+
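Each sketch in the example holds only three keys. Assuming that is far fewer than the sketch's capacity, the estimate is exact, so the result can be cross-checked with plain Python dictionaries and no Spark session. This check assumes that the default `sum` mode adds the two summaries of a key present in both sketches. The variable names are illustrative only.

```python
# Rows from the example: (key1, v1, key2, v2)
rows = [(1, 10.0, 2, 20.0), (2, 20.0, 3, 30.0), (3, 30.0, 4, 40.0)]

# Exact stand-ins for the two sketches: key -> accumulated double summary.
sketch1, sketch2 = {}, {}
for key1, v1, key2, v2 in rows:
    sketch1[key1] = sketch1.get(key1, 0.0) + v1
    sketch2[key2] = sketch2.get(key2, 0.0) + v2

# The intersection keeps keys present in both; here their summaries are summed.
common = sketch1.keys() & sketch2.keys()
intersection = {key: sketch1[key] + sketch2[key] for key in sorted(common)}

print(float(len(intersection)))  # 2.0, matching the estimate in the example output
print(intersection)              # {2: 40.0, 3: 60.0}
```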