pyspark.sql.streaming.DataStreamWriter.outputMode¶
DataStreamWriter.outputMode(outputMode: str) → pyspark.sql.streaming.readwriter.DataStreamWriter[source]¶

Specifies how data of a streaming DataFrame/Dataset is written to a streaming sink.
New in version 2.0.0.
Changed in version 3.5.0: Supports Spark Connect.
Options include:
- append: Only the new rows in the streaming DataFrame/Dataset will be written to
  the sink.
- complete: All the rows in the streaming DataFrame/Dataset will be written to the sink
  every time there are some updates.
- update: Only the rows that were updated in the streaming DataFrame/Dataset will be
  written to the sink every time there are some updates. If the query doesn't contain
  aggregations, it will be equivalent to append mode.
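The semantic difference between the three modes can be illustrated with a small pure-Python simulation (this is not Spark code; the `emit` function and its behavior are illustrative assumptions, modeling a running count aggregation over micro-batches):

```python
# Illustrative sketch only, not Spark internals: simulates what a sink
# would receive per micro-batch under each output mode, for a running
# count aggregation keyed by value.

def emit(mode, state, new_rows):
    """Apply one micro-batch of new_rows to the running counts in
    `state` and return the rows the sink would receive."""
    updated_keys = set()
    for key in new_rows:
        state[key] = state.get(key, 0) + 1
        updated_keys.add(key)
    if mode == "append":
        # Without aggregations: only the brand-new input rows.
        return list(new_rows)
    if mode == "complete":
        # The entire result table, every trigger.
        return sorted(state.items())
    if mode == "update":
        # Only the groups whose counts changed in this trigger.
        return sorted((k, state[k]) for k in updated_keys)
    raise ValueError(f"unknown output mode: {mode}")

state = {}
print(emit("update", state, ["a", "b", "a"]))   # [('a', 2), ('b', 1)]
print(emit("complete", state, ["b"]))           # [('a', 2), ('b', 2)]
```

Note that `update` re-emits only the changed group `b` on the second batch, whereas `complete` re-emits every group.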
Notes
This API is evolving.
Examples
>>> df = spark.readStream.format("rate").load()
>>> df.writeStream.outputMode('append')
<...streaming.readwriter.DataStreamWriter object ...>
The example below uses Complete mode, in which the entire aggregated counts are printed out on every update.
>>> import time
>>> df = spark.readStream.format("rate").option("rowsPerSecond", 10).load()
>>> df = df.groupby().count()
>>> q = df.writeStream.outputMode("complete").format("console").start()
>>> time.sleep(3)
>>> q.stop()