pyspark.testing.assertSchemaEqual
pyspark.testing.assertSchemaEqual(actual: pyspark.sql.types.StructType, expected: pyspark.sql.types.StructType)
A utility function that asserts equality between the DataFrame schemas actual and expected.
New in version 3.5.0.
Parameters
actual : StructType
    The DataFrame schema that is being compared or tested.
expected : StructType
    The expected schema, for comparison with the actual schema.
Notes
When assertSchemaEqual fails, the error message uses the Python difflib library to display a diff log of the actual and expected schemas.
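To illustrate the kind of diff log described above, the following is a minimal sketch that uses only the standard library. It assumes the comparison is done with `difflib.ndiff` on the string form of each schema; the helper name `schema_diff` and the two hard-coded schema strings are hypothetical and are not part of the PySpark API.

```python
import difflib

# Hypothetical inputs: the string forms of two schemas, written out by hand
# so that the sketch runs without a Spark installation.
actual = (
    "StructType([StructField('id', LongType(), True), "
    "StructField('number', LongType(), True)])"
)
expected = (
    "StructType([StructField('id', StringType(), True), "
    "StructField('amount', LongType(), True)])"
)


def schema_diff(actual_repr: str, expected_repr: str) -> str:
    """Return an ndiff-style comparison of two single-line schema strings.

    This is an illustrative helper, not PySpark's own implementation.
    """
    # ndiff compares sequences of lines; each schema is treated as one line.
    lines = difflib.ndiff([actual_repr], [expected_repr])
    # Hint lines ("? ...") end with a newline, so strip it before joining.
    return "\n".join(line.rstrip("\n") for line in lines)


diff_text = schema_diff(actual, expected)
print("--- actual")
print("+++ expected")
print(diff_text)
```

Lines starting with `-` come from the actual schema, lines starting with `+` come from the expected schema, and lines starting with `?` mark the character positions that differ.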
Examples
>>> from pyspark.testing import assertSchemaEqual
>>> from pyspark.sql.types import StructType, StructField, ArrayType, DoubleType
>>> s1 = StructType([StructField("names", ArrayType(DoubleType(), True), True)])
>>> s2 = StructType([StructField("names", ArrayType(DoubleType(), True), True)])
>>> assertSchemaEqual(s1, s2)  # pass, schemas are identical
>>> df1 = spark.createDataFrame(data=[(1, 1000), (2, 3000)], schema=["id", "number"])
>>> df2 = spark.createDataFrame(data=[("1", 1000), ("2", 5000)], schema=["id", "amount"])
>>> assertSchemaEqual(df1.schema, df2.schema)
Traceback (most recent call last):
...
PySparkAssertionError: [DIFFERENT_SCHEMA] Schemas do not match.
--- actual
+++ expected
- StructType([StructField('id', LongType(), True), StructField('number', LongType(), True)])
?                               ^^                               ^^^^^
+ StructType([StructField('id', StringType(), True), StructField('amount', LongType(), True)])
?                               ^^^^                              ++++ ^