pyspark.pandas.DataFrame.itertuples¶

DataFrame.itertuples(index: bool = True, name: Optional[str] = 'PandasOnSpark') → Iterator[Tuple][source]¶

Iterate over DataFrame rows as namedtuples.

Parameters

indexbool, default True: If True, return the index as the first element of the tuple.
namestr or None, default “PandasOnSpark”: The name of the returned namedtuples or None to return regular tuples.

Returns

iterator: An object to iterate over namedtuples for each row in the DataFrame with the first field possibly being the index and following fields being the column values.

See also

DataFrame.iterrows: Iterate over DataFrame rows as (index, Series) pairs.
DataFrame.items: Iterate over (column name, Series) pairs.

Notes

The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore. On python versions < 3.7 regular tuples are returned for DataFrames with a large number of columns (>254).

Examples

>>> df = ps.DataFrame({'num_legs': [4, 2], 'num_wings': [0, 2]},
...                   index=['dog', 'hawk'])
>>> df
      num_legs  num_wings
dog          4          0
hawk         2          2

>>> for row in df.itertuples():
...     print(row)
...
PandasOnSpark(Index='dog', num_legs=4, num_wings=0)
PandasOnSpark(Index='hawk', num_legs=2, num_wings=2)

By setting the index parameter to False we can remove the index as the first element of the tuple:

>>> for row in df.itertuples(index=False):
...     print(row)
...
PandasOnSpark(num_legs=4, num_wings=0)
PandasOnSpark(num_legs=2, num_wings=2)

With the name parameter set we set a custom name for the yielded namedtuples:

>>> for row in df.itertuples(name='Animal'):
...     print(row)
...
Animal(Index='dog', num_legs=4, num_wings=0)
Animal(Index='hawk', num_legs=2, num_wings=2)

pyspark.pandas.DataFrame.iterrows pyspark.pandas.DataFrame.keys