pyspark.pandas.DataFrame.truncate

DataFrame.truncate(before: Optional[Any] = None, after: Optional[Any] = None, axis: Union[int, str, None] = None, copy: bool = True) → Union[DataFrame, Series]

Truncate a Series or DataFrame before and after some index value.

This is a useful shorthand for boolean indexing based on index values above or below certain thresholds.
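The equivalence with boolean indexing can be sketched as follows. This is a minimal illustration that uses plain pandas as a stand-in so it runs without a Spark session; that pandas-on-Spark selects the same rows for a sorted index is an assumption here, not something this page states. Note that both bounds are inclusive, as the examples on this page also show.

```python
import pandas as pd  # assumption: plain pandas stands in for pyspark.pandas

df = pd.DataFrame(
    {"A": ["a", "b", "c", "d", "e"], "B": ["f", "g", "h", "i", "j"]},
    index=[1, 2, 3, 4, 5],
)

# truncate keeps the rows whose index label lies in the closed range [before, after].
truncated = df.truncate(before=2, after=4)

# The same selection written as an explicit boolean mask on the index.
masked = df[(df.index >= 2) & (df.index <= 4)]

# Both approaches select the same rows.
print(truncated.equals(masked))
```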

Note

This API depends on Index.is_monotonic_increasing, which can be expensive to compute.
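Because the method relies on a monotonicity check, an unsorted index is a common source of errors. The sketch below shows one way to guard against that. The helper name `truncate_sorted` is hypothetical, and plain pandas is used as a stand-in so the snippet runs without Spark. pandas raises `ValueError` for an index that is neither increasing nor decreasing; that pandas-on-Spark behaves the same way is an assumption.

```python
import pandas as pd  # assumption: plain pandas stands in for pyspark.pandas


def truncate_sorted(obj, before=None, after=None):
    """Hypothetical helper: sort the index if needed, then truncate."""
    # Sorting is only done when the index is not already increasing.
    if not obj.index.is_monotonic_increasing:
        obj = obj.sort_index()
    return obj.truncate(before=before, after=after)


unsorted = pd.Series([30, 10, 20], index=[3, 1, 2])

# Calling truncate directly on an unsorted index fails in pandas.
try:
    unsorted.truncate(before=1, after=2)
    raised = False
except ValueError:
    raised = True

# The helper sorts first, so the call succeeds.
result = truncate_sorted(unsorted, before=1, after=2)
print(raised, result.tolist())
```

The helper trades an extra monotonicity check and a possible sort for robustness. It also reorders a descending index into ascending order, which may not be what every caller wants.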

Parameters

before (date, str, int): Truncate all rows before this index value.
after (date, str, int): Truncate all rows after this index value.
axis ({0 or ‘index’, 1 or ‘columns’}, optional): Axis to truncate. Truncates the index (rows) by default.
copy (bool, default True): Return a copy of the truncated section.

Returns

type of caller: The truncated Series or DataFrame.

See also

DataFrame.loc: Select a subset of a DataFrame by label.
DataFrame.iloc: Select a subset of a DataFrame by position.

Examples

>>> df = ps.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'],
...                    'B': ['f', 'g', 'h', 'i', 'j'],
...                    'C': ['k', 'l', 'm', 'n', 'o']},
...                   index=[1, 2, 3, 4, 5])
>>> df
   A  B  C
1  a  f  k
2  b  g  l
3  c  h  m
4  d  i  n
5  e  j  o

>>> df.truncate(before=2, after=4)
   A  B  C
2  b  g  l
3  c  h  m
4  d  i  n

The columns of a DataFrame can be truncated.

>>> df.truncate(before="A", after="B", axis="columns")
   A  B
1  a  f
2  b  g
3  c  h
4  d  i
5  e  j

For Series, only rows can be truncated.

>>> df['A'].truncate(before=2, after=4)
2    b
3    c
4    d
Name: A, dtype: object

A Series whose index consists of sorted integers.

>>> s = ps.Series([10, 20, 30, 40, 50, 60, 70],
...               index=[1, 2, 3, 4, 5, 6, 7])
>>> s
1    10
2    20
3    30
4    40
5    50
6    60
7    70
dtype: int64

>>> s.truncate(2, 5)
2    20
3    30
4    40
5    50
dtype: int64

A Series whose index consists of sorted strings.

>>> s = ps.Series([10, 20, 30, 40, 50, 60, 70],
...               index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
>>> s
a    10
b    20
c    30
d    40
e    50
f    60
g    70
dtype: int64

>>> s.truncate('b', 'e')
b    20
c    30
d    40
e    50
dtype: int64
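Since `before` and `after` both default to None in the signature, either bound can be left out. The following is a small sketch of one-sided truncation, again using plain pandas as a stand-in (an assumption made so the snippet runs without a Spark session).

```python
import pandas as pd  # assumption: plain pandas stands in for pyspark.pandas

s = pd.Series([10, 20, 30, 40, 50, 60, 70],
              index=["a", "b", "c", "d", "e", "f", "g"])

# Only an upper bound: keep everything up to and including 'c'.
head = s.truncate(after="c")

# Only a lower bound: keep everything from 'e' onwards.
tail = s.truncate(before="e")

print(list(head.index), list(tail.index))
```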
