ENH: Ability to specify only `rows` or `columns` as `margins` using `pd.pivot_table`

This issue has been tracked since 2022-09-21.

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Currently, using pd.pivot_table(..., margins=True) applies aggfunc to both axes. This is not always useful, depending on the use case, and so I propose changing margins to allow the specific axis of interest to be specified.

e.g., in this example I am only interested in adding a daily total sales

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "state": ["CA", "WA", "CO", "AZ"] * 3,
        "office_id": list(range(1, 7)) * 2,
        "date": [pd.Timestamp.now().date() - pd.Timedelta(days=d) for d in range(6)] * 2,
        "sales": [np.random.randint(100_000, 999_999) for _ in range(12)]
    }
).astype(
    {
        "sales": np.float64,
        "date": pd.api.types.pandas_dtype("O")
    }
)

df.pivot_table(index=["office_id", "date"], columns="state", margins=True, aggfunc="sum").iloc[:-1, :].stack() # workaround
df.pivot_table(index=["office_id", "date"], columns="state", margins="rows", aggfunc="sum").stack() # proposed syntax

Feature Description

from typing import Literal


def pivot_table(
    data: DataFrame,
    ...,
    margins: Literal["rows", "columns", "both"] | bool = False

Perhaps in future, boolean argument would be depreciated in favour of more explicit "both" argument.

Alternative Solutions

Currently the only way to achieve this outcome is by slicing to remove the axis which is not of interest (i.e. in above example with iloc)

Additional Context

No response

More Details About Repo
Owner Name pandas-dev
Repo Name pandas
Full Name pandas-dev/pandas
Language Python
Created Date 2010-08-24
Updated Date 2022-10-04
Star Count 35430
Watcher Count 1120
Fork Count 15089
Issue Count 3589

YOU MAY BE INTERESTED

Issue Title Created Date Updated Date