REGR: Setting values with 'loc' and boolean mask mixes up values (all-True mask)

This issue has been tracked since 2022-09-22.

According to the documentation, loc should work with boolean arrays. However, the following does not work as expected:

import pandas as pd

n = 17

data = pd.DataFrame({
    "name": n * ["a"], 
    "x": range(n), 
    "y": range(n)
})

copy = data.copy()

idx = n * [True]
data.loc[idx, ["x", "y"]] = data[["x", "y"]]
assert data.equals(copy)   # Raises assertion error

The weird thing is that if n is smaller then the code works. This has been tested on pandas 1.5.0.

martinfleis wrote this answer on 2022-09-22

I believe that this is a regression in pandas 1.5.0, as this works correctly on 1.4.4. We ran into the same bug downstream yesterday (geopandas/geopandas#2558).

jorisvandenbossche wrote this answer on 2022-09-22

The reproducer from that issue:

In [1]: df = pd.DataFrame({'idx': range(20), 'col': np.arange(20, dtype="float")})

In [2]: df.loc[np.ones(len(df), dtype=bool), 'col'] = df.col.values

In [3]: df
Out[3]: 
    idx   col
0     0   0.0
1     1  17.0
2     2  16.0
3     3  15.0
4     4  14.0
5     5  13.0
6     6  12.0
7     7  11.0
8     8  10.0
9     9   9.0
10   10   8.0
11   11   7.0
12   12   6.0
13   13   5.0
14   14   4.0
15   15   3.0
16   16   2.0
17   17   1.0
18   18  18.0
19   19  19.0

"The weird thing is that if n is smaller then the code works."

It indeed needs a minimum number of rows, because np.argsort (whose default sort is not stable) only starts returning a non-identity order once the all-True mask is long enough:

In [4]: np.argsort([True]*10)
Out[4]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [5]: np.argsort([True]*17)
Out[5]: array([ 0, 14, 13, 12, 11, 10,  9, 15,  8,  6,  5,  4,  3,  2,  1,  7, 16])
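For comparison, here is a minimal sketch of how the sort kind affects this. It relies only on the documented behaviour that np.argsort's default kind ("quicksort") is not stable while kind="stable" is; the exact default order depends on the NumPy version and array length, so no output is shown for it.

```python
import numpy as np

mask = np.ones(17, dtype=bool)

# Default kind ("quicksort") is not stable: the relative order of equal
# elements is unspecified, so any permutation of 0..16 is a valid result.
default_order = np.argsort(mask)

# A stable sort keeps equal elements in their original order, so an
# all-True mask always maps to the identity permutation.
stable_order = np.argsort(mask, kind="stable")

print(default_order)
print(stable_order)
```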

This argsort is used here:

value = value[np.argsort(pi)]

That code assumes that the indexer is already an integer indexer, not a boolean one. Previously, we converted a boolean indexer to integers in _LocIndexer._convert_to_indexer, but this was removed in #45501 (location in diff).

I haven't verified this to be 100% sure, but given the code path and the change in #45501, I assume that's the cause.
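To illustrate the difference between the two kinds of indexer, here is a rough sketch using plain NumPy. It is not the pandas code path or the eventual fix; the names mask, value and positions are made up for the example, and np.flatnonzero merely stands in for "convert the boolean indexer to integers".

```python
import numpy as np

mask = np.ones(17, dtype=bool)         # boolean indexer, as in the report
value = np.arange(17, dtype="float")   # values being assigned

# Treating the mask as if it held integer positions: argsort sees 17 equal
# values, so the resulting order is arbitrary and value may get shuffled.
maybe_shuffled = value[np.argsort(mask)]

# Converting the mask to integer positions first: the positions are strictly
# increasing, so argsort is the identity and value keeps its order.
positions = np.flatnonzero(mask)
reordered = value[np.argsort(positions)]

print(maybe_shuffled)
print(reordered)
```

Because the integer positions are unique, the sort kind no longer matters in the second case.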

MarcoGorelli wrote this answer on 2022-09-24

"I haven't verified this to be 100% sure, but given the code path and the change in #45501, I assume that's the cause."

yup, git bisect confirms

