BUG: merge_asof unrecoverably loses precision, yet does not support nullable integer columns.

This issue has been tracked since 2022-09-22.

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df_a = pd.DataFrame({
    'a':[0, 1549049940000000000],
    'v2': ['a1', 'a2']
})
df_b = pd.DataFrame({
    'b': [1549049937688215000],
    'v': ['b1']
})
pd.merge_asof(df_a, df_b, left_on=['a'], right_on=['b']).astype({'b': pd.Int64Dtype()})

Issue Description

The join result incorrectly ends up with b = 1549049937688215040, rather than the value provided. Note the last 2 digits being 40 rather than 00 like in the input value.

This is because it casts the values to floats to be able to produce "null" values in the row that hasn't been matched, and as part of casting the value to floats, it loses precision and produces incorrect values.

If you remove the first row (where a = 0) in df_a, it returns the correct result as there are no rows that need to be "null".

Sadly you cannot do a merge_asof on nullable types (pd.Int64Dtype() et al) to fix this.

Expected Behavior

import pandas as pd
df_a = pd.DataFrame({
    'a':[0, 1549049940000000000],
    'v2': ['a1', 'a2']
})
df_b = pd.DataFrame({
    'b': [1549049937688215000],
    'v': ['b1']
})
df_a['a_copy'] = df_a['a'].astype(pd.Int64Dtype())
df_b['b_copy'] = df_b['b'].astype(pd.Int64Dtype())

pd.merge_asof(df_a, df_b, left_on=['a'], right_on=['b']).drop(columns=['a', 'b']).rename(columns={'a_copy': 'a', 'b_copy': 'b'})

returns the correct result but is not very nice to use.

Installed Versions

1.5.0

phofl wrote this answer on 2022-09-22

Hi, thanks for your report. We are not casting from numpy dtypes to nullable dtypes just because there are nans introduced. You have to be explizit there. You can use convert_dtypesto generalise this

Audrius-GR wrote this answer on 2022-09-22

Sure, but We are not casting from numpy dtypes to nullable dtypes just because there are nans introduced. is exactly what causes the integers to be converted to floats and lose precision unrecoverably.

Why not convert to pandas nullable types instead and not lose precision?

As I said, there are workarounds, as you pointed out, convert_dtypes is one of them (except you still have to use numpy columns for the merge keys, as it does not support pandas nullable types),

But I feel this is "a trap by default" when a user passes in a integer 1549049937688215000 and gets back a float that cannot be converted back to the original value, as casting it to int yields 1549049937688215040.

phofl wrote this answer on 2022-09-22

There is lots of discussion about this on the issue tracker. For now, it is a deliberate decision not to do that

Audrius-GR wrote this answer on 2022-09-22

Except that the decision is not consistent.

If there is no "null" row, I get int64's back with the correct values that don't lose precision.
If there is a null row, I get floats that have lost precision.

Audrius-GR wrote this answer on 2022-09-22

and doing:

df_a['a'] = df_a['a'].astype({'a': pd.Int64Dtype()})
df_b['b'] = df_b['b'].astype({'b': pd.Int64Dtype()})

before the merge does not work, and ends up with:

  File "\lib\site-packages\pandas\core\reshape\merge.py", line 1993, in _get_join_indexers
    return func(left_values, right_values, self.allow_exact_matches, tolerance)
  File "pandas\_libs\join.pyx", line 868, in pandas._libs.join.__pyx_fused_cpdef
TypeError: No matching signature found
phofl wrote this answer on 2022-09-22

This is the numpy casting behavior, not pandas

phofl wrote this answer on 2022-09-22

Merge is not yet fully supported for nullable dtypes,

There are open issues about that

More Details About Repo
Owner Name pandas-dev
Repo Name pandas
Full Name pandas-dev/pandas
Language Python
Created Date 2010-08-24
Updated Date 2022-10-04
Star Count 35430
Watcher Count 1120
Fork Count 15089
Issue Count 3589

YOU MAY BE INTERESTED

Issue Title Created Date Updated Date