QST: handle lines with less separators than main lines in read_csv

This issue has been tracked since 2022-09-23.

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

https://stackoverflow.com/questions/73820090/make-pandas-read-csv-to-not-add-lines-with-less-columns-delimiters-than-the-main

Question about pandas

The on_bad_lines=warn question me. Pandas team added the functionality to directly handle lines with more separators than the main lines, that's why it don't seems strange to me that some other pandas option could handle in the same way lines with less separators than main lines

Using pandas.read_csv with on_bad_lines='warn' option the case there is too many
columns delimiters work well, bad lines are not loaded and stderr catch the bad lines
numbers:

    import pandas as pd
    from io import StringIO
    data = StringIO("""
    nom,f,nb
    bat,F,52
    cat,M,66,
    caw,F,15
    dog,M,66,,
    fly,F,61
    ant,F,21""")
    df = pd.read_csv(data, sep=',', on_bad_lines='warn')

    b'Skipping line 4: expected 3 fields, saw 4\nSkipping line 6: expected 3 fields, saw 5\n'

    df.head(10)
    #    nom  f  nb
    # 0  bat  F  52
    # 1  caw  F  15
    # 2  fly  F  61
    # 3  ant  F  21

But in case the number of delimiter (here sep=,) is less than the main, the line
is added adding NaN.:

    import pandas as pd
    from io import StringIO
    data = StringIO("""
    nom,f,nb
    bat,F,52
    catM66,
    caw,F,15
    dog,M66
    fly,F,61
    ant,F,21""")
    df = pd.read_csv(data, sep=',', on_bad_lines='warn', dtype=str)
    df.head(10)

    #       nom    f   nb
    # 0     bat    F   52
    # 1  catM66  NaN  NaN            <==
    # 2     caw    F   15
    # 3     dog  M66  NaN            <==
    # 4     fly    F   61
    # 5     ant    F   21

Is there a way to make read_csv to not add lines with less columns delimiters than
the main lines ?

phofl wrote this answer on 2022-09-29

Hi, thanks for your report.
Could you check, if we have open issues about this? Pretty sure that I've seen something like this already

More Details About Repo
Owner Name pandas-dev
Repo Name pandas
Full Name pandas-dev/pandas
Language Python
Created Date 2010-08-24
Updated Date 2022-10-04
Star Count 35430
Watcher Count 1120
Fork Count 15089
Issue Count 3589

YOU MAY BE INTERESTED

Issue Title Created Date Updated Date