I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.
My question is about on_bad_lines='warn'. The pandas team added functionality that directly handles lines with more separators than the main lines, so it would not seem strange to me if some other pandas option could handle lines with fewer separators than the main lines in the same way.
With the on_bad_lines='warn' option, the case where there are too many column delimiters works well: the bad lines are not loaded, and stderr reports the lines that were skipped.
    import pandas as pd
    from io import StringIO

    data = StringIO("""
    nom,f,nb
    bat,F,52
    cat,M,66,
    caw,F,15
    dog,M,66,,
    fly,F,61
    ant,F,21""")

    df = pd.read_csv(data, sep=',', on_bad_lines='warn')
    # stderr: b'Skipping line 4: expected 3 fields, saw 4\nSkipping line 6: expected 3 fields, saw 5\n'

    df.head(10)
    #    nom  f  nb
    # 0  bat  F  52
    # 1  caw  F  15
    # 2  fly  F  61
    # 3  ant  F  21
But when a line has fewer delimiters (here sep=',') than the main lines, the line is still loaded, with the missing fields filled with NaN:
    import pandas as pd
    from io import StringIO

    data = StringIO("""
    nom,f,nb
    bat,F,52
    catM66,
    caw,F,15
    dog,M66
    fly,F,61
    ant,F,21""")

    df = pd.read_csv(data, sep=',', on_bad_lines='warn', dtype=str)

    df.head(10)
    #       nom    f   nb
    # 0     bat    F   52
    # 1  catM66  NaN  NaN  <==
    # 2     caw    F   15
    # 3     dog  M66  NaN  <==
    # 4     fly    F   61
    # 5     ant    F   21
Is there a way to make read_csv skip lines that have fewer column delimiters than the main lines?
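
For illustration, here is one possible workaround rather than a built-in pandas option: filter the rows with the standard-library csv module before building the DataFrame. This is only a sketch. The helper name read_csv_strict is hypothetical, it assumes the first non-blank line is the header, and it keeps every value as a string (similar to dtype=str above).

```python
import csv
from io import StringIO

import pandas as pd

raw = """
nom,f,nb
bat,F,52
catM66,
caw,F,15
dog,M66
fly,F,61
ant,F,21"""


def read_csv_strict(text, sep=","):
    """Hypothetical helper: keep only rows whose field count matches the header's."""
    # csv.reader yields an empty list for blank lines, so `if row` drops them.
    rows = [row for row in csv.reader(StringIO(text), delimiter=sep) if row]
    header, body = rows[0], rows[1:]
    # Discard rows with too few (or too many) fields compared with the header.
    kept = [row for row in body if len(row) == len(header)]
    return pd.DataFrame(kept, columns=header)


df = read_csv_strict(raw)
print(df)
```

Note that this filter also drops lines with too many fields, so it behaves like on_bad_lines='skip' for both cases. Dropping rows with dropna() after read_csv would be shorter, but it would also remove rows whose values are legitimately missing.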