detect_format, train on column before extraction #28

DanielJDufour · 2023-11-01T02:02:58Z

Assuming every value in a CSV column is formatted the same way, I should be able to train on a column of dates before extracting dates from it.

Two new methods:

from date_extractor import detect_format

data = [None, "", "10/31/23", "1/2/23"]

detect_format(data)
"%m/%d/%y"

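One way `detect_format` could work is to try a list of candidate `strptime` formats against every non-empty value and return the first one that parses them all. The sketch below is an assumption, not the library's implementation: `detect_format_sketch` and `CANDIDATE_FORMATS` are hypothetical names, and the candidate list is illustrative only.

```python
from datetime import datetime

# Hypothetical candidate list; a real implementation would likely cover more formats.
CANDIDATE_FORMATS = ["%m/%d/%y", "%m/%d/%Y", "%d/%m/%y", "%d/%m/%Y", "%Y-%m-%d"]


def detect_format_sketch(values, candidates=CANDIDATE_FORMATS):
    """Return the first candidate that parses every non-empty value, else None."""
    # Skip None and empty strings, as in the example column above.
    samples = [v.strip() for v in values if isinstance(v, str) and v.strip()]
    if not samples:
        return None
    for fmt in candidates:
        try:
            for sample in samples:
                datetime.strptime(sample, fmt)
        except ValueError:
            continue  # at least one value does not fit this format
        return fmt
    return None
```

Candidate order breaks ties here: a column containing only "1/2/23" fits both month-first and day-first formats, and this sketch simply returns whichever is listed first.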

from date_extractor import prepare

extract_date = prepare(data)
for date in data:
    extract_date(date)

DanielJDufour · 2023-11-01T02:05:01Z

Also have to handle cases where the dates don't fit any of the traditional date formats, in which case it might need to return a special format or a regex like:
"(?PJanu|Febr|Marc|)/..."

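To illustrate the regex fallback, here is a minimal sketch that matches four-letter month prefixes with named groups. The group names (`month`, `day`, `year`), the slash-separated layout, and the two-digit-year rule are all assumptions made for the example; the thread does not specify them.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
# Truncated month names such as "Janu", "Febr", "Marc" (short names stay whole).
MONTH_PREFIXES = [name[:4] for name in MONTHS]

# Hypothetical pattern: <month prefix>/<day>/<2- or 4-digit year>
PATTERN = re.compile(
    r"(?P<month>" + "|".join(MONTH_PREFIXES) + r")"
    r"/(?P<day>\d{1,2})/(?P<year>\d{4}|\d{2})"
)


def extract_with_regex(text):
    """Return a date if the whole string fits the pattern, else None."""
    match = PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    month = MONTH_PREFIXES.index(match.group("month")) + 1
    year = int(match.group("year"))
    if year < 100:
        year += 2000  # assumption: two-digit years belong to the 2000s
    try:
        return date(year, month, int(match.group("day")))
    except ValueError:
        return None  # e.g. day 30 in February
```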
DanielJDufour · 2023-11-16T02:11:08Z

I think I like the word train more, like:

from date_extractor import train

data = [None, "", "10/31/23", "1/2/23"]

# train will basically randomly select up to 100 values from the data set
extract_date = train(data, limit=100)



for item in data:
    date = extract_date(item)

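A rough sketch of what `train` might look like: sample up to `limit` non-empty values, pick a format that fits the sample, and return a closure that parses with that format. Everything here is an assumption for illustration, including the names `train_sketch` and `CANDIDATE_FORMATS`, the `seed` parameter, and the choice to return `None` for values that do not parse.

```python
import random
from datetime import datetime

# Hypothetical candidate list, for illustration only.
CANDIDATE_FORMATS = ["%m/%d/%y", "%m/%d/%Y", "%Y-%m-%d"]


def _fits(values, fmt):
    """True if every value parses with fmt."""
    try:
        for value in values:
            datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def train_sketch(data, limit=100, seed=None):
    """Learn a format from a random sample and return an extract function."""
    usable = [v.strip() for v in data if isinstance(v, str) and v.strip()]
    rng = random.Random(seed)
    sample = rng.sample(usable, min(limit, len(usable)))
    # No usable values means no format can be learned.
    fmt = next((f for f in CANDIDATE_FORMATS if sample and _fits(sample, f)), None)

    def extract_date(item):
        # None, empty strings, and values that do not fit the format yield None.
        if fmt is None or not isinstance(item, str) or not item.strip():
            return None
        try:
            return datetime.strptime(item.strip(), fmt)
        except ValueError:
            return None

    return extract_date
```

Sampling keeps training cheap on large columns, at the cost of possibly missing a rare value that does not fit the chosen format; such values simply come back as `None` in this sketch.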