detect_format, train on column before extraction #28

Open
DanielJDufour opened this issue Nov 1, 2023 · 2 comments

@DanielJDufour
Owner

Assuming every value in a CSV column is formatted the same way, I should be able to train on a column of dates before extracting dates from it.

Two new methods:

from date_extractor import detect_format

data = [None, "", "10/31/23", "1/2/23"]

detect_format(data)
# "%m/%d/%y"  (lowercase %y, since the sample years have two digits)


from date_extractor import prepare

extract_date = prepare(data)
for date in data:
    extract_date(date)
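
A minimal sketch of how the proposed detect_format could work, assuming a short list of candidate strptime formats tried in order. `detect_format_sketch` and `CANDIDATE_FORMATS` are hypothetical names for illustration and are not part of date_extractor:

```python
from datetime import datetime

# Hypothetical candidate list; a real one would be longer and ordered by priority.
CANDIDATE_FORMATS = ["%m/%d/%y", "%m/%d/%Y", "%Y-%m-%d", "%d.%m.%Y"]


def _parses(text, fmt):
    """Return True if text parses completely with the given strptime format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def detect_format_sketch(values, candidates=CANDIDATE_FORMATS):
    """Return the first candidate that parses every non-empty value, else None."""
    samples = [v for v in values if v]  # skip None and ""
    if not samples:
        return None
    for fmt in candidates:
        if all(_parses(s, fmt) for s in samples):
            return fmt
    return None


data = [None, "", "10/31/23", "1/2/23"]
print(detect_format_sketch(data))  # prints %m/%d/%y
```

Requiring every sample to parse is what makes the column assumption useful: a value like "10/31/23" rules out day-first formats for the whole column.
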
@DanielJDufour
Owner Author

I also have to handle cases where the dates don't fit any of the traditional date formats, in which case it might need to return a special format or a regex like
"(?PJanu|Febr|Marc|)/..."

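A rough sketch of that regex fallback, for values that strptime directives cannot express (for example, month names cut to four letters). The group names `month`, `day`, and `year`, the `/day/year` layout after the month, and the function name are all assumptions for illustration; the comment above only shows the start of the pattern:

```python
import re
from datetime import date

# Four-letter month prefixes, following the "Janu|Febr|Marc" idea above.
MONTH_PREFIXES = [
    "Janu", "Febr", "Marc", "Apri", "May", "June",
    "July", "Augu", "Sept", "Octo", "Nove", "Dece",
]

# Assumed layout: <month prefix>/<day>/<4-digit year>.
PATTERN = re.compile(
    r"(?P<month>" + "|".join(MONTH_PREFIXES) + r")/(?P<day>\d{1,2})/(?P<year>\d{4})"
)


def extract_with_regex(text):
    """Return a date parsed via the named groups, or None if nothing matches."""
    if not text:
        return None
    match = PATTERN.search(text)
    if match is None:
        return None
    month = MONTH_PREFIXES.index(match.group("month")) + 1
    try:
        return date(int(match.group("year")), month, int(match.group("day")))
    except ValueError:  # e.g. "Febr/30/2023"
        return None
```

Named groups keep the extraction independent of field order, so a detector could return the pattern in place of a strptime format string.
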
@DanielJDufour
Owner Author

I think I like the word train more, like:

from date_extractor import train

data = [None, "", "10/31/23", "1/2/23"]

# train will randomly sample up to 100 text values from the data set
extract_date = train(data, limit=100)



for item in data:
    date = extract_date(item)
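
One way train could be sketched, assuming it samples up to `limit` non-empty values, picks the first candidate format that parses the whole sample, and returns a closure. `train_sketch`, the `candidates` tuple, and the `seed` parameter are hypothetical and not part of date_extractor:

```python
import random
from datetime import datetime


def _fits(text, fmt):
    """Return True if text parses completely with the given strptime format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def train_sketch(values, limit=100, candidates=("%m/%d/%y", "%Y-%m-%d"), seed=None):
    """Learn a format from a random sample and return an extract_date function."""
    non_empty = [v for v in values if v]  # skip None and ""
    rng = random.Random(seed)  # seed is only here to make runs repeatable
    sample = rng.sample(non_empty, min(limit, len(non_empty)))
    fmt = next(
        (f for f in candidates if sample and all(_fits(s, f) for s in sample)),
        None,
    )

    def extract_date(item):
        # Returns a datetime.date, or None for empty or non-matching input.
        if not item or fmt is None:
            return None
        try:
            return datetime.strptime(item, fmt).date()
        except ValueError:
            return None

    return extract_date


data = [None, "", "10/31/23", "1/2/23"]
extract_date = train_sketch(data, limit=100)
for item in data:
    print(item, extract_date(item))
```

Sampling caps the training cost on large columns, at the risk of missing a rare outlier that does not follow the column's format.
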
