Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cumulative translation #148

Merged
merged 37 commits into from
Mar 16, 2023
Merged

Conversation

hleft
Copy link
Collaborator

@hleft hleft commented Mar 11, 2023

Wait for how many tokens have been accumulated before starting the translation

--accumulated_num num

The strange part: I don't know how to judge how the cumulative translation results correspond to the original paragraph, so use sep = "\n\n\n\n\n", if sep="\n\n", it is often translated into the same line

A possible bug is that even if \n\n\n\n\n, it may still be translated to the same line, and then there will be problems where the translation appears, this will only affect the last num characters, after which it will return to normal

A good effect is that it can understand the context more, and the prompt is more effective

I have used various prompts before to ask it not to translate people's names, but to no avail, with context it works(It will most likely work, and in a few cases it will still translate, but it can be translated to the same name)

name2

left:

time OPENAI_API_SYS_MSG="You are an assistant who translates computer technology books but don't translate people's names, there is a blank line after each of your translation results" python3 "make_book.py" --book_name "$filepath" --openai_key "${openai_apikey_book}" --language "zh-hans" --accumulated_num 1500

right:

time python3 "make_book.py" --book_name "$filepath" --openai_key "${openai_apikey_book}" --language "zh-hans"

book_maker/loader/epub_loader.py Outdated Show resolved Hide resolved
book_maker/loader/epub_loader.py Outdated Show resolved Hide resolved
book_maker/loader/epub_loader.py Outdated Show resolved Hide resolved
book_maker/loader/epub_loader.py Outdated Show resolved Hide resolved
book_maker/loader/epub_loader.py Outdated Show resolved Hide resolved
book_maker/loader/epub_loader.py Outdated Show resolved Hide resolved
@hleft hleft force-pushed the cumulative-translation branch from 8144fdd to 6ef21e7 Compare March 11, 2023 14:14
@hleft hleft force-pushed the cumulative-translation branch from 6ef21e7 to 2baf359 Compare March 11, 2023 14:21
@hleft hleft force-pushed the cumulative-translation branch from 4eecaa6 to 3523469 Compare March 11, 2023 16:08
@yihong0618
Copy link
Owner

Will take a look later for this PR.

@yihong0618
Copy link
Owner

the function part let me do it is OK, when I got time

@hleft
Copy link
Collaborator Author

hleft commented Mar 12, 2023

@yihong0618 Ok, I'm trying to get a better prompt so that it can translate the same number of paragraphs as the original paragraphs, and still often have problems

@yihong0618
Copy link
Owner

@hleft FYI, I want to keep the original prompt as default because of that it costs the least tokens cc @ConanChou, most users do not care how good of all, they just want it to help them read, so users can DIY their prompt but we do not change the default one unless we found a better one better for translate and for token cost.

@hleft
Copy link
Collaborator Author

hleft commented Mar 12, 2023

@yihong0618 I mean only modify the default prompt when --accumulated_num is enabled, because if don't modify it, you will get an error and not just a bad translation (the translated result appears in the wrong place).

@yihong0618
Copy link
Owner

yep thanks~

@hleft
Copy link
Collaborator Author

hleft commented Mar 12, 2023

Still in testing, the current main progress is that I understand that it is not enough to just modify the prompt and retry, and some special cases must be manually handled, such as <sup>, Source: xxx link

@hleft hleft force-pushed the cumulative-translation branch from 8e669d6 to 8badb41 Compare March 12, 2023 11:30
@hleft hleft force-pushed the cumulative-translation branch 4 times, most recently from 22f3b2e to 7773778 Compare March 15, 2023 12:26
Copy link
Owner

@yihong0618 yihong0618 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM will test and merge.
amazing work!

@zstone12
Copy link
Collaborator

LGTM+1.
The fantastic work!

@zstone12
Copy link
Collaborator

zstone12 commented Mar 15, 2023

Please add the relevant content of the --accumulated_num parameter to README:D

hleft added 3 commits March 15, 2023 21:14
refactor: _process_paragraph

refactor: translate_paragraphs_acc
@hleft hleft force-pushed the cumulative-translation branch from 5aa35c4 to cc85d6b Compare March 15, 2023 13:14
@hleft hleft force-pushed the cumulative-translation branch from 780147b to 173b756 Compare March 15, 2023 15:30
@yihong0618
Copy link
Owner

can not test...
image

@yihong0618
Copy link
Owner

@hleft can this merge now?

@hleft
Copy link
Collaborator Author

hleft commented Mar 16, 2023

@yihong0618 yes ready

@yihong0618 yihong0618 merged commit e38a236 into yihong0618:main Mar 16, 2023
@yihong0618
Copy link
Owner

please check new CI if it failed we need to fix it quick

@danparizher danparizher mentioned this pull request Mar 16, 2023
wayhome pushed a commit to wayhome/bilingual_book_maker that referenced this pull request Aug 29, 2024
wayhome pushed a commit to wayhome/bilingual_book_maker that referenced this pull request Aug 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants