fix (#90) split text if token larger than 4096 #106

jeffery9 · 2023-03-08T04:32:17Z

Split the text when its token count is larger than the quota.

jeffery9 · 2023-03-08T04:33:11Z

should fix #90

yihong0618 · 2023-03-10T13:04:15Z

book_maker/translator/chatgptapi_translator.py

+
+        message_log = [
+            {
+                "role": "user",
+                # english prompt here to save tokens
+                "content": f"Please help me to translate,`{text}` to {self.language}, please return only translated content not include the origin text",
+            }
+        ]
+        count_tokens = num_tokens_from_messages(message_log)
+        consumed_tokens = 0
+        t_text = ""
+        if count_tokens > 4000:
+            print("too long!")
+
+            splits = count_tokens // 4000 + 1
+
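+            text_list = text.split(".")
+            # sentences per segment, rounded up so that every sentence is covered
+            seg_len = -(-len(text_list) // splits)
+            sub_text = ""
+            t_sub_text = ""
+            for n in range(splits):
+                text_segment = text_list[n * seg_len : (n + 1) * seg_len]
+                if not text_segment:
+                    continue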
+                sub_text = ".".join(text_segment)
+                print(sub_text)
+
+                completion = openai.ChatCompletion.create(
+                    model="gpt-3.5-turbo",
+                    messages=[
+                        {
+                            "role": "user",
+                            # english prompt here to save tokens
+                            "content": f"Please help me to translate,`{sub_text}` to {self.language}, please return only translated content not include the origin text",
+                        }
+                    ],
+                )
+                t_sub_text = (
+                    completion["choices"][0]
+                    .get("message")
+                    .get("content")
+                    .encode("utf8")
+                    .decode()
+                )
+                print(t_sub_text)
+                consumed_tokens += completion["usage"]["prompt_tokens"]
+
+                t_text = t_text + t_sub_text
+
+        else:
+            try:
+                completion = openai.ChatCompletion.create(
+                    model="gpt-3.5-turbo",
+                    messages=[
+                        {
+                            "role": "user",
+                            # english prompt here to save tokens
+                            "content": f"Please help me to translate,`{text}` to {self.language}, please return only translated content not include the origin text",
+                        }
+                    ],
+                )
+                t_text = (
+                    completion["choices"][0]
+                    .get("message")
+                    .get("content")
+                    .encode("utf8")
+                    .decode()
+                )
+                consumed_tokens += completion["usage"]["prompt_tokens"]
+
+            except Exception as e:
+                # rate limit hit on the OpenAI API: wait, rotate the key, retry once
+                key_len = self.key.count(",") + 1
+                sleep_time = int(60 / key_len)
+                time.sleep(sleep_time)
+                print(e, f"will sleep  {sleep_time} seconds")
+                self.rotate_key()
+                completion = openai.ChatCompletion.create(
+                    model="gpt-3.5-turbo",
+                    messages=[
+                        {
+                            "role": "user",
+                            "content": f"Please help me to translate,`{text}` to {self.language}, please return only translated content not include the origin text",
+                        }
+                    ],
+                )
+                t_text = (
+                    completion["choices"][0]
+                    .get("message")
+                    .get("content")
+                    .encode("utf8")
+                    .decode()
+                )
+                consumed_tokens += completion["usage"]["prompt_tokens"]
+
        print(t_text)
+        print(f"{consumed_tokens} prompt tokens used.")
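
The splitting step in this diff can also be looked at on its own, without the API calls. The following is only a sketch under stated assumptions, not the code from this PR: `split_by_token_budget` and `rough_token_count` are hypothetical names, and the word-based counter is a stand-in for a real token counter such as the `num_tokens_from_messages` helper used above. Instead of cutting the sentence list into a fixed number of slices, it packs sentences greedily until the budget would be exceeded.

```python
from typing import Callable, List


def rough_token_count(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def split_by_token_budget(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[str]:
    """Greedily pack '.'-delimited sentences into chunks of at most max_tokens.

    A single sentence that is over the budget by itself is still emitted
    as its own (oversized) chunk; this sketch does not split inside sentences.
    """
    # Keep the "." with each sentence so that joining the chunks restores the text.
    pieces = text.split(".")
    sentences = [p + "." for p in pieces[:-1]]
    if pieces[-1]:
        sentences.append(pieces[-1])

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = current + sentence
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent as its own request, and because the periods stay attached to their sentences, concatenating the chunks gives back the original text.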


This function is too long; let's split it.
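
One possible way to act on this suggestion is to factor the request, which appears three times in the diff, into a single helper. This is a sketch under assumptions, not the refactoring that was actually pushed: `build_messages`, `translate_once`, `send`, and `on_failure` are hypothetical names. Here `send` stands for whatever callable performs the chat completion request and returns a response shaped like the one indexed in the diff.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Prompt text copied from the diff; kept in English to save tokens.
PROMPT = (
    "Please help me to translate,`{text}` to {language}, "
    "please return only translated content not include the origin text"
)


def build_messages(text: str, language: str) -> List[Dict[str, str]]:
    """Build the single-message chat payload used for every request."""
    return [{"role": "user", "content": PROMPT.format(text=text, language=language)}]


def translate_once(
    send: Callable[[List[Dict[str, str]]], dict],
    text: str,
    language: str,
    on_failure: Optional[Callable[[], None]] = None,
    sleep_seconds: float = 0.0,
) -> Tuple[str, int]:
    """Send one request; on any exception, wait, call on_failure, and retry once.

    Returns the translated text and the prompt tokens reported by the response.
    """
    messages = build_messages(text, language)
    try:
        completion = send(messages)
    except Exception as e:
        print(e, f"will sleep {sleep_seconds} seconds")
        time.sleep(sleep_seconds)
        if on_failure is not None:
            on_failure()  # for example, rotating to the next API key
        completion = send(messages)
    content = completion["choices"][0]["message"]["content"]
    return content, completion["usage"]["prompt_tokens"]
```

In the translator class, `send` could wrap `openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=...)` and `on_failure` could be `self.rotate_key`, so both the long-text branch and the short-text branch reduce to calls to `translate_once`.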

book_maker/utils.py

jeffery9

refactored.

jeffery9

verified

jeffery9 · 2023-03-13T05:09:27Z

'black . --check' passes locally, but it does not pass in CI.

yihong0618 · 2023-03-13T05:25:33Z

pip install -U black

jeffery9 · 2023-03-13T08:29:46Z

pip install -U black

Already formatted.


jeffery@jeffery-MBP ~/repos/bilingual_book_maker (split_p) $ python3.9 -m pip install -U black
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://mirrors.163.com/pypi/simple/
Requirement already satisfied: black in /Users/jeffery/Library/Python/3.9/lib/python/site-packages (21.6b0)
Collecting black
  Using cached https://mirrors.163.com/pypi/packages/9b/27/b2f98b627738b02dcac06ae9e2ab13f14ab906fe6dd6366050c76883d4b5/black-21.12b0-py3-none-any.whl (156 kB)
Requirement already satisfied: click>=7.1.2 in /Users/jeffery/Library/Python/3.9/lib/python/site-packages (from black) (8.0.3)
Requirement already satisfied: mypy-extensions>=0.4.3 in /Users/jeffery/Library/Python/3.9/lib/python/site-packages (from black) (0.4.3)
  Using cached https://mirrors.163.com/pypi/packages/c7/24/0de05480822e5f0f2cc539fce9029bc2507b44b7f85ec1a9e23d89dea6c3/black-21.11b1-py3-none-any.whl (155 kB)
  Using cached https://mirrors.163.com/pypi/packages/3d/ad/1cf514e7f9ee4c3d8df7c839d7977f7605ad76557f3fca741ec67f76dba6/black-21.11b0-py3-none-any.whl (155 kB)
  Using cached https://mirrors.163.com/pypi/packages/12/df/0e55791b9c6ca07b4a3404eef6cee1ca42503bf16e9fc9df0247b4803cf1/black-21.10b0-py3-none-any.whl (150 kB)
  Using cached https://mirrors.163.com/pypi/packages/d2/16/a92c999103bee1236dd93f703f3522217fe00bd97bd50ae3699c2d91e320/black-21.9b0-py3-none-any.whl (148 kB)
  Using cached https://mirrors.163.com/pypi/packages/9d/11/cee7b695f95178025c428168dd75094f0e00fdcfe0fd004a0f8bc9bea3ee/black-21.8b0-py3-none-any.whl (148 kB)
  Using cached https://mirrors.163.com/pypi/packages/b6/6e/b706ab6440ebac6e0f7fb4615232216dd3bba09fa9fba6815df90601411c/black-21.7b0-py3-none-any.whl (141 kB)
Requirement already satisfied: appdirs in /Users/jeffery/Library/Python/3.9/lib/python/site-packages (from black) (1.4.4)
Requirement already satisfied: toml>=0.10.1 in /Users/jeffery/Library/Python/3.9/lib/python/site-packages (from black) (0.10.2)
Requirement already satisfied: regex>=2020.1.8 in /Users/jeffery/Library/Python/3.9/lib/python/site-packages (from black) (2021.10.21)
Requirement already satisfied: pathspec<1,>=0.8.1 in /Users/jeffery/Library/Python/3.9/lib/python/site-packages (from black) (0.8.1)
jeffery@jeffery-MBP ~/repos/bilingual_book_maker (split_p) $ black .
All done! ✨ 🍰 ✨
17 files left unchanged.
jeffery@jeffery-MBP ~/repos/bilingual_book_maker (split_p) $ git status
On branch split_p
Your branch is up to date with 'origin/split_p'.

nothing to commit, working tree clean
jeffery@jeffery-MBP ~/repos/bilingual_book_maker (split_p) $

yihong0618 · 2023-03-13T08:33:02Z

No worries, I will take a look tonight or tomorrow.

jeffery9 closed this Mar 8, 2023

jeffery9 force-pushed the split_p branch from 46772e9 to 8a4806c on March 8, 2023 04:33

jeffery9 reopened this Mar 8, 2023

jeffery9 marked this pull request as ready for review March 8, 2023 06:40

jeffery9 force-pushed the split_p branch from 14e3d32 to f7b3daa on March 8, 2023 06:52

jeffery9 changed the title ~~fix #90~~ split if token larger than 4096, try to fix #90 Mar 8, 2023

jeffery9 mentioned this pull request Mar 8, 2023

Add official token usage statistics and show the cost of the translation #26

Open

yihong0618 mentioned this pull request Mar 9, 2023

fix(#92): add a arguments to allow NavigableStrings translate #126

Merged

jeffery9 changed the title ~~split if token larger than 4096, try to fix #90~~ [fix #90] split text if token larger than 4096 Mar 10, 2023

jeffery9 changed the title ~~[fix #90] split text if token larger than 4096~~ fix (#90) split text if token larger than 4096 Mar 10, 2023

yihong0618 reviewed Mar 10, 2023

jeffery9 and others added 4 commits March 12, 2023 17:11

count tokens

2515379

translate in splits for token larger than 4096

084b7a6

count token used.

9d242a7

refactor code and resolve conflicts with upstream

7eeb4d2

jeffery9 force-pushed the split_p branch from f74ae37 to 7eeb4d2 on March 12, 2023 09:38

fix var

f43c778

jeffery9 commented Mar 12, 2023

jeffery9 force-pushed the split_p branch from edeee85 to f43c778 on March 13, 2023 05:05

Merge branch 'main' into split_p

4d2b7f6

jeffery9 requested review from yihong0618 March 13, 2023 05:17

yihong0618 mentioned this pull request Mar 14, 2023

Cumulative translation #148

Merged

wayhome pushed a commit to wayhome/bilingual_book_maker that referenced this pull request Aug 29, 2024

Fixed a bug that prevented the proper storage of translated ebooks. y…

62bdbca

…ihong0618#106
