Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deepcopy in recorder #1376

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

johny-b
Copy link
Contributor

@johny-b johny-b commented Oct 12, 2023

Recorder doesn't write the record file immediately when record_sampling is called, but does this once there are enough entries to be written. To be able to write the logs later, recorder saves the prompt object passed to record_sampling.
If someone alters the prompt object after a call to record_sampling, but before data is dumped to a file, recorder writes incorrect data.

This is fixed with deepcopy. Note: common copy() would help for cases when we add/remove stuff from the prompt, but not if we modify elements of it.

Example. There should be no difference between this:

messages = [{"role": "system", "content": "hello there!"}]
response = self.completion_fns[0](prompt=messages).get_completions()[0]
messages = messages + [{"role": "assistant", "content": response}]
messages += [{"role": "system", "content": "tell me a joke"}]
response = self.completion_fns[0](prompt=messages).get_completions()[0]

and

messages = [{"role": "system", "content": "hello there!"}]
response = self.completion_fns[0](prompt=messages).get_completions()[0]
messages.append({"role": "assistant", "content": response})
messages += [{"role": "system", "content": "tell me a joke"}]
response = self.completion_fns[0](prompt=messages).get_completions()[0]

but now the second code snipped creates weird logs.

Recorder doesn't write the record file immediately when record_sampling
is called, but does this once there are enough entries to be written.
To be able to write the logs later, recorder saves the prompt object
passed to record_sampling.
If someone alters the prompt object after a call to record_sampling,
but before data is dumped to a file, recorder writes incorrect data.

This is fixed with deepcopy. Note: common copy() would help for
cases when we add/remove stuff from the prompt, but not if we modify
elements of it.

Example. There should be no difference between this:

```
messages = [{"role": "system", "content": "hello there!"}]
response = self.completion_fns[0](prompt=messages).get_completions()[0]
messages = messages + [{"role": "assistant", "content": response}]
messages += [{"role": "system", "content": "tell me a joke"}]
response = self.completion_fns[0](prompt=messages).get_completions()[0]
```

and

```
messages = [{"role": "system", "content": "hello there!"}]
response = self.completion_fns[0](prompt=messages).get_completions()[0]
messages.append({"role": "assistant", "content": response})
messages += [{"role": "system", "content": "tell me a joke"}]
response = self.completion_fns[0](prompt=messages).get_completions()[0]
```

but now the second code snipped creates weird logs.
@JunShern
Copy link
Collaborator

JunShern commented Dec 3, 2023

Would it be better to implement the copy further down the chain in record_event so that other recorder logs also receive this fix? (e.g. record_match, record_embedding, etc are also affected by this)

@logankilpatrick logankilpatrick removed their request for review January 3, 2024 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants