forked from openai/openai-quickstart-python
file_demo.py
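import os
import json

import openai
from dotenv import load_dotenv  # reads variables from a local .env file

# Load OPENAI_API_KEY from .env into the environment, then pass it to the SDK.
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")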
# with open('data.jsonl', 'r') as json_file:
#     json_list = list(json_file)
# for json_str in json_list:
#     result = json.loads(json_str)
#     print(f"result: {result}")
#     print(isinstance(result, dict))
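
# A minimal sketch (not part of the original script) of the check that the
# commented-out loop above performs by eye: every line of a JSONL file should
# parse to a dict. Requiring "prompt" and "completion" keys is an assumption
# based on the prompt-completion pairs mentioned in the CLI output further
# down; `find_jsonl_problems` is a hypothetical helper name.
import json


def find_jsonl_problems(lines, required_keys=("prompt", "completion")):
    """Return a list of (line_number, message) for lines that fail the check."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((line_number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((line_number, "not a JSON object"))
            continue
        missing = [key for key in required_keys if key not in record]
        if missing:
            problems.append((line_number, f"missing keys: {', '.join(missing)}"))
    return problems


# Usage (assumes data.jsonl exists next to this script):
# with open("data.jsonl", "r") as json_file:
#     print(find_jsonl_problems(json_file))
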
# openai.File.create(
#     file=open("data.jsonl", "rb"),
#     purpose='fine-tune'
# )
# List every file uploaded to the account and save the response as JSON.
res = openai.File.list()
json_object = json.dumps(res, indent=4)
with open("file_list.json", "w") as outfile:
    outfile.write(json_object)

# Retrieve the metadata of a single uploaded file by its ID.
res = openai.File.retrieve("file-K164kgEHnfwu9xpH39BxZ6Yg")
json_object = json.dumps(res, indent=4)
with open("file_retrieve.json", "w") as outfile:
    outfile.write(json_object)
"""
openai.error.InvalidRequestError:
To help mitigate abuse,
downloading of fine-tune training files is disabled for free accounts.
"""
# res = openai.File.download("file-wNtVrddnJeqAkmpy8WjYXyah")
# print(type(res))
"""
Analyzing...
- Your file contains 2 prompt-completion pairs. In general, we recommend having at least a few hundred examples. We've found
that performance tends to linearly increase for every doubling of the number of examples
- All prompts end with suffix `?\nAgent:`
WARNING: Some of your prompts contain the suffix `?
Agent:` more than once. We strongly suggest that you review your prompts and add a unique suffix
- All prompts start with prefix `Summary: You're a customer service chat bot.
Specific information: Customers are using our company's platform via web or mobile app.
###
Customer: `. Fine-tuning doesn't require the instruction specifying the task, or a few-shot example scenario. Most of the time you should only add the input data into the prompt, and the desired output into the completion
- All completions end with suffix `.\n`
Based on the analysis we will perform the following actions:
- [Recommended] Remove prefix `Summary: You're a customer service chat bot.
Specific information: Customers are using our company's platform via web or mobile app.
###
Customer: ` from all prompts [Y/n]: y
Your data will be written to a new JSONL file. Proceed [Y/n]: y
Wrote modified file to `.\data_prepared.jsonl`
Feel free to take a look!
Now use that file when fine-tuning:
> openai api fine_tunes.create -t ".\data_prepared.jsonl"
After you’ve fine-tuned a model, remember that your prompt has to end with the indicator string `?\nAgent:` for the model to
start generating completions, rather than continuing with the prompt. Make sure to include `stop=[".\n"]` so that the generated texts ends at the expected place.
Once your model starts training, it'll approximately take 2.47 minutes to train a `curie` model, and less for `ada` and `babbage`. Queue will approximately take half an hour per job ahead of you.
"""