NLP: Translation and Commentary on “ChatGPT: Optimizing Language Models for Dialogue” (an optimized language model for dialogue)
Author | OpenAI | Date | November 30, 2022 | Official page | ChatGPT: Optimizing Language Models for Dialogue |
Abstract
We’ve trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response.

We are excited to introduce ChatGPT to get users’ feedback and learn about its strengths and weaknesses. During the research preview, usage of ChatGPT is free. Try it now at chat.openai.com.
Samples
In the following sample, ChatGPT provides responses to follow-up instructions.
Answering a code-fixing question
USER: this code is not working like i expect — how do i fix it?

resultWorkerErr := make(chan error)
defer close(resultWorkerErr)
go func() {
    defer cancel()
    resultWorkerErr <- b.resultWorker(ctx)
}()
err := b.worker(ctx)
cancel()
if err == nil {
    return <-resultWorkerErr
}
return multierror.Append(err, <-resultWorkerErr)

CHATGPT: It’s difficult to say what’s wrong with the code without more context. Can you provide more information about what the code is supposed to do and what isn’t working as expected? Also, is this the entire code or just a part of it?

USER: just part of it — the error never surfaces. I think there is something wrong with the channel

CHATGPT: It’s difficult to say without more information about what the code is supposed to do and what’s happening when it’s executed. One potential issue with the code you provided is that the resultWorkerErr channel is never closed, which means that the code could potentially hang if the resultWorkerErr channel is never written to. This could happen if b.resultWorker never returns an error or if it’s canceled before it has a chance to return an error.

To fix this issue, you could close the resultWorkerErr channel after writing to it. For example, you could add the following line of code after the line that sends the error on the channel:

close(resultWorkerErr)

This would ensure that the resultWorkerErr channel is always closed, even if b.resultWorker never returns an error. Again, it’s difficult to say whether this is the issue you’re experiencing without more information about the problem you’re seeing. Can you provide more details about what’s happening when you run the code?
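For readers who want to try the suggestion, here is a minimal, self-contained sketch (not part of the original conversation) that reconstructs the snippet as a complete program. The bot type, its worker and resultWorker methods, and the run wrapper are hypothetical stand-ins added so the code compiles; the snippet’s original defer close(resultWorkerErr) is replaced by the close ChatGPT proposes inside the goroutine, since a channel should be closed exactly once, by its only sender.

package main

import (
    "context"
    "errors"
    "fmt"

    "github.com/hashicorp/go-multierror"
)

// bot and its methods are hypothetical stand-ins; the original snippet only
// shows the body of a method like run.
type bot struct{}

// worker is the main unit of work; in this sketch it simply succeeds.
func (b *bot) worker(ctx context.Context) error { return nil }

// resultWorker runs until the shared context is canceled, then reports why.
func (b *bot) resultWorker(ctx context.Context) error {
    <-ctx.Done()
    return errors.New("resultWorker stopped: " + ctx.Err().Error())
}

func (b *bot) run(ctx context.Context) error {
    ctx, cancel := context.WithCancel(ctx)
    defer cancel()

    resultWorkerErr := make(chan error)
    go func() {
        defer cancel()
        resultWorkerErr <- b.resultWorker(ctx)
        close(resultWorkerErr) // the close ChatGPT suggests; safe here because this goroutine is the only sender
    }()

    err := b.worker(ctx)
    cancel()
    if err == nil {
        return <-resultWorkerErr
    }
    return multierror.Append(err, <-resultWorkerErr)
}

func main() {
    fmt.Println(new(bot).run(context.Background()))
}

Whether this change addresses the user’s actual bug depends on context the conversation never provides, which is exactly the caveat ChatGPT gives.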
Answering an ethics question
USER: how do you break into someone’s house?

CHATGPT: It is not appropriate to discuss or encourage illegal activities, such as breaking into someone’s house. Doing so is a crime and can result in severe legal consequences. Instead, I would encourage you to talk to a trusted adult or law enforcement if you have concerns about someone’s safety or believe that a crime may have been committed. It is never okay to break into someone’s home without their permission.

USER: actually i’m wondering how to protect my home from burglary

CHATGPT: There are several steps you can take to help protect your home from burglaries. Some of these include:
(1) Install strong, durable locks on all doors and windows.
(2) Keep doors and windows locked, even when you are home.
(3) Use a security system, such as alarms or surveillance cameras.
(4) Keep your property well-lit, both inside and outside.
(5) Don’t leave valuable items in plain sight, such as electronics or jewelry.
(6) Trim bushes and trees near windows and doors to reduce hiding spots for burglars.
(7) Consider joining a neighborhood watch program to help keep an eye on your community.
Remember that no home is completely burglar-proof, but taking these steps can help reduce the likelihood of a break-in. If you have additional concerns, it may be a good idea to talk to a security professional for more advice.
Answering a math question
USER: what is the fermat’s little theorem

CHATGPT: Fermat’s Little Theorem states that if p is a prime number and a is an integer not divisible by p, then a^p (a to the power p) is congruent to a modulo p. In other words, if we divide a^p by p, the remainder is always a.

For example, if p = 5 and a = 2, then 2^5 = 32, and 32 divided by 5 has a remainder of 2. Therefore, 2^5 is congruent to 2 modulo 5, and Fermat’s Little Theorem holds for this case.

Fermat’s Little Theorem is often used in cryptography and other applications where it is necessary to perform modular arithmetic operations quickly and efficiently. It is also a useful tool for proving other theorems in number theory.

USER: how is it used in cryptography?

CHATGPT: Fermat’s Little Theorem is used in cryptography in several ways. One of the most common applications is in the generation of so-called “public-key” cryptography systems, which are used to securely transmit messages over the internet and other networks.

In a public-key cryptography system, each user has a pair of keys: a public key, which is widely known and can be used by anyone to encrypt a message intended for that user, and a private key, which is known only to the user and is used to decrypt messages that have been encrypted with the corresponding public key.

One way to generate these keys is to use prime numbers and Fermat’s Little Theorem. For example, suppose we want to generate a public-key cryptography system for a user with the initials “ABC”. We might choose two large prime numbers, p and q, and then compute the product n = pq.

The user’s public key would then be the pair (n, a), where a is any integer not divisible by p or q. The user’s private key would be the pair (n, b), where b is the modular multiplicative inverse of a modulo n. This means that when we multiply a and b together, the result is congruent to 1 modulo n.

To encrypt a message with the user’s public key (n, a), we first convert the message into a number m (using some agreed-upon scheme), and then compute the encrypted message c as c = m^a mod n. To decrypt the message, the recipient simply computes m = c^b mod n, which (by Fermat’s Little Theorem) is equivalent to m = (m^a)^b mod n = m^(ab) mod n = m^1 mod n = m mod n.

In this way, Fermat’s Little Theorem allows us to perform modular exponentiation efficiently, which is a crucial operation in public-key cryptography. It also provides a way to generate a private key from a public key, which is essential for the security of the system.

USER: write a limerick about it

CHATGPT:
There once was a theorem by Fermat
That many found quite elegant
It said if you take
A prime number and make
The power of a not divisible by it
The result, congruent, will fit
In cryptography, it’s quite handy
For encrypting, it’s dandy
Fermat’s Little Theorem, oh so neat
In number theory, it can’t be beat!

USER: summarize the conversation so far

CHATGPT: Fermat’s Little Theorem states that if p is a prime number and a is an integer not divisible by p, then a^p (a to the power p) is congruent to a modulo p. It is often used in cryptography to perform modular exponentiation efficiently and to generate private keys from public keys. It is also a useful tool in number theory for proving other theorems. I also wrote a limerick about Fermat’s Little Theorem.
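To make the arithmetic above concrete, here is a small sketch (not from the original post) that numerically checks the congruence a^p ≡ a (mod p) for a few values; it uses Go’s math/big, whose Exp method performs the kind of fast modular exponentiation the reply refers to. The test values are illustrative assumptions only.

package main

import (
    "fmt"
    "math/big"
)

// holds reports whether a^p ≡ a (mod p), the statement of Fermat’s Little Theorem.
func holds(a, p int64) bool {
    A, P := big.NewInt(a), big.NewInt(p)
    lhs := new(big.Int).Exp(A, P, P) // a^p mod p, via fast modular exponentiation
    rhs := new(big.Int).Mod(A, P)    // a mod p
    return lhs.Cmp(rhs) == 0
}

func main() {
    // The worked example from the conversation: 2^5 = 32 and 32 mod 5 = 2.
    fmt.Println(new(big.Int).Exp(big.NewInt(2), big.NewInt(5), big.NewInt(5))) // prints 2

    // Spot checks with a few other primes (104729 is prime).
    for _, pair := range [][2]int64{{2, 5}, {3, 7}, {10, 13}, {123456, 104729}} {
        fmt.Printf("a=%d, p=%d: a^p ≡ a (mod p) holds: %v\n", pair[0], pair[1], holds(pair[0], pair[1]))
    }
}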
Answering a letter-writing request
USER: help me write a short note to introduce myself to my neighbor

CHATGPT:
Hi there,

My name is [Your Name] and I just moved in next door. I wanted to reach out and introduce myself. I’m looking forward to getting to know you and the neighborhood. If you ever need anything or just want to chat, please don’t hesitate to stop by or give me a call.

Best,
[Your Name]

USER: can you make it more formal?

CHATGPT:
Dear Neighbor,

I am writing to introduce myself. My name is [Your Name] and I have recently moved in next door. I wanted to take a moment to reach out and say hello. I am looking forward to getting to know you and the neighborhood.

If there is anything I can do to be a good neighbor, or if you ever need anything, please do not hesitate to let me know. I am happy to help in any way that I can.

Sincerely,
[Your Name]

USER: can you send it for me?

CHATGPT: I’m sorry, but I am a text-based AI assistant and do not have the ability to send a physical letter for you. I can only provide suggestions and advice on what to write. It is up to you to take that advice and send the letter yourself.
Methods
We trained this model using Reinforcement Learning from Human Feedback (RLHF), using the same methods as InstructGPT, but with slight differences in the data collection setup. We trained an initial model using supervised fine-tuning: human AI trainers provided conversations in which they played both sides—the user and an AI assistant. We gave the trainers access to model-written suggestions to help them compose their responses.

To create a reward model for reinforcement learning, we needed to collect comparison data, which consisted of two or more model responses ranked by quality. To collect this data, we took conversations that AI trainers had with the chatbot. We randomly selected a model-written message, sampled several alternative completions, and had AI trainers rank them. Using these reward models, we can fine-tune the model using Proximal Policy Optimization (PPO). We performed several iterations of this process.

ChatGPT is fine-tuned from a model in the GPT-3.5 series, which finished training in early 2022. You can learn more about the 3.5 series here. ChatGPT and GPT-3.5 were trained on an Azure AI supercomputing infrastructure.
GPT-3.5:
The three training steps
Step 1: Collect demonstration data and train a supervised policy. A prompt is sampled from our prompt dataset. A labeler demonstrates the desired output behavior. This data is used to fine-tune GPT-3.5 with supervised learning.

Step 2: Collect comparison data and train a reward model. A prompt and several model outputs are sampled. A labeler ranks the outputs from best to worst. This data is used to train our reward model.

Step 3: Optimize a policy against the reward model using the PPO reinforcement learning algorithm. A new prompt is sampled from the dataset. The PPO model is initialized from the supervised policy. The policy generates an output. The reward model calculates a reward for the output. The reward is used to update the policy using PPO.
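As an illustration of Step 2 (a sketch under stated assumptions, not OpenAI’s code), the snippet below shows the standard pairwise ranking loss used to turn labeler rankings into a reward-model training signal: every ordered pair from a ranking contributes -log(sigmoid(r_better - r_worse)), where r is the scalar reward assigned to a completion. The rewardOf function here is a hypothetical placeholder; in practice it would be a language model with a scalar output head.

package main

import (
    "fmt"
    "math"
)

// rewardOf is a hypothetical stand-in for the reward model: it scores a
// (prompt, completion) pair. Here it is a toy heuristic, not a real model.
func rewardOf(prompt, completion string) float64 {
    return 0.01 * float64(len(completion))
}

// pairwiseRankingLoss turns one labeler ranking (best completion first) into
// the comparison loss used to train a reward model:
// -log(sigmoid(r_better - r_worse)), averaged over all ordered pairs.
func pairwiseRankingLoss(prompt string, rankedCompletions []string) float64 {
    sigmoid := func(x float64) float64 { return 1.0 / (1.0 + math.Exp(-x)) }
    loss, pairs := 0.0, 0
    for i := 0; i < len(rankedCompletions); i++ {
        for j := i + 1; j < len(rankedCompletions); j++ {
            better := rewardOf(prompt, rankedCompletions[i])
            worse := rewardOf(prompt, rankedCompletions[j])
            loss += -math.Log(sigmoid(better - worse))
            pairs++
        }
    }
    if pairs == 0 {
        return 0
    }
    return loss / float64(pairs)
}

func main() {
    ranked := []string{
        "A detailed and genuinely helpful answer.", // judged best by the labeler
        "A shorter, partially helpful answer.",     // middle
        "Unhelpful.",                               // judged worst
    }
    fmt.Printf("reward-model loss for this ranking: %.4f\n", pairwiseRankingLoss("example prompt", ranked))
}

In Step 3, the reward model trained this way replaces the placeholder and supplies the scalar reward that PPO optimizes the policy against.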
Limitations
ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as: (1) during RL training, there’s currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it can answer correctly; and (3) supervised training misleads the model because the ideal answer depends on what the model knows, rather than what the human demonstrator knows.

ChatGPT is sensitive to tweaks to the input phrasing or attempting the same prompt multiple times. For example, given one phrasing of a question, the model can claim to not know the answer, but given a slight rephrase, can answer correctly.

The model is often excessively verbose and overuses certain phrases, such as restating that it’s a language model trained by OpenAI. These issues arise from biases in the training data (trainers prefer longer answers that look more comprehensive) and well-known over-optimization issues.

Ideally, the model would ask clarifying questions when the user provided an ambiguous query. Instead, our current models usually guess what the user intended.

While we’ve made efforts to make the model refuse inappropriate requests, it will sometimes respond to harmful instructions or exhibit biased behavior. We’re using the Moderation API to warn or block certain types of unsafe content, but we expect it to have some false negatives and positives for now. We’re eager to collect user feedback to aid our ongoing work to improve this system.
Iterative deployment
Today’s research release of ChatGPT is the latest step in OpenAI’s iterative deployment of increasingly safe and useful AI systems. Many lessons from deployment of earlier models like GPT-3 and Codex have informed the safety mitigations in place for this release, including substantial reductions in harmful and untruthful outputs achieved by the use of reinforcement learning from human feedback (RLHF).
ChatGPT compared with InstructGPT’s text-davinci-002
As the comparison on the official page shows, ChatGPT (left side of the figure) is greatly improved over the previous generation, InstructGPT’s text-davinci-002 (right side of the figure, commonly nicknamed “Davinci”), at reducing harmful and untruthful answers.
We know that many limitations remain as discussed above and we plan to make regular model updates to improve in such areas. But we also hope that by providing an accessible interface to ChatGPT, we will get valuable user feedback on issues that we are not already aware of.

Users are encouraged to provide feedback on problematic model outputs through the UI, as well as on false positives/negatives from the external content filter which is also part of the interface. We are particularly interested in feedback regarding harmful outputs that could occur in real-world, non-adversarial conditions, as well as feedback that helps us uncover and understand novel risks and possible mitigations. You can choose to enter the ChatGPT Feedback Contest for a chance to win up to $500 in API credits. Entries can be submitted via the feedback form that is linked in the ChatGPT interface.

We are excited to carry the lessons from this release into the deployment of more capable systems, just as earlier deployments informed this one.