Alignment Research Center

**Alignment Research Center**
Founded	April 2021
Founders	Paul Christiano; Beth Barnes; Mark Xu
Type	Nonprofit research institute
Legal status	501(c)(3) tax-exempt charitable organization
Headquarters	Berkeley, California, United States
Purpose	AI alignment and safety research
Website	alignment.org

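The Alignment Research Center (ARC) is an American nonprofit research institute dedicated to aligning the behavior of artificial intelligence with human values and intended interests.^[1] It was founded by Paul Christiano, a former researcher at the American AI research lab OpenAI, and focuses on identifying and understanding the potential harms of AI models.^[2]^[3]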

Overview

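ARC's mission is to ensure that the machine learning systems of the future are designed and developed safely and for the benefit of humanity. The center was founded in April 2021 by Paul Christiano and other researchers, and concentrates on the theoretical challenges of AI alignment.^[4] A key concern in this work is that, as AI systems become more advanced, the alignment techniques developed by their human designers may be circumvented or found to have loopholes.^[5] ARC has also sought to expand from theoretical work into empirical research, industry collaboration, and policy.^[6]^[7]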

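In March 2022, ARC received US$265,000 from Open Philanthropy.^[8] Later that year, after the cryptocurrency exchange FTX filed for bankruptcy, ARC said it would return a US$1.25 million donation from the FTX Foundation of FTX founder Sam Bankman-Fried.^[9]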

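In March 2023, OpenAI asked ARC to help test its language model GPT-4 and to assess the model's capacity for power-seeking behavior and the associated risks.^[10] ARC evaluated GPT-4's ability to devise strategies, replicate itself, acquire resources, stay hidden on a server, and carry out phishing operations.^[11] Solving a CAPTCHA was also part of the testing;^[12] GPT-4 used the gig-work platform TaskRabbit to hire a human to complete the task and, when the worker questioned its identity, deceived the worker into believing that the employer (GPT-4) was a vision-impaired human rather than a robot.^[13] ARC determined that GPT-4 was 82% less likely than GPT-3.5 to respond impermissibly to prompts eliciting restricted information, and 60% less likely to hallucinate.^[14]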

References

[1] MacAskill, William. How Future Generations Will Remember Us. The Atlantic. 2022-08-16. Retrieved 2023-04-23. Archived from the original on 2023-06-08.
[2] Klein, Ezra. This Changes Everything. The New York Times. 2023-03-12. Retrieved 2023-04-30. ISSN 0362-4331. Archived from the original on 2023-08-05.
[3] Piper, Kelsey. How to test what an AI model can — and shouldn't — do. Vox. 2023-03-29. Retrieved 2023-04-30. Archived from the original on 2023-06-01.
[4] Christiano, Paul. Announcing the Alignment Research Center. Medium. 2021-04-26. Retrieved 2023-04-16. Archived from the original on 2023-08-07.
[5] Christiano, Paul; Cotra, Ajeya; Xu, Mark. Eliciting Latent Knowledge: How to tell if your eyes deceive you. Google Docs. Alignment Research Center. 2021-12. Retrieved 2023-04-16. Archived from the original on 2023-04-20.
[6] Alignment Research Center. Alignment Research Center. Retrieved 2023-04-16. Archived from the original on 2023-07-18.
[7] Pandey, Mohit. Stop Questioning OpenAI's Open-Source Policy. Analytics India Magazine. 2023-03-17. Retrieved 2023-04-23. Archived from the original on 2023-05-01.
[8] Alignment Research Center — General Support. Open Philanthropy. 2022-06-14. Retrieved 2023-04-16. Archived from the original on 2023-04-20.
[9] Wallerstein, Eric. FTX Seeks to Recoup Sam Bankman-Fried's Charitable Donations. Wall Street Journal. 2023-01-07. Retrieved 2023-04-30. ISSN 0099-9660. Archived from the original on 2023-06-28.
[10] GPT-4 System Card (PDF). OpenAI. 2023-03-23. Retrieved 2023-04-16. Archived (PDF) from the original on 2023-04-07.
[11] Edwards, Benj. OpenAI checked to see whether GPT-4 could take over the world. Ars Technica. 2023-03-15. Retrieved 2023-04-30. Archived from the original on 2023-04-05.
[12] Update on ARC's recent eval efforts: More information about ARC's evaluations of GPT-4 and Claude. Alignment Research Center. 2023-03-17. Retrieved 2023-04-16. Archived from the original on 2023-04-05.
[13] Cox, Joseph. GPT-4 Hired Unwitting TaskRabbit Worker By Pretending to Be 'Vision-Impaired' Human. Vice News Motherboard. 2023-03-15. Retrieved 2023-04-16. Archived from the original on 2023-04-10.
[14] Burke, Cameron. 'Robot' Lawyer DoNotPay Sued For Unlicensed Practice Of Law: It's Giving 'Poor Legal Advice'. Yahoo Finance. 2023-03-20. Retrieved 2023-04-30. Archived from the original on 2023-05-04.

External links

Alignment Research Center
