An Overview of Catastrophic AI Risks: References

Dan Hendrycks, Mantas Mazeika, Thomas Woodside

[1] David Malin Roodman. On the probability distribution of long-term changes in the growth rate of the global economy: An outside view. 2020.

[2] Tom Davidson. Could Advanced AI Drive Explosive Economic Growth? Tech. rep. June 2021.

[3] Carl Sagan. Pale Blue Dot: A Vision of the Human Future in Space. New York: Random House, 1994.

[4] Roman V Yampolskiy. “Taxonomy of Pathways to Dangerous Artificial Intelligence”. In: AAAI Workshop: AI, Ethics, and Society. 2016.

[5] Keith Olson. “Aum Shinrikyo: once and future threat?” In: Emerging Infectious Diseases 5 (1999), pp. 513–516.

[6] Kevin M. Esvelt. Delay, Detect, Defend: Preparing for a Future in which Thousands Can Release New Pandemics. 2022.

[7] Siro Igino Trevisanato. “The ‘Hittite plague’, an epidemic of tularemia and the first record of biological warfare.” In: Medical Hypotheses 69.6 (2007), pp. 1371–1374.

[8] U.S. Department of State. Adherence to and Compliance with Arms Control, Nonproliferation, and Disarmament Agreements and Commitments. Government Report. U.S. Department of State, Apr. 2022.

[9] Robert Carlson. “The changing economics of DNA synthesis”. en. In: Nature Biotechnology 27.12 (Dec. 2009), pp. 1091–1094.

[10] Sarah R. Carter, Jaime M. Yassif, and Chris Isaac. Benchtop DNA Synthesis Devices: Capabilities, Biosecurity Implications, and Governance. Report. Nuclear Threat Initiative, 2023.

[11] Fabio L. Urbina et al. “Dual use of artificial-intelligence-powered drug discovery”. In: Nature Machine Intelligence (2022).

[12] John Jumper et al. “Highly accurate protein structure prediction with AlphaFold”. In: Nature 596.7873 (2021), pp. 583–589.

[13] Zachary Wu et al. “Machine learning-assisted directed protein evolution with combinatorial libraries”. In: Proceedings of the National Academy of Sciences 116.18 (2019), pp. 8852–8858.

[14] Emily Soice et al. “Can large language models democratize access to dual-use biotechnology?” In: 2023.

[15] Max Tegmark. Life 3.0: Being human in the age of artificial intelligence. Vintage, 2018.

[16] Leanne Pooley. We Need To Talk About A.I. 2020.

[17] Richard Sutton [@RichardSSutton]. It will be the greatest intellectual achievement of all time. An achievement of science, of engineering, and of the humanities, whose significance is beyond humanity, beyond life, beyond good and bad. en. Tweet. Sept. 2022.

[18] Richard Sutton. AI Succession. Video. Sept. 2023.

[19] A. Sanz-García et al. “Prevalence of Psychopathy in the General Adult Population: A Systematic Review and Meta-Analysis”. In: Frontiers in Psychology 12 (2021).

[20] U.S. Department of State Office of The Historian. “U.S. Diplomacy and Yellow Journalism, 1895–1898”. In: ().

[21] Onur Varol et al. “Online Human-Bot Interactions: Detection, Estimation, and Characterization”. In: ArXiv abs/1703.03107 (2017).

[22] Matthew Burtell and Thomas Woodside. “Artificial Influence: An Analysis Of AI-Driven Persuasion”. In: ArXiv abs/2303.08721 (2023).

[23] Anna Tong. “What happens when your AI chatbot stops loving you back?” In: Reuters (Mar. 2023).

[24] Pierre-François Lovens. “Sans ces conversations avec le chatbot Eliza, mon mari serait toujours là”. In: La Libre (Mar. 2023).

[25] Cristian Vaccari and Andrew Chadwick. “Deepfakes and Disinformation: Exploring the Impact of Synthetic Political Video on Deception, Uncertainty, and Trust in News”. In: Social Media + Society 6 (2020).

[26] Moin Nadeem, Anna Bethke, and Siva Reddy. “StereoSet: Measuring stereotypical bias in pretrained language models”. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics, Aug. 2021, pp. 5356–5371.

[27] Evan G. Williams. “The Possibility of an Ongoing Moral Catastrophe”. en. In: Ethical Theory and Moral Practice 18.5 (Nov. 2015), pp. 971–982.

[28] The Nucleic Acid Observatory Consortium. “A Global Nucleic Acid Observatory for Biodefense and Planetary Health”. In: ArXiv abs/2108.02678 (2021).

[29] Toby Shevlane. “Structured access to AI capabilities: an emerging paradigm for safe AI deployment”. In: ArXiv abs/2201.05159 (2022).

[30] Jonas Schuett et al. Towards best practices in AGI safety and governance: A survey of expert opinion. 2023. arXiv: 2305.07153.

[31] Yonadav Shavit. “What does it take to catch a Chinchilla? Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring”. In: ArXiv abs/2303.11341 (2023).

[32] Anat Lior. “AI Entities as AI Agents: Artificial Intelligence Liability and the AI Respondeat Superior Analogy”. In: Torts & Products Liability Law eJournal (2019).

[33] Maximilian Gahntz and Claire Pershan. Artificial Intelligence Act: How the EU can take on the challenge posed by general-purpose AI systems. Nov. 2022.

[34] Paul Scharre. Army of None: Autonomous Weapons and The Future of War. Norton, 2018.

[35] DARPA. “AlphaDogfight Trials Foreshadow Future of Human-Machine Symbiosis”. In: (2020).

[36] Panel of Experts on Libya. Letter dated 8 March 2021 from the Panel of Experts on Libya established pursuant to resolution 1973 (2011) addressed to the President of the Security Council. United Nations Security Council Document S/2021/229. United Nations, Mar. 2021.

[37] David Hambling. Israel used world’s first AI-guided combat drone swarm in Gaza attacks. 2021.

[38] Zachary Kallenborn. Applying arms-control frameworks to autonomous weapons. en-US. Oct. 2021.

[39] J.E. Mueller. War, Presidents, and Public Opinion. UPA book. University Press of America, 1985.

[40] Matteo E. Bonfanti. “Artificial intelligence and the offense–defense balance in cyber security”. In: Cyber Security Politics: Socio-Technological Transformations and Political Fragmentation. Ed. by M.D. Cavelty and A. Wenger. CSS Studies in Security and International Relations. Taylor & Francis, 2022. Chap. 5, pp. 64–79.

[41] Yisroel Mirsky et al. “The Threat of Offensive AI to Organizations”. In: Computers & Security (2023).

[42] Kim Zetter. “Meet MonsterMind, the NSA Bot That Could Wage Cyberwar Autonomously”. In: Wired (Aug. 2014).

[43] Andrei Kirilenko et al. “The Flash Crash: High-Frequency Trading in an Electronic Market”. In: The Journal of Finance 72.3 (2017), pp. 967–998.

[44] Michael C Horowitz. The Diffusion of Military Power: Causes and Consequences for International Politics. Princeton University Press, 2010.

[45] Robert E. Jervis. “Cooperation under the Security Dilemma”. In: World Politics 30 (1978), pp. 167–214.

[46] Richard Danzig. Technology Roulette: Managing Loss of Control as Many Militaries Pursue Technological Superiority. Tech. rep. Center for a New American Security, June 2018.

[47] Billy Perrigo. Bing’s AI Is Threatening Users. That’s No Laughing Matter. en. Feb. 2023.

[48] Nico Grant and Karen Weise. “In A.I. Race, Microsoft and Google Choose Speed Over Caution”. en-US. In: The New York Times (Apr. 2023).

[49] Thomas H. Klier. “From Tail Fins to Hybrids: How Detroit Lost Its Dominance of the U.S. Auto Market”. In: RePEc (May 2009).

[50] Robert Sherefkin. “Ford 100: Defective Pinto Almost Took Ford’s Reputation With It”. In: Automotive News (June 2003).

[51] Lee Strobel. Reckless Homicide?: Ford’s Pinto Trial. en. And Books, 1980.

[52] Grimshaw v. Ford Motor Co. May 1981.

[53] Paul C. Judge. “Selling Autos by Selling Safety”. en-US. In: The New York Times (Jan. 1990).

[54] Theo Leggett. “737 Max crashes: Boeing says not guilty to fraud charge”. en-GB. In: BBC News (Jan. 2023).

[55] Edward Broughton. “The Bhopal disaster and its aftermath: a review”. In: Environmental Health 4.1 (May 2005), p. 6.

[56] Charlotte Curtis. “Machines vs. Workers”. en-US. In: The New York Times (Feb. 1983).

[57] Thomas Woodside et al. “Examples of AI Improving AI”. In: (2023). URL: https://ai-improving-ai.safe.ai.

[58] Stuart Russell. Human Compatible: Artificial Intelligence and the Problem of Control. en. Penguin, Oct. 2019.

[59] Dan Hendrycks. “Natural Selection Favors AIs over Humans”. In: ArXiv abs/2303.16200 (2023).

[60] Dan Hendrycks. The Darwinian Argument for Worrying About AI. en. May 2023.

[61] Richard C. Lewontin. “The Units of Selection”. In: Annual Review of Ecology, Evolution, and Systematics 1 (1970), pp. 1–18.

[62] Ethan Kross et al. “Facebook use predicts declines in subjective well-being in young adults”. In: PloS one (2013).

[63] Laura Martínez-Íñigo et al. “Intercommunity interactions and killings in central chimpanzees (Pan troglodytes troglodytes) from Loango National Park, Gabon”. In: Primates; Journal of Primatology 62 (2021), pp. 709–722.

[64] Anne E Pusey and Craig Packer. “Infanticide in Lions: Consequences and Counterstrategies”. In: Infanticide and parental care (1994), p. 277.

[65] Peter D. Nagy and Judit Pogany. “The dependence of viral RNA replication on co-opted host factors”. In: Nature Reviews. Microbiology 10 (2011), pp. 137–149.

[66] Alfred Buschinger. “Social Parasitism among Ants: A Review”. In: Myrmecological News 12 (Sept. 2009), pp. 219–235.

[67] Greg Brockman, Ilya Sutskever, and OpenAI. Introducing OpenAI. Dec. 2015.

[68] Devin Coldewey. OpenAI shifts from nonprofit to ‘capped-profit’ to attract capital. Mar. 2019.

[69] Kyle Wiggers, Devin Coldewey, and Manish Singh. Anthropic’s $5B, 4-year plan to take on OpenAI. Apr. 2023.

[70] Center for AI Safety. Statement on AI Risk (“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”) 2023. URL: https://www.safe.ai/statement-on-ai-risk.

[71] Richard Danzig et al. Aum Shinrikyo: Insights into How Terrorists Develop Biological and Chemical Weapons. Tech. rep. Center for a New American Security, 2012. URL: https://www.jstor.org/stable/resrep06323.

[72] Timnit Gebru et al. “Datasheets for datasets”. en. In: Communications of the ACM 64.12 (Dec. 2021), pp. 86-92.

[73] Christian Szegedy et al. “Intriguing properties of neural networks”. In: CoRR (Dec. 2013).

[74] Dan Hendrycks et al. “Unsolved Problems in ML Safety”. In: arXiv preprint arXiv:2109.13916 (2021).

[75] John Uri. 35 Years Ago: Remembering Challenger and Her Crew. Text. Jan. 2021.

[76] International Atomic Energy Agency. The Chernobyl Accident: Updating of INSAG-1. Technical Report INSAG-7. Vienna, Austria: International Atomic Energy Agency, 1992.

[77] Matthew Meselson et al. “The Sverdlovsk anthrax outbreak of 1979.” In: Science 266.5188 (1994), pp. 1202–1208.

[78] Daniel M Ziegler et al. “Fine-tuning language models from human preferences”. In: arXiv preprint arXiv:1909.08593 (2019).

[79] Charles Perrow. Normal Accidents: Living with High-Risk Technologies. Princeton, NJ: Princeton University Press, 1984.

[80] Mitchell Rogovin and George T. Frampton Jr. Three Mile Island: a report to the commissioners and to the public. Volume I. English. Tech. rep. NUREG/CR-1250(Vol.1). Nuclear Regulatory Commission, Washington, DC (United States). Three Mile Island Special Inquiry Group, Jan. 1979.

[81] Richard Rhodes. The Making of the Atomic Bomb. New York: Simon & Schuster, 1986.

[82] Sébastien Bubeck et al. “Sparks of Artificial General Intelligence: Early experiments with GPT-4”. In: ArXiv abs/2303.12712 (2023).

[83] Theodore I. Lidsky and Jay S. Schneider. “Lead neurotoxicity in children: basic mechanisms and clinical correlates.” In: Brain: A Journal of Neurology 126 Pt 1 (2003), pp. 5–19.

[84] Brooke T. Mossman et al. “Asbestos: scientific developments and implications for public policy.” In: Science 247.4940 (1990), pp. 294–301.

[85] Kate Moore. The Radium Girls: The Dark Story of America’s Shining Women. Naperville, IL: Sourcebooks, 2017.

[86] Stephen S. Hecht. “Tobacco smoke carcinogens and lung cancer.” In: Journal of the National Cancer Institute 91.14 (1999), pp. 1194–1210.

[87] Mario J. Molina and F. Sherwood Rowland. “Stratospheric sink for chlorofluoromethanes: chlorine atom-catalysed destruction of ozone”. In: Nature 249 (1974), pp. 810–812.

[88] James H. Kim and Anthony R. Scialli. “Thalidomide: the tragedy of birth defects and the effective treatment of disease.” In: Toxicological Sciences: An Official Journal of the Society of Toxicology 122.1 (2011), pp. 1–6.

[89] Betul Keles, Niall McCrae, and Annmarie Grealish. “A systematic review: the influence of social media on depression, anxiety and psychological distress in adolescents”. In: International Journal of Adolescence and Youth 25 (2019), pp. 79–93.

[90] Zakir Durumeric et al. “The Matter of Heartbleed”. In: Proceedings of the 2014 Conference on Internet Measurement Conference (2014).

[91] Tony Tong Wang et al. “Adversarial Policies Beat Professional-Level Go AIs”. In: ArXiv abs/2211.00241 (2022).

[92] T. R. Laporte and Paula M. Consolini. “Working in Practice But Not in Theory: Theoretical Challenges of ‘High-Reliability Organizations’”. In: Journal of Public Administration Research and Theory 1 (1991), pp. 19–48.

[93] Thomas G. Dietterich. “Robust artificial intelligence and robust human organizations”. In: Frontiers of Computer Science 13 (2018), pp. 1–3.

[94] Nancy G Leveson. Engineering a safer world: Systems thinking applied to safety. The MIT Press, 2016.

[95] David Manheim. Building a Culture of Safety for AI: Perspectives and Challenges. 2023.

[96] National Research Council et al. Lessons Learned from the Fukushima Nuclear Accident for Improving Safety of U.S. Nuclear Plants. Washington, D.C.: National Academies Press, Oct. 2014.

[97] Diane Vaughan. The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA. Chicago, IL: University of Chicago Press, 1996.

[98] Dan Lamothe. Air Force Swears: Our Nuke Launch Code Was Never ’00000000’. Jan. 2014.

[99] Toby Ord. The precipice: Existential risk and the future of humanity. Hachette Books, 2020.

[100] U.S. Nuclear Regulatory Commission. Final Safety Culture Policy Statement. Federal Register. 2011.

[101] Bruce Schneier. “Inside the Twisted Mind of the Security Professional”. In: Wired (Mar. 2008).

[102] Dan Hendrycks and Mantas Mazeika. “X-Risk Analysis for AI Research”. In: ArXiv abs/2206.05862 (2022).

[103] CSRC Content Editor. Red Team - Glossary. EN-US.

[104] Amba Kak and Sarah West. Confronting Tech Power. 2023.

[105] Nassim Nicholas Taleb. “The Fourth Quadrant: A Map of the Limits of Statistics”. In: Edge, 2008.

[106] Irene Solaiman et al. “Release strategies and the social impacts of language models”. In: arXiv preprint arXiv:1908.09203 (2019).

[107] Neal Woollen. Incident Response (Why Planning is Important).

[108] Huashan Li et al. “The impact of chief risk officer appointments on firm risk and operational efficiency”. In: Journal of Operations Management (2022).

[109] Role of Internal Audit. URL: https://www.marquette.edu/riskunit/internalaudit/role.shtml.

[110] Heather Adkins et al. Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems. O’Reilly Media, 2020.

[111] Center for Security and Emerging Technology. AI Safety – Emerging Technology Observatory Research Almanac. 2023.

[112] Donald T Campbell. “Assessing the impact of planned social change”. In: Evaluation and program planning 2.1 (1979), pp. 67–90.

[113] Yohan J. John et al. “Dead rats, dopamine, performance metrics, and peacock tails: proxy failure is an inherent risk in goal-oriented systems”. In: Behavioral and Brain Sciences (2023), pp. 1–68. DOI:10.1017/S0140525X23002753.

[114] Jonathan Stray. “Aligning AI Optimization to Community Well-Being”. In: International Journal of Community Well-Being (2020).

[115] Jonathan Stray et al. “What are you optimizing for? Aligning Recommender Systems with Human Values”. In: ArXiv abs/2107.10939 (2021).

[116] Ziad Obermeyer et al. “Dissecting racial bias in an algorithm used to manage the health of populations”. In: Science 366 (2019), pp. 447–453.

[117] Dario Amodei and Jack Clark. Faulty reward functions in the wild. 2016.

[118] Alexander Pan, Kush Bhatia, and Jacob Steinhardt. “The effects of reward misspecification: Mapping and mitigating misaligned models”. In: ICLR (2022).

[119] G. Thut et al. “Activation of the human brain by monetary reward”. In: Neuroreport 8.5 (1997), pp. 1225–1228.

[120] Edmund T. Rolls. “The Orbitofrontal Cortex and Reward”. In: Cerebral Cortex 10.3 (Mar. 2000), pp. 284–294.

[121] T. Schroeder. Three Faces of Desire. Philosophy of Mind Series. Oxford University Press, USA, 2004.

[122] Joseph Carlsmith. “Existential Risk from Power-Seeking AI”. In: Oxford University Press (2023).

[123] John Mearsheimer. “Structural realism”. In: Oxford University Press, 2007.

[124] Bowen Baker et al. “Emergent Tool Use From Multi-Agent Autocurricula”. In: International Conference on Learning Representations. 2020.

[125] Dylan Hadfield-Menell et al. “The Off-Switch Game”. In: ArXiv abs/1611.08219 (2016).

[126] Alexander Pan et al. “Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark.” In: ICML (2023).

[127] “Lyndon Baines Johnson”. In: Oxford Reference (2016).

[128] Anton Bakhtin et al. “Human-level play in the game of Diplomacy by combining language models with strategic reasoning”. In: Science 378 (2022), pp. 1067–1074.

[129] Paul Christiano et al. Deep reinforcement learning from human preferences. Discussed in https://www.deepmind.com/blog/specification-gaming-the-flip-side-of-ai-i…. 2017. arXiv: 1706.03741.

[130] Xinyun Chen et al. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. 2017. arXiv: 1712.05526.

[131] Andy Zou et al. Benchmarking Neural Network Proxy Robustness to Optimization Pressure. 2023.

[132] Miles Turpin et al. “Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting”. In: ArXiv abs/2305.04388 (2023).

[133] Collin Burns et al. “Discovering Latent Knowledge in Language Models Without Supervision”. en. In: The Eleventh International Conference on Learning Representations. Feb. 2023.

[134] Andy Zou et al. Representation engineering: Understanding and controlling the inner workings of neural networks. 2023.

[135] Catherine Olsson et al. “In-context Learning and Induction Heads”. In: ArXiv abs/2209.11895 (2022).

[136] Kevin Ro Wang et al. “Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small”. en. In: The Eleventh International Conference on Learning Representations. Feb. 2023.

[137] Xinyang Zhang, Zheng Zhang, and Ting Wang. “Trojaning Language Models for Fun and Profit”. In: 2021 IEEE European Symposium on Security and Privacy (EuroS&P) (2020), pp. 179–197.

[138] Jiashu Xu et al. “Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models”. In: ArXiv abs/2305.14710 (2023).

[139] Dan Hendrycks et al. “Unsolved Problems in ML Safety”. In: ArXiv abs/2109.13916 (2021).

[140] Nora Belrose et al. “LEACE: Perfect linear concept erasure in closed form”. In: ArXiv abs/2306.03819 (2023).

[141] Alberto Giubilini and Julian Savulescu. “The Artificial Moral Advisor. The «Ideal Observer» Meets Artificial Intelligence”. eng. In: Philosophy & Technology 31.2 (2018), pp. 169–188.

[142] Nick Beckstead. On the overwhelming importance of shaping the far future. 2013.

[143] Jens Rasmussen. “Risk management in a Dynamic Society: A Modeling Problem”. English. In: Proceedings of the Conference on Human Interaction with Complex Systems, 1996.

[144] Jennifer Robertson. “Human rights vs. robot rights: Forecasts from Japan”. In: Critical Asian Studies 46.4 (2014), pp. 571–598.

[145] John Rawls. Political Liberalism. Columbia University Press, 1993.

[146] Toby Newberry and Toby Ord. “The Parliamentary Approach to Moral Uncertainty”. In: 2021.

[147] F.R. Frola and C.O. Miller. System Safety in Aircraft Acquisition. en. Tech. rep. Jan. 1984.

Link to the original:

An Overview of Catastrophic AI Risks
