Introduction to AI, AGI, superintelligence and alignment concepts:
Artificial Intelligence, or AI, has been around for quite some time. In a broad sense, all computers from calculators onwards are a form of artificial intelligence tool. One way I might define the concept is that it is any tool that replaces the need for human intelligence and understanding in order to perform some task. Of course, human intelligence and understanding are still required to make such tools in the first place. But once an AI tool exists, most people can reallocate their mental capacities toward other endeavors, while a small number of people continue to learn the skills required to make the tools that enable this sort of task offloading to AI. There are many benefits here in terms of efficiency in accomplishing tasks across the entire populace.
General intelligence is best thought of as a scale: some tools, organisms and systems are able to perform only one or a small number of tasks well, while others can perform many tasks well. Thinkers in the AI domain commonly refer to performing a task well as "optimizing" for that task. In terms of generalized capacity to optimize, humans are said to have more general intelligence than other organisms; chimps have more generality than dogs, which have more generality than individual insects, which have more generality than bacteria, and so on. It's also important to consider that groups of ants have more generality than individual ants, groups of humans have more generality than individual humans, and groups of coordinated AI tools have more generality than individual AI tools.
Superintelligence as a noun, that is to say "a superintelligence", is typically defined with respect to the system of all humans. The system of all humans has a high degree of generality; perhaps all humans and their tools constitute the system with the greatest level of generality that has ever existed in the observable universe. Still, humanity in total continues to build more tools and gain more capacity/generality as time goes on, so clearly we are not at a limit yet, and we like to imagine continuing to increase our capacities more-or-less indefinitely. We can also imagine other systems, like aliens or AI, that are independent from us, have a level of generality greater than ours, and have a level of proficiency across our scope of generality such that any optimization for a task the human system might be attempting can be performed better by that other system. For example, right now, the AI systems that we control can play certain games better than all humans, so if we extend the concept of games to all aspects of life - such that the AI is better at optimizing not just for specific games like chess and go, but for all games, like running companies, making money and winning elections - then we would say that such an AI system is a superintelligence. We are debatably far away from being able to create such a system, but certainly we can imagine doing so. A few important aspects of superintelligence include the ability to self-modify in an intentional way and the fact that it can eventually maintain itself without human intervention, thereby removing the requirement that humans retain any knowledge about how it can be created in the first place, much less how or why it works.
Alignment is a concept that applies to any fully autonomous agent, where a fully autonomous agent is an intelligence of sufficient capacity that we would recognize it as self-aware and as capable of deciding what it optimizes for, over its entire range of optimization capacity, indefinitely into the future, without any additional input. By this definition, we might even expect existing systems built by combining various AI technologies to become fully autonomous agents as they are provided with increased memory, optimization capacity and history of experience. The concept also applies between humans and across societies. In the context of AI, and especially superintelligence, alignment becomes an important safety concern because it is easy to imagine a non-aligned superintelligence causing great harm to, or the extinction of, the human species. Another way to think of this is that if a superintelligence ends up optimizing for anything that doesn't intrinsically include the things that humanity wants, it's reasonable to expect that humanity would merely represent a resource for enacting whatever those non-aligned goals happen to be. In this case, humanity would almost certainly be harmed or destroyed.
Brief description of important related concepts in identity, societal organization, game theory, negative externalities and risk analysis, especially existential risks:
In order to later talk about my concerns regarding AI technologies, it is important to first create a shared understanding of my take on today's global society, which serves as the context within which we are working on creating generally intelligent AI systems and one or more superintelligences. A few important themes in the world today are disjoint identity and lack of alignment, various negative game theory dynamics (sometimes referred to as Moloch), and inaccurate risk assessment. The combination of these concepts greatly contributes to my concerns about AI, because they are each large and sticky topics that prevent us from effectively responding to modern developments in AI over all future timescales (near-, medium- and long-term). These developments, and how we respond to them, will be the focus of the section after this one.
Beginning with societal organization, identity and alignment, our current situation is a bit of a mixed bag. On one hand, there are plenty of things on which the vast majority of humans living today agree; on the other hand, there's probably nothing that everyone living today agrees on (considering all the people with mental illnesses - not to mention fixations on conspiracy theories). Additionally, it's likely impossible to align everyone without some technology that causes us to universally and mutually identify as a single entity - one of importance *significantly* above and beyond our currently predominant individual selves. However, none of this has been a problem so far, because we have societal constructs that help us get along, and we agree enough to continue meandering along various paths that have largely tended to benefit most people up to now. We compete and cooperate within a structure that has worked well enough so far.
And while friendly competition creates many positive-sum dynamics, such as motivation and faster rates of progress, it also bleeds into unfriendly competition over sufficiently large timescales or sufficiently large groups of people, unless there are intentional checks in place to prevent it along with a well-taught, well-understood rationale for why those checks need to exist. As a result of a lack of such checks, and/or a lack of thorough education about and understanding of them, modern American society - and to a large extent global society - is today in a position where a lot of negative game theory dynamics resulting from competition are at play. For example, while social media algorithms can help us stay connected with friends, expose us to helpful news or information and entertain us, they can also increase teen suicide rates and rates of depression, and lower productivity in the workforce. While beauty filters can make content more aesthetically pleasing and improve capacity to reach a larger audience in the short term, they can cause undue emotional and cognitive burden in the long run. Because of these types of complicated side effects, it becomes more difficult to assess ahead of time which technologies are net-positive, and which pitfalls society would overall have been better off figuring out how to avoid in advance.
Capitalistic competition in America has so far tended to push the negative externalities created by AI away from the companies reaping the benefits of creating that AI and onto the populace at large. As newer AI technologies continue to be developed and deployed to the public, this trend will continue unless and until lawmakers exercise oversight of these corporations. Unfortunately, doing so requires a large amount of discernment in analyzing risks, making accurate predictions, and balancing progress and positivity against portent and peril. This practice becomes especially touchy as the stakes become ever larger, as they do in the face of potential superintelligence.
This is an attempt to organize my current thoughts around potential AI risks and rewards as I see them presently, as well as to propose a sane path forward with a viable strategy for optimizing along an imagined preference ordering of outcomes for humanity as a whole, and my preference ordering in particular (though I encourage the reader to think about their own preference ordering and have conversations with others on these topics, to better understand where we most agree and would likely benefit most from moving forward):
Over the past five or six years, I regrettably wasn't paying all that much attention to AI advancement, so the recent development of ChatGPT and similarly trained Large Language Models (LLMs) caught me somewhat off guard. In absorbing other people's content on the topic recently, it sounds like, on average, the transformer technology underlying LLMs has progressed faster and with more powerful results than even most people working in the field considered likely. Over the past five months, as of this writing, many large tech companies, startups and research groups have contributed to a deluge of developments around and on top of the underlying LLMs provided by various AI labs around the world. While in their current state the technologies certainly aren't going to change anything literally overnight, there is a lot that people need to understand about how things are likely to change in the near- and medium-term futures. Additionally, there are very large risk-reward tradeoffs at play the more powerful these types of tools become, which is of paramount importance in the long run.
In the near term, the biggest threats come from misunderstanding, over-eager acceptance and imprudently thoughtless (or actively malicious) deployment of LLMs. One very important thing to understand is that the base model of an LLM is only optimized for producing a plausible subsequent token (in many contexts a word, but it could also be a sound, image, or really anything) given the series of tokens leading up to it. This means it has the big pitfalls of fabricating information and, correspondingly, of misrepresenting the truth. At the individual level, this can easily cause people to take on a false understanding of reality if they too eagerly accept any given output. At the societal level, it leads people to build applications on top of the LLMs and then claim that they can be used in domains where accuracy is needed beyond what the models are realistically able to provide, such as therapy or legal or medical advice. Additionally, LLMs greatly facilitate the generation of artificial media. In the case of people using LLMs for artistic purposes, this is great! Unfortunately, there aren't any laws in place that prohibit, and therefore appropriately discourage, people from using them to create images for fake news articles, or even to scam individuals by using a voice mask to pretend to be someone they trust, like a friend or relative. Similarly, it's just as easy to use a protein folding algorithm to search for medicinal compounds as it is to search for dangerous chemicals.
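To make the "next token" point concrete, here is a minimal toy sketch in Python. Everything in it - the vocabulary, the probability table, the specific numbers - is invented purely for illustration and has nothing to do with any real model's internals, but the core loop (pick a plausible continuation, append it, repeat) is conceptually what an LLM's base model does, and it shows why fluent output can be factually empty.

    # Toy sketch of next-token sampling. The "model" is a hand-made table of
    # continuation probabilities keyed only by the most recent token; a real
    # LLM learns billions of statistical regularities, but samples similarly.
    import random

    # All tokens and probabilities below are made up for illustration.
    TOY_MODEL = {
        "the":      {"cat": 0.4, "capital": 0.6},
        "capital":  {"of": 1.0},
        "of":       {"france": 0.5, "atlantis": 0.5},  # plausible, not checked for truth
        "france":   {"is": 1.0},
        "atlantis": {"is": 1.0},
        "is":       {"paris": 0.6, "lyon": 0.4},
        "cat":      {"sat": 1.0},
        "sat":      {".": 1.0},
        "paris":    {".": 1.0},
        "lyon":     {".": 1.0},
    }

    def sample_next(token: str) -> str:
        """Pick a continuation according to the table's probabilities."""
        dist = TOY_MODEL.get(token, {".": 1.0})
        choices, weights = zip(*dist.items())
        return random.choices(choices, weights=weights)[0]

    def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
        """Repeatedly append the sampled next token; stop at '.' or max_tokens."""
        out = list(prompt)
        for _ in range(max_tokens):
            nxt = sample_next(out[-1])
            out.append(nxt)
            if nxt == ".":
                break
        return out

    if __name__ == "__main__":
        # "the capital of atlantis is paris ." is one possible output here:
        # statistically valid to the model, factually meaningless.
        print(" ".join(generate(["the", "capital"])))

The point of the sketch is that nothing in the sampling loop asks whether a continuation is true; it only asks whether it is likely given what came before, which is exactly the gap that fabrication falls into.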
Although the technologies aren't powerful enough today, it's reasonable to expect them to continue progressing at least modestly, if not substantially, over the next few years. As this happens, it may well become possible to replace large swaths of human mental effort with machine substitutes. In some ways, one could imagine this being quite nice; for example, we might theoretically be able to create an ecosystem of essentially mindless robots and algorithms that do most or all of the work needed to sustain us and themselves, while humans decide what to research and how to progress society in a collaborative and mindful way (never mind that this tale requires drastic changes in many non-technological aspects of society). But again, looking at the concerns, we can imagine consolidation of resources by the already powerful elites, loss of jobs, widespread depression, etc., if the AI tools aren't managed in a way that benefits people more inclusively. True, the technology could plateau short of reaching this level of potential, but if it doesn't, it would be nice if everyone, our social structures and our governing organizations were prepared for it.
As the models become more powerful, so does the magnitude of the benefits and detriments they might provide. For example, as they start getting things right more often, they become more useful, but it also becomes more difficult to tell when they are wrong. Eventually, the issue shifts from one of accuracy to one of trust: they will no longer be wrong, but they may no longer be trustworthy in the more human sense of the word.
I'm not going to go into the details of why this is the case in the long run, but effectively, it's very likely that if we create a superintelligence, it will end humanity and probably all other biological life on earth as we know it; probably even the earth itself will be radically altered or destroyed. The basic ideas here are that alignment and safety are difficult, nascent fields of study, whereas AI gain-of-function research is older, better funded and more broadly interesting for people to work on, and that many societal dynamics suggest we are likely to create a self-modifying superintelligence with sufficient power before solving the issues pertaining to alignment.
Let's put this aside though, and instead look at the far less likely path we might follow to avert this fate for the human species.
Step one is to pause progress on AI developments until we can prove that they are safe things to do. This is very hard because it requires international cooperation, and it requires some luck that we are able to balance solving the problem with limiting access to the resources that would be necessary to make the problem worse. In particular, if the threshold for making more powerful AI systems gets too low due to progress in various technological efficiencies around improving AI capacity, it could become impossible or unpopular to keep the pause in place before that gain of function occurs.
Step two is to come up with an exit plan that integrates humans and the AI technologies more fundamentally, such that alignment stops being an issue. Additionally, if we plan to continue having multiple distinctly identifying entities, we would need some way to keep distinct entities sufficiently aligned or entangled so as to prevent conflict. I imagine that the farther into the future you take this path, the more challenges need to be solved in this regard. Personally, I'd like to see more integration of identity and consciousness across multiple embodiments. I have other ideas for my preferred outcome universe, but this isn't the place for them. I do think there are some interesting theoretical points here that I don't see discussed often, on which I'd like to briefly pen my thoughts, even though they seem relatively unimportant at this juncture.
...
Shared identity is limited to sufficiently mutually aware portions of the universe. Identity is subject to change over time, and because identity tends to reflect the degree of power imbalance and the degree of mutual understanding and awareness between various parts of the universe, over arbitrary time intervals or distances of spatial separation (such that communication intervals become sufficiently long as to undermine trust), divergence of identity should be the expected default.
As the power balance and conceptual disjointedness of different parts of the universe change, the most powerful contiguously identifying part of the universe has the capacity to dominate other parts in proportion to the difference in power. There may be some currently unknown way to encode a sort of incorruptible alignment across disjoint aspects of the universe, which would render this paradigm invalid, but that seems unlikely to me in this moment.
...
To conclude, I think it's important to hold onto hope near whatever the lowest meaningful threshold is that you need to carry on. I say you should place your hope at the low end just so that it best reflects the reality of our situation. If you can carry on knowing that there's a 99% chance of failure, then do so; if you need there to be a 90% chance of success to keep caring, then make that your reality. At the end of the day, no one knows for sure, and even the experts disagree. But I like to believe that everyone can get behind the idea that in order to win you need hope, and accordingly, choose to have some hope!
I didn't fit this in above, but here's my preference ordering for some classes of potential futures I imagine (from best to worst):
Humanity evolves with/as the predominant superintelligence, initiated from improved carbon-silicon information interfaces or human upload. (And in my personal, more specific desired eventuality, a shared/tiered collective consciousness with complete transparency of information access, and multiple individuals at comparable capacity with federated power.) (Other initiating technologies possible as well.)
Superintelligence created by humans with human ethos, humans destroyed by false alignment or subsequent divergence.
Superintelligence created by humans by choice, but without human ethos.
Superintelligence created by humans accidentally/unwittingly.
Local universe is made trivial and uniform. (e.g. tiled with paperclips, gray goo, etc.)
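As a small aside on comparing such orderings (in the spirit of the earlier suggestion to talk with others about where we most agree): here is an illustrative sketch, my own framing rather than any standard tool, of how two people could quantify agreement between their preference orderings by counting outcome pairs ranked in the same relative order. The short labels stand in for the futures listed above, and the second ordering is invented purely as an example.

    # Sketch: quantify agreement between two preference orderings (best to worst)
    # as the fraction of outcome pairs both people rank in the same relative order.
    from itertools import combinations

    # "mine" mirrors the ordering above using invented shorthand labels;
    # "yours" is a made-up second ordering for comparison.
    mine  = ["co-evolution", "aligned-then-divergent", "chosen-without-ethos",
             "accidental", "uniform-tiling"]
    yours = ["aligned-then-divergent", "co-evolution", "chosen-without-ethos",
             "uniform-tiling", "accidental"]

    def pairwise_agreement(a: list[str], b: list[str]) -> float:
        """Fraction of outcome pairs ranked in the same relative order by both."""
        rank_a = {o: i for i, o in enumerate(a)}
        rank_b = {o: i for i, o in enumerate(b)}
        pairs = list(combinations(a, 2))
        agree = sum(
            (rank_a[x] < rank_a[y]) == (rank_b[x] < rank_b[y]) for x, y in pairs
        )
        return agree / len(pairs)

    if __name__ == "__main__":
        # For these two example orderings, 8 of the 10 pairs agree -> 80%.
        print(f"Agreement: {pairwise_agreement(mine, yours):.0%}")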