Polynomials, AGI, and Neural Nets
I am a student who finds beauty in simple things. I like to teach sometimes.
If neural networks, including modern LLMs, are just stacks of simple linear regressions whose outputs pass through a simple activation function to add a bit of non-linearity, with each layer's output fed into the next layer and so on, does that mean we could mathematically achieve AGI, or any other very ambitious task, with a single polynomial? Probably one of very high order n, since any continuous function can be approximated by a polynomial at the end of the day. Could there be a single polynomial out there that can simply replace GPT-4?
AGI might actually be trivial if one accepts local mathematical realism as a consequence of the fundamental axiom of systems theory.
In theory, a single polynomial could work. The benefit of neural networks is that there is specialized hardware for training enormous numbers of parameters in parallel. There is nothing otherwise special about a neural network: there are many universal function approximators, and polynomials are among them.
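As a small illustration of polynomials as universal approximators, here is a sketch (the choice of NumPy, the tanh target, and degree 9 are all arbitrary for demonstration) that fits a polynomial to a typical neural-net activation function:

```python
import numpy as np

# A polynomial can approximate a smooth nonlinearity such as tanh,
# the kind of activation a neural network layer applies.
x = np.linspace(-3, 3, 200)
y = np.tanh(x)

# Least-squares fit of a degree-9 polynomial (Chebyshev basis for
# numerical stability of the fit).
poly = np.polynomial.chebyshev.Chebyshev.fit(x, y, deg=9)
max_err = np.max(np.abs(poly(x) - y))
print(f"max abs error: {max_err:.2e}")  # small, and it shrinks as the degree grows
```

Raising the degree drives the error down further, which is exactly the Weierstrass-style guarantee: on a bounded interval, a continuous function can be matched to any desired accuracy.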
However, polynomial regression is impractical at very high degree (say, a degree in the millions or billions) over huge input ranges and in very many dimensions, and the dimension here is enormous, since every token or word is represented by a high-dimensional vector. Still, there are presumably countless polynomials, never to be found, that could in principle replace any given neural net regardless of its complexity. It's really crazy. There is probably a near-perfect predictor polynomial for almost anything in this universe (e.g., one that correctly predicts Apple's stock price for every day of the next 100 years), but obtaining its coefficients is intractable.
It’s mainly a technological limitation. If there were chips specialized for working with polynomials in the same way GPUs are specialized for neural networks, then it could be possible. As it is now, it would just take a long time—there’s no theoretical blocker. This idea aligns with the Weierstrass approximation theorem (1885), which guarantees that any continuous function on a closed interval can be approximated arbitrarily well by polynomials.
That being said, the idea of a near-perfect predictor polynomial is debatable because many phenomena are not computable, no matter what model one constructs.
The question of whether it's just a technological limitation depends on what one means by "technical." Evaluating a high-order polynomial accurately is a genuinely hard problem for a computer in floating-point arithmetic, because it involves summing many terms of large magnitude and opposite sign, and the significant digits cancel away.
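A classic illustration of this cancellation (the polynomial (x - 1)^7 and the evaluation point are arbitrary choices for the demo): evaluating the expanded form near its root loses essentially all significant digits in float64, while the mathematically identical factored form does not:

```python
# (x - 1)^7 expanded: x^7 - 7x^6 + 21x^5 - 35x^4 + 35x^3 - 21x^2 + 7x - 1.
coeffs = [1, -7, 21, -35, 35, -21, 7, -1]  # highest-degree coefficient first

x = 1.0001
# Naive evaluation: sum terms of magnitude up to ~35 with alternating signs.
naive = sum(c * x ** (7 - i) for i, c in enumerate(coeffs))
# Factored form: same polynomial, numerically well-behaved.
exact = (x - 1) ** 7

print(naive, exact)  # naive is dominated by rounding noise; exact is ~1e-28
```

The true value is about 1e-28, but the intermediate terms are of order 10, so double precision (about 16 significant digits) cannot see the answer at all in the naive sum. Scaled up to degree-in-the-millions polynomials, this is why "just evaluate the polynomial" is not merely slow but numerically treacherous.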
Instead of the word "technical," the word "technological" seems more appropriate. There is no theoretical obstacle to modeling language with a polynomial on paper; we simply don't have technology that does it as quickly as the hardware that enables us to model language with neural networks.