Artificial intelligence, because it aims to mimic intelligence, must be analyzed through the philosophical disciplines that study intelligence.
While majoring in physics and studying robotics, I found myself increasingly dissatisfied with the field. Physics certainly explains how we move; it does not explain how we should move. Why shouldn't I kill you with a knife? Can gravity, force, or energy answer that? Isn't it a question of ethics? Intelligence itself seemed distant from the natural sciences.

To study intelligence itself, I turned to philosophy, which has been investigating it for thousands of years. Philosophy already offers fields such as logic, which studies the structure of thought; epistemology, which studies the forms of cognition; aesthetics, which studies aesthetic judgment; and ethics, which studies moral judgment. As I studied Kant in particular, I began to explore how his ideas could be applied to artificial intelligence, and I found connections between the "Critique of Pure Reason" and the unexplainability of AI judgments. The "Critique of Pure Reason" maps out the entire capacity of human sensibility, understanding, and reason, and uses this map to examine whether judgments about propositions beyond experience are possible at all. Having grasped the full extent of human cognitive abilities, I began to compare them with the capabilities exhibited by artificial intelligence.
Breaking the stereotype — Zero initialization leads to performance improvement.
Among my peers, I would likely be one of the people who have worked the most with backpropagation. I have degrees in both physics and philosophy, which reflect my tendency to dig into the fundamentals rather than study superficially. I approached artificial intelligence the same way, and this shows in how I began practicing coding without relying on frameworks. For instance, I found it frustrating to implement a convolutional neural network (CNN) just by writing the standard call, "torch.nn.Conv2d", after looking at a few diagrams of the principle. It felt like implementing something without truly understanding it, and without a solid understanding I could not freely modify the code. I came to believe that, in order to modify things freely, I had to understand them fully.
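Writing the operation out by hand, rather than calling a framework, might look like the following sketch: a naive valid-mode 2-D convolution in plain NumPy (the function name and the example kernel are my own, chosen for illustration).

```python
import numpy as np

def conv2d(x, kernel):
    """Naive valid-mode 2-D convolution (strictly, cross-correlation, as in
    torch.nn.Conv2d), written without any deep-learning framework."""
    H, W = x.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output pixel is the sum of the elementwise product
            # between the kernel and the input patch it covers.
            out[i, j] = np.sum(x[i:i + kH, j:j + kW] * kernel)
    return out

x = np.arange(16.0).reshape(4, 4)
k = np.array([[1.0, 0.0], [0.0, -1.0]])  # simple diagonal-difference kernel
print(conv2d(x, k))  # 3x3 output, every entry -5.0 for this input
```

Two nested loops make the mechanics explicit; a framework fuses the same arithmetic into one optimized call.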
Consider someone researching car engines: How can they make a better engine? One simple approach is to take an existing engine and add some external modifications to it. However, if they don't fully understand the engine, they can only modify the exterior and lack the ability to open up and adjust the internal components. Therefore, to study it fundamentally, they would need to disassemble and reassemble all ten thousand engine parts repeatedly. Through this process, they would eventually be able to fully understand the mechanics.
In the same way, I implemented most neural networks by writing the forward and backward propagation code line by line, without using any frameworks. I also researched spiking neural networks for a while; there, backpropagation is challenging because spikes are non-differentiable, requiring approximate gradients. While implementing this, I examined the backpropagation formula. In backpropagation, a weight's gradient should not depend directly on that weight's own magnitude (though it is influenced indirectly). Whether a weight is -1, 0, or arbitrarily large, learning should still occur. So I sketched out the internal operations in a notebook and found that, in theory, learning should occur even if the weights are initialized to zero. The academic consensus, however, was that it would not work. This created a conflict with my own understanding, so I conducted experiments, confirmed my hypothesis, and published a paper on it. On closer analysis, I found that while naive zero initialization does typically prevent learning, a slight tweak to the process allows learning to occur after all.
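The failure mode, and one way past it, can be illustrated with hand-written backpropagation on a tiny two-layer ReLU network. The paper's actual tweak is not described above; the nonzero-bias trick below is only my illustrative stand-in for "slightly tweaking the process", and all names are hypothetical.

```python
import numpy as np

def forward_backward(W1, b1, W2, x, t):
    """One manual forward/backward pass for a 2-layer ReLU net with
    squared-error loss. Returns gradients w.r.t. W1 and W2."""
    h_pre = W1 @ x + b1
    h = np.maximum(h_pre, 0.0)           # ReLU
    y = W2 @ h
    delta = y - t                        # dL/dy for L = 0.5*(y - t)^2
    dW2 = np.outer(delta, h)             # gradient needs h, not W2 itself
    dh = (W2.T @ delta) * (h_pre > 0)    # backprop through ReLU
    dW1 = np.outer(dh, x)
    return dW1, dW2

x = np.array([1.0, 2.0]); t = np.array([1.0])
W1 = np.zeros((3, 2)); W2 = np.zeros((1, 3))

# All-zero initialization: h = 0, so every gradient is exactly zero.
dW1, dW2 = forward_backward(W1, np.zeros(3), W2, x, t)
print(np.abs(dW1).max(), np.abs(dW2).max())  # 0.0 0.0 -> no learning

# Illustrative tweak (not the paper's method): a nonzero bias makes the
# hidden layer active, so dW2 becomes nonzero and learning can begin.
dW1, dW2 = forward_backward(W1, np.ones(3), W2, x, t)
print(np.abs(dW2).max())  # nonzero
```

This also shows the point about magnitude: dW2 depends on the hidden activity h and the error delta, not directly on W2's own value.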
Can AI outperform physicists in the field of physics? — Yes
Intuitive physics refers to the ability of animals to process physical phenomena without formal physics. Previous research has focused on implementing artificial intelligence that understands simple physical concepts, such as Newton's laws. From my perspective, however, physics itself has limitations. Even formulating a physical phenomenon as a mathematical equation is often a significant challenge. Most physics textbooks simplify situations, but real-world problems involve many variables and quickly grow complex. Once formulated, these problems typically take the form of differential equations, and in most cases we do not even know whether those equations have solutions. It is hardly an exaggeration to say that solving them exactly is practically impossible 99% of the time. This is why mathematicians around the world keep searching for solutions to differential equations, and why applied mathematics researchers strive to find approximate ones. Even quantum mechanics textbooks open with the disclaimer that only a few simple problems can actually be solved.

Physics cannot fully calculate the orbits of the planets in the solar system; this is the "many-body problem," a major research topic in its own right. Yet physicists often speak as if they fully understand statements like "the Earth orbits the Sun," when in reality these rest on approximate calculations. How do we express the likelihood of an asteroid or satellite colliding with Earth? In probabilistic terms, which means it cannot be calculated with complete accuracy. We cannot even precisely calculate something as simple as walking, which is why bipedal robots are so challenging in mechanical engineering; AI trained with reinforcement learning can actually obtain better approximations. This is also why engineering students all study numerical analysis, the discipline of finding approximate solutions.
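Numerical analysis, mentioned above, is built on exactly this move from exact to approximate solutions. A minimal sketch: forward Euler applied to dy/dt = -y, an equation chosen here precisely because its exact solution e^(-t) is known, so the approximation error can be measured (for most real equations, no such formula exists).

```python
import math

def euler(f, y0, t0, t1, n):
    """Fixed-step forward Euler: approximate y(t1) for y' = f(t, y),
    y(t0) = y0, using n steps. The simplest numerical-analysis method."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)  # follow the tangent line for one small step
        t += h
    return y

approx = euler(lambda t, y: -y, 1.0, 0.0, 1.0, 1000)
exact = math.exp(-1.0)
print(abs(approx - exact))  # small error, shrinking as n grows
```

With 10 steps the error is on the order of 1e-2; with 1000 steps it falls below 1e-3. The answer is never exact, only as accurate as the computation budget allows.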
The goal of previous research was to implement AI with as much physical knowledge as possible, but physics itself has limitations. So I began to wonder: could AI surpass physics? Demonstrating it seemed straightforward. I created a physics problem that no one could solve, trained an AI to solve it, and compared the results with those obtained through physics. In this way, I showed that AI could indeed surpass physics.
How can human knowledge be transferred to AI? — Through the input layer with an activation function.
During my university years, I studied the philosophy of medicine on my own, drawing largely on Hegel's philosophy. As modern medicine developed, diseases came to be expressed in measurable form, quantifying what is essentially qualitative. This raises an issue: if hypertension is defined as a blood pressure of 120 or above, then what about 119.99999999? The argument is that quantitative measures cannot fully capture the qualitative nature of a condition. Computers process information quantitatively, that is, with numbers. How could I represent qualitative differences so that they would be clearly distinguishable, quantitatively, to such a machine? I thought that creating a larger numerical gap might achieve this. For example, I suggested multiplying input values near 36 degrees Celsius (normal body temperature) by 10, turning them into values like 360; in this way, 35 and 36 become 35 and 360. Academically, this can be described as applying an activation function at the input layer: the function is y = x over most of its range, but y = 10x in certain ranges, making it nonlinear as a whole. Although this was not included in the paper, I found that after applying it to the input layer, the time it took to reach peak performance during training was significantly reduced.
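The idea above can be sketched as a piecewise input-layer activation. The band boundaries below are my assumption (the text only says "near 36 degrees"), and the function name is hypothetical.

```python
import numpy as np

def input_activation(x, low=35.5, high=38.0):
    """Piecewise input activation: y = x outside the band of interest,
    y = 10x inside it. The band [low, high] is an assumed choice meant
    to capture 'near 36 degrees Celsius' from the text."""
    return np.where((x >= low) & (x <= high), 10.0 * x, x)

temps = np.array([35.0, 36.0, 37.5, 40.0])
print(input_activation(temps))  # [ 35. 360. 375.  40.]
```

A temperature of 35 passes through unchanged while 36 becomes 360, stretching the small quantitative difference into a gap the network can easily separate.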