
“Any computing machine that is to solve a complex mathematical problem must be ‘programmed’ for this task. This means that the complex operation of solving that problem must be replaced by a combination of the basic operations of the machine.”

John von Neumann

Can machines do the job of mathematicians? Professor Kevin Buzzard, a pure mathematician (an algebraic number theorist) from Imperial College, London, addressed this question in his plenary talk at the 14th EECS Research Students Symposium (EECS 2023) at the Indian Institute of Science (IISc), Bengaluru. A recording of the talk is available here.

Buzzard dealt with this question by taking up two aspects of his job: (i) research, and (ii) teaching. What does research look like in pure mathematics? In mathematics, some problems may be easy to state. For example, the Goldbach conjecture states that every even number which is 4 or more is the sum of 2 prime numbers. However, there is as yet no proof or a counterexample. Fermat’s Last Theorem states that there are no positive integers x, y, z and n ≥ 3 such that x^{n} + y^{n} = z^{n}. This conjecture was open for more than 350 years before it was proved in 1994. The proof uses elliptic curves, modular forms, Galois representations, deformation theory, finite flat group schemes, algebraic geometry, harmonic analysis, and abelian varieties, while the statement uses only the elementary notions of positive integers, addition, and exponentiation. No ‘mathematical machinery’ was needed to state the theorem, but all the above-stated machinery was required to prove it.

Some problems are hard to even state. All the machinery that was required to prove Fermat’s Last Theorem is needed to rigorously state the Birch and Swinnerton-Dyer conjecture.

Pure mathematics researchers may work on solving a hard problem, or they may work on moving closer to solving a hard problem. Where do computers come in? Buzzard says that most hard problems (and many easy problems) in pure mathematics cannot be solved with a finite computation. For example, the Goldbach conjecture has been checked by computers for all even numbers ‘n’ in the range 4 ≤ n ≤ 4 × 10^{18}; however, a check in this range cannot amount to a proof!
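This kind of finite check is easy to script. The sketch below is illustrative Python (our own, not code from the talk; the name `goldbach_witness` is a made-up helper): for each even number it searches for a prime p such that n − p is also prime, which witnesses Goldbach’s conjecture for that n.

```python
def is_prime(n):
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def goldbach_witness(n):
    """Return a prime p with both p and n - p prime, or None.

    Any such p witnesses Goldbach's conjecture for the even number n.
    """
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return p
    return None

# Check every even number from 4 up to 10,000 -- evidence, not a proof!
assert all(goldbach_witness(n) is not None for n in range(4, 10001, 2))
```

Real verification efforts up to 4 × 10^{18} use far more sophisticated sieves, but the logical point is the same: no finite range of checks settles the conjecture.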

Another example is the theorem of Euclid, which states that there are infinitely many prime numbers. In this case, it is easy to write Python code that will give prime numbers one after the other – this is some evidence that there are infinitely many prime numbers but does not constitute a proof. But nowadays, we can use computers to prove that there are infinitely many prime numbers; we can ask this of large language models (LLMs) such as ChatGPT. So, Buzzard asked ChatGPT, “Can you give me a proof that there are infinitely many prime numbers?”. He got a proof, which he marked 7 out of 10, as the logic in the composite case was confused. It resembled the kind of solution one would get when one asks a class of undergraduate (UG) students to prove it!
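The “Python code that will give prime numbers one after the other” that Buzzard alludes to can be as simple as the following generator (an illustrative sketch, not his code), which tests each integer against the primes found so far:

```python
from itertools import count, islice

def primes():
    """Yield the primes one after the other, by trial division
    against the primes found so far."""
    found = []
    for n in count(2):
        if all(n % p != 0 for p in found):
            found.append(n)
            yield n

print(list(islice(primes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The generator never stops producing primes in practice, but, as the article notes, running it is evidence, not a proof, that the primes are infinite.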

Next, Buzzard asked ChatGPT to solve a harder mathematical problem, “Can you prove that there are infinitely many prime numbers which end in 7?” This problem was proved by humans just over 100 years ago, but ChatGPT’s answer was a ‘disaster’, says Buzzard! It did what a fresh student mathematician would try – it knew a method and tried to generalise it; however, this cannot work here, as a large composite number ending in 7 can be the product of 2 smaller primes, neither of which ends in 7; the system did not notice this.
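Buzzard’s objection can be checked directly: there are composite numbers ending in 7 none of whose prime factors end in 7, which is exactly why a naive Euclid-style argument breaks down. A small illustrative script (our own, not from the talk) finds such numbers:

```python
def prime_factors(n):
    """Prime factorisation by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

# Composite numbers ending in 7, none of whose prime factors end in 7:
counterexamples = [n for n in range(10, 200)
                   if n % 10 == 7
                   and len(prime_factors(n)) > 1
                   and all(p % 10 != 7 for p in prime_factors(n))]
print(counterexamples)  # includes 27 = 3*3*3 and 87 = 3*29
```

So a number built to end in 7 need not have any prime factor ending in 7, and the Euclid-style construction yields no new prime of the required form.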

The next question to ChatGPT was “Can you prove that there are infinitely many even numbers which end in 7?” This was an even greater disaster! So, Buzzard said, “Right now, I can’t see it scaling to serious mathematics.” Nevertheless, these systems have become exponentially better recently, and some computer scientists say that this exponential improvement will continue. However, according to Buzzard, these models are trained on the Internet, and whether such scale-up can continue is doubtful right now.

Buzzard also demonstrated the use of an interactive theorem prover (ITP) called Lean (developed by Leonardo de Moura at Microsoft Research) to prove that there are infinitely many prime numbers. Did Lean prove it automatically? No. Lean’s mathematics library *mathlib* contains over a million lines of code, including more than 110,000 theorems that were typed in manually by humans. Lean could also prove that there are infinitely many primes that end in 7, but the proof consisted of tens of thousands of lines of incomprehensible computer code, and it is not in *mathlib*: the computer had translated a proof from the maths library of *metamath*, another interactive theorem prover. However, the *metamath* proof was written by a human, Mario Carneiro, who had also written the program that translated the proof into Lean. So, with interactive theorem provers, the proof is correct but ultimately human-written.
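For a flavour of what invoking such a human-written *mathlib* result looks like, here is a minimal Lean 4 sketch (the import path and lemma name are taken from recent versions of *mathlib* and may differ across versions):

```lean
import Mathlib.Data.Nat.Prime.Basic

-- mathlib phrases Euclid's theorem as: beyond every n there is a prime.
example (n : ℕ) : ∃ p, n ≤ p ∧ p.Prime :=
  Nat.exists_infinite_primes n
```

The one-line proof works only because a human already formalised Euclid’s argument inside the library; Lean checks the proof, it does not invent it.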

What happens when we put LLMs and ITPs together? An LLM has been trained by Meta to write Lean code, and it has automatically proved half of a very old International Math Olympiad problem. However, Buzzard says this is at the high-school level.

The second part of Buzzard’s job is teaching; he gives undergraduate lectures and has one-to-one conversations with students. Especially during COVID, Buzzard had to teach online, and his university has recordings of all his lectures. So, what is left is the one-to-one conversations. Will an LLM be able to replace Buzzard here? It would be easier to replace Buzzard in an ‘Introduction to proofs’ course (for first-year undergraduate students) than in a real manifolds course (for final-year undergraduate/first-year Master of Science students). What about supervising doctoral (PhD) students? How does a first-year PhD student find out the answer to questions such as “Does this claim, which is true for *foos*, also hold for generalised *foos*?” Students may ask more experienced PhD students, a post-doctoral fellow, or their PhD advisor; ask Google or *mathoverflow*; or try to work it out themselves. The answer may be standard, and it may be possible to find it in a standard textbook or by asking the PhD advisor.

Buzzard says that if LLMs cannot solve this kind of task, then they cannot prove interesting new theorems. ITPs can be used to help build such a system, where the backend can have a big database of formalised theorems and counterexamples. We already have databases of counterexamples, hammers (automated tools that try to finish a proof by searching through libraries of known theorems), and a human-friendly front end. The current problem is the lack of a database of modern results in pure mathematics in a form understandable by an ITP. This is a problem because computers cannot read the human literature! Right now, LLMs struggle to read human proofs and translate them into Lean. However, they are getting very good at translating theorem statements.

In summary, Buzzard says that he is not scared of being replaced by a machine right now. However, writing code in interactive theorem provers that corresponds to proofs of hard theorems is fun, and may also be important. He suggests the following to such enthusiasts: “Choose a mathematical statement or proof you want to formalise and ask an expert how to do it or where to get help.”