
Google’s DeepMind AI division has taken on everything from StarCraft to protein folding. So it’s probably not surprising that its creators eventually turned to a personal interest: computer programming. In Thursday’s issue of Science, the company describes a system it developed that generates code in response to programming problems like those used in human programming competitions.
On an average challenge, the AI system could score near the top half of participants. However, it had some trouble scaling: it was less likely to produce a successful program on problems where more code is typically required. Still, it’s a bit surprising that it works at all, given that it was given no structural information about algorithms or programming languages.
Rising to the challenge
Computer programming challenges are conceptually simple: people are given a task and must produce code that performs it. In an example given in the new paper, programmers are handed two strings and asked to determine whether the shorter of the two could be produced by pressing backspace in place of some of the keystrokes needed to type the longer one. Submitted programs are then checked to see whether they provide a general solution to the problem or fail when additional test cases are run.
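To give a sense of what a human (or machine) solution to that example might look like, here is a minimal sketch of a standard greedy approach, assuming the usual contest semantics (pressing backspace skips the current character and deletes the previously typed one). The function name and test strings are illustrative, not taken from the paper.

def can_obtain(longer: str, shorter: str) -> bool:
    # Greedy scan from the right: keep keystrokes that match, and assume a
    # backspace consumed any mismatched character plus the one typed before it.
    i, j = len(longer) - 1, len(shorter) - 1
    while i >= 0 and j >= 0:
        if longer[i] == shorter[j]:
            i -= 1
            j -= 1
        else:
            i -= 2
    # Any leftover prefix of the longer string can always be erased,
    # so success only depends on whether all of the shorter string was matched.
    return j < 0

print(can_obtain("ababa", "ba"))   # True: backspace, type b, type a, backspace, type a
print(can_obtain("aba", "aa"))     # False: deleting the middle b also deletes the first a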
Given enough examples of programs that solve a single problem, an AI system could likely infer the algorithmic structure needed to succeed. But that would not be a general solution for tackling arbitrary problems; an AI trained on one class of challenge would fail when asked to take on an unrelated one.
To make things more generalizable, the DeepMind team treated the task somewhat like a language problem. To an extent, the description of a challenge is a statement of what an algorithm should do, while the code is a statement of the same thing in a different language. So the AI in question was designed with two parts: one that takes the description and converts it into an internal representation, and a second that uses the internal representation to generate functional code.
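For illustration only, that two-part design can be sketched with an off-the-shelf encoder-decoder model from the Hugging Face transformers library. The model name (t5-small) and prompt below are placeholders chosen for the sketch, not AlphaCode, which DeepMind trained at far larger scale on code.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")   # placeholder model, not AlphaCode
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The encoder reads the problem description and builds an internal representation.
description = "Given two strings, decide whether the shorter can be typed from the longer using backspaces."
inputs = tokenizer(description, return_tensors="pt")

# The decoder generates text (ideally a program) conditioned on that representation.
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))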
The training of the system was also a two-stage process. In the first stage, the system was presented with a snapshot of material on GitHub, totaling over 700GB of code. (That may not sound like much these days, when you can fit it on a thumb drive, but remember that code is just raw text, so you get a lot of lines per gigabyte.) Note that this data also includes comments, which use natural language to describe what nearby code does and so should help with both the input and output tasks.
Once the system was trained, it went through a fine-tuning process. DeepMind staged its own programming competitions and then fed the results into the system: the problem descriptions, working code, failing code, and the test cases used to check it.
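Conceptually, each fine-tuning record pairs a problem statement with submissions and the tests that judged them. The sketch below is just one plausible way to organize such a record for illustration, not DeepMind’s actual data schema.

from dataclasses import dataclass

@dataclass
class ContestRecord:
    description: str                # natural-language problem statement
    program: str                    # a submitted solution
    passed: bool                    # whether that submission was accepted
    tests: list[tuple[str, str]]    # (input, expected output) pairs used to check it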
Similar approaches had been tried before, but DeepMind says it was simply able to devote more resources to training. “A major driver of AlphaCode’s performance came from scaling the number of model samples to orders of magnitude more than previous work,” the paper states.
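Generating that many samples is only useful if bad ones can be discarded cheaply, and the example tests that ship with each problem provide such a filter. Below is a rough sketch of a sample-and-filter loop; the helper names, the candidate count, and the demo program are assumptions for illustration, not DeepMind’s code.

import os
import subprocess
import tempfile

def passes_examples(program_src, examples):
    # Run a candidate Python program against (input, expected output) pairs;
    # any crash, mismatch, or timeout disqualifies it.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program_src)
        path = f.name
    try:
        for stdin_text, expected in examples:
            result = subprocess.run(
                ["python", path], input=stdin_text,
                capture_output=True, text=True, timeout=5,
            )
            if result.returncode != 0 or result.stdout.strip() != expected.strip():
                return False
        return True
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

if __name__ == "__main__":
    # Hypothetical usage: sample_program() would stand in for the trained model.
    # candidates = [sample_program(description) for _ in range(1000)]
    # survivors = [c for c in candidates if passes_examples(c, example_tests)]
    demo = "print(input()[::-1])"
    print(passes_examples(demo, [("abc\n", "cba")]))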