Re: Recreation and Restoration Through AI

Berhane Cole
10 min read · May 17, 2021
Left: Gerhard Richter’s ‘Stadtbild’ (1968). Right: Sigmar Polke’s ‘Don Quichotte’ (1968)

Last week, a video that made the rounds a couple of years ago caught my interest. The video is an upscaling of the Lumière brothers’ 1896 breakthrough, L’arrivée d’un train en gare de La Ciotat, done by the YouTuber Denis Shiryaev, whose channel is dedicated to similar exercises in AI-assisted film restoration. When I first came across this video years ago, I had no context for how Shiryaev transformed the Lumières’ film experiment, which was crafted to introduce the public to the efficacy of motion pictures as a simulacrum for reality, into a demonstration of AI’s potential to extend the Lumières’ film and aims into the 21st century. Even without that context, Shiryaev’s feat was impressive.

I am greatly interested in images and restoration, so I was naturally intrigued by the question of how Shiryaev reformed this late-19th-century artifact, shot on a crank-operated 16fps cinématographe, into a 60fps, 4K demonstration of artificial intelligence’s viability. Evidently, Shiryaev utilized two deep learning technologies in particular to upscale the film: DAIN AI and Topaz Labs’ Gigapixel AI. Both of these technologies are powered by artificial intelligence systems, so in order to understand them we must first understand the basics of AI.

From Artificial Intelligence to Machine Learning to Deep Learning:

Photo from Intel

The field of artificial intelligence is vast, so this will be a cursory glance at some aspects of it. An initial clarification to make is the distinction between Artificial Intelligence, Machine Learning, and Deep Learning. As the illustration above demonstrates: Deep Learning is a subset of Machine Learning, which, in turn, is a subset of Artificial Intelligence. We can think of Artificial Intelligence broadly as a classification for programs that make choices that are ‘intelligent.’ The deeper one goes within the field, the more complex those choices, and the mechanisms that assist in that choice-making process, become.

We can make programs make choices by programming methods that differentiate what the program should do according to certain situations. This is not artificial intelligence, but a program reading data and acting differently according to that data helps illustrate how artificial intelligence works. A simple way to instruct a program to make these sorts of choices is through conditional statements, e.g. if-else and switch statements in JavaScript. These are pre-programmed choices delegated by programmers. The fields of Artificial Intelligence and Machine Learning begin when a program can distinguish patterns and make choices accordingly without being explicitly programmed to do so.
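As a deliberately simple illustration, here is a hypothetical thermostat program making pre-programmed choices with an if-else statement. Every branch was decided in advance by the programmer; nothing is learned:

```javascript
// A pre-programmed "choice": the programmer has already decided
// every branch. The program reads data and acts on it differently,
// but it never learns anything new.
function thermostatAction(temperatureF) {
  if (temperatureF < 65) {
    return "heat";
  } else if (temperatureF > 75) {
    return "cool";
  } else {
    return "off";
  }
}

console.log(thermostatAction(60)); // "heat"
console.log(thermostatAction(70)); // "off"
```

However many branches we add, this stays on the non-learning side of the line drawn above: the program only ever distinguishes the situations its programmer anticipated.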

To illustrate machine reasoning, let’s take a look at predictive text and T9. We know T9 as the obsolete technology that enabled faster texting before the advent of touchscreens on cellphones; however, I find it sheds light on an elementary way to understand Machine Learning. A phone user with a state-of-the-art Nokia setup utilizing T9 has a restricted set of inputs that correspond to letters of the alphabet. The work of the T9 program is to interpret what the user is trying to type with their numeric inputs. In order to facilitate the options the T9 program will give to the user, programmers need to make some kind of dictionary or database available for the program to check the user’s word permutations against. This would be universal for all users utilizing this T9 service.

T9’s structure can be mapped as a Trie/Prefix-Tree:

From University of Washington course
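A trie like the one above can be sketched in JavaScript, keyed on keypad digits rather than letters. The tiny dictionary and the exact node shape here are my own illustrative choices, not how any real T9 implementation was written:

```javascript
// Standard phone keypad mapping from letters to digits.
const KEYPAD = {
  a: "2", b: "2", c: "2", d: "3", e: "3", f: "3",
  g: "4", h: "4", i: "4", j: "5", k: "5", l: "5",
  m: "6", n: "6", o: "6", p: "7", q: "7", r: "7", s: "7",
  t: "8", u: "8", v: "8", w: "9", x: "9", y: "9", z: "9",
};

// Build a trie whose edges are digits; words that share a digit
// sequence (like "cap", "car", "bar" → 227) end at the same node.
function buildTrie(words) {
  const root = { children: {}, words: [] };
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      const digit = KEYPAD[ch];
      node.children[digit] = node.children[digit] || { children: {}, words: [] };
      node = node.children[digit];
    }
    node.words.push(word); // words ending exactly at this digit sequence
  }
  return root;
}

// Walk the trie with the user's numeric input and return candidates.
function lookup(trie, digits) {
  let node = trie;
  for (const d of digits) {
    node = node.children[d];
    if (!node) return []; // no dictionary word matches this sequence
  }
  return node.words;
}

const trie = buildTrie(["cap", "car", "bar", "base"]);
console.log(lookup(trie, "227")); // ["cap", "car", "bar"]
console.log(lookup(trie, "22"));  // [] — no word is exactly "22"
```

Note that every user of this dictionary gets the same candidates for the same digits; the trie alone is the “universal” pre-programmed part, not the intelligent part.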

A way to think about T9 moving from pre-programmed choices to more intelligent behavior is to consider how it might internalize and intuit user behavior.

Thinking T9 (user input is always an integer between 2 and 9):

user input 2
letter possibilities [a, b, c]
full input [2]
current dictionary word possibilities [a]
current idiomatic word possibilities [...]
current subsequent word possibilities [...]
subsequent input possibilities [2, 3, 4, ... 9]

user input 2
letter possibilities [a, b, c]
full input [2, 2]
current dictionary word possibilities [undefined]
current idiomatic word possibilities [...]
current subsequent word possibilities [...]
subsequent input possibilities [2, 3, 4, ... 9]

user input 7
letter possibilities [p, q, r, s]
full input [2, 2, 7]
current dictionary word possibilities [cap, car, ...]
current idiomatic word possibilities [...]
current subsequent word possibilities [...]
subsequent input possibilities [2, 3, 4, ... 9]

Where the intelligence part comes in is how the program begins to integrate a user’s texting habits over time. If we are simply thinking about words, the program may adapt to discover and suggest the non-dictionary idioms a particular user favors. If we take this further, we can reason about how the program might suggest subsequent words to the user. This brings the program’s intelligence closer to something akin to understanding language and the nuances of meaning, which can only really be enabled with Deep Learning.
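As a toy sketch of that adaptation, the program could keep per-user counts of chosen words and rank its candidate suggestions by them. The function names and shape here are hypothetical, just to show the idea of behavior shifting from universal to personalized:

```javascript
// A hypothetical adaptation layer: rank candidate words by how often
// this particular user has chosen them before. The dictionary stays
// universal; only the ranking becomes personal.
const userCounts = {};

function recordChoice(word) {
  userCounts[word] = (userCounts[word] || 0) + 1;
}

function rankSuggestions(candidates) {
  // Most frequently chosen words first; unseen words keep their order.
  return [...candidates].sort(
    (a, b) => (userCounts[b] || 0) - (userCounts[a] || 0)
  );
}

recordChoice("car");
recordChoice("car");
recordChoice("cap");
console.log(rankSuggestions(["cap", "car", "bar"])); // ["car", "cap", "bar"]
```

Counting word frequencies is still a long way from understanding language, which is exactly the gap Deep Learning is meant to close.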

Deep Learning and Neural Networks:

As intimated above, as we burrow deeper into Artificial Intelligence, how ‘intelligent’ programs make choices, and those choices/predictions themselves, become more and more complex. Neural Networks are a mechanism that can usher in this complexity. We can understand Neural Networks as a set of algorithms that take in an input, or vector, and run tests on that input to lead it to some desired or eventual output.

They are so-called neural networks because they are modeled after the way neurons react to stimuli in an organic brain. That is to say, a Neural Network’s architecture is that of a cluster of nodes, where each node responds to its inputs and reacts in the way it is programmed, transforming the data as it moves through each layer. An excerpt from the Artificial Intelligence Wiki PathMind’s explanation of Neural Networks:

Each step for a neural network involves a guess, an error measurement and a slight update in its weights, an incremental adjustment to the coefficients, as it slowly learns to pay attention to the most important features.

Deep Learning comes into play when the number of hidden layers the input is checked against builds up past three.
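The guess/error/update loop from the quote above can be sketched with a single artificial neuron — a far cry from a deep network, but the same mechanism of incremental weight adjustment. The target function and learning rate here are arbitrary illustrative choices:

```javascript
// A single "neuron" learning y = 2x by the loop the quote describes:
// guess, measure the error, nudge the weight slightly toward better.
let weight = 0;            // the coefficient being adjusted
const learningRate = 0.1;  // how large each incremental adjustment is
const examples = [[1, 2], [2, 4], [3, 6]]; // [input, target] pairs

for (let step = 0; step < 100; step++) {
  for (const [x, target] of examples) {
    const guess = weight * x;            // 1. a guess
    const error = guess - target;        // 2. an error measurement
    weight -= learningRate * error * x;  // 3. a slight update to the weight
  }
}

console.log(weight.toFixed(2)); // ≈ 2.00
```

A real network repeats this for thousands of weights across many layers, but each weight is still being nudged by exactly this kind of error signal.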

The study of Neural Networks is vast and exceeds the scope of this beginner’s primer; however, some examples of the kinds of Machine Learning that employ Neural Networks are Deep Reinforcement Learning, Supervised Learning, and Unsupervised Learning. Additionally, there are many types of Neural Networks that are used in different fields of Artificial Intelligence, such as Convolutional Neural Networks (CNN/ConvNet), Feed-Forward Networks, Recurrent Networks, and Long Short-Term Memory Networks (LSTM). A discussion of these can be found here.

A succinct explanation of how we will regard Neural Networks also comes from PathMind:

So you can think of neural networks as feature-producers that plug modularly into other functions. For example, you could make a convolutional neural network learn image features on ImageNet with supervised training, and then you could take the activations/features learned by that neural network and feed it into a second algorithm that would learn to group images.

Serendipitously, the above quote forecasts the direction of which Neural Network we will explore. Since the upscaling of the Lumière Brothers’ film is a question of how Neural Networks work with images, we will focus our attention on Convolutional Neural Networks.

Convolutional Neural Networks:

How Kernels in a CNN interpret information

Convolutional Neural Networks are an example of a Neural Network commonly employed to process and cluster images. CNNs do not see images as a two-dimensional plane but rather as volumes: stacks of channels, however many dimensions deep, that encode features of the image, such as the RGB color values, in a way the program can ingest. These channels are made up of tensors, multidimensional matrices of information that describe features of the pixels of the input image.

an example visual representation of a multidimensional object

ConvNets work over and process an image by performing the same calculations across every position of the input image. They use a subset of the image called a kernel, scanning it over the entire image and performing calculations at each step. The kernel can be of variable size, but the size of the kernel the ConvNet employs involves trade-offs: a smaller kernel will take longer to traverse the entirety of an image but will provide far more information about the input, while a larger kernel demands fewer resources but may provide a meeker bounty of information. This trade-off can be critical, as a Neural Network’s effectiveness can be tied to the amount of information in its data store.
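A minimal sketch of a kernel scanning an image follows, using an illustrative 3×3 vertical-edge kernel on a toy grayscale “image” (a 2-D array of pixel values). Real ConvNets learn their kernel values during training rather than hard-coding them as I do here:

```javascript
// Slide a kernel over a grayscale image, producing one output value
// per position the kernel can fully cover (no padding).
function convolve2d(image, kernel) {
  const kh = kernel.length, kw = kernel[0].length;
  const oh = image.length - kh + 1, ow = image[0].length - kw + 1;
  const out = [];
  for (let y = 0; y < oh; y++) {
    const row = [];
    for (let x = 0; x < ow; x++) {
      let sum = 0;
      // Multiply the kernel against the patch of image beneath it.
      for (let ky = 0; ky < kh; ky++) {
        for (let kx = 0; kx < kw; kx++) {
          sum += image[y + ky][x + kx] * kernel[ky][kx];
        }
      }
      row.push(sum);
    }
    out.push(row);
  }
  return out;
}

// A vertical-edge detector: bright-to-dark transitions light up.
const edgeKernel = [
  [1, 0, -1],
  [1, 0, -1],
  [1, 0, -1],
];
const image = [
  [9, 9, 9, 0],
  [9, 9, 9, 0],
  [9, 9, 9, 0],
];
console.log(convolve2d(image, edgeKernel)); // [[0, 27]] — the edge fires
```

The flat region produces 0 while the bright-to-dark boundary produces a large value: this is the sense in which each kernel extracts one feature of the image.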

When training machines, techniques such as backpropagation are used to account for errors so that the machines can learn from past predictions. This is directly tied to the amount of information that a machine has access to. This is relevant when discussing ConvNets, as tasks like clustering images often fall under the classification of Unsupervised Learning — that is, Machine Learning without a clear distinction of a correct answer the machine should be striving for. It is more of a guessing game, and in a guessing game you are more likely to run into errors.

ConvNets are utilized in many aspects of technology, from computer vision as it relates to self-driving cars and drones, to medical diagnostics, to more mundane uses such as Google image search and facial recognition. Additionally, ConvNets are used to upscale degraded images, recognize imperfections such as scratches and dust on film reels, and recreate what the images may have looked like from a store of similar images. However, when addressing the technology used in the updated L’arrivée d’un train en gare de La Ciotat, ConvNets are only a part of the equation.

Deep Network Interpolation:

The technologies implemented by Shiryaev, DAIN AI and Gigapixel AI, both utilize AI to interpolate an image. Image interpolation deals with estimating features of an image that are not currently there by running tests on the image that is. Interpolation is used in many examples of digital image manipulation, such as image resizing and motion-smoothing settings in televisions.
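The simplest traditional form of this is linear interpolation, sketched here for stretching a single row of grayscale pixels to a new width. Each new pixel is estimated as a weighted average of the two real pixels it falls between:

```javascript
// Linear interpolation for resizing a 1-D row of grayscale pixels:
// estimating the pixel values that "are not currently there" from
// the ones that are.
function resizeRow(pixels, newWidth) {
  const scale = (pixels.length - 1) / (newWidth - 1);
  const out = [];
  for (let i = 0; i < newWidth; i++) {
    const pos = i * scale;                              // where this output pixel lands in the input
    const left = Math.floor(pos);
    const right = Math.min(left + 1, pixels.length - 1);
    const t = pos - left;                               // fractional distance between neighbors
    out.push(pixels[left] * (1 - t) + pixels[right] * t);
  }
  return out;
}

console.log(resizeRow([0, 100], 5)); // [0, 25, 50, 75, 100]
```

Notice that this kind of interpolation can only blend what is already there — it invents a smooth gradient, never a new detail — which is precisely the limitation the AI-based methods below aim to overcome.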

There are many types of interpolation, but as it relates to upscaling the Lumière film, there are two distinct needs: one to bring the frame rate up from 16fps to 60fps, nearly a 400% increase, and another to raise the base image quality to 4K. For Shiryaev, DAIN AI handles the frame rate while Gigapixel AI handles upscaling the image quality. While we cannot know exactly what algorithms these programs use, in a statement Topaz Labs, the team behind Gigapixel AI, sheds some light on their technology and how it differs from traditional interpolation:

Traditional up-scale methods use “interpolation” (bi-cubic, Lanczos, fractal, etc.) to create higher resolution images, but exhibit limitations such as loss of detail and sharpness, which causes very pixelated and ‘blurry’ upsampled images. Gigapixel AI, however analyzes the image and recognizes details and structures and ‘completes’ the image with AI Models that we have trained in our lab. Our AI Models are trained with thousands of images with different resolutions to learn how to distinguish poorly upsampled images from high quality upsampled images. During this training period, our models not only learn to distinguish quality but also learn to recognize certain structures within the image. This information is committed to ‘memory’ and used later as a reference to complete and achieve high quality upsampled images.
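To make that contrast concrete on the frame-rate side, here is the crudest possible frame interpolation: synthesizing an in-between frame by blending its two neighbors. DAIN’s actual depth-aware flow estimation is far more sophisticated; this only shows the shape of the problem — inventing frames that were never captured:

```javascript
// Synthesize an in-between frame as a weighted blend of two neighbors.
// Frames are flat arrays of pixel values; t is the in-between time
// (0 = frameA, 1 = frameB, 0.5 = halfway).
function blendFrames(frameA, frameB, t) {
  return frameA.map((pixel, i) => pixel * (1 - t) + frameB[i] * t);
}

const frame1 = [0, 0, 255];
const frame2 = [255, 0, 0];
console.log(blendFrames(frame1, frame2, 0.5)); // [127.5, 0, 127.5]
```

Simple blending produces the ghosting and blur the Topaz statement describes; AI-based interpolation instead tries to recognize the moving structures in the scene and place them where they would plausibly be.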

Towards a Newer La Ciotat:

Every topic touched upon in this primer is a field unto itself. I have not explored the nuances of the algorithms that power, and in many ways are the heart of, Neural Networks; they deserve a thorough examination in order to understand how any of this mechanically works.

With this limitation granted, anyone can see that there is something miraculous about the upscaling of L’arrivée d’un train en gare de La Ciotat. When I first encountered the video, I had little understanding of what made it so miraculous beyond bringing these long-dead commuters to life and somehow making history contemporary. Its use of AI was a mystery to me, and how AI could be used to ‘complete’ this film even more opaque. My foray into the world of data structures and algorithms over the past few weeks has afforded me the comprehension to begin to fathom how these technologies work.

There are many apocryphal stories about the original audience’s response to seeing this film in the late 19th century. Those famous stories of patrons running to the back of the theater in fear of the incoming train seem hyperbolic. It is more likely that the initial reception of the film hewed closer to the passing interest and curiosity of those who watched its 4K AI-assisted descendant. This video is just one example of many experiments in using AI to upscale old films; however, it still underlines immense possibilities in AI, similar to what the Lumières exhibited for film.

With the 4K video, more of the film is predicted than was there to begin with, yet it seems more ‘real’ and ‘complete.’ There is something frightening in that, as there is in many of AI’s implications. However, the technology and its possibilities remain exhilarating all the same.
