IT 06: AI & Art, Idea-Set-Match

Cyan Man, White Parrot, M. © (2018)

A major recent development in AI-research was automated image captioning. Machine learning algorithms could already label objects in images, and then they learned to put those labels into natural language descriptions.

Researchers: We can do image to text. Why not try doing text to images?

It was a more difficult task. They didn’t want to retrieve existing images the way a search engine does. They wanted to generate novel scenes that don’t already exist in the world.

Cyan Man, White Parrot [Enter]

It was a 32 x 32-pixel tile. If you looked closely, you would find a purple blob and a black snaking line. They tried some other prompts, like “Elephant, Fish, Monster”...maybe something to hang on your wall. The application hinted at what might become possible in the future.

You may be thinking, AI-generated images aren’t new. You may have heard about a 50% generated portrait of The Madonna. It was created by M., who explained to me that this type of art required him to collect a random dataset of landscapes, run them through an algorithm, and manually superimpose those abstractions onto The Madonna.

The Madonna, M. ©  (2020)

“If I want to create landscapes, I view a lot of landscapes - if I want to create portraits, I train on portraits.” - M.

But then a portrait-AI would not really be able to create landscapes out of portraits, or vice versa. The same goes for those hyper-realistic fake faces used by scammers: they come from an AI that generates pictures of artificial faces, combining characteristics from a set of millions of photos of real faces.

Generating a novel scene from any combination of text input requires a different approach: huge AI models trained on every image they are fed. What this means is that we can now create images without having to execute them in paint or with a camera, a pencil, tools, or code. The input is just a simple line of text. An AI company said they could create images from text captions, but they haven’t released a finished version to the public yet. So, over the past year, a community of independent, open-source developers built text-to-image generators out of other pre-trained models that they did have access to.

“I've been up until the morning, just really trying to change things, piece things together. I've done about 7,000 images. It’s ridiculous. It’s not dancing, and it could be better.”

The craft of communicating with these deep learning models has been dubbed “prompt engineering”. If you can find the right words, you can refine the way you talk to the machine.
 
Diver smoking red pipe, M. © (2018)

05:00

It becomes a kind of dialogue.
You can say “Diver smoking red pipe”.

Cut! Lino cut or wood cut...?

05:15

Coming up with funny pairings, like “Black Snake, Purple Dog.”

Some of the most striking images can come from prompting the model to synthesize a long list of concepts:

[The particles get stuck in the place they like to be, with their friends, there's a force of attraction, packed together. Not rolling over each other, in a nice pattern, oranges in a bowl. Juggling in place, but not having enough motion to get loose on their own and break the structure down, Ice Cream.]

It's like having a very strange collaborator to bounce ideas off of and get unpredictable ideas back.

Ice Cream, M. © (2021)

For an image generator to be able to respond to so many different prompts, it needs a massive, diverse training dataset: hundreds of millions of images scraped from the internet, along with their text descriptions. Those captions come from things like the alt text that website owners upload with their images, for accessibility and for search engines. That’s how the engineers get these giant datasets.
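
As a rough illustration of where such image-caption pairs come from, here is a minimal Python sketch that pulls image URLs and their alt text out of a page's HTML using only the standard library. The sample tag and file names are invented, and a real scraping pipeline is of course vastly larger and messier:

    from html.parser import HTMLParser

    class ImageCaptionCollector(HTMLParser):
        """Collect (image URL, alt text) pairs from one HTML document."""
        def __init__(self):
            super().__init__()
            self.pairs = []

        def handle_starttag(self, tag, attrs):
            if tag != "img":
                return
            attributes = dict(attrs)
            src = attributes.get("src")
            alt = attributes.get("alt")
            if src and alt:  # keep only images that ship with a caption
                self.pairs.append((src, alt.strip()))

    # Hypothetical usage with one hand-written tag; real pages are fetched and fed in bulk.
    html = '<img src="steamboat.jpg" alt="Yellow steamboat on a river at dusk">'
    collector = ImageCaptionCollector()
    collector.feed(html)
    print(collector.pairs)  # [('steamboat.jpg', 'Yellow steamboat on a river at dusk')]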




A chair towards Miró, in Latent Space


What do the AI-Models do with them?

We might assume that when we give them a text prompt, like “Yellow Steamboat, in the style of superabstraction”, they search through the training data to find related images and then copy over some of those pixels, but that’s not what’s happening.

The newly generated image doesn’t come from the training data; it comes from the “latent space” of the deep learning model. If I gave you 2 images and told you to match them to 2 captions, you’d have no problem. But to a machine, images are just 1s and 0s, pixel values for red, green, and blue.
You’d guess, and that’s what the computer does too at first.

Yellow Steamboat, M. © (2018)
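
To make the “just numbers” point concrete: here are two made-up grids of (R, G, B) pixel values and two captions, matched by nothing smarter than a blind guess, which is where an untrained model starts too:

    import random

    # To a machine, an image is just a grid of numbers: one (R, G, B) triple per pixel.
    # These tiny 2 x 2 "images" use invented values, purely for illustration.
    images = {
        "image_1": [(250, 220, 30), (245, 215, 25), (240, 210, 20), (235, 205, 35)],
        "image_2": [(40, 90, 60), (35, 85, 55), (45, 95, 65), (50, 100, 70)],
    }
    captions = ["Yellow Steamboat", "Black Tree"]

    # With no training there is nothing to go on, so the only strategy is a blind guess.
    assignment = dict(zip(images, random.sample(captions, len(captions))))
    print(assignment)  # e.g. {'image_1': 'Black Tree', 'image_2': 'Yellow Steamboat'}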

You could go through thousands of rounds of this and never figure out how to get better at it. Which grid of 0s and 1s [Image] is related to this other grid of 0s and 1s [Caption]...?

A computer, though, can eventually figure out a method that works - that’s what deep learning does. To understand that this arrangement of pixels is the Black Tree, and this arrangement of pixels is a Turtle, it looks for metrics that help separate these images in mathematical space.



Arranged in 1-Dimensional Space, Left [No Color], Right [Color]. Black Tree (Left), Turtle (Right), M. © (2018)

How about color? If we measure the amount of yellow in each image, that will put the tree to the left and the turtle to the right in one-dimensional space. But a yellowish turtle or a pale tree would land in the wrong spot, so our yellowness metric isn’t very good at separating turtles from trees on its own. We need a different variable. Let’s add a dimension for roundness.

Now we’ve got a 2D space with the round tree up and the turtle down. But if we look at more data, we may come across a turtle that’s round and a tree that isn’t. Maybe there’s some way to measure shininess. Turtles are usually shinier than trees, so now we’ve created a 3D space with three variables. Ideally, when we get a new image, we can measure those three variables and see whether it falls in the turtle region or in the tree region.
Dancing Tree (Woogie), M. © (2018)
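
That thought experiment can be written out directly. In the sketch below (every feature value is invented), each example image becomes a point in a (yellowness, roundness, shininess) space, and a new image is assigned to whichever labeled region it lands closest to; a real model does the same kind of thing, just with hundreds of learned dimensions instead of three hand-picked ones:

    import math

    # Each example image becomes a point in a 3D space: (yellowness, roundness, shininess).
    # All of the numbers here are invented for illustration.
    examples = {
        "tree":   [(0.8, 0.7, 0.1), (0.6, 0.9, 0.2), (0.7, 0.4, 0.1)],
        "turtle": [(0.3, 0.8, 0.7), (0.4, 0.6, 0.9), (0.2, 0.9, 0.8)],
    }

    def centroid(points):
        """The average point of one labeled region."""
        return tuple(sum(coord) / len(points) for coord in zip(*points))

    def classify(point):
        """Assign a new image to whichever region's centre it lies closest to."""
        return min(examples, key=lambda label: math.dist(point, centroid(examples[label])))

    print(classify((0.25, 0.70, 0.85)))  # falls in the turtle region -> 'turtle'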

If we want our model to recognize not just trees and turtles but all these other things, yellowness, roundness, and shininess won’t capture what’s distinct about each object. That’s what deep learning algorithms do as they go through all the training data: they find more and more variables, creating more and more dimensions that improve their performance on the task, and in the process they build a mathematical space with over 500 dimensions.



5th Dimension II (clustering regions in a multidimensional space), M. © (2020)

We as humans are consciously incapable of picturing this multidimensional space, but these AI-models use it. This is called latent space. Those 500 dimensions, or axes, represent variables that humans wouldn’t even recognize or have names for, but the result is that the space has meaningful clusters: a region that captures the essence of turtleness.

A region that represents the textures and colors of photos from the 1910s. An area for chess and an area for players, and chess-players somewhere in between. Any point in this space can be thought of as the recipe for a possible image. The text prompt is what navigates us to that location. But then there’s one more step. Translating a point in that mathematical space into an actual image involves a generative process called diffusion. It starts with just noise and then, over a series of iterations, arranges pixels into a composition that makes sense to humans.


Three stages of an Elephant-Fish-Monster, M. © (2018)
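
Here is a heavily simplified sketch of that idea in Python: start from pure noise and, over a series of iterations, nudge every pixel toward a composition while the leftover randomness fades. In a real diffusion model, a trained neural network decides what to steer toward at each step, conditioned on the prompt's location in latent space; the toy_target gradient below is only a stand-in for that.

    import random

    SIZE = 8      # a toy 8 x 8 grayscale "image", values between 0 and 1
    STEPS = 50    # number of denoising iterations

    def toy_target(x, y):
        """Stand-in for what a trained model steers toward: a corner-to-corner gradient."""
        return (x + y) / (2 * (SIZE - 1))

    # Start from pure noise.
    image = [[random.random() for x in range(SIZE)] for y in range(SIZE)]

    # Over a series of iterations, nudge every pixel toward the target composition
    # while the amount of leftover randomness shrinks step by step.
    for step in range(STEPS):
        noise_level = 1.0 - (step + 1) / STEPS
        for y in range(SIZE):
            for x in range(SIZE):
                image[y][x] += 0.2 * (toy_target(x, y) - image[y][x])     # denoising nudge
                image[y][x] += random.uniform(-0.05, 0.05) * noise_level  # fading randomness

    print(round(image[0][0], 2), round(image[-1][-1], 2))  # dark corner vs. bright corner

Because the starting noise differs on every run, two runs of this loop never land on exactly the same grid.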

Because of some randomness in the process, it will never return the same image for the same prompt; every trial will generate another slightly different image. And if you enter the prompt into a different model, designed by different people and trained on different data, you’ll get a different result, because you’re in a different latent space.

The ability of deep learning to extract patterns from data means that you can copy an artist’s style, but not the art, just by putting their name in the prompt.

“What the heck? The brush strokes, the color palette. That’s fascinating. I wish I could like... I mean, he’s dead, but I would go up to him and be like, ‘Look what I have!’ Oh, that’s cool. Probably the only Miró that I could afford anyways.”
 
A chair towards Miró, M. © (2019)




Prompting Mirrors of the Self


The latent space of these models contains some dark corners that get scarier as outputs become photorealistic. It also holds an untold number of associations that we wouldn’t teach our children but that it learned from the internet. If you ask for an image of a boss, it gives you a bald white guy. If you ask for images of nurses, they’re all women. We don’t know exactly what’s in the datasets used by these AI-companies, but we know the internet is biased toward the English language and Western concepts, with whole cultures not represented at all.

Mirrors of the Self, M. © (2020)

In one open-source dataset, the word “Asian” is represented first and foremost by an avalanche of porn. It really is a mirror held up to our society: what we deemed worthy enough to share on the internet in the first place, and how we think about what we share.

We are on a voyage here; this is a bigger deal than just the immediate technical consequences. It’s a change in the way humans imagine, communicate, and work with their own culture.


Pyramid scheme {bird~man~eye~sun~chair} M. © (2019)





Part Deux: Running up the Tower of Babel


Babel's Image Archive, Image #8342365033547796


The Babel Image Archive runs an algorithm that creates a randomized landscape of pixels in a 640 by 416-pixel frame using 4096 different colors. The archive contains 4096^266240 unique transfigurations. You can upload any image you have on your computer and get a slightly pixelated version of it back, with a string of numbers that corresponds to its location in the archive. It contains a pixelated picture of the day you were born; it contains an image of every piece of art that currently exists, every piece of art that has ever been created, and every piece of art that could ever be made. It contains every frame of a theoretical low-resolution movie of the universe from the beginning of time to the end, from every conceivable perspective. In fact, it contains every image that could ever exist. You can upload an image and find its location, or, if you have 10^961748 years on hand, you can click on the Universal Slideshow and simply wait for your image to appear.
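
Those figures follow directly from the frame size and the palette stated above; a couple of lines of Python are enough to check the order of magnitude:

    import math

    width, height, colors = 640, 416, 4096
    pixels = width * height               # 266,240 pixels per frame
    digits = pixels * math.log10(colors)  # log10 of the number of possible frames

    print(pixels)          # 266240
    print(round(digits))   # 961755, i.e. roughly 10^961755 possible images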

This web page is a section of another, more famous website called the Library of Babel, which is based on the short story by Borges, an Argentine writer who often grappled with the idea of infinity.

In the story, the Library of Babel is a seemingly infinite construction of hexagonal galleries containing shelves upon shelves of books filled with every possible combination of characters that could fit in 410 pages. It contains everything that could ever be written: Homer, the complete history of the world, a description of your birth as it will occur, as well as many false descriptions of your death. Since there is no filter for meaning, as you can imagine, the library is overwhelmingly filled with noise.


The Works, M. © (2021)



Mandy, Mandy, Man, M…


If you have a set of computer programs, or just one computer program, randomly hitting keys on a typewriter for an infinite amount of time, it will almost surely type any given text. It will also almost surely type every piece of text that could ever be written, such as a finished version of Charles Dickens’s “The Mystery of Edwin Drood”.

The problem is that even with enough computers to fill the entire observable universe, typing away for a period hundreds of thousands of orders of magnitude longer than the age of the universe, the probability of successfully typing the six remaining instalments of Edwin Drood is so low it might as well be zero. But technically it isn’t. Despite this, some have attempted experiments with heavy simplifying constraints. Someone named AJ got a bunch of computer programs to spit out random sequences of text reduced to 9 characters and matched them against all the works of Shakespeare; piece by piece, it eventually reproduced the complete text.
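
A toy version of that kind of experiment fits in a few lines of Python. To keep it finishable, the sketch below matches 3-character chunks of one short sentence instead of 9-character sequences against the complete works, and the fixed-chunk rule is an assumption rather than AJ's actual method; the point is only that matching random output against a fixed target in small pieces is what makes the odds workable at all:

    import random
    import string

    target = "to be or not to be that is the question"  # toy stand-in for the complete works
    alphabet = string.ascii_lowercase + " "
    chunk_len = 3                                        # the real experiment used 9-character sequences

    # The pieces of the target that still need to show up in the random output.
    remaining = {target[i:i + chunk_len] for i in range(0, len(target), chunk_len)}

    attempts = 0
    while remaining:
        attempts += 1
        typed = "".join(random.choice(alphabet) for _ in range(chunk_len))
        remaining.discard(typed)  # a chunk counts as "typed" once it appears by chance

    print(f"reproduced the whole sentence after {attempts:,} random attempts")

With 9-character sequences and millions of target characters, the same loop needs astronomically more attempts, which is presumably why it took a whole bunch of programs rather than one.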

Real monkeys pose a different challenge. The simple shape of the keyboard would cause an uneven distribution of which keys are hit, and the monkeys would probably hit a few keys repeatedly, get bored, and then piss on the machine, which is exactly what happened when monkeys at a zoo were given a computer by researchers at Plymouth University.

The canvas that, if you are impossibly lucky, could reveal to you the most meaningful images of your life. It could reveal the most powerful work of art you have ever seen. It could reveal the blueprints for a spaceship that travels faster than godspeed. You could find images of Mandy looking at Mandy looking at every image of Man in the image archives. But you will never find anything there; you will just find endless pictures of noise.

Mandy, M. © (2021)
 
The Library of Babel contains 10^4677 books, and there are around 10^80 atoms in the universe, so you could fit the universe inside the Library of Babel an astronomically large number of times. The Babel Image Archive contains 10^961755 images, so you could fit the Library inside the Image Archive many times over. If you somehow unzipped these libraries, it seems like they would simply crash the universe. Neither the Image Archive of Babel nor the Library of Babel can fit inside this universe.
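
Using just the exponents quoted above, those comparisons come down to subtracting powers of ten:

    # The exponents of 10 quoted above.
    atoms_in_universe = 80
    library_books = 4677
    archive_images = 961755

    print(f"books per atom in the universe: about 10^{library_books - atoms_in_universe}")     # 10^4597
    print(f"libraries that fit in the image archive: about 10^{archive_images - library_books}")  # 10^957078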

You would be extremely lucky to find even a single coherent sentence in the Library, or an image with a concentration of pixels that even approaches looking deliberately made. We could ask a million people to watch their computer screens running the Universal Slideshow and still not find something interesting, even just a small group of pixels of the same color, something that isn’t pure random noise.

A library full of every arrangement of bits of sound would produce every piece of audio that could be made...or, even simpler, arrange every possible note to produce every possible song, every possible melody. That’s exactly what DR-NR did when they created a computer program to generate all the 8-note melodies of the C scale, the area where most popular music resides, every melody that has ever existed and could ever exist, which comes out to just 68 billion.
Sheet Music, M. © (2021)
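
The counting behind a figure like that is simple exponentiation: the number of melodies is the number of available pitches raised to the melody length. The quoted 68 billion cannot come from 8-note melodies drawn from only eight pitches (that would be 8^8, about 16.8 million), so the project's melody space must have been larger; one parameter choice that does land on the figure, used below purely as an assumption, is 8 available pitches across 12 positions:

    from itertools import product

    pitches = ["C", "D", "E", "F", "G", "A", "B", "C'"]  # one octave of the C scale (assumed)
    melody_length = 12                                   # assumed; 8 ** 12 lands on the quoted figure

    print(len(pitches) ** melody_length)  # 68,719,476,736 -> "just 68 billion"

    # Generating them all is a plain Cartesian product; here are the first three.
    for i, melody in enumerate(product(pitches, repeat=melody_length)):
        print(" ".join(melody))
        if i == 2:
            break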

The project was the result of trying to prove a point, to provide a counterargument in response to a series of court cases on supposed “copyright violations” concerning melodies that one artist “plagiarized” from another. The claim was that one artist could “subconsciously plagiarize” the melody of a previous artist without consciously remembering and stealing that work. Anyone who knows a thing or two about music knows this is a dangerous precedent to set, because the number of 8-note melodies that can be made is finite, in fact much smaller than the still finite but far larger Library of Babel.

These three libraries combined house basically every piece of creative work a human being could make, which naturally raises questions about the nature of originality. If you ever worry that a story or movie you are planning out isn’t original enough, you’re right: it literally already exists...somewhere.



1st Matter


“Combine the symbols for 'water' (squiggly lines) and 'vessel' (bowl shape).”


(无为) (Water), M. © (2021)

In the short story by Jorge Luis Borges, a group of “Purifiers” goes around the Library and condemns entire walls of books, throwing them down the infinite shaft, getting rid of everything they deem worthless.

But how does one construct a machine to look for meaning? What if there is a text that uses very few real words but is nonetheless extremely moving somehow?
How about all the words that haven’t been invented yet? How do you find the truth in a sea of noise?

What about abstract artworks?

You may have seen some of the many oddly mesmerizing and musically beeping videos that try to visualize the different ways a computer can sort elements into an ordered list. Algorithms like Heap Sort, Quick Sort, and Bubble Sort all have the same goal: to take a bunch of random elements and order them. And then there’s Bogo Sort, easily the most popular sorting algorithm for these videos, not because it’s any good, quite the opposite: because it is the worst, most useless sorting algorithm. It’s something of a joke.

While Bubble Sort is considered the generic bad algorithm because it is very inefficient, Bogo Sort is about as inefficient as an algorithm could possibly get. Bogo Sort takes whatever elements you give it, reshuffles them randomly, checks them, and if they aren’t ordered, reshuffles them again, over and over, until they happen to come out ordered. It’s kind of like someone in the Library of Babel going to a random shelf, picking up a book, and expecting to find Homer.
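
Bogo Sort is short enough to write out in full. This is a direct Python version of the shuffle-and-check loop just described, applied to a small shelf of book titles borrowed from the quote below:

    import random

    def is_sorted(items):
        """True if every element is <= the one that follows it."""
        return all(a <= b for a, b in zip(items, items[1:]))

    def bogo_sort(items):
        """Reshuffle at random until the list happens to come out ordered."""
        shuffles = 0
        while not is_sorted(items):
            random.shuffle(items)
            shuffles += 1
        return items, shuffles

    shelf = ["Homer", "About Cooking", "Thermodynamics", "Van Doesburg", "Torres-García"]
    ordered, shuffles = bogo_sort(shelf)
    print(ordered, f"after {shuffles} shuffles")

    # With 5 titles there are 5! = 120 possible orders, so this finishes quickly;
    # with a whole library, the expected wait dwarfs the age of the universe.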

“I went to level {-2} and placed the book of Van Doesburg next to the book About Cooking and another book about Thermodynamics.

What the hell, I just placed Torres-García next to it.” - The Library


Taken from the Library: About Cooking, Theo van Doesburg, Thermodynamics, Joaquín Torres-García, #, M. © (2019)

The interesting thing, as many have pointed out, is that Bogo Sort is also the fastest sorting algorithm of them all…if you are astronomically lucky. It could shuffle everything perfectly on its first try. Its time to complete a sort ranges from zero to infinity. Quantum Bogo Sort takes the guarantee even further: it generates all possible permutations across every universe and simply destroys every universe except the one in which the list is sorted. Bogo Sort has a certain allure; people treat it as if it were a person. Some users have uploaded ludicrously long videos, and commenters point out timestamps where Bogo gets so close but ultimately fails. There’s this psychological itch that remains with all these impossible odds.

Maybe Bogo will work instantly this one time, maybe the libraries will reveal something to me because I’m special in some way, and...well, the odds aren’t technically zero. It’s the same mentality that makes the lottery work.

“I've been up until the morning, just really trying to change things, piece things together. I've done about 7,000 images. It’s ridiculous. It’s not dancing, and it could be better.” - About AI image-generating applications, from Part UNO

The great thing about art is that I rarely feel like someone is simply filling in some slot in some universal list, as if we were just computers generating permutation after permutation of every work of art that could be made until we happen upon the “Ultimate Artwork.”

The reality is that the Image Archive of Babel, the Library of Babel, and the Audio Library of Babel all contain everything that could ever be written, imagined, or heard, but you will never find anything useful there. You simply can’t. They’re about as useful for finding something of meaning as saying “the meaning of life exists somewhere”. The only way you find meaningful art in the Library is by compiling it yourself.


Comment: “AI image-text-prompt generators are just a tool after all, like a pencil, a camera. Don't forget that in the 19th century the most brilliant minds said the camera was going to end art. Well, it didn't. When I see those images all over the place, first you see the picture: that is a boring dog, a replica of an old bird, or a store advertisement.
The true question is, is it good art or is it bad art? A surrealistic parrot as a variation of pictures of old parrots combined with Dalí, Gold Earrings, this is a copy of Vermeer, this is a pretty crappy illustration. It has no imagination or creativity.


If they could just program something like the end of Edwin Drood, written by Marcel Proust, and have it come out looking like typical steampunk...”


Imaginational Theory 06: AI & Art, Idea-Set-Match Compiled by M. Moonen 08.2022 [EDUCATIONAL PURPOSE ONLY] Triple-A Society, M. Production