Internal Structures 2

I’m afraid that I didn’t say what I meant to say last week.

What I meant to write was an essay about the fact that neural net programs create pictures in a fundamentally different way from humans. While a human creates an image by first determining the overall shape of the subjects and then filling in the details, these programs build an image by calculating what color each pixel should be, given the pixels around it, essentially creating the details first. This leaves the programs very good at imitating photos, but rather poor at simulating highly stylized art, such as the visual novel character designs that EndlessVN hopes to emulate.

To get around this, I proposed a program that would use a hard-coded skeleton to define the overall shape of the human body, and then draw features over that skeleton. More fully, I was envisioning a program that could use this skeleton to draw the same character in several different art styles, with certain features held constant, and possibly defined by the user. Hence, the same process that could draw a character in the style of a visual novel could also draw that character as something from an American comic book, or a Victorian painting.

I also proposed breaking up designing characters and posing them into two separate processes, both of them using the skeleton to keep things consistent between the two. In the same way a character can be made recognizable across art styles, a similar process could be used to keep different images in the same style consistent with one another. I suppose what I’m proposing here is something like the animation industry’s style sheets, and I think that anything like what I’m talking about could be used for creating animations as well as drawings, as long as the processing power is available.

I then attempted to extend the idea of internal structures, such as the skeleton used for drawing, to GPT. The main thing I was thinking of was hard-coding the concept of characters into the software. Keeping characters straight is the great pain of working with GPT, so much so that Novel AI started adding descriptions of characters to the training data, so that users could use the same formatting the team used to get the program to keep the details straight. There was also a discussion on NAI’s reddit about creating a beneath-the-surface record of who said what, rather than having everything have to be right there in the text.

Regardless, I don’t actually care much about any of this. What I actually want right now is a website where I can upload something I drew and have it redrawn into something good.

Internal Structures

When the page loads, you immediately see a drawing of a girl. She has silver hair, and is wearing something that calls to mind a Japanese serafuku. Or, rather, your brain starts to interpret it as a schoolgirl uniform, until you catch up to your eyes and realize that the flesh-colored blotches on her chest cannot be hands. In fact, she doesn’t have hands; her arms simply fuse into each other, leaving you wondering why a service that bills itself as ‘AI’ would use a picture that makes it so very obvious that their program has no understanding of human anatomy.

This is the experience of opening the webpage for EndlessVN, a service that seeks to do for visual novels what Novel AI does for literature. Interestingly, while most services would stick to one neural net program, EVN seeks to recreate the experience of a visual novel by stapling several programs together, having different programs handle the text, the pictures, and the music. I’ve tried out the free version, in case you’re wondering, but it was too slow for me to do anything. Before I talk about EndlessVN itself, though, I want to talk about picture generators.

Picture generators are much like text predictors, in that both allow the user to probabilistically generate a kind of data. But while text predictors spit out words, picture generators fill pixels with colors, using the colors of the pixels around each one to determine its RGB value.
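As a toy illustration of that local, details-first logic (a sketch only; real generators use trained neural nets operating over many scales, not simple averaging):

    import numpy as np

    def fill_pixel(image, y, x):
        """Predict one missing pixel's RGB from the known pixels around it."""
        # image is an (H, W, 3) float array, with NaN marking unfilled pixels.
        patch = image[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
        known = patch[~np.isnan(patch).any(axis=-1)]
        return known.mean(axis=0)  # average the known neighbors' RGB values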

A quirk of this method is that, while a human is good at creating an impression of a person using nothing but lines and space, and would be hard pressed to create a photo-realistic face of someone that doesn’t exist, the program is the opposite. The program relies on the fact that a photo will have patterns of texture on a human’s skin, hair, and clothing to tell where a hard edge should be, like the line where a face ends and the wall behind it begins. This isn’t possible when it’s imitating drawings, which are dominated by solid blocks of color, whether those colors are supposed to represent the foreground or the background.

But even putting aside current image generators’ difficulties replicating the anime style, I don’t think EndlessVN needs a block of pixels of a given size for all of its images. That works fine for backgrounds, but for the characters, I feel that you need a completely different paradigm.

The fundamental problem with getting a program to draw a character is that you want individual drawings to be consistent. If a character is blonde and has green eyes, you want every picture of them to be blonde and have green eyes. If a character is wearing clothing for a particular scene, you want them to wear the same clothing for the entire scene. And if you want a character to have a cowlick coming off of the back of their head, you want every picture of them to have a cowlick coming off the back of their head.

In other words, there’s a difference between creating a design for a character, and making drawings of that character in various poses. I suspect that you would need different kinds of programs for each. The first would be concerned with generating variations around a set of attributes, such that not every blue-eyed beauty with freckles looks like every other blue-eyed beauty with freckles, and the second would focus on moving the body parts around, and giving the drawings some facade of emotion. I think that both of these would need some understanding of the internal structure of the human body, even if the end result looked like drawings, and I rather suspect that such an internal structure would need to be hard-coded.
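Here is a sketch of the split I have in mind; the field names and the skeleton representation are illustrative, not any existing program’s API:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class CharacterDesign:
        # Fixed attributes: the design program generates variations around these.
        hair_color: str
        eye_color: str
        has_cowlick: bool

    @dataclass
    class Pose:
        # Joint angles over the hard-coded skeleton; the posing program moves
        # these around without ever touching the design.
        joint_angles: dict[str, float]

    def draw(design: CharacterDesign, pose: Pose, style: str) -> None:
        """Render the same design, in the same pose, in any art style."""
        ...  # the rendering itself is the part nobody has solved yet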

But hard-coding internal structures into neural nets isn’t an idea that’s limited to pictures. As it stands, GPT doesn’t really know when someone is talking, just when words are between quotation marks. If it were possible to hard-code some idea of what a character is into it, and allow it to create personalities in the same way as our first program above, it would bring us that much closer to the dream of a program that can simulate an entire, arbitrary world.

Some GPT services have already started to put character profiles right in the training data, so that when the user describes someone in the form of [name: / appearance: / personality:…], the program has something to latch onto. Even so, text predictors have difficulty keeping characters straight. And as with understanding the word ‘not’, I suspect that this is for mechanical reasons that no amount of training data can actually overcome.
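Assuming the bracketed format described above (the exact field names each service recognizes are an assumption on my part), building such a profile is simple enough:

    def character_profile(name: str, appearance: str, personality: str) -> str:
        # The bracketed style described above; brackets mark it as metadata.
        return f"[ name: {name}; appearance: {appearance}; personality: {personality} ]"

    profile = character_profile(
        "Rachel", "brown hair, business attire", "professional, reserved")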

The Consistories and the Episcopate

This is something I wrote in Holo AI. I selected the metadata (tags, expected length, year) and typed in the memory and author’s note, then I let the program run and selected whichever output I thought was most interesting. I’m posting this because it is appropriate to show you something I wrote using a text predictor, and because it is a good example of what you can expect from such a program, including how liable it is to go off in an unexpected direction.

I am also, of course, publishing the tags, memory, and author’s note along with the main writing. I do this so that you can have some idea about what effect each parameter has on the output. In particular, I refer to how the fantasy tag seems to have been completely drowned out by the combination of the religion tag and the writing year being set to 1957.


Genre tags: religion 0.85; fantasy 0.50; political-philosophy 0.64

Document length: 10,000 tokens

Publication year: 1957

Memory:

There is a great conflict within the Apostolic Church, between those who believe that the responsibility of guiding the faithful should rest with the bishops, who can each trace their line of ordination back to the original Twelve, and those who believe that that responsibility should lie with the consistories, the councils whose members are chosen from among the congregations.

Author’s note:

The author will describe the conflict between the episcopals and the consistorials, while expressing sympathy for both sides.

Output:

But I must turn my attention to the danger which is posed by our lack of clarity.

The church’s present membership, if we allow for the clergy, is almost one hundred thousand; but many people doubt that we are as large.

Yet this does not have to be so. Our growth may be greater than any statistics show. In order to understand why, we must take a brief look at the state of our members.

Let us say first that our membership is declining. In ten years, there were seven million members, but by five years ago, that number had fallen to six and a half million. Why? This decline may be explained as follows:

First, many of the new members were attracted by the search for greater peace in society. They came to the church with this idealistic hope, but once they found themselves there, the idea was not so appealing. There are many persons who find the Church’s teaching too abstract, and do not like the fact that it is not practical. So they leave, often without regret; for they feel that their efforts at social improvement have been wasted. It is not difficult to understand why.

The world of today has few problems which cannot be solved by human effort; and when a person comes to the church because he wishes to have a share in solving these problems, he may be disappointed if he does not find the answers here. Many churches were founded on such an idea. We do not need to look far back in history to see that this is a common source of dissatisfaction among the faithful.

Another cause of defections is a kind of intellectual laziness which we often encounter in our members. Here again, one can go back in time to find its origins. The spirit of our fathers was a simple, God-loving, peaceful people. They accepted the church as something higher than themselves; but they also felt that the church must teach them how to live better, and give them the tools to accomplish that task. This is why they used to say, “I believe but I must know.”

It is understandable that people who have had the faith of the saints should feel this way. When a saint dies, he does not leave behind a complicated, longed-for, earthly inheritance. He leaves his knowledge in the hands of others, so that it may be passed on to those who come after him. And it is these successors who use this knowledge for the good of mankind. Therefore, a man does not lose anything by believing; for he still has the benefit of the heritage of the saints, even if the priests are unable to explain it to him.

But there is another reason for the decline of our membership. It is not only that we are losing the best of our members, but that we are also gaining a number of undesirable elements. Our attitude toward the world of today is a sad example of this. The main aim of the church should be to maintain the purity and the peace of Christ. But the world has become full of strife. Instead of the ideal of harmony, we have the chaos of war and struggle. We, the children of the Church, are not responsible for this state of affairs. But, alas, we cannot keep ourselves aloof from it, either. Our members are surrounded by the world.

One does not have to travel far to see the signs of this. There are the people who think that peace and harmony are possible, and so they try to bring these things about. Yet, there is no end to the difficulties of achieving it. What would it take to change the world? They do not know. And yet they talk about peace and harmony all the time. Then there are others who fight and quarrel. They do not agree on what is right or wrong; they do not even know what is true and false. But they attack each other with vicious words, and make their situation worse. In our own country, we find that the world of today has been transformed into a battlefield of words. There are those who advocate freedom, and there are those who do not want to hear of it. There are some who wish to stay at home, and those who would like to go out and meet the world. Thus, there are those who preach a theory of life which has already gone beyond the church, and there are those who simply want to continue as they are.

How can a person whose mind is bent in one direction seek guidance in the church? How can a person whose eyes are blinded by something outside the church find an answer within its walls?

This is not the first such problem in the church. There have been many times in the past when the faithful have asked themselves questions like these. But in the early days of the Church, these problems were solved, because the original Twelve and the Apostles were the real guides. They were chosen by the Holy Spirit and their knowledge was confirmed by miracles. But as the years passed, the apostles died, and their successors could not be so easily verified.


If you like this piece of writing, please support the House Apart on Patreon.

Uses of Text Predictors

You can’t just set up a text predictor and let it go. In addition to evaluating each output, the user constantly needs to be manipulating the memory and the author’s note, making sure that relevant text keeps getting sent to the program and that the characters don’t teleport around. In addition, it’s sometimes easier to throw out what the program sends back and write something yourself. All of this together means that using a text predictor saves absolutely no labor.

But there are still things text predictors can be used for. The first and biggest use is ideation. A machine that can make a sentence that follows from a few sentences of writing, or more, if you keep hitting continue, obviously has profound implications for thinking up characters, magic systems, factions, or any number of things.

For example, let’s say I have a faction in a story called the adjurists, and I have no idea what adjurism is. I can write a few sentences about what the adjurists have done in my story for the memory, put [Describe the philosophy of the adjurists.] in the author’s note, and there’s a pretty good chance that the computer will spit out an article useful to worldbuilding. This is especially true if you use a module based on an encyclopedia of philosophy.

The next use that comes to mind is using the predictor to write scenes the user is uncomfortable writing. For example, let’s say that the user is uncomfortable writing fight scenes, so they write the story up until the point that the fists start flying, load it into the predictor, write some notes as to why these people are fighting in the memory, and use [Writing style: vivid, violent] for the author’s note. This may or may not work.

While it is true that text predictors do have tools to write in a radically different style from the user’s, for the most part the programs tend to imitate whatever text is fed into them. While this is partially a simple effect of the mechanics of how the program works, it’s also because the creators want the generated text to flow from the input as seamlessly as possible. Still, this can be overcome with enough attention and effort focused on the metadata, or with the use of a module.

While the tendency for output to resemble input causes problems when trying to make the computer do the work, it does point to another way of overcoming difficulty with writing certain kinds of themes. Namely, practice. You can use the output to see what your input would look like to someone who didn’t write that exact sentence. This means that you can alter your writing until the program starts spitting out something you enjoy reading. While this isn’t as good as showing your writing to another person, it’s still better than nothing.

To sum up, the best thing to use text predictors like Novel AI and Holo AI for is help with coming up with ideas. Trying to get them to write things for you takes too much effort. They can also be used to practice writing scenes and styles that you aren’t used to, but it might be better to practice writing those yourself, which the programs can also help with.

I’ll leave you with one last piece of advice: don’t use a text predictor if you have a clear idea of what you want it to write. If you do, you’ll waste a lot of time trying to keep it on track, and you won’t be able to wander off to wherever it takes you. You’ll still need to edit things to keep the characters from teleporting, or perhaps to knock the program out of list mode, but letting it lead is still the best way to use these programs.


If you like this essay, please support the House Apart on Patreon.

Comparisons Between Text Predictors

Holo AI and Novel AI both came into existence after another service, AI Dungeon, decided to start censoring private stories. This decision was doubly infuriating given how notorious AI Dungeon was for its responses to the phrase ‘mount horse.’ This is why both Holo and NAI are particularly interested in making sure that only the user can read the stories on their account.

Both Holo and NAI run off of the same underlying model, GPT-J-6B. While this model is considered technically inferior to the GPT-3 model AI Dungeon uses, there is a difference in finetuning. This gets into how GPT models are made, which essentially consists of feeding a program a large amount of text so that it has something to imitate. This is called ‘training.’ Individual services then feed their models a bit more text, called a ‘finetune,’ which should leave a service optimized for producing a particular kind of text, such as novels or interactive fiction.
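For the curious, here is a minimal sketch of what a finetuning loop looks like, assuming PyTorch and the Hugging Face transformers library; the hyperparameters and the placeholder corpus are illustrative, and neither service has published its actual pipeline:

    import torch
    from torch.utils.data import DataLoader
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

    corpus = ["...novels or interactive fiction..."]  # placeholder finetune text
    batches = DataLoader(corpus, batch_size=1, shuffle=True)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    model.train()
    for batch in batches:
        tokens = tokenizer(list(batch), return_tensors="pt",
                           truncation=True, max_length=2048)
        # Causal-LM objective: predict each token from the ones before it.
        loss = model(input_ids=tokens["input_ids"], labels=tokens["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()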

While AI Dungeon’s finetuning was theoretically made for interactive fiction, most of its finetuning text came from a particularly god-forsaken corner of the internet, causing the issues surrounding ‘mount horse.’ Novel AI, on the other hand, was finetuned with professionally published novels, while Holo was finetuned with a mixture of published books and highly rated amateur fiction.

Beyond the finetune, the most easily noticed difference between the two services is how the user is expected to guide the machine’s output. Past providing the initial text for the program to imitate, the philosophies of the two services sharply differ, and this is probably going to be the biggest difference between them for the foreseeable future.

Novel AI gives the user the ability to manipulate the probabilities of a particular set of words being generated by the machine. This is done by giving the user access to several sliders, controlling things like repetition penalty, temperature, nucleus sampling, and top-k sampling. However, because manipulating the sliders directly requires the user to understand what nucleus sampling and so forth are, I suspect that most users simply select among the presets the service provides, depending on whether they want the scene to continue, or whether they want something unexpected to happen.
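For the curious, here is a toy decoding step showing what those sliders actually control; this is a generic sketch of the standard sampling techniques, not Novel AI’s own code:

    import numpy as np

    def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0,
                          repetition_penalty=1.0, recent_tokens=()):
        logits = np.asarray(logits, dtype=np.float64).copy()
        # Repetition penalty: push down tokens that appeared recently.
        for t in set(recent_tokens):
            logits[t] = (logits[t] / repetition_penalty if logits[t] > 0
                         else logits[t] * repetition_penalty)
        # Temperature: below 1 sharpens the distribution, above 1 flattens it.
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        # Top-k sampling: keep only the k most likely tokens.
        if top_k > 0:
            cutoff = np.sort(probs)[-top_k]
            probs = np.where(probs >= cutoff, probs, 0.0)
        # Nucleus (top-p) sampling: keep the smallest set of tokens whose
        # combined probability reaches top_p.
        if top_p < 1.0:
            order = np.argsort(probs)[::-1]
            keep_n = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
            mask = np.zeros_like(probs)
            mask[order[:keep_n]] = probs[order[:keep_n]]
            probs = mask
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))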

Holo AI doesn’t provide nearly as many sliders. There are still sliders available, but not nearly as many. Instead, whenever the service generates text, it spits out two chunks that the user is then expected to choose between. This gives the user the ability to guide the story even when not typing anything. While both services let the user tell the program to try again, and both, generally speaking, eventually require the user to simply write something to get the story going, the two designs still reflect a different set of ideals between the two teams.

As for the teams themselves, Novel AI has the larger team. Not only that, it also has the larger user base, which means that there are more people experimenting with it, which means that how NAI acts is better understood. So much so that Holo AI’s community frequently has to borrow what NAI’s users have learned and apply it to their own model.

Actually, now that I think about it, Holo playing catch-up is something that keeps happening. I know that it’s in large part due to the small staff, but it still means that there are features that Novel has over Holo. The biggest thing NAI has over Holo, even though the Holo team is currently working on something similar, is modules.

Modules are something like a small finetune that users can make and switch out during generation. To make one, the user must prepare a corpus of text, load it into the program, and then set the module-making program to run. To prepare the corpus, the user must not only collect the stories they want the program to imitate, but also make sure the text is formatted correctly, including making sure that they’re using the right kind of quotation marks. In addition, intense computational resources are needed to make the module, which currently means that NAI users need to make a commitment of either time or money while the program runs.
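The formatting step might look something like this sketch; which kind of quotation mark NAI actually expects is an assumption here, so treat the direction of normalization as illustrative:

    import re

    def normalize_corpus(text: str) -> str:
        # Collapse curly quotes to straight ones, so the tokenizer sees only
        # one kind of quotation mark throughout the corpus.
        text = text.replace("\u201c", '"').replace("\u201d", '"')
        text = text.replace("\u2018", "'").replace("\u2019", "'")
        # Normalize line endings and strip trailing whitespace.
        text = text.replace("\r\n", "\n")
        text = re.sub(r"[ \t]+\n", "\n", text)
        # Collapse runs of blank lines left over from scraping.
        return re.sub(r"\n{3,}", "\n\n", text)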

The upshot of all of this is that modules can make the program imitate authors and works that it otherwise couldn’t, whether that’s the one fanfic author you really like, or the Stanford Encyclopedia of Philosophy. The imitation can also be much more intense, depending on how the user makes the module, but this also depends on what the program’s input is during writing.

Some of the functionality of modules can be replicated with metadata. For example, you can get the program to imitate Arthur Conan Doyle by typing [Author: Arthur Conan Doyle] into the author’s note. But it’s possible that the program won’t recognize other things you might want it to imitate, like the entire genre of cyberpunk; or, if it does, the results might be weaker than you’d like.

That said, Holo AI does give the user a greater ability to manipulate metadata. First, Holo gives the user multiple data sets for the program to imitate, each with its own unique style of manipulation. For example, the data set for professional novels allows the user to weight various tags against each other, while the fanfiction data set gives the user a third field, analogous to the memory and the author’s note, for a plot summary. While I haven’t had a chance to use the field myself (I’ve had no particular fanfics I’ve wanted to write), I’ve heard good things about it.

How the services expect the user to guide output is a wash. In terms of team size and community size, Novel AI is a clear winner. Holo AI is still waiting for something like modules, although I don’t believe that Novel AI has any plans to imitate Holo’s additional metadata features. However, there is one area where Holo AI blows Novel AI out of the water: the price. Holo at its most expensive is cheaper than NAI at its least.

The price alone is enough for me to recommend Holo over NAI, and it even has a free trial that lets you generate up to 8,000 characters. However, before you spend any money, I do think it is worthwhile to think about what you want to use the service for. That’s going to need another essay to cover.


If you like this essay, please support the House Apart on Patreon.

Writing with Text Predictors

The Chinese Room is a famous thought experiment about artificial intelligence. The premise is that there is a man in a room, let’s say a 19th-century English gentleman, who cannot read Chinese, but it is possible to hand him a piece of Chinese writing, and for him to hand some Chinese writing back. It’s usually specified that the hanzi the man hands back are printed on little squares, so that it’s not possible for him to miswrite a character. In any case, let’s say that the gentleman has a big chart telling him how to respond to any given piece of writing, and that his responses are completely indistinguishable from the writing of someone who speaks and writes Chinese fluently. What, precisely, is giving the response?

I’m not sure how to answer that question, but let me give you another scenario. Let’s say another man, a retiree who lives in Florida, is put in a different room with a big piece of paper and some dice. Let’s say that it is possible to pass some number of poker chips into the room. These poker chips each have a number written upon them, and the man is told to draw up a big chart describing the probability of any given number showing up in a given sequence of numbers. What the man does not know, and indeed has no way of knowing, is that each number corresponds to a series of characters, whether letters, numbers, or punctuation.

Now, when the chart is finished, the people running the experiment start passing tokens into the room, and the man, following his chart, passes out a series of tokens of his own that, when decoded back into characters, are surprisingly grammatical. There are two interesting aspects to this scenario. The first is that the researchers have no idea what is on the chart, and can only guess at what causes any particular sequence of tokens to come out. The second is that the retiree is making these grammatical sentences without realizing that there’s any meaning assigned to any poker chip. In particular, he does not know which number corresponds to the word ‘not.’
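A toy version of the retiree’s chart, reduced to bigram counts; real GPT models use neural networks rather than lookup tables, so this only illustrates that the chart is statistics over opaque numbers, not understanding:

    import random
    from collections import Counter, defaultdict

    def build_chart(token_stream):
        """Count how often each token number follows each other token number."""
        chart = defaultdict(Counter)
        for prev, nxt in zip(token_stream, token_stream[1:]):
            chart[prev][nxt] += 1
        return chart

    def pass_out_token(chart, prev_token):
        """Pick a next token in proportion to how often it followed before."""
        tokens, counts = zip(*chart[prev_token].items())
        return random.choices(tokens, weights=counts)[0]

    # The retiree never sees a decoding table like this; to him, 17 is just 17.
    decode = {9: "The house is", 17: " not", 4: " small"}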

This is important. To understand why, let’s say that the researchers are writing a story, and one of the characters in the story has a very large house. The researchers, simply writing the story without much of an idea of what the man will send back, describe the house as ‘not small.’ These words are tokenized, and sent into the room, and what comes out is a series of words that seem to describe how unbearably cramped the house is. The retiree has no way of knowing that the numbers for ‘not’ and ‘small’ have any relationship to each other, beyond that they are right next to each other.

This, essentially, is how the GPT series of programs work.

Over the past two months, I have been experimenting with two GPT-based services, Novel AI and Holo AI. Although both of these services have ‘AI’ in their name, they’re really nothing more than text predictors. The ‘not’ problem is the biggest reason why I can’t call them intelligent, but nevertheless, there are situations where they can be useful for writing.

First, I need to describe what working with a text predictor is like. The first thing you need to understand is that the program spits out little chunks of text based on the story that was written before, usually forty to fifty words in length, though this can be adjusted. Now, when I speak of what was written before, understand that only so many tokens can be sent to the retiree, equaling around 1,500 words, meaning that if a story goes on too long, the program will stop generating text based on the earliest details.
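A sketch of that truncation; the real services count tokens rather than words, so the 1,500-word budget here is just my rough figure made literal:

    def build_context(story: str, budget_words: int = 1500) -> str:
        # Only the most recent words fit; everything earlier falls out of view.
        words = story.split()
        return " ".join(words[-budget_words:])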

To compensate, the programmers have provided methods for the user to retain details, called the memory and the lorebook, the latter also known as world info. The memory is a chunk of text, written in a special pane on the side of the screen, that’s attached to the top of whatever text is sent to the generator. The lorebook is similar, only lorebook text is attached only when specific words show up in the story.

As an example, let’s say we want a recurring character named Rachel, who’s the main character’s boss, has brown hair, and is married. We would go into the lorebook, make a new entry, write out something explaining who Rachel is and what she looks like, and finally, give it the keys “Rachel” and “boss.” Now, whenever the words “Rachel” or “boss” get sent to the generator, our lorebook entry will be attached to the top of the text.
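Put in code, the assembly might look like this sketch; the real services’ exact ordering and formatting are assumptions on my part:

    MEMORY = "The protagonist works at a small accounting firm."
    LOREBOOK = {
        ("Rachel", "boss"): "Rachel is the protagonist's boss. She has brown "
                            "hair and is married. Rachel acts professionally "
                            "with the protagonist.",
    }

    def assemble_context(recent_text: str) -> str:
        # Attach any lorebook entry whose keys appear in the recent text.
        triggered = [entry for keys, entry in LOREBOOK.items()
                     if any(key in recent_text for key in keys)]
        # The memory always goes on top, then triggered entries, then the story.
        return "\n".join([MEMORY, *triggered, recent_text])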

This is where the ‘not’ problem comes in. If we wrote “Rachel has no romantic interest in the protagonist” in the lorebook, the generator is going to spit out text describing how flirty Rachel is with the protagonist. Instead, we would need to write something like “Rachel acts professionally with the protagonist” to get the results we want.

Continuing on, the next thing to talk about is the author’s note, which is where most of the user’s work is done. It works similarly to the memory, only instead of being attached to the top of the text, it’s inserted a few lines from the bottom. This means that it has a much greater effect on the output.

For example, if we wanted to make the program output a sad scene, we could write something like [This is a sad scene] or [Tags: sad, depressing, dreary]. The program would then output words resembling whatever was flagged as sad in its training data.
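As a sketch of that insertion point (‘a few lines from the bottom’ follows my description above; the exact offset each service uses is an assumption):

    def insert_authors_note(context: str, note: str, lines_up: int = 3) -> str:
        # Splice the bracketed note in a few lines above the end of the text.
        lines = context.split("\n")
        cut = max(len(lines) - lines_up, 0)
        return "\n".join(lines[:cut] + [f"[{note}]"] + lines[cut:])

    # insert_authors_note(story_text, "Tags: sad, depressing, dreary")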

As I said, this is where most of the work on the user’s part is done. While the memory and lorebook can be set-and-forget affairs, the author’s note’s effect on the output is so great that it needs to be constantly fiddled with, sometimes as frequently as every few paragraphs. This is especially necessary when you need characters to be in particular positions, because the program doesn’t envision characters existing in space.

If you’re wondering why I used brackets for the author’s note, it seems to have something to do with bad fanfiction. When the initial GPT models were being created, the programmers basically fed large chunks of the internet into the program, and what came out was something that could create text similar to what went in. At least one model’s initial data included really bad fanfiction, the kind where the author feels the need to point out if a scene is supposed to be sad, usually in a sentence between paragraphs, encased in brackets.

However, while this technique is horrible for normal writing, it did provide a good way to control the predictor’s output. I’m not sure if the programmers of the services I used deliberately set up the author’s note this way, but it wouldn’t surprise me. As for what, exactly, the brackets do, I’m not entirely sure. They seem to weaken the links between the output and the bracketed input, as bracketed text in the training data is less likely to affect the story directly, being simply something the writer was telling the audience.

Or that’s what I think is happening. Nobody actually knows what’s on the big chart in the room, so the only way to have any idea of what input is likely to have what output is to run experiments and see what happens. It’s a fascinating idea, humans building machines that they don’t understand.

This post is getting long, so I’ll leave you on that philosophical note. I’ll come back to this, and hopefully, I’ll be able to give you a comparison between Novel AI and Holo AI.


If you liked this essay, please support the House Apart on Patreon.