378. Three forms of drawing AI

[Author's note: Chapter 377 deals with the election and cannot be posted. The first half of 378 was censored and cannot be written either, so I'm putting the second half of the free chapter here.]

However outrageous some of the opinions were, it was an indisputable fact that interest in the AI field had kept climbing alongside the general election.

This popularity reached a new peak after Meng Fanqi announced that he was about to release a real artificial intelligence that could draw from text.

After all, nearly half a year earlier, the trial version of Clip that Meng Fanqi released had already shown remarkably good drawing ability and multimodal understanding.

So good, in fact, that everyone assumed the thing had been developed specifically for AI drawing.

Unexpectedly, merely by learning the correspondence between images and text, the model had quickly and spontaneously developed such strong image-generation ability.

And if it was already that amazing half a year ago, what would it look like now?

As for the much-anticipated AI drawing itself, internal research and development had actually not gone smoothly, which could be seen from how long the release was taking.

Meng Fanqi himself had hesitated for quite a while over which route to take.

The most famous AI image generators in the previous life were mainly StableDiffusion, Midjourney and DALLE.

The SD diffusion model is a text-to-image model built on Clip: it starts from pure noise and gradually refines the image until it is noise-free and steadily converges toward the text description provided.

The training method has also been refined many times: sample an image and gradually add noise over time until the data can no longer be recognized, then have the model try to roll the image back to its original form, learning in the process how to generate pictures or other data.
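In code, the loop described above looks roughly like the sketch below. This is a generic DDPM-style training step, not CloseAI's actual implementation; the model interface, noise schedule values, and timestep count are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000                                    # number of diffusion timesteps (assumed)
betas = torch.linspace(1e-4, 0.02, T)       # noise schedule (assumed values)
alphas_bar = torch.cumprod(1.0 - betas, 0)  # cumulative fraction of signal kept at each step

def training_step(model: nn.Module, x0: torch.Tensor) -> torch.Tensor:
    """One step: corrupt a clean image with noise, then ask the model to predict that noise."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))                            # random timestep per sample
    noise = torch.randn_like(x0)                             # Gaussian noise to add
    a_bar = alphas_bar[t].view(b, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise   # forward "noising" process
    predicted = model(x_t, t)                                # model tries to roll the image back
    return F.mse_loss(predicted, noise)                      # learn by matching the true noise
```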

As the name suggests, this route is very stable, but producing truly high-quality images with it is computationally expensive.

Technically, the path had been worked out, but in terms of cost it did not yet seem well suited to the market.

In his previous life, Midjourney was more adept at all kinds of artistic styles, and the images it produced were often strikingly beautiful.

The "Space Opera" that won the gold medal in the painting competition incognito is Midjourney's work.

By rights, this more aesthetically pleasing route could both create a stunning publicity effect and attract a large number of users; it should have been the best choice.

However, unlike the open-source diffusion-model approach, Midjourney handled user requests through a bot on a public platform.

Because of its closed, for-profit model, Meng Fanqi knew little about this AI's specific technical details or what its core technique actually was, so he had to abandon that route.

"If you look at the popularity and popularity of my previous life, the diffusion model and Midjourney will be more stable, but DALLE has already combined with ChatGPT before I was reborn, and it has great potential, and I need to integrate the two routes considering the future development."

It was precisely because of the need to combine the strengths of the two schools that Meng Fanqi's diffusion-based drawing AI would arrive a few months later than expected.

In the end, a relatively mature three-step system took shape: compression, diffusion within the latent space, and decoding back out.

Experimenting on, debating, and finalizing this overall approach took even longer than the formal training itself.

"I don't know when quantum computers, which have an order of magnitude increase in computing performance, will be able to get out, but if the computing power is fast enough, it can actually save a lot of trouble." Meng Fanqi still felt tired when he thought about it.

One of the biggest reasons so many modules had to be split out was the consumption of computing resources.

An image's pixel count grows with the square of its resolution, and the operations inside the T method carry another squared factor in their dimensions. To a user, images at 256 and 512 resolution feel about the same, but reflected in the overall computation, the gap is often an order of magnitude.

For this reason, the learning steps of the diffusion model had to be carried out in a low-dimensional space.

To put it bluntly, the resolution is lowered first, which greatly reduces the amount of computation before and after the diffusion.
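Lowering the resolution first amounts to something like the generation loop sketched below, in the style of latent diffusion. The denoiser and decoder are placeholder modules, and the 64x64x4 latent shape and 50-step loop are illustrative assumptions rather than the actual configuration.

```python
import torch

def generate(denoiser, decoder, text_embedding, steps: int = 50) -> torch.Tensor:
    """Run every diffusion step on a small latent instead of the full-resolution image."""
    # e.g. a 512x512x3 image compressed into a 64x64x4 latent: roughly 48x fewer values,
    # and attention-style costs shrink far more, since they scale with the square.
    latent = torch.randn(1, 4, 64, 64)                # start from pure noise in latent space
    for t in reversed(range(steps)):
        latent = denoiser(latent, t, text_embedding)  # each step only touches the cheap latent
    return decoder(latent)                            # decode back to pixel space once, at the end
```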

"Does this hurt performance? Is the resulting image not good enough? "The decision to release this version of the diffusion model that has been emasculated in computing power has also raised such concerns within CloseAI.

After all, the algorithm could actually do better; it would just cost more.

"It's not just a matter of calculating time, it's also a matter of memory. Without this splitting and castration of image resolution, the same card is not only an order of magnitude slower, but also several times fewer tasks can be performed at the same time. Meng Fanqi insisted on solving the problem of the number of users first, and the performance and effect could be slowly optimized.

It was like an enormous fat man coming in to eat: not only does he take several times longer than everyone else, he also occupies four seats all by himself.

In Meng Fanqi's view, until something like ControlNet was proposed, the first drawing AI he released would be little more than a toy.

It hardly mattered that its performance fluctuated a little, because the success rate of early high-quality image generation was low anyway; it often took a great many attempts to pick out one image that was presentable.

This was mainly because, in the early days, both text-to-image and image-to-image generation lacked any particularly good means of control.

"The diffusion model we are launching now is still used with a large amount of text input to control the output of the image. However, it is very difficult for words to clearly describe a specific image, and even if a large number of attempts and a large amount of generation are required, it may not be possible to get the desired result. ”

"This generation mode should also combine pictures with text. We need concrete ways to control the diffusion model's behavior by adding extra conditions, telling it what to adjust and what to leave alone. Making generation as controllable as possible matters far more, and ranks higher in priority, than making the images look a little prettier."
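What he was describing is, roughly, wrapping the denoiser so that it also sees an extra control image rather than relying on text alone. A toy sketch of that idea follows; the ControlledDenoiser class, its submodules, and the additive fusion are illustrative assumptions, not the actual design.

```python
import torch.nn as nn

class ControlledDenoiser(nn.Module):
    """Denoiser that also sees a control image (edges, pose, depth, ...) next to the text prompt."""
    def __init__(self, denoiser: nn.Module, control_encoder: nn.Module):
        super().__init__()
        self.denoiser = denoiser                # the original text-conditioned denoiser
        self.control_encoder = control_encoder  # maps the control image to latent-sized features

    def forward(self, latent, t, text_embedding, control_image):
        # Inject the control signal additively, so the model is told explicitly
        # what to keep fixed instead of guessing everything from the text.
        control = self.control_encoder(control_image)
        return self.denoiser(latent + control, t, text_embedding)
```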

Meng Fanqi knew very well the biggest problem with early AI drawing: getting a decent image out of it was like chanting black magic.

To get a satisfying picture, you might well have to recite a hundred-odd keywords.

At the time, many people joked that playing with AI drawing was like joining a cyber cult, muttering strings of things that outsiders could only half understand.

There were even large numbers of high-quality keyword sets packaged up and sold directly.