Chapter 113: An Unexpected Invitation


Compared with the Tiger algorithm, whose effect was striking, the mobile-optimized ranking algorithm performed somewhat worse.

Meng Fanqi therefore did not rush it into online testing. Instead, he waited until the update incorporating the AI language-understanding model was ready, so the two could be pushed out together.

Recurrent neural networks (RNNs) and long short-term memory networks (LSTMs) were the usual tools for language problems; both were old approaches dating back to the end of the last century.

These two methods were so simple and easy to use that they flourished until around 2017.

That is, until the Transformer, the "T" in ChatGPT, came along.

It is generally accepted that the Transformer replaced RNNs and LSTMs so quickly mainly because it is far easier to parallelize.

The real significance of being easy to parallelize across many devices is that it made very large models possible, laying the foundation for giant models like ChatGPT.
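To make the parallelism point concrete, here is a minimal toy sketch in plain Python, not code from any real framework. It assumes scalar inputs, a hypothetical one-unit RNN with made-up weights `w_h` and `w_x`, and scaled dot-product self-attention with identity projections (query = key = value = input, dimension 1). The RNN loop must run step by step because each hidden state needs the previous one, while each attention output depends only on the inputs, so the positions can be handed out to workers in any order.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def rnn_forward(xs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN (hypothetical weights): h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    h = 0.0
    hs = []
    for x in xs:
        # Step t cannot start until step t-1 has produced h.
        h = math.tanh(w_h * h + w_x * x)
        hs.append(h)
    return hs


def attention_forward(xs):
    """Toy scalar self-attention with q = k = v = x (dimension 1, so sqrt(d) = 1)."""

    def one_position(q):
        scores = [q * k for k in xs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        return sum(w * v for w, v in zip(weights, xs)) / total

    # Each position reads only the inputs, never another position's output,
    # so all positions can be computed concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(one_position, xs))
```

In CPython the thread pool will not actually speed this up because of the GIL; it only demonstrates that the attention positions have no ordering dependency, which is what lets a GPU compute them all at once.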

"In fact, the old RNN can be parallelized quite well too; the field has a big misconception about this," Meng Fanqi thought, frowning.

In his previous life, once the Transformer came out, everyone dropped their work on the old methods and embraced the T method.

In 2018, someone did build a highly parallel RNN, but unfortunately it came too late.

Had this discovery come a year earlier, RNNs might have remained a serious competitor to the T method for a long time, and we might even have seen a ChatRNN.
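The chapter does not name the 2018 method, so the following shows only one well-known way to parallelize a recurrence over time, swapped in for illustration. If the recurrence is linear, h_t = a_t * h_{t-1} + b_t, each step is an affine map; composing affine maps is associative, so a Hillis-Steele-style parallel prefix scan can produce every h_t in about log2(n) rounds. The function names and scalar values are assumptions for this toy sketch.

```python
def combine(left, right):
    """Compose two affine maps h -> a*h + b, applying `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)


def sequential_scan(a, b, h0=0.0):
    """Reference: run the linear recurrence h_t = a_t * h_{t-1} + b_t step by step."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def parallel_scan(a, b, h0=0.0):
    """Inclusive Hillis-Steele scan over the affine maps.

    There are about log2(n) rounds; inside a round every element update reads only
    the previous round's values, so on real hardware each round could run in parallel.
    """
    elems = list(zip(a, b))
    n = len(elems)
    step = 1
    while step < n:
        elems = [combine(elems[i - step], elems[i]) if i >= step else elems[i]
                 for i in range(n)]
        step *= 2
    # elems[i] is now the composition of maps 0..i; apply it to the initial state.
    return [a_t * h0 + b_t for a_t, b_t in elems]
```

Both functions return the same hidden states; only the dependency structure differs, which is the whole point of the parallel version.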

"The early T method needed a lot of data, its hyperparameters were hard to tune, and it required a huge amount of compute." Even though Meng Fanqi had built an improved version incorporating many techniques that only matured later, the T method was still troublesome in its early form.

"Fortunately, Google has no shortage of data or compute, and I know the classic hyperparameter settings well." Meng Fanqi first wrote a rudimentary version of the T method and tested it.

"However, with the limited memory of today's graphics cards, there's no way to make the model very large, unless I build advanced parallelism tooling like DeepSpeed myself."

Training a model on multiple cards may be done for speed, or because the model simply won't fit on a single card.

Data parallelism is the simplest approach: every card does the same work, and each card holds a full copy of the model.

Only the input data differs. After each card finishes its computation, the results are aggregated and the model is updated in unison.

It's like everyone using the same kind of knife to chop different ingredients, then piling everything together at the end.
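Data parallelism can be sketched in plain Python, assuming a toy one-parameter model y ~ w * x trained with mean-squared error. The "cards" are simulated as a list of data shards, and the function names are hypothetical; real frameworks perform the averaging step with an all-reduce across devices. With equal-sized shards, averaging the per-card gradients gives the same gradient as one card processing the whole batch (up to rounding), which is why every replica stays in sync.

```python
def grad_on_shard(w, shard):
    """Mean-squared-error gradient for the toy model y ~ w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, data, num_workers, lr=0.01):
    """One synchronous data-parallel SGD step across hypothetical workers."""
    if len(data) % num_workers:
        raise ValueError("batch size must be divisible by num_workers")
    size = len(data) // num_workers
    # Each worker ("card") gets a different slice of the batch...
    shards = [data[i * size:(i + 1) * size] for i in range(num_workers)]
    # ...but every worker holds the same copy of w and runs the same computation.
    grads = [grad_on_shard(w, shard) for shard in shards]
    # All-reduce: average the gradients so every replica applies the identical update.
    avg_grad = sum(grads) / num_workers
    return w - lr * avg_grad
```

For example, training on pairs (x, 3x) for x = 1..8 with four simulated workers drives w toward 3, exactly as a single worker would.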

Sometimes, though, the model can't fit on one card at all, and that's far more troublesome. It's as if the knife were too big for one person to hold, so several people have to wield it together.

A single layer can be split across several cards, or different layers can be assigned to different cards, so that multiple cards together achieve much the same effect as training on one very large card.

Obviously, the former is much easier than the latter: it only requires copying the model onto each card and having each card read its own data for its calculations.

The latter requires splitting and merging according to each situation and configuration, and the slightest carelessness leads to mistakes.
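The two ways of splitting a model described here can also be sketched in plain Python. This is a toy illustration with hypothetical helper names: `tensor_parallel_layer` splits one layer's weight matrix by output rows across cards and concatenates the partial results (a simplified form of what is often called tensor parallelism), while `pipeline` keeps each whole layer on its own card and passes activations along. The bookkeeping in `split_rows` is exactly the kind of splitting and merging that is easy to get wrong.

```python
def matvec(W, x):
    """Multiply a matrix W (a list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def split_rows(W, parts):
    """Split W's output rows into equal contiguous slices, one per card."""
    if len(W) % parts:
        raise ValueError("number of rows must be divisible by the number of cards")
    size = len(W) // parts
    return [W[i * size:(i + 1) * size] for i in range(parts)]


def tensor_parallel_layer(W, x, num_cards):
    """Split a single layer across cards: each card holds only its slice of W."""
    partial_outputs = [matvec(W_slice, x) for W_slice in split_rows(W, num_cards)]
    # Concatenate ("all-gather") the partial outputs to rebuild y = W x.
    return [y for part in partial_outputs for y in part]


def pipeline(layers, x):
    """Assign whole layers to cards: layer i lives on card i, activations hop between cards."""
    for W in layers:
        x = matvec(W, x)
    return x
```

Either way, the combined result must match what a single card holding the whole model would compute; the sketch checks nothing more than that.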

Looking over the Google Brain servers, he found several batches of 2013 GTX Titans, which were genuinely precious.

Compared with other products of the time, their 6 GB of video memory stood out from the crowd.

Compared with the 4 GB flagship cards Meng Fanqi had paid a fortune for, the extra 2 GB of video memory made a lot of other things possible.

Trading speed for video memory, Meng Fanqi repeatedly shuttled parameters and intermediate data back and forth between the CPU and GPU.
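The chapter only says that he traded speed for memory by moving parameters between CPU and GPU. A common form of that trade is layer-by-layer offloading: keep every weight in host memory and copy one layer at a time onto the card. The sketch below simulates this with a hypothetical `Device` class that merely counts stored values; it is not a real CUDA or framework API.

```python
class Device:
    """Hypothetical GPU with a fixed memory budget, measured in stored values."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.peak = 0
        self.transfers = 0

    def load(self, weights):
        """Simulate a CPU -> GPU copy; refuse if the layer would not fit."""
        size = sum(len(row) for row in weights)
        if self.used + size > self.capacity:
            raise MemoryError("layer does not fit on the device")
        self.used += size
        self.peak = max(self.peak, self.used)
        self.transfers += 1
        return weights

    def free(self, weights):
        """Simulate evicting a layer from device memory."""
        self.used -= sum(len(row) for row in weights)


def offloaded_forward(host_layers, x, device):
    """Run layers in order while only one layer's weights sit on the device at a time."""
    for W_host in host_layers:
        W = device.load(W_host)  # extra transfer each layer: this is the speed cost
        x = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        device.free(W)           # make room for the next layer
    return x
```

Peak device memory stays at the size of the largest single layer instead of the whole model, at the price of one transfer per layer on every pass.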

Even before he had officially joined the company, Google Brain had allocated him 16 Titans for his exclusive use, available at any time.

On top of that, there were 32 more GPUs on various nodes that he could claim.

"Google doesn't have that many graphics cards yet, so this allocation is already quite generous."

Besides uniformly configured systems and environments, there was also solid multi-card parallelism support, complete with examples.

In another two years, thousands upon thousands of TPUs would be standard.

If Meng Fanqi wants to integrate AI into the search system, there are three main directions.

The first is to rank results better by segmenting queries into keywords and using a language model to grasp what they mean in the real world.

The second is to scale up the model so that it gains a degree of broad understanding, expanding the range of content that can be searched.

The third is to help the search engine understand how the order of words in a query can change its intent.

The second is currently harder to pull off, but Meng Fanqi has a firm grasp of the first and third.

The traditional recurrent methods, RNNs and LSTMs, have difficulty handling long sentences properly and don't capture changes in word order well.

Meng Fanqi's embryonic T method has unique advantages in this regard.

The catch is that the T method struggles to learn from small datasets, its individual hyperparameters are hard to tune, and training it is difficult overall.

But none of this was much trouble for a seasoned "alchemist" of model training like Meng Fanqi, and with the massive data Google had already prepared, he was very confident in the method's results.

After putting all the graphics card resources into training, Meng Fanqi wrapped up his roughly ten-day working stint at Google Shanghai on Christmas Eve, 2013.

Training the model would take some time, and the next two big moves for the advertising algorithm would probably come in about two weeks, after New Year's Day.

Having finally finished the technology that would bring in the most money in his early career, Meng Fanqi was relieved.

Just as he was planning to start a company and begin looking at office space and equipment, an unexpected phone call disrupted his plans.

"Hello, Mr. Meng. I'm Li Kaifu's secretary at Innovation Factory. He would very much like to talk with you face-to-face, but for health reasons it isn't easy for him to travel. Would it be convenient for you to come to him?"

Li Kaifu? He could be regarded as a senior among Chinese at Google: he had risen to global vice president, the highest position any Chinese held there, and had been the head of Google China.

Not only that, but he had also held high positions at Apple and Microsoft.

However, when his four-year contract expired in 2009, he left Google to pursue his own dream of making angel investments in college students.

"Where is Mr. Li now?" Meng Fanqi knew Li Kaifu's story fairly well. By now he should be in the early stages of his cancer, but Meng Fanqi didn't know where he was being treated.

"Mr. Li is receiving treatment in Taipei for now. If it's convenient for you, shall we set a date? To be honest, the treatment hasn't been going particularly well lately, so Mr. Li has basically stopped attending meetings or doing any company work, but he insists on setting aside a day to talk with you."

"I've just finished what I was working on, so I can go apply for an entry permit tomorrow." Meng Fanqi found it a little strange. Although he had made a name for himself in the AI world, there seemed to be nothing about him that a senior figure of Li Kaifu's level could not do without.

Especially given that Li Kaifu's health was not very good at the moment.

"But with processing time, it will probably be two weeks from now."

Meng Fanqi asked the secretary the reason, but she didn't know the specifics, so he suppressed his curiosity, and they arranged to meet in mid-January.

The flight from Shanghai to Taoyuan, outside Taipei, takes only two or three hours in total, which is actually shorter than going to Yanjing. In both of his lives he had never been to Treasure Island, so seeing Li Kaifu and taking a stroll around while he was there wasn't a bad idea.

It was only a permit to enter the island, yet it worked just like a visa, which was quite annoying.
