Chapter 15: The game is over before it starts
In the summer of 2013, there was still about a month before the start of the competition.
"Training a model means loading all the weights, the data, and a great deal of intermediate state into the GPU. That makes GPU memory size especially important," Meng Fanqi sighed. "Even the flagship 690 we bought is too small, only 4 GB."
Compared with the A100-80G, which the United States would later ban from sale to China, the 690 had twenty times less video memory alone, to say nothing of its other specs. Meng Fanqi could only pitifully iterate on the model sixteen images at a time.
"Sixteen at a time, close to a million loops to work through the entire data set. And to converge the model well, hundreds of passes are indispensable."
Meng Fanqi estimated it would take nearly twenty days for this version to produce a result, and the final training run did indeed take about three weeks to converge to its current performance.
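The arithmetic behind that estimate can be sketched roughly as follows. This is a toy calculation: the ImageNet training-set size of about 1.28 million images and the hundred-epoch figure are assumptions, since the text gives only the batch size of sixteen and the roughly three-week wall clock.

```python
# Back-of-envelope training-time estimate (illustrative figures only).
DATASET_SIZE = 1_280_000   # ImageNet-1k training images, approximately
BATCH_SIZE = 16            # all a 4 GB GTX 690 allows, per the text

iters_per_epoch = DATASET_SIZE // BATCH_SIZE   # updates per full pass
epochs = 100                                   # "hundreds of passes" (low-end assumption)
total_iters = iters_per_epoch * epochs         # total weight updates

# If the whole run took about 21 days, each update averaged this long:
ms_per_iter = 21 * 24 * 3600 * 1000 / total_iters

print(iters_per_epoch, total_iters, round(ms_per_iter, 1))
```

At these assumed numbers, each epoch is 80,000 updates, and a three-week run implies each sixteen-image step finishing in roughly a quarter of a second on the old card.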
Fortunately, ImageNet would later become a must-use training dataset for every algorithm engineer, and Meng Fanqi himself had climbed its leaderboard countless times, so he knew the road well and had a good sense of how the various parameters should be set.
This saved him at least a month or two of precious time.
Even though a training session took three weeks, Meng Fanqi still had a version of the model ready before the competition began.
Seeing that the trained model's final performance met expectations, Meng Fanqi finally felt a great weight lift from his heart.
Over the past few months, his one real worry had been that the years-old framework would hide some problem he hadn't anticipated, leaving the final result short of theoretical expectations. If that happened, the cost of locating the problem and testing a fix would be enormous, and if it couldn't be solved in time, it would seriously disrupt his original plan.
The current result was a top-5 error rate of about 4.9 percent, slightly worse than the numbers in later papers, but fortunately still better than the human benchmark set by the competition.
In general, the specific data used in a competition is not released beforehand. The ImageNet competition, however, is special: a dataset of this scale cannot simply be used for one or two competitions and then discarded.
So the data used in each competition changes very little, while the specific tracks, the content of the events, and the judging criteria are adjusted more often.
Although results can actually be submitted during the off-season, and Meng Fanqi could upload his now and seize first place, the attention that would draw could not compare with the fierce competition during the event itself.
Meanwhile, Don Juan was finally beginning to realize that things were going far beyond anything he had expected.
"I remember reading that AlexNet's accuracy on this was less than 85 percent, and you're now over 95." The first time Don Juan came to check the results, he couldn't believe them.
"Are you sure you're not mistaken? Don't fool your brother. Your brother hasn't read much and is easily deceived." Don Juan's state of mind was complicated: he hoped it was true, yet it was so good it was hard to believe.
"It's fake. I lied to you." Meng Fanqi rolled his eyes. "I used special effects; it's all chemical additives."
"No, I watched this performance converge step by step." Don Juan flipped through the training logs again, his voice full of grievance. He had just finished picturing himself riding these coattails to the pinnacle of his life.
Such is the anxious man who agonizes over every gain and loss: unable to believe his luck, yet terrified it might be fake.
"I didn't have the true labels for the test set, so I carved five percent out of the training set, kept it out of training, and used it for validation." Meng Fanqi knew the variance of this dataset well; training on 95 percent and validating on the remaining 5 percent was a stable, conservative split.
"In other words, as long as that five percent isn't too different from the test set, your method could be ten percentage points better than last year's champion?" Don Juan was still in utter shock. "It's that simple? I haven't done anything yet, and you've already won lying down?"
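The 95/5 holdout Meng Fanqi describes can be sketched minimally: shuffle the training indices once, keep 95 percent for training, and hold 5 percent out as a stand-in validation set. This is illustrative only; his actual split and dataset size are not shown.

```python
import numpy as np

# Toy 95/5 holdout split over a stand-in dataset of 1,000 samples.
rng = np.random.default_rng(seed=0)

n_samples = 1000
indices = rng.permutation(n_samples)   # shuffle once, up front

n_val = n_samples // 20                # 5 percent held out for validation
val_idx = indices[:n_val]
train_idx = indices[n_val:]

print(len(train_idx), len(val_idx))
```

Because the test labels are unavailable, this held-out slice serves as a proxy for the test set, which is exactly the bet Don Juan is questioning: the method's lead holds only insofar as the 5 percent resembles the real test data.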
Don Juan's feeling at that moment was like Light Yagami discovering for the first time that he could simply have a shinigami eliminate L, his greatest rival. The imagined effort, struggle, and striving never happened and turned out to be entirely unnecessary; the astonishing result was in hand before the competition had even officially begun.
"That's life. Success and failure often have nothing to do with you. Get used to it." Meng Fanqi patted him on the shoulder. "If you can't get used to it this time, no matter. There's still a long, long road ahead; you will."
And if you can't get used to it, what then? A man who cannot change his weight can only change his taste. Otherwise he will torment himself for the rest of his life.
Now that he had this result on 95 percent of the data, the next step was to add the remaining 5 percent back in and fine-tune the model for a few more days. That way, the final result could be submitted directly in November. Continuing to fine-tune a model that already performed well would take far less than the original twenty-one days.
In just two days or so, the new training logs showed that the model's performance had largely converged to a fixed value, with little further fluctuation.
With that, Meng Fanqi had only one thing left to do before the conference in Australia: complete the experimental data for the papers at hand, filling in the last missing piece of each.
By this point, Meng Fanqi had nearly seven papers finished. Besides the core of the competition, the new residual-based model DreamNet, there were the related training techniques: batch normalization, the Adam adaptive-moment optimizer, and mix-up data augmentation.
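Of the training tricks listed, mix-up is the simplest to sketch: blend two samples and their one-hot labels with a weight drawn from a Beta distribution, so the network trains on soft, interpolated targets. This is an illustrative toy, not DreamNet's actual code; the `alpha` value and shapes are made up for the example.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two (input, one-hot label) pairs with a Beta-distributed weight."""
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2       # blended input
    y = lam * y1 + (1.0 - lam) * y2       # blended soft label
    return x, y, lam

# Two toy samples from opposite classes.
x_a, y_a = np.ones((4, 4)), np.array([1.0, 0.0])
x_b, y_b = np.zeros((4, 4)), np.array([0.0, 1.0])
x_mix, y_mix, lam = mixup(x_a, y_a, x_b, y_b)

print(round(float(y_mix.sum()), 6))  # soft labels still sum to 1.0
```

With small `alpha`, the Beta draw usually lands near 0 or 1, so most mixed samples stay close to one of the originals, which is what makes the trick a gentle regularizer rather than a drastic distortion.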
Meng Fanqi had also prepared groundbreaking work in three other directions, staking out three key areas.
Of the work tied to this competition, only the residual network truly counted as groundbreaking. The other three, though masterpieces in their respective directions, could hardly be called the foundation of a subfield.
Writing them up in detail was a matter of necessity: to guarantee DreamNet's performance and training speed, Meng Fanqi had to use those tricks, and to make sure such important results could be reproduced by the industry, he had to document the techniques thoroughly. Given the choice, he would not have rushed them out.
What he really hoped to lay out first was, one, the idea he had discussed with Dean Fu: the generative adversarial network. It was the most promising and elegant label-free learning method of the coming years, a milestone that later generative techniques would find hard to bypass.
Two, a real-time detection network built on a new idea, which would make it far faster and more accurate to identify objects and locate them in an image. Face recognition, autonomous driving, industrial inspection: the most widely deployed detection technologies of the future would all rest on this kind of speed-up.
Three, the most concise and easy-to-use segmentation network, U-Net. It would become the baseline for complex segmentation tasks and dominate the field of medical imaging.
These three, plus the residual network, covered the four major fields of classification, detection, segmentation, and generation, occupying the four main tracks of image algorithms.
Choosing only image technologies also helped everything look plausible. As for language, speech, and multimodal fusion algorithms, he planned to bring those out a little later.