Chapter 50: You Only Look Once: YOLO
Li Yanhong recalled that during the exchange and interview that day, he had basically lost the initiative.
At first, his core plan had been to recruit talent and learn the technical details of DreamNet.
But as soon as Meng Fanqi got in the car, he cheerfully handed him a copy of the DreamNet paper.
That directly threw off his rhythm, and every step afterward only made it worse.
Meng Fanqi brought up the details of his communication with Alix and Hinton, politely declined the recruitment offer, talked about the roadmap for AI models, and lured him into proposing technical cooperation.
Then he suddenly pulled out such a shocking algorithm, as if he had never intended to mention it in the first place.
"When you think about it, it feels a bit like a magic trick. First divert your attention and hide the real intention. Then seize the opening and strike while you are fooled."
With several technicians led by Yu Kai voicing strong doubts, Li Yanhong couldn't help but entertain such a thought.
After all, Meng Fanqi had only given some experimental results at the time and no other information.
If the situation really was as Yu Kai said, with the performance improvement coming from applying DreamNet downstream while the detection speed had not actually improved, it would still be a sizable breakthrough.
It just would not reach the point of being worth the direct intervention of the company's CEO.
However, the feeling that it was "as if he had never intended to mention it" was really no wrongful accusation against Meng Fanqi, who had originally planned to use this algorithm to negotiate directly with Google.
But after Li Yanhong proposed technical cooperation, Meng Fanqi thought it over for a while and concluded that cooperating with Baidu first would be very beneficial for him.
First of all, Baidu's shortage of AI technology was far more urgent than Google's. Li Yanhong had also come out to talk with him in person, so the same technology could fetch a higher price at Baidu.
Secondly, it had only been a few months since Google gave him a letter of intent. If he could secure this kind of proactive technical cooperation with Baidu in that time, it would greatly strengthen his bargaining power and his room to negotiate.
After all, large companies are full of internal factions, and resources all have to be fought for.
He had no track record and no outside connections, and he would be arriving in Silicon Valley as a stranger. If he ran short of computing resources, it would set him back considerably.
Of course, the most important consideration was access to the Chinese government's resources.
Detection was the AI technology with the widest use among government agencies at this stage. Detection algorithms could automatically flag the key time periods in footage from hundreds of millions of cameras, and they could also enable more secure, high-precision real-time face detection. It was a very large market.
He planned to go to Silicon Valley early next year, and if he wanted to connect with official channels in Huaguo, he would still need to rely on a large Internet company like Baidu.
At this time, Baidu had not yet shown the steep decline it would ten years later. Baidu, Penguin, and Ali were currently the top three, so it was still of great value.
Li Yanhong had considered this as well. He knew the Chinese government better than Meng Fanqi did, and he was eager for the potential opportunities.
Since he wanted to win in this direction, he would follow the old rule: do not employ those you doubt, and do not doubt those you employ. Li Yanhong still had that much courage.
Of course, the main thing was that the contract had not been signed yet.
"To put it bluntly, you have nothing to worry about. We will only sign the contract after the results pass acceptance, and by then you will have reviewed the code and reproduced the results yourselves. Even if you can't trust others, can't you trust yourselves?"
Li Yanhong quickly adjusted his attitude. "It is not appropriate for us to take such a questioning stance from the outset. When the time comes, we need to tone it down and pay attention to how we go about it."
Meanwhile, Meng Fanqi, who knew nothing of what was going on behind the scenes, was preparing to go to Baidu's Yanjing headquarters.
Even as a reborn person, he had overestimated the detection technology of the day.
The first real application of deep learning to object detection was R-CNN, the region-based convolutional neural network, which had been proposed just this month.
While the mAP of traditional algorithms had stalled at 30 to 40 and stopped climbing, R-CNN used a neural network to push mAP past 60 in one stroke.
The R stands for region. The detection task, to put it bluntly, is to mark the location or region of each object in an image.
Yet even in 2014 and 2015, when the R-CNN series was the leading high-accuracy algorithm, its inference was extremely slow.
With Oxford University's 2014 VGG network as the backbone, it took tens of seconds to process a single image. Real-time use was out of the question, so it served only for academic research and was hard to deploy in industry.
Even a year or two later, the repeatedly upgraded Fast R-CNN series only managed somewhere between 0.5 FPS and single digits.
And then there was the algorithm Meng Fanqi had presented: YOLO. Even on 448 x 448 images, its speed exceeded 80 FPS.
If inference was run with the smallest version of the model, the speed could even reach an astonishing 200 FPS.
Even ten years from now, how many people's games will fail to reach 100 frames per second?
The original YOLO was not very accurate. As a detection technique focused on speed, it inevitably sacrificed some accuracy.
But by the time Meng Fanqi first came into contact with YOLO, it had already reached V4, and by 2023 it had even reached V7 and V8.
As for the many flaws in the details of the original, even if Meng Fanqi had wanted to reproduce those mistakes, he would not have known how.
What he remembered first was the already optimized version.
At this point, the most commonly used detection technique was DPM, which reached 26.1 mAP at 30 FPS and only 16.0 mAP at 100 FPS.
As for R-CNN, which had come out just this month, its accuracy was a qualitative breakthrough at 50 to 60 mAP, but its speed had fallen to a fraction of one frame per second, making it unusable in practice.
Meng Fanqi had handed over results of 69.5 mAP at 82 FPS and 58.3 mAP at 200 FPS.
This could no longer be called merely surpassing the field. It was an utter rout.
Apart from this oversight, though, Meng Fanqi had in fact deliberately wanted to push the performance higher.
Of all the AI technologies at his disposal, detection was the one that could be monetized fastest at this stage.
It was straightforward, blunt, easy to understand, and easy to show off.
He only had to connect a camera and demonstrate it live. Watching the AI smoothly detect common objects on screen, such as tables, chairs, people, animals, and plants, would give an audience the most direct shock.
Technologies such as image generation and language dialogue would still need time, massive data, and computing resources before they could be realized.
In terms of practical application, detection was not only the easiest technology to deploy at this stage, but its future prospects were also very broad.
In two or three years, companies working on autonomous driving would be as numerous as carp crossing a river.
Doing his utmost to make an exaggerated breakthrough in detection would help secure his historical standing in this field, and, to put it bluntly, it would also make it easier to coax out money.
It was just that this was his first time wielding the knife. Inexperienced, he had not made the cut cleanly, and he had inadvertently caused a misunderstanding among the more professional people.