Chapter 14: Breakthrough! New input method artifact

Duke threw himself into his research like a man possessed. His time was precious now and there was none to waste, so he hurried back to his little home right after eating. Sitting at the computer that ran downloads 24 hours a day, he frantically searched for and downloaded all kinds of voice clips, then handed them to Kerry for speech semantic analysis and the construction of a basic knowledge base.

Since moving into the rental house, Duke had downloaded thousands of audio clips from every kind of environment and context on the Internet: TV and radio news clips, film and television dialogue, science and education commentary from Animal World or the National Geographic Channel, and all kinds of candid footage, selfies, and pseudo-selfie live clips. Thanks to Cool Potato, YouTube, BT and eMule, Duke discovered just how colorful the world's sounds really were.

The large number of sound clips Duke collected was a drop in the ocean for Kerry's processing power. Often, as soon as a clip was input, Kerry had already calculated and parsed its phonetic and semantic features and added a new specimen to the feature database for speech recognition. The more a clip differed from the others, the more valuable it was.

It is like a person who has lived in many places and can understand many local accents. Every voice is made up of some general characteristics and some special ones, and today's speech recognition software actually recognizes standard pronunciation very well.

For example, IBM launched the speech recognition input system ViaVoice many years ago, and the software recognition rate can reach the practical level in quiet environments and standard pronunciation conditions.

Unfortunately, real-world conditions are never so ideal. Take the four people in Duke's dormitory: everyone spoke Chinese, but the four came from different places and their accents were very different. They often had trouble communicating when they first lived together, but everyone adapted quickly.

Today's computers are no match for the human brain's powerful learning ability. Existing speech recognition software has no such capacity to learn and adapt; that is, it has no knowledge base for identifying these differences in speech features, so of course it cannot recognize unfamiliar kinds of pronunciation well.

The recognition of different accents and the elimination of ambient noise are two difficult problems in speech recognition, and solving these problems requires a large number of first-hand speech data fragments to build a massive knowledge base of speech features, or to develop a highly intelligent supercomputer like Kerry.

Based on the theoretical data downloaded by Duke, combined with the analysis of various speech fragments, Kerry has continuously updated the basic algorithm of speech recognition, and generated different speech recognition simulators - this is mainly due to the fact that the computing level of the current mainstream computers on Earth is too low compared to Kerry.

Using 50% of the computing power of a simulated iPhone 4S as the minimum benchmark, Kerry simulated the accuracy and reaction time of the speech recognition algorithm under different performance conditions. At that baseline performance, the original version achieved 90% recognition accuracy within 5 seconds. Of course, this achievement is far beyond the level of all speech recognition software on the planet today.

It is important to know that this 90% accuracy came from simulated recognition tests on thousands of Chinese and English speech samples with different accents in different contexts, which means that filtering for accents and noise was already largely accounted for.

This score is already much better than Apple's Siri, which for now understands only English, and only relatively standard English pronunciation at that. If you don't believe it, play Siri some English audio clips with Indian or Singaporean accents and see how much it can recognize.

On a computer whose simulated performance is close to dual-core 2G or above, recognition improves to more than 97% accuracy in under 2 seconds. Response time and recognition accuracy actually pull against each other, because recognizing more accurately requires a richer base of source data in the original version of the speech corpus.

The more extensive the voice sampling, the higher the recognition accuracy; but the larger the speech sample database, the longer searching and matching take, which lengthens the reaction time. So the compression of speech samples and the voice search-and-matching algorithm were always the two focuses of Kerry's optimization.

Kerry kept simulating and improving the algorithm for extracting speech semantic feature values. On one hand, he steadily shrank the speech sample corpus without distortion by compressing out redundant values; on the other, he kept improving the intelligent search and matching algorithm for the corpus.

Duke couldn't help with optimizing the algorithms, but he had no problem collecting as many voice samples as possible. So he lived a very full life every day, searching for and downloading different types of speech samples for Kerry to analyze and refine, while constantly studying the new processing algorithms Kerry created.

To knock on the door of MIT, Duke needed an innovative paper on the basic theory of speech recognition that reflected his ability. But there was no ready-made speech recognition knowledge in Kerry's knowledge base: the subject was too old for Kerry, so old that Billem had never added it.

What Kerry is doing now is to build on the existing speech recognition theories and algorithms on the planet, and use his powerful simulation capabilities to continuously simulate a variety of different speech processing algorithms.

Kerry looked for more effective methods through simulation. The approach was a bit clumsy, but with Kerry's super computing power, thousands of candidate algorithms could be simulated every second, which made even this clumsy method quite effective. It turned up several possible optimizations and raised the recognition rate and reaction time to a new level.

However, writing up these results in a language and theory that earthlings could understand was a new challenge for both Kerry and Duke, because Kerry's way of thinking was not mechanical and binary, built on 0s and 1s, but biological and polymorphic.

Kerry could now simulate, in an instant and all at once, more than a dozen PC virtual machines of differing performance that were common on Earth. To let Kerry accurately understand the computing power of Earth's computers, Duke had bought four hosts with different interfaces and nearly 20 mainstream PC CPUs on the market for Kerry to analyze and benchmark, and Kerry then built a corresponding virtual simulator for each configuration.

However, nobody else needed to understand these special virtual machines, so Kerry could build them according to his own computing methods. Although their performance was comparable to the real hardware, the implementation was very different, and the gap was on another order of magnitude from the difference between RISC and CISC, the two CPU architectures on the planet.

Therefore, after completing an algorithm according to his own model, Kerry had to re-implement it according to Earth's binary rules, which was a huge challenge for him. On top of that, the paper required another layer of abstraction: not only implementing the algorithm in software, but also establishing a mathematical model that could be proved on the basis of Earth mathematics.

As a result, Kerry worked almost 24 hours a day until the simulated algorithm achieved a 97% recognition rate within 1 second on the minimum benchmark, and it took another two weeks to achieve a recognition rate of more than 99% within 1 second on a dual-core 2G computer.

After Duke read more than a dozen mathematical monographs and downloaded and studied several open-source speech recognition software, Kerry completed a paper on a new speech recognition algorithm and assisted Duke in developing a speech recognition software that runs on an Earth computer. And the first application of this speech recognition software is to package it as a voice input method.

Cape Forum. With both the speech recognition software and the paper finished, Duke was now at ease.

He registered a sockpuppet account and joined a discussion thread about the development of Kerry's war plot. To test the new software, he talked into the computer microphone, trying to imitate a variety of tones and accents, and his words were quickly recognized by the computer and turned into text replies to the character and plot analyses posted by the forum's young people.

Duke knew the plot well, so his analysis was naturally clear and well argued, often a long paragraph of incisive commentary, and it quickly attracted the fans' attention. With voice recognition input, of course, even though Duke's replies were substantial, each one still arrived faster than anyone else's in the forum.

It felt as though he was no slower than a professional stenographer.

"Hey, buddy, what input method are you using? Why do you reply so quickly, almost in seconds?" A Wen Qing, one of the forum's literary youths, finally couldn't contain his curiosity about Duke's flying reply speed and had to ask.

What input method? Duke was stunned for a moment before he understood. In testing the voice input method he had just developed, he had forgotten to control his speed, and he hadn't expected such blazing instant replies to attract attention without his noticing.

"A new type of voice input method," Duke said in a Tieling accent like Lao Zhao's, and the computer immediately converted his speech into accurate text on the screen. Many samples of Lao Zhao's voice had been recorded, so the recognition rate was naturally no problem at all.

In the discussion just now, Duke had tested every way of speaking he could come up with, and the recognition accuracy had been 100%. The only background noise was the TV turned down low, which was still some distance from a complex noise environment. But given that Duke kept changing accents and tones, reaching this level made one thing almost certain: the era of keyboard input was over, and the launch of this voice input method would announce the beginning of a new input era.

"Hey, buddy, you're teasing me. I've used the penguin voice input method, and it's nowhere near your speed and accuracy," the Wen Qing replied in disbelief.

"Hehe, I just got the internal test version. Oh, it's the Sala input method. If nothing goes wrong, you'll be able to download the preview version from major websites soon." Duke remembered Apple's Siri and replied by casually making up a similar software name.

"Real or fake? Which company developed such a great input method?"

"This is the latest work that the company has just developed. It's being tested, hehe, but it really works. It's a nice feeling to get rid of the keyboard."

"Paid or free? If it's free, can you send me a copy of your beta version, my email address is xxx@"

"Big brother, kneel and beg for a [email protected]"

Soon the thread drifted off topic. More and more people began to pay attention to the conversation between the two, and in the end they all joined the ranks of those asking for the Sala input method. For a while, the screen was full of replies about Sala's voice input method.

Duke, who had once again created a sensation, had not expected a software test to evolve this way, which showed just how widely this voice input software could be applied. But this time Duke did not hot-headedly agree. Even with his lack of emotional intelligence, he knew it was absolutely inappropriate to give the software away for free at this point. It showed that as his IQ surged, and especially after his negotiations with the two editors, Duke's emotional intelligence was showing some small sign of progress too.

The real-world trial was a great success, and the new speech recognition algorithm had been validated without a problem. Duke confidently submitted the electronic manuscript of the paper to JACM, the top journal in the computer industry. Publishing a paper there would undoubtedly prove his strong research ability, more effectively than recommendations from a hundred well-known professors. Combined with Duke's impeccable GRE score, applying for a Ph.D. at MIT should be no trouble.
