Chapter 100: Implications of Medical Literature Retrieval for Scientific Research
We develop our own work on the foundation laid by our predecessors. This can be understood through an idea from algorithms: store the intermediate values of a computation to eliminate redundant operations, so that limited computing resources can search a larger space and thus have a greater probability of finding the optimal solution. Literature search is therefore the starting point of any project; we must stand on the shoulders of giants.
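The idea of storing intermediate values to avoid redundant work is memoization. A minimal sketch, using the classic Fibonacci recursion as a stand-in for any computation with overlapping subproblems:

```python
from functools import lru_cache

# Without the cache, this recursion recomputes the same subproblems
# exponentially many times; the cache stores each intermediate value
# so every subproblem is solved only once.
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # → 12586269025, computed in ~50 calls instead of ~2**50
```

This is the same principle the paragraph describes: building on stored prior results instead of recomputing them from scratch.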
The library's resources are what we must take advantage of; they are our gateway to other resources such as websites and databases. Through various search strategies we can then locate the theoretically most useful resources, much as binary search homes in on a target. However low the probability that any single item is useful, as long as the base is large enough, the expected number of useful finds will be greater than zero.
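For reference, the binary search the analogy invokes, here over an invented sorted list of publication years:

```python
def binary_search(sorted_items, target):
    """Locate target in a sorted list in O(log n) comparisons."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1  # not found

# Illustrative data, not from any real corpus.
years = [1998, 2003, 2007, 2012, 2015, 2019, 2021]
print(binary_search(years, 2015))  # → 4
```

Each comparison halves the remaining search space, which is the sense in which a well-structured search reaches the useful result far faster than exhaustive scanning.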
A MeSH (Medical Subject Headings) term can stand for many synonyms, which increases our retrieval rate. To some extent, this well-defined vocabulary can be understood as a higher-dimensional concept, something like an eigenvector of a matrix, which can serve as a basis for further combinations.
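A minimal sketch of this synonym expansion. The table below is a toy stand-in for a real MeSH entry-term mapping; the terms are illustrative, not the official vocabulary:

```python
# Toy synonym table standing in for a real MeSH entry-term mapping.
MESH_SYNONYMS = {
    "myocardial infarction": ["heart attack", "MI", "cardiac infarction"],
    "neoplasms": ["tumors", "cancer", "malignancy"],
}

def expand_query(term: str) -> str:
    """OR together a subject heading and its entry terms."""
    variants = [term] + MESH_SYNONYMS.get(term.lower(), [])
    return " OR ".join(f'"{v}"' for v in variants)

print(expand_query("myocardial infarction"))
# → "myocardial infarction" OR "heart attack" OR "MI" OR "cardiac infarction"
```

Searching on the expanded query retrieves papers that use any of the variant phrasings, which is exactly why a controlled vocabulary raises the detection rate.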
Life can be abstracted into many basic objects, such as genes and proteins, which form many relationships through combination. Of these exponentially many possible relationships, only a few are meaningful in reality; exploring them is the exploration of statistically significant relationships, whose existence can be demonstrated through various experimental means. This suggests an idea: can we simplify this literature-research process, which consumes enormous manpower and material resources, into a computational search? Based on rules distilled from our experience, we could count the keywords of published literature and finally surface the research ideas that may be valid. Such an approach could eliminate a great deal of unnecessary waste and let people focus their limited energy on exploring genuinely new territory. In general scientific research, we obtain candidate relationships through data analysis and then verify them. Everything is, in a sense, recombination: so-called innovation connects objects with a low prior probability of being combined, so that a nonlinear effect can emerge. The ideas behind search-engine development may therefore help us build an automatic proposer of scientific ideas: first crawl data at scale (as crawlers fetch web pages), then store it in a database and expose it through indexes, so that the system can accept user-submitted query strings and return the results we need. These results must then be ranked according to certain rules, such as the PageRank algorithm or text-matching algorithms.
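The store/index/query/rank pipeline can be sketched in miniature. This is a hedged illustration, not a real search engine: the documents are invented one-line abstracts, and term frequency stands in for PageRank or more sophisticated text-matching scores:

```python
from collections import defaultdict

# Invented one-line "abstracts" standing in for a crawled corpus.
documents = {
    1: "ace inhibitors reduce proteinuria in chronic kidney disease",
    2: "gene expression in chronic kidney disease models",
    3: "ace inhibitors and ace related blood pressure control",
}

# Inverted index: term -> set of document ids containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query: str):
    """Return ids of documents containing every query term,
    ranked by how often the terms occur (a crude relevance score)."""
    terms = query.lower().split()
    hits = set.intersection(*(index[t] for t in terms)) if terms else set()
    score = lambda d: sum(documents[d].split().count(t) for t in terms)
    return sorted(hits, key=score, reverse=True)

print(search("ace inhibitors"))  # → [3, 1]: doc 3 mentions "ace" twice
```

The same skeleton underlies the proposed idea generator: the "documents" become papers, the index becomes the database interface, and the ranking rule becomes whatever importance metric we trust.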
We can filter out large amounts of junk information by directly extracting objects such as genes and proteins (ontology terms, different subject headings) from the title, abstract, and body of each paper, and then constructing relationships between these objects by matching keywords such as "promotes" and "inhibits". After establishing these pairwise relationships, we can combine them through more complex logic (AND, OR, NOT) to build relationships among larger sets of objects. Machine learning algorithms can learn, through statistics, high-dimensional patterns that may exist and can guide the formation of these relationships; in essence this establishes linear factors one by one, with enough hidden layers to compose those linear factors into a nonlinear pattern. Finally, we can model the resulting network and generate outputs in response to external inputs (e.g., gene knockout, RNAi).
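The keyword-matching extraction step can be sketched with a hand-written pattern. This is a toy rule over an invented entity list (the gene names and the example sentence are illustrative); real systems would use curated ontologies and trained named-entity recognizers:

```python
import re

# Toy entity and relation vocabularies, invented for illustration.
ENTITIES = ["TP53", "MDM2", "VEGF", "p21"]
RELATIONS = ["inhibits", "promotes", "activates"]

# Match "<entity> <relation> <entity>" spans in free text.
pattern = re.compile(
    rf"({'|'.join(ENTITIES)})\s+({'|'.join(RELATIONS)})\s+({'|'.join(ENTITIES)})"
)

def extract_triples(text: str):
    """Return (subject, relation, object) triples found in the text."""
    return pattern.findall(text)

abstract = "We show that MDM2 inhibits TP53, while TP53 activates p21."
print(extract_triples(abstract))
# → [('MDM2', 'inhibits', 'TP53'), ('TP53', 'activates', 'p21')]
```

Each triple is one of the pairwise relationships the paragraph describes; accumulating them across a corpus yields the network to be modeled.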
SCI indexing is, in effect, a method of ranking importance: the more a paper is cited, the more important it is taken to be, and a similar metric can be used to judge which specific relationships between objects are more likely to be meaningful. After all, research topics do not fall from the sky; they have a historical lineage, which parallels the computer-science practice of storing intermediate values to avoid repeated computation. If we view research as brute-force enumeration of relationships between objects, we cannot simply try random combinations (ABCD-EFGH); that algorithm would be intractably complex, and we do not have the computational resources. Heuristic search, by contrast, obtains satisfactory results at a tolerable cost: standing on the shoulders of giants gives a greater probability of finding meaningful relationships (as in Bayes' formula). Most research output is therefore an extension of previous work and can be regarded as a linear combination of linearly independent basis vectors, while a small amount of truly pioneering work discovers new basis vectors, allowing us to form more combinations. To some extent, research can be seen as finding optimal parameters (just as machine learning algorithms adjust parameters through data feedback) and determining specific paths to explore between chosen objects; as with linear combinations of a basis in linear algebra, there will in theory always be certain combinations with greater biological significance.
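Citation-based importance, in the spirit of SCI counts, can be sketched with a simple PageRank iteration: a paper cited by important papers is itself important. The citation graph below is invented for illustration:

```python
# paper -> list of papers it cites (invented graph).
citations = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": [],
    "D": ["C", "B"],
}

def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph."""
    papers = list(graph)
    rank = {p: 1 / len(papers) for p in papers}
    for _ in range(iterations):
        new = {}
        for p in papers:
            # Rank flows to p from every paper q that cites it,
            # split evenly among q's references.
            inbound = sum(rank[q] / len(graph[q]) for q in papers if p in graph[q])
            new[p] = (1 - damping) / len(papers) + damping * inbound
        rank = new
    return rank

ranks = pagerank(citations)
print(max(ranks, key=ranks.get))  # → C, the most-cited paper
```

The same scoring idea transfers from papers to extracted relationships: relationships supported by highly ranked sources deserve more attention.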
In fact, the search formula we construct when doing a literature search is a way of using a search engine to mine possible relationships. After enough relationships have accumulated, we can operate at something like the level of antiderivatives in calculus, where a simple operation at a high level is equivalent to a complex operation at the low level, and obtain a more definite relationship from a higher vantage point. For example, Professor Hou Fanfan's paper shows that ACE inhibitors can reduce the risk of non-diabetic advanced chronic kidney disease progressing to end-stage renal failure, a conclusion that rests on a series of underlying molecular mechanisms.
Scientific research is essentially the search for relationships between different objects, and standing on the shoulders of predecessors is necessary to avoid a great deal of redundant computation (a computer-science reading of why duplicated research is wasteful). Our literature search is therefore crucial: we need to find everything relevant, and find it accurately. On that basis, we read the literature to understand the general trends and research framework; starting from the literature itself, we eventually summarize and visualize those trends (reading is itself a form of statistical analysis), laying the foundation for proposing new topics. Efficient reading, management, and utilization of the literature are therefore necessary.
We should establish a knowledge management system to manage the literature we collect and to analyze indicators such as importance ranking (by citation count). We first construct a search query (here we can borrow the AND/OR/NOT structure of programming, which, like sequence, loop, and branch structures, can express all the combinations we need), then import the relevant literature from the database, and finally display the central data as a mind map (a network model in which central nodes correspond to research hotspots), from which new research ideas may emerge (meaningful recombinations).
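The mind-map step can be sketched as a keyword co-occurrence network whose most central node (here, simply the node with the highest weighted degree) marks a research hotspot. The keyword lists below are invented for illustration:

```python
from collections import Counter
from itertools import combinations

# Invented keyword lists from three "retrieved papers".
paper_keywords = [
    ["fibrosis", "TGF-beta", "kidney"],
    ["TGF-beta", "kidney", "inflammation"],
    ["fibrosis", "kidney", "biomarker"],
]

# Count how often each pair of keywords appears in the same paper.
edges = Counter()
for keywords in paper_keywords:
    for a, b in combinations(sorted(set(keywords)), 2):
        edges[(a, b)] += 1

# Weighted degree: total co-occurrence weight touching each node.
degree = Counter()
for (a, b), weight in edges.items():
    degree[a] += weight
    degree[b] += weight

print(degree.most_common(1))  # "kidney" co-occurs with the most terms
```

A real system would run this over thousands of papers and use richer centrality measures, but the principle is the same: the densest node in the co-occurrence network is the hotspot, and its under-explored neighbors are candidate recombinations.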