Chapter 36 Bioinformatics Reflections
The explanation of the theory, observation is the first order that we generally recognize, we need to know that the probability network behind it is the most basic, the observation of the network should use a new scientific paradigm, consider the introduction of the yin and yang dichotomy of Bagua and its multi-order differentiation, the sequence 10100100101110101. Pen @ fun @ pavilion wWw. ļ½ļ½ļ½Uļ½Eć info hypothesis:1. The differentiation of times has a certain radius and boundary, and anything can be equated with a certain level or a set of specific levels (fixed point theorem and Fourier analysis)2. 10 is the basic division, like various periodic waves 3. A certain homology of the sequence is related to hierarchical coupling, and different degrees determine the distance of the distance4. Distribution is a fundamental function, and each level conforms to its probabilistic law at a large scale
Network generation and network existence are at the same level, which is a hierarchical coupling, macroscopically one is y, and the other is y'
Barabas's theory consists of two assumptions: first, that the number of nodes in the network increases from fewer to more, and second, that new nodes tend to be connected with more nodes with more existing connections. This network has the characteristics of "scale-free", that is, randomly removing some nodes and corresponding edges in the network will not change the topology of the network, so it is also called "scale-*******work" (the formation of high-dimensional structure is not affected by low-dimensional changes, unless quantitative changes cause qualitative changes, and also change the high-dimensional structure. In line with our cognitive structure, there is a certain degree of robustness)
It should be possible to hypothesize that the reconstitution of relationships, the generation and disappearance of which should be a dynamic process, not a change in the number of nodes but an overall change in the number of red positions
Propensity is important background, such as entropy increase, and there is a distribution that tends to be normally distributed, energetic. The Matthew effect of the number of node links, the complexity is one-way
Almost all biological networks are scale-free networks, which are probabilistic networks, and drastic changes in the whole will not affect the whole, and sometimes small changes in molecules lead to drastic changes in the network
Studies have shown that a 10% increase in the expression level of each molecule in the metabolic pathway leads to a 100% increase in final metabolite production, which is an overall change in the probability network of changes
From the perspective of the level of the network, there must be a certain level of immobility, and there should also be at the overall level of the crowd, but at this time it will lose its meaning. Because the high probability is a small range, there is also a certain distribution here. Cancer-associated mutations are prominently present in genes for specific signaling pathways. That is, there is no commonality in a single mutation, a single gene, but there is a strong tendency at the network level. The target does not have to be a single gene, it can be a specific pathway or network, which is a second-order structure and is more robust
Marginal algorithm for markers
Using the method of gene chip detection, the specific expression was first discovered. Next, by detecting the expression of lncRNAs during fasting and resumption of diet in mice, one of the three lncRNAs was screened to have significantly decreased expression after fasting, and the expression was also restored after resuming diet, that is, lncRNA, lncstr, which was related to the ability level in mice. At present, there are many studies on lncRNA discovery, but there is little work to analyze the function of lncRNA, and many of the work cannot be deepened only after making an expression profile of lncRNA or finding differentially expressed lncRNA. An important reason for this phenomenon is that researchers are not familiar enough with functional examination studies to find suitable conditions to screen for certain functionally relevant LNCRNAs. Therefore, this work has given bioinformatics researchers a good inspiration, to pay more attention to biological problems and regulatory mechanisms, and to communicate or cooperate with biologists, which is conducive to the in-depth research work.
Through the comprehensive analysis of microarray data under different conditions in the NCBIGEO database, Zhao Yi's research group found that the genes involved in multiple metabolic regulatory pathways were related to the expression of LNClstr (with great correlation), and further found that the expression of an important rate-limiting enzyme (CYP8B1) in bile synthesis was positively correlated with the expression of LNCLSTR, which is likely to be the target gene downstream of LNCLSTR. This hypothesis was confirmed by subsequent experiments, and the molecular mechanism of LNCLSTR preventing TDP-43 from binding to the CYP8B1 promoter by binding to TDP-43 was completed, thereby relieving the transcriptional repression of CYP8B1 by TDP-43. Unfortunately, the homologous sequence of LNCstr has not been found in human cells, and it is not known whether there is the same LNCRNA in human cells, or whether there is LNCR with different sequences but performs the same function.
The data are integrated with a level of reference for complementarity, which is quantified into a certain sequence (Fourier analysis) and then considers its homology and specificity, which emphasizes high-dimensional structures, such as Mendel's segregation rate and free combination rate of gene quanta.
Specific combinations have a higher probability of forming specialized regions, such as CPG islands, N6-methylated regions, and the like, which is the second-order level
The partial derivative is the ratio of a whole to a part, i.e., a variable, and provided that it is a continuous space, then its order of higher orders is equivalent. The integral is path-independent and is a high-dimensional relationship that is coupled into a loop. The relationship between surface division and reintegration of the curvature, the sum of partial derivatives of multiple dimensions
Variations in dimensions, such as Green's formula, can be used to calculate low-dimensional calculations where the dimension is equal to the higher dimension
The theory of big data is highly abstract, such as the physical force, the relative action, relying on the emergence of probability, what are the laws of mathematics?
The simple path is the path of comparison eigen, using the human body as a catalyst, and the catalyst is essentially causing the wave function to collapse at the path of comparison eigens through multi-probability constraints
The high-dimensional structure of a large-scale sequence is self-programmed and operates like a program, and can behave differently according to different algorithms. Large-scale bacteria can be programmed, and algorithms can be built based on the selective expression of genes, like computer code, DNA sequences, programs, and apps
Propagation is a systematic process of information transmission, and we can consider the combination of macroscopic communication and microscopic cell signaling, cooperation and integration to form a probabilistic network to express characteristics
Omics is a low-dimensional projection of the network, and we can continuously speculate on the various properties of the network according to the similarity between the levels by looking for a certain pattern at the omics level. And this is the law of big data and large numbers based on databases.
Sequence alignment is the basic unit for omics, as nodes are for networks. The network is reduced to sequences, and then the similarity is determined by matrix scoring by sequence matching
homology to determine the time series of evolution is also the probability of functional structural similarity
Protein structure and function prediction is the traversal and ascending dimensionality of sequences, starting with the search for specific modules, followed by the pattern search for combinations and combinations, and then there is multi-level coupling, i.e., screening of the network. Finally, we can deepen our understanding of the mechanisms of biological development, metabolic processes, and diseases.
Moore's Law of Data Growth, Exponential Growth--- Data Management, Interpretation, Utilization
(1) Research on new algorithms and statistical methods, (2) Analysis and interpretation of various types of data, and (3) Development of new tools for effective use and management of data. ---forecast
The clustering of sequences, which is now regarded as nodes, is convergent in the average distance of our network
Sequences with different degrees of matching function are functionally discrete states
Molecular sequences construct numbers, and then use the similarity of sequences to couple together to form a network structure
Moving from the accumulation of data to the interpretation of data is like moving from simple addition, subtraction, multiplication and division to calculus, and can also be regarded as a simple Turing machine
The processing of sequences from the automatic sequencer is a kind of black box processing, drawing on the original discussion of various theories of the human body in traditional Chinese medicine: yin and yang, five elements, meridians, qi and blood, etc., and the sequence matching theories we are looking for are not the same? Of course, their evolutionary speed is different
The distribution function of frequencies at different levels, e.g. base frequencies in gene regions, repeat regions are different: base adjacent frequencies are not independent. Bases adjacent (two, three...... ) is generally not equal to the product of the frequencies of the individual bases. Frequency comparison. Codon correspondence: The number of codons corresponding to different amino acids is different, and since the change of the base at the third position of the codon often does not change the type of amino acid, its convergence range is 3. This is also the mechanism of the Markov chain: the k-order Markov chain assumes that the existence of a base at a certain position in the sequence depends only on the base at the first k position
Repetitive sequences are also a type of module
;