Chapter 87 Data Analysis of Networks

Statistical analysis captures the trends in data and converts them into information. Qualitative and quantitative descriptions operate at different levels, and a sample can serve as a fixed reference point for describing the whole. In essence this is a description in terms of frequency and probability: the distribution of the data is characterized by summary statistics such as the mean and the variance.

The coupling between sequences can be orthogonal, and the connections between levels are probabilistic; describing them calls for a fuzzy, multi-level mathematics.

Data-analysis algorithms such as clustering, dimensionality reduction, support vector machines (SVM), and neural networks carry out a kind of path collapse: they extract information we can understand from an effectively infinite, high-dimensional network. The specific form of this process is consistent with sequence analysis in bioinformatics.

The one-dimensional view is simply up-regulation and down-regulation. Different objects form correlated pathways, such as the signaling pathways we are accustomed to, which are macroscopic pictures built from the relative relationships between protein levels. These give rise to a variety of complex promoting and inhibiting mechanisms. I think the relative competition between these pathways should also be introduced in order to form a higher-dimensional picture.

The two-dimensional view is the specific numerical values, which we can fit to a chosen model.

The three-dimensional view is the emergence of patterns in large-scale data: gene and protein networks regulate expression, forming a multi-level coupled network.

Fundamental trends: entropy increase and resistance to it; power-law distributions (clustering); the correspondence between sequence similarity and functional similarity; the greater probability of functional connections between interacting proteins; modularity; hierarchical traversal and path formation; topological properties of networks; bioinformatics; system dynamics; expression profiles; probabilistic networks and hidden Markov models; compensation and stability of networks; and simulating changes in gene expression under external attack to verify stability (similar to today's gene therapy). With a matrix representation, classification methods such as clustering can infer the position of a particular object from limited information, and from that position infer its other properties. Bayesian inference can then be used to improve the accuracy of these inferences as evidence accumulates.
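
The Bayesian refinement mentioned above can be sketched with a Beta-Binomial update. This is only an illustration under stated assumptions: the quantity being estimated (the probability that an interacting pair shares a function), the uniform prior, and the evidence counts are all invented here and do not come from the text.

```python
# Beta-Binomial updating: a minimal sketch of refining a probability estimate
# as evidence accumulates. All numbers below are invented for illustration.


def update_beta(alpha, beta, successes, failures):
    """Return the posterior Beta parameters after observing new counts."""
    return alpha + successes, beta + failures


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Uniform prior Beta(1, 1): no initial preference (an assumption).
alpha, beta = 1.0, 1.0

# Hypothetical evidence batches: (pairs sharing a function, pairs not sharing).
batches = [(7, 3), (8, 2)]

estimates = []
for shared, not_shared in batches:
    alpha, beta = update_beta(alpha, beta, shared, not_shared)
    estimates.append(posterior_mean(alpha, beta))

# One estimate per batch; each uses all evidence seen so far.
print(estimates)
```

Each batch shifts the estimate, and the posterior from one round serves as the prior for the next, which is the sense in which the accuracy improves continuously.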

The algorithm integrates information at the database level and extracts indicators suited to each level, each carrying a certain amount of information. The nodes of the network form secondary structures such as loops, which can then be traversed upward to form higher-dimensional structures, much like modular design in programming. We can look for correspondences between these structural changes and sequences.

Chapter 1: Meaning; traditional methods; the systems perspective; specific network approaches.

Prediction of interaction networks based on sequences, prediction of hybrid systems, formation of sublayers, role of statistical indicators

Understanding sequence information through networks is coupled with our attempt to reduce the network structure to a sequence. One direction works from the bottom up and the other from the top down, and we expect the competition between them to reach a certain balance. An axiomatic system may be one choice for the core, but real cases also require the coupling of multiple systems, so that their selective expression fits reality more closely. In any case, the network forms different levels that share a certain similarity, can be converted into one another, and can be combined into a high-dimensional structure: the idea of modular knowledge.

Studying one variable at a time, in the manner of a partial derivative, is much less useful in complex network structures than in weakly coupled systems (with relatively independent distributions), because the underlying layer of the network can selectively express substitutes for the missing part; this is the compensation and stability of the network. It is of course a matter of probability: if a particular gene is the central node of a highly connected network, losing it may have more obvious effects, such as lethality or other clear changes in traits. We cannot run an experiment every time we screen out such a fixed point (the current research model). We should instead rely on large-scale computation to let a specific model emerge. That is, we take multiple variables (interactions selected by certain criteria) as the object of study, construct the relationships between the sublayer structures that form clusters at this level (the interactions between modules), connect the modules probabilistically, and finally achieve multi-level integration of information.
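
The contrast between losing a central node and losing a peripheral one can be sketched by deleting a node from a toy network and measuring the largest connected component that remains. The network, the node names, and the choice of "largest component size" as a stand-in for the severity of the effect are all assumptions made for illustration.

```python
from collections import deque


def largest_component_after_removal(edges, removed):
    """Size of the largest connected component once `removed` is deleted."""
    neighbors = {}
    for a, b in edges:
        # Register every surviving node, even if it ends up isolated.
        for node in (a, b):
            if node != removed:
                neighbors.setdefault(node, set())
        # Keep the edge only if neither endpoint was removed.
        if removed not in (a, b):
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen = set()
    best = 0
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search over one component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best


# Hypothetical network: one highly connected gene and five peripheral genes.
EDGES = [
    ("hub", "g1"), ("hub", "g2"), ("hub", "g3"),
    ("hub", "g4"), ("hub", "g5"), ("g1", "g2"),
]

after_leaf_loss = largest_component_after_removal(EDGES, "g5")
after_hub_loss = largest_component_after_removal(EDGES, "hub")
print(after_leaf_loss, after_hub_loss)  # prints "5 2"
```

Removing the peripheral node leaves the rest of the network intact, while removing the hub fragments it, which is the computational analogue of the more obvious effects expected from a central gene.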

Interacting proteins have a greater probability of sharing the same or a similar function. This is a pattern that emerges at the statistical level.
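
One way to check such a statistical pattern is to compare how often interacting pairs share an annotation with how often arbitrary pairs do. The sketch below uses an invented six-protein network; the protein names, function labels, and interactions are hypothetical, and a real analysis would need curated annotations and a significance test.

```python
from itertools import combinations


def shared_function_rate(pairs, function_of):
    """Fraction of the given protein pairs whose annotated functions match."""
    pairs = list(pairs)
    shared = sum(1 for a, b in pairs if function_of[a] == function_of[b])
    return shared / len(pairs)


# Hypothetical annotations (one function label per protein).
FUNCTION_OF = {
    "P1": "repair", "P2": "repair", "P3": "repair",
    "P4": "transport", "P5": "transport", "P6": "signaling",
}

# Hypothetical observed interactions.
INTERACTIONS = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"), ("P3", "P4"), ("P5", "P6")]

# Rate among interacting pairs versus the background rate over all pairs.
interacting_rate = shared_function_rate(INTERACTIONS, FUNCTION_OF)
background_rate = shared_function_rate(combinations(sorted(FUNCTION_OF), 2), FUNCTION_OF)

print(interacting_rate, background_rate)
```

In this toy data three of the five interacting pairs share a function, against four of the fifteen possible pairs overall.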

Clustering on shared features (expression trends) lets us extract a high-dimensional picture in which different classes are connected along certain paths. First, proteins with the same co-expression pattern may have the same function; then various fixed-point indicators introduce connections between different patterns; and finally the relationships among all proteins are constructed at the network level. All of this is built on the correspondence of basic correlations.
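
The first step, grouping by expression trend, can be sketched as follows: compute the Pearson correlation between expression profiles and place genes in the same cluster when they are linked by correlations above a threshold. The profiles, gene names, and the 0.9 threshold are invented, and single-linkage grouping is just one simple choice among many clustering methods.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def coexpression_clusters(profiles, threshold):
    """Group genes linked by correlations >= threshold (single linkage)."""
    parent = {gene: gene for gene in profiles}

    def find(gene):
        # Union-find lookup with path halving.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for gene in profiles:
        clusters.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical expression levels across five conditions.
PROFILES = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],   # rises with geneA
    "geneC": [5, 4, 3, 2, 1],    # falls as geneA rises
    "geneD": [9, 7, 5, 3, 1],    # falls with geneC
}

clusters = coexpression_clusters(PROFILES, threshold=0.9)
print(clusters)  # prints [['geneA', 'geneB'], ['geneC', 'geneD']]
```

The later steps described above (connecting the clusters and building relationships among all proteins) would operate on the groups this produces.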

Microarray technology / yeast two-hybrid / co-immunoprecipitation → large-scale data production → data analysis → pattern mining

Network-level data: in a matrix representation, matrix multiplication corresponds to path formation, and summing the combinations corresponds to the coupling of multiple paths, that is, path integration.
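
A concrete reading of this, offered as a sketch rather than as the author's formalism: for an adjacency matrix A, entry (i, j) of the k-th power of A counts the walks of length k from node i to node j, and summing the powers combines walks of several lengths. The four-node chain below is an invented example, and the function names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def identity(n):
    """n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def walks_of_length(adj, length):
    """Entry (i, j) is the number of walks of exactly `length` steps."""
    result = identity(len(adj))
    for _ in range(length):
        result = mat_mul(result, adj)
    return result


def walks_up_to(adj, max_length):
    """Sum of adj**1 ... adj**max_length: walks of every length combined."""
    n = len(adj)
    total = [[0] * n for _ in range(n)]
    power = identity(n)
    for _ in range(max_length):
        power = mat_mul(power, adj)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


# Hypothetical undirected chain of four nodes: 0 - 1 - 2 - 3.
ADJ = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

print(walks_of_length(ADJ, 3)[0])  # prints [0, 2, 0, 1]
print(walks_up_to(ADJ, 3)[0])      # prints [1, 3, 1, 1]
```

Node 3 first becomes reachable from node 0 at the third power, which is the sense in which multiplication forms paths; the summed matrix is a small, finite analogue of integrating over paths.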

Chapter 2: The large-scale data provided by protein interaction networks can be compared along many dimensions, such as topology and expression level. When all paths are traversed, the path integral finally becomes trivial; this is path collapse. The law of conservation governs the inputs and outputs of the dissipative structure. Explicit expressions are not necessarily needed, since the description is probabilistic.

The binding between proteins is a higher-dimensional result, and this interaction network shows a stronger correlation with specific functions. The connection between the expression level of a specific protein and these high-dimensional functions is a probability distribution: a few central nodes have strong correlations, while most nodes are connected with low probability and are generally regarded as fluctuations. The interaction network is thus a high-dimensional structure that corresponds to complex functions.

Studying the protein interaction network of one species allows a certain transfer of knowledge, for example about its various distributions, power-law distributions, small-world models, and so on. The idea of the module is borrowed from programming.
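
Whether a network follows a power law is judged from its degree distribution, that is, the fraction of nodes with each number of connections. The sketch below only tabulates that distribution for an invented five-node network; a network this small cannot demonstrate a power law, and the node names are hypothetical.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the fraction of nodes that have degree k."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())
    n_nodes = len(degree)
    return {k: counts[k] / n_nodes for k in sorted(counts)}


# Hypothetical network: one well-connected node and four sparse ones.
EDGES = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]

distribution = degree_distribution(EDGES)
print(distribution)  # prints {1: 0.4, 2: 0.4, 4: 0.2}
```

For a real interaction network, plotting these fractions against degree on logarithmic axes is the usual first look at whether the tail is heavy.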

Connectivity, distribution differentiation

Machine learning applies pattern recognition to sequences in order to resolve possible relationships, i.e., probabilistic connections between different objects in the network.

Clustering can be carried out at different levels, with different combinations under different criteria. In theory, a specific object can be identified as a whole by a series of characteristic descriptions, which is the idea behind sequences. The correspondence can be exact, but the resources required are too great to be acceptable.

Combined with Bayesian statistical learning, clustering can consider not only distances but also the relative operation of probabilities, and it can take the power-law distribution of the network into account to form internal modules with high cohesion and low coupling.

The molecular level is the lowest sublayer of the biological network, and we can traverse upward from it to build the other layers: cells, tissues, organs, systems, and so on. The levels share a certain similarity, which is the basis for interaction between them. However, the strength of the influence between levels is convergent (for example, a change at the molecular level has the greatest impact on the cell, and its effect on the other layers is attenuated), so interaction is transferred between layers only within this range of influence.

Life is a complex system. The structure of the network is coupled with the transmission of genetic information. Because the network is detailed enough, it can be described in great detail, and these descriptions are the results we recognize. They are the outcome of the selective expression of the organism's network, and the selective expression of the network built from these results is the phenomenon of life as we can understand it.

Omics provides a big-picture view of the network.

Disease is a selective expression of the body's network, manifested as a local imbalance in the number of molecules and as altered exchange between levels (changes in the topology of the network). Symptomatic treatment that changes only the local situation does not restore the overall network to homeostasis, because the network has a certain inertia that counteracts the effort. For the time being, we can understand this as over-clustering that makes the modules too independent and so reduces the transfer of information. This is the recessive (hidden) structure of abnormal gene expression. The molecular mechanism is a collapsed path of the network: we cannot determine the real situation, but our observations at the statistical level can construct a high-probability pathway, which is equivalent.

The central dogma describes the overall trend, but there is also a certain resistance to it, such as RNAi and microRNA, which is the basis on which we can influence the expression of the overall network. At the same time, intrinsic factors such as genes can also regulate network behavior.

The pathogenesis of sickle cell anemia is a molecular-level change that is expressed to a large degree, which follows from the distribution of the network. It also shows that the modules of the network are relatively independent. This is a relatively low level, and since the networks formed by upward traversal are built on these underlying elements, changes here can produce diseases of the overall network, such as chromosomal diseases.

Artificial recombinant DNA is a substitution of the peripheral system, which may have an impact on the expression of the network.

Our genes also communicate with the outside world, because some of them, such as oncogenes, are foreign in origin.

Omics sequencing has produced an exponential explosion of data.

Enzymes are the essence of network control. An earlier hypothesis, "one gene, one enzyme", held that genes control traits by controlling enzymes, and enzymes are also the coupling points between levels. Because building the network requires communication across levels, a certain speed must be maintained; otherwise the network may break down into separate modules (clustering convergence at different levels). Enzymes act as the catalysts that sustain this speed.

Genes are meaningful coding regions, selectively expressed under the influence of non-coding regions whose meaning we do not yet understand. This is a distribution over coding DNA, RNA, and so on, and it reflects hierarchical convergence based on omics big data.