Chapter 77: The Methodology of Network Philosophy

From our current point of view, the reductionist program of biological research, that is, the decomposition of biology into pathways and even into individual protein-gene interactions, is really a continuation of the calculus of Newton's time: treat biology as a complicated function, decompose it into pieces small enough to count as infinitesimals, and then sum them back up, that is, integrate, ascending step by step to reconstruct a sufficiently high-dimensional structure.

In principle, this method can approximate the real biological network ever more closely, because we can assume the network is a sufficiently high-dimensional structure; in our experience so far, there is little that cannot be explained from a higher-dimensional perspective, and if something resists explanation, we simply rise to a still higher dimension.

If we wanted to, we could treat a Hilbert space as a model of the network, since infinitely many dimensions can accommodate anything. Of course, this is a mathematical idealization, and the amount of computation is hopelessly large, so we need to develop all kinds of algorithms to simplify it.

And then there is the great monster of Gödel's incompleteness theorems. Yes, we are thinking of the construction of a Hilbert-style formal system as the process of building a model of a network.

(I make no comment on the truth or falsity of this inference/flight of fancy.) This scientific paradigm can be said to have laid the foundation for the brilliant achievements of modern science, and this reductionist way of thinking does not exclude holistic thinking; after all, the whole is just a sufficiently high-dimensional structure that can be reconstructed by integrating upward level by level.

In the study of biology, we find a methodology somewhere between absolute reductionism and absolute holism: perturb the underlying molecules, observe the resulting changes at different levels, such as cells, tissues, and organs, and finally relate them to macroscopic biological processes such as disease.

From our current point of view, the construction of such a pathway is probabilistic: only a certain proportion of specific micro-level patterns ever appear as macroscopic, large-scale expression, which may partly explain why the reproducibility of biological papers remains relatively low.

In other words, we believe that the construction of the network is really a probabilistic connection of basic modules at different levels.

The combination of these modules is a Bayesian process: evidence accumulates until the probability of the overall pathway exceeds a certain threshold.
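As a toy sketch of this Bayesian picture: belief in the overall pathway is updated once per concordant module observation until it crosses a threshold. The prior, the likelihoods, and the threshold here are all invented for illustration, not taken from any real pathway.

```python
# Hypothetical sketch: pathway activation as Bayesian evidence
# accumulation. All numbers are invented for illustration.

def posterior(prior, l_active, l_inactive):
    """One Bayesian update: P(pathway active | one module observation)."""
    num = l_active * prior
    return num / (num + l_inactive * (1.0 - prior))

def modules_until_threshold(prior, threshold, l_active=0.8, l_inactive=0.3):
    """Count the concordant module observations needed before belief
    in the overall pathway crosses the threshold."""
    p, steps = prior, 0
    while p < threshold:
        p = posterior(p, l_active, l_inactive)
        steps += 1
    return steps, p

steps, p = modules_until_threshold(prior=0.1, threshold=0.95)
```

With these made-up likelihoods, a skeptical prior of 0.1 crosses the 0.95 threshold after a handful of module observations, which is the "certain threshold is exceeded" behavior described above.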

And then, haha, a paper comes out. This can be explained as network emergence, and of course the annealing of metals is also a good explanatory angle.

The application of existing mathematics, probability theory, graph theory, information theory, and other parts of systems theory, is an attempt to explain this structure.

At present, we tend to model the network structure with sequence analysis, borrowed from bioinformatics, a branch of systems biology.

We use sequences as the basic units for handling the properties of networks, like the infinitesimals of calculus.

Of course, this is just an idea for now. With the limited computer science I have learned so far, we still need to define the sequence and its operations; after all, genomics can run all kinds of operations on large-scale ACGT sequencing data to build databases at different levels, and from them extract high-dimensional information such as protein structure.

A preliminary theoretical reflection on this is given in Appendix 1. If it holds, we will be able to explore the various properties of the network through operations on sequences.

Of course, at the moment this looks like a pipe dream. But if we can really construct such a data structure, we can imitate the equivalence transformations of logical operations to construct new sequences (different concentrations of the proteins of different genes) and so define further operations; my current idea is that game theory, and the attainment of equilibrium, can serve as the mathematical scaffolding for sequence operations.

The paths formed by connections between sequences make up a high-dimensional structure, like the formation of a feedback module.

At the same time, various discoveries in network science, such as Barabási's power-law (scale-free) degree distribution and the small-world model, are also very helpful for constructing networks.
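The scale-free distribution mentioned here comes from Barabási and Albert's preferential-attachment model, which can be sketched in a few lines of plain Python. The node count and edge parameter below are arbitrary choices for illustration.

```python
import random

# Toy sketch of Barabasi-Albert preferential attachment, the model
# behind the power-law degree distribution mentioned above.

def barabasi_albert(n_nodes, m, seed=0):
    """Grow a graph: each new node attaches m edges to existing
    nodes with probability proportional to their current degree."""
    rng = random.Random(seed)
    # start from a small fully connected core of m + 1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # repeated endpoints implement degree-proportional sampling
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = barabasi_albert(n_nodes=500, m=2)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
avg = sum(degree.values()) / len(degree)
# hubs emerge: the maximum degree far exceeds the average,
# which is the signature of the scale-free distribution
```

The rich-get-richer sampling (choosing from the repeated `endpoints` list) is what produces the hubs that dominate real biological networks.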

Of course, statistics is also indispensable, because the occurrence of various situations in the network follows, to some extent, certain distributions.

That means we need to make full use of all kinds of big data, which in turn means using all kinds of mathematics, especially statistical tools.

Given the current trend, we still need to learn programming languages such as C and acquire basic programming skills.

Appendix 1: Today's article is Shinya Yamanaka's Nobel Prize-winning paper (the 2012 prize). Its logic is as clear as the best literature I have read, and it gives a feeling of beauty; especially when you see how many papers are cited in the introduction and discussion, you cannot help feeling that scientific research is a great project built on close cooperation.

Only by building on the work of predecessors like them can we expand the boundaries of human cognition. This is a great inspiration to me; after all, I have a fondness for the holist side of systems theory, I prefer its network theory, and I have always tried to interpret the world from the perspective of networks.

If you remember my earlier modeling of love, it was based on the matching of sequences (which were also given a workable definition there), and the sequence is an object of operations in the network.

But this is all rather abstract thinking, and I have to come down to earth to realize my ideal of network theory.

This article embodies the philosophy of complexity that I want. The mechanism by which four transcription factors can produce pluripotent stem cells, understood from the perspective of the network, is that a specific central node, perturbed by the transfer and expression of exogenous genes, causes a change in the overall network topology, which reactivates the original pluripotent state of the stem cell, like an energy-level transition.

The above is a bit too abstract, so let me try to simplify it, imitating the way theories are built in mathematics.

First, definitions. We define genes at different expression levels as certain sequences, such as Oct3/4, Sox2, c-Myc, Klf4, p53, Nanog, ERas, and β-actin, which can be pictured as undulating curves.

In theory, as long as the defined sequence is long enough, we can map every cell in the world to a specific sequence.

This is theoretically feasible but far too difficult to carry out in practice. The second step is to define operations on sequences as objects, such as addition, subtraction, multiplication, and division.
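A minimal sketch of this first definition, under the assumption that a "sequence" is a vector of expression levels over a fixed gene order, with addition and subtraction acting elementwise. The gene names come from the text; every number is made up for illustration.

```python
# Hypothetical "cell as sequence" representation: a fixed gene
# order, with a cell encoded as its expression levels over it.
# All expression values are invented for illustration.

GENES = ["Oct3/4", "Sox2", "c-Myc", "Klf4", "p53", "Nanog"]

def seq(levels):
    """Build an expression sequence, defaulting missing genes to 0."""
    return [float(levels.get(g, 0.0)) for g in GENES]

def add(a, b):
    """Elementwise addition of two expression sequences."""
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    """Elementwise subtraction of two expression sequences."""
    return [x - y for x, y in zip(a, b)]

fibroblast = seq({"c-Myc": 0.2, "p53": 1.0})
factors    = seq({"Oct3/4": 1.0, "Sox2": 1.0, "c-Myc": 0.8, "Klf4": 1.0})
# "transducing the four factors" pictured as sequence addition
reprogrammed = add(fibroblast, factors)
```

This is only the scaffolding; whether addition is the biologically right composition rule is exactly the open question the text raises.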

As a mathematical structure, following what we know from bioinformatics, we define the basic operation on sequences as matching.

When the expression-level patterns of two sequences are similar, they can be treated approximately as equivalent. In this article, the pluripotency of iPS cells is confirmed by measuring the proteins of various key genes.
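One way to make "matching" and "approximate equivalence" concrete is to score two expression profiles by a similarity measure and declare them equivalent above a threshold. Cosine similarity and the 0.95 cutoff are my own assumptions here, and the profiles are invented.

```python
import math

# Hypothetical sketch of "matching" as the basic sequence operation:
# cosine similarity between expression profiles, thresholded into an
# approximate equivalence relation. All numbers are invented.

def cosine(a, b):
    """Cosine similarity between two expression sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match(a, b, threshold=0.95):
    """Approximate equivalence: profiles are 'the same' above the cutoff."""
    return cosine(a, b) >= threshold

es_cell  = [1.0, 0.9, 0.6, 0.7, 0.2, 1.0]   # embryonic-stem-like profile
ips_cell = [0.9, 0.8, 0.7, 0.6, 0.3, 0.9]   # reprogrammed-cell profile
somatic  = [0.1, 0.0, 0.3, 0.2, 1.0, 0.0]   # differentiated-cell profile
```

Note that thresholded similarity is not a true equivalence relation (it is not transitive), which is why the text is right to call it only an approximation.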

At the same time, since I have always been fond of game theory from economics, we can idealize this sequence-matching process as a competitive game among sequences.

The competition between sequences and the attainment of equilibrium are the basis for the formation of important network paths. The coupling of sequences can form complex relationships: certain coupled pairs are needed to produce a particular effect, and the competitive game of effects spans a certain high-dimensional space, whose equilibrium is the expression of a specific Markov sequence, which in turn determines what is actually expressed in reality.

One key application is the concept of a fixed point: a sequence game can settle into an equilibrium, that is, a fixed point.

This is reflected in the marker gene proteins we commonly use: by measuring specific fixed points, we can determine the high-dimensional structure.

And this brings me to my beloved network theory. The network is a high-dimensional structure formed by the competitive game of sequences, and its selective expression is the formation of specific sequences, which can be regarded as the differentiation of the various cell types.

Thinking in networks is not one-way but a multi-directional, integrated expression. For example, the antiarrhythmic drug Tambocor was meant to suppress premature beats and thereby reduce heart disease, yet the drug may in the end have caused tens of thousands of deaths.

The network view is not merely to suppress or activate one specific object, but to selectively activate and suppress sequences, that is, to act differently on different objects.

The article shows this selective treatment of different sequences leading to the emergence of the final pattern, the generation of iPS cells; for example, Klf4 activates p21CIP1 and thereby inhibits cell proliferation.

This can be understood like the convergence of a Taylor series, where the first one or two orders already dominate. To take a very popular example, the chain of suspicion: I know; you know I know; I know you know I know; and so on. In theory it can be carried on indefinitely, but in reality everyone stops after a few cycles, and that is convergence.
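The Taylor-series analogy can be made concrete with the series for e^x at x = 1: the partial sums are already close to the exact value after a handful of terms, just as the chain of suspicion effectively stops after a few cycles.

```python
import math

# Illustration of the convergence the text appeals to: partial sums
# of the Taylor series for e^x, evaluated at x = 1.

def taylor_exp(x, n_terms):
    """Sum the first n_terms of the Taylor series for e^x at 0."""
    return sum(x ** k / math.factorial(k) for k in range(n_terms))

approx4 = taylor_exp(1.0, 4)   # 1 + 1 + 1/2 + 1/6
approx8 = taylor_exp(1.0, 8)   # four more "cycles of suspicion"
exact = math.exp(1.0)
```

Four terms already land within a few percent of e, and eight terms within about a hundred-thousandth; the later terms contribute less and less, which is exactly the dominance of the low orders.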

Of course, I have found that clever people can carry it a few cycles further, and there is a certain distribution to that. The third step is the extension of various concepts from the definitions.

From established knowledge we know that the expression levels of the various genes in a cell are dynamic, while certain housekeeping genes have stable expression levels; so we can construct sequences from the measured data and discover patterns through sequence alignment.
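This construction step can be sketched as normalizing each measurement against a stable housekeeping reference, so that samples measured at different absolute scales become comparable sequences. β-actin is used as the reference here because the text lists it; every number is invented for illustration.

```python
# Hypothetical sketch of sequence construction: express each gene's
# measured level relative to a stable housekeeping gene, so that two
# samples on different absolute scales yield comparable sequences.
# All measured values are invented for illustration.

HOUSEKEEPING = "beta-actin"

def normalize(measured):
    """Express each gene relative to the housekeeping reference."""
    ref = measured[HOUSEKEEPING]
    return {g: v / ref for g, v in measured.items() if g != HOUSEKEEPING}

# same cell state measured at two different absolute scales
sample_a = {"Oct3/4": 40.0, "Nanog": 20.0, HOUSEKEEPING: 10.0}
sample_b = {"Oct3/4": 80.0, "Nanog": 40.0, HOUSEKEEPING: 20.0}

norm_a = normalize(sample_a)
norm_b = normalize(sample_b)
# after normalization the two patterns align exactly
```

Once profiles are on this common scale, the matching operation defined earlier can be applied to them to hunt for recurring patterns.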

What we can do now is process the significant up- and down-regulation of a limited number of genes; the future direction is to synthesize these data to discover patterns, such as which specific combinations of transcription factors, when transduced, lead to the formation of stem cells, or to specific tumors, and so on.

The pattern found in this article is that high expression of Oct3/4, Sox2, c-Myc, and Klf4 brings the cell's sequence expression close to that of embryonic stem cells, so close that it can be treated as approximately equivalent.