As a pre-med, I studied lots of biology, beginning with invertebrates, which I found fascinating - so much so that I briefly considered becoming a marine biologist (until my dad used his considerable child-of-the-Depression influence to steer me back to medicine - “doctors always eat”, he said). Anyway, we moved on to studying embryology, including morphogenesis - how individual organisms develop into their characteristic body forms. The similarity of different phyla of animals at early stages of development is quite remarkable, leading to the speculative aphorism: "ontogeny recapitulates phylogeny".

In those days before computer graphics, in order to study 3-dimensional development of organisms, organs and tissues, we looked through the microscope at thin serial (cross and longitudinal) sections through embryos in order to mentally reconstruct the 3-D structure at various stages of development. Nowadays, this can be much more quickly learned with animated 3-D graphics. You could say “a graphic is worth a thousand sections”.

Left unasked in those days were two obvious and fundamental questions, as the answers were mere conjecture with little if any science: 1. how do cells differentiate, that is, become structurally and functionally different from each other; and 2. how do they arrange themselves in space so as to form a morphologically recognizable member of the species to which they belong? To take the second question down another level - how, for example, do skin cells in the human finger “know” how to arrange themselves in relation to each other? How can they “know” with sufficient precision to create fingerprints unique to one individual? Is there some kind of biological GPS at work?

Nowadays, there are theories as to both differentiation and cytoarchitectonics. I will set differentiation aside here, as it is a very large topic. I will also say little enough about cytoarchitectonics - because, still, little is known. The most prominent theory involves differential adhesion between cells: as part of the complex process of self-organization, specific adhesion zones are induced on specific parts of a given cell's membrane. These zones, it is thought, tend to adhere preferentially to other specific cells. These adhesion codes, in turn, are hypothesized to be induced by humoral chemicals, called morphogens, emitted by surrounding cells.

I have not had the time to explore much of the current state of understanding of cytoarchitectonics, other than to note that much of the research is aimed at understanding the development of brain connections. Of course, this is an important area of study and it may offer some insight into the general process of cellular self-organization. It remains, however, a specific case, and may diverge substantially from the general process, since neuronal connections - reflecting sensory inputs associated with the experience of the entire organism, i.e. learning - likely depend on environmental factors well beyond those of morphogenesis generally.

In my cursory survey of the current state of knowledge, one thing I have not come across is an informational analysis of cytoarchitectonics. I have found no speculation, even to the nearest order of magnitude, as to the quantity of information required to make order of what might otherwise be a chaotic blob of partially-differentiated cells.

At one end of the information spectrum, perhaps cellular automata provide a model for differentiation and/or morphogenesis; here, a few simple rules can result in quite complex arrangements. At the other extreme, one might ask whether the DNA of a fertilized ovum can contain sufficient information to direct the differentiation and ultimate architectural location of every single one of the cells in the human body. Considering there are an estimated 30 trillion cells in the human body, this is an interesting informational question, indeed.


In one sense, it might be said that the information required cannot be greater than the information content of the organism’s DNA (two bits per base pair), or around 6 billion bits (755 megabytes). That includes all the so-called “junk DNA”, but as I’ve suspected ever since learning of the concept, more and more of it is being discovered to be significant and to function in various ways.
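
The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-envelope illustration: the round figure of 3 billion base pairs is my assumption (it gives 750 megabytes, close to the 755 quoted above), the 30 trillion cell count is the estimate mentioned earlier in the thread, and the function names are made up for the example.

```python
# Back-of-envelope estimate of the raw information capacity of a genome.
# All constants are assumed round figures, not measured values.

BITS_PER_BASE_PAIR = 2  # four possible bases (A, C, G, T) -> log2(4) = 2 bits


def genome_bits(base_pairs: int) -> int:
    """Raw capacity in bits, ignoring any redundancy or compressibility."""
    return base_pairs * BITS_PER_BASE_PAIR


def bits_to_megabytes(bits: int) -> float:
    """Convert bits to decimal megabytes (1 MB = 1,000,000 bytes)."""
    return bits / 8 / 1_000_000


BASE_PAIRS = 3_000_000_000        # assumption: roughly 3 billion base pairs
CELLS = 30_000_000_000_000        # estimate quoted earlier: about 30 trillion cells

total_bits = genome_bits(BASE_PAIRS)
megabytes = bits_to_megabytes(total_bits)
bits_per_cell = total_bits / CELLS  # genome bits available per body cell

if __name__ == "__main__":
    print(f"{total_bits:,} bits, about {megabytes:.0f} MB")
    print(f"{bits_per_cell:.4f} bits of genome per cell")
```

On these assumptions the genome supplies far less than one bit per cell, so it cannot be a cell-by-cell blueprint; whatever it encodes must be closer to a set of rules that are applied over and over.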

Now, that information is meaningless without the molecules that interpret it and manufacture the proteins whose sequences it specifies, plus the three-dimensional structures those chains of amino acids fold into. But from a pure information-theoretic standpoint, since the DNA contains all of the instructions to manufacture more of those molecules, they contain no additional information.

As many sceptics of conventional theories of abiogenesis have observed, creating even 256 bits of functional and precisely structured information by random processes is absolutely impossible within a universe of the age and size of ours. While 755 megabytes may not seem that much in our era of multi-terabyte solid state drives and gigabyte RAM computers, understanding where that information came from is a formidable conundrum.


Thank you. I was hoping you might comment on the informational aspect of this. I don’t understand the informational analysis of cellular automata. The inputs are a set of simple rules for each cell. The outputs appear quite complex when they combine in the rulial (as Stephen Wolfram uses the term, I think) system. Do they actually ‘create’ new information, or are the outcomes bounded merely by the universe of possible permutations of the rules applicable to the individual cells i.e., no new information, just that which is inherent in the permissible combinations determined by the original rules?

Oh yes. The idea of “junk DNA” never made sense to me. Nature is parsimonious.


Information is surprisingly difficult to define, and intertwined with another thorny concept, randomness. A perfectly random sequence has the maximum possible information content, since knowledge of the entire sequence so far provides no information at all on what comes next. But such a sequence also encodes no functional information, so in that sense its information content is zero.

The digits of π pass every test for randomness that anybody has applied to them, and yet they contain very little information, since you can generate as many as you wish with a program of just a few lines. Many computer scientists use the term “algorithmic complexity” to express this: they define the information content of data by the length of the shortest possible program which can generate it. For a purely random sequence, the shortest program is just one that prints the sequence itself, so it has maximum algorithmic complexity. Wolfram’s cellular automaton Rule 30, which can be thought of as a computer program which is 8 bits in length, generates a string of bits which passes all tests of randomness, yet each bit is perfectly predictable from the rule and the three adjacent bits in the row above. Its output thus appears to have very high Shannon entropy, but very little algorithmic complexity.
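
To make the Rule 30 point concrete, here is a minimal Python sketch. The function names and the wrap-around grid are my own choices for illustration, not anything taken from Wolfram's software. The entire "program" is the number 30 read as an 8-entry lookup table; the centre column of the pattern it grows is the bit string usually cited as looking random.

```python
import math
from collections import Counter

RULE = 30  # the whole "program": 8 bits, one output per 3-cell neighbourhood


def step(cells, rule=RULE):
    """Apply an elementary cellular automaton rule once (edges wrap around)."""
    n = len(cells)
    out = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right  # neighbourhood as 0..7
        out.append((rule >> index) & 1)              # look up that bit of the rule
    return out


def centre_column(steps):
    """Bits of the centre cell over `steps` rows, starting from a single 1."""
    width = 2 * steps + 1      # wide enough that wrap-around never reaches the centre
    cells = [0] * width
    cells[width // 2] = 1
    column = []
    for _ in range(steps):
        column.append(cells[width // 2])
        cells = step(cells)
    return column


def entropy_per_bit(bits):
    """Empirical Shannon entropy (bits per symbol) from symbol frequencies."""
    counts = Counter(bits)
    total = len(bits)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    column = centre_column(256)
    print("".join(map(str, column[:32])))
    print(f"empirical entropy: {entropy_per_bit(column):.3f} bits per bit")
```

Note that the frequency count used here is only the crudest measure of randomness; it says nothing about the length of the shortest program that produces the bits, which in this case is tiny.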
