Friday, July 16, 2010

Aristotle in a chip

The reemergence of ancient notions in the modern field of bioinformatics

Aristotle, in his zoological opus Historia Animalium (The History of Animals), launches into his analysis of the animal kingdom by observing differences and similarities between the species. For example, he observes that bats and birds both have wings, so he surmises that they must be grouped together, as should fish and dolphins. By examining animal anatomy and by comparing features such as number or shape of legs (or absence of legs), wings, types of skin, habitats, etc., Aristotle put together a logically coherent taxonomy of animal life that remained virtually unchallenged until Linnaeus. This idea of comparative anatomy, as systematized by Aristotle, is essentially the study of homology (from the Greek word “hómoios”: “similar”) – i.e. of similarities. The idea flowed naturally from Aristotelian Logic and in particular his theory of syllogisms: if A equals B and C equals B, then A equals C. If one replaces “equal” with “similar”, then homology is the logical corollary of equality.

In ancient Greek, and consequently medieval European, thought, homology was explained by ideal archetypes: timeless blueprints designed by a heavenly architect, into which the objects of perceived reality were molded. Darwin’s revolutionary idea was to provide a naturalistic explanation for animal homology, thus ushering in the era of the scientific study of life.

One and a half centuries after the publication of Darwin’s Origin of Species, the modern brethren of his Victorian genius spend much of their time, alas, not aboard adventurous sailing yachts roaming the southern seas, but in front of computer monitors, applying an ever-expanding arsenal of mathematical and computational techniques to the analysis of living organisms.

One of the most significant application areas of bioinformatics – as this contemporary fusion of biology, computer science and mathematics is termed – is in the study of complex molecules, such as proteins.

Proteins, the building blocks of cells, have structures made up from their particular sequence of amino acids (which are, in turn, the building blocks of proteins); the way these amino acid molecules fold in three-dimensional space is what determines the function of a protein. So it is very important for biologists to be able to predict the structure of proteins. What we know is that a protein structure is generally determined by the sequence of the gene that codes for it. And here is where the notion of homology reemerges. It is used to predict the function of a gene. If the sequence of gene A, whose function is known, is homologous to the sequence of gene B, whose function is unknown, one could infer that B may share A’s function. In a technique called homology modeling, this information is used to predict the structure of a protein once the structure of a homologous protein is known.
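The inference described above can be made concrete with a toy example. What follows is only a minimal sketch, not how real bioinformatics tools work: it assumes the sequences have already been aligned (gaps written as "-"), it measures similarity as simple percent identity, and it treats anything above an arbitrary 30% cutoff as "homologous enough". The function names, the example sequences and the cutoff are all invented for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over the positions where neither
    sequence has a gap. Both inputs must already be aligned."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gap positions
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def infer_function(query, annotated, threshold=30.0):
    """Return the function of the most similar annotated sequence,
    or None if nothing reaches the (assumed, illustrative) threshold.

    `annotated` maps a name to a (sequence, function) pair."""
    best_function = None
    best_identity = 0.0
    for name, (sequence, function) in annotated.items():
        identity = percent_identity(query, sequence)
        if identity > best_identity:
            best_identity = identity
            best_function = function
    return best_function if best_identity >= threshold else None


# Hypothetical, made-up sequences: "gene A" is annotated, "gene B" is not.
known = {
    "gene_A": ("MKTAYIAKQR", "oxygen transport"),
    "gene_X": ("GGGGGGGGGG", "structural"),
}
gene_b = "MKTAYLAKQR"
print(infer_function(gene_b, known))
```

Here gene B differs from gene A in a single position, so the sketch hands A's function over to B. The hedge in the text ("may share") matters: similarity makes a shared function plausible, it does not prove it.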

Caveat Lector: biologists beware! Meddling with mathematicians who are, secretly, Platonic devotees, may one day lead you to the defense of positivist naturalism against subversive philosophical attacks from the musical spheres of perfect, ideal, proteins-out-there. Ancient ideas, as you should know, are very hard to beat.

Monday, July 12, 2010

The Word Machine of Lagado

Jonathan Swift published the first edition of Gulliver’s Travels in 1726 and since then it has never been out of print. In Book III, Gulliver is abandoned by pirates on the continent of Balnibarbi. After a visit to the flying island of Laputa, he is taken to the Academy of Lagado, where “useless projects” are undertaken. There, he is given a demonstration of a word machine, which is nothing less than a giant mechanical computer used for making sentences and books. The satirical aspect of Swift’s idea is that the machine renders obsolete any study or expertise; an absolute idiot can write a masterpiece by virtue of cranking the machine. In the post-modern context the irony becomes a tenet: all texts are self-produced; they have a transcendental-bibliographical animus which acts like a virus. Human minds are the hosts of this viral propagation and mutation of texts. The writer “thinks” he is the creator but he is merely an empty vessel, a hapless idiot.

The word machine of Lagado has fascinated computer dreamers too. It is the original idea behind Hilbert’s Ur-algorithm – a logical contraption that, should humanity come to an end, can recreate by itself, automatically, the works and knowledge that were lost. The machine that can write any book. The mathematical formula that can prove every theorem. Thanks to Gödel we now know that such a machine, or algorithm, is impossible to construct. But the fascination with the word machine of Lagado is too strong to let go. Like a childhood dream it returns again and again to haunt the adult life with nostalgia. What if there is a way round Gödel’s incompleteness theorem? What if there exists, somewhere in an infinite multiverse, a word machine like the one dreamt by Swift? What if our thoughts are written in the pages of its infinite books?