Expected Anomalies in the Fossil Record

The problem of intermediates in the fossil record has been frequently discussed ever since Darwin. The extent of 'gaps' (missing transitional stages) has been used to argue against gradual evolution from a common ancestor. Traditionally, gaps have often been explained by the improbability of fossilization and the discontinuous selection of found fossils. Here we take an analytical approach and demonstrate why, under certain sampling conditions, we may not expect intermediates to be found. Using a simple null model, we show mathematically that the question of whether a taxon sampled from some time in the past is likely to be morphologically intermediate to other samples (dated earlier and later) depends on the shape and dimensions of the underlying phylogenetic tree that connects the taxa, and the times from which the fossils are sampled.


Introduction
Since Darwin's book On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life [2], there has been much debate about the evidence for continuous evolution from a universal common ancestor. Initially, Darwin only assumed the relatedness of the majority of species, not of all of them; later, however, he came to the view that because of the similarities of all existing species, there could only be one 'root' and one 'tree of life' (cf. [11]). All species are descended from this common ancestor, and indications for their gradual evolution have been sought in the fossil record ever since. Usually, the improbability of fossilization or of finding existing fossils has been put forward as the standard answer to the question of why there are so many 'gaps' in the fossil record. Such gaps have become popularly referred to as 'missing links', i.e. missing intermediates between taxa existing either today or as fossils.
Of course, the existence of gaps is in some sense inevitable: every new link gives rise to two new gaps, since evolution is generally a continuous process whereas fossil discovery will always remain discontinuous. Moreover, a patchy fossil record is not necessarily evidence against evolution from a common ancestor through a continuous series of intermediates. Indeed, in a recent approach, Elliott Sober (cf. [11]) applied simple probabilistic arguments to conclude that the existence of some intermediates provides stronger support for evolution than the non-existence of any (or some) intermediates could ever provide for a hypothesis of separate ancestry. In addition, some lineages appear to be densely sampled, whereas for others only a few fossiliferous horizons are known (cf. [10]). This problem has been well investigated, and statistical models have been developed to address it (see e.g. [6], [7], [12]).
In this paper, we suggest a further argument that may help explain missing links in the fossil record. Suppose that three fossils can be dated back to three different times. Can we really expect that a fossil from the intermediate time will appear (morphologically) to be an 'intermediate' of the other two fossils? We will explore this question via a simple stochastic model.
In order to develop this model, we first state some assumptions that we make throughout this paper. Firstly, we consider that we are sampling fossil taxa of closely related organisms that differ in a number of morphological characteristics. We assume this group of taxa has evolved in a 'tree-like' fashion from some common ancestor; that is, there is an underlying phylogenetic tree, and the taxa are sampled from points on the branches of this tree.
It is also necessary to say how morphological divergence might be related to time, as this is important for deciding whether a taxon is an intermediate or not. In this paper, we make the simplifying assumption that, within the limited group of taxa under consideration (and over the limited time period being considered), the expected degree of morphological divergence between two taxa is proportional to the total amount of evolutionary history separating those two taxa. This evolutionary history is simply the time obtained by adding together the two time periods from the most recent common ancestor of the two taxa until the times from which each was sampled (in the case where one taxon is ancestral to the other, this is simply the time between the two samples). This assumption on morphological diversity would be valid (in expectation) if we view morphological distance as being proportional to the number of discrete characters that two species differ on, provided that two conditions hold: (i) each character has a constant rate of character state change (substitution) over the time frame $T$ that the fossils are sampled from, and (ii) $T$ is short enough that the probability of a reverse or convergent change at any given character is low. We require these conditions to hold in the proofs of the following results. We will discuss other possible relations of morphological diversification and distance towards the end of this paper.
Figure 1: When the tree consists of only one lineage from which samples are taken at times $T_1$, $T_2$ and $T_3$, then clearly the distance $d_{1,3}$ is always larger than $d_{1,2}$ and $d_{2,3}$. Consequently, $E_{1,3} > \max\{E_{1,2}, E_{2,3}\}$.
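The distance assumption above can be illustrated with a minimal sketch. The function name `path_distance` and its arguments are hypothetical and introduced only for illustration; the sketch assumes that the time of the most recent common ancestor is already known.

```python
def path_distance(t_a: float, t_b: float, t_mrca: float) -> float:
    """Evolutionary history separating two samples (hypothetical helper).

    t_a, t_b: sampling times of the two taxa.
    t_mrca:   time of their most recent common ancestor. If one sample is
              ancestral to the other, t_mrca is the earlier sampling time,
              and the result reduces to the time between the two samples.
    """
    if t_mrca > min(t_a, t_b):
        raise ValueError("the common ancestor cannot be younger than a sample")
    # Sum of the two time periods from the common ancestor to each sample.
    return (t_a - t_mrca) + (t_b - t_mrca)


# Same lineage: the earlier sample is itself the common ancestor.
print(path_distance(1.0, 3.0, t_mrca=1.0))  # 2.0
# Different lineages that split at time 0.5.
print(path_distance(1.0, 3.0, t_mrca=0.5))  # 3.0
```

Under the stated assumption, the expected morphological divergence would be proportional to this value.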
The simplest scenario is the case where the three samples all lie on the same lineage, so that the evolutionary tree can be regarded as a path (cf. Figure 1). In this case, the path distance (and hence expected morphological distance) between the outer two fossils is always larger than the distance that either of them has from the fossil sampled from an intermediate time. But for samples that straddle bifurcations in a tree, it is quite easy to imagine how this intermediacy could fail; for example, if the two outer taxa lie on one branch of the tree and the fossil from the intermediate time lies on another branch far away (cf. Figure 2). But this example might be unlikely to occur, and indeed we will see that if sampling is uniform across the tree at any given time, in expectation the morphological distances remain intermediate even for this case (cf. Figure 2). Yet for more complex trees, this expected outcome can fail, and perhaps most surprisingly, the distance between the earliest and latest sample can, in expectation, be the smallest of the three distances in certain extreme cases. Thus, in order to make general statements, we will consider the expected degree of relatedness of fossils sampled randomly from given times. Our results will depend solely on the tree shape (including branch lengths) of the underlying tree and the chosen times.

Results
We begin with some notation. Throughout this paper, we assume a rooted binary phylogenetic tree to be given with an associated time scale $0 < T_1 < T_2 < T_3$. The number of $T_i$-lineages (i.e. of lineages extant at time $T_i$) is denoted by $n_i$. For instance, in Figure 3, the number $n_1$ of $T_1$-lineages is 3, whereas the numbers $n_2$ and $n_3$ of $T_2$- and $T_3$-lineages are both 5. If not stated otherwise, extinction may occur in the tree. Every bifurcation in the tree is denoted by $b_i$, where $b_0$ is the root. Note that in a tree without extinction, the total number of bifurcations up to time $T_3$ (including the root) is $n_3 - 1$. For every $b_i$ let $t_i$ denote the time of the occurrence of bifurcation $b_i$. We may assume that the root is at time $t_0 = 0$. Now, for every $b_i$ and for $(j,k) \in \{(1,2), (1,3), (2,3)\}$, we make the following definitions:
$$P_i^{j,k} := n^l_{j,i}\, n^r_{k,i} + n^r_{j,i}\, n^l_{k,i},$$
where $n^l_{j,i}$ denotes the number of descendants the subtree with root $b_i$ has at time $T_j$ to the left of its root $b_i$, and $n^r_{j,i}$ is defined analogously for the descendants on the right-hand side of $b_i$.
It can be seen that bifurcations for which at least one branch of offspring dies out in the same interval where the bifurcation lies always have $P_i^{j,k}$-value 0. Consequently, if either $t_0 < t_i < T_1$ or $T_1 < t_i < T_2$ or $T_2 < t_i < T_3$, and one of $b_i$'s branches becomes extinct in the same interval, then $P_i^{j,k}$ is 0 for all $j, k$. Note that $P_i^{j,k}$ denotes the number of different paths in the tree from time $T_j$ to time $T_k$ in the subtree with root $b_i$ in which no edge is taken twice.
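The counts $P_i^{j,k}$ can be illustrated on a small tree. The sketch below is not the tree of Figure 3: the toy tree, its encoding in `NODE_TIME`, and the helper names are assumptions made for illustration, and the count is taken to be the number of left/right pairs $n^l_{j,i} n^r_{k,i} + n^r_{j,i} n^l_{k,i}$.

```python
import math

# Hypothetical toy tree (not the tree of Figure 3). Each node is named by its
# left/right path from the root "" and mapped to the time at which the branch
# ending in it bifurcates, goes extinct, or is still alive (math.inf).
NODE_TIME = {
    "": 0.0,                         # root b_0
    "L": 1.9,                        # bifurcation at time 1.9
    "LL": math.inf, "LR": math.inf,  # both branches survive
    "R": 2.5,                        # bifurcation at time 2.5
    "RL": 3.05, "RR": 3.05,          # both branches go extinct at time 3.05
}


def n_alive(t, prefix=""):
    """Number of branches extant at time t whose name starts with `prefix`."""
    return sum(
        1 for s in NODE_TIME
        if s and s.startswith(prefix) and NODE_TIME[s[:-1]] < t <= NODE_TIME[s]
    )


def path_count(b, t_j, t_k):
    """Assumed P for bifurcation b: left/right pairs between times t_j, t_k."""
    left, right = b + "L", b + "R"
    return (n_alive(t_j, left) * n_alive(t_k, right)
            + n_alive(t_j, right) * n_alive(t_k, left))


T1, T2, T3 = 2.0, 3.0, 3.1
for b in ("", "L", "R"):
    print(repr(b), path_count(b, T1, T2), path_count(b, T1, T3),
          path_count(b, T2, T3))
```

In this toy tree the bifurcation `"R"` lies between $T_1$ and $T_3$ and both of its branches die out before $T_3$, so all of its counts are 0, in line with the remark above.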
Example 2.1. Consider the tree given in Figure 3. Here, the values $P_i^{j,k}$ for bifurcation $b_1$ corresponding to time $t_1$ [...]
For the sampling, we select uniformly at random one of the $T_i$-lineages as well as one of the $T_j$-lineages, and write $E_{i,j}$ for the expected length of the path connecting a lineage at time $T_i$ with one at time $T_j$ in the underlying phylogenetic tree. Then, the expectation that a fossil from the intermediate time $T_2$ will also be an intermediate taxon of two taxa taken from $T_1$ and $T_3$, respectively, refers to the assumption that $E_{1,3} > \max\{E_{1,2}, E_{2,3}\}$. We will show in the following lemma that this last inequality can fail, and describe the precise condition for this to occur. Moreover, we later show that $E_{1,3}$ can be strictly smaller (!) than both $E_{1,2}$ and $E_{2,3}$; that is, the temporally most distant samples can, on average, be more similar than the temporally intermediate sample is to either of the two. Note that if $P_i^{j,k}$ is 0, the corresponding branch does not contribute to the expected distance from one time to another. We can therefore assume without loss of generality that all bifurcations $b_i$ have at least one descendant on their left-hand side and at least one on their right-hand side, each in at least one of the times $T_1$, $T_2$, $T_3$. In Figure 3, branches that therefore need not be considered are represented with dotted lines.
Figure 3: A rooted binary phylogenetic tree with three times $T_1$, $T_2$, $T_3$ at which taxa have been sampled. The dotted branches refer to taxa that do not contribute to the expected distances from one of these times to another and thus are not taken into account. On the other hand, bifurcation $b_2$ at time $t_2$ shows that extinction may have an impact on the expected values. Such branches have to be considered.
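The expected distances $E_{i,j}$ under uniform sampling can be computed by brute force on a small tree. The sketch below uses a hypothetical toy tree (it is neither the tree of Figure 3 nor an example from this paper); the encoding in `NODE_TIME` and the function names are assumptions made for illustration.

```python
import math
from itertools import product
from os.path import commonprefix

# Hypothetical toy tree. Each node is named by its left/right path from the
# root "" and mapped to the time at which the branch ending in it bifurcates,
# goes extinct, or is still alive (math.inf).
NODE_TIME = {
    "": 0.0,
    "L": 1.9, "LL": math.inf, "LR": math.inf,
    "R": 2.5, "RL": 3.05, "RR": 3.05,
}


def lineages_at(t):
    """Branches extant at time t (born before t, not yet ended at t)."""
    return [s for s in NODE_TIME
            if s and NODE_TIME[s[:-1]] < t <= NODE_TIME[s]]


def expected_distance(t_i, t_j):
    """Mean path length over all pairs of a t_i-lineage and a t_j-lineage."""
    total, pairs = 0.0, 0
    for a, b in product(lineages_at(t_i), lineages_at(t_j)):
        # The deepest node shared by both names is where the branches meet;
        # if one sample is ancestral to the other, the earlier sampling time
        # is the common ancestor, which the min() below selects.
        t_mrca = min(t_i, t_j, NODE_TIME[commonprefix([a, b])])
        total += (t_i - t_mrca) + (t_j - t_mrca)
        pairs += 1
    return total / pairs


T1, T2, T3 = 2.0, 3.0, 3.1
E12 = expected_distance(T1, T2)
E13 = expected_distance(T1, T3)
E23 = expected_distance(T2, T3)
print(f"{E12:.2f} {E13:.2f} {E23:.2f}")  # 3.03 2.50 3.65
```

For this particular choice of tree and times, $E_{1,3}$ comes out smaller than both $E_{1,2}$ and $E_{2,3}$: a clade that is abundant at $T_2$ but extinct by $T_3$ inflates the two distances involving $T_2$.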
In order to simplify the statement of our results, for all bifurcations $b_i$ set [...]
Lemma 2.2. Let a rooted binary phylogenetic tree be given with times $0 < T_1 < T_2 < T_3$ and the root at time $t_0 = 0$. Then, $E_{1,3} \le E_{1,2}$ if and only if [...]
Proof. [...] (1)
In the above bracket, the three summands refer to different paths from time $T_1$ to time $T_3$. The first summand belongs to those paths that go directly from $T_1$ to $T_3$ and thus have length $T_3 - T_1$. There are $n_3$ such paths, as every $T_3$-lineage has an ancestor in $T_1$. The second summand sums up all paths going along one of the bifurcations $b_i$ for $i \neq 0$. For every $i$, there are by definition exactly $P_i^{1,3}$ such paths. Similarly, the third summand refers to all paths along the root $b_0$, whose length is determined by taking the distance from $T_1$ to the root plus the distance from there to $T_3$.
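The bracket described in the proof can be written out explicitly. The following is a reconstruction from that verbal description, offered as a sketch rather than as the displayed formula of the paper. It assumes that $P_i^{j,k}$ counts the left/right pairs through $b_i$, so that the $n_1 n_3$ sampled pairs split into $n_3$ direct pairs and $\sum_i P_i^{1,3}$ pairs through a bifurcation. The weights $a_i$ and $b_i$ are shorthand introduced here; unrestricted sums run over all bifurcations.

```latex
\begin{align*}
E_{1,3} &= \frac{1}{n_1 n_3}\Big[\, n_3\,(T_3 - T_1)
  + \sum_{i \neq 0} P_i^{1,3}\,(T_1 + T_3 - 2 t_i)
  + P_0^{1,3}\,(T_1 + T_3) \Big] \\
% the weights a_i := P_i^{1,3} / (n_1 n_3) sum to (n_1 - 1)/n_1
&= T_3 + \frac{n_1 - 2}{n_1}\,T_1 - 2 \sum_i a_i t_i,
  \qquad a_i := \frac{P_i^{1,3}}{n_1 n_3}, \\
E_{1,2} &= T_2 + \frac{n_1 - 2}{n_1}\,T_1 - 2 \sum_i b_i t_i,
  \qquad b_i := \frac{P_i^{1,2}}{n_1 n_2}, \\
E_{1,3} \le E_{1,2} &\iff
  T_3 - T_2 \le 2 \sum_{i:\, 0 < t_i < T_1} (a_i - b_i)\, t_i .
\end{align*}
```

The last sum can be restricted to $0 < t_i < T_1$ because the root contributes $t_0 = 0$ and bifurcations after $T_1$ have no descendants at $T_1$. In this form, the right-hand side has to be positive for any admissible $T_3 > T_2$ to exist.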
[...] Hence, there are no values $0 < T_1 < T_2 < T_3$ such that $T_3 - T_2$ fulfills the required condition, and so $E_{1,3} > E_{1,2}$ for all choices of $T_i$. Conversely, suppose $\sum_{i:\,0<t_i<T_1}$ [...] Then, select $T_1, T_2$ with $0 < T_1 < T_2$ and set [...] Then, $T_3 > T_2$ and [...]
Corollary 2.4. If either (i) $n_1 = 2$ or (ii) no extinction occurs in the tree and [...]
Proof. (i) Note that if $n_1 = 2$, only one bifurcation, say $b_{\hat\imath}$ (for some $\hat\imath$ such that $0 \le t_{\hat\imath} < T_1$), contributes to the number $n_1$ of lineages at time $T_1$; all the branches added by additional bifurcations become extinct before $T_1$. Thus $P_{\hat\imath}^{1,3}, P_{\hat\imath}^{1,2} \neq 0$ and $P_i^{1,3}, P_i^{1,2} = 0$ for all $i \neq \hat\imath$. Analogously to the proof of Lemma 2.2 we have for $n_1 = 2$: [...] Thus, $n_2 = P_{\hat\imath}^{1,2}$ and [...] $= 0$ for all $i \neq \hat\imath$. Thus, $\sum_{i:\,0<t_i<T_1}$ [...]
(ii) In this case, [...] for all $i$ with $0 < t_i < T_1$ and therefore $\sum_{i:\,0<t_i<T_1}$ [...]
Lemma 2.2 essentially states that the expected degree of relatedness from taxa of time $T_1$ to taxa of time $T_3$ can be larger than the one to taxa of time $T_2$, but it requires the distance from $T_2$ to $T_3$ to be "small enough". Whether such a solution is feasible can be checked via Corollary 2.3. Lemma 2.2 already shows how the role of intermediates depends on the times the fossils are taken from. Corollary 2.4(i), on the other hand, shows how the tree itself has an impact on the expected values: if the tree shape (including branch lengths) is such that at time $T_1$ only two taxa exist, then the scenario just mentioned cannot happen, as the condition of Corollary 2.3 is not fulfilled. However, we can prove an even stronger result, namely that not only is $E_{1,3} < E_{1,2}$ possible, but $E_{1,3} < \min\{E_{1,2}, E_{2,3}\}$ can be obtained for a suitable choice of times $T_1$, $T_2$, $T_3$. For this, we need the following lemma.
Lemma 2.5. Let a rooted binary phylogenetic tree be given with times $0 < T_1 < T_2 < T_3$ and the root at time $t_0 = 0$. Then $E_{1,3} \le E_{2,3}$ if and only if [...]
Proof. As in the proof of Lemma 2.2, we have (cf. (3)) [...] (5)
Analogously, [...] Thus, [...] which holds precisely if [...]
With the help of the two lemmas we can now state the following theorem.
Theorem 2.6. Let a rooted binary phylogenetic tree be given with times $0 < T_1 < T_2 < T_3$ and the root at time 0. Then, $E_{1,3} \le \min\{E_{1,2}, E_{2,3}\}$ if and only if the following two conditions hold: [...]
Proof. The theorem follows directly from Lemmas 2.2 and 2.5.
The following example demonstrates the influence of the times $0 < T_1 < T_2 < T_3$ according to the above theorem.

Discussion
The analysis of the fossil record provides an insight into the history of species and thus into evolutionary processes. Stochastic models can provide a useful way to infer patterns of diversification, and they form a useful link between molecular phylogenetics and paleontology [8]. Such models would greatly benefit from incorporation of potential fossil ancestors and other extinct data points to infer patterns of evolution. In this paper we have applied a simple model-based phylogenetic approach to study the expected degree of similarity between fossil taxa sampled at intermediate times.
'Gaps' in the fossil record are problematic [10] as they can be interpreted as 'missing links'. Therefore, numerous studies concerning the adequacy of the fossil record have been conducted (see, for example, [3], [9], [13]), and it is frequently found that even the available fossil record is still incompletely understood. This is particularly true for ancestor-descendant relationships (see, for instance, [4], [5]). For example, Foote [5] reported that the probability that a preserved and recorded species has at least one descendant species that is also preserved and recorded is on the order of 1%-10%. This number is much higher than the number of identified ancestor-descendant pairs. Thus, it remains an important challenge to recognize such pairs [1]. This is also essential with regard to ancestor-intermediate-descendant triplets, as it is possible that there are in fact fewer 'gaps' than currently assumed, i.e. that intermediates are present but not yet recognized. Such issues have an important bearing on any conclusions our results might imply concerning the testing of hypotheses of continuous morphological evolution, or concerning the shape of the underlying evolutionary tree based on the non-existence of certain intermediates.
Another challenge is to investigate different phylogenetic models for describing the expected degree of morphological separation between different fossil taxa sampled at different times. Our findings strongly depend on the assumption that morphological diversification is proportional to the distance in the underlying phylogenetic tree. This is justified if morphological difference is proportional to the number of differing discrete characters, if each of these characters changes at a constant rate over the time period of sampling, and if homoplasy is rare. This last assumption requires the rate of character change to be sufficiently small in relation to the time period of the sampling: the appearance of reverse or convergent character states will lead to a more concave (rather than linear) relationship between morphological divergence and path distance. A similar concave relationship might be expected for continuous morphological evolution as described by neutral Brownian motion.
Thus, the impact of different assumptions on the role of intermediates could be further investigated. But even if we assume that diversification is proportional to time, there may be other ways to measure 'distance' that could be usefully explored. For instance, one could define the distance between two taxa to be the maximum (rather than the sum) of the two divergence times of the taxa back to their most recent common ancestor. This definition of distance allows the degree of relatedness to be higher for taxa in the same clade than for other taxa. In this case, there exist analogous results to Lemmas 2.2 and 2.5 (results not shown), but the formulae are somewhat different, particularly for Lemma 2.5.
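The two distance definitions can be compared side by side in a minimal sketch. The function names are hypothetical and introduced only for illustration; both functions assume that the time of the most recent common ancestor is known.

```python
def sum_distance(t_a: float, t_b: float, t_mrca: float) -> float:
    """Distance used in this paper: sum of the two divergence times."""
    return (t_a - t_mrca) + (t_b - t_mrca)


def max_distance(t_a: float, t_b: float, t_mrca: float) -> float:
    """Alternative: maximum of the two divergence times (illustrative)."""
    return max(t_a - t_mrca, t_b - t_mrca)


# Two samples at times 2 and 3 whose lineages split at time 1.
print(sum_distance(2.0, 3.0, 1.0), max_distance(2.0, 3.0, 1.0))  # 3.0 2.0
# Samples on one lineage: the earlier sample is the common ancestor,
# and both definitions give the time between the samples.
print(sum_distance(2.0, 3.0, 2.0), max_distance(2.0, 3.0, 2.0))  # 1.0 1.0
```

The two definitions agree whenever one sample is ancestral to the other and differ only for samples on different lineages.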

Figure 2: For samples taken from different lineages of a tree, the distance $d_{1,3}$ of one particular sample from time $T_1$ to the one of $T_3$ can be smaller than the distance of either of them to the sample taken at time $T_2$. Yet in expectation we always have $E_{1,3} > \max\{E_{1,2}, E_{2,3}\}$ for two-branch trees. For more complex trees this can fail, as we show in Example 2.7.