Evaluating the use of pairwise dissimilarity metrics in paleoanthropology

Adam D. Gordon, Bernard Wood

Source - http://www.sciencedirect.com/science/article/pii/S0047248413001863

Journal of Human Evolution, Volume 65, Issue 4, October 2013, Pages 465–477

Abstract

Questions of alpha taxonomy are best addressed by comparing unknown specimens to samples of the taxa to which they might belong. However, analysis of the hominin fossil record is riddled with methods that claim to evaluate whether pairs of individual fossils belong to the same species. Two such methods, log sem and the related STET method, have been introduced and used in studies of fossil hominins. Both methods attempt to quantify morphological dissimilarity for a pair of fossils and then evaluate a null hypothesis of conspecificity using the assumption that pairs of fossils that fall beneath a predefined dissimilarity threshold are likely to belong to the same species, whereas pairs of fossils above that threshold are likely to belong to different species. In this contribution, we address (1) whether these particular methods do what they claim to do, and (2) whether such approaches can ever reliably address the question of conspecificity. We show that log sem and STET do not reliably measure deviations from shape similarity, and that values of these measures for any pair of fossils are highly dependent upon the number of variables compared. To address these issues, we develop a measure of shape dissimilarity, the Standard Deviation of Logged Ratios (s_LR). We suggest that while pairwise dissimilarity metrics that accurately measure deviations from isometry (e.g., s_LR) may be useful for addressing some questions that relate to morphological variation, no pairwise method can reliably answer the question of whether two fossils are conspecific.

Figure 1. Illustration of log sem. Consider a pair of crania where one cranium is larger in most measurements than the other. In this case, 56 interlandmark distances were measured on two modern human crania. Paired homologous measurements are plotted against each other (open circles), e.g., orbital height from cranium 1 is plotted against orbital height from cranium 2, cranial length from cranium 1 is plotted against cranial length from cranium 2, etc. An ordinary least squares regression is then calculated for these data points, as is the logarithm of the standard error of the slope (i.e., log sem). The standard error of the slope is dependent on three factors: the slope of the regression line (m), the number of data points (i.e., the number of measurements, k, which is 56 in this case), and the coefficient of determination (r^2), as follows: $\mathrm{sem} = m\sqrt{(1/r^2 - 1)/(k - 2)}$. Thus the value of log sem decreases when the slope decreases, the number of measurements increases, or r^2 increases. If all pairs of measurements were perfectly correlated (gray points), they would sit directly on the regression line and have an infinitely negative log sem value (because sem = 0). The actual measurements have a lower correlation (open circles) and thus they sit farther from the line and have a higher log sem. As proposed by Thackeray et al. (1997), a higher value of log sem is a measure of greater shape dissimilarity between two crania. Note that swapping the axes upon which the measurements are plotted does not affect k or r^2, but it does affect the slope (in this case, changing it from a value greater than one to a value less than one), which in turn affects the value of log sem (see text for discussion of STET, which was developed to take this asymmetry into account).
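The calculation described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the six measurement values are hypothetical, the logarithm is assumed to be base 10, and the standard error of the slope is computed from the standard ordinary least squares identity in terms of m, k, and r^2.

```python
import math

def ols_slope_sem(x, y):
    """Return (m, r^2, sem) for an OLS regression of y on x.

    sem is the standard error of the slope, computed from the standard
    identity sem = |m| * sqrt((1/r^2 - 1) / (k - 2)).
    """
    k = len(x)
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx                      # regression slope
    r2 = sxy ** 2 / (sxx * syy)        # coefficient of determination
    # max() guards against a tiny negative value from rounding when r^2 is ~1
    sem = abs(m) * math.sqrt(max(0.0, 1.0 / r2 - 1.0) / (k - 2))
    return m, r2, sem

def log_sem(x, y):
    """Base-10 log of the standard error of the slope (log base is an assumption)."""
    _, _, sem = ols_slope_sem(x, y)
    return math.log10(sem) if sem > 0 else float("-inf")

# Hypothetical homologous measurements (mm) on two crania, for illustration only.
cranium_1 = [35.0, 180.0, 140.0, 98.0, 52.0, 120.0]
cranium_2 = [33.0, 171.0, 138.0, 90.0, 50.0, 111.0]

forward = log_sem(cranium_1, cranium_2)   # cranium 2 regressed on cranium 1
reverse = log_sem(cranium_2, cranium_1)   # axes swapped
print(forward, reverse)
```

Swapping the axes leaves k and r^2 unchanged, so the two values differ only by the logarithm of the ratio of the two slopes, which is the asymmetry noted in the caption.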

Figure 2. Comparison of homologous measurements for two idealized primate crania (in profile). Note the difference in shape between the monkey-like cranium on the left and the australopith-like cranium on the right. Despite the different shapes of these crania, when the four highlighted measurements are plotted against each other (as raw, unlogged data), all four data points fall exactly on a regression line that does not pass through the origin (solid diagonal line), although they do not fall on a regression line that is constrained to pass through the origin (dashed diagonal line). Because the points are perfectly correlated, log sem and STET would treat the shapes of these two crania as identical.
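The situation in this caption can be reproduced numerically. The sketch below uses four hypothetical measurements that lie exactly on a line with a non-zero intercept, so the standard error of the slope is zero even though the ratios between homologous measurements vary. For contrast it also computes the standard deviation of logged ratios named in the abstract; the use of natural logs and of the sample (k - 1) standard deviation are assumptions made here, not details taken from the source.

```python
import math
import statistics

def sem_of_slope(x, y):
    """Standard error of the OLS slope of y on x, via |m| * sqrt((1/r^2 - 1)/(k - 2))."""
    k = len(x)
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    r2 = sxy ** 2 / (sxx * syy)
    return abs(m) * math.sqrt(max(0.0, 1.0 / r2 - 1.0) / (k - 2))

def s_lr(x, y):
    """Standard deviation of logged ratios (natural log, sample SD: both assumed)."""
    return statistics.stdev([math.log(yi / xi) for xi, yi in zip(x, y)])

# Hypothetical measurements on cranium A (illustrative values only).
cranium_a = [20.0, 40.0, 60.0, 80.0]

# Cranium B lies exactly on y = 10 + 0.5 * x: perfectly correlated, different shape.
cranium_b = [10.0 + 0.5 * v for v in cranium_a]

# Cranium C is an isometrically scaled copy of A (y = 0.5 * x): same shape.
cranium_c = [0.5 * v for v in cranium_a]

print("A vs B: sem =", sem_of_slope(cranium_a, cranium_b),
      " s_LR =", s_lr(cranium_a, cranium_b))
print("A vs C: sem =", sem_of_slope(cranium_a, cranium_c),
      " s_LR =", s_lr(cranium_a, cranium_c))
```

Under these assumptions both pairs give a standard error of essentially zero, so a measure based on it cannot separate the non-isometric pair (A, B) from the isometric pair (A, C), whereas the logged-ratio measure is positive for the first pair and zero for the second.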