New publication ( Returning to the roots: resolution, reproducibility, and robusticity in the phylogenetic inference of Dissorophidae (Amphibia: Temnospondyli); Gee, 2021; PeerJ)
Title: Returning to the roots: resolution, reproducibility, and robusticity in the phylogenetic inference of Dissorophidae (Amphibia: Temnospondyli)
Authors: B.M. Gee
General summary: Phylogenetics, the means of inferring relationships among organisms and one way of classifying them, is a timeless part of paleontology, in part because our datasets are much more limited than those for modern organisms in which genetic barcoding is both widely accessible and robustly informative. As a result, there is often a lot more uncertainty and disparity among different analyses, and people should generally be skeptical when someone says there is a strong consensus for a given group, no matter how well-known (paleo nerds will definitely recall the debate a few years back about what 'Dinosauria' constituted). Temnospondyls are generally not well-known, and this means that they are not as well-studied. This often has the effect of resulting in one or two people producing a lot of the work for a given group, which can lead to an overstated consensus when the same dataset keeps getting used. Dissorophids, the armoured dissorophoids, are a great example of this. There is a ton of work, but pretty much everyone uses the same phylogenetic matrix. As you might imagine, analyzing pretty much the same dataset a bunch of times should lead to pretty much the same result, which would not be particularly compelling as a "consensus," but, SPOILER, there is in fact no consensus among these nearly identical analyses! That's a huge issue. This study delves into why, both through a separate phylogenetic matrix and from closely scrutinizing the previous one. There is a ton of data packed into it, but the short and skinny are that (1) previous studies have been compromised by systemic errors from the original; and (2) there is a general lack of resolution / of well-supported nodes among dissorophids (i.e. we don't know a lot more than we do know). Then there's a bunch of other toss-ins like missing holotypes and chimeras. Keep reading to find out more!
This is an example of an admittedly "boring" paleontological study. There are very few pictures of fossils, none of which are particularly nice specimens; the results will never turn into some kind of remarkable paleoartistic reconstruction; and you will never hear about these findings in the news. Nonetheless, studies like this underpin the very foundations of paleontology and all of those attention-grabbing paleontology headlines, which hopefully will become apparent from this post / paper.
Before getting in to the details of this particular paper, I think it's important to give an overview of phylogenetics for readers who maybe don't know very much (or anything) about it. (Those familiar with phylogenetics can keep scrolling). I've touched on phylogenetics before in relation to some of my other papers, but I'll review some key terms and ideas here, beginning with "what is phylogenetics?"
The data. Data entry is the first step in computer-assisted phylogenetics. What we do is make a character matrix with a lot of numbers. The image below is a partial screenshot of one matrix that I worked with for this study. On the left-hand side, you have all the names of the different species being studied. On the top-row, oriented diagonally, you have names of different characters (these are different features being assessed). The rest is made up of numbers, which are colour-coded in this example.
Phylogeneticists begin by sampling taxa - which species do they want to study? Without going into details, paleontologists have particular challenges because some taxa are only represented by fragments, and these tend to be very bad for phylogenetic analyses because they are scored for very few characters, so usually these are omitted. It is very rare to include every species of a group that you are interested in. Then they sample characters - which features differentiate these species? A simple example:
It's important to note that the simplest explanation is not always the right one. Life finds ways to do some really weird stuff. The gist is that parsimony is a workable baseline. Once we allow for more complicated solutions, the number of possible solutions increases exponentially, and it becomes both computationally expensive to search for them and difficult to actually figure out which one is the most likely (but this is where likelihood methods come in, which I'm not talking about here).
The output. A parsimony analyses will return what are called the most parsimonious trees (MPTs). There can be as few as one but are often many more. All of these MPTs have the same "length," which is given as a number of steps, which are essentially the number of evolutionary changes that had to occur to produce a tree with these relationships. If there are multiple MPTs, they will differ from each other at least slightly. In the example below, we recovered three MPTs. The red "taxa" (A–D) are ones whose relationships differ between each MPT. Note that some taxa may have the same relationships across all of the MPTs.
We then have different ways of summarizing the results (some of these are controversial).
The actual study (finally)
Okay, we are finally to the part where I talk about the actual study! This study looked at a group of dissorophoid temnospondyls, a group mostly found in the late Carboniferous and early Permian that I worked on for my dissertation (this is not part of my dissertation though). More specifically, I was looking at the large terrestrial dissorophoids, collectively known as olsoniforms (named after E.C. Olson) and which includes two groups, dissorophids and trematopids (yes, 'dissorophid' is one letter away from 'dissorophoid' and requires careful reading to avoid confusing).
Now what I wanted to do was to take the matrix I made for that 2020 trematopid study and expand it to include dissorophids, thereby forming an olsoniform phylogenetic matrix that comprehensively sampled all appreciably known taxa (no previous study has done this). What I ended up doing was a lot more than that, which is where the alliterative title comes in: "Returning to the roots: resolution, reproducibility, and robusticity in the phylogenetic inference of Dissorophidae (Amphibia: Temnospondyli)." I'll go through these three themes: resolution, reproducibility, and robusticity next.
Recent dissorophid phylogenies, shown below, display a range of resolution. There are two points of emphasis here. One is that disparity between studies. Most of these show large polytomies, but Liu (2018) nearly got the fully resolved tree that we're looking for. The second point speaks to what I commented on above - fiddling with the analysis to get resolution where none existed before. If you look at Dilkes (2020), there are two trees. The one on the left is poorly resolved. This is not very helpful for discussing relationships or evolution. The tree on the right is very resolved. We like that one! But to get that tree, Dilkes had to remove four taxa from the analysis. These taxa, often called "wildcards," are problematic in the sense that they confound resolution, whether because they are really fragmentary and have a lot of missing data or present weird combinations of features that don't line up with general trends. Removing wildcards is a common strategy to boost resolution, but like I said above, it inherently omits data that you know exist.
The problem crops up when people use the reduced-data results, which gives the more resolved tree, to talk about evolution and classification schemes, but then place omitted taxa in that framework. For example, Brevidorsum profundum is a typical dissorophid "wildcard." It's only known from one partial skull described in 1964 and is not well-preserved. Some people don't even think it's a dissorophid. Brevidorsum is either recovered within a large polytomy (uninformative) or is excluded from the analysis, but some workers constantly refer to it as being a member of a specific subfamily within dissorophids: Cacopinae (the dark green boxes). However, no recent analysis has ever demonstrated support for that, and this is where the old-school qualitative methods ("it sort of looks like that") and the modern quantitative methods ("the analysis says X") combine in a bad way.
My own analyses tended to recover very poor resolution. Sometimes this could be attributed to including a lot of pretty fragmentary taxa that aren't normally included (like the analysis shown above on left). When it looks like a giant comb...that's an uninformative tree. I could further extend that by showing that removing just a few taxa, some of which are not that badly preserved, could lead to a drastic increase in resolution (like the analysis shown above on right), similar to that example I just cited above.
Summary: Resolution is generally quite poor in olsoniform, dissorophid, and trematopid analyses. The only way to get decent resolution that allows one to create a detailed narrative of evolutionary history is to remove a fair number of taxa from the analysis, which only allows you to create a coarse-level narrative that often omits the "weird" taxa that may in fact figure prominently in a clade's evolution.
Reproducibility is one of the foundational tenets of science. If someone does an experiment again, can they get the same results? The first step in this process is that it has to be clear how to reproduce an experiment. If you run a complicated study of colour preference in spiders, but only say "we picked some spiders and not others for this experiment" in your methods, it will be impossible for someone to know the specifics of how you did the experiment. So even if someone redoes an experiment on colour preference in spiders, it will likely not be the same as your experiment. Maybe they picked the wrong spiders. Or used different colour palettes. Etc. Because of this, whether their results are the same or not ("spiders like flaming hot pink") as yours, comparing them is difficult because differences in the methods could be responsible for the outcome.
In phylogenetics, the main areas where reproducibility comes into play are (1) the character matrix; and (2) the parameters of the analysis. Most journals require people to include their character matrix in one form or another, but ambiguity in the requirements is one problem – some people include their matrices in very inaccessible formats that have to be modified or typeset before a program can analyze them (even though the software to make universally accessible formatted files is F-R-E-E and easily available). In the simplest format, all of those scores in the nice GUI display as text strings:
So when people don't provide their scores in an accessible format like the NEXUS file above, that means the next person has to edit or transcribe the data, which can introduce errors (indeed, I found a few errors in older studies where they had to transcribe it from a previous study and introduced a typo). Below is an example of a very inaccessible data matrix (the dissorophid matrix of Schoch, 2012); this is a figure of a table that cannot be OCR'd (even with the original publisher's version). This means that anyone who wanted to use this matrix would have to type out all of these scores themselves! That is a big waste of time.
Reproducibility doesn't always have to be about repeating the exact same experiment though – more straightforward experiments or ones where the results are intuitive (i.e. doesn't seem like there was data manipulation or accidental error) can be a waste of time to repeat except in coursework. So an alternative way to test reproducibility is to see whether an independent approach to the same question can recover the same solution. If the same general results are recovered, this really strengthens the argument derived from them. If there are different results, that provides an opportunity for future work to figure out why there are discrepancies.
One characteristic of most previous dissorophid studies is that they all draw from the same source matrix (Schoch, 2012; this is the blue bubble in the 2012 row in the figure above). This is common practice in science – taking an existing matrix and adding taxa of interest. It saves a lot of time because creating a new matrix is a lot of work! But there are two important points here. The first is that if everyone uses the same matrix and doesn't make many changes, each derivate of that matrix is not fully independent (we call these pseudoreplicates). Each pseudoreplicate is predisposed to producing the same results as the last iteration because the underlying data are nearly the same. So getting the same general result from five pseudoreplicates is not as much of a "consensus" as getting the same general result from five independent matrices. Essentially, this can be misleading and make you think that the results are very reproducible when that is really just because the data are nearly identical. This leads to the second point, which is that the data must be solid when they're getting propagated, otherwise you're just propagating errors.
Immediately, there is a big red flag when looking at dissorophid phylogeny:
There is no consensus among these pseudoreplicates, in spite of being 99% similar to each other!. Not getting a consensus from a bunch of pseudoreplicates is really concerning because it indicates that relationships are not well-supported and can easily change with just the addition of one taxon or a few scoring changes!
The explanations. There are a lot of different explanators for why there are different results. Some of these are a little more towards personal preference in setting different parameters in the analysis. One probable explanation is that the taxon sample differs slightly between studies; those with more resolution tend to have omitted more of those "wildcards" that are weird or highly fragmentary. If you compare the results of Maddin et al. (part C above) and Schoch & Sues (part D above), this is a likely explanator for why the former is much less resolved - it includes historical wildcards like Brevidorsum profundum, fragmentary taxa like Broiliellus olsoni, and somewhat odd taxa like Reiszerpeton renascentis (the holotype of which was long considered to be an amphibamiform dissorophoid).
Taxon samples are low-hanging fruit for explaining topological differences. So I went deeper and started looking at specific cells in the matrix to see if there were perhaps typographic errors (I did find some of those). However, I found a lot more errors that do not seem to be typographic. I looked at the latest version of the Schoch (2012) matrix, which was the one published by Dilkes (2020). I identified over 140 scoring errors, which is a lot in a fossil matrix that only has up to 77 characters and 29 taxa. Many of these were instances where Taxon X was scored for Feature Y...except Feature Y isn't even known in Taxon X! One example is Cacops woehri from the Richards Spur locality. This taxon is only known from partial skulls, but somehow it was scored for a bunch of postcranial characters. Suspiciously, all 14 scores of this taxon that were erroneous were scored the same as Cacops morrisi, also from Richards Spur but which is fairly distinct from C. woehri. In fact, this deep dive revealed that the three species of Cacops, which are readily differentiated from each other, didn't actually differ in any scores other than their distribution of missing data (i.e. when they could be scored, they were all scored the same). This explains why all previous analyses recovered them as a single polytomy, when qualitative comparisons clearly show that C. morrisi and C. aspidephorus are much more similar to each other than to C. woehri (which might not even be Cacops).
Cacops was the most egregious example of what seems to be previous workers "assuming" scores that are not validated by the present fossil record, but other taxa also seemed to have scores here and there where a score was "assumed" based on the general affinity of that taxon (e.g., that some dissorophids with no known quadrate had a dorsal quadrate process preserved; this is a common feature in dissorophoids). This is, of course, a major problem! If you have what you think is a new species of Cacops, and you score it for a bunch of features of other species of Cacops, you have pretty much guaranteed yourself of the outcome that you are allegedly testing because you have made this new species more similar to its alleged relatives than the fossil record actually bears out. Cacops aspidephorus is the best example of this - for over a century, nobody knew what the sutures were because the specimens were so poorly preserved. You could see this if you looked at reconstructions of the skull of different dissorophids, like from Schoch (2012) below. As you can see, C. aspidephorus is just a grey blob - no sutures. Why then, were there 15 scores for C. aspidephorus for characters that require sutures to be known? The sutures of the skull, based on a few specimens that were not so bad off, would not be published for another eight years until Anderson et al. (2020). In fact, other than characters with missing data for one or two of the three species, there were no characters for which one species of Cacops was scored differently than another.
Other issues relate to how characters are defined. For example, if a character requires skull length to be known, it doesn't make sense to score a taxon only known from specimens that aren't complete longitudinally. However, taxa like these were scored for these characters. Issues with characters like these that explicitly require certain landmarks but that seem to be scored in the absence of at least one were fairly common as well. In the examples below, both Brevidorsum profundum (only one specimen; Carroll, 1964) and Reiszerpeton renascentis (only one specimen; Maddin et al., 2013) are known from incomplete specimens. Therefore, their skull length is not actually known, but characters that require a ratio invoking the skull length were scored for these taxa previously.
As one might expect, correcting a large number of errors and then rerunning an analysis leads to drastically different results. On the left of the figure below are the original results from the Dilkes' matrix, and on the right are the results when the corrected version of the matrix is analyzed. There is a major loss of resolution! This suggests that at least some of the inconsistencies between previous pseudoreplicates of this matrix result entirely from erroneous scores in the matrix.
Furthering the issue is that these systemic errors didn't just appear in the latest version of Schoch (2012), published by Dilkes (2020). In fact, none of them were introduced by Dilkes, which I feel is very important to state, lest the reader think that Dr. Dilkes produces error-riddled work. Most of the errors go back to the original matrix and were thus introduced by Schoch, although some were separately introduced as taxa were added (e.g., Holmes et al., 2013; Liu, 2018). This means that every single previous study that used this matrix has also been compromised. I suspect, though I didn't test this, that correcting the same scores in all of the previous matrices would result in a loss of resolution in all of them, but this would actually boost the similarity between studies because the resolution of wildcard nodes would be eliminated.
Summary: Previous dissorophid analyses are highly irreproducible insofar as the source matrix (Schoch, 2012) contains substantial errors, many of which seem to be based on assumed anatomy that cannot be validated from the fossil record. This has likely compromised studies that analyzed this matrix or a derivate thereof with respect to resolved relationships within Dissorophidae (unresolved relationships are probably okay).
Without some assessment of reliability, a phylogeny has limited value. It may still function as an efficient summary of available information on character-state distributions among taxa [...] but it is effectively mute on the evolutionary history of those taxa (Sanderson, 1995:299).
One of the challenges with strictly morphological data is that there are just not very many datapoints. Compared to thousands to millions of genetic base pairs in molecular analyses, the average fossil tetrapod matrix has between 50 and 350 characters. This means that just a few scores could be responsible for a certain recovered relationship, whether a longstanding one or an entirely new one. Sometimes, even changing just a single score can change the entire tree! This is where support metrics come in. These are statistical means of assessing how "strong" or "robust" a given node is; in other words, is this relationship likely supported by only a single score, and thus not very reliable, or are there many lines of evidence (scores) supporting it? In parsimony analyses, there are two conventional methods of assessing support.
Bootstrapping: Say you have a character matrix with 8 taxa and 14 characters. Bootstrapping is called "resampling with replacement." In one bootstrap replicate, the program will randomly select a character and its correspondent scores. Let's say it selects character #7 from the original matrix. This now becomes the new character #1 in the bootstrap. The program then picks the new #2 and the new #3 and so on. The caveat is that it can pick a character from the original matrix that was already selected. So it could pick character #7 three times in a row. It keeps picking until it has the same number of characters as the original matrix. Invariably, it is almost a given that some characters will be resampled multiple times, which means others are not sampled at all. Then you run this a lot of times (like 10,000) and calculate the percentage of these resampling replicates that nodes are found in. If you get low bootstrap support, that means very few replicates recovered that relationship and suggests that it is not well-supported because it is probably linked to one character. Bootstrap support over 50% (a node occurs in >50% of the bootstrap replicates) is considered "strong."
Bremer decay index: Bremer support is an entirely different measure that has nothing to do with resampling. Bremer decay calculates how many additional "steps" are needed for a given node to collapse. This is exclusive to parsimony analyses as a result since it is predicated on the number of steps in the most parsimonious tree(s). Some programs have an automated way of calculating Bremer support, but the common way to do it is to get the MPTs and the number of steps among them, then get the strict consensus. Then you rerun the search and tell it to save not only the MPTs but also all trees that are +1 step longer. This will inherently be more trees and it includes trees that are not the most parsimonious. Then you get the results, compute the strict consensus, and if a node previously found is no longer found, that node has "collapsed." If it collapses at +1 steps, then its Bremer decay index is 1. You then proceed to do this search for all trees +2 steps longer, +3 steps longer, etc. >3 is considered "strong."
It is standard for scientists to report at least one if not both of these support metrics because it tells you how robust your recovered tree is. If a tree is not well-supported, then even one scoring error or dubious score could be producing this topology. Analyses with my matrix tended to recover low support for at least one of these metrics. I used a visual key to indicate whether a node was strongly supported or not by marking "weak" support in greyed-out text. You can see in the example below, which was an analysis of my matrix with the same taxon sample as Dilkes (2020), that there is a lot of grey... In fact, in this particular analysis, the only node with both strong Bremer and bootstrap support is the node for trematopids (orange circle), which was not even a focus of this analysis!
Previous dissorophid studies are a giant mixed bag. Some studies only reported one metric, and others didn't report anything. The latter is particularly bad because it emphasizes getting resolution (intentionally or not), no matter how weakly it might be supported or how few characters it might hinge on. This can lead people to "tinker" with the matrix, leaning one way when they're on the fence or specifically making modifications to certain characters or taxa until the expected or desired result comes out. It might be a reason why there were so many errors in the Schoch (2012) matrix in which one species was scored the same as another species in the same genus – scoring based on assumptions like this is a great way to boost both resolution and support. It's a reminder that even an "objective" quantitative method like computer-assisted phylogenetics is still underlain by subjective human decisions. Furthermore, when support is reported, it's often weak except for really large clades. For example, it isn't really that surprising or informative in a dissorophid-focused study that Dissorophidae is a strongly supported node unless you were suspicious about whether all dissorophids actually formed a clade (this isn't particularly controversial).
This is an example from the Dilkes (2020) analysis. As a reminder, "strong" support is Bremer > 3 (the first value) and bootstrap > 50 (the second value). Values near the bottom (more inclusive clades) are high, but they get really low as we go up the tree. Like my own analyses, trematopids (node E) have excellent support...but we're not studying trematopids in this analysis... Once you get into Dissorophidae (node J), which is what we wanted to study, no in-group node has Bremer > 1 (literally the lowest it can be) and most of them have bootstrap hovering around that 50% threshold. This isn't an indictment of the scientist, just to be really clear. But it does underscore that the topology of the tree is not the only thing that matters! Weakly supported relationships are just that – weakly supported.
Summary: Even when there is resolution achieved in dissorophid analyses, the statistical support is weak for most nodes except those that are outside of Dissorophidae. So this suggests that the inconsistent topologies between pseudoreplicates of Schoch (2012) are all insignificant insofar as one is not preferable because none is strongly supported statistically. Reporting these stats is important, even if it hinders what author(s) can say.
A few other goodies...
Having already gone way further into the weeds than I intended to for this project, a lot of the discussion is a comprehensive synthesis of where things stand with dissorophids in their entirety. I overview Cacops, which could get a little help from a revision of Parioxys, a taxon originally interpreted as an eryopoid (and thus commonly placed as one in supertrees) but that in fact seems to share many similarities with Cacops. It has not usually been compared with dissorophoids at all, but could it be one? I also commented on Broiliellus, which is a dumpster fire of a genus. Nobody has ever recovered Broiliellus as a clade except by...you guessed it...excluding taxa. My preference is to restrict Broiliellus to the type species, B. texensis, and put everything else in single quotes (e.g., 'B.' reiszi) until this can be sorted out (some of these taxa have not been described since the 60s).
Then we have two more exciting discussions, the first a missing holotype and the second a possible chimera.
Allegedly, AMNH FARB 4785 should not exist! Dating back to the 60's, people said this specimen was missing. But I found it! (technically I found it in 2017 and said as much in a 2018 paper). But then where is AMNH FARB 4785a, which was apparently confused for AMNH FARB 4785? Concerningly, that specimen is nowhere to be found, and there is no record of it in the museum database or in their most recent inventory. It is always bad when a specimen goes missing, let alone a holotype. However, as far as I know, this specimen has never even been figured, only described generically as spines. It certainly wouldn't be the first time that a specimen just disappeared.
Above is AMNH FARB 11544. These are characteristic spines of Platyhystrix rugosa (part A). Coincidentally, this specimen was collected in 1881, the same year that AMNH FARB 4785a was collected, and they were both collected by the same guy from the same bonebed in New Mexico. However, AMNH FARB 11544 was not described or figured for a century until Berman et al. (1981). However, it meets the original description of AMNH FARB 4785 ("several spines") by E.C. Case in 1910, includes several that are in line with the measurements given by Case, and even includes a partial synapsid scapulocoracoid that was also mentioned by Case. The coincidence is almost too compelling! But there is no record of a number transfer, even after I checked with the AMNH collections staff (who have no record of AMNH FARB 4785a in their database on in the most recent inventory), so at present, the holotype of P. rugosa remains the missing AMNH FARB 4785a. I put this in the paper in the hopes that someone will be able to shed more insight on this...
Most examples of chimeras in paleontology are unintentional. They frequently result from assumptions that a collection of bones from the same spot, without excessive elements (e.g., five femora), belong to not only the same species but to the same individual. This is where preconceived biases can come in; everyone will have a list of "likely candidates" for a fossil when they come across it (e.g., you do not expect to find any temnospondyls in the Cretaceous Hell Creek Formation alongside Triceratops), but sometimes this list is too restrictive. A recent example is Dakotaraptor, a putative dromaeosaurid dinosaur (Velociraptor group) in which the purported wishbone turned out to be part of a turtle (see here). Another was a relatively rare report of a pterosaur pelvis from Canada that turned out to be part of a tyrannosaurid skull (see here and here). As far as I know, nobody's ever definitively identified a temnospondyl chimera.
That's where Aspidosaur binasser comes in. This taxon is based on one specimen from the Permian of Texas. It includes several skull fragments that collectively fill out most of the top of the skull and about 20 vertebral positions. This species is interesting for a couple of reasons. The first is that Aspidosaurus (like "Aspidosaurus" apicalis above) is a wastebasket taxon, encompassing a bunch of generally similar morphotypes that probably don't represent a single taxon. These morphotypes are largely isolated osteoderms that form an inverted V shape; cranial material of Aspidosaurus is practically unknown, which wasn't helped by the destruction of all material of the type species (which did include a skull), A. chiton, during an Allied bombing of Germany in WWII. So A. binasser is the best (and only) skull material we have. Then, A. binasser was described as having three different osteoderm types (shown below; figure from Berman & Lucas, 2003), which is unprecedented variation – other dissorophids only have one type, though it may change slightly in width or other minor attributes. And this is really where things start to get a little suspicious.
Despite having a good portion of the skull, including the occiput, which is the neck joint, and 20 vertebral positions from specifically the presacral (anterior to the pelvis) region, there is no articulation between anything. All three osteoderm types are found on their own; there is no block or fragment with two or three different osteoderm morphotypes on it. Then there is the peculiar question of where are the atlas and axis (the neck vertebrae)? Most dissorophids seem to have between 20 and 25 presacral vertebrae, so what are the odds of getting half the skull and 20 presacral vertebrae but not the neck vertebrae? If this holotype is a chimera, it would be really hard to prove because it's not like some composite Chinese bird that was Frankenstein'd together from different slabs, and all of the material does appear to be genuinely dissorophid in nature. And unlike how finding previously unidentified fits can positively prove association, not finding fits is not necessarily evidence for chimerism.
Original caption: Revised stratigraphic distributions of dissorophids in the U.S. midcontinent. Asterisks represent newly reported occurrences from the Pennsylvanian of the northern midcontinent. Solid bars represent observed occurrences. Dashed bars represent inferred or approximated occurrences where specimen provenance is imprecisely constrained or poorly documented. Abbreviations: CH, China; CO, Colorado; KS, Kansas; NE, Nebraska; NM, New Mexico; OH, Ohio; OK, Oklahoma; RU, Russia; TX, Texas; UT, Utah. (Stratigraphic ranges and midcontinental chronostratigraphy were compiled and updated from the following sources: Carroll, 1964; Vaughn, 1971; Olson, 1972; Ossian, 1974; Lewis and Vaughn, 1975; Foreman and Martin, 1988; Hentz, 1988; Hook, 1989; Sumida et al., 1999; Kissel and Lehman, 2002; Reisz et al., 2009; Berman et al., 2010; Lucas et al., 2010.)
The original argument (more like assumption) for associating all of this material to a single species, let alone to a single individual, was that there was "no evidence for more than one dissorophid." This is, of course, negative evidence. Just because you don't find five femora doesn't mean that the two that you did find belonged to the same animal or species. Positive evidence would be definitive articulation. At the time, the recognized dissorophid diversity was much lower than it is now – we've had at least five new species named since then – and the interpretation that dissorophid-bearing sites only preserved one dissorophid species seemed to hold up (like with the Cacops Bone Bed). Since then, numerous examples of sites with multiple dissorophids have emerged. Especially at Richards Spur, we know that not only are there many species, but they are not equally abundant or represented by the same elements (more on this). So finding a cornucopia of dissorophid elements does not mean they all belong to one species or to one individual. Aspidosaurus is already known to be known almost only from isolated osteoderms (the same is true of Platyhystrix), so it is not unreasonable to argue that you could have some stereotypical Aspidosaurus osteoderms, possibly with an Aspidosaurus skull, mixed with random bits of other dissorophids.
So it is my suspicion that Aspidosaurus binasser is a chimera, but given the nature of its preservation, it is basically impossible to prove. Even if you found a specimen with the same skull that was articulated with vertebrae that only had one osteoderm morphotype, someone could (and probably would) argue that it's just a different species. This is a great example of the burden of dodgy taxonomy – it is very hard to undo (inertia), no matter how weak the original working assumptions were / are. At present, this is not a huge issue for phylogeny because there is no character for number of osteoderm morphotypes, but it is an issue for taxonomy because A. binasser and the functionally lost A. chiton are only differentiated on this feature.
Apples and oranges
Those who keep up with enough of my research know that one of my big themes is ontogeny, how we identify how mature temnospondyl specimens were at the time of death, and what not accounting for biases or skews in the fossil record might totally screw up our interpretations and analyses. I didn't focus on this too much with dissorophids as I did previously with trematopids, but there is a major disparity in size, which might correlate with disparity in the maturity of preserved specimens. Part of my thesis looked at Cacops and suggested, based on braincase morphology, that pretty much all known specimens of Cacops are juveniles at best, which underscores the point that just because you have a lot of the same size doesn't mean that was the largest size. This is definitely an interesting future direction though because cacopines like Cacops include the largest dissorophoids, which seem to get especially large in the middle Permian after lots of other Paleozoic temnos go extinct (including trematopids). Dissorophids are really diverse, and unlike trematopids, seem to have coexisted with other dissorophid species fairly commonly, so there could be legitimate major size differences between taxa that would lead to niche partitioning (e.g., one targets small tetrapods, one targets medium tetrapods, etc.).
This has been a longggg post. Hopefully it made sense if you read it all the way through, and if you did, thanks for reading! Hopefully you learned something new! This paper can really be distilled to four main points, which I also summarized at the end of the paper in a very atypical fashion for me just because of how much is packed in here.
New publication (First record of the amphibamiform Micropholis stowi from the lower Fremouw Formation (Lower Triassic) of Antarctica; Gee & Sidor, 2021, JVP)
Title: New publication (First record of the amphibamiform Micropholis stowi from the lower Fremouw Formation (Lower Triassic) of Antarctica
Authors: B.M. Gee, C.A. Sidor
Journal: Journal of Vertebrate Paleontology
General summary: Following the end-Permian mass extinction (252 mya), life on Earth hit a big reset button. Many major groups suffered substantial losses in biodiversity or went entirely extinct, being replaced by other groups that were rare or non-existent before the extinction. The temnospondyl amphibians are no exception - almost all of the characteristic groups found in the Permian are entirely wiped out, and that ones that survive it are found in very low diversity and are restricted to mostly a few localities that would have been at high latitude. Concurrent with this is the observation that a lot of the first temnospondyls that appear in the Early Triassic are rather small compared to their Permian predecessors or their later Triassic successors. One of these is a holdover from the Permian, the amphibamiform dissorophoids. There is only one definitive amphibamiform in the Triassic (there were over 30 in the Permian), Micropholis stowi, a species known from South Africa for over 150 years now. Our study reports the first record of M. stowi from outside of South Africa, in similarly aged Early Triassic rocks from Antarctica. This further solidifies the similarities between the Early Triassic of Antarctica and South Africa, but it still comes in the face of somewhat stark endemism otherwise - most of the temnospondyls known from the southern hemisphere at the time are found in only one spot, despite there being a rich record of this group in places like South Africa, Australia, India, and Madagascar. This is also the first paper from my postdoc and sets the stage for some of our in prep projects on other newly collected material from Antarctica!
New publication ( Description of the metoposaurid Anaschisma browni from the New Oxford Formation of Pennsylvania; Gee & Jasinski, 2021, Journal of Paleontology)
Title: Description of the metoposaurid Anaschisma browni from the New Oxford Formation of Pennsylvania
Authors: B.M. Gee, S.E. Jasinski
Journal: Journal of Paleontology
The amazing art above was done by Sergey Krasovskiy; you can find his DeviantArt profile here and his Twitter here! The number of metoposaurids is not random - we know that there were at least two individuals preserved at the site based on the number of duplicate elements from the site.
General summary: This study provides the first detailed description of metoposaurid temnospondyls from Late Triassic deposits in Pennsylvania. Metoposaurids are well-known for their widespread distribution in the Late Triassic, including in the southwestern U.S., but they have a much scarcer record west of Texas. The east coast in particular was geographically closer to other regions that preserve metoposaurids that have drifted apart over the millenia, like northern Africa and western Europe. The east coast is also the only region in North America where a large-bodied temnospondyl that isn't a metoposaurid is known from (this pattern of metoposaurid exclusivity is not observed anywhere else with decent sampling). While previous people have mentioned the Pennsylvania material and ID'd it, they never provided good figures or explicit justification for their ID, so essentially any reader was left to take them at their word. We formally describe the material, identifying it to Anaschisma browni (best known from Arizona, New Mexico, Texas, and Wyoming), which is not the same species that a metoposaurid from Nova Scotia belongs to, and discuss the importance of providing proper documentation and justification for taxonomic identifications (which are fluid hypotheses, not static facts). While the east coast is much more fossil-poor (this is definitely the best metoposaurid material from anywhere in North America west of Texas), it may also be the only region that preserves the transition from a diverse large-bodied temnospondyl assemblage to one that is entirely dominated by metoposaurids.
New publication ( New information on the dissorophid Conjunctio (Temnospondyli) based on a specimen from the Cutler Formation of Colorado, U.S.A.; Gee et al., 2021, JVP)
Title: New information on the dissorophid Conjunctio (Temnospondyli) based on a specimen from the Cutler Formation of Colorado, U.S.A.
Authors: B.M. Gee, D. S Berman, A. C. Henrici, J.D. Pardo, A. K. Huttenlocker
Journal: Journal of Vertebrate Paleontology
The Red River
The Red River is a mostly east-west oriented, eastward-flowing indirect tributary of the Mississippi River, stretching from the Texas Panhandle all the way to Louisiana (over 1,350 miles). It forms the border between Texas and Oklahoma (hence where the "Red River Rivalry" in collegiate football between University of Texas and the University of Oklahoma derives from), two of the most historically productive states for early Permian tetrapod fossils (especially the famed Texas redbeds). Much of our record of North American temnospondyls from this time comes from these two states, with more minor contributions from New Mexico, Utah, and Ohio. Dissorophids, the armoured dissorophoids, are among the best known taxa from these redbed deposits, and in many instances, the entire record of a given species is restricted to one or two states. Whether this is an artifact of the fossil record or an accurate representation of a truly restricted geographic range is often unclear; many taxa are in fact known from only one specimen or one site (the two species of Cacops below are examples; figures from Gee & Reisz, 2018 and Anderson et al., 2020).
Land of Enchantment
Conjunctio multidens (reconstruction from Schoch & Sues, 2013) is one such dissorophid, known only from two sites (but the same county) in New Mexico. Although New Mexico and Texas are adjacent states, their faunal assemblages were somewhat different, a result of regional geography no longer present that would have created some degree of spatial segregation. Other examples can be found in the trematopids Anconastes and Ecolsonia, only found in New Mexico, or the dissorophid Broiliellus reiszi, which differs starkly from other members of the genus that are known from Texas. Because the Texas redbeds have been so extensively sampled, with at least a dozen other dissorophids known, it can be reasoned that indeed Conjunctio also did not occur in Texas. But does that mean that it was only found in New Mexico? Not necessarily. The rest of the Four Corners region and the midcontinent is less well-explored, probably because the early Permian deposits that yield tetrapod fossils are less extensive. One of the well-known productive localities in the Four Corners region is in the Placerville area in Colorado, where the Cutler Formation (also found in New Mexico) is several hundred meters thick. First reported by Lewis & Vaughn (1965), the assemblage preserves one of the few sails of the dissorophid Platyhystrix (below on right), the holotype of Diadectes sanmiguelensis, a diadectomorph stem amniote (below on left), and a number of other tetrapods also known from other redbeds deposits. Synapsid fans may also know this as the type locality for the sphenacodontid Cutleria.
Terminology: While 'transitional form' or 'transitional fossil' is a commonly used term that even the general public knows (e.g., Archaeopteryx), scientists are moving away from it because it's misleading, suggesting that one animal directly turns into another (one of the most common misconceptions about evolution). In fact, this very rarely occurs (a process called anagenesis), and it does not capture major transitions like the theropod-bird transition or the fish-tetrapod transition. Instead, continual splitting of new species that go extinct is what gradually leads to a shift in a group's anatomy. A given species' suite of features may be transitional insofar as it captures part of this shift, but the species or individual itself is not.
Agree to disagree
As one might expect from a taxon with a weird mixture of features, there is no agreement on the position of Conjunctio in phylogenetic analyses of dissorophids, as summarized on the right. Historically, the phylogeny did not support that they were conspecific (expected recovery as exclusive sister taxa), as they were frequently scored separately due to uncertainty over Carroll's referral (the referred specimen was called the "Rio Arriba Taxon"). It doesn't really help that the holotype (C-D) and the much smaller referred specimen (A-B) differ from each other in cranial proportions, somewhat questioning whether they are actually the same taxon (since Bob Carroll created Conjunctio in 1964). You don't have to be a paleontologist to look at the two specimens above and kinda wonder whether they are actually the same (or whether the bottom one can be said to be much of anything really). Sometimes both specimens were recovered as eucacopines (Holmes et al., 2013), sometimes only one was recovered as a eucacopine (Schoch, 2012), sometimes Eucacopinae didn't exist (Maddin et al., 2013), sometimes the composite was recovered as a eucacopine (Schoch & Sues, 2013), and sometimes the composite was recovered as a dissorophine (Liu, 2018). So basically every possible result short of it being recovered as a non-dissorophid!
The Centennial State
The new specimen that we report here was collected from what's known as the Placerville locality by my coauthors, Adam Huttenlocker and Jason Pardo, a few years back. It is not nearly as complete as the other two specimens, but it is well-preserved, enough to show some distinctive features not found in most or all of the other dissorophids. For example, the two bones called the postorbital (right behind the eye) and the supratemporal (a little farther back) usually touch, but here they do not (this is found in two of the species of Cacops as well). The jaw articulation (marked by the quadrate) also sits in front of the level of the back of the skull (marked by the postparietal); this is something found only in Dissorophus and Scapanops. So there is actually quite a lot of information in this little skull despite it being at best 25% complete!
Our phylogenetic analysis, which combines two previous matrices (it was originally intended for a larger sample), finally recovers not just the original two specimens, but also this third one as a proper clade (i.e. what you expect if they really do all belong to the same species)! The statistical support is not very good though, and it's quite possible that a future study could find a different result. Dissorophid phylogenetics remains very much in flux outside of just Conjunctio (something that I'm working on right now).
COVID addendum: Lest someone think that all of us are superhumans who pumped out this paper in the pandemic, it was mostly completed before the pandemic really set in (we submitted the first version in May 2020). Jason and I in particular as current and recently graduated PhD students have really slowed down in putting new work into the pipeline during the pandemic, which I think is important to note since a lot of people are really struggling and should not be concerned with their perceived relative productivity!
New publication (Reassessment of historic ‘microsaurs’ from Joggins, Nova Scotia, reveals hidden diversity in the earliest amniote ecosystem; Mann et al., 2020, Papers in Palaeontology)
Title: Reassessment of historic ‘microsaurs’ from Joggins, Nova Scotia, reveals hidden diversity in the earliest amniote ecosystem
Authors: A. Mann; B.M. Gee, J.D. Pardo, D. Marjanović, G.R. Adams, A.S. Calthorpe, H.C. Maddin, J.S. Anderson
Journal: Papers in Palaeontology
DOI to paper: 10.1002/spp2.1316
General summary: In what is my biggest collaboration to date with nearly all of the early tetrapod workers in Canada, here's probably my last paper of 2020: a very thorough revision of the 'microsaur' Asaphestera from Joggins, Nova Scotia! Joggins is one of the oldest sites with a well-preserved late Carboniferous tetrapod community (315-318 Ma), and it preserves a unique forest swamp environment in which animals are frequently preserved inside of logs (whether or not they lived in the logs is another question). Because of its age, Joggins is critical for exploring the early origins of many groups, as it often preserves the oldest members of many Paleozoic clades in North America, if not globally. For example, the temnospondyl Dendrerpeton from Joggins is the oldest known temnospondyl on the continent. There's a rich suite of 'microsaurs,' the ever controversial group of tetrapods of unclear affinities to either crown reptiles or to crown amphibians, that have been reported from Joggins, but many of the original studies are dated such that the interpretations are of limited utility for modern workers trying to elucidate phylogenetic relationships and other macroevolutionary trends. The frequent disarticulation and fragmentation of specimens also makes referrals of specimens to the same taxon somewhat suspect, with the potential for purportedly well-known taxa to be represented by a chimeric blend of multiple taxa. We spent a lot of time looking at historic specimens of one such 'microsaur,' Asaphestera, because we were particularly interested in exploring some of these very poorly known Carboniferous 'microsaurs' that are really important for understanding the evolution and relationships of the much better known early Permian taxa that everybody on this paper has studied to various degrees. Through this, we identified that the various specimens lumped under Asaphestera intermedia are actually a hodge-podge of different taxa, at least one of which is not even a 'microsaur,' but as we propose, is perhaps the earliest definitive synapsid!
The oldest synapsid?
RM 2-1131 was the original holotype of Hylerpeton intermedium and became the holotype of the new Asaphestera intermedia. Here it is above on the right. If the figure does not look that impressive...that's because the specimen is not very impressive. We couldn't identify it to a particular tetrapod clade, let alone diagnose it as a unique taxon of whatever clade that might be, and we designated it as a nomen dubium ("doubtful name"), which is the fancy way of saying "this is not valid because it is not clearly unique." So at this point, we've killed one 'microsaur.' Net: -1 microsaur
In any event, we then set about examining the referred specimens of "Asaphestera intermedia," at this point a sunk taxon, because some of them are much nicer. The one above was sufficiently complete to be diagnosed and differentiated from other taxa, and we made it a holotype of a new 'microsaur' taxon: Steenerpeton silvae, which we named for Margaret Steen, (the same Steen I keep mentioning throughout this post), and for 'of the forest,' referring to Joggins original paleoenvironment. So now we've added back one 'microsaur!' Net: +0 microsaurs
Steenerpeton appears fairly derived already, in contrast to traditional interpretations of Asaphestera of any flavour, and we suggest that it may be a recumbirostran, the derived group of 'microsaurs' that host a suite of fossorial adaptations. This actually lines up well with the documented presence of other derived 'microsaurs' like possible pantylids and gymnarthrids at Joggins and provides more evidence that the origin and diversification of recumbirostrans begin earlier than has been historically conceived and probably occurred in an interval without substantial body fossils. The former point in particular is a real running theme at this point for most of the research that all of us do.
This is the biggest collaboration I've ever worked on (some might say the most ambitious crossover ever). It was born nearly 2.5 years ago at the SVP Annual Meeting in Calgary, which is where Arjan and I both gave talks on different 'microsaurs' (Arjan's turned into his published paper on the new Mazon Creek 'microsaur' Diabloroter, and mine turned into my published paper on new material of Llistrofus from Richards Spur). Jason Pardo (and Jason Anderson) had just had a high-profile paper come out that summer that recovered 'microsaurs' as crown amniotes, a paper I didn't stop hearing about from my advisor even though I spent the whole summer doing fieldwork in Arizona. We all ended up talking about future directions for 'microsaur' work and agreed on needing to reevaluate a lot of the earliest records of what are called "tuditanomorphs" to better characterize them for big tetrapod analyses to continue to sort out different relationships. This in turn spawned what's become a dynamic working group among the three of us and eventually led to us jokingly assigning ourselves as different members of the Avengers as this team of early career researchers doing early tetrapod work on basically every single clade that's out there. If you add up everything that we've collectively published, we account for actinopterygian and sarcopterygian fish, aïstopods (stem tetrapods), temnospondyls, parareptiles, seymouriamorphs, embolomeres, synapsids, and 'microsaurs.' You can see some of the other fruits of this working group in Pardo & Mann (2018), Mann, Pardo & Maddin (2019) and Mann & Gee (2020), with plenty of future work slated for the immediate future (or once things sort of go back to normal)! I will leave it to the readers' imagination as to which members we have self-assigned as...
The final sum
The Joggins 'microsaurs' are the only taxa from Carboniferous localities that have experienced these sorts of "lump/split" boom-bust cycles, going from being remarkably taxonomically diverse to remarkably intraspecific variable and back again. The classic Joggins temnospondyl, Dendrerpeton, is another great example of this; the type species, D. acadianum, includes several defunct genera such as Smilerpeton (great name) and Dendryazousa. There are numerous taxa that have been named that weren't even assigned to the correct clade! A lot of this has been because of historic interpretations based on the often fragmentary and disarticulated specimens at Joggins; we don't get many 3-D specimens, and the distortions can affect interpretations of the anatomy. This means that there's a lot of room to go back and look at specimens that were often collected more than 120 years ago and that haven't been re-examined in detail in more than 60, 70, 80... years to assess the original interpretations. Particularly in light of new discoveries at other localities, updated taxonomy, and the recognition that a lot of clades have deeper origins than was often hypothesized in the early 20th century, exploring one of the oldest (crown) tetrapod assemblages in the world is essential for unraveling a lot of the mysteries of the origin of the modern tetrapod clades.
When we started this project, there were seven 'microsaurs' recognized from Joggins: Archerpeton, Asaphestera, Hylerpeton, Leiocephalikon, Ricnodon, and Trachystegos. We added Steenerpeton (total: 8), but deep-sixed Archerpeton, recognized Asaphestera proper as a synapsid, and cast doubt on the Joggins records of Ricnodon (final count: 5). We've got plans for some of those other ones that didn't get addressed here...stay tuned for plenty more studies coming out on Joggins material and other Carboniferous faunas!
About the blog
A blog on all things temnospondyl written by someone who spends too much time thinking about them. Covers all aspects of temnospondyl paleobiology and ongoing research (not just mine).