Friday, October 10, 2014

How should we report the age of fossils? Pitfalls and implications for paleontologists.

A few years ago, during a talk I was watching at a conference (the details are better left unstated), I realized that there is quite a variety - both in terms of methodology (or lack thereof) and quality of format - of ways to report the geologic age of fossils. Ever since, I have wanted to write up a post, as the topic has been nagging at the back of my mind. Paleontologists often frustrate geologists with their poor understanding of certain major geological principles - something I never really understood when I was an undergraduate student; at Montana State U., we had a pretty strong background in geology. Further to the point, my undergraduate adviser Dave Varricchio is a specialist in taphonomy, and hammered into our poor little heads the importance of geology in our field - which was subsequently galvanized by a master's thesis in the subject, my adviser for which was (gasp!) a sedimentologist (Jim Schmitt). After attending various conferences, however, statements about geology and taphonomy in various talks have left me slack-jawed, and I unfortunately understand why paleontologists get ribbed about it.

I'll start by saying this: we study fossils because they give us unique insight into the history of life on earth, and we risk missing the true impact of paleontological data if we are careless about it. Now that that's out of the way, here are some of the topics I'm going to touch on: 1) basics of geologic age, 2) sources of age data, 3) recently proposed best practices for reporting ages for molecular clock data, 4) dates from the paleontologic and stratigraphic literature, 5) pitfalls of the paleobiology database, and 6) some suggestions for how to present age data for paleontologists. I'll try to keep this informative and useful instead of a rant, but there will be specific points where I just won't be able to help myself.

As a quick note: while I am perhaps conversant in stratigraphy and geology, I am by no means an expert (I spend far more time looking at bones!), so anybody who knows better than I do - if you have anything to add to this, whether it be corrections, hate mail, tweaks to suggestions or even additional suggestions - your input can help improve this.

Basics of Geologic Age

Since this blog attracts readers with a wide variety of experience, I think some very basic geology is worth retreading, even if some have probably/hopefully read it elsewhere. The geologic time scale was first assembled during the early 19th century in western Europe, primarily based upon biostratigraphy. Pioneering geologists like William Smith used index fossils to recognize strata of the same age, and in 1815, Smith published the first large geologic map - looking at the beautifully exposed bands of strata in England, it's no wonder it formed the basis for the early understanding of geologic time and stratigraphy.

The terms Paleozoic, Mesozoic, and Cenozoic were introduced around 1840 (by Adam Sedgwick and John Phillips) to replace the Primary, Secondary, and Tertiary "periods" (as they were known then; Tertiary is the only one that lingers on). Charles Lyell introduced epoch divisions for the Cenozoic, and coined the epochs Pliocene, Miocene, and Eocene - at first, these formed the only divisions of the Cenozoic. Eocene roughly translates to "new dawn", referring to the dawn of the "new" Cenozoic invertebrate fauna; Miocene translates to "less new" and Pliocene as "more new", and the latter roughly means "continuation of the recent", referring to the relatively "young" aspect of the invertebrate fauna. These epochs were originally defined by Lyell based upon what percentage of invertebrates were extant: only 3.5% of Eocene mollusks and 17% of Miocene mollusks were extant; on the other hand, the "older Pliocene" was defined upon a mollusk fauna composed of 30-50% extant taxa. Lastly, the "newer Pliocene" was based upon mollusk faunas that were 90-95% extant. Subsequently, additional epochs were added, and over the next century, piecemeal advances in biostratigraphy clarified the stratigraphic distribution of many invertebrates and a geologic time scale was cobbled together across Europe.
What is obviously missing thus far is the sense of time; radiometric dating was not discovered until the early 20th century and not perfected until the nuclear age. Until radiometric dating was employed, geologists worked effectively in a vacuum of numerical age data - all work was done with an understanding of relative time only; Lyell would never learn that the Eocene was far longer than the Pliocene, for example, or that the Miocene-Pliocene boundary was only about 5 million years ago - but the Cenozoic in total had a duration of about 66 million years.

The geologic time scale updated for 2014.
This abstract idea of time - otherwise referred to as geochronology - regardless of absolute dates, has always been kept separate from chronostratigraphy. Chronostratigraphy can be viewed as the rock record itself, with all of its imperfections. This dichotomy may sound a bit strange to the uninitiated, and indeed, it seems strange to many paleontologists (this is actually not a jest). Chronostratigraphic units are in a sense material units, while geochronologic units are immaterial and refer only to spans of time. For example, the Paleogene Period in geochronology is a span of time from about 66 to 23 Ma, while in chronostratigraphy the Paleogene System comprises all rocks deposited during that time. This dichotomy serves to highlight epistemological differences between the two frameworks (i.e. they are defined based upon different sets of criteria). Why the difference? The rock record, as reflected by the deposition of sediment, is like a barcode through time: surprisingly little time is represented by packages of sediment, and the majority is locked up instead in erosional surfaces (e.g. unconformities). Because one framework is based purely upon rocks (chronostratigraphy) and the other purely upon time (geochronology), we need to be careful about using terms like upper versus late: upper denotes a position within the stratigraphic column, whereas late refers to time. This is why if you write something like "the late Miocene Santa Margarita Sandstone" geologists will either yell at you or make fun of you. This would be better written as "the upper Miocene Santa Margarita Sandstone" as we're discussing a rock unit; however, it would be acceptable to say "the Santa Margarita Sandstone is of late Miocene age". I'll admit that I've been slack about this before and this slipped through in at least one article, and I've gotten considerably more careful about it after being chewed out.
A bunch of super-helpful writing tips for some of these confusing issues is given by Owen (2009).

Chronostratigraphy (left) versus Geochronology (right). From Owen (2009).

One last point: capitalization of time period modifiers. Some periods and epochs are formally subdivided, others are not; if a period of time has been formally subdivided, then the modifier should be capitalized. For example, the Late Cretaceous is a formally subdivided period of time; the middle Miocene is not. Some confusion exists about the Pliocene and Pleistocene; some stratigraphers such as John Van Couvering have designated various international stages (e.g. Zanclean, Piacenzian) and their boundaries as defining partitions within the Pliocene and thus formally subdivided the Pliocene, and only some partitions of the Pleistocene have been formally subdivided. Further complicating matters is that according to the Geologic Time Scale 2012 the Pliocene and some of the Pleistocene subdivisions are formally designated (and thus can be capitalized), but the International Commission on Stratigraphy (ICS) does not recognize these as being formally subdivided (except for the Upper and Middle Pleistocene). So it's admittedly a huge headache, but the moral of the story is this: paleontologists frequently screw this up, but then again so do geologists. There is disagreement within stratigraphy about these things, so as long as you cite a particular framework (e.g. GTS 2012, the ICS, or others) and follow that consistently, you'll be fine. This is perhaps not totally relevant to some of the specific points I'll be making, but is good ground to cover as relevant background, as these issues are often a source of confusion.

Sources of Age Data in Geology and Paleontology

With that out of the way, what sources of age data do we have at our disposal? My list is far from comprehensive, but these are the major methods used.

1) Radiometric dates. More often than not these come from ash/tuff beds that have been dated using Ar/Ar or K/Ar from feldspars, although occasionally we get interbedded basalts that are even easier to date. Less commonly, radiometric dates are obtained from glauconite (that's right, ordinary greensand lends itself towards radiometric dates, randomly). These are generally quite good, and can be recalculated using updated decay constants. There are other types of dating methods under this category, such as U-Pb series dating.

A cute infographic showing the basics of radiometric dating (in this case, the isotope is Carbon-14).

2) Fission track dates. Elements undergoing radioactive decay like Uranium-238 leave little damaged tracks in the crystal lattice of minerals like zircon each time the uranium undergoes spontaneous fission. Because the decay rate of U-238 is known, the number of fission tracks can reliably tell us how long ago the zircon cooled from a magma. This is best used as a dating method when fresh zircons are sampled from an ash bed and were formed during that eruption.

Fission tracks in a zircon. [Ironically, this is borrowed from a creationist website.]

3) Ash correlation. Not all ash beds are datable - indeed, many ash beds lack phenocrysts large enough to sample by radiometric means or execute a fission track count. However, because every volcano has punched through a slightly different suite of rocks, every eruption has a distinct chemical fingerprint which can be used to tie non-dated ash beds to dated basalts or proximal ashes (e.g. an ash bed may have larger phenocrysts within it close to the volcanic vent). Ash fingerprinting - otherwise known as tephrochronology - is immensely important in Cenozoic marine and terrestrial rocks in the western U.S., as volcanism was fairly active during the Cenozoic.

A tephrochronologic correlation web for the late Neogene of Northern California. 
From Powell et al. (2007).

4) Magnetostratigraphy. Little iron-rich grains in sediment align themselves with the earth's magnetic field - just like iron filings on a sheet of paper adjacent to a bar magnet, as illustrated in literally every science textbook ever made. Every so often the earth's magnetic poles switch, and when that happens the grains will point in the opposite direction. A geologist (specifically a magnetostratigrapher) can therefore take hundreds of oriented samples from a section of rock and identify which sections have normal or reversed polarity - these data can then be organized into a vertical "barcode" which can be calibrated using microfossils (see below). This vertical barcode can of course be matched to stripes on the Atlantic seafloor parallel to the Mid-Atlantic Ridge; as new oceanic crust cools at the ridge, iron-rich grains also align with the magnetic field, resulting in magnetic stripes along the entire seafloor. These stripes go back to the mid-Mesozoic (as much of the pre-Cretaceous Atlantic oceanic crust has been subducted). In concert with biostratigraphy, a robust paleomagnetic framework exists and can be applied to virtually any strata on earth (generally Cretaceous or younger), provided that some biostratigraphically useful fossil content is preserved in the strata in question. A bonus is that because mid-Atlantic seafloor spreading is more or less continuous (and rates of change in spreading are well-known), paleomagnetism can be a good way to tell if a significant hiatus in deposition exists; indeed, paleomagnetic work identified that a single bonebed in the Purisima Formation (=Bonebed 6 of my thesis/PLOS One article) recorded a 1 million year gap in deposition.

Paleomagnetism of the mid-Atlantic spreading ridge and within a sediment core: matching the vertical barcode to the horizontal barcode is essentially the main principle of magnetostratigraphy.
5) Biostratigraphy. This is the oldest method for age determination and is a form of relative dating. Whereas during the 19th and early 20th century fossils told us how old the rocks were only relative to other strata, we now have a whole host of absolute dates determined from the other four methods, which have allowed assignment of well-constrained dates to biostratigraphic boundaries. Biostratigraphic zones (aka biozones) come in a number of types, including A) taxon range zones (e.g. range of a single species), B) concurrent range zones (overlap range of two species), C) assemblage zones (overlap range of several species), D) interval zones (defined based on the bracketing of two bioevents, such as the extinction of two separate species at different times), and E) acme zones (defined based on abundance of a particular species; very susceptible to taphonomic processes). Taxa useful for biostratigraphy include microfossils such as diatoms (siliceous algae), benthic and planktic foraminifera (calcareous protists), calcareous nannoplankton (aka coccolithophores), radiolarians (also siliceous, but protists), dinoflagellates, and conodonts (Paleozoic and Triassic only; actually a phosphatic element of early vertebrates); useful macroinvertebrate groups include bivalves, gastropods, and particularly ammonites for the Mesozoic - echinoderms have also been used in places. In terrestrial settings mammals are frequently used (e.g. North American Land Mammal Ages) as well as pollen. Useful fossils (index fossils) need to be widespread, easy to identify (and preserved well enough to be easily identifiable), and have a rate of morphological change that permits reasonable biostratigraphic subdivision. In other words, a hypothetical brachiopod that has changed little since the Triassic, only lives in deep sea environments, and is usually too poorly preserved to identify would be a pretty shitty choice for an index fossil. On the other hand, foraminifera live nearly everywhere in the marine realm and evolve somewhat rapidly, and are therefore one of the most useful groups.

 Examples of biozone types and definitions from the North American Stratigraphic Code.
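As a rough illustration of the principle behind the radiometric dates in item 1 above, the age of a sample follows from how much of the parent isotope is left. The sketch below assumes a closed system, a known starting amount, and simple exponential decay; the function name `decay_age` is made up for this example, and real Ar/Ar or K/Ar work involves corrections well beyond this.

```python
import math


def decay_age(fraction_remaining, half_life):
    """Age of a sample from the fraction of parent isotope left.

    Assumes a closed system and simple exponential decay:
    N/N0 = exp(-lambda * t), so t = -ln(N/N0) / lambda.
    The result is in the same time units as half_life.
    """
    if not 0 < fraction_remaining <= 1:
        raise ValueError("fraction_remaining must be in (0, 1]")
    decay_constant = math.log(2) / half_life  # lambda
    return -math.log(fraction_remaining) / decay_constant


# Carbon-14 (half-life about 5730 years): if a quarter of the original
# parent is left, two half-lives have elapsed.
c14_age = decay_age(0.25, 5730.0)
print(round(c14_age))  # 11460
```

In this simplified form the calculated age scales inversely with the decay constant, which gives a feel for why a revised constant shifts previously published dates.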

So what sort of age data do these provide? The first three generally all provide classical "dates" in the sense that each is presented as a midpoint with error bars: 25 ± 0.2 Ma, for example. The third sort is an ash correlation, so while the date is not intrinsic to the section of rock in question, it is chemically unique to an ash or lava bed somewhere else that has been dated, and thus the date can be applied to the correlative ash bed in our section. The last two are a bit of a mixed bag. Many boundaries of magnetozones are well-dated, so if we have the Gauss-Matuyama chronozone boundary preserved in our section, we can slap a date of 2.58 Ma on that magnetic reversal in our section. Many biozone boundaries - particularly for marine microfossils - have known absolute dates established based upon other methods. If a zone boundary is reflected in our section, we can use that; however, if a fossil in question just has "Boringmicrofossil taxon subzone B" attached to it, then the range of that subzone must be used - let's say the zone is 45.5-48.7 Ma in age, for example. What if our fossil is just above or below a well-dated biozone boundary, but is not bracketed by another date or biozone boundary? In that case, we're stuck with a range as well - whatever biozone our fossil belongs to, on whichever side of the zone boundary it came from. To be totally honest, most absolute ages for biozone boundaries are derived from methods reporting error bars, but are rarely reported as such, and thus for biozone boundary ages it's not necessary to report a "±" range as few biostratigraphers bother. Lastly, what if our fossil occurrence has a biozone determination that lacks absolutely dated zone boundaries? Well, for the time being we're out of luck until a nice geologist fixes it for us.

An example of a robust set of dates for a vertebrate fossil: a fossil of the bony toothed bird Pelagornis from the Purisima Formation in California dated to ~2.5-3.35 Ma, from Boessenecker and Smith (2011). The fossil occurred in a stratum bracketed by two ash beds, and remains the youngest well-dated pelagornithid from the Pacific Basin.

Ideally, we would have radiometric dates that bracket a fossil occurrence nicely - for example, a fossil of Pelagornis that N. Adam Smith and I reported in 2011 was bracketed by a 3.35 ± 0.05 Ma ash below and a 2.5 ± 0.2 Ma ash above; using the midpoints of each results in an age of 2.5-3.35 Ma for the fossil. Alternatively, we could be slightly more conservative and utilize the endpoints (e.g. 2.3-3.4 Ma in this example). Rarely does this happen, and often we need to cobble together dates from various sources and dating methods in order to get the most tightly constrained age determination possible. For example, our fossil may be 10 meters below a dated ash, but 50 meters above the next dated ash - and a foraminiferal zone boundary may be 10 meters below our fossil. In that case, it's totally fine to use that zone boundary and the upper ash. Or, perhaps microfossils associated with the fossil itself have a substantially younger maximum age than the lower ash bed; in that case, it's fine to use the endpoint of that zone as a maximum age control, and the overlying ash date for the minimum age. We have to use whatever age control we can get our hands on - and age determinations for fossils are best regarded as ranges. Rarely do we ever get a date right from a fossil - radiocarbon dating in the Pleistocene is a notable exception (as is a fossil preserved directly in an ash bed, or in a greensand unit with a K/Ar date) - and in these cases, a midpoint with a "±" is appropriate. If, for the purposes of graphical portrayal of geologic age, a range is needed (or for methodological purposes), the range of error could perhaps be used (e.g. 67 ± 0.5 Ma would become 67.5-66.5 Ma). In general, one could use the minimum and maximum endpoints no matter what: the message here is that consistency is important, even if our age data have different sources.
Lastly, it is necessary to point out that the most up-to-date papers on the geology and stratigraphy of a particular locality should be read and cited.
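The bracketing arithmetic above can be sketched in a few lines. This is only a minimal illustration using the two Pelagornis ash dates quoted above, taking the older (3.35 Ma) ash as the underlying bed, as superposition requires; the names `DatedHorizon` and `bracketed_age` are hypothetical and not from any published workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatedHorizon:
    """A dated bed reported as midpoint ± error, in Ma."""
    midpoint: float
    error: float

    @property
    def oldest(self):
        return self.midpoint + self.error

    @property
    def youngest(self):
        return self.midpoint - self.error


def bracketed_age(below, above, conservative=True):
    """Return (max_age, min_age) in Ma for a fossil between two dated beds.

    `below` is the underlying (older) bed, `above` the overlying
    (younger) one. With conservative=True the outer ends of the error
    bars are used; otherwise the midpoints.
    """
    if below.midpoint < above.midpoint:
        raise ValueError("underlying bed should not be younger than the overlying bed")
    if conservative:
        return (below.oldest, above.youngest)
    return (below.midpoint, above.midpoint)


lower_ash = DatedHorizon(3.35, 0.05)
upper_ash = DatedHorizon(2.5, 0.2)

max_age, min_age = bracketed_age(lower_ash, upper_ash, conservative=False)
print(f"midpoints: {min_age:.2f}-{max_age:.2f} Ma")  # midpoints: 2.50-3.35 Ma

max_age, min_age = bracketed_age(lower_ash, upper_ash)
print(f"endpoints: {min_age:.2f}-{max_age:.2f} Ma")  # endpoints: 2.30-3.40 Ma
```

Whichever convention is chosen, applying the same function to every occurrence is what keeps the reported ranges consistent across fossils with different kinds of age control.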

A quick note that doesn't fit elsewhere nicely: most strata are diachronous. That means that the age of the rock unit changes laterally. This makes sense because a given rock unit is generally a mappable unit of similar lithology, generally produced by a single depositional environment; depositional environments tend to move around through time (think of beaches during periods of sea level rise or fall). What this means is that geologic/stratigraphic data from the same locality (or from close by) are preferable, as this will decrease the chance of using dates that are too old or too young, owing to diachroneity.

Time transgressive strata, using beach deposited sand as an example here.
Otherwise, it is important to note that this is all empirical: species and their geologic age are not abstract units existing in some intangible ether - the range of a species, or genus, is defined based upon the youngest and oldest occurring specimens identifiable to the taxon in question. This is all based on specimens - physical objects somebody dug out of a hole somewhere that are reasonably assigned to a given taxon, and are sitting in some museum collection someplace (which we can go and visit!). All fossil specimens sitting in a museum drawer were once part of the rock record - which technically makes paleontologists earth scientists. Sometimes specimens are misidentified, which may throw a wrench or two into the machine. The point is this: geochronology, chronostratigraphy, and biostratigraphy are dynamic and based upon real data, and refinements in methodology and framework "anatomy" (e.g. zone boundaries) as well as new dates are constantly being published. More on this below.

Best practices for molecular clock calibrations

Up to this point the discussion has been mostly academic with little application outside paleontology or geology. However, many enterprising biologists have over the past two decades used certain fossils as calibration points for molecular clocks; these clocks, if properly calibrated using paleontological evidence, can provide powerful estimates for the dating of certain nodes on a phylogeny. The problem with all this is that some paleontologists have been notably sloppy in publishing age data for fossils; most biologists are complete novices when it comes to geology, and lack the background to discern whether age data sounds "bullshitty" or not. I won't cite a specific example so as to avoid offending anyone, but here's an example of a bad geologic age determination for an important fossil with specifics modified or omitted: "XXX part of the XXX Formation, middle XXXocene, correlative with the XXX North American Land Mammal age, ca XX-XX Ma (citations of paleontologic papers that didn't actually provide primary age information for the locality)". First off: whenever we see "circa", it means it's an estimate; estimates are fine, so long as they're understood to be such (which means it must be obvious from the text). Secondly, in this case no actual land mammals were found in that unit; the author(s) wished to indicate that the formation in question was thought to be broadly correlative with the XXX NALMA, but this was not actually based upon any information. The problems with this are twofold: at the time of writing, the author(s) of the paper in question had access to up-to-date papers on the stratigraphy of the unit which actually constrained the time fairly well - and this was either unknown, or simply ignored out of laziness. The second problem is that fairly irrelevant papers were cited. What might a biologist see when reading this, however? If I were a biologist I'd say "look, so and so said this fossil was about 35-37 million years old, I can't really argue with him, so let's go with that." The author in particular regularly publishes papers that do not cite geological papers published after 1990. It sounds silly, but unfortunately this is the state of affairs.

Because of issues like this and other lapses in quality on the part of certain molecular clock papers, Jim Parham and a huge lot of other authors (2012) published a fantastic paper advocating a set of "best practices" for identifying extinct taxa for molecular clock calibration points. Various studies have failed to identify individual fossil specimens, anatomical criteria (e.g. synapomorphies) for explaining why such fossils were identified as relevant for calibration, the formations they come from, or detailed age and locality information. Without this information, it is not necessarily possible to track down the rationale behind the use of certain criteria - indeed, certain molecular calibration points have even been based upon unpublished fossil specimens (that other researchers are not allowed to see) where the rationale for selection boils down to "one of our coauthors is a paleontologist and this is their opinion". Well, that doesn't sound very repeatable. So, Parham et al. (2012) outlined 5 recommended steps to ensure that a "readily auditable chain of evidence" could be established for each fossil selected as a calibration point. These steps designed by Parham et al. (2012) include:

1) Museum numbers of specimen(s) that demonstrate all the relevant characters and provenance data should be listed. Referrals of additional specimens to the focal taxon should be justified.

2) An apomorphy-based diagnosis of the specimen(s) or an explicit, up-to-date, phylogenetic analysis that includes the specimen(s) should be referenced.

3) Explicit statements on the reconciliation of morphological and molecular data should be given.

4) The locality and stratigraphic level (to the best of current knowledge) from which the calibrating fossil(s) was/were collected should be specified.

5) Reference to a published radioisotopic age and/or numeric timescale and details of numeric age selection should be given.

In retrospect, none of these sounds very demanding, and all are altogether reasonable - points 4 and 5 apply specifically to this topic. In general these "best practices" emphasize repeatability. Last year I published a paper on the baleen whale Herpetocetus from the Pleistocene and included a phylogeny with the stratigraphic ranges of various mysticetes. In order to firmly establish the duration of different mysticete lineages, I spent some time reading the most recent literature and compiled some notes; these notes turned into a supplementary appendix for the paper that included a short (~1 paragraph) justification for each species shown in the cladogram (Boessenecker 2013B: appendix). Altogether this entailed an extra few days of work - well worth it for getting a publication that will hopefully be useful, or at least blatantly transparent.

Dates from the paleontologic and stratigraphic literature

Should we use/read/cite up-to-date stratigraphic literature? It feels asinine to even ask the question. If you answer 'no', you're lazy. Many paleontologists are satisfied with papers published as long ago as the 1960's for stratigraphic information about particular fossil occurrences. Well, one of the great things about paleontology is that we're a small field and because we're a historical science, many early papers will always be relevant, unlike fields such as physics and medicine where some studies may be totally irrelevant and outdated within a decade (or less) of being published. Let's say we wanted to cite the age of some fossil that Cope or Marsh described - we surely wouldn't rely upon a Victorian account of local geology and take Ed or Charles's statement on the matter at face value. Victorian era publications are, effectively by default, outdated in terms of stratigraphic knowledge. Why is that? Could it be, perhaps, that geologists are still poking around hillsides, badlands, coastal cliffs, and roadcuts, collecting ever more data? Last I checked, geology is still a pretty dynamic field (with far more researchers than paleontology, for starters) and hundreds of journals crank out new publications every week. If it makes sense to double check for up-to-date stratigraphy for Victorian-era discoveries, then it follows to do the same for fossils published in the 1960's, 1970's, 1980's, and 1990's. Hell, you should make sure to double check up-to-date stratigraphy even for something you published a year ago (this week I found out that one of my published age determinations is off by about 0.4 Ma, thanks to an overlooked piece of data). It's not like we don't have magical internet machines and google - this sort of stuff is not that hard to find. Parham et al. (2012) eloquently point out that "Any numeric age is merely the best current estimate and can be refined through time." I'll give a series of "what if?" questions and some answers, since there are numerous issues to tackle:

What if a paleo article doesn't cite any geology articles, but gives an estimate of the age?

           Shame on them. Use google, georef, or web of science to search for the formation name. Using what papers you can, try to find stratigraphic data for that locality.

What if I can only find stratigraphic data for a different locality?

            Sometimes this can't be avoided. If it really can't be avoided, make sure to state that available evidence is from a separate locality.

What if I can't find any information about that formation at that locality?

            Stratigraphic terminology has changed a lot, with formations being promoted to Group status or demoted to members, or completely renamed in some cases. For example, the Drakes Bay Formation at Point Reyes in Northern California is now known to actually be three different formations already known from other localities (Santa Margarita Sandstone, Santa Cruz Mudstone, and Purisima Formation). Again, use google; see if papers on regional geology make any reference to nomenclatural changes.

What if the guy who described this fossil said it was Pliocene, but other papers indicate the fossil is Miocene?

            Just like lithostratigraphy, chronostratigraphic stages and boundaries have been modified and refined through time. In many papers published pre-1980, the Pliocene included much of what is now late Miocene. Read the available literature, and use whatever is most careful and up-to-date. Also, use google (again).

What if the paper that reported the fossil reported some sort of new data associated with the fossil?

            Perfect! A best-case scenario as this means the separation between the fossil and geologic data is minimized. In this case, please cite whatever data they report. If biostratigraphic data is involved, double check that that zone is still in use. Also, in the case of diatom biochronology in the North Pacific, the names of different zones have changed; they used to have roman numerals (I, II, III etc.) but in the 1980's were switched to taxon-range and concurrent-range zones using the names of the various diatom species.

What if the aforementioned example is sort of old?

            Then you should double check the biostratigraphic literature to see if refinements have been made to the boundary dates and zone definitions. If an absolute date is involved, it may be prudent to recalculate the age using updated constants. Personally I don't feel qualified to do this myself, but for my personal research purposes most of the absolute dates I've cited are less than a decade old or have already been recalculated by others. Refinements in radiometric dating methods mean that old dates ought to be updated.

What if age data is available only at the level of the formation?

            It happens unfortunately - sometimes a given unit is not well studied and lacks internal age determinations, and perhaps the formation corresponds to a single biozone or magnetic chronozone, or has a date from the top and base. Well, there's not much else you can do other than accept formation-level data; sometimes it's not a big deal as many units are thin and record a short period of time - but you're a bit hosed if it's a long-duration unit like the Monterey Formation.

What if the locality is virtually unstudied, but the author said it was a certain epoch?

            This is almost a worst-case scenario, but for some poorly studied localities, perhaps "Pliocene" (or better yet a stage, like "Maastrichtian") is all you're going to get; in that case, you can use the known age boundaries for the epoch (or other period of time). Often the original author doesn't have a hope in hell of knowing exactly how old it is, but this is still defensible as we're leaving a "citation trail".

What if I got these neat dates off of the Paleobiology Database?

            Many of the other concerns listed above apply to dates from the PBDB. Read the following section.

What if I’m putting together an analysis that requires “binning” of age data for different taxa into interval time bins? I’ve already binned it so I might as well use the binned age data for reporting the actual age (e.g. in a figure showing lineage ranges or a time-calibrated phylogeny showing lineage duration).

For heaven’s sake, don’t do that either! Binning is great for paleobiologic analyses of diversity data, but why bother converting the real age data into something less accurate? Let’s say, for example, that you have an unusually long time bin, based on a long stage. One lineage ends near the beginning of the bin and another begins near its end; the two do not actually overlap in time, yet both would be assigned to the same bin. Based upon the modified data, it would artificially appear that the two taxa lived at the same time. Binning is technically defensible in that it can be applied consistently, but we should be reporting the actual geochronologic age of particular fossils.
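To make the problem concrete, here is a minimal sketch in Python. The bins and the two lineage ranges are entirely hypothetical numbers chosen for illustration, and `AgeRange`, `overlaps`, and `snap_to_bins` are names I made up for this example rather than part of any real package.

```python
from typing import NamedTuple, Sequence


class AgeRange(NamedTuple):
    """An age range in Ma, older bound first (so oldest_ma >= youngest_ma)."""
    oldest_ma: float
    youngest_ma: float


def overlaps(a: AgeRange, b: AgeRange) -> bool:
    """True if the two ranges share some span of time (touching ends don't count)."""
    return a.youngest_ma < b.oldest_ma and b.youngest_ma < a.oldest_ma


def snap_to_bins(r: AgeRange, bins: Sequence[AgeRange]) -> AgeRange:
    """Replace a range with the full extent of every bin it intersects."""
    hit = [b for b in bins if overlaps(r, b)]
    return AgeRange(max(b.oldest_ma for b in hit),
                    min(b.youngest_ma for b in hit))


# Hypothetical time bins and lineage ranges (illustrative only).
bins = [AgeRange(30.0, 20.0), AgeRange(20.0, 10.0), AgeRange(10.0, 0.0)]
lineage_a = AgeRange(25.0, 19.0)  # dies out just after the middle bin begins
lineage_b = AgeRange(11.0, 5.0)   # appears just before the middle bin ends

binned_a = snap_to_bins(lineage_a, bins)
binned_b = snap_to_bins(lineage_b, bins)

print(overlaps(lineage_a, lineage_b))  # real ages: the lineages never coexist
print(overlaps(binned_a, binned_b))    # binned ages: they appear to coexist
```

With the real ranges, the two lineages are separated by 8 million years; after snapping to bins, both occupy the whole middle bin and look contemporaneous.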

Should I publish citations of which stratigraphic articles I looked at, so that the reader can tell the difference between the citation for the fossil occurrence and the geologic age if they come from different papers?

An emphatic YES! This should always be done so as not to leave the reader guessing (guessing ≠ science). Here are two examples of how you can organize such data.

This is just a tiny part of an enormous table I put together for my 2013 Geodiversitas paper. Geochronologic ages are frequently derived from non-paleontological sources. From Boessenecker (2013a).

In this case the table includes hard dates for fossil occurrences where the paleontologists provided reasonable dates, and one exception where several sources are cited thanks to refinements in stratigraphy. From Churchill et al. (2014).

To wrap up this section, I'll include another quote from Parham et al. (2012) that explains the point behind all this: "Anatomically trained fossil systematists may not be able to retrieve [geologic age] data any more easily than molecular systematists, but by listing the specimen numbers, rock units, and ages in a standardized way, others may check the claim, thus facilitating the refinement of numeric dates over time."

Pitfalls of the Paleobiology Database

The PBDB is an excellent reservoir of data, and is used frequently for analyses of diversity and paleobiogeography. For the uninitiated, the PBDB is a database recording various types of data directly from the paleontological literature, foremost of which are geochronologic age and the taxonomic identification of fossil occurrences. The problem with dates provided by the PBDB is that the quality is only as high as the investment by the data enterer: most fossil occurrences just use the date from the original article the data is taken from. In some rare cases primary stratigraphic literature is cited; rarer yet are cases where unpublished stratigraphic data are supplied.

Because the PBDB is just an aggregator of data, primarily from the paleontological (rather than stratigraphic) literature, it suffers from many of the same problems I’ve already discussed. It also suffers from outdated age determinations: examples include ages that predate the shift in recognition of the Miocene-Pliocene boundary from ~12 Ma to 5.33 Ma, and dates from old paleontological articles that cited stratigraphic works that are now long out of date. Because the stratigraphic literature is rarely double-checked, most of the age data on the PBDB should be assumed to be only as accurate as the sources available when the original article was published. Another issue is binning: unless pinpointed dates are given in a paleontological article, an occurrence reported only to stage or epoch is assigned the hard numbers for that interval’s boundaries. This is not necessarily a problem, because I advocate the same thing above (e.g. so and so said the fossil is “Pliocene”, so the fossil is assigned an age of 5.33-1.81 Ma). However, it’s important to evaluate the manner in which age was determined for each occurrence. I strongly recommend that age data from the PBDB be double-checked against the stratigraphic literature. Much of the PBDB is “reasonably” accurate, but reidentifications in the literature are not always recorded accurately, which will affect age ranges. I don’t expect PBDB data enterers to bother with the stratigraphic literature, as it would multiply the amount of work involved tenfold. The PBDB is an excellent tool and a great bibliography, but if someone is serious about getting accurate ages for fossils, it should not be accepted at face value.
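One possible way to triage a downloaded set of occurrences is to flag those whose age range exactly matches the boundaries of a named interval, since those were most likely assigned by name rather than from a pinpointed date. The sketch below is only an illustration: the record layout and field names are hypothetical (not the PBDB's actual format), and the single table entry simply reuses the "Pliocene" figures quoted in the paragraph above, so substitute boundaries from whatever time scale you are citing.

```python
# Interval boundaries in Ma as (oldest, youngest). The one entry reuses the
# figures quoted in the text; replace with values from the time scale you cite.
INTERVAL_BOUNDS_MA = {"Pliocene": (5.33, 1.81)}


def matching_interval(oldest_ma, youngest_ma, tol=0.005):
    """Return the interval name if the range equals its boundaries, else None."""
    for name, (old, young) in INTERVAL_BOUNDS_MA.items():
        if abs(oldest_ma - old) <= tol and abs(youngest_ma - young) <= tol:
            return name
    return None


# Hypothetical occurrence records (made-up IDs and ages, not real data).
occurrences = [
    {"id": "occ-1", "oldest_ma": 5.33, "youngest_ma": 1.81},
    {"id": "occ-2", "oldest_ma": 4.5, "youngest_ma": 3.5},
]

# Occurrences whose age looks like a whole-interval default.
needs_check = [
    o["id"] for o in occurrences
    if matching_interval(o["oldest_ma"], o["youngest_ma"]) is not None
]
print(needs_check)
```

An occurrence that is not flagged still deserves a look at its source; the flag only marks the records where nobody appears to have gone beyond the interval name.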

Suggested Methodology

1) Familiarize yourself with the stratigraphic literature for a given fossil occurrence. Old paleontological publications are almost guaranteed to require citations of more recent stratigraphic studies (and many do a piss-poor job of even citing current geological work at the time of publication!).

2) Double check that subsequent paleontological papers have not reidentified the fossil in question. Is it really a fossil of the walrus Pontolis, or is it an Imagotaria?

3) If the age of a fossil occurrence was given using biostratigraphic zones, the zone ages are probably out of date! Find the most up-to-date zone boundary ages available in the literature. Similarly, old radiometric dates can (and should) be recalculated (although often these dates can be found recalculated in more recent geological literature).

4) Justify the geochronologic age range that is being reported. Now that you’ve read the stratigraphic literature, cite it! Other researchers need to know how you arrived at your age determination; as Parham et al. (2012) stated, “a readily auditable chain of evidence” is the goal. At the bare minimum, include a table showing a citation for the fossil occurrence (primary paleontologic literature) and a citation for the geochronologic age from the best source possible (occasionally within paleontologic literature, but most often from the stratigraphic literature). The distinction between the two is very important! If age data are important to your study, then an appendix with a short description of the occurrences and age, filled with both of the aforementioned citation types, is a good idea.

5) Always vet fossil occurrences on the PBDB in a similar manner as you would for published occurrences.
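If you keep your occurrence data in a script or spreadsheet, the steps above can be sketched as a record that stores the two kinds of citation separately. This is just one possible layout, not a standard: the class name, field names, and the placeholder taxon, rock unit, and citations below are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OccurrenceAge:
    taxon: str
    rock_unit: str
    oldest_ma: float
    youngest_ma: float
    occurrence_citation: str            # primary paleontologic source
    age_citation: Optional[str] = None  # source for the age, ideally stratigraphic

    def audit(self) -> List[str]:
        """Return notes on anything a reader could not easily check."""
        notes = []
        if self.oldest_ma < self.youngest_ma:
            notes.append("oldest_ma is younger than youngest_ma")
        if not self.age_citation:
            notes.append("no citation given for the age")
        elif self.age_citation == self.occurrence_citation:
            notes.append("age comes from the occurrence paper; "
                         "check the stratigraphic literature")
        return notes


# Placeholder records (not real taxa, units, or references).
documented = OccurrenceAge("Taxon A", "Formation X", 5.0, 3.0,
                           "Paleontologist (year)", "Stratigrapher (year)")
unchecked = OccurrenceAge("Taxon B", "Formation Y", 12.0, 10.0,
                          "Paleontologist (year)", "Paleontologist (year)")

print(documented.audit())
print(unchecked.audit())
```

The last note is a prompt rather than an error, since the paleontologic paper is occasionally the best available source for the age.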


References

Boessenecker, R.W. and N.A. Smith. 2011. Latest Pacific basin record of a bony-toothed bird (Aves, Pelagornithidae) from the Pliocene Purisima Formation of California, U.S.A. Journal of Vertebrate Paleontology 31(3):652-657.

Boessenecker, R.W. 2013a. A new marine vertebrate assemblage from the Late Neogene Purisima Formation in Central California, Part II: Pinnipeds and Cetaceans. Geodiversitas 35(4):815-940.

Boessenecker, R.W. 2013b. Pleistocene survival of an archaic dwarf baleen whale (Mysticeti: Cetotheriidae). Naturwissenschaften 100(4):365-371.

Churchill, M.M., Boessenecker, R.W., and Clementz, M. 2014. Colonization of the Southern Hemisphere by fur seals and sea lions (Carnivora: Otariidae), revealed by combined evidence phylogenetic and Bayesian biogeographic analysis. Zoological Journal of the Linnean Society 172:200-225.

Owen, D.E. 2009. How to use stratigraphic terminology in papers, illustrations, and talks. Stratigraphy 6:106-116.

Parham, J.F., and many other authors. 2012. Best practices for justifying fossil calibrations. Systematic Biology 61:346-359.

Powell, C.L., II, J.A. Barron, A.M. Sarna-Wojcicki, J.C. Clark, F.A. Perry, E.E. Brabb, and R.J. Fleck. 2007. Age, stratigraphy, and correlation of the late Neogene Purisima Formation, central California coast ranges. U.S. Geological Survey Professional Paper 1740:1-32.
