Darwin Core Archive export

GBIF and other biodiversity databases accept data packaged as a Darwin Core Archive (DwC-A): a ZIP containing CSV data files, a machine-readable XML descriptor, and an EML metadata document. NaturaList compiles your project directly into this format through the DwC archive table. The feature is entirely opt-in - if the table is absent or empty, no DwC output is produced and no related options appear in the Manage screen.
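To make the package layout concrete, here is a minimal Python sketch that builds and inspects a toy archive in memory. It assumes the conventional Darwin Core Archive file names meta.xml (descriptor) and eml.xml (metadata); the data file name taxon.csv and all file bodies are placeholders invented for illustration, not output of NaturaList.

```python
import io
import zipfile

# Illustrative contents only; a real archive carries full data and metadata.
FILES = {
    "taxon.csv": "taxonID,scientificName\n1,Paridae\n",  # hypothetical data file
    "meta.xml": "<archive/>",  # XML descriptor (placeholder body)
    "eml.xml": "<eml/>",       # EML metadata (placeholder body)
}


def build_archive(files):
    """Pack a mapping of file name -> text into ZIP bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return buffer.getvalue()


def list_archive(data):
    """Return the sorted member names of a ZIP given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return sorted(zf.namelist())


names = list_archive(build_archive(FILES))
```

Listing the members of an exported archive in this way is a quick check that the data files, the descriptor, and the metadata document are all present before submission.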

The mental model: a term-to-source mapping table

Think of the DwC archive table as a translation layer between your project's internal data and the Darwin Core vocabulary. Each row answers the question: "For DwC term X, where should the compiler look to get its value?"

That "where to look" instruction is the value source - a directive that points the compiler at the right place: a column in your data sheet, a property derived from the compiled taxonomy, a dataset-wide setting, or a static string. The compiler reads each row in turn and assembles one row per taxon (for checklists) or one row per occurrence record (for occurrence archives), populating each DwC term accordingly.

The Export to column directs each row to exactly one archive - checklist or occurrences. The deliberate one-archive-per-row design makes the export explicit and auditable: you can read down each archive's rows in isolation and know exactly what it contains.
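As a rough illustration of this mental model, the Python sketch below treats the table as a list of rows, groups them by target archive, and resolves each term for one record. The tuple layout, the "column" and "static" labels, the column names collector and site, and all sample values are hypothetical; this is not NaturaList's table syntax or compiler code.

```python
from collections import defaultdict

# Hypothetical in-memory form of the table: (export_to, term, kind, argument).
# "kind" is an illustrative label, not NaturaList's directive syntax.
TABLE = [
    ("checklist", "dwc:institutionCode", "static", "NHM"),
    ("occurrences", "dwc:institutionCode", "static", "NHM"),
    ("occurrences", "dwc:recordedBy", "column", "collector"),
    ("occurrences", "dwc:locality", "column", "site"),
]


def resolve(kind, argument, record):
    """Return the value of one term for one source record."""
    if kind == "column":
        return record.get(argument, "")
    if kind == "static":
        return argument
    raise ValueError(f"unsupported value source kind: {kind}")


def rows_by_archive(table):
    """Group mapping rows by the single archive each one targets."""
    grouped = defaultdict(list)
    for export_to, term, kind, argument in table:
        grouped[export_to].append((term, kind, argument))
    return dict(grouped)


def build_row(mapping_rows, record):
    """Assemble one output row (term -> value) for a single record."""
    return {term: resolve(kind, arg, record) for term, kind, arg in mapping_rows}


record = {"collector": "A. Smith", "site": "Brno"}
archives = rows_by_archive(TABLE)
checklist_row = build_row(archives["checklist"], record)
occurrence_row = build_row(archives["occurrences"], record)
```

Because every mapping row names exactly one archive, each group can be read and checked in isolation, which is the auditability the design aims for.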

Two archives, one table

Depending on your settings, the compiler produces up to two ZIP archives:

  • A checklist archive - one row per taxon node in your hierarchy, suited for publishing a species list to GBIF's taxonomic backbone.
  • An occurrence archive - one row per occurrence record, suited for sharing specimen or observation data. This requires occurrence records to be present in your data sheet.

Terms shared by both archives - institution code, language, scientific name - need two rows each: one targeting checklist and one targeting occurrences. The explicitness is intentional: there is no ambiguity about what ends up in each archive.
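To illustrate what "one row per taxon node" means for the checklist archive, the following Python sketch flattens a small hierarchy into rows, each linked to its parent. The Taxon class, the sample tree, and the use of uuid5 over the node's path to obtain stable identifiers are assumptions made for this example; the compiler's own identifier scheme may differ.

```python
import uuid
from dataclasses import dataclass, field

# Arbitrary namespace for deterministic IDs; an assumption of this sketch.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


@dataclass
class Taxon:
    name: str
    rank: str
    children: list = field(default_factory=list)


def taxon_id(path):
    """Derive a stable UUID from the node's full path in the tree."""
    return str(uuid.uuid5(NAMESPACE, "/".join(path)))


def flatten(node, parent_path=()):
    """Yield one row per taxon node, parents before children."""
    path = parent_path + (node.name,)
    yield {
        "taxonID": taxon_id(path),
        # The root node has no parent, so its link stays empty.
        "parentNameUsageID": taxon_id(parent_path) if parent_path else "",
        "taxonRank": node.rank,
        "scientificName": node.name,
    }
    for child in node.children:
        yield from flatten(child, path)


tree = Taxon("Paridae", "family", [
    Taxon("Parus", "genus", [Taxon("Parus major", "species")]),
])
rows = list(flatten(tree))
```

Each node contributes its own name and rank, and the identifiers stay the same across runs as long as the node's position in the tree does not change.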

Choosing the right value source

The value source directive you choose should reflect where the data actually lives in your project. In Value source you will find documentation and examples for each of the following directive types:

  • column: - use this when the term maps directly to a column in your data sheet: typically collector names, localities, dates, catalog numbers, and field notes. Write the column name directly, or build a composite string by wrapping optional segments in [...]. A segment is dropped entirely when any {placeholder} inside it resolves to empty for that record. Literal text between two [...] blocks acts as a junction and is emitted only when both neighbouring blocks are present, so no ghost separators are left behind. For static values that are the same across every record, type the plain value without any prefix.

  • auto: - use this for terms that are computed contextually from the compiled taxonomy and cannot meaningfully be stored in a spreadsheet column. The key distinction from taxa: (see below) is that auto: values are rank-aware and hierarchy-aware: auto:scientificName, for instance, resolves to the name of the currently processed taxon - a family node yields its family name, a genus node yields its genus name. You cannot replicate this by pointing at a single taxon column, because the right column varies depending on which level of the tree is being exported. Likewise, auto:parentNameUsageID generates the correct parent link for every node in the tree automatically, and auto:taxonID produces a stable UUID derived from each node's identity. Use auto: for all taxonomy identifiers, parent linkages, rank labels, and scientific names and their authorities; these are the right choice for the corresponding checklist terms without exception.

  • config: - use this for dataset-wide metadata already stored in Customization - your checklist's name, about text - so you don't duplicate it in the DwC table.

  • taxa: - use this when you need the literal value stored in a specific taxon rank's column for every record. Unlike auto:, which derives values from the hierarchy structure itself, taxa: simply reads what is in a named column of your taxonomy. Use it for terms like dwc:family or dwc:genus where you want the content of that rank's column as it appears in your spreadsheet, or a sub-field of it such as the authority string or terminal epithet.

  • media: - use this for dwc:associatedMedia and other terms that expect full media URLs. Using a plain column name for an image column would not yield usable URLs.

  • plain text (no prefix:) - use this for a verbatim value or a controlled vocabulary term that is constant across every record in your export: language code, institution code, collection code, basis of record, geodetic datum, license.
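The composite-string rules described under column: can be sketched in Python as follows. This is a literal reading of those rules rather than NaturaList's implementation: the function names and sample data are hypothetical, text before the first block or after the last one is kept unchanged (an assumption), and placeholders outside [...] blocks are not handled.

```python
import re

BLOCK = re.compile(r"\[([^\[\]]*)\]")      # an optional [...] segment
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")  # a {column} placeholder


def render_block(block, record):
    """Fill a block's placeholders; return None if any resolves to empty."""
    values = [str(record.get(name, "")).strip() for name in PLACEHOLDER.findall(block)]
    if any(value == "" for value in values):
        return None
    return PLACEHOLDER.sub(lambda m: str(record[m.group(1)]).strip(), block)


def render(template, record):
    """Render a composite template for one record."""
    # re.split with a capture group alternates: literal, block, literal, ...
    parts = BLOCK.split(template)
    literals, blocks = parts[0::2], parts[1::2]
    rendered = [render_block(block, record) for block in blocks]
    out = []
    for i, literal in enumerate(literals):
        left = rendered[i - 1] if i > 0 else None
        right = rendered[i] if i < len(rendered) else None
        is_junction = 0 < i < len(literals) - 1
        if not is_junction:
            out.append(literal)  # leading/trailing text: kept (assumption)
        elif left is not None and right is not None:
            out.append(literal)  # junction survives only between two present blocks
        if right is not None:
            out.append(right)
    return "".join(out)


full = render("[{locality}], [{country}]", {"locality": "Brno", "country": "Czechia"})
partial = render("[{locality}], [{country}]", {"locality": "Brno", "country": ""})
labelled = render("[coll. {collector}]", {"collector": ""})
```

With both columns filled, the first call gives "Brno, Czechia"; with the country empty, the second gives "Brno" with no trailing comma; the third gives an empty string because the whole labelled segment is dropped.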

EML metadata

You can include an EML metadata document in two different ways:

  1. A precomposed EML file (eml:precomposed) - the preferred option. Use an F: directive pointing to a file in usercontent/ (e.g. F:dwc/checklist-eml.xml). The compiler fetches and bundles that file under the name eml.xml. Best for established datasets or institutional submissions where the full breadth of EML metadata matters. Technically you can also enter the raw XML into the cell, but this is impractical.
  2. eml: rows in the table - the compiler assembles a minimal EML document from term rows prefixed with eml:. Suitable for small or simple projects or as a temporary backup solution.

Naming your data sheet columns to reduce friction

The single most effective way to keep the DwC archive table readable is to name your data sheet columns after Darwin Core terms from the start: recordedBy, catalogNumber, locality, eventDate. When you do this, most of your value source entries are simply the column name - the table reads as a direct, self-evident mapping with no translation layer.

If your project predates this guide or uses institutional column names, you can still map everything correctly - it just requires more explicit directives in the value source column.

NaturaList documentation