Biosystematics, informatics and genomics of the big 4 insect groups: training tomorrow's researchers and entrepreneurs

First Impressions and First Results by Viktor Senderov (@vsenderov,

I recently blogged on my PhD thesis blog about the BIG4 kick-off meeting in Copenhagen. Here, I will revisit this topic and give my first impressions as well as report some first BIG4-relevant results.

As I pointed out, the format of the meeting was two days of presentations by students and PIs about individual projects, then a field-trip day, then two more days of presentations and entomology-related workshops led by senior lecturers. Certainly, one of the more memorable moments for me was the Friday workshop, when the students got to examine and document parts of the Fabricius collection at the Natural History Museum of Denmark.


The whole symposium was very well organized thanks to Sree and Alexey. I am certainly looking forward to the next one, which will probably be in the Czech Republic.

On the scientific side, I think we have a good mix of entomologists and molecular researchers - me being squarely in the second camp. I am looking forward to the next half a year or so, when the first data begin to be generated and I have material to work on. In the meantime, I will be building some interconnections between data portals and Biodiversity Data Journal (BDJ) in order to learn the Pensoft system, laying the groundwork for an open thesis, and working on different research agendas for my project.

Some of those interconnections have already been engineered, and I’d now like to introduce two new workflows. The first workflow facilitates the import of metadata into BDJ as a data paper: it allows an author in BDJ to initialize her data paper manuscript from an EML (Ecological Metadata Language) text file containing the metadata of a dataset. In other words, given a dataset and its metadata, we convert the structured information about the dataset found in the metadata into a journal-style manuscript that, once the author has made her modifications, is ready for submission for review in BDJ. The other workflow facilitates the import of occurrence records into a taxonomic manuscript at BDJ. It is now possible to copy occurrence records from GBIF, BOLD Systems and iDigBio into your taxonomic manuscript just by typing their IDs in a dialog.
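To make the first workflow a bit more concrete, here is a minimal Python sketch of the general idea: read an EML document and pull out the pieces a data paper needs. This is not the actual BDJ importer. The function names (`eml_to_manuscript`, `text_of`), the output dictionary layout, and the sample metadata are all made up for illustration, and the sketch assumes the usual EML layout in which an unqualified `dataset` element holds `title`, `creator` and `abstract`.

```python
import xml.etree.ElementTree as ET

# Invented sample metadata, for illustration only.
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
         packageId="example/1" system="example">
  <dataset>
    <title>Hypothetical beetle occurrences</title>
    <creator>
      <individualName>
        <givenName>Ada</givenName>
        <surName>Example</surName>
      </individualName>
    </creator>
    <abstract>
      <para>Invented abstract text for illustration.</para>
    </abstract>
  </dataset>
</eml:eml>"""


def text_of(parent, path):
    """Return the stripped text under `path`, or "" if the element is missing."""
    node = parent.find(path)
    return "".join(node.itertext()).strip() if node is not None else ""


def eml_to_manuscript(eml_text):
    """Map a few EML fields onto a hypothetical manuscript skeleton."""
    root = ET.fromstring(eml_text)
    dataset = root.find("dataset")
    if dataset is None:
        raise ValueError("no <dataset> element found in the EML document")

    authors = []
    for creator in dataset.findall("creator"):
        given = text_of(creator, "individualName/givenName")
        surname = text_of(creator, "individualName/surName")
        name = " ".join(part for part in (given, surname) if part)
        if name:
            authors.append(name)

    return {
        "title": text_of(dataset, "title"),
        "authors": authors,
        "abstract": text_of(dataset, "abstract"),
    }


manuscript = eml_to_manuscript(SAMPLE_EML)
print(manuscript["title"])
print(", ".join(manuscript["authors"]))
print(manuscript["abstract"])
```

A real importer would of course cover many more EML fields (geographic and taxonomic coverage, methods, licensing and so on) and would leave the resulting manuscript open for the author to edit before submission.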

These two workflows could be used by BIG4 students and PIs to write data papers about the datasets they are generating and to import occurrence records into their taxonomy papers. An easy way to use them would be for all BIG4 member labs to install a copy of GBIF’s Integrated Publishing Toolkit (IPT) on their lab server and share their biodiversity datasets with the community via IPT. Then, should the authors decide to publish in Biodiversity Data Journal, they would be able both to create a data paper about their dataset based on GBIF’s EML format and to import the individual occurrences into a taxonomic paper.

In terms of the big picture, these workflows would be most useful for species redescriptions and, more specifically, for extending the morphological descriptions of old taxa with genomic data. In the future I also plan, together with the ARPHA engineers, to write an importer for occurrence data from Darwin Core Archives (DwC-A). This will allow for an almost universal exchange of occurrences between databases and BDJ.
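For readers who have not met the format: a Darwin Core Archive is a zip file containing one or more delimited text files with the records, plus a `meta.xml` descriptor that says how to read them. The planned importer does not exist yet, so the following Python sketch only shows the kind of reading step it would need. It assumes, for simplicity, a core file named `occurrence.txt` that is tab-delimited with a header row; a complete reader would take the file name, delimiter and column-to-term mapping from `meta.xml` instead. The function names and the demo records are invented.

```python
import csv
import io
import zipfile


def build_demo_archive():
    """Create a tiny in-memory archive with invented records (illustration only)."""
    rows = (
        "occurrenceID\tscientificName\tcountry\n"
        "demo-1\tExemplum fictum\tDenmark\n"
        "demo-2\tExemplum alterum\tCzech Republic\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("occurrence.txt", rows)
    return buffer.getvalue()


def read_occurrences(archive_bytes, core_name="occurrence.txt"):
    """Return the core records as a list of dicts keyed by column header.

    Assumes a tab-delimited core file with one header line; a full
    reader would consult meta.xml rather than hard-coding this.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        with archive.open(core_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            return list(reader)


records = read_occurrences(build_demo_archive())
for record in records:
    print(record["occurrenceID"], record["scientificName"], record["country"])
```

Each resulting dictionary corresponds to one occurrence, which is roughly the unit that would be copied into a taxonomic manuscript.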

We would certainly be very happy to hear from other BIG4 students and PIs, and also from the general public. Therefore, I will cross-post this blog entry on the BIG4 Google+ page and on my PhD thesis blog, where you can start a discussion.

