Haplotypes, median networks, and diagnostic characters as tools to elucidate the intraspecific genetic and taxonomic structure of bumblebees, with a description of Bombus cryptarum pallidocinctus new subspecies (Hymenoptera: Apidae: Bombus).

Ein Alignment von 168 Sequenzen mitochondrialer DNA von Bombus cryptarum liefert 29 Haplotypen. Ein Median-Network zeigt funf klar abgrenzbare Cluster, die Haplogruppen (HG) A-E, die durch diagnostische Positionen definiert werden. Vier dieser HG lassen sich bekannten Taxa zuordnen HG-A = B. cryptarum cryptarum, HG-B = B. cryptarum florilegus, HG-D = B. cryptarum albocinctus, und HG-E = B. cryptarum moderatus. Die Haplogruppe C entspricht einem neuen Taxon B. cryptarum pallidocinctus im Rang einer Unterart. Diese neue Unterart wird beschrieben. Die Verbreitung der Eurasiatischen Taxa wird untersucht und die Folgerungen aus Median-Network und geographischer Verbreitung fur die mogliche Phylogenie werden diskutiert.StichworterHymenoptera, Apidae, Bombus, mitochondrial DNA, new subspecies, geographic distributionNomenklatorische Handlungencryptarum pallidocinctus Bertsch et al., 2014 (Bombus), subspec. n.


Introduction
Since the publications of Pedersen (1996Pedersen ( , 2002, Kawakita et al. (2004) and Cameron et al. (2007) we have a quite-reliable picture of the phylogenetic relationships of most bumblebee species. Recent comprehensive surveys for the subgenus Bombus sensu stricto are available (Bertsch 2010;Williams et al. 2012). However, the main point of view of these investigations always was the definition and identification of 'species' . Since the problematic identification of critical species by difficult morphological characters could be replaced by the use of mtDNA sequences of cytochrome oxidase I (COI), a safe identification of specimens is possible. But although 'species' are important reference points and significant episodes in the course of evolution, with intensive collecting and the study of a large number of individuals from a wide spread of geographic proveniences, the centre of interest is gradually passing from the systematist's 'species' to the ‚natural population' from which the species is abstracted. Two recent examples with a detailed analysis of the haplotype structure of the populations are Duennes et al. (2012) for the Central-American Bombus (Pyrobombus) ephippiatus and Lecocq et al. (2013) for the European Bombus (Melanobombus) lapidarius. In this field probably lies the most promising potential for the DNA-based research on bumblebees: the possibility to study the intraspecific genetic and taxonomic variability of populations.
In his seminal publication 'Les bourdons du genre Bombus Latreille sensu stricto en Europe Occidentale et Centrale' , Rasmont (1984) refers to an unnamed subspecies of B. (Bombus) cryptarum Fabricius 1775 with a question mark, obviously unsure about this taxon, based on 2 worker specimens from Finland. In Pamilo et al. (1997), we find the remark that males of B. cryptarum from Finland did not show the morphological characters developed by Rasmont (1984) for specimens from Belgium. A strange COI sequence from an unidentified specimen from Norway was published by Pedersen (2002, Genbank AY181116), which, after re-sequencing, proved to be a sequence of B. cryptarum (see discussion 'detecting new taxa? ' Bertsch 2009, p. 302) and was identical with a sequence from St. Petersburg (Genbank EF362729) published in Murray et al. (2008). Carolan et al. (2012) investigated polymorphic sequences of B. cryptarum and detected 2 differing haplotypes. Based on three sequences from Southern Finland a cryptarum-haplotype 2 was described (Genbank JN872566, JN872567, JN872568), identical with the sequences published by Pedersen 2002and Bertsch 2009from Norway and Murray et al. 2008 from Russia. Meanwhile 10 more sequences of this cryptarum-haplotype 2 are available at Genbank (Blast search 20.ii 2014), with a geographical spread from Norway to Western Siberia (see appendix Table 2). A detailed analysis of the intraspecific structure of populations may help to understand these findings. B. cryptarum with a circum-polar distribution has proven to be especially genetically variable (Bertsch et al. 2010); COI sequences of specimens from Eurasia and North America will be taken as a case study to test the potential of the genetic information available and to investigate what a median network of haplotypes and their geographical distribution may tell us about the intraspecific genetic and taxonomic structure and recent history of B. cryptarum.

Material & Methods
Polymerase chain reaction (PCR) and DNA sequencing of mitochondrial COI Total DNA was extracted from legs using the QIAamp® DNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's specifications for tissue (for details see Bertsch et al. 2010). For sequence analysis five overlapping fragments of mitochondrial COI were amplified (for primer pairs see appendix Table 1). The PCR products were purified using an AMPure® PCR Purification Kit (Agencourt, Beverly, MA, USA). Sequencing reactions were performed using ABI® BigDye Terminator version 3.1 chemistry (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's instructions; all fragments were sequenced from both strands and then analysed on an ABI 3100 sequencer (Applied Biosystems). The ABI trace files were edited with 4PEAKS (from MekenTosj) and aligned manually using ClustalX. To produce an equal sequence length for all individuals, sequences were trimmed to 1533 bp (encoding 511 amino acids). As COI sequences for bumblebees from different authors are of different length and starting points a comparison of positions is often difficult.
To enable a precise referencing of distinct positions and an unambiguous comparison of sequences from different sources we aligned the sequences with the complete COI gene of B. ignitus (position 1643-3202 in GenBank DQ870926) (Cha et al. 2007); the first position of the start codon Ata is Position 1 and the last position of the stop codon taA is position 1560. All other positions are referenced accordingly. The Bold Project with a sequence length of 658 nt is a minimalistic approach, originally created as a novel system for rapid and accurate species identification (Hebert & Gregory 2005). In this respect, the project has been very successful, and a large amount of data is now available for analysis (about 4800 sequences for the genus Bombus, 1370 thereof public). Since the interpretation of results is more and more expanding from the primary purpose of species identification to questions of relationships and phylogeny, the question arises whether this minimalistic approach is adequate for such far-reaching problems. As already discussed in respect to degraded sequences from type material (Bertsch 2009, p. 303) sequences of greater length produce more information and give more security into conclusions derived by such an analysis.
A total of 168 sequences (Bold length 658 nt) from Eurasian B. cryptarum (available at GenBank, at Bold-Public-Data-Portal and unpublished material from our laboratory) has been used for an analysis of genetic population structure; they will be referenced as Group-1 sequences. By using six pairs of primers, we produced overlapping partial sequences, which can be combined to nearly full-length (1533 of 1560 nt) sequences of the COI gene. In all, 64 such full-length sequences are currently available; they will be referenced as Group-2 sequences.

Analysis of sequence divergence, haplotype frequency and calculation of median network
The absolute numbers of substitutions were counted based on a pairwise comparison of COI sequences. The analysis of the genetic distance of sequences was performed using Mega 4.0 (Tamura et al. 2007;Kumar et al. 2008). The aligned data matrix of 168 sequences was collapsed to 29 haplotypes (HT) using DnaSPv5, a program distributed by Librado & Rozas (2009). Median-joining haplotype networks (Bandelt et al. 1999), both with and without maximum-parsimony-post-processing (Mardulyn 2012) were calculated, with ε = 0 and all variable sites weighed equally using Network, a program distributed by Fluxus-Technology. Median-joining networks have been recommended over maximum parsimony approaches in intraspecific studies as they capture a greater degree of ambiguity, thus enabling more realistic interpretations. Diagnostic characters were detected by visual inspection and checked by MacClade version. 4.08 (Maddison & Maddison 2002).

DNA polymorphism, haplotypes, and median network
The aligned data matrix of 168 Group-1 sequences included 35 variable sites. One of these variable positions was parsimony uninformative (noise) and 34 variable positions parsimony informative (signal). The nucleotide frequencies were 34.00 (A), 42.05 (T), 11.77 (C) and 12.18 (G), which proves the known strong A + T bias typical for sequences of Hymenoptera. The data set comprised 29 haplotypes, a haplotype diversity (Hd) of 0.946 and a nucleotide diversity (π) of 0.01147. A median network ( Fig. 1) is well suited to visualise the genetic structure for genetic variability of these 168 B. cryptarum sequences. It represents so to speak, a view from top at the last branches of the phylogenetic tree visualising the lumping of clades for the present-time stadium of evolution.

Taxonomic units defined by genetic divergence
The haplotypes of a median network may be grouped by coherence or by divergence. Structuring populations into Haplogroups (groups of haplotypes) relies on genetic coherence: all haplotypes connected by steps of one mutation cluster to a haplogroup. Recently it has become fashionable to use Motus (Molecular Operational Taxonomic Units), which rely on genetic divergence: all haplotypes separated by a certain cut-off value for genetic divergence belong to a MOTU. With our 168 sequences in search of intraspecific taxa of B. cryptarum things are quite obvious, groups of haplotypes are separated from the next cluster by at p-distance of about 0,006 (cut-off 0,6 %), both approaches result in five distinct groups of haplotypes (HGs A-E) (Fig. 2). Haplotypes 6-9 (Turkey, Iran, and Kyrgyzstan) and haplotype 11 (Etorofu/Iturup Kuril Islands) do not fit into this procedure; they are separated by branches with two mutations. As only short Group-1 sequences are available, most probably longer Group-2 sequences will have increasing numbers of separating mutations, and both clades may merit treatment as separate haplogroups.

Taxonomic units defined by diagnostic characters
A definition of taxa by genetic coherence or divergence my be 'operational' , but as classical taxonomy defines taxa by characters a straightforward linkage of genetic divergence and taxonomic characters is often problematic. As there are no gaps in the alignment of the COI sequences, single nucleotide sites can be used as positional homologies (Hillis 1994;Brower & Schawaroch 1996), therefore, single nucleotide positions can be treated like characters and a direct comparison of genetics and taxonomies gets feasible (Fig. 3).
B. cryptarum is characterised by eight diagnostic characters ( Fig. 3 green boxed), which separate this species from the closely related and morphologically similar species B. magnus, B. patagiatus and B. lucorum (comparisongroup 'cryptarum-magnus-patagiatus-lucorum'). Haplogroups A-E can be separated by private diagnostic characters (Fig. 3 red) or a combination of a few diagnostic positions. Since there is no ambiguity or overlap, Fig. 3 can be used as a 'key' to identify unknown specimens.  1 1 1 1 1 1 1 1 1 1 1 1 1  0 1 2 2 3 3 3 4 4 5 7 7 7 8 8 9 0 0 0 1 1 1 3 3 3 3 3 4 4 5 5  8 2 4 5 2 3 4 0 0 8 0 5 5 3 5 0 0 7 7 1 3 7 1 4 5 5 8 4 5 0 0  4 9 3 3 8 0 8 5 8 8 8 0 1 7 5 6 8 1 7 6 1 6 4 7 6 9 0 0 5 0 7    A graphical display of nested clades (Templeton et al. 1995;Templeton 1998) representing the taxonomic hierarchy proves especially useful to visualise the complex order, which represents the genetic intraspecific structure and their relationships with taxonomy (Fig. 4).  Medium-size species, queen length 14-18 mm. The mandible distally broadly rounded with two pronounced anterior teeth and a weak posterior tooth, which is separated from the rounded anterior margin by a broad, shallow-rounded excision or 'incisura'; labrum with the lateral tubercles low with many coarse punctures on their proximal side, the median labral furrow broad ± rounded; clypeus convex, shining with large punctures widely scattered; ocello-ocular area along inner eye margin with scattered large punctures and with many small punctures, which are only very narrowly reduced in density opposite the ocelli, the area of dense small punctures reaching the eye margin just behind the eye dorsally; mid basitarsus with the distal posterior corner acute but broadly rounded; hind tibia with the outer surface smooth and brightly shining; hind basitarsus with the proximal posterior process produced by more than its proximal breadth, the posterior margin nearly straight; metasomal tergum 2 clearly chagrined, punctures ± distinct; metasomal sternum 6 with a weak central keel. Color pattern: the head with the short hair black, black thorax with a well-defined broad pale yellow collare, going down on the episternum, with a distinct S-band of dark hair at the posterolateral border of the pronotallobus; hairs of the leg bases and hind femur black, hairs of the corbicular fringes black; wings light brown. Hairs of metasomal tergum 1 black, tergum 2 pale yellow, with scattered black hairs intermixed near the posterior margin, especially laterally; tergum 3 predominantly black with lateral white fringes, with white hairs posteriorly; terga 4-6 white with few black hairs intermixed; sternal posterior fringes whitish.

Description of B. cryptarum pallidocinctus new subspecies
Name: Derived from the characteristic pale yellowgreyish coloration of Collare and Tergum 2, from Latin pallido = pale and cinctus = striped, with cingulum.

Number vs. length of sequences
We need more information through either a greater number of sequences and/or sequences of greater length. In the case of the haplotypes 6 -9 from Turkey, Iran, and Kyrgyzstan only more sequences of greater length may help to clarify the situation. At a first sight the B. cryptarum cryptarum from the Austrian Alps and the specimens from Turkey look rather isolated from each other, but there are 'connecting specimens' from the mountains of the Balkans in museum collections. We need sequences from the Balkan mountains to see whether there is a connection by one-step mutation branches or not. Furthermore, it might be possible that by longer Group-2 sequences the divergence from now two mutations with Group-1 length may increase to 4-5 mutations, which would imply an additional taxon also in the rank of a subspecies, B. cryptarum iranicus Krüger 1954. And, as for all B. cryptarum specimens we need much more detailed information about morphological attributes, which might help to lump or separate taxa.

How to use diagnostic characters
Their number recognizes diagnostic characters: they are useful tools, which help to identify unknown specimens, and, contrary to most morphological characters they are unambiguous, having no gradual change or overlap (see discussion 'facing the facts: morphology vs. molecules ' , Bertsch 2009, p. 304). The number (i.e., the position in a sequence) defines a diagnostic character. As long as authors number their diagnostic characters individually their usability is heavily restricted. Murray et al. (2008) numbered their diagnostic characters in comparison with a reference sequence (Genbank AY181162), Carolan et al. (2012) and Williams et al. (2012) numbered their diagnostic characters from the beginning of their partial sequences. As this partial sequences have different starting positions (in reference to the COI gene), their diagnostic characters get different numbers. None of these publications, even though some have identical co-authors, are comparable in numbering of diagnostic characters. As described in 'methods and materials' we recommend to number sequences and diagnostic characters in reference to the complete COI gene, Number 1 is defined as the first nucleotide of the start codon Ata of COI. Diagnostic characters should be applied in a two-step procedure: A naïve and schematic application of diagnostic characters is not recommended. If in a first step (with comparison group 'cryptarum-magnus-lucorumpatagiatus') the succession of diagnostic characters is 253 T 328 C 330 T 751 C 1314 C 1359 T 1500 C 1507 G, or part of it in case of shorter Group-1 sequences, this individual belongs to B. cryptarum. However, this is only valid if no unknown taxa outside the comparison group are involved. Only after this is settled the use of diagnostic characters to identify subspecies in a second step makes sense. Genbank sequences AY181117 and AY181119 for example are specified as B. cryptarum. Therefore, it might be interesting to see what subspecies they are. To be safe, we examine in a first step their status as 'species' , and instead of 253 T 328 C 330 T 751 C, which we expect for B. cryptarum between positions 253 and 751 we get 253 C 328 T 330 A 751 T. Both sequences do not belong to B. cryptarum (they are B. lucorum, see Bertsch 2009 Fig. 4, but that is another story) and a further test for subspecies of B. cryptarum would be nonsense. Or let us test the identification of a degraded sequence identified as B. cryptarum by PCR-RFLP approach (Genbank KC192045, Vesterlund et al. [2014] Lowenstein et al. (2009), in an attempt to identify Tuna species in Sushi, gave a thorough report about the possibilities for identification by genetic divergence (Bold identification or Blast search) or by diagnostic characters, and came to the conclusion, that using characteristic attributes (= diagnostic characters) is the best solution. The discussion in Lowenstein et al. (2009) of the problems involved in identification by different methods is very much recommended for reading.

Phylogenetic implications
What do we know about the evolutionary history of B. cryptarum? Nothing. According to theory, the phylogenetically older clades are in the centre of a median network, while the phylogenetically younger clades can be found at the peripheries. For the median network of B. cryptarum this would imply that the haplotypes of B. cryptarum florilegus must be seen as the phylogenetic oldest members, from which three different branches evolved. B. cryptarum florilegus then is not just a specialist island colour morph of B. cryptarum albocinctus (see Williams et al. 2012) but some sort of restricted Refugio distribution of a taxon with broader distribution in past times. The restricted occurrence of this subspecies at Nemuro peninsula at Hokkaido always looked more as a Refugio of an old than as a toehold for a young, expanding taxon. A genetic investigation (Takahashi et al., 2008), which analysed microsatellite markers of the Nemuro peninsula population, supports this idea of a Refugio. The present distribution of the three subspecies on the Eurasian mainland makes South-western Siberia the 'centre of diversity' , and the idea of their spread from this centre seems rather convincing. The median network treats the Turkey, Iran, and Kyrgyzstan proveniences as attached to the Central European haplotypes, but it may also be possible that they evolved the other way, from Southern Siberia via Kyrgyzstan and Iran towards Western Europe. With more data, possibly our knowledge about these events will be more precise, and a detailed analysis of the genetic structure of populations will be central for such progress.

Colour form or subspecies?
However, it is absolutely priority that we finish the inconsiderate use of scientific names and the mixing of 'colour forms' and subspecies. B. albocinctus has been described as a colour form of B. lucorum from Kamchatka, we know meanwhile that this ' colour form' is part of B. cryptarum, and that specimens from the provenience 'Russian Far East' identified as B. cryptarum albocinctus are characterised by their specific COI sequences (Haplogroup D). Therefore the use of 'albocinctus' for all colour forms of B. cryptarum with bleached bands (whitish-yellow) all over mainland Eurasia is misleading (see Fig. 5 of Williams et al. [2012]), not all specimens B. cryptarum with bleached colour bands belong to Haplogroup D and not all specimens B. cryptarum albocinctus (Haplogroup D) have bleached colour bands, most specimens of B. burjaeticus Krüger (which belong to B. cryptarum albocinctus (Haplogroup D) are quite yellow, or even melanised. Another example could be B. lapidarius decipiens Perez 1890 described as a subspecies of B. lapidarius from the Pyrenees and later on expanded to the yellow-banded specimens of B. lapidarius from Southern Italy and Sicily. An analysis of the COI gene by Lecocq et al. (2013) shows a difference of B. lapidarius decipiens from Spain to B. lapidarius lapidarius of only one mutation, B. lapidarius decipiens is a colour form, which from the genetical distance does not merit the status of a subspecies. Whereas the yellow banded specimens from Southern Italy and Sicily with a genetic distance of about 9 mutations represent a subspecies, which should get a separate name. It would be a great progress if this mixing of taxa defined by mtDNA (maternal genetic lineages) with mere colour forms would find an end.

Tab. 2:
List of sequences Bombus cryptarum pallidocinctus used in the present analysis with Genbank numbers, and collection locality information. For about 20 unpublished sequences from this new subspecies see 'map' and 'taxon-ID-tree' for Bold: ACE4135.