Genomic analysis of the tribe Emesidini (Lepidoptera: Riodinidae)

We obtained and phylogenetically analyzed whole genome shotgun sequences of nearly all species from the tribe Emesidini Seraphim, Freitas & Kaminski, 2018 (Riodinidae) and representatives from other Riodinidae tribes. We find that the recently proposed genera Neoapodemia Trujano, 2018 and Plesioarida Trujano & García, 2018 are closely allied with Apodemia C. & R. Felder, [1865] and are better viewed as its subgenera, new status. Overall, Emesis Fabricius, 1807 and Apodemia (even after inclusion of the two subgenera) are so phylogenetically close that several species have previously been swapped between these two genera. New combinations are: Apodemia (Neoapodemia) zela (Butler, 1870), Apodemia (Neoapodemia) ares (Edwards, 1882), and Apodemia (Neoapodemia) arnacis (Stichel, 1928) (not Emesis); and Emesis phyciodoides (Barnes & Benjamin, 1924) (not Apodemia), each assigned to its genus by monophyly in genomic trees with the type species (TS) of the genus. Surprisingly, we find that Emesis emesia (Hewitson, 1867) is not grouped with Emesis, but forms a third lineage of similar rank alongside Emesis and Apodemia, here named Curvie Grishin, gen. n. (TS: Symmachia emesia Hewitson, 1867). Furthermore, we partition Emesis into 6 subgenera (4 new): Emesis (TS: Hesperia ovidius Fabricius, 1793, a subjective junior synonym of Papilio cereus Linnaeus, 1767), Aphacitis Hübner, [1819] (TS: Papilio dyndima Cramer, [1780], a subjective junior synonym of Papilio lucinda Cramer, [1775]), Poeasia Grishin, subgen. n. (TS: Emesis poeas Godman, [1901]), Mandania Grishin, subgen. n. (TS: Papilio mandana Cramer, [1780]), Brimia Grishin, subgen. n. (TS: Emesis brimo Godman & Salvin, 1889), and Tenedia Grishin, subgen. n. (TS: Emesis tenedia C. & R. Felder, 1861). Next, genomic comparison of primary type specimens suggests new status for Emesis vimena Schaus, 1928 as a subspecies of Emesis brimo Godman & Salvin, 1889, and for Emesis adelpha Le Cerf, 1958 with E. a. vicaria Le Cerf, 1958 as subspecies of Emesis heteroclita Stichel, 1929; Emesis tristis Stichel, 1929 is not a synonym of E. brimo vimena but of Emesis lupina Godman & Salvin, 1886. A new status of species is given to the following taxa: Emesis furor A. Butler & H. Druce, 1872 (not a subspecies of E. mandana (Cramer, 1780)), Emesis melancholica Stichel, 1916 (not a subspecies of E. lupina Godman & Salvin, 1886), Emesis progne (Godman, 1903) (not a subspecies of E. brimo Godman & Salvin, 1889), and Emesis opaca Stichel, 1910 (not a synonym of E. lucinda (Cramer, 1775)). Emesis castigata diringeri Gallard, 2008 is a subjective junior synonym of E. opaca, new status. Finally, Xanthosa Grishin, gen. n. (TS: Charmona xanthosa Stichel, 1910) is proposed for a sister lineage of Sertania Callaghan & Kaminski, 2017, and Befrostia Grishin, gen. n. (TS: Emesis elegia Stichel, 1929) is proposed for a clade without apparent phylogenetic affinities that we place in Befrostiini Grishin, trib. n. In conclusion, genomic data reveal a number of errors in the current classification of Emesidini and allow us to confidently reclassify the tribe, partitioning it into three genera: Apodemia, Curvie gen. n. and Emesis.


Introduction
Metalmark butterflies (family Riodinidae) are distributed worldwide, but the majority of species are found in the Neotropics (Callaghan and Lamas 2004). The family was recognized as a group by Bates, although under a different name (Bates 1868). Stichel comprehensively revised the family, and his pioneering works formed the basis for our current knowledge (Stichel 1910-1911; 1928; 1930-1931). Harvey refined the higher classification of Riodinidae by applying phylogenetic methods to morphological characters (Harvey 1987). However, metalmarks are particularly diverse in their wing shapes and patterns, making them challenging to classify based on morphology. DNA-based phylogenies can be more revealing, and two larger-scale studies have been instrumental for our understanding of this family (Espeland et al. 2015; Seraphim et al. 2018). These studies revealed several unsuspected evolutionary connections and suggested that a number of species have been misclassified.
Metalmarks are most species-rich in tropical regions, and only a few phylogenetic lineages extend to the north. One of these clades is the tribe Emesidini Seraphim, Freitas & Kaminski, 2018. This tribe was proposed as Emesini by Stichel (1911) and, in addition to Emesis Fabricius, 1807 and Apodemia C. & R. Felder, [1865], the two genera that currently compose the tribe, it included a number of others that have since been transferred to other tribes. Harvey (1987) noted the similarity between Emesis and Apodemia in the position of the silk girdle in their pupae, but placed these and several other genera as 'Emesini' incertae sedis. Regardless, the name Emesini is a junior homonym of Emesini Amyot and Serville, 1843 (Hemiptera) and thus is invalid, so a new name, Emesidini, was proposed in Seraphim et al. (2018). While this tribe is still most diverse in the Neotropics, its northern offshoot Apodemia reaches Canada. This genus has recently been critically re-evaluated and split into 3 genera, based in part on evidence from DNA sequences (Trujano-Ortega et al. 2018).
With the advent of genomic sequencing, it has become feasible to look at the complete genotypes of organisms and learn about their evolution at a level not previously possible (Li et al. 2019). The genomic landscape of a phylogenetic group chosen for study reveals many unsuspected nuances with implications for its classification. Groups thought to be monophyletic may not be such, and species dissimilar in appearance may turn out to be close relatives. DNA-based revision also prompts us to think about consistent and more objective criteria to define taxonomic groups and their ranks (Li et al. 2019; Talavera et al. 2012). These methods seem to produce meaningful results, because genetic differentiation leads to the phenotypic divergence that was previously used to outline genera by morphology. Understanding the correlation between genomic differences and phenotypic divergence is an emerging field of research (Casanova et al. 2018; Costanzo et al. 2019).
In this study, we applied the methods of genomics to the tribe Emesidini. We sequenced and analyzed genomic data for nearly all species, including a number of primary type specimens, and placed them in the phylogenetic context of all Riodinidae by sequencing representatives of other tribes and subtribes. The most surprising result was the need for a new genus for Emesis emesia (Hewitson, 1867), which became the focus of this study. In addition, we classify Emesis into subgenera, identify some species that do not belong to it and place them in new genera, and even define a new tribe. We conclude that genomic approaches bring much needed insights into the evolution of Emesidini and allow us to improve their classification.

Materials and Methods
Methods used in this study are essentially identical to those described by us in previous publications (Shen et al. 2015; Zhang et al. 2019) and are particularly detailed in the SI Appendix to our recently published study (Li et al. 2019). In brief, DNA was extracted from legs of specimens, and genomic libraries were constructed and sequenced for 150 bp from both ends, targeting 7 Gbp of data, on Illumina HiSeq x10 at GENEWIZ. The resulting reads were matched using Diamond (Buchfink et al. 2015) to the exons of the reference genome of Calephelis nemesis (Cong et al. 2017) that we obtained previously.
Coding regions of the mitochondrial genome were assembled similarly. Exons expected to be from the Z chromosome were predicted assuming a syntenic arrangement similar to that of Heliconius (Heliconius Genome Consortium 2012). This assumption is reasonable due to the deep conservation of the Z chromosome in Lepidoptera (Fraisse et al. 2017). Phylogenetic trees were generated from 3 sets of exons (whole nuclear genome, whole mitochondrial genome and Z chromosome) using RAxML-NG (Kozlov et al. 2018) with default parameters (-m GTRGAMMA). The data used in this project were deposited at the NCBI Sequence Read Archive with accession PRJNA549759.
We sampled 52 species placed in Emesidini prior to this work, totaling 66 specimens. Most species were represented by one specimen. However, nine specimens from all parts of the range and of different ages were sequenced for Emesis emesia (Hewitson, 1867), because we noticed a potential taxonomic problem with this species and wanted to study it more rigorously. We lacked DNA samples for only four species: Apodemia planeca R. de la Maza & J. de la Maza, 2017, Apodemia selvatica R. de la Maza & J. de la Maza, 2017, Emesis sinuata Hewitson, 1877 and Emesis toltec Reakirt, 1866; all others were used in our analysis. In addition, 22 Riodinidae were selected as outgroups, representing all tribes and most subtribes of the family. The entire tree was rooted with Curetis bulis (Westwood, 1852) (Lycaenidae). Nine names were represented by their primary type specimens. Data about these specimens are summarized in Table S1 (see supplementary file).
We identified diagnostic DNA characters in nuclear genomic sequences using our recently published procedure (see SI Appendix to Li et al. 2019). We found those positions in exons that were most likely to be synapomorphic for the clade being diagnosed. For a clade with several specimens sequenced, we found positions that are invariant in all species from this clade and have a base pair different from the (mostly invariant) base pair in all other clades, and selected those with the smallest number of species with missing data. If the clade had only one or two specimens sequenced, we detected synapomorphic characters for its sister clade, taking the absence of that base pair as the character state, and added these characters to the synapomorphic characters for the clade that leads to the common ancestor of this single-specimen clade and its sister clade. The union of these characters was used to diagnose the taxon. This more sophisticated treatment increases the chances that the character found is not a random non-conserved change or a sequencing error. Moreover, the number of sequence reads covering each position was accounted for in choosing the characters, and priority was given to positions with better coverage. The character states are given in the diagnoses below as abbreviations. E.g., cne1547.14.1:T789C means that position 789 in exon 1 of gene 14 from scaffold 1547 of the Calephelis nemesis (cne) reference genome (Cong et al. 2017) is C, changed from T in the ancestor. When characters were found for the sister clade of the diagnosed taxon, the following statement was used: cne1086.2.12:G82G (not A), which means that position 82 in exon 12 of gene 2 on scaffold 1086 is occupied by the ancestral base pair G, which was changed to A in the sister clade (so it is not A in the diagnosed taxon). A statement such as 145A means that position 145 is A, but the ancestral state is unclear.
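The abbreviations described above are regular enough to be read by a program. The following is a minimal sketch, not part of the published procedure, of a parser for the three forms mentioned in the text. The class and function names are hypothetical, and accepting the bare form (such as 145A) without a locus prefix is an assumption about how such statements appear in the diagnoses.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Optional locus prefix (e.g. "cne1547.14.1:"), optional ancestral base,
# position, observed base, and optional "(not X)" sister-clade state.
_PATTERN = re.compile(
    r"^(?:(?P<ref>[a-z]+)(?P<scaffold>\d+)\.(?P<gene>\d+)\.(?P<exon>\d+):\s*)?"
    r"(?P<anc>[ACGT])?(?P<pos>\d+)(?P<state>[ACGT])"
    r"(?:\s*\(not\s+(?P<sister>[ACGT])\))?$"
)


@dataclass(frozen=True)
class DnaCharacter:
    """One diagnostic character (hypothetical container for illustration)."""
    reference: Optional[str]     # e.g. "cne" for Calephelis nemesis
    scaffold: Optional[int]
    gene: Optional[int]
    exon: Optional[int]
    position: int                # position within the exon
    state: str                   # base in the diagnosed taxon
    ancestral: Optional[str]     # None when the ancestral state is unclear
    sister_state: Optional[str]  # base in the sister clade, if stated


def _to_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def parse_character(text: str) -> DnaCharacter:
    """Parse strings like 'cne1547.14.1:T789C' or 'cne1086.2.12: G82G (not A)'."""
    m = _PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized character abbreviation: {text!r}")
    return DnaCharacter(
        reference=m["ref"],
        scaffold=_to_int(m["scaffold"]),
        gene=_to_int(m["gene"]),
        exon=_to_int(m["exon"]),
        position=int(m["pos"]),
        state=m["state"],
        ancestral=m["anc"],
        sister_state=m["sister"],
    )
```

For example, `parse_character("cne1547.14.1:T789C")` yields scaffold 1547, gene 14, exon 1, position 789, ancestral base T and observed base C.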
The sequences of exons from the reference genome with the positions used as character states highlighted in green are given in the Supplementary file. Distribution of these sequences together with this publication ensures that the numbers given in the diagnoses can be easily associated with actual sequences, which can be found in other genomic-scale datasets, or amplified with specifically developed primers. Furthermore, we provide a list of characters detected in the standard COI barcode region of 658 positions as defined previously (Ratnasingham and Hebert 2007).
Derivation of the name.-The name is a feminine noun in the nominative singular given for the curved forewing costa.
We see that the three trees show differences in topology (Fig. 1). In particular, mitogenome protein-coding regions are not sufficient to resolve a number of clades and support values for these are below 75%. While Curvie is sister to Apodemia in both nuclear trees, it is sister to the clade formed by Apodemia and Emesis in the mitogenome tree. However, all three genera (Apodemia, Curvie and Emesis) are monophyletic even in the mitogenome tree, indicating closeness within each genus and their prominent separation from all others.
After the monophyly of the three genera in the tribe was ensured by transferring species between Apodemia and Emesis and erecting the genus Curvie gen. n., we studied each genus to find meaningful phylogenetic groups to be defined as subgenera. While no additional partitions to those proposed by Trujano-Ortega et al. (2018) can be found in Apodemia, Emesis splits into 6 clades. These clades are observed in all three trees (Fig. 1). Two of them have names (Emesis and Aphacitis Hübner, [1819]), and four are described here as new.
Derivation of the name.-The name is a feminine noun in the nominative singular. It is formed from the type species name.
Next, our genomic analysis revealed that some species placed in Emesis do not belong in that genus. In agreement with the morphological analysis of Hall & Harvey (2002), genomic phylogeny places Emesis xanthosa (Stichel, 1910) as a sister of Sertania guttata (Stichel, 1910), the type species of a genus described recently (Kaminski et al. 2017). The difference between COI barcodes of E. xanthosa and Sertania guttata is about 9%, similar to the difference of 8% between the two genera Lasaia H. Bates, 1868 and Calephelis Grote & Robinson, 1869, but larger than the difference of 6.5% between the two subgenera Neoapodemia Trujano, 2018 and Plesioarida Trujano & García, 2018. Therefore, similarly to Kaminski et al. (2017), we do not place xanthosa in Sertania, but a new genus is erected for it here.
wing ground color, both above and below, and similar pattern of spots between ventral hindwing and forewing, instead of 3-toned (darker orange, paler yellowish-orange and brown) wings in Sertania with ventral hindwing patterned differently from forewing; and by prominent dark submarginal spots on wings above and below (Fig. 2e-g). In DNA, a combination of the following base pairs is diagnostic: nuclear genome: cne658.
Furthermore, and to our surprise, Emesis elegia Stichel, 1929, for which we sequenced primary type specimens (Fig. 2i-k), was not allied to Emesis. Genomic phylogeny placed this species away from all other Riodininae, right at the point of rapid diversification of the subfamily, without associating it with any tribe. Although E. elegia has the general appearance of Emesis, it differs from it by extensive pale overscaling at the wing bases below, which is not present in any Emesis species. Also, the hindwing is nearly rectangular in females, unlike in Emesis. Genomics revealed another surprise: sequencing of the Lasaia lalannei Gallard, 2008 holotype (Fig. 2l) showed its close similarity to E. elegia, but no association with Lasaia H. Bates, 1868 (Fig. 1). As discussed in the original publication (Gallard 2008), genitalia of L. lalannei do not agree with the placement of this species in Lasaia. To accommodate these differences and similarities, a new genus is proposed for these species here.

Genera included:
Only the type genus.
In selecting specimens for the genomic analysis, we attempted to sequence as many Emesis species as we could find. Moreover, some of these were represented by their primary type specimens. Analysis of primary types enables us to put our taxonomic analysis on solid footing. We find that the syntype of Emesis vimena Schaus, 1928 from Guatemala is tightly grouped with Emesis brimo Godman & Salvin, 1889 (e.g. only 0.6% difference in COI barcodes) and is better viewed as a more northern subspecies of this species. Emesis tristis Stichel, 1929, considered a synonym of E. vimena, should instead be a synonym of Emesis lupina Godman & Salvin, [1886]. Sequencing of primary type specimens of Emesis adelpha Le Cerf, 1958 and Emesis heteroclita Stichel, 1929 suggests their conspecificity. However, due to wing pattern differences and differences in their distributions, we view E. adelpha and its subspecies E. a. vicaria Le Cerf, 1958 as subspecies of E. heteroclita, rather than its synonyms.
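The COI barcode differences quoted in this section are percentages of differing positions between two aligned barcodes. As a minimal sketch, the function below computes an uncorrected pairwise difference over the positions where both sequences have an unambiguous base. Treating the reported values as uncorrected distances, and skipping positions with gaps or ambiguous bases, are assumptions made for illustration; the function name is hypothetical.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Uncorrected percent difference between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base
    (anything other than A, C, G, T) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # missing or ambiguous data at this position
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differing / compared
```

On a full barcode of 658 comparable positions, four differing positions would correspond to a difference of about 0.6%.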
Conversely, we find that some taxa placed as subspecies differ markedly from their nominal subspecies and should be considered distinct at the species level (new status): Emesis furor A. Butler & H. Druce, 1872 (not a subspecies of E. mandana (Cramer, 1780): not sister taxa, COI barcode difference of about 2%), Emesis melancholica Stichel, 1916 (not a subspecies of E. lupina Godman & Salvin, 1886: not in the same clade, COI barcodes 9% different), and Emesis progne (Godman, 1903) (not a subspecies of E. brimo Godman & Salvin, 1889: COI barcodes 3.8% different). Furthermore, Emesis opaca Stichel, 1910 is not a synonym of E. lucinda (Cramer, 1775) (COI barcode difference nearly 5%), but a valid species, new status. This change further reveals that Emesis castigata diringeri Gallard, 2008 is not a subspecies of E. castigata Stichel, 1910 (COI barcode difference about 3%), and due to genomic (Fig. 2, COI barcodes are 100% identical) and morphological similarities we suggest that it is a subjective junior synonym of E. opaca, new status. Both taxa are from French Guiana. We summarize our results as the following taxonomic list.
Taxonomic arrangement of the tribe Emesidini.-The list of species arranged into genera and subgenera resulting from our genomic analysis augmented with morphological considerations is given below. Synonymic names are included for genera and subgenera. Names treated as synonyms (genera and names of type species that are considered to be synonyms) are preceded by "=": those not followed by daggers are subjective junior synonyms; † objective junior synonyms; ‡ unavailable names (such as homonyms and nomina nuda); "preocc." indicates preoccupied, and the taxonomic order (insects) of the senior name is shown in brackets. Synonyms are attributed to subgenera. Type species (TS) for genera and subgenera are listed. For type species that are considered to be synonyms, valid names are shown in parentheses. For valid genera and subgenera (not their synonyms), the names of the type species, or the names of which the type species are considered to be synonyms, are underlined in the list. The type of change is explained after the name (new status, new combination, new placement), and the former status or the genus of former placement is listed. Subspecies names are not listed (except those resulting from the status change in this work) pending further studies.

Discussion
In the absence of DNA sequences, it is not readily apparent that Emesis emesia is not monophyletic with Emesis. In particular, the similarity in wing shapes between E. emesia and the type species of Emesis, E. cereus, reinforced by similar wing patterns, does not raise any suspicions. Here, genomic analysis is critical. In the absence of vast DNA sequence information, grouping of E. emesia with Apodemia rather than with Emesis would seem spurious. Even with our large datasets, we were cautious to accept the paraphyly of Emesis. To avoid the effects of poor sample quality, we sequenced 9 E. emesia specimens from its entire range, from the southern US to Costa Rica. To avoid the negative effects of poor taxon sampling, we sequenced nearly all species from the tribe Emesidini, including type species of all available genus-group names. Confirming our preliminary analyses on smaller datasets, E. emesia fell outside the Emesis clade with very high statistical support. Even if this phylogeny is incorrect, E. emesia is at a larger phylogenetic distance from Emesis species than they are from each other, suggesting that it does not belong to Emesis. Therefore, proposing a new genus-group name is justified by our analysis.
A contentious issue is the rank assigned to a genus-group name. Currently, there are no objective criteria to select a clade that is a genus or a clade that is a subgenus. Several reasonable considerations include the age of the clade, its prominence (the relative branch length of that clade compared to others should be larger) and agreement with the currently used classification. Intuitively, a genus should correspond to major groupings above species but below tribe. For Emesidini, the first split of the nuclear trees separates the well-known genera Apodemia and Emesis. So, in principle, we can divide the tribe into 2 genera. However, inclusion of E. emesia into Apodemia makes it a less prominent genus, because the branch of this clade is short. Moreover, E. emesia is not phenotypically similar to Apodemia, because it was placed in Emesis before. Considering its large evolutionary distance from Apodemia, and its non-monophyly with Emesis, we took the next major level (3 groups) to be a genus.
It is possible to move the genus boundary closer to the leaves of the tree. Indeed, recently Apodemia was split into 3 genera, two of which were proposed as new: Neoapodemia Trujano, 2018 and Plesioarida Trujano & García, 2018. If these are treated as genera, to be consistent with their age, Emesis would need to be split into several genera as well.
Would such action be desirable? The resulting small genera would contain species that are very close to each other, and these units do not have clear further partitions into smaller groups. However, there are two genus-group ranks: genus and subgenus. Thus, if the genus rank is assigned to a group that cannot be meaningfully split any further, the rank of subgenus cannot be used. The number of meaningful levels in a phylogenetic tree exceeds the number of ranks in classification. Thus, it seems undesirable to further reduce the number of ranks by making the subgenus level impossible due to genus-level clades placed too close to the leaves. From a historical perspective, it is equally undesirable to introduce many additional names within clearly defined and prominent monophyletic groups that already have names (Apodemia and Emesis) that have been widely in use for a century. For these reasons, we treat the most prominent clades at the level below Emesis and Apodemia (and thus Curvie) as subgenera. We confirm recently published results (Trujano-Ortega et al. 2018) that the three groupings within Apodemia are phylogenetically meaningful, but use them at the subgenus rank. Furthermore, we propose similar-level subgroups within Emesis, to which we also assign the subgenus rank.
As discussed recently (Grishin 2019), if family- or genus-group taxa are discovered using phylogenetic trees constructed from DNA sequences and their monophyly is ensured with these trees, the most direct diagnoses of these taxa would include DNA characters. The diagnosis cannot simply refer to the phylogenetic tree, because statements like "diagnosed based on DNA similarities and position in the phylogenetic tree" are not sufficient according to the ICZN Code (ICZN 1999). Article 13.1.1 of the Code requires that the diagnostic characters are explicitly stated in words. Positions in DNA sequences can be used as characters, and base pairs in these positions would be character states. An ideal DNA character would be an invariant and unique synapomorphy, i.e., a base pair that is invariantly the same in all individuals of the diagnosed taxon, and is different from all individuals in all other taxa. Such characters are challenging to find, especially when only a small number of individuals are sequenced. To maximize the probability that the characters are indeed meaningful, we developed a sophisticated method that finds several highly conserved positions in genomic regions that are most accurately sequenced (see Materials and Methods). High conservation of the position outside the clade being diagnosed increases the probability that the base pair change occurred in the last common ancestor of the diagnosed clade and thus is indeed a likely synapomorphy. Several such characters are found for each taxon and are listed in the diagnosis section in addition to morphological characters. We believe that DNA-based diagnoses may be more meaningful and more robust to extrapolation than morphological characters when additional species that belong to the taxon (yet unknown or not included in the analysis) are discovered and/or sequenced. At the very least, they complement morphological diagnoses with orthogonal evidence.
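The notion of an ideal DNA character described above can be illustrated with a short sketch. It is a deliberate simplification, not the published procedure (see SI Appendix to Li et al. 2019): it requires strict invariance outside the diagnosed clade rather than "mostly invariant" states, ignores read coverage, and omits the sister-clade treatment used for clades with one or two specimens. The function name and the input layout (a mapping from specimen name to aligned exon sequence) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

MISSING = set("N-?")  # symbols treated as missing data (an assumption)


def diagnostic_positions(
    alignment: Dict[str, str], ingroup: Set[str]
) -> List[Tuple[str, int]]:
    """Find positions invariant within the ingroup and invariant, but
    different, in all other specimens.

    Returns (label, n_missing) pairs, where label follows the style
    '<outgroup base><1-based position><ingroup base>', sorted so that
    positions with the least missing data come first.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    if not ingroup or not ingroup <= set(alignment):
        raise ValueError("ingroup must be a non-empty subset of the alignment")
    outgroup = [name for name in alignment if name not in ingroup]
    if not outgroup:
        raise ValueError("at least one specimen outside the ingroup is needed")

    hits = []
    for i in range(lengths.pop()):
        in_bases = [alignment[name][i].upper() for name in ingroup]
        out_bases = [alignment[name][i].upper() for name in outgroup]
        in_called = {b for b in in_bases if b not in MISSING}
        out_called = {b for b in out_bases if b not in MISSING}
        # Exactly one state on each side, and the two states must differ.
        if len(in_called) != 1 or len(out_called) != 1 or in_called == out_called:
            continue
        n_missing = sum(b in MISSING for b in in_bases + out_bases)
        label = f"{next(iter(out_called))}{i + 1}{next(iter(in_called))}"
        hits.append((n_missing, i + 1, label))

    hits.sort()  # fewest missing data first, then by position
    return [(label, n_missing) for n_missing, _, label in hits]
```

In this sketch the base shared by all other specimens stands in for the ancestral state; that is only a proxy, which is why high conservation outside the diagnosed clade matters for interpreting a change as a likely synapomorphy.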