October 1, 2023

Reporting MSY haplogroups in horse populations around the globe

A previously-released fine-scale MSY HT topology23,24 provided the basis for the selection of 16 MSY diagnostic markers for major haplogroups (mjHGs) in modern horse breeds. Genotyping these markers helped to differentiate Przewalski’s horses’, as well as six non-Crown and three Crown mjHGs (Fig. 1a, Supplementary Table S1). In order to test whether other non-Crown HGs could be identified, we genotyped those loci amongst 15 additional Przewalski’s horses as well as 1522 domestic horses representing 135 breeds. These breeds spanned a broad geographical and phenotypic distribution range, and not only included many economically important and globally distributed breeds, but also multiple breeds with a documented history of genetic isolation, such as Icelandic horses, or horse populations raised in Mongolia, Yakutia and Japan (dataset given in Supplementary Table S2). The previously established HG nomenclature23,24,26 was adapted to reflect the newly generated data in an unambiguous naming system (Supplementary Information File S1).

Figure 1

Distribution of MSY haplogroups across horse populations. (a) The tree shows the topology of ten mjHGs resulting from the 16 MSY markers (Supplementary Table S1) selected for genotyping. The markers are given on branches. The five observed inner node clustering positions are denoted by a ‘*’ in their nomenclature. (b) MSY HG frequencies from genotyping 15 Przewalski’s horses and 1522 domestic horses representing 135 breeds. Domestic horses were grouped into 10 geographic regions and the number of samples from each region is given in parenthesis. The color code is according to (a) and the full dataset is given in Supplementary Table S2.

Genotyping results corroborated the pronounced signature of Crown lineages (indicated as daC_A/H/T in Fig. 1b) across all geographical regions, with Crown HG reaching frequencies from 48% in South East Asia, to full fixation in Central and Western Asia, the Arabian Peninsula, North Africa, and Latin America (Fig. 1b, Supplementary Table S2). Our comprehensive MSY panel further unveiled the distribution of previously defined non-Crown HGs around the world. For example, we confirmed the presence of HGs db, da2_M, da3_J, da3_Y in Asia and/or Russia. Interestingly, HGs da1_N and da1_I were detected in Northern European breeds, but da1_N was identified in North American populations for the first time. Similarly, HG p was confirmed to be common amongst Przewalski’s horses, but was also in a single Mongolian horse.

Strikingly, our dataset included 302 non-Crown samples, 191 of which clustered outside of the mjHGs previously defined. These samples specifically placed on the inner nodes of the MSY phylogeny, as indicated with a ‘*’ suffix (i. e. da1*, da2*, da3*, and da4*), most frequently along the da1* inner node (N = 133). Together with 30 samples allocated to the basal Crown node (daC*), our results uncovered a considerable fraction of previously unknown Y chromosomal diversity in domestic horses. This motivated the further characterization of those HTs using next-generation sequencing data and a finer-grain horse MSY phylogeny.

Improving the horse MSY tree

We used the genotyping data to select 31 samples clustering on the inner nodes of the MSY phylogeny and the Mongolian horse in HG p for target enriched sequencing (TES) of 5.06 Mb of MSY single copy Y (scY) region. These were processed together with an additional set of 45 samples representing haplogroups or breeds hitherto overlooked. TES data were analyzed together with samples previously characterized through TES experiments (N = 39,24) and whole-genome sequencing (WGS) (N = 55)23, representing a final dataset of 165 domestic horses and five Przewalski’s horses (see Supplementary Table S3). Variant calling and stringent filtering provided a total of 2678 Single Nucleotide Polymorphisms (SNPs) that were used for maximum parsimony tree reconstruction (Fig. 2a, Supplementary Fig. S1).

Figure 2
figure 2

Horse Y phylogeny from stringently filtered SNPs. (a) The maximum parsimony tree is based on 2678 SNPs gathered from NGS data analysis of 165 domestic horse and 5 Przewalski’s horse samples. Number of mutations are given on the branches of the tree. Haplogroups newly identified in this study are marked with an underline. Because of the extensive number of samples in Crown haplogroup, it is collapsed into its five major branches. (b) Shows the number of samples in each cluster and their region of origin.

The resulting finer-grained phylogeny confirmed the general separation of domestic (da/db) and Przewalski’s horse’s (pa/pb) HGs23,26, in line with mjHGs diagnostic markers, as well as the Mongolian horse forming a new branch within Przewalski’s group (pb_D). All other domestic horses grouped into clades da or db and the HTs in da showed a pronounced multifurcation from the basal node da1 (Fig. 2). In total, we identified 21 non-Crown and 5 Crown mjHGs in domestic horses, 17 of which were defined for the first time (Fig. 2, underlined labels) and mainly comprised Asian horses or isolated populations. In the Crown, we defined two novel basal branching HGs; daC_O in a single sample from Mongolia, and daC_S in one European and two Latin American horses. Finally, the Crown mjHG daC_A was represented by the largest number of samples (N = 56) and showed a worldwide distribution.

Despite the identification of many new HTs, the sequence diversity on the horse domestic MSY was found to be extremely limited (Table 1). This was especially so when considering intensively managed breeds from Central Europe and the Americas, which mostly carry Crown HTs (Watterson’s θ = 4.57–8.87 × 10–6). The lack of genetic diversity precluded further resolution of the phylogenetic relationship among Crown HTs on the basis of SNP variation only. In order to further improve phylogenetic resolution, short insertion and deletion polymorphisms (indels) and a previously described short tandem repeat (STR)23,24,31 were included as well as imputed genotypes (see Supplementary Information S1) resulting in a total of 2,966 MSY variants (2781 SNPs, 184 indels, 1 STR). This provided the basis for the refined parsimony tree, ‘horseYtree.vs1’, and the placement of diagnostic positions (‘identifiers’) underlying each individual branch (Fig. 3). Our strategy confirmed 2031 identifiers previously described (r-, s-, f- and q-variants) but uncovered 935 (880 SNPs, 55 indels), that were ascertained for the first time (e-variants; Supplementary Table S4). Overall, the horseYtree.vs1 phylogeny included five HTs in p, four in db and 144 in da (Fig. 3). In the Crown we distinguished 107 HTs from 487 variants in 120 samples, out of which 40 were previously unknown (Fig. 3b). Besides the two new Crown mjHGs (daC_O and daC_S), we detected several former unresolved24,31 early branching Crown subhaplogroups (for example daC_Ak, daC_Ad-m, daC_Ad-s or daC_T2r). This demonstrated the wide distribution of Oriental lineages in addition to the previously identified Thoroughbred22 and Arabian23 signatures.

Table 1 Horse MSY HT diversity across geographic regions based on stringently filtered SNPs (related to Fig. 2).
Figure 3
figure 3

Horse MSY HT topology ‘horseYtree.vs1’. Fine-scaled HT tree from 165 domestic and five Przewalski’s horses tree based on 2966 MSY variants. Variants (‘identifiers’) are given on branches in red. The samples (IDs) are at the tips, and samples carrying HTs first detected in this study are circled with solid lines. mjHGs and breeds are denoted in the outer circle. Breeds are identified by geographic region as indicated in the box. (a) HTs outside the Crown. For long branches the number of identifiers is given in parenthesis (full information in Supplementary Table S4). (b) Crown HTs. In the Crown, the 39 subhaplogroups (sHGs) are separated with black dots and denoted in grey letters. The black lines in the outer circle mark the previously defined Thoroughbred (dac_Tb-d, dac_Tb-oB3b, dac_T3a, dac_Tb-oB1) and Arabian signatures (dac_Ao-aA, daC_Ao-aA1D)23,24. Details on samples, variants and HGs are given in Supplementary Tables S3, S4. The figure is fully readable only in the digital version.

Ancient DNA insights into the emergence of MSY haplogroups

Our refined MSY phylogeny showed that the da-expansion was wider than previously suggested (designated as ‘Dom-West’ in23), and that many db branches might have formed in a similar timeframe (Fig. 2a). It has recently been shown that the modern domestic horse (DOM2) was domesticated in the Western Eurasian steppe around 4200 years before present (BP) and does not descend from previous domestic bloodlines emerging at Botai around 5500 years BP18,20,32. However, in previous work, estimates of the basal branching point of da (da1) applying the MSY molecular clock23, returned a significantly younger timeframe (2600 ± 900 years BP) than for the DOM2 domestication. Radiocarbon date and/or archaeological context associated with ancient specimens18,20,32 allowed us to date the emergence of some of the main branches of the MSY phylogeny, without relying on previous molecular clock calibrations (Fig. 4a,b). The majority of the 163 ancient samples showing sufficient coverage clustered into da1 or db (N = 98), and included samples dating back from the 2nd millennium BCE onwards. The three oldest samples clustering within da1 (KSH5_Kaz_3845BP, Halvai2_Kaz_3806BP, UR17 × 5_Rus_3901BP) suggest that this HG already segregated in Russia/Kazakhstan between 4000 and 3500 years BP. In Europe, da1 HGs were also retrieved from a single Bronze age sample (Gar3_Rom_3489 from Romania) and from many sites dated to the last 2000 years (Fig. 4b). Sample TP4_Geo_3528BP proves the occurrence of the db HG in West Asia (Georgia) around ~ 3500 years BP. The ancient samples in db (N = 14) suggest that this clade, which is restricted today to East Asia and Russia, had a broader distribution in early domestic horses until the Sassanid and Byzantine period. Since none of the samples carrying da1 and db were associated with autosomal ancestry profiles characteristic of other populations than the DOM2 lineage18,20,32, the MSY DNA evidence analyzed here demonstrates the tight association of the da1 and db HGs with the DOM2 domestication and further spread across Eurasia. It also revealed that the two HGs were formed prior to what was estimated on the basis of molecular clocks (i.e. for da at least ~ 3900 years ago vs ~ 2600 years in23).

Figure 4
figure 4

Clustering of ancient samples and emergence of haplogroups. (a) Results from grouping of 149 samples (Y-axis) into four clusters based on number of derived/ancestral states of identifier variants (X-axis). (b) Illustrated summary of the clustering of all ancient samples (according to panel a). The number of identifier SNPs is given on the branches, and the number of samples in each cluster is given in the dashed circles. Groupings that were only detected in ancient samples are given in Greek letters. The spatiotemporal distribution of the 98 samples clustered in db and da1 is shown in the lower panel. The oldest samples in da1 and db, which are dated to the 4000–3500 BP period, are outlined. (c) Clustering of 53 sufficiently sequenced da1 samples into one of the mjHGs. The six samples that clustered to the basal node of the Crown (daC) are given on the right. Full information is given in Supplementary Table S5. (d) Hypothesized emergence and spread of Y haplotypes in domestic and Przewalksi’s horses on the basis of currently available data. Coloured lines represent detected, grey lines uncovered haplotypes.

Outside da1 and db, we clustered 65 samples (Supplementary Fig. S3). Twenty of these, representing a timespan from 36800 to 4000 years BP and a Tarpan sample from the early twentieth century CE obtained from natural history museum collections, have been placed either in p (Botai, ~ 5500 BP; N = 5), π* (Pershinskaya, Upper Paleolithic; RN115_Rus_16686BP) and ρο* (Upper Paleolithic to Bronze Age and the twentieth century Tarpan sample; N = 14). The remaining 45 samples were placed along the Þ*, δα* and δβ* branches of the MSY phylogeny and were mainly dated to the 6000–2000 years BP time range. It is noteworthy that 20 of those samples showed a typical DOM2 autosomal ancestry profile (Supplementary Table S5), which suggests that their observed MSY pattern captures the evolutionary events associated with the domestication of wild horses into the DOM2 lineage. Importantly, the considerable proportion of ancient samples clustering outside da and db confirmed previous work reporting far more MSY diversity within ancient than modern domestic horses19. This suggests that the spread and management of DOM2 horses in the past 4000 years was associated with a major change in the Y-chromosomal gene pool (Fig. 4d), leading to the prevalence of the da and db lineages17,19,33.

In previous work, we postulated that the rise and spread of the Crown HG were associated with the expansion of Oriental bloodlines26, and the most recent common ancestor of the Crown was dated around 1500 ± 500 years BP assuming a molecular clock of 1.69 × 10−9 mutations/site/year23. To test for the validity of this model, we identified 53 sufficiently sequenced ancient da1 HG samples (Supplementary Information File S1) and checked whether they clustered into any of the mjHGs. A total of 34 such samples showed a basal placement within da1, while the remaining 19 belonged to da mjHGs (Fig. 4c, Supplementary Fig. S4, Supplementary Table S5), including six daC samples, representing the most basal node of the Crown. The oldest such samples originated from the Byzantine site of Yenikapi, Turkey, which covers a time period between the fourth and the eleventh century CE. For one specimen (Tur140_Tur_1289BP) the age has been confirmed to the seventh/eighth century CE by radiocarbon dating20. Therefore, ancient DNA evidence is congruent with previous dating estimates for the emergence of the Crown group.

Unique Y diversity in indigenous horse populations

Our refined MSY tree revealed the presence of several unique Y lineages within indigenous populations. Some such mjHGs were supported by at least two samples from a given region (for example, da1_G and da1_Uj in in Mongolia and Japan, respectively), while others were detected across a wider geographic range (for example, da1_Ub-e in Myanmar, Laos and Inner Mongolia) (Fig. 3, Supplementary Table S3). Moreover, the phylogenetic placement of a cryptic HT previously detected in Estonian Native horses29 is now resolved to define one new mjHG (da4_E) together with the Eriskay and the Newfoundland Pony. Also, former hidden private Y variation in Japanese horses30 and inner Mongolian horses28 and several other regions were ascertained (Fig. 3). Some indigenous horses grouped nested within modern breeds (for example the Inner Mongolian horse in daC_Ad-b, the Japanese Horse in daC_Ao-aA) and as ‘horseYtree.vs1’ delineates the chronologic descent of the lineages, this pattern is in coherence with a recent cosmopolitan descent of those lineages.

Overall, Mongolian and Inner Mongolian horses were found to exhibit the most diverse range of MSY HTs, with 14 HTs distributed over 10 mjHGs (Supplementary Table S3). This is not only in line with the pronounced genetic diversity detected in Mongolian and Chinese horse populations reported in previous studies34,35,36,37, but indicates that horse management in these regions preserved some of the original diversity of paternal lineages and was not characterized by the intensive selection of specific stallion bloodlines. Interestingly, one Mongolian horse carried an MSY HT (pb_D) clustering with Przewalski’s horses’ HG (Fig. 2). All todays Przewalski’s horses descend from twelve wild-caught individuals and in a recent study, representing all founding lineages, only two well separated Y-HTs were detected38, which were both represented in our dataset (pa and pb_P). Based on the HT topology, showing clear distinction of pb_D from the HTs segregating in the current Przewalski’s horse population (Fig. 3a), the pb_D individual is unlikely the result of a recent hybridization between a Przewalski’s stallion and a domestic mare. The finding instead suggests past introgression of an HT that is not detectable in the current population of Przewalski’s horses anymore or provides evidence for the previously-reported leaky genetic isolation post-divergence38.

It is also noteworthy that, outside of Asia and northern Europe, non-Crown mjHGs were only detected in samples from populations inhabiting islands (Greece—Skyros, Rhodes; North America—Shackleford Banks/Corolla). In the context of Skyros Horses, which carried three private HTs, the Y signature is in congruence with autosomal data, where Skyros horses remained isolated without any close relationships with modern breeds or Middle Eastern populations39. The MSY data underline that the Skyros as well as Rhodes horse populations have been spared to some extent from recent breeding influences.

The spread of mjHG da1_N

Our new MSY phylogeny not only reveals the true extent of the MSY diversity in modern and ancient horses but also allows us to track the geographic spread of specific haplogroups after horse domestication. The mjHG da1_N provides a clear example about how this approach can provide such new insights. As this mjHG was first detected in Shetland Ponies and Norwegian Fjord Horses26, and the North Swedish draft23,25, it was hypothesized to reflect ‘Northern European’ origins. Our initial screening data confirmed the presence of da1_N in five breeds from Northern Europe but also identified it in horses from the Baltic region and the Shackleford Banks Horse from North Carolina (USA) (Fig. 1, Supplementary Table S2). We genotyped a collection of 92 males, representing six da1_N harbouring breeds, for 16 HG identifiers. We could further split da1_N carriers into three HGs (da1_Nf, da1_Ns-s and da1_Ns-d) and re-assigned 27 individuals to four other mjHGs (Fig. 5; Supplementary Table S1, Supplementary Table S2). The da1_Nf HG was found to be private to Norwegian Fjord (in congruence with breeding history22), while da1_Ns-s was detected in Shetland Ponies, Gotland Ponies and horses from Shackleford Banks. Such phylogenetic clustering indicates that at least a fraction of the horses from Shackleford Banks trace their origins into da1_Ns ancestors from Northern Europe, that were likely imported to the region as part of the colonization history of the continent40,41. All North Swedish drafts were found to carry the da1_Ns-d HG, which was also detected in Zemaitukai horses and a few Gotland ponies. Allover, MSY pattern harmonize with breeding strategy: the isolated breeding in Norwegian Fjords, Shetland Ponies and North Swedish draft, was reflected in the single HG detected. This contrasted to the mixed ancestry in the Gotland Pony and in the Zemaitukai breed. The latter showed four mjHGs most of which (da1_Ns-d as well as the Crown mjHGs daC_A and daC_T) can be explained by recent breeding influences. In addition, we detected da4_Es in the Zemaitukai, showing a common ancestry with the Estonian Native Horse. We did not detect any recently introgressed HTs in Shakleford Banks horses, as we found only two, rather unique, HGs: the above described da1_Ns-s and da1_Uc, a HG shared with Corolla horses, another feral population on Outer Banks. In the ancient dataset we detected da1_N via identifier ‘rBA’ in two samples from the Marvele medieval cemetery in Lithuania (Marvele21_Ltu_1087BP, Marvele18_Ltu_1189BP), dated to the eighth to the eleventh century CE, and a sample from Yenikapi (in present-day Istanbul; radiocarbon dated to 260–395 CE) (Supplementary Fig. S4).

Figure 5
figure 5

MSY lineage tracing in breeds carrying mjHG da1_N. On the left the illustrated tree based on 16 identifiers discriminating the HGs detected in six breeds carrying mjHG da1_N is given, with identifier variants on branches. The three ancient samples that clustered into the da1_N are denoted on its basal branch. On the right, the frequencies of each haplogroup in the studied breeds and their geographical region are shown in a Sankey diagram.

While da1_N was not surprising in the medieval Marvele samples, the Yenikapi finding was rather unexpected. But this observation is an explicit example of how dynamically horses were moved after domestication. As shown here, refined MSY HT tracking based on contemporary breeds is very powerful in revealing recent stallion-mediated population dynamics, whereas a comprehensive collection of ancient samples will be required to elucidate the distribution and impact of stallions in early periods.