A · A1 · A1b · A1b1 · BT · CT · CF · F · G

Haplogroup G

G-M201

Macro-haplogroup: G
Parent clade: F
Formed (estimate): c. 45,000–50,000 years before present
TMRCA (estimate): c. 20,000–30,000 years ago (surviving lineages)

Overview

Haplogroup G (G-M201) is a major West Eurasian Y-chromosome lineage that traces back to an early branch of the F-M89 expansion out of Africa. Phylogenetically, G descends from the macro-clade GHIJK, which in turn branches from F and ultimately from the CT–CF backbone shared by most non-African paternal lineages. The defining mutation M201 and a cluster of tightly linked SNPs mark the split of G from its sister branch HIJK, placing this event in the Upper Pleistocene, when early modern human groups were dispersing and differentiating across the Near East and adjacent regions.

G is not the product of a single Holocene founder event. It is a deep and internally structured lineage whose surviving subclades (especially G1 and G2) participated in key demographic processes in West Eurasia. The G2-P287 branch, and particularly its G2a-P15 derivatives, is strongly associated with early Neolithic farmers from Anatolia and the Aegean who carried agriculture into much of Europe. Ancient DNA from early farming sites in central Anatolia, the Balkans, central Europe and the western Mediterranean repeatedly identifies G2a lineages, indicating that haplogroup G contributed disproportionately to the paternal ancestry of the first agriculturalists in Europe.

G lineages were never restricted to farmers, however. The highest present-day frequencies of G (often in the form of G2 or mixed G1/G2 profiles) occur in the Caucasus and surrounding regions, pointing to a long-standing association with highland and foothill populations of the northern Near East. There, G appears to have maintained substantial continuity from the Late Pleistocene through the Bronze Age, later interacting with Iranian, Anatolian, steppe and Near Eastern groups. Elsewhere across Europe, the Middle East, Central Asia and parts of South Asia, G is generally present at low to moderate frequencies, usually as G2 subclades that reflect layered episodes of Neolithic diffusion, Bronze Age mobility and more recent historical movements.

Geographic distribution

Modern distributions of haplogroup G are highly structured. The strongest concentrations occur in and around the Greater Caucasus, with high frequencies among Ossetians, various North Caucasian groups, Georgians and some populations of eastern Anatolia. In these areas, G (mostly G2) can reach 30–70% of the male population, suggesting long-term regional continuity and possible local founder effects.

Outside the Caucasus and northern Near East, G appears at lower but still notable frequencies across much of Europe, West Asia and parts of Central and South Asia. In Europe, G (especially G2a) forms several regional peaks: parts of the Alps (Tyrol), northern and central Italy, Sardinia, the western Balkans, some central European regions, and scattered islands such as Ibiza and Crete. These pockets often coincide with areas where ancient DNA shows strong Neolithic farmer ancestry, underlining the role of G-bearing early agriculturalists in shaping local paternal profiles.

In the Near East and Iran, G typically accounts for roughly 3–15% of males, with higher local frequencies among specific groups in the Zagros, the southern Caucasus fringe and parts of western Iran. In Central Asia and the Eurasian steppe, G is generally rare, but certain tribal or clan groups (for example in Kazakhstan) show strong enrichment in particular G1 or G2 lineages, likely due to historical founder events. In South Asia, G is uncommon overall but can reach modest levels in some populations of the northwestern subcontinent and along specific migration routes from the Iranian Plateau.

In North Africa and the eastern Mediterranean, G is present at low frequencies, often associated with historical movements from Anatolia, the Levant or Europe. In sub-Saharan Africa it is essentially absent except where there has been recent back-migration or admixture. In the Americas and Oceania, G occurs only as a result of post-Columbian or historic-era migration and does not represent pre-contact lineages.

Ancient DNA

  • Multiple Neolithic and Early Chalcolithic individuals from central and northwestern Anatolia (e.g., Boncuklu, Barcın and related sites) carry G2a lineages, indicating that G was an important component of the earliest sedentary farming communities in the northern Fertile Crescent and Anatolian plateau.
  • Early European Neolithic burials associated with the Linearbandkeramik (LBK), Starčevo–Körös–Criș and related cultures frequently yield G2a2-derived haplogroups, demonstrating that G-bearing males were central to the pioneering agricultural expansions into central and western Europe.
  • Neolithic and Copper Age remains from the western Mediterranean, including sites in southern France, Iberia and the Alps (for example the Tyrolean Iceman ‘Ötzi’), show repeated occurrences of G2a2b and other G2 subclades, linking haplogroup G to the maritime and overland spread of farming around the western Mediterranean basin.
  • Ancient DNA from Chalcolithic and Bronze Age contexts in Anatolia and the Caucasus continues to reveal G2 and occasionally basal G lineages, consistent with long-term regional continuity as farming, metallurgy and state-level societies developed in these regions.
  • Later prehistoric and historic individuals with G lineages are scattered across a wide arc from the Caucasus and eastern Anatolia to central Europe, the Balkans and the Levant, reflecting the integration of early farmer-derived paternal lines into subsequent demographic events including steppe interactions, imperial expansions and localized founder effects.

Phylogeny & subclades

Within the Y-chromosome tree, G descends from GHIJK under the broad F–CF–CT backbone. The internal structure of G is dominated by two primary branches: G1 (M285/M342) and G2 (P287). G1, historically associated with regions of Iran, Central Asia and parts of the steppe fringe, forms a relatively compact but geographically distinctive branch. G2, by contrast, has deeper internal substructure and is far more diverse, including G2a (P15 and allied markers), G2b (M377) and multiple downstream radiations. G2a in particular diversified substantially during the Neolithic and post-Neolithic periods of West Eurasia, producing a dense hierarchy of subclades such as G2a2b (L30/S126), G2a2b1 (M406), G2a2b2a (P303), G2a2b2a1a1b (L497) and numerous regional micro-branches that mark founder events in specific populations. Many of these clades are now strongly associated with distinct geographic foci (for example, the Caucasus, Anatolia, central Europe or the western Mediterranean). Basal G* and G2* lineages are rare and largely confined to the Near East and surrounding regions, where they help anchor the deeper topology of the G tree and calibrate divergence times.

  • G* (basal G-M201 lineages; extremely rare and poorly sampled)
  • G1-M285/M342 (primary branch with foci in Iran, Central Asia and some steppe-border populations)
  • G2-P287 (major branch strongly associated with Neolithic farmers and later West Eurasian expansions)
  • G2a-P15 and downstream clusters (Anatolia, the Caucasus, Europe and the Near East)
  • G2b-M377 and related minor branches (scattered across West and Central Asia with localized founder effects)
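
The branch structure described above can be sketched as a small parent map. This is an illustrative simplification, not a reference tree: it contains only the clades and markers named in this section, it collapses intermediate branches (for example, those between G2a and G2a2b), and the names PARENT, MARKER, lineage and is_descendant are hypothetical helpers introduced here.

```python
# Parent map restricted to clades named in the text. Intermediate
# branches not mentioned in the text are deliberately collapsed, so a
# "parent" here means "nearest named ancestor", not the immediate parent.
PARENT = {
    "G": None,
    "G1": "G",
    "G2": "G",
    "G2a": "G2",
    "G2b": "G2",
    "G2a2b": "G2a",
    "G2a2b1": "G2a2b",
    "G2a2b2a": "G2a2b",
    "G2a2b2a1a1b": "G2a2b2a",
}

# Defining markers as quoted in the text.
MARKER = {
    "G": "M201",
    "G1": "M285/M342",
    "G2": "P287",
    "G2a": "P15",
    "G2b": "M377",
    "G2a2b": "L30/S126",
    "G2a2b1": "M406",
    "G2a2b2a": "P303",
    "G2a2b2a1a1b": "L497",
}


def lineage(clade):
    """Return the path from the root G down to the given clade."""
    path = []
    node = clade
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def is_descendant(clade, ancestor):
    """True if `ancestor` lies on the path from G to `clade` (inclusive)."""
    return ancestor in lineage(clade)


if __name__ == "__main__":
    # Print the named ancestry of the L497 branch, with defining markers.
    print(" > ".join(f"{c}-{MARKER[c]}" for c in lineage("G2a2b2a1a1b")))
```

Walking the map upward from any clade recovers its named ancestry, which is how statements such as "L497 falls under P303, which falls under L30" can be checked against the tree.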

Notes & context

Haplogroup G is a key lineage for understanding the early demographic history of West Eurasia and the spread of farming north and west of the Fertile Crescent. Its deep divergence from other F-derived lineages and its strong representation among early Near Eastern and European farmers make it a cornerstone for models of Neolithic expansion. Modern distributions also show that lineages once dominant among early farmers were later diluted in many regions by male-biased migrations and expansions, including those associated with steppe pastoralists and later historical movements. The high frequencies and diversity of G in the Caucasus, coupled with repeated ancient occurrences in Anatolia and the Levant, suggest that the haplogroup maintained refugial strongholds in mountainous and highland regions even as surrounding lowlands underwent repeated demographic turnover.

Ongoing high-coverage Y sequencing continues to refine the internal topology of G, splitting previously broad categories into fine-scale branches that often show striking geographic or ethnic specificity. G thus provides a rich example of how a single Pleistocene-origin haplogroup can combine deep-time structure with dense Holocene substructure, tying together Near Eastern agricultural origins, European Neolithic dispersals and persistent highland refugia.