Population Genetics · Archaeogenetics · Historical Anthropology

DNA of Indian Castes
Master Atlas

North, South, East, West — Brahmins, Kshatriyas, OBCs, Dalits, Adivasis, and Northeast tribes — all explained through peer-reviewed genetics. Updated with the 2024 LASI-DAD study (2,762 whole genomes) and all major papers from 2009–2024. Written for everyone; referenced for accuracy.

⚠ Genetics describes ancestry, not hierarchy. Steppe, Iranian, AASI, or East Asian ancestry carries zero implications for intelligence, worth, or cultural achievement. This document applies population science to understand history — not to rank human beings. Every proportion reflects where ancestors happened to live thousands of years ago — nothing more.
~35%: Max Steppe ancestry documented, Rors of Haryana (NAR19, LASI24)
~46%: Total Caucasian (CHG) ancestry, Kashmiri Pandits (NAR19)
~90%: AASI in most isolated Andamanese-related groups (MON17)
2,762: Whole genomes in the 2024 LASI-DAD study, India's largest (LASI24 / KER24)
~3,000: Years of endogamy genetically separating caste groups (MOR13, BAS16)
Section 1 — Foundational Model
The Five Ancient Populations Behind All South Asians

Every person in South Asia is a mixture of ancient populations. Modern archaeogenetics identifies five distinct ancestral streams. Understanding all five is the only way to correctly interpret "Caucasian DNA," "Aryan ancestry," and indigenous lineages. The 2024 LASI-DAD study (2,762 genomes from 18 states) confirmed this three-to-five component structure is robust across the entire subcontinent. NAR19, LASI24

Aryan Component

Steppe Pastoralist

Yamnaya/Sintashta cultures from modern Ukraine/Kazakhstan. Arrived ~2000–1500 BCE. Brought Indo-Aryan languages and R1a-Z93. Made of ~50% CHG + ~50% EHG.

Iranian Farmer

Zagros / IVC Ancestor

Farmers from Zagros mountains of Iran, ~8,000–10,000 years ago. ~60% CHG ancestry. Built the IVC alongside AASI. The dominant non-indigenous component in all South Asians.

Caucasian Core

CHG — Caucasus HG

From ancient Caucasus (Georgia/Armenia), 13,000+ years old. The root "Caucasian" ancestry inside both Steppe and Iranian farmers. Related to modern Georgians and Armenians.

Most Indigenous

AASI

Ancient Ancestral South Indian — the original inhabitants. Related to Andamanese Islanders. 60,000+ years in South Asia. Highest in Dalits and tribal groups.

In Steppe People

EHG — East European HG

The other half of Steppe ancestry. Not directly present in South Asians — only arrives inside the Steppe package. Makes North Indian upper castes slightly "Eastern European-like."

Key formula: ANI (old model) ≈ 55% Iranian Farmer + 45% Steppe. ASI (old model) ≈ 75% AASI + 25% Iranian Farmer. Steppe people themselves = ~50% CHG + ~50% EHG. So "Caucasian DNA" flows into South Asians through two routes simultaneously: directly as Iranian Farmer, and embedded inside Steppe ancestry. NAR19, JON15
2024 update — LASI-DAD study: Using 2,762 whole-genome sequences, Kerdoncuff et al. (2024) confirmed that Iranian farmer-related ancestry varies between ~27–68% across India, AHG-related (AASI) varies between ~19–69%, and Steppe between ~0–45%. The AHG gradient is strongest along the North-South axis and is significantly associated with language family (Indo-European vs Dravidian) and caste group membership. This is the most comprehensive genomic study of Indian populations to date. LASI24
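The key formula above can be read as a conversion from the old two-way ANI/ASI model into Steppe, Iranian Farmer, and AASI shares. The following is a minimal sketch of that arithmetic, assuming the approximate mixing weights quoted in the formula; the function name and the 60% ANI example individual are hypothetical and not taken from any cited study.

```python
# Approximate mixing weights from the "key formula" above (not fitted estimates).
ANI_MIX = {"iranian_farmer": 0.55, "steppe": 0.45}
ASI_MIX = {"aasi": 0.75, "iranian_farmer": 0.25}


def old_model_to_streams(ani_fraction: float) -> dict:
    """Convert an ANI share (0..1) into Steppe / Iranian Farmer / AASI shares.

    The remainder (1 - ani_fraction) is treated as ASI.
    """
    if not 0.0 <= ani_fraction <= 1.0:
        raise ValueError("ani_fraction must be between 0 and 1")
    asi_fraction = 1.0 - ani_fraction
    return {
        "steppe": ani_fraction * ANI_MIX["steppe"],
        "iranian_farmer": (
            ani_fraction * ANI_MIX["iranian_farmer"]
            + asi_fraction * ASI_MIX["iranian_farmer"]
        ),
        "aasi": asi_fraction * ASI_MIX["aasi"],
    }


if __name__ == "__main__":
    # Hypothetical individual modelled as 60% ANI + 40% ASI.
    for stream, share in old_model_to_streams(0.60).items():
        print(f"{stream}: {share:.1%}")
```

Because Iranian Farmer ancestry sits inside both ANI and ASI, it is the only stream that receives a contribution from both sides of the old model, which is why it stays comparatively stable while Steppe and AASI swing in opposite directions.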

Key Definitions — Before You Read the Data

Varna — The four-tier ritual hierarchy in Sanskrit texts (Brahmin, Kshatriya, Vaishya, Shudra). A textual category, not a genetic one. Thousands of jatis claim a varna but the social reality was always more complex.
Jati — The actual unit of caste: a specific birth-community with its own endogamy rules, occupational traditions, and regional distribution. India has ~3,000–5,000 jatis. This is what genetics can partially track — not varna.
Endogamy — Marrying only within one's community. The dominant marriage rule for ~2,000+ years. It causes genetic drift, founder effects, and accumulation of rare variants within communities — making Indian castes genetically distinguishable even when geographically proximate.
Haplogroup — A Y-chromosome (paternal) or mtDNA (maternal) lineage. Tracks only one thread out of thousands of ancestors. A haplogroup is a signal, not a definition of a person's ancestry. Never conflate haplogroup with total ancestry.
Founder effect — When a small population becomes isolated (by geography or endogamy), random genetic variants become common by chance. Indian caste communities show some of the most extreme founder effects in the world. BAS16
Fst — Fixation index. A number between 0 and 1 measuring how genetically different two populations are. 0 = identical. 0.01 = minor. 0.05+ = substantial. 0.10+ = very high.
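To make the Fst definition concrete, here is a minimal sketch of Wright's Fst at a single two-allele site for two equally weighted populations, computed as (H_T - H_S) / H_T. Real studies average over hundreds of thousands of sites and use more careful estimators; the function names are illustrative, and the label bins simply reuse the cut-offs listed above (the "near identical" label for values below 0.01 is an assumption).

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's Fst for one biallelic site and two equally weighted populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must be between 0 and 1")
    # H_S: mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # H_T: expected heterozygosity if both populations were pooled.
    p_mean = (p1 + p2) / 2
    h_total = 2 * p_mean * (1 - p_mean)
    if h_total == 0:
        return 0.0  # site is fixed for the same allele in both populations
    return (h_total - h_within) / h_total


def describe(fst: float) -> str:
    """Map an Fst value onto the rough labels used in this document."""
    if fst >= 0.10:
        return "very high"
    if fst >= 0.05:
        return "substantial"
    if fst >= 0.01:
        return "minor"
    return "near identical"  # label below 0.01 is an assumption


if __name__ == "__main__":
    value = fst_biallelic(0.4, 0.6)
    print(round(value, 3), describe(value))
```

A frequency gap of 0.4 versus 0.6 at one site gives an Fst of 0.04, which shows how a modest shift in allele frequencies already produces values in the range discussed for caste comparisons.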
Section 2 — What "Caucasian DNA" Actually Means
Three Different Things People Conflate

When people ask "how much Caucasian DNA" Indian upper castes have, they are usually conflating three different things. Here they are, separated and explained plainly. JON15, LAZ16

1. "Pure" Caucasian = CHG (Caucasus Hunter-Gatherer). Direct lineage from ancient Caucasus populations who lived in modern Georgia and Armenia around 13,000 BCE. Related to modern Georgians, Armenians, and Iranians. Kashmiri Pandits carry the most in India (~43–46% total CHG when both pathways are counted). Plain English: if you ask "how much Caucasian?" this is the most honest answer.
2. "Aryan" Caucasian = Steppe ancestry containing CHG inside it. The Yamnaya Steppe people were ~50% CHG + ~50% EHG. So when North Indian Brahmins carry ~27% Steppe ancestry, that share splits into roughly 13–14 percentage points of CHG and 13–14 percentage points of EHG, which makes them partly "Eastern European" too. Plain English: Steppe ancestry ≠ pure Caucasian. It is 50% Caucasian + 50% Eastern European.
3. "Caucasian/Iranian" = Iranian Farmer ancestry. The Zagros mountain farmers (~8,000 BCE) were ~60% CHG themselves. So anyone with Iranian Farmer ancestry also has CHG inside it. ALL South Asians — including Dalits and Adivasis — carry some CHG this way. But upper castes carry far more. Plain English: Even tribal people have some Caucasian ancestry — through the farmer route, not the Aryan route.
What Dalits and tribes actually carry: Very little Steppe and less Iranian farmer ancestry, but they carry AASI lineage — genetically related to Andamanese Islanders and ancient South Asian hunter-gatherers who diverged from the rest of humanity ~60,000 years ago. This is the oldest human heritage in South Asia. Not inferior — genuinely ancient and unique. MON17
Section 3 — CHG / "Caucasian" Ancestry Calculated
Total Caucasian Ancestry Per Group

CHG flows through two routes: directly via Iranian Farmer ancestry (~60% CHG by composition), and embedded within Steppe ancestry (~50% CHG). Adding both gives the true total. JON15

Total CHG = (Steppe% × 0.50) + (Iranian Farmer% × 0.60)
Plain: multiply your Steppe number by 0.5, multiply your Iranian number by 0.6, add them — that's your Caucasian total.
Group | Source | Steppe × 0.50 | Iranian Farmer × 0.60 | Total CHG ancestry
Rors (Haryana) | NAR19 | 34% × 0.50 = 17.0% | 38% × 0.60 = 22.8% | ~39–41%
Kashmiri Pandit | NAR19 | 31% × 0.50 = 15.5% | 50% × 0.60 = 30.0% | ~45–46%
UP / Bihar Brahmin | NAR19 | 27% × 0.50 = 13.5% | 46% × 0.60 = 27.6% | ~40–42%
Jats (Punjab/Haryana) | NAR19 | 26% × 0.50 = 13.0% | 40% × 0.60 = 24.0% | ~36–38%
South Indian Brahmin | NAR19 | 16% × 0.50 = 8.0% | 43% × 0.60 = 25.8% | ~33–35%
Nair / Vellalar | REI09 | 10% × 0.50 = 5.0% | 37% × 0.60 = 22.2% | ~26–28%
North Indian SC/Dalit | BAS16 | 10% × 0.50 = 5.0% | 27% × 0.60 = 16.2% | ~20–22%
South Indian Dalit | BAS16 | 3% × 0.50 = 1.5% | 19% × 0.60 = 11.4% | ~12–14%
Adivasi / Tribal | MON17 | 2% × 0.50 = 1.0% | 11% × 0.60 = 6.6% | ~7–9%
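The Total CHG formula is simple enough to recompute directly. The sketch below applies it to the Steppe and Iranian Farmer point values listed for each group; the 0.50 and 0.60 weights are the approximate CHG shares stated in this section, and the function and variable names are illustrative.

```python
STEPPE_CHG_SHARE = 0.50   # approximate CHG share of Steppe ancestry (from the text)
IRANIAN_CHG_SHARE = 0.60  # approximate CHG share of Iranian Farmer ancestry (from the text)


def total_chg(steppe_pct: float, iranian_pct: float) -> float:
    """Total CHG = (Steppe% x 0.50) + (Iranian Farmer% x 0.60), in percent."""
    return steppe_pct * STEPPE_CHG_SHARE + iranian_pct * IRANIAN_CHG_SHARE


# (Steppe %, Iranian Farmer %) point values as listed for each group above.
GROUPS = {
    "Rors (Haryana)": (34, 38),
    "Kashmiri Pandit": (31, 50),
    "UP / Bihar Brahmin": (27, 46),
    "Jats (Punjab/Haryana)": (26, 40),
    "South Indian Brahmin": (16, 43),
    "Nair / Vellalar": (10, 37),
    "North Indian SC/Dalit": (10, 27),
    "South Indian Dalit": (3, 19),
    "Adivasi / Tribal": (2, 11),
}

if __name__ == "__main__":
    for name, (steppe, iranian) in GROUPS.items():
        print(f"{name}: {total_chg(steppe, iranian):.1f}%")
```

Recomputing the column this way is a quick check that each printed total matches its own inputs, and it makes clear that the Iranian Farmer term contributes more CHG than the Steppe term for every group listed.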
Section 4 — Migration Timeline
How Indian Caste Genetics Was Formed

Seven major events, spread across roughly 70,000 years, created the genetic landscape of modern India. Each left measurable signatures in living people. NAR19, MON17, HAA15

~70,000–60,000 BCE — Out of Africa
First South Asians (AASI)
Anatomically modern humans reach South Asia via the southern coastal route. These become the AASI (Ancient Ancestral South Indians). They populate the entire subcontinent and remain isolated for tens of thousands of years. AASI ancestry is highest today in Dalit and Adivasi communities, though every South Asian group carries some. MON17
~13,000–10,000 BCE — Caucasus
CHG — Caucasus Hunter-Gatherers Emerge
A genetically distinct population develops in the Caucasus Mountains (modern Georgia/Armenia). They become the root "Caucasian" ancestry. They later become ancestors of both Iranian farmers and Steppe pastoralists — both of which eventually reach India. JON15
~7,000–5,000 BCE — First Wave
Iranian Farmers Enter South Asia
Neolithic farmers from the Zagros mountains of Iran (with ~60% CHG ancestry) mix with the existing AASI population. They bring farming, cattle herding, and proto-Dravidian languages. This mixture creates the bulk of modern South Asian ancestry — it forms the genetic foundation of the Indus Valley Civilization. NAR19
~3,300–1,900 BCE — IVC
Indus Valley Civilization — Zero Steppe Ancestry
Ancient DNA from Rakhigarhi (the largest IVC site) shows zero steppe pastoralist ancestry. The IVC was built entirely by Iranian Farmer + AASI populations. This is the settled academic consensus, not a hypothesis. IVC predates the Aryan migration by 1,000+ years. SHI19, NAR19
~2,000–1,500 BCE — Second Wave
Indo-Aryan Migration from the Steppe
Sintashta/Andronovo-descended Steppe pastoralists (high R1a-Z93) enter South Asia through the northwest (modern Afghanistan/Pakistan). They bring Sanskrit, Vedic religion, and horse culture. They mix most heavily with already-mixed (Iranian+AASI) populations in the northwest. The further south they went, the less genetic impact they had. Kashmir and Punjab received the most; Tamil Nadu and Kerala the least. NAR19, HAA15
~1,000 BCE onwards — Endogamy Locks In
Caste System Freezes Ancestry Ratios
The varna/jati system solidifies. Endogamous marriage within caste groups becomes enforced. The 2024 LASI-DAD study confirms India experienced a major demographic shift towards endogamy, resulting in extensive homozygosity. Within ~50–70 generations, each caste group became genetically distinct. Basu et al. (2016) identified ~70 generations ago (~1,700 years ago, around 300 CE) as when rapid endogamy replaced open admixture across Indo-European-speaking upper castes. BAS16, LASI24
~4,000–5,000 years ago — East India
Austro-Asiatic Farmers Bring East Asian Ancestry
Related to modern Munda/Santali speakers, these farmers migrated into Bengal and Odisha from Southeast Asia, introducing rice cultivation. Their East Asian genetic signal spread through the region, including into higher-caste communities. This is why ALL Bengali groups, Brahmins included, show an East Asian component not present in UP or Rajasthan. NAR19, MON17
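The endogamy entry in the timeline says that caste groups became genetically distinct within roughly 50–70 generations. A minimal sketch of why community size matters: under an idealised Wright-Fisher model, expected heterozygosity decays as H_t = H_0 × (1 − 1/(2N_e))^t, so small closed groups lose diversity much faster than large ones. The effective sizes and the 25-year generation length below are hypothetical illustration values, not estimates from the cited studies.

```python
def expected_heterozygosity(h0: float, effective_size: float, generations: int) -> float:
    """Expected heterozygosity after drift in a closed population.

    Idealised Wright-Fisher result: H_t = H_0 * (1 - 1/(2*N_e)) ** t.
    """
    if effective_size <= 0:
        raise ValueError("effective_size must be positive")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return h0 * (1.0 - 1.0 / (2.0 * effective_size)) ** generations


def generations_to_years(generations: float, years_per_generation: float = 25.0) -> float:
    """Convert generations to years; 25 years per generation is an assumption."""
    return generations * years_per_generation


if __name__ == "__main__":
    print("70 generations is about", generations_to_years(70), "years")
    # Hypothetical effective sizes for a small, medium and large endogamous group.
    for n_e in (100, 1_000, 10_000):
        remaining = expected_heterozygosity(1.0, n_e, 70)
        print(f"N_e={n_e}: {remaining:.1%} of starting heterozygosity remains")
```

The contrast between the small and large hypothetical groups is the point of the sketch: the same 70 generations of closed marriage leave a large community almost unchanged while noticeably reducing diversity in a small one.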
Section 5 — North India
North Indian Caste Groups — Complete Breakdown

North India shows the steepest Steppe ancestry gradient in the subcontinent. The gradient runs: Haryana/Kashmir → Punjab → UP → Bengal/Odisha. Steppe ancestry is the most variable component; treat all ranges as indicative, not fixed. NAR19, MOR13, LASI24

Caste Group | Description | Steppe (Aryan) | Iranian Farmer | AASI (Indigenous) | Total CHG | Level | Key Y-DNA
Rors (Haryana) | Highest documented Steppe in South Asia | ~32–35% | ~36–40% | ~27–30% | ~39–41% | Exceptional | R1a-Z93
Kashmiri Pandit | Highest Steppe among Indian Brahmins | ~29–32% | ~48–52% | ~18–22% | ~45–46% | Very High | R1a-Z93
Sindhi Brahmin | Sindh-origin Brahmins, Pakistan | ~26–28% | ~48–52% | ~22–24% | ~42% | Very High | R1a / J2
Khatris | Punjab merchant-warrior; all 10 Sikh Gurus were Khatri | ~23–26% | ~40–44% | ~31–35% | ~40% | Very High | R1a / J2
Jats (Haryana/Punjab) | Highest Steppe among major agricultural castes | ~24–27% | ~38–42% | ~32–38% | ~37–39% | Very High | R1a / R2
Aroras | Punjab/Sindh merchants; genetically near-identical to Khatris | ~22–25% | ~41–45% | ~31–35% | ~38–40% | Very High | R1a / J2
Punjab / Saraswat Brahmin | Mohyals, GSBs, Himachal Brahmins | ~22–25% | ~41–45% | ~32–36% | ~38% | Very High | R1a / J2 / L
UP / Bihar Brahmin | Kanyakubja, Maithil, Saryuparin | ~24–28% | ~43–48% | ~25–30% | ~40–42% | Very High | R1a-Z93
Gujjars / Gurjars | NW pastoral community; ST in some states | ~19–22% | ~39–43% | ~36–42% | ~34–37% | High | R1a / J2 / Q
Rajputs (Rajasthan) | Sisodias, Rathores, Chauhans | ~18–22% | ~37–42% | ~36–44% | ~33–37% | High | R1a / R2 / Q
Kayasthas (UP) | Scribal-administrative caste; 12 Chitragupta-vanshi clans | ~16–20% | ~39–43% | ~38–44% | ~32–35% | High | R1a / J2 / R2
Bhumihars / Tyagis | Land-holding Brahmin-origin warrior castes | ~17–21% | ~39–43% | ~38–44% | ~32–36% | High | R1a / J2
Banias / Vaishyas | Agarwal, Maheshwari, Marwari, Oswal; extreme endogamy | ~14–18% | ~42–47% | ~37–44% | ~32–36% | High | J2 / R1a / L
Yadavs / Ahirs | Largest OBC; UP, Bihar, Haryana; cattle-herder origin | ~11–14% | ~35–40% | ~47–54% | ~27–30% | Moderate | R1a / R2 / H
Kurmis | Gangetic plain cultivators; Bihar, UP | ~10–14% | ~34–39% | ~48–56% | ~26–29% | Moderate | R1a / R2 / H
Meenas / Minas | Rajasthan's largest Scheduled Tribe; pre-Rajput origin | ~7–11% | ~31–36% | ~54–62% | ~22–25% | Low | H / R2 / R1a
North Indian SC / Dalit | Chamar, Dusadh, Valmiki, Pasi | ~8–12% | ~24–30% | ~58–68% | ~19–22% | Low | H dominant
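Because every row gives ranges rather than point values, a quick sanity check is to take the midpoint of each range and confirm the three components add up to roughly 100%. The sketch below does this for one row; the parsing helper, function names, and the 5-point tolerance are illustrative assumptions rather than part of any published method.

```python
import re

# Matches "32", "32-35" or "32–35" (hyphen or en dash) inside cells like "~32–35%".
_RANGE = re.compile(r"(\d+(?:\.\d+)?)(?:\s*[–-]\s*(\d+(?:\.\d+)?))?")


def parse_range(cell: str) -> tuple:
    """Return (low, high) from a table cell; a single value gives low == high."""
    match = _RANGE.search(cell)
    if match is None:
        raise ValueError(f"no number found in {cell!r}")
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) is not None else low
    return low, high


def midpoint(cell: str) -> float:
    low, high = parse_range(cell)
    return (low + high) / 2


def row_total(steppe: str, iranian: str, aasi: str) -> float:
    """Sum of the three range midpoints, in percent."""
    return midpoint(steppe) + midpoint(iranian) + midpoint(aasi)


def is_consistent(total: float, tolerance: float = 5.0) -> bool:
    # tolerance (percentage points) is an arbitrary illustrative threshold
    return abs(total - 100.0) <= tolerance


if __name__ == "__main__":
    total = row_total("~29–32%", "~48–52%", "~18–22%")  # Kashmiri Pandit row
    print(total, is_consistent(total))
```

Rows whose midpoints drift far from 100% are not necessarily wrong, since the ranges are indicative, but they are the first place to look for a transcription slip.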

North India — Community Deep Dives

Rors
Haryana / Western UP · Sarasvati plain · Agricultural community
Steppe ~34% · Iranian ~38% · AASI ~28%
Rors are a small agricultural community from Haryana with no particular political prominence — but they are genetically extraordinary. Multiple published studies, including analyses using the 1000 Genomes dataset, document Rors as having the highest Steppe ancestry of any documented South Asian population, in some analyses exceeding Kashmiri Pandits. Their location (Sarasvati plain, Haryana) is exactly where Steppe migrants would have first settled in large numbers. Long endogamy preserved this signal. In PCA plots, Rors cluster closer to Central Asian and Iranian populations than almost any other South Asian group. NAR19
R1a-Z93 very high · Genetic outlier of South Asia · Sarasvati plain origin
Kashmiri Pandits
Kashmir Valley · Highest Steppe among Brahmins · Extreme founder effects
Steppe ~31% · Iranian ~50% · AASI ~19%
Among the most ANI-shifted populations in India. Extreme founder effects and very high ROH (runs of homozygosity). Cluster between Central Asian populations and other North Indian groups in PCA — closer to Central Asia than any other South Asian community except Rors. Suffered severe population bottleneck after the 1990 exodus from Kashmir. R1a-Z93 frequencies reach ~72% in some studies — the highest documented in any major Indian community. High Iranian Farmer component (~50%) reflects deeper Iranian Neolithic input alongside Steppe. NAR19
R1a-Z93 ~72% in some studies · Extreme endogamy · Highest CHG in India ~45–46%
Jats
Punjab · Haryana · Western UP · Rajasthan · Sikh, Hindu, Muslim subgroups
Steppe ~26% · Iranian ~40% · AASI ~34%
One of the most genetically significant communities in North India. Haryana Jats consistently cluster among the most ANI-shifted populations in South Asia, with Steppe ancestry approaching Kashmiri Pandits in some analyses. Their origin is a convergence of multiple ancestry streams in the northwestern zone — genetic evidence best supports mixed indigenous + Central Asian ancestry rather than a single origin. High R1a-Z93 (~40–50% in some studies). Hundreds of gotras including Malik, Dahiya, Mann, Grewal, Dhillon, Sandhu. Clan-exogamous marriage maintained community boundaries. Ranjit Singh's Sikh Empire was the most prominent Jat political achievement. NAR19
R1a-Z93 40–50% · Pastoral-agricultural origin · Origins debated: indigenous vs Central Asian vs mixed
Rajputs
Rajasthan · UP · MP · Bihar · 36 Royal Clans
Steppe ~20% · Iranian ~40% · AASI ~40%
The Rajput identity emerged in the 7th–12th centuries CE — they are not a single ethnic group with ancient unity. Historical scholarship (Thapar, Kolff) shows diverse origins: solar/lunar dynasty claims, clear Central Asian ancestry in some clans (Gurjara, Huna, Scythian), and many originally tribal groups who underwent "Rajputization." This makes Rajput genetics unusually heterogeneous. Rajasthani Rajputs (Sisodias, Rathores) cluster toward ANI-heavy profiles; UP Rajputs show more AASI. Some clans (Chauhans, some Tomars) show Q haplogroup frequencies consistent with Scythian/Huna admixture ~5th–7th century CE. NAR19
Q haplogroup: possible Scythian/Huna signal · Most heterogeneous "caste" in North India · Rajputization absorbed tribal groups
Banias / Vaishyas
Agarwal · Maheshwari · Oswal · Marwari · Khandelwal · Extreme endogamy
Steppe ~16% · Iranian ~45% · AASI ~39%
Bania communities show some of the highest ROH (runs of homozygosity) values of any Indian group in Basu et al. (2016) — a direct measure of endogamy intensity. Each subcommunity (Agarwal, Oswal, Maheshwari) has its own genetic signature. Agarwals and Oswals can be statistically distinguished in genetic studies despite living in the same cities for centuries, due to near-complete absence of inter-subcommunity marriage. Their moderate Steppe ancestry is consistent with merchant (not warrior) social role — less direct exposure to the northwest Steppe entry points. Marwari families (Birla, Bajaj, Mittal) expanded across colonial India. BAS16
Highest ROH values in India · Subcommunities genetically distinguishable · J2 / R1a / L
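Several of the profiles above lean on ROH (runs of homozygosity) as a measure of endogamy. As a toy illustration of the idea only, the sketch below scans a list of genotype calls (0 and 2 are homozygous, 1 is heterozygous) and reports uninterrupted homozygous stretches of a minimum length. Production tools work on physical distances and tolerate genotyping errors; the function names, the coding scheme, and the `min_snps` threshold here are assumptions.

```python
def find_roh(genotypes: list, min_snps: int = 5) -> list:
    """Return (start, end_exclusive) index pairs for homozygous runs.

    genotypes: per-SNP alternate-allele counts, 0 or 2 = homozygous, 1 = heterozygous.
    min_snps: shortest run that is reported (toy threshold).
    """
    runs = []
    start = None
    for i, call in enumerate(genotypes):
        if call in (0, 2):
            if start is None:
                start = i  # a new homozygous stretch begins here
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes)))
    return runs


def roh_fraction(genotypes: list, min_snps: int = 5) -> float:
    """Share of SNPs that fall inside a reported run."""
    if not genotypes:
        return 0.0
    covered = sum(end - start for start, end in find_roh(genotypes, min_snps))
    return covered / len(genotypes)


if __name__ == "__main__":
    calls = [0, 0, 2, 2, 0, 1, 0, 2, 1, 2, 2, 2, 2, 2, 0]
    print(find_roh(calls), roh_fraction(calls))
```

In a strongly endogamous community both parents often share recent ancestors, so long identical stretches are inherited from each side and the fraction of the genome inside such runs rises.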
Section 6 — South India
South Indian Caste Groups — Complete Breakdown

South Indian groups have far less Steppe ancestry than North Indian equivalents, but Iranian Farmer ancestry is surprisingly similar across Brahmin groups (North and South). The biggest difference is AASI, which survived in much higher proportions through southern endogamy. The LASI-DAD 2024 study confirms AHG-related (AASI) ancestry is highest in South India and lowest in the North, in line with the north-south gradient. REI09, NAR19, LASI24

Plain English Explanation
Why do South Indian Brahmins have so much less Steppe than North Indian Brahmins? Because the Steppe migrants entered from the northwest (modern Pakistan/Afghanistan) and spread south and east over centuries. The further south they went, the less they mixed. Tamil Nadu and Kerala received the smallest Steppe input. But the Iranian Farmer component — which arrived much earlier (~7,000 BCE) and spread more evenly — is similar across all Brahmin groups regardless of region.
Caste Group | Description | Steppe (Aryan) | Iranian Farmer | AASI (Indigenous) | Total CHG | Level | Y-DNA
Tamil Brahmin (Iyer / Iyengar) | Highest Steppe in South India | ~14–18% | ~41–46% | ~38–44% | ~34% | Moderate | R1a / J2 / R2
Telugu Brahmin | Niyogi, Vaidiki, Smartha | ~13–17% | ~40–45% | ~40–46% | ~33% | Moderate | R1a / J2 / R2
Kannada Brahmin | Havyaka, Shivalli, Kota | ~12–16% | ~40–44% | ~41–46% | ~33% | Moderate | R1a / J2
Namboothiri (Kerala Brahmin) | Most isolated Brahmin group in India; extreme endogamy | ~12–15% | ~41–46% | ~40–46% | ~33% | Moderate | R1a / R2 / H
Nair | Kerala upper non-Brahmin; matrilineal (marumakkathayam) system | ~8–12% | ~34–40% | ~48–56% | ~27% | Low–Mod | H / J2 / R1a
Vellalar | Tamil Nadu dominant non-Brahmin; cultivator elite | ~8–11% | ~32–38% | ~51–58% | ~25% | Low–Mod | H / R2 / J2
Kamma | Andhra Pradesh dominant cultivating caste | ~7–10% | ~31–36% | ~54–62% | ~24% | Low | H / R1a / R2
Reddy | Andhra / Telangana dominant cultivating caste | ~7–10% | ~30–36% | ~54–62% | ~23% | Low | H / R2
Vokkaliga | Karnataka dominant farming caste | ~6–9% | ~30–35% | ~56–63% | ~22% | Low | H / R2
Lingayat | Karnataka religious-caste community; Veerashaiva | ~6–9% | ~29–35% | ~56–64% | ~21% | Low | H / J2 / R2
Vysya / Komati | Andhra trading caste; extreme documented founder effect | ~7–10% | ~33–38% | ~52–58% | ~25% | Low | R2 / H
Mudaliar / Nadar (OBC) | Tamil Nadu OBC; historically toddy tappers and traders | ~4–7% | ~23–29% | ~63–72% | ~18% | Very Low | H / L
Paraiyar / Madiga / Holeya | Scheduled Castes, South India | ~2–5% | ~16–22% | ~72–80% | ~13% | Trace | H dominant
Irula / Toda / Paniyar / Koraga | South Indian Scheduled Tribes; most indigenous mainland groups | ~1–3% | ~8–14% | ~82–90% | ~8% | Trace | H dominant
South Indian Dalits and Adivasis carry the highest share of India's oldest ancestry stream (AASI). Mondal et al. (2017) showed that South Indian tribal groups share ~45% genetic ancestry with Onge (Andamanese Islanders) when modelled. The near-absence of Steppe ancestry in Paraiyar/Madiga communities (~2–5%), despite living in the same villages as Brahmins for 2,000+ years, is direct genetic evidence of the endogamy barrier. The groups simply did not intermarry significantly. MON17, BAS16
Namboothiri marriage system: Only the eldest son could marry within the community; younger sons had sambandham (non-formal) unions with Nair women. This means the paternal (Y-DNA) Brahmin lineage remained "pure" while maternal ancestry blended with Nair/non-Brahmin women — explaining their isolated paternal haplogroup profile alongside more mixed mtDNA. REI09
Vysya / Komati founder effect, among the best-documented cases in genetics: Basu et al. (2016) identified that Vysya communities show unusually high rates of rare genetic variants that cause butyrylcholinesterase deficiency (relevant for anesthesia). This is a direct medical consequence of founder effect from a small founding population + centuries of strict endogamy. Not a "flaw", just the mathematical outcome of restricted marriage. BAS16
Section 7 — East India & Northeast
Bengal, Odisha, Jharkhand & Northeast Tribal Communities

Eastern India presents a distinct genetic picture: all communities (Brahmin, non-Brahmin, and SC) show a fourth ancestry component absent in northwestern India, namely East Asian-related ancestry from Austro-Asiatic farmers who arrived in Bengal/Odisha ~4,000–5,000 years ago. Northeast Indian communities are genetically distinct from the rest of India. NAR19, MON17

East India — Bengal, Odisha, Jharkhand

Group | Description | Steppe | Iranian | AASI | East Asian | Notes
Bengali Brahmin | Rarhi, Barendra; highest status | ~12–15% | ~34–38% | ~36–40% | ~10–14% | Cluster nearer non-Brahmin Bengalis than to Kashmiri Pandits
Bengali Kayastha | Bose, Ghosh, Sen, Datta | ~11–14% | ~31–35% | ~40–46% | ~10–14% | Similar to Brahmin; high East component
Bengali Non-Brahmin (Namasudra etc.) | Dominant OBC communities | ~7–10% | ~25–31% | ~46–54% | ~13–17% | Higher East Asian than upper castes
Santali / Munda | Bengal/Jharkhand/Odisha; Austro-Asiatic speakers | ~1–3% | ~13–18% | ~55–64% | ~20–28% | Austro-Asiatic migration signal strong
Oraon / Kurukh | Jharkhand; Dravidian-speaking tribal | ~2–4% | ~15–20% | ~62–70% | ~10–15% | Dravidian speakers in Jharkhand
Ho / Kharia | Jharkhand; Austro-Asiatic tribes | ~1–3% | ~12–16% | ~58–66% | ~20–26% | High East Asian ancestry
Odisha Brahmins | Pancha Sakha and others | ~14–17% | ~36–40% | ~44–48% | ~3–5% | Transitional geography between Gangetic and peninsular India
Why do Bengali Brahmins have East Asian ancestry? Austro-Asiatic farmers (ancestors of Santali, Ho, Mundari speakers) migrated into Bengal/Odisha ~4,000–5,000 years ago from Southeast Asia, introducing rice cultivation. As they mixed with the existing AASI+Iranian-farmer population, their East Asian genetic signal spread through the region, eventually reaching higher-caste communities through centuries of low-level admixture. This is why ALL Bengali groups, Brahmins included, show this component, unlike communities in UP or Rajasthan. NAR19, MON17

Northeast India — Tribal Communities

Plain English
Northeast Indian tribal communities are genetically closer to populations in Myanmar, Southern China, and Southeast Asia than they are to South Indian tribes. The Tibeto-Burman and Austro-Asiatic migrations that shaped northeast India came from the east, not the west — creating a completely different genetic landscape from the Iranian Farmer + AASI base that underlies mainland Indian communities.
Group | Description | Steppe | Iranian | AASI | East Asian | Language
Naga tribes (Nagaland) | - | ~0–2% | ~6–10% | ~24–34% | ~58–68% | Tibeto-Burman
Mizo / Lushai (Mizoram) | - | ~0–2% | ~5–9% | ~22–30% | ~62–70% | Tibeto-Burman
Khasi (Meghalaya) | Matrilineal society | ~0–2% | ~8–13% | ~32–40% | ~50–60% | Austro-Asiatic
Bodo (Assam) | Plains tribe of Assam | ~1–3% | ~9–13% | ~28–36% | ~50–60% | Tibeto-Burman
Assamese Brahmin / Caste Hindu | - | ~8–12% | ~26–32% | ~40–48% | ~16–22% | Indo-European
Manipuri (Meitei) | Dominant plains community of Manipur | ~2–5% | ~10–16% | ~30–40% | ~45–58% | Tibeto-Burman
Garo (Meghalaya) | Matrilineal; plains and hills of Meghalaya | ~0–2% | ~8–12% | ~30–38% | ~52–62% | Tibeto-Burman
Section 8 — West India
Gujarat & Maharashtra Communities

West India is a genetic transition zone between the high-Steppe northwest and the low-Steppe south. Gujarat communities preserve some of the most extreme founder effects documented globally (Parsis, certain Brahmin groups). Maharashtra shows a gradient from northwestern-type ancestry in upper castes to more AASI-heavy ancestry in tribal communities. BAS16, NAR19

Group | Description | Steppe | Iranian | AASI | Notable
Gujarati Brahmin | Audichya, Anavil, Nagar, Mewada | ~16–20% | ~40–46% | ~36–44% | High L haplogroup frequency; high Iranian Farmer
Parsi (Zoroastrian) | Iranian migrants ~700–900 CE; extreme founder effect | ~12–16% | ~52–60% | ~28–36% | Highest Iranian Farmer in India; globally extreme ROH
Gujarati Patel (Patidar) | Leuva and Kadava; dominant farming caste | ~12–16% | ~37–43% | ~42–50% | Strong founder effects in Leuva/Kadava subcommunities
Chitpavan Brahmin (Maharashtra) | Konkanastha; Peshwa lineage | ~16–20% | ~39–43% | ~39–44% | Distinct from Deshastha; unusually high J2
Deshastha Brahmin (Maharashtra) | Deccan Brahmin; inland Maharashtra | ~15–18% | ~38–42% | ~40–46% | Slightly more AASI than Chitpavan; different origin debate
Maratha (non-Brahmin) | Maharashtra's dominant warrior-farming caste | ~9–13% | ~33–38% | ~50–58% | Similar to Rajputs but with higher AASI
Bhil | Gujarat/Rajasthan/MP; one of India's largest tribes | ~3–7% | ~22–28% | ~66–74% | High AASI; mixed with local upper castes
Gondi | MP / Telangana / Odisha; Dravidian-speaking central Indian tribe | ~2–5% | ~18–25% | ~70–78% | Central Indian Dravidian tribal baseline
Parsis — the most documented extreme founder effect globally: Parsis (Zoroastrians who fled Iran ~700–900 CE) show the highest Iranian Farmer ancestry in India (~52–60%) because they arrived from Iran and maintained strict endogamy for ~1,200 years in Gujarat. They also show some of the highest ROH values of any human population globally, paired with a small founding population. Their genome is extensively studied as a model of extreme endogamy consequences. BAS16
Section 9 — Tribal India
Adivasi Communities Across India — Full Spectrum

India has ~700+ Scheduled Tribe communities comprising ~8.6% of the population. Genetically, they represent the most diverse sector of Indian ancestry, ranging from AASI-dominant communities (most mainland tribes) to East Asian-dominant populations (northeast tribes) to mixtures in between. "Adivasi" is a social-legal category covering many different migration histories. MON17, BAS16

Plain English — Most Important Point
India's tribal communities are NOT a single genetic group. Southern tribes (Irula, Toda, Chenchu) are closest to the first humans who settled South Asia ~60,000 years ago. Eastern tribes (Santali, Munda) carry additional East Asian ancestry from rice-farming migrants ~5,000 years ago. Northeastern tribes are predominantly East Asian in ancestry — genetically closer to Myanmar and southern China than to South India. This diversity shows that "Adivasi" covers many completely different migration histories.
Tribe / Group | Description | Region | Steppe | AASI | East Asian | Primary marker
Andamanese (Onge / Jarawa) | Most isolated AASI population on Earth | Andaman Islands | ~0% | ~95–100% | ~0% | Purest AASI; 30,000+ yr isolation
Chenchu | Hunter-gatherer; Nagarjunasagar hills; near-purest mainland AASI | Andhra / Telangana | ~1–3% | ~82–88% | ~0% | H haplogroup dominant; ancient AASI
Irula / Toda / Kota | Nilgiri hills tribes | Tamil Nadu | ~1–2% | ~83–90% | ~0% | H haplogroup dominant
Paniyar / Koraga | Kerala / Karnataka tribal groups | Kerala / Karnataka | ~1–3% | ~80–88% | ~0% | H haplogroup; very high AASI
Bhil | Largest tribal group in NW/Central India | Gujarat/Raj/MP | ~3–7% | ~64–72% | ~0% | Mixed with local upper castes
Gond / Koitor | Dravidian-speaking central Indian mega-tribe | MP/Telangana/Odisha | ~2–5% | ~68–76% | ~0% | Largest central Indian tribe
Santali | Austro-Asiatic speakers; major tribe | Bengal/Jharkhand/Odisha | ~1–3% | ~54–62% | ~22–28% | Significant East Asian component
Munda (Jharkhand) | Austro-Asiatic speaking tribe | Jharkhand | ~1–3% | ~54–62% | ~22–28% | Related to Santali genetically
Naga (various tribes) | Tibeto-Burman; NE India | Nagaland | ~0–1% | ~25–34% | ~58–68% | Mostly East Asian ancestry
Mizo / Zo | Tibeto-Burman; NE India | Mizoram | ~0–1% | ~22–30% | ~64–72% | Very high East Asian
Khasi (Meghalaya) | Austro-Asiatic; matrilineal society | Meghalaya | ~0–2% | ~32–40% | ~52–60% | Matrilineal; mixed East Asian/AASI
Section 10 — World Comparison
How Indian Groups Compare to World Populations

To understand what "45% CHG" actually means, it helps to compare against known world populations. North Indian Brahmins sit genetically between South Asians and West Asians/Caucasians: they are not European, but share real ancestry with Iranians, Armenians, and Georgians. JON15, LAZ16

🌿 Yamnaya (Ancient): Eurasian Steppe ~3000 BCE, the Aryan source · Steppe ~100% · CHG (total) ~50% · AASI 0%
🇬🇪 Georgians: Modern Caucasus, closest to pure CHG today · CHG (total) ~58–65% · Iranian Farmer ~25–30% · AASI 0%
🇦🇲 Armenians: Caucasus, high CHG and Iranian Farmer · CHG (total) ~50–58% · Steppe ~20–25% · AASI 0%
🇮🇷 Iranians (Persian): Modern Iran, heavy Iranian Farmer + CHG · Iranian Farmer ~60–65% · CHG (total) ~45–55% · Steppe ~12–18%
🇮🇳 Rors (Haryana): Highest Steppe of any South Asian group · Steppe ~32–35% · CHG (total) ~39–41% · AASI ~27–30%
🇮🇳 Kashmiri Pandit: Highest Steppe among Indian Brahmins · Steppe ~31% · CHG (total) ~45–46% · AASI ~19%
🇮🇳 UP Brahmin: Typical North Indian Brahmin benchmark · Steppe ~26% · CHG (total) ~40–42% · AASI ~27%
🇮🇳 South Indian Brahmin: Tamil Iyer / Telugu Brahmin benchmark · Steppe ~16% · CHG (total) ~33–35% · AASI ~41%
🇬🇧 British / N. Europeans: High Steppe, zero AASI · Steppe ~50–60% · CHG (total) ~30–40% · AASI 0%
🏝️ Andamanese (Onge): Purest AASI, South Asian indigenous baseline · AASI ~95–100% · Steppe ~0% · CHG (total) ~0%
Key takeaway: In total CHG share, Kashmiri Pandits (~45%) sit closer to Iranians and Armenians (~50–55%) than to South Indian Brahmins (~34%). Rors exceed Kashmiri Pandits in Steppe ancestry but have lower Iranian Farmer ancestry. No Indian group reaches European levels of Steppe ancestry (~50–60%). The AASI component is uniquely South Asian and absent in all West Eurasian and European populations.
Section 11 — Genetic Distance
How Different Are These Groups? (Fst)

Fst (fixation index) measures genetic distance. 0 = identical. 0.01 = minor. 0.05+ = substantial. 0.10+ = very high. These are approximate values synthesized from published ADMIXTURE and PCA data. BAS16

Population Pair | Approx. Fst | Plain English
Kashmiri Pandit vs UP Brahmin | ~0.005–0.010 | Very close; almost the same cluster on PCA
Rors vs Kashmiri Pandit | ~0.008–0.015 | Very close; both peak Steppe, slight Iranian Farmer difference
UP Brahmin vs South Indian Brahmin | ~0.015–0.025 | Noticeable; comparable to Germans vs Greeks, mostly Steppe difference
Kashmiri Pandit vs South Indian Brahmin | ~0.020–0.035 | Moderate; roughly like English vs Spaniards
South Indian Brahmin vs Nair/Vellalar | ~0.020–0.030 | Clear Brahmin vs non-Brahmin distinction; genetically visible
South Indian Brahmin vs South Indian Dalit | ~0.040–0.060 | High; larger than many European national populations
North Indian Brahmin vs South Indian Dalit | ~0.060–0.080 | Very high; approaching European vs East African distance
UP Brahmin vs Northwestern European | ~0.045–0.065 | Substantial; shared Steppe ancestry but diverged ~4,000 years ago
South Indian Dalit vs Andamanese | ~0.020–0.040 | Surprisingly close; both carry very high AASI ancestry
Santali vs South Indian tribal (Irula) | ~0.040–0.060 | Different migration histories despite both being "tribal"
Naga vs South Indian tribal | ~0.080–0.120 | Very high; completely different ancestry (East Asian vs AASI)
Bengali Brahmin vs UP Brahmin | ~0.015–0.025 | Notable; same varna, different geography = different genetics
The most important finding in Indian genetics: The largest genetic divide in India is NOT North vs South. It is upper caste vs Dalit/Adivasi within the same region. A South Indian Brahmin and a South Indian Dalit from the same village are genetically further apart (Fst ~0.05) than a UP Brahmin and a South Indian Brahmin (Fst ~0.02), and further apart than many European nationalities. 3,000 years of enforced endogamy created genetic gaps that geography alone cannot explain. BAS16, MOR13
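To make the Fst scale in this section concrete, here is a minimal Python sketch of one common textbook form of the statistic: Wright's Fst computed as a ratio of averages over biallelic loci, with the two populations weighted equally. It is not the estimator used in BAS16 or any other cited paper; published studies generally use sample-size-corrected estimators. The function names `wright_fst` and `describe_fst` are hypothetical, and the wording of the 0.01 to 0.05 band is an assumption layered on the thresholds quoted above.

```python
from typing import Sequence


def wright_fst(freqs_a: Sequence[float], freqs_b: Sequence[float]) -> float:
    """Ratio-of-averages Fst for two equally weighted populations.

    freqs_a[i] and freqs_b[i] are the frequencies of the same allele at
    biallelic locus i in population A and population B.
    """
    if len(freqs_a) != len(freqs_b) or len(freqs_a) == 0:
        raise ValueError("need two equal-length, non-empty frequency lists")

    between = 0.0  # sum over loci of (H_T - H_S)
    total = 0.0    # sum over loci of H_T
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_mean = (p_a + p_b) / 2
        # Expected heterozygosity if both populations were pooled.
        h_total = 2 * p_mean * (1 - p_mean)
        # Average expected heterozygosity within each population.
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        between += h_total - h_within
        total += h_total

    # If no locus varies at all, the populations are indistinguishable.
    return between / total if total > 0 else 0.0


def describe_fst(fst: float) -> str:
    """Map an Fst value onto the rough bands quoted in this section."""
    if fst >= 0.10:
        return "very high"
    if fst >= 0.05:
        return "substantial"
    if fst >= 0.01:
        return "minor to moderate"  # assumed label for the 0.01-0.05 band
    return "near-identical"
```

With made-up frequencies, `wright_fst([0.6], [0.4])` comes out at about 0.04, which `describe_fst` places in the "minor to moderate" band. Real comparisons average over hundreds of thousands of loci, so single-locus values like this are only illustrative.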
Section 12 — Y-DNA Haplogroup Reference
Haplogroup Guide for India

Haplogroups track only the direct father-to-son line — one thread out of thousands of ancestors. They are signals, not definitions. A person's haplogroup tells you about one specific lineage; their ADMIXTURE profile tells you about their whole ancestry. Never equate haplogroup with total ancestry or identity. THG
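The "one thread out of thousands" point is simple arithmetic, sketched below in Python. The function names are hypothetical. The count is of positions in the family tree, not distinct people: the same individual can fill several positions, so the number of distinct ancestors is usually smaller.

```python
def ancestor_slots(generations_back: int) -> int:
    """Number of positions in the family tree n generations back (2**n)."""
    if generations_back < 0:
        raise ValueError("generations_back must be non-negative")
    return 2 ** generations_back


def paternal_line_share(generations_back: int) -> float:
    """Share of tree positions in that generation covered by the Y-DNA line.

    Exactly one position per generation lies on the father-to-son line.
    """
    return 1 / ancestor_slots(generations_back)
```

Ten generations back there are `ancestor_slots(10)` = 1,024 positions, and the Y-chromosome haplogroup reports on exactly one of them.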

R1a-Z93
Pontic-Caspian Steppe — The definitive Steppe marker
Arrives with Sintashta/Andronovo Steppe pastoralists ~2000–1500 BCE. Highest in NW upper castes. Declines sharply south of Vindhyas and in lower castes everywhere. R1a-Z93 frequencies range from ~72% in some Kashmiri Pandit studies to ~20% in Bengali Brahmins — illustrating enormous variation even within the same varna.
Peak in: Kashmiri Pandit, UP Brahmin, Jats, Rors, Khatris, Rajputs
⚠ Does NOT mean "Aryan purity." Also found in Central Asia, Iran, Slavic Europe — predates any caste category by millennia.
J2a
Near East / Zagros region — Iranian Farmer signal
Originates from Neolithic Zagros/Caucasus farmers (~8,000 BCE). Common across the Middle East, Caucasus, Mediterranean. Present significantly in South Indian Brahmins — arrived before the Steppe migration, with early Iranian farmers. Widespread across the caste spectrum.
Significant in: South Indian Brahmins, Banias, Rajputs, Kayasthas, Chitpavan Brahmins
⚠ Does NOT mean "foreign." J2a arrived ~8,000 years ago, predating all recorded Indian history.
H1a
South Asia (deep indigenous) — oldest surviving lineage
Native to South Asia, with roots going back tens of thousands of years. Among the oldest surviving Y-chromosome lineages in the subcontinent. Dominant in Dalits and many tribal groups across India. Marks a deep indigenous paternal line associated with AASI ancestry. Its frequency rises as caste rank falls, tracking the proportion of AASI ancestry.
Dominant in: Dalits (Paraiyar, Madiga, Chamar), many Adivasi tribes, southern OBCs
⚠ Does NOT indicate "lower" origin. H1a carriers hold one of the oldest continuous paternal lineages in South Asia.
R2
South Asia / possibly Central Asia
Found almost exclusively in South Asia and nearby regions. Likely arrived with early Iranian farmer migration or a related wave. Not strongly associated with either Steppe or AASI specifically — widespread from Brahmins to OBCs. Origins are more complex and less studied than R1a.
Widespread in: Jats, Rajputs, Yadavs, Kurmis, Telugu/Kannada Brahmins
⚠ Over-interpretation is common in popular discourse — avoid strong conclusions.
L
South Asia / possibly Iranian — pre-Steppe wave
South Asian haplogroup with likely Iranian farmer roots. Common in Gujarat, Sindh, Rajasthan. Also present in South India. Part of the pre-Steppe farming migration wave into South Asia. High in Gujarati Brahmins, Sindhis, and Rajasthani Banias. Exact origin still actively debated — possibly IVC-associated.
Spread across: Gujarati Brahmins, Sindhi, Rajasthani Banias, South Indian OBCs
Q
Central Asia / Americas — minor in India
Minor in most Indian communities. Found in some Rajput clans and Gujjars. Possibly a signal of Scythian/Huna migration (~5th–6th century CE). Small Q frequencies should not be over-interpreted: the haplogroup occurs broadly across Central Asian pastoralist populations and is not specific to any historical group.
Found in: Some Rajput clans (Chauhan, some Tomars), minor in Gujjars
⚠ Not confirmed as a definitive Scythian marker. Needs cautious interpretation.
O
East/Southeast Asia — Bengal and NE India
Present in Bengal and Northeast India. Reflects Austro-Asiatic (Munda, Santali) and Tibeto-Burman ancestral input into eastern India. Found even in Bengali Brahmins — evidence of the East Asian admixture that penetrated all Bengali communities ~4,000–5,000 years ago.
Found in: Bengali Brahmins, Bengali Kayasthas, all NE Indian communities
mtDNA M
Maternal — near-universal in South Asia
Almost all South Asians of every caste and tribe share M haplogroup maternal lineages. This pattern is consistent with a male-driven Indo-Aryan migration in which incoming men married local women. The female genetic pool barely changed across caste boundaries. Among uniparental markers, Y-DNA (paternal) shows the caste divide far more sharply than mtDNA, which does NOT distinguish caste.
Universal across: ALL South Asian castes and tribal groups
⚠ The caste system primarily enforced patrilineal identity, which is why the divide shows up in Y-DNA rather than mtDNA.
Section 13 — Visual Charts
Ancestry Visualization

Chart A — Steppe "Aryan" Ancestry Across All Groups (High to Low)

Legend: Steppe · Iranian Farmer · AASI
Groups ordered by Steppe (Aryan) ancestry, highest to lowest. North/NW groups dominate the top half. Source: Narasimhan et al. 2019 (Science); Reich et al. 2009 (Nature); Basu et al. 2016 (PNAS); Kerdoncuff et al. 2024 (bioRxiv). All figures are approximate ranges.

Chart B — Total CHG (Caucasian) Ancestry — Indian Groups vs World

CHG = (Steppe × 0.50) + (Iranian Farmer × 0.60). World populations shown for comparison context.
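The Chart B formula can be written out as a short Python sketch. The 0.50 and 0.60 weights are the ones stated above; the function name `total_chg` and the input percentages in the usage note are hypothetical and are not taken from any row of this atlas.

```python
# CHG share of each source population, as stated in the Chart B formula.
STEPPE_CHG_FRACTION = 0.50
IRANIAN_FARMER_CHG_FRACTION = 0.60


def total_chg(steppe_pct: float, iranian_farmer_pct: float) -> float:
    """Total CHG percentage implied by Steppe and Iranian Farmer percentages."""
    if steppe_pct < 0 or iranian_farmer_pct < 0:
        raise ValueError("ancestry percentages must be non-negative")
    if steppe_pct + iranian_farmer_pct > 100:
        raise ValueError("Steppe + Iranian Farmer cannot exceed 100%")
    return (steppe_pct * STEPPE_CHG_FRACTION
            + iranian_farmer_pct * IRANIAN_FARMER_CHG_FRACTION)
```

For a hypothetical group with 30% Steppe and 50% Iranian Farmer ancestry, `total_chg(30, 50)` gives 15 + 30 = 45% total CHG. Because both weights are themselves approximate, the result inherits their uncertainty.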
Section 14 — Critical Clarifications
Most Dangerous Misconceptions

These claims appear frequently in online discourse. Here is what the actual peer-reviewed science says.

Upper castes are more intelligent due to Steppe/Aryan ancestry
No peer-reviewed study establishes any link between ANI/Steppe ancestry and cognitive traits. Intelligence is polygenic, environmentally mediated, and not predicted by ancestry components. Socioeconomic privilege, access to education, and literary traditions explain occupational outcomes — not genetics. BAS16
R1a = Aryan. Higher R1a = more Aryan/superior
"Aryan" as a racial category is a 19th-century colonial construct. R1a-Z93 is a Y-chromosome lineage occurring in Jats, Rors, Rajputs, AND Brahmins — communities with very different social histories. Indo-Aryan = a language family, not a genetic race. R1a-Z93 entered South Asia ~3,500 years ago and spread across many communities regardless of current caste status. NAR19
Brahmins are a unified genetic group
Kashmiri Pandits and Tamil Iyers share a varna but have greater genetic distance than many European national populations. Bengali Brahmins cluster with non-Brahmin Bengalis, not Kashmiri Pandits. Geography predicts genetics. Varna does not. The same varna across distant states is a textual category, not a genetic one. BAS16, NAR19
The IVC = Aryan / Vedic civilization
Ancient DNA from Rakhigarhi (the largest IVC site) shows zero steppe ancestry. IVC predates the steppe migration by 1,000+ years and was built by Iranian Farmer + AASI populations. This is the settled academic consensus. Both IVC and Indo-Aryan heritage are parts of Indian civilizational history; neither cancels the other. SHI19, NAR19
Lower AASI = more civilized / more evolved
AASI represents the oldest human presence in South Asia — ~60,000 years of continuous habitation before any farmer or Steppe migrant arrived. By any definition of "original," AASI is the foundation. All modern humans have been evolving for the same total time since Africa. "Primitive" has no meaning in genetics. MON17
Caste genetics proves caste hierarchy is natural or justified
Genetic differentiation between castes is the product of roughly 2,000–3,000 years of socially enforced endogamy, a human cultural choice rather than a biological fact. Remove endogamy and the clusters dissolve across generations. Genetic structure proves the enforcement of social rules was effective. It says nothing about whether those rules were just or natural. MOR13, BAS16
Tribal people are "genetically primitive" or inferior
Tribal (AASI-dominant) lineages are among the oldest in the world. South Indian tribal groups carry some of the oldest continuous South Asian lineages. Andamanese people are among the most genetically isolated human populations on Earth. If any claim of "purity" meant anything at all, it would apply to them. MON17
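The endogamy timescales quoted in this atlas appear both in years and in generations (~70 generations in LASI24). Converting between the two depends on an assumed generation length, which the minimal Python sketch below makes explicit. The function name and the 28-year default are assumptions for illustration, not values taken from the cited papers.

```python
def generations_to_years(generations: float,
                         years_per_generation: float = 28.0) -> float:
    """Convert a count of generations to years.

    The default of 28 years per generation is an assumed value; estimates
    shift in direct proportion to whatever generation length is chosen.
    """
    if generations < 0 or years_per_generation <= 0:
        raise ValueError("generations must be >= 0 and generation length > 0")
    return generations * years_per_generation
```

Under these assumptions, 70 generations corresponds to 1,960 years at 28 years per generation, and to between 1,750 and 2,100 years if the generation length is varied from 25 to 30 years.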
Section 15 — References
Key Research Papers & Sources

All data in this document is sourced from peer-reviewed or pre-print academic studies. Percentage values are approximate midpoints synthesized from ranges across multiple published studies.

REI09
Reconstructing Indian Population History
Reich, Thangaraj, Patterson, Price, Singh · Nature, 2009
Introduced the ANI/ASI two-component model. First rigorous proof that all South Asians are a mixture. Foundational paper for all subsequent work.
DOI: 10.1038/nature08365
NAR19
Formation of Human Populations in South and Central Asia
Narasimhan, Patterson, Moorjani et al. · Science, 2019
The most important study. Ancient DNA from 523 individuals. Identified the three-component model, confirmed Steppe migration ~2000 BCE, proved zero Steppe in IVC populations.
DOI: 10.1126/science.aat7487
SHI19
An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers
Shinde, Narasimhan, Rohland, Mallick et al. · Cell, 2019
Ancient DNA from Rakhigarhi (largest IVC site). Confirmed zero steppe ancestry in IVC. Definitively settled the debate about IVC genetic composition.
DOI: 10.1016/j.cell.2019.08.048
MOR13
Genetic Evidence for Recent Population Mixture in India
Moorjani, Thangaraj, Patterson, Reich et al. · AJHG, 2013
Dated ANI-ASI admixture to 1,900–4,200 years ago using linkage disequilibrium decay. Correlated the Vedic/Iron Age period with genetic mixture events.
DOI: 10.1016/j.ajhg.2013.07.006
BAS16
Genomic Reconstruction of the History of Extant Populations of India Reveals Five Distinct Ancestral Components and a Complex Structure
Basu, Sarkar-Roy, Majumder · PNAS, 2016
Quantified extreme genetic differentiation between Indian castes. First to show ~3,000 years of endogamy created medically significant disease risks. Documented Vysya founder effect.
DOI: 10.1073/pnas.1513117113
MON17
Genomic Analysis of Andamanese Peoples
Mondal, Bergström, Xue et al. · Nature Genetics, 2017
Characterized AASI lineage using Andamanese genomes. Showed South Indian tribal groups share ~45% ancestry with Onge (Andamanese Islanders). Essential for understanding indigenous South Asian ancestry.
DOI: 10.1038/ng.3835
HAA15
Massive Migration from the Steppe Was a Source for Indo-European Languages in Europe
Haak, Lazaridis, Patterson, Reich et al. · Nature, 2015
Established Yamnaya as the Steppe source of Indo-European expansion using 69 ancient Europeans. Same Steppe ancestry reached India via the eastern branch ~500 years later.
DOI: 10.1038/nature14317
JON15
Upper Palaeolithic Genomes Reveal Deep Roots of Modern Eurasians
Jones, Gonzalez-Fortes, Connell et al. · Nature Communications, 2015
Characterized the CHG (Caucasus Hunter-Gatherer) population from ancient Caucasus samples: the ancestral "Caucasian" population feeding into both Iranian farmers and Steppe pastoralists.
DOI: 10.1038/ncomms9912
LAZ16
Genomic Insights into the Origin of Farming in the Ancient Near East
Lazaridis, Patterson, Mittnik et al. · Nature, 2016
Refined Iranian farmer and CHG components using ancient Anatolian and Zagros samples. Directly relevant to understanding non-Steppe Caucasian ancestry in South Asians.
DOI: 10.1038/nature19310
LASI24
50,000 Years of Evolutionary History of India — 2,762 Whole Genome Sequences
Kerdoncuff, Skov, Patterson, Zhao, Moorjani et al. · bioRxiv, 2024
Most comprehensive genomic study of India to date. 2,762 high-coverage whole genomes from 18 states. Confirmed the three-component model; showed that the gradient of AHG-related (AASI) ancestry is associated with geography, language family (IE vs Dravidian), and caste membership. Confirmed a major shift to endogamy ~70 generations ago.
DOI: 10.1101/2024.02.15.580575
THG
Y-DNA Haplogroup Surveys Across Indian Castes and Tribes
Thangaraj, Chaubey, Kivisild et al. · Various, 2006–2019
Multiple India-specific Y-chromosome studies providing haplogroup frequency data across caste and tribal communities. Essential for haplogroup reference data used in this atlas.
IGVC
Genetic Landscape of the People of India
Indian Genome Variation Consortium · Various
Large-scale sequencing of ~55 Indian populations across all caste and tribal groups. Primary genomic database for South Asian population genetics research.
Data limitations and honesty: Ancient DNA studies for South Asia are still relatively sparse compared to Europe, so many estimates rely on modern population genetics extrapolated to ancient ancestry. As more ancient genomes are sequenced from the subcontinent (ongoing at several institutions), specific figures will be updated. The broad framework (three-component model, northwest-to-southeast Steppe gradient, endogamy creating genetic structure) is robust across all major studies. Treat specific percentage estimates as illustrative ranges, not precise measurements. NAR19