Unlocking Caste Enumeration Through Linguistic and Cultural Insights

Professor G.N. Devy highlights the importance of linguistic markers in accurately enumerating castes during the upcoming Census.

Gopi

February 9, 2026

Census 2026–27: Scholars Push for Open-Field Caste Data

1. Context: Why Linguistic and Cultural Markers Matter for Caste Enumeration

The upcoming 2026–27 Census will include caste enumeration, but the Union government has not yet announced the methodology. This uncertainty has triggered debate among scholars, activists, and community leaders on how to ensure accurate and inclusive caste data. Professor G.N. Devy, a noted linguist and cultural activist, argues that linguistic and cultural markers—already proven effective in language surveys—can be used to address duplication, variation, and inconsistencies in caste naming.

Caste names often vary across regions because of differences in spelling, dialect, or local history. Relying solely on self-reported names, as the 2011 Socio-Economic and Caste Census (SECC) did, can produce enormous lists full of duplicates; that exercise returned over 46 lakh caste names. Without a systematic method for analysing these names, enumeration risks becoming unmanageable and unreliable, which in turn affects welfare targeting and social justice policy design.

Language-based methodologies offer a tested model. Scrutiny of the 2011 Census language returns consolidated around 19,000 reported mother tongues into 1,369, and the Peoples’ Linguistic Survey of India combined community participation with scholarly verification. These techniques rely on markers such as ancestry, kinship patterns, lifestyle, and linguistic affinity to classify communities more accurately. Applying them to caste enumeration could help produce a comprehensive and credible caste list.

If linguistic-cultural scrutiny is not integrated into caste enumeration, the Census may generate inflated or fragmented caste data, weakening evidence-based policymaking and complicating welfare delivery.

"This model has been tried and tested for languages." — G.N. Devy

2. Two Competing Models: Open Field vs. Pre-Listed Castes

Debate currently centres on two major enumeration approaches. The first is an open text field, allowing citizens to self-report their caste. This mirrors the 2011 SECC, which produced an unwieldy dataset requiring deep post-survey rationalisation. The second is a pre-compiled caste list, as used in the Bihar Caste Survey, which minimises variation but risks excluding smaller or regionally distinct groups.

Professor Devy strongly favours the open field method, arguing that post-Census linguistic-anthropological scrutiny can refine the data effectively. However, this requires the Census office to allow academic scrutiny and involve institutions like the Anthropological Survey of India (AnSI).

Open field responses capture the diversity and fluidity of social identities but can overwhelm administrative systems if left unprocessed. Pre-listed responses offer administrative ease but risk reinforcing outdated or politicised categories.
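
The trade-off between the two models can be illustrated with a minimal Python sketch. It assumes a tiny, invented pre-compiled list and a handful of invented responses; neither reflects any actual survey list, and the function names are hypothetical. It only shows how a closed list folds unlisted names into a residual bucket, while an open field keeps every response for later scrutiny.

```python
from collections import Counter

# Hypothetical pre-compiled list, invented for this illustration only.
PRE_LISTED = {"Sansi", "Kanjar"}


def record_pre_listed(response: str) -> str:
    """Accept only names on the list; everything else becomes 'Other'."""
    name = response.strip()
    return name if name in PRE_LISTED else "Other"


def record_open_field(response: str) -> str:
    """Keep the respondent's answer as given, apart from outer whitespace."""
    return response.strip()


# Invented responses; "Sanshi" stands in for a spelling variant.
responses = ["Sansi", "Kanjar", "Chhara", "Kanjar Bhat", "Sanshi"]

pre_listed_counts = Counter(record_pre_listed(r) for r in responses)
open_field_counts = Counter(record_open_field(r) for r in responses)

print("Pre-listed:", dict(pre_listed_counts))
print("Open field:", dict(open_field_counts))
```

In this sketch the closed list loses three of the five reported names to "Other", while the open field retains all five, including the spelling variant that would need to be reconciled after the survey.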

If methodological choices prioritise administrative convenience over sociological accuracy, marginalised or lesser-known communities may be misclassified or left out, affecting their representation and access to welfare.

Key Challenges:

46+ lakh caste names returned in 2011 SECC due to spelling and nomenclature variations.
Risk of over-inclusion (duplicate names) and under-inclusion (missing communities).
Need for post-survey scrutiny involving experts and institutions.
Lack of clarity on 2026–27 Census methodology.

3. Post-Census Scrutiny: A Critical Step for Data Reliability

Professor Devy emphasises a multi-layered post-enumeration scrutiny process. The model used for language enumeration is instructive: the 2011 Census recorded 19,000 mother tongues, but through layered analysis—checking for duplication, spelling variations, grammar, linguistic lineage—this was narrowed to 1,369 recognised mother tongues.
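
The first layer of such scrutiny, merging duplicates and spelling variants, can be sketched in Python. This is an illustration under stated assumptions, not the Census office's actual procedure: it assumes names are romanised, uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and picks an arbitrary threshold of 0.85. The variant spellings in the sample are invented.

```python
import re
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, drop anything but a-z and spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z\s]", "", name.casefold())
    return " ".join(cleaned.split())


def cluster_spellings(names, threshold=0.85):
    """Greedily attach each name to the first cluster whose representative
    spelling is at least `threshold` similar; otherwise start a new cluster."""
    clusters = []  # list of (representative, [raw names])
    for raw in names:
        key = normalise(raw)
        for representative, members in clusters:
            if SequenceMatcher(None, key, representative).ratio() >= threshold:
                members.append(raw)
                break
        else:
            clusters.append((key, [raw]))
    return clusters


# Invented sample returns with hypothetical spelling variants.
sample_returns = ["Sansi", "SANSI ", "Sanshi", "Chhara", "Chara",
                  "Kanjar", "Kanjar Bhat"]

for representative, members in cluster_spellings(sample_returns):
    print(representative, members)
```

String similarity alone keeps "Kanjar" and "Kanjar Bhat" apart and cannot link names that look nothing alike, which is why the later layers depend on linguistic and cultural markers rather than spelling.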

The same logic can apply to caste names. Communities often share linguistic or cultural traits even when their regional names differ. For example, Sansi (Punjab), Kanjar (Rajasthan), Chhara (Gujarat), and Kanjar Bhat (Maharashtra) are regionally distinct names for essentially one community that shares a common language, Bhaktu.

Such consolidation requires collaboration with institutions like AnSI and reference to large ethnographic works such as People of India, which documents community origins, kinship patterns, and socio-cultural traits.

Without structured post-survey analysis, raw caste data risks becoming unusable for policymaking, limiting the state’s ability to design inclusive welfare and representation frameworks.

Required Institutional Measures:

Involvement of AnSI, linguistic experts, and sociologists.
Access to raw Census data for scholarly scrutiny.
Use of reference ethnographic works (e.g., People of India).
Classification through markers: language, ancestry, kinship, lifestyle.
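
Marker-based consolidation can be sketched along the same lines, using the Sansi example from this section. The sketch below is a simplified assumption: it groups reported names by a single marker (language), the record layout and function name are hypothetical, and the last row is a placeholder added only to show a group that stays separate.

```python
from collections import defaultdict

# Regional names and shared language as given in the text; the final row
# is a made-up placeholder, not a real community.
returns = [
    {"name": "Sansi", "state": "Punjab", "language": "Bhaktu"},
    {"name": "Kanjar", "state": "Rajasthan", "language": "Bhaktu"},
    {"name": "Chhara", "state": "Gujarat", "language": "Bhaktu"},
    {"name": "Kanjar Bhat", "state": "Maharashtra", "language": "Bhaktu"},
    {"name": "Placeholder Group", "state": "Placeholder State",
     "language": "Placeholder Language"},
]


def candidate_groups(records, markers=("language",)):
    """Group reported names whose values agree on every chosen marker."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[m] for m in markers)
        groups[key].append(f'{record["name"]} ({record["state"]})')
    return dict(groups)


for key, names in candidate_groups(returns).items():
    print(key, "->", names)
```

The output is a set of candidate groupings for expert review, not a final classification. Further markers such as ancestry or kinship could be added to `markers` once comparable data exist for each community.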

4. The Need to Explicitly Count Denotified, Nomadic & Semi-Nomadic Tribes (DNTs)

India’s denotified, nomadic, and semi-nomadic tribes (DNT, NT, and SNT), a population of over 10 crore, remain inadequately enumerated and poorly understood. Historically stigmatised under the Criminal Tribes Act, 1871, these communities continue to be excluded from welfare schemes because precise, disaggregated data on them is lacking.

Professor Devy, who co-founded the DNT-Rights Action Group and chaired the 2006 Technical Advisory Group for DNTs, stresses that the Census must explicitly announce the intention to count DNT communities. Generic caste or tribe categories fail to capture their distinct mobility patterns, socio-economic vulnerabilities, and intra-group variations.

If the Census does not proactively include DNT categories, these communities risk further marginalisation, creating a governance challenge larger than the administrative difficulty of compiling an accurate caste list.

Failing to enumerate DNTs explicitly perpetuates their invisibility, undermines social justice frameworks, and deepens historical exclusion.

Consequences of Non-Enumeration:

Continued absence in targeted welfare schemes.
Policy blind spots for a population of 10+ crore.
Reinforcement of historical stigma and socio-economic deprivation.

5. Way Forward: Towards an Inclusive, Accurate and Evidence-Based Enumeration Framework

The credibility of the 2026–27 Census depends on a clear methodology, expert involvement, and data that remain open to scholarly scrutiny. Incorporating linguistic and cultural markers can help reconcile the diversity of caste names while ensuring inclusivity. The Census must also proactively enumerate DNT communities and engage institutions such as the AnSI for classification.

Combining open field responses with rigorous post-survey analysis offers both democratic expression and scientific accuracy. Transparency in methodology can build trust among communities and ensure policy relevance.

Conclusion

India’s caste enumeration challenge lies not in the diversity of names but in the lack of structured analytical frameworks. Linguistic and cultural markers, proven effective in language surveys, offer a robust pathway to reliable caste data. An inclusive, expert-driven, transparent approach—especially one that explicitly counts DNT communities—will strengthen evidence-based policymaking and support long-term social justice and governance outcomes.

Quick Q&A

Everything you need to know

Conceptual premise: Professor G.N. Devy argues that caste enumeration should move beyond a purely nominal listing of caste names and instead rely on linguistic, cultural, and anthropological markers to arrive at a scientifically robust classification. According to him, even if individuals report different caste names based on local usage, spelling variations, or regional identities, deeper post-Census analysis can reveal underlying commonalities. These include shared language, ancestry, lifestyle, kinship systems, and marriage patterns, which together provide a more accurate picture of social groupings.

Learning from linguistic surveys: Devy draws a direct analogy with language enumeration. In the 2011 Census, around 19,000 mother tongues were reported, many of them duplicates or variants of one another. Multiple layers of scrutiny (removing spelling errors, clubbing dialects, and verifying grammatical structures) refined this number to 1,369 mother tongues. He proposes that caste data be treated similarly: an inclusive collection phase followed by rigorous academic consolidation.

Administrative and social relevance: This approach reconciles two competing needs—inclusivity and usability. It avoids excluding communities at the data collection stage while ensuring that the final caste list is coherent and analytically meaningful. For policymakers, such an approach would generate reliable data for welfare planning, affirmative action, and social justice interventions, making caste enumeration a tool of governance rather than political contestation.

Rationale for openness: Professor Devy supports the open-field method—where respondents self-report their caste—because it best reflects India’s complex and evolving social reality. A predefined list, as used in Bihar’s caste survey, risks excluding marginal, nomadic, or less-documented communities whose identities may not fit neatly into official categories. In contrast, an open field allows every individual and group to articulate their identity without prior constraints.

Addressing the SECC concern: Critics of the open-field method often cite the 2011 SECC, which produced over 46 lakh caste names, as evidence of administrative chaos. Devy counters this by arguing that the problem lay not in data collection but in the absence of structured post-Census analysis. With proper scholarly scrutiny—similar to linguistic classification—this apparent disorder can be systematically resolved.

Implications for democratic governance: From a UPSC perspective, the open-field approach aligns with constitutional values of dignity and recognition. It treats caste enumeration as a participatory exercise rather than a bureaucratic imposition. When combined with expert validation by institutions like the Anthropological Survey of India (AnSI), it ensures that inclusivity does not come at the cost of accuracy.

Step-by-step mechanism: Post-Census scrutiny involves layering raw data with academic analysis. The first step is to map reported caste names alongside linguistic data, especially mother tongues. Subsequent layers examine shared ancestry narratives, occupational patterns, kinship systems, and marriage networks. Together, these markers help identify whether different names refer to the same social group.

Illustrative example: Professor Devy cites the Sansi community as a case study. Known as Sansi in Punjab, Kanjar in Rajasthan, Chhara in Gujarat, and Kanjar Bhat in Maharashtra, these groups appear distinct in name. However, they share a common language—Bhaktu—and similar cultural practices. Linguistic and anthropological evidence thus reveals them to be a single community despite multiple labels.

Institutional role: Bodies such as AnSI can certify such classifications, drawing on projects like ‘People of India’. This ensures that consolidation is evidence-based and transparent. Practically, this method transforms raw Census data into a reliable social database, avoiding both over-fragmentation and arbitrary clubbing.

Strengths of openness: Keeping Census data open to scholars enhances transparency, credibility, and methodological rigour. Academic scrutiny helps correct errors, explain variations, and provide sociological depth to raw numbers. It also builds public trust by demonstrating that caste enumeration is guided by evidence rather than political convenience.

Potential challenges: However, open data raises concerns about privacy, data misuse, and politicisation. Without clear safeguards, sensitive caste information could be selectively interpreted or misrepresented. There is also the administrative challenge of coordinating between government agencies and independent scholars.

Balanced assessment: For UPSC analysis, the key lies in institutional design. With anonymisation protocols, clear mandates, and involvement of credible institutions like AnSI, openness can strengthen governance capacity. In a diverse society like India, evidence-based transparency is preferable to closed, opaque decision-making.

Historical context: DNT communities were criminalised under the colonial-era Criminal Tribes Act, 1871, and continue to face stigma and exclusion. Their nomadic lifestyles and lack of formal documentation have resulted in persistent undercounting in official data, rendering them invisible in policymaking.

Cause-and-effect relationship: The absence of reliable data directly affects access to welfare schemes, reservations, education, and housing. Professor Devy warns that failure to explicitly enumerate DNTs could alienate over 10 crore people, deepening social marginalisation and mistrust of the state.

Governance implications: Explicit counting is not merely statistical but an act of restorative justice. Accurate enumeration enables targeted interventions and acknowledges historical wrongs. Ignoring this opportunity, Devy argues, risks creating a social problem far larger than the technical challenges of caste classification.

Background: The Peoples’ Linguistic Survey of India, led by Professor Devy, documented over 780 living languages, offering a more nuanced picture than official records. It demonstrated how community participation combined with scholarly verification can produce reliable social data.

Lessons for caste Census: The survey showed that large, diverse datasets can be refined without erasing identity. Applying similar methodologies—layered scrutiny, expert validation, and openness—to caste data can convert self-reported identities into a coherent classification system.

UPSC relevance: As a governance case study, it highlights evidence-based policymaking, inter-disciplinary collaboration, and the balance between inclusivity and administrative efficiency. It underscores how social science tools can strengthen state capacity in managing diversity.