1. Context: Why Linguistic and Cultural Markers Matter for Caste Enumeration
The upcoming 2026–27 Census will include caste enumeration, but the Union government has not yet announced the methodology. This uncertainty has triggered debate among scholars, activists, and community leaders on how to ensure accurate and inclusive caste data. Professor G.N. Devy, a noted linguist and cultural activist, argues that linguistic and cultural markers—already proven effective in language surveys—can be used to address duplication, variation, and inconsistencies in caste naming.
Caste names often vary across regions due to differences in spellings, dialects, or local histories. Relying solely on self-reported names—as the 2011 SECC did—can lead to enormous lists with duplication; that exercise returned over 46 lakh caste names. Without a method to analyse these names systematically, enumeration risks becoming unmanageable and unreliable, thereby affecting welfare targeting and social justice policy design.
Language-based methodologies offer a tested model. Similar techniques were applied in the People's Linguistic Survey of India, and the layered scrutiny of 2011 Census language returns consolidated about 19,000 reported mother tongues into 1,369. These techniques rely on markers such as ancestry, kinship patterns, lifestyle, and linguistic affinity to classify communities more accurately. Applying them to caste enumeration could help produce a comprehensive and credible caste list.
If linguistic-cultural scrutiny is not integrated into caste enumeration, the Census may generate inflated or fragmented caste data, weakening evidence-based policymaking and complicating welfare delivery.
"This model has been tried and tested for languages." — G.N. Devy
2. Two Competing Models: Open Field vs. Pre-Listed Castes
Debate currently centres on two major enumeration approaches. The first is an open text field, which lets citizens self-report their caste. This mirrors the 2011 SECC, which produced an unwieldy dataset that needed extensive post-survey rationalisation. The second is a pre-compiled caste list, as used in the Bihar Caste Survey, which minimises variation but risks excluding smaller or regionally distinct groups.
Professor Devy strongly favours the open field method, arguing that post-Census linguistic-anthropological scrutiny can refine the data effectively. However, this requires the Census office to allow academic scrutiny and involve institutions like the Anthropological Survey of India (AnSI).
Open field responses capture the diversity and fluidity of social identities but can overwhelm administrative systems if left unprocessed. Pre-listed responses offer administrative ease but risk reinforcing outdated or politicised categories.
If methodological choices prioritise administrative convenience over sociological accuracy, marginalised or lesser-known communities may be misclassified or left out, affecting their representation and access to welfare.
Key Challenges:
- 46+ lakh caste names returned in 2011 SECC due to spelling and nomenclature variations.
- Risk of over-inclusion (duplicate names) and under-inclusion (missing communities).
- Need for post-survey scrutiny involving experts and institutions.
- Lack of clarity on 2026–27 Census methodology.
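The spelling-variation problem listed above can be illustrated with a minimal Python sketch. It assumes a deliberately crude normalisation rule (lowercase, drop non-letters, collapse doubled letters); the function names and the variant spellings in `reported` are hypothetical and chosen only for illustration. A real exercise would need expert-designed rules and manual review, since a key this blunt can also merge names that are genuinely distinct.

```python
import re
from collections import defaultdict


def spelling_key(name: str) -> str:
    """Reduce a reported name to a rough key so trivial variants collide."""
    key = name.lower()
    key = re.sub(r"[^a-z]", "", key)      # drop spaces, hyphens, punctuation
    key = re.sub(r"(.)\1+", r"\1", key)   # collapse doubled letters (crude)
    return key


def group_variants(names):
    """Group reported names that share the same rough spelling key."""
    groups = defaultdict(list)
    for name in names:
        groups[spelling_key(name)].append(name)
    return dict(groups)


# Hypothetical open-field returns: invented spelling variants for illustration.
reported = ["Kanjar Bhat", "Kanjarbhat", "kanjar-bhat", "Chhara", "Chharra", "Sansi"]
groups = group_variants(reported)

for key, variants in groups.items():
    print(key, "<-", variants)
```

Six raw returns collapse into three candidate groups here. This only handles surface spelling; it says nothing about whether differently named groups are the same community, which is the job of the post-survey scrutiny discussed in the next section.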
3. Post-Census Scrutiny: A Critical Step for Data Reliability
Professor Devy emphasises a multi-layered post-enumeration scrutiny process. The model used for language enumeration is instructive: the 2011 Census recorded 19,000 mother tongues, but through layered analysis—checking for duplication, spelling variations, grammar, linguistic lineage—this was narrowed to 1,369 recognised mother tongues.
The same logic can apply to caste names. Communities often share linguistic or cultural traits even when their regional names differ. For example, Sansi (Punjab), Kanjar (Rajasthan), Chhara (Gujarat), and Kanjar Bhat (Maharashtra) are regionally distinct names for essentially one community that shares a common language, Bhantu.
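As a rough illustration of how a shared marker could flag such names for consolidation, the Python sketch below groups returns by a recorded language marker. The `CasteReturn` record, the `candidates_by_language` helper, the placeholder language codes (`lang_A`, `lang_B`), and the final "Community X" entry are all assumptions made for the example, not part of any Census or AnSI schema. A shared language only nominates candidates; experts would still check ancestry, kinship, and lifestyle before merging anything.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CasteReturn:
    name: str      # name as reported in the open field
    state: str     # state where it was reported
    language: str  # linguistic marker recorded during scrutiny (placeholder codes)


def candidates_by_language(returns):
    """Return language -> names for languages shared by more than one name."""
    groups = defaultdict(list)
    for r in returns:
        groups[r.language].append(r.name)
    return {lang: names for lang, names in groups.items() if len(names) > 1}


# The first four entries follow the example in the text; "lang_A" stands in
# for their shared language. The last entry is entirely hypothetical.
returns = [
    CasteReturn("Sansi", "Punjab", "lang_A"),
    CasteReturn("Kanjar", "Rajasthan", "lang_A"),
    CasteReturn("Chhara", "Gujarat", "lang_A"),
    CasteReturn("Kanjar Bhat", "Maharashtra", "lang_A"),
    CasteReturn("Community X", "State Y", "lang_B"),
]

candidates = candidates_by_language(returns)
print(candidates)
```

The four regional names surface as one candidate group for expert review, while the unrelated entry is left alone.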
Such consolidation requires collaboration with institutions like AnSI and reference to large ethnographic works such as People of India, which documents community origins, kinship patterns, and socio-cultural traits.
Without structured post-survey analysis, raw caste data risks becoming unusable for policymaking, limiting the state’s ability to design inclusive welfare and representation frameworks.
Required Institutional Measures:
- Involvement of AnSI, linguistic experts, and sociologists.
- Access to raw Census data for scholarly scrutiny.
- Use of reference ethnographic works (e.g., People of India).
- Classification through markers: language, ancestry, kinship, lifestyle.
4. The Need to Explicitly Count Denotified, Nomadic & Semi-Nomadic Tribes (DNTs)
India’s 10 crore+ DNT–NT–SNT population remains inadequately enumerated and poorly understood. Historically stigmatised under the Criminal Tribes Act, 1871, these communities continue to face exclusion in welfare schemes due to lack of precise, disaggregated data.
Professor Devy, who co-founded the DNT-Rights Action Group and chaired the 2006 Technical Advisory Group for DNTs, stresses that the Census must explicitly announce the intention to count DNT communities. Generic caste or tribe categories fail to capture their distinct mobility patterns, socio-economic vulnerabilities, and intra-group variations.
If the Census does not proactively include DNT categories, these communities will remain statistically invisible. That deepens historical exclusion, undermines social justice frameworks, and creates a governance challenge larger than the administrative difficulty of compiling an accurate caste list.
Consequences of Non-Enumeration:
- Continued absence in targeted welfare schemes.
- Policy blind spots for a population of 10+ crore.
- Reinforcement of historical stigma and socio-economic deprivation.
5. Way Forward: Towards an Inclusive, Accurate and Evidence-Based Enumeration Framework
The credibility of the 2026–27 Census depends on clear methodology, expert involvement, and publicly accessible scrutiny. Incorporating linguistic and cultural markers can help reconcile the diversity of caste names while ensuring inclusivity. The Census must also proactively enumerate DNT communities and engage institutions such as the AnSI for classification.
Combining open field responses with rigorous post-survey analysis offers both democratic expression and scientific accuracy. Transparency in methodology can build trust among communities and ensure policy relevance.
Conclusion
India’s caste enumeration challenge lies not in the diversity of names but in the lack of structured analytical frameworks. Linguistic and cultural markers, proven effective in language surveys, offer a robust pathway to reliable caste data. An inclusive, expert-driven, transparent approach—especially one that explicitly counts DNT communities—will strengthen evidence-based policymaking and support long-term social justice and governance outcomes.
