About
The law has language at its heart. Lawmakers characteristically use language to make law, and law must provide for the authoritative resolution of disputes over the effects of that use of language. Most natural language processing (NLP) research has focussed on everyday language, which is very different from legalese (the language of law). Legal language is characterized by its wordiness, specificity, formality, completeness, and heavy use of loanwords. Because of these differences, legalese has often been called a "sublanguage", distinct from ordinary spoken or written English. Beyond the linguistic differences, there are also differences at the document-authoring level. While most everyday documents are a few pages long, legal documents can span hundreds of pages with little use of diagrams, sectioning, tables, and other supporting constructs.
In recent years, NLP has seen significant advances in natural language understanding and text
representation. However, that progress has not carried over to the legalese sublanguage. Tasks like
question answering, summarization, simplification, and named entity extraction could greatly help
lay users answer the kinds of questions that currently require professional legal help. For example,
the wide gap between lay and legal English is one major reason why calls for comments on bills
and legislation gather few responses. AI and NLP have the potential to improve this situation.
Legal Language Processing in the Indian Context
Drafted more than 70 years ago, the Indian Constitution has grown into the lengthiest
constitution anywhere in the world. While natural language processing has progressed considerably, it
still lags in the legal domain, especially the Indian legal domain. With the most extensive
constitution, 20+ recognized languages (each with more than 1 million speakers), colloquial terms rooted
in different dialects, and a complex system of state and national laws, the Indian legal domain offers
both a unique opportunity and a challenge for building NLP solutions.
Moreover, as people become more aware, the number of cases is rising. Last year, the
Supreme Court of India alone had more than 80,000 cases. The rise in cases also increases the
pendency of cases: as of 2020, more than 40 million cases were pending across all courts.
All of this calls for efficient NLP solutions for the Indian legal domain, and Project Vidhaan aims to
provide them. The project aims to build a high-quality corpus of Indian laws, along with solutions
based on legal language processing technology.
Join us and help build solutions for legal language.
Members



