About


The law has language at its heart. Lawmakers characteristically use language to make law, and law must provide for the authoritative resolution of disputes over the effects of that use of language. Most natural language processing (NLP) research has focussed on everyday language, which is very different from legalese (the language of law). Legal language is characterized by its wordiness, specificity, formality, completeness, and heavy use of loanwords. Because of these differences, legalese has often been called a "sublanguage", distinct from spoken or written English. Beyond the linguistic differences, there are also differences at the document-authoring level. While most natural language documents are a few pages long, legal documents span hundreds of pages with little use of diagrams, sectioning, tables, and other supporting constructs.


In recent years, NLP has seen significant advances in natural language understanding and text representation. However, legalese has not seen the same progress. Tasks like question answering, summarization, simplification, and named entity extraction could immensely benefit lay users by helping them answer the kinds of questions that currently require professional legal help. For example, the huge gap between lay and legal English is one major reason why calls for comments on bills and legislation do not gather many responses. AI and NLP have the potential to improve this situation.

Legal Language Processing in the Indian Context


In the more than 70 years since it was drafted, the Indian Constitution has become the lengthiest constitution in the world. While natural language processing has progressed a lot, it still lags in the legal domain, especially in the Indian legal domain. With the most extensive constitution, 20+ recognized languages (with more than 1 million speakers each), colloquial terms based on different dialects, and a complex system of state and national laws, the Indian legal domain offers both a unique opportunity and a challenge for building NLP solutions. Moreover, as public awareness grows, the number of cases is rising. Last year, the Supreme Court of India alone had more than 80,000 cases. The rise in cases also increases the pendency of cases. More than 40 million cases were pending before all the courts as of 2020. All of this calls for efficient NLP solutions for the Indian legal domain. Project Vidhaan aims to provide them by building a high-quality corpus of Indian laws and solutions based on legal language processing technology.

Join us to contribute towards making solutions for legal language.

Members

Get Involved

Project Vidhaan offers rich opportunities for universities to collaborate on the creation of these critical public goods for the future of AI in Justice. Professors, research assistants, and advanced students can co-create and learn, while students can get hands-on experience developing datasets and refining technology. We are actively engaging with new potential members and collaborators.
Contact: project-vidhaan@googlegroups.com

Small and large businesses are welcome to collaborate on building open datasets and to use the open tools and datasets built as part of Project Vidhaan. In addition, businesses would have access to a community of experts working in AI for Justice. Do reach out via the contact form to connect.

Non-profits and civil society organizations are closer to the ground and hence have a better understanding of the problem. They can help build a common understanding and use the open tools to build more effective solutions that support better research and improved workflows for their teams. They would also have opportunities to collaborate with and receive support from a community of organizations with a similar mission.

Contact Us

Address

A-415, New Academic Building,
IIIT-Delhi, Shyam Nagar, Okhla Industrial Estate,
New Delhi, Delhi 110020

Email Us

research@midas.center