DMP Final Evaluation

BBMP Act Version History - GitBook Documentation

Introduction

Welcome to the documentation for the BBMP Act Version History project. This project is designed to address a critical gap in the accessibility and transparency of Indian government documents: the lack of version control and historical tracking for legal acts and amendments.

Many crucial government documents, like the Bruhat Bengaluru Mahanagara Palike (BBMP) Act, are living documents that undergo numerous amendments. However, these changes are often not tracked in a clear, accessible, or consolidated manner. This makes it difficult for citizens, legal professionals, and researchers to understand the evolution of the law and the specific changes made over time.

This project solves this problem by providing a user-friendly interface to:

View different versions of the BBMP Act.
Compare versions to see insertions, deletions, and other modifications.
Track the history of amendments in a clear, version-controlled timeline.

This documentation will walk you through the architecture of the project, from data extraction and processing to the frontend UI that brings it all to life.

Data Extraction and Processing

The first step in our project is to extract the raw text of the BBMP Act from its original format and convert it into a structured, machine-readable format.

Raw Data Source

The initial data for the BBMP Act is stored in JSON files, with each file representing one or more chapters of the act.

File Structure: The raw data is located in the bbmp_data_extractor/ directory, with files like chapter1.json, chapter2and3.json, etc.
Content: Each JSON file contains an array of objects, where each object represents a page or a section of the document and includes a markdown field with the raw text content.

Text Extraction

To process this data, we first need to combine the markdown content from these JSON files into a single string for each chapter. This is handled by the getChapterContent function in our extraction scripts (extractor.js and extractor_gemini.js).

getChapterContent(filePath)

Purpose: Reads a chapter's JSON file, parses it, and concatenates the markdown content of each page into a single string.
Process:
1. Reads the file content from the given filePath.
2. Parses the JSON data.
3. Maps over the array of pages and extracts the markdown from each.
4. Joins the markdown content into a single string.

This gives us the full, continuous text of a chapter, ready for the next stage.
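
The steps above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the real getChapterContent reads from filePath, while this sketch starts from the JSON text so it runs without a file system. The names Page and joinChapterMarkdown, the newline separator, and the sample data are all assumptions.

```typescript
// Shape of one entry in a chapter file. Only the `markdown` field is
// described in this documentation, so any other fields are left open.
interface Page {
  markdown: string;
  [key: string]: unknown;
}

// Hypothetical helper: parse a chapter's JSON text, take the markdown of
// each page, and join everything into one string.
// The "\n" separator is an assumption; the project may join differently.
function joinChapterMarkdown(jsonText: string, separator = "\n"): string {
  const pages = JSON.parse(jsonText) as Page[];
  return pages.map((page) => page.markdown).join(separator);
}

// Invented sample input that mimics a two-page chapter file.
const sampleChapterJson = JSON.stringify([
  { markdown: "CHAPTER I" },
  { markdown: "PRELIMINARY" },
]);

const chapterText = joinChapterMarkdown(sampleChapterJson);
// chapterText is "CHAPTER I\nPRELIMINARY"
```

Keeping the parsing and joining separate from the file read also makes this step easy to test with inline data.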

Conversion to Akoma Ntoso

Once we have the raw text, the next crucial step is to convert it into a structured format that allows us to track changes and understand the document's hierarchy. For this, we use the Akoma Ntoso standard, an XML-based schema for legal documents.

The `convertToAkomaNtoso` Function

This function, found in our extraction scripts, is the core of our data structuring process. It sends the raw text of a chapter to an AI model (OpenAI's GPT or Google's Gemini) with a specific prompt to parse the text and return a structured JSON object that follows Akoma Ntoso principles.

Key Features of the Conversion Process:

AI-Powered Parsing: A large language model parses the complex structure of the legal text, including chapters, sections, sub-sections, and clauses.
Hierarchical Structure: The AI is instructed to create a nested JSON object that mirrors the hierarchical nature of the legal act. This is essential for tracking amendments to specific clauses or sub-sections.
Completeness and Accuracy: The prompt instructs the model to process the entire text without any modifications, with the aim that the structured output is a faithful representation of the original document.
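
As a rough illustration of this step, the sketch below builds a prompt from the chapter text and parses the model's reply. The prompt wording and the names buildConversionPrompt and parseConversionReply are assumptions, since the project's actual prompt is not reproduced in this documentation. The network call to the model is deliberately left out.

```typescript
// Assumed prompt wording that reflects the three features listed above.
function buildConversionPrompt(chapterText: string): string {
  return [
    "Convert the following legal text into a nested JSON object that follows Akoma Ntoso principles.",
    "Preserve the hierarchy of chapters, sections, sub-sections, and clauses.",
    "Process the entire text and do not modify any wording.",
    "Return only JSON.",
    "",
    chapterText,
  ].join("\n");
}

// Parse the model's reply. JSON.parse throws on anything that is not valid
// JSON, so a malformed reply fails loudly instead of being stored.
function parseConversionReply(reply: string): unknown {
  return JSON.parse(reply);
}

// Invented sample input; a real call would pass a full chapter.
const samplePrompt = buildConversionPrompt("CHAPTER I\nPRELIMINARY");
```

In this sketch the string returned by the model (OpenAI or Gemini) would be passed to parseConversionReply before being written to disk.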

The Akoma Ntoso JSON Structure

The output of the conversion process is a JSON file that mirrors the Akoma Ntoso structure of the act. These files are stored in the bbmp_data_extractor/akomo-ntoso/ directory.

A typical structure looks like this:

```json
{
  "akomaNtoso": {
    "act": {
      "meta": { ... },
      "preamble": { ... },
      "body": {
        "chapter": {
          "@eId": "ch_I",
          "num": "CHAPTER I",
          "heading": "PRELIMINARY",
          "section": [
            {
              "@eId": "sec_3",
              "num": "3.",
              "heading": "Definitions.",
              "content": [ ... ]
            }
          ]
        }
      }
    }
  }
}
```

This structured format is what powers the versioning and diffing features of our application.
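
To show how this shape can be consumed, here is a sketch of TypeScript types for it together with a lookup by "@eId". The type names and the findSection helper are hypothetical. The fields that are elided in the example (meta, preamble, content) are left untyped rather than guessed.

```typescript
interface Section {
  "@eId": string;
  num: string;
  heading: string;
  content: unknown[]; // elided in the documentation, so left untyped here
}

interface Chapter {
  "@eId": string;
  num: string;
  heading: string;
  section: Section[];
}

interface AkomaNtosoDoc {
  akomaNtoso: {
    act: {
      meta: unknown;
      preamble: unknown;
      body: { chapter: Chapter };
    };
  };
}

// Hypothetical helper: find a section by its "@eId", or undefined if absent.
function findSection(doc: AkomaNtosoDoc, eId: string): Section | undefined {
  for (const section of doc.akomaNtoso.act.body.chapter.section) {
    if (section["@eId"] === eId) return section;
  }
  return undefined;
}

// Sample document built from the values shown in the structure above.
const sampleDoc: AkomaNtosoDoc = {
  akomaNtoso: {
    act: {
      meta: {},
      preamble: {},
      body: {
        chapter: {
          "@eId": "ch_I",
          num: "CHAPTER I",
          heading: "PRELIMINARY",
          section: [
            { "@eId": "sec_3", num: "3.", heading: "Definitions.", content: [] },
          ],
        },
      },
    },
  },
};

const definitions = findSection(sampleDoc, "sec_3");
// definitions.heading is "Definitions."
```

Stable "@eId" values are what make it possible to say which section an amendment touched.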

HTML Conversion

A separate script, jsonToHtml.js, converts the structured Akoma Ntoso JSON into a simple, readable HTML page that preserves the structure of the document. This output is used for easy viewing and debugging.
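
A conversion of this kind might look like the sketch below. It is not the contents of jsonToHtml.js: the function names, the chosen HTML elements, and the use of "@eId" as the element id are assumptions. Section content is skipped because its shape is elided in the structure shown earlier.

```typescript
interface HtmlSection {
  "@eId": string;
  num: string;
  heading: string;
}

interface HtmlChapter {
  "@eId": string;
  num: string;
  heading: string;
  section: HtmlSection[];
}

// Escape characters that have special meaning in HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one chapter as an <article> with one <section> per legal section.
function chapterToHtml(chapter: HtmlChapter): string {
  const sections = chapter.section
    .map(
      (s) =>
        `<section id="${escapeHtml(s["@eId"])}">` +
        `<h3>${escapeHtml(s.num)} ${escapeHtml(s.heading)}</h3></section>`
    )
    .join("\n");
  return (
    `<article id="${escapeHtml(chapter["@eId"])}">\n` +
    `<h2>${escapeHtml(chapter.num)}: ${escapeHtml(chapter.heading)}</h2>\n` +
    `${sections}\n</article>`
  );
}

const chapterHtml = chapterToHtml({
  "@eId": "ch_I",
  num: "CHAPTER I",
  heading: "PRELIMINARY",
  section: [{ "@eId": "sec_3", num: "3.", heading: "Definitions." }],
});
```

Carrying the "@eId" into the HTML id keeps each rendered section addressable, which is useful when a later step needs to report which sections changed.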

Frontend and User Interface

The frontend of the project is a Next.js application that provides an interactive and intuitive interface for exploring the different versions of the BBMP Act.

Project Structure

The frontend code is located in the src/ directory, with the main application logic in src/app/page.tsx. The application uses a modern tech stack, including:

Next.js: A React framework for building server-rendered and static web applications.
React: For building the user interface components.
Tailwind CSS: For styling the application, as seen in src/app/globals.css.
clsx and tailwind-merge: Utilities for constructing dynamic and conflict-free class names.

Key UI Components

The user interface is built from the main page component in src/app/page.tsx and several reusable React components located in src/components/ui/.

page.tsx: This is the main component that orchestrates the entire user interface. It manages the state for the selected chapter, active version, search queries, and diffing modes. It also contains the logic for applying amendments and generating the final HTML to be displayed.
VersionTimeline.tsx: This component displays a Git-like timeline of the different versions and amendments of the selected chapter. It allows users to switch between versions and see the history of changes at a glance.
DiffLegend.tsx: A simple component that provides a legend for the color-coded diffs, explaining what the insertion and deletion highlights mean.
ExportBar.tsx: This component provides buttons for exporting the currently viewed document as a .txt file or printing it as a PDF.
Card.tsx and Button.tsx: These are general-purpose UI components used throughout the application to maintain a consistent design.

Diffing and Versioning Logic

The core feature of this project is its ability to show the differences between versions of the BBMP Act. This is handled by a set of functions within src/app/page.tsx.

Applying Amendments

The applyAmendmentsVI, applyAmend1VII, applyAmend2VII, and applyAmend3VII functions each take the original HTML content of a chapter and apply the changes from a specific amendment.

Process:
1. The function takes the original HTML string as input.
2. It uses string replacement to find the specific text that needs to be amended.
3. The original text is wrapped in <del> tags (for deletions), and the new text is wrapped in <ins> tags (for insertions).
4. These tags are styled with different background colors to make the changes visually clear to the user.
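
The process above can be sketched as a single generic function. The Amendment type, the applyAmendment name, and the sample text are hypothetical. The project's own functions are specific to particular amendments and are not reproduced here.

```typescript
interface Amendment {
  find: string;        // exact text to be amended (must be non-empty)
  replaceWith: string; // new text; an empty string means a pure deletion
}

// Wrap the old text in <del> and the new text in <ins> wherever `find` occurs.
function applyAmendment(html: string, amendment: Amendment): string {
  const { find, replaceWith } = amendment;
  // Nothing to do if the text is absent or the search string is empty.
  if (find === "" || html.indexOf(find) === -1) return html;
  const marked =
    `<del>${find}</del>` + (replaceWith ? `<ins>${replaceWith}</ins>` : "");
  // split/join replaces every occurrence without any regex escaping.
  return html.split(find).join(marked);
}

// Invented sample text, not a quotation from the Act.
const amended = applyAmendment("<p>within thirty days</p>", {
  find: "thirty days",
  replaceWith: "sixty days",
});
// amended is "<p>within <del>thirty days</del><ins>sixty days</ins></p>"
```

Plain string replacement is simple, but it only works when the search text matches the HTML exactly, including whitespace and any inline tags.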

Diffing Modes

The application supports two different diffing modes:

Cumulative: In this mode, each new version shows all the changes from the original version up to the current point. This is useful for seeing the complete history of changes in a single view.
Incremental: In this mode, each version only shows the changes made since the previous version. This is useful for understanding the specific changes introduced in each amendment.

The computeVersions function in src/app/page.tsx handles the logic for generating the correct HTML content based on the selected diffing mode.
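
The difference between the two modes can be illustrated with the sketch below. It is a simplified stand-in, not the project's computeVersions: the names, the Change type, and the sample text are assumptions, and it assumes each amendment has a non-empty search string and touches text that no earlier amendment has modified.

```typescript
type DiffMode = "cumulative" | "incremental";

interface Change {
  find: string;        // non-empty text to replace
  replaceWith: string; // replacement text
}

// Apply a change silently, leaving no markup behind.
function applyClean(text: string, c: Change): string {
  return text.split(c.find).join(c.replaceWith);
}

// Apply a change and show it with <del>/<ins> markup.
function applyMarked(text: string, c: Change): string {
  return text.split(c.find).join(`<del>${c.find}</del><ins>${c.replaceWith}</ins>`);
}

function computeVersionsSketch(
  original: string,
  changes: Change[],
  mode: DiffMode
): string[] {
  const versions: string[] = [original]; // version 0 is the unamended text
  let clean = original;  // earlier amendments applied, no markup
  let marked = original; // every amendment so far shown as markup
  for (const change of changes) {
    marked = applyMarked(marked, change);
    // Cumulative keeps all earlier markup; incremental marks only this change.
    versions.push(mode === "cumulative" ? marked : applyMarked(clean, change));
    clean = applyClean(clean, change);
  }
  return versions;
}

// Invented sample text and amendments.
const sampleChanges: Change[] = [
  { find: "ten rupees", replaceWith: "twenty rupees" },
  { find: "thirty days", replaceWith: "sixty days" },
];
const sampleOriginal = "Fee: ten rupees. Notice: thirty days.";
const cumulative = computeVersionsSketch(sampleOriginal, sampleChanges, "cumulative");
const incremental = computeVersionsSketch(sampleOriginal, sampleChanges, "incremental");
// cumulative[2] marks both changes; incremental[2] marks only the second one.
```

In both modes the first amended version is identical, because there is only one change to show. The modes diverge from the second amendment onward.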

Change Summary

To provide a quick overview of the changes in each version, the summarizeDiff function calculates:

The number of insertions (<ins> tags).
The number of deletions (<del> tags).
A list of the sections that have been modified.

This summary is displayed in the UI to help users quickly identify the scope of the changes in each version.
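
A summary of this kind could be computed as in the sketch below. It is an illustration under stated assumptions, not the project's summarizeDiff: it assumes each section is rendered as a non-nested <section id="..."> element, and the names DiffSummary and summarizeDiffSketch are hypothetical.

```typescript
interface DiffSummary {
  insertions: number;
  deletions: number;
  sections: string[]; // ids of sections containing at least one change
}

// Count non-overlapping matches of a global regular expression.
function countMatches(text: string, pattern: RegExp): number {
  const matches = text.match(pattern);
  return matches ? matches.length : 0;
}

function summarizeDiffSketch(html: string): DiffSummary {
  const sections: string[] = [];
  // Assumption: sections look like <section id="...">...</section>, not nested.
  const sectionPattern = /<section id="([^"]*)">([\s\S]*?)<\/section>/g;
  let match: RegExpExecArray | null;
  while ((match = sectionPattern.exec(html)) !== null) {
    const id = match[1] || "";
    const body = match[2] || "";
    if (/<(ins|del)[\s>]/.test(body)) sections.push(id);
  }
  return {
    // [\s>] after the tag name avoids matching longer names such as <insert>.
    insertions: countMatches(html, /<ins[\s>]/g),
    deletions: countMatches(html, /<del[\s>]/g),
    sections,
  };
}

// Invented sample HTML with one changed and one unchanged section.
const summary = summarizeDiffSketch(
  '<section id="sec_3"><p>within <del>thirty</del><ins>sixty</ins> days</p></section>' +
    '<section id="sec_4"><p>unchanged</p></section>'
);
// summary is { insertions: 1, deletions: 1, sections: ["sec_3"] }
```

Regular expressions are adequate here because the markup is generated by the application itself; arbitrary HTML would call for a proper parser.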

Conclusion

The BBMP Act Version History project aims to promote transparency and accessibility in governance. By providing a clear and interactive way to track changes to important legal documents, it helps citizens, legal professionals, and researchers better understand the laws that shape their lives.

This documentation provides a comprehensive overview of the project's architecture and functionality. We hope it serves as a useful guide for anyone looking to understand, contribute to, or replicate this project for other legal documents.

Last updated 5 months ago