Workflow
Phase 1: Data Ingestion and Structuring
This initial phase is focused on taking the raw, unstructured legal text of the BBMP Act and converting it into a structured, machine-readable format. This is the foundation upon which the entire versioning and comparison functionality is built.
1. Raw Data Collection and Storage:
Input: The process begins with the BBMP Act provided in JSON files, where each file corresponds to one or more chapters of the act (e.g., bbmp_data_extractor/chapter1.json, bbmp_data_extractor/chapter2and3.json).
Structure: Inside these JSON files, the act is broken down into an array of objects, with each object representing a page and containing a markdown field with the raw text.
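As a rough illustration of the page-array layout described above, the sketch below types one chapter file and checks parsed JSON against it. Only the markdown field comes from the description; the names PageRecord, ChapterFile, and isChapterFile are hypothetical, and real files may carry extra per-page fields.

```typescript
// Hypothetical shape of one page object; only `markdown` is named in the text.
interface PageRecord {
  markdown: string;
  [otherField: string]: unknown; // any additional per-page fields are unknown here
}

// A chapter file is assumed to be an array of page objects.
type ChapterFile = PageRecord[];

// Runtime guard: true when the parsed JSON matches the assumed shape.
function isChapterFile(value: unknown): value is ChapterFile {
  return (
    Array.isArray(value) &&
    value.every(
      (page) =>
        typeof page === "object" &&
        page !== null &&
        typeof (page as { markdown?: unknown }).markdown === "string"
    )
  );
}

// Placeholder content, not text from the act.
const sample: unknown = JSON.parse(
  '[{"markdown": "placeholder page 1"}, {"markdown": "placeholder page 2"}]'
);
console.log(isChapterFile(sample)); // true
```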
2. Text Extraction and Consolidation:
Objective: The markdown content from each page within a chapter's JSON file is extracted and combined into a single, continuous string of text.
Implementation: This is handled by the getChapterContent function in the Node.js scripts bbmp_data_extractor/extractor.js and bbmp_data_extractor/extractor_gemini.js. This function reads the specified JSON file, parses it, and concatenates the markdown fields into one string, preparing it for the next step.
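The parse-and-concatenate step could look roughly like the sketch below. It is not the project's getChapterContent: the helper name concatChapterMarkdown is hypothetical, it takes the JSON text directly instead of a file path, and the newline separator between pages is an assumption.

```typescript
// Assumed page shape; `markdown` may be missing on some pages.
interface Page {
  markdown?: string;
}

// Hypothetical helper: parse a chapter's JSON text and join the page markdown.
function concatChapterMarkdown(jsonText: string): string {
  const pages: Page[] = JSON.parse(jsonText);
  return pages
    .map((page) => page.markdown ?? "") // treat a missing field as empty text
    .join("\n"); // assumption: pages are separated by a newline
}

// Placeholder content, not text from the act.
const chapterText = concatChapterMarkdown(
  '[{"markdown": "page one"}, {"markdown": "page two"}]'
);
console.log(chapterText); // "page one\npage two"
```

In a Node.js script, the JSON text would first be read from the chapter file before being passed to a helper like this.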
3. Conversion to a Structured Format (Akoma Ntoso):
Purpose: The consolidated raw text is then converted into a structured JSON format that follows the Akoma Ntoso standard for legal documents. This hierarchical structure is crucial for tracking amendments at a granular level (e.g., section, sub-section, clause).
Process:
The convertToAkomaNtoso function in the extraction scripts sends the raw text to a large language model (either OpenAI's GPT or Google's Gemini).
The AI is given a detailed prompt that instructs it to parse the legal text and return a nested JSON object that accurately represents the document's structure, including chapters, sections, sub-sections, and clauses, without altering the original text.
Output: The result of this process is a set of structured JSON files, such as bbmp_data_extractor/akomo-ntoso/chapter1_akoma_ntoso.json, which contain the act in a machine-readable, hierarchical format.
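To make the nested output more concrete, here is one possible shape for it, together with a simple check that the model did not alter the source text. The field names (level, num, heading, text, children) and both helper functions are assumptions for illustration; the actual JSON produced by convertToAkomaNtoso may be organized differently.

```typescript
// Hypothetical node type for the nested structure.
type AknLevel = "chapter" | "section" | "subsection" | "clause";

interface AknNode {
  level: AknLevel;
  num?: string;
  heading?: string;
  text?: string;
  children?: AknNode[];
}

// Collect headings and text fragments in document order.
function collectText(node: AknNode): string[] {
  const fragments: string[] = [];
  if (node.heading) fragments.push(node.heading);
  if (node.text) fragments.push(node.text);
  for (const child of node.children ?? []) {
    for (const fragment of collectText(child)) fragments.push(fragment);
  }
  return fragments;
}

// True when every fragment appears verbatim in the raw text, in order.
function preservesSourceText(root: AknNode, rawText: string): boolean {
  let cursor = 0;
  for (const fragment of collectText(root)) {
    const found = rawText.indexOf(fragment, cursor);
    if (found === -1) return false;
    cursor = found + fragment.length;
  }
  return true;
}

// Placeholder content, not text from the act.
const chapter: AknNode = {
  level: "chapter",
  num: "I",
  heading: "Placeholder heading",
  children: [{ level: "section", num: "1", text: "Placeholder section text." }],
};
const raw = "CHAPTER I Placeholder heading 1. Placeholder section text.";
console.log(preservesSourceText(chapter, raw)); // true
```

A check of this kind only confirms that fragments were copied verbatim and in order; it does not confirm that the hierarchy itself is correct.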
4. HTML Generation (Optional):
Utility: To provide a simple, human-readable version of the structured data, the bbmp_data_extractor/jsonToHtml.js script can be used to convert the Akoma Ntoso JSON files into HTML documents, such as bbmp_data_extractor/chapter1.html. This is useful for debugging and for providing a basic, non-interactive view of the legal text.
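A conversion like this is typically a recursive walk over the nested JSON. The sketch below shows the idea under assumed field names (level, num, heading, text, children) and an assumed HTML layout; it is not the actual jsonToHtml.js.

```typescript
// Hypothetical node shape for the structured JSON.
interface AknNode {
  level: string;
  num?: string;
  heading?: string;
  text?: string;
  children?: AknNode[];
}

// Escape characters that are special in HTML so legal text renders literally.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a node and its descendants as nested <div> elements.
function renderNode(node: AknNode): string {
  const parts: string[] = [];
  const label = [node.num, node.heading]
    .filter((s): s is string => Boolean(s))
    .join(" ");
  if (label) parts.push(`<h3>${escapeHtml(label)}</h3>`);
  if (node.text) parts.push(`<p>${escapeHtml(node.text)}</p>`);
  for (const child of node.children ?? []) parts.push(renderNode(child));
  return `<div class="${escapeHtml(node.level)}">${parts.join("")}</div>`;
}

console.log(renderNode({ level: "section", num: "1", text: "a < b" }));
// <div class="section"><h3>1</h3><p>a &lt; b</p></div>
```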
Phase 2: Frontend Development and User Interface
This phase involves building the web application that allows users to interact with the versioned legal documents. The application is built using Next.js and React.
1. Project Setup and Configuration:
Framework: The project is a Next.js application, as indicated by the file structure in the src/app/ directory, including layout.tsx and page.tsx.
Styling: Styling is handled by Tailwind CSS, with global styles and custom themes defined in src/app/globals.css. Utility functions for combining CSS classes are provided in src/lib/utils.ts.
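For readers unfamiliar with class-combining helpers, the sketch below shows the basic idea with no dependencies. The function name cn and its behavior are assumptions, not the contents of src/lib/utils.ts; many Tailwind projects build this helper on the clsx and tailwind-merge packages, which also resolve conflicting utility classes, something this sketch does not do.

```typescript
// Values that may be passed in; falsy entries are dropped.
type ClassValue = string | false | null | undefined;

// Hypothetical helper: join the truthy class names with single spaces.
function cn(...values: ClassValue[]): string {
  return values
    .filter((v): v is string => typeof v === "string" && v.length > 0)
    .join(" ");
}

const isActive = true;
console.log(cn("rounded", isActive && "bg-green-100", undefined)); // "rounded bg-green-100"
```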
2. Main Application Component (page.tsx):
Core Logic: This file is the heart of the frontend application. It manages the application's state, including:
The currently selected chapter of the BBMP Act.
The active version or amendment being viewed.
The diffing mode (cumulative vs. incremental).
User search queries.
Content Display: It dynamically computes and renders the HTML content for the selected version of the act, including the color-coded diffs.
3. UI Components (src/components/ui/):
VersionTimeline.tsx: Displays a vertical, Git-style timeline of all the versions and amendments for the selected chapter. It allows users to click on a version to view its content.
DiffLegend.tsx: A simple component that explains the color-coding used for insertions (green) and deletions (red) in the diff view.
ExportBar.tsx: Provides buttons that allow the user to download the current view as a .txt file or to print it as a PDF.
Card.tsx and Button.tsx: These are foundational UI components that provide a consistent look and feel for cards and buttons throughout the application, following modern design principles.
Phase 3: Versioning and Diffing Logic
This is the most critical phase, where the application's core functionality of tracking and displaying changes between versions of the act is implemented. All of this logic resides within the main application component, src/app/page.tsx.
1. Storing Versions and Amendments:
In-Code Storage: The different versions of the act's chapters and the corresponding amendment texts are stored as string constants within page.tsx (e.g., chapterVI_v1Content, chapterVI_amendmentContent). For a production application, this data would likely be moved to a database or a more scalable content management system.
2. Applying Amendments and Generating Diffs:
Function-Based Logic: A series of functions, such as applyAmendmentsVI and applyAmend1VII, are used to programmatically apply the changes from an amendment to a base version of the text.
How it Works:
These functions take the HTML content of a version as input.
They use string replacement to find the specific text that is being changed by an amendment.
The original text is wrapped in <del> tags, which are styled with a red background to indicate a deletion.
The new, amended text is wrapped in <ins> tags, styled with a green background to indicate an insertion.
This modified HTML string is then rendered in the UI, providing a clear visual representation of the changes.
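The replacement steps above can be sketched as a single generic function. The Amendment type and the applyAmendment name are hypothetical; the functions in page.tsx are written per chapter and may differ in detail. This version changes only the first occurrence of the target text.

```typescript
// Hypothetical description of one textual change.
interface Amendment {
  oldText: string; // wording being replaced
  newText: string; // wording that replaces it
}

// Wrap the old wording in <del> and the new wording in <ins>.
function applyAmendment(html: string, amendment: Amendment): string {
  const marked = `<del>${amendment.oldText}</del><ins>${amendment.newText}</ins>`;
  // A replacer function avoids special "$" patterns in the replacement string.
  // With a string pattern, only the first occurrence is replaced.
  return html.replace(amendment.oldText, () => marked);
}

// Placeholder content, not text from the act.
const before = "<p>old wording</p>";
const after = applyAmendment(before, { oldText: "old wording", newText: "new wording" });
console.log(after); // <p><del>old wording</del><ins>new wording</ins></p>
```

If the target text is not found, the input is returned unchanged, so a silent mismatch is possible; logging or throwing in that case may be worth considering.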
3. Managing Diffing Modes:
Cumulative vs. Incremental: The computeVersions function in page.tsx contains the logic to handle the two diffing modes.
Cumulative: When this mode is active, the applyAmend... functions are chained together, so that each version shows all changes from the original v1 up to that point.
Incremental: In this mode, each version is first rendered without diffs, and then the amendment function for that specific version is applied, showing only the changes introduced in that step.
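The difference between the two modes is easier to see in code. The sketch below is a simplified, generic take on the idea, not the computeVersions function in page.tsx; the Amendment type and the helpers applyClean and applyMarked are hypothetical.

```typescript
interface Amendment {
  oldText: string;
  newText: string;
}
type DiffMode = "cumulative" | "incremental";

// Apply a change without any diff markup.
function applyClean(text: string, a: Amendment): string {
  return text.replace(a.oldText, () => a.newText);
}

// Apply a change and mark it with <del>/<ins>.
function applyMarked(text: string, a: Amendment): string {
  return text.replace(a.oldText, () => `<del>${a.oldText}</del><ins>${a.newText}</ins>`);
}

// Return one HTML string per version: v1 first, then one per amendment.
function computeVersions(base: string, amendments: Amendment[], mode: DiffMode): string[] {
  const versions = [base]; // v1 carries no diff markup
  let clean = base; // text after all earlier amendments, no markup
  let marked = base; // text carrying markup from all earlier amendments
  for (const a of amendments) {
    if (mode === "cumulative") {
      marked = applyMarked(marked, a); // chain markup from v1 onward
      versions.push(marked);
    } else {
      versions.push(applyMarked(clean, a)); // mark only this step's change
    }
    clean = applyClean(clean, a);
  }
  return versions;
}

// Placeholder content, not text from the act.
const base = "Clause one says alpha. Clause two says gamma.";
const amendments: Amendment[] = [
  { oldText: "alpha", newText: "beta" },
  { oldText: "gamma", newText: "delta" },
];
console.log(computeVersions(base, amendments, "cumulative")[2]);
// Clause one says <del>alpha</del><ins>beta</ins>. Clause two says <del>gamma</del><ins>delta</ins>.
console.log(computeVersions(base, amendments, "incremental")[2]);
// Clause one says beta. Clause two says <del>gamma</del><ins>delta</ins>.
```

One limitation of chaining marked-up strings: if a later amendment targets wording that spans markup inserted by an earlier one, a plain string match will not find it.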
4. Summarizing Changes:
summarizeDiff Function: To provide users with a quick overview of the changes, this function analyzes the generated HTML of the active version.
Process: It counts the number of <ins> and <del> tags to get the number of insertions and deletions. It also identifies which sections of the act have been modified by looking for the section IDs within the HTML that contains these tags.
Display: The results of this summary (number of insertions, deletions, and touched sections) are displayed in a "Change Summary" card in the UI, giving users an at-a-glance understanding of the modifications in the selected version.
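A minimal sketch of this kind of summary is shown below. It assumes each section is rendered as a non-nested <section id="..."> element, which may not match the markup page.tsx actually produces; the DiffSummary type and the countTag helper are hypothetical.

```typescript
interface DiffSummary {
  insertions: number;
  deletions: number;
  touchedSections: string[];
}

// Count opening tags such as <ins> or <ins class="...">.
function countTag(html: string, tag: "ins" | "del"): number {
  const matches = html.match(new RegExp(`<${tag}\\b`, "g"));
  return matches ? matches.length : 0;
}

function summarizeDiff(html: string): DiffSummary {
  const touchedSections: string[] = [];
  // Assumption: sections look like <section id="...">...</section> and do not nest.
  const sectionPattern = /<section id="([^"]+)">([\s\S]*?)<\/section>/g;
  let match: RegExpExecArray | null;
  while ((match = sectionPattern.exec(html)) !== null) {
    const id = match[1];
    const body = match[2];
    if (/<(ins|del)\b/.test(body) && touchedSections.indexOf(id) === -1) {
      touchedSections.push(id);
    }
  }
  return {
    insertions: countTag(html, "ins"),
    deletions: countTag(html, "del"),
    touchedSections,
  };
}

// Placeholder markup, not text from the act.
const html =
  '<section id="s1"><del>a</del><ins>b</ins></section>' +
  '<section id="s2">c</section>' +
  '<section id="s3"><ins>d</ins></section>';
console.log(summarizeDiff(html));
// { insertions: 2, deletions: 1, touchedSections: [ 's1', 's3' ] }
```

Regex-based parsing is adequate for markup the application generates itself; for arbitrary HTML, a real parser would be more robust.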