Workflow

Phase 1: Data Ingestion and Structuring

This initial phase converts the raw, unstructured legal text of the BBMP Act into a structured, machine-readable format. This is the foundation on which the versioning and comparison functionality is built.

1. Raw Data Collection and Storage:

  • Input: The process begins with the BBMP Act provided in JSON files, where each file corresponds to one or more chapters of the act (e.g., bbmp_data_extractor/chapter1.json, bbmp_data_extractor/chapter2and3.json).

  • Structure: Inside these JSON files, the act is broken down into an array of objects, with each object representing a page and containing a markdown field with the raw text.

2. Text Extraction and Consolidation:

  • Objective: Extract the markdown content from each page of a chapter's JSON file and combine it into a single, continuous string of text.

  • Implementation: This is handled by the getChapterContent function in the Node.js scripts bbmp_data_extractor/extractor.js and bbmp_data_extractor/extractor_gemini.js. This function reads the specified JSON file, parses it, and concatenates the markdown fields into one string, preparing it for the next step.
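The consolidation step can be sketched roughly as follows. This is a minimal illustration, not the code from extractor.js: the Page interface, the joinPages helper, and the newline separator are assumptions, and only the markdown field and the getChapterContent name come from the description above.

```typescript
import { readFileSync } from "fs";

// Assumed shape of one page object; only the `markdown` field is described in the text.
interface Page {
  markdown?: string;
}

// Hypothetical helper: join the markdown of every page into one string.
// The "\n" separator is an assumption; the real script may join differently.
function joinPages(pages: Page[]): string {
  return pages.map((page) => page.markdown ?? "").join("\n");
}

// Read a chapter JSON file (an array of page objects) and return its combined text.
function getChapterContent(filePath: string): string {
  const parsed: unknown = JSON.parse(readFileSync(filePath, "utf8"));
  if (!Array.isArray(parsed)) {
    throw new Error(`Expected an array of pages in ${filePath}`);
  }
  return joinPages(parsed as Page[]);
}
```

Keeping the join logic separate from the file read makes the concatenation easy to check without touching the file system.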

3. Conversion to a Structured Format (Akoma Ntoso):

  • Purpose: The consolidated raw text is then converted into a structured JSON format that follows the Akoma Ntoso standard for legal documents. This hierarchical structure is crucial for tracking amendments at a granular level (e.g., section, sub-section, clause).

  • Process:

    • The convertToAkomaNtoso function in the extraction scripts sends the raw text to a large language model (either OpenAI's GPT or Google's Gemini).

    • The AI is given a detailed prompt that instructs it to parse the legal text and return a nested JSON object that accurately represents the document's structure, including chapters, sections, sub-sections, and clauses, without altering the original text.

  • Output: The result of this process is a set of structured JSON files, such as bbmp_data_extractor/akomo-ntoso/chapter1_akoma_ntoso.json, which contain the act in a machine-readable, hierarchical format.
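To make the idea of a nested, addressable structure concrete, the sketch below shows one possible shape for such a document tree. The ActNode type, its field names, and the listAddresses helper are hypothetical and are not taken from the actual output files; the sample content is placeholder text, not wording from the act.

```typescript
// Hypothetical node shape; field names are assumptions, not the project's actual schema.
type NodeKind = "chapter" | "section" | "subsection" | "clause";

interface ActNode {
  kind: NodeKind;
  num: string; // e.g. "I", "1", "(1)", "(a)"
  heading?: string;
  text?: string; // original wording, kept verbatim
  children?: ActNode[];
}

// Walk the tree and return a path-like address for every node,
// which is what makes amendments trackable at a granular level.
function listAddresses(node: ActNode, prefix = ""): string[] {
  const own = `${node.kind}:${node.num}`;
  const address = prefix ? `${prefix}/${own}` : own;
  const result: string[] = [address];
  for (const child of node.children ?? []) {
    result.push(...listAddresses(child, address));
  }
  return result;
}

// Placeholder content only.
const sample: ActNode = {
  kind: "chapter",
  num: "I",
  heading: "Example heading",
  children: [
    {
      kind: "section",
      num: "1",
      children: [{ kind: "subsection", num: "(1)", text: "Placeholder text." }],
    },
  ],
};
```

Calling listAddresses(sample) yields one address per node, from the chapter down to the sub-section.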

4. HTML Generation (Optional):

  • Utility: To provide a simple, human-readable version of the structured data, the bbmp_data_extractor/jsonToHtml.js script can be used to convert the Akoma Ntoso JSON files into HTML documents, such as bbmp_data_extractor/chapter1.html. This is useful for debugging and for providing a basic, non-interactive view of the legal text.
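A conversion of this kind might look like the recursive renderer below. It is only a sketch under stated assumptions: the DocNode shape, the escapeHtml and renderNode helpers, and the choice of div, heading, and p elements are illustrative and do not reflect how jsonToHtml.js is actually written.

```typescript
// Hypothetical node shape for the structured JSON; field names are assumptions.
interface DocNode {
  num?: string;
  heading?: string;
  text?: string;
  children?: DocNode[];
}

// Escape characters that have special meaning in HTML.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a node and its descendants; nesting depth picks the heading level (capped at h6).
function renderNode(node: DocNode, depth = 1): string {
  const level = Math.min(depth, 6);
  const title = [node.num, node.heading].filter(Boolean).join(" ");
  const parts: string[] = [];
  if (title) parts.push(`<h${level}>${escapeHtml(title)}</h${level}>`);
  if (node.text) parts.push(`<p>${escapeHtml(node.text)}</p>`);
  for (const child of node.children ?? []) {
    parts.push(renderNode(child, depth + 1));
  }
  return `<div>${parts.join("")}</div>`;
}
```

Escaping the text before wrapping it in tags keeps characters such as "<" or "&" in the legal text from being interpreted as markup.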


Phase 2: Frontend Development and User Interface

This phase involves building the web application that allows users to interact with the versioned legal documents. The application is built using Next.js and React.

1. Project Setup and Configuration:

  • Framework: The project is a Next.js application, as indicated by the file structure in the src/app/ directory, including layout.tsx and page.tsx.

  • Styling: Styling is handled by Tailwind CSS, with global styles and custom themes defined in src/app/globals.css. Utility functions for combining CSS classes are provided in src/lib/utils.ts.

2. Main Application Component (page.tsx):

  • Core Logic: This file is the heart of the frontend application. It manages the application's state, including:

    • The currently selected chapter of the BBMP Act.

    • The active version or amendment being viewed.

    • The diffing mode (cumulative vs. incremental).

    • User search queries.

  • Content Display: It dynamically computes and renders the HTML content for the selected version of the act, including the color-coded diffs.

3. UI Components (src/components/ui/):

  • VersionTimeline.tsx: Displays a vertical, Git-style timeline of all the versions and amendments for the selected chapter. It allows users to click on a version to view its content.

  • DiffLegend.tsx: A simple component that explains the color-coding used for insertions (green) and deletions (red) in the diff view.

  • ExportBar.tsx: Provides buttons that allow the user to download the current view as a .txt file or to print it as a PDF.

  • Card.tsx and Button.tsx: Foundational UI components that give cards and buttons a consistent look and feel throughout the application.


Phase 3: Versioning and Diffing Logic

This is the most critical phase, where the application's core functionality of tracking and displaying changes between versions of the act is implemented. All of this logic resides within the main application component, src/app/page.tsx.

1. Storing Versions and Amendments:

  • In-Code Storage: The different versions of the act's chapters and the corresponding amendment texts are stored as string constants within page.tsx (e.g., chapterVI_v1Content, chapterVI_amendmentContent). For a production application, this data would likely be moved to a database or a more scalable content management system.

2. Applying Amendments and Generating Diffs:

  • Function-Based Logic: A series of functions, such as applyAmendmentsVI and applyAmend1VII, are used to programmatically apply the changes from an amendment to a base version of the text.

  • How it Works:

    1. These functions take the HTML content of a version as input.

    2. They use string replacement to find the specific text that is being changed by an amendment.

    3. The original text is wrapped in <del> tags, which are styled with a red background to indicate a deletion.

    4. The new, amended text is wrapped in <ins> tags, styled with a green background to indicate an insertion.

    5. This modified HTML string is then rendered in the UI, providing a clear visual representation of the changes.
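The steps above can be sketched as a single string-replacement helper. The Amendment type and the applyAmendment function are hypothetical names used for illustration; the real applyAmendmentsVI and applyAmend1VII functions may be structured differently, and the sample strings are placeholders rather than text from the act.

```typescript
// Hypothetical description of one textual change.
interface Amendment {
  oldText: string; // wording being replaced
  newText: string; // wording introduced by the amendment
}

// Mark the first occurrence of the old wording as deleted and append the new wording as inserted.
function applyAmendment(html: string, amendment: Amendment): string {
  if (!html.includes(amendment.oldText)) {
    return html; // nothing to mark; leave the content unchanged
  }
  const marked = `<del>${amendment.oldText}</del><ins>${amendment.newText}</ins>`;
  // A replacer function is used so that "$" sequences in the text are not treated as patterns.
  return html.replace(amendment.oldText, () => marked);
}

// Placeholder example.
const amended = applyAmendment("<p>a fee of ten units</p>", { oldText: "ten", newText: "twenty" });
```

Plain string replacement only works when the old wording matches the stored HTML exactly, including whitespace, so any mismatch leaves the text unmarked.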

3. Managing Diffing Modes:

  • Cumulative vs. Incremental: The computeVersions function in page.tsx contains the logic to handle the two diffing modes.

    • Cumulative: When this mode is active, the applyAmend... functions are chained together, so that each version shows all changes from the original v1 up to that point.

    • Incremental: In this mode, each version is first rendered without diffs, and then the amendment function for that specific version is applied, showing only the changes introduced in that step.
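The difference between the two modes can be illustrated with a small sketch. The signature below, the Amendment type, and the applyClean and applyMarked helpers are assumptions made for this example; the actual computeVersions function in page.tsx works on the chapter-specific constants and functions described above.

```typescript
type DiffMode = "cumulative" | "incremental";

// Hypothetical description of one textual change.
interface Amendment {
  oldText: string;
  newText: string;
}

// Apply a change silently, without diff markup.
const applyClean = (html: string, a: Amendment): string =>
  html.replace(a.oldText, () => a.newText);

// Apply a change and mark it with <del>/<ins>.
const applyMarked = (html: string, a: Amendment): string =>
  html.replace(a.oldText, () => `<del>${a.oldText}</del><ins>${a.newText}</ins>`);

// Return the rendered HTML for v1 (the base) and for each later version.
function computeVersions(base: string, amendments: Amendment[], mode: DiffMode): string[] {
  const versions: string[] = [base];
  let clean = base; // text with all previous amendments applied, no markup
  let marked = base; // text carrying the markup of every amendment so far
  for (const amendment of amendments) {
    marked = applyMarked(marked, amendment);
    const incremental = applyMarked(clean, amendment);
    versions.push(mode === "cumulative" ? marked : incremental);
    clean = applyClean(clean, amendment);
  }
  return versions;
}
```

This simplified version assumes that later amendments do not modify wording introduced by earlier ones; handling that case in cumulative mode would need extra care to avoid nested markup.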

4. Summarizing Changes:

  • summarizeDiff Function: To provide users with a quick overview of the changes, this function analyzes the generated HTML of the active version.

  • Process: It counts the <ins> and <del> tags to get the number of insertions and deletions. It also identifies which sections of the act have been modified by finding the section IDs of the HTML elements that contain these tags.

  • Display: The results of this summary (number of insertions, deletions, and touched sections) are displayed in a "Change Summary" card in the UI, giving users an at-a-glance understanding of the modifications in the selected version.
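A summary of this kind could be computed along the lines of the sketch below. The DiffSummary type and the assumption that each section is wrapped in a non-nested <section id="..."> element are illustrative; the markup that the real summarizeDiff function inspects in page.tsx may differ.

```typescript
interface DiffSummary {
  insertions: number;
  deletions: number;
  touchedSections: string[];
}

// Count diff tags and collect the IDs of sections that contain at least one of them.
// Assumes sections are non-nested <section id="..."> elements (an assumption for this sketch).
function summarizeDiff(html: string): DiffSummary {
  const insertions = (html.match(/<ins\b/g) ?? []).length;
  const deletions = (html.match(/<del\b/g) ?? []).length;

  const touchedSections: string[] = [];
  const sectionPattern = /<section\b[^>]*\bid="([^"]+)"[^>]*>([\s\S]*?)<\/section>/g;
  let match: RegExpExecArray | null;
  while ((match = sectionPattern.exec(html)) !== null) {
    const [, id, body] = match;
    if (/<(ins|del)\b/.test(body)) {
      touchedSections.push(id);
    }
  }
  return { insertions, deletions, touchedSections };
}
```

A regular expression is enough for flat, predictable markup like this; for nested or irregular HTML, a proper parser would be the safer choice.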
