Akoma Ntoso Format Generation

Understanding Akoma Ntoso: The "Why"

Before diving into the "how," it's important to understand why Akoma Ntoso is the chosen format. Legal documents are more than just text; they have an intricate, hierarchical structure. Akoma Ntoso ("linked hearts" in the Akan language of Ghana) is an international standard designed to represent legal and legislative documents in a structured, machine-readable way.

By converting the BBMP Act into this format, we unlock several key capabilities:

  • Granular Version Tracking: Instead of just knowing that a file has changed, we can pinpoint changes to a specific section, sub-section, or even a single clause. This is crucial for tracking amendments with precision.

  • Semantic Understanding: The Akoma Ntoso structure gives the text semantic meaning. A machine can tell a chapter from a section or a preamble, a distinction that plain text alone does not carry.

  • Interoperability: As a global standard, it allows for the exchange and comparison of legal documents across different systems and jurisdictions.

The Conversion Logic: From Raw Text to Structured JSON

The conversion is orchestrated by the scripts in your bbmp_data_extractor directory, primarily extractor.js and extractor_gemini.js. At its core, the process uses a large language model (LLM) as an expert legal parser.

Here’s a step-by-step breakdown of the logic:

Step 1: Consolidating the Raw Text

First, the raw text, which is fragmented across multiple JSON objects (representing pages), needs to be combined into a single, coherent string for each chapter.

Logic: The getChapterContent function is responsible for this. It reads the JSON file for a specific chapter, iterates through the array of pages, and concatenates the markdown content from each page.

Code Snippet (extractor.js):

JavaScript

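// Assumes CommonJS; in an ES module use `import fs from 'fs/promises'` instead.
const fs = require('fs/promises');

// Read one chapter file (a JSON array of page objects) and return its text as a single string.
async function getChapterContent(filePath) {
    try {
        const data = await fs.readFile(filePath, 'utf-8');
        const pages = JSON.parse(data);

        // Join each page's markdown in page order, separated by newlines.
        const combinedMarkdown = pages.map(page => page.markdown).join('\n');
        return combinedMarkdown;
    } catch (error) {
        console.error(`Error reading or parsing file at ${filePath}:`, error);
        throw error; // re-throw so the caller can stop processing this chapter
    }
}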
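To make the expected input shape concrete, here is a minimal sketch of the concatenation step on its own. The `samplePages` data and the `combinePages` helper are hypothetical and not part of extractor.js. Unlike the function above, this variant also skips pages that have no `markdown` value, which is an assumption about how empty pages might be handled.

```javascript
// Hypothetical input: one object per page, each normally carrying a `markdown` string.
const samplePages = [
  { page: 1, markdown: 'First page text' },
  { page: 2, markdown: 'Second page text' },
  { page: 3 }, // a page for which no text was extracted
];

// Join page text in order, skipping pages whose `markdown` is missing or blank.
function combinePages(pages) {
  if (!Array.isArray(pages)) {
    throw new TypeError('Expected an array of page objects');
  }
  return pages
    .map((page) => page.markdown)
    .filter((text) => typeof text === 'string' && text.trim() !== '')
    .join('\n');
}

console.log(combinePages(samplePages));
```

Keeping the join logic in a pure function like this makes it easy to test without touching the file system.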

Step 2: Instructing the AI with a System Prompt

This is the most critical part of the process. We provide the LLM with a detailed set of instructions, or a "system prompt," that tells it exactly how to parse the text and what the output JSON structure should look like.

Logic: The systemPrompt in convertToAkomaNtoso is carefully crafted to:

  • Define the Role: It tells the AI to act as an "expert legal document parser."

  • Specify the Goal: It clearly states the objective is to create a hierarchical JSON structure based on Akoma Ntoso principles.

  • Provide a Schema: It gives a clear example of the desired JSON output, including the nesting of chapter, section, subsection, and clauses. This is crucial for ensuring the AI returns a consistent and predictable structure.

  • Set Constraints: It includes critical instructions like "do not change, rephrase, or manipulate any of the original legal text" to ensure the integrity of the legal document is maintained.
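The four elements above could be assembled roughly as follows. This is a sketch under stated assumptions, not the actual prompt: `buildSystemPrompt`, `exampleSchema`, and every field name in the schema are hypothetical, and only the quoted role and constraint wording come from the description above.

```javascript
// Hypothetical example structure; the real prompt's field names may differ.
const exampleSchema = {
  chapter: {
    number: '...',
    title: '...',
    sections: [
      {
        number: '...',
        heading: '...',
        subsections: [
          { number: '...', text: '...', clauses: [{ label: '...', text: '...' }] },
        ],
      },
    ],
  },
};

// Assemble role, goal, schema example, and constraints into one prompt string.
function buildSystemPrompt(schema) {
  return [
    'You are an expert legal document parser.',
    'Convert the provided text into a hierarchical JSON structure based on Akoma Ntoso principles.',
    'Return JSON that follows this example structure:',
    JSON.stringify(schema, null, 2),
    'Do not change, rephrase, or manipulate any of the original legal text.',
  ].join('\n\n');
}

const systemPrompt = buildSystemPrompt(exampleSchema);
console.log(systemPrompt);
```

Embedding the schema as serialized JSON keeps the example in the prompt and any validation code in sync, since both can be generated from the same object.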

Code Snippet (extractor_gemini.js):

JavaScript

Step 3: Making the API Call and Receiving the JSON

With the raw text consolidated and the instructions defined, the script then makes an API call to the LLM.

Logic: The convertToAkomaNtoso function sends the system prompt and the raw text to the AI model. It specifies that the response should be a JSON object, which simplifies the parsing on our end.
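A provider-agnostic sketch of this step is shown below. The three-argument signature and the `callModel` adapter are assumptions made for illustration: `callModel` stands in for whatever SDK call the scripts actually make and is expected to resolve to the model's reply as a string. `parseModelJson` is likewise a hypothetical helper.

```javascript
// `callModel` is a hypothetical adapter: (systemPrompt, userText) => Promise<string>.
async function convertToAkomaNtoso(rawText, systemPrompt, callModel) {
  const replyText = await callModel(systemPrompt, rawText);
  return parseModelJson(replyText);
}

// Parse the reply and insist on a JSON object, since that is what was requested.
function parseModelJson(replyText) {
  let parsed;
  try {
    parsed = JSON.parse(replyText);
  } catch (error) {
    throw new Error(`Model did not return valid JSON: ${error.message}`);
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    throw new Error('Expected the model to return a JSON object');
  }
  return parsed;
}

// Stub model used only to exercise the flow without a network call.
const fakeModel = async () => JSON.stringify({ chapter: { sections: [] } });

convertToAkomaNtoso('raw chapter text', 'system prompt', fakeModel)
  .then((result) => console.log(Object.keys(result)));
```

Even when the API is asked for a JSON response, validating the parsed value before saving it guards against truncated or malformed replies.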

Code Snippet (extractor.js):

JavaScript

Step 4: Storing the Structured Output

Finally, the structured JSON returned by the AI is saved to a file.

Logic: The main function orchestrates the entire process: it calls getChapterContent to get the text, passes it to convertToAkomaNtoso, and then writes the resulting structured JSON to a new file in the bbmp_data_extractor/akomo-ntoso/ directory.
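The orchestration might look something like the sketch below. `processChapter` and `outputPathFor` are hypothetical names, the input file name `chapter1.json` is an assumption, and `readChapter` and `convert` stand in for getChapterContent and convertToAkomaNtoso so the example stays self-contained.

```javascript
const fs = require('fs/promises');
const path = require('path');

// Derive e.g. "chapter1_akoma_ntoso.json" from "chapter1.json" (input naming is an assumption).
function outputPathFor(inputPath, outputDir) {
  const baseName = path.basename(inputPath, path.extname(inputPath));
  return path.join(outputDir, `${baseName}_akoma_ntoso.json`);
}

// Read the chapter text, convert it, and write the structured result to disk.
async function processChapter(inputPath, outputDir, readChapter, convert) {
  const rawText = await readChapter(inputPath);
  const structured = await convert(rawText);

  const outputPath = outputPathFor(inputPath, outputDir);
  await fs.mkdir(outputDir, { recursive: true }); // create the output directory if needed
  await fs.writeFile(outputPath, JSON.stringify(structured, null, 2), 'utf-8');
  return outputPath;
}
```

Writing the JSON with two-space indentation keeps the saved files readable and produces cleaner line-based diffs between versions.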

Code Snippet (extractor.js):

JavaScript

The output of this process, as seen in files like chapter1_akoma_ntoso.json, is a rich, hierarchical representation of the BBMP Act.

Here is an example from your data, showing how a section with nested clauses is structured:

JSON

This structured format is what enables the powerful versioning and diffing features of your application, turning a static, hard-to-track document into a dynamic, transparent, and accessible legal text.
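As an illustration of section-level diffing, the sketch below compares two versions of a chapter and reports which sections were added, removed, or modified. It is not the application's actual diff logic: `diffSections`, `indexSections`, the `sections` and `number` field names, and the sample data are all hypothetical.

```javascript
// Index sections by their number so two versions can be compared section by section.
function indexSections(chapter) {
  return new Map((chapter.sections || []).map((section) => [section.number, section]));
}

// Report added, removed, and modified sections between two chapter versions.
function diffSections(oldChapter, newChapter) {
  const before = indexSections(oldChapter);
  const after = indexSections(newChapter);
  const changes = [];

  for (const [number, section] of after) {
    if (!before.has(number)) {
      changes.push({ section: number, change: 'added' });
    } else if (JSON.stringify(before.get(number)) !== JSON.stringify(section)) {
      // Serialized comparison is a simplification: it is sensitive to key order.
      changes.push({ section: number, change: 'modified' });
    }
  }
  for (const number of before.keys()) {
    if (!after.has(number)) {
      changes.push({ section: number, change: 'removed' });
    }
  }
  return changes;
}

// Placeholder data, not taken from the BBMP Act.
const versionOne = { sections: [{ number: '1', text: 'original text' }, { number: '2', text: 'unchanged text' }] };
const versionTwo = { sections: [{ number: '1', text: 'amended text' }, { number: '2', text: 'unchanged text' }, { number: '3', text: 'new text' }] };

console.log(diffSections(versionOne, versionTwo));
```

The same approach could be applied recursively to subsections and clauses to pinpoint a change at any level of the hierarchy.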
