--- base_model: - LiquidAI/LFM2-350M-Extract license: apache-2.0 language: - en tags: - text-generation - instruction-tuning - structured-output - toon - lfm2 - unsloth - lora - transformers datasets: - yasserrmd/TOON-Unstructured-Structured model-index: - name: yasserrmd/LFM2-350M-Extract-TOON results: - task: name: TOON conversion (schema-driven extraction) type: text-generation dataset: name: yasserrmd/TOON-Unstructured-Structured type: text metrics: - name: Final Training Loss type: loss value: 0.2178 - name: Lowest Loss type: loss value: 0.2043 - name: Total Steps type: steps value: 430 --- # yasserrmd/LFM2-350M-Extract-TOON `yasserrmd/LFM2-350M-Extract-TOON` is a **fine-tuned variant of LiquidAI’s LFM2-350M-Extract**, built using the **Unsloth AI** framework and the dataset [`yasserrmd/TOON-Unstructured-Structured`](https://huggingface.co/datasets/yasserrmd/TOON-Unstructured-Structured). This model specializes in **schema-driven conversion of natural-language text into valid TOON (Token-Oriented Object Notation)** format — a compact, token-efficient alternative to JSON designed for large language models. --- ## Model Overview | Property | Description | |-----------|-------------| | **Base Model** | LiquidAI/LFM2-350M-Extract | | **Architecture** | LFM2-350M (Decoder-only Transformer) | | **Fine-tuning Method** | LoRA (via Unsloth AI) | | **Objective** | Structured extraction in TOON format | | **Dataset** | yasserrmd/TOON-Unstructured-Structured | | **Languages** | English | | **Frameworks** | Transformers, Unsloth, PyTorch | | **License** | LFM License v1.0 | | **Final Loss** | 0.2178 (Step 430) | --- ## What is TOON? **TOON (Token-Oriented Object Notation)** is a serialization format optimized for LLMs. It represents structured data with minimal tokens using a **header + rows** pattern: ``` users[2]{id,name,role}: 1,Alice,admin 2,Bob,user ```` Compared to JSON, TOON reduces token count by up to 60% and is easier for LLMs to generate deterministically. --- ## Training Summary The model was trained on 430 steps with the following key trends: - **Initial loss:** 1.3793 - **Final loss:** 0.2178 - **Lowest recorded loss:** 0.2043 - **Steady convergence** after step 250 with consistent decline below 0.3. - **Training method:** Unsloth LoRA (rank 16, alpha 32, learning rate 2e-4, batch size 64). - **Hardware:** 1x NVIDIA T4 (15 GB VRAM). - **Duration:** 30 Minutes. The training demonstrated strong stability and smooth convergence towards sub-0.25 loss, confirming excellent adaptation of the base model to TOON structure. --- ## Usage Example ```python from transformers import AutoTokenizer, AutoModelForCausalLM from transformers import TextStreamer model_id = "yasserrmd/LFM2-350M-Extract-TOON" tokenizer = AutoTokenizer.from_pretrained(model_id) model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto") schema = """ "$schema": "http://json-schema.org/draft-07/schema#" type: object properties: id: type: string pattern: "^(\\d+\\.\\d+) disturbing$" description: Dot-separated integers representing the unique ID of each element in the hierarchy title: type: string description: Descriptive title of the section or element level: type: integer minimum: 0 maximum: 9 description: "Hierarchical level (0 - ROOT, 1 - SECTION, 2 - SUBSECTION, 3+ - DETAIL_N)" level_type: type: string enum[4]: ROOT,SECTION,SUBSECTION,DETAIL_N description: Type of the hierarchical element component: type: array items: type: object properties: idc: type: integer description: Component ID component_type: type: string enum[4]: PARAGRAPH,TABLE,CALCULATION,CHECKBOX description: Type of component metadata: type: string description: "Additional metadata (e.g., title, note, or overview)" properties: type: object properties: variables: type: array items: type: object properties: idx: type: string description: Unique row-column identifier (X.Y format) name: type: string description: Attribute name value: type: string description: Attribute value unit: type[2]: string,"null" description: Optional unit for the value metrics: type: boolean description: Boolean flag indicating if the attribute is a metric formula: type: boolean description: Boolean flag indicating if the attribute is a formula content: type: array items: type[2]: string,"null" description: Text content children: type: array items: "$ref": # required[6]: id,title,level,level_type,component,children """ text = """ SUBSECTION component[1]: - idc: 1 component_type: PARAGRAPH metadata: "Note: Specific to debtor risk." properties: variables[0]: content[1]: The risk of debtors failing to make payments on time. - id: "2.2" title: Liquidity Risk level: 2 level_type: SUBSECTION component[1]: - idc: 1 component_type: PARAGRAPH metadata: "Note: Specific to liquidity risk." properties: variables[0]: content[1]: Liquidity risk is related to the difficulty in selling assets quickly without a significant loss. The document begins with an inclusive overview, elucidating the purpose of the report and its objective to assess risks and propose mitigations for financial operations, such as compliance, fraud detection, and performance metrics. The overall framework is meticulously divided into several sections and subsections reflecting detailed and structured analysis. This report is intended to provide a comprehensive understanding of risk exposure within financial operations. We will now delve into the first section of the report, which covers a vast array of compliance regulations critical for maintaining financial accountability. Firstly, let’s examine the **Compliance Section**. The section’s primary aim is to highlight the key compliance regulations applicable to financial operations. Notably, this includes the **Anti-Money Laundering (AML) Regulation (RC.1)** and the **Data Privacy Act (RC.2)**. Highlighting the significance of these regulations, the Subsection on Anti-Money Laundering identifies several gaps within the current system. These gaps need to be addressed to ensure robust compliance. The analysis suggests the presence of several risk points where the current practices might fall short of regulatory standards. Next, we have a **Detailed Risk Analysis** for the Anti-Money Laundering Regulation. This component outlines the specific risks and potential impacts on financial operations. In the document, a table detailing the risk assessment is provided outlining two primary risks, **Fraudulent Transactions (RA.1)**, and **Non-Compliance with AML (RA.2)**, each with a brief description of the risk and its possible consequences. Addressing these risks requires a systematic approach, ensuring all preventive measures are in place to mitigate financial risks effectively. Moreover, a **Checklist** is included to assess the current status concerning the Anti-Money Laundering Regulation. The Checklist requires the selection of the best option that describes the current status as either **Option 1 (true)** or **Option 2 (false)**. This selection is pivotal in making informed decisions about regulatory compliance and operational adjustments. In parallel, the **Data Privacy Act** (RC.2) Subsection identifies several issues in handling personal data. These issues need to be corrected to fully comply with the Data Privacy Act. The **Fraud Detection Section** and its **Subsections on Misrepresentation and Theft of Data** follow a similar structure, detailing the critical risks associated with these vulnerabilities and emphasizing the necessity for mitigation strategies. In the **Fraud Detection Section**, we have a table outlining two major cases of fraud: **Misrepresentation (FC.1)** and **Theft of Data (FC.2)**. These cases are significant due to their impact on financial integrity and operational continuity. The analysis of these cases includes detailed descriptions of the nature and extent of the fraud, highlighting the importance of robust fraud detection mechanisms. Each regulatory and fraud-related section is equipped with thorough analysis and checks, ensuring that every risk is identified and addressed. While the sections provide detailed tables and checklists, they also reflect the broader context of financial operations and the mitigation strategies required to ensure compliance and prevent fraud. By providing these detailed sections and sub-sections, the report aims to equip stakeholders with the necessary information to assess and improve the risk management framework. This ensures that all financial operations are conducted in a compliant, transparent, and secure manner, thereby safeguarding the interests of all stakeholders involved. """ system_instruction = ( "You are an intelligent model specialized in converting natural language text" "into valid TOON (Token-Oriented Object Notation) format. " "Always follow the given schema strictly, emit the correct header " "in the form