E-commerce product Attribute Value Extraction without sub-dictionaries. What should I do?

#1
by tamarutaca - opened
  • Models used: LFM2-350M-Extract/LFM2-1.2B-Extract

I'm running the product descriptions from the AE110K product benchmark through these models, generating JSON output and comparing it with the ground truth. However, the models often return invalid JSON and don't follow the instructions. Can you suggest a workaround?

Current System Prompt:

You are an information extraction system for e-commerce product descriptions.

Your task is to extract attributes and output them as a **strict, valid JSON object**.

Follow these REQUIRED rules:

1. The output must be a valid JSON object.
2. Keys MUST:
   - be in English;
   - start with a Capital letter;
3. Values MUST:
   - be flat strings (no lists, no nested JSON, no arrays, no objects);
   - contain only textual descriptions extracted from the input.
4. You MUST NOT:
   - generate lists [] under any key,
   - generate nested JSON objects {},
   - invent attributes not present or implied by the input,
   - include commentary or explanations.
5. The output must contain ONLY the JSON. No extra text.

Below are examples of correct behavior:
    
## Example 1

INPUT: `New Electric Flameless Torch Battery Windproof Lighter BBQ Picnic Camping No Gas Fuel Required Fire Starter`
OUTPUT: `{"Disposable": "No", "Automatic Alarming or Not (Kettle)": "No", "Category": "Lighter", "With Fuel or Not": "No", "With Wooden Handle or Not": "No"}`

## Example 2

INPUT: `Miyouj Leaves Print One Piece Swimsuit Female Off Shoulder Swimwear Women Bathing Suits Brazilian Bikini May Beach XL Monokini`
OUTPUT: `{"Sport Type": "Swim", "Model Number": "Monokini", "Gender": "Women", "Pattern Type": "Print", "one-piece suit": "Swimsuit Female", "Bathers": "Off Shoulder Swimwear", "Female Swimsuits": "Swimwear Women", "Monokini": "Women Bathing Suits", "Bathing Suit Women": "Monokini", "Women Swimwear": "Brazilian Bikini", "Swimwear 2018": "Bikini May Beach", "swimsuit Bathing Suit": "Brazilian Bikini"}`

## Example 3

INPUT: `Outdoor Travel Cooker Stoves Ultra-Light Spirit Alcohol Stove Camping Cooking Furnace`
OUTPUT: `{"Type": "Alcohol Stove"}`

Complete Output:

{
  "product_description": {
    "brand": "APG",
    "model_number": "STO0045",
    "category": "Camping Stove Portable Cooking Equipment",
    "description": "Welding BBQ Butane Hiking Camping Gas Burners Gas Adapter Torch Lighter",
    "price": "₪150",
    "specifications": {
      "portability": "Portable",
      "automatic_armer": "No",
      "welding_capacity": "Gas",
      "battery_life": "Gas",
      "handling": "Butane",
      "handling_weight": "Hiking"
    }
  },
  "product_entries": [
    {
      "product_id": "1",
      "name": "Camping Stove Portable Cooking Equipment Welding BBQ Butane Hiking Camping Gas Burners",
      "price": "₪150",
      "specifications": {
        "portability": "Portable",
        "automatic_armer": "No",
        "welding_capacity": "Gas",
        "battery_life": "Gas",
        "handling": "Butane",
        "handling_weight": "Hiking"
      }
    },
    {
      "product_id": "2",
      "name": "Outdoor Travel Cooker Stoves Ultra-Light Spirit Alcohol Stove Camping Cooking Furnace",
      "price": "₪0",
      "specifications": {
        "portability": "Portable",
        "automatic_armer": "No",
        "welding_capacity": "Alcohol",
        "battery_life": "Alcohol",
        "handling": "Butane",
        "handling_weight": "Camping"
      }
    },
    {
      "product_id": "3",
      "name": "Camping Stove Portable Cooking Equipment Welding BBQ Butane Hiking Camping Gas Burners",
      "price": "₪0",
      "specifications": {
        "portability": "Portable",
        "automatic_armer": "No",
        "welding_capacity": "Gas",
        "battery_life": "Gas",
        "handling": "Butane",
        "handling_weight": "Camping"
      }
    }
  ]
}

Problems: the output doesn't follow rules 2, 3, and 4 (lowercase keys, nested objects, and a list).

Expected: {'Fuel': 'Gas', 'Brand Name': 'APG'}
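
For reference, here is a minimal sketch of the comparison step. It assumes exact-match scoring on (attribute, value) pairs, which may differ from the benchmark's official metric, and it assumes the ground truth is stored in Python dict notation as shown above. `score_extraction` is a hypothetical helper name, not part of any library.

```python
import ast
import json


def score_extraction(raw_output: str, expected_repr: str) -> dict:
    """Score one model output against one ground-truth record.

    Assumption: a prediction counts as correct only if both the key and
    the value match the ground truth exactly.
    """
    # The ground truth uses single quotes, so parse it as a Python literal.
    expected = ast.literal_eval(expected_repr)

    # Invalid or non-object JSON is treated as an empty prediction.
    try:
        predicted = json.loads(raw_output)
    except json.JSONDecodeError:
        predicted = {}
    if not isinstance(predicted, dict):
        predicted = {}

    # Only flat string values can match; nested objects and lists never do.
    predicted_pairs = {(k, v) for k, v in predicted.items() if isinstance(v, str)}
    expected_pairs = set(expected.items())
    hits = len(predicted_pairs & expected_pairs)

    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(expected_pairs) if expected_pairs else 0.0
    return {"precision": precision, "recall": recall}
```

Under this scoring, the nested output above gets zero precision and zero recall, because none of its top-level values are flat strings.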

Other Output Examples:

{
...
  "words_84": "Case",
  "words_85": "Outdoor Energy",
  "words_86": "Warehouse",
  "words_87": "Bin Stove",
  "words_88": "Cylinder",
  "words_89": "Gas Bin

Problems:

  1. Generates until max tokens are reached;
  2. Invalid JSON;
  3. Hallucinated keys;
  4. Doesn't follow rules.
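
To make these failures easy to count across the benchmark, a checker along the following lines could flag them per output. This is only a sketch of rules 1 to 5 from the system prompt: `check_output` is a hypothetical name, and it cannot detect hallucinated keys, since that would require the ground-truth attribute set.

```python
import json


def check_output(raw: str) -> list[str]:
    """Return the rule violations found in one raw model output (empty if none)."""
    # Rules 1 and 5: the whole output must parse as a single JSON object.
    # Truncated generations and extra text around the JSON both fail here.
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"rule 1: invalid JSON ({exc.msg})"]
    if not isinstance(obj, dict):
        return ["rule 1: top level is not a JSON object"]

    problems = []
    for key, value in obj.items():
        # Rule 2: keys must start with a capital letter.
        if not key[:1].isupper():
            problems.append(f"rule 2: key {key!r} does not start with a capital letter")
        # Rules 3 and 4: values must be flat strings, not lists or objects.
        if not isinstance(value, str):
            problems.append(
                f"rule 3/4: value under {key!r} is a {type(value).__name__}, not a flat string"
            )
    return problems
```

An empty list means the output passed every check; a truncated generation such as the one above is reported as a single rule 1 violation.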

I've tried several prompts to fix the generated output, but so far none has worked.
