E-commerce product Attribute Value Extraction without sub-dictionaries. What should I do?
#1, opened by tamarutaca
Models used: LFM2-350M-Extract / LFM2-1.2B-Extract
I'm running product descriptions from the AE110K products benchmark through these models, generating JSON output, and comparing it with the ground truth. However, the generation often returns invalid JSON and ignores the instructions in the prompt. Can you suggest a workaround?
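For context, the comparison step looks roughly like the sketch below. The function name `score_prediction` and the exact-match criterion are illustrative assumptions, not the benchmark's official metric; output that fails to parse is simply counted as a miss.

```python
import json


def score_prediction(generated: str, truth: dict) -> dict:
    """Compare one generated JSON string with its ground-truth attributes.

    Hypothetical helper: exact string match per key is an assumption,
    not an official evaluation metric.
    """
    try:
        predicted = json.loads(generated)
    except json.JSONDecodeError:
        # Invalid JSON: nothing can be compared.
        return {"valid_json": False, "matched": 0, "expected": len(truth)}

    if not isinstance(predicted, dict):
        # Valid JSON, but not an object (for example a bare list or string).
        return {"valid_json": True, "matched": 0, "expected": len(truth)}

    # Count ground-truth keys whose value was reproduced exactly.
    matched = sum(1 for key, value in truth.items() if predicted.get(key) == value)
    return {"valid_json": True, "matched": matched, "expected": len(truth)}


truth = {"Fuel": "Gas", "Brand Name": "APG"}
print(score_prediction('{"Fuel": "Gas", "Brand Name": "APG"}', truth))
```

With this kind of scoring, every generation that is cut off or malformed contributes zero matches, which is why the invalid-JSON rate matters so much here.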
Current System Prompt:
You are an information extraction system for e-commerce product descriptions.
Your task is to extract attributes and output them as a **strict, valid JSON object**.
Follow these REQUIRED rules:
1. The output must be a valid JSON object.
2. Keys MUST:
- be in English;
- start with a Capital letter;
3. Values MUST:
- be flat strings (no lists, no nested JSON, no arrays, no objects);
- contain only textual descriptions extracted from the input.
4. You MUST NOT:
- generate lists [] under any key,
- generate nested JSON objects {},
- invent attributes not present or implied by the input,
- include commentary or explanations.
5. The output must contain ONLY the JSON. No extra text.
Below are examples of correct behavior:
## Example 1
INPUT: `New Electric Flameless Torch Battery Windproof Lighter BBQ Picnic Camping No Gas Fuel Required Fire Starter`
OUTPUT: `{"Disposable": "No", "Automatic Alarming or Not (Kettle)": "No", "Category": "Lighter", "With Fuel or Not": "No", "With Wooden Handle or Not": "No"}`
## Example 2
INPUT: `Miyouj Leaves Print One Piece Swimsuit Female Off Shoulder Swimwear Women Bathing Suits Brazilian Bikini May Beach XL Monokini`
OUTPUT: `{"Sport Type": "Swim", "Model Number": "Monokini", "Gender": "Women", "Pattern Type": "Print", "one-piece suit": "Swimsuit Female", "Bathers": "Off Shoulder Swimwear", "Female Swimsuits": "Swimwear Women", "Monokini": "Women Bathing Suits", "Bathing Suit Women": "Monokini", "Women Swimwear": "Brazilian Bikini", "Swimwear 2018": "Bikini May Beach", "swimsuit Bathing Suit": "Brazilian Bikini"}`
## Example 3
INPUT: `Outdoor Travel Cooker Stoves Ultra-Light Spirit Alcohol Stove Camping Cooking Furnace`
OUTPUT: `{"Type": "Alcohol Stove"}`
Complete Output:
{
"product_description": {
"brand": "APG",
"model_number": "STO0045",
"category": "Camping Stove Portable Cooking Equipment",
"description": "Welding BBQ Butane Hiking Camping Gas Burners Gas Adapter Torch Lighter",
"price": "₪150",
"specifications": {
"portability": "Portable",
"automatic_armer": "No",
"welding_capacity": "Gas",
"battery_life": "Gas",
"handling": "Butane",
"handling_weight": "Hiking"
}
},
"product_entries": [
{
"product_id": "1",
"name": "Camping Stove Portable Cooking Equipment Welding BBQ Butane Hiking Camping Gas Burners",
"price": "₪150",
"specifications": {
"portability": "Portable",
"automatic_armer": "No",
"welding_capacity": "Gas",
"battery_life": "Gas",
"handling": "Butane",
"handling_weight": "Hiking"
}
},
{
"product_id": "2",
"name": "Outdoor Travel Cooker Stoves Ultra-Light Spirit Alcohol Stove Camping Cooking Furnace",
"price": "₪0",
"specifications": {
"portability": "Portable",
"automatic_armer": "No",
"welding_capacity": "Alcohol",
"battery_life": "Alcohol",
"handling": "Butane",
"handling_weight": "Camping"
}
},
{
"product_id": "3",
"name": "Camping Stove Portable Cooking Equipment Welding BBQ Butane Hiking Camping Gas Burners",
"price": "₪0",
"specifications": {
"portability": "Portable",
"automatic_armer": "No",
"welding_capacity": "Gas",
"battery_life": "Gas",
"handling": "Butane",
"handling_weight": "Camping"
}
}
]
}
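Problems: the output violates rules 2, 3, and 4 (lowercase keys, nested objects, and a list), and it includes attributes such as price that are not in the expected output.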
Expected: {'Fuel': 'Gas', 'Brand Name': 'APG'}
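To make the rule violations measurable rather than eyeballed, a check along these lines could be run on every generation. This is a minimal sketch: `rule_violations` is a hypothetical helper, it assumes Python 3.9+, and it only covers the rules that can be tested mechanically (it cannot tell whether a key is in English or whether an attribute was invented).

```python
import json


def rule_violations(generated: str) -> list[str]:
    """Return the prompt rules a generated string breaks (empty list means OK)."""
    try:
        data = json.loads(generated)
    except json.JSONDecodeError:
        return ["rule 1: not valid JSON"]
    if not isinstance(data, dict):
        return ["rule 1: top level is not a JSON object"]

    problems = []
    for key, value in data.items():
        # Rule 2: keys must start with a capital letter.
        if not key[:1].isupper():
            problems.append(f"rule 2: key {key!r} does not start with a capital letter")
        # Rules 3 and 4: values must be flat strings, never lists or objects.
        if isinstance(value, list):
            problems.append(f"rules 3/4: key {key!r} holds a list")
        elif isinstance(value, dict):
            problems.append(f"rules 3/4: key {key!r} holds a nested object")
        elif not isinstance(value, str):
            problems.append(f"rule 3: key {key!r} is not a string")
    return problems


print(rule_violations('{"Fuel": "Gas", "Brand Name": "APG"}'))
print(rule_violations('{"product_description": {}, "product_entries": []}'))
```

One side effect worth noting: a strict rule 2 check like this would also flag the keys `one-piece suit` and `swimsuit Bathing Suit` in Example 2 of the prompt, since they start with a lowercase letter.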
Other Output Examples:
{
...
"words_84": "Case",
"words_85": "Outdoor Energy",
"words_86": "Warehouse",
"words_87": "Bin Stove",
"words_88": "Cylinder",
"words_89": "Gas Bin
Problems:
- Generates until max tokens are reached;
- Invalid JSON;
- Hallucinated keys;
- Doesn't follow rules.
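As a stopgap for the truncated case, complete key/value pairs can be recovered from the cut-off text with a regular expression. This is only a post-processing sketch, not a fix: `salvage_flat_pairs` is a hypothetical helper, it assumes the output is a flat object of string pairs, and it does nothing about hallucinated keys.

```python
import json
import re

# Matches one complete "key": "value" pair whose value is a plain string.
PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"\s*:\s*"((?:[^"\\]|\\.)*)"')


def salvage_flat_pairs(text: str) -> dict:
    """Recover complete string pairs from output cut off at the token limit."""
    recovered = {}
    for raw_key, raw_value in PAIR.findall(text):
        try:
            # Re-parse through json so escape sequences are decoded correctly.
            key = json.loads(f'"{raw_key}"', strict=False)
            value = json.loads(f'"{raw_value}"', strict=False)
        except json.JSONDecodeError:
            # Skip pairs containing escapes that JSON does not allow.
            continue
        recovered[key] = value
    return recovered


truncated = '{"words_87": "Bin Stove", "words_88": "Cylinder", "words_89": "Gas Bin'
print(salvage_flat_pairs(truncated))
```

The unfinished last pair is dropped because its closing quote never arrives. On nested output the regex would still pull pairs out of inner objects, so the result should be treated as a rough salvage rather than a faithful parse.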
I've tried several prompt variations to fix the generated output, but none of them has worked so far.