Transforming an 18-year-old e-commerce platform with AI
date: June 15, 2024
Problem: A clothing brand had been running a self-built e-commerce platform for nearly two decades. Over those years, product data had accumulated without structure. Colors, sizes, fabric composition, washing instructions: all stored as free-form text in single database fields. 29,000 products. 90,000 variants. No way to filter, facet, or programmatically extract any of it.
Solution: Instead of hiring two people for three months to manually restructure the data, we used GPT-4o to parse and categorize every product attribute automatically. Concept to completion in six weeks.
The data problem
Eighteen years of product data entry with no enforced schema. A typical product record had a single text field containing something like "Navy blue cotton/elastane blend, machine wash 30C, slim fit." Color, material, care instructions, and fit all concatenated into one string. Sometimes in Dutch, sometimes in English. Sometimes with abbreviations, sometimes without.
This is not a problem you solve with regex. The formats were too inconsistent, the edge cases too numerous, and the volume too large for pattern matching to work reliably. We estimated regex-based extraction would cover about 70% of cases, leaving 27,000 variants to handle manually.
Why GPT-4o
We chose GPT-4o specifically. Not the latest model, not the cheapest. It hit the intersection of three requirements: stable enough for batch processing (no random failures mid-run), fast enough to process 90,000 variants in a reasonable timeframe, and cost-effective enough that the API bill stayed well below what manual labor would cost.
Before committing, we ran a prototype against roughly 100 products. The extraction accuracy was high enough to proceed. The remaining question was whether that accuracy would hold at scale and across languages.
The approach
We built the migration tooling as a Strapi application backed by Azure Service Bus for job orchestration. Each product flowed through a queue: legacy MSSQL database read, attribute extraction via GPT-4o, structured output stored in Strapi, then pushed to the new commerce platform.
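The per-product flow above can be sketched as four sequential async stages. This is a minimal illustration, not the actual rb2 implementation: the function names (`readLegacyProduct`, `extractAttributes`, `storeInStrapi`, `pushToCommercePlatform`) and the `deps` injection style are assumptions.

```javascript
// Hypothetical sketch of the per-product pipeline described above.
// Each stage is injected via `deps` so the flow itself stays testable.
async function processProduct(productId, deps) {
  const legacy = await deps.readLegacyProduct(productId);          // legacy MSSQL read
  const attributes = await deps.extractAttributes(legacy);         // GPT-4o extraction
  const stored = await deps.storeInStrapi(productId, attributes);  // persist structured output
  return deps.pushToCommercePlatform(stored);                      // push to new platform
}
```

In the real system each stage was a queue consumer on Azure Service Bus rather than a direct call, which is what made the three-day batch run resumable.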
The migration worked at two levels. Master-level attributes shared across all products (color, material, care instructions) went through one prompt. Body text cleanup (removing HTML links, bullet lists, color-specific language) went through a separate prompt. Both returned structured JSON.
The prompt construction was the core of the system. For each product, we built a prompt from six input fields: English title, Dutch title, English description, Dutch description, washing instructions, and "kenmerken" (Dutch for characteristics) in both languages. The prompt included the full list of predefined attribute options, so the model could only pick from valid values.
const newPrompt = createPromptMasterAttributes(
  productMasterAttributes,
  product.legacy_data_product.Title_EN,
  product.legacy_data_product.Title_NL,
  product.legacy_data_product.Bodytext_EN,
  product.legacy_data_product.Bodytext_NL,
  product.legacy_data_product.WashingInstructions,
  product.legacy_data_product.Kenmerken_NL,
  product.legacy_data_product.Kenmerken_EN
);
The prompt included every valid attribute option as a constraint. The model could only return values from the predefined list. If no option matched, the instruction was to leave the attribute out entirely, not to return "not applicable" or an empty string. This eliminated an entire category of bad output.
We enforced response_format: { type: "json_object" } on every API call. The model returned structured JSON with attribute name, type, localized values (en_US and nl_NL), and a lock flag. No parsing of free-text responses. No regex on the output side.
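Because the model was restricted to predefined values, its JSON output could also be checked mechanically on the way in. A sketch of such a check, assuming the attribute shape described above (name plus localized values); the exact field names are assumptions:

```javascript
// Sketch of an output-side check: every returned attribute must exist in the
// predefined option list, and its value must be one of the allowed options.
// Anything else is flagged for review rather than stored.
function findInvalidAttributes(parsed, allowedByName) {
  const invalid = [];
  for (const attr of parsed.attributes ?? []) {
    const allowed = allowedByName[attr.name];
    if (!allowed || !allowed.includes(attr.values.en_US)) {
      invalid.push(attr.name);
    }
  }
  return invalid;
}
```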
Handling hallucinations
The hard part was not extraction. It was preventing the model from inventing data. Washing instructions were the worst offender. Given a product with no care instructions in the text, the model would sometimes generate plausible-sounding instructions based on the fabric type.
We solved this through iterative prompt engineering. Each round tightened the constraints: extract only what is explicitly stated, leave out anything ambiguous, never infer from context. After four prompt iterations, we reached over 99% consistency across the full dataset. The remaining edge cases were flagged for manual review.
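The tightened constraints might read something like the following when folded into the prompt. The exact production wording is not published; this phrasing is an assumption based on the rules described above.

```javascript
// Illustrative wording of the anti-hallucination rules arrived at after
// several prompt iterations. Not the actual production prompt text.
const extractionRules = [
  "Extract only attributes that are explicitly stated in the source text.",
  "If an attribute is ambiguous or absent, omit it entirely.",
  "Never infer washing instructions from fabric type or any other context.",
  "Never return placeholder values such as 'not applicable' or empty strings.",
].join("\n");
```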
Every API call was logged with token usage, cost calculation, and the raw completion. This made debugging straightforward. When a product came out wrong, we could pull the exact prompt and response from Strapi and understand why.
// Per-token rates: $0.03 per 1K prompt tokens, $0.06 per 1K completion tokens.
let cost = (usage_prompt_tokens * 0.00003)
  + (usage_completion_tokens * 0.00006);
At $0.03 per 1K prompt tokens, processing 29,000 products cost less than a single day of manual labor.
Validation
The final parsing stage ran for three days. Every product went through extraction, then a validation pass that checked for completeness (are all expected fields present?), consistency (does "100% cotton" parse into the right material code?), and language correctness (is a Dutch description returning Dutch attribute values?).
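The three checks could be sketched as a single validation function. The field names, locale keys, and the material-code lookup below are assumptions, not the actual rb2 schema:

```javascript
// Sketch of the validation pass: completeness, consistency, and language
// correctness, returning a list of errors (empty means the product passes).
function validateProduct(product, expectedFields, materialCodes) {
  const errors = [];
  // Completeness: every expected field must be present.
  for (const field of expectedFields) {
    if (!(field in product.attributes)) errors.push(`missing:${field}`);
  }
  // Consistency: the material string must map to a known material code.
  const material = product.attributes.material;
  if (material && !(material.en_US in materialCodes)) {
    errors.push(`unknown-material:${material.en_US}`);
  }
  // Language correctness: both locales must be filled for each attribute.
  for (const [name, value] of Object.entries(product.attributes)) {
    if (!value.en_US || !value.nl_NL) errors.push(`locale:${name}`);
  }
  return errors;
}
```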
Products that failed validation went into a review queue. Less than 1% of the total. Staff reviewed those manually, which took a single afternoon rather than three months.
What this replaced
The manual alternative was real. Two full-time employees, three months minimum, copy-pasting and categorizing product data by hand. Error-prone, unscalable, and the kind of work that burns people out.
The AI approach shipped the same result in six weeks, including prototyping and validation. The API cost was a fraction of the labor cost. More importantly, the approach is repeatable. When the brand adds new product lines or migrates another data source, the same pipeline runs again.
What it unlocked
With structured data in place, the platform can now offer proper faceted search. Filter by color, material, size, care instructions. Sort by attributes that were previously invisible to the system. Feed clean product data to external marketplaces. Run analytics on inventory composition.
None of this was possible when everything lived in a text field.
Result: 29,000 products restructured. 90,000 variants categorized. Six weeks. Over 99% accuracy.
I designed and built this at rb2. Read the full case study.