This case study from Appen demonstrates a sophisticated approach to integrating LLMs into production data annotation workflows, highlighting both the opportunities and challenges of combining artificial and human intelligence in practical applications.
The context of this implementation is particularly important: Appen's research showed a 177% increase in generative AI adoption in 2024, yet simultaneously observed an 8% drop in projects making it to deployment since 2021. Their analysis identified data management as a key bottleneck, with a 10 percentage point increase in data-related challenges from 2023 to 2024. This speaks to the broader industry challenge of scaling high-quality data annotation for AI systems.
The technical architecture of their co-annotation system consists of several key components:
- A co-annotation engine that intelligently routes work between LLMs and human annotators
- An uncertainty calculation system using GPT-3.5 Turbo with multiple prompt variations to assess confidence
- A flexible threshold system for balancing accuracy against cost
- Integration with their AI data platform (ADAP) through a feature called Model Mate

One of the most interesting aspects of their implementation is the uncertainty measurement approach. Rather than relying on the model's self-reported confidence scores (which they found to be unreliable), they generate multiple annotations using different prompt variations and measure the consistency of the outputs. This provides a more robust measure of model uncertainty and helps determine which items need human review.

The system's workflow is particularly noteworthy from an LLMOps perspective:
- Initial data processing through LLMs (primarily GPT-3.5 Turbo)
- Uncertainty calculation using multiple prompt variations
- Automated routing based on entropy/uncertainty thresholds
- Human review for high-uncertainty cases
- Quality sampling of high-confidence cases
- Feedback loop for continuous improvement
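The entropy-based routing step described above can be sketched as follows. This is a minimal illustration, not Appen's actual implementation: the function names, the label-based entropy measure, and the threshold value are all assumptions.

```python
import math
from collections import Counter

def prompt_variation_entropy(labels):
    """Shannon entropy (bits) over the labels an LLM assigns to the
    same item when asked with several prompt variations. Unanimous
    answers give 0.0; maximal disagreement gives the highest value."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def route(labels, threshold=0.5):
    """Send an item to human review when answers disagree across
    prompt variations (high entropy); otherwise auto-accept it
    (subject to downstream quality sampling)."""
    if prompt_variation_entropy(labels) > threshold:
        return "human_review"
    return "auto_accept"
```

For example, five identical answers route to `auto_accept`, while a 3-to-2 split (entropy about 0.97 bits) exceeds the illustrative 0.5 threshold and routes to `human_review`. Raising the threshold trades accuracy for cost by sending fewer items to humans.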
The system demonstrated impressive results in production:
- 87% accuracy with the hybrid approach (compared to 95% with pure human annotation)
- 62% cost reduction ($450 to $169 per thousand items)
- 63% reduction in labor time (150 hours to 55 hours)
- 8 seconds per item for LLM processing versus 180 seconds for human annotation

A particularly interesting production case study involved a leading US electronics company seeking to improve the accuracy of its search relevance data. The implementation used GPT-4 for multimodal analysis of search queries, product titles, and images. Key findings included:
- A 3-4 percentage point accuracy increase across different components
- 94% accuracy when combining LLM assistance with human annotation (up from 90%)
- Importantly, incorrect LLM suggestions did not negatively impact human accuracy

Their Model Mate feature implementation shows careful consideration of production requirements, including:
- Support for multiple LLMs in the same workflow
- Real-time integration within existing task designs
- Flexibility to handle various data types (text, image, audio, video, geospatial)
- Built-in monitoring and validation capabilities
- Support for custom routing rules and multi-stage reviews

From an LLMOps perspective, several best practices emerge from this implementation:
- Use of multiple prompt variations for robust uncertainty estimation
- Flexible thresholds that can be adjusted to meet accuracy and cost requirements
- Integration of human expertise at strategic points in the workflow
- Regular sampling of high-confidence predictions to ensure quality
- Support for multimodal inputs and various data types
- Built-in monitoring and evaluation capabilities

The system also addresses common LLM challenges in production:
- Hallucination mitigation through human verification
- Bias protection through diverse human annotator pools
- Quality control through strategic sampling
- Cost management through intelligent routing

The implementation demonstrates a sophisticated understanding of both the capabilities and limitations of LLMs in production. Rather than attempting to fully automate annotation, Appen has created a system that leverages the strengths of both AI and human annotators while mitigating their respective weaknesses. This approach appears particularly well suited to scaling annotation operations while maintaining quality standards.
Looking forward, the system appears designed to accommodate future improvements in LLM technology while maintaining its core benefits. The flexible architecture allows for easy integration of new models and adjustment of routing thresholds as model capabilities evolve.
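A pluggable design of this kind can be sketched roughly as below. The class, parameter names, and audit-sampling rate are hypothetical illustrations, not part of ADAP's or Model Mate's actual API; the sketch only shows how a model backend and routing thresholds could be swapped without changing the surrounding workflow.

```python
import random

class CoAnnotationRouter:
    """Illustrative sketch of a model-agnostic co-annotation router.

    Any callable mapping an item to (label, uncertainty) can serve as
    the backend, so newer LLMs can be dropped in as they improve, and
    the uncertainty threshold can be retuned as capabilities evolve.
    """

    def __init__(self, annotate_fn, uncertainty_threshold=0.5, audit_rate=0.05):
        self.annotate_fn = annotate_fn                  # item -> (label, uncertainty)
        self.uncertainty_threshold = uncertainty_threshold
        self.audit_rate = audit_rate                    # fraction of confident items spot-checked

    def process(self, item):
        label, uncertainty = self.annotate_fn(item)
        if uncertainty > self.uncertainty_threshold:
            # Low-confidence items go to human annotators.
            return {"item": item, "label": label, "route": "human_review"}
        # High-confidence items are auto-accepted, with a small random
        # sample routed to human audit for ongoing quality control.
        if random.random() < self.audit_rate:
            return {"item": item, "label": label, "route": "human_audit"}
        return {"item": item, "label": label, "route": "auto_accept"}
```

Lowering `uncertainty_threshold` shifts more volume to humans (higher accuracy, higher cost); raising it does the reverse, mirroring the accuracy/cost trade-off discussed above.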