Embodied AI Data Development Platform: Appen's RoboGO, page-8

    Another interesting case study recently shared by Appen. Great to see them providing services to Cohere, which was founded by an ex-Google Brain researcher. And yes, Cohere is still a strategic partner of Google. No doubt Appen still has a foot in the market; they're still part of the wider Google AI ecosystem in some way. It will be good to see more engagement like this. I am sure the engagement below will be encouraging for Appen followers.
    https://www.appen.com/case-studies/cohere-fine-tuning-for-enterprise

    Introduction

    Aligning LLM performance with human values is a key differentiator in today’s competitive AI market. However, operationalising human feedback at scale while maintaining high-quality inputs and low latency poses several challenges. To address this growing demand, Cohere built PANDA Plus, a program for preference data generation and reward signal development, and partnered with Appen to source expert annotators, support real-time model feedback, and deliver human-centric for both experimental and production fine-tuning. Appen enabled scalable, high-quality data generation and real-time annotation for PANDA Plus — supporting Cohere in improving their generative Large Language Model, Command.

    About Cohere

    Cohere is the leading security-first enterprise AI company. They build cutting-edge AI models and end-to-end solutions designed to solve real-world business problems. Their flagship generative LLM series, optimised for secure enterprise deployments, is called Command. Leading enterprises in regulated industries trust Cohere with customer-facing and internal support use cases, so it is essential that the model produces helpful, safe, and brand-aligned responses across diverse domains from retail to banking. Maintaining this high standard requires continual reinforcement learning and fine-tuning with reliable, domain-relevant human feedback.

    To accelerate Command’s performance, Cohere developed Preference Annotation Data Acquisition Plus Supervised Fine-Tuning (SFT), also known as PANDA Plus. This program improves model performance by collecting structured human preference data and editing the preferred response to better satisfy Command’s principles and the user’s instructions. Cohere collaborated with Appen to scale this system across live models while maintaining quality and adaptability.

    1. Project Goals

    PANDA Plus integrates real-time and editing into Cohere’s training loop. Each task presents annotators with two model completions for a given prompt and asks them to:

    - Choose the more helpful or aligned response
    - Optionally edit a completion to better reflect ideal model behaviour
    - Provide justification and qualitative feedback
    - Complete completion rewrites
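
To make the task shape above concrete, here is a minimal Python sketch of how one such annotation might be stored and then split into a preference pair and an SFT example. The case study does not publish a schema, so every class, field, and function name here (`PreferenceTask`, `to_preference_pair`, `to_sft_example`) is hypothetical and only illustrates the general idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreferenceTask:
    """One annotation task: a prompt, two completions, and the annotator's output.

    All field names are assumptions; the case study does not describe a schema.
    """
    prompt: str
    completion_a: str
    completion_b: str
    preferred: str                            # "a" or "b"
    justification: str = ""                   # free-text reason for the choice
    edited_completion: Optional[str] = None   # optional rewrite of the preferred response

    def __post_init__(self):
        if self.preferred not in ("a", "b"):
            raise ValueError("preferred must be 'a' or 'b'")

    def chosen(self) -> str:
        return self.completion_a if self.preferred == "a" else self.completion_b

    def rejected(self) -> str:
        return self.completion_b if self.preferred == "a" else self.completion_a


def to_preference_pair(task: PreferenceTask) -> dict:
    """Record for preference training: chosen versus rejected completion."""
    return {"prompt": task.prompt, "chosen": task.chosen(), "rejected": task.rejected()}


def to_sft_example(task: PreferenceTask) -> dict:
    """Record for supervised fine-tuning: the edit if one exists, else the chosen completion."""
    target = task.edited_completion or task.chosen()
    return {"prompt": task.prompt, "target": target}


example = PreferenceTask(
    prompt="Summarise our refund policy.",
    completion_a="Refunds take 5 days.",
    completion_b="I am not sure.",
    preferred="a",
    justification="A answers the question; B does not.",
    edited_completion="Refunds are processed within 5 business days.",
)
pair = to_preference_pair(example)
sft = to_sft_example(example)
```

In this sketch the same annotation feeds two datasets: the ranking goes into a chosen/rejected pair, and the edited text becomes the fine-tuning target. That seems to be the point of combining preference data with SFT in one task.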

    Cohere partnered with Appen to:

    - Ensure consistent, high-quality annotations from contributors with LLM experience
    - Reduce latency for model feedback using Appen’s real-time delivery system
    - Support dynamic task variants (e.g. chat continuation, open-ended instruction-following)
    - Enable both experimental and production-ready training cycles

    2. Challenges

    A. Finding Qualified Annotators

    Cohere required annotators familiar with LLMs who could provide the best quality data and efficient onboarding. Appen provided Cohere with a vetted pool of 200 US-English language contributors, prioritising prior LLM/RLHF experience.

    B. Prioritising Quality over Volume

    Unlike traditional annotation pipelines, PANDA Plus emphasised handling time and fidelity over throughput. This required tuning incentive structures and managing contributor pacing to optimise for thoughtful, context-aware edits.

    C. Real-Time Feedback Loop

    PANDA Plus required a live connection to Command’s API, enabling contributors to evaluate model outputs in near-real time. Appen adapted its to interface with PANDA Plus, including dynamic preambles, prompt routing, and response comparison.

    D. Supporting Model Evolution

    Cohere fine-tuned a production-grade model using Appen-generated preference data, while parallel PANDA Plus tasks fed into ongoing experimental variants. This required Appen to maintain annotation consistency across shifting model checkpoints, without compromising data structure or quality.

    3. Solutions

    Step One: Expert Contributor Pipeline

    Appen assembled a domain-qualified contributor pool tailored for PANDA Plus. Contributors were trained to evaluate:

    - Usefulness, safety, and tone
    - Instruction adherence and domain relevance
    - Opportunities for refinement or escalation

    Appen contributors performed:

    - A/B preference ranking
    - Multi-turn chat continuation scoring
    - Freeform feedback for tooling and prompt iteration
    - Complex prompt and preamble writing
    - Completion re-writing for “perfect” SFT inputs

    Step Two: Tooling and Real-Time Delivery

    The PANDA Plus workflow was delivered through a custom deployment of Appen’s , with enhancements including:

    - Direct integration with Command’s inference endpoint
    - Multi-turn prompt/response workflows
    - Structured fields for ranking, editing, and justification
    - Weekly batch summaries and daily live data streams

    Appen contributors logged over 2,400 expert hours in 12 weeks, enabling Command’s training loop to incorporate human feedback in near-real time.
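
For a rough sense of scale, the figures quoted in the case study (over 2,400 hours, 12 weeks, a pool of 200 contributors) can be turned into averages. This is back-of-envelope arithmetic only. It assumes the whole 200-person pool was active and shared the work evenly, which the case study does not state, and since the hours figure is "over 2,400" the results are lower bounds.

```python
TOTAL_HOURS = 2400    # "over 2,400 expert hours" (case study), so a lower bound
WEEKS = 12            # project duration (case study)
CONTRIBUTORS = 200    # vetted pool size (case study); full participation is an assumption


def workload_summary(total_hours: float, weeks: int, contributors: int) -> dict:
    """Average workload figures, assuming work was spread evenly across the pool."""
    hours_per_week = total_hours / weeks
    return {
        "hours_per_week": hours_per_week,
        "hours_per_contributor": total_hours / contributors,
        "hours_per_contributor_per_week": hours_per_week / contributors,
    }


summary = workload_summary(TOTAL_HOURS, WEEKS, CONTRIBUTORS)
print(summary)
```

Under those assumptions this works out to at least 200 hours of annotation per week across the pool, or about one hour per contributor per week. That fits the stated emphasis on careful handling over throughput.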

    4. Results

    High-Confidence Fine-Tuning Data

    PANDA Plus data contributed directly to the Command model, with multiple fine-tuning runs leveraging human preference signals collected by Appen.

    Support for Experimental Training

    Beyond production, PANDA Plus also supported research-grade experimentation, offering long-term value for model iteration.

    Contributor Retention and Quality

    Appen maintained a consistent contributor pool over the project’s 12-week duration, ensuring stable annotation behaviour and predictable performance across variants.

    System-Level Impact

    By integrating real-time model interaction, edit-based supervision, and crowd feedback into PANDA Plus, Cohere advanced its alignment pipeline — with Appen playing a key role in turning subjective preference into structured .

    Conclusion

    Cohere’s collaboration with Appen on PANDA Plus is a model example of enterprise-scale preference training, including:

    - Skilled annotators with LLM context
    - Custom tooling for real-time feedback
    - Structured editing and justification
    - Integration with both research and production fine-tuning loops

    As frontier model builders look to scale human feedback efficiently and responsibly, PANDA Plus demonstrates how data partnerships can drive both model performance and alignment quality — without sacrificing control, safety, or enterprise readiness.

 