For AI labs · research teams · model builders

Consented humans.
Trainable AI.

The consented dataset your model has been missing. A growing registry of identity-verified humans who have explicitly licensed their likeness for AI training. Demographically representative. Fully consented. Provenance you can prove.

Start a partnership → See what's available

Provenance per sample · Revocation-aware delivery · Multi-modal data formats

Registry snapshot UPDATED 17 MAY 2026

Verified humans

2,500+

3 continents

Modalities

Image · Video · Audio · 3D · Motion

Age range

18–82

Median 34

Territories

UK, EU, US lead

Per-sample provenance

100%

Cryptographic chain

Revocation SLA

72h

Custom available

The thesis

Scraped data has a half-life.

For most of the last decade, the bottleneck in human-likeness AI wasn't licensing — it was compute. That's reversed. Models are now trained faster than the legal layer underneath them can be defended.

Three things are happening simultaneously: courts are starting to award damages on unauthorised likeness use; regulators are demanding training-data provenance for foundation models; and the public is realising that "the model was trained on millions of images" means "the model was trained on me."

Twinnin's bet is simple. The next generation of human-likeness models will be built on data with provenance, consent, and ongoing payment. Not because labs become altruistic — but because models without that paper become commercially unusable.

"The era of free scraping is closing. The era of licensed, consented training data is opening. Twinnin is positioned at exactly that inflection."

SFC Capital · seed lead · April 2026

"Foundation model labs are already losing deals to procurement teams asking for chain-of-consent paper. The biggest labs see this coming. The smaller ones don't yet."

Katrien Grobler · founder · Deadline interview

Regulation

The next AI regulation is about your training set.

The EU AI Act enters full enforcement on 2 August 2026. From that date, foundation models distributed in the EU must disclose training data sources, demonstrate consent, and respond to data subject withdrawal requests within defined timelines.

The US is twelve to eighteen months behind, not five years. Federal NO FAKES, California §927, and state-level training data disclosure laws are coming. Models built on scraped data will face active commercial restrictions. Models built on licensed data will not.

What you get

Six properties of the dataset. All non-negotiable.

This is what every sample carries — whether you're licensing 100 humans for a research preview or 25,000 for a foundation model training run. The properties are the product.

Per-sample provenance

Every image, frame, audio clip, and motion capture sample carries cryptographic provenance back to the consenting human. Verifiable by anyone, anywhere, without contacting us.

C2PA · SHA-256 · Ed25519

Explicit training consent

Twins opt into AI training as a separate, granular consent — not bundled with commercial licensing. Consent specifies model type, retention, and downstream redistribution rights.

Granular · auditable · time-stamped

Revocation-aware delivery

When a twin revokes consent, you're notified within the agreed SLA. We track which samples are revoked, log your handling, and give you a clear audit trail for regulator response.

Webhook · SLA-defined

Multi-modal coverage

Images, video sequences, audio (when separately consented), motion capture, and 3D scans. Same registry, same identity, modalities you can mix and match for multi-modal model work.

5 modalities · expanding

Demographic balancing tools

Filter and sample by age, gender, ethnicity, geography, height, body type, and other documented attributes. Hit your target representation; reduce well-known training set biases.

Self-reported · structured

Ongoing compensation infrastructure

We pay the humans whose data trains your models. Per-sample, per-training-run, or revenue-share on commercial deployment — whatever structure your deal needs. The humans get paid. That's the deal.

Per-sample or revenue-share

Coverage

Sampling that doesn't bias your model.

Most training datasets are accidentally a sample of who had cameras pointed at them — overrepresenting some demographics by orders of magnitude. We're building Twinnin to be deliberately representative, not accidentally biased.

Right now we're 2,500+ verified humans across 18 territories, growing fastest in the UK, EU, and US. We track gaps actively and run targeted outreach in underrepresented categories — older adults, non-Western markets, disabled humans, and rare phenotypes that scraped datasets systematically miss.

If your model needs a specific demographic mix, ask. If we don't have the coverage today, we'll tell you, and tell you when we will.

Registry coverage · May 2026 SELF-REPORTED

Age 18–34 38% · 950 verified

Age 35–54 42% · 1,050 verified

Age 55+ 20% · 500 verified · growing

UK / EEA 71% · primary market

North America 17% · growing

Other territories 12% · actively expanding
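To make the balancing idea concrete, here is a minimal sketch of stratified sampling against target shares. It is an illustration only, not Twinnin's tooling: the function name `stratified_sample`, the record layout, and the `age_band` field are all assumptions. The pool sizes mirror the age split listed above (950 / 1,050 / 500).

```python
import random
from collections import defaultdict

def stratified_sample(records, key, targets, total, seed=0):
    """Draw up to `total` records so each stratum of `key` follows `targets`.

    Returns (sample, shortfalls); shortfalls maps a stratum to the number of
    records the pool could not supply.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    pools = defaultdict(list)
    for rec in records:
        pools[rec[key]].append(rec)

    sample, shortfalls = [], {}
    for stratum, share in targets.items():
        wanted = round(total * share)
        available = pools.get(stratum, [])
        take = min(wanted, len(available))
        sample.extend(rng.sample(available, take))
        if take < wanted:
            shortfalls[stratum] = wanted - take
    return sample, shortfalls

# Hypothetical registry with the same age split as the coverage figures.
registry = (
    [{"id": f"a{i}", "age_band": "18-34"} for i in range(950)]
    + [{"id": f"b{i}", "age_band": "35-54"} for i in range(1050)]
    + [{"id": f"c{i}", "age_band": "55+"} for i in range(500)]
)

# Request equal thirds across 1,800 humans: the 55+ band runs 100 short.
targets = {"18-34": 1 / 3, "35-54": 1 / 3, "55+": 1 / 3}
sample, shortfalls = stratified_sample(registry, "age_band", targets, total=1800)
print(len(sample), shortfalls)  # 1700 {'55+': 100}
```

Reporting the shortfall instead of silently oversampling another band keeps a coverage gap visible, which matches the "we'll tell you" stance above.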

Deal structure

Three ways to work with the data.

We structure deals around how you train and what you ship. Per-sample for evaluation work. Cohort-based for production training. Strategic for foundation-model partnerships. All three carry the same provenance, consent, and compensation guarantees.

Evaluation

Sample

For research previews, benchmarking, and proof-of-concept work. Limited cohort, limited duration, non-production use.

Sample size 100–500 humans

Modalities 1–2

Duration 90 days

Use Non-production

Full provenance per sample
72-hour revocation SLA
Use-case attestation required
Per-sample compensation to humans

Request a sample →

Production

Cohort

Production training runs. Custom-built cohorts to your demographic and modality spec. Ongoing access, ongoing payment to the humans.

Cohort size 500–25,000

Modalities All available

Duration 12–36 months

Use Training + inference

Everything in Sample
Demographic balancing service
Custom revocation SLA available
Webhook-based change feed
Revenue share to humans on deployment
Quarterly compliance reporting

Discuss a cohort →

Foundation

Strategic

Multi-year partnerships for labs building foundation models on consented human data. Exclusivity options, co-investment, bespoke terms.

Scope Custom

Exclusivity Available

Duration 3+ years

Use Negotiated

Everything in Cohort
Co-investment in registry growth
Targeted demographic recruitment
Dedicated technical liaison
Joint regulatory engagement
Co-marketing options

Book a call →

Provenance

Every sample, cryptographically signed.

Every image, video frame, and audio clip in your dataset arrives with a cryptographic chain that proves it came from a consenting human at a specific moment, with a specific licence, with a specific use-case attestation from you.

The chain is open. Anyone — your auditor, your insurer, a regulator, your distribution partner — can verify a sample independently without contacting Twinnin. We're not the trust gatekeeper. The cryptography is.

This is what training data should look like. Provenance-first, regulator-ready, distributable, and durable across the next decade of AI compliance regimes.

SAMPLE PROVENANCE tw_sample_a4f2_9821_00184

subject_id "tw_a4f2_9821"

subject_consent "granted · 2026-04-12T09:14:22Z"

consent_scope "training+inference"

consent_status "active"

modality "image/jpeg"

capture_date "2026-04-14T11:02:00Z"

licensee "lab_xxxxx"

use_attestation "foundation_training_v3"

hash sha-256: a4f2...c918

signature ed25519: 8j2k...1184

revocation_status "none · checked 2026-05-17"
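As a rough illustration of what an independent check of a record like this could involve, the sketch below recomputes the SHA-256 digest of the sample bytes and inspects the consent and revocation fields. The field names follow the record above. The function names, the assumption that the record carries the full hex digest, and the exact status string format are hypothetical. Verifying the Ed25519 signature would also need an Ed25519 implementation and the signer's public key, which are not shown here.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the raw sample bytes."""
    return hashlib.sha256(data).hexdigest()

def check_sample(sample_bytes: bytes, record: dict) -> list[str]:
    """Return a list of problems; an empty list means these local checks passed.

    Assumes `record["hash"]` holds the full hex digest (hypothetical layout).
    Signature verification is deliberately out of scope for this sketch.
    """
    problems = []
    if sha256_hex(sample_bytes) != record["hash"]:
        problems.append("hash mismatch")
    if record["consent_status"] != "active":
        problems.append("consent not active")
    if not record["revocation_status"].startswith("none"):
        problems.append("sample revoked")
    return problems

# Stand-in payload; a real check would read the delivered file's bytes.
payload = b"example image bytes"
record = {
    "subject_id": "tw_a4f2_9821",
    "consent_status": "active",
    "revocation_status": "none · checked 2026-05-17",
    "hash": sha256_hex(payload),
}

print(check_sample(payload, record))         # []
print(check_sample(payload + b"x", record))  # ['hash mismatch']
```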

In production

What labs are building with Twinnin.

Representative examples of how research and engineering teams are using the registry today. We're under NDA on specific partners; names are available under a signed mutual NDA.

Foundation models

Improving facial diversity in image-generation models

Mid-sized lab using a 4,000-human cohort to reduce demographic bias in a next-generation text-to-image model. Balanced across age, ethnicity, and territory.

Cohort 4,000 humans

Modality Image · multi-pose

Duration 24 months

Comp model Per-sample + rev share

Synthetic actors

Character continuity for video-generation pipelines

Video AI lab licensing high-fidelity multi-modal data on a smaller cohort to support consistent character generation across long-form output. Each human is a "named character" in the model's latent space.

Cohort 200 humans

Modality Video + audio + 3D

Duration 36 months

Comp model Revenue share on deploy

Speech models

Voice cloning with verified speaker consent

Speech AI lab using audio-consented twins to train a voice generation model that ships with on-deployment consent verification — no synthetic voice can run without a registered speaker.

Cohort 800 humans

Modality Audio + paralinguistic

Duration 18 months

Comp model Per-deployment

Evaluation

Bias-testing benchmark for facial recognition

Academic-industrial lab building a fairness benchmark for face recognition systems. Balanced cohort across all major demographic axes, with consented permission to publish results.

Cohort 1,200 humans

Modality Image · standardised

Duration Open benchmark

Comp model Per-sample fixed

Research FAQ

The questions research and legal teams ask first.

What happens if a twin revokes consent during training?

We notify you within the agreed SLA. For pre-training samples, you remove them from the training set and document the removal in your audit log; we provide tooling to support this. For samples already incorporated into model weights, the deal you sign defines downstream obligations: typically a good-faith effort with documented mitigation, not retroactive model deletion. To be candid, removing a sample's influence from trained weights is much harder than removing it from a dataset, and we are transparent about that in every contract.
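On the lab side, the pre-training path described here (remove the samples, document the removal) might look something like the sketch below. The event shape, its field names, and the function `apply_revocation` are assumptions made for illustration, not Twinnin's webhook schema.

```python
from datetime import datetime, timezone

def apply_revocation(event: dict, training_manifest: set, audit_log: list) -> int:
    """Drop revoked samples from the manifest and record what was done.

    Mutates `training_manifest` and `audit_log` in place and returns the
    number of samples actually removed.
    """
    revoked = set(event["sample_ids"])
    removed = sorted(training_manifest & revoked)
    training_manifest.difference_update(revoked)
    audit_log.append({
        "event_id": event["event_id"],
        "subject_id": event["subject_id"],
        "removed": removed,
        "handled_at": datetime.now(timezone.utc).isoformat(),
    })
    return len(removed)

# Hypothetical revocation event and training manifest.
event = {
    "event_id": "evt_001",
    "type": "consent.revoked",
    "subject_id": "tw_a4f2_9821",
    "sample_ids": ["tw_sample_a4f2_9821_00184", "tw_sample_a4f2_9821_00185"],
}
manifest = {"tw_sample_a4f2_9821_00184", "tw_sample_b7c1_0042_00001"}
audit_log = []

removed_count = apply_revocation(event, manifest, audit_log)
print(removed_count, sorted(manifest))
# 1 ['tw_sample_b7c1_0042_00001']
```

Replaying the same event removes nothing further but still appends an audit entry, so retried deliveries stay visible in the log.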

Is the data structured for our training pipeline?

We deliver in standard ML-friendly formats — Parquet, TFRecord, WebDataset, raw JPEG/PNG/WAV depending on modality. Metadata as JSON or Avro alongside each sample. Cryptographic provenance attached as sidecar files. We work with your data engineering team on bespoke pipelines for Strategic partnerships.
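For raw-file deliveries, an ingestion step could pair each sample with its metadata and provenance sidecar before anything reaches a training job. The sketch below assumes a flat directory and a naming scheme of `<name>.json` for metadata and `<name>.provenance.json` for the sidecar. Both conventions, and the function `index_delivery`, are hypothetical.

```python
import json
import tempfile
from pathlib import Path

SAMPLE_SUFFIXES = {".jpg", ".jpeg", ".png", ".wav"}

def index_delivery(root: Path) -> tuple[dict, list]:
    """Pair each raw sample with its metadata and provenance sidecar.

    Returns (index, incomplete): index maps a sample filename to its parsed
    sidecars; incomplete lists samples missing either file.
    """
    index, incomplete = {}, []
    for path in sorted(root.iterdir()):
        if path.suffix.lower() not in SAMPLE_SUFFIXES:
            continue  # skip the sidecar files themselves
        meta = path.parent / (path.stem + ".json")
        prov = path.parent / (path.stem + ".provenance.json")
        if meta.exists() and prov.exists():
            index[path.name] = {
                "metadata": json.loads(meta.read_text()),
                "provenance": json.loads(prov.read_text()),
            }
        else:
            incomplete.append(path.name)
    return index, incomplete

# Build a tiny fake delivery: one complete sample, one without sidecars.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "s1.jpg").write_bytes(b"fake image bytes")
    (root / "s1.json").write_text(json.dumps({"modality": "image/jpeg"}))
    (root / "s1.provenance.json").write_text(json.dumps({"consent_status": "active"}))
    (root / "s2.wav").write_bytes(b"fake audio bytes")
    index, incomplete = index_delivery(root)

print(sorted(index), incomplete)  # ['s1.jpg'] ['s2.wav']
```

Treating a sample without sidecars as incomplete, rather than loading it anyway, keeps unprovenanced files out of the training set by default.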

Can we license exclusively?

Yes, but it's more nuanced than yes/no. You can license a cohort exclusively for a specific use-case (e.g. "only this lab trains text-to-image models on this cohort for 24 months") without locking the humans out of other deal types. The humans retain their separate commercial licensing rights. Full exclusivity across all use-cases is available on Strategic plans and priced accordingly.

How do you handle synthetic data we generate from licensed source?

Synthetic outputs derived from licensed source data fall under the licence envelope of the source. The licence specifies whether synthetic derivatives are permitted, how they can be distributed, and whether they require their own provenance trail. For most cohort deals, synthetic outputs are permitted with disclosure obligations. Strategic deals can negotiate cleaner terms.

What about minors? We won't touch under-18s — confirm you don't either.

Twins under 18 are not available for AI training data licensing. Full stop. The registry has under-18 accounts (parental-guardian managed) for narrow commercial licensing categories only — never for training data. This is a hard line we won't cross.

How do twins get paid?

Per-sample compensation on Evaluation deals. Per-sample plus revenue share on Cohort deals (revenue share triggers when the trained model generates revenue or is deployed at scale). Strategic deals support bespoke compensation — fixed annual payments, milestone-based, or hybrid. Humans always get paid. That's the deal — and that's what makes the deal defensible.

Who else have you worked with?

We're under NDA on specific partners. We can reference named partners under mutual NDA in a first call. The shape of partners: research labs in foundation modelling, video generation, voice cloning, and fairness benchmarking. Mostly in the UK, EU, and US.

What's the engagement process?

Evaluation: usually two calls, then a signed evaluation agreement, then sample delivery in 1–2 weeks. Cohort: mutual NDA, then a detailed scoping conversation (cohort spec, demographic requirements, modality mix, compensation structure), then a 4–6 week negotiation to signed contract and first cohort delivery. Strategic: timed to your research roadmap. We start every relationship at the founder level.

Build the next model on data that holds up.
