The consented dataset your model has been missing. A growing registry of identity-verified humans who have explicitly licensed their likeness for AI training. Demographically representative. Fully consented. Provenance you can prove.
For most of the last decade, the bottleneck in human-likeness AI wasn't licensing; it was compute. That has reversed. Labs can now train models faster than they can defend the legal basis of the data underneath them.
Three things are happening simultaneously: courts are starting to award damages for unauthorised use of likeness; regulators are demanding training-data provenance for foundation models; and the public is realising that "the model was trained on millions of images" means "the model was trained on me."
Twinnin's bet is simple: the next generation of human-likeness models will be built on data with provenance, consent, and ongoing payment. Not because labs will become altruistic, but because models without that paper trail will become commercially unusable.
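Most of the EU AI Act applies from 2 August 2026, including the European Commission's power to fine providers of general-purpose (foundation) models. Those providers must publish a summary of the content used to train their models and maintain a copyright policy. Where that content includes faces or voices, the GDPR also requires them to demonstrate a lawful basis such as consent and to honour withdrawal and erasure requests within defined timelines.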
The US is twelve to eighteen months behind, not five years. The proposed federal NO FAKES Act, California §927, and state-level training-data disclosure laws are all moving forward. Models built on scraped data will face active commercial restrictions; models built on licensed data will not.
This is what every sample carries — whether you're licensing 100 humans for a research preview or 25,000 for a foundation model training run. The properties are the product.
Most training datasets are accidentally a sample of who had cameras pointed at them — overrepresenting some demographics by orders of magnitude. We're building Twinnin to be deliberately representative, not accidentally biased.
Right now we're 2,500+ verified humans across 18 territories, growing fastest in the UK, EU, and US. We track gaps actively and run targeted outreach in underrepresented categories — older adults, non-Western markets, disabled humans, and rare phenotypes that scraped datasets systematically miss.
If your model needs a specific demographic mix, ask. If we don't have the coverage today, we'll tell you, and tell you when we will.
We structure deals around how you train and what you ship. Per-sample for evaluation work. Cohort-based for production training. Strategic for foundation-model partnerships. All three carry the same provenance, consent, and compensation guarantees.
Per-sample: for research previews, benchmarking, and proof-of-concept work. Limited cohort, limited duration, non-production use.
Cohort-based: for production training runs. Custom-built cohorts to your demographic and modality spec. Ongoing access, ongoing payment to the humans.
Strategic: multi-year partnerships for labs building foundation models on consented human data. Exclusivity options, co-investment, bespoke terms.
Every image, video frame, and audio clip in your dataset arrives with a cryptographic chain that proves it came from a consenting human at a specific moment, with a specific licence, with a specific use-case attestation from you.
The chain is open. Anyone — your auditor, your insurer, a regulator, your distribution partner — can verify a sample independently without contacting Twinnin. We're not the trust gatekeeper. The cryptography is.
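To make the idea of an independently checkable chain concrete, here is a minimal hash-chain sketch. It is not Twinnin's implementation: the link kinds (`sample`, `consent`, `licence`, `attestation`), the payload fields, and the functions `build_chain` and `verify_chain` are all hypothetical. Each link commits to the SHA-256 of the previous link, and the first link commits to the sample's own bytes, so altering the sample or any consent, licence, or attestation record breaks verification. A production system would also need public-key signatures on each link (or a signed, published chain head) so a third party can confirm who issued it; this sketch assumes the verifier already holds a trusted head hash and checks integrity only.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    kind: str                 # hypothetical kinds: "sample", "consent", "licence", "attestation"
    payload: dict             # hypothetical, JSON-serialisable fields (subject ID, timestamp, terms...)
    prev_hash: Optional[str]  # hex SHA-256 of the previous link; None for the first link

    def digest(self) -> str:
        # Canonical JSON (sorted keys, no whitespace) so every verifier hashes identical bytes.
        body = json.dumps(
            {"kind": self.kind, "payload": self.payload, "prev": self.prev_hash},
            sort_keys=True,
            separators=(",", ":"),
        ).encode("utf-8")
        return hashlib.sha256(body).hexdigest()


def build_chain(sample_bytes: bytes, consent: dict, licence: dict, attestation: dict) -> list:
    """Assemble a chain whose first link binds it to the exact sample content."""
    links = []
    prev = None
    for kind, payload in [
        ("sample", {"sha256": hashlib.sha256(sample_bytes).hexdigest()}),
        ("consent", consent),
        ("licence", licence),
        ("attestation", attestation),
    ]:
        link = Link(kind, payload, prev)
        links.append(link)
        prev = link.digest()
    return links


def verify_chain(sample_bytes: bytes, links: list, expected_head: str) -> bool:
    """Recompute every hash; needs only the sample, the links, and a trusted head hash."""
    if not links or links[0].kind != "sample":
        return False
    # The sample itself must match the content hash recorded in the first link.
    if links[0].payload.get("sha256") != hashlib.sha256(sample_bytes).hexdigest():
        return False
    prev = None
    for link in links:
        # Each link must point at the digest of the link before it.
        if link.prev_hash != prev:
            return False
        prev = link.digest()
    # The final digest must equal the head hash obtained from a trusted source.
    return prev == expected_head


if __name__ == "__main__":
    frame = b"example frame bytes"
    chain = build_chain(
        frame,
        {"subject": "subj-001", "ts": "2025-01-01T00:00:00Z"},
        {"terms": "training-only"},
        {"use_case": "t2i-debias"},
    )
    head = chain[-1].digest()
    print(verify_chain(frame, chain, head))          # True
    print(verify_chain(b"tampered", chain, head))    # False
```

Because verification needs only the sample bytes, the links, and a trusted head hash, an auditor or insurer can run it offline without contacting the issuer, which is the property described above.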
This is what training data should look like. Provenance-first, regulator-ready, distributable, and durable across the next decade of AI compliance regimes.
Representative examples of how research and engineering teams are using the registry today. Specific partners are under NDA; names are available once a mutual NDA is signed.
Mid-sized lab using a 4,000-human cohort to reduce demographic bias in a next-generation text-to-image model. Balanced across age, ethnicity, and territory.
Video AI lab licensing high-fidelity multi-modal data on a smaller cohort to support consistent character generation across long-form output. Each human is a "named character" in the model's latent space.
Speech AI lab using audio-consented twins to train a voice generation model that ships with on-deployment consent verification — no synthetic voice can run without a registered speaker.
Academic-industrial lab building a fairness benchmark for face recognition systems. Balanced cohort across all major demographic axes, with participants' consent to publish results.
If you're training on human likeness today, you'll need licensed training data tomorrow. Start the conversation now and lock in cohort terms before the regulation tightens further.