Biodata Generator 1.0.0
Procedural human physical characteristics generation for C++23
This guide covers every feature of the biodata-generator library in detail. For a quick overview, see the README. For the full API reference, run doxygen Doxyfile from the repository root and open doc/api/html/index.html.
1. Copy dasmig/biodatagen.hpp and dasmig/random.hpp into your include path.
2. Copy the resources/ folder (containing full/ and/or lite/ subdirectories with biodata.tsv) so it is accessible at runtime.
3. Compile with -std=c++23.

The library ships two dataset tiers:
| Tier | Enum | Countries | Description |
|---|---|---|---|
| lite | dasmig::dataset::lite | ~111 | Countries with best phenotypic data coverage |
| full | dasmig::dataset::full | ~197 | All countries; gaps filled with regional defaults |
The singleton (bdg::instance()) auto-probes these paths on first access:
- resources/lite/biodata.tsv
- resources/full/biodata.tsv
- ../resources/lite/biodata.tsv
- biodata-generator/resources/lite/biodata.tsv

Every biodata object contains:
| Field | Type | Description |
|---|---|---|
| country_code | std::string | ISO 3166-1 alpha-2 code |
| bio_sex | dasmig::sex | Biological sex (male/female) |
| height_cm | double | Height in centimetres |
| weight_kg | double | Weight in kilograms |
| bmi | double | Body mass index (kg/m²) |
| eyes | dasmig::eye_color | Eye colour (blue/intermediate/brown) |
| hair | dasmig::hair_color | Hair colour (black/brown/blond/red) |
| skin | dasmig::skin_type | Fitzpatrick skin type (I–VI) |
| blood | dasmig::blood_type | ABO/Rh blood type |
| hand | dasmig::handedness | Handedness (right/left) |
Additionally:
- seed(): the random seed used for this generation (for replay).
- to_string(): human-readable summary.
- operator std::string() and operator<<.

All phenotypic traits use typed enumerations. String conversion helpers are static methods on biodata:
dasmig::sex

| Value | Label |
|---|---|
| male | "male" |
| female | "female" |
dasmig::eye_color

| Value | Label | Description |
|---|---|---|
| blue | "blue" | Blue eyes |
| intermediate | "intermediate" | Green, hazel, amber |
| brown | "brown" | Brown eyes |
dasmig::hair_color

| Value | Label |
|---|---|
| black | "black" |
| brown | "brown" |
| blond | "blond" |
| red | "red" |
dasmig::skin_type

Fitzpatrick scale types I through VI, based on UV sensitivity and melanin content.
dasmig::blood_type

| Value | Label |
|---|---|
| O_pos | "O+" |
| A_pos | "A+" |
| B_pos | "B+" |
| AB_pos | "AB+" |
| O_neg | "O-" |
| A_neg | "A-" |
| B_neg | "B-" |
| AB_neg | "AB-" |
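
The value-to-label mapping for blood types can be illustrated with a small standalone sketch. The enum here is a stand-in that mirrors the documented enumerators, and blood_label is a hypothetical helper name; the library's own conversion helpers are static methods on biodata and may be named and shaped differently.

```cpp
#include <string_view>

// Stand-in for dasmig::blood_type; enumerator names follow the documented values.
enum class blood_type { O_pos, A_pos, B_pos, AB_pos, O_neg, A_neg, B_neg, AB_neg };

// Hypothetical helper (not the library's API): maps each enumerator to its label.
constexpr std::string_view blood_label(blood_type b)
{
    switch (b) {
    case blood_type::O_pos:  return "O+";
    case blood_type::A_pos:  return "A+";
    case blood_type::B_pos:  return "B+";
    case blood_type::AB_pos: return "AB+";
    case blood_type::O_neg:  return "O-";
    case blood_type::A_neg:  return "A-";
    case blood_type::B_neg:  return "B-";
    case blood_type::AB_neg: return "AB-";
    }
    return "unknown"; // only reached for values outside the enumeration
}
```

A switch over a scoped enum lets the compiler warn when a new enumerator is added without a matching label.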
dasmig::handedness

| Value | Label |
|---|---|
| right | "right" |
| left | "left" |
Pass an ISO 3166-1 alpha-2 country code (for example "US") to get_biodata().
Different countries produce different distributions. For example, the Netherlands produces taller individuals on average than Guatemala, and Japan produces predominantly brown-eyed individuals while Scandinavian countries produce more blue-eyed individuals.
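
One way to picture country-dependent categorical traits is a weighted draw whose weights change per country. The sketch below is an illustration only: sample_eye_color is a hypothetical function, the enum is a stand-in for dasmig::eye_color, and the weights are supplied by the caller rather than taken from the library's dataset.

```cpp
#include <array>
#include <random>

// Stand-in for dasmig::eye_color, in the documented order.
enum class eye_color { blue, intermediate, brown };

// Hypothetical helper: draws one eye colour from relative weights given in the
// order blue, intermediate, brown. The weights need not sum to 1.
template <class Rng>
eye_color sample_eye_color(const std::array<double, 3>& weights, Rng& rng)
{
    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    return static_cast<eye_color>(pick(rng));
}
```

Under this model, a country with mostly brown-eyed people would simply carry a larger third weight, while the sampling code stays the same for every country.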
Three levels of determinism:
Both seed() and unseed() return *this, so calls can be chained.
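
A minimal sketch of the chaining pattern, using a toy class rather than the library itself. Only the shape of seed() and unseed() returning *this comes from the text; toy_generator, next(), and the choice to reseed from std::random_device in unseed() are assumptions made for illustration.

```cpp
#include <cstdint>
#include <random>

// Hypothetical toy generator; not the library's bdg class.
class toy_generator {
public:
    // Fix the engine state so that subsequent draws are reproducible.
    toy_generator& seed(std::uint64_t value)
    {
        engine_.seed(value);
        return *this;
    }

    // Assumed behaviour: go back to non-deterministic seeding.
    toy_generator& unseed()
    {
        engine_.seed(std::random_device{}());
        return *this;
    }

    // Draw one raw 64-bit value from the engine.
    std::uint64_t next() { return engine_(); }

private:
    std::mt19937_64 engine_{std::random_device{}()};
};
```

Because both setters return a reference to the same object, a call such as g.seed(42).next() reseeds and draws in one expression, and repeating it yields the same value.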
Each call to get_biodata() runs this pipeline:
1. Sex: Bernoulli draw, skipped when sex is passed as a parameter.
2. Height: std::normal_distribution with country/sex-specific mean and SD. Clamped to ±4σ.
3. BMI: std::lognormal_distribution parameterised from country/sex mean and global SD (4.5). Clamped to [14, 55].
4. Weight: BMI × (height_m)².
5. Eye colour: std::discrete_distribution over 3 categories.
6. Hair colour: std::discrete_distribution over 4 categories.
7. Skin type: std::discrete_distribution over 6 Fitzpatrick types.
8. Blood type: std::discrete_distribution over 8 ABO/Rh groups.
9. Handedness: std::bernoulli_distribution from country-specific left-handedness rate.

Note: When sex is forced via parameter, the Bernoulli draw in step 1 is skipped, so the internal RNG sequence diverges from an unfixed call with the same seed that happened to land on the same sex. This means get_biodata("US", seed) and get_biodata("US", sex::male, seed) produce different height/BMI values even when the unfixed call would have chosen male.
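
The sex, height, BMI, and weight stages can be sketched with standard-library distributions. This is an illustration under stated assumptions, not the library's implementation: sample_body, body_params, and body_sample are hypothetical names, the 0.5 sex probability is assumed, and the lognormal parameters are obtained by moment matching (σ² = ln(1 + s²/m²), μ = ln m − σ²/2), which the guide does not confirm.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <optional>
#include <random>

enum class sex { male, female };

// Hypothetical per-country, per-sex inputs.
struct body_params {
    double height_mean_cm;
    double height_sd_cm;
    double bmi_mean;
};

struct body_sample {
    sex bio_sex;
    double height_cm;
    double bmi;
    double weight_kg;
};

inline body_sample sample_body(const body_params& male, const body_params& female,
                               std::uint64_t seed,
                               std::optional<sex> forced = std::nullopt)
{
    std::mt19937_64 rng{seed};

    // Sex: Bernoulli draw (0.5 is an assumption), skipped when the caller fixes it.
    const sex s = forced ? *forced
                         : (std::bernoulli_distribution{0.5}(rng) ? sex::male : sex::female);
    const body_params& p = (s == sex::male) ? male : female;

    // Height: normal draw clamped to mean ± 4 SD.
    std::normal_distribution<double> height_dist{p.height_mean_cm, p.height_sd_cm};
    const double height = std::clamp(height_dist(rng),
                                     p.height_mean_cm - 4.0 * p.height_sd_cm,
                                     p.height_mean_cm + 4.0 * p.height_sd_cm);

    // BMI: lognormal moment-matched to the mean and the global SD of 4.5 (assumed
    // parameterisation), then clamped to [14, 55].
    constexpr double bmi_sd = 4.5;
    const double sigma2 = std::log(1.0 + (bmi_sd * bmi_sd) / (p.bmi_mean * p.bmi_mean));
    const double mu = std::log(p.bmi_mean) - 0.5 * sigma2;
    std::lognormal_distribution<double> bmi_dist{mu, std::sqrt(sigma2)};
    const double bmi = std::clamp(bmi_dist(rng), 14.0, 55.0);

    // Weight: BMI × (height in metres)².
    const double height_m = height / 100.0;
    return {s, height, bmi, bmi * height_m * height_m};
}
```

Calling sample_body twice with the same seed and no forced sex reproduces the same sample. Passing a forced sex skips the first draw, so the engine is one step behind and the height and BMI values will generally differ, which is the divergence the note describes.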
The resource TSV files are built from raw data using Python scripts in scripts/:
- fetch_all_data.py: Downloads height, eye, hair, skin, handedness data.
- parse_blood_types.py: Parses blood type data from Wikipedia.
- fetch_bmi.py: Discovers BMI indicator IDs from OWID.
- prepare_biodata.py: Merges all sources, maps ISO codes, fills gaps with regional defaults, and outputs the final TSV files.
- validate_data.py: Sanity checks on the output TSV files.

Each bdg instance is independent. Concurrent calls to get_biodata() on the same instance require external synchronisation (e.g. a std::mutex).
The singleton (bdg::instance()) returns a shared instance; wrap calls in a mutex if it is used from multiple threads.
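
External synchronisation can be as simple as pairing the generator with a mutex. The wrapper below is a hypothetical sketch (locked_generator and with_lock are not part of the library) that owns any generator-like object and runs a callable on it while holding the lock.

```cpp
#include <mutex>
#include <random>
#include <utility>

// Hypothetical wrapper: serialises all access to the generator it owns.
template <class Generator>
class locked_generator {
public:
    explicit locked_generator(Generator gen) : gen_(std::move(gen)) {}

    // Run fn(generator) while holding the lock and return its result.
    template <class Fn>
    auto with_lock(Fn&& fn)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        return fn(gen_);
    }

private:
    std::mutex mutex_;
    Generator gen_;
};
```

For the shared singleton, which cannot be moved into a wrapper like this, the same idea would apply with a separate std::mutex that every calling thread locks before using the instance.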
| Exception | When |
|---|---|
| std::runtime_error("No biodata loaded. Call load() first.") | Any generation call before load() |
| std::invalid_argument("Unknown country code: XX") | Unknown ISO alpha-2 code |
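
The sketch below shows one way calling code might distinguish the two failures. toy_bdg is a hypothetical stand-in that only reproduces the documented exception types and messages, its country list is a placeholder, and describe_failure is an illustrative helper rather than anything the library provides.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in that throws the two documented exceptions.
class toy_bdg {
public:
    // Placeholder data; the real library reads biodata.tsv.
    void load() { countries_ = {"US", "NL", "JP"}; }

    std::string get_biodata(const std::string& country) const
    {
        if (countries_.empty())
            throw std::runtime_error("No biodata loaded. Call load() first.");
        if (countries_.count(country) == 0)
            throw std::invalid_argument("Unknown country code: " + country);
        return country; // a real generator would return a biodata object
    }

private:
    std::set<std::string> countries_;
};

// Turns either failure into a short diagnostic string instead of propagating it.
inline std::string describe_failure(const toy_bdg& gen, const std::string& country)
{
    try {
        gen.get_biodata(country);
        return "ok";
    } catch (const std::invalid_argument& e) {
        return std::string("bad input: ") + e.what();
    } catch (const std::runtime_error& e) {
        return std::string("not ready: ") + e.what();
    }
}
```

The two handlers are independent because std::invalid_argument derives from std::logic_error, not from std::runtime_error, so a missing dataset and a bad country code can be reported separately.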