We start by simulating a simple case of unbiased cultural transmission. We will detail each step of the simulation and explain the code line by line. In the following chapters, we will reuse most of this initial model, building up the complexity of our simulations.
1.1 Initialising the simulation
Here we will simulate a case where each of N individuals possesses one of two mutually exclusive cultural traits. We denote these alternative traits as A and B. For example, A might be eating a vegetarian diet, and B might be eating a non-vegetarian diet. In reality, traits are seldom as clear-cut (e.g. what about pescatarians?), but models are designed to cut away all the complexity to give tractable answers to simplified situations.
Our model has non-overlapping generations. That is, in each generation all N individuals die and are replaced with N new individuals. Again, this is an extreme but common assumption in evolutionary models. It provides a simple way of simulating change over time. Generations here could correspond to biological generations, but could equally be ‘cultural generations’ (or learning episodes).
Each new individual of each new generation picks a member of the previous generation at random and copies their cultural trait. This is known as unbiased oblique cultural transmission. It is unbiased because traits are copied entirely at random. The term oblique means that members of one generation learn from those of the previous, non-overlapping, generation. This is different from, for example, horizontal cultural transmission, where individuals copy members of the same generation, and vertical cultural transmission, where offspring copy their biological parents.
Given the two traits A and B and an unbiased oblique cultural transmission, how is their average frequency in the population going to change over time? To answer this question, we need to keep track of the frequency of both traits. We will use p to indicate the proportion of the population with trait A, which is simply the number of all individuals with trait A divided by the number of all individuals. Because we only have two mutually exclusive traits in our population, we know that the proportion of individuals with trait B must be (1 − p). For example, if 70% of the population have trait A (p = 0.7), then the remaining 30% must have trait B (i.e. 1 − p = 1 − 0.7 = 0.3).
The output of the model will be a plot showing p over all generations up to the last generation. Generations (or time steps) are denoted by t, where generation one is t = 1, generation two is t = 2, up to the last generation t = tmax.
First, we need to specify the fixed parameters of the model. These are quantities that we decide on at the start and do not change during the simulation. In this model, these are N (the number of individuals) and t_max (the number of generations). Let’s start with N = 100 and t_max = 200:
N <- 100 t_max <- 200
Now we need to create our individuals. The only information we need to keep about our individuals is their cultural trait (A or B). We’ll call population the data structure containing the individuals. The type of data structure we have chosen here is a tibble. This is a more user-friendly version of a data.frame. Tibbles, and the tibble command, are part of the tidyverse library, which we need to call before creating the tibble. We will use other commands from the tidyverse throughout the book.
Initially, we’ll give each individual either an A or B at random, using the sample() command. This can be seen in the following code chunk. The sample() command takes three arguments (i.e. inputs or options). The first argument lists the elements to pick at random, in our case, the traits A and B. The second argument gives the number of times to pick, in our case N times, once for each individual. The final argument says to replace or reuse the elements specified in the first argument after they’ve been picked (otherwise there would only be one copy of A and one copy of B, so we could only assign traits to two individuals before running out). Within the tibble command, the word trait denotes the name of the variable within the tibble that contains the random As and Bs, and the whole tibble is assigned the name population.
library(tidyverse) population <- tibble(trait = sample(c(“A”, “B”), N, replace = TRUE))
We can see the cultural traits of our population by simply entering its name in the R console:
population
## # A tibble: 100 × 1 ## trait ## <chr> ## 1 A ## 2 A ## 3 B ## 4 A ## 5 B ## 6 B ## 7 B ## 8 B ## 9 A ## 10 A ## # … with 90 more rows
As expected, there is a single column called trait containing As and Bs. The type of the column, in this case <chr> (i.e. character), is reported below the name.
A specific individual’s trait can be retrieved using the square bracket notation in R. For example, individual 4’s trait can be retrieved by typing:
population$trait[4] ## [1] “A”
This matches the fourth row in the previous table.
We also need a tibble to record the output of our simulation, that is, to track the trait frequency p in each generation. This will have two columns with tmax rows, one row for each generation. The first column is simply a counter of the generations, from 1 to tmax. This will be useful for plotting the output later. The other column should contain the values of p for each generation.
At this stage we don’t know what p will be in each generation, so for now let’s fill the output tibble with ‘NA’s, which is R’s symbol for Not Available, or missing value. We can use the rep() (repeat) command to repeat ‘NA’ tmax times. We’re using ‘NA’ rather than, say, zero, because zero could be misinterpreted as p = 0, which would mean that all individuals have trait B. This would be misleading, because at the moment we haven’t yet calculated p, so it’s non-existent, rather than zero.
output <- tibble(generation = 1:t_max, p = rep(NA, t_max))
We can, however, fill in the first value of p for our already-created first generation of individuals, held in population. The following command sums the number of As in population and divides it by N to get a proportion rather than an absolute number. It then puts this proportion in the first slot of p in output, the one for the first generation, t = 1. We can again write the name of the tibble, output, to see that it worked.
output$p[1] <- sum(population$trait == “A”) / N output
## # A tibble: 200 × 2 ## generation p ## <int> <dbl> ## 1 1 0.54 ## 2 2 NA ## 3 3 NA ## 4 4 NA ## 5 5 NA ## 6 6 NA ## 7 7 NA ## 8 8 NA ## 9 9 NA ## 10 10 NA ## # … with 190 more rows
This first value of p will be close to 0.5, meaning that around 50 individuals have trait A and 50 have trait B. Even though sample() returns either trait with equal probability, this does not necessarily mean that we will get exactly 50 As and 50 Bs. This happens with simulations and finite population sizes: they are probabilistic (or stochastic), not deterministic. Analogously, flipping a coin 100 times will not always give exactly 50 heads and 50 tails. Sometimes we will get 51 heads, sometimes 49, and so on. To see this in our simulation, you can rerun all of the previous code and you will get a different p.