Title: | Forward-in-Time Simulation and Tallying of Pairwise Relationships |
---|---|
Description: | Provides an R wrapper around the program 'spip' (<https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1471-8286.2005.00884.x>), a C program for the simulation of pedigrees within age-structured populations with user-specified life histories. Also includes a variety of functions to parse 'spip' output to compile information about related pairs amongst simulated, sampled individuals, to assess the feasibility and potential accuracy of close-kin mark-recapture (CKMR). Full documentation and vignettes are mirrored at <https://eriqande.github.io/CKMRpop/index.html> and can be read online there. |
Authors: | Eric C. Anderson [aut, cre] |
Maintainer: | Eric C. Anderson <[email protected]> |
License: | CC0 |
Version: | 0.1.3.9999 |
Built: | 2024-11-22 04:53:06 UTC |
Source: | https://github.com/eriqande/ckmrpop |
This merely converts a matrix to a tibble that can be plotted easily using ggplot.
amm2tibble(M)
amm2tibble(M)
M |
an ancestry-match matrix |
amm2tibble()
returns a tibble with three columns:
x
: the 1-based index of the row of input matrix,
y
: the 1-based index of the column of the input matrix,
amm
: the logical value (TRUE/FALSE) of the (x,y)-th cell of the input matrix.
# convert one of the simple example AMMs to a tibble amm2tibble(example_amms$Father_Offspring_2gen)
# convert one of the simple example AMMs to a tibble amm2tibble(example_amms$Father_Offspring_2gen)
This counts up all the different types of pairwise relationships and
makes a plot (or multiple pages of them). For an example, see
the Vignette: vignette("species_1_simulation", package = "CKMRpop")
.
count_and_plot_ancestry_matrices( Pairs, nrow = 6, ncol = 5, white_ball_size = 1 )
count_and_plot_ancestry_matrices( Pairs, nrow = 6, ncol = 5, white_ball_size = 1 )
Pairs |
a tibble like that returned by |
nrow |
the number of rows to plot per page |
ncol |
the number of columns to plot per page |
white_ball_size |
the size of the white dot plotted on top of the primary shared ancestors' cells. |
Returns a list with the following components:
highly_summarised
: a tibble reporting counts of different types of relationships with
the columns:
dom_relat
: The name of the dominant relationship (e.g. Si, PO, A, etc).
max_hit
: The number of primary shared ancestors.
n
: The number of occurrences of this type of relationship.
dr_counts
: A tibble with the counts of all the different, unique ancestry match matrices
observed, with the following columns:
dom_relat
: The dominant relationship type
dr_hits
: list column with the number of primary shared ancestors in the upper and lower quadrants
of the ancestry match matrix (values are repeated for symmetrical relationships).
max_hit
: the number of primary shared ancestors.
anc_match_matrix
: a list column of ancestry match matrices.
n
: The number of pairs with this type of ancestry match matrix
tot_dom
: The total number of pairs within the given dominant relationship category.
ID
: A unique, properly sorting name for this category for placing a title on
facets of plots of these ancestry match matrices.
dr_plots
: a list named by the dominant relationship. Each component is a ggplot
object which is a plot of the ancestry match matrices, faceted by their ID
's as
given in dr_counts
.
anc_mat_counts
: A tibble summarizing the counts of different ancestry match matrices
without regard to dominant relationship type, etc. Rows are arranged in descending order
of number of pairs observed with each ancestry match matrix. It has the columns:
anc_match_matrix
: a list column of ancestry match matrices.
n
: the number of such matrices observed.
anc_mat_plots
: a list of ggplot pages. These are plots faceted by ancestry match
matrix of all ancestry match matrices observed, ordered by number of occurrences (as
given in anc_mat_counts
).
This just tallies up the information from the pedigree. It will plot things faceted by pop (over rows) and sexes (over columns).
count_and_plot_mate_distribution(P)
count_and_plot_mate_distribution(P)
P |
the pedigree from the simulation, like that returned in the |
A list with two components with names:
mate_counts
: A tibble with information about the number of mates with which
a parent produced offspring each year. It has the columns:
sex
: the sex of this parent
year
: the year during which the mating occurred
pop
: the population this parent was in
parent
: the ID of the parent
num_offs
: the number of offspring this parent had in total
num_mates
: the number of mates this parent had
plot_mate_counts
: a ggplot object, faceted on a grid by population in columns
and sex in rows. The x-axis is the number of offspring (in a season), the y-axis is the
number of mates in a season, and the fill color of the grid gives the number of parents
with that number of offspring and mates.
result <- count_and_plot_mate_distribution(three_pops_no_mig_slurped_results$pedigree) # have a look at the results: result$mate_counts result$plot_mate_counts
result <- count_and_plot_mate_distribution(three_pops_no_mig_slurped_results$pedigree) # have a look at the results: result$mate_counts result$plot_mate_counts
This discards individuals from the sample, randomly, until the desired number of samples is achieved, then it returns only those pairs in which both members are part of the retained samples.
downsample_pairs(S, P, n)
downsample_pairs(S, P, n)
S |
the tibble of samples with columns at least of |
P |
the tibble of pairs. Typically this will be what has been returned from
|
n |
The desired number of individuals (or instances, really, see below) to retain in the sample. |
This returns a list with two components as follows:
ds_samples
: A tibble like S
except having randomly removed individuals
so as to only have n left.
ds_pairs
: A tibble like P
except having removed any pairs that
include individuals that were not retained in the sample.
# prepare some input S <- three_pops_with_mig_slurped_results$samples P <- compile_related_pairs(three_pops_with_mig_slurped_results$samples) result <- downsample_pairs(S, P, n = 500) # print the result result
# prepare some input S <- three_pops_with_mig_slurped_results$samples P <- compile_related_pairs(three_pops_with_mig_slurped_results$samples) result <- downsample_pairs(S, P, n = 500) # print the result result
compile_related_pairs()
, but with zero rowsThis is included here so that it can be used as the return value for
compile_related_pairs()
when there are no relatives.
empty_crp
empty_crp
Just created it from the example
This is a list of matrices with names that describe what they represent. The names are Relationship_Number-of-Generations, like: "Unrelated_2gen", "Self_2gen", "Unrelated_3gen", "Unrelated_4gen", "Unrelated_1gen", "Father_Offspring_2gen", "FullSibling_2gen", "PatHalfSibling_2gen", "FullCousin_2gen", "HalfAuntNiece_2gen".
example_amms
example_amms
I just made these for some illustrative figures in the manuscript about this R package.
Easy to do, but quicker to have it wrapped up in a plot.
ggplot_census_by_year_age_sex(census)
ggplot_census_by_year_age_sex(census)
census |
a tibble of census counts with columns |
ggplot_census_by_year_age_sex()
returns a ggplot object which is a
stacked barplot with year on the x-axis,
counts on the y-axis with fill mapped to age. It is facet-gridded
with sex in the columns and populations in the rows.
# A single population example g <- ggplot_census_by_year_age_sex(species_1_slurped_results$census_postkill) # a three-population example g3 <- ggplot_census_by_year_age_sex(three_pops_with_mig_slurped_results$census_postkill)
# A single population example g <- ggplot_census_by_year_age_sex(species_1_slurped_results$census_postkill) # a three-population example g3 <- ggplot_census_by_year_age_sex(three_pops_with_mig_slurped_results$census_postkill)
This is stored as package data so that the vignette can be written even if spip is not installed on the system.
growth_0.01_results
growth_0.01_results
Simulation results
This is stored as package data so that the vignette can be written even if spip is not installed on the system.
growth_n0.005_results
growth_n0.005_results
Simulation results
Just a simple AMM to use in some examples
half_first_cousin_amm
half_first_cousin_amm
Simply wrote this down.
This checks the operating system and installs the correct version (either Darwin or Linux for Mac or Linux, respectively.) To install the spip binary this function downloads it from its GitHub site. It also installs a windows implementation of awk.
install_spip(Dir = tempfile())
install_spip(Dir = tempfile())
Dir |
the directory to install spip into. Because of restrictions
on functions writing to the user's home filespace, this is set, by default,
to a temporary directory. But to really use this function to install spip,
this parameter must be set to |
No return value. Called for side effect of installing the 'spip' binary.
## Not run: install_spip(Dir = system.file(package = "CKMRpop")) ## End(Not run)
## Not run: install_spip(Dir = system.file(package = "CKMRpop")) ## End(Not run)
Here we take the survival rates for females and males and the sex ratio, as well as the annual new cohort size (assumed constant), and we make a leslie-like matrix to compute the stable age distribution.
leslie_from_spip(P, C, growth_rates = NULL, T = NA)
leslie_from_spip(P, C, growth_rates = NULL, T = NA)
P |
a named list of the spip parameters. |
C |
the constant size of the newborn cohort each year |
growth_rates |
If not NULL, a vector of desired growth rates. For each one of these the function will return a leslie matrix, the stable age distribution, the initial number of males and females at that age distribution, and also the cohort sizes until time T (if T is not NA) |
T |
the number of years to compute the cohort sizes for in the growing (or shrinking) populations. |
For growing or shrinking populations, this is currently correct only when the male and female survival rates are the same, and hence the stable age distributions of the two sexes are the same as well.
This function returns a list with the following components:
stable_age_distro_fem
: a vector of the expected number of females in each age
group of 0 up to MaxAge-1 once the stable age distribution has been reached. Note that
this corresponds to the PREKILL_CENSUS from spip. In this vector, the size of the
MaxAge group is left out because this is how it is needed to be to insert into
the --initial-males
and --initial-females
options in spip(). If you want the
size of all age classes, use the output list component
stable_age_distro_fem_with_max_age_class
, described below.
stable_age_distro_male
: same as above, but for males.
stable_age_distro_fem_with_max_age_class
: The expected number of females from age 0 to
MaxAge once the stable age distribution has been reached.
stable_age_distro_male_with_max_age_class
: same as above, but for females.
female_leslie_matrix
: The Leslie matrix implied by the spip parameters in P.
result <- leslie_from_spip(species_1_life_history, 300) # print the result list: result2 <- leslie_from_spip( species_1_life_history, 300, growth_rates = c(-.02, 0, 0.04), T = 60 ) # With several growth rates, including 0.
result <- leslie_from_spip(species_1_life_history, 300) # print the result list: result2 <- leslie_from_spip( species_1_life_history, 300, growth_rates = c(-.02, 0, 0.04), T = 60 ) # With several growth rates, including 0.
This is an integer vector of the num_generations
required in the ancestry vectors
in order to observe each type of relationship in the ancestry match matrix. For example,
if num_generations < 1 you can't see PO
or Si
. This is named according to the
relationship zone names.
I simply defined these
For illustration purposes, if you want to simply plot an ancestry matrix (or several) to show particular values, then this is the handy function for you.
plot_amm_from_matrix(X)
plot_amm_from_matrix(X)
X |
input tibble with a factor or character column |
plot_amm_from_matrix()
returns a ggplot object: each facet is an image of the
ancestry match matrix. It is facet-wrapped over the values in the ID column of X
.
# get some input: all the 2-generation AMMs in `example_amms` X <- example_amms[stringr::str_detect(names(example_amms), "2gen$")] %>% tibble::enframe(name = "ID", value = "anc_match_matrix") # plot those g <- plot_amm_from_matrix(X) + ggplot2::facet_wrap(~ ID)
# get some input: all the 2-generation AMMs in `example_amms` X <- example_amms[stringr::str_detect(names(example_amms), "2gen$")] %>% tibble::enframe(name = "ID", value = "anc_match_matrix") # plot those g <- plot_amm_from_matrix(X) + ggplot2::facet_wrap(~ ID)
This is a simple wrapper for some tidygraph/ggraph functions that will let you find the connected components of a graph in which the related pairs are connected by edges. It also makes a plot of them.
plot_conn_comps(Pairs)
plot_conn_comps(Pairs)
Pairs |
the tibble that comes out of |
Note that it appears that the 'ggraph' package must be loaded for the plot output of this function to print correctly.
This returns a list with two components:
conn_comps
: a tibble with three columns:
name
: the name of the sample
cluster
: the index of the connected component the sample belongs to
cluster_size
: the number of samples belonging
to that cluster
plot
: a ggraph/ggplot plot object showing the connected components as vertices with
edges between them.
# get a Pairs tibble from the stored data Pairs <- compile_related_pairs(species_1_slurped_results_1gen$samples) PCC <- plot_conn_comps(Pairs) # look at the conn_comps: head(PCC$conn_comps) # if you want to print the plot, that seems to require # loading the ggraph library library(ggraph) PCC$plot
# get a Pairs tibble from the stored data Pairs <- compile_related_pairs(species_1_slurped_results_1gen$samples) PCC <- plot_conn_comps(Pairs) # look at the conn_comps: head(PCC$conn_comps) # if you want to print the plot, that seems to require # loading the ggraph library library(ggraph) PCC$plot
A simple character vector of 15 relationship zones in the order they are encountered when traversing an ancestry match matrix out to four generations, ordered by contribution of a single ancestor in the zone to kinship and, within that, increasing in the maximum number of meioses between the shared ancestor and either of the two pair members.
I simply defined these
This is primarily for plotting a figure in the paper about this package,
showing where all the relationship zones are. It merely cycles over the
possible relationships in relationship_zone_names
and produces one or
two rows in a tibble for each that has the corners of the rectangle of that
zone in the columns xmin, xmax, ymin, and ymax. It is designed to be overlaid
upon the ancestry_match_matrix plots. There are some additional columns that give
us the midpoint of the area, etc.
relationship_zone_perimeters()
relationship_zone_perimeters()
Returns a tibble with the following columns:
which_matrix
: a column of values M1
or M2
. M1 denotes that the row's values
are for the relationship zone found in or below the lower diagonal of the ancestry match matrix
and M2 denotes that the row's value are of the zone found in the upper part of the
ancestry match matrix. Symmetrical relationships are considered to be M1.
zone
: The abbreviation for the relationship (e.g., Se, PO, Si, etc.)
xmin
: The left-hand x value of the zone.
xmax
: The right-hand x value of the zone.
ymin
: The bottom y value of the zone.
ymax
: The top y value of the zone.
area
: The area in unit squares of the zone.
xmid
: The x midpoint of the zone.
ymid
: The y midpoint of the zone.
relationship_zone_perimeters()
relationship_zone_perimeters()
Given the ancestry vectors of a collection of individuals (like the set of all sampled individuals from a simulation) this function just figures out who is a relative of whom. Relatives are defined as individuals that share at least one ID within their ancestry vectors. The way this function works is by, for each ID, making a vector of all the other IDs that include that original ID in its ancestry vectors. Then, for each individual, these vectors are catenated (one for each ancestor) then values are made unique and sorted, and returned in a list column of relatives.
relatives_from_ancestry_vectors(A)
relatives_from_ancestry_vectors(A)
A |
a tibble that has columns of |
This function returns the original input, A
, with an additional
list column called relatives
, that includes vectors of the samples that
each sample is related to.
This runs it in a directory and the output from stdout goes into a big file spip_out.txt in that directory. Currently this is pretty bare bones.
run_spip( pars, dir = tempfile(), spip_seeds = ceiling(runif(2, 1, 1e+09)), num_pops = 1, allele_freqs = list(c(0.5, 0.5)) )
run_spip( pars, dir = tempfile(), spip_seeds = ceiling(runif(2, 1, 1e+09)), num_pops = 1, allele_freqs = list(c(0.5, 0.5)) )
pars |
A named list of parameter values. |
dir |
The directory to run it in. Defaults to a temp directory, which will be unique every time it is run. |
spip_seeds |
a vector of two positive integers. These get written to the
file spip_seeds, which is used by spip to seed its random number generator. By
default, R supplies these two integers from its own random number generator. This
way reproducible results from spip can be obtained by calling |
num_pops |
the number of demes that are being simulated. This is still being implemented... |
allele_freqs |
a list of allele frequencies provided if you want
to simulate unlinked genotypes for the sampled individuals. The default
is simply a single locus with two alleles at frequencies 0.5, 0.5, which
is provided because spip has to be given some allele frequencies if sampling
is to be carried out. Note that a user-specified value to this option should
only be given if you want to actually simulate some genetic data from the
sampled individuals. The length of the list should be the number of loci
desired, and the length of each element should be the number of alleles.
For examples for three loci with 2, 3, and 4 equifrequent alleles, respectively,
you would provide |
This creates a temporary directory and runs spip in that directory, redirecting stdout and stderr to files. It then processes the output using awk to create a collection of files. If spip throws an error, the contents of stderr are written to the screen to notify the user of how to correct their input.
For a full example of its use see the Vignette:
vignette("species_1_simulation", package = "CKMRpop")
.
Returns the path to the temporary directory were spip
was run and where the
processed output files can be found to be read in using slurp_spip()
.
This function is run after run_spip()
. It assumes that run_spip()
has left the files: spip_pedigree.tsv
, spip_prekill_census.tsv
, and
spip_samples.tsv
, spip_postkill_census.tsv
, spip_deaths.tsv
,
spip_genotypes.tsv
, and spip_migrants.tsv
inside the directory where run_spip()
was run.
slurp_spip( dir, num_generations, read_pedigree_file = TRUE, find_ancestors_and_relatives = TRUE )
slurp_spip( dir, num_generations, read_pedigree_file = TRUE, find_ancestors_and_relatives = TRUE )
dir |
the path to the directory where spip was run. This is
returned by |
num_generations |
how many generations back do you wish to consider for find relatives of each sampled individual. 0 means just the individual themselves (so, not very interesting, and you likely wouldn't ever use it. 1 means up to and including the parents; 2 means up to and including the grandparents; 3 means up to and including the great grandparents; and so forth. |
read_pedigree_file |
A logical. If TRUE, this function reads the pedgigree file
and returns the pedigree in the output. If FALSE the function does not read the pedigree
file and merely returns an empty tibble for the |
find_ancestors_and_relatives |
A logical indicating whether this function should
read in the complete pedigree of the population and process it to find ancestors and
relatives amongst the sampled individuals using the pedigree file (TRUE). Or whether it
should not do that (FALSE). If option |
For an example of its use, see the Vignette:
vignette("species_1_simulation", package = "CKMRpop")
.
A list of tibbles. Each tibble is a named component of the return list. The names are as follows:
pedigree
census_prekill
,
census_postkill
,
samples
,
deaths
,
genotypes
,
migrants
You can inspect some example output in
the package data object three_pops_with_mig_slurped_results
# see Vignette: vignette("species_1_simulation", package = "CKMRpop")
# see Vignette: vignette("species_1_simulation", package = "CKMRpop")
species_1 for examples.
species_1_life_history
species_1_life_history
Just values that might be typical of a fish.
This is stored as package data so that the vignette can be written even if spip is not installed on the system.
species_1_slurped_results
species_1_slurped_results
Simulation results
This is stored as package data so that the vignette can be written
even if spip is not installed on the system. This particular version
stores the results of running run_spip()
calling for 100 loci segregating
in the population, then slurping the results up with slurp_spip()
.
species_1_slurped_results_100_loci
species_1_slurped_results_100_loci
Simulation results
This is stored as package data so that the vignette can be written even if spip is not installed on the system.
species_1_slurped_results_1gen
species_1_slurped_results_1gen
Simulation results
species_2 for examples. Note that I just set the male and female rates and parameters similar.
species_2_life_history
species_2_life_history
This is something used for simulation testing
This just checks for the file where it gets installed with
install_spip()
. This is exported so it can be used to control
the flow in the vignettes, so that vignettes can still be
written using saved results when spip is not installed (i.e.,
at CRAN, etc.)
spip_exists()
spip_exists()
A single logical scalar, either TRUE or FALSE.
# will be FALSE unless spip has been externally installed spip_exists()
# will be FALSE unless spip has been externally installed spip_exists()
This simply calls spip with the --help
option.
spip_help()
spip_help()
This returns the exit status of spip
if spip is installed,
but the return value is of little use. Mainly this is run for the side
effect of printing the spip
help menu to the console.
## Not run: spip_help()
## Not run: spip_help()
This simply calls spip with the --help-full
option.
spip_help_full()
spip_help_full()
This returns the exit status of spip
if spip is installed,
but the return value is of little use. Mainly this is run for the side
effect of printing the spip
full help menu to the console.
## Not run: spip_help()
## Not run: spip_help()
More later
summarize_offspring_and_mate_numbers( census_postkill, pedigree, deaths, lifetime_hexbin_width = c(1, 1), contrib_bin_width = 0.01 )
summarize_offspring_and_mate_numbers( census_postkill, pedigree, deaths, lifetime_hexbin_width = c(1, 1), contrib_bin_width = 0.01 )
census_postkill |
a tibble with the postkill numbers of individuals. This is here so we know the total number of individuals that could have reproduced in a given year. |
pedigree |
a tibble with columns of |
deaths |
a tibble with columns of |
lifetime_hexbin_width |
a vector of length two. The first element is the width in the
age direction of each hexbin and the second is the width in the lifetime number of offspring
direction for the |
contrib_bin_width |
width of bins of histogram of contribution of parents of each age and sex to the offspring. |
A list with three components, each of them a ggplot object:
plot_age_specific_number_of_offspring
: a ggplot object that plots boxplots and jittered points.
The x-axis are the ages of the individuals; the y-axis shows the number of offspring. Summarized
over the entire spip simulation. This is faceted by sex.
plot_lifetime_output_vs_age_at_death
: a ggplot object. This is a hexbin plot. The x-axis
are age-at-death bins, the y axis are bins of total number of offspring produced in a lifetime.
The fill color of each bin gives the number of individuals with that age at death and number
of offspring encountered over the whole simulation. Plot is faceted by sex.
plot_fraction_of_offspring_from_each_age_class
: a ggplot object. This shows the distribution
over all years of the simulation, of the fraction of offspring produced each year that were
produced by males or females of a given age (the plots are facet-wrapped by both age and sex).
The blue vertical line gives the mean.
# get stored slurped output for an example X <- species_1_slurped_results g <- summarize_offspring_and_mate_numbers( X$census_postkill, X$pedigree, X$deaths ) # Now g is a list holding three plots, accessible like this: # g$plot_age_specific_number_of_offspring # g$plot_lifetime_output_vs_age_at_death # g$plot_fraction_of_offspring_from_each_age_class
# get stored slurped output for an example X <- species_1_slurped_results g <- summarize_offspring_and_mate_numbers( X$census_postkill, X$pedigree, X$deaths ) # Now g is a list holding three plots, accessible like this: # g$plot_age_specific_number_of_offspring # g$plot_lifetime_output_vs_age_at_death # g$plot_fraction_of_offspring_from_each_age_class
The prekill census in year t+1 is the post-kill census in year t, so we can use the prekill census to record the realized fraction of individuals of each age and sex that survived the death episode in each year. In the output survival in year t is the fraction of j-year olds in year t that survive to be j+1 year-olds in year t+1.
summarize_survival_from_census( census, fem_surv_probs = NULL, male_surv_probs = NULL, nbins = 10 )
summarize_survival_from_census( census, fem_surv_probs = NULL, male_surv_probs = NULL, nbins = 10 )
census |
a tibble of census counts with columns |
fem_surv_probs |
a vector of the parameters used for the simulation. If present these are put on the histogram plots. If you provide one of these, you have to provide both. |
male_surv_probs |
a vector of the parameters used for the simulation. If present these are put on the histogram plots. |
nbins |
number of bins for the histograms |
This function does not track migrants. Another one is eventually in order that accounts for migrants out of the population. Also, the plots here might not play well with multiple populations.
A list with components:
survival_tibble
: A tibble with the following columns:
year
: The year
pop
: The population whose census is being counted
age
: The age of individuals
sex
: The sex of individuals
n
: The number of individuals alive and present of sex sex
and age age
in year
year
in pop pop
.
cohort
: The birth year of these individuals
surv_fract
: The fraction of the n individuals that survive to have age age + 1
in
year year + 1
.
plot_histos_by_age_and_sex
: A ggplot object of histograms of observed survival fractions
facet-wrapped by age and sex. Blue vertical lines are the observed means and dashed vertical
red lines are the expected values given the simulation parameters.
result <- summarize_survival_from_census( species_1_slurped_results$census_prekill, species_1_life_history$`fem-surv-probs`, species_1_life_history$`male-surv-probs` ) # print the results if you want result$survival_tibble result$plot_histos_by_age_and_sex
result <- summarize_survival_from_census( species_1_slurped_results$census_prekill, species_1_life_history$`fem-surv-probs`, species_1_life_history$`male-surv-probs` ) # print the results if you want result$survival_tibble result$plot_histos_by_age_and_sex
This is stored as package data so that the vignette can be written even if spip is not installed on the system.
three_pops_no_mig_slurped_results
three_pops_no_mig_slurped_results
Simulation results
This is stored as package data so that the vignette can be written even if spip is not installed on the system.
three_pops_with_mig_slurped_results
three_pops_with_mig_slurped_results
Simulation results
This gives a nice graphical summary of all the kin pairs along with when they were sampled and their age at the time of sampling and their sex. #' In order to visually summarize all the kin pairs that were found, with specific reference to their age, time of sampling, and sex, I find it helpful to use what I have named the "Uncooked Spaghetti Plot". There are multiple subpanels on this plot. Here is how to read/view these plots:
Each row of subpanels is for a different dominant relationship, going from closer relationships near the top and more distant ones further down. You can find the abbreviation for the dominant relationship at the right edge of the panels.
In each row, there are four subpanels: F->F
, F->M
, M->F
, and M->M
. These
refer to the different possible combinations of sexes of the individuals in the pair.
For the non-symmetrical relationships these are naturally defined with the
first letter (F
for female or M
for male) denoting the sex of the "upper_member"
of the relationship. That is, if it is PO, then the sex of the parent is the first letter.
The sex of the non-upper-member is the second letter. Thus a PO
pair that consists of
a father and a daughter would appear in a plot that is in the PO
row in the M->F
column.
For the symmetrical relationships, there isn't a comparably natural way of
ordering the individuals' sexes for presentation. For these relationships, the
first letter refers to the sex of the individual that was sampled in the earliest
year. If both individuals were sampled in the same year, and they are of different
sexes, then the female is considered the first one, so those all go on the F->M
subpanel.
On the subpanels, each straight line (i.e., each piece of uncooked spaghetti) represents a single kin pair. The two endpoints represent the year/time of sampling (on the x-axis) and the age of the individual when it was sampled (on the y-axis) of the two members of the pair.
If the relationship is non-symmetrical, then the line is drawn as an arrow pointing from the upper member to the lower member.
The color of the line gives the number of shared ancestors (max_hit
) at the level
of the dominant relationship. This is how you can distinguish full-sibs from half-sibs, etc.
uncooked_spaghetti(Pairs, Samples, jitter_age = 0.2, jitter_year = 0.2)
uncooked_spaghetti(Pairs, Samples, jitter_age = 0.2, jitter_year = 0.2)
Pairs |
The tibble of kin pairs that comes out of |
Samples |
The tibble of samples that comes out of |
jitter_age |
half the width of the uniform jitter window around age |
jitter_year |
half the width of the uniform jitter window around sampling year |
uncooked_spaghetti()
returns a list with two components: input_data
and plot
.
plot
is a ggplot object of the plot described above in "Description." input_data
is,
itself, another list with the following named components:
P5
: A tibble that is a processed version of the Pairs
input. This is what
goes into making the ggplot.
age_grid
: A tibble giving the coordinates for placing the alternating pink and white
horizontal background rectangles on the plot.
year_grid
: A tibble giving the coordinates for placing the alternating pink and white
vertical background rectangles on the plot.
# get the input variables # only take the first 50 samples to reduce time for example Samples <- species_1_slurped_results$samples[1:50, ] Pairs <- compile_related_pairs(Samples) result <- uncooked_spaghetti(Pairs, Samples) # produce the plot with: # result$plot
# get the input variables # only take the first 50 samples to reduce time for example Samples <- species_1_slurped_results$samples[1:50, ] Pairs <- compile_related_pairs(Samples) result <- uncooked_spaghetti(Pairs, Samples) # produce the plot with: # result$plot