---
title: "About spip"
resource_files:
  - ../man/figures/spip-periods.svg
  - ../man/figures/spip-periods-with-sampling.svg
  - ../man/figures/spip-periods-with-migration.svg
author: "Eric C. Anderson"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{About spip}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette gives a short description of a few things that one
should know about how `spip` works.

## Timing of events in a `spip` life cycle

`spip` is a program that records time in discrete periods that can be
thought of as years.  When individuals are born at time $t$ they are considered
to be of age 0 and they have a birth year of $t$.  The user must input to the program
the maximum possible age of an individual.  Here, we will refer to that
maximum age as $\mathrm{MA}$. 

Within each year in `spip` the following events occur, in order:

1. The age of each individual is incremented at the beginning of the new year $t$. This
means that newborns from year $t-1$ are turned into 1-year-olds as they enter year $t$,
and so forth.
1. Next there is a chance to sample individuals before the death episode.  Only individuals
of ages 1 to $\mathrm{MA}$ can be sampled.  These are the individuals, who, if they
survive the upcoming death episode, might have the chance to contribute to the
next generation.
1. Next there is an episode of death.  Individuals that are $a$ years old in year $t$
survive this episode of death with a probability of $s_a$. Individuals that are 
$\mathrm{MA} + 1$ years old all die (that is, they survive with probability 0). 
1. Then there is a chance to sample individuals after the episode of death.
Only individuals of ages 1 to $\mathrm{MA}$ can be sampled.  This is a sample
from the individuals that are around and alive to contribute to the next generation.
1. Next there is an episode of reproduction which produces 0-year-olds born at time
$t$. 
    - During the reproduction episode there is an opportunity
      to draw a sample from the individuals that were chosen to 
      be amongst the reproducing adults.  (This is particularly useful for critters like
      salmon that can be sampled explicitly when they are migrating in fresh water to
      reproduce).
    - After the sampling period during reproduction, there is also the chance for
      individuals to die after reproduction according to the probability
      set by `--fem-postrep-die` and `--male-postrep-die`.  With this value set
      to 1, for example, semelparity can be enforced.
1. After reproduction there are individuals of ages 0 to $\mathrm{MA}$ that
sit around and don't do much of anything.  Eventually the year gets advanced
to the following one ($t+1$), and the ages of individuals get incremented, so that
at the beginning of year $t+1$ individuals are of ages 1 to $\mathrm{MA}+1$.  


It is worth noting that even though there are some $(\mathrm{MA}+1)$-year-olds around
at the beginning of each time period, they all die
during the episode of death, and, because they cannot be sampled during the
sampling episode before death, it is like they do not exist.
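
The bookkeeping for the age increment and the episode of death can be sketched in a few lines of R. This is only an illustration, not spip's actual code: the counts and survival probabilities are made up, and it assumes each individual survives independently, so that survivors are binomial draws.

```r
# Hypothetical numbers of females of ages 0..MA at the end of year t-1
MA <- 3
F_por_prev <- c(100, 60, 30, 10)

# Entering year t everyone is one year older: ages 1..MA+1
F_start <- setNames(F_por_prev, paste0("age", 1:(MA + 1)))

# Hypothetical survival probabilities s_1..s_MA; age MA+1 never survives
surv_prob <- c(0.6, 0.5, 0.4, 0)

# Assumption: independent survival, so survivors are binomial draws
set.seed(1)
F_pok <- rbinom(n = MA + 1, size = F_start, prob = surv_prob)
names(F_pok) <- names(F_start)
F_pok
```

Whatever the random draws are, the `age4` entry (the $\mathrm{MA}+1$ class) is always 0 after the episode of death.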


The year in spip can thus be divided into three different periods between
the demographic events/episodes:

1. The period before the episode of death.  This is known as the _prekill_
period of the year, and, in the following, we will use a
superscript $\mbox{}^\mathrm{pre}$ to denote census sizes during that period.
2. The period after the episode of death, but before reproduction.  This
is known as the _postkill_ period, and will be denoted with a superscript
$\mbox{}^\mathrm{pok}$.
3. The period after reproduction, but before the year gets incremented. This
is the _post-reproduction_ period and will be denoted with a superscript
$\mbox{}^\mathrm{por}$.

We will use $F_{t,a}^\mathrm{pre}$, $F_{t,a}^\mathrm{pok}$, and $F_{t,a}^\mathrm{por}$
to denote the number of $a$-year-old females during the
prekill, postkill, and post-reproduction periods, respectively, of time $t$.
The diagram below, which shows these numbers in relation to one another
along with their expected values,
should help users understand the spip annual cycle.
The numbers of males change across time periods in a similar fashion.
```{r, echo=FALSE, out.width="100%"}
knitr::include_graphics("../man/figures/spip-periods.svg")
```

Annotated in the above are the three distinct periods in spip's annual cycle:
the _prekill_, _postkill_, and _post-reproduction_ periods.  When the output
of spip is slurped up by CKMRpop, these numbers become the tibbles `census_prekill`,
`census_postkill`, and `census_postrepro`, respectively, which are
elements of the output list of `slurp_spip()`.

The expected numbers of individuals after each transition are as follows:

- $F_{t,a}^\mathrm{pre} \equiv F_{t-1,a-1}^\mathrm{por}~~,~~a = 1,\ldots,\mathrm{MA}$
- $E[F_{t,a}^\mathrm{pok}] = s_a^F F_{t,a}^\mathrm{pre}~~,~~a = 1,\ldots,\mathrm{MA}$, where $s_a^F$ is the probability that an $(a-1)$-year-old female survives to be an $a$-year-old female, as given with the `--fem-surv-probs` option. 
- $E[F_{t,a}^\mathrm{por}] = (1 -  r_a^Fd_a^F) F_{t,a}^\mathrm{pok}~~,~~a = 1,\ldots,\mathrm{MA}$, where $r_a^F$ is the probability that an $a$-year-old
female reproduces (set with the `--fem-prob-repro` option) and $d_a^F$ is the
probability that an $a$-year-old female will die after engaging in reproduction
(even if no offspring were actually produced!), as given
in the `--fem-postrep-die` option.  This is an additional source of death
that is useful for modeling anadromous species whose reproductive journey
incurs a substantial cost.
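
The three transitions above can be chained to get expected female counts across one year, conditional on the counts at the end of the previous year. The function below is a minimal sketch with a hypothetical name (`expected_female_counts`) and made-up inputs; it is not part of spip or CKMRpop.

```r
# F_por_prev: post-reproduction counts of ages 0..MA-1 at the end of year t-1
# s, r, d:    survival, reproduction, and post-reproduction death
#             probabilities for ages 1..MA
expected_female_counts <- function(F_por_prev, s, r, d) {
  MA <- length(s)
  stopifnot(length(F_por_prev) == MA, length(r) == MA, length(d) == MA)
  F_pre <- F_por_prev            # ages 0..MA-1 last year are ages 1..MA now
  F_pok <- s * F_pre             # expected survivors of the episode of death
  F_por <- (1 - r * d) * F_pok   # expected survivors of post-reproduction death
  data.frame(age = seq_len(MA), pre = F_pre, pok = F_pok, por = F_por)
}

res <- expected_female_counts(
  F_por_prev = c(100, 60, 30),
  s = c(0.5, 0.8, 0.9),
  r = c(0, 0.5, 1),
  d = c(0, 1, 1)
)
res
```

With these made-up values the expected post-reproduction counts are 50, 24, and 0 for ages 1 to 3: the oldest class always reproduces and always dies afterward.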

## Sampling episodes in a `spip` annual cycle

The two main sampling schemes available in spip are keyed to these different
time periods within the spip annual cycle as shown by the following figure:
```{r, echo=FALSE, out.width="100%"}
knitr::include_graphics("../man/figures/spip-periods-with-sampling.svg")
```
Thus, `--gtyp-ppn-fem-pre` and `--gtyp-ppn-male-pre` involve sampling
from the simulated population at a different point in the year than
do the `--gtyp-ppn-fem-post` and `--gtyp-ppn-male-post` options.  
It is also possible to only sample those individuals that are trying to
reproduce in a certain year using a third sampling scheme requested with the
`--gtyp-ppn-fem-dur` and `--gtyp-ppn-male-dur` options
to spip.  The probability that an individual would try to reproduce
in a given year is age specific and is set using the `--fem-prob-repro` 
and `--male-prob-repro` options.
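
To see how the three schemes draw from pools of different sizes, consider the following rough sketch for a single age class of females. All values are made up, and it assumes that the same sampling proportion applies to every individual in each pool, which may not match how the options are set in a real run.

```r
# Hypothetical values for one age class of females in year t
F_pre <- 200   # prekill number
s     <- 0.7   # survival probability through the episode of death
r     <- 0.5   # probability of reproducing (cf. --fem-prob-repro)
ppn   <- 0.1   # assumed sampling proportion, the same for all three schemes

# Expected size of the pool each scheme samples from
pool <- c(pre = F_pre, post = s * F_pre, dur = r * s * F_pre)

# Expected number of sampled individuals under each scheme
expected_samples <- ppn * pool
expected_samples
```

Under these assumptions the expected sample sizes are 20 (`pre`), 14 (`post`), and 7 (`dur`), so the same proportion yields fewer samples the later in the cycle the sampling occurs.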

It is worth noting that the `pre`, `post`, and `dur` sampling options all
occur relatively independently (so long as sampling is not lethal---see the
somewhat experimental `--lethal-sampling` option). spip reports the different
years when an individual is sampled during the `pre`, `post`, and `dur`
periods in the year. CKMRpop preserves those times in separate lists
when it slurps up the spip output.  For example, `slurped$samples` has the
list columns `samp_years_list_pre`, `samp_years_list_post`, and `samp_years_list_dur`.
For all downstream analyses, CKMRpop uses the list column `samp_years_list`, which,
by default, is the same as `samp_years_list_post`.  This means that, at the present
time, you should sample individuals after the episode of death
using the `--gtyp-ppn-fem-post` and `--gtyp-ppn-male-post` options.

Note that, in most cases when exploring CKMR, the user will want to 
use the `--gtyp-ppn-fem-post` and `--gtyp-ppn-male-post` options, anyway, because
those are samples from the adult population that are available for reproduction.

If it is desired to sample all newborns at time $t$, then currently the way to do
that is to sample 1-year-olds at time $t+1$ using the
`--gtyp-ppn-fem-pre` and `--gtyp-ppn-male-pre` options.  However, it would take some
extra finagling to get those sampling years into the `samp_years_list` column
referenced above for the downstream analyses.
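
One possible form of that finagling is sketched below on a toy data frame that stands in for `slurped$samples`. The toy data and the choice to take the sorted union of sampling years are assumptions for illustration; the real tibble has more columns, and downstream CKMRpop functions have not necessarily been designed with a combined column in mind.

```r
# Toy stand-in for slurped$samples (hypothetical IDs and sampling years)
samples <- data.frame(ID = c("F1", "M2"))
samples$samp_years_list_pre  <- list(5L, integer(0))
samples$samp_years_list_post <- list(c(5L, 6L), 7L)
samples$samp_years_list_dur  <- list(integer(0), 7L)

# Combine the three list columns row by row into sorted, unique years
samples$samp_years_list <- Map(
  function(pre, post, dur) sort(unique(c(pre, post, dur))),
  samples$samp_years_list_pre,
  samples$samp_years_list_post,
  samples$samp_years_list_dur
)

samples$samp_years_list
```

Here the first individual ends up with sampling years 5 and 6, and the second with year 7.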


TODO ITEM: combine sampling at all times into the single `samp_years_list` column, perhaps,
or make it easier for users to decide how to combine those different sampling
episodes.  For now, though, users should stick to using the
`--gtyp-ppn-fem-post` and `--gtyp-ppn-male-post` options.

## How inter-population migration occurs in `spip`

We can use the same diagrams developed above to describe how migration is implemented
in `spip`.  Migration in `spip` is a "two-stage" phenomenon: in the first stage,
individuals leave a population with sex-, year- and age-specific out-migration rates
specified with the population's options `--fem-prob-mig-out` and
`--male-prob-mig-out`.
They leave each population before the prekill census occurs and also before the prekill
sampling occurs.  Diagrammatically,
it looks like this:
```{r, echo=FALSE, out.width="100%", fig.cap="Schematic describing the first stage of migration: migration out of a population. Each blue line shows individuals leaving the population and entering a pool of migrants."}
knitr::include_graphics("../man/figures/spip-periods-with-migration.svg")
```

The expected number of individuals in the pool of migrants who have left the population
is given by the time- and age-specific rates set by the user.  We will denote the outmigration
rate for age $a$ individuals at time $t$ from a given population by $m^\mathrm{out}_{t,a}$.
Writing $F_{t-1,a-1}$ for the number of females of age $a-1$ at the end of year $t-1$
(i.e., $F^\mathrm{por}_{t-1,a-1}$ above), it follows that, for this given population:
$$
E[F^\mathrm{out}_{t,a}] = m^\mathrm{out}_{t,a}F_{t-1,a-1},~a=1,\ldots, \mathrm{MA}.
$$

In the following, we will want to refer to these outmigration rates for each
population, so we may also adorn the notation with a superscript $i$
to refer to rates and sizes specifically for population $i$:
$$
E[F^{\mathrm{out},i}_{t,a}] = m^{\mathrm{out},i}_{t,a}F^i_{t-1,a-1},~a=1,\ldots, \mathrm{MA}.
$$


After the outmigration stage, each population has a pool of migrants that are waiting
to migrate into other populations.  The rates by which this happens are specified
with the `--fem-prob-mig-in` and `--male-prob-mig-in` 
options. These options set in-migration rates for different years and
for different ages, effectively setting the fraction of the total number
of out-migrated individuals from population $i$ of age $a$ at time $t$,
$F^{\mathrm{out},i}_{t,a}$, that will migrate into the other populations. Thus,
there is one number to set for each population.  For example, if
there are $K$ populations, we would have:
$$
m^{\mathrm{in},i}_{t,a} = [m^{\mathrm{in},i}_{t,a,1},\ldots,m^{\mathrm{in},i}_{t,a,K}]~~,~~
\sum_{j=1}^K m^{\mathrm{in},i}_{t,a,j} = 1.
$$
The probability of migrating back to the population from whence one came
is always 0.  So, even if the user sets that to some non-zero value, 
it will be forced to zero and the values of the remaining in-migration rates will
be re-scaled so as to sum to 1.
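
The zero-and-rescale rule can be illustrated as follows. The helper `rescale_in_rates()` is hypothetical and written only for this example; it mimics the behavior described above rather than reproducing spip's internal code.

```r
# m_in: hypothetical in-migration rates (one per population) for migrants
#       that left population i
rescale_in_rates <- function(m_in, i) {
  stopifnot(i >= 1, i <= length(m_in), all(m_in >= 0))
  m_in[i] <- 0                      # no migrating back to the source population
  total <- sum(m_in)
  if (total == 0) stop("no destination population has a positive rate")
  m_in / total                      # remaining rates rescaled to sum to 1
}

rescaled <- rescale_in_rates(c(0.2, 0.3, 0.5), i = 1)
rescaled
```

With three populations and user-supplied rates of 0.2, 0.3, and 0.5 for migrants leaving population 1, the rescaled rates are 0, 0.375, and 0.625.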

Given this setup, the expected number of individuals from the
outmigrant pool from population $i$ that will arrive in population $j$,
of age $a$ at time $t$ is
$$
E[F^{\mathrm{in},i}_{t,a,j}] = m^{\mathrm{in},i}_{t,a,j} F^{\mathrm{out},i}_{t,a}
$$
We can also write that entirely in terms of current population sizes
and migration rates:
$$
E[F^{\mathrm{in},i}_{t,a,j}] = m^{\mathrm{in},i}_{t,a,j} m^{\mathrm{out},i}_{t,a}F^i_{t-1,a-1}
$$
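
For a single age class and year, the last equation can be evaluated for all pairs of populations at once. The sketch below uses a hypothetical helper, `expected_in_migrants()`, and made-up sizes and rates for $K = 3$ populations; it assumes the in-migration rates have already been rescaled so that each row sums to 1 with a zero diagonal.

```r
# F_prev: sizes F^i_{t-1,a-1} of the K source populations
# m_out:  outmigration rates m^{out,i}_{t,a}
# m_in:   K x K matrix; row i holds m^{in,i}_{t,a,j} for destinations j
expected_in_migrants <- function(F_prev, m_out, m_in) {
  K <- length(F_prev)
  stopifnot(length(m_out) == K, all(dim(m_in) == c(K, K)),
            all(diag(m_in) == 0),
            isTRUE(all.equal(unname(rowSums(m_in)), rep(1, K))))
  F_out <- m_out * F_prev   # expected size of each outmigrant pool
  m_in * F_out              # row i is scaled by F_out[i]
}

m_in <- matrix(c(0,    0.5,  0.5,
                 1,    0,    0,
                 0.25, 0.75, 0), nrow = 3, byrow = TRUE)
flows <- expected_in_migrants(F_prev = c(1000, 500, 200),
                              m_out  = c(0.1, 0.2, 0.05),
                              m_in   = m_in)
flows
colSums(flows)   # expected number of arrivals in each population
```

Entry `[i, j]` of `flows` is the expected number of migrants from population $i$ arriving in population $j$, so each row sums to the expected size of that population's outmigrant pool.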

So, this whole system of specifying migrants is a little more complex than
a system whereby the user specifies the fraction of individuals in population
$j$ that originated from population $i$.  But it gives the user a lot more control,
as well as realism, in that the number of migrants into a population
depends on the size of the donor population.