The Detection Problem

Author

Deon Roos

Published

March 4, 2026

How many animals are there, and are they surviving?

Two of the most fundamental questions in ecology and conservation are: how many individuals of a species are there, and how well are they surviving? How many Atlantic salmon are in the River Dee? What proportion of juveniles survive to adulthood? Is that survival rate declining? Is a conservation intervention actually working?

These questions sound simple. They are not. And the reason they are not comes down to a problem that is easy to state but surprisingly easy to forget about when you are standing in a river with a net.

The naive approach

Imagine I want to estimate how many otters live along a stretch of river. I go out and count them. I spot 23 otters.

So there are 23 otters?

Well. Maybe. But I almost certainly did not see every otter that was there. Some were in burrows. Some were around the bend. Some saw me coming and hid (otters are, frankly, suspicious animals). The 23 I counted is not the true population size. It is the true population size multiplied by my probability of detecting any given otter.

\[\text{Count} = N \times p\]

where \(N\) is the true number of otters and \(p\) is my detection probability. If \(p = 1\), my count equals the true population. If \(p < 1\), and it is always less than 1, my count is an underestimate. How badly it underestimates depends entirely on how low \(p\) is.

This is imperfect detection, and it is not a quirk or an edge case. It is the default condition of every wildlife survey ever conducted. If you have ever counted animals in the field and assumed that count represented the true number present, you were almost certainly wrong.
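To make that concrete, here is a quick simulation with made-up numbers. A single count is a binomial draw governed by \(p\), not the true \(N\):

```r
set.seed(1)

N <- 100   # true population size (unknown to us in practice)
p <- 0.3   # detection probability

# Each animal is seen independently with probability p,
# so the count is a binomial draw, not N itself
count <- rbinom(1, size = N, prob = p)

count / N   # roughly p: we saw only a fraction of the animals present
```

Run this a few times with different seeds and the count bounces around \(N \times p = 30\); it never tells you, on its own, how far below \(N\) it sits.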

A glimmer of hope: marking individuals

Here is where mark-recapture comes in. The insight, which is beautifully simple once you see it, is this:

If you catch some animals, mark them, release them, and then go back and catch animals again, the proportion of marked animals in your second sample tells you something about how many unmarked ones you missed.

Say I catch 20 otters, tag each one with a unique ID, and release them back into the river. A week later I return and catch another 20 otters. Of those 20, I find that 5 of them already have tags. So I recaptured 5 out of the 20 I originally marked. That recapture rate of \(\frac{5}{20} = 25\%\) is an estimate of my detection probability. And if I only detected 25% of marked animals in my second sample, I probably only detected about 25% of unmarked animals too. Which means my second sample of 20 likely represents about 25% of the true population, implying there are roughly \(\frac{20}{0.25} = 80\) otters in total.
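The back-of-the-envelope arithmetic above can be written out in a few lines of R:

```r
n1 <- 20   # otters caught, marked and released on the first visit
n2 <- 20   # otters caught on the second visit
m2 <- 5    # of the second sample, how many were already marked

p_hat <- m2 / n1     # estimated detection probability: 5/20 = 0.25
N_hat <- n2 / p_hat  # 20 / 0.25 = 80 otters
```

Note that `n2 / p_hat` is algebraically the same as the textbook form `n1 * n2 / m2`; here they coincide because both samples happen to contain 20 animals.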

This is the Lincoln-Petersen estimator, and we will work through it properly on the next page. For now, the key idea is that recaptures give us leverage on detection probability, which is what allows us to estimate true population size rather than just report a count and hope for the best.

But there is a second problem layered on top of imperfect detection, one that makes mark-recapture considerably more interesting, and considerably more complicated.

The second problem: individuals can die

Individual animals are not permanent fixtures. They move. They disperse. And, crucially, they die.

This matters enormously because it creates a new version of the detection problem.

Imagine I tag 50 fish in June. In August I go back and sample again. I catch 30 fish, of which 12 have tags. Fine. But what about the 38 tagged fish I did not recapture? There are two possible explanations:

  1. They are still alive, I just did not catch them. My detection probability is less than 1, so this is entirely plausible.

  2. They are dead.

A non-recapture has two possible meanings: missed, or gone. And here is the cruel part: both explanations predict exactly the same observation in your data. A zero. Nothing. Silence. In both cases you do not see the fish.

If you cannot separate these two explanations, you cannot estimate either detection probability or survival probability reliably. They are completely tangled up in each other.
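To see just how tangled they are, notice that very different combinations of survival and detection can produce exactly the same probability of resighting a tagged animal on a single occasion. A quick illustration with made-up values:

```r
# World A: high survival, poor detection
phi_a <- 0.8; p_a <- 0.5

# World B: low survival, good detection
phi_b <- 0.5; p_b <- 0.8

# Probability of resighting a tagged animal is phi * p in both worlds
phi_a * p_a   # 0.4
phi_b * p_b   # 0.4 -- identical data, radically different biology
```

One resighting occasion gives you the product \(\phi \times p\), and infinitely many \((\phi, p)\) pairs share the same product.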

Code
library(ggplot2)
library(dplyr)

set.seed(99)

n_tagged <- 50
occasions <- 4

phi_true <- 0.75
p_true <- 0.40

survival_mat <- matrix(NA, nrow = n_tagged, ncol = occasions)
survival_mat[, 1] <- 1

for (t in 2:occasions) {
  survival_mat[, t] <- rbinom(n_tagged, 1, survival_mat[, t - 1] * phi_true)
}

detection_mat <- matrix(NA, nrow = n_tagged, ncol = occasions)
for (t in 1:occasions) {
  detection_mat[, t] <- rbinom(n_tagged, 1, survival_mat[, t] * p_true)
}

observed_counts <- colSums(detection_mat)
true_alive <- colSums(survival_mat)

summary_df <- data.frame(
  occasion = 1:occasions,
  true_alive = true_alive,
  observed = observed_counts
)

ggplot(summary_df) +
  geom_col(aes(x = occasion, y = true_alive),
           fill = "#00A68A", alpha = 0.4, width = 0.6) +
  geom_col(aes(x = occasion, y = observed),
           fill = "#FF5733", alpha = 0.8, width = 0.3) +
  labs(
    x = "Sampling occasion",
    y = "Number of tagged individuals",
    caption = "Green = truly alive  |  Orange = actually detected"
  ) +
  scale_x_continuous(breaks = 1:occasions) +
  theme_minimal()

The green bars show how many tagged animals are truly alive at each occasion. The orange bars show how many we actually detect. The gap between them is entirely due to imperfect detection, not death. But if I only showed you the orange bars and asked you to calculate a survival rate, you would badly underestimate it. Animals are disappearing from your data faster than they are actually dying.

The problem stated plainly

When you do not recapture a tagged animal, you are facing two competing hypotheses:

  • Hypothesis A: The animal is alive but you missed it. This happens with probability \(\phi \times (1 - p)\): the animal survived the time between surveys but you failed to detect it.

  • Hypothesis B: The animal is dead. This happens with probability \(1 - \phi\): the animal did not survive the time between surveys.

Both predict the same observation. With a single resighting occasion you cannot tell them apart.

Code
phi_true <- 0.75
p_true <- 0.40

scenarios <- data.frame(
  scenario = c("Alive, not detected", "Dead"),
  probability = c(phi_true * (1 - p_true), 1 - phi_true)
)

ggplot(scenarios, aes(x = scenario, y = probability)) +
  geom_col(fill = c("#00A68A", "#FF5733"), width = 0.5) +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  labs(
    x = NULL,
    y = "Probability",
    title = "Two explanations for a non-recapture",
    subtitle = paste0("True \u03d5 = ", phi_true, "  |  True p = ", p_true)
  ) +
  theme_minimal()

With the values used in our simulation (\(\phi = 0.75\), \(p = 0.40\)), the probability that a non-detection means the animal is alive and missed is actually higher than the probability it means the animal is dead. In other words, the naive interpretation of a non-detection as death would be wrong the majority of the time in this scenario.

This is why a single survey with recaptures is not enough. You need a framework that estimates \(\phi\) and \(p\) simultaneously, using the full pattern of detections and non-detections across multiple occasions to untangle them.
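To get a feel for why multiple occasions help, note that a later detection proves the animal was alive, merely missed, at every occasion in between. A sketch of that logic, using the same \(\phi\) and \(p\) values as the simulation above (this product-of-terms reasoning is the building block of the models to come):

```r
phi <- 0.75   # survival probability between occasions
p   <- 0.40   # detection probability

# Capture history 1 0 1: detected, missed, detected again.
# The final 1 rules out death at the middle occasion, so the
# middle 0 can only mean "alive but missed": phi * (1 - p).
prob_101 <- phi * (1 - p) * phi * p
prob_101   # 0.135
```

It is exactly these informative zeros, the ones later detections resolve, that let a model pull \(\phi\) and \(p\) apart.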

There is actually a third problem

I have been quietly ignoring something. Animals can also leave your study area permanently without dying.

If an otter decides to move 10 kilometres upstream and you never sample up there, it vanishes from your data in exactly the same way a dead otter would. Permanent emigration looks identical to death in a capture history. This means that what survival models actually estimate is not true survival but apparent survival, \(\phi\), which bundles together the probability of surviving and the probability of not permanently emigrating.
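A toy illustration, with made-up numbers, of why emigration is invisible: from the sampler's point of view a dead animal and a permanently emigrated one have identical availability, so their capture histories are draws from exactly the same distribution.

```r
set.seed(42)

p <- 0.4
occasions <- 5

# Both animals are available for capture on occasions 1-2 only:
# one because it dies afterwards, one because it moves away for good
available_dead     <- c(1, 1, 0, 0, 0)
available_emigrant <- c(1, 1, 0, 0, 0)

history_dead     <- rbinom(occasions, 1, available_dead * p)
history_emigrant <- rbinom(occasions, 1, available_emigrant * p)

# Occasions 3-5 are guaranteed zeros in both histories
history_dead[3:5]       # 0 0 0
history_emigrant[3:5]   # 0 0 0
```

No amount of extra sampling at the same sites distinguishes the two; that is why \(\phi\) is apparent, not true, survival.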

This is not a fatal flaw. It is a limitation that exists in virtually every mark-recapture study, and the honest thing to do is acknowledge it when you report your results. We will return to it in more detail when we get to the robust design, because the robust design introduces yet another flavour of this problem: animals that leave your study area temporarily and then come back. A non-detection that is neither death nor permanent emigration, but a brief absence.

For now, the point is simply that what looks like a straightforward counting problem is actually an estimation problem with at least three sources of ambiguity packed into every zero in your data.

What we need to make progress

To estimate survival and detection simultaneously, rather than having them hopelessly tangled together, we need three things:

Individually marked animals. Not just a count of how many we saw, but a record of which specific individuals we saw on each occasion. This is what allows us to track fates through time rather than just tallying up numbers.

Multiple sampling occasions. A single occasion gives us a count. Multiple occasions give us a pattern of detections and non-detections across time, and that pattern contains the information we need to estimate both \(\phi\) and \(p\) separately.

A model that represents both processes explicitly. One that says, formally, that a detection requires both survival and detection, and uses that structure to estimate each parameter on its own terms.

The next page introduces the simplest version of this thinking: the Lincoln-Petersen estimator. It does not quite tick all three boxes, but it builds the intuition clearly and cleanly before we move on to models that do.