Negative Binomial Distribution Calculator
PMF · CDF · Mean · Variance · Bar Chart · Two Parameterizations
Compute exact probabilities, CDF and statistics for the negative binomial (Pascal) distribution. Supports both the failures-before-r-th-success and trial-of-r-th-success parameterizations.
What Is the Negative Binomial Distribution?
The negative binomial distribution is a fundamental discrete probability distribution that models count data arising from a sequence of independent Bernoulli trials. Unlike the binomial distribution — which counts successes in a fixed number of trials — the negative binomial distribution has a fixed target number of successes r and counts how many failures (or total trials) occur before that target is reached.
Formally, suppose each trial independently results in success with probability p and failure with probability 1 − p. The process continues until exactly r successes have been observed. The negative binomial distribution describes the randomness in how many failures accumulate along the way. Its probability mass function is:
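P(X = k) = C(k+r−1, k) × p^r × (1−p)^k,  k = 0, 1, 2, ...
where X is the number of failures before the r-th success and C(n, j) denotes the binomial coefficient "n choose j". This form is particularly natural for quality control, where r = "number of good items required" and k = "number of defective items that appear before meeting the quota."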
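The failures-before-the-r-th-success PMF, P(X = k) = C(k+r−1, k) × p^r × (1−p)^k, can be evaluated directly with the standard library. The following is a minimal sketch, not the calculator's own implementation; the function names and the quality-control numbers (r = 3, p = 0.5) are illustrative assumptions.

```python
from math import comb


def nb_pmf_failures(k: int, r: int, p: float) -> float:
    """P(X = k): exactly k failures occur before the r-th success."""
    if k < 0:
        return 0.0
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


def nb_cdf_failures(k: int, r: int, p: float) -> float:
    """P(X <= k), obtained by summing the PMF from 0 to k."""
    return sum(nb_pmf_failures(i, r, p) for i in range(k + 1))


# Hypothetical quality-control case: r = 3 good items needed, p = 0.5 per item.
r, p = 3, 0.5
print(nb_pmf_failures(2, r, p))  # 0.1875
print(nb_cdf_failures(2, r, p))  # 0.5
```

With these assumed inputs, there is an even chance of seeing at most two defective items before the quota of three good items is met.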
Two Parameterizations Explained
Statisticians use two equivalent ways to parameterize the negative binomial distribution, which can cause confusion when comparing sources.
Mode A — Number of Failures (X)
X counts the number of failures before the r-th success. The support is k = 0, 1, 2, .... The PMF is P(X=k) = C(k+r−1, k) × p^r × (1−p)^k. The mean is r(1−p)/p and variance is r(1−p)/p². This parameterization is common in probability textbooks.
Mode B — Trial Number (Y, Pascal Distribution)
Y counts the trial on which the r-th success occurs. The support is n = r, r+1, r+2, .... The PMF is P(Y=n) = C(n−1, r−1) × p^r × (1−p)^(n−r). The mean is r/p and variance is r(1−p)/p². This form is known as the Pascal distribution and is natural when the question is "on which attempt does the r-th success occur?"
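The two modes describe the same experiment, and the trial count is the failure count shifted by r (Y = X + r). The sketch below checks numerically that P(Y = k + r) = P(X = k) and that the two means differ by exactly r. The function names and the values r = 3, p = 0.4 are assumptions chosen for illustration.

```python
from math import comb, isclose


def pmf_failures(k: int, r: int, p: float) -> float:
    """Mode A: P(X = k), k failures before the r-th success."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


def pmf_trials(n: int, r: int, p: float) -> float:
    """Mode B: P(Y = n), the r-th success lands on trial n (n >= r)."""
    if n < r:
        return 0.0
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)


r, p = 3, 0.4  # illustrative values

# Shifting the support by r maps one PMF onto the other.
shift_holds = all(
    isclose(pmf_trials(k + r, r, p), pmf_failures(k, r, p)) for k in range(50)
)

# Mean of Y minus mean of X should equal r.
mean_gap = r / p - r * (1 - p) / p

print(shift_holds, mean_gap)
```

Because the shift is deterministic, it changes the mean but leaves the variance untouched, which is why both modes share the variance r(1−p)/p².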
Relationship to the Geometric Distribution
When r = 1, the negative binomial reduces exactly to the geometric distribution. In Mode A with r = 1, P(X=k) = p × (1−p)^k for k = 0, 1, 2, ..., which is the geometric distribution measuring the number of failures before the first success. In Mode B with r = 1, P(Y=n) = p × (1−p)^(n−1) for n = 1, 2, 3, ..., the standard geometric distribution. You can verify this on the calculator by setting r = 1.
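The reduction to the geometric distribution at r = 1 can also be confirmed numerically rather than on the calculator. This is a small sketch with assumed function names and an arbitrary p = 0.3.

```python
from math import comb, isclose


def pmf_failures(k: int, r: int, p: float) -> float:
    """Mode A: P(X = k), k failures before the r-th success."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


def pmf_trials(n: int, r: int, p: float) -> float:
    """Mode B: P(Y = n), the r-th success lands on trial n."""
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)


p = 0.3  # arbitrary success probability for the check

# Mode A with r = 1 versus the geometric PMF on k = 0, 1, 2, ...
mode_a_ok = all(
    isclose(pmf_failures(k, 1, p), p * (1 - p) ** k) for k in range(30)
)

# Mode B with r = 1 versus the geometric PMF on n = 1, 2, 3, ...
mode_b_ok = all(
    isclose(pmf_trials(n, 1, p), p * (1 - p) ** (n - 1)) for n in range(1, 31)
)

print(mode_a_ok, mode_b_ok)
```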
Mean, Variance, and Overdispersion
| Parameterization | Mean | Variance | Std Dev |
|---|---|---|---|
| Mode A (failures X) | r(1−p)/p | r(1−p)/p² | √(r(1−p)/p²) |
| Mode B (trials Y) | r/p | r(1−p)/p² | √(r(1−p)/p²) |
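A key property of Mode A (the failure count X) is that Variance = Mean + Mean²/r. Since Mean²/r > 0 whenever p < 1, the variance always exceeds the mean. This is called overdispersion, in contrast to the Poisson distribution, where variance equals the mean. As r → ∞ (with mean held fixed), the negative binomial converges to a Poisson distribution. The identity does not carry over to Mode B: the trial count Y has the same variance but a mean larger by r, so its variance exceeds its mean only when p < 1/2.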
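Both claims about the failure-count form can be checked with a few lines of code: the identity Variance = Mean + Mean²/r, and the approach to a Poisson distribution as r grows with the mean held fixed. The sketch below is illustrative only; the helper names, the fixed mean of 2, and the choice of r values are assumptions. Holding the mean m fixed means setting p = r/(r + m).

```python
from math import comb, exp, factorial, isclose


def nb_pmf(k: int, r: int, p: float) -> float:
    """P(X = k) for the failure-count parameterization."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson PMF with rate lam."""
    return exp(-lam) * lam**k / factorial(k)


def mean_var(r: int, p: float) -> tuple[float, float]:
    """Mean and variance of the failure count X."""
    return r * (1 - p) / p, r * (1 - p) / p**2


# Overdispersion identity for an example parameter pair.
mean, var = mean_var(5, 0.25)
identity_holds = isclose(var, mean + mean**2 / 5)


def max_gap_to_poisson(r: int, m: float, k_max: int = 30) -> float:
    """Largest pointwise PMF difference between NB(r, p) and Poisson(m)."""
    p = r / (r + m)  # keeps the NB mean equal to m
    return max(abs(nb_pmf(k, r, p) - poisson_pmf(k, m)) for k in range(k_max + 1))


gaps = {r: max_gap_to_poisson(r, 2.0) for r in (10, 100, 10_000)}
print(identity_holds, gaps)
```

The gap shrinks as r increases, which is the numerical counterpart of the limiting statement above.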
Relationship to the Binomial Distribution
While both distributions involve Bernoulli trials, their roles are reversed. In the binomial distribution B(n, p), the number of trials n is fixed and the number of successes is counted. In the negative binomial, the number of successes r is fixed and the failures (or trials) are counted. The coefficient C(k+r−1, k) in the PMF equals C(k+r−1, r−1): it counts the ways r−1 successes can be placed among the first k+r−1 trials, with the r-th success fixed on trial k+r.
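The link between the two distributions can be made concrete. Besides the coefficient symmetry C(k+r−1, k) = C(k+r−1, r−1), there is a standard identity that the text above does not state: at most k failures occur before the r-th success exactly when the first k+r trials contain at least r successes, so P(X ≤ k) = P(Bin(k+r, p) ≥ r). The sketch below checks both numerically; the function names and the values r = 4, p = 0.35 are illustrative assumptions.

```python
from math import comb, isclose


def nb_cdf(k: int, r: int, p: float) -> float:
    """P(X <= k) for the failure-count negative binomial."""
    return sum(comb(i + r - 1, i) * p**r * (1 - p) ** i for i in range(k + 1))


def binom_upper_tail(n: int, r: int, p: float) -> float:
    """P(S >= r) where S ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(r, n + 1))


r, p = 4, 0.35  # illustrative values

# Symmetry of the binomial coefficient used in the PMF.
coeffs_match = all(
    comb(k + r - 1, k) == comb(k + r - 1, r - 1) for k in range(20)
)

# Negative binomial CDF versus binomial upper tail.
cdfs_match = all(
    isclose(nb_cdf(k, r, p), binom_upper_tail(k + r, r, p)) for k in range(20)
)

print(coeffs_match, cdfs_match)
```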
Applications
- Insurance and actuarial science: Modelling claim counts when policyholders are heterogeneous. If each policyholder's claim rate follows a Gamma distribution, the marginal count is negative binomial.
- Ecology: Species abundance and insect population counts are typically aggregated (overdispersed), making the negative binomial a much better fit than Poisson.
- Genomics / RNA-seq: DESeq2, edgeR and other differential expression tools use the negative binomial distribution to model read counts per gene, explicitly modelling overdispersion.
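- Epidemiology: COVID-19 transmission data showed strong overdispersion in the number of secondary cases per infected person, indicating superspreader dynamics. Epidemiologists summarise this with the dispersion parameter k of the negative binomial (the role played by r here, not the failure count k used above); smaller k means stronger overdispersion. The negative binomial fits such data far better than Poisson.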
- Reliability engineering: The number of operating cycles up to the r-th failure of a repairable system follows a negative binomial when each cycle fails independently with the same probability.
- Sports analytics: Goals scored per game in soccer, runs in cricket innings, and wickets in bowling spells all show overdispersed patterns well captured by the negative binomial.
- Marketing and sales: Number of sales calls before r confirmed orders, number of ad impressions before r clicks.
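The Gamma-mixing statement in the insurance item above can be illustrated by simulation: draw a claim rate from a Gamma distribution with shape r and scale (1−p)/p, then draw a Poisson count at that rate. The sketch below uses only the standard library, so the Poisson sampler is a simple multiplication-based routine (Knuth's method) rather than a library call. The seed, sample size, and parameters r = 2, p = 0.5 are arbitrary assumptions.

```python
import random
from math import exp
from statistics import fmean, pvariance


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate by multiplying uniforms (Knuth's method)."""
    threshold = exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def gamma_poisson_sample(r: float, p: float, rng: random.Random) -> int:
    """Draw a rate from Gamma(shape=r, scale=(1-p)/p), then a Poisson count."""
    lam = rng.gammavariate(r, (1 - p) / p)
    return poisson_sample(lam, rng)


rng = random.Random(12345)  # fixed seed so the run is repeatable
r, p = 2.0, 0.5             # illustrative parameters
draws = [gamma_poisson_sample(r, p, rng) for _ in range(20_000)]

sample_mean = fmean(draws)
sample_var = pvariance(draws)

# Theory for the failure-count form: mean r(1-p)/p = 2, variance r(1-p)/p^2 = 4.
print(sample_mean, sample_var)
```

The simulated counts should show a variance well above the mean, the overdispersion that a single-rate Poisson model cannot reproduce.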