Cumulative standard normal distribution
July 07, 2019 3 minutes reading time Math go rust statistics
I’ve written about the use of the central limit theorem (CLT) to solve some problems in statistics before. It involves calculating a z-value and looking up the probability in a z-table. A z-table holds the cumulative probabilities for values that follow the standard normal distribution, i.e. the mean is 0 and the standard deviation is 1. For other means and standard deviations, the values can be normalized, so that mean/standard deviation of the standard normal distribution apply.
In programs like Excel or in code, we can easily calculate the z-table values. Some languages like Java (Apache Commons Math) or Rust (libm crate) require the use of 3rd party libraries if you don’t want to do all the work yourself, but Go has everything built-in already. Let’s see how to do it.
The cumulative distribution function for a function with normal distribution: $$\phi\left(x\right) = \frac{1}{2}\left(1+\textbf{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)$$
Where $\textbf{erf}$ is the error function: $$\textbf{erf}\left(z\right) = \frac{2}{\sqrt{\pi}}\int_0^z e^{-t^2} dt$$
Go has the error function $\textbf{erf}$ in the math
package, so it’s easy to write $\phi(x)$ as a higher-order-function, that takes the mean $\mu$ and the standard deviation $\sigma$ and gives us a function that only depends on the random variable x
.
With this we can write all the code necessary to solve the statistics problem mentioned above.
package main
import (
"fmt"
"math"
)
func phi(m float64, sd float64) func(float64) float64 {
return func(x float64) float64 {
z := (x - m) / sd / math.Sqrt(2)
return (1 + math.Erf(z)) / 2
}
}
func main() {
maxWeight, boxes, m, sd := 2630.0, 36.0, 72.0, 3.0
// the central limit theorem states that if we have a
// sufficiently large sample of n boxes, we can expect
// the mean of this sample to be equal to the statistical
// mean, and the σ_sample of the sample to be σ / sqrt(n)
zTable := phi(0, 1) // cumulative std. normal distribution
boxCritical := maxWeight / boxes
sdSample := sd / math.Sqrt(boxes)
// normalize for z-table use
z := (boxCritical - m) / sdSample
p := zTable(z)
// probability that boxes can be transported
fmt.Printf("%f\n", p)
}
Rust has only a minimal standard library by design, where the error function is not included. But you can use the libm crate for that.
use libm::erf;
fn phi(m: f64, sd: f64) -> impl Fn(f64) -> f64 {
move |x| {
(1.0 + erf((x - m) / sd / 2_f64.sqrt())) / 2.0
}
}
fn main() {
let max_weight : f64 = 2630.0; // in kg
let boxes : f64 = 36.0; // amount of boxes
let m : f64 = 72.0; // mean of the historic boxes
let sd : f64 = 3.0; // standard distribution of the historic boxes
let z_table = phi(0.0, 1.0); // cumulative std. normal distribution
let box_critical = max_weight / boxes;
let sd_sample = sd / boxes.sqrt();
// normalize for z-table use
let z = (box_critical - m) / sd_sample;
let p = z_table(z);
// probability that boxes can be transported
println!("{}", p);
}
We could have used the $\phi$ function without normalizing, since it can handle arbitrary means and standard deviations, but I wanted to show a way that mimics how you would do it by hand using a z-table.