Concept of ANOVA and Its Sample Size Calculation Formula
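We have $k$ groups, each with $n$ observations: $y_{ij}$, $i = 1, \dots, k$, $j = 1, \dots, n$.
Total sample size: $N = kn$
Overall mean: $\bar{y}_{\cdot\cdot} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n} y_{ij}$
Group means: $\bar{y}_{i\cdot} = \frac{1}{n} \sum_{j=1}^{n} y_{ij}$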
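ANOVA decomposition
The total sum of squares:
$$TSS = \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2$$
and this can be decomposed into:
$$TSS = SST + SSE$$
where
$$SST = n \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$$
is known as the sum of squares due to treatments, and
$$SSE = \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2$$
is known as the sum of squares due to random errors.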
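The decomposition can be checked numerically. The following is a minimal sketch on simulated toy data (the seed, group sizes, and means are illustrative assumptions, not results from this article); it computes the three sums of squares directly from their definitions.

```r
# Toy data: 3 groups of 5 observations (simulated for illustration only)
set.seed(1)
k <- 3
n <- 5
group <- factor(rep(1:k, each = n))
y <- rnorm(k * n, mean = rep(c(10, 12, 15), each = n), sd = 4)

grand_mean  <- mean(y)
group_means <- tapply(y, group, mean)

TSS <- sum((y - grand_mean)^2)                      # total variation
SST <- n * sum((group_means - grand_mean)^2)        # between groups (treatments)
SSE <- sum((y - group_means[as.integer(group)])^2)  # within groups (random error)

# The two sides agree up to floating-point error
all.equal(TSS, SST + SSE)
```

Because the cross-product term vanishes in a balanced or unbalanced one-way layout, the identity holds for any data set, not only for this simulated one.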
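Distribution of Sum of Squares
Under Null Hypothesis: $H_0: \mu_1 = \mu_2 = \dots = \mu_k = \mu$ (say constant)
$SST$ is just random noise: $SST / \sigma^2 \sim \chi^2_{k-1}$ (central chi-square)
Under Alternative Hypothesis: $H_1$: not all $\mu_i$ are equal
$SST$ has a mean shift due to the true group effect. This gives a non-central chi-square, $SST / \sigma^2 \sim \chi^2_{k-1}(\lambda)$, where $\lambda$ is the non-centrality parameter.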
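A quick simulation can make the two distributions tangible. The sketch below assumes the same settings as the R example later in this article (3 groups, 20 per group, standard deviation 4); `sim_sst` is a hypothetical helper introduced only for illustration. It compares the simulated mean of $SST/\sigma^2$ with the theoretical mean of a chi-square variable, which is $k-1$ in the central case and $k-1+\lambda$ in the non-central case.

```r
set.seed(42)
k <- 3
n <- 20
sigma <- 4

# Hypothetical helper: simulate one data set and return SST / sigma^2
sim_sst <- function(mu) {
  y <- rnorm(k * n, mean = rep(mu, each = n), sd = sigma)
  group_means <- tapply(y, rep(1:k, each = n), mean)
  n * sum((group_means - mean(y))^2) / sigma^2
}

sst_h0 <- replicate(5000, sim_sst(c(10, 10, 10)))  # equal means (H0)
sst_h1 <- replicate(5000, sim_sst(c(10, 12, 15)))  # unequal means (H1)

mu1 <- c(10, 12, 15)
lambda <- n * sum((mu1 - mean(mu1))^2) / sigma^2

# Simulated means versus theoretical means
c(simulated = mean(sst_h0), theory = k - 1)
c(simulated = mean(sst_h1), theory = k - 1 + lambda)
```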
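Deriving the non-centrality parameter $\lambda$
By definition, for a non-central chi-square, $\lambda$ is the sum of squared standardized mean shifts: if $Z_i \sim N(\delta_i, 1)$ independently, then $\sum_i Z_i^2$ has non-centrality parameter $\lambda = \sum_i \delta_i^2$.
Here, the mean shift for group $i$ is $\mu_i - \bar{\mu}$, where $\bar{\mu} = \frac{1}{k} \sum_{i=1}^{k} \mu_i$. Each group mean is based on $n$ observations, so the variance of $\bar{y}_{i\cdot}$ is $\sigma^2 / n$. So, the contribution to $\lambda$ for each group is:
$$\frac{(\mu_i - \bar{\mu})^2}{\sigma^2 / n} = \frac{n (\mu_i - \bar{\mu})^2}{\sigma^2}$$
Summing over all $k$ groups:
$$\lambda = \frac{n \sum_{i=1}^{k} (\mu_i - \bar{\mu})^2}{\sigma^2}$$
This is exactly the non-centrality parameter $\lambda$ for one-way ANOVA.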
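As a worked instance of this formula, take the values used in the simulation example of this article: means $(10, 12, 15)$, $\sigma = 4$, and $n = 20$ per group. The arithmetic below is only a plug-in of those numbers.

```latex
\bar{\mu} = \frac{10 + 12 + 15}{3} = \frac{37}{3} \approx 12.33
\\[4pt]
\sum_{i=1}^{3} (\mu_i - \bar{\mu})^2
  = \left(-\tfrac{7}{3}\right)^2 + \left(-\tfrac{1}{3}\right)^2 + \left(\tfrac{8}{3}\right)^2
  = \frac{114}{9} \approx 12.67
\\[4pt]
\lambda = \frac{n \sum_i (\mu_i - \bar{\mu})^2}{\sigma^2}
  = \frac{20 \times 12.67}{16} \approx 15.8
```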
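F-statistic distribution
$$F = \frac{SST / (k - 1)}{SSE / (N - k)}$$
Under $H_0$: $F \sim F_{k-1,\, N-k}$ (central)
Under $H_1$: $F \sim F_{k-1,\, N-k}(\lambda)$ (non-central)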
Rejection criteria for Null hypothesis
Let level of significance: $\alpha = 0.05$
(1) Cut-off method: $F > F_{1-\alpha;\, k-1,\, N-k}$: Reject the Null hypothesis
(2) P-value approach: $p = P(F_{k-1,\, N-k} \ge F_{\text{obs}}) < 0.05$: Reject the Null hypothesis
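Both rules can be evaluated with base R's `qf` and `pf`. The sketch below plugs in the observed value F = 9.38 and the design (3 groups, 20 per group) from the example in this article; the variable names are illustrative.

```r
alpha <- 0.05
k <- 3
n <- 20
df1 <- k - 1          # between-group degrees of freedom
df2 <- k * (n - 1)    # within-group degrees of freedom (N - k)
F_obs <- 9.38         # observed F-statistic quoted in the example

# (1) Cut-off method: compare F with the upper-alpha quantile of the central F
crit <- qf(1 - alpha, df1, df2)
reject_cutoff <- F_obs > crit

# (2) P-value approach: upper-tail probability under the central F
pval <- pf(F_obs, df1, df2, lower.tail = FALSE)
reject_pval <- pval < alpha

c(critical_value = crit, p_value = pval)
c(cutoff = reject_cutoff, pvalue = reject_pval)
```

The two rules are equivalent by construction: the p-value falls below $\alpha$ exactly when the statistic exceeds the critical value.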
Sample size calculation formula:
For statistical power and sample size calculations, the non-central F distribution is applied with Cohen's f effect size. This approach is used because the goal is to determine the sample size needed to detect meaningful differences in treatment effects on the outcome.
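$$\text{Power} = P\left(F_{k-1,\, N-k}(\lambda) > F_{1-\alpha;\, k-1,\, N-k}\right), \qquad \lambda = N f^2$$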
Example: One-Way ANOVA and the Logic of Hypothesis Testing
Imagine a study with
three different treatment groups. Each group receives its own intervention, and
we measure a continuous outcome such as blood pressure, weight loss, or test
scores. The central question is whether the treatments truly differ in their
effects on average, or whether any observed differences could simply be
explained by random variation.
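To explore this, consider a simulation where the population means are set to $(\mu_1, \mu_2, \mu_3) = (10, 12, 15)$, with a common standard deviation of $\sigma = 4$. From these populations, we draw 60 observations in total (20 per group). The resulting sample means and the overall mean are computed in the R code at the end of this article (`group_means` and `grand_mean`); the sample distributions are shown in Figure 1.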
The hypotheses are:
$H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_1$: not all $\mu_i$ are equal.
Applying one-way ANOVA gives an observed F-statistic of 9.38. The question then becomes: is 9.38 large enough to reject the null? Under $H_0$, the F-statistic follows a central F distribution with degrees of freedom $(k - 1,\, N - k) = (2, 57)$; this corresponds to the black curve in Figure 2. Under $H_1$, the F-statistic instead follows a noncentral F distribution, the red curve in Figure 2.
At the 5% significance
level, the critical value is about 3.16. Since our observed statistic F = 9.38
is far to the right of this cutoff, we reject the null hypothesis and conclude
that not all treatments have the same mean effect. The p-value formalizes this:
$$p = P(F_{2,\,57} \ge 9.38 \mid H_0),$$
which here is roughly
0.0003. Such a tiny probability indicates it would be extremely unlikely to see
an F-statistic this large if the group means were truly equal. Importantly:
Figure 3 illustrates
this idea in the same spirit as Figure 2.
Because we simulated with genuine mean differences, we can also discuss power, the probability of rejecting $H_0$ when it is false. In this case, the theoretical noncentrality parameter is λ ≈ 15.8, which shifts the F distribution to the right. Simulation shows that with 20 subjects per group and these mean differences, the test achieves about 94% power at the 5% significance level.
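The simulated power can be cross-checked against the theoretical value from the noncentral F distribution. This is a minimal sketch using base R's `qf` and `pf` with the `ncp` argument, under the same settings as the example (means 10, 12, 15; σ = 4; 20 per group); the result should land close to the simulated figure quoted above.

```r
k <- 3
n <- 20
sigma <- 4
mu <- c(10, 12, 15)

df1 <- k - 1
df2 <- k * (n - 1)

# Noncentrality parameter for the balanced one-way design
lambda <- n * sum((mu - mean(mu))^2) / sigma^2

# Critical value under H0, then the probability of exceeding it under H1
crit  <- qf(0.95, df1, df2)
power <- pf(crit, df1, df2, ncp = lambda, lower.tail = FALSE)

c(lambda = lambda, critical_value = crit, power = power)
```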
By contrast, if we repeat the simulation assuming no true treatment differences in the population (say, all means equal to 10), the resulting test statistic is much smaller: F = 0.25, well below the critical value of 3.16. In this case, we fail to reject the null hypothesis of equal means. Figures 4 to 6 visualize the sample distributions of the groups, the sampling distributions of the F-statistic under both $H_0$ and $H_1$, and the theoretical population-level projections of those distributions.






Sample Size Calculation for One-Way ANOVA
Determining the required sample size is
an important part of designing an ANOVA study. The goal is to ensure that the
test has enough power to detect meaningful differences between group means if
such differences truly exist. Two practical approaches are commonly used: a
direct method based on pilot means and standard deviations, and a statistically
rigorous method using the non-central F distribution.
Approach 1: Direct Formula Using Group Means and Standard Deviations
When pilot or published data already provide estimates of the group means $\mu_1, \dots, \mu_k$ and a common standard deviation $\sigma$, the sample size can be estimated using a simplified normal-approximation formula. This approach directly uses the magnitude of real differences among the expected means.
First compute the deviation of each group mean from the overall mean: $\mu_i - \bar{\mu}$, $i = 1, \dots, k$.
Then the per-group sample size $n$ can be approximated as:

This formula increases the required
sample size when:
This method is intuitive, easy to apply, and produces a data-driven estimate. It is best used when realistic pilot means and standard deviations are available.
Approach 2: Using the Non-Central F Distribution (Statistically Rigorous Method)
ANOVA decisions are based on the F-statistic, which under the alternative hypothesis follows a non-central F distribution with non-centrality parameter $\lambda$. Using this parameter, the power of the ANOVA test is:
$$\text{Power} = P\left(F_{k-1,\, N-k}(\lambda) > F_{1-\alpha;\, k-1,\, N-k}\right)$$
The sample size $n$ is chosen such that this probability reaches the target power (e.g., 80% or 90%). Although this must be solved numerically, it is the exact method and directly matches the theory used in ANOVA.
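One simple way to solve this numerically is to increase $n$ until the power first reaches the target. The sketch below is an illustration under that assumption, not the method of any particular tool; `anova_power` and `n_for_power` are hypothetical helper names, and the inputs reuse the example values from this article.

```r
# Hypothetical helper: power of the one-way ANOVA F-test with n per group
anova_power <- function(n, mu, sigma, alpha = 0.05) {
  k <- length(mu)
  df1 <- k - 1
  df2 <- k * (n - 1)
  lambda <- n * sum((mu - mean(mu))^2) / sigma^2
  crit <- qf(1 - alpha, df1, df2)
  pf(crit, df1, df2, ncp = lambda, lower.tail = FALSE)
}

# Hypothetical helper: smallest n per group whose power reaches the target
n_for_power <- function(mu, sigma, target = 0.80, alpha = 0.05, n_max = 10000) {
  for (n in 2:n_max) {
    if (anova_power(n, mu, sigma, alpha) >= target) return(n)
  }
  NA_integer_  # target not reachable within n_max
}

mu <- c(10, 12, 15)
sigma <- 4
n_req <- n_for_power(mu, sigma, target = 0.80)

n_req                               # required sample size per group
anova_power(n_req, mu, sigma)       # power achieved at that n
anova_power(n_req - 1, mu, sigma)   # power just below the target
```

A linear search is used here for transparency; a root finder such as `uniroot` on a continuous version of `anova_power` would be a reasonable alternative for very large sample sizes.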
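An equivalent and widely used re-expression uses the standardized effect size known as Cohen's $f$:
$$f = \frac{\sigma_m}{\sigma}; \qquad \lambda = N f^2 = k\, n\, f^2$$
where
$$\sigma_m = \sqrt{\frac{1}{k} \sum_{i=1}^{k} (\mu_i - \bar{\mu})^2}$$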
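The equivalence between the effect-size form and the direct form of $\lambda$ can be verified in a few lines. This sketch assumes the population-style definition of Cohen's $f$ (standard deviation of the group means, dividing by $k$, over the common $\sigma$) and uses the example values from this article.

```r
mu <- c(10, 12, 15)
sigma <- 4
n <- 20
k <- length(mu)
N <- k * n

# Standard deviation of the group means (divide by k, not k - 1)
sigma_m <- sqrt(sum((mu - mean(mu))^2) / k)

f <- sigma_m / sigma          # Cohen's f
lambda_from_f <- N * f^2      # noncentrality via the effect size
lambda_direct <- n * sum((mu - mean(mu))^2) / sigma^2

c(f = f, lambda_from_f = lambda_from_f, lambda_direct = lambda_direct)
all.equal(lambda_from_f, lambda_direct)
```

For reference, Cohen's conventional benchmarks for $f$ are 0.10 (small), 0.25 (medium), and 0.40 (large), so the mean differences in this example correspond to a large effect.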
This method is preferred when:
For more details on sample size calculation,
visit this sample size calculation tool: StudySizer.streamlit.app
R Code
# ------------------------------
# Simulation Example: 1
# ------------------------------

##-------- Simulate sample ---------##
set.seed(123)

# Parameters
k     <- 3               # number of groups
n     <- 20              # sample size per group
sigma <- 4               # common sd
mu    <- c(10, 12, 15)   # true means for groups (H1: not all equal)

# Generate data
group  <- rep(1:k, each = n)
values <- c(rnorm(n, mean = mu[1], sd = sigma),
            rnorm(n, mean = mu[2], sd = sigma),
            rnorm(n, mean = mu[3], sd = sigma))

## Data
data <- data.frame(group = factor(group), y = values)

##-------- Run ANOVA ---------##
anova_model <- aov(y ~ group, data = data)
summary(anova_model)

##-------- Manual decomposition ---------##
grand_mean  <- mean(values)
group_means <- tapply(values, group, mean)

# SST (between-group)
SST <- sum(n * (group_means - grand_mean)^2)

# SSE (within-group)
SSE <- sum((values - rep(group_means, each = n))^2)

# TSS
TSS <- SST + SSE

# F statistic
df_between <- k - 1
df_within  <- k * (n - 1)
Fstat <- (SST / df_between) / (SSE / df_within)

# Theoretical noncentrality λ
lambda <- n * sum((mu - mean(mu))^2) / sigma^2

# Compare with observed F
crit <- qf(0.95, df_between, df_within)   # critical F at 5% level

# P-value
pval <- pf(Fstat, df_between, df_within, lower.tail = FALSE)

# Simulation for Empirical power
nsim <- 5000
rejects <- replicate(nsim, {
  values <- c(rnorm(n, mean = mu[1], sd = sigma),
              rnorm(n, mean = mu[2], sd = sigma),
              rnorm(n, mean = mu[3], sd = sigma))
  group <- rep(1:k, each = n)
  model <- aov(values ~ factor(group))
  pval  <- summary(model)[[1]]$`Pr(>F)`[1]
  pval < 0.05
})
mean(rejects)   # estimated power

##---------- Visualization -----------##
##-- Kernel Density Estimates of the Sample Data from the 3 Treatment Groups.
plot(density(data$y[group == 1]), lty = 1, ylim = c(0, 0.12), xlim = c(0, 28),
     main = "Figure 1: Sample distribution of 3 Treatment groups")
lines(density(data$y[group == 2]), lty = 2, col = "red")
lines(density(data$y[group == 3]), lty = 3, col = "blue")
legend("topright",
       legend = c("Treatment 1", "Treatment 2", "Treatment 3"),
       col = c("black", "red", "blue"), lty = c(1, 2, 3))
legend("topleft",
       legend = c("mu1 = 10", "mu2 = 12", "mu3 = 15"),
       col = c("black", "red", "blue"), lty = c(1, 2, 3))

# Sample distribution of the test statistic at Null and Alternative hypothesis
plot(density(rf(1000, df_between, df_within)), xlim = c(0, 10),
     main = "Figure 2: Sample distribution of the test statistic at Null and Alternative Hypothesis")
lines(density(rf(1000, df_between, df_within, ncp = lambda)), lty = 2, col = "red")
abline(v = Fstat, col = "blue", lwd = 2)                # observed F
crit <- qf(0.95, df_between, df_within)
abline(v = crit, col = "darkgreen", lwd = 2, lty = 3)   # critical value at 5%
legend("topright",
       legend = c("Central F", "Noncentral F", "Observed F", "Critical Value"),
       col = c("black", "red", "blue", "darkgreen"),
       lty = c(1, 2, 1, 3), lwd = c(1, 1, 2, 2))

# Population Projection of the test statistic at Null and Alternative Hypothesis
curve(df(x, df_between, df_within), from = 0, to = 10, col = "black",
      ylab = "Density", xlab = "Bins",
      main = "Figure 3: Population Projection of the test statistic at Null and Alternative Hypothesis")
curve(df(x, df_between, df_within, ncp = lambda), from = 0, to = 10,
      col = "red", add = TRUE, lty = 2)
abline(v = Fstat, col = "blue", lwd = 2)
abline(v = crit, col = "darkgreen", lwd = 2, lty = 3)
legend("topright",
       legend = c("Central F", "Noncentral F", "Observed Value", "Critical Value"),
       col = c("black", "red", "blue", "darkgreen"),
       lty = c(1, 2, 1, 3), lwd = c(1, 1, 2, 2))

# ------------------------------
# Simulation Example: 2
# ------------------------------

##-------- Simulate sample ---------##
set.seed(123)

# Parameters
k     <- 3               # number of groups
n     <- 20              # sample size per group
sigma <- 4               # common sd
mu    <- c(10, 10, 10)   # true means for groups (H0: all equal)

# Generate data
group  <- rep(1:k, each = n)
values <- c(rnorm(n, mean = mu[1], sd = sigma),
            rnorm(n, mean = mu[2], sd = sigma),
            rnorm(n, mean = mu[3], sd = sigma))

## Data
data <- data.frame(group = factor(group), y = values)

##-------- Run ANOVA ---------##
anova_model <- aov(y ~ group, data = data)
summary(anova_model)

##-------- Manual decomposition ---------##
grand_mean  <- mean(values)
group_means <- tapply(values, group, mean)

# SST (between-group)
SST <- sum(n * (group_means - grand_mean)^2)

# SSE (within-group)
SSE <- sum((values - rep(group_means, each = n))^2)

# TSS
TSS <- SST + SSE

# F statistic
df_between <- k - 1
df_within  <- k * (n - 1)
Fstat <- (SST / df_between) / (SSE / df_within)

# Theoretical noncentrality λ (equals 0 here because all means are equal)
lambda <- n * sum((mu - mean(mu))^2) / sigma^2

# Compare with observed F
crit <- qf(0.95, df_between, df_within)   # critical F at 5% level

# P-value
pval <- pf(Fstat, df_between, df_within, lower.tail = FALSE)

# Simulation for empirical rejection rate
# (with equal means this estimates the type I error rate, not power)
nsim <- 5000
rejects <- replicate(nsim, {
  values <- c(rnorm(n, mean = mu[1], sd = sigma),
              rnorm(n, mean = mu[2], sd = sigma),
              rnorm(n, mean = mu[3], sd = sigma))
  group <- rep(1:k, each = n)
  model <- aov(values ~ factor(group))
  pval  <- summary(model)[[1]]$`Pr(>F)`[1]
  pval < 0.05
})
mean(rejects)   # estimated rejection rate under H0

##---------- Visualization -----------##
##-- Kernel Density Estimates of the Sample Data from the 3 Treatment Groups.
plot(density(data$y[group == 1]), lty = 1, ylim = c(0, 0.12), xlim = c(0, 25),
     main = "Figure 4: Sample distribution of 3 Treatment groups")
lines(density(data$y[group == 2]), lty = 2, col = "red")
lines(density(data$y[group == 3]), lty = 3, col = "blue")
legend("topright",
       legend = c("Treatment 1", "Treatment 2", "Treatment 3"),
       col = c("black", "red", "blue"), lty = c(1, 2, 3))
legend("topleft",
       legend = c("mu1 = 10", "mu2 = 10", "mu3 = 10"),
       col = c("black", "red", "blue"), lty = c(1, 2, 3))

# Sample distribution of the test statistic at Null and Alternative hypothesis
plot(density(rf(1000, df_between, df_within)), xlim = c(0, 10),
     main = "Figure 5: Sample distribution of the test statistic at Null and Alternative Hypothesis")
lines(density(rf(1000, df_between, df_within, ncp = lambda)), lty = 2, col = "red")
abline(v = Fstat, col = "blue", lwd = 2)                # observed F
crit <- qf(0.95, df_between, df_within)
abline(v = crit, col = "darkgreen", lwd = 2, lty = 3)   # critical value at 5%
legend("topright",
       legend = c("Central F", "Noncentral F", "Observed F", "Critical Value"),
       col = c("black", "red", "blue", "darkgreen"),
       lty = c(1, 2, 1, 3), lwd = c(1, 1, 2, 2))

# Population Projection of the test statistic at Null and Alternative Hypothesis
curve(df(x, df_between, df_within), from = 0, to = 10, col = "black",
      ylab = "Density", xlab = "Bins",
      main = "Figure 6: Population Projection of the test statistic at Null and Alternative Hypothesis")
curve(df(x, df_between, df_within, ncp = lambda), from = 0, to = 10,
      col = "red", add = TRUE, lty = 2)
abline(v = Fstat, col = "blue", lwd = 2)
abline(v = crit, col = "darkgreen", lwd = 2, lty = 3)
legend("topright",
       legend = c("Central F", "Noncentral F", "Observed Value", "Critical Value"),
       col = c("black", "red", "blue", "darkgreen"),
       lty = c(1, 2, 1, 3), lwd = c(1, 1, 2, 2))