In probability and statistics, the Yule–Simon distribution is a discrete probability distribution named after Udny Yule and Herbert A. Simon. Simon originally called it the Yule distribution.[1]

The probability mass function (pmf) of the Yule–Simon (ρ) distribution is

\( {\displaystyle f(k;\rho )=\rho \operatorname {B} (k,\rho +1),} \)

for integer \( k\geq 1 \) and real \( \rho >0 \), where \( {\displaystyle \operatorname {B} } \) is the beta function. Equivalently, the pmf can be written in terms of the falling factorial as

\( {\displaystyle f(k;\rho )={\frac {\rho \Gamma (\rho +1)}{(k+\rho )^{\underline {\rho +1}}}},} \)

where \( \Gamma \) is the gamma function. Thus, if \( \rho \) is an integer,

\( {\displaystyle f(k;\rho )={\frac {\rho \,\rho !\,(k-1)!}{(k+\rho )!}}.} \)
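
The forms of the pmf given above can be checked against each other numerically. The following is a minimal sketch using only the Python standard library; the function names `yule_simon_pmf` and `yule_simon_pmf_integer_rho` are illustrative choices, not taken from any library.

```python
import math


def yule_simon_pmf(k: int, rho: float) -> float:
    """Evaluate f(k; rho) = rho * B(k, rho + 1) for integer k >= 1, rho > 0.

    The beta function is computed through log-gamma values so that large k
    does not overflow.
    """
    if k < 1 or rho <= 0:
        raise ValueError("requires integer k >= 1 and rho > 0")
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)


def yule_simon_pmf_integer_rho(k: int, rho: int) -> float:
    """Factorial form rho * rho! * (k - 1)! / (k + rho)!, for integer rho only."""
    numerator = rho * math.factorial(rho) * math.factorial(k - 1)
    return numerator / math.factorial(k + rho)


if __name__ == "__main__":
    # For integer rho the two forms should agree up to rounding.
    print(yule_simon_pmf(3, 2.0), yule_simon_pmf_integer_rho(3, 2))
```

For integer \( \rho \) the two functions should agree to floating-point precision, and summing `yule_simon_pmf` over a long range of k should approach 1.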

The parameter \( \rho \) can be estimated using a fixed-point algorithm.[2]

The probability mass function f has the property that for sufficiently large k we have

\( {\displaystyle f(k;\rho )\approx {\frac {\rho \Gamma (\rho +1)}{k^{\rho +1}}}\propto {\frac {1}{k^{\rho +1}}}.} \)

Plot of the Yule–Simon(1) distribution (red) and its asymptotic Zipf's law (blue)

This means that the tail of the Yule–Simon distribution is a realization of Zipf's law: \( f(k;\rho ) \) can be used to model, for example, the relative frequency of the kth most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of k.
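
The quality of the power-law approximation can be illustrated by taking the ratio of the exact pmf to its asymptotic form. This is a small sketch with hypothetical helper names (`power_law_tail`, `tail_ratio`); the value \( \rho = 1.5 \) is an arbitrary choice for illustration.

```python
import math


def yule_simon_pmf(k, rho):
    """Exact pmf rho * B(k, rho + 1), computed via log-gamma."""
    log_beta = math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
    return rho * math.exp(log_beta)


def power_law_tail(k, rho):
    """Asymptotic form rho * Gamma(rho + 1) / k^(rho + 1)."""
    return rho * math.gamma(rho + 1) / k ** (rho + 1)


def tail_ratio(k, rho):
    """Ratio exact / asymptotic; it should tend to 1 as k grows."""
    return yule_simon_pmf(k, rho) / power_law_tail(k, rho)


if __name__ == "__main__":
    for k in (1, 10, 100, 1000, 10000):
        print(k, tail_ratio(k, 1.5))
```

The ratio is noticeably below 1 for small k and approaches 1 as k increases, which is why the power law describes only the tail of the distribution.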


The Yule–Simon distribution arose originally as the limiting distribution of a particular stochastic process studied by Yule as a model for the distribution of biological taxa and subtaxa.[3] Simon dubbed this process the "Yule process" but it is more commonly known today as a preferential attachment process. The preferential attachment process is an urn process in which balls are added to a growing number of urns, each ball being allocated to an urn with probability linear in the number the urn already contains.
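
The urn process can be sketched in a few lines of simulation code. The version below is one common way to set it up and rests on assumptions not stated in the text: at each step a new urn is opened with a fixed probability `alpha` (a hypothetical parameter), and otherwise the ball goes to an existing urn chosen with probability proportional to its current size.

```python
import random
from collections import Counter


def simulate_preferential_attachment(n_balls, alpha, seed=0):
    """Return the list of urn sizes after n_balls balls have been placed.

    alpha is the (assumed) probability that a ball opens a new urn.
    """
    rng = random.Random(seed)
    urn_sizes = [1]    # start with a single urn holding one ball
    ball_owner = [0]   # ball_owner[i] is the index of the urn holding ball i
    for _ in range(n_balls - 1):
        if rng.random() < alpha:
            # Open a new urn containing this ball.
            urn_sizes.append(1)
            ball_owner.append(len(urn_sizes) - 1)
        else:
            # Picking a uniformly random existing ball selects its urn with
            # probability proportional to the number of balls in that urn.
            urn = ball_owner[rng.randrange(len(ball_owner))]
            urn_sizes[urn] += 1
            ball_owner.append(urn)
    return urn_sizes


def size_frequencies(urn_sizes):
    """Fraction of urns that hold exactly k balls, for each observed k."""
    counts = Counter(urn_sizes)
    total = len(urn_sizes)
    return {k: c / total for k, c in sorted(counts.items())}


if __name__ == "__main__":
    sizes = simulate_preferential_attachment(20000, alpha=0.2)
    for k, freq in list(size_frequencies(sizes).items())[:5]:
        print(k, freq)
```

Keeping a list of ball owners is a design choice that makes the size-proportional draw a single uniform index lookup, rather than a weighted search over urns.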

The distribution also arises as a compound distribution, in which the parameter of a geometric distribution is treated as a function of a random variable having an exponential distribution. Specifically, assume that \( W \) follows an exponential distribution with scale \( 1/\rho \) or rate \( \rho \):

\( {\displaystyle W\sim \operatorname {Exponential} (\rho ),} \)

with density

\( {\displaystyle h(w;\rho )=\rho \exp(-\rho w).} \)

Then a Yule–Simon distributed variable K has the following geometric distribution conditional on W:

\( {\displaystyle K\sim \operatorname {Geometric} (\exp(-W))\,.} \)

The pmf of a geometric distribution is

\( {\displaystyle g(k;p)=p(1-p)^{k-1}} \)

for \( {\displaystyle k\in \{1,2,\dotsc \}} \). The Yule–Simon pmf is then the following exponential-geometric compound distribution:

\({\displaystyle f(k;\rho )=\int _{0}^{\infty }g(k;\exp(-w))h(w;\rho )\,dw.} \)
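
The compound representation gives a direct way to draw samples: draw W from the exponential distribution, then draw K from a geometric distribution with success probability \( \exp(-W) \), as in the integral above. The sketch below uses only the standard library; `sample_yule_simon` is a hypothetical name, and the geometric draw uses inverse-transform sampling, which is one of several possible methods.

```python
import math
import random


def sample_yule_simon(rho, rng):
    """Draw one Yule-Simon(rho) variate via the exponential-geometric mixture."""
    w = rng.expovariate(rho)   # W ~ Exponential(rate = rho)
    p = math.exp(-w)           # success probability of the geometric draw
    if p >= 1.0:
        return 1               # Geometric(1) always returns 1
    if p == 0.0:
        raise OverflowError("success probability underflowed to zero")
    u = 1.0 - rng.random()     # uniform on (0, 1], avoids log(0)
    # Inverse transform for a geometric variable on {1, 2, ...}:
    # P(K >= k) = (1 - p)^(k - 1).
    return int(math.floor(math.log(u) / math.log1p(-p))) + 1


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_yule_simon(2.0, rng) for _ in range(10)]
    print(draws)
```

As a sanity check, the fraction of draws equal to 1 should be close to \( f(1;\rho )=\rho /(\rho +1) \) for a large sample.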

The maximum likelihood estimator for the parameter \( \rho \) given the observations \( {\displaystyle k_{1},k_{2},k_{3},\dots ,k_{N}} \) is the solution to the fixed point equation

\( {\displaystyle \rho ^{(t+1)}={\frac {N+a-1}{b+\sum _{i=1}^{N}\sum _{j=1}^{k_{i}}{\frac {1}{\rho ^{(t)}+j}}}},} \)

where \( a \) and \( b \) are the shape and rate parameters of a gamma distribution prior on \( \rho \); setting \( {\displaystyle a=1,b=0} \) gives the maximum likelihood estimator.
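
The fixed point equation translates into a short iteration. The following is a minimal sketch, not the reference implementation from [2] or [4]; the function name, the starting value `rho0`, the tolerance and the iteration cap are all assumptions made for illustration.

```python
def yule_simon_fixed_point(data, a=1.0, b=0.0, rho0=1.0, tol=1e-10, max_iter=10000):
    """Iterate rho <- (N + a - 1) / (b + sum_i sum_{j=1}^{k_i} 1 / (rho + j)).

    With a = 1 and b = 0 the fixed point is the maximum likelihood estimate.
    If every observation equals 1 the iteration grows without bound, so the
    value returned after max_iter steps is not meaningful in that case.
    """
    n = len(data)
    rho = rho0
    for _ in range(max_iter):
        s = sum(1.0 / (rho + j) for k in data for j in range(1, k + 1))
        rho_next = (n + a - 1.0) / (b + s)
        if abs(rho_next - rho) < tol * max(1.0, rho):
            return rho_next
        rho = rho_next
    return rho


if __name__ == "__main__":
    observations = [1, 1, 2, 3, 1, 5, 1, 2]   # made-up data for illustration
    print(yule_simon_fixed_point(observations))
```

At convergence the estimate should satisfy the score equation \( N/\rho =\sum _{i}\sum _{j=1}^{k_{i}}1/(\rho +j) \), which provides a simple check on the result.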

This algorithm was derived by Garcia [2] by directly optimizing the likelihood. Roberts and Roberts [4] generalize the algorithm to Bayesian settings using the compound geometric formulation described above. They also use the expectation-maximization (EM) framework to show convergence of the fixed-point algorithm, derive the sub-linearity of its convergence rate, and give two alternative derivations of the standard error of the estimator from the fixed point equation. The variance of the estimator \( {\hat {\rho }} \) is

\( {\displaystyle \operatorname {Var} ({\hat {\rho }})={\frac {1}{{\frac {N}{{\hat {\rho }}^{2}}}-\sum _{i=1}^{N}\sum _{j=1}^{k_{i}}{\frac {1}{({\hat {\rho }}+j)^{2}}}}},} \)

and the standard error is the square root of this estimate. Because the sums already run over all \( N \) observations, no further division by \( N \) is needed.
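
The variance formula can be evaluated directly from the data and the fitted parameter. The sketch below (hypothetical function names; the fitted value is called `rho_hat` in the code) also includes the log-likelihood, since the denominator of the formula is the negative second derivative of the log-likelihood and the two can be compared numerically.

```python
import math


def yule_simon_variance(data, rho_hat):
    """Evaluate 1 / (N / rho^2 - sum_i sum_{j=1}^{k_i} 1 / (rho + j)^2)."""
    n = len(data)
    curvature = sum(1.0 / (rho_hat + j) ** 2 for k in data for j in range(1, k + 1))
    information = n / rho_hat ** 2 - curvature
    if information <= 0:
        raise ValueError("observed information is not positive at rho_hat")
    return 1.0 / information


def log_likelihood(data, rho):
    """Sum of log f(k_i; rho) with f(k; rho) = rho * B(k, rho + 1)."""
    return sum(
        math.log(rho) + math.lgamma(k) + math.lgamma(rho + 1) - math.lgamma(k + rho + 1)
        for k in data
    )


if __name__ == "__main__":
    observations = [1, 1, 2, 3, 1, 5, 1, 2]   # made-up data for illustration
    print(yule_simon_variance(observations, 1.63))
```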

The two-parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function. The probability mass function of the generalized Yule–Simon(ρ, α) distribution is defined as

\( {\displaystyle f(k;\rho ,\alpha )={\frac {\rho }{1-\alpha ^{\rho }}}\;\mathrm {B} _{1-\alpha }(k,\rho +1),\,} \)

with \( 0\leq \alpha <1. \) For \( \alpha =0 \) the ordinary Yule–Simon(ρ) distribution is obtained as a special case. The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail.
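
The generalized pmf can be explored numerically. The standard library has no incomplete beta function, so this sketch approximates \( \mathrm {B} _{x}(a,b)=\int _{0}^{x}t^{a-1}(1-t)^{b-1}\,dt \) with composite Simpson integration; that integration scheme, the number of subintervals and the function names are assumptions of the example, and a dedicated special-function routine would normally be preferable.

```python
def incomplete_beta(x, a, b, n=2000):
    """Approximate B_x(a, b) by composite Simpson's rule with n (even) intervals.

    Assumes a >= 1 and b >= 1, so the integrand is bounded on [0, x].
    """
    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / n
    total = integrand(0.0) + integrand(x)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * integrand(i * h)
    return total * h / 3


def generalized_yule_simon_pmf(k, rho, alpha):
    """f(k; rho, alpha) = rho / (1 - alpha^rho) * B_{1 - alpha}(k, rho + 1)."""
    if not 0 <= alpha < 1:
        raise ValueError("requires 0 <= alpha < 1")
    return rho / (1 - alpha ** rho) * incomplete_beta(1 - alpha, k, rho + 1)


if __name__ == "__main__":
    for k in (1, 2, 5, 10, 20):
        print(k, generalized_yule_simon_pmf(k, 1.5, 0.3))
```

With \( \alpha >0 \) the printed values fall off much faster in k than a power law, which reflects the exponential cutoff described above.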
See also

Zeta distribution
Scale-free network


Colin Rose and Murray D. Smith, Mathematical Statistics with Mathematica. New York: Springer, 2002, ISBN 0-387-95234-9. (See page 107, where it is called the "Yule distribution".)


[1] Simon, H. A. (1955). "On a class of skew distribution functions". Biometrika. 42 (3–4): 425–440. doi:10.1093/biomet/42.3-4.425.
[2] Garcia Garcia, Juan Manuel (2011). "A fixed-point algorithm to estimate the Yule-Simon distribution parameter". Applied Mathematics and Computation. 217 (21): 8560–8566. doi:10.1016/j.amc.2011.03.092.
[3] Yule, G. U. (1924). "A Mathematical Theory of Evolution, based on the Conclusions of Dr. J. C. Willis, F.R.S". Philosophical Transactions of the Royal Society B. 213 (402–410): 21–87. doi:10.1098/rstb.1925.0002.
[4] Roberts, Lucas; Roberts, Denisa (2017). "An Expectation Maximization Framework for Preferential Attachment Models". arXiv:1710.08511 [stat.CO].
