Geometry in Space
$\omega^2 = (I_F)^{-1}$, where $I_F = ES^2$ is the mean information. If we introduce for $x_1, x_2 \in \mathcal{X}$ a Johnson difference
$$\tilde d(x_1, x_2) = \omega\,[S(x_2) - S(x_1)], \qquad (1.2)$$
we obtain in the sample space $\mathcal{X}$ a non-Euclidean Johnson distance $d(x_1, x_2) = |\tilde d(x_1, x_2)|$, which can be used for the testing of hypotheses and the determination of confidence intervals.
Densities, Johnson scores, Johnson means and Johnson variances of the distributions discussed in this paper are given for reference in Table 1. Apart from the normal distribution, the support of all the other distributions is $\mathcal{X} = (0, \infty)$.
Table 1. Some distributions and their characteristics.

Distribution | $f(x)$ | $S(x)$ | $x^*$ | $\omega^2$
normal | $\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$ | $\frac{x-\mu}{\sigma^2}$ | $\mu$ | $\sigma^2$
lognormal | $\frac{c}{\sqrt{2\pi}\,x}\,e^{-\frac{c^2}{2}\log^2\left(\frac{x}{t}\right)}$ | $c^2\log(x/t)$ | $t$ | $t^2/c^2$
Weibull | $\frac{c}{x}\left(\frac{x}{t}\right)^c e^{-\left(\frac{x}{t}\right)^c}$ | $c\,[(x/t)^c - 1]$ | $t$ | $t^2/c^2$
gamma | $\frac{1}{x\Gamma(\alpha)}(\gamma x)^\alpha e^{-\gamma x}$ | $\alpha\left(\frac{\gamma x}{\alpha} - 1\right)$ | $\alpha/\gamma$ | $\alpha/\gamma^2$
inv. gamma | $\frac{1}{x\Gamma(\alpha)}\left(\frac{\beta}{x}\right)^\alpha e^{-\beta/x}$ | $\alpha\left(1 - \frac{\beta}{\alpha x}\right)$ | $\beta/\alpha$ | $\beta^2/\alpha^3$
beta-prime | $\frac{1}{x\,B(p,q)}\,\frac{x^p}{(x+1)^{p+q}}$ | $\frac{qx - p}{x+1}$ | $p/q$ | $\frac{p(p+q+1)}{q^3}$
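The entries of Table 1 are easy to carry around in code. The following minimal Python sketch is an illustration only; the function and parameter names are ours, and the parametrization is the one used in Table 1.

```python
import math

# Johnson scores S(x) from Table 1 (parametrization as in the table)
def S_lognormal(x, t, c):
    return c**2 * math.log(x / t)

def S_weibull(x, t, c):
    return c * ((x / t)**c - 1.0)

def S_gamma(x, alpha, gam):
    return gam * x - alpha

def S_invgamma(x, alpha, beta):
    return alpha - beta / x

def S_betaprime(x, p, q):
    return (q * x - p) / (x + 1.0)

# Johnson means x* and Johnson variances omega^2 from Table 1
johnson_mean = {
    "lognormal": lambda t, c: t,
    "weibull":   lambda t, c: t,
    "gamma":     lambda alpha, gam: alpha / gam,
    "invgamma":  lambda alpha, beta: beta / alpha,
    "betaprime": lambda p, q: p / q,
}
johnson_var = {
    "lognormal": lambda t, c: t**2 / c**2,
    "weibull":   lambda t, c: t**2 / c**2,
    "gamma":     lambda alpha, gam: alpha / gam**2,
    "invgamma":  lambda alpha, beta: beta**2 / alpha**3,
    "betaprime": lambda p, q: p * (p + q + 1) / q**3,
}

# e.g. the Johnson mean is the root of the corresponding score:
x_star = johnson_mean["gamma"](2.0, 2.0)
print(x_star, S_gamma(x_star, 2.0, 2.0))   # 1.0 0.0
```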
Fig. 1 shows the densities and Johnson scores of Weibull distributions with $c = 1$ (exponential distribution), $c = 2$ (Rayleigh distribution) and $c = 3$ (Maxwell distribution). The densities decrease quickly to zero, showing a low probability of large observed values. The Johnson scores are sensitive to large values; this sensitivity increases with increasing $c$. The Johnson mean of all these distributions is $x^* = t$.
Figure 1. Densities and Johnson scores of Weibull distributions.
Fig. 2 shows the densities and Johnson scores of inverse gamma distributions with $\alpha = \beta = 0.6$ (1), $1$ (2) and $1.5$ (3). The densities decrease slowly to zero, showing that in this case extremely large values can be observed. The Johnson scores of these heavy-tailed distributions are bounded at infinity, so that averages $\frac{1}{n}\sum S(x_i)$ containing large observed values are robust (the averages can, however, be heavily influenced by observed values near zero, which occur with low probability). The means of the distributions denoted by 1 and 2 do not exist; the mean of distribution 3 is plotted by the star. The usual description of distributions by the mean and variance thus fails in this quite regular case. However, all three distributions have the same Johnson mean $x^* = 1$.

Let the data $x_n = (x_1, \ldots, x_n)$ be a realization of random variables $X_1, \ldots, X_n$, independent and identically distributed according to $F_{\theta_0}$ with unknown $\theta_0$. What can be said about $\theta_0$, and how can the data be characterized by a small number (two, say) of values?
A solution of this basic statistical problem consists of three steps:

i/ choosing an inference function $Q$ and treating the data $x_n$ as $Q_n = [Q(x_1), \ldots, Q(x_n)]$,

ii/ forming some averages based on $Q_n$,

iii/ studying the properties of the resulting estimates.
The inference function of the approach which we will call naive statistics, and which is still in use, is the identity function $Q(x) = x$. The distance between data points is thus Euclidean, the center of the data is the sample mean, and the dispersion of values around it is the sample variance. However, their theoretical counterparts, the mean $EX$ and the variance $EX^2 - (EX)^2$, may not exist for some (heavy-tailed) distributions. In such cases (in Table 1: the inverse gamma and the beta-prime distribution) this approach does not offer any reasonable characteristics of the data.
The inference function of classical statistics is the vector of partial scores for the parameters,
$$Q(x) = \left(\frac{\partial}{\partial\theta_1}\log f(x;\theta), \ldots, \frac{\partial}{\partial\theta_m}\log f(x;\theta)\right).$$
The maximum likelihood method uses $Q$ in a system of $m$ equations for the components of $\theta$, giving the best estimate of $F_{\theta_0}$. However, the problem of simple characteristics of the center and variability of the data still remains.
The inference function of robust statistics is $Q(x) = \psi(x)$, where $\psi$ (the so-called psi-function) is a suitable bounded function suppressing the influence of large observations. The function $\psi$ prescribes a finite distance in the sample space, $d_R(x_1, x_2) = c\,|\psi(x_2) - \psi(x_1)|$ where $c = (E\psi')^{-1}$, and offers simple characteristics of the center and variability of the data. The drawback of this approach is the lack of a connection between the function $\psi$ and the properties of the assumed distribution $F$.
On the basis of the account given in [1], we suggest using the Johnson score as the inference function. The treated data are thus $[S(x_1), \ldots, S(x_n)]$.
3 Basic Johnson characteristics of the data
Unlike the usual moments, the sample versions of the Johnson score moments cannot be determined without an assumption about the underlying distribution family. On the other hand, by substituting the empirical distribution function into (1.1), the resulting system of equations
$$\frac{1}{n}\sum_{i=1}^n S^k(x_i;\theta) = ES^k(\theta), \qquad k = 1, \ldots, m \qquad (3.1)$$
appears to be an alternative to the system of maximum likelihood equations. The estimates $\hat\theta_n$ from (3.1) are shown (Fabian, 2001) to be asymptotically normally distributed with mean $\theta_0$ and a certain variance $\sigma_\theta^2$, i.e., $AN(\theta_0, \sigma_\theta^2)$. They can have slightly larger variances than the maximum likelihood estimates, but they are robust if the situation demands it (heavy-tailed distributions have bounded Johnson scores).
The first or the first two equations of the system (3.1) give, for particular distributions, simple estimates of the Johnson mean or of both Johnson characteristics. Let us call the estimates $\hat x_n$ and $\hat\omega_n^2$ of $x^*$ and $\omega^2$ based on the observations $x_1, \ldots, x_n$ the sample Johnson mean and the sample Johnson variance, respectively.

For some particular distributions, the first equation of the system (3.1) can be written in the form
$$\sum_{i=1}^n S(x_i; \hat x_n) = 0. \qquad (3.2)$$
Proposition 1. The sample Johnson mean $\hat x_n$ determined from (3.2) is $AN(x^*, \omega^2/n)$.

Proof. The random variables $S(X)$ have zero mean, $ES = 0$, and finite variance $ES^2$, so that $\hat x_n$ is $AN(x^*, 1/(nES^2))$ according to the Lindeberg-Levy central limit theorem, and $\omega^2 = 1/ES^2$ from the definition.
This is a nice result, saying that the sample Johnson mean has an asymptotically normal distribution for any considered distribution, including distributions for which the central limit theorem cannot be applied directly to the observations, and that its variance attains the Cramer-Rao lower bound.
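Since each score in Table 1 is monotone in its location/scale parameter, equation (3.2) can be solved by simple bisection. The sketch below is an illustration only (it assumes the Weibull score with the shape $c$ known and fixed) and checks the bisection root against the closed-form power mean that appears in the Weibull example of Section 4.

```python
import random, math

def weibull_score_sum(xs, t, c=2.0):
    # sum of Johnson scores S(x_i; t) = c[(x_i/t)^c - 1], decreasing in t
    return sum(c * ((x / t)**c - 1.0) for x in xs)

def sample_johnson_mean(xs, c=2.0, lo=1e-9, hi=None, iters=200):
    # bisection for the root of equation (3.2)
    hi = hi or max(xs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if weibull_score_sum(xs, mid, c) > 0.0:
            lo = mid        # score sum still positive: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(1)
c, t = 2.0, 1.0
xs = [t * (-math.log(random.random()))**(1.0 / c) for _ in range(50)]  # Weibull(t, c) sample
x_hat = sample_johnson_mean(xs, c)
x_closed = (sum(x**c for x in xs) / len(xs))**(1.0 / c)                # power mean of order c
print(x_hat, x_closed)   # the two estimates agree
```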
Proposition 2. $\sqrt{n}\,\tilde d(x^*, \hat x_n)$ is $AN(0, 1)$.

Outline of the proof. The delta method theorem says that if $\hat q$ is $AN(q_0, \sigma^2)$, then $\eta(\hat q)$ is $AN(\eta(q_0), [\eta'(q_0)]^2\sigma^2)$. By this theorem and Proposition 1, $S(\hat x_n) - S(x^*)$ is $AN(0, [S'(x^*)]^2\omega^2/n)$. It can be shown that $ES' = ES^2$, so that $\sqrt{n}\,\tilde d(\hat x_n, x^*)$ is $AN(0, \omega^2(ES^2)^2\omega^2) = AN(0, 1)$.
This is another nice result, saying that approximate $100(1-\alpha)\%$ confidence intervals for the Johnson mean can be determined from the simple condition
$$\sqrt{n}\,|\tilde d(x^*, \hat x_n)| \le u_{\alpha/2}, \qquad (3.3)$$
where $\tilde d$ is given by (1.2) and $u_{\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution ($u_{\alpha/2} = 1.96$ for $\alpha = 0.05$).
Definition 1. Let $X, Y$ be random variables supported by $\mathcal{X}$ and $\mathcal{Y}$, respectively, with joint distribution $F$, marginal distributions with Johnson scores $S_X, S_Y$ and Johnson informations $I_X, I_Y$. Let $f$ be the joint density of $(X, Y)$. The value
$$i_{XY} = \frac{1}{\sqrt{I_X I_Y}} \int_{\mathcal{X}} \int_{\mathcal{Y}} S_X(x)\,S_Y(y)\,f(x, y)\,dx\,dy$$
will be called the Johnson mutual information of $X$ and $Y$.
Obviously, $|i_{XY}| \le 1$ according to the Cauchy-Schwarz inequality. Having a sample $(x_i, y_i)$, $i = 1, \ldots, n$, taken from $(X, Y)$, the sample Johnson mutual information is
$$\hat i_{XY} = \frac{\sum_{i=1}^n S_X(x_i)\,S_Y(y_i)}{\left[\sum_{i=1}^n S_X^2(x_i)\,\sum_{i=1}^n S_Y^2(y_i)\right]^{1/2}}. \qquad (3.4)$$
(3.4) can serve as an empirical measure of the association between $X$ and $Y$.
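Formula (3.4) is just a correlation computed from score-transformed data. A minimal sketch, assuming the score values $S_X(x_i)$ and $S_Y(y_i)$ have already been computed under the assumed marginal models:

```python
import math

def sample_johnson_mutual_information(sx, sy):
    # sx[i] = S_X(x_i), sy[i] = S_Y(y_i); returns formula (3.4)
    num = sum(a * b for a, b in zip(sx, sy))
    den = math.sqrt(sum(a * a for a in sx) * sum(b * b for b in sy))
    return num / den

# For normal scores S(x) = (x - mu)/sigma^2 the sigma factors cancel and,
# with mu estimated by the sample mean, the value reduces to the usual
# correlation coefficient.
```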
4 Examples
In this section we show examples of statistical procedures which take into account the particular
geometry in the sample space of the assumed distribution.
Normal distribution. The Johnson score of the normal distribution is $S(x) = \frac{x-\mu}{\sigma^2}$, $x^* = \mu$ and $I_F = ES^2 = 1/\sigma^2$. The sample Johnson mean and the sample Johnson variance are the usual sample mean and variance; the other statistics are the usual statistics. The Johnson mutual information is the usual correlation coefficient.

On the other hand, from our point of view, the use of the data without any treatment is equivalent to the implicit assumption of the normal distribution.

In the rest of this section we denote $\lambda_n = 1.96/\sqrt{n}$.
Lognormal distribution. The first two equations (3.1) are
$$\sum_{i=1}^n \log\left(\frac{x_i}{t}\right) = 0, \qquad \frac{c^2}{n}\sum_{i=1}^n \log^2\left(\frac{x_i}{t}\right) = 1,$$
from which $\hat x_n = \hat t_n = \exp\left(\frac{1}{n}\sum_{i=1}^n \log x_i\right)$ (the geometric mean), $\hat c_n^2 = n/\sum_{i=1}^n \log^2(x_i/\hat t_n)$ and $\hat\omega_n^2 = \hat t_n^2/\hat c_n^2$. Since by (1.2) $\tilde d(x^*, \hat x_n) = \hat c_n \log(x^*/\hat x_n)$, the 95% confidence interval for the Johnson mean is, according to (3.3),
$$\hat x_n\, e^{-\lambda_n/\hat c_n} \le x^* \le \hat x_n\, e^{\lambda_n/\hat c_n}.$$
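A short sketch of the lognormal computations above (sample Johnson mean, $\hat c_n$, sample Johnson variance and the 95% interval); the data-generating step is only an illustration:

```python
import math, random

def lognormal_johnson(xs):
    n = len(xs)
    t_hat = math.exp(sum(math.log(x) for x in xs) / n)    # sample Johnson mean (geometric mean)
    c2_hat = n / sum(math.log(x / t_hat)**2 for x in xs)
    omega2_hat = t_hat**2 / c2_hat                         # sample Johnson variance
    half = (1.96 / math.sqrt(n)) / math.sqrt(c2_hat)       # lambda_n / c_hat
    ci = (t_hat * math.exp(-half), t_hat * math.exp(half))
    return t_hat, omega2_hat, ci

random.seed(2)
# lognormal sample with t = 2 and c = 1 (so that log(x/t) ~ N(0, 1/c^2))
xs = [2.0 * math.exp(random.gauss(0.0, 1.0)) for _ in range(50)]
print(lognormal_johnson(xs))
```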
Weibull distribution. The first two equations (3.1) are
$$\sum_{i=1}^n \left[(x_i/t)^c - 1\right] = 0, \qquad \frac{1}{n}\sum_{i=1}^n \left[(x_i/t)^c - 1\right]^2 = 1,$$
which are to be solved iteratively. For a fixed $c$, the sample Johnson mean $\hat x_n = \hat t_n = \left(n^{-1}\sum_{i=1}^n x_i^c\right)^{1/c}$ is the power mean of order $c$. By (1.2), $\tilde d(x^*, \hat x_n) = (x^*/\hat x_n)^{\hat c_n} - 1$, so that the 95% confidence interval for $x^*$ is
$$\hat x_n (1 - \lambda_n)^{1/\hat c_n} \le x^* \le \hat x_n (1 + \lambda_n)^{1/\hat c_n}.$$
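The two Weibull equations have no closed-form joint solution; one possible sketch solves them numerically, here with scipy.optimize.fsolve on log-parameters (that choice is ours and is not prescribed by the text):

```python
import numpy as np
from scipy.optimize import fsolve

def weibull_johnson(xs):
    xs = np.asarray(xs, dtype=float)
    n = len(xs)

    def equations(par):
        t, c = np.exp(par)                  # log-parameters keep t, c positive
        s = (xs / t)**c - 1.0               # S(x_i; t, c) / c
        return [s.sum(), (s**2).sum() / n - 1.0]   # the two equations (3.1)

    t_hat, c_hat = np.exp(fsolve(equations, [np.log(xs.mean()), 0.0]))
    omega2_hat = t_hat**2 / c_hat**2        # sample Johnson variance
    lam = 1.96 / np.sqrt(n)
    ci = (t_hat * (1.0 - lam)**(1.0 / c_hat), t_hat * (1.0 + lam)**(1.0 / c_hat))
    return t_hat, c_hat, omega2_hat, ci

rng = np.random.default_rng(3)
xs = 2.0 * rng.weibull(1.5, size=50)        # Weibull sample with t = 2, c = 1.5
print(weibull_johnson(xs))
```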
Gamma distribution. The first two equations (3.1) are
$$\sum_{i=1}^n (\gamma x_i - \alpha) = 0, \qquad \sum_{i=1}^n (\gamma x_i - \alpha)^2 = n\alpha,$$
from which $\hat x_n = \hat\alpha/\hat\gamma = n^{-1}\sum_{i=1}^n x_i = \bar x$ and $\hat\omega_n^2 = \hat\alpha/\hat\gamma^2 = n^{-1}\sum_{i=1}^n x_i^2 - \bar x^2$. The Johnson mean and the Johnson variance are thus equal to the usual mean and variance. Since $\tilde d(x^*, \hat x_n) = \sqrt{\hat\alpha_n}\,(x^*/\hat x_n - 1)$ and $\hat\omega_n = \bar x/\sqrt{\hat\alpha_n}$, the 95% confidence interval for $x^*$ is
$$\bar x - \lambda_n \hat\omega_n \le x^* \le \bar x + \lambda_n \hat\omega_n.$$
For a distribution with a linear Johnson score we thus obtain the usual symmetric confidence interval.
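For the gamma distribution everything above is in closed form; a brief illustrative sketch:

```python
import math, random

def gamma_johnson(xs):
    n = len(xs)
    xbar = sum(xs) / n                              # sample Johnson mean
    omega2 = sum(x * x for x in xs) / n - xbar**2   # sample Johnson variance
    alpha_hat = xbar**2 / omega2                    # since x* = alpha/gamma, omega^2 = alpha/gamma^2
    half = (1.96 / math.sqrt(n)) * math.sqrt(omega2)   # lambda_n * omega_hat = lambda_n * xbar / sqrt(alpha_hat)
    return xbar, omega2, alpha_hat, (xbar - half, xbar + half)

random.seed(4)
xs = [random.gammavariate(2.0, 0.5) for _ in range(50)]   # alpha = 2, rate gamma = 2, so x* = 1
print(gamma_johnson(xs))
```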
Inverse gamma distribution. The first two equations (3.1) are
$$\sum_{i=1}^n (\alpha - \beta/x_i) = 0, \qquad \sum_{i=1}^n (\alpha - \beta/x_i)^2 = n\alpha,$$
from which $\hat x_n = \hat\beta/\hat\alpha = n/\sum_{i=1}^n 1/x_i = \hat x_H$, the harmonic mean, and
$$\hat\omega_n^2 = \frac{\hat x_H^2(\hat x_H^2 - \hat x_{2H})}{\hat x_{2H}}, \qquad \text{where } \hat x_{2H} = n\Big/\sum_{i=1}^n 1/x_i^2.$$
Since $\tilde d(x^*, \hat x_n) = \sqrt{\hat\alpha_n}\,(1 - \hat x_H/x^*)$ and $\hat\omega_n = \hat x_H/\sqrt{\hat\alpha_n}$, the 95% confidence interval for $x^*$ is
$$\frac{\hat x_H}{1 + \lambda_n \hat\omega_n/\hat x_H} \le x^* \le \frac{\hat x_H}{1 - \lambda_n \hat\omega_n/\hat x_H}.$$
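A sketch of the harmonic-mean computations for the inverse gamma case, following the formulas above (the data-generating step uses the fact that the reciprocal of a gamma variate is inverse gamma):

```python
import math, random

def invgamma_johnson(xs):
    n = len(xs)
    xH = n / sum(1.0 / x for x in xs)              # harmonic mean = sample Johnson mean
    x2H = n / sum(1.0 / (x * x) for x in xs)
    omega2 = xH**2 * (xH**2 - x2H) / x2H           # sample Johnson variance
    d = (1.96 / math.sqrt(n)) * math.sqrt(omega2) / xH   # lambda_n * omega_hat / xH = lambda_n / sqrt(alpha_hat)
    return xH, omega2, (xH / (1.0 + d), xH / (1.0 - d))

random.seed(5)
# inverse gamma sample with alpha = beta = 1.5 (x* = 1): reciprocals of gamma variates
xs = [1.0 / random.gammavariate(1.5, 1.0 / 1.5) for _ in range(50)]
print(invgamma_johnson(xs))
```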
Beta-prime distribution. The first two equations (3.1) are
$$\sum_{i=1}^n \frac{q x_i - p}{x_i + 1} = 0, \qquad \sum_{i=1}^n \left(\frac{q x_i - p}{x_i + 1}\right)^2 = \frac{npq}{p + q + 1}. \qquad (4.1)$$
From the first equation of (4.1),
$$\hat x_n = \sum_{i=1}^n \frac{x_i}{1 + x_i} \Bigg/ \sum_{i=1}^n \frac{1}{1 + x_i}. \qquad (4.2)$$
Multiplying the second equation of (4.1) by $1/(pq)$, substituting $p = \hat x_n q$ and using the formula for $\omega^2$ from Table 1, we have
$$\hat\omega_n^2 = \frac{\hat v_n \hat x_n (1 + \hat x_n)^2}{(\hat v_n - 1)^2},$$
where $\hat x_n$ is given by (4.2) and
$$n\hat v_n = \frac{1}{\hat x_n}\left[\sum_{i=1}^n \frac{x_i^2}{(x_i + 1)^2} - 2\hat x_n \sum_{i=1}^n \frac{x_i}{(x_i + 1)^2} + \hat x_n^2 \sum_{i=1}^n \frac{1}{(x_i + 1)^2}\right].$$
Condition (3.3) takes the form
$$\frac{\hat q_n |x^* - \hat x_n|}{x^* + 1} \le \lambda_n \sqrt{\frac{\hat p_n \hat q_n}{\hat p_n + \hat q_n + 1}},$$
so that the 95% confidence interval for $x^*$ is
$$\frac{\hat x_n - \delta_n}{1 + \delta_n} \le x^* \le \frac{\hat x_n + \delta_n}{1 - \delta_n},$$
where $\delta_n = \frac{\lambda_n}{\hat q_n}\sqrt{\frac{\hat p_n \hat q_n}{\hat p_n + \hat q_n + 1}}$ and $\hat p_n$, $\hat q_n$ are to be determined from the system $\hat x_n = p/q$, $\hat\omega_n^2 = p(p+q+1)/q^3$. For example, if $p = q = 1$, the 95% confidence interval for $\hat x_n = 1$ and $n = 50$ is $(0.72, 1.38)$.
5 Simulations
Example 1. The sample Johnson mean and the sample Johnson deviance (the square root of the sample Johnson variance) were computed from samples of length 50 generated from the distributions listed in the first column of Table 1, with parameters chosen so that $x^* = 1$, under each of the distributional assumptions given in the column headings of Table 2. Table 2 shows the averages $\bar x_{5000}$ and $\bar\omega_{5000}$ of these estimates over 5000 generated samples.

Table 2. Average sample Johnson means $\bar x_{5000}$ (upper part) and sample Johnson deviances $\bar\omega_{5000}$ (lower part); rows: generating distribution, columns: assumed distribution.

$\bar x_{5000}$ | gamma | Weibull | lognorm. | beta-pr. | inv. gam.
gamma | 1.000 | 0.94 | 0.60 | 0.49 | 0.12
Weibull | 1.06 | 1.005 | 0.64 | 0.53 | 0.15
lognormal | 1.66 | 1.66 | 1.010 | 1.01 | 0.63
beta-prime | 2.00 | 1.77 | 1.01 | 1.008 | 0.54
inv. gamma | 84.4 | 4.71 | 1.70 | 2.13 | 1.022

$\bar\omega_{5000}$ | gamma | Weibull | lognorm. | beta-pr. | inv. gam.
gamma | 1.094 | 1.06 | 0.81 | 0.72 | 0.31
Weibull | 1.17 | 1.108 | 0.83 | 0.75 | 0.39
lognormal | 2.04 | 1.62 | 1.082 | 1.09 | 0.74
beta-prime | 3.52 | 2.00 | 1.11 | 1.113 | 0.82
inv. gamma | 187. | 8.52 | 2.32 | 3.23 | 1.117

It is apparent from Table 2 that erroneous assumptions often lead to unacceptable estimates (note, however, the similar results obtained under the assumptions of the lognormal and beta-prime distributions). By estimating the Johnson mean and the Johnson variance, it is easy to compare the mean characteristics of data from distributions parametrized in arbitrary ways.
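The pattern in Table 2 can be illustrated in a few lines: for heavy-tailed data the arithmetic mean (the estimate of $x^*$ under the gamma assumption) is unstable, while the harmonic mean (the estimate under the inverse gamma assumption) stays near $x^*$. The sample sizes and parameters below are illustrative, not those of the original experiment:

```python
import random

def arithmetic_mean(xs):
    return sum(xs) / len(xs)               # estimate of x* under the gamma assumption

def harmonic_mean(xs):
    return len(xs) / sum(1.0 / x for x in xs)   # estimate of x* under the inverse gamma assumption

random.seed(6)
reps, n = 1000, 50
am, hm = [], []
for _ in range(reps):
    # inverse gamma sample with alpha = beta = 1 (x* = 1, the mean does not exist)
    xs = [1.0 / random.gammavariate(1.0, 1.0) for _ in range(n)]
    am.append(arithmetic_mean(xs))
    hm.append(harmonic_mean(xs))

print(sum(am) / reps, sum(hm) / reps)   # the first average is large and unstable, the second is near 1
```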
Example 2. In the left part of Fig. 3 we plot samples $(x_i, y_i)$, $i = 1, \ldots, 12$, from a random vector $(X, Y)$, where $Y = 0.35X + 0.65Z$ and where $X$ and $Z$ are independent random variables with inverse gamma distributions. In the right part are the corresponding samples $[S_X(x_i; \hat\theta_X), S_Y(y_i; \hat\theta_Y)]$ computed under the right assumption; $\hat\theta_X$, $\hat\theta_Y$ are the estimated values of the parameters.

Figure 3.

Making different assumptions about the underlying marginal distributions, we obtained the following values of $\hat i_{XY}$:

$f_X, f_Y$ | gamma | Weibull | lognormal | beta-prime | inv. gamma
$\hat i_{XY}$ | -0.08 | -0.01 | 0.29 | 0.40 | 0.53

It is apparent that for the estimation of the degree of association of random variables, the assumption about the underlying distribution is substantial.
Acknowledgements. The work was supported by GA ASCR under grant No. 1ET 400300513.
References

Pawitan, Y. (2001): In All Likelihood. Oxford: Clarendon Press.

Fabian, Z. (2006): Geometry of probabilistic models. Proc. ITAT 2006, 35-40.

Fabian, Z. (2008): New measures of central tendency and variability of continuous distributions. To appear in Commun. in Statist. - Theory Meth.