PValue.jl

Documentation for PValue.jl

A set of sparse julia functions for computing p-values in a "frequentist" or "Bayesian" scenario.

PValue.jl
- Index

Index

PValue.ACF_EK
PValue.BIC
PValue.FourierPeriodogram
PValue.Frequentist_p_value
PValue.Gelman_Bayesian_p_value
PValue.GetACF
PValue.GetCrossCorr
PValue.GetPACF
PValue.Lucy_Bayesian_p_value
PValue.RMS
PValue.SSR
PValue.SigmaClip
PValue.WeightedArithmeticMean
PValue.Z2N

PValue.ACF_EK — Method

ACF_EK(t::Vector{Float64},y::Vector{Float64},dy::Vector{Float64}; bins::Int=20)

Compute the auto-correlation function for data irregularly sampled, following the algorithm by Edelson & Krolik (1988). This a porting of the python astroML version.

Arguments

t is the vector of observing times.
y is the vector of fluxes.
ey is the vector of the uncertainties of the fluxes.

Returns a tuple of three value: the ACF, the ACF uncertainties, and the bin limits.

Example

t = [1.1,2.3,3.2,4.5]
y = [1.,2.,0.5,0.3]
ey = 0.1 .* y

ACF_EK(t,y,ey,bins=2)
([0.6817802730844681, 0.2530024508265458], [0.4472135954999579, 0.5773502691896257], -3.4:3.40000000005:3.4000000001)

source

PValue.BIC — Method

BIC(lp::AbstractFloat,ndata::Integer,nvar::Integer)

Compute the Bayes Information Criterion.

Arguments

lp logarithm of the likelihood.
ndata number of datapoints.
nvar number of model parameters.

Examples


BIC(56.,100,3)

# output

-98.18448944203573

source

PValue.FourierPeriodogram — Method

FourierPeriodogram(signal,fs;zerofreq=true)

Compute the discrete Fourier periodogram for the inpout signal.

Arguments

signal array of input data.
fs sampling in frequency of the input data (1/dt).
'zerofreq' is true (false) to (not) include the zero frequency in the output.

Outputs are two arrays: the frequencies and the powers.

Examples


FourierPeriodogram([1.,2.,3.,4.],1.)

# output

([0.0, 0.25], [100.0, 8.000000000000002])

source

PValue.Frequentist_p_value — Method

Frequentist_p_value(ssrv,ndata,nvar)
Frequentist_p_value(ssrv,ndof)

Compute the 'classic' frequentist p-value.

Arguments

ssrv SSR, the sum of squared residuals.
ndata number of datapoints.
nvar number of fit parameters.
ndof number of degrees of freedom

(i.e. $ndata-nvar$).

Examples


Frequentist_p_value(85.3,100,10)

# output

0.620453567577112

Frequentist_p_value(85.3,90)

# output

0.620453567577112

source

PValue.Gelman_Bayesian_p_value — Method

Gelman_Bayesian_p_value(modvecs,simvecs,obsvec,errobsvec)

Compute a 'Bayesian p-value' following the recipe by A. Gelman et al., 2013.

Arguments

obsvec datapoints.
errobsvec datapoint uncertainties.
modvecs model vector.
simvecs simulated vector.

Explanation

obsvec and errobsvec are length-'m' vectors of datapoints and relative uncertainties. modvecs is a vector computed by the posterior distribution of parameters (n chains), e.g. by a MCMC, where each component is a vector of m values computed using the fit function and each set of parmeters from the posterior distribution. Finally, simvecs is like modevecs with the addition of the predicted noise, i.e. these are simulated datapoints.

The routinely essentially compares the SSR (or, in principle, any test statistics) of each model based on the derived posterior distribution of parameters vs the data and to SSR computed by simulated data and again the posterior.

Examples

A full example of application of the GelmanBayesianp_value as well as the LucyBayesianp_value and the Frequentistpvalue is reported in this Jupyter notebook.

source

PValue.GetACF — Method

GetACF(data::Vector{Float64},lags::Integer;sigma=1.96)

Compute the [AutoCorrelation Function[(https://en.wikipedia.org/wiki/Autocorrelation) for the given lags. It returns a dictionary with the ACF and the minimum and maximum uncertainties against a white noise hypothesis.

Arguments

data is the vector of input data.
lags last lag to be computed.
sigma number of sigmas for the uncertainties.

Examples


GetACF([1.2,2.5,3.5,4.3],2)["ACF"]

# output

3-element Vector{Float64}:
  1.0
  0.23928737773637632
 -0.2945971122496507

source

PValue.GetCrossCorr — Method

GetCrossCorr(x::Vector{Float64},y::Vector{Float64},lags::Integer)

Compute the [crosscorrelation [(https://en.wikipedia.org/wiki/Cross-correlation) between the x and y datasets for the given lags. It returns the cross-correlation values.

Arguments

x is the first input vector.
y is the second input vector.
lags last lag to be computed.

Examples


GetCrossCorr([1.2,2.5,3.5,4.3],[1.5,2.9,3.0,4.1],2)

# output

5-element Vector{Float64}:
 -0.1926156048478174
  0.1658715565267623
  0.9627857395579823
  0.15827215481804718
 -0.15637230439086838

source

PValue.GetPACF — Method

GetPACF(data::Vector{Float64},lags::Integer;sigma=1.96)

Compute the [Partial AutoCorrelation Function[(https://en.wikipedia.org/wiki/Partialautocorrelationfunction) for the given lags. It returns a dictionary with the PACF and the minimum and maximum uncertainties.

Arguments

data is the vector of input data.
lags last lag to be computed.
sigma number of sigmas for the uncertainties.

Examples


GetPACF([1.2,2.5,3.5,4.3],1)["PACF"]

# output

2-element Vector{Float64}:
 1.0
 0.7819548872180438

source

PValue.Lucy_Bayesian_p_value — Method

Lucy_Bayesian_p_value(modvecs,obsvec,errobsvec,nvars)

Compute a 'Bayesian p-value' following the recipe by L.B. Lucy, 2016, A&A 588, 19.

Arguments

obsvec datapoints.
errobsvec datapoint uncertainties.
nvars number of parameters.

Explanation

obsvec and errobsvec are length-m vectors of datapoints and relative uncertainties. modvecs is a vector computed by the posterior distribution of parameters (n chains), e.g. by a MCMC, where each component is a vector of m values computed using the fit function and each set of parmeters from the posterior distribution. Finally, nvars is the number of parameters.

This algorithm relies on the Chi2 distribution as in the 'frequentist' case. Howver the SSR is not based only on a punt estimate but it is computed by the whole posterior distribution of parameters.

Examples


x = [1,2,3,4,5]
y = [1.01,1.95,3.05,3.97,5.1]
ey = [0.05,0.1,0.11,0.17,0.2]

f(x;a=1.,b=0.) = a.*x.+b

# Sample from a fake posterior distribution
ch = DataFrame(a=[0.99,0.95,1.01,1.02,1.03], b=[0.,-0.01,0.01,0.02,-0.01])

res = []
for i in 1:nrow(ch)
    push!(res,f(x;a=ch[i,:a],b=ch[i,:b]))
end

Lucy_Bayesian_p_value(res,y,ey,2)

# output

0.7200318895143041

source

PValue.RMS — Method

RMS(datavec,modvec)

Compute the Root Mean Square value.

Arguments

datavec datapoints.
modvec model values.

Examples


RMS([1.1,2.2],[1.15,2.15])

# output

0.050000000000000044

source

PValue.SSR — Method

SSR(modvec,obsvec,errobsvec)

Compute the Sum of Squared Residuals.

Arguments

modvec model predictions.
obsvec observed data.
`errobsvec~ uncertainties.

Examples


SSR([1.,2.,3.,4.],[1.1,1.9,3.05,3.8],[0.1,0.05,0.2,0.1])

# output

9.062500000000016

source

PValue.SigmaClip — Function

SigmaClip(x, ex=ones(size(x)); sigmacutlevel=2)

Sigma-clipping filtering of an input array,

Arguments

x input array.
ex uncertainties.
sigmacutlevel sigma-clipping level.

It performs a one-iteration sigma clipping and reports a mask to select the surviving elements in the input arrays or other related arrays.

Examples


x = [4.,6.,8.,1.,3.,5.,20.]
mask = SigmaClip(x)
x[mask]

# output

6-element Vector{Float64}:
 4.0
 6.0
 8.0
 1.0
 3.0
 5.0

source

PValue.WeightedArithmeticMean — Method

WeightedArithmeticMean(x,ex)

Compute the Weighted Arithmetic Mean.

Arguments

x input vector
ex uncertainties.

Examples


x = [1.2,2.2,4.5,3,3.6]
ex = [0.2,0.2,0.5,0.1,0.6]

WeightedArithmeticMean(x,ex)

# output

(2.634301913536499, 0.07986523020975032)

source

PValue.Z2N — Method

Z2N(freqs, time)

Compute the Rayleigh power spectrum of a time series in a given range of frequencies.

Arguments

freqs is an array with frequencies in units of 1/[time].
time is an array with the time series where to find a period.
harm is the number of harmonics to be used in the analysis.

Examples


Z2N([1.,0.5,0.25], [1.,2.,2.5,3.5,5.])

# output

3-element Vector{Any}:
 0.4
 0.4000000000000002
 0.537258300203048

source