PValue.jl

Documentation for PValue.jl

A set of sparse julia functions for computing p-values in a "frequentist" or "Bayesian" scenario.

Index

PValue.ACF_EKMethod

ACF_EK(t::Vector{Float64},y::Vector{Float64},dy::Vector{Float64}; bins::Int=20)

Compute the auto-correlation function for data irregularly sampled, following the algorithm by Edelson & Krolik (1988). This a porting of the python astroML version.

Arguments

  • t is the vector of observing times.
  • y is the vector of fluxes.
  • ey is the vector of the uncertainties of the fluxes.

Returns a tuple of three value: the ACF, the ACF uncertainties, and the bin limits.

Example

t = [1.1,2.3,3.2,4.5]
y = [1.,2.,0.5,0.3]
ey = 0.1 .* y

ACF_EK(t,y,ey,bins=2)
([0.6817802730844681, 0.2530024508265458], [0.4472135954999579, 0.5773502691896257], -3.4:3.40000000005:3.4000000001)
source
PValue.BICMethod
BIC(lp::AbstractFloat,ndata::Integer,nvar::Integer)

Compute the Bayes Information Criterion.

Arguments

  • lp logarithm of the likelihood.
  • ndata number of datapoints.
  • nvar number of model parameters.

Examples


BIC(56.,100,3)

# output

-98.18448944203573
source
PValue.FourierPeriodogramMethod
FourierPeriodogram(signal,fs;zerofreq=true)

Compute the discrete Fourier periodogram for the inpout signal.

Arguments

  • signal array of input data.
  • fs sampling in frequency of the input data (1/dt).
  • 'zerofreq' is true (false) to (not) include the zero frequency in the output.

Outputs are two arrays: the frequencies and the powers.

Examples


FourierPeriodogram([1.,2.,3.,4.],1.)

# output

([0.0, 0.25], [100.0, 8.000000000000002])
source
PValue.Frequentist_p_valueMethod
Frequentist_p_value(ssrv,ndata,nvar)
Frequentist_p_value(ssrv,ndof)

Compute the 'classic' frequentist p-value.

Arguments

  • ssrv SSR, the sum of squared residuals.
  • ndata number of datapoints.
  • nvar number of fit parameters.
  • ndof number of degrees of freedom

(i.e. $ndata-nvar$).

Examples


Frequentist_p_value(85.3,100,10)

# output

0.620453567577112
Frequentist_p_value(85.3,90)

# output

0.620453567577112
source
PValue.Gelman_Bayesian_p_valueMethod
Gelman_Bayesian_p_value(modvecs,simvecs,obsvec,errobsvec)

Compute a 'Bayesian p-value' following the recipe by A. Gelman et al., 2013.

Arguments

  • obsvec datapoints.
  • errobsvec datapoint uncertainties.
  • modvecs model vector.
  • simvecs simulated vector.

Explanation

obsvec and errobsvec are length-'m' vectors of datapoints and relative uncertainties. modvecs is a vector computed by the posterior distribution of parameters (n chains), e.g. by a MCMC, where each component is a vector of m values computed using the fit function and each set of parmeters from the posterior distribution. Finally, simvecs is like modevecs with the addition of the predicted noise, i.e. these are simulated datapoints.

The routinely essentially compares the SSR (or, in principle, any test statistics) of each model based on the derived posterior distribution of parameters vs the data and to SSR computed by simulated data and again the posterior.

Examples

A full example of application of the GelmanBayesianp_value as well as the LucyBayesianp_value and the Frequentistpvalue is reported in this Jupyter notebook.

source
PValue.GetACFMethod
GetACF(data::Vector{Float64},lags::Integer;sigma=1.96)

Compute the [AutoCorrelation Function[(https://en.wikipedia.org/wiki/Autocorrelation) for the given lags. It returns a dictionary with the ACF and the minimum and maximum uncertainties against a white noise hypothesis.

Arguments

  • data is the vector of input data.
  • lags last lag to be computed.
  • sigma number of sigmas for the uncertainties.

Examples


GetACF([1.2,2.5,3.5,4.3],2)["ACF"]

# output

3-element Vector{Float64}:
  1.0
  0.23928737773637632
 -0.2945971122496507
source
PValue.GetCrossCorrMethod
GetCrossCorr(x::Vector{Float64},y::Vector{Float64},lags::Integer)

Compute the [crosscorrelation [(https://en.wikipedia.org/wiki/Cross-correlation) between the x and y datasets for the given lags. It returns the cross-correlation values.

Arguments

  • x is the first input vector.
  • y is the second input vector.
  • lags last lag to be computed.

Examples


GetCrossCorr([1.2,2.5,3.5,4.3],[1.5,2.9,3.0,4.1],2)

# output

5-element Vector{Float64}:
 -0.1926156048478174
  0.1658715565267623
  0.9627857395579823
  0.15827215481804718
 -0.15637230439086838
source
PValue.GetPACFMethod
GetPACF(data::Vector{Float64},lags::Integer;sigma=1.96)

Compute the [Partial AutoCorrelation Function[(https://en.wikipedia.org/wiki/Partialautocorrelationfunction) for the given lags. It returns a dictionary with the PACF and the minimum and maximum uncertainties.

Arguments

  • data is the vector of input data.
  • lags last lag to be computed.
  • sigma number of sigmas for the uncertainties.

Examples


GetPACF([1.2,2.5,3.5,4.3],1)["PACF"]

# output

2-element Vector{Float64}:
 1.0
 0.7819548872180438
source
PValue.Lucy_Bayesian_p_valueMethod
Lucy_Bayesian_p_value(modvecs,obsvec,errobsvec,nvars)

Compute a 'Bayesian p-value' following the recipe by L.B. Lucy, 2016, A&A 588, 19.

Arguments

  • obsvec datapoints.
  • errobsvec datapoint uncertainties.
  • nvars number of parameters.

Explanation

obsvec and errobsvec are length-m vectors of datapoints and relative uncertainties. modvecs is a vector computed by the posterior distribution of parameters (n chains), e.g. by a MCMC, where each component is a vector of m values computed using the fit function and each set of parmeters from the posterior distribution. Finally, nvars is the number of parameters.

This algorithm relies on the Chi2 distribution as in the 'frequentist' case. Howver the SSR is not based only on a punt estimate but it is computed by the whole posterior distribution of parameters.

Examples


x = [1,2,3,4,5]
y = [1.01,1.95,3.05,3.97,5.1]
ey = [0.05,0.1,0.11,0.17,0.2]

f(x;a=1.,b=0.) = a.*x.+b

# Sample from a fake posterior distribution
ch = DataFrame(a=[0.99,0.95,1.01,1.02,1.03], b=[0.,-0.01,0.01,0.02,-0.01])

res = []
for i in 1:nrow(ch)
    push!(res,f(x;a=ch[i,:a],b=ch[i,:b]))
end

Lucy_Bayesian_p_value(res,y,ey,2)

# output

0.7200318895143041
source
PValue.RMSMethod
RMS(datavec,modvec)

Compute the Root Mean Square value.

Arguments

  • datavec datapoints.
  • modvec model values.

Examples


RMS([1.1,2.2],[1.15,2.15])

# output

0.050000000000000044
source
PValue.SSRMethod
SSR(modvec,obsvec,errobsvec)

Compute the Sum of Squared Residuals.

Arguments

  • modvec model predictions.
  • obsvec observed data.
  • `errobsvec~ uncertainties.

Examples


SSR([1.,2.,3.,4.],[1.1,1.9,3.05,3.8],[0.1,0.05,0.2,0.1])

# output

9.062500000000016
source
PValue.SigmaClipFunction
SigmaClip(x, ex=ones(size(x)); sigmacutlevel=2)

Sigma-clipping filtering of an input array,

Arguments

  • x input array.
  • ex uncertainties.
  • sigmacutlevel sigma-clipping level.

It performs a one-iteration sigma clipping and reports a mask to select the surviving elements in the input arrays or other related arrays.

Examples


x = [4.,6.,8.,1.,3.,5.,20.]
mask = SigmaClip(x)
x[mask]

# output

6-element Vector{Float64}:
 4.0
 6.0
 8.0
 1.0
 3.0
 5.0
source
PValue.Z2NMethod
Z2N(freqs, time)

Compute the Rayleigh power spectrum of a time series in a given range of frequencies.

Arguments

  • freqs is an array with frequencies in units of 1/[time].
  • time is an array with the time series where to find a period.
  • harm is the number of harmonics to be used in the analysis.

Examples


Z2N([1.,0.5,0.25], [1.,2.,2.5,3.5,5.])

# output

3-element Vector{Any}:
 0.4
 0.4000000000000002
 0.537258300203048
source