PValue.jl
Documentation for PValue.jl
A set of sparse julia functions for computing p-values in a "frequentist" or "Bayesian" scenario.
Index
PValue.ACF_EK
PValue.BIC
PValue.FourierPeriodogram
PValue.Frequentist_p_value
PValue.Gelman_Bayesian_p_value
PValue.GetACF
PValue.GetCrossCorr
PValue.GetPACF
PValue.Lucy_Bayesian_p_value
PValue.RMS
PValue.SSR
PValue.SigmaClip
PValue.WeightedArithmeticMean
PValue.Z2N
PValue.ACF_EK
— MethodACF_EK(t::Vector{Float64},y::Vector{Float64},dy::Vector{Float64}; bins::Int=20)
Compute the auto-correlation function for data irregularly sampled, following the algorithm by Edelson & Krolik (1988). This a porting of the python
astroML version.
Arguments
t
is the vector of observing times.y
is the vector of fluxes.ey
is the vector of the uncertainties of the fluxes.
Returns a tuple of three value: the ACF, the ACF uncertainties, and the bin limits.
Example
t = [1.1,2.3,3.2,4.5]
y = [1.,2.,0.5,0.3]
ey = 0.1 .* y
ACF_EK(t,y,ey,bins=2)
([0.6817802730844681, 0.2530024508265458], [0.4472135954999579, 0.5773502691896257], -3.4:3.40000000005:3.4000000001)
PValue.BIC
— MethodBIC(lp::AbstractFloat,ndata::Integer,nvar::Integer)
Compute the Bayes Information Criterion.
Arguments
lp
logarithm of the likelihood.ndata
number of datapoints.nvar
number of model parameters.
Examples
BIC(56.,100,3)
# output
-98.18448944203573
PValue.FourierPeriodogram
— MethodFourierPeriodogram(signal,fs;zerofreq=true)
Compute the discrete Fourier periodogram for the inpout signal.
Arguments
signal
array of input data.fs
sampling in frequency of the input data (1/dt).- 'zerofreq' is true (false) to (not) include the zero frequency in the output.
Outputs are two arrays: the frequencies and the powers.
Examples
FourierPeriodogram([1.,2.,3.,4.],1.)
# output
([0.0, 0.25], [100.0, 8.000000000000002])
PValue.Frequentist_p_value
— MethodFrequentist_p_value(ssrv,ndata,nvar)
Frequentist_p_value(ssrv,ndof)
Compute the 'classic' frequentist p-value.
Arguments
ssrv
SSR, the sum of squared residuals.ndata
number of datapoints.nvar
number of fit parameters.ndof
number of degrees of freedom
(i.e. $ndata-nvar$).
Examples
Frequentist_p_value(85.3,100,10)
# output
0.620453567577112
Frequentist_p_value(85.3,90)
# output
0.620453567577112
PValue.Gelman_Bayesian_p_value
— MethodGelman_Bayesian_p_value(modvecs,simvecs,obsvec,errobsvec)
Compute a 'Bayesian p-value' following the recipe by A. Gelman et al., 2013.
Arguments
obsvec
datapoints.errobsvec
datapoint uncertainties.modvecs
model vector.simvecs
simulated vector.
Explanation
obsvec
and errobsvec
are length-'m' vectors of datapoints and relative uncertainties. modvecs
is a vector computed by the posterior distribution of parameters (n
chains), e.g. by a MCMC, where each component is a vector of m
values computed using the fit function and each set of parmeters from the posterior distribution. Finally, simvecs
is like modevecs
with the addition of the predicted noise, i.e. these are simulated datapoints.
The routinely essentially compares the SSR (or, in principle, any test statistics) of each model based on the derived posterior distribution of parameters vs the data and to SSR computed by simulated data and again the posterior.
Examples
A full example of application of the GelmanBayesianp_value as well as the LucyBayesianp_value and the Frequentistpvalue is reported in this Jupyter notebook.
PValue.GetACF
— MethodGetACF(data::Vector{Float64},lags::Integer;sigma=1.96)
Compute the [AutoCorrelation Function[(https://en.wikipedia.org/wiki/Autocorrelation) for the given lags. It returns a dictionary with the ACF and the minimum and maximum uncertainties against a white noise hypothesis.
Arguments
data
is the vector of input data.lags
last lag to be computed.sigma
number of sigmas for the uncertainties.
Examples
GetACF([1.2,2.5,3.5,4.3],2)["ACF"]
# output
3-element Vector{Float64}:
1.0
0.23928737773637632
-0.2945971122496507
PValue.GetCrossCorr
— MethodGetCrossCorr(x::Vector{Float64},y::Vector{Float64},lags::Integer)
Compute the [crosscorrelation [(https://en.wikipedia.org/wiki/Cross-correlation) between the x
and y
datasets for the given lags. It returns the cross-correlation values.
Arguments
x
is the first input vector.y
is the second input vector.lags
last lag to be computed.
Examples
GetCrossCorr([1.2,2.5,3.5,4.3],[1.5,2.9,3.0,4.1],2)
# output
5-element Vector{Float64}:
-0.1926156048478174
0.1658715565267623
0.9627857395579823
0.15827215481804718
-0.15637230439086838
PValue.GetPACF
— MethodGetPACF(data::Vector{Float64},lags::Integer;sigma=1.96)
Compute the [Partial AutoCorrelation Function[(https://en.wikipedia.org/wiki/Partialautocorrelationfunction) for the given lags. It returns a dictionary with the PACF and the minimum and maximum uncertainties.
Arguments
data
is the vector of input data.lags
last lag to be computed.sigma
number of sigmas for the uncertainties.
Examples
GetPACF([1.2,2.5,3.5,4.3],1)["PACF"]
# output
2-element Vector{Float64}:
1.0
0.7819548872180438
PValue.Lucy_Bayesian_p_value
— MethodLucy_Bayesian_p_value(modvecs,obsvec,errobsvec,nvars)
Compute a 'Bayesian p-value' following the recipe by L.B. Lucy, 2016, A&A 588, 19.
Arguments
obsvec
datapoints.errobsvec
datapoint uncertainties.nvars
number of parameters.
Explanation
obsvec
and errobsvec
are length-m
vectors of datapoints and relative uncertainties. modvecs
is a vector computed by the posterior distribution of parameters (n
chains), e.g. by a MCMC, where each component is a vector of m
values computed using the fit function and each set of parmeters from the posterior distribution. Finally, nvars
is the number of parameters.
This algorithm relies on the Chi2 distribution as in the 'frequentist' case. Howver the SSR is not based only on a punt estimate but it is computed by the whole posterior distribution of parameters.
Examples
x = [1,2,3,4,5]
y = [1.01,1.95,3.05,3.97,5.1]
ey = [0.05,0.1,0.11,0.17,0.2]
f(x;a=1.,b=0.) = a.*x.+b
# Sample from a fake posterior distribution
ch = DataFrame(a=[0.99,0.95,1.01,1.02,1.03], b=[0.,-0.01,0.01,0.02,-0.01])
res = []
for i in 1:nrow(ch)
push!(res,f(x;a=ch[i,:a],b=ch[i,:b]))
end
Lucy_Bayesian_p_value(res,y,ey,2)
# output
0.7200318895143041
PValue.RMS
— MethodRMS(datavec,modvec)
Compute the Root Mean Square value.
Arguments
datavec
datapoints.modvec
model values.
Examples
RMS([1.1,2.2],[1.15,2.15])
# output
0.050000000000000044
PValue.SSR
— MethodSSR(modvec,obsvec,errobsvec)
Compute the Sum of Squared Residuals.
Arguments
modvec
model predictions.obsvec
observed data.- `errobsvec~ uncertainties.
Examples
SSR([1.,2.,3.,4.],[1.1,1.9,3.05,3.8],[0.1,0.05,0.2,0.1])
# output
9.062500000000016
PValue.SigmaClip
— FunctionSigmaClip(x, ex=ones(size(x)); sigmacutlevel=2)
Sigma-clipping filtering of an input array,
Arguments
x
input array.ex
uncertainties.sigmacutlevel
sigma-clipping level.
It performs a one-iteration sigma clipping and reports a mask to select the surviving elements in the input arrays or other related arrays.
Examples
x = [4.,6.,8.,1.,3.,5.,20.]
mask = SigmaClip(x)
x[mask]
# output
6-element Vector{Float64}:
4.0
6.0
8.0
1.0
3.0
5.0
PValue.WeightedArithmeticMean
— MethodWeightedArithmeticMean(x,ex)
Compute the Weighted Arithmetic Mean.
Arguments
x
input vectorex
uncertainties.
Examples
x = [1.2,2.2,4.5,3,3.6]
ex = [0.2,0.2,0.5,0.1,0.6]
WeightedArithmeticMean(x,ex)
# output
(2.634301913536499, 0.07986523020975032)
PValue.Z2N
— MethodZ2N(freqs, time)
Compute the Rayleigh power spectrum of a time series in a given range of frequencies.
Arguments
freqs
is an array with frequencies in units of 1/[time].time
is an array with the time series where to find a period.harm
is the number of harmonics to be used in the analysis.
Examples
Z2N([1.,0.5,0.25], [1.,2.,2.5,3.5,5.])
# output
3-element Vector{Any}:
0.4
0.4000000000000002
0.537258300203048