PValue.jl
Documentation for PValue.jl
A set of sparse julia functions for computing p-values in a "frequentist" or "Bayesian" scenario.
Index
PValue.ACF_EKPValue.BICPValue.FourierPeriodogramPValue.Frequentist_p_valuePValue.Gelman_Bayesian_p_valuePValue.GetACFPValue.GetCrossCorrPValue.GetPACFPValue.Lucy_Bayesian_p_valuePValue.RMSPValue.SSRPValue.SchusterPeriodogramPValue.SigmaClipPValue.WeightedArithmeticMeanPValue.Z2N
PValue.ACF_EK — Method
ACF_EK(t::Vector{Float64},y::Vector{Float64},dy::Vector{Float64}; bins::Int=20)
Compute the auto-correlation function for data irregularly sampled, following the algorithm by Edelson & Krolik (1988). This a porting of the python astroML version.
Arguments
tis the vector of observing times.yis the vector of fluxes.eyis the vector of the uncertainties of the fluxes.
Returns a tuple of three value: the ACF, the ACF uncertainties, and the bin limits.
Example
t = [1.1,2.3,3.2,4.5]
y = [1.,2.,0.5,0.3]
ey = 0.1 .* y
ACF_EK(t,y,ey,bins=2)
([0.6817802730844681, 0.2530024508265458], [0.4472135954999579, 0.5773502691896257], -3.4:3.40000000005:3.4000000001)
sourcePValue.BIC — Method
BIC(lp::AbstractFloat,ndata::Integer,nvar::Integer)Compute the Bayes Information Criterion.
Arguments
lplogarithm of the likelihood.ndatanumber of datapoints.nvarnumber of model parameters.
Examples
BIC(56.,100,3)
# output
-98.18448944203573sourcePValue.FourierPeriodogram — Method
FourierPeriodogram(signal,fs;zerofreq=true)Compute the discrete Fourier periodogram for the inpout signal.
Arguments
signalarray of input data.fssampling in frequency of the input data (1/dt).- 'zerofreq' is true (false) to (not) include the zero frequency in the output.
Outputs are two arrays: the frequencies and the powers.
Examples
FourierPeriodogram([1.,2.,3.,4.],1.)
# output
([0.0, 0.25], [100.0, 8.000000000000002])sourcePValue.Frequentist_p_value — Method
Frequentist_p_value(ssrv,ndata,nvar)
Frequentist_p_value(ssrv,ndof)Compute the 'classic' frequentist p-value.
Arguments
ssrvSSR, the sum of squared residuals.ndatanumber of datapoints.nvarnumber of fit parameters.ndofnumber of degrees of freedom
(i.e. $ndata-nvar$).
Examples
Frequentist_p_value(85.3,100,10)
# output
0.620453567577112Frequentist_p_value(85.3,90)
# output
0.620453567577112sourcePValue.Gelman_Bayesian_p_value — Method
Gelman_Bayesian_p_value(modvecs,simvecs,obsvec,errobsvec)Compute a 'Bayesian p-value' following the recipe by A. Gelman et al., 2013.
Arguments
obsvecdatapoints.errobsvecdatapoint uncertainties.modvecsmodel vector.simvecssimulated vector.
Explanation
obsvec and errobsvec are length-'m' vectors of datapoints and relative uncertainties. modvecs is a vector computed by the posterior distribution of parameters (n chains), e.g. by a MCMC, where each component is a vector of m values computed using the fit function and each set of parmeters from the posterior distribution. Finally, simvecs is like modevecs with the addition of the predicted noise, i.e. these are simulated datapoints.
The routinely essentially compares the SSR (or, in principle, any test statistics) of each model based on the derived posterior distribution of parameters vs the data and to SSR computed by simulated data and again the posterior.
Examples
A full example of application of the GelmanBayesianp_value as well as the LucyBayesianp_value and the Frequentistpvalue is reported in this Jupyter notebook.
sourcePValue.GetACF — Method
GetACF(data::Vector{Float64},lags::Integer;sigma=1.96,bartlett=false)Compute the [AutoCorrelation Function[(https://en.wikipedia.org/wiki/Autocorrelation) for the given lags. It returns a dictionary with the ACF and the minimum and maximum uncertainties against a white noise hypothesis.
Arguments
datais the vector of input data.lagslast lag to be computed.sigmanumber of sigmas for the uncertainties.bartlettif a flag driving the error computation. IftrueBartlett formula is applied.
Examples
GetACF([1.2,2.5,3.5,4.3],2)["ACF"]
# output
3-element Vector{Float64}:
1.0
0.23928737773637632
-0.2945971122496507sourcePValue.GetCrossCorr — Method
GetCrossCorr(x::Vector{Float64},y::Vector{Float64},lags::Integer)Compute the [crosscorrelation [(https://en.wikipedia.org/wiki/Cross-correlation) between the x and y datasets for the given lags. It returns the cross-correlation values.
Arguments
xis the first input vector.yis the second input vector.lagslast lag to be computed.
Examples
GetCrossCorr([1.2,2.5,3.5,4.3],[1.5,2.9,3.0,4.1],2)
# output
5-element Vector{Float64}:
-0.1926156048478174
0.1658715565267623
0.9627857395579823
0.15827215481804718
-0.15637230439086838sourcePValue.GetPACF — Method
GetPACF(data::Vector{Float64},lags::Integer;sigma=1.96)Compute the [Partial AutoCorrelation Function[(https://en.wikipedia.org/wiki/Partialautocorrelationfunction) for the given lags. It returns a dictionary with the PACF and the minimum and maximum uncertainties.
Arguments
datais the vector of input data.lagslast lag to be computed.sigmanumber of sigmas for the uncertainties.
Examples
GetPACF([1.2,2.5,3.5,4.3],1)["PACF"]
# output
2-element Vector{Float64}:
1.0
0.7819548872180438sourcePValue.Lucy_Bayesian_p_value — Method
Lucy_Bayesian_p_value(modvecs,obsvec,errobsvec,nvars)Compute a 'Bayesian p-value' following the recipe by L.B. Lucy, 2016, A&A 588, 19.
Arguments
obsvecdatapoints.errobsvecdatapoint uncertainties.nvarsnumber of parameters.
Explanation
obsvec and errobsvec are length-m vectors of datapoints and relative uncertainties. modvecs is a vector computed by the posterior distribution of parameters (n chains), e.g. by a MCMC, where each component is a vector of m values computed using the fit function and each set of parmeters from the posterior distribution. Finally, nvars is the number of parameters.
This algorithm relies on the Chi2 distribution as in the 'frequentist' case. Howver the SSR is not based only on a punt estimate but it is computed by the whole posterior distribution of parameters.
Examples
x = [1,2,3,4,5]
y = [1.01,1.95,3.05,3.97,5.1]
ey = [0.05,0.1,0.11,0.17,0.2]
f(x;a=1.,b=0.) = a.*x.+b
# Sample from a fake posterior distribution
ch = DataFrame(a=[0.99,0.95,1.01,1.02,1.03], b=[0.,-0.01,0.01,0.02,-0.01])
res = []
for i in 1:nrow(ch)
push!(res,f(x;a=ch[i,:a],b=ch[i,:b]))
end
Lucy_Bayesian_p_value(res,y,ey,2)
# output
0.7200318895143041sourcePValue.RMS — Method
RMS(datavec,modvec)Compute the Root Mean Square value.
Arguments
datavecdatapoints.modvecmodel values.
Examples
RMS([1.1,2.2],[1.15,2.15])
# output
0.050000000000000044sourcePValue.SSR — Method
SSR(modvec,obsvec,errobsvec)Compute the Sum of Squared Residuals.
Arguments
modvecmodel predictions.obsvecobserved data.- `errobsvec~ uncertainties.
Examples
SSR([1.,2.,3.,4.],[1.1,1.9,3.05,3.8],[0.1,0.05,0.2,0.1])
# output
9.062500000000016sourcePValue.SchusterPeriodogram — Method
SchusterPeriodogram(time::Vector{Float64}, flux::Vector{Float64}, freq::Vector{Float64})Computes a periodogram applying the textbook formula.
Arguments
timeis the vector of the input time-series.fluxis the vector with the time-series values.freqis the vector with the desired frequencies where to compute the power.
This is not a very efficient algorithm to compute a periodogram, i.e. it is much less efficient than the fast Fourier transform, but it can be computed at any frequency one may choose for regularly or irregularly sampled time-series.
sourcePValue.SigmaClip — Function
SigmaClip(x, ex=ones(size(x)); sigmacutlevel=2)Sigma-clipping filtering of an input array,
Arguments
xinput array.exuncertainties.sigmacutlevelsigma-clipping level.
It performs a one-iteration sigma clipping and reports a mask to select the surviving elements in the input arrays or other related arrays.
Examples
x = [4.,6.,8.,1.,3.,5.,20.]
mask = SigmaClip(x)
x[mask]
# output
6-element Vector{Float64}:
4.0
6.0
8.0
1.0
3.0
5.0sourcePValue.WeightedArithmeticMean — Method
WeightedArithmeticMean(x,ex)Compute the Weighted Arithmetic Mean.
Arguments
xinput vectorexuncertainties.
Examples
x = [1.2,2.2,4.5,3,3.6]
ex = [0.2,0.2,0.5,0.1,0.6]
WeightedArithmeticMean(x,ex)
# output
(2.634301913536499, 0.07986523020975032)sourcePValue.Z2N — Method
Z2N(freqs, time)Compute the Rayleigh power spectrum of a time series in a given range of frequencies.
Arguments
freqsis an array with frequencies in units of 1/[time].timeis an array with the time series where to find a period.harmis the number of harmonics to be used in the analysis.
Examples
Z2N([1.,0.5,0.25], [1.,2.,2.5,3.5,5.])
# output
3-element Vector{Any}:
0.4
0.4000000000000002
0.537258300203048source