4 @package MPCDAnalysis.OnTheFlyStatisticsDDDA
5 Defines the `OnTheFlyStatisticsDDDA` class.
10 Computes sample means and their errors for (possibly) serially correlated
13 The algorithm used is called "Dynamic Distributable Decorrelation
14 Algorithm" (DDDA), and is described in
15 "Efficient algorithm for “on-the-fly” error analysis of local or distributed
16 serially correlated data"
17 by David R. Kent IV, Richard P. Muller, Amos G. Anderson,
18 William A. Goddard III, and Michael T. Feldmann,
19 Journal of Computational Chemistry,
20 November 2007, Vol. 28, No. 14, pp. 2309-2316,
21 DOI: 10.1002/jcc.20746,
22 which in turn is partly based on the article
23 "Error estimates on averages of correlated data"
24 by H. Flyvbjerg and H. G. Petersen,
25 The Journal of Chemical Physics, July 1989, Vol. 91, No. 1, pp. 461-466
34 from .OnTheFlyStatistics
import OnTheFlyStatistics
36 self.
_blocks = [OnTheFlyStatistics()]
42 Adds a datum to the sample.
44 It is assumed that the "time" intervals between subsequently added data
45 are constant; here, "time" may, for example, refer to Molecular Dynamics
54 Merges the information in `rhs` with this instance's.
56 Let \f$ x_1, x_2, \ldots, x_{N_1} \f$ be the data that have been
57 supplied to this instance prior to calling this function, with
58 \f$ N_1 \f$ being the sample size of this instance prior to calling this
59 function. Similarly, let \f$ y_1, y_2, \ldots, y_{N_2} \f$ be the data
60 supplied to `rhs` previously.
61 Then, after this function successfully returns, this instance's state is
62 approximately as if it had started empty, and then the data
63 \f$ x_1, x_2, \ldots, x_{N_1}, y_1, y_2, \ldots, y_{N_2} \f$
64 had been supplied. This is true to a high degree for the state of block
65 `0`; other blocks may have significantly different state due to the way
66 blocks are populated with data points. The implementation is equivalent
67 to the one named `Decorrelation.addition` in Appendix B of
70 The `rhs` instance is left unchanged (unless, of course, it is the same
71 object as this instance, which is allowed).
74 Throws if `rhs` is not of type `OnTheFlyStatisticsDDDA`.
77 The instance to merge into this one. Must be an instance of
78 `OnTheFlyStatisticsDDDA`.
81 if not isinstance(rhs, OnTheFlyStatisticsDDDA):
85 from .OnTheFlyStatistics
import OnTheFlyStatistics
86 while len(rhs._blocks) > len(self.
_blocks):
87 self.
_blocks.append(OnTheFlyStatistics())
90 for i
in range(0, len(self.
_blocks)):
91 if i >= len(rhs._blocks):
93 self.
_blocks[i].mergeSample(rhs._blocks[i])
95 for i
in range(0, len(self.
_blocks)):
96 if i >= len(rhs._blocks):
99 if rhs._waiting[i]
is None:
104 self.
_waiting[i] = copy.deepcopy(rhs._waiting[i])
106 mean = (self.
_waiting[i] + rhs._waiting[i]) / 2.0
110 self.
_blocks.append(OnTheFlyStatistics())
118 Returns the number of data points added so far.
126 Returns the mean of all the values added so far.
128 Since the mean of all values added is returned, and the sample size may
129 not be a power of 2, statistics with different blocking length may
130 not incorporate the same amount of information. This may lead to
131 difficulties when using the error estimates of statistics of different
132 block lengths to estimate the error in the entire, possibly correlated
133 data set, since the statistics of different blocking lengths do not
134 necessarily incorporate the same measurements.
136 If no values have been added so far, `Exception` is raised.
140 raise Exception(
"Tried to get mean without having supplied data.")
147 Returns the largest block size for which there is at least one data
150 If no values have been added so far, `Exception` is raised.
161 Returns the ID of the largest block size created so far.
169 Returns whether the block with the given `blockID` has enough data to
170 compute a sample variance.
173 The block ID, which must be an integer in the range
174 [0, `getMaximumBlockID()`].
177 if not isinstance(blockID, int):
188 Returns the sample variance in the block with the given `blockID`.
191 Throws if `not self.hasBlockVariance(blockID)`.
194 The block ID, which must be an integer in the range
195 [0, `getMaximumBlockID()`].
198 if not isinstance(blockID, int):
207 return self.
_blocks[blockID].getSampleVariance()
212 Returns the sample standard deviation in the block with the given
216 Throws if `blockID` is not of type `int`.
218 Throws if `blockID` is out of range.
220 Throws if `not self.hasBlockVariance(blockID)`.
223 The block ID, which must be an integer in the range
224 [0, `getMaximumBlockID()`].
233 Returns the raw sample standard deviation, i.e. the sample standard
234 deviation in block `0`.
237 Throws if `not self.hasBlockVariance(0)`.
245 Returns an estimate for the standard deviation of the standard error of
246 the mean for a given `blockID`.
249 Throws if `not self.hasBlockVariance(blockID)`.
252 The block ID, which must be an integer in the range
253 [0, `getMaximumBlockID()`].
256 if not isinstance(blockID, int):
265 return self.
_blocks[blockID].getStandardErrorOfTheMean()
272 Returns an estimate for the standard deviation of the standard error of
273 the mean for a given `blockID`.
275 The returned estimate corresponds to Eq. (28) in
276 "Error estimates on averages of correlated data"
277 by H. Flyvbjerg and H. G. Petersen,
278 The Journal of Chemical Physics, July 1989, Vol. 91, No. 1, pp. 461-466
279 DOI: 10.1063/1.457480
282 Throws if `not self.hasBlockVariance(blockID)`.
285 The block ID, which must be an integer in the range
286 [0, `getMaximumBlockID()`].
289 if not isinstance(blockID, int):
302 return se / math.sqrt(2 * reducedSampleSize)
307 Returns the block ID corresponding to the optimal block size, in the
308 sense that the corresponding block provides the most accurate estimate
309 for the standard error of the mean.
311 If there is no variance in the data, `0` is returned.
313 The algorithm used is described in Section IV of the article
314 "Strategies for improving the efficiency of quantum Monte Carlo
316 by R. M. Lee, G. J. Conduit, N. Nemec, P. López Ríos, and
318 Physical Review E, June 2011, Vol. 83, No. 6, pp. 066706
319 DOI: 10.1103/PhysRevE.83.066706
322 Throws if fewer than two data points have been supplied.
329 self.
_blocks[0].getStandardErrorOfTheMean()
331 if rawStandardError == 0:
343 blockSize = 2 ** blockID
344 blockedStandardError = \
345 self.
_blocks[blockID].getStandardErrorOfTheMean()
346 quotient = blockedStandardError / rawStandardError
348 if blockSize ** 3 > threshold:
349 optimalBlockID = blockID
351 return optimalBlockID
356 Returns whether the sample is large enough for the estimate of the
357 standard error of the mean, as provided by the block indicated by
358 `getOptimalBlockIDForStandardErrorOfTheMean`, to be reliable.
360 The algorithm used is described in Section IV of the article
361 "Strategies for improving the efficiency of quantum Monte Carlo
363 by R. M. Lee, G. J. Conduit, N. Nemec, P. López Ríos, and
365 Physical Review E, June 2011, Vol. 83, No. 6, pp. 066706
366 DOI: 10.1103/PhysRevE.83.066706
369 Throws if fewer than two data points have been supplied.
373 blockSize = 2 ** blockID
380 Returns the best estimation of the true standard error of the mean of
381 the data, after decorrelation.
383 @see optimalStandardErrorOfTheMeanEstimateIsReliable
385 The algorithm used is described in Section IV of the article
386 "Strategies for improving the efficiency of quantum Monte Carlo
388 by R. M. Lee, G. J. Conduit, N. Nemec, P. López Ríos, and
390 Physical Review E, June 2011, Vol. 83, No. 6, pp. 066706
391 DOI: 10.1103/PhysRevE.83.066706
394 Throws if fewer than two data points have been supplied.
404 Returns a `str` that contains the state of this instance.
406 @see unserializeFromString
414 ret +=
"|" + block.serializeToString()
417 if waiting
is not None:
425 Discards the current state, and loads the state specified in the given
429 Throws if `state` is of the wrong type.
431 Throws if `state` does not encode a valid state.
434 The state to load. Must be a `str` created by
438 if not isinstance(state, str):
441 parts = state.split(
"|")
442 if int(parts[0]) != 1:
445 blockCount = int(parts[1])
449 if len(parts) != 2 + 2 * blockCount:
452 from .OnTheFlyStatistics
import OnTheFlyStatistics
454 for i
in range(0, blockCount):
455 block = OnTheFlyStatistics()
456 block.unserializeFromString(parts[2 + i])
460 for i
in range(0, blockCount):
461 part = parts[2 + blockCount + i]
465 waiting.append(float(part))
469 blocks = [OnTheFlyStatistics()]
476 def getMPLAxes(self, showEstimatedStandardDeviation = True):
478 Returns an `matplotlib.axes.Axes` object that plots the estimated
479 standard error of the mean, and optionally its estimated standard
480 deviation, as a function of the base-2 logarithm of the block size.
483 Throws if any of the arguments have invalid types.
485 @param[in] showEstimatedStandardDeviation
486 Whether to show, for each data point, the estimated standard
487 deviation of the estimated standard error of the mean.
488 Must be a `bool` value.
491 if not isinstance(showEstimatedStandardDeviation, bool):
495 import matplotlib.figure
497 figure = matplotlib.figure.Figure()
498 axes = figure.add_subplot(1, 1, 1)
506 blockSizes.append(blockID)
510 if showEstimatedStandardDeviation:
517 errorbars.append(getSD(blockID))
519 axes.errorbar(blockSizes, values, yerr = errorbars)
521 axes.set_xlabel(
r'$ \log_2 \left( BlockSize \right) $')
522 axes.set_ylabel(
r'Estimated Standard Error of the Mean')
529 Returns whether this instance and `rhs` are equivalent, i.e. whether the
530 two instances behave as if they had been supplied the same data.
533 Throws if `rhs` is not of type `OnTheFlyStatisticsDDDA`.
536 The `OnTheFlyStatisticsDDDA` instance to compare to.
539 if not isinstance(rhs, self.__class__):
542 return self.
__dict__ == rhs.__dict__
547 Returns the negation of `self == rhs`.
550 Throws if `rhs` is not of type `OnTheFlyStatisticsDDDA`.
553 The `OnTheFlyStatisticsDDDA` instance to compare to.
556 return not self == rhs
561 Returns a string (type `str`) representation of this instance.
566 ret =
"OnTheFlyStatisticsDDDA: {"
571 ret +=
", Standard Deviation [blockID=0]: "
573 ret +=
", Variance [blockID=0]: "
579 def _addDatum(self, datum, blockID):
588 if blockID + 1 == len(self.
_blocks):
589 from .OnTheFlyStatistics
import OnTheFlyStatistics
590 self.
_blocks.append(OnTheFlyStatistics())