# Data Analysis, Statistics and Probability (physics.data-an)

• Shannon quantum information entropies $S_{x,k}$, Fisher informations $I_{x,k}$, Onicescu energies $O_{x,k}$ and statistical complexities $e^{S_{x,k}}O_{x,k}$ are calculated in both the position (subscript $x$) and momentum ($k$) representations for the Robin quantum well characterized by the extrapolation lengths $\Lambda_-$ and $\Lambda_+$ at the two confining surfaces. The analysis concentrates on finding and explaining the most characteristic features of these quantum information measures over the whole range of variation of the Robin distance $\Lambda$ for the symmetric, $\Lambda_-=\Lambda_+=\Lambda$, and antisymmetric, $\Lambda_-=-\Lambda_+=\Lambda$, geometries. Analytic results obtained in the limiting cases of extremely large and very small magnitudes of the extrapolation parameter are corroborated by exact numerical computations that are extended to arbitrary length $\Lambda$. It is confirmed, in particular, that the entropic uncertainty relation $S_{x_n}+S_{k_n}\geq1+\ln\pi$ and the general inequality $e^SO\geq1$, which is valid in both the position and momentum spaces, hold true at any Robin distance and for every quantum state $n$. For either configuration, there is a range of extrapolation lengths where the rule $S_{x_{n+1}}(\Lambda)+S_{k_{n+1}}(\Lambda)\geq S_{x_n}(\Lambda)+S_{k_n}(\Lambda)$, which is correct for the Neumann ($\Lambda=\infty$) or Dirichlet ($\Lambda=0$) boundary conditions, is violated. Other analytic and numerical results for all measures are also discussed and their physical meaning is highlighted.
• Thermodynamic properties of the Robin quantum well with extrapolation length $\Lambda$ are analyzed theoretically for the canonical and two grand canonical ensembles, with special attention paid to the situation when the energies of the one or two lowest-lying states are split off from the rest of the spectrum by a large gap that is controlled by varying $\Lambda$. For a single split-off level, which exists for the geometry with equal magnitudes but opposite signs of the Robin distances on the confining interfaces, the heat capacity $c_V$ of the canonical ensemble is a nonmonotonic function of temperature $T$; its salient maximum grows to infinity as $\ln^2\Lambda$ as the extrapolation length decreases to zero, and its position is proportional to $1/(\Lambda^2\ln\Lambda)$. The specific heat per particle $c_N$ of the Fermi-Dirac ensemble also depends nonmonotonically on temperature, with its pronounced extremum preceded on the $T$ axis by a plateau whose value at vanishing $\Lambda$ is $(N-1)/(2N)k_B$, where $N$ is the number of fermions. The maximum of $c_N$, as in the canonical case, increases without bound as $\Lambda$ goes to zero and is largest for one particle. The most essential property of the Bose-Einstein ensemble is the formation, for a growing number of bosons, of a sharp asymmetric feature on the $c_N$-$T$ characteristic that is more pronounced at smaller Robin distances. This cusp-like structure is a manifestation of the phase transition to the condensate state. For two split-off orbitals, one additional maximum emerges whose position shifts to lower temperatures as the energy gap between these two states and their higher-lying counterparts increases, and whose magnitude approaches a $\Lambda$-independent value. All these physical phenomena are explained qualitatively and quantitatively by the variation of the energy spectrum with the Robin distance.
• Four-dimensional variational data assimilation (4DVar) has become an increasingly important tool in data science with wide applications in many engineering and scientific fields such as geoscience [1-12], biology [13] and the financial industry [14]. 4DVar seeks a solution that minimizes the departure from the background field and the mismatch between the forecast trajectory and the observations within an assimilation window. The current state-of-the-art 4DVar offers only two choices, which differ in the form of the forecast model: the strong- and weak-constrained 4DVar approaches [15-16]. The former ignores the model error and corrects only the initial condition error, at the expense of reduced accuracy, while the latter accounts for both the initial and model errors and corrects them separately, which increases computational cost and uncertainty. To overcome these limitations, here we develop an integral correcting 4DVar (i4DVar) approach that treats all errors as a whole and corrects them simultaneously and indiscriminately. To achieve that, a novel exponentially decaying function is proposed to characterize the error evolution and correct it at each time step in the i4DVar. As a result, the i4DVar greatly enhances the capability of the strong-constrained 4DVar for correcting the model error while also overcoming the limitation of the weak-constrained 4DVar of being prohibitively expensive with added uncertainty. Numerical experiments with the Lorenz model show that the i4DVar significantly outperforms the existing 4DVar approaches. It has the potential to be applied in many scientific and engineering fields and industrial sectors in the big data era because of its ease of implementation and superior performance.
• Temporal inhomogeneities observed in various natural and social phenomena have often been characterized in terms of scaling behaviors in the autocorrelation function with a decaying exponent $\gamma$, the interevent time distribution with a power-law exponent $\alpha$, and the burst size distributions. Here the interevent time is defined as the time interval between two consecutive events in the event sequence, and the burst size denotes the number of events in a bursty train detected for a given time window. In order to understand such temporal scaling behaviors, which imply a hierarchical temporal structure, we devise a hierarchical burst model by assuming that each observed event might be a consequence of a multi-level causal or decision-making process. By studying our model analytically and numerically, we confirm the scaling relation $\alpha+\gamma=2$, established for uncorrelated interevent times, despite the existence of correlations between interevent times. Such correlations between interevent times are supported by the stretched exponential burst size distributions, for which we provide an analytic argument. In addition, by imposing conditions on the ordering of events, we observe an additional feature of log-periodic behavior in the autocorrelation function. Our modeling approach for the hierarchical temporal structure can help us better understand the underlying mechanisms behind complex bursty dynamics showing temporal scaling behaviors.
• A new data cleaning procedure for electron cyclotron emission imaging (ECEI) on the EAST tokamak is constructed. Machine learning techniques, including support vector machines (SVM) and decision trees, are applied to identify saturated, zero, and weak signals in ECEI raw data, which not only reduces the effort researchers spend on data analysis but also improves the accuracy of data preprocessing. To enhance the reliability of the procedure, proper training sets are sampled from the massive raw data of ECEI experiments on the EAST tokamak. The window size of the temporal signal, the kernel function, and other model parameters are obtained after model training. The resulting recognition rates for saturated, zero, and weak signals in raw data are 99.4%, 99.86%, and 99.9%, respectively, which demonstrates the accuracy of this procedure.
• Environmental contaminant exposure can pose significant risks to human health. Therefore, evaluating the impact of this exposure is of great importance; however, it is often difficult because both the molecular mechanism of disease and the mode of action of the contaminants are complex. We used network biology techniques to quantitatively assess the impact of environmental contaminants on the human interactome and diseases, with a particular focus on seven major contaminant categories: persistent organic pollutants (POPs), dioxins, polycyclic aromatic hydrocarbons (PAHs), pesticides, perfluorochemicals (PFCs), metals, and pharmaceutical and personal care products (PPCPs). We integrated publicly available data on toxicogenomics, the diseasome, protein-protein interactions (PPIs), and gene essentiality and found that a few contaminants targeted many genes, and a few genes were targeted by many contaminants. The contaminant targets were hub proteins in the human PPI network, whereas the target proteins in most categories did not contain abundant essential proteins. Generally, contaminant targets and disease-associated proteins were closely associated with the PPI network, and the closeness of the associations depended on the disease type and chemical category. Network biology techniques were used to identify environmental contaminants with broad effects on the human interactome and contaminant-sensitive biomarkers. Moreover, this method enabled us to quantify the relationship between environmental contaminants and human diseases, which was supported by epidemiological and experimental evidence. These methods and findings have facilitated the elucidation of the complex relationship between environmental exposure and adverse health outcomes.
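
The position-space measures named in the Robin-well abstract above can be illustrated in the one limit where the density is elementary. The following is a minimal sketch, not the authors' method: it assumes the Dirichlet case ($\Lambda=0$) with a well of unit width on $[0,1]$, so that $\rho_n(x)=2\sin^2(n\pi x)$, and uses a plain midpoint rule. The function name `position_measures` and the sample count are illustrative choices.

```python
import math


def position_measures(n, samples=20000):
    """Midpoint-rule estimates of (S_x, I_x, O_x) for the n-th Dirichlet level.

    Assumption: well of unit width on [0, 1], rho_n(x) = 2 sin^2(n pi x).
    """
    h = 1.0 / samples
    shannon = fisher = onicescu = 0.0
    for i in range(samples):
        x = (i + 0.5) * h
        s = math.sin(n * math.pi * x)
        c = math.cos(n * math.pi * x)
        rho = 2.0 * s * s
        if rho > 0.0:
            # Shannon entropy: S = -integral of rho * ln(rho)
            shannon -= rho * math.log(rho) * h
        # Fisher information: (rho')^2 / rho reduces to 8 (n pi)^2 cos^2(n pi x)
        fisher += 8.0 * (n * math.pi) ** 2 * c * c * h
        # Onicescu energy: O = integral of rho^2
        onicescu += rho * rho * h
    return shannon, fisher, onicescu


S, I, O = position_measures(1)
complexity = math.exp(S) * O  # statistical complexity e^S * O
```

In this limit the estimates agree with the closed forms $S_x=\ln 2-1$, $I_x=4\pi^2n^2$ and $O_x=3/2$, so $e^{S_x}O_x=3/e\approx1.10$, consistent with the inequality $e^SO\geq1$ quoted in the abstract. Momentum-space quantities and finite $\Lambda$ are outside the scope of this sketch.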
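
The interevent time and burst size defined in the bursty-dynamics abstract above can be made concrete with a short sketch. It assumes one common reading of "a bursty train detected for a given time window": consecutive events belong to the same train when the gap between them does not exceed the window. The function names and the sample event list are hypothetical and not taken from the paper.

```python
def interevent_times(timestamps):
    """Time intervals between consecutive events of the sorted sequence."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def burst_sizes(timestamps, window):
    """Number of events in each bursty train.

    Assumption: a train is a maximal run of events whose consecutive
    interevent times are all <= window.
    """
    ts = sorted(timestamps)
    if not ts:
        return []
    sizes = [1]  # the first event opens the first train
    for gap in interevent_times(ts):
        if gap <= window:
            sizes[-1] += 1   # event continues the current train
        else:
            sizes.append(1)  # gap too long: start a new train
    return sizes


events = [0, 1, 2, 10, 11, 30]          # hypothetical event times
gaps = interevent_times(events)         # [1, 1, 8, 1, 19]
trains = burst_sizes(events, window=2)  # [3, 2, 1]
```

Histograms of `gaps` and of `trains` over a long event sequence are the empirical counterparts of the interevent time distribution and the burst size distribution discussed in the abstract.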
