- This work studies how an AI-controlled dog-fighting agent with tunable decision-making parameters can learn to optimize performance against an intelligent adversary, as measured by a stochastic objective function evaluated on simulated combat engagements. Gaussian process Bayesian optimization (GPBO) techniques are developed to automatically learn global Gaussian Process (GP) surrogate models, which provide statistical performance predictions in both explored and unexplored areas of the parameter space. This allows a learning engine to sample full-combat simulations at parameter values that are most likely to optimize performance and also provide highly informative data points for improving future predictions. However, standard GPBO methods do not provide a reliable surrogate model for the highly volatile objective functions found in aerial combat, and thus do not reliably identify global maxima. These issues are addressed by novel Repeat Sampling (RS) and Hybrid Repeat/Multi-point Sampling (HRMS) techniques. Simulation studies show that HRMS improves the accuracy of GP surrogate models, allowing AI decision-makers to more accurately predict performance and efficiently tune parameters.
- We study a variant of the source identification game with training data in which part of the training data is corrupted by an attacker. In the addressed scenario, the defender aims at deciding whether a test sequence has been drawn according to a discrete memoryless source $X \sim P_X$, whose statistics are known to him through the observation of a training sequence generated by $X$. In order to undermine the correct decision under the alternative hypothesis that the test sequence has not been drawn from $X$, the attacker can modify a sequence produced by a source $Y \sim P_Y$ up to a certain distortion, and corrupt the training sequence either by adding some fake samples or by replacing some samples with fake ones. We derive the unique rationalizable equilibrium of the two versions of the game in the asymptotic regime and by assuming that the defender bases its decision by relying only on the first order statistics of the test and the training sequences. By mimicking Stein's lemma, we derive the best achievable performance for the defender when the first type error probability is required to tend to zero exponentially fast with an arbitrarily small, yet positive, error exponent. We then use such a result to analyze the ultimate distinguishability of any two sources as a function of the allowed distortion and the fraction of corrupted samples injected into the training sequence.
- Mar 29 2017 stat.ML arXiv:1703.09631v1
- Mar 29 2017 stat.ML arXiv:1703.09528v1