results for au:Eger_S in:cs

- Mar 06 2018 cs.CL arXiv:1803.01400v1 Average word embeddings are a common baseline for more sophisticated sentence embedding techniques. An important advantage of average word embeddings is their computational and conceptual simplicity. However, they typically fall short of the performance of more complex models such as InferSent. Here, we generalize the concept of average word embeddings to $p$-mean word embeddings, which are (almost) as efficiently computable. We show that the concatenation of different types of $p$-mean word embeddings considerably closes the gap to state-of-the-art methods such as InferSent monolingually and substantially outperforms these more complex techniques cross-lingually. In addition, our proposed method outperforms different recently proposed baselines such as SIF and Sent2Vec by a solid margin, thus constituting a much harder-to-beat monolingual baseline for a wide variety of transfer tasks. Our data and code are publicly available.
- Apr 25 2017 cs.CL arXiv:1704.07203v3 Argument mining has become a popular research area in NLP. It typically includes the identification of argumentative components, e.g. claims, as the central component of an argument. We perform a qualitative analysis across six different datasets and show that these appear to conceptualize claims quite differently. To learn about the consequences of such different conceptualizations of claim for practical applications, we carried out extensive experiments using state-of-the-art feature-rich and deep learning systems, to identify claims in a cross-domain fashion. While the divergent perception of claims in different datasets is indeed harmful to cross-domain classification, we show that there are shared properties on the lexical level as well as system configurations that can help to overcome these gaps.
- Apr 21 2017 cs.CL arXiv:1704.06104v2 We investigate neural techniques for end-to-end computational argumentation mining (AM). We frame AM both as a token-based dependency parsing and as a token-based sequence tagging problem, including a multi-task learning setup. Contrary to models that operate on the argument component level, we find that framing AM as dependency parsing leads to subpar performance results. In contrast, less complex (local) tagging models based on BiLSTMs perform robustly across classification scenarios, being able to catch long-range dependencies inherent to the AM problem. Moreover, we find that jointly learning 'natural' subtasks, in a multi-task learning setup, improves performance.
- A vector composition of a vector $\mathbf{\ell}$ is a matrix $\mathbf{A}$ whose rows sum to $\mathbf{\ell}$. We define a weighted vector composition as a vector composition in which the column values of $\mathbf{A}$ may appear in different colors. We study vector compositions from different viewpoints: (1) We show how they are related to sums of random vectors and (2) how they allow one to derive formulas for partial derivatives of composite functions. (3) We study congruence properties of the number of weighted vector compositions, for fixed and arbitrary number of parts, many of which are analogous to those of ordinary binomial coefficients and related quantities. Via the Central Limit Theorem and their multivariate generating functions, (4) we also investigate the asymptotic behavior of several special cases of numbers of weighted vector compositions. Finally, (5) we conjecture an extension of a primality criterion due to Mann and Shanks in the context of weighted vector compositions.
- Apr 11 2017 cs.CL arXiv:1704.02497v1 We consider two graph models of semantic change. The first is a time-series model that relates embedding vectors from one time period to embedding vectors of previous time periods. In the second, we construct one graph for each word: nodes in this graph correspond to time points and edge weights to the similarity of the word's meaning across two time points. We apply our two models to corpora across three different languages. We find that semantic change is linear in two senses. Firstly, today's embedding vectors (= meaning) of words can be derived as linear combinations of embedding vectors of their neighbors in previous time periods. Secondly, self-similarity of words decays linearly in time. We consider both findings as new laws/hypotheses of semantic change.
- Apr 10 2017 cs.CL arXiv:1704.02215v2 This paper describes our approach to the SemEval 2017 Task 10: "Extracting Keyphrases and Relations from Scientific Publications", specifically to Subtask (B): "Classification of identified keyphrases". We explored three different deep learning approaches: a character-level convolutional neural network (CNN), a stacked learner with an MLP meta-classifier, and an attention based Bi-LSTM. From these approaches, we created an ensemble of differently hyper-parameterized systems, achieving a micro-F1-score of 0.63 on the test data. Our approach ranks 2nd (score of 1st placed system: 0.64) out of four according to this official score. However, we erroneously trained 2 out of 3 neural nets (the stacker and the CNN) on only roughly 15% of the full data, namely, the original development set. When trained on the full data (training+development), our ensemble has a micro-F1-score of 0.69. Our code is available from https://github.com/UKPLab/semeval2017-scienceie.
- Oct 26 2016 cs.CL arXiv:1610.07796v2 We analyze the performance of encoder-decoder neural models and compare them with well-known established methods. The latter represent different classes of traditional approaches that are applied to the monotone sequence-to-sequence tasks OCR post-correction, spelling correction, grapheme-to-phoneme conversion, and lemmatization. Such tasks are of practical relevance for various higher-level research fields including digital humanities, automatic text correction, and speech recognition. We investigate how well generic deep-learning approaches adapt to these tasks, and how they perform in comparison with established and more specialized methods, including our own adaptation of pruned CRFs.
- Jul 19 2016 cs.CL arXiv:1607.05014v2 We study the role of the second language in bilingual word embeddings in monolingual semantic evaluation tasks. We find strongly and weakly positive correlations between downstream task performance and second language similarity to the target language. Additionally, we show how bilingual word embeddings can be employed for the task of semantic language classification and that joint semantic spaces vary in meaningful ways across second languages. Our results support the hypothesis that semantic language similarity is influenced by both structural similarity and geography/contact.
- Jan 06 2016 cs.LG arXiv:1601.00925v1 The Support Vector Machine (SVM) has become a very popular machine learning method for text classification. One reason for this relates to the range of existing kernels which allow for classifying data that is not linearly separable. The linear, polynomial and RBF (Gaussian Radial Basis Function) kernels are commonly used and serve as a basis of comparison in our study. We show how to derive the primal form of the quadratic Power Kernel (PK) -- also called the Negative Euclidean Distance Kernel (NDK) -- by means of complex numbers. We exemplify the NDK in the framework of text categorization using the Dewey Decimal Classification (DDC) as the target scheme. Our evaluation shows that the power kernel produces F-scores that are comparable to the reference kernels, but is -- except for the linear kernel -- faster to compute. Finally, we show how to extend the NDK-approach by including the Mahalanobis distance.
- We count the number of alignments of $N \ge 1$ sequences when match-up types are from a specified set $S\subseteq \mathbb{N}^N$. Equivalently, we count the number of nonnegative integer matrices whose rows sum to a given fixed vector and each of whose columns lies in $S$. We provide a new asymptotic formula for the case $S=\{(s_1,\ldots,s_N) \:|\: 1\le s_i\le 2\}$.
- In this note, I discuss results on integer compositions/partitions given in the paper "A Unified Approach to Algorithms Generating Unrestricted and Restricted Integer Compositions and Integer Partitions". I also experiment with four different generation algorithms for restricted integer compositions and find the algorithm designed in the named paper to be pretty slow, comparatively. Some of my comments may be subjective.
- We study an endogenous opinion (or, belief) dynamics model where we endogenize the social network that models the link ('trust') weights between agents. Our network adjustment mechanism is simple: an agent increases her weight for another agent if that agent has been close to truth (whence, our adjustment criterion is 'past performance'). Moreover, we consider multiply biased agents that do not learn in a fully rational manner but are subject to persuasion bias - they learn in a DeGroot manner, via a simple 'rule of thumb' - and that have biased initial beliefs. In addition, we also study this setup under conformity, opposition, and homophily - which are recently suggested variants of DeGroot learning in social networks - thereby taking into account further biases agents are susceptible to. Our main focus is on crowd wisdom, that is, on the question of whether agents biased in these ways can adequately aggregate dispersed information and, consequently, learn the true states of the topics they communicate about. In particular, we present several conditions under which wisdom fails.
- We study a DeGroot-like opinion dynamics model in which agents may oppose other agents. As an underlying motivation, in our setup, agents want to adjust their opinions to match those of the agents of their 'in-group' and, in addition, they want to adjust their opinions to match the 'inverse' of those of the agents of their 'out-group'. Our paradigm can account for persistent disagreement in connected societies as well as bi- and multi-polarization. Outcomes depend upon network structure and the choice of deviation function modeling the mode of opposition between agents. For a particular choice of deviation function, which we call soft opposition, we derive necessary and sufficient conditions for long-run polarization. We also consider social influence (who are the opinion leaders in the network?) as well as the question of wisdom in our naive learning paradigm, finding that wisdom is difficult to attain when there exist sufficiently strong negative relations between agents.
- We derive asymptotic formulas for central extended binomial coefficients, which are generalizations of binomial coefficients. To do so, we relate the exact distribution of the sum of independent discrete uniform random variables to the asymptotic distribution, obtained from the Central Limit Theorem and a local limit variant.
- Among all restricted integer compositions with at most $m$ parts, each of which has size at most $l$, choose one uniformly at random. Which integer does this composition represent? In the current note, we show that the underlying distribution is, for large $m$ and $l$, approximately normal with mean value $\frac{ml}{2}$.
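
The last two abstracts both lean on the same idea: the number of ways to write an integer as a sum of bounded parts is governed by the distribution of a sum of independent discrete uniform random variables. The sketch below illustrates that idea under assumptions that are not taken from the papers: it uses exactly $n$ parts, each drawn from $\{0, \ldots, l\}$, so that the counts are the coefficients of $(1 + x + \cdots + x^l)^n$, and it compares the central coefficient with the standard local-limit approximation $(l+1)^n / \sqrt{2\pi n \, l(l+2)/12}$. The function names are hypothetical, and the papers' exact statements and error terms may differ.

```python
from math import pi, sqrt
from typing import List


def extended_binomial_row(n: int, l: int) -> List[int]:
    """Coefficients of (1 + x + ... + x^l)^n, lowest degree first.

    Entry k counts the ways to write k as an ordered sum of n parts,
    each part taken from {0, ..., l} (an assumption of this sketch).
    """
    row = [1]
    for _ in range(n):
        # Multiply the current polynomial by (1 + x + ... + x^l).
        new = [0] * (len(row) + l)
        for i, coeff in enumerate(row):
            for j in range(l + 1):
                new[i + j] += coeff
        row = new
    return row


def central_approximation(n: int, l: int) -> float:
    """Local-limit estimate of the central coefficient of the row above."""
    variance = l * (l + 2) / 12  # variance of one uniform draw from {0, ..., l}
    return (l + 1) ** n / sqrt(2 * pi * n * variance)


if __name__ == "__main__":
    n, l = 200, 3
    row = extended_binomial_row(n, l)
    exact = row[(n * l) // 2]  # coefficient at the mean n*l/2
    print("exact / approximation:", exact / central_approximation(n, l))
```

For $l = 1$ the rows reduce to ordinary binomial coefficients, and the estimate becomes the familiar $2^n / \sqrt{\pi n / 2}$ for the central binomial coefficient. The row is symmetric around $nl/2$, which is the mean value the final abstract refers to in its own (slightly different) setting.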