Idiom-based features in sentiment analysis: Cutting the Gordian knot


In this paper we describe an automated approach to enriching sentiment analysis with idiom-based features. Specifically, we automated the development of the supporting lexico-semantic resources, which include (1) a set of rules used to identify idioms in text and (2) their sentiment polarity classifications. Our method demonstrates how idiom dictionaries, which are readily available general pedagogical resources, can be automatically adapted into purpose-specific computational resources. These resources were then used to replace their manually engineered counterparts in an existing system, which originally outperformed the baseline sentiment analysis approaches by 17 percentage points on average, taking the F-measure from the 40s into the 60s. The new fully automated approach outperformed the baselines by 8 percentage points on average, taking the F-measure from the 40s into the 50s. Although the latter improvement is smaller than the one achieved with the manually engineered features, it has the advantage of being more general, in the sense that it can readily utilize an arbitrary list of idioms without the knowledge acquisition overhead previously associated with this task, thereby fully automating the original approach.

IEEE Transactions on Affective Computing