One Restaurant Corpus

One Restaurant Corpus (ORCo) is a dataset of opinions about a single entity (opinion object) in the restaurant domain.

Opinion summarisation is the task of automatically producing a structured summary that extracts knowledge from a set of opinions about an entity. To yield a meaningful summary, it must therefore be applied to a set of opinions about a single entity. As far as we know, there are no annotated opinion sets clustered by entity, which means that an opinion summarisation workflow framed as ABSA (Aspect-Based Sentiment Analysis) cannot be fully evaluated.

We propose One Restaurant Corpus, a new set of opinions from TripAdvisor about a single entity in the restaurant domain. It contains 50 opinions divided into 277 sentences; 25 of the opinions are rated 1 star and 25 are rated 5 stars on TripAdvisor.

Each sentence in ORCo is annotated with at least one aspect category (Food, Desserts, Ambience, Staff, Location, General, Price, Drinks or None) and a sentiment polarity (-1, 0, 1) by three annotators. We measure inter-annotator agreement with Krippendorff’s alpha, which is 0.7311 for the aspect categories, and with the multi-κ coefficient, which is 0.9041 for the sentiment polarity.

The file consists of 5 columns. The Review_id column groups sentences by the opinion they belong to. The Phrase column contains the text of the sentence. The AspectCategory column contains the aspect categories of the sentence (multiple labels are separated by the character ‘/’). The Polarity column gives the sentiment polarity. The TripadvisorReviewStarRating column contains the TripAdvisor star rating given by the opinion holder.
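The column layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset distribution: it assumes the file is comma-separated with a header row that uses the column names listed above, and the sample rows are made up for illustration (they are not taken from ORCo). The function name load_reviews is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; they are NOT taken from ORCo.
# The comma delimiter and the header row are assumptions about the file layout.
SAMPLE = """Review_id,Phrase,AspectCategory,Polarity,TripadvisorReviewStarRating
1,The pasta was excellent and cheap.,Food/Price,1,5
1,We will come back.,General,1,5
2,The waiter ignored us.,Staff,-1,1
"""


def load_reviews(handle):
    """Group sentences by Review_id and split multi-label aspect categories."""
    reviews = defaultdict(list)
    for row in csv.DictReader(handle):
        reviews[row["Review_id"]].append({
            "phrase": row["Phrase"],
            # Multiple aspect labels are separated by '/'.
            "aspects": row["AspectCategory"].split("/"),
            "polarity": int(row["Polarity"]),
            "stars": int(row["TripadvisorReviewStarRating"]),
        })
    return dict(reviews)


reviews = load_reviews(io.StringIO(SAMPLE))
for review_id, sentences in reviews.items():
    print(review_id, [s["aspects"] for s in sentences])
```

To read the real file, pass an open file handle instead of the in-memory sample, adjusting the delimiter and encoding if the actual file differs from these assumptions.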

This dataset has been used to evaluate an opinion summarisation methodology that is currently under review.

The dataset can be downloaded from the repository.


July 2020


María Victoria Luzón García