This will be a short post, as I just want to point out an extremely interesting article by Michael Graber titled "Use Customer Reviews as Consumer Insights". I think he captures very effectively the human process behind a good analysis of reviews:
First, decide on what problem area or opportunity territory you want to explore. Next, locate your sources of review (...) then gather the data. (...) Start with the stellar, five-star reviews and mine them for such magic statements as “if it only did this…” (...) Now, the real needs and places you can learn about the category are embedded in the single or zero star reviews. Gather all the relevant ones. (...) sort them by themes (...) Tease the insights out of the themes and chart out the sub-categories inherent in each of the thematic families.
I took the liberty of quoting the most crucial passages and tagging some parts with colors. Indeed, while Michael seems to describe a completely manual process, here I try to distinguish what must remain manual (blue) from what can be automated by specialized software (red).
In particular, the phase that we could call "domain modelling" should always be human-driven: it consists mainly of source definition, problem statement, and definition of categories (although the decision on the categorization "tree" can be made much easier and more effective by certain language-processing algorithms, such as automatic product-feature extraction and semantic clustering).
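To give an idea of what "automatic product-feature extraction" means in practice, here is a deliberately toy sketch: it surfaces the most frequent content words across a review corpus as candidate features around which a human could build the categorization tree. The stopword list, the sample reviews, and the word-counting heuristic are all my own illustrative assumptions; real systems rely on part-of-speech tagging and semantic clustering rather than raw word counts.

```python
from collections import Counter

# Illustrative stopword list (assumption; real NLP pipelines use much
# larger, language-specific lists).
STOPWORDS = {"the", "a", "an", "is", "it", "and", "but", "to", "of",
             "this", "i", "was", "very", "my", "in", "for", "not", "after"}

def candidate_features(reviews, top_n=3):
    """Count content words, one occurrence per review, and return the
    most frequent ones as candidate product features."""
    counts = Counter()
    for review in reviews:
        words = {w.strip(".,!?").lower() for w in review.split()}
        counts.update(w for w in words if w and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

reviews = [
    "The battery died after two days.",
    "Battery life is great but the screen scratches easily.",
    "Screen cracked during delivery.",
]
print(candidate_features(reviews))  # "battery" and "screen" rank highest
```

Terms that recur across many reviews ("battery", "screen") are exactly the sub-categories a human analyst would otherwise have to discover by reading.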
Other tasks can be performed by specialized software: crawling and cleaning the data, as well as selecting relevant reviews on the basis of structured attributes such as "stars", is of course quite trivial for a computer. What is less trivial is the language analysis underlying the insight-mining phase. Language is a complex thing, and one might doubt a machine's capability to understand it. Still, in recent years the field of Natural Language Processing (NLP) has made huge progress, and it is now possible to produce reliable analyses. Of course this does not exclude the role of the human: but when you have tens of thousands of zero-star reviews, software that can automatically categorize them, for instance into "lack of robustness", "aesthetic rejection", "delivery problems", "unreasonable cost" and so on, can speed up the human work and increase its quality (human analysis sometimes introduces a bias).
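To make the input/output shape of this categorization task concrete, here is a minimal keyword-rule sketch, using theme labels similar to those named above. The keyword lists and naive substring matching are my own assumptions purely for illustration; a production system would use a trained NLP classifier rather than hand-written rules (substring matching alone would mislabel, say, "plate" as "late").

```python
# Hypothetical keyword rules mapping themes to trigger phrases
# (illustrative only, not a real classifier).
THEME_KEYWORDS = {
    "lack of robustness": ["broke", "cracked", "fragile", "stopped working"],
    "aesthetic rejection": ["ugly", "looks cheap", "horrible color"],
    "delivery problems": ["late", "never arrived", "shipping"],
    "unreasonable cost": ["overpriced", "expensive", "not worth"],
}

def categorize(review):
    """Return every theme whose keywords appear in the review text."""
    text = review.lower()
    themes = [theme for theme, keywords in THEME_KEYWORDS.items()
              if any(k in text for k in keywords)]
    return themes or ["uncategorized"]

print(categorize("Arrived late and the box was damaged"))
print(categorize("Way too expensive for what it does"))
```

Even this toy version shows the payoff: ten thousand reviews get a first-pass thematic sort in seconds, leaving the human analyst to refine the themes instead of reading every review.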
I would like to conclude with a caveat on language-analysis capabilities: automatic semantic analysis is feasible, as I said, but it is difficult. For instance, it is better not to trust approaches that claim to be "language independent": you can automatically mine gold (insights) from stars (reviews), but doing it independently of the language of the reviews is just science fiction.