Client-side NLP research
Natural Language Processing (NLP) is a subfield of artificial intelligence and computational linguistics that deals with every situation in which natural (human, as opposed to programming) language meets computer language. It has numerous areas of application: machine translation, interfaces, text analysis, text generation, and many others.
In this research, we focus on two of the simplest and most practical applications of NLP:
- information extraction – extracting specific data from text (names, dates, places, emails)
- sentiment analysis – determining the emotional tone of the text
Requirement: the solution must work on the client side, so that it can run inside many apps and process information “on the go.”
For example, a manager writes a reply to a client in a support ticket, and the script instantly analyzes it and gives an assessment: “This is too rude; why not take out all those f-words scattered here and there?”
Evaluation criteria:
1. Support for languages other than English, in particular Russian and German.
2. Extensibility: the option to add your own modules.
3. Performance on long texts.
4. Practical usefulness for finding names, dates, and places in text, and for sentiment analysis.
5. Developer-friendliness.
Knwl.js is not a monolithic library but a set of modules: each function that recognizes emails, dates, or phone numbers comes as a separate module, which can be developed by different people. This makes it quite good in terms of extensibility, but that is its only advantage. In theory, these modules could include vocabulary in other languages, including Russian, but the developer says this will happen “some day”, which is highly doubtful.
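The modular design described above can be illustrated with a minimal sketch, assuming each extractor is an independent object with a name and an `extract` function. The `Extractor` interface, the `ExtractorRegistry` class, and both regular expressions are hypothetical and do not reproduce the actual Knwl.js API; the email pattern is deliberately loose, and the date pattern only handles ISO-style dates such as 2017-03-14.

```typescript
// Hypothetical sketch of a modular extractor design, in the spirit of Knwl.js.
// This is NOT the Knwl.js API; names and patterns are illustrative assumptions.
interface Extractor {
  name: string;
  extract(text: string): string[];
}

// Loose email matcher: good enough for a demo, not RFC 5322 complete.
const emailExtractor: Extractor = {
  name: "emails",
  extract: (text) => text.match(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g) ?? [],
};

// Matches ISO-style dates (YYYY-MM-DD) only.
const isoDateExtractor: Extractor = {
  name: "dates",
  extract: (text) => text.match(/\b\d{4}-\d{2}-\d{2}\b/g) ?? [],
};

class ExtractorRegistry {
  private modules: Extractor[] = [];

  // Registering a module with an existing name replaces the old one,
  // so each module can be developed and swapped independently.
  register(mod: Extractor): void {
    this.modules = this.modules.filter((m) => m.name !== mod.name);
    this.modules.push(mod);
  }

  // Runs every registered module and groups the results by module name.
  run(text: string): Record<string, string[]> {
    const results: Record<string, string[]> = {};
    for (const mod of this.modules) {
      results[mod.name] = mod.extract(text);
    }
    return results;
  }
}

const registry = new ExtractorRegistry();
registry.register(emailExtractor);
registry.register(isoDateExtractor);

const found = registry.run("Write to anna@example.com by 2017-03-14.");
// found → { emails: ["anna@example.com"], dates: ["2017-03-14"] }
```

Adding support for another data type, or another language, then means registering one more module without touching the existing ones, which is the extensibility advantage noted above.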
The library promises client-side support, but the setup described in the README simply does not work. This makes the library less appealing and raises doubts about whether it is actively maintained.
In addition to performing its primary tasks of recognizing places, dates, and names with high quality, Compromise offers a toolkit for manipulating sentences and parts of speech. The library is well documented (https://nlp-expo.firebaseapp.com/docs) and has a user-friendly interface. It is the only library reviewed here that actually works on the client side.
It worked smoothly on short and medium-length emails, but it struggled with long ones and with emails containing specialized vocabulary: not all dates, places, and names were recognized.
Sentiment analysis in Node.js
During my research, I came across three JS libraries for detecting the tone of a message. They run on Node.js, but I thought they would be of interest anyway, at least as open-source alternatives to proprietary APIs. All three have a similar interface, a similar mode of operation (based on different versions of the AFINN word list), and even similar names.
ML-Sentiment (https://github.com/syzer/sentiment-analyser) is minimalistic: it produces a single assessment, while the other libraries report positivity, negativity, and neutrality. Its advantage is support for German.
The second one, Sentiment (https://github.com/winster/sentiment), is essentially an upgraded version of the third one, Sentimental (https://github.com/thinkroth/Sentimental), with better speed and the option to modify word scores directly.
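All three libraries work along the same general lines: each word found in an AFINN list adds its valence (an integer from -5 to +5) to the text's score. The sketch below illustrates that idea, plus an `overrides` parameter standing in for the direct modification of word scores mentioned for Sentiment. The lexicon entries, the `analyze` function, and the `comparative` field are assumptions for illustration, not the API or data of any of these libraries; real implementations may tokenize differently and apply extra rules.

```typescript
// Illustrative AFINN-style lexicon: integer valences from -5 to +5.
// These entries are examples only, not values copied from the real AFINN list.
const baseLexicon: Record<string, number> = {
  good: 3,
  great: 3,
  love: 3,
  bad: -3,
  terrible: -3,
  hate: -3,
};

interface SentimentResult {
  score: number;       // sum of the valences of all recognized words
  comparative: number; // score divided by the number of tokens
  positive: string[];  // recognized words with a positive valence
  negative: string[];  // recognized words with a negative valence
}

// Scores a text by summing word valences. `overrides` (hypothetical) adds or
// replaces individual word scores without modifying the base lexicon.
function analyze(text: string, overrides: Record<string, number> = {}): SentimentResult {
  const lexicon: Record<string, number> = { ...baseLexicon, ...overrides };

  // Lowercase, turn everything except letters and apostrophes into spaces, split.
  const tokens = text
    .toLowerCase()
    .replace(/[^a-z'\s]/g, " ")
    .split(/\s+/)
    .filter((t) => t.length > 0);

  let score = 0;
  const positive: string[] = [];
  const negative: string[] = [];
  for (const token of tokens) {
    // hasOwnProperty avoids matching inherited keys such as "constructor".
    if (!Object.prototype.hasOwnProperty.call(lexicon, token)) continue;
    const value = lexicon[token];
    score += value;
    if (value > 0) positive.push(token);
    else if (value < 0) negative.push(token);
  }

  return {
    score,
    comparative: tokens.length > 0 ? score / tokens.length : 0,
    positive,
    negative,
  };
}

const praise = analyze("Great phone, I love it");
// praise.score → 6 (5 tokens, positive: ["great", "love"])
const custom = analyze("The plot was predictable", { predictable: -2 });
// custom.score → -2, only because of the override
```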
To assess the quality of the libraries, I took several lists of positive and negative reviews, 1000 reviews each, and measured the number of positive and negative assessments each program made per hour. Below is one of the benchmark results:
Positive sentiment analysis:
Negative sentiment analysis:
Overall, winster/sentiment is faster and more accurate than the other libraries. Negative reviews, however, turned out to be harder for the modules to analyze correctly.
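The benchmark procedure described above can be sketched as a small harness, assuming a classifier is any function that maps a review to a label. The `benchmark` function, the `toyClassifier`, and the three sample reviews are hypothetical placeholders standing in for the real libraries and the 1000-review lists; the harness counts how many reviews in a list receive each label and reports the share that matches the expected label, along with the elapsed time.

```typescript
// Hypothetical benchmark harness; classifier and data are placeholders,
// not the author's actual setup or review lists.
type Label = "positive" | "negative" | "neutral";
type Classifier = (text: string) => Label;

interface BenchmarkResult {
  positive: number;
  negative: number;
  neutral: number;
  accuracy: number; // share of reviews that received the expected label
  millis: number;   // wall-clock time for the whole run
}

function benchmark(classify: Classifier, reviews: string[], expected: Label): BenchmarkResult {
  const counts = { positive: 0, negative: 0, neutral: 0 };
  const start = Date.now();
  for (const review of reviews) {
    counts[classify(review)] += 1;
  }
  const millis = Date.now() - start;
  return {
    ...counts,
    accuracy: reviews.length > 0 ? counts[expected] / reviews.length : 0,
    millis,
  };
}

// Toy keyword classifier, for demonstration only.
const toyClassifier: Classifier = (text) =>
  /\b(bad|awful|terrible)\b/i.test(text)
    ? "negative"
    : /\b(good|great|love)\b/i.test(text)
      ? "positive"
      : "neutral";

const negativeReviews = ["Awful service", "The food was bad", "Nothing special"];
const result = benchmark(toyClassifier, negativeReviews, "negative");
// result → negative: 2, neutral: 1, positive: 0, accuracy ≈ 0.667
```

Running it once on the positive list with the expected label `"positive"` and once on the negative list with `"negative"` yields the two sets of counts compared above; dividing the list length by the elapsed time gives throughput.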
Sentiment analysis through external services
Officially, numerous sentiment analysis services are registered on Mashape, but we managed to find only three that are not broken and can be tested for free, and only one of them allowed benchmarking.
For the other two, I gave a subjective assessment of accuracy based on 5-6 reviews, mostly negative ones. I did not run a proper benchmark because I was afraid of exceeding the limits (a card number was required even for the free plans). Generally speaking, these APIs also struggle with complicated reviews, even though they are built on more complex models.
A brief overview of each:
Skyttle (https://market.mashape.com/sentinelprojects/skyttle2-0) is appealing because it can analyze German, Russian, and French texts, but this feature is not available in the free plan, so I could not try it. Besides, its assessments in English seem less accurate than those of the other two APIs.
Webknox (https://market.mashape.com/webknox/text-processing-1#) can analyze texts in German, and its result takes a slightly different form: instead of the traditional positive/negative/neutral breakdown, it returns just one of these labels together with an estimated probability that the label is correct.
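To compare Webknox with services that return a full positive/negative/neutral breakdown, both result forms can be brought into a common shape. In this sketch the interfaces and the `toLabelWithConfidence` function are hypothetical, the field names are assumptions rather than the services' actual response formats, and the normalized winning score is only a rough stand-in for a calibrated probability.

```typescript
// Hypothetical result shapes; field names are assumptions, not the actual
// response formats of Webknox or any other service.
interface ThreeWayScores {
  positive: number;
  negative: number;
  neutral: number;
}

interface LabelWithConfidence {
  label: keyof ThreeWayScores;
  confidence: number; // estimated probability that the label is correct
}

// Collapses a three-way breakdown into a single label, using the winning
// score's share of the total as a rough confidence value.
function toLabelWithConfidence(scores: ThreeWayScores): LabelWithConfidence {
  const labels: (keyof ThreeWayScores)[] = ["positive", "negative", "neutral"];
  const total = scores.positive + scores.negative + scores.neutral;
  let best = labels[0];
  for (const label of labels) {
    if (scores[label] > scores[best]) best = label;
  }
  return { label: best, confidence: total > 0 ? scores[best] / total : 0 };
}

const converted = toLabelWithConfidence({ positive: 2, negative: 7, neutral: 1 });
// converted → { label: "negative", confidence: 0.7 }
```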
Japerk (https://market.mashape.com/japerk/text-processing) supports a large number of languages on its other endpoints, but its sentiment analysis works only with English and Dutch. Unlike the others, it allows a large number of free requests, 45,000 per month, so I was able to estimate its accuracy more precisely, using the same reviews I had tested with the Node.js modules.
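Assuming one request per review and the same two lists of 1000 reviews used for the Node.js modules, a quick calculation shows why the free quota is enough for a full benchmark:

```latex
\[
  2 \times 1000 = 2000 \ \text{requests per benchmark run}, \qquad
  \left\lfloor \frac{45\,000}{2000} \right\rfloor = 22 \ \text{full runs per month}
\]
```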
Positive sentiment analysis:
Negative sentiment analysis:
So this service is more accurate on negative reviews, but on positive reviews it does worse than the simpler Node.js modules.
For extracting names, dates, and places, the Compromise library would be a good option. It can do this on the client side, which is a big advantage in many cases.
Among the free sentiment analysis libraries, winster/sentiment performs better than the others, but it is better to use an external API: the most easily accessible one is Japerk, while Skyttle is the most versatile in terms of the number of supported languages.
UPDATE: While I was preparing the demo, some issues with Compromise came up that lower my rating of this library quite a bit. The library can recognize parts of speech, but for tasks such as recognizing places and names, it relies on its limited internal database. Compromise recognizes all countries, but only the biggest cities, and mostly American names. As a result, it may have difficulty with English texts that deal with, say, Eastern European realia.
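The limitation described above is typical of lookup-based (gazetteer) recognition: anything missing from the built-in list is simply not found. The sketch below illustrates this with a hypothetical `findEntities` function and a tiny hypothetical list; it is not Compromise's implementation or API, which, as noted, relies on its own internal database.

```typescript
// Hypothetical gazetteer-based recognizer: it can only find entities that are
// already in its list. Not Compromise's implementation or API.
type EntityType = "place" | "name";
type Gazetteer = Record<string, EntityType>;

interface Entity {
  entity: string;
  type: EntityType;
}

// Tiny illustrative built-in list: big cities and common American first names.
const builtIn: Gazetteer = {
  "new york": "place",
  london: "place",
  john: "name",
  mary: "name",
};

function findEntities(text: string, gazetteer: Gazetteer): Entity[] {
  const lower = text.toLowerCase();
  const found: Entity[] = [];
  for (const entry of Object.keys(gazetteer)) {
    // Escape regex metacharacters, then require whole-word matches so that
    // "mary" is not found inside "summary".
    const escaped = entry.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    if (new RegExp("\\b" + escaped + "\\b").test(lower)) {
      found.push({ entity: entry, type: gazetteer[entry] });
    }
  }
  return found;
}

const sentence = "Mary flew from London to Lviv to meet Oleksandr.";
const withBuiltIn = findEntities(sentence, builtIn);
// withBuiltIn → london (place), mary (name); Lviv and Oleksandr are missed
const withExtended = findEntities(sentence, { ...builtIn, lviv: "place", oleksandr: "name" });
// withExtended → all four entities
```

With the built-in list, the sample sentence yields only London and Mary; the Eastern European city and name are found only after they are added to the list.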