Natural language processing (NLP) is the science of getting computers to talk or interact with humans in human language. Examples of natural language processing include speech recognition, spell checking, autocomplete, chatbots, and search engines.
Natural language processing has been around for years but is often taken for granted. Here are eight examples of natural language processing applications that you may not be familiar with. If you have a large amount of text data, feel free to hire an NLP consultant such as Fast Data Science.
8 examples of natural language processing in business
Using NLP to get information from documents
When companies have large amounts of text documents (imagine a law firm's many cases, or regulatory documents in a pharmaceutical company), it can be difficult to get information out of them.
For example, a pharmaceutical executive might want to know, out of the thousands of clinical trials the company has run, how many led to a certain side effect, when that information is stored in a pile of documents and no one has time to read them all.
Natural language processing provides us with a series of tools to automate this type of task.
Traditional Business Intelligence (BI) tools, such as Power BI and Tableau, enable analysts to pull insights from structured databases so they can quickly see, for example, which team generated the most revenue in a given quarter. But a lot of the data circulating in organizations is in an unstructured format such as PDF documents, and that is where Power BI cannot easily help.
An expert in natural language processing can detect patterns in unstructured data. For example, topic modeling (clustering) can be used to find key topics in a set of documents, named entity recognition can identify product names, personal names or key positions, and document classification can be used to automatically sort documents into categories.
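To make document classification a little more concrete, here is a minimal sketch of a naive Bayes text classifier written with the Python standard library only. The category names and the four training sentences are invented for illustration, and a real project would normally use an established machine learning library and far more training data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_docs):
    """Count word occurrences per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in labelled_docs:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the category with the highest naive Bayes log-probability."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category, counts in word_counts.items():
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(counts.values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Invented toy training data, for illustration only.
TRAINING = [
    ("The patient reported nausea during the clinical trial", "clinical"),
    ("Adverse events were recorded for the trial drug dose", "clinical"),
    ("The claimant alleges negligence by the defendant", "legal"),
    ("The court ruled on the contract dispute appeal", "legal"),
]

word_counts, doc_counts = train(TRAINING)
label = classify("The defendant filed an appeal with the court", word_counts, doc_counts)
print(label)  # legal
```

With only a handful of training sentences the classifier is of course brittle; the point is just that sorting documents into categories comes down to comparing word statistics per category.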
I often work with an open source library such as Apache Tika, which can convert PDF documents to plain text, and then train natural language processing models on the plain text. But even after converting PDF to text, the text is often messy, with page numbers and headers mixed into the body of the document and formatting information lost.
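The cleanup step after PDF conversion can be sketched as follows. This is a rough heuristic under stated assumptions, not a description of what any particular conversion library does: it assumes the extracted text is already available as one string per page, drops lines that are bare page numbers, and drops lines that repeat across pages (which are probably running headers or footers). The function name and the sample pages are hypothetical.

```python
import re
from collections import Counter


def clean_extracted_pages(pages, min_repeats=2):
    """Drop bare page numbers and lines repeated across pages (likely headers or footers)."""
    stripped_pages = [[line.strip() for line in page.splitlines()] for page in pages]

    # Count on how many pages each non-empty line appears.
    page_frequency = Counter()
    for lines in stripped_pages:
        page_frequency.update({line for line in lines if line})

    cleaned = []
    for lines in stripped_pages:
        for line in lines:
            if not line:
                continue
            if re.fullmatch(r"(page\s+)?\d+(\s+of\s+\d+)?", line, flags=re.IGNORECASE):
                continue  # bare page number such as "3" or "Page 3 of 10"
            if page_frequency[line] >= min_repeats:
                continue  # same line on several pages: probably a running header
            cleaned.append(line)
    return "\n".join(cleaned)


# Invented sample pages, for illustration only.
pages = [
    "ACME Pharma Confidential\nThe trial enrolled 120 patients.\nPage 1 of 2",
    "ACME Pharma Confidential\nTwo patients reported headaches.\nPage 2 of 2",
]
print(clean_extracted_pages(pages))
```

A heuristic like this will occasionally delete a genuine sentence that happens to repeat, so it is worth spot-checking the output on a sample of documents.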
NLP for spell checking in forms
Spelling and grammar checkers are now widely used and help us fill in web forms correctly and avoid typos. When typing on a cell phone screen, I find that the spell checker is probably correcting most of my words.
You might think that writing a spell checker is as easy as compiling a list of all the valid words in a language, but the problem is much more complex. How can such a system distinguish between their, there and they're? Today, more sophisticated spell checkers use neural networks to verify that the correct homophone is being used. Moreover, for languages with more complicated morphology than English, spell checking can become very computationally intensive.
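The naive word-list approach, and its blind spot, can be seen in a few lines. The sketch below uses `difflib.get_close_matches` from the Python standard library; the six-word vocabulary and the `suggest` helper are assumptions made for illustration.

```python
import difflib

# A tiny illustrative vocabulary; a real checker would load a full dictionary.
VOCABULARY = ["their", "there", "they're", "receive", "language", "processing"]


def suggest(word, vocabulary=VOCABULARY):
    """Return the word unchanged if known, else the closest vocabulary entry (or None)."""
    lowered = word.lower()
    if lowered in vocabulary:
        return lowered
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=0.7)
    return matches[0] if matches else None


print(suggest("langauge"))  # language
print(suggest("there"))     # there
```

Note that `suggest("there")` accepts the word even in a sentence like "there house is big", because a word list has no notion of context. That is the gap that context-aware models are meant to fill.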
Example of natural language processing for information retrieval and question answering
There's been a lot of talk lately about transformer models, which are the latest generation of neural networks. Transformers can represent natural language grammar in extremely deep and sophisticated ways, and have improved the performance of document classification, text generation, and question answering systems. The best known of these models are BERT, GPT-2 and GPT-3.
The easiest way to get started with BERT is to install the Transformers library from Hugging Face. In my experiment, I asked BERT for the facts of Donoghue v Stevenson ("the snail in the bottle"), a landmark decision in English tort law that laid the foundation for the modern doctrine of negligence. BERT was able to retrieve the facts easily (On August 26, 1928, the claimant drank a bottle of ginger beer manufactured by the defendant...). While impressive, BERT's sophistication is currently limited to finding the relevant passage of text.
NLP example of spelling conversion between US and UK English
One problem I often run into is running natural language processing algorithms on corpora of documents or lists of survey responses that are a mix of American and British spelling, or full of common misspellings. One of the irritating consequences of the lack of spelling standardization is that words like normalise/normalize are not picked up as high-frequency words when their counts are split between variants. Because of this, we often have to use spelling and grammar normalization tools.
After this problem appeared in many of my projects, I wrote my own Python package called localspelling, which allows a user to convert all text in a document to British or American spelling, or to identify which variant is used in the document.
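The idea behind this kind of normalization can be sketched with a simple lookup table. To be clear, the code below is not the localspelling package or its API; it is a hypothetical, self-contained illustration with a four-word mapping, and the function names are made up for the example.

```python
import re

# Small hand-picked mapping for illustration; real coverage needs a much larger list.
UK_TO_US = {
    "normalise": "normalize",
    "colour": "color",
    "centre": "center",
    "analyse": "analyze",
}
US_TO_UK = {us: uk for uk, us in UK_TO_US.items()}


def convert_spelling(text, mapping):
    """Replace each whole word found in `mapping`, keeping an initial capital letter."""
    def replace(match):
        word = match.group(0)
        target = mapping.get(word.lower())
        if target is None:
            return word
        return target.capitalize() if word[0].isupper() else target

    return re.sub(r"[A-Za-z]+", replace, text)


def detect_variant(text):
    """Guess 'UK', 'US' or 'unknown' by counting variant-specific spellings."""
    words = re.findall(r"[a-z]+", text.lower())
    uk_hits = sum(word in UK_TO_US for word in words)
    us_hits = sum(word in US_TO_UK for word in words)
    if uk_hits == us_hits:
        return "unknown"
    return "UK" if uk_hits > us_hits else "US"


print(convert_spelling("We normalise the colour values.", UK_TO_US))  # We normalize the color values.
print(detect_variant("The centre will analyse the data."))            # UK
```

Once every document uses one variant, the counts for normalise and normalize are merged and the word can surface as a high-frequency term.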
While spelling normalization may seem unimportant, the BBC reported in 2022 that spelling errors cost the UK millions of pounds in lost revenue, and that a single spelling mistake on a website can cut conversion rates in half. Unbelievable!
Example of NLP for language identification
Given text in an unknown language, it is surprisingly easy for natural language processing to identify the language. There are two main approaches to language identification:
Language identification via stop word lists
An NLP system can look for stop words (small function words like the, at, in) in a text and compare them with lists of known stop words for several languages. The language with the most stop words in the unknown text is identified as the language. So a document with many occurrences of le and la is likely to be French, for example.
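The stop word approach is simple enough to sketch directly. The three stop word lists below are short hand-picked samples chosen for illustration, not complete lists, and the function name is hypothetical.

```python
import re

# Short illustrative stop word lists; real systems use much longer ones.
STOP_WORDS = {
    "English": {"the", "and", "of", "in", "is", "to"},
    "French": {"le", "la", "et", "les", "des", "est"},
    "German": {"der", "die", "und", "das", "ist", "nicht"},
}


def identify_language(text):
    """Return the language whose stop words occur most often in `text`, or None."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        language: sum(token in stop_words for token in tokens)
        for language, stop_words in STOP_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(identify_language("Le chat et la souris sont dans la maison"))  # French
```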
Language identification via n-gram lists
A slightly more sophisticated language identification technique is to compile, for each language, a list of n-grams, which are character sequences that have a characteristic frequency in that language. For example, the combination ch is common in English, Dutch, Spanish, German, French and other languages, but the combination sch is common only in German and Dutch, and eau is common in French as a three-letter sequence. Although East Asian scripts may look similar to the untrained eye, the most common character in Japanese is の and the most common character in Chinese is 的, both of which correspond to the English 's suffix.
By counting the sequences of one, two and three letters in a text (unigrams, bigrams and trigrams), a language can be identified from a short sequence of just a few sentences.
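Here is a deliberately simplified sketch of the n-gram idea. Instead of comparing full frequency tables, it only measures what share of the unigrams, bigrams and trigrams in the unknown text have been seen in each language's sample (a coverage score). The one-sentence samples are invented and far too small for real use; production systems build their profiles from large corpora.

```python
from collections import Counter


def char_ngrams(text, max_n=3):
    """Yield every character sequence of length 1..max_n in the lowercased text."""
    text = text.lower()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def build_profiles(samples):
    """Map each language to the set of n-grams seen in its sample text."""
    return {language: set(char_ngrams(text)) for language, text in samples.items()}


def identify_language(text, profiles):
    """Return the language whose profile covers the largest share of the text's n-grams."""
    counts = Counter(char_ngrams(text))
    total = sum(counts.values())
    if total == 0:
        return None

    def coverage(profile):
        return sum(count for gram, count in counts.items() if gram in profile) / total

    return max(profiles, key=lambda language: coverage(profiles[language]))


# Invented one-sentence samples, for illustration only.
SAMPLES = {
    "English": "the quick brown fox jumps over the lazy dog and the cat sleeps "
               "in the house while the children play in the garden",
    "German": "der schnelle braune fuchs springt über den faulen hund und die katze "
              "schläft im haus während die kinder im garten spielen",
    "French": "le renard brun rapide saute par dessus le chien paresseux et le chat dort "
              "dans la maison pendant que les enfants jouent dans le jardin",
}

profiles = build_profiles(SAMPLES)
print(identify_language("the children sleep in the garden", profiles))
```

A fuller implementation would weight each n-gram by how frequent it is in each language rather than just checking whether it has been seen.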
Example of natural language processing for author identification
As an extension of the previous problem, sometimes a text by an unknown author appears and we want to know who wrote it.
Examples are novels written under a pseudonym, such as JK Rowling's crime series, written under the pseudonym Robert Galbraith, or the work of the pseudonymous Italian author Elena Ferrante. In politics, we have the anonymous New York Times op-ed I Am Part of the Resistance Inside the Trump Administration, which triggered a witch hunt for its author, and the open question of who wrote Dominic Cummings' Rose Garden statement.
The excellent linguistics YouTuber Joshua R carried out a qualitative analysis of a message in French written in 2015 by one of the Bataclan terrorists, in which he identified key demographic information about the author (educational level, cultural background, etc.).
The science of determining the authorship of unknown texts is called forensic stylometry. Every author has a characteristic fingerprint in their writing style, even in word-processed documents where there is no handwriting to examine.
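One common way to approximate such a fingerprint is to compare how often authors use small function words. The sketch below is an assumption-laden toy, not the method behind any particular stylometry system: it builds a function-word count vector for each text and picks the known author whose vector has the highest cosine similarity to the unknown text. The word list, author labels and sample texts are all invented.

```python
import math
import re
from collections import Counter

# A short illustrative list; stylometry studies typically use hundreds of function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "was", "i", "but"]


def style_vector(text):
    """Count how often each function word occurs in the text."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return [counts[word] for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_likely_author(unknown_text, known_samples):
    """Return the author whose sample is stylistically closest to the unknown text."""
    target = style_vector(unknown_text)
    return max(
        known_samples,
        key=lambda author: cosine(target, style_vector(known_samples[author])),
    )


# Invented writing samples, for illustration only.
KNOWN = {
    "Author A": "the history of the town is the story of the river and of the mill "
                "that was built in the valley",
    "Author B": "i went to a shop and i saw a hat but it was a bad hat and i left it",
}
UNKNOWN = "the name of the mill is the name of the family that built it in the year of the flood"

print(most_likely_author(UNKNOWN, KNOWN))  # Author A
```

Real attribution work needs much longer samples per author and careful validation before drawing any conclusion about who wrote a text.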
You can read more about forensic stylometry in my previous blog post on the topic, and you can also try a live demo of an author identification system on the website.
While forensic stylometry can be considered a qualitative discipline and is used by scholars in the humanities for problems such as unknown texts in Latin or Greek, it is also an interesting example of the application of natural language processing.
NLP machine translation example
Gone are the days when machine translation systems were known for turning texts like "the spirit is willing but the flesh is weak" into "the vodka is good but the meat is rotten". (Although The Economist reliably informs me that this story is apocryphal.)
Today, Google Translate covers an incredible range of languages and handles most of them using statistical models trained on huge corpora of text, which may not even be available for the language pair in question. Transformer models have enabled tech giants to develop translation systems trained solely on monolingual text.
In 2022, Meta, the conglomerate that owns Facebook, announced the creation of a single AI model capable of translating across 200 different languages, democratizing access to natural language processing for lesser spoken languages like Twi (Ghana) that were previously not supported by NLP tools.
The monolingual approach is also much more scalable, as Facebook's models can translate from Thai to Lao or Nepali to Assamese just as easily as between those languages and English. As the number of supported languages grows, the number of language pairs would become unmanageable if each language pair had to be developed and maintained separately. Previous iterations of machine translation models tended to underperform when not translating to or from English.
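The scaling problem is easy to quantify: with n languages there are n × (n − 1) ordered source and target pairs. The short sketch below compares that with the number of directions needed if every language were only paired with English; both function names are made up for this illustration, and the second assumes English is one of the n languages.

```python
def directed_language_pairs(num_languages):
    """Number of ordered (source, target) pairs among `num_languages` languages."""
    return num_languages * (num_languages - 1)


def pairs_via_english(num_languages):
    """Ordered pairs if every non-English language only translates to and from English."""
    return 2 * (num_languages - 1)


for n in (10, 100, 200):
    print(n, directed_language_pairs(n), pairs_via_english(n))
# 10 90 18
# 100 9900 198
# 200 39800 398
```

At 200 languages that is 39,800 translation directions, which is why building and maintaining a separate system per pair does not scale.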
However, much remains to be done to improve coverage of the world's languages. Facebook estimates that over 20% of the world's population is still not covered by commercial translation technology. In general, coverage for the world's major languages is very good, with some outliers (notably Yue and Wu Chinese, sometimes known as Cantonese and Shanghainese).
Many of the unsupported languages are languages with many speakers but unofficial status, such as the many spoken varieties of Arabic.
Interestingly, the Bible has been translated into over 6,000 languages and is often the first book to be published in a new language.
NLP example for sentiment analysis
Sentiment analysis is an example of how natural language processing can be used to identify the subjective content of a text. This is very useful for companies that want to monitor social media traffic related to their brands, competitor brands or key topics, and that also want to monitor the mood of the dialogue between users and chatbots or customer service agents. Sentiment analysis has also been used in finance to identify emerging trends that may indicate profitable trades.
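At its simplest, sentiment analysis can be done by counting positive and negative words. The sketch below is a toy lexicon-based scorer with very basic negation handling; the word lists are invented for illustration, and modern systems generally rely on trained models instead.

```python
import re

# Tiny illustrative lexicons; real sentiment lexicons contain thousands of entries.
POSITIVE = {"good", "great", "love", "excellent", "helpful", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "slow", "unhappy"}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum +1 for positive and -1 for negative words, flipping the sign after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity  # "not good" counts as negative
        score += polarity
    return score


def sentiment_label(text):
    """Map the numeric score to 'positive', 'negative' or 'neutral'."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("The support team was great and very helpful"))   # positive
print(sentiment_label("The app is not good and the checkout is slow"))  # negative
```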
For more examples of how this area of natural language processing can be applied to your business, see my blog post about trends in sentiment analysis, which includes an interactive demo of a sentiment analysis tool and shows how sentiment analysis technology has evolved from the 1970s to the present day.
What are other examples of NLP in business?
Natural language processing can quickly transform an organization. Companies in industries such as pharmaceuticals, legal, insurance and scientific research can use the vast amounts of data they have stored in silos to outperform the competition.
Natural language processing can be used to improve the customer experience in the form of chatbots and systems for triaging incoming sales inquiries and customer service requests.
For more examples of how natural language processing can be used for efficiency and profitability in your business, please contact Fast Data Science.
SIL International, Ethnologue: Languages of the World (2022, 25th edition)
The Economist, A talent for languages (2009)