What Is Natural Language Processing?

Data analytics with human language data

Natural language processing (NLP) is the broad class of computational techniques for incorporating speech and text data, along with other types of engineering data, into the development of smart systems.

Raw human language data can come from a variety of sources, including audio signals, web and social media, documents, and databases. These sources contain valuable information such as voice commands, public sentiment on topics, operational data, and maintenance reports. Natural language processing can be used to combine and simplify these large sources of data, transforming them into meaningful insight with visualizations, topic models, and machine learning classifiers. For example, using MATLAB® you can detect the presence of human speech in an audio segment, perform speech-to-text transcription, and then apply text mining and machine learning to the transcribed text.

Natural language processing is used in finance, manufacturing, electronics, software, information technology, and other industries for applications such as:

  • Automating the classification of reviews based on sentiment, whether positive or negative
  • Counting the frequency of words or phrases in documents and performing topic modeling
  • Developing predictive equipment maintenance schedules based on sensor and text log data
  • Automating labeling and tagging of speech recordings
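
As a rough illustration of the word and phrase counting mentioned in the list above, the following minimal sketch uses only the Python standard library. It is not the Text Analytics Toolbox workflow. The helper names (`tokenize`, `ngram_counts`) and the sample maintenance reports are hypothetical and exist only for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(documents, n=1):
    """Count every run of n consecutive tokens across a list of documents.

    Note: this simple version ignores punctuation, so an n-gram can span
    a sentence boundary within a document.
    """
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical maintenance reports, invented for illustration only.
reports = [
    "Pump bearing noise detected. Replaced pump bearing.",
    "Motor overheating; pump bearing inspected.",
]

word_counts = ngram_counts(reports, n=1)    # single-word frequencies
bigram_counts = ngram_counts(reports, n=2)  # two-word phrase frequencies

print(word_counts["pump"])           # occurrences of the word "pump"
print(bigram_counts.most_common(1))  # most frequent two-word phrase
```

Counts like these are typically the input to later steps such as topic modeling or classifier training. A production pipeline would usually also remove stop words and normalize word forms, which this sketch omits.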

To learn more about deriving understanding from speech and text data using natural language processing, see Text Analytics Toolbox™, Audio Toolbox™, and Statistics and Machine Learning Toolbox™.

See also: data science, machine learning, deep learning, sentiment analysis, text mining, long short-term memory (LSTM) networks, n-grams, recurrent neural networks