What to remove from the source data, and what to keep, depends on the problem statement. For example, if you are working with economics or business texts, characters such as $ and other currency symbols may carry information that you do not want to lose. In most other cases, such symbols are simply deleted.
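As an illustration of this choice, here is a minimal sketch of a cleaning step that strips punctuation and symbols but can optionally preserve currency signs. The function name `clean_text` and the `keep_currency` flag are hypothetical, introduced only for this example; it relies on Unicode character categories from the Python standard library rather than on any particular NLP toolkit.

```python
import re
import unicodedata


def clean_text(text: str, keep_currency: bool = True) -> str:
    """Replace punctuation and symbols with spaces, optionally keeping currency signs.

    Hypothetical helper for illustration; real pipelines usually need
    task-specific rules (e.g. for decimal points or hyphens).
    """
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Sc":
            # "Sc" is the Unicode category for currency symbols ($, €, £, ...)
            kept.append(ch if keep_currency else " ")
        elif category[0] in ("P", "S"):
            # Any other punctuation (P*) or symbol (S*) becomes a space
            kept.append(" ")
        else:
            kept.append(ch)
    # Collapse the runs of whitespace left behind by the removed characters
    return re.sub(r"\s+", " ", "".join(kept)).strip()


sample = "Revenue rose to $5 million (up 10%)!"
print(clean_text(sample, keep_currency=True))   # Revenue rose to $5 million up 10
print(clean_text(sample, keep_currency=False))  # Revenue rose to 5 million up 10
```

Note that this sketch also drops the percent sign and would split decimal numbers such as 5.5, so for financial text you would probably want to extend the set of preserved characters.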


NLP and deep learning

Speech recognition and synthesis

Speech recognition is the process of converting a speech signal into digital information such as text. Speech synthesis works in the opposite direction, generating a speech signal from written text.

Speech synthesis and recognition are used in a wide variety of fields, such as voice assistants, IVR systems and smart homes.

Highlighting entities and facts

Another popular NLP task is extracting named entities from text, known as named-entity recognition (NER). Imagine that you have a long, unbroken text about the purchase and sale of assets, and you need to highlight the persons, dates and assets it mentions.

Without NER, it is difficult to imagine solving many NLP problems, for example resolving pronominal anaphora or building question-answering systems. If you ask a search engine "Who played the role of Batman in the movie 'The Dark Knight'?", the answer is found precisely by highlighting named entities: we select the entities (film, role, etc.), understand what is being asked, and then look for the answer in the database (read more here: doctranslator).

The NER problem statement is very flexible. You can mark any contiguous fragments of text that differ in some way from the rest of the text. As a result, you can define your own set of entities for a specific practical task, annotate the texts with this set, and train a model. This scenario is very common and makes NER one of the most frequently solved NLP problems in industry.
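One common way to represent such contiguous fragments is BIO tagging, where each token is labeled as the Beginning of an entity, Inside an entity, or Outside any entity. The text above does not prescribe a scheme, so treat the following as a sketch under that assumption. The function name `bio_to_spans` and the labels PERSON, ASSET and DATE are hypothetical, chosen to match the asset-sale example; the tags here are written by hand, whereas in practice they would come from a trained model.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) entity spans.

    Hypothetical helper for illustration. An "I-" tag that does not
    continue an open entity of the same label is treated like "O".
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None

    def close_current() -> None:
        # Emit the entity collected so far, if any
        if current_tokens:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; finish the previous one first
            close_current()
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continue the entity that is currently open
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends any open entity
            close_current()
            current_label = None
            current_tokens = []
    close_current()
    return spans


tokens = ["Alice", "Smith", "sold", "100", "shares", "on", "5", "May", "2021"]
tags = ["B-PERSON", "I-PERSON", "O", "B-ASSET", "I-ASSET", "O",
        "B-DATE", "I-DATE", "I-DATE"]
print(bio_to_spans(tokens, tags))
# [('PERSON', 'Alice Smith'), ('ASSET', '100 shares'), ('DATE', '5 May 2021')]
```

Because the label set is just a list of strings, switching to a different practical task only requires changing the annotation labels, not the decoding logic.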