Text Mining and Natural Language Processing with Python

  • Level: intermediate
  • Duration: 2-day course
  • Delivered: in-house

What you will learn

You will learn the fundamental skills you need to extract syntactic, semantic and even emotional information from text.

  • Text processing (parsing, tokenisation, lemmatisation)
  • Syntactic analysis (POS tagging)
  • Semantic analysis (word vector analysis, IR techniques)
  • Topic analysis
  • Language models and text generation

Languages and libraries

  • Python programming language
  • numpy and pandas for data manipulation
  • scikit-learn for machine learning algorithms
  • plotly for interactive visualisations


Day One

Essential techniques for text processing and information extraction

Session 1

Text processing

  • Text tokenisation
  • Lemmatisation, parsing

Session 2

Semantic analysis and information extraction

  • Overview of information extraction
  • Vector representations of words
  • Evaluating semantic similarity
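
As an illustration of evaluating semantic similarity, the sketch below compares word vectors by cosine similarity, one common measure. The three-dimensional vectors are made up for the example; in practice they would come from a trained model with far more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector has zero length.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical word vectors, invented for illustration only.
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "car": [0.0, 0.1, 0.9],
}

sim_cat_dog = cosine_similarity(vectors["cat"], vectors["dog"])
sim_cat_car = cosine_similarity(vectors["cat"], vectors["car"])
print(sim_cat_dog > sim_cat_car)
```

Cosine similarity ignores vector length and compares direction only, which is why it is often preferred over Euclidean distance for word vectors.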

Day Two

NLP applications and machine learning

Session 1

Text classification and ranking

  • Naive Bayes for spam filtering
  • Sentiment analysis
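
The idea behind Naive Bayes spam filtering can be sketched in plain Python as follows. This is a simplified multinomial model with add-one smoothing, trained on four invented messages; the `train` and `predict` helpers are hypothetical names, and a library implementation such as scikit-learn's would normally be used instead.

```python
import math
from collections import Counter

def train(messages):
    """Count words per label from a list of (label, text) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of messages
    vocab = set()
    for label, text in messages:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids log(0)
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration only.
training = [
    ("spam", "win cash prize now"),
    ("spam", "free prize claim now"),
    ("ham", "meeting agenda for monday"),
    ("ham", "lunch on monday"),
]
model = train(training)
print(predict(model, "claim your free prize"))
```

Working in log space keeps the product of many small probabilities from underflowing, and the "naive" assumption is that words are independent given the label.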

Session 2

Topic segmentation

  • Clustering
  • Multi-class classification

Session 3

Language models

  • Text prediction
  • Text generation


Prerequisites: Good knowledge of Python, some familiarity with matrices, and a basic understanding of machine learning practice (as taught in Introduction to Data Science).


This course is for those who wish to take their data science skills further and learn state-of-the-art techniques in this constantly evolving field.