Machine Learning for Natural Language Processing

Andreas Vlachos
Department of Computer Science
University of Sheffield

Session goals

  • Understand some basic concepts in natural language processing
  • Gain intuition behind machine learning approaches
  • Explore how they can be applied to language assessment

What is Natural Language Processing (NLP)?

Why machine learning (ML) for NLP?

Learning from data adapts:

  • to evolution (language changes over time): just learn from new data
  • to different applications: just learn with the appropriate target representation

Compared to rule-based approaches, ML-based ones:

  • offer wider coverage
  • can capture more complex patterns:
    • weighted features
    • continuous representations (as learned by neural networks)


Short answer: Not really

  • Useful ML-based NLP captures linguistic intuition
  • The target representations come from linguistics

Words of caution

When exploring a task, it is often useful to experiment with some simple rules to test our assumptions

In fact, for some tasks rule-based approaches rule, especially in industry:

  • coreference resolution
  • natural language generation

If we don't know how to perform a task ourselves, it is unlikely that an ML algorithm will figure it out for us

Session structure

Part 1: Text classification

  • how to represent text as vectors
  • learning a classifier with the perceptron
  • more advanced classification methods
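
As a preview of Part 1, the first two items can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not necessarily the formulation used later in the session: the function names (featurize, score, predict, train_perceptron) are hypothetical, the text representation is a plain bag of word counts, and the training sentences are made up for the example.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words: represent a text as a sparse vector of word counts."""
    return Counter(text.lower().split())


def score(weights, features):
    """Dot product between the weight vector and a sparse feature vector."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())


def predict(weights, features):
    """Binary decision: +1 if the score is positive, otherwise -1."""
    return 1 if score(weights, features) > 0 else -1


def train_perceptron(data, epochs=10):
    """Train on (text, label) pairs, with labels in {+1, -1}."""
    weights = {}
    for _ in range(epochs):
        mistakes = 0
        for text, label in data:
            features = featurize(text)
            if predict(weights, features) != label:
                # Mistake-driven update: shift weights towards the correct label.
                for f, v in features.items():
                    weights[f] = weights.get(f, 0.0) + label * v
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights


# Made-up toy data, for illustration only.
toy_data = [
    ("a clear and well organised essay", 1),
    ("well written and clear", 1),
    ("confusing and poorly organised", -1),
    ("poorly written essay", -1),
]
weights = train_perceptron(toy_data)
print(predict(weights, featurize("clear and well written")))
```

Each learned weight says how strongly a word votes for one label or the other, which is the "weighted features" idea in its simplest form.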


Part 2: Language modeling

  • count-based language models
  • dealing with sparsity
  • more advanced language models
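
As a preview of Part 2, a count-based bigram model and one simple way of dealing with sparsity can be sketched as follows. This is an illustrative sketch under assumptions: the function names are hypothetical, add-alpha (Laplace) smoothing is just one of several possible smoothing choices and may not be the one covered in the session, and the three-sentence corpus is made up.

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Collect context counts and bigram counts from whitespace-tokenised sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}  # words that can be predicted, including end-of-sentence
    for sentence in sentences:
        words = sentence.lower().split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])                  # every token that precedes another
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (previous word, word) pairs
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab, alpha=1.0):
    """P(word | prev) with add-alpha smoothing.

    With alpha=0 this is the raw count ratio, which assigns zero probability
    to unseen bigrams (and is undefined for unseen contexts).
    """
    return (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * len(vocab))


# Made-up toy corpus, for illustration only.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
contexts, bigrams, vocab = train_bigram_counts(corpus)

print(bigram_prob("the", "cat", contexts, bigrams, vocab))  # seen bigram
print(bigram_prob("dog", "ran", contexts, bigrams, vocab))  # unseen bigram, still > 0
```

Smoothing takes a little probability mass from observed bigrams and spreads it over unseen ones, so the distribution over the next word still sums to one.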