The WordLens-Project

In this course, you'll learn the basics of working with unstructured or "big" data, using text from the internet as the running example.

The Project Setup

The company you work for has recognized the potential of analyzing text data for its business. It believes that text data can serve as a valuable source of business insight in multiple use cases. Until now, however, the company has had no expertise in analyzing unstructured data such as text.

This is why your company initiated the WordLens project. The goal is to build expertise in the field of text analysis and natural language processing (NLP) to leverage the many types of text data the company has access to. This includes:

  • Social media channels on Twitter, Facebook, and Instagram

  • Online news articles

  • A massive body of openly available research papers

  • Product ratings and reviews from the company's e-commerce platform

  • Job applications

  • A database of written complaints from customers

  • A vast amount of internal documents in the form of Microsoft Word, PowerPoint, or PDF files

  • Countless e-mails from employees

You are responsible for the success of the WordLens project, and it is your task to build the required expertise and to showcase the possibilities that lie in analyzing the aforementioned sources of unstructured text data.
