{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Named Entity Recognition (NER)\n", "\n", "This notebook is from the [AI for Beginners Curriculum](http://aka.ms/ai-beginners).\n", "\n", "In this example, we will learn how to train an NER model on the [Annotated Corpus for Named Entity Recognition](https://www.kaggle.com/datasets/abhinavwalia95/entity-annotated-corpus) dataset from Kaggle. Before proceeding, please download the [ner_dataset.csv](https://www.kaggle.com/datasets/abhinavwalia95/entity-annotated-corpus?resource=download&select=ner_dataset.csv) file into the current directory." ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from tensorflow import keras\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preparing the Dataset\n", "\n", "We will start by reading the dataset into a Pandas dataframe. If you want to learn more about using Pandas, see the [lesson on data processing](https://github.com/microsoft/Data-Science-For-Beginners/tree/main/2-Working-With-Data/07-python) in our [Data Science for Beginners](http://aka.ms/datascience-beginners) curriculum." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "|   | Sentence # | Word | POS | Tag |\n",
"|---|---|---|---|---|\n",
"| 0 | Sentence: 1 | Thousands | NNS | O |\n",
"| 1 | NaN | of | IN | O |\n",
"| 2 | NaN | demonstrators | NNS | O |\n",
"| 3 | NaN | have | VBP | O |\n",
"| 4 | NaN | marched | VBN | O |\n", "