The last couple of years have been incredible for Natural Language Processing (NLP) as a domain! We have seen multiple breakthroughs – ULMFiT, ELMo, Facebook's PyText and Google's BERT, among many others. These have rapidly accelerated the state-of-the-art research in NLP (and language modeling in particular).
We can now predict the next sentence, given a sequence of preceding words.
What's even more important is that machines are now beginning to understand the key element that had eluded them for so long.
Context! Understanding context has broken down barriers that had prevented NLP techniques from making headway before. And today, we are going to talk about one such library – Flair.
Until now, words were represented either as a sparse matrix or as word embeddings such as GloVe, BERT and ELMo, and the results have been pretty impressive. But there is always room for improvement, and Flair aims to fill that gap.
In this article, we will first understand what Flair is and the concept behind it. Then we’ll dive into implementing NLP tasks using Flair. Get ready to be impressed by its accuracy!
Please note that this article assumes familiarity with NLP concepts. You can go through the below articles if you need a quick refresher:
Flair is a simple natural language processing (NLP) library developed and open-sourced by Zalando Research. Flair’s framework builds directly on PyTorch, one of the best deep learning frameworks out there. The Zalando Research team has also released several pre-trained models for the following NLP tasks:
All of this looks promising. But what truly caught my attention was when I saw Flair outperforming several state-of-the-art results in NLP. Check out this table:
Note: F1 score is an evaluation metric primarily used for classification tasks. It’s often used in machine learning projects over the accuracy metric when evaluating models. The F1 score takes into consideration the distribution of the classes present.
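As a quick illustration, here is a minimal sketch of how the F1 score relates to precision and recall, using scikit-learn. The toy labels below are made up purely for demonstration:

from sklearn.metrics import precision_score, recall_score, f1_score

# toy ground-truth and predicted labels, purely for illustration
y_true = [0, 0, 1, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 1]

p = precision_score(y_true, y_pred)   # TP / (TP + FP)
r = recall_score(y_true, y_pred)      # TP / (TP + FN)

# F1 is the harmonic mean of precision and recall
print(2 * p * r / (p + r))
print(f1_score(y_true, y_pred))       # same value, computed directly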
There are plenty of awesome features packaged into the Flair library. Here’s my pick of the most prominent ones:
Context is so vital when working on NLP tasks. Learning to predict the next character based on previous characters forms the basis of sequence modeling.
Contextual String Embeddings leverage the internal states of a trained character language model to produce a novel type of word embedding. In simple terms, the same word gets a different embedding depending on the text that surrounds it, so words can carry different meanings in different sentences.
Note: A language model (word-level or character-level) is a probability distribution over sequences of words or characters, such that every new word or character depends on the words or characters that came before it. Have a look here to know more about it.
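Written out, this is just the chain rule of probability (with characters in place of words for a character-level model):

P(w1, w2, …, wn) = P(w1) × P(w2 | w1) × … × P(wn | w1, …, wn-1)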
Let's look at an example to understand this. Take the word "book": in "I read a book on NLP" it is a noun, while in "Please book a cab" it is a verb. A classic word embedding such as GloVe assigns "book" a single vector regardless of how it is used, whereas a contextual string embedding produces a different vector for each occurrence, because the characters surrounding the word differ.
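Here is a minimal sketch of that idea with Flair. The sentences and the choice of 'news-forward-fast' (one of the pre-trained character models shipped with the library) are my own, not from the original example:

from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

embedding = FlairEmbeddings('news-forward-fast')

s1 = Sentence('I read a book on NLP')
s2 = Sentence('Please book a cab')
embedding.embed(s1)
embedding.embed(s2)

# the two vectors for "book" differ because the surrounding characters differ
print(s1.tokens[3].embedding[:5])   # "book" in the first sentence
print(s2.tokens[1].embedding[:5])   # "book" in the second sentence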
Language is such a wonderful yet complex thing. You can read more about Contextual String Embeddings in this Research Paper.
It’s time to put Flair to the test! We’ve seen what this awesome library is all about. Now let’s see firsthand how it works on our machines.
We’ll use Flair to perform all the below NLP tasks in Python:
We will be using Google Colaboratory for running our code. One of the best things about Colab is that it provides GPU support for free! It is pretty handy for training deep learning models.
All you need is a stable internet connection.
We’ll be working on the Twitter Sentiment Analysis practice problem. Go ahead and download the dataset from there (you’ll need to register/log in first).
The problem statement posed by this challenge is:
The objective of this task is to detect hate speech in tweets. For the sake of simplicity, we say a tweet contains hate speech if it has a racist or sexist sentiment associated with it. So, the task is to classify racist or sexist tweets from other tweets.
Overview of steps:
Step 1: Import the data into the local Environment of Colab:
Step 2: Installing Flair
Step 3: Preparing text to work with Flair
Step 4: Word Embeddings with Flair
Step 5: Vectorizing the text
Step 6: Partitioning the data for Train and Test Sets
Step 7: Time for predictions!
# Install the PyDrive wrapper & import libraries.
# This only needs to be done once per notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Download a file based on its file ID.
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
file_id = '1GhyH4k9C4uPRnMAMKhJYOqa-V9Tqt4q8'  ### File ID ###
data = drive.CreateFile({'id': file_id})
#print('Downloaded content "{}"'.format(data.GetContentString()))
You can find the file ID in the shareable link of the dataset file in the drive.
Importing the dataset into the Colab notebook:
import io
import pandas as pd

data = pd.read_csv(io.StringIO(data.GetContentString()))
data.head()
All the emoticons and symbols have been removed from the data and the characters have been converted to lowercase. Additionally, our dataset has already been divided into train and test sets. You can download this clean dataset from here.
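If you are starting from the raw tweets instead, a rough sketch of the same kind of cleaning could look like this. The regex and the 'tweet' column name are assumptions based on our dataset, not the exact script used to prepare the clean file:

import re

def clean_tweet(tweet):
    # keep only letters and spaces, dropping emoticons, symbols and numbers
    tweet = re.sub('[^a-zA-Z]', ' ', tweet)
    # collapse multiple spaces and convert to lowercase
    return re.sub(' +', ' ', tweet).strip().lower()

data['tweet'] = data['tweet'].apply(clean_tweet)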
# download the flair library #
!pip install flair
import flair
import torch
A Brief look at Flair Data Types
There are two types of objects central to this library – Sentence and Token objects. A Sentence holds a textual sentence and is essentially a list of Tokens:
from flair.data import Sentence

# create a sentence #
sentence = Sentence('Blogs of Analytics Vidhya are Awesome.')

# print the sentence to see what's in it #
print(sentence)
# extracting the tweet part #
text = data['tweet']

## txt is a list of tweets ##
txt = text.tolist()
print(txt[:10])
Feel free to first go through this article if you’re new to word embeddings: An Intuitive Understanding of Word Embeddings.
## Importing the Embeddings ##
from flair.embeddings import WordEmbeddings
from flair.embeddings import CharacterEmbeddings
from flair.embeddings import StackedEmbeddings
from flair.embeddings import FlairEmbeddings
from flair.embeddings import BertEmbeddings
from flair.embeddings import ELMoEmbeddings

### Initialising embeddings (un-comment to use others) ###
#glove_embedding = WordEmbeddings('glove')
#character_embeddings = CharacterEmbeddings()
flair_forward = FlairEmbeddings('news-forward-fast')
flair_backward = FlairEmbeddings('news-backward-fast')
#bert_embedding = BertEmbeddings()
#elmo_embedding = ELMoEmbeddings()

stacked_embeddings = StackedEmbeddings(embeddings=[
    flair_forward,
    flair_backward
])
You would have noticed that we imported some of the most popular word embeddings above. Awesome! You can un-comment the relevant lines to use any of the other embeddings.
Now you might be asking – What in the world are “Stacked Embeddings”? Here, we can combine multiple embeddings to build a powerful word representation model without much complexity. Quite like ensembling, isn’t it?
We are stacking only the two Flair embeddings in this article to keep the computation time down. Feel free to play around with this and the other embeddings by using any combination you like.
Testing the stacked embeddings:
# create a sentence #
sentence = Sentence('Analytics Vidhya blogs are Awesome.')

# embed the words in the sentence #
stacked_embeddings.embed(sentence)

for token in sentence:
    print(token.embedding)

# data type and size of the embedding #
print(type(token.embedding))

# storing size (length) #
z = token.embedding.size()[0]
We’ll be showcasing this using two approaches.
Mean of Word Embeddings within a Tweet
We will be calculating the following in this approach. For each tweet:
Generate the embedding of every word in the tweet
Take the mean of these word embeddings as the tweet's vector
from tqdm import tqdm  ## tracks progress of the loop ##

# creating a tensor for storing sentence embeddings #
s = torch.zeros(0, z)

# iterating over sentences (tqdm tracks progress) #
for tweet in tqdm(txt):
    # empty tensor for words #
    w = torch.zeros(0, z)
    sentence = Sentence(tweet)
    stacked_embeddings.embed(sentence)
    # for every word #
    for token in sentence:
        # storing the word embedding of each word in the sentence #
        w = torch.cat((w, token.embedding.view(-1, z)), 0)
    # storing the sentence embedding (mean of the embeddings of all words) #
    s = torch.cat((s, w.mean(dim=0).view(-1, z)), 0)
Document Embedding: Vectorizing the entire Tweet
from flair.embeddings import DocumentPoolEmbeddings

### initialize the document embeddings, mode = mean ###
document_embeddings = DocumentPoolEmbeddings([
    flair_backward,
    flair_forward
])

### Vectorising text ###
# embed the first tweet to get the size of the document embedding #
sentence = Sentence(txt[0])
document_embeddings.embed(sentence)
z = sentence.embedding.size()[-1]

# creating a tensor for storing sentence embeddings #
s = torch.zeros(0, z)

# iterating over sentences #
for tweet in tqdm(txt):
    sentence = Sentence(tweet)
    document_embeddings.embed(sentence)
    # adding the document embedding to the tensor #
    s = torch.cat((s, sentence.embedding.view(-1, z)), 0)
You can choose either approach for your model. Now that our text is vectorised, we can feed it to our machine learning model!
## tensor to numpy array ##
X = s.numpy()

## test and train sets ##
test = X[31962:, :]
train = X[:31962, :]

# extracting labels of the training set #
target = data['label'][data['label'].isnull() == False].values
Defining custom F1 evaluator for XGBoost
from sklearn.metrics import f1_score

def custom_eval(preds, dtrain):
    labels = dtrain.get_label().astype(int)
    preds = (preds >= 0.3).astype(int)
    return [('f1_score', f1_score(labels, preds))]
Building the XGBoost model
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

### Splitting the training set ###
x_train, x_valid, y_train, y_valid = train_test_split(train, target,
                                                      random_state=42,
                                                      test_size=0.3)

### XGBoost compatible data ###
dtrain = xgb.DMatrix(x_train, y_train)
dvalid = xgb.DMatrix(x_valid, label=y_valid)

### defining parameters ###
params = {
    'colsample_bytree': 0.5,
    'eta': 0.1,
    'max_depth': 8,
    'min_child_weight': 6,
    'objective': 'binary:logistic',
    'subsample': 0.9
}

### Training the model ###
xgb_model = xgb.train(
    params,
    dtrain,
    feval=custom_eval,
    num_boost_round=1000,
    maximize=True,
    evals=[(dvalid, "Validation")],
    early_stopping_rounds=30
)
Our model has been trained and is ready for evaluation! Note: The parameters were taken from this Notebook.
### Reformatting the test set for XGB ###
dtest = xgb.DMatrix(test)

### Predicting ###
predict = xgb_model.predict(dtest)
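To turn these probabilities into a submission file, one way is to apply the probability threshold and write out the id/label columns. This is only a sketch: the 'id' column and the file name are assumptions, so check the exact submission format on the practice problem page:

import pandas as pd

# probabilities >= 0.2 are labelled as hate speech (1), the rest as 0
test_pred = (predict >= 0.2).astype(int)

# assuming the combined dataset keeps the original 'id' column for the test rows
submission = pd.DataFrame({'id': data['id'][31962:].values, 'label': test_pred})
submission.to_csv('submission.csv', index=False)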
I uploaded the predictions to the practice problem page with 0.2 as the probability threshold:
Word Embedding | F1-Score
GloVe | 0.53
flair-forward-fast | 0.45
flair-backward-fast | 0.48
Stacked (flair-forward-fast + flair-backward-fast) | 0.54
Note: According to Flair's official documentation, stacking the Flair embeddings with other embeddings often yields even better results. But there is a catch…
It might take a VERY LONG time to compute on a CPU. I highly recommend leveraging a GPU for faster results. You can use the free one within Colab!
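A quick way to confirm that a GPU runtime is actually active in Colab (Runtime > Change runtime type > GPU) is to check CUDA availability through PyTorch, which Flair runs on:

import torch

# True when a GPU runtime is attached; Flair will then run its models on the GPU
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU only')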
We will be using a subset of the CoNLL-2003 dataset, which is a pre-tagged dataset in English. Download the dataset from here.
Overview of steps:
Step 1: Importing the dataset
Step 2 : Extracting Sentences and PoS Tags from the dataset
Step 3: Tagging the text using NLTK and Flair
Step 4: Evaluating the PoS tags from NLTK and Flair against the tagged dataset
### file was uploaded manually to the local environment of Colab ###
data = open('pos-tagged_corpus.txt', 'r')
txt = data.read()
#print(txt)
The data file contains one word per line, with empty lines representing sentence boundaries.
### converting the text into a list of (words with their tags) ###
txt = txt.split('\n')

### removing DOCSTART (document header) ###
txt = [x for x in txt if x != '-DOCSTART- -X- -X- O']

### check ###
for i in range(10):
    print(txt[i])
    print('-'*10)

### Extracting Sentences ###
# initialize an empty list for storing words #
words = []
# initialize an empty list for storing sentences #
corpus = []

for i in tqdm(txt):
    ## if a blank line is encountered ##
    if i == '':
        ## the previous words form a sentence ##
        corpus.append(' '.join(words))
        ## refresh the word list ##
        words = []
    else:
        ## word at index 0 ##
        words.append(i.split()[0])

# did it work? #
for i in range(10):
    print(corpus[i])
    print('-'*10)

### Extracting POS ###
# initialize an empty list for storing word pos #
w_pos = []
# initialize an empty list for storing sentence pos #
POS = []

for i in tqdm(txt):
    ## a blank line = new sentence ##
    if i == '':
        ## the previous tags form a sentence POS ##
        POS.append(' '.join(w_pos))
        ## refresh the tag list ##
        w_pos = []
    else:
        ## pos tag at index 1 ##
        w_pos.append(i.split()[1])

# did it work? #
for i in range(10):
    print(corpus[i])
    print(POS[i])

### Removing blanks from sentences and pos ###
corpus = [x for x in corpus if x != '']
POS = [x for x in POS if x != '']

### check ###
for i in range(10):
    print(corpus[i])
    print(POS[i])
We have extracted the essential aspects we require from the dataset. Let's move on to step 3.
First, import the required libraries:
import nltk
nltk.download('tagsets')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk import word_tokenize
This will download all the necessary files to tag the text using NLTK.
### Tagging the corpus with NLTK ###
# for storing results #
nltk_pos = []

## for every sentence ##
for i in tqdm(corpus):
    # tokenize the sentence #
    text = word_tokenize(i)
    # tag the words #
    z = nltk.pos_tag(text)
    # store #
    nltk_pos.append(z)
The PoS tags are in this format:
[('token_1', 'tag_1'), ... , ('token_n', 'tag_n')]
Let's extract the PoS tags from this:
### Extracting the final pos tags produced by nltk into a list ###
tmp = []
nltk_result = []

## every tagged sentence ##
for i in tqdm(nltk_pos):
    tmp = []
    ## every word ##
    for j in i:
        ## append the tag (at index 1) ##
        tmp.append(j[1])
    # join the tags of every sentence #
    nltk_result.append(' '.join(tmp))

### check ###
for i in range(10):
    print(nltk_result[i])
    print(corpus[i])
The NLTK tags are ready for business.
Importing the libraries first:
!pip install flair
from flair.data import Sentence
from flair.models import SequenceTagger
Tagging using Flair
# initiating the tagger object #
pos = SequenceTagger.load('pos-fast')

# for storing pos tagged strings #
f_pos = []

## for every sentence ##
for i in tqdm(corpus):
    sentence = Sentence(i)
    pos.predict(sentence)
    ## append the tagged sentence ##
    f_pos.append(sentence.to_tagged_string())

### check ###
for i in range(10):
    print(f_pos[i])
    print(corpus[i])
The result is in the below format:
token_1 <tag_1> token_2 <tag_2> ………………….. token_n <tag_n>
Note: We can use different taggers available within the Flair library. Feel free to tinker around and experiment. You can find the list here.
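For instance, here is a minimal sketch of swapping in a different pre-trained model, the named entity recognition tagger 'ner' from Flair's list of pre-trained models (the example sentence is my own):

from flair.data import Sentence
from flair.models import SequenceTagger

# load the named entity recognition model instead of 'pos-fast'
ner_tagger = SequenceTagger.load('ner')

sentence = Sentence('Zalando Research is based in Berlin')
ner_tagger.predict(sentence)
print(sentence.to_tagged_string())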
Extract the sentence-wise tags as we did with NLTK:
import re

### Extracting POS tags ###
## for every sentence, by index ##
for i in tqdm(range(len(f_pos))):
    ## for every word in the ith sentence ##
    for j in corpus[i].split():
        ## remove that word from the ith tagged sentence in f_pos ##
        f_pos[i] = str(f_pos[i]).replace(j, "", 1)

    ## removing < > symbols ##
    for j in ['<', '>']:
        f_pos[i] = str(f_pos[i]).replace(j, "")

    ## removing redundant spaces ##
    f_pos[i] = re.sub(' +', ' ', str(f_pos[i]))
    f_pos[i] = str(f_pos[i]).lstrip()

### check ###
for i in range(10):
    print(f_pos[i])
    print(corpus[i])
Aha! We have finally tagged the corpus and extracted them sentence-wise. We are free to remove all the punctuation and special symbols.
### Removing symbols and redundant spaces ###
## for every sentence, by index ##
for i in tqdm(range(len(corpus))):
    # removing symbols #
    corpus[i] = re.sub('[^a-zA-Z]', ' ', str(corpus[i]))
    POS[i] = re.sub('[^a-zA-Z]', ' ', str(POS[i]))
    f_pos[i] = re.sub('[^a-zA-Z]', ' ', str(f_pos[i]))
    nltk_result[i] = re.sub('[^a-zA-Z]', ' ', str(nltk_result[i]))

    ## removing HYPH and SYM (they are tags for symbols) ##
    f_pos[i] = str(f_pos[i]).replace('HYPH', "")
    f_pos[i] = str(f_pos[i]).replace('SYM', "")
    POS[i] = str(POS[i]).replace('SYM', "")
    POS[i] = str(POS[i]).replace('HYPH', "")
    nltk_result[i] = str(nltk_result[i].replace('HYPH', ''))
    nltk_result[i] = str(nltk_result[i].replace('SYM', ''))

    ## removing redundant spaces ##
    POS[i] = re.sub(' +', ' ', str(POS[i]))
    f_pos[i] = re.sub(' +', ' ', str(f_pos[i]))
    corpus[i] = re.sub(' +', ' ', str(corpus[i]))
    nltk_result[i] = re.sub(' +', ' ', str(nltk_result[i]))
We have tagged the corpus using NLTK and Flair, extracted and removed all the unnecessary elements. Let’s see it for ourselves:
for i in range(1000):
    print('corpus ' + corpus[i])
    print('actual ' + POS[i])
    print('nltk ' + nltk_result[i])
    print('flair ' + f_pos[i])
    print('-'*50)
OUTPUT:
corpus SOCCER JAPAN GET LUCKY WIN CHINA IN SURPRISE DEFEAT
actual NN NNP VB NNP NNP NNP IN DT NN
nltk NNP NNP NNP NNP NNP NNP NNP NNP NNP
flair NNP NNP VBP JJ NN NNP IN NNP NNP
--------------------------------------------------
corpus Nadim Ladki
actual NNP NNP
nltk NNP NNP
flair NNP NNP
--------------------------------------------------
corpus AL AIN United Arab Emirates
actual NNP NNP NNP NNPS CD
nltk NNP NNP NNP VBZ JJ
flair NNP NNP NNP NNP CD
That looks convincing!
Here, we are doing word-wise evaluation of the tags with the help of a custom-made evaluator.
corpus Japan coach Shu Kamo said The Syrian own goal proved lucky for us
actual NNP NN NNP NNP VBD POS DT JJ JJ NN VBD JJ IN PRP
nltk NNP VBP NNP NNP VBD DT JJ JJ NN VBD JJ IN PRP
flair NNP NN NNP NNP VBD DT JJ JJ NN VBD JJ IN PRP
Note that in the example above, the actual tag sequence contains an extra tag (the POS tag, left over from a symbol that was stripped from the text) which the NLTK and Flair outputs do not produce, so the sequences end up with different lengths. We will therefore skip sentences whose tag sequences are of unequal length when evaluating.
### EVALUATION FUNCTION ###
def pos_eval(x, y):
    # correct matches #
    count = 0
    # total comparisons made #
    comp = 0
    ## for every sentence index in the dataset ##
    for i in range(len(x)):
        x_tags = x[i].split()
        y_tags = y[i].split()
        ## only compare if the tag sequence lengths match ##
        if len(x_tags) == len(y_tags):
            ## compare each tag ##
            for j in range(len(x_tags)):
                if x_tags[j] == y_tags[j]:
                    ## match! ##
                    count = count + 1
                comp = comp + 1
    return (count/comp)*100
Finally we evaluate the POS tags of NLTK and Flair against the POS tags provided by the dataset.
print("nltk Score ", eval2(POS,nltk_result)) print("Flair Score ", eval2(POS,f_pos))
Our Result:
NLTK Score: 85.38654023442645
Flair Score: 90.96172124773179
Well, well, well. I can see why Flair has been getting so much attention in the NLP community.
Flair clearly provides an edge in word embeddings and stacked word embeddings. These can be implemented without much hassle thanks to its high-level API. The Flair embedding is something to keep an eye on in the near future.
I love that the Flair library supports multiple languages. The developers are also currently working on frame detection using Flair. The future looks really bright for this library.
I personally enjoyed working with and learning the ins and outs of this library. I hope you found the tutorial useful and will be using Flair to your advantage the next time you take up an NLP challenge.