0% found this document useful (0 votes)
178 views26 pages

Full-Text Search and PostgresSQL Extensions PDF

The document discusses functions for manipulating data in PostgreSQL, including full text search functions and extensions. It covers the LIKE operator for pattern matching, as well as full text search which provides natural language queries and accounts for variations, stemming, and spelling mistakes. The document also discusses user-defined data types, functions, and common extensions used with PostgreSQL like PostGIS for location data and fuzzystrmatch for improving full text search capabilities.

Uploaded by

dieko
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
Download as pdf or txt
0% found this document useful (0 votes)
178 views26 pages

Full-Text Search and PostgresSQL Extensions PDF

The document discusses functions for manipulating data in PostgreSQL, including full text search functions and extensions. It covers the LIKE operator for pattern matching, as well as full text search which provides natural language queries and accounts for variations, stemming, and spelling mistakes. The document also discusses user-defined data types, functions, and common extensions used with PostgreSQL like PostGIS for location data and fuzzystrmatch for improving full text search capabilities.

Uploaded by

dieko
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
Download as pdf or txt
Download as pdf or txt
You are on page 1/ 26

Introduction to full-

text search
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L

Brian Piccolo
Sr. Director, Digital Strategy
Topics
Full Text search

Extending PostgreSQL

Improving full text search with extensions

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


The LIKE operator
_ wildcard: Used to match exactly one character.
% wildcard: Used to match zero or more characters.
SELECT title
FROM film
WHERE title LIKE 'ELF%';

+----------------------+
| title |
+----------------------+
| ELF PARTY |
+----------------------+

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


The LIKE operator
SELECT title
FROM film
WHERE title LIKE '%ELF';

+----------------------+
| title |
+----------------------+
| ENCINO ELF |
| GHOSTBUSTERS ELF |
+----------------------+

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


The LIKE operator
SELECT title
FROM film
WHERE title LIKE '%elf%'; it0s case sensitivd

+----------------------+
| title |
+----------------------+

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


LIKE versus full-text search
While the LIKE operator is a simple pattern matching tool in your SQL toolbox, it's an
expensive operation in terms of performance

SELECT title, description


FROM film
match operator

WHERE to_tsvector(title) @@ to_tsquery('elf'); this statement? accounts for variation and is case insensitive

+----------------------+
| title |
+----------------------+
| ELF PARTY |
| ENCINO ELF |
| GHOSTBUSTERS ELF |
+----------------------+

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


What is full-text search?
Full text search provides a means for performing natural language queries of text data in your
database.

Stemming

Spelling mistakes

Ranking

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


Full-text search syntax explained
SELECT title, description
FROM film
WHERE to_tsvector(title) @@ to_tsquery('elf');

SELECT to_tsvector(description)
FROM film;

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


Let's practice!
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L
Extending
PostgreSQL
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L

Brian Piccolo
Sr. Director, Digital Strategy
User-de ned data types
Enumerated data types

CREATE TYPE dayofweek AS ENUM (


'Monday',
'Tuesday',
'Wednesday',
'Thursday',
'Friday',
'Saturday',
'Sunday'
);

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


Getting information about user-de ned data
types
SELECT typname, typcategory
FROM pg_type table to get info about all data types availabe in your databse

WHERE typname='dayofweek';

+-----------+-------------+
| typname | typcategory |
|-----------|-------------|
| dayofweek | E |
+-----------+-------------+

E= enum type

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


Getting information about user-de ned data
types
SELECT column_name, data_type, udt_name
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name ='film';

+-----------------------------------------------+
| column_name | data_type | udt_name |
|-------------|-------------------|-------------|
| title | character varying | varchar |
| rating | USER-DEFINED | mpaa_rating |
+-----------------------------------------------+

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


User-de ned functions
CREATE FUNCTION squared(i integer) RETURNS integer AS $$
BEGIN
RETURN i * i;
END;
$$ LANGUAGE plpgsql; $$ specifies that the function will be suing sql as the language

SELECT squared(10);

+---------+
| squared |
|---------|
| 100 |
+---------+

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


User-de ned functions in the Sakila database
get_customer_balance(customer_id, effective_data): calculates the current outstanding balance for
a given customer.

inventory_held_by_customer(inventory_id): returns the customer_id that is currently renting an


inventory item or null if it's currently available.

inventory_in_stock(inventory_id): returns a boolean value of whether an inventory item is


currently in stock.

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


Let's practice!
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L
Intro to PostreSQL
extensions
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L

Brian Piccolo
Sr. Director, Digital Strategy
Intro to PostgreSQL extensions
Commonly used extensions
PostGIS adds support for allowing location queries to be run in sql

PostPic allows for image processing within the databse

fuzzystrmatch
extend full text search capabilities by finding similarities between strings

pg_trgm

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


Querying extension meta data
Available Extensions Installed Extensions
SELECT name SELECT extname
FROM pg_available_extensions; FROM pg_extension;

+--------------------+ +---------+
| name | | name |
|--------------------| |---------|
| dblink | | plpgsql |
| pg_stat_statements | +---------+
+--------------------+

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


--Enable the fuzzystrmatch extension
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
--Confirm that fuzzstrmatch has been enabled
SELECT extname FROM pg_extension;

+---------------+
| name |
|---------------|
| plpgsql |
| fuzzystrmatch |
+---------------+

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


Using fuzzystrmatch or fuzzy searching
SELECT levenshtein('GUMBO', 'GAMBOL');

+-------------+
| levenshtein |
|-------------|
| 2 |
+-------------+

levenshtein distance represents the number of edits


required to convert one string to another string being
compared.

In a search application or when performing data


analysis on any data that contains manual user input,
you will always want to account for typos or incorrect
spellings. The levenshtein function provides a great
method for performing this task

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


Compare two strings with pg_trgm
SELECT similarity('GUMBO', 'GAMBOL');

+------------+
| similarity |
|------------|
| 0.18181818 |
+------------+

determie the similarity of two strings using trigram 8groups of 3 consecutive chars in a str) matchings

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


-- Select the title and description columns
In this exercise, we are going to use many of the techniques
SELECT
and concepts we learned throughout the course to generate
title,
a data set that we could use to predict whether the words
description
and phrases used to describe a film have an impact on the
FROM
number of rentals.
film
WHERE
First, you need to create a tsvector from the description
-- Match "Astounding Drama" in the description
column in the film table. You will match against a tsquery to
to_tsvector(description) @@
determine if the phrase "Astounding Drama" leads to more
to_tsquery('Astounding & Drama');
rentals per month.

SELECT
title,
Next, create a new column using the description,
similarity function to rank the film -- Calculate the similarity

Let's practice!
descriptions based on this phrase. similarity(description, 'Astounding Drama')
FROM
film
WHERE
to_tsvector(description) @@
to_tsquery('Astounding & Drama')
ORDER BY
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L
similarity(description, 'Astounding Drama') DESC;
Putting it All
Together
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L

Brian Piccolo
Sr. Director, Digital Strategy
Functions for manipulating data recap and review
Common data types in PostgreSQL

Date/time functions and operators

Parsing and manipulating text

PostgreSQL Extensions and full-text search

FUNCTIONS FOR MANIPULATING DATA IN POSTGRESQL


Thank you!
F U N C T I O N S F O R M A N I P U L AT I N G D ATA I N P O S TG R E S Q L

You might also like