unit 5
unit 5
The goal of this file is to help you get started with OQL. The examples presented
in this file refer to classes defined in the file "O2 Tutorial".
Please note that the syntax in some of the examples might need minor adjustment
before they will work with the current version of O2. If you find any errors, or
places which are unclear, or if you have any suggestions or comments, please let us
know (email to Michalis - mpetropo@cs.ucsd.edu). Your help is greatly
appreciated.
Introduction
OQL is the way to access data in an O2 database. OQL is a powerful and easy-to-
use SQL-like query language with special features dealing with complex objects,
values and methods.
Using OQL
We've been able to create classes and write some programs. So far, O2 appears to
be like an object-oriented programming language like C++ instead of a database
system. Probably the main difference is that O2 supports queries. The queries that
you'll be creating will look very similar to that of SQL.
In order to perform queries, you'll need to enter query mode. From within the O2
client, do the command:
query;
This will put you in a sub-shell for queries. From here, you can enter your queries
followed by Ctrl-D. To exit query mode, just hit Ctrl-D without entering a query.
Example Query 1
Give the names of people who are older than 26 years old:
We use the dot notation and path expressions to access components of complex
values.
Let variables t and ta range over objects in extents (persistent names) of Tutors
and TAs (i.e., range over objects in sets Tutors and TAs).
Cascade of dots can be used if all names represent objects and not a collection.
Example Query 2
Find the names of the students of all tutors:
SELECT s.name
FROM Tutors t, t.students s
Here we notice that the variable t that binds to the first collection of FROM is
used to help us define the second collection s. Because students is a
collection, we use it in the FROM list, like t.students above, if we want to access
attributes of students.
Example Query 3
Give the names of the Tutors which have a salary greater than $300 and have a
student paying more than $30:
SELECT t.name
FROM ( SELECT t FROM Tutors t WHERE t.salary > 300 ) r, r.students s
WHERE s.fee > 30
Example Query 4
Give the names of people who aren't TAs:
SELECT p.name
FROM p in People
WHERE not ( p.name in SELECT t.name FROM t in TAs )
The standard O2C operators for sets are + (union), * (intersection), and -
(difference). In OQL, the operators are written as UNION, INTERSECT
and EXCEPT , respectively.
Example Query 5
Give the names of TAs with the highest salary:
SELECT t.name
FROM t in TAs
WHERE t.salary = max ( select ta.salary from ta in TAs )
GROUP BY
The GROUP BY operator creates a set of tuples with two fields. The first has the
type of the specified GROUP BY attribute. The second field is the set of tuples
that match that attribute. By default, the second field is called PARTITION.
Example Query 6
Give the names of the students and the average fee they pay their Tutors:
1. Initial collection
We begin from collection Tutors, but technically it is a bag of tuples of the form:
where t1 is a Tutor object and s denotes a student tuple. In general, there are
fields for all of the variable bindings in the FROM clause.
2. Intermediate collection
The GROUP BY attribute s.name maps the tuples of the initial collection to the
value of the name of the student. The intermediate collection is a set of tuples of
type:
tuple( sname: string, partition: set( tuple(t: Tutor, s: tuple( name: string, fee:
real ) ) ) )
For example:
3. Output collection
Consists of student-average fee pairs, one for each tuple in the intermediate
collection. The type of tuples in the output is:
We let p range over all tuples in partition. Each of these tuples contains a Tutor
object and a student tuple. Thus, p.s.fee extracts the fee from one of the
student tuples.
Instead of using query mode, you can incorporate these queries in your O2
programs using the "o2query" command:
run body {
o2 real total_salaries;
o2query( total_salaries, "sum ( SELECT ta->get_salary \
FROM ta in TAs )" );
printf("TAs combined salary: %.2f\n", total_salaries);
};
The first argument for o2query is the variable in which you want to store the
query results. The second argument is a string that contains the query to be
performed. If your query string takes up several lines, be sure to backslash (\)
the carriage returns.
In MongoDB, data has a flexible schema. It is totally different from SQL database
where you had to determine and declare a table's schema before inserting data.
MongoDB collections do not enforce document structure.
The main challenge in data modeling is balancing the need of the application, the
performance characteristics of the database engine, and the data retrieval patterns.
Example
Suppose a client needs a database design for his blog/website and see the
differences between RDBMS and MongoDB schema design. Website has the
following requirements.
Every post has the unique title, description and url.
Every post can have one or more tags.
Every post has the name of its publisher and total number of likes.
Every post has comments given by users along with their name, message,
data-time and likes.
On each post, there can be zero or more comments.
In RDBMS schema, design for above requirements will have minimum three tables.
While in MongoDB schema, design will have one collection post and the following
structure −
{
_id: POST_ID
title: TITLE_OF_POST,
description: POST_DESCRIPTION,
by: POST_BY,
url: URL_OF_POST,
tags: [TAG1, TAG2, TAG3],
likes: TOTAL_LIKES,
comments: [
{
user:'COMMENT_BY',
message: TEXT,
dateCreated: DATE_TIME,
like: LIKES
},
{
user:'COMMENT_BY',
message: TEXT,
dateCreated: DATE_TIME,
like: LIKES
}
]
}
So while showing the data, in RDBMS you need to join three tables and in
MongoDB, data will be shown from one collection only.
If the create operation is successful, a new document is created. The function will return an
object where “acknowledged” is “true” and “insertID” is the newly created “ObjectId.”
> db.RecordsDB.insertOne({
... name: "Marsh",
... age: "6 years",
... species: "Dog",
... ownerAddress: "380 W. Fir Ave",
... chipped: true
... })
{
"acknowledged" : true,
"insertedId" : ObjectId("5fd989674e6b9ceb8665c57d")
}
insertMany()
It’s possible to insert multiple items at one time by calling the insertMany() method on the
desired collection. In this case, we pass multiple items into our chosen collection
(RecordsDB) and separate them by commas. Within the parentheses, we use brackets to
indicate that we are passing in a list of multiple entries. This is commonly referred to as a
nested method.
db.RecordsDB.insertMany([{
name: "Marsh",
age: "6 years",
species: "Dog",
ownerAddress: "380 W. Fir Ave",
chipped: true},
{name: "Kitana",
age: "4 years",
species: "Cat",
ownerAddress: "521 E. Cortland",
chipped: true}])
find()
In order to get all the documents from a collection, we can simply use the find() method on
our chosen collection. Executing just the find() method with no arguments will return all
records currently in the collection.
db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "6 years",
"species" : "Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "3 years",
"species" : "Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd994efce6e8850d88270ba"), "name" : "Kevin", "age" : "8 years",
"species" : "Dog", "ownerAddress" : "900 W. Wood Way", "chipped" : true }
Here we can see that every record has an assigned “ObjectId” mapped to the “_id” key.
If you want to get more specific with a read operation and find a desired subsection of the
records, you can use the previously mentioned filtering criteria to choose what results should
be returned. One of the most common ways of filtering the results is to search by value.
db.RecordsDB.find({"species":"Cat"})
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
findOne()
In order to get one document that satisfies the search criteria, we can simply use
the findOne() method on our chosen collection. If multiple documents satisfy the query, this
method returns the first document according to the natural order which reflects the order of
documents on the disk. If no documents satisfy the search criteria, the function returns null.
The function takes the following form of syntax.
db.{collection}.findOne({query}, {projection})
Let's take the following collection—say, RecordsDB, as an example.
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "8 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "6 years",
"species" : "Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "3 years",
"species" : "Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd994efce6e8850d88270ba"), "name" : "Kevin", "age" : "8 years",
"species" : "Dog", "ownerAddress" : "900 W. Wood Way", "chipped" : true }
And, we run the following line of code:
db.RecordsDB.find({"age":"8 years"})
We would get the following result:
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "8 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
Notice that even though two documents meet the search criteria, only the first document that
matches the search condition is returned.
Update Operations
Like create operations, update operations operate on a single collection, and they are atomic
at a single document level. An update operation takes filters and criteria to select the
documents you want to update.
You should be careful when updating documents, as updates are permanent and can’t be
rolled back. This applies to delete operations as well.
For MongoDB CRUD, there are three different methods of updating documents:
db.collection.updateOne()
db.collection.updateMany()
db.collection.replaceOne()
updateOne()
We can update a currently existing record and change a single document with an update
operation. To do this, we use the updateOne() method on a chosen collection, which here is
“RecordsDB.” To update a document, we provide the method with two arguments: an update
filter and an update action.
The update filter defines which items we want to update, and the update action defines how
to update those items. We first pass in the update filter. Then, we use the “$set” key and
provide the fields we want to update as a value. This method will update the first record that
matches the provided filter.
db.RecordsDB.updateOne({name: "Marsh"}, {$set:{ownerAddress: "451 W. Coffee St.
A204"}})
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "6 years",
"species" : "Dog", "ownerAddress" : "451 W. Coffee St. A204", "chipped" : true }
updateMany()
updateMany() allows us to update multiple items by passing in a list of items, just as we did
when inserting multiple items. This update operation uses the same syntax for updating a
single document.
db.RecordsDB.updateMany({species:"Dog"}, {$set: {age: "5"}})
{ "acknowledged" : true, "matchedCount" : 3, "modifiedCount" : 3 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "5", "species" :
"Dog", "ownerAddress" : "451 W. Coffee St. A204", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "5", "species" :
"Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd994efce6e8850d88270ba"), "name" : "Kevin", "age" : "5", "species" :
"Dog", "ownerAddress" : "900 W. Wood Way", "chipped" : true }
replaceOne()
The replaceOne() method is used to replace a single document in the specified
collection. replaceOne() replaces the entire document, meaning fields in the old document
not contained in the new will be lost.
db.RecordsDB.replaceOne({name: "Kevin"}, {name: "Maki"})
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "5", "species" :
"Dog", "ownerAddress" : "451 W. Coffee St. A204", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "5", "species" :
"Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd994efce6e8850d88270ba"), "name" : "Maki" }
Delete Operations
Delete operations operate on a single collection, like update and create operations. Delete
operations are also atomic for a single document. You can provide delete operations with
filters and criteria in order to specify which documents you would like to delete from a
collection. The filter options rely on the same syntax that read operations utilize.
MongoDB has two different methods of deleting records from a collection:
db.collection.deleteOne()
db.collection.deleteMany()
deleteOne()
deleteOne() is used to remove a document from a specified collection on the MongoDB
server. A filter criteria is used to specify the item to delete. It deletes the first record that
matches the provided filter.
db.RecordsDB.deleteOne({name:"Maki"})
{ "acknowledged" : true, "deletedCount" : 1 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years",
"species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "5", "species" :
"Dog", "ownerAddress" : "451 W. Coffee St. A204", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "5", "species" :
"Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
deleteMany()
deleteMany() is a method used to delete multiple documents from a desired collection with a
single delete operation. A list is passed into the method and the individual items are defined
with filter criteria as in deleteOne().
db.RecordsDB.deleteMany({species:"Dog"})
{ "acknowledged" : true, "deletedCount" : 2 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years",
"species"
HBase Data Model
The HBase Data Model is designed to handle semi-structured data that may differ in
field size, which is a form of data and columns. The data model’s layout partitions the
data into simpler components and spread them across the cluster. HBase's Data
Model consists of various logical components, such as a table, line, column,
family, column, column, cell, and edition.
Table:
An HBase table is made up of several columns. The tables in HBase defines upfront
during the time of the schema specification.
Row:
An HBase row consists of a row key and one or more associated value columns.
Row keys are the bytes that are not interpreted. Rows are ordered lexicographically,
with the first row appearing in a table in the lowest order. The layout of the row key is
very critical for this purpose.
Column:
A column in HBase consists of a family of columns and a qualifier of columns, which
is identified by a character: (colon).
Column Family:
Apache HBase columns are separated into the families of columns. The column
families physically position a group of columns and their values to increase its
performance. Every row in a table has a similar family of columns, but there may not
be anything in a given family of columns.
The same prefix is granted to all column members of a column family. For example,
Column courses: history and courses: math, are both members of the column
family of courses. The character of the colon (:) distinguishes the family of columns
from the qualifier of the family of columns. The prefix of the column family must be
made up of printable characters.
During schema definition time, column families must be declared upfront while
columns are not specified during schema time. They can be conjured on the fly when
the table is up and running. Physically, all members of the column family are stored
on the file system together.
Column Qualifier
The column qualifier is added to a column family. A column standard could
be content (html and pdf), which provides the content of a column unit. Although
column families are set up at table formation, column qualifiers are mutable and can
vary significantly from row to row.
Cell:
A Cell store data and is quite a unique combination of row key, Column Family, and
the Column. The data stored in a cell call its value and data types, which is every
time treated as a byte[].
Timestamp:
In addition to each value, the timestamp is written and is the identifier for a given
version of a number. The timestamp reflects the time when the data is written on the
Region Server. But when we put data into the cell, we can assign a different
timestamp value.
Create
Let’s create an HBase table and insert data into the table. Now that we know, while creating a
table user needs to create required Column Families.
Here we have created two-column families for table ‘employee’. First Column Family is
‘Personal Info’ and Second Column Family is ‘Professional Info’.
1 create 'employee', 'Personal info', 'Professional Info'
2 0 row(s) in 1.4750 seconds
3
4 => Hbase::Table - employee
Upon successful creation of the table, the shell will return 0 rows.
Create a table with Namespace:
A namespace is nothing but a logical grouping of tables.’company_empinfo’ is the
namespace id in the below command.
1 create 'company_empinfo:employee', 'Personal info', 'Professional Info'
Create a table with version:
By default, versioning is not enabled in HBase. So users need to specify while creating.
Given below is the syntax for creating an HBase table with versioning enabled.
1 create 'tableName',{NAME=>"CF1",VERSIONS=>5},{NAME=."CF2",VERSIONS=>5}
2 create 'bankdetails',{NAME=>"address",VERSIONS=>5}
Put:
Put command is used to insert records into HBase.
1 put 'employee', 1, 'Personal info:empId', 10
2 put 'employee', 1, 'Personal info:Name', 'Alex'
3 put 'employee', 1, 'Professional Info:Dept, 'IT'
Here in the above example all the rows having Row Key as 1 is considered to be one row in
HBase.To add multiple rows
1 put 'employee', 2, 'Personal info:empId', 20
2 put 'employee', 2, 'Personal info:Name', 'Bob'
3 put 'employee', 2, 'Professional Info:Dept', 'Sales'
As discussed earlier, the user can add any number of columns as part of the row.
Read
‘get’ and ‘scan’ command is used to read data from HBase. Lets first discuss ‘get’ operation.
get: ‘get’ operation returns a single row from the HBase table. Given below is the syntax for
the ‘get’ method.
1 get 'table Name', 'Row Key'
1 hbase(main):022:get 'employee', 1
COLUMN CELL
Personal info:Name timestamp=1504600767520, value=Alex
Per sonal info:empId timestamp=1504600767491, value=10
Pro fessional Info:Dept timestamp=1504600767540, value=IT
3 row(s) in 0.0250 seconds
Note: Notice that there is a timestamp attached to each cell. These timestamps will update for
the cell whenever the cell value is updated. All the old values will be there but timestamp
having the latest value will be displayed as output.
Get all version of a column
Below given command is used to find different versions. Here ‘VERSIONS => 3’ defines
number of version to be retrieved.
1 get 'Table Name', 'Row Key', {COLUMN => 'Column Family', VERSIONS => 3}
scan:
‘scan’ command is used to retrieve multiple rows.
Select all:
The below command is an example of a basic search on the entire table.
1 scan 'Table Name'
1 hbase(main):074:> scan 'employee'
ROW COLUMN+CELL
1 column=Personal info:Name, timestamp=1504600767520, value=Alex
1 column=Personal info:empId, timestamp=1504606480934, value=15
1 column=Professional Info:Dept, timestamp=1504600767540, value=IT
2 column=Personal info:Name, timestamp=1504600767588, value=Bob
2 column=Personal info:empId, timestamp=1504600767568, value=20
2 column=Professional Info:Dept, timestamp=1504600768266, value=Sales
2 row(s) in 0.0500 seconds
Note: All the Rows are arranged by Row Keys along with columns in each row.
Column Selection:
The below command is used to Scan any particular column.
1 hbase(main):001:>scan 'employee' ,{COLUMNS => 'Personal info:Name' }
ROW COLUMN+CELL
1 column=Personal info:Name, timestamp=1504600767520, value=Alex
2 column=Personal info:Name, timestamp=1504600767588, value=Bob
2 row(s) in 0.3660 seconds
Limit Query:
The below command is used to Scan any particular column.
1 hbase(main):002:>scan 'employee' ,{COLUMNS => 'Personal info:Name',LIMIT =>1 }
ROW COLUMN+CELL
1 column=Personal info:Name, timestamp=1504600767520, value=Alex
1 row(s) in 0.0270 seconds
Update
To update any record HBase uses ‘put’ command. To update any column value, users need to
put new values and HBase will automatically update the new record with the latest
timestamp.
1 put 'employee', 1, 'Personal info:empId', 30
The old value will not be deleted from the HBase table. Only the updated record with the
latest timestamp will be shown as query output.
To check the old value of any row use below command.
1 get 'Table Name', 'Row Key', {COLUMN => 'Column Family', VERSIONS => 3}
Delete
‘delete‘ command is used to delete individual cells of a record.
The below command is the syntax of delete command in the HBase Shell.
1 delete 'Table Name' ,'Row Key','Column Family:Column'
1 delete 'employee',1, 'Personal info:Name'
Drop Table:
To drop any table in HBase, first, it is required to disable the table. The query will return an
error if the user is trying to delete the table without disabling the table. Disable removes the
indexes from memory.
The below command is used to disable and drop the table.
1 disable 'employee'
Once the table is disabled, the user can drop using below syntax.
1 drop 'employee'
You can verify the table in using ‘exist’ command and enable table which is already disabled,
just use ‘enable’ command.