By the end of this unit, you should be able to
- Interpret relational algebra terms (queries)
- Define relational algebra terms to query a relational model
Because the relational model is a logical model that abstracts away implementation details, we need operations defined at the same level of abstraction to manipulate the data in a relational model, so that we do not have to deal with the implementation details yet.
Like algebra in mathematics, relational algebra expresses data operations through symbols and manipulations of relations.
- The data manipulation operations are defined in terms of a sequence of operation applications.
- Each symbolic operator takes one or more relation(s) (or intermediate relations) as operands and produces one result relation.
For example, consider the relation Publish(article_id, book_id, publisher_id).
An instance of the Publish relation is given below.
article_id | book_id | publisher_id |
---|---|---|
a1 | b1 | p1 |
a2 | b1 | p1 |
a1 | b2 | p2 |
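The query "find all articles that are published by both publishers p1 and p2" can be expressed as the following relational algebra term.
$$\Pi_{article\_id}(\sigma_{publisher\_id = p1}(Publish)) \cap \Pi_{article\_id}(\sigma_{publisher\_id = p2}(Publish))$$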
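A selection operator, $\sigma_{P}(R)$, takes a predicate $P$ and a relation $R$, and returns a relation containing exactly the tuples of $R$ that satisfy $P$.
Note that $P$ can be
- a simple predicate such as an equality test $name = "tom"$, or
- a conjunction or disjunction of predicates, e.g. $name = "tom"\ {\tt AND} \ age > 21$.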
For example, the following relational algebra expression returns a relation with all tuples of Publish whose publisher is p1.
$$\sigma_{publisher\_id = p1}(Publish)$$
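A projection operator, $\Pi_{A_1,...,A_n}(R)$, takes a list of attribute names $A_1,...,A_n$ and a relation $R$, and returns a relation that keeps only those attributes of each tuple in $R$.
For example, the following expression returns a relation with all $article\_id$s published by p1.
$$\Pi_{article\_id}(\sigma_{publisher\_id = p1}(Publish))$$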
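A minimal Python sketch of projection, again assuming a relation is a list of dicts. The helper name `project` is hypothetical. Because a relation is a set of tuples, the sketch drops duplicates that appear once attributes are removed.

```python
publish = [
    {"article_id": "a1", "book_id": "b1", "publisher_id": "p1"},
    {"article_id": "a2", "book_id": "b1", "publisher_id": "p1"},
    {"article_id": "a1", "book_id": "b2", "publisher_id": "p2"},
]


def project(attributes, relation):
    """Projection: keep only the named attributes, dropping duplicate tuples."""
    seen = set()
    result = []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


# Projection applied to the result of a selection on publisher_id = p1.
p1_tuples = [t for t in publish if t["publisher_id"] == "p1"]
articles_p1 = project(["article_id"], p1_tuples)

# Projecting book_id over the whole relation removes the repeated b1.
books = project(["book_id"], publish)
print(articles_p1)
print(books)
```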
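An intersection operation, $R \cap S$, returns the tuples that appear in both $R$ and $S$.
We have seen an example of this earlier.
A union operation, $R \cup S$, returns the tuples that appear in $R$, in $S$, or in both.
A difference operation, $R - S$, returns the tuples that appear in $R$ but not in $S$.
All three operations require $R$ and $S$ to have the same attributes.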
A cartesian product operation, $R \times S$, pairs every tuple of $R$ with every tuple of $S$.
For example, consider $R$

A | B |
---|---|
a1 | 101 |
a2 | 102 |

and $S$

C | D |
---|---|
a3 | 103 |
a4 | 104 |

Then $R \times S$ is

R.A | R.B | S.C | S.D |
---|---|---|---|
a1 | 101 | a3 | 103 |
a1 | 101 | a4 | 104 |
a2 | 102 | a3 | 103 |
a2 | 102 | a4 | 104 |
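The product above can be reproduced with `itertools.product`. This is a sketch under the assumption that relations are lists of dicts. The helper `cartesian_product` and its `R.`/`S.` prefixing scheme are illustrative choices that mirror the column headers of the result table.

```python
from itertools import product

r = [{"A": "a1", "B": 101}, {"A": "a2", "B": 102}]
s = [{"C": "a3", "D": 103}, {"C": "a4", "D": 104}]


def cartesian_product(r, s, r_name="R", s_name="S"):
    """Pair every tuple of r with every tuple of s, qualifying attribute names."""
    return [
        {
            **{f"{r_name}.{k}": v for k, v in t.items()},
            **{f"{s_name}.{k}": v for k, v in u.items()},
        }
        for t, u in product(r, s)
    ]


rows = cartesian_product(r, s)
for row in rows:
    print(row)
```

The result has `len(r) * len(s)` tuples, which is why joins with a condition are usually preferred on large relations.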
Cartesian Product is one of the four possible join operators.
Let's discuss the other three.
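The inner join operator, $R \bowtie_{P} S$, returns the tuples of $R \times S$ that satisfy the predicate $P$; that is, it is equivalent to $\sigma_{P}(R \times S)$.
Let $R$ be

A | B | C |
---|---|---|
a1 | 101 | 0 |
a2 | 102 | 1 |
a3 | 103 | 0 |

and $S$ be

D | E | F |
---|---|---|
a3 | 103 | 'a' |
a1 | 107 | 'b' |
a5 | 105 | 'c' |

Then $R \bowtie_{R.A = S.D} S$ is

R.A | R.B | R.C | S.D | S.E | S.F |
---|---|---|---|---|---|
a1 | 101 | 0 | a1 | 107 | 'b' |
a3 | 103 | 0 | a3 | 103 | 'a' |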
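An inner join can be sketched as a filtered cartesian product. The code assumes relations are lists of dicts with disjoint attribute names, so merging two dicts loses nothing. `inner_join` is a hypothetical helper.

```python
r = [
    {"A": "a1", "B": 101, "C": 0},
    {"A": "a2", "B": 102, "C": 1},
    {"A": "a3", "B": 103, "C": 0},
]
s = [
    {"D": "a3", "E": 103, "F": "a"},
    {"D": "a1", "E": 107, "F": "b"},
    {"D": "a5", "E": 105, "F": "c"},
]


def inner_join(r, s, predicate):
    """Keep the pairs (t, u) from r x s for which predicate(t, u) holds."""
    return [{**t, **u} for t in r for u in s if predicate(t, u)]


# Join condition R.A = S.D, as in the example tables.
joined = inner_join(r, s, lambda t, u: t["A"] == u["D"])
for row in joined:
    print(row)
```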
The natural join operator, $R \bowtie S$, joins $R$ and $S$ on equality of all attributes that have the same name in both relations.
Note that the common columns are merged after a natural join.
Let $R$ be

A | B | C |
---|---|---|
a1 | 101 | 0 |
a2 | 102 | 1 |
a3 | 103 | 0 |

and $S$ be

A | E | F |
---|---|---|
a3 | 103 | 'a' |
a1 | 107 | 'b' |
a5 | 105 | 'c' |

Then $R \bowtie S$ is

R.A | R.B | R.C | S.E | S.F |
---|---|---|---|---|
a1 | 101 | 0 | 107 | 'b' |
a3 | 103 | 0 | 103 | 'a' |
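A natural join differs from the inner join only in that the join condition is derived from the shared attribute names. The sketch below assumes relations are non-empty lists of dicts whose keys are the attribute names. `natural_join` is a hypothetical helper.

```python
r = [
    {"A": "a1", "B": 101, "C": 0},
    {"A": "a2", "B": 102, "C": 1},
    {"A": "a3", "B": 103, "C": 0},
]
s = [
    {"A": "a3", "E": 103, "F": "a"},
    {"A": "a1", "E": 107, "F": "b"},
    {"A": "a5", "E": 105, "F": "c"},
]


def natural_join(r, s):
    """Join on equality of all attributes that r and s have in common."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    # Merging the dicts keeps a single copy of each common attribute.
    return [
        {**t, **u}
        for t in r
        for u in s
        if all(t[a] == u[a] for a in common)
    ]


joined = natural_join(r, s)
for row in joined:
    print(row)
```

When the two relations share no attribute names, the condition is vacuously true and the sketch degenerates to the cartesian product, which matches the usual definition.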
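A right outer join of $R$ and $S$ on a predicate $P$ returns every tuple of the inner join, plus every tuple of $S$ that matches no tuple of $R$, padded with NULL in the attributes of $R$.
Let $R$ be

A | B | C |
---|---|---|
a1 | 101 | 0 |
a2 | 102 | 1 |
a3 | 103 | 0 |

and $S$ be

D | E | F |
---|---|---|
a3 | 103 | 'a' |
a1 | 107 | 'b' |
a5 | 105 | 'c' |

Then the right outer join of $R$ and $S$ on $R.A = S.D$ is

R.A | R.B | R.C | S.D | S.E | S.F |
---|---|---|---|---|---|
a1 | 101 | 0 | a1 | 107 | 'b' |
a3 | 103 | 0 | a3 | 103 | 'a' |
NULL | NULL | NULL | a5 | 105 | 'c' |

Likewise for the left outer join, which keeps the unmatched tuples of $R$ instead.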
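The padding behaviour of a right outer join can be sketched as follows. The assumptions are that relations are lists of dicts, that Python's `None` stands in for NULL, and that the caller passes the attribute names of the left relation explicitly. `right_outer_join` is a hypothetical helper.

```python
r = [
    {"A": "a1", "B": 101, "C": 0},
    {"A": "a2", "B": 102, "C": 1},
    {"A": "a3", "B": 103, "C": 0},
]
s = [
    {"D": "a3", "E": 103, "F": "a"},
    {"D": "a1", "E": 107, "F": "b"},
    {"D": "a5", "E": 105, "F": "c"},
]


def right_outer_join(r, s, predicate, r_attrs):
    """Inner join plus unmatched tuples of s, padded with None (NULL) on r's side."""
    result = []
    for u in s:
        matches = [{**t, **u} for t in r if predicate(t, u)]
        if matches:
            result.extend(matches)
        else:
            # No partner in r: keep u and fill r's attributes with None.
            result.append({**{a: None for a in r_attrs}, **u})
    return result


joined = right_outer_join(r, s, lambda t, u: t["A"] == u["D"], ["A", "B", "C"])
for row in joined:
    print(row)
```

The rows come out in the order of `s` rather than the order of the table above; since a relation is a set of tuples, the two results are the same relation. A left outer join can be obtained by swapping the roles of `r` and `s`.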
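A renaming operation, $\rho_{S(B_1,...,B_n)}(R)$, renames the relation $R$ to $S$ and its attributes to $B_1,...,B_n$.
We omit the attribute names when they are unchanged, writing $\rho_{S}(R)$.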
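Attribute renaming can be sketched as a key substitution over every tuple. The encoding (a list of dicts) and the helper `rename` are assumptions for illustration. The sample relation with a `COUNT(B)` column is made up to show a computed column receiving a readable name.

```python
def rename(mapping, relation):
    """Rename attributes according to mapping; unmapped attributes are kept."""
    return [{mapping.get(k, k): v for k, v in t.items()} for t in relation]


# Hypothetical input: a relation with an unnamed aggregate column.
counts = [{"C": 0, "COUNT(B)": 2}, {"C": 1, "COUNT(B)": 1}]
renamed = rename({"COUNT(B)": "CNT"}, counts)
print(renamed)
```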
An aggregation operation, ${}_{A_1,...,A_n} \gamma_{F_1(B_1),...,F_m(B_m)}(R)$, takes
- $A_1,...,A_n$ as the attribute names to group by, and
- $F_1(B_1),...,F_m(B_m)$ as the aggregated values, where
    - $F_1, ..., F_m$ are aggregation functions such as `SUM()`, `AVG()`, `MIN()`, `MAX()`, `COUNT()`, and
    - $A_1, ..., A_n$, $B_1, ..., B_m$ are attributes from $R$.
For example, given $R$

A | B | C |
---|---|---|
a1 | 101 | 0 |
a2 | 102 | 1 |
a3 | 103 | 0 |

grouping $R$ by $C$ and counting $B$ in each group, with the count renamed to $CNT$, produces

C | CNT |
---|---|
0 | 2 |
1 | 1 |
Sometimes we can rewrite the above expression as $${}_{C} \gamma_{{\tt COUNT}(B)\ {\tt as}\ CNT}(R)$$ without using the renaming operator.
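Grouping and aggregation can be sketched in Python with a dictionary of groups. The helper `aggregate` and its parameter order are assumptions for illustration. It handles a single aggregate function and names the output column directly, in the spirit of the `as` shorthand.

```python
from collections import defaultdict

r = [
    {"A": "a1", "B": 101, "C": 0},
    {"A": "a2", "B": 102, "C": 1},
    {"A": "a3", "B": 103, "C": 0},
]


def aggregate(group_by, out_name, agg_fn, attr, relation):
    """Group tuples by the group_by attributes and aggregate attr per group."""
    groups = defaultdict(list)
    for t in relation:
        key = tuple(t[a] for a in group_by)
        groups[key].append(t[attr])
    return [
        {**dict(zip(group_by, key)), out_name: agg_fn(values)}
        for key, values in groups.items()
    ]


# Group by C and count the B values in each group, naming the result CNT.
cnt = aggregate(["C"], "CNT", len, "B", r)
print(cnt)
```

Passing `sum`, `min` or `max` in place of `len` gives the corresponding `SUM()`, `MIN()` and `MAX()` aggregates.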
As an alternative to relational algebra, relational calculus serves a similar purpose. The difference between them is that relational algebra is more procedural (like C, Java and Python), whereas relational calculus is more declarative (like CSS and SQL). If you are interested in finding out more, please refer to the textbooks. Note that relational calculus will not be assessed in this module.