Lecture 9: Database Primer 1
Introduction to Databases
Types of Databases
There are five categories of databases which are useful to consider: Paper-based, Flat, Relational, Object, and Hybrid. It is important to realize that databases have been around for a long time (here is Wikipedia's cut on that), but in a paper-based form that required creating data sheets in duplicate, triplicate, and even quadriplicate in order to store it efficiently for quick retrieval. A flat database is one where the data is stored in one big file and read into arrays that are managed by the calling code (in our project 2, we looked at a whole file directory instead of one file but we treated the directory as if it were a flat database).
Relational databases are special databases that follow the mathematical proofs of Yourdon Codd. Codd proved mathematically that his storage and retrieval methods were the fastest way to store data possible given short text or numeric units of storage (the fundimental unit being called a field value). Because he so elegantly promoted his proofs, a whole huge electronic database industry grew up that was extremely consistent across database management system designers. Such a consistentcy is rare in many technological tool designs as usually designers compete over the implementation. Because of the consistency, the database management system industry had little resistance to creating a Structured Query Language (SQL) for all databases to incorporate as their interface. Society has benefited tremendously from that happening - compare the state of the art in databases to the state of the art in Operating Systems, for example -> much harder to change your operating system than your relational database management system (RDMS).
Although big business usually uses data that requires only numbers and small identifiers to make decisions, the expansion of electronic data capture has moved data storage into the realm of medics/dentists (digital x-rays), artists (digital art), home users (digital photography) and almost everyone. The data we store in those capacities is quite different than the data big business stored in the 1970s and 1980s (before the graphical computer drove data standards into new markets). And, the newer graphical data we store is not proven as optimal in the relational model. Instead, database management systems manufacturers are again competing in developing systems that improve graphical data storage. The generic, but often used, word for a database that can store all kinds of digital information is object. An object database can usually store large files in its guts in innovative ways for minimizing size while increasing retrieval speed.
A hybrid database management system is one that uses numbers and small text identifiers to tag all objects in ways that the relational services can find and explore the data quickly and the object services can provide the larger fields quickly on demand. Those large fields are stored as Binary Large OBjects (BLOBs). The word blob is thrown around a lot and just means a string of bits that can be retrieved based on its associative descriptive fields. We've used the word tag recently to refer to those identifiers as describing digital content.
Database Role in the Big Picture of the Web
Ideally, because databases are so fast and efficient, we perhaps should be working towards storing all our Web content in databases (or so many people think). The database has demonstrated much promise as a structured storage medium, and that structure would let us more easily incorporate new Web-based visions such as the Semantic Web. Almost all database management systems contain the same services (plus some bonus features they all compete with) and therefore we can put our data into any database management system and still communicate across databases. There are many flavors of XML databases which are really a nice transition from static HTML to static XML to XML database. The XML databases use a relational engine at their core in order to take advantage of the mathematical properties of a relational database. And, so, although XML databases are good to learn, they are probably not the best place to start. Dive into XML databases on your own after you have learned MySQL in this class - just to see the differences and get a sense of whether you are a promoter or an antagonist to that vision.
Database Tables as Columns and Rows
All relational databases can be thought of as a collection of virtual spreadsheets (which is why a spreadsheet like Excel can export so readily into a relational database management system format). Each virtual spreadsheet exists in computer memory as a table and consists of records (which go across line by line like a row does) and attributes (which go column by column). These terms are entrenched in database discussions. Some tables are stored in the electronic database literally as they are produced in memory (so you can load them and store them directly). Other tables are generated from multiple tables in the electronic database and so aren't stored on a hard drive as they are kept in memory. There is a huge relational database theory concept that dictates how these tables should be stored and generated. We could spend 20 weeks just on that (similar to how we could spend 20 weeks on arrays). I recommend you do that over the course of your time working with dynamic Web pages - but, you don't have to learn it all at once. In this class, you'll just learn the basics to motivate you to learn more and yet be productive with dynamic websites without understanding everything.
Database Records and Field Data Types
Relational databases are row dominant, meaning that you'll find data fields stored closer together on the hard drive based on the row they are in than the column. This makes sense since you are most likely interested in the bundle of fields in a row than a column. For example, in a card catalog database at a library, you are more likely interested in knowing all the attributes of a book (title, author, date, dewey decimal number, etc.) you are interested in than all the titles in the library. Still, being able to access all fields in a column is useful for searching purposes (where you want to find all books written by your favorite author of late). Database fields have data types just like PHP variables have data types. You need to learn about data types to be an efficient data manager. And yet, efficiency is nowhere as important today as it was years back when relational theory was being developed. The cost of storage is less than a penny on the dollar compared to a dollar's storage back then. People store data very inefficiently now out of laziness and/or tight deadlines. Still, if you learn about data types (which is not too difficult and relates to many other things in computing) you can store your data more efficiently without losing any time in doing so.
Structured Query Language and MySQL as one Implementation
MySQL is just one of many, many database management systems that can understand SQL. This is great news as you can learn SQL with MySQL in this class and then use it with all the other popular databases in production (Oracle, IBM's DB2, Microsoft's SQL Server, Postgres, INGRES, etc.). MySQL is free for download and is quite capable and so it is just perfect for our class and for you to use when you can choose your own databases management system in the future. Let's be clear it is just one implementation. We'll use a Parts Tutorial and Command Reference from the PostgreSQL documentation site and see how it works just as well when using a MySQL back-end.
Introduction to Project 3 - Ocean Picture of the Day (see oceanproject.org)
We built an OPOD database in class using the following MySQL commands:
mysql> create database opod;
Query OK, 1 row affected (0.04 sec)
mysql> use opod
Database changed
mysql> create table picture (id int, size int, date date, lat double, lon double, type int, owner_id int, name int);
Query OK, 0 rows affected (0.05 sec)
mysql> insert into picture values (0, 52540, '2008-10-12', 45.3, -64.9, 0, 0, '2.jpg');
Query OK, 1 row affected (0.01 sec)
mysql> create table submits (id int, caption text, status int, use_date date, picture_id int, user_id int);
Query OK, 0 rows affected (0.05 sec)
mysql> create table user (id int, first text, last text, email text);
Query OK, 0 rows affected (0.04 sec)
mysql> create table type (id int, description text);
Query OK, 0 rows affected (0.04 sec)
mysql>insert into type values (0, '.jpg');
mysql>insert into type values (1, '.gif');
mysql>insert into type values (2, '.png');
Query OK, 3 rows affected (0.01 sec)
mysql> select * from type;
+------+-------------+
| id | description |
+------+-------------+
| 0 | .jpg |
| 1 | .gif |
| 2 | .png |
+------+-------------+
3 rows in set (0.02 sec)
Here is the interactive MySQL terminal feedback as I created the database for our tutorial from class:
mysql> create database parts;
Query OK, 1 row affected (0.04 sec)
mysql> use parts
Database changed
mysql> create table SUPPLIER (SNO int, SNAME text, CITY text);
Query OK, 0 rows affected (0.07 sec)
mysql> insert into SUPPLIER values (1,'Smith','London');
mysql> insert into SUPPLIER values (2,'Jones','Paris');
mysql> insert into SUPPLIER values (3,'Adams','Vienna');
mysql> insert into SUPPLIER values (4,'Blake','Rome');
Query OK, 4 rows affected (0.01 sec)
mysql> create table SELLS (SNO int, PNO int);
Query OK, 0 rows affected (0.00 sec)
mysql> insert into SELLS values (1,1);
mysql> insert into SELLS values (1,2);
mysql> insert into SELLS values (2,4);
mysql> insert into SELLS values (3,1);
mysql> insert into SELLS values (3,3);
mysql> insert into SELLS values (4,2);
mysql> insert into SELLS values (4,3);
mysql> insert into SELLS values (4,4);
Query OK, 8 rows affected (0.01 sec)
mysql> create table PART (PNO int, PNAME text, PRICE real);
Query OK, 0 rows affected (0.00 sec)
mysql> insert into PART values( 1, 'Screw', 10);
mysql> insert into PART values( 2, 'Nut' , 8);
mysql> insert into PART values(3, 'Bolt', 14);
mysql> insert into PART values(4, 'Cam', 25);
Query OK, 4 rows affected (0.00 sec)
mysql> select * from part;
+------+-------+-------+
| pno | pname | price |
+------+-------+-------+
| 1 | Screw | 10 |
| 2 | Nut | 8 |
| 3 | Bolt | 15 |
| 4 | Cam | 25 |
+------+-------+-------+
4 rows in set (0.00 sec)
|