spacer
Home News Links People Catalog
spacer
activepages
spacer

Week 10: Database Modeling and Querying

Partner Database Example

In class, we looked at a PHP report example (based on the partner data):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-gb" lang="en-gb" >
<head>
<title>Partner Report</title>
</head>
<body>
<center><img src="http://theoceanproject.org/wp-content/themes/ocean/img/header.png" /></center>
<div style="position:absolute;top:240px;left:50%;margin-left:-200px;">
<h1>Number of Partners by Location</h1>
<?php
//Connect To Database
$hostname='localhost';
$username='bdc';
$password='password';
$dbname='partner';

$query = 'SELECT state, COUNT(state) AS count FROM partner GROUP BY state ORDER by count DESC, state ASC';
//echo $query;

mysql_connect($hostname, $username, $password) OR DIE (mysql_error());
mysql_select_db($dbname);

$result = mysql_query($query);
if($result) {
    echo '<table>';
    while($row = mysql_fetch_array($result)) {
        echo '<tr><td>'.$row['state'].'</td><td>'.$row['count'].'</td></tr>';
        echo "\n";
    }
    echo '</table>';
}
?>
</div>
</body>
</html>

Other Partner Queries of Interest

SELECT state, COUNT(state) AS count FROM partner WHERE length(state) = 2 GROUP BY state ORDER by count DESC, state ASC

SELECT title FROM partner WHERE state='HI' AND year >= 2007

SELECT title FROM partner WHERE state=(SELECT state FROM partner WHERE year = 2012 GROUP BY state ORDER BY COUNT(state) DESC LIMIT 1) AND year = 2012

Database Modeling

Wikipedia has a very good page on Data Modeling.

One goal of a good data modeling session is to create or refine an Entity-Relationship diagram. There are many examples of E-R diagrams on the Web like this one here.

OPOD Querying

We downloaded the OPOD database SQL export with representative data for a year. We ran the SQL commands contained in the download in the SQL tab of our phpmyadmin interface to the mysql services in the MAMP bundle we had installed in the classroom. We then played with the following SQL queries:

Query Examples

We looked at the top state/countries from which pictures had been sent:


SELECT state, count(state) 
FROM pic 
GROUP BY state 
ORDER BY count(state) DESC
We looked at the top state/countries for which pictures had been used in the OPOD service:

SELECT state, count(state) 
FROM pic, submits 
WHERE status=1 AND pic.id = submits.picture_id
GROUP BY state 
ORDER BY count(state) DESC
We looked at the distribution of first names of picture submitters:

SELECT first, count(first) as A
FROM submits, user 
WHERE user.id = submits.user_id
GROUP BY first 
ORDER BY A DESC
We looked at differences in submission behaviors between northern and southern hemispheres (very roughly). Here we look at number of submissions outside of the southern hemisphere summer months:

SELECT count(date)
FROM pic 
WHERE date <= '2010-12-01' AND date >= '2010-03-31' 
AND (state = 'Australia' OR state = 'New Zealand' OR state = 'South Africa')
And here we look at number of submissions inside of the southern hemisphere summer months. Feel free to combine these two queries as one command (perhaps with subqueries, but other options are available) but I don't expect you to master SQL to that level in this class:

SELECT count(date)
FROM pic 
WHERE date > '2010-12-01' OR
date < '2010-03-31' AND (state = 'Australia' OR state = 'New Zealand' OR state = 'South Africa')
Here we look at the average number of days a submission takes to be used from the day the picture was taken (note that mysql provides a DATEDIFF function to properly calculate number of days which turns out to be nearer to 84 than the 17 I guesstimated):

SELECT AVG(DATEDIFF(submits.use_date, date(pic.date)))
FROM pic, submits 
WHERE pic.id = submits.picture_id AND date > '2007-01-01' AND date < '2011-03-10' AND use_date > '2009-09-01' AND submits.status=1
Here we use the mysql SUBSTRING function to get an aggregate count of aall submitters by their email type. Note that 151 users did not want to report their email address:
 
SELECT SUBSTRING(email, -4) as N, count(SUBSTRING(email, -4)) as R
FROM user
GROUP BY N
ORDER BY R desc 
We suggested we would do an analysis of which words were used most often in the captions submitted with OPOD pictures. We're limited by the string processing functions as seen at http://dev.mysql.com/doc/refman/5.0/en/string-functions.html

So, I would process a basic SQL command to get all the captions into a PHP array and then use avialable PHP predefined functions to build another array that stored word counts as I went through each string and parsed them. Well beyond the scope of this course, but you get a sense of what fun that could be:

SELECT caption
FROM submits
We suggested it would be interesting to compare data by gender. We didn't ask for gender in our OPOD submission form so we'd have to stereotype names to make a best guess at the gender of each submitter before doing the analysis. A good reminder that data modeling is important to make sure we capture all attributes we will need to do all useful analyses in the future.