1
|
- Week 7
- LBSC 690
- Information Technology
|
2
|
- How they learned about Skype
- Role of content in an online community
- TerpConnect afs access control lists
|
3
|
- Questions
- Relational database design
- Microsoft Access
|
4
|
- Database
- Collection of data, organized to support access
- Models some aspects of reality
- DataBase Management System (DBMS)
- Software to create and access databases
- Relational Algebra
- Special-purpose programming language
|
5
|
- Field An “atomic” unit of data
- number, string, true/false, …
- Record A collection of related fields
- Table A collection of related records
- Each record is one row in the table
- Each field is one column in the table
- Primary Key The field that identifies a record
- Values of a primary key must be unique
- Database A collection of tables
|
6
|
|
7
|
- Which students are in which courses?
- What do we need to know about the students?
- first name, last name, email, department
- What do we need to know about the courses?
- course ID, description, enrolled students, grades
|
8
|
|
9
|
- Save space
- More rapid updates
- Every fact only needs to be updated once
- More rapid search
- Finding something once is good enough
- Avoid inconsistency
- Changing data once changes it everywhere
|
10
|
- Tables represent “relations”
- Course, course description
- Name, email address, department
- Named fields represent “attributes”
- Each row in the table is called a “tuple”
- The order of the rows is not important
- Queries specify desired conditions
- The DBMS then finds data that satisfies them
|
11
|
|
12
|
- For simple problems (like the homework)
- Start with “binary relationships”
- Pairs of fields that are related
- Group together wherever possible
- Add keys where necessary
- For more complicated problems
- Entity relationship modeling (LBSC 670)
|
13
|
|
14
|
- Data modeling for join is complex
- Useful to start with E-R modeling
- Join are expensive to compute
- Both in time and storage space
- But it is joins that make databases relational
- Projection and restriction also used in flat files
|
15
|
- “Primary Key” uniquely identifies a record
- e.g. student ID in the student table
- “Compound” primary key
- Synthesize a primary key with a combination of fields
- e.g., Student ID + Course ID in the enrollment table
- “Foreign Key” is primary key in the other table
- Note: it need not be unique in this table
|
16
|
|
17
|
|
18
|
- Graphical visualization of the data model
- Entities are captured in boxes
- Relationships are captured using arrows
|
19
|
|
20
|
- What questions must you answer?
- What data is needed to generate the answers?
- Entities
- Attributes of those entities
- Relationships
- Nature of those relationships
- How will the user interact with the system?
- Relating the question to the available data
- Expressing the answer in a useful form
|
21
|
|
22
|
- Entities
- Types
- Subtypes (disjoint / overlapping)
- Attributes
- Identifier
- Relationships
- Cardinality
- Existence
- Degree
|
23
|
|
24
|
- Pick a primary key for each entity
- Build the tables
- One per entity
- Plus one per M:M relationship
- Choose terse but memorable table and field names
- Check for parsimonious representation
- Relational “normalization”
- Redundant storage of computable values
- Implement using a DBMS
|
25
|
- 1NF: Single-valued indivisible (atomic) attributes
- Split “Doug Oard” to two attributes as (“DougR=
21;,
“Oard”)
- Model M:M implement-role relationship with a table
- 2NF: Attributes depend on complete primary key
- (id, impl-role, name)->(id, name)+(id, impl-role)
- 3NF: Attributes depend directly on primary key
- (id, addr, city, state, zip)->(id, addr, zip)+(zip, city, state)=
- 4NF: Divide independent M:M tables
- (id, role, courses) -> (id, role) + (id, courses)
- 5NF: Don’t enumerate derivable combinations
|
26
|
- Persons: id, fname, lname, userid, password
- Contacts: id, ctype, cstring
- Ctlabels: ctype, string
- Students: id, team, mrole
- Iroles: id, irole
- Rlabels: role, string
- Projects: team, client, pstring
|
27
|
|
28
|
- Registrar database must be internally consistent
- Enrolled students must have an entry in student table
- Courses must have a name
- What happens:
- When a student withdraws from the university?
- When a course is taken off the books?
|
29
|
- Conditions that must always be true
- Specified when the database is designed
- Checked when the database is modified
- RDBMS ensures integrity constraints are respected
- So database contents remain faithful to real world
- Helps avoid data entry errors
|
30
|
- Foreign key values must exist in other table
- If not, those records cannot be joined
- Can be enforced when data is added
- Associate a primary key with each foreign key
- Helps avoid erroneous data
- Only need to ensure data quality for primary keys
|
31
|
- Natural language
- Goal is ease of use
- e.g., Show me the last names of students in CLIS
- Ambiguity sometimes results in errors
- Structured Query Language (SQL)
- Consistent, unambiguous interface to any DBMS
- Simple command structure:
- e.g., SELECT Last name FROM Students WHERE Dept=3DCLIS
- Useful standard for inter-process communications
- Visual programming (e.g., Microsoft Access)
- Unambiguous, and easier to learn than SQL
|
32
|
- Create a database called M:\rides.mdb
- File->New->Blank Database
- Specify the fields (columns)
- “Create a Table in Design View”
- Fill in the records (rows)
- Double-click on the i=
con
for the table
|
33
|
- Enter field name
- Must be unique, but only within the same table
- Select field type from a menu
- Use date/time for times
- Use text for phone numbers
- Designate primary key (right mouse button)
- Save the table
- That’s when you get to assign a table name
|
34
|
- Open the table
- Enter new data in the bottom row
- A new (blank) bottom row will appear
- Close the table
- No need to “save” – data is stored automatically<=
/li>
|
35
|
- Copy ride.mdb to your M:\ drive
- “Create Query in Design View”
- Choose two tables, Flight and Company
- Pick each field you need using the menus
- Unclick “show” to not project
- Enter a criterion to “restrict”
- Save, exit, and reselect to run the query
|
36
|
- Joins are automatic if field names are same
- Otherwise, drag a line between the fields
- Sort order is easy to specify
- Queries form the basis for reports
- Reports give good control over layout
- Use the report wizard - the formats are complex
|
37
|
- Forms manage input better than raw tables
- Invalid data can be identified when input
- Graphics can be incorporated
|
38
|
- Project chooses columns
- Restrict chooses rows
- Based on their contents
- e.g. department ID =3D “HIST”
- These can be specified together
- SELECT Student ID, Dept WHERE Dept =3D “History”
|
39
|
- Each SELECT contains a single WHERE
- Numeric comparison
- Boolean operations
- e.g., Name =3D “John” AND Dept <> “HISTR=
21;
|
40
|
- Some typical database applications:
- Banking (e.g., saving/checking accounts)
- Trading (e.g., stocks)
- Airline reservations
- Characteristics:
- Lots of data
- Lots of concurrent access
- Must have fast access
- “Mission critical”
|
41
|
- Thought experiment: You and your project partner are editing the same
file…
- Scenario 1: you both save it at the same time
- Scenario 2: you save first, but before it’s done saving, your
partner saves
|
42
|
- Possible actions on a checking account
- Deposit check (read balance, write new balance)
- Cash check (read balance, write new balance)
- Scenario:
- Current balance: $500
- You try to deposit a $50 check and someone tries to cash a $100 che=
ck
at the same time
- Possible sequences: (what happens in each case?)
|
43
|
- Transaction: sequence of grouped database actions
- e.g., transfer $500 from checking to savings
- “ACID” properties
- Atomicity
- Consistency
- Each transaction must take the DB between consistent states.
- Isolation:
- Concurrent transactions must appear to run in isolation
- Durability
- Results of transactions must survive even if systems crash
|
44
|
- Idea: keep a log (history) of all actions carried out while executing
transactions
- Before a change is made to the database, the corresponding log entr=
y is
forced to a safe location
- Recovering from a crash:
- Effects of partially executed transactions are undone
- Effects of committed transactions are redone
|
45
|
- Databases are a good choice when you have
- Lots of data
- A problem that contains inherent relationships
- Design before you implement
- This is just another type of programming
- The mythical person-month applies!
- Join is the most important concept
- Project and restrict just remove undesired stuff
|
46
|
- On a sheet of paper, answer the following (ungraded) question (no
names, please):
- What was the muddiest point in today’s class?
|