If you’re planning to take the SY0-501 Security+ exam, you should have a basic understanding of database concepts and several of the secure coding techniques and attacks that apply directly to databases.
For example, can you answer this question?
Q. Database administrators have created a database used by a web application. However, testing shows that the application is taking a significant amount of time accessing data within the database. Which of the following actions is MOST likely to improve the overall performance of a database?
A. Normalization
B. Client-side input validation
C. Server-side input validation
D. Obfuscation
More importantly, do you know why the correct answer is correct and the incorrect answers are incorrect? The answer and explanation are available at the end of this post.
Many attacks target server applications such as those hosted on web servers. Web servers are highly susceptible to several types of attacks, such as buffer overflow attacks and SQL injection attacks, because they commonly accept data from users. Other servers are susceptible to some types of command injection attacks.
Database Concepts
Several of the secure coding techniques and attacks apply directly to databases. Structured Query Language (SQL, pronounced as “sequel” or “es-que-el”) is used to communicate with databases. SQL statements read, insert, update, and delete data in a database. Many web sites use SQL statements to interact with a database, providing users with dynamic content.
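As a minimal sketch of the four operations, the following uses Python's built-in sqlite3 module with an in-memory database. The Book table, its columns, and the sample values here are assumptions made for illustration and are not taken from the figure.

```python
import sqlite3

# In-memory database; the table layout and sample values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Book (BookID INTEGER PRIMARY KEY, Title VARCHAR(100), Price DECIMAL(6,2))"
)

# Insert a row.
conn.execute(
    "INSERT INTO Book (BookID, Title, Price) VALUES (?, ?, ?)",
    (1, "Sample Title", 29.99),
)

# Read the row back.
rows_after_insert = conn.execute(
    "SELECT Title, Price FROM Book WHERE BookID = ?", (1,)
).fetchall()

# Update the price.
conn.execute("UPDATE Book SET Price = ? WHERE BookID = ?", (24.99, 1))
rows_after_update = conn.execute("SELECT Title, Price FROM Book").fetchall()

# Delete the row.
conn.execute("DELETE FROM Book WHERE BookID = ?", (1,))
remaining = conn.execute("SELECT COUNT(*) FROM Book").fetchone()[0]
```

The `?` placeholders pass values separately from the SQL text, which is the usual way to call these statements from application code.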
A database is a structured set of data. It typically includes multiple tables and each table holds multiple columns and rows. As an example, consider the figure. It shows the database schema for a database intended to hold information about books and their authors. It includes two incorrect entries, which are described in this blog.
Database schema
The Book table (on the left) identifies the column names for the table. Each of these columns has a name and identifies the data type or attribute type allowed in the column. For example, INT represents integer, VARCHAR represents a variable number of alphanumeric characters, TEXT is used for paragraphs, and DECIMAL can store monetary values.
The Author table holds information on authors, such as their names and addresses. The BookAuthor table creates a relationship between the Book table and the Author table. The Publisher column in the BookAuthor table should not be there, but it helps describe normalization in the next section.
The figure shows three rows of the Author table. It also shows the difference between columns and rows. Because the column identifies the data type, columns are sometimes referred to as attributes. Also, because each row represents a record, rows are sometimes called records or tuples.
Database table
Individual elements within a database are called fields. For example, the field in the second row of the FirstName column holds the value Moe.
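The terms column, row, and field can be seen in a short sketch using Python's sqlite3 module. Only the FirstName column and the value Moe in the second row come from the text above. The AuthorID and LastName columns and the other sample values are placeholders assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Author (AuthorID INTEGER PRIMARY KEY, "
    "FirstName VARCHAR(45), LastName VARCHAR(45))"
)

# Three rows (records). Only "Moe" in the second row comes from the text;
# the remaining values are placeholders.
conn.executemany(
    "INSERT INTO Author VALUES (?, ?, ?)",
    [(1, "Larry", "Example"), (2, "Moe", "Example"), (3, "Curly", "Example")],
)

cursor = conn.execute(
    "SELECT AuthorID, FirstName, LastName FROM Author ORDER BY AuthorID"
)
columns = [description[0] for description in cursor.description]  # column (attribute) names
rows = cursor.fetchall()                                           # rows (records, tuples)

# A field is the single value where one row meets one column.
second_row = rows[1]
field = second_row[columns.index("FirstName")]
```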
Normalization
Normalization of a database refers to organizing the tables and columns to reduce redundant data and improve overall database performance. Although there are several normal forms, the first three are the most important.
First Normal Form
A database is in first normal form (1NF) if it meets the following three criteria:
- Each row within a table is unique and identified with a primary key.
- Related data is contained in a separate table.
- None of the columns include repeating groups.
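The first criterion can be sketched with Python's sqlite3 module: once a column is declared as the primary key, the database rejects a second row with the same key value. The two-column Author table here is a simplified assumption, not the full table from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified Author table (assumed layout); AuthorID is the primary key.
conn.execute(
    "CREATE TABLE Author (AuthorID INTEGER PRIMARY KEY, FirstName VARCHAR(45))"
)
conn.execute("INSERT INTO Author VALUES (?, ?)", (1, "Moe"))

# A second row reusing AuthorID 1 violates the primary key constraint.
try:
    conn.execute("INSERT INTO Author VALUES (?, ?)", (1, "Duplicate"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Author").fetchone()[0]
```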
Second Normal Form
Second normal form (2NF) only applies to tables that have a composite primary key, where two or more columns make up the full primary key. The BookAuthor table has a composite key that includes the Book_BookID column and the Author_AuthorID column. A database is in 2NF if it meets the following criteria:
- It is in 1NF.
- Non-primary key attributes are completely dependent on the composite primary key. If any column is dependent on only one column of the composite key, it is not in 2NF.
The BookAuthor table shown in the figure violates this with the Publisher column. A book has a unique publisher so the publisher is related to the Book_BookID column. However, an author can publish books through multiple publishers, so the publisher value is not dependent on the Author_AuthorID column.
Notice that the Book table correctly has the Publisher column, so the easy fix to have this database in 2NF is to delete the Publisher column in the BookAuthor table.
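A sketch of the corrected design in Python's sqlite3 module follows. The BookAuthor table keeps only its composite key, and the publisher is looked up through the Book table. The Book_BookID, Author_AuthorID, and Publisher names come from the text; the remaining columns and all sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Book (
    BookID INTEGER PRIMARY KEY,
    Title VARCHAR(100),
    Publisher VARCHAR(100)
);
CREATE TABLE Author (
    AuthorID INTEGER PRIMARY KEY,
    FirstName VARCHAR(45)
);
-- No Publisher column here: it would depend on only part of the composite key.
CREATE TABLE BookAuthor (
    Book_BookID INTEGER REFERENCES Book(BookID),
    Author_AuthorID INTEGER REFERENCES Author(AuthorID),
    PRIMARY KEY (Book_BookID, Author_AuthorID)
);
INSERT INTO Book VALUES (1, 'Book One', 'Publisher A'), (2, 'Book Two', 'Publisher B');
INSERT INTO Author VALUES (1, 'Moe');
INSERT INTO BookAuthor VALUES (1, 1), (2, 1);
""")

# One author, two books, two publishers: the publisher follows the book, not the author.
publishers = conn.execute(
    """
    SELECT b.Publisher
    FROM BookAuthor AS ba
    JOIN Book AS b ON b.BookID = ba.Book_BookID
    WHERE ba.Author_AuthorID = ?
    ORDER BY b.BookID
    """,
    (1,),
).fetchall()
```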
Third Normal Form
Third normal form (3NF) helps eliminate unnecessary redundancies within a database. A database is in 3NF if it meets the following criteria:
- It is in 2NF. This implies it is also in 1NF.
- All columns that aren’t primary keys are only dependent on the primary key. In other words, none of the columns in the table are dependent on non-primary key attributes.
The Book table violates the second rule of 3NF with the PublisherCity column. The city where the publisher is located is dependent on the publisher, not the book. Imagine this table had 100 book entries from the same publisher located in Virginia Beach. When entering the data, you’d need to repeatedly enter Virginia Beach for this publisher.
There are two ways to fix this. First, ask if the city is needed. If not, delete the column and the database is now in 3NF. If the city is needed, you can create another table with publisher data. You would then relate the Publisher table with the Book table.
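The second fix might look like the following sketch in Python's sqlite3 module. The Publisher table and its PublisherID, Name, and City columns, along with the Publisher_PublisherID column in the Book table, are hypothetical names chosen for illustration. With this layout, Virginia Beach is stored once no matter how many books the publisher has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical Publisher table: the city depends on the publisher, so it lives here.
CREATE TABLE Publisher (
    PublisherID INTEGER PRIMARY KEY,
    Name VARCHAR(100),
    City VARCHAR(100)
);
-- Book refers to the publisher by key instead of repeating its name and city.
CREATE TABLE Book (
    BookID INTEGER PRIMARY KEY,
    Title VARCHAR(100),
    Publisher_PublisherID INTEGER REFERENCES Publisher(PublisherID)
);
""")

conn.execute("INSERT INTO Publisher VALUES (1, 'Example Publisher', 'Virginia Beach')")
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?)",
    [(book_id, f"Book {book_id}", 1) for book_id in range(1, 101)],
)

# The city is entered once in the Publisher table...
city_entries = conn.execute(
    "SELECT COUNT(*) FROM Publisher WHERE City = 'Virginia Beach'"
).fetchone()[0]

# ...yet every book can still be matched to it through the relationship.
books_with_city = conn.execute(
    """
    SELECT COUNT(*)
    FROM Book AS b
    JOIN Publisher AS p ON p.PublisherID = b.Publisher_PublisherID
    WHERE p.City = 'Virginia Beach'
    """
).fetchone()[0]
```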
Remember this
Normalization is a process used to optimize databases. While there are several normal forms available, a database is considered normalized when it conforms to the first three normal forms.
Q. Database administrators have created a database used by a web application. However, testing shows that the application is taking a significant amount of time accessing data within the database. Which of the following actions is MOST likely to improve the overall performance of a database?
A. Normalization
B. Client-side input validation
C. Server-side input validation
D. Obfuscation
Answer is A. Normalization techniques organize tables and columns in a database and improve overall database performance. None of the other answers improve the database performance.
Input validation techniques help prevent many types of attacks, and server-side input validation techniques are preferred over client-side techniques.
Obfuscation techniques make the code more difficult to read.
See Chapter 7 of the CompTIA Security+: Get Certified Get Ahead: SY0-501 Study Guide or Chapter 7 of the CompTIA Security+: Get Certified Get Ahead: SY0-401 Study Guide for more information on application attacks.