Chapter 1: Structured Query Language for SQL Injection
Today's world relies on the concept of cyberspace every day: the internet allows people all around the globe to connect to computers in any part of the world. This enables instant fruition of many services that rely on a plethora of technologies, protocols, and mechanisms that constitute the basis for whatever is available on the World Wide Web. Unfortunately, the theme of security is relevant for this intricate web of connections and services in the same way it is for the real world.
Malicious agents perform attacks against computers worldwide every day, mostly just for personal gain or advantage. By exploiting online applications and services, in fact, it may be possible to gain control of computers or entire networks, thereby taking advantage of specific of the intrinsic vulnerabilities of some technologies, protocols, frameworks, or just applications. One of the most common β and notorious β ways to do so is through SQL injection, which is a specific type of attack that tries to exploit the syntax of a language used in databases β SQL, which stands for Structured Query Language β in order to access otherwise unobtainable information present on a database, including the ones responsible for account authentication, which contain usernames and passwords used to access services. Despite being a well-known attack, vulnerable applications are still present today, hinting to the fact that, sometimes, security in the context of application development is not considered enough.
This book aims to give insight on the matter of SQL injection by explaining what it is all about both in terms of theory and practice.
This chapter serves as a theoretical introduction to the matter, describing at a high-level what SQL is, what it is used for, and its possible weaknesses that lead to SQL injection. This theoretical overview is crucial in order to understand the concepts behind SQL injection that will be further explored in the next chapter.
After introducing the concepts of databases, database management systems and models, queries, and SQL specifically, aspects of syntax and logic will be covered, quickly showing the main constructs and items that can lead to security weaknesses in the use of SQL, ultimately leading to the core matter: SQL injection.
In this chapter, the following topics will be covered:
- An overview of SQL β a relational query language: A preliminary overview of SQL, our main language of reference that SQL injection is traditionally based on, and the relational model versus other DBMS models.
- Syntax and logic of SQL: An explanation of the main concepts and constructs behind SQL, some of which could be exploited by malicious attackers.
- Security implications of SQL: A brief introduction to the concept of security in SQL and its use in applications.
- Weakness in the use of SQL: An explanation of the main pitfalls an application relying on SQL can have, highlighting some general advice in secure development.
Technical requirements
For this chapter and the next, the topics we will cover will mostly be theoretical. However, we suggest that you read the SQL technical documentation. Here, we have provided, for reference, the MySQL, Oracle, and Microsoft SQL Server documentation:
- https://dev.mysql.com/doc/refman/8.0/en/
- https://docs.oracle.com/en/database/oracle/oracle-database/index.html
- https://docs.microsoft.com/en-us/sql/sql-server/?view=sql-server-ver15
An overview of SQL β a relational query language
One of the most common ways to keep data memorized in computer systems is by relying on databases. Databases can be seen as large software containers that can hold lots of information in a structured and accessible way, in order to optimize how to store data and access their operations.
Depending on the approach and model used, the way in which this is achieved can vary in terms of implementation. One of the most common ways is to use the relational model, which is based on relational algebra, for which data is a collected as a series of records that describe the relationships that exist among objects. SQL is a query language that is based on such concepts, and it is widely adopted in many database systems. This section will deal with these topics in depth by first explaining database management systems, relational databases, and SQL.
Database management systems and relational databases
The implementation of a database, as we mentioned earlier, relies on an underlying system, or a database nanagement system (DBMS). A DBMS is basically a piece of software responsible for storing, accessing, manipulating and, in general, managing data through a specific definition of the collected and managed information.
For the purpose of this book, we will now divide database systems into two large families to better understand the differences between them. We can distinguish between database models in terms of relational databases and non-relational databases due to the relevance of the relational model in data management.
Relational databases
Relational databases have been widely considered as a standard due to their many advantages. Data is collected in tables, in which rows represent objects, memorized as records, and columns represent their attributes. The name is derived from the way in which data can be correlated and connected, that is, through relations based on common attributes among tables. Thus, the concept of relational algebra becomes relevant as it describes the way in which, through a structured procedural language, data tables can be managed. SQL is the most popular representative of this model as it takes advantage of most of the concepts of relational algebra, thus providing a model that is easy to use by anyone without any coding experience, while maintaining its overall efficiency:
Figure 1.1 β A simple relational schema made up of three tables describing cars and owners, with the IDs (unique) put into a relationship
Non-relational databases
No-rel, which stands for non-relational, databases are a family of DBMS models considered as an alternative to the relational model and are usually much more prominent among database systems. Originally, the term NoSQL was used to define this family of systems, but it was considered misleading: some of the first attempts at building non-relational databases actually used some concepts of the relational model. No-rel databases include many models, some of which are as follows:
- Network databases model the data as connected nodes in a network:
Figure 1.2 β A simple network schema to represent ownership relations between owners and cars
- Graph-based databases highlight the connections among data using a graph-like navigable structure:
Figure 1.3 β The same ownership relation as in the relational example, this time represented in a graph-based model schema
- Object-oriented databases model data as objects, in a similar fashion as in programming languages such as Java:
Figure 1.4 β The ownership relationship represented in an object-oriented model schema
- Document-based databases describe data within documents containing key-value pairs, specify the way in which data is memorized and managed, and provide a flexible approach that does not rely on a defined schema. Document-based models can usually include embedded objects as collections within a single key, as shown in the following image:
Figure 1.5 β A document-bas...