Learning Python for Forensics

Leverage the power of Python in forensic investigations, 2nd Edition

About this book

Design, develop, and deploy innovative forensic solutions using Python

Key Features

  • Discover how to develop Python scripts for effective digital forensic analysis
  • Master the skills of parsing complex data structures with Python libraries
  • Solve forensic challenges through the development of practical Python scripts

Book Description

Digital forensics plays an integral role in solving complex cybercrimes and helping organizations make sense of cybersecurity incidents. This second edition of Learning Python for Forensics illustrates how Python can be used to support these digital investigations, permitting the examiner to automate the parsing of forensic artifacts and spend more time examining actionable data.

The book demonstrates how to develop Python scripts using an iterative design and how to leverage the various built-in and community-sourced forensics scripts and libraries available for Python today. It will help strengthen your analysis skills and efficiency as you creatively solve real-world problems through instruction-based tutorials.

By the end of this book, you will build a collection of Python scripts capable of investigating an array of forensic artifacts and master the skills of extracting metadata and parsing complex data structures into actionable reports. Most importantly, you will have developed a foundation upon which to build as you continue to learn Python and enhance your efficacy as an investigator.

What you will learn

  • Learn how to develop Python scripts to solve complex forensic problems
  • Build scripts using an iterative design
  • Design code to accommodate present and future hurdles
  • Leverage built-in and community-sourced libraries
  • Understand the best practices in forensic programming
  • Learn how to transform raw data into customized reports and visualizations
  • Create forensic frameworks to automate analysis of multiple forensic artifacts
  • Conduct effective and efficient investigations through programmatic processing

Who this book is for

If you are a forensics student, hobbyist, or professional seeking to increase your understanding of forensics through the use of a programming language, then Learning Python for Forensics is for you. You are not required to have previous programming experience to learn and master the content within this book. This material, created by forensic professionals, was written from a practitioner's perspective for examiners who wish to learn programming.

Frequently asked questions

Yes, you can cancel anytime from the Subscription tab in your account settings on the Perlego website. Your subscription will stay active until the end of your current billing period. Learn how to cancel your subscription.
No, books cannot be downloaded as external files, such as PDFs, for use outside of Perlego. However, you can download books within the Perlego app for offline reading on mobile or tablet. Learn more here.
Perlego offers two plans: Essential and Complete
  • Essential is ideal for learners and professionals who enjoy exploring a wide range of subjects. Access the Essential Library with 800,000+ trusted titles and best-sellers across business, personal growth, and the humanities. Includes unlimited reading time and Standard Read Aloud voice.
  • Complete: Perfect for advanced learners and researchers needing full, unrestricted access. Unlock 1.4M+ books across hundreds of subjects, including academic and specialized titles. The Complete Plan also includes advanced features like Premium Read Aloud and Research Assistant.
Both plans are available with monthly, semester, or annual billing cycles.
We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.
Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.
Yes! You can use the Perlego app on both iOS or Android devices to read anytime, anywhere — even offline. Perfect for commutes or when you’re on the go.
Please note we cannot support devices running on iOS 13 and Android 7 or earlier. Learn more about using the app.
Yes, you can access Learning Python for Forensics by Preston Miller, Chapin Bryce in PDF and/or ePUB format, as well as other popular books in Computer Science & Cyber Security. We have over one million books available in our catalogue for you to explore.

Information

Databases in Python

In this chapter, we will leverage databases in our scripts so that we can accomplish meaningful tasks when working with large quantities of data. Using a simple example, we will demonstrate the capabilities and benefits of using a database backend in our Python scripts. We will store file metadata that has been recursively indexed from a given root directory into a database, and then query that database to generate reports. Although this may seem like a simple task, the purpose of this chapter is to showcase the ways we can interact with a database in Python while creating an active file listing.
In this chapter, we will delve into the following topics:
  • The basic design and implementation of SQLite3 databases
  • Working with these databases in Python using built-in and third-party modules
  • Understanding how to recursively iterate through directories in Python
  • Understanding filesystem metadata and the methods for accessing it using Python
  • Crafting CSV and HTML reports for easy review by our end user
The code for this chapter was developed and tested using Python 2.7.15 and Python 3.7.1. The file_lister.py script was developed to work with Python 3.7.1. The file_lister_peewee.py script was developed and tested using both Python 2.7.15 and Python 3.7.1.
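As a preview of the recursive directory walking and metadata collection this chapter builds toward, the following is a minimal standard-library sketch; the /evidence root path and the collect_metadata name are illustrative placeholders, not code from file_lister.py:

import os

def collect_metadata(root):
    # Recursively walk root and yield basic metadata for each file
    for folder, _, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            stats = os.stat(path)
            yield {
                'path': path,
                'size': stats.st_size,       # size in bytes
                'modified': stats.st_mtime,  # last modified time (epoch seconds)
            }

if __name__ == '__main__':
    for entry in collect_metadata('/evidence'):  # placeholder root directory
        print(entry)

Storing each of these records in a database, rather than holding them all in memory, is what lets this approach scale to large directory trees and, as we will see, makes reporting a matter of running queries.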

An overview of databases

Databases provide an efficient means of storing large amounts of data in a structured manner. There are many types of databases, commonly broken into two categories: SQL and NoSQL. SQL (short for Structured Query Language) is designed to be a simple language that allows users to manipulate large datasets stored in a database. The SQL category includes common databases such as MySQL, SQLite, and PostgreSQL. NoSQL databases are also useful and generally store data of varying structures as JSON or XML, both of which were discussed as common serialized data types in the previous chapter.

Using SQLite3

SQLite3 is the latest version of SQLite and is one of the most common databases found in application development. Unlike most other databases, it is stored as a single file and does not require a server instance to be running or installed. For this reason, it is widely used for its portability and is found in many mobile, desktop, and web applications. SQLite3 uses a slightly modified SQL syntax, though of the many SQL variations that exist, it is one of the simpler implementations. Naturally, there are some limitations to this lightweight database: only one writer can be connected to the database at a time, the database is capped at roughly 140 TB of storage, and there is no client-server architecture. Because our application will not execute multiple write statements simultaneously, uses far less than 140 TB of storage, and does not require a client-server setup for distribution, we will be using SQLite for our example in this chapter.
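To see just how lightweight this is, consider the following short sketch using Python's built-in sqlite3 module; the example.db filename is arbitrary. Connecting to a database that does not yet exist simply creates it as a single file on disk, with no server to install or configure:

import sqlite3
import os

conn = sqlite3.connect('example.db')  # creates the file if it does not exist
conn.close()
print(os.path.exists('example.db'))   # True: the entire database is this one file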

Using SQL

Before developing our code, let's take a look at the basic SQL statements we will be using. This will help us understand how we can interact with databases even without Python. In SQL, commands are commonly written in uppercase, although they are case-insensitive. For this exercise, we will use uppercase to improve legibility. All SQL statements must end in a semicolon to execute, as it denotes the end of a statement.
If you would like to follow along, install a SQLite management tool, such as the command-line tool sqlite3. This tool can be downloaded from https://www.sqlite.org/download.html. The output shown in this section has been generated with the sqlite3 command-line tool, though the statements that have been given will generate the same database in most other sqlite3 graphical applications. When in doubt, use the official sqlite3 command-line tool.
To begin, we will create a table, a fundamental component of any database. If we compare a database to an Excel workbook, a table is analogous to a worksheet. Tables contain named columns, as well as rows of data that are mapped to those columns. Just as an Excel workbook may contain multiple worksheets, a database can contain multiple tables. To create a table, we will use the CREATE TABLE command, specifying the table name and then wrapping, in parentheses, the column names and their data types as a comma-separated list. Finally, we end the SQL statement with a semicolon:
>>> CREATE TABLE custodians (id INTEGER PRIMARY KEY, name TEXT);
As we can see in the CREATE TABLE statement, we specify the id and name columns in the custodians table. The id field is an integer and the table's primary key. The INTEGER PRIMARY KEY designation in SQLite3 causes the column to auto-populate with an integer that increments for each added row, giving every row a unique identifier. The name column has the data type TEXT, which allows any characters to be stored as a text string. SQLite supports five data types, two of which we've already introduced:
  • INTEGER
  • TEXT
  • REAL
  • BLOB
  • NULL
The REAL data type allows floating point numbers (for example, decimals). The BLOB (short for Binary Large OBject) data type preserves any input data exactly as is, without casting it as a certain type. The NULL data type simply stores an empty value.
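As a purely hypothetical illustration of how these types might be combined in a forensic context, a file-listing table could look like the following; this is not the schema we will build later in the chapter:

>>> CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT, size INTEGER, mtime REAL, sha256 BLOB);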
After creating the custodians table, we can begin to add data to it. As shown in the following code block, we use the INSERT INTO command to insert data into the table. The syntax specifies the table name and the columns to insert the data into, followed by the VALUES keyword and the values to be inserted. Both the column list and the values must be wrapped in parentheses, as shown in the following code. By supplying null as the value for the id column, we let SQLite's auto-incrementing feature step in and fill in this value with the next available unique integer. Remember that this auto-incrementing only occurs because we designated the column as INTEGER PRIMARY KEY. As a general rule, only one column in a table should have this designation:
>>> INSERT INTO custodians (id, name) VALUES (null, 'Chell');
>>> INSERT INTO custodians (id, name) VALUES (null, 'GLaDOS');
We've inserted two custodians, Chell and GLaDOS, and let SQLite assign an ID to each of them. After the data has been inserted, we can select and view this information using the SELECT command. The basic syntax involves invoking the SELECT command, followed by the columns to select (or an asterisk, *, to designate all columns) and the FROM clause specifying the table name, followed by a trailing semicolon. As we can see in the following code, the sqlite3 command-line tool prints the selected rows as pipe (|) separated values:
>>> SELECT * FROM custodians;
1|Chell
2|GLaDOS
In addition to showing only the desired columns from our table, we can also filter data on one or more conditions. The WHERE clause allows us to filter results and return only responsive items. For the purposes of the script in this chapter, we will stick to a simple WHERE clause and only use the equals operator to return responsive values. When executed, the following SELECT-WHERE statement returns only the custodian information where the id value is 1. Note also that the order of the columns in the output reflects the order in which they were specified in the query:
>>> SELECT name,id FROM custodians WHERE id = 1;
Chell|1
There are more operations and statements available to interact with SQLite3 databases, although the preceding operations highlight all that we require for our scripts. We invite you to explore additional operations in the SQLite3 documentation, which can be found at https://sqlite.org.

Designing our script

The first iteration of our script focuses on performing the task at hand with the standard sqlite3 module in a more manual fashion. This entails writing out each SQL statement and executing it as if we were working with the database directly. Although this is not a very Pythonic manner of handling a database, it demonstrates the methods used to interact with a database from Python. Our second iteration employs two third-party libraries: peewee and jinja2.
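A minimal sketch of this manual approach, reusing the custodians example from the previous section, might look like the following; the database filename is arbitrary:

import sqlite3

conn = sqlite3.connect('custodians.db')  # example filename
cur = conn.cursor()

# Each SQL statement is written out by hand and executed explicitly
cur.execute('CREATE TABLE IF NOT EXISTS custodians (id INTEGER PRIMARY KEY, name TEXT);')
cur.execute('INSERT INTO custodians (id, name) VALUES (null, ?);', ('Chell',))
cur.execute('INSERT INTO custodians (id, name) VALUES (null, ?);', ('GLaDOS',))
conn.commit()

# The ? placeholders bind Python values into the statement safely
for row in cur.execute('SELECT name, id FROM custodians WHERE id = ?;', (1,)):
    print(row)  # ('Chell', 1)

conn.close()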
Peewee is an object-relational mapper (ORM), a term used to describe software that uses objects to handle database operations. In short, an ORM allows the developer to call functions and define classes in Python that are interpreted as database commands. This layer of abstraction helps to standardize database calls and allows multiple database backends to be easily interchanged. Peewee is a lightweight ORM: it is a single Python file that supports PostgreSQL, MySQL, and SQLite3 database connections. If we needed to switch our second script from SQLite3 to PostgreSQL, it would only require modifying a few lines of code; our first script would require more attention to handle the same conversion. That being said, our first version does not require any dependencies beyond the sta...
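For comparison, the following is a minimal peewee sketch of the same custodians example; the filename and model name are our own illustrative choices rather than the ones used in file_lister_peewee.py:

from peewee import SqliteDatabase, Model, TextField

db = SqliteDatabase('custodians_peewee.db')  # example filename

class Custodian(Model):
    # peewee adds an auto-incrementing integer primary key named id by default
    name = TextField()

    class Meta:
        database = db

db.connect()
db.create_tables([Custodian])

# The ORM translates these calls into SQL statements like the ones shown earlier
Custodian.create(name='Chell')
Custodian.create(name='GLaDOS')

for person in Custodian.select().where(Custodian.id == 1):
    print(person.id, person.name)

db.close()

Because the model class, rather than hand-written SQL, describes the table, swapping SqliteDatabase for peewee's PostgresqlDatabase or MySQLDatabase is largely a one-line change.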

Table of contents

  1. Title Page
  2. Copyright and Credits
  3. About Packt
  4. Contributors
  5. Preface
  6. Now for Something Completely Different
  7. Python Fundamentals
  8. Parsing Text Files
  9. Working with Serialized Data Structures
  10. Databases in Python
  11. Extracting Artifacts from Binary Files
  12. Fuzzy Hashing
  13. The Media Age
  14. Uncovering Time
  15. Rapidly Triaging Systems
  16. Parsing Outlook PST Containers
  17. Recovering Transient Database Records
  18. Coming Full Circle
  19. Other Books You May Enjoy