Machine Learning for Cybersecurity Cookbook

Over 80 recipes on how to implement machine learning algorithms for building security systems using Python

Emmanuel Tsukerman

  1. 346 pages
  2. English

About This Book

Learn how to apply modern AI to create powerful cybersecurity solutions for malware, pentesting, social engineering, data privacy, and intrusion detection

Key Features

  • Manage data of varying complexity to protect your system using the Python ecosystem
  • Apply ML to pentesting, malware, data privacy, intrusion detection systems (IDS), and social engineering
  • Automate your daily workflow by addressing various security challenges using the recipes covered in the book

Book Description

Organizations today face a major threat in terms of cybersecurity, from malicious URLs to credential reuse, and having robust security systems can make all the difference. With this book, you'll learn how to use Python libraries such as TensorFlow and scikit-learn to implement the latest artificial intelligence (AI) techniques and handle challenges faced by cybersecurity researchers.

You'll begin by exploring various machine learning (ML) techniques and tips for setting up a secure lab environment. Next, you'll implement key ML algorithms such as clustering, gradient boosting, random forest, and XGBoost. The book will guide you through constructing classifiers and features for malware, which you'll train and test on real samples. As you progress, you'll build self-learning, reliable systems to handle cybersecurity tasks such as identifying malicious URLs, detecting spam email and intrusions, protecting networks, and tracking user and process behavior. Later, you'll apply generative adversarial networks (GANs) and autoencoders to advanced security tasks. Finally, you'll delve into secure and private AI to protect the privacy rights of consumers using your ML models.

By the end of this book, you'll have the skills you need to tackle real-world problems faced in the cybersecurity domain using a recipe-based approach.

What you will learn

  • Learn how to build malware classifiers to detect suspicious activities
  • Apply ML to generate custom malware to pentest your security
  • Use ML algorithms with complex datasets to implement cybersecurity concepts
  • Create neural networks to identify fake videos and images
  • Secure your organization from one of the most common threats – insider threats
  • Defend against zero-day threats by constructing an anomaly detection system
  • Detect web vulnerabilities effectively by combining Metasploit and ML
  • Understand how to train a model without exposing the training data

Who this book is for

This book is for cybersecurity professionals and security researchers who are looking to implement the latest machine learning techniques to boost computer security, and gain insights into securing an organization using red and blue team ML. This recipe-based book will also be useful for data scientists and machine learning developers who want to experiment with smart techniques in the cybersecurity domain. Working knowledge of Python programming and familiarity with cybersecurity fundamentals will help you get the most out of this book.

Information

Year
2019
ISBN
9781838556341
Edition
1

Automatic Intrusion Detection

An intrusion detection system monitors a network or a collection of systems for malicious activity or policy violations. Any malicious activity or violation caught is stopped or reported. In this chapter, we will design and implement several intrusion detection systems using machine learning. We will begin with the classical problem of detecting spam email. We will then move on to classifying malicious URLs. We will take a brief detour to explain how to capture network traffic, so that we may tackle more challenging network problems, such as botnet and DDoS detection. We will construct a classifier for insider threats. Finally, we will address the example-dependent, cost-sensitive, radically imbalanced, and challenging problem of credit card fraud.
This chapter contains the following recipes:
  • Spam filtering using machine learning
  • Phishing URL detection
  • Capturing network traffic
  • Network behavior anomaly detection
  • Botnet traffic detection
  • Feature engineering for insider threat detection
  • Employing anomaly detection for insider threats
  • Detecting DDoS
  • Credit card fraud detection
  • Counterfeit bank note detection
  • Ad blocking using machine learning
  • Wireless indoor localization

Technical requirements

The following are the technical prerequisites for this chapter:
  • Wireshark
  • PyShark
  • costcla
  • scikit-learn
  • pandas
  • NumPy
Code and datasets may be found at https://github.com/PacktPublishing/Machine-Learning-for-Cybersecurity-Cookbook/tree/master/Chapter06.

Spam filtering using machine learning

Spam (unwanted email) constitutes around 60% of global email traffic. Although spam detection software has progressed since the first spam message in 1978, anyone with an email account knows that spam continues to be a time-consuming and expensive problem. Here, we provide a recipe for spam-ham (non-spam) classification using machine learning.

Getting ready

Preparation for this recipe involves installing the scikit-learn package with pip. The command is as follows:
pip install scikit-learn
In addition, extract spamassassin-public-corpus.7z into a folder named spamassassin-public-corpus.
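Before moving on, it can help to confirm that the archive was extracted into the layout the recipe expects. The following is a minimal sketch, assuming the extracted folder contains spam and ham subdirectories (as the paths used later in the recipe suggest). The helper name count_emails is hypothetical and is not part of the book's code.

```python
import os


def count_emails(corpus_root, class_dirs=("spam", "ham")):
    """Return {class_name: number_of_files} for each expected subdirectory.

    Raises FileNotFoundError if an expected subdirectory is missing.
    """
    counts = {}
    for name in class_dirs:
        class_path = os.path.join(corpus_root, name)
        if not os.path.isdir(class_path):
            raise FileNotFoundError(f"Expected directory not found: {class_path}")
        # Count regular files only; nested folders are ignored.
        counts[name] = sum(
            1
            for entry in os.listdir(class_path)
            if os.path.isfile(os.path.join(class_path, entry))
        )
    return counts


# Example usage (assumes the archive has already been extracted):
# print(count_emails("spamassassin-public-corpus"))
```

If either count is zero or a FileNotFoundError is raised, the archive was probably extracted into a different folder than the one the recipe reads from.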

How to do it...

In the following steps, we build a classifier for wanted and unwanted email:
  1. Unzip the spamassassin-public-corpus.7z dataset.
  2. Specify the path of your spam and ham directories:
import os

spam_emails_path = os.path.join("spamassassin-public-corpus", "spam")
ham_emails_path = os.path.join("spamassassin-public-corpus", "ham")
# Label 0 marks spam, label 1 marks ham
labeled_file_directories = [(spam_emails_path, 0), (ham_emails_path, 1)]
  3. Create labels for the two classes and read the emails into a corpus:
email_corpus = []
labels = []

for class_files, label in labeled_file_directories:
    for file in os.listdir(class_files):
        file_path = os.path.join(class_files, file)
        try:
            with open(file_path, "r") as current_file:
                # Flatten each email into a single line of text
                email_content = current_file.read().replace("\n", "")
        except (OSError, UnicodeDecodeError):
            # Skip files that cannot be opened or decoded
            continue
        email_corpus.append(email_content)
        labels.append(label)
  4. Train-test split the dataset:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    email_corpus, labels, test_size=0.2, random_state=11
)
  5. Train an NLP pipeline on the training data:
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn import tree

nlp_followed_by_dt = Pipeline(
    [
        ("vect", HashingVectorizer(input="content", ngram_range=(1, 3))),
        ("tfidf", TfidfTransformer(use_idf=True)),
        ("dt", tree.DecisionTreeClassifier(class_weight="balanced")),
    ]
)
nlp_followed_by_dt.fit(X_train, y_train)
  6. Evaluate the classifier on the testing data:
from sklearn.metrics import accuracy_score, confusion_matrix

y_test_pred = nlp_followed_by_dt.predict(X_test)
print(accuracy_score(y_test, y_test_pred))
print(confusion_matrix(y_test, y_test_pred))
The following is the output:
0.9761620977353993
[[291   7]
 [ 13 528]]
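To read this output, recall that scikit-learn's confusion_matrix puts true labels on the rows and predicted labels on the columns, ordered by label value. With the labels assigned in the recipe (0 for spam, 1 for ham), the first row therefore describes the spam emails in the test set. The sketch below recomputes the accuracy and a few per-class rates from the four printed counts; the variable names are illustrative and not part of the book's code.

```python
# Counts copied from the confusion matrix printed above.
# Rows are true labels, columns are predicted labels (0 = spam, 1 = ham).
spam_as_spam, spam_as_ham = 291, 7
ham_as_spam, ham_as_ham = 13, 528

n_spam = spam_as_spam + spam_as_ham   # true spam emails in the test set
n_ham = ham_as_spam + ham_as_ham      # true ham emails in the test set
total = n_spam + n_ham

accuracy = (spam_as_spam + ham_as_ham) / total
spam_recall = spam_as_spam / n_spam                           # spam that was caught
spam_precision = spam_as_spam / (spam_as_spam + ham_as_spam)  # flagged mail that was spam
ham_flagged_rate = ham_as_spam / n_ham                        # ham wrongly flagged as spam
majority_baseline = max(n_spam, n_ham) / total                # always predicting the larger class

print(f"accuracy:          {accuracy:.4f}")           # 0.9762
print(f"spam recall:       {spam_recall:.4f}")        # 0.9765
print(f"spam precision:    {spam_precision:.4f}")     # 0.9572
print(f"ham flagged rate:  {ham_flagged_rate:.4f}")   # 0.0240
print(f"majority baseline: {majority_baseline:.4f}")  # 0.6448
```

The test set holds 298 spam and 541 ham emails, so a classifier that always answered "ham" would score about 64%. The reported accuracy sits well above that baseline.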

How it works


We start by extracting a dataset of raw emails (Step 1); you can browse the extracted folders to see what the messages look like. In Step 2, we specify the paths of the spam and ham directories and assign a label to each (0 for spam, 1 for ham). In Step 3, we read every email into a list and build a parallel list of labels, skipping any file that cannot be read. Next, we train-test split the dataset (Step 4) and fit an NLP pipeline on the training portion (Step 5): the pipeline hashes word n-grams into feature vectors, reweights them with TF-IDF, and trains a decision tree on the result. Finally, in Step 6, we evaluate the pipeline on the held-out test set, where it reaches an accuracy of about 97.6%. Because the dataset is relatively balanced, accuracy is a reasonable summary here, and there is no need to use special metrics to evaluate success.
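To see what the three pipeline stages do without loading the full corpus, the same kind of pipeline can be fitted on a handful of in-memory strings. This is a minimal sketch, not the book's code: the toy messages and the names toy_emails, toy_labels, and toy_pipeline are made up for illustration, and random_state is set only to make the toy run repeatable.

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy messages; 0 = spam, 1 = ham, matching the recipe's labels.
toy_emails = [
    "win a free prize now click here",
    "cheap pills free shipping click now",
    "meeting moved to tuesday see agenda attached",
    "please review the attached quarterly report",
]
toy_labels = [0, 0, 1, 1]

toy_pipeline = Pipeline(
    [
        # Hash word 1- to 3-grams into a fixed-width sparse vector;
        # no vocabulary is stored, so memory use does not grow with the corpus.
        ("vect", HashingVectorizer(ngram_range=(1, 3))),
        # Reweight the hashed features by inverse document frequency.
        ("tfidf", TfidfTransformer(use_idf=True)),
        # Fit a decision tree on the reweighted features.
        ("dt", DecisionTreeClassifier(class_weight="balanced", random_state=0)),
    ]
)
toy_pipeline.fit(toy_emails, toy_labels)

# Slicing a fitted Pipeline returns the earlier steps, which lets us
# inspect the feature matrix that the decision tree actually sees.
features = toy_pipeline[:-1].transform(toy_emails)
print(features.shape)  # one row per email, one column per hash bucket
print(toy_pipeline.predict(["click here for a free prize"]))
```

Because the hashing step is stateless, unseen words in new emails still map to a column, which is one reason this vectorizer suits a stream of incoming mail. Four messages are far too few to say anything about accuracy; the example only shows how data flows through the stages.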

Phishing URL detection

A phishing website is a website that tries to obtain your account password or other personal information by making you think that you are on a legitimate website. S...

Citation styles for Machine Learning for Cybersecurity Cookbook

APA 6 Citation

Tsukerman, E. (2019). Machine Learning for Cybersecurity Cookbook (1st ed.). Packt Publishing. Retrieved from https://www.perlego.com/book/1284230/machine-learning-for-cybersecurity-cookbook-over-80-recipes-on-how-to-implement-machine-learning-algorithms-for-building-security-systems-using-python-pdf (Original work published 2019)

Chicago Citation

Tsukerman, Emmanuel. (2019) 2019. Machine Learning for Cybersecurity Cookbook. 1st ed. Packt Publishing. https://www.perlego.com/book/1284230/machine-learning-for-cybersecurity-cookbook-over-80-recipes-on-how-to-implement-machine-learning-algorithms-for-building-security-systems-using-python-pdf.

Harvard Citation

Tsukerman, E. (2019) Machine Learning for Cybersecurity Cookbook. 1st edn. Packt Publishing. Available at: https://www.perlego.com/book/1284230/machine-learning-for-cybersecurity-cookbook-over-80-recipes-on-how-to-implement-machine-learning-algorithms-for-building-security-systems-using-python-pdf (Accessed: 14 October 2022).

MLA 7 Citation

Tsukerman, Emmanuel. Machine Learning for Cybersecurity Cookbook. 1st ed. Packt Publishing, 2019. Web. 14 Oct. 2022.