eBook - ePub

Phishing Detection Using Content-Based Image Classification

Name: Phishing Detection Using Content-Based Image Classification
Author: Shekhar Khandelwal, Rik Das

Shekhar Khandelwal, Rik Das

Condividi libro

130 pagine
English
ePUB (disponibile sull'app)
Disponibile su iOS e Android

eBook - ePub

Phishing Detection Using Content-Based Image Classification

Shekhar Khandelwal, Rik Das

Dettagli del libro

Anteprima del libro

Indice dei contenuti

Citazioni

Informazioni sul libro

Phishing Detection Using Content-Based Image Classification is an invaluable resource for any deep learning and cybersecurity professional and scholar trying to solve various cybersecurity tasks using new age technologies like Deep Learning and Computer Vision. With various rule-based phishing detection techniques at play which can be bypassed by phishers, this book provides a step-by-step approach to solve this problem using Computer Vision and Deep Learning techniques with significant accuracy.

The book offers comprehensive coverage of the most essential topics, including:

Programmatically reading and manipulating image data
Extracting relevant features from images
Building statistical models using image features
Using state-of-the-art Deep Learning models for feature extraction
Build a robust phishing detection tool even with less data
Dimensionality reduction techniques
Class imbalance treatment
Feature Fusion techniques
Building performance metrics for multi-class classification task

Another unique aspect of this book is it comes with a completely reproducible code base developed by the author and shared via python notebooks for quick launch and running capabilities. They can be leveraged for further enhancing the provided models using new advancement in the field of computer vision and more advanced algorithms.

Domande frequenti

Come faccio ad annullare l'abbonamento?

È semplicissimo: basta accedere alla sezione Account nelle Impostazioni e cliccare su "Annulla abbonamento". Dopo la cancellazione, l'abbonamento rimarrà attivo per il periodo rimanente già pagato. Per maggiori informazioni, clicca qui

È possibile scaricare libri? Se sì, come?

Al momento è possibile scaricare tramite l'app tutti i nostri libri ePub mobile-friendly. Anche la maggior parte dei nostri PDF è scaricabile e stiamo lavorando per rendere disponibile quanto prima il download di tutti gli altri file. Per maggiori informazioni, clicca qui

Che differenza c'è tra i piani?

Entrambi i piani ti danno accesso illimitato alla libreria e a tutte le funzionalità di Perlego. Le uniche differenze sono il prezzo e il periodo di abbonamento: con il piano annuale risparmierai circa il 30% rispetto a 12 rate con quello mensile.

Cos'è Perlego?

Perlego è un servizio di abbonamento a testi accademici, che ti permette di accedere a un'intera libreria online a un prezzo inferiore rispetto a quello che pagheresti per acquistare un singolo libro al mese. Con oltre 1 milione di testi suddivisi in più di 1.000 categorie, troverai sicuramente ciò che fa per te! Per maggiori informazioni, clicca qui.

Perlego supporta la sintesi vocale?

Cerca l'icona Sintesi vocale nel prossimo libro che leggerai per verificare se è possibile riprodurre l'audio. Questo strumento permette di leggere il testo a voce alta, evidenziandolo man mano che la lettura procede. Puoi aumentare o diminuire la velocità della sintesi vocale, oppure sospendere la riproduzione. Per maggiori informazioni, clicca qui.

Phishing Detection Using Content-Based Image Classification è disponibile online in formato PDF/ePub?

Sì, puoi accedere a Phishing Detection Using Content-Based Image Classification di Shekhar Khandelwal, Rik Das in formato PDF e/o ePub, così come ad altri libri molto apprezzati nelle sezioni relative a Informatique e Ingénierie de l'informatique. Scopri oltre 1 milione di libri disponibili nel nostro catalogo.

Informazioni

Editore

Chapman and Hall/CRC

Anno

2022

ISBN

9781000597691

Edizione

Argomento

Informatique

Categoria

Ingénierie de l'informatique

1 Phishing and Cybersecurity

DOI: 10.1201/9781003217381-1

Phishing is a cybercrime intended to trap innocent web users into a counterfeit website, which is visually similar to its legitimate counterpart. Initially, users are redirected to phishing websites through various social and technical routing techniques. Unaware of the illegitimacy of the website, the users may then provide their personal information such as user id, password, credit card details or bank account details, to name a few. The phishers use such information to steal money from banks, damage a brand’s image or even commit graver crimes like identity theft. Although many phishing detection and prevention techniques are available in the existing literature, the advent of smart machine learning and deep learning methods has widened their scope in the cyber-security world.

Structure

In this chapter, we will cover the following topics:

Basics of phishing in cybersecurity
Phishing detection techniques
- List (whitelist/blacklist)-based
- Heuristics (predefined rules)-based
- Visual similarity-based
Race between phishers and anti-phishers
Computer vision-based phishing detection approach

Objective

After studying this chapter, you should know what phishing is and how it affects everyone who has a web footprint. You will learn about the various ways phishers attack web users and how users can protect themselves from phishing attacks. You will also know the various phishing detection mechanisms that play a vital role in protecting web users from phishing attacks.

Basics of Phishing in Cybersecurity

Phishing is a term derived from fishing, by replacing “f” with “ph”, but contextually they mean the same (Phishing Definition & Meaning | What Is Phishing?, n.d.). Just as fish get trapped in fishing nets, so too are innocent web users being trapped by phishing websites. Phishing websites are counterfeit websites that are visually similar to their legitimate counterparts. Web users are redirected to phishing websites by various means. Figure 1.1 depicts the various techniques employed by phishers to circulate spam messages that contain links to phishing websites.

This figure displays different categories of Phishing attacks: social engineering and technical subterfuge. Within social engineering attacks, there are attacks occurring through emails, instant messages, VoIP, Relay chat and Blogs. Within technical subterfuge, there are malwares, Web trojan, keylogger and DNS poisoning. — **FIGURE 1.1** Types of phishing attacks.

Jain and Gupta (2017) stated that spreading infected links is the starting point of any phishing attack. Once users have received the infected links in their inbox through any of the phishing attack mechanisms shown in Figure 1.1, whether they click on those links or not depends on the users’ awareness. Hence, at the outset, user awareness is the most important, yet most ignored, anti-phishing mechanism.

But to protect users from phishing attacks, anti-phishers have explored many technical anti-phishing mechanisms by considering even novice and technically inept users.

Phishing Detection Techniques

Phishing detection mechanisms are broadly categorized into four groups, as depicted in Figure 1.2 (Khonji et al., 2013):

This image displays various phishing detection mechanisms. Two major categories are software-based and user education-based. Within software-based, there are sub-categories like list-based, heuristic-based, visual similarity-based and AI/ML-based. Visual similarity is further classified into other subcategories, in which pixel-based features are further sub-categorized into comparison-based and ML-based. — **FIGURE 1.2** Phishing detection mechanisms.

List (whitelist/blacklist)-based
Heuristics (pre-defined rules)-based
Visual similarity-based
AI/ML-based

List (Whitelist/Blacklist)-Based

In a list-based anti-phishing mechanism, a whitelist and blacklist of URLs are created and are compared against a suspicious website URL to conclude whether the website under scrutiny is a phishing website or a legitimate one (Jain & Gupta, 2016) (Prakash et al., 2010).

There are various limitations with the list-based approach, namely:

It is dependent on a third-party service provider that captures and maintains such lists, like Google safe browsing API (Google Safe Browsing | Google Developers, n.d.).
Adding a newly deployed phishing website to the white/blacklist is a process that takes time. First such a website has to be identified, and then it has to be listed. Since the average lifetime of a phishing website is 24–32 hours, hence zero-day phishing attacks, this is a serious limitation (Zero-Day (Computing) – Wikipedia, n.d.).

Heuristics (Pre-Defined Rules)-Based

In heuristic-based approaches, various website features like image, text, URL and DNS records are extracted and used to build a rule-based engine or a machine learning-based classifier to classify a given website as phishing or legitimate. Although heuristic-based approaches are among quite effective anti-phishing mechanisms, some of their drawbacks have been pointed out by Varshney et al. (2016):

The time and computational resources required for training are too high.
Heuristic-based applications cannot be used as a browser plugin.
The approach would be ineffective once scammers discovered the key features that can be used to bypass the rules.

Visual Similarity-Based

Visual similarity-based techniques are very useful in detecting phishing since phishing websites look similar to their legitimate counterparts. These techniques use visual features like text content, text format, DOM (Document Object Model) features, CSS features, website images, etc., to detect phishing. Here, DOM-, CSS-, HTML tags- and pixel-based features are compared to their legitimate counterparts in order to make a decision.

Within pixel-based techniques, there are two broad categories through which phishing detection is achieved. One approach is through comparison of visual signatures of suspicious website images with the stored visual signatures of legitimate websites. For example, hand-crafted image features like SIFT (Scale Invariant Feature Transform) (Lowe, 2004), SURF (Speeded Up Robust Features) (Bay et al., 2006), HOG (Histogram of Oriented Gradient) (Li et al., 2016), LBP (Local Binary Patterns) (Nhat & Hoang, 2019), DAISY (Tola et al., 2010) and MPEG7 (Rayar, 2017) are extracted from the legitimate websites and stored in a local datastore, which is used as a baseline for comparing similar features from the websites under scrutiny. And based on the comparison result, the phishing website is classified. Another approach is machine learning- or deep learning classifier-based, where image features of phishing and legitimate webpages are extracted and used to build a classifier for phishing detection.

Race between Phishers and Anti-Phishers

Phishers are continually upgrading their skills and devising new and innovative ways to bypass all the security layers and deceive innocent users. For example, many heuristic-based approaches validate if the website under suspicion is SSL-enabled or not, to determine whether it is a legitimate website or a phishing website. However, nowadays, the number of phishing websites hosted on HTTPS is also increasing significantly.

Similarly, for other significant predictors of a phishing website, phishers may find ways to bypass all the rules employed to detect phishing, which is evident from the upward trend of phishing attacks attempted in recent years. Hence, if phishers find a way to bypass list-based, heuristic-based and hybrid anti-phishing detection mechanisms, to redirect the users to the phishing website, then the image processing-based anti-phishing techniques play a vital role in providing the final security layer to the web users. In this book, we propose numerous machine learning and deep learning methods that manually extract features using computer vision techniques for phishing detection. However, there are two major limitations of these methods. First, these methods utilize a comparison-based technique that requires creating a large datastore of baseline values of legitimate websites. Second, these methods rely on manual hand-crafted feature extraction techniques.

2 Image Processing-Based Phishing Detection Techniques

DOI: 10.1201/9781003217381-2

Studies and statistics suggest that phishing is still a pressing issue in the world of cybercrime. Despite all the research, innovations and developments made in phishing detection mechanisms, revenue losses through phishing attacks are humongous, and therefore there is a pressing need to continue research on various aspects of phishing, as anti-phishers are in an arms race with phishers. And in order to win this race, anti-phishers need to think outside the box and close all the doors before phishers enter users’ premises for theft.

Assume that phishers are able to make users bypass all the list-based, heuristic-based and user awareness-based approaches, and finally made the user to land on the phishing website. At this stage, by analyzing the website image, using image processing techniques to classify whether the website in question is legitimate or phishing, can be considered as a final resort to warn users for phishing attack.

Additionally, list-based approaches cannot protect users from zero-day attacks, and heuristics-based approaches are only go...