eBook - ePub

Python Artificial Intelligence Projects for Beginners

Name: Python Artificial Intelligence Projects for Beginners
Author: Dr. Joshua Eckroth

Get up and running with Artificial Intelligence using 8 smart and exciting AI applications

Dr. Joshua Eckroth

162 pages
English
ePUB (mobile friendly)
Available on iOS & Android

About This Book

Build smart applications by implementing real-world artificial intelligence projects

Key Features

Explore a variety of AI projects with Python
Get well-versed with different types of neural networks and popular deep learning algorithms
Leverage popular Python deep learning libraries for your AI projects

Book Description

Artificial Intelligence (AI) is the newest technology that's being employed among varied businesses, industries, and sectors. Python Artificial Intelligence Projects for Beginners demonstrates AI projects in Python, covering modern techniques that make up the world of Artificial Intelligence.

This book begins by helping you build your first prediction model using the popular Python library scikit-learn. You will understand how to build a classifier using two effective machine learning techniques: decision trees and random forests. With exciting projects on predicting bird species, analyzing student performance data, song genre identification, and spam detection, you will learn the fundamentals and the various algorithms and techniques that foster the development of these smart applications. In the concluding chapters, you will also understand deep learning and neural network mechanisms through these projects with the help of the Keras library.

By the end of this book, you will be confident in building your own AI projects with Python and be ready to take on more advanced projects as you progress.

What you will learn

Build a prediction model using decision trees and random forest
Use neural networks, decision trees, and random forests for classification
Detect YouTube comment spam with a bag-of-words and random forests
Identify handwritten mathematical symbols with convolutional neural networks
Revise the bird species identifier to use images
Learn to detect positive and negative sentiment in user reviews

Who this book is for

Python Artificial Intelligence Projects for Beginners is for Python developers who want to take their first step into the world of Artificial Intelligence using easy-to-follow projects. Basic working knowledge of Python programming is expected so that you're able to play around with the code.

Is Python Artificial Intelligence Projects for Beginners an online PDF/ePUB?

Yes, you can access Python Artificial Intelligence Projects for Beginners by Dr. Joshua Eckroth in PDF and/or ePUB format, as well as other popular books in Computer Science & Artificial Intelligence (AI) & Semantics. We have over one million books available in our catalogue for you to explore.

Information

Publisher

Packt Publishing

Year

2018

ISBN

9781789538243

Edition

Topic

Computer Science

Subtopic

Artificial Intelligence (AI) & Semantics

Applications for Comment Classification

In this chapter, we'll give an overview of the bag-of-words model for text classification. We will focus on text and words, classifying internet comments as spam or not spam and internet reviews as positive or negative. First, we will predict YouTube comment spam with the bag-of-words model and the random forest technique. Then we'll look at Word2Vec models and at predicting positive and negative reviews with the Word2Vec approach and the k-nearest neighbor classifier.

But, before we start, we'll answer the following question: what makes text classification an interesting problem?

Text classification

To find the answer to our question, we will consider the famous iris flower dataset as an example. The following image shows the iris versicolor species. To identify the species, we need more information than just an image; measurements such as the flower's petal length, petal width, sepal length, and sepal width help us identify it better:

The dataset contains examples not only of versicolor but also of setosa and virginica. Every example in the dataset contains these four measurements. The dataset contains 150 examples, with 50 examples of each species. Given the same four measurements for a new flower, we can use a decision tree or any other model to predict its species. This works because flowers of the same species have similar measurements. Similarity can be defined in many ways, but here we take it to mean closeness on a graph in which each point is a flower. The following graph compares sepal width with petal width:

If we had no way of measuring similarity, if, say, every flower had completely unrelated measurements, then there would be no way to use machine learning to build a classifier. It is precisely because flowers of the same species have similar measurements that we can distinguish one species from another.
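The idea of similarity as closeness can be sketched in a few lines of plain Python. This is only an illustration, not the book's own code: the measurements are illustrative values in the style of the iris dataset, the names known_flowers, distance, and nearest_species are hypothetical, and Euclidean distance is an assumed choice of closeness measure.

```python
import math

# Illustrative measurements in cm:
# (sepal length, sepal width, petal length, petal width).
known_flowers = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
]

def distance(a, b):
    """Euclidean distance between two measurement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_species(measurements):
    """Return the species of the closest known flower."""
    closest = min(known_flowers, key=lambda item: distance(item[0], measurements))
    return closest[1]

# A new flower is assigned the species of its nearest neighbor.
print(nearest_species((4.9, 3.0, 1.4, 0.2)))
```

A real classifier would compare against many examples per species, but the principle is the same: a flower that lies close to known examples of a species is likely to belong to that species.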

Machine learning techniques

So far we have considered images; let's now consider text. For example, consider the following sentences and try to find what makes the first pair of phrases more similar to each other than the second pair:

I hope you found the answer to that question; otherwise, we will not be able to build a decision tree, a random forest, or anything else to make predictions. To answer the question, notice that the top pair of phrases are similar because they contain some words in common, such as subscribe and channel, while the second pair of sentences have fewer words in common, such as to and the. We need to represent each phrase as a vector of numbers in such a way that the vectors for the top pair are similar to each other, while the vectors for the second pair are not. Only then will we be able to use a random forest or another technique for classification, in this case, to detect YouTube comment spam. To achieve this, we need to use the bag-of-words model.
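As a rough sketch of what "words in common" means, the following counts shared words between two phrases. The four comments are hypothetical stand-ins, since the original example phrases are not reproduced here, and shared_words is an illustrative helper name.

```python
def shared_words(phrase_a, phrase_b):
    """Return the set of lowercase words that occur in both phrases."""
    return set(phrase_a.lower().split()) & set(phrase_b.lower().split())

# Hypothetical comments, invented for illustration only.
spam_a = "subscribe my channel plz subscribe"
spam_b = "check out this channel and subscribe"
other_a = "i listen to this song all the time"
other_b = "went to see them live in the rain"

# The spam-like pair shares distinctive words.
print(sorted(shared_words(spam_a, spam_b)))    # ['channel', 'subscribe']
# The other pair shares only very common words.
print(sorted(shared_words(other_a, other_b)))  # ['the', 'to']
```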

Bag of words

The bag-of-words model does exactly what we want: it converts phrases or sentences into counts of the number of times each word appears. In computer science, a bag is a data structure that keeps track of objects as an array or list does, but the order does not matter, and if an object appears more than once, we just keep track of the count rather than repeating the object.

For example, consider the first phrase from the previous diagram. Its bag of words contains words such as channel, with one occurrence, plz, with one occurrence, subscribe, with two occurrences, and so on. We then collect all these counts in a vector, with one vector per phrase, sentence, or document, depending on what you are working with. Again, the order in which the words originally appeared doesn't matter.
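In Python, the standard library's collections.Counter behaves like a bag. The sketch below uses a hypothetical comment chosen to match the counts described above (channel once, plz once, subscribe twice); it is not the phrase from the original diagram.

```python
from collections import Counter

# Hypothetical comment, invented for illustration.
phrase = "subscribe my channel plz subscribe"

# Counter maps each word to the number of times it occurs.
bag = Counter(phrase.split())
print(bag["subscribe"], bag["channel"], bag["plz"])  # 2 1 1

# Word order does not matter: a reordered phrase gives an equal bag.
reordered = Counter("plz subscribe subscribe channel my".split())
print(bag == reordered)  # True
```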

The entries of the vector can be sorted alphabetically by word, but this needs to be done consistently for all the different phrases. However, we still have a problem: each phrase has a vector with different columns, because each phrase has different words and a different number of columns, as shown in the following two tables:

If we make a larger vector with all the unique words across both phrases, we get a proper matrix representation, with each row representing a different phrase. Notice the use of 0 to indicate that a phrase doesn't have a word:

If you want a bag of words for lots of phrases or documents, you need to collect all the unique words that occur across all the examples and create a huge matrix, N x M, where N is the number of examples and M is the number of unique words. We could easily have thousands of dimensions, compared with the four dimensions of the iris dataset. The bag-of-words matrix is likely to be sparse, meaning mostly zeros, since most phrases don't have most words.
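The construction of the N x M matrix can be sketched with the standard library alone. The function name bag_of_words_matrix and the two comments are hypothetical, and a library such as scikit-learn would normally store the result as a sparse matrix rather than as nested lists.

```python
from collections import Counter

def bag_of_words_matrix(phrases):
    """Return (vocabulary, matrix): one row per phrase, one column per unique word."""
    bags = [Counter(p.lower().split()) for p in phrases]
    # Sort the vocabulary so that every row uses the same column order.
    vocabulary = sorted(set().union(*bags))
    # A Counter returns 0 for words a phrase does not contain.
    matrix = [[bag[word] for word in vocabulary] for bag in bags]
    return vocabulary, matrix

# Hypothetical comments, invented for illustration.
phrases = [
    "subscribe my channel plz subscribe",
    "check out this channel and subscribe",
]
vocabulary, matrix = bag_of_words_matrix(phrases)
print(vocabulary)
for row in matrix:
    print(row)
```

Here N is 2 (the number of phrases) and M is 8 (the number of unique words), and each row contains zeros for the words its phrase lacks.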

Before we start building our bag of words model, we need to take care of a few things, such as the following:

Lowercase every word
Drop punctuation
Drop very common words (stop words)
Remove plurals (for example, bunnies => bunny)
Perform lemmatization (for example, reader => read, reading => read)
Use n-grams, such as bigrams (two-word pairs) or trigrams
Keep only frequent words (for example, must appear in >10 examples)
Keep only the most frequent M words (for example, keep only 1,000)
Record binary counts (1 = present, 0 = absent) rather than true counts
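Several of these options can be sketched with the standard library. This is a minimal illustration under stated assumptions: the stop-word list is a tiny made-up sample, all function names are hypothetical, and plural removal and lemmatization are left out because they usually rely on a dedicated NLP library.

```python
import string
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "my", "is"}

def tokenize(text, stop_words=STOP_WORDS):
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in stop_words]

def bigrams(tokens):
    """Return adjacent two-word pairs."""
    return list(zip(tokens, tokens[1:]))

def top_words(documents, m):
    """Keep only the m most frequent words across all documents."""
    totals = Counter(word for doc in documents for word in tokenize(doc))
    return [word for word, _ in totals.most_common(m)]

def binary_bag(text):
    """Record presence (1) rather than true counts."""
    return {word: 1 for word in tokenize(text)}

tokens = tokenize("Subscribe to MY channel, plz!!")
print(tokens)           # ['subscribe', 'channel', 'plz']
print(bigrams(tokens))  # [('subscribe', 'channel'), ('channel', 'plz')]
```

Which of these steps help depends on the data, so it is worth trying different combinations and comparing the resulting classifier accuracy.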

There are ...