Python Artificial Intelligence Projects for Beginners
eBook - ePub


Get up and running with Artificial Intelligence using 8 smart and exciting AI applications

Dr. Joshua Eckroth

  1. 162 pages
  2. English
  3. ePUB (mobile friendly)
  4. Available on iOS and Android

Book information

Build smart applications by implementing real-world artificial intelligence projects

Key Features

  • Explore a variety of AI projects with Python
  • Get well-versed with different types of neural networks and popular deep learning algorithms
  • Leverage popular Python deep learning libraries for your AI projects

Book Description

Artificial Intelligence (AI) is the newest technology being employed across varied businesses, industries, and sectors. Python Artificial Intelligence Projects for Beginners demonstrates AI projects in Python, covering modern techniques that make up the world of Artificial Intelligence.

This book begins by helping you build your first prediction model using the popular Python library scikit-learn. You will understand how to build a classifier using effective machine learning techniques: decision trees and random forests. With exciting projects on predicting bird species, analyzing student performance data, identifying song genres, and detecting spam, you will learn the fundamentals and the various algorithms and techniques that foster the development of these smart applications. In the concluding chapters, you will also understand deep learning and neural network mechanisms through these projects with the help of the Keras library.

By the end of this book, you will be confident in building your own AI projects with Python and be ready to take on more advanced projects as you progress.

What you will learn

  • Build a prediction model using decision trees and random forest
  • Use neural networks, decision trees, and random forests for classification
  • Detect YouTube comment spam with a bag-of-words model and random forests
  • Identify handwritten mathematical symbols with convolutional neural networks
  • Revise the bird species identifier to use images
  • Learn to detect positive and negative sentiment in user reviews

Who this book is for

Python Artificial Intelligence Projects for Beginners is for Python developers who want to take their first step into the world of Artificial Intelligence using easy-to-follow projects. Basic working knowledge of Python programming is expected so that you're able to play around with the code.


Information

Year
2018
ISBN
9781789538243

Applications for Comment Classification

In this chapter, we will focus on text and words: classifying internet comments as spam or not spam, and identifying internet reviews as positive or negative. We'll start with an overview of the bag-of-words model for text classification, and then use bag of words together with the random forest technique to predict whether YouTube comments are spam. Then we'll look at Word2Vec models and predict positive and negative reviews with the Word2Vec approach and the k-nearest neighbor classifier.
But before we start, let's answer the following question: what makes text classification an interesting problem?

Text classification

To find the answer, let's first consider the famous iris flower dataset as an example. The following image shows the iris versicolor species. To identify the species, we need more information than just an image: measurements such as the flower's petal length, petal width, sepal length, and sepal width help us identify it much more reliably.
The dataset contains examples not only of versicolor but also of setosa and virginica. Every example in the dataset includes these four measurements. There are 150 examples in total, 50 of each species. Given the same four measurements for a new flower, we can use a decision tree or any other model to predict its species. This works because flowers of the same species have similar measurements. Similarity can be defined in many ways; here, if we plot each flower as a point on a graph, we treat two flowers as similar when their points are close together. The following graph compares sepal width versus petal width:
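A minimal sketch of this idea, assuming scikit-learn is installed (its `load_iris` loader ships a copy of the dataset). It confirms the 150/50 split, trains a decision tree on all four measurements, and treats similarity as the Euclidean distance between two flowers' measurement vectors. The new flower's measurements are made up for illustration, and Euclidean distance is just one reasonable definition of "closeness".

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the iris examples: 4 measurements per flower, 3 species
iris = load_iris()
X, y = iris.data, iris.target  # columns: sepal length, sepal width, petal length, petal width (cm)

print(X.shape)             # (150, 4)
print(np.bincount(y))      # [50 50 50]: 50 examples of each species
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']

# Train a decision tree on all four measurements
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Predict the species of a new, hypothetical flower from its four measurements
new_flower = [[6.0, 2.8, 4.5, 1.4]]
print(iris.target_names[clf.predict(new_flower)[0]])


def distance(a, b):
    """Euclidean distance between two flowers' measurement vectors."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))


print(distance(X[0], X[1]))    # two setosa flowers: close together
print(distance(X[0], X[100]))  # a setosa and a virginica: much farther apart
```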
If we had no way of measuring similarity, for example if every flower had completely different measurements regardless of its species, there would be no way to use machine learning to build a classifier. It is precisely because flowers of the same species have similar measurements that we can tell the species apart.

Machine learning techniques

So far we have considered flowers and their measurements; let's now consider text. For example, consider the following pairs of sentences and try to find what makes the phrases in the first pair similar to each other in a way that the phrases in the second pair are not:
The answer to this question matters, because without it we will not be able to build a decision tree, a random forest, or any other predictive model. Notice that the phrases in the top pair are similar because they have some words in common, such as subscribe and channel, while the second pair has fewer words in common, and those it shares are common words such as to and the. We therefore want to represent each phrase as a vector of numbers, such that the two vectors in the top pair are more similar to each other than the two vectors in the second pair. Only then will we be able to use a random forest or another technique for classification, in this case to detect YouTube comment spam. To achieve this, we need to use the bag-of-words model.
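To make the shared-words intuition concrete, here is a small plain-Python sketch. The four phrases are hypothetical stand-ins, since the original pairs appear in a figure, and `shared_words` is an illustrative helper that simply intersects the sets of lowercase words in two phrases.

```python
# Hypothetical phrases standing in for the two pairs shown in the figure
first_pair = ("plz subscribe to my channel", "subscribe to my channel for more videos")
second_pair = ("i went to the store", "welcome to the party")


def shared_words(a, b):
    """Return the set of words that appear in both phrases (case-insensitive)."""
    return set(a.lower().split()) & set(b.lower().split())


print(sorted(shared_words(*first_pair)))   # ['channel', 'my', 'subscribe', 'to']
print(sorted(shared_words(*second_pair)))  # ['the', 'to']
```

The first pair overlaps on meaningful words such as subscribe and channel, while the second pair overlaps only on very common words, which is exactly the kind of difference a numeric representation needs to capture.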

Bag of words

The bag-of-words model does exactly what we want: it converts phrases or sentences into counts of how many times each word appears. In computer science, a bag is a data structure that keeps track of objects like an array or list does, except that order does not matter, and if an object appears more than once, we just keep track of its count rather than repeating it.
For example, the first phrase from the previous diagram has a bag of words that contains words such as channel, with one occurrence, plz, with one occurrence, subscribe, with two occurrences, and so on. We would then collect all these counts in a vector, one vector per phrase, sentence, or document, depending on what you are working with. Again, the order in which the words originally appeared doesn't matter.
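Python's standard `collections.Counter` behaves like the bag described here: it stores one count per distinct item and ignores order. The sketch below uses a hypothetical comment in place of the phrase from the diagram.

```python
from collections import Counter

# A hypothetical spam-like comment (the original phrase appears in a figure)
phrase = "plz subscribe to my channel subscribe"

# A bag: one count per distinct word, order ignored
bag = Counter(phrase.split())
print(bag["subscribe"], bag["channel"], bag["plz"])  # 2 1 1

# Word order does not matter: a shuffled phrase yields the same bag
shuffled = "subscribe channel my to subscribe plz"
print(Counter(shuffled.split()) == bag)  # True
```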
The entries in this vector can be sorted alphabetically by word, as long as this is done consistently for every phrase. However, we still have a problem: each phrase has a vector with different columns, because each phrase contains different words, so the vectors also have different numbers of columns, as shown in the following two tables:
If we make a larger vector with all the unique words across both phrases, we get a proper matrix representation. Each row represents a different phrase; notice the use of 0 to indicate that a phrase doesn't contain a word:
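The following standard-library sketch, again using two hypothetical phrases, builds the shared vocabulary as the alphabetically sorted union of all words and fills in one row per phrase, with 0 wherever a phrase lacks a word.

```python
from collections import Counter

# Two hypothetical phrases (the originals appear in tables not reproduced here)
phrases = [
    "plz subscribe to my channel subscribe",
    "check out my new video",
]

bags = [Counter(p.lower().split()) for p in phrases]

# Union of all unique words, sorted alphabetically so every row uses the same columns
vocabulary = sorted(set().union(*bags))

# One row per phrase; a Counter returns 0 for a word the phrase does not contain
matrix = [[bag[word] for word in vocabulary] for bag in bags]

print(vocabulary)
for row in matrix:
    print(row)
```

Because every row is indexed by the same sorted vocabulary, the rows all have the same length and can be stacked into a single matrix.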
If you want to build a bag of words from lots of phrases or documents, you need to collect all the unique words that occur across all the examples and create a huge N x M matrix, where N is the number of examples and M is the number of unique words. We could easily have thousands of dimensions, compared with the four dimensions of the iris dataset. The bag-of-words matrix is likely to be sparse, meaning mostly zeros, since most phrases don't contain most words.
Before we start building our bag-of-words model, we may want to take care of a few preprocessing steps, such as the following:
  • Lowercase every word
  • Drop punctuation
  • Drop very common words (stop words)
  • Remove plurals (for example, bunnies => bunny)
  • Perform lemmatization (for example, reader => read, reading => read)
  • Use n-grams, such as bigrams (two-word pairs) or trigrams
  • Keep only frequent words (for example, must appear in >10 examples)
  • Keep only the most frequent M words (for example, keep only 1,000)
  • Record binary counts (1 = present, 0 = absent) rather than true counts
There are ...
