Mathematics

Data transformations

Data transformations refer to the process of converting, manipulating, or reorganizing data to make it more suitable for analysis or presentation. In mathematics, data transformations often involve applying functions or operations to a set of data points, such as scaling, shifting, or applying mathematical functions like logarithms or exponentials. These transformations can help reveal patterns, relationships, or insights within the data.
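As a rough illustration of the operations named above, the following minimal sketch applies a shift, a scale, and a base-10 logarithm to a small list of data points. The function names (`shift`, `scale`, `log_transform`) and the sample values are assumptions made for this example, not part of any particular library.

```python
import math


def shift(values, offset):
    """Add a constant offset to every data point."""
    return [v + offset for v in values]


def scale(values, factor):
    """Multiply every data point by a constant factor."""
    return [v * factor for v in values]


def log_transform(values):
    """Take the base-10 logarithm of every data point (values must be > 0)."""
    return [math.log10(v) for v in values]


# Hypothetical sample data spanning several orders of magnitude.
data = [1, 10, 100, 1000]

shifted = shift(data, -1)     # [0, 9, 99, 999]
scaled = scale(data, 0.5)     # [0.5, 5.0, 50.0, 500.0]
logged = log_transform(data)  # [0.0, 1.0, 2.0, 3.0]
```

The logarithm turns the multiplicative spacing of the sample data into evenly spaced values, which is one way such a transformation can make a pattern easier to see.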

Written by Perlego with AI-assistance

3 Key excerpts on "Data transformations"

Spark in Action, Second Edition
- Jean-Georges Perrin (Author)
- 2020 (Publication Date)
- Manning Publications (Publisher)
I believe that using real-life datasets from official sources can help you understand the concepts more thoroughly. This process definitely simulates issues like the ones you are or will be facing in your day-to-day job. However, as with real data, you will have to go through the hurdle of formatting and understanding the data. Teaching you this process adds to the length of this chapter.

Lab: Examples from this chapter are available on GitHub at https://github.com/jgperrin/net.jgp.books.spark.ch12.

12.1 What is data transformation?

Data transformation is the process of converting data from one format or structure into another. In this short section, you will read more about the types of data that can be transformed as well as the types of transformations you will be able to perform. Data can be of several types:

Data can be structured and well organized, like tables and columns in relational databases.

Data can be in the form of documents, in a semistructured way. Those documents are often seen in NoSQL databases.

Data can be raw, completely unstructured, like a binary large object (blob) or document.

Data transformations change the data either from one type to another or within the same type. Transformations can apply to several aspects of the data:

At the record level: you can modify the values directly in the record (or row).

At the column level: you can create and drop columns in the dataframe.

In the metadata/structure of the dataframe.
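The three levels listed above can be sketched without Spark at all. The example below is only a stand-in: it models a dataframe as a plain Python list of dictionaries, and the column names (`name`, `height_cm`, `height_m`, `person`) are hypothetical. It does not use or imitate the Spark API.

```python
# A tiny stand-in for a dataframe: a list of rows, each row a dict.
rows = [
    {"name": "alice", "height_cm": 170},
    {"name": "bob", "height_cm": 182},
]

# Record level: modify a value directly inside each row.
rows = [{**r, "name": r["name"].title()} for r in rows]

# Column level: create a derived column and drop the one it came from.
rows = [
    {"name": r["name"], "height_m": r["height_cm"] / 100}
    for r in rows
]

# Structure/metadata level: rename a column; the values are untouched.
rows = [{"person": r["name"], "height_m": r["height_m"]} for r in rows]
```

After the three steps, each row holds a `person` and a `height_m` field; the data has changed at the value, column, and structure levels in turn.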

Figure 12.1 summarizes where transformations can take place.
Apache Spark is a good candidate for any data transformation; the size of the data does not really matter. Spark shines when data is structured and organized, but it can easily be extended to handle more blobby (from blob) and obscure data. Chapter 9 gave you an idea of this when you ingested metadata from photos.
Figure 12.1 Data can take many shapes, whether structured or unstructured. Transformations can happen between the shapes or within the same shape.
Hands-On Exploratory Data Analysis with Python
Perform EDA techniques to understand, summarize, and investigate your data
- Suresh Kumar Mukhiya, Usman Ahmed (Authors)
- 2020 (Publication Date)
- Packt Publishing (Publisher)
Data transformation is a set of techniques used to convert data from one format or structure to another format or structure. The following are some examples of transformation activities:

Data deduplication involves the identification of duplicates and their removal.

Key restructuring involves transforming any keys with built-in meanings to the generic keys.

Data cleansing involves deleting out-of-date, inaccurate, and incomplete information from the source data, without changing its meaning, to enhance the accuracy of that data.

Data validation is a process of formulating rules or algorithms that help in validating different types of data against some known issues.

Format revisioning involves converting from one format to another.

Data derivation consists of creating a set of rules to generate more information from the data source.

Data aggregation involves searching, extracting, summarizing, and preserving important information in different types of reporting systems.

Data integration involves converting different data types and merging them into a common structure or schema.

Data filtering involves identifying information relevant to any particular user.

Data joining involves establishing a relationship between two or more tables.
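A few of the activities listed above (deduplication, filtering, aggregation, and joining) can be sketched in plain Python. The `orders` and `customers` data, the field names, and the threshold of 10.0 are all made up for illustration; a real pipeline would more likely use a dataframe library.

```python
from collections import defaultdict

# Hypothetical source data: one order appears twice.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 20.0},
    {"order_id": 2, "customer_id": "c2", "amount": 35.0},
    {"order_id": 2, "customer_id": "c2", "amount": 35.0},  # duplicate
    {"order_id": 3, "customer_id": "c1", "amount": 5.0},
]
customers = {"c1": "Ada", "c2": "Grace"}

# Deduplication: keep the first row seen for each order_id.
seen = set()
deduped = []
for order in orders:
    if order["order_id"] not in seen:
        seen.add(order["order_id"])
        deduped.append(order)

# Filtering: keep only the orders relevant to a given question.
filtered = [o for o in deduped if o["amount"] >= 10.0]

# Aggregation: summarize the amounts per customer.
totals = defaultdict(float)
for order in deduped:
    totals[order["customer_id"]] += order["amount"]

# Joining: relate each order to its customer record.
joined = [
    {**o, "customer_name": customers[o["customer_id"]]}
    for o in deduped
]
```

Each step takes the previous result as input, which mirrors how these activities are usually chained in practice.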

The main reason for transforming the data is to get a better representation such that the transformed data is compatible with other data. In addition to this, interoperability in a system can be achieved by following a common data structure and format.
Having said that, let's start looking at data transformation techniques with data integration in the next section.

Merging database-style dataframes

Many beginner developers get confused when working with pandas dataframes, especially regarding when to use append, concat, merge, or join. In this section, we are going to check out the separate use cases for each of these.
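As a hedged preview of how these operations differ, the sketch below uses three small, made-up dataframes (`students_a`, `students_b`, `scores`). It assumes a recent pandas release; `DataFrame.append` was removed in pandas 2.0, so `pd.concat` is used here for stacking rows.

```python
import pandas as pd

# Hypothetical data: two student lists and a table of scores.
students_a = pd.DataFrame({"student_id": [1, 2], "name": ["Ana", "Ben"]})
students_b = pd.DataFrame({"student_id": [3], "name": ["Cy"]})
scores = pd.DataFrame({"student_id": [1, 3, 4], "score": [88, 92, 75]})

# concat: stack frames that share the same columns on top of each other.
all_students = pd.concat([students_a, students_b], ignore_index=True)

# merge: database-style join on a key column (inner keeps matching keys only).
merged = pd.merge(all_students, scores, on="student_id", how="inner")

# join: combine frames on their index (left keeps every student).
joined = all_students.set_index("student_id").join(
    scores.set_index("student_id"), how="left"
)
```

Here `merged` contains only the students who have a score, while `joined` keeps every student and leaves the score missing where none exists.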
Mathematical Structures for Computer Graphics
- Steven J. Janke (Author)
- 2014 (Publication Date)
- Wiley (Publisher)
Once we have the list of triangles and associated vertices representing the car, we still need to position it in any broader scene; there may well be many other objects. This requires another transformation which will alter all the car's vertex coordinates once again. Finally, when we view the scene, we need to decide where the camera (or our eye) is and in which direction we are looking. This, too, requires another transformation plus a special one to convert our three-dimensional scene into a two-dimensional display. There is no escaping being adept at choosing and applying transformations.

Technically, a transformation is just a function that sends each point (or vertex), P, to another point called P'. The result is to transform an object into a new object. The new object may have the same shape as the old and just a new position, or it may have an altered shape. Before we can talk about the mechanics of actually performing a transformation, we need to once again consider the differences between vectors and points in order to be careful about how we deal with each. Recall that we decided to represent both vectors and points as a column of numbers, so the two are written in the same way.

The vector, however, is a displacement and the point is a position in the plane. We know there is a connection between the two because we determine a vector by subtracting two points. We are most interested in thinking of transformations as moving points to points, but we can also think of them as acting on vectors. By singling out a point, say the origin O, every point P can be thought of as a vector from O to P. Then, if the transformation is reasonably behaved, moving point P to P' is analogous to moving the vector from O to P to a vector from O' to P', where O' is the image of O.

To transform vectors, we can change their direction and length in various ways. It does not, however, make sense to translate them (move to a new position) because vectors are independent of position. In contrast, it does make sense to translate points by moving them to new positions. It also makes sense to transform points in a way that changes their direction or distance from the origin. To differentiate further between these cases, we call our complete collection of points an affine space and any associated transformations affine transformations. Then the collection of vectors formed by taking the difference of any two points is called the associated vector space.
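The distinction between points and vectors can be made concrete with a small sketch. It assumes a two-dimensional affine transformation made of a 2x2 matrix plus a translation; the helper names (`apply_linear`, `transform_point`, `transform_vector`, `subtract`) and the sample values are illustrative only and do not come from the excerpt.

```python
def apply_linear(matrix, xy):
    """Multiply a 2x2 matrix (given as two rows) by a 2D column of numbers."""
    (a, b), (c, d) = matrix
    x, y = xy
    return (a * x + b * y, c * x + d * y)


def transform_point(matrix, translation, point):
    """Points receive the linear part and then the translation."""
    x, y = apply_linear(matrix, point)
    tx, ty = translation
    return (x + tx, y + ty)


def transform_vector(matrix, vector):
    """Vectors receive only the linear part; translating them has no meaning."""
    return apply_linear(matrix, vector)


def subtract(p, q):
    """The vector obtained as the difference of two points."""
    return (p[0] - q[0], p[1] - q[1])


# A 90-degree counterclockwise rotation followed by a shift of (5, 2).
rotate_90 = ((0, -1), (1, 0))
offset = (5, 2)

p = (1, 0)
q = (3, 0)
v = subtract(q, p)                            # (2, 0)

p_new = transform_point(rotate_90, offset, p)  # (5, 3)
q_new = transform_point(rotate_90, offset, q)  # (5, 5)
v_new = transform_vector(rotate_90, v)         # (0, 2)
```

The difference of the transformed points, `subtract(q_new, p_new)`, equals `v_new`: the translation cancels out, which is why only the linear part acts on vectors.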

Index pages curate the most relevant extracts from our library of academic textbooks. They’ve been created using an in-house natural language model (NLM), each adding context and meaning to key research topics.
