Scatter Chart Python

A scatter chart in Python is a type of data visualization that displays individual data points on a two-dimensional graph. It is commonly used to show the relationship between two variables and identify patterns or trends in the data. In Python, scatter charts can be created using libraries such as Matplotlib or Seaborn, providing flexibility in customization and presentation.

Written by Perlego with AI-assistance

7 Key excerpts on "Scatter Chart Python"

  • Data Visualization with Python
    eBook - PDF

    Exploring Matplotlib, Seaborn, and Bokeh for Interactive Visualizations (English Edition)

    • Outliers: Outliers, or data points that fall outside the expected range of values, can be easily identified on a scatter plot.

    Scatter plots are a useful tool for exploring the relationship between two variables in a simple, visual way, and are often used in exploratory data analysis. They can be created using the scatter function in the Matplotlib library in Python. Here is an example of how to create a scatter plot of the "Age" and "Fare" attributes of the Titanic survival data using Matplotlib in Python:

        import matplotlib.pyplot as plt
        import pandas as pd

        # Load the Titanic dataset
        df = pd.read_csv('titanic.csv')

        # Select the 'age' and 'fare' columns, filling missing values
        # with each column's maximum
        age = df.age.fillna(df.age.max())
        fare = df.fare.fillna(df.fare.max())

        # Plot the scatter plot
        plt.scatter(age, fare, alpha=0.7)

        # Add labels and title
        plt.xlabel('Age')
        plt.ylabel('Fare')
        plt.title('Scatter Plot of Titanic Passengers \'Age\' and \'Fare\'')

        # Show the plot
        plt.show()

    In this example, we use the read_csv() function from the Pandas library to load the Titanic survival data into a DataFrame. We then select the "Age" and "Fare" columns and replace missing values with each column's maximum using the fillna() function. We use the scatter() function to plot the scatter plot of the "Age" and "Fare" data. We add labels and a title to the plot using the xlabel(), ylabel(), and title() functions, and display the plot using the show() function.

    Figure 5.9: A Scatter Plot

    This will create a scatter plot of the Titanic passengers' ages and fares with the x-axis labeled as "Age" and the y-axis labeled as "Fare". The title of the plot will be "Scatter Plot of Titanic Passengers 'Age' and 'Fare'".

    Pie chart

    A pie chart is a type of circular statistical graphic, which is used to represent the proportional distribution of a dataset. Each segment of the pie chart represents a proportion of the total dataset, with the size of each segment proportional to the magnitude of the value it represents.
  • Applied Univariate, Bivariate, and Multivariate Statistics Using Python
    eBook - ePub
    • Daniel J. Denis (Author)
    • 2021 (Publication Date)
    • Wiley (Publisher)
    Data itself, however, never equals theory. In reference to our example, all we see is a greater number of individuals with COVID-19 in the southern parts of the state. The “how and why” behind those numbers is where theory comes in. Hence, you can perform data analyses all you like, but informing theory and obtaining scientific explanation can still be very challenging, especially if you are unable to perform a rigorous experiment to rule out many competing alternative hypotheses.

    3.4  The Scatterplot

    One of the most common and useful graphs in the history of statistics is that of the scatterplot. The scatterplot depicts the bivariate relationship between two variables or the multivariate relationship among many more variables (e.g. three-dimensional scatterplots), and hence it is appropriate for bivariate or multivariate data. However, it becomes unwieldy for visualizations in higher than three dimensions. The classic two-dimensional scatterplot can be easily generated using matplotlib in Python. Matplotlib is one of the more popular packages in Python for generating a variety of graphics, which include static, animated, and interactive visualizations. You can read more on matplotlib by referring to its website at matplotlib.org.
    As an example of a simple scatterplot in Python, we build a plot from scratch on two generic variables x and y. The data for this example are fictitious:
        x = [10, 15, 16, 23, 27, 38, 43, 56, 57, 60]
        y = [5, 8, 9, 13, 16, 20, 40, 45, 67, 75]
        import matplotlib.pyplot as plt
        import numpy as np
        plt.hist2d(x, y, bins=(50, 50), cmap=plt.cm.Reds)
    We can add a title to the plot by calling plt.title("Scatterplot"). We can also adjust the plot, for example by changing the number of bins. Let us change the bins from (50, 50) to (100, 100):
        plt.hist2d(x, y, bins=(100, 100), cmap=plt.cm.Reds)
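    Note that plt.hist2d bins the points into a two-dimensional histogram rather than drawing one marker per observation. For comparison, here is a minimal sketch of a conventional scatterplot of the same fictitious data. The names fig, ax, and points, the marker color, and the axis labels are choices made for this illustration, not taken from the book:

```python
import matplotlib.pyplot as plt

# The fictitious data from the excerpt above.
x = [10, 15, 16, 23, 27, 38, 43, 56, 57, 60]
y = [5, 8, 9, 13, 16, 20, 40, 45, 67, 75]

# One marker per (x, y) pair, drawn on an explicit Axes object.
fig, ax = plt.subplots()
points = ax.scatter(x, y, color="red")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatterplot")
# Call plt.show() to display the figure in an interactive session.
```

    With only ten observations, individual markers are usually easier to read than a 50-by-50 grid of mostly empty bins.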
  • Hands on Data Science for Biologists Using Python
    • Yasha Hasija, Rajkumar Chakraborty (Authors)
    • 2021 (Publication Date)
    • CRC Press (Publisher)
    5
    Python for Data Visualization

    Introduction

    “Data science” is a buzzword in today’s age of high throughput biology. Data science means handling enormous amounts of data to arrive at insights into biological findings. Up until this point, we have learned how to handle large datasets and how to perform efficient calculations on them. Data visualization is another way to derive insights from data, using elements like graphs (e.g. scatterplots, histograms), maps, or charts that make the complexities within the data understandable by identifying local trends or patterns, forming clusters, locating outliers, and more. Data visualization is the preliminary step after loading the data to view the distribution of values. Cleaning the data, checking the quality of data, doing exploratory data analysis, and presenting data and results are some of the necessary tasks that a data scientist needs to do before applying any Machine Learning or statistical model on the data. In this chapter, we will describe one of the primary data visualization libraries of Python called “Matplotlib” and draw a few basic graphics. Next, we will browse through a library called “Seaborn” which provides a high-level interface for drawing beautiful and informative statistical graphs. Lastly, we will learn about interactive and geographical data plotting.

    Matplotlib

    Matplotlib is the most popular plotting library in the Python community. It gives us control over almost every aspect of a figure or plot. Its design will be familiar to users of Matlab, which is another programming language with its own graphical plotting capabilities. The primary goal of this section is to go through the basics of plotting using Matplotlib. If we have the Anaconda distribution, then Matplotlib is installed by default; otherwise we have to install it using the “pip” installer. Matplotlib's pyplot module is conventionally imported as “plt”, similar to “np” for NumPy and “pd” for pandas. To view the graphs in the Jupyter Notebook, we have to use the Jupyter magic command “%matplotlib inline” in the notebook. The notebook of this chapter can be found with the supplementary files labeled as “Python for Data Visualization5.ipnyb”.
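    The import conventions described above look roughly like this in practice. This is a minimal sketch: the plotted values are made up for illustration and do not come from the book, and in a Jupyter notebook the line %matplotlib inline would go in a cell before the plotting code.

```python
import matplotlib.pyplot as plt  # the pyplot interface is what gets the alias plt
import numpy as np

# Illustrative values only (not from the book).
x = np.arange(1, 6)
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Draw a basic scatter plot and label both axes.
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")
# Outside a notebook, call plt.show() to open the figure window.
```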
  • A History of Data Visualization and Graphic Communication
    6 The Origin and Development of the Scatterplot

    As we saw in Chapter 5, most modern forms of data graphics (pie charts, line graphs, and bar charts) can generally be attributed to William Playfair in the period 1785–1805. All of these, even though presented as two-dimensional graphs, were essentially one-dimensional in their view of data. They showed a single quantitative variable (such as land area or value of trade) broken down by a categorical variable, as in a pie chart or bar chart, or plotted over time (perhaps with separate curves for imports and exports), as in a line graph. In the development of a language and taxonomy of graphs, Playfair’s graphs and other visual representations of data in this time can be considered 1.5D: more than just a single variable shown, but not quite enough to qualify for 2D status. In Playfair’s visual understanding, the horizontal axis in his plots was most often bound to time, forcing him to use other means to show relations with other variables. The next major invention in data graphics, the first fully two-dimensional one, was the scatterplot. Indeed, among all forms of statistical graphics, the scatterplot may be considered the most versatile and generally useful invention in the entire history of statistical graphics. Essential characteristics of a scatterplot are that two quantitative variables are measured on the same observational units (workers); the values are plotted as points referred to perpendicular axes; and the goal is to show something about the relation between these variables, typically how the ordinate variable, y, varies with the abscissa variable, x. Figure 6.1 shows a typical, if simplistic, modern scatterplot. It relates the number of years of experience of some workers on the horizontal (x) axis to their current annual salary on the vertical (y) axis. The experience and salary
  • Applied Medical Statistics Using SAS
    Two elements of the shape of a scatter plot that are most useful in describing relationships between variables are measures of ‘location’ and ‘spread’. For example, location might be measured as a line or a curve that runs through the bulk of the data, while spread might be measured in terms of deviations of (x, y) points from the estimated location.

    The simple xy scatter plot has certainly been in use for a long time, at least from the eighteenth century, and it has many virtues. Indeed, according to Tufte (1983):

        The relational graphic—in its barest form the scatterplot and its variants—is the greatest of all graphical designs. It links at least two variables encouraging and even imploring the viewer to assess the possible causal relationship between the plotted variables. It confronts causal theories that x causes y with empirical evidence as to the actual relationship between x and y.

    Now let’s have a look at an example of a scatter plot. For this we will use the data shown in Table 7.1, which were collected in a study investigating the possible link between alcohol consumption and the death rate per 100,000 of the population from cirrhosis and alcoholism (data collected before West Germany ceased to exist as a separate country).
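    The notions of location and spread described in this excerpt can be sketched numerically. The example below is an illustration in Python rather than the SAS used by the book; the six data points are made up and are not the Table 7.1 data. It assumes location is a least-squares straight line and spread is the residual standard deviation around it:

```python
import numpy as np

# Illustrative (made-up) values, not the book's Table 7.1 data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Location: a least-squares straight line through the bulk of the points.
slope, intercept = np.polyfit(x, y, deg=1)
fitted = slope * x + intercept

# Spread: deviations of the observed y values from the fitted line,
# summarized as a standard deviation with two fitted parameters removed.
residuals = y - fitted
spread = residuals.std(ddof=2)
```

    A curve (for example a higher-degree polynomial or a smoother) could stand in for the straight line when the bulk of the data is clearly not linear.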
  • Compassionate Statistics
    eBook - ePub

    Applied Quantitative Analysis for Social Services (With exercises and instructions in SPSS)

    Chapter 11, will introduce correlations used for inferential purposes.

    Scattergrams/Scatterplots

    It would be perfectly legitimate to portray the existence of a numerical correlation between two variables in a standard table, as in Table 10.1.
    Unfortunately, the true nature of this correlation is not easily apparent in a table format. Only partially revealed in Table 10.1 is the fact that the data indicate (surprisingly) that the longer clients stayed in treatment at this agency, the worse became their attitude about that agency’s effectiveness. It is for this reason that most researchers create scattergrams, rather than tables, to present correlations used solely for descriptive purposes.
    Table 10.1   Table of the Length of Client Contact (in Weeks) and Client Attitude Toward Agency Effectiveness (10 = Very High, 1 = Very Low)
    Client #   Contact in Weeks   Attitude Toward Agency
    01         2                  10
    02         4                  8
    03         7                  2
    04         6                  3
    05         4                  6
    06         2                  10
    07         7                  2
    08         10                 1
    09         5                  5
    10         7                  3
    11         3                  8
    12         2                  8
    13         8                  3
    14         9                  1
    15         4                  10
    16         8                  2
    17         4                  10
    A scattergram, also called a scatterplot, offers a clear visual image of the intersection of the values contained in the two variables. Scattergrams are conceptually based on the image of the X axis and Y axis. You are undoubtedly familiar with this image since it is commonly used as a template in economics and business administration classes to display economic trends and forecasts.
    A scattergram literally pinpoints where individual cases (usually people) are placed on a grid bounded by the possible values of the two variables being analyzed, then scatters those points, thereby forming some variation of a pattern. These possible values start at an absolute zero point, where the X axis crosses the Y axis, and then continue to increase vertically and horizontally out from that zero point, as illustrated in Figure 10.4.
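    The Table 10.1 data can be turned into a scattergram with a few lines of code. The book works in SPSS, so the sketch below is an assumption-laden Python and Matplotlib equivalent; the variable names, axis labels, and title are chosen here for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# Table 10.1: contact in weeks and attitude score for clients 01 to 17.
weeks = np.array([2, 4, 7, 6, 4, 2, 7, 10, 5, 7, 3, 2, 8, 9, 4, 8, 4])
attitude = np.array([10, 8, 2, 3, 6, 10, 2, 1, 5, 3, 8, 8, 3, 1, 10, 2, 10])

# One point per client: X axis is length of contact, Y axis is attitude.
fig, ax = plt.subplots()
ax.scatter(weeks, attitude)
ax.set_xlabel("Contact in weeks")
ax.set_ylabel("Attitude toward agency (1 = very low, 10 = very high)")
ax.set_title("Length of client contact vs. attitude")

# Pearson correlation coefficient; a negative value matches the
# downward pattern described in the text.
r = np.corrcoef(weeks, attitude)[0, 1]
# Call plt.show() to display the figure in an interactive session.
```

    The points fall from the upper left to the lower right, which is the pattern the table only partially reveals: longer contact goes with a lower attitude score.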
  • Data Science Fundamentals and Practical Approaches
    eBook - ePub
    • Rupam Kumar Sharma, Kapil Jain (Authors)
    • 2020 (Publication Date)
    • BPB Publications (Publisher)
    (x,y,z) dataset.
    The following program (Program 3.23) illustrates the Python code for designing a 3D scatter plot for a given triplet that contains three feature values for (x, y, z). The scatter() function is used for plotting the visualization graph. The labels of the three axes are accordingly provided and the title of the visualization graph is given as 3D Scatter Plot.
        # Program 3.23: Designing a 3D Scatter Plot
        # Import libraries
        import matplotlib.pyplot as plt

        # Control the size of the plotted figure
        fig = plt.figure(figsize=(10, 8))
        ax = fig.add_subplot(111, projection='3d')

        # Assign values for the 3D plot
        x = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
        y = [2, 5, 6, 9, 11, 7, 8, 14, 17, 19, 20]
        z = [6, 8, 10, 8, 9, 11, 10, 16, 14, 12, 16]

        # Plot the 3D scatter plot with blue diamond markers
        ax.scatter(x, y, z, c='b', marker='D')
        ax.set_xlabel('X Label')
        ax.set_ylabel('Y Label')
        ax.set_zlabel('Z Label')
        ax.set_title('3D Scatter Plot')
        plt.show()
    The corresponding output of the above program is displayed in Figure 3.28. The output displays a 3D scatter plot based on the x, y, and z values provided. The size of the figure is specified as (10,8) and the projection type is 3D. The plotted marks are specified to be of diamond shape and blue color.
    Figure 3.28: An example of 3D Scatter Plot
    3D plots can give the reader a striking and effective view of the data. The 3D scatter plot is one such visualization tool; it can represent several data series in one graph with a 3D effect. The illusion of depth in the 3D scatter plot is created by introducing the Z-axis, as shown in Program 3.23 in this section.
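    To illustrate several data series sharing one 3D graph, here is a minimal sketch that extends Program 3.23. The second series (z2), the legend labels, and the red circle markers are hypothetical additions made for this example; only x, y, and z come from the program above:

```python
import matplotlib.pyplot as plt

# Values from Program 3.23.
x = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
y = [2, 5, 6, 9, 11, 7, 8, 14, 17, 19, 20]
z = [6, 8, 10, 8, 9, 11, 10, 16, 14, 12, 16]

# Hypothetical second series: the same points shifted up by 5 along z.
z2 = [value + 5 for value in z]

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection="3d")

# Each call to scatter() adds one series; color and marker tell them apart.
ax.scatter(x, y, z, c="b", marker="D", label="Series 1")
ax.scatter(x, y, z2, c="r", marker="o", label="Series 2 (hypothetical)")

ax.set_xlabel("X Label")
ax.set_ylabel("Y Label")
ax.set_zlabel("Z Label")
ax.set_title("3D Scatter Plot")
ax.legend()
# Call plt.show() to display the figure in an interactive session.
```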

    3.7. Advanced data visualization tools

    The basic and specialized visualization tools discussed in the above two sections are commonly used in data analytics to present data in a graphical format so as to allow users to grasp the information better than plain textual information. In this section, we will discuss three more advanced and complex visualization tools that are used in data analytics namely, wordclouds, waffle charts, and seaborn plots. A brief discussion of these three tools along with illustrative examples is given below.