GIS Algorithms

About this book

Geographic information systems (GIS) have become increasingly important in helping us understand complex social, economic, and natural dynamics where spatial components play a key role. The critical algorithms used in GIS, however, are notoriously difficult to both teach and understand, in part due to the lack of a coherent representation. GIS Algorithms attempts to address this problem by combining rigorous formal language with example case studies and student exercises.

Using Python code throughout, Xiao breaks the subject down into three fundamental areas:
  • Geometric Algorithms
  • Spatial Indexing
  • Spatial Analysis and Modelling
With its comprehensive coverage of the many algorithms involved, GIS Algorithms is a key new textbook in this complex and critical area of geography.

1 Introduction

Algorithms are designed to solve computational problems. In general, an algorithm is a process that contains a set of well-designed steps for calculation. For example, to correctly calculate the sum of 18 and 19, one must know how to deal with the fact that 8 plus 9 is more than 10, though different cultures have different ways of handling that. Even for a simple problem like this, we expect the steps used to help us get the answer quickly and correctly. There are many problems that are more difficult than this simple addition, and solving them, again efficiently and correctly, requires more careful design of the computational steps.
In GIS development and applications, algorithms are important in almost every aspect. When we click on a map, for example, we expect a quick response from the computer system so that we can pull out relevant information about the point or area we just clicked on. Such a fundamental daily routine for almost every GIS application involves a variety of algorithms to ensure a satisfying response. It starts with searching for the object (point, line, polygon, or pixel) underneath the clicked point. An efficient search algorithm will allow us to narrow down to the area of interest quickly. A brute-force approach that physically checks every object in our data may work, but for a large data set it makes the time needed to find the target impractically long. Many spatial indexing and query algorithms are designed to address this issue. While the search is ongoing, we must check whether an object in our data matches the point clicked. For polygons, we must decide whether the clicked point falls within a polygon in our data, which requires a special algorithm that can quickly return a yes or no answer. Geospatial data normally come from many different sources, and it has been common practice to transform them into the same coordinate system so that different data sets can be processed consistently. Another common use of multiple data sources is to overlay them so that the combined information becomes more useful.
There are many aspects of an algorithm to be examined. It is straightforward to require that an algorithm solve the problem correctly. For some algorithms, correctness is easy to prove. For example, we will introduce two search algorithms later in this chapter, and their correctness should be quite straightforward to see. Other algorithms, however, are not so obvious, and proving their correctness requires more formal analysis. A second feature of algorithms is their efficiency or running time. Of course we always want an algorithm to be fast, but there are theoretical limits on how fast or efficient an algorithm can be, as determined by the problem. We will discuss some of those problems at the end of the book under the topic of spatial optimization. Besides correctness and running time, algorithms are often closely related to how the data are organized to enable the processes and how the algorithms are actually implemented.

1.1 Computational concerns for algorithms

Let us assume we have a list of n points and the list is not in any particular order. We want to find a point in the list. How long will it take to find the point? This is a reasonable question, but the actual time depends heavily on many factors such as the programming language, the skill of the person who codes the program, the platform, the speed and number of CPUs, and so on. A more useful way to examine the time issue is to determine how many steps we need to finish the job, and then analyze the total cost of performing the algorithm in terms of the number of steps used. The cost of each step is of course variable and depends on what constitutes a step. Nevertheless, this is still a more reliable way of thinking about computing time because many computational steps, such as simple arithmetic operations, logical expressions, accessing computer memory for information retrieval, and variable value assignment, can be identified, and each only costs a constant amount of time. If we can figure out a way to count how many steps are needed to carry out a procedure, we will have a pretty good idea of how much time the entire procedure will cost, especially when we compare algorithms.
Returning to our list of points, if there is no structure in the list – the points are stored in an arbitrary order – the best we can do to find a point from the list is to test all the points in the list, one by one, until we can conclude it is in or not in the list. Let us assume the name of the list is points and we want to find if the list includes point p0. We can use a simple algorithm to do the search (Listing 1.1).
Listing 1.1 (linear search)
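The code for Listing 1.1 did not survive extraction in this copy. The following is a minimal Python sketch reconstructed from the line-by-line cost analysis below; the function name linear_search is our own, and the book's actual listing may differ in detail.

    def linear_search(points, p0):
        for p in points:        # line 1: loop over up to n points, cost c1 per iteration
            if p == p0:         # line 2: logic comparison between two points, cost c2
                return p0       # line 3: return the point found, constant cost c, runs at most once
        return None             # target is not in the list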
The algorithm in Listing 1.1 is called a linear search; in it we simply go through all the points, if necessary, to search for the information we need. How many steps are necessary in this algorithm? The first line is a loop and, because of the size of the list, it will run as many as n times when the item we are looking for happens to be the last one in the list. The cost of running the loop part in line 1 just once is a constant because the list is stored in the computer memory, and the main operations here are to access the information at a fixed location in memory and then to move on to the next item. Suppose that this cost is c1 and we will run it up to n times in the loop. The second line is a logic comparison between two points. It will run up to n times as well because it is inside the loop. Suppose that the cost of doing a logic comparison is c2 and it is a constant too. Line 3 simply returns the value of the point found; it has a constant cost of c and it will only run once. In the best case scenario, we will find the target at the first iteration of the loop and therefore the total cost is simply c1 + c2 + c, which can be generalized as a constant b + c. In the worst case scenario, however, we will need to go all the way to the last item in the list and therefore the total cost becomes c1n + c2n + c, which can be generalized as bn + c, where b and c are constants and n is the size of the list (also the size of the problem). On average, if the list is a random set of points and we are going to search for a random point many times, we should expect a cost of c1n/2 + c2n/2 + c, which can be generalized as b′n + c, and we know b′ < b, meaning that on average the search will not cost as much as the worst case scenario does.
How much are we interested in the actual values of b, b′, and c in the above analysis? How will these values impact the total computation cost? As it turns out, not much, because they are constants. But adding them up many times will have a real impact and n, the problem size, generally controls how many times these constant costs will be added together. When n reaches a certain level, the impact of the constants will become minimal and it is really the magnitude of n that controls the growth of the total computation cost.
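As a quick illustration (with made-up constants), if b = 5 and c = 100, the cost bn + c is 150 for n = 10 but 5,000,100 for n = 1,000,000; at that scale the exact values of b and c barely matter, while the factor of n dominates the growth.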
Some algorithms will have a cost related to n², which is significantly different from a cost of n. For example, the algorithm in Listing 1.2 is a simple procedure to compute the shortest distance between any two points in a list of n points. Here, the first loop (line 2) will run n times at a cost of t1 each, and the second loop (line 3) will run exactly n² times at the same cost of t1 each. The logic comparison (line 4) will run n² times and we assume each time the cost is t2. The calculation of distance (line 5) will definitely be more costly than the other simple operations such as the logic comparison, but it is still a constant because the input is fixed (two points) and only a small finite set of steps is needed to carry out the calculation. We say the cost of each distance calculation is a constant t3. Since we do not compute the distance between a point and itself, the distance calculation will run n² − n times, as will the comparison in line 6 (with time t4). The assignment in line 7 will cost a constant time of t5 and may run up to n² − n times in the worst case scenario where every next distance is shorter than the previous one. The last line will only run once with a time of c. Overall, the total time for this algorithm will be t1n + t1n² + t2n² + t3(n² − n) + t4(n² − n) + t5(n² − n) + c, which can be generalized as an² + bn + c. Now it should be clear that this algorithm has a running time that is controlled by n².
Listing 1.2 (shortest pairwise distance)
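As with Listing 1.1, the original code image is not preserved here. Below is a minimal Python sketch consistent with the line-by-line cost analysis above; the names shortest_pairwise_distance and distance are our own, and the line numbers in the comments refer to the lines counted in that analysis.

    from math import hypot

    def distance(p, q):
        # Euclidean distance between two points given as (x, y) tuples
        return hypot(p[0] - q[0], p[1] - q[1])

    def shortest_pairwise_distance(points):
        n = len(points)
        d_min = float('inf')                            # line 1: initialization
        for i in range(n):                              # line 2: outer loop, runs n times
            for j in range(n):                          # line 3: inner loop, runs n^2 times in total
                if i != j:                              # line 4: skip pairing a point with itself
                    d = distance(points[i], points[j])  # line 5: distance, runs n^2 - n times
                    if d < d_min:                       # line 6: comparison, runs n^2 - n times
                        d_min = d                       # line 7: assignment, up to n^2 - n times
        return d_min                                    # last line: runs once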
In the two example algorithms we have examined so far, the order of n indicates the total cost, and we say that our linear search algorithm has a computation cost in the order of n and the shortest pairwise distance algorithm in the order of n². For the linear search, we also know that, when n increases, the total cost of the search will always have an upper bound of bn. But is there a lower bound? We know the best case scenario has a running time of a constant, or in the order of n⁰, but that does not apply to the general case. When we can definitely find an upper bound but not a lower bound of the running time, we use the O-notation to denote the order. In our case, we have O(n) for the average case, and for the worst case scenario as well (because again the constants do not control the total cost). In other words, we say that the running time, or time complexity, of the linear search algorithm is O(n). Because the O-notation describes an upper bound, which corresponds to the worst case scenario, stating a time complexity in O-notation also refers to the worst case scenario.
There are algorithms for which we do not have an upper bound on their running time. But we know their lower bound and we use the Ω-notation to indicate that. A running time of Ω(n) would mean we know the algorithm will cost at least an order of n in its running time, though we do not know the upper bound of the running time. For other algorithms, we know both upper and lower bounds of the running time and we use the Θ-notation to indicate that. For example, a running time of Θ(n²) indicates that the algorithm will take an order of n² in running time in all cases, best and worst. This is the case for our shortest distance algorithm because the process will always run n² times, regardless of the outcome of the comparison in line 6. It is more accurate to say the time complexity is Θ(n²) instead of O(n²) because we know the running time of the pairwise shortest distance algorithm is always in the order of n².
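For reference, the standard formal definitions behind these three notations (stated here in their usual textbook form rather than quoted from this book) are: f(n) = O(g(n)) if there exist constants c > 0 and n0 such that f(n) ≤ c·g(n) for all n ≥ n0 (upper bound); f(n) = Ω(g(n)) if there exist constants c > 0 and n0 such that f(n) ≥ c·g(n) for all n ≥ n0 (lower bound); and f(n) = Θ(g(n)) if both hold, that is, c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0 (tight bound).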
Now we reorganize our points in the previous list in a particular tree structure as illustrated in Figure 1.1. This is a binary tree because each node on the tree can have at most two branches, starting from the root. Here the root of the tree stores point (6, 7) and we show it at the top. All the points with X coordinates smaller than or equal to that at the root are stored in the left branches of the root and those with X coordinates greater than that of the root point are stored in the right branches of the root. Going down the tree to the second level, we have two points there, (4, 6) a...
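Figure 1.1 is not reproduced in this excerpt and the paragraph is cut off, but the organizing rule it describes can be sketched in Python as follows. The Node class, the insert helper, and the point (9, 9) used for the right branch are illustrative assumptions, not the book's code or data.

    class Node:
        # a tree node holding one point as an (x, y) tuple
        def __init__(self, point):
            self.point = point
            self.left = None     # subtree of points with x <= this node's x
            self.right = None    # subtree of points with x > this node's x

    def insert(root, point):
        # insert a point by comparing X coordinates, as described in the text
        if root is None:
            return Node(point)
        if point[0] <= root.point[0]:
            root.left = insert(root.left, point)
        else:
            root.right = insert(root.right, point)
        return root

    # root (6, 7) and second-level point (4, 6) come from the text; (9, 9) is made up
    root = None
    for p in [(6, 7), (4, 6), (9, 9)]:
        root = insert(root, p)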

Table of contents

  1. Cover
  2. Half Title
  3. Acknowledgements
  4. Title Page
  5. Copyright Page
  6. Acknowledgements
  7. Contents
  8. About the Author
  9. Preface
  10. 1 Introduction
  11. Part I Geometric Algorithms
  12. 2 Basic Geometric Operations
  13. 3 Polygon Overlay
  14. Part II Spatial Indexing
  15. 4 Indexing
  16. 5 k-D Trees
  17. 6 Quadtrees
  18. 7 Indexing Lines and Polygons
  19. Part III Spatial Analysis and Modeling
  20. 8 Interpolation
  21. 9 Spatial Pattern and Analysis
  22. 10 Network Analysis
  23. 11 Spatial Optimization
  24. 12 Heuristic Search Algorithms
  25. Postscript
  26. Appendix A Python: A Primer
  27. Appendix B GDAL/OGR and PySAL
  28. Appendix C Code List
  29. References
  30. Index
