Performance evaluation for most software products is a very scientific process. First, we determine the maximum/minimum supported performance metrics, such as the allowed memory usage, acceptable CPU consumption, and the number of concurrent users. Next, we build a version of the application for the target platform and perform load testing against it in a variety of scenarios, gathering instrumentation data as it runs. Once this data is collected, we analyze it and search it for performance bottlenecks. If problems are discovered, we complete a Root Cause Analysis (RCA), make changes to the configuration or application code to fix the issue, and then repeat the process.
Although game development is a very artistic process, it is still exceptionally technical. Our game should have a target audience in mind, which can tell us what hardware limitations our game might be operating under and, perhaps, tell us exactly what performance targets we need to meet (particularly in the case of console and mobile games). We can perform runtime testing on our application, gather performance data from multiple subsystems (CPU, GPU, memory, the physics engine, the Rendering Pipeline, and so on), and compare it against what we consider to be acceptable. We can then use this data to identify bottlenecks in our application, perform additional instrumentation measurements, and determine the root cause of the issue. Finally, depending on the type of problem, we should be able to apply one of a number of solutions to improve our application's performance.
However, before we spend even a single moment making performance fixes, we first need to prove that a performance problem exists. It is unwise to spend time rewriting and refactoring code until there is a good reason to do so, since premature optimization is rarely worth the hassle. Once we have proof of a performance issue, the next task is figuring out exactly where the bottleneck is located. It is important to understand why the performance issue is happening; otherwise, we could waste even more time applying fixes that are little more than educated guesses. Such guesses often fix only a symptom of the issue, not its root cause, so we risk the problem manifesting itself in other ways in the future, or in ways we haven't yet detected.
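Comparing measurements against explicit targets is what turns "the game feels slow" into proof that a problem exists. Below is a minimal Python sketch, assuming the targets are a frame rate and a memory cap; the `Budget` class, the `check_against_budget` function, and all of the numbers are hypothetical. At 60 FPS, for example, each frame has 1000 / 60 ≈ 16.67 ms to complete.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical performance targets for one target platform."""
    target_fps: float
    max_memory_mb: float

    @property
    def frame_budget_ms(self) -> float:
        # Time available for a single frame at the target frame rate.
        return 1000.0 / self.target_fps


def check_against_budget(budget: Budget, frame_times_ms: list[float],
                         peak_memory_mb: float) -> list[str]:
    """Return human-readable budget violations (an empty list means all targets were met)."""
    problems = []
    over = [t for t in frame_times_ms if t > budget.frame_budget_ms]
    if over:
        problems.append(
            f"{len(over)} of {len(frame_times_ms)} frames exceeded "
            f"{budget.frame_budget_ms:.2f} ms"
        )
    if peak_memory_mb > budget.max_memory_mb:
        problems.append(
            f"peak memory {peak_memory_mb:.0f} MB exceeds {budget.max_memory_mb:.0f} MB"
        )
    return problems
```

For example, `check_against_budget(Budget(target_fps=60, max_memory_mb=1024), [15.0, 16.0, 22.5], 900)` reports that one of the three frames exceeded the 16.67 ms budget, while memory stays within its limit.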
In this chapter, we will explore the following:
- How to gather profiling data using the Unity Profiler
- How to analyze Profiler data for performance bottlenecks
- Techniques to isolate a performance problem and determine its root cause
With a thorough understanding of the problems you're likely to face, you will then be ready for the information presented in the remaining chapters, where you will learn what solutions are available for the types of issues we detect.
The Unity Profiler is built into the Unity Editor itself and provides an expedient way of narrowing down our search for performance bottlenecks by generating usage and statistics reports on a multitude of Unity3D subsystems during runtime. The different subsystems for which it can gather data are listed as follows:
- CPU consumption (per major subsystem)
- Basic and detailed rendering and GPU information
- Runtime memory allocations and overall consumption
- Audio source/data usage
- Physics engine (2D and 3D) usage
- Network messaging and operation usage
- Video playback usage
- Basic and detailed user interface performance
- Global Illumination (GI) statistics
There are generally two approaches to making use of a profiling tool: instrumentation and benchmarking (although, admittedly, the two terms are often used interchangeably).
Instrumentation typically means taking a close look into the inner workings of the application by observing the behavior of targeted function calls, where and how much memory is being allocated, and, generally, getting an accurate picture of what is happening in the hope of finding the root cause of a problem. However, this is normally not an efficient way to start identifying performance problems, because profiling any application comes with a performance cost of its own.
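To make the idea of instrumentation concrete, here is a minimal Python sketch (a language-neutral illustration, not Unity code) that wraps a function and records how long each call takes. The `instrument` decorator and the `update_enemies` function are hypothetical. Unity scripts can mark code regions for the Profiler in a comparable way with `Profiler.BeginSample` and `Profiler.EndSample` from the `UnityEngine.Profiling` namespace.

```python
import time
from collections import defaultdict
from functools import wraps

# Measured call durations per function name, in milliseconds.
samples: dict[str, list[float]] = defaultdict(list)


def instrument(func):
    """Record the wall-clock duration of every call to func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # The measurement itself costs time: two clock reads and a list append per call.
            samples[func.__qualname__].append((time.perf_counter() - start) * 1000.0)
    return wrapper


@instrument
def update_enemies(count: int) -> int:
    # Hypothetical stand-in for per-frame game logic.
    return sum(i * i for i in range(count))


for _ in range(5):
    update_enemies(10_000)

for name, durations in samples.items():
    print(f"{name}: {len(durations)} calls, avg {sum(durations) / len(durations):.3f} ms")
```

Every wrapped call pays for the extra bookkeeping, which is exactly the kind of overhead that makes blanket instrumentation a poor first step on a large codebase.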
When a Unity application is compiled in Development Mode (determined by the Development Build flag in the Build Settings menu), additional compiler flags are enabled, causing the application to generate special events at runtime, which get logged and stored by the Profiler. Naturally, this causes additional CPU and memory overhead at runtime due to all of the extra workload the application takes on. Even worse, if the application is being profiled through the Unity Editor, even more CPU and memory will be consumed, because the Editor must also update its interface, render additional windows (such as the Scene window), and handle background tasks. This profiling cost is not always negligible. In excessively large projects, it can sometimes cause all kinds of inconsistent and unexpected behavior when the Profiler is enabled: Unity can run out of memory, some scripts may refuse to run, physics may stop being updated (a frame may take so long that the physics engine reaches the maximum number of updates allowed per frame), and more. This is a necessary price we pay for a deep analysis of our code's behavior at runtime, and we should always be aware of its implications. Therefore, before we get ahead of ourselves and start analyzing every line of code in our application, it would be wiser to do some benchmarking.
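The physics symptom mentioned above follows from how fixed-timestep simulation loops are usually built. The sketch below is a generic fixed-timestep loop, not Unity's internal implementation; the constants and the `simulate_physics` name are illustrative assumptions, loosely analogous to Unity's Fixed Timestep and Maximum Allowed Timestep settings. Times are kept as integer milliseconds to avoid floating-point drift in the accumulator.

```python
FIXED_STEP_MS = 20         # one physics step per 20 ms of game time (illustrative value)
MAX_ALLOWED_STEP_MS = 100  # at most 100 ms of physics simulated per frame (illustrative value)


def simulate_physics(frame_times_ms: list[int]) -> tuple[int, int]:
    """Run a fixed-timestep loop over the given frame durations.

    Returns (physics steps executed, milliseconds of simulation time discarded).
    """
    accumulator = 0
    steps = 0
    discarded_ms = 0
    for frame_ms in frame_times_ms:
        # Clamp the time handed to physics so one slow frame cannot demand unbounded catch-up work.
        clamped = min(frame_ms, MAX_ALLOWED_STEP_MS)
        discarded_ms += frame_ms - clamped
        accumulator += clamped
        # Consume the accumulated time in fixed-size physics steps.
        while accumulator >= FIXED_STEP_MS:
            accumulator -= FIXED_STEP_MS
            steps += 1
    return steps, discarded_ms
```

With steady 20 ms frames, each frame gets exactly one physics step. A 500 ms frame, such as one inflated by heavy profiling overhead, still receives only 100 ms (five steps) of physics, so 400 ms of simulation time is discarded and objects appear to slow down or freeze.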
Benchmarking involves performing a surface-level measurement of the application. We should gather some rudimentary data and run test scenarios during a runtime session of our game while it runs on the target hardware; a test case could be as simple as a few seconds of gameplay, playback of a cutscene, or a partial playthrough of a level. The idea is to get a general feel for what the user might experience and to watch for moments when performance becomes noticeably worse. Such problems may be severe enough to warrant further analysis.
The important metrics we're interested in when benchmarking are often the number of frames per second (FPS) being rendered, overall memory consumption, how CPU activity behaves (looking for large spikes in activity), and sometimes CPU/GPU temperature. These are all relatively simple metrics to collect and can be used as a go-to first approach to performance analysis for one important reason: it will save us an enormous amount of time in the long run, because it ensures that we only spend our time investigating problems that users would notice.
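Several of these benchmarking metrics can be derived from a simple capture of per-frame times. The following Python sketch, assuming frame times recorded in milliseconds on the target device, computes the average FPS, the worst frame, and which frames count as spikes. The `summarize_frames` function and its `spike_factor` threshold (twice the median frame time) are hypothetical choices, not Unity settings.

```python
from statistics import median


def summarize_frames(frame_times_ms: list[float], spike_factor: float = 2.0) -> dict:
    """Summarize a benchmark capture of per-frame times in milliseconds.

    A frame counts as a spike when it takes more than spike_factor times
    the median frame time (a hypothetical threshold).
    """
    total_ms = sum(frame_times_ms)
    typical_ms = median(frame_times_ms)
    spikes = [i for i, t in enumerate(frame_times_ms) if t > spike_factor * typical_ms]
    return {
        # Frames rendered divided by elapsed seconds.
        "average_fps": 1000.0 * len(frame_times_ms) / total_ms,
        "worst_frame_ms": max(frame_times_ms),
        "spike_frames": spikes,
    }
```

For a capture of nine 16 ms frames followed by one 80 ms frame, the average comes out at roughly 44.6 FPS and frame 9 is flagged as a spike. The average alone would hide the fact that nine frames ran at 62.5 FPS while one produced a visible hitch, which is why spikes deserve their own check.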
We should dig deeper into instrumentation only after a benchmarking test indicates that further analysis is required. It is also very important to benchmark by simulating actual platform behavior as much as possible if we want a realistic data sample. As such, we should never accept benchmarking data that was generated through Editor mode as being representative of real gameplay, since Editor mode comes with some additional overhead costs that might mislead us, or hide potential race conditions in a real application. Instead, we should hook the profiling tool into the application while it is running in a standalone format on the target hardware.
Many Unity developers are surprised to find that the Editor sometimes calculates the results of operations much faster than a standalone application does. This is particularly common when dealing with serialized data such as audio files, Prefabs, and ScriptableObjects. This is because the Editor caches previously imported data and can access it much faster than a real application would.
Now, let's cover how to access the Unity Profiler and connect it to the target device so that we can start running accurate benchmarking tests.
Users who are already familiar with connecting the Unity Profiler to their applications can skip to the section entitled The Profiler window.
We will begin with a brief tutorial on how to connect our game to the Unity Profiler within a variety of contexts:
- Local instances of the application, either through the Editor or a standalone instance
- Local instances of a WebGL application running in a browser
- Remote instances of the application on an iOS device (for example, iPhone or iPad)
- Remote instances of the application on an Android device (for example, an Android tablet or phone)
- Profiling the E...