III
Supports in Virtual Machine
CHAPTER 7
Native Interface
Throughout the discussions on just-in-time (JIT) compilation, garbage collection (GC), and threading, we mentioned a couple of core functionalities that need support in a virtual machine (VM). In the following few chapters of Section III, we discuss them in more detail.
7.1 WHY NATIVE INTERFACE
A native interface is needed for high-level languages to access low-level system resources and VM services. High-level languages cannot access low-level resources directly, for security, portability, and implementation reasons.
• Security reason: A high-level language is not allowed to directly manipulate memory addresses, machine instructions, input/output (I/O) interfaces, and so on. Yet such access is necessary when a program needs to deal with low-level logic or to deliver high performance.
• Portability reason: A high-level language is designed to be platform independent. To access platform-specific features such as the file system, it has to go through the native language of the platform.
• Implementation reason: Sometimes certain libraries, such as media libraries, are available only in native languages, either because they have not been ported to high-level languages or because they exist only as legacy implementations.
To bridge these gaps, a high-level language needs a native interface, which is implemented in its VM. The word “native” here reflects the fact that the interface provides access to the native language of the operating system (OS) underlying the VM. Since C is the native language of the major OSs available today, it makes sense for the Java Native Interface (JNI) to support access to C, although the Java VM (JVM) does not exclude other languages from being used to program native methods.
A native interface design has the following properties:
Native language: The native language of an OS is not necessarily C, nor even necessarily a low-level language. It all depends on the implementation. For a Java-based OS, Java can be regarded as the native language of the OS. However, such an OS still needs a native interface for Java to access low-level hardware or system resources, unless the hardware is designed in a way that allows for secure programming. The ultimate question is whether the world that a computing machine can model is safe by itself. If the answer is no, then a native interface is always necessary on the boundary between the safe and unsafe worlds. As a result, the native language can be lower level than C, as long as the interface convention is well defined.
Native code to managed code: A native interface is defined not only for the high-level language to access a low-level one, but also for the reverse direction, that is, for the low-level language to access the high-level one. The latter is needed because otherwise there would be no way to launch the VM from the OS, or to call back from native code into the high-level program. For example, a listener application written in C on a network socket wakes up on a socket event and invokes an event handler written in Java.
Data sharing: A native interface is needed not only for code access between high-level and low-level languages, but also for data sharing between them. The low-level language should be able to access data created by the high-level language. It is also desirable for the low-level language to create data that is accessible to the high-level language.
High-level properties: Although it is designed for low-level language access, the native interface is part of the high-level language design. That means the application programming interface (API) of the native interface should not break the important safety properties of the high-level language. For example, the object layout should remain opaque to native code, and the same exception-throwing process should still be observed in native code.
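The two call directions described above can be sketched from the Java side. This is a minimal illustration, not a complete JNI program: the class `SocketListener`, the native method `waitForEvent`, and the handler `onSocketEvent` are hypothetical names, and no native library is supplied. In a real setup, the C side would implement `waitForEvent` and could call the handler back through a JNI function such as `CallStaticVoidMethod`.

```java
// Sketch only: all names here are hypothetical and no native library is loaded.
final class SocketListener {
    // Managed-to-native direction: the body would be provided by a C function
    // in a native library that the VM binds to this declaration.
    private static native int waitForEvent(int port);

    // Native-to-managed direction: a handler that native code could invoke,
    // for example through JNI's CallStaticVoidMethod.
    static void onSocketEvent(int fd) {
        System.out.println("socket event on descriptor " + fd);
    }

    // Attempts the native call and reports whether the VM could bind it.
    static String tryNativeCall() {
        try {
            waitForEvent(8080);
            return "linked";
        } catch (UnsatisfiedLinkError e) {
            // Thrown when no native implementation is bound to the declaration.
            return "unlinked";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryNativeCall());
    }
}
```

Run as written, with no native library loaded, the call cannot be bound and the VM raises `UnsatisfiedLinkError`, which shows that the `native` declaration is only one half of the contract.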
These safety properties hold only when a program is written against the “native interface,” because the native interface is under the VM’s control. Programs written in “native code” that do not follow the “native interface” do not maintain the safety properties. Native code can do anything it is designed for. It can allocate virtual memory, create native threads, and so on, using the low-level language’s API. Those entities are then managed not by the VM but by the low-level language’s implementation. For example, virtual memory allocated directly in native code is not subject to the VM’s garbage collection.
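One way Java exposes data sharing with native code is the direct byte buffer, whose contents may reside outside the normal garbage-collected heap. The sketch below shows only the Java side. The class and method names are hypothetical, and the native side is assumed rather than shown: C code could obtain the buffer's address with the JNI function `GetDirectBufferAddress`. Note that the buffer object itself is still an ordinary Java object managed by the VM.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch only: SharedBufferSketch and makeShared are hypothetical names.
final class SharedBufferSketch {
    // Allocates a direct buffer and switches it to the platform's byte order,
    // so multi-byte values have the layout that native code would expect.
    static ByteBuffer makeShared(int capacity) {
        return ByteBuffer.allocateDirect(capacity).order(ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        ByteBuffer buf = makeShared(16);
        buf.putInt(0, 42);                  // data written by managed code
        System.out.println(buf.isDirect()); // prints "true"
        System.out.println(buf.getInt(0));  // prints "42"
    }
}
```

Setting the byte order explicitly matters because a newly created byte buffer defaults to big-endian regardless of the platform.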
In recent years, web applications have become popular, where the high-level programming language is HTML/JavaScript. The VM for web applications is called a web runtime, which is usually embedded in a web browser. As a result, although the term “native language” in the web browser community refers to C/C++, as in the Java community, it refers to different things in the web application community.
For example, the web application community calls Java the native language of Android, because Java is the major programming language of Android, in contrast to the web programming language HTML/JavaScript. Similarly, Objective-C or Swift is referred to as the native language of iOS by the web application community. However, to the browser developers of Chrome or Safari (as opposed to web application developers), the native language of the web runtime is still C/C++, because that is the language that implements the web runtime and provides it with low-level resource access.
In the remainder of this chapter, we use JNI as an example to discuss the details of a common native interface implementation, although the design is not limited to JNI.
7.2 TRANSITION FROM MANAGED CODE TO NATIVE CODE
The primary requirement of a native interface is to allow managed code to call native code and vice versa. The key, then, is to agree on a calling convention between the two worlds. A calling convention defines the Application Binary Interface (ABI) for program control flow to transfer into and out of a function (or method), that is, how to pass arguments and return values and how to prepare and restore the stack. Sometimes it also needs to maintain stack frame information to support debugging, exception handling, and garbage collection. Once a calling convention is defined for a language on a platform, any compiler generating code for that language on that platform should follow the convention. Code written in different languages can interact if it follows the same calling convention.
Native code is compiled by a different compiler from the VM’s JIT compiler, and the native code compiler is usually not part of the VM. In other words, the calling convention of native code is not defined by the VM. If managed code wants to interact with native code, it should follow the native code’s calling convention. That is, the JVM must know C’s calling convention to support JNI.
7.2.1 Wrapper for Native Method
A common way to implement native calls in a JVM is to generate wrapper code that performs the calling convention transformation between Java and native code. The wrapper code does all the necessary preparation and bookkeeping for the control flow transfer, as shown in Figures 7.1 and 7.2.
When compiling the caller’s Java code, the JIT compiler generates a call instruction to the wrapper code, which in turn calls into the actual native method. The wrapper follows Java calling convention to the Java caller and fo...