Build Your Own Programming Language
eBook - ePub

Build Your Own Programming Language

Clinton L. Jeffery

  1. 494 páginas
  2. English
  3. ePUB (apto para móviles)
  4. Disponible en iOS y Android
eBook - ePub

Build Your Own Programming Language

Clinton L. Jeffery

Detalles del libro
Vista previa del libro

Información del libro

Written by the creator of the Unicon programming language, this book will show you how to implement programming languages to reduce the time and cost of creating applications for new or specialized areas of computing

Key Features

  • Reduce development time and solve pain points in your application domain by building a custom programming language
  • Learn how to create parsers, code generators, file readers, analyzers, and interpreters
  • Create an alternative to frameworks and libraries to solve domain-specific problems

Book Description

The need for different types of computer languages is growing rapidly and developers prefer creating domain-specific languages for solving specific application domain problems. Building your own programming language has its advantages. It can be your antidote to the ever-increasing size and complexity of software.In this book, you'll start with implementing the frontend of a compiler for your language, including a lexical analyzer and parser. The book covers a series of traversals of syntax trees, culminating with code generation for a bytecode virtual machine. Moving ahead, you'll learn how domain-specific language features are often best represented by operators and functions that are built into the language, rather than library functions. We'll conclude with how to implement garbage collection, including reference counting and mark-and-sweep garbage collection. Throughout the book, Dr. Jeffery weaves in his experience of building the Unicon programming language to give better context to the concepts where relevant examples are provided in both Unicon and Java so that you can follow the code of your choice of either a very high-level language with advanced features, or a mainstream language.By the end of this book, you'll be able to build and deploy your own domain-specific languages, capable of compiling and running programs.

What you will learn

  • Perform requirements analysis for the new language and design language syntax and semantics
  • Write lexical and context-free grammar rules for common expressions and control structures
  • Develop a scanner that reads source code and generate a parser that checks syntax
  • Build key data structures in a compiler and use your compiler to build a syntax-coloring code editor
  • Implement a bytecode interpreter and run bytecode generated by your compiler
  • Write tree traversals that insert information into the syntax tree
  • Implement garbage collection in your language

Who this book is for

This book is for software developers interested in the idea of inventing their own language or developing a domain-specific language. Computer science students taking compiler construction courses will also find this book highly useful as a practical guide to language implementation to supplement more theoretical textbooks. Intermediate-level knowledge and experience working with a high-level language such as Java or the C++ language are expected to help you get the most out of this book.


Preguntas frecuentes

¿Cómo cancelo mi suscripción?
Simplemente, dirígete a la sección ajustes de la cuenta y haz clic en «Cancelar suscripción». Así de sencillo. Después de cancelar tu suscripción, esta permanecerá activa el tiempo restante que hayas pagado. Obtén más información aquí.
¿Cómo descargo los libros?
Por el momento, todos nuestros libros ePub adaptables a dispositivos móviles se pueden descargar a través de la aplicación. La mayor parte de nuestros PDF también se puede descargar y ya estamos trabajando para que el resto también sea descargable. Obtén más información aquí.
¿En qué se diferencian los planes de precios?
Ambos planes te permiten acceder por completo a la biblioteca y a todas las funciones de Perlego. Las únicas diferencias son el precio y el período de suscripción: con el plan anual ahorrarás en torno a un 30 % en comparación con 12 meses de un plan mensual.
¿Qué es Perlego?
Somos un servicio de suscripción de libros de texto en línea que te permite acceder a toda una biblioteca en línea por menos de lo que cuesta un libro al mes. Con más de un millón de libros sobre más de 1000 categorías, ¡tenemos todo lo que necesitas! Obtén más información aquí.
¿Perlego ofrece la función de texto a voz?
Busca el símbolo de lectura en voz alta en tu próximo libro para ver si puedes escucharlo. La herramienta de lectura en voz alta lee el texto en voz alta por ti, resaltando el texto a medida que se lee. Puedes pausarla, acelerarla y ralentizarla. Obtén más información aquí.
¿Es Build Your Own Programming Language un PDF/ePUB en línea?
Sí, puedes acceder a Build Your Own Programming Language de Clinton L. Jeffery en formato PDF o ePUB, así como a otros libros populares de Computer Science y Programming in Java. Tenemos más de un millón de libros disponibles en nuestro catálogo para que explores.


Computer Science

Section 1: Programming Language Frontends

In this section, you will create a basic language design and implement the frontend of a compiler for it, including a lexical analyzer and a parser that builds a syntax tree from an input source file.
This section comprises the following chapters:
  • Chapter 1, Why Build Another Programming Language?
  • Chapter 2, Programming Language Design
  • Chapter 3, Scanning Source Code
  • Chapter 4, Parsing
  • Chapter 5, Syntax Trees

Chapter 1: Why Build Another Programming Language?

This book will show you how to build your own programming language, but first, you should ask yourself, why would I want to do this? For a few of you, the answer will be simple: because it is so much fun. However, for the rest of us, it is a lot of work to build a programming language, and we need to be sure about it before we make a start. Do you have the patience and persistence that it takes?
This chapter points out a few good reasons for building your own programming language, as well as some situations where you don't have to build your contemplated language; after all, designing a class library for your application domain might be simpler and just as effective. However, libraries have their downsides, and sometimes only a new language will do.
After this chapter, the rest of this book will, having considered things carefully, take for granted that you have decided to build a language. In that case, you should determine some of the requirements for your language. But first, we're going to cover the following main topics in this chapter:
  • Motivations for writing your own programming language
  • The difference between programming languages and libraries
  • The applicability of programming language tools to other software projects
  • Establishing the requirements for your language
  • A case study that discusses the requirements for the Unicon language
Let's start by looking at motivations.

So, you want to write your own programming language…

Sure, some programming language inventors are rock stars of computer science, such as Dennis Ritchie or Guido van Rossum! But becoming a rock star of computer science was easier back then. I heard the following report a long time ago from an attendee of the second History of Programming Languages Conference: The consensus was that the field of programming languages is dead. All the important languages have been invented already. This was proven wildly wrong a year or two later when Java hit the scene, and perhaps a dozen times since then when languages such as Go emerged. After a mere six decades, it would be unwise to claim our field is mature and that there's nothing new to invent that might make you famous.
Still, celebrity is a bad reason for building a programming language. The chances of acquiring fame or fortune from your programming language invention are slim. Curiosity and desire to know how things work are valid reasons, so long as you've got the time and inclination, but perhaps the best reasons for building your own programming language are need and necessity.
Some folks need to build a new language or a new implementation of an existing programming language to target a new processor or compete with a rival company. If that's not you, then perhaps you've looked at the best languages (and compilers or interpreters) available for some domain that you are developing programs for, and they are missing some key features for what you are doing, and those missing features are causing you pain. Every once in a blue moon, someone comes up with a whole new style of computing that a new programming paradigm requires a new language for.
While we are discussing your motivations for building a language, let's talk about the different kinds of languages, organization, and the examples this book will use to guide you. Each of these topics is worth looking at.

Types of programming language implementations

Whatever your reasons, before you build a programming language, you should pick the best tools and technologies you can find to do the job. In our case, this book will pick them for you. First, there is a question of the implementation language that you are building your language in. Programming language academics like to brag about writing their language in that language itself, but this is usually only a half-truth (or someone was being very impractical and showing off at the same time). There is also the question of just what kind of programming language implementation to build:
  • A pure interpreter that executes source code itself
  • A native compiler and a runtime system, such as in C
  • A transpiler that translates your language into some other high-level language
  • A bytecode compiler with an accompanying bytecode machine, such as Java
The first option is fun but usually too slow. The second option is the best, but usually, it's too labor-intensive; a good native compiler may take many person-years of effort.
While the third option is by far the easiest and probably the most fun, and I have used it before with success, if it isn't a prototype, then it is sort of cheating. Sure, the first version of C++ was a transpiler, but that gave way to compilers and not just because it was buggy. Strangely, generating high-level code seems to make your language even more dependent on the underlying language than the other options, and languages are moving targets. Good languages have died because their underlying dependencies disappeared or broke irreparably on them. It can be the death of a thousand small cuts.
This book chooses the fourth option: we will build a bytecode compiler with an accompanying bytecode machine because that is a sweet spot that gives the most flexibility while still offering decent performance. A chapter on native code compilation is included for those of you who require the fastest possible execution.
The notion of a bytecode machine is very old; it was made famous by UCSD's Pascal implementation and the classic SmallTalk-80 implementation, among others. It became ubiquitous to the point of entering lay English with the promulgation of Java's JVM. Bytecode machines are abstract processors interpreted by software; they are often called virtual machines (as in Java Virtual Machine), although I will not use that terminology because it is also used to refer to software tools that use real hardware instruction sets, such as IBM's classic platforms or more modern tools such as Virtual Box.
A bytecode machine is typically quite a bit higher level than a piece of hardware, so a bytecode implementation affords much flexibility. Let's have a quick look at what it will take to get there…

Organizing a bytecode language implementation

To a large extent, the organization of this book follows the classic organization of a bytecode compiler and its corresponding virtual machine. These components are defined here, followed by a diagram to summarize them:
  • A lexical analyzer reads in source code characters and figures out how they are grouped into a sequence of words or tokens.
  • A syntax analyzer reads in a sequence of tokens and determines whether that sequence is legal according to the grammar of the language. If the tokens are in a legal order, it produces a syntax tree.
  • A semantic analyzer checks to ensure that all the names being used are legal for the operations in which they are being used. It checks their types to determine exactly what operations are being performed. All this checking makes the syntax tree heavy, laden with the extra information about where variables are declared and what their types are.
  • An intermediate code generator figures out memory locations for all the variables and all the places where a program may abruptly change execution flow, such as loops and function calls. It adds them to the syntax tree and then walks this even fatter tree before building a list of machine-independent intermediate code instructions.
  • A final code generator turns the list of intermediate code instructions into the actual bytecode in a file format that will be efficient to load and execute.
Independent from the steps of this bytecode virtual machine compiler, a bytecode interpreter is written to load and execute programs. It is a gigantic loop with a switch statement in it, but for exotic programming languages, the compiler might be no big deal and all the magic will happen in the bytecode interpreter. The whole organization can be summarized by the following diagram:
Figure 1.1 – Phases and dataflow in a simple programming language
Figure 1.1 – Phases and dataflow in a simple programming language
It will take a lot of code to illustrate how to build a bytecode machine implementation of a programming language. How that code is presented is important and will tell you what you need to know going in, and much of what you may learn from going through this book.

Languages used in the examples

This book provides code examples in two languages using a parallel translations model. The first language is Java, because that language is ubiquitous. Hopefully, you know it or C++ and will be able to read the examples with intermediate proficiency. The second example language is the author's own language, Unicon. While reading this book, you can judge for yourself which language is better suited to building your own programming language. As many examples as possible will be provided in both languages, and the examples in the two languages will be written as similarly as possible. Sometimes, this will be to the advantage of the lesser language.
The differences between Java and Unicon wi...


  1. Build Your Own Programming Language
  2. Contributors
  3. Preface
  4. Section 1: Programming Language Frontends
  5. Chapter 1: Why Build Another Programming Language?
  6. Chapter 2: Programming Language Design
  7. Chapter 3: Scanning Source Code
  8. Chapter 4: Parsing
  9. Chapter 5: Syntax Trees
  10. Section 2: Syntax Tree Traversals
  11. Chapter 6: Symbol Tables
  12. Chapter 7: Checking Base Types
  13. Chapter 8: Checking Types on Arrays, Method Calls, and Structure Accesses
  14. Chapter 9: Intermediate Code Generation
  15. Chapter 10: Syntax Coloring in an IDE
  16. Section 3: Code Generation and Runtime Systems
  17. Chapter 11: Bytecode Interpreters
  18. Chapter 12: Generating Bytecode
  19. Chapter 13: Native Code Generation
  20. Chapter 14: Implementing Operators and Built-In Functions
  21. Chapter 15: Domain Control Structures
  22. Chapter 16: Garbage Collection
  23. Chapter 17: Final Thoughts
  24. Section 4: Appendix
  25. Appendix: Unicon Essentials
  26. Assessments
  27. Other Books You May Enjoy