eBook - ePub

Build Your Own Programming Language

Name: Build Your Own Programming Language
Author: Clinton L. Jeffery

Clinton L. Jeffery

Share book

494 pages
English
ePUB (mobile friendly)
Available on iOS & Android

eBook - ePub

Build Your Own Programming Language

Clinton L. Jeffery

Book details

Book preview

Table of contents

Citations

About This Book

Written by the creator of the Unicon programming language, this book will show you how to implement programming languages to reduce the time and cost of creating applications for new or specialized areas of computing

Key Features

Reduce development time and solve pain points in your application domain by building a custom programming language
Learn how to create parsers, code generators, file readers, analyzers, and interpreters
Create an alternative to frameworks and libraries to solve domain-specific problems

Book Description

The need for different types of computer languages is growing rapidly and developers prefer creating domain-specific languages for solving specific application domain problems. Building your own programming language has its advantages. It can be your antidote to the ever-increasing size and complexity of software.In this book, you'll start with implementing the frontend of a compiler for your language, including a lexical analyzer and parser. The book covers a series of traversals of syntax trees, culminating with code generation for a bytecode virtual machine. Moving ahead, you'll learn how domain-specific language features are often best represented by operators and functions that are built into the language, rather than library functions. We'll conclude with how to implement garbage collection, including reference counting and mark-and-sweep garbage collection. Throughout the book, Dr. Jeffery weaves in his experience of building the Unicon programming language to give better context to the concepts where relevant examples are provided in both Unicon and Java so that you can follow the code of your choice of either a very high-level language with advanced features, or a mainstream language.By the end of this book, you'll be able to build and deploy your own domain-specific languages, capable of compiling and running programs.

What you will learn

Perform requirements analysis for the new language and design language syntax and semantics
Write lexical and context-free grammar rules for common expressions and control structures
Develop a scanner that reads source code and generate a parser that checks syntax
Build key data structures in a compiler and use your compiler to build a syntax-coloring code editor
Implement a bytecode interpreter and run bytecode generated by your compiler
Write tree traversals that insert information into the syntax tree
Implement garbage collection in your language

Who this book is for

This book is for software developers interested in the idea of inventing their own language or developing a domain-specific language. Computer science students taking compiler construction courses will also find this book highly useful as a practical guide to language implementation to supplement more theoretical textbooks. Intermediate-level knowledge and experience working with a high-level language such as Java or the C++ language are expected to help you get the most out of this book.

]]>

Frequently asked questions

How do I cancel my subscription?

Simply head over to the account section in settings and click on “Cancel Subscription” - it’s as simple as that. After you cancel, your membership will stay active for the remainder of the time you’ve paid for. Learn more here.

Can/how do I download books?

At the moment all of our mobile-responsive ePub books are available to download via the app. Most of our PDFs are also available to download and we're working on making the final remaining ones downloadable now. Learn more here.

What is the difference between the pricing plans?

Both plans give you full access to the library and all of Perlego’s features. The only differences are the price and subscription period: With the annual plan you’ll save around 30% compared to 12 months on the monthly plan.

What is Perlego?

We are an online textbook subscription service, where you can get access to an entire online library for less than the price of a single book per month. With over 1 million books across 1000+ topics, we’ve got you covered! Learn more here.

Do you support text-to-speech?

Look out for the read-aloud symbol on your next book to see if you can listen to it. The read-aloud tool reads text aloud for you, highlighting the text as it is being read. You can pause it, speed it up and slow it down. Learn more here.

Is Build Your Own Programming Language an online PDF/ePUB?

Yes, you can access Build Your Own Programming Language by Clinton L. Jeffery in PDF and/or ePUB format, as well as other popular books in Computer Science & Programming in Java. We have over one million books available in our catalogue for you to explore.

Information

Publisher

Packt Publishing

Year

2021

ISBN

9781800200333

Topic

Computer Science

Subtopic

Programming in Java

Index

Computer Science

Section 1: Programming Language Frontends

In this section, you will create a basic language design and implement the frontend of a compiler for it, including a lexical analyzer and a parser that builds a syntax tree from an input source file.

This section comprises the following chapters:

Chapter 1, Why Build Another Programming Language?
Chapter 2, Programming Language Design
Chapter 3, Scanning Source Code
Chapter 4, Parsing
Chapter 5, Syntax Trees

Chapter 1: Why Build Another Programming Language?

This book will show you how to build your own programming language, but first, you should ask yourself, why would I want to do this? For a few of you, the answer will be simple: because it is so much fun. However, for the rest of us, it is a lot of work to build a programming language, and we need to be sure about it before we make a start. Do you have the patience and persistence that it takes?

This chapter points out a few good reasons for building your own programming language, as well as some situations where you don't have to build your contemplated language; after all, designing a class library for your application domain might be simpler and just as effective. However, libraries have their downsides, and sometimes only a new language will do.

After this chapter, the rest of this book will, having considered things carefully, take for granted that you have decided to build a language. In that case, you should determine some of the requirements for your language. But first, we're going to cover the following main topics in this chapter:

Motivations for writing your own programming language
The difference between programming languages and libraries
The applicability of programming language tools to other software projects
Establishing the requirements for your language
A case study that discusses the requirements for the Unicon language

Let's start by looking at motivations.

So, you want to write your own programming language…

Sure, some programming language inventors are rock stars of computer science, such as Dennis Ritchie or Guido van Rossum! But becoming a rock star of computer science was easier back then. I heard the following report a long time ago from an attendee of the second History of Programming Languages Conference: The consensus was that the field of programming languages is dead. All the important languages have been invented already. This was proven wildly wrong a year or two later when Java hit the scene, and perhaps a dozen times since then when languages such as Go emerged. After a mere six decades, it would be unwise to claim our field is mature and that there's nothing new to invent that might make you famous.

Still, celebrity is a bad reason for building a programming language. The chances of acquiring fame or fortune from your programming language invention are slim. Curiosity and desire to know how things work are valid reasons, so long as you've got the time and inclination, but perhaps the best reasons for building your own programming language are need and necessity.

Some folks need to build a new language or a new implementation of an existing programming language to target a new processor or compete with a rival company. If that's not you, then perhaps you've looked at the best languages (and compilers or interpreters) available for some domain that you are developing programs for, and they are missing some key features for what you are doing, and those missing features are causing you pain. Every once in a blue moon, someone comes up with a whole new style of computing that a new programming paradigm requires a new language for.

While we are discussing your motivations for building a language, let's talk about the different kinds of languages, organization, and the examples this book will use to guide you. Each of these topics is worth looking at.

Types of programming language implementations

Whatever your reasons, before you build a programming language, you should pick the best tools and technologies you can find to do the job. In our case, this book will pick them for you. First, there is a question of the implementation language that you are building your language in. Programming language academics like to brag about writing their language in that language itself, but this is usually only a half-truth (or someone was being very impractical and showing off at the same time). There is also the question of just what kind of programming language implementation to build:

A pure interpreter that executes source code itself
A native compiler and a runtime system, such as in C
A transpiler that translates your language into some other high-level language
A bytecode compiler with an accompanying bytecode machine, such as Java

The first option is fun but usually too slow. The second option is the best, but usually, it's too labor-intensive; a good native compiler may take many person-years of effort.

While the third option is by far the easiest and probably the most fun, and I have used it before with success, if it isn't a prototype, then it is sort of cheating. Sure, the first version of C++ was a transpiler, but that gave way to compilers and not just because it was buggy. Strangely, generating high-level code seems to make your language even more dependent on the underlying language than the other options, and languages are moving targets. Good languages have died because their underlying dependencies disappeared or broke irreparably on them. It can be the death of a thousand small cuts.

This book chooses the fourth option: we will build a bytecode compiler with an accompanying bytecode machine because that is a sweet spot that gives the most flexibility while still offering decent performance. A chapter on native code compilation is included for those of you who require the fastest possible execution.

The notion of a bytecode machine is very old; it was made famous by UCSD's Pascal implementation and the classic SmallTalk-80 implementation, among others. It became ubiquitous to the point of entering lay English with the promulgation of Java's JVM. Bytecode machines are abstract processors interpreted by software; they are often called virtual machines (as in Java Virtual Machine), although I will not use that terminology because it is also used to refer to software tools that use real hardware instruction sets, such as IBM's classic platforms or more modern tools such as Virtual Box.

A bytecode machine is typically quite a bit higher level than a piece of hardware, so a bytecode implementation affords much flexibility. Let's have a quick look at what it will take to get there…

Organizing a bytecode language implementation

To a large extent, the organization of this book follows the classic organization of a bytecode compiler and its corresponding virtual machine. These components are defined here, followed by a diagram to summarize them:

A lexical analyzer reads in source code characters and figures out how they are grouped into a sequence of words or tokens.
A syntax analyzer reads in a sequence of tokens and determines whether that sequence is legal according to the grammar of the language. If the tokens are in a legal order, it produces a syntax tree.
A semantic analyzer checks to ensure that all the names being used are legal for the operations in which they are being used. It checks their types to determine exactly what operations are being performed. All this checking makes the syntax tree heavy, laden with the extra information about where variables are declared and what their types are.
An intermediate code generator figures out memory locations for all the variables and all the places where a program may abruptly change execution flow, such as loops and function calls. It adds them to the syntax tree and then walks this even fatter tree before building a list of machine-independent intermediate code instructions.
A final code generator turns the list of intermediate code instructions into the actual bytecode in a file format that will be efficient to load and execute.

Independent from the steps of this bytecode virtual machine compiler, a bytecode interpreter is written to load and execute programs. It is a gigantic loop with a switch statement in it, but for exotic programming languages, the compiler might be no big deal and all the magic will happen in the bytecode interpreter. The whole organization can be summarized by the following diagram:

Figure 1.1 – Phases and dataflow in a simple programming language

It will take a lot of code to illustrate how to build a bytecode machine implementation of a programming language. How that code is presented is important and will tell you what you need to know going in, and much of what you may learn from going through this book.

Languages used in the examples

This book provides code examples in two languages using a parallel translations model. The first language is Java, because that language is ubiquitous. Hopefully, you know it or C++ and will be able to read the examples with intermediate proficiency. The second example language is the author's own language, Unicon. While reading this book, you can judge for yourself which language is better suited to building your own programming language. As many examples as possible will be provided in both languages, and the examples in the two languages will be written as similarly as possible. Sometimes, this will be to the advantage of the lesser language.

The differences between Java and Unicon wi...