Tutorials

Static typing tutorial

Static typing of variables

Variables in your program have a meaning, and their names usually hint at that meaning. Suppose you have a variable called current_speed. It will probably hold a floating point number denoting the speed of some object. It would be strange if this variable suddenly held, say, the name of a city. What's less clear is whether current_speed is a vector or just a number. A vector has a size and a direction, and is usually represented by an x, y and z component. So while variable names may help, they don't tell the whole story. Of course you may name your variable current_speed_vector. But using such a homebrew type system correctly is left entirely to your personal discipline. And it's quite a lot of typing on top of that.
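With type annotations the intent can be stated explicitly rather than encoded in the name. A minimal sketch (the Vector class below is a made-up placeholder, not part of any particular library):

    class Vector:
        # Purely illustrative vector type with x, y and z components
        def __init__(self, x: float, y: float, z: float) -> None:
            self.x = x
            self.y = y
            self.z = z

    current_speed: Vector = Vector(0.0, 1.5, -0.2)    # The annotation makes the intended type explicit
    current_speed = 'Amsterdam'                       # A static checker such as mypy flags this assignment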

Suppose you have a variable called number_of_persons. It is rather obvious that assigning 3.5 to this variable is probably an error. That multiplying it by 0.5 is equally questionable is less obvious.

It would be nice if mistakes like that could be caught early, preferably in advance, even if the offending code is in the branch of an if-statement that's rarely hit. And it would be nice if no run-time checks were needed, since these cost time.
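A minimal sketch of what such checks look like (a static checker such as mypy is assumed to inspect the code in advance; nothing is checked at run time):

    number_of_persons: int = 3       # Annotated as an integer

    number_of_persons = 3.5          # Flagged by the checker: a float is not an int

    rarely_true = False              # Hypothetical flag, just to form a rarely taken branch
    if rarely_true:
        number_of_persons *= 0.5     # Flagged as well, even though this branch is almost never executed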

Static typing: a little bit of history for those interested

Machine language and assembly

There was a time when computers were programmed directly in bits and bytes: machine language. Such bits and bytes could denote data: text strings, numbers, truth values or whatever a programmer chose to use them for. They could also denote instructions on what to do with the data. Since these bits and bytes were hard to read for a human being, they were replaced by short codes that were easier to remember. Such codes, called mnemonics, could again denote instructions or data. These codes were assembled (translated) into machine code by a program aptly called an assembler. The language that consisted of these codes was called assembly language.

Instruction mnemonics were e.g. load, add or jump, making the processor load some data from memory, add two lumps of data, or jump to another instruction respectively. Codes that represented data came in two flavors. Some, called literals, stood for immutable constants, like 'f' for the bit pattern representing the character 'f' and 128 for the bit pattern representing the number 128. Others, called identifiers, were names for memory addresses of variables, locations in memory where modifiable data was stored. A variable name like count could denote a memory address destined to store an integer, obviously used to count something. There wasn't any protection against the programmer storing something completely different at that address, e.g. a floating point number occupying more bytes than were reserved, or three bytes representing the human-readable characters '1', '2' and '3' rather than the one-byte binary representation of the number 123. Errors like that led to a variety of bugs known as mixed-type bugs.

Assemblers became more clever, gaining the ability to replace identifiers by whole blocks of code. By repeatedly using such an identifier, a certain block of code could be inserted in several places. This replacement was called macro expansion, and an assembler program that could handle it a macro assembler. When expanded at different locations, macros took different variables to work on. Such variables supplied to a macro were called parameters. Macros with parameters were a precursor to functions, which aren't copied over and over again but rather called from different locations.

Typed programming languages

Working directly in assembly language was often tedious. Each processor had its own language and portability lay hidden in the future. Well, not completely. Quite early on, there were the so-called high level languages like COBOL, FORTRAN, ALGOL and PL/I. They departed from the human side of things, favouring readability and portability over speed and compactness. To prevent accidentally mixing up datatypes, all variables had a fixed, immutable datatype, e.g. integer, or float, or string. Programs written in these English-like languages were compiled into machine language by a program called, well, obviously, a compiler.

Still, not all high level languages were compiled and not all of them had static typing. Dartmouth College came up with a simplified language in the spirit of FORTRAN called BASIC: Beginner's All-purpose Symbolic Instruction Code. Early BASIC implementations used a grossly simplified type system. Variables ending in a $ were strings, the rest were numbers. What further set BASIC apart from the languages above is that it usually wasn't compiled in advance, but interpreted while the program was running by, well, an interpreter. While BASIC programs often were messy and slow, the language became immensely popular on the first home computers, since it required very few resources.

Compilers at that time were bulky and slow, but they caught a lot of errors before a program had ever run, due to their static typing system. Gradually leaner compilers came into existence, among the first those for a language called Pascal, designed by the Swiss computer scientist Niklaus Wirth. Pascal compilers were small and cheap and the generated code was fast. Pascal had a very rigid type system that often got in the way of programmers. But a small firm called Borland loosened it somewhat, making an immensely popular compiler called Turbo Pascal. Static typing became self-evident and assemblers were forced into a niche.

But at the same time the rigidity of Pascal made some people think, most notably Dennis Ritchie, who designed C, and Brian Kernighan, who described it together with him. At first sight the language held the middle ground between a macro assembler on one hand and Pascal on the other: a typed programming language which supported functions as well as macros, and typed variables as well as typeless ones, referred to by so-called void pointers. Programmers enjoyed this typeless freedom without much worry, and void pointers became synonymous with trouble. If you wanted your data to have a varying type, you just created a void pointer to it. No checks, no overhead; it was said that 'C provides all the rope you need to hang yourself'. C had an awkward syntax and compiler errors were often cryptic.

To address the shortcomings of C's void pointers, its successor C++ introduced two concepts that made typing more flexible than in Pascal, while retaining the possibility to have the compiler catch type incompatibilities. First, the concept of polymorphism was introduced. The idea here is that if you need a mammal, you'll be satisfied with e.g. a cat, a dog or a horse. But when you need a dog, you'll not be satisfied with a cat or a horse. So specialized types can be used where general ones are allowed, but not the other way around. The second concept introduced was generic typing, called templates in C++. Generic types were immutable, but depending on the use of e.g. a function with generically typed parameters, a version of this function with the right, immutable parameter types was generated by the compiler. Both these concepts were quite powerful, and C++ can safely be regarded as a very effective language, combining superior speed with almost unlimited expressiveness. The downside of all this was a rather daunting complexity. This is not to say that C++ isn't relevant anymore. It is utterly relevant, for fast computations, graph traversal algorithms, operating systems and hypeless things like that. There is and will remain a large category of problems that requires the power of C++.
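Both concepts return later in Python's optional type hints. A small sketch in Python rather than C++ (the class and function names are purely illustrative):

    from typing import List, TypeVar

    class Mammal:
        pass

    class Dog(Mammal):
        pass

    def pet(animal: Mammal) -> None:    # Accepts any mammal
        pass

    pet(Dog())                          # Fine: a Dog is a Mammal, the specialized type used where a general one is allowed

    T = TypeVar('T')

    def first(items: List[T]) -> T:     # Generic: one definition, usable for lists of any single element type
        return items[0]

    first([1, 2, 3])                    # T is int here
    first(['a', 'b'])                   # And str here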

At the same time, even in computation intensive applications, there's a lot of code where execution speed is less important than development speed. This is where Python, created by Guido van Rossum at the CWI in the Netherlands, comes in. With Python, readability and flexibility are primary design goals. Python programs are compiled to something called bytecode, executed by a small program called a virtual machine. If Python has to run on new hardware, the only thing that has to be ported is the virtual machine. The 'feet on the ground', realistic, experience-driven background of Python shines through all of the design, and the language seems to sell itself: it's becoming ever more popular. One of the very strong points of Python is its smooth interoperability with C and C++. Any time-critical CPython library is written in C or C++. So even though the language itself is interpreted, mathematical operations like the ones in the NumPy and SciPy libraries are fast. Unfortunately, attempting to run C code in the browser still results in very large downloads, which creates a niche for tools like Transcrypt.

Python is a dynamically typed (runtime typed) language. It doesn't have the rigidity of Pascal, nor the riskiness of C, nor the complexity of C++. Variables can hold references to all kinds of data objects during program execution, but the interpreter always knows exactly what's there and can deal with it appropriately. This is a far cry from C's void pointers, which were mere memory addresses that could just as well belong to a variable of a completely unknown type as to a piece of safety critical operating system code. When switching from C++ to Python, a tenfold increase in development speed is realistically achievable. Still, there's a problem...

What's the problem with computer programming?

People learn to master a programming language, and when that is done, they start to program. They may e.g. start out in an academic setting, writing algorithms. Such programs may range from e.g. 1 kB to 100 kB of source code. An overview can still be kept. But then these small algorithmic programs get joined into larger applications, worked on by teams of developers, some of them specialized in databases, others in GUI's, and yet others in web front-ends. The amount of code grows rapidly and it becomes apparent that the problem with computer programming is organizational complexity. Not local complexity, some difficult screenful of code, but global complexity: how does it all fit together? While dynamically typed languages offer excellent development speed with very little overhead for a small to medium scale project worked on by a small, fixed team, for large scale projects, worked on by large teams of varying composition, the lack of type checking can lead to chaos. It is as with organizations. If they are small, informal interaction between people and departments works fine. But for a larger organization, rules are needed. This makes them rigid, but sometimes that just cannot be avoided. While communication between large departments sometimes has to be formalized, local communication can often be informal, even in a large company.

This metaphor translates well to computer programming. The flexible way in which dynamically typed pieces of code interact is invaluable. Note that 'flexible' doesn't stand for 'thoughtlessly designed' here. Designing effective, maintainable, expandable dynamically typed code is truly an art, and takes a lot of experience. But statically validated types have their advantages too. If I design an algorithm that works with vectors of integers, why not make sure that that's what is actually supplied to my algorithm: vectors of integers, not e.g. matrices of floats. Once my algorithm finds its place in a large software system, I can be sure it is supplied with the right type of data; otherwise the type validator will complain.
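A minimal sketch of such a contract (the function name and the call sites are made up for illustration, and a checker such as mypy is assumed to do the validation):

    from typing import List

    def sum_of_squares(vector: List[int]) -> int:
        # The annotation states the contract: a vector (list) of integers goes in, an integer comes out
        return sum(component * component for component in vector)

    sum_of_squares([1, 2, 3])                       # Fine: a vector of integers

    # sum_of_squares([[1.0, 2.0], [3.0, 4.0]])      # A matrix of floats: the type validator complains here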

In a large software system, anything that helps catch bugs early is welcome. Static typing is one of those things. At its core, Python is and will remain a dynamically typed language that won't get in the way of developers needing flexibility. But optional type checks are like security personnel in a department store. They don't meddle with the primary process, which is selling goods. But, stationed at strategic locations, they prevent things from getting out of hand.

Where to use static typing

An obvious answer to the question where static typing is best applied would be: in large programs. But that would be too much of a simplification. Static typing can also be very useful in small algorithms that by nature work with fixed datatypes. So where should you use it? It's up to you! That may not seem a very satisfying answer, but it is the truth. There are no dogmas in programming. You're invited to experiment with static type validation and discover its merits. There's a fair chance you'll use it in many places, especially once you've looked back at a piece of code you wrote months ago and found it very easy to comprehend and use, due to the extra information supplied by the type hints. It would be a very big loss if Python became a strictly typed language. This won't happen. But static type validation is an extremely powerful add-on that will make this versatile language even more versatile. If you take programming seriously, you should have it in your toolbox.

And TypeScript?

The mere act of adding type validation to a language doesn't change its fundamentals. TypeScript is for JavaScript programmers. Transcrypt is for Python programmers. That's it.