Temba, His Arms open()
Languages have rules about how their words or signs fit together (syntax). They have patterns of how their words or signs tend to sound or look, how some elements are built out of others, and how poets break the rules to make things interesting. One can have words that look like they belong to a language (sequenced, conjugated, declined, and capitalised correctly) and yet are not valid examples of that language. Likewise, one can use the elements of a language correctly according to all its rules and still produce a combination that has no meaning according to its semantics, even to poets.
Software ... [are] systems of meaning. – Kevlin Henney
Software languages, while borrowing words from human languages, are even more rigid in their grammar, syntax, and context. They can still hide incorrect meanings in technically correct text. The danger is not ungrammatical code, because the compiler or interpreter rejects that. The danger is code that reads as though it means one thing and does another.
This And That
Even seemingly obvious things like conjunctions are not safe. Natural language is deeply context-dependent while programming languages are not.
In English, one might say “Find all the doctors and dentists.” A programmer translating this literally might write:
where profession = 'doctor' AND profession = 'dentist'
But no record can have two values in the same field simultaneously. The correct expression is:
where profession = 'doctor' OR profession = 'dentist'
The English word “and” performs a union of two sets of people: doctors over here, dentists over there, combined. The programming operator AND performs an intersection of two conditions on a single record. They are not the same operation. The English is unambiguous to any native speaker, yet its literal translation matches no records at all.
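A minimal Ruby sketch of the same mistake, using a hypothetical in-memory list in place of a database table (the names and fields are illustrative only): && tests two conditions against one record, while || and the set union Array#| express the “doctors plus dentists” reading.

```ruby
# Hypothetical records: each person has exactly one profession.
people = [
  { name: "Ada",   profession: "doctor" },
  { name: "Ben",   profession: "dentist" },
  { name: "Chloe", profession: "nurse" }
]

# Literal translation of "doctors and dentists": both tests apply to the SAME
# record, and no record holds two values at once, so nothing matches.
literal = people.select { |person| person[:profession] == "doctor" && person[:profession] == "dentist" }

# What the English means: a record qualifies if it passes EITHER test.
intended = people.select { |person| person[:profession] == "doctor" || person[:profession] == "dentist" }

# The same result built the way the English sentence thinks: two sets, combined.
doctors  = people.select { |person| person[:profession] == "doctor" }
dentists = people.select { |person| person[:profession] == "dentist" }
union    = doctors | dentists

p literal.map { |r| r[:name] }   # prints []
p intended.map { |r| r[:name] }  # prints ["Ada", "Ben"]
p union.map { |r| r[:name] }     # prints ["Ada", "Ben"]
```

The English sentence describes the last form (build two groups, then merge them); the query language only offers the middle form, so the translator has to swap “and” for OR.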
Combinations of AND and OR compound the problem. Asked to “get three oranges or tangerines and a dozen eggs”, a person would rarely come home without the eggs just because tangerines were unavailable, because a human infers the grouping from context and common sense. Computer code, however, has this sort of vaudeville misunderstanding built in. Most languages give AND higher precedence than OR, so without explicit parentheses, oranges OR tangerines AND eggs means oranges OR (tangerines AND eggs), not (oranges OR tangerines) AND eggs. The parentheses are not optional decoration; they are the meaning.
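The grouping can be checked directly in Ruby, whose && also binds more tightly than ||. This sketch uses hypothetical booleans describing a shopping basket, then compares the two groupings over every combination of inputs.

```ruby
# A hypothetical basket: oranges were bought, tangerines were sold out,
# and the eggs were forgotten.
has_oranges    = true
has_tangerines = false
has_eggs       = false

# && binds tighter than ||, so this parses as
# has_oranges || (has_tangerines && has_eggs) and accepts the basket.
naive = has_oranges || has_tangerines && has_eggs

# The grouping a human infers: (oranges or tangerines) and eggs.
intended = (has_oranges || has_tangerines) && has_eggs

puts naive     # prints true  (accepted without eggs)
puts intended  # prints false (rejected: no eggs)

# Exhaustive check over all eight combinations of a, b, c.
combos = [true, false].product([true, false], [true, false])
same_as_and_first = combos.all? { |a, b, c| (a || b && c) == (a || (b && c)) }
differs_from_intended = combos.count { |a, b, c| (a || b && c) != ((a || b) && c) }

puts same_as_and_first      # prints true
puts differs_from_intended  # prints 2
```

The two readings disagree in exactly the cases where oranges are present and eggs are not. Ruby adds its own twist: its word-form operators and and or share the same precedence, so a or b and c groups as (a or b) and c, the opposite of the symbolic version.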
What Language Are You Using?
Some tools try to close the gap between English and code by making code look like English. They tend to succeed at the reading end and fail at the writing end.
Cucumber is a testing tool that stores its tests in English-like phrases[1]. For example:
Given I am accessing the system with proper authentication
When I shall transfer with enough balance in my source account
And the destination details are correct
And I supply the correct transaction password
And I press or click the send button
Then amount will transfer
And the event will be logged in the log file
Reading this, one understands most of what it means. Writing it is another matter entirely, because the tool does not understand English. It does not know what “destination details” or “are correct” mean. Programmers defined those strings of characters to trigger specific coded behaviours that have only an approximate relationship to the words in those strings. Cucumber offers syntactic sugar that lets a rule read like an English sentence by mapping “the destination details are correct” to code such as set_correct_destination(), but the mapping is arbitrary and must be defined and maintained by a programmer.
While any English speaker will understand that “I supply the correct destination details” means the same thing as “the destination details are correct”, the system will not, because the characters differ and no programmer has written a pattern that matches both. English semantics apply when a human reads these rules. Only the semantics explicitly created for this particular Cucumber setup apply when writing them. The language looks shared but is not.
This is the second gap: the map is not the terrain. The English-shaped surface conceals the model underneath. It hides the way programmers think about rules and steps, which is not the same as how product managers, customer support staff, or designers think about them. A sentence that reads clearly to everyone may mean something specific and technical to the code — or may mean nothing at all until a programmer has defined it.
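The mismatch between phrasing and defined strings can be made concrete with a toy step matcher in Ruby. This is not Cucumber: the STEPS table, run_step, and the returned symbols are hypothetical stand-ins. Real Cucumber step definitions work in a broadly similar way, pairing a string or regular-expression pattern with a block of code and ignoring the Given/When/Then keyword when matching.

```ruby
# Hypothetical step registry: each pattern maps to a block of code.
# The words in a pattern mean nothing to the program; only the match does.
STEPS = {
  /\Athe destination details are correct\z/ => -> { :set_correct_destination },
  /\AI supply the correct transaction password\z/ => -> { :enter_password }
}

KEYWORDS = /\A(Given|When|Then|And|But)\s+/

# Strip the leading keyword, then run the first step whose pattern matches.
def run_step(line)
  text = line.sub(KEYWORDS, "")
  _pattern, action = STEPS.find { |pattern, _| pattern.match?(text) }
  raise ArgumentError, "Undefined step: #{text}" unless action
  action.call
end

puts run_step("And the destination details are correct")  # prints set_correct_destination

begin
  # Same meaning to any English reader, but no pattern matches these characters.
  run_step("And I supply the correct destination details")
rescue ArgumentError => e
  puts e.message  # prints Undefined step: I supply the correct destination details
end
```

A programmer could widen a pattern to accept both phrasings, but only the phrasings someone thought to anticipate; the paraphrase problem never goes away.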
This “natural language interface” is a cargo-cult use of English. COBOL popularised the approach decades earlier and made the same promise:
“COBOL statements have an English-like syntax, which was designed to be self-documenting and highly readable”[2]
The English syntax and tokens (words) make the code easy for many people to read, but much of the information is implicit and hidden (e.g. what “correct” really means).
Rails uses words more than symbols or notation to encapsulate meaning. Instead of requiring programmers to memorize and reference a collection of function names, Rails uses simple nouns, verbs, and symbols, as well as an expectation of what the user will name their types. This results in code that resembles English sentences but has the grammatical idiosyncrasies of any other programming framework. Tests written with RSpec, a testing library widely used in Rails projects, read the same way:
expect(response).to have_http_status 200
expect(json).not_to be_empty
expect(json['id']).to eq(expected_id)
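Under the English-like surface, each of these lines is a chain of ordinary method calls. The toy Ruby sketch below is loosely modelled on the shape of RSpec's expect(...).to(matcher) style, but MiniExpectation, Matcher, eq and be_empty here are hypothetical, simplified stand-ins, not RSpec's implementation.

```ruby
# Toy matcher: a description plus a check to run against the actual value.
Matcher = Struct.new(:description, :check) do
  def matches?(actual)
    check.call(actual)
  end
end

# Toy wrapper returned by expect(); its to/not_to methods take a matcher.
class MiniExpectation
  def initialize(actual)
    @actual = actual
  end

  def to(matcher)
    return true if matcher.matches?(@actual)
    raise "expected #{@actual.inspect} to be #{matcher.description}"
  end

  def not_to(matcher)
    return true unless matcher.matches?(@actual)
    raise "expected #{@actual.inspect} not to be #{matcher.description}"
  end
end

def expect(actual)
  MiniExpectation.new(actual)
end

def eq(expected)
  Matcher.new("equal to #{expected.inspect}", ->(actual) { actual == expected })
end

def be_empty
  Matcher.new("empty", ->(actual) { actual.empty? })
end

json = { "id" => 42, "name" => "Ada" }
expected_id = 42

# Reads like English; parses as expect(json).not_to(be_empty).
expect(json).not_to be_empty
# Parses as expect(json["id"]).to(eq(expected_id)).
expect(json["id"]).to eq(expected_id)

# A misspelt matcher is not a harmless typo: it calls a method that does not exist.
typo_error = begin
  expect(json["id"]).to equ(expected_id)
rescue NoMethodError => e
  e.class
end
puts typo_error  # prints NoMethodError
```

The “sentence” only works because every word is a defined method in the right position; a reader sees English, the interpreter sees nested calls.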
For example, many ActiveRecord methods that raise an exception on failure end with an exclamation mark (save!, update!), and they have non-raising counterparts of the same name without it (save, update), which return false instead. However, find raises ActiveRecord::RecordNotFound when no record matches, while find_by_id simply returns nil, and neither name has an exclamation mark. This difference is not intuitive or inferable; it’s Just The Way It Works. The worst sin is thinking that the English syntax allows non-programmers to code “because it’s just English”.
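The find versus find_by_id inconsistency is easy to reproduce in plain Ruby. This sketch is not ActiveRecord: Repository, RecordNotFound and RecordInvalid are hypothetical in-memory stand-ins whose method names mirror the Rails conventions, so a caller can see which calls raise and which quietly return nil or false.

```ruby
# Hypothetical stand-ins for ActiveRecord::RecordNotFound and RecordInvalid.
class RecordNotFound < StandardError; end
class RecordInvalid < StandardError; end

# Hypothetical in-memory repository that mimics Rails naming conventions.
class Repository
  def initialize
    @rows = {}
    @next_id = 1
  end

  # No exclamation mark: reports failure by returning false.
  def save(attrs)
    return false if attrs[:name].to_s.empty?
    @rows[@next_id] = attrs.merge(id: @next_id)
    @next_id += 1
    true
  end

  # Exclamation mark: the same operation, but failure raises.
  def save!(attrs)
    save(attrs) || raise(RecordInvalid, "Name can't be blank")
  end

  # No exclamation mark, yet a missing record raises.
  def find(id)
    @rows.fetch(id) { raise RecordNotFound, "Couldn't find record with id=#{id}" }
  end

  # Also no exclamation mark, but a missing record returns nil.
  def find_by_id(id)
    @rows[id]
  end
end

repo = Repository.new
repo.save(name: "Ada")  # returns true
repo.save(name: "")     # returns false, no exception
p repo.find_by_id(99)   # prints nil

begin
  repo.find(99)
rescue RecordNotFound => e
  puts e.message        # prints Couldn't find record with id=99
end
```

Nothing in the names find and find_by_id tells the caller which one raises. Rails does offer find_by!, which raises, so the exclamation-mark convention exists; it simply is not applied to find.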
Programming is not just knowing a language or its symbols; it is a way of thinking.
Expert knowledge of English grants one no more expertise in Cucumber (or Rails or COBOL) than familiarity with bricks allows one to build a shopping mall. The English surface is a reading aid, not a writing interface. It makes code more legible; it does not make programming more accessible
Technically correct text can hide incorrect meaning. English-shaped syntax can hide the fact that there is a programming model underneath: one that is precise, inflexible, and entirely indifferent to what the words look like.
The map is not the terrain. The terminology is not the meaning.