Literate programming

Literate Programming by Donald Knuth is the seminal book on literate programming.

Literate programming is a programming paradigm introduced in 1984 by Donald Knuth in which a computer program is given as an explanation of how it works in a natural language, such as English, interspersed (embedded) with snippets of macros and traditional source code, from which compilable source code can be generated. The approach is used in scientific computing and in data science routinely for reproducible research and open access purposes. Literate programming tools are used by millions of programmers today.

The literate programming paradigm, as conceived by Donald Knuth, represents a move away from writing computer programs in the manner and order imposed by the compiler, and instead gives programmers macros to develop programs in the order demanded by the logic and flow of their thoughts. Literate programs are written as an exposition of logic in more natural language in which macros are used to hide abstractions and traditional source code, more like the text of an essay.

Literate programming (LP) tools are used to obtain two representations from a source file: one understandable by a compiler or interpreter, the "tangled" code, and another for viewing as formatted documentation, which is said to be "woven" from the literate source. While the first generation of literate programming tools was specific to a single programming language, later ones are language-agnostic and exist independently of any individual programming language.

History and philosophy

Literate programming was first introduced in 1984 by Donald Knuth, who intended it to create programs that were suitable literature for human beings. He implemented it at Stanford University as a part of his research on algorithms and digital typography. The implementation was called "WEB" since he believed that it was one of the few three-letter words of English that had not yet been applied to computing. The name also aptly reflects the complicated nature of software delicately pieced together from simple materials. The practice of literate programming has seen an important resurgence in the 2010s with the use of computational notebooks, especially in data science.

Concept

Literate programming means writing out the program logic in a human language, with code snippets and macros included and set apart by a primitive markup. Macros in a literate source file are simply title-like or explanatory phrases in a human language that describe human abstractions created while solving the programming problem, and that hide chunks of code or lower-level macros. These macros are similar to the algorithms in pseudocode typically used in teaching computer science. These arbitrary explanatory phrases become precise new operators, created on the fly by the programmer, forming a meta-language on top of the underlying programming language.

A preprocessor is used to substitute arbitrary hierarchies, or rather "interconnected 'webs' of macros", to produce the compilable source code with one command ("tangle"), and documentation with another ("weave"). The preprocessor also provides an ability to write out the content of the macros and to add to already created macros in any place in the text of the literate program source file, thereby disposing of the need to keep in mind the restrictions imposed by traditional programming languages or to interrupt the flow of thought.
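
The substitution step can be illustrated with a minimal Python sketch. It is not how WEB or noweb is implemented; the chunk table CHUNKS, the function name tangle and the one-reference-per-line rule are assumptions made for illustration, and cyclic references are not detected.

```python
import re

# Hypothetical chunk table: each name maps to lines of code, some of which
# are references to other chunks written as <<name>>.
CHUNKS = {
    "*": ["<<imports>>", "<<main>>"],
    "imports": ["import sys"],
    "main": ["def main():", "    <<greet>>", "", "main()"],
    "greet": ["sys.stdout.write('hello\\n')"],
}

# A reference line: optional indentation, then <<chunk name>> and nothing else.
REFERENCE = re.compile(r"^(\s*)<<(.+)>>\s*$")


def tangle(chunks, name="*", indent=""):
    """Recursively expand chunk `name` into a flat list of code lines."""
    out = []
    for line in chunks[name]:
        match = REFERENCE.match(line)
        if match:
            # Replace the reference by the referenced chunk, indented to
            # the column at which the reference appeared.
            out.extend(tangle(chunks, match.group(2), indent + match.group(1)))
        else:
            # Ordinary code line; blank lines stay blank.
            out.append(indent + line if line else line)
    return out
```

Calling tangle(CHUNKS) starts at the root chunk "*" and returns the lines of a flat program in which every reference has been replaced by the code it names.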

Advantages

According to Knuth, literate programming provides higher-quality programs, since it forces programmers to explicitly state the thoughts behind the program, making poorly thought-out design decisions more obvious. Knuth also claims that literate programming provides a first-rate documentation system, which is not an add-on, but is grown naturally in the process of exposition of one's thoughts during a program's creation. The resulting documentation allows the author to restart their own thought processes at any later time, and allows other programmers to understand the construction of the program more easily. This differs from traditional documentation, in which a programmer is presented with source code that follows a compiler-imposed order, and must decipher the thought process behind the program from the code and its associated comments. The meta-language capabilities of literate programming are also claimed to facilitate thinking, giving a higher "bird's eye view" of the code and increasing the number of concepts the mind can successfully retain and process. Applicability of the concept to programming on a large scale, that of commercial-grade programs, is proven by an edition of TeX code as a literate program.

Knuth also claims that literate programming can lead to easy porting of software to multiple environments, and even cites the implementation of TeX as an example.

Contrast with documentation generation

Literate programming is very often misunderstood to refer only to formatted documentation produced from a common file with both source code and comments – which is properly called documentation generation – or to voluminous commentaries included with code. This is the converse of literate programming: well-documented code or documentation extracted from code follows the structure of the code, with documentation embedded in the code; while in literate programming, code is embedded in documentation, with the code following the structure of the documentation.

This misconception has led to claims that comment-extraction tools, such as the Perl Plain Old Documentation or Java Javadoc systems, are "literate programming tools". However, because these tools do not implement the "web of abstract concepts" hiding behind the system of natural-language macros, or provide an ability to change the order of the source code from a machine-imposed sequence to one convenient to the human mind, they cannot properly be called literate programming tools in the sense intended by Knuth.

Critique

In 1986, Jon Bentley asked Knuth to demonstrate the concept of literate programming for his Programming Pearls column in the Communications of the ACM, by writing a program in WEB. Knuth sent him a program for a problem previously discussed in the column (that of sampling M random numbers in the range 1..N), and also asked for an "assignment". Bentley gave him the problem of finding the K most common words from a text file, for which Knuth wrote a WEB program that was published together with a review by Douglas McIlroy of Bell Labs. McIlroy praised the intricacy of Knuth's solution, his choice of a data structure (a variant of Frank M. Liang's hash trie), and the presentation. He criticized some matters of style, such as the fact that the central idea was described late in the paper, the use of magic constants, and the absence of a diagram to accompany the explanation of the data structure. McIlroy also used the review to critique the programming task itself, pointing out that in Unix (developed at Bell Labs), utilities for text processing (tr, sort, uniq and sed) had been written previously that were "staples", and a solution that was easy to implement, debug and reuse could be obtained by combining these utilities in a six-line shell script. In response, Bentley wrote that:

[McIlroy] admires the execution of the solution, but faults the problem on engineering grounds. (That is, of course, my responsibility as problem assigner; Knuth solved the problem he was given on grounds that are important to most engineers—the paychecks provided by their problem assigners.)

McIlroy later admitted that his critique was unfair, since he criticized Knuth's program on engineering grounds, while Knuth's purpose was only to demonstrate the literate programming technique. In 1987, Communications of the ACM published a follow-up article which illustrated literate programming with a C program that combined the artistic approach of Knuth with the engineering approach of McIlroy, with a critique by John Gilbert.

Workflow

Implementing literate programming consists of two steps:

  1. Weaving: Generating a comprehensive document about the program and its maintenance.
  2. Tangling: Generating machine-executable code.

Weaving and tangling are done on the same source so that they are consistent with each other.
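
As a rough illustration of the weaving direction, the following Python sketch turns a noweb-style source into a plain-text document in which each code chunk is labelled and indented. The sample text SOURCE, the function name weave and the output layout are hypothetical; real tools emit TeX, LaTeX or HTML instead. A tangling tool would read the same file, which is what keeps the two outputs consistent.

```python
import re

# A chunk definition line in noweb-style markup: <<name>>=
CHUNK_START = re.compile(r"^<<(.+)>>=\s*$")

# Hypothetical literate source used only for this illustration.
SOURCE = """\
We greet the user once.
<<greet>>=
print("hello")
@
The greeting is all this program does.
"""


def weave(source):
    """Render a noweb-style source as text with labelled, indented code."""
    out = []
    in_code = False
    for line in source.splitlines():
        start = CHUNK_START.match(line)
        if start:
            # Entering a code chunk: emit a label carrying the chunk name.
            in_code = True
            out.append(f"[chunk: {start.group(1)}]")
        elif in_code and line.strip() == "@":
            # "@" closes the code chunk and returns to prose.
            in_code = False
        elif in_code:
            out.append("    " + line)
        else:
            out.append(line)
    return "\n".join(out)
```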

Example

A classic example of literate programming is the literate implementation of the standard Unix wc word counting program. Knuth presented a CWEB version of this example in Chapter 12 of his Literate Programming book. The same example was later rewritten for the noweb literate programming tool. This example provides a good illustration of the basic elements of literate programming.

Creation of macros

The following snippet of the wc literate program shows how arbitrary descriptive phrases in a natural language are used in a literate program to create macros, which act as new "operators" in the literate programming language, and hide chunks of code or other macros. The mark-up notation consists of double angle brackets ("<<...>>") that indicate macros and the "@" symbol, which indicates the end of the code section in a noweb file. The "<<*>>" symbol stands for the "root", the topmost node from which the literate programming tool will start expanding the web of macros. In fact, the expanded source code can be written out from any section or subsection (i.e. a piece of code designated as "<<name of the chunk>>=", with the equal sign), so one literate program file can contain several files with machine source code.

The purpose of wc is to count lines, words, and/or characters in a list of files. The
number of lines in a file is ......../more explanations/

Here, then, is an overview of the file wc.c that is defined by the noweb program wc.nw:
    <<*>>=
    <<Header files to include>>
    <<Definitions>>
    <<Global variables>>
    <<Functions>>
    <<The main program>>
    @

We must include the standard I/O definitions, since we want to send formatted output
to stdout and stderr.
    <<Header files to include>>=
    #include <stdio.h>
    @

The unraveling of the chunks can be done in any place in the literate program text file, not necessarily in the order they are sequenced in the enclosing chunk, but as is demanded by the logic reflected in the explanatory text that envelops the whole program.

Program as a web

Macros are not the same as "section names" in standard documentation. Literate programming macros hide the real code behind themselves, and can be used inside any low-level machine language operators, often inside logical operators such as "if", "while" or "case". This can be seen in the following snippet of the wc literate program.

The present chunk, which does the counting, was actually one of
the simplest to write. We look at each character and change state if it begins or ends
a word.

    <<Scan file>>=
    while (1) {
      <<Fill buffer if it is empty; break at end of file>>
      c = *ptr++;
      if (c > ' ' && c < 0177) {
        /* visible ASCII codes */
        if (!in_word) {
          word_count++;
          in_word = 1;
        }
        continue;
      }
      if (c == '\n') line_count++;
      else if (c != ' ' && c != '\t') continue;
      in_word = 0;
        /* c is newline, space, or tab */
    }
    @

The macros stand for any chunk of code or other macros, and are more general than top-down or bottom-up "chunking", or than subsectioning. Donald Knuth said that when he realized this, he began to think of a program as a web of various parts.
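
For readers more familiar with Python, the two-state scan used by the counting chunk can be sketched as follows. This is a transliteration made for illustration, not part of Knuth's program; the function name count_words_and_lines is an assumption, and buffer handling is replaced by iterating over a string.

```python
def count_words_and_lines(text):
    """Count words and lines with a two-state (in word / not in word) scan."""
    word_count = 0
    line_count = 0
    in_word = False
    for c in text:
        if " " < c < "\x7f":
            # Visible ASCII: the first such character after a gap starts a word.
            if not in_word:
                word_count += 1
                in_word = True
            continue
        if c == "\n":
            line_count += 1
        elif c not in (" ", "\t"):
            # Any other character neither starts nor ends a word.
            continue
        # Newline, space, or tab ends the current word.
        in_word = False
    return word_count, line_count
```

For the input "hello world\nfoo\tbar\n" the function returns (4, 2): four words and two lines.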

Order of human logic, not that of the compiler

In a noweb literate program, besides the free order of their exposition, the chunks behind macros, once introduced with "<<...>>=", can be grown later at any place in the file by simply writing "<<name of the chunk>>=" and adding more content to it, as the following snippet illustrates (the "plus" in "+=" is added by the document formatter for readability, and is not in the code).

The grand totals must be initialized to zero at the beginning of the program.
If we made these variables local to main, we would have to do this initialization
explicitly; however, C globals are automatically zeroed. (Or rather, ``statically
zeroed.'' (Get it?))

    <<Global variables>>+=
    long tot_word_count, tot_line_count,
         tot_char_count;
      /* total number of words, lines, chars */
    @
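
How repeated definitions accumulate can be sketched in Python. The sample source and the function name collect_chunks are hypothetical, and the first "Global variables" definition is invented for the example; the sketch only assumes the noweb convention that a chunk starts at "<<name>>=" and ends at "@".

```python
import re

# A chunk definition line: <<name>>= (leading indentation is tolerated).
CHUNK_START = re.compile(r"^\s*<<(.+)>>=\s*$")

# Hypothetical literate source that defines the same chunk twice.
SOURCE = """\
Counters for the current file.
<<Global variables>>=
long word_count, line_count, char_count;
@
Later in the text we add the grand totals.
<<Global variables>>=
long tot_word_count, tot_line_count, tot_char_count;
@
"""


def collect_chunks(source):
    """Map each chunk name to its code lines, appending repeated definitions."""
    chunks = {}
    current = None
    for line in source.splitlines():
        start = CHUNK_START.match(line)
        if start:
            # Reuse the existing list if the chunk was already introduced,
            # so that later definitions extend the earlier ones.
            current = chunks.setdefault(start.group(1), [])
        elif line.strip() == "@":
            current = None
        elif current is not None:
            current.append(line)
    return chunks
```

Here collect_chunks(SOURCE) yields a single "Global variables" entry holding both declarations, in the order in which they appear in the text.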

Record of the train of thought

The documentation for a literate program is produced as part of writing the program. Instead of comments provided as side notes to source code, a literate program contains the explanation of concepts on each level, with lower-level concepts deferred to their appropriate place, which allows for better communication of thought. The snippets of the literate wc above show how an explanation of the program and its source code are interwoven. Such exposition of ideas creates a flow of thought that is like a literary work. Knuth wrote a "novel" which explains the code of the interactive fiction game Colossal Cave Adventure.

Remarkable examples

  • Axiom, which evolved from Scratchpad, a computer algebra system developed by IBM. It is now being developed by Tim Daly, one of the developers of Scratchpad. Axiom is written entirely as a literate program.

Literate programming practices

The first published literate programming environment was WEB, introduced by Knuth in 1981 for his TeX typesetting system; it uses Pascal as its underlying programming language and TeX for typesetting of the documentation. The complete commented TeX source code was published in Knuth's TeX: The program, volume B of his 5-volume Computers and Typesetting. Knuth had privately used a literate programming system called DOC as early as 1979. He was inspired by the ideas of Pierre-Arnoul de Marneffe. The free CWEB, written by Knuth and Silvio Levy, is WEB adapted for C and C++, runs on most operating systems and can produce TeX and PDF documentation.

There are various other implementations of the literate programming concept, as given below. Many of the newer ones do not have macros and hence violate the order-of-human-logic principle, which makes them semi-literate tools at best. They do, however, allow cellular execution of code, which makes them more like exploratory programming tools.

Name | Supported Languages | Written in | Markup Language | Macros & Custom Order | Cellular Execution | Comments
WEB | Pascal | Pascal | TeX | Yes | No | The first published literate programming environment.
CWEB | C++ and C | C | TeX | Yes | No | Is WEB adapted for C and C++.
NoWEB | Any | C, AWK, and Icon | LaTeX, TeX, HTML and troff | Yes | No | It is well known for its simplicity and it allows for text formatting in HTML rather than going through the TeX system.
Literate | Any | D | Markdown | Yes | No | Supports TeX equations. Compatible with Vim (literate.vim).
FunnelWeb | Any | C | HTML and TeX | Yes | ? | It has more complicated markup, but has many more flexible options.
NuWEB | Any | C++ | LaTeX | | | It can translate a single LP source into any number of code files. It does it in a single invocation; it does not have separate weave and tangle commands. It does not have the extensibility of noweb.
pyWeb | Any | Python | ReStructuredText | Yes | | Respects indentation, which makes it usable for languages like Python, though it can be used for any programming language.
Molly | Any | Perl | HTML | | | Aims to modernize and scale literate programming with "folding HTML" and "virtual views" on code. It uses "noweb" markup for the literate source files.
Codnar | Ruby | | | | | It is an inverse literate programming tool available as a Ruby Gem. Instead of the machine-readable source code being extracted out of the literate documentation sources, the literate documentation is extracted out of the normal machine-readable source code files.
Emacs org-mode | Any | Emacs Lisp | Plain text | | | Requires Babel, which allows embedding blocks of source code from multiple programming languages within a single text document. Blocks of code can share data with each other, display images inline, or be parsed into pure source code using the noweb reference syntax.
CoffeeScript | CoffeeScript | CoffeeScript, JavaScript | Markdown | | | CoffeeScript supports a "literate" mode, which enables programs to be compiled from a source document written in Markdown with indented blocks of code.
Maple worksheets | Maple (software) | | XML | | | Maple worksheets are a platform-agnostic literate programming environment that combines text and graphics with live code for symbolic computation. ("Maple Worksheets". www.maplesoft.com. Retrieved 2020-05-30.)
Wolfram Notebooks | Wolfram Language | Wolfram Language | | | | Wolfram notebooks are a platform-agnostic literate programming method that combines text and graphics with live code.
Playgrounds | Swift (programming language) | | | | | Provides an interactive programming environment that evaluates each statement and displays live results as the code is edited. Playgrounds also allow the user to add markup language along with the code, providing headers, inline formatting and images.
Jupyter Notebook, formerly IPython Notebook | Python and any with a Jupyter Kernel | | JSON format Specification for ipynb | No | Yes | Works in the format of notebooks, which combine headings, text (including LaTeX), plots, etc. with the written code.
Jupytext plugin for Jupyter | Many Languages | Python | Markdown in comments | No | Yes |
nbdev | Python and Jupyter Notebook | | | | | nbdev is a library that allows you to develop a Python library in Jupyter Notebooks, putting all your code, tests and documentation in one place.
Julia (programming language) | | | | Pluto.jl is a reactive notebook environment allowing custom order. But web-like macros aren't supported. | Yes | Supports the IJulia mode of development, which was inspired by IPython.
Agda (programming language) | | | | | | Supports a limited form of literate programming out of the box.
Eve programming language | | | | | | Programs are primarily prose. Eve combines variants of Datalog and Markdown with a live graphical development environment.
R Markdown Notebooks (or R Notebooks) | R, Python, Julia and SQL | | PDF, Microsoft Word, LibreOffice and presentation or slide show formats plus interactive formats like HTML widgets | No | Yes |
Quarto | R, Python, Julia and Observable | | PDF, Microsoft Word, LibreOffice and presentation or slide show formats plus interactive formats like HTML widgets | No | Yes |
Sweave | R | | PDF | | |
Knitr | R | | LaTeX, PDF, LyX, HTML, Markdown, AsciiDoc, and reStructuredText | | |
Codebraid | Pandoc, Rust, Julia, Python, R, Bash | Python | Markdown | No | Yes |
Pweave | Python | | PDF | No | |
MATLAB Live Editor | MATLAB | | Markdown | No | Yes |
Inweb | C, C++, Inform 6, Inform 7 | C, CWEB | TeX, HTML | Yes | ? | Used to write the Inform programming language since 2004.
Mercury | Python | Python, TypeScript | JSON format specification for ipynb | | | Mercury turns Jupyter Notebooks into interactive computational documents. They can be published as web applications, dashboards, reports, REST APIs, or slides. The executed document can be exported as a standalone HTML or PDF file. Documents can be scheduled for automatic execution. The document presentation and widgets are controlled with a YAML header in the first cell of the notebook.
Observable | JavaScript | JavaScript, TypeScript | TeX (KaTeX), HTML | | | Stored on the cloud with a web interface. Contents are publishable as websites. Version controlled; the platform defines its own version control operations. Code cells can be organized out of order; Observable notebooks will construct the execution graph (a DAG) automatically. A rich standard library implemented with modern features of JavaScript. Cells from different Observable notebooks can reference each other. Npm libraries can be imported on the fly.
Ganesha | JavaScript, TypeScript | JavaScript | Markdown | | | Enables Node.js to load literate modules, represented by Markdown files containing JavaScript or TypeScript code interspersed with richly formatted prose. Supports bundling literate modules for browsers when using the Rollup or Vite frontend module bundlers.
JWEB | C, C++, JavaScript, TypeScript | JavaScript | Markdown | Yes | No |

Other useful tools include:

  • The Leo text editor is an outlining editor which supports optional noweb and CWEB markup. The author of Leo mixes two different approaches: first, Leo is an outlining editor, which helps with management of large texts; second, Leo incorporates some of the ideas of literate programming, which in its pure form (i.e., the way it is used by Knuth's WEB tool or tools like "noweb") is possible only with some degree of inventiveness and the use of the editor in a way not exactly envisioned by its author (in modified @root nodes). However, this and other extensions (@file nodes) make outline programming and text management successful and easy, and in some ways similar to literate programming.
  • The Haskell programming language has native support for semi-literate programming. The compiler/interpreter supports two file name extensions: .hs and .lhs; the latter stands for literate Haskell.
A literate script can be a complete LaTeX source text and can still be compiled without changes, because the compiler only processes the text inside a code environment, for example:
% here text describing the function:
\begin{code}
fact 0 = 1
fact (n+1) = (n+1) * fact n
\end{code}
here more text
The code can also be marked in the Richard Bird style, starting each line with a greater-than symbol and a space, and preceding and following the piece of code with blank lines.
The LaTeX listings package provides a lstlisting environment which can be used to embellish the source code. It can be used to define a code environment for literate Haskell so that the symbols are typeset, in the following manner:
\lstnewenvironment{code}{\lstset{language=Haskell}}{}

\begin{code}
comp :: (beta -> gamma) -> (alpha -> beta) -> (alpha -> gamma)
(g `comp` f) x = g (f x)
\end{code}
which can be configured to typeset names such as alpha, beta and gamma, and the arrows, as mathematical symbols.
Although the package does not provide means to organize chunks of code, one can split the LaTeX source code into different files. See the listings manual for an overview.
  • The Web 68 Literate Programming system used Algol 68 as the underlying programming language, although there was nothing in the pre-processor 'tang' to force the use of that language.
  • The customization mechanism of the Text Encoding Initiative which enables the constraining, modification, or extension of the TEI scheme enables users to mix prose documentation with fragments of schema specification in their One Document Does-it-all format. From this prose documentation, schemas, and processing model pipelines can be generated and Knuth's Literate Programming paradigm is cited as the inspiration for this way of working.
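
The Bird-style convention described in the Haskell entry above can be illustrated with a short Python sketch that keeps only the marked lines. The sample text and the function name unlit_bird are hypothetical, and unlike a real Haskell compiler the sketch does not check for the surrounding blank lines.

```python
# Hypothetical Bird-style literate Haskell text used only for illustration.
BIRD_SOURCE = """\
The factorial of zero is one.

> fact :: Integer -> Integer
> fact 0 = 1
> fact n = n * fact (n - 1)

Everything without the marker is commentary.
"""


def unlit_bird(source):
    """Return the code lines of Bird-style literate text, marker removed."""
    # Code lines start with a greater-than symbol and a space; all other
    # lines are treated as commentary and dropped.
    return [line[2:] for line in source.splitlines() if line.startswith("> ")]
```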

See also

  • Documentation generator – the inverse of literate programming, where documentation is embedded in and generated from source code
  • Notebook interface – virtual notebook environment used for literate programming
  • Sweave and Knitr – examples of use of the "noweb"-like Literate Programming tool inside the R language for creation of dynamic statistical reports
  • Self-documenting code – source code that can be easily understood without documentation

This page was last updated at 2023-12-01 23:14 UTC.

All our content comes from Wikipedia and is available under the Creative Commons Attribution-ShareAlike License.

