PMW HACKING NOTES
=================

This document contains a few notes about the internals of PMW for the benefit
of anyone wanting to make changes. See also the file src/README, comments
within the code modules, and the files doc/fontspec.pdf and fontmaint/README
for information about the PMW-Music and PMW-Alpha fonts.

This codebase started life as a BCPL program in 1987, and has been heavily
modified over several decades. At various times it ran on a number of different
operating systems, though now it is just a C program that reads and writes
files, so it should run wherever it can be compiled. The only system-specific
code is for finding the user's ~/.pmwrc file in a Unix-like environment, but
this can be cut out at compile time.

Files for the common "./configure; make; make install" paradigm are supplied
with each release tarball (see below). The GitHub repository contains files
called "configure.ac" and "Makefile.in", but not "configure" or "Makefile". The
latter are created when you run the "autoconf" command. The top-level Makefile
calls src/Makefile to build the pmw binary within the src directory.


DEBUGGING AIDS
==============

The command line -d option can be used to request debugging output. It must be
followed, without any space, by one or more debug selectors, separated by plus
or minus signs, which add or remove the selector, respectively. For example:

  pmw -d+headers some.file

The -help option lists all the selectors. The "trace" selector tracks the path
through the various code functions. The various "bar" selectors output the
internal bar data at different parts of the processing; "bar" and "barR" are
synonymous, and show the data immediately after the input has been read; "barP"
shows it after pagination; "barO" after the main output is written. If MusicXML
output is requested, "barX" shows the bar data if it gets modified for underlay
continuation handling. Without other options, the "bar" selectors show all
bars; the -dbd option can be used to restrict the output to a single bar, and
it implies -d+bar.

The command line -testing option makes some modifications to the output that
PMW generates, to aid automatic and manual testing. It can be followed by a
number that defaults to 2. For PostScript output, any non-zero value has the
effect of the 0x01 bit, and also suppresses the inclusion of the PostScript
header file.

The 0x01 bit suppresses version and date information in the output so that
automatic comparisons between versions are straightforward. The RunTests script
in the "testing" directory makes use of this.

The 0x02 bit applies to PDF output only: it causes comments to be inserted at
page and bar boundaries in the output, to help human navigation.

The 0x04 bit also applies only to PDF: it causes font programs to be omitted
from the output to reduce the size of test file output and make it easier for
humans to read.

The 0x08 bit forces the output to be red. This is useful when comparing
PostScript versus PDF output, or output from two different versions of PMW. By
displaying red on top of black and vice versa, differences are easy to spot.

See also some comments in the main documentation.


PROCESSING OVERVIEW
===================

The program entry point is in the main.c source file. A PMW run proceeds in
several distinct phases. As long as there are no show-stopping errors, it
happens like this:

1. Initialization: A function called init_command() has the opportunity to
modify the command argument list. Unless the code is omitted by the setting of
NO_PMWRC, the contents of the user's .pmwrc file are inserted at the start of
the argument list. A general-purpose command line parser (coded in rdargs.h and
rdargs.c) is then called from the decode_command() function, and the results
are saved in various global variables.

2. Input: The input file is read and converted into a number of inter-connected
data structures in memory (more details below). The code for reading an input
file in PMW's own format is mainly in the source files whose names begin
"pmw_read_", plus the file called preprocess.c. The files whose names begin
"xml_" contain code for reading input in MusicXML format, an optional feature
that is experimental (and incomplete).

3. Pagination: The next phase processes the in-memory data, assigning bars to
systems, and headings, footings, and systems to pages, that is, it does the
pagination (paginate.c). This results in more data structures that specify what
is to go on each output page.

4. Output: The fourth phase generates either PostScript or PDF output from the
page information. The main output control functions are in out.c; these call
functions in files whose names start with "set". These in turn call either the
functions in ps.c or those in pdf.c. Some sub-functions used in both formats
are in the file pout.c.

5. MIDI: If MIDI output has been requested, there is another phase in which the
code in the midi.c file scans the music data (not the pagination data) to
generate a MIDI file.

6. Similarly, if MusicXML output has been requested, there is a final phase in 
which the music and pagination data are used by the code in the xmlout.c file 
to write one or more MusicXML output files.

The names of source modules not mentioned above should give a clue to their
contents. For example, "font.c" is all about font handling, and "transpose"
contains functions for note and key signature transposition.


MEMORY HANDLING
===============

Most memory block acquisition is in the mem.c module, which handles two types
of memory block. "Independent" blocks are used for large items such as font
tables. Smaller blocks are doled out from larger chunks as required. Some types
of small block that are used and re-used are put on free chains in between. In
today's era of gigabyte memories this is probably overkill.

The large chunks and some independent blocks are hung on a single chain for
freeing at exit time. Certain other types of block are handled independently; a
number of these may be expanded using realloc() as processing proceeds. You can
see all of these listed in the tidy_up() function in main.c, which is called
when the program exits.

Some data is held in binary balanced trees, for speed of searching. The code
for tree handling is in tree.c. Processors are so fast these days that this
probably doesn't matter, but it is an elegant algorithm.


DIMENSIONS
==========

All printing dimensions in PMW are held in millipoints, that is 1/1000 of a
point. Musical dimensions, that is, the notional time lengths of notes, are in
units where a breve is defined as (128*4*5*7*9*11*13). This should cope with
all but the most esoteric subdivisions. Most calculations are done using
fixed-point arithmetic, but occasionally floating point is needed.


STRUCTURES USED FOR STAVE DATA
==============================

The diagram below shows some of the stave data structures that PMW uses. Most
of this is set up as the input file is read, but the data that hangs off the
posvector pointer is created during pagination.

The global vector called "movements" is the root. Within each movement's
structure is a vector of pointers to each stave's data, and there is also a
vector that holds the printable value for each bar's number, because bars can
be uncounted. PMW works with "absolute" bar numbers (starting at 0), using the
barvector when it needs to identify a bar to the user. The format of the values
in barvector is (integer part) << 16 + (fractional part).

Also indexed by absolute bar number is a vector of barposstr structures that
hold information about each bar (independent of any stave). This includes a
pointer to a vector of horizontal positions within the bar, each of which
contains a "musical offset" and a corresponding horizontal offset. The number
of entries is in the barposstr.


  (vector)       (struct)
  movements       movtstr
  ---------    --------------
  |    ---|--->|    ...     |
  ---------    -------------                             ---------     ------
  |       |    | stavetable.|.....vector of 64 pointers: | 0 | 1 | ... | 63 |
  ---------    -------------                             ---------     ------
  |       |    |    ...     |        (vector)              |
  ---------    -------------     -----------------         |
               | barvector -|--->| print bar no. |         |
               -------------     -----------------         |
               |    ...     |    |               |         |
               --------------    -----------------         |
               | posvector -|--  |     ...       |         |
               -------------- |  -----------------         |
               |    ...     | |                            |
               -------------- |      (vector)              |
                              |     barposstr              |
                              |  -----------------         |
                  (vector)    -->|  bar 0 data,  |         |
               --------------    |   including   |         |
               | mus-offset |<---|--- pointer    |         |
               |   x-offset |    -----------------         |
               --------------    |  bar 1 data,  |         |
               |    ...     |    |     etc.      |         |
               --------------    -----------------         |
                                 |     ...       |         |
                                 -----------------         V
                                                        (struct)
      stave item           anchor                       stavestr
       (struct)           (struct)                   -------------
        barstr             barstr        (vector)    |    ...    |
     -------------      -------------    --------    -------------
  ---|-- next    |<-----|-- next    |<---|--    |<---|- barindex |
  |  -------------      -------------    --------    -------------
  |  |   prev  --|-->   |   NULL    |    |      |    |    ...    |
  |  -------------      -------------    --------    -------------
  |  |   type    |      |  b_start  |    | .... |
  |  -------------      -------------    --------
  |  |    ...    |      |  rep num  |
  |  -------------      -------------
  |
  |  -------------                -------------
  -->|   next  --|--->  etc   --->|   NULL    |
     -------------                -------------
  <--|-- prev    |            <---|-- prev    |
     -------------                -------------
     |   type    |                | b_barline |
     -------------                -------------
     |    ...    |                |  bartype  |
     -------------                -------------
                                  |  barstyle |
                                  -------------

Each stave is anchored in a stavestr, within which barindex points to a vector
with pointers to the start of each bar on that stave. A bar's data is a two-way
chain of blocks, all of which start with the chain pointers and a type value.
The rest of each block depends on the type, which is an enum value whose name
starts with "b_". The final item always has the type b_barline. Most of the
individual bar structures have names of the form "b_somethingstr".

For unrepeated bars, the anchoring barstr block has zero in the repeat number
field. When a bar is repeated using the [<number>] syntax, its repeat number is
set to one. Then additional entries in the barindex vector are set up. Each of
these has its own barstr whose "next" field point to the original bar's data,
and whose repeat number is set appropriately. The repeat number is used to
implement the \r\ escape sequence in strings, used for numbering repeated bars.


STRUCTURES USED FOR PAGE INFORMATION
====================================

The following diagram shows the page structures that are set up by the
pagination process, which also sets up the posvector structures as shown above.
Each page is represented as a chain of blocks that are either heading blocks
or system blocks, the latter containing information about musical systems.
There is a separate chain of footing blocks.

                        (struct)
                         pagestr
                    ---------------
main_pageanchor --->|    next   --|---> chain of pagestr blocks
                    ---------------
        ------------|  sysblocks  |
        |           ---------------
        |           |  footing  --|---> chain of footing blocks, which are
        |           ---------------     headstr structs as shown below
        |           |    ...      |
        |           ---------------
        |           | page number |
        |           ---------------
        |
        V
  --------------
  |   next   --|---> chain of sysblocks/headblocks
  --------------
  |   movt   --|---> relevant movement struct
  --------------
  |  sys/head? | TRUE for sysblock
  -------------- FALSE for headblock
       then
  FOR HEADBLOCK
  --------------                              (struct)
  |    ...     |                               headstr
  --------------                          ---------------
  | headings --|------------------------->|    next   --|--->
  --------------                          ---------------
        or                                | space after |
   FOR SYSBLOCK                           ---------------
  --------------                          | string[3] --|---> left/mid/right
  |    ...     |                          ---------------
  --------------                          | font data   |
  | notsuspend |                          ---------------
  --------------                          | space above |
  |    ...     |                          ---------------
  --------------                          |  drawing  --|---> drawing tree
  |  barstart  |  starting bar number     ---------------
  --------------                          |  drawargs --|---> drawitem chain
  |   barend   |  ending bar number       ---------------
  --------------
  |    ...     |
  --------------

The function out_page() outputs a single page; it is called either from within
ps.c or pdf.c. In a sysblock, "notsuspend" is a 64-bit unsigned int that
indicates which staves are included in this system. Stave selection via the
command line or the "selectstaves" directive suspends a stave for an entire
movement. Otherwise, suspension is controlled by the "suspend" directives.


PAGINATION
==========

The function paginate() in the paginate.c module controls the pagination
process. After some initialization, it runs a big loop until page_done becomes
TRUE. The loop is a single switch on the page_state variable, which selects
from the following different action states:

(1) Start of movement: process any heading lines, set up a footing, create a
    barvector, various initialization, change to state 2.

(2) Start of system: initialization, change to state 3.

(3) In mid-system: measure the printing width of the next bar. If the "layout"
    directive was not used, see if the bar will fit on the line, and if the bar
    is too wide, change to state 4. However, always accept the first bar of a
    system. If "layout" was used, accept the bar if it is within the required
    number for the system. In both cases of acceptance, change to state 4 if it
    was the last bar of the movement; otherwise, remain in state 3.

(4) End of system: apply any required horizontal stretching to the system, and
    then see if the system (and any footnotes) fits on the current page. If
    not, start a new page. If there are no more bars, change to state 5,
    otherwise change to state 2.

(5) End of movement: end the loop if it was the last movement. Otherwise,
    prepare for another movement and change to state 1.

State 3 is probably the most complicated. A function called makepostable()
creates a table that contains the positions of the various items in a bar, and
so determines the horizontal width. This is the data that is accessed via the
posvector field in the movement structure and is used when generating the
output.

A complication arises in state 3 when the bar that follows an accepted system
starts with a key or time signature and a cautionary version is required at the
end of the system. PMW checks to see whether this can fit alongside the
accepted bars. If not, it has to back up by removing the final bar that it
accepted, and re-stretch the system if horizontal justfication is enabled.


FONT HANDLING
=============

For each required typeface there is a fontstr block that contains the font's
data such as its name, character widths, etc. These are held as a vector of
structs called font_list. Pointers to entries in font_list are held in a vector
called font_table, which is indexed by an identifier such as font_rm, font_it,
etc.

     font_table                               font_list
  ----------------                          -------------
  |            --|---> Roman font --e.g.--->|  data for |  i.e. a fontstr
  ----------------                          |   a font  |
  |            --|---> Italic font          ------------
  ----------------                              ...
  |            --|---> Bold font                ...
  ----------------                          -------------
  |            --|---> Bold italic --e.g.-->|   name    |
  ----------------                          -------------
  |            --|---> Symbol font          |  widths --|---> table of widths
  ----------------                          -------------
  |            --|---> Music font           |   kerns --|---> table of kerns
  ----------------                          -------------
  |            --|---> First "extra" font   | more data |
  ----------------                          -------------
        ...                                     ...
  ----------------                              ...
  |            --|---> Last "extra" font
  ----------------

A block called fontinststr contains the date for a specific instance of a font.
It contains a pointer to a transformation matrix (or NULL), a point size for
the font, and extra width to add to spaces, a field that is used when
justifying headings, footings, and footnotes. The "font data" field in headstr
blocks (see above) is in fact a fontinststr block.

Each movement structure has field called fontsizes, which points to a struct
called fontsizestr. This contains fontinstr structures for various musical
constructs that require different font sizes, for example, normal notes, cue
notes, grace notes, bar numbers. It also contains size information for text
fonts (such as underlay) and the fixed and variable user-specified sizes.

      movtstr
  ---------------          struct of
  |     ...     |           structs
  ---------------       ---------------
  | fontsizes --|------>| fontsizestr |   called fontsize_music
  ---------------       |    for      |
  |     ...     |       |   music     |
  ---------------       ---------------
                        | fontsizestr |   called fontsize_grace
                        |    for      |
                        | grace notes |
                        ---------------
                             ...
                        ---------------   fontsize_text[] is a vector
                        | fontsizestr |     of the text font sizes
                        |    for      |
                        |  last text  |
                        |    size     |
                        ---------------

There is a struct with default sizes called init_fontsizes in the tables.c code
module. Movement structures are initialized to point to this. The first time a
header directive changes a font size, a copy is made.

When making use of a font the font id (font_rm, etc) is used via font_table get
the font's metric information, and a fontsizestr is obtained either from the
fontsizes field in the current movement, or from data included in a headstr.


STRING HANDLING
===============

The string.c module contains many string-handling functions. When a text string
is read, it is interpreted as UTF-8 and turned into a vector of 32-bit values,
zero-terminated. Escape character sequences are handled at reading time, and
the string is checked for unknown or invalid characters. If special-purpose
B2PF processing is enabled, it happens here as well.

The most significant 8-bits of each 32-bit value are a font identifier, with
the remaining 24-bits being a code point. Defined values like font_rm are the
font ids, which are small numbers. The 0x80 bit can be set in a font id in a
string character to indicate a smaller than usual font size. This is used for
the \mu\ music escape, and for the \sc\ small caps escape.

The highest valid Unicode code point is 0x0010ffff; values above this (but
still within 24 bits) are used for special purposes, and are given the
following names:

  ss_page           insert current page number (\p\)
  ss_pageodd        ditto, but only if an odd number (\po\)
  ss_pageeven       ditto, but only if an even number (\pe\)
  ss_skipodd        if page number is odd, skip to next ss_skipodd (\so\)
  ss_skipeven       if page number is even, skip to next ss_skipeven (\se\)
  ss_repeatnumber   insert bar repeat number (\r\)
  ss_repeatnumber2  insert bar repeat number only if >= 2 (\r2\)
  ss_verticalbar    unescaped vertical bar (recognized in headings/footings)
  ss_asciiquote     escaped ASCII quote character (do not make closing quote)
  ss_asciigrave     escaped ASCII grave accent (do not make opening quote)
  ss_escapedhyphen  escaped hyphen )
  ss_escapedequals  escaped equals ) in underlay; do not interpret
  ss_escapedsharp   escaped sharp  )


STRING OUTPUT
=============

PMW strings (see above) can contain characters in different fonts. When a
string is to be output, it has to be split into substrings, each of whose
characters are all in the same font. This has become more complicated than it
was originally, because of a change in font technology.

When PMW was originally implemented, the only available technology for a
user-created font was an Adobe Type 3 font, and this is what I used for the
music font (PMW-Music). Later, Adobe published the format of its previously
secret Type 1 format and PMW-Music was converted. In both these formats, each
character can be associated with a movement of the current point in any
direction after a character has been output. I made use of this ability by
creating a number of characters that output nothing, but move various distances
up, down, left, or right. Other characters display fragments of musical symbols
(e.g. tails for quavers, etc). This was all designed so that a single string
consisting of a mixture of characters could assemble musical symbols out of a
number of parts. There are also some non-blank characters that have a vertical
rather than a horizontal movement (for vertical squiggles).

Adobe's initial secrecy about its Type 1 format (and its royalty charges) led
to the development of other font technologies, in particular, TrueType,
developed by Apple and also used by Microsoft. Eventually the players got
together and developed OpenType, which is a combined technology that can
contain either Adobe-style or TrueType-style font outlines. This is being
pushed as the future of fonts. However, from PMW's point of view it has a
serious drawback: after outputting a character from an OpenType font, the
current point can be set to move only in one direction. As PMW is a
left-to-right font, it is not possible to have characters that cause up, down,
or leftwards movement. This restriction is inherited from TrueType.

The PMW code relies on being able to output strings in the PMW-Music font that
use the special moving characters. For example, the string "z.w{{y." is used
to create a semiquaver rest from two suitably placed quaver rests. Its action
as as follows:

  z    moves right by 0.1 times the font's size
  .    outputs a quaver rest
  w    moves down by 0.4 times the font's size
  {{y  moves left by 0.76 times the font's size (2*0.33 + 0.1)
  .    outputs another quaver rest

To support an OpenType version of PMW-Music, the string outputting code in PMW
was changed. A string of characters in PMW-Music is no longer always output as
a whole, but is split into two kinds of substring: those that consist of moving
characters unsupported by OpenType, and the rest. The latter kind may be
terminated by a character that needs vertical movement, which is implemented in
PMW after outputting the substring and adjusting for its horizontal length. The
movement-only substrings are never output, but just cause x and y adjustments
within PMW. This change makes for larger output files, but storage is a lot
more plentiful that it was back in 1987.


RIGHT-TO_LEFT PROCESSING
========================

Missionaries in Egypt wrote music from right to left so that the Arabic words
and the music of hymns were both read in the same direction. There are other
examples in countries that use right-to-left scripts. Although the music's
direction is changed, the shapes of the notes, clefs, etc. remain the same.
Some years ago I was asked whether PMW could be coerced into setting music this
way.

In principle it is easy: transform the coordinate system so that the x axis
runs from right to left and the origin is at the bottom righthand corner of the
page. Then, within this coordinate system, re-transform each font so that its
characters do not appear as mirror images. It turns out not to be quite as
simple as that. Various additional adjusments need to be made, but it is the
fonts that cause the most trouble because, having been re-transformed, their
output goes left-to-right. Whenever a string is to be output at point x, say,
if the horizontal width of the string is w, the actual output must take place
at x + w (with an adjustment related to the final character's bounding box).
There is further complication for strings in the music font because of the
issues discussed in the previous section.

There is a global variable called main_righttoleft that is set TRUE for
right-to-left output. Searching for that name should find all the places where
special action is needed for right-to-left output.


DRAWING FEATURES
================

PMW supports "drawings", defined in a PostScript-like manner, for creating
custom marks on the output page. All the drawing code is in the draw.c module.
Each drawing has a name which is used to arrange them in a binary balanced tree
for fast access. (Overkill, since in most PMW input files that use drawings
there are probably only one or two. But the tree code is available for other
uses, so why not?)

The tree's "data" field points to a vector of drawitem structures, each of
which has a type (dr_something) and a value. The type indicates whether the
value is an operator or operand. One type is dr_jump, which is used to chain
more memory if the initial allocation is filled.

Execution of a drawing consists of interpreting the vector of draw items after
first pushing any arguments on the execution stack.


VARIABLE TYPES
==============

When PMW was first implemented in C, C90 was the language standard. A serious
"gotcha" in C90 is that the char type may be signed or unsigned, depending on
the environment. I had been caught by this, so for PMW I typedef'd uschar as
unsigned char and used it for the majority of character vectors. (I see that in
C11, the wording has changed somewhat.) To save typing I also typedef'd usint
for unsigned int. Over the years, as both C and PMW have been revised, newer C
integer types such as int32_t have been introduced.

There was no stdbool.h back then, so there are typedefs for BOOL (as an
unsigned int) and CBOOL (as an unsigned char, later changed to uint8_t).

Most of the stuctures used in PMW are defined as their own types in structs.h.


MACROS IN THE CODE
==================

Macros without arguments are widely used to give names to various constants and
to provide short abbreviations for some long function names. There are also a
number of macros that do have arguments. Thay are mainly defined in the pmw.h
file. Many have names that begin with "mac_".

The standard string-handling functions such as strlen(), require char *
arguments. Because PMW works mostly with unsigned chars, I defined a set of
shorthand casts, such as CS for (char *), and then a set of string-handling
macros such as Ustrlen() for use with unsigned char strings.


RUNNING DATA
============

A set of variables that are used while reading one bar are defined in a
structure called breadstr so that they can easily be initialized in a block at
the start of reading each bar. The global variable brs is the working breadstr.
Similarly, for reading a stave there is sreadstr and srs.

A structure call pagedatastr is used for remembering things while creating a
system. There are several versions so that backtracking can occur. See the
comments near the start of paginate.c.


ERROR HANDLING
==============

Errors are handled in the module error.c. Error numbers are given names such as
ERR42 so that their use can easily be found by grepping. There are two error
functions: error() just takes an error number and arguments to fill in the
message; error_skip() has in addition a 32-bit argument that contains one or
two 8-bit characters. Input is skipped until either of these, or newline, is
reached.


USER DOCUMENTATION
==================

It's all in the doc directory. There is a Makefile for building the user
manual, with comments describing its use at the start. There is also a file
called Makefile.ALL that in addition builds a document about the music font.
These two PDF documents are created using my text typesetting software (aspic,
xfpt, and sdop) as well as PMW itself for the music examples.


MAKING A RELEASE
================

A shell script called MakeRelease, whose argument is just a release number,
copies the files needed for a release into Releases/pmw-xx.xx and then makes a
tarball. Now that there is a GitHub repository, I tag each release in GitHub
and add the tarball as one of its assets.


FONT MAINTENANCE
================

Maintenance of the PMW-Music and PMW-Alpha fonts happens in the fontmaint
directory. There's a README file with details.


Philip Hazel
September 2025
