(or, Using Mnemonic Atoms in Symbolic Naming)

by Ron Gutman

Computer Language magazine, October 1984

Assembly languages and high-level programming languages free us from using numbers to tell our computing machines what to do. These languages also require us to devise many names -- for variables and subroutines, for instance -- ourselves. We typically choose mnemonic names derived from our human languages so that the names remind us of the activities they stand for.

Programmers appreciate this freedom to devise names to their own liking. But the freedom can be a burdensome responsibility, especially when writing a large program that contains hundreds or even thousands of user-defined names or symbols. How does a programmer select symbolic names that describe the objects to which the names are given?

Before inventing the technique this article describes, I struggled to choose names in a consistent fashion so that I could subsequently recognize the names and the objects they referred to. This took a lot of programming time, and the results were not the best. Often I would end up with the same name for two different things. Other times I could not recognize the name or mistook it for something else. Could I expect anybody else to understand the names I chose?

Contributing to the problem, the assembler I used allowed only six characters to form each symbolic name. A limitation that severe is not uncommon in assembly languages, and comparably severe limitations exist in some compilers for high-level languages.

How does a programmer describe an object in six characters? I often felt that I needed six words. It doesn't help much that the assembler or compiler allows you to tag on extra characters beyond those it uses to distinguish one symbolic name from another. You still have to ensure that each name you choose is distinguished from all others in the program you are writing.

My story does have a happy ending, though. I overcame the problem by inventing a technique using "mnemonic atoms," as I call them.

If readers find themselves in a situation similar to mine -- having to write a large program containing a lot of somewhat abbreviated symbolic names -- the technique I will describe may be invaluable to their projects. This is particularly the case if readers use assembly language because its programs can have large numbers of symbol definitions. There are fewer facilities (like GOTO-less control structures) that help avoid the use of symbols. My technique would also be useful where storage limitations demand a terse source code. Since mnemonic atoms provide a degree of documentation, they can reduce the need for comments in the code.

Of course, one solution to the problem is to use programming systems that are less restrictive with regard to user-defined symbols. COBOL programmers can use -- and usually do use -- long-winded names like "monthly-income-average". A COBOL programmer can select a few words of English or computer jargon that describe the thing he or she is naming and put them together with hyphens.

When a programmer is limited to six characters, he or she will do the same thing but will abbreviate each word and omit the hyphens. So instead of "monthly-income-average" the programmer might use MOINCA. Or, perhaps unsatisfied with such short abbreviations, the programmer might choose to leave out one of the components, arriving at MOINCO for "monthly-income". MO seems like an adequate abbreviation for "month" because it is generally accepted as such, but INCO is more recognizable as "income" than INC, which could be confused for other words such as "increment".

Now let's pretend we are browsing through a source listing and see how we do at recognizing some abbreviations. We see a reference to a subroutine called SRTBL. That could be an abbreviation for "search-table". Or is it "sort-block"? Of course, the context or the comments will tell us, but recognition does not come quickly.

Suppose SRTBL searches a table that contains names and phone numbers for a given name, and suppose we find that we also need the capability to search the table for a given phone number. So we write a new routine that we will call SRTBLP to search by phone number (P for "phone"). The existing routine we rename SRTBLN to make it clear that it searches by name (N for "name").

Later we find that we want to sort the table and print it out either in alphabetical order by name or numerical order by phone. If SRT is our abbreviation for "sort", then we ought to call our new routines SRTTBLN and SRTTBLP. But now our assembler, which only recognizes six characters of each name, considers these two names to be identical or just plain illegal. We can't use SR for "sort" because we have already used the names SRTBLN and SRTBLP for our search routines. We finally settle on SRTTBN and SRTTBP, using TB for "table". Having two abbreviations for "table" is a little disconcerting, however. Maybe we should rename our other routines to use the new abbreviation for "table".
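The collision is easy to reproduce. The following is a minimal sketch in Python rather than assembly language; it assumes an assembler that simply ignores every character past the sixth, and the helper names significant and collisions are hypothetical.

```python
from collections import defaultdict

SIGNIFICANT_CHARS = 6  # assumed limit, matching the assembler described above


def significant(name, limit=SIGNIFICANT_CHARS):
    """Return the part of a name that the assembler actually compares."""
    return name[:limit]


def collisions(names, limit=SIGNIFICANT_CHARS):
    """Group names that become identical once cut to the significant length."""
    groups = defaultdict(list)
    for name in names:
        groups[significant(name, limit)].append(name)
    # Keep only the truncated forms shared by more than one name.
    return {key: group for key, group in groups.items() if len(group) > 1}


# The first attempt at naming the sort routines collides with itself.
print(collisions(["SRTBLN", "SRTBLP", "SRTTBLN", "SRTTBLP"]))
# The compromise names are all distinct within six characters.
print(collisions(["SRTBLN", "SRTBLP", "SRTTBN", "SRTTBP"]))
```

A check like this finds clashes, but it does nothing to make the surviving names easier to read.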

Some fiddling around would yield more solutions -- each requiring some kind of compromise -- but the reader may now have a feel for some of the problems involved in choosing names. After our efforts at solving this naming puzzle, how well will our solution serve us? Will we be able to recognize SRTBLN and SRTTBN and be able to say readily which is which? How about a month from now, when we might have to modify the program?

It would be difficult, primarily because the abbreviations are sometimes one, sometimes two, and sometimes three characters long. How will our eyes divide these names into their components? By trial and error. Remember that SRTBLN is divided SR-TBL-N, while SRTTBN is divided SRT-TB-N, or did you forget already? Seeing the SRT in SRTBLN will certainly distract us from its correct division.
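The trial and error can be made concrete. The sketch below, in Python, lists every way a name can be divided into known abbreviations of mixed length. The abbreviation set is an assumption pieced together from the examples above, with BL for "block" added from the "sort-block" reading; the function name divisions is hypothetical.

```python
# Assumed abbreviations of mixed length, collected from the examples above.
ABBREVIATIONS = {
    "SR": "search",
    "SRT": "sort",
    "TB": "table",
    "TBL": "table",
    "BL": "block",
    "N": "name",
    "P": "phone",
}


def divisions(name, abbreviations=ABBREVIATIONS):
    """Return every way to divide a name into known abbreviations."""
    if not name:
        return [[]]  # one way to divide nothing: no pieces at all
    results = []
    for length in range(1, len(name) + 1):
        head = name[:length]
        if head in abbreviations:
            # Try every division of what remains after this piece.
            for rest in divisions(name[length:], abbreviations):
                results.append([head] + rest)
    return results


for name in ("SRTBLN", "SRTTBN"):
    for pieces in divisions(name):
        words = "-".join(ABBREVIATIONS[piece] for piece in pieces)
        print(name, "=", "-".join(pieces), "=", words)
```

With these abbreviations SRTBLN divides two ways, as "search-table-name" and as "sort-block-name", which is exactly the confusion a reader of the listing faces.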

Don't think these issues are too trivial. They matter in all but the smallest programming projects.

So how can we put some order into this naming business where there is now confusion?

At this point I think our resistance to applying some kind of disciplined scheme to the problem has broken down, so I will now offer my rules for constructing mnemonic names. The first two are commandments to be followed religiously. They will make all the difference in the world. The other two will help, but they need not be followed religiously because sometimes a given situation just won't oblige us in our efforts to follow them.

The First Commandment

All abbreviations shall be composed of the same number of characters. These equal-length abbreviations are called mnemonic atoms.

All mnemonic atoms have the same number of characters. Fine, but what is that number? You have to make a trade-off between the number of possible mnemonic atoms you can invent and the number you can use to form one name. You want both of these numbers to be as large as possible. However, your programming language limits the number of characters in a name, so if your atoms contain too many characters your symbolic names won't contain very many atoms. On the other hand, if your atoms contain too few characters, there won't be many combinations of characters to form atoms.

Suppose names are limited to six characters, and suppose you decide that all of your atoms will be two characters long. Then you can combine up to three atoms to form a name and, assuming you use only alphabetic characters, there will be 26 x 26, or 676, sequences of two characters you can choose from to form atoms. I think two characters per atom is optimum given a limit of six characters per name.

If pressed for a formula, I would venture that the number of characters per atom should be the square root of the character limit rounded down to the nearest whole number. By that formula, two-character atoms would be used with any limit from four to eight characters, and three-character atoms would be used from nine to 15. Above 15, you might prefer to use another scheme altogether and form your names in the COBOL fashion using a delimiter to separate the components.
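The rule of thumb and the trade-off behind it can be checked with a little arithmetic. This is a minimal sketch in Python, assuming alphabetic characters only as in the text; the function names atom_length and trade_off are hypothetical.

```python
import math

ALPHABET_SIZE = 26  # alphabetic characters only, as assumed in the text


def atom_length(name_limit):
    """Square root of the character limit, rounded down."""
    return math.isqrt(name_limit)


def trade_off(name_limit, atom_len):
    """Return (possible atoms, atoms that fit in one name)."""
    return ALPHABET_SIZE ** atom_len, name_limit // atom_len


# With a six-character limit, compare one-, two-, and three-character atoms.
for atom_len in (1, 2, 3):
    possible, per_name = trade_off(6, atom_len)
    print(atom_len, "chars per atom:", possible, "atoms,", per_name, "per name")

# The suggested atom length for each limit from 4 to 15.
for limit in range(4, 16):
    print(limit, "->", atom_length(limit))
```

For a limit of six, one-character atoms give only 26 atoms, three-character atoms allow only two per name, and two-character atoms give 676 atoms with three per name.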

The Second Commandment

Each mnemonic atom shall be entered into a "dictionary" of mnemonic atoms. This dictionary shall be in alphabetical order and must give the meaning of each.

Perhaps the word "dictionary" implies too much tedium. But my dictionaries have been just a few pages long. I include my dictionary in the source code as a section of comments, or I maintain the dictionary in a separate text file that later becomes part of the documentation in the software package.

In any case, use the computer to maintain the dictionary. You won't invent all of your atoms at one time. You might start with a few atoms in the dictionary, but you will be adding them as you go along. One of the main purposes of the dictionary is to tell you those combinations of characters that you have already used before you attempt to add new ones. When I add an atom, I just pencil it into my dictionary listing with an arrow to show where it goes. Periodically I edit the dictionary on the computer to get a clean, up-to-date listing.

Your dictionary must be alphabetized by mnemonic atom. Otherwise it will be intolerably tedious for you to determine what new atoms you can add or find out what an unfamiliar atom means.
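On a present-day machine the dictionary's two jobs, refusing a combination that is already taken and producing an alphabetized listing, could be handled by something as small as the following. It is a sketch in Python under the assumption of two-character atoms; add_atom and listing are hypothetical names, not part of any existing tool.

```python
ATOM_LENGTH = 2  # assumed atom size, suited to six-character names


def add_atom(dictionary, atom, meaning):
    """Add an atom, refusing wrong lengths and combinations already in use."""
    atom = atom.upper()
    if len(atom) != ATOM_LENGTH:
        raise ValueError(f"{atom} is not {ATOM_LENGTH} characters long")
    if atom in dictionary:
        raise ValueError(f"{atom} is already used for {dictionary[atom]!r}")
    dictionary[atom] = meaning


def listing(dictionary):
    """Return the dictionary as lines alphabetized by atom."""
    return [f"{atom} - {dictionary[atom]}" for atom in sorted(dictionary)]


atoms = {}
add_atom(atoms, "SR", "search")
add_atom(atoms, "TB", "table")
add_atom(atoms, "SO", "sort")
print("\n".join(listing(atoms)))
```

Trying to add SR a second time raises an error that names the meaning already on file, which is the question the dictionary exists to answer.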

Now for an advanced lesson. Divide your dictionary into two dictionaries. One dictionary will contain atoms that you are likely to use in future projects. Most of these atoms will be abbreviations for common programming terms like "table", "stack", "pointer", "index", "move", or "error". The other dictionary will contain atoms that are specific to your current project. These atoms will be abbreviations for terms used in the specific application you are working on. When you go on to your next project, you can start out with the dictionary of common atoms, leaving your application-specific dictionary behind. The big payoff comes when you use atoms you have already become familiar with.

A brief example of a typical dictionary appears in Table 1.

          Sample mnemonic atom dictionary

AK - acknowledge                 MN - minimum
                                 MO - mode
CL - clear, reset                MS - message
CP - copy                        MV - move
CR - create                      MX - maximum
CU - current
CV - convert                     OP - output

DA - disable                     PK - pack
DB - debug                       PN - pointer
DC - decrement                   PR - prompt
DL - delete                      PV - previous
DT - date
DW - day of week                 QU - queue

EA - enable                      RC - receive
EO - end of                      RD - read
ER - error                       RE - record
EX - exit
                                 SE - set
FI - file                        SK - stack
FL - flag                        SO - sort
FP - floating point value        SR - search
                                 SV - save
IC - increment
IP - input                       TB - table
IR - interrupt                   TD - time of day
IS - insert                      TK - task
IV - interval, span of time
IX - index                       UP - unpack
IZ - initialize
                                 WR - write
LN - length                      WT - wait
LS - list
                                 XM - transmit, send
ME - menu

The following atoms stand for terms and concepts related
to the specific application of a mailing list program:

AD - address                     PH - phone number
ML - mailing label               PU - purge
NA - name

Table 1.

The Third Commandment

Do not create two mnemonic atoms with identical meanings, such as SR for "search" in some cases and SE for "search" in others. Conversely, avoid using one mnemonic atom for more than one meaning, such as SR for "search" in some cases and "sort" in others.

There are two purposes in not creating two atoms with the same meaning. One is to keep to a minimum the number of atoms your mind has to deal with -- thereby increasing your facility with them. The other is to keep to a maximum the number of unused combinations of characters available for new atoms. Often the combination you want for a new atom has been used, making it hard to follow the second part of the commandment. This will happen less often if the first part is kept in mind.

It's not hard to see the benefit reaped when each mnemonic atom has only one meaning: less ambiguity in deciphering symbolic names in your source code. But it is hard to achieve this. Often the most mnemonically satisfying abbreviation of a term is already being used for another term (more on this matter later). Use the dictionary to check whether an abbreviation is already in use.
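If the dictionary is kept as a mapping from atom to meaning, an atom can hold only one entry, so the second half of the commandment largely takes care of itself. The first half can be checked mechanically. The sketch below, in Python, assumes meanings are written as comma-separated terms, as in the "clear, reset" entry of Table 1; the function name duplicate_meanings is hypothetical.

```python
from collections import defaultdict


def duplicate_meanings(dictionary):
    """Return each term that is listed under more than one atom."""
    atoms_by_term = defaultdict(list)
    for atom in sorted(dictionary):
        # An entry such as "clear, reset" counts as two separate terms.
        for term in dictionary[atom].split(","):
            atoms_by_term[term.strip().lower()].append(atom)
    return {term: atoms for term, atoms in atoms_by_term.items() if len(atoms) > 1}


# SR and SE both claim "search", which the commandment forbids.
print(duplicate_meanings({"SR": "search", "SE": "search", "SO": "sort"}))
# A few entries as they appear in Table 1 produce no complaints.
print(duplicate_meanings({"SE": "set", "SR": "search", "SO": "sort", "CL": "clear, reset"}))
```

A check like this catches only identical wording. Two atoms for "send" and "transmit" would still slip through, so the programmer's judgment remains the real safeguard.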

The Fourth Commandment

Be conservative about inventing new mnemonic atoms. Use an existing one if possible.

Put no frivolous atoms in the dictionary! This has the same purpose as The Third Commandment. A frivolous atom might be one you really can't use or one that serves virtually the same purpose as another atom. Also, avoid creating atoms for vague terms with broad meanings like "number", "process", or "data" unless you have a specific meaning for them in your application. You are trying to pack as much meaning into your atoms and the names built on those atoms as you can. In that respect, atoms for vague terms just don't carry enough punch.

An example

The second part of the dictionary in Table 1 contains atoms that might be used in a mailing list program. As the project progressed, this part of the dictionary would grow considerably, as would the first part if the programmer were starting a dictionary from scratch.

Notice that some atoms are not defined by just one word. DW stands for "day of week". PH stands for "phone number". Do not think of atoms as having single-word definitions -- though they often will -- because that concept is too restricting.

Many atoms come in pairs that represent complementary concepts such as MN and MX for "minimum" and "maximum" or RD and WR for "read" and "write" or DA and EA for "disable" and "enable". EO, for example, could be used in combination with FI or RE for "end-of-file" or "end-of-record".

Now we can apply mnemonic atoms to the hypothetical problem we tackled above. Those subroutines needed to search the table by name and by phone number (which we named SRTBLN and SRTBLP) can now be called SRTBNA and SRTBPH. The corresponding sort routines can be called SOTBNA and SOTBPH. This is a consistent and elegant scheme for naming these routines. We have given up TBL as an abbreviation for "table" for the consistent use of TB. It is no loss because as long as we must get used to TB, the use of TBL only adds confusion.
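Because every atom has the same length, dividing a name requires no guessing at all. The sketch below, in Python, splits a name every two characters and looks each atom up in a small excerpt of Table 1; the function names split_atoms and expand are hypothetical.

```python
ATOM_LENGTH = 2

# A small excerpt of Table 1, enough for the four routines named above.
ATOMS = {
    "NA": "name",
    "PH": "phone number",
    "SO": "sort",
    "SR": "search",
    "TB": "table",
}


def split_atoms(name, atom_length=ATOM_LENGTH):
    """Divide a name into equal-length atoms; there is only one way to do it."""
    if len(name) % atom_length != 0:
        raise ValueError(f"{name} is not a whole number of atoms")
    return [name[i:i + atom_length] for i in range(0, len(name), atom_length)]


def expand(name, atoms=ATOMS):
    """Spell a name out, COBOL fashion, from the dictionary."""
    return "-".join(atoms[atom] for atom in split_atoms(name))


for name in ("SRTBNA", "SRTBPH", "SOTBNA", "SOTBPH"):
    print(name, "=", expand(name))
```

An atom missing from the dictionary raises a KeyError, which is a reasonable outcome for a name that breaks the Second Commandment.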

Of course, the atoms could be combined in a different order. I'm going to permit the issue of order to be decided by the reader. However, I will say that the ordering of atoms should be exploited to put more information into symbolic names, so the programmer should devise some ordering scheme.

Using my own personal scheme, the routine to search by name would be called TBNASR. That is because I like the last atom to reveal what kind of thing is being named. For variables and data structures, my final atom acts as a noun, as in TBSRPN for "table-search-pointer". For routines, my final atom is a verb. In the case of TBNASR, SR stands for the verb "search", which tells us that the thing being named is a routine that searches. The remaining atoms in the name are modifiers of the verb or noun. TBSRPN is a pointer. The atoms TB and SR modify the word "pointer" by telling how the pointer is used (it is used in the table search).

Similarly, if I had a name for the string variable that holds the name TBNASR is to search for, then the name would be TBSRNA. This last example shows how the order of atoms can be used to impart information. TBSRNA is just a permutation of the atoms in TBNASR (TB, NA, and SR), but the order allows me to distinguish between the two.

This last example illustrates the ability of each atom to be used in different ways as a noun, verb, or modifier -- an important feature of mnemonic atoms because it allows each atom in your dictionary to do more for you. This is possible because the atoms can be permuted in any fashion without affecting the way names are divided into atoms. That, in turn, is possible only because the atoms are equal in number of characters.

If the atoms were unequal -- some one, some two, and some three characters -- they could be permuted at the expense of having inconsistent division of names, or the names could be consistently divided into unequal atoms, but each atom could only serve in a particular position in any name. The advantages of consistent division and permutability are only achieved with equal atoms.

Often my first atom identifies a module with which the object being named is associated. I might have, for instance, a module that handles all of the operations on a table. Then TBNASR, TBSRNA, TBPHSO, etc. are all identified as being associated with that module by their initial atom, TB.
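One way to picture this ordering scheme is as a small routine that reads a name by position. The sketch below, in Python, is an illustration of my personal convention only, not a general rule. It assumes two-character atoms, and the set VERB_ATOMS, which lists the atoms that read as verbs when they come last, is a hypothetical stand-in for the programmer's own judgment.

```python
ATOM_LENGTH = 2

# Assumption: atoms that act as verbs when they are the final atom of a name.
VERB_ATOMS = {"SO", "SR"}


def split_atoms(name, atom_length=ATOM_LENGTH):
    """Divide a name into equal-length atoms."""
    return [name[i:i + atom_length] for i in range(0, len(name), atom_length)]


def describe(name):
    """Read a name by position: first atom is the module, last atom is the head."""
    atoms = split_atoms(name)
    head = atoms[-1]
    return {
        "module": atoms[0],
        "modifiers": atoms[:-1],
        "head": head,
        # A final verb marks a routine; anything else is treated as data.
        "kind": "routine" if head in VERB_ATOMS else "data",
    }


for name in ("TBNASR", "TBSRNA", "TBSRPN", "TBPHSO"):
    print(name, describe(name))
```

TBNASR and TBSRNA contain the same three atoms, yet the first is read as a routine and the second as data, purely because of the order.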

Again, the wise strategy is to devise a scheme that uses the order of atoms to convey information.

I'm going to leave one issue open because it is beyond the scope of this article and probably best left to personal preference. That is the nasty issue of selecting abbreviations to use for atoms. It is particularly hard to avoid conflicts when your atoms are only two characters. Some combinations of characters, such as IN and ST, just cry out to be used over and over. IN could be an abbreviation for "input", "index", "initialize", and many other terms. ST could stand for "status", "stack", or "store". Notice that the dictionary in Table 1 avoids these abbreviations altogether.

But the good news is that it doesn't matter much what abbreviations you come up with or how inappropriate they seem. If you use them consistently, they will become old familiar pals that couldn't seem more appropriate, just as "lbs." is a very familiar abbreviation for "pounds".

Use your imagination. The abbreviation doesn't have to be the first two letters of the word being abbreviated. Even the first letter does not have to be used. NT has mnemonic value as an abbreviation for "integer" because the sounds agree when we pronounce the abbreviation (in-tee) and pronounce the word "integer". And there is XM for "transmission" because "trans" means "cross". Also try to abbreviate a different word with the same meaning. CL can be used for "clear" instead of RE or RS for "reset".

One handy rule of thumb: when you have the choice, rely on the more uncommon letters (J's, Q's, X's, and Z's) in your abbreviations. They will present fewer future conflicts and will more readily bring to mind the terms for which they are abbreviations.

No doubt there is an article to be written on the art of abbreviation. If adopting mnemonic atoms causes the reader eventually to write such an article, I promise to read it.

Ron Gutman has a B.S. and M.S. in computer science from the Univ. of Calif. at Berkeley. He has been designing and implementing software for seven years and is now working for GRiD Systems Corp. in Mountain View, Calif.