***************************************
*                                     *
*     KRAKOWICZ'S KRACKING KORNER     *
*                                     *
*      THE BASICS OF KRACKING II:     *
*     SINGLE-LOAD GAMES, STARTING     *
*     LOCATIONS, AND OBFUSCATION.     *
*                                     *
***************************************

  The first in this series was straightforward, since the hardware reset
is a necessity to begin kracking. After that, the path divides, and
there are many, many ways to produce an unprotected version of a
program.  The path you follow is governed by three things: the kind of
program, the type of protection employed, and your own personal style
(style, by the way, is primarily the result of limitations. Try to keep
an open mind and develop as much versatility as possible).  The easiest
kind of program to deal with is the one that is seen less frequently
every month: the "single-load" program or game. These are programs which
are loaded in from disk only once, and then are run strictly from memory
with no disk access.  In the good old days, almost every game was like
this, and removing protection was not that difficult.  On the other
hand, when you read something like Olaf Lubeck's challenge in track 17,
sector D of Cannonball Blitz: "YOU'LL NEVER CRACK IT", there's more
satisfaction when you get to say "Oh, yes I did!".

  In order to become proficient at this and the techniques to be
discussed in future episodes, you will have to get used to committing a
very unnatural act: interpreting assembler code with no comments or
instructions to guide you. The disassembler (monitor 'l' command) is a
great help in this work, since it translates machine code into assembler
mnemonics, but the real burden falls on the ingenuity of the krackist.
There is no substitute for experience, and no one can teach you how to
do it beyond pointing out some of the techniques we use, and warning you
about some of the tricks used to keep you from succeeding.

  The philosophy of attack with these games is to find the starting
location--the address which will always restart the game--and then to
save the game (program) as a normal DOS 3.3 binary file.  As a simple
example of a starting location, you probably already know that when you
mess up with Apple's "fid" program, you can restart by typing '803G'
from the monitor. At one time, before the publishers got smart, a
starting location was likely to be a common, even number like $800, C00,
4000, or 6000, and it's still worth checking these 'old favorites' in
case you find a naive or lazy author. If these fail, we will have to
begin the process of memory snooping.  This is the introduction to the
unglamorous activity that occupies most of the time of the dedicated
krackist.  As always, Inspector and Watson in ROM are highly
recommended, since they make the process infinitely easier.  What we are
trying to do is directly locate the beginning address of the program, or
to search back to it from something we can recognize.

  Since many games begin by displaying a hi-res "banner" or game screen,
a good place to start looking is the series of instructions that set up
the hi-res screen (there is a discussion of this in the doc for
masterkey plus, but they make a few too many assumptions).  Apple's
screen display, as you probably know, is set up by accessing some "soft
switches". In hex, these are locations $C050 to C057 (sorry, but if
you're going to learn the gentle art of kracking, you'll have to become
fluent in hexadecimal--we won't pull any punches when it comes to number
systems). It doesn't matter what you do to these locations, as long as
you make a reference, so the following instructions all establish
graphics mode:

   lda $c050,   bit $c050,   rol $c050
   sta $c050,   cmp $c050,   eor $c050

(also, this one: ldy #$71; and $bfdf,y, since $bfdf + $71 = $c050)

Many authors have established the habit, however, of writing the
sequence

   lda $c054   (select primary page)
   lda $c057   (select hi-res graphics)
   lda $c050   (select graphics mode)

and sometimes,

   lda $c052   (pure graphics screen).

To find these instructions, use the Inspector's 'find' function, and
program it to search for the two-byte sequences of '50 C0' and '57 C0'.
Generally, as long as the writers aren't deliberately trying to confuse
you, you will find one to several locations where these sequences are
close to each other. You will also find some addresses that don't really
contain a screen reference, since the search is only for two bytes (for
you trivia/statistics buffs out there, a given two-byte sequence would
occur less than once in the entire RAM memory space from 0 to $BFFF if
the distribution were truly random. It's not.).
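
  For those working from a memory image saved off to another machine,
the two-byte search above can be sketched in Python. This is only an
illustration, not how the Inspector does it: the names find_all and
near_pairs, the 16-byte window, and the screen setup planted at $4123
are all assumptions made up for the example.

```python
def find_all(memory: bytes, pattern: bytes) -> list:
    """Return every offset at which pattern occurs in memory."""
    hits = []
    pos = memory.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = memory.find(pattern, pos + 1)
    return hits


def near_pairs(memory: bytes, first: bytes, second: bytes, window: int = 16) -> list:
    """Pair up hits of the two patterns that lie within `window` bytes."""
    return [(a, b)
            for a in find_all(memory, first)
            for b in find_all(memory, second)
            if abs(a - b) <= window]


# Hypothetical 48K image: all zeros, with a screen setup planted at $4123
# (lda $c054 / lda $c057 / lda $c050).
memory = bytearray(0xC000)
memory[0x4123:0x412C] = bytes.fromhex("ad54c0ad57c0ad50c0")

pairs = near_pairs(bytes(memory), b"\x50\xc0", b"\x57\xc0")
print([(hex(a), hex(b)) for a, b in pairs])

# Chance expectation for one fixed two-byte pattern in 48K of random bytes:
# (number of two-byte windows) / 65536 possible values.
expected_hits = (0xC000 - 1) / 0x10000
print(round(expected_hits, 2))
```

Each hit points at the operand, so the opcode sits one byte earlier. The
last two lines are the trivia figure: roughly 49,000 two-byte windows
against 65,536 possible values, which comes out a little under one.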

  To see if each occurrence of the pattern is the starting location, look
backwards until you find an absolute end for the previous subroutine
such as 'rts' or 'jmp'.  Your subroutine should begin immediately after
that, and you should assume for the moment that it's the starting
location.  If, for example, the location you found is $4123, test it by
reloading the game, resetting it, and typing '4123G'.  If it runs, sit
back and gloat, otherwise read on (it sounds unnecessary to reload, but
the Inspector uses a few locations in pages 0, 2, and 3, so it's best to
be safe). If Murphy's law of dynamic negatives is with you and the game
didn't start, it's usually because you haven't found the true starting
location.  You then need to trace back further in the program sequence
to find the real start.

  There are three ways for another routine to get to the one you're
looking at: jmp, jsr, and the family of branch instructions.  To
eliminate the third possibility, keep in mind that branches can reach up
to $7F (127) locations away in either direction (strictly, $80 back and
$7F forward, counted from the instruction after the branch). This is
equal to about 60 instructions, so you should review about one full page
of disassembly printout (three screensful) before, and more rarely
after, what looked like a possible start. If you find a 'bne 4123', or
'bcc 4123',
etc., you will have to track back to the beginning of that routine and
try again.  Repeat this process until you find a location that can only
be reached by a jmp or jsr.
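
  The branch check can also be done mechanically on a memory image. The
Python sketch below looks at every address within branch range of a
target and reports the ones that decode as a branch landing on it. The
names branches_to and branch_target and the test page loaded at $4100
are hypothetical, and because the scan does not know where instructions
really begin, an operand or data byte that happens to look like a branch
will be reported too.

```python
BRANCHES = {0x10: "bpl", 0x30: "bmi", 0x50: "bvc", 0x70: "bvs",
            0x90: "bcc", 0xB0: "bcs", 0xD0: "bne", 0xF0: "beq"}


def branch_target(addr: int, operand: int) -> int:
    """Target of a branch at addr; the offset is a signed byte counted
    from the address of the following instruction (addr + 2)."""
    offset = operand - 0x100 if operand >= 0x80 else operand
    return (addr + 2 + offset) & 0xFFFF


def branches_to(memory: bytes, base: int, target: int) -> list:
    """List (address, mnemonic) for every apparent branch to target."""
    first = max(base, target - 129)                   # furthest forward branch
    last = min(base + len(memory) - 2, target + 126)  # furthest backward branch
    hits = []
    for addr in range(first, last + 1):
        opcode = memory[addr - base]
        operand = memory[addr - base + 1]
        if opcode in BRANCHES and branch_target(addr, operand) == target:
            hits.append((addr, BRANCHES[opcode]))
    return hits


# Hypothetical page at $4100: nops, one forward and one backward branch.
page = bytearray([0xEA] * 0x100)
page[0x17:0x19] = b"\xd0\x0a"      # $4117: bne $4123
page[0x30:0x32] = b"\x90\xf1"      # $4130: bcc $4123
found = branches_to(bytes(page), 0x4100, 0x4123)
print([(hex(addr), name) for addr, name in found])
```

An empty result means the location can only be reached by a jmp or jsr
(or by falling into it), which is the condition the text asks for.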

  To find out how the program got to this location, do a 3-byte search
with the Inspector for a jsr $4123: 20 23 41. If nothing shows up, try
the jmp $4123: 4c 23 41.  One of these must produce a reference, or you
messed up the earlier check for branches.  Once you find the earlier
reference, go through the same procedure to find the start of this
routine, and try it out as a starting location for the game.  If it
doesn't work, try one more step further back (Krakowicz's fourth law of
kracking says that if you have to go back more than two steps, you're
probably not on the right trail).
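
  The 3-byte patterns are just the opcode followed by the address with
its low byte first. A minimal Python sketch of building and running
that search on a memory image follows; reference_patterns,
find_callers, and the fragment loaded at $6000 are invented for the
example.

```python
def reference_patterns(target: int) -> dict:
    """Byte patterns for jsr and jmp to target (address stored low byte first)."""
    low, high = target & 0xFF, target >> 8
    return {"jsr": bytes([0x20, low, high]),
            "jmp": bytes([0x4C, low, high])}


def find_callers(memory: bytes, base: int, target: int) -> list:
    """Return sorted (address, mnemonic) pairs for each direct reference."""
    callers = []
    for name, pattern in reference_patterns(target).items():
        pos = memory.find(pattern)
        while pos != -1:
            callers.append((base + pos, name))
            pos = memory.find(pattern, pos + 1)
    return sorted(callers)


# Hypothetical fragment at $6000: lda #$00 / jsr $4123 / rts / jmp $4123
image = bytes.fromhex("a900" "202341" "60" "4c2341")
callers = find_callers(image, 0x6000, 0x4123)
print([(hex(addr), name) for addr, name in callers])
```

This only catches direct references. A jump made through a pointer or
through the stack would not match either pattern.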

  A number of games still do us the favor of putting up a screen,
perhaps playing a little music, and then waiting for the space bar or
other key to be pressed.  If it's not possible to find the screen setup,
we still have a fairly obvious "hook" into finding the starting address,
and in many cases the game can be saved 'as is' by using the keyboard
routine as the starting address. Don't worry for now about exactly how
we will "save the game". We'll go through that carefully and thoroughly
in the next episode.

  Since the keyboard address is C000, we can usually locate all the
inputs by searching for the 3-byte sequence of 'AD 00 C0' with the
Inspector. Occasionally, the x or y register is used to load keyboard
data, so the sequences AC 00 C0 and AE 00 C0 should be tried if the
first comes up blank (only the real bastards like Sirius use ldy #$67;
lda $bf99,y for the keyboard input).  Also, keep in mind that all the
addresses from C000 to C00F will access the keyboard, and if someone was
really determined to confuse you they could use C007 one time, C00D the
next, and so on.  If you know that the game uses the keyboard and the
preliminary searches don't show how, keep on looking for these
addresses, or the Sirius-type computed addresses.  It probably means
they have something to hide, and locating the keyboard read will reveal
enough to make the search worthwhile.
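
  Both kinds of keyboard search can be sketched in Python against a
memory image. The first function covers lda, ldy, and ldx with any
address from C000 to C00F. The second covers only the simplest
computed form, an ldy #imm immediately followed by lda abs,y, which is
an assumption: a real program may set the index register far from the
load, and then nothing short of reading the code will find it. All
names and the sample bytes are hypothetical.

```python
KEYBOARD = range(0xC000, 0xC010)
DIRECT_LOADS = {0xAD: "lda", 0xAC: "ldy", 0xAE: "ldx"}


def direct_keyboard_reads(memory: bytes, base: int = 0) -> list:
    """Find absolute loads whose operand is any keyboard address."""
    hits = []
    for i in range(len(memory) - 2):
        address = memory[i + 1] | memory[i + 2] << 8
        if memory[i] in DIRECT_LOADS and address in KEYBOARD:
            hits.append((base + i, DIRECT_LOADS[memory[i]], address))
    return hits


def computed_keyboard_reads(memory: bytes, base: int = 0) -> list:
    """Find ldy #imm (A0 nn) directly followed by lda abs,y (B9 lo hi)
    where base address + Y lands on a keyboard address."""
    hits = []
    for i in range(len(memory) - 4):
        if memory[i] == 0xA0 and memory[i + 2] == 0xB9:
            table = memory[i + 3] | memory[i + 4] << 8
            effective = (table + memory[i + 1]) & 0xFFFF
            if effective in KEYBOARD:
                hits.append((base + i, effective))
    return hits


# Hypothetical bytes: lda $c007, then ldy #$67 / lda $bf99,y
sample = bytes.fromhex("ad07c0" "a067" "b999bf")
print(direct_keyboard_reads(sample, 0x6000))
print(computed_keyboard_reads(sample, 0x6000))
```

The second search works because $bf99 + $67 = $c000, the same
arithmetic the Sirius example relies on.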

  If the program is waiting for the space bar, you will usually find a
sequence like:

  78e0: lda $c000   ;read the keyboard
        bpl $78e0   ;no key pressed
        sta $c010   ;reset kbd strobe
       *cmp #$a0    ;was it space?
       *bne $78e0   ;nope, keep trying
        jmp $6012   ;yes, go to start

*These two lines are eliminated if pressing any key will start the game.

  To check out 6012 as a starting address, set up to view the hi-res
screen (otherwise the game might be running while you watch a blank text
screen) with: C050 (cr) C057 (cr), then type 6012G. As before, you will
know at once if you were successful.

  Another way to find a restart point is to search through the keyboard
input routines for a restart key.  It has become conventional to use
ctrl-r as the restart command (occasionally ctrl-s or ctrl-b), and this
is even easier to trace. In one of the routines following a C000
reference, you will find a cmp #$92 (see the reference manual, p. 7 for
the hex values of the keyboard). The location branched to or jumped to
by a successful compare will be the restart for the game. Again, you can
save the game as is and use your new-found starting location.
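
  If the reference manual isn't handy, the compare values are easy to
work out: a control character keeps the low five bits of the letter,
and the keyboard always returns the byte with the high bit set. A
small Python sketch (the function names are made up for illustration):

```python
def apple_key_code(ch: str, ctrl: bool = False) -> int:
    """Value read from $C000 for a key: ASCII with the high bit set.
    With ctrl, only the low five bits of the letter are kept."""
    code = ord(ch.upper())
    if ctrl:
        code &= 0x1F
    return code | 0x80


def cmp_pattern(ch: str, ctrl: bool = False) -> bytes:
    """Two-byte pattern for 'cmp #key' (C9 is cmp immediate)."""
    return bytes([0xC9, apple_key_code(ch, ctrl)])


# Search patterns for the usual restart keys ctrl-r, ctrl-s, ctrl-b.
restart_keys = {name: cmp_pattern(name, ctrl=True).hex() for name in "RSB"}
print(restart_keys)
print(hex(apple_key_code(" ")))
```

The resulting two-byte patterns can be fed to the same kind of search
used for the screen switches.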

  If these relatively simple approaches fail, you'll have to resort to
the real grunt type of detective work--looking for something promising
(we'll discuss boot-tracing as an alternative way of getting to this
point in another episode devoted entirely to that technique).  Likely
things to look for are "setups", where a lot of zero page locations are
initialized to begin the game:

           lda #$00
           sta $23
           sta $57
           lda #$12
           sta $30
           lda #$e9
           sta $72
           etc.
           etc

or, sometimes, a game start is indicated by a subroutine sequence which
maps out the path for the game (this is an indication of an experienced,
well-disciplined programmer, and thus is more commonly seen in business
or professional programs; rarely in game programming).

        jsr $8cd
        jsr $ce4
        jsr $2020
        jsr $203d
        jsr $8fe
        etc.

and, although it's less often the start of a program or game, a "jump
table" can be a significant clue to the organization of the program:

        jmp $204d
        jmp $2433
        jmp $ef2
        jmp $2077
        etc.

  Unfortunately, snooping for these is a time-consuming, hit-and-miss
operation--the real starting address can be anywhere from 0000 to BFFF
(or even via a basic subroutine in D000-F7FF, but I don't want to
discourage you yet).
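
  Some of the drudgery can be handed to a script when a memory image is
available. The Python sketch below flags runs of back-to-back jsr or
jmp instructions, the signature of the subroutine sequences and jump
tables described above. It is a rough heuristic with a hypothetical
name (find_runs) and an arbitrary threshold of three; data that happens
to contain the opcode every third byte will fool it.

```python
def find_runs(memory: bytes, opcode: int, min_run: int = 3, base: int = 0) -> list:
    """Return (address, count) for each run of at least min_run
    consecutive 3-byte instructions starting with opcode."""
    runs = []
    i = 0
    while i < len(memory):
        j = i
        while j + 2 < len(memory) and memory[j] == opcode:
            j += 3                      # skip opcode plus 2-byte address
        count = (j - i) // 3
        if count >= min_run:
            runs.append((base + i, count))
            i = j
        else:
            i += 1
    return runs


# The jump table from the text, preceded by an rts and followed by a nop,
# loaded at a hypothetical $2000.
sample = bytes.fromhex("60" "4c4d20" "4c3324" "4cf20e" "4c7720" "ea")
jump_tables = find_runs(sample, 0x4C, base=0x2000)
call_chains = find_runs(sample, 0x20, base=0x2000)
print([(hex(addr), count) for addr, count in jump_tables])
```

Passing $20 instead of $4C looks for jsr sequences. Either way the
output is only a list of places worth disassembling by hand.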

  While it will be disconcerting to the beginner, as you get more
experience you begin to enjoy defeating various deliberate attempts to
throw you off the trail--the general subject of obfuscation, or
intentional lack of clarity.  Because the major software companies know
we're out here waiting for their latest output, they often try to
misdirect us or find innovative ways of hiding sensitive portions of the
program with a variety of techniques. Take a look at the following piece
of code from On-Line's Cannonball Blitz:

59e4-   ce e7 59    dec   $59e7
59e7-   cf          ???
59e8-   ea          nop
59e9-   59 ef ea    eor   $eaef,y
59ec-   59 ad 51    eor   $51ad,y
59ef-   c0 ad       cpy   #$ad
59f1-   54          ???
59f2-   c0 ad       cpy   #$ad
59f4-   57          ???
59f5-   c0 ad       cpy   #$ad
59f7-   52          ???
59f8-   c0 20       cpy   #$20
59fa-   60          rts
59fb-   5b          ???
59fc-   20 c5 5b    jsr   $5bc5
59ff-   20 4e 5b    jsr   $5b4e

This is an example of "self-modifying code"--instructions that change as
the program is run.  It's dangerous and generally poor programming
practice, but it can be used to throw the dogs off the scent. At first
glance, it looks like data or garbage stuck in before some real code.
Let's look at exactly how it works. Executing the first instruction
changes the second instruction from junk into a legal instruction:

59e4-   ce e7 59    dec   $59e7
59e7-   ce ea 59    dec   $59ea
59ea-   ef          ???
59eb-   ea          nop
59ec-   59 ad 51    eor   $51ad,y
59ef-   c0 ad       cpy   #$ad

(if you have an old monitor rom, you can type 59E4S to execute the first
instruction).  If we execute the second instruction, the entire picture
changes:

59e4-   ce e7 59    dec   $59e7
59e7-   ce ea 59    dec   $59ea
59ea-   ee ea 59    inc   $59ea
59ed-   ad 51 c0    lda   $c051
59f0-   ad 54 c0    lda   $c054
59f3-   ad 57 c0    lda   $c057
59f6-   ad 52 c0    lda   $c052
59f9-   20 60 5b    jsr   $5b60
59fc-   20 c5 5b    jsr   $5bc5
59ff-   20 4e 5b    jsr   $5b4e
5a02-   a9 04       lda   #$04
5a04-   8d ec b7    sta   $b7ec
5a07-   a9 00       lda   #$00
5a09-   8d eb b7    sta   $b7eb
5a0c-   a9 00       lda   #$00
5a0e-   8d f0 b7    sta   $b7f0
5a11-   a9 60       lda   #$60
5a13-   8d f1 b7    sta   $b7f1
5a16-   a9 40       lda   #$40
5a18-   20 45 5a    jsr   $5a45
5a1b-   10 01       bpl   $5a1e
5a1d-   a9 20       lda   #$20
5a1f-   91 5a       sta   ($5a),y
5a21-   ad 50 c0    lda   $c050
5a24-   a9 09       lda   #$09

Suddenly, the screen setup code that was always there pops into view.
This points out the value of searching with the Inspector, since even
the closest scrutiny would probably not have made you suspect what was
actually here. Notice, too, that the third instruction increments 59EA,
so once it's been run, it's obscured again.
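
  The three steps can be replayed on the bytes themselves. The Python
sketch below is a toy that understands only the two opcodes involved
here (CE, dec absolute, and EE, inc absolute); it is not a 6502
emulator, and the names BASE and step are made up for the example. The
bytes are the ones in the first listing, $59E4 through $59FB.

```python
BASE = 0x59E4
# Bytes from the first listing above, $59E4 through $59FB.
code = bytearray.fromhex(
    "cee759" "cfea59" "efea59" "ad51c0" "ad54c0" "ad57c0" "ad52c0" "20605b")


def step(mem: bytearray, pc: int) -> int:
    """Execute one dec/inc absolute instruction at pc; return the next pc."""
    opcode = mem[pc - BASE]
    target = mem[pc - BASE + 1] | mem[pc - BASE + 2] << 8
    if opcode == 0xCE:        # dec absolute
        delta = -1
    elif opcode == 0xEE:      # inc absolute
        delta = 1
    else:
        raise ValueError("opcode %02x not handled by this sketch" % opcode)
    mem[target - BASE] = (mem[target - BASE] + delta) & 0xFF
    return pc + 3


pc = step(code, 0x59E4)       # dec $59e7: cf becomes ce
pc = step(code, pc)           # dec $59ea: ef becomes ee
revealed = bytes(code)        # the moment the screen setup is readable
pc = step(code, pc)           # inc $59ea: ee goes back to ef
print(revealed.hex(), hex(pc))
```

Note that the lda bytes from $59ED on never change. Only the alignment
of the disassembly around them does, which is why a byte search finds
them when the eye does not.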

  Another standard trick, also shown in this example, is called "false
disassembly", and is dear to Edu-Ware, On-Line, IDSI, and Scientific
Research Associates.  Here, extra bytes are added for the sole purpose
of giving a false indication of program flow; the fake bytes are then
branched around. Look closely at the instruction in 5A1B--it says bpl
5A1E.  The next instructions in sequence appear to the casual eye to be
lda #$20; sta ($5a),y. Actually, the next instruction is jsr $5A91. This
is crucial, since this subroutine loads in the game and does a nibble
count.  To see a whole bunch of false disassemblies in a row, look at
the code in the actual subroutine:

5a91-   a9 00       lda   #$00
5a93-   10 01       bpl   $5a96
5a95-   20 a8 59    jsr   $59a8
5a98-   00          brk
5a99-   27          ???
5a9a-   c8          iny
5a9b-   d0 fa       bne   $5a97
5a9d-   85 10       sta   $10
5a9f-   f0 01       beq   $5aa2
5aa1-   a9 a9       lda   #$a9
5aa3-   20 59 00    jsr   $0059
5aa6-   27          ???
5aa7-   c8          iny
5aa8-   c8          iny
5aa9-   d0 f9       bne   $5aa4
5aab-   85 11       sta   $11
5aad-   49 b7       eor   #$b7
5aaf-   48          pha
5ab0-   a5 10       lda   $10
5ab2-   49 11       eor   #$11
5ab4-   48          pha
5ab5-   d0 01       bne   $5ab8
5ab7-   4c 60 08    jmp   $0860
5aba-   60          rts
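
  Before tackling that listing by hand, the smaller case at 5A1B can be
checked with a few lines of Python. The sketch decodes the same nine
bytes twice: once straight through, as the monitor's disassembler does,
and once starting at the branch target. It assumes the bpl is always
taken, as the text implies, and its length table covers only the
opcodes in this fragment; BASE, CODE, LENGTHS, and sweep are names
invented for the example.

```python
BASE = 0x5A1B
CODE = bytes.fromhex("1001" "a9" "20915a" "ad50c0")   # $5A1B through $5A23

# Instruction lengths for just the opcodes that appear in this fragment.
LENGTHS = {0x10: 2,   # bpl rel
           0xA9: 2,   # lda #imm
           0x20: 3,   # jsr abs
           0x91: 2,   # sta (zp),y
           0xAD: 3}   # lda abs


def sweep(start: int) -> list:
    """Decode straight ahead from start; return (address, hex bytes) pairs."""
    out = []
    pc = start
    while pc - BASE < len(CODE):
        size = LENGTHS[CODE[pc - BASE]]
        out.append((pc, CODE[pc - BASE:pc - BASE + size].hex()))
        pc += size
    return out


def branch_target(pc: int) -> int:
    """Target of the relative branch located at pc."""
    operand = CODE[pc - BASE + 1]
    offset = operand - 0x100 if operand >= 0x80 else operand
    return pc + 2 + offset


naive = sweep(0x5A1B)                   # what the 'l' command shows
real = sweep(branch_target(0x5A1B))     # what the processor executes next
print([(hex(a), b) for a, b in naive])
print([(hex(a), b) for a, b in real])
```

The two decodings fall back into step at 5A21. A single junk byte only
hides the instruction or two that follow it, which is why the fakes
have to be sprinkled so liberally.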

I strongly urge you to sit down and figure out exactly what the real
program is here, and if possible, what it does.  Cover up the
explanation below, and go through the code byte by byte to eliminate the
fake bytes. It's not just character-building--if you go through a few of
these, you'll learn to recognize them when they pop up.

  Those of you who really went through it, give yourselves four kracking
honor points. For the rest of you, here's a listing of the functional
equivalent (some addresses are changed because the junk bytes have been
taken out):

5a91-   a9 00       lda   #$00
5a93-   a8          tay
5a94-   59 00 27    eor   $2700,y
5a97-   c8          iny
5a98-   d0 fa       bne   $5a94
5a9a-   85 10       sta   $10
5a9c-   a9 20       lda   #$20
5a9e-   59 00 27    eor   $2700,y
5aa1-   c8          iny
5aa2-   c8          iny
5aa3-   d0 f9       bne   $5a9e
5aa5-   85 11       sta   $11
5aa7-   49 b7       eor   #$b7
5aa9-   48          pha
5aaa-   a5 10       lda   $10
5aac-   49 11       eor   #$11
5aae-   48          pha
5aaf-   60          rts

This is also valuable because it introduces the concept of "jumping
through the stack".  The rts instruction pulls the two bytes above the
stack pointer in page one into the program counter (low byte first, then
high byte), adds one to the resulting address, and jumps to that
location. Ordinarily, the bytes on the stack were placed there as a
return address by the jsr instruction.  In this case, in very roundabout
fashion, the on-liners have pushed two bytes on the stack and executed
an rts, which jumps to the location one higher than the value stored.
The story of the subroutine goes like
this: create a checksum by exclusive-oring together all the bytes from
2700 to 27FF, and store it in $10. This allows a check to see if any of
the bytes in the nibble count routine were altered.  Do a second
checksum on every other byte from 2700 to 27FF, starting with a value of
#$20.  Store this in $11, then exclusive-or it with #$B7 to produce the
high byte of the return address: $26. Push this on the stack,
exclusive-or the first checksum with #$11 to produce the return low byte
of $FF, and push that as well (rts pulls the low byte first, so the high
byte has to go on the stack first). Then do the rts to jump to
$26FF + 1 = 2700. When you look at 2700, you find this:

2700-   ce 03 27    dec   $2703
2703-   ef          ???
2704-   03          ???
2705-   27          ???
2706-   ad 24 27    lda   $2724
2709-   49 8a       eor   #$8a
270b-   d0 01       bne   $270e
270d-   20 8d 24    jsr   $248d
2710-   27          ???
2711-   d0 01       bne   $2714
2713-   4c a0 25    jmp   $25a0
2716-   98          tya
2717-   59 00 27    eor   $2700,y
271a-   99 00 27    sta   $2700,y
271d-   c8          iny
271e-   d0 f6       bne   $2716

(you see, now that we're familiar with this kind of trick, there's
nothing to decoding that mess, is there?)
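
  The checksum-and-rts jump described earlier can be modelled in a few
lines of Python. This is a sketch under loud assumptions: the page
contents are NOT the game's real bytes at 2700 (they are made up so
that the two checksums come out to $EE and $91, the values needed to
land on 2700), and the function names are hypothetical. What it does
show faithfully is the arithmetic: the byte pushed first becomes the
high byte, the byte pushed last becomes the low byte, and rts adds one.

```python
def xor_checksum(page: bytes, start: int = 0, step: int = 1) -> int:
    """Exclusive-or together page[0], page[step], ... beginning from start."""
    acc = start
    for offset in range(0, 256, step):
        acc ^= page[offset]
    return acc


def rts_target(pushed: list) -> int:
    """pushed is in push order; rts pulls the last byte pushed as the
    low byte, then the high byte, and adds one to the address."""
    stack = list(pushed)
    low = stack.pop()
    high = stack.pop()
    return ((high << 8 | low) + 1) & 0xFFFF


def protected_jump(page: bytes) -> int:
    first = xor_checksum(page)                        # kept in $10
    second = xor_checksum(page, start=0x20, step=2)   # kept in $11
    return rts_target([second ^ 0xB7, first ^ 0x11])


# Hypothetical page contents chosen to give checksums $EE and $91.
page = bytearray(256)
page[0], page[1] = 0xB1, 0x5F
intact = protected_jump(bytes(page))

page[0x40] ^= 0x01                      # simulate patching a single byte
tampered = protected_jump(bytes(page))
print(hex(intact), hex(tampered))
```

Changing any byte of the page changes at least one checksum, so the rts
goes somewhere other than 2700. That is the whole point of the
arrangement: patch the nibble count and the program jumps into the
weeds.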

  Stay tuned for next week, when we finish this subject by answering
the burning question "what is the window-shade technique?", and proceed
to a discussion of memory moving and file saving.
This file was brought to you by Christer Ericson. (christer@cs.umu.se)