Early Copy Protection on the Apple II

For many years, a war has been waged between those who publish software and those who don't want to pay for it (or want to profit by selling illegal copies). In recent years the war has expanded to video tapes, DVDs, and audio CDs. Nobody believes that a copy-proof format is possible, but many publishers believe that they can increase their sales by discouraging illegal copies.

The copy protection techniques employed on Apple II 5.25" diskettes are the stuff of legend. The drive was completely under software control, requiring timing-critical loops to read and write data, so there were few limitations on what the copy protectors could do. A scheme called Spiradisc, mentioned in Steven Levy's book _Hackers_, wrote tracks in a spiral pattern. The disk was essentially impossible to copy directly, though that couldn't stop someone from cracking it.

Before the Apple II had floppy drives, however, it had an audio cassette interface for storing programs and data. This was a very primitive system, requiring you to hook up a cassette recorder to your computer and fiddle with the volume knob until things started working. To read data from tape, you specified a range of memory to fill, and hit the "play" button on your tape recorder. If all went well, the computer cheerfully beeped at you and off you went. Loading BASIC programs was even easier, because the start location was pre-determined, and the length was stored on the tape. All you had to do was type "LOAD".

I recently found myself extracting software from cassette tapes purchased on eBay. At the start of the project, I thought to myself, "it's awkward to get at the data, but at least there's no copy protection." As it turns out, I was wrong.

Possibilities

You read a set of bytes from tape. You can write them out to a new tape the same way. How can you stop somebody from copying your software?

Before we ponder that, consider something even simpler. Why not just hook up two tape recorders and copy the tape directly? This approach is foiled by the same issue that kept the music industry cozy and warm for many years: after a couple of generations, the quality degrades to the point where the data is no longer readable. The only way to avoid this problem is to write a new copy of the cassette data from the computer, creating a new analog copy from a digital master.

So how do we prevent the user from just reading and writing bytes? One possible approach is to use a two-stage loader. To load binary data from an Apple II cassette you need to know its length in bytes. If we write a short program whose purpose is to load the main part of the application, we never have to tell the user the length of the main part. This seems like a small victory, and it is; it's pretty easy to come up with a way to load the data without knowing the length (e.g. try to read more data than is on the tape, see where it stops looking reasonable, reduce the length and iterate until the checksum is valid). As with all copy protection, however, the goal isn't to defeat a skilled and determined attacker, but rather to discourage the casual user. Besides, if we have our own loader, there are other tricks we can play.
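Here's a rough Python sketch of that length-guessing approach. It assumes the capture simply continues past the real checksum into whatever junk follows, and that the checksum is the one the monitor uses: every data byte exclusive-ORed together, starting from $FF. The function names are made up for illustration.

```python
def cassette_checksum(data: bytes) -> int:
    """Monitor-style checksum: XOR of every data byte, starting from $FF."""
    check = 0xFF
    for b in data:
        check ^= b
    return check


def guess_length(captured: bytes):
    """Hypothetical helper: find the largest n for which captured[n] is a
    valid checksum of captured[:n]. Returns None if nothing matches."""
    for n in range(len(captured) - 1, 0, -1):
        if cassette_checksum(captured[:n]) == captured[n]:
            return n
    return None


if __name__ == "__main__":
    data = b"HELLO APPLE"
    # The real data, its checksum, then junk read past the end of the tape.
    captured = data + bytes([cassette_checksum(data)]) + b"\x5a\x13\x37"
    print(guess_length(captured))  # 11
```

A chance match is possible at any candidate length, and one is guaranteed in a particular case: if the byte right after the real checksum happens to be $00, the data plus its checksum looks like a valid block one byte longer. That's why the "see where it stops looking reasonable" step matters.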

We're going to take a detailed look at a trio of Apple II games that sought to confound the pirates. The programs are arranged in order of increasing complexity of protection. Before we can do that, however, we need to understand a few things about how an Apple II works.

Apple II Innards

The original Apple II used a 6502 processor running at 1MHz. Not a real speed demon by today's standards, but pretty good at the time. It could access 64KB of memory, of which up to 48K was RAM. The upper 16K was ROM, memory-mapped I/O locations, and firmware for installed peripheral cards. (Expansion cards, such as the Apple Language Card, provided additional RAM through bank-switching.) Some of the locations in RAM had specific purposes, defined by the currently-running version of BASIC, or by the system monitor.

The 6502 has three general-purpose 8-bit registers: the accumulator (A) and two index registers (X and Y). The instruction set doesn't even pretend to be orthogonal. Because there are so few registers, and no way to hold a 16-bit address in them, the 6502 includes some "zero page" address modes. These allow indirect access to 16-bit addresses, and most zero-page operations are faster than their more general counterparts.

The Apple II convention for displaying hexadecimal numbers is '$', so instead of "0x2000" or "2000h" we write "$2000". With that in mind, here are some 6502 instruction examples:

LDA #$7A - load the hexadecimal value 7A into the A register.
STX $36 - store the value held in the X register into zero-page address $0036.
LDA $2000,X - load the A register with the value held at address $2000 + X. That is, if X is $15, it would load the value at $2015 into A.
STA ($36),Y - get the little-endian 16-bit address from locations $0036 and $0037, add Y to it, and store the value of the accumulator there. If $36-37 hold $2000, and Y is $10, then the A register will be stored in address $2010.
JSR $FDED - jump to a subroutine at address $FDED. When the code there finishes, it issues an RTS instruction, and control returns to the instruction following the JSR.
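To make the index arithmetic concrete, here's a small Python model of how those addressing modes compute an effective address. It's a sketch, not an emulator. The one subtlety it captures is that the NMOS 6502 fetches a zero-page pointer's high byte from (zp + 1) AND $FF, so the pointer never spills out of page zero. The ($zp,X) form, which adds X before fetching the pointer, shows up in the Microchess loader later on.

```python
def absolute_x(base: int, x: int) -> int:
    """LDA $2000,X: the address is base + X, wrapping at 64K."""
    return (base + x) & 0xFFFF


def zp_pointer(mem: bytearray, zp: int) -> int:
    """Little-endian pointer stored in zero page; the high-byte fetch
    wraps within page zero."""
    return mem[zp & 0xFF] | (mem[(zp + 1) & 0xFF] << 8)


def indirect_y(mem: bytearray, zp: int, y: int) -> int:
    """STA ($36),Y: fetch the pointer at $36-37, then add Y."""
    return (zp_pointer(mem, zp) + y) & 0xFFFF


def indexed_indirect(mem: bytearray, zp: int, x: int) -> int:
    """LDA ($02,X): add X to the zero-page address, then fetch the pointer."""
    return zp_pointer(mem, zp + x)


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x36], mem[0x37] = 0x00, 0x20      # $36-37 hold $2000
    print(hex(absolute_x(0x2000, 0x15)))    # 0x2015
    print(hex(indirect_y(mem, 0x36, 0x10)))  # 0x2010
```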

The "monitor" was part of the "F8 ROM", which occupied the last 2K of the address space ($F800 to $FFFF). It provided a way to enter, disassemble, and execute code. On early Apple II models, hitting the reset key would leave you in the monitor. Later models with the "autostart ROM" would leave you in BASIC, but you could access the monitor with "CALL -151".

The memory layout was usually discussed in terms of "pages". Each "page" was 256 bytes long. The 6502 doesn't have a page-oriented architecture, but this provided a convenient way to talk about sections of memory. The first few pages looked like this:

Page 0: a/k/a "zero page". The monitor, BASIC, DOS 3.3, and ProDOS all staked out territory here, so applications needed to avoid touching certain locations.
Page 1: the CPU stack lives here, starting at $01FF and moving downward.
Page 2: keyboard input buffer, by convention. Anything placed here gets partially overwritten as soon as the user regains control of the keyboard.
Page 3: mostly free space, but later versions of the Apple II firmware put vectors here for software breakpoints and the reset key.
Pages 4-7: text page 1. The text on the screen is stored here. Sort of a text frame buffer.

After that comes a second text page, some open space, and then two "hi-res" graphics frame buffers.

One trick employed by cassette publishers, which wasn't copy protection so much as an attempt to do something cool, was to start loading the program at address $0200 (page two). The act of loading the tape would place commands in the keyboard input buffer so that, when loading completed, the program would start automatically, just as if the user had typed them. The program typically started at $0800, which meant the tape would also load data onto the text page, allowing a "please wait" start-up banner.

Cassette tapes have a 10-second 770Hz lead-in, followed by cycles at 1kHz or 2kHz, representing '1's and '0's, respectively. For a program of moderate size this means an average speed of about 1200bps, or about 1000 times slower than a 1x CD-ROM drive. The 6502 code for the cassette read/write routines, including subroutines shared with other code, only requires about 180 bytes of space in ROM.
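A back-of-the-envelope Python model shows where a figure like that comes from. It assumes each bit is one full cycle at the frequencies above (500 microseconds for a '0', 1000 for a '1') plus the 10-second lead-in, and it ignores the short sync bit and the trailing checksum byte.

```python
LEAD_IN_US = 10_000_000  # 10-second 770Hz lead-in tone
ZERO_US = 500            # one full 2kHz cycle encodes a '0'
ONE_US = 1000            # one full 1kHz cycle encodes a '1'


def tape_time_us(data: bytes) -> int:
    """Approximate playback time in microseconds (no sync bit or checksum)."""
    total = LEAD_IN_US
    for byte in data:
        ones = bin(byte).count("1")
        total += ones * ONE_US + (8 - ones) * ZERO_US
    return total


def data_rate_bps(data: bytes) -> float:
    """Bits per second for the data portion only (lead-in excluded)."""
    data_us = tape_time_us(data) - LEAD_IN_US
    return len(data) * 8 * 1_000_000 / data_us


if __name__ == "__main__":
    # $0F has four '1' bits and four '0' bits: an even mix.
    print(round(data_rate_bps(bytes([0x0F]) * 1024), 1))  # 1333.3
```

With an even mix of bits the data itself moves at about 1333 bits per second; all '0's would reach 2000 and all '1's only 1000. The fixed lead-in drags the overall rate lower, especially for short programs.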

With this in mind, let's explore the first of the three programs.

Personal Software - Microchess 2.0

Microchess was a very early chess program, released in 1978. It used the Apple II's high-resolution graphics screen (280x192, six colors) to display the chess board. The complete game fit in 7.5K of RAM, making it easy to load from tape. Running it in a 16K machine alongside an 8K graphics frame buffer was a bit of a squeeze, but they managed it.

The game's instructions provide the following system monitor command to load the game:

2000.2200R 2000G

This means, "load memory locations $2000 through $2200 (inclusive) with data from the tape, then start executing the code at $2000." The manual further says to leave the tape running until the graphical chess board appears. The code at $2000, then, is our stage 1 loader. It starts off pretty simply:

2000-   20 84 FE    JSR   $FE84    F8ROM:SETNORM
2003-   20 2F FB    JSR   $FB2F    F8ROM:INIT
2006-   20 93 FE    JSR   $FE93    F8ROM:SETVID
2009-   D8          CLD
200A-   20 58 FC    JSR   $FC58    F8ROM:HOME
200D-   A2 FF       LDX   #$FF
[...]

The code above calls some F8 ROM routines to perform basic system initializations, then the next part (removed for brevity -- it's long and not very interesting) prints a "game starts in two minutes" message with a short delay, and puts address $0200 into zero page memory location $02-03. Then things start to get interesting:

204E-   A9 02       LDA   #$02
2050-   85 3D       STA   $3D
2052-   A9 20       LDA   #$20
2054-   85 3F       STA   $3F
2056-   A9 00       LDA   #$00
2058-   85 3C       STA   $3C
205A-   85 3E       STA   $3E
205C-   EA          NOP
205D-   EA          NOP
205E-   EA          NOP
205F-   EA          NOP
2060-   EA          NOP
2061-   EA          NOP
2062-   EA          NOP
2063-   EA          NOP
2064-   20 58 21    JSR   $2158
2067-   20 4B 21    JSR   $214B
206A-   20 3A FF    JSR   $FF3A    F8ROM:BELL
206D-   20 FD FE    JSR   $FEFD    F8ROM:READ

The values stuffed into $3C-3D and $3E-3F define the start and end of a range used in a system monitor command. While a command like "read from tape" is being executed, the address at $3C is incremented until it becomes equal to the address in $3E. The first seven lines of the code above are therefore equivalent to typing "200.2000". The "NOP"s are "no operation" statements, meaning they do nothing but eat a couple of cycles. (Most likely there was some other code in there before the software shipped.)

The bottom part of the code calls $2158, which erases RAM from $4000 up, including hi-res page 2, and then calls $214B, which turns on display of hi-res page 2. This leaves the user staring at a blank screen, or (on systems with only 16K of RAM) a semi-random pattern. It emits a "beep" via the BELL routine and then calls the monitor cassette read function. Showing a blank graphics page is nice because the tape overwrites text page 1 with executable code, which isn't much fun to look at. Continuing:

2070-   AD 80 04    LDA   $0480
2073-   C9 C5       CMP   #$C5
2075-   D0 03       BNE   $207A
2077-   4C 59 FF    JMP   $FF59    F8ROM:OLDRST

The above code checks the return value from the tape read function. The tape read function, unfortunately, doesn't actually return anything -- it just prints "ERR" to the screen if something goes wrong. So, the code checks to see if the letter 'E' appears at a certain location on the text page. If so, it gives up and jumps into the monitor.

At this point, the code does something slightly odd:

207A-   20 4B 21    JSR   $214B
207D-   A2 00       LDX   #$00
207F-   A1 02       LDA   ($02,X)
2081-   49 A5       EOR   #$A5
2083-   81 02       STA   ($02,X)
2085-   E6 02       INC   $02
2087-   D0 02       BNE   $208B
2089-   E6 03       INC   $03
208B-   A5 03       LDA   $03
208D-   C9 20       CMP   #$20
208F-   D0 EC       BNE   $207D
2091-   4C 00 06    JMP   $0600

It calls $214B a second time, which is unnecessary, since we're already looking at hi-res page 2. It then loads a byte from the address held in $02-03 (which was initialized to $0200 earlier), performs an exclusive-OR with the constant value $A5, and puts it back. This is repeated for every byte from $0200 to $1FFF, after which the game is executed with a jump to location $0600. This is a fairly common trick, used to disguise sections of code or data. If you exclusive-OR a byte with a non-zero value, you get a new value. If you exclusive-OR it with the same value a second time, you get the original byte back.

The code stored on the tape is exclusive-ORed so that, if you try to load the second stage directly, you'll end up with what appears to be unreadable junk. If you want to copy the tape, you have to figure out how it's encoded, or you have to copy both stages.
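The encoding is easy to model. This Python sketch mirrors the decode loop at $207D: exclusive-OR every byte from $0200 up to (but not including) $2000 with $A5. Applying it once produces what's stored on tape; applying it again gets the original back. The sample bytes are only an illustration, not the actual Microchess code at $0600.

```python
def xor_region(mem: bytearray, start: int, end: int, key: int) -> None:
    """XOR every byte in [start, end) with key, in place."""
    for addr in range(start, end):
        mem[addr] ^= key


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x0600:0x0603] = bytes([0x20, 0x58, 0xFC])  # hypothetical sample: JSR $FC58
    xor_region(mem, 0x0200, 0x2000, 0xA5)  # as stored on tape
    print(mem[0x0600:0x0603].hex())        # 85fd59
    xor_region(mem, 0x0200, 0x2000, 0xA5)  # as decoded by the loader
    print(mem[0x0600:0x0603].hex())        # 2058fc
```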

Copying this to a new tape or adapting it for use on a disk-based system is straightforward. The easiest way to make a copy is to simply copy both stages to a new tape, without modifying either. For a disk-based system, decode part 2, and add a simple memory-move function.

Softape - Module 6

A company called Softape published a large number of programs for the Apple II on cassette. In 1978, they published "Module 6", part of a series of games. This particular one was an Integer BASIC implementation of the card game "Blackjack".

The game didn't use a two-stage loader, but it did have slightly peculiar instructions for loading from tape:

30.3FFFR

At first glance, this seems like it must be shorthand for "read from $3000 to $3FFF", or perhaps a typographical error. In fact, the program really does start loading on page zero, and continues through the system stack, input buffer, text page, and so on. Note there is no "xxxxG" command here, which means the software uses some other means to start itself running.

Why is this copy protection? It should be easy to simply load the program at a different address (e.g. 1030.4FFFR) and save it to a new tape. The problem with this approach is that most Apple II systems being sold in 1978 had at most 16K of RAM. Anything more than that was a luxury. The tape was designed to completely fill RAM on most systems. Just as CD-ROMs and audio CDs went from "pretty secure" to "wide open" as hard drive capacity and Internet bandwidth increased, this scheme became worthless once larger configurations became common.

If you can fit a hi-res chess program in 7.5K, though, why does it take nearly 16K for a text Blackjack game? It doesn't. Much of the data on the tape -- more than half, as it turns out -- is either temporary "splash screen" stuff or is filler code from other programs, inserted to prevent us from paring the code down to the core. The real program is in the last part of the tape.

So, where do we start? We know that, once the tape read function finishes, it will return to the system monitor command line. However, the tape has overwritten the system stack, so the return address is no longer there. Because of the way the tape read function works, this doesn't actually interfere with loading data from tape, but where does it go when it's done?

Looking at a hex dump of the stack area, we find:

00000100: 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
00000110: 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
00000120: 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03 03  ................
[...]

This means the 16-bit value pulled off of the stack will be $0303, no matter what our stack pointer happened to be. This translates to a return address of $0304, because the 6502 RTS instruction adds one to the value pulled off of the stack. Since we also loaded that chunk of memory from the tape, we can find it in the hex dump. The results are somewhat baffling:

0304-   FF          ???
0305-   FF          ???
0306-   FF          ???
0307-   FF          ???

There's no code there. Eventually, after falling through lots of nonsense, we hit a software break instruction (BRK). This seems like an accident. When we hit a software break, we go through the monitor's handler at $FA4C, which decides it's a software interrupt and jumps through a vector at $3F0. The data at $3F0 is $FFFF, but if we start executing there we wrap around to $0000 -- which was not loaded from the tape -- and most likely hit a BRK before long, which leaves us in an infinite loop.

What did we miss? Well, a copy protection scheme that assumes a 16K machine might also make other assumptions. In this case, it's assuming the old version of the monitor ROM, which did not go through a vector at $3F0. Instead, it just dumps us into the monitor at $FF65. So we avoid the infinite loop, but we're stopped in the monitor. Now what?

The key here is a popular F8 ROM routine called "COUT", at $FDED. A program that wanted to put text on the screen could either stuff the values into memory directly, or load a character into the accumulator and "JSR $FDED" to print it. COUT performs an indirect jump through the little-endian 16-bit address at locations $36-37, which by default points to $FDF0 ("COUT1"). COUT1 uses some other zero-page values to determine the proper location to output the character. Every time you call COUT1, the horizontal position advances by one, until you hit the right edge and it wraps around to a new line.

The tape started loading at $0030. The first 16 bytes look like this:

00000030: ff 00 ff aa 13 28 40 08 00 08 00 40 3c 00 00 40  ...*.(@....@<..@

The COUT vector at $36 has been set to $0840, so when the BRK handler outputs a character, control transfers there. It's also worth pointing out that the value at $3C-3D is $003C, which is important because that holds the address where data is being loaded from tape. Setting it to $003C means the data loads from $30 to $3FFF without skipping around.
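Since these hooks are just little-endian words sitting in the loaded data, you can read them straight off the dump. A minimal Python sketch, using the 16 bytes shown above and the monitor's names for the locations (CSW at $36, KSW at $38, A1 at $3C):

```python
# The 16 bytes loaded at $0030, copied from the dump above.
DUMP_AT_30 = bytes.fromhex("ff00ffaa13284008000800403c000040")

# Zero-page words the monitor relies on, by absolute address.
VECTORS = {"CSW": 0x36, "KSW": 0x38, "A1": 0x3C}


def decode_vectors(dump: bytes, base: int = 0x30) -> dict:
    """Return each vector's 16-bit value, decoded little-endian."""
    out = {}
    for name, addr in VECTORS.items():
        i = addr - base
        out[name] = dump[i] | (dump[i + 1] << 8)
    return out


if __name__ == "__main__":
    for name, value in decode_vectors(DUMP_AT_30).items():
        print(f"{name} = ${value:04X}")
```

This prints $0840 for the output hook, $0800 for the input hook, and $003C for A1. Because the byte destined for $3C is $3C itself, storing it leaves the load address unchanged, and the load carries on in sequence.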

Here's what happens at $0840:

0840-   A9 00       LDA   #$00
0842-   85 36       STA   $36
0844-   A9 09       LDA   #$09
0846-   85 37       STA   $37
0848-   00          BRK

Same thing all over again. The COUT vector is changed to point at $0900, and we do another software break. When we try to output a character, we end up here:

0900-   60          RTS

As mentioned earlier, RTS is "return from subroutine". The upshot is that, whenever somebody tries to output a character through COUT, the call just returns without doing anything: the monitor output has been suppressed. This seems to leave us in an awkward place, though, because once again we're left without an active thread of execution.

Something must happen, though, and when we return from our various shenanigans we find ourselves back in the warm embrace of the system monitor, which still wants to output some information about our last software break and then give us a command line to type on. It tries to input a character by calling the monitor RDKEY1 function. RDKEY1 works much the same way that COUT does, doing an indirect jump through a zero-page vector, in this case $38-39.

Checking back to the hex dump of $0030, we see that address $38 holds $0800, which is the next stage in the process:

0800-   A9 8E       LDA   #$8E
0802-   85 CA       STA   $CA
0804-   A9 22       LDA   #$22
0806-   85 CB       STA   $CB
0808-   86 02       STX   $02
080A-   A9 30       LDA   #$30
080C-   85 00       STA   $00
080E-   A9 08       LDA   #$08
0810-   85 01       STA   $01
0812-   A2 00       LDX   #$00
0814-   A9 EA       LDA   #$EA
0816-   8D 0C 08    STA   $080C
0819-   8D 0D 08    STA   $080D
081C-   8D 10 08    STA   $0810
081F-   8D 11 08    STA   $0811
0822-   A1 00       LDA   ($00,X)
0824-   E6 00       INC   $00
0826-   D0 02       BNE   $082A
0828-   E6 01       INC   $01
082A-   A6 02       LDX   $02
082C-   60          RTS

It starts off nicely enough. Putting $228E into $CA-CB tells Integer BASIC where to find the start of the program. It then saves the X register in location $02, puts $0830 into address $00-01, and then does some self-modifying code that NOPs out the instructions at $080C and $0810. This prevents it from re-initializing $00-01 on subsequent calls. It then reads a byte from the indirect address at $00-01, increments $00-01, restores X, and returns with the byte we loaded in the accumulator.

This is a rather complicated way of making the system think that the user is typing the string of characters at $830. These, as it turns out, are:

00000830: 9b c0 83 8d d2 d5 ce 8d ff ff ff ff ff ff ff ff  .@..RUN.........

Translated, that's "<Esc> @ <Ctrl-C> <Return> R U N <Return>". Escape-@ clears the screen, Ctrl-C starts Integer BASIC, and "RUN" starts the BASIC program running. The first thing the BASIC program does is delete part of itself:

    0 DIM D(52),R$(41),Q$(9),S$(3),C$(3),F$(4),A$(10),DEBT(4),CASH(4),U(4),H(144),INSR(4): GOTO 32000
    0 REM   ************************
    1 REM   *      SOFTAPE *       * 
    2 REM   *  SOFTWARE EXCHANGE   *
    3 REM   *.===.===.===.===.===.===.===.=*
    4 REM   *     MODULE  #6       *
    5 REM   *                      *
    6 REM   *COPYRIGHT 1978-SOFTAPE*
    7 REM   *----------------------*
    8 REM   *  DUPLICATION OF THIS *
    9 REM   *PROGRAM OR ANY PORTION*
   10 REM   * THEREOF  CONSTITUTES *
   11 REM   *   INFRINGEMENT OF    *
   12 REM   *      COPYRIGHT       *
   13 REM   *.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*.*
   14 REM 
   15 POKE 204,63: POKE 205,10
   30 TEXT : CALL -936: VTAB 4: TAB 14: PRINT "S O F T A P E": TAB 4: PRINT "S O F T W A R E  E X C H A N G E": PRINT 
[...]
32000 DEL 0: IN# 0: PR# 0: GOTO 15

Yes, there are two different versions of line zero. The code in line 32000 deletes the first one, uses "IN#0:PR#0" to reset the vectors at $36-37 and $38-39 to their defaults, and then jumps to line 15 to start the program running. Without the "DIM" statements in the first line 0, the program will not execute if saved and reloaded, and somebody looking at the program after it has run once might have trouble figuring out why.

(Incidentally, the '.'s in lines 3 and 13 are actually Ctrl-Gs, invisible characters that make the speaker "beep" when you list the code. Because they take up no space on screen, the box drawn by the REM statements lines up neatly on an Apple II, even though those two lines look too long here.)
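Both the keystrokes at $0830 and those invisible Ctrl-Gs rely on the Apple II convention of storing characters with the high bit set. Here's a small Python sketch that turns such bytes back into readable keystrokes, naming control characters the way the text does. Stopping at the first $FF is a convenience of this sketch, not something the loader does.

```python
NAMED = {0x1B: "<Esc>", 0x0D: "<Return>"}


def keystrokes(data: bytes) -> str:
    """Decode high-bit ASCII; show other control characters as <Ctrl-X>."""
    keys = []
    for b in data:
        if b == 0xFF:      # filler after the string (this sketch's stop rule)
            break
        c = b & 0x7F       # strip the high bit
        if c in NAMED:
            keys.append(NAMED[c])
        elif c < 0x20:
            keys.append(f"<Ctrl-{chr(c + 0x40)}>")
        else:
            keys.append(chr(c))
    return " ".join(keys)


if __name__ == "__main__":
    at_0830 = bytes.fromhex("9bc0838dd2d5ce8dffffffffffffffff")
    print(keystrokes(at_0830))  # <Esc> @ <Ctrl-C> <Return> R U N <Return>
```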

In this case the copy protection was not only ineffectual, it was also a nuisance because it prevented the program from running on later machines. The implementation appears slightly flawed as well: there's an explicit BRK statement at address $0303, suggesting that the programmer expected execution to continue there with his stack full of 3s, but the 6502 jumps to address+1 when an RTS instruction is found.
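The off-by-one is easy to check with a tiny Python model of RTS: pull the low byte, then the high byte, from page one (the stack pointer wraps within the page), and add one. With every stack byte set to $03 the answer is $0304 whatever the stack pointer is. The sketch also models PHA, which is how Sargon II pushes a return address by hand later on.

```python
def push(mem: bytearray, sp: int, value: int) -> int:
    """PHA: store at $0100+SP, then decrement SP (wrapping in page one)."""
    mem[0x100 + sp] = value & 0xFF
    return (sp - 1) & 0xFF


def rts_target(mem: bytearray, sp: int) -> int:
    """Where RTS resumes: pull low byte, then high byte, then add one."""
    lo = mem[0x100 + ((sp + 1) & 0xFF)]
    hi = mem[0x100 + ((sp + 2) & 0xFF)]
    return (((hi << 8) | lo) + 1) & 0xFFFF


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x100:0x200] = bytes([0x03]) * 256  # Module 6's stack page
    print({hex(rts_target(mem, sp)) for sp in range(256)})  # {'0x304'}

    sp = push(mem, 0xFF, 0x03)  # LDA #$03 / PHA
    sp = push(mem, sp, 0x0E)    # LDA #$0E / PHA
    print(hex(rts_target(mem, sp)))  # 0x30f
```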

Transferring this to disk is easy, assuming a 48K machine: load the code at a higher address, substitute a memory move in place of the cassette load function, replace the BRK vector at $3F0 with something more reasonable, and launch it. Better yet, stop right before the program starts running, and just save it as a BASIC program.

On a 16K cassette-only system, though, you'd need to write a custom tape load routine to capture the image in two pieces. There's no other way to get at the last half of the program. For a brief period, this technique was reasonably effective.

Hayden - Sargon II

The Sargon chess program was one of the earliest developed for microcomputers. It was written in Z-80 assembly language by Dan and Kathe Spracklen, and won the all-microcomputer tournament at the 1978 West Coast Computer Faire. An Apple II version was developed by Gary Shannon, who published some early Apple II titles through Softape (e.g. Othello and Jupiter Express), and also happened to be Kathe Spracklen's brother. Subsequent releases were Sargon II (Hayden 1979), Sargon III (Hayden 1983), Sargon IV (Spinnaker 1989), and Sargon V (Activision 1991). A book detailing the way Sargon works, Sargon: A Computer Chess Program, can sometimes be found in used book stores (and occasionally online).

The Apple II version of Sargon II featured text and hi-res display, and required 24K of RAM. The copy protection used the best features found in the previous two examples, and took them a step further.

The instructions for loading the game are quite simple:

30.3FFR

This is the same low-memory start as "Module 6", but this time it's just a small stage 1 loader. There's no "xxxxG" command listed, so this is another self-starting program. Looking at the first 16 bytes, we see:

00000030: 01 00 ff aa 05 00 3e 02 1b fd 00 00 3c 02 20 89 ...*..>..}..<. .

As in "Module 6", the text output vector has been altered, though the input vector is set to the default ($FD1B). Looking a little closer, we see a new trick. The value at $3C-3D, which holds the address used by the cassette read/write functions, is $023C instead of $003C. This means that the data from the cassette will fill locations $0030-003D, skip forward to $023E, and continue until $03FF. If you try to be clever and load the stage-one program with "1030.13FFR", you will fail, because there isn't that much data on the tape.
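It's worth modeling why this works. The monitor keeps its current address (A1, at $3C-3D) and end address (A2, at $3E-3F) in ordinary RAM. It stores each byte through A1 and only then reads A1 back to compare and increment it, so a byte that lands on $3C or $3D redirects the rest of the load. This Python sketch follows that store/compare/increment order, ignores timing and the checksum, and feeds in the 16 bytes shown above.

```python
A1L, A2L = 0x3C, 0x3E  # monitor's current and end addresses (zero page)


def get16(mem: bytearray, addr: int) -> int:
    return mem[addr] | (mem[addr + 1] << 8)


def put16(mem: bytearray, addr: int, value: int) -> None:
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


def monitor_read(mem: bytearray, start: int, end: int, tape: bytes) -> list:
    """Store tape bytes the way the monitor's read loop does.
    Returns the list of addresses actually written."""
    put16(mem, A1L, start)
    put16(mem, A2L, end)
    written = []
    for byte in tape:
        a1 = get16(mem, A1L)
        mem[a1] = byte                # STA (A1L,X)
        written.append(a1)
        a1 = get16(mem, A1L)          # re-read: the store may have moved A1
        done = a1 >= get16(mem, A2L)  # compare before incrementing
        put16(mem, A1L, (a1 + 1) & 0xFFFF)
        if done:
            break
    return written


if __name__ == "__main__":
    mem = bytearray(0x10000)
    tape = bytes.fromhex("0100ffaa05003e021bfd00003c022089")
    written = monitor_read(mem, 0x30, 0x3FF, tape)
    print([f"${a:04X}" for a in written[-3:]])  # ['$003D', '$023E', '$023F']
```

The first fourteen bytes land at $0030-003D; the fifteenth goes to $023E, which is exactly where the code below begins.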

Skipping forward like this means the stack and the start of the input buffer, at $0100 and $0200, are left unmodified. When the monitor finishes loading the data, it will output a beep by writing a Ctrl-G through COUT, which sends us through the vector at $36 to address $023E. Here we find this:

023E-   20 89 FE    JSR   $FE89    F8ROM:SETKBD
0241-   20 93 FE    JSR   $FE93    F8ROM:SETVID
0244-   20 60 03    JSR   $0360
0247-   A9 03       LDA   #$03
0249-   48          PHA
024A-   A9 0E       LDA   #$0E
024C-   48          PHA
024D-   A9 1C       LDA   #$1C
024F-   8D F2 03    STA   $03F2
0252-   A9 17       LDA   #$17
0254-   8D F3 03    STA   $03F3
0257-   49 A5       EOR   #$A5
0259-   8D F4 03    STA   $03F4
025C-   20 39 FB    JSR   $FB39    F8ROM:SETTXT
025F-   20 58 FC    JSR   $FC58    F8ROM:HOME
0262-   A9 77       LDA   #$77
0264-   8D FF 5F    STA   $5FFF
0267-   AD FF 5F    LDA   $5FFF
026A-   C9 77       CMP   #$77
026C-   D0 01       BNE   $026F
026E-   60          RTS

It calls F8 ROM routines to reset the vectors at $36-37 and $38-39, does some stuff at $0360 (discussed next), pushes $030E onto the stack with PHA instructions, points the autostart ROM's reset vector at $03F2-03F3 to $171C (the EOR #$A5 computes the matching "power-up" byte stored at $03F4), and clears the screen by calling HOME. The code at $0262 writes and reads a byte at $5FFF to see if the machine has at least 24K of RAM, and jumps to $026F if it doesn't. The RTS at $026E jumps to the address we just pushed on, plus one ($030F); we'll come back to this later.
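The autostart ROM only trusts the vector at $03F2-03F3 if the "power-up" byte at $03F4 equals the high byte at $03F3 exclusive-ORed with $A5; otherwise a reset does a cold start. That's what the EOR #$A5 at $0257 is for. A Python sketch of the rule, using the $171C value the loader stores; returning None is just this sketch's stand-in for a cold start.

```python
def set_reset_vector(mem: bytearray, target: int) -> None:
    """Mirror of the code at $024D: vector first, then the power-up byte."""
    mem[0x3F2] = target & 0xFF
    mem[0x3F3] = (target >> 8) & 0xFF
    mem[0x3F4] = mem[0x3F3] ^ 0xA5


def reset_goes_to(mem: bytearray):
    """Return the address a reset jumps to, or None for a cold start."""
    if mem[0x3F4] == mem[0x3F3] ^ 0xA5:
        return mem[0x3F2] | (mem[0x3F3] << 8)
    return None


if __name__ == "__main__":
    mem = bytearray(0x10000)
    set_reset_vector(mem, 0x171C)
    print(hex(mem[0x3F4]), hex(reset_goes_to(mem)))  # 0xb2 0x171c
```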

The code at $0360 writes a copyright message to the text screen, then jumps to $02B3. The code there handles the stage two loading:

02B3-   A9 00       LDA   #$00
02B5-   85 3C       STA   $3C
02B7-   A9 08       LDA   #$08
02B9-   85 3D       STA   $3D
02BB-   A9 FF       LDA   #$FF
02BD-   85 3E       STA   $3E
02BF-   A9 2F       LDA   #$2F
02C1-   85 3F       STA   $3F
02C3-   A2 00       LDX   #$00
02C5-   20 FA FC    JSR   $FCFA
02C8-   A9 16       LDA   #$16
02CA-   20 C9 FC    JSR   $FCC9    F8ROM:HEADR
02CD-   85 1F       STA   $1F
02CF-   20 FA FC    JSR   $FCFA
02D2-   A0 24       LDY   #$24
02D4-   20 FD FC    JSR   $FCFD
02D7-   B0 F9       BCS   $02D2
02D9-   20 FD FC    JSR   $FCFD
02DC-   A0 3B       LDY   #$3B
02DE-   20 EC FC    JSR   $FCEC
02E1-   81 3C       STA   ($3C,X)
02E3-   45 1F       EOR   $1F
02E5-   85 1F       STA   $1F
02E7-   20 BA FC    JSR   $FCBA    F8ROM:NXTA1
02EA-   A0 35       LDY   #$35
02EC-   90 F0       BCC   $02DE
02EE-   20 EC FC    JSR   $FCEC
02F1-   49 A5       EOR   #$A5
02F3-   C5 1F       CMP   $1F
02F5-   F0 03       BEQ   $02FA
02F7-   4C 2D FF    JMP   $FF2D    F8ROM:PRERR
02FA-   60          RTS

As you can see (if you have been following along carefully), this sets the monitor start and end addresses to 800.2FFF. It then provides its own, slightly modified, tape read routine. The code is identical to what the system monitor does, with one exception: before comparing the checksum byte read from the tape against the checksum accumulated at $001F, it exclusive-ORs that byte with $A5.

This means that the checksum stored on the cassette is deliberately wrong. If you try to load stage two with "800.2FFFR", the monitor will report a failure, even though the data was read correctly. (The author could have taken this a step further and employed different timings, or perhaps reversed the meaning of '0' and '1', but for whatever reason they stuck with the standard Apple II format.) You could look at the data at $0800 to see if it looks okay, but, as we're about to see, that won't work.
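A Python sketch of the difference, assuming the monitor's usual checksum (every data byte exclusive-ORed together, starting from $FF). Sargon's tape carries that checksum exclusive-ORed with $A5, which its own loader undoes and the stock loader doesn't. `sargon_checksum` is a hypothetical name for what a mastering program would have had to write.

```python
def cassette_checksum(data: bytes) -> int:
    """Monitor-style checksum: XOR of every data byte, starting from $FF."""
    check = 0xFF
    for b in data:
        check ^= b
    return check


def monitor_accepts(data: bytes, stored: int) -> bool:
    """Stock READ: the byte on tape must equal the running checksum."""
    return stored == cassette_checksum(data)


def sargon_accepts(data: bytes, stored: int) -> bool:
    """Sargon's copy of READ: EOR #$A5 on the tape byte, then compare."""
    return (stored ^ 0xA5) == cassette_checksum(data)


def sargon_checksum(data: bytes) -> int:
    """Hypothetical: the checksum byte a mastering tool would write."""
    return cassette_checksum(data) ^ 0xA5


if __name__ == "__main__":
    data = bytes(range(16))
    stored = sargon_checksum(data)
    print(monitor_accepts(data, stored), sargon_accepts(data, stored))  # False True
```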

Assuming the data loaded correctly, we RTS our way back to the first bit of code, which RTSs us to the address pushed on the stack ($030F). Here we hit the obfuscation layers:

030F-   A9 00       LDA   #$00
0311-   85 00       STA   $00
0313-   A9 08       LDA   #$08
0315-   85 01       STA   $01
0317-   A0 00       LDY   #$00
0319-   B1 00       LDA   ($00),Y
031B-   49 AD       EOR   #$AD
031D-   91 00       STA   ($00),Y
031F-   C8          INY
0320-   D0 F7       BNE   $0319
0322-   E6 01       INC   $01
0324-   A5 01       LDA   $01
0326-   C9 30       CMP   #$30
0328-   90 EF       BCC   $0319

This uses the technique we saw earlier in Microchess, where the code is exclusive-ORed with a value, so that the data read from tape looks like gibberish until this function decodes it. In this case, the value is $AD, and everything from $0800 through $2FFF (the entire second stage) is altered.

This next part doesn't make sense at first:

032A-   18          CLC
032B-   A0 13       LDY   #$13
032D-   B9 9F 02    LDA   $029F,Y
0330-   79 FB 02    ADC   $02FB,Y
0333-   99 00 01    STA   $0100,Y
0336-   88          DEY
0337-   10 F4       BPL   $032D
0339-   20 00 01    JSR   $0100

It's reading some values from one location, adding them to values from another location, storing them at $0100-$0113, and then executing them. Instead of simply concealing the code with exclusive-ORs, it's actually assembling a subroutine from two different places, and dropping it onto the stack page. The code, once assembled, looks like this:

0100-   A9 33       LDA   #$33
0102-   8D 10 03    STA   $0310
0105-   A9 9A       LDA   #$9A
0107-   85 3C       STA   $3C
0109-   A9 03       LDA   #$03
010B-   8D 7A 40    STA   $407A
010E-   A9 27       LDA   #$27
0110-   8D FE 5F    STA   $5FFE
0113-   60          RTS

This subtly sabotages the exclusive-OR routine and the cassette start address, and leaves a couple of values in seemingly random locations in memory. The first two are attempts to throw a red herring at us. It's not clear what the last two do, but it's a good bet that the chess application won't work correctly without them. When this little gem returns, we're back here:

033C-   A0 13       LDY   #$13
033E-   A9 FF       LDA   #$FF
0340-   99 00 01    STA   $0100,Y
0343-   88          DEY
0344-   10 FA       BPL   $0340
0346-   A0 00       LDY   #$00
0348-   98          TYA
0349-   99 00 02    STA   $0200,Y
034C-   C8          INY
034D-   D0 FA       BNE   $0349
034F-   A9 97       LDA   #$97
0351-   8D 1C 03    STA   $031C
0354-   20 3A FF    JSR   $FF3A    F8ROM:BELL
0357-   6C F2 03    JMP   ($03F2)

This erases the code at $0100 and $0200, and stores $97 at $031C. Looking at the code above, we see that $031C holds the value used to exclusive-OR the data from tape. Rather than erasing the code, the authors replaced the value with a slightly different one, in an attempt to lead the unwary on a chase down the wrong path.

After all that, the code emits the traditional post-cassette-load speaker beep (which some enterprising individuals with EPROM burners might have trapped to cause a modified software break), and jumps through the reset vector at $03F2. Earlier this was set to $171C, which is the stage two entry point.

Transferring this to disk requires loading the stages at a higher address, disabling the custom tape load function, and starting the first-stage loader with some code that memory-moves the code into place and sets up the output vector at $36-37. Making a tape copy is hard because there's no easy way to create the modified checksum. You can write a decoded copy of the main program, but you also need to set things up the way the code at $0100 does.
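Rebuilding that subroutine by hand means reproducing the loop at $032A, and there's a detail in it worth noticing: there is only one CLC, before the loop, and neither DEY nor BPL touches the carry, so the carry out of each ADC feeds into the next byte down. The two source tables behave like a single 20-byte big-endian addition. This Python sketch models the loop plus a hypothetical inverse that splits a routine into two tables. The actual bytes at $029F and $02FB aren't shown here, so the mask in the example is made up.

```python
def assemble(a: bytes, b: bytes) -> bytes:
    """Model of $032A: one CLC, then ADC from index $13 down to 0,
    so each carry feeds into the next lower index."""
    out = bytearray(len(a))
    carry = 0
    for y in reversed(range(len(a))):
        total = a[y] + b[y] + carry
        out[y] = total & 0xFF
        carry = total >> 8
    return bytes(out)


def split(target: bytes, a: bytes) -> bytes:
    """Hypothetical inverse: pick b so that assemble(a, b) == target."""
    b = bytearray(len(target))
    carry = 0
    for y in reversed(range(len(target))):
        b[y] = (target[y] - a[y] - carry) & 0xFF
        carry = (a[y] + b[y] + carry) >> 8
    return bytes(b)


# The routine as it appears at $0100 once assembled.
ROUTINE_0100 = bytes.fromhex("a9338d1003a99a853ca9038d7a40a9278dfe5f60")

if __name__ == "__main__":
    mask = bytes(range(0x80, 0x94))  # made-up first table, 20 bytes
    other = split(ROUTINE_0100, mask)
    print(assemble(mask, other).hex())  # a9338d1003a99a853ca9038d7a40a9278dfe5f60
```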

Closing Notes

The goals and approaches of copy protection on audio cassettes in 1978 aren't much different from those used on CD-ROMs in 2004. Any program can have its copy protection removed. The skill and effort employed in protecting a program determine how much knowledge and determination are required to strip away the protection. The goal remains deterrence of illegal copying by making the material difficult for a casual user to duplicate faithfully. Because legal users must be allowed access to the material, approaches to copy protection rely on obfuscation and minor format alterations.

Some technological advances -- strong encryption built into televisions and headphones -- may change the rules in the future. It's clear from this examination, though, that the face of copy protection hasn't really changed in over 25 years.

It's unfortunate that so few kids these days have the opportunity to solve problems of a similar nature. Some genuinely clever people worked on copy protection for the Apple II, and I learned a great deal by disassembling code while I was growing up. It motivated me to acquire a greater understanding of system-level programming than I would have developed otherwise. The desire to understand how things work, so fundamental to a larval-stage engineer, is tremendously stimulated by "forbidden" challenges.

The data was recovered from original cassette tapes purchased on eBay. The audio was captured as WAV files on a PC, and converted to Apple II files on a disk image with a program I wrote called CiderPress. The lengths of the programs on tape are determined automatically, which greatly simplified the process of extracting them. The annotated disassemblies, BASIC listings, and hex dumps above were also generated with CiderPress. The screen shots were captured while running the KEGS Apple II emulator.