First Stargazing

Telescope detail revealing eyepiece assembly

I’d wanted a telescope for a really long time.

I guess I should say, I’d wanted a real telescope for a very long time. I had one as a kid, one of those small telescopes that leap to mind when you hear the word “telescope”: a long tube on a tripod tapering to an eyepiece on one end. I tried to use it, but I had no guidance. I don’t know much about that telescope’s provenance—at that time, I was content to know it came from Santa Claus—but it wasn’t the highest quality. The experience never worked out for me. All I saw through it were bright, blurry dots streaking briefly in and out of view.

Not long ago, I got really curious about what would be possible if I bought a new telescope today. It turns out, this is a huge hobby with a lot of writing about it online, and I had to spend weeks reading up before I knew what I wanted. I finally got something called an “Orion SkyQuest XT8 Classic Dobsonian Telescope.” All my reading had led me to the conclusion that I wanted to set aside all other considerations in favor of the most power for the dollar. In terms of telescopes, that came down to things like focal length and aperture, so I didn’t get cool things like a tracking computer.

I thought it was going to take several days to arrive, but it came the next morning after I ordered it, and I wasn’t prepared for the ridiculous size. All that I said about aperture and focal length? That’s all size, and this thing is a bit silly in that regard. The telescope tube came whole, looking like a bathroom trashcan that grew up to stand nearly as tall as a person and with a large makeup mirror in the bottom. I spent maybe an hour putting the base together, on which I mounted the tube like a cannon.

I was pretty jazzed about using it right away, but I had to find a place and wait till nightfall. I asked around and looked online, and several sources mentioned a place called Stub Stewart State Park. My friend Shawna ended up coming with me, and it was just as well no one else joined me that night because the telescope tube alone occupied my entire backseat.

We headed out of town around eleven at night on the very same day I got it. I’d never been to this park, and even though it’s only about forty minutes out of town, the roads out that way dwindle quickly in size, and the darkness made it feel very remote and a touch creepy. So I was surprised when I ended up at a large parking area filled with cars in the darkness. It was actually a bit crowded, though so dark and moonless that I never did end up seeing another human. The spot was popular enough for astronomers that the bathrooms were lit with red bulbs, making the only visible edifice seem hellish.

Shawna helped me drag my telescope out to an area that seemed clear enough. I had done some research to figure out what things I might’ve wanted to look at, and it turns out all those things had set below the horizon by midnight, so I had no idea what to do from that point. I only had one eyepiece with me, a twenty-five millimeter eyepiece which gave me forty-eight-times magnification. Regardless of what all that magnification may have been suited for, that’s all I had to work with.

The sky, even without any aid, was striking. Without a moon or any light for miles, the Milky Way could be seen clearly spreading across the entire sky. Once our eyes adjusted, the sky was full, and it would’ve been worth the trip for that view alone.

I was really anxious to try out the telescope because I didn’t even know if it’d work or not—my childhood telescope had been a complete disappointment. I took out my phone and used an app to see what was around, and pretty soon I saw Saturn sitting some degrees above the horizon. Taking my phone down, I saw some fuzzy stars in roughly the same direction and had to figure out which one of these dots might’ve been Saturn. I made a guess and worked on aiming the telescope that way.

My aim was off at first, so I slid my telescope around till a bright yellowish blur was in view. While unfocused, it was like a fat, bright dot, but I noticed it had a bit of an oval shape, and that oval became more pronounced as I focused. When it finally became crisp, I noticed the oval had gaps in it. I was actually seeing rings, around Saturn.

It was an unimpressive speck and dazzling sight at the same time. What had first been a tiny dot as anonymous as the rest was now familiar and improbable at the same time, like spotting a celebrity. The magnification rendered it quite small, little more than a bulge with a ring-like shape around it, but it was hard to look away. I let Shawna look, to share it but also confirm that the thing I was seeing was actually Saturn: I had trouble believing I’d found it.

If I’d been alone and had thought to bring a chair, I probably would’ve just sat there and looked at it for a while, but we were getting cold and uncomfortable, and I wanted to see if I could find anything else. I instantly thought of the Andromeda Galaxy, so I pulled out the Sky Guide app and found it high up in the sky in the other direction. When I put the app out of view, up in the sky, I could see stars, but I couldn’t see Andromeda (which wasn’t surprising).

I didn’t really have a choice, so I put my telescope in the neighborhood where it was supposed to be and just started scanning around. This took considerably longer without a clear dot at least to aim for, but eventually a very large oval smear came into view. I tried focusing on it, but it didn’t improve much. I didn’t figure it out at the time, but here was another situation where my eyepiece was inappropriate, this time because it magnified too much. I was seeing only most of the middle portion, and finer details had been dimmed by the magnification.

So Andromeda was even less impressive a sight than Saturn, and somehow even more. Featureless as it seemed, it filled the field of view. Seeing another galaxy was more meaningful to me than seeing a planet or a star. Coming from so far away, Andromeda’s light is not just ancient but primordial. We on Earth can visit Saturn with probes, but we’ll never touch Andromeda. I thought of Edwin Hubble, spotting a Cepheid variable star there and knowing for the first time what an immense chasm of time and space lay between that “island universe” and us. Andromeda taught us just how large the universe could be, and I remembered this as I looked at it.

Before we left, we took a last look at Saturn—I couldn’t resist. Then Shawna and I started on the trip home, by this time very early Sunday morning. We shared an exhilaration from the experience. I know I have to do this again soon, and I don’t doubt Shawna will be willing to join me.

In the Back of the House

I got my first job at fifteen, going on sixteen. I worked for my hometown newspaper as an inserter, and as time passed, I began filling in occasionally as a “pressman.” Inserters were a collective bunch of old ladies (and me) who made spare money assembling the newspaper sections and stuffing in the ad inserts. When I got to help with the actual printing, it took the form of developing, treating, and bending the lithographic plates in preparation for printing. More often, I caught the papers as they rolled off the press to bundle them up for distribution. I also cleaned up, sweeping and trash takeout and the like, but I wasn’t good at it. I liked to take breaks to play my guitar at the back of the shop, so I think the editor-in-chief who ran things probably was annoyed as piss at me half the time.

There was no question I worked in the bowels of the operation. The real fun (and to the extent a small, rural paper could afford it, the real money) happened at the front of the building where the editor-in-chief and reporters worked. I passed through to gather up trash a few times a week. As I went, I admired the editor-in-chief’s ancient typewriter collection in his office. I enjoyed talking to the lead reporter, who loved Star Trek. The layout team’s work fascinated me, especially as they transitioned to digital layout from cutting and splicing pieces of paper together.

After my tour, I returned to the back, and I only heard from the front when it was time to go to press or when we had to stop the presses. We weren’t a separate world by any means, but we had a job to do, and that job was entirely a pragmatic one, keeping the machinery running and enabling the actual enterprise which paid us. Inasmuch as I felt like an important part of the whole, it was in a sense of responsibility toward the final product.

About a decade later, I stumbled across my current programming thing. Now I find myself at the back of the house again. The work echoes my first job sometimes—working on the machinery, keeping things running, along with other programmers and operations folks. This time the job comes with a dose of values dissonance for me. It feels like a wildly inverted amount of prestige goes to us, to the people running the machines, instead of the others who are closer to the actual creation (and the customers using it).

I’m not sure our perceived value is unwarranted—programming is hard. I’m more concerned about the relationship between the front and back of the house. It could be that we, as programmers and tech people, undervalue the people making the content and interacting with the customers. I see the skewed relationship when I look at inflated tech salaries. It makes itself evident in startups made up of all or mostly engineers. I felt it most acutely when I considered becoming a tech writer, only to be reminded it could derail my career and cost me monetarily.

I don’t think my observation comes with a cogent point. Maybe only that tech can’t be just about the engineering, no more than a newspaper can be only a printing press.

Functional Programming for Everyone Else

Functional programming has become a hot topic in the last few years for programmers, but non-programmers might not understand what we’re talking about. I can imagine them wondering, “Did programs not work before? Why are they suddenly ‘functional’ now?”

Earlier today, I was tweeting a bit about the challenge of explaining to family or primary school students what the big deal is all about. Even we programmers take some time to cotton on to the notion. I know I’d have to pause for a moment if someone asked me what Haskell offers over Python.

If you’re not a programmer and have been wondering what the big deal is, here’s my attempt to lay it out.


First, consider the time just before computers existed. By the 1920s, a kind of math problem called a decision problem led various people to learn how to investigate the problem solving process itself. To do this, we had to invent an idea we call computability, meaning to automate problem solving using small, reusable pieces. A couple of mathematicians tackled this idea in two different ways (though they each came to the same conclusion), and today we have two ways to think about computability as a result.

I’m partial to Alan Turing’s approach because it’s very practical. He envisioned a process that’s quite mechanical. In fact, we now call it a Turing machine, even though his machine never actually existed. It was more of a mental exercise than something he intended to build.

To solve a problem with a Turing machine, he would break a problem into a sequence of steps which would pass through the machine on an endless tape. The machine itself knew how to understand the steps on the tape, and it knew how to store information in its memory. As the steps on the tape passed through, one at a time, the machine would consult the tape and its own memory to figure out what to do. This usually meant modifying something in its memory, which in turn could affect the following step, over and over until the steps ran out. By choosing the right set of steps, when you were done, the machine’s memory would end up with the answer you needed.

Since that time, most computers and programs have been based on this concept of stringing together instructions which modify values in memory to arrive at a result. Learning to program means learning a vast number of details, but much of it boils down to understanding how to break a problem into instructions to accomplish the same thing. Programs made this way would not be considered “functional.”

At the same time, another mathematician, Alonzo Church, came up with another approach called lambda calculus. At its heart, it has a lot in common with Turing’s approach: lambda calculus breaks up a problem into small parts called functions. Instead of modifying things in memory, though, the key proposition of a function is that it takes input and calculates a result—nothing more. To solve a problem this way, little functions are written to calculate small parts of the problem, which are in turn fed to other functions which do something else, and so on until you get an answer.

Lambda calculus takes a much more abstract approach, so it took longer to work out how to make programs with it. When we did, we called these programs “functional programs” because functions were so fundamental to how they worked.


Putting all this together, I think of functional programs as ones which do their jobs without stopping to take notes along the way. As a practical consequence, this implies a few odd things. The little niceties that come as second nature to procedural programs (like storing values, printing out text, or doing more than one thing at once) don’t come easy to functional programs[1]. On the other hand, functional programs make it easier to understand what a program will do, since it will do the same thing every time if its input doesn’t change.

I think both approaches have something to offer, and in fact, most programs are made with a combination of these ideas. Turing proved neither approach was better than the other. They’re just two ways of ending up at the same result. Programmers each have to decide for themselves which approach suits best—and that decision problem can’t be solved by a program yet.

On Thematic Storytelling

When I reached ninth grade, I was dumped briefly in Honors AP English, and my teacher initiated us into the secret language of symbolism in literature. Writing was (and remains) important to me, so that class made a deep impression.

It was life-changing to find that many of the things I had read or had yet to read contained hidden meaning. Throughout the next several years, my intellectual and spiritual development involved understanding layers of meaning, their connections, and their implications—not only in the stories I enjoyed reading and writing, but also on my life and my understanding of my own existence and that of God. It was the first time I realized that works of art and literature had meaningful things to say about my world and not just their own.

That class tied symbolism into overarching themes, and themes are really what I want to talk about now. Themes were like secret messages; symbolism, the vocabulary; and stories were the paper on which the messages were written. We learned about themes and symbolism with an emphasis on connections to ancient mythology, the Bible, and fatalism versus determinism. For me, this skewed the significance of thematic content above all other literary content. I took away the idea that these stories had such powerful and important existential messages to impart that all other elements of the story only were included to support them. The only way I could imagine writing a story this way was to start with the theme, which would inform everything else.

Unfortunately, that misapprehension stunted my development as a reader and writer for years after. When reading, I got too analytical. When writing, I didn’t feel creative anymore. I regarded everything besides the theme—all the descriptions, characters, suspense, drama—as fluff. I didn’t know how to start at a theme and end up back at the same place that originally inspired me to write a story, and I figured doing anything else was frivolous.

What I described about my English class—mining literature for symbolism that may or may not even be there—seems to be a rite of passage. It’s probably attractive curriculum because it reduces reading comprehension into something rather mechanistic and testable. For whatever reason, I know a lot of people who had an experience in high school like mine.

I don’t think I’m the only person for whom this approach to storytelling presented a dilemma. The problem, as I saw it, is that writers either started with a story and later artificially incorporated symbolism, or they began with the symbolism and tried to wring a story out of it.

I remember Stephen King writing about precisely this issue in his non-fiction book On Writing. King said he tended to write the story first, and if he saw the potential to develop a theme, he would elaborate on that as he wrote and rewrote, polishing the theme until it shone through. The biggest takeaway I got from On Writing is that thematic aspects of a story are sometimes already there naturally, waiting to be developed.

He’s prolific and has taken lots of approaches to writing fiction on a theme. Some of his books, especially the earlier ones, didn’t bother. It’s difficult to come away from Christine or Cujo[2] feeling like you missed a deeper meaning.

Other books of his swung in the other direction, incorporating lots of symbolism. Insomnia is the one that stands out for me. In On Writing, King describes struggling with the amount of planning he did in that book, and it shows. It’s full of allusions, symbols, and outright literal descriptions of the protagonist defying deities and struggling against fatalism. The Waste Lands directly describes symbolism via an English class, itself makes heavy allusion to T. S. Eliot, and uses nonsensical jokes (among other things) to symbolize a world losing its coherency and sanity.

After several years, I finally realized this approach made sense when I considered themes as an element of storytelling on the same level as other elements such as setting, plot, or characterization. Any story entails decisions about how to include and involve them and to what extent.

One example which springs to mind right away is Isaac Asimov’s “The Last Question”, which contains almost no characterization whatsoever but whose plot and theme concern themselves with nothing less than the nature of the universe, sweeping its entire breadth in space and time. In the same way, some stories naturally require elaborating on setting (think of worldbuilding in high fantasy); others never allude to their setting at all.

This way of thinking about theme feels so right to me. It frees me from the problematic dilemma I described earlier, if only I apply the same kind of thinking about theme as I apply regarding, say, plot. For example, it feels natural to me to sketch out a story plot to begin with, letting each storytelling ingredient interact and complement one another, and then proceed to work out the details as I move along. The wholeness of the plot emerges as the work continues. Why can’t theme and symbolism manifest the same way? After all, it never occurred to me to let decisions about plot or setting paralyze me this way.

I also think it would be really interesting if high school English classes gave as much consideration to characterization or setting as they do to theme. Maybe the thinking is that those things are written in an obvious way and theme is much more tied into the context and history of the writer. On the other hand, it’s easy to say that Stephen King writes about Maine because he’s from there without further examination, but does this efface a deeper conversation we could be having about why such a familiar and detailed setting makes his books work so well? If we have to read Catcher in the Rye in high school, is it more important to talk about what the ducks in winter represent, or should we be talking about what an immature hypocrite Holden is? Does Slaughterhouse-Five teach us as much about the historical destruction of Dresden as it does about how avoidable and pointless war is?

I guess my breakthrough is realizing that all these questions hold equal consideration in my mind now, which broadens the kinds of stories I enjoy and the way in which I appreciate them.

Announcing My Content Licensing

A couple of recent popular posts I’ve made have motivated me to license my content here on my site. This doesn’t directly affect readers. It just means that I want to disclose what others are allowed to do with what I write without asking me first.

If anybody wants to re-license anything here (for some reason), I’m open to discussing it, but I may ask for compensation. The easiest way to contact me is just by sending email to any address at my domain (which all lands in my inbox).

A Gentle Primer on Reverse Engineering

Over the weekend at Women Who Hack I gave a short demonstration on reverse engineering. I wanted to show how “cracking” works, to give a better understanding of how programs work once they’re compiled. It also serves my abiding interest in processors and other low-level stuff from the 80s.

My goal was to write a program which accepts a password and outputs whether the password is correct or not. Then I would compile the program to binary form (the way in which most programs are distributed) and attempt to alter the compiled program[3] to accept any password. I did the demonstration on OS X, but the entire process uses open source tools from beginning to end, so you can easily do this on Windows (in an environment like Cygwin) or on Linux. If you want to follow along at home, I’m assuming an audience familiar with programming, in some form or another, but not much else.

Building a Program

I opened a terminal window and fired up my text editor (Vim) to create a new file called program.c. I wanted to write something that would be easy to understand and edit yet could still be compiled, so C seemed like a fine choice. My program wasn’t doing anything that would’ve been strange in the year 1982.

First, I wrote a function for validating a password.

int is_valid(const char* password)
{
    if (strcmp(password, "poop") == 0) {
        return 1;
    } else {
        return 0;
    }
}

This function accepts a string and returns a 1 if the string is “poop” and 0 otherwise[4]. I’ve chosen to call it is_valid to make it easier to find later. You’ll understand what I mean a few sections down.

Now we need a bit of code to accept a string as input and call is_valid on it.

int main()
{
    char* input = NULL;
    input = malloc(256);  /* see note 5 */
    printf("Please input a word: ");
    scanf("%s", input);

    if (is_valid(input)) {
        printf("That's correct!\n");
    } else {
        printf("That's not correct!\n");
    }

    free(input);
    return 0;
}

This source code is likewise pretty standard. It prompts the user to type in a string and reads it into a variable called input. Once that’s done, it calls is_valid with that string. Depending on the result, it either prints “That’s correct!” or “That’s not correct!” and exits, returning control to the operating system. With a couple of “include” directives at the top, this is a fully functioning program[6].

Let’s build it! I saved the file program.c and used the command gcc program.c -o program to build it[7].

This outputs a file in the current directory called program which can be executed directly. Let’s run our program by typing ./program. It’ll ask us to put in a word to check. We already know what to put in (“poop”), so let’s do that and make sure we see the result we expect.

Please input a word: poop
That's correct!

And if we run it again and type in the wrong word, we get the other possible result.

Please input a word: butts
That's not correct!

So far, so good.

A Deeper Look

There’s nothing special about this program that makes it different than your web browser or photo editor; it’s just a lot simpler. I can demonstrate this on my system with the file command. Trying it first on the program I just built, with the command file program, I see:

program: Mach-O 64-bit executable x86_64

This is the file format OS X uses to store programs. If this kind of file seems unfamiliar, the reason is that most applications are distributed as app bundles which are essentially folders holding the executable program itself and some ancillary resources. Again, with file, we can see this directly by running file /Applications/Safari.app/Contents/MacOS/Safari:

/Applications/Safari.app/Contents/MacOS/Safari: Mach-O 64-bit executable x86_64

Let’s learn a little more about the binary we just built. We can’t open it in a text editor, or else we get garbage. Using a program called hexdump we can see the raw binary information (translated to hexadecimal) contained in the file. Let’s get a glimpse with hexdump -C program | head -n 20.

00000000  cf fa ed fe 07 00 00 01  03 00 00 80 02 00 00 00  |................|
00000010  10 00 00 00 10 05 00 00  85 00 20 00 00 00 00 00  |.......... .....|
00000020  19 00 00 00 48 00 00 00  5f 5f 50 41 47 45 5a 45  |....H...__PAGEZE|
00000030  52 4f 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |RO..............|
00000040  00 00 00 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  19 00 00 00 28 02 00 00  |............(...|
00000070  5f 5f 54 45 58 54 00 00  00 00 00 00 00 00 00 00  |__TEXT..........|
00000080  00 00 00 00 01 00 00 00  00 10 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 10 00 00 00 00 00 00  |................|
000000a0  07 00 00 00 05 00 00 00  06 00 00 00 00 00 00 00  |................|
000000b0  5f 5f 74 65 78 74 00 00  00 00 00 00 00 00 00 00  |__text..........|
000000c0  5f 5f 54 45 58 54 00 00  00 00 00 00 00 00 00 00  |__TEXT..........|
000000d0  10 0e 00 00 01 00 00 00  e7 00 00 00 00 00 00 00  |................|
000000e0  10 0e 00 00 04 00 00 00  00 00 00 00 00 00 00 00  |................|
000000f0  00 04 00 80 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000100  5f 5f 73 74 75 62 73 00  00 00 00 00 00 00 00 00  |__stubs.........|
00000110  5f 5f 54 45 58 54 00 00  00 00 00 00 00 00 00 00  |__TEXT..........|
00000120  f8 0e 00 00 01 00 00 00  1e 00 00 00 00 00 00 00  |................|
00000130  f8 0e 00 00 01 00 00 00  00 00 00 00 00 00 00 00  |................|

The left column is the “offset,” in hexadecimal (like line numbering, it tells us how many bytes into the file we are on a particular line). The middle two columns are the actual contents of the file itself, again in hexadecimal. The right column shows an ASCII equivalent for the file’s contents, where possible. If you pipe the file’s contents to less you can scan through and see mostly a lot of garbage and also a few familiar strings. If you’re interested in knowing what pieces of text are embedded in a file, the program strings speeds this process up a great deal. In our case, it tells us:

poop
Please input a word:
That's correct!
That's not correct!

So clearly those strings are still floating around in the file. What’s the rest of this stuff? Volumes of documentation exist out there on the Mach-O file format, but I don’t want to bog down in the details. I have to level with you here—I honestly don’t actually know much about it. Analogizing from other executable formats I’ve seen before, I know there’s probably a header of some kind that helps the operating system know what kind of file this is and points out how the rest of the file is laid out. The rest of the file, incidentally, is made up of sections which may contain any of a number of things, including data (the strings in this case) built into the program; information on how to find code called from elsewhere in the system (imports, like our printf and strcmp functions, among others); and executable machine code.

Disassembling the Program

It’s the machine code we’re interested in now. This is the interesting part! Machine code is binary data, a long string of numbers which correspond to instructions the processor understands. When we run our program, the operating system looks at the file, lays it out in memory, finds the entry point, and starts feeding those instructions directly to the processor.

If you’re used to scripted programming languages, this concept might seem a little odd, but it bears on what we’re about to do to our binary. There’s no interpreter going over things, checking stuff, making sure it makes sense, throwing exceptions for errors and ensuring they get handled. These instructions go right into the processor, and being a physical machine, it has no choice but to accept them and execute each one[8]. This knowledge is very empowering because we have the final say over what these instructions are.

As you may know, the compiler gcc translated the source code I wrote earlier into machine language (and packaged it nicely in an executable file). This allows the operating system to execute it directly, but as another important consequence of this process, we also no longer need the source code. Most of the programs you run likely came as binary executables without source code at all. Others may have source code available, but they’re distributed in binary form.

Whatever the case, let’s imagine I lost the source code to program up above and can’t remember it. Let’s also imagine I can’t even remember the password, and now my program holds hostage important secrets.

You might think I could run the binary through the strings utility, hoping the password gets printed out, and in this case, you’d be on the right track. But imagine the program didn’t have a single password built in and only accepted passwords whose letters were in alphabetical order or added up (in binary) a specific way. Without the source code, no string in the output would stand out as the password, and I wouldn’t have a clue what to type in.

But we don’t need to lose heart because we already know that the program contains machine code, and since this machine code is meant to be fed directly to the processor, there’s no chance it’s been obfuscated or otherwise hidden. It’s there, and it can’t hide. If we knew how to read the machine code, there would be no need for the source code.

Machine code is hard for a human to read. There’s a nice GNU utility called objdump which helps enormously in this respect. We’ll use it to disassemble the binary. This process is called “disassembly” instead of “decompilation” because we can’t get back the original source code; instead we can recover the names of the instructions encoded in machine code. It’s not ideal, but we’ll have to do our best. (Many people use a debugger to do this job, and there’s a ton of benefits to doing so, like being able to watch instructions execute step by step, inspect values in memory, and so on, but a disassembly listing is simpler and less abstract.)

I looked up the documentation for gobjdump (as it’s called on my system[9]) and picked out some options that made sense for my purposes. I ended up running gobjdump -S -l -C -F -t -w program | less to get the disassembly[10]. This is probably more than we’d care to know about our program’s binary, much of it mysterious to me, but there’s some very useful information here too.

The Disassembly

I’ll share at least what I can make of the disassembly. At the top of the listing is some general information. This symbol table is interesting. We can see the names of the functions I defined. If I had truly never seen the source code, I would at this point take an especial amount of interest in a function called is_valid, wouldn’t I?

Immediately below this is a “Disassembly of section .text”. I happen to know from past experience that the “.text” bit is a bit misleading for historical reasons; a “.text” section actually contains machine code! The leftmost column contains offsets (the place in the file where each instruction begins). The next column is the binary instructions themselves, represented in hexadecimal. After that are the names and parameters of each instruction (sometimes with a helpful little annotation left by objdump).

Of course, the very first thing I see is the instructions of the is_valid function.

Disassembly of section .text:

0000000100000e10 <is_valid> (File Offset: 0xe10):
   100000e10:   55                      push   %rbp
   100000e11:   48 89 e5                mov    %rsp,%rbp
   100000e14:   48 83 ec 10             sub    $0x10,%rsp
   100000e18:   48 89 7d f0             mov    %rdi,-0x10(%rbp)
   100000e1c:   48 8b 7d f0             mov    -0x10(%rbp),%rdi
   100000e20:   48 8d 35 33 01 00 00    lea    0x133(%rip),%rsi        # 100000f5a <strcmp$stub+0x4a> (File Offset: 0xf5a)
   100000e27:   e8 e4 00 00 00          callq  100000f10 <strcmp$stub> (File Offset: 0xf10)
   100000e2c:   3d 00 00 00 00          cmp    $0x0,%eax
   100000e31:   0f 85 0c 00 00 00       jne    100000e43 <is_valid+0x33> (File Offset: 0xe43)
   100000e37:   c7 45 fc 01 00 00 00    movl   $0x1,-0x4(%rbp)
   100000e3e:   e9 07 00 00 00          jmpq   100000e4a <is_valid+0x3a> (File Offset: 0xe4a)
   100000e43:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
   100000e4a:   8b 45 fc                mov    -0x4(%rbp),%eax
   100000e4d:   48 83 c4 10             add    $0x10,%rsp
   100000e51:   5d                      pop    %rbp
   100000e52:   c3                      retq   
   100000e53:   66 66 66 66 2e 0f 1f 84 00 00 00 00 00  data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)

This is super exciting because we’re about to read assembly language. There are lots of books and sites on this subject, and my own understanding of assembly language is a bit rusty from years of disuse, but I know enough to get the gist. Let’s break it down.

  • The first three instructions (the first three lines, starting with 100000e10) are a preamble that begins most functions in assembly language generated by a compiler. They’re not important for us. (They save the old frame pointer, set up a new frame pointer, and reserve space on the stack for locals.)
  • The next two instructions set up for our strcmp function. This looks a bit odd in assembly language compared to what we’re used to. The mov instructions are shifting data from a register to a location in memory and back again. Because registers are involved, the disassembly wasn’t able to hint very well what these values may be, but we can guess it’s moving the strings to compare into place. I know this because of the calling convention for the function call (basically, set up the data and then make the call, which will know where to find the data); because %rbp is the base pointer, which the function uses to find its local data; and because -0x10(%rbp) is a way of saying “look sixteen bytes earlier in memory than the address in the %rbp register.”
  • The lea instruction loads an address (100000f5a, according to objdump’s annotation) into %rsi, where the same calling convention expects the second parameter, and callq calls the strcmp function using the parameters we just moved into place. That function lives elsewhere in the system, so some magic happens here to transfer control of our program to that function.
  • By the time we reach the cmp instruction, strcmp has done its thing and stored its result in the accumulator register %eax. By convention, return values usually live in %eax, so given that we’re using a cmp (“compare”), and it’s acting on %eax and $0x0 (a zero), it’s a safe bet we’re checking whether strcmp returned zero. This instruction has the side effect of setting a flag in the processor called ZF to 1 if the two values are equal and to 0 if they are not.
  • The next instruction is jne, which is short for “jump if not equal.” It checks the ZF flag, and if it’s zero, skips ahead twelve bytes (bypassing any instructions in the intervening space).
  • That’s followed by a movl and a jmpq. These instructions move a 1 into a location in memory and skip ahead another seven bytes. Count the two-digit hexadecimal numbers to the left of these two instructions: seven bytes for the movl plus five for the jmpq. They add up to twelve!
  • Likewise, after these instructions, one other instruction moves the value 0 into the same location of memory and continues ahead. This instruction is exactly seven bytes long. So these jumps accomplish one of two things: the memory location -0x4(%rbp) is going to hold either a 1 or a 0 by the time we get to the final mov. This is how assembly language does an if, a very interesting detail we’ll return to.
  • That last mov puts the value at -0x4(%rbp) (we just saw it’s either a 1 or a 0) into %eax, which we know is going to be the return value.
  • Finally, the function undoes the work from the preamble and returns. (After that is some junk that’s never executed.)
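The byte counting in the list above can be checked mechanically. Here is a minimal sketch in Python, assuming each of these instructions ends in a four-byte little-endian displacement measured from the start of the next instruction (which matches the annotations objdump printed). The helper name relative_target is made up for illustration.

```python
import struct

def relative_target(address, machine_code):
    """Compute where a relative jump, call, or %rip-relative lea points.

    Assumes the last four bytes of the instruction are a little-endian
    signed displacement counted from the address of the next instruction.
    """
    raw = bytes.fromhex(machine_code)
    (displacement,) = struct.unpack("<i", raw[-4:])
    return address + len(raw) + displacement

# Addresses and bytes copied from the is_valid listing.
targets = {
    "lea": relative_target(0x100000e20, "48 8d 35 33 01 00 00"),
    "callq": relative_target(0x100000e27, "e8 e4 00 00 00"),
    "jne": relative_target(0x100000e31, "0f 85 0c 00 00 00"),
    "jmpq": relative_target(0x100000e3e, "e9 07 00 00 00"),
}

for name, target in targets.items():
    print(f"{name:6s} -> {target:x}")
```

Each computed target should agree with the address objdump shows to the right of the corresponding instruction, which is a decent sign that the "skip ahead N bytes" reading is the right one.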

That was a lengthy explanation, so to sum up, we learned that the binary executable has a function called is_valid, and this function calls strcmp with some values and returns either a 1 or a 0 based on its return value. That’s a pretty accurate picture based on what we know of the source code, so I’m pleased as punch!
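To make that summary concrete, here is a rough Python model of what happens after strcmp returns. It is only a sketch of the control flow as I read it from the listing, not the original source; the name is_valid_model and its parameter are invented for illustration.

```python
def is_valid_model(strcmp_result):
    """Mimic the instructions in is_valid that run after strcmp returns."""
    zero_flag = (strcmp_result == 0)  # cmp $0x0,%eax sets ZF when the values are equal
    if not zero_flag:                 # jne jumps when ZF is 0
        local = 0                     # movl $0x0,-0x4(%rbp)
    else:
        local = 1                     # movl $0x1,-0x4(%rbp), then jmpq past the other movl
    return local                      # mov -0x4(%rbp),%eax, then retq

print(is_valid_model(0), is_valid_model(-1))
```

A strcmp result of zero (the strings match) produces a 1, and anything else produces a 0.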

Directly below the definition for this function is the main function. It’s longer, but it’s no more complex. It does the same basic tasks of moving values around, calling functions, inspecting the values, and branching based on this. Again, the values are difficult to get insight into because many registers are used, and there’s a bit more setup. For the sake of brevity, I’ll leave analyzing this function as an exercise for the reader (I promise it won’t be on the test).

Breaking the Program

Remember, we don’t have the slightest idea what the password is, and there’s no good indication from the disassembly what it might be. Now that we have a good understanding of how the program works, we stand a good chance of modifying the program so that it believes any password is correct, which is the next best thing.

We can’t modify this disassembly listing itself. It’s output from objdump meant to help us understand the machine code (the stuff in the second column). We have to modify the program file itself by finding and changing those hexadecimal numbers somewhere in the file.

After looking over how both is_valid and main work, there are lots of opportunities to change the flow of the program to get the result we want, but we have to stay within a few rules. Notice how a lot of the instructions specify where other parts of the program are in terms of relative counts of bytes? That means that we can’t change the number of bytes anywhere, or else we’d break all the symbol references, section locations, jumps, offsets, and so on. We also need to put in numbers which are valid processor instructions so that the program doesn’t crash.

If this were your first program, I’d be forced to assume you wouldn’t know what numbers mean what to the processor. Luckily, the disassembly gives us hints on how to attack it. Let’s confine our possibilities (such as changing jump logic or overwriting instructions with dummy instructions) to only those we can exploit by looking at this disassembly itself. There isn’t a lot of variety here.

To me, one neat thing about is_valid stands out. Two of the lines are extremely similar: movl $0x0,-0x4(%rbp) and movl $0x1,-0x4(%rbp). They do complementary things with the same memory location, use the same number of bytes (seven), involve the same setup, are near one another, and directly set up the return value for is_valid. This says to me the machine code for each instruction would be interchangeable, and by changing one or the other, we can directly change the return value for is_valid to whatever we want. It’s a safe bet, with a function named that, we want it to return a 1, but if we weren’t sure, we could look ahead to the main function and see how its return value gets used later on.

In other words, we want to change movl $0x0,-0x4(%rbp) to be movl $0x1,-0x4(%rbp) so that no matter what, is_valid returns a one. The machine code for the instruction we have is c7 45 fc 00 00 00 00. Conveniently, the machine code for that precise instruction we want is just two lines above: c7 45 fc 01 00 00 00. The last challenge ahead is to find these bytes in the actual file and change them.
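A quick way to confirm that the two encodings differ in exactly one place is to compare them byte by byte. This is a small sketch using only the two byte strings quoted above; the variable names are mine.

```python
current = bytes.fromhex("c7 45 fc 00 00 00 00")  # movl $0x0,-0x4(%rbp)
wanted = bytes.fromhex("c7 45 fc 01 00 00 00")   # movl $0x1,-0x4(%rbp)

# The patch is only safe if both encodings have the same length.
assert len(current) == len(wanted)

# Collect (index, old byte, new byte) for every position that differs.
differences = [
    (index, old, new)
    for index, (old, new) in enumerate(zip(current, wanted))
    if old != new
]
print(differences)  # [(3, 0, 1)]
```

Only the byte at index 3 (the fourth byte, counting from one) changes, from 00 to 01.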

Where in the file are these bytes? Note that the listing says “File Offset: 0xe10” for the function is_valid. That’s actually the count of bytes into the file we’d find the first instruction for this function (3600 bytes, in decimal), and the offset in the left column for the first instruction is “100000e10”, so those offsets in the left column look like they tell where in the file each instruction’s machine code is. The instruction we care about is at “100000e43”, so it must start 3651 bytes into the file. We only need to change the fourth byte of the instruction, which sits three bytes further along at offset 3654. Counting from one, that makes it the 3655th byte of the file.
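Hexadecimal arithmetic is easy to fumble by hand, so here is the same calculation as a short Python sketch. It assumes, as the listing suggests, that an instruction’s distance from the start of is_valid is the same in the file as in the left-hand column; the constant names are mine.

```python
FUNCTION_ADDRESS = 0x100000e10     # left-column address of is_valid's first instruction
FUNCTION_FILE_OFFSET = 0xe10       # "File Offset" objdump reports for is_valid
INSTRUCTION_ADDRESS = 0x100000e43  # movl $0x0,-0x4(%rbp)
BYTE_WITHIN_INSTRUCTION = 3        # zero-based index of the instruction's fourth byte

# Where the instruction starts in the file, and which byte we want to change.
instruction_offset = FUNCTION_FILE_OFFSET + (INSTRUCTION_ADDRESS - FUNCTION_ADDRESS)
patch_offset = instruction_offset + BYTE_WITHIN_INSTRUCTION

# hexdump -C prints 16 bytes per row, so this predicts where to look.
row, column = divmod(patch_offset, 16)

print(instruction_offset, patch_offset)  # 3651 3654
print(f"{row * 16:08x}", column + 1)     # 00000e40 7
```

So the byte to change should be the seventh one on the hexdump row labeled 00000e40.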

Using hexdump -C program | less and scrolling ahead a bit, I find a line like this one:

00000e40  00 00 00 c7 45 fc 00 00  00 00 8b 45 fc 48 83 c4  |....E......E.H..|

Sure enough, there’s the instruction, and the seventh byte on this line is the one we want to change. Patching a binary file from the command line is sort of difficult, but this command should do the trick:

printf '\x01' | dd of=program bs=1 seek=3654 count=1 conv=notrunc

dd is writing to the file program (of=program), using a block size of one byte so that everything is counted in bytes (bs=1), skipping ahead 3654 bytes to land on the 3655th (seek=3654), changing only one byte (count=1), and not truncating the rest of the file (conv=notrunc).
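For anyone who would rather not trust a dd one-liner with a binary, the same patch can be sketched in Python. This version is a little more cautious: it checks that the byte currently holds the value we expect before overwriting it. The function patch_byte is hypothetical, and the demonstration runs on a throwaway stand-in file (zero padding followed by the hexdump row shown earlier) rather than on the real program.

```python
import os
import tempfile

def patch_byte(path, offset, expected, replacement):
    """Overwrite one byte in place, but only if it currently equals `expected`."""
    with open(path, "r+b") as f:  # r+b opens for reading and writing without truncating
        f.seek(offset)
        found = f.read(1)
        if found != bytes([expected]):
            raise ValueError(f"byte at offset {offset} is {found.hex()}, not {expected:02x}")
        f.seek(offset)
        f.write(bytes([replacement]))

# Stand-in file: 0xe40 bytes of padding, then the row from the hexdump output.
row = bytes.fromhex("00 00 00 c7 45 fc 00 00 00 00 8b 45 fc 48 83 c4")
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"\x00" * 0xe40 + row)

patch_byte(scratch.name, 3654, 0x00, 0x01)

with open(scratch.name, "rb") as f:
    f.seek(0xe40)
    patched_row = f.read(16)
os.remove(scratch.name)

print(patched_row.hex(" "))
```

On the real file, the equivalent call would be patch_byte("program", 3654, 0x00, 0x01), ideally on a copy of the binary so the original stays intact.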

Now I’ll run the program the same way we did before (./program) and see if this worked.

Please input a word: butts
That's correct!

Success!

Conclusions

That’s about it. It’s a contrived example, and I knew it would work out before the end, but this is a great way to start learning how programs are compiled, how processors work, and how software cracking happens. The concepts here also apply to understanding how many security exploits work on a mechanistic level.

Disclosing and Consequences

Before writing “Disclosing,” I would’ve given anything to peek into the future and see this post I’m about to write. I was fearful of the consequences of putting information out in the world that I could never take back. I don’t know what I expected. I just know I’ve never been so worked up about a piece of apparent non-information ever.

Afterwards, I was happy to have ripped the bandage off and have done with it. It did ease my anxiety in a lot of ways. I’ve formed a lot of habits around controlling information about my private life (even up to being cagey about my full name), and it’s freeing to lower that boundary.

It reminds me of the attitude I carried with me early in my transition, about the importance of visibility. It was important to talk to people, even do activism (including lecturing before doctors and nurses). I didn’t necessarily like the position I was in, but I knew that I had had so much false garbage in my head about transsexuality growing up that I went through years of needless self-inflicted pain. It felt good to shed that, once again.

The long and short of the actual response was that nothing happened at all. There were no consequences whatsoever, whether good or bad. The tweet got a few supportive replies. (Many people missed it entirely and possibly are learning about it from this post.)

One other nice consequence of all this is that it might be possible now to revive some of my past writing from about five years ago that I had to hide away. I learned a ton; no reason not to share that now.

Clarity Through Static Typing

I can’t seem to find much discussion online contrasting dynamic and static typing as teaching tools. Others have covered the technical merits up and down, but I wanted to make a case for static typing for teaching new programmers.

It’s true that it’s easier, even necessary, to elide abstract concepts like types when first starting out. Dynamically typed programming languages (like Python, Ruby, or JavaScript) allow learners to get started quickly and see results right away. It’s important not to underestimate the importance of that quick, tight feedback loop! While getting started, students don’t need to know that “Hello World” is skating on layers of abstractions on the way to the screen.

At the risk of veering into criticizing dynamic typing itself (which isn’t my intention!) languages like Ruby and Python unfortunately also lengthen the feedback cycle between making certain kinds of mistakes and seeing an error produced from them. In the worst cases, the error becomes much more difficult to understand when it occurs. Testing becomes crucial to ferret out these kinds of errors.

That’s a relatively minor concern of mine, though. I’m more concerned about what happens when a student turns into a new programmer interacting with a non-trivial system. It’s inevitable that a new programmer will have to learn an existing complex system—if not on the job, then at the least while learning a web framework. At this point, she will have to use or modify some part of the system before understanding the whole. In other words, a new programmer will have to point at a symbol or word on the screen and ask, “What is that?”

In a language like Ruby or Python, it literally takes longer to figure out what a variable on the screen represents, and it sometimes requires searching through many files and holding many abstractions in your head to understand any non-trivial piece of source code. Using or modifying a complex system requires deeper and more expert knowledge of the system. It’s for this reason that I feel static typing helps peel away abstractions. It also makes information about the system more explicit, closer at hand, and more readily searchable.
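As a small illustration of the “What is that?” problem, consider the sketch below. Everything in it (Order, LineItem, the two functions) is hypothetical and invented for this example. It uses Python’s optional type annotations, which the interpreter does not enforce; an external checker such as mypy is needed to get anything like static checking, so treat this as a taste of explicit types rather than the real thing.

```python
from dataclasses import dataclass
from typing import List

# Without annotations, the reader has to go hunting: is `order` a dict,
# an object, a list of line items? Nothing on the screen says.
def total_untyped(order):
    return sum(item.price * item.quantity for item in order.items)

@dataclass
class LineItem:
    price: int     # hypothetical convention: price in cents
    quantity: int

@dataclass
class Order:
    items: List[LineItem]

# With annotations, the signature answers the question where it is asked.
def total(order: Order) -> int:
    return sum(item.price * item.quantity for item in order.items)

print(total(Order([LineItem(250, 2), LineItem(100, 1)])))  # 600
```

Both functions behave identically at run time; the difference is only in how much the reader can learn about order without leaving the line they are looking at.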

I find it ironic in the case of Python especially. “Explicit is better than implicit,” say the Pythonistas—except when it comes to types?

Disclosing

I’ve dog-whistled this relatively loudly already, but just so everyone’s on the same page—I have a transsexual history. Reach out to me privately if you have questions, but I’ll cover a few points here.

  • To clarify, I’m a woman, and I consider myself transsexual. Specifically, I say I have a transsexual history. I also consider myself homosexual, attracted primarily to women. I consider intersexuality as part of my history, but I don’t claim intersex as an identity (a really complicated topic).

  • I have a complex relationship with this history, my body, and my gender, which includes a history of activism, lots of therapy, and in general, lots of feelings. Consider the delicacy this implies if engaging me on the topic.

  • It’s cliché, but if you didn’t know my history before, this changes nothing you know about me.

  • I prefer to retain whatever control possible over this information. I understand this post constitutes a public announcement, and that necessarily means I’ve sacrificed most control, but when possible, avoid assumptions about my history, my body, or my gender. Point people to me for clarification or questions.

I’m doing this now for a few reasons.

  • First of all, I trust the people around me in my life and in my work enough that I feel this disclosure won’t risk me bodily, psychologically, or financially.

  • Also, it’s pained me for a very long time to keep the amount of distance I need to dissimulate my history. It’s prevented me from explaining much about why family isn’t in my life, why I’m in Portland in the first place, or what my life has been about in the past.

  • It frees me to pursue medical interventions without having to come up with a weird cover story.

  • It gives me a voice, once again, on issues of transsexuality and gender which I used to self-censor out of fear of speaking out.

  • Finally, it reaffirms why I did this in the first place. The goal was always to look and feel more like who I’ve always been, not just to sell an identity or history to others.

Most of the Mistakes

When I interviewed at Simple, I wanted to get across one very important thing—if I got hired, I would begin by making every possible mistake, but I would only make each mistake once.

I think enough time has passed that I’ve managed to make most of the mistakes I needed to get past feeling stuck. Five months in, and I’m finally feeling like I’m contributing steadily. My world at work has also widened, putting me in contact with other teams.

One of the major shortcomings of my last job was that I found I spent so much time helping with maintenance that I never got to create things. Programming actually became a rare part of my job. I spend almost every day at Simple actually programming, and I’m really thankful for that. Not necessarily because I enjoy programming in and of itself (that comes and goes) but because I get to have a say in how things work, and I get to help drive us forward.


© 2017 Emily St*
