Introduction
Before beginning to hack, it might be a good idea to find out a little about computers and networking (*doh*). Anyone who already has a firm grasp of the basic principles of computing in the 21st century can happily skip this chapter, or dip into it as they wish. This chapter is a very brief introduction to computers, and for further reading see Chapter 14: Learning More.
Even with all the increases in speed, memory capability, storage capacity and graphics in the last 30 years, the basic block design for a computer has hardly changed. Whether it is in a Mac-sized box, a PC-sized box, or inside a room the size of a football field with air-conditioning and a zillion white-suited attendants, the majority of computers adhere to a very similar design.
Computer Architecture
At the heart of the computer is the Central Processing Unit (CPU) which takes computer commands and data and acts upon them. Programs and data are stored in the STORAGE unit, which used to be paper tape, but is now floppy or hard disk. When the CPU needs a program stored on the storage unit, it loads some or all of it into MEMORY and then proceeds to execute the instructions it finds. A program is just a set of instructions for the computer to execute, telling it to perform some task and to send the results to an OUTPUT device which can be a plotter, printer, VDU, network or whatever. Discussing computer fundamentals at this point will take up too much valuable time and space, so let's move on to a subject that is far more useful to a hacker: computer storage and binary numbers.
Figure 1: Architecture of the majority of the world's computers.
Bits, Bytes and Hexadecimal
Now you might not really want to learn this stuff, because it's hard at first glance and it can spin your head out, but a good grasp of this will stand you in good stead if you're serious about becoming a hacker. Basically you *need* this stuff, because if you can't wrap your head around it, using many hacker tools will be impossible, and the nifty stuff like "stack overflow" and "IP spoofing" will be incomprehensible. You could use a Hex to Decimal table, or invest in a good scientific calculator with a hex/octal/binary/decimal converter function, but the best way of really understanding this stuff is to write your own conversion program from scratch in your favourite language.
Binary
Because of the nature of storage devices in computers, the only way a computer has of storing numbers is as a binary system where each digit, or "bit", can only have two possible values, 0 or 1. How can a computer store numbers larger than 1 then?
Table 1.1: Binary numbers use 1 and 0 in the fields to store numbers. LSB denotes the Least Significant Bit or lowest possible value, while MSB is the Most Significant Bit or highest possible value.
As you can see from the diagram, each successive digit to the left is worth double the value of the previous digit, because each position represents the next power of 2. Why 2? Well, binary is a base-2 system, so each digit can only have two values, 0 and 1, unlike our normal base-10 or decimal system where each digit can have 10 possible values, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. However, the principle is just the same. Just as each successive digit to the left in base-10 arithmetic is worth the next power of 10 (remember that stuff about hundreds, tens and units from school?), so each successive digit in binary is worth the next power of 2. A comparison chart between base-10 digits and binary digits is below.
Decimal | 8 | 4 | 2 | 1 |
0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 1 |
2 | 0 | 0 | 1 | 0 |
3 | 0 | 0 | 1 | 1 |
4 | 0 | 1 | 0 | 0 |
5 | 0 | 1 | 0 | 1 |
6 | 0 | 1 | 1 | 0 |
7 | 0 | 1 | 1 | 1 |
8 | 1 | 0 | 0 | 0 |
9 | 1 | 0 | 0 | 1 |
10 | 1 | 0 | 1 | 0 |
11 | 1 | 0 | 1 | 1 |
12 | 1 | 1 | 0 | 0 |
13 | 1 | 1 | 0 | 1 |
14 | 1 | 1 | 1 | 0 |
15 | 1 | 1 | 1 | 1 |
Table 1.2: Binary numbers from 0 to 15 use four binary bits
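If you want to check the table for yourself, the place-value idea can be sketched in a few lines of Python. This is only an illustration, not the one true way to do it: the function name `binary_to_decimal` is made up for this example, and it assumes the input is a string containing nothing but 0s and 1s.

```python
def binary_to_decimal(bits):
    """Add up the place values of every bit that is set to 1.

    `bits` is assumed to be a string of '0' and '1' characters,
    written with the most significant bit on the left.
    """
    total = 0
    # Walk from the rightmost bit (worth 1) to the leftmost,
    # doubling the place value at each step: 1, 2, 4, 8, ...
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


# A few rows from the four-bit table above.
for row in ("0000", "0101", "1010", "1111"):
    print(row, "=", binary_to_decimal(row))
```

Python can already do this with `int("1010", 2)`, but writing the loop out by hand is the point of the exercise.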
Thus, to represent any number in this system, just break it down into the bit values that add up to it. Try a couple of examples:
(i) We need to represent the number 147 as a binary number. To do this we need to break down 147 into chunks that can be represented in binary, ie 128+16+2+1 = 147, so the bits corresponding to those values in the binary number are flipped to the "on" position giving 10010011.
Table 1.3: Decimal 147 is binary 10010011
(ii) We need to represent the number 31337 as a binary number. Once again we need to break down the number into the chunks that can be represented in binary, but this time there aren't enough values in the eight-bit binary number, often called a "byte" or "octet", to represent the much larger number. The solution is to place two bytes side by side to form a "word" and then to treat the leftmost byte as a continuation of the rightmost. This now gives us a binary number that can store a number up to 65535, so 31337 can be represented as 16384+8192+4096+2048+512+64+32+8+1, which when all the relevant bits are flipped to "1" becomes 0111101001101001.
Decimal | Bit |
1 | 1 |
2 | 0 |
4 | 0 |
8 | 1 |
16 | 0 |
32 | 1 |
64 | 1 |
128 | 0 |
256 | 0 |
512 | 1 |
1024 | 0 |
2048 | 1 |
4096 | 1 |
8192 | 1 |
16384 | 1 |
32768 | 0 |
Table 1.4: A double byte is called a binary word and can store values up to 65535. The diagram illustrates the decimal number 31337 as 0111101001101001.
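The "break it into chunks" method used in both examples can be sketched as a short Python program. Treat it as one possible approach under some assumptions: the names `decimal_to_binary` and `split_word` are invented for this illustration, and the code only handles non-negative numbers that fit in the chosen number of bits.

```python
def decimal_to_binary(value, width=8):
    """Build a binary string by subtracting the largest place values first."""
    if not 0 <= value < 2 ** width:
        raise ValueError("value does not fit in %d bits" % width)
    bits = []
    # Try each place value in turn: 128, 64, 32, ... for a byte.
    for position in range(width - 1, -1, -1):
        weight = 2 ** position
        if value >= weight:
            bits.append("1")   # this chunk is needed, so flip the bit on
            value -= weight
        else:
            bits.append("0")
    return "".join(bits)


def split_word(value):
    """Split a 16-bit word into its (leftmost, rightmost) bytes."""
    if not 0 <= value <= 65535:
        raise ValueError("value does not fit in a two-byte word")
    # The leftmost byte counts in units of 256; the remainder is the rightmost byte.
    return divmod(value, 256)


print(decimal_to_binary(147))              # 10010011
print(decimal_to_binary(31337, width=16))  # 0111101001101001

high, low = split_word(31337)
print(decimal_to_binary(high) + decimal_to_binary(low))  # 0111101001101001
```

Converting the two bytes separately and placing them side by side gives the same sixteen bits as converting the whole word in one go, which is exactly what treating one byte as a continuation of the other means.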
Hexadecimal
Strangely enough, human beings aren't used to representing numbers like 147 as 10010011, so the following system was devised to make binary representations easier to read. Hexadecimal is a base-16 number system, meaning that each digit can have sixteen different values, 0 to 15. These values are written as the single digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F. This looks no easier than binary at first glance, but decimal 147 becomes much easier to read as 93 (hex), often written as 0x93 with the "0x" part signifying a hex number, than as 10010011. However, the real gain is that each hex digit maps to exactly half a byte (4 bits), allowing much faster recognition, readout and conversion of binary numbers in hex form.
Decimal | Binary | Hexadecimal |
0 | 0000 | 0x0 |
1 | 0001 | 0x1 |
2 | 0010 | 0x2 |
3 | 0011 | 0x3 |
4 | 0100 | 0x4 |
5 | 0101 | 0x5 |
6 | 0110 | 0x6 |
7 | 0111 | 0x7 |
8 | 1000 | 0x8 |
9 | 1001 | 0x9 |
10 | 1010 | 0xA |
11 | 1011 | 0xB |
12 | 1100 | 0xC |
13 | 1101 | 0xD |
14 | 1110 | 0xE |
15 | 1111 | 0xF |
Table 1.5: Decimal, binary and hexadecimal comparison table
So, going back to the 31337 example which was 0111101001101001 in binary, we can now represent it as four hexadecimal digits by breaking 0111101001101001 into 0111, 1010, 0110, and 1001 to get 7A69, which is quicker to read, easier to remember and parses back into binary simply by knowing which bit patterns correspond to which hexadecimal digit. In addition to this, hex is more fun. Keep an eye out for Novell internal network numbers that run 0xDEADBEEF or 0x1BADBABE, and when you see one you know that someone hackish set up the system.
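The four-bits-per-digit trick behind 7A69 is easy to try out. The sketch below is a rough illustration only: `binary_to_hex` and `hex_to_binary` are names made up for this example, and the code assumes its input is a clean string of binary or hex digits with no "0x" prefix.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_hex(bits):
    """Convert a binary string to hex, four bits at a time."""
    # Pad on the left with zeros so the length is a multiple of four.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    digits = []
    for start in range(0, len(padded), 4):
        group = padded[start:start + 4]           # one half-byte, e.g. "1010"
        digits.append(HEX_DIGITS[int(group, 2)])  # look up its hex digit
    return "".join(digits)


def hex_to_binary(hex_digits):
    """Expand each hex digit back into its four-bit pattern."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_digits)


print(binary_to_hex("0111101001101001"))  # 7A69
print(binary_to_hex("10010011"))          # 93
print(hex_to_binary("7A69"))              # 0111101001101001
```

Notice that neither function ever works out the decimal value of the whole number. Each group of four bits is handled on its own, which is why hex is so quick to read by eye.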
Octal
Octal is almost obsolete these days, but you are likely to come across it when trying to work out UNIX file permissions, as UNIX file permissions are built from three-bit "bitmasks" which define who can do what with the file, and which can be mapped to octal very conveniently.
As you might guess from the name, octal is a base-8 numbering system, meaning that each digit can represent eight values, 0, 1, 2, 3, 4, 5, 6, 7. If you recall from the discussion of binary earlier, you will know that the three rightmost bits of a binary number can represent numbers from 0-7, so octal is a shorthand form for 3-bit numbers, just as hexadecimal is a shorthand form for 4-bit numbers. This means that when encountering an octal number like 357, it can be converted into binary by writing out the bits as 3=011, 5=101 and 7=111, giving a binary coding for octal 357 as 011101111.
Binary | Octal |
000 | 0 |
001 | 1 |
010 | 2 |
011 | 3 |
100 | 4 |
101 | 5 |
110 | 6 |
111 | 7 |
Table 1.6: Binary and octal number values
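Here is a small sketch of the octal-to-binary conversion, along with how it ties in with UNIX file permissions. The function names are invented for this example. The read/write/execute meaning of the three bits (worth 4, 2 and 1) is the usual UNIX convention rather than something spelled out in the text above, so treat that part as an assumption of the example.

```python
def octal_to_binary(octal_digits):
    """Expand each octal digit into its three-bit pattern."""
    return "".join(format(int(digit, 8), "03b") for digit in octal_digits)


def permission_string(octal_digits):
    """Show each octal digit as a read/write/execute triple.

    Assumes the usual UNIX convention: 4 = read, 2 = write, 1 = execute.
    """
    letters = []
    for digit in octal_digits:
        value = int(digit, 8)
        # '&' tests whether one particular bit is set in the digit.
        letters.append("r" if value & 4 else "-")
        letters.append("w" if value & 2 else "-")
        letters.append("x" if value & 1 else "-")
    return "".join(letters)


print(octal_to_binary("357"))    # 011101111
print(permission_string("755"))  # rwxr-xr-x
print(permission_string("644"))  # rw-r--r--
```

Each octal digit covers one class of user, so a three-digit number like 755 packs three separate three-bit masks into something short enough to remember.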
ASCII
The final thing to mention in this section is the mysterious "ASCII" which stands for American Standard Code for Information Interchange. ASCII is a way of representing alphanumeric symbols by assigning the lowest seven bits of a byte to a known symbol guaranteeing that any computer program that reads and writes in ASCII has a consistent mapping of bytes to alphanumeric characters. To see how useful this is, imagine that there were no ASCII, and instead web sites had to create their own character-to-byte mappings. This would cause chaos, with some sites choosing one mapping, and other sites choosing other mappings, but the web browser would have to understand *all* the different mappings.
ASCII gets around this problem by providing a standard mapping that most computers use, allowing text from many different computer systems to be displayed on many other computer systems easily. There are other ways of mapping bytes to characters, but with a bit of luck you'll never hear about them, or by the time you meet another character code you'll be a seasoned hacker.
ASCII Table |
Oct | Dec | Hex | Char | Oct | Dec | Hex | Char |
000 | 0 | 00 | NUL | 100 | 64 | 40 | @ |
001 | 1 | 01 | SOH | 101 | 65 | 41 | A |
002 | 2 | 02 | STX | 102 | 66 | 42 | B |
003 | 3 | 03 | ETX | 103 | 67 | 43 | C |
004 | 4 | 04 | EOT | 104 | 68 | 44 | D |
005 | 5 | 05 | ENQ | 105 | 69 | 45 | E |
006 | 6 | 06 | ACK | 106 | 70 | 46 | F |
007 | 7 | 07 | BEL | 107 | 71 | 47 | G |
010 | 8 | 08 | BS | 110 | 72 | 48 | H |
011 | 9 | 09 | HT | 111 | 73 | 49 | I |
012 | 10 | 0A | LF | 112 | 74 | 4A | J |
013 | 11 | 0B | VT | 113 | 75 | 4B | K |
014 | 12 | 0C | FF | 114 | 76 | 4C | L |
015 | 13 | 0D | CR | 115 | 77 | 4D | M |
016 | 14 | 0E | SO | 116 | 78 | 4E | N |
017 | 15 | 0F | SI | 117 | 79 | 4F | O |
020 | 16 | 10 | DLE | 120 | 80 | 50 | P |
021 | 17 | 11 | DC1 | 121 | 81 | 51 | Q |
022 | 18 | 12 | DC2 | 122 | 82 | 52 | R |
023 | 19 | 13 | DC3 | 123 | 83 | 53 | S |
024 | 20 | 14 | DC4 | 124 | 84 | 54 | T |
025 | 21 | 15 | NAK | 125 | 85 | 55 | U |
026 | 22 | 16 | SYN | 126 | 86 | 56 | V |
027 | 23 | 17 | ETB | 127 | 87 | 57 | W |
030 | 24 | 18 | CAN | 130 | 88 | 58 | X |
031 | 25 | 19 | EM | 131 | 89 | 59 | Y |
032 | 26 | 1A | SUB | 132 | 90 | 5A | Z |
033 | 27 | 1B | ESC | 133 | 91 | 5B | [ |
034 | 28 | 1C | FS | 134 | 92 | 5C | \ |
035 | 29 | 1D | GS | 135 | 93 | 5D | ] |
036 | 30 | 1E | RS | 136 | 94 | 5E | ^ |
037 | 31 | 1F | US | 137 | 95 | 5F | _ |
040 | 32 | 20 | SPACE | 140 | 96 | 60 | ` |
041 | 33 | 21 | ! | 141 | 97 | 61 | a |
042 | 34 | 22 | " | 142 | 98 | 62 | b |
043 | 35 | 23 | # | 143 | 99 | 63 | c |
044 | 36 | 24 | $ | 144 | 100 | 64 | d |
045 | 37 | 25 | % | 145 | 101 | 65 | e |
046 | 38 | 26 | & | 146 | 102 | 66 | f |
047 | 39 | 27 | ' | 147 | 103 | 67 | g |
050 | 40 | 28 | ( | 150 | 104 | 68 | h |
051 | 41 | 29 | ) | 151 | 105 | 69 | i |
052 | 42 | 2A | * | 152 | 106 | 6A | j |
053 | 43 | 2B | + | 153 | 107 | 6B | k |
054 | 44 | 2C | , | 154 | 108 | 6C | l |
055 | 45 | 2D | - | 155 | 109 | 6D | m |
056 | 46 | 2E | . | 156 | 110 | 6E | n |
057 | 47 | 2F | / | 157 | 111 | 6F | o |
060 | 48 | 30 | 0 | 160 | 112 | 70 | p |
061 | 49 | 31 | 1 | 161 | 113 | 71 | q |
062 | 50 | 32 | 2 | 162 | 114 | 72 | r |
063 | 51 | 33 | 3 | 163 | 115 | 73 | s |
064 | 52 | 34 | 4 | 164 | 116 | 74 | t |
065 | 53 | 35 | 5 | 165 | 117 | 75 | u |
066 | 54 | 36 | 6 | 166 | 118 | 76 | v |
067 | 55 | 37 | 7 | 167 | 119 | 77 | w |
070 | 56 | 38 | 8 | 170 | 120 | 78 | x |
071 | 57 | 39 | 9 | 171 | 121 | 79 | y |
072 | 58 | 3A | : | 172 | 122 | 7A | z |
073 | 59 | 3B | ; | 173 | 123 | 7B | { |
074 | 60 | 3C | < | 174 | 124 | 7C | | |
075 | 61 | 3D | = | 175 | 125 | 7D | } |
076 | 62 | 3E | > | 176 | 126 | 7E | ~ |
077 | 63 | 3F | ? | 177 | 127 | 7F | DEL |
Table 1.7: ASCII
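You don't have to take the table on trust. A few lines of Python will rebuild any row of it, using the built-in `ord()` function, which returns the numeric code of a character. The helper names `ascii_row` and `text_to_codes` are made up for this sketch, and it only deals with printable characters in the seven-bit range.

```python
def ascii_row(char):
    """Format one character as an 'Oct | Dec | Hex | Char' table row."""
    code = ord(char)  # the number the computer actually stores
    if code > 127:
        raise ValueError("not a seven-bit ASCII character")
    return f"{code:03o} | {code} | {code:02X} | {char}"


def text_to_codes(text):
    """Return the list of byte values used to store `text` in ASCII."""
    return list(text.encode("ascii"))


for char in "Hack":
    print(ascii_row(char))

print(text_to_codes("Hi"))  # [72, 105]
```

Going the other way, `chr(65)` gives back "A", so the mapping works in both directions.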
Common Operating Systems
Once upon a time people hacked to get access to a computer. Now, in the days of Cyber Cafes, free ISPs, and cheap computers, getting access to a computer and the Internet is no longer a problem. You need remarkably little computing power to begin hacking. In the early days we all used slow, 8-bit machines with limited memory and cassette tape drives for storage. Purchasing a modem meant paying as much as buying a new computer does now, so we all learned very early on how to build, maintain and use computers built around obsolete, scrounged, junk or cheap kit, and then proceeded to write the programs we wanted ourselves.
So even if you have no money, don't give up! Car boot or garage sales are a source of cheap (if old) computers, and I still run my Pentium 166MMX in a case designed for an IBM 8086 AT machine. Junk can be useful to a hacker in all sorts of ways. My hallway is currently home to a 486SX machine + EGA monitor that I picked up when it was being chucked out. It doesn't sound like much, but for someone with a limited budget, once loaded with the right tools, the 486SX can be a better hacker's machine than the unaffordable Pentium III running WinDoze 95.
For a hacker, ANY computer is better than NO computer!
Let's just have a brief look at some of the operating systems that a computer hacker is likely to come up against.
MS-DOS
For those old, old, old PCs, DOS is it. There are lots of hacking and phreaking tools written that run under DOS (see Chapter 4: The Hacker's Toolbox), and everything from old 8088/8086s right up to the newest, sexiest P3s run it. Best of all, all those ancient luggables, portables and laptops from a few years ago are now so cheap they can be had for a few dollars, or picked up out of a skip. A laptop or portable is an essential tool for learning and exploring hacking and, if you are on a limited budget, learning DOS is going to pay off handsomely later. Currently old 486 machines are about $20-40 in the US, or £15-35 in the UK, from dealers specializing in obsolete kit. These are quite suitable for BBSs, dumb terminals, running LINUX, and making up those holes in your home LAN so that you have enough access for that QUAKE/DOOM/MUD party you and your friends have always planned, but never got round to.
Windows 3.1
For PCs of 80386 class or above, Windows 3.1 is still a viable choice. It is faster than Windows 95 and supports a large base of hacker tools, and there are still a very large number of sites running Win3.1. If you are stuck with it, then use it, but if you have a thirst for knowledge and want to learn about computers, use LINUX.
Windows 95/98
This one you need to know by necessity, because almost 100 per cent of all manufactured PCs are being shipped out with this operating system. Most systems you will find in a corporate or university setting will run Win95/98, so you need to know Win95/98 system (in)security. Know it by all means but, unless you have to use it for some reason, go for LINUX instead.
Windows NT
Considerably more robust than Win95, and requiring more resources, this is an OS that no serious hacker worth their salt should ignore. Insecure, power-hungry and resource-grabbing (and those are the good points), NT can be found the length and breadth of the Internet. I have successfully run NT on a 486DX-66 with 64 Mb of main memory and it only degraded when I put other resource-hungry programs on it. Best of all, NT has been host to many security holes and makes a worthy addition to any hacker's LAN as you try to hack, crack and secure it against the myriad of exploits and Denial of Service (DoS) attacks that are floating around the web. If you really need NT, don't forget to make it dual booting so you can run LINUX on a spare partition when you need to do some real hacking.
UNIX
Found in large corporations, banks, insurance companies, universities, phone and networking companies and the military, UNIX has been the hacker's OS of choice for as long as I can remember. Arcane command lines, cryptic help messages and a multiplicity of variants mean that UNIX has the reputation of being "as friendly as a cornered rat". However, this hides an elegant operating system that begs to be hacked by both black and white-hat hackers alike. I love it. Other hackers love it. You should learn to love it.
Mac OS
Of all the "consumer"-directed operating systems, this is the joker in the pack. Only Apple's high prices and proprietary parts prevented the Mac from becoming the computer on every hacker's desk. It has loads of hack/phreak tools, and older models running on the Motorola 680XX series of CPUs can be picked up very cheaply. However, it doesn't lend itself to upgrade and repair as well as a PC does, so if you are on a limited budget, this may not be the machine for you.
LINUX
In the hacker world Linus Torvalds is the closest we have to a god apart from Richard Stallman. Prior to LINUX the only realistic UN*X variant for 80386-class machines and above was SCO UNIX, which cost mega-bucks. Now everyone can run an operating system with open source, running GNU tools that are equally open-source, modifying and changing the source code as we see fit. LINUX runs the web ... need I say more? LINUX is THE hacker's OS and it's free. Download it and install it now ... if not sooner.
X-Windows
Not really an operating system as such, but a GUI extension to UNIX and LINUX. The only reason I mention it is that (a) Microsoft could have learned a few lessons from X-Windows, and (b) it's riddled with security holes. A while ago you needed a $5,000+ workstation or an X-Windows terminal to play with it, but now you can use X-Free86 on LINUX and have some fun. Yet another reason why LINUX rocks hard.
Common Languages
If you are hacking, then sooner or later you will need to do some programming. Either the tools you require don't exist, or the existing tools don't do everything that you want, or maybe you just have an idea for some code that would be fun to write. Getting to grips with several computer languages is going to improve your hacking skills and teach you more about computers. Here are some of the more common languages and what they are useful for.
BASIC
There was a time when every microcomputer was shipped with a small BASIC interpreter, and many hackers cut their first code using BASIC. BASIC is a simple language that is easy to program and allows small programs to be written very quickly. The disadvantages of most BASICs are that they are slow and lack any proper structured programming features. BASIC has recently made a comeback as Microsoft Visual Basic, which is certainly visual, but is anything but basic. Some applications come with a form of BASIC which can be very handy for automation purposes, and some networking software suites come with a very sophisticated BASIC which interfaces with the TCP/IP stack and allows automated network operations. I leave it to your hackerly imagination to find uses for such a beast, but software like this is very useful to have around.
Assembler
Once you get deeper into the guts of your computer, you begin to realize that there are some things you can't program in any other language but the machine code that the processor understands. This machine code is horribly opaque, mostly consisting of a series of bytes that have meaning only to the processor, and which humans find very hard to write. In order to make this simpler, the ASSEMBLER software was designed to take meaningful statements called "mnemonics" and turn them into the byte-soup which computers understand. Luckily these days C-compilers are available that can do the job almost as efficiently, but there are still times when you need to program in ASSEMBLER, either to get "down to the metal" and control the computer hardware directly, or when the best optimizing C-compiler still doesn't produce code that runs fast enough. Mostly you will never need it, but when you do you'll know it, so learning the basics of x86 ASSEMBLER probably won't hurt you in the long run.
C and C++
The C language is probably one of the commonest languages on the planet. UNIX and LINUX were mostly written in C, and any hacker who is serious about their trade is going to need to use it at some time or another. Most "rootkits", "exploits", "scanners" and a lot of security software (eg SATAN, COPS) come as a big archive file of C code and will need to be compiled before use. Understanding what is happening is essential, especially when the compilation breaks because the system you are using isn't *quite* the same as the one the package was developed on. C++ is similar to C, but adds Object Orientation to standard C. It used to be not quite so common because it tends to run more slowly and have larger program files, but more and more applications are being written in C++, meaning that more and more fast processors and large hard disks can be sold to users who now have to put up with slow and bloated code.
PERL
PERL is the language that controls many of the "back-end" parts of web sites. PERL is a wonderful little language that runs on most UNIX boxes, as well as NT. There is a very good chance that when you fill in a form on the web and click on the send button the program that processes the results is PERL via the Common Gateway Interface (CGI). Understanding security and insecurity on the World Wide Web means understanding how PERL/CGI work together. Most hacked web sites have been hacked because the webmaster or designer did not understand how to construct "safe" CGI scripts using PERL. If that isn't a good enough reason to learn it, then I don't know what is.
Java
Java is a relatively new language invented by Sun Microsystems to support distributed applications. The Java code is downloaded off the network and then run using a Java interpreter which is often embedded inside a web browser. Because Java is interpreted, it allows web sites to execute arbitrary code on the machine hosting the Java interpreter. Anyone interested in developing web sites should investigate Java, as should hackers interested in network security. The great thing about Java is that it is given away free by Sun Microsystems and there are Java tutorials and sample code all over the Internet.
HTML
HyperText Markup Language (HTML) is the presentation language of the web. In theory any page written in HTML can be downloaded and turned into the same page regardless of the web browser used. However, certain browser vendors have not been able to resist the temptation to "tweak" the HTML specification, and thus this ideal is not always realized. There are HTML tutorials all over the web, and all you need to start writing HTML is one of these tutorials, a web browser to display the results and a simple text editor such as EMACS or NOTEPAD. Once you have got to grips with HTML then there is enough free webspace on the web to allow everyone, their families, their pets and their unborn children to have their own homepage.
Conclusion
This chapter has looked at some computer fundamentals to get the newbie started. If you want to learn more, there are web sites with tutorials available on the Internet, or read some of the books recommended in Chapter 14: Learning More. If you are a newbie then I recommend you read everything you can get your hands on for the first 6-12 months, and don't forget to ask questions if you know someone with more knowledge than you. Read computer magazines and study the hints and tips columns for more information, read books on programming and other people's program code, but make sure you do some programming yourself. If things don't make sense now, they will in six months' time, so don't get discouraged by the sheer mass of information that threatens to swamp you at the start.