How Data and Programs Are Represented in the Computer
Essay by review • December 4, 2010 • Research Paper • 1,508 Words (7 Pages) • 1,495 Views
Useful Data
By Sarah E. Hutchinson and Stacey C. Sawyer
Before we study the inner workings of the processor, we need to expand on an earlier discussion of data representation in the computer--how the processor "understands" data. We started with a simple fact: electricity can be either on or off.
Other kinds of technology also use this two-state on/off arrangement. An electrical circuit may be open or closed. The magnetic pulses on a disk or tape may be present or absent. A voltage may be high or low. A punched card or tape may have a hole or not have a hole. This two-state situation allows computers to use the binary system to represent data and programs.
The decimal system that we are accustomed to has 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, and 9). By contrast, the binary system has only two digits: 0 and 1. (Bi- means "two.") Thus, in the computer the 0 can be represented by the electrical current being off (or at low voltage) and the 1 by the current being on (or at high voltage). All data and programs that go into the computer are represented in terms of these numbers. For example, the letter H is a translation of the electronic signal 01001000, or off-on-off-off-on-off-off-off. When you press the key for H on the computer keyboard, the character is automatically converted into the series of electronic impulses that the computer recognizes.
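The translation of H described above can be sketched in a few lines of Python. This is only an illustration, not how a keyboard controller actually works; the helper names `to_bits` and `to_signal` are made up for this example.

```python
def to_bits(ch: str) -> str:
    """Return the character's numeric code as an 8-digit binary string.

    Codes above 255 would need more than 8 digits; this sketch is meant
    for single-byte characters such as the letter H.
    """
    return format(ord(ch), "08b")


def to_signal(bits: str) -> str:
    """Spell out a bit pattern as on/off states (1 = on, 0 = off)."""
    return "-".join("on" if b == "1" else "off" for b in bits)


bits = to_bits("H")
print(bits)             # 01001000
print(to_signal(bits))  # off-on-off-off-on-off-off-off
```

The same two functions work for any other key, so `to_bits("O")` gives the pattern for the letter O.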
Binary Coding Schemes
All the amazing things that computers do are based on binary numbers made up of 0s and 1s. Fortunately, we don't have to enter data into the computer using groupings of 0s and 1s. Rather, data is encoded, or arranged, by means of binary, or digital, coding schemes to represent letters, numbers, and special characters.
There are many coding schemes. Two common ones are EBCDIC and ASCII. They use 7 or 8 bits for each character, providing 128 or 256 combinations with which to form letters, numbers, and special characters, such as math symbols and Greek letters. One newer coding scheme uses 16 bits, enabling it to represent 65,536 unique characters.
* EBCDIC: Pronounced "eb-see-dick," EBCDIC, which stands for Extended Binary Coded Decimal Interchange Code, is commonly used in IBM mainframes. EBCDIC is an 8-bit coding scheme, meaning that it can represent 256 characters.
* ASCII: Pronounced "as-key," ASCII, which stands for American Standard Code for Information Interchange, is the most widely used binary code with non-IBM mainframes and microcomputers. Whereas standard ASCII originally used 7 bits for each character, limiting its character set to 128, the more common extended ASCII uses 8 bits.
* Unicode: Although ASCII can handle English and European languages well, it cannot handle all the characters of some other languages, such as Chinese and Japanese. Unicode, which was developed to deal with such languages, in its original form uses 2 bytes (16 bits) for each character, instead of 1 byte (8 bits), enabling it to handle 65,536 character combinations rather than just 256. Although each Unicode character takes up twice as much memory space and disk space as each ASCII character, conversion to the Unicode standard seems likely. However, because most existing software applications and databases use the 8-bit standard, the conversion will take time.
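To make the three schemes concrete, the sketch below counts the combinations available with 7, 8, and 16 bits and then encodes the same character under each scheme. Two assumptions are worth flagging: Python's `cp037` codec is used as one representative EBCDIC code page (there are several), and `utf-16-be` stands in for the 2-byte Unicode form described above. The helper `encode_hex` is a name invented for this example.

```python
# The number of distinct patterns available with n bits is 2**n.
for n in (7, 8, 16):
    print(n, "bits ->", 2 ** n, "combinations")


def encode_hex(text: str, codec: str) -> str:
    """Encode text with the named codec and show the bytes as hex digits."""
    return text.encode(codec).hex()


print(encode_hex("H", "ascii"))      # 48   (binary 01001000)
print(encode_hex("H", "cp037"))      # c8   (assumed EBCDIC code page)
print(encode_hex("H", "utf-16-be"))  # 0048 (two bytes per character)

# "\u4e2d" is a Chinese character; it fits in 16 bits but not in ASCII.
print(encode_hex("\u4e2d", "utf-16-be"))  # 4e2d
try:
    "\u4e2d".encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")
```

Note that the letter H has a different bit pattern in ASCII and in EBCDIC, which is why text must be converted when it moves between systems that use different schemes.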
The Parity Bit: Checking for Errors
Dust, electrical disturbance, weather conditions, and other factors can cause interference in a circuit or communications line that is transmitting a byte. How does the computer know if an error has occurred? Detection is accomplished by use of a parity bit. A parity bit, also called a check bit, is an extra bit attached to the end of a byte for purposes of checking for accuracy.
Parity schemes may be even parity or odd parity. In an even-parity scheme, for example, the ASCII letter H (01001000) contains two 1s. Thus, the ninth bit, the parity bit, would be 0 in order to keep the number of set bits even. Likewise, with the letter O (01001111), which has five 1s, the ninth bit would be 1 to make the number of set bits even. The system software in the computer automatically and continually checks the parity scheme for accuracy.
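The even-parity rule for H and O can be checked with a short Python sketch. The function names are hypothetical, and real hardware performs this check in circuitry rather than on text strings; the example only mirrors the arithmetic.

```python
def even_parity_bit(bits: str) -> str:
    """Return the bit that makes the total number of 1s even."""
    return "1" if bits.count("1") % 2 else "0"


def add_parity(bits: str) -> str:
    """Attach the even-parity bit to the end of the byte."""
    return bits + even_parity_bit(bits)


def check_even_parity(word: str) -> bool:
    """A 9-bit word passes the check if it holds an even number of 1s."""
    return word.count("1") % 2 == 0


h = add_parity("01001000")  # letter H, two 1s  -> parity bit 0
o = add_parity("01001111")  # letter O, five 1s -> parity bit 1
print(h, check_even_parity(h))  # 010010000 True
print(o, check_even_parity(o))  # 010011111 True

# Simulate interference by flipping the first bit of H in transit.
corrupted = ("1" if h[0] == "0" else "0") + h[1:]
print(corrupted, check_even_parity(corrupted))  # 110010000 False
```

A single parity bit reveals that one bit changed, but not which one, and two flipped bits would cancel out and pass the check.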
Machine Language: Your Brand of Computer's Very Own Language
So far we have been discussing how data is represented in the computer--for example, via ASCII code in microcomputers. But if data is represented this way in all microcomputers, why won't word processing software that runs on an Apple Macintosh run (without special arrangements) on an IBM PC? In other words, why are these two microcomputer platforms incompatible? It's because each hardware platform, or processor model family, has a unique machine language. Machine language is a binary programming language that the computer can run directly. To most people an instruction written in machine language is incomprehensible, consisting only of 0s and 1s. However, it is what the computer itself can understand, and the 0s and 1s represent precise storage locations and operations.
Many people are initially confused by the difference between the 0 and 1 ASCII code used for data representation and the 0 and 1 code used in machine language. What's the difference? ASCII is used for data files--that is, files containing only data in the form of ASCII code. Data files cannot be opened and worked on without execution programs, the software instructions that tell the computer what to do with the data files. These execution programs are run by the computer in the form of machine language.
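One way to see why the same 0s and 1s can mean different things is to read one bit pattern under two interpretations. The sketch below does not decode real machine instructions for any processor; it only shows, as a simplified analogy, that the meaning of a pattern depends on what is reading it.

```python
# Two bytes: 01001000 01001001
raw = bytes([0b01001000, 0b01001001])

as_text = raw.decode("ascii")           # read as ASCII character codes
as_number = int.from_bytes(raw, "big")  # read as one 16-bit unsigned integer

print(as_text)    # HI
print(as_number)  # 18505
```

In the same spirit, a processor reading those bits as machine language would treat them as an operation or a storage location rather than as the letters H and I.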
But wouldn't it be horrendously difficult for programmers to write complex applications programs
...
...