Ron Horii's Tech Pages:

Mr. Scanner


  • Introduction
  • Predictions
  • Types of Scanners
  • Scan Bit Depth
  • Scan Performance Measurements
  • Scan Resolution Rules of Thumb
  • JPEG Quality vs. Compression
  • Examples of JPEGs at Different Quality Levels
  • My Space-Squeezing Experience
  • Zooming In
  • Reliability Considerations
  • Scanner Recommendations
  • Scanner Information and Reviews
  • OCR Software
  • Image Editing Software
  • Scanner Manufacturers
  • Computer Art, Graphics, and Photography
  • 1999 Update Note
  • 2009 Update Note
  • Home Links




    I bought a scanner early in 1997. I thought it might be handy, but I didn't have a clue that it would end up being one of the most useful peripherals I've ever bought. I've come to the conclusion that a scanner is a critical enabling technology that provides a quantum leap in the power and usefulness of a PC. It's a key input device like mice, keyboards, audio digitizers, and video digitizers. It feeds a portion of the real world into a computer, where the processing and communications power of the computer can turn the input into something useful. It fills a critical gap in this regard. Mice, joysticks, and keyboards feed user movements and commands into the computer. Sound cards turn sound into data and vice versa. Video digitizers and digital video cameras turn real-time motion into computer data. One major area that's left is a gigantic one: printed matter and artwork.

    Where is the vast majority of civilized man's knowledge and expression stored? It's not in people's minds. It's not on tape or computer media (yet). It's mostly on paper or canvas in libraries, galleries, museums, schools, businesses, and homes around the world. This includes knowledge and information dating back to the beginning of human civilization. What's the big problem with this information? It's hard to find, and it's hard and slow to get to. You often have to physically travel, sometimes far, to find it. Even when you get there, the material may be damaged, lost, restricted, checked out, or not what you're really looking for. Searching for the information can be very time-consuming, even with computerized card catalogs, because you still have to find the actual works and manually scan through them. The difficulty of getting at this information limits its usefulness, and productivity and progress are limited as a result.

    The computer is a tremendous tool. It enables a user to search, access, manipulate, file, reuse, and transmit information anywhere in the world instantly. The trick is to get the information into it. The scanner is the link between printed information and the computer. The scanner can take this information and turn it into a form that a computer can work with. It opens up the computer to an enormous and virtually unlimited source of input. With the power of the computer applied to this source of input, the possibilities are staggering. Information is power. Combine that computer processing power with the instantaneous and widespread capability of disseminating information on the Internet and the World Wide Web, and you have a potential for empowering the average person to a degree that's unprecedented in human history.

    That's all impressive from a global standpoint, but for the average user, the question is: what can a scanner do for me? Here are some general uses:

    Here are some specific business and scientific uses:

    Here are some specific home uses:

    For technical info about scanner usage, here is one of the best online documents on scanners: A Few Scanning Tips by Wayne Fulton. The author is a customer and user of a Microtek E3, like me. (Reading his pages along with other reviews was one of the reasons I bought the E3.) His pages have just about everything you would want to know about scanner usage, terminology, and tradeoffs.


    I predict that scanners will soon become so important that they will be bundled with PCs, like modems and sound cards. A low-end scanner is about the same price as a high-end sound card, and for office use, a scanner is much more useful than a sound card. Printers and scanners complement each other, so it's likely they'll be bundled together with compatible resolutions.

    I also predict that the huge memory requirements of scanned images will drive the need for more powerful computers with enhanced graphics-handling capability, more RAM, and more storage. In the storage area, this not only includes larger hard disk drives, but high-capacity removable storage for archiving and transporting data. Standard 3.5" 1.44 MB diskettes are inadequate for this task.

    As it becomes easier to incorporate color graphic content into documents, this will increase the demand for high-quality color output devices, such as inkjet and color laser printers. The resolution and quality for both scanners and printers will increase until they meet or surpass film and high-quality printing processes.

    Graphics made the World Wide Web the instant phenomenon that it is today. Scanners put more power into the hands of small office, academic, and home users. Scanned images will proliferate even more on Web pages (like this one), which will increase the demand for more bandwidth on the Web.

    Scanners may displace dedicated fax machines in some office applications where fax demands are light and a computer, printer, and modem are already present. For multiple-page faxing, an auto-document feeder is useful. A scanner, with the appropriate hardware and software at the receiving end, can do something no ordinary fax machine can do: send high-resolution color faxes.

    Digital cameras will proliferate and will displace film cameras to some degree. They won't, however, replace scanners for a long time. In terms of the total number of points digitized, the resolution of a digital camera is orders of magnitude smaller than that of a scanner. A high-end digital camera, costing in the $1K+ range, will resolve 1000+ points across the entire width of the image. A low-end scanner costing under $100 can resolve (with interpolation) 1000+ points within a fraction of an inch. A digital camera has the advantage of portability and immediacy and is very useful, but it serves a different purpose from a scanner. It's like the difference between RAM and hard disk storage. The two are complementary, not mutually exclusive. I do believe that digital cameras will become more popular as they get better and cheaper, but they'll displace film cameras, not scanners.

    Types of Scanners

    There are several types of scanners. The type to get depends on your application:

    Type       Usage
    Handheld   Specialized applications, portable scanning of documents for OCR. Cheap, light, but limited in size of scan and quality.
    Sheet-fed  Ideal for OCR of multiple sheets of text pages. Many have auto-document feeders. Compact, some are portable, can be cheaper than flatbeds. Quality not quite up to the best flatbeds. Size of scan theoretically unlimited in vertical direction. Specialized variations include photo scanners, business card scanners, and combination keyboard-scanners.
    Flatbed    Most versatile scanner. Can scan sheets, books, objects. Wide range of price, quality. Some have auto-document feeders and slide scanners as accessories. Takes a lot of desk space.
    Film       For direct scans of negatives and slides, usually 35 mm. For professional photographic work of highest quality. Compact, but more expensive than the above types. Wider dynamic range than the above types.
    Drum       Highest quality for scans of sheets. Uses a photomultiplier tube. Has highest resolution, dynamic range, and color fidelity. Extremely expensive, graphic arts pros only. Usually owned by service bureaus.

    Scan Bit Depth

    Scanners convert analog data (page images) to digital data. Digital data can have different bit depths, depending on the application and the scanner hardware.
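    Each added bit doubles the number of distinct values a sample can take, so the number of levels is 2 raised to the bit depth. The short Python sketch below just computes that for a few common depths; the depths listed are illustrative examples (1-bit line art, 8-bit grayscale, 24-bit and 30-bit color), not a statement of what any particular scanner supports.

```python
def levels(bits: int) -> int:
    """Number of distinct values a sample with this many bits can hold."""
    return 2 ** bits

# Common depths, chosen here for illustration only.
DEPTHS = {
    "1-bit line art (black or white)": 1,
    "8-bit grayscale": 8,
    "24-bit color (8 bits per R, G, B channel)": 24,
    "30-bit color (10 bits per channel)": 30,
}

if __name__ == "__main__":
    for name, bits in DEPTHS.items():
        print(f"{name}: {levels(bits):,} values")
```

    24-bit color works out to 16,777,216 possible colors, which is why it is often called "true color."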

    Scan Performance Measurements

    I ran a test on my scanner at home to see how long and how much memory it takes to scan at different resolutions and in color vs. black & white. Resolution is measured in SPI (Samples Per Inch) or DPI (Dots Per Inch). I measured scan time from the time I hit the button to start the scan until the scanned image started to appear on the video screen.

    Scanner: Microtek Scanmaker E3
    Maximum resolution: 300 X 300 DPI optical resolution, 2400 X 2400 interpolated resolution
    Interface: SCSI
    Computer: Pentium 120 CPU
    Memory: 49 MB RAM, 128K pipeline burst cache
    Scan size: 8 1/2" X 11"

    Scan Size and Time Measurements Table

    Scan Type     Resolution  File Size  Scan Time
    1-bit B&W     75 DPI      258 KB     13 secs
    1-bit B&W     100 DPI     458 KB     18 secs
    1-bit B&W     300 DPI     4.11 MB    65 secs
    1-bit B&W     600 DPI     16.44 MB   175 secs
    24-bit Color  75 DPI      6.17 MB    59 secs
    24-bit Color  100 DPI     10.96 MB   81 secs
    24-bit Color  300 DPI     98 MB      *
    24-bit Color  600 DPI     394.4 MB   *
    24-bit Color  1200 DPI    1.58 GB    *
    24-bit Color  2400 DPI    2.4 GB     *

    * Not enough memory to scan.

    This shows that scan time and memory required can vary tremendously with resolution. Color also takes much more time and memory than B&W. Scanning is processor and memory intensive, so the scan speed depends on the speed of the PC and the amount of RAM installed. The memory requirements are such that even if your scanner is capable of 1200 to 9600 DPI interpolated, you may find it impractical to use such high resolutions except for special purposes. (See Zooming In.)
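    The scaling in the table follows from simple arithmetic: the number of samples grows with the square of the resolution, and each sample takes bits/8 bytes. Here is a minimal Python sketch, assuming a bare uncompressed image with no file header, row padding, or working buffers. It reproduces the ratios in the table (doubling the DPI quadruples the size; 24-bit color is 24 times the size of 1-bit B&W), but the sizes in the table run roughly four times higher than this bare pixel count, so treat the estimate as a guide to how size scales rather than a prediction of the exact figure.

```python
def raw_image_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> float:
    """Uncompressed pixel data only: no header, row padding, or working buffers."""
    pixels = (width_in * dpi) * (height_in * dpi)
    return pixels * bits_per_pixel / 8

PAGE = (8.5, 11.0)  # letter-size page, as in the test above

if __name__ == "__main__":
    for bits, label in ((1, "1-bit B&W"), (24, "24-bit color")):
        for dpi in (75, 100, 300, 600):
            size_mb = raw_image_bytes(*PAGE, dpi, bits) / 1_000_000
            print(f"{label:13} {dpi:4} DPI  ~{size_mb:8.2f} MB")
```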

    Scan time may or may not be important, depending on your application. If you're using the scanner for photo imaging, you'll likely spend much more time editing and manipulating the photo scan than doing the scan itself. Scan time becomes important if you're scanning multiple pages for OCR or archiving. However, for OCR work, the OCR processing time can be longer than the scan time.

    One problem with trying to compare scan times is that there is no standard measurement. One manufacturer may quote the scan time at 300 DPI of a 4 X 6 photo, while another may quote the scan time for an 8 1/2 X 11 sheet at 100 DPI. Independent reviewers will test all the scanners they're reviewing under the same test conditions, but different reviewers may use different conditions. I've seen in some reviews that the relative ranking of the speed of different scanners will vary with the test conditions. The PC speed and configuration can have a big effect on the scan speed. Some scanners may have built-in intelligence and rely little on the PC speed, while others may heavily use PC resources and will be very sensitive to PC speed. Scanner drivers can vary in the way they use RAM and hard drive space to store scans, so the scan speed can be greatly affected by the amount of RAM or the speed of the PC's hard drive.

    Scan Resolution Rules of Thumb

    One of the most important scanner specs is resolution. Resolution is measured in samples per inch (SPI) in the horizontal and vertical directions. It is usually specified as DPI (dots per inch) or LPI (lines per inch), but technically, what the scanner is doing is acquiring a certain number of digital samples per inch. Except for drum scanners, which use photomultiplier tubes, most scanners use CCD arrays. The horizontal resolution is fixed by the number of CCD elements across the page. The vertical resolution is determined by the stepper motor step size. In sheet-fed scanners, the stepper motor moves the paper. In flatbed scanners, it moves the CCD array. Moving the CCD array is a more precise process than moving paper, so flatbed scanners can have higher vertical resolution than sheet-fed scanners. Most scanner manufacturers specify the optical resolution, which is the hardware-limited resolution, but they also tend to specify the interpolated resolution, which uses software to generate fake samples in between real samples. Interpolation adds no real detail; it's mostly useful for removing the "jaggies" from line art. Be wary of suspiciously high resolution specs. If you see a new <$100 scanner boasting "2400 DPI" resolution, it's probably specifying the interpolated resolution.
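    To make "fake samples in between real samples" concrete, here is a minimal Python sketch that doubles the resolution of one row of samples. It assumes simple linear interpolation, which is only one possible method; real scanner drivers may use others (bicubic, for example), and this is not Microtek's or any other vendor's algorithm.

```python
def interpolate_2x(samples: list[float]) -> list[float]:
    """Insert one linearly interpolated value between each pair of real samples."""
    if len(samples) < 2:
        return list(samples)
    out: list[float] = []
    for left, right in zip(samples, samples[1:]):
        out.append(left)                # real sample
        out.append((left + right) / 2)  # invented sample halfway between
    out.append(samples[-1])             # last real sample
    return out

if __name__ == "__main__":
    row = [0, 0, 255, 255]  # a hard black-to-white edge
    print(interpolate_2x(row))  # [0, 0.0, 0, 127.5, 255, 255.0, 255]
```

    The invented 127.5 value softens the stair-step between black and white, which is why interpolation smooths jaggies, but nothing in the output is detail the CCD didn't actually capture.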

    How much resolution do you need? It depends on your application. The more resolution, the more you pay, so you don't want to pay for resolution you don't need. However, if you have multiple applications for the scanner, get the resolution appropriate for the most demanding application you think you MIGHT have. Keep in mind that the important specification is image resolution, the resolution of the final output. It depends on the initial sample resolution and how much you blow the image up or shrink it down. If you blow images up, you'll need more sample resolution than if you use them at actual size. Conversely, if you shrink scanned images down, you need less sample resolution. If you don't know what you need, get the highest-resolution scanner you can afford, or wait until prices come down. Here are some rules of thumb on what scanning resolution is needed for different applications, based on using the images at actual size:
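    The relationship between sample resolution, scaling, and image resolution described in this paragraph is a simple division. A short Python sketch; the 300 DPI output target in the usage lines is a hypothetical example, not a recommendation from the text.

```python
def image_dpi(scan_dpi: float, scale: float) -> float:
    """Resolution of the final output when a scan is reproduced at `scale` times its original size."""
    return scan_dpi / scale

def scan_dpi_needed(target_dpi: float, scale: float) -> float:
    """Sample resolution required to reach `target_dpi` after scaling by `scale`."""
    return target_dpi * scale

if __name__ == "__main__":
    print(image_dpi(300, 2.0))        # 150.0: a 300 DPI scan blown up 2x
    print(scan_dpi_needed(300, 2.0))  # 600.0: needed to print 2x at 300 DPI
    print(scan_dpi_needed(300, 0.5))  # 150.0: shrinking to half size needs less
```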

    JPEG Quality vs. Compression

    The JPEG (Joint Photographic Experts Group) format is the most popular file storage format for color photographs on the Web. The graphic nature of the Web has promoted the use of color graphics, but the bandwidth limits of most Web connections have made it necessary to reduce the size of graphics files. JPEG has the advantage that it can provide a tremendous level of file compression and yet maintain the color and fidelity of photographic images. It stores 24 bits of color information, so it can provide much greater color fidelity and range than the GIF format, which is an 8-bit (256 color) format. JPEG is a "lossy" format, however, so some detail can be lost, unlike GIF, which is lossless. The higher the level of compression, the greater the loss. JPEG is most appropriate for photographs with smooth color transitions and not a lot of sharp edges or details. It is not appropriate for line art or text. It is also not appropriate for storing master copies of images that will be edited later. Master copies are best stored as BMP or TIF files. You should not edit or post-process JPEGs and re-save them as JPEGs; you lose too much image quality that way. JPEGs should be created from raw scan data or from BMP or TIF files.

    I ran an experiment to see how much space a JPEG file takes and how the quality varies with the level of compression. I did one scan, sharpened it a little, and converted the scan data to JPEGs using different quality levels. Here are the test conditions:

    Scan size: 5.89" X 3.96" (4 X 6 photo)
    DPI: 21
    BMP size: 122,894 bytes
    Color: 24-bit
    Pixels: 246 X 166
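    The BMP size above follows directly from the pixel dimensions. An uncompressed 24-bit BMP stores 3 bytes per pixel, pads each row to a multiple of 4 bytes, and adds a 54-byte header (a 14-byte file header plus a 40-byte info header; 24-bit images have no palette). The Python sketch below assumes that standard layout; with it, 246 X 166 pixels comes out to exactly the 122,894 bytes listed.

```python
FILE_HEADER = 14  # BITMAPFILEHEADER
INFO_HEADER = 40  # BITMAPINFOHEADER

def bmp24_size(width: int, height: int) -> int:
    """Size in bytes of an uncompressed 24-bit BMP with the common 54-byte header."""
    row_bytes = width * 3                  # 3 bytes (B, G, R) per pixel
    padded_row = (row_bytes + 3) // 4 * 4  # each row padded to a 4-byte boundary
    return FILE_HEADER + INFO_HEADER + padded_row * height

if __name__ == "__main__":
    print(bmp24_size(246, 166))  # 122894
```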

    JPEG Quality vs. File Size

    The following are the file sizes for each JPEG quality level. Notice the tremendous reduction in file size, even with the highest quality. The amount of compression varies with the image. Images with little detail will compress the most. I chose a test image that had a lot of detail in the middle third, less detail in the bottom third, and almost solid color in the upper third. The sample image is shown below.
    Quality  File Size (bytes)
    5%       1,528
    15%      3,113
    25%      4,475
    35%      5,903
    45%      7,021
    55%      8,053
    65%      9,488
    75%      11,559
    85%      15,204
    95%      25,981
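    Dividing the BMP size by each JPEG size shows how large the savings are. A short Python sketch using only the numbers from this test:

```python
BMP_BYTES = 122_894

JPEG_BYTES = {  # quality (%) -> file size in bytes, from the table above
    5: 1_528, 15: 3_113, 25: 4_475, 35: 5_903, 45: 7_021,
    55: 8_053, 65: 9_488, 75: 11_559, 85: 15_204, 95: 25_981,
}

def compression_ratio(jpeg_bytes: int, original_bytes: int = BMP_BYTES) -> float:
    """How many times smaller the JPEG is than the original bitmap."""
    return original_bytes / jpeg_bytes

if __name__ == "__main__":
    for quality, size in JPEG_BYTES.items():
        print(f"{quality:3}% quality: {compression_ratio(size):5.1f}:1")
```

    Even at 95% quality the JPEG is under a quarter of the BMP size (about 4.7:1); 75% quality gives roughly 10.6:1, and 5% quality exceeds 80:1.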

    Examples of JPEGs at Different Quality Levels

    The pictures below were scanned in under the conditions above. The BMP file was saved as progressive JPEGs at different quality levels. The lower the quality level, the higher the compression rate and the smaller the JPEG file size. The BMP file is too big to include, but looks identical (to my eyes at least) to the 75% quality JPEG. Notice how the quality degrades with increased compression. Details start to get blurry and artifacts appear in the sky near other objects.


    JPEG at 75% quality, 11,559 bytes
    This looks virtually identical to the original BMP file, but is much smaller. 75% quality is a safe compromise between quality and compression for most images.


    JPEG at 55% quality, 8,053 bytes
    This still looks acceptable, but some artifacts can be seen in the hills at the left and along the horizon. Some of the details are starting to get blurry. You can get away with this level of compression if you have an image with large details, or where distortions in the detail are not apparent, such as in pictures of trees or grass. Also, you shouldn't have detailed areas next to solid-color areas or else you'll see color artifacts in the solid areas.


    JPEG at 25% quality, 4,475 bytes
    The details along the horizon and left side are very blurry. Detail is lost on the hills. The sky shows some blockiness on the right side. It's marginally acceptable, and usable where the high degree of compression matters more than the quality.


    JPEG at 15% quality, 3,113 bytes
    This looks like a view through a wet window. Much detail is lost. The sky is severely blocky, with many color artifacts. This is probably unacceptable for this image, but there may be some images where this level of quality works. The only way to tell is to try it and see.

    My Space-Squeezing Experience

    If you go to my Bay Area Back Pages you'll see a lot of scanned photographs, around 140. I originally scanned them in at around 25-50 DPI, depending on how much I wanted to zoom in on the original. I tried to scan them in so the bitmap size would be around 100-200K. I then converted them to JPEG files. At first, I compressed them with 75% quality and got them to around 15-20K. However, I had so many pictures, I rapidly filled up the 2 MB of space allotted me by Prodigy Internet. To get more space, I went back and recompressed as many of them as I could. I did a no-no and re-JPEG'd JPEGs. I was too cheap to save the original scans in TIF format and too lazy to hunt down the pictures and re-scan them, but it still worked out OK. Since these were small pictures, and since these were travel pictures, not art pictures, I could trade off quality for quantity. I was able to squeeze out over 250KB more space total (and make room for this Web page). I was able to compress many of the pictures at 55% quality and still have acceptable (in my opinion) quality. The file size was reduced to around 8-13K, which is a huge reduction from the original. Of course, if I didn't have a measly 2 MB to work with, I wouldn't have had to go through all this trouble (excuse the shameless plea for more space).

    Zooming In

    The above photos were scanned at a low resolution of only 21 DPI, which reduces the 4 X 6 original quite a bit, but it gives a recognizable image and a JPEG of tolerable size for Web use. My scanner has an optical resolution of 300 X 300 DPI, which is way overkill for Web images. The only time that resolution might be used for an online picture is for zooming in on a small area. In the above image, I zoomed in on the white school buildings to the left of center, which is an area of 0.49" X 0.26" and takes 134 KB as a bitmap. The image below is a 65% quality JPEG of the image, which takes 9604 bytes.
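    Zooming in is a matter of spending pixels on a small area: a region yields (width X DPI) by (height X DPI) samples. A quick Python sketch using the 0.49" X 0.26" school-building region, at the 21 DPI used for the full photo and at the E3's 300 DPI optical resolution:

```python
def region_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Approximate pixel dimensions of a scanned region, rounded to whole pixels."""
    return round(width_in * dpi), round(height_in * dpi)

SCHOOL = (0.49, 0.26)  # inches, from the zoom example above

if __name__ == "__main__":
    for dpi in (21, 300):
        w, h = region_pixels(*SCHOOL, dpi)
        print(f"{dpi:3} DPI -> {w} x {h} pixels")
```

    At 21 DPI the buildings would get only about 10 X 5 pixels, far too few to show anything; at 300 DPI the same area yields about 147 X 78 pixels.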

    The raw image was somewhat blurry, so I sharpened it up with PhotoImpact. The sharpening process enhances the edges of objects, but it also introduces some "noise" into the picture. The original photograph was taken with a $40 point-and-shoot camera on 35 mm film, processed at a discount store, so it's not the sharpest original in the world. I don't know if it's true, but I read somewhere that mass market photofinishers tend to print negatives slightly out of focus to hide dust, grain, and scratches. This means that if you scan these prints at high resolution, they'll be blurry. For maximum resolution and quality, dedicated negative scanners are the best, but that's only necessary if you're a very serious user or a graphic arts professional. The point is that if you're using the scanner for Web images, you don't need very much resolution. Even the cheapest scanners are adequate.
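    The trade-off of sharper edges but more noise is easy to see with a one-dimensional unsharp mask, a common sharpening technique: blur the signal, then add back a multiple of the difference between the original and the blur. This Python sketch is a generic illustration under that assumption; it is not PhotoImpact's actual filter, and the 3-sample box blur and the `amount` value are arbitrary choices.

```python
def box_blur(x: list[float]) -> list[float]:
    """3-sample moving average; the end samples are repeated at the borders."""
    padded = [x[0]] + x + [x[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3 for i in range(len(x))]

def unsharp_mask(x: list[float], amount: float = 1.0) -> list[float]:
    """Sharpen by adding back `amount` times the detail the blur removed."""
    blurred = box_blur(x)
    return [v + amount * (v - b) for v, b in zip(x, blurred)]

if __name__ == "__main__":
    edge = [60, 60, 60, 150, 150, 150]     # a dark-to-light edge
    noisy_flat = [50, 53, 50, 47, 50, 50]  # flat area with a little grain
    print(unsharp_mask(edge))        # [60.0, 60.0, 30.0, 180.0, 150.0, 150.0]
    print(unsharp_mask(noisy_flat))  # [49.0, 55.0, 50.0, 45.0, 51.0, 50.0]
```

    The edge gets a darker dip and a brighter overshoot on either side, which reads as extra sharpness, while the grain in the flat area grows from a 47 to 53 spread to 45 to 55. Real editors also clip results to the valid 0 to 255 range, which this sketch omits.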

    Reliability Considerations

    One good thing about flatbed scanners is that you can easily look inside them and see how well they're made. You can see if they use precision metal parts or a lot of cheap plastic parts. You can get an idea of how rugged the construction is. Fortunately, scanners are relatively simple mechanically. They are like inkjet or dot matrix printers in that they have an active mechanism that moves linearly in a precise manner. However, unlike printers, the mechanism moves relatively slowly and infrequently. Whereas a printer's mechanism may have to move back and forth at high speed a hundred times or more per page, a scanner's mechanism only has to slowly sweep once per page (or 3 times for older 3-pass scanners). Thus, a scanner's mechanics are less likely to wear out than a printer's, given the same quality of construction.

    I would guess, based on the inherent design differences between sheet-fed and flatbed scanners, that flatbeds are more reliable. The sheet-feds have to handle paper and are more prone to jamming, just like printers. Paper tends to shed particles, which can clog the mechanics or dirty the optics. Dust and other contaminants can get into the innards of a sheet-fed more easily than a flatbed, which tends to be sealed up. It's like the difference between a floppy disk drive and hard disk drive.

    The component most likely to go out first is the scanner's lamp. Many scanners, like my E3, use a standard fluorescent lamp that stays on all the time to stabilize its color temperature. Other scanners use cold cathode lamps that have 10,000 hour lives, which is probably beyond the useful life of the scanner. On the other hand, even though standard fluorescent lamps don't last as long as cold cathode lamps, they are cheap to replace, about $5.

    Scanner Recommendations

    What's the best scanner to buy? That's like asking what's the best car to buy. The best answer is: it depends on a lot of factors. Here are some key factors to consider:

    Type of Scanner

    For all-around general use at home or in the office, the flatbed scanner is the best since it's the most versatile. It may not be the best at all jobs, but it can do the most jobs. Some come with options such as auto-document feeders (ADF) and transparency adapters, though these tend to be very expensive. If you need automatic sheet-feeding, it's cheaper to buy a dedicated sheetfed scanner, though they tend not to have the scan resolution of flatbeds. On the other hand, most ADF applications don't need high resolution or even color. If you know you're only going to be scanning sheets of paper and you're short on desk space, get a sheetfed scanner. If you don't know what you're going to use a scanner for, get a good flatbed that has an ADF option. Use the scanner a lot and see if you need an ADF. If so, see if it's cheaper to buy the ADF option or get a separate sheetfed scanner with built-in ADF with the resolution you need. ADF is only appropriate for certain applications: OCR'ing pages with text only, faxing documents, or copying pages. Other applications require manually manipulating the image, often after each scan, so ADF is redundant.

    Most flatbed and sheetfed scanners use CCD (charge-coupled device) arrays with a system of mirrors and lenses to project the scanned image onto the array. A recent innovation that does away with the mirrors and lenses is the CIS (contact image sensor). CIS uses a long, thin array of sensor elements next to a row of color LEDs that provide a light source. The advantage of the CIS technology is that it allows the scanner to be very thin and light. It also uses less power, which can allow these to be powered by batteries or by the power from the USB port. As with any new technology, it is going through some growing pains, so the image quality is still inferior to traditional CCD designs. This may change in the future. For most home and office uses, the space-saving (vertical height, not desk area) and the lower power is not a big deal, so there's no desperate need for this technology.

    If you're a serious film photographer and want the ultimate in picture quality for print publishing, get a film scanner. Unfortunately, they're much more expensive than flatbeds and are much more specialized. The professional-level ones are over $1000, but the prices are coming down to the point where lower-priced models are affordable by serious amateurs. They are still not quite mass-market yet (and it's uncertain if they will ever be), so you don't see as many new models or such intense price competition as in the flatbed market. HP has one, the Photosmart, that is geared towards high-end consumers and is priced below $500. It can handle not only slides and negatives, but color prints up to 5X7. Other big players in this arena are the traditional camera companies, like Kodak, Nikon, Minolta, Polaroid, Konica, and Olympus. All have models above and below $1000. Microtek, the venerable scanner manufacturer, makes a slide scanner, the ScanMaker 35T Plus. (See the Scanner Manufacturers links below.) Color slides have tremendous dynamic range, much more so than prints. You need a good film scanner, preferably with a bit depth of 30 bits or more, to capture and take advantage of that dynamic range. Since film scanners are aimed at professionals or serious amateurs, performance is more important than ease of setup. That's why these scanners mostly have SCSI interfaces.
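    One way to see why bit depth matters for slides: a scanner with n bits per channel can distinguish at most 2^n levels, and since optical density is a base-10 logarithm, the widest density range those levels could represent is log10(2^n). This Python sketch computes only that theoretical ceiling; real scanners fall short of it because of sensor noise, and the figures are not specifications of any scanner mentioned here.

```python
import math

def max_density_range(bits_per_channel: int) -> float:
    """Theoretical ceiling on dynamic range, in optical density (D) units, for n-bit samples."""
    return math.log10(2 ** bits_per_channel)

if __name__ == "__main__":
    for total_bits in (24, 30, 36):
        per_channel = total_bits // 3
        print(f"{total_bits}-bit color ({per_channel} bits/channel): "
              f"up to {max_density_range(per_channel):.1f} D")
```

    On this simple measure, 24-bit color tops out around 2.4 D and 30-bit color around 3.0 D, which is why the extra bits help with high-contrast originals like slides.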


    One important consideration is what type of interface to use. The most common types are SCSI and parallel port, with the new USB bus becoming more popular. SCSI scanners should theoretically be faster than parallel port scanners, but the mechanical speed of the scanner may make more of a difference, depending on the scan resolution. Parallel port scanners are a lot easier to set up and can easily be moved from one PC to another. That's why they've become more popular than SCSI scanners for SOHO use. SCSI scanners are the best choice for demanding professional users who need the highest performance and are willing and able to handle the technical installation details.

    SCSI scanners may come with their own SCSI card. My E3, for instance, came with a low-end Adaptec SCSI card that only officially supports one device. Other SCSI scanners require you to provide your own SCSI card. SCSI cards can cost as little as $50 for a low-end model, to >$200 for high-end cards. Some scanners have their own proprietary AT-bus interface cards; some are plug-n-play. In any case, unless you already have a SCSI card installed, you need to open up your PC, which can be a pain for some users. SCSI cards also require a precious interrupt, which may not be available if you already have a lot of peripherals on the system (not that the archaic PC architecture has a lot of interrupts to spare - don't get me started on a diatribe about this).

    There are many different kinds and price ranges of SCSI interface cards. The cards that are typically bundled with low-cost scanners are very simple cards that are intended to be used only with the scanner. They run in programmed I/O mode, which means the CPU has to get involved with each byte of data transferred. The Adaptec card that came with my Microtek E3 is one such card. While scanning, it totally ties up the CPU, so all other processes are frozen. For more money, you can get a general-purpose bus-mastering SCSI interface card that will not tie up the CPU as much while scanning. It can also interface with more than one device. This is an advantage. If you can spare an interrupt to install the SCSI card, you can plug multiple devices into that SCSI card and only use that one interrupt. However, SCSI peripherals can be tricky to set up. There are several flavors of SCSI (SCSI-1, SCSI-2, Ultra-SCSI, wide SCSI, etc.), so you can run into complications if you try to drive different SCSI types from the same card. Different types of SCSI use different connectors, which are not compatible. There are adapters available to convert from one type to another (e.g. 68-pin to 50-pin), but they tend to be expensive. SCSI also has cable-length limitations and termination requirements that you have to keep in mind.

    If you already have an interrupt allocated for the parallel port, you don't need another interrupt to run the parallel port scanner. These scanners have pass-through connectors so you can hook up your printer to the same port. If you also have a parallel-port Zip drive, you could (theoretically) daisy-chain all three. However, this can have compatibility problems, particularly with the printer. You could use a switch box, but that can also cause problems, depending on how picky your printer is about signal quality. You also have to remember to manually set the switch box to the right position. Like SCSI, the parallel interface has cable length limitations, but they are less well-defined than SCSI's. Usually, it's the printer that's the pickiest about the cabling.

    The easiest interface solution, in my opinion, is the new Universal Serial Bus (USB). This bus is found on most new computers and is supported by Windows 98. It's not as fast as SCSI, at 12 million bits/sec, but it's much easier to set up and is faster than most parallel ports. It's also much more expandable. Theoretically, you could hook up 127 USB devices to the same bus, not that you'd want to or could afford to. USB connectors are smaller and easier to hook up than parallel port or SCSI cables. They are also thinner and can be longer. The USB port provides power to low-powered peripherals like joysticks. Most scanners use too much power to be powered from the USB port, but with the new low-powered CIS technology, this may change. USB is relatively new, but more and more peripherals are coming out with it. A USB scanner would be my choice if I had a new PC and needed a new scanner. If you have an old PC and don't have built-in USB ports, you can add a USB card. Newer Macs, like the iMac, also have USB ports. If you have a Mac, you need to make sure the software that comes with the scanner supports it.
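    To put 12 million bits/sec in perspective, here is the arithmetic for moving a scan across the bus. This Python sketch gives a theoretical best case only: it ignores protocol overhead, which makes real USB throughput noticeably lower, and it assumes the scan table's sizes are binary megabytes (1 MB = 1,048,576 bytes).

```python
USB_BITS_PER_SEC = 12_000_000  # USB full-speed signaling rate
MB = 1_048_576                 # assumption: table sizes are binary megabytes

def transfer_seconds(num_bytes: float, bits_per_sec: float = USB_BITS_PER_SEC) -> float:
    """Best-case time to move num_bytes at the raw signaling rate."""
    return num_bytes * 8 / bits_per_sec

if __name__ == "__main__":
    for label, size_mb in (("24-bit color, 100 DPI", 10.96), ("24-bit color, 300 DPI", 98)):
        print(f"{label}: at least {transfer_seconds(size_mb * MB):.1f} s over USB")
```

    Even in this best case, a 98 MB color page would take over a minute just to cross the bus, one more reason high-resolution color scans of full pages are impractical.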

    You have to decide on what your priorities are and read the reviews and manufacturers' specs to see how different scanners compare in each of these factors. See the links below to go to those specs and reviews. Unless you have limited specialized applications, the best type of scanner to get is a flatbed. It's the most versatile, but it does take a lot of space. Personally, I like my Scanmaker E3. It works great. The software bundle is very good, especially PhotoImpact. I haven't yet had a need for a higher optical resolution than 300 DPI, since my HP660C printer is limited to 300 DPI. However, the E3 is not state of the art anymore. For a few dollars more, you can get a 600 X 300 scanner from several companies. 600 X 600 and even 600 X 1200 scanners at reasonable prices are becoming more common.

    Scanner Information and Reviews

    OCR Software

    OCR stands for Optical Character Recognition. It's the process of scanning printed or even handwritten materials and converting the graphical representation of the text into character or numeric data. The data can then be manipulated by such programs as word processors, spreadsheets, and databases. Books and articles can be scanned in and converted to text files. The text files can be orders of magnitude smaller than the bitmapped scan data. Also, the text files are capable of being searched, sorted, copied, and filed.
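    The "orders of magnitude" claim is easy to check. This Python sketch compares the 4.11 MB 1-bit, 300 DPI page from the scan table with the same page stored as plain text, assuming a hypothetical full page of about 3,000 characters at one byte per character and treating the table's MB as binary megabytes.

```python
import math

PAGE_BITMAP_BYTES = 4.11 * 1_048_576  # 1-bit B&W, 300 DPI letter page (assumes binary MB)
PAGE_TEXT_BYTES = 3_000               # hypothetical: ~3,000 characters, 1 byte each

ratio = PAGE_BITMAP_BYTES / PAGE_TEXT_BYTES
print(f"bitmap is ~{ratio:,.0f}x larger, about {math.log10(ratio):.1f} orders of magnitude")
```

    Under these assumptions the text version is on the order of a thousand times smaller, roughly three orders of magnitude.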

    The process of optical character recognition is not an easy job. It requires a tremendous amount of computer power, and only with more powerful processors like the Pentium has it become practical for home computer use. If you consider all the thousands of fonts that are available, in all different sizes and spacing, the OCR program has to be very flexible to be able to recognize them. Some characters are very difficult to tell apart, such as 1, I, l, 0, O, and o, especially with mixed fonts of different sizes. The better OCR programs can be "trained" to improve recognition quality, especially with unusual fonts. The quality of the original is also important for accurate scanning. Dirt or smudges on the original, or copies with blurry or broken letters, also make accurate recognition difficult. Decimal points can get lost if they're too small, or spots can appear as periods in the wrong place.

    The hardware requirements for good OCR work are not too demanding. The scan resolution required depends on the size of the text. The smallest font size that can be recognized effectively is usually around 6 points, which needs a scan at 300-400 DPI. For bigger fonts, you can and should scan at a lower resolution. Over-scanning doesn't help and can even hurt: the OCR program has to sift through all that scan data to recognize the characters, so there's no use overloading it with unnecessary samples. Normally, documents are scanned as line art (1 bit per pixel, pure black and white).
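The rule of thumb above can be turned into a little arithmetic. A typographic point is 1/72 inch, so 6-point type scanned at 300 DPI is 6 x 300 / 72 = 25 pixels tall, measured on the em (the nominal font size). The Python sketch below assumes that about 25 pixels per em is enough for recognition. That figure is back-calculated from the 6-point, 300-DPI rule of thumb, not taken from any OCR vendor's spec, and the list of resolutions is illustrative.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch

def pixels_per_em(point_size, dpi):
    """Nominal height of the font (its em), in scanner pixels."""
    return point_size * dpi / POINTS_PER_INCH

def suggested_dpi(point_size, target_pixels=25,
                  resolutions=(150, 200, 300, 400, 600)):
    """Lowest listed resolution that gives at least target_pixels per em.

    target_pixels=25 is an assumption derived from the rule of thumb that
    6-point text needs about 300 DPI. Falls back to the highest listed
    resolution for very small type.
    """
    for dpi in resolutions:
        if pixels_per_em(point_size, dpi) >= target_pixels:
            return dpi
    return resolutions[-1]

for size in (6, 8, 10, 12, 18):
    print(size, "pt ->", suggested_dpi(size), "DPI")
```

Under that assumption, 10-point text comes out at 200 DPI and 12-point text at 150 DPI, in line with the advice to scan bigger fonts at lower resolutions.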

    The accuracy of the OCR process depends mostly on the software. There are many OCR programs, and most scanners come bundled with one. The OCR software bundled with lower-end scanners is often a simpler or out-of-date version of a full-featured OCR package. Which is the best OCR program? New programs and updates appear regularly, so the situation changes constantly. In general, Caere's Omnipage and Xerox's Textbridge have gotten the best ratings. An old, lite version of Omnipage came with my Scanmaker E3. It does a good job on text in the 10-point range, but when I tried it on 6-point type, it made a lot of errors. (I didn't try optimizing it.) I later tried the latest and greatest version of Xerox's Textbridge (on a UMAX 600P scanner) with the default setup, and it read the same text with few errors. Many OCR programs, including Textbridge, let you train the OCR to improve its accuracy. An OCR program can also be set up to feed text directly into a word processor, where it can appear as a menu item in the word processor's File menu.
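One way to put a number on "a lot of errors" versus "few errors" when comparing OCR programs is the character error rate: the edit distance between a hand-checked transcript and the OCR output, divided by the length of the transcript. This Python sketch uses the standard Levenshtein edit distance; the function names and the sample strings are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance: the fewest single-character insertions,
    deletions, and substitutions that turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def character_error_rate(reference, ocr_output):
    """Edit distance as a fraction of the reference length."""
    return edit_distance(reference, ocr_output) / max(len(reference), 1)

reference = "Total 105.21"
ocr_output = "Tota1 1O5.2l"  # three look-alike substitutions
print(character_error_rate(reference, ocr_output))
```

Running the same page through two programs and comparing their error rates gives a fairer comparison than eyeballing the output, as long as the transcript itself has been checked carefully.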

    Image Editing Software

    Image editing software is essential for getting the most out of a scanner. Most scanners come with bundled image-editing software, often a low-end version of a full-featured editor. The industry standard for photo editing software is Adobe Photoshop. It has every feature imaginable and is the standard against which all other programs are compared. However, Photoshop alone can cost around $500-$800, several times more than a low-end scanner. Adobe has a "lite" version of Photoshop called PhotoDeluxe that's bundled with low- to mid-range scanners in the $100-$300 range. Photoshop itself is bundled with mid- to high-end scanners in the $800-$1000 range and above. Ulead's PhotoImpact (special edition) is a fairly powerful program that comes with several scanners, like my Microtek E3. It has lots of image processing functions and special effects, but unlike the Adobe products, it does not have layers. Layers allow you to combine multiple images, graphics, and text, with one overlaid on another. Some features of image editing software include the following:

    Image Editing Software Links

    Scanner Manufacturers

    Computer Art, Graphics, and Photography

    1999 Update Note: This is the first update for this page in over a year. Most of what I wrote above was written at the end of 1997. Since then, many things have changed in the scanner world. Prices have continued to come down. I predicted that scanners would become more common and even bundled with PC systems, and this has been happening. Prices have come down to the point where you can get a decent scanner for under $30. I had a table of scanner prices, but the prices are obsolete now, so I deleted the table. Prices change so fast that I can't keep up with them, so I'll try to avoid mentioning specific prices.

    Recent hardware innovations that have popped up since the last update are the Universal Serial Bus interface and the Contact Image Sensor (CIS) technology, which I discuss above. I also got a new computer with a USB bus, but I haven't seen any reason to replace my good old reliable SCSI Microtek S3 with a hot new USB model.

    I also added some more information about film scanners. I checked the review links and deleted some of the dead ones, but kept some of the old ones, since some of the information is still valid even if the prices aren't. I updated the manufacturers' links. Some manufacturers (like Storm, recently) have died or changed names, and new ones have popped up. The old reliable brands (HP, Microtek, Umax, and Mustek) are still plugging along. I added a few new retailers. I haven't gotten around to checking all the other links, so if there are a few dead ones, I'll prune them out later.

    I admit that when I first created this page, I went overboard on animated GIFs and horizontal rules, and it's way too long. If I ever get time, I'll re-design and re-organize this page. I'm still dabbling with Web page styles. To see one of my latest pages, go to Bay Area Biking or the even more recent North Coast and Redwood Empire pages.

    2009 Update Notes:
    Geez, it's been 10 years since I updated this page. It's so old, I'm tempted to just start over. Instead, I just cleaned it up a bit, fixed the spelling errors, deleted lots of dead links, and added some new ones. The page is interesting from a historical perspective. While some of the details and predictions are a bit outdated, the principles are still valid. Many changes have occurred since this was first written. USB has become the only interface for scanners, and USB 2.0 has increased the speed of the USB interface. SCSI and parallel port scanners are no longer available. All-in-one multifunction scanner-printer-faxes have gotten more popular and are cost- and space-saving alternatives to having a separate printer and scanner. Many small scanner companies have disappeared. Microtek has stopped selling to consumers in North America.
    Digital cameras have pretty much displaced film cameras. Konica and Minolta have exited the consumer camera and scanner business.

    I got rid of my good old reliable SCSI Microtek S3 scanner and got a USB Epson 1650 and a Canon Canoscan CIS scanner. I use digital cameras for photography now, so I rarely use the scanners for scanning pictures, except for old photos to be used in digital picture frames. See my web page on outdoor photography for more on digital photography. I mostly use the scanners for scanning documents so I have a softcopy of them and can e-mail them.

    Click here to go to Ron Horii's Bay Area Back Pages (lots of scanned photos)

    Previous version 12/11/97. Latest Update 3/6/2009.