One of the questions I often get is "why are my PDF files so large?!". This question typically refers to scanned PDF files, and we all know, usually from direct experience, that scanned color PDF files are larger than black-and-white ones, and high-resolution files are larger than low-resolution ones.
Here is the mathematics behind file size. Once you understand the math and the variables involved, you will see how to control file size.
I - By the Numbers
A) 3 Dimensional Paper
When paper is digitized, it becomes three-dimensional. The three dimensions are height, width, and bit-depth.
B) Bit Depth
Bit-depth refers to the number of bits needed to represent each dot on the page. Black-and-white dots are 1 bit in size (on for black, off for white), grayscale dots are 8 bits, and color dots are 24 bits (8 bits each for red, green, and blue).
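To make bit-depth concrete, here is a small Python sketch listing the bits per dot for each scan mode and how many distinct values one dot can hold (2 raised to the bit-depth). It assumes "color" means standard 24-bit RGB; the names `BITS_PER_DOT` and `distinct_values` are illustrative only, not from any library.

```python
# Bits per dot for each scan mode described above.
# Assumption: "color" means 24-bit RGB (8 bits per channel).
BITS_PER_DOT = {
    "black-and-white": 1,
    "grayscale": 8,
    "color": 24,
}


def distinct_values(bits: int) -> int:
    """How many different values one dot can hold at this bit-depth."""
    return 2 ** bits


for mode, bits in BITS_PER_DOT.items():
    print(f"{mode}: {bits} bit(s) per dot, {distinct_values(bits):,} possible values")
```

So a grayscale dot can be any of 256 shades, while a 24-bit color dot can be any of about 16.7 million colors, which is why each color dot costs 24 times as much as a black-and-white one.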
C) Dots Per Inch
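Dots per inch (DPI) is set in the scanner or its software, with 300 DPI being a common setting. DPI applies both across and down the page, so a 300 DPI scan has 300 x 300 = 90,000 dots in every square inch.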
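A minimal Python sketch of the dots-per-page arithmetic, assuming a US letter page (8.5 x 11 inches) and the same DPI horizontally and vertically. The function names are hypothetical helpers for illustration.

```python
def dots_per_square_inch(dpi: int) -> int:
    # DPI applies both across and down the page, so it is squared.
    return dpi * dpi


def dots_per_page(dpi: int, width_in: float, height_in: float) -> int:
    # Page area in square inches times dots in each square inch.
    area_sq_in = width_in * height_in
    return round(dots_per_square_inch(dpi) * area_sq_in)


print(dots_per_square_inch(300))    # 90000
print(dots_per_page(300, 8.5, 11))  # 8415000 dots on a letter-size page
```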
D) Bits make Bytes
As you know, 8 bits make up a single byte of information (just reciting this for clarity).
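The bits-to-bytes step is a single division. This Python sketch (with a hypothetical `bits_to_bytes` helper) rounds up, on the assumption that a leftover partial byte still occupies a whole byte.

```python
BITS_PER_BYTE = 8


def bits_to_bytes(total_bits: int) -> int:
    # Ceiling division: a leftover partial byte still takes a whole byte.
    return -(-total_bits // BITS_PER_BYTE)


print(bits_to_bytes(8_415_000))  # 1051875
```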
E) Compression
To store a scanned image in a PDF, the raw data is compressed into a format the PDF can hold. Typical formats are TIFF Group IV and JPG. Group IV compression (formally CCITT Group 4, and usually stored in TIFF files) was developed for fax transmission, where size mattered because data moved over slow telephone connections. Group IV compresses a typical page of text at about 20:1, so it's been a natural choice as an image format. Note that it only handles black-and-white images; grayscale and color scans are usually compressed as JPG instead.
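Once you know the ratio, compression is a simple division. This Python sketch assumes the roughly 20:1 Group IV ratio quoted above; real ratios depend on how much of the page is blank, and `compressed_size` is a hypothetical helper, not a library call.

```python
def compressed_size(raw_bytes: int, ratio: float = 20.0) -> int:
    """Estimate the compressed size for a ratio such as 20 (meaning 20:1).

    Assumption: the 20:1 default is the rough Group IV figure for a text
    page; actual ratios vary with page content.
    """
    return round(raw_bytes / ratio)


print(compressed_size(1_051_875))  # 300 DPI black-and-white letter page
```

The result is rounded to the nearest byte, so the 300 DPI page comes out at 52,594 bytes.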
II - How it Adds Up
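See how things start to add up quickly? The factors multiply, and resolution counts twice because it applies to both height and width. Going from grayscale to black-and-white reduces the raw image size by a factor of 8. Going from 300 DPI to 200 DPI cuts the number of dots to (200 x 200) / (300 x 300) = 4/9 of the original, shrinking the raw image by more than half, and for typical typed pages the result is often just as readable.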
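The two scaling rules can be checked with exact fractions. In this Python sketch, `bit_depth_scale` and `dpi_scale` are hypothetical helpers; they compare raw (uncompressed) sizes only.

```python
from fractions import Fraction


def bit_depth_scale(old_bits: int, new_bits: int) -> Fraction:
    # Raw size is proportional to bit-depth.
    return Fraction(new_bits, old_bits)


def dpi_scale(old_dpi: int, new_dpi: int) -> Fraction:
    # Raw size is proportional to DPI squared (height and width both change).
    return Fraction(new_dpi, old_dpi) ** 2


print(bit_depth_scale(8, 1))  # grayscale -> black-and-white: 1/8
print(dpi_scale(300, 200))    # 300 DPI -> 200 DPI: 4/9
```

Multiplying the two factors shows that re-scanning a 300 DPI grayscale page at 200 DPI black-and-white leaves 1/8 x 4/9 = 1/18 of the raw data.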
Let's look at an example: a letter-size (8.5 x 11 inch) page scanned at 200 DPI, with the 300 DPI figures at the end for comparison. The formula works like this:
A) Image Size
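- Page area = width x height (example: 8.5 x 11 = 93.5 square inches)
- Dots per page = DPI x DPI x page area (example: 200 x 200 = 40,000 dots per square inch x 93.5 square inches = 3,740,000 dots per page)
- Total bits = dots per page x bit-depth
  - example Black-and-White: 3,740,000 x 1 = 3,740,000 total bits at 200 DPI
  - example Grayscale: 3,740,000 x 8 = 29,920,000 total bits at 200 DPI
- Raw bytes = total bits / 8
  - example 200 DPI Black-and-White: 467,500 raw bytes
  - example 200 DPI Grayscale: 3,740,000 raw bytes
- Compressed size = raw bytes / 20 (TIFF Group IV at about 20:1)
  - example 200 DPI Black-and-White: 23,375 bytes, or about 23K per page
  - example 200 DPI Grayscale: 187,000 bytes, or about 187K per page, assuming the same 20:1 ratio (Group IV only handles black-and-white, so a grayscale page would normally be JPG-compressed, with a ratio that depends on the quality setting)
- The same page at 300 DPI Black-and-White:
  - 300 x 300 = 90,000 dots per square inch x 93.5 square inches = 8,415,000 dots per page
  - 8,415,000 dots (bits) / 8 = 1,051,875 raw bytes
  - TIFF Group IV compression at 20:1 yields about 52,594 bytes, or roughly 52K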
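The whole formula fits in one small function. This Python sketch is a hypothetical `page_size` helper that assumes a letter-size page and applies the 20:1 ratio to every case, including grayscale, exactly as the worked numbers do; it estimates the image data only, not any other content in the PDF.

```python
def page_size(width_in: float, height_in: float, dpi: int,
              bits_per_dot: int, compression_ratio: float = 20.0) -> dict:
    """Walk through the formula above for one page (sizes in bytes)."""
    dots = round(width_in * height_in * dpi * dpi)
    total_bits = dots * bits_per_dot
    raw_bytes = -(-total_bits // 8)  # round up to whole bytes
    compressed = round(raw_bytes / compression_ratio)
    return {"dots": dots, "bits": total_bits,
            "raw_bytes": raw_bytes, "compressed_bytes": compressed}


# Letter-size page; 20:1 is assumed for every case, including grayscale.
for label, dpi, bits in [("200 DPI Black-and-White", 200, 1),
                         ("200 DPI Grayscale", 200, 8),
                         ("300 DPI Black-and-White", 300, 1)]:
    result = page_size(8.5, 11, dpi, bits)
    print(f"{label}: {result['raw_bytes']:,} raw bytes, "
          f"about {result['compressed_bytes']:,} bytes compressed")
```

Changing `dpi`, `bits_per_dot`, or `compression_ratio` shows directly which setting moves the file size the most.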
Other factors to consider when evaluating image file size are:
- How the images will be used, since lowering resolution or bit-depth may affect readability. Reading a page of typewritten text is very different from interpreting margin notes during a document review.
- What they will be viewed on. Reading a scanned image on a laptop is very different from reading it on a large, high-resolution monitor.
Happy Scanning!
Questions? Comments? Need Advice? Get in touch with Bill Lipner by email.