📷 Compress Scanned PDF

Last updated: April 11, 2026

Downsamples and recompresses embedded images, dramatically shrinking image-heavy scanned PDFs while keeping them legible.

All processing happens entirely in your browser. No file is ever uploaded to any server.

Why Scanned PDFs Are So Bloated, and How Downsampling Actually Fixes It

Open a scanned PDF from a photocopier, a document management system, or a mobile scanning app, and you will often find something surprising: a fifteen-page contract that contains almost no meaningful text data has somehow grown to 80 or 90 megabytes. The culprit is not the text; text compresses beautifully. The culprit is the raster image that represents the text. Every page of a scanned document is, at the PDF level, a photograph of paper. And photographs, especially ones stored at scanner-default resolutions, are enormous.

Understanding why requires a short trip inside the PDF format itself. A PDF file is a collection of numbered objects: dictionaries, streams, cross-references. Each page typically references one or more image XObjects, which are stream-encoded blobs of pixel data. The scanner software that created the file decided, at capture time, how many dots per inch to record and which compression algorithm to apply. Most office scanners default to 300 DPI. At 300 DPI, an A4 page is 2480 × 3508 pixels. In full RGB color, that is just over 26 million bytes of raw image data per page. Even with JPEG compression at a decent quality level, each page image can easily occupy 500 KB to 2 MB. Multiply by 40 pages of a board meeting packet and you have a file nobody wants to email.
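The arithmetic behind those figures is easy to check. The following is a minimal sketch in JavaScript; the helper name rawImageSize and its parameters are illustrative and are not taken from the tool itself.

```javascript
// Rough size of an uncompressed page scan, before any JPEG or Flate compression.
// Hypothetical helper for illustration only.
function rawImageSize(widthMm, heightMm, dpi, bytesPerPixel = 3) {
  const MM_PER_INCH = 25.4;
  const width = Math.round((widthMm / MM_PER_INCH) * dpi);
  const height = Math.round((heightMm / MM_PER_INCH) * dpi);
  return { width, height, bytes: width * height * bytesPerPixel };
}

// A4 is 210 mm x 297 mm; 3 bytes per pixel for 8-bit RGB.
const a4 = rawImageSize(210, 297, 300);
console.log(a4); // { width: 2480, height: 3508, bytes: 26099520 }
```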

The Two Levers: Resolution and Quality

Compressing a scanned PDF comes down to two independent decisions that compound each other: how many pixels to keep, and how hard to squeeze those pixels with lossy compression.

Resolution reduction (downsampling) is the more powerful lever for large savings. Halving the linear dimensions of each image, from 2480 × 3508 to 1240 × 1754, reduces the pixel count by 75 percent before any other compression is applied. For on-screen reading and printing at normal sizes, 150 DPI is perfectly legible for typed text and comfortable for body-sized handwriting. The visual degradation going from 300 to 150 DPI on a monitor is negligible because monitors themselves typically render at 96–220 DPI. Only workflows that involve legal-grade archiving, forensic examination of signatures, or printing at large format have a real need for 300 DPI scanned originals.
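To see how a linear scale factor translates into pixel savings, here is a small sketch. The function name downsampledSize is hypothetical, and the source DPI is passed in by the caller because a PDF image does not necessarily record it.

```javascript
// Hypothetical helper: new dimensions, effective DPI, and the fraction of
// pixels that remain after scaling both axes by the same factor.
function downsampledSize(width, height, scale, sourceDpi) {
  const newWidth = Math.max(1, Math.round(width * scale));
  const newHeight = Math.max(1, Math.round(height * scale));
  return {
    width: newWidth,
    height: newHeight,
    dpi: sourceDpi * scale,
    pixelsKept: (newWidth * newHeight) / (width * height),
  };
}

// Halving each axis keeps one quarter of the pixels (a 75 percent reduction).
console.log(downsampledSize(2480, 3508, 0.5, 300));
```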

JPEG quality (the setting that controls quantization) is the second lever. JPEG works by transforming pixel blocks into frequency components via a discrete cosine transform, then quantizing those frequency values, which discards fine detail the human visual system is least sensitive to. Quality 50–60 (on a 0–100 scale) typically produces an image that looks nearly identical to quality 90 for document scans, because scanned paper contains very little of the high-frequency texture that gets mangled at low quality. The savings are substantial: a quality-50 JPEG of a page scan is often several times smaller than a quality-85 version of the same pixels, with no visible difference at normal reading zoom.
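How a quality number becomes quantization step sizes depends on the encoder. The sketch below uses the scaling convention of the IJG libjpeg library, which is an assumption here: browsers do not promise that their canvas JPEG encoder follows this exact mapping. BASE_LUMA_ROW is the first row of the example luminance table from the JPEG standard; the function names are illustrative.

```javascript
// First row of the example luminance quantization table (JPEG Annex K).
const BASE_LUMA_ROW = [16, 11, 10, 16, 24, 40, 51, 61];

// IJG libjpeg convention (assumed): map quality 1..100 to a percentage scale.
function ijgScale(quality) {
  const q = Math.min(100, Math.max(1, quality));
  return q < 50 ? Math.floor(5000 / q) : 200 - 2 * q;
}

// Scale the base step sizes; larger steps discard more detail.
function scaledSteps(baseSteps, quality) {
  const scale = ijgScale(quality);
  return baseSteps.map((base) =>
    Math.min(255, Math.max(1, Math.floor((base * scale + 50) / 100)))
  );
}

// Quantization itself: divide each DCT coefficient by its step and round.
function quantize(coefficients, steps) {
  return coefficients.map((c, i) => Math.round(c / steps[i]));
}

console.log(scaledSteps(BASE_LUMA_ROW, 90)); // small steps, little loss
console.log(scaledSteps(BASE_LUMA_ROW, 50)); // the base table itself
```

Under this convention, lowering quality enlarges every step, so more coefficients round to zero and the entropy coder has less to store.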

What the Browser Is Actually Doing

This tool performs the compression entirely in the browser using three modern web APIs that require no external libraries. The first is TextDecoder with Latin-1 encoding, which lets us treat the raw PDF bytes as a string for the purpose of scanning for structural keywords (object declarations, stream boundaries, trailer dictionaries). Because Latin-1 is a single-byte encoding, every byte decodes to exactly one character, so string character positions and byte offsets stay in 1:1 correspondence. That correspondence is crucial: when we locate a stream within the text representation, we can slice the exact same byte range from the original Uint8Array to extract the compressed image data.
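The offset trick can be sketched as follows. The function names are hypothetical, and the sketch simply searches for the endstream keyword; a real parser would normally prefer the /Length entry of the stream dictionary, since binary data can contain that keyword by coincidence.

```javascript
// Decode bytes so that string index i corresponds to byte offset i.
function decodeLatin1(pdfBytes) {
  return new TextDecoder("latin1").decode(pdfBytes);
}

// Hypothetical helper: locate the first stream and return its raw bytes.
function findFirstStream(pdfBytes) {
  const text = decodeLatin1(pdfBytes);
  const keyword = text.indexOf("stream");
  if (keyword === -1) return null;

  // The "stream" keyword is followed by CRLF or LF before the data begins.
  let start = keyword + "stream".length;
  if (text[start] === "\r") start++;
  if (text[start] === "\n") start++;

  let end = text.indexOf("endstream", start);
  if (end === -1) return null;
  // Drop the end-of-line marker that precedes "endstream" (simplification:
  // this would also strip a data byte that happens to be a newline).
  if (text[end - 1] === "\n") end--;
  if (text[end - 1] === "\r") end--;

  // Same indices, applied to the original bytes rather than the string.
  return { start, end, data: pdfBytes.subarray(start, end) };
}
```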

For each image XObject we find (identified by its dictionary containing /Subtype /Image alongside dimension and filter metadata), the tool dispatches one of two decompression paths. Images stored with the DCTDecode filter (raw JPEG data) are decoded by creating a Blob, calling createImageBitmap, and reading the result back through an OffscreenCanvas. Images stored with FlateDecode (zlib-compressed raw pixel data, common in PDF/A and distilled documents) are decompressed using the DecompressionStream API with format negotiation between 'deflate' (zlib-wrapped) and 'deflate-raw'. If the FlateDecode stream carries a PNG predictor (a per-row filter byte prepended to each scanline, which dramatically improves zlib's compression of pixel data), the tool undoes all five PNG filter types (None, Sub, Up, Average, Paeth) before interpreting the pixels. CMYK color spaces are converted to RGB via the standard formula before the canvas render step.
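Undoing the PNG predictor is the most algorithmic part of that path, so a sketch may help. This is not the tool's actual code: the function name is hypothetical, and the columns, colors, and bitsPerComponent arguments are assumed to come from the stream's /DecodeParms dictionary. The input is the already-inflated data, where each row starts with one filter-type byte.

```javascript
// Paeth predictor: pick whichever of left (a), up (b), upper-left (c)
// is closest to a + b - c.
function paeth(a, b, c) {
  const p = a + b - c;
  const pa = Math.abs(p - a);
  const pb = Math.abs(p - b);
  const pc = Math.abs(p - c);
  if (pa <= pb && pa <= pc) return a;
  return pb <= pc ? b : c;
}

// Hypothetical helper: reverse PNG row filters on inflated pixel data.
function undoPngPredictor(data, columns, colors = 3, bitsPerComponent = 8) {
  const bpp = Math.max(1, (colors * bitsPerComponent) >> 3); // bytes per pixel
  const rowBytes = Math.ceil((columns * colors * bitsPerComponent) / 8);
  const stride = rowBytes + 1; // +1 for the filter-type byte
  const rows = Math.floor(data.length / stride);
  const out = new Uint8Array(rows * rowBytes);

  for (let y = 0; y < rows; y++) {
    const filter = data[y * stride];
    const src = y * stride + 1;
    const dst = y * rowBytes;
    for (let x = 0; x < rowBytes; x++) {
      const a = x >= bpp ? out[dst + x - bpp] : 0; // left
      const b = y > 0 ? out[dst + x - rowBytes] : 0; // up
      const c = x >= bpp && y > 0 ? out[dst + x - rowBytes - bpp] : 0; // upper-left
      let prediction;
      switch (filter) {
        case 0: prediction = 0; break; // None
        case 1: prediction = a; break; // Sub
        case 2: prediction = b; break; // Up
        case 3: prediction = (a + b) >> 1; break; // Average
        case 4: prediction = paeth(a, b, c); break; // Paeth
        default: throw new Error("Unknown PNG filter type " + filter);
      }
      out[dst + x] = (data[src + x] + prediction) & 0xff;
    }
  }
  return out;
}
```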

After decompression, the pixel data is placed on an OffscreenCanvas at the original dimensions, then drawn down onto a smaller canvas at the chosen scale factor with image smoothing enabled (imageSmoothingQuality: 'high'; the exact resampling algorithm is left to the browser). The rescaled canvas is then encoded to JPEG at the specified quality via convertToBlob. The resulting JPEG bytes replace the original stream, the image dictionary is rewritten with updated Width, Height, Length, and Filter values, and the object is placed into the rebuilt file.
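The dictionary rewrite at the end of that step might look roughly like this. It is a sketch under stated assumptions: the function name is hypothetical, the re-encoded JPEG is assumed to be 8-bit RGB, and entries that no longer apply after re-encoding (such as /DecodeParms) are simply left out. Soft masks and other optional entries are not handled.

```javascript
// Hypothetical helper: build a fresh image dictionary for a re-encoded JPEG.
function buildJpegImageDict({ width, height, jpegLength }) {
  return [
    "<<",
    "/Type /XObject",
    "/Subtype /Image",
    `/Width ${width}`,
    `/Height ${height}`,
    "/ColorSpace /DeviceRGB", // assumption: encoder output is RGB
    "/BitsPerComponent 8",
    "/Filter /DCTDecode", // the stream now holds plain JPEG bytes
    `/Length ${jpegLength}`, // byte length of the new stream
    ">>",
  ].join(" ");
}

console.log(buildJpegImageDict({ width: 1240, height: 1754, jpegLength: 183204 }));
```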

The PDF is then reassembled by writing each processed object sequentially, tracking byte offsets as we go, and constructing a fresh cross-reference table that maps each object number to its new offset. The original trailer dictionary, which contains the critical /Root and /Info references, is preserved and re-emitted with an updated /Size. The result is a structurally valid PDF that all compliant readers can open.
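The offset bookkeeping is easier to follow in code. The sketch below is a simplified illustration, not the tool's implementation: it assumes object bodies are ASCII strings (so string length equals byte length), that objects are numbered contiguously from 1, and that object 1 is the document catalog. The name assemblePdf is hypothetical.

```javascript
// Hypothetical helper: bodies[i] becomes object number i + 1.
function assemblePdf(bodies) {
  let out = "%PDF-1.4\n";
  const offsets = [];

  for (let i = 0; i < bodies.length; i++) {
    offsets.push(out.length); // byte offset where this object starts
    out += `${i + 1} 0 obj\n${bodies[i]}\nendobj\n`;
  }

  const size = bodies.length + 1; // object 0 is the head of the free list
  const xrefOffset = out.length;
  out += `xref\n0 ${size}\n`;
  out += "0000000000 65535 f \n"; // each entry is exactly 20 bytes
  for (const offset of offsets) {
    out += `${String(offset).padStart(10, "0")} 00000 n \n`;
  }

  out += `trailer\n<< /Size ${size} /Root 1 0 R >>\n`;
  out += `startxref\n${xrefOffset}\n%%EOF\n`;
  return out;
}
```

With binary streams in play, the same logic would track lengths of Uint8Array chunks rather than string lengths.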

What It Won't Help With

This approach targets the specific problem of image-heavy scanned documents. If your PDF's size is dominated by embedded fonts (common in design files and some export workflows), this tool will leave it unchanged, because those objects are kept verbatim. Similarly, PDFs that use obscure or multi-layered filter chains (for instance JBIG2Decode, which is used for monochrome scanned text at very high compression ratios, or CCITTFaxDecode for Group 4 fax encoding) are skipped rather than corrupted: the original stream is kept as-is. Encrypted PDFs will show an error early in parsing. For those cases, dedicated tools with full filter support are the right choice.
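The skip-or-process decision can be pictured as a small classifier over the image dictionary's /Filter entry. This is a sketch with a hypothetical function name; treating a missing /Filter as unsupported is an assumption made here for simplicity, not something the tool is documented to do.

```javascript
// Hypothetical helper: decide which decode path an image dictionary needs.
// Returns "jpeg", "flate", or "unsupported".
function classifyImageFilter(dictText) {
  // /Filter is either a single name or an array of names (a filter chain).
  const match = /\/Filter\s*(\[[^\]]*\]|\/[A-Za-z0-9]+)/.exec(dictText);
  if (!match) return "unsupported"; // assumption: no filter is not handled

  const names = match[1].match(/\/[A-Za-z0-9]+/g) || [];
  if (names.length !== 1) return "unsupported"; // multi-layered chain: keep as-is

  switch (names[0]) {
    case "/DCTDecode": return "jpeg";
    case "/FlateDecode": return "flate";
    default: return "unsupported"; // e.g. JBIG2Decode, CCITTFaxDecode
  }
}

console.log(classifyImageFilter("<< /Subtype /Image /Filter /JBIG2Decode >>"));
```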

Encrypted PDFs are a particularly common pitfall. Many enterprise scanners apply 128-bit RC4 or AES encryption to output files. The PDF object structure is still readable, but stream contents are encrypted, so decompressing them as if they were plain zlib or JPEG data produces garbage. The tool detects missing object data and gracefully falls back to keeping the original.
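One simple way to spot an encrypted file up front is to look for an /Encrypt entry, which the PDF format places in the trailer (or cross-reference stream) dictionary. The check below is a heuristic sketch, and it is a different technique from the fallback described above: it scans the whole file rather than parsing the trailer, so stream data that happens to contain the same characters would produce a false positive. The function name is hypothetical.

```javascript
// Hypothetical helper: heuristic check for an /Encrypt entry anywhere in the file.
function looksEncrypted(pdfBytes) {
  const text = new TextDecoder("latin1").decode(pdfBytes);
  // \b keeps longer names such as /EncryptMetadata from matching.
  return /\/Encrypt\b/.test(text);
}

const trailer = "trailer\n<< /Size 12 /Root 1 0 R /Encrypt 11 0 R >>\n";
console.log(looksEncrypted(new TextEncoder().encode(trailer)));
```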

Choosing Your Settings

For most typed-text scans intended for screen reading or emailing, the 50% downscale combined with quality 50 is the sweet spot: expect 70–90% size reduction on typical office scanner output. If you need the document to print clearly on a standard printer, use 75% downscale with quality 60–70. If you are archiving something that must keep scan-quality detail or may later be enlarged, stay at 100% scale and use quality 75–85 to achieve moderate savings without pixel reduction. For documents going purely to web thumbnails or preview renders, the 25% scale at quality 40 produces the smallest possible output while retaining readability at zoom levels up to about 100%.

The quality-size tradeoff is not linear. Going from quality 90 to quality 70 cuts JPEG size roughly in half. Going from quality 70 to quality 50 cuts it roughly in half again. Below quality 30, blocky DCT artifacts become noticeable in high-contrast areas like printed text on white paper, which is precisely the content of most scanned documents, so there is limited practical use for quality settings below about 35.
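Those two rules of thumb can be combined into a back-of-the-envelope estimate. The model below is only a rough sketch built from the figures quoted above (size halves per 20-point quality drop between 90 and 50, and scales with the square of the linear scale factor); it is not how the tool predicts output size, and real files will deviate. The baseline is the same image at 100% scale and quality 90.

```javascript
// Rough model: output size relative to the same image at 100% scale, quality 90.
function relativeSize(scale, quality) {
  if (quality < 50 || quality > 90) {
    throw new RangeError("model only covers quality 50 to 90");
  }
  const qualityFactor = Math.pow(2, (quality - 90) / 20); // halves every 20 points
  const pixelFactor = scale * scale; // both axes shrink
  return pixelFactor * qualityFactor;
}

console.log(relativeSize(0.5, 50)); // 0.0625
console.log(relativeSize(1, 70)); // 0.5
```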

Since everything runs in your browser, there is no file size limit imposed by a server, no privacy risk from uploading sensitive contracts or medical records, and no dependency on a network connection. The processing time is proportional to the number of pages and the original image resolution, typically completing in a few seconds for a standard 20-page A4 scan on a modern machine.

FAQ

Will this work on any PDF, or only specific types?
It works best on scanned PDFs where each page is a raster image: the kind produced by photocopiers, mobile scanning apps, and document scanners. PDFs that are primarily vector graphics or formatted text (like those exported from Word or InDesign) contain little or no image data, so the file size will not change much. PDFs with JBIG2, CCITTFax, or other exotic image filters are supported on a best-effort basis: unsupported images are kept as-is rather than corrupted. Encrypted PDFs cannot be processed.
Is 50% downscale too aggressive? Will the text still be readable?
For the vast majority of scanned documents intended for on-screen reading, 50% is perfectly fine. Office scanners default to 300 DPI; reducing to 150 DPI is still sharper than most monitors can display. Typed text remains very clear. Handwriting and small font sizes (below 8pt) may show very slight softening. If you need to print the output at A4 or letter size, or if the document contains fine-detail charts, use 75% downscale instead.
My compressed PDF is only slightly smaller than the original. Why?
A few things can cause this. First, the original PDF may already have been compressed: if a previous tool already applied JPEG compression, re-compressing at the same quality produces minimal further savings. Second, if the file is large but contains mostly fonts, form fields, or vector content rather than raster images, image compression has little effect. Third, some scanners produce JBIG2-encoded monochrome pages that are already extremely compact; this tool will leave those streams unchanged and only compress any color images present.
Does this tool upload my files anywhere?
No. All processing happens entirely inside your browser using JavaScript. Your PDF bytes never leave your machine. You can verify this by disconnecting from the internet after the page has loaded and then using the tool; it will still work. The output is generated as a local Blob URL and downloaded directly to your device.
What JPEG quality should I use for archival copies?
For archival purposes where you want to preserve image fidelity, use quality 75–85 with no downscale (100% scale). This gives moderate file-size savings, typically 30–60% on scanner-default settings, while keeping all pixels at original resolution. For legal or medical documents where image integrity may be scrutinized, consider whether lossy compression is appropriate at all; lossless compression tools are better suited for those cases.
The tool says 'Unsupported filter' for some images. What does that mean?
PDF images can be encoded with many different compression schemes. This tool handles the two most common ones: DCTDecode (JPEG) and FlateDecode (zlib/deflate with optional PNG predictors). Less common formats like JBIG2Decode (high-efficiency monochrome encoding), CCITTFaxDecode (fax Group 3/4), or RunLengthDecode are noted as unsupported, and those specific images are kept in the output at their original size. The rest of the PDF is still recompressed normally.